Challenges in Dialogue
Discourse and Dialogue
CMSC 35900-1
October 27, 2006
Roadmap
• Issues in Dialogue
– Dialogue vs General Discourse
– Dialogue Acts
• Modeling
• Recognition and Interpretation
– Dialogue Management for Computational Agents
Dialogue vs General Discourse
• Key contrast: Two or more speakers
– Primary focus on speech
• Issues in multi-party spoken dialogue
– Turn-taking: who speaks next, when?
– Collaboration: clarification, feedback, …
– Disfluencies
– Adjacency pairs, dialogue acts
Turn-Taking
• Multi-party discourse
– Need to trade off speaker/hearer roles
• Interpret reference from sequential utterances
• When?
– End of sentence?
• No: multi-utterance turns
– Silence?
• No: little silence in smooth dialogue: < 250ms
– When other starts speaking?
• No: relatively little overlap face-to-face: ~5%
Turn-taking: When
• Rule-governed behavior
– Possibly multiple legal turn change times
• Aka transition-relevance places (TRP)
• Generally at utterance boundaries
– Utterance not necessarily sentence
– In fact, utterance/sentence boundaries not obvious in speech
» Don’t necessarily pause between sentences
• Automatic utterance boundary detection
– Cue words (okay, so, ...); POS sequences; prosody
Turn-taking: Who & How
• At each TRP in each turn (Sacks et al., 1974)
– If speaker has selected A to speak, A must take floor
– If speaker has selected no one to speak, anyone can
– If no one else takes the turn, the speaker can
• Selecting speaker A:
– By explicit/implicit mention: What about it, Bob?
• By gaze, function
• Selecting others: questions, greetings, closing
– (Traum et al., 2003)
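The three turn-allocation rules above can be sketched as a small function. This is only an illustration under simplifying assumptions: the function name `next_speaker`, its parameters, and the idea that self-selectors arrive as an ordered list (first to start speaking first) are hypothetical and not from the slides.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 self_selectors: Sequence[str]) -> str:
    """Apply the three turn-allocation rules at a single TRP.

    current        -- the participant who is speaking now
    selected       -- the participant the speaker selected, if any
    self_selectors -- others who try to take the floor, in the order
                      they start speaking (an assumption of this sketch)
    """
    # Rule 1: the speaker selected A, so A must take the floor.
    if selected is not None:
        return selected
    # Rule 2: no one was selected, so anyone else may take the turn;
    # here the first to start speaking wins.
    for candidate in self_selectors:
        if candidate != current:
            return candidate
    # Rule 3: no one else took the turn, so the speaker may continue.
    return current
```

For example, `next_speaker("Ann", None, [])` keeps the floor with Ann, while `next_speaker("Ann", "Bob", ["Cat"])` hands it to Bob regardless of who else tries to speak.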
Turn-taking in HCI
• Human turn end:
– Detected by 250ms silence
• System turn end:
– Signaled by end of speech
– Indicated by any human sound
• Barge-in
• Continued attention:
– No signal
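A minimal sketch of silence-based end-of-turn detection and barge-in, assuming the audio has already been reduced to per-frame speech/non-speech decisions. The 250 ms threshold comes from the slide; the 10 ms frame size, the function names, and the frame representation are assumptions made for illustration.

```python
from typing import List, Optional

SILENCE_THRESHOLD_MS = 250  # from the slide: human turn end = 250 ms of silence


def detect_turn_end(frames: List[bool], frame_ms: int = 10) -> Optional[int]:
    """Return the index of the frame at which the human turn is judged over.

    frames[i] is True when frame i contains speech. Returns None if the
    silence threshold is never reached after speech has started.
    """
    needed = SILENCE_THRESHOLD_MS // frame_ms  # silent frames required
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Only count silence that follows some speech.
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


def barge_in(system_speaking: bool, human_sound: bool) -> bool:
    """The system gives up its turn on any human sound while it is speaking."""
    return system_speaking and human_sound
```

A fixed threshold like this is simple but treats every mid-turn pause longer than 250 ms as a turn end, which is one reason the cue-based signals on the following slides matter.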
Gesture, Gaze & Voice
• Range of gestural signals:
– head (nod, shake), shoulder, hand, leg, foot movements; facial expressions; postures; artifacts
– Align with syllables
• Units: phonemic clause + change
• Study with recorded exchanges
Yielding the Floor
• Turn change signal
– Offer floor to auditor/hearer
• Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop + “uh”, end clause
• Likelihood of change increases with more cues
• Negated by any gesticulation
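The interaction between cues and gesticulation can be sketched as a simple count. This is an assumption-laden toy: the slides say only that likelihood increases with more cues, not how, so the function returns a raw cue count rather than a probability, and the cue identifiers are hypothetical labels for the cues listed above.

```python
from typing import Set

# Hypothetical labels for the six cues named on the slide.
TURN_YIELD_CUES = {
    "pitch_fall", "lengthening", "but_uh",
    "end_gesture", "amplitude_drop_uh", "end_clause",
}


def turn_yield_strength(observed: Set[str], gesticulating: bool) -> int:
    """Count the turn-yielding cues present; any gesticulation negates them."""
    if gesticulating:
        return 0
    return len(observed & TURN_YIELD_CUES)
```

A listener model could compare this count against a threshold, or map it to a probability estimated from recorded exchanges.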
Taking the Floor
• Speaker-state signal
– Indicate becoming speaker
• Occurs at beginning of turns
• Cues:
– Shift in head direction
• AND/OR
– Start of gesture
Retaining the Floor
• Within-turn signal
– Still speaker: Look at hearer at end of clause
• Continuation signal
– Still speaker: Look away after within-turn signal/back-channel
• Back-channel:
– “mmhm”, “okay”, etc.; nods; sentence completion; clarification request; restatement
– NOT a turn: signal attention, agreement, confusion
Segmenting Turns
• Speaker alone:
– Within-turn signal -> end of one unit
– Continuation signal -> beginning of next unit
• Joint signal:
– Speaker turn signal (end); auditor -> speaker; speaker -> auditor
– Within-turn + back-channel + continuation
• Back-channels signal understanding
– Early back-channel + continuation
Regaining Attention
• Gaze & Disfluency
– Disfluency: “perturbation” in speech
• Silent pause, filled pause, restart
– Gaze:
• Conversants don’t stare at each other constantly
• However, speaker expects to meet hearer’s gaze
– Confirm hearer’s attention
• Disfluency occurs when speaker realizes hearer is NOT attending
– Pause until hearer begins gazing, or to request attention
Collaborative Communication
• Speaker tries to establish and add to “common ground” (“mutual belief”)
– Presumed a joint, collaborative activity
• Make sure both “mutually believe” the same thing
– Hearer can acknowledge/accept/disagree
» Clark & Schaefer: Degrees of grounding
• Display, Demonstrate/Reformulate, Acknowledgement, Next relevant contribution, Continued attention
Computational Models
• (Traum et al) revised for computation
– Involves both speaker and hearer
• Initiate, Continue, Acknowledge, Repair, Request Repair, etc
– Common phenomena
• “Back-Channel”: “uh-huh”, “okay”, etc
– Allows hearer to signal continued attention, ack
» WITHOUT taking the turn
• Requests for repair: common in human-human
– Even more common in human-computer dialogue
Implicature & Grice’s Maxims
• Inferences licensed by utterances
• Grice’s Maxims
– Quantity: Be as informative as required
• “There are two classes per week”: not 1, or 5
– Quality: Be truthful; don’t lie
– Relevance: Be relevant
– Manner: “Be perspicuous”
• Don’t be obscure, ambiguous, prolix, or disorderly
• “Flouting” maxims: Consciously violate for effect
– Humor, emphasis, …
Speech & Dialogue Acts
• Speech Acts (Austin, Searle)
– “Doing things with words”
• E.g. performatives: “I dub thee Sir Lancelot”
– Illocutionary acts: act of asking, answering, promising, etc in saying an utterance
• Include: Assertives: “I propose to..” , Directives: “Stop that”, Commissives: “I promise”, Expressives: “Thank you”, Declarations: “You’re fired”
Dialogue Acts
• (aka Conversational moves)
– Enriched set of speech acts
• Capture full range of conversational functions
– Adjacency pairs: Many two-part structures
• E.g. Question-Answer, Greeting-Greeting, Request-Grant, etc.
• Paired for speaker-hearer dyads
– Contrast with rhetorical relations in monologue
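Adjacency pairs can be pictured as a lookup from a first-pair part to the second-pair part it makes relevant. The sketch below uses only the three pairs named on the slide; the lowercase labels and function names are assumptions for illustration, and real inventories allow several acceptable second parts per first part.

```python
from typing import Optional

# First-pair part -> expected second-pair part (pairs named on the slide).
ADJACENCY_PAIRS = {
    "question": "answer",
    "greeting": "greeting",
    "request": "grant",
}


def expected_response(first_part: str) -> Optional[str]:
    """Return the second-pair part made relevant by first_part, if known."""
    return ADJACENCY_PAIRS.get(first_part)


def completes_pair(first_part: str, second_part: str) -> bool:
    """True when second_part is the expected response to first_part."""
    return ADJACENCY_PAIRS.get(first_part) == second_part
```

A dialogue system can use the expectation to interpret the next user turn, for instance treating a bare "yes" after a question as an answer.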
DAMSL
• Dialogue Act Tagging framework
– Adjacency pairs + grounding + repair
• Forward looking functions
– Statement, info-request, commit, closing, etc
• Backward looking functions
– Focus on link to prior speaker utterance
• Agreement, answer, accept, etc.
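One way to picture a DAMSL-style annotation is as a record carrying forward-looking and backward-looking tags plus a link to the prior utterance. The sketch below is not the DAMSL specification: the tag sets contain only the names listed on the slide, and the class, field names, and the two-utterance dialogue are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Only the tag names listed on the slide; the full scheme has more.
FORWARD = {"statement", "info-request", "commit", "closing"}
BACKWARD = {"agreement", "answer", "accept"}


@dataclass
class TaggedUtterance:
    speaker: str
    text: str
    forward: List[str] = field(default_factory=list)
    backward: List[str] = field(default_factory=list)
    responds_to: Optional[int] = None  # index of the prior utterance linked to

    def __post_init__(self) -> None:
        # Reject tags outside the small illustrative inventories above.
        unknown = (set(self.forward) - FORWARD) | (set(self.backward) - BACKWARD)
        if unknown:
            raise ValueError(f"unknown tags: {sorted(unknown)}")


# Hypothetical two-turn exchange, invented for this example.
dialogue = [
    TaggedUtterance("A", "What time does the flight leave?",
                    forward=["info-request"]),
    TaggedUtterance("B", "It leaves at nine.",
                    forward=["statement"], backward=["answer"], responds_to=0),
]
```

Note that a single utterance can carry both kinds of function: B's reply is a statement looking forward and an answer looking backward.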
Tagged Dialogue
Dialogue Act Recognition
• Goal: Identify dialogue act tag(s) from surface form
• Challenge: Surface form can be ambiguous
– “Can you X?”: yes/no question, or info-request
• “Flying on the 11th, at what time?”: check, statement
• Requires interpretation by hearer
– Strategies: Plan inference, cue recognition
Plan-inference-based
• Classic AI (BDI) planning framework
– Model Belief, Knowledge, Desire
• Formal definition with predicate calculus
– Axiomatization of plans and actions as well
– STRIPS-style: Preconditions, Effects, Body
– Rules for plan inference
• Elegant, but...
– Labor-intensive rule, KB, heuristic development
– Effectively AI-complete
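A STRIPS-style operator with preconditions, effects, and a body can be sketched as a small data structure over sets of propositions. This is a propositional simplification of the predicate-calculus formulation the slide refers to; the class, the `book_flight` operator, and all proposition names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    delete_effects: FrozenSet[str] = frozenset()
    body: Tuple[str, ...] = ()  # names of sub-actions that realize the operator

    def applicable(self, state: FrozenSet[str]) -> bool:
        """The operator applies when every precondition holds in the state."""
        return self.preconditions <= state

    def apply(self, state: FrozenSet[str]) -> FrozenSet[str]:
        """Return the successor state: remove delete effects, then add effects."""
        if not self.applicable(state):
            raise ValueError(f"{self.name}: preconditions not satisfied")
        return (state - self.delete_effects) | self.add_effects


# Hypothetical operator, invented for illustration.
book_flight = Operator(
    name="book_flight",
    preconditions=frozenset({"knows_date", "knows_destination"}),
    add_effects=frozenset({"flight_booked"}),
    body=("ask_confirmation", "issue_ticket"),
)
```

Plan inference runs this kind of reasoning in reverse: on observing an utterance that establishes `knows_date`, a hearer can hypothesize that the speaker is working toward an operator that has it as a precondition.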
Cue-based Interpretation
• Employs sets of features to identify
– Words and collocations: Please -> request
– Prosody: Rising pitch -> yes/no question
– Conversational structure: prior act
• Example: Check:
• Syntax: tag question “, right?”
• Syntax + prosody: Fragment with rise
• N-gram: $\arg\max_d P(d)\,P(W \mid d)$
– So you, sounds like, etc
• Details later ...
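The N-gram cue model picks the dialogue act d that maximizes P(d) P(W|d). A minimal sketch follows, assuming a unigram model with add-one smoothing as a stand-in for whatever N-gram order and smoothing the course uses later; the function names and the six training utterances are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


def train(examples: List[Tuple[str, str]]):
    """Collect act counts and per-act word counts from (text, act) pairs."""
    act_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, act in examples:
        act_counts[act] += 1
        for word in text.lower().split():
            word_counts[act][word] += 1
            vocab.add(word)
    return act_counts, word_counts, vocab


def classify(text: str, model) -> str:
    """Return argmax_d P(d) P(W|d), computed in log space."""
    act_counts, word_counts, vocab = model
    total = sum(act_counts.values())
    best_act, best_score = None, -math.inf
    for act in sorted(act_counts):
        score = math.log(act_counts[act] / total)  # log P(d)
        denom = sum(word_counts[act].values()) + len(vocab)
        for word in text.lower().split():
            # Add-one smoothed unigram estimate of P(word | d).
            score += math.log((word_counts[act][word] + 1) / denom)
        if score > best_score:
            best_act, best_score = act, score
    return best_act


# Hypothetical training data, invented for this example.
EXAMPLES = [
    ("so you are flying on the 11th", "check"),
    ("sounds like you want a morning flight", "check"),
    ("please book the flight", "request"),
    ("please send me the itinerary", "request"),
    ("i want to fly to boston", "statement"),
    ("the flight leaves at nine", "statement"),
]
model = train(EXAMPLES)
```

With this toy data, "please" pulls an utterance toward request and "so you" toward check, mirroring the cue words on the slide. Prosodic and structural features would enter as additional terms in the same product.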
From Human to Computer
• Conversational agents
– Systems that (try to) participate in dialogues
– Examples: Directory assistance, travel info, weather, restaurant and navigation info
• Issues:
– Limited understanding: ASR errors, interpretation
– Computational costs:
• broader coverage -> slower, less accurate
Dialogue Manager Tradeoffs
• Flexibility vs Simplicity/Predictability
– System vs User vs Mixed Initiative
– Order of dialogue interaction
– Conversational “naturalness” vs Accuracy
– Cost of model construction, generalization, learning, etc
• Models: FST, Frame-based, HMM, BDI
• Evaluation frameworks
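As one concrete point in this design space, a frame-based manager can be sketched as a set of slots with a prompt for each unfilled one. The slot names, prompts, and class below are assumptions for a hypothetical travel domain, and the sketch leaves out understanding, confirmation, and error handling.

```python
from typing import Dict, Optional

# Hypothetical slots and prompts for a travel-information task.
PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day do you want to travel?",
}


class FrameManager:
    """Ask for each unfilled slot in order; accept any slots the user offers."""

    def __init__(self, prompts: Dict[str, str]) -> None:
        self.prompts = dict(prompts)
        self.frame: Dict[str, Optional[str]] = {slot: None for slot in prompts}

    def update(self, parsed: Dict[str, str]) -> None:
        """Fill slots from an already-parsed user turn; ignore unknown slots."""
        for slot, value in parsed.items():
            if slot in self.frame:
                self.frame[slot] = value

    def next_prompt(self) -> Optional[str]:
        """Return the prompt for the first empty slot, or None when complete."""
        for slot, prompt in self.prompts.items():
            if self.frame[slot] is None:
                return prompt
        return None
```

Because `update` accepts several slots at once and in any order, the user can volunteer information the system has not asked for yet, which gives some mixed initiative. A finite-state manager that accepts only the answer to its current question is simpler and more predictable, but less flexible.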