Challenges in Dialogue Discourse and Dialogue CMSC 35900-1 October 27, 2006

Roadmap

• Issues in Dialogue
  – Dialogue vs General Discourse
  – Dialogue Acts
    • Modeling
    • Recognition and Interpretation
  – Dialogue Management for Computational Agents

Dialogue vs General Discourse

• Key contrast: Two or more speakers
  – Primary focus on speech
• Issues in multi-party spoken dialogue
  – Turn-taking: who speaks next, when?
  – Collaboration: clarification, feedback, …
  – Disfluencies
  – Adjacency pairs, dialogue acts

Turn-Taking

• Multi-party discourse
  – Need to trade off speaker/hearer roles
    • Interpret reference from sequential utterances
• When?
  – End of sentence?
    • No: multi-utterance turns
  – Silence?
    • No: little silence in smooth dialogue: < 250 ms
  – When other starts speaking?
    • No: relatively little overlap face-to-face: ~5%

Turn-taking: When

• Rule-governed behavior
  – Possibly multiple legal turn change times
    • Aka transition-relevance places (TRP)
• Generally at utterance boundaries
  – Utterance not necessarily sentence
  – In fact, utterance/sentence boundaries not obvious in speech
    » Don’t necessarily pause between sentences
• Automatic utterance boundary detection
  – Cue words (okay, so, ...); POS sequences; prosody

Turn-taking: Who & How

• At each TRP in each turn (Sacks et al., 1974)
  – If speaker has selected A to speak, A must take floor
  – If speaker has selected no one to speak, anyone can
  – If no one else takes the turn, the speaker can
• Selecting speaker A:
  – By explicit/implicit mention: What about it, Bob?
  – By gaze, function
• Selecting others: questions, greetings, closing
  – (Traum et al., 2003)
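The three allocation rules above can be sketched as a small function. This is only an illustration, not a model from the lecture: the name `next_speaker` and the "first self-selector wins" tie-break are assumptions.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 self_selectors: Sequence[str]) -> str:
    """Apply the three turn-allocation rules at a single TRP.

    current        -- the participant who holds the floor
    selected       -- participant chosen by the current speaker, or None
    self_selectors -- others who try to speak, in the order they start
                      (assumption: the earliest starter gets the turn)
    """
    # Rule 1: if the speaker selected A, A must take the floor.
    if selected is not None:
        return selected
    # Rule 2: if no one was selected, anyone else may take the turn.
    for participant in self_selectors:
        if participant != current:
            return participant
    # Rule 3: if no one else takes the turn, the speaker may continue.
    return current
```

For example, `next_speaker("Ann", None, [])` keeps the floor with Ann, since nobody was selected and nobody self-selected.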

Turn-taking in HCI

• Human turn end:
  – Detected by 250 ms silence
• System turn end:
  – Signaled by end of speech
  – Indicated by any human sound
    • Barge-in
• Continued attention:
  – No signal
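A silence-based end-of-turn detector of the kind described above might look roughly like this. It is a minimal sketch: the per-frame energy input, the 10 ms frame size, and the 0.1 energy threshold are assumptions for illustration; only the 250 ms silence window comes from the slide.

```python
from typing import Optional, Sequence


def find_turn_end(energies: Sequence[float],
                  frame_ms: int = 10,
                  silence_threshold: float = 0.1,
                  min_silence_ms: int = 250) -> Optional[int]:
    """Return the index of the frame where the turn is declared over.

    The turn ends once min_silence_ms of consecutive low-energy frames
    has followed some speech. Returns None if that never happens.
    """
    # Number of consecutive silent frames required (rounded up).
    needed = -(-min_silence_ms // frame_ms)
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(energies):
        if energy >= silence_threshold:
            heard_speech = True
            silent_run = 0          # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None
```

Leading silence is ignored, so the detector only fires after the user has actually said something. A short mid-turn pause (under 250 ms) does not end the turn.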

Gesture, Gaze & Voice

• Range of gestural signals:
  – head (nod, shake), shoulder, hand, leg, foot movements; facial expressions; postures; artifacts
  – Align with syllables
    • Units: phonemic clause + change
• Study with recorded exchanges

Yielding the Floor

• Turn change signal
  – Offer floor to auditor/hearer
  – Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop + “uh”, end clause
  – Likelihood of change increases with more cues
  – Negated by any gesticulation
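The relationship between cues and turn change can be illustrated with a simple counter. This is a sketch under assumptions: the cue identifiers and the function name `yield_cue_count` are hypothetical, and the count is only a stand-in for "likelihood increases with more cues", not a fitted model.

```python
from typing import Iterable

# Hypothetical identifiers for the six turn-yielding cues listed above.
YIELD_CUES = frozenset({
    "pitch_fall",
    "lengthening",
    "but_uh",
    "end_gesture",
    "amplitude_drop_uh",
    "end_clause",
})


def yield_cue_count(observed: Iterable[str], gesticulating: bool) -> int:
    """Count the distinct turn-yielding cues currently displayed.

    A higher count means a turn change is more likely. Any ongoing
    gesticulation negates the signal, so the count is forced to zero.
    """
    if gesticulating:
        return 0
    return len(YIELD_CUES & set(observed))
```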

Taking the Floor

• Speaker-state signal
  – Indicate becoming speaker
  – Occurs at beginning of turns
  – Cues:
    • Shift in head direction
    • AND/OR
    • Start of gesture

Retaining the Floor

• Within-turn signal
  – Still speaker: Look at hearer at end of clause
• Continuation signal
  – Still speaker: Look away after within-turn signal/back-channel
• Back-channel:
  – ‘mmhm’/okay/etc; nods; sentence completion; clarification request; restatement
  – NOT a turn: signal attention, agreement, confusion

Segmenting Turns

• Speaker alone:
  – Within-turn signal -> end of one unit
  – Continuation signal -> beginning of next unit
• Joint signal:
  – Speaker turn signal (end); auditor -> speaker; speaker -> auditor
  – Within-turn + back-channel + continuation
    • Back-channels signal understanding
  – Early back-channel + continuation

Regaining Attention

• Gaze & Disfluency
  – Disfluency: “perturbation” in speech
    • Silent pause, filled pause, restart
  – Gaze:
    • Conversants don’t stare at each other constantly
    • However, speaker expects to meet hearer’s gaze
      – Confirm hearer’s attention
• Disfluency occurs when speaker realizes hearer is NOT attending
  – Pause until hearer begins gazing, or to request attention

Collaborative Communication

• Speaker tries to establish and add to “common ground” (“mutual belief”)
  – Presumed a joint, collaborative activity
    • Make sure both “mutually believe” the same thing
  – Hearer can acknowledge/accept/disagree
    » Clark & Schaefer: Degrees of grounding
      • Display, Demonstrate/Reformulate, Acknowledgement, Next relevant contribution, Continued attention

Computational Models

• (Traum et al.) revised for computation
  – Involves both speaker and hearer
    • Initiate, Continue, Acknowledge, Repair, Request Repair, etc.
  – Common phenomena
    • “Back-Channel”: “uh-huh”, “okay”, etc.
      – Allows hearer to signal continued attention, acknowledgement
        » WITHOUT taking the turn
    • Requests for repair: common in human-human
      – Even more common in human-computer dialogue

Implicature & Grice’s Maxims

• Inferences licensed by utterances
• Grice’s Maxims
  – Quantity: Be as informative as required
    • “There are two classes per week”: not 1, or 5
  – Quality: Be truthful; don’t lie
  – Relevance: Be relevant
  – Manner: “Be perspicuous”
    • Don’t be obscure, ambiguous, prolix, or disorderly
• “Flouting” maxims: Consciously violate for effect
  – Humor, emphasis, …

Speech & Dialogue Acts

• Speech Acts (Austin, Searle)
  – “Doing things with words”
    • E.g. performatives: “I dub thee Sir Lancelot”
  – Illocutionary acts: act of asking, answering, promising, etc. in saying an utterance
    • Include:
      – Assertives: “I propose to..”
      – Directives: “Stop that”
      – Commissives: “I promise”
      – Expressives: “Thank you”
      – Declarations: “You’re fired”

Dialogue Acts

• (aka Conversational moves)
  – Enriched set of speech acts
    • Capture full range of conversational functions
  – Adjacency pairs: Many two-part structures
    • E.g. Question-Answer, Greeting-Greeting, Request-Grant, etc.
    • Paired for speaker-hearer dyads
      – Contrast with rhetorical relations in monologue

DAMSL

• Dialogue Act Tagging framework
  – Adjacency pairs + grounding + repair
• Forward looking functions
  – Statement, info-request, commit, closing, etc.
• Backward looking functions
  – Focus on link to prior speaker utterance
    • Agreement, answer, accept, etc.

Tagged Dialogue

Dialogue Act Recognition

• Goal: Identify dialogue act tag(s) from surface form

• Challenge: Surface form can be ambiguous
  – “Can you X?”: yes/no question, or info-request
  – “Flying on the 11th, at what time?”: check, statement
• Requires interpretation by hearer
  – Strategies: Plan inference, cue recognition

Plan-inference-based

• Classic AI (BDI) planning framework
  – Model Belief, Knowledge, Desire
• Formal definition with predicate calculus
  – Axiomatization of plans and actions as well
  – STRIPS-style: Preconditions, Effects, Body
  – Rules for plan inference
• Elegant, but..
  – Labor-intensive rule, KB, heuristic development
  – Effectively AI-complete
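A STRIPS-style operator with preconditions, effects, and a body can be sketched as a small data structure. This is a propositional simplification for illustration, not the predicate-calculus axiomatization the slide refers to. The `Operator` class, the `INFORM` example, its fact names, and the single "effect" inference rule in `infer_goals` are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class Operator:
    """A STRIPS-style action: preconditions, effects, and a body."""
    name: str
    preconditions: FrozenSet[str]
    effects: FrozenSet[str]
    body: Tuple[str, ...] = ()   # sub-actions that realize the operator

    def applicable(self, state: FrozenSet[str]) -> bool:
        # The action can run only if every precondition holds.
        return self.preconditions <= state

    def apply(self, state: FrozenSet[str]) -> FrozenSet[str]:
        if not self.applicable(state):
            raise ValueError(f"{self.name}: preconditions not satisfied")
        # Simplification: effects are only added, never deleted.
        return state | self.effects


def infer_goals(observed: str, operators: Iterable[Operator]) -> Set[str]:
    """One simplified plan-inference rule: an agent observed performing
    an action plausibly wants that action's effects."""
    goals: Set[str] = set()
    for op in operators:
        if op.name == observed:
            goals |= op.effects
    return goals


# Hypothetical example operator with made-up fact names.
INFORM = Operator(
    name="inform",
    preconditions=frozenset({"speaker_believes_p"}),
    effects=frozenset({"hearer_believes_p"}),
    body=("utter_p",),
)
```

Even this toy version hints at the cost noted above: every act, precondition, and inference rule has to be written by hand.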

Cue-based Interpretation

• Employs sets of features to identify
  – Words and collocations: Please -> request
  – Prosody: Rising pitch -> yes/no question
  – Conversational structure: prior act
• Example: Check:
  – Syntax: tag question “, right?”
  – Syntax + prosody: Fragment with rise
  – N-gram: argmax_d P(d) P(W|d)
    • So you, sounds like, etc.
• Details later …
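The N-gram rule argmax_d P(d) P(W|d) can be illustrated with a toy classifier. This sketch assumes the simplest case, unigrams (N = 1) with add-one smoothing. The class name, the training utterances, and their labels are invented for illustration and are not data from the lecture.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


class UnigramActClassifier:
    """Pick the dialogue act d that maximizes P(d) * P(W | d)."""

    def __init__(self) -> None:
        self.act_counts: Counter = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples: Iterable[Tuple[List[str], str]]) -> None:
        for words, act in examples:
            self.act_counts[act] += 1
            self.word_counts[act].update(words)
            self.vocab.update(words)

    def log_score(self, words: List[str], act: str) -> float:
        # log P(d): relative frequency of the act in training data.
        score = math.log(self.act_counts[act] / sum(self.act_counts.values()))
        # log P(W | d): product of smoothed unigram probabilities.
        # The extra +1 reserves probability mass for unseen words.
        denom = sum(self.word_counts[act].values()) + len(self.vocab) + 1
        for word in words:
            score += math.log((self.word_counts[act][word] + 1) / denom)
        return score

    def classify(self, words: List[str]) -> str:
        return max(self.act_counts, key=lambda act: self.log_score(words, act))


# Invented toy training set: (tokenized utterance, dialogue act).
TOY_DATA = [
    ("so you want the early flight".split(), "check"),
    ("sounds like you are leaving monday".split(), "check"),
    ("i want to fly to boston".split(), "statement"),
    ("the flight leaves at noon".split(), "statement"),
]

classifier = UnigramActClassifier()
classifier.train(TOY_DATA)
```

With this toy data, an utterance that starts with "so you" scores higher as a check than as a statement, because those words occur only in the check examples. Log probabilities are used to avoid underflow on longer utterances.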

From Human to Computer

• Conversational agents
  – Systems that (try to) participate in dialogues
  – Examples: Directory assistance, travel info, weather, restaurant and navigation info
• Issues:
  – Limited understanding: ASR errors, interpretation
  – Computational costs:
    • broader coverage -> slower, less accurate

Dialogue Manager Tradeoffs

• Flexibility vs Simplicity/Predictability
  – System vs User vs Mixed Initiative
  – Order of dialogue interaction
  – Conversational “naturalness” vs Accuracy
  – Cost of model construction, generalization, learning, etc.
• Models: FST, Frame-based, HMM, BDI
• Evaluation frameworks
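Of the models listed, the frame-based one is the easiest to sketch. The example below is a minimal illustration and assumes the user's utterance has already been parsed into slot/value pairs. The class name, slot names, and prompts are hypothetical.

```python
from typing import Dict, Optional


class FrameDialogueManager:
    """Frame-based manager: keep asking until every slot is filled."""

    def __init__(self, prompts: Dict[str, str]) -> None:
        # One prompt per slot; slots are asked in insertion order.
        self.prompts = dict(prompts)
        self.frame: Dict[str, Optional[str]] = {slot: None for slot in prompts}

    def update(self, parsed: Dict[str, str]) -> None:
        """Fill any known slots mentioned in the user's turn.

        A single turn may fill several slots, so the user is not
        forced to follow the system's question order.
        """
        for slot, value in parsed.items():
            if slot in self.frame:
                self.frame[slot] = value

    def next_prompt(self) -> Optional[str]:
        """Return the question for the first empty slot, or None if done."""
        for slot, value in self.frame.items():
            if value is None:
                return self.prompts[slot]
        return None


# Hypothetical travel-information frame.
TRAVEL_PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day do you want to travel?",
}
```

A finite-state manager would instead fix the question order in its transitions. Here the order only matters when slots are still empty, which trades some predictability for flexibility.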
