Error Detection in Human-Machine Interaction Dan Bohus DoD Group, Oct 2002

Errors in Spoken-Language Interfaces

Speech recognition is problematic:
  Input signal quality
  Accents, non-native speakers
  Spoken language disfluencies: stutters, false starts, /mm/, /um/
Typical word error rates in SDS: 10-30%
Systems today lack the ability to gracefully recover from errors
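The word error rate quoted above is conventionally the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. A minimal Python sketch, assuming whitespace tokenization and case-insensitive matching; the function name `word_error_rate` is illustrative and does not come from the talk:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution in a two-word reference.
print(word_error_rate("hamilton ontario", "hilton ontario"))  # 0.5
```

Because insertions count as errors, the rate can exceed 100% when the recognizer outputs more words than were spoken.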

An example

S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]

Pathway to a solution

Make systems aware of unreliability in their inputs
  Confidence scores
Develop a model which learns to optimally choose between several prevention/repair strategies
  Identify strategies
  Express them in a computable manner
  Develop the model
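One simple way to express such strategies in a computable manner is to map a recognition confidence score to an action through thresholds. The sketch below is only an illustration. The strategy names, the [0, 1] confidence range and the threshold values are all assumptions, and the thresholds are hand-set here, whereas the goal stated above is a model that learns the choice.

```python
from enum import Enum


class Strategy(Enum):
    # Hypothetical strategy inventory, chosen for illustration only.
    ACCEPT = "accept without confirmation"
    IMPLICIT_CONFIRM = "implicit confirmation"
    EXPLICIT_CONFIRM = "explicit confirmation"
    REJECT = "ask the user to repeat"


def choose_strategy(confidence: float,
                    accept_at: float = 0.9,
                    implicit_at: float = 0.6,
                    explicit_at: float = 0.3) -> Strategy:
    """Pick a strategy from a confidence score in [0, 1].

    The default thresholds are placeholders, not tuned or learned values.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence >= accept_at:
        return Strategy.ACCEPT
    if confidence >= implicit_at:
        return Strategy.IMPLICIT_CONFIRM
    if confidence >= explicit_at:
        return Strategy.EXPLICIT_CONFIRM
    return Strategy.REJECT


print(choose_strategy(0.7).value)
```

A learned model would replace the fixed thresholds with a decision that depends on the dialogue state and on the expected cost of each strategy.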

Papers

Error Detection in Spoken Human-Machine Interaction
  [E. Krahmer, M. Swerts, M. Theune, M. Weegels]

Problem Spotting in Human-Machine Interaction
  [E. Krahmer, M. Swerts, M. Theune, M. Weegels]

The Dual of Denial: Disconfirmations in Dialogue and Their Prosodic Correlates
  [E. Krahmer, M. Swerts, M. Theune, M. Weegels]

Goals

[Let’s look at dialog on page 2]

(1) Analysis of the positive and negative cues we use in response to implicit and explicit verification questions

(2) Explore the possibilities of spotting errors on-line

Explicit vs. Implicit

Explicit
  Presumably easier for the system to verify
    But there’s evidence that it’s not as easy …
  Leads to more turns, less efficiency, frustration
Implicit
  Efficiency
  But induces a higher cognitive burden which can result in more confusion
  Systems don’t deal very well with it…

Clark & Schaefer Grounding model

Presentation phase
Acceptance phase
  Various indicators
    Go ON / YES
    Go BACK / NO
Can we detect them reliably (when following implicit and explicit verification questions)?

Positive and Negative Cues

Positive              Negative
Short turns           Long turns
Unmarked word order   Marked word order
Confirm               Disconfirm
Answer                No answer
No corrections        Corrections
No repetitions        Repetitions
New info              No new info
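Some of these cues can be read directly off a transcribed user turn. The following is a rough sketch, assuming whitespace tokenization and English stand-ins for the confirmation and disconfirmation markers (the corpus discussed in this talk is Dutch). The word lists and the `TurnCues` and `extract_cues` names are hypothetical. Cues that need dialogue context (corrections, repetitions, new info) or syntactic analysis (marked word order) are deliberately left out.

```python
from dataclasses import dataclass

# Hypothetical English marker lists; the study itself used Dutch dialogs.
CONFIRM_WORDS = {"yes", "yeah", "right", "correct"}
DISCONFIRM_WORDS = {"no", "nope", "not", "wrong"}


@dataclass
class TurnCues:
    n_words: int          # turn length (short vs. long turns)
    is_empty: bool        # empty turn, one reading of "no answer"
    has_confirm: bool     # contains a confirmation marker
    has_disconfirm: bool  # contains a disconfirmation marker


def extract_cues(turn: str) -> TurnCues:
    """Compute surface cues from one transcribed user turn."""
    words = [w.strip(".,?!").lower() for w in turn.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    return TurnCues(
        n_words=len(words),
        is_empty=not words,
        has_confirm=any(w in CONFIRM_WORDS for w in words),
        has_disconfirm=any(w in DISCONFIRM_WORDS for w in words),
    )


print(extract_cues("No, I'm not."))
```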

Experimental Setup / Data

120 dialogs: Dutch SDS providing train timetable information
487 utterances
  44 (~10%) not used
    Users accepting a wrong result
    Barge-in
    Users starting their own contribution
  Left 443 resulting adjacent S/U utterances

Results – Number of words

           No problems   Problems
Explicit   1.68          3.44
Implicit   3.21          7.12

Results – Empty turns (%)

           No problems   Problems
Explicit   0%            2.6%
Implicit   3.4%          10.3%

Results – Marked word order (%)

           No problems   Problems
Explicit   3.3%          4.4%
Implicit   1.2%          26.9%

Results – Yes/No

                   No problems   Problems
Explicit   Yes     92.8%         6.1%
           No      0%            56.6%
           Other   7.1%          37.1%
Implicit   Yes     0%            0%
           No      0%            15.4%
           Other   100% ?        84.6%

Results – Repeated/Corrected/New

                       No problems   Problems
Explicit   Repeated    8.5%          23.9%
           Corrected   0%            72.6%
           New         11.4%         12.4%
Implicit   Repeated    2.4%          61.0%
           Corrected   0%            92.3%
           New         53.6%         36.5%

First conclusion

People use more negative cues when there are problems

And even more so for implicit confirmations (vs. explicit ones)

How well can you classify?

Using individual features
  Look at precision/recall
  Explicit: absence of confirmation
  Implicit: non-zero number of corrections
Multiple features
  Used memory-based learning
  97% accuracy (majority baseline 68%)
  Confirm + Correct is winning, although individually less good
  This is overall, right? How about for explicit vs. implicit?
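Memory-based learning stores all training instances and labels a new instance by its nearest stored neighbours. The sketch below is a generic k-nearest-neighbour classifier with an overlap distance (the number of mismatching feature values). It is not the toolkit or configuration used in the paper, and the feature tuples and labels in `TOY_MEMORY` are made-up toy values, not data from the study.

```python
from collections import Counter


def overlap_distance(a, b):
    """Number of positions at which two feature tuples differ."""
    if len(a) != len(b):
        raise ValueError("feature tuples must have the same length")
    return sum(x != y for x, y in zip(a, b))


def knn_classify(memory, instance, k=1):
    """Label `instance` by majority vote among its k nearest stored examples.

    `memory` is a list of (features, label) pairs.
    """
    if not memory:
        raise ValueError("memory must contain at least one example")
    ranked = sorted(memory, key=lambda item: overlap_distance(item[0], instance))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy memory (hypothetical): (confirmation, correction, turn length) -> label
TOY_MEMORY = [
    (("confirm", "no_correction", "short"), "no_problem"),
    (("no_confirm", "correction", "long"), "problem"),
    (("no_confirm", "correction", "short"), "problem"),
    (("confirm", "no_correction", "long"), "no_problem"),
]

print(knn_classify(TOY_MEMORY, ("no_confirm", "correction", "long"), k=1))
```

The majority baseline mentioned above corresponds to always predicting the most frequent label, so a classifier is only informative to the extent that it beats that figure.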

BUT !!! How many of these features are available on-line?

Positive              Negative
Short turns           Long turns
Unmarked word order   Marked word order
Confirm               Disconfirm
Answer                No answer
No corrections ?      Corrections ?
No repetitions ?      Repetitions ?
New info ?            No new info ?

What else can we throw at it ?

Prosody (next paper)
Lexical information
Acoustic confidence scores
  Maybe also of previous utterances
Repetitions/Corrections/New info on transcript?
…

Papers

Error Detection in Spoken Human-Machine Interaction
  [E. Krahmer, M. Swerts, M. Theune, M. Weegels]

Problem Spotting in Human-Machine Interaction
  [E. Krahmer, M. Swerts, M. Theune, M. Weegels]

The Dual of Denial: Disconfirmations in Dialogue and Their Prosodic Correlates
  [E. Krahmer, M. Swerts, M. Theune, M. Weegels]

Goals

Investigate the prosodic correlates of disconfirmations
  Is this slightly different than before? (i.e. now looking at any corrections? Answer: No)
Looked at prosody on “NO” as a go_on vs a go_back:
  Do you want to fly from Pittsburgh?
  Shall I summarize your trip?

Human-human

Higher pitch range, longer duration
Preceded by a longer delay
High H% boundary tone
Expected to see same behavior for disconfirmation in human-machine

Prosodic correlates

Features        POSITIVE (‘go on’)   NEGATIVE (‘go back’)
Boundary tone   Low                  High
Duration        Short                Long
Delay           Short                Long
Pause           Short                Long
Pitch range     Low                  High

Yes, the correlations are there as expected

Perceptual analysis

Took 40 “No” from No+stuff, 20 go_on and 20 go_back (note that some features are lost this way…)
Forced choice randomized task, w/ no feedback; 25 native speakers of Dutch
Results
  17 go_on correctly identified above chance
  15 go_back correctly identified above chance; but also 1 incorrectly identified above chance.
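The slide does not say which test defines "above chance". One common choice for a two-alternative forced-choice task is a one-sided binomial test against p = 0.5. Under that assumption, and with an assumed significance level of 0.05, the sketch below computes how many of the 25 listeners must give the same answer for a stimulus to count as identified above chance. The function names are illustrative.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_votes_above_chance(n: int, alpha: float = 0.05, p: float = 0.5):
    """Smallest k with P(X >= k) <= alpha, or None if no such k exists."""
    for k in range(n + 1):
        if binomial_tail(k, n, p) <= alpha:
            return k
    return None


# With 25 listeners and two answer options:
print(min_votes_above_chance(25))  # 18
```

Under these assumptions, 17 matching answers out of 25 give a tail probability of about 0.054 and 18 give about 0.022, so 18 is the smallest count that clears the 0.05 level.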

Discussion

Q1: Blurred relationships …
  Confidence annotation
  Go_on / Go_back signal
    Is that the same as corrections?
    Is that the most general case for responses to implicit/explicit verifications, or should we have a separate detector?
Q2: What other features could we throw at these problems? What are the “most juicy” ones?

Discussion

Q3: For implicit confirms, are these different in terms of induced response behavior?
  When do you want to leave Pittsburgh?
  Travelling from Pittsburgh … when do you want to leave?
  When do you want to leave from Pittsburgh to Boston?