An Investigation into Recovering from Non-understanding Errors Dan Bohus Dialogs on Dialogs Reading Group Talk Carnegie Mellon University, October 2004


Non-understandings

S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]

The system knows there was a user turn, but either:
- there is no relevant semantic information in the input, or
- the confidence is too low to trust any semantic information in the input

Non-understandings occur in 10–30% of turns in a mixed-initiative system

GOAL: Do a better job of recovering from non-understandings
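A minimal Python sketch of the detection criterion above, assuming a hypothetical `Turn` record that holds only the concepts relevant to the current dialogue state plus a parse confidence. The `REJECTION_THRESHOLD` value is also an assumption; the talk does not give one.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    """A user turn as seen by the dialogue manager (hypothetical structure)."""
    concepts: dict = field(default_factory=dict)  # relevant concept name -> value
    confidence: float = 0.0                       # confidence score of the parse

REJECTION_THRESHOLD = 0.3  # hypothetical value, not taken from the talk

def is_non_understanding(turn: Turn, threshold: float = REJECTION_THRESHOLD) -> bool:
    """The system knows a turn happened, but either nothing relevant was
    extracted, or the confidence is too low to trust what was extracted."""
    if not turn.concepts:
        return True
    return turn.confidence < threshold
```

For the Urbana Champaign example, the recognized string yields no relevant concept, so `is_non_understanding(Turn({}, 0.9))` is `True` regardless of confidence.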


Recovery Ingredients

Detection
Set of strategies (actions)
Policy (method for choosing between actions)


Recovery Ingredients – Non-understandings

Detection: generally, the system knows when a non-understanding has happened

Set of strategies (actions): notify the user of the non-understanding, repeat the question, ask the user to repeat/rephrase, provide help, etc.

Policy (method for choosing between actions): traditionally a fixed heuristic
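To make "fixed heuristic" concrete, here is a hedged Python sketch of what such a policy might look like: it escalates through a hand-written sequence of strategies as consecutive non-understandings accumulate. The strategy codes are the abbreviations used later in the talk; the particular escalation order in `FIXED_SEQUENCE` is purely hypothetical.

```python
# Hypothetical escalation order; the codes are the talk's strategy abbreviations.
FIXED_SEQUENCE = ["AREP", "YCS", "HELP", "FAIL"]

def fixed_heuristic_policy(consecutive_nonu: int) -> str:
    """Pick a recovery strategy from the number of consecutive
    non-understandings (1-based), always in the same fixed order."""
    if consecutive_nonu < 1:
        raise ValueError("call only after at least one non-understanding")
    # After the sequence is exhausted, keep using its last entry.
    index = min(consecutive_nonu, len(FIXED_SEQUENCE)) - 1
    return FIXED_SEQUENCE[index]
```

The point of the sketch is what it ignores: the choice depends on nothing but a counter, which is exactly what a "smarter" or data-driven policy would try to improve on.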


Issues under Investigation

Detection
- Analysis of error types, blame assignment, impact on task performance
- Detection of error type
- Adaptation of the rejection threshold

Set of strategies
- Investigate individual strategy performance
- Identify potential new strategies

Policy
- Impact of a “smarter” policy on performance
- Building a policy from data


Experimental Design - Overview

Subjects interact over the telephone with RoomLine and perform a number of scenario-based tasks

Between-subjects experiment:
- Control: the system uses a random (uniform) policy for engaging the non-understanding recovery strategies
- Wizard: the policy is determined at runtime by a human (wizard)

46 subjects, balanced by gender x native/non-native speaker status
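The two conditions differ only in how a strategy is picked after a non-understanding. A minimal Python sketch, assuming policies are functions from a dialogue-state dictionary to one of the ten strategy codes listed on the strategy slide; `ask_wizard` is a hypothetical stand-in for whatever interface showed the wizard the audio and system state.

```python
import random
from typing import Callable, Optional, Sequence

# The ten recovery strategies, by the abbreviations used in the talk.
STRATEGIES = ["FAIL", "YCS", "TYCS", "HELP", "AREP", "ARPH",
              "NTFY", "YLD", "REPP", "EXPL"]

Policy = Callable[[dict], str]

def make_uniform_policy(strategies: Sequence[str],
                        seed: Optional[int] = None) -> Policy:
    """Control condition: every strategy is equally likely, whatever the state."""
    rng = random.Random(seed)
    options = list(strategies)

    def policy(state: dict) -> str:
        return rng.choice(options)
    return policy

def make_wizard_policy(ask_wizard: Callable[[dict], str]) -> Policy:
    """Wizard condition: a human picks the strategy at runtime.
    ask_wizard is a hypothetical callback, not part of RoomLine."""
    def policy(state: dict) -> str:
        choice = ask_wizard(state)
        if choice not in STRATEGIES:
            raise ValueError(f"unknown strategy {choice!r}")
        return choice
    return policy
```

Keeping both conditions behind the same `Policy` signature means the rest of the dialogue manager is identical across conditions, so differences in outcome can be attributed to the policy alone.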


Strategy groups: MOVE-ON, HELP, SIGNAL

Non-understanding Strategies
S: For when do you need the room?
U: [non-understanding]
FAIL: Sorry, I didn’t catch that. Tell me for what day you need the room
YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
ASK REPEAT (AREP): Could you please repeat that?
ASK REPHRASE (ARPH): Could you please try to rephrase that?
NOTIFY (NTFY): Sorry, I don’t think I understood you correctly…
YIELD TURN (YLD): …
REPEAT SYSTEM PROMPT (REPP): For when do you need the conference room?
EXPLAIN MORE (EXPL): Right now I need to know the date and time for when you need the reservation …

Verb.: V, T, A, T, T, T, T, A, T
Prompt.: Y, N, Y, N, N, N, N, Y, Y


Experimental Design: Scenarios

Presented graphically (explained during briefing)


Corpus Statistics / Characteristics

46 users; 484 sessions; ~9000 turns
Transcribed
Annotated with:
- Misunderstandings & deletions
- Non-understandings
- Concept transfer accuracy
- Transcript grammaticality labels: OK, OOR, OOG, OOS, OOD, VOID
- Correct concept values in each turn (ongoing)


Back to the Issues






Impact of Policy on Performance

General picture: significant improvements for non-natives, especially after non-understandings

Global
- Task success: significant improvements (x1.77) for non-natives
- SASSI scores: nothing detectable

Local
- WER: significant improvements across the board
- Understanding error metrics (CT, CER, NONU, MIS): significant improvement for non-natives

Recovery
- Nothing detectable (?); faster on the wizard side


Impact of Policy on Performance

… Weird stuff

Conclusion?






Back to the Issues


Impact on task performance

Models for predicting task success from various types of errors [show in Matlab]

Can shed more light on:
- Effect of the policy
- Native / non-native differences
- Costs of various types of errors

Currently analyzing it. Issues:
- Building (state-)conditioned cost models
- Robustness
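The talk does not say which model was fit in Matlab. As one hedged illustration, a logistic regression from per-session error counts to task success lets the fitted weights be read as relative "costs" of each error type. The Python sketch below uses plain batch gradient descent; the two features (non-understandings, misunderstandings) and the toy data are hypothetical and are not drawn from the RoomLine corpus.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """P(task success | error counts x) under weights [bias, w1, ..., wk]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit [bias, w1, ..., wk] by batch gradient descent on the log-loss."""
    k, n = len(X[0]), len(X)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi        # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical sessions: [non-understandings, misunderstandings] -> success (1/0).
X = [[0, 0], [1, 0], [0, 1], [2, 1], [4, 2], [5, 3], [3, 4], [6, 2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
weights = fit_logistic(X, y)
```

A more negative weight means that error type costs more task success per occurrence; a state-conditioned cost model, as mentioned above, would fit separate weights per dialogue state.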


Back to the Issues






Individual strategy performance

Under “random”/uniform conditions (control)
- All-way comparison: Matlab, summary file (rank analysis?)
- First conclusions: moving on helps; help helps; just signaling is not so good; YLD is pretty bad

Compared with the wizard: Ask Repeat boosted (significantly, x1.58)

- Wizard reverse engineering (?)
- HELP / FAIL behavior in non-natives (?)
- Predicting success: when to help, when to ask the user to repeat?
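A minimal Python sketch of the per-strategy comparison, assuming each non-understanding is logged as a `(strategy, recovered)` pair. What counts as "recovered" is an assumption here (for example, the next user turn was understood); the talk does not define it on this slide, and the sample events below are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def recovery_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of non-understandings after which each strategy led to recovery."""
    counts = defaultdict(lambda: [0, 0])  # strategy -> [successes, attempts]
    for strategy, recovered in events:
        counts[strategy][1] += 1
        if recovered:
            counts[strategy][0] += 1
    return {s: succ / att for s, (succ, att) in counts.items()}

def rank(rates: Dict[str, float]) -> List[str]:
    """Strategies ordered from most to least successful."""
    return sorted(rates, key=rates.get, reverse=True)

def boost(control: Dict[str, float], wizard: Dict[str, float], strategy: str) -> float:
    """Ratio of a strategy's recovery rate under the wizard to that under control."""
    return wizard[strategy] / control[strategy]

# Invented events, for illustration only.
control = recovery_rates([("AREP", True), ("AREP", False),
                          ("HELP", True), ("HELP", True)])
wizard = recovery_rates([("AREP", True), ("AREP", True),
                         ("AREP", False), ("AREP", True)])
```

Here `boost(control, wizard, "AREP")` is 0.75 / 0.5 = 1.5. A ratio like this only says the wizard used Ask Repeat in more favorable situations; it does not by itself say why, which is what reverse engineering the wizard would try to uncover.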


Back to the Issues






Identify Potential New Strategies

Would be better informed by the error-type / blame-assignment analysis (top of my stack)

So far:
- Ask the user to speak in shorter utterances
- Ask the user to speak louder
- Speculative execution


Speculative execution

Many small recognition errors recur: YES > THIS, NEXT GUEST > YES, GUEST USER > TUESDAY, etc.

Learn from experience how to avoid these errors

Example:

S: Did you say you wanted a room for Tuesday?

U: YES [THIS]

S: Sorry, I didn’t catch that. Did you say you wanted a room for Tuesday?

U: YES [YES]

Learn that “THIS” actually means “YES”


Speculative execution - components

Learn mapping: learner with high precision (no false positives)

Apply mapping: applier with high recall

Precision / recall tradeoff

How much can this method really buy us?
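One way to quantify the tradeoff is to score the applier against hand-labeled correct values for non-understood turns. A hedged Python sketch, assuming `predictions` maps a turn id to the value speculative execution proposed (or `None` when no rule fired) and `gold` maps every non-understood turn that has a correct value to that value; both names are hypothetical. Returning a precision of 1.0 when nothing fires is a convention chosen here, not something the talk specifies.

```python
from typing import Dict, Hashable, Optional, Tuple

def precision_recall(predictions: Dict[Hashable, Optional[str]],
                     gold: Dict[Hashable, str]) -> Tuple[float, float]:
    """Precision: fraction of fired rules that proposed the correct value.
    Recall: fraction of recoverable turns that were correctly recovered."""
    fired = {t: v for t, v in predictions.items() if v is not None}
    correct = sum(1 for t, v in fired.items() if gold.get(t) == v)
    precision = correct / len(fired) if fired else 1.0  # convention: nothing fired
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Invented example: three rules fire, two are right, four turns were recoverable.
p, r = precision_recall({1: "YES", 2: "TUESDAY", 3: None, 4: "NO"},
                        {1: "YES", 2: "TUESDAY", 3: "YES", 4: "YES"})
```

Here `p` is 2/3 and `r` is 0.5. A false positive (turn 4) silently injects a wrong value into the dialogue, which is why the learner side is tuned for precision.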


Speculative Execution – 0th cut

Conservative learner: learns from non-understanding segments where
- the dialogue state is the same throughout (the mapping is state-specific)
- the final response is in focus, contains only one concept, and has high confidence

Conservative applier: applies a mapping only when the dialogue state matches and the non-understood input matches perfectly at the state level

Going through the whole dataset, learning as we go: by the end, rules apply in 10% of cases, and this has not yet leveled off

Precision? (480 rules learned)

How does this look to you?
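A hedged Python sketch of the conservative learner and applier described above. The `Turn` fields, the `HIGH_CONFIDENCE` value, and the rule of permanently dropping a key once it has been seen with conflicting values are all assumptions made for illustration; the last one is one simple way to lean toward "no false positives".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

HIGH_CONFIDENCE = 0.8  # hypothetical threshold for "high confidence"

@dataclass
class Turn:
    state: str                      # dialogue state in which the turn occurred
    hypothesis: str                 # recognizer output, e.g. "THIS"
    concepts: Dict[str, str] = field(default_factory=dict)
    confidence: float = 0.0
    in_focus: bool = False          # does the turn answer the question in focus?

Key = Tuple[str, str]               # (dialogue state, recognized input)

class ConservativeSpeculator:
    def __init__(self) -> None:
        self.rules: Dict[Key, Tuple[str, str]] = {}
        self.conflicts: Set[Key] = set()

    def learn(self, segment: List[Turn]) -> None:
        """Learn from one segment: non-understood turns, then the final response."""
        if len(segment) < 2:
            return
        *failed, final = segment
        # The mapping is state-specific, so the state must not change.
        if any(t.state != final.state for t in failed):
            return
        # Final response: in focus, exactly one concept, high confidence.
        if not (final.in_focus and len(final.concepts) == 1
                and final.confidence >= HIGH_CONFIDENCE):
            return
        concept = next(iter(final.concepts.items()))
        for t in failed:
            key = (t.state, t.hypothesis)
            if key in self.conflicts:
                continue
            if key in self.rules and self.rules[key] != concept:
                # Conflicting evidence: drop the rule for good (assumed policy).
                del self.rules[key]
                self.conflicts.add(key)
            else:
                self.rules[key] = concept

    def apply(self, turn: Turn) -> Optional[Tuple[str, str]]:
        """Exact match on (state, recognized input); None means 'do not speculate'."""
        return self.rules.get((turn.state, turn.hypothesis))
```

Replaying the Tuesday example, `learn([Turn("confirm_day", "THIS"), Turn("confirm_day", "YES", {"answer": "yes"}, 0.95, True)])` produces a rule so that a later "THIS" in the same state is read as `("answer", "yes")`, while the same string in any other state is left alone.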


Speculative execution

Of course, there is much more to dig into here …
- Learners that generalize more
- Confidence scores on the rules
- Active learning: appliers with confidence, and feedback into learning
- Potentially use it in other cases (not only non-understandings, but also potential misunderstandings)


Back to the Issues






Building a Policy from Data

The experiment showed that the wizard boosted the performance of Ask Repeat

Can we predict the likelihood of success for each strategy from features available online?
- Identify informative features (might be better informed by the error-type/blame-assignment analysis)
- Try simple classifiers; MDP (?)

Can also formulate the problem as a decision boundary or classification problem… (?)
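As a simple baseline for "predict the likelihood of success for each strategy, then choose", here is a hedged Python sketch of a tabular estimator: it counts recoveries per (context, strategy) pair, smooths with add-one (Laplace) estimates, and picks the strategy with the highest estimated success probability. This is deliberately simpler than the classifiers or MDP mentioned above; the discrete `context` (for example, native/non-native plus the number of prior non-understandings) is a hypothetical stand-in for the online features still to be identified.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

class SuccessRatePolicy:
    """Choose the strategy with the highest smoothed estimate of
    P(recovery | context, strategy); unseen pairs default to 0.5."""

    def __init__(self, strategies: Sequence[str]) -> None:
        self.strategies = list(strategies)
        # (context, strategy) -> [successes, attempts]
        self.counts: Dict[Tuple[Hashable, str], List[int]] = defaultdict(lambda: [0, 0])

    def observe(self, context: Hashable, strategy: str, recovered: bool) -> None:
        c = self.counts[(context, strategy)]
        c[0] += int(recovered)
        c[1] += 1

    def success_prob(self, context: Hashable, strategy: str) -> float:
        succ, n = self.counts.get((context, strategy), (0, 0))
        return (succ + 1) / (n + 2)  # add-one (Laplace) smoothing

    def choose(self, context: Hashable) -> str:
        # Ties go to the strategy listed first.
        return max(self.strategies, key=lambda s: self.success_prob(context, s))
```

With eight recoveries after HELP and eight failures after AREP in a "non_native" context, `choose("non_native")` returns `"HELP"` (estimated 0.9 vs 0.1). Training on control-condition data is natural here, since the uniform policy tried every strategy in every context.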


Thank you!


Experimental Design: Control vs. Wizard Conditions

Control: random (uniform) policy
Wizard: human with access to audio & system state

Figure (policies placed along a Performance axis):
Random (uniform) policy
Manually designed policy
Data-driven designed policy
Human wizard with access to audio
? Human wizard with access to only system state ?
