33
Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage) for problems submitted by email that we use Final review on Thursday --- come with your questions Also: term papers due Thursday

Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Embed Size (px)

Citation preview

Page 1: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Final Exam Information

• Exam is Friday, December 13 11AM-1PM• Exam will be cumulative, slightly

emphasizing material since midterm• Extra credit (and advantage) for

problems submitted by email that we use• Final review on Thursday --- come with

your questions• Also: term papers due Thursday

Page 2: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

The Societal Impact of Computer (and Cognitive) Science

• In 1985:– computers used by

scientists & engineers– little attention to user

interfaces or CHI– almost no “social” apps– little interaction

between CS and psych.– little attention to

design principles– web did not exist

• In 2002:– most Americans have

web/Internet– popular culture and CS

mixed together– Technology and society

increasingly intertwined– many apps are social– “civilians” can generate

rich technology– CHI a rich research area– AI and CHI

Today: Case studies in spoken language interfaces and AI

Page 3: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Spoken Dialogue Systems

• Provide automated telephone access to DB

• Front end: ASR + TTS• Back end: DB• Middle: dialogue strategy is key

componentuser

ASR

TTS

DBspoken dialogue system

Page 4: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

State-Based Design• System state:

– info attributes perceived so far– individual and average ASR confidences– other dialogue history info– data on particular user

• Dialogue strategy: mapping from current state to system action

• Typically hundreds of states, several reasonable actions from each state

Page 5: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Typical System Design:Sequential Search

• Choose state and action spaces• Choose and implement a particular,

“reasonable” dialogue strategy• Field system, gather dialogue data

(system logs, dialogue trajectories)• Do simple statistical analyses• Re-field improved dialogue strategy• Can only examine a handful of strategies

Page 6: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Issues in Dialogue Strategy Design

• Initiative strategy• Confirmation strategy• DB query strategy• Criteria to be optimized

Page 7: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Facts About ASR• Inputs: audio file; grammar/language

model; acoustic model• Outputs: utterance matched from

grammar, or no match; log-likelihood• Performance precision-recall tradeoff:

– “small” grammar --> high accuracy on constrained utterances, lots of no-matches

– “large” grammar --> match more utterances, but with lower confidence

• ASR community pushing these barriers

Page 8: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Initiative Strategy

• System initiative vs. user initiative:– “Please state your departure city.”– “Please state your desired itinerary.”– “How can I help you?”

• Influences user expectations• ASR grammar must be chosen accordingly• Best choice may differ from state to state!• May depend on user population & task

Page 9: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Confirmation Strategy

• High ASR confidence: accept ASR match and move on

• Moderate ASR confidence: confirm• Low ASR confidence: re-ask• How to set confidence thresholds?• Early mistakes can be costly later

Page 10: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Markov Decision Processes• System state s (in S)• System action a in (in A)• Transition probabilities P(s’|s,a)• Reward function R(s,a) (stochastic)• Fast algorithms for optimal policy• Our application: P(s’|s,a) models the

population of users

1

4

20.8

0.2

31.0

Page 11: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

SDSs as MDPs

...332211 ususus

Initial systemutterance

Initial userutterance

Actions haveprob. outcomes

estimate transition probabilities... P(next est. state | current est. state & action)...and rewards... R(current est. state, action)...from set of exploratory dialogues

Violations of Markov property! Will this work?

a e a e a e ...1 21 2 33

+ system logs

Page 12: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

The RL Approach

• Build initial system that is deliberately exploratory wrt state and action space

• Use dialogue data from initial system to build a Markov decision process (MDP)

• Use methods of reinforcement learning to compute optimal strategy of the MDP

• Re-field (improved?) system given by the optimal policy

Page 13: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Why Reinforcement Learning?

• ASR output is noisy; user population leads to stochastic behavior

• Design choices have long-term impact; temporal credit assignment problem• Many design choices can be fixed, but - Initiative strategy - Confirmation strategy• Many different performance criteria

Page 14: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Caveats

• Must still choose states and actions• Must be exploratory with taste• Data sparsity• Violations of the Markov property• A formal framework and

methodology, hopefully automating one important step in system design

Page 15: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

The Application• Dialogue system providing telephone

access to a DB of activities in NJ• Want to obtain 3 attributes:

– activity type (e.g., wine tasting)– location (e.g., Lambertville)– time (e.g., morning)

• Failure to bind an attribute: query DB with don’t-care

Page 16: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

The State Space

• current attribute (A = 1,2,3)• value (V = 0,1)• confidence (C = 1,2,3,4,5)• tries (T = 0,1,2,3)• grammar (G = 0,1)• “trouble” history bit (H = 0,1)

N.B. Non-state variables record attribute values;state does not condition on previous attributes!

Will this work?

Page 17: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Sample Actions• Initiative (when T = 0):

– open or constrained prompt?– open or constrained grammar?– N.B. might depend on H, A,…

• Confirmation (when V = 1)– confirm or move on or re-ask?– N.B. might depend on C, H, A,…

• Only allowed “reasonable” actions• Results in 42 states with (binary) choices• Small state space, large policy space

Page 18: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

The Experiment• Designed 6 specific tasks, each with web survey• Gathered 75 internal subjects• Split into training and test, controlling for M/F, native/non-

native, experienced/inexperienced• 54 training subjects generated 311 dialogues• Exploratory training dialogues used to build MDP• Optimal strategy for objective TASK COMPLETION computed

and implemented• 21 test subjects performed tasks and web surveys for

modified system generated 124 dialogues• Did statistical analyses of performance changes

Page 19: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Reward Functions• Objective task completion:

– -1 for an incorrect attribute binding– 0,1,2,3 correct attribute bindings

• Binary version:– 1 for 3 correct bindings, else 0

• Other reward measures: ASR confidence (obj), perceived completion, user satisfaction, future use, perceived understanding, user understanding, ease of use

• Optimized for objective task completion, but predicted improvements in some others

Page 20: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Main Results

• Objective task completion:– train mean ~ 1.722, test mean ~ 2.176– two-sample t-test p-value ~ 0.0289

• Binary task completion:– train mean ~ 0.515, test mean ~ 0.635– two-sample t-test p-value ~ 0.05

On all dialogues:

On expert dialogues 3-6:

• Binary task completion - train mean ~ 0.456, test mean ~ 0.682 - two-sample t-test p-value ~ 0.001

Page 21: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Results for Other Rewards• ASR performance (0-3):

– train ~ 2.483, test ~ 2.671, p ~ 0.0375• User satisfaction (“move to the middle” effect):

– %good: train ~ 0.459, test ~ 0.251, p ~ 0.06– %bad: train ~ 0.278, test ~ 0.138, p ~ 0.07

• Similar significant MTM results for ease of use• Many insignificant instances of MTM• Objectives improve, subjectives MTM

Page 22: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Comparison to Human Design

• Fielded comparison infeasible, but exploratory dialogues provide a Monte Carlo proxy of “consistent trajectories”

• Test policy performance, binary completion: 0.67 (12)• Policy SysNoConfirm: -0.08 (11), significant win• Policy SysConfirm: -0.6 (5), significant win• Policy UserNoConfirm: -0.2 (15), significant win• Policy Mixed: -0.077 (13), significant win• Insignificant: difference with W99, similar to test• Even this is a potential advance...

Page 23: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Cobot: A Software Agent• User/client of LambdaMOO, a well-known Internet

text chat and programming environment• Software chat agent providing “social statistics”• Previous functionality:

– Extensive logging of human user behavior and interaction

– Provision of social statistics and comparisons – Rudimentary chat based on IR applied to large

documents– Proactive social behavior via reinforcement learning

• This work:– Construction, fielding, and testing of a dialogue system

providing spoken natural language access to Cobot and LambdaMOO via telephone, using speech recognition and text-to-speech

Page 24: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Sample DialogueHFh waves to Buster.Buster bows gracefully to HFh.Buster is overwhelmed by all these paper deadlines.Buster begins to slowly tear his hair out, one strand at a time.HFh comforts Buster.HFh [to Buster]: Remember, the mighty oak was once a nut like you.Buster [to HFh]: Right, but his personal growth was assured. Thanks anyway, though.Buster feels better now.

Standard verbs and emotes: directed and broadcast speech,hug, wave, bow, nod, eye, poke, zap, grin, laugh, comfort, ...

Page 25: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)
Page 26: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Calling Cobot

• Provided a dozen or so “friendly” LambdaMOO users with access to a toll-free CobotDS number

• Users call with LambdaMOO user name, numeric password; then enter main CobotDS command loop

• Cobot announces phone call & user in LambdaMOO• From LambdaMOO to phone user:

– MOO users direct arbitrary utterances or verbs to Cobot, prefixed by text “phone:”

– Via TTS, Cobot passes verb or utterance directly to phone user– Virtually no noise on this channel

• From phone user to LambdaMOO: Cobot passes on utterances, verbs from phone user (with attribution)

• Mixed in with Cobot’s other behavior and activities• But this channel is very noisy (due to ASR), so…

Page 27: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Basic Phone Commands

• 38 standard LambdaMOO verbs, directed or not• “Say” command with multiple ASR grammars:

– Smalltalk grammar: 228 useful phrases & exclamations– Social pleasantries, assertions of mood– Statements of whereabouts

– Cliché grammar: 2950 English witticisms and sayings– User specific personal grammars, periodically updated/modified– User can control grammar via the “grammar” command

• “Listen” command:– At every dialogue turn, CobotDS will attempt to describe all

activity taking place in LambdaMOO– Provides phone user richer view, allows passive participation– User has no scrollback– Pace of activity can quickly outrun TTS rate– Thus filter activity, including via social rules

Page 28: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)
Page 29: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Other Useful Phone Commands

• “Where” and “who” commands• “Summarize” command

– Intended for use in non-listening mode– Provides summary of last n minutes of activity– Describes which users have generated most activity– Characterizes interactions via verb usage

Page 30: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

Sample Transcript

Page 31: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

A (Very) Small User Study

• 5 LambdaMOO users generated 31 dialogues• Averaged 65 turns per dialogue• Some findings:

– Great variation in usage styles, verbs invoked– Popularity of “listen” command, often in “radio”

mode– Effectiveness of “listen” filtering– Use of grammar command and personal grammars– Evolution of personal grammars to express

limitations– ASR problems!

Page 32: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)
Page 33: Final Exam Information Exam is Friday, December 13 11AM-1PM Exam will be cumulative, slightly emphasizing material since midterm Extra credit (and advantage)

The Media Equation: Media = Real Life

[Reeves and Nass]• Politeness• Interpersonal distance• Flattery• Personality• Media and Evolution• Lessons for interface design• Turing Test relevance?