53
Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC [email protected] Collaborators: Satinder Singh, Michael Kearns, Marilyn Walker, Charles Isbell, Jessica Howe *this work was done at AT&T Labs – Research, Florham Park, NJ

Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC [email protected]

Embed Size (px)

Citation preview

Page 1: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Learning, Adaptation and Personalization in Spoken

Dialogue Systems

Diane J. Litman*University of Pittsburgh

Dept. of Computer Science & [email protected]

Collaborators: Satinder Singh, Michael Kearns, Marilyn Walker, Charles Isbell, Jessica Howe

*this work was done at AT&T Labs – Research, Florham Park, NJ

Page 2: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Overview of ResearchAdaptive and Adaptable Systems• TOOT: automatic optimization within a dialogue via

supervised learning [ACL-99, NAACL-00, AAAI-00, NAACL-01, ACL-01]

• NJFun: automatic optimization of default dialogue behavior via reinforcement learning [COLING-00, AAAI-00, JAIR-02] (this talk)

• TOOT, CobotDS: personalization within a dialogue via voice control [UM-99, AAAI-02] (CobotDS: this talk)

Novel Applications• CobotDS: new types of voice-enabled systems pose

many technical challenges (e.g. chat interfaces) [AAAI-02] (this talk)

Page 3: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Today’s Talk: Overview of Issues

• Builders of spoken dialogue systems face fundamental design choices that strongly influence system performance

• Can performance be improved by adapting the system?

• Many forms of adaptation– machine learning– user-led

• Two Case Studies

Page 4: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Outline

• I. Reinforcement Learning in Spoken Dialogue Systems– Background– Applying Reinforcement Learning– NJFun: A Reinforcement Learning Spoken Dialogue

System– A Controlled User Study

• II. CobotDS: A Spoken Dialogue System for Chat– Background: LambdaMOO and Cobot– CobotDS: A Novel Dialogue System– Grammar Personalization, Summarization, and Social

Filtering– Observations from a Pilot User Study

Page 5: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

I. Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun Spoken Dialogue System

Page 6: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Spoken Dialogue Systems

• Front end: ASR (automatic speech recognition) and TTS (text to speech)

• Back end: DB, web, etc.• Middle: dialogue strategy or policy (what

action to take at each point) • Dominated by ASR “precision-recall” trade-off

user

ASR

TTS

DBspoken dialogue system

Page 7: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

NJFun DialoguesTask Description:"You have lived in Stanhope for many years but have never managed to visit its historic sites. Today you are feeling virtuous.

Use RLDS to find out what you can see this morning.”

TTS: Welcome to RLDS. Please say an activity name or say 'list activities' for a list of activities I know about. USER: I would like to visit a historic site in Stanhope this morning. TTS: Did you say you are interested in going to a zoo? USER: No. TTS: I know about amusement parks, aquariums, cruises, historic sites, museums, parks, theaters, wineries, and zoos. Please say an activity name from this list. USER: I would like to visit a historic site. TTS: Did you say you are interested in going to a historic site? USER: Yes. TTS: Please give me more information. USER: ... TTS: Please tell me the location that you are interested in. You can also tell me the time.…. (8 turns omitted)TTS: Thank you for using the system. Please give me feedback by saying 'good', 'so-so', or 'bad'. USER: Bad. TTS: Goodbye and have a nice day!

TTS: Welcome to RLDS. How may I help you? USER: I would like to um find about Stanhope historic sites. TTS: Did you say you are interested in Stanhope? USER: Yes. TTS: What time of the day do you want to go? USER: Uh...morning. TTS: Did you say you want to go in the morning? USER: Yes. TTS: I found 2 historic sites in Stanhope that are open in the morning. They are the Lenape Indian Village, and the Waterloo Village. Thank you for using the system. Please give me feedback by saying 'good', 'so-so', or 'bad'. USER: Good. TTS: Goodbye and have a nice day!

Unoptimized System Optimized System

Page 8: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Initiative Strategy

• System initiative vs. user initiative:– Please state your departure city.– How can I help you?

• Influences user expectations• ASR grammar must be chosen accordingly• Best choice may differ from state to state!• May depend on user population & task• Early mistakes may be costly (e.g. confirmation)• Delayed reward

Page 9: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Typical System Design: Sequential Search

• Choose and implement several “reasonable” dialogue policies

• Field systems, gather dialogue data • Do statistical analyses• Refield system with “best” dialogue policy• Can only examine a handful of policies

Page 10: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Why Reinforcement Learning?(Levin, Pieraccini, Eckert; Walker; Singh, Kearns, Litman, Walker)

• Agents can learn to improve performance by interacting with their environment

• Thousands of possible dialogue policies, and want to automate the choice of the “optimal”

• Can handle many features of spoken dialogue– noisy sensors (ASR output)– stochastic behavior (user population)– delayed rewards, and many possible rewards– multiple plausible actions

• However, many practical challenges remain

Page 11: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Our Approach

• Build initial system that is deliberately exploratory wrt state and action space

• Use dialogue data from initial system to build a Markov decision process (MDP)

• Use methods of reinforcement learning to compute optimal policy of the MDP, with respect to some reward

• Re-field (improved?) system given by the optimal policy

Page 12: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

State-Based Design

• System state: contains information relevant for deciding the next action

• Dialogue policy: mapping from current state to system action

• Typically hundreds of states, several “reasonable” actions from each state

• In practice, need a compressed state

Page 13: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Markov Decision Processes

• System state s (in S)• System action a in (in A); not all states need

have choice• Transition probabilities P(s’|s,a)• Reward function R(s,a) (stochastic)• Fast algorithms for optimal policy• Our application: P(s’|s,a) models the population

of users• Allow choice of actions• Learn best choices• Parallel search in policy space!

Page 14: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

The Application: NJFun

• Dialogue system providing telephone access to a DB of activities in NJ

• Want to obtain 3 attributes:– activity type (e.g., wine tasting)– location (e.g., Lambertville)– time (e.g., morning)

• Failure to bind: query DB with don’t-care

Page 15: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

The State Space

Feature Values Explanation Attribute (A) 1,2,3 Which attribute is being worked on

Confi dence/ Confi rmed (C)

0,1,2 3,4

0,1,2 f or low, medium and high ASR confi dence 3.4 f or explicitly confi rmed, disconfi rmed

Value (V) 0,1 Whether value has been obtained f or current attribute

Tries (T) 0,1,2 How many times current attr has been asked

Grammar (G) 0,1 Whether open or closed grammar was used

History (H) 0,1 Whether trouble on any previous attribute

N.B. Non-state variables record attribute values;state does not condition on previous attributes!

Page 16: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Sample Action Choices

• Initiative (when T = 0)– user (open prompt and grammar)– mixed (constrained prompt, open grammar)– system (constrained prompt and grammar)

• Example– GreetU: “How may I help you?” – GreetS: “Please say an activity name.”

Page 17: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Dialogue Policy Class

• Specify “reasonable” actions for each state– 42 choice states (binary initiative or

confirmation action choices)– no choice for all other states

• Small state space (62), large policy space (2^42)

• Example choice state– initial state: [1,0,0,0,0,0]– action choices: GreetS, GreetU

• Learn optimal action for each choice state

Page 18: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

The Experiment• Designed 6 specific tasks, each with web survey• 54 training subjects generated 311 dialogues• Exploratory training dialogues used to build MDP• Optimal policy for objective (binary) task

completion computed and implemented• 21 test subjects performed tasks and web

surveys for modified system generated 124 dialogues

• Did statistical analyses of performance changes

Page 19: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Main Results

• Objective task completion (-1 to 3, partial credit):– train mean ~ 1.722, test mean ~ 2.176– two-sample t-test p-value ~ 0.0289

• Binary task completion:– train mean ~ 51.5%, test mean ~ 63.5%– two-sample t-test p-value ~ 0.05

Page 20: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Other Results

Subjective measures“move to the middle” rather thanimprove

First graph: It was easy to find the place that I wanted (strongly agree = 5,…, strongly disagree=1)train mean = 3.38, test mean = 3.39, p-value = .98

Page 21: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Other Results (continued)

• Using exploratory dialogues as a Monte Carlo proxy shows that our learned policy outperforms several standard fixed policies

Comparison to Human Design

A Sanity Check of the MDP

• Correlation between Monte Carlo and MDP

Page 22: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Conclusions I

• MDPs and RL a natural and promising framework for automated dialogue policy design

• First practical empirical test of formalism• Resulted in significant system improvements• Favorable comparison to human-designed

strategies• Interesting dialogue results• Care in application:

– choice of states and actions– gathering exploratory data– choice of reward to optimize

Page 23: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Future Work I

• Automate choice of states and actions

• Scale to more complex systems• POMDPs due to hidden state• Learn terminal (and non-terminal)

reward function• Online rather than batch learning

Page 24: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

II. CobotDS: A Spoken Dialogue System for Chat

Page 25: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

A Non-traditional Dialogue System

• DS’s commonly provide access to relatively structured and static back-end databases, and users have well-defined, task-oriented goals

• CobotDS provides spoken access to a complex social text-chat environment– users participate primarily for entertainment or sense of

community– “database” is dynamic and unstructured– imbalance and asynchrony between phone user and chat

users• Leads to Differing Research Foci

– Personalized Social-Filtering in listen mode– Summarization– Personal grammars

Page 26: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

CobotDS

phone user

ASR

TTS

spoken dialogue system

lambdaMOO

server

chatuser

chatuser

chatuser

chatuser

COBOT

Page 27: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

LambdaMOO: Whirlwind Tour

• Multiuser, text-based virtual world• Chat channel: directed speech and

“emotes”• Users create rooms, objects,

behaviors• Founded in 1990; > 5K users • History of AI experimentation

Page 28: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Sample Text ChatHFh waves to Buster.Buster bows gracefully to HFh.Buster is overwhelmed by all these paper deadlines.Buster begins to slowly tear his hair out, one strand at a time.HFh comforts Buster.HFh [to Buster]: Remember, the mighty oak was once a nut like you.Buster [to HFh]: Right, but his personal growth was assured. Thanks anyway, though.Buster feels better now.

Standard verbs and emotes: directed and broadcast speech,hug, wave, bow, nod, eye, poke, zap, grin, laugh, comfort, ...

Page 29: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Cobot

• Has user status, but known to be a bot• Resides in LambdaMOO Living Room• Primary functionality:

– extensive logging and recording– social statistics and queries– emote and chat abilities– reinforcement learning

Page 30: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

CobotDS• Telephone access to Cobot & LambdaMoo• Dialogue system supports basic emotes, “say”

and “listen” commands, info commands,…• CobotDS goals:

– Integrate our dialogue system work – Tackle many different research issues

• Fielded September 2000

Page 31: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Hi, Buster. What do you want to do? Wave.You wave.Who. You who. I am with Cocobot, Betwixt, and HFh. HFh

says How was the movie?, to youSay HFh. What message would you like to pass? Fantastic. You say Fantastic, to HFh. Betwixt waves to you. Summarize. You summarize. Recently, HFh and Betwixt chatted.

Betwixt and Natto bickered. HFh made the most noise.

Grammar. Which grammar would you like to use? Personal. The grammar is now set to Buster. Say. What message would you like to pass? I am in Hoboken. You say I am in Hoboken, to Betwixt. Listen. You listen. Betwixt gives you a nod. HFh to Betwixt,

Sprewell will go to the rim, but Houston settles for jumpers from the parking lot, & then, I grin to HFh.

HFh [to Betwixt]: And thanks to TiVo, I was able to see the game when I got home.

Betwixt [to HFh]: The second half was pretty spectacular.

Cobot turns to pick up the phone. Cobot begins talking to Buster!

Cobot holds up a sign: Buster passes on a wave from the phone.

HFh [to Cobot]: phone: How was the movie? Betwixt [to Cobot]: phone: waveCobot [to HFh]: FantasticCobot [to HFh]: That was from Buster. Cobot holds up a sign: Buster says, 'I am in

Hoboken' from the phone. Betwixt [to Cobot]: phone: nod Cobot holds up a sign: Buster passes on a

listen from the phone.HFh [to Betwixt]: Sprewell will go to the rim,

but Houston settles for jumpers from the parking lot.

Cobot grins at HFh.HFh [to Betwixt]: With Camby's rebounding

they have a chance to at least win the East.

Betwixt [to HFh]: Good point.

Page 32: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Dealing with the flood of chat

• Listen withSocial Filtering– Listen puts phone

user in “radio” mode– Personalized social

filtering: delete least-interacted first

– Size of filtering tuned to minimize lag

– Play-by-play mode

• Summarization

– Summary of last n minutes of activity

– Use social statistics to determine most active users

– Entry and exit of users

– “friendly” vs. “nasty” interaction

– Batch mode

Page 33: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Personalization of Grammars

• Phone user could change grammar used through command grammar and then engaging in subdialogue

• Two built-in grammars: – smalltalk – 228 hand-constructed phrases provide

basic conversation, e.g., “yes”, “no”, “fantastic”, “terrible”, “I’m at home”, etc.

– Cliché – 2950 common English sayings, e.g., “taking coal to Pittsburgh ”, etc.

• One personal grammar – Comprised of list of phrases provided by each phone

user

Page 34: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

CobotDS Observations

• Observations from most frequent 18 non-authors who made calls

• Calls for some users averaged 20 minutes, occasionally last for 45 minutes or more

• Phone users vary use of speech/verbs• Interaction included a variety of LambdaMOO users• Average lag ten seconds, filtering rare, socially

filtered text allowed 70% of unfiltered length• Personalized grammars used early and often, evolved

to deal with ASR errors• Switching in regimes from radio/listen/interaction

Page 35: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Conclusions II

• Users – Adapted personal grammar to deal with ASR errors – “I can’t

answer that question using my limited vocab.”, “I didn’t mean to say that”, etc.

– added sentences with acoustically distinct keywords to increase recognition

– wanted to update personal grammar instantaneously– wanted explicit confirmation of utterance in “say”– wanted to selectively listen to the activity of some chat

users

Some lessons for “social” (and multi-modal) spoken dialogue systems

Page 36: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Future Work II

• Investigate similar issues in the context of collaborative learning and tutoring systems

Page 37: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Conclusions

• Adaptation via Learning and Personalization can improve dialogue system performance

• NJFun: Given a small number of exploratory dialogues, reinforcement learning can be used in a working system to choose a more optimal policy within a large search space

• CobotDS: Personalization can be useful and entertaining in a social environment; such applications pose interesting and different research challenges

Page 38: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Backup Slides

Page 39: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Sample DialogueS1: Welcome to NJFun. How may I help you?

U1: I’d like to find um winetasting in Lambertville in the morning.

(ASR output: I’d like to find out wineries the in the

Lambertville in the morning.)

S2: Did you say you are interested in Lambertville?

U2: Yes

S3: Did you say you want to go in the morning?

U3: Yes.

S4. I found a winery near Lambertville that is open in the morning.

It is […] Please give me feedback by saying “good”, “so-so”

or “bad”.

U4: Good

Page 40: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Confirmation Policy

• High ASR confidence: accept ASR match and move on

• Moderate ASR confidence: confirm• Low ASR confidence: re-ask• How to set confidence thresholds?• Early mistakes can be costly later,

but excessive confirmation is annoying

Page 41: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Computing the Optimal

• Given parameters P(s’|s,a), R(s,a), can efficiently compute policy maximizing expected return

• Typically compute the expected cumulative reward (or Q-value) Q(s,a), using value iteration

• Optimal policy selects the action with the maximum Q-value at each dialogue state

Page 42: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Potential Benefits• A principled and general framework for

automated dialogue policy synthesis – learn the optimal action to take in each state

• Compares all policies simultaneously– data efficient because actions are evaluated

as a function of state– traditional methods evaluate entire policies

• Potential for “lifelong learning” systems, adapting to changing user populations

Page 43: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Actions and State

• Actions:– initiative:

• open or closed prompt?• open or closed grammar?

– confirmation:• confirm, re-ask, move on?

– binary choices– only “reasonable” states– conservative actions

• State features:– ASR scores– barge-in counts– number of tries– time-outs– ASR-centric

42 states with binary action choice;no function approximation

Page 44: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Sample Confirmation Choices

• Confirmation (when V = 1)– confirm– no confirm

• Example– Conf3: “Did you say want to go in the

<time>?”– NoConf3: “”

Page 45: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Some System Details• Uses AT&T’s WATSON ASR and TTS

platform, DMD dialogue manager• Natural language web version used

to build multiple ASR language models

• Initial statistics used to tune bins for confidence values, history bit (informative state encoding)

Page 46: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Main Results

• Objective task completion (-1 to 3, partial credit):– train mean ~ 1.722, test mean ~ 2.176– two-sample t-test p-value ~ 0.0289

• Binary task completion:– train mean ~ 51.5%, test mean ~ 63.5%– two-sample t-test p-value ~ 0.05

On all dialogues:

On “expert” dialogues 3-6:

• Binary task completion - train mean ~ 45.6%, test mean ~ 68.2% - two-sample t-test p-value ~ 0.001

Page 47: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Comparison to Human Design• Fielded comparison infeasible, but

exploratory dialogues provide a Monte Carlo proxy of “consistent trajectories”

• Test policy: Average binary completion reward = 0.67 (based on 12 trajectories)

• Outperforms several standard fixed policies– SysNoConfirm: -0.08 (11)– SysConfirm: -0.6 (5)– UserNoConfirm: -0.2 (15)– Mixed: -0.077 (13)– User Confirm: 0.2727 (11), no difference

Page 48: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

A Sanity Check of the MDP• Generate many random policies • Compare value according to MDP and value

based on consistent exploratory trajectories• MDP evaluation of policy: ideally perfectly

accurate (infinite Monte Carlo sampling), linear fit with slope 1, intercept 0

• Correlation between Monte Carlo and MDP:– 1000 policies, > 0 trajs: cor. 0.31, slope

0.953, int. 0.067, p < 0.001– 868 policies, > 5 trajs: cor. 0.39, slope 1.058,

int. 0.087, p < 0.001

Page 49: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Related Work• Biermann and Long (1996)• Levin, Pieraccini, and Eckert (1997) • Walker, Fromer, and Narayanan (1998)• Singh, Kearns, Litman, and Walker

(1999)• Scheffler and Young (2000)• Beck, Woolf, and Beal (2000)• Roy, Pineau, and Thrun (2000)

Page 50: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Popularity

Cumulative Interactions: Population

socialstatistics

emotes

chat

Page 51: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Social Statistics• For each user, histograms of:

– verbs invoked by user (communicative style)– verbs invoked on user (communal treatment)– histogram of other users (social circle)

• For each verb, histograms of:– invoking users – target users

• Initial off-line clustering experiments• Queries allowing (limited) access to stats• Comparisons via standard cosine measure

Page 52: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Chat and Emote of Cobot

• Appropriate social basics hand-coded• Early Cobot: hand-coded chat replies via Eliza-like

pattern matching + randomization• Scaling up chat:

– randomized pattern matching on large documents– topic specificity gives “personality”– variety and focus, sidestep NL generation– Documents: Unabomber Manifesto, movie

soundtracks,…– Easily swapped in and out

Page 53: Learning, Adaptation and Personalization in Spoken Dialogue Systems Diane J. Litman* University of Pittsburgh Dept. of Computer Science & LRDC litman@cs.pitt.edu

Can You Relate?

HFh [to cobot]: relate me to Bustercobot whispers, ``Here are your relationships with Buster. You like to use: - (62%), poke (7%), hug (3%), eye (3%), nod (2%), hi5, h5, zap, comfort, and grin on each other. Buster is ranked #14 on your list of playmates. You are ranked #1 on Buster's list. Your socializing overlap is 75.4% and your playmate overlap is 33.7%. Your actions have a similarity of 95.9% but ignoring common speech verbs it's 58.3%. Others act on you with similarity of 96.6% but ignoring common speech verbs it's 81.9%.''