Page 1

Learning Dialogue Strategies with a Simulated User
Jost Schatzmann and Steve Young

Cambridge University Engineering Department
Trumpington Street, Cambridge CB2 1PZ, UK
{js532, sjy}@eng.cam.ac.uk

Dialog on Dialogs Meeting
Carnegie Mellon University, 19 August 2005

Page 2

User Simulation-Based Learning

[Diagram: a Dialogue Corpus is used for Supervised Learning of a Simulated User; the Simulated User interacts with the Dialogue Manager, whose Strategy is optimised by Reinforcement Learning]

• Learn dialogue strategies through trial-and-error interaction with a simulated user

Page 3

Agenda

• Work on Evaluation: Experiments and Results

• Agenda-based User Modelling

Page 4

Research Questions

• How good are the currently available simulation techniques? Can they...
  • produce human-like behaviour?
  • cover the variety of real user behaviour?

• What is the effect of the user model on the learned strategy?
  • Influence on strategy performance?
  • Influence on strategy characteristics?
  • Are the strategies merely fitted to a particular UM?
  • Can we find UM-independent forms of strategy evaluation?

SIGdial paper

ASRU paper

Page 5

User Modelling Techniques

• State of the art in intention-level modelling:
  • Bigram model: p(au|as)
  • Levin model: p(yes_answer|expl_conf)
  • Pietquin model: p(yes_answer|expl_conf, goal)
  (a minimal sketch of the bigram case follows below)

• UMs typically not trained on real data

• Standard evaluation practice is to test the learned strategy on the user model used for learning
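To make the simplest of these concrete, here is a minimal sketch (not from the paper; class, method and act names are illustrative) of a bigram user model p(au|as) estimated by maximum likelihood from annotated system/user act pairs:

```python
import random
from collections import Counter, defaultdict

class BigramUserModel:
    """Minimal bigram user simulator: p(user_act | system_act) from counts."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, dialogues):
        # dialogues: list of [(system_act, user_act), ...] pairs at intention level
        for dialogue in dialogues:
            for system_act, user_act in dialogue:
                self.counts[system_act][user_act] += 1

    def prob(self, user_act, system_act):
        total = sum(self.counts[system_act].values())
        return self.counts[system_act][user_act] / total if total else 0.0

    def respond(self, system_act):
        # Sample a user act in proportion to its conditional frequency
        acts = list(self.counts[system_act])
        weights = [self.counts[system_act][a] for a in acts]
        return random.choices(acts, weights=weights, k=1)[0]

um = BigramUserModel()
um.train([[("request_info orig_city", "provide_info orig_city london"),
           ("expl_conf orig_city", "yes_answer")]])
print(um.respond("expl_conf orig_city"))   # conditioned only on the last system act
```

Such a model answers any system act it has seen, but ignores the user goal and all history beyond one turn, which is exactly the weakness discussed on the backup slides below.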

Page 6

Experiments

[Diagram of the experimental setup: Training (a dialogue corpus is used to train the Bigram, Levin and Pietquin user models), Learning (a DM learns a strategy with each model, giving the Bigram, Levin and Pietquin strategies), and Testing (the learned strategies are tested against the user models)]

Page 7

Comparative Evaluation

• Performance of the learned strategy depends on the quality of the UM

[Bar chart comparing real dialogue data (ATT, BBN, CMU, SRI, ALL) with simulated dialogues (BIG, LEV, PTQ); vertical scale 0 to 200]

Page 8

Cross-model Evaluation

• Strategies learned with a poor UM can fail when tested on a better UM

[Bar chart: BIG, LEV and PTQ strategies tested on the BIG, LEV and PTQ user models; vertical scale 0 to 200]

Page 9

Strategy Characteristics

• Learned strategies exploit weaknesses in UMs

[Bar chart (0 to 100%): % of unknown slots requested, % of known slots explicitly confirmed, and % of known slots implicitly confirmed, for ATT, BBN, CMU, SRI, ALL, BIG, LEV, PTQ]

Page 10

UM-independent Evaluation

• Techniques for evaluating new strategies on real dialogue data would be helpful

[Scatter plot (Pietquin model): Dialogue Reward (-40 to 180) against Similarity to Learned Strategy (0 to 0.5)]

Page 11

Agenda

• Work on Evaluation: Experiments and Results

• Agenda-based User Modelling

Page 12

Motivation

• Currently have drastically different levels of sophistication for DM and UM

• Fail to model context which extends beyond the previous dialogue turn

User: I want to go from Boston to London.

System: Going from Austin to London. And when do you want to fly?

User: No, from Boston to London.

System: From Boston to London, is that correct?

User: Yes. And I'm flying on March 15th.

Page 13

Agenda-based User Model

• Idea: MDP User Model with agenda-based state representation

• Combines user state and user goal representation
  • Naturally encodes dialogue history
  • Allows delayed user responses (priority of actions)
  (an illustrative sketch follows below)

[Diagram: the User Goal and Dialogue Context feed the AGENDA, from which the next User action is generated]
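As a rough illustration of the idea (not the authors' implementation; the class, act format and processing rules below are assumptions), the agenda can be pictured as a stack of pending user dialogue acts derived from the goal, with system acts pushing reactions on top:

```python
class AgendaUser:
    """Illustrative agenda-based user state: a stack of pending dialogue acts."""

    def __init__(self, goal):
        self.goal = goal  # e.g. {"orig_city": "boston", "dest_city": "london"}
        # Initialise the agenda with one inform act per goal constraint.
        self.agenda = [("provide_info", slot, value) for slot, value in goal.items()]

    def receive(self, system_act):
        # Push a reaction onto the agenda; earlier items are delayed, not lost.
        act_type, slot, value = system_act
        if act_type == "expl_conf":
            if self.goal.get(slot) == value:
                self.agenda.append(("yes_answer", slot, value))
            else:
                self.agenda.append(("negate", slot, self.goal.get(slot)))

    def respond(self, n_acts=1):
        # Pop the top of the agenda; what remains encodes the dialogue history.
        acts, self.agenda = self.agenda[-n_acts:], self.agenda[:-n_acts]
        return acts

user = AgendaUser({"orig_city": "boston", "dest_city": "london"})
user.receive(("expl_conf", "orig_city", "austin"))   # system misheard Boston as Austin
print(user.respond())   # [('negate', 'orig_city', 'boston')]
```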

Page 14

Agenda-based User Model

• Assume cooperative user behaviour to label dialogues
• Learn output probabilities to model user behaviour

• Potential scope for modelling uncertainty about the true state of the user agenda ('Hidden Agendas')

[Diagram: the User Goal and Dialogue Context feed the AGENDA; Processing Rules and Output Probabilities determine the next User action]

Page 15

Summary

• Current lack of solid user models and reliable evaluation standards is a major roadblock to simulation-based strategy learning

• Work on agenda-based user models may help to enhance our model of the user state and improve simulation quality

Page 16

Thank you!

js532@eng.cam.ac.uk
http://mi.eng.cam.ac.uk/~js532/

J. Schatzmann, K. Georgila, and S. Young. "Quantitative Evaluation of User Simulation Techniques for Spoken Dialogue Systems". 6th SIGdial Workshop on Discourse and Dialogue, Lisbon, September 2-3, 2005 (to appear)

J. Schatzmann, M. N. Stuttle, K. Weilhammer and S. Young. "Effects of the User Model on Simulation-based Learning of Dialogue Strategies". IEEE Automatic Speech Recognition and Understanding Workshop, Cancun, Mexico, November 27 - December 1, 2005 (submitted)

Page 17

Backup slides

• The rest of this slide deck is only a backup for further questions.

Page 18

Strategy Confidence Scores (1/3)

• Need to deviate from known strategies to explore new and potentially better ones

System: Where are you flying from, where are you flying to, on what date are you flying, when is your preferred time, do you have a preferred airline and would you like a window-seat?

Sim. User: Flying from Boston to London on March 15 at 9am with Delta Airlines. Window seat please.

Real User: ?????

Page 19

Strategy Confidence Scores (2/3)

• Idea: The system designer needs a confidence measure indicating how reliable the learned strategy is

• Define strategy confidence as a function of the likelihood of the user response in the given context

$\mathrm{conf}(\pi) = \frac{1}{N} \sum_{i=0}^{N} \frac{1}{N_i} \sum_{t=0}^{N_i} \mathrm{conf}(a_{u,i,t},\, a_{s,i,t},\, s_{i,t})$

$\mathrm{conf}(a_u, a_s, s) = p(a_u \mid a_s)\; p(a_s \mid s)$
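Assuming the reconstruction above, the score could be computed over a set of simulated dialogues roughly as follows (a sketch; the two probability models are passed in as callables):

```python
def act_confidence(p_user_given_system, p_system_given_state, a_u, a_s, s):
    # conf(a_u, a_s, s) = p(a_u | a_s) * p(a_s | s)
    return p_user_given_system(a_u, a_s) * p_system_given_state(a_s, s)

def strategy_confidence(dialogues, p_user_given_system, p_system_given_state):
    """Average per-turn confidence over N dialogues, each a list of (a_u, a_s, s) turns."""
    per_dialogue = []
    for turns in dialogues:
        confs = [act_confidence(p_user_given_system, p_system_given_state, a_u, a_s, s)
                 for a_u, a_s, s in turns]
        per_dialogue.append(sum(confs) / len(confs))
    return sum(per_dialogue) / len(per_dialogue)
```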

Page 20

Strategy Confidence Scores (3/3)

• The reliability score can be integrated into the learning process by weighting the reward

[Plot: reward against confidence, showing a spectrum of strategies with increasing reliability; a good strategy scores well on both performance and reliability]

Page 21

Error Generation (1/1)

• Idea: Produce acoustic-level output and optimize strategy for system-specific error conditions

[Diagram: an intention-level setup in which the Simulated User output passes through Simulated Errors to the Dialogue Manager, contrasted with an acoustic-level setup in which the Simulated User output is rendered by NLG and TTS and then recognised and parsed (HTK, HVS) before reaching the Dialogue Manager]

Combine high-level user model with statistical NLG and TTS for strategy learning

No need to simulate recognition and understanding errors
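For contrast, the kind of intention-level error channel that the acoustic-level setup would make unnecessary might look like this (a sketch only; the error rates, act format and value vocabulary are invented for illustration):

```python
import random

def corrupt_user_acts(user_acts, p_delete=0.05, p_substitute=0.10, vocabulary=None):
    """Illustrative error channel: randomly delete user acts or substitute slot
    values before they reach the dialogue manager."""
    vocabulary = vocabulary or ["london", "boston", "paris", "austin"]
    noisy = []
    for act_type, slot, value in user_acts:
        r = random.random()
        if r < p_delete:
            continue  # the act is lost entirely
        if r < p_delete + p_substitute:
            value = random.choice([v for v in vocabulary if v != value])
        noisy.append((act_type, slot, value))
    return noisy

print(corrupt_user_acts([("provide_info", "dest_city", "london")]))
```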

Page 22

User Studies (1/1)

• Evaluate performance of new user models using real users
  • Test simulation quality using listening tests
  • Test strategy performance using questionnaires
  • Test usefulness of reliability scores

Page 23

Summary

• Work on Evaluation (January to July 2005)
  • Experiments and Results

• Project Proposals (Summer 2005 to Summer 2007)
  • Introduction of strategy confidence scores
  • Agenda-based User Models
  • Strategy learning under system-specific error conditions
  • User studies

Page 24

Experiments

• Implemented a handcrafted DM and trained three different UMs
  • Bigram model: p(au|as)
  • Levin model: p(yes_answer|expl_conf)
  • Pietquin model: p(yes_answer|expl_conf, goal)

• Implemented Q-Learning DM, learned strategies with each UM and compared performance and characteristics

• Cross-model evaluation of strategies

• Investigated user-model independent techniques for testing learned strategies

Page 25

Phase II: New User Models (2/3)

• Idea 2: Use clustering to construct networks of user behaviour

• Motivation: Networks are well-suited for encoding dialogue context, but their manual construction is expensive

• Represent each dialogue as follows:

• We want to cluster user states, but the user state can never be fully observed or captured.

• However, we can cluster user actions and assume that similar actions imply similar contexts

[Diagram: a dialogue fragment sd1 → asi → au → sd2; after clustering, the user action au is replaced by the user-action cluster auc1]

Page 26

Phase II: New User Models (3/3)

• Idea 2, contd.: Overlay all dialogue sequences to obtain a network

• Use frequency counts to obtain transition probabilities (see the sketch below)

[Network diagram: system dialogue states sd1, sd2, sd3, sd4, system actions asi, asj, ash, and a clustered user action auc1]

Clustered User States represent common nodes in the network, similar to "choice points" used by Scheffler and Young

Transition probabilities can be derived from the number of dialogues that contain each path

System dialogue states are also clustered to pool training data
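A minimal sketch of the frequency-count step (node labels follow the figure; nothing below is from the original implementation):

```python
from collections import Counter, defaultdict

def estimate_transitions(dialogue_paths):
    """Estimate p(next_node | node) from how many dialogues traverse each edge.
    Each path is a list of node labels, e.g. ['sd1', 'asi', 'auc1', 'sd2']."""
    edge_counts = defaultdict(Counter)
    for path in dialogue_paths:
        for node, nxt in zip(path, path[1:]):
            edge_counts[node][nxt] += 1
    return {node: {nxt: c / sum(cnt.values()) for nxt, c in cnt.items()}
            for node, cnt in edge_counts.items()}

paths = [["sd1", "asi", "auc1", "sd2"], ["sd1", "asi", "auc1", "sd3"]]
print(estimate_transitions(paths)["auc1"])   # {'sd2': 0.5, 'sd3': 0.5}
```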

Page 27

Backup Slides for SIGdial paper

Page 28

Evaluation must cover two aspects

• Can the model produce human-like behaviour?
  • Does it produce user responses that a real user might have given in the same dialogue context?

• Can the model reproduce the variety of human behaviour?
  • Does it represent the whole user population?

(1) Need to compare real and simulated user responses!
(2) Need to compare real and simulated dialogue corpora!

Page 29

Simulated vs. real user responses

• Split the corpus into training and testing data

• Evaluate how well the model can predict the user responses in the test data
  • Feed in all information about dialogue history and user goal
  • Compare simulated user turn and real user turn
  • Use Precision and Recall to measure how closely the predicted turn matches the real user turn

(1)

Page 30

Use of Precision and Recall

• Evaluate turn by turn:
  • P = Correctly predicted actions / All predicted actions
  • R = Correctly predicted actions / All actions in real response

(1)

Dialogue in the test set:

Sys: greeting, instructions, request_info orig_city
Usr: unknown, provide_info orig_city london

Sys: implicit_conf orig_city london, request_info dest_city
Usr: no_answer, provide_info orig_city boston

Simulated user responses:

Usr: provide_info orig_city london (P=100%, R=50%)
Usr: yes_answer, provide_info dest_city paris (P=0%, R=0%)
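The turn-level computation can be written down directly; a small sketch, assuming each turn is represented as a set of act strings:

```python
def precision_recall(predicted_acts, real_acts):
    """Turn-level precision and recall between simulated and real user actions."""
    correct = len(set(predicted_acts) & set(real_acts))
    precision = correct / len(predicted_acts) if predicted_acts else 0.0
    recall = correct / len(real_acts) if real_acts else 0.0
    return precision, recall

# First turn of the example above:
real = ["unknown", "provide_info orig_city london"]
simulated = ["provide_info orig_city london"]
print(precision_recall(simulated, real))   # (1.0, 0.5)
```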

Page 31

Results: Precision and Recall

• Precision and Recall

  Model      Precision   Recall
  Bigram     17.83       21.66
  Levin      37.98       31.57
  Pietquin   40.16       33.38

• What do the results mean?
• Is this analysis sufficient?

(1)

Page 32

Simulated vs. real corpora

• We need to evaluate if the model can reproduce the variety of user behaviour in the training data
  • Generate a whole corpus through interaction between the sim. user and the DM
  • Use statistical metrics to compare the simulated corpus to the real one

(2)

[Diagram: Simulated User and Dialogue Manager (with its Strategy) interact to generate a simulated corpus]

Page 33

Statistical metrics

• High-level dialogue features (see the sketch after this list)
  • Dialogue length (in number of turns)
  • Turn length (in number of actions)
  • Proportion of user vs. system talk

• Dialogue Style and Cooperativeness
  • Frequency of different user and system speech acts (average number of occurrences per dialogue)
  • Proportion of goal-directed actions vs. grounding actions vs. dialogue formalities vs. unrecognised actions
  • Number of times information is requested, provided, re-requested, re-provided

• Dialogue Success and Efficiency
  • Average goal / subgoal achievement rate
  • Goal completion time
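A sketch of how the first group of features could be computed from an annotated corpus (the dialogue representation is an assumption made for illustration):

```python
def dialogue_features(dialogue):
    """Compute high-level features for one dialogue, given as (speaker, [acts]) turns."""
    n_turns = len(dialogue)
    acts_per_turn = [len(acts) for _, acts in dialogue]
    total_acts = sum(acts_per_turn)
    user_acts = sum(len(acts) for speaker, acts in dialogue if speaker == "user")
    return {
        "dialogue_length_turns": n_turns,
        "mean_turn_length_acts": total_acts / n_turns if n_turns else 0.0,
        "proportion_user_talk": user_acts / total_acts if total_acts else 0.0,
    }

example = [("system", ["greeting", "request_info orig_city"]),
           ("user", ["provide_info orig_city london"])]
print(dialogue_features(example))
```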

Page 34

Results: Goal completion rates / times

• Goal completion rates and times

[Bar chart: percentage of goals completed (0 to 100%) and goal completion time in turns (0 to 35) for ATT, BBN, CMU, SRI, ALL, BIG, LEV, PTQ]

Page 35

Project overview

• Phase I
  • Evaluation of the current state of the art
  • Re-assessment of standard evaluation practices
  • Introduction of strategy confidence scores

• Phase II
  • Development of new user models
  • Separation of user and error model

• Phase III
  • Acoustic-level simulation
  • Strategy learning under system-specific error conditions

• Phase IV
  • User studies

Work completed

Page 36

Phase I: Results (1/5)

• Simple statistical metrics can distinguish simulated from real dialogue data

[Bar chart: task length (left axis, 0 to 60) and turn length (right axis, 0 to 2.5) for ATT, BBN, CMU, SRI, ALL, BIG, LEV, PTQ]

Page 37

Motivation

• Lack of a solid user model is currently a major roadblock to automatic DM design

• Lack of rigorous evaluation standards has led to uncertainty about the validity of simulation-based learning

• Goal is to develop user and error modelling techniques that enable us to learn strategies which outperform competing handcrafted strategies when tested on human users

Page 38

Backup slides for simulation techniques

Page 39

User Models (Backup Slide)

• Bigram model: p(au|as)
• Levin model: p(yes_answer|expl_conf)
• Pietquin model: p(yes_answer|expl_conf, goal)

Page 40

Overview of simulation techniques

• User simulation for strategy learning is a young field of research:
  • Levin, Pieraccini, Eckert (1997, 1998, 2000)
  • Lin and Lee (2000)
  • Scheffler and Young (1999, 2000, 2001, 2002)
  • Pietquin (2002, 2004)
  • Henderson, Georgila, Lemon (2005)

• Closely related work on user simulation for SDS evaluation:
  • Lopez-Cozar et al. (2003)
  • Araki et al. (1997, 1998)

Page 41

Levin, Pieraccini, Eckert (1997, 1998)

• Simulation at the intention level rather than the word or acoustic level

• N-gram model for predicting the next user intention: $\hat{u}_t = \arg\max_{u_t} P(u_t \mid s_t)$

• Simulated user responses often unrealistic and inconsistent

[Diagram: the Dialogue Manager sends CONFIRM_DATE to the Simulated User, which replies REJECT]

System: What is your departure city?
User: New York
System: What is your destination?
User: New York

Page 42

Different approaches to user simulation

[Chart: approaches to user simulation arranged along two dimensions, probabilistic vs. deterministic and hand-crafted vs. data-driven]

L97 = Levin, Eckert, Pieraccini (1997, 1998)
A = Araki et al. (1997, 1998)
LL = Lin and Lee (2000, 2001)
SY = Scheffler and Young (1999, 2000, 2001, 2002)
LC = Lopez-Cozar et al. (2003)
PR = Pietquin and Renals (2002, 2004)
L00 = Levin, Eckert, Pieraccini (2000)

Page 43

Levin, Pieraccini, Eckert (2000)

• Attempt to account for weaknesses of the n-gram model

• Assume a simple dialogue model and hand-select appropriate probabilities for predicting user responses

[Diagram: dialogue model with stages Greeting, Constraining / Relaxing questions, Present result; example probabilities: P(n | greeting), P(provide DATE | constrain DESTINATION), P(yes | relax AIRLINE)]

User responses still not goal-consistent!

Page 44

Scheffler and Young (1999 - 2002)

• User model includes user goal and user's beliefs on current system status

• User acts according to the given goal until it is completed

• Frequencies of different goals are estimated from corpus

  Goal field   Value             Status
  Type         GET_FILM_LIST     Specified
  Film         NA                NA
  Cinema       ARTS_PICT_HOUSE   Pending
  Day          TODAY             Pending

Page 45

Scheffler and Young (1999 - 2002)

• Utterance generation lattices, obtained by analysing possible dialogue paths in an existing prototype system

[Lattice fragment: choice points 'specify type or mumble?' and 'which type?' with outcomes FILMLIST, FILMTIME and MUMBLE; probabilistic and deterministic choice points are distinguished]

Heavily task-dependent approach!

Page 46

Pietquin (2002, 2004)

• Pietquin combines ideas from Scheffler's and Levin's work

• Probabilities are conditioned on the user's goal and memory
  • P(n | greeting, goal)
  • P(provide RAM | constrain HDD, goal, memory)
  • P(yes | relax RAM, goal)
  • P(close | asked for SPEED, goal, memory)

Goal and memory representation:

  Attribute   Value     Priority   Count
  PROCESSOR   Pentium   High       0
  SPEED       800       High       0
  RAM         256       Low        0
  HDD         60        Low        0

Page 47

Graph-based DM

[Diagram: graph-based dialogue manager beginning with a Greeting node and branching to Flight or Hotel]

Page 48

SDS Overview

[Diagram: SDS pipeline of Speech Recognition, Semantic Parsing, Dialogue Manager, Language Generation and Speech Synthesis, converting between words and intentions on both input and output. Example: the user says 'I want to fly to London!', parsed as prov_info dest_city london; the DM replies requ_info depart_date, rendered as 'And when do you want to go?']

Page 49

Dialogue as a Markov Decision Process

• Describe dialogue in terms of states and actions
• View DM strategy as a mapping from states to actions

[Diagram: states s1, s2, s3 connected by actions a1 and a2]

Dialogue State: orig_city confirmed, dest_city known, depart_date unknown, depart_time unknown

System Action: <impl_conf, dest_city, london>, <requ_info, depart_date>
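A toy rendering of this view (not the actual state space used in the experiments): the state records per-slot status, and a strategy is simply a lookup from such states to system actions:

```python
# Toy dialogue state: per-slot status, as in the example on this slide.
state = {"orig_city": "confirmed", "dest_city": "known",
         "depart_date": "unknown", "depart_time": "unknown"}

def state_key(state):
    # Make the state hashable so it can index a strategy table.
    return tuple(sorted(state.items()))

# A strategy maps states to (possibly multiple) system actions.
strategy = {
    state_key(state): [("impl_conf", "dest_city", "london"),
                       ("requ_info", "depart_date")],
}
print(strategy[state_key(state)])
```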

Page 50

Reinforcement Learning (1/2)

• The learning DM explores its environment through trial-and-error and receives a reward r_t at each time t.

[Diagram: transitions s1 → s2 → s3 via actions a1 and a2, with step rewards of -1 and a final reward of +20]

• Aim is to maximise the cumulative discounted reward over time:

$r(t) = r_{t+1} + \gamma\, r_{t+2} + \gamma^2\, r_{t+3} + \cdots$
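The return can be computed directly from a reward sequence; a small sketch using the rewards from the diagram and an assumed discount factor of 0.95:

```python
def discounted_return(rewards, gamma=0.95):
    """Cumulative discounted reward r(t) = r_{t+1} + gamma*r_{t+2} + gamma^2*r_{t+3} + ..."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# -1 per intermediate turn, +20 for reaching the final state.
print(discounted_return([-1, -1, 20]))   # approximately 16.1
```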

Page 51

Reinforcement Learning (2/2)

• Estimate the value of taking action a in state s:

$Q^{\pi}(s,a) = E_{\pi}\!\left( r(t) \mid s_t = s,\, a_t = a \right)$

• Define the optimal policy π*:

$\pi^{*}(s) = \arg\max_a Q^{*}(s,a)$

[Table: illustrative Q-values Q(s,a), with states s1, s2, s3, ... as columns and actions a1, a2, a3, ... as rows]

Page 52

Q-Learning (Backup Slide)

• The Q-learning update rule:

$Q(s,a) := (1-\alpha)\, Q(s,a) + \alpha \left( r + \gamma \max_{a'} Q(s',a') \right)$

where α is the learning rate, Q(s,a) the old Q-value, r the reward, γ the discounting factor, and $\max_{a'} Q(s',a')$ the maximum reward obtainable in the next state.
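A minimal tabular Q-learning sketch built around this update rule (the environment interface, epsilon-greedy exploration and parameter values are assumptions, not the setup used in the experiments):

```python
import random
from collections import defaultdict

def q_learning_episode(env, Q, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Run one episode of tabular Q-learning. `env` is assumed to provide
    reset() -> state, actions(state) -> list, and
    step(state, action) -> (next_state, reward, done)."""
    state = env.reset()
    done = False
    while not done:
        actions = env.actions(state)
        # epsilon-greedy: occasionally deviate from the current best-known strategy
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(state, action)
        if done:
            best_next = 0.0
        else:
            best_next = max((Q[(next_state, a)] for a in env.actions(next_state)),
                            default=0.0)
        # Q(s,a) := (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s', a'))
        Q[(state, action)] = (1 - alpha) * Q[(state, action)] + \
                             alpha * (reward + gamma * best_next)
        state = next_state
    return Q

Q = defaultdict(float)   # unseen state-action pairs default to a Q-value of 0
```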