A stochastic model of human-machine interaction for learning dialog ...



<ul><li><p>IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 8, NO. 1, JANUARY 2000 11</p><p>A Stochastic Model of Human-Machine Interaction for Learning Dialog Strategies</p><p>Esther Levin, Member, IEEE, Roberto Pieraccini, Member, IEEE, and Wieland Eckert, Member, IEEE</p><p>Abstract: In this paper, we propose a quantitative model for dialog systems that can be used for learning the dialog strategy. We claim that the problem of dialog design can be formalized as an optimization problem with an objective function reflecting different dialog dimensions relevant for a given application. We also show that any dialog system can be formally described as a sequential decision process in terms of its state space, action set, and strategy. With additional assumptions about the state transition probabilities and cost assignment, a dialog system can be mapped to a stochastic model known as a Markov decision process (MDP). A variety of data-driven algorithms for finding the optimal strategy (i.e., the one that optimizes the criterion) is available within the MDP framework, based on reinforcement learning. For an effective use of the available training data we propose a combination of supervised and reinforcement learning: the supervised learning is used to estimate a model of the user, i.e., the MDP parameters that quantify the user's behavior. Then a reinforcement learning algorithm is used to estimate the optimal strategy while the system interacts with the simulated user. This approach is tested for learning the strategy in an air travel information system (ATIS) task. 
The experimental results we present in this paper show that it is indeed possible to find a simple criterion, a state space representation, and a simulated user parameterization in order to automatically learn a relatively complex dialog behavior, similar to one that was heuristically designed by several research groups.</p><p>Index Terms: Dialog systems, Markov decision process, reinforcement learning, sequential decision process, speech, spoken language systems.</p><p>I. INTRODUCTION</p><p>HUMAN-MACHINE interactions are ubiquitous in today's world. We are not surprised anymore when we interact and struggle with menu-driven touch tone or voice telephone systems, when we engage in an interaction with our PC through dialog boxes, or make a transaction by interacting with ATM machines or Web browsers. New applications such as those in [7] and [32] approach the frontier of natural language dialog to automate services that are currently performed by human operators. In the natural language and artificial intelligence (AI)</p><p>Manuscript received December 23, 1998; revised August 3, 1999. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. James R. Glass.</p><p>E. Levin is with AT&amp;T Laboratories-Research, Shannon Laboratories, Florham Park, NJ 07932 USA (e-mail: esther@research.att.com).</p><p>R. Pieraccini was with AT&amp;T Laboratories-Research, Shannon Laboratories, Florham Park, NJ 07932 USA. He is now with SpeechWorks International, Inc., New York, NY 10004 USA (e-mail: roberto.pieraccini@speechworks.com).</p><p>W. Eckert was with AT&amp;T Laboratories-Research, Shannon Laboratories, Florham Park, NJ 07932 USA. 
He is now with Lucent Technologies, Nuernberg, Germany (e-mail: weckert@lucent.com).</p><p>Publisher Item Identifier S 1063-6676(00)00296-0.</p><p>research communities there are more complex examples of dialog systems that make use of natural ways of communication like speech and language [3]–[5], [12]–[14], [19], [23], [24]. All these are examples of man-machine dialog systems that, notwithstanding their preferred way of interaction with the human (touch tone, keyboard, natural language, etc.), share a common principle: a strategy aimed at the satisfaction of the goal that justifies their existence.</p><p>Since there are no commonly accepted methodologies for building a dialog system, today dialog design is more art than engineering or science. In general, a dialog system consists of a more or less structured program, based on reasonable rules dictated both by the knowledge of the application and by continuous experimentation with real users. Sometimes the knowledge of the application takes the form of a plan [19], [33] that decomposes the original goal of the dialog system into a hierarchy of subgoals. However, the design principles are purely based on heuristics and trial-and-error. The problem with such a procedure is that generally it is hard to predict all the possible situations the system might encounter in real scenarios, hence a system can reach good performance only at the expense of an extended experimentation. Moreover, when a new application is developed, often the whole design process has to be started from scratch.</p><p>The research community in the field of spoken language systems is well aware of the benefits derived by the introduction of well-founded mathematical problem formalization. For instance, the tremendous progress made in speech recognition during the last decade is mainly due to the introduction and use of stochastic modeling of both acoustic [30] and linguistic phenomena [16]. 
Also, recent advances in language understanding for spontaneous speech are due to the introduction of different stochastic models [11], [14], [15], [21], [26]. The strength of the stochastic approach is in the fact that it naturally takes advantage of large amounts of data. In fact no one doubts the importance of data corpora in the spoken language community ("No data like more data"<sup>1</sup> has been the speech recognition motto in the past ten years). Data is used for estimating model parameters in order to optimize the performance of the system. Usually the more data the system is trained with, the higher the resulting performance on new data is. Moreover, since the training is done automatically and requires little or no supervision, new applications can be developed with the only cost of collecting new data, if at all needed.</p><p>These advantages of the data-driven approach motivated the dialog community to start looking at it also for dialog design.</p><p><sup>1</sup>This sentence has been attributed to Bob Mercer, former researcher of the IBM speech group.</p><p>1063-6676/00$10.00 © 2000 IEEE</p></li><li><p>However it is not clear yet how to use data for that purpose. Of course, dialog designers do look at data in order to build the rules that guide the dialog, and later to improve its performance. But designers can analyze only a relatively small quantity of data, and there is not a clear understanding of which measures can be derived automatically from the corpus in order to automate the design procedure.</p><p>The main problem of supervised learning of dialog from corpus originates from the dynamic nature of dialog itself. 
Not only is it not clear how to learn a dialog strategy, but also a dialog system cannot be evaluated against a fixed corpus: if at any point during the evaluation the system deviates from the dialog in the corpus (for example because of an error or a change in the dialog design), the dialog will take a different course that cannot be predicted. In fact, even if we manage to learn a dialog strategy that approximates the one in the corpus with high accuracy and only few deviations, as a result of these deviations the dialog system can get into an unexplored portion of the state space that was not represented in the corpus. Moreover, the dialog strategies adopted by humans are not necessarily those best suited for automatic systems.</p><p>In this paper, we propose to use a combination of supervised and reinforcement learning. The dialog system will learn an optimal strategy using reinforcement learning while interacting with users. Reinforcement learning allows one to actively explore the state space of the system, while using delayed feedback to update the values of all states traversed in the current interaction. Although it is possible to use reinforcement learning with real users, it is often impractical. Here we propose to use supervised learning from a corpus of dialogs in order to estimate the parameters of a simulated user that is a stochastic model used in a generative mode. The simulated user then interacts with the dialog system while it uses reinforcement learning.</p><p>II. DIALOGUE DESIGN AS AN OPTIMIZATION PROBLEM</p><p>In general, a dialog system is a machine that tries to achieve an application goal in an efficient way through a series of interactions with the user. 
By quantifying the concepts of achievement of an application goal and efficiency, we can state the problem of dialog design as optimization of an objective $C$:</p><p>$C = \sum_i W_i \langle C_i \rangle$ (1)</p><p>where the terms $\langle C_i \rangle$ are the expected costs for different dialog dimensions reflecting the distance to the achievement of the application goal and the efficiency of the interaction, and the weights $W_i$ determine the tradeoff among the costs. Some of these dimensions can be measured directly by the system, like dialog duration, cost of internal processing, cost of accessing external databases or other resources, and cost of ineffectiveness (e.g., number of errors the system made due to poor speech recognition); others quantify such abstract dimensions as user satisfaction (e.g., a simple happy/not happy-with-the-system feedback from the user at the end of dialog, number of users hanging up before the completion of the dialog goal, etc.). The work described in [9] relates user satisfaction to a linear combination of directly measurable quantities. Using the optimality criterion expressed by (1) we can evaluate system performance simply by running several dialog sessions and computing the average cost. Since the actions taken by the system may affect some or all the terms of the objective function, the optimal dialog strategy is a result of a correct tradeoff between them, as illustrated by the examples in this paper. The problem of dialog design that we address in this paper is that of finding the optimal strategy automatically.</p><p>Recently, several data-driven optimization approaches were attempted toward dialog design. For example, in [35], clarification questions are generated based on task description, avoiding the situation of asking the user a question that does not contribute to current information exchange. In [36], the next clarification attribute is computed from the retrieved data in order to minimize the expected number of clarification interactions. 
In both cases, a local optimization is performed in the particular state of the dialog where clarification is needed. In the approach proposed in this paper, the optimization is global over the whole dialog session.</p><p>We will illustrate the concepts introduced in this paper with a tutorial example of a Day-and-Month Dialog, where the goal of the system is to get the correct day and month values from the user through the shortest possible interaction.</p><p>From the definition of the application given above, it is straightforward to describe the objective function as the sum of three terms:</p><p>$C = W_i \langle N_i \rangle + W_e \langle N_e \rangle + W_f \langle N_f \rangle$ (2)</p><p>The first term is the expected duration of the dialog ($N_i$ being the number of interactions). The second term corresponds to the expected number of errors $N_e$ in the obtained values (ranging from zero to two); and the third measures the expected distance $N_f$ from achieving our application goal (this distance is zero for a complete date, one if either day or month value is missing, and two if both are incomplete).</p><p>We formalize a dialog system by describing it as a sequential decision process in terms of its action set, state space, and strategy.</p><p>The action set of the dialog system includes all possible actions it can perform, such as interactions with the user (e.g., asking the user for input, providing a user some output, confirmations, etc.), interactions with other external resources (e.g., querying a database), and internal processing. For example, the action set in our tutorial dialog system includes the following four actions:</p><p>• a question asking for the value of the day ($A_D$);</p><p>• a question asking for the value of the month ($A_M$);</p><p>• a more open-ended question asking for the value of the date (day and month) ($A_{DM}$);</p><p>• a final action, closing the dialog and submitting the form ($A_F$).</p><p>When executing actions $A_D$, $A_M$, and $A_{DM}$, the system first asks the corresponding question, and then activates a speech recognition system to obtain the user's answer. 
Of course actions can be defined at different levels of granularity. For example the actions $A_D$, $A_M$, and $A_{DM}$ can be broken into separate lower level actions, including asking the question, activating the</p></li><li><p>LEVIN et al.: STOCHASTIC MODEL OF HUMAN-MACHINE INTERACTION 13</p><p>Fig. 1. Description of a sequential decision process.</p><p>speech recognizer with the appropriate grammar, getting input from the speech recognizer, and parsing it. In this example, we choose to group these elemental actions into a single higher level action because the order in which they have to be executed is clear and fixed (i.e., first ask, then activate the recognizer, etc.). It is up to the designer to choose the actions and their level of granularity.</p><p>The state of a dialog system includes the values of all the relevant internal variables that determine the next action that the system executes. Given the same external conditions (i.e., user responses, database results, etc.) the system's future behavior is uniquely determined by the current state. The state space can be finite or infinite, and it includes special initial ($S_I$) and final ($S_F$) states. For our simple tutorial example, the state description has to include at least two integer variables that describe the day ($d$) and the month ($m$), whose values can be either zero (i.e., the corresponding variable is unassigned), or assigned to the answer provided by the user. For this state representation we have a total of 411 possible states, including one initial state ($d = 0$, $m = 0$), 12 states for which the month variable is assigned and the day is not ($d = 0$, $m = 1, \ldots, 12$), 31 states in which only the day variable is assigned ($d = 1, \ldots, 31$, $m = 0$), 366 states with complete dates, and a special final state ($S_F$). Again, it is up to the system designer to decide which variables should be included in the state. 
For example, in a more sophisticated system, we could have included in the state some information about past values of day and month variables obtained during the same interaction. As we will see in the next section, sometimes it is necessary to expand the state description with additional variables in order to be able to model the system as a Markov decision process.</p><p>When an action is taken in a given state, the system state changes. For the day-and-month example, when the system is in an initial state $S_I$ and it asks the user for the month (action $A_M$) the next state depends on the actual answer of the user as well as on the speech recognition performance, and it can be any one among the 12 states ($d = 0$, $m = 1, \ldots, 12$) in which only the month is filled, assuming that the grammar of the speech recognizer is constrained to recognize only months at this stage of the dialog. We will discuss more about state transitions in the section about Markov decision process (MDP).</p><p>A dialog session corresponds to a p...</p></li></ul>
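
The Day-and-Month example can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the action names, the `Session` record, the `(-1, -1)` sentinel for the final state, and the default weights are all assumptions made here. The sketch enumerates the state space described in the text (one initial state, 12 month-only states, 31 day-only states, 366 complete dates, one final state) and estimates the three-term objective as a weighted sum of per-session averages of the number of interactions, the number of errors, and the distance from the goal.

```python
from dataclasses import dataclass
from statistics import mean

# Days per month in a leap year, so that complete dates total 366.
DAYS_IN_MONTH = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# Hypothetical names for the four actions of the tutorial system.
ACTIONS = ("ask_day", "ask_month", "ask_date", "close")

# Sentinel for the special final state; this encoding is an assumption.
FINAL_STATE = (-1, -1)


def enumerate_states():
    """Return all (day, month) states; 0 means 'unassigned'."""
    states = {(0, 0)}                                   # initial state
    states |= {(0, m) for m in range(1, 13)}            # month only
    states |= {(d, 0) for d in range(1, 32)}            # day only
    states |= {(d, m)                                   # complete dates
               for m in range(1, 13)
               for d in range(1, DAYS_IN_MONTH[m - 1] + 1)}
    states.add(FINAL_STATE)
    return states


@dataclass
class Session:
    """Outcome of one dialog session (hypothetical record)."""
    interactions: int  # number of interactions in the dialog
    errors: int        # errors in the obtained values, 0 to 2
    missing: int       # distance from the goal, 0 to 2


def objective(sessions, w_i=1.0, w_e=1.0, w_f=1.0):
    """Weighted sum of the three expected costs, estimated by averaging."""
    return (w_i * mean(s.interactions for s in sessions)
            + w_e * mean(s.errors for s in sessions)
            + w_f * mean(s.missing for s in sessions))
```

For instance, `len(enumerate_states())` reproduces the total of 411 states, and `objective` can be called on a list of sessions logged from simulated dialogs to compare two candidate strategies under the same weights.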

