An Assessment Of

Embed Size (px)

Citation preview

  • 8/11/2019 An Assessment Of

    1/27

    This article was downloaded by: [129.110.5.107] On: 08 August 2014, At: 13:47

    Publisher: Institute for Operations Research and the Management Sciences (INFORMS)

    INFORMS is located in Maryland, USA

    Marketing Science

    Publication details, including instructions for authors and subscription information:

    http://pubsonline.informs.org

    Invited PaperLearning Models: An Assessment ofProgress, Challenges, and New Developments

    Andrew T. Ching, Tlin Erdem, Michael P. Keane

    To cite this article:Andrew T. Ching, Tlin Erdem, Michael P. Keane (2013) Invited PaperLearning Models: An Assessment of Progress, Challeng

    and New Developments. Marketing Science 32(6):913-938. http://dx.doi.org/10.1287/mksc.2013.0805

    Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions

    This article may be used only for the purposes of research, teaching, and/or private study. Commercial useor systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisherapproval, unless otherwise noted. For more information, contact [email protected].

    The Publisher does not warrant or guarantee the articles accuracy, completeness, merchantability, fitnessfor a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, orinclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, orsupport of claims made of that product, publication, or service.

    Copyright 2013, INFORMS

    Please scroll down for articleit is on subsequent pages

    INFORMS is the largest professional society in the world for professionals in the fields of operations research, managementscience, and analytics.

    For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org

    http://pubsonline.informs.org/page/terms-and-conditionshttp://www.informs.org/http://pubsonline.informs.org/page/terms-and-conditionshttp://dx.doi.org/10.1287/mksc.2013.0805http://pubsonline.informs.org/
  • 8/11/2019 An Assessment Of

    2/27

    Vol. 32, No. 6, NovemberDecember 2013, pp. 913938ISSN 0732-2399 (print) ISSN 1526-548X (online) http://dx.doi.org/10.1287/mksc.2013.0805

    2013 INFORMS

    Invited Paper

    Learning Models: An Assessment of Progress,Challenges, and New Developments

    Andrew T. ChingRotman School of Management, University of Toronto, Toronto, Ontario M5S 3E6, Canada,

    [email protected]

    Tlin ErdemStern School of Business, New York University, New York, New York 10012,

    [email protected]

    Michael P. KeaneNuffield College, University of Oxford, Oxford OX1 1NF, United Kingdom,

    [email protected]

    Learning models extend the traditional discrete choice framework by postulating that consumers have incom-plete information about product attributes and that they learn about these attributes over time. In thissurvey we describe the literature on learning models that has developed over the past 20 years, using the modelof Erdem and Keane as a unifying framework [Erdem T, Keane M (1996) Decision-making under uncertainty:Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Sci. 15(1):120].We describe how subsequent work has extended their modeling framework and applied learning models toa wide range of different products and markets. We argue that learning models have contributed greatly toour understanding of consumer behaviorin particular, in enhancing our understanding of brand loyalty andlong-run advertising effects. We also discuss the limitations of existing learning models and potential extensions.One key challenge is to disentangle learning as a source of dynamics from other key mechanisms that maygenerate choice dynamics (inventories, habit persistence, etc.). Another is to enhance identification of learningmodels by collecting and using direct measures of signals, perceptions, and expectations.

    Key words : learning models; choice modeling; dynamic programming; structural models; brand equityHistory : Received: August 11, 2011; accepted: June 6, 2013; Preyas Desai served as editor-in-chief and Eric

    Bradlow served as associate editor for this article. Published online inArticles in AdvanceSeptember 5, 2013.

    1. IntroductionIn the field of discrete choice, the most widely usedmodels are the multinomial logit and probit.1 Ofcourse, there has been substantial effort over thepast 20 years to generalize these workhorse modelsto allow richer structures of consumer taste het-erogeneity, serial correlation in preferences, dynam-ics, endogenous regressors, etc. However, with few

    exceptions, work within the traditional random utilityframework maintains the strong assumption that con-sumers know the attributes of their choice options perfectly .

    Learning models extend the traditional discretechoice framework by postulating that consumers

    1 Until recently, the logit was much more popular than probit,largely because of its computational advantages. But advancesin simulation methods, such as the GHK algorithm and Gibbssampling (see Geweke and Keane 2001, McCulloch et al. 2000),have greatly increased the popularity of probit, particularly amongBayesians.

    may have incomplete information about productattributes. Thus, they make choices based on perceivedattributes. Over time, consumers receive informationsignals that enable them to learn more about prod-ucts. It is this inherent temporal aspect of learningmodels that distinguishes them from static choiceunder uncertainty models.

    Within this general framework, different types of

    learning models can be distinguished along four keydimensions. One is whether consumers behave ina forward-looking manner. If attributes are uncer-tain and consumers are myopic, they choose thealternative with the highest expected current util-ity. But forward-looking consumers may (i) maketrial purchases to enhance their information sets or(ii) actively search for information about productsusing a variety of other sources of information.

    A second key distinction is whether utility is lin-ear in attributes or whether consumers exhibit risk

    913

  • 8/11/2019 An Assessment Of

    3/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New Developments914 Marketing Science 32(6), pp. 913938, 2013 INFORMS

    aversion. In the linear case, forward-looking con-sumers are willing to pay a premium for unfamiliarproducts, as they receive not only the expected utilityof consumption but also the value of the informationacquired by trial.2 But with risk aversion, consumersare willing to pay a premium for a more familiar

    product. This can generate brand equity for well-known brands.A third distinction involves sources of information.

    In the simplest learning models, trial is the onlyinformation source. In more sophisticated models,consumers can learn from a range of sources, suchas advertising, word of mouth, price signals, sales-people, product ratings, social networks, and newspa-pers. A consumer must decide how much to use eachavailable source. In particular, consumers may engagein passive search, using only information sources thatarrive exogenously, or active search, exerting effort togather information. They may also do both.

    A fourth distinction is how consumers learn.They may be Bayesians, or they may update percep-tions in some other way. For instance, consumers mayover- or underweight new information relative to anoptimal Bayesian rule, or they may forget informationthat was received too far in the past.

    Learning models were first applied to marketingproblems in pioneering work by Roberts andUrban (1988) and Eckstein et al. (1988).3 Because oftechnical limitations of the time (both in computerspeed and estimation algorithms), however, theirmodels had to be quite simple. Roberts and Urban(1988) study how Bayesian consumers learn about a

    new product from word-of-mouth signals. Consumersin their model are risk averse but myopic, so thereis no active search. In contrast, in Eckstein et al.(1988), consumers are forward-looking, and trial pur-chase is the (only) source of information. Yet utility islinear, so their model exhibits the value of informa-tion phenomenon but not the brand equity phe-nomenon created by risk aversion. For Roberts andUrban (1988), the converse is true. The strong sim-plifying assumptions of these early models, plus thedifficulty of estimating even simple learning models20 year ago, most likely explains why no furtherlearning models appeared in the literature for several

    years after 1988.

    2 Thus, learning models play havoc with traditional welfare analy-sis, as consumer surplus is no longer the area under the demandcurve. Parameters of the demand curve are no longer structuralparameters of preferences but depend on the information set(e.g., they can be shifted by advertising). See Erdem et al. (2008)for a discussion.3 Pioneering papers that first applied learning models in laboreconomics were Jovanovic (1979) and Miller (1984). Both were con-cerned with workers and firms learning about the quality of jobmatches.

    The paper by Erdem and Keane (1996) repre-sents a significant methodological advance because itgreatly expanded the class of learning models thatare feasible to estimate.4 Their approach could handleforward-looking consumers, risk aversion, multipleinformation sources, and both active and passive

    search in one model. Thus, their model exhibits boththe value of information and brand loyalty phenom-ena. In their empirical application, consumers wereuncertain about the quality of brands and learnedthrough use experience (active learning) and exoge-nously arriving advertising signals (passive learning).

    Erdem and Keane (1996) assume that consumersprocessed quality signals as Bayesians. This assump-tion imposes a very special structure on how pastchoice history affects current choice probabilities.5

    A striking result in their paper is that this structurallymotivated functional form for choice probabilitiesactually fits the data better than commonly used

    reduced-form specifications, such as Guadagni andLittles (1983) exponential weighted average of pastpurchases specification (the so-called loyalty vari-able).6 Erdem and Keane (1996) also find strong evi-dence that advertising has important long-run effectson demand (via the total stock of advertising) but thatshort-run effects of recent advertising are negligible.

    The Erdem and Keane (1996) paper is influen-tial because (1) it provides a practical method forestimating complex learning models; (2) it shows that,far from imposing a straitjacket on the data, theBayesian learning structure led to insights about thefunctional form for state dependence that improved

    model fit; (3) it generates interesting results aboutlong- versus short-run effects of advertising; and (4) itgives an economic rationale for the brand loyaltyobserved in scanner panel data.7 These results gener-ated new interest in structural learning models (and

    4 Their computational approach involves (1) using the method ofKeane and Wolpin (1994) to obtain a fast but accurate approximatesolution to the dynamic optimization problem of forward-lookingagents and (2) using (then) recently developed simulation estima-tion methods (see, e.g., Keane 1994) to approximate the likelihoodfunction.5 Specifically, only the total number of use experiences should mat-ter, not their timing. The same is true for ad exposures. This struc-

    ture changes with forgetting, an issue we will discuss in 3.1.1.6 In most applications, imposing structure involves sacrificing fitto some extent (i.e., not surprisingly, structural models usually fitworse than flexible reduced-form or statistical/descriptive models).The payoff of imposing the structure is (1) greater interpretabilityof parameter estimates and (2) the ability to do policy experiments.Erdem and Keane (1996) is a rare instance where a structural modelfit better than popular competing reduced-form models.7 A key insight of Erdem and Keane (1996) is that uncertaintyabout quality combined with risk aversion could lead to brand-loyal behavior (i.e., persistence in brand choice over time). Loy-alty emerges when consumers stick with familiar products (whoseattributes are precisely known) to avoid risk. Given equal prices,

  • 8/11/2019 An Assessment Of

    4/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New DevelopmentsMarketing Science 32(6), pp. 913938, 2013 INFORMS 915

    dynamic structural models more generally) within thefields of marketing and economics.

    Nevertheless, there was a time lag of roughly fiveyears from Erdem and Keane (1996) to the publica-tion of many additional papers on learning models.But, starting in the early 2000s, there has been an

    explosion of new work in marketing and economicsapplying learning models to brand choice and manyother problems. Other interesting applications include(i) demand for new products, (ii) choice of TV showsand movies, (iii) prescription drugs, (iv) durablegoods, (v) insurance products, (vi) choice of tar-iffs (i.e., price/usage plans), (vii) fishing locations,(viii) career options, (ix) service quality, (x) child-care options, and (xi) medical procedures. Some ofthese applications are based rather closely on theErdem and Keane framework with forward-lookingBayesian consumers, whereas other papers departfrom or extend that framework in important ways

    (often along one of the four key dimensions notedabove).The outline of the survey is as follows: In 2 we

    describe the learning model of Erdem and Keane(1996) in detail. We treat their model as a unifyingframework to discuss the rest of the literature. In gen-eral, later developments can be viewed as extendingthe ErdemKeane model along certain dimensions(while typically restricting it on others to make thoseextensions feasible) or applying it in different con-texts. Section 3 reviews the subsequent literature onlearning models. It is divided in subsections thatcover (i) more sophisticated learning models with

    myopic consumers, (ii) more sophisticated learningmodels with forward-looking consumers, (iii) mod-els for product-level/market share data, and (iv) newor novel applications of learning models. Section 4describes what we consider the key challenges forfuture research. In 5 we summarize and conclude.

    2. The General Structure of theErdem and Keane (1996) Model

    As noted in 1, the papers by Roberts and Urban(1988) and Eckstein et al. (1988) are the first appli-cations of learning models to marketing problems.The model of Erdem and Keane (1996; henceforthreferred to as EK) nests those models in a more gen-eral framework. In this section we describe the EK

    a familiar brand may be chosen over a less familiar brand even ifit has lower expected quality, provided consumers are sufficientlyrisk averse. In this framework, loyalty is the price premium thatconsumers are willing to pay for greater familiarity (lower risk).Keller (2002) refers to the general framework laid out in Erdemand Keane (and elucidated further in Erdem and Swait 1998) as theprinciple economicmodel of brand equity. There are also a numberof psychology-based models (see Keller 2002 for an overview).

    model in some detail. Readers interested in additionalinformation about the earlier models can refer to adetailed description in Online Appendix A (avail-able as supplemental material at http://dx.doi.org/10.1287/mksc.2013.0805).

    2.1. A Simple Dynamic Learning Model withGains from Trial Information

    The key feature of learning models is that con-sumers do not know the attributes of brands with cer-tainty. Although this may be true of many attributes,most papers, including Erdem and Keane (1996),have focused on learning about brand quality. In theEK model, consumers receive signals about qual-ity through both use experience and ad signals.But before receiving any information, consumers havea normal prior on brand quality:

    Qj NQj1 2j1 j= 1 J (1)

    This says that, before receiving any information, con-sumers perceive that the true quality of brand j,denoted as Qj, is distributed normally with meanQj1 and variance

    2j1. So in the first period, the con-

    sumers information set is I1= Qj1 2j1 j. The val-

    ues ofQj1and 2j1may be influenced by many factors,

    such as reputation of the manufacturer, prelaunchadvertising, etc.

    Use experience does not fully reveal qualitybecause ofinherent product variability. This has multi-ple interpretations. First, the quality of different unitsof a product may vary. Second, a consumers expe-rience of a product may vary across use occasions.

    For instance, a cleaning product may be effective atremoving a certain type of stain on most occasions

    but be ineffective on other occasions. Alternatively,there may be inherent randomness in psychophysicalperception, e.g., the same cereal tastes better on somedays than others.

    Given inherent product variability, there is a dis-tinction between experienced quality for brand j onpurchase occasion t, which we denote as QEjt , andtrue qualityQj. Let us assume the experienced qualitydelivered by use experience is a noisy signal of truequality, as in

    QEjt =Qj+ jt where jt N0 2

    fort = 1 T (2)

    Here, 2 is the variance of inherent product variabi-lity, which we often refer to as experience variability.Of course, experience signals are specific to consu-mer i. But here and in later equations, we will sup-press the i subscript whenever possible to save onnotation.

    Note that we have conjugate priors and signals,as both the prior on quality in (1) and the noise

  • 8/11/2019 An Assessment Of

    5/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New Developments916 Marketing Science 32(6), pp. 913938, 2013 INFORMS

    in the quality signals in (2) are assumed to be nor-mal. This structure gives simple formulas for updat-ing perceptions as new information arrives, as we willsee below. This is precisely why we assume priorsand signals are normal. Few other reasonable distri-

    butions would give simple expressions. Also, because

    signals are typically unobserved by the researcher, itis not clear that more flexible distributions would beidentified from choice data alone.

    Thus, the posterior for perceived quality, given asingle use experience signal (received after the firstpurchase of brandj), is given by the simple Bayesianupdating formulas:

    Qj2 =2j1

    2j1 +2

    QEj1 + 2

    2j1 +2

    Qj1 (3)

    2j2 = 1

    1/2j1 + 1/2 (4)

    Equation (3) describes how a consumers prior onquality of brand jis updated as a result of the expe-rience signal QEj1. The extent of updating is greaterthe more accurate the signal (i.e., the smaller the 2 .Equation (4) describes how a consumers uncer-tainty declines when he or she receives the signal.The quantity 2j2 is often referred to as the perceptionerror variance.

    Equations (3) and (4) generalize to multiple periods.Let Njt denote the total number of use experi-ence signals received before the purchase at time t.Then we have

    Qjt =2j1

    Njt 2j1 +

    2

    t1s=1

    QEjs djs + 2

    Njt 2j1 +

    2

    Qj1

    fort = 2 T (5)

    2jt = 1

    1/2j1 +Njt 1/2fort = 2 T (6)

    wheredjt is an indicator for whether brandjis boughtor consumed at time t .

    In (5), the perceived quality of brand jat time t,Qjt , is a weighted average of the prior and all qual-ity signals received up until the beginning of time t,

    t1s=1 QEjs djs . It is crucial to note that this is a randomvariable across consumers, as some will, by chance,receive better-quality signals than others. Thus, thelearning model endogenously generates heterogene-ity across consumers in perceived quality of prod-ucts (even starting from identical priors). This aspectof the model is appealing. It seems unlikely thatpeople are born with brand preferences (as stan-dard models of heterogeneity implicitly assume), butrather they arrive at their views through heteroge-neous experience.

    Of course, as Equation (6) indicates, the varianceof perceived quality around true quality declinesas more signals are received and in the limit per-ceived quality converges to true quality. Still, het-erogeneity in perceptions will persist over time, forseveral reasons: (i) both brands and consumers are

    finitely lived, (ii) there is a flow of new brands andconsumers entering a market, and (iii) as peoplegather more information, the value of trial diminishes,and the incentive to learn about unfamiliar prod-ucts decreases. Intuitively, once a consumer is familiarwith a substantial subset of brands, there is rarelymuch marginal benefit to learning about all the rest.

    In general, learning models must be solved bydynamic programming (DP) because todays pur-chase affects tomorrows information set, whichaffects future utility. The key idea of DP is that, ateach timet , the value of choosing optionjconsists ofan immediate payoffplus the expected present value

    of the future payoff stream that arises from periodt+1 onward. This forward-looking term is conditionalon the option j chosen at time t, as the choice at jalters the consumers information set, which in turnaffects the choices that he or she makes in the future.

    In our notation, the information set is It , the valueof choosing alternative j at time t is Vjt It, thecurrent payoff is a context-specific utility function,and the expected present value of future payoffs(conditional on It and j), or future component, isEVIt+1 It j).

    If choices convey not just utility but also infor-mation, it may not be optimal to choose the brand

    with the highestperceivedquality in thecurrentperiod.To see this, consider the special case where the choiceis between an old familiar brand (whose attributes areknown with certainty) and a new brand. Denote these

    by j= o n (for old and new). The information set isIt = Qnt

    2nt, where we suppress the values for the

    old brand, which are Qo and2ot = 0. Prices are given

    byPjt for j= o n. Then values of choosing each brandin the current period are

    Vnt It=EUQEnt Pnt It+ EVIt+1 Itn

    where It+1 = Qn t+1 2n t+1 (7)

    Vot It= EUQEot Pot + EVIt+1 Ito

    whereIt+1 = It (8)

    At time t, a consumer chooses the brand withthe highest V, which is not necessarily the brandwith the highest expected utility. It is importantto understand why. Purchase of the old familiar

    brand gives expected utility EUQEot Pot It. This isincreasing in true quality Qo, which is known. It isdecreasing in experience variability2 , assuming con-sumers are risk averse. On the other hand, pur-chase of the new brand delivers expected utility

  • 8/11/2019 An Assessment Of

    6/27

  • 8/11/2019 An Assessment Of

    7/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New Developments918 Marketing Science 32(6), pp. 913938, 2013 INFORMS

    loyalty phenomenon. (In the Online Appendix A, wegive some details on estimation of this model.)

    2.2. Introducing Risk Aversion andExogenous Signals

    Next, we introduce two key features of the Erdem andKeane (1996) model that generalize the simple set-ting described above. First, we introduce exogenoussignals of quality (e.g., advertising) as an additionalsource of information besides use experience. Second,we consider utility functions that exhibit risk aversionwith respect to variation in brand attributes (focusingagain on quality). We should note that both of thesefeatures were already present in Roberts and Urban(1988) but in a static choice context.

    There are numerous ways one can obtain infor-mation about a brand other than trial purchase.9

    Examples include advertising, word of mouth, maga-zine articles, and dealer visits. For simplicity, we willrefer to these as exogenous signals, as we may thinkof them as arriving randomly from the outside envi-ronment. (Of course, a consumer may also activelyseek out such signals, an extension we discuss below.)For frequently purchased goods, the most importantsource of information is probably advertising, whichis the source that Erdem and Keane (1996) consider.

    Let Ajt denote an exogenous signal (advertising,word of mouth, etc.) that a consumer receives about

    brandjat timet (before the timet purchase decision).We further assume that

    Ajt =Qj+ Ajt where

    Ajt N0

    2A

    fort = 2 T (14)This says the signals Ajt provide unbiased but noisyinformation about brand quality, where the noise hasvariance 2A. The noise is assumed to be normal tomaintain conjugacy with the prior in (1).

    It is important to compare (14) with (2). The noise intrial experience is from inherent product variability,which is largely a feature of the product itself.The noise in a signal such as advertising or wordof mouth is, in contrast, largely a function of themedium. Presumably, some media convey informa-tion more accurately than others, and no medium

    is as accurate as direct use experience. We alsostress that the noise in (14) differs fundamentallyfrom that in (2), as inherent product variabilityaffects a consumers experiencedutility from consum-ing the product, whereas exogenous quality signalsdo not. Nevertheless, both types of signals enter theconsumers learning process in the same way. Given

    9 Indeed, when considering a durable good as opposed to a fre-quently purchased good, trial purchase is no longer even a relevantconsideration (assuming that one cannot return a purchase to get arefund).

    onlythe exogenous signal Ajt , we can rewrite (5) and(6) as

    Qjt =2j1

    NAj t 2j1 +

    2A

    ts=2

    Ajs dAjs +

    2ANAj t

    2j1 +

    2A

    Qj1

    fort = 2 T (15)

    2jt = 1

    1/2j1+NA

    j t 1/2A

    fort = 2 T (16)

    where dAjt is an indicator for whether a signal forbrand j is received at time t, and NAj t is the totalnumber of ad signals received for brandjup throughand including timet .

    It is simple to extend the Bayesian updating rulesin (5) and (6) and (15) and (16) to allow for twotypes of signalsi.e., both use experience and exoge-nous signals. Our timing convention is that time t adsignals are received before the time t purchase deci-sion, whereas time t experience signals are receivedafterwards. For example, perceived quality before thepurchase decision at t =2 is Qj2 =

    2j21/

    2 Q

    Ej1 +

    1/2AAj2 + 1/2j1Qj1. Generalizing to multiple peri-

    ods, we have

    Qjt =1/2Sjt

    t1s=1

    QEjs djs +1/2ASjt

    ts=2

    Ajs dAjs +

    1/2j1Sjt

    Qj1

    fort = 2 T (17)

    2jt = 1

    1/2j1 +Njt 1/2+ NA

    j t 1/2A

    fort = 2 T (18)

    where Sjt 1/2jt . Note that the recency of signals

    does not matter: in (17) and (18), only the total stockof signals determines a consumers state. Further-more, receivingNsignals with variance2 affects theperception variance in the same way as receiving onesignal with variance 2/N.

    As we will see, these properties are important forsimplifying the solution to consumers dynamic opti-mization problem. This is because the consumerslevel of uncertainty, as captured by the 2jt

    Jj=1,

    depends only on the number of signals received, notthe order or timing with which they were received.

    One could imagine scenarios where more recent sig-nals are more salient, or, conversely, where firstimpressions are most important. These are importantpotential extensions of the model, but they wouldmake computationmuchmore difficult.

    To progress further and develop a model that canbe taken to the data, one must specify a particularfunctional form for the utility function. Of course,many functions are possible. Erdem and Keane (1996)assume a utility function of the form:

    U QEjt Pjt =wQQEjt wQrQ

    Ejt

    2+wPXPjt +ejt (19)

  • 8/11/2019 An Assessment Of

    8/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New DevelopmentsMarketing Science 32(6), pp. 913938, 2013 INFORMS 919

    Here, utility is quadratic in the experienced qualityof brand j at time t and linear in consumption ofthe composite outside good Ct= X Pjt , where X isincome. The parameter wQ is the weight on quality,r is the risk coefficient, wP is the marginal utility ofthe outside good, and ejt is an idiosyncratic brand

    and time-specific error term.10

    Note that, as choicesonly depend on utility differences, and as income isthe same regardless of which brand is chosen, incomedrops out of the model. So we can think ofwp as theprice coefficient.

    Given (19), combined with (2) and (18), expectedutility is given by

    EUQEjt Pjt It = wQQjt wQrQ2jt wQr

    2jt +

    2

    wPPjt + ejt j= 1 J (20)

    Also, as the EK model was meant to be applied toweekly data, and as consumers may not buy every

    week, a utility of the no-purchase option must alsobe specified. Erdem and Keane (1996) write this asEU0 It=0 +1t+ e0t . The time trend captures in asimple way the possibility of changing value of sub-stitutes for the category in question.

    We have now specified the complete EK model, andwe are in a position to formally state the consumersproblem. Consumers are assumed to be forward-looking, making choices to maximize value functionsof the form:

    Vjt It= EUQEjt Pjt It+ EVIt+1 It j

    forj= 0 J (21)where the consumers information set is given by

    It+1 =

    Q1 t+1 QJ t+1 21 t+1

    2J t+1

    (22)

    A key point is that a consumers information abouta brand may be updated betweent andt + 1 for oneof two reasons: (i) the consumer buys the brand, or(ii) the consumer receives an exogenous signal aboutthe brand. Henceforth, we simply refer to these as adsignals. In forming EVIt+1 It j, we allow for bothsources of information. We describe the process indetail in the next section.

    With the introduction of risk aversion, the EKmodel can capture both gains from trial and brand

    10 A common assumption is that utility is linear in the consump-tion of the outside good, so there are no income effects when deal-ing with inexpensive items such as frequently purchased consumergoods. This also means that the marginal utility of consumptionwpis constant within the range of outside good consumption levelsspanned by different brand choices. However, it is likely that themarginal utility of consumption wpwould be lower for householdsat higher wealth levels. And the assumption of no income effectswould not be tenable for expensive durable goods.

    loyalty phenomena. As we discussed earlier, theEVIt+1 It j terms in a dynamic learning modelcapture the gain from trial information. These aregreater for less familiar brands, where the gain fromtrial is greater. At the same time, the risk terms 2jtare also greater for such brands. These two forces

    work against each other; which dominates determineswhether a consumer is more or less likely to try a newunfamiliar brand over a familiar brand. In categorieswhere risk aversion dominates, we would expect tosee a high degree of brand loyalty (i.e., persistence inchoice behavior). In categories where the gains fromtrial dominate, we would expect to see a high degreeof brand switching (as a result of experimentation).11

    2.3. Solving the Dynamic Optimization ProblemHere, we show how to solve a consumers DP prob-lem. Solving the DP problem is computationally diffi-cult for two reasons: (i) the expected value functions

    in (21) are high-dimensional integrals, and (ii) theseintegrals must be evaluated at many state points.The expected value functions in (21) have the form:

    EVIt+1 Itj

    =Et max

    V 0t+1It+1VJt+1It+1

    (23)

    That is, the consumer at time t knows that, attime t + 1, he or she will choose from among theJ options the one with the highest value function.The consumer can form the expected maximum overthese value functions because his or her informationset and decision at time t (i.e., the (It j)) generate a

    distribution ofIt+1 in the manner described earlier.However, it is not immediately obvious how

    (23) helps us to solve the consumers optimizationproblem. The Vs on the right-hand side of (23)themselves contain expected value functions dated att + 2that is, functions of the form EVIt+2 It+1 j.So it seems we have only pushed the problem oneperiod ahead. One key insight for solving a dynamicprogramming problem is the assumption that thereexists a terminal period Tbeyond which a consumerdoes not plan. At T, the consumer will simply choosethe option with highest expected utility. Thus, wehave that

    VjTIT=EUQEjT PjT IT forj=0J (24)

    EVIT IT1j =ET1 max

    EU0 IT

    EUQE1TP1TITEUQEJ TPJ TIT

    (25)

    11 We discuss the identification of learning models in detail in 2.5.Here, we note that it is not possible to determine from raw datapatterns the magnitudes of risk aversion versus gains from trial.Such an inference can only be made conditional on an assumedmodel structure.

  • 8/11/2019 An Assessment Of

    9/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New Developments920 Marketing Science 32(6), pp. 913938, 2013 INFORMS

    The integral in (25) is feasible to evaluate. Sup-pose that, hypothetically, the IT and PTwere knownat T 1. Then, as we see from (20), the only un-knowns appearing in (25) would be the logisticerrors e0T eJ T. In that case, (25) would havea simple closed form given by the well-known

    nested logit inclusive value formula (see Rust 1994).This illustrates the point that estimating a finite-horizon dynamic model is very much like estimat-ing a nested logit model, if one thinks of movingdown the nesting structure as a process that plays outover time.

    Of course, evaluating (25) in the EK model is moredifficult because theITandPTare not known at T 1.Both experience signals and ad signals may arrive

    betweenT 1 andT, causing the consumer to updatehis or her information set. The expectation in (25)must be taken over the possibleIT that may arise as aresult of these signals. Specifically, we must (i) update

    2j T1 to

    2j Tusing (18) to account for additional useexperience, (ii) integrate over possible values of the

    use experience signal in (2) to take the expectationover possible realizations of QjT , (iii) integrate overpossible ad exposures that may arrive between T 1and T(i.e., over realizations ofd AjT for j= 1 J ) toaccount for ad-induced changes in the2jT , and (iv)integrate over possible values of the ad signals in (14),as these will lead to different values of the QjT .

    12

    Clearly, the integrals in (25) are high dimensional,and simulation methods are needed. That is, weinte-

    grate by simulationover draws from the distributionsof the signal processes. The computational burden

    increases if consumers learn about multiple brandsand/or have more than one source of information.Memory is also an issue, as all the EVIT IT1 jmust

    be saved.Having calculated the values of EVIT IT1 j for

    every possible (IT1 j), and saving the results (a pointwe return to below), we can move back to timeT 1,where (21) becomes

    VjT 1 IT1= EU QEjT1 PjT1 IT1

    + EVIT IT1j (26)

    Note that (26) is just like (24), except for theEVIT IT1 j terms that are appended. But we havealready solved for these and saved them in memory,so they are just numbers. So we can now constructthe VjT 1 IT1. This enables us to proceed back-wards and calculate the time T 1 version of (25),

    12 It is also necessary to integrate over future price realizations forall brands. To make this integration as simple as possible, Erdemand Keane (1996) assume that the price of each brand is distributednormally around a brand-specific mean, with no serial correlationother than that induced by these mean differences.

    and we can thus obtain the EVIT1 IT2 j. Then wecan work back again and obtain the VjT 2 IT2.This backwards induction process is repeated until wehave solved the entire dynamic programming prob-lem back to t=1. Detailed descriptions of this pro-cess, known as backsolving, are contained in many

    sources (see, for instance, Keane et al. 2011).In practice, T is generally chosen to be some timebeyond the end of the sample period. This can bechosen far enough out so that results are not sensitiveto the exact value ofT.

    Unfortunately, the above description is oversimpli-fied because it assumes that it is feasible to calculatethe value EVIT IT1 jfor every possible(IT1 j). Notethat the number of variables that characterize the stateof an agent in (22) is 2 J. Solving a dynamic pro-gramming problem exactly requires that one solve theexpected value function integrals ateverypoint in thestate space; this is clearly not feasible here because

    there are too many state variables. Of course, as thestate variables in (22) are continuous, it would beimpossible to solve for the expected value functionsat every state point (because the number of points isinfinite). A common approach is to discretize continu-ous state variables using a fairly fine grid. Suppose weuseGgrid points for each state variable.13 As we have2 Jstate variables, this gives G2J grid points, whichis impractically large even for modestGandJ. This isknown as the curse of dimensionality. A number ofways to deal with this problem have been proposed.

    To solve the optimization problem in their model

    (that is, to construct the EVIt+1 It jin (21)), Erdemand Keane (1996) use an approximate solution methoddeveloped in Keane and Wolpin (1994). The idea isto evaluate the expected value function integrals ata randomly selected subset of state points (wherethis set is relatively small compared with the size ofthe total state space).14 The expected value functionsare then constructed at other points via interpolation.For instance, one can run a regression of the valuefunctions on the state variables (at the random subsetof state points) and use the regression to predict thevalue functions at other points. We give more detail onhow to apply this method to estimate learning models

    in Online Appendix B.

    13 Note that range of the discretization needs to big enough to coverthe true Qjs (or

    2j 0s), which are unknown to researchers a priori.

    In Online Appendix B, we outline a procedure to determine thebounds.14 In most applications, the expected value function integrals aresimulated using Monte Carlo methods rather than evaluatednumerically. This makes it practical to deal with the three aspectsof integration described below in Equation (31): integration overcontent of use experience signals, integration over exposure to ads,and integration over the content of ads.

  • 8/11/2019 An Assessment Of

    10/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New DevelopmentsMarketing Science 32(6), pp. 913938, 2013 INFORMS 921

    The KeaneWolpin approximation method (Keaneand Wolpin 1994), or variants on its basic idea, has

    become widely used in both economics and marketingin the past 15 years to solve many types of dynamicmodels. This has greatly increased the richness andcomplexity of the dynamic models that it is feasible to

    estimate. We will not give details of these computa-tional methods here but refer the reader to surveys byKeane et al. (2011), Aguirregebirria and Mira (2010),Geweke and Keane (2001), and Rust (1994).

    Finally, a common question is, how can we solvethe DP problem when we do not know the trueparameter values for the utility function or thestochastic processes that generate signals? The answeris that the DP problem must be solved at each trialparameter value considered during the search processfor the maximum of the likelihood function. In otherwords, the DP solution is nested within the likelihoodevaluation. We consider the construction of the likeli-

    hood function in the next section.2.4. Evaluating the Likelihood FunctionIn this section we discuss how to form the likeli-hood function for the EK learning model. Let =wQ wPrQj1

    2j1

    2

    2Adenote the entire vector

    of model parameters. Combining Equations (20) and(21), we have the choice-specific value functions:

    VjtIt= wQQjt wQrQ2jt wQr

    2jt +

    2 wPPjt

    +ejt +EVIt+1 Itj j =1J (27a)

    V 0t It0+1t+e0t+EVIt+1 It0 (27b)

    Erdem and Keane (1996) assume that the idiosyn-cratic brand- and time-specific error terms ejt in (27)are independent and identically distributed extremevalue. In this case, the choice probabilities have asimple multinomial logit form:

    P j It=expVj

    expV0+J

    k=1 expVk (28)

    where

    Vj wQQjt wQrQ2jt wQr

    2jt +

    2 wPPjt

    + EVIt+1 It j j = 1 J (29a)V0 0 +1t + EVIt+1 It 0 (29b)

    Equations (28) and (29) illustrate the point, stressedby Keane et al. (2011), that choice probabilities indynamic discrete choice models look exactly likethose in static discrete choice models (multinomiallogit in the present case), except that the Vj termsin the dynamic model include the extra EVIt+1 It jterms. However, once one has solved the DP prob-lem, these extra terms are merely numbers that one

    can look up in a table that is saved in computer mem-ory (i.e., a table that lists expected value functions atevery point in the state space). Alternatively, if onehas used an interpolating method rather than sav-ing every value, the appropriate EVIt+1 It jmay beconstructed as needed using the interpolating func-

    tion. Erdem and Keane (1996) use the latter procedurebecause the number of possible states in their modelis so large.

    To proceed in constructing the likelihood, we needsome definitions. Let jt denote the choice actuallymade at time t (we continue to suppress the i sub-scripts to conserve on notation). Let Dt1 j1 jt 1denote the history of purchases made beforetime t . Similarly, let D At d

    A2 d

    At denote the his-

    tory of ads received up through time t, where dAt dAjt

    Jj=1. LetAt A2td

    A2t AJt d

    AJt denote the content

    of ads seen at time t. Also, let QEt1 QEjss

    t1s=1 and

    At Asts=2denote the actualcontentof all experience

    and ad signals received up throught. (Recall thatDt1and DAt are observed by the econometrician, whereasQEt1 and At are not.)

    Finally, we define Pjt I1 Dt1 DAt Q

    Et1

    At asthe probability of a persons choice at time t givenhis or her initial state (I1), the history of use expe-rience before time t, and advertising exposures upthrough timet , as well as the content of those signals.It is worth emphasizing the timing convention thatthe ads at timet are observed before the time t choiceis made.

    Unfortunately, we cannot observe the actualcontentof ad and experience signals. Thus, we mustintegrate

    over that content to obtain unconditional probabilitiesPjt I1 Dt1 D

    At . Thus, the probability of a choice

    history for an individual takes the following form:

    P

    jtTt=1

    =T

    t=1

    Pjt It=T

    t=1

    Pjt I1 Dt1 DAt

    =

    QEjt1t1At

    T

    t=1

    Tt=1

    Pjt I1 Dt1 D

    At Q

    Et1

    At

    dQEjt1t1At

    Tt=1 (30)

    In (30) we integrate over all experience and advertis-ing signals that the consumer may have received fromt = 1 T . That is, we integrate over the distributionof QEjt1 t1 At

    Tt=1. Clearly, the required order of

    integration is substantial.15

    To deal with this problem, Erdem and Keane(1996) use simulated maximum likelihood (see, e.g.,

    15 It is worth emphasizing that this high-order integration problemarises even in a static learning model (i.e., with myopic agents) aslong as the contents of signals are not observed.

  • 8/11/2019 An Assessment Of

    11/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New Developments922 Marketing Science 32(6), pp. 913938, 2013 INFORMS

    Keane 1993, 1994). Specifically, drawD sets of signals

    Sd =

    Qdjt1 t1Adt

    Tt=1

    ford = 1 D,

    using the distributions defined in (2) and (14). Thenform the simulated probability:

    PjtTt=1= 1DD

    d=1

    Tt=1

    Pjt I1 Dt1 D

    At S

    d (31)

    Finally, sum the logs of these probabilities acrossindividuals i = 1 N .

    A key complication is that consumer purchasehistories and ad exposures are not usually observed

    before the start of the sample period. This creates aninitial conditions problem. A consumer who likesa particular brand will have bought it often beforethe sample period starts. Thus, brand preference iscorrelated with the information set at t = 1. The usualconsequence is to exaggerate the impact of lagged

    purchases on current choices. An exact solution to theinitial conditions problem requires integrating overall possible initial conditions when forming the like-lihood, but in most cases, this is not computationallyfeasible. Thus, a number of approximate (ad hoc)approaches have been proposed. For example, Erdemand Keane (1996) have three years of scanner dataavailable, but they use the first two years to estimatethe initial conditions for each consumer at the startof the third year and then use only the third year inestimation.

    2.5. Identification

    The discussion of identification can be confusing,as the word has multiple meanings. It can meanshowing that the parameters of a model are identified

    given the assumed model structure. This may involve for-mal proof as well as intuitive discussion of what datapatterns drive the estimates. We discuss identificationin this narrow sense in 2.5.1.

    Identification can also mean analysis of whichassumptions are necessary to estimate a model, or

    just convenient.16 For example, can assumptions suchas Bayesian updating or normal signals be relaxed?

    16 This is known as nonparametric identification analysis. Un-fortunately, this literature has been misinterpreted by manyresearchers as suggesting that it may be possible to obtain model-free evidence about behavior. In fact, the approach of thenonparametric identification literature is to make a priori assump-tions about certain parts of a model and then show that someother part (e.g., the functional form of utility or an error distri-

    bution) is identified without further assumptions. Thus, what isnonparametrically identified is just a part of the model, not allof it. For instance, Matzkin (2007) says the ideal of nonparamet-ric estimation is to start with a structural model and then imposeonly restrictions implied by theory (e.g., continuity, monotonicity,homogeneity, equilibrium conditions). One then uses the data toidentify functional forms and distributions that are not pinned

    Even more generally, can one distinguish the learningmodel from other plausible models that also gener-ate state dependence? How can we tell if consumersare forward-looking? We discuss identification in thisbroad sense in 2.5.2.

    Finally, some parameters may be formally iden-

    tified but difficult to pinpoint in finite samples.We examine this issue in 2.6, when we discuss theestimates of the EK model.

    2.5.1. Identification of Learning Model Para-meters (Given the Model Structure). Some keypoints about identification become apparent fromexamining (27)(29). First, suppose that consumershave complete information about all brands. Then wehave EVIt+1 It j = EVIt = kfor j= 1 J , wherekis a constant. That is, there is no updating of infor-mation sets based on choice, and so the EVIt+1 It jterms drop out of the model (just like any term that isconstant across choices in a discrete choice model).What remains is a static model where

    Vj t It= Q1 QJ 0 0

    wQQj wQrQ2jwQr

    2 wPPjt + ejt (32)

    Here, we have set 2jt = 0 and Qjt = Qjbecause thereis no uncertainty about quality.

    Obviously, we cannot identify 2A, or the priorsQj1

    2j1as they drop out of the model. We also can-

    not identify2 , as it is constant across alternativesj=1 J .17 Careful inspection of (32) reveals that r isnot identified either, as it cannot be disentangled from

    the scaling ofQj. (IfQjhad a known scale, this wouldnot be a problem.) Thus, r, 2A, 2 , and the priors

    Qj1 2j1 only affect choice probabilities through the

    EVIt+1 It jterms.So in an environment of complete information, all

    that can be identified are the price coefficient wp, theproductswQQj, and the terms 0and 1in the valueof the no-purchase option.18 Furthermore, as only util-ity differences matter for choice, we need a normal-ization to establish a reference alternative. Erdem andKeane (1996) set Q j= 1 for one brand, so the qualityof all other brands is measured relative to brand j.19

    down by theory. A related point is that observing data patterns thatseem consistent or inconsistent with a model can make that modelseem more or less plausible, given our priors. Yet they can neverprovide nonparametric evidence that a model is correct. In 4.5 wegive two examples to illustrate these points.17 Note that 2 does not enter the value of the no-purchase option.However, any shift in 2 can be undone by a shift in 0, leavingutility differences unchanged.18 The scale normalization on utility is imposed by assuming thatthe scale parameter of the extreme value errors is 1.19 It is worth noting that an alternative normalization would beto set Qj= 0 for one brand. However, this would not let one

  • 8/11/2019 An Assessment Of

    12/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New DevelopmentsMarketing Science 32(6), pp. 913938, 2013 INFORMS 923

    Alternatively, one could fix wQ. To summarize, byobserving consumers with essentially complete infor-mation (i.e., those with a great deal of experience withall brands), we can identify wQ and the Qj, givennormalization, as well as wp, 0, and1.

    The identification of the parameters 2A 2 , and

    r, as well as the priors Qj1 2

    j1, requires thatincomplete information actually exists. In that case,variation in EVIt+1 It j and

    2jt across consumers

    is generated by variation in the information sets Iit .Intuitively, the parameters 2A

    2 , and r, as well

    as Qj1 2j1 are identified by the extent to which,

    ceteris paribus, consumers with different informa-tion sets are observed to have different choice prob-abilities. For instance, by comparing (27a) and (32),we can clearly see that variation in 2jt across con-sumers, arising from variation in use experience andad exposures, enables us to identifyr. (This is becausewQ is already identified from consumers with com-

    plete information, as noted earlier.)Similarly, variation of Iit within consumers over

    time is also relevant. The learning parameters2A 2 ,

    and Qj1 2j1 determine how the arrival of ad and

    use experience signals change 2jt and EVIt+1 It j.Thus, these parameters are pinned down by the extentto which the arrival of signals alters behavior overtime. For instance, if behavior is greatly altered by thearrival of one use experience signal, it implies that2j1is large and 2 is small.

    It is worth stressing that this argument for identi-fication based on comparing behavior of consumerswith different amounts of information applies in bothdynamic and static models. Indeed, this is the sourceof identification of the learning-related parameters inthe static Bayesian learning model of Roberts andUrban (1988). Also note that variation in ad exposuresand in prices are plausibly exogenous sources of vari-ation in the Iit .

    Now we turn to dynamic considerations. Recallfrom 2.1 that consumers will only engage in strategictrial if >0. But in the typical scanner data set, wecannot observe whether a purchase is a trial. Thus,aside from the functional forms of utility and theEV functions, the discount factor is pinned down

    by variables that affect the EVIt+1 It j but do notaffect current utility. In the EK model, there are twoexogenous variables that play this role. These arethe brand-specific price variances and advertising fre-quencies. There is no reason for these variables to

    disentangle the wQQj products. Thus, Erdem and Keane (1996)instead set Q j= 1 for one brand. Also, with quadratic utility, it isdesirable to constrain the largest Qjto fall in the region of increas-ing utility. Erdem and Keane impose this constraint in estimation

    by updating the level ofQj at each step (while keeping relative Qvalues fixed).

    affect behavior in a static model. In such a modelone cares only about the current price and the currentstock of information, not the likelihood of future dealsor future information arrival.

    2.5.2. Identification in the General or Nonpara-metric Sense. As Erdem and Keane (1996) discuss in

    some detail, the Bayesian learning model implies aparticular form of state dependence and serial cor-relation in the errors. This can be seen from carefulexamination of Equations (17) and (18) and (27). A fre-quently asked question is how learning behavior can

    be distinguished from other forms of state or serialdependence.

    In his fundamentally important paper on paneldata, Chamberlain (1984) defines the relationship

    between two variablesytand xtas static conditionalon a latent variablec if (i)ytis independent of laggedxconditional on x t and c , and (ii) x t is strictlyexoge-nous with respect to y conditional on c (i.e., yt doesnot cause futurex). As Chamberlain shows, this staticcondition is actually stronger than the condition inwhich there is no structural state dependence (seeHeckman 1981), as the latter does not require strictexogeneity.20

    Rather remarkably, Chamberlain (1984) also showsthat in nonlinear models (such as discrete choice mod-els), one can always find a distribution of the latentvariable c such that the relationship between yt andxtis static. In simple terms, one can always find a suf-ficiently flexible/complex heterogeneity distributionsuch that state dependence is not needed to explain

    the data. The key implication is that one cannot con-struct a nonparametric test of whether state depen-dence exists.

    Obviously, if one cannot form a nonparametric testof whether state dependence exists, then it is truea fortiori that one cannot form a nonparametric test ofwhether anyparticularform of state dependence (suchas learning) exists (i.e., if heterogeneity can accountfor general state dependence, it can also account forany particular form of state dependence). Nor can weform nonparametric tests to distinguish among com-peting forms of state dependence (e.g., learning ver-sus inventories versus adjustment costs).

    Chamberlains (1984) result is an instance of theCowles Foundation view that one cannot deduceinteresting economic relationships from the dataalone. One needs a priori identifying assumptions,regardless of what sort of idealized variation is

    20 Note that a dependence of yt on a lagged x is the definingcharacteristic of structural state dependence. This is ruled out bycondition (i), but condition (ii) is additional. In the learning model,the exogenousx variables correspond to advertising exposures andprices. These variables affect purchase decisions by shifting the Iitand budget constraints.

  • 8/11/2019 An Assessment Of

    13/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New Developments924 Marketing Science 32(6), pp. 913938, 2013 INFORMS

    present in the data.21 Thus, our interpretation of datawill always be subjective, as it is contingent on ourmodel. To be concrete, both the extent and the natureof any state dependence we find in discrete choicedata will depend on the assumed functional formsfor state dependence and heterogeneity (see, e.g.,

    Keane 1997).As described in 2.5.1, in the parameterized EKlearning model, we identify parameters that describedynamics from variation in choice behavior acrossconsumers with different information sets (Iit ) andwithin consumers as their information sets changeover time. This variation in Iit arises from differ-ent histories of use experience, ad exposures, andprices. However, Chamberlains (1984) results implythat differences in behavior stemming from differ-ences in history (i.e., state dependence) cannot bedistinguished nonparametrically from differences in

    behavior stemming from a completely general form

    of heterogeneity. Nor can learning behavior be dis-tinguished nonparametrically from other mechanismsthat may induce state dependence.

    Thus, functional forms of both state dependenceand heterogeneity must be constrained for the learn-ing model to be identified. But this is true ofany nonlinear dynamic model. For a structuraleconometrician, this is not a limitationa modelthat specifies very general forms of state dependenceand/or heterogeneity so as to obtain a good fit tothe data is merely a statistical model with no struc-tural/behavioral interpretation. Such a model cannot

    be used for policy experiments. Furthermore, Occams

    razor suggests that we do not wish to work withsuch general models. What we seek are parsimoniousmodels that fit well, give useful insights into the data,and can be used for policy experiments.

    Recognizing the impossibility of completely non-parametric identification of learning effects, we canstill give some contingent answers to the questions weasked at the start of 2.5. First, note that the Bayesianupdating and normal signaling assumptions can berelaxed. We discuss some papers that do this in 3.1.1.

    Second, in principle, one can distinguish learningfrom other plausible mechanisms that may generate

    state dependence (such as inventories or switching

    21 As Koopmans et al. (1950, pp. 6364) state, Suppose [investigator] B is faced with the problem of identifying thestructural equations that alone reflect specified laws of economic

    behavior Statistical observation will in favorable circumstancespermit him to estimate the probability distribution of the vari-ables. Under no circumstances whatever will passive statisticalobservation permit him to distinguish between different mathe-matically equivalent ways of writing down that distribution The only way in which he can hope to identify and measure indi-vidual structural equations is with the help of a priori specifica-tions of the form of each structural equation.

    costs), but only if one is willing to specify paramet-ric forms for the competing models. This is consis-tent with the Bayesian decision theory view that oneneeds a model to beat a model. We return to thispoint in 4.

    Third, the questions of whether we can identify

    the discount factor and whether we can test if con-sumers are forward-looking are obviously closelyrelated. What is interesting, however, is that Chinget al. (2012a) show that, in the learning model, onecan identify whether consumers are forward-lookingusing only (i) the laws of motion of the state variablesand (ii) the form of current utility. Yet identification ofthe discount rate requires assumptions about the fullstructure (i.e., expectation formation) so that one canconstruct the expected value functions.

    To see this, suppose we adopt the Geweke andKeane (2000) method to estimate dynamic modelswithout the need to solve agents DP problem and

    without imposing the full structure of the model.To implement their method, we take the value func-tion in (21):

    Vjt It= EUQEjt Pjt It+ EVIt+1 It j

    for j= 0 J (21)

    and replace it with the equation:

    Vjt It= EUQEjt Pjt It+ F It+1It j t

    forj= 0 J (33)

    Here,F It+1It j t EVIt+1 It jis a polynomialin the state variables that approximates the futurecomponent of the value function, andtis a vector ofreduced-form parameters that characterize the futurecomponent. The idea of the GewekeKeane method isto estimate the t jointly with the structural param-eters that enter the current-period expected utilityfunction.

    Notice that, as F is just a flexible function of thestate variables, all that is assumed is that consumersunderstand the laws of motion of the state variables(i.e., how (It+1 It j) is formed). They need not formexpectations based on the true model. The approach is

    also agnostic about whether consumers use Bayesianupdating or some other method. In general, identifi-cation of t requires exclusion restrictions such thatsome variable enters Fbut not U.22

    22 Geweke and Keane (2000) point out that in the absence ofexclusion restrictions, one must observe current payoffs (at leastpartially) to identify F. In labor economics, researchers argue thatwages capture much of the current payoff (e.g., Houser 2003).Researchers can also control current payoffs in a lab experiment(e.g., Houser et al. 2004). Recently, Yao et al. (2012) propose anotherstrategy to identify the discount factor. They argued that if a data

  • 8/11/2019 An Assessment Of

    14/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New DevelopmentsMarketing Science 32(6), pp. 913938, 2013 INFORMS 925

    We see from (33) that, when the full structure is notimposed, one cost is that we lose identification of thediscount factor. The is subsumed as a scaling fac-tor for the parameters t of the F function.

    23 On theother hand, we can test whether t =0, which is atest for forward-looking behavior. Although the test

    makes weak assumptions about F, it is notnonpara-metric, as a functional form must be chosen for thecurrent payoff function. As Ching et al. (2012a) show,given the current payoff function, the t are identi-fied in the learning model because different currentchoices lead to different values of next periods statevariables.

    2.6. Key Substantive Results ofErdem and Keane (1996)

    Erdem and Keane (1996) estimate their model onNielsen scanner panel data on liquid detergentfrom Sioux Falls, South Dakota. The sample periodspanned 1986 to 1988, but only the last 51 weeks wereused; at that time, telemeters were attached to pan-elists TVs to measure household-specific ad expo-sures. The data include seven brands. A nice featureof the data is that three brands were introduced dur-ing the period, generating variability in consumersfamiliarity with the brands. The estimation samplecontained 167 households who met various criteria,such as having a working telemeter.

    Some key issues that arose in estimation are worthdiscussing because they are common across manyapplications of dynamic learning models. First, Erdemand Keane (1996) had difficulty obtaining a preciseestimate of the weekly discount factor, and so theypegged it at 0.995.24 Identification of the discount fac-tor is often a practical problem in dynamic models,

    set consists of two regimes, a static environment and a dynamicenvironment, one can first estimate the parameters of the currentpayoff function using the static environment data and then holdthem fixed when estimating the discount factor using the dynamicenvironment data. Their approach requires the assumption that thecurrent payoff function remains unchanged across regimes.23 Several papers have explored using exclusion restrictions to esti-mate the discount factor. Chevalier and Goolsbee (2009) and, morerecently, Ishihara and Ching (2012) use the resale value of a usedgood as an exclusion restriction in estimating dynamic demandmodels for new and used goods. In a dynamic store choice model,

    Ching et al. (2012b) use cumulative points earned via a rewardprogram as an exclusion restriction. In a study of salesperson pro-ductivity, Chung et al. (2013) use cumulative sales as an exclusionrestriction. The ideas presented in Ching et al. (2012b) and Chunget al. (2013) are similar: cumulative points (or sales) do not affectcurrent payoffs until they reach certain cutoffs so that customers(or sales representatives) can receive a bonus. Fang and Wang(2010) show that even parameters of quasi-hyperbolic discountingcan be identified if a dynamic model has exclusion restrictions andone has panel data with at least three periods.24 In trying to estimate the weekly discount factor, they obtained1.001 with a standard error of 0.02. This standard error implies alarge range of annual discount factors. It is also worth noting that

    even when it is formally identified. (We discuss thisfurther in 4.3.) Second, the authors also found it diffi-cult to pinpoint the prior mean of quality. Hence, theyconstrained it to equal the average true quality levelacross all brands. This implies peoples priors are cor-recton average. They also constrained the prior uncer-

    tainty to be equal across brands,j1 =1, as allowingit to differ did not significantly improve the fit.Aside from the dynamic learning model, Erdem

    and Keane (1996) estimate two other models forcomparison. These are a myopic learning model(= 0) and a reduced-form model similar to that ofGuadagni and Little (1983; henceforth referred to asGL). The latter is a multinomial logit with an expo-nentially smoothed weighted average of past pur-chases (the loyaltyvariable), a similar variable for adexposures, a price coefficient, brand intercepts, andtrends for values of no purchase and small brands.

    It is striking that Erdem and Keane (1996) find that

    both structural learning models fit substantially bet-ter than the GL model.25 This is surprising, as GLspecifies flexible (albeit ad hoc) functional forms foreffects of past use and ad exposures on current choiceprobabilities, whereas the Bayesian learning modelsimpose a special structure. Specifically, as we sawin (17) and (18), only the sum of past experience orad exposures matter in the Bayesian models, not thetiming of signals.

    Another striking result is that advertising is notsignificant in the GL model, implying that advertis-ing has no effect on brand choice. In the EK model,there is no one coefficient to capture the effect ofadvertising. The parameter r is significant and pos-itive, so consumers are risk averse with respect toquality, whereas the1 , andAimply that (i) con-sumers have rather precise priors about new brandsin the detergent category, and (ii) experience signalsare much more accurate than ad signals. Yet the effectof advertising can only be assessed via simulations.

    Erdem and Keane (1996) use their model to sim-ulate an increase in ad frequency for Surf from 23%to 70%.26 The simulation was also done for a hypo-thetical new brand with the characteristics of Surf.The results imply that an increase in advertising has

    little effect on market share for about four months,but the impact is substantial after about seven or

    Erdem and Keane (1996) set the terminal period for the DP problemat T= 100, which is 50 weeks past the end of the data set.25 The dynamic learning model had 16 parameters, whereas theother two models both had 15. Erdem and Keane (1996) obtainBayesian information criterion values of 7,531, 7,384, and 7,378for GL and the myopic and forward-looking learning models,respectively.26 Ad frequency is the weekly probability of a household seeingan ad for a brand. In the data this was 23% for Surf.

  • 8/11/2019 An Assessment Of

    15/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New Developments926 Marketing Science 32(6), pp. 913938, 2013 INFORMS

    eight months. Thus, the model implies that advertis-ing has little impact in the short run, but sustainedadvertising is important in the long run. As expected,the impact of advertising is much greater for a new

    brand (as there is more scope for learning).27

    The advertising simulation results are not surpris-

    ing in light of the parameter estimates. As consumershave rather precise priors about brands in the deter-gent category, and as ad signals are imprecise, it takessustained advertising over a long period to movepriors and/or reduce perceived risk of a brand toa significant degree.28 A clear prediction is that thehigher prior uncertainty is, and the more precise adsignals are, the larger the advertising effects will be,and the quicker they will become noticeable. Thus, animportant agenda for the literature on learning mod-els is to catalogue the magnitudes of prior uncertaintyand signal variances across categories.

    3. A Review of the Recent Literatureon Learning Models

    Here, we review developments in learning modelssubsequent to the foundational work discussed in 2.Almost all this work is post-2000, but it alreadyforms a large literature. We divide the review into(i) more complex learning models with myopicagents, (ii) more complex learning models withforward-looking consumers, (iii) learning models forproduct-level/market share data, and (iv) new appli-cations of learning models (beyond brand choice).We should note that our survey focuses on empirical

    structural learning models where agents are uncer-tain about product attributes. (There is a literatureon dynamic games where agents learn how to playequilibrium strategies or how to coordinate in multi-ple equilibria settings including social learning envi-ronments. This literature is beyond the scope ofour survey.)

    3.1. Models with Myopic AgentsOne stream of literature has focused on extendinglearning models by allowing for more complex learn-ing mechanisms. To make such extensions feasible,it is often necessary to assume that consumers are

    myopic. We consider such models in the next two

    27 We notice that Figure 1 in Erdem and Keane (1996) contains atypo. The scale on the y axis in Figure 1, which reports resultsfor the new brand with the myopic model, is incorrectly labeled.It should be labeled in the same way as Figure 5. This does notaffect any of the results discussed here.28 The simulation results also clarify why the GL model fails to findsignificant advertising effects. Theloyaltyvariable tends to put moreweight on recent advertising, and it discounts advertising from sev-eral months in the past. But the simulations show that advertisingin the past few months does little to move market shares.

    subsections that cover (i) models with more complexlearning mechanisms and (ii) models with correlatedlearning.

    3.1.1. More Complex Learning Mechanisms.Mehta et al. (2004) extend the Bayesian model toaccount for forgetting. Consumers imperfectly recall

    prior brand experiences, and the extent of forget-ting increases with time. Therefore, a consumers statedepends on the timing of signals, not just the totalnumber (as in Equation (6)). Thus, it is necessary toassume myopia to make modeling forgetting feasible.

    Deighton (1984) propose that advertising has atransformative effect whereby it alters consumerassessment of the consumption experience. Mehtaet al. (2008) include this effect in a learning model.They allow information signals from advertising to be

    biased, and this bias can change how consumers inter-pret their consumption experience. The identificationof such a model is challenging. Mehta et al. (2008) can

    achieve identification because their data set includesconsumers who rarely watch TV commercials. Theseconsumers choices allow one to identify true brandquality levels because their experience signals are notcontaminated by the transformative effect of adver-tising. After controlling for the true mean brand qual-ities, the choices of the consumers who do watch TVcommercials allow the authors to identify the bias ofthe advertising signals and the transformative effectsof advertising.

    Camacho et al. (2011) also model perception biases,but in a simpler framework. They argue that sometypes of experience may be more salient in certain

    contexts. For example, a physician may pay specialattention to feedback from patients who have justswitched treatment. The authors modify the standardBayesian model by introducing a salience parameterto capture the extra weight physicians may attach tosignals in that case. Using data on asthma drugs, theyfind evidence that feedback from switching patientsreceives 710 times more weight in physician learningthan feedback from other patients.

    Zhao et al. (2011) allow for consumer uncertaintyabout the precision of quality signals. In their model,consumers update their perception of this precisionover time. In particular, consumers who receive a very

    negative experience signal may change their percep-tion of signal variance. They estimate the model usingscanner data that span the period of a product-harmcrisis affecting Kraft Australias peanut butter divi-sion in June 1996. Their model fits the data better thana standard learning model, which assumes consumersknow the true signal variance.

    3.1.2. Models of Correlated Learning. Anotherstream of literature models information spilloveracross brands, or correlated learning. By correlatedlearning, we mean learning about a brand in one

  • 8/11/2019 An Assessment Of

    16/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New DevelopmentsMarketing Science 32(6), pp. 913938, 2013 INFORMS 927

    category by using the same brand in another cate-gory and/or learning about one attribute (e.g., drugpotency) from another (e.g., side effects). This occursif priors and/or signals are correlated across productsor attributes.

    Erdem (1998) considers a model where priors are

    correlated across umbrella brands (i.e., brands thatoperate in multiple categories). She finds evidencethat consumers learn from experience across umbrella

    brands in the toothpaste and toothbrush categories.She shows that brand dilutions can occur if a brandin the parent category (toothbrush) is extended to anew product in a different category (toothpaste) andthe new product is not well received. This frameworkhas been extended to study decisions about fishinglocations (Marcoul and Weninger 2008) and adoptionof organic food products (Sridhar et al. 2012).29

    Other papers have extended learning models tomultiattribute settings where consumers use experi-

    ence of one attribute to draw inferences about otherattributes. Prescription drugs are a good example.Coscelli and Shum (2004) estimate a diffusion modelfor omeprazole, an anti-ulcer drug. The drug cantreat (i) heartburn, (ii) hypersecretory conditions, and(iii) peptic ulcers, and it can provide (iv) mainte-nance therapy. In the model, physicians know howsignals are correlated across the four conditions.In each patientphysician encounter, a physician onlyobserves a signal of the condition being treated, buthe or she uses it to update his or her multidimen-sional prior belief.

    Chan et al. (2013) also apply a multiattribute learn-

    ing model to the drug market. They assume expe-rience signals are correlated on the two dimensionsof side effects and effectiveness. They achieve iden-tification by supplementing revealed preference datawith data on self-reported reasons for switching:side effects or ineffectiveness. Interestingly, they finddetailing visits are more effective in reducing uncer-tainty about effectiveness than side effects.

    3.2. More Sophisticated Learning Models withForward-Looking Consumers

    Following Erdem and Keane (1996), several papershave made significant contributions in the area of

    learning models with forward-looking consumers.We discuss these in turn.Ackerberg (2003) deviates from Erdem and Keane

    in several dimensions. Most notably, he models bothinformative and persuasive effects of advertising.30

    29 Hendricks and Sorensen (2009) use a similar idea of informationspillover to explain the skewness of music CD sales. They find evi-dence that a successful new album release by an artist increases thelikelihood that consumers purchase that same artists older albums.30 The separate identification of informative and persuasive effectsof advertising relies on this qualitative implication of learning

    The persuasive effect is modeled as advertisingintensity shifting consumer utility directly. The infor-mational effect is modeled by allowing consumers todraw inferences about brand quality based on adver-tising intensity.31 This is quite different from the infor-mation mechanism in Erdem and Keane (1996), where

    ad content provides noisy signals of quality. Otherdifferences are as follows: (i) Ackerberg is primarilyinterested in learning about a new product, and hismodel allows for heterogeneity in consumers matchvalue with the new product; and (ii) he assumesit takes only one trial for consumers to learn thetrue match value. Estimating the model on scannerdata for yogurt, Ackerberg finds a strong, positiveinformational effect of advertising. However, the per-suasive effect is not significant.

    The key innovation of Crawford and Shum (2005)is allowing for multiattribute learning. In an appli-cation to prescription drugs, they argue that panel

    data allow them to identify two effects: (i) symp-tomatic effects, which affect a patients per-periodutility via symptom relief; and (ii) curative effects,which alter the probability of recovery. They allowedphysicians and patients to have uncertainty along

    both dimensions (although they abstract from corre-lated learning). They also endogenize the length oftreatment by allowing patients to recover. Their esti-mates imply substantial patient heterogeneity in drugefficacy. They go on to study the welfare cost of uncer-tainty relative to the first-best environment with nouncertainty. Welfare questions cannot be addressedwithout a structural model. However, after estimat-

    ing their model, Crawford and Shum can simulatethe removal of uncertainty by setting the initial priorvariance to zero and setting each consumers priormatch value to be the true match value. By conduct-ing this experiment, they find that consumer learningallows consumers to dramatically reduce the costs ofuncertainty.

    models: as consumers gather more information over time, themarginal benefits of informative advertising must fall. Therefore, ifadvertising has any impact on brand choice in the long run, it isdue to persuasive advertising. To our knowledge, Leffler (1981) isthe first paper that proposes this identification strategy. He imple-ments it in a reduced-form model using product-level sales data for

    new and old prescription drugs. Narayanan et al. (2005) make useof the same identification argument when estimating their struc-tural model using product-level data. Recently, Ching and Ishihara(2012) propose a new identification strategy to attack this problem:they argue that informative advertising should affect all productsthat share the same features/ingredients equally, but persuasiveadvertising should be brand specific. Ching and Ishihara imple-ment their identification strategy in a prescription drug marketwhere some drugs have the same active ingredient but are mar-keted under different brand names.31 That is, ad frequency itself signals brand quality, as in the the-oretical literature on advertising as burning money (which onlyhigh-quality brands can afford to do) (Kihlstrom and Riordan 1984).

  • 8/11/2019 An Assessment Of

    17/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New Developments928 Marketing Science 32(6), pp. 913938, 2013 INFORMS

    Erdem et al. (2008) is the first paper to our knowl-edge to model the quality signaling role of pricein the context of frequently purchased goods. Theyalso allow both advertising frequency and advertisingcontent to signal quality (combining the features ofAckerberg 2003 and Erdem and Keane 1996). And they

    allow use experience to signal quality such that con-sumers may engage in strategic sampling. Thus, this isthe only paper that allows for these four key sources ofinformation simultaneously. In the ketchup category,they find that use experience provides the most pre-cise information, followed by price, then advertising.The direct information provided by ad signals is foundto be more precise than the indirect information pro-vided by ad frequency. The main finding of Erdemet al. (2008), obtained via simulation of their model, isthat when price signals quality, frequent price promo-tions can erode brand equity in the long run. They notethat there is a striking similarity between the effect of

    price cuts in their model and in an inventory model.In each case, frequent price cuts reduce consumerswillingness to pay for a product: in the signaling case,it is done by reducing perceived quality; in the inven-tory case, by making it optimal to wait for discounts.We return to this issue in 4.

    Osborne (2011) is the first paper to allow forboth learning and switching costs as sources of statedependence in a forward-looking learning model.This is important because learning is the only sourceof brand loyalty in Erdem and Keane (1996). So it ispossible they only found learning to be important asa result of omitted switching costs. However, Osborne

    finds evidence that both learning and switching costsare present in the laundry detergent category. Whenlearning is ignored, cross elasticities are underesti-mated by up to 45%.32

    Erdem et al. (2005) represents a significant exten-sion of previous learning models, as it is the firstpaper where consumers actively decide how mucheffort to devote to search before buying a durable. Thiscontrasts with Roberts and Urban (1988), where word-of-mouth (WOM) signals are assumed to arrive exoge-nously, or Erdem and Keane (1996), where ad signalsarrive exogenously. Another novel feature of the paperis that there are several information sources to choosefrom (WOM, advertisements, magazine articles, etc.).

    32 Osborne (2011) allows for a continuous distribution of consumertypes. Of course, it is impossible to solve the DP problem foreach type (which is why the DP literature usually assumes discretetypes). Thus, some approximation is necessary here. Osborne esti-mates his model by adapting the Markov chain Monte Carlo algo-rithm developed by Imai et al. (2009) and extended by Norets (2009)to accommodate serially correlated errors. It is worth noting thatNarayanan and Manchanda (2009) also estimate a learning modelwith continuous distribution of consumer types. To estimate theirmodel, however, they needed to assume that agents are myopic.

    Also, in each period, consumers decide how many ofthese sources to use.33 In their application, Erdem et al.(2005) consider technology adoption (Apple/Mac ver-sus Windows) in personal computer markets wherethere is quality and price uncertainty. As in the

    brand choice problem, consumers are not perfectly

    informed about competing technologies. A specialaspect of high-tech durables, however, is rapid techni-cal progress. This causes the price of PCs to fall rapidlyover time. Thus, there are two incentives to delay pur-chase: (i) to get a better price and (ii) to search formore information about the technologies. Delay, how-ever, implies a forgone utility of consumption. Erdemet al. (2005) estimate their model on survey panel datacollected from consumers who are in the market fora PC. Their results indicate that consumers defer pur-chases both to gather more information and to get a

    better price. But, perhaps surprisingly, simulations oftheir model imply that learning is the more important

    reason for purchase delay.Another way for consumers to learn is by observingother consumers choices instead of their opinions,i.e., observational learning (Banerjee 1992). To capturethis idea, Zhang (2010) develops a new product adop-tion model with observational learning. In her model,consumers are forward-looking and heterogeneous.One interesting implication of her model is thatobservational learning would lead to slower productadoption compared with full informational sharing(i.e., all experience signals are common knowledge).She estimates her model using data from the U.S. kid-ney market and, using a counterfactual experiment,

    quantifies the extent of inefficiency caused by obser-vational learning (compared with full informationalsharing).34

    Che et al. (2011) develop a forward-looking con-sumer brand choice model with spillover effects inlearning and changing consumer needs over time.Theirs is the first paper to model correlated learningwith forward-looking consumers. They estimate theirmodel using scanner data for the disposable diaperscategory, where consumers have to go up a size peri-odically to accommodate growing babies. This leadsto an increase in strategic trial around the time ofneeding to change sizes. Their results imply that con-

    sumer experience of a particular size of a brand pro-vides a quality signal for other sizes, and consumerquality sensitivities are lower and price sensitivitieshigher for larger sizes than they are for smaller sizes.

    Finally, Dickstein (2012) considers a model inwhich forward-looking physicians are uncertain

    33 Chintagunta et al. (2012) model how physicians learn aboutdrugs using both patients experiences and detailing.34 Cai et al. (2009) provide interesting evidence for observationallearning by studying a natural experiment where customers of arestaurant are given a ranking of some popular dishes.

  • 8/11/2019 An Assessment Of

    18/27

    Ching, Erdem, and Keane: An Assessment of Progress, Challenges, and New DevelopmentsMarketing Science 32(6), pp. 913938, 2013 INFORMS 929

    about patients intrinsic preferences for multipledrugattributes.35 This is the first model of forward-looking agents that allows for information spilloveracross alternatives. A physician uses patients util-ity of consuming a drug at time t to update hisor her belief about patients preferences parameters.

    The Bayesian updating procedure for physicians issimilar to Bayesian inference in a linear regressionmodel. An interesting implication of this approach isthat, after seeing a negative outcome of drug A, physi-cians may want to avoid other drugs that share someof the attributes of drug A.36

    3.3. Modeling Consumer Learning UsingProduct-Level Data

    The estimation technique developed by Berry et al.(1995) (henceforth referred to as BLP) led to a large

    body of demand analysis that applies static discretechoice models primarily to product-level or market

    share data.37

    In general, however, BLP cannot be usedto estimate the demand systems generated by con-sumer learning models. Learning models are alwaysdynamic in that current sales affect future demandregardless of whether consumers are forward-lookingor myopic. Demand foronebrand in a learning modeldepends on the whole distribution (across consumers)of perceived quality for all brands. We are skepticalabout whether individual heterogeneity distributionscan becrediblyidentified from aggregate (i.e., product-level) data. Furthermore, it is difficult to combine sucha complex demand system with a supply side model.

    If one wants to estimate a dynamic demand system

    with consumer learning and only market share dataare available, clearly one has to abstract from theendogenous consumer heterogeneity generated byindividual-level purchase histories (see 2.1, Equa-tion (5)). To address this issue, Narayanan et al. (2005)and Ching (2000, 2010a) propose two related modifi-cations of the EK framework. Narayanan et al. (2005)assume that every agent has an identical purchasehistory; i.e., for each brand, the quantity purchasedis equally distributed across agents in each period.Ching (2000, 2010a) assume that consumers can learn

    35

    Learning about preferences may appear to be different fromlearning about attributes. Yet the two are equivalent as long as oneassumes that the utility function is linear in attributes and prefer-ence weights.36 To estimate his model, Dickstein (2012) uses the Gittins indexapproach (Gittins and Jones 1979). Yet, as in Eckstein et al. (1988),who also use that approach, he needed to assume consumers arerisk neutral. It is worth noting that Ferreyra and Kosenok (2011) usea method similar to the Gittins index to estimate a simpler dynamiclearning problem.37 Ackerberg et al. (2007) provide an excellent survey of this area.Note that product- or market-level data are more