
An Overview of Choice Models

Dilan Görür

Gatsby Computational Neuroscience Unit
University College London

May 08, 2009
Machine Learning II

1 / 31

Outline

1 Overview
  Terminology and Notation
  Economic vs psychological views of decision making
  Brief History

2 Models of Choice from Psychology
  Bradley Terry Luce (BTL)
  Elimination by Aspects

3 Models of Choice from Economics
  Logit
  Nested Logit
  Probit
  Mixed Logit

2 / 31


Discrete Choice Models: Terminology

Goal of choice models: understand and model the behavioral process that leads to the subject’s choice.

Data: comparison of alternatives from a choice set
  Psychophysics experiments
  Marketing: which cell phone to buy

Alternatives:
  Items or courses of action
  Features (i.e. aspects) may or may not be known

4 / 31

Discrete Choice Models: Terminology

Choice set
  discrete – finitely many alternatives
  exhaustive – all possible alternatives are included
  mutually exclusive – choosing one implies not choosing any other

Data
  repeated choice from subsets of the choice set
  number of times each alternative is chosen over some others

Task: Learn the true choice probabilities

5 / 31

Discrete Choice Models: Different approaches

Algebraic or absolute theories in economics vs probabilistic or stochastic theories in psychology

Random utility models – randomness in the determination of subjective value
Constant utility models – randomness in the decision rule

6 / 31

Discrete Choice Models: Economic view of decision making

Desirability precedes availability
  Preferences are predetermined in any choice situation, and do not depend on what alternatives are available.

A vaguely biological flavor
  Preferences are determined from a genetically coded taste template.

The expressed preferences are functions of the consumer’s taste template, experience and personal characteristics.
Unobserved characteristics vary continuously with the observed characteristics of a consumer.

7 / 31

Discrete Choice Models: Psychological views of decision making

Alternatives are viewed as a set of aspects that are known.
The randomness in choice comes from the decision rule.

8 / 31

Discrete Choice Models: Different approaches

Psychology:
  Process Models
    Bradley-Terry-Luce (BTL)
    Elimination by Aspects (EBA)
  Diffusion Models
  Race Models

Economics: Random Utility Maximization Models
  logit, nested logit
  probit
  mixed logit

9 / 31

Discrete Choice Models: Basic assumptions

Simple scalability
  The alternatives can be scaled such that the choice probability is a monotone function of the scale values of the respective alternatives.

Independence from irrelevant alternatives (IIA)
  The relative probability (odds) of choosing one alternative over another does not depend on which other alternatives are in the choice set.
  Simple scalability implies IIA.

There are cases where both assumptions are violated, e.g. when alternatives share aspects.

10 / 31

Brief History

Thurstone (1927) introduced a Law of Comparative Judgment: alternative $i$ with true stimulus level $V_i$ is perceived with a normal error as $V_i + \varepsilon_i$, and the choice probability for a paired comparison satisfies $P_{\{1,2\}}(1) = \Phi(V_1 - V_2)$, a form now called the binomial probit model.

The Random Utility Maximization (RUM) model is Thurstone’s work introduced into economics by Marschak (1960), who explored the theoretical implications for choice probabilities of maximizing utilities that contain some random elements.

The independence from irrelevant alternatives (IIA) axiom, introduced by Luce (1959), allowed multinomial choice probabilities to be inferred from binomial choice experiments. Luce showed for positive probabilities that IIA implies strict utilities $w_i$ such that $P_C(i) = w_i / \sum_{k \in C} w_k$. Marschak proved for a finite universe of objects that IIA implies RUM.

11 / 31

Brief History

Multinomial Logit (MNL) was introduced by McFadden (1968), in which the strict utilities were specified as functions of observed attributes of the alternatives:

$$P_C(i) = \frac{\exp(V_i)}{\sum_{k \in C} \exp(V_k)}$$

McFadden showed that the Luce model was consistent with a RUM model with iid additive disturbances iff these disturbances had the Extreme Value Type I distribution.

12 / 31


Bradley Terry Luce (BTL)

Starts with the IIA assumption and arrives at the choice probabilities:

$$P(x; A) = \frac{U(x)}{\sum_{y \in A} U(y)}$$

$A$: choice set
$U(x)$: utility of alternative $x$
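The BTL formula can be sketched in a few lines. This is a minimal illustration, not taken from the slides: the function name `btl_probabilities` and the utility values are hypothetical. It also shows the IIA property that the model starts from, since the odds of one alternative over another are the same in every choice set.

```python
def btl_probabilities(utilities, choice_set):
    """Return P(x; A) = U(x) / sum_{y in A} U(y) for every x in choice_set."""
    total = sum(utilities[y] for y in choice_set)
    return {x: utilities[x] / total for x in choice_set}


# Hypothetical utilities for three alternatives (assumed values).
U = {"a": 2.0, "b": 1.0, "c": 1.0}

p_pair = btl_probabilities(U, ["a", "b"])
p_all = btl_probabilities(U, ["a", "b", "c"])

# IIA: the odds of "a" over "b" do not depend on whether "c" is available.
odds_pair = p_pair["a"] / p_pair["b"]
odds_all = p_all["a"] / p_all["b"]
print(p_pair, p_all, odds_pair, odds_all)
```

Adding alternative "c" lowers every individual probability but leaves the odds of "a" over "b" at U(a)/U(b).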

14 / 31

Elimination by aspects

Takes into account similarities between alternatives. Represents characteristics of the alternatives in terms of their aspects:

$$x' = \{\alpha, \beta, \ldots\}$$

Choice probabilities

$$P(x; A) = \frac{\sum_{\alpha \in x' - A^o} U(\alpha)\, P(x; A_\alpha)}{\sum_{\beta \in A' - A^o} U(\beta)}$$

$A'$: set of aspects that belong to at least one alternative in set $A$
$A^o$: set of aspects that belong to all alternatives in $A$
$A_\alpha$: set of alternatives in $A$ that include aspect $\alpha$
$U(\alpha)$: utility of aspect $\alpha$
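The recursion above can be sketched directly. This is a minimal illustration under stated assumptions: aspects are stored as Python sets, the function name `eba_probability` is hypothetical, and alternatives whose remaining aspects are identical are chosen with equal probability (a convention assumed here for the base case, which the slide does not state). The example alternatives, aspects and utilities are made up.

```python
def eba_probability(x, choice_set, aspects, utility):
    """P(x; A) under elimination by aspects, following the recursive formula.

    aspects: dict mapping each alternative to its set of aspects (x').
    utility: dict mapping each aspect to a positive utility U(alpha).
    """
    A = list(choice_set)
    if x not in A:
        return 0.0
    if len(A) == 1:
        return 1.0
    all_aspects = set().union(*(aspects[y] for y in A))        # A'
    common = set.intersection(*(set(aspects[y]) for y in A))   # A^o
    discriminating = all_aspects - common                      # A' - A^o
    if not discriminating:
        # Assumed convention: indistinguishable alternatives are equally likely.
        return 1.0 / len(A)
    denom = sum(utility[b] for b in discriminating)
    numer = 0.0
    for a in set(aspects[x]) - common:
        # A_alpha: alternatives in A that include aspect alpha.
        A_a = [y for y in A if a in aspects[y]]
        numer += utility[a] * eba_probability(x, A_a, aspects, utility)
    return numer / denom


# Hypothetical example: a car and two buses that differ only in a minor aspect.
aspects = {
    "car": {"car"},
    "bus_red": {"bus", "red"},
    "bus_blue": {"bus", "blue"},
}
utility = {"car": 1.0, "bus": 1.0, "red": 0.01, "blue": 0.01}

pair = ["car", "bus_red"]
three = ["car", "bus_red", "bus_blue"]
p_car_pair = eba_probability("car", pair, aspects, utility)
p_three = {x: eba_probability(x, three, aspects, utility) for x in three}
print(p_car_pair, p_three)
```

In this sketch the car keeps a probability close to 1/2 when the second bus is added, because the two buses share the "bus" aspect and mostly split that share between them.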

15 / 31

Elimination by aspects

Can cope with choice scenarios where IIA and simple scalability assumptions do not hold.

16 / 31

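Elimination by aspects: Independence from irrelevant alternatives

[Figure: choice between Bus and Car]

IIA implies that the probability of choosing the car drops from 1/2 to 1/3 when a second bus is introduced as the third alternative. If the two buses are nearly identical, one would instead expect the car's share to stay close to 1/2.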

17 / 31

Elimination by aspects: Simple scalability

Simple scalability assumes that the choice probabilities change smoothly with the scale (utility) of an alternative. However, in some situations introducing a small but unique aspect may lead to an immense change in the probabilities.

18 / 31


Random Utility Maximization

True utility

$$U_{nj} = V_{nj} + \varepsilon_{nj}$$

Choice probability

$$P_{ni} = \mathrm{Prob}(U_{ni} > U_{nj}\ \forall j \neq i)$$

$V_{nj} = V(x_{nj}, s_n)\ \forall j$
$x_{nj}$: observed attributes of alternative $j$
$s_n$: attributes of decision maker $n$

20 / 31

Random Utility Maximization

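True utility

$$U_{nj} = V_{nj} + \varepsilon_{nj}$$

Choice probability

$$\begin{aligned}
P_{ni} &= \mathrm{Prob}(U_{ni} > U_{nj}\ \forall j \neq i) \\
&= \mathrm{Prob}(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj}\ \forall j \neq i) \\
&= \mathrm{Prob}(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj}\ \forall j \neq i) \\
&= \int_{\varepsilon} I(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj}\ \forall j \neq i)\, f(\varepsilon_n)\, d\varepsilon_n,
\end{aligned}$$

where $\varepsilon_n = [\varepsilon_{n1}, \ldots, \varepsilon_{nJ}]$ and $I(\cdot)$ is the indicator function.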
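The integral over the error vector can be approximated by simulation for any error distribution. This is a minimal Monte Carlo sketch, not part of the slides: the function name `simulate_choice_probabilities`, the representative utilities and the choice of iid standard normal errors are all assumptions made for illustration.

```python
import numpy as np


def simulate_choice_probabilities(V, draw_errors, n_draws=200_000, seed=0):
    """Estimate P_ni = Prob(V_ni + eps_ni > V_nj + eps_nj for all j != i).

    V: representative utilities, one per alternative.
    draw_errors: callable (rng, shape) -> array of error draws of that shape.
    """
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    eps = draw_errors(rng, (n_draws, V.size))   # one row of errors per draw
    chosen = (V + eps).argmax(axis=1)           # utility-maximizing alternative
    return np.bincount(chosen, minlength=V.size) / n_draws


# Hypothetical utilities; iid standard normal errors are an assumed example.
V = [1.0, 0.5, -0.2]
p_sim = simulate_choice_probabilities(
    V, lambda rng, shape: rng.standard_normal(shape)
)
print(p_sim)
```

The fraction of draws in which alternative i has the highest utility is an unbiased estimate of the indicator integral; swapping `draw_errors` changes the model without changing the estimator.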

21 / 31

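Logit: Extreme Value

Error density and distribution

$$f(\varepsilon_{nj}) = e^{-\varepsilon_{nj}}\, e^{-e^{-\varepsilon_{nj}}}, \qquad F(\varepsilon_{nj}) = e^{-e^{-\varepsilon_{nj}}}$$

Choice probability

$$P_{ni} = \int \Big( \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}} \Big)\, e^{-\varepsilon_{ni}}\, e^{-e^{-\varepsilon_{ni}}}\, d\varepsilon_{ni} = \frac{e^{V_{ni}}}{\sum_j e^{V_{nj}}} = \frac{e^{\beta^T x_{ni}}}{\sum_j e^{\beta^T x_{nj}}}$$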
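The closed form can be checked against the one-dimensional integral numerically. The sketch below is an illustration only: the function names, the integration limits, the grid size and the utilities are assumptions, and the integral is approximated with a hand-written trapezoid rule.

```python
import numpy as np


def logit_probabilities(V):
    """Closed-form logit probabilities exp(V_i) / sum_j exp(V_j)."""
    V = np.asarray(V, dtype=float)
    expV = np.exp(V - V.max())          # subtract the max for numerical stability
    return expV / expV.sum()


def logit_probability_by_integration(V, i, lo=-10.0, hi=30.0, n_points=40001):
    """Approximate the integral over eps_ni with the trapezoid rule."""
    V = np.asarray(V, dtype=float)
    eps = np.linspace(lo, hi, n_points)
    density = np.exp(-eps) * np.exp(-np.exp(-eps))          # f(eps_ni)
    others = np.delete(V, i)
    # F(eps_ni + V_ni - V_nj) for every j != i, one column per j.
    cdfs = np.exp(-np.exp(-(eps[:, None] + V[i] - others[None, :])))
    integrand = cdfs.prod(axis=1) * density
    return float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(eps)) / 2.0)


V = [1.0, 0.5, -0.2]                     # hypothetical representative utilities
closed_form = logit_probabilities(V)
numeric = np.array([logit_probability_by_integration(V, i) for i in range(len(V))])
print(closed_form, numeric)
```

The integration limits are wide enough here because the extreme value density is negligible outside them for utilities of this size; larger utility differences would need a wider grid.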

22 / 31

Nested Logit: Generalized Extreme Value

The set of alternatives is partitioned into subsets called nests such that:

IIA holds within each nest.
IIA does not hold in general for alternatives in different nests.

Error distribution

$$F(\varepsilon_n) = \exp\Big( -\sum_{k=1}^{K} \Big( \sum_{j \in B_k} e^{-\varepsilon_{nj}/\lambda_k} \Big)^{\lambda_k} \Big)$$

$\lambda_k$ is a measure of the degree of independence among the alternatives in nest $k$.

23 / 31

Nested Logit: Generalized Extreme Value

Choice probability, for alternative $i$ in nest $B_k$:

$$P_{ni} = \frac{e^{V_{ni}/\lambda_k} \Big( \sum_{j \in B_k} e^{V_{nj}/\lambda_k} \Big)^{\lambda_k - 1}}{\sum_{l=1}^{K} \Big( \sum_{j \in B_l} e^{V_{nj}/\lambda_l} \Big)^{\lambda_l}}$$

$\lambda_k$ is a measure of the degree of independence among the alternatives in nest $k$.
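The nested logit probability can be sketched as follows. This is an illustration under assumptions: the function name `nested_logit_probabilities`, the representation of nests as lists of indices, and the example with one car and two buses are not from the slides.

```python
import numpy as np


def nested_logit_probabilities(V, nests, lambdas):
    """Nested logit probabilities for non-overlapping nests.

    V: representative utilities, one per alternative.
    nests: list of lists of alternative indices (the sets B_k).
    lambdas: one independence parameter lambda_k per nest.
    """
    V = np.asarray(V, dtype=float)
    # Inner sums S_k = sum_{j in B_k} exp(V_j / lambda_k).
    sums = [np.exp(V[list(B)] / lam).sum() for B, lam in zip(nests, lambdas)]
    denom = sum(S ** lam for S, lam in zip(sums, lambdas))
    P = np.zeros_like(V)
    for B, lam, S in zip(nests, lambdas, sums):
        for i in B:
            P[i] = np.exp(V[i] / lam) * S ** (lam - 1.0) / denom
    return P


# Hypothetical example: alternative 0 is a car, alternatives 1 and 2 are buses.
V = [0.0, 0.0, 0.0]
nests = [[0], [1, 2]]
p_iia = nested_logit_probabilities(V, nests, lambdas=[1.0, 1.0])     # plain logit
p_nested = nested_logit_probabilities(V, nests, lambdas=[1.0, 0.1])  # correlated buses
print(p_iia, p_nested)
```

With every lambda equal to 1 the result is the logit split of 1/3 each; with a small lambda for the bus nest the car's probability moves back toward 1/2.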

23 / 31

Nested Logit

The model is RUM consistent if $0 \leq \lambda_k \leq 1\ \forall k$.
For $\lambda_k > 1$, the model is RUM consistent for some range of explanatory variables.
$\lambda_k < 0$ is RUM inconsistent, as it implies that improving attributes of an alternative can decrease its probability of being selected.
If $\lambda_k = 1$ the model reduces to Logit.
As $\lambda_k \to 0$ the model approaches the Elimination by Aspects model.

24 / 31

Generalizations of Nested Logit

Higher-level nests
Overlapping nests
  Cross Nested Logit
    multiple overlapping nests
  Ordered GEV
    correlation depends on the ordering of alternatives
  Paired Combinatorial Logit
    each pair of alternatives constitutes a nest with its own correlation
  Generalized Nested Logit
    multiple overlapping nests with a different membership weight to each nest for each alternative

25 / 31

Simple procedure for defining any GEV

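Define $Y_j = \exp(V_j)$, and consider a function $G = G(Y_1, \ldots, Y_J)$ with partial derivatives $G_i = \partial G / \partial Y_i$. The choice probabilities are

$$P_i = \frac{Y_i G_i}{G}$$

provided that $G$ satisfies:

1 $G \geq 0$ for all positive values of $Y_j\ \forall j$.
2 $G$ is homogeneous of degree one, i.e. $G(\rho Y_1, \ldots, \rho Y_J) = \rho\, G(Y_1, \ldots, Y_J)$.
3 $G \to \infty$ as $Y_j \to \infty$ for any $j$.
4 The cross partial derivatives of $G$ have alternating signs.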
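The procedure can be sketched numerically for an arbitrary generating function. This is a rough illustration, not a definitive implementation: the function name `gev_probabilities` is hypothetical, and the partial derivatives are approximated by central finite differences rather than derived analytically. The two example functions G are the logit sum and a nested form with one singleton nest, using assumed utilities.

```python
import numpy as np


def gev_probabilities(G, V, rel_step=1e-6):
    """Compute P_i = Y_i * G_i / G with Y_j = exp(V_j).

    G: callable taking the vector Y and returning a scalar.
    G_i is approximated by a central finite difference (an assumption of
    this sketch; an analytic derivative could be used instead).
    """
    Y = np.exp(np.asarray(V, dtype=float))
    G_value = G(Y)
    P = np.empty_like(Y)
    for i in range(Y.size):
        h = rel_step * Y[i]
        up, down = Y.copy(), Y.copy()
        up[i] += h
        down[i] -= h
        G_i = (G(up) - G(down)) / (2.0 * h)
        P[i] = Y[i] * G_i / G_value
    return P


V = [1.0, 0.5, -0.2]                      # hypothetical representative utilities

# Logit: G is the plain sum of the Y_j.
p_logit = gev_probabilities(lambda Y: Y.sum(), V)

# Nested form: alternative 0 alone, alternatives 1 and 2 in a nest with lambda = 0.5.
lam = 0.5
p_nested = gev_probabilities(
    lambda Y: Y[0] + (Y[1] ** (1 / lam) + Y[2] ** (1 / lam)) ** lam, V
)
print(p_logit, p_nested)
```

Because G is homogeneous of degree one, the products Y_i G_i add up to G, which is why the probabilities sum to one.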

26 / 31

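Simple procedure for defining any GEV: Examples

Logit:

$$G = \sum_{j=1}^{J} Y_j$$

Nested Logit:

$$G = \sum_{l=1}^{K} \Big( \sum_{j \in B_l} Y_j^{1/\lambda_l} \Big)^{\lambda_l}, \qquad 0 \leq \lambda_l \leq 1\ \forall l$$

Paired Combinatorial Logit:

$$G = \sum_{k=1}^{J-1} \sum_{l=k+1}^{J} \Big( Y_k^{1/\lambda_{kl}} + Y_l^{1/\lambda_{kl}} \Big)^{\lambda_{kl}}$$

Generalized Nested Logit:

$$G = \sum_{k=1}^{K} \Big( \sum_{j \in B_k} (\alpha_{jk} Y_j)^{1/\lambda_k} \Big)^{\lambda_k}$$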
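As a check, substituting the logit and nested logit generating functions into $P_i = Y_i G_i / G$ recovers the choice probabilities given on the earlier slides. The derivation below is a worked sketch; the nested logit case takes alternative $i$ to lie in nest $B_k$.

```latex
% Logit: G = \sum_j Y_j, so G_i = 1.
P_i = \frac{Y_i \cdot 1}{\sum_{j} Y_j} = \frac{e^{V_i}}{\sum_{j} e^{V_j}}

% Nested logit, for i in nest B_k (chain rule on the k-th term of G):
G_i = Y_i^{1/\lambda_k - 1} \Big( \sum_{j \in B_k} Y_j^{1/\lambda_k} \Big)^{\lambda_k - 1}

% Multiply by Y_i, divide by G, and substitute Y_j = e^{V_j}:
P_i = \frac{Y_i G_i}{G}
    = \frac{e^{V_i/\lambda_k} \Big( \sum_{j \in B_k} e^{V_j/\lambda_k} \Big)^{\lambda_k - 1}}
           {\sum_{l=1}^{K} \Big( \sum_{j \in B_l} e^{V_j/\lambda_l} \Big)^{\lambda_l}}
```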

27 / 31

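Probit: Normal

Error distribution: Normal

$$\phi(\varepsilon_n) = \frac{1}{(2\pi)^{J/2} |\Omega|^{1/2}}\, e^{-\frac{1}{2} \varepsilon_n^T \Omega^{-1} \varepsilon_n}$$

Choice probabilities

$$P_{ni} = \int I(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj}\ \forall j \neq i)\, \phi(\varepsilon_n)\, d\varepsilon_n$$

This integral does not have a closed form.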
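Since the integral has no closed form, it is commonly approximated by simulation. The sketch below uses a plain frequency simulator as an assumed illustration (the function name `probit_probabilities`, the utilities and the covariance are hypothetical). The two-alternative case with $\Omega = I$ is used as a sanity check because there the error difference is normal with variance 2, so the probability is $\Phi((V_1 - V_2)/\sqrt{2})$.

```python
import math

import numpy as np


def probit_probabilities(V, Omega, n_draws=200_000, seed=0):
    """Estimate probit choice probabilities by drawing eps_n ~ N(0, Omega)."""
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    eps = rng.multivariate_normal(np.zeros(V.size), Omega, size=n_draws)
    chosen = (V + eps).argmax(axis=1)     # alternative with the highest utility
    return np.bincount(chosen, minlength=V.size) / n_draws


# Binary check with independent unit-variance errors (assumed example values).
V = [0.5, 0.0]
p_sim = probit_probabilities(V, np.eye(2))

z = (V[0] - V[1]) / math.sqrt(2.0)        # difference of errors has variance 2
p_exact = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF at z
print(p_sim[0], p_exact)
```

The same function accepts a full covariance matrix, which is what lets probit represent correlated alternatives.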

28 / 31

Mixed Logit: can approximate any RUM model

A hierarchical model where the logit parameters $\beta$ are given prior distributions, so that the choice probabilities are given as:

$$P_{ni} = \int L_{ni}(\beta)\, f(\beta)\, d\beta,$$

where $L_{ni}(\beta)$ is the logit probability evaluated at parameters $\beta$:

$$L_{ni}(\beta) = \frac{e^{V_{ni}(\beta)}}{\sum_{j=1}^{J} e^{V_{nj}(\beta)}}$$

$f(\beta)$ can be discrete or continuous, e.g. if $\beta$ takes $M$ possible values, we have a latent class model.
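Both cases can be sketched briefly. The code below is an illustration under assumptions: utilities are taken to be linear, $V_{nj}(\beta) = \beta^T x_{nj}$, the continuous case averages logit probabilities over random draws of $\beta$, and the discrete (latent class) case is an exact weighted sum. All function names, attribute values and distribution parameters are hypothetical.

```python
import numpy as np


def logit_probabilities(V):
    """Row-wise logit probabilities for an array of utilities."""
    V = np.asarray(V, dtype=float)
    expV = np.exp(V - V.max(axis=-1, keepdims=True))
    return expV / expV.sum(axis=-1, keepdims=True)


def mixed_logit_simulated(X, draw_beta, n_draws=5000, seed=0):
    """Approximate the integral of L_ni(beta) f(beta) by averaging over draws.

    X: (J, K) attribute matrix, one row per alternative.
    draw_beta: callable (rng, n) -> (n, K) array of draws from f(beta).
    """
    rng = np.random.default_rng(seed)
    betas = draw_beta(rng, n_draws)            # (R, K)
    V = betas @ X.T                            # (R, J), linear utilities assumed
    return logit_probabilities(V).mean(axis=0)


def latent_class_probabilities(X, betas, weights):
    """Discrete f(beta): weighted sum of logit probabilities over M classes."""
    V = np.asarray(betas, dtype=float) @ X.T   # (M, J)
    return np.asarray(weights, dtype=float) @ logit_probabilities(V)


X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])   # hypothetical attributes
p_mixed = mixed_logit_simulated(
    X, lambda rng, n: rng.normal([1.0, -0.5], [0.5, 1.0], size=(n, 2))
)
p_latent = latent_class_probabilities(
    X, betas=[[1.0, -0.5], [0.0, 2.0]], weights=[0.7, 0.3]
)
print(p_mixed, p_latent)
```

A latent class model with a single class reduces to plain logit, which gives a quick way to test the helper.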

29 / 31


Extensions of Choice Models

Model extensions
  Nonparametric distributions over the noise distribution
  Nonparametric distributions over explanatory variable parameters
  Nonparametric prior over the functions on explanatory variables

Application areas
  Conjoint analysis
  Ranking, information retrieval

30 / 31

References

McFadden, "Economic Choices", Nobel Prize Lecture, 2000
Train, "Discrete Choice Methods with Simulation", Cambridge University Press, 2003
Tversky, "Elimination by Aspects: A Theory of Choice", Psychological Review, 1972
Greene, "Discrete Choice Modeling", 2008
Cao et al., "Learning to Rank: From Pairwise to Listwise Approach", ICML 2007
Chu, Ghahramani, "Preference Learning with Gaussian Processes", ICML 2005

31 / 31
