
Estimating Discrete Choice Models with Market Level Zeroes: An Application to Scanner Data

Amit Gandhi, Zhentong Lu, Xiaoxia Shi

University of Wisconsin-Madison

February 3, 2015

Introduction

- Zeroes are highly prevalent in choice data
  - Discrete choice models (à la McFadden) were designed to explain corner solutions in individual demand
- Our research program: the empirical analysis of choice data with market zeroes.
- Zero demand for a choice alternative after summing over a sample of consumers in a market
- A major feature of choice data from a diversity of environments
- Causes serious problems for standard estimation techniques

Scanner Data

- Store-level scanner data covering all Dominick’s Finer Foods (DFF) stores in Chicago from 1989-1997
- ≈ 80 stores over 300 weeks
- For each week/store/UPC (universal product code) observation:
  - price
  - quantity
  - marketing (display, feature, etc.)
  - product characteristics (brand, size, premium, etc.)
  - wholesale price

Product Variety

| Category | Avg. No. of UPCs in a Store/Week | Percent of Total Sales of the Top 20% of UPCs | Percent of Zero Sales |
|---|---|---|---|
| Analgesics | 224 | 80.12% | 58.02% |
| Beer | 179 | 87.18% | 50.45% |
| Bottled Juices | 187 | 74.40% | 29.87% |
| Cereals | 212 | 72.08% | 27.14% |
| Canned Soup | 218 | 76.25% | 19.80% |
| Fabric Softeners | 123 | 65.74% | 43.74% |
| Laundry Detergents | 200 | 65.52% | 50.46% |
| Refrigerated Juices | 91 | 83.18% | 27.83% |
| Soft Drinks | 537 | 91.21% | 38.54% |
| Toothbrushes | 137 | 73.69% | 58.63% |
| Canned Tuna | 118 | 82.74% | 35.34% |
| Toothpastes | 187 | 74.19% | 51.93% |
| Bathroom Tissues | 50 | 84.06% | 28.14% |

Long Tail


A “Big Data” Problem

- Quan and Williams (2014): data on 13.5 million shoe sales across 100,000 products from an online retailer
- Marwell (2014): collects daily data on project donations to Kickstarter

| Kickstarter | |
|---|---|
| # Projects | 90,876 |
| # Days | 555 |
| Project-Day Obs. | 2,615,839 |

| | # Daily Projects | % Zero Contribution |
|---|---|---|
| Mean | 4,713 | 0.59 |
| Std. Dev. | 1,706 | 0.14 |

A “not so Big Data” Problem

- Nurski and Verboven (2013): Belgian data on 488 car models in 588 towns for 2 consumer types (men and women).

Discrete Choice Model

- Classic McFadden (1973, 1980) discrete choice model
- Markets $t = 1, \dots, T$ are the store/week realizations (a menu of products, prices, and promotion)
- Products $j = 1, \dots, J_t$ with attributes $x_{jt} \in \mathbb{R}^{d_w}$
- Consumers $i = 1, \dots, N_t$ with “demographics” $w_{it} \in \mathbb{R}^{d_z}$

$$u_{ijt} = \begin{cases} \delta_{jt} + w_{it} \Gamma x_{jt} + \varepsilon_{ijt} & \text{if } j > 0 \\ \varepsilon_{i0t} & \text{if } j = 0 \end{cases}$$

- BLP (1995) add the new layer:

$$\delta_{jt} = \beta x_{jt} + \xi_{jt}$$

The Zeroes Problem

- Consider the simplest case of “simple logit” ($\Gamma = 0$):

$$\log\left(\frac{s_{jt}}{s_{0t}}\right) = \beta x_{jt} + \xi_{jt}, \qquad jt = 1, \dots, JT$$

  where $E[\xi_{jt} \mid z_{jt}] = 0$.
- If $s_{jt} = 0$ then $\log(s_{jt})$ does not exist (or can only be defined as $-\infty$).
- However, dropping zeroes induces selection bias:

$$E[\xi_{jt} \mid z_{jt}, s_{jt} > 0] \neq 0$$

- IV estimation is asymptotically biased for $\beta$ (and the bias can be severe).
  - The size of the bias depends on the strength of this selection effect.
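The selection effect can be seen in a small simulation. The sketch below is only an illustration under assumed values, not the authors' Monte Carlo design: the number of markets, the number of consumers per market, the intercept, the range of the attribute, and the standard deviation of the unobservable are all hypothetical choices. The attribute is exogenous here, so it serves as its own instrument and the IV estimator reduces to OLS.

```python
import math
import random

def simulate_binary_logit(T=2000, n=200, alpha=0.0, beta=-1.0, seed=0):
    """Simulate T markets with one inside good and n consumers each.

    Returns a list of (x_t, q_t) pairs, where q_t is the number of
    consumers (out of n) who chose the inside good.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(T):
        x = rng.uniform(0.0, 10.0)           # observed attribute
        xi = rng.gauss(0.0, 1.0)             # unobservable, independent of x
        delta = alpha + beta * x + xi        # mean utility
        pi = 1.0 / (1.0 + math.exp(-delta))  # true choice probability
        q = sum(rng.random() < pi for _ in range(n))
        data.append((x, q))
    return data

def ols_slope(xs, ys):
    """Slope from a least-squares regression of ys on xs (with intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

n = 200
data = simulate_binary_logit(n=n)
n_zero = sum(1 for _, q in data if q == 0)

# Keep only markets where log(s / s0) is finite, i.e. drop the zeroes
# (and, defensively, the rare corner q == n where s0 would be zero).
kept = [(x, q) for x, q in data if 0 < q < n]
xs = [x for x, _ in kept]
ys = [math.log(q / (n - q)) for _, q in kept]
beta_dropped = ols_slope(xs, ys)

print(f"zero-share markets: {n_zero} of {len(data)}")
print(f"slope after dropping zeroes: {beta_dropped:.3f} (true beta = -1)")
```

Markets with a high attribute value survive the selection only when their unobservable draw is favourable, so the estimated slope on the retained sample is pulled towards zero.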

Questions

- Why would the model generate an estimating equation that can’t be estimated with the data?

- Is it a deep rejection of the choice model?

- Is it a problem with the empirical strategy of taking the model to data?

Identification

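- For simplicity focus on “simple logit” ($\Gamma = 0$):

$$\pi_{jt} = \frac{e^{\delta_{jt}}}{\sum_{k=0}^{J_t} e^{\delta_{kt}}}, \qquad j = 0, \dots, J_t.$$

- Consumer variation:

$$\delta_{jt} = \log\left(\frac{\pi_{jt}}{\pi_{0t}}\right) = \sigma_j^{-1}(\pi_t)$$

- Products/markets variation:

$$\delta_{jt} = \beta x_{jt} + \xi_{jt} \implies \beta = \left(E\left[z'_{jt} x_{jt}\right]\right)^{-1} E\left[z'_{jt} \delta_{jt}\right]$$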

Estimation

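- Standard estimation (aka BLP) uses sample analogues of both stages.
- Replace $\pi_{jt}$ with

$$\pi^{MLE}_{jt} = s_{jt} = \frac{\sum_i y_{ijt}}{n_t},$$

  which implies $\delta^{MLE}_{jt} = \sigma_j^{-1}(s_t)$
- Plug $\delta^{MLE}_{jt}$ into 2SLS:

$$\hat{\beta} = \left(\sum_{t=1}^{T} \sum_{j=1}^{J_t} z'_{jt} x_{jt}\right)^{-1} \sum_{t=1}^{T} \sum_{j=1}^{J_t} z'_{jt} \delta^{MLE}_{jt}$$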
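The two stages can be sketched in a few lines for the simple logit with a single regressor and a single instrument. This is a minimal illustration, not the authors' code: the function names `invert_logit_shares` and `iv_scalar`, and the toy quantities, attributes, and instruments, are all hypothetical.

```python
import math

def invert_logit_shares(shares):
    """Map market shares (s_0, s_1, ..., s_J) to mean utilities delta_1..delta_J.

    Index 0 is the outside good. Raises ValueError on a zero share, which is
    exactly the case the empirical-share plug-in cannot handle.
    """
    if any(s <= 0 for s in shares):
        raise ValueError("log share undefined: some share is zero")
    return [math.log(s / shares[0]) for s in shares[1:]]

def iv_scalar(z, x, delta):
    """Just-identified IV with one regressor: (sum z*x)^(-1) * sum z*delta."""
    zx = sum(zi * xi for zi, xi in zip(z, x))
    zd = sum(zi * di for zi, di in zip(z, delta))
    return zd / zx

# Two toy markets; each row is (q_0, q_1, q_2): counts for the outside
# good and two products. All numbers are made up for illustration.
quantities = [(50, 30, 20), (60, 25, 15)]
x = [1.0, 2.0, 1.5, 2.5]   # hypothetical attribute, one per product/market
z = [1.0, 2.1, 1.4, 2.6]   # hypothetical instrument, one per product/market

delta = []
for q in quantities:
    n_t = sum(q)
    shares = [qk / n_t for qk in q]          # first stage: empirical shares
    delta.extend(invert_logit_shares(shares))

beta_hat = iv_scalar(z, x, delta)            # second stage: pooled IV
print(beta_hat)
```

Replacing any product count with zero makes `invert_logit_shares` raise, which is the failure the following slides address.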

What is happening?

- Source of the problem is $\delta_{jt} = \sigma_j^{-1}\left(\pi^{MLE}_t\right)$
  - does not exist when $\pi^{MLE}_{jt} = 0$
- Why use MLE in the first place?
  - MLE is potentially a bad estimator when choice data is sparse.
- Very old problem
  - Laplace’s “Law of Succession”
  - Multinomial cell probabilities and sparse contingency tables
- Zeroes arise when some $\pi_{jt}$’s are small and $n_t$ is finite.
  - Treating $n_t$ as finite but $JT \to \infty$ makes $\pi_{jt}$, and hence $\delta_{jt}$, an incidental parameter.

Bayesian Analysis of Multinomial Cells

- Consider multinomial probabilities $\pi_t = (\pi_{0t}, \dots, \pi_{J_t t}) \in \Delta^{J_t}$
- We observe quantities $q_t = (q_{0t}, q_{1t}, \dots, q_{J_t t})$ for $n_t$ consumers
- The likelihood of $\pi_t$ is $q_t \sim MN(n_t, \pi_t)$
- Conjugate prior is $\pi_t \sim Dir(\alpha_{0t}, \dots, \alpha_{J_t t})$
  - Uniform prior: $\alpha_{jt} = 1$ (Laplace/De Morgan)
  - Non-informative prior: $\alpha_{jt} = .5$ (Jeffreys/Bernardo)
- Posterior is

$$\pi_t \mid q_t, n_t \sim Dir(\alpha_{0t} + q_{0t}, \alpha_{1t} + q_{1t}, \dots, \alpha_{J_t t} + q_{J_t t})$$

Laplace’s “Law of Succession”

- “What is the probability the sun will rise tomorrow given that it has risen every day until now?”
- He used a uniform prior $\alpha_{jt} = 1$
- Bayesian estimate:

$$\hat{\pi}_{jt} = E[\pi_{jt} \mid q_t, n_t] = \frac{q_{jt} + 1}{n_t + J_t + 1}$$

- $\hat{\pi}_{jt}$ “shrinks” the empirical share $s_{jt}$ towards the prior mean $1/(J_t + 1)$
- $\hat{\pi}_{jt}$ is consistent (like $s_{jt}$), i.e., $\hat{\pi}_{jt} \to_p \pi_{jt}$
  - Data dominates the prior in large samples
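As a worked example of the shrinkage, the sketch below computes the posterior mean under a symmetric Dirichlet prior for a small vector of counts. The function name `posterior_mean_shares` and the counts are hypothetical; setting `alpha=1.0` gives the Laplace rule and `alpha=0.5` the Jeffreys prior.

```python
def posterior_mean_shares(q, alpha=1.0):
    """Posterior mean of multinomial probabilities under a symmetric Dirichlet prior.

    q     : counts (q_0, ..., q_J) for the outside good and J products
    alpha : common Dirichlet parameter (1.0 = Laplace/uniform, 0.5 = Jeffreys)
    """
    n = sum(q)
    k = len(q)  # J + 1 cells, including the outside good
    return [(qk + alpha) / (n + alpha * k) for qk in q]

counts = [7, 3, 0]                                  # last product has zero sales
empirical = [c / sum(counts) for c in counts]       # contains an exact zero
laplace = posterior_mean_shares(counts, alpha=1.0)  # (8/13, 4/13, 1/13)

print(empirical)
print(laplace)
```

The zero cell receives a strictly positive estimate, and each share moves towards the prior mean of 1/3; as the number of consumers grows with the shares held fixed, the adjustment vanishes.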

Demand Application

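- We want to estimate

$$\hat{\delta}_{jt} = E\left[\log\left(\frac{\pi_{jt}}{\pi_{0t}}\right) \,\Big|\, q_t, n_t\right] = \psi(\alpha_{jt} + q_{jt}) - \psi(\alpha_{0t} + q_{0t})$$

  where $\psi$ is the digamma function.
- Use $\hat{\delta}_t$ to compute “optimal market shares”

$$\pi^*_{kt} = \frac{\exp(\hat{\delta}_{kt})}{1 + \sum_{j=1}^{J_t} \exp(\hat{\delta}_{jt})}$$

- Plug optimal shares into 2SLS:

$$\hat{\beta} = \left(\sum_{t=1}^{T} \sum_{j=1}^{J_t} z'_{jt} x_{jt}\right)^{-1} \sum_{t=1}^{T} \sum_{j=1}^{J_t} z'_{jt} \sigma_j^{-1}(\pi^*_t)$$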

Why is this a good estimator?

- We take a “Frequentist” interpretation of the prior
  - “Empirical Bayes” approach.
- Choice probabilities $\pi_t$ are the endogenous variable of the structural model.
  - Let $z_t = (z_{1t}, \dots, z_{J_t t})$ be the collection of exogenous variables
  - Then the conditional distribution $\pi_t \mid z_t$ is the reduced form of the structural model
  - Prior distribution = Reduced form

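Asymptotic Bias

- Finite $n_t$ implies $\hat{\beta}$ will in general have asymptotic bias.

$$\operatorname*{plim}_{JT \to \infty} \hat{\beta} = \beta + Q_{xz}^{-1} E\left[z'_{jt}\left(\sigma_{jt}^{-1}(\hat{\pi}_t) - \sigma_{jt}^{-1}(\pi_t)\right)\right]$$

  where $Q_{xz} = E\left[z'_{jt} x_{jt}\right]$.

Theorem. If optimal market shares $\pi^*_t$ are constructed from the “correct prior” $F_{\pi \mid z_t} = F^0_{\pi_t \mid z_t}$, then

$$E\left[z'_{jt}\left(\sigma_{jt}^{-1}(\pi^*_t) - \sigma_{jt}^{-1}(\pi_t)\right)\right] = 0$$

- Thus optimal market shares give consistent estimates: $\hat{\beta} \to_p \beta$.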

Robust Prior

- What happens if we are not exactly right about the prior, i.e., $F_{\pi \mid z_t} \approx F^0_{\pi_t \mid z_t}$?
- Use the “Robust Priors” approach of Arellano and Bonhomme (ECMA 2009).

Theorem. If the prior $F_t \neq F^0_t$ is not exact, then

$$E\left[z'_{jt}\left(\sigma_{jt}^{-1}(\hat{\pi}_t) - \sigma_{jt}^{-1}(\pi_t)\right)\right] = n_t^{-1} \, KLIC\left(F^0_t, F_t\right) + o\left(n_t^{-1}\right)$$

- So long as the prior is sensible (and $n_t$ relatively large), the bias reduction will be good (and much better than with the implicit MLE prior)

Dirichlet and the Long Tail

- Dirichlet is a conjugate prior
  - gives closed form optimal shares $\pi^*_t$
- Dirichlet prior also gives rise to the long tail
  - A key feature of demand data.

Theorem. If $q_t \sim MN(n_t, \pi_t)$ and $\pi_t \sim Dir(\alpha \cdot 1_{J_t + 1})$ (symmetric Dirichlet), then (for large $J_t$) the quantity histogram will exhibit the long tail shape (Pareto decay)

- A restatement of Chen (1980) on probability foundations for Zipf’s Law
- $\alpha$ is the concentration parameter

An Illustration: 500 Products and 10,000 Consumers

Figure: Zipf’s Law and the Symmetric Dirichlet
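A simulation in the spirit of this illustration can be sketched as follows. The product and consumer counts match the slide, but the concentration parameter `a=0.5`, the seed, and the omission of an outside good are assumptions made only for this example; the function name is hypothetical.

```python
import random

def simulate_dirichlet_multinomial(n_products=500, n_consumers=10_000, a=0.5, seed=0):
    """Draw quantities from a multinomial whose probabilities are symmetric Dirichlet."""
    rng = random.Random(seed)
    # A symmetric Dirichlet draw is a vector of i.i.d. Gamma(a, 1) variables, normalized.
    g = [rng.gammavariate(a, 1.0) for _ in range(n_products)]
    total = sum(g)
    pi = [gi / total for gi in g]
    # Each consumer picks one product with probabilities pi (a multinomial draw).
    choices = rng.choices(range(n_products), weights=pi, k=n_consumers)
    q = [0] * n_products
    for j in choices:
        q[j] += 1
    return q

q = simulate_dirichlet_multinomial()
ranked = sorted(q, reverse=True)                       # rank-size ordering
top20_share = sum(ranked[: len(ranked) // 5]) / sum(ranked)
zero_frac = sum(1 for c in ranked if c == 0) / len(ranked)

print(f"share of sales from top 20% of products: {top20_share:.2f}")
print(f"fraction of products with zero sales:    {zero_frac:.2f}")
```

Plotting `ranked` against rank on log-log axes gives the kind of long-tail picture the figure refers to; smaller values of `a` concentrate sales on fewer products and produce more zeroes.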

Picking the Prior

- Jeffreys prior:

$$\pi_t \mid z_t \sim Dir(.5 \cdot 1_{J_t + 1})$$

- If $\pi_t \sim Dir(\alpha \cdot 1_{J_t + 1})$ and $q_t \sim MN(n_t, \pi_t)$ then

$$q_t \sim DirichletMultinomial(\alpha)$$

  - $\alpha$ can be estimated with MLE.
- More generally we can allow $\alpha_{jt} = \gamma z_{jt}$ and estimate $\gamma$ (built into Stata).
- We can also allow for mixtures of Dirichlet priors for increased flexibility at little analytic cost
  - Posterior is also a mixture of Dirichlet distributions
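Estimating the concentration parameter by maximum likelihood can be sketched with the closed-form Dirichlet-multinomial likelihood. The grid search below is a deliberately crude stand-in for a proper optimizer, and the function names, toy markets, and grid are hypothetical.

```python
import math

def dm_loglik(q, a):
    """Log pmf of counts q under a symmetric Dirichlet-multinomial with parameter a."""
    n = sum(q)
    k = len(q)
    ll = math.lgamma(n + 1) + math.lgamma(k * a) - math.lgamma(n + k * a)
    for qk in q:
        ll += math.lgamma(qk + a) - math.lgamma(a) - math.lgamma(qk + 1)
    return ll

def fit_concentration(markets, grid):
    """Grid-search MLE of a, pooling markets that are assumed independent."""
    return max(grid, key=lambda a: sum(dm_loglik(q, a) for q in markets))

# Toy count vectors with a few large cells and several zeroes (made-up numbers).
markets = [
    [40, 5, 3, 1, 1, 0, 0, 0],
    [30, 12, 4, 2, 1, 1, 0, 0],
    [45, 2, 2, 1, 0, 0, 0, 0],
]
grid = [0.05 * i for i in range(1, 61)]   # candidate values 0.05, 0.10, ..., 3.00
a_hat = fit_concentration(markets, grid)
print(a_hat)
```

The fitted value can then be used as the common prior parameter when forming posterior shares; letting the parameter depend on covariates, as the slide suggests, would replace the scalar grid with a regression-style parameterization.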

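Mixed Logit

- All the theory generalizes to mixed logit models:

$$(\lambda'_{Bayes}, \beta'_{Bayes})' = \arg\min_{\lambda, \beta} \; m^{Bayes}_T(\lambda, \beta)' \, W_T \, m^{Bayes}_T(\lambda, \beta), \qquad (0.1)$$

  where $m^{Bayes}_T(\lambda, \beta) = T^{-1} \sum_{t=1}^{T} m^{Bayes}_t(\lambda, \beta)$ with

$$m^{Bayes}_t(\lambda, \beta) = J_t^{-1} \sum_{j=1}^{J_t} z_{jt}\left[\sigma_j^{-1}(\pi^{Bayes}_t, x_t; \lambda) - x'_{jt}\beta\right] \qquad (0.2)$$

  and

$$\pi^{Bayes}_t := \sigma(\delta^{post}_t(\lambda \mid q_t), x_t; \lambda). \qquad (0.3)$$

- $\sigma_j^{-1}(\pi_t, x_t; \lambda) \approx \log\left(\frac{\pi_{jt}}{\pi_{0t}}\right)$ with second order approximation (Gandhi and Nevo 2013)
- Log of zero is the first order problem for mixed logit
- Can use logit optimal shares as an approximation to the optimal shares in general.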

Monte Carlo I: Binary Logit

- DGP
  - utility function:

$$u_{it} = \begin{cases} \alpha + \beta x_t + \xi_t + \varepsilon_{it} & \text{inside good} \\ \varepsilon_{0t} & \text{outside good} \end{cases}$$

  - random draws: $x_t \sim Uniform[0, 15]$, $\varepsilon_{it} \sim T1EV$, $\xi_t \sim N(0, .5^2) \times x_t$
  - $\beta = -1$, $\alpha$ varies to produce different fractions of zeros
- Results

| Fraction of Zeros | 16.48% | 36.90% | 49.19% | 63.70% |
|---|---|---|---|---|
| Empirical Share | .3833 | .6589 | .7965 | .9424 |
| Laplace Share | .2546 | .5394 | .6978 | .8476 |
| Optimal Share | -.0798 | -.0924 | -.0066 | .0362 |

Note: T = 500, n = 10,000, Number of Repetitions = 1,000.

Monte Carlo II: Nested Logit

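- DGP
  - utility function (Berry 1994):

$$u_{it} = \alpha + \beta x_t + \xi_t + \left[\sum_g d_{jg} \zeta_{ig}\right] + (1 - \lambda)\varepsilon_{it}$$

  - nesting structure $g$: {0}, {1, ..., 25}, {26, ..., 50} with nesting parameter $\lambda$
  - parameters of interest: $\beta$ (true value = -1), $\lambda$ (true value = .5)
  - vary $\alpha$ to produce different fractions of zeros
- Results

| Fraction of Zeros | β: ES | β: LS | β: OS | λ: ES | λ: LS | λ: OS |
|---|---|---|---|---|---|---|
| 13.3% | -.16 | .28 | .03 | -.06 | .07 | -.01 |
| 20.4% | -.20 | .32 | .03 | 0.07 | .10 | -.01 |
| 32.3% | -.27 | .33 | .02 | 0.11 | .13 | -.01 |
| 48.1% | -.38 | .31 | .00 | -.15 | .13 | -.02 |

Note: J = 50, T = 500, n = 15,000, Number of Repetitions = 1,000. ES = Empirical Share, LS = Laplace Share, OS = Optimal Share.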

Application: Loss Leader Hypothesis

Figure: price (standardized) and quantity (standardized) by week, weeks 0 to 100.

Application: Testing the Loss Leader Hypothesis

- Chevalier, Kashyap, and Rossi [2003] introduce a test:
  - When a product becomes more popular its price can fall.
- Category-specific effect distinguishes loss-leader from other theories of countercyclical prices (Warner and Barsky [1995])
- Big empirical effect for tuna during Lent.

Tuna demand

What does an ounce of tuna cost?

- Index weeks in the data by $t$, stores by $s$, and the UPCs by $j$.
- $s_{jst}$ = market share (in ounces) in week $t$ of tuna $j$ in store $s$.
- $p_{jst}$ = price/oz for tuna $j$ in store $s$ at time $t$.
- Price index for tuna in week $t$ is

$$P_t = \sum_{j,s} s_{jst} \log p_{jst}.$$

- Actual average (Nevo and Hatzitaskos [2006]):

$$\bar{P}_t = \frac{1}{N_t} \sum_{j,s} \log p_{jst}.$$
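The difference between the two measures is purely compositional, which a small worked example makes concrete. In this sketch the function name `price_indices` and all numbers are hypothetical, and shares are assumed to be computed across all UPC/store cells within a week.

```python
import math

def price_indices(obs):
    """Return (share-weighted index, simple average) of log prices for one week.

    obs: list of (ounces_sold, price_per_oz) pairs, one per UPC/store cell.
    """
    total_oz = sum(oz for oz, _ in obs)
    weighted = sum((oz / total_oz) * math.log(p) for oz, p in obs)
    simple = sum(math.log(p) for _, p in obs) / len(obs)
    return weighted, simple

# Same two prices in both weeks; only the mix of purchases changes.
normal_week = [(200, 0.10), (200, 0.05)]
lent_week = [(100, 0.10), (300, 0.05)]   # demand shifts to the cheaper UPC

w_normal, avg_normal = price_indices(normal_week)
w_lent, avg_lent = price_indices(lent_week)

print(w_lent - w_normal)      # negative: the weighted index falls
print(avg_lent - avg_normal)  # zero: posted prices did not change
```

A fall in the share-weighted index with no movement in the simple average therefore signals a shift in what consumers buy rather than a change in posted prices.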

Tuna during Lent

Table: Regression of Price Index on Lent

| | $P$ (Price Index) | $\bar{P}$ (Average Price) |
|---|---|---|
| Lent | -.163 | -.021 |
| s.e. | (.0004) | (.0003) |

What is happening?

- The market share of cheaper UPCs goes up in the high demand period. Why?
- Demand story: demand is more elastic in the high demand period (consistent with Warner and Barsky [1995])
- Supply story: the retailer more aggressively promotes bigger discounts in the high demand period (consistent with loss leader)

Logit and Nested Logit

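Table: Demand in Lent vs. Non-Lent

| Model | Share | Price Coefficient: Lent | Price Coefficient: Non-Lent | Avg. Own Price Elast.: Lent | Avg. Own Price Elast.: Non-Lent |
|---|---|---|---|---|---|
| Logit | Empirical | -.60 (.019) | -.50 (.005) | -.89 | -.75 |
| Logit | Optimal | -1.96 (.027) | -2.01 (.008) | -2.90 | -3.01 |
| Nested Logit | Empirical | -.57 (.014) | -.52 (.003) | -1.39 | -1.54 |
| Nested Logit | Optimal | -1.02 (.015) | -.98 (.003) | -5.81 | -7.79 |

Standard errors in parentheses.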

Key Variation in Data

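- Sale = 5% reduction (or more) from the high price of the previous 3 weeks.

Table: Regression of Sales Price Index on Lent

| | $P$ (Price Index): Sale | $P$ (Price Index): Regular | $\bar{P}$ (Average Price): Sale | $\bar{P}$ (Average Price): Regular |
|---|---|---|---|---|
| Lent | -.199 | .035 | .010 | .001 |
| s.e. | (.0017) | (.0003) | (.0016) | (.0003) |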

How Big is Promotion Effect

- Turn off sales in data.
  - Set all $p^{sale}_{jt} = p^{regular}_{jt}$
  - Leave promotions (deal + unobservable) the same in data.
  - Predict new quantities $q_{jt}$.
- Use $q_{jt}$ to form a counterfactual price index (with original price):

$$P_t = \sum_{j,s} s_{jst} \log p_{jst}$$

- Isolates the change in demand (composition) due only to promotions changing in the high demand period.

Result

Table: Original Regression

| | $P$ (Price Index) | $\bar{P}$ (Average Price) |
|---|---|---|
| Lent | -.163 | -.021 |
| s.e. | (.0004) | (.0003) |

Table: Counterfactual Regression

| | $P$ (Price Index) | $\bar{P}$ (Average Price) |
|---|---|---|
| Lent | -.14 | -.02 |
| s.e. | (.0005) | (.0003) |

Basic Story

- It is not that tuna prices are cheaper during Lent
- Instead consumers are more likely to be aware of sales prices (through promotional effort) in the high demand period.
- This change in promotional effort steers demand to more discounted products.
- Demand steering explains most of the change in the price index.

Table: Observed Promotions and Discounts

| | 5% Sale | 10% Sale | 25% Sale | 50% Sale |
|---|---|---|---|---|
| Non-Lent | .62 | .63 | .64 | .49 |
| Lent | .70 | .72 | .84 | .80 |

Conclusion

- We provide an approach to estimating discrete choice models with market zeroes
- We apply our approach to estimating demand with scanner data
  - revisit the loss leader hypothesis debate
- We can nest our approach within a dynamic stockpiling model (along the lines of Hendel and Nevo 2006).
- Market zeroes arise in many settings where our approach is applicable
  - bilateral trade flows
  - crime regressions

Judith A Chevalier, Anil K Kashyap, and Peter E Rossi. Why don’t prices rise during periods of peak demand? Evidence from scanner data. American Economic Review, 93(1):15–37, 2003.

Aviv Nevo and Konstantinos Hatzitaskos. Why does the average price paid fall during high demand periods? Technical report, CSIO working paper, 2006.

Elizabeth J Warner and Robert B Barsky. The timing and magnitude of retail store markdowns: evidence from weekends and holidays. The Quarterly Journal of Economics, 110(2):321–352, 1995.