Consumer Behavior Prediction using Parametric and Nonparametric Methods
Elena Eneva
CALD Masters Presentation
19 August 2002
Advisors: Alan Montgomery, Rich Caruana, Christos Faloutsos
Outline
Introduction
Data
Economics Overview
Baseline Models
New Hybrid Models
Results
Conclusions and Future Work
Background
Retail chains are aiming to customize prices in individual stores
Pricing strategies should adapt to the neighborhood demand
Stores can increase operating profit margins by 33% to 83%
Price Elasticity
consumer’s response to price change
$E = \frac{\%\Delta Q}{\%\Delta P}$
inelastic: $|E| < 1$; elastic: $|E| > 1$
Q is quantity purchased
P is price of product
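As a rough illustration of the elasticity definition above, the sketch below computes E from two price/quantity observations. The function name `price_elasticity` and all numbers are hypothetical and are not taken from the dataset.

```python
def price_elasticity(q_old, q_new, p_old, p_new):
    """Percent change in quantity divided by percent change in price."""
    pct_q = (q_new - q_old) / q_old
    pct_p = (p_new - p_old) / p_old
    return pct_q / pct_p

# Hypothetical numbers: a 10% price increase and a 20% drop in quantity.
e = price_elasticity(q_old=1000.0, q_new=800.0, p_old=2.00, p_new=2.20)
demand = "elastic" if abs(e) > 1 else "inelastic"
print(round(e, 2), demand)
```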
Data Example
[Figure: plot of quantity (0 to 100000) against price (0.02 to 0.06)]
Data Example – Log Space
[Figure: plot of ln(quant) (2.75 to 5.25) against ln(price) (-1.58 to -1.28)]
Assumptions
Independence
– Substitutes: fresh fruit, other juices
– Other Stores
Stationarity
– Change over time
– Holidays
“The” Model
[Diagram: within a category, the prices of Products 1, 2, 3, . . ., N feed a Predictor (“I know your customers”), which outputs the quantity bought of Products 1, 2, 3, . . ., N]
Need to multiply this across many stores, many categories.
$\ln(q) = f(\ln(p)) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
[Diagram labels: convert to ln space, convert to original space]
Converting to Original Space
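$\ln(q) = f(\ln(p)) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
$\widehat{\ln(q)} = f(\ln(p))$
$\ln(q) \mid f(\ln(p)) \sim N(\mu, \sigma^2)$
$E[q] = e^{\mu + \frac{1}{2}\sigma^2}$
$\hat{q} = e^{\widehat{\ln(q)} + \frac{1}{2}\hat{\sigma}^2}$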
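The back-transformation above can be checked numerically. The following is a minimal sketch, assuming log-normally distributed quantities with made-up values for the log-space mean and standard deviation; `to_original_space` is a hypothetical helper name. It compares the naive estimate, which simply exponentiates the log-space prediction, with the corrected estimate that adds half the variance.

```python
import numpy as np

def to_original_space(ln_q_hat, sigma2_hat):
    """Predict q from a log-space prediction, adding the sigma^2 / 2 correction."""
    return np.exp(ln_q_hat + 0.5 * sigma2_hat)

rng = np.random.default_rng(0)
mu, sigma = 4.0, 0.5                          # hypothetical log-space mean and std
q = np.exp(rng.normal(mu, sigma, 200_000))    # simulated quantities

naive = np.exp(mu)                            # ignores the noise term
corrected = to_original_space(mu, sigma ** 2)
print(naive, corrected, q.mean())
```

In this simulation the naive estimate falls below the sample mean of q, while the corrected estimate lands close to it.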
Existing Methods
Traditionally – using parametric models (linear regression)
Recently – using non-parametric models (neural networks)
Our Goal
Advantage of LR: known functional form (linear in log space), extrapolation ability
Advantage of NN: flexibility, accuracy
[Figure: accuracy vs. robustness; labels: NN, new, LR]
Take Advantage: use the known functional form to bias the NN
Build hybrid models from the baseline models
Datasets
weekly store-level cash register data at the product level
Chilled Orange Juice category
2 years
12 products
10 random stores selected
Evaluation Measure
Root Mean Squared Error (RMS): the average deviation between the predicted quantity and the true quantity
$\mathrm{RMS\ error} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(q_i - \hat{q}_i\right)^2}$
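The RMS measure above can be written in a few lines. This is a minimal sketch; the function name `rms_error` and the example quantities are hypothetical.

```python
import numpy as np

def rms_error(q_true, q_pred):
    """Square root of the mean squared difference between true and predicted quantities."""
    q_true = np.asarray(q_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    return float(np.sqrt(np.mean((q_true - q_pred) ** 2)))

# Hypothetical true and predicted quantities for three store-weeks.
example = rms_error([100, 200, 300], [110, 190, 330])
print(example)
```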
Models
Baselines
– Linear Regression
– Neural Networks
Hybrids
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections
Baselines
Linear Regression
Neural Networks
Linear Regression
$\ln(q) = a + \sum_{i=1}^{K} b_i \ln(p_i) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
q is the quantity demanded
$p_i$ is the price of the ith product
K products overall
The coefficients a and $b_i$ are determined by the condition that the sum of the squared residuals is as small as possible.
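A least-squares fit of this log-log model might look like the sketch below. It uses simulated prices and made-up coefficients rather than the orange juice data, and `fit_loglog_regression` is a hypothetical helper name.

```python
import numpy as np

def fit_loglog_regression(prices, quantity):
    """Least-squares fit of ln(q) = a + sum_i b_i * ln(p_i)."""
    X = np.column_stack([np.ones(len(prices)), np.log(prices)])
    coef, *_ = np.linalg.lstsq(X, np.log(quantity), rcond=None)
    return coef[0], coef[1:]

rng = np.random.default_rng(0)
prices = rng.uniform(0.02, 0.06, size=(104, 3))   # 104 weeks, 3 products (simulated)
true_a, true_b = 2.0, np.array([-2.5, 0.4, 0.3])  # made-up coefficients
ln_q = true_a + np.log(prices) @ true_b + rng.normal(0, 0.1, 104)

a, b = fit_loglog_regression(prices, np.exp(ln_q))
print(a, b)
```

In this form each $b_i$ can be read as an elasticity: the percent change in quantity for a one percent change in the price of product i.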
Linear Regression
Results RMS
[Bar chart: RMS error (axis 0 to 12000) by model: LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]
Neural Networks
generic nonlinear function approximators
a collection of basic units (neurons), computing a (non)linear function of their input
backpropagation
Neural Networks
1 hidden layer, 100 units, sigmoid activation function
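To make the architecture concrete, here is a minimal NumPy sketch of a network with one hidden layer of 100 sigmoid units and a linear output, trained by batch backpropagation. The learning rate, number of epochs, weight initialization, and simulated data are assumptions chosen for illustration, not the settings used in the presentation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_nn(X, y, hidden=100, lr=0.02, epochs=500, seed=0):
    """One hidden layer of sigmoid units, linear output, batch backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                    # forward pass: hidden activations
        err = H @ W2 + b2 - y                       # forward pass: output residuals
        losses.append(float(np.mean(err ** 2)))
        g_out = err / len(y)                        # gradient of 0.5 * MSE at the output
        g_hid = np.outer(g_out, W2) * H * (1 - H)   # gradient pushed back to the hidden layer
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum()
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses

rng = np.random.default_rng(1)
X = rng.normal(0, 1, (200, 3))                      # stand-in for standardized ln(price)
y = X @ np.array([-1.0, 0.3, 0.2]) + 0.3 * np.sin(2 * X[:, 0])   # stand-in for ln(q)
params, losses = train_nn(X, y)
print(losses[0], losses[-1])
```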
Results RMS
[Bar chart: RMS error (axis 0 to 12000) by model: LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]
Hybrids
Smart Prior
MultiTask Learning
Jumping Connections
Frozen Jumping Connections
Smart Prior
Idea: start the NN at a “good” set of weights, help it start from a “smart” prior.
Take this prior from the known “linearity”
NN first trained on synthetic data generated by the LR model
NN then trained on the real data
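The two training stages can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, the network is small (20 hidden units), and how the synthetic inputs are sampled is a guess, since the slides do not say. The helper `train` and all variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(p, X, y, lr=0.05, epochs=300):
    """Batch backpropagation; updates the weight dict p in place."""
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ p["W1"] + p["b1"])
        err = H @ p["W2"] + p["b2"] - y
        losses.append(float(np.mean(err ** 2)))
        g_out = err / len(y)
        g_hid = np.outer(g_out, p["W2"]) * H * (1 - H)
        p["W2"] -= lr * (H.T @ g_out)
        p["b2"] -= lr * g_out.sum()
        p["W1"] -= lr * (X.T @ g_hid)
        p["b1"] -= lr * g_hid.sum(axis=0)
    return losses

rng = np.random.default_rng(0)
# Simulated "real" data in log space: X = ln(prices), y = ln(quantity).
X_real = rng.normal(0, 1, (100, 3))
y_real = X_real @ np.array([-1.5, 0.3, 0.2]) + 0.3 * np.sin(2 * X_real[:, 0])

# Step 1: fit the LR model on the real data.
A = np.column_stack([np.ones(len(X_real)), X_real])
coef, *_ = np.linalg.lstsq(A, y_real, rcond=None)

# Step 2: generate synthetic inputs and label them with the LR model.
X_syn = rng.normal(0, 1, (1000, 3))
y_syn = coef[0] + X_syn @ coef[1:]

# Step 3: train the NN on the synthetic data first, then on the real data.
p = {"W1": rng.normal(0, 0.5, (3, 20)), "b1": np.zeros(20),
     "W2": rng.normal(0, 0.5, 20), "b2": 0.0}
prior_losses = train(p, X_syn, y_syn)     # learn the "smart" prior
real_losses = train(p, X_real, y_real)    # continue from that prior on real data
print(prior_losses[0], real_losses[0], real_losses[-1])
```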
Smart Prior
Results RMS
[Bar chart: RMS error (axis 0 to 12000) by model: LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]
Multitask Learning
Idea: learning an additional related task in parallel, using a shared representation
Adding the output of the LR model (built over the same inputs) as an extra output to the NN
Make the net share its hidden nodes between both tasks
Custom halting function
Custom RMS function
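A sketch of the multitask setup, under several assumptions: simulated data, a small network, and a reading of the "custom RMS function" as an error measured on the main task only. The network has two outputs that share one hidden layer: the quantity itself and the LR model's prediction over the same inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Simulated data in log space: X = ln(prices), y = ln(quantity).
X = rng.normal(0, 1, (200, 3))
y = X @ np.array([-1.5, 0.3, 0.2]) + 0.3 * np.sin(2 * X[:, 0])

# Extra task: the LR model's prediction, built over the same inputs.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
Y = np.column_stack([y, A @ coef])     # column 0: main task, column 1: LR output

hidden, lr = 20, 0.05
W1 = rng.normal(0, 0.5, (3, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.5, (hidden, 2))   # two outputs share the same hidden units
b2 = np.zeros(2)

main_rms = []
for _ in range(300):
    H = sigmoid(X @ W1 + b1)
    err = H @ W2 + b2 - Y              # residuals for both tasks
    # Assumed reading of the custom RMS: track the main task only.
    main_rms.append(float(np.sqrt(np.mean(err[:, 0] ** 2))))
    g_out = err / len(X)
    g_hid = (g_out @ W2.T) * H * (1 - H)   # both tasks shape the shared hidden layer
    W2 -= lr * (H.T @ g_out)
    b2 -= lr * g_out.sum(axis=0)
    W1 -= lr * (X.T @ g_hid)
    b1 -= lr * g_hid.sum(axis=0)

print(main_rms[0], main_rms[-1])
```

At prediction time only output column 0 would be used; the LR output exists to bias what the hidden layer learns.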
MultiTask Learning
Results RMS
[Bar chart: RMS error (axis 0 to 12000) by model: LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]
Jumping Connections
Idea: fusing LR and NN
change architecture
add connections which “jump” over the hidden layer
Gives the effect of simulating a LR and NN all together
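The forward pass of such an architecture can be sketched as below. The function `forward_jc`, the weight names, and the numbers are hypothetical; the point is only that the output sums a hidden-layer path and a direct linear path from the inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_jc(X, W1, b1, W2, W_jump, b_out):
    """Hidden-layer path plus direct input-to-output ("jumping") connections."""
    H = sigmoid(X @ W1 + b1)
    return H @ W2 + X @ W_jump + b_out

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (5, 3))                      # hypothetical ln(price) inputs
W1, b1 = rng.normal(0, 0.5, (3, 4)), np.zeros(4)
W_jump, b_out = np.array([-1.5, 0.3, 0.2]), 2.0   # made-up linear weights

# With the hidden-to-output weights at zero, the network is exactly a linear model.
linear_only = forward_jc(X, W1, b1, np.zeros(4), W_jump, b_out)
# With nonzero hidden-to-output weights, the NN path adds a nonlinear correction.
with_hidden = forward_jc(X, W1, b1, rng.normal(0, 0.5, 4), W_jump, b_out)
print(linear_only, with_hidden)
```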
Jumping Connections
Results RMS
[Bar chart: RMS error (axis 0 to 12000) by model: LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]
Frozen Jumping Connections
Idea: you have the linearity, now use it!
same architecture as Jumping Connections, plus really emphasizing the linearity
freeze the weights of the jumping layer, so the network can’t “forget” about the linearity
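One way to picture the freezing step is the sketch below. It assumes the jumping weights and output bias are taken from a least-squares fit (the slides do not say how they are set) and uses simulated data. Only the hidden-layer path is updated during training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Simulated data in log space with a nonlinear component that LR cannot capture.
X = rng.normal(0, 1, (200, 3))
y = X @ np.array([-1.5, 0.3, 0.2]) + 0.5 * np.sin(2 * X[:, 0])

# Assumption: the jumping weights and output bias come from a least-squares fit.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b_out, W_jump = coef[0], coef[1:].copy()
W_jump_before = W_jump.copy()

hidden, lr = 20, 0.05
W1 = rng.normal(0, 0.5, (3, hidden))
b1 = rng.normal(0, 0.5, hidden)
W2 = np.zeros(hidden)                  # the network starts out equal to the LR model

losses = []
for _ in range(300):
    H = sigmoid(X @ W1 + b1)
    err = H @ W2 + X @ W_jump + b_out - y
    losses.append(float(np.mean(err ** 2)))
    g_out = err / len(y)
    g_hid = np.outer(g_out, W2) * H * (1 - H)
    W2 -= lr * (H.T @ g_out)
    W1 -= lr * (X.T @ g_hid)
    b1 -= lr * g_hid.sum(axis=0)
    # W_jump and b_out receive no update: the jumping layer is frozen.

print(losses[0], losses[-1])
```

Because the linear path never changes, the hidden layer is left to model only what the linear fit misses.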
Frozen Jumping Connections
Results RMS
[Bar chart: RMS error (axis 0 to 12000) by model: LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]
Models
Baselines
– Linear Regression
– Neural Networks
Hybrids
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections
Combinations
– Voting
– Weighted Average
Combining Models
Idea: Ensemble Learning
Committee Voting – equal weights for each model’s prediction
Weighted Average – optimal weights determined by a linear regression model
2 baseline and 3 hybrid models (Smart Prior, MultiTask Learning, Frozen Jumping Connections)
Committee Voting
Average the predictions of the models
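Equal-weight voting amounts to a plain average over the models' predictions. The numbers below are hypothetical predicted quantities, not results from the presentation.

```python
import numpy as np

# Hypothetical predicted quantities from five models for four store-weeks.
predictions = {
    "LR":   [100.0, 210.0, 330.0, 390.0],
    "NN":   [90.0, 190.0, 310.0, 420.0],
    "SmPr": [95.0, 205.0, 300.0, 400.0],
    "MTL":  [105.0, 195.0, 290.0, 410.0],
    "FJC":  [110.0, 200.0, 320.0, 380.0],
}

# Each model gets the same weight (1/5) in the committee's prediction.
vote = np.mean(np.array(list(predictions.values())), axis=0)
print(vote)
```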
Results RMS
[Bar chart: RMS error (axis 0 to 12000) by model: LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]
Weighted Average – Model Regression
Linear regression on baselines and hybrid models to determine vote weights
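A minimal sketch of regression-based weighting, using simulated predictions. The weights are fitted on the same data they are evaluated on, which is a simplification; a held-out set would normally be used. Fitting without an intercept is also an assumption.

```python
import numpy as np

def rms(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
q_true = rng.uniform(1000, 5000, 50)             # simulated true quantities
# Simulated model predictions: truth plus model-specific noise, one column per model.
noise_sd = np.array([600.0, 500.0, 450.0, 400.0, 350.0])
P = q_true[:, None] + rng.normal(0, 1, (50, 5)) * noise_sd

# Weights from a least-squares regression of the true quantity on the predictions.
weights, *_ = np.linalg.lstsq(P, q_true, rcond=None)
weighted = P @ weights
vote = P.mean(axis=1)                            # equal weights, for comparison

print(weights, rms(q_true, weighted), rms(q_true, vote))
```

On the fitting data the weighted average can never do worse than equal-weight voting, because equal weights are one of the candidate solutions of the regression.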
Results RMS
[Bar chart: RMS error (axis 0 to 12000) by model: LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]
Normalized RMS Error
Compare model performance across stores
Stores of different sizes, ages, locations, etc.
Need to normalize
Compare to baselines
Take the error of the LR benchmark as unit error
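The normalization can be sketched as a per-store division by the LR error. The RMS values below are hypothetical, and averaging the normalized errors across stores is an assumption about how the per-store figures are summarized.

```python
import numpy as np

# Hypothetical RMS errors: rows are stores, columns are models.
models = ["LR", "NN", "SmPr"]
rms = np.array([
    [8000.0, 7200.0, 6800.0],   # a large store
    [900.0, 990.0, 810.0],      # a small store
])

normalized = rms / rms[:, [0]]              # the LR column becomes 1.0 in every store
mean_normalized = normalized.mean(axis=0)   # assumed summary: average across stores
print(dict(zip(models, mean_normalized)))
```

Without the division, the large store's errors would dominate any average simply because it sells more.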
Normalized RMS Error
[Bar chart: normalized RMS error (axis 0.75 to 1.10) by model: LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]
Conclusions
Clearly improved models for customer choice prediction
Will allow stores to price the products more strategically and optimize profits
Maintain better inventories
Understand product interaction
Future Work Ideas
analyze Weighted Average model
compare extrapolation ability of new models
use other domain knowledge
– shrinkage model – a “super” store model with data pooled across all stores
Acknowledgements
I would like to thank my advisors and my CALDling friends and colleagues
The Most Important Slide
for this presentation and the paper:
www.cs.cmu.edu/~eneva/research.htm