Big Data, Small Models: The Age of Fast Analytics
March 9, 2015
Carol A. Wolowic, Senior Manager, Media, Panera Bread
Adam Benaroya, Director, Digital Insights, Mindshare
Kajal Mukhopadhyay, Ph.D., VP, Performance & Measurement, Xaxis
Defining Big Data
A broad term for data sets so large or complex that they are difficult to process using traditional data processing applications. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy.
The term big data has largely come to refer simply to the use of predictive analytics or other certain advanced methods to extract value from data, without any required magnitude thereon.
-- WIKIPEDIA
3 Vs or 4 Vs: Volume, Variety, Velocity, Veracity
Small Models: A Few of Everything?
Fewer Goals: Well-defined singular KPI (goal): key performance indicator/measurement
Fewer Variables: Small set of variables and parameters
Fast Computation: Distributed, Additive, Modular
Forward Looking:Short history, short-term forecast, next action
No fancy acronyms – 3 or 4 F’s
Small model estimation and fast prediction
Distributional simplification
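1. $\frac{\#(x_i \in X_i)}{N_i} \to p_i$ : convergence in proportion
2. $\frac{\sum_{x_i \in X_i} x_i}{N_i} \to \mu_i$ : convergence in mean
3. $LL \le x_i \le UL$ : confidence and tests
[Figure: six variables, each summarized by its mean: $x_1(\mu_1)$, $x_2(\mu_2)$, $x_3(\mu_3)$, $x_4(\mu_4)$, $x_5(\mu_5)$, $x_6(\mu_6)$]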
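The three simplifications above (a proportion, a mean, and a confidence band) can be sketched in a few lines of Python. This is only an illustration: the click data are made up, the function names are hypothetical, and the interval uses a normal approximation, which the slides do not specify.

```python
import math

def proportion(values, in_segment):
    """Share of observations that fall in the segment: #(x in X) / N."""
    hits = sum(1 for v in values if in_segment(v))
    return hits / len(values)

def mean(values):
    """Sample mean: sum(x) / N."""
    return sum(values) / len(values)

def normal_interval(values, z=1.96):
    """Approximate 95% interval (LL, UL) for the mean.

    Assumption: a normal approximation with the sample variance;
    the original deck does not say which interval or test is used.
    """
    n = len(values)
    m = mean(values)
    var = sum((v - m) ** 2 for v in values) / (n - 1)
    half_width = z * math.sqrt(var / n)
    return m - half_width, m + half_width

# Made-up data: 1 = user clicked, 0 = user did not click.
clicks = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]

p = proportion(clicks, lambda v: v == 1)
mu = mean(clicks)
ll, ul = normal_interval(clicks)
print(p, mu, ll, ul)
```

Each quantity needs only counts and sums, so it can be computed per segment and combined later, which fits the "distributed, additive, modular" requirement.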
Correlation and Confounding Bias Correction
1. Many small experiments over a few large bets: univariate random design over a single variable
2. Rapid iterations over big campaigns: universal control over the entire data collection process
3. Fast adjustment over a historical baseline: pre-post correction over any temporal space
$(\mu_{test} + \delta_{bias}) - (\mu_{base} + \delta_{bias}) = \mu_{test} - \mu_{base}$ (lift)
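The cancellation in the lift formula can be checked numerically. The sketch below assumes a bias that shifts test and baseline measurements by the same additive amount; the conversion rates and the bias value are invented for illustration.

```python
def lift(test_values, base_values):
    """Difference between the test mean and the baseline mean."""
    mu_test = sum(test_values) / len(test_values)
    mu_base = sum(base_values) / len(base_values)
    return mu_test - mu_base

# Hypothetical conversion rates for a test group and a baseline group.
test = [0.12, 0.15, 0.11]
base = [0.10, 0.09, 0.11]

# Assumption: the same additive bias affects both groups equally.
bias = 0.03

raw = lift(test, base)
shifted = lift([v + bias for v in test], [v + bias for v in base])
print(raw, shifted)  # the two lifts agree up to floating-point rounding
```

The correction only works when the bias is common to both groups, which is why the slide pairs it with a universal control and pre-post comparison.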
Large number of events, audience characteristics and segment populations
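1. Most Likely Audience: $P(\text{Action} \mid \text{Audience} \in X_i) = \frac{P(\text{Audience} \in X_i \cap \text{Taking Action})}{P(\text{Audience} \in X_i)}$
2. Most Likely Action: $\text{Index Score} = \frac{P(\text{Action 1} \mid \text{Audience} \in X_i)}{P(\text{Action 2} \mid \text{Audience} \in X_i)}$
3. Event Chance: $\text{Odds ratio} = \frac{P(\text{Action} \mid \text{Audience} \in X_i)}{1 - P(\text{Action} \mid \text{Audience} \in X_i)}$
4. Event Relevance Ratio: $\text{Odds Score} = \frac{P(\text{Action 1} \mid \text{Audience} \in X_i) \,/\, (1 - P(\text{Action 1} \mid \text{Audience} \in X_i))}{P(\text{Action 2} \mid \text{Audience} \in X_i) \,/\, (1 - P(\text{Action 2} \mid \text{Audience} \in X_i))}$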
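The four scores above reduce to simple arithmetic on event counts. The following sketch uses invented counts for a single audience segment; the function and variable names are hypothetical and not taken from the presentation.

```python
def conditional(joint_count, segment_count):
    """P(Action | Audience in X_i), estimated as joint count / segment count."""
    return joint_count / segment_count

def odds(p):
    """Chance of the event relative to its complement: p / (1 - p)."""
    return p / (1 - p)

# Made-up counts for one audience segment X_i.
segment_size = 1000      # users in the segment
action1_count = 200      # users in the segment who took action 1
action2_count = 50       # users in the segment who took action 2

p1 = conditional(action1_count, segment_size)
p2 = conditional(action2_count, segment_size)

index_score = p1 / p2              # which action is more likely in this segment
event_odds = odds(p1)              # event chance for action 1
odds_score = odds(p1) / odds(p2)   # relevance of action 1 relative to action 2

print(p1, p2, index_score, event_odds, odds_score)
```

Computing these values for every segment and action pair yields a lookup table of scores that can be read off at prediction time.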
Correlated vs. Causal Relation
Recommendation table based on conditional probability
Regression Framework: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
[Diagram: Y linked to X1 and X2]
• Causal relationship models
• Strong and weak linkages between variables
• Bayesian Network
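As a minimal sketch of the regression framework, the coefficients can be estimated by ordinary least squares through the normal equations. Ordinary least squares is an assumption here (the deck names only the model form), the data are synthetic, and the helper names are hypothetical.

```python
def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(x1, x2, y):
    """Estimate (alpha, beta1, beta2) from the normal equations X'X b = X'y."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

# Synthetic, noise-free data generated from Y = 2 + 0.5*X1 + 1.5*X2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [2.0 + 0.5 * a + 1.5 * b for a, b in zip(x1, x2)]

alpha, beta1, beta2 = fit_ols(x1, x2, y)
print(alpha, beta1, beta2)
```

The fitted coefficients describe association only. Separating correlation from causation needs the experimental designs or network models listed on this slide.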
Fast Prediction
Audience Targeting
• Audience classifications based on Bayes principle
• Audience behaviors based on most likely behavior
• Audience actions based on most likely behavior
Predictive Behavior
• Predictive behavior based on actions
• Predictive behavior on audience segments
• Predictive behaviors based on media trigger
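Classification by the Bayes principle can be sketched as picking the segment with the highest posterior probability given an observed action. The segment names, priors, and likelihoods below are invented for illustration and do not come from the presentation.

```python
def posterior(priors, likelihoods):
    """P(segment | action) via Bayes' rule, normalized over all segments."""
    joint = {seg: priors[seg] * likelihoods[seg] for seg in priors}
    total = sum(joint.values())
    return {seg: value / total for seg, value in joint.items()}

# Hypothetical segment shares P(segment).
priors = {"frequent": 0.2, "occasional": 0.5, "new": 0.3}

# Hypothetical P(action | segment) for one observed action.
likelihoods = {"frequent": 0.30, "occasional": 0.10, "new": 0.05}

post = posterior(priors, likelihoods)
best = max(post, key=post.get)   # most likely segment for this action
print(post, best)
```

With these numbers the "frequent" segment has the highest posterior, although "occasional" is the largest segment overall.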
ACTION
How do you take action on what, when, and where?
Adam Benaroya, Director, Digital Insights, Mindshare
Client Challenge: Local Planning
Client has a significant investment in local digital advertising, but could not predict either efficiency or scale of local markets.
Business Questions - Efficiency
• What factors appear to influence local advertising efficiency?
• Can we identify attributes which may help project the performance of expansion markets for the future?
• Can we determine which new markets best fit this profile and thus are more likely to perform efficiently?
• Can we identify markets which are more likely to be high-risk options?
Business Questions - Scale
• What is the right amount for each market to invest in local?
• What can we learn from markets which under-delivered on their budgets?
• Can we find a better way to estimate budgets for each market?
Solution: Cluster Analysis – answering ‘Efficiency’
Three explanatory variables were the primary contributors to ‘cost-per-action’. Markets were clustered into 4 groups to project potential campaign efficiency.
Solution: CHAID Analysis – answering ‘Scale’
Two explanatory variables were the primary contributors to budget scale. Recommendations for individual local market spend could be determined from these two variables.
Client Challenge: Determining causation between digital touchpoints and conversion
Business Questions
• How do paid digital media channels interact together to drive a final conversion?
• Which digital actions are the strongest drivers of final conversions?
• How should I choose a proxy KPI for a sales campaign?
[Diagram: touchpoints Video, Display, Site Action A, Site Action B, and Conversion, shown against a "Sample Size of Data" axis running from HIGH to LOW]
Solution: Bayesian Network to prioritize KPIs
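[Diagram: the touchpoints Video, Display, Site Action A, Site Action B, and Conversion, shown against a "Sample Size of Data" axis running from HIGH to LOW, with two links labeled "Contribution: 35%" and "Contribution: 5%"]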
• Bayesian network built to map the relationship between all trackable digital touchpoints.
• When sample size limitations prevent optimization on a sales conversion KPI, the Bayesian network can also indicate which ‘proxy’ KPIs to use earlier in the campaign flight.
ACTION
How do you take action on what, when, and where?
Carol A. Wolowic, Senior Manager, Media, Panera Bread
Realities of Big Data
Volume
Variety
Velocity
Veracity
• How is Panera using the data from the Data Management Platforms?
• How does Panera use “fast data” to react to market findings in a timely manner?
Leveraging Audience Intelligence
Using Big Data to address critical, quantifiable goals
Big Data Intelligence & Campaign Performance
THANK YOU
Questions and Answers