Early Lessons Learned In Applying Big Data To Television Advertising. September 12, 2011 presentation by Simulmedia.
Early Lessons Learned in Applying Big Data To TV Advertising
ARF September 12, 2011 Jack Smith, Chief Product Officer, Simulmedia
About Us
We are a New York based start-up. We are venture backed by Avalon Ventures, Union Square Ventures and Time-Warner.
Our 35 person team has veterans of:
Television is still the most powerful advertising medium in the world. While addressability will come, we’re not waiting for it. We’ve taken a few strategies we learned from the Internet and are applying them to linear TV advertising, today.
Through partnerships with major data providers, we have assembled the world’s largest set of actionable television data. We sell television advertising. With inventory in over 106 million US households, we can cost-effectively extend reach into high-value target audiences across virtually any advertiser category. We use big data and science to do this.
Who We Are
Where We Have Been
What We Believe
How We Do It
How We Make Money
Why Did We Leave The Web?
Television remains the dominant consumer medium
(a) Nielsen US TV Viewing Audience. Traditional Live-Only TV based on average monthly viewing during 1Q2011. Internet and Online Video based on average monthly consumption during July 2011. Video on Demand based on consumption during May 2011.
TV Spend Is Increasing
Source: MAGNAGLOBAL
Audience Is Fragmenting
Source: Nielsen via TVbythenumbers.com
Campaign Reach Is Declining
Source: Simulmedia analysis of data from SQAD, Nielsen and TVB
Impossible for measurement and planning tools to keep pace
Highly Confidential
Big Data
Big Data Is Driving Growth
“We are on the cusp of a tremendous wave of innovation, productivity and growth, as well as new modes of competition and value-capture – all driven by Big Data.” - McKinsey Global Institute, May 2011
“For CMOs, Big Data is a very big deal.” - Alfredo Gangotena, CMO, Mastercard, July 2011
Size Is Relative
1 byte x 1000 = 1 kilobyte
…x 1000 = 1 megabyte
…x 1000 = 1 gigabyte
…x 1000 = 1 terabyte
…x 1000 = 1 petabyte
…x 1000 = 1 exabyte
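The ladder of units above can be sketched in a few lines of Python. This is only an illustration of the x 1000 steps on the slide; the helper names `to_bytes` and `humanize` are hypothetical and not part of the presentation.

```python
# Decimal storage units, each 1000x the previous one, as listed on the slide.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte",
         "terabyte", "petabyte", "exabyte"]


def to_bytes(value, unit):
    """Convert a quantity in the named unit to bytes (hypothetical helper)."""
    return value * 1000 ** UNITS.index(unit)


def humanize(n_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(n_bytes)
    index = 0
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return value, UNITS[index]


one_exabyte = to_bytes(1, "exabyte")        # 10**18 bytes
size, unit = humanize(to_bytes(20, "petabyte"))
```

Each step up the list multiplies by 1000, so an exabyte is six steps, or 1000 to the sixth power, above a byte.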
Telegram = 100 bytes
Page of an Encyclopedia = 100 kilobytes
Pickup truck bed full of paper = 1 gigabyte
Entire print collection of the Library of Congress = 10 terabytes
All hard drives produced in 1995 = 20 petabytes
All printed material = 200 petabytes
Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm
But Big Data Is More Than Size
What happened? Why did it happen? vs. BIG DATA: What’s going to happen next?
Time: Past vs. Future
Focus: Reporting vs. Prediction
Supports: Human decisions vs. Machine decisions
Data: Structured, Aggregated vs. Unstructured, Unaggregated
Human Skills: Dashboards, Excel vs. Discovery, Visualization, Statistics & Physics
Accelerating The Push To Big Data
Hadoop, cloud computing, Facebook, Yahoo, quants, BitTorrent, machine learning, Stanford, Large Hadron Collider, Wal-Mart, text processing, Amazon S3 & EC2, open source intelligence, NoSQL, social media, Google, commodity hardware, Hive, fraud detection, trading desks, MapReduce, natural language processing
What Can It Mean For TV Advertising?
Big data drove the rise of web & search advertising
• Accumulation of high volume of direct measurement of media consumption
• Better predictions about consumer interests
• Real time return path
• Automation
• Interim step for addressability
• More diligence around consumer privacy
• Media buyers and sellers rethinking their approach to audience packaging, campaign planning, technology, data assembly and people
Post Modern Architecture
Have we reached the limits of classic data storage architecture?
Data Warehouses
• Yahoo!: 700 TB [1]
• Australian Bureau of Statistics: 250 TB [1]
• AT&T: 250 TB [1]
• Nielsen: 45 TB [1]
• Adidas: 13 TB [1]
• Wal-Mart: 1 PB [2]
Data Lakes
• Facebook: 30 PB [3] (7x compression)
• Yahoo: 22 PB [4]
• Google: ???
[1] Oracle F1Q10 Earnings Call, September 16, 2009, Transcript
[2] Stair, Principles of Information Systems, 2009, p 181
[3] Dhruba Borthakur, Facebook, December 2010, http://www.facebook.com/note.php?note_id=468211193919
[4] Simulmedia estimate
Our Idea of Big Data
Set Top Boxes
• 17+ million boxes
• Completely anonymous viewing
• Live
• DVR
• VOD
• Pay channels
Program
• 3 different sets of schedule data
• Proprietary metadata
Public
• US census
• Military
• Business
Ad Occurrence
• What ads ran?
• Where did they run?
Client Proprietary
• Business Development Indices (BDI)
• Commercial Development Indices (CDI)
• Regional sales data
Nielsen Ratings
• All Minute Respondent Level Data (AMRLD)
Bringing the data set together in a single platform
Our (comparatively modest) data set:
• 200 TB (approx. 7x compression)
• 113,858,592 daily events
• Approximately 402,301 weekly ads
• Double capacity every 6 months
…And we don’t load every data point across all data sets, yet
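To put those figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the 200 terabytes is the starting capacity and that the six-month doubling simply continues, which is an extrapolation for illustration and not a forecast from the presentation. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(daily_events):
    """Average ingest rate implied by a daily event count."""
    return daily_events / SECONDS_PER_DAY


def projected_capacity_tb(current_tb, months, doubling_months=6):
    """Capacity after `months`, assuming it doubles every `doubling_months`."""
    return current_tb * 2 ** (months / doubling_months)


# 113,858,592 daily events is a little over 1,300 events per second on average.
rate = average_events_per_second(113_858_592)

# Four doublings in 24 months: 200 TB -> 400 -> 800 -> 1,600 -> 3,200 TB.
two_years_out = projected_capacity_tb(200, 24)
```

Under that assumption the data set would pass a petabyte in a little over a year, which is one way to read the "comparatively modest" label against the petabyte-scale data lakes listed earlier.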
Rethinking Media Data Architecture
Commodity Hardware
• No clouds allowed (ISO compliance)
• Expect hardware failure
Open Source Software
• Learn from those who have done it
• Participate in the Open Source community
Write Your Own Software
• ELT (Extract, Load, Transform)
• Meddle
• Machine learning
Science
• Advanced statistical techniques
• Experimentation
Applying big data to television required us to rethink what our technical architecture should be
Some Wrinkles In The Matrix
The People We Needed
• New core skills for everyone in the company
  • Pattern recognition
  • Visualization
  • Technology
  • Experimentation
• Where do you find hard to find tech skills?
  • You don’t find them. You make them.
• A dedicated Science team
  • Non traditional researchers (Brain imaging, bioinformatics, economic modeling, genetics)
  • People who watch a lot of television
A different approach required different skill sets
10 Lessons We’ve Learned
Some Things To Know, First
• Live viewing unless otherwise noted
  • Time shifting lessons is a whole other presentation
  • Time shifting + live viewing lessons is a whole other other presentation
  • Video on demand is a whole other other other presentation
• We name names and provide numbers where clients and data partners permit
  • Client confidentiality is important to us
• None of this work would’ve been possible without the help of our clients and partners
Read me… This box will contain important information about the graphs on each page.
60% of TV Viewers Watch 90% of TV
Where The Other 40% Are
[Figure. Vertical: Ratio of Heavy Viewer to Light Viewer impressions. Horizontal: Low rated to Highly rated networks.]
Call outs: Ratio is the number of Heavier Viewer impressions you would deliver to reach a Lighter Viewer on a given network
Networks with relatively more lighter viewer impressions: OXYGEN 7.4, WE 7.6, PLANET GREEN 7.7, OVATION 7.8, STYLE 7.8, MTV2 7.8, SUNDANCE 7.9, IFC 7.9
Networks with relatively fewer lighter viewer impressions: TCM 13.6, HALLMARK 13.7, ADSWIM 14.0, NICKNITE 14.3, CNBC 15.7, FOX NEWS 18.0
Sources: Nielsen & Simulmedia’s a7
To capture light viewers, media planning and measurement tools must quickly apply new methods to emerging data sets
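The call-out ratio on this slide is simple to compute once heavier and lighter viewer impressions are counted per network. The sketch below uses made-up impression counts and placeholder network names (`NET_A` and so on), so none of it is Simulmedia or Nielsen data. It only shows the arithmetic.

```python
def heavy_to_light_ratio(heavy_impressions, light_impressions):
    """Heavier viewer impressions delivered per lighter viewer impression."""
    if light_impressions <= 0:
        raise ValueError("need at least one lighter viewer impression")
    return heavy_impressions / light_impressions


def rank_networks(impressions_by_network):
    """Sort networks from lowest ratio (best for light viewers) to highest."""
    ratios = {
        name: heavy_to_light_ratio(heavy, light)
        for name, (heavy, light) in impressions_by_network.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1])


# Hypothetical (heavy, light) impression counts, for illustration only.
example = {
    "NET_A": (780_000, 100_000),
    "NET_B": (1_800_000, 100_000),
    "NET_C": (1_400_000, 100_000),
}
ranking = rank_networks(example)
```

A lower ratio means fewer heavier viewer impressions are spent for each lighter viewer reached, so the first entries in `ranking` are the better places to look for the other 40%.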
Quality Control Is A Full Time Job
When Data Goes Missing
Automation of error checking/quality control is essential
Reuse the data to solve other problems
Occasionally observe missing data
Three choices:
• Pick up the phone
• Estimate missing fields
• Work around the missing data
Source: Simulmedia’s a7
Time series of SYFY network. 10645 observations from 2010.02.28 at 7:00pm Eastern to 2010.10.14 at 12:30pm Eastern
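Automated checking for missing observations can start with something as small as the sketch below. It assumes the series sits on a regular grid (a 30-minute step is used here purely as an example) and fills interior gaps by linear interpolation, which is just one possible way to "estimate missing fields" and not necessarily the method used for a7. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def find_missing_slots(timestamps, step):
    """Return the grid timestamps between first and last that are absent."""
    present = set(timestamps)
    missing = []
    current, end = min(timestamps), max(timestamps)
    while current <= end:
        if current not in present:
            missing.append(current)
        current += step
    return missing


def fill_linear(series, step):
    """Fill interior gaps in {timestamp: value} by linear interpolation."""
    times = sorted(series)
    filled = dict(series)
    for left, right in zip(times, times[1:]):
        gap_steps = int((right - left) / step)
        for k in range(1, gap_steps):
            delta = series[right] - series[left]
            filled[left + k * step] = series[left] + delta * k / gap_steps
    return filled


# Toy series with two missing half-hour slots (values are made up).
start = datetime(2010, 2, 28, 19, 0)
step = timedelta(minutes=30)
series = {start: 10.0, start + step: 12.0, start + 4 * step: 18.0}

gaps = find_missing_slots(list(series), step)
repaired = fill_linear(series, step)
```

Running a check like this on every load makes gaps visible early, when picking up the phone is still an option.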
More Data Really Is Better
Disambiguation: The Madonna Problem
OR
Pop Icon? Religious icon?
The Revolution of Simple Methods
More data beats better algorithms. The best performing algorithm underperforms the worst algorithm once the worst is given an order of magnitude more data. Simple algorithms at very large scale can help better predict audience movement.
Peter Norvig | Internet Scale Data Analysis | June 21, 2010
Original graph sourced from: Banko & Brill, 2001. Mitigating the paucity-of-data problem: exploring the effect of training corpus size on classifier performance for natural language processing
Packaging Reach
Peter Norvig | Internet Scale Data Analysis | June 21, 2010
Very large data sets better predict TV audience movements
The Cost Of More Data
• All data online. All the time.
• Less expensive hardware
• Extremely flexible
• All data online. All the time.
• More expensive talent
• Physicists & statisticians ain’t cheap
• Hard to find programmers
• Not everything meets your needs
• Evolving technologies in mission critical functions
More data drives better results but there are costs
The Data Isn’t Biased Just Because It Comes From A Set Top Box
Applying Simple Methods At Scale
Sources: Nielsen & Simulmedia’s a7
Regression analysis of Nielsen Household Cume Rating against Simulmedia’s a7 cume rating. 20 Primetime Network shows with HAWAII FIVE-0. Fall 2010.
High correlation of a7 measures and Nielsen estimates.
Either bias is insignificant or Nielsen data and our data share the same bias.
Multiple methods yield similar results
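For readers who want to see what "high correlation" means numerically, here is a minimal sketch of the statistic behind a simple regression of one set of ratings on another. The ratings in the example are invented, and the function names are hypothetical. For a one-variable linear regression with an intercept, r squared equals the square of the Pearson correlation, which is what the code computes.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared(xs, ys):
    """Share of variance explained by a simple linear fit of ys on xs."""
    return pearson_r(xs, ys) ** 2


# Invented cume ratings for five shows from two hypothetical sources.
source_a = [4.1, 6.3, 2.2, 8.0, 5.5]
source_b = [3.9, 6.8, 2.0, 7.6, 5.9]
fit = r_squared(source_a, source_b)
```

Note that r squared is unchanged if one source is rescaled or shifted by a constant, so it describes how closely the two sources move together rather than whether their levels match.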
And Then We Kept Going
Two samples
1. Sample 1: Fall 2010: 20 Primetime broadcast series launches + promos
2. Sample 2: Jan 2011: 15 Primetime cable series premieres + promos (Plus one multi-season/year primetime broadcast premiere + promos)
• Hand selected programs
• Mix of genres
• Mix of new vs. returning shows
How we sliced it
• Entire a7 data set
• Cross correlated individual data sets contained in a7 aggregate data set
• Aggregate cross geographies (DMA to DMA)
Observations
• Sample 1 average r² > 0.85
• Sample 2 average r² > 0.93
We measured program Tune-In, Spot Tune-In, Campaign Reach, Campaign Rating using multiple slices of our data set using two different sample sets and time frames
Addressability Is Here
Closing The Loop On Program Promotion
Sources: Simulmedia’s a7
Spring 2010 broadcast premiere promotion. Horizontal: Left to right moves back in time. 0 is the premiere time. Vertical: Conversion rate is measured in percent. Size of the bubble represents total conversions for a given spot.
Closing The Loop
Long held beliefs and rules of thumb in planning may or may not be supported by data
TV marketers now have more options for show promotion
Nielsen’s Ratings Are Good (Surprisingly Good)
Time Series: Broadcast: CBS
Time Series: Broadcast: Fox
Time Series: Broadcast: ABC
Time Series: Cable: Investigation Discovery
Time Series: Cable: Golf
Time Series: Cable: Bravo
Time Series: Cable: ESPN2
Time Series: Cable: Speed
Sources: Nielsen & Simulmedia’s a7
Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987))
60 networks. High correlation between Nielsen large sample measurement and a7 measures
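Z scores are what allow a panel estimate and a set-top-box measurement, which live on very different scales, to be drawn on the same axis. A minimal sketch follows. It uses the population standard deviation, which is an assumption since the slides do not say which form was used, and the hourly values are invented.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a series to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a constant series has no z scores")
    return [(v - mu) / sigma for v in values]


# Invented hourly audience figures on two very different scales.
panel_estimate = [2.0, 4.0, 6.0, 4.0]
box_count = [20_000.0, 40_000.0, 60_000.0, 40_000.0]

panel_z = z_scores(panel_estimate)
box_z = z_scores(box_count)
```

Because the second series is just the first multiplied by 10,000, the two sets of z scores coincide. On a z score plot only the shape of the series matters, not its units.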
…but…
When You Look Closer
Sources: Nielsen & Simulmedia’s a7
Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987))
High Frequency Time Series: ABC Family
Sources: Nielsen & Simulmedia’s a7
Sample graph from High Frequency (Second and Minute level) Time Series Analysis of 45 networks on January 19th 2011. Panels: Simulmedia a7 Sample (Second by Second to Minute), Nielsen Sample (Minute by Minute)
Volatility in dayparts, low rated networks, demographics…. Unrated networks “don’t exist.” Did NOT look at local.
Women Are More Different Than Men
Gender Driven Geographic Variation
Viewing by zip code among women across markets is more varied than among men in the same zip codes
[Figure panels: Women 18-54, Men 18-54]
Fraction of view time for ages 18-54 as fraction of view time for all TV viewers. Week 2 vs. the same fraction for week 1 (last two weeks in January). Three markets: Philadelphia (blue), Atlanta (red) and Chicago (green). Each point represents a zip code in one of these markets. Source: Simulmedia’s a7
Planning tactics for female targeted campaigns should be different than for male targeted campaigns
PS…Also a good case for geo based creative versioning
Privacy Matters
Privacy By Design
• All marketing data companies need to care
• Make consumer privacy protection part of the business from the beginning
  • Anonymous, aggregated data only
  • No personal data or data that can be related to particular individuals or devices
  • Broad marketing segmentations, not profiling
  • No sensitive data
Don’t be creepy
Mass Reach Is Indiscriminate
Fragmentation Effects On Frequency
Source: Nielsen & Simulmedia’s a7
Each segment was above 70% reach but the frequency distribution was nearly identical
Percent of audience reached for major animated motion picture campaign 2011. Two weeks prior to release. Each stacked bar is a different audience segment. Each color within the stacked bar represents the frequency of ad view for each segment.
Fragmentation is affecting all high reach campaigns.
Percent of audience reached for insurance advertisers September to October 2010. Approximately 8000 ads. Each stacked bar is a different audience segment. Each color within the stacked bar represents the frequency of ad view for each segment.
The TV advertising market can’t continue to support this
40% Of The Audience Is Getting 85% Of The Impressions
Fragmentation Rears Its Head Again
Source: Nielsen & Simulmedia’s a7
Total US Television Audience
Average Frequency Per Quintile    % of Total Impressions Per Quintile
0.0                               0.0%
1.4                               3.6%
4.3                               10.8%
9.1                               23.0%
24.8                              62.6%
Campaign impressions increasingly concentrated against heavy viewers.
Percent of audience reached for a different major animated motion picture campaign 2011. Two weeks prior to release. The stacked bar represents quintiles. Blue labels are average frequency per respective quintile. Red labels are % of total campaign impressions by respective quintile.
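The two sets of labels on this chart are tied together by simple arithmetic, sketched below. The one assumption is that each quintile holds the same number of viewers, so a quintile's share of impressions is proportional to its average frequency. The function name is hypothetical.

```python
def impression_shares(avg_frequency_by_quintile):
    """Percent of total impressions per quintile, assuming equal-size quintiles."""
    total = sum(avg_frequency_by_quintile)
    return [100 * f / total for f in avg_frequency_by_quintile]


# Average frequency per quintile as printed on the slide.
slide_frequencies = [0.0, 1.4, 4.3, 9.1, 24.8]
# Percent of total impressions per quintile as printed on the slide.
slide_shares = [0.0, 3.6, 10.8, 23.0, 62.6]

shares = impression_shares(slide_frequencies)

# The two quintiles with the highest frequency, i.e. 40% of the audience.
top_two_quintiles = shares[-1] + shares[-2]
```

The recomputed shares land within about a tenth of a percentage point of the printed ones (the printed frequencies are rounded), and the top two quintiles together come to a little over 85% of impressions, which is the headline claim.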
Fragmentation Effects on Frequency
Advertisers won’t continue to support this
What Happens Next?
Choices
• If fragmentation is causing declining campaign reach and frequency imbalances, marketers must make choices.
  • Reduce reach
    • Do nothing
    • Use other channels
  • Stabilize or improve reach
    • Re-aggregate audiences using big data
What do you think?
About Our Science Team
• Krishna Balasubramanian, Chief Scientist
  • Previously: Chief Scientist, Tacoda. Chief Scientist, Real Media.
  • Doctoral Candidate, Physics. (Condensed Matter Physics) The Ohio State University
  • MS, Computer & Information Systems. The Ohio State University
  • MSc, Physics. Indian Institute of Technology, Kanpur
• Yuliya Torosjan, Scientist
  • Previously: Clinical Research (Brain Imaging), Mount Sinai College of Medicine
  • MA, Statistics. Columbia University
  • BSE, Computer Science & Engineering. University of Pennsylvania
  • BA, Psychology. University of Pennsylvania
• Mario Morales, Scientist
  • Previously: Lecturer, Bioinformatics, New York University. Senior Consultant, Weiser LLP.
  • MS, Statistics. Hunter College
  • MS, Bioinformatics. New York University
• Dr. Sidd Mukherjee, Scientist
  • Previously: Visiting Scholar (Atomic Scattering experiments), The Ohio State University
  • Post doctoral research, Heat capacity of Helium-4. Pennsylvania State University
  • PhD, Physics. (Thesis: Measurements of Diffuse and Specular Scattering of 4He Atoms from 4He Films), Ohio State University
  • MS, Computer & Information Systems. The Ohio State University
  • BSc, Physics & Mathematics. University of Bombay