Using Big Data to Drive Big Engagement
Name: George Chiu
Company: Teradata
Netflix: Using Big Data to Drive Big Engagement (40PB Analytics in AWS)
George Chiu, Sr. Industry Consultant
Oct. 2017
• #1 streaming video service
• Started in 1998, when Reed Hastings accrued a $40 late fee on "Apollo 13"
• In 2000, Blockbuster Video declined the chance to purchase Netflix for $50M
• Current market cap: $56B
• Teradata customer since 2007
• 86M members in 190 countries
• Streams 132M hrs/day, aka 92K hrs/min, aka 10.5 yrs/min
• 600B events generated daily
• 40PB on AWS S3, with 10% read/written daily
• 350 active big data users
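The three streaming figures above are unit conversions of the same daily total. A quick arithmetic check, assuming an average year of 365.25 days:

```python
HOURS_STREAMED_PER_DAY = 132_000_000  # 132M hours/day, from the slide
MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length including leap years

hours_per_minute = HOURS_STREAMED_PER_DAY / MINUTES_PER_DAY
years_per_minute = hours_per_minute / HOURS_PER_YEAR

print(f"{hours_per_minute:,.0f} hours/min")  # about 91,667, i.e. ~92K
print(f"{years_per_minute:.1f} years/min")   # about 10.5
```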
Agenda
1. What analytics did Netflix use to drive more engagement?
2. Insights & Approach
3. Netflix Architecture on AWS with Teradata DWaaS
© 2017 Teradata
What analytics did Netflix use to drive more engagement?
Netflix
• Focus is on making it easy to find things to watch
• Spends $150M on data & analytics
➢ 20x more than average
➢ 2% of ARPU
• Processes 400bn interactions daily
• Hundreds of analysts continually derive new metadata
Differentiate or Disappear
• More content, newer, more exclusive
• Make it easy for customers to find
• Make it easy to watch
• Provide a great service
• Provide relevant, timely and consistent interactions
• Provide flexible packages
https://business.tivo.com/content/dam/tivo/resources/whitepapers/Q3_2016_Video_Trends_Report.pdf
Can we influence customer engagement?
• 1.2% of high-value TV package subscribers down-spin (downgrade) each month (+11% on last year)
• Perceived value diminishes when the initial discount ends (12 months and beyond)
• Subscribers who down-spin are not engaged with the content and watch 15% less exclusive/premium TV
• Current marketing is limited, with no 1:1 (one-to-one) content
Identify at-risk customers and prevent down-spin with personalised recommendations
Insights and Approach
Approach
• Step 1: Profile Subscriber Viewing Against Genres
• Step 2: Create Behavioural Clusters
• Step 3: Which Subscribers to Target per Cluster?
• Step 4: Build Recommendations per Subscriber
• Step 5: Apply Business Rules
Step 1: Profile Subscriber Viewing Against Genres
| Genre | News | Soccer | Reality | Documentary | Horror | Music | Crime Drama | … |
|---|---|---|---|---|---|---|---|---|
| Viewing | 5 | 10 | 32 | 18 | 1 | 4 | 5 | … |
| Proportion | 0.07 | 0.13 | 0.43 | 0.24 | 0.01 | 0.05 | 0.07 | … |

Identify the proportion of each subscriber's viewing duration that can be attributed to each genre.
This subscriber mostly watches Reality content (43%), but also likes Documentaries (24%) and Soccer (13%).
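A minimal sketch of the Step 1 profile: divide each genre's viewing by the subscriber's total. The slide's proportions are reproduced by dividing by 75, the sum of the seven listed durations, which suggests the genres elided as "…" contribute nothing for this subscriber; the function name and the zero-viewing fallback are illustrative assumptions.

```python
def genre_shares(viewing: dict[str, float]) -> dict[str, float]:
    """Return each genre's share of a subscriber's total viewing duration."""
    total = sum(viewing.values())
    if total == 0:
        # Assumption: a subscriber with no viewing gets an all-zero profile.
        return {genre: 0.0 for genre in viewing}
    return {genre: duration / total for genre, duration in viewing.items()}

# Viewing durations from the example table (genres shown as "…" are omitted).
subscriber = {"News": 5, "Soccer": 10, "Reality": 32, "Documentary": 18,
              "Horror": 1, "Music": 4, "Crime Drama": 5}
shares = genre_shares(subscriber)
print({genre: round(share, 2) for genre, share in shares.items()})
```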
Step 2: Create Behavioural Clusters
• Cluster 0: Soccer, Drama, News (61k subscribers)
• Cluster 8: Soccer, News, Sports Talk (32k subscribers)
• Cluster 17: Reality, Documentary, Entertainment (85k subscribers)
• Cluster 25: Music (25k subscribers)
• Cluster 13: Crime Drama (28k subscribers)
• Cluster 21: Documentary (56k subscribers)
• Cluster 11: Children, Animated, Adventure (56k subscribers)
• Cluster 15: Reality (57k subscribers)
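The deck does not name the clustering algorithm. As an assumption, the sketch below uses plain k-means (Lloyd's algorithm) on the genre-share vectors from Step 1, with a deterministic farthest-point initialisation so the result is reproducible; the subscriber vectors and k = 2 are hypothetical.

```python
import math

def init_farthest(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, iters=100):
    """Lloyd's k-means on equal-length numeric vectors; returns (labels, centroids)."""
    centroids = init_farthest(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each subscriber joins the nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids

# Hypothetical genre shares over (Soccer, News, Reality, Documentary).
subscribers = [
    [0.60, 0.30, 0.05, 0.05],
    [0.55, 0.35, 0.05, 0.05],
    [0.05, 0.05, 0.50, 0.40],
    [0.00, 0.10, 0.60, 0.30],
]
labels, centroids = kmeans(subscribers, k=2)
print(labels)  # [0, 0, 1, 1]: a sport/news group and a reality/documentary group
```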
Step 3: Which Subscribers to Target Per Cluster?
[Figure: subscribers plotted by % Channels Viewed Premium (x-axis) against % Duration Viewed Premium (y-axis)]
Deciding on a threshold:
[Figure: Recall of Churners (y-axis) against Threshold (x-axis)]
Focusing on subscribers who watch less than 30% premium content and less than 30% premium channels identifies 80% of the churning population (those who churn within the next month).
30:30 Rule: subscribers below both 30% thresholds are classed as Low Engagement; the rest as High Engagement.
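The threshold choice can be framed as recall over known churners: what fraction of subscribers who churned next month does the rule flag? A minimal sketch, assuming per-subscriber premium percentages and a churn label are available; the field names and the six subscribers are hypothetical, so the recall values are not the deck's 80%.

```python
from dataclasses import dataclass

@dataclass
class Subscriber:
    premium_duration_pct: float  # % of viewing duration spent on premium content
    premium_channels_pct: float  # % of channels viewed that are premium
    churned_next_month: bool     # outcome label, used only to measure recall

def is_low_engagement(s: Subscriber, threshold: float = 30.0) -> bool:
    """30:30 rule: below the threshold on both premium measures."""
    return s.premium_duration_pct < threshold and s.premium_channels_pct < threshold

def churner_recall(subs, threshold: float = 30.0) -> float:
    """Share of actual churners that the rule flags as low engagement."""
    churners = [s for s in subs if s.churned_next_month]
    if not churners:
        return 0.0
    flagged = sum(is_low_engagement(s, threshold) for s in churners)
    return flagged / len(churners)

subs = [
    Subscriber(12, 20, True),
    Subscriber(25, 10, True),
    Subscriber(28, 22, True),
    Subscriber(45, 15, True),
    Subscriber(70, 60, False),
    Subscriber(20, 25, False),
]
# Sweeping the threshold traces a recall-of-churners curve like the one on the slide.
for t in (20, 30, 50):
    print(t, churner_recall(subs, t))
```

Raising the threshold always increases recall but flags more non-churners too, which is why the deck picks a threshold rather than maximising recall alone.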
Step 4: Build Recommendations per Subscriber (Series)
[Figure: subscriber × programme matrix for Subscribers 1 to 3, highlighting programmes recommended to Subscriber 1 and Subscriber 2]
Uses a ‘People Like Me’ collaborative filtering approach to identify similar programmes based on subscribers who watch programmes together.
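A minimal sketch of co-viewing collaborative filtering, assuming binary watch data and cosine similarity between programme audiences (the deck does not state its similarity measure). The watch history is hypothetical; unseen programmes are scored by their summed similarity to what the subscriber already watches.

```python
import math

# Hypothetical watch history: subscriber -> set of programmes watched.
history = {
    "sub1": {"Programme A", "Programme B"},
    "sub2": {"Programme A", "Programme B", "Programme C"},
    "sub3": {"Programme B", "Programme C", "Programme D"},
}

def programme_similarity(history, a, b):
    """Cosine similarity of two programmes' audiences (binary co-viewing)."""
    viewers_a = {s for s, progs in history.items() if a in progs}
    viewers_b = {s for s, progs in history.items() if b in progs}
    if not viewers_a or not viewers_b:
        return 0.0
    return len(viewers_a & viewers_b) / math.sqrt(len(viewers_a) * len(viewers_b))

def recommend(history, subscriber, top_n=3):
    """Rank unseen programmes by summed similarity to the subscriber's watched programmes."""
    watched = history[subscriber]
    all_programmes = set().union(*history.values())
    scores = {
        candidate: sum(programme_similarity(history, candidate, w) for w in watched)
        for candidate in all_programmes - watched
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, score in ranked[:top_n] if score > 0]

print(recommend(history, "sub1"))  # ['Programme C', 'Programme D']
```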
Step 4: Build Recommendations per Subscriber (Movies)
[Figure: subscriber × programme matrix for Subscribers 1 to 3]
Similarity between movies watched within the same cluster is computed using a Pearson correlation metric based on the movies' IMDb features (genre, director, cast, rating, etc.).
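A minimal sketch of the Pearson correlation between two movies' feature vectors. It assumes the features are one-hot encoded (1 = the movie has that genre, director, cast member or rating) and that the candidate movies are already restricted to the subscriber's cluster; the feature list and movies are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant vector; treated here as "no similarity"
    return cov / math.sqrt(var_x * var_y)

# Hypothetical one-hot features in this order:
FEATURES = ["genre:sci-fi", "genre:drama", "genre:comedy", "director:X", "cast:Y", "rating:PG-13"]
movie_a = [1, 1, 0, 1, 1, 1]
movie_b = [1, 1, 0, 1, 0, 1]  # shares most features with movie_a
movie_c = [0, 0, 1, 0, 0, 0]  # shares none

print(round(pearson(movie_a, movie_b), 3))  # positive: similar movies
print(round(pearson(movie_a, movie_c), 3))  # negative: dissimilar movies
```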
Step 5: Apply Business Rules
• All recommendations: eliminate previously watched content and content no longer available live or on demand
• Apply business profitability rules
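A minimal sketch of the Step 5 filters. The deck does not define its profitability rules, so a per-title `margin` with a minimum floor stands in for them as a hypothetical placeholder; the availability flag and watched set follow the first rule directly.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    programme: str
    score: float
    available: bool  # still available live or on demand
    margin: float    # hypothetical per-title profitability measure

def apply_business_rules(recs, watched, min_margin=0.0):
    """Drop watched or unavailable titles, apply a profitability floor, rank by score."""
    kept = [
        r for r in recs
        if r.programme not in watched and r.available and r.margin >= min_margin
    ]
    return sorted(kept, key=lambda r: r.score, reverse=True)

recs = [
    Recommendation("A", 0.90, True, 1.0),
    Recommendation("B", 0.80, False, 2.0),   # no longer available
    Recommendation("C", 0.70, True, -0.5),   # fails the profitability floor
    Recommendation("D", 0.60, True, 0.5),
    Recommendation("E", 0.95, True, 3.0),    # already watched
]
print([r.programme for r in apply_business_rules(recs, watched={"E"})])  # ['A', 'D']
```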
QlikView: Behavioural Cluster Dashboard
A dashboard can be created to convey the outputs of advanced analytics.
Next Steps
Example email banner: "We think you’ll like this, Ruth"
• How effective are personalised recommendations in engaging customers with premium and package-exclusive content?
o Personalised banner in weekly email
o Measure down-spin: test versus control
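One way to read "test versus control" is a two-proportion z-test on down-spin rates between subscribers who receive the personalised banner and those who do not. A minimal sketch under that assumption; the counts are hypothetical, with the control rate set to the 1.2% monthly figure quoted earlier.

```python
import math

def downspin_test(test_downspins, test_size, control_downspins, control_size):
    """Two-proportion z-test comparing down-spin rates in test vs control."""
    p_test = test_downspins / test_size
    p_control = control_downspins / control_size
    pooled = (test_downspins + control_downspins) / (test_size + control_size)
    se = math.sqrt(pooled * (1 - pooled) * (1 / test_size + 1 / control_size))
    z = (p_test - p_control) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the normal approximation
    return p_test, p_control, z, p_value

# Hypothetical month: 1.0% down-spin with the banner vs 1.2% without.
p_test, p_control, z, p_value = downspin_test(100, 10_000, 120, 10_000)
print(f"test {p_test:.2%}, control {p_control:.2%}, z = {z:.2f}, p = {p_value:.3f}")
```

With these made-up counts the difference is not significant at the 5% level, which illustrates why control groups need to be large when monthly down-spin rates are around 1%.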
Netflix AWS Architecture with Teradata DWaaS
NETFLIX Architecture
[Diagram: Users; Cassandra; Log Collection & ODS; Keystone (Kafka); Amazon S3 (40PB disk, 3,500 queries per day); Pig and Hive on EMR; ETL; Redshift; Future Analytic Engines; Teradata DWaaS (300TB disk, 1,100,000 queries per day, of which 50,000 analytic)]
100% Open Source SQL Query Engine for the Modern Data Ecosystem
What is Presto?
[Diagram: a client submits SQL to the Presto Coordinator, which distributes work across four Presto workers]
-- Federated query: joins a MySQL table with a Hive table (catalog.schema.table naming)
SELECT u.UserID,
       count(*) AS ClickCnt
FROM MySQL.MDM.Users AS u
JOIN Hive.Web.Clicks AS s
  ON u.SessID = s.SessID
GROUP BY u.UserID
ORDER BY ClickCnt DESC;
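The coordinator/worker split above can be pictured as partial aggregation on each worker followed by a final merge on the coordinator. The sketch below is a simplified conceptual model of that pattern for the query's GROUP BY and ORDER BY, not Presto's implementation (which also distributes the join and exchanges data between workers); the joined rows and splits are hypothetical.

```python
from collections import Counter

def worker_partial_count(split_rows):
    """Each worker counts clicks per user over its own split of the joined rows."""
    return Counter(user_id for user_id, _sess_id in split_rows)

def coordinator_merge(partials):
    """Merge partial counts, then apply ORDER BY ClickCnt DESC (ties broken by UserID)."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical joined (UserID, SessID) rows, divided into one split per worker.
splits = [
    [(1, "s1"), (2, "s2")],
    [(1, "s3"), (3, "s4")],
    [(1, "s5"), (2, "s6")],
    [(3, "s7")],
]
result = coordinator_merge(worker_partial_count(s) for s in splits)
print(result)  # [(1, 3), (2, 2), (3, 2)]
```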
What is Presto?
LOOKS like a database
• ANSI SQL compliant
• Advanced SQL features
• In-memory operations
• ODBC / JDBC drivers
NOT a database
• No persistent store
• Sources data at runtime
• Doesn’t run at “relational speed”
Also, NOT Hadoop
• Not an Apache project
• Daemon-based, not MapReduce
• Typically a stand-alone cluster
• Hadoop is a large source of data
Why Presto@Netflix?
Selection Criteria
• Petabyte Scale
• Open Source
• ANSI Compliant
• Hadoop-Friendly
• Running at Facebook
• Well-designed Java
• 1 month to write S3 API support
• Performance
Presto Use Cases @ Netflix
| If you need to… | Then try… | However, if… | Then use… |
|---|---|---|---|
| Run reports via Tableau or MSTR, or analytics on aggregate data | Teradata | Data is needed at a lower grain, or for a longer historical period | Presto |
| Ad hoc interactive exploration on detail data | Presto | Joining 2 big tables, or the data otherwise doesn’t fit into memory | Hive |
| Long-running queries joining big tables | Hive | | |
| Sub-second analysis on pre-generated cube structures | Druid | The question falls outside the cube definition | Teradata / Presto |
| Run batch ETL in the legacy framework | Pig | Building new ETL in the future framework | Spark |
| Build new ETL from scratch | Spark | Data size is too big | Pig |
| Validate ETL accuracy | Presto | Joining 2 big tables, or the data otherwise doesn’t fit into memory | Hive |
Hive, Pig and Spark run on EMR.
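The table is effectively a routing rule with one exception per row. A minimal sketch encoding it as a lookup; the short keys paraphrase the table's rows and are hypothetical labels, not names used at Netflix.

```python
# need -> (default engine, exception condition, engine to use when the exception applies)
ROUTING = {
    "reports or aggregate analytics": ("Teradata", "lower grain or longer history", "Presto"),
    "ad hoc exploration of detail data": ("Presto", "big-table join or exceeds memory", "Hive"),
    "long-running big-table joins": ("Hive", None, None),
    "sub-second cube analysis": ("Druid", "question outside cube definition", "Teradata / Presto"),
    "batch ETL in legacy framework": ("Pig", "building new ETL in future framework", "Spark"),
    "new ETL from scratch": ("Spark", "data size too big", "Pig"),
    "validate ETL accuracy": ("Presto", "big-table join or exceeds memory", "Hive"),
}

def choose_engine(need: str, exception_applies: bool = False) -> str:
    """Return the table's default engine, or its fallback when the row's exception holds."""
    default, condition, fallback = ROUTING[need]
    if exception_applies and condition is not None:
        return fallback
    return default

print(choose_engine("ad hoc exploration of detail data"))        # Presto
print(choose_engine("ad hoc exploration of detail data", True))  # Hive
```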
Analytics at Netflix
Presto
• Detailed exploration
– Network behavior prior to event
– User segment clustering
– Historical viewing trends
– Historic user behavior
– Program correlation analysis
– Recommendation validation
– Predictive production decisions
– Etc.
Teradata
• Enterprise reporting (MicroStrategy)
– Subscriptions by country
– Average minutes per sitting
– Errors per 1M streams
– Monthly profitability by device
• BI tool exploration & analytics (Tableau)
– Reasons for quitting mid-stream
– Seasonal viewing trends by genre
– Marketing responsiveness
Netflix User Experience
Very positive!
• ~3500 Queries per Day
• 90% of queries complete under 1 minute
• 60% of queries complete under 5 seconds
• Integrated into Big Data Portal
• Easy cluster scaling up/down
Adoption was rapid and overwhelmingly positive
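Latency figures like "90% under 1 minute" are the share of queries below a cut-off. A minimal sketch of that calculation; the ten latencies are hypothetical values chosen only to illustrate it, not Netflix measurements.

```python
def share_under(latencies_s, limit_s):
    """Fraction of queries whose latency in seconds is strictly below limit_s."""
    if not latencies_s:
        return 0.0
    return sum(1 for t in latencies_s if t < limit_s) / len(latencies_s)

# Hypothetical per-query latencies in seconds.
latencies = [0.8, 1.5, 2.0, 3.2, 4.1, 4.9, 12.0, 35.0, 58.0, 140.0]
print(f"under 5 s: {share_under(latencies, 5):.0%}")     # 60%
print(f"under 1 min: {share_under(latencies, 60):.0%}")  # 90%
```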
Netflix Data Pipeline
[Diagram: sources CloudApps, Cassandra and Kafka; Amazon S3 (Storage); MetaCat (Service); EMR (Compute); load cadences labelled Operational, 15 minutes and Daily]
Netflix Data Pipeline
[Diagram: EMR (Compute) and MetaCat (Service), with tools exposed through APIs to the Big Data Portal. Tools: Forklift, Sting, Charlotte, Quinto, Lipstick. Functions: Data Movement, Data Visualization, Data Lineage, Data Quality, Pig Workflow Visualization, Job/Cluster Performance Visualization]
[Screenshot: Big Data Portal query editor with Teradata selected, the query SELECT * FROM MyTable; and a Submit button; available services: Teradata, Presto, EMR Hive, Spark, Druid]
https://www.linkedin.com/in/george-chiu/
THANK YOU