Using Big Data to Drive Big Engagement

Name: George Chiu

Company: Teradata

Netflix: Using Big Data to Drive Big Engagement

40PB Analytics in AWS

George Chiu, Sr. Industry Consultant

Oct. 2017


• #1 streaming video service
• Started in 1998 when Reed Hastings accrued a $40 late fee on “Apollo 13”
• In 2000, Blockbuster Video declined the chance to purchase Netflix for $50M
• Current market cap: $56B
• Teradata customer since 2007
• 86M members in 190 countries
• Streams 132M hrs/day, aka 92K hrs/min, aka 10.5 yrs/min
• 600B events generated daily
• 40PB on AWS S3
• Read/write 10% daily
• 350 active big data users
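The three streaming figures are unit conversions of one another. A quick check of that arithmetic, taking 132M hours/day as the input and assuming a 365-day year (the slide does not say which year length it used):

```python
HOURS_STREAMED_PER_DAY = 132_000_000  # from the slide: 132M hrs/day
MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 365 * 24             # assumption: 365-day year

hours_per_minute = HOURS_STREAMED_PER_DAY / MINUTES_PER_DAY
years_per_minute = hours_per_minute / HOURS_PER_YEAR

print(f"{hours_per_minute:,.0f} hrs/min")  # prints 91,667 hrs/min (about 92K)
print(f"{years_per_minute:.1f} yrs/min")   # prints 10.5 yrs/min
```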


Agenda

1. What analytics did Netflix use to drive more engagement?

2. Insights & Approach

3. Netflix Architecture on AWS with Teradata DW.a.a.S.

© 2017 Teradata

What Analytics Did Netflix Use to Drive More Engagement?


Netflix

• Focus is on making it easy to find things to watch
• Spends $150M on data and analytics
➢ 20x more than average
➢ 2% of ARPU
• Processes 400B interactions daily
• Hundreds of analysts continually deriving new metadata


Differentiate or Disappear

• More content, newer, more exclusive

• Make it easy for customers to find

• Make it easy to watch

• Provide a great service

• Provide relevant, timely and consistent interactions

• Provide flexible packages

https://business.tivo.com/content/dam/tivo/resources/whitepapers/Q3_2016_Video_Trends_Report.pdf


Can we influence customer engagement?

• 1.2% of high-value TV package subscribers downspin (downgrade their package) each month (+11% vs. last year)

• Perceived value diminishes when the initial discount ends (12 months and beyond)

• Subscribers who downspin are not engaged with the content and watch 15% less exclusive/premium TV

• Current marketing is limited, with no one-to-one (121) content

Identify at-risk customers and prevent downspin with personalised recommendations


Insights and Approach


Approach

Step 1: Profile Subscriber Viewing Against Genres
Step 2: Create Behavioural Clusters
Step 3: Which Subscribers to Target per Cluster?
Step 4: Build Recommendations per Subscriber
Step 5: Apply Business Rules


Step 1: Profile Subscriber Viewing Against Genres

Genre               News   Soccer   Reality   Documentary   Horror   Music   Crime Drama   …
Viewing duration    5      10       32        18            1        4       5             …
Share of viewing    0.07   0.13     0.43      0.24          0.01     0.05    0.07          …

Identify the proportion of each subscriber's viewing duration that can be attributed to each genre.

This subscriber watches mostly Reality content (43%), but also likes Documentaries (24%) and Soccer (13%).
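Step 1 is a row normalisation: divide each genre's viewing duration by the subscriber's total. A minimal sketch, treating the seven visible genres as the subscriber's full history (the "…" columns are unknown), reproduces the shares in the table; the function name is illustrative, not from the deck.

```python
# Viewing duration per genre for the example subscriber (units as on the slide).
# Assumption: the seven genres shown are the whole profile; "…" columns are omitted.
viewing = {
    "News": 5, "Soccer": 10, "Reality": 32, "Documentary": 18,
    "Horror": 1, "Music": 4, "Crime Drama": 5,
}

def genre_profile(durations: dict[str, float]) -> dict[str, float]:
    """Return each genre's share of the subscriber's total viewing duration."""
    total = sum(durations.values())
    if total == 0:
        return {genre: 0.0 for genre in durations}
    return {genre: d / total for genre, d in durations.items()}

profile = genre_profile(viewing)
# List genres from most to least watched.
for genre, share in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{genre:12s} {share:.2f}")
```

Using shares rather than raw durations means heavy and light viewers with the same taste end up with similar vectors, which is what the clustering in Step 2 needs.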


Step 2: Create Behavioural Clusters

• Cluster 0 (Soccer, Drama, News): 61k subscribers
• Cluster 8 (Soccer, News, Sports Talk): 32k subscribers
• Cluster 11 (Children, Animated, Adventure): 56k subscribers
• Cluster 13 (Crime Drama): 28k subscribers
• Cluster 15 (Reality): 57k subscribers
• Cluster 17 (Reality, Documentary, Entertainment): 85k subscribers
• Cluster 21 (Documentary): 56k subscribers
• Cluster 25 (Music): 25k subscribers
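The deck does not say which algorithm produced these clusters. As one plausible option, k-means over the Step 1 genre-share vectors groups subscribers with similar viewing mixes. The sketch below is a plain-Python k-means (Lloyd's algorithm) on hypothetical data with a fixed seed; it is not the method confirmed by the case study.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on equal-length numeric vectors; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical genre-share vectors: [News, Soccer, Reality, Documentary]
subscribers = [
    [0.10, 0.70, 0.10, 0.10],
    [0.15, 0.65, 0.10, 0.10],
    [0.05, 0.05, 0.60, 0.30],
    [0.05, 0.05, 0.55, 0.35],
]
labels, centroids = kmeans(subscribers, k=2)
print(labels)
```

Each centroid is itself a genre-share vector, so its largest components give the cluster's label (for example "Reality, Documentary").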


Step 3: Which Subscribers to Target Per Cluster?

[Figure: % Duration Viewed Premium vs. % Channels Viewed Premium, dividing subscribers into Low Engagement and High Engagement]

Deciding on a threshold:

[Figure: Recall of Churners vs. Threshold]

30:30 Rule: focusing on subscribers who watch less than 30% premium content and channels allows us to identify 80% of the churning population (those who churn within the next month).
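The threshold choice can be sketched as a recall calculation: flag subscribers below the threshold, then measure what fraction of actual next-month churners were flagged. This assumes the 30:30 rule means below 30% on both premium viewing duration and premium channels; the field names and data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Subscriber:
    pct_duration_premium: float  # share of viewing time on premium content
    pct_channels_premium: float  # share of channels viewed that are premium
    churned_next_month: bool     # observed outcome used to evaluate the rule

def is_low_engagement(s: Subscriber, threshold: float = 0.30) -> bool:
    """Assumed 30:30 rule: below the threshold on both premium measures."""
    return s.pct_duration_premium < threshold and s.pct_channels_premium < threshold

def churner_recall(subs, threshold: float = 0.30) -> float:
    """Fraction of actual churners that the rule flags as low engagement."""
    churners = [s for s in subs if s.churned_next_month]
    if not churners:
        return 0.0
    flagged = sum(is_low_engagement(s, threshold) for s in churners)
    return flagged / len(churners)

# Hypothetical subscribers.
subs = [
    Subscriber(0.10, 0.20, True),
    Subscriber(0.25, 0.15, True),
    Subscriber(0.05, 0.10, True),
    Subscriber(0.40, 0.35, True),
    Subscriber(0.60, 0.50, False),
    Subscriber(0.20, 0.10, False),
]

# Sweep thresholds to trace a recall-vs-threshold curve like the one on the slide.
for t in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(t, churner_recall(subs, t))
```

Raising the threshold always increases recall but also flags more subscribers who would not have churned, so the chosen point trades campaign size against coverage.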


Step 4: Build Recommendations per Subscriber (Series)

[Figure: programme-by-subscriber viewing matrix for Subscribers 1, 2 and 3, highlighting programmes recommended to Subscriber 1 and Subscriber 2]

Uses a ‘People Like Me’ collaborative filtering approach to identify similar programmes based on subscribers who watch programmes together.
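A minimal item-based collaborative filtering sketch of the "People Like Me" idea: two programmes are similar when the same subscribers watch both. The slide does not name a similarity measure; cosine similarity over co-viewing sets is an assumption here, and the viewing log is hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical viewing log: subscriber -> set of programmes watched.
watched = {
    "subscriber_1": {"Show A", "Show B"},
    "subscriber_2": {"Show A", "Show B", "Show C"},
    "subscriber_3": {"Show B", "Show C"},
}

def programme_similarity(watched):
    """Cosine similarity between programmes based on shared viewers."""
    viewers = defaultdict(set)
    for sub, progs in watched.items():
        for p in progs:
            viewers[p].add(sub)
    sims = {}
    for a in viewers:
        for b in viewers:
            if a != b:
                overlap = len(viewers[a] & viewers[b])
                sims[(a, b)] = overlap / math.sqrt(len(viewers[a]) * len(viewers[b]))
    return sims

def recommend(subscriber, watched, top_n=3):
    """Score unseen programmes by summed similarity to programmes already watched."""
    sims = programme_similarity(watched)
    seen = watched[subscriber]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]

print(recommend("subscriber_1", watched))
```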


Step 4: Build Recommendations per Subscriber (Movies)

[Figure: programme-by-subscriber viewing matrix for Subscribers 1, 2 and 3]

Similarity of movies watched in the same cluster is computed using a Pearson correlation metric based on the IMDB features of the movies (genre, director, cast, rating, etc.).
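A sketch of the Pearson correlation between two movies' feature vectors. How the IMDB features are encoded is not stated; the one-hot genre/director/cast flags plus a scaled rating below are a hypothetical encoding for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant vectors; treated as no correlation here
    return cov / (sx * sy)

# Hypothetical encoding: [drama, comedy, thriller, director_X, actor_Y, rating / 10]
movie_a = [1, 0, 1, 1, 0, 0.8]
movie_b = [1, 0, 1, 1, 1, 0.7]
movie_c = [0, 1, 0, 0, 0, 0.6]

print(pearson(movie_a, movie_b))  # positive: shared genres and director
print(pearson(movie_a, movie_c))  # negative: little in common
```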


Step 5: Apply Business Rules

Starting from all recommendations:
• Eliminate previously watched content and content no longer available live or on demand
• Apply business profitability rules
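A minimal sketch of this filtering stage. The deck does not define the profitability rules; the per-title `margin` field and the `min_margin` cut-off are hypothetical stand-ins for them.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    title: str
    score: float   # relevance score from Step 4
    margin: float  # hypothetical per-title profitability figure

def apply_business_rules(recs, watched, available, min_margin=0.0):
    """Drop watched or unavailable titles, keep profitable ones, order by score."""
    kept = [
        r for r in recs
        if r.title not in watched   # eliminate previously watched content
        and r.title in available    # eliminate content no longer live or on demand
        and r.margin >= min_margin  # hypothetical profitability rule
    ]
    return sorted(kept, key=lambda r: r.score, reverse=True)

recs = [
    Recommendation("Show A", 0.90, 1.5),
    Recommendation("Show B", 0.80, -0.2),
    Recommendation("Show C", 0.70, 0.4),
    Recommendation("Show D", 0.95, 2.0),
]
final = apply_business_rules(
    recs,
    watched={"Show D"},
    available={"Show A", "Show B", "Show C", "Show D"},
)
print([r.title for r in final])
```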


QlikView: Behavioural Cluster Dashboard

A dashboard can be created to convey the outputs of advanced analytics.


Next Steps

[Example email banner: “We think you’ll like this, Ruth”]

• How effective are personalised recommendations in engaging customers with premium and package-exclusive content?

o Personalised banner in weekly email

o Measurement of downspin: test versus control
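One common way to compare downspin between the test group (personalised banner) and the control group is a two-proportion z-test; the deck does not say which test was planned, and the counts below are hypothetical.

```python
import math

def downspin_lift(test_downspins, test_size, control_downspins, control_size):
    """Return (test rate, control rate, difference, two-proportion z statistic)."""
    p_test = test_downspins / test_size
    p_control = control_downspins / control_size
    # Pooled rate under the null hypothesis that both groups downspin equally.
    pooled = (test_downspins + control_downspins) / (test_size + control_size)
    se = math.sqrt(pooled * (1 - pooled) * (1 / test_size + 1 / control_size))
    z = (p_test - p_control) / se if se > 0 else 0.0
    return p_test, p_control, p_test - p_control, z

# Hypothetical results: 90 of 10,000 test vs. 120 of 10,000 control subscribers downspun.
p_t, p_c, diff, z = downspin_lift(90, 10_000, 120, 10_000)
print(f"test {p_t:.3%}, control {p_c:.3%}, difference {diff:+.3%}, z = {z:.2f}")
```

A negative z means the test group downspun less; |z| above about 1.96 corresponds to significance at the 5% level (two-sided).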

Netflix AWS Architecture with Teradata DW.a.a.S


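NETFLIX Architecture

[Diagram: Netflix analytics architecture on AWS. Components shown: Users; Cassandra; Log Collection & ODS; Keystore (Kafka); Amazon S3; Pig; Hive; EMR; ETL; three Redshift instances; Future Analytic Engines; Teradata DWaaS. Figures shown: DWaaS 1,100,000 QPD (queries per day), 50,000 of them analytic, on 300TB disk; 3,500 QPD on 40PB disk.]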


100% Open Source SQL Query Engine for the Modern Data Ecosystem


What is Presto?

[Diagram: a Client submits a query to the Presto Coordinator, which distributes the work across four Presto workers]

-- Federated query: joins a MySQL table with a Hive table (catalog.schema.table)
SELECT u.UserID,
       count(*) AS ClickCnt
FROM MySQL.MDM.Users AS u
JOIN Hive.Web.Clicks AS s
  ON u.SessID = s.SessID
GROUP BY u.UserID
ORDER BY ClickCnt DESC;
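To make the query's result concrete, here is what it computes, sketched in Python on tiny hypothetical tables: an inner join of users to clicks on SessID, a count per user, and a descending sort. In Presto the two inputs come from different catalogs and the work is spread across workers; this sketch mirrors only the result, not the distributed execution.

```python
from collections import Counter

# Hypothetical rows standing in for MySQL.MDM.Users and Hive.Web.Clicks.
users = [
    {"UserID": 1, "SessID": "s1"},
    {"UserID": 2, "SessID": "s2"},
    {"UserID": 3, "SessID": "s3"},
]
clicks = [{"SessID": "s1"}, {"SessID": "s1"}, {"SessID": "s2"},
          {"SessID": "s1"}, {"SessID": "s4"}]

def clicks_per_user(users, clicks):
    """Inner join on SessID, count rows per UserID, order by count descending."""
    users_by_sess = {}
    for u in users:
        users_by_sess.setdefault(u["SessID"], []).append(u["UserID"])
    counts = Counter()
    for c in clicks:
        # Inner join: clicks with no matching user (s4) drop out.
        for user_id in users_by_sess.get(c["SessID"], []):
            counts[user_id] += 1
    return counts.most_common()

print(clicks_per_user(users, clicks))
```

User 3 has no clicks and so does not appear, just as an inner join in the SQL would exclude it.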


What is Presto?

Also, NOT Hadoop
• Not an Apache project
• Daemon-based, not MapReduce
• Typically a stand-alone cluster
• Hadoop is a large source of data

LOOKS like a Database
• ANSI SQL compliant
• Advanced SQL features
• In-memory operations
• ODBC / JDBC drivers

NOT a Database
• No persistent store
• Sources data at runtime
• Doesn’t run at “relational speed”


Why Presto@Netflix?

Selection Criteria

• Petabyte Scale

• Open Source

• ANSI Compliant

• Hadoop-Friendly

• Running at Facebook

• Well Designed Java

• 1 Month to Write S3 API

• Performance


Presto Use Cases @ Netflix

• Run reports via Tableau or MSTR, or analytics on aggregate data: try Teradata. However, if data is needed at a lower grain or for a longer historical period, use Presto.

• Ad hoc interactive exploration on detail data: try Presto. However, if joining 2 big tables, or the data otherwise doesn’t fit into memory, use Hive.

• Long-running queries joining big tables: try Hive.

• Sub-second analysis on pre-generated cube structures: try Druid. However, if the question falls outside the cube definition, use Teradata / Presto.

• Run batch ETL in the legacy framework: try Pig. However, if building new ETL in the future framework, use Spark.

• Build new ETL from scratch: try Spark. However, if the data size is too big, use Pig.

• Validate ETL accuracy: try Presto. However, if joining 2 big tables, or the data otherwise doesn’t fit into memory, use Hive.

EMR
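The use-case guidance is a two-level decision rule: each need has a default engine and, for most needs, an alternative when the "However, if…" condition holds. A sketch encoding it as a lookup table; the dictionary keys are hypothetical short labels for the rows above, not Netflix identifiers.

```python
# need -> (engine to try first, engine when the "However, if…" condition applies)
ROUTING = {
    "reports_or_aggregate_analytics": ("Teradata", "Presto"),
    "adhoc_detail_exploration": ("Presto", "Hive"),
    "long_running_big_joins": ("Hive", None),  # no exception listed
    "subsecond_cube_analysis": ("Druid", "Teradata / Presto"),
    "legacy_batch_etl": ("Pig", "Spark"),
    "new_etl_from_scratch": ("Spark", "Pig"),
    "validate_etl_accuracy": ("Presto", "Hive"),
}

def choose_engine(need: str, exception_applies: bool = False) -> str:
    """Return the engine to use for a need, honouring the exception column."""
    default, fallback = ROUTING[need]
    if exception_applies and fallback is not None:
        return fallback
    return default

print(choose_engine("adhoc_detail_exploration"))
print(choose_engine("adhoc_detail_exploration", exception_applies=True))
```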


Analytics at Netflix

Presto
• Detailed exploration
– Network behavior prior to event
– User segment clustering
– Historical viewing trends
– Historic user behavior
– Program correlation analysis
– Recommendation validation
– Predictive production decisions
– Etc.

Teradata
• Enterprise reporting (MicroStrategy)
– Subscriptions by country
– Average minutes per sitting
– Errors per 1M streams
– Monthly profitability by device
• BI tool exploration and analytics (Tableau)
– Reasons for quitting mid-stream
– Seasonal viewing trends by genre
– Marketing responsiveness


Netflix User Experience

Very positive!

• ~3500 Queries per Day

• 90% of queries complete under 1 minute

• 60% of queries complete under 5 seconds

• Integrated into Big Data Portal

• Easy cluster scaling up/down

Adoption was rapid and overwhelmingly positive
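Figures such as "90% of queries complete under 1 minute" are shares of a latency distribution. A small sketch of how they might be computed from a query log; the latencies are hypothetical and "under" is taken as strictly less than the threshold.

```python
def share_under(latencies_s, threshold_s):
    """Fraction of queries whose latency is strictly below threshold_s seconds."""
    if not latencies_s:
        return 0.0
    return sum(t < threshold_s for t in latencies_s) / len(latencies_s)

# Hypothetical latencies (seconds) for ten queries.
latencies = [0.8, 1.2, 2.5, 3.0, 4.1, 4.9, 12.0, 30.0, 55.0, 95.0]
print(share_under(latencies, 5))   # prints 0.6
print(share_under(latencies, 60))  # prints 0.9
```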


Netflix Data Pipeline

[Diagram: Netflix data pipeline. Operational sources: CloudApps, Cassandra, Kafka; load cadences shown: 15 minutes and daily. Storage: Amazon S3. Compute: EMR. Service: MetaCat.]


Netflix Data Pipeline

[Diagram: Compute: EMR; Service: MetaCat; Tools exposed via APIs: Forklift (data movement), Sting (data visualization), Charlotte (data lineage), Quinto (data quality), Lipstick (Pig workflow visualization), and job/cluster performance visualization; all surfaced through the Big Data Portal.]

[Screenshot: Big Data Portal query page with Teradata selected, the query SELECT * FROM MyTable; and a Submit button. Services available: Teradata, Presto, EMR, Hive, Spark, Druid.]


https://www.linkedin.com/in/george-chiu/

THANK YOU
