Using Big Data to Drive Big Engagement

Name: George Chiu

Company: Teradata


Netflix: Using Big Data to Drive Big Engagement

40PB Analytics in AWS

George Chiu, Sr. Industry Consultant

Oct. 2017


• #1 streaming video service

• Started in 1998, when Reed Hastings accrued a $40 late fee on “Apollo 13”

• In 2000, Blockbuster Video declined the chance to purchase Netflix for $50M

• Current market cap: $56B

• Teradata customer since 2007

• 86M members in 190 countries

• Streams 132M hrs/day, aka 92K hrs/min, aka 10.5 yrs/min

• 600B events generated daily

• 40PB on AWS S3, with 10% read/written daily

• 350 active big data users
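The per-minute figures on this slide follow directly from the daily total. A quick check of that arithmetic, assuming a 1,440-minute day and a 365-day year (the function names are illustrative):

```python
# Check the streaming-volume conversions quoted on the slide:
# 132M hours/day -> ~92K hours/minute -> ~10.5 years of viewing per minute.
MINUTES_PER_DAY = 24 * 60   # 1,440
HOURS_PER_YEAR = 24 * 365   # 8,760 (assumes a 365-day year)


def hours_per_minute(hours_per_day: float) -> float:
    """Convert a daily viewing total into hours streamed per minute."""
    return hours_per_day / MINUTES_PER_DAY


def years_per_minute(hours_per_day: float) -> float:
    """Express the per-minute viewing total as years of continuous playback."""
    return hours_per_minute(hours_per_day) / HOURS_PER_YEAR


daily_hours = 132e6
print(f"{hours_per_minute(daily_hours):,.0f} hrs/min")  # 91,667 hrs/min
print(f"{years_per_minute(daily_hours):.1f} yrs/min")   # 10.5 yrs/min
```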


Agenda

1. What analytics did Netflix use to drive more engagement?

2. Insights & Approach

3. Netflix Architecture on AWS with Teradata DWaaS (Data Warehouse as a Service)

© 2017 Teradata

What analytics did Netflix use to drive more engagement?


Netflix

• Focus is on making it easy to find things to watch

• Spends $150m on data & analytics

➢ 20x more than average

➢ 2% of ARPU

• Processing 400bn interactions daily

• Hundreds of analysts continually deriving new metadata


Differentiate or Disappear

• More content, newer, more exclusive

• Make it easy for customers to find

• Make it easy to watch

• Provide a great service

• Provide relevant, timely and consistent interactions

• Provide flexible packages

https://business.tivo.com/content/dam/tivo/resources/whitepapers/Q3_2016_Video_Trends_Report.pdf


Can we influence customer engagement?

• 1.2% of high-value TV package subscribers downspin (downgrade to a cheaper package) each month (+11% vs. last year)

• Perceived value diminishes once the initial discount ends, at 12 months and beyond

• Subscribers who downspin are not engaged with the content and watch 15% less exclusive/premium TV

• Current marketing is limited, with no one-to-one (1:1) content

Identify at-risk customers and prevent downspin with personalised recommendations


Insights and Approach


Approach

Step 1: Profile Subscriber Viewing Against Genres

Step 2: Create Behavioural Clusters

Step 3: Which Subscribers to Target per Cluster?

Step 4: Build Recommendations per Subscriber

Step 5: Apply Business Rules


Step 1: Profile Subscriber Viewing Against Genres

Genre             News  Soccer  Reality  Documentary  Horror  Music  Crime Drama  …  …
Viewing duration     5      10       32           18       1      4            5  …  …
Proportion        0.07    0.13     0.43         0.24    0.01   0.05         0.07  …  …

Identify the proportion of each subscriber's viewing duration that can be attributed to each genre.

This subscriber watches mostly Reality content (43%), but also likes Documentaries (24%) and Soccer (13%).
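Profiling amounts to normalising each subscriber's per-genre viewing durations by their total. A minimal sketch using the example subscriber above: the seven named genres sum to 75, and dividing by that total reproduces the proportions shown. The function name `genre_proportions` is illustrative, not from the source.

```python
def genre_proportions(durations: dict[str, float]) -> dict[str, float]:
    """Share of a subscriber's total viewing duration attributed to each genre."""
    total = sum(durations.values())
    if total == 0:
        # A subscriber with no viewing gets an all-zero profile.
        return {genre: 0.0 for genre in durations}
    return {genre: d / total for genre, d in durations.items()}


# Example subscriber from the slide (only the named genres; the "…" columns are omitted).
subscriber = {
    "News": 5, "Soccer": 10, "Reality": 32, "Documentary": 18,
    "Horror": 1, "Music": 4, "Crime Drama": 5,
}
shares = genre_proportions(subscriber)
print({g: round(p, 2) for g, p in shares.items()})
# {'News': 0.07, 'Soccer': 0.13, 'Reality': 0.43, 'Documentary': 0.24, 'Horror': 0.01, 'Music': 0.05, 'Crime Drama': 0.07}
```

Each subscriber's profile is then a vector of shares summing to 1, which makes heavy and light viewers comparable in the clustering step.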


Step 2: Create Behavioural Clusters

Cluster 0: Soccer, Drama, News (61k subscribers)

Cluster 8: Soccer, News, Sports Talk (32k subscribers)

Cluster 11: Children, Animated, Adventure (56k subscribers)

Cluster 13: Crime Drama (28k subscribers)

Cluster 15: Reality (57k subscribers)

Cluster 17: Reality, Documentary, Entertainment (85k subscribers)

Cluster 21: Documentary (56k subscribers)

Cluster 25: Music (25k subscribers)
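The slides do not say which clustering algorithm produced these segments. As one plausible sketch, assuming k-means (Lloyd's algorithm) over the Step 1 genre-proportion vectors, clusters can be formed and then labelled by their dominant genres, the way the cluster names above read. All names and data here are hypothetical.

```python
import random

Vector = list[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two genre-proportion vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: list[Vector], k: int, iters: int = 100, seed: int = 0):
    """Plain k-means; returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k random subscribers as seeds
    labels = None
    for _ in range(iters):
        # Assignment step: each subscriber joins the nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = _mean(members)
    return labels, centroids


def top_genres(centroid: Vector, genres: list[str], n: int = 3) -> list[str]:
    """Name a cluster by the genres with the largest average share."""
    ranked = sorted(zip(genres, centroid), key=lambda gc: -gc[1])
    return [g for g, _ in ranked[:n]]


# Hypothetical profiles over four genres.
genres = ["Reality", "Documentary", "Soccer", "News"]
profiles = [
    [0.60, 0.30, 0.05, 0.05],  # reality/documentary viewers
    [0.55, 0.35, 0.05, 0.05],
    [0.05, 0.05, 0.60, 0.30],  # soccer/news viewers
    [0.10, 0.05, 0.55, 0.30],
]
labels, centroids = kmeans(profiles, k=2)
for c, centroid in enumerate(centroids):
    print(c, top_genres(centroid, genres, n=2), labels.count(c))
```

In practice the number of clusters (at least 26 here, judging by the cluster IDs) would be chosen by experiment; k-means is only one candidate.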


Step 3: Which Subscribers to Target per Cluster?

[Figure: % Duration Viewed Premium vs. % Channels Viewed Premium, divided into Low Engagement and High Engagement regions]

Deciding on a threshold:

[Figure: Recall of Churners vs. Threshold]

Focusing on subscribers who watch less than 30% premium content and less than 30% premium channels allows us to identify 80% of the churning population (those who churn within the next month).

30:30 Rule
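The 30:30 rule is a two-condition filter, and the threshold chart plots how much of the churning population that filter captures. A minimal sketch, assuming each subscriber has both premium shares (0 to 1) and an observed next-month churn flag; the class, field names and data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    sub_id: str
    pct_duration_premium: float  # share of viewing time on premium content, 0..1
    pct_channels_premium: float  # share of channels viewed that are premium, 0..1
    churned_next_month: bool     # observed outcome, used only to measure recall


def is_low_engagement(s: Subscriber, threshold: float = 0.30) -> bool:
    """30:30 rule: below the threshold on BOTH premium measures."""
    return s.pct_duration_premium < threshold and s.pct_channels_premium < threshold


def churner_recall(subs: list[Subscriber], threshold: float = 0.30) -> float:
    """Fraction of actual churners that the rule flags ("Recall of Churners")."""
    churners = [s for s in subs if s.churned_next_month]
    if not churners:
        return 0.0
    flagged = sum(is_low_engagement(s, threshold) for s in churners)
    return flagged / len(churners)


# Hypothetical history: four churners, two retained subscribers.
subs = [
    Subscriber("a", 0.10, 0.20, True),
    Subscriber("b", 0.25, 0.15, True),
    Subscriber("c", 0.05, 0.10, True),
    Subscriber("d", 0.40, 0.10, True),
    Subscriber("e", 0.60, 0.50, False),
    Subscriber("f", 0.20, 0.25, False),
]

# Sweep the threshold, as in the "Deciding on a threshold" chart.
for t in (0.1, 0.2, 0.3, 0.5):
    print(f"{t:.1f}: {churner_recall(subs, t):.2f}")  # 0.00, 0.25, 0.75, 1.00

targets = [s.sub_id for s in subs if is_low_engagement(s)]  # subscribers to target
```

Raising the threshold always increases recall but also flags more subscribers who would not have churned (here "f"), so the chosen cut-off trades campaign size against coverage.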


Step 4: Build Recommendations per Subscriber (Series)

[Figure: programmes-by-subscribers viewing grid for Subscribers 1, 2 and 3, highlighting the programmes recommended to Subscriber 1 and to Subscriber 2]

Uses a ‘People Like Me’ collaborative filtering approach to identify similar programmes based on subscribers who watch programmes together.
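The slide does not give the similarity measure behind the ‘People Like Me’ approach. A minimal sketch, assuming item-to-item collaborative filtering with cosine similarity over binary "who watched it" vectors: programmes watched by the same subscribers score as similar, and each subscriber is offered unseen programmes most similar to what they already watch. The data and function names are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(watched: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Cosine similarity between programmes, using binary 'who watched it' vectors."""
    viewers: dict[str, set[str]] = defaultdict(set)  # programme -> subscribers who watched it
    for sub, programmes in watched.items():
        for prog in programmes:
            viewers[prog].add(sub)
    sims: dict[tuple[str, str], float] = {}
    items = sorted(viewers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            together = len(viewers[a] & viewers[b])  # subscribers who watch both programmes
            if together:
                s = together / math.sqrt(len(viewers[a]) * len(viewers[b]))
                sims[(a, b)] = sims[(b, a)] = s  # store both orderings for easy lookup
    return sims


def recommend(sub: str, watched: dict[str, set[str]],
              sims: dict[tuple[str, str], float], top_n: int = 3) -> list[str]:
    """Rank unseen programmes by summed similarity to programmes the subscriber watches."""
    seen = watched[sub]
    scores: dict[str, float] = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Hypothetical viewing data for three subscribers and four programmes.
watched = {
    "Subscriber 1": {"A", "B"},
    "Subscriber 2": {"A", "B", "C"},
    "Subscriber 3": {"C", "D"},
}
sims = item_similarities(watched)
print(recommend("Subscriber 1", watched, sims))  # ['C']
print(recommend("Subscriber 3", watched, sims))  # ['A', 'B']
```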


Step 4: Build Recommendations per Subscriber (Movies)

[Figure: programmes-by-subscribers viewing grid for Subscribers 1, 2 and 3]

The similarity of movies watched within the same cluster is computed using a Pearson correlation metric based on the movies' IMDb features (genre, director, cast, rating, etc.).
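A minimal sketch of that Pearson similarity, assuming each movie's IMDb features are encoded as one-hot tokens over a shared vocabulary plus the rating rescaled to 0..1. The slide does not describe its encoding, so this feature layout, the movies and the helper names are all hypothetical.

```python
import math


def tokens(movie: dict) -> set[str]:
    """Categorical features as prefixed tokens (assumed encoding)."""
    return ({"genre:" + g for g in movie["genres"]}
            | {"dir:" + movie["director"]}
            | {"cast:" + c for c in movie["cast"]})


def feature_vector(movie: dict, vocab: list[str]) -> list[float]:
    """One-hot tokens over a shared vocabulary, plus the rating rescaled to 0..1."""
    t = tokens(movie)
    return [1.0 if v in t else 0.0 for v in vocab] + [movie["rating"] / 10.0]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant vector; treat as no correlation
    return cov / (sx * sy)


# Hypothetical movies watched within one behavioural cluster.
movies = {
    "m1": {"genres": ["Crime", "Drama"], "director": "X", "cast": ["P", "Q"], "rating": 8.0},
    "m2": {"genres": ["Crime", "Drama"], "director": "X", "cast": ["Q", "R"], "rating": 7.5},
    "m3": {"genres": ["Animation"], "director": "Y", "cast": ["S"], "rating": 7.0},
}
vocab = sorted(set().union(*(tokens(m) for m in movies.values())))
vecs = {movie_id: feature_vector(m, vocab) for movie_id, m in movies.items()}
print(round(pearson(vecs["m1"], vecs["m2"]), 2))  # shares genre, director, cast: positive
print(round(pearson(vecs["m1"], vecs["m3"]), 2))  # shares nothing: negative
```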


Step 5: Apply Business Rules

Starting from all recommendations:

• Eliminate previously watched content and content no longer available live or on demand

• Apply business profitability rules
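The two filtering stages above can be sketched as a simple pass over candidate recommendations. The slide does not define its profitability rules, so the per-title `margin` lookup and `min_margin` cut-off below are hypothetical stand-ins; the class, data and names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Rec:
    sub_id: str
    content_id: str
    score: float


def apply_business_rules(recs: list[Rec], watched: dict[str, set[str]],
                         available: set[str], margin: dict[str, float],
                         min_margin: float = 0.0) -> list[Rec]:
    """Filter raw recommendations, then order each subscriber's list by score."""
    kept = []
    for r in recs:
        if r.content_id in watched.get(r.sub_id, set()):
            continue  # already watched by this subscriber
        if r.content_id not in available:
            continue  # no longer available live or on demand
        if margin.get(r.content_id, 0.0) < min_margin:
            continue  # hypothetical profitability rule: minimum margin per title
        kept.append(r)
    return sorted(kept, key=lambda r: (r.sub_id, -r.score))


recs = [Rec("u1", "A", 0.9), Rec("u1", "B", 0.8), Rec("u1", "C", 0.7), Rec("u1", "D", 0.6)]
watched = {"u1": {"A"}}
available = {"A", "B", "C"}
margin = {"A": 1.0, "B": 1.5, "C": 0.2}
print([r.content_id for r in apply_business_rules(recs, watched, available, margin, min_margin=0.5)])
# ['B']
```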


QlikView: Behavioural Cluster Dashboard

A dashboard can be created to convey the outputs of advanced analytics.


Next Steps

[Example email banner: “We think you’ll like this, Ruth”]

• How effective are personalised recommendations in engaging customers with premium and package-exclusive content?

o Personalised banner in weekly email

o Measurement of downspin: test versus control
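The slide does not say how the test and control groups would be compared. A two-proportion z-test on monthly downspin rates is one common choice and is assumed here; the group sizes and counts below are hypothetical.

```python
import math


def two_proportion_z(x_test: int, n_test: int, x_ctrl: int, n_ctrl: int) -> float:
    """z-statistic for (test rate - control rate), using the pooled standard error."""
    p_test, p_ctrl = x_test / n_test, x_ctrl / n_ctrl
    pooled = (x_test + x_ctrl) / (n_test + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_ctrl))
    return (p_test - p_ctrl) / se


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value from the standard normal CDF, Phi(z) = (1 + erf(z / sqrt 2)) / 2."""
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)


# Hypothetical month: personalised banner (test) vs. no banner (control).
x_t, n_t = 100, 10_000  # 1.0% downspin in test
x_c, n_c = 120, 10_000  # 1.2% downspin in control
z = two_proportion_z(x_t, n_t, x_c, n_c)
relative_reduction = 1 - (x_t / n_t) / (x_c / n_c)
print(f"z = {z:.2f}, p = {two_sided_p_value(z):.3f}, relative reduction = {relative_reduction:.1%}")
```

With these illustrative numbers a 17% relative reduction is not yet statistically significant, which is why group size matters when planning the test.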


Netflix AWS Architecture with Teradata DWaaS


NETFLIX Architecture

[Diagram components: Users; Cassandra; Log Collection & ODS; Keystone (Kafka); Amazon S3; Pig and Hive on EMR; ETL; Redshift (three instances); Future Analytic Engines; Teradata DWaaS]

• DWaaS: 1,100,000 QPD (queries per day), of which 50,000 are analytic; 300TB disk

• Amazon S3: 3,500 QPD; 40PB disk


Presto: 100% Open Source SQL Query Engine for the Modern Data Ecosystem


What is Presto?

[Diagram: Client → Presto Coordinator → four Presto workers]

Example query submitted by the client:

-- Federated query: one statement joins a MySQL table with a Hive table
SELECT u.UserID,
       count(*) AS ClickCnt
FROM MySQL.MDM.Users AS u
JOIN Hive.Web.Clicks AS s
  ON u.SessID = s.SessID
GROUP BY u.UserID
ORDER BY ClickCnt DESC;
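To make the coordinator/worker split concrete, here is a toy model of how a distributed engine might evaluate this query: each worker joins its share of the click rows against the small Users table (handed to every worker, as in a broadcast join) and produces partial counts; the coordinator merges the partials and applies the ORDER BY. This is a simplified sketch with hypothetical data, not Presto's actual planner or execution engine.

```python
from collections import Counter


def worker_partial_counts(users: dict[str, str], click_split: list[str]) -> Counter:
    """One worker: join its split of Clicks to Users on SessID and count clicks per UserID."""
    partial: Counter = Counter()
    for sess_id in click_split:
        user_id = users.get(sess_id)
        if user_id is not None:  # inner JOIN: clicks without a matching user are dropped
            partial[user_id] += 1
    return partial


def run_query(users: dict[str, str], clicks: list[str], n_workers: int = 4) -> list[tuple[str, int]]:
    """Coordinator: split the click rows, merge the workers' partial counts, then sort."""
    splits = [clicks[i::n_workers] for i in range(n_workers)]  # round-robin split of rows
    final: Counter = Counter()
    for split in splits:
        final.update(worker_partial_counts(users, split))  # final aggregation step
    # ORDER BY ClickCnt DESC (ties broken by UserID for a deterministic result)
    return sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))


users = {"s1": "u1", "s2": "u1", "s3": "u2"}  # SessID -> UserID (the MySQL side)
clicks = ["s1", "s1", "s2", "s3", "s4"]        # one SessID per click row (the Hive side)
print(run_query(users, clicks))  # [('u1', 3), ('u2', 1)]
```

The result is the same for any number of workers, which is what lets such an engine scale the aggregation out across the cluster.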


What is Presto?

LOOKS like a Database

• ANSI SQL compliant

• Advanced SQL features

• In-memory operations

• ODBC / JDBC drivers

NOT a Database

• No persistent store

• Sources data at runtime

• Doesn’t run at “relational speed”

Also, NOT Hadoop

• Not an Apache project

• Daemon based, not MapReduce

• Typically a stand-alone cluster

• Hadoop is a large source of data


Why Presto@Netflix?

Selection Criteria

• Petabyte Scale

• Open Source

• ANSI Compliant

• Hadoop-Friendly

• Runs at Facebook

• Well-Designed Java

• 1 Month to Write S3 API

• Performance


Presto Use Cases @ Netflix

If you need to… | Then try… | However, if… | Then use…

Run reports via Tableau or MSTR, or analytics on aggregate data | Teradata | Data is needed at a lower grain, or for a longer historical period | Presto

Ad hoc interactive exploration on detail data | Presto | Joining 2 big tables, or the data otherwise doesn't fit into memory | Hive

Long-running queries joining big tables | Hive | - | -

Sub-second analysis on pre-generated cube structures | Druid | The question falls outside the cube definition | Teradata / Presto

Run batch ETL in the legacy framework | Pig | Building new ETL in the future framework | Spark

Build new ETL from scratch | Spark | Data size is too big | Pig

Validate ETL accuracy | Presto | Joining 2 big tables, or the data otherwise doesn't fit into memory | Hive

Hive and Pig run on EMR.


Analytics at Netflix

Presto

• Detailed exploration

– Network behavior prior to event

– User segment clustering

– Historical viewing trends

– Historic user behavior

– Program correlation analysis

– Recommendation validation

– Predictive production decisions

– Etc.

Teradata

• Enterprise reporting (MicroStrategy)

– Subscriptions by country

– Average minutes per sitting

– Errors per 1M streams

– Monthly profitability by device

• BI tool exploration & analytics (Tableau)

– Reasons for quitting mid-stream

– Seasonal viewing trends by genre

– Marketing responsiveness


Netflix User Experience

Very positive!

• ~3500 Queries per Day

• 90% of queries complete under 1 minute

• 60% of queries complete under 5 seconds

• Integrated into Big Data Portal

• Easy cluster scaling up/down

Adoption was rapid and overwhelmingly positive


Netflix Data Pipeline

[Diagram components: CloudApps; Cassandra; Kafka; Operational; update intervals of 15 minutes and Daily; Storage: Amazon S3; Compute: EMR; Service: MetaCat]


Netflix Data Pipeline

Compute: EMR

Service: MetaCat

Tools (each accessible through an API):

• Forklift: Data Movement

• Sting: Data Visualization

• Charlotte: Data Lineage

• Quinto: Data Quality

• Lipstick: Pig Workflow Visualization

• Job/Cluster Perf. Visualization

Big Data Portal

[Screenshot: Big Data Portal query editor with a service selector set to Teradata, the query "SELECT * FROM MyTable;" and a Submit button; services listed: Teradata, Presto, EMR, Hive, Spark, Druid]


https://www.linkedin.com/in/george-chiu/

THANK YOU