Going Real-Time: Creating Frequently-Updating Datasets for Personalization: Spark Summit East talk...

Shriya Arora

Senior Data Engineer, Personalization Analytics

Streaming datasets for Personalization

What is Netflix’s Mission?

Entertaining you by allowing you to stream content anywhere, anytime

What is Netflix’s Mission?

Entertaining you by allowing you to stream personalized content anywhere, anytime

How much data do we process to have a personalized Netflix for everyone?

● 93M+ active members

● 125M hours/day

● 190 countries with unique catalogs

● 450B unique events/day

● 600+ Kafka topics

Image credit: http://www.bigwisdom.net/

Data Infrastructure

Raw data (S3/HDFS)

Stream Processing (Spark, Flink, …)

Processed data (Tables/Indexers)

Batch processing (Spark/Pig/Hive/MR)

Application instances

Keystone Ingestion Pipeline

Rohan’s talk tomorrow @12:20

User watches a video on Netflix

Data flows through Netflix Servers

Our problem: using user plays for feature generation, discovery, clustering, ...

Why have data later when you can have it now?

● Business Wins
  ○ Algorithms can be trained with the latest + greatest data
  ○ Enhances research
  ○ Creates opportunity for new types of algorithms

● Technical Wins
  ○ Save on storage costs
  ○ Avoid long-running jobs

Source of Discovery / Source of Play

Source-of-Discovery pipeline

Playback sessions

Spark Streaming app

Discovery service

REST API

Video Metadata

Backup

Spark Streaming

● Needs a StreamingContext and a batch duration
● Data received in DStreams, which are easily converted to RDDs
● Support all fundamental RDD transformations and operations
● Time-based windowing
● Checkpointing support for resilience to failures
● Deployment
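To make the micro-batch model concrete, here is a minimal sketch in plain Python. It is not Spark's API: `micro_batches` and `sliding_windows` are hypothetical helper names, and the play events are made-up sample data. It only illustrates how a batch duration cuts a stream into consecutive batches and how a time-based window is then built from whole batches.

```python
from collections import defaultdict


def micro_batches(events, batch_duration):
    """Group (timestamp_seconds, payload) events into consecutive batches.

    Batch k holds events with
    k * batch_duration <= timestamp < (k + 1) * batch_duration.
    """
    batches = defaultdict(list)
    for ts, payload in events:
        batches[int(ts // batch_duration)].append(payload)
    if not batches:
        return []
    last = max(batches)
    # Emit empty batches too: a micro-batch engine fires on every interval,
    # whether or not data arrived.
    return [batches.get(k, []) for k in range(last + 1)]


def sliding_windows(batches, window_batches, slide_batches):
    """Merge the last `window_batches` batches, every `slide_batches` batches."""
    windows = []
    for end in range(window_batches, len(batches) + 1, slide_batches):
        window = [p for batch in batches[end - window_batches:end] for p in batch]
        windows.append(window)
    return windows


# Hypothetical play events: (seconds since start, payload).
events = [(0.5, "play:A"), (1.2, "play:B"), (4.1, "play:C"), (5.9, "play:D")]
batches = micro_batches(events, batch_duration=2)
windows = sliding_windows(batches, window_batches=2, slide_batches=1)
```

Note that the window length and slide are expressed as multiples of the batch duration, which is why the choice of batch duration constrains everything built on top of it.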

Performance tuning your Spark streaming application

● Choice of micro-batch interval
  ○ The most important parameter

● Cluster memory
  ○ Large batch intervals need more memory

● Parallelism
  ○ DStreams naturally partitioned to Kafka partitions
  ○ Repartition can help with increased parallelism at the cost of shuffle

● # of CPUs
  ○ <= number of tasks
  ○ Depends on how computationally intensive your processing is
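Why the batch interval matters so much can be shown with a small simulation. This is a sketch under simplifying assumptions (batches are processed one at a time, in order, and the processing times are invented numbers); `scheduling_delays` is a hypothetical helper, not a Spark function. It computes how long each batch waits before it can start.

```python
def scheduling_delays(batch_interval, processing_times):
    """Return the scheduling delay of each batch, in seconds.

    Batch i is ready at (i + 1) * batch_interval and can only start
    once the previous batch has finished processing.
    """
    delays = []
    free_at = 0.0  # time at which the (single) processing slot becomes free
    for i, proc in enumerate(processing_times):
        ready_at = (i + 1) * batch_interval
        start = max(ready_at, free_at)
        delays.append(start - ready_at)
        free_at = start + proc
    return delays


# Processing time below the interval: the delay stays at zero.
stable = scheduling_delays(10, [8, 9, 8, 9])
# Processing time above the interval: the delay grows with every batch.
unstable = scheduling_delays(10, [12, 12, 12, 12])
```

In the second case each batch takes 2 seconds longer than the interval, so the backlog grows by 2 seconds per batch and never recovers unless the interval, the parallelism, or the processing cost changes.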


Challenges with Spark

● Not a ‘pure’ event streaming system
  ○ Minimum latency of batch interval
  ○ Unintuitive to design for a stream-only world

● Choice of batch interval is a little too critical
  ○ Everything can go wrong if you choose this wrong
  ○ Build-up of scheduling delay can lead to data loss

● Only time-based windowing*
  ○ Cannot be used to solve session-stitching use cases, or trigger-based event aggregations

* I used Spark 1.6.1
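The session-stitching limitation can be illustrated with a plain-Python sketch. Nothing here is Spark code: `fixed_windows` and `sessions` are hypothetical helpers, and the timestamps and the 30-second gap are assumptions chosen for the example. The point is that a fixed, clock-aligned window can cut one continuous viewing session in two, while gap-based stitching keeps it together.

```python
def fixed_windows(timestamps, window):
    """Bucket timestamps into fixed, clock-aligned windows of `window` seconds."""
    buckets = {}
    for ts in sorted(timestamps):
        buckets.setdefault(int(ts // window), []).append(ts)
    return [buckets[k] for k in sorted(buckets)]


def sessions(timestamps, max_gap):
    """Stitch timestamps into sessions; a gap above `max_gap` starts a new one."""
    result = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][-1] <= max_gap:
            result[-1].append(ts)  # close enough to the previous event
        else:
            result.append([ts])  # gap too large: open a new session
    return result


# Hypothetical play-event times (seconds) for one user.
plays = [55, 58, 62, 65, 300, 304]
by_window = fixed_windows(plays, window=60)
by_session = sessions(plays, max_gap=30)
```

With a 60-second window the first four events land in two different windows even though they are only seconds apart; the gap-based version groups them as one session and the later two events as another.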

Challenges with Streaming

● Pioneer Tax
  ○ Training ML models on streaming data is new ground

● Increased criticality of outages
  ○ Batch failures have to be addressed urgently; streaming failures have to be addressed immediately

● Infrastructure investment
  ○ Monitoring, alerts
  ○ Non-trivial deployments

There are two kinds of pain...

Questions?

Stay in touch!

@NetflixData
