Paris ML meetup

Machine Learning @ Netflix (and some lessons learned)

Yves Raimond (@moustaki)

Research/Engineering Manager

Search & Recommendations

Algorithm Engineering

Netflix evolution

Netflix scale

● > 69M members

● > 50 countries

● > 1000 device types

● > 3B hours/month

● 36% of peak US downstream traffic

Recommendations @ Netflix

● Goal: Help members find content to watch and enjoy, to maximize satisfaction and retention

● Over 80% of what people watch comes from our recommendations

● Top Picks, Because You Watched, Trending Now, Row Ordering, Evidence, Search, Search Recommendations, Personalized Genre Rows, ...

▪ Regression (Linear, logistic, elastic net)

▪ SVD and other Matrix Factorizations

▪ Factorization Machines

▪ Restricted Boltzmann Machines

▪ Deep Neural Networks

▪ Markov Models and Graph Algorithms

▪ Clustering

▪ Latent Dirichlet Allocation

▪ Gradient Boosted Decision Trees/Random Forests

▪ Gaussian Processes

▪ …

Models & Algorithms
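
To make one item from this list concrete, below is a minimal, illustrative sketch of a matrix factorization trained with stochastic gradient descent on (user, item, rating) triples. The factor dimension, learning rate, and regularization values are assumptions for illustration, not an actual production configuration.

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=20, lr=0.01, reg=0.1, epochs=10):
    """SGD matrix factorization on (user, item, rating) triples (illustrative)."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))   # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = P[u].copy(), Q[i].copy()
            err = r - pu @ qi
            P[u] += lr * (err * qi - reg * pu)
            Q[i] += lr * (err * pu - reg * qi)
    return P, Q

# Toy usage: predict a missing rating from the learned factors.
triples = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
P, Q = factorize(triples, n_users=2, n_items=3)
print(P[1] @ Q[1])  # predicted rating of item 1 for user 1
```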

Some lessons learned

Build the offline experimentation framework first

When tackling a new problem

● What offline metrics can we compute that capture what online improvements we're actually trying to achieve?

● How should the input data to that evaluation be constructed (train, validation, test)?

● How fast and easy is it to run a full cycle of offline experimentation?

○ Minimize time to first metric

● How replicable is the evaluation? How shareable are the results?

○ Provenance (see Dagobah)

○ Notebooks (see Jupyter, Zeppelin, Spark Notebook)

When tackling an old problem

● Same questions as above…

○ Are the metrics that were designed when first experimenting in that space still appropriate now?
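
As an illustration of the points above, here is a minimal sketch of an offline evaluation harness: a time-based train/test split plus a single ranking metric (recall@k). The function names and the choice of metric are illustrative assumptions; the slides do not specify the actual framework or metrics.

```python
def time_split(events, train_frac=0.8):
    """Split chronologically ordered (user, item, timestamp) plays into
    train/test by time, so the evaluation mimics what production will see."""
    events = sorted(events, key=lambda e: e[2])
    cut = int(len(events) * train_frac)
    return events[:cut], events[cut:]

def recall_at_k(recommend, test_events, k=10):
    """Offline proxy metric: fraction of held-out plays whose item appears
    in the top-k recommendations produced for that member."""
    hits = sum(item in recommend(user, k) for user, item, _ in test_events)
    return hits / max(len(test_events), 1)

# A full cycle of offline experimentation is then: train on the `train` split,
# compute recall_at_k on `test`, and keep the loop fast enough that the
# time to first metric is minutes, not days.
```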

Think about distribution from the outermost layers

1. For each combination of hyper-parameters (e.g. grid search, random search, Gaussian processes…), as sketched after this list

2. For each subset of the training data

a. Multi-core learning (e.g. HogWild)

b. Distributed learning (e.g. ADMM, distributed L-BFGS, …)
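
A minimal sketch of distributing at the outermost layer first: each hyper-parameter combination is an independent training run, so runs can be farmed out with no communication between workers. The grid values and the toy objective below stand in for a real train-and-evaluate step and are purely illustrative.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train_and_score(params):
    """Train one model for a hyper-parameter combination and return its
    offline metric. The quadratic 'metric' is a stand-in for a real
    train + offline-evaluation step."""
    lr, reg = params
    metric = -((lr - 0.05) ** 2 + (reg - 0.01) ** 2)
    return {"lr": lr, "reg": reg, "metric": metric}

if __name__ == "__main__":
    grid = list(product([0.01, 0.05, 0.1], [0.001, 0.01, 0.1]))
    # Outermost layer of distribution: one worker per combination,
    # no coordination needed between runs.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(train_and_score, grid))
    print(max(results, key=lambda r: r["metric"]))
```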

When to use distributed learning?

● The impact of communication overhead when building distributed ML algorithms is non-trivial

● Is your data big enough that the gains from distribution offset the communication overhead?

Example: Uncollapsed Gibbs sampler for LDA

(more details here)
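
For the LDA example, a single-machine sketch of an uncollapsed Gibbs sampler is shown below. The key property for distribution is that, given the current θ and φ, the per-token topic assignments are conditionally independent, so documents can be sampled in parallel. Hyper-parameters and priors here are illustrative assumptions.

```python
import numpy as np

def uncollapsed_lda_gibbs(docs, V, K=5, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Uncollapsed Gibbs sampler for LDA (single-machine sketch).
    docs: list of documents, each a list of word ids in [0, V)."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    theta = rng.dirichlet(np.full(K, alpha), size=D)   # doc-topic proportions
    phi = rng.dirichlet(np.full(V, beta), size=K)      # topic-word distributions
    for _ in range(iters):
        nd = np.zeros((D, K))  # topic counts per document
        nw = np.zeros((K, V))  # word counts per topic
        # Given theta and phi, token assignments are conditionally independent,
        # which is what makes this sampler easy to distribute across documents.
        for d, doc in enumerate(docs):
            for w in doc:
                p = theta[d] * phi[:, w]
                z = rng.choice(K, p=p / p.sum())
                nd[d, z] += 1
                nw[z, w] += 1
        # Resample theta and phi from their Dirichlet posteriors.
        for d in range(D):
            theta[d] = rng.dirichlet(alpha + nd[d])
        for k in range(K):
            phi[k] = rng.dirichlet(beta + nw[k])
    return theta, phi
```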

Design production code to be experimentation-friendly

Example development process (diagram):

Idea → Data → Offline modeling (R, Python, MATLAB, …) → Iterate → Implement in production system (Java, C++, …) → Production environment (A/B test) → Actual output vs. final model.

Along the way: missing post-processing logic, performance issues, code discrepancies, data discrepancies.

Avoid dual implementations

Diagram: rather than maintaining separate Experiment code and Production code, both the Experiment and Production paths call into a single Shared Engine (a minimal sketch follows).
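
A minimal sketch of the shared-engine idea, with hypothetical module and function names: feature encoding and scoring live in one module that both the offline experiment harness and the online service import, so there is no second implementation to drift out of sync.

```python
# shared_engine.py (hypothetical): the single place where feature encoding
# and scoring are implemented.
import numpy as np

def encode(event):
    """Turn a raw impression/play event into a feature vector (illustrative)."""
    return np.array([event["recency_days"], event["popularity"], 1.0])

def score(weights, event):
    """Score one candidate; the exact same code path runs offline and online."""
    return float(weights @ encode(event))

# Both call sites import this same module:
#   offline experiment:  metric = evaluate(lambda e: score(w, e), held_out_events)
#   production service:  ranked = sorted(candidates, key=lambda e: score(w, e), reverse=True)
```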

We’re hiring!

Yves Raimond (@moustaki)