Building Data Pipelines for Music Recommendations at Spotify

October 17, 2015

Data Pipelines for Music Recommendations @ Spotify

Vidhya Murali (@vid052)

Who Am I?

•Areas of Interest: Data & Machine Learning
•Data Engineer @Spotify
•Masters Student from the University of Wisconsin Madison, aka Happy Badger for life!

“Torture the data, and it will confess!”

– Ronald Coase, Nobel Prize Laureate

Spotify’s Big Data

•Started in 2006, now available in 58 countries
•70+ million active users, 20+ million paid subscribers
•30+ million songs in our catalog, ~20K added every day
•1.5 billion playlists so far and counting
•1 TB of user data logged every day
•Hadoop cluster with 1500 nodes
•~20,000 Hadoop jobs per day

Music Recommendations at Spotify

Features:
•Discover
•Discover Weekly
•Moments
•Radio
•Related Artists

30 million tracks… What to recommend?

Approaches

•Manual curation by Experts

•Editorial Tagging

•Metadata (e.g. Label provided data, NLP over News, Blogs)

•Audio Signals

•Collaborative Filtering Model

Collaborative Filtering Model

•Find patterns in users’ past behavior to generate recommendations

•Domain independent

•Scalable

•Accuracy (Collaborative Model) >= Accuracy (Content Based Model)

Definition of CF

Hey, I like tracks P, Q, R, S!

Well, I like tracks Q, R, S, T!

Then you should check out track P!

Nice! Btw try track T!

Legacy Slide of Erik Bernhardsson
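The exchange above can be sketched in a few lines of Python. This is a toy illustration only: the listeners `alice` and `bob` and the helper `recommend_from_neighbor` are hypothetical names, and a real collaborative filtering model works over millions of users rather than one pair.

```python
def recommend_from_neighbor(my_tracks, neighbor_tracks):
    """Suggest tracks the neighbor likes that I have not listed myself."""
    return sorted(set(neighbor_tracks) - set(my_tracks))


# Hypothetical listeners matching the dialogue above.
alice = {"P", "Q", "R", "S"}
bob = {"Q", "R", "S", "T"}

# The two listeners overlap on Q, R and S, so each one's remaining
# track becomes a recommendation for the other.
recs_for_bob = recommend_from_neighbor(bob, alice)
recs_for_alice = recommend_from_neighbor(alice, bob)
```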

The YoLo Problem

•YoLo Problem: “You Only Listen Once” to judge recommendations
•Goal: Predict if users will listen to new music (new to user)
•Challenges
  • Scale of catalog (30M songs + ~20K added every day)
  • Repeated consumption of music is fairly common
  • Music is niche
  • Music consumption is heavily influenced by user’s lifestyle
•Input: Feedback is implicit through streaming behavior, collection adds, browse history, search history, etc.

User Plays to Track Recs

1. Weighted play counts from logs

2. Train Model using the input signals

3. Generate recs from the trained model

4. Post process the recommendations

Step 1: ETL of Logs

•Extract and transform the anonymized logs into a training data set
•Case: Logs -> (user, track, weighted count)
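A minimal sketch of this ETL step, assuming tab-separated log lines of the form `user<TAB>track<TAB>event`. The log format, the event names and the values in `EVENT_WEIGHTS` are all assumptions made for illustration; the actual log schema and weighting scheme are not described in the slides.

```python
from collections import defaultdict

# Hypothetical weights per event type (assumed, not from the talk).
EVENT_WEIGHTS = {"full_play": 1.0, "skip": 0.0, "collection_add": 2.0}


def logs_to_training_data(log_lines):
    """Turn 'user<TAB>track<TAB>event' lines into (user, track, wt_count) tuples."""
    totals = defaultdict(float)
    for line in log_lines:
        user, track, event = line.rstrip("\n").split("\t")
        totals[(user, track)] += EVENT_WEIGHTS.get(event, 0.0)
    # Drop pairs whose weighted count is zero (for example, only skips).
    return sorted((u, t, w) for (u, t), w in totals.items() if w > 0)


sample_logs = [
    "u1\tt1\tfull_play",
    "u1\tt1\tfull_play",
    "u1\tt2\tskip",
    "u2\tt1\tcollection_add",
]
training_data = logs_to_training_data(sample_logs)
```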

Step 2: Construct Big Matrix!

[Figure: user-track matrix with Users (m) on one axis and Tracks (n) on the other; the highlighted cell is user Vidhya x track “Burn” by Ellie Goulding]

Order of 70M x 30M!
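At this order of magnitude the matrix cannot be stored densely, so only the non-zero cells are kept. The sketch below uses a plain dict-of-dicts for that purpose; the helper `build_sparse_matrix`, the track `lights` and the user `erik` are hypothetical, and a production pipeline would use a distributed representation instead.

```python
def build_sparse_matrix(triples):
    """Return (rows, user_index, track_index) for (user, track, count) triples.

    rows[u] is a dict {track_column: count} holding only the non-zero cells.
    """
    user_index, track_index, rows = {}, {}, {}
    for user, track, count in triples:
        u = user_index.setdefault(user, len(user_index))
        t = track_index.setdefault(track, len(track_index))
        rows.setdefault(u, {})[t] = count
    return rows, user_index, track_index


# Made-up sample data.
triples = [("vidhya", "burn", 3.0), ("vidhya", "lights", 1.0), ("erik", "burn", 2.0)]
rows, user_index, track_index = build_sparse_matrix(triples)

# Number of cells a dense 70M x 30M matrix would need.
dense_cells = 70_000_000 * 30_000_000
```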

Latent Factor Models

•Use a “small” representation for each user and item (track): f-dimensional vectors

[Figure: the (m x n) user-track matrix approximated by the product of a user vector matrix X and a track vector matrix Y; the row for user Vidhya and the column for the track Burn are highlighted]

•User Track Matrix: (m x n)
•User Vector Matrix: X: (m x f)
•Track Vector Matrix: Y: (n x f)

(here, f = 2)
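To make the idea concrete, the sketch below scores one (user, track) pair as the inner product of two f-dimensional vectors and compares the storage needed for the full matrix with that of the two factor matrices. The vector values are made up for illustration, not learned, and `predicted_score` is a hypothetical helper.

```python
def predicted_score(user_vector, track_vector):
    """Inner product of a user vector and a track vector."""
    return sum(a * b for a, b in zip(user_vector, track_vector))


# Made-up 2-dimensional vectors (f = 2), not learned values.
x_vidhya = [0.5, 2.0]
y_burn = [1.0, 0.25]
score = predicted_score(x_vidhya, y_burn)

# Storage comparison for m users, n tracks and f factors.
m, n, f = 70_000_000, 30_000_000, 2
full_matrix_cells = m * n        # one cell per (user, track) pair
factor_cells = (m + n) * f       # one f-vector per user and per track
```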


Matrix Factorization using Implicit Feedback

•User Track Play Count Matrix
•User Track Preference Matrix: binary label, 1 => played, 0 => not played
•Weights Matrix: weights based on play count and smoothing
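One way to derive the preference and weight for a single cell from its play count is sketched below. The exact weight function is not given in the slides: the logarithmic smoothing and the `alpha` and `epsilon` values are assumptions chosen for illustration.

```python
import math


def preference_and_weight(play_count, alpha=40.0, epsilon=1.0):
    """Return (p, c) for one (user, track) cell.

    p is the binary label: 1 if played, 0 otherwise.
    c is a weight that grows with the play count. The log smoothing and
    the alpha/epsilon defaults are assumptions, not values from the talk.
    """
    p = 1 if play_count > 0 else 0
    c = 1.0 + alpha * math.log(1.0 + play_count / epsilon)
    return p, c
```

With this choice an unplayed track still gets the minimum weight of 1, so the model treats "not played" as weak negative evidence rather than ignoring it.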

Equation(s) Alert!

Implicit Matrix Factorization

[Figure: binary preference matrix (rows: Users, columns: Tracks) approximated by the product of X and Y]

1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1

•Aggregate all (user, track) streams into a large matrix
•Goal: Approximate the binary preference matrix by the inner product of 2 smaller matrices, by minimizing the weighted RMSE (root mean squared error) using a function of total plays as weight
•Why?: Once learned, the top recommendations for a user are the top inner products between their latent factor vector in X and the track latent factor vectors in Y.

•$p_{ui}$ = 1 if user streamed track else 0
•$c_{ui}$ = weight for the (user, track) pair, a function of total plays
•$x_u$ = user latent factor vector
•$y_i$ = item latent factor vector
•$\beta_u$ = bias for user
•$\beta_i$ = bias for item
•$\lambda$ = regularization parameter
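Written out, a weighted objective of this kind would take roughly the following form. This is a sketch of the standard weighted implicit-feedback formulation rather than a transcription of the slide. Here $p_{ui}$ is the binary preference, $x_u$ and $y_i$ are the user and item latent factor vectors, $\beta_u$ and $\beta_i$ are the user and item biases, $\lambda$ is the regularization parameter, and $c_{ui}$ (an assumed symbol) is the play-count-based weight.

```latex
\min_{x,\, y,\, \beta} \; \sum_{u, i} c_{ui} \left( p_{ui} - x_u^{\top} y_i - \beta_u - \beta_i \right)^2
  + \lambda \left( \sum_{u} \lVert x_u \rVert^2 + \sum_{i} \lVert y_i \rVert^2 \right)
```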

Alternating Least Squares

[Figure: the binary preference matrix (Users x Tracks) and its factors X and Y, as on the Implicit Matrix Factorization slide]

•Fix tracks
•Solve for users
•Fix users
•Solve for tracks
•Repeat until convergence…
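The alternation can be sketched in plain Python on a tiny dense example. This is an illustration under simplifying assumptions, not the production implementation: bias terms are omitted, the matrices are dense lists, and the values of `P`, `C`, `lam` and the weight rule `1 + 5p` are made up. All function names are hypothetical.

```python
import random


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[k][n] / M[k][k] for k in range(n)]


def solve_side(P, C, fixed, lam):
    """Recompute one side's vectors while the other side stays fixed.

    Each row of P (with weights from C) gets the weighted, regularized
    least squares solution against the vectors in `fixed`.
    """
    f = len(fixed[0])
    out = []
    for p_row, c_row in zip(P, C):
        A = [[lam if a == b else 0.0 for b in range(f)] for a in range(f)]
        rhs = [0.0] * f
        for p, c, v in zip(p_row, c_row, fixed):
            for a in range(f):
                rhs[a] += c * p * v[a]
                for b in range(f):
                    A[a][b] += c * v[a] * v[b]
        out.append(solve(A, rhs))
    return out


def transpose(M):
    return [list(col) for col in zip(*M)]


def weighted_loss(P, C, X, Y, lam):
    """Weighted squared error plus L2 regularization."""
    err = sum(C[u][i] * (P[u][i] - sum(a * b for a, b in zip(X[u], Y[i]))) ** 2
              for u in range(len(X)) for i in range(len(Y)))
    return err + lam * sum(v * v for vec in X + Y for v in vec)


def als(P, C, f=2, lam=0.1, iterations=10, seed=0):
    rng = random.Random(seed)
    X = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in P]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in P[0]]
    Pt, Ct = transpose(P), transpose(C)
    losses = [weighted_loss(P, C, X, Y, lam)]
    for _ in range(iterations):
        X = solve_side(P, C, Y, lam)    # fix tracks, solve for users
        Y = solve_side(Pt, Ct, X, lam)  # fix users, solve for tracks
        losses.append(weighted_loss(P, C, X, Y, lam))
    return X, Y, losses


# Made-up 3 users x 4 tracks example.
P = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 0, 1, 1]]
C = [[1.0 + 5.0 * p for p in row] for row in P]
X, Y, losses = als(P, C)
```

Because each half-step solves its least squares problem exactly with the other side held fixed, the recorded loss should not increase from one iteration to the next.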

Vectors

•“Compact” representation for users and items (tracks) in the same space

Why Vectors?

•Vectors encode higher order dependencies

•Users and Items in the same vector space!

•Use vector similarity to compute:
  •Item-Item similarities
  •User-Item recommendations

•Linear complexity: order of number of latent factors

•Easy to scale up

Step 3: Compute Recs!

• Compute track similarities and track recommendations for users using a similarity measure:

• Euclidean Distance

• Cosine Similarity

• Pearson Correlation
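The three measures can be written directly from their definitions. The sketch below uses plain Python lists as vectors; the function names are illustrative, and inputs are assumed to be non-zero (and, for Pearson, non-constant) so that no division by zero occurs.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; assumes neither is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def pearson_correlation(a, b):
    """Cosine similarity of the mean-centred vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])
```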

Recommendations via Cosine Similarity
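A brute-force version of this step ranks every track vector by its cosine similarity to the user vector and keeps the top k. The sketch below assumes small in-memory dictionaries; the track names and vector values are made up, and `top_k_tracks` is a hypothetical helper.

```python
import heapq
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_tracks(user_vector, track_vectors, k):
    """Return the k track ids most cosine-similar to the user, best first."""
    scored = ((cosine(user_vector, vec), track) for track, vec in track_vectors.items())
    return [track for _, track in heapq.nlargest(k, scored)]


# Made-up 2-dimensional vectors for illustration.
track_vectors = {"burn": [0.9, 0.1], "lights": [0.8, 0.3], "ballad": [0.1, 0.9]}
user_vector = [1.0, 0.2]
recs = top_k_tracks(user_vector, track_vectors, k=2)
```

This scans the whole catalog for every user, which is what makes an approximate nearest neighbor index attractive at scale.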

Annoy

• 70 million users, at least 4 million tracks for candidates per user
• Brute Force Approach:
  • O(70M x 4M x 10) ~= O(3 peta-operations)!

• Approximate Nearest Neighbor Oh Yeah!

• Built on random projections, closely related to Locality Sensitive Hashing

• Clone: https://github.com/spotify/annoy
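To illustrate the general idea behind random-projection hashing, the sketch below assigns each vector a bit signature from a set of random hyperplanes and only compares a query against tracks that share its signature. This is a generic textbook technique shown for intuition; it is not Annoy's actual implementation or API, and all names here are hypothetical.

```python
import random


def make_hyperplanes(num_planes, dim, seed=0):
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
                 for plane in hyperplanes)


def build_buckets(track_vectors, hyperplanes):
    buckets = {}
    for track, vec in track_vectors.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(track)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Only tracks in the query's bucket are scored, not the full catalog."""
    return buckets.get(signature(query, hyperplanes), [])


# Made-up vectors: "a" and "b" point the same way, "c" points the opposite way.
planes = make_hyperplanes(num_planes=8, dim=3)
tracks = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0], "c": [-1.0, -2.0, -3.0]}
buckets = build_buckets(tracks, planes)
nearby = candidates([4.0, 8.0, 12.0], buckets, planes)

# Cost of the brute-force approach quoted above.
brute_force_ops = 70_000_000 * 4_000_000 * 10
```

Vectors pointing in the same direction always land in the same bucket, while nearby-but-not-identical directions usually do, which is why results are approximate.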

Step 4: Post Processing

• Apply Filters
  • Interacted music
  • Holiday music anyone?

• Factor for:
  • Diversity
  • Freshness
  • Popularity
  • Demographics
  • Seasonality
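Two of these steps can be sketched as a single pass over a ranked list: filtering out tracks the user has already interacted with, and capping the number of tracks per artist as a simple stand-in for a diversity factor. The cap, the helper name `post_process` and the sample data are assumptions; the actual post-processing rules are not described in the slides.

```python
def post_process(ranked_tracks, interacted, artist_of, max_per_artist=1):
    """Drop already-interacted tracks and cap how many tracks each artist gets.

    The per-artist cap is a hypothetical, simplified diversity rule.
    """
    kept, per_artist = [], {}
    for track in ranked_tracks:
        if track in interacted:
            continue
        artist = artist_of[track]
        if per_artist.get(artist, 0) >= max_per_artist:
            continue
        per_artist[artist] = per_artist.get(artist, 0) + 1
        kept.append(track)
    return kept


# Made-up ranked recommendations, best first.
ranked = ["t1", "t2", "t3", "t4"]
artist_of = {"t1": "a", "t2": "a", "t3": "a", "t4": "b"}
final_recs = post_process(ranked, interacted={"t1"}, artist_of=artist_of)
```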

70 Million users x 30 Million tracks. How to scale?

Matrix Factorization with MapReduce

[Figure: all log entries are split into a K x L grid of blocks, where block (x, y) holds the entries with u % K = x and i % L = y. In the map step, each block is processed together with the user vectors where u % K = x and the item vectors where i % L = y; the reduce step groups the results by u % K = 0, 1, …, K-1]

•Split the matrix up into K x L blocks.
•Each mapper gets a different block, sums up intermediate terms, then keys by user (or item) to reduce the final user (or item) vector.
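The block assignment itself is simple modular arithmetic on the user and item ids. A minimal sketch, assuming integer ids and an in-memory list of entries (the helper name and sample data are hypothetical):

```python
from collections import defaultdict


def partition_into_blocks(entries, K, L):
    """Group (u, i, count) tuples by block key (u % K, i % L)."""
    blocks = defaultdict(list)
    for u, i, count in entries:
        blocks[(u % K, i % L)].append((u, i, count))
    return dict(blocks)


# Made-up (user, item, count) entries.
entries = [(0, 0, 3), (1, 5, 1), (4, 2, 2), (5, 3, 7)]
blocks = partition_into_blocks(entries, K=4, L=2)
```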

Matrix Factorization with MapReduce

[Figure: one map task. Map input: tuples (u, i, count) where u % K = x and i % L = y. Distributed cache: all user vectors where u % K = x and all item vectors where i % L = y. The mapper emits contributions; the reducer produces the new vector]

• Input to Mapper is a list of (user, item, count) tuples
  – user modulo K is the same for all users in the block
  – item modulo L is the same for all items in the block
  – Mapper aggregates intermediate contributions for each user (or item)
  – E.g.: K=4, Mapper #1 gets users 1, 5, 9, 13, etc.
  – Reducer keys by user (or item), aggregates intermediate mapper sums and solves closed form for final user (or item) vector
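The flow for the "solve for users" half-step can be simulated in a single process as below. This is a sketch under several assumptions: f = 2 so the closed form is a 2x2 solve, bias terms are omitted, the weight is taken as `1 + alpha * count`, and the functions `mapper`, `reducer` and `update_users` are hypothetical stand-ins for real MapReduce tasks.

```python
from collections import defaultdict


def mapper(block, item_vectors, alpha=1.0):
    """Per user in the block, sum contributions to a 2x2 matrix and a 2-vector."""
    partial = defaultdict(lambda: [0.0] * 6)
    for u, i, count in block:
        y = item_vectors[i]
        c = 1.0 + alpha * count  # assumed weight function
        acc = partial[u]
        acc[0] += (c - 1.0) * y[0] * y[0]
        acc[1] += (c - 1.0) * y[0] * y[1]
        acc[2] += (c - 1.0) * y[1] * y[0]
        acc[3] += (c - 1.0) * y[1] * y[1]
        acc[4] += c * y[0]
        acc[5] += c * y[1]
    return list(partial.items())


def reducer(user, partials, yty, lam=0.1):
    """Add up the mapper sums for one user and solve the 2x2 system."""
    total = [sum(vals) for vals in zip(*partials)]
    a, b = yty[0][0] + total[0] + lam, yty[0][1] + total[1]
    c, d = yty[1][0] + total[2], yty[1][1] + total[3] + lam
    det = a * d - b * c
    return user, [(d * total[4] - b * total[5]) / det,
                  (a * total[5] - c * total[4]) / det]


def update_users(entries, item_vectors, K, L):
    # Y^T Y over all items is shared by every user.
    yty = [[sum(y[r] * y[s] for y in item_vectors.values()) for s in range(2)]
           for r in range(2)]
    blocks = defaultdict(list)
    for u, i, count in entries:
        blocks[(u % K, i % L)].append((u, i, count))
    shuffled = defaultdict(list)  # stands in for the shuffle between map and reduce
    for block in blocks.values():
        for user, partial in mapper(block, item_vectors):
            shuffled[user].append(partial)
    return dict(reducer(user, parts, yty) for user, parts in shuffled.items())


# Made-up item vectors and log entries.
item_vectors = {0: [0.1, 0.2], 1: [0.3, 0.1], 2: [0.2, 0.4], 3: [0.5, 0.3]}
entries = [(0, 0, 3), (0, 1, 1), (1, 2, 2), (1, 3, 4), (2, 1, 5)]
blocked = update_users(entries, item_vectors, K=2, L=2)
single = update_users(entries, item_vectors, K=1, L=1)
```

Because the mapper outputs are plain sums, splitting the entries into more blocks should change the result only by floating-point rounding, which is what makes the computation easy to distribute.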

Music Recommendations Data Flow

Source:

Revisiting YOLO!

“You Only Listen Once to judge recommendations” problem

Optimizing for the Yolo Problem

•OFFLINE TESTING:
  •Experts’ Inputs
  •Measure accuracy

•A/B TESTS: control group vs. test group. Some useful metrics we consider:
  •DAU / WAU / MAU
  •Retention
  •Session Length
  •Skip Rate
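As an example of one such metric, skip rate can be computed per group and compared. The sketch below is illustrative only: the event records are made up, and the definition of a skip (here simply a boolean `skipped` flag) is an assumption.

```python
def skip_rate(events):
    """Fraction of recommended plays that were skipped."""
    if not events:
        return 0.0
    return sum(1 for e in events if e["skipped"]) / len(events)


# Hypothetical play events for a control group and a test group.
control = [{"skipped": True}, {"skipped": True}, {"skipped": False}, {"skipped": False}]
test_group = [{"skipped": True}, {"skipped": False}, {"skipped": False}, {"skipped": False}]

control_rate = skip_rate(control)
test_rate = skip_rate(test_group)
```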

Challenge Accepted!

•Cold start problem for both users and new music/upcoming artists:
  •Content based signals, real time recommendation

•Measuring recommendation quality:
  •A/B test metrics
  •Active forums for getting user feedback

•Scam Attacks:
  •Rule based model to detect scammers

•Human choices are not always predictable:
  •Faith in humanity

What Next?

•Personalize user experience on Spotify for every moment:
  •Right Now

•Recommend other media formats:
  •Podcasts
  •Video

•Power music recommendations on other platforms:
  •Google Now

Join the Band!

We are hiring!

Thank You!

You can reach me @
Email: [email protected]
Twitter: @vid052
