From Idea to Execution: Spotify’s Discover Weekly
Chris Johnson :: @MrChrisJohnson
Edward Newett :: @scaladaze
DataEngConf • NYC • Nov 2015
Or: 5 lessons in building recommendation products at scale
Spotify in Numbers
• Started in 2006, now available in 58 markets
• 75+ Million active users, 20 Million paying subscribers
• 30+ Million songs, 20,000 new songs added per day
• 1.5 Billion user generated playlists
• 1 TB user data logged per day
• 1,700 node Hadoop cluster
• 10,000+ Hadoop jobs run daily
Discover Weekly
2013 :: Discover Page v1.0
• Personalized News Feed of recommendations
• Artists, Album Reviews, News Articles, New Releases, Upcoming Concerts, Social Recommendations, Playlists…
• Required a lot of attention and digging to engage with recommendations
• No organization of content
2014 :: Discover Page v2.0
• Recommendations grouped into strips (a la Netflix)
• Limited to Albums and New Releases
• More organized than the News Feed, but still requires active interaction
Define success metrics BEFORE you release your test
• Reach: How many users are you reaching?
• Depth: For the users you reach, how deeply do they engage?
• Retention: For the users you reach, how many do you retain?
Discover Weekly Key Success Metrics
• Reach: DW WAU / Spotify WAU
• Depth: DW Time Spent / Spotify WAU
• Retention: DW week-over-week retention
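A minimal sketch of how the three Discover Weekly (DW) ratios could be computed for one week, reading WAU as weekly active users. The function name `weekly_metrics`, the input layout, and the definition of retention as the share of last week's DW users who come back this week are assumptions made for illustration, not Spotify's pipeline.

```python
def weekly_metrics(spotify_wau, dw_users_this_week, dw_users_last_week, dw_seconds_by_user):
    """Compute reach, depth and retention for one week.

    spotify_wau: set of user ids active anywhere on Spotify this week
    dw_users_this_week / dw_users_last_week: sets of user ids who streamed
        from Discover Weekly (DW) in each week
    dw_seconds_by_user: dict of user id -> seconds streamed from DW this week
    """
    if not spotify_wau:
        raise ValueError("spotify_wau must not be empty")
    # Reach: DW WAU / Spotify WAU.
    reach = len(dw_users_this_week) / len(spotify_wau)
    # Depth as written on the slide: DW time spent / Spotify WAU.
    depth = sum(dw_seconds_by_user.values()) / len(spotify_wau)
    # Assumed retention definition: returning share of last week's DW users.
    if dw_users_last_week:
        retention = len(dw_users_this_week & dw_users_last_week) / len(dw_users_last_week)
    else:
        retention = 0.0
    return {"reach": reach, "depth": depth, "retention": retention}


# Toy data, invented for the example.
metrics = weekly_metrics(
    spotify_wau={"a", "b", "c", "d"},
    dw_users_this_week={"a", "b"},
    dw_users_last_week={"b", "c"},
    dw_seconds_by_user={"a": 600, "b": 1800},
)
print(metrics)
```

Fixing these definitions in code before the test starts makes it harder to redefine success after the results come in.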
Personalized image resulted in 10% lift in WAU
• Initial 0.5% user test
• 1% Spaceman image
• 1% Personalized image
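For readers unfamiliar with the term, a relative lift compares the active-user rate of a treatment group against a control group. The sketch below assumes the Spaceman image group is the control and the personalized image group is the treatment; the function name and the counts are made up so that the result matches the 10% figure, they are not Spotify's data.

```python
def relative_lift(control_active, control_size, treatment_active, treatment_size):
    """Relative lift of the treatment group's active rate over the control's."""
    control_rate = control_active / control_size
    treatment_rate = treatment_active / treatment_size
    return treatment_rate / control_rate - 1.0


# Hypothetical counts: 20% of the control group and 22% of the treatment
# group are weekly active users of the playlist.
lift = relative_lift(
    control_active=20_000, control_size=100_000,
    treatment_active=22_000, treatment_size=100_000,
)
print(f"relative lift: {lift:.1%}")
```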
Implicit Matrix Factorization
• Aggregate all (user, track) streams into a large matrix
• Goal: Approximate binary preference matrix by inner product of 2 smaller matrices by minimizing the weighted RMSE (root mean squared error) using a function of plays, context, and recency as weight

             Songs
        1 0 0 0 1 0 0 1
        0 0 1 0 0 1 0 0
Users   1 0 1 0 0 0 1 1    ≈   X · Y
        0 1 0 0 0 1 0 0
        0 0 1 0 0 1 0 0
        1 0 0 0 1 0 0 1

• $p_{ui}$ = 1 if user $u$ streamed track $i$, else 0
• $c_{ui}$ = weight for the (user, track) pair, a function of plays, context, and recency
• $x_u$ = user latent factor vector
• $y_i$ = item latent factor vector
• $\beta_u$ = bias for user
• $\beta_i$ = bias for item
• $\lambda$ = regularization parameter

[1] Hu Y., Koren Y. & Volinsky C. (2008). Collaborative Filtering for Implicit Feedback Datasets. 8th IEEE International Conference on Data Mining.
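The objective itself did not survive on this slide. A form consistent with the slide's legend and with [1], writing $p_{ui}$ for the binary preference, $c_{ui}$ for the weight, $x_u$ and $y_i$ for the latent vectors, $\beta_u$ and $\beta_i$ for the biases, and $\lambda$ for the regularization parameter, is:

$$\min_{x, y, \beta} \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^\top y_i - \beta_u - \beta_i\bigr)^2 + \lambda\Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr)$$

The sketch below evaluates and minimizes that objective on the slide's small 0/1 matrix, read here as 6 users by 8 tracks. Several choices are assumptions for illustration: the weight $c_{ui} = 1 + \alpha\,p_{ui}$ stands in for the real function of plays, context, and recency; `ALPHA`, `K`, `lam`, and `lr` are arbitrary; and plain full-batch gradient descent is used for brevity in place of the alternating least squares of [1].

```python
import random

# Binary preference matrix P (rows = users, columns = tracks).
P = [
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1],
]
ALPHA = 4.0  # hypothetical: extra weight on observed streams
C = [[1.0 + ALPHA * p for p in row] for row in P]


def predict(X, Y, bu, bi, u, i):
    """Inner product of the latent vectors plus both bias terms."""
    return sum(a * b for a, b in zip(X[u], Y[i])) + bu[u] + bi[i]


def weighted_loss(P, C, X, Y, bu, bi, lam):
    """Weighted squared error plus an L2 penalty on the latent factors."""
    err = sum(
        C[u][i] * (P[u][i] - predict(X, Y, bu, bi, u, i)) ** 2
        for u in range(len(P))
        for i in range(len(P[0]))
    )
    penalty = lam * sum(v * v for vec in X + Y for v in vec)
    return err + penalty


def gradient_step(P, C, X, Y, bu, bi, lam, lr):
    """One full-batch gradient descent step on weighted_loss (in place)."""
    k = len(X[0])
    gX = [[2 * lam * v for v in x] for x in X]
    gY = [[2 * lam * v for v in y] for y in Y]
    gbu, gbi = [0.0] * len(X), [0.0] * len(Y)
    for u in range(len(X)):
        for i in range(len(Y)):
            g = -2 * C[u][i] * (P[u][i] - predict(X, Y, bu, bi, u, i))
            for f in range(k):
                gX[u][f] += g * Y[i][f]
                gY[i][f] += g * X[u][f]
            gbu[u] += g
            gbi[i] += g
    for u in range(len(X)):
        X[u] = [x - lr * g for x, g in zip(X[u], gX[u])]
        bu[u] -= lr * gbu[u]
    for i in range(len(Y)):
        Y[i] = [y - lr * g for y, g in zip(Y[i], gY[i])]
        bi[i] -= lr * gbi[i]


rng = random.Random(0)
K = 2  # number of latent factors (hypothetical)
X = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in P]
Y = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in P[0]]
bu, bi = [0.0] * len(P), [0.0] * len(P[0])

initial_loss = weighted_loss(P, C, X, Y, bu, bi, lam=0.1)
for _ in range(300):
    gradient_step(P, C, X, Y, bu, bi, lam=0.1, lr=0.002)
final_loss = weighted_loss(P, C, X, Y, bu, bi, lam=0.1)
print(f"loss before: {initial_loss:.2f}, after: {final_loss:.2f}")
```

After training, `predict(X, Y, bu, bi, u, i)` for a track the user has not streamed can serve as a ranking score for recommendations.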
Can also use Logistic Loss!
• Aggregate all (user, track) streams into a large matrix
• Goal: Model probability of user playing a song as logistic, then maximize log likelihood of binary preference matrix, weighting positive observations by a function of plays, context, and recency

[Figure: the same binary Users × Songs matrix as on the Implicit Matrix Factorization slide, approximated by the product X · Y]

• $x_u$ = user latent factor vector
• $y_i$ = item latent factor vector
• $\beta_u$ = bias for user
• $\beta_i$ = bias for item
• $\lambda$ = regularization parameter

[2] Johnson C. (2014). Logistic Matrix Factorization for Implicit Feedback Data. NIPS Workshop on Distributed Matrix Computations.
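The formulas on this slide were lost as well. Following [2], the probability that user $u$ plays track $i$ can be modeled as $\sigma(x_u^\top y_i + \beta_u + \beta_i)$ with $\sigma(z) = 1 / (1 + e^{-z})$, and the quantity to maximize, with $w_{ui}$ the weight on a positive observation, is:

$$\sum_{u,i} \Bigl[ w_{ui}\, z_{ui} - (1 + w_{ui}) \log\bigl(1 + e^{z_{ui}}\bigr) \Bigr] - \frac{\lambda}{2}\Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr), \qquad z_{ui} = x_u^\top y_i + \beta_u + \beta_i$$

The sketch below is a small illustration of that objective, not the implementation from [2]. It assumes $w_{ui} = \alpha\,p_{ui}$ on a 0/1 stream matrix as a stand-in for the real function of plays, context, and recency, uses arbitrary values for `ALPHA`, `K`, `lam`, and `lr`, and optimizes with plain full-batch gradient ascent as a simplification.

```python
import math
import random

# Example 0/1 stream matrix (rows = users, columns = tracks).
P = [
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1],
]
ALPHA = 4.0  # hypothetical weight on each positive observation
W = [[ALPHA * p for p in row] for row in P]


def score(X, Y, bu, bi, u, i):
    """Latent inner product plus user and item biases."""
    return sum(a * b for a, b in zip(X[u], Y[i])) + bu[u] + bi[i]


def play_probability(X, Y, bu, bi, u, i):
    """Logistic model of user u playing track i."""
    return 1.0 / (1.0 + math.exp(-score(X, Y, bu, bi, u, i)))


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_posterior(W, X, Y, bu, bi, lam):
    """Weighted log likelihood minus an L2 penalty on the latent factors."""
    total = 0.0
    for u in range(len(X)):
        for i in range(len(Y)):
            z = score(X, Y, bu, bi, u, i)
            total += W[u][i] * z - (1.0 + W[u][i]) * softplus(z)
    return total - 0.5 * lam * sum(v * v for vec in X + Y for v in vec)


def ascent_step(W, X, Y, bu, bi, lam, lr):
    """One full-batch gradient ascent step on log_posterior (in place)."""
    k = len(X[0])
    gX = [[-lam * v for v in x] for x in X]
    gY = [[-lam * v for v in y] for y in Y]
    gbu, gbi = [0.0] * len(X), [0.0] * len(Y)
    for u in range(len(X)):
        for i in range(len(Y)):
            # d/dz of [w*z - (1 + w)*log(1 + e^z)] is w - (1 + w)*sigmoid(z)
            g = W[u][i] - (1.0 + W[u][i]) * play_probability(X, Y, bu, bi, u, i)
            for f in range(k):
                gX[u][f] += g * Y[i][f]
                gY[i][f] += g * X[u][f]
            gbu[u] += g
            gbi[i] += g
    for u in range(len(X)):
        X[u] = [x + lr * g for x, g in zip(X[u], gX[u])]
        bu[u] += lr * gbu[u]
    for i in range(len(Y)):
        Y[i] = [y + lr * g for y, g in zip(Y[i], gY[i])]
        bi[i] += lr * gbi[i]


rng = random.Random(1)
K = 2  # number of latent factors (hypothetical)
X = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in P]
Y = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in P[0]]
bu, bi = [0.0] * len(P), [0.0] * len(P[0])

initial_objective = log_posterior(W, X, Y, bu, bi, lam=0.1)
for _ in range(200):
    ascent_step(W, X, Y, bu, bi, lam=0.1, lr=0.01)
final_objective = log_posterior(W, X, Y, bu, bi, lam=0.1)
print(f"objective before: {initial_objective:.2f}, after: {final_objective:.2f}")
```

Unlike the squared-error model, the output of `play_probability` can be read directly as a probability between 0 and 1.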
Deep Learning on Audio
[3] http://benanne.github.io/2014/08/05/spotify-cnns.html
Scaling to 100%: Rollout Challenges
‣Create and publish 75M playlists every week
‣Downloading and processing Facebook images
‣Language translations
Scaling to 100%: Weekly refresh
‣Time sensitive updates
‣Refresh 75M playlists every Sunday night
‣Take timezones into account
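One way to picture the timezone constraint: every user's playlist has to be ready by the end of their own Sunday night, so the deadline in UTC differs per timezone. The sketch below is an illustration under stated assumptions, not Spotify's scheduler. It treats "Sunday night" as a deadline of Monday 00:00 local time, models timezones as fixed UTC offsets (ignoring daylight saving changes), and the function name `next_refresh_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_refresh_utc(now_utc, utc_offset_hours):
    """UTC instant of the next Monday 00:00 in the user's local time."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_now = now_utc.astimezone(local_tz)
    # Monday is weekday 0; count the days from today to the coming Monday.
    days_ahead = (7 - local_now.weekday()) % 7
    local_midnight = local_now.replace(hour=0, minute=0, second=0, microsecond=0)
    target = local_midnight + timedelta(days=days_ahead)
    if target <= local_now:
        # Already Monday locally: aim for the following week.
        target += timedelta(days=7)
    return target.astimezone(timezone.utc)


# Sunday 15 Nov 2015, 12:00 UTC, with three assumed fixed offsets.
now = datetime(2015, 11, 15, 12, 0, tzinfo=timezone.utc)
for label, offset in [("UTC-5", -5), ("UTC+1", 1), ("UTC+11", 11)]:
    print(label, next_refresh_utc(now, offset).isoformat())
```

Grouping users by offset and processing the groups in order of their UTC deadline would let the easternmost users be refreshed first.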