Playlist Recommendations @ Spotify

Nikhil Tibrewal

@nikhil_tibrewal

Who am I?

Nikhil Tibrewal (Nick-hill)

● Data Engineer on the Lambda squad (Spotify’s primary ML team)
● Graduated from Carnegie Mellon University in Dec 2013
● B.Sc. in Computer Science + additional major in Economics
● Part of the Spotify band for ~1.5 years
● Worked on a range of projects, primarily Playlist Recommendations

Spotify in numbers

● Started in 2006, 58 markets
● 75M+ active users, 20M+ paying
● 30M+ songs, 20K new per day
● 1.5+ billion playlists
● 1 TB logs per day

Recommendations so far on Spotify

● Discover tab
● Radio
● Related Artists
● Discover Weekly
● Playlist recs on “Now” Strip

(Screenshot: recommendations for Ellie Goulding)

“Now” Strip

(Screenshot callouts: human curated playlist; recommended playlist)

But… how are playlist recs generated?

Quick Overview!

● Recommend only human curated playlists (1000+)
  ○ Well-designed cover images
  ○ Thorough descriptions
  ○ Title reflects content

(Slide images: “Good” and “Bad” playlist examples)

Quick Overview!

● Recommendations pipeline: Candidate Generation
  ○ Generate N-dimensional track vectors from collaborative filtering
  ○ Vectorize playlists:
    ■ Playlist vector derived from track vectors in playlist
  ○ Use Annoy to store playlist vectors in N-dimensional space
  ○ Vectorize user taste as well:
    ■ User vector derived from user listening history
  ○ User and playlist vectors in the same space!
  ○ Query for nearest playlists to user from Annoy tree

annoyTree.getNearest(seedVector, K)

ANNOY (Approximate Nearest Neighbors Oh Yeah), created at Spotify
https://github.com/spotify/annoy
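The candidate-generation steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Spotify's implementation: the slides only say playlist and user vectors are *derived* from track vectors, so here both are assumed to be the plain mean of their tracks' vectors, and an exact brute-force cosine search stands in for Annoy's approximate nearest-neighbor index. The toy track vectors and the helper names `mean_vector`, `cosine`, and `nearest_playlists` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors (assumed aggregation)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_playlists(user_vec: Vector, playlist_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Exact top-k search over all playlists; Annoy answers this query approximately."""
    ranked = sorted(playlist_vecs, key=lambda pid: cosine(user_vec, playlist_vecs[pid]), reverse=True)
    return ranked[:k]


# Hypothetical 2-dimensional track vectors (real ones come from collaborative filtering).
track_vecs = {"t1": [1.0, 0.0], "t2": [0.9, 0.1], "t3": [0.0, 1.0], "t4": [0.1, 0.9]}
playlists = {"rock": ["t1", "t2"], "jazz": ["t3", "t4"]}

# Playlist vectors and the user vector end up in the same space.
playlist_vecs = {pid: mean_vector([track_vecs[t] for t in tracks]) for pid, tracks in playlists.items()}
listening_history = ["t1", "t1", "t2"]
user_vec = mean_vector([track_vecs[t] for t in listening_history])

recs = nearest_playlists(user_vec, playlist_vecs, k=1)
print(recs)  # ['rock']
```

Exact search compares the user vector against every playlist, which is fine for a toy example; an index like Annoy trades a little accuracy for much faster queries when there are many vectors.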

Quick Overview!

● Recommendations pipeline: Ranking Model
  ○ Use genre information, demographic data, and playlist popularity data to further rank recommendations
    ■ John: 21, USA, likes rock
    ■ Should get rock playlist recs that are popular in the USA and amongst 21-year-olds
  ○ Apply post-processing steps that shuffle and add variety to avoid repetitions

90% of DAUs (daily active users) have recs!
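A minimal sketch of the ranking and post-processing stages, using the John example from the slides. The slides do not describe the model itself, so everything here is an assumption for illustration: a hand-weighted linear score over candidate similarity, a genre match, and popularity in the user's country and age bracket, followed by post-processing that drops recently shown playlists, caps playlists per genre for variety, and optionally shuffles. The weights, age brackets, and playlist data are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class User:
    age: int
    country: str
    favorite_genre: str


@dataclass
class Candidate:
    name: str
    genre: str
    similarity: float  # from candidate generation (e.g. cosine to the user vector)
    popularity: Dict[Tuple[str, str], float] = field(default_factory=dict)  # (country, bracket) -> [0, 1]


def age_bracket(age: int) -> str:
    """Hypothetical demographic buckets."""
    if age < 18:
        return "<18"
    if age < 25:
        return "18-24"
    return "25+"


def score(user: User, c: Candidate, w_sim: float = 1.0, w_genre: float = 0.5, w_pop: float = 0.5) -> float:
    """Hypothetical linear ranking score."""
    genre_match = 1.0 if c.genre == user.favorite_genre else 0.0
    pop = c.popularity.get((user.country, age_bracket(user.age)), 0.0)
    return w_sim * c.similarity + w_genre * genre_match + w_pop * pop


def rank(user: User, candidates: List[Candidate], recently_shown: Set[str],
         max_per_genre: int = 2, shuffle_seed: Optional[int] = None) -> List[str]:
    ordered = sorted(candidates, key=lambda c: score(user, c), reverse=True)
    per_genre: Dict[str, int] = {}
    result = []
    for c in ordered:
        if c.name in recently_shown:  # avoid repeating what the user just saw
            continue
        if per_genre.get(c.genre, 0) >= max_per_genre:  # add variety across genres
            continue
        per_genre[c.genre] = per_genre.get(c.genre, 0) + 1
        result.append(c.name)
    if shuffle_seed is not None:
        random.Random(shuffle_seed).shuffle(result)
    return result


john = User(age=21, country="USA", favorite_genre="rock")
candidates = [
    Candidate("Rock Classics", "rock", 0.8, {("USA", "18-24"): 0.9}),
    Candidate("Indie Rock", "rock", 0.7, {("USA", "18-24"): 0.6}),
    Candidate("Rock Ballads", "rock", 0.6, {("USA", "18-24"): 0.5}),
    Candidate("Punk Rock", "rock", 0.55, {("USA", "18-24"): 0.4}),
    Candidate("Jazz Vibes", "jazz", 0.5, {("USA", "18-24"): 0.3}),
    Candidate("Hip Hop Hits", "hiphop", 0.4, {("USA", "18-24"): 0.8}),
]
recs = rank(john, candidates, recently_shown={"Indie Rock"})
print(recs)  # ['Rock Classics', 'Rock Ballads', 'Hip Hop Hits', 'Jazz Vibes']
```

Rock playlists popular with 18-24-year-olds in the USA rise to the top, while the per-genre cap (here 2) leaves room for other genres and the recently shown "Indie Rock" is skipped.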

Quick Overview!

● Infrastructure
  ○ Luigi to manage workflow (also built at Spotify)
  ○ Entire pipeline written in Scalding
  ○ 1200+ node Hadoop cluster to run jobs
  ○ Cassandra (~a dozen nodes for playlist recs)
  ○ Java backend microservices serving recs

Quick Overview!

"Scalding is comprised of a DSL (domain-specific language) that makes MapReduce computations look like Scala’s collection API and is a wrapper for Cascading to make it easy to define jobs, test and data sources on an HDFS" (http://cascading.io/customer/twitter/)

Scalding w.r.t. Playlist Recs

● Used Python back in the day
  ○ Inputs and outputs were tab separated
  ○ Complexity UP => Difficulty to maintain UP
  ○ Hard to write tests

● Scalding provided compile-time error checks
  ○ Catch errors early
  ○ Define schemas (e.g. Avro)

● Can use Parquet + Avro for input/output
  ○ Easy to write and read data
  ○ Records with a lot of fields!
  ○ Lesson: Parquet hurts performance w/ fat columns (nested data structs)
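Python cannot reproduce Scalding's compile-time checks, but the benefit of declaring a schema can be sketched at runtime: with bare tab-separated lines, a record missing a column goes unnoticed until something downstream breaks, whereas parsing into a declared record type fails at the boundary with a clear message. The `PlaylistRec` fields and sample lines are hypothetical; in the Scalding pipeline, schemas such as Avro fill this role.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlaylistRec:
    """Hypothetical schema for one recommendation record."""
    user_id: str
    playlist_uri: str
    score: float


def parse_untyped(line: str) -> List[str]:
    """Old style: just split on tabs; nothing checks the shape."""
    return line.rstrip("\n").split("\t")


def parse_typed(line: str) -> PlaylistRec:
    """Validate field count and types as soon as the record is read."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    user_id, playlist_uri, score = fields
    return PlaylistRec(user_id, playlist_uri, float(score))


good = "u1\tspotify:playlist:abc\t0.87\n"
bad = "u1\t0.87\n"  # playlist column missing

print(parse_untyped(bad))  # ['u1', '0.87'] (silently the wrong shape)
print(parse_typed(good))

try:
    parse_typed(bad)
    bad_error = None
except ValueError as err:
    bad_error = str(err)
print(bad_error)
```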

Scalding w.r.t. Playlist Recs

● Data quality
  ○ Wrappers around Hadoop counters in our extended Scalding library code
  ○ Verify counters within reasonable ranges
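The counter-based data-quality check can be sketched as a local analogue, not the extended Scalding library from the slides: a `collections.Counter` plays the role of Hadoop counters, the job step increments named counters while processing records, and a check afterwards compares each counter against a configured range. The counter names, records, and bounds are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def generate_recs(records: List[dict], counters: Counter) -> List[Tuple[str, List[str]]]:
    """Toy job step that bumps counters the way a MapReduce task would."""
    out = []
    for rec in records:
        counters["records_in"] += 1
        user = rec.get("user_id")
        if not user:
            counters["missing_user_id"] += 1
            continue
        recs = rec.get("recs", [])
        if not recs:
            counters["users_without_recs"] += 1
        counters["recs_out"] += len(recs)
        out.append((user, recs))
    return out


def check_counters(counters: Counter, bounds: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return a message for every counter outside its [low, high] range."""
    violations = []
    for name, (low, high) in bounds.items():
        value = counters[name]  # a Counter returns 0 for names never incremented
        if not low <= value <= high:
            violations.append(f"{name}={value} outside [{low}, {high}]")
    return violations


records = [
    {"user_id": "u1", "recs": ["p1", "p2"]},
    {"user_id": "u2", "recs": []},
    {"user_id": None, "recs": ["p3"]},
]
counters: Counter = Counter()
output = generate_recs(records, counters)
problems = check_counters(
    counters,
    {"records_in": (1, 10**9), "missing_user_id": (0, 0), "recs_out": (1, 10**9)},
)
print(problems)  # ['missing_user_id=1 outside [0, 0]']
```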

Scalding w.r.t. Playlist Recs

● Pipeline tolerance
  ○ Job failures are normal, and annoying with big jobs
  ○ Scalding checkpoints
  ○ Lesson: a checkpoint is itself a MapReduce job and has the same caveats
  ○ Still very helpful!
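The idea behind checkpointing can be sketched locally: persist the output of an expensive step, and on a rerun after a downstream failure, reuse it instead of recomputing. `checkpointed` is a hypothetical helper, not Scalding's checkpoint API, and JSON files stand in for HDFS outputs. It also reflects the lesson above: writing the checkpoint is itself work that can fail, so the sketch writes to a temporary file and renames it into place, so a half-written file is never mistaken for a finished checkpoint.

```python
import json
import os
import tempfile
from typing import Any, Callable


def checkpointed(path: str, compute: Callable[[], Any]) -> Any:
    """Return the stored result at `path` if it exists; otherwise compute and persist it."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute()
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(result, f)
    # Rename only after the write has finished, so `path` never holds a truncated
    # checkpoint (os.replace is atomic on POSIX when both paths share a filesystem).
    os.replace(tmp_path, path)
    return result


calls = []


def expensive_step() -> dict:
    """Stand-in for a long-running job step."""
    calls.append("run")
    return {"u1": ["p1", "p2"]}


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "candidates.json")
    first = checkpointed(ckpt, expensive_step)   # computes and writes the checkpoint
    second = checkpointed(ckpt, expensive_step)  # e.g. a rerun after a later step failed

print(len(calls))  # 1
```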

Scalding w.r.t. Playlist Recs

● Job runtimes
  ○ Common solutions: more reducers and code optimizations
  ○ Speculative execution for larger jobs
  ○ Caveat: can take up unnecessary resources
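Speculative execution can be illustrated with threads: if a task attempt has not finished within some threshold, launch a duplicate attempt on the same input and take whichever finishes first. This toy version is only an analogue of what Hadoop does per task (in Hadoop 2 it is switched on or off with the `mapreduce.map.speculative` and `mapreduce.reduce.speculative` job properties); the delays and threshold are made up. It also shows the caveat: the duplicate attempt occupies an extra worker, and because Python threads cannot be killed, the losing attempt here runs to completion.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import List, Tuple


def run_attempt(name: str, delay_s: float, partition: List[int]) -> Tuple[str, int]:
    """One task attempt: sleep to simulate a slow or fast node, then do the work."""
    time.sleep(delay_s)
    return name, sum(partition)


def run_with_speculation(partition: List[int], primary_delay: float,
                         backup_delay: float, slow_after: float) -> Tuple[str, int]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(run_attempt, "primary", primary_delay, partition)
        done, _ = wait([primary], timeout=slow_after)
        if done:
            return primary.result()
        # The primary looks like a straggler: launch a duplicate attempt on the same input.
        backup = pool.submit(run_attempt, "backup", backup_delay, partition)
        done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
        # Leaving the `with` block also waits for the losing attempt, mirroring
        # the "unnecessary resources" caveat.
        return next(iter(done)).result()


print(run_with_speculation([1, 2, 3], primary_delay=0.5, backup_delay=0.05, slow_after=0.1))
print(run_with_speculation([1, 2, 3], primary_delay=0.01, backup_delay=0.05, slow_after=0.1))
```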

Scalding w.r.t. Playlist Recs

● Memory issues
  ○ Used Sparkey indices in Python (developed at Spotify, now open source)
    ■ “Simple constant key/value storage lib for read-heavy systems with infrequent large bulk inserts”
    ■ Replicated to all mappers
  ○ Complex jobs in Scalding => higher memory config for jobs with Sparkey
  ○ Lesson: trade memory resources for MAYBE a little more time with joins

bigPipe.join(exSparkeyPipe)

https://github.com/spotify/sparkey
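The join trade-off in the lesson above can be sketched with two in-memory joins. A replicated (map-side) join copies the whole small table to every mapper, the way the Sparkey indices were replicated, so it needs memory on every worker but no shuffle; a reduce-side join routes both inputs to reducers by key and joins there, using less memory per worker at the cost of a shuffle (the "little more time"). This is a plain-Python illustration with hypothetical data, not Scalding's join API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
Joined = Tuple[str, str, str]


def map_side_join(big: List[Pair], small: List[Pair]) -> List[Joined]:
    """Replicated join: every mapper holds the entire small side in memory."""
    lookup: Dict[str, str] = dict(small)  # assumes keys in `small` are unique
    return [(key, left, lookup[key]) for key, left in big if key in lookup]


def reduce_side_join(big: List[Pair], small: List[Pair], num_reducers: int = 4) -> List[Joined]:
    """Shuffle join: route both sides to reducers by key, then join within each reducer."""
    reducers = [defaultdict(lambda: ([], [])) for _ in range(num_reducers)]
    for key, left in big:  # "shuffle" the big side
        reducers[hash(key) % num_reducers][key][0].append(left)
    for key, right in small:  # "shuffle" the small side
        reducers[hash(key) % num_reducers][key][1].append(right)
    joined = []
    for groups in reducers:
        for key, (lefts, rights) in groups.items():
            joined.extend((key, left, right) for left in lefts for right in rights)
    return joined


plays = [("t1", "u1"), ("t2", "u1"), ("t1", "u2"), ("t9", "u3")]  # (track, user): big side
genres = [("t1", "rock"), ("t2", "jazz")]  # (track, genre): small side

print(map_side_join(plays, genres))
# [('t1', 'u1', 'rock'), ('t2', 'u1', 'jazz'), ('t1', 'u2', 'rock')]
```

Both functions produce the same rows (up to order); the difference is where the cost lands: memory on every mapper for the replicated join, versus shuffle time for the reduce-side join.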

Scalding w.r.t. Playlist Recs

● Driven
  ○ “A sophisticated tool that collects telemetry data from running Scalding / Cascading jobs on a cluster and presenting them in an intriguing User Interface.”
  ○ http://cascading.io/

Scalding w.r.t. Playlist Recs

● Other awesome benefits
  ○ Active community + big players
  ○ Data pipeline flows naturally follow the functional paradigm: essentially writing Scala code
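To illustrate how a pipeline reads as a chain of collection operations, here is a plain-Python analogue of a small flow: filter, map, group by key and sum, sort, take. Scalding offers similar operations on its pipes, but this is not Scalding code; the `Play` record and the 30-second threshold for counting a play are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, NamedTuple, Tuple


class Play(NamedTuple):
    user: str
    track: str
    ms_played: int


def top_tracks(plays: Iterable[Play], min_ms: int, k: int) -> List[Tuple[str, int]]:
    long_plays = (p for p in plays if p.ms_played >= min_ms)   # filter
    keyed = sorted((p.track, 1) for p in long_plays)             # map, then sort by key (the "shuffle")
    totals = [(track, sum(n for _, n in group))                  # group by key and sum
              for track, group in groupby(keyed, key=itemgetter(0))]
    return sorted(totals, key=lambda kv: (-kv[1], kv[0]))[:k]    # sort by count, take top k


plays = [
    Play("u1", "t1", 60_000),
    Play("u2", "t1", 45_000),
    Play("u1", "t2", 5_000),   # too short, filtered out
    Play("u3", "t2", 40_000),
    Play("u3", "t3", 31_000),
]
print(top_tracks(plays, min_ms=30_000, k=2))  # [('t1', 2), ('t2', 1)]
```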

Scalding w.r.t. Playlist Recs

Productivity without sacrificing performance!

Spotify is hiring!

Nikhil Tibrewal

@nikhil_tibrewal