Quest for $1,000,000: The Netflix Prize
Bob Bell, AT&T Labs-Research
July 15, 2009
Joint work with Chris Volinsky, AT&T Labs-Research, and Yehuda Koren, Yahoo! Research
Recommender Systems
• Personalized recommendations of items (e.g., movies) to users
• Increasingly common
  – To deal with the explosive number of choices on the internet
  – Netflix
  – Amazon
  – Many others
Content Based Systems
• A pre-specified list of attributes
• Score each item on all attributes
• User interest obtained for the same attributes
  – Direct solicitation, or
  – Estimated based on user purchases or ratings
Pandora
• Music recommendation system
• Songs rated on 400+ attributes
  – Music genome project
  – Roots, instrumentation, lyrics, vocals
• Two types of user feedback
  – Seed songs
  – Thumbs up/down for recommended songs
Drawbacks of Content Based Systems
• Effort to score all items on many attributes
  – Best attributes may be unknown
  – Some attributes may be unscorable
• Need for direct solicitation of data from users in some systems
Collaborative Filtering (CF)
• Does not require content information about items or solicitation of users
• Infers user-item relationships from purchases or ratings
• Used by Amazon and Netflix
“We’re quite curious, really. To the tune of one million dollars.” – Netflix Prize rules
• Goal: improve on Netflix's existing movie recommendation technology
• Prize
  – Based on reduction in root mean squared error (RMSE) on test data
  – $1,000,000 grand prize for 10% drop
  – Or, $50,000 Progress Prize for best result each year
• Contest began October 2, 2006
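As a rough illustration of the prize metric, the sketch below computes RMSE and the percentage reduction relative to a baseline. The ratings are made up, and `rmse` and `percent_improvement` are hypothetical helper names chosen for this example; nothing here reflects Netflix's actual scoring code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def percent_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline RMSE."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Hypothetical ratings, for illustration only.
actual = [4, 3, 5, 1]
baseline = [3, 3, 3, 3]   # e.g. always predict 3 stars
improved = [4, 3, 4, 2]   # a model that tracks the true ratings more closely

baseline_score = rmse(baseline, actual)
improved_score = rmse(improved, actual)
print(baseline_score, improved_score,
      percent_improvement(baseline_score, improved_score))
```

Under this definition, the grand prize threshold corresponds to `percent_improvement` reaching 10 against the baseline system's RMSE on the test data.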
Data Details
• Training data
  – 100 million ratings (from 1 to 5 stars)
  – 6 years (2000-2005)
  – 480,000 users
  – 17,770 “movies”
• Test data
  – Last few ratings of each user
  – User, movie, date given
  – Ratings withheld (for most of test data)
  – Teams are allowed daily feedback on their RMSE
Higher Mean Rating in Test Data
[Figure: percentage of ratings (y-axis, 0 to 40) by rating (x-axis, 1 to 5) for Training (m = 3.60) and Probe (m = 3.67)]
Something Happened in Early 2004
Movies Rated Most Often
Title                     # Ratings   Mean Rating
Miss Congeniality           227,715          3.36
Independence Day            216,233          3.72
The Patriot                 200,490          3.78
The Day After Tomorrow      194,695          3.44
Pretty Woman                190,320          3.90
Pirates of the Caribbean    188,849          4.15
The Green Mile              180,883          4.31
Forrest Gump                180,736          4.30
Most Active Users
User ID   # Ratings   Mean Rating
305344       17,651          1.90
387418       17,432          1.81
2439493      16,560          1.22
1664010      15,811          4.26
2118461      14,829          4.08
1461435       9,820          1.37
1639792       9,764          1.33
1314869       9,739          2.95
Ratings per Movie in Training Data
Avg #ratings/movie: 5627
Ratings per User in Training Data
Avg #ratings/user: 208
Progress after 2 Months
Top contenders for Progress Prize 2007
[Figure: % improvement (y-axis, 0 to 10) by date (x-axis, 10/2/06 to 10/2/07). Series: ML@Toronto, How low can he go?, wxyzConsulting. Also labeled: Grand prize]
Progress after 8 Months
Top contenders for Progress Prize 2007
[Figure: % improvement (y-axis, 0 to 10) by date (x-axis, 10/2/2006 to 10/2/2007). Series: ML@Toronto, How low can he go?, wxyzConsulting, Gravity, BellKor. Also labeled: Grand prize]
Nearest Neighbor (NN) Methods
• Most common CF tool
• Predict rating for a specific user-item pair based on ratings of
  – Similar items
  – By the same user
  – Or vice versa
• Requires no “content” about items or users
• Easy to apply
• Easy to explain to users
• But not as powerful as other methods
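A minimal sketch of the item-based variant described above, assuming cosine similarity over co-rating users and a similarity-weighted average of the target user's own ratings. The similarity measure, the neighborhood size `k`, the function names, and the toy ratings are all assumptions made for illustration; the slides do not say which choices the contest entries used.

```python
import math


def item_similarity(ratings, item_a, item_b):
    """Cosine similarity between two items over users who rated both."""
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    norm_a = math.sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def predict_nn(ratings, user, item, k=2):
    """Weighted average of the user's ratings on the k items most similar to `item`."""
    neighbors = [
        (item_similarity(ratings, item, other), rating)
        for other, rating in ratings[user].items()
        if other != item
    ]
    neighbors = sorted(neighbors, key=lambda pair: pair[0], reverse=True)[:k]
    total_similarity = sum(sim for sim, _ in neighbors)
    if total_similarity == 0:
        return None  # no usable neighbors for this user-item pair
    return sum(sim * rating for sim, rating in neighbors) / total_similarity


# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
toy_ratings = {
    "ann": {"A": 5, "B": 5, "C": 1},
    "bob": {"A": 4, "B": 4, "C": 2},
    "cat": {"B": 5, "C": 1},
}
print(predict_nn(toy_ratings, "cat", "A", k=2))
```

Because the prediction is a weighted average of ratings the user has already given, it is easy to explain ("you liked B, and people rate A and B alike"), which matches the slide's point about explainability.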
Latent Factor Models
• Explain ratings by a set of latent factors (attributes)
  – Factors are learned from the data
  – No need for pre-specification
• Neural networks
• SVD (Singular Value Decomposition)
  – AKA matrix factorization
  – Dominant method used by leaders of competition
Item Factors
• Each item summarized by a d-dimensional vector q_i
• Potential factors
  – Comedy vs. drama
  – Amount of action
  – Depth of character development
  – Totally uninterpretable
• Choose d much smaller than number of items or users
  – e.g., d = 50 << 18,000 or 480,000
User Factors
• Similarly, each user summarized by p_u
• Same number of factors
• User factors measure interest in corresponding item factors
• Predicted rating for item i by user u
  – Inner product of q_i and p_u
  – $\hat{r}_{ui} = q_i' p_u$ or $\hat{r}_{ui} = b_u + b_i + q_i' p_u$
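The prediction rule above can be sketched in a few lines. The factor values and the offsets below are invented for illustration; reading b_u and b_i as per-user and per-item offsets is the usual interpretation, but the slide itself does not define them.

```python
def predict_rating(q_i, p_u, b_u=0.0, b_i=0.0):
    """Predicted rating: optional user/item offsets plus the inner product of q_i and p_u."""
    if len(q_i) != len(p_u):
        raise ValueError("item and user vectors must have the same number of factors")
    inner_product = sum(q * p for q, p in zip(q_i, p_u))
    return b_u + b_i + inner_product


# Hypothetical d = 2 example; all numbers are made up.
q_item = [1.0, -0.5]   # item's position on two latent factors
p_user = [0.8, 0.2]    # user's interest in those same factors

print(predict_rating(q_item, p_user))                    # inner product only
print(predict_rating(q_item, p_user, b_u=3.5, b_i=0.3))  # with assumed offsets
```

With the defaults `b_u = b_i = 0`, the function reduces to the plain inner product form.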
[Figure: movies placed in a two-factor space. Axes: geared towards females vs. geared towards males; serious vs. escapist. Movies: The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean’s 11, Sense and Sensibility]
[Figure: the same two-factor space of movies, with users Gus and Dave added]
Challenges in Using SVD
• Need lots of factors (large d)
• Easy to overfit
The Fundamental Challenge
• How can we estimate as much signal as possible where there are sufficient data, without overfitting where data are scarce?
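One common response to this question is shrinkage (regularization): pull each estimate toward a safe default, with a pull that fades as data accumulate. The sketch below applies the idea to an item's mean rating. It is an illustration of the general principle only, not a description of the speaker's models; `shrunk_mean`, the `strength` value of 25, and the global mean of 3.6 are all assumptions.

```python
def shrunk_mean(ratings, global_mean, strength=25.0):
    """Item mean pulled toward the global mean; fewer ratings means a stronger pull.

    Equivalent to averaging the observed ratings together with `strength`
    pseudo-ratings that all equal `global_mean`.
    """
    n = len(ratings)
    return (sum(ratings) + strength * global_mean) / (n + strength)


GLOBAL_MEAN = 3.6  # assumed overall average rating, for illustration

# Two 5-star ratings barely move the estimate away from the global mean ...
print(shrunk_mean([5, 5], GLOBAL_MEAN))
# ... while a thousand 5-star ratings move it almost all the way to 5.
print(shrunk_mean([5] * 1000, GLOBAL_MEAN))
```

The same trade-off appears in factor models as a penalty on the size of q_i and p_u, so that users or items with few ratings stay close to the origin of the factor space.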
[Figure sequence (4 slides): the same two-factor space of movies, with user Gus]
Challenges in Using SVD
• Need lots of factors (large d)
• Easy to overfit
• User behavior may change over time
  – Ratings go up or down
  – Interests may change
  – Composition of account may change, for example, with addition of a new rater
[Figure sequence (3 slides): the same two-factor space of movies, with user Gus; the final slide is labeled “Gus +”]
Challenges in Using SVD
• Need lots of factors (large d)
• Easy to overfit
• User behavior may change over time
• Misses some types of patterns
Neither SVD nor NN is Perfect
• SVD is poorly situated to fully capture strong “local” relationships
  – e.g., among sequels
• NN ignores cumulative effect of many small signals
  – May be ineffective for items with no close neighbors
• Each method complements the other
The Wisdom of Crowds (of Models)
• All models are wrong; some are useful – G. Box
• Our best entry during Year 1 was a linear combination of 107 sets of predictions
  – Nearest neighbors, SVD, neural nets, and others
  – Many variations of model structure and parameter settings
• Years 2 and 3
  – Individual models are more comprehensive and much more accurate
  – Combining many models still helps
  – Five models suffice to beat Year 1 score
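A linear combination of prediction sets can be sketched as a least-squares fit of one weight per model. The slides do not say how the weights were chosen, so fitting them by ordinary least squares against known ratings is an assumption here, as are the function names and the two toy models.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: model predictions are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_blend_weights(model_predictions, actual):
    """Weights w minimizing the squared error of sum_k w_k * predictions_k vs. actual."""
    k = len(model_predictions)
    # Normal equations: (P P^T) w = P y, where row k of P holds model k's predictions.
    gram = [[sum(x * y for x, y in zip(model_predictions[i], model_predictions[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(x * y for x, y in zip(model_predictions[i], actual)) for i in range(k)]
    return solve_linear_system(gram, rhs)


def blend(model_predictions, weights):
    """Apply the fitted weights to produce one combined prediction per rating."""
    count = len(model_predictions[0])
    return [sum(w * preds[n] for w, preds in zip(weights, model_predictions))
            for n in range(count)]


# Hypothetical held-out ratings and two toy models' predictions for them.
actual = [4, 3, 5, 1]
model_a = [3.5, 3.0, 4.0, 2.0]
model_b = [4.5, 2.5, 5.0, 1.5]

weights = fit_blend_weights([model_a, model_b], actual)
print(weights, blend([model_a, model_b], weights))
```

On the data used for fitting, the blend can never have a larger squared error than any single model, since putting all the weight on one model is one of the candidate solutions. In practice the weights would be fit on ratings held out from model training to avoid overfitting the blend itself.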
Progress after 1 Year
Top contenders for Progress Prize 2007
[Figure: % improvement (y-axis, 0 to 10) by date (x-axis, 10/2/2006 to 10/2/2007). Series: ML@Toronto, How low can he go?, wxyzConsulting, Gravity, BellKor. Also labeled: Grand prize]
Is this Any Way to do Science?
• Wide participation
  – Submissions from 5,000 teams
  – 8,300 posts on the Netflix Prize forum
• Generation and dissemination of new methods
  – Presentations/workshops in academic conferences
  – Journal publications
• Reasons for success
  – Well designed by Netflix
  – Industrial strength data set
  – Opportunity to build on work of others
  – Collegial spirit of competitors
The Race is On
Thank You!
• www.netflixprize.com
  – …/leaderboard
  – …/community
• Click BellKor on Leaderboard for details