46
Social Computing and Collective Intelligence Lecture 11 21 February 2012

Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Social Computing and

Collective Intelligence

Lecture 11 21 February 2012

Page 2: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

More on Link Analysis and Web Search

• Sections 14.1-14.5 (pages 397-417) of Networks, Crowds, and Markets: Reasoning about a Highly Connected World by David Easley and Jon Kleinberg. Cambridge University Press, 2010. http://www.cs.cornell.edu/home/kleinber/networks-book/networks-book-ch14.pdf

• "How Google’s Algorithm Rules the Web", Stephen Levy, Wired, March 2010. http://www.wired.com/magazine/2010/02/ff_google_algorithm/all/1

• "Adversarial Information Retrieval: The Manipulation of Web Content", Dennis Fetterly, ACM Computing Reviews, July 2007. http://www.reviews.com/hottopic/hottopic_essay_06.cfm

Page 3: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

More on Condorcet Jury Theorem Mathematics

• One voter:

Voter 1 Majority Vote

A A

B B

Page 4: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

• Three voters:

Voter 1 Voter 2 Voter 3 Majority Vote

A A A A

A A B A

A B A A

A B B B

B A A A

B A B B

B B A B

B B B B

More on Condorcet Jury Theorem Mathematics

Page 5: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

• Five voters:

Vote 1 Vote 2 Vote 3 Vote 4 Vote 5 Majority Vote

A A A A A A

A A A A B A

A A A B A A

A A A B B A

A A B A A A

A A B A B A

A A B B A A

A B A A A A

A B A A B A

A B A B A A

A B B A A A

B A A A A A

B A A A B A

B A A B A A

B A B A A A

B B A A A A

More on Condorcet Jury Theorem Mathematics

Page 6: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

• Five voters:

Vote 1 Vote 2 Vote 3 Vote 4 Vote 5 Majority Vote

A A B B B B

A B A B B B

A B B A B B

A B B B A B

A B B B B B

B A A B B B

B A B A B B

B A B B A B

B A B B B B

B B A A B B

B B A B A B

B B A B B B

B B B A A B

B B B A B B

B B B B A B

B B B B B B

More on Condorcet Jury Theorem Mathematics

Page 7: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

More on Condorcet Jury Theorem Mathematics

• One voter, probability A = 0.6, B = 0.4:

Voter 1 Majority Vote Probability

A A .6

B B .4

Page 8: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

• Three voters, probability A = 0.6, B = 0.4:

Voter 1 Voter 2 Voter 3 Majority Vote Probability

A A A A 0.6 x 0.6 x 0.6 = 0.216

A A B A 0.6 x 0.6 x 0.6 = 0.144

A B A A 0.6 x 0.6 x 0.6 = 0.144

A B B B 0.6 x 0.6 x 0.6 = 0.096

B A A A 0.6 x 0.6 x 0.6 = 0.144

B A B B 0.6 x 0.6 x 0.6 = 0.096

B B A B 0.6 x 0.6 x 0.6 = 0.096

B B B B 0.6 x 0.6 x 0.6 = 0.064

More on Condorcet Jury Theorem Mathematics

Page 9: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

• Three voters, probability A = 0.6, B = 0.4:

Probability A = 0.216 + 0.144 + 0.144 + 0.144 = 0.648

Probability B = 0.096 + 0.096 + 0.096 + 0.064 = 0.352

Voter 1 Voter 2 Voter 3 Majority Vote Probability

A A A A 0.6 x 0.6 x 0.6 = 0.216

A A B A 0.6 x 0.6 x 0.6 = 0.144

A B A A 0.6 x 0.6 x 0.6 = 0.144

A B B B 0.6 x 0.6 x 0.6 = 0.096

B A A A 0.6 x 0.6 x 0.6 = 0.144

B A B B 0.6 x 0.6 x 0.6 = 0.096

B B A B 0.6 x 0.6 x 0.6 = 0.096

B B B B 0.6 x 0.6 x 0.6 = 0.064

More on Condorcet Jury Theorem Mathematics

Page 10: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

More on Condorcet Jury Theorem Mathematics

• One voter, probability A = 0.6, B = 0.4:

Voter 1 Majority Vote Probability

A A .6

B B .4

Page 11: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

• Three voters, probability A = 0.6, B = 0.4:

Probability A = 0.216 + 0.144 + 0.144 + 0.144 = 0.648

Probability B = 0.096 + 0.096 + 0.096 + 0.064 = 0.352

Voter 1 Voter 2 Voter 3 Majority Vote Probability

A A A A 0.6 x 0.6 x 0.6 = 0.216

A A B A 0.6 x 0.6 x 0.6 = 0.144

A B A A 0.6 x 0.6 x 0.6 = 0.144

A B B B 0.6 x 0.6 x 0.6 = 0.096

B A A A 0.6 x 0.6 x 0.6 = 0.144

B A B B 0.6 x 0.6 x 0.6 = 0.096

B B A B 0.6 x 0.6 x 0.6 = 0.096

B B B B 0.6 x 0.6 x 0.6 = 0.064

More on Condorcet Jury Theorem Mathematics

Page 12: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

• Five voters:

Vote 1 Vote 2 Vote 3 Vote 4 Vote 5 Majority Probability

A A A A A A 0.07776

A A A A B A 0.05184 A A A B A A 0.05184

A A A B B A 0.03456 A A B A A A 0.05184

A A B A B A 0.03456 A A B B A A 0.03456 A B A A A A 0.05184

A B A A B A 0.03456 A B A B A A 0.03456

A B B A A A 0.03456

B A A A A A 0.05184 B A A A B A 0.03456

B A A B A A 0.03456 B A B A A A 0.03456

B B A A A A 0.03456

More on Condorcet Jury Theorem Mathematics

Page 13: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

• Five voters:

Vote 1 Vote 2 Vote 3 Vote 4 Vote 5 Majority Probability

A A B B B B 0.02304

A B A B B B 0.02304 A B B A B B 0.02304

A B B B A B 0.02304 A B B B B B 0.01536

B A A B B B 0.02304 B A B A B B 0.02304 B A B B A B 0.02304

B A B B B B 0.01536 B B A A B B 0.02304

B B A B A B 0.02304

B B A B B B 0.01536 B B B A A B 0.02304

B B B A B B 0.01536 B B B B A B 0.01536

B B B B B B 0.01024

More on Condorcet Jury Theorem Mathematics

Page 14: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

• Five voters:

– Probability majority vote is A = 0.683

– Probability majority vote is B = 0.317

More on Condorcet Jury Theorem Mathematics

Page 15: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

• Five voters: – Probability majority vote is A = 0.683 – Probability majority vote is B = 0.317

• Three voters: – Probability majority vote is A = 0.648 – Probability majority vote is B = 0.352

• One voter: – Probability majority vote is A = 0.6 – Probability majority vote is B = 0.4

More on Condorcet Jury Theorem Mathematics

Page 16: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

More on AMT and SSNs

Page 17: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

More on AMT and SSNs

Page 18: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

More on Collaborative Filtering

• Netflix Prize – 2006:

Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean rating for a movie: 1.054)

– $1M if you could improve this by 10%, to 0.8572

– $50,000 per year to best attempt if 10% not reached, as long as it’s 1% better than the previous year

– Ineligible countries: Cuba, Iran, Syria, North Korea, Myanmar, Sudan, and

Page 19: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

More on Collaborative Filtering

• Netflix Prize – 2006:

Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean rating for a movie: 1.054)

– $1M if you could improve this by 10%, to 0.8572

– $50,000 per year to best attempt if 10% not reached, as long as it’s 1% better than the previous year

– Ineligible countries: Cuba, Iran, Syria, North Korea, Myanmar, Sudan, and Quebec

Page 20: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Risks

• Privacy violation

• Prize won immediately

• No winners

• Labor overhead

• Lowering barrier to entry from competitors

• $1M

Page 21: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize

Page 22: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Data

• 100,480,507 ratings for training (2000-2005) – 480,189 users – 17,770 movies – Each item: <user, movie, date, {1:5}> – 1,408,395 “probe” items in training set with distribution similar

to test data – Average user rated > 200 movies

One user rated over 17,000 – Average movie rated by > 5000 users

Some movies had only 3 ratings

• Hidden data: – 1,408,342 items you could get error rate on – 1,408,789 items on which Netflix rated submissions

Page 23: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

23

Higher Mean Rating in Probe Data

0

5

10

15

20

25

30

35

40

1 2 3 4 5

Rating

Perc

en

tag

e

Training (m = 3.60)

Probe (m = 3.67)

Page 24: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Data about the Movies Avg rating Most Loved Movies

4.593 The Shawshank Redemption

4.545 Lord of the Rings :The Return of the King

4.306 The Green Mile

4.460 Lord of the Rings :The Two Towers

4.415 Finding Nemo

4.504 Raiders of the Lost Ark

Most Rated Movies

Miss Congeniality

Independence Day

The Patriot

The Day After Tomorrow

Pretty Woman

Pirates of the Caribbean

Highest Variance

The Royal Tenenbaums

Lost In Translation

Pearl Harbor

Miss Congeniality

Napolean Dynamite

Fahrenheit 9/11

Page 25: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

25

Most Active Users

User ID # Ratings Mean Rating

305344 17,651 1.90

387418 17,432 1.81

2439493 16,560 1.22

1664010 15,811 4.26

2118461 14,829 4.08

1461435 9,820 1.37

1639792 9,764 1.33

1314869 9,739 2.95

Page 26: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

26

Ratings per Movie in Training Data

Avg #ratings/movie: 5627

Page 27: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

27

Ratings per User in Training Data

Avg #ratings/user: 208

Page 28: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Timeline

• October 2, 2006:

– Contest opened

• October 8, 2006:

– WXYZConsulting beats Cinematch

• October 15, 2006:

– 3 teams had beaten Cinematch, one by >1%

• June 2007:

– over 20,000 teams (150 countries) registered

Page 29: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

29

Progress after 2 Months

Top contenders for Progress Prize 2007

0

1

2

3

4

5

6

7

8

9

10

10/2

/06

11/2

/06

12/2

/06

1/2

/07

2/2

/07

3/2

/07

4/2

/07

5/2

/07

6/2

/07

7/2

/07

8/2

/07

9/2

/07

10/2

/07

% i

mp

rovem

en

t

ML@Toronto

How low can he go?

wxyzConsulting

Grand prize

Page 30: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

30

Progress after 8 Months

Top contenders for Progress Prize 2007

0

1

2

3

4

5

6

7

8

9

101

0/2

/20

06

11

/2/2

00

6

12

/2/2

00

6

1/2

/20

07

2/2

/20

07

3/2

/20

07

4/2

/20

07

5/2

/20

07

6/2

/20

07

7/2

/20

07

8/2

/20

07

9/2

/20

07

10

/2/2

00

7

% i

mp

rovem

en

t

ML@Toronto

How low can he go?

wxyzConsulting

Gravity

BellKor

Grand prize

Page 31: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

31

Progress after 1 Year

Top contenders for Progress Prize 2007

0

1

2

3

4

5

6

7

8

9

101

0/2

/20

06

11

/2/2

00

6

12

/2/2

00

6

1/2

/20

07

2/2

/20

07

3/2

/20

07

4/2

/20

07

5/2

/20

07

6/2

/20

07

7/2

/20

07

8/2

/20

07

9/2

/20

07

10

/2/2

00

7

% i

mp

rovem

en

t

ML@Toronto

How low can he go?

wxyzConsulting

Gravity

BellKor

Grand prize

Page 32: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Timeline

• Year 1:

– Progress prize:

• KorBell (aka BellKor): 8.43% improvement

• Publish description of their algorithm

• Linear combination of 107 different factors

Page 33: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Timeline

• Year 1:

– Progress prize:

• KorBell (aka BellKor): 8.43% improvement

• Publish description of their algorithm

• Linear combination of 107 different factors

• Year 2: – Progress prize:

• Only 3 teams qualify (>1%)

• BellKor in BigChaos: 9.44%

• Publish description of their algorithm

Page 34: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Timeline

• Year 1:

– Progress prize:

• KorBell (aka BellKor): 8.43% improvement

• Publish description of their algorithm

• Linear combination of 107 different factors

• Year 2: – Progress prize:

• Only 3 teams qualify (>1%)

• BellKor in BigChaos: 9.44%

• Publish description of their algorithm

• Year 3: – Top 2 candidates:

• BellKor's Pragmatic Chaos: 10.05%

• The Ensemble: 10.09%

Page 35: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Timeline

Page 36: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Timeline

• Year 1:

– Progress prize:

• KorBell (aka BellKor): 8.43% improvement

• Publish description of their algorithm

• Linear combination of 107 different factors

• Year 2: – Progress prize:

• BellKor in BigChaos: 9.44%

• Publish description of their algorithm

• Year 3: – Top 2 candidates:

• BellKor's Pragmatic Chaos: 10.05%

• The Ensemble: 10.09%

Page 37: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Timeline

• Year 1:

– Progress prize:

• KorBell (aka BellKor): 8.43% improvement

• Publish description of their algorithm

• Linear combination of 107 different factors

• Year 2: – Progress prize:

• BellKor in BigChaos: 9.44%

• Publish description of their algorithm

• Year 3: – Top 2 candidates:

• BellKor's Pragmatic Chaos: 10.05%

• The Ensemble: 10.09%

• Over 800 factors

Page 38: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

References

• Y. Koren, Collaborative filtering with temporal dynamics, ACM SIGKDD Conference 2009

• Koren, Bell, Volinsky, Matrix factorization techniques for recommender systems, IEEE Computer, 2009

• Y. Koren, Factor in the neighbors: scalable and accurate collaborative filtering, ACM Transactions on Knowledge Discovery in Data, 2010

Page 39: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Postscript: 2007

Page 40: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Postscript: 2009

Page 41: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Postscript: 2009

The new data set, providing more than 100 million data points, will include, among other things, information about renters' ages, genders, ZIP codes, genre ratings and previously chosen movies. As with the first Netflix Prize, all data provided is anonymous and cannot be associated with a specific Netflix member.

Page 42: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Postscript: 2009

Page 43: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Postscript: 2009

Page 44: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Postscript: 2010

Page 45: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Netflix Prize Postscript: 2010

Page 46: Social Computing and Collective Intelligence Lecture 11 21 ...hirsh/classes/442/Lecture 11.pdf · • Netflix Prize –2006: Netflix Cinematch algorithm: 0.9525 RMSE (Just give mean

Assignments

• Infotopia, Chapter 2

– Due 23 February 2012

• News feed