MovieTweetings: a Movie Rating Dataset Collected From Twitter @sidooms Simon Dooms

Slides about the MovieTweetings dataset presented at the RecSys 2013 conference on October 12 in Hong Kong by Simon Dooms.

Page 1: MovieTweetings: a movie rating dataset collected from twitter

MovieTweetings: a Movie Rating Dataset Collected From Twitter

@sidooms
Simon Dooms

Page 2: MovieTweetings: a movie rating dataset collected from twitter

Research datasets

Recsys research needs datasets
• To evaluate, experiment and demonstrate

I need datasets

Available for download:
• MovieLens 100K
• MovieLens 1M
• MovieLens 10M

Intro | Twitter - IMDb | About Data | Results | Conclusion

Oct. 12, 2013 - Simon Dooms - Ghent University - CrowdRec 2013

Page 3: MovieTweetings: a movie rating dataset collected from twitter

Page 4: MovieTweetings: a movie rating dataset collected from twitter

Research datasets

Recsys research needs datasets
• To evaluate, experiment and demonstrate

I needed datasets

Available for download:
• MovieLens 100K (most recent movie: 1998)
• MovieLens 1M (most recent movie: 2000)
• MovieLens 10M (most recent movie: 2008)

I need up-to-date movie ratings

Page 5: MovieTweetings: a movie rating dataset collected from twitter

Finding data

Data is all around us

Page 6: MovieTweetings: a movie rating dataset collected from twitter

Page 7: MovieTweetings: a movie rating dataset collected from twitter

Page 8: MovieTweetings: a movie rating dataset collected from twitter

Page 9: MovieTweetings: a movie rating dataset collected from twitter

Finding data

Data is all around us
BUT extremely unstructured

What we want:
1::122::5::838985046
1::185::5::838983525
1::231::5::838983392
1::292::5::838983421
1::316::5::838983392

(user, item, rating, time)

Page 10: MovieTweetings: a movie rating dataset collected from twitter

Structured data

Page 11: MovieTweetings: a movie rating dataset collected from twitter

Structured data

Page 12: MovieTweetings: a movie rating dataset collected from twitter

Structured data

Page 13: MovieTweetings: a movie rating dataset collected from twitter

Structured data

Page 14: MovieTweetings: a movie rating dataset collected from twitter

Structured data

“I rated Death Proof 10/10 #IMDb”

• User
• Item (movie)
• Rating
• Hashtag

Page 15: MovieTweetings: a movie rating dataset collected from twitter

Structured data

Search Twitter for “I rated #IMDb”

Bingo!

Page 16: MovieTweetings: a movie rating dataset collected from twitter

Collecting data

• We query the Twitter API for “I rated #IMDb”
• Extract relevant information
• Cross-reference with IMDb for extra genre data
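The "extract relevant information" step could look roughly like the sketch below. The slides do not show the actual extraction code, so the regular expression, the `extract_rating` helper and the assumed tweet shape ("I rated <title> <n>/10 #IMDb") are illustrative assumptions only. Real tweets may carry extra text or links that this pattern does not handle.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for tweets shaped like: I rated <title> <n>/10 #IMDb
# (an assumption for illustration, not the pattern used to build the dataset).
TWEET_PATTERN = re.compile(
    r"I rated (?P<title>.+?) (?P<rating>\d{1,2})/10\b.*#IMDb",
    re.IGNORECASE,
)


def extract_rating(tweet_text: str) -> Optional[Tuple[str, int]]:
    """Return (title, rating) if the tweet matches the pattern, else None."""
    match = TWEET_PATTERN.search(tweet_text)
    if match is None:
        return None
    rating = int(match.group("rating"))
    # The slides state a rating scale from 1 to 10; reject anything else.
    if not 1 <= rating <= 10:
        return None
    return match.group("title").strip(), rating


if __name__ == "__main__":
    print(extract_rating("I rated Death Proof 10/10 #IMDb"))
```

Tweets that do not match, or whose rating falls outside the 1 to 10 range, return `None` and can simply be skipped.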

Page 17: MovieTweetings: a movie rating dataset collected from twitter

The data

Ratings.dat
1::1074638::7::1365029107
1::1853728::8::1366576639
2::0113277::10::1379466669

Movies.dat
1028528::Death Proof (2007)::Action|Thriller
0133093::The Matrix (1999)::Action|Adventure|Sci-Fi
1670345::Now You See Me (2013)::Thriller

Users.dat
1::18405182
2::995885060
3::31260677

• IMDb ID - http://www.imdb.com/title/tt0113277
• Twitter ID (NOT @handle)
• Rating scale from 1 to 10
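As a minimal sketch of how these `::`-separated lines could be read, the Python below parses the sample rows shown on this slide. The names `Rating`, `Movie`, `parse_rating`, `parse_movie` and `imdb_url` are hypothetical helpers, not part of the dataset. The 7-digit zero padding in `imdb_url` is an assumption based on the example ID above.

```python
from typing import List, NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: str  # kept as a string so leading zeros of the IMDb ID survive
    rating: int
    timestamp: int


class Movie(NamedTuple):
    movie_id: str
    title: str
    genres: List[str]


def parse_rating(line: str) -> Rating:
    """Parse one Ratings.dat line: user::movie::rating::timestamp."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return Rating(int(user_id), movie_id, int(rating), int(timestamp))


def parse_movie(line: str) -> Movie:
    """Parse one Movies.dat line: movie::title (year)::genre|genre|..."""
    movie_id, title, genres = line.strip().split("::")
    return Movie(movie_id, title, genres.split("|") if genres else [])


def imdb_url(movie_id: str) -> str:
    """Build the IMDb title URL (assumes IDs are zero-padded to 7 digits)."""
    return f"http://www.imdb.com/title/tt{movie_id.zfill(7)}"


# Sample rows copied from the slide.
ratings = [parse_rating(row) for row in (
    "1::1074638::7::1365029107",
    "1::1853728::8::1366576639",
    "2::0113277::10::1379466669",
)]
movie = parse_movie("0133093::The Matrix (1999)::Action|Adventure|Sci-Fi")

if __name__ == "__main__":
    print(ratings[2], imdb_url(ratings[2].movie_id))
    print(movie)
```

Keeping the movie ID as a string rather than an integer is a deliberate choice here: converting `0113277` to a number would drop the leading zero that the IMDb URL needs.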

Page 18: MovieTweetings: a movie rating dataset collected from twitter

Your data

MovieTweetings dataset available on GitHub (https://github.com/sidooms/MovieTweetings)
Find it on the RecSys Wiki (category datasets)

Latest
• All ratings
• Automagically updated daily

Snapshots
• Fixed portion of dataset
• Added manually when appropriate
• 10K, 20K, 30K, 40K, 50K, 100K

DISCLAIMER: Depending on Twitter API, IMDb apps and me!

Page 19: MovieTweetings: a movie rating dataset collected from twitter

Some numbers

567 / day

          MovieTweetings   MovieLens 100K   MovieLens 1M   MovieLens 10M
Ratings   121,404          100,000          1,000,209      10,000,054
Users     19,464           943              6,040          71,567
Items     11,655           1,682            3,900          10,681

(Results on September 30, 2013)
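One way to compare these figures is to derive ratings per user and the fraction of user-item pairs that carry a rating. The slides do not report these derived values; the sketch below only does arithmetic on the counts listed on this slide, and the `density` helper is a hypothetical name.

```python
# (ratings, users, items) copied from the figures on this slide.
STATS = {
    "MovieTweetings": (121_404, 19_464, 11_655),
    "MovieLens 100K": (100_000, 943, 1_682),
    "MovieLens 1M": (1_000_209, 6_040, 3_900),
    "MovieLens 10M": (10_000_054, 71_567, 10_681),
}


def density(ratings: int, users: int, items: int) -> float:
    """Fraction of all possible user-item pairs that have a rating."""
    return ratings / (users * items)


for name, (n_ratings, n_users, n_items) in STATS.items():
    print(
        f"{name}: {density(n_ratings, n_users, n_items):.4%} of pairs rated, "
        f"{n_ratings / n_users:.1f} ratings per user"
    )
```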

Page 20: MovieTweetings: a movie rating dataset collected from twitter

Some fun

Top 3 most rated movies
1. Iron Man 3 (2013)
2. Man of Steel (2013)
3. World War Z (2013)

Top 3 AVG rated movies (min 20 ratings)
1. The Shawshank Redemption (1994)
2. LOTR: The Return of the King (2003)
3. The Dark Knight (2008)

Bottom 3 worst AVG rated movies (min 20 ratings)
3. Scary MoVie (2013)
2. Piranha 3DD (2012)
1. Cosmopolis (2012)
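Rankings like these could be computed along the following lines. This is a sketch under assumptions: the slides only state the "min 20 ratings" threshold, so the `rank_movies` helper, its tie handling (input order) and the toy data are illustrative, not the method used for the slide.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def rank_movies(
    ratings: Iterable[Tuple[int, str, int]],
    min_ratings: int = 20,
    top_n: int = 3,
) -> Tuple[List[str], List[str], List[str]]:
    """Return (most rated, best average, worst average) movie IDs.

    `ratings` holds (user_id, movie_id, rating) tuples. Only movies with
    at least `min_ratings` ratings take part in the average rankings.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for _user, movie, rating in ratings:
        counts[movie] += 1
        totals[movie] += rating

    most_rated = sorted(counts, key=lambda m: counts[m], reverse=True)[:top_n]

    eligible = [m for m in counts if counts[m] >= min_ratings]
    by_avg = sorted(eligible, key=lambda m: totals[m] / counts[m], reverse=True)
    best = by_avg[:top_n]
    worst = by_avg[-top_n:][::-1]  # worst average first
    return most_rated, best, worst


# Toy data with a lowered threshold, purely for illustration.
toy = [(1, "a", 10), (2, "a", 8), (3, "a", 9), (1, "b", 2), (2, "b", 4), (1, "c", 10)]

if __name__ == "__main__":
    print(rank_movies(toy, min_ratings=2, top_n=1))
```

The threshold matters: movie "c" in the toy data has a perfect score from a single rating and is excluded from the average rankings, which is presumably why the slide applies a minimum of 20 ratings.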

Page 21: MovieTweetings: a movie rating dataset collected from twitter

Some conclusions
• Outdated public datasets
• Social media = Unstructured data available
• Structured rating data through Twitter - IMDb
• MovieTweetings: our Movie Rating Dataset
  • Always up-to-date
  • Includes most recent and most relevant movies
  • Unfiltered rating data
  • Publicly available

Death Proof (2007) really is an awesome movie

Page 22: MovieTweetings: a movie rating dataset collected from twitter

@sidooms
Simon Dooms

MovieTweetings: a Movie Rating Dataset Collected From Twitter