Transcript
Page 1: Movie topics- Efficient features for movie recommendation systems

Efficient Features for Movie Recommendation Systems

Project presentation

Suvir Bhargav

Page 2: Movie topics- Efficient features for movie recommendation systems

Outline

● Motivation and why movie reviews
● Problem statement
● How? The overall system
● Text preprocessing approaches
● Postprocessing: movie topics from a reviews corpus
● Similarity
● Experimental setup and results

Page 3: Movie topics- Efficient features for movie recommendation systems

Thanks to Sean Lind, source: http://www.silveroakcasino.com/blog/posts/netflix/what-to-watch-on-netflix.html

Motivation

Page 4: Movie topics- Efficient features for movie recommendation systems

Motivation

● movie genres are not enough.
● classify movies
  ○ keywords
  ○ moods
  ○ IMDb ratings
  ○ micro genres

Page 5: Movie topics- Efficient features for movie recommendation systems

micro genres

source: http://www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/

Page 6: Movie topics- Efficient features for movie recommendation systems

Why movie reviews?

Source: a sample user-written movie review from IMDb

Page 7: Movie topics- Efficient features for movie recommendation systems

Problem statement

● Feature extraction from user reviews of movies

● Use extracted features to find similar movies.

Page 8: Movie topics- Efficient features for movie recommendation systems

The overall system

Movie reviews corpus
● preprocessing
  ○ tokenization, stopword removal, lemmatization.
● post processing
  ○ topic modeling: movie topics from a reviews corpus
● similarity measure
  ○ return movies with similar topic distributions

Page 9: Movie topics- Efficient features for movie recommendation systems

Tokenization, stopword removal, lemmatization.

Simple information extraction

Text preprocessing

Figure credit to nltk book.
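
The three preprocessing steps named on this slide can be sketched in plain Python. This is only an illustration: the stopword list is a tiny made-up sample, and `naive_lemma` is a crude suffix stripper standing in for a real lemmatizer such as the ones nltk provides. The function names are hypothetical, not taken from the project.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "was", "and", "of", "in", "it", "this", "to"}

def naive_lemma(token):
    """Crude suffix stripping, a stand-in for a real lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(review):
    """Tokenize, drop stopwords, then reduce each token to a rough base form."""
    tokens = re.findall(r"[a-z]+", review.lower())  # tokenization
    kept = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [naive_lemma(t) for t in kept]  # (approximate) lemmatization

tokens = preprocess("The twists in this movie")
```

A real pipeline would replace both the stopword set and `naive_lemma` with library components; the order of the steps is the part this sketch is meant to show.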

Page 10: Movie topics- Efficient features for movie recommendation systems

Post processing

Document representation: Vector Space Model (VSM)

Picture credit: pyevolve
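
As a rough illustration of the vector space model, each preprocessed review can be turned into a vector of term counts over a shared vocabulary. The toy documents and function names below are assumptions made for the example.

```python
from collections import Counter

def build_vocabulary(docs):
    """Collect every distinct token across the corpus in a fixed (sorted) order."""
    return sorted({token for doc in docs for token in doc})

def to_vector(doc, vocab):
    """Represent one document as term counts aligned with the vocabulary."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]

# Two toy, already-tokenized "reviews" (made up for illustration).
docs = [["space", "alien", "space"], ["love", "alien"]]
vocab = build_vocabulary(docs)
vectors = [to_vector(doc, vocab) for doc in docs]
```

Every document ends up with the same dimensionality, which is what lets later steps compare documents numerically.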

Page 11: Movie topics- Efficient features for movie recommendation systems

Post processing: generative model

Source: David Blei’s slide

Page 12: Movie topics- Efficient features for movie recommendation systems

Post processing: LDA

For each document in the collection, the words are generated in a two-stage process:
1) Randomly choose a distribution over topics.
2) For each word in the document:
   a) Randomly choose a topic from the distribution over topics in step 1.
   b) Randomly choose a word from the corresponding distribution over the vocabulary.

Documents exhibit multiple topics
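
The two-stage generative process can be simulated with numpy. This is a minimal sketch: the Dirichlet prior `alpha` follows the standard LDA formulation, while the two toy topics and the four-word vocabulary in `topic_word` are made up for illustration.

```python
import numpy as np

def generate_document(n_words, alpha, topic_word, rng):
    """Sample one document following the two-stage process."""
    n_topics, vocab_size = topic_word.shape
    # Step 1: randomly choose this document's distribution over topics.
    theta = rng.dirichlet(alpha)
    topics, words = [], []
    for _ in range(n_words):
        # Step 2a: choose a topic from the document's topic distribution.
        z = rng.choice(n_topics, p=theta)
        # Step 2b: choose a word from that topic's distribution over the vocabulary.
        w = rng.choice(vocab_size, p=topic_word[z])
        topics.append(int(z))
        words.append(int(w))
    return theta, topics, words

rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5])  # assumed prior over two topics
topic_word = np.array([
    [0.5, 0.5, 0.0, 0.0],  # toy topic 0 only uses words 0 and 1
    [0.0, 0.0, 0.5, 0.5],  # toy topic 1 only uses words 2 and 3
])
theta, topics, words = generate_document(10, alpha, topic_word, rng)
```

Because `theta` is drawn per document, a single document can mix words from several topics, which is the point the slide makes.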

Page 13: Movie topics- Efficient features for movie recommendation systems

Movie topics from a reviews corpus

Page 14: Movie topics- Efficient features for movie recommendation systems

Similarity Measure

● Cosine Similarity
● KL divergence
● Hellinger distance
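
Of the three measures listed, KL divergence is the only one without its own slide, so here is a small numpy sketch of it for two topic distributions. The smoothing constant `eps` is an assumption added to avoid taking the logarithm of zero; it is not something the slides specify.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions, with light smoothing."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()  # renormalize after smoothing
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

p = [0.9, 0.1]
q = [0.5, 0.5]
forward = kl_divergence(p, q)
backward = kl_divergence(q, p)
```

KL divergence is not symmetric, so `forward` and `backward` generally differ. That asymmetry is one reason a symmetric measure such as Hellinger distance can be easier to use for ranking.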

Page 15: Movie topics- Efficient features for movie recommendation systems

Cosine Similarity

Similarity Measure
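
Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. A minimal numpy sketch, with made-up topic vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (both assumed non-zero)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction score 1 regardless of length, and orthogonal vectors score 0.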

Page 16: Movie topics- Efficient features for movie recommendation systems

Hellinger Distance

Similarity Measure
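
For two discrete distributions, the Hellinger distance is the Euclidean distance between their element-wise square roots, scaled by 1/√2 so that it lies between 0 and 1. A minimal numpy sketch, assuming both inputs are already normalized:

```python
import numpy as np

def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

identical = hellinger_distance([0.2, 0.8], [0.2, 0.8])
disjoint = hellinger_distance([1.0, 0.0], [0.0, 1.0])
```

Unlike KL divergence, this distance is symmetric and bounded, so it needs no smoothing for zero entries.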

Page 17: Movie topics- Efficient features for movie recommendation systems

The overall system: implementation

Movie reviews corpus
● preprocessing
  ○ nltk and gensim’s simple preprocessing.
● post processing
  ○ gensim python wrapper to MALLET
  ○ index topic distribution of query movies, q, and the 1k movies corpus, C.
● similarity measure
  ○ python numpy implementation
  ○ apply distance metric on indexed q and C.
  ○ sort and pick top 5 movies.
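
The similarity step above (apply a distance metric to q and C, sort, pick the top 5) might look roughly like this in numpy. This is a sketch, not the project's code: it assumes the topic distributions are already available as arrays, uses Hellinger distance as the metric, and the titles and numbers are invented.

```python
import numpy as np

def hellinger_to_corpus(query, corpus):
    """Hellinger distance from one topic distribution to every corpus row."""
    return np.sqrt(0.5 * np.sum((np.sqrt(corpus) - np.sqrt(query)) ** 2, axis=1))

def top_k_similar(query, corpus, titles, k=5):
    """Return the k corpus movies closest to the query, nearest first."""
    distances = hellinger_to_corpus(np.asarray(query, dtype=float),
                                    np.asarray(corpus, dtype=float))
    order = np.argsort(distances)[:k]  # smallest distance = most similar
    return [(titles[i], float(distances[i])) for i in order]

# Hypothetical topic distributions for a three-movie corpus C and one query q.
corpus = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
])
titles = ["movie_a", "movie_b", "movie_c"]
query = [0.75, 0.15, 0.10]
ranked = top_k_similar(query, corpus, titles, k=2)
```

Swapping in cosine similarity would only require changing the distance function and sorting in descending order instead.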

Page 18: Movie topics- Efficient features for movie recommendation systems

Experimental setup

Movie reviews corpus of 1k movies

Reviews data source: IMDb

Page 19: Movie topics- Efficient features for movie recommendation systems

Evaluation criteria

Experimental setup

Page 20: Movie topics- Efficient features for movie recommendation systems

Conclusion

● Movie topics as efficient features for RS
  ○ represent movies by underlying semantic patterns
  ○ useful for capturing movie genre and mood,
  ○ but less effective at capturing plot.
  ○ user-written movie reviews are useful movie metadata.
● The developed prototype
  ○ easy to add more movie metadata
  ○ python allows scalability.
  ○ Topics as an explanation need further tuning.

Page 21: Movie topics- Efficient features for movie recommendation systems

Future directions

● Movie review preprocessing
  ○ bigrams, trigrams.
  ○ create multi-word movie keywords or language construction
● Building complex topic models
  ○ Hierarchical LDA
  ○ author-topic model
    ■ include authorship information.
    ■ similarity between authors
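
As a pointer for the bigram and trigram direction listed under preprocessing, multi-word terms can be formed by sliding a window over a token list. The helper below is a hypothetical illustration, not part of the prototype.

```python
def ngrams(tokens, n):
    """Join every run of n consecutive tokens into one multi-word term."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["dark", "science", "fiction", "thriller"]
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

In practice only frequent or statistically significant n-grams would be kept as movie keywords; this sketch keeps all of them.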

Page 22: Movie topics- Efficient features for movie recommendation systems

Questions ?

Thank You

Image src: http://www.brinvy.biz/177215/batman-catching-a-ride-on-supermans-back-funny-hd-wallpaper-x.html

Page 23: Movie topics- Efficient features for movie recommendation systems

Extra slides

List of extra slides and notes
● Original LDA paper
● Introduction to probabilistic topic modeling
● A. Huang’s "Similarity measures for text document clustering"
● Another good LDA description
● Integrating out multinomial parameters in LDA
● Language construction in micro genres

Page 24: Movie topics- Efficient features for movie recommendation systems

LDA

