Mike davies sentiment_analysis_presentation_backup

DESCRIPTION

My Michaelmas fourth year presentation on a CUED fourth year project: Sentiment Analysis.

Page 1: Mike davies sentiment_analysis_presentation_backup

Sentiment Analysis

1. Discover a niche network of Twitter users

2. Model their emotions on topics

3. Use feelings to more accurately predict a time series, e.g. the stock market or box office success

4. Are some [users/networks] more influential than others?

Page 2: Mike davies sentiment_analysis_presentation_backup

This Talk

The Design Decision

The Core Goals

The 3 parts of the project:

1. Classifying the SENTIMENT of tweets

2. Building a NETWORK of twitter users

3. Finding a TIME SERIES of sentiment for each user

Page 3: Mike davies sentiment_analysis_presentation_backup

Sentiment Analysis Used Already

Derwent Capital Markets - “The Twitter hedge fund”

£25m fund

10% of tweets predicts Dow Jones movement direction with 87.6% accuracy

Returned 1.85% in its first month of trading

Johan Bollen, Indiana University, used bag-of-words approach

Page 4: Mike davies sentiment_analysis_presentation_backup

Sentiment Analysis Used Already

Product reviews / ratings

Page 5: Mike davies sentiment_analysis_presentation_backup

Sentiment Analysis Used Already

Social Media Analytics

Page 6: Mike davies sentiment_analysis_presentation_backup

Design Decision

Many paragraphs of text (Product Reviews)

+ : Better accuracy of prediction

- : Less data overall

Huge number of short pieces of text (Twitter)

+ : Opinions of a greater number of people, and at a high enough frequency to model as a signal

- : Classification of opinion is very poor

=> TWITTER

Page 7: Mike davies sentiment_analysis_presentation_backup

2 Current Aims (will change later)

1. Project aims to be context independent (i.e. Movies & products)

2. When context is given, use it to better classify tweets

Page 8: Mike davies sentiment_analysis_presentation_backup

1: Sentiment Analysis of Tweets

Three-tier classification process:

tweet => spam | not spam

not spam => objective | subjective

subjective => positive | negative
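
The three tiers can be read as a cascade in which each stage only sees what the previous one let through. Below is a minimal sketch of that control flow, not the project's actual classifiers: the word lists, the spam heuristic and the function names (`is_spam`, `is_subjective`, `polarity`, `classify`) are all invented placeholders standing in for trained models.

```python
import re

# Toy word lists, invented for this sketch; a real system would learn them.
POSITIVE_WORDS = {"fantastic", "good", "great", "love"}
NEGATIVE_WORDS = {"awful", "bad", "boring", "hate"}


def tokens(tweet):
    """Lower-case the tweet and split it into alphabetic words."""
    return re.findall(r"[a-z']+", tweet.lower())


def is_spam(tweet):
    """Tier 1 placeholder: flag tweets that pair a link with 'free'."""
    lowered = tweet.lower()
    return "http" in lowered and "free" in lowered


def is_subjective(tweet):
    """Tier 2 placeholder: subjective if any opinion word is present."""
    opinion_words = POSITIVE_WORDS | NEGATIVE_WORDS
    return any(word in opinion_words for word in tokens(tweet))


def polarity(tweet):
    """Tier 3 placeholder: compare counts of positive and negative words."""
    words = tokens(tweet)
    pos = sum(word in POSITIVE_WORDS for word in words)
    neg = sum(word in NEGATIVE_WORDS for word in words)
    return "positive" if pos >= neg else "negative"


def classify(tweet):
    """Run the three tiers in order, stopping at the first decisive one."""
    if is_spam(tweet):
        return "spam"
    if not is_subjective(tweet):
        return "objective"
    return polarity(tweet)
```

Swapping any placeholder for a trained model leaves `classify` unchanged, which is the main appeal of structuring the problem as tiers.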

Page 9: Mike davies sentiment_analysis_presentation_backup

1: Sentiment Analysis of Tweets

Double-Back Propagation Algorithm

ACL Journal, March 2011, MIT Press

Opinion Word Extraction & Target Extraction

4 rules, e.g. “The phone has a good screen”

=> add “good” to list of adjectives

=> add “screen” to list of nouns

Etc.

Great for rating features of a product

Not great for tweets
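
The "good screen" example suggests a bootstrapping loop: a known opinion word reveals a new target noun, and a known target noun reveals new opinion words. The sketch below illustrates that idea only. It is a deliberate simplification that uses adjacency of an adjective and a noun, not the rules from the paper, and it assumes sentences arrive already tagged with `A` (adjective) and `N` (noun). The function name `propagate` and the sample data are hypothetical.

```python
def propagate(tagged_sentences, seed_opinions):
    """Grow opinion-word and target sets from adjacent adjective-noun pairs.

    Simplified illustration: only looks at an adjective immediately
    followed by a noun, and repeats until no new word is learned.
    """
    opinions = set(seed_opinions)
    targets = set()
    changed = True
    while changed:
        changed = False
        for sentence in tagged_sentences:
            # Walk over each pair of neighbouring (word, tag) tokens.
            for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
                if t1 != "A" or t2 != "N":
                    continue
                if w1 in opinions and w2 not in targets:
                    targets.add(w2)   # known opinion word -> new target
                    changed = True
                if w2 in targets and w1 not in opinions:
                    opinions.add(w1)  # known target -> new opinion word
                    changed = True
    return opinions, targets


# Hypothetical pre-tagged input, seeded with a single opinion word.
sentences = [
    [("the", "D"), ("phone", "N"), ("has", "V"), ("a", "D"),
     ("good", "A"), ("screen", "N")],
    [("what", "D"), ("a", "D"), ("sharp", "A"), ("screen", "N")],
    [("sharp", "A"), ("camera", "N"), ("too", "R")],
]
opinions, targets = propagate(sentences, {"good"})
```

Starting from "good" alone, the loop picks up "screen", then "sharp", then "camera". Short, noisy tweets rarely contain such clean adjective-noun pairs, which fits the slide's remark that the approach is not great for tweets.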

Page 10: Mike davies sentiment_analysis_presentation_backup

1: Sentiment Analysis of Tweets

Twitter Part Of Speech (POS) tagger:

www.ark.cs.cmu.edu/TweetNLP/

Written in Java

Max Ent

Example output (token/tag):

"/^ Drive/^ "/^ ,/, go/V and/& watch/V it/O !/, Fantastic/A movie/N ./,

Page 11: Mike davies sentiment_analysis_presentation_backup

Bootstrapped Tweet SA improver

[Diagram: boxes labelled IMDB Movie Review Corpora, Double-BackProp. Algo (gives useful adjectives, nouns), Tweet (repeated) and SentimentAnalysis]

Page 12: Mike davies sentiment_analysis_presentation_backup

2: Building a Network

Collected my Twitter friends, friends of friends, friends of friends of friends.

=> 115,896 users
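
Collecting friends out to three hops is a breadth-first search with a depth limit. The sketch below shows one way that crawl could be organised. It is not the project's code: `crawl_friends` and `get_friends` are hypothetical names, and the toy dictionary stands in for whatever Twitter API calls return a user's friend list.

```python
from collections import deque


def crawl_friends(start_user, get_friends, max_depth=3):
    """Breadth-first crawl: return {user: depth} for users within max_depth."""
    depth_of = {start_user: 0}
    queue = deque([start_user])
    while queue:
        user = queue.popleft()
        if depth_of[user] == max_depth:
            continue  # do not expand beyond friends of friends of friends
        for friend in get_friends(user):
            if friend not in depth_of:  # skip users already collected
                depth_of[friend] = depth_of[user] + 1
                queue.append(friend)
    return depth_of


# Hypothetical in-memory friend lists standing in for Twitter API calls.
TOY_GRAPH = {
    "me": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
    "e": ["f"],
}
found = crawl_friends("me", lambda user: TOY_GRAPH.get(user, []))
```

In the toy graph, "f" is four hops away and is therefore left out. Recording the depth alongside each user also makes it easy to see how quickly the network grows per hop.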

Page 13: Mike davies sentiment_analysis_presentation_backup

2: Building a Network

Page 14: Mike davies sentiment_analysis_presentation_backup

2: Building a Network

Community detection:

Paper 1: Near linear time algorithm to detect community structures in large-scale networks

Paper 2: An LDA-based Community Structure Discovery Approach for Large-Scale Social Networks, Haizheng Zhang

Page 15: Mike davies sentiment_analysis_presentation_backup

2: Building a Network

GraphLab: like MapReduce, but instead of “map” and “reduce”:

Map = 'Update': modify overlapping sets of data

Reduce = 'Sync': perform reductions in the background while sync is running

Label Propagation & LDA

Page 16: Mike davies sentiment_analysis_presentation_backup

3: Time series prediction

Will get time series from Python to R using the rpy2 module

R has a great package, “quantmod”, for importing financial market data.

Can also import other time series very easily, and R has many great libraries.
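
Before anything is handed to R, the per-tweet classifications have to become a regularly spaced series. The sketch below shows one possible aggregation, a daily mean. It assumes each tweet has already been scored (here +1.0 for positive and -1.0 for negative, an assumption made for illustration), and `daily_sentiment` is a hypothetical helper, not part of the project.

```python
from collections import defaultdict
from datetime import date


def daily_sentiment(scored_tweets):
    """Average sentiment scores per calendar day, returned in date order."""
    totals = defaultdict(lambda: [0.0, 0])  # day -> [sum of scores, count]
    for day, score in scored_tweets:
        totals[day][0] += score
        totals[day][1] += 1
    return [(day, total / count) for day, (total, count) in sorted(totals.items())]


# Hypothetical scored tweets: (posting date, sentiment score).
tweets = [
    (date(2011, 11, 1), 1.0),
    (date(2011, 11, 1), -1.0),
    (date(2011, 11, 2), 1.0),
]
series = daily_sentiment(tweets)
```

A list of (date, value) pairs like `series` is a convenient shape to line up against daily market data, since both are indexed by calendar day.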

Page 17: Mike davies sentiment_analysis_presentation_backup

Built With

Python - For majority of code

Packages: numpy, scipy, matplotlib, networkx, graphviz, rpy2, django, twython, nltk

R - For time series analysis

PostgreSQL - SQL database

Java - Twitter POS tagger

C/C++ - GraphLab

Page 18: Mike davies sentiment_analysis_presentation_backup

End Product

[Diagram: boxes labelled IMDB Movie Review Corpora, Double-BackProp. Algo, Tweet (repeated) and SentimentAnalysis]

Page 19: Mike davies sentiment_analysis_presentation_backup

Thank You

Mike Davies

Documented at www.m1ked.com

Page 20: Mike davies sentiment_analysis_presentation_backup

Notes: Vowpal Wabbit LDA

Vowpal Wabbit is an open source library for fast online learning (mostly SGD), mainly developed by John Langford, originally at Yahoo! Research.

Optimised for speed

Its LDA uses clever tricks such as vectorisation and exploiting the floating point representation to avoid calling the pow() and exp() functions.

Page 21: Mike davies sentiment_analysis_presentation_backup

Notes: Label Propagation

Label Propagation has been proven to be an effective semi-supervised learning approach in many applications. The key idea behind label propagation is to first construct a graph in which each node represents a data point and each edge is assigned a weight often computed as the similarity between data points, then propagate the class labels of labeled data to neighbors in the constructed graph in order to make predictions.
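
The description above can be sketched in a few lines. This is a simplified hard-label variant for illustration only: each unlabelled node takes the class with the largest total edge weight among its already-labelled neighbours, and the labelled data points never change. Published algorithms typically propagate soft label distributions instead. The name `propagate_labels` and the toy graph are hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, max_iters=20):
    """Spread seed labels over a weighted undirected graph.

    edges: iterable of (node_a, node_b, weight), weight = similarity
    seeds: {node: label} for the labelled data points
    """
    neighbours = defaultdict(dict)
    for a, b, weight in edges:
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    labels = dict(seeds)
    for _ in range(max_iters):
        updated = dict(labels)
        for node in neighbours:
            if node in seeds:
                continue  # labelled data points keep their known class
            votes = defaultdict(float)
            for other, weight in neighbours[node].items():
                if other in labels:
                    votes[labels[other]] += weight
            if votes:
                # Sorting the candidates makes tie-breaking deterministic.
                updated[node] = max(sorted(votes), key=votes.get)
        if updated == labels:
            break  # nothing changed, so the labelling has settled
        labels = updated
    return labels


# Hypothetical chain a-b-c-d-e with a weak link between c and d.
edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.1), ("d", "e", 0.9)]
result = propagate_labels(edges, {"a": "positive", "e": "negative"})
```

The weak edge between "c" and "d" acts as the boundary: "b" and "c" end up with the label of "a", while "d" follows "e".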