36
S Sentiment A Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Embed Size (px)

Citation preview

Page 1: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

SSentiment AAnalysis

Presented by

Aditya Joshi 08305908

Guided by

Prof. Pushpak Bhattacharyya

IIT Bombay

Page 2: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

What is SA & OM?

• Identify the orientation of opinion in a piece of text

• Can be generalized to a wider set of emotions

The movie was fabulous!

The movie stars Mr. X

The movie was horrible!

Page 3: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Motivation

• Knowing sentiment is a very natural ability of a human being.

Can a machine be trained to do it?

• SA aims at getting sentiment-related knowledge especially from the huge amount of information on the internet

• Can be generally used to understand opinion in a set of documents

Page 4: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Tripod of Sentiment AnalysisCognitive Science

Natural Language Processing

Machine Learning

Sentiment Analysis

Natural Language Processing

Machine Learning

Page 5: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Contents :

Challenges

Subjectivitydetection

SAApproaches

Applications

Lexical Resources

Page 6: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Challenges

• Contrasts with standard text-based categorization

• Domain dependent

• Sarcasm

• Thwarted expressions

Mere presence of words is Indicative of the category

in case of text categorization.

Not the case with sentiment analysis

• Contrasts with standard text-based categorization

• Domain dependent

• Sarcasm

• Thwarted expressions

Sentiment of a word is w.r.t. the

domain.

Example: ‘unpredictable’

For steering of a car,

For movie review,

• Contrasts with standard text-based categorization

• Domain dependent

• Sarcasm

• Thwarted expressions

Sarcasm uses words ofa polarity to represent

another polarity.

Example: The perfume is soamazing that I suggest you wear it

with your windows shut

• Contrasts with standard text-based categorization

• Domain dependent

• Sarcasm

• Thwarted expressions

the sentences/words that contradict the overall sentiment

of the set are in majority

Example: The actors are good, the music is brilliant and appealing.

Yet, the movie fails to strike a chord.

Page 7: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

SentiWordNet

•Lexical resource for sentiment analysis

•Built on the top of WordNet synsets

•Attaches sentiment-related information with synsets

Page 8: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Quantifying sentiment

Objective Polarity

Subjective Polarity

Positive Negative

Term sense position

Each term has a Positive, Negative and Objective score. The scores sum to one.

Page 9: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Building SentiWordNet

• Ln, Lo, Lp are the three seed sets• Iteratively expand the seed sets through K

steps• Train the classifier for the expanded sets

Page 10: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

LpLn

also-se

e

antonymy

Expansion of seed sets

The sets at the end of kth step are called Tr(k,p) and Tr(k,n)

Tr(k,o) is the set that is not present in Tr(k,p) and Tr(k,n)

Page 11: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Committee of classifiers

• Train a committee of classifiers of different types and different K-values for the given data

• Observations:– Low values of K give high precision and low

recall– Accuracy in determining positivity or

negativity, however, remains almost constant

Page 12: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

WordNet Affect

• Similar to SentiWordNet (an earlier work)

• WordNet-Affect: WordNet + annotated affective concepts in hierarchical order

• Hierarchy called ‘affective domain labels’– behaviour– personality– cognitive state

Page 13: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Subjectivity detection

• Aim: To extract subjective portions of text

• Algorithm used: Minimum cut algorithm

Page 14: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Constructing the graph

• Why graphs?

• Nodes and edges?

• Individual Scores

• Association scoresTo model item-specific

and pairwise information independently.

• Why graphs?

• Nodes and edges?

• Individual Scores

• Association scores

Nodes: Sentences of the document and source & sink

Source & sink representthe two classes of sentences

Edges: Weighted with either of the two scores

• Why graphs?

• Nodes and edges?

• Individual Scores

• Association scoresPrediction whether

the sentence is subjective or not

Indsub(si)=

• Why graphs?

• Nodes and edges?

• Individual Scores

• Association scoresPrediction whether two

sentences should have

the same subjectivity level

T : Threshold – maximum distance upto which sentences may be considered proximalf: The decaying functioni, j : Position numbers

Page 15: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Constructing the graph

• Build an undirected graph G with vertices {v1, v2…,s, t} (sentences and s,t)

• Add edges (s, vi) each with weight ind1(xi)

• Add edges (t, vi) each with weight ind2(xi)

• Add edges (vi, vk) with weight assoc (vi, vk)

• Partition cost:

Page 16: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Example

Sample cuts:

Page 17: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Document

Subjective

Results (1/2)

• Naïve Bayes, no extraction : 82.8%

• Naïve Bayes, subjective extraction : 86.4%

• Naïve Bayes, ‘flipped experiment’ : 71 %

DocumentSubjectivity

detectorObjective

POLARITY CLASSIFIER

Page 18: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Results (2/2)

Page 19: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Approach 1: Using adjectives

• Many adjectives have high sentiment value– A ‘beautiful’ bag– A ‘wooden’ bench– An ‘embarrassing’ performance

• An idea would be to augment this polarity information to adjectives in the WordNet

Page 20: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Setup

• Two anchor words (extremes of the polarity spectrum) were chosen

• PMI of adjectives with respect to these adjectives is calculated

Polarity Score (W)= PMI(W,excellent) – PMI (W, poor)

excellent poor

wordPMI PMI

Page 21: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Experimentation

• K-means clustering algorithm used on the basis of polarity scores

• The clusters contain words with similar polarities

• These words can be linked using an ‘isopolarity link’ in WordNet

Page 22: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Results

• Three clusters seen

• Major words were with negative polarity scores

• The obscure words were removed by selecting adjectives with familiarity count of 3– the ones that are not very common

Page 23: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Approach 2: Using Adverb-Adjective Combinations (AACs)

• Calculate sentiment value based on the effect of adverbs on adjectives

• Linguistic ideas:• Adverbs of affirmation: certainly• Adverbs of doubt: possibly• Strong intensifying adverbs: extremely• Weak intensifying adverbs: scarcely• Negation and Minimizers: never

Page 24: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Moving towards computation…

• Based on type of adverb, the score of the resultant AAC will be affected

• Example of an axiom:

• Example : ‘extremely good’ is more positive than ‘good’

Page 25: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

AAC Scoring Algorithms

1. Variable Scoring Algorithm

2. Adjective Priority Scoring Algorithm

3. Adverb first scoring algorithm

Page 26: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Scoring the sentiment on a topic

• Rel (t) : Sentences in d that reference to topic t

• s : Sentence is Rel (t)

• Appl+(s) : AACs with positive score in s

• Appl-(s) : AACs with negative score in s

• Return strength =

Page 27: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Findings

• APSr with r=0.35 worked the best (Better correlation with human subject)– Adjectives are more important than adverbs in

terms of sentiment

• AACs give better precision and recall as compared to only adjectives

Page 28: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Approach 3: Subject-based SA

• Examples:

The horse bolted.

The movie lacks a good story.

Page 29: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Lexicon

subj. bolt

b VB bolt subj

subj. lack obj.

b VB lack obj ~subj

Argument that sends the sentiment (subj./obj.)

Argument that receives the sentiment (subj./obj.)

Argument that receives the sentiment (subj./obj.)

Page 30: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Lexicon

• Also allows ‘\S+’ characters

• Similar to regular expressions

• E.g. to put \S+ to risk– The favorability of the subject depends on the

favorability of ‘\S+’.

Page 31: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Example

The movie lacks a good story.

G JJ good obj.

The movie lacks \S+.

B VB lack obj ~subj.

Lexicon : Steps :

1) Consider a context window of upto five words

2) Shallow parse the sentence

3) Step-by-step calculate the sentiment value based on lexicon and by adding ‘\S+’ characters at each step

Page 32: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Results

Description Precision Recall

Benchmark corpus

Mixed statements

94.3% 28%

Open Test corpus

Reviews of a camera

94% 24%

Page 33: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Applications

• Review-related analysis

• Developing ‘hate mail filters’ analogous to ‘spam mail filters’

• Question-answering (Opinion-oriented questions may involve different treatment)

Page 34: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

Conclusion & Future Work

• Lexical Resources have been developed to capture sentiment-related nature

• Subjective extracts provide a better accuracy of sentiment prediction

• Several approaches use algorithms like Naïve Bayes, clustering, etc. to perform sentiment analysis

• The cognitive angle to Sentiment Analysis can be explored in the future

Page 35: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

References (1/2)

• Tetsuya Nasukawa, Jeonghee Yi. ‘Sentiment Analysis: Capturing Favorability Using Natural Language Processing’. In K-CAP ’03, Florida, pages 1-8. 2003.

• Alekh Agarwal, Pushpak Bhattacharyya. ‘Augmenting WordNet with polarity information on adjectives’. In K-CAP ’03, Florida, pages 1-8. 2003.

• SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining Andrea Esuli, Fabrizio Sebastiani

• ‘Machine Learning’, Han and Kamber, 2nd edition, 310-330.• http://wordnet.princeton.edu• Farah Benamara, Carmine Cesarano, Antonio Picariello, VS Subrahmanian

et al; ‘Sentiment Analysis: Adjectives and Adverbs are better than Adjectives Alone’; In ICWSM ’2007 Boulder, CO USA, 2007.

Page 36: SA Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay

References (2/2)

• Jon M. Kleinberg; ‘Authoritative Sources in a Hyperlinked Environment’ as IBM Research Report RJ 10076, May 1997, Pgs. 1 – 34.

• www.cs.uah.edu/~jrushing/cs696-summer2004/notes/Ch8Supp.ppt• Opinion Mining and Sentiment Analysis, Foundations and Trends in

Information Retrieval, B. Pang and L. Lee, Vol. 2, Nos. 1–2 (2008) 1–135, 2008.

• Bo Pang, Lillian Lee; ‘A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts’; Proceedings of the 42nd ACL; pp. 271–278; 2004.

• http://www.cse.iitb.ac.in/~veeranna/ppt/Wordnet-Affect.ppt