Sentiment Detection
Naveen Sharma (02005010), Prateek Choudhary (02005016), Yashpal Meena (02005030)
Under the guidance of Prof. Pushpak Bhattacharyya
Outline
Problem Statement
Challenges
Earlier Work and Traditional Approaches
Recent Advances
Conclusion/Future Directions
Sentiment Analysis
What is Sentiment Analysis?
- Determining the overall polarity of a given document
Polarity:
- Positive
- Negative
- Mixed
- Neutral
Motivation
Individual
- Movie reviews on the web (thumbs up or thumbs down)
Commercial
- Feedback/evaluation forms
- Opinions about a product
- Recognizing and discarding “flames” on newsgroups
Political
- Opinions on government policies, e.g. the Iraq War, taxation
Sentiment Analysis
A type of text classification
Other types of text classification:
- Author-based classification
- Topic categorization
Sentiment analysis vs. topic categorization:
- Topics: the subject matter
- Sentiments: the opinion towards the subject matter
Challenges
Reference to multiple objects in the same document
- The NR70 is trendy. T-Series is fast becoming obsolete.
Dependence on the context of the document
- “Unpredictable” plot vs. “unpredictable” performance
Negations have to be captured
- Monochrome display is not what the user wants
- It is not like the movie is a total waste of time
Challenges (contd.)
Metaphors/Similes
- The metallic body is solid as a rock
Part-of and Attribute-of relationships
- The small keypad is inconvenient
Subtle expression
- How can someone sit through this movie?
Earlier Work (First Approaches)
Naïve Bayes
Maximum Entropy
Support Vector Machines
Naïve Bayes
What is a Naïve Bayesian classifier?
Difficulty
- More than a few variables (their joint distribution is hard to estimate)
How to overcome this difficulty
- Assume the variables are independent
Naïve Bayes (contd.)
Let $\{f_1, f_2, \ldots, f_m\}$ be a set of predefined features
- Features can be representative words/word patterns
Each document $d$ is represented by the document vector
$$\vec{d} = (n_1(d), \ldots, n_m(d))$$
where $n_i(d)$ = number of times feature $f_i$ occurs in $d$
Assign a document $d$ to the class
$$c^* = \arg\max_c P(c \mid d)$$
where
$$P(c \mid d) = \frac{P(c)\, P(d \mid c)}{P(d)}$$
$P(d)$ plays no role in selecting $c^*$.
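A minimal Python sketch of this slide, assuming single-word features and hypothetical probability values (the names `document_vector` and `map_class` are illustrative, not from the source). It builds the count vector $(n_1(d), \ldots, n_m(d))$ and picks $c^*$ from $P(c)\,P(d \mid c)$ alone, since dividing every class score by the same $P(d)$ cannot change the argmax.

```python
from collections import Counter


def document_vector(tokens, features):
    """Return (n_1(d), ..., n_m(d)): how often each feature f_i occurs in d."""
    counts = Counter(tokens)
    return [counts[f] for f in features]


def map_class(prior, likelihood):
    """Return c* = argmax_c P(c) * P(d | c).

    P(d) is omitted: it is the same for every class, so dividing by it
    rescales all scores equally and leaves the argmax unchanged.
    """
    return max(prior, key=lambda c: prior[c] * likelihood[c])


features = ["good", "bad", "boring", "excellent"]
doc = "good acting but a boring plot boring".split()
print(document_vector(doc, features))  # [1, 0, 2, 0]

# Hypothetical values for P(c) and P(d | c)
prior = {"pos": 0.5, "neg": 0.5}
likelihood = {"pos": 0.02, "neg": 0.01}
print(map_class(prior, likelihood))  # pos
```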
Naïve Bayes (contd.)
Assuming the $f_i$ are conditionally independent given the class, Naïve Bayes decomposes as
$$P_{NB}(c \mid d) := \frac{P(c) \left( \prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)} \right)}{P(d)}$$
Advantages:
- Simple
- Performs well
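The decomposition can be turned into a small multinomial Naïve Bayes classifier. This is a sketch under assumptions not on the slides: features are single words, $P(f_i \mid c)$ is estimated with add-one (Laplace) smoothing, and scores are summed in log space to avoid underflow. As on the slides, $P(d)$ is dropped because it does not affect the argmax. The training data is a hypothetical toy set.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, features, alpha=1.0):
    """Estimate P(c) and P(f_i | c) from labelled token lists.

    alpha is add-one (Laplace) smoothing, an assumption not on the slides.
    """
    feature_set = set(features)
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)
    for tokens, c in zip(docs, labels):
        feat_counts[c].update(t for t in tokens if t in feature_set)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    cond = {}
    for c in class_counts:
        total = sum(feat_counts[c].values()) + alpha * len(features)
        cond[c] = {f: (feat_counts[c][f] + alpha) / total for f in features}
    return priors, cond


def classify_nb(tokens, priors, cond):
    """Return argmax_c of log P(c) + sum_i n_i(d) * log P(f_i | c)."""
    counts = Counter(tokens)

    def score(c):
        # Tokens that are not features are ignored
        return math.log(priors[c]) + sum(
            n * math.log(cond[c][f]) for f, n in counts.items() if f in cond[c]
        )

    return max(priors, key=score)


docs = [["great", "fun"], ["great", "moving"], ["boring", "awful"], ["awful", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
features = ["great", "fun", "moving", "boring", "awful"]
priors, cond = train_nb(docs, labels, features)
print(classify_nb(["great", "boring", "great"], priors, cond))  # pos
print(classify_nb(["awful", "boring"], priors, cond))  # neg
```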
Recent Advances
An unsupervised learning algorithm
Extract phrases from the review based on patterns of part-of-speech tags.
JJ = adjective, NN = noun, NNS = plural noun
E.g. extracting two-word patterns:
First word    Second word    Third word (not extracted)
JJ            NN or NNS      anything
JJ            JJ             not NN nor NNS
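A sketch of the extraction step in Python, implementing only the two patterns in this table. It assumes the review has already been POS-tagged with Penn Treebank tags; the tagger itself and the function name `extract_phrases` are assumptions of this sketch.

```python
def extract_phrases(tagged):
    """Return two-word phrases from (word, tag) pairs using two patterns:

    JJ followed by NN/NNS (third word unrestricted), or
    JJ followed by JJ when the third word is not NN/NNS.
    """
    nouns = ("NN", "NNS")
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # Tag of the third word, or None past the end of the review
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        if t1 == "JJ" and (t2 in nouns or (t2 == "JJ" and t3 not in nouns)):
            phrases.append(f"{w1} {w2}")
    return phrases


tagged = [("a", "DT"), ("great", "JJ"), ("movie", "NN"), ("with", "IN"),
          ("very", "RB"), ("funny", "JJ"), ("bold", "JJ"), ("scenes", "NNS")]
print(extract_phrases(tagged))  # ['great movie', 'bold scenes']
```

"funny bold" is skipped because the word after it is a plural noun, which is exactly the restriction in the second row.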
Unsupervised Learning (contd.)
Estimate the semantic orientation of the extracted phrases
PMI (Pointwise Mutual Information) as the strength of semantic association:
$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}$$
$$\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{“excellent”}) - \mathrm{PMI}(phrase, \text{“poor”})$$
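The two formulas translate directly into code. In this sketch the probabilities are hypothetical inputs; how they are estimated is left open here.

```python
import math


def pmi(p_joint, p1, p2):
    """PMI(word1, word2) = log2[ p(word1 & word2) / (p(word1) p(word2)) ]."""
    return math.log2(p_joint / (p1 * p2))


def semantic_orientation(p_phrase, p_excellent, p_poor, p_with_excellent, p_with_poor):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    return (pmi(p_with_excellent, p_phrase, p_excellent)
            - pmi(p_with_poor, p_phrase, p_poor))


# Hypothetical probabilities: the phrase co-occurs with "excellent" twice as
# often as chance, and with "poor" half as often as chance.
print(semantic_orientation(0.1, 0.2, 0.2, 0.04, 0.01))  # approximately 2.0
```

A positive SO means the phrase leans towards "excellent", a negative one towards "poor".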
Unsupervised Learning (contd.)
Determine the semantic orientation (SO) of the phrases
Search on AltaVista (using its NEAR operator)
$$\mathrm{SO}(phrase) = \log_2 \frac{\mathrm{hits}(phrase \text{ NEAR “excellent”}) \cdot \mathrm{hits}(\text{“poor”})}{\mathrm{hits}(phrase \text{ NEAR “poor”}) \cdot \mathrm{hits}(\text{“excellent”})}$$
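The hit-count version of the formula can be sketched as follows. The counts are assumed to come from some search engine that supports a NEAR-style proximity query; the small constant `eps` added to the NEAR counts to avoid dividing by zero is an assumption of this sketch.

```python
import math


def so_from_hits(near_excellent, near_poor, hits_excellent, hits_poor, eps=0.01):
    """SO(phrase) from search hit counts.

    SO = log2[ hits(phrase NEAR "excellent") * hits("poor")
               / (hits(phrase NEAR "poor") * hits("excellent")) ]
    eps is added to both NEAR counts (an assumption) so a zero count does not
    cause division by zero or log(0).
    """
    numerator = (near_excellent + eps) * hits_poor
    denominator = (near_poor + eps) * hits_excellent
    return math.log2(numerator / denominator)


# Hypothetical counts: the phrase appears near "excellent" 4 times as often
print(so_from_hits(8, 2, 1000, 1000, eps=0))  # 2.0
```

Because both PMI terms share $p(phrase)$ and the same corpus size, those factors cancel, which is why only four hit counts are needed.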
Unsupervised Learning (contd.)
Calculate the average semantic orientation of the phrases in the given review, and classify the review as recommended if the average is positive and as not recommended otherwise.
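A sketch of this decision rule, taking the SO values of one review's phrases as input. How to handle a review with no extracted phrases is not covered by the slide; raising an error here is an assumption.

```python
from statistics import mean


def classify_review(phrase_sos):
    """Recommended if the average SO of the review's phrases is positive."""
    if not phrase_sos:
        # The average is undefined for a review with no extracted phrases.
        raise ValueError("no phrases extracted from the review")
    return "recommended" if mean(phrase_sos) > 0 else "not recommended"


print(classify_review([1.2, -0.3, 0.5]))  # recommended
print(classify_review([-1.0, 0.5]))       # not recommended
```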
Recent Advances (contd.)
Subjectivity and min-cuts approach by Pang and Lee
- Step 1: label sentences as subjective or objective
- Step 2: apply a standard machine learning classifier to the subjective extract
Min Cut Approach (contd.)
Formalization: suppose we have $n$ items $x_1, \ldots, x_n$ to divide into classes $C_1$ and $C_2$
We need two types of scores:
- Individual scores $ind_j(x_i)$: estimate of each $x_i$'s preference for class $C_j$
- Association scores $assoc(x_i, x_k)$: estimate of how important it is for both to be in the same class
Min Cut Approach (contd.)
Maximize each item's individual preference
Penalize tightly associated items that are placed in different classes
Optimization problem: minimize the cost
$$\mathrm{cost}(C_1, C_2) = \sum_{x \in C_1} ind_2(x) + \sum_{x \in C_2} ind_1(x) + \sum_{x_i \in C_1,\, x_k \in C_2} assoc(x_i, x_k)$$
Build an undirected graph $G$ with vertices $\{v_1, \ldots, v_n, s, t\}$
- edge $(s, v_i)$ with weight $ind_1(x_i)$
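To make the cost function concrete, here is a brute-force sketch that evaluates it for every assignment of a few items and returns the cheapest one. It is exponential in $n$ and only meant for checking small examples; the scores are hypothetical, and `assoc` is stored once per unordered pair $(i, k)$.

```python
from itertools import product


def partition_cost(assign, ind1, ind2, assoc):
    """Cost of an assignment; assign[i] is 1 or 2, the class of item x_i.

    Items in C1 pay ind2, items in C2 pay ind1, and every associated pair
    split across the two classes pays assoc(x_i, x_k).
    """
    cost = sum(ind2[i] if c == 1 else ind1[i] for i, c in enumerate(assign))
    cost += sum(w for (i, k), w in assoc.items() if assign[i] != assign[k])
    return cost


def best_partition(ind1, ind2, assoc):
    """Exhaustively search all 2^n assignments (small n only)."""
    n = len(ind1)
    return min(product((1, 2), repeat=n),
               key=lambda a: partition_cost(a, ind1, ind2, assoc))


ind1 = [0.8, 0.5, 0.1]   # hypothetical preference for C1
ind2 = [0.2, 0.5, 0.9]   # hypothetical preference for C2
assoc = {(0, 1): 1.0, (1, 2): 0.1}
best = best_partition(ind1, ind2, assoc)
print(best, partition_cost(best, ind1, ind2, assoc))  # (1, 1, 2) and a cost of about 0.9
```

Item 1 has no individual preference, but its strong association with item 0 pulls it into $C_1$.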
Min Cut Approach (contd.)
- edge $(v_i, t)$ with weight $ind_2(x_i)$
- edge $(v_i, v_k)$ with weight $assoc(x_i, x_k)$
Each $s$-$t$ cut corresponds to a partition (vertices on the $s$ side go to $C_1$, those on the $t$ side to $C_2$), and the weight of the cut equals the cost of that partition
The classification problem therefore reduces to finding a minimum cut in the graph
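A self-contained sketch of this reduction in Python: it builds the graph with the edge weights listed on these slides and computes a minimum $s$-$t$ cut with the Edmonds-Karp maximum-flow algorithm (one standard choice; the slides do not fix the algorithm). Items still reachable from $s$ in the residual graph form $C_1$. The function name and the scores in the usage line are hypothetical.

```python
from collections import deque


def min_cut_partition(ind1, ind2, assoc):
    """Return (C1, C2, cut_value) for items 0..n-1 via Edmonds-Karp max flow."""
    n = len(ind1)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]  # residual capacities
    for i in range(n):
        cap[s][i] = ind1[i]   # edge (s, v_i), weight ind1(x_i)
        cap[i][t] = ind2[i]   # edge (v_i, t), weight ind2(x_i)
    for (i, k), w in assoc.items():
        cap[i][k] += w        # undirected edge (v_i, v_k): capacity both ways
        cap[k][i] += w

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0.0
    while True:
        parent = bfs_parents()
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximum
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:  # push flow along the path, update residual graph
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Items reachable from s in the residual graph form the source side C1
    reachable = bfs_parents()
    c1 = [i for i in range(n) if reachable[i] != -1]
    c2 = [i for i in range(n) if reachable[i] == -1]
    return c1, c2, flow


c1, c2, value = min_cut_partition([0.8, 0.5, 0.1], [0.2, 0.5, 0.9],
                                  {(0, 1): 1.0, (1, 2): 0.1})
print(c1, c2, round(value, 6))  # [0, 1] [2] 0.9
```

By the max-flow/min-cut theorem the returned flow value equals the weight of the minimum cut, i.e. the cost of the returned partition under the cost formula from the previous slide.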
Min Cut Approach (contd.)
Advantages/Analysis:
- Different algorithms can be used, e.g. maximum-flow algorithms
- Extracts compared: the N most subjective sentences vs. the last N sentences
Recent Advances
Using linguistic knowledge and WordNet synonymy graphs: Agarwal and Bhattacharyya
On movie reviews
Bag-of-words features
Strength of an adjective:
$$EVA(w) = \frac{d(w, \text{bad}) - d(w, \text{good})}{d(\text{good}, \text{bad})}$$
where $d$ is the shortest-path distance between two words in the WordNet synonymy graph
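A sketch of EVA using breadth-first search for the distance $d$. The toy synonymy graph below is hypothetical and stands in for WordNet; each edge links two words treated as synonyms. With real data the graph would be built from WordNet synsets instead.

```python
from collections import deque


def build_graph(pairs):
    """Undirected adjacency sets from (word, synonym) pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph, a, b):
    """Shortest-path length between words a and b, or None if unconnected."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        word, d = queue.popleft()
        if word == b:
            return d
        for nxt in graph.get(word, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


def eva(graph, w):
    """EVA(w) = (d(w, bad) - d(w, good)) / d(good, bad)."""
    d_bad, d_good = distance(graph, w, "bad"), distance(graph, w, "good")
    d_ref = distance(graph, "good", "bad")
    if None in (d_bad, d_good, d_ref):
        raise ValueError(f"{w!r} is not connected to both 'good' and 'bad'")
    return (d_bad - d_good) / d_ref


# Hypothetical toy graph, not real WordNet data
graph = build_graph([("good", "great"), ("good", "fine"), ("fine", "okay"),
                     ("okay", "poor"), ("poor", "bad")])
print(eva(graph, "great"), eva(graph, "okay"), eva(graph, "poor"))  # 1.0 0.0 -0.5
```

EVA ranges from -1 for "bad" to +1 for "good"; words equidistant from both score 0.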
WordNet Approach (contd.)
“About” and “of” sentences
- About the movie (review)
- What's in the movie
Two kinds of weights:
- Individual weights: probability estimates from an SVM classifier
- Mutual weights: tendency to fall in the same category
Physical separation
- Paragraph boundaries
Contextual similarity
- Total adjective strength
- Scaling and distance measure
WordNet Approach (contd.)
Minimum cut algorithm similar to Pang and Lee
Mutual Similarity Coefficient:
$$MSC(d_i, d_j) = \frac{\sum_k F_i(f_k) \cdot F_j(f_k) - s_{min}}{s_{max} - s_{min}}$$
where $f_k$ is the $k$-th feature and
$F_i(f_k) = 1$ if the $k$-th feature is present in document $d_i$, and $0$ otherwise
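A sketch of MSC over a set of documents. The slide does not define $s_{min}$ and $s_{max}$; this sketch assumes they are the smallest and largest raw overlap $\sum_k F_i(f_k) F_j(f_k)$ over all document pairs, which scales the coefficient into $[0, 1]$. Documents are represented as sets of tokens, so $F_i(f_k) = 1$ exactly when $f_k$ is in document $d_i$. The function names and data are hypothetical.

```python
from itertools import combinations


def raw_overlap(doc_i, doc_j, features):
    """sum_k F_i(f_k) * F_j(f_k): number of features present in both documents."""
    return sum(1 for f in features if f in doc_i and f in doc_j)


def msc_scores(docs, features):
    """MSC for every pair (i, j), i < j, of documents given as token sets.

    Assumption: s_min and s_max are the min and max raw overlap over all pairs.
    """
    raw = {(i, j): raw_overlap(docs[i], docs[j], features)
           for i, j in combinations(range(len(docs)), 2)}
    s_min, s_max = min(raw.values()), max(raw.values())
    span = (s_max - s_min) or 1  # avoid 0/0 when every pair overlaps equally
    return {pair: (s - s_min) / span for pair, s in raw.items()}


docs = [{"great", "acting", "plot"}, {"great", "plot"}, {"boring", "plot"}]
features = ["great", "acting", "plot", "boring"]
print(msc_scores(docs, features))  # {(0, 1): 1.0, (0, 2): 0.0, (1, 2): 0.0}
```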
WordNet Approach (contd.)
An SVM is trained to give $Pr_{good}$ and $Pr_{bad}$
SVM probabilities and MSC values form the weights matrix
Min cut approach
WordNet Approach (contd.)
Analysis
- Mutual relationships between documents
- Graph cut technique is simple and powerful
- Decline in accuracy with subjectivity
- WordNet is a useful lexical resource
Conclusion/Future Directions
Practical utility
Harder than other text classification tasks
Traditional machine learning techniques don't perform that well
Linguistic knowledge needs to be used
- E.g. WordNet
Subjectivity extracts and mutual dependencies
Conclusion/Future Directions
Better measures to incorporate linguistic knowledge
Better measures for the degree of similarity
Formulation as a multiclass problem
- E.g. emotional icons in messengers
- May be helpful in building psychological profiles from newsgroup mails
References
Alekh Agarwal and Pushpak Bhattacharyya. Sentiment Analysis: A New Approach for Effective Use of Linguistic Knowledge and Exploiting Similarities in a Set of Documents to be Classified. International Conference on Natural Language Processing (ICON 05), IIT Kanpur, India, December 2005.
Bo Pang and Lillian Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. Proceedings of ACL, 2004.
Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. Thumbs Up? Sentiment Classification Using Machine Learning Techniques. Proceedings of EMNLP 2002, pp. 79-86.
Peter Turney. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Proceedings of ACL, 2002.