OPINION MINING FROM USER
REVIEWS
Project Members
- Pankaj Mishra
- Neha Natarajan
- Chinmay Deshpande
Guides
Dr. Amiya Kumar Tripathy
Dr. Revathy Sundararajan
May 5, 2014
05/05/2014 1
Contents
• Introduction
• How is opinion mining useful for companies?
• Feedback Cycle in companies
• Methodology
• Machine Learning: HMM
• Architecture
• Algorithm
• System Learning and Tuning
• Implementation
• Application
Introduction
What is Opinion Mining?
• Opinion mining focuses on using information processing
techniques to find valuable information in the vast quantity of
user-generated content.
Methodology
Input from different sources:
• Web Reviews
• Blogs
• Text Documents
Sentences are classified into two principal classes:
• objective sentences
• subjective sentences
Opinions and sentiments can be extracted only from subjective sentences.
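The subjective/objective split above could be sketched as a minimal keyword-based classifier. The cue-word list and the rule are illustrative assumptions only; the deck does not specify the actual classification method used.

```python
# Minimal sketch of subjective/objective sentence classification.
# The cue-word list is a hand-picked assumption, not the project's method.
SUBJECTIVE_CUES = {"good", "bad", "love", "hate", "great", "poor",
                   "amazing", "terrible", "happy", "disappointed"}

def is_subjective(sentence: str) -> bool:
    """Treat a sentence as subjective if it contains any opinion cue word."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return bool(words & SUBJECTIVE_CUES)

print(is_subjective("The car has a 1.2 litre engine"))  # objective
print(is_subjective("The mileage is really good"))      # subjective
```

In practice a trained classifier (or the parse-based features discussed later) would replace the keyword lookup, but the interface is the same: sentence in, subjective/objective out.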
Methodology (Cont…)
• Enhancers: e.g. 'Very Bad'
• Reducers: e.g. 'Slightly Good'
• Negation: e.g. 'Hassle Free'
Methodology (Cont…)
Stanford CoreNLP libraries
Provide a set of natural language analysis tools.
Input:
Raw English-language text
Output:
POS tagging, parse tree
Co-reference dependencies
Word count
Sentence: ‘the performance of the car is really very good’
Output (in Pretty Print Format)
Methodology (Cont…)
• SentiWordNet
A lexical resource for opinion mining
Provides
- synsets: synonyms of the word
- positive, objective, and negative scores for the word, each in the range 0 to 1
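The three SentiWordNet scores for a word always sum to 1, so the objective score can be derived from the other two. The tiny lexicon below is a hand-rolled stand-in for the real SentiWordNet resource; the numbers are illustrative, not SentiWordNet's actual values.

```python
# Sketch of SentiWordNet-style scoring with a stand-in mini-lexicon.
# word -> (positive score, negative score); objective = 1 - pos - neg
LEXICON = {
    "good": (0.75, 0.00),
    "bad":  (0.00, 0.70),
    "car":  (0.00, 0.00),
}

def scores(word: str) -> dict:
    """Return pos/neg/obj scores; unknown words default to fully objective."""
    pos, neg = LEXICON.get(word.lower(), (0.0, 0.0))
    return {"pos": pos, "neg": neg, "obj": 1.0 - pos - neg}

print(scores("good"))  # {'pos': 0.75, 'neg': 0.0, 'obj': 0.25}
```

The real resource additionally distinguishes word senses (one score triple per synset), which the single-entry-per-word dictionary above glosses over.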
Architecture
• Data Extraction
• Sentence Processing
• Domain Knowledge
• Sentence Analysis
• Opinion Extraction
• Aggregation
• Database
Algorithms
1. Polarity Assignment Algorithm
2. Opinion Extraction Algorithm
3. Weight Assignment Algorithm
System Learning and Tuning
• Alerts
– Noise can get added to the domain knowledge
– Polarity orientation may also turn out to be opposite
– Such cases are corrected here
System Learning and Tuning
• Blacklist
– Some noisy data may get added again and again.
– Once blacklisted, it is never considered again for opinion mining.
– This reduces the admin's burden of removing noise.
• Enhancers:
▫ Appear with an opinion word
▫ Increase the positive or negative degree of the sentence
▫ Words like ‘extremely’, ‘very’, etc.
Larger positive degree:
Happy with the car (positive degree)
Very happy with the car (larger positive degree)
Larger negative degree:
Poor performance (negative degree)
Extremely poor performance (larger negative degree)
• Reducers:
▫ Appear with an opinion word
▫ Reduce the impact
▫ Words like ‘only’, ‘slightly’, etc.
Lesser positive degree:
Better performance (positive degree)
Slightly better performance (lesser positive degree)
Lesser negative degree:
Bad taste (negative degree)
Slightly bad taste (lesser negative degree)
• Negation:
– Reverses the polarity of the word
– Words like ‘not’, ‘never’, etc.
– Recognizing negation is a crucial task
– Some words convey a positive effect when attached to a negative word: ‘free’, ‘remove’, etc.
‘Car is good’ vs. ‘Car is not good’
Car is hassle free
(‘hassle’ is a negative word; ‘free’ changes the polarity from negative to positive, so ‘hassle free’ becomes a positive opinion)
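The effect of enhancers, reducers, negation and positive-effect words like ‘free’ could be sketched as score adjustments. The word lists, the multipliers (1.5, 0.5, -1) and the one-word-lookahead shifter rule are all illustrative assumptions, not the project's tuned values.

```python
# Sketch: adjusting an opinion score with enhancers, reducers,
# negation, and polarity shifters ('hassle free'). All word lists
# and multipliers below are assumptions for illustration.
ENHANCERS = {"very", "extremely", "really"}
REDUCERS  = {"slightly", "only"}
NEGATIONS = {"not", "never"}
SHIFTERS  = {"free", "remove"}   # flip a preceding negative word

BASE = {"good": 1.0, "bad": -1.0, "poor": -1.0, "hassle": -1.0}

def sentence_score(sentence: str) -> float:
    words = sentence.lower().split()
    score, factor = 0.0, 1.0
    for i, w in enumerate(words):
        if w in ENHANCERS:
            factor *= 1.5            # amplify the next opinion word
        elif w in REDUCERS:
            factor *= 0.5            # dampen the next opinion word
        elif w in NEGATIONS:
            factor *= -1             # reverse polarity
        elif w in BASE:
            s = BASE[w] * factor
            # a shifter right after a negative word flips it to positive
            if s < 0 and i + 1 < len(words) and words[i + 1] in SHIFTERS:
                s = -s
            score += s
            factor = 1.0             # modifiers apply to one word only
    return score

print(sentence_score("car is good"))         # 1.0
print(sentence_score("car is not good"))     # -1.0
print(sentence_score("car is hassle free"))  # 1.0
```

A real system would work over the parse tree (as the CoreNLP output suggests) rather than a flat word list, so that modifiers attach to the right opinion word.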
• Parse Tree in Pretty Print Format
• Output in Visual Format
• An observation sequence O of length T:
O = (O1, O2, …, OT)
• Some definitions:
– n : the number of states in the model
– M : the number of different symbols that can be observed
– Q = {q1, q2, …, qn} : the set of internal states
– V = {v1, v2, …, vM} : the set of observable symbols
– A = {aij} : the set of state-transition probabilities
– B = {bj(k)} : the set of symbol-emission probabilities
– Π : the initial state probability distribution
– λ = (A, B, Π) : the Hidden Markov Model
• Suppose there are two coins A : Biased, B : Unbiased
• For A,
probability of Heads = 0.75
probability of Tails = 0.25
• For B,
probability of Heads = 0.5
probability of Tails = 0.5
The person can toss whichever coin he wants and can switch from one coin to the other at any instant. Only the output at each instant, i.e. ‘H’ or ‘T’, is visible to us.
Biased-Coin Model
Visible states = {Heads, Tails}
Hidden states = {Biased coin, Unbiased coin}
Sample output:
HTHHTHTHHTHTHTHHHHHHHHHHHHHHHHTTHTHTHTHT
Here we cannot say for certain when the person switched between the two coins. Using an HMM, we can infer the most likely stretches where the biased coin was used; in the sample above, the long run of heads suggests the biased coin.
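This inference can be sketched with Viterbi decoding of the two-state coin HMM. The emission probabilities come from the slides; the transition probabilities (0.9 stay / 0.1 switch) and the uniform initial distribution are assumptions, since the deck does not give them.

```python
import math

# Viterbi decoding for the biased-coin HMM from the slides.
# Emissions are from the slides; transitions and initial
# distribution below are assumed values.
STATES = ["Biased", "Unbiased"]
EMIT   = {"Biased":   {"H": 0.75, "T": 0.25},
          "Unbiased": {"H": 0.5,  "T": 0.5}}
TRANS  = {s: {t: (0.9 if s == t else 0.1) for t in STATES} for s in STATES}
INIT   = {s: 0.5 for s in STATES}

def viterbi(obs):
    """Return the most likely hidden-state sequence for obs."""
    # log-probabilities avoid underflow on long sequences
    V = [{s: math.log(INIT[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: V[-1][p] + math.log(TRANS[p][s]))
            col[s] = V[-1][best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            ptr[s] = best
        V.append(col)
        back.append(ptr)
    # trace back from the best final state
    state = max(STATES, key=lambda s: V[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

obs = "HTHHTHTHHTHTHTHHHHHHHHHHHHHHHHTTHTHTHTHT"
path = viterbi(obs)
# mark the positions decoded as the biased coin
print("".join("B" if s == "Biased" else "-" for s in path))
```

With these assumed transition probabilities the decoder marks the long run of heads as the biased coin; the exact switch points it reports depend on the chosen stay/switch probabilities.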