
Opinion Mining from User Reviews (OMUR)


OPINION MINING FROM USER REVIEWS

Project Members

- Pankaj Mishra

- Neha Natarajan

- Chinmay Deshpande

Guides

Dr. Amiya Kumar Tripathy

Dr. Revathy Sundararajan

May 5, 2014

05/05/2014 1

Contents

• Introduction

• How is opinion mining useful for companies?

• Feedback Cycle in companies

• Methodology

• Machine Learning: HMM

• Architecture

• Algorithm

• System Learning and Tuning

• Implementation

• Application


Introduction

What is Opinion Mining?

• Opinion mining focuses on using information-processing techniques to find valuable information in the vast quantity of user-generated content.


How is Opinion Mining useful for Companies?


Feedback Cycle in Companies


Methodology

Input from different sources:

• Web Reviews

• Blogs

• Text Documents

Sentences are classified into two principal classes:

• objective sentences

• subjective sentences

Opinions and sentiments can be extracted from subjective sentences only.
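As a sketch of this subjective/objective split, a minimal lexicon-based filter could look like the following; the opinion-word list is a hypothetical seed set, not the project's actual lexicon:

```python
# Minimal sketch: classify sentences as subjective or objective using a
# hypothetical seed lexicon of opinion-bearing words. (A real system
# would use richer features such as POS patterns and SentiWordNet.)
OPINION_WORDS = {"good", "bad", "great", "poor", "happy", "terrible"}

def is_subjective(sentence):
    """Return True if the sentence contains at least one opinion word."""
    tokens = sentence.lower().split()
    return any(t.strip(".,!?") in OPINION_WORDS for t in tokens)

reviews = [
    "The car has a 2.0 litre engine.",      # objective: states a fact
    "The performance of the car is good.",  # subjective: carries opinion
]
subjective = [s for s in reviews if is_subjective(s)]
```

Only the sentences in `subjective` would then be passed on for opinion extraction.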


Methodology (Cont…)


Three classes of modifiers:

• Enhancers – e.g. ‘Very Bad’

• Reducers – e.g. ‘Slightly Good’

• Negation – e.g. ‘Hassle Free’

Methodology (Cont…)

Stanford CoreNLP libraries

Provide a set of natural language analysis tools.

Input: raw English text

Output: POS tagging, parse tree, co-reference and dependency information, word count


Sentence: ‘the performance of the car is really very good’

Output (in Pretty Print Format)
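The pretty-print format is a bracketed, Penn-Treebank-style tree. The tree string below is a hand-written illustration of that format for the example sentence, not actual CoreNLP output, together with a small reader that turns it into nested lists:

```python
# Illustration of the bracketed (pretty-print) parse-tree format.
# The tree string is hand-written for the example sentence, not
# actual Stanford CoreNLP output.
TREE = ("(ROOT (S (NP (NP (DT the) (NN performance))"
        " (PP (IN of) (NP (DT the) (NN car))))"
        " (VP (VBZ is) (ADJP (RB really) (RB very) (JJ good)))))")

def read_tree(text):
    """Parse a bracketed tree string into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    def walk(i):
        if tokens[i] == "(":
            node, i = [], i + 1
            while tokens[i] != ")":
                child, i = walk(i)
                node.append(child)
            return node, i + 1
        return tokens[i], i + 1
    tree, _ = walk(0)
    return tree

tree = read_tree(TREE)
```

The opinion phrase sits under the ADJP node, which is where the enhancer words ‘really’ and ‘very’ attach to the opinion word ‘good’.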


Methodology (Cont…)

• SentiWordNet

A lexical resource for opinion mining

Provides

- synsets: the synonyms of the word

- positive, objective and negative scores for the word, each in the range 0 to 1
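A minimal sketch of reading SentiWordNet-style entries, assuming the tab-separated layout of the SentiWordNet 3.0 distribution (POS, synset id, positive score, negative score, synset terms, gloss); the sample lines are illustrative, not real entries:

```python
# Sketch of parsing SentiWordNet-style lines. The objective score is
# derived as 1 - (positive + negative). Sample data is illustrative.
SAMPLE = (
    "a\t00001001\t0.625\t0.125\tsharp#1 crisp#2\thaving a thin edge (illustrative)\n"
    "a\t00002002\t0.0\t0.5\tsharp#4\tharsh in tone (illustrative)"
)

def parse_entries(text):
    entries = []
    for line in text.splitlines():
        pos, sid, p, n, terms, gloss = line.split("\t")
        p, n = float(p), float(n)
        entries.append({
            "pos": pos, "id": sid, "terms": terms.split(),
            "positive": p, "negative": n,
            "objective": 1.0 - p - n,   # scores for one synset sum to 1
        })
    return entries

entries = parse_entries(SAMPLE)
```

Each word sense (e.g. ‘sharp#1’ vs ‘sharp#4’) carries its own scores, which is why word-sense context matters when assigning polarity.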


Word ‘sharp’


Machine Learning: HMM


HMM (Cont…)


Architecture


Pipeline stages (from the architecture diagram): Data Extraction, Sentence Processing, Domain Knowledge, Sentence Analysis, Opinion Extraction, Aggregation, Database

Algorithms

1. Polarity Assignment Algorithm

2. Opinion Extraction Algorithm

3. Weight Assignment Algorithm


System Learning and Tuning

• Alerts

– Noise can get added to the domain knowledge

– A word’s polarity orientation may also be learned as the opposite of its true value

– These are corrected here


System Learning and Tuning

• Blacklist

– Some of the noisy data may get added again and again.

– Once blacklisted, such terms are never considered again for opinion mining.

– This reduces the admin’s burden of removing noise.
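The blacklist idea can be sketched as a simple persistent filter; the names and structure below are illustrative, not the project's implementation:

```python
# Sketch of the blacklist: terms the admin marks as noise are stored
# once and filtered out of every later mining pass, so they never have
# to be removed by hand again.
blacklist = set()

def blacklist_term(term):
    """Admin marks a recurring noisy term."""
    blacklist.add(term.lower())

def filter_candidates(candidates):
    """Drop blacklisted terms before opinion extraction."""
    return [c for c in candidates if c.lower() not in blacklist]

blacklist_term("lol")   # illustrative noisy term flagged by the admin
kept = filter_candidates(["engine", "lol", "mileage"])
```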


Implementation


Application


Thank You!!!


• Enhancers:

▫ Appear with opinion word

▫ Increase the positive or negative degree of the sentence

▫ Words like ‘extremely’, ‘very’, etc.

Larger positive degree:

Happy with the car (positive degree) → Very happy with the car (larger positive degree)

Larger negative degree:

Poor performance (negative degree) → Extremely poor performance (larger negative degree)


• Reducers:

▫ Appear with opinion word

▫ Reduce the impact

▫ Words like ‘only’, ‘slightly’, etc.

Lesser positive degree:

Better performance (positive degree) → Slightly better performance (lesser positive degree)

Lesser negative degree:

Bad taste (negative degree) → Slightly bad taste (lesser negative degree)


• Negation:

– Reverses the polarity of the word

– Words like ‘Not’, ‘Never’, etc.

– Recognizing negation is a crucial task

– Some words convey a positive effect and also flip polarity

– Words like ‘free’, ‘remove’, etc.

Car is good → Car is not good

Car is hassle free

(‘hassle’ is negative word.‘free’ changes the polarity from negative to

positive.Hence ‘hassle free’ becomes a positive

opinion)
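A minimal sketch of how the three modifier classes above could adjust an opinion score; the multiplier values and word lists are illustrative assumptions, not the project's actual parameters:

```python
# Sketch of modifier handling. Multipliers and word lists are
# illustrative assumptions.
ENHANCERS = {"very", "extremely"}               # amplify the opinion
REDUCERS  = {"slightly", "only"}                # weaken the opinion
NEGATIONS = {"not", "never", "free", "remove"}  # flip the polarity

def adjust(score, modifier):
    """Adjust a signed opinion score for one adjacent modifier word."""
    if modifier in ENHANCERS:
        return score * 1.5     # larger positive/negative degree
    if modifier in REDUCERS:
        return score * 0.5     # lesser positive/negative degree
    if modifier in NEGATIONS:
        return -score          # polarity reversed
    return score

amplified = adjust(0.8, "very")       # 'very good'     -> larger degree
weakened  = adjust(-0.6, "slightly")  # 'slightly bad'  -> lesser degree
flipped   = adjust(-0.6, "free")      # 'hassle free'   -> becomes positive
```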

• Parse Tree in Pretty Print Format

• Output in Visual Format


• Parse Tree in Pretty Print Format

• Output in Visual Format


SentiWordNet to MySQL
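A sketch of this load step, with sqlite3 standing in for MySQL; the table schema, column names and sample rows are assumptions, not the project's actual schema:

```python
# Illustrative sketch: load SentiWordNet rows into a relational table
# for fast lookup (sqlite3 stands in for MySQL; schema is assumed).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sentiwordnet (
    pos TEXT, synset_id TEXT, pos_score REAL, neg_score REAL, terms TEXT)""")
rows = [
    ("a", "00001001", 0.625, 0.125, "sharp#1 crisp#2"),  # illustrative rows
    ("a", "00002002", 0.0,   0.5,   "sharp#4"),
]
conn.executemany("INSERT INTO sentiwordnet VALUES (?,?,?,?,?)", rows)
conn.commit()

# Look up every sense of 'sharp' by pattern-matching the terms column.
senses = conn.execute(
    "SELECT pos_score, neg_score FROM sentiwordnet WHERE terms LIKE ?",
    ("%sharp#%",)).fetchall()
```

Storing the lexicon in a database avoids re-parsing the SentiWordNet text file on every request.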


• An observation sequence O of length T:

O = (O1, O2, …, OT)

• Some definitions:

– n – the number of states in the model

– M – the number of different symbols that can be observed

– Q = {q1, q2, …, qn} – the set of internal states

– V = {v1, v2, …, vM} – the set of observable symbols

– A = {aij} – the set of state-transition probabilities

– B = {bj(k)} – the set of symbol-emission probabilities

– Π – the initial state probability distribution

– λ = (A, B, Π) – the Hidden Markov Model


• Suppose there are two coins A : Biased, B : Unbiased

• For A,

probability of Heads = 0.75

probability of Tails = 0.25

• For B,

probability of Heads = 0.5

probability of Tails = 0.5

The person can toss either coin and can switch from one coin to the other at any time. Only the output at each toss, i.e. ‘H’ or ‘T’, is visible to us.

Biased-Coin Model


Visible States = {Heads, Tails}

Hidden States = {Biased coin, Unbiased coin}

Sample Output

HTHHTHTHHTHTHTHHHHHHHHHHHHHHHHTTHTHTHTHT

Here we cannot say for certain when the person switched between the two coins.

Using an HMM, we can predict when the biased coin was used:

HTHHTHTHHTHTHTHHHHHHHHHHHHHHHHTTHTHTHTHT (the long run of heads in the middle is where the biased coin was most likely in use)
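This decoding step can be sketched with the Viterbi algorithm. The emission probabilities come from the slides above; the transition probabilities (how often the person switches coins) are not given, so the 0.9/0.1 values here are assumptions:

```python
# Viterbi decoding for the biased-coin model: given only the H/T
# sequence, recover the most likely hidden coin at each toss.
import math

states = ["biased", "unbiased"]
emit = {"biased": {"H": 0.75, "T": 0.25},      # from the slides
        "unbiased": {"H": 0.5, "T": 0.5}}
# Assumed: the person keeps the same coin with probability 0.9.
trans = {s: {t: 0.9 if s == t else 0.1 for t in states} for s in states}
start = {"biased": 0.5, "unbiased": 0.5}

def viterbi(obs):
    # Work in log-probabilities to avoid underflow on long sequences.
    v = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: v[-1][p] + math.log(trans[p][s]))
            col[s] = v[-1][best] + math.log(trans[best][s]) + math.log(emit[s][o])
            ptr[s] = best
        v.append(col)
        back.append(ptr)
    # Backtrack from the most likely final state.
    path = [max(states, key=lambda s: v[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

path = viterbi("HTHHTHTHHTHTHTHHHHHHHHHHHHHHHHTTHTHTHTHT")
```

With these assumed transition probabilities, the decoded path marks the long run of heads as ‘biased’ and the alternating stretches as ‘unbiased’.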


Polarity Assignment Algorithm


• P(o) – polarity of the opinion word

• P(m) – polarity of the modifier

• Both take the value +1 (positive) or -1 (negative)

• W(o) – weight of the opinion word

• W(m) – weight of the modifier

The final weight W(f):

W(f) = P(o) × W(o) × [1 + W(m)]^P(m)
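Reading ‘[1 + W(m)]P(m)’ as an exponent (an assumption about the slide’s notation), an enhancer (P(m) = +1) scales the opinion weight up by a factor of 1 + W(m), and a reducer (P(m) = -1) scales it down by the same factor. A worked example with illustrative weights:

```python
# Worked example of the final-weight formula, reading the exponent as
# W(f) = P(o) * W(o) * (1 + W(m)) ** P(m). All weights are illustrative.
def final_weight(p_o, w_o, p_m=1, w_m=0.0):
    return p_o * w_o * (1 + w_m) ** p_m

w_plain    = final_weight(+1, 0.8)                   # 'good'
w_enhanced = final_weight(+1, 0.8, p_m=+1, w_m=0.5)  # 'very good'
w_reduced  = final_weight(+1, 0.8, p_m=-1, w_m=0.5)  # 'slightly good'
```

Under this reading, a single modifier weight W(m) serves both roles: its polarity P(m) decides whether it amplifies or dampens the opinion word’s weight.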