www.decideo.fr/bruley
Extract from various presentations: Bing Liu, Aditya Joshi, Aster Data …
Sentiment Analysis
January 2012
Introduction
Two main types of textual information: Facts and Opinions
Most current text information processing methods work with factual information (e.g., web search, text mining)
Sentiment analysis, or opinion mining, is the computational study of opinions (sentiments, emotions) expressed in text
Why opinion mining now? Mainly because the Web provides huge volumes of opinionated text.
What is Sentiment Analysis?
Identify the orientation of opinion in a piece of text (blogs, user comments, review websites, community websites, …); in other words, determine whether a sentence or a document expresses positive, negative, or neutral sentiment towards some object.
The movie was fabulous! [ Sentimental ]
The movie stars Mr. X [ Factual ]
The movie was horrible! [ Sentimental ]
SA at different levels
Levels: Word-level SA, Sentence-level SA, Document-level SA
Examples from the slide diagram:
– The movie was interesting and fabulous (sentiment words: interesting, fabulous)
– The movie was very boring (sentiment word: boring)
– The police stopped corruption: police (subj.) stopped (verb) corruption (obj.)
– His last movie was great.
– His last movie was great and interesting. This one’s a dud.
What is an Opinion?
An opinion is a quintuple:
(o_j, f_jk, so_ijkl, h_i, t_l)
where
– o_j is a target object
– f_jk is a feature of the object o_j
– so_ijkl is the sentiment value of the opinion of the opinion holder h_i on feature f_jk of object o_j at time t_l
– h_i is an opinion holder
– t_l is the time when the opinion is expressed
Objective: structure the unstructured
Objective: Given an opinionated document,
– Discover all quintuples (o_j, f_jk, so_ijkl, h_i, t_l),
• i.e., mine the five corresponding pieces of information in each quintuple
With the quintuples,
– Unstructured Text → Structured Data
• Traditional data and visualization tools can be used to slice, dice and visualize the results in all kinds of ways
• Enable qualitative and quantitative analysis
With all quintuples, all kinds of analyses become possible
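To make the "unstructured text to structured data" idea concrete, here is a minimal Python sketch of how extracted quintuples might be stored and aggregated. The class name Opinion, its field names, the +1/0/-1 sentiment scale and the sample records are all assumptions made for illustration; they are not taken from the original presentations, and the extraction step itself is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple; field names are illustrative assumptions."""
    target: str     # the object being discussed
    feature: str    # the feature of that object
    sentiment: int  # assumed scale: +1 positive, 0 neutral, -1 negative
    holder: str     # who holds the opinion
    time: str       # when the opinion was expressed (ISO date string here)


def mean_sentiment_by_feature(opinions):
    """Average the sentiment value for each (target, feature) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum, count]
    for op in opinions:
        entry = totals[(op.target, op.feature)]
        entry[0] += op.sentiment
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hand-written sample records, not mined from any real corpus.
opinions = [
    Opinion("phone", "battery", -1, "alice", "2012-01-03"),
    Opinion("phone", "battery", -1, "bob", "2012-01-05"),
    Opinion("phone", "screen", +1, "alice", "2012-01-03"),
    Opinion("phone", "screen", 0, "carol", "2012-01-09"),
]

summary = mean_sentiment_by_feature(opinions)
```

Once opinions are in this tabular form, grouping by holder or by time instead of by feature is a one-line change, which is the sense in which ordinary data tools can slice and dice the results.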
SA is not Just ONE Problem
Track direct opinions:
– document
– sentence
– feature level
Compare opinions: different types of comparisons
Detect opinion spam: fake reviews
Polarity Classifier
First eliminate objective sentences, then use remaining sentences to classify document polarity (reduce noise)
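The two-stage idea can be sketched in Python as follows. This is only an illustration under strong assumptions: both stages use a tiny hand-made word list (POSITIVE, NEGATIVE), whereas a real system would more likely use trained classifiers for each stage. All function names are hypothetical.

```python
import re

# Tiny hand-made lexicons, assumed for illustration only.
POSITIVE = {"fabulous", "great", "interesting", "love"}
NEGATIVE = {"horrible", "boring", "dud", "hate"}


def tokens(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_subjective(sentence):
    """Stage 1: keep a sentence only if it contains a sentiment word."""
    return any(t in POSITIVE or t in NEGATIVE for t in tokens(sentence))


def sentence_score(sentence):
    """Count positive words minus negative words in one sentence."""
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens(sentence))


def document_polarity(document):
    """Stage 2: average the scores of the subjective sentences only."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    subjective = [s for s in sentences if is_subjective(s)]
    if not subjective:
        return "neutral", 0.0
    score = sum(sentence_score(s) for s in subjective) / len(subjective)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score
```

Because the average is taken over the subjective sentences only, factual sentences such as "The movie stars Mr. X" no longer dilute the document score toward zero, which is the noise reduction the slide refers to.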
Level of Analysis
We can inquire about sentiment at various linguistic levels:
Words – objective, positive, negative, neutral
Clauses – “going out of my mind”
Sentences – possibly multiple sentiments
Documents
Words
Adjectives
– objective: red, metallic
– positive: honest, important, mature, large, patient
– negative: harmful, hypocritical, inefficient
– subjective (but not positive or negative): curious, peculiar, odd, likely, probable
Verbs
– positive: praise, love
– negative: blame, criticize
– subjective: predict
Nouns
– positive: pleasure, enjoyment
– negative: pain, criticism
– subjective: prediction, feeling
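The word lists above can be read as a small sentiment lexicon. The Python sketch below stores exactly those words in a nested dictionary and looks a word up by part of speech and category. The structure and the function name lookup are assumptions for illustration; real lexicons are far larger and often attach scores rather than labels.

```python
# Words copied from the lists above; the nested layout is an assumption.
LEXICON = {
    "adjective": {
        "objective": {"red", "metallic"},
        "positive": {"honest", "important", "mature", "large", "patient"},
        "negative": {"harmful", "hypocritical", "inefficient"},
        "subjective": {"curious", "peculiar", "odd", "likely", "probable"},
    },
    "verb": {
        "positive": {"praise", "love"},
        "negative": {"blame", "criticize"},
        "subjective": {"predict"},
    },
    "noun": {
        "positive": {"pleasure", "enjoyment"},
        "negative": {"pain", "criticism"},
        "subjective": {"prediction", "feeling"},
    },
}


def lookup(word):
    """Return every (part_of_speech, category) pair that lists the word."""
    word = word.lower()
    return [
        (pos, category)
        for pos, categories in LEXICON.items()
        for category, words in categories.items()
        if word in words
    ]
```

A word missing from the lexicon returns an empty list, which a caller would typically treat as "no known sentiment".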
Clauses
Might flip word sentiment
– “not good at all”
– “not all good”
Might express sentiment not in any word
– “convinced my watch had stopped”
– “got up and walked out”
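A common first approximation for the "flip" case is a negation window: a negator reverses the polarity of sentiment words in the next few tokens. The Python sketch below shows that rule only. NEGATORS, POLARITY and the window size of 3 are arbitrary assumptions, and the rule is deliberately naive.

```python
import re

# Illustrative word lists and window size; all three are assumptions.
NEGATORS = {"not", "never", "no"}
POLARITY = {"good": 1, "great": 1, "bad": -1, "boring": -1}
WINDOW = 3  # how many tokens after a negator have their polarity flipped


def clause_score(clause):
    """Sum word polarities, flipping those that follow a negator closely."""
    score = 0
    remaining = 0  # tokens still inside the current negation window
    for token in re.findall(r"[a-z']+", clause.lower()):
        if token in NEGATORS:
            remaining = WINDOW
            continue
        value = POLARITY.get(token, 0)
        if remaining > 0:
            value = -value
            remaining -= 1
        score += value
    return score
```

This rule scores "not good at all" and "not all good" identically (both -1) even though the second is milder, and it scores "got up and walked out" as 0 because no single word carries the sentiment. Both limitations are the ones the slide points at.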
Some Problems
Which features to use? Words (unigrams), Phrases/n-grams, Sentences
How to interpret features for sentiment detection? Bag of words (IR), Annotated lexicons (WordNet, SentiWordNet), Syntactic patterns, Paragraph structure
Must consider other features due to…
– Subtlety of sentiment expression
• irony
• expression of sentiment using neutral words
– Domain/context dependence
• words/phrases can mean different things in different contexts and domains
– Effect of syntax on semantics
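As an illustration of the first question (which features to use), the Python sketch below builds a bag of unigrams and bigrams from a sentence. The function names and the simple regular-expression tokenizer are assumptions; the point is only that a bigram such as "not good" is kept as a feature that the unigrams alone would lose.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count every n-gram of length 1..max_n in the text (order is discarded)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(ngrams(tokens, n))
    return bag


features = bag_of_ngrams("The movie was not good")
```

Here features contains both the unigram "good" and the bigram "not good", so a downstream classifier can weight them differently.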
Some Application Examples
Review classification: Is a review positive or negative toward the movie?
Product review mining: What features of the ThinkPad T43 do customers like/dislike?
Tracking sentiments toward topics over time: Is anger ratcheting up or cooling down?
Prediction (election outcomes, market trends): Will Obama or the Republican candidate win?
Etcetera
Aster Data position for Text Analysis
Four stages: Data Acquisition → Pre-Processing → Mining → Analytic Applications
– Data Acquisition: gather text from relevant sources (web crawling, document scanning, news feeds, Twitter feeds, …)
– Pre-Processing: perform the processing required to transform and store text data and information (stemming, parsing, indexing, entity extraction, …)
– Mining: apply data mining techniques to derive insights about stored information (statistical analysis, classification, natural language processing, …)
– Analytic Applications: leverage insights from text mining to provide information that improves decisions and processes (sentiment analysis, document management, fraud analysis, e-discovery, ...)
Diagram labels: Third-Party Tools Fit, Aster Data Fit
Aster Data Value: Massive scalability of text storage and processing, Functions for text processing, Flexibility to develop diverse custom analytics and incorporate third-party libraries