www.decideo.fr/bruley Extract from various presentations: Bing Liu, Aditya Joshi, Aster Data … Sentiment Analysis [email protected] January 2012

Big Data & Sentiment Analysis

DESCRIPTION

A large amount of information is available in textual form in databases or online sources, and for many enterprise functions (marketing, maintenance, finance, etc.) it represents a huge opportunity to improve business knowledge. Sentiment analysis, or opinion mining, refers to the application of language processing to identify and extract subjective information from source materials. Generally speaking, sentiment analysis aims to determine the attitude of a speaker or writer with respect to some topic, or the overall contextual polarity of a document.

Citation preview

Page 1: Big Data & Sentiment Analysis

www.decideo.fr/bruley

Extract from various presentations: Bing Liu, Aditya Joshi, Aster Data …

Sentiment Analysis

[email protected]

January 2012

Page 2: Big Data & Sentiment Analysis

Introduction

Two main types of textual information: Facts and Opinions

Most current text information processing methods work with factual information (e.g., web search, text mining)

Sentiment analysis, or opinion mining, is the computational study of opinions (sentiments, emotions) expressed in text

Why opinion mining now? Mainly because of the Web, which holds huge volumes of opinionated text.

Page 3: Big Data & Sentiment Analysis

What is Sentiment Analysis?

Identify the orientation of opinion in a piece of text (blogs, user comments, review websites, community websites, …); in other words, determine whether a sentence or a document expresses positive, negative, or neutral sentiment towards some object.

The movie was fabulous! [ Sentimental ]

The movie stars Mr. X [ Factual ]

The movie was horrible! [ Sentimental ]

Page 4: Big Data & Sentiment Analysis

SA at different levels

Word-level SA

– "The movie was interesting and fabulous": interesting, fabulous

– "The movie was very boring": boring

Sentence-level SA

– "The police stopped corruption": police (subj.) stopped (verb) corruption (obj.)

– "His last movie was great."

Document-level SA

– "His last movie was great and interesting. This one’s a dud."

Page 5: Big Data & Sentiment Analysis

What is an Opinion?

An opinion is a quintuple:

(o_j, f_jk, so_ijkl, h_i, t_l)

where

– o_j is a target object

– f_jk is a feature of the object o_j

– so_ijkl is the sentiment value of the opinion of the opinion holder h_i on feature f_jk of object o_j at time t_l

– h_i is an opinion holder

– t_l is the time when the opinion is expressed
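
As a rough illustration, the quintuple can be held in a small record type. This is only a sketch: the field names, the +1/0/-1 encoding of the sentiment value, and the example holder and date are assumptions made here for illustration, not part of the original definition.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (object, feature, sentiment, holder, time)."""
    target: str     # the target object
    feature: str    # a feature of the target object
    sentiment: int  # sentiment value; assumed encoding: +1, 0 or -1
    holder: str     # the opinion holder
    time: date      # when the opinion was expressed


# Hypothetical example: a reviewer likes the keyboard of a laptop.
example = Opinion(
    target="ThinkPad T43",
    feature="keyboard",
    sentiment=+1,
    holder="reviewer_42",  # made-up identifier
    time=date(2012, 1, 15),  # made-up date
)
```

Making the record immutable (`frozen=True`) is a design choice for this sketch: a mined opinion is a fact about a text and should not change after extraction.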

Page 6: Big Data & Sentiment Analysis

Objective: structure the unstructured

Objective: Given an opinionated document,

– Discover all quintuples (o_j, f_jk, so_ijkl, h_i, t_l),

• i.e., mine the five corresponding pieces of information in each quintuple

With the quintuples,

– Unstructured Text → Structured Data

• Traditional data and visualization tools can be used to slice, dice and visualize the results in all kinds of ways

• Enable qualitative and quantitative analysis

With all quintuples, all kinds of analyses become possible
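
Once opinions are available as quintuples, ordinary aggregation applies. The sketch below counts positive, negative and neutral opinions per (object, feature) pair; the sample rows, names and the sign-based sentiment encoding are all hypothetical.

```python
from collections import Counter

# Hypothetical quintuples: (object, feature, sentiment, holder, time).
quintuples = [
    ("camera", "battery", +1, "alice", "2012-01-03"),
    ("camera", "battery", -1, "bob", "2012-01-05"),
    ("camera", "lens", +1, "carol", "2012-01-05"),
    ("camera", "battery", +1, "dave", "2012-01-09"),
]


def summarize(rows):
    """Count opinions by polarity for each (object, feature) pair."""
    summary = {}
    for obj, feature, sentiment, _holder, _time in rows:
        counts = summary.setdefault((obj, feature), Counter())
        if sentiment > 0:
            counts["positive"] += 1
        elif sentiment < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return summary


summary = summarize(quintuples)
for key, counts in sorted(summary.items()):
    print(key, dict(counts))
```

The same rows could equally be grouped by holder or by time period, which is the "slice and dice" view of the structured data.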

Page 7: Big Data & Sentiment Analysis

SA is not Just ONE Problem

Track direct opinions:

– document

– sentence

– feature level

Compare opinions: different types of comparisons

Detect opinion spam: fake reviews

Page 8: Big Data & Sentiment Analysis

Polarity Classifier

First eliminate objective sentences, then use remaining sentences to classify document polarity (reduce noise)
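
A minimal sketch of this two-stage idea follows. It assumes tiny hand-made word lists (`POSITIVE`, `NEGATIVE`) and a naive sentence splitter, all invented for illustration; a real system would typically use a trained subjectivity classifier for the first stage.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"fabulous", "great", "interesting"}
NEGATIVE = {"horrible", "boring", "dud"}


def tokens(sentence):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_subjective(sentence):
    """Stage 1: a sentence counts as subjective if it has a lexicon word."""
    return any(t in POSITIVE or t in NEGATIVE for t in tokens(sentence))


def sentence_score(sentence):
    """Positive word count minus negative word count."""
    toks = tokens(sentence)
    return sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)


def document_polarity(document):
    """Stage 2: classify the document from its subjective sentences only."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    subjective = [s for s in sentences if is_subjective(s)]
    score = sum(sentence_score(s) for s in subjective)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(document_polarity("The movie stars Mr. X. The movie was fabulous!"))
```

In this toy version both stages share one lexicon, so the filter cannot change the final score; the noise reduction only matters once the second stage uses features that objective sentences could also trigger.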

Page 9: Big Data & Sentiment Analysis

Level of Analysis

We can inquire about sentiment at various linguistic levels:

Words – objective, positive, negative, neutral

Clauses – “going out of my mind”

Sentences – possibly multiple sentiments

Documents

Page 10: Big Data & Sentiment Analysis

Words

Adjectives

– objective: red, metallic

– positive: honest, important, mature, large, patient

– negative: harmful, hypocritical, inefficient

– subjective (but not positive or negative): curious, peculiar, odd, likely, probable

Verbs

– positive: praise, love

– negative: blame, criticize

– subjective: predict

Nouns

– positive: pleasure, enjoyment

– negative: pain, criticism

– subjective: prediction, feeling

Page 11: Big Data & Sentiment Analysis

Clauses

Might flip word sentiment

– “not good at all”

– “not all good”

Might express sentiment not in any word

– “convinced my watch had stopped”

– “got up and walked out”
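
The flipping effect can be sketched with a simple negation window: a sentiment word is inverted when a negator appears within a few tokens before it. The word lists and the window size of 3 are assumptions for this example, and the rule is deliberately naive.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "boring"}
NEGATORS = {"not", "never", "no"}


def clause_polarity(clause, window=3):
    """Sum word polarities, flipping a word if a negator precedes it."""
    toks = re.findall(r"[a-z']+", clause.lower())
    score = 0
    for i, tok in enumerate(toks):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1 or 0
        if polarity == 0:
            continue
        preceding = toks[max(0, i - window):i]
        if any(p in NEGATORS for p in preceding):
            polarity = -polarity
        score += polarity
    return score


for clause in ["good", "not good at all", "not all good", "not bad"]:
    print(clause, "->", clause_polarity(clause))
```

Note that this rule scores "not good at all" and "not all good" identically, although the second is only a partial negation, and it cannot see clauses such as "got up and walked out" that contain no sentiment word at all. Both cases are exactly why clause-level analysis needs more than a word list.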

Page 12: Big Data & Sentiment Analysis

Some Problems

Which features to use? Words (unigrams), Phrases/n-grams, Sentences

How to interpret features for sentiment detection? Bag of words (IR), Annotated lexicons (WordNet, SentiWordNet), Syntactic patterns, Paragraph structure

Must consider other features due to…

– Subtlety of sentiment expression

• irony

• expression of sentiment using neutral words

– Domain/context dependence

• words/phrases can mean different things in different contexts and domains

– Effect of syntax on semantics
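
To make the feature question concrete, the sketch below builds a bag of unigram and bigram features from a text. Whitespace tokenization and the function names are simplifying assumptions for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_features(text, max_n=2):
    """Count every n-gram of the text for n = 1 .. max_n."""
    tokens = text.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


features = bag_of_features("not good at all")
print(sorted(features))
```

With unigrams alone, "good" looks like a positive cue; the bigram "not good" keeps the negation attached to it, which is one reason phrases are often considered alongside single words.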

Page 13: Big Data & Sentiment Analysis

Some Applications Examples

Review classification: Is a review positive or negative toward the movie?

Product review mining: What features of the ThinkPad T43 do customers like/dislike?

Tracking sentiments toward topics over time: Is anger ratcheting up or cooling down?

Prediction (election outcomes, market trends): Will Obama or the Republican candidate win?

Etcetera

Page 14: Big Data & Sentiment Analysis

Aster Data position for Text Analysis

Data Acquisition: Gather text from relevant sources (web crawling, document scanning, news feeds, Twitter feeds, …)

Pre-Processing: Perform processing required to transform and store text data and information (stemming, parsing, indexing, entity extraction, …)

Mining: Apply data mining techniques to derive insights about stored information (statistical analysis, classification, natural language processing, …)

Analytic Applications: Leverage insights from text mining to provide information that improves decisions and processes (sentiment analysis, document management, fraud analysis, e-discovery, ...)

Third-Party Tools Fit

Aster Data Fit

Aster Data Value: Massive scalability of text storage and processing, Functions for text processing, Flexibility to develop diverse custom analytics and incorporate third-party libraries