Semantic Patterns for Sentiment Analysis of Twitter. Hassan Saif, Yulan He, Miriam Fernandez and Harith Alani. The 13th International Semantic Web Conference (ISWC 2014), May 2014

DESCRIPTION

Most existing approaches to Twitter sentiment analysis assume that sentiment is explicitly expressed through affective words. Nevertheless, sentiment is often implicitly expressed via latent semantic relations, patterns and dependencies among words in tweets. In this paper, we propose a novel approach that automatically captures patterns of words of similar contextual semantics and sentiment in tweets. Unlike previous work on sentiment pattern extraction, our proposed approach does not rely on external and fixed sets of syntactical templates/patterns, nor requires deep analyses of the syntactic structure of sentences in tweets. We evaluate our approach with tweet- and entity-level sentiment analysis tasks by using the extracted semantic patterns as classification features in both tasks. We use 9 Twitter datasets in our evaluation and compare the performance of our patterns against 6 state-of-the-art baselines. Results show that our patterns consistently outperform all other baselines on all datasets by 2.19% at the tweet-level and 7.5% at the entity-level in average F-measure.


Page 1: Semantic Patterns for Sentiment Analysis of Twitter

Semantic Patterns for Sentiment Analysis of Twitter

Hassan Saif, Yulan He, Miriam Fernandez and Harith Alani

The 13th International Semantic Web Conference (ISWC 2014), May 2014

Page 2: Semantic Patterns for Sentiment Analysis of Twitter

Outline

o Sentiment Analysis

o Traditional Sentiment Analysis

o Pattern-based Sentiment Analysis

o Semantic Sentiment Patterns

o Evaluation

o Results

o Conclusion

Page 3: Semantic Patterns for Sentiment Analysis of Twitter

“Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text”

Sentiment Analysis

Nooo, it is very humid :( (Opinion)

The weather is great today :) (Opinion)

I think its almost 30 degrees today (Fact)

Page 4: Semantic Patterns for Sentiment Analysis of Twitter

Traditional Sentiment Analysis

Example tweet: Just got my new iPhone 6, looks and feel great! :D

(1) The Lexicon-based Approach

Sentiment Lexicon: great, horrible, sad, pretty, down, wrong, beautiful, mistake, good

(2) The Machine Learning Approach

Training Features:

– Syntactic Features (letter n-grams, word n-grams, POS tags, etc.)

– Linguistic Features (synonyms, glosses, etc.)
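A minimal sketch of the lexicon-based approach applied to the example tweet on this slide. The slide lists the lexicon words without polarities, so the +1/-1 values below are assumptions, and `lexicon_sentiment` is a hypothetical helper name rather than anything from the talk.

```python
import re

# Assumed polarities for the words shown on the slide (+1 positive, -1 negative).
LEXICON = {
    "great": 1, "pretty": 1, "beautiful": 1, "good": 1,
    "horrible": -1, "sad": -1, "down": -1, "wrong": -1, "mistake": -1,
}

def lexicon_sentiment(tweet, lexicon=LEXICON):
    """Sum the prior polarity of every lexicon word found in the tweet."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("Just got my new iPhone 6, looks and feel great! :D"))
```

A scorer of this kind only sees words that are in the lexicon, so a tweet whose sentiment comes from word combinations rather than affective words is labelled neutral.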

Page 5: Semantic Patterns for Sentiment Analysis of Twitter

Traditional Sentiment Analysis

However, sentiment is often expressed via more subtle relations, patterns and dependencies among words in tweets:

Destroy (Negative) + Invading Germs (Negative Concept) = Positive Sentiment

Page 6: Semantic Patterns for Sentiment Analysis of Twitter

Pattern-based Sentiment Analysis

Syntactic Pattern Approaches

Semantic Pattern Approaches

Page 7: Semantic Patterns for Sentiment Analysis of Twitter

Syntactic Pattern Approaches

• Based on syntactic relations between words.

• Rely on predefined POS templates:

– <subject> passive-verb, e.g., <customer> was satisfied

– <subject> active-verb, e.g., <she> complained

• But, they are semantically weak! The template <subject> verb cold matches both:

– <beer> is cold

– <weather> is cold
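The weakness of a fixed template can be illustrated with a small sketch. A regular expression stands in here for the POS template `<subject> verb cold`; real systems of this kind use POS tagging, and the names `TEMPLATE` and `match_template` are hypothetical.

```python
import re

# Regex stand-in (an assumption) for the template "<subject> verb cold".
TEMPLATE = re.compile(r"^(?P<subject>\w+) (?:is|was) cold$", re.IGNORECASE)

def match_template(phrase):
    """Return the subject if the phrase fits the template, else None."""
    match = TEMPLATE.match(phrase.strip())
    return match.group("subject") if match else None

for phrase in ["beer is cold", "weather is cold"]:
    print(phrase, "->", match_template(phrase))
```

Both phrases match the same template, although the template itself says nothing about how the subject changes the sentiment of "cold".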

Page 8: Semantic Patterns for Sentiment Analysis of Twitter

Semantic Pattern Approaches

• Apply syntactic and semantic processing techniques

• Use external semantic resources (Ontologies, Semantic Networks, etc.)

• Capture the conceptual semantic relations in text that implicitly convey sentiment

– Happy birthday (Positive)

– Invading Germs (Negative)

Page 9: Semantic Patterns for Sentiment Analysis of Twitter

Syntactic & Semantic Pattern Approaches are not tailored to Twitter

Page 10: Semantic Patterns for Sentiment Analysis of Twitter

Syntactic & Semantic Pattern Approaches

Are designed to function on formal text, that is:

1. Long enough

2. Well-structured

3. Formal sentences

Page 11: Semantic Patterns for Sentiment Analysis of Twitter

Tweets are often:

• Short!

• Noisy and messy

• Informal, with ill-structured sentences

Page 12: Semantic Patterns for Sentiment Analysis of Twitter

We Propose...

A pattern-based approach that:

• Works on Twitter

• Does not rely on the syntactic structures of tweets or pre-defined syntactic templates

• Does not rely on semantic knowledge sources

• Automatically extracts patterns from the contextual semantic and sentiment similarities of words in tweets

Page 13: Semantic Patterns for Sentiment Analysis of Twitter

Contextual Semantics and Sentiment

• Contextual semantics refers to the semantics inferred from words' co-occurrences in tweets.

“Words that occur in similar context tend to have similar meaning” (Wittgenstein, 1953)

[Figure: Contextual Semantics. “Trojan Horse” appears in two contexts: one with Threat, Hack, Code, Malware, Program, Dangerous and Harm; the other with Greek Tale, History, Class, Wooden and Troy.]
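Collecting the context of a term from co-occurrences can be sketched as follows. This is only an illustration of the idea: whitespace tokenisation, the single-token target, the helper name `context_terms`, and the sample tweets are all assumptions.

```python
from collections import Counter

def context_terms(tweets, target):
    """Count the words that co-occur with `target` in the same tweet."""
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        if target in tokens:
            counts.update(token for token in tokens if token != target)
    return counts

# Made-up tweets for illustration only.
tweets = [
    "trojan malware threat",
    "trojan threat code",
    "greek history class",
]
print(context_terms(tweets, "trojan").most_common())
```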

Page 14: Semantic Patterns for Sentiment Analysis of Twitter

Contextual Semantic Sentiment Patterns

“Some words in different tweets tend to come with similar contextual semantics and sentiment, forming therefore specific clusters or patterns.”

[Figure: context terms of “Trojan Horse” and “Spyware”: Threat, Hack, Code, Malware, Program, Dangerous, Harm.]

Page 15: Semantic Patterns for Sentiment Analysis of Twitter

Contextual Semantic Sentiment Patterns

[Figure: Negative Contextual Pattern. “Trojan Horse” and “Spyware” share the context terms Threat, Hack, Code, Malware, Program, Dangerous and Harm. C_Semantics(Worms), C_Semantics(Adware) and C_Semantics(Time bombs) follow the same pattern.]

Page 16: Semantic Patterns for Sentiment Analysis of Twitter

Pattern Extraction

1. Syntactical Preprocessing of tweets

2. Capturing the Contextual Semantics and Sentiment of words

3. Extracting Semantic Sentiment Patterns

Pipeline

Page 17: Semantic Patterns for Sentiment Analysis of Twitter

(1) Syntactical Preprocessing

• Replace all URL links with the term “URL”

• Remove all non-ASCII and non-English characters

• Revert words that contain repeated letters to their original English form

– “maaadddd” will be converted to “mad” after processing.
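The three preprocessing steps might be sketched as below. The slide does not say how the original English form is recovered, so the vocabulary lookup, the regular expressions, and the names `collapse_repeats` and `preprocess` are assumptions made for this sketch.

```python
import re

URL_RE = re.compile(r"(https?://\S+|www\.\S+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated three or more times

def collapse_repeats(word, vocabulary):
    """Shrink runs of 3+ repeated letters, preferring a form found in `vocabulary`."""
    if word.lower() in vocabulary:
        return word
    doubled = REPEAT_RE.sub(r"\1\1", word)   # "coool" -> "cool"
    if doubled.lower() in vocabulary:
        return doubled
    return REPEAT_RE.sub(r"\1", word)        # "maaadddd" -> "mad"

def preprocess(tweet, vocabulary):
    text = URL_RE.sub("URL", tweet)                      # step 1: URLs -> "URL"
    text = text.encode("ascii", "ignore").decode("ascii")  # step 2: drop non-ASCII
    return " ".join(collapse_repeats(t, vocabulary) for t in text.split())

print(preprocess("I am maaadddd at http://t.co/abc", {"mad", "cool"}))
```

Dropping non-ASCII characters is only a rough proxy for removing non-English text; a fuller treatment would need language identification.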

Page 18: Semantic Patterns for Sentiment Analysis of Twitter

(2) Capturing Contextual Semantics & Sentiment

The SentiCircle Approach

[Figure: SentiCircle of “Trojan Horse”. The term m (“Trojan Horse”) is at the centre and each context term C_i (e.g., C_1 = “Dangerous”) is a point (x_i, y_i) with radius r_i and angle θ_i. Both axes run from -1 to +1; regions are labelled Very Positive, Positive, Negative, Very Negative and Neutral Region. Context terms shown: threat, destroy, malicious, attack, easily, discover, useful, fix, dangerous.]

r_i = TDOC(C_i) (Degree of Correlation)

θ_i = Prior_Sentiment(C_i) * π (Prior Sentiment)

x_i = r_i * cos(θ_i)

y_i = r_i * sin(θ_i)

Overall Contextual Sentiment (Senti-Median)

Saif, H., Fernandez, M., He, Y. and Alani, H. (2014) SentiCircles for Contextual and Conceptual Semantic Sentiment Analysis of Twitter, ESWC2014
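The polar-to-Cartesian mapping on this slide can be sketched in a few lines. The TDOC and prior-sentiment values below are invented for illustration. The slide only names the Senti-Median, so computing it as a geometric median with Weiszfeld iterations is an assumption of this sketch, as are the function names.

```python
import math

def senticircle_points(context_terms):
    """Map {term: (tdoc, prior_sentiment)} to {term: (x, y)}.

    r = TDOC, theta = prior_sentiment * pi, x = r*cos(theta), y = r*sin(theta).
    """
    points = {}
    for term, (tdoc, prior) in context_terms.items():
        theta = prior * math.pi
        points[term] = (tdoc * math.cos(theta), tdoc * math.sin(theta))
    return points

def geometric_median(points, iterations=200, eps=1e-9):
    """Weiszfeld iterations (assumed here as the way to obtain a Senti-Median)."""
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    for _ in range(iterations):
        weights = [1.0 / max(math.hypot(x - mx, y - my), eps) for x, y in points]
        total = sum(weights)
        nx = sum(w * x for w, (x, _) in zip(weights, points)) / total
        ny = sum(w * y for w, (_, y) in zip(weights, points)) / total
        if math.hypot(nx - mx, ny - my) < eps:
            return nx, ny
        mx, my = nx, ny
    return mx, my

# Hypothetical (TDOC, prior sentiment) values for three context terms.
circle = senticircle_points({
    "dangerous": (1.0, -0.5),
    "threat": (0.8, -0.4),
    "useful": (0.3, 0.5),
})
print(geometric_median(list(circle.values())))
```

With this mapping, a context term with a negative prior sentiment lands below the X axis and one with a positive prior lands above it, while the radius reflects how strongly it correlates with the term.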

Page 19: Semantic Patterns for Sentiment Analysis of Twitter

(3) Extracting Semantic Sentiment Patterns

Patterns are extracted by finding clusters of similar SentiCircles:

(1) Represent each SentiCircle (e.g., of iPod, Spyware, Oprah, Obama) as a feature vector of Geometry, Density and Dispersion features.

(2) Cluster the SentiCircles' feature vectors with K-means to obtain the SS-Patterns.
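The clustering step can be illustrated with a small, dependency-free K-means. The slide does not define the Geometry, Density and Dispersion features, so the three-dimensional vectors below are made-up placeholders, and the deterministic farthest-first seeding is a choice made for this sketch rather than something stated in the talk.

```python
import math

def kmeans(vectors, k, iterations=100):
    """Plain Lloyd's algorithm with farthest-first seeding; returns cluster labels."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Placeholder feature vectors (hypothetical values, one per SentiCircle).
terms = ["iPod", "Oprah", "Spyware", "Worms"]
features = [(0.8, 0.7, 0.1), (0.7, 0.8, 0.2), (-0.8, -0.7, 0.1), (-0.7, -0.8, 0.2)]
for term, label in zip(terms, kmeans(features, k=2)):
    print(term, "-> pattern", label)
```

Each resulting cluster plays the role of one SS-Pattern: a group of terms whose SentiCircles look alike.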

Page 20: Semantic Patterns for Sentiment Analysis of Twitter

Evaluation

SS-Patterns

Training Sentiment Classifiers

Entity-level Sentiment Analysis

Tweet-level Sentiment Analysis

Detect the sentiment (Positive, Negative, Neutral) of named entities extracted from tweets

Detect the overall sentiment (Positive, Negative) of a tweet.
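One way to use extracted patterns as classification features is to add a pattern feature for every tweet word that belongs to a pattern. The sketch below assumes that reading: the feature naming scheme, the word-to-pattern mapping and the helper `tweet_features` are all hypothetical, and the talk does not specify how the features are encoded.

```python
from collections import Counter

def tweet_features(tweet, word_to_pattern):
    """Unigram counts plus a count for each pattern the tweet's words belong to."""
    tokens = tweet.lower().split()
    features = Counter(f"w={token}" for token in tokens)
    features.update(
        f"pattern={word_to_pattern[token]}" for token in tokens if token in word_to_pattern
    )
    return features

# Hypothetical mapping from words to pattern ids.
word_to_pattern = {"spyware": 3, "worms": 3, "ipod": 7}
print(tweet_features("Spyware and worms everywhere", word_to_pattern))
```

A feature dictionary of this shape could then be fed to any standard classifier.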

Page 21: Semantic Patterns for Sentiment Analysis of Twitter

Evaluation Setup (1)

Sentiment Classifiers

– Tweet-Level

• Maximum Entropy (MaxEnt)

• Naïve Bayes (NB)

– Entity-Level

• MLE Classifier

Page 22: Semantic Patterns for Sentiment Analysis of Twitter

Evaluation Setup (2)

Datasets

– Tweet-Level: 9 Twitter datasets

– Entity-Level: 58 manually annotated named entities

Page 23: Semantic Patterns for Sentiment Analysis of Twitter

Evaluation Setup (3)

Baseline Features

Syntactic Features

– Unigrams: Individual unique terms in tweets

– POS Features: Words' part-of-speech tags

– Twitter Features: Usernames, emoticons, hashtags, etc.

– Lexicon Features: Prior sentiment of words in a given sentiment lexicon (e.g., great -> positive, destroy -> negative)

Semantic Features

– LDA-Topic Features: Topics generated by LDA

– Semantic Concepts: Semantic concepts of named entities in tweets (e.g., Obama -> Person, London -> City)

Page 24: Semantic Patterns for Sentiment Analysis of Twitter

Results

Page 25: Semantic Patterns for Sentiment Analysis of Twitter

Tweet-Level Sentiment Analysis (1)

The baseline model is a sentiment classifier trained from word unigram features.

• MaxEnt outperforms NB in average Accuracy and F1-measure

Page 26: Semantic Patterns for Sentiment Analysis of Twitter

Tweet-Level Sentiment Analysis (2)

Win/Loss in Accuracy and F-measure of using different features for sentiment classification on all nine datasets.

Page 27: Semantic Patterns for Sentiment Analysis of Twitter

Entity-Level Sentiment Analysis

[Figure: bar chart of Accuracy and F1 (axis from 55.00 to 67.00) for Unigrams, LDA-Topics, Semantic Concepts and SS-Patterns.]

SS-Patterns produce 6.31% higher accuracy and 7.5% higher F-measure than the other features

Page 28: Semantic Patterns for Sentiment Analysis of Twitter

Within-Pattern Sentiment Consistency

• Refers to the percentage of words having similar sentiment within a given pattern.

• Strongly consistent patterns are those whose terms have similar sentiment.

Page 29: Semantic Patterns for Sentiment Analysis of Twitter

Within-Pattern Sentiment Consistency

• STS-Entity Dataset:

– 58 entities, 14 SS-Patterns

– Consistency(Pattern5) = 50% (Poorly Consistent)

– Consistency(Pattern12) = 88.89% (Strongly Consistent)

– Average Sentiment Consistency (14 SS-Patterns) = 88%
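A within-pattern consistency score can be computed as the share of a pattern's words that carry the most common sentiment label. Treating "similar sentiment" as "same majority label" is an assumption of this sketch, and the label lists below are hypothetical compositions chosen only to show the arithmetic (8 of 9 gives 88.89%, 1 of 2 gives 50%).

```python
from collections import Counter

def pattern_consistency(word_sentiments):
    """Percentage of words in a pattern that share the majority sentiment label."""
    counts = Counter(word_sentiments)
    return 100.0 * max(counts.values()) / len(word_sentiments)

# Hypothetical label lists, not the actual contents of Pattern12 or Pattern5.
print(round(pattern_consistency(["negative"] * 8 + ["positive"]), 2))
print(round(pattern_consistency(["positive", "negative"]), 2))
```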

Page 30: Semantic Patterns for Sentiment Analysis of Twitter

Conclusion

• We proposed a new approach for automatically extracting patterns from the contextual semantic and sentiment similarities of words in tweets.

• Used patterns as features in tweet- and entity-level sentiment classification tasks

• SS-Patterns consistently outperformed the syntactic and semantic feature types for entity- and tweet-level sentiment analysis

• Conducted a quantitative analysis on a sample of our extracted SS-Patterns and showed that our patterns are strongly consistent with the sentiment of the words within them.

Page 31: Semantic Patterns for Sentiment Analysis of Twitter

Thank You

Email: [email protected]: hrsaif

Website: tweenator.com