Determining Negation Scope and Strength in Sentiment Analysis

SMC 2011

Paul van Iterson, Erasmus School of Economics, Erasmus University Rotterdam

paulvaniterson@gmail.com

Bas Heerschop, Erasmus School of Economics, Erasmus University Rotterdam

basheerschop@gmail.com

Flavius Frasincar, Erasmus School of Economics, Erasmus University Rotterdam

frasincar@ese.eur.nl

Uzay Kaymak, Erasmus School of Economics, Erasmus University Rotterdam

kaymak@ese.eur.nl

October 12, 2011

Alexander Hogenboom, Erasmus School of Economics, Erasmus University Rotterdam

hogenboom@ese.eur.nl

Outline

• Introduction

• Sentiment Analysis

• Accounting for Negation

• Framework

• Evaluation

• Conclusions

• Future Work

Introduction (1)

• Need for information monitoring tools for tracking sentiment in today’s complex systems

• The Web offers an overwhelming amount of textual data, containing traces of sentiment

Introduction (2)

• Existing sentiment analysis approaches are based on word frequencies

• There is a trend toward involving various other aspects of content in automated sentiment analysis

• Accounting for negation seems promising, but how to model the influence of negation keywords on the conveyed sentiment?

Sentiment Analysis

• Sentiment analysis is typically focused on determining the polarity of natural language texts

• Applications in summarizing reviews, determining a general mood (consumer confidence, politics)

• Common approach to sentiment analysis:
  – Creation of lexicon (list of words and their sentiment scores)
  – Utilization of lexicon to determine sentiment in text

• Sentiment analysis approaches differ on several distinguishing characteristics, e.g.,
  – Analysis level and focus
  – Handling of syntactic variations, amplification, and negation
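
The common lexicon-based approach above can be illustrated with a minimal sketch. The lexicon entries and scores below are made-up assumptions for illustration, not values from any real sentiment lexicon, and the function names are hypothetical.

```python
# Toy lexicon: words and scores are illustrative assumptions,
# not values taken from a real sentiment lexicon.
TOY_LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.6, "awful": -0.9}


def text_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all words; unknown words count as 0."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return sum(lexicon.get(w, 0.0) for w in words)


def polarity(text, lexicon=TOY_LEXICON):
    """Map the summed score to a polarity label."""
    return "positive" if text_score(text, lexicon) >= 0 else "negative"
```

Calling `polarity("A great movie!")` looks up only "great" in the toy lexicon and so yields a positive label. Negation is deliberately ignored at this stage.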

Accounting for Negation (1)

• Common approach: exploitation of negation keywords

• Challenge lies in finding the negation scope

• Sophisticated approaches involve complex rules, compositional semantics, or machine learning

• Many existing sentiment analysis frameworks use rather simple conceptualizations of negation scope

Accounting for Negation (2)

• Let us consider the following positive sentence:
  – Example: Luckily, the smelly poo did not leave awfully nasty stains on my favorite shoes!

• Rest of Sentence (RoS) negates the remainder of the sentence:
  – Following: all words after the negation keyword
  – Around: all words on both sides of the negation keyword

• First Sentiment-Carrying Word (FSW) negates only the first sentiment-carrying word:
  – Following: searching after the negation keyword
  – Around: searching on both sides of the negation keyword
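
As a rough illustration of RoS and FSW, the sketch below computes both scopes for the example sentence. The set of sentiment-carrying words is a hypothetical stand-in for a real lexicon, the function names are made up, and reading "Around" for FSW as the first sentiment-carrying word on each side of the keyword is an assumption.

```python
import re

EXAMPLE = ("Luckily, the smelly poo did not leave awfully "
           "nasty stains on my favorite shoes!")

# Hypothetical set of sentiment-carrying words (stand-in for a lexicon).
SENTIMENT_WORDS = {"luckily", "smelly", "awfully", "nasty", "favorite"}


def tokenize(sentence):
    """Split a sentence into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z']+", sentence)


def rest_of_sentence(tokens, neg, around=False):
    """RoS: indices of all words after (and optionally before) index `neg`."""
    preceding = list(range(neg)) if around else []
    return preceding + list(range(neg + 1, len(tokens)))


def first_sentiment_word(tokens, neg, around=False):
    """FSW: index of the nearest sentiment-carrying word after `neg`.

    With around=True the nearest such word before `neg` is included
    as well (an assumed reading of the "Around" variant).
    """
    def first_hit(indices):
        for i in indices:
            if tokens[i].lower() in SENTIMENT_WORDS:
                return [i]
        return []

    before = first_hit(range(neg - 1, -1, -1)) if around else []
    return before + first_hit(range(neg + 1, len(tokens)))


tokens = tokenize(EXAMPLE)
neg = tokens.index("not")  # position of the negation keyword
```

Under these assumptions, `first_sentiment_word(tokens, neg)` selects only "awfully", while `rest_of_sentence(tokens, neg)` selects every word from "leave" through "shoes".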

Accounting for Negation (3)

• Let us consider the following positive sentence:
  – Example: Luckily, the smelly poo did not leave awfully nasty stains on my favorite shoes!

• Next Non-Adverb (NNA) negates the first word that is not an adverb:
  – Following: searching after the negation keyword

• Fixed Window Length (FWL) negates a fixed number of words:
  – Following (3): a window of 3 words after the negation keyword
  – Around (3): a window of 3 words around the negation keyword
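
A minimal sketch of NNA and FWL on the example sentence follows. The Penn Treebank-style tags are assigned by hand for illustration (not produced by the tagger used in the evaluation), the function names are hypothetical, and reading "Around (3)" as 3 words on each side of the keyword is an assumption.

```python
# Hand-assigned POS tags for the example sentence (illustrative only).
TAGGED = [
    ("Luckily", "RB"), ("the", "DT"), ("smelly", "JJ"), ("poo", "NN"),
    ("did", "VBD"), ("not", "RB"), ("leave", "VB"), ("awfully", "RB"),
    ("nasty", "JJ"), ("stains", "NNS"), ("on", "IN"), ("my", "PRP$"),
    ("favorite", "JJ"), ("shoes", "NNS"),
]
NEG = [word for word, _ in TAGGED].index("not")


def next_non_adverb(tagged, neg):
    """NNA: index of the first word after `neg` whose tag is not an adverb tag."""
    for i in range(neg + 1, len(tagged)):
        if not tagged[i][1].startswith("RB"):
            return [i]
    return []


def fixed_window(tagged, neg, size, around=False):
    """FWL: indices of up to `size` words after `neg`.

    With around=True, up to `size` words before `neg` are included too
    (assumed interpretation of the "Around" variant).
    """
    following = list(range(neg + 1, min(len(tagged), neg + 1 + size)))
    preceding = list(range(max(0, neg - size), neg)) if around else []
    return preceding + following


def words(indices):
    """Helper that maps indices back to the words of the example."""
    return [TAGGED[i][0] for i in indices]
```

With these hand-assigned tags, `words(fixed_window(TAGGED, NEG, 3))` gives "leave", "awfully", "nasty", and `words(next_non_adverb(TAGGED, NEG))` gives "leave", the first non-adverb after the keyword.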

Framework (1)

• Lexicon-based sentence-level sentiment scoring by using SentiWordNet

• Optional support for sentiment negation

• Individual words are scored in the range [-1,1]

• Word scores are used to classify a sentence as positive (1) or negative (-1)

Framework (2)

• Score sentences in test corpus for their sentiment

• For an arbitrary sentence:
  – Retrieve all words (simple and compound)
  – Retrieve each word’s Part-Of-Speech (POS) and lemma
  – Disambiguate word senses (Lesk algorithm)
  – Retrieve words’ sentiment scores from lexicon
  – Negate sentiment scores of negated words, as determined by means of one of the considered approaches, by multiplying the scores with an inversion factor (typically negative)
  – Calculate sentence score as sum of words’ scores
  – Classify sentence as either positive (score ≥ 0) or negative (score < 0)
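
The last three steps (negation by an inversion factor, summation, and classification) can be sketched as follows. The word scores are made-up values for the earlier example sentence, not lexicon lookups, and the function names are hypothetical. POS tagging, lemmatization, and word sense disambiguation are assumed to have happened already.

```python
def score_sentence(word_scores, negated, inversion=-1.0):
    """Sum word scores, multiplying negated words by the inversion factor.

    word_scores: per-word scores in [-1, 1]
    negated: set of word indices inside the negation scope
    """
    total = 0.0
    for i, score in enumerate(word_scores):
        total += score * inversion if i in negated else score
    return total


def classify(score):
    """Positive (1) if score >= 0, negative (-1) otherwise."""
    return 1 if score >= 0 else -1


# Made-up scores for: Luckily the smelly poo did not leave awfully
# nasty stains on my favorite shoes (illustrative assumptions only).
SCORES = [0.5, 0.0, -0.4, -0.2, 0.0, 0.0, 0.0,
          -0.6, -0.7, -0.1, 0.0, 0.0, 0.5, 0.0]

baseline = classify(score_sentence(SCORES, negated=set()))
with_negation = classify(score_sentence(SCORES, negated={6, 7}))
```

With these toy scores, the sentence is classified as negative when negation is ignored and as positive when the two words following "not" (indices 6 and 7) are inverted.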

Evaluation (1)

• Implementation in C#, Microsoft SQL Server database, SharpNLP-based POS tagger, WordNet.Net API for lemmatization and word sense disambiguation, SentiWordNet sentiment lexicon

• Corpus of 930 positive and 1,355 negative manually classified English movie review sentences (60% training set, 40% test set)

Evaluation (2)

• Baseline: sentiment without accounting for negation

• Alternatives: negation scoping with RoS, FSW, NNA, and FWL (window sizes ranging from 1 to 4)

• Inversion factor of the best alternative optimized to a value in the range [-2, 0] (hill climbing on the training set)
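
The slides do not specify the hill-climbing procedure, so the following is only one possible sketch: a one-dimensional search over [-2, 0] that halves its step when neither neighbor improves. The start value, step sizes, and the `evaluate` callback (which would return training-set accuracy for a given inversion factor) are all assumptions.

```python
def hill_climb(evaluate, start=-1.0, step=0.1,
               low=-2.0, high=0.0, min_step=0.001):
    """Maximize evaluate(x) for x in [low, high] by local search.

    Moves to a neighbor (x - step or x + step) whenever it scores
    strictly higher; halves the step when no neighbor improves.
    """
    best, best_value = start, evaluate(start)
    while step >= min_step:
        improved = False
        for candidate in (best - step, best + step):
            if low <= candidate <= high:
                value = evaluate(candidate)
                if value > best_value:
                    best, best_value = candidate, value
                    improved = True
        if not improved:
            step /= 2  # refine the search around the current best
    return best


# Toy objective with a single peak; a stand-in for training accuracy.
def toy_objective(x):
    return -(x + 1.27) ** 2


best_factor = hill_climb(toy_objective)
```

On a real corpus the objective is a step function of the inversion factor, so a local search like this may stop on a plateau. Restarting from several start values is a common remedy.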

Evaluation (3)

Method    Direction  Window  Inversion  Accuracy  Macro F1
Baseline  -          -       -1.00      49.9%     49.4%
FSW       Following  -       -1.00      52.3%     52.0%
FWL       Following  1       -1.00      51.3%     51.0%
FWL       Following  2       -1.00      52.7%     52.4%
FWL       Following  3       -1.00      52.0%     51.7%
FWL       Following  4       -1.00      52.2%     52.0%
FWL       Following  2       -1.27      53.5%     53.3%
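
For reference, the two reported measures can be computed as in the sketch below. It assumes the usual definition of macro-level F1 as the unweighted mean of the per-class F1 scores. The slides do not spell out the definition used, and the function names are hypothetical.

```python
def accuracy(gold, predicted):
    """Fraction of sentences whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def f1_for_class(gold, predicted, label):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted, labels=(1, -1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(gold, predicted, l) for l in labels) / len(labels)
```

Because the corpus has more negative than positive sentences, macro-level F1 weights both classes equally, whereas accuracy favors the majority class.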

Conclusions

• Recent sentiment analysis methods consider more and more aspects of content other than word frequencies

• Our corpus-based evaluation of several common negation scoping methods shows that only some perform significantly better than our baseline of not accounting for negation

• FWL with a window of 2 words following a negation keyword yields the highest relative increase in accuracy (5.5%) and macro-level F1 (6.2%) compared to the baseline

• An optimized inversion factor of -1.27 rather than -1 yields a relative accuracy increase of 7.0% and a relative macro-level F1 increase of 8.0% compared to the baseline

Future Work

• Let the negation scope detection method depend on the position of a negation keyword

• Deeper understanding of semantics in order to cope with, e.g., context-dependent interpretations

• Distinct sentiment inversion factors for negated positive and negative words

Questions?

Alexander Hogenboom, Erasmus School of Economics, Erasmus University Rotterdam, P.O. Box 1738, NL-3000 DR Rotterdam, the Netherlands

hogenboom@ese.eur.nl
