Sentiment, News and the Polarity Problem Leslie Barrett www.lbtechconsulting.com April 13, 2010

DESCRIPTION

Although sentiment analysis has a strong history of success on 
customer feedback and certain blogs and editorials, accuracy results are 
mixed for data in the absence of an opinion holder. In particular, news data 
poses some unique challenges to accuracy for sentiment analysis due to the 
blending of what I will call "objective" polarity with opinion-based 
polarity. How is (document-level) sentiment to be determined, for example, 
in an article about the Haitian earthquake that discusses humanitarian aid? 
Similarly, an article about Bernard Madoff’s jail sentence shows a highly 
negative “objective polarity” somehow mitigated by a subsequent action. And 
how can we tease an author’s opinion apart from the semantics of objective 
polarity where both exist in news data? Author opinion (often referred to as 
“bias”) in news data is subtle by design. This talk discusses the grounding 
of the concept of "sentiment" within the greater context of the Semantics of 
Opposition.

Sentiment, News and the Polarity Problem

Leslie Barrett

www.lbtechconsulting.com

April 13, 2010

Sentiment and Opinion

• Are sentiment and opinion the same?

• Are feelings the same as beliefs?

• Sentiment can be applied to opinion but not the other way around (Kim and Hovy 2004)

• The question is – should it apply to anything else? Does it make sense in narrative, exposition, news data?

• How much text should we apply it to?

Sources

• Sentiment analysis has been applied where opinion is the norm – blogs and Tweets

• It has also been applied where opinion is designed to be subtle, if expressed at all – news data

• So maybe news data is never really objective, or else maybe sentiment is really used as simple polarity – separating the world into human ideas of positive and negative “buckets” blind to objectivity

Polarity

• Polarity is the stuff through which sentiment is measured

• Sentiment is usually considered to have the “poles” positive and negative

• These are most often “translated” into “good” and “bad”

• Sentiment analysis is really considered useful for telling us what is “good” and “bad” in our information stream

The “Machine”

• So the sentiment analysis machine takes in some text and tells us whether that text says something “good” or “bad”.

• OK... but before we unveil our machine, we need to ask some important but often overlooked questions:

- what text is going in?

- where does “good” stop and “bad” begin?

- what is the text “about”?

Why do we need Sentiment Analysis, Beavis?

So we’ll know what we’re thinking!

Let’s Try Feeding the Machine News Data!

• News Headlines sound like a pretty straightforward text type to apply sentiment to, given what we’ve just said.

• Even though news is supposed to be “objective”, headlines sell papers and often can be dramatic

• Keywords like “crash”, “downturn” and “disaster” are abundant and strong sentiment indicators.

- but are headlines enough?

- we may want document-level sentiment for news

- does it matter what the news is “about”?
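
The keyword idea above can be sketched as a naive lexicon count. This is only an illustration, not the speaker's system: the negative words come from the slide, while the positive words, the function name `headline_polarity`, and the sample headlines are assumptions made up for the example.

```python
import re

# Negative keywords are taken from the slide; the positive set is a
# hypothetical addition so the scorer has two poles to work with.
NEGATIVE = {"crash", "downturn", "disaster"}
POSITIVE = {"rally", "recovery", "gain"}

def headline_polarity(headline: str) -> int:
    """Return (positive keyword hits) minus (negative keyword hits)."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives

# Invented headlines, for illustration only.
print(headline_polarity("Markets crash as downturn deepens"))  # -2
print(headline_polarity("Stocks rally on recovery hopes"))     # 2
print(headline_polarity("Aid reaches disaster zone"))          # -1
```

The last headline scores negative even though it reports relief arriving, which is exactly why headlines alone, and keywords alone, may not be enough.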

Some “real” headlines

• Short-lived

• Coup

• Disappoints

• Bears

Beware of Headlines in Financial News

• Financial news especially is really a genre unto itself

• Its polarity perspective is skewed constantly by pundit “benchmarking”

• Beating bad expectations is better than a good quarter that falls short – in pundit opinion
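
One way to picture pundit “benchmarking” is to score a result against its expectation rather than on its absolute value. The sketch below assumes a hypothetical function `surprise_polarity` and invented per-share figures. It is not a description of how any analyst or product actually computes this.

```python
def surprise_polarity(actual: float, expected: float) -> str:
    """Label a reported figure relative to what was expected of it."""
    if actual > expected:
        return "positive"
    if actual < expected:
        return "negative"
    return "neutral"

# Invented figures: a loss that is smaller than feared reads as good news...
print(surprise_polarity(actual=-0.30, expected=-0.50))  # positive
# ...while a profit that falls short of the forecast reads as bad news.
print(surprise_polarity(actual=1.00, expected=1.20))    # negative
```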

Can Sentiment Analysis “beat expectations”?

• All kinds of negatives here but the document-level sentiment should be positive – that’s how an analyst would see it

• So if you skew to this, what about other news?

Objectively “bad” Events Happen

• Some events don’t require an opinion holder

• They simply have a generally agreed upon negative or positive polarity

• And we need to get them right because they affect other events (e.g. crop yields)

When Bad Things Happen to Positive Sentiment

• But objectively bad events have their own problems, even in the absence of “expectations”.

• The problem with polarity measures outside of the presence of an opinion holder is topic drift

• An editorial or blog is likely to stick to one sentiment, but bad events can have the dreaded “silver lining”

Disaster+Relief Can Spell Trouble

• Despite some strong negative polarity indicators like “traumatized”, “disaster” and “tsunami” this article has an overall positive theme

Don’t Quote Me!

• Another problem in news data is “opinion blend”

• Often you have an author’s opinion but other opinions that may differ – directly or indirectly cited

• Or an author using quotes to showcase two different opinions

• Coverage of a “debate” for example can get very difficult for even a human to judge

Attribution vs. Quoting

• The author clearly does not believe the positive topic of the article

• But Clinton believes it

• So is this positive sentiment about Clinton?

Pundits vs. Authors vs. Topics

• How can I be sure that “bad news” about my client is about my client?

– Make sure the named entity in question is a topic of the document

– So-called “document mates” don’t matter

• Do author names matter? Should I extract them?

– Yes! Over time, if you classify by author name against other entities, you might detect bias

– Do the same for known “pundits” on a topic... the same result may emerge
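
Classifying by author name against entities over time might look like the aggregation below. It is a minimal sketch under stated assumptions: the scores are presumed to come from some upstream document-level classifier in the range -1 to 1, each record is presumed to be kept only when the entity is a topic of the document, and the function name and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_sentiment_by_author(records):
    """Group (author, entity, score) records and average the scores.

    Returns {(author, entity): (mean_score, number_of_documents)}.
    """
    buckets = defaultdict(list)
    for author, entity, score in records:
        buckets[(author, entity)].append(score)
    return {key: (mean(scores), len(scores)) for key, scores in buckets.items()}

# Invented records, for illustration only.
records = [
    ("Author A", "Acme Corp", -0.5),
    ("Author A", "Acme Corp", -0.25),
    ("Author B", "Acme Corp", 0.25),
]
summary = mean_sentiment_by_author(records)
for key in sorted(summary):
    print(key, summary[key])
```

A persistent gap between one author's average and everyone else's on the same entity is only a hint of bias, and the document count matters: an average over two articles says very little.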

What’s it all About?

• Some data just tends to be multi-thematic or non-thematic

• In particular, market and financial reports, which often make their way into news feeds, tend to be this way.

• It is very hard to get a reasonable sentiment reading on either type of document.

SEC Reports: too big, too many sections

• There is the Management Discussion, which can have appropriate sentiment scores

• But there are so many other sections, no single theme

• Many sections have boilerplate, such as the accounting review

Scraping

• Your data is only as good as your news feed.

• Sometimes a site will deliver excess content that creeps into the text field of a feed

• That content could be an ad or even another article, skewing the sentiment reading for the expected article and hurting topic detection too.

Field Overlap from a Typical News Page

What to Do?

• Stop doing Sentiment Analysis on news data?

• NO!

• News data is very valuable for reputation management

• Also can be valuable for investment firms *if* you can tease out the jargon and pundit-speak

• Document-level is still OK!

Best Practices

• Good topic detection

- see what’s closely aligned with a theme and eliminate non-thematic or weak-thematic documents

• Good feed maintenance

- you or your feed provider need to spot check for scraping problems

Tricks & Tips

• Data extraction for problem documents

– If document sections are identified with tags, use them (this is true for SEC reports) and extract the “good” data (see Pang and Lee 2004 on extracting document portions)

– Write regular expression libraries to find quoted and cited material. Remove it or use it separately

• Topic drift is harder, but...

– You can extract the first n paragraphs. The main topical material in news is generally in the top 25% of the document

– Secondary topics don’t carry the same weight
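
Two of these tips can be sketched in a few lines: pulling quoted material out of the running text, and keeping only the lead portion of an article. The pattern, the function names `split_quoted` and `lead_paragraphs`, and the sample text are assumptions for illustration. The pattern is deliberately simple and does not handle nested quotes, single-quoted speech, or indirect citation.

```python
import math
import re

# Assumed, simplified pattern: text inside straight or curly double quotes.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')

def split_quoted(text: str):
    """Return (text with quoted spans removed, list of quoted spans)."""
    quotes = QUOTED.findall(text)
    remainder = QUOTED.sub(" ", text)
    remainder = re.sub(r"\s+", " ", remainder).strip()
    return remainder, quotes

def lead_paragraphs(document: str, fraction: float = 0.25):
    """Keep the first `fraction` of blank-line-separated paragraphs (at least one)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    keep = max(1, math.ceil(len(paragraphs) * fraction))
    return paragraphs[:keep]

# Invented sample text.
text = 'The plan is "a total failure," critics said. Officials disagreed.'
remainder, quotes = split_quoted(text)
print(remainder)  # The plan is critics said. Officials disagreed.
print(quotes)     # ['"a total failure,"']

article = "\n\n".join(f"Paragraph {i}" for i in range(1, 9))
print(lead_paragraphs(article))  # ['Paragraph 1', 'Paragraph 2']
```

Keeping the quoted spans rather than discarding them leaves the option of scoring them separately, attributed to the speaker instead of the author.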

What’s Next for Polarity?

• Future directions for news-based sentiment analysis are based on looking outside of the Positive and Negative poles

• Think about all the “opposites” in the world

– Sweet/sour

– Cold/hot

– Inside/outside

– Wet/dry

– Hard/soft

Leverage the Semantics of Opposition

• There are many types of opposition to study and they can be used in different ways

– Complementary opposites (male, female)

– Reversatives (backwards, forwards)

– Scalar opposites (tall, short)

• There is a good deal of semantic research that has yet to be leveraged for opinion analysis and classification (Mettinger, Pustejovsky, Kennedy, Miller, inter alia)

Opposites and Opinions

• Let’s think of some opinions that fit into poles not definable in terms of “positive” and “negative”

– Conservative vs. Liberal

– Government Expansion vs. Privatization

• Can these positions be detected automatically?

• ………..

Appendix/Bibliography

• Kim, Soo-Min and Eduard Hovy. 2004. Determining the Sentiment of Opinions. Proceedings of COLING-04, pp. 1367-1373. Geneva, Switzerland.

• Pustejovsky, James. 2000. Events and the Semantics of Opposition. In C. Tenny and J. Pustejovsky (eds.), Events as Grammatical Objects. CSLI Publications.

• Mettinger, Arthur. 1994. Aspects of Semantic Opposition in English. Clarendon Press, Oxford.

• Pang, Bo and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of the Association for Computational Linguistics.