Presentation on polarity classification on Twitter using a new ensemble method @ WISE 2014, Thessaloniki, Greece
15th International Conference on Web Information Systems Engineering (WISE 2014), Thessaloniki, 14 October 2014
An Ensemble Model for Cross-Domain Polarity Classification on Twitter
Adam Tsakalidis, Symeon Papadopoulos, Yiannis Kompatsiaris
Information Technologies Institute (ITI), Centre for Research & Technology Hellas (CERTH)
#wiseconf2014
Overview
• The Problem
• Existing Approaches
• Proposed Ensemble Model
• Experimental Study
• Summary – Future Work
Polarity Classification
motivation & definition
Polarity Classification - Problem
Determining whether a piece of text (e.g. a tweet) conveys a positive or negative sentiment about a topic or entity of interest (a person, party, brand, etc.)
Polarity Classification - Applications
• Monitoring sentiment about a brand can help a company take appropriate measures when negative sentiment increases (e.g. apology, assistance, compensation).
• Aggregating over many tweets could help form an overall impression of the public opinion about the topic/entity of interest.
The Cross-Domain Challenge
• Depending on the domain (and sometimes even the topic) of interest, the vocabulary and phrasing of sentiment may vary wildly.
• Most sentiment detection approaches require manually created ground truth from the same domain to achieve high performance, which is expensive!
• We propose a framework that leverages an ensemble of sentiment detectors to tackle the cross-domain challenge.
Related Work
Background: Sentiment Analysis
Two main problems: subjectivity & polarity
[Taxonomy figure: Text → Representation (Unigrams, Bigrams, POS, Other) → Learning (Supervised: NB, SVM; Unsupervised; SSL; Hybrid-Ensemble) → Sentiment]
Hybrid Models
Barbosa and Feng, COLING 2010
• Combines sentiment labels from three independent sources
• Considers the quality of individual sources
• Considers the different bias of individual sources

Gonçalves et al., COSN 2013
• First evaluates 8 different methods with respect to coverage and agreement
• Proposes a combined method, which merges the individual method outputs based on their performance (first in terms of coverage and then in terms of accuracy) on an independent dataset
Ensemble Model for Cross-Domain Polarity Classification
approach description
Method Overview
[Pipeline figure: a Tweet is mapped to four representations (TBR, FBR, CR, LBR), each feeding a classifier (CL_TBR, CL_FBR, CL_CR, CL_LBR); the classifier outputs are combined into a hybrid classifier (HC) and a lexicon-based classifier (LC), which are merged in a final combination step (COMB). Stages: representation → learning → ensembling]
Representation
• TBR
– binary n-grams
– n-grams with tf
– n-grams with tf-idf
– n = 1, 2, 3 → 9 representations
• FBR
• CR
– TBR + POS tagging
• LBR
– SentiWordNet (5D): sum of scores for Nouns, Verbs, Adjectives, Adverbs; sum of keywords (#pos - #neg)
– Opinion Lexicon: sum of keywords (#pos - #neg)

Note: TBR, FBR and CR features with tf < 2 were removed!
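As a rough illustration of the n-gram and lexicon features above, the sketch below builds binary / tf / tf-idf vectors and an opinion-lexicon score in pure Python. The function names, tokenization, and exact tf-idf weighting are illustrative assumptions, not the implementation used in the paper; the tf < 2 pruning is mirrored via `min_tf`.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_representations(tweets, n=1, min_tf=2):
    """Binary / tf / tf-idf vectors over n-grams, dropping n-grams
    whose total corpus frequency is below min_tf."""
    docs = [ngrams(t.lower().split(), n) for t in tweets]
    corpus_tf = Counter(g for d in docs for g in d)
    vocab = {g for g, c in corpus_tf.items() if c >= min_tf}
    df = Counter(g for d in docs for g in set(d) & vocab)  # document frequency
    N = len(docs)
    reps = []
    for d in docs:
        tf = Counter(g for g in d if g in vocab)
        reps.append({
            "binary": {g: 1 for g in tf},
            "tf": dict(tf),
            "tfidf": {g: c * math.log(N / df[g]) for g, c in tf.items()},
        })
    return sorted(vocab), reps

def lexicon_score(tokens, pos_lex, neg_lex):
    """Opinion-lexicon feature: #pos - #neg keyword matches."""
    return sum(t in pos_lex for t in tokens) - sum(t in neg_lex for t in tokens)
```

Running all three n values over the three weighting schemes yields the 9 TBR variants mentioned above.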
Ensemble Learning
• Combine two types of classifiers: HC and LC
• Hybrid classifier (HC) [domain-dependent]
– a weighted vote over the individual classifiers, where i: tweet, c: class (pos/neg), r: TBR, FBR, LR, w_r: weights defined based on accuracy on the ED dataset, p_r: individual classifier output
• Lexicon-based classifier (LC) [domain-independent]
• If the outputs of the two classifiers agree, assign the agreed class to the tweet.
• If the outputs disagree, use a domain-adapted classifier (TBR) trained on the agreed tweets.
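The combination logic can be sketched as follows. This is a hedged reconstruction of the agreement scheme described above (the hybrid part is roughly a weighted vote, HC(i) = argmax_c Σ_r w_r · p_r(i, c)); `hybrid_classify`, `ensemble_label` and `train_tbr` are illustrative names, not the authors' code.

```python
def hybrid_classify(tweet, classifiers, weights):
    """HC: weighted vote of representation-specific classifiers;
    the weights w_r reflect each classifier's accuracy on ED."""
    scores = {"pos": 0.0, "neg": 0.0}
    for r, clf in classifiers.items():
        scores[clf(tweet)] += weights[r]
    return max(scores, key=scores.get)

def ensemble_label(tweets, hc, lc, train_tbr):
    """Agreement-based domain adaptation: agreed tweets keep the agreed
    label and form a new training set; a TBR classifier trained on them
    then labels the disagreements."""
    agreed, disagreed = [], []
    for t in tweets:
        a, b = hc(t), lc(t)
        (agreed if a == b else disagreed).append((t, a))
    adapted = train_tbr(agreed)  # hypothetical training routine
    labels = {t: lab for t, lab in agreed}
    labels.update({t: adapted(t) for t, _ in disagreed})
    return labels
```

The key design point is that the agreed tweets act as automatically labelled, in-domain training data, so no manual annotation from the target domain is needed.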
Practical aspects
• For the ensemble model to work well, the two types of classifiers need to be sufficiently accurate individually, since we trust their outputs enough to use them as a new training set!
• In a real-time scenario, we need to allocate some time for the domain adaptation process to take place. This should be defined on a per-case basis.
[Timeline figure: an adaptation phase followed by ensemble model operation]
Experimental Study
Datasets
• Emoticons Dataset (ED)
– 250K tweets collected over two days (mid-March 2014) containing happy/sad emoticons (retweets removed) [English only!]
– used for feature extraction, preliminary development
• Development set (DEV)
– 10K tweets (equally balanced) collected in the same way as ED
• Stanford Twitter dataset (STS)
– STS-1: 177 pos, 184 neg
– STS-2: 108 pos, 75 neg
• Obama Healthcare Reform (HCR)
– dev-set: 135 pos, 328 neg
– test-set: 116 pos, 383 neg
• Obama-McCain Debate (OMD)
– subset (agreement > 50%): 707 pos, 1190 neg
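The ED and DEV sets rely on distant supervision from emoticons. A minimal sketch of such a noisy labeller is shown below; the emoticon sets and the whitespace tokenization are illustrative assumptions, not the collection rules actually used.

```python
def emoticon_label(tweet):
    """Distant supervision: label a tweet by its emoticons (noisy labels).
    Tweets with both kinds or no emoticon are discarded (None)."""
    happy = {":)", ":-)", ":D", "=)"}   # illustrative positive set
    sad = {":(", ":-(", "=("}           # illustrative negative set
    tokens = set(tweet.split())         # assumes space-separated emoticons
    has_pos, has_neg = bool(tokens & happy), bool(tokens & sad)
    if has_pos == has_neg:              # both or neither: ambiguous
        return None
    return "pos" if has_pos else "neg"
```

The resulting labels are noisy, which is exactly why the role of classifier agreement (discussed later) matters when such data is used for training.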
Model building
• Performance of TBR variants, role of POS (on 33% of ED)
• Performance of LR features
• Comparative performance of all classifiers (on DEV)
Results
[Results table. Annotations:]
• over-optimistic estimate of real-world performance, since 90% of the set is used for training
• more realistic estimate of real-world performance; still far better than other out-of-domain methods
Role of classifier agreement
• As expected, accuracy on the disagreed tweets is lower
– results are produced by a classifier trained on “noisy” data
– these are the more challenging cases
• A similar situation arises when training on automatically collected data based on emoticons
Conclusions & Future Work
Conclusions & Future Work
Conclusion
• An ensemble model in tandem with domain adaptation is an effective strategy for tackling the cross-domain challenge!

Future Work
• Incorporate the “neutral” class and evaluate
• Test with more datasets
• Check with additional classifiers as input
Thank you!
Questions?
https://github.com/socialsensor/sentiment-analysis
http://socialsensor.eu/
[email protected] / [email protected]@warwick.ac.uk
Key References (1/3)
• Alina Andreevskaia and Sabine Bergler. When specialists and generalists work together: Overcoming domain dependence in sentiment tagging. In ACL, pages 290-298, 2008
• Luciano Barbosa and Junlan Feng. Robust sentiment detection on twitter from biased and noisy data. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 36-44. ACL, 2010
• Adam Bermingham and Alan F Smeaton. Classifying sentiment in microblogs: is brevity an advantage? In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1833-1836. ACM, 2010
• Albert Bifet and Eibe Frank. Sentiment knowledge discovery in twitter streaming data. In Discovery Science, pages 1-15. Springer, 2010
• Johan Bollen, Huina Mao, and Alberto Pepe. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. In ICWSM, 2011
• Samuel Brody and Nicholas Diakopoulos. Cooooooooooooooollllllllllllll !!!!!!!!!!!!!!: using word lengthening to detect sentiment in microblogs. In Proceedings of the Conference on EMNLP, pages 562-570. ACL, 2011.
Key References (2/3)
• Andrea Esuli and Fabrizio Sebastiani. Sentiwordnet: A publicly available lexical resource for opinion mining. In Proceedings of LREC, vol. 6, pages 417-422, 2006
• Alec Go, Richa Bhayani, and Lei Huang. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, pages 1-12, 2009
• Pollyanna Gonçalves, Matheus Araújo, Fabrício Benevenuto, and Meeyoung Cha. Comparing and combining sentiment analysis methods. In Proceedings of the first ACM COSN, pages 27-38. ACM, 2013
• Minqing Hu and Bing Liu. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168-177. ACM, 2004
• Long Jiang, Mo Yu, Ming Zhou, Xiaohua Liu, and Tiejun Zhao. Target-dependent twitter sentiment classification. In Proceedings of the 49th Annual Meeting of the ACL: HLT-Vol. 1, pages 151-160. ACL, 2011
• Jonathon Read. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In Proceedings of the ACL Student Research Workshop, pages 43-48. ACL, 2005
• Hassan Saif, Yulan He, and Harith Alani. Semantic sentiment analysis of twitter. In The Semantic Web-ISWC 2012, pages 508-524. Springer, 2012
Key References (3/3)
• Emmanouil Schinas, Symeon Papadopoulos, Sotiris Diplaris, Yiannis Kompatsiaris, Yosi Mass, Jonathan Herzig, and Lazaros Boudakidis. Eventsense: Capturing the pulse of large-scale events by mining social media streams. In Proceedings of the 17th Panhellenic Conference on Informatics, pages 17-24. ACM, 2013
• Michael Speriosu, Nikita Sudan, Sid Upadhyay, and Jason Baldridge. Twitter polarity classification with label propagation over lexical links and the follower graph. In Proceedings of the First workshop on Unsupervised Learning in NLP, pages 53-63. ACL, 2011
• Kristina Toutanova, Dan Klein, Christopher D Manning, and Yoram Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the ACL on HLT-Vol. 1, pages 173-180. ACL, 2003
• Andranik Tumasjan, Timm Oliver Sprenger, Philipp Sandner, and Isabell Welpe. Predicting elections with twitter: What 140 characters reveal about political sentiment. In ICWSM 2010, 10:178-185, 2010
• Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the conference on HLT and EMNLP, pages 347-354. ACL, 2005
• Ley Zhang, Riddhiman Ghosh, Mohamed Dekhil, Meichun Hsu, and Bing Liu. Combining Lexicon-based and learning-based methods for twitter sentiment analysis. HP Laboratories, Technical Report HPL-2011, 89, 2011.