Presentation on polarity classification on Twitter using a new ensemble method @ WISE 2014, Thessaloniki, Greece
15th International Conference on Web Information Systems Engineering (WISE 2014), Thessaloniki, 14 October 2014
An Ensemble Model for Cross-Domain Polarity Classification on Twitter
Adam Tsakalidis, Symeon Papadopoulos, Yiannis Kompatsiaris
Information Technologies Institute (ITI), Centre for Research & Technology Hellas (CERTH)
#wiseconf2014
Overview
• The Problem
• Existing Approaches
• Proposed Ensemble Model
• Experimental Study
• Summary – Future Work
Polarity Classification
motivation & definition
Polarity Classification - Problem
Determining whether a piece of text (e.g. a tweet) conveys a positive or negative sentiment about a topic or entity of interest (a person, party, brand, etc.)
Polarity Classification - Applications
• Monitoring sentiment about a brand can help a company take appropriate measures when negative sentiment increases (e.g. apology, assistance, compensation).
• Aggregating over many tweets could help form an overall impression of the public opinion about the topic/entity of interest.
The Cross-Domain Challenge
• Depending on the domain (and sometimes even the topic) of interest, the vocabulary and phrasing of sentiment may vary wildly.
• Most sentiment detection approaches require manually created ground truth from the same domain to achieve high performance, which is expensive!
• We propose a framework that leverages an ensemble of sentiment detectors to tackle the cross-domain challenge.
Related Work
Background: Sentiment Analysis
Two main problems: subjectivity & polarity
[Taxonomy figure: Text → Representation (Unigrams, Bigrams, POS, Other) → Learning (Supervised: NB, SVM; Unsupervised; SSL; Hybrid-Ensemble) → Sentiment]
Hybrid Models
Barbosa and Feng, COLING 2010
• Combines sentiment labels from three independent sources
• Considers the quality of individual sources
• Considers the different bias of individual sources

Gonçalves et al., COSN 2013
• First evaluates 8 different methods with respect to coverage and agreement
• Proposes a combined method, which merges the individual method outputs based on their performance (first in terms of coverage and then in terms of accuracy) on an independent dataset
Ensemble Model for Cross-Domain Polarity Classification
approach description
Method Overview
[Pipeline figure: a Tweet is mapped to four representations (TBR, FBR, CR, LBR), each feeding a classifier (CL_TBR, CL_FBR, CL_CR, CL_LBR); the classifier outputs are combined into a hybrid classifier (HC) and a lexicon-based classifier (LC), which are merged in a final combination step (COMB). Stages: representation → learning → ensembling]
Representation
• TBR
– binary n-grams
– n-grams with tf
– n-grams with tf-idf
– n = 1, 2, 3 → 9 representations
• FBR
• CR
– TBR + POS tagging
• LBR
– SentiWordNet (5D): sum of scores for Nouns, Verbs, Adjectives, Adverbs; sum of keywords (#pos - #neg)
– Opinion Lexicon: sum of keywords (#pos - #neg)

Note: TBR, FBR and CR features with tf < 2 were removed!
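As a rough illustration of the n-gram and lexicon features above, the sketch below builds binary / tf / tf-idf vectors and an opinion-lexicon score in pure Python. The function names, tokenization, and exact tf-idf weighting are illustrative assumptions, not the implementation used in the paper; the tf < 2 pruning is mirrored via `min_tf`.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_representations(tweets, n=1, min_tf=2):
    """Binary / tf / tf-idf vectors over n-grams, dropping n-grams
    whose total corpus frequency is below min_tf."""
    docs = [ngrams(t.lower().split(), n) for t in tweets]
    corpus_tf = Counter(g for d in docs for g in d)
    vocab = {g for g, c in corpus_tf.items() if c >= min_tf}
    df = Counter(g for d in docs for g in set(d) & vocab)  # document frequency
    N = len(docs)
    reps = []
    for d in docs:
        tf = Counter(g for g in d if g in vocab)
        reps.append({
            "binary": {g: 1 for g in tf},
            "tf": dict(tf),
            "tfidf": {g: c * math.log(N / df[g]) for g, c in tf.items()},
        })
    return sorted(vocab), reps

def lexicon_score(tokens, pos_lex, neg_lex):
    """Opinion-lexicon feature: #pos - #neg keyword matches."""
    return sum(t in pos_lex for t in tokens) - sum(t in neg_lex for t in tokens)
```

Running all three n values over the three weighting schemes yields the 9 TBR variants mentioned above.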
Ensemble Learning
• Combine two types of classifiers: HC and LC
• Hybrid classifier (HC) [domain-dependent]
– a weighted vote over the individual classifiers, where i: tweet, c: class (pos/neg), r: TBR, FBR, LR, w_r: weights defined based on accuracy on the ED dataset, p_r: individual classifier output
• Lexicon-based classifier (LC) [domain-independent]
• If the outputs of the two classifiers agree, assign the agreed class to the tweet.
• If the outputs disagree, use a domain-adapted classifier (TBR) trained on the agreed tweets.
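The combination logic can be sketched as follows. This is a hedged reconstruction of the agreement scheme described above (the hybrid part is roughly a weighted vote, HC(i) = argmax_c Σ_r w_r · p_r(i, c)); `hybrid_classify`, `ensemble_label` and `train_tbr` are illustrative names, not the authors' code.

```python
def hybrid_classify(tweet, classifiers, weights):
    """HC: weighted vote of representation-specific classifiers;
    the weights w_r reflect each classifier's accuracy on ED."""
    scores = {"pos": 0.0, "neg": 0.0}
    for r, clf in classifiers.items():
        scores[clf(tweet)] += weights[r]
    return max(scores, key=scores.get)

def ensemble_label(tweets, hc, lc, train_tbr):
    """Agreement-based domain adaptation: agreed tweets keep the agreed
    label and form a new training set; a TBR classifier trained on them
    then labels the disagreements."""
    agreed, disagreed = [], []
    for t in tweets:
        a, b = hc(t), lc(t)
        (agreed if a == b else disagreed).append((t, a))
    adapted = train_tbr(agreed)  # hypothetical training routine
    labels = {t: lab for t, lab in agreed}
    labels.update({t: adapted(t) for t, _ in disagreed})
    return labels
```

The key design point is that the agreed tweets act as automatically labelled, in-domain training data, so no manual annotation from the target domain is needed.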
Practical aspects
• For the ensemble model to work well, the two types of classifiers need to be sufficiently accurate individually, since we trust their outputs enough to use them as a new training set!
• In a real-time scenario, we need to allocate some time for the domain adaptation process to take place. This should be defined on a per-case basis.
[Timeline figure: an adaptation phase followed by ensemble model operation]
Experimental Study
Datasets
• Emoticons Dataset (ED)
– 250K tweets collected over two days (mid-March 2014) containing happy/sad emoticons (retweets removed) [English only!]
– used for feature extraction, preliminary development
• Development set (DEV)
– 10K tweets (equally balanced) collected in the same way as ED
• Stanford Twitter dataset (STS)
– STS-1: 177 pos, 184 neg
– STS-2: 108 pos, 75 neg
• Obama Healthcare Reform (HCR)
– dev-set: 135 pos, 328 neg
– test-set: 116 pos, 383 neg
• Obama-McCain Debate (OMD)
– subset (agreement > 50%): 707 pos, 1190 neg
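The ED and DEV sets rely on distant supervision from emoticons. A minimal sketch of such a noisy labeller is shown below; the emoticon sets and the whitespace tokenization are illustrative assumptions, not the collection rules actually used.

```python
def emoticon_label(tweet):
    """Distant supervision: label a tweet by its emoticons (noisy labels).
    Tweets with both kinds or no emoticon are discarded (None)."""
    happy = {":)", ":-)", ":D", "=)"}   # illustrative positive set
    sad = {":(", ":-(", "=("}           # illustrative negative set
    tokens = set(tweet.split())         # assumes space-separated emoticons
    has_pos, has_neg = bool(tokens & happy), bool(tokens & sad)
    if has_pos == has_neg:              # both or neither: ambiguous
        return None
    return "pos" if has_pos else "neg"
```

The resulting labels are noisy, which is exactly why the role of classifier agreement (discussed later) matters when such data is used for training.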
Model building
• Performance of TBR variants, role of POS (on 33% of ED)
• Performance of LR features
• Comparative performance of all classifiers (on DEV)
Results
[Results table. Annotations:]
• over-optimistic estimate of real-world performance, since 90% of the set is used for training
• more realistic estimate of real-world performance; still far better than other out-of-domain methods
Role of classifier agreement
• As expected, accuracy on the disagreed tweets is lower
– results are produced by a classifier trained on “noisy” data
– these are the more challenging cases
• A similar situation arises when training on automatically collected data based on emoticons
Conclusions & Future Work
Conclusions & Future Work
Conclusion
• An ensemble model in tandem with domain adaptation is an effective strategy for tackling the cross-domain challenge!

Future Work
• Incorporate the “neutral” class and evaluate
• Test with more datasets
• Check with additional classifiers as input
Thank you!
Questions?
https://github.com/socialsensor/sentiment-analysis
http://socialsensor.eu/
[email protected] / [email protected]@warwick.ac.uk
Key References (1/3)
• Alina Andreevskaia and Sabine Bergler. When specialists and generalists work together: Overcoming domain dependence in sentiment tagging. In ACL, pages 290-298, 2008
• Luciano Barbosa and Junlan Feng. Robust sentiment detection on twitter from biased and noisy data. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 36-44. ACL, 2010
• Adam Bermingham and Alan F Smeaton. Classifying sentiment in microblogs: is brevity an advantage? In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1833-1836. ACM, 2010
• Albert Bifet and Eibe Frank. Sentiment knowledge discovery in twitter streaming data. In Discovery Science, pages 1-15. Springer, 2010
• Johan Bollen, Huina Mao, and Alberto Pepe. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. In ICWSM, 2011
• Samuel Brody and Nicholas Diakopoulos. Cooooooooooooooollllllllllllll !!!!!!!!!!!!!!: using word lengthening to detect sentiment in microblogs. In Proceedings of the Conference on EMNLP, pages 562-570. ACL, 2011.
Key References (2/3)
• Andrea Esuli and Fabrizio Sebastiani. Sentiwordnet: A publicly available lexical resource for opinion mining. In Proceedings of LREC, vol. 6, pages 417-422, 2006
• Alec Go, Richa Bhayani, and Lei Huang. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, pages 1-12, 2009
• Pollyanna Gonçalves, Matheus Araújo, Fabrício Benevenuto, and Meeyoung Cha. Comparing and combining sentiment analysis methods. In Proceedings of the first ACM COSN, pages 27-38. ACM, 2013
• Minqing Hu and Bing Liu. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168-177. ACM, 2004
• Long Jiang, Mo Yu, Ming Zhou, Xiaohua Liu, and Tiejun Zhao. Target-dependent twitter sentiment classification. In Proceedings of the 49th Annual Meeting of the ACL: HLT-Vol. 1, pages 151-160. ACL, 2011
• Jonathon Read. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In Proceedings of the ACL Student Research Workshop, pages 43-48. ACL, 2005
• Hassan Saif, Yulan He, and Harith Alani. Semantic sentiment analysis of twitter. In The Semantic Web-ISWC 2012, pages 508-524. Springer, 2012
Key References (3/3)
• Emmanouil Schinas, Symeon Papadopoulos, Sotiris Diplaris, Yiannis Kompatsiaris, Yosi Mass, Jonathan Herzig, and Lazaros Boudakidis. Eventsense: Capturing the pulse of large-scale events by mining social media streams. In Proceedings of the 17th Panhellenic Conference on Informatics, pages 17-24. ACM, 2013
• Michael Speriosu, Nikita Sudan, Sid Upadhyay, and Jason Baldridge. Twitter polarity classification with label propagation over lexical links and the follower graph. In Proceedings of the First workshop on Unsupervised Learning in NLP, pages 53-63. ACL, 2011
• Kristina Toutanova, Dan Klein, Christopher D Manning, and Yoram Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the ACL on HLT-Vol. 1, pages 173-180. ACL, 2003
• Andranik Tumasjan, Timm Oliver Sprenger, Philipp Sandner, and Isabell Welpe. Predicting elections with twitter: What 140 characters reveal about political sentiment. In ICWSM 2010, 10:178-185, 2010
• Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the conference on HLT and EMNLP, pages 347-354. ACL, 2005
• Ley Zhang, Riddhiman Ghosh, Mohamed Dekhil, Meichun Hsu, and Bing Liu. Combining Lexicon-based and learning-based methods for twitter sentiment analysis. HP Laboratories, Technical Report HPL-2011, 89, 2011.