22
UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features Pierpaolo Basile and Nicole Novielli [email protected], [email protected] Department of Computer Science University of Bari Aldo Moro (ITALY) EVALITA 2014 Pisa, 11th December 2014 Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. ’14 1 / 21

UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

Embed Size (px)

Citation preview

Page 1: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

UNIBA at EVALITA 2014-SENTIPOLC TaskPredicting tweet sentiment polarity combiningmicro-blogging lexicon and semantic features

Pierpaolo Basile and Nicole Novielli

pierpaolobasileunibait nicolenovielliunibaitDepartment of Computer Science

University of Bari Aldo Moro (ITALY)

EVALITA 2014Pisa 11th December 2014

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 1 21

Introduction The Task

Task Overview

EVALITA 2014 SENTIment POLarity Classification (SENTIPOLC)

Subtask A - Subjectivity Classification classify subjectivityobjectivity of the tweetrsquoscontent

La Yamaha e buona solo ad alzar polemiche E la Honda a crearne rarr subj

Domani sera a Palazzo Chigi incontro informale tra Mario Monti e parti sociali I leaderdi Cgil Cisl Uil e Ug http bit ly rNWoB0 rarr obj

Subtask B - Polarity Classification classify tweetrsquos polarity positive negative neutraland tweets expressing both positive and negative sentiment

Maroni La politica del governo Monti e sbagliata rarr neg

Coma iniziare un buon sabato Guardandosi South Park il film rarr pos

Pilot Task Irony Detection

governo monti piu equita o piu equitalia

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 2 21

The System Our Approach

Our Approach

Supervised approach relying on three kinds of features

bull Keywords and micro-blogging properties of tweets

bull Content representation in a distributional semantic model

bull Sentiment lexicon

Our contribution

bull Represent both the tweets and the polarity classes in the word space

bull Automatically develop a sentiment lexicon for the Italian starting formSentiWordNet

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 3 21

The System Features

Building a Sentiment Lexicon for Italian

Automatic development of a sentiment lexicon starting form SentiWordNet

Lexicon-based features

bull features based on sentiment scores of tokens

bull sentiment variation in the tweet

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 4 21

The System Features

Building the Word Space from Unlabeled Tweets

Homogenous representation of both tweets and polarity classes in the word space

Distributional features

bull tweet vectorminusrarrt as semantic feature in training our classifiers

bull similarity betweenminusrarrt and each prototype vector

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 5 21

The System Features

Building the Word Space from Unlabeled Tweets

Homogenous representation of both the tweets and the polarity classes in the word space

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 6 21

The System Features

Tweets downloading

Idea

Using four lexicons extracted from training data for each class (pos neg subj obj)

Probability assigned to each token

P(t|ci ) =t + 1

toti + |V | (1)

Rank tokens in descending order according to the Kullback-Leibler divergence (KLD)

KLD = P(t|cs) lowast logP(t|cs)P(t|co)

(2)

Top terms (50) in the rank for each class are used as seeds for downloading the samenumber of tweets for each lexicon

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 7 21

The System Features

Keywords and Micro-blogging features

Keywords

bull exploit tokens occurring in the tweets (unigrams)

bull replace the user mentions URLs and hashtags with three metatokens ldquo USER rdquoldquo URL rdquo and ldquo TAG rdquo

Micro-blogging

bull the use of upper case and character repetitions

bull positive and negative emoticons

bull informal expressions of laughters (ie sequences of ldquoahrdquo)

bull the presence of exclamation and interrogative marks

bull occurrences of adversative words disjunctive words conclusive words andexplicative words

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 8 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

System Setup

System Setup overview

bull Learning algorithm

bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117

bull Completely developed in JAVA

bull Weka library is adopted for learning

bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo

bull Word space download 10 million tweets using the Twitter Streaming API and

build the space using ldquoword2vecrdquo

bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms

overall)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 2: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

Introduction The Task

Task Overview

EVALITA 2014 SENTIment POLarity Classification (SENTIPOLC)

Subtask A - Subjectivity Classification classify subjectivityobjectivity of the tweetrsquoscontent

La Yamaha e buona solo ad alzar polemiche E la Honda a crearne rarr subj

Domani sera a Palazzo Chigi incontro informale tra Mario Monti e parti sociali I leaderdi Cgil Cisl Uil e Ug http bit ly rNWoB0 rarr obj

Subtask B - Polarity Classification classify tweetrsquos polarity positive negative neutraland tweets expressing both positive and negative sentiment

Maroni La politica del governo Monti e sbagliata rarr neg

Coma iniziare un buon sabato Guardandosi South Park il film rarr pos

Pilot Task Irony Detection

governo monti piu equita o piu equitalia

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 2 21

The System Our Approach

Our Approach

Supervised approach relying on three kinds of features

bull Keywords and micro-blogging properties of tweets

bull Content representation in a distributional semantic model

bull Sentiment lexicon

Our contribution

bull Represent both the tweets and the polarity classes in the word space

bull Automatically develop a sentiment lexicon for the Italian starting formSentiWordNet

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 3 21

The System Features

Building a Sentiment Lexicon for Italian

Automatic development of a sentiment lexicon starting form SentiWordNet

Lexicon-based features

bull features based on sentiment scores of tokens

bull sentiment variation in the tweet

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 4 21

The System Features

Building the Word Space from Unlabeled Tweets

Homogenous representation of both tweets and polarity classes in the word space

Distributional features

bull tweet vectorminusrarrt as semantic feature in training our classifiers

bull similarity betweenminusrarrt and each prototype vector

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 5 21

The System Features

Building the Word Space from Unlabeled Tweets

Homogenous representation of both the tweets and the polarity classes in the word space

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 6 21

The System Features

Tweets downloading

Idea

Using four lexicons extracted from training data for each class (pos neg subj obj)

Probability assigned to each token

P(t|ci ) =t + 1

toti + |V | (1)

Rank tokens in descending order according to the Kullback-Leibler divergence (KLD)

KLD = P(t|cs) lowast logP(t|cs)P(t|co)

(2)

Top terms (50) in the rank for each class are used as seeds for downloading the samenumber of tweets for each lexicon

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 7 21

The System Features

Keywords and Micro-blogging features

Keywords

bull exploit tokens occurring in the tweets (unigrams)

bull replace the user mentions URLs and hashtags with three metatokens ldquo USER rdquoldquo URL rdquo and ldquo TAG rdquo

Micro-blogging

bull the use of upper case and character repetitions

bull positive and negative emoticons

bull informal expressions of laughters (ie sequences of ldquoahrdquo)

bull the presence of exclamation and interrogative marks

bull occurrences of adversative words disjunctive words conclusive words andexplicative words

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 8 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

System Setup

System Setup overview

bull Learning algorithm

bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117

bull Completely developed in JAVA

bull Weka library is adopted for learning

bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo

bull Word space download 10 million tweets using the Twitter Streaming API and

build the space using ldquoword2vecrdquo

bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms

overall)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 3: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

The System Our Approach

Our Approach

Supervised approach relying on three kinds of features

bull Keywords and micro-blogging properties of tweets

bull Content representation in a distributional semantic model

bull Sentiment lexicon

Our contribution

bull Represent both the tweets and the polarity classes in the word space

bull Automatically develop a sentiment lexicon for the Italian starting formSentiWordNet

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 3 21

The System Features

Building a Sentiment Lexicon for Italian

Automatic development of a sentiment lexicon starting form SentiWordNet

Lexicon-based features

bull features based on sentiment scores of tokens

bull sentiment variation in the tweet

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 4 21

The System Features

Building the Word Space from Unlabeled Tweets

Homogenous representation of both tweets and polarity classes in the word space

Distributional features

bull tweet vectorminusrarrt as semantic feature in training our classifiers

bull similarity betweenminusrarrt and each prototype vector

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 5 21

The System Features

Building the Word Space from Unlabeled Tweets

Homogenous representation of both the tweets and the polarity classes in the word space

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 6 21

The System Features

Tweets downloading

Idea

Using four lexicons extracted from training data for each class (pos neg subj obj)

Probability assigned to each token

P(t|ci ) =t + 1

toti + |V | (1)

Rank tokens in descending order according to the Kullback-Leibler divergence (KLD)

KLD = P(t|cs) lowast logP(t|cs)P(t|co)

(2)

Top terms (50) in the rank for each class are used as seeds for downloading the samenumber of tweets for each lexicon

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 7 21

The System Features

Keywords and Micro-blogging features

Keywords

bull exploit tokens occurring in the tweets (unigrams)

bull replace the user mentions URLs and hashtags with three metatokens ldquo USER rdquoldquo URL rdquo and ldquo TAG rdquo

Micro-blogging

bull the use of upper case and character repetitions

bull positive and negative emoticons

bull informal expressions of laughters (ie sequences of ldquoahrdquo)

bull the presence of exclamation and interrogative marks

bull occurrences of adversative words disjunctive words conclusive words andexplicative words

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 8 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

System Setup

System Setup overview

bull Learning algorithm

bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117

bull Completely developed in JAVA

bull Weka library is adopted for learning

bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo

bull Word space download 10 million tweets using the Twitter Streaming API and

build the space using ldquoword2vecrdquo

bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms

overall)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 4: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

The System Features

Building a Sentiment Lexicon for Italian

Automatic development of a sentiment lexicon starting form SentiWordNet

Lexicon-based features

bull features based on sentiment scores of tokens

bull sentiment variation in the tweet

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 4 21

The System Features

Building the Word Space from Unlabeled Tweets

Homogenous representation of both tweets and polarity classes in the word space

Distributional features

bull tweet vectorminusrarrt as semantic feature in training our classifiers

bull similarity betweenminusrarrt and each prototype vector

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 5 21

The System Features

Building the Word Space from Unlabeled Tweets

Homogenous representation of both the tweets and the polarity classes in the word space

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 6 21

The System Features

Tweets downloading

Idea

Using four lexicons extracted from training data for each class (pos neg subj obj)

Probability assigned to each token

P(t|ci ) =t + 1

toti + |V | (1)

Rank tokens in descending order according to the Kullback-Leibler divergence (KLD)

KLD = P(t|cs) lowast logP(t|cs)P(t|co)

(2)

Top terms (50) in the rank for each class are used as seeds for downloading the samenumber of tweets for each lexicon

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 7 21

The System Features

Keywords and Micro-blogging features

Keywords

bull exploit tokens occurring in the tweets (unigrams)

bull replace the user mentions URLs and hashtags with three metatokens ldquo USER rdquoldquo URL rdquo and ldquo TAG rdquo

Micro-blogging

bull the use of upper case and character repetitions

bull positive and negative emoticons

bull informal expressions of laughters (ie sequences of ldquoahrdquo)

bull the presence of exclamation and interrogative marks

bull occurrences of adversative words disjunctive words conclusive words andexplicative words

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 8 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

System Setup

System Setup overview

bull Learning algorithm

bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117

bull Completely developed in JAVA

bull Weka library is adopted for learning

bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo

bull Word space download 10 million tweets using the Twitter Streaming API and

build the space using ldquoword2vecrdquo

bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms

overall)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 5: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

The System Features

Building the Word Space from Unlabeled Tweets

Homogenous representation of both tweets and polarity classes in the word space

Distributional features

bull tweet vectorminusrarrt as semantic feature in training our classifiers

bull similarity betweenminusrarrt and each prototype vector

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 5 21

The System Features

Building the Word Space from Unlabeled Tweets

Homogenous representation of both the tweets and the polarity classes in the word space

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 6 21

The System Features

Tweets downloading

Idea

Using four lexicons extracted from training data for each class (pos neg subj obj)

Probability assigned to each token

P(t|ci ) =t + 1

toti + |V | (1)

Rank tokens in descending order according to the Kullback-Leibler divergence (KLD)

KLD = P(t|cs) lowast logP(t|cs)P(t|co)

(2)

Top terms (50) in the rank for each class are used as seeds for downloading the samenumber of tweets for each lexicon

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 7 21

The System Features

Keywords and Micro-blogging features

Keywords

bull exploit tokens occurring in the tweets (unigrams)

bull replace the user mentions URLs and hashtags with three metatokens ldquo USER rdquoldquo URL rdquo and ldquo TAG rdquo

Micro-blogging

bull the use of upper case and character repetitions

bull positive and negative emoticons

bull informal expressions of laughters (ie sequences of ldquoahrdquo)

bull the presence of exclamation and interrogative marks

bull occurrences of adversative words disjunctive words conclusive words andexplicative words

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 8 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

System Setup

System Setup overview

bull Learning algorithm

bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117

bull Completely developed in JAVA

bull Weka library is adopted for learning

bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo

bull Word space download 10 million tweets using the Twitter Streaming API and

build the space using ldquoword2vecrdquo

bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms

overall)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 6: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

The System Features

Building the Word Space from Unlabeled Tweets

Homogenous representation of both the tweets and the polarity classes in the word space

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 6 21

The System Features

Tweets downloading

Idea

Using four lexicons extracted from training data for each class (pos neg subj obj)

Probability assigned to each token

P(t|ci ) =t + 1

toti + |V | (1)

Rank tokens in descending order according to the Kullback-Leibler divergence (KLD)

KLD = P(t|cs) lowast logP(t|cs)P(t|co)

(2)

Top terms (50) in the rank for each class are used as seeds for downloading the samenumber of tweets for each lexicon

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 7 21

The System Features

Keywords and Micro-blogging features

Keywords

bull exploit tokens occurring in the tweets (unigrams)

bull replace the user mentions URLs and hashtags with three metatokens ldquo USER rdquoldquo URL rdquo and ldquo TAG rdquo

Micro-blogging

bull the use of upper case and character repetitions

bull positive and negative emoticons

bull informal expressions of laughters (ie sequences of ldquoahrdquo)

bull the presence of exclamation and interrogative marks

bull occurrences of adversative words disjunctive words conclusive words andexplicative words

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 8 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

System Setup

System Setup overview

bull Learning algorithm

bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117

bull Completely developed in JAVA

bull Weka library is adopted for learning

bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo

bull Word space download 10 million tweets using the Twitter Streaming API and

build the space using ldquoword2vecrdquo

bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms

overall)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 7: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

The System Features

Tweets downloading

Idea

Using four lexicons extracted from training data for each class (pos neg subj obj)

Probability assigned to each token

P(t|ci ) =t + 1

toti + |V | (1)

Rank tokens in descending order according to the Kullback-Leibler divergence (KLD)

KLD = P(t|cs) lowast logP(t|cs)P(t|co)

(2)

Top terms (50) in the rank for each class are used as seeds for downloading the samenumber of tweets for each lexicon

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 7 21

The System Features

Keywords and Micro-blogging features

Keywords

bull exploit tokens occurring in the tweets (unigrams)

bull replace the user mentions URLs and hashtags with three metatokens ldquo USER rdquoldquo URL rdquo and ldquo TAG rdquo

Micro-blogging

bull the use of upper case and character repetitions

bull positive and negative emoticons

bull informal expressions of laughters (ie sequences of ldquoahrdquo)

bull the presence of exclamation and interrogative marks

bull occurrences of adversative words disjunctive words conclusive words andexplicative words

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 8 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

System Setup

System Setup overview

bull Learning algorithm

bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117

bull Completely developed in JAVA

bull Weka library is adopted for learning

bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo

bull Word space download 10 million tweets using the Twitter Streaming API and

build the space using ldquoword2vecrdquo

bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms

overall)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 8: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

The System Features

Keywords and Micro-blogging features

Keywords

bull exploit tokens occurring in the tweets (unigrams)

bull replace the user mentions URLs and hashtags with three metatokens ldquo USER rdquoldquo URL rdquo and ldquo TAG rdquo

Micro-blogging

bull the use of upper case and character repetitions

bull positive and negative emoticons

bull informal expressions of laughters (ie sequences of ldquoahrdquo)

bull the presence of exclamation and interrogative marks

bull occurrences of adversative words disjunctive words conclusive words andexplicative words

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 8 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

System Setup

System Setup overview

bull Learning algorithm

bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117

bull Completely developed in JAVA

bull Weka library is adopted for learning

bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo

bull Word space download 10 million tweets using the Twitter Streaming API and

build the space using ldquoword2vecrdquo

bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms

overall)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 9: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

System Setup

System Setup overview

bull Learning algorithm

bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117

bull Completely developed in JAVA

bull Weka library is adopted for learning

bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo

bull Word space download 10 million tweets using the Twitter Streaming API and

build the space using ldquoword2vecrdquo

bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms

overall)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 10: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

Learning

Learning step

bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm

bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21

System Setup

System Setup overview

bull Learning algorithm

bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117

bull Completely developed in JAVA

bull Weka library is adopted for learning

bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo

bull Word space download 10 million tweets using the Twitter Streaming API and

build the space using ldquoword2vecrdquo

bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms

overall)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 11: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

System Setup

System Setup overview

bull Learning algorithm

bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117

bull Completely developed in JAVA

bull Weka library is adopted for learning

bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo

bull Word space download 10 million tweets using the Twitter Streaming API and

build the space using ldquoword2vecrdquo

bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms

overall)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 12: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

Evaluation

Evaluation

Evaluating systems on their ability in

bull Task A decide whether a given tweet is subjective or objective

bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)

Dataset

bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)

bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)

bull Metrics systems are compared against the gold standard in terms of F measure

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 13: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

Evaluation Results

Results

Setting Task F Rank Imp

baselineTask A 04005 - -Task B 03718 - -

constrainedTask A 07140 1 78Task B 06771 1 82

unconstrainedTask A 06892 2 72Task B 06638 1 79

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 14: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

Evaluation Discussion

Results Discussion

bull The system always obtains the best task performance in all settings

bull Co-training approach seems to introduce noise

bull Deep error analysis

bull the co-training system slightly improves the performance in classifyingpositive tweets

bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)

bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions

bull ablation test investigate the predictive power of the features in ourmodel

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 15: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

Evaluation Ablation test

Ablation Test

Decrease of F by removing each feature group compared to the complete feature setting

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 16: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

Conclusions and Future Work

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 17: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

Conclusions and Future Work

Conclusions and Future Work

Conclusions

bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance

bull Semantic features (distributional approach) have more predictive power

Future Work

bull Validate and generalize our findings further data and languages

bull Fix some issues in co-training approach

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 18: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

Thank you

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 19: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

For Further Reading

For Further Reading I

Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task

In Proc of EVALITA 2014 Pisa Italy (2014)

Basile V Nissim M Sentiment analysis on italian tweets

In Proc of WASSA 2013 pp 100ndash107 (2013)

Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys

In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 20: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

For Further Reading

For Further Reading II

Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining

In Proc of LREC pp 417ndash422 (2006)

Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision

Processing pp 1ndash6 (2009)

Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth

J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)

Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg

In Proc of ICWSM 2011 pp 538ndash541 (2011)

Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space

In Proc of ICLR Work (2013)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 21: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

For Further Reading

For Further Reading III

OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series

In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)

Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining

In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)

Pang B Lee L Opinion mining and sentiment analysis

Found Trends Inf Retr 2(1-2) 1ndash135 (2008)

Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database

In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work
Page 22: UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

For Further Reading

For Further Reading IV

Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems

Artificial Intelligence 46(1-2) 159ndash216 (1990)

Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter

In Proc of COLING 2014 (2014)

Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis

Comput Linguist 35(3) 399ndash433 (2009)

Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language

Proc of the Corpus Linguistics Conf 2005 (2005)

Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21

  • Introduction
    • The Task
      • The System
        • Our Approach
        • Features
          • Learning
          • System Setup
          • Evaluation
            • Results
            • Discussion
            • Ablation test
              • Conclusions and Future Work