
SENTIment POLarity Classification Task - Sentipolc@Evalita 2014


EVALITA 2014 EVALUATION OF NLP AND SPEECH TOOLS FOR ITALIAN

SENTIPOLC SENTIment POLarity Classification

di.unito.it/sentipolc14 Valerio Basile, University of Groningen

Andrea Bolioli, CELI, Torino

Malvina Nissim, University of Groningen, University of Bologna

Viviana Patti, University of Torino, Department of Computer Science

Paolo Rosso, Universitat Politècnica de València

EVALITA 2014 Workshop, December 11, 2014, Pisa

Task description

A new shared task in the Evalita evaluation campaign
•  sentiment analysis at the message level on Italian tweets
•  three independent sub-tasks:
   o  Task 1 - Subjectivity Classification: a system must decide whether a given message is subjective or objective
   o  Task 2 - Polarity Classification: a system must decide whether a given message is of positive, negative, neutral or mixed sentiment
   o  Task 3 (Pilot) - Irony Detection: a system must decide whether a given message is ironic or not

Related SemEval tasks: Semeval 2013, Task 2; Semeval 2014, Task 9


Development and Test Data

Collection
•  6,448 tweets (training set: 4,513; test set: 1,935) derived from two existing corpora:
   o  SENTI-TUT (Bosco, Patti, Bolioli, 2013)
   o  TWITA (Basile and Nissim, 2013)
•  two main components:
   o  political: extraction based on specific keywords and hashtags marking political topics (#grillo, Monti)
   o  generic: random tweets on any topic


Development and Test Data

Data format
•  Each tweet is presented as a sequence of comma-separated fields:
   id, subj, pos, neg, iro, topic, text
•  Manually annotated fields: subj (subjectivity), pos (positive polarity), neg (negative polarity), iro (ironic)
•  Apart from the id, which is a string of numeric characters, the value of all the other fields can be either “0” or “1”.
•  For the four manually annotated classes:
   o  0 means that the feature is absent
   o  1 means that the feature is present
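As an illustration, a line in this format could be read as follows in Python. This is only a sketch: the example id and text are made up, and it assumes the text field is quoted or contains no unescaped commas.

    import csv

    FIELDS = ["id", "subj", "pos", "neg", "iro", "topic", "text"]

    def parse_line(line):
        # Split one comma-separated record; csv handles a quoted text field
        values = next(csv.reader([line]))
        record = dict(zip(FIELDS, values))
        # The annotation fields are binary flags encoded as "0"/"1"
        for key in ("subj", "pos", "neg", "iro", "topic"):
            record[key] = int(record[key])
        return record

    # Example with a made-up id and shortened text
    print(parse_line('123456,1,0,1,1,1,"Botta di ottimismo a #lInfedele"'))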


Development and Test Data

Data format
•  Each tweet is presented as a sequence of comma-separated fields:
   id, subj, pos, neg, iro, topic, text

Constraints in the annotation scheme:
•  An objective tweet will not have any polarity nor irony
•  A subjective tweet can exhibit positive and negative polarity at the same time (mixed!)
•  A subjective tweet can exhibit no specific polarity and be just neutral, but with a clear subjective flavour
•  An ironic tweet is always subjective and it must have one defined polarity
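These constraints amount to a simple check over the four binary flags. A minimal sketch in Python (the function name is ours, not part of the official task tools):

    def is_valid(subj, pos, neg, iro):
        # An objective tweet (subj = 0) carries no polarity and no irony
        if subj == 0:
            return pos == 0 and neg == 0 and iro == 0
        # An ironic tweet must have exactly one defined polarity
        if iro == 1:
            return pos + neg == 1
        # A subjective, non-ironic tweet may be positive, negative,
        # mixed (pos = neg = 1) or neutral (pos = neg = 0)
        return True

    # The label combinations ruled out by the scheme are rejected
    assert not is_valid(1, 1, 1, 1)   # 1111
    assert not is_valid(1, 0, 0, 1)   # 1001
    assert not is_valid(0, 1, 1, 1)   # 0XX1 (an objective tweet marked ironic)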



Examples

•  Objective tweet: …0, 0, 0, 0… (id, subj, pos, neg, iro, topic, text)

   "l’articolo di Roberto Ciccarelli dal manifesto di oggi http://fb.me/1BQVy5WAk"
   (English: "Roberto Ciccarelli’s article from today’s il manifesto")

   o  subj = 0
   o  pos = 0
   o  neg = 0
   o  iro = 0


Examples

•  Subjective, mixed: …1, 1, 1, 0… (id, subj, pos, neg, iro, topic, text)

   "Dati negativi da Confindustria che spera nel nuovo governo Monti. Castiglione: “Avanti con le riforme” http://t.co/kIKnbFY7"
   (English: "Negative figures from Confindustria, which places its hopes in the new Monti government. Castiglione: “Let’s move forward with the reforms”")

   o  subj = 1
   o  pos = 1
   o  neg = 1
   o  iro = 0


Examples

•  Subjective, negative, ironic: …1, 0, 1, 1… (id, subj, pos, neg, iro, topic, text)

   "Botta di ottimismo a #lInfedele: Governo Monti, o la va o la spacca."
   (English: "A burst of optimism on #lInfedele: Monti government, make or break.")

   o  subj = 1
   o  pos = 0
   o  neg = 1
   o  iro = 1

•  Underlying assumptions on irony:
   o  1111: not allowed!
   o  1001: not allowed!
   o  0XX1: not allowed!

An ironic tweet is always subjective and it must have one defined polarity.


Development and Test Data

Data format
•  Each tweet is presented as a sequence of comma-separated fields:
   id, subj, pos, neg, iro, topic, text
•  id: Twitter status id (necessary to retrieve the text)
•  topic: 0 means “generic” and 1 means “political”
•  text: this column will be filled with the actual tweet's text
   o  Due to Twitter’s privacy policy, tweets cannot be distributed directly
   o  Participants were provided with a web interface (RESTful Web API technology) through which they could download the tweet’s text on the fly --when still available-- for all the ids provided (a sketch of such a download loop is given below)
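For a rough idea of what such a download loop could look like on the participants' side, see the sketch below. The endpoint URL and response format are purely hypothetical and do not describe the organizers' actual service.

    import requests

    BASE_URL = "http://example.org/sentipolc/tweet"   # hypothetical endpoint

    def fetch_text(tweet_id):
        # Ask the (hypothetical) service for the text of one tweet id;
        # tweets deleted in the meantime may simply no longer be available.
        response = requests.get(BASE_URL, params={"id": tweet_id}, timeout=10)
        if response.status_code == 200:
            return response.text
        return None   # tweet no longer available

    texts = {tid: fetch_text(tid) for tid in ["123456789", "987654321"]}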

Twitter’s peculiar issue in the evaluation phase: same training/test data for all teams


Evaluation

•  Evaluation set: tweets classified by all participating teams
   o  a consequence of current Twitter policies (not every tweet could still be retrieved by every team)
   o  no big differences with respect to the full test set
•  Metrics: precision, recall and F-measure for each field/class (see the sketch below)
   o  polarity classification: adapted in order to take into account the peculiarities of the annotation scheme (e.g. possible to have mixed sentiment)
   o  details on the evaluation metrics applied to the participant results are in the organizers’ report
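The adapted polarity scoring is detailed in the organizers' report; the sketch below only shows the basic building block, standard per-class precision, recall and F-measure, with made-up example labels.

    def precision_recall_f1(gold, pred, cls):
        # Standard per-class precision, recall and F1 over parallel label lists
        tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # e.g. for subjectivity, average the F-measure over the two classes 0 and 1
    gold = [1, 0, 1, 1, 0]
    pred = [1, 0, 0, 1, 1]
    f_avg = sum(precision_recall_f1(gold, pred, c)[2] for c in (0, 1)) / 2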


Participants
•  A total of 11 teams from 4 different countries participated in at least one of the three tasks
•  SENTIPOLC was the Evalita task with the highest participation, with a total of 35 submitted runs: great interest of the NLP community in sentiment analysis in Italian social media
   o  Most of the submissions were constrained (training only on task data)
•  Only academia (no industry)


Results – Task 1 subjectivity

•  The highest F-score was achieved by uniba2930 at 0.7140 (constrained run)
   o  All participating systems show an improvement over the baseline
•  The baseline is the majority-class baseline (for all tasks)
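As a reference point, a majority-class baseline simply assigns every test tweet the most frequent label observed in the training data. A minimal sketch (the label lists are made up):

    from collections import Counter

    def majority_baseline(train_labels, n_test):
        # Predict the most frequent training label for every test instance
        majority_label, _ = Counter(train_labels).most_common(1)[0]
        return [majority_label] * n_test

    predictions = majority_baseline([1, 1, 0, 1, 0], n_test=3)   # -> [1, 1, 1]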


Results – Task 2 polarity

•  Again, the highest F-score was achieved by uniba2930 at 0.6771 (constrained run)
   o  the most popular subtask
   o  all participating systems show an improvement over the baseline


Results – pilot Task 3 irony

•  The highest F-score was achieved by UNITOR at 0.5959 (unconstrained run) and 0.5759 (constrained run)
   o  some systems score very close to the baseline: high complexity of the task


Comparison, issues

•  Comparison lines (details in the organizers’ report):
   o  exploitation of further annotated Twitter data for training
   o  classification framework (approaches, algorithms, features)
   o  exploitation of available resources (e.g. sentiment lexicons, NLP tools, etc.)
   o  interdependency of tasks in the case of systems participating in several subtasks
•  Issues
   o  Irony and polarity reversal
   o  Mixed sentiment is hard to recognise


What’s next

•  uniba2930: best system on Tasks 1 & 2
   Pierpaolo Basile and Nicole Novielli. UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

•  UNITOR: best system on pilot Task 3
   Giuseppe Castellucci, Danilo Croce, Diego De Cao, Roberto Basili. A Multiple Kernel Approach for Twitter Sentiment Analysis in Italian

Discussion!

17.45: Poster session
Proceedings on-line:

http://clic.humnet.unipi.it/proceedings/Proceedings-EVALITA-2014.pdf


Discussion

•  Feedback from 2014 Sentipolc teams?
•  Next edition? Ideas? Proposals?
   o  data
      - Twitter data
      - Facebook data (conversational threads, friends network)?
      - format
   o  tasks
      - aspect-based sentiment analysis (target)? emotions?
   o  systems
      - Sentipolc systems available as services via API/download?
   o  evaluation metrics