Arkaitz Zubiaga, University of Warwick
Maria Liakata1, Rob Procter1, Kalina Bontcheva2, Peter Tolmie1
1 University of Warwick, UK
2 University of Sheffield, UK
Crowdsourcing the Annotation of Rumourous Conversations in Social Media
Objectives
● Scenario where a journalist is tracking a breaking news story.
● Identify rumours, distinguishing them from non-rumours.
● Study the conversational aspects of rumours, towards determining their veracity.
Objectives
● Study conversational aspects of rumours.
1) Build a dataset with diverse sets of rumourous stories.
2) Annotate linguistic and interaction patterns within rumours to enable automated analysis.
3) Analyse these patterns and use machine learning techniques to determine the veracity of rumours.
Related Work
Previous work on rumour detection in social media [Qazvinian et al. 2011, Procter et al. 2013, Castillo et al. 2013, Starbird et al. 2014]:
● Rumours known a priori, keyword search, e.g., “sandy sharks” or “london eye fire”.
● Looking at tweets individually, no interactions captured.
Our Approach
Our approach:
● Identify a diverse set of rumours, which can be unknown a priori, e.g., follow #hurricanesandy and see what comes up.
● Annotate conversational aspects (wrt veracity), capturing interaction between tweets.
Creating a corpus of rumourous conversations
Steps:
● Formal definition of rumour.
● Annotation of rumours and non-rumours.
● Annotation of conversational aspects.
Definition of rumour
Putting together OED and previous research on rumours:
Rumour: “a circulating story of questionable veracity, which is apparently credible but hard to verify, and produces sufficient skepticism and/or anxiety.”
Data Collection
• Track event on streaming API, e.g. #ferguson.
• Data sampling: following the definition of rumours, sample tweets that spark a high number of retweets (RTs).
• Conversations: Collecting associated conversations.
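The sampling and conversation-collection steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the dictionary fields (`id`, `retweet_count`, `source_id`) and the `MIN_RETWEETS` cut-off are hypothetical, since the slides do not give the actual threshold or data layout.

```python
from collections import defaultdict

# Hypothetical threshold: the slides do not state the actual cut-off.
MIN_RETWEETS = 100


def sample_sources(tweets, min_retweets=MIN_RETWEETS):
    """Keep tweets whose retweet count reaches the threshold."""
    return [t for t in tweets if t["retweet_count"] >= min_retweets]


def collect_conversations(sources, replies):
    """Group replies under the sampled source tweet they belong to."""
    source_ids = {t["id"] for t in sources}
    threads = defaultdict(list)
    for reply in replies:
        if reply["source_id"] in source_ids:
            threads[reply["source_id"]].append(reply)
    return dict(threads)


# Toy data with assumed field names.
tweets = [{"id": 1, "retweet_count": 250}, {"id": 2, "retweet_count": 3}]
replies = [{"id": 10, "source_id": 1}, {"id": 11, "source_id": 2}]
sources = sample_sources(tweets)
threads = collect_conversations(sources, replies)
```

Only replies attached to a sampled source tweet are kept, so each resulting thread starts from a highly retweeted tweet.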
Annotation (rumour vs non-rumour): Results
Annotations:
● Ferguson: 291 / 1,185 rumours (24.6%) – 42 stories.
● Ottawa shootings: 475 / 901 rumours (52.7%) – 51 stories.
● Essien contracted Ebola: 18 / 18 rumours (100%) – 1 story.
Annotation scheme: conversational aspects of rumours
• Designed annotation scheme to:
  ● Capture sequential features of the conversation thread.
  ● Analyse the effect of interaction at a given point.
  ● Break down annotation into tweet triples (or fewer).
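One way to picture the breakdown into tweet triples is sketched below. The slides do not spell out what a triple contains; this sketch assumes it is the source tweet, the tweet being replied to, and the reply itself, with repeats dropped so that the source and its direct replies yield fewer than three tweets. The function name `tweet_triples` and the `parents` mapping are hypothetical.

```python
def tweet_triples(parents):
    """Return the (source, parent, tweet) context of each tweet, without repeats.

    `parents` maps a tweet id to the id of the tweet it replies to
    (None for the source tweet of the thread).
    """
    source = next(t for t, p in parents.items() if p is None)
    units = {}
    for tweet, parent in parents.items():
        context = []
        for t in (source, parent, tweet):
            # Skip the missing parent of the source and any repeated tweet.
            if t is not None and t not in context:
                context.append(t)
        units[tweet] = tuple(context)
    return units


# Toy thread: "a" replies to the source "s", and "b" replies to "a".
units = tweet_triples({"s": None, "a": "s", "b": "a"})
```

In this toy thread the source is annotated on its own, the direct reply as a pair, and the nested reply as a full triple.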
Crowdsourcing the annotation of tweets
• Used CrowdFlower for crowdsourcing, with 5-10 annotators per tweet-feature pair.
• All data also annotated by two of us, as a reference.
Crowdsourcing task results
• Annotation of 216 tweets in 8 threads.
• 3-4 features per tweet: 4,974 units.
• 98 different annotators.
• Final set of annotations obtained through majority voting.
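Majority voting over the crowdsourced judgments can be sketched as below. The slides do not say how ties were handled, so returning `None` for a tie is a hypothetical policy, and the label names in the example are illustrative only.

```python
from collections import Counter


def majority_vote(judgments):
    """Return the most frequent label, or None when there is no clear winner."""
    counts = Counter(judgments).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # hypothetical tie policy: leave the unit unresolved
    return counts[0][0]


def aggregate(units):
    """Map each (tweet, feature) pair to its majority label."""
    return {key: majority_vote(labels) for key, labels in units.items()}


# Toy judgments from five annotators for one tweet-feature pair.
final = aggregate({("t1", "evidence"): ["none", "none", "url", "none", "url"]})
```

Each key stands for one tweet-feature pair, matching the 5-10 judgments collected per pair.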
Crowdsourcing task agreement
        CS        REF
CS      60.2%     68.84%
REF               78.57%
● CS: crowdsourced annotations.
● REF: reference annotations.
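The slides do not state which agreement measure produced these figures. As an illustration only, the sketch below assumes simple percentage agreement: the share of commonly annotated units on which two annotation sets assign the same label. The function name `percent_agreement` and the toy labels are hypothetical.

```python
def percent_agreement(a, b):
    """Percentage of shared units on which two annotation sets agree."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("no units in common")
    matches = sum(a[k] == b[k] for k in shared)
    return 100.0 * matches / len(shared)


# Toy example: crowdsourced (CS) vs reference (REF) labels for four units.
cs = {1: "comment", 2: "support", 3: "deny", 4: "comment"}
ref = {1: "comment", 2: "support", 3: "comment", 4: "comment"}
score = percent_agreement(cs, ref)
```

Raw percentage agreement does not correct for chance, which matters when the label distribution is skewed.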
Distribution of annotations
Skewed distribution of annotations:
● 66.5% of replies are comments.
● 79.8% of replies provide no evidence.
● 84% of comments provide no evidence.
Annotation scheme: conversational aspects of rumours
(+) Underspecified
● Comments should not be annotated for certainty and evidentiality (they add nothing to veracity anyway).
Conclusion
● Described novel method to collect and annotate rumourous conversations from Twitter.
● Introduced annotation scheme for annotation of conversation threads.
  ● Annotations looking at tweet triples.
  ● Differentiating source tweets and replies.
● Scheme iteratively revised and validated through crowdsourcing.
Future Work
● With validated annotation scheme, perform larger scale crowdsourced annotation of conversations.
● Annotation of a wider variety of events, e.g., Charlie Hebdo shooting, Germanwings plane crash, etc.
● Development of Machine Learning tools:
  ● Rumour identification and veracity assessment.
  ● Tweet classification: supporting/denying, providing evidence, etc.