Mention-anomaly-based Event Detection and Tracking in Twitter
Adrien Guille & Cécile Favre
ERIC Lab, University of Lyon 2, France
IEEE/ACM ASONAM 2014, Beijing, China
August 20, 2014
A. Guille & C. Favre: Mention-Anomaly-Based Event Detection in Twitter
What is Twitter & why study it?
Twitter: micro-blogging service
  140-character messages
Ever-growing number of Twitter users
  Pro: Timely source of information
  Con: Information overload
How can we use Twitter for automated event detection and tracking?
Related Work
Idea: spot bursty patterns
Term-weighting-based approaches
  Peaky Topics [Shamma11], Trending Score [Benhardus13]
  Possible ambiguity, lack of context
Topic-modeling-based approaches
  On-line LDA [Lau12], ET-LDA [Yuheng12]
  Lack of scalability
Clustering-based approaches
  EDCoW [Weng11], TwEvent [Li12], ET [Parikh13]
  Noisy event descriptions
Issues & Proposal
Shortcomings of existing methods
  Event duration is a fixed parameter
  Only the textual content of tweets is considered
We propose a novel approach and method that
  Dynamically estimates the duration of each event
  Exploits the social aspect of tweet streams through mentions
Proposed Method
Problem Formulation
Input
  Corpus C containing N tweets partitioned into n time-slices
  Vocabularies V and V@
Output
  The k most impactful events
Event: A bursty topic and a value Mag translating its magnitude of impact
Bursty Topic: A time interval I, a main term t, a set S of weighted related terms
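The definitions above can be written down as plain data types. The following is only a minimal sketch: the class names, field names and the `slice_index` helper are illustrative assumptions, not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Tuple


@dataclass
class BurstyTopic:
    interval: Tuple[int, int]   # I, as (first, last) time-slice indices, inclusive
    main_term: str              # t
    related_terms: Dict[str, float] = field(default_factory=dict)  # S, word -> weight


@dataclass
class Event:
    topic: BurstyTopic
    magnitude: float            # Mag


def slice_index(timestamp: datetime, start: datetime, slice_length: timedelta) -> int:
    """Map a tweet timestamp to the index of its time-slice (hypothetical helper)."""
    return int((timestamp - start) / slice_length)
```

With 30-minute slices starting at midnight, a tweet posted at 01:15 falls into slice 2.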
Overview of the proposed method
Two-phase flow
  1: Analyse the mention frequency of each word in V@ to detect events (Mag, I, t, Ø)
  2: Select related words and generate the final list of the k most impactful events while controlling redundancy
MABED, Mention-Anomaly-Based Event Detection
Proposed Method: Phase 1
Detecting Events with Mention Anomaly
Computing the anomaly at a point i for word t
  Requires computing the expected volume of tweets containing at least one mention and t, at i
  Normal distribution: [equation not preserved]
  Expectation: [equation not preserved]
  Anomaly: [equation not preserved]
Measuring the magnitude of impact
  Integrating anomaly: [equation not preserved]
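One way to read this slide, sketched under explicit assumptions: the expected count at slice i is taken to be the number of mention-containing tweets in that slice times the overall share of mention-containing tweets that include t; the anomaly is observed minus expected; and the magnitude is the sum of the anomaly over the interval. These formulas are assumptions made for illustration, and the function names are hypothetical.

```python
def anomaly_series(mentions_with_t, mentions_total):
    """Per-slice anomaly for word t.

    mentions_with_t[i]: tweets in slice i with at least one mention and word t.
    mentions_total[i]:  tweets in slice i with at least one mention.
    Assumption: expected count = mentions_total[i] * overall share of t.
    """
    p_t = sum(mentions_with_t) / sum(mentions_total)
    return [obs - n_i * p_t for obs, n_i in zip(mentions_with_t, mentions_total)]


def magnitude(anomaly, a, b):
    """Sum the anomaly over the inclusive slice interval [a, b] (assumed discrete integral)."""
    return sum(anomaly[a:b + 1])
```

For example, with counts `[1, 1, 8, 2]` out of `[10, 10, 10, 10]` the expected count is 3 per slice, so only the third slice shows a positive anomaly.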
Detecting Events with Mention Anomaly
For each word t in V@
  Solve a "Maximum Contiguous Subsequence Sum" type of problem
In the end, each event is described by
  A main word t
  A period of time I
  The magnitude of its impact Mag
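The maximum contiguous subsequence sum problem has a well-known linear-time solution (Kadane's algorithm). The sketch below applies it to a per-slice anomaly series to recover an interval and its summed anomaly; the function name and the tuple layout are illustrative choices, not the authors' code.

```python
def max_contiguous_subsequence(values):
    """Return (best_sum, start, end) for the contiguous run with the largest sum.

    Indices are inclusive. Runs in O(n) using Kadane's algorithm.
    """
    if not values:
        raise ValueError("values must not be empty")
    best_sum = cur_sum = values[0]
    best_start = best_end = cur_start = 0
    for i in range(1, len(values)):
        if cur_sum <= 0:
            # A non-positive prefix cannot help: restart the run at i.
            cur_sum = values[i]
            cur_start = i
        else:
            cur_sum += values[i]
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i
    return best_sum, best_start, best_end
```

Applied to the anomaly series of a word t, the returned indices play the role of I and the returned sum plays the role of Mag.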
Detecting Events with Mention Anomaly
Example
Proposed Method: Phase 2
Selecting Words Describing Events
Identifying candidate words
  Set of p words that co-occur the most with t during I
Selecting the most relevant words
  Measure the similarity between the frequency of each candidate word and that of the main word [Erdem12]
  Apply a threshold θ
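The two steps above can be sketched as follows. Note the substitution: this sketch uses plain Pearson correlation as a stand-in for the [Erdem12] coefficient, and rescaling it to [0, 1] before thresholding is an assumption. The function names and input layout are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length series (stand-in similarity)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_related_words(main_term, tweets_in_interval, series, p=10, theta=0.7):
    """tweets_in_interval: list of token lists posted during I.
    series: word -> per-slice frequency over I.
    Returns {word: weight} for candidates whose weight reaches theta.
    """
    cooc = Counter()
    for tokens in tweets_in_interval:
        words = set(tokens)
        if main_term in words:
            cooc.update(words - {main_term})
    candidates = [w for w, _ in cooc.most_common(p)]  # top-p co-occurring words
    selected = {}
    for w in candidates:
        # Assumption: rescale correlation from [-1, 1] to [0, 1].
        weight = (pearson(series[main_term], series[w]) + 1) / 2
        if weight >= theta:
            selected[w] = weight
    return selected
```

Words that co-occur often with t but whose frequency moves differently over I are dropped by the threshold.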
Selecting Words Describing Events
Example
Generating the List of Top k Events
Event graph & redundancy graph
Detecting duplicated events
  Connectivity of main terms in the event graph
  Overlap between intervals, threshold σ
Merging duplicated events
  Identifying connected components in the redundancy graph
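A rough sketch of the duplicate detection and merging steps, under stated assumptions: two events are treated as duplicates when each main term appears among the other's related words and their interval overlap exceeds σ, with overlap taken as the shared length divided by the shorter interval's length. Both criteria are illustrative readings of the slide, and the event tuple layout is hypothetical.

```python
def overlap(i1, i2):
    """Overlap ratio of two inclusive slice intervals (assumed definition)."""
    shared = min(i1[1], i2[1]) - max(i1[0], i2[0]) + 1
    shortest = min(i1[1] - i1[0], i2[1] - i2[0]) + 1
    return max(shared, 0) / shortest


def merge_duplicates(events, sigma=0.5):
    """events: list of (main_term, (start, end), set_of_related_words).
    Returns groups of event indices, one group per connected component
    of the redundancy graph.
    """
    n = len(events)
    adjacent = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            t1, int1, rel1 = events[i]
            t2, int2, rel2 = events[j]
            if t1 in rel2 and t2 in rel1 and overlap(int1, int2) > sigma:
                adjacent[i].add(j)
                adjacent[j].add(i)
    seen, components = set(), []
    for i in range(n):
        if i in seen:
            continue
        seen.add(i)
        stack, component = [i], []
        while stack:  # iterative depth-first search
            u = stack.pop()
            component.append(u)
            for v in adjacent[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(component))
    return components
```

Each returned group with more than one index would then be merged into a single event in the final top-k list.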
Generating the List of Top k Events
Example
Evaluation
Experimental Setup
Corpora
  C(en): 1,437,126 tweets published in November 2009
  C(fr): 2,086,136 tweets published in March 2012
Baselines for comparison
  Trending Score (TS) [Benhardus13] and ET [Parikh13]
  α-MABED
Parameter setting
  (α-)MABED: 30-min time-slices, p=10, θ=0.7, σ=0.5
  Trending Score, ET: 1-day time-slices
Evaluation Metrics
Manual annotation
  Two human annotators judging the significance of the top 40 events detected by each method (κ = 0.72)
Precision
  Significant events / All detected events
Recall
  Distinct significant events / All detected events
DERate [Li12]
  Duplicated events / Significant events
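The three ratios above can be computed from a list of annotations. This is a small sketch: the input encoding (one label per detected event, `None` for non-significant events) and the way duplicates are counted (significant events minus distinct ones) are assumptions for illustration.

```python
def evaluation_metrics(labels):
    """labels: one entry per detected event; None if judged not significant,
    otherwise an identifier of the real-world event it describes.
    Returns (precision, recall, derate) as defined on the slide.
    """
    detected = len(labels)
    significant = [label for label in labels if label is not None]
    distinct = len(set(significant))
    duplicated = len(significant) - distinct  # assumed way of counting duplicates
    precision = len(significant) / detected
    recall = distinct / detected
    derate = duplicated / len(significant) if significant else 0.0
    return precision, recall, derate
```

For instance, four detected events labelled `["a", "a", "b", None]` give a precision of 0.75, a recall of 0.5 and a DERate of 1/3.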
Quantitative Evaluation
Performance of the five methods on the two corpora
Quantitative Evaluation
Impact of σ on MABED
Qualitative Evaluation
Improved readability
  Excerpt of the list of events detected in C(en) by MABED
Qualitative Evaluation
Improved temporal precision & reduced redundancy
Importance of dynamically estimating event duration
  Politics-related events tend to be discussed longer [Romero11]
Implementation
Included in the open-source social media data mining tool SONDY [Guille13]
http://mediamining.univ-lyon2.fr/people/guille/mabed.php
Time-oriented Interface
Impact-oriented Interface
Topic-oriented Interface
Conclusion & Future Work
Proposed a novel approach and method for detecting events in Twitter
Verified hypothesis
  Considering mentions helps detect significant events
  Experimental results on two different datasets demonstrate the accuracy and the robustness of the proposed method
Future work
  More features to model discussions between users
References
[Shamma11] D. A. Shamma, L. Kennedy, and E. F. Churchill, “Peaks and persistence: modeling the shape of microblog conversations,” in CSCW, 2011
[Benhardus13] J. Benhardus and J. Kalita, “Streaming trend detection in twitter,” IJWBC, vol. 9, no. 1, 2013
[Lau12] J. H. Lau, N. Collier, and T. Baldwin, “On-line trend analysis with topic models: #twitter trends detection topic model online,” in COLING, 2012
[Yuheng12] H. Yuheng, J. Ajita, D. S. Dorée, and W. Fei, “What were the tweets about? topical associations between public events and twitter feeds,” in ICWSM, 2012
[Weng11] J. Weng and B.-S. Lee, “Event detection in twitter,” in ICWSM, 2011
[Li12] C. Li, A. Sun, and A. Datta, “Twevent: Segment-based event detection from tweets,” in CIKM, 2012
[Parikh13] R. Parikh and K. Karlapalem, “Et: events from tweets,” in companion WWW, 2013
[Erdem12] O. Erdem, E. Ceyhan, and Y. Varli, “A new correlation coefficient for bivariate time-series data,” in MAF, 2012
[Guille13] A. Guille, C. Favre, H. Hacid, and D. Zighed, “Sondy: An open source platform for social dynamics mining and analysis,” in SIGMOD, 2013