Social Event Detection with Clustering and Filtering
Yanxiang Wang Australian National University
Lexing Xie Australian National University
Hari Sundaram Arizona State University
Background
SED with Clustering and Filtering 2
Introduction
• Previous approaches
– Supervised [Firan, CIKM’10] [2]
– Unsupervised [Becker, WSDM’10] [1], [Papadopoulos] [3]
• Queries are only partially specified, which motivates a clustering and filtering approach
[1] Becker, "Learning Similarity Metrics for Event Identification in Social Media"
[2] Firan, "Bring Order to Your Photos: Event-Driven Classification of Flickr Images Based on Social Knowledge"
[3] Papadopoulos, "Cluster-Based Landmark and Event Detection for Tagged Photo Collections"
Similarity Metric
• Time: time difference in minutes, $1 - \frac{|t_1 - t_2|}{t_w}$
• Location: great-circle distance (gcd), $1 - \frac{\mathrm{gcd}}{50}$
• Tag: Jaccard index, $\frac{|t_a \cap t_b|}{|t_a \cup t_b|}$
• Text: cosine similarity, $\frac{A \cdot B}{\|A\| \, \|B\|}$
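The four similarities listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the clipping at zero, the haversine formula for the great-circle distance, and the reading of the constant 50 as kilometres are all assumptions.

```python
import math

def time_similarity(t1_min, t2_min, window_min):
    """1 - |t1 - t2| / t_w, with times in minutes (clipping at 0 is an assumption)."""
    return max(0.0, 1.0 - abs(t1_min - t2_min) / window_min)

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance via the haversine formula, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def location_similarity(p1, p2, scale=50.0):
    """1 - gcd / 50; the unit of the constant is assumed to be kilometres."""
    return max(0.0, 1.0 - great_circle_km(p1[0], p1[1], p2[0], p2[1]) / scale)

def jaccard(tags_a, tags_b):
    """Jaccard index |a ∩ b| / |a ∪ b| of two tag collections."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cosine(vec_a, vec_b):
    """Cosine similarity of two sparse term-weight dicts."""
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Each function returns a value in [0, 1], so the four scores can be combined in a weighted sum without rescaling.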
Overview
[Figure: pipeline overview. Clustering stages: Time; Tag + Text + Location. Filtering stages: Time + Location; Tag + Text; Visual.]
Clustering
• Incremental clustering (Becker [1])
1. Time clustering
2. Tag + Text + Location
– Weighted-sum combination: $w_t s_t + w_x s_x + w_l s_l$
– Each weight corresponds to training performance
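The second clustering step can be sketched as a single-pass incremental clustering driven by the weighted sum of tag, text, and location similarities. Everything concrete here is an assumption made for illustration: the weights, the threshold, the toy per-feature similarities, and the choice to compare an item with the average similarity to a cluster's members.

```python
# Hypothetical weights; the slides only say weights follow training performance.
WEIGHTS = {"tag": 0.5, "text": 0.3, "location": 0.2}

def weighted_similarity(sims, weights):
    """Weighted sum w_t*s_t + w_x*s_x + w_l*s_l over named similarities."""
    return sum(weights[name] * sims[name] for name in weights)

def toy_similarity(a, b):
    """Toy pairwise similarity: Jaccard on tags, exact match on title and city."""
    tag = len(a["tags"] & b["tags"]) / len(a["tags"] | b["tags"])
    text = 1.0 if a["title"] == b["title"] else 0.0
    location = 1.0 if a["city"] == b["city"] else 0.0
    return weighted_similarity({"tag": tag, "text": text, "location": location}, WEIGHTS)

def incremental_clustering(items, similarity, threshold):
    """Assign each item to its most similar cluster, or open a new cluster."""
    clusters = []
    for item in items:
        best_score, best_cluster = -1.0, None
        for cluster in clusters:
            # Average similarity to the cluster's members (an assumption).
            score = sum(similarity(item, member) for member in cluster) / len(cluster)
            if score > best_score:
                best_score, best_cluster = score, cluster
        if best_cluster is not None and best_score >= threshold:
            best_cluster.append(item)
        else:
            clusters.append([item])
    return clusters

photos = [
    {"tags": {"soccer", "barcelona"}, "title": "match", "city": "Barcelona"},
    {"tags": {"soccer", "barcelona", "camp"}, "title": "match", "city": "Barcelona"},
    {"tags": {"concert"}, "title": "gig", "city": "Amsterdam"},
]
clusters = incremental_clustering(photos, toy_similarity, threshold=0.5)
```

With these made-up photos, the two soccer photos share a cluster and the concert photo starts its own.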
Filtering
1. Time + Location
– Time: outside the time frame
– Location: outside a radius of the central point
2. Tag + Text: query expansion
3. Visual: concept list
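The first filtering step can be sketched as a predicate that rejects clusters falling outside a time frame or outside a radius around a central point. The cluster fields (`time`, `lat`, `lon`), the 50 km radius, the dates, and the sample clusters are hypothetical; the slides do not say how a cluster's time and location are summarized.

```python
import math
from datetime import datetime

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance via the haversine formula, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def passes_time_location(cluster, start, end, center, max_km):
    """Keep a cluster only if it lies inside the time frame and the radius."""
    in_time = start <= cluster["time"] <= end
    dist = great_circle_km(cluster["lat"], cluster["lon"], center[0], center[1])
    return in_time and dist <= max_km

# Made-up cluster summaries, for illustration only.
sample_clusters = [
    {"id": "a", "time": datetime(2009, 5, 10, 20, 0), "lat": 41.3809, "lon": 2.1228},
    {"id": "b", "time": datetime(2009, 5, 10, 20, 0), "lat": 41.9028, "lon": 12.4964},
    {"id": "c", "time": datetime(2008, 1, 1, 12, 0), "lat": 41.3809, "lon": 2.1228},
]
barcelona = (41.3851, 2.1734)
kept = [
    c for c in sample_clusters
    if passes_time_location(c, datetime(2009, 5, 1), datetime(2009, 5, 31, 23, 59), barcelona, 50.0)
]
```

Cluster "b" is dropped for being too far from the central point and cluster "c" for being outside the time frame.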
Tag + Text Filtering
• Use the Flickr API to construct the query
– Tag: flickr.tags.getClusters
– Text: flickr.photos.search
• Use the online event directory last.fm to retrieve tag and text information
• Filter the clusters with the same similarity metric: $w_t s_t + w_x s_x$
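As a rough sketch of the query-construction step, the two Flickr API methods named above can be called through Flickr's REST endpoint. The helper `flickr_url`, the placeholder API key, and the example search terms are assumptions; the slides do not specify how the responses are turned into the expanded query, so that part is left out.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"

def flickr_url(method, api_key, **params):
    """Build a Flickr REST request URL that asks for a plain JSON response."""
    query = {"method": method, "api_key": api_key, "format": "json", "nojsoncallback": 1}
    query.update(params)
    return FLICKR_REST + "?" + urlencode(query)

# Tag expansion: clusters of tags related to a seed tag.
tag_url = flickr_url("flickr.tags.getClusters", "YOUR_API_KEY", tag="soccer")

# Text expansion: photos whose title, description, or tags match free text.
text_url = flickr_url("flickr.photos.search", "YOUR_API_KEY", text="Parc del Forum")
```

The URLs are only constructed here, not fetched; a real run needs a valid API key and an HTTP client.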
Example Query
Visual Filtering
• Filter out clusters with invalid concepts
• E.g. the list for a soccer event:

Concept        Threshold
Beach          0.3
Flower Scene   0.4
Infant         0.3
…
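One way to read the concept list is as a set of per-concept rejection thresholds. The sketch below assumes that each cluster carries a score per visual concept (for example an averaged detector confidence) and that a cluster is dropped when any listed concept scores above its threshold. Both the `concept_scores` field and the direction of the comparison are assumptions; only the three concepts shown on the slide are included.

```python
# Thresholds from the soccer example; the remainder of the list is elided in the source.
SOCCER_INVALID_CONCEPTS = {"Beach": 0.3, "Flower Scene": 0.4, "Infant": 0.3}

def has_invalid_concept(concept_scores, invalid_thresholds):
    """True if any listed concept scores above its threshold (assumed rule)."""
    return any(
        concept_scores.get(concept, 0.0) > threshold
        for concept, threshold in invalid_thresholds.items()
    )

def visual_filter(clusters, invalid_thresholds):
    """Keep only clusters without an invalid concept."""
    return [
        c for c in clusters
        if not has_invalid_concept(c["concept_scores"], invalid_thresholds)
    ]

# Made-up cluster scores, for illustration only.
scored_clusters = [
    {"id": "stadium", "concept_scores": {"Beach": 0.05, "Infant": 0.1}},
    {"id": "seaside", "concept_scores": {"Beach": 0.7}},
]
visually_kept = visual_filter(scored_clusters, SOCCER_INVALID_CONCEPTS)
```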
Training
• Setup
– No training set from the organizer
– Compiled from a subset of the upcoming dataset
– Additional random photos from Flickr
• Result
– 80% on F1 evaluation after clustering
– 40% on F1 evaluation after filtering
Result
• Query expansion
– Challenge 1: Barcelona, Rome, soccer
– Challenge 2: Paradiso, Parc del Forum
• Runs
– Different thresholds µ for the tag + text filtering
Performance

Challenge 1
Metric      µ:0.2    µ:0.1    µ:0.05
Precision   12.53%   62.88%   84.86%
Recall      58.79%   52.93%   52.54%
F1          20.65%   57.48%   64.9%
NMI         0.1166   0.2207   0.2367

Challenge 2
Metric      µ:0.2    µ:0.1    µ:0.05   µ:0.1 last.fm
Precision   38.5%    59.26%   66.89%   56.16%
Recall      66.34%   43.9%    6.04%    18.9%
F1          48.72%   50.44%   11.07%   28.28%
NMI         0.2941   0.448    0.2705   0.4491
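As a sanity check on the tables, F1 is the harmonic mean of precision and recall, $F_1 = \frac{2PR}{P + R}$. The short sketch below recomputes it from the reported precision and recall of the three-column table; the function and variable names are illustrative, and small differences from the reported values are expected because the inputs are already rounded.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %, reported F1 %) per threshold µ, copied from the table.
reported_runs = {
    "0.2": (12.53, 58.79, 20.65),
    "0.1": (62.88, 52.93, 57.48),
    "0.05": (84.86, 52.54, 64.9),
}

for mu, (p, r, reported) in reported_runs.items():
    print(f"µ={mu}: recomputed F1 = {f1(p, r):.2f}%, reported = {reported}%")
```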
Summary
• Simple clustering and filtering algorithm
[Figure legend: correct result, incorrect result, didn’t find]
Future work
• Thorough result analysis on available ground-truth
• Refine the filtering process
• Incorporate methods to merge and rank clusters
Thoughts for SED 2012 (and beyond?)
• Provide a common training set?
– E.g. 2009 photos for training, 2010 for evaluation
• TREC-style ranked-list evaluation
– E.g. AP, F1 vs depth, so as to easily see how an algorithm (could) easily achieve
• Accommodate other event definitions?
– Multi-city long-lasting events, e.g. Olympic torch relay http://www.flickr.com/search/?q=olympic+torch+relay+2010&s=rec
– Recurring events, e.g. French Open Tennis
The end