An overview of University of Athens' work on INSIGHT's Twitter Intelligent Sensor Agent.
INSIGHT: Intelligent Synthesis and Real Time Response using Massive Streaming of Heterogeneous Data
TwitterISA
Ioannis Katakis Univ. of Athens
Contents
The Twitter ISA
Classifying Traffic Related Tweets in Dublin, and the Twitter ISA
Complementarity of Event Detection Methods
Identifying Noisy Hashtags
Evaluating the Sample Quality
Purpose of the Twitter ISA
Advantages of Social Sensors
Richer information about the event (description in natural language)
Multi-modal content (text, image, sound, video)
Mobile
Low cost. People will volunteer.
Can be asked any question (by crowdsourcing)
Analyze the Social Stream in Real-Time and Identify Events
Twitter ISA in the INSIGHT Architecture
Twitter-ISA Architecture
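[Figure: Twitter-ISA architecture. Components shown: Twitter Streaming API; Twitter Process (historical data, real-time JSON stream); Twitter Agent; Twitter Model (traffic + floods classifier); Round Table Manager with round tables RT1, RT2, …, RTN. Labeled interactions: Join Table, Leave Table, Query, data, discussion, anomaly.]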
Classifying Traffic Related Tweets in Dublin
Current Situation & Problem
Twitter Services that inform people about traffic issues (@LiveDrive, @RoadWatch, @GardaTraffic)
Citizen Tweets about traffic
> Can we automatically identify citizen traffic related tweets?
Solution
Training the Text Classifier
We could use the services' tweets (they talk about traffic), but they are not formatted like citizen tweets
Assumption: Tweets that @mention one of the services are talking about traffic
Build a classifier on those tweets
Extend the Twitter Dublin-Stream by following users from Dublin
Precision: 70%
> A classifier that identifies traffic related tweets
Dimitrios Kotzias, Theodoros Lappas, Dimitrios Gunopulos: Addressing the Sparsity of Location Information on Twitter. EDBT/ICDT Workshops 2014: 339-346
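The labeling assumption above can be sketched in a few lines. This is a minimal illustration, not the classifier used in the Twitter ISA: the slides do not name the learning algorithm, so a small multinomial Naive Bayes stands in for it, and the example tweets, the `weak_label` helper, and the `NaiveBayes` class are all hypothetical. Only the three service handles come from the slides.

```python
import math
import re
from collections import Counter

# Service accounts named on the slides; everything else here is illustrative.
SERVICE_ACCOUNTS = {"@livedrive", "@roadwatch", "@gardatraffic"}

def tokenize(text):
    """Lowercase word tokens. @mentions are dropped so the model cannot
    simply memorize the handles that were used to create the labels."""
    text = re.sub(r"@\w+", " ", text.lower())
    return re.findall(r"[a-z0-9#']+", text)

def weak_label(text):
    """Assumed labeling rule: a tweet that mentions a service is about traffic."""
    mentions = {m.lower() for m in re.findall(r"@\w+", text)}
    return bool(mentions & SERVICE_ACCOUNTS)

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, texts, labels):
        self.word_counts = {True: Counter(), False: Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens, label):
        total = sum(self.word_counts[label].values())
        prior = self.class_counts[label] / sum(self.class_counts.values())
        score = math.log(prior)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)

# Made-up training tweets, for illustration only.
tweets = [
    "@LiveDrive huge traffic jam on the M50 northbound",
    "@GardaTraffic accident blocking the quays, traffic at a standstill",
    "@RoadWatch long delays and heavy traffic near the port tunnel",
    "lovely coffee and cake in town this morning",
    "watching the match tonight with friends",
    "new album out today, cannot stop listening",
]
labels = [weak_label(t) for t in tweets]
model = NaiveBayes().fit(tweets, labels)
print(model.predict("stuck in traffic on the M50 again"))
```

Dropping the @mentions before training matters here: otherwise the model would learn the labeling rule itself rather than the vocabulary citizens use when they tweet about traffic.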
Complementarity of Event Detection Methods
Problem
Study a set of event detection techniques using different sources of information of the same stream
Active Users
Sentiment Analysis
Social Graph
London Dataset (10 Days, 700K Tweets)
Activating Users
> Correlation between active users and events
> Events motivate users to say something on Twitter
Unique Users Participating in each time segment
Severe thunderstorms in Germany (9/6 & 11/6)
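Counting unique users per time segment could look roughly like the sketch below. The one-hour segment length, the mean-plus-two-standard-deviations burst rule, and the function names are assumptions made for illustration; the slides do not specify how segments were chosen or how peaks were identified. The input data is synthetic.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev

def unique_users_per_segment(tweets, start, segment=timedelta(hours=1)):
    """tweets: iterable of (timestamp, user_id) pairs, all at or after `start`.
    Returns a list whose i-th entry is the number of distinct users
    who tweeted during the i-th segment."""
    users = defaultdict(set)
    for ts, user in tweets:
        users[(ts - start) // segment].add(user)  # timedelta // timedelta -> int
    n = max(users) + 1 if users else 0
    return [len(users.get(i, ())) for i in range(n)]

def flag_bursts(counts, z=2.0):
    """Indices of segments whose count exceeds mean + z * std (assumed rule)."""
    mu, sigma = mean(counts), pstdev(counts)
    return [i for i, c in enumerate(counts) if sigma > 0 and c > mu + z * sigma]

# Synthetic stream: two users per hour, except ten users in hour 4.
start = datetime(2014, 1, 1)
tweets = [(start + timedelta(hours=h, minutes=m), f"user{m}")
          for h in range(6) for m in range(10 if h == 4 else 2)]
# A second tweet by the same user in the same hour does not raise the count.
tweets.append((start + timedelta(minutes=30), "user0"))

counts = unique_users_per_segment(tweets, start)
print(counts, flag_bursts(counts))
```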
Sentiment Analysis
> Mostly sad events…
> Independent of the number of users participating
[Figure: sentiment plot with series labeled Positive and Negative]
Emotion Change Detection for Event Identification: online detection of changes in the emotional data distribution
Anger, fear, disgust, happiness, sadness, surprise.
Valkanas G., Gunopulos D., "How the Live Web Feels About Events", CIKM 2013
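One simple way to picture "a change in the emotional data distribution" is to compare the emotion histogram of each time segment with that of the previous one. The sketch below does this with the Jensen-Shannon divergence and a fixed threshold. Both are assumptions chosen for illustration and are not taken from the cited paper; the per-segment counts are invented.

```python
import math

EMOTIONS = ["anger", "fear", "disgust", "happiness", "sadness", "surprise"]

def normalize(counts):
    """Turn raw emotion counts into a probability vector.
    Add-one smoothing keeps every probability strictly positive."""
    total = sum(counts.get(e, 0) for e in EMOTIONS)
    return [(counts.get(e, 0) + 1) / (total + len(EMOTIONS)) for e in EMOTIONS]

def kl_divergence(p, q):
    return sum(a * math.log2(a / b) for a, b in zip(p, q))

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def detect_changes(segments, threshold=0.1):
    """Indices of segments whose emotion distribution differs from the
    previous segment by more than `threshold` (a hypothetical value)."""
    dists = [normalize(s) for s in segments]
    return [i for i in range(1, len(dists))
            if js_divergence(dists[i - 1], dists[i]) > threshold]

# Invented counts of tweets per emotion in four consecutive segments.
segments = [
    {"happiness": 50, "sadness": 10, "anger": 5},
    {"happiness": 48, "sadness": 12, "anger": 6},
    {"fear": 60, "sadness": 20, "happiness": 5},   # sudden shift towards fear
    {"fear": 55, "sadness": 25, "happiness": 6},
]
print(detect_changes(segments))
```

Because each segment is compared only with its predecessor, the check runs online: a new segment can be scored as soon as its counts are available.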
User to User Interactions
1. Extract the social graph (of each time segment) based on the reply tweets (reply = connection)
2. Display the largest connected component of this graph as a time series
Largest Connected Component vs Time
> Detection Methods are Complementary
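The two steps for the reply graph can be sketched as follows, treating each reply as an undirected edge and using union-find to get component sizes. The input format (segment index, replying user, replied-to user) and the function names are assumptions for illustration; the reply data is made up.

```python
from collections import defaultdict

def largest_component_size(edges):
    """Size of the largest connected component of an undirected graph
    given as (u, v) pairs. Returns 0 for an empty edge list."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    sizes = defaultdict(int)
    for node in list(parent):
        sizes[find(node)] += 1
    return max(sizes.values(), default=0)

def lcc_time_series(replies, num_segments):
    """replies: iterable of (segment_index, replying_user, replied_to_user).
    Returns the largest-component size for each time segment."""
    by_segment = defaultdict(list)
    for seg, src, dst in replies:
        by_segment[seg].append((src, dst))
    return [largest_component_size(by_segment.get(i, []))
            for i in range(num_segments)]

# Made-up replies: segment 1 contains a chain a-b-c-d, i.e. a larger conversation.
replies = [
    (0, "a", "b"), (0, "c", "d"),
    (1, "a", "b"), (1, "b", "c"), (1, "c", "d"), (1, "e", "f"),
    (2, "x", "y"),
]
series = lcc_time_series(replies, num_segments=4)
print(series)
```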
Identifying noisy hashtags
D. Kotsakos, P. Sakkos, I. Katakis, D. Gunopulos, “#tag: Meme or Event?”, The 2014 IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM 2014), Beijing, China, August 17-20, 2014.
#Hashtags
Add valuable meta-knowledge to text that is by nature limited in length
#Events
Track events using hashtags #worldcup2014
#Memes
Promote certain ideas or discussions
Celebrity fans – target trends list of the platform
Advertising
Hashtags, Events and Memes
Memes vs Events
Events can be traced back to both the news stream and the social stream, whereas memes appear only in the social stream
Memes are not inherently detrimental. However, due to their volume they can be noise for some tasks (e.g. event detection)
Many event detection applications are affected by these noisy meme-#hashtags
We developed a method that distinguishes between Event-Hashtags and Meme-Hashtags by using machine learning classifiers
Features for the Classifier
Text Features
TokensPerTweet
hashTagsPerTweet
urlsPerTweet
mediasPerTweet
favoritesPerTweet
retweetsPerTweet
Social Features
replyTweets
mentionsPerTweet
tweetsPerUser
uniqueUsersCount
userFollowersPerUser
userFriendsPerUser
listedCountPerUser
avgVerifiedUsers
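As a rough illustration of how such a feature vector might be computed for one hashtag, the sketch below aggregates a handful of the listed features over the tweets that carry it. The slides give only the feature names, so the definitions used here are assumptions: each "PerTweet" feature is taken as an average over tweets, and replyTweets as the fraction of tweets that are replies. The flat tweet dictionaries are also a hypothetical, simplified schema.

```python
def hashtag_features(tweets):
    """Aggregate features over all tweets carrying one hashtag.
    tweets: non-empty list of dicts in a simplified, hypothetical schema."""
    n = len(tweets)
    users = {t["user"] for t in tweets}
    return {
        # Text features (assumed to be per-tweet averages)
        "tokensPerTweet": sum(len(t["text"].split()) for t in tweets) / n,
        "hashTagsPerTweet": sum(len(t["hashtags"]) for t in tweets) / n,
        "urlsPerTweet": sum(len(t["urls"]) for t in tweets) / n,
        "retweetsPerTweet": sum(t["retweets"] for t in tweets) / n,
        # Social features (definitions assumed)
        "replyTweets": sum(1 for t in tweets if t["is_reply"]) / n,
        "mentionsPerTweet": sum(len(t["mentions"]) for t in tweets) / n,
        "tweetsPerUser": n / len(users),
        "uniqueUsersCount": len(users),
    }

# Made-up tweets carrying #worldcup2014.
tweets = [
    {"user": "u1", "text": "Goal! #worldcup2014", "hashtags": ["worldcup2014"],
     "urls": [], "mentions": [], "retweets": 10, "is_reply": False},
    {"user": "u2", "text": "What a match #worldcup2014 #bra",
     "hashtags": ["worldcup2014", "bra"], "urls": ["http://example.com"],
     "mentions": ["u1"], "retweets": 2, "is_reply": True},
    {"user": "u1", "text": "Full time #worldcup2014", "hashtags": ["worldcup2014"],
     "urls": [], "mentions": [], "retweets": 0, "is_reply": False},
]
features = hashtag_features(tweets)
print(features)
```

One such vector per hashtag, paired with an event or meme label, would be the input to the classifier.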
Results
We can accurately distinguish between events and meme hashtags
Most informative features (Information Gain)
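For reference, information gain measures how much knowing a feature reduces the entropy of the class label. The sketch below computes it for a numeric feature split at a single threshold. The threshold split, the feature names `featureA` and `featureB`, and the toy values are assumptions for illustration; they are not the features or results reported on the slide.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (left, right) if part)
    return entropy(labels) - conditional

# Toy data: four hashtags, two hypothetical features.
labels = ["event", "event", "meme", "meme"]
toy_features = {
    "featureA": [0.8, 0.7, 0.1, 0.2],  # separates the classes perfectly
    "featureB": [0.1, 0.9, 0.1, 0.9],  # carries no class information
}
gains = {name: information_gain(vals, labels, threshold=0.5)
         for name, vals in toy_features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(gains, ranking)
```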
Is the Sample Good Enough?
G. Valkanas, I. Katakis, D. Gunopulos, A. Stefanidis, “Mining Twitter Data with Resource Constraints”, The 2014 IEEE / WIC / ACM International Conference on Web Intelligence, 11-14 August 2014, Warsaw, Poland.
Main Research Question
We compare against the 10% sample (Garden Hose)
Is the 1% sample provided by the Twitter API sufficient for spatio-temporal analysis tasks? … which tasks?
Problem: Even though we use methods to extend the Twitter stream (e.g. following specific users), the 1% constraint remains an issue for many tasks.
Tasks we look into
Sentiment Analysis
Geo-located information
Popular tweets
Social Graph Evolution
Linguistic Analysis
Results & Conclusions
The two streams are similar when it comes to geo-located information, sentiment analysis, and the social graph
… but they differ when it comes to looking into details (e.g. if you try to find the most re-tweeted tweets)
(An Experiment…)
1. Identify the most retweeted tweets by analyzing both samples.
2. Compare these lists against the ground truth (since this information is included in the tweet)
How similar the top-N most retweeted tweets extracted from the samples are to the ground truth. The 10% sample approximates the ground truth better.
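The comparison in this experiment can be pictured with a small sketch. The slides do not state which similarity measure was used, so plain set overlap between the two top-N lists is assumed here. The retweet counts are invented to illustrate the mechanics and are not measurements.

```python
def top_n(retweet_counts, n):
    """IDs of the n tweets with the highest counts (ties broken by ID)."""
    ranked = sorted(retweet_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tweet_id for tweet_id, _ in ranked[:n]]

def overlap_at_n(sample_counts, truth_counts, n):
    """Fraction of the true top-n tweets that also appear in the sample's top-n."""
    truth = set(top_n(truth_counts, n))
    sample = set(top_n(sample_counts, n))
    return len(truth & sample) / n

# Invented numbers: true retweet counts, and retweets observed in each sample.
ground_truth = {"t1": 1000, "t2": 800, "t3": 600, "t4": 400, "t5": 200}
sample_10pct = {"t1": 98, "t2": 85, "t3": 55, "t4": 38, "t5": 25}
sample_1pct = {"t1": 9, "t3": 8, "t5": 4, "t2": 3}

overlap_10 = overlap_at_n(sample_10pct, ground_truth, n=3)
overlap_1 = overlap_at_n(sample_1pct, ground_truth, n=3)
print(overlap_10, overlap_1)
```

With a smaller sample, the observed retweet counts are noisier, so tweets can swap ranks and drop out of the top-N list.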
Thank You! Questions?