INSIGHT: Intelligent Synthesis and Real Time Response using Massive Streaming of Heterogeneous Data. Twitter ISA. Ioannis Katakis, Univ. of Athens

Twitter Intelligent Sensor Agent


DESCRIPTION

An overview of the University of Athens' work on INSIGHT's Twitter Intelligent Sensor Agent.


Page 1: Twitter Intelligent Sensor Agent

INSIGHT: Intelligent Synthesis and Real Time Response using Massive Streaming of Heterogeneous Data

Twitter ISA

Ioannis Katakis, Univ. of Athens

Page 2: Twitter Intelligent Sensor Agent


Contents


The Twitter ISA

Classifying Traffic Related Tweets in Dublin, and the Twitter ISA

Complementarity of Event Detection Methods

Identifying Noisy Hashtags

Evaluating the Sample Quality

Page 3: Twitter Intelligent Sensor Agent


Purpose of the Twitter ISA

Advantages of Social Sensors

Richer information about the event (description in natural language)

Multi-modal content (text, image, sound, video)

Mobile

Low cost. People will volunteer.

Can be asked any question (by crowdsourcing)


The Twitter ISA

Analyze the Social Stream in Real-Time and Identify Events

Page 4: Twitter Intelligent Sensor Agent


Twitter ISA in the INSIGHT Architecture


Page 5: Twitter Intelligent Sensor Agent


Twitter-ISA Architecture


(Figure: Twitter-ISA architecture. Components: Twitter Streaming API; Twitter Process (historical data, real-time JSON stream); Twitter Agent; Twitter Model (Traffic + Floods Classifier); Round Tables RT1 … RTN and a Round Table Manager. Message labels: Join Table, Leave Table, Query, data, discussion, anomaly.)

Page 6: Twitter Intelligent Sensor Agent


Classifying Traffic Related Tweets in Dublin

Page 7: Twitter Intelligent Sensor Agent


Current Situation & Problem

Twitter Services that inform people about traffic issues (@LiveDrive, @RoadWatch, @GardaTraffic)


Citizen Tweets about traffic

> Can we automatically identify citizens' traffic-related tweets?

Page 8: Twitter Intelligent Sensor Agent


Solution

Training the Text Classifier

We could use the services' tweets (they talk about traffic), but they are not formatted like citizen tweets

Assumption: tweets that @mention one of the services are talking about traffic

Build a classifier on those tweets

Extend the Twitter Dublin-Stream by following users from Dublin

Precision: 70%


> A classifier that identifies traffic-related tweets

D. Kotzias, T. Lappas, D. Gunopulos, “Addressing the Sparsity of Location Information on Twitter”, EDBT/ICDT Workshops 2014, pp. 339-346.
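The weak-labelling idea on this slide can be sketched as follows. This is a minimal sketch, not the authors' implementation: the deck does not name the classifier, so a multinomial naive Bayes model stands in for it, and the tokenizer, the hypothetical `weak_label` helper and the toy tweets are our own assumptions.

```python
import math
import re
from collections import Counter

# Service accounts listed on the slide. Following the slide's assumption, a tweet
# that @mentions one of them is weakly labelled as traffic-related.
SERVICES = {"@livedrive", "@roadwatch", "@gardatraffic"}
TOKEN = re.compile(r"[@#]?\w+")


def tokenize(text):
    return TOKEN.findall(text.lower())


def weak_label(text):
    """Return (tokens with the service mentions removed, is_traffic label)."""
    tokens = tokenize(text)
    label = any(t in SERVICES for t in tokens)
    # Drop the mentions so the model cannot learn the label from them directly.
    return [t for t in tokens if t not in SERVICES], label


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed stand-in model)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        known = [t for t in tokens if t in self.vocab]  # ignore unseen words
        v = len(self.vocab)

        def log_score(c):
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v)) for t in known)

        return max(self.classes, key=log_score)


# Toy, hypothetical training tweets.
tweets = [
    "Heavy traffic on the M50 northbound @GardaTraffic",
    "Crash near the port tunnel, avoid @LiveDrive",
    "Lovely coffee in Temple Bar this morning",
    "Great gig last night in Dublin",
]
docs, labels = zip(*(weak_label(t) for t in tweets))
model = NaiveBayes().fit(list(docs), list(labels))
print(model.predict(tokenize("Traffic jam on the M50")))  # True
```

Removing the service mentions before training matters: otherwise the only perfectly predictive feature would be the mention itself, which citizen tweets that do not address a service never contain.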

Page 9: Twitter Intelligent Sensor Agent


Complementarity of Event Detection Methods

Page 10: Twitter Intelligent Sensor Agent


Problem

Study a set of event detection techniques that use different sources of information from the same stream


Active Users

Sentiment Analysis

Social Graph

London Dataset (10 Days, 700K Tweets)

Page 11: Twitter Intelligent Sensor Agent


Activating Users


> Correlation between active users and events

> Events motivate users to say something on Twitter

Unique Users Participating in each time segment

Severe thunderstorms in Germany (9/6 & 11/6)
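The "unique users per time segment" signal can be computed as below. This is a minimal sketch under stated assumptions: the `(timestamp, user_id)` input format, the one-hour segment length and the spike rule (count above the trailing mean plus `k` standard deviations) are our own choices, not the method used in the study.

```python
from collections import defaultdict
from statistics import mean, pstdev


def unique_users_per_segment(tweets, segment_seconds=3600):
    """Number of distinct users tweeting in each fixed-length time segment.

    tweets: iterable of (unix_timestamp, user_id) pairs (assumed input format).
    """
    users = defaultdict(set)
    for ts, user in tweets:
        users[int(ts // segment_seconds)].add(user)
    if not users:
        return []
    first, last = min(users), max(users)
    # Emit empty segments as 0 so the series has no gaps.
    return [len(users.get(s, ())) for s in range(first, last + 1)]


def flag_spikes(series, window=3, k=2.0):
    """Flag a segment whose count exceeds mean + k * std of the preceding `window` segments."""
    flags = []
    for i, x in enumerate(series):
        past = series[max(0, i - window):i]
        flags.append(len(past) == window and x > mean(past) + k * pstdev(past))
    return flags


print(flag_spikes([5, 5, 6, 5, 20]))  # [False, False, False, False, True]
```

Counting distinct users rather than tweets keeps a single very active account from looking like an event.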

Page 12: Twitter Intelligent Sensor Agent


Sentiment Analysis

> Mostly sad events…

> Independent of the number of users participating


(Figure: Positive vs. Negative sentiment)

Emotion Change Detection for Event Identification: online detection of changes in the emotional data distribution

Anger, fear, disgust, happiness, sadness, surprise.

G. Valkanas, D. Gunopulos, “How the Live Web Feels About Events”, CIKM 2013.
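One simple way to detect a change in the emotional distribution is to compare the emotion histograms of consecutive time windows. The sketch below uses Jensen-Shannon divergence with add-one smoothing and a fixed threshold. This is a swapped-in illustration, not the method of the cited CIKM 2013 paper, and it assumes each tweet has already been assigned one of the six emotions listed on the slide.

```python
import math
from collections import Counter

EMOTIONS = ("anger", "fear", "disgust", "happiness", "sadness", "surprise")


def distribution(labels, alpha=1.0):
    """Smoothed emotion distribution of one time window (labels are emotion names)."""
    counts = Counter(labels)
    total = len(labels) + alpha * len(EMOTIONS)
    return [(counts[e] + alpha) / total for e in EMOTIONS]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = maximally different)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def detect_changes(windows, threshold=0.1):
    """Indices of windows whose emotion distribution departs from the previous window."""
    dists = [distribution(w) for w in windows]
    return [i for i in range(1, len(dists))
            if js_divergence(dists[i - 1], dists[i]) > threshold]


windows = [
    ["happiness"] * 20,
    ["happiness"] * 18 + ["surprise"] * 2,
    ["sadness"] * 15 + ["fear"] * 5,
]
print(detect_changes(windows))  # [2]
```

The threshold of 0.1 bits is arbitrary; in practice it would be tuned on labelled events.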

Page 13: Twitter Intelligent Sensor Agent


User to User Interactions


1. Extract the social graph of each time segment from the reply tweets (a reply = an edge between two users)

2. Plot the size of the largest connected component of this graph as a time series

Largest Connected Component vs. Time

> Detection Methods are Complementary
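The two steps above can be sketched with a union-find structure. This is a minimal sketch assuming replies arrive as `(timestamp, replying_user, replied_to_user)` triples and one-hour segments; both the input format and the segment length are illustrative assumptions.

```python
from collections import defaultdict


def largest_component_size(edges):
    """Size of the largest connected component of an undirected graph given as an edge list."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # union the two components
    sizes = defaultdict(int)
    for node in list(parent):
        sizes[find(node)] += 1
    return max(sizes.values(), default=0)


def lcc_time_series(replies, segment_seconds=3600):
    """replies: iterable of (unix_timestamp, replying_user, replied_to_user) triples."""
    edges = defaultdict(list)
    for ts, src, dst in replies:
        edges[int(ts // segment_seconds)].append((src, dst))
    if not edges:
        return []
    first, last = min(edges), max(edges)
    return [largest_component_size(edges.get(s, [])) for s in range(first, last + 1)]


replies = [(0, "a", "b"), (10, "b", "c"), (20, "d", "e"),
           (3600, "x", "y"),
           (7300, "p", "q"), (7400, "q", "r"), (7500, "r", "s")]
print(lcc_time_series(replies))  # [3, 2, 4]
```

Replies are treated as undirected edges, since the question is only whether users are connected through conversations.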

Page 14: Twitter Intelligent Sensor Agent


Identifying noisy hashtags

D. Kotsakos, P. Sakkos, I. Katakis, D. Gunopulos, “#tag: Meme or Event?”, The 2014 IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM 2014), Beijing, China, August 17-20, 2014.

Page 15: Twitter Intelligent Sensor Agent


#Hashtags

Add valuable meta-knowledge to text that is by nature limited in length

#Events

Track events using hashtags (e.g. #worldcup2014)

#Memes

Promote certain ideas or discussions

Celebrity fans: target the platform's trends list

Advertising

Hashtags, Events and Memes


Page 16: Twitter Intelligent Sensor Agent


Memes vs Events


Events can be traced back to both the news stream and the social stream, whereas memes appear only in the social stream

Memes are not inherently detrimental. However, due to their volume, they can act as noise for some tasks (e.g. event detection)

Many event detection applications are affected by these noisy meme hashtags

We developed a method that distinguishes between Event-Hashtags and Meme-Hashtags by using machine learning classifiers

Page 17: Twitter Intelligent Sensor Agent


Features for the Classifier

Text Features

TokensPerTweet

hashTagsPerTweet

urlsPerTweet

mediasPerTweet

favoritesPerTweet

retweetsPerTweet


Social Features

replyTweets

mentionsPerTweet

tweetsPerUser

uniqueUsersCount

userFollowersPerUser

userFriendsPerUser

listedCountPerUser

avgVerifiedUsers
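Several of the listed features are per-hashtag averages over all tweets that carry the hashtag. The sketch below computes a subset of them. The input dict keys are hypothetical simplifications of tweet JSON, and the exact feature definitions (for example, that `retweetsPerTweet` averages each tweet's retweet count) are our interpretation of the feature names, not the paper's specification.

```python
from statistics import mean


def hashtag_features(tweets):
    """Aggregate features over all tweets containing one hashtag.

    Each tweet is a dict with hypothetical keys: text, user, hashtags, urls,
    mentions, retweet_count, followers (the author's follower count).
    """
    if not tweets:
        raise ValueError("need at least one tweet")
    n = len(tweets)
    users = {t["user"] for t in tweets}
    followers = {t["user"]: t["followers"] for t in tweets}  # last value seen per user
    return {
        "tokensPerTweet": sum(len(t["text"].split()) for t in tweets) / n,
        "hashTagsPerTweet": sum(len(t["hashtags"]) for t in tweets) / n,
        "urlsPerTweet": sum(len(t["urls"]) for t in tweets) / n,
        "retweetsPerTweet": sum(t["retweet_count"] for t in tweets) / n,
        "mentionsPerTweet": sum(len(t["mentions"]) for t in tweets) / n,
        "tweetsPerUser": n / len(users),
        "uniqueUsersCount": len(users),
        "userFollowersPerUser": mean(followers.values()),
    }


sample = [
    {"text": "go #worldcup2014 goal", "user": "a", "hashtags": ["worldcup2014"],
     "urls": [], "mentions": [], "retweet_count": 4, "followers": 100},
    {"text": "what a match #worldcup2014 http://x.co", "user": "a",
     "hashtags": ["worldcup2014"], "urls": ["http://x.co"], "mentions": [],
     "retweet_count": 2, "followers": 100},
    {"text": "#worldcup2014 @fifa nice", "user": "b", "hashtags": ["worldcup2014"],
     "urls": [], "mentions": ["fifa"], "retweet_count": 0, "followers": 300},
]
print(hashtag_features(sample)["tweetsPerUser"])  # 1.5
```

Each hashtag becomes one feature vector, which is then labelled as event or meme for training.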

Page 18: Twitter Intelligent Sensor Agent


Results


We can accurately distinguish between event hashtags and meme hashtags

Most informative features (Information Gain)
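Information gain ranks features by how much knowing a feature's value reduces the entropy of the event/meme label: IG(Y; X) = H(Y) - Σ_v P(X = v) H(Y | X = v). Below is a minimal sketch for discrete features. Splitting numeric features at the median is our assumption; the deck does not say how continuous features were discretized.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v), for a discrete feature X."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def binarize_at_median(values):
    """Turn a numeric feature into a two-valued one (above / not above the median)."""
    m = median(values)
    return [v > m for v in values]


labels = ["event", "event", "meme", "meme"]
urls_per_tweet = [3, 2, 0, 1]  # toy values
print(information_gain(binarize_at_median(urls_per_tweet), labels))  # 1.0
```

A value of 1.0 bit here means the (toy) feature separates the two balanced classes perfectly; an uninformative feature scores 0.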

Page 19: Twitter Intelligent Sensor Agent


Is the Sample Good Enough?

G. Valkanas, I. Katakis, D. Gunopulos, A. Stefanidis, “Mining Twitter Data with Resource Constraints”, The 2014 IEEE / WIC / ACM International Conference on Web Intelligence, 11-14 August 2014, Warsaw, Poland.

Page 20: Twitter Intelligent Sensor Agent


Main Research Question

We compare against the 10% sample (Garden Hose)


Is the 1% sample provided by the Twitter API sufficient for spatio-temporal analysis tasks? If so, for which tasks?

Problem: Even though we use methods to extend the Twitter stream (e.g. following specific users), the 1% constraint remains an issue for many tasks.

Page 21: Twitter Intelligent Sensor Agent


Tasks we look into

Sentiment Analysis

Geo-located information

Popular tweets

Social Graph Evolution

Linguistic Analysis


Page 22: Twitter Intelligent Sensor Agent


Results & Conclusions

The two samples are similar with respect to geo-located information, sentiment analysis and the social graph


…but they differ when it comes to the details (e.g. when trying to find the most retweeted tweets)

(An Experiment…)

1. Identify the most retweeted tweets by analyzing both samples.

2. Compare these lists against the ground truth (the true retweet counts, which are included in each tweet)

(Figure: similarity of the top-N most retweeted tweets extracted from each sample to the ground truth, plotted against N.) The 10% sample approximates the ground truth better.
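The experiment can be sketched as follows. This is a minimal sketch under stated assumptions: each observed retweet is flattened to an `(original_tweet_id, reported_retweet_count)` pair, the ground truth takes the largest reported count per original across both samples, and "similarity" is measured as overlap@N. The deck does not name its similarity measure, so overlap@N is our stand-in, and the data below is a toy example, not the study's results.

```python
from collections import Counter


def top_n_from_sample(sample, n):
    """Rank original tweets by how many of their retweets appear in the sample."""
    seen = Counter(tweet_id for tweet_id, _ in sample)
    return [tweet_id for tweet_id, _ in seen.most_common(n)]


def top_n_ground_truth(sample, n):
    """Rank originals by the (largest) retweet count reported inside the tweets."""
    reported = {}
    for tweet_id, count in sample:
        reported[tweet_id] = max(count, reported.get(tweet_id, 0))
    return sorted(reported, key=reported.get, reverse=True)[:n]


def overlap_at_n(candidate, truth):
    """Fraction of the ground-truth top-N that the candidate list recovers."""
    return len(set(candidate) & set(truth)) / len(truth) if truth else 0.0


# Toy data: (original_tweet_id, retweet_count reported in the retweet).
sample_1pct = [("a", 5), ("a", 6), ("a", 7), ("b", 500)]
sample_10pct = [("b", 480), ("b", 490), ("b", 500), ("b", 505), ("a", 7)]
truth = top_n_ground_truth(sample_1pct + sample_10pct, 1)
print(overlap_at_n(top_n_from_sample(sample_1pct, 1), truth))   # 0.0
print(overlap_at_n(top_n_from_sample(sample_10pct, 1), truth))  # 1.0
```

The toy case shows the failure mode the slide describes: in a small sample, a moderately retweeted tweet can be over-represented by chance, while the retweet count embedded in each tweet reveals the truly popular one.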

Page 23: Twitter Intelligent Sensor Agent


Thank You! Questions?