Developing Smart Cities Services through Semantic Analysis of Social Streams
Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops
(Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)
WDS4SC 2015: WWW 2015 Workshop on Web Data Science and Smart Cities
Florence (Italy) - May 19, 2015
Outline
• Background
  • Information Overload
  • Social Content Analytics
• CrowdPulse
  • Social Data Extraction
  • Semantic Tagging
  • Sentiment Analysis
  • Processing & Visualization
• Use Cases
  • L’Aquila Social Urban Network
  • The Italian Hate Map
• Conclusions
Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops. Developing Smart Cities Services through Semantic Analysis of Social Streams. WDS4SC 2015 Workshop, Florence (Italy) 19.05.2015
Background
Information Overload
• … in digital life
• … in real life
• Obstacle or Opportunity?
Background (again)
Social Networks
• can be considered novel data silos
• hold information about preferences
• hold information about connections
• hold information about people’s feelings
• changed the rules for content analytics
Social Content Analytics: Successful Use Cases
- Online brand monitoring
- Social CRM
- Real-time polls
All these applications share a common insight.
Social Content Analytics: Research Question
Is it possible to aggregate raw human-generated data to obtain complex people-based findings?
Our contribution: CrowdPulse
A framework for real-time Semantic Analysis of Social Streams
CrowdPulse: features
• Social Data Extraction
• Semantic Tagging
• Sentiment Analysis
• Processing & Visualization
CrowdPulse: workflow
CrowdPulse - Step 1: Social Data Extraction
Extraction is configured through a Source and a set of Heuristics:
• Content
• User
• Geo
• Content+Geo
• Page
• Group
Examples: #www2015, #democrats, #traffic, #earthquake, @barack_obama, @comunefi
We only extract public content.
CrowdPulse - Step 2: Semantic Tagging
Semantic Tagging: Motivations
Poor semantics: a keyword-based representation introduces a lot of noise into the analysis.
Example: the Italian word “aquila” may refer to an eagle or to the Italian city of L’Aquila.
“Fate qualcosa per favore, l’Aquila sta morendo!”
• (Please, do something: L’Aquila is dying!)
• (Please, do something: the eagle is dying!)
Solution: semantic processing of the extracted content
• Entity Linking Algorithms
  • Input: textual content
  • Output: identification and disambiguation of the entities mentioned in the text
Algorithms:
(1) TAGME, http://tagme.di.unipi.it
(2) DBpedia Spotlight, http://spotlight.dbpedia.org
Entity Linking: identification and disambiguation of the entities mentioned in the text.
• Non-trivial NLP tasks (stopword removal, n-gram identification, named entity recognition and disambiguation) are performed automatically.
• IMPORTANT: each entity is a reference to a Wikipedia page, e.g. http://it.wikipedia.org/wiki/Massimo_Cialente
We enriched the entity-based representation by exploiting the Wikipedia category tree.
• Many interesting (new) features come into play (e.g. Italian politics, L’Aquila mayors, Democrats politics).
• The final representation of each piece of content is obtained by merging the entities identified in the text with the most relevant Wikipedia categories each entity is linked to.
Features = Entities + Wikipedia Categories
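The rule "Features = Entities + Wikipedia Categories" can be illustrated with a minimal Python sketch. It is not the CrowdPulse implementation: the function name `build_features`, the `max_categories` cut-off (a stand-in for "most relevant") and the hard-coded entity-linker output are all assumptions made for illustration.

```python
def build_features(entities, categories_by_entity, max_categories=3):
    """Merge linked entities with their top Wikipedia categories.

    Order is preserved (entity first, then its categories) and
    duplicates are dropped. `max_categories` is an assumed stand-in
    for "the most relevant categories".
    """
    features = []
    seen = set()
    for entity in entities:
        candidates = [entity] + categories_by_entity.get(entity, [])[:max_categories]
        for feature in candidates:
            if feature not in seen:
                seen.add(feature)
                features.append(feature)
    return features


# Hypothetical output of an entity linker for one post.
entities = ["Massimo_Cialente", "L'Aquila"]
categories_by_entity = {
    "Massimo_Cialente": ["Italian politics", "L'Aquila mayors", "Democrats politics"],
}

features = build_features(entities, categories_by_entity)
```

With this input the post is represented by the two entities plus the three categories attached to the first one, so a keyword that never appears in the text (such as "Italian politics") still becomes a feature.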
CrowdPulse - Step 3: Sentiment Analysis
Sentiment Analysis: Motivations
Is this content conveying any opinion?
This is a crucial issue if people-based findings are to be generated.
Sentiment Analysis: Definition
“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*)
(*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008.
We concentrated on the polarity detection task.
Sentiment Analysis
How to develop an (unsupervised) sentiment analysis algorithm?
Sentiment Analysis: Lexicon
External lexical resources associate a polarity score with each term.
• joy: +++
• frustration: - -
Sentiment Analysis: SenticNet (*)
• Inspired by the Hourglass of Emotions model
• Each term is represented by the intensity of four basic emotional dimensions (sensitivity, aptitude, attention, pleasantness)
• The activation level of each dimension defines 16 basic emotions
• According to the triggered emotions, each term is given an aggregated polarity score
• SenticNet also models a sentiment score for some bigrams and trigrams
(*) Cambria, Erik, Daniel Olsher, and Dheeraj Rajagopal. "SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis." Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
Sentiment Analysis: Methodology
Insight: the polarity of a piece of textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it.
• A microphrase is built whenever a splitting cue is found in the text.
• Conjunctions, adverbs and punctuation marks are used as splitting cues.
• Example: “I don’t like this food, it’s terrible”
  • m1 = “I don’t like this food”
  • splitting cue = “,”
  • m2 = “it’s terrible”
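The splitting step can be sketched in a few lines of Python. The cue list below (punctuation plus a handful of English conjunctions and adverbs) is an assumption chosen for illustration; the slides do not enumerate the cues actually used by CrowdPulse.

```python
import re

# Assumed splitting cues: punctuation marks and a few English
# conjunctions/adverbs. The real cue list is not given in the slides.
SPLITTING_CUES = re.compile(
    r"[,.;:!?]+|\b(?:and|but|or|however|although)\b",
    re.IGNORECASE,
)


def split_microphrases(text):
    """Split a text into microphrases at every splitting cue."""
    parts = SPLITTING_CUES.split(text)
    # Drop empty fragments left by leading/trailing or adjacent cues.
    return [part.strip() for part in parts if part.strip()]


microphrases = split_microphrases("I don't like this food, it's terrible")
```

For the example sentence the comma is the only cue, so the function yields the two microphrases m1 and m2 shown on the slide.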
Sentiment Analysis: Methodology
Insight: the polarity of a piece of content (e.g. a Tweet) depends on the polarity of the microphrases which compose it.
$$\mathrm{pol}(T) = \sum_{i=1}^{k} \mathrm{pol}(m_i), \qquad T = \{m_1, \dots, m_k\}$$
where T is the content and m_i is its i-th microphrase.
The polarity of a microphrase depends on the polarity of the terms which compose it.
$$\mathrm{pol}(m_i) = \sum_{j=1}^{n} \mathrm{score}(t_j), \qquad m_i = \{t_1, \dots, t_n\}$$
where t_j is the j-th term of the microphrase.
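The two sums translate directly into code. The sketch below is a minimal illustration, not the CrowdPulse scorer: the lexicon values are invented placeholders (not SenticNet scores), tokenization is a simple regular expression, and terms missing from the lexicon are assumed to score 0.

```python
import re

# Toy polarity lexicon with invented scores (NOT taken from SenticNet).
TOY_LEXICON = {"like": 0.5, "terrible": -0.8, "joy": 0.9, "frustration": -0.6}


def microphrase_polarity(microphrase, lexicon):
    """pol(m_i): sum of the scores of the terms in the microphrase."""
    terms = re.findall(r"[a-z']+", microphrase.lower())
    # Assumption: terms that are not in the lexicon contribute 0.
    return sum(lexicon.get(term, 0.0) for term in terms)


def content_polarity(microphrases, lexicon):
    """pol(T): sum of the polarities of the microphrases of the content."""
    return sum(microphrase_polarity(m, lexicon) for m in microphrases)


tweet = ["I don't like this food", "it's terrible"]
polarity = content_polarity(tweet, TOY_LEXICON)
```

Note that a plain sum ignores negation: in this sketch "don't like" still contributes the positive score of "like", and the overall polarity is negative only because of "terrible". The slides do not say how negation is treated, so no handling is attempted here.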
CrowdPulse - Step 3: Sentiment Analysis
Overall sentiment: :-(
The process can be iterated over a larger set of content to obtain findings about how the population feels about a certain topic.
CrowdPulse - Step 4: Processing & Visualization
Step 4: Domain-specific processing
• Supervised learning: classification, regression tasks
• Unsupervised learning: clustering
• Linguistic Analysis: building word spaces, similarity between concepts, analysis of term usage, etc.
CrowdPulse natively supports all these methodologies.
The choice is typically scenario-dependent.
CrowdPulse - Step 4: Data Visualization
• An interactive analytics console is made available for each project.
• Descriptive statistics can be built in real time and shown immediately.
CrowdPulse: Recap
Use Cases
1. L’Aquila Social Urban Network
2. The Italian Hate Map
Use Cases: L’Aquila Social Urban Network
April 6, 2009
• 5.8 magnitude earthquake
• 20 billion in damages
• 70,000 people displaced
• 309 people died
2015: six years later
• 7 billion in funding still needed
• 22,000 people still displaced
• Diaspora
• 19 ‘new towns’ around L’Aquila; 15,200 people live there today
What about the consequences?
• Loss of trust, sense of belonging, relationships
• Loss of social capital
Research Question: Is it possible to extract and process social media to monitor, in real time, people’s feelings, opinions and sentiments about the current state of the social capital of L’Aquila?
We can use CrowdPulse.
L’Aquila Social Urban Network: CrowdPulse Settings
Heuristics:
- Twitter users (local newspapers, mentions of politicians)
- Twitter content+geo (50 km around L’Aquila, with specific hashtags such as #laquila, #earthquake, etc.)
- Facebook groups (identified after a thorough analysis)
- Facebook pages (identified after a thorough analysis)
Extracted content (examples):
- Tweets about the fear of new earthquakes
- Facebook posts about citizens’ proposals
- Tweets about people worried about the situation
- Tweets about new buildings in the city
Sentiment Analysis and Semantic Tagging are then applied to the content.
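A content+geo heuristic of this kind can be sketched as a simple filter. The code below is only an illustration under stated assumptions: the post structure (a dict with `text` and `coordinates`), the approximate coordinates used for L’Aquila and the function names are hypothetical, and the great-circle distance is computed with the standard haversine formula rather than with whatever the social network APIs provide.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
LAQUILA = (42.35, 13.40)  # approximate (lat, lon); an assumption
HASHTAGS = {"#laquila", "#earthquake"}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def matches_content_geo(post, center=LAQUILA, radius_km=50.0, hashtags=HASHTAGS):
    """True if the post is geotagged within the radius AND uses a tracked hashtag."""
    if post.get("coordinates") is None:
        return False  # non-geotagged posts cannot satisfy the geo constraint
    tags = {w.lower().rstrip(".,!?") for w in post["text"].split() if w.startswith("#")}
    return bool(tags & hashtags) and haversine_km(post["coordinates"], center) <= radius_km


# Hypothetical posts.
near = {"text": "New buildings downtown #LAquila", "coordinates": (42.36, 13.39)}
rome = {"text": "Felt it here too #earthquake", "coordinates": (41.90, 12.50)}
```

Here `near` passes the filter, while `rome` is rejected because it lies well outside the 50 km radius even though it carries a tracked hashtag.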
How can each piece of content be mapped to the social indicator it refers to?
Given a fixed set of social capital indicators, we built a classification model to associate each piece of content (along with its sentiment) with the social indicator it refers to.
Example: a Tweet about new buildings in the city.
• Input: social indicators + semantic representation of the content
• Domain-specific processing: classification model
• Output: (multi-class) classification + sentiment
The score of a social indicator is the average sentiment of all the content referring to it.
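The scoring rule is a per-indicator average, which the following minimal sketch makes concrete. The indicator labels and sentiment values are invented for illustration, and the input is assumed to be the classifier output already paired with a sentiment score.

```python
from collections import defaultdict


def indicator_scores(classified_content):
    """Average the sentiment of all content assigned to each indicator.

    `classified_content` is an iterable of (indicator, sentiment) pairs,
    i.e. the assumed output of the classification + sentiment steps.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for indicator, sentiment in classified_content:
        totals[indicator] += sentiment
        counts[indicator] += 1
    return {indicator: totals[indicator] / counts[indicator] for indicator in totals}


# Hypothetical classifier output: (indicator, sentiment) pairs.
classified = [
    ("trust", -0.5),
    ("trust", 0.5),
    ("sense_of_belonging", 1.0),
]
scores = indicator_scores(classified)
```

Indicators that receive no content simply do not appear in the result, which avoids inventing a score for them.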
L’Aquila Social Urban Network: CrowdPulse Output
Figure: overall score of the social indicators between March and August 2014.
Real-world application of the output: the community promoter
• monitors the state of the social indicators
• defines initiatives to empower the social capital
L’Aquila Social Urban Network: Conclusions
Crowdsourcing-based approach
1. Social content about L’Aquila is extracted and processed in real time
2. Machine Learning is exploited to build a classification model
3. Sentiment Analysis is used to provide each social indicator with a score
4. The Analytics Console is used to monitor the state of the social capital in real time
Almost 500,000 pieces of social content extracted and analyzed.
Use Cases: The Italian Hate Map
• Inspired by the Hate Map built by Humboldt State University (http://users.humboldt.edu/mstephens/hate/hate_map.html)
• Joint research with a team of psychologists from Rome University and a non-profit agency focused on human rights
Insight: aggregate raw people-based data in order to analyze complex phenomena.
The Italian Hate Map: Research Question
Is it possible to extract and process social media to detect intolerant content posted on social networks and identify the most at-risk areas of Italy?
We can use CrowdPulse.
Use Cases
Heuristics: Twitter content- 76 intolerant seed terms, defined by the psychologists teams - 5 intolerance dimensions: violence (against women), racism,
homophobia, disability, anti-semitism
CROWDPULSE SETTINGS
The Italian Hate Map
89Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops Developing Smart Cities Services through Semantic Analysis of Social Streams. WDS4SC 2015 Workshop, Florence (Italy) 19.05.2015
Use Cases
Extracted content (seed term: nano/midget)
Tweet about an Italian ministry
CROWDPULSE SETTINGS
Tweet about iPod nano
Tweet about an Italian football player
The Italian Hate Map
90Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops Developing Smart Cities Services through Semantic Analysis of Social Streams. WDS4SC 2015 Workshop, Florence (Italy) 19.05.2015
Use Cases
Many non-intolerant Tweets are extracted!
Tweet about an Italian ministry
CROWDPULSE SETTINGS
Tweet about iPod nano
Tweet about an Italian football playerX
X
The Italian Hate Map
91Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops Developing Smart Cities Services through Semantic Analysis of Social Streams. WDS4SC 2015 Workshop, Florence (Italy) 19.05.2015
Use Cases
Non-intolerant Tweets are detected and filtered out.
CROWDPULSE SETTINGS
The Italian Hate Map
92Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops Developing Smart Cities Services through Semantic Analysis of Social Streams. WDS4SC 2015 Workshop, Florence (Italy) 19.05.2015
Use Cases
Ironic Tweets are detected and filtered out.
CROWDPULSE SETTINGS
The Italian Hate Map
Use Cases
CROWDPULSE SETTINGS
The Italian Hate Map
Use Cases
We have to build a map, so we only need geotagged content
CROWDPULSE SETTINGS
The Italian Hate Map
Use Cases
CROWDPULSE SETTINGS
The Italian Hate Map
Definition of heuristics to increase the number of geotagged Tweets
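The slides do not say which heuristics were used. One plausible example, shown here purely as an assumption, is to fall back on the location declared in the user profile when a Tweet carries no explicit coordinates. The tweet fields (`coordinates`, `user_location`) describe a simplified, hypothetical record rather than the actual Twitter API payload, and the gazetteer holds only a few approximate city coordinates.

```python
from typing import Optional, Tuple

# Tiny hypothetical gazetteer: lower-cased city name -> approximate (lat, lon).
GAZETTEER = {
    "bari": (41.12, 16.87),
    "firenze": (43.77, 11.26),
    "l'aquila": (42.35, 13.40),
}


def resolve_location(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) for a tweet, or None if no location can be inferred.

    Explicit coordinates win; otherwise the free-text profile location is
    looked up in the gazetteer (the assumed fallback heuristic).
    """
    coords = tweet.get("coordinates")
    if coords is not None:
        return tuple(coords)
    profile = (tweet.get("user_location") or "").strip().lower()
    return GAZETTEER.get(profile)
```

A fallback of this kind trades precision for coverage: a profile location says where the user claims to live, not where the Tweet was written.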
Use Cases
The Italian Hate Map
Dimension        #Tweets      #Geotagged   %Geotagged
Homophobia       110,774      8,501        7.66%
Racism           154,170      1,940        1.24%
Violence         1,102,494    28,886       2.62%
Disability       479,654      3,410        0.75%
Anti-Semitism    6,000        1,150        18.03%
Use Cases
CROWDPULSE OUTPUT
The Italian Hate Map
[Maps, one per dimension: Violence against women, Disability, Racism, Homophobia]
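A map of this kind needs geotagged Tweets aggregated by area. As a minimal sketch, assuming a regular latitude/longitude grid (the slides do not state how areas were defined, so the grid and the cell size are assumptions), the counts per cell could be computed as follows.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.5):
    """Map a coordinate to the (row, col) index of a cell_deg-wide grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def heat_counts(points, cell_deg=0.5):
    """Count how many (lat, lon) points fall into each grid cell."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)
```

Each cell count can then be mapped to a color intensity; computing one counter per intolerance dimension would give one map per dimension.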
Use Cases
CROWDPULSE OUTPUT
The Italian Hate Map
Based on the maps and on the linguistic analysis of intolerant Tweets (co-occurrences between terms,
time-lapse, etc.), the team of psychologists defined guidelines to tackle and prevent intolerant behaviors.
These guidelines were freely distributed to public administrations in early 2015.
Conclusions
1. Crowdsourcing-based approach: social content containing the seed terms is extracted and processed in real time
2. Semantic Processing: used to discard non-intolerant Tweets
3. Sentiment Analysis: used to filter out ironic Tweets
4. Analytics Console: used to build real-time hate maps
Almost 2,000,000 pieces of social content extracted and analyzed.
The Italian Hate Map
Lessons Learned
Lessons Learned
DEFINITION OF A FRAMEWORK FOR REAL-TIME SEMANTIC CONTENT ANALYSIS
1. Pipeline of state-of-the-art techniques: Entity Linking, Sentiment Analysis, Machine Learning, Data Visualization
2. Use Cases: L’Aquila Social Urban Network, The Italian Hate Map
The outcomes of both use cases showed that very complex phenomena can be analyzed in a totally new
way, thanks to the wide availability of textual data.
Future Research
Integration of further machine learning techniques and further data visualization formalisms
Evaluation of the real impact of the framework on real-world dynamics (e.g., do intolerant behaviors decrease thanks to the Hate Map?)
Improvement of the algorithms for semantic tagging, text classification and sentiment analysis