ARMATWEET: DETECTING EVENTS BY SEMANTIC TWEET ANALYSIS
Alberto Tonon1 Philippe Cudre-Mauroux1
Albert Blarer2 Vincent Lenders2 Boris Motik3
1eXascale Infolab, University of Fribourg, Switzerland
2armasuisse, Switzerland
3University of Oxford, United Kingdom
May 31, 2017
TABLE OF CONTENTS
1 INTRODUCTION & STATE OF THE ART
2 SEMANTIC EVENT DETECTION
3 NLP COMPONENT
4 SEMANTIC ANALYSIS
5 TIME SERIES ANALYSIS
6 EVALUATION
7 CONCLUSION
Tonon, Cudre-Mauroux, Blarer, Lenders, Motik ArmaTweet: Detecting Events by Semantic Tweet Analysis 0/18
Introduction & State of the Art
TWITTER EVENT DETECTION
Twitter stats: 317 M users, 500 M tweets / day
⇒ Timely source of information about current events
Example: Brussels Airport attack on 22/3/2016
  ‘In Brussels Airport. Been evacuated afer [sic] suspected bomb.’
  ‘Stampede now. Everyone running’
Example: 1.7 M tweets about Charlie Hebdo shootings on 7/1/2015
armasuisse S+T: the R&D agency for the Swiss Armed Forces
Can analysing Twitter streams help identify security-relevant events?
  Goal: detect events more quickly than through conventional media
  Goal: detect events retroactively
  Already developing a Social Media Analysis (SMA) system
STATE OF THE ART: THREE KINDS OF APPROACHES
Detecting unspecified events
Detecting predetermined events
Detecting specific events
STATE OF THE ART: DETECTING UNSPECIFIED EVENTS
Users do not specify topics of interest ⇒ detects most prominent events
Output of a prominent system in this category:
[Screenshot: Twitter Calendar, http://statuscalendar.com/month/201501/, January 2015. ‘A demonstration of NLP on twitter showing the most prominent events mentioned in association with each day.’ Each day lists a few entities with associated terms. Most entries concern celebrities or generic accounts (e.g., ‘justin: birthday, school, follow’, ‘@youtube: liked, added, video’); 7 January lists ‘charlie hebdo: attack, killed, shooting’ and ‘paris: attack, killed, shooting’.]
STATE OF THE ART: DETECTING PREDETERMINED EVENTS
Hardcoded to specific type of event:
  Concerts
  Controversial events
  Local festivals
  Earthquakes
  Disasters
  Disease progression
  ...
Typically use NLP + ML + lots of domain knowledge
Efficient on the target event type, but not very general
  Events of interest change ⇒ Requires reprogramming/retraining the system
STATE OF THE ART: DETECTING SPECIFIC EVENTS
Typically based on keyword matching (i.e., IR techniques)
  Possibly extended with keyword expansion
Example: Charlie Hebdo shooting in Paris
  keywords such as ‘hostage’, ‘shooting’, ‘Paris’, etc. identify the event
[Embedded slide: ‘Example Event 1: Charlie Hebdo Shooting’, from ‘Event Detection with Ontologies – Kick-off with Prof. B. Motik’ (VBS / armasuisse / W + T / Dr. Vincent Lenders)]
WHAT KEYWORDS IDENTIFY ‘POLITICIANS DYING’?
Attempt 1: Match ‘politician’ and ‘die’:
  politician die since:2015-01-02
Sample results:
  Maggie Klaus @Maggie_Klaus · Jan 26: Don't look at me. I voted for the ONLY sane, prepared, intelligent & professional politician in this election. Yay, we all going to die!
  Keith B. Still @NaYaKnoMi · Jan 26: there is only one politician in washington i trust and would literally die to protect. her name is @SenGillibrand. the others are cowards.
  Artaxerxes @King_Ojwandoh · Jan 25: It all boils down to the ordinary mwananchi, if you want to die coz of a politician, it's your life to lose. #TingaLordOfViolence
  Brad Slager @MartiniShark · Jan 25:
    - Writer of the law says this
    MEDIA: (blinks...blinks)
    - GOP politician says this
    MEDIA: Why do you want people to DIE!!!!!
    Jonathan Gruber on @TuckerCarlson : "This law [ACA] was never supposed to help everybody"
  die juedische @juedische · 5h: Anti-Israel German politician compares Jewish Facebook head with Nazi
    Jonathan H. Adler @jadler1969
Attempt 2: Match ‘politician’ or ‘die’:
  politician OR die since:2015-01-02
Sample results:
  Funny Or Die @funnyordie · 16h: Best Super Bowl halftime show ever!
  Steven James @TheLaunchMag · 43m: Today in 2003 50 Cent drops Get Rich or Die Tryin
    Jigga loves Ja but loves money more. He lets 50 open for him on tour & diss Ja every nite
  κ80ღ @DOLANZANSKI · 39m: the twins should do a follow spree when when they hit 3 mil rt if u agree @EthanDolan @GraysonDolan (pls do one i will actually die aksjd)
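The two attempts can be mimicked offline with a minimal Python sketch. The helpers `tokens` and `matches` and their simple word tokenisation are assumptions made for illustration, not Twitter's actual search semantics; the sample texts are shortened from tweets quoted in this deck.

```python
import re

def tokens(text):
    """Lowercase a tweet and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches(text, keywords, mode="and"):
    """True if the tweet contains all (mode='and') or any (mode='or') of the keywords."""
    found = [k in tokens(text) for k in keywords]
    return all(found) if mode == "and" else any(found)

tweets = [
    "there is only one politician in washington i trust and would literally die to protect.",
    "Today in 2003 50 Cent drops Get Rich or Die Tryin",
    "Edward Brooke, first black senator since Reconstruction, dies at 95",
]
keywords = ["politician", "die"]

# Attempt 1 (AND): only the first tweet matches, and it is not about a death.
and_hits = [t for t in tweets if matches(t, keywords, "and")]
# Attempt 2 (OR): the first two tweets match; the relevant third one still does not.
or_hits = [t for t in tweets if matches(t, keywords, "or")]
```

In this toy setting the only tweet that actually reports a politician's death matches neither query, because it uses ‘senator’ and ‘dies’ rather than the literal keywords.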
Semantic Event Detection
OBJECTIVE OF SEMANTIC EVENT DETECTION
Goal: extend the SMA system with semantic event detection
⇒ Allow analysts to precisely describe events they want
⇒ Who does what to whom and where; for example:
  who = a politician; what = dies
  who = a militia group; what = performs an act of violence
  what = performs a cyberattack; to whom = a company
The need for a precise specification is a feature, not a bug!
  Only 1000s of tweets on 3/1/2015 talk about the death of Edward Brooke
  Unspecified event techniques have no chance!
    Must have a way to focus the search
  Keyword-based techniques not precise enough
    No understanding of relationships
MAIN IDEA: USE SEMANTIC SEARCH
Annotate (automatically) each tweet with quads (subject, predicate, object, location)
  Any component can be absent
  (Do not confuse with RDF quads → our quads are purely conceptual)
Embed quads into a knowledge graph
  DBpedia ⇒ background knowledge about nouns
    provides anchors for subject, object, and location
    E.g., classification: ‘Edward Brooke is a politician’
    E.g., containment hierarchy: ‘Bern is in Switzerland’
  WordNet ⇒ background knowledge about verbs
    provides anchors for predicate
Describe events of interest using queries over the knowledge graph
  E.g., ‘quads where subject is classified as politician, and predicate refers to dying’
Use time series analysis to identify events
  E.g., by threshold or other statistics
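As a rough illustration of the idea, the sketch below models a quad as a record whose components may be absent and filters quads using background knowledge. The `Quad` class, the hand-written `ENTITY_CLASSES` table (standing in for DBpedia), the function `politician_dying`, and the `ex:` identifiers are hypothetical; only the `dbr:`, `yago:`, and `wnr:` identifiers appear elsewhere in this deck.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Quad:
    """Conceptual quad; any component can be absent (None)."""
    subject: Optional[str] = None
    predicate: Optional[str] = None
    obj: Optional[str] = None
    location: Optional[str] = None

# Toy stand-in for DBpedia classification: entity -> set of classes (assumed data).
ENTITY_CLASSES = {
    "dbr:Edward_Brooke": {"yago:Politician110451263"},
    "ex:SomeSinger": {"ex:Singer"},
}
DYING = "wnr:200359085-v"  # WordNet anchor used for 'dying' in this deck

def politician_dying(quads):
    """Keep quads whose subject is classified as a politician and whose predicate is 'dying'."""
    return [
        q for q in quads
        if q.predicate == DYING
        and "yago:Politician110451263" in ENTITY_CLASSES.get(q.subject, set())
    ]

quads = [
    Quad(subject="dbr:Edward_Brooke", predicate=DYING),
    Quad(subject="ex:SomeSinger", predicate=DYING),
    Quad(subject="dbr:Edward_Brooke", predicate="ex:visits", location="ex:SomePlace"),
]
selected = politician_dying(quads)
```

The point of the design is that the filter relies on what the subject is and what the verb means, not on which words the tweet happens to contain.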
SYSTEM ARCHITECTURE
[Figure: system architecture. Tweets → NLP (Preparing Data, Extracting Locations, Correcting Passive Voice, Resolving Entities, Resolving Verbs) → Quads → Semantic Analysis (Creating Knowledge Graph, Creating Semantic Queries, Reasoning; with Event Categories as input) → Time Series → Event Detection (Aggregating Tweets by Day, Anomaly Detection) → Events]
System outputs a list of events
  Each event has a summary (e.g., ‘Edward Brooke dies’)
  Each event is associated with one day ⇒ no long-running events (currently)
NLP Component
THE ROLE OF NATURAL LANGUAGE PROCESSING
Goal: convert a tweet into zero or more quads
1 Data preparation
    Clean the text, remove emoticons, etc.
    Use OpenIE to extract named entities, POS tags, text triples, and parse trees
    Use Spotlight to annotate text with DBpedia resources
2 Location extraction
    If the object is a NER location and a grammatical case dependency from predicate to object exists, move object to location (e.g., ‘White House’)
3 Passive voice correction
    If predicate has passive auxiliary modifier dependency, passive subject dependency to the subject, and agent dependency to object, swap subject and object
4 Entity resolution
    Overlap text triples with the output of Spotlight
5 Verb resolution
    Proprietary approach adapted to WordNet
[Figure: dependency parses of two example sentences, with the text triples extracted from each]
‘Hawija was bombed by ISIS again!’
  POS tags: Hawija/NNP was/VBD bombed/VBN by/IN ISIS/NNP again/RB
  Dependency labels shown: nsubjpass, auxpass, agent, case, advmod
  Text triples: (‘Hawija’, ‘was’, ‘bombed’), (‘Hawija’, ‘was bombed by’, ‘ISIS’)
‘Obama met Trump in the White House’
  POS tags: Obama/NNP met/VBD Trump/NNP in/IN the/DT White/NNP House/NNP
  Dependency labels shown: nsubj, dobj, nmod:in, det, case, comp
  Text triples: (‘Obama’, ‘met Trump in’, ‘White House’), (‘Obama’, ‘met’, ‘Trump’)
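Steps 2 and 3 can be sketched as small rewrite rules over a quad and a set of dependencies. This is a simplified illustration, not the system's code: the `Quad` class, the function names, the `(label, head, dependent)` encoding of dependencies, and the use of the head verb alone as the predicate are all assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set, Tuple

Dep = Tuple[str, str, str]  # (label, head, dependent), a toy encoding

@dataclass(frozen=True)
class Quad:
    subject: Optional[str]
    predicate: Optional[str]
    obj: Optional[str]
    location: Optional[str] = None

def extract_location(quad: Quad, ner_locations: Set[str], deps: Set[Dep]) -> Quad:
    """Step 2: if the object is a NER location linked to the predicate by a
    'case' dependency, move it from the object slot to the location slot."""
    if quad.obj in ner_locations and ("case", quad.predicate, quad.obj) in deps:
        return replace(quad, obj=None, location=quad.obj)
    return quad

def correct_passive(quad: Quad, deps: Set[Dep]) -> Quad:
    """Step 3: if the predicate has auxpass, nsubjpass (to the subject) and
    agent (to the object) dependencies, swap subject and object."""
    has_aux = any(label == "auxpass" and head == quad.predicate for label, head, _ in deps)
    has_subj = ("nsubjpass", quad.predicate, quad.subject) in deps
    has_agent = ("agent", quad.predicate, quad.obj) in deps
    if has_aux and has_subj and has_agent:
        return replace(quad, subject=quad.obj, obj=quad.subject)
    return quad

# 'Hawija was bombed by ISIS again!' -> ISIS is the actual agent.
hawija_deps = {("auxpass", "bombed", "was"), ("nsubjpass", "bombed", "Hawija"),
               ("agent", "bombed", "ISIS")}
fixed = correct_passive(Quad("Hawija", "bombed", "ISIS"), hawija_deps)

# 'Obama met Trump in the White House' -> 'White House' becomes the location.
obama_deps = {("nsubj", "met", "Obama"), ("case", "met", "White House")}
moved = extract_location(Quad("Obama", "met", "White House"), {"White House"}, obama_deps)
```

Quads that do not satisfy a rule's conditions are returned unchanged, so the two steps can be chained in either order in this sketch.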
Semantic Analysis
KNOWLEDGE GRAPH STRUCTURE
[Figure: excerpt of the knowledge graph with four regions (Tweets, DBpedia, WordNet, TimeSeries)]
  Tweets nodes: aso:Tweet, ast:551507074258325504, asq:5705079, asq:5705080
  Tweet literals: “Edward Brooke, first black senator since Reconstruction, dies at 95 #Birmingham http://t.co/SixG7VOUC1”, “2015-01-03T22:35:07”
  DBpedia nodes: dbr:Edward_Brooke, dbr:Reconstruction_Era, dbr:Birmingham, dbr:United_Kingdom, dbo:Country, yago:Politician110451263, yago:Location100027167
  WordNet nodes: wnr:200359085-v, with gloss literal “pass from physical life and lose all bodily attributes and functions necessary to sustain life”
  TimeSeries nodes: aso:TimeSeries-SP, _:ts-sp_4344996_1855965
  Edge labels: rdf:type, aso:tweetCountry, aso:tweetQuad, aso:quadPredicate, wno:gloss, aso:timeSeriesSubject
USER QUERIES AND TIME SERIES EXTRACTION
Example query for ‘politician dying’:
  SELECT ?S wnr:200359085-v WHERE {
    ?Q aso:quadPredicate wnr:200359085-v .
    ?Q aso:quadSubject ?S .
    ?S rdf:type yago:Politician110451263
  }
Query produces an RDFox rule that creates time series:
  [?TS, rdf:type, aso:PoliticianDying],
  [?TS, aso:timeSeriesSubject, ?S],
  [?TS, aso:timeSeriesVerb, wnr:200359085-v],
  [?TS, aso:timeSeriesHigh, ?TW] :-
    [?TW, aso:tweetQuad, ?Q],
    [?Q, aso:quadSubject, ?S],
    [?S, rdf:type, yago:Politician110451263],
    [?Q, aso:quadPredicate, wnr:200359085-v],
    BIND(SKOLEM("ts-sp", ?S, wnr:200359085-v) AS ?TS) .
Rules identify high- and low-confidence time series members
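To show what the rule computes, the following Python sketch evaluates the same body over a small in-memory set of triples and groups matching tweets under a key that plays the role of the SKOLEM term. It is an illustrative emulation, not RDFox; the function name, the tuple used as the series key, and the `hypothetical`/`ex:` identifiers are assumptions, and the Edward Brooke facts are toy data loosely based on the knowledge-graph figure.

```python
from collections import defaultdict

POLITICIAN = "yago:Politician110451263"
DYING = "wnr:200359085-v"

def politician_dying_series(triples):
    """Return {series_key: set of tweets} for tweets with a quad whose subject
    is a politician and whose predicate is the 'dying' verb."""
    by_pred = defaultdict(set)
    for s, p, o in triples:
        by_pred[p].add((s, o))

    series = defaultdict(set)
    for tweet, quad in by_pred["aso:tweetQuad"]:                # [?TW, aso:tweetQuad, ?Q]
        if (quad, DYING) not in by_pred["aso:quadPredicate"]:   # [?Q, aso:quadPredicate, ...]
            continue
        for q, subj in by_pred["aso:quadSubject"]:              # [?Q, aso:quadSubject, ?S]
            if q == quad and (subj, POLITICIAN) in by_pred["rdf:type"]:
                # One series per (subject, verb), analogous to SKOLEM("ts-sp", ?S, verb).
                series[("ts-sp", subj, DYING)].add(tweet)
    return dict(series)

triples = {
    ("ast:551507074258325504", "aso:tweetQuad", "asq:5705079"),
    ("asq:5705079", "aso:quadSubject", "dbr:Edward_Brooke"),
    ("asq:5705079", "aso:quadPredicate", DYING),
    ("dbr:Edward_Brooke", "rdf:type", POLITICIAN),
    # Hypothetical non-politician, which the rule should ignore.
    ("ast:hypothetical", "aso:tweetQuad", "asq:hypothetical"),
    ("asq:hypothetical", "aso:quadSubject", "ex:SomeSinger"),
    ("asq:hypothetical", "aso:quadPredicate", DYING),
}
result = politician_dying_series(triples)
```

Because the key depends only on the subject and the verb, every tweet about the same politician dying lands in the same time series.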
Time Series Analysis
TIME SERIES ANALYSIS
Determines whether a time series (i.e., a group of tweets) is an event
We use the Seasonal Hybrid ESD test
  Developed specifically by Twitter for analysing their server load
  1 Determine periodicity/seasonality of data
  2 Split data into disjoint windows each covering at least two weeks
  3 Subtract from each window the median component for the window
  4 Apply the Extreme Studentized Deviate (ESD) test, a well-known anomaly detection technique
We used the implementation from R
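The sketch below is a deliberately simplified stand-in for this step, not the Seasonal Hybrid ESD test itself: it skips the seasonal decomposition and replaces the ESD critical values with a fixed cut-off. It only illustrates subtracting a window's median and flagging days that deviate strongly, using the median absolute deviation (MAD) as a robust scale. The function name, the threshold of 3.0, the fallback scale, and the daily counts are assumptions.

```python
from statistics import median

def robust_anomalies(counts, threshold=3.0):
    """Return indices of days whose tweet count exceeds the window median by
    more than `threshold` robust standard deviations (positive spikes only)."""
    med = median(counts)
    residuals = [c - med for c in counts]           # subtract the window median
    mad = median([abs(r) for r in residuals])       # median absolute deviation
    # 1.4826 * MAD approximates the standard deviation for normal data;
    # fall back to 1.0 (an arbitrary choice) when the window is constant.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [i for i, r in enumerate(residuals) if r / scale > threshold]

# Hypothetical two-week window of daily tweet counts for one time series.
daily_counts = [2, 3, 1, 2, 40, 3, 2, 1, 2, 3, 2, 1, 3, 2]
spike_days = robust_anomalies(daily_counts)
```

Using the median and MAD rather than the mean and standard deviation keeps a single large spike from masking itself, which is the motivation for the ‘hybrid’ variant.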
Evaluation
EVALUATION CHALLENGES
No exhaustive list of events (discussed on Twitter) exists ⇒ no ground truth!
  We can report only precision, but not recall
No comparable system exists
  Keyword expansion most related, but no off-the-shelf system exists
  We tried implementing a related approach, but not enough time
  Raises questions whether we understood the approach well enough
EVALUATION METHODOLOGY (I)
1 We identified event categories relevant to armasuisse in several workshops
    Aviation accident
    Cyber attack on a company
    Capital punishment in a country
    Militia terror act
    Politician dying
    Politician visits a country
    Unrest in a country
2 We created queries manually
    About four person-days in total
    Ad hoc process ⇒ many improvements possible/desired!
EVALUATION METHODOLOGY (II)
3 We extracted the events
4 We validated the events manually
    Determining whether an event matches the query was tricky
    ⇒ We assigned each event to one of four relevance categories
    R3: clear positive event instances
    R2: positive instances where the entity resolution is imprecise
      E.g., dbr:British_Raj vs. dbr:India
      E.g., ‘ISIS attacked X’ vs. ‘X attacked ISIS’
      E.g., is an ISIS attack in Iraq a ‘Militia terror act’?
    R1: ‘fuzzy’ relationship to the category
      E.g., is ‘ISIS kills X’ or ‘policeman killed’ relevant for ‘Unrest in a country’?
    R0: no relevance to the event category
PRECISION RESULTS
                                                     Positive Instances by Relevance
Event Category                   Type  Total Events  R3          R3+R2       R3–R1
Aviation accident                SP              84  44 (52%)    51 (61%)    64 (76%)
Cyber attack on a company        PO             129  20 (16%)    42 (33%)    57 (44%)
Capital punishment in a country  PC             153  47 (31%)    67 (44%)    92 (60%)
Militia terror act               SP             220  92 (42%)    125 (57%)   141 (64%)
Politician dying                 SP             111  76 (68%)    80 (72%)    85 (77%)
Politician visits a country      SPC             44  29 (66%)    36 (82%)    44 (100%)
Unrest in a country              PC             200  125 (63%)   133 (67%)   148 (74%)
Total:                                          941  433 (46%)   534 (57%)   631 (67%)
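The percentages are precision values: the number of positive instances at a given relevance level divided by the number of detected events. The short Python check below recomputes the totals row from the per-category counts in the table. The helper `pct` and its round-half-up behaviour are assumptions, chosen because they reproduce the printed figures (e.g., 125 of 200 is shown as 63%).

```python
# (total events, R3, R3+R2, R3-R1) per event category, copied from the table.
ROWS = {
    "Aviation accident": (84, 44, 51, 64),
    "Cyber attack on a company": (129, 20, 42, 57),
    "Capital punishment in a country": (153, 47, 67, 92),
    "Militia terror act": (220, 92, 125, 141),
    "Politician dying": (111, 76, 80, 85),
    "Politician visits a country": (44, 29, 36, 44),
    "Unrest in a country": (200, 125, 133, 148),
}

def pct(part, total):
    """Integer percentage of part/total, rounding halves up."""
    return (200 * part + total) // (2 * total)

# Column-wise sums give the 'Total' row.
totals = tuple(sum(column) for column in zip(*ROWS.values()))
overall = [pct(positives, totals[0]) for positives in totals[1:]]
```

Here `totals` comes out as (941, 433, 534, 631) and `overall` as [46, 57, 67], matching the last row of the table.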
Conclusion
CONCLUSION
The approach was successful at detecting very specific events
⇒ Much better than pure keyword-based approaches
Problem: developing queries manually is difficult
  More research into aiding users needed
Open issue: comparison with keyword expansion approaches