Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs
A. Elizabeth Cano (@pixarelli)
Yulan He
Harith Alani
Introduction: Social Media Streams
Introduction: Representing Topics in Dynamic Environments
Example tweets on related topics, 2011 to 2014:
• 2011: 3 dead in protest in Egypt. Security official vows to ‘deal firmly..’ #Jan24
• 2012: Egypt Pres Morsi uses his visit to Tehran to praise the Syrian uprising
• 2013: #Boston bombing suspect “pinned down” on boat in Watertown
• 2014: Why Obama needs to rethink his entire ISIS strategy…
Techniques for topic classification of social media are sensitive to the evolution of topics.
Introduction
Challenges
• Keeping a model up to date requires regular retuning.
• Manual annotation is expensive.
Questions
• Which feature types provide a more stable representation of a topic?
Introduction: Previous Work
Using local features
• Bag of Words (BoW) [Genc et al., 2011]
• BoW + Bag of Entities (BoE) [Vitale et al., 2012]
• BoW + BoE + Part-of-Speech (PoS) tagging [Munoz et al., 2011] [Varga et al., 2012]
Exploiting the link structure of a knowledge source
• Exploiting categories containing entities [Michelson et al., 2010]
• Relating tweets with Wikipedia resources [Milne et al., 2008] [Xu et al., 2011]
• Use of semantic features for topic classification [Cano et al., 2013] [Varga et al., 2014]
Introduction: Topic Evolution
[Figure: a topic in a Twitter corpus tracked over epochs t, t+1, …, with its lexical and semantic representations.]
Introduction: Characterising Topic Changes with DBpedia
[Figure: the neighbourhood of dbp:Barack_Obama in DBpedia 3.6, 3.7 and 3.8. DBpedia 3.6 includes class links via rdf:type and rdfs:subClassOf (yago:PresidentOfTheUnitedStates, dbo:Person), dbo:spouse dbp:Michelle_Obama, dbo:birthPlace dbp:Hawaii, dbo:author links from dbp:The_Audacity_of_Hope and dbp:Dreams_from_My_Father, and skos:subject links to category:Community_organisers and category:Columbia_University_Alumni. Later versions add links such as dbo:leader (dbp:United_States_National_Council, dbp:National_Science_and_Technology), category:United_States_presidential_candidates,_2012 and dbp:Al-Qaeda (dbo:wikiPageWikiLink); the 3.8 snapshot also links to dbp:Budget_Control_Act_of_2011 via dbo:wikiPageWikiLink.]
Some features remain unchanged across versions, while others carry information about past, current or future contexts (e.g. dbp:UnitedStatesPresidentialCandidates).
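A minimal Python sketch of the comparison this slide describes: given an entity's (property, value) pairs in two DBpedia versions, split them into unchanged, removed and added features. The helper `diff_snapshots` and the hand-written feature sets are hypothetical, loosely modelled on the dbp:Barack_Obama example; they are not taken from a real DBpedia dump or from the authors' code.

```python
from typing import Dict, Set, Tuple

Feature = Tuple[str, str]  # (property, value) pair attached to one entity

def diff_snapshots(old: Set[Feature], new: Set[Feature]) -> Dict[str, Set[Feature]]:
    """Split an entity's features into those kept, dropped and introduced between two versions."""
    return {
        "unchanged": old & new,
        "removed": old - new,
        "added": new - old,
    }

# Hypothetical toy features for dbp:Barack_Obama (not a real dump extract)
obama_36 = {
    ("dbo:spouse", "dbp:Michelle_Obama"),
    ("dbo:birthPlace", "dbp:Hawaii"),
}
obama_38 = obama_36 | {("dbo:wikiPageWikiLink", "dbp:Budget_Control_Act_of_2011")}

changes = diff_snapshots(obama_36, obama_38)
for kind, feats in changes.items():
    print(kind, sorted(feats))
```

The "unchanged" set corresponds to the stable features; the "added" set is where newer, context-carrying links show up.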
Approach: DBpedia Graph Snapshots
Definition (Time-dependent Resource Meta Graph). A time-dependent resource meta graph is a tuple $G := (R, P, C, Y, f_t)$ where:
• $R$, $P$ and $C$ are finite sets whose elements are resources, properties and classes, respectively;
• $Y \subseteq R \times P \times C$ is a ternary relation representing a hypergraph with ternary edges;
• $H(Y) = \langle V, D \rangle$ is the tripartite hypergraph of $Y$, with vertices $V = R \,\dot\cup\, P \,\dot\cup\, C$ and edges $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$;
• $f_t$ assigns a temporal marker to each ternary edge.
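The definition can be sketched in Python under two stated assumptions: `hypergraph` tags each vertex with its partition (R, P or C) so that V stays a disjoint union, and `f_t` is modelled as a plain dictionary from each triple to a single version label such as "3.6". Both helpers and the toy triples are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]  # (resource r, property p, class c), an element of Y
Vertex = Tuple[str, str]       # (partition tag, label), keeps R, P and C disjoint

def hypergraph(Y: Set[Triple]) -> Tuple[Set[Vertex], Set[FrozenSet[Vertex]]]:
    """Build H(Y) = <V, D>: one tagged vertex per element, one 3-vertex edge per triple."""
    V: Set[Vertex] = set()
    D: Set[FrozenSet[Vertex]] = set()
    for r, p, c in Y:
        edge = {("R", r), ("P", p), ("C", c)}
        V |= edge
        D.add(frozenset(edge))
    return V, D

def snapshot(Y: Set[Triple], f_t: Dict[Triple, str], t: str) -> Set[Triple]:
    """Keep only the ternary edges whose temporal marker f_t(y) equals t."""
    return {y for y in Y if f_t[y] == t}

Y = {
    ("dbp:Barack_Obama", "rdf:type", "dbo:Person"),
    ("dbp:Egypt", "rdf:type", "dbo:Country"),
}
f_t = {
    ("dbp:Barack_Obama", "rdf:type", "dbo:Person"): "3.6",
    ("dbp:Egypt", "rdf:type", "dbo:Country"): "3.7",
}
V, D = hypergraph(Y)
print(len(V), len(D))
print(snapshot(Y, f_t, "3.6"))
```

Tagging vertices matters because the same string could otherwise appear in two partitions and collapse into one vertex.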
Approach: Semantic Representation of a Tweet
Entities in an example tweet: dbp:Hosni_Mubarak, dbp:Egypt, dbp:Barack_Obama, dbp:CNN (dbp: http://dbpedia.org/resource/).
Class features (rdf:type), e.g. yago:NobelPeacePrizeLaureates, dbo:OfficeHolder, dbo:Country, dbo:Broadcaster (dbo: http://dbpedia.org/ontology/).
Property features, e.g. dbprop:nationality (American), dbprop:title (dbp:Prime_Minister_of_Egypt), dbprop:headquarters (dbp:Atlanta), dbprop:languages (dbp:Egyptian_Arabic).
Category features (dcterms:subject), e.g. skos:Presidents_of_the_United_States, skos:PresidentsOfEgypt, skos:Arab_republics, skos:English-language_television_stations (skos: http://dbpedia.org/resource/Category:).
Resource features, e.g. dbp:Death_Of_Osama_Bin_Laden (dbprop:commander), dbp:Prime_Minister_of_Egypt (dbprop:title), dbp:Atlanta (dbprop:headquarters), dbp:Egyptian_Arabic (dbprop:languages).
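The four feature types can be illustrated with a small, self-contained Python sketch. It assumes a toy in-memory knowledge base mapping each entity to (predicate, object) pairs and a hypothetical `semantic_features` helper: rdf:type objects become class features, dcterms:subject objects become category features, every other predicate becomes a property feature, and objects in the dbp: namespace become resource features. This split is an assumption for illustration; the slides do not specify the extraction code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

RDF_TYPE = "rdf:type"
SUBJECT = "dcterms:subject"

# Hypothetical toy knowledge base: entity -> list of (predicate, object) pairs
KB: Dict[str, List[Tuple[str, str]]] = {
    "dbp:CNN": [
        ("rdf:type", "dbo:Broadcaster"),
        ("dcterms:subject", "skos:English-language_television_stations"),
        ("dbprop:headquarters", "dbp:Atlanta"),
    ],
    "dbp:Egypt": [
        ("rdf:type", "dbo:Country"),
        ("dcterms:subject", "skos:Arab_republics"),
        ("dbprop:languages", "dbp:Egyptian_Arabic"),
    ],
    "dbp:Barack_Obama": [
        ("rdf:type", "yago:NobelPeacePrizeLaureates"),
        ("dbprop:nationality", "American"),  # literal value, so no resource feature
    ],
}

def semantic_features(entities: Iterable[str], kb: Dict[str, List[Tuple[str, str]]]) -> Dict[str, Counter]:
    """Bag the entities' neighbourhood into class, category, property and resource features."""
    feats = {name: Counter() for name in ("class", "category", "property", "resource")}
    for entity in entities:
        for pred, obj in kb.get(entity, []):
            if pred == RDF_TYPE:
                feats["class"][obj] += 1
            elif pred == SUBJECT:
                feats["category"][obj] += 1
            else:
                feats["property"][pred] += 1
                if obj.startswith("dbp:"):  # object is a DBpedia resource
                    feats["resource"][obj] += 1
    return feats

tweet_entities = ["dbp:CNN", "dbp:Egypt", "dbp:Barack_Obama"]
for kind, bag in semantic_features(tweet_entities, KB).items():
    print(kind, dict(bag))
```

Each bag can then be used on its own (e.g. a class-only classifier) or combined with BoW features.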
Approach: DBpedia Graph Snapshots
The meta-graph of an entity e at time t aggregates all resources, properties and classes related to e at time t.
[Figure: properties and resources of dbp:Barack_Obama in successive DBpedia versions (3.6, 3.7, 3.8, …). prop:spouse MichelleObama and prop:birthPlace Hawaii appear in every version; later versions add prop:commander and prop:wikiPageWikiLink links to UnitedStatesPresidentialCandidates, Budget_Control_Act_of_2011 and dbp:Death_Of_Osama_Bin_Laden.]
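A per-entity meta-graph can be sketched as a filter over version-stamped edges. The `Edge` record, the `meta_graph` helper and the sample edges (a subset of the dbp:Barack_Obama example) are hypothetical; classes would enter the same way, as values of rdf:type edges.

```python
from typing import List, NamedTuple, Set, Tuple

class Edge(NamedTuple):
    resource: str
    prop: str
    value: str
    version: str  # temporal marker f_t, here a DBpedia release label

def meta_graph(entity: str, edges: List[Edge], t: str) -> Tuple[Set[str], Set[str]]:
    """Aggregate the properties and linked nodes of `entity` in the snapshot labelled t."""
    props: Set[str] = set()
    nodes: Set[str] = set()
    for edge in edges:
        if edge.version == t and edge.resource == entity:
            props.add(edge.prop)
            nodes.add(edge.value)
    return props, nodes

# Hypothetical edges; a triple present in two versions is stored once per version
EDGES = [
    Edge("dbp:Barack_Obama", "prop:spouse", "dbp:Michelle_Obama", "3.6"),
    Edge("dbp:Barack_Obama", "prop:birthPlace", "dbp:Hawaii", "3.6"),
    Edge("dbp:Barack_Obama", "prop:spouse", "dbp:Michelle_Obama", "3.8"),
    Edge("dbp:Barack_Obama", "prop:birthPlace", "dbp:Hawaii", "3.8"),
    Edge("dbp:Barack_Obama", "prop:wikiPageWikiLink", "dbp:Budget_Control_Act_of_2011", "3.8"),
]

for version in ("3.6", "3.8"):
    print(version, meta_graph("dbp:Barack_Obama", EDGES, version))
```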
Approach: Semantic Feature Weighting Strategies
Goal: characterise the global relevance of a semantic feature to a given topic in DBpedia at a given point in time.
Topic relevance-based weighting strategy:
[Figure: a topic graph shown as a subgraph of the DBpedia graph.]
Topic relevance-based weighting strategies:
• Class-based Topic Relevance (ClsW)
• Property-based Topic Relevance (PropW)
• Category-based Topic Relevance (CatW)
• Resource Relevance (ResW)
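The slides do not give the formulas behind ClsW, PropW, CatW or ResW. Purely as a hypothetical stand-in for the "topic graph versus DBpedia graph" comparison, the sketch below scores a class by the fraction of its instances in one snapshot that belong to the topic graph. `class_topic_relevance` and its inputs are assumptions, not the paper's weighting.

```python
from collections import Counter
from typing import Dict, Iterable, Set

def class_topic_relevance(entity_types: Dict[str, Set[str]], topic_entities: Iterable[str]) -> Dict[str, float]:
    """HYPOTHETICAL ClsW-like score: share of a class's instances in one snapshot that lie in the topic graph."""
    all_counts = Counter(c for types in entity_types.values() for c in types)
    topic_counts = Counter(c for e in set(topic_entities) for c in entity_types.get(e, set()))
    return {c: topic_counts[c] / n for c, n in all_counts.items()}

# Toy snapshot: entity -> rdf:type classes (illustrative only)
snapshot_types = {
    "dbp:Egypt": {"dbo:Country"},
    "dbp:France": {"dbo:Country"},
    "dbp:CNN": {"dbo:Broadcaster"},
}
print(class_topic_relevance(snapshot_types, ["dbp:Egypt", "dbp:CNN"]))
```

Computing the same score against each DBpedia snapshot would give a time-dependent weight, which is the role the slides assign to the topic relevance strategies.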
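Integrating weights into a tweet representation. The weight of a semantic feature $f$ in a document $x$ under snapshot $DB_t$ combines its frequency with Laplace smoothing and a weight derived from the $DB_t$ graph:
$$W^{DB_t}_x(f) = \left(\frac{N^{DB_t}_x(f) + 1}{|F| + \sum_{f' \in F} N^{DB_t}_x(f')}\right) \cdot \left(W^{DB_t}(f)\right)^{1/2}$$
where $N^{DB_t}_x(f)$ is the frequency of $f$ in $x$, $F$ is the set of semantic features, and $W^{DB_t}(f)$ is the weight of $f$ derived from the $DB_t$ graph.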
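A minimal Python sketch of the weighting formula, assuming the 1/2 exponent applies to the graph-derived weight W^{DB_t}(f) and that features missing from the weight table get weight 0. The function name `weighted_features` and the toy weights are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set

def weighted_features(doc_features: Iterable[str], F: Set[str], W_dbt: Dict[str, float]) -> Dict[str, float]:
    """Laplace-smoothed frequency of each f in F, scaled by the square root of its DB_t weight."""
    counts = Counter(f for f in doc_features if f in F)  # N_x(f), restricted to F
    denominator = len(F) + sum(counts.values())           # |F| + sum over f' of N_x(f')
    return {
        f: (counts[f] + 1) / denominator * math.sqrt(W_dbt.get(f, 0.0))  # assumption: missing weight -> 0
        for f in F
    }

F = {"dbo:Country", "dbo:Broadcaster", "dbo:OfficeHolder"}
W = {"dbo:Country": 0.64, "dbo:Broadcaster": 0.25, "dbo:OfficeHolder": 0.36}  # toy weights
print(weighted_features(["dbo:Country", "dbo:Country", "dbo:Broadcaster"], F, W))
```

Because of the +1 smoothing, a feature absent from the tweet still receives a small non-zero frequency, so its final weight is driven by the DBpedia-derived term.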
Experiments: Framework for Twitter Topic Classification with DBpedia
• Do semantic features built from DBpedia graphs aid cross-epoch topic classification of tweets?
• Which feature type provides a more stable topic representation over time?
[Figure: framework inputs, namely microposts from 2010, 2011 and 2013, DBpedia dumps 3.6, 3.7 and 3.8, and DBpedia 3.9 resources.]
Experiments: Datasets
• Tweets: 1×10^6 from each of 2010 (Nov-Dec), 2011 (Aug) and 2013 (Sep).
• Violence-related topics: Disaster and Accident (D&A), Law and Crime (L&C), War and Conflict (W&C).
• A topic label is assigned from a pool of over 10 categories.
• Manual annotation is performed until there are 1K tweets per year per topic.
• Negative set: 1K tweets per year from topics other than these three.
• 12K annotated tweets in total.
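The annotation budget adds up as follows: three topics plus one negative set, each with 1K tweets for each of the three years. A trivial Python check (names are illustrative):

```python
TOPICS = ["Disaster_Acc", "Law_Crime", "War_Conflict"]
YEARS = [2010, 2011, 2013]
PER_YEAR = 1000  # annotated tweets per topic, and for the negative set, in each year

def annotated_total(topics, years, per_year):
    positives = len(topics) * len(years) * per_year  # 3 * 3 * 1000
    negatives = len(years) * per_year                 # one negative set per year
    return positives + negatives

print(annotated_total(TOPICS, YEARS, PER_YEAR))  # 12000
```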
[Figure: framework, adding the Concept Enrichment stage (entities such as dbp:Hosni_Mubarak, dbp:Egypt, dbp:Barack_Obama and dbp:CNN).]
[Figure: framework, adding the Resource Backtrack Mapping and Deriving Semantic Graph Snapshots (2010, 2011, 2013) stages.]
[Figure: framework, adding the DBpedia Topic Relevance based Feature Weighting stage.]
[Figure: lexical (LEX: BoW) versus semantic (SEMANTIC: Category, Property, Resource, Class) features for W&C, D&A, L&C and the negative set (NEG) in 2010, 2011 and 2013.]
[Figure: complete framework, adding the Build Topic Classifier stage trained on topic-labelled microposts from 2010, 2011 and 2013.]
Experiments: Understanding the Stability of a Topic Representation
[Figure: same-epoch scenario, in which lexical, semantic and combined representations are trained and tested on epoch t.]
Experiments: Epoch Scenarios
Same-epoch scenario (trained on 2010, tested on 2010), F1:
Features   Disaster_Acc  Law_Crime  War_Conflict
BoW        0.831         0.765      0.844
Category   0.697         0.650      0.744
Property   0.680         0.639      0.720
Resource   0.692         0.637      0.762
Class      0.633         0.583      0.637
All experiments reported in our paper were conducted in a 10-fold cross-validation setting.
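For readers reproducing the setup, a plain shuffled 10-fold split can be sketched in standard-library Python. The slides do not say whether folds were stratified by topic or which classifier was trained, so `k_fold_splits` is a hypothetical helper that only generates index folds.

```python
import random
from typing import Iterator, List, Tuple

def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every example lands in exactly one test fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)          # fixed seed for repeatable folds
    folds = [indices[i::k] for i in range(k)]     # round-robin assignment to k folds
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

for train, test in k_fold_splits(25):
    print(len(train), len(test))
```

A stratified variant would shuffle and deal indices separately within each topic label so that every fold keeps the class balance.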
[Figure: cross-epoch scenario, in which lexical, semantic and combined representations are trained on epoch t and tested on epoch t+1.]
Experiments: Epoch Scenarios
Cross-epoch scenario for Disaster_Acc (trained on an earlier epoch, tested on a later one; columns are train-test epochs), F1:
Features   2010-2011  2010-2013  2011-2013  Average
BoW        0.634      0.481      0.261      0.458
Category   0.683      0.539      0.524      0.582
Property   0.665      0.557      0.502      0.574
Resource   0.774      0.544      0.445      0.587
Class      0.691      0.665      0.669      0.675
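The column pairs in the cross-epoch table are all (train, test) combinations in which the test epoch is later than the training epoch. A small Python sketch (helper names are illustrative) enumerates them and averages the BoW F1 values for Disaster_Acc.

```python
from itertools import combinations
from statistics import mean

EPOCHS = [2010, 2011, 2013]

def cross_epoch_pairs(epochs):
    """All (train, test) pairs whose test epoch is later than the training epoch."""
    return list(combinations(sorted(epochs), 2))

# BoW F1 for Disaster_Acc, copied from the cross-epoch table
bow_f1 = {(2010, 2011): 0.634, (2010, 2013): 0.481, (2011, 2013): 0.261}

pairs = cross_epoch_pairs(EPOCHS)
print(pairs)  # [(2010, 2011), (2010, 2013), (2011, 2013)]
print(round(mean(bow_f1[p] for p in pairs), 4))
```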
Experiments: Epoch Scenarios
Averaged cross-epoch scenarios, F1:
Features   Disaster_Acc  Law_Crime  War_Conflict  Average
BoW        0.458         0.620      0.531         0.536
Category   0.582         0.537      0.453         0.524
Property   0.574         0.504      0.506         0.528
Resource   0.587         0.578      0.466         0.544
Class      0.675         0.647      0.664         0.662
Conclusions
• Semantic features decay much more slowly than lexical features.
• Semantic representations improve performance in cross-epoch scenarios.
• Class-based features alone achieve an average gain of 7% over lexical features in cross-epoch scenarios.
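The first conclusion can be checked on the Disaster_Acc results: subtracting each feature type's averaged cross-epoch F1 from its same-epoch F1 shows how much performance each representation loses over time. The values below are copied from the Disaster_Acc columns of the same-epoch and averaged cross-epoch tables; `f1_drop` is an illustrative helper.

```python
# Disaster_Acc F1 values copied from the slides
same_epoch = {"BoW": 0.831, "Category": 0.697, "Property": 0.680, "Resource": 0.692, "Class": 0.633}
cross_epoch_avg = {"BoW": 0.458, "Category": 0.582, "Property": 0.574, "Resource": 0.587, "Class": 0.675}

def f1_drop(same, cross):
    """Absolute F1 lost when moving from same-epoch to averaged cross-epoch evaluation."""
    return {feat: round(same[feat] - cross[feat], 3) for feat in same}

drops = f1_drop(same_epoch, cross_epoch_avg)
for feat, d in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feat:9s} {d:+.3f}")
```

BoW loses about 0.37 F1, the Category, Property and Resource features about 0.11 each, and Class features score higher in the cross-epoch average than in the same-epoch setting.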
Future Work
• Concept-drift tracking for transfer learning using Linked Data sources.
• Study of cross-epoch transfer learning approaches using semantic features.