Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs

A. Elizabeth Cano (@pixarelli) [email protected]
Yulan He [email protected]
Harith Alani [email protected]

Introduction Social Media Streams


Introduction Representing Topics in Dynamic Environments

•  2011: 3 dead in protest in Egypt. Security official vows to ‘deal firmly.. #Jan24 (keywords: #Jan24, Egypt, dead, protest, security)
•  2012: Egypt Pres Morsi uses his visit to Tehran to praise the Syrian uprising (keywords: Egypt, Pres Morsi, Tehran, Syrian, uprising)
•  2013: #Boston bombing suspect “pinned down” on boat in Watertown (keywords: Boston, bombing, suspect, Watertown)
•  2014: Why Obama needs to rethink his entire ISIS strategy… (keywords: Obama, ISIS, strategy)

Techniques for topic classification of Social Media are sensitive to the evolution of topics


Introduction

Challenges
•  Keeping the model up to date requires regular retuning.
•  Manual annotation is expensive.

Questions

•  Which feature types provide a more stable representation of a topic?


Introduction Previous work

 

Using local features
•  Bag of Words (BoW) [Genc et al., 2011]
•  BoW + Bag of Entities (BoE) [Vitale et al., 2012]
•  BoW + BoE + Part of Speech (PoS) tagging [Munoz et al., 2011] [Varga et al., 2012]

Exploiting the link structure of a Knowledge Source
•  Exploiting categories containing entities [Michelson et al., 2010]
•  Relating tweets with Wikipedia resources [Milne et al., 2008] [Xu et al., 2011]
•  Use of semantic features for topic classification [Cano et al., 2013] [Varga et al., 2014]


Introduction Topic Evolution

[Figure: a topic in the Twitter corpus at epochs t and t+1, with its lexical and semantic representations.]

Introduction Characterising Topic Changes with DBpedia

 

[Figure: DBpedia graph around dbp:Barack_Obama across DBpedia 3.6, 3.7 and 3.8.
Edge labels: rdf:type, rdfs:subClassOf, dbo:author, dbo:spouse, dbo:birthPlace, skos:subject, dbo:leader, dbo:wikiPageWikiLink.
Nodes: yago:PresidentOfTheUnitedStates, dbo:Person, dbp:Michelle_Obama, dbp:Hawaii, dbp:The_Audacity_of_Hope, dbp:Dreams_from_My_Father, category:Community_organisers, category:Columbia_University_Alumni, dbp:United_States_National_Council, dbp:National_Science_and_Techology, category:United_States_presidential_candidates,_2012, dbp:Al-Qaeda, dbp:Budget_Control_Act_of_2011.]

Some features remain unchanged, while others provide information about past, current or future contexts (e.g. dbp:UnitedStatesPresidentialCandidates).


Approach DBpedia Graph Snapshots

 

Definition: Time-dependent Resource Meta Graph
A sequence of tuples $G := (R, P, C, Y, f_t)$ where
•  $R$, $P$, $C$ are finite sets whose elements are resources, properties and classes;
•  $Y \subseteq R \times P \times C$ is a ternary relation representing a hypergraph with ternary edges;
•  the hypergraph of $Y$ is a tripartite graph $H(Y) = \langle V, D \rangle$ where the vertices are $V = R \cup P \cup C$ and the edges are $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$;
•  $f_t$ assigns a temporal marker to each ternary edge.
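One way to make the definition concrete is a small container that stores ternary edges together with their temporal markers. The sketch below is illustrative only: the class name `TimeStampedMetaGraph`, its methods, the use of DBpedia version strings as temporal markers, and the sample edges are assumptions and not part of the original work.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# A ternary edge (resource, property, class), i.e. one element of Y.
Edge = Tuple[str, str, str]


class TimeStampedMetaGraph:
    """Hypothetical container for ternary edges with temporal markers."""

    def __init__(self) -> None:
        # Maps each temporal marker (assumed here to be a DBpedia version
        # string) to the set of edges observed at that marker.
        self._edges_by_time: Dict[str, Set[Edge]] = defaultdict(set)

    def add_edge(self, resource: str, prop: str, cls: str, marker: str) -> None:
        """Record that the edge (resource, prop, cls) holds at `marker`."""
        self._edges_by_time[marker].add((resource, prop, cls))

    def snapshot(self, marker: str) -> Set[Edge]:
        """Return a copy of the edges that hold at the given marker."""
        return set(self._edges_by_time.get(marker, set()))

    def vertices(self, marker: str) -> Set[str]:
        """Return every resource, property and class in the snapshot."""
        return {vertex for edge in self.snapshot(marker) for vertex in edge}


# Made-up sample edges, for illustration only.
graph = TimeStampedMetaGraph()
graph.add_edge("dbp:Barack_Obama", "rdf:type", "dbo:Person", "3.6")
graph.add_edge("dbp:Barack_Obama", "rdf:type", "dbo:Person", "3.7")
graph.add_edge("dbp:Barack_Obama", "rdf:type", "yago:PresidentOfTheUnitedStates", "3.7")

print(len(graph.snapshot("3.6")), len(graph.snapshot("3.7")))
```

Keying the store by marker keeps each snapshot cheap to retrieve, at the cost of repeating edges that persist across versions.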


Approach Semantic Representation of a Tweet

 

Entities in the example tweet: <dbp:Hosni_Mubarak>, <dbp:Egypt>, <dbp:Barack_Obama>, <dbp:CNN>
Prefix dbp: http://dbpedia.org/resource/

Approach Semantic Representation of a Tweet

 

Class Features (rdf:type)
•  <dbp:Barack_Obama> rdf:type <yago:NobelPeacePrizeLaureates>
•  <dbp:Hosni_Mubarak> rdf:type <dbo:OfficeHolder>
•  <dbp:Egypt> rdf:type <dbo:Country>
•  <dbp:CNN> rdf:type <dbo:Broadcaster>
Prefix dbo: http://dbpedia.org/ontology/

Approach Semantic Representation of a Tweet

 

Property Features
•  <dbp:Barack_Obama> dbprop:nationality American
•  <dbp:Hosni_Mubarak> dbprop:title <dbp:Prime_Minister_of_Egypt>
•  <dbp:Egypt> dbprop:languages <dbp:Egyptian_Arabic>
•  <dbp:CNN> dbprop:headquarters <dbp:Atlanta>

Approach Semantic Representation of a Tweet

 

Category Features (skos)
•  <dbp:Barack_Obama> dcterms:subject <skos:Presidents_of_the_United_States>
•  <dbp:Hosni_Mubarak> dcterms:subject <skos:PresidentsOfEgypt>
•  <dbp:Egypt> dcterms:subject <skos:Arab_republics>
•  <dbp:CNN> dcterms:subject <skos:English-language_television_stations>
Prefix skos: http://dbpedia.org/resource/Category:

Approach Semantic Representation of a Tweet

 

Resource Features
•  <dbp:Barack_Obama> dbprop:commander <dbp:Death_Of_Osama_Bin_Laden>
•  <dbp:Hosni_Mubarak> dbprop:title <dbp:Prime_Minister_of_Egypt>
•  <dbp:Egypt> dbprop:languages <dbp:Egyptian_Arabic>
•  <dbp:CNN> dbprop:headquarters <dbp:Atlanta>
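The four feature types (class, property, category, resource) can be derived mechanically from the triples attached to a tweet's entities. The sketch below assumes those triples are already available as Python tuples; the function name `semantic_features`, the grouping rules, and the small `TRIPLES` sample are illustrative assumptions rather than the authors' implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Tiny illustrative sample, loosely based on the example entities.
TRIPLES: List[Triple] = [
    ("dbp:Barack_Obama", "rdf:type", "yago:NobelPeacePrizeLaureates"),
    ("dbp:Barack_Obama", "dcterms:subject", "skos:Presidents_of_the_United_States"),
    ("dbp:Egypt", "rdf:type", "dbo:Country"),
    ("dbp:Egypt", "dbprop:languages", "dbp:Egyptian_Arabic"),
    ("dbp:CNN", "rdf:type", "dbo:Broadcaster"),
]


def semantic_features(
    entities: Iterable[str], triples: Iterable[Triple]
) -> Dict[str, Set[str]]:
    """Group the triples of the given entities into four feature types."""
    wanted = set(entities)
    features: Dict[str, Set[str]] = {
        "class": set(), "category": set(), "property": set(), "resource": set(),
    }
    for subject, predicate, obj in triples:
        if subject not in wanted:
            continue
        if predicate == "rdf:type":
            features["class"].add(obj)          # class feature
        elif predicate == "dcterms:subject":
            features["category"].add(obj)       # category feature
        else:
            features["property"].add(predicate)  # property feature
            if obj.startswith("dbp:"):
                features["resource"].add(obj)    # linked resource feature
    return features


features = semantic_features(["dbp:Barack_Obama", "dbp:Egypt"], TRIPLES)
print(sorted(features["class"]))
```

Treating only `dbp:`-prefixed objects as resource features is a simplification chosen for this sketch; literals such as a nationality string are kept out of the resource set.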

Approach DBpedia Graph Snapshots

 

That is, the meta-graph of entity e is the aggregation of all resources, properties and classes related to this entity at time t.

Properties and resources of <dbp:Barack_Obama> per DBpedia version:
•  DBpedia 3.6: prop:spouse <MichelleObama>; prop:birthPlace <Hawaii>
•  DBpedia 3.7: prop:spouse <MichelleObama>; prop:birthPlace <Hawaii>; prop:commander <dbp:Death_Of_Osama_Bin_Laden>
•  DBpedia 3.8: prop:spouse <MichelleObama>; prop:birthPlace <Hawaii>; prop:wikiPageWikiLink <UnitedStatesPresidentialCandidates>; prop:wikiPageWikiLink <Budget_Control_Act_of_2011>
•  ….

Approach Semantic Feature Weighting Strategies

 

Topic Relevance-based Weighting Strategy: characterise the global relevance of a semantic feature to a given topic in DBpedia at a given point in time.

[Figure: the DBpedia graph and the topic graph within the DBpedia graph.]

Approach Semantic Feature Weighting Strategies

 

Topic Relevance-based Weighting Strategy:
•  Class-based Topic Relevance (ClsW)
•  Property-based Topic Relevance (PropW)
•  Category-based Topic Relevance (CatW)
•  Resource Relevance (ResW)

Approach Semantic Feature Weighting Strategies

 

Integrating weights into a Tweet representation

Weight of a semantic feature $f$ in a document $x$:

$$W_x(f)_{DB_t} = \left[ \frac{N_x(f)_{DB_t} + 1}{|F| + \sum_{f' \in F} N_x(f')_{DB_t}} \right] \ast \left[ W_{DB_t}(f) \right]^{1/2}$$

•  First factor: frequency with Laplace smoothing
•  Second factor: weight derived from the DB_t graph
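The weighting on this slide combines a Laplace-smoothed frequency of a feature in a tweet with the square root of the weight derived from the DB_t graph. A minimal sketch follows, assuming the feature vocabulary F and the graph-derived weights are given; the function name `tweet_feature_weights`, its parameters, and the sample values are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Mapping


def tweet_feature_weights(
    tweet_features: Iterable[str],
    vocabulary: Iterable[str],
    graph_weight: Mapping[str, float],
) -> Dict[str, float]:
    """Weight each vocabulary feature for one tweet.

    weight(f) = (count(f) + 1) / (|F| + total count) * sqrt(graph_weight(f))
    """
    vocab = set(vocabulary)
    # Count only features that belong to the vocabulary F.
    counts = Counter(f for f in tweet_features if f in vocab)
    denominator = len(vocab) + sum(counts.values())
    return {
        f: (counts[f] + 1) / denominator * math.sqrt(graph_weight.get(f, 0.0))
        for f in vocab
    }


# Made-up values, for illustration only.
weights = tweet_feature_weights(
    tweet_features=["dbo:Country", "dbo:Country"],
    vocabulary=["dbo:Country", "dbo:Broadcaster"],
    graph_weight={"dbo:Country": 0.25, "dbo:Broadcaster": 1.0},
)
print(weights)
```

Features missing from `graph_weight` default to zero here, which is a choice made for this sketch; a real pipeline might instead fall back to a small prior.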


Experiments Framework for Twitter Topic Classification with DBpedia

•  Do semantic features built from DBpedia graphs help cross-epoch topic classification of tweets?
•  Which feature type provides a more stable topic representation over time?

Experiments Framework for Twitter Topic Classification with DBpedia

 

[Figure: framework inputs. Microposts: 2010, 2011, 2013. DBpedia dumps (resources): 3.6, 3.7, 3.8, 3.9.]

Experiments Datasets

 

Tweets
Year   Period    Tweets
2010   Nov-Dec   1x10^6
2011   Aug       1x10^6
2013   Sep       1x10^6

Violence Related Topics
•  Disaster and Accident (D&A)
•  Law and Crime (L&C)
•  War and Conflict (W&C)

•  Each tweet is assigned a topic label from a pool of over 10 categories.
•  Manual annotation continues until there are 1K tweets per year per topic.
•  Negative set: 1K tweets per year on topics other than the three above.
•  12K annotated tweets in total.

Experiments Framework for Twitter Topic Classification with DBpedia

 

[Figure: framework. Inputs: microposts (2010, 2011, 2013) and DBpedia dumps 3.6, 3.7, 3.8, 3.9. Stage: Concept Enrichment (e.g. <dbp:Hosni_Mubarak>, <dbp:Egypt>, <dbp:Barack_Obama>, <dbp:CNN>).]

Experiments Framework for Twitter Topic Classification with DBpedia

 

[Figure: framework. Inputs: microposts (2010, 2011, 2013) and DBpedia dumps 3.6, 3.7, 3.8, 3.9. Stages: Concept Enrichment; Resource Backtrack Mapping; Deriving Semantic Graph Snapshots (2010, 2011, 2013).]

Experiments Framework for Twitter Topic Classification with DBpedia

 

[Figure: framework. Inputs: microposts (2010, 2011, 2013) and DBpedia dumps 3.6, 3.7, 3.8, 3.9. Stages: Concept Enrichment; Resource Backtrack Mapping; Deriving Semantic Graph Snapshots (2010, 2011, 2013); DBpedia Topic Relevance-based Feature Weighting.]

Experiments Datasets

 

[Figure: feature sets for the W&C, D&A, L&C and NEG datasets in 2010, 2011 and 2013. LEX: BoW. SEMANTIC: Category, Property, Resource, Class.]

Experiments Framework for Twitter Topic Classification with DBpedia

 

[Figure: framework. Inputs: microposts (2010, 2011, 2013) and DBpedia dumps 3.6, 3.7, 3.8, 3.9. Stages: Concept Enrichment; Resource Backtrack Mapping; Deriving Semantic Graph Snapshots (2010, 2011, 2013); DBpedia Topic Relevance-based Feature Weighting; Build Topic Classifier from topic-labelled microposts (2010, 2011, 2013).]

Experiments Understanding the Stability of a Topic Representation

 

[Figure: same-epoch scenario. Lexical, semantic and combined features; train and test sets; epochs t and t+1.]

Experiments Epoch Scenarios

Same epoch Scenario (Trained on 2010, Tested on 2010)

           Disaster_Acc  Law_Crime  War_Conflict
           F1            F1         F1
BoW        0.831         0.765      0.844
Category   0.697         0.650      0.744
Property   0.680         0.639      0.720
Resource   0.692         0.637      0.762
Class      0.633         0.583      0.637

All the experiments reported in our paper were conducted using a 10-fold cross-validation setting.

Experiments Understanding the Stability of a Topic Representation

[Figure: same-epoch scenario (train and test in epoch t) and cross-epoch scenario (train in epoch t, test in epoch t+1), for lexical, semantic and combined features.]

Experiments Epoch Scenarios

Cross-epoch Scenario (Trained on 2010- Tested on X)

Cross-Epoch, Disaster_Acc (F1)

           2010-2011  2010-2013  2011-2013  Average
BoW        0.634      0.481      0.261      0.458
Category   0.683      0.539      0.524      0.582
Property   0.665      0.557      0.502      0.603
Resource   0.774      0.544      0.445      0.587
Class      0.691      0.665      0.669      0.675
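The cross-epoch protocol (train on one epoch, test on a later one, report F1) can be sketched in a few lines. Everything here is an assumption made for illustration: the toy keyword classifier stands in for whatever classifier is actually used, and the function names and the two tiny epochs are made up.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[List[str], int]  # (features of a tweet, 1 if on-topic else 0)


def f1_score(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """F1 of the positive class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_keyword_classifier(train: Sequence[Example]) -> Callable[[List[str]], int]:
    """Toy stand-in for a real classifier: keeps features seen only in positives."""
    positive = {f for feats, label in train if label == 1 for f in feats}
    negative = {f for feats, label in train if label == 0 for f in feats}
    cues = positive - negative
    return lambda feats: int(any(f in cues for f in feats))


def cross_epoch_f1(
    epochs: Dict[str, Sequence[Example]], train_epoch: str, test_epoch: str
) -> float:
    """Train on one epoch and evaluate on another."""
    classify = train_keyword_classifier(epochs[train_epoch])
    test = epochs[test_epoch]
    gold = [label for _, label in test]
    predicted = [classify(feats) for feats, _ in test]
    return f1_score(gold, predicted)


# Made-up data: the 2011 positives only partly share vocabulary with 2010.
epochs = {
    "2010": [(["egypt", "protest"], 1), (["football", "match"], 0)],
    "2011": [(["protest", "syria"], 1), (["isis", "strategy"], 1), (["match", "goal"], 0)],
}
same_epoch = cross_epoch_f1(epochs, "2010", "2010")
cross_epoch = cross_epoch_f1(epochs, "2010", "2011")
print(same_epoch, cross_epoch)
```

In this toy setting the classifier misses the 2011 tweet whose vocabulary never appeared in 2010, which is the kind of lexical decay the cross-epoch scenario is designed to expose.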


Experiments Epoch Scenarios

Averaged Cross-epoch Scenarios

           Disaster_Acc  Law_Crime  War_Conflict  Average
           F1            F1         F1
BoW        0.458         0.620      0.531         0.536
Category   0.582         0.537      0.453         0.55
Property   0.574         0.504      0.506         0.528
Resource   0.587         0.578      0.466         0.544
Class      0.675         0.647      0.664         0.665

Conclusions

 

•  Semantic features are much slower to decay than lexical features.
•  Semantic representations improve performance in cross-time setting scenarios.
•  Class-based features alone achieve on average a gain of 7% over lexical features in cross-epoch setting scenarios.

Future Work

 

•  Concept-drift tracking for transfer learning using Linked Data sources.
•  Study cross-epoch transfer learning approaches using semantic features.

Questions

 

[email protected] @pixarelli
