Characterising the Emergent Semantics in Twitter Lists
Andrés García-Silva †, Jeon-Hyung Kang *, Kristina Lerman *, Oscar Corcho †
† {hgarcia, ocorcho}@fi.upm.es
Facultad de Informática
Universidad Politécnica de Madrid, Spain
*{jeonhyuk,lerman}@isi.edu
Information Sciences Institute,
University of Southern California, USA
Introduction: Twitter Lists
Introduction: Curators and List Names
Introduction: Members and List Names
Introduction: Subscribers and List Names
Introduction
• The previous examples showed individual uses of lists
• Some list names were related to each other
• What happens if we group the lists?
Introduction
Lists where the Yahoo!Finance user is a member, grouped by frequency of membership
Lists where the NASDAQ user is a member, grouped by number of subscriptions
Introduction: Research questions
[Diagram: lists named Stocks, Personal Banking, Investment, and Banks, shown with Curator 1, Curator 2, Subscriber 1, and the list members]
• Is it possible to identify related keywords from list names according to the use given by the different user roles?
  • Are two list names related if they have been used by a similar set of curators?
  • Are two list names related if a similar set of users have subscribed to the corresponding lists?
  • Are two list names related if their corresponding lists have a similar set of members?
• What kind of user roles will generate more related keywords?
• What types of relations between keywords can we obtain?
  • Synonyms, is-a, siblings...?
Approach
• Input: Twitter Lists
• Elicit related keywords from Twitter lists
  • Schema representation of keywords: based on members, based on subscribers, based on curators
  • Model to identify similar keywords: Vector Space Model, Latent Dirichlet Allocation
  • Output: pairs of related keywords per schema representation and model
• Characterise the semantics of the relations
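As a rough illustration of the member-based vector space model, the sketch below represents each list-name keyword as a vector over the users who are members of lists carrying that keyword, and compares keywords by cosine similarity. The toy membership data, the raw-count weighting, and the function names `build_vectors` and `cosine` are assumptions made for illustration; the slides do not state which weighting scheme was used.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: (keyword from a list name, member of that list).
memberships = [
    ("stocks", "YahooFinance"),
    ("stocks", "NASDAQ"),
    ("investment", "YahooFinance"),
    ("investment", "NASDAQ"),
    ("investment", "WSJ"),
    ("cooking", "FoodNetwork"),
]

def build_vectors(pairs):
    """Map each keyword to a sparse vector of member counts (assumed weighting)."""
    vectors = {}
    for keyword, user in pairs:
        vectors.setdefault(keyword, Counter())[user] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

vectors = build_vectors(memberships)
similarity = cosine(vectors["stocks"], vectors["investment"])
```

The same code would cover the subscriber-based and curator-based representations by swapping the second element of each pair for a subscriber or a curator.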
Approach
• Elicit related keywords from Twitter lists
  • Output: pairs of related keywords per schema representation and model
• Characterise the semantics of the relations
  • Similarity based on WordNet
    • Jiang & Conrath (distributional information)
    • Wu & Palmer (hierarchical information)
    • Path length
    • Synonyms, is-a, siblings, indirect is-a
    • Specificity of relations
  • SPARQL queries over general KBs published as Linked Data (DBpedia, OpenCyc, and UMBEL)
    • Synonyms (sameAs)
    • Binary relations (TypeOf, BT)
    • Object properties (Occupation)
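The hierarchy-based measures named above can be sketched on a small hand-made taxonomy. This is only an illustration: the `PARENT` tree is hypothetical and is not WordNet data, and the path length is counted in nodes (edges plus one) on the assumption that this matches the deck's convention of 1 for synonyms, 2 for is-a, and 3 for siblings or indirect is-a. The Wu & Palmer score uses the usual form 2·depth(LCS) / (depth(a) + depth(b)).

```python
# Hypothetical toy is-a taxonomy (child -> parent); not real WordNet data.
PARENT = {
    "entity": None,
    "person": "entity",
    "writer": "person",
    "poet": "writer",
    "novelist": "writer",
    "organization": "entity",
    "company": "organization",
}

def ancestors(node):
    """Return the chain from the node up to the root, node first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))

def lcs(a, b):
    """Least common subsumer: the deepest node that subsumes both a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None

def path_length(a, b):
    """Nodes on the shortest path through the LCS (edges + 1, assumed convention)."""
    common = lcs(a, b)
    edges = ancestors(a).index(common) + ancestors(b).index(common)
    return edges + 1

def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))
```

In this toy tree, "poet" and "novelist" are siblings under "writer", while "poet" and "company" only meet at the root, so the second pair has a shallower least common subsumer and a lower Wu & Palmer score.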
Experiment: Setup
• Data set
  • Total: 297,521 lists, 2,171,140 members, 215,599 curators, and 616,662 subscribers
• We extracted 5,932 unique keywords from list names; 55% of them were found in WordNet.
  • We use approximate matching of the list names with dictionary entries
  • The dictionary was created from Wikipedia article titles
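The slides do not say which approximate matching technique was used, so the following is only a sketch of the general idea using Python's standard `difflib` module. The small `DICTIONARY` stands in for Wikipedia article titles, and the `normalise` step, the `match_keyword` helper, and the 0.8 cutoff are all hypothetical choices.

```python
import difflib

# Hypothetical dictionary entries standing in for Wikipedia article titles.
DICTIONARY = ["biotechnology", "investment", "philanthropy", "fundraising", "stock market"]

def normalise(list_name):
    """Lower-case the list name and turn common separators into spaces."""
    return list_name.lower().replace("-", " ").replace("_", " ").strip()

def match_keyword(list_name, dictionary=DICTIONARY, cutoff=0.8):
    """Return the closest dictionary entry, or None if nothing is similar enough."""
    candidates = difflib.get_close_matches(
        normalise(list_name), dictionary, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None
```

A higher cutoff trades recall for precision: fewer list names are mapped to a keyword, but the mappings that remain are less likely to be spurious.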
Experiment: Execution
• Dataset
• Elicit related keywords from Twitter lists
  • Schema representation of keywords: based on members, based on subscribers, based on curators
  • Model to identify similar keywords: Vector Space Model, Latent Dirichlet Allocation
  • Pairs of related keywords per schema representation and model: each keyword with its 5 most related keywords
• Characterise the semantics of the relations
  • WordNet similarity: Jiang & Conrath (distributional information), Wu & Palmer (hierarchical information), path length
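Selecting, for each keyword, its 5 most related keywords could look like the sketch below. It assumes the pairwise similarity scores have already been computed by one of the models; the `pair_scores` layout and the `top_k_related` helper are hypothetical names introduced for illustration.

```python
from collections import defaultdict
import heapq

def top_k_related(pair_scores, k=5):
    """For each keyword, return its k highest-scoring partners.

    pair_scores maps an unordered pair (a, b) to a similarity score;
    each pair is assumed to appear only once.
    """
    neighbours = defaultdict(list)
    for (a, b), score in pair_scores.items():
        neighbours[a].append((score, b))
        neighbours[b].append((score, a))
    return {
        keyword: [other for _, other in heapq.nlargest(k, scored)]
        for keyword, scored in neighbours.items()
    }

# Hypothetical scores, for illustration only.
scores = {
    ("stocks", "investment"): 0.9,
    ("stocks", "banks"): 0.4,
    ("stocks", "cooking"): 0.1,
    ("investment", "banks"): 0.6,
}
related = top_k_related(scores, k=2)
```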
Experiment: Data Analysis
[Chart: Pearson's correlation coefficient; axis: correlation values (-1 to 1)]
[Chart: average J&C distance and W&P similarity]
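For reference, Pearson's correlation coefficient between two score series (for example, a model's similarity scores and a WordNet-based measure over the same keyword pairs) can be computed as in this minimal sketch. The helper name `pearson` and the sample values are illustrative assumptions, not data from the experiment.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same four keyword pairs.
model_scores = [0.9, 0.6, 0.4, 0.1]
wordnet_scores = [0.8, 0.7, 0.3, 0.2]
r = pearson(model_scores, wordnet_scores)
```

The result lies between -1 and 1; values near 1 would mean the model ranks keyword pairs much as the WordNet-based measure does.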
Experiment: Data Analysis
Path Length in WordNet: % of relations found by each schema representation and model
Path length               Members           Subscribers       Curators
                          VSM      LDA      VSM      LDA      VSM      LDA
1 (synonyms)              8.58%    10.87%   3.97%    3.24%    1.24%    0.50%
2 (is-a)                  3.42%    3.08%    1.93%    0.47%    0.70%    0.00%
3 (siblings, ind. is-a)   2.37%    3.77%    2.96%    2.06%    2.38%    4.03%
>3                        67.61%   65.5%    67.2%    67.5%    77.8%    75.8%
On average, 97.65% of the relations with a path length greater than 3 involve a common subsumer.
Experiment: Data Analysis
Depth (LCS) and path length as indicators of specificity
[Chart: relations in WordNet, by depth of the least common subsumer]
[Chart: relations with depth(LCS) >= 5, by length of the path setting up the relation]
Experiment: Findings
Summary
• Similarity models based on members
  • produce the results that are most correlated with those of the WordNet-based similarity measures
  • find more synonyms and direct is-a relations than the other models (path length).
• The majority of relations found by any model have a path length >= 3 and involve a common subsumer.
• Depth of LCS
  • VSM based on subscribers produces the highest number of specific relations (depth of LCS >= 5 or 6).
• Similarity models based on curators produce a lower number of relations.
Experiment: Execution
• Dataset
• Elicit related keywords from Twitter lists
  • Schema representation of keywords: based on members, based on subscribers, based on curators
  • Model to identify similar keywords: Vector Space Model, Latent Dirichlet Allocation
  • Pairs of related keywords per schema representation and model: each keyword with its 5 most related keywords
• Characterise the semantics of the relations
  • Ontological relations between keywords: SPARQL queries over general KBs published as Linked Data (DBpedia, OpenCyc, and UMBEL)
Experiment
• We anchor 63.77% of the keywords extracted from Twitter Lists to DBpedia resources
Experiment
Vector-space model based on members (direct relations)
Relation type   %     Example of keywords
Broader Term    26%   life-science, biotech
subClassOf      26%   writers, authors
developer       11%   google, google_apps
genre           11%   funland, comedy
largest city    6%    houston, texas
Others          20%   -
Vector-space model based on subscribers (relations of length 3)
Linked data pattern (54.73%): x -> object <- y
Relation of x   Relation of y   %        Object        Keywords (x, y)
type            type            67.35%   company       nokia, intel
subClassOf      subClassOf      30.61%   activities    philanthropy, fundraising
Linked data pattern (43.49%): x <- object -> y
Relation of x   Relation of y   %        Object            Keywords (x, y)
genre           genre           12.43%   Aesthetica        theater, film
occupation      genre           10.27%   Adam Maxwell      fiction, writer
occupation      occupation      8.11%    Alina Tugend      poet, writer
product         product         7.57%    ChenOne           clothes, fashion
industry        product         9.73%    UserLand Softw.   blogs, internet
known for       occupation      5.41%    Adeline Yen Mah   author, writing
known for       known for       3.78%    Rebecca Watson    skeptics, atheist
main interest   main interest   3.24%    Aristotle         politics, government
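The relation patterns on this slide (direct relations, x -> object <- y, and x <- object -> y) can be expressed as SPARQL graph patterns. The sketch below only builds query strings and does not contact any endpoint; the exact queries used in the study are not shown in the slides, so the query text and the helper names are assumptions. The `http://dbpedia.org/resource/` prefix is DBpedia's resource namespace.

```python
DBPEDIA = "http://dbpedia.org/resource/"

def direct_relation_query(x, y):
    """Properties that link resource x directly to resource y."""
    return f"SELECT DISTINCT ?p WHERE {{ <{x}> ?p <{y}> }}"

def shared_object_query(x, y):
    """Pattern x -> object <- y: both resources point to a common object."""
    return (
        "SELECT DISTINCT ?p1 ?object ?p2 WHERE { "
        f"<{x}> ?p1 ?object . <{y}> ?p2 ?object }}"
    )

def shared_subject_query(x, y):
    """Pattern x <- object -> y: a common resource points to both x and y."""
    return (
        "SELECT DISTINCT ?p1 ?object ?p2 WHERE { "
        f"?object ?p1 <{x}> . ?object ?p2 <{y}> }}"
    )

# Illustrative use with the nokia/intel example from the slide.
query = shared_object_query(DBPEDIA + "Nokia", DBPEDIA + "Intel")
```

Any of these strings could then be sent to a SPARQL endpoint with a client of your choice; which endpoint and client to use is left open here.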
Conclusions
• Different models to elicit related keywords from Twitter lists.
  • Curators, subscribers, and members; VSM and LDA
• Characterise the semantics of relations: WordNet-based similarity measures and SPARQL queries over linked data sets
Conclusions
• Vector-space and LDA models based on members produce the results most correlated with those of WordNet-based metrics.
  • Shortest JC distance and highest WP similarities
• According to the path length in WordNet
  • Models based on members produce more synonyms and direct is-a relations
  • Most of the relations have path length ≥ 3 and have a common subsumer
• Depth of LCS
  • Vector-space model based on subscribers finds the highest number of relations (depth LCS ≥ 5 and 4 ≤ path length ≤ 0)
• We confirm these results according to linked data sets