Anon Plangprasopchok & Kristina Lerman USC Information Sciences Institute Constructing...

Anon Plangprasopchok&

Kristina LermanUSC Information Sciences Institute

Constructing Folksonomies from User-Specificed Relations

on Flickr

Motivation

UsersWeb content

classification

Consume

ProduceAnnotate

Organize

DiscoverAnnotation /

Metadata

Organize Search Recommend Leverage

Inducing Folksonomy

• GOAL: induce hidden classification hierarchies, “Folksonomies*,” from user generated metadata

Although metadata from an individual user may be too inaccurate and incomplete, the metadata from different users may complement each other, making it, in combination, meaningful.

• In this work, we explore some strategies that combine metadata from many users and then induce folksonomies.

* The definition is somewhat different from the original one, made by Thomas Vander Wal.

Outline

• Motivation• Hierarchical Relations• Approaches• Results• Discussion• Related work

Hierarchical Relations in Social Web

• Appear Implicitly

• Appear Explicitly

Tags:InsectGrasshopperAustralianMacroOrthoptera

Folder (collection)

Sub folder (set)

Relations

Goal: to induce deeper hierarchies from this metadata

Inducing Hierarchy from Tags

Existing approaches

• Graph based [Mika05]• build a network of associated tags (node = tag, edge = co-occurrence of tags)• suggest applying betweenness centrality and set theory to determine broader/narrower relations

• Hierarchical Clustering [Brooks06; Heymann06+]•Tags appearing more frequently would likely have higher centrality and thus more abstract.

• Probabilistic subsumption [Sanderson99+; Schmitz06]

• x is broader than y if x subsumes y x subsumes y if p(x|y) > t & p(y|x) < t

Inducing Hierarchy from Tags

• Some difficulties when using tags to induce hierarchy:

Above relations induced using subsumption approach on tags [Sanderson99+, Schmitz06]

Washington United States

Car Automobile

Notation: A B (A is broader than B

Or hypernym relation)

Insect Hongkong

Color Brazilian

Specificity Rarity

Tags are from different facets

• User specified relations, e.g., • Flickr’s Collection-Set • Delicious’ Bundle-Tag • Bibsonomy’s Relation-Tag

• Key intuition: Not so many people specify peculiar relations like• “automobile” “car”, or • “Washington” “United States”

Inducing Hierarchy from user-specified relations

In this work, we concentrate on metadata from Flickr.

Simple Strategy

Mushrooms & Fungi

Collection

Fungi, Puffballs & Shelf FungiTokenize + Stem …

Concept relations

mushroom mushroom

Shelf fungi

puffballs

live thing

mushrooms

puffballs

shelf fungi

live thing

fungimushroom plant

……

2. Link concepts & Select path

1. Remove “noisy” relations- Conflict resolution- Significance test

Collection

Remove noisy relations: 1st approach

• Conflict Resolution (when both AB and BA appear)• Relation conflicts occur because of noise• Voting scheme:

Keep AB (and discard BA)

If Nu(AB) > 1 and Nu(AB) > Nu(BA)

insect

butterfly

insect

Remove noisy relations:2nd approach

• Significance Test- Use statistical significance test to decide if A B is significant

- Null hypothesis: observed relation AB was generated by chance, via the random, independent generation of individual concepts A, B.

# observations

rejectaccept

# of A B

Is B narrower than A by chance?

Link Concepts

• Link concepts together

• simply assume that same terms refer to the same concept

insect

bug insectbug

insect

mothbug

insect

Select path

bug insect

4 possible paths from anim moth:1) abim2) aim3) am4) abm

Network Bottleneck idea: “the flow bottleneck is a minimum flow capacity among all relations in the path”

1) abim [BN score = min(26,1,18) = 1]2) aim [BN score = min(72,18) = 18]3) am [BN score = min(10) = 10]4) abm [BN score = min(26,4) = 4]

• Select path: link relations from many users can cause a spaghetti graph

Evaluation & Data Set

• Hypothesis: the approach that takes explicit relations into account can induce better hierarchies.• “Better” means more consistent with the reference hierarchy (obtained from Open Directory Project (ODP))

Hierarchy in ODP is• created by volunteer editors• controlled under ODP guidelines

Evaluation & Data Set (2)

• The baseline approach is subsumption approach [Schmitz06] Collection and set terms are used instead of tags, making it comparable.

Data Set: Data from 17 user groups, devoted to wildlifeand naturalist photography

21,792 of 39,922 users specify at least one collection

110,543 unique terms (c.f. 166,153 unique terms in ODP), 15,495 terms in common.

Evaluation methodology

ODP has many sub hierarchies: comparing to the induced ones are impractical!It’s easier to compare when specifying “root concept” and “leaf concepts”, i.e., specifying a certain sub tree to compare.

Reference hierarchy

Relations (right after tokenized)

Induced hierarchy

Induce (remove

noise+link)

Metrics

• Taxonomic Overlap [adapted from Maedche02+]• measuring structure similarity between two trees• for each node, determining how many ancestor

and descendant nodes overlap to those in the reference tree.

• Lexical Recall• measuring how well an approach can discover

concepts, existing in the reference hierarchy (coverage)

Quantitative Result

• Manually selecting 32 root nodes

Subs 1/32Conres 11/32Sig001 15/32

Subs 2/32Conres 17/32Sig001 6/32

Subs ~ 0.85 Conres ~ 2.24Sig001 ~ 2.08

Sport hierarchy

Invertebrate hierarchy

Country hierarchy

Discussion

• Simple strategy to aggregate a large number of shallow relations specified by different users into a common, deeper hierarchy

• Induced hierarchies are more consistent with ODP

• Future work includes: Term ambiguity Global structure Relation types Apply to other datasets

Related Work

• Learning concept hierarchy from text data • Syntactic based [Hearst92, Caraballo99,

Pasca04, Cimiano+05, Snow+06]• Word clustering [e.g., Segal+02, Blei+03]

• Induce concept hierarchy from tags • Graph-based & clustering based [Mika05,

Brooks+06, Heymann+06, Zhou07+]• Probabilistic subsumption [Schmitz06]

• Ontology alignment [e.g., Udrea+07]

• Exploit user-specified hierarchy• GiveALink [Markines06+]

• Questions?• Is the metric used in evaluation meaningful?• How is the scalability of the system?• WordNet, ODP is already there. Why do we need

this system?• How is this work related to ontology enrichment?• Is it ethical to collect users’ data?• ….

Questions?

THANK YOU!

Anon Plangprasopchok & Kristina Lerman USC Information Sciences Institute Constructing...

Documents

Folksonomies-Group 4

Network properties of folksonomies

Workshop on FOLKSONOMIES - uni-duesseldorf.de...1. Folksonomies – Indexing without rules 2. Folksonomies in information services and library catalogues Short Break. 3. Folksonomies

Portafolio Daniela Lerman

Presentation-Weeks Lerman

Folksonomies und Unternehmenskommunikation peters final

Constructing Folksonomies from User-Speciﬁed Relations on ...lerman/papers/fp500-plangprasopchok.pdf · Figure 1: Personal hierarchy speciﬁed by a Flickr user. (a) Some of the

Plan EstratéGico Lerman

Dr. Robert Lerman,

Tel Aviv Lerman

Folksonomies in Wissensrepräsentation und IR

Taxonomies and Folksonomies

Lerman Booklet

Understanding the Semantics of Ambiguous Tags in Folksonomies · Understanding the Semantics of Ambiguous Tags in Folksonomies – C.M. Au Yeung, N. Gibbins, N. Shadbolt • Folksonomies

Social Bookmarks, Folksonomies–Complex Networks

Tour Of Folksonomies Delicious Short

Folksonomies And Facets

Taxonomy, ontology, folksonomies & SKOS

Lerman Evolving

Lerman Duván Machado Laitón