
Improving Flickr discovery through Wikipedias


Position paper presented at the "Between Ontologies and Folksonomies" (BOF) workshop at CCT2007.


Index Introduction Folksonomies Linguistic issues Introducing Flickrpedia Concluding Remarks


Federico Gobbo, federico.gobbo@uninsubria.it

Università degli Studi dell’Insubria, Varese, Italy

(cc) Some rights reserved.


1 Introduction: Why folksonomies are interesting

2 Folksonomies: Why do folksonomies differ?

3 Linguistic issues: Augmented folksonomies through natural language

4 Introducing Flickrpedia: Multilingual diversity as the source of knowledge

5 Concluding Remarks


Why folksonomies are interesting

A key question of information retrieval today

How can we add meaningful metadata to web content, in order to increase the utility of information by improving the precision of information retrieval for search engines?


Folksonomies, a tentative answer. What are they?

folksonomy = folks + taxonomy

A folksonomy is made of tags or labels, usually single-word metadata attached to online items (documents, photos, videos, etc.), in order to add contextual meaning to the items themselves.

Folksonomies are a tentative effort toward the goal of improving the precision of information retrieval.


Why do folksonomies differ?

Folksonomies and traditional taxonomies

Unlike traditional taxonomies, there is no explicit hierarchy between tags, nor are tags exclusive. For example, the photo of a cat may be tagged as ‘cat’ and ‘european’ and ‘animal’, but there is nothing that says that all cats are animals: tags can be seen as common facets of the item itself (Schmitz 2006). There is no central authority, and this is the main reason why folksonomies are becoming more and more popular among web resource users.


The two different scopes of folksonomies

Each tag has two different scopes at the same time:

personomy, the meaning defined by the individual user (Quintarelli 2005);

consensus, the socially shared meaning.

Consensus is becoming more and more important, as the wide use of tag suggestion interfaces in web applications suggests.


Folksonomies and the Long Tail (see the video!)


The key concept of serendipity

Consensus permits serendipity, i.e. users dig through the web via tags, finding new, unexpected and useful content that is not easily accessible via traditional search engines.

Tags are used as filters, i.e. a query on several tags returns the items tagged with any of the given tags, or with all of them, depending on the application (Golder and Huberman 2006).

The purpose of this paper is to improve serendipity by allowing people to dig into folksonomies regardless of the natural language(s) they master.
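The two filtering behaviours mentioned above (items matching any of the tags, or all of them) can be sketched with plain set operations. This is only an illustration: the photo index and the function name `filter_by_tags` are hypothetical and do not come from any real folksonomy application.

```python
# Toy index: photo id -> set of tags (hypothetical data for illustration).
TAGGED = {
    "p1": {"cat", "european", "animal"},
    "p2": {"cat", "kitchen"},
    "p3": {"dog", "animal"},
}


def filter_by_tags(photos, query_tags, mode="any"):
    """Return the sorted ids of the photos matching the query tags.

    mode="any": the photo carries at least one of the query tags.
    mode="all": the photo carries every query tag.
    """
    query = set(query_tags)
    if mode == "any":
        # Non-empty intersection between the photo's tags and the query.
        return sorted(pid for pid, tags in photos.items() if tags & query)
    if mode == "all":
        # The query must be a subset of the photo's tags.
        return sorted(pid for pid, tags in photos.items() if query <= tags)
    raise ValueError(f"unknown mode: {mode!r}")


print(filter_by_tags(TAGGED, ["cat", "animal"], mode="any"))  # ['p1', 'p2', 'p3']
print(filter_by_tags(TAGGED, ["cat", "animal"], mode="all"))  # ['p1']
```

The "any" mode widens the result set, which is the behaviour a multilingual query relies on: each translated tag adds candidates rather than restricting them.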


Augmented folksonomies through natural language

Tags as linguistic objects

Tags are words, i.e. alphabetical strings meaningful in some natural language. There is no controlled language. In particular, the following features go unrecognized:

synonymy (different word strings, analogous meaning);

homography (identical word string, totally different meaning);

different encoding strategies are possible (e.g. ‘28-03-2008’, ‘2008March3’, ‘3rd March 2008’);

misspellings are very frequent, so standard NLP techniques are ruled out.

Guy and Tonkin (2006) even advocated tag literacy education.
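A small sketch can make these issues concrete. It assumes that tags are compared as plain strings, and every tag and photo id below is invented for illustration.

```python
# Hypothetical photo index; tags are compared as plain strings,
# which is the assumption this sketch illustrates.
INDEX = {
    "p1": {"cat"},
    "p2": {"kitty"},          # synonym of 'cat', but a different string
    "p3": {"jaguar", "zoo"},  # the animal
    "p4": {"jaguar", "car"},  # the car: same string, different meaning
    "p5": {"catt"},           # misspelling of 'cat'
}


def exact_match(photos, tag):
    """Return the sorted ids of the photos carrying exactly this tag string."""
    return sorted(pid for pid, tags in photos.items() if tag in tags)


# Synonymy and misspellings: relevant photos p2 and p5 are missed.
print(exact_match(INDEX, "cat"))     # ['p1']
# Homography: both the animal and the car are returned.
print(exact_match(INDEX, "jaguar"))  # ['p3', 'p4']
```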


The linguistic divide in folksonomies

Multilingualism is an issue not yet fully explored in folksonomies. In fact, tags are written in a human language and users are inclined to write in the languages they are comfortable in.

It is certainly desirable for a user who is not comfortable in English or another major language (in terms of web presence) to search and find tags using a search engine interface in his or her own language, while the engine searches for the corresponding tags in English and in other major human languages.


Multilingual diversity as the source of knowledge

How to overcome the linguistic divide?

A proposal: a dedicated web application that extracts the language-tag pairs in every available language before passing the tags to the folksonomy search engine.

The claim is an improvement in serendipity: when searching in 20 natural languages at the same time, some interesting data will be found that would remain undiscovered through a single-language search.


Flickr and its API

Flickr is one of the most popular web applications for photos (more than 2 million photos are currently found when searching for ‘flowers’). Photos are freely tagged by users, so it can be considered a folksonomy.

Open source API libraries for the major programming languages are available, and anyone can query the Flickr repository using an authentication key issued on request.

http://www.flickr.com/services/api
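As an illustration of what such a query looks like, the sketch below builds the URL of a tag search against the Flickr REST endpoint. The method `flickr.photos.search` and the parameters shown are taken from the public Flickr API documentation as I understand it and should be checked against the current reference. The function name `build_search_url` and the key placeholder are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode

# REST endpoint of the Flickr API (verify against the current documentation).
FLICKR_REST = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, tags, tag_mode="any", per_page=60):
    """Build (but do not send) a flickr.photos.search request URL.

    tags is a list of tag strings; tag_mode is "any" (OR) or "all" (AND).
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,          # authentication key issued by Flickr
        "tags": ",".join(tags),      # comma-delimited list of tags
        "tag_mode": tag_mode,
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,         # ask for plain JSON, not JSONP
    }
    return FLICKR_REST + "?" + urlencode(params)


# "YOUR_API_KEY" is a placeholder, not a working key.
print(build_search_url("YOUR_API_KEY", ["Flugzeug", "Airplane", "Avion"]))
```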


Flickrpedia = Flickr + Wikipedias

Flickrpedia is built on a Ruby API library and on the Ruby on Rails development framework (Thomas 2005, Thomas and Heinemeier-Hansson 2005). Users query Flickr by entering a tag and specifying its natural language.

The system crawls the Wikipedia in the corresponding language and looks for an appropriate page. With the help of regular expressions, Flickrpedia parses the web page and extracts the existing language pairs for the same topic in other languages from the appropriate web page box.
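The extraction step can be sketched as follows. Flickrpedia itself is written in Ruby, so this Python fragment is only an illustration of the idea, not the author's code. The HTML sample is a simplified, hypothetical imitation of a Wikipedia language box, and the regular expression and the function name `extract_language_pairs` are assumptions.

```python
import re
from urllib.parse import unquote

# Simplified, hypothetical imitation of the language box of the German
# page 'Flugzeug'; real Wikipedia markup differs and changes over time.
SAMPLE_HTML = """
<div id="p-lang">
  <ul>
    <li class="interwiki-en"><a href="http://en.wikipedia.org/wiki/Airplane">English</a></li>
    <li class="interwiki-fr"><a href="http://fr.wikipedia.org/wiki/Avion">Français</a></li>
    <li class="interwiki-eu"><a href="http://eu.wikipedia.org/wiki/Hegazkin">Euskara</a></li>
  </ul>
</div>
"""

# Capture the language code (subdomain) and the page title of each link.
LINK = re.compile(r'href="https?://([a-z\-]+)\.wikipedia\.org/wiki/([^"#]+)"')


def extract_language_pairs(html):
    """Return a dict mapping language code -> page title found in the HTML."""
    pairs = {}
    for lang, title in LINK.findall(html):
        # Page titles are URL-encoded and use underscores for spaces.
        pairs[lang] = unquote(title).replace("_", " ")
    return pairs


print(extract_language_pairs(SAMPLE_HTML))
# {'en': 'Airplane', 'fr': 'Avion', 'eu': 'Hegazkin'}
```

The resulting titles are the tags that would then be passed to the Flickr search, together with the original German tag.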


How Flickrpedia works

[Diagram] A German user enters the query ‘Flugzeug’ (German) in Flickrpedia. The system crawls Wikipedia, parsing the page with the help of regular expressions, and extracts the corresponding terms: ‘Airplane’ (English), ‘Avion’ (French), ‘Hegazkin’ (Basque), ... The German user obtains the desired photos from Flickr!


The web page box for “alternate languages” in Wikipedia. An example: the German word ‘Flugzeug’


The results of the German word ‘Flugzeug’

As of 11 April 2007, Flickr finds fewer than 10,000 photos, while Flickrpedia finds more than 20,000 for the same query, including many unexpected and relevant photos.


Don’t trust me: try it yourself! Word searched: ‘Flugzeug’, i.e. airplane in German

http://buffy.sciva.uninsubria.it/~rl608838/search


Flickrpedia so far

Flickrpedia only needs to store the list of Wikipedias for the existing natural languages (currently 85). Large, extemporaneous shared information repositories like Flickr can be managed through other semi-structured information repositories such as the Wikipedias.

Flickrpedia, if refined beyond its current prototype phase, may help users with poor knowledge of major languages to retrieve information using only their lesser-used languages.


Further directions for Flickrpedia

Flickrpedia is far from perfect: homographs are still not handled, even though Wikipedias have disambiguation pages, and it is not clear which Wikipedias to choose in order to optimize serendipity.

For now the parsed Wikipedias are the largest ones in terms of wiki pages, but this gives no guarantee of increased serendipity.

Finally, the Flickr API imposes a severe limit: up to 20 tags can be included in a single query request, and up to 60 thumbnails may be returned.
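One possible way to live with the 20-tag limit, which the slides do not describe, is to split the translated tags into batches and issue one query per batch. The sketch below assumes this workaround, and the name `batch_tags` is hypothetical.

```python
MAX_TAGS_PER_QUERY = 20  # limit stated above for a single query request


def batch_tags(tags, size=MAX_TAGS_PER_QUERY):
    """Split a list of tags into batches of at most `size` tags each.

    Duplicates are dropped first (keeping the first occurrence), since
    different Wikipedias may use the same word for the same topic.
    """
    unique = list(dict.fromkeys(tags))
    return [unique[i:i + size] for i in range(0, len(unique), size)]


# 45 hypothetical translations of one tag -> three queries.
translations = [f"tag{i}" for i in range(45)]
print([len(batch) for batch in batch_tags(translations)])  # [20, 20, 5]
```

Each batch would need its own request, and the results of the separate requests would still have to be merged and deduplicated by photo id.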


Beyond Flickrpedia

This approach isn’t limited to Flickr as the underlying folksonomy. Our research direction is towards generalization, i.e. users can choose the appropriate folksonomy on which to perform multilingual queries.

It remains to be shown how to apply this approach to folksonomies whose semantic referents are not photos: an airplane or a flower is, more or less, the same thing in almost every human language, which may not hold for other kinds of content.

The real underlying problem is how to measure serendipity: specific and precise metrics for serendipity are needed.


Thank you. Any questions?

Download these slides at the following permalink:

http://purl.org/net/fgobbo

(cc) F. Gobbo 2007. Published in Italy. Attribution-NonCommercial-ShareAlike 2.5