April 2008 1

Uncorking the Varietals: Social Tagging, Folksonomies & Controlled Vocabularies

Margaret Maurer

Head, Catalog and Metadata

Kent State University Libraries and Media Services

In wine making, what is a varietal?

A wine made from a single, named grape variety.

Cabernet Sauvignon wines are made from cabernet sauvignon grapes

Chardonnay wines are made from chardonnay grapes

In information seeking, on the Web or in the catalog:

Access and identification systems may be controlled by librarians: controlled vocabularies

Access and identification systems may be dynamically generated by users: social tagging, folksonomies

These are different varieties of access and identification systems

This presentation

Controlled vocabularies

Social tagging

Folksonomies

My recommendations

First we’ll talk about the cabernet sauvignons – the controlled vocabs

Purpose of a controlled vocabulary

To create sets of objects

To serve as a bridge between the searcher’s language and the author’s language

To provide consistency

To improve precision and recall

Characteristics of a controlled vocabulary

Features a single, authorized form of heading

Often features a syndetic structure of cross-references

Based on the belief that successful use of the catalog depends on the quality of the individual records

The authority record structure

Records the standardized form

Ensures the gathering together of records via that access point

Enables standardized catalog records

Documents decisions taken

Records all other heading forms and provides links from them to the standardized form
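The authority record structure above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not MARC 21: the `AuthorityRecord` class, its field names, and the sample name heading are hypothetical, chosen only to show one standardized form, the other forms linked to it, and the gathering of bibliographic records under that access point.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """Hypothetical, simplified authority record (not MARC 21)."""
    authorized: str                               # the single standardized form
    variants: list = field(default_factory=list)  # other forms that point to it
    note: str = ""                                # documents the decision taken


def build_lookup(records):
    """Map every known form, authorized or variant, to its authorized form."""
    lookup = {}
    for rec in records:
        lookup[rec.authorized.casefold()] = rec.authorized
        for variant in rec.variants:
            lookup[variant.casefold()] = rec.authorized
    return lookup


def gather(bib_records, lookup):
    """Collect bibliographic record IDs under their authorized heading."""
    groups = defaultdict(list)
    for rec_id, heading in bib_records:
        # Unknown headings are kept as-is so they can be reviewed later.
        groups[lookup.get(heading.casefold(), heading)].append(rec_id)
    return dict(groups)


# Illustrative data only.
authority = [
    AuthorityRecord(
        "Twain, Mark, 1835-1910",
        ["Clemens, Samuel Langhorne, 1835-1910", "Twain, Mark"],
        "Variant forms recorded as 'see from' references",
    )
]
lookup = build_lookup(authority)
bibs = [
    ("b1", "Twain, Mark"),
    ("b2", "Clemens, Samuel Langhorne, 1835-1910"),
    ("b3", "Twain, Mark, 1835-1910"),
]
print(gather(bibs, lookup))
```

Because every variant resolves to the same authorized string, all three records gather under one heading, whichever form the cataloger or searcher started from.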

Benefits of controlled vocabularies

Promotes discovery generally

Promotes discovery when the aboutness of something has nothing to do with the words in the resource or its representation, as in:

Imaginative literature (genre headings)

Humanities

Pre-coordinated displays expand access: http://cinema.library.ucla.edu

Benefits when combined with keyword searching

Keywords hook into strings of terms most efficiently

Users can be routed by pre-coordinated strings
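A minimal sketch of how a keyword can hook into pre-coordinated strings and then route the user to the full headings. The function name and the heading list are hypothetical; the headings only imitate LCSH style, and a real catalog would search its indexes rather than an in-memory list.

```python
def route_by_keyword(keyword, headings):
    """Return the full pre-coordinated headings containing the keyword as a word.

    Matching is a simple case-insensitive word test on each heading string;
    the user is then offered the complete strings to browse.
    """
    kw = keyword.casefold()
    matches = []
    for heading in headings:
        # Treat subdivision dashes and commas as word separators.
        words = heading.casefold().replace("--", " ").replace(",", " ").split()
        if kw in words:
            matches.append(heading)
    return sorted(matches)


# Hypothetical sample of LCSH-style strings.
headings = [
    "Wine and wine making--France--History",
    "Wine and wine making--California",
    "Grapes--Varieties",
    "Viticulture--France",
]
print(route_by_keyword("france", headings))
```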

Controlled vocabularies support faceted catalogs

Encore, Evergreen, Endeca, WorldCat Local

All provide hyperlinks to authorized headings
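Faceted catalogs build their facets from the controlled headings in each record. The sketch below is not how Encore, Evergreen, Endeca, or WorldCat Local work internally; it only illustrates the general idea, assuming hypothetical records: count each authorized heading across a result set and let a click on a facet value narrow the results.

```python
from collections import Counter


def subject_facets(results, top_n=5):
    """Count authorized subject headings across a result set, most frequent first."""
    counts = Counter()
    for record in results:
        # dict.fromkeys drops duplicate headings within one record, keeping order
        counts.update(list(dict.fromkeys(record["subjects"])))
    return counts.most_common(top_n)


def refine(results, heading):
    """Narrow the result set to records carrying the chosen heading (a facet click)."""
    return [r for r in results if heading in r["subjects"]]


# Hypothetical result set.
results = [
    {"id": 1, "subjects": ["Wine and wine making", "France"]},
    {"id": 2, "subjects": ["Wine and wine making", "California"]},
    {"id": 3, "subjects": ["Viticulture", "France"]},
]
print(subject_facets(results))
print([r["id"] for r in refine(results, "France")])
```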

Weaknesses of controlled vocabularies

The artificially controlled language is not necessarily natural language (Cookery, anyone?)

Subject searches are the most problematic for users

It may work better in theory than in practice

It is costly to perform necessary maintenance

Many administrators see the cost as outweighing the benefits

Library of Congress Subject Headings - LCSH

Has a long and well-documented history

Is commonly used

Is contained in millions of bibliographic records

Has strong institutional support from LC

More benefits of LCSH

The rich vocabulary covers most subjects

It imposes synonym and homograph control

There are machine-assisted authority control mechanisms

There is pre-coordination with LCC

The music subject heading system is well developed
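Machine-assisted authority control is what keeps heading maintenance tractable at scale. A minimal sketch of one such task, assuming a hypothetical table of changed headings: when an authorized form changes, swap the obsolete main heading in every bibliographic record that carries it, keeping any subdivisions. The change shown is illustrative data, not a statement about current LCSH.

```python
def update_heading(heading, changes):
    """Swap the main heading (before any '--' subdivisions) if it has changed."""
    main, sep, rest = heading.partition("--")
    return changes.get(main, main) + sep + rest


def flip_headings(bib_records, changes):
    """Update every bibliographic record that carries an obsolete heading.

    bib_records: dict of record ID -> list of subject headings (edited in place)
    changes: dict of obsolete main heading -> new authorized heading
    Returns the IDs of the records that were modified.
    """
    modified = []
    for rec_id, headings in bib_records.items():
        updated = [update_heading(h, changes) for h in headings]
        if updated != headings:
            bib_records[rec_id] = updated
            modified.append(rec_id)
    return modified


# Illustrative change table and records (hypothetical).
changes = {"Cookery": "Cooking"}
bibs = {"b1": ["Cookery--France", "Wine and wine making"], "b2": ["Viticulture"]}
modified = flip_headings(bibs, changes)
print(modified, bibs)
```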

Weaknesses of LCSH

It is a generalist taxonomy that can’t always provide needed granularity

Terminology is not always current

It doesn’t allow for post-search coordination (it is pre-coordinated)

It suffers from LC collection bias
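Because LCSH strings are pre-coordinated, post-search coordination needs the strings broken into components first. A minimal sketch, assuming headings use `--` between subdivisions and that a naive split is acceptable (it ignores whether a component is topical, geographic, or chronological); the function names and records are hypothetical.

```python
def components(heading):
    """Split a pre-coordinated heading string into its subdivision components."""
    return {part.strip() for part in heading.split("--") if part.strip()}


def post_coordinate(records, *terms):
    """Return IDs of records whose headings, taken apart, include every term."""
    hits = []
    for rec_id, headings in records.items():
        parts = set()
        for h in headings:
            parts |= components(h)
        if all(t in parts for t in terms):
            hits.append(rec_id)
    return hits


# Hypothetical records with LCSH-style strings.
records = {
    "b1": ["Wine and wine making--France--History"],
    "b2": ["Viticulture--France"],
    "b3": ["Wine and wine making--California"],
}
print(post_coordinate(records, "France"))
print(post_coordinate(records, "Wine and wine making", "History"))
```

Once the components are separate, the user can combine any of them after the search instead of relying on the one order the heading was built in.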

More weaknesses of LCSH

Training needed

Requires some orientation to use effectively

Is not always accurately applied by catalogers

Maintenance

It is difficult to maintain when changes occur

Authority control outside the catalog

Data critical mass: is there a tipping point?

Homogeneity of data in terms of subject matter

Specificity requirements of the data community’s users

Size

Computing power

Wikipedia’s “disambiguation”
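Wikipedia's disambiguation pages handle ambiguous labels much as qualifiers handle homographs in an authority file: one label leads to several distinct, qualified entries. A minimal sketch, assuming entries are written as `Base (Qualifier)`; the labels are illustrative and not taken from any particular authority file.

```python
import re

# "Base (Qualifier)", e.g. "Mercury (Planet)"
QUALIFIED = re.compile(r"^(?P<base>.+?)\s*\((?P<qualifier>[^)]+)\)$")


def disambiguation_page(label, entries):
    """List the entries whose base label matches, paired with their qualifier."""
    page = []
    for entry in entries:
        match = QUALIFIED.match(entry)
        base = match.group("base") if match else entry
        if base.casefold() == label.casefold():
            page.append((entry, match.group("qualifier") if match else None))
    return page


# Illustrative labels only.
entries = ["Mercury (Planet)", "Mercury (Roman deity)", "Mercury (Element)", "Venus (Planet)"]
print(disambiguation_page("mercury", entries))
```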

ZoomInfo http://www.zoominfo.com/Default.aspx

What if we did open up our authority files to the web?

National Library of Australia’s People Australia Project

http://www.nla.gov.au/initiatives/peopleaustralia/

Wikipedia Persondata-Tool

http://www.ifla.org/IV/ifla73/papers/113-Danowski-en.pdf

Is ontology overrated?

Physicality requires ontologies for searching, but systems with hyperlinks do not

Browse versus search may eliminate the need for creating lists of authorized headings

Ontological classification

Works well when the domain to be organized is small, has formal categories, has stable entities, is restricted and has clear edges

Does not work well when the domain to be organized is large, has no formal categories, is unstable, is unrestricted and has no clear edges

Ontological classification

Works well when the participants are expert catalogers, authoritative sources of judgement, coordinated users or expert users

Does not work well when the participants are uncoordinated, amateur, naïve or non-authoritative

Now we talk about the Chardonnays – social tagging and folksonomies

What are tags?

Keywords or terms associated with or assigned to a piece of information

They enable keyword-based classification and search of information
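A minimal sketch of keyword-based classification and search with tags, assuming a hypothetical in-memory store: each tag names a set of items, and searching by tags intersects those sets. The `TagIndex` class is illustrative, not any site's API.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory tag store mapping each tag to a set of items."""

    def __init__(self):
        self.items_by_tag = defaultdict(set)

    def tag(self, item, *tags):
        """Assign one or more tags to an item (tags are trimmed and lowercased)."""
        for t in tags:
            self.items_by_tag[t.strip().lower()].add(item)

    def search(self, *tags):
        """Return the items carrying every requested tag (an AND search)."""
        sets = [self.items_by_tag.get(t.strip().lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()


index = TagIndex()
index.tag("photo-1", "wine", "Napa")
index.tag("photo-2", "wine", "bordeaux")
index.tag("bookmark-3", "recipes")
print(sorted(index.search("wine")))
print(sorted(index.search("wine", "napa")))
```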

Common Web sites that use tags include:

Del.icio.us – social bookmarking site

Flickr – image tagging

LibraryThing

Gmail – webmail

YouTube

Tags, and therefore social tags and folksonomies, are:

Dynamic categorization systems

Often created on the fly

Chosen as relevant to the user, not to the creator, cataloger or researcher

A social activity (more on this later)

Hopefully one small step toward a more interactive and responsive library system

Social tags are:

Non-hierarchical

A way to create links between items by creating sets of objects

A means of connecting with others interested in the same things
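Since each tag creates a set of objects, items can be linked by how much their tag sets overlap. A minimal sketch with hypothetical data, using Jaccard similarity as the overlap measure (an assumption; other set-similarity measures would also work):

```python
def jaccard(a, b):
    """Overlap of two tag sets: |A & B| / |A | B| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(target, tags_by_key, min_score=0.0):
    """Rank other items (or users) by tag overlap with the target, best first."""
    scores = [
        (other, jaccard(tags_by_key[target], tags))
        for other, tags in tags_by_key.items()
        if other != target
    ]
    return sorted([s for s in scores if s[1] > min_score], key=lambda s: (-s[1], s[0]))


# Hypothetical item-to-tags data.
item_tags = {
    "photo-1": {"wine", "napa", "vineyard"},
    "photo-2": {"wine", "bordeaux", "vineyard"},
    "bookmark-3": {"recipes"},
}
print(related("photo-1", item_tags))
```

The same `related` function works on a user-to-tags mapping, which is one way to surface people interested in the same things.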

Way baaack in 2003…

Del.icio.us includes identity in its social bookmarking

Flickr includes tags

Lists of tags became a tool for serendipitous discovery (folksonomies)

Why is tagging so popular?

It is easy and enjoyable

It has a low cognitive cost

It is quick to do

It provides immediate self and social feedback

People tag things

To find them again

To get exposure and traffic

To voice their opinions

Incidentally, as they perform other tasks

To take advantage of functionality built on top of a folksonomy

To play a game or earn points

Putting the social in tagging

Tags allow for social interaction because when we navigate by tags we are directly connecting with others

People tag for their own benefit

Don’t confuse tags with keywords or full-text searching

Keywords are behind the scenes; tags are often visibly aggregated for use and browsing

Keywords cannot be hyperlinked

Keywords imply searching; tags imply linking

Full-text searching is passive; tagging is active

It’s more about connecting items than categorizing them

What is a Folksonomy?

Folksonomy refers to:

An “emergent, grassroots taxonomy”

An aggregate collection of tags

Bottom-up development of a categorical structure

An emergent thesaurus

A term coined by Thomas Vander Wal
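To make “aggregate collection of tags” concrete: a folksonomy can be sketched as a tally over individual (user, item, tag) assignments, where each tag's weight on an item is the number of distinct users who applied it. The data and the `build_folksonomy` name are hypothetical.

```python
from collections import Counter, defaultdict


def build_folksonomy(assignments):
    """Aggregate (user, item, tag) triples into item -> Counter of tags.

    Each user counts once per (item, tag), so repeating a tag does not inflate it.
    """
    unique = {(user, item, tag.lower()) for user, item, tag in assignments}
    result = defaultdict(Counter)
    for _user, item, tag in unique:
        result[item][tag] += 1
    return result


# Hypothetical tagging activity.
assignments = [
    ("ann", "book-1", "Wine"),
    ("ben", "book-1", "wine"),
    ("ben", "book-1", "wine"),  # duplicate, counted once
    ("cat", "book-1", "france"),
]
book_tags = build_folksonomy(assignments)["book-1"]
print(book_tags.most_common())
```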

How do folksonomies work?

The searcher defines the access, but the aggregation of the terms has public value

It’s a typically messy, democratic approach

What makes folksonomies popular?

Their dynamic nature works well with dynamic resources

They’re personal

They lower barriers to cooperation

Tagging and the consequent folksonomies work best when:

It’s easy to do

It’s not commercial in nature

Taggers have ownership

Taggers are more likely to tag their own stuff than your stuff

Tagging has been shown to work well on the Web

The unexpected development: terminological consensus

Collective action yields common terms

Stabilization may be caused by imitation and shared knowledge

The wisdom of the crowd

Is your tagging influenced by my tagging?

Of course it is!

People are beginning to tag in ways that make it easier for others to find similar stuff

Shared meaning consequently evolves for tags

The most-used tags become the most visible
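“Most used tags become most visible” is usually implemented as a tag cloud, where display size grows with frequency. A minimal sketch, assuming hypothetical counts and a linear mapping onto font sizes (sites may use logarithmic or other scales instead):

```python
def tag_cloud(counts, min_size=10, max_size=32):
    """Map each tag's count to a font size, linearly between min_size and max_size."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        tag: round(min_size + (count - lo) * (max_size - min_size) / span)
        for tag, count in counts.items()
    }


# Hypothetical tag frequencies.
counts = {"wine": 40, "france": 22, "napa": 4}
print(tag_cloud(counts))
```

Note that `tag_cloud` expects at least one tag; an empty dictionary would make `min` fail.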

Strengths of folksonomies

A cost-effective way to organize the Internet

Social benefits

Inclusiveness

They work well in many environments

Issues with meaning

They do not yield the level of clarity that controlled vocabularies do

Term ambiguity: words with multiple meanings

No synonym control

Issues with specificity

Variable specificity for related terms

Broadness of terms impacts precision: terms are often imprecise

Mixed perspectives

Issues with structure

Singular and plural forms create redundant headings

No guidelines for the use of compound headings, punctuation, word order

No scope notes

No cross references
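One programmatic way to collapse singular/plural and punctuation variants is to normalize tags before aggregating them. The rules below are deliberately naive and purely an assumption for illustration: lowercase, keep only ASCII letters and digits, join words with hyphens, and drop a trailing plural 's'. They will still mangle words such as 'news' or 'analysis', so a real system would need a stemmer or a curated exception list.

```python
import re
from collections import defaultdict


def normalize(tag):
    """Naive tag key: lowercase, strip punctuation, hyphen-join words, drop a plural 's'."""
    words = re.findall(r"[a-z0-9]+", tag.lower())
    words = [
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
    ]
    return "-".join(words)


def merge_variants(tags):
    """Group raw tags that share a normalized key."""
    groups = defaultdict(set)
    for t in tags:
        groups[normalize(t)].add(t)
    return {key: sorted(variants) for key, variants in groups.items()}


print(merge_variants(["Wines", "wine", "red_wine", "Red Wine", "glass"]))
```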

Issues with accuracy

Collective ‘wisdom’ of the tagging community

How does wrong information impact retrieval?

Conflicting cultural norms

Sometimes authority counts

“Spagging” and other problems

Opening doors to opinion tags

Tagging wars

“Spagging” (spam tagging)
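A common mitigation for one-person spam and opinion tags is to display only tags that several independent users have applied. A minimal sketch with a hypothetical threshold and data; it dampens individual spagging but does nothing against coordinated tagging wars:

```python
from collections import defaultdict


def visible_tags(assignments, min_users=2):
    """Keep (item, tag) pairs applied by at least `min_users` distinct users."""
    users = defaultdict(set)
    for user, item, tag in assignments:
        users[(item, tag.lower())].add(user)
    visible = defaultdict(list)
    for (item, tag), who in users.items():
        if len(who) >= min_users:
            visible[item].append(tag)
    return {item: sorted(tags) for item, tags in visible.items()}


# Hypothetical tagging activity, including one spammer repeating a tag.
assignments = [
    ("ann", "book-1", "wine"), ("ben", "book-1", "wine"),
    ("spammer", "book-1", "cheap-pills"), ("spammer", "book-1", "cheap-pills"),
    ("ann", "book-1", "france"), ("cat", "book-1", "france"),
]
print(visible_tags(assignments))
```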

Tidying up the tags…?

Lists of tagging norms have been developed

Are there programmatic solutions?

Users know they are looking at tags

By tidying, do we destroy the essence of why this works?

Do we realistically have the resources?

Recommendations

Don’t assume that one size fits all

Retain controlled vocabularies in the catalog

Explore ways to use controlled vocabularies to help organize the Internet by re-purposing controlled vocabularies that already exist

Invite folksonomies to the party in the catalog to gain their benefits

Explore ways to combine the two systems

Recommendations

When you invite folksonomies into the catalog, do so strategically and carefully

Don’t put terms in the same index as controlled vocabularies

Find ways to associate terms applied across editions of works

Need for mediation, or at least observation

The crowd is not necessarily the best arbiter of specific terminology
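A minimal sketch of the first two recommendations above, assuming each record carries a hypothetical work identifier shared by its editions (producing such identifiers would require a separate work-clustering step): user tags live in their own index, apart from the controlled subject index, and tags applied to any edition are pooled at the work level.

```python
from collections import Counter, defaultdict

# record ID -> hypothetical work ID and controlled headings (illustrative data)
records = {
    "ed-1998": {"work": "w1", "subjects": ["Wine and wine making"]},
    "ed-2005": {"work": "w1", "subjects": ["Wine and wine making"]},
}
# (record ID, tag) pairs contributed by users
user_tags = [("ed-1998", "wine"), ("ed-2005", "wine"), ("ed-2005", "tasting")]

# Controlled vocabulary index: heading -> record IDs.
subject_index = defaultdict(set)
for rec_id, rec in records.items():
    for heading in rec["subjects"]:
        subject_index[heading].add(rec_id)

# Separate tag index, keyed by work so that every edition shares its tags.
work_tags = defaultdict(Counter)
for rec_id, tag in user_tags:
    work_tags[records[rec_id]["work"]][tag] += 1


def tags_for(rec_id):
    """Tags for one record, pooled from all editions of the same work."""
    return dict(work_tags[records[rec_id]["work"]])


print(sorted(subject_index["Wine and wine making"]))
print(tags_for("ed-1998"))
```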

Recommendations

Always remember why people tag

People tag things because they want to find them, not because they want others to find them

Be aware that this will impact the quality of the terms, and their frequency

Recommendations

Controlled vocabularies could be better utilized than they currently are

Subject structures are underutilized in the ILS

Controlled vocabularies that exist are not being exported to the Web

Well-connected terms foster discovery, so let’s connect them

Index those cross references where available
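Indexing cross references lets a search on a variant or related term reach the authorized heading and, optionally, its narrower terms. A minimal sketch, assuming a hypothetical thesaurus using LCSH-style UF (used for) and NT (narrower term) references; the sample data is illustrative, not copied from LCSH.

```python
# authorized heading -> cross references (hypothetical sample)
thesaurus = {
    "Wine and wine making": {"UF": ["Winemaking", "Enology"], "NT": ["Sparkling wines"]},
    "Sparkling wines": {"UF": ["Champagne (Wine)"], "NT": []},
}


def expand(term, thesaurus, include_narrower=True):
    """Resolve a user's term to authorized headings, optionally adding narrower terms."""
    key = term.casefold()
    found = [
        heading for heading, refs in thesaurus.items()
        if heading.casefold() == key or key in (v.casefold() for v in refs["UF"])
    ]
    result = []
    queue = list(found)
    while queue:
        heading = queue.pop(0)
        if heading in result:
            continue  # guard against cycles in the reference structure
        result.append(heading)
        if include_narrower:
            queue.extend(thesaurus.get(heading, {}).get("NT", []))
    return result


print(expand("enology", thesaurus))
print(expand("enology", thesaurus, include_narrower=False))
```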

Questions?

Margaret Maurer

mbmaurer@kent.edu
