NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International Authority File (VIAF)
December 4, 2013
Speaker: Thomas Hickey, Chief Scientist, OCLC
http://www.niso.org/news/events/2013/dcmi/authority

DESCRIPTION

Libraries around the world have a long tradition of maintaining authority files to assure the consistent presentation and indexing of names. As library authority files have become available online, the authority data has become accessible, and many files have been published as Linked Open Data (LOD), but names in one library authority file typically had no link to corresponding records for persons and organizations in other library authority files. After a successful experiment in matching the Library of Congress/NACO authority file with the German National Library's authority file, an online system called the Virtual International Authority File was developed to facilitate sharing by ingesting, matching, and displaying the relations between records in multiple authority files. The Virtual International Authority File (VIAF) has grown from three source files in 2007 to more than two dozen files today. The system harvests authority records, enhances them with bibliographic information, and brings them together into clusters when it is confident the records describe the same identity. Although the most visible part of VIAF is an HTML interface, the API beneath it supports a linked data view of VIAF with URIs representing the identities themselves, not just URIs for the clusters. It supports names for persons, corporations, geographic entities, works, and expressions. With English, French, German, and Spanish interfaces (and a Japanese interface in progress), the system is used around the world, with over a million queries per day.

Speaker: Thomas Hickey is Chief Scientist at OCLC, where he helped found OCLC Research. His current interests include metadata creation and editing systems, authority control, parallel systems for bibliographic processing, and information retrieval and display. In addition to implementing VIAF, his group explores Web access to metadata, identification of FRBR works and expressions in WorldCat, the algorithmic creation of authorities, and the characterization of collections. He has an undergraduate degree in Physics and a Ph.D. in Library and Information Science.

Page 1: NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International Authority File (VIAF)

NISO/DCMI Webinar: Cooperative Authority Control:

The Virtual International Authority File (VIAF)

December 4, 2013

Speaker: Thomas Hickey, Chief Scientist, OCLC

http://www.niso.org/news/events/2013/dcmi/authority

Page 2

Thomas Hickey

Chief Scientist

2013 December 4

NISO/DCMI Webinar

Cooperative Authority Control: Virtual International Authority File (VIAF)

Page 3

Outline

• Background and Philosophy
• Visible VIAF
• Challenges
• New directions
• Relationship with other identifiers
• Coping with ambiguity

Page 4

Why do we like authorities?

1. To enable a person to find a book of which either
   (A) the author
   (B) the title
   (C) the subject
   is known.

2. To show what the library has
   (D) by a given author
   (E) on a given subject
   (F) in a given kind of literature

3. To assist in the choice of a book
   (G) as to its edition (bibliographically)
   (H) as to its character (literary or topical)

Charles A. Cutter: Rules for a printed dictionary catalog, 1876

Page 5

What do authority files control?

• Names!
 – Persons
 – Corporations
 – Places
 – Uniform Titles
 – Families
 – Trademarks
 – Concepts

Page 6

But we also control

• Collective authors
• Pseudonyms
• Imaginary characters
• Deities, saints, angels
• Whales, horses, dinosaurs
• Buildings
• Ships, telescopes, space ships, missiles
• Kings, Popes, Presidents
• Cities, lakes, mountains

Page 7

A changing world

• Libraries
 – Local library
 – Library consortia
 – National cooperation
 – Within languages
 – Global

• Technology
 – Handwritten
 – Typed
 – Printed
 – Online
 – Pervasive

EVERYBODY WANTS TO CHANGE THE WORLD BUT NOBODY WANTS TO CHANGE

Page 8

A world of linked data

http://www.w3.org/DesignIssues/diagrams/lod/2010-color.png

Page 9

Page 10

Challenges to libraries

• Reflect these links in our catalogs
 – RDA
• Link to external resources
• Have non-library resources link to us
 – Promote our links
• Be integrated in our users' workflow

Page 11

Library data is

• Trusted
• Understood
• Reasonably interoperable
• Complex

Within the community, linked data is of limited help

Page 12

Shareable metadata

• Public
• Simple
• Supply data rather than APIs
 – Avoid idiosyncratic protocols
  • Z39.50
  • MARC-21
  • ISO 2709


Page 13

Brief history of VIAF

1998: VIAF proof-of-concept project launched

2003: VIAF Consortium formed (Berlin)

2007: BnF joins

2011: After considering multiple options, consensus to transition VIAF to an OCLC service

2012: VIAF becomes an OCLC service; VIAF Council holds 1st meeting (Helsinki)

Founding partners: Library of Congress, Die Deutsche Bibliothek, OCLC Research

4 Principals + 18 Contributors in 18 countries

Page 14

VIAF’s Goals

• Reduce cost of authority control
• Increase the utility of library authority files
• Provide links between equivalent names
• Make the information Web friendly
 – Open API
 – Bulk downloads
 – Open Linked Data

Page 15

Applications

• FRBR matching
• Better matching of non-English metadata
• Uniform identifier across all languages
• Authority control for cataloging
• Better regionalization of catalogs
• Minimize differences across languages of cataloging
• More intelligent linking and searching

Page 16

Page 17

VIAF authority record counts

• Personal: 26,400,000
• Corporate: 5,100,000
• Geographic: 400,000
• Uniform Titles: 1,800,000

Page 18

Web interface and usage

Page 19

Page 20

Page 21

Page 22

VIAF Use

Page 23

Usage

• Browser usage for past year
 – 953,020 visitors
 – 1,531,493 visits
 – 5,448,910 pages
• API usage
 – Went from 90% of usage to 98%
 – Peaks at ~20/second
 – ~5 million searches/week
• Downloads
 – ~150/week for links, ~150/week for clusters

Page 24

Page 25

Page 26

Building VIAF

Page 27

Enhancing authorities

Diagram: Bibliographic Record → Derived Authority; Authority Record + Derived Authority → Processed Authority
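The enhancement flow on this slide can be illustrated with a minimal Python sketch. It is not VIAF's code: the record shapes (`BibRecord`, `Authority`) are hypothetical, and it assumes a bib record belongs to an authority when its author heading matches the preferred form exactly, which real data rarely allows.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BibRecord:
    """Hypothetical, simplified bibliographic record."""
    author_heading: str
    title: str
    coauthors: List[str] = field(default_factory=list)


@dataclass
class Authority:
    """Hypothetical, simplified authority record."""
    heading: str
    titles: Set[str] = field(default_factory=set)
    coauthors: Set[str] = field(default_factory=set)


def derive_authority(heading: str, bibs: List[BibRecord]) -> Authority:
    """Build a 'derived authority' from the bib records that use this heading."""
    derived = Authority(heading)
    for bib in bibs:
        if bib.author_heading == heading:  # exact-match assumption
            derived.titles.add(bib.title)
            derived.coauthors.update(c for c in bib.coauthors if c != heading)
    return derived


def process_authority(auth: Authority, bibs: List[BibRecord]) -> Authority:
    """Combine a source authority record with its derived authority.

    Returns a new record; the input authority is left unchanged.
    """
    derived = derive_authority(auth.heading, bibs)
    return Authority(
        heading=auth.heading,
        titles=auth.titles | derived.titles,
        coauthors=auth.coauthors | derived.coauthors,
    )
```

Keeping the derived authority separate from the source record makes it easy to tell which facts came from the authority file and which were inferred from bibliographic data.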

Page 28

Record Flow

• 37 million authority records
• 30 million links between authorities

Diagram: SWNL Bib & Authority, BnF Bib & Authority, and LC Bib & Authority → VIAF

Page 29

Machine access to VIAF

Page 30

Background

• VIAF is available in bulk downloads
• All online interaction with VIAF is RESTful
• Using SRU
 – http://www.loc.gov/standards/sru/
 – http://www.oclc.org/developer/documentation/virtual-international-authority-file-viaf/using-api

Page 31

Bulk downloads

• Go to http://viaf.org/viaf/data
• Variety of formats
 – Just links
 – RDF (XML and N-Triples)
 – MARC-21
 – Native XML clusters
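A minimal sketch of scanning the RDF download in its N-Triples form, assuming a gzip-compressed file with one triple per line. The compression, the file name, and the predicate to collect are assumptions to check against the actual dump; the parser only handles triples whose object is an IRI and skips everything else.

```python
import gzip
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, TextIO, Tuple

# A simple N-Triples statement whose subject, predicate and object are all IRIs:
#   <http://s> <http://p> <http://o> .
# Lines with literal or blank-node objects do not match and are skipped.
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def iri_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) for IRI-only triples."""
    for line in lines:
        m = IRI_TRIPLE.match(line)
        if m:
            yield m.group(1), m.group(2), m.group(3)


def links_by_subject(lines: Iterable[str], predicate: str) -> Dict[str, Set[str]]:
    """Group the objects of one predicate by subject (e.g. cluster -> linked URIs)."""
    links: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in iri_triples(lines):
        if p == predicate:
            links[s].add(o)
    return dict(links)


def open_dump(path: str) -> TextIO:
    """Open a downloaded dump as text; gzip compression is an assumption."""
    return gzip.open(path, "rt", encoding="utf-8")
```

Typical use would be `with open_dump(path) as f: links = links_by_subject(f, predicate)`, with `predicate` set to whatever equivalence property the dump actually uses (`owl:sameAs` is only a guess). Streaming line by line keeps memory use independent of the dump size, apart from the collected links.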

Page 33

SRU Tricks

• RSS feed: http://viaf.org/viaf/search?query=dempsey&http:accept=application/rss%2bxml
• Exact with truncation: http://viaf.org/viaf/search?query=local.names+exact+%22cervantes*%22&sortKeys=holdingscount
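The two SRU tricks above are ordinary query strings, so they can be generated rather than hand-encoded. A minimal sketch using Python's standard library; the parameter names (`query`, `http:accept`, `sortKeys`) and the `viaf.org/viaf/search` endpoint are copied from the 2013 slide and may no longer match the live service.

```python
from typing import Mapping
from urllib.parse import urlencode
from urllib.request import urlopen

VIAF_SEARCH = "http://viaf.org/viaf/search"


def viaf_search_url(params: Mapping[str, str]) -> str:
    """Build an SRU search URL in the style of the examples on this slide.

    ':' (as in 'http:accept'), '/' and the '*' truncation character stay literal;
    spaces become '+', and characters such as '"' and '+' are percent-encoded.
    """
    return VIAF_SEARCH + "?" + urlencode(params, safe=":/*")


def fetch(url: str, timeout: float = 30.0) -> str:
    """Retrieve a response body as text (network access required)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

`viaf_search_url({"query": "dempsey", "http:accept": "application/rss+xml"})` reproduces the RSS-feed URL, with `%2B` in place of the equivalent `%2b`. Other SRU parameters such as `maximumRecords` and `startRecord` can be passed the same way.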

Page 34

http://viaf.org/viaf/search

Page 35

URL Patterns

• http://viaf.org/viaf/95216565
• http://viaf.org/viaf/sourceID/BNF%7C11926133
• http://viaf.org/viaf/sourceID/LC%7Cn++79130807
• http://viaf.org/viaf/95216565/viaf.xml
• http://viaf.org/viaf/95216565/justlinks.json
• http://viaf.org/viaf/95216565/marc21.xml
• http://viaf.org/viaf/95216565/rdf.xml
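The URL patterns follow a simple scheme, sketched below. The helper names are hypothetical; only the path shapes and the four format suffixes come from the slide.

```python
from typing import Optional
from urllib.parse import quote_plus

VIAF_BASE = "http://viaf.org/viaf"

# Per-cluster representations shown on the slide.
FORMATS = ("viaf.xml", "justlinks.json", "marc21.xml", "rdf.xml")


def cluster_url(viaf_id: str, fmt: Optional[str] = None) -> str:
    """URI of a VIAF cluster, optionally for one of the listed representations."""
    if fmt is None:
        return f"{VIAF_BASE}/{viaf_id}"
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    return f"{VIAF_BASE}/{viaf_id}/{fmt}"


def source_id_url(source: str, local_id: str) -> str:
    """Look up a cluster by a source record ID, encoded as 'SOURCE|local id'.

    quote_plus turns '|' into %7C and each space into '+', so the LC
    identifier 'n  79130807' (two spaces) becomes 'n++79130807'.
    """
    return f"{VIAF_BASE}/sourceID/{quote_plus(source + '|' + local_id)}"
```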

Page 36

New Directions for VIAF

• Non-library sources
• Information from WorldCat
• Integration with WorldCat

Page 37

VIAFbot – The Wikipedia Connection

Image: http://www.flickr.com/photos/vintagehalloweencollector/480856825/

• OCLC Wikipedian in Residence Max Klein
• Automatic comparison of VIAF and Wikipedia references
• Initially English, then German
• Now working with Wikidata

Page 38

Wikidata

Page 39

Wikidata

Page 40

Wikidata

Page 41

Wikidata

Page 42

VIAF↔Wikidata Linking Benefits

• VIAF enhancing Wikipedia language coverage
 – 14,000+ new labels/aliases added

Page 43

VIAF – in the Web of Bibliographic Data

Diagram:
• worldcat.org/oclc/81453459 (The Hidden Face of Eve): author http://viaf.org/viaf/84254254/ (Nawal El Saadawi); about http://id.loc.gov/authorities/subjects/sh85120576 (Sex customs)
• sameAs links join the VIAF identity with http://www.wikidata.org/wiki/Q238514 and http://isni-url.oclc.nl/isni/0000000120296695 (Nawal El Saadawi)

Page 44

Other non-library sources

• ISNI – International Standard Name Identifier
• Perseus Digital Library
• Syriac project names
• Fihrist Arabic names

Page 45

Information from WorldCat

Page 46

Multilingual Bibliographic Structure Project
• Majority of WorldCat is about non-English works
• Much of the metadata is non-English
 – Hybrid records
 – Parallel records
• FRBR work-level algorithm plus GLIMIR manifestation/expression level
 – Identify 3 levels of FRBR
• Can’t we do something with these?

Page 47

Approach

• Process at the work level when possible
• Extract the most reliable information
• Use that to extract less reliable information
• Find
 – Languages, original language
 – Translators
 – Titles (by language)

Page 48

Benefits

• Localize metadata to various languages
 – Easier cataloging
 – Better cataloging
  • Merge
  • Fix
 – Better displays to fit the user
  • Linking of translations
  • Appropriate language
  • Use all appropriate data!
  • Better FRBR groupings

Page 49

Records for VIAF

• Translated works
 – Work and expression records
 – More information about
  • Languages
  • Translators
 – Better links between work/expression records

Page 50

Other possibilities

• Variant forms of names
• More titles
• Coauthors
• FAST subject headings

Page 51

Identifier relationships

Page 52

ISNI: International Standard Name Identifier

• Draft ISO standard: … aspires to provide a means to uniquely identify creators, including authors, composers, artists, cartographers and performers, among others. Such an authoritative identifier will serve to provide a link for occurrences of the identity across databases on the web
• Driven by rights-holders
 – Publishers
 – Rights agencies representing authors, artists
• Active disambiguation program

Page 53

• Started with Thomson Reuters’ ResearcherID
• Most ‘social’
 – Claiming IDs
 – Interactive verification of associated works
• Pulling together several current initiatives
• Driven by STM, university communities
• Primarily interested in researchers
• Large number of participants
• Mostly concerned with present and future names

Page 54

Cooperation Challenges

• What data can be shared?
• How to fund the efforts?
• Established by different types of institutions:
 – Libraries, Standards Organization, STM Publishers
• Different
 – Technologies
 – Time scales
• What does the name represent?
 – People, personas, organizations
• Who is in charge?

Page 55

Commonalities

• All centered in not-for-profits
• All interested in data exchange
• All interested in global systems
• All have an understanding of the problem
 – Personal author disambiguation and identification
 – Central to their operations

Page 56

Coping with Ambiguity

1,520 headings found for smith, john

Page 57

The problem

• Two names in a single source for the same identity
• Mixed identities
• Different granularity
 – Pseudonyms
 – Presidents, Kings
• Chains of matches

VIAF has ~½ million ambiguous groups

Page 58

Goal

• 99+% sure of pair-wise assertions
 – Includes all pairs of records in resulting clusters
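Requiring every pair in a cluster to meet the 99+% bar is demanding because the number of pair-wise assertions grows quadratically with cluster size:

```latex
\[
  \text{pairs}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
  \qquad \text{e.g.}\quad \text{pairs}(6) = \frac{6 \cdot 5}{2} = 15 .
\]
```

So a cluster of just 6 source records already carries 15 assertions, each of which has to meet the goal.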

Page 59

Another common issue

Page 60

Harvest and ingest

• Coping with
 – Duplicate identifiers
 – Deletes
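One way to cope with duplicate identifiers and deletes during ingest is sketched here, assuming each harvested record carries an ID, a timestamp and a deletion flag. The `Harvested` shape and the "newest version wins" rule are assumptions, not VIAF's documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Harvested:
    """Hypothetical shape of one harvested source record."""
    record_id: str
    timestamp: str  # ISO 8601 in one fixed format, so text order = time order
    deleted: bool = False
    payload: str = ""


def ingest(store: Dict[str, Harvested], batch: Iterable[Harvested]) -> Dict[str, Harvested]:
    """Apply a harvested batch to a store keyed by source record ID.

    - Duplicate identifiers in the batch: only the newest version is kept
      (on equal timestamps, the later one in the batch wins).
    - Deletes: the identifier is removed from the store.
    Returns a new store; the input is not modified.
    """
    latest: Dict[str, Harvested] = {}
    for rec in batch:
        prev = latest.get(rec.record_id)
        if prev is None or rec.timestamp >= prev.timestamp:
            latest[rec.record_id] = rec
    updated = dict(store)
    for rid, rec in latest.items():
        if rec.deleted:
            updated.pop(rid, None)
        else:
            updated[rid] = rec
    return updated
```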

Page 61

Matching Authorities to Bibs

• Sometimes an identifier
• Often ambiguity with just names
 – Multiple possibilities
 – May mix identities

Page 62

• Cross references within sources
 – Strings can be ambiguous
 – Links not necessarily resolvable

Page 63

Enhance the authority records

• Pull information from bibs, authority notes
• Cope with
 – Mistagged fields
 – Ambiguous dates
 – Errors in pulling titles, etc.
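Ambiguous dates are one of the problems listed on this slide. A minimal sketch of pulling birth and death years out of a name heading while flagging doubtful forms; the patterns it recognizes ('1547-1616', '1950-', 'b. 1950', 'd. 1616', '?', 'ca.', 'fl.') are a small, assumed subset of real heading conventions.

```python
import re
from typing import NamedTuple, Optional


class Dates(NamedTuple):
    birth: Optional[int]
    death: Optional[int]
    uncertain: bool


UNCERTAIN = re.compile(r"\?|\bca\.|\bfl\.")
BORN = re.compile(r"\bb\.\s*(\d{3,4})")
DIED = re.compile(r"\bd\.\s*(\d{3,4})")
RANGE = re.compile(r"(\d{3,4})?\??\s*-\s*(\d{3,4})?")
YEAR = re.compile(r"\d{3,4}")


def parse_dates(heading: str) -> Dates:
    """Extract birth/death years from a name heading, flagging doubtful ones."""
    uncertain = bool(UNCERTAIN.search(heading))
    m = BORN.search(heading)
    if m:
        return Dates(int(m.group(1)), None, uncertain)
    m = DIED.search(heading)
    if m:
        return Dates(None, int(m.group(1)), uncertain)
    # Skip hyphens without digits, e.g. in 'Saint-Exupéry'.
    for m in RANGE.finditer(heading):
        if m.group(1) or m.group(2):
            birth = int(m.group(1)) if m.group(1) else None
            death = int(m.group(2)) if m.group(2) else None
            return Dates(birth, death, uncertain)
    # A lone year (e.g. a 'fl.' date) cannot be assigned to birth or death.
    return Dates(None, None, uncertain or bool(YEAR.search(heading)))
```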

Page 64

Pair-wise matching between sources

• Two dozen types of matches
 – Ranked by reliability/strength
• Major problems
 – Missing information
 – Mixed identities
• Can override the matching
 – xA
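Ranking match types by strength, plus a manual override, might look like the sketch below. The match-type names and their order are invented for illustration (the slides only say there are about two dozen), and the override table is an assumed stand-in for the `xA` mechanism, which the slides do not describe.

```python
from typing import AbstractSet, Dict, FrozenSet, Optional

# Hypothetical match types, strongest first; names are illustrative only.
RANKED_MATCH_TYPES = [
    "name + birth and death dates",
    "name + shared title",
    "name + coauthor",
    "name + single date",
]
STRENGTH = {t: len(RANKED_MATCH_TYPES) - i for i, t in enumerate(RANKED_MATCH_TYPES)}

Pair = FrozenSet[str]


def best_match(
    evidence: AbstractSet[str],
    rec_a: str,
    rec_b: str,
    overrides: Optional[Dict[Pair, Optional[str]]] = None,
) -> Optional[str]:
    """Return the strongest match type supported by the evidence for a pair.

    A manual override for the pair wins outright; an override of None
    forbids the match.
    """
    key = frozenset((rec_a, rec_b))
    if overrides and key in overrides:
        return overrides[key]
    known = [t for t in evidence if t in STRENGTH]
    return max(known, key=STRENGTH.__getitem__, default=None)
```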

Page 65

Duplicates within sources

• Rely primarily on
 – String similarity
 – Complexity of the preferred form
• Also look for multiple links from other sources
• Lonely names
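A sketch of within-source duplicate detection combining the two primary signals on this slide (string similarity and complexity of the preferred form) with the "multiple links from other sources" signal. It uses `difflib.SequenceMatcher` as the similarity measure and token count as the complexity proxy; both choices and both thresholds are assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, Set


def normalize(name: str) -> str:
    """Lowercase and keep only word characters, single-space separated."""
    return " ".join(re.findall(r"\w+", name.lower()))


def complexity(name: str) -> int:
    """Crude proxy for how distinctive a heading is: its token count (dates count)."""
    return len(normalize(name).split())


def likely_duplicates(a: str, b: str, min_ratio: float = 0.9, min_complexity: int = 4) -> bool:
    """Flag two headings from the same source as probable duplicates.

    Too-simple headings (e.g. 'Smith, John') are never flagged on string
    similarity alone.
    """
    if min(complexity(a), complexity(b)) < min_complexity:
        return False
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= min_ratio


def linked_from_same_record(a: str, b: str, inbound: Dict[str, Set[str]]) -> bool:
    """Extra signal: some record in another source links to both a and b."""
    return bool(inbound.get(a, set()) & inbound.get(b, set()))
```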

Page 66

Pulling together groups

• Only keep strongest links between records in different sources
 – A record in source A may match several records in source B
 – E.g. keep a double-date match over a coauthor match
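Keeping only the strongest links can be sketched as a filter over scored links. The integer strengths are hypothetical (say, 5 for a double-date match and 2 for a coauthor match), and the rule here, keeping a link only if it is the strongest for both of its records, is one plausible reading of the slide.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, int]  # (record in source A, record in source B, strength)


def strongest_links(links: List[Link]) -> List[Link]:
    """Keep a link only if it is the strongest link of both of its records.

    Ties are kept, so a record with two equally strong candidates keeps both
    (the ambiguity is left for the later clustering steps).
    """
    best_a: Dict[str, int] = {}
    best_b: Dict[str, int] = {}
    for a, b, s in links:
        best_a[a] = max(s, best_a.get(a, s))
        best_b[b] = max(s, best_b.get(b, s))
    return [(a, b, s) for a, b, s in links if s == best_a[a] and s == best_b[b]]
```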

Page 67

Generate coherent clusters

• Look for cliques
• Merge subgraphs
 – Strength of the best link between the pair
 – Number of links between the pair
 – A metric based on
  • Strength of the match
  • Title closeness
  • Node type (corporate, personal, etc.)
  • Name closeness
 – Whether the nodes are personal names or not
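A simplified stand-in for the clique search and subgraph merging on this slide: a greedy, Kruskal-style agglomeration that processes links from strongest to weakest and merges two clusters only when every cross pair passes a compatibility check. It orders merges by best-link strength alone and ignores the link counts, closeness metric, and node type listed above; the `compatible` predicate is supplied by the caller.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def build_clusters(
    records: Iterable[str],
    links: Iterable[Tuple[str, str, int]],
    compatible: Callable[[str, str], bool],
) -> List[Set[str]]:
    """Greedily merge records into clusters, strongest links first.

    Two clusters are merged only if every cross pair of records is
    compatible, so each pair in a final cluster has passed the check.
    """
    members: Dict[str, Set[str]] = {r: {r} for r in records}  # keyed by representative
    rep: Dict[str, str] = {r: r for r in members}
    for a, b, _strength in sorted(links, key=lambda link: link[2], reverse=True):
        ra, rb = rep[a], rep[b]
        if ra == rb:
            continue
        if not all(compatible(x, y) for x in members[ra] for y in members[rb]):
            continue
        if len(members[ra]) < len(members[rb]):
            ra, rb = rb, ra  # fold the smaller cluster into the larger one
        for r in members[rb]:
            rep[r] = ra
        members[ra] |= members.pop(rb)
    return list(members.values())
```

Checking every cross pair before a merge is what keeps the "all pairs" goal from the earlier slide intact, at the cost of quadratic work per merge.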

Page 68

Coherent clusters

• Avoid
 – Date conflicts
 – Incompatible names
 – Names that are cross references to each other
 – Names that differ only in a number
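Checks for three of the conditions to avoid can be sketched as simple predicates over heading strings. The year and Roman-numeral patterns are assumptions, cross references are passed in as a set of pairs, and "incompatible names" is not modelled because the slide does not say what makes names incompatible.

```python
import re
from typing import AbstractSet, FrozenSet, Set, Tuple

YEAR = re.compile(r"\b\d{3,4}\b")
ROMAN = re.compile(r"^[IVXLCDM]+$")  # uppercase Roman numerals, as in 'Louis XIV'


def years(name: str) -> Set[int]:
    return {int(y) for y in YEAR.findall(name)}


def date_conflict(a: str, b: str) -> bool:
    """Both headings carry dates, but they share no year."""
    ya, yb = years(a), years(b)
    return bool(ya and yb and not (ya & yb))


def _words_and_numbers(name: str) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    tokens = re.findall(r"\w+", name)
    is_num = [t.isdigit() or bool(ROMAN.match(t)) for t in tokens]
    words = tuple(t.lower() for t, n in zip(tokens, is_num) if not n)
    nums = tuple(t for t, n in zip(tokens, is_num) if n)
    return words, nums


def differ_only_in_number(a: str, b: str) -> bool:
    """Same words, different numbers: e.g. 'Louis XIV' vs 'Louis XV'."""
    wa, na = _words_and_numbers(a)
    wb, nb = _words_and_numbers(b)
    return wa == wb and na != nb


def coherent(a: str, b: str, cross_refs: AbstractSet[FrozenSet[str]] = frozenset()) -> bool:
    """True if none of the 'avoid' conditions modelled here applies."""
    return not (
        date_conflict(a, b)
        or differ_only_in_number(a, b)
        or frozenset((a, b)) in cross_refs
    )
```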

Page 69

Assign VIAF IDs

• Minimize moves of source records
• Redirect unused VIAF IDs if possible
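Assigning IDs so that source records move as little as possible can be sketched as a voting scheme: each new cluster reuses the old VIAF ID held by most of its records, and old IDs that end up unused are redirected. The voting rule, the largest-first order, and `fresh_ids` are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import count
from typing import Dict, Iterator, List, Optional, Set, Tuple


def fresh_ids(start: int) -> Iterator[str]:
    """Hypothetical ID minting: consecutive integers as strings."""
    return (str(n) for n in count(start))


def assign_ids(
    clusters: List[Set[str]],
    old_id_of: Dict[str, str],
    new_ids: Iterator[str],
) -> Tuple[List[str], Dict[str, str]]:
    """Give each new cluster a VIAF ID, reusing old IDs to minimize moves.

    Larger clusters choose first; each takes the unclaimed old ID held by
    most of its records, or a fresh ID if none is available. Old IDs left
    unused are redirected to the cluster that received most of their records.
    """
    ids: List[Optional[str]] = [None] * len(clusters)
    taken: Set[str] = set()
    for i in sorted(range(len(clusters)), key=lambda i: -len(clusters[i])):
        votes = Counter(old_id_of[r] for r in clusters[i] if r in old_id_of)
        for old, _n in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])):
            if old not in taken:
                ids[i] = old
                break
        if ids[i] is None:
            ids[i] = next(new_ids)
        taken.add(ids[i])
    moved: Dict[str, Counter] = defaultdict(Counter)
    for i, cluster in enumerate(clusters):
        for r in cluster:
            old = old_id_of.get(r)
            if old is not None and old not in taken:
                moved[old][ids[i]] += 1
    redirects = {old: c.most_common(1)[0][0] for old, c in moved.items()}
    return ids, redirects
```

When a cluster splits, the larger part keeps the old ID and the smaller part gets a new one; when two clusters merge, the losing ID becomes a redirect rather than a dead link.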

Page 70

Create links between clusters

• Cross references
• Uniform titles
• Coauthors
• Other bibliographic titles

In general, link only if there is no ambiguity
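Linking clusters only when a reference is unambiguous can be sketched as resolving each reference string (a cross reference, uniform title, or coauthor) against an index of headings and keeping the link only when exactly one cluster matches. The normalization and data shapes are assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def normalize(s: str) -> str:
    return " ".join(re.findall(r"\w+", s.lower()))


def heading_index(clusters: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalized heading to the IDs of all clusters that use it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for cid, headings in clusters.items():
        for h in headings:
            index[normalize(h)].add(cid)
    return index


def resolve(ref: str, index: Dict[str, Set[str]]) -> Optional[str]:
    """Return the single cluster a reference points to, or None if ambiguous/unknown."""
    hits = index.get(normalize(ref), set())
    return next(iter(hits)) if len(hits) == 1 else None


def cluster_links(clusters: Dict[str, List[str]], refs: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Link clusters via reference strings, but only when the target is unambiguous."""
    index = heading_index(clusters)
    links: Set[Tuple[str, str]] = set()
    for src, strings in refs.items():
        for s in strings:
            target = resolve(s, index)
            if target is not None and target != src:
                links.add((src, target))
    return links
```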

Page 71

Lonely names

Page 72

©2013 OCLC. This work is licensed under a Creative Commons Attribution 3.0 Unported License. Suggested attribution: “This work uses content from [presentation title] © OCLC, used under a Creative Commons Attribution license: http://creativecommons.org/licenses/by/3.0/”

Thank You!


Page 73

NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International Authority File (VIAF)

NISO/DCMI Webinar • December 4, 2013

Questions? All questions will be posted with presenter answers on the NISO website following the webinar:

http://www.niso.org/news/events/2013/dcmi/authority

Page 74

Thank you for joining us today. Please take a moment to fill out the brief online survey.

We look forward to hearing from you!

THANK YOU