The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley Metadata Librarian Metadata Working Group February 22, 2008


Page 1: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

The OpenURL Quality Problem & Project

Adam Chandler
Coordinator, Service Design Group

Glen Wiley
Metadata Librarian

Metadata Working Group

February 22, 2008

Page 2: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

The Original Problem

• Reduce linking dead ends from a publisher’s content to another

• Show multiple subscriptions or relevant access points in one place

• Desire to show the most appropriate version of the service (like full text)

• Improve content visibility

• Possibly reduce document delivery costs

Page 3: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Brief History of OpenURL

• Originated by Herbert Van de Sompel at Univ. of Ghent, around 2000
  – Became OpenURL Version 0.1

• Commercialized by ExLibris (SFX) in 2001

• Fast-tracked by NISO
  – Released as Version 1.0, officially ANSI/NISO standard Z39.88, in 2004

• OCLC is maintenance agency as of June 2006

Page 4: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

What is OpenURL?

• OpenURL is a syntax for querying a server

• to perform a service

• on a resource
  – specified by attributes

• sensitive to context
  – also specified by attributes

OpenURL is an "actionable" URL that transports resource metadata.

Page 5: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

OpenURL Version 0.1 Example

http://linkresolver.library.cornell.edu:4550/resserv?genre=article&issn=01604120&title=Environment+International&volume=32&issue=1&date=20060101&atitle=The+United+States+Department+of+Energy's+Regional+Carbon+Sequestration+Partnerships+program.&spage=128&pages=128-144&sid=EBSCO:aph&aulast=Litynski
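
The example above can be unpacked into its key-value pairs with a few lines of Python. This is only a minimal sketch using the standard library; the variable names are illustrative and are not part of any link resolver's API.

```python
from urllib.parse import urlsplit, parse_qs

# The OpenURL 0.1 example from the slide above.
openurl = (
    "http://linkresolver.library.cornell.edu:4550/resserv?"
    "genre=article&issn=01604120&title=Environment+International"
    "&volume=32&issue=1&date=20060101"
    "&atitle=The+United+States+Department+of+Energy's+Regional+Carbon"
    "+Sequestration+Partnerships+program.&spage=128&pages=128-144"
    "&sid=EBSCO:aph&aulast=Litynski"
)

parts = urlsplit(openurl)

# Everything before the "?" identifies the link resolver.
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

# parse_qs returns a list per key; each key occurs once here, so keep the first value.
metadata = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(base_url)
for key, value in metadata.items():
    print(f"{key:8} {value}")
```

The split makes the two halves of an OpenURL visible: the base URL names the resolver, and the query string carries the citation metadata.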

Page 6: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

OpenURL Version 1.0 Example

http://linkresolver.library.cornell.edu:4550/resserv?url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=giordanino,+m&rft.epage=377&rft.stitle=knowl+eng+rev&rft.date=2007&rft_id=info:doi/10.1017%2fs0269888907001233/&url_ver=z39.88-2004&rft.issn=0269-8889&rft.aulast=uren&rft.title=knowledge+engineering+review&rft.genre=article&rft.issue=4&rft.spage=361&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.volume=22&rft.auinit=v&rft.atitle=the+usability+of+semantic+search+tools%3a+a+review

Page 7: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

How does it work?

Page 8: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

OpenURL Version 0.1

• OpenURL 0.1 is a de facto standard that is built around scholarly bibliographic data only

• An accepted “standard” syntax for creating a link between an information source and a link resolver

• Pre-defines sets of data elements to use in describing an “item”

• Relies on HTTP protocol for transmission

• The concept of context-sensitive linking implemented for a specific class of resources: (some) scholarly assets

Page 9: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

OpenURL Version 0.1

Limitations:

• Pre-defined metadata genres and elements means that new ones cannot be defined to meet emerging needs (e.g., for image databases)

• Only provides for key-value pair (HTTP GET or POST) representation of metadata.

• OpenURL 0.1 is tied to HTTP transport

• Lack of implementation guidelines means that support for OpenURL is loosely defined

Page 10: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

OpenURL Version 1.0

– Complicated and highly abstract
– Designed for greater flexibility
– Slower uptake
– Supports richer data formats/genres
  • Journal, Article, Proceeding, Preprint, Book, Report, Document, Patent, Dissertation, etc.
– Provides more complete context description
– Supports transport mechanisms other than HTTP
  • like SOAP, OAI-PMH, HTTPS
– A generic specification for implementing OpenURL Applications
  • OpenURL Applications: networked applications that implement the concept of context-sensitive services for a certain class of resources

Page 11: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Understanding OpenURL Version 1.0

[Diagram labels: Resolver; Referent; reference; about; services pertaining to Referent; networked resource; Transport; description of Referent & context; ContextObject]

Diagram is from Herbert Van de Sompel’s OpenURL Tutorial at the Olybris 2005 Ex Libris Seminar, Kos, Greece, April 18th 2005.

Page 12: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Understanding OpenURL Version 1.0

• OpenURL 1.0 divides ContextObject into six entities (including the resource)
  – Each entity has attributes to identify it
  – Each entity has schema for those attributes

• Each entity affects URL resolution
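
As a rough illustration of the entity model, the sketch below sorts the keys of a Version 1.0 query string by the entity they describe. The entity names and three-letter key prefixes are taken from the Z39.88-2004 KEV format rather than from these slides, the query string is a shortened form of the Version 1.0 example shown earlier, and the function name and the "Administrative" bucket are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl

# KEV key prefixes for the six ContextObject entities (per Z39.88-2004).
ENTITY_PREFIXES = {
    "rft": "Referent",
    "rfe": "ReferringEntity",
    "req": "Requester",
    "svc": "ServiceType",
    "res": "Resolver",
    "rfr": "Referrer",
}

# Shortened version of the OpenURL 1.0 example query string.
query = (
    "url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rfr_id=info:sid/www.isinet.com:wok:wos"
    "&rft.au=giordanino,+m&rft.epage=377&rft.date=2007"
    "&url_ver=z39.88-2004&rft.issn=0269-8889&rft.aulast=uren"
    "&rft.genre=article&rft.issue=4&rft.spage=361"
    "&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.volume=22"
)


def group_by_entity(query_string):
    """Group KEV pairs under the entity named by their key prefix."""
    groups = defaultdict(list)
    for key, value in parse_qsl(query_string):
        prefix = key[:3]
        # Entity keys look like "rft.issn" or "rft_id"; anything else
        # (url_ver, url_ctx_fmt, ...) goes into a catch-all bucket.
        if prefix in ENTITY_PREFIXES and key[3:4] in (".", "_"):
            groups[ENTITY_PREFIXES[prefix]].append((key, value))
        else:
            groups["Administrative"].append((key, value))
    return dict(groups)


grouped = group_by_entity(query)
for entity, pairs in grouped.items():
    print(entity)
    for key, value in pairs:
        print(f"  {key} = {value}")
```

In this example only two of the six entities appear, the Referent (the article) and the Referrer (the database that generated the link), which is typical of the links discussed in the rest of this presentation.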

Page 13: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Problems with the Standard & Documentation

• Tough read

• Key/Encoded-Value (KEV) “Implementation Guidelines” are helpful, but complex

• Not specific enough in many ways. Some mention of best practices for metadata values like:
  – UTF-8 encoding for special characters
  – DCMI Type Vocabulary for Referent Type (rft.type)
  – MIME type for Referent Format (rft.format)

Page 14: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Miriam Blake citation and the Known Issues

• M.E. Blake, F.L. Knudson. Metadata and Reference Linking. Libr. Coll. Acq. & Tech. Serv. 26 (2002) 219–230 229

• Goals for the future:
  – Increased consistency in metadata within a single database and across databases.
  – Increased communication between primary publishers and secondary publishers.
  – Increased awareness of bibliographic/citation standards by authors.
  – Increased outreach by librarians to authors emphasizing and promoting the importance of citation standards for electronic document retrieval.

Page 15: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Link Resolvers and the Serials Supply Chain

[UKSG Report] -- 2007

• Description of the Supply Chain

• Issues and Barriers
  – Lack of awareness
  – Lack of Co-operation
  – Inaccurate/Incomplete Data
  – Content Package Issues
  – Responsibility of Data Quality
  – Lack of Data Standards
  – Inbound Linking Issues
  – Etc…

• Recommendations

Page 16: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Problems Persist

1. Wrong start or end date in the local library's holdings database

2. Wrong link-to syntax in link resolver

3. Inaccurate or missing Crossref DOI URL (often the DOI registration process is out of sync with the mounting of articles)

Page 17: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Problems Persist

4. Semantically inaccurate metadata from the OpenURL origin (wrong ISSN, for example)

Page 18: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Problems Persist

5. Syntactically incorrect metadata from the OpenURL origin

Page 19: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Problems Persist

6. Subscription and embargo errors (especially in January)

– With each month that passes, the chance of the link working increases by over 8%

Page 20: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Characteristics of a solution to the OpenURL quality problem

• empirical
• network-level problem, so it needs to be solved at the network level
• sanctioned, officially recognized
• offer value to librarians and content providers
• narrow scope

Page 21: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Model: Open Language Archives Community

Metadata Quality Evaluation: Experience from the Open Language Archives Community, Baden Hughes, Department of Computer Science and Software Engineering, University of Melbourne

Abstract. We describe the motivation, design and implementation of an infrastructure to support metadata quality assessment within a specialised Open Archives Initiative (OAI) sub-domain, the Open Language Archives Community (OLAC). While services for structural validation of metadata are widely used, there is little corresponding work regarding services which evaluate the semantic and syntactic content of metadata from a qualitative perspective. We posit that any measure of metadata quality benefits from both contextual and referential assessment - metadata on a per record and per collection basis is legitimately assessed against the baseline of broader community practice, as well as for compliance to any external standard. In this paper we describe the implementation of a metadata quality assessment scheme, and the corresponding interfaces to the evaluation tool.

http://eprints.infodiv.unimelb.edu.au/archive/00001408/01/ICADL2004-PUBLISHED.pdf

Page 22: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Model: Open Language Archives Community

Metrics

• code existence score, 0-1 (bonus for using controlled vocabulary)
• element absence penalty, 0-1 (penalty for missing core elements)
• per metadata record weighted aggregate, max 10
• archive level derivative metrics
  – archive diversity metric (use of controlled vocabulary across the archive)
  – metadata quality score metric (derived from individual scores)
  – core elements per record metric
  – core element usage metric
  – code usage metrics
  – code and element usage metrics
  – “star rating” (derived from average item score in archive)

http://eprints.infodiv.unimelb.edu.au/archive/00001408/01/ICADL2004-PUBLISHED.pdf

Page 23: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Case Study: L'Année philologique

Log file provided by Professor Eric Rebillard, Director of Graduate Studies, Field of Classics

http://www.annee-philologique.com/aph/

126 OpenURLs in sample

Page 24: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley
Page 25: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley
Page 26: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Observations: log file scan

[Log file is not available in Powerpoint version. Please contact Adam Chandler for more information]

Page 27: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Observations: date

log examples:

2000-2001
2000-2001
2000-2001
2004-2005
2004-2005
2003-2004
2004-2005
1998-1999
2004-2005
2004-2005

Date of publication in ISO 8601 form YYYY, YYYY-MM or YYYY-MM-DD [p.56]

NOTE: "chron" Indications of chronology in a non-ISO 8601 form (like "Spring" or "1st quarter") should be carried in this element; the element content is not normalized. Where numeric ISO 8601 dates are also available, they should be provided in the "date" element. As such, a recorded date of publication of "Spring, 1992" becomes "date=1992" and "chron=spring". Chronology information can also be provided in the "ssn" and "quarter" elements [p. 57]
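
A check along the following lines would flag the year ranges seen in the log. It is a minimal sketch, assuming that only the three forms quoted above (YYYY, YYYY-MM, YYYY-MM-DD) are acceptable; the function name is hypothetical.

```python
import re
from datetime import date

# The three accepted shapes: YYYY, YYYY-MM, YYYY-MM-DD.
ISO_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_openurl_date(value):
    """Return True when value is a real calendar date in one of the three forms."""
    match = ISO_DATE.fullmatch(value)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1 so the calendar check still runs.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


for candidate in ["2000-2001", "2004", "2004-05", "1992-03-21", "2004-13"]:
    print(candidate, is_valid_openurl_date(candidate))
```

A value such as "2000-2001" fails because the part after the hyphen is four digits rather than a two-digit month, so a resolver has no reliable way to tell which year is meant.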

Page 28: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Observations: volume and issue

Volume is usually expressed as a number but could be roman numerals or non-numeric, e.g. "124", or "VI"."4“ [p.57]

Issue: This is the designation of the published issue of a journal, corresponding to the actual physical piece in most cases. While usually numeric, it could be nonnumeric. Note that some publications use chronology in the place of enumeration, i.e. Spring, 1998. [p.58]

log examples:

N.%20S.%2055%20(1)7%20(1)43%20(3-4)N.%20S.%2055%20(2)4a%20ser.%203%20(1)N%B0%20152N%B0%20547%20(2)133%20(2)13-144a%20ser.%203%20(1)31%20(1)133%20(2)38%20(3)98%20(1)N.%20S.%2055%20(1)
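
Several of the logged values pack a volume and an issue (and sometimes a leading designation) into a single field. The sketch below decodes three such values and splits them apart. The regular expression is a hypothetical pattern fitted to these few examples, not a rule from the standard, and how a resolver should treat the leading text (for example "N. S.") is left open.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern: optional leading designation, a volume number,
# then an issue in parentheses, e.g. "N. S. 55 (1)".
ENUMERATION = re.compile(
    r"(?P<prefix>.*?)\s*(?P<volume>\d+)\s*\((?P<issue>[^)]+)\)"
)


def split_enumeration(raw_value):
    """Percent-decode a logged value and split it into prefix, volume and issue."""
    decoded = unquote(raw_value)
    match = ENUMERATION.fullmatch(decoded)
    return match.groupdict() if match else None


for raw in ["N.%20S.%2055%20(1)", "43%20(3-4)", "4a%20ser.%203%20(1)"]:
    print(raw, "->", split_enumeration(raw))
```

Ideally the origin would send these parts in separate volume and issue elements, so that the resolver does not have to guess at the structure.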

Page 29: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Observations: spage

"spage=" is missing; it is more useful than the "pages" field when linking to full text

First page number of a start/end (spage-epage) pair. Note that pages are not always numeric [p.58]
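
Where an origin sends only a page range, a start page can often be derived from it. The following is a minimal sketch, assuming the range is written as start and end separated by a hyphen or en dash (as in "pages=128-144" in the Version 0.1 example); the function name is hypothetical.

```python
import re


def with_start_page(metadata):
    """Return a copy of the metadata with "spage" filled in from "pages" when absent."""
    result = dict(metadata)
    if not result.get("spage") and result.get("pages"):
        # Take everything before the first hyphen or en dash as the start page.
        first = re.split(r"[-\u2013]", result["pages"], maxsplit=1)[0].strip()
        if first:
            result["spage"] = first
    return result


print(with_start_page({"pages": "128-144"}))
print(with_start_page({"pages": "xii\u2013xv"}))
```

The value is kept as a string because, as the definition above notes, pages are not always numeric.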

Page 30: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Observations: missing ISSNs

International Standard Serial Number (ISSN). ISSN numbers may contain a hyphen, e.g. "1041-5653" [p. 59]
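
An ISSN carries a check digit, so a malformed or mistyped value can be caught before it reaches the resolver. The sketch below applies the standard ISSN check-digit rule (weights 8 down to 2, modulo 11, with "X" standing for 10) to values with or without the hyphen; the function name is hypothetical.

```python
import re


def is_valid_issn(issn):
    """Check the ISSN shape (hyphen optional) and its check digit."""
    compact = issn.replace("-", "").upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return False
    # Weights 8..2 apply to the first seven digits.
    total = sum(int(digit) * weight
                for digit, weight in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return compact[7] == expected


for candidate in ["1041-5653", "0269-8889", "01604120", "1041-5654"]:
    print(candidate, is_valid_issn(candidate))
```

The first three values come from the examples in this presentation and pass the check; the last has a deliberately altered final digit and fails. A check like this catches typing errors only, not a valid ISSN that belongs to the wrong journal.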

"ISSN=" these are easier to resolve than titles, especially with titles that contain special characters

Page 31: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Observations: character encoding

Character encoding:

Use UTF-8

Specify character encoding this way in OpenURL 1.0: info:ofi/enc:UTF-8

Source: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Character+Encodings
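
The volume and issue log examples contain the sequence "%B0", which is not valid UTF-8 on its own. The sketch below assumes that this byte is a Latin-1 degree sign (as in "N°") and shows how the same value would look if the origin had encoded it as UTF-8; that reading of the log is an assumption.

```python
from urllib.parse import quote, unquote

raw = "N%B0%20152"  # value as read from the log examples

# Decoded as UTF-8, the lone byte 0xB0 is invalid and becomes U+FFFD.
as_utf8 = unquote(raw, encoding="utf-8", errors="replace")

# Decoded as Latin-1 (assumed source encoding), it is the degree sign.
as_latin1 = unquote(raw, encoding="latin-1")

# Re-encoding the repaired text as UTF-8 gives the form a resolver expects.
fixed = quote(as_latin1, encoding="utf-8")

print(repr(as_utf8))
print(repr(as_latin1))
print(fixed)
```

A resolver that expects UTF-8 sees a replacement character in the first case, which is one way titles and enumeration with special characters fail to match.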

Page 32: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Observations: Missing WorldCat numbers

Including OCLC WorldCat numbers would help to resolve title level ambiguities, especially when the request is routed to InterLibrary Loan

Data from title matching in WorldCat

17 titles without an ISSN

To do this means moving from OpenURL 0.1 to 1.0 format

info:ofi/nam:info:oclcnum:

Source: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Namespaces

Page 33: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Analysis of L'Année philologique titles in the log sample that are held in WorldCat libraries

Total titles analyzed: 81

Total confirmed held by Cornell in WorldCat: 53 (margin of error)

Unconfirmed in or out of WorldCat: 6

Median number of libraries that hold these titles: 67

Thus, even if the metadata were perfect, finding the title through ILL, especially without an identifier (ISSN, ISBN, WorldCat number), is expensive.

Caveat: Not all of a library’s holdings are in WorldCat, especially journals.

Page 34: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

The scale of the OpenURL quality problem

Cornell link resolver activity: December 3, 2007 – February 8, 2008: 53,062 OpenURLs were sent to the link resolver.

Page 35: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Discussion

Page 36: The OpenURL Quality Problem & Project Adam Chandler Coordinator, Service Design Group Glen Wiley

Notes and links

http://library4.library.cornell.edu/openurl/index.html

How many openurls came into Cornell dec – feb?

http://www.language-archives.org/index.html

http://www.niso.org/standards/standard_detail.cfm?std_id=783

http://erms.library.cornell.edu/webbridge/edit

beforelinks.html