Linked Data for Libraries (LD4L) CUL Metadata Working Group May 15, 2015

Linked Data for Libraries (LD4L)

CUL Metadata Working Group
May 15, 2015

Linked Data for Libraries (LD4L)

• Andrew W. Mellon Foundation made a two-year $999K grant to Cornell, Harvard, and Stanford starting Jan 2014

• Partners will work together to develop an ontology and linked data sources that provide relationships, metadata, and broad context for Scholarly Information Resources

• Leverages existing work by both the VIVO project and the Hydra Partnership

Specific LD4L Goals

• Free information from existing library system silos to provide context and enhance discovery of scholarly information resources

• Leverage usage information about resources

• Link bibliographic data about resources with academic profile systems and other external linked data sources

• Assemble (and where needed create) a flexible, extensible LD ontology to capture all this information about our library resources

• Demonstrate combining and reconciling the assembled LD across our three institutions

LD4L Data Sources

Bibliographic Data
• MARC
• MODS
• EAD

Person Data
• VIVO
• ORCID
• ISNI
• VIAF

Usage Data
• Circulation
• Citation
• Curation
• Exhibits
• Research Guides
• Syllabi
• Tags

Agenda

• Simeon – Intro (done) and use cases
• Jon – Ontologies
• Steven – Modified Bibframe for LD4L
• Rebecca – UC2, mass conversion, post-processing
• Lynette – UC1.1, ORE, scraping, lookups
• Naun – Preview of LD4P

LD4L Use Case Clusters

1. Bibliographic + curation data
2. Bibliographic + person data
3. Leveraging external data including authorities
4. Leveraging the deeper graph (via queries or patterns)
5. Leveraging usage data
6. Three-site services, e.g. cross-site search

42 Raw Use Cases

12 Refined Use Cases in 6 clusters…

Cluster 1. Bibliographic + curation data

UC1.1: Build a virtual collection

“As a faculty member or librarian, I want to create a virtual collection or exhibit containing information resources from multiple collections across multiple universities either by direct selection or by a set of resource characteristics, so that I can share a focused collection with a <class, set of researchers, set of students in a disciplinary area>.”

UC1.1: Build a virtual collection

• Individual selection of resources
• Collection may be exhibit, reading list, etc.
• Includes description, ordering
• Shared or private
• Enable across institutions

• Well developed, uses annotations
• Demo => Lynette Rayle

UC1.2: Tag scholarly information resources to support reuse

“As a librarian, I would like to be able to tag scholarly information resources from one or multiple institutions into curated lists, so that I can feed these lists into subject guides, course reserves, or reference collections. I'd like these lists to be portable (into Drupal, into LibGuides, into Spotlight! or Omeka, into Sakai, e.g.) and durable. I'd like these lists/tags to selectively feed back into the discovery environment without having to modify the catalog records.”

UC1.2: Tag scholarly information resources to support reuse

• Scale O(100,000) items
• Use for e.g. virtual subject library
• Selection by rules and patterns
• Collaborative curation
• Feed data into discovery environment

• Same model as 1.1, partially developed
• CULLR v2

CULLR -> provide virtual library “views/collections” as part of main Blacklight discovery system

Cluster 2. Bibliographic + person data

UC2.1: See and search on works by people to discover more works, and better understand people

“As a researcher, I'd like to see / search on works <by, about, cited by, collected, taught> by University faculty <in an OPAC, profiles system>, to discover works of interest based on connection to people, and to understand people based on their relation to works.”

UC2.1: See and search on works by people to discover more works, and better understand people

• Profile + catalog data
• Perhaps between institutions: multiple catalogs + multiple profile systems
• Poor journal data suggests we can likely do more in non-journal disciplines

• Demo => Rebecca Younes

UC3.1: Search with Geographic Data for Record Enrichment and Pivoting

“As a researcher, I'd like to see the geographic context of my search results, and be able to pivot, extend or refine a search with a single click, in order to better assess found resources, find related resources, and filter or expand search results to broaden or narrow a search on the fly.”

UC3.x: Search with XYZ for Record Enrichment and Pivoting

XYZ = geographic / subject / person data

• External data and entity linking
• Things not strings
• Different data types have different challenges and data sources

Cluster 4. Leveraging the deeper graph (via queries or Patterns)

UC4.1: Identifying related works

“As a scholar, I would like to find all the images associated with various instances of a work sorted by time, so that I can see how the depictions of or illustrations in a work have changed over time.”

“(GloPAD specific): As a scholar, I would like to find all costume photographs and scene illustrations for various stagings and performances of the plays of a particular author or the operas of a particular composer, so that I can see how the visual look of performances of the plays or operas have changed over time.”

UC4.1: Identifying related works

• use complex graph relationships via queries or patterns (rather than direct connections)

• allow discovery that would not be possible without the semantics of different relationships

• restricted by available data

• Steven Folsom work on complex and linked descriptions for hip-hop flyers

UC4.2: Leverage the deeper graph to surface more relevant works

“As a researcher, I would like to see resources in response to a search where the relevance ranking of the results reflects the "importance" of the works, based on how they have been used or selected by others, so that I can find important resources that might otherwise be "hidden" in a large set of results.”

Cluster 5. Leveraging usage data

UC5.1: Research guided by community usage

“As a researcher, I want to find what is being used (read, annotated, bought by libraries, etc.) by the scholarly communities not only at my institution but at others, and to find sources used elsewhere but not by my community”

UC5.1: Research guided by community usage

• need to understand community of user
  – direct: auth and from identity?
  – inferred as part of discovery?

• cross-institutional – need usage data that can be shared and merged

UC5.2: Be guided in collection building by usage

“As a librarian, I would like help building my collection by seeing what is being used by students and faculty.”

“As a subject librarian, I would like to see what resources in my subject area are heavily used at peer institutions but are not in my institution’s collection.”

UC5.2: Be guided in collection building by usage

• business analytics
• help libraries make best use of collection building activities and funds
• useful at both institutional and cross-institutional levels
• cross-institutional – need usage data that can be shared and merged

Usage data

• Most work at Harvard

Cluster 6. Three-site services

UC6.1: Cross-site search

As a scholar I don’t want my discovery process to be constrained by the collection boundaries of my university yet I want to retain the detailed coverage of special collections that are important in my field.

Bonus points: I want results ranked by the scholarly value, not simply popularity in the public eye.

UC6.1: Cross-site search

• demonstrate the power of sharing library data as linked open data
  – opportunities for third parties
  – benefits for partners

• deal with dupes & near dupes, group in works

Will have simple demo by end of project using all the LOD from our three institutions merged in LD4L ontology / Bibframe
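
The "dupes & near dupes, group in works" step can be pictured as grouping records under a normalized key. The sketch below is only an illustration: the record fields, the sample records, and the `work_key` heuristic (lower-cased title and author with punctuation replaced by spaces) are assumptions made here, not the matching that LD4L actually uses.

```python
import re
from collections import defaultdict


def work_key(title, author):
    """Build a crude matching key: lower-case, punctuation to spaces, collapse spaces."""
    def norm(text):
        text = re.sub(r"[^\w\s]", " ", text.lower())
        return " ".join(text.split())
    return (norm(title), norm(author))


def group_into_works(records):
    """Group catalog records from several sites under a shared work key."""
    works = defaultdict(list)
    for rec in records:
        works[work_key(rec["title"], rec["author"])].append(rec)
    return dict(works)


# Hypothetical records standing in for the three partner catalogs.
records = [
    {"site": "cornell", "title": "Moby-Dick; or, The Whale", "author": "Melville, Herman"},
    {"site": "harvard", "title": "Moby Dick, or the whale", "author": "Melville, Herman."},
    {"site": "stanford", "title": "Walden", "author": "Thoreau, Henry David"},
]

works = group_into_works(records)
for key, members in works.items():
    print(key, [m["site"] for m in members])
```

A key this simple will both over-merge (different editions of distinct works with the same title) and under-merge (translated or abbreviated titles), which is why shared identifiers are preferable to string keys wherever they exist.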

LD4L Ontology Team Overview

Metadata Working Group
May 15, 2015

Jon Corson-Rikert

The team

Goals for an LD4L ontology framework (from the 2013 proposal)

• reuse appropriate parts of currently available ontologies while introducing extensions and additions where necessary

• be sufficiently expressive to encompass traditional catalog metadata from the 3 partners

• maintain compatibility with VIVO and research networking ontologies

• include usage and other contextual elements

Early LD4L discussions

• What library information is local vs. global?
• How do we add links to external identifiers, authorities, and real world objects (RWOs)?
• How do we link across our libraries?
• What existing ontologies should we use?
• What services and workflows do we need?
• Which services are (or aren’t) ready for prime time?

Other starting points

• LD4L is not about original cataloging (that’s LD4P)

• LD4L is not trying to reconcile the different approaches of Schema.org and BIBFRAME

• Linking to data beyond library catalogs is a focus
  – Predisposed to re-use ontologies appropriate to the domain involved

• Ontology work has focused on our use cases

MWG April 18, 2003

Ontology design

Source: MK Bergman, http://www.mkbergman.com

Ontologies need to evolve

http://www.mkbergman.com/908/a-new-methodology-for-building-lightweight-domain-ontologies/

Ontology orthogonality

http://www.w3.org/2007/Talks/0223-Bangalore-IH/Slides.html

BIBFRAME and Schema.org in LD4L

• We considered the needs of our project and our libraries

• Both will likely need the greater expressiveness that BIBFRAME offers
  – Technical services units are actively participating in BIBFRAME training and trials

• Not an either/or proposition -- libraries also want broader discoverability
  – Value in exposing bibliographic metadata on the web with Schema.org tags

Conversations continue on the BF list …

• “Much of Bibframe’s problems seem to come from trying to keep everything from the past (MARC) while moving to something fit for the future, an ambition that has I think also afflicted RDA” (Thomas Meehan)

• “Why do the 'powers that be' think that we even want our local catalogs to be semantically connected to the web or have all of our data linked?!” (Michael Ayres)

• “The whole linking idea is great, but really, after 40 years using MARC21, some yahoo wants to unravel everything and bill me for it? I don’t think so.” (Jeffrey Trimble)

Simple core framework

Becomes more complex …

Gets downright messy …

Simple framework, ironing out details

More work in progress …

Local vs. global identifiers

• Establishing local identifiers allows libraries to make their own assertions about resources and authorities
  – Assigning stable linked data URIs that are accessible anywhere

• Shared references to global identifiers enable both direct and indirect linkages

• OCLC, VIAF, ISNI, ORCID and others are addressing global identifiers at scale
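
The "indirect linkages" point can be illustrated with a small sketch: when two institutions each connect their own local URI to the same global identifier, the two local URIs become linked to each other without either library asserting it directly. Every URI below is a made-up placeholder under example.org, and representing the links as plain `(local, global)` pairs is an assumption of this sketch, not the project's data model.

```python
from collections import defaultdict

# Each pair says: this local URI refers to the same entity as this global identifier.
# All URIs are hypothetical placeholders.
same_as = [
    ("http://cornell.example.org/person/123", "http://viaf.example.org/viaf/0001"),
    ("http://harvard.example.org/agent/abc", "http://viaf.example.org/viaf/0001"),
    ("http://stanford.example.org/people/x9", "http://viaf.example.org/viaf/0002"),
]


def indirect_links(pairs):
    """Return global identifiers shared by more than one local URI."""
    by_global = defaultdict(set)
    for local_uri, global_uri in pairs:
        by_global[global_uri].add(local_uri)
    return {
        global_uri: sorted(local_uris)
        for global_uri, local_uris in by_global.items()
        if len(local_uris) > 1
    }


linked = indirect_links(same_as)
for global_uri, local_uris in linked.items():
    print(global_uri, "connects", local_uris)
```

Each library keeps its own namespace and its own assertions; only the shared reference has to agree.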

Local linked data identifiers

• For library resources but also local or unique information on people, organizations, events, collections
  – There is value and credibility in the institutional namespace

• Supplement with locally-sourced and/or locally-targeted annotations
  – Not restricted to local generation or visibility

• Does not require exposing all operational data

From strings to things

• People
• Organizations
• Places
• Subjects
• Events
• Works
• Datasets

Image: designyoutrust.com

Entity resolution

• Can potentially happen before, during, or after MARC to RDF conversion

• Can draw on existing authorities directly and indirectly
  – Local sources may involve custom workflows and services
  – Remote sources are likely shared and can more likely benefit from standardized services

• Assess potential to loop back
  – From non-MARC sources to other catalog resources
  – From external sources to local
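
As a rough picture of the "strings to things" step in entity resolution, the sketch below looks up heading strings in an authority table after normalizing case, accents, and trailing punctuation, and sets aside anything it cannot match. The authority table, its URIs, and the normalization rules are all assumptions chosen for illustration; real workflows would draw on services such as VIAF or local authority files and need far more careful matching.

```python
import unicodedata


def normalize(label):
    """Fold case and accents and trim punctuation so variant headings compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(unaccented.lower().strip(" .,;").split())


def resolve(headings, authority):
    """Map heading strings to authority URIs; collect the ones with no match."""
    index = {normalize(label): uri for label, uri in authority.items()}
    resolved, unresolved = {}, []
    for heading in headings:
        uri = index.get(normalize(heading))
        if uri is not None:
            resolved[heading] = uri
        else:
            unresolved.append(heading)
    return resolved, unresolved


# Hypothetical authority table: preferred label -> placeholder URI.
authority = {
    "Brontë, Charlotte": "http://authority.example.org/person/1",
    "Cornell University": "http://authority.example.org/org/2",
}

resolved, unresolved = resolve(
    ["Bronte, Charlotte.", "cornell university", "Unknown Press"], authority
)
print(resolved)
print(unresolved)
```

Keeping the unresolved list explicit matters: those strings are candidates for a later pass, which is one way the "before, during, or after conversion" choice plays out in practice.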

Working with non-MARC metadata

• Faculty research profiles (CAP, Faculty Finder, VIVO)

• Library guides and other library-sourced web content

• Digital collections that vary widely in size and complexity, and that encompass diverse subject domains

• Pilot projects
  – Cornell’s Hip Hop flyers
  – Harvard’s Visual Image Access metadata

Usage data

• Inspiration
  – Harvard’s Stacklife

• Goals
  – To supplement library discovery interfaces
  – To inform collection review and additions

• Challenges
  – Data availability
  – Concerns for patron privacy

• Potential for a normalized stack score across institutions
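
One possible reading of a "normalized stack score across institutions" is sketched below. It is purely an assumption for illustration: the slides do not define the score, and this is not StackLife's formula. Each site's raw, already-aggregated usage counts are min-max scaled to the range 0 to 1, and an item's combined score is the mean of its scaled scores at the sites that hold it. The site names, item identifiers, and counts are invented.

```python
def normalize_scores(raw_counts):
    """Scale one institution's raw usage counts to 0..1 (min-max scaling)."""
    low, high = min(raw_counts.values()), max(raw_counts.values())
    span = high - low
    return {
        item: ((count - low) / span if span else 0.0)
        for item, count in raw_counts.items()
    }


def combined_score(per_site_counts):
    """Average each item's normalized score over the sites that report it."""
    collected = {}
    for counts in per_site_counts.values():
        for item, score in normalize_scores(counts).items():
            collected.setdefault(item, []).append(score)
    return {item: sum(scores) / len(scores) for item, scores in collected.items()}


# Invented aggregate counts; no per-patron data is involved.
usage = {
    "site_a": {"work1": 10, "work2": 110, "work3": 60},
    "site_b": {"work1": 2, "work2": 4},
}

scores = combined_score(usage)
print(scores)
```

Scaling within each site first keeps a large library's raw counts from swamping a smaller partner's, and working only from aggregate counts is one way to respect the patron-privacy concern noted above.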

Stacklife screenshot

LD4L and BIBFRAME: Then and Now

Image credit: http://www.chicagobearboards.com/node/4372

Steven Folsom
Cornell University Library Metadata Working Group Forum

May 15th, 2015

Why is LD4L attempting to use BIBFRAME?

• An ontology built by the Library of Congress as a replacement for MARC
  – If/when it solidifies, we will likely see data provided by libraries using this model

• Open Source tools available for converting MARC to BIBFRAME

• LD4L perspective is that BIBFRAME will likely be a starting point, with post-processing needed to meet some of our use cases

• Didn’t want to develop an ontology from scratch within the timeframe of the grant…

BIBFRAME Vocabulary

Status (As of May 15, 2015, 11 am EST)
• Paused in its current/first version since January 2014 to allow the community to respond with feedback and so that tools could be built to test it in pilot projects
• Expecting a new version this spring based on feedback, more on this in a moment

Structure
• Classes: Work, Instance, Authority and Annotation
  – bf:Work loosely conflates FRBR concepts of Works and Expressions
  – bf:Instance loosely aligns with FRBR Manifestation
  – No equivalent to FRBR Item yet in the vocabulary

Image credits: http://bibframe.org/vocab-model/ and http://en.wikipedia.org/wiki/George_Clooney

What BIBFRAME Isn’t.

LD4L’s Role in Grounding BIBFRAME in Linked Data Best Practices

Image credit: http://www.moviefanatic.com/2014/01/gravity-george-clooney-reacts-to-oscar-nominations/

BIBFRAME: Areas for Improvement

Including, but not limited to…

Improvements: Reuse Existing Classes and Properties from Proven Ontologies

• bf:Person → foaf:Person
• bf:Event → schema:Event
• bf:subject → dcterms:subject
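
A post-processing pass that applies these three substitutions could look roughly like the sketch below. It treats triples as plain tuples of prefixed names, which is an assumption made to keep the example dependency-free; the sample triples and the `ex:` identifiers are invented, and only the three term pairs come from the slide.

```python
# Term substitutions taken from the slide: BIBFRAME term -> reused external term.
TERM_MAP = {
    "bf:Person": "foaf:Person",
    "bf:Event": "schema:Event",
    "bf:subject": "dcterms:subject",
}


def reuse_terms(triples, term_map=TERM_MAP):
    """Swap mapped classes and properties; subjects (instances) are left alone."""
    return [
        (subject, term_map.get(predicate, predicate), term_map.get(obj, obj))
        for subject, predicate, obj in triples
    ]


# Invented sample data in (subject, predicate, object) form.
converted = [
    ("ex:person1", "rdf:type", "bf:Person"),
    ("ex:work1", "rdf:type", "bf:Work"),
    ("ex:work1", "bf:subject", "ex:topic1"),
]

cleaned = reuse_terms(converted)
for triple in cleaned:
    print(triple)
```

Terms with no entry in the map, such as bf:Work here, pass through unchanged, so the same pass can be rerun as the mapping grows.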

Improvements: Provide Inverse Relationships

For example:

• bf:hasPart, add bf:isPartOf
• bf:hasReproduction, add bf:reproduces
• dcterms:creator, add bf:creatorOf
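
To show what providing inverses buys, here is a minimal sketch that materializes the inverse statement for every triple whose predicate has one. The inverse pairs are the ones proposed on the slide; the tuple representation, the sample triples, and the `ex:` identifiers are assumptions for illustration, and in practice a reasoner or an owl:inverseOf declaration could do the same work.

```python
# Inverse pairs as proposed on the slide: property -> suggested inverse.
INVERSES = {
    "bf:hasPart": "bf:isPartOf",
    "bf:hasReproduction": "bf:reproduces",
    "dcterms:creator": "bf:creatorOf",
}


def add_inverses(triples, inverses=INVERSES):
    """Return the original triples plus the inverse of each invertible statement."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in inverses:
            result.add((obj, inverses[predicate], subject))
    return result


# Invented sample data.
graph = [
    ("ex:series1", "bf:hasPart", "ex:volume1"),
    ("ex:work1", "dcterms:creator", "ex:person1"),
    ("ex:work1", "rdf:type", "bf:Work"),
]

expanded = add_inverses(graph)
for triple in sorted(expanded):
    print(triple)
```

With the inverses present, a query can start from the person or the volume and reach the work or the series in one hop, instead of having to search for statements that point at it.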

Improvements: Lessen the number of sub-properties that do not add semantics

For Example:

• bf:titleValue doesn’t add more meaning between a title entity and its value. The more generalizable rdf:value works fine.

Improvements: Focus on Things not Strings

BIBFRAME (Post-Makeover)

Questions I want to answer going forward…

•How will descriptive practice change with this more modular approach to the structure?

–What changes to content standards will be needed? (Especially for related entities, e.g. People, Publishers…)

•What new data sources will the move to RDF allow for? (VIVO, MusicBrainz, Getty Vocabularies, ISNI)

•What types of entities can we describe better in RDF? (e.g. Awards, Events)

•How can we take advantage of URIs right now in MARC, while ensuring a smoother transition to linked data?
