Ed Chamberlain Systems Development Librarian Cambridge University Library

Linked data and voyager


Page 1: Linked data and voyager

Ed Chamberlain, Systems Development Librarian

Cambridge University Library

Page 2: Linked data and voyager

Disclaimers …

Apologies if you see the semantic web as up there with quantum mechanics …

Will contain some techy stuff

Not that much on Voyager …

Page 3: Linked data and voyager

Overview

Linked data in theory

What we learnt: IPR, data, supporting technology

How could it be used by Ex Libris?

Page 4: Linked data and voyager

What is the semantic web?

“The Semantic Web is a "man-made woven web of data" that facilitates machines to understand the semantics, or meaning, of information on the World Wide Web[1][2].”

“The concept of Semantic Web applies methods beyond linear presentation of information (Web 1.0) and multi-linear presentation of information (Web 2.0) to make use of hyper-structures leading to entities of hypertext.”

http://en.wikipedia.org/wiki/Semantic_Web

Page 5: Linked data and voyager

Eh?

Semantic = its meaning is explained - self-describing data!

Hyperlinked = meaning contextualised elsewhere

Focus on machines rather than people

Page 6: Linked data and voyager

What is Linked Data …

After several iterations of semantic web development …

Tim Berners-Lee has advocated four underlying design principles for linked data:

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up those names

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

4. Include links to other URIs, so that they can discover more things

http://www.w3.org/DesignIssues/LinkedData.html

Page 7: Linked data and voyager

And RDF?

The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats.

http://en.wikipedia.org/wiki/Resource_Description_Framework

Page 8: Linked data and voyager

What does this mean in practice …

RDF data is expressed as triples:

DC XML …
<dc:identifier>1000346</dc:identifier>
<dc:title>Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171</dc:title>

Marc21 …
001 1000346
245 $aEarly medieval history of Kashmir : $b[with special reference to the Loharas] A.D. 1003-1171 /

RDF triples …
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" .
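As a rough illustration of the step from a Marc21 field to a triple, here is a minimal Python sketch. The entry URI pattern and the title predicate are taken from the slide; the function names `ntriples_literal` and `title_triple` are made up for this example, and this is not the COMET conversion code.

```python
# Illustrative only: not the COMET converter, just the shape of its output.
ENTRY_BASE = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_"  # pattern from the slide
DCT_TITLE = "http://purl.org/dc/terms/title"


def ntriples_literal(text):
    """Escape a string as a plain N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def title_triple(bib_id, subfield_a, subfield_b=""):
    """Build one N-Triples line for the title held in Marc21 field 245."""
    # Join $a and $b, then drop the trailing " /" that Marc21 keeps in the data.
    title = (subfield_a + subfield_b).rstrip(" /")
    subject = "<" + ENTRY_BASE + bib_id + ">"
    return subject + " <" + DCT_TITLE + "> " + ntriples_literal(title) + " ."


line = title_triple(
    "1000346",
    "Early medieval history of Kashmir : ",
    "[with special reference to the Loharas] A.D. 1003-1171 /",
)
print(line)
```

The printed line should match the RDF triple shown on the slide: subject URI, predicate URI, quoted literal, full stop.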

Page 9: Linked data and voyager

Most of a record …

1. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" .
2. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/1cb251ec0d568de6a929b520c4aed8d1> .
3. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> .
4. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/identifier> "UkCU1000346" .
5. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/issued> "1981" .
6. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> .
7. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/language> <http://id.loc.gov/vocabulary/iso639-2/eng> .
8. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://RDVocab.info/ElementsplaceOfPublication> <http://id.loc.gov/vocabulary/countries/ii>

Page 10: Linked data and voyager

Where is the linking exactly?

<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0>

<http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> <http://www.w3.org/2000/01/rdf-schema#label> "Mohan, Krishna" .
<http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> <http://xmlns.com/foaf/0.1#name> "Mohan, Krishna" .
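The "link" is simply that the object of one triple is the subject of others. A small Python sketch of that hop, using triples copied from the slide and held in a plain list; the helper names `objects` and `creator_labels` are invented for the example, and a real triplestore would do this lookup for you.

```python
# A few triples from the slide, held as (subject, predicate, object) tuples.
ENTRY = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346"
CREATOR = "http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    (ENTRY, DCT_CREATOR, CREATOR),
    (CREATOR, RDFS_LABEL, "Mohan, Krishna"),
    (CREATOR, RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
]


def objects(subject, predicate):
    """Return every object recorded for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def creator_labels(entry_uri):
    """Hop from a record to its creator entities, then to their labels."""
    labels = []
    for creator_uri in objects(entry_uri, DCT_CREATOR):
        labels.extend(objects(creator_uri, RDFS_LABEL))
    return labels


print(creator_labels(ENTRY))
```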

Page 11: Linked data and voyager

External linking

<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/subject> <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> .

<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://www.w3.org/2004/02/skos/core#inScheme> <http://id.loc.gov/authorities#conceptScheme> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://www.w3.org/2004/02/skos/core#prefLabel> "Lohars -- History" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://purl.org/dc/terms/hasPart> <http://id.loc.gov/authorities/sh85078149#concept> .

Page 12: Linked data and voyager

Live demo …

http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346

Page 13: Linked data and voyager

Meanwhile …

BNB

British Museum

Library of Congress

BBC Nature

Page 14: Linked data and voyager

The Linking Open Data cloud diagram - http://richard.cyganiak.de/2007/10/lod/

Page 15: Linked data and voyager

What was COMET?

Cambridge Open Metadata

Cambridge University Library / CARET / OCLC

Funded by the JISC Infrastructure for Resource Discovery Project

February to July 2011

http://discovery.ac.uk

Page 16: Linked data and voyager

What did COMET do …

1. Experimentally convert as much of the Cambridge University Library catalogue as it could from Marc21 to RDF triples

2. Investigate IPR issues around Open License publishing and Marc21

3. Construct an RDF publishing platform to sit behind those URIs …

4. Release tools for others to do the same

5. Blog and documentation

Page 17: Linked data and voyager

Why?

Respond to academic / national demand for Open Data

Get our data to non-librarians!

Tax-payer value-for-money

CUL already provides public APIs

Gain in-house experience of RDF

Move library services forward

Page 18: Linked data and voyager

Why - IPR

Linked data works best with a permissive license

CC0 or Public Domain Data License

Non-commercial licenses not suitable

Conflict with record vendors

Page 19: Linked data and voyager

How – IPR

Examine contracts with major vendors

Decide on re-use conditions and contact them

Decode record ownership from Marc21 fields (Could not use Voyager SQL)

Page 20: Linked data and voyager

How – IPR

Where does a record come from?

Several places in Marc21 where this data could be held (015,035,038,994 …)

Logic and hierarchy for examination

Attempt at scripted analysis – list bib_ids by record vendor
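The "logic and hierarchy" idea can be sketched in Python as an ordered list of rules, where the first field that yields an answer wins. The four tags come from the slide; the order of examination, the value tests, the vendor labels and the sample records are all assumptions made up for illustration, not the rules COMET actually applied.

```python
def from_038(value):
    # Assumption: 038 carries the licensor's code directly.
    return value.strip() or None


def from_035(value):
    # "(OCoLC)" is the usual prefix on OCLC control numbers.
    return "OCLC" if value.startswith("(OCoLC)") else None


def from_994(value):
    # Assumption: the mere presence of a 994 points at OCLC.
    return "OCLC"


def from_015(value):
    # Assumption: a "GB" prefix marks a British National Bibliography number.
    return "BNB" if value.startswith("GB") else None


# Hypothetical hierarchy: earlier rules take precedence over later ones.
RULES = [("038", from_038), ("035", from_035), ("994", from_994), ("015", from_015)]


def guess_vendor(record):
    """Return the first vendor a rule yields, or None if nothing matches.

    `record` is a plain dict mapping a Marc21 tag to a list of field values.
    """
    for tag, rule in RULES:
        for value in record.get(tag, []):
            vendor = rule(value)
            if vendor:
                return vendor
    return None


def bib_ids_by_vendor(records):
    """Group bib_ids under the vendor guessed for each record."""
    grouped = {}
    for bib_id, record in records.items():
        grouped.setdefault(guess_vendor(record), []).append(bib_id)
    return grouped


# Invented sample data: the field values are placeholders, not real records.
sample = {
    "1000346": {"035": ["(OCoLC)00000000"]},
    "1000347": {"015": ["GB0000000"]},
    "1000348": {},
    "1000349": {"038": ["EXAMPLE"], "035": ["(OCoLC)00000000"]},
}
print(bib_ids_by_vendor(sample))
```

Records that match no rule end up under `None`, which gives a list to review by hand.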

Page 21: Linked data and voyager
Page 22: Linked data and voyager

What - IPR

Most vendors happy with permissive license for ‘non-marc21’ formats

RLUK / BL B.N.B. – PDDL

OCLC – ODC-By Attribution license

No good reason not to re-publish – need the right license!

Page 23: Linked data and voyager

IPR - What did we learn?

Marc21 not fit for purpose here, no ‘authoritative code’ for license

National / international mandate to release open data

No good reason not to re-publish – need the right license!

Page 24: Linked data and voyager

How - data

Several attempts – settled on SQL extracts based on lists of bib_ids

Use Perl scripting to ‘munge’ the data

You can try this at home ! (work)

Page 25: Linked data and voyager

How - marc problems

Punctuation as a function

Binary encoding

Numbers for field names

Bad characters

Replication of data in fields
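"Punctuation as a function" is the easiest of these to show. In the 245 field from the earlier slide, the " :" and " /" separators live inside the data and have to be stripped before the values are usable elsewhere. A minimal Python sketch; the set of separators handled and the `$a...$b...` string layout are simplifying assumptions, and the project's own munging was done in Perl.

```python
import re

# Trailing separators that Marc21 stores inside the subfield data itself.
# Full stops are left alone so abbreviations such as "A.D." survive.
TRAILING_PUNCTUATION = re.compile(r"\s*[/:;,=]\s*$")


def clean_subfield(value):
    """Strip one trailing separator and surrounding whitespace from a subfield."""
    return TRAILING_PUNCTUATION.sub("", value).strip()


def split_subfields(field):
    """Split a '$aFoo$bBar' string into a list of (code, value) pairs."""
    pairs = []
    for chunk in field.split("$")[1:]:
        if chunk:  # ignore empty pieces left by a doubled "$"
            pairs.append((chunk[0], clean_subfield(chunk[1:])))
    return pairs


field_245 = (
    "$aEarly medieval history of Kashmir : "
    "$b[with special reference to the Loharas] A.D. 1003-1171 /"
)
print(split_subfields(field_245))
```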

Page 26: Linked data and voyager

How – data vocab

RDF allows you to freely mix vocabularies

Emerging consensus on bibliographic description

Our conversion script is CSV customisable

BL and others leading the way
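One way to picture a CSV-customisable conversion: the mapping from Marc21 field to RDF predicate lives in a small CSV file, so changing vocabulary means editing the file rather than the script. The Python sketch below assumes a three-column layout (tag, subfield, predicate) that is purely hypothetical; the actual COMET mapping format may differ.

```python
import csv
import io

# Hypothetical mapping file: Marc21 tag, subfield code, predicate URI.
MAPPING_CSV = """tag,subfield,predicate
245,a,http://purl.org/dc/terms/title
260,c,http://purl.org/dc/terms/issued
"""


def load_mapping(text):
    """Read the CSV into a dict keyed by (tag, subfield)."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["tag"], row["subfield"]): row["predicate"] for row in reader}


def convert(subject_uri, fields, mapping):
    """Yield an N-Triples line for every (tag, subfield, value) that has a mapping."""
    for tag, subfield, value in fields:
        predicate = mapping.get((tag, subfield))
        if predicate is None:
            continue  # fields without a mapping row are skipped
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        yield '<%s> <%s> "%s" .' % (subject_uri, predicate, literal)


mapping = load_mapping(MAPPING_CSV)
fields = [
    ("245", "a", "Early medieval history of Kashmir"),
    ("260", "c", "1981"),
    ("300", "a", "(placeholder physical description)"),  # no mapping row, so dropped
]
lines = list(
    convert("http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346", fields, mapping)
)
for line in lines:
    print(line)
```

Swapping `http://purl.org/dc/terms/title` for a predicate from another vocabulary is then a one-line change to the CSV.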

Page 27: Linked data and voyager

How - data publishing

Bulk downloads

Queryable ‘endpoints’

Data and code at http://data.lib.cam.ac.uk

Page 28: Linked data and voyager

How – linking

PHP script to match text against LOC subject headings – enrich with LOC GUID

FAST / VIAF enrichment courtesy of OCLC
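The matching step might look roughly like this. The project used a PHP script; the sketch below is in Python and only shows the general idea of normalising heading text and looking each part up in a local index. The index here holds a single entry, and that this identifier belongs to the label "Lohars" is inferred from the external-linking slide rather than verified.

```python
# Hypothetical local index of LOC subject labels (normalised) to URIs.
LCSH_INDEX = {
    "lohars": "http://id.loc.gov/authorities/sh85078149#concept",
}


def normalise(label):
    """Lower-case, collapse whitespace and drop a trailing full stop."""
    return " ".join(label.lower().split()).rstrip(".")


def match_heading(heading):
    """Return (part, uri) pairs for each part of a heading found in the index."""
    matches = []
    for part in heading.split("--"):
        uri = LCSH_INDEX.get(normalise(part))
        if uri is not None:
            matches.append((part.strip(), uri))
    return matches


print(match_heading("Lohars -- History"))
```

Parts with no match (here "History") are simply left unlinked; exact-string matching of this kind will miss variant forms.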

Page 29: Linked data and voyager

Data - What did we learn?

Marc / AACR2 cannot translate well to semantically rich formats

Need better container / transfer standards (not necessarily RDF)

Page 30: Linked data and voyager

What else?

Page 31: Linked data and voyager

RDF friendly database

Called RDF stores, triplestores or quadstores

Vary in size, scale and scope

None are particularly admin / dev friendly right now …

Page 32: Linked data and voyager

How - SPARQL

Query language for RDF stores

Still a work in progress

Some similarities with SQL

Bibliographic-centric tutorial
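For a flavour of SPARQL, here is a small query over the predicates used on the earlier slides, wrapped in Python that builds the GET request URL defined by the SPARQL protocol. The endpoint address is a placeholder, not the project's real endpoint, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Placeholder endpoint address, for illustration only.
ENDPOINT = "http://example.org/sparql"

# Find entries issued in 1981 together with their titles.
QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?entry ?title
WHERE {
  ?entry dct:title ?title ;
         dct:issued "1981" .
}
LIMIT 10
"""


def query_url(endpoint, query):
    """Build a SPARQL protocol GET URL: the query goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = query_url(ENDPOINT, QUERY)
print(url)
```

The SELECT / WHERE / LIMIT shape will look familiar from SQL; the difference is that WHERE holds triple patterns to match rather than tests on table columns.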

Page 33: Linked data and voyager

How – storage and access

ARC2 - Lightweight MySQL / PHP solution

Good fit for a six month project

Great for around 3-500 k records

Not so good for 1 million plus

20 million + ?

Page 34: Linked data and voyager

Supporting tech - What did we learn?

Triplestores are cumbersome

SPARQL alone does not do the trick

High entry barrier to RDF is partly a result of these accompanying technologies

Page 35: Linked data and voyager

What does this mean for Ex Libris

Building whole systems around RDF is not really a good idea

Need the flexibility to do this by dropping Marc21

GUIDS for records (or allow us to have our own) – resolvable ?

Ensure any RDF publishing capacity is flexible (as ours is)

RDF capability for Primo ?

Page 36: Linked data and voyager

Always add value to RDF …

Standalone RDF is just fiddly Dublin Core, so …

Create HTTP URIs for entities

Link it to something useful (LOC, FAST, VIAF)

Endpoint (SPARQL?)

Don’t limit to the bibliographic
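Creating URIs for entities needs a repeatable way to turn a heading into an identifier, so that the same name always lands on the same URI. One possible approach, sketched in Python: hash a normalised form of the name. The entity identifiers on the earlier slides look like 32-character hex digests, but whether COMET derived them this way is an assumption, and the output here is not expected to reproduce the project's actual URIs.

```python
import hashlib

ENTITY_BASE = "http://data.lib.cam.ac.uk/id/entity/cambrdgedb_"  # pattern from the slides


def mint_entity_uri(name):
    """Derive a stable URI from a heading: same normalised name, same URI."""
    key = " ".join(name.lower().split())  # fold case and collapse whitespace
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return ENTITY_BASE + digest


uri = mint_entity_uri("Mohan, Krishna")
same = mint_entity_uri("  mohan,  krishna ")
print(uri == same)
```

Hashing the text alone cannot tell two different people with the same name apart, which is one reason linking out to LOC, FAST or VIAF identifiers adds value.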

Page 37: Linked data and voyager

Beyond bibliographic

Bibliographic

Holdings

FAST subject headings

Libraries

Transactions

Special collections

Archives

Creator / entity

Place of publication

LCSH subject headings

Course lists

Language

Librarians

Page 38: Linked data and voyager

Do what Tim said …

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up those names

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

4. Include links to other URIs, so that they can discover more things

http://www.w3.org/DesignIssues/LinkedData.html

Page 39: Linked data and voyager

Questions?

@edchamberlain

http://data.lib.cam.ac.uk

http://cul-comet.blogspot.com/