Ed Chamberlain, Systems Development Librarian
Cambridge University Library
Disclaimers … Apologies if you see the semantic web
as up there with quantum mechanics …
Will contain some techy stuff
Not that much on Voyager …
Overview Linked data in theory
What we learnt: IPR, Data, Supporting technology
How could it be used by Ex Libris?
What is the semantic web? “The Semantic Web is a "man-made woven web of data" that facilitates machines to understand the semantics, or meaning, of information on the World Wide Web.”
“The concept of Semantic Web applies methods beyond linear presentation of information (Web 1.0) and multi-linear presentation of information (Web 2.0) to make use of hyper-structures leading to entities of hypertext.”
http://en.wikipedia.org/wiki/Semantic_Web
Eh? Semantic = its meaning is explained -
self-describing data!
Hyperlinked = meaning contextualised elsewhere
Focus on machines rather than people
What is Linked Data … After several iterations of semantic web development …
Tim Berners-Lee has advocated four underlying design principles for linked data:
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things
http://www.w3.org/DesignIssues/LinkedData.html
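Principles 2 and 3 can be sketched as an ordinary HTTP request that asks for RDF instead of HTML. This is a minimal sketch, assuming the server supports content negotiation; the helper name `build_rdf_request` and the `application/rdf+xml` media type are illustrative assumptions, and the request is only built here, not sent.

```python
from urllib.request import Request

def build_rdf_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Build (but do not send) a GET request asking for an RDF representation."""
    # The Accept header is how a client says which format it wants back.
    return Request(uri, headers={"Accept": media_type})

# URI taken from the slides; whether it still resolves is not checked here.
req = build_rdf_request("http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the lookup.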
And RDF ? The Resource Description Framework
(RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats.
http://en.wikipedia.org/wiki/Resource_Description_Framework
What does this mean in practice … RDF Data is expressed as triples:
DC XML …
<dc:identifier>1000346</dc:identifier>
<dc:title>Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171</dc:title>
Marc21 …
001 1000346
245 $aEarly medieval history of Kashmir : $b[with special reference to the Loharas] A.D. 1003-1171 /
RDF triples …
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" .
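The triple form can be produced with very little code. The following is a minimal sketch, not the project's conversion script: it formats one subject, predicate, object statement as an N-Triples line. The function names are hypothetical, and only plain literals and URI objects are handled (no language tags or datatypes).

```python
def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def to_ntriple(subject: str, predicate: str, obj: str, is_uri: bool = False) -> str:
    """Format one triple; the object is either a URI reference or a plain literal."""
    obj_term = f"<{obj}>" if is_uri else f'"{escape_literal(obj)}"'
    return f"<{subject}> <{predicate}> {obj_term} ."

BASE = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_"  # prefix seen in the slides
line = to_ntriple(
    BASE + "1000346",
    "http://purl.org/dc/terms/title",
    "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171",
)
print(line)
```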
Most of a record …
1. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" .
2. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/1cb251ec0d568de6a929b520c4aed8d1> .
3. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> .
4. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/identifier> "UkCU1000346" .
5. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/issued> "1981" .
6. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> .
7. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/language> <http://id.loc.gov/vocabulary/iso639-2/eng> .
8. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://RDVocab.info/ElementsplaceOfPublication> <http://id.loc.gov/vocabulary/countries/ii>
Where is the linking exactly?
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0>
<http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> <http://www.w3.org/2000/01/rdf-schema#label> "Mohan, Krishna" .
<http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> <http://xmlns.com/foaf/0.1#name> "Mohan, Krishna" .
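The link is simply that the object of one triple is the subject of another. As a rough sketch of what "following the link" means, the code below holds two of the triples above in memory and hops from the entry to its creator's label. The helper names are hypothetical, and a real consumer would use an RDF library or a triplestore rather than a Python list.

```python
ENTRY = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346"
PERSON = "http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# (subject, predicate, object) tuples copied from the slide
TRIPLES = [
    (ENTRY, DCT_CREATOR, PERSON),
    (PERSON, RDFS_LABEL, "Mohan, Krishna"),
]

def objects(triples, subject, predicate):
    """Return every object that matches the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def creator_labels(triples, entry):
    """Hop from an entry to its creators, then to each creator's label."""
    labels = []
    for creator in objects(triples, entry, DCT_CREATOR):
        labels.extend(objects(triples, creator, RDFS_LABEL))
    return labels

print(creator_labels(TRIPLES, ENTRY))
```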
External linking
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/subject> <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://www.w3.org/2004/02/skos/core#inScheme> <http://id.loc.gov/authorities#conceptScheme> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://www.w3.org/2004/02/skos/core#prefLabel> "Lohars -- History" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://purl.org/dc/terms/hasPart> <http://id.loc.gov/authorities/sh85078149#concept> .
Live demo … http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346
Meanwhile … BNB
British Museum
Library of Congress
BBC Nature
The Linking Open Data cloud diagram - http://richard.cyganiak.de/2007/10/lod/
What was COMET? Cambridge Open Metadata
Cambridge University Library / CARET / OCLC
Funded by the JISC Infrastructure for Resource Discovery Project
February to July 2011
http://discovery.ac.uk
What did COMET do …
1. Experimentally convert as much of the Cambridge University Library catalogue as it could from Marc21 to RDF triples
2. Investigate IPR issues around Open License publishing and Marc21
3. Construct an RDF publishing platform to sit behind those URIs …
4. Release tools for others to do the same
5. Blog and documentation
Why? Respond to academic / national demand for
Open Data
Get our data to non-librarians!
Tax-payer value-for-money
CUL already provides public APIs
Gain in-house experience of RDF
Move library services forward
Why - IPR Linked data works best with a
permissive license
CC0 or Public Domain Data License
Non-commercial licenses not suitable
Conflict with record vendors
How – IPR Examine contracts with major vendors
Decide on re-use conditions and contact them
Decode record ownership from Marc21 fields (Could not use Voyager SQL)
How – IPR Where does a record come from ?
Several places in Marc21 where this data could be held (015,035,038,994 …)
Logic and hierarchy for examination
Attempt at scripted analysis – list bib_ids by record vendor
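The "logic and hierarchy" idea can be sketched as checking the candidate fields in a fixed order and stopping at the first recognisable marker. This is only an illustration: the priority order, the marker strings and the sample records are all assumptions made for the example, not the rules COMET actually used.

```python
from typing import Dict, List, Optional

# Assumed priority order over the fields named on the slide.
FIELD_PRIORITY = ["038", "994", "035", "015"]
# Assumed text markers mapped to a vendor name.
VENDOR_MARKERS = {"OCoLC": "OCLC", "bnb": "BL BNB"}

def guess_vendor(record: Dict[str, List[str]]) -> Optional[str]:
    """Return the first vendor whose marker appears, checking fields in priority order."""
    for tag in FIELD_PRIORITY:
        for value in record.get(tag, []):
            for marker, vendor in VENDOR_MARKERS.items():
                if marker in value:
                    return vendor
    return None

def bib_ids_by_vendor(records: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Group bib_ids under the guessed vendor ('unknown' when nothing matches)."""
    grouped: Dict[str, List[str]] = {}
    for bib_id, record in records.items():
        grouped.setdefault(guess_vendor(record) or "unknown", []).append(bib_id)
    return grouped

# Invented sample records: bib_id -> {MARC tag: [field values]}
sample = {
    "1": {"035": ["(OCoLC)12345"]},
    "2": {"015": ["GB1234567 bnb"]},
    "3": {"245": ["A title"]},
}
print(bib_ids_by_vendor(sample))
```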
What - IPR Most vendors happy with permissive
license for ‘non-marc21’ formats
RLUK / BL B.N.B. – PDDL
OCLC – ODC-By Attribution license
No good reason not to re-publish – need the right license!
IPR - What did we learn? Marc21 not fit for purpose here, no
‘authoritative code’ for license
National / international mandate to release open data
No good reason not to re-publish – need the right license!
How - data Several attempts – settled on SQL
extracts based on lists of bib_ids
Use Perl scripting to ‘munge’ the data
You can try this at home ! (work)
How - marc problems Punctuation as a function
Binary encoding
Numbers for field names
Bad characters
Replication of data in fields
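"Punctuation as a function" refers to the separators (" : ", " / " and so on) that Marc21 stores inside the field values themselves. A minimal sketch of cleaning them up, assuming a small set of trailing separator characters; the regular expression and the helper name are illustrative and will not cover every cataloguing case.

```python
import re

# Trailing separator characters assumed for this example.
_TRAILING_PUNCT = re.compile(r"\s*[/:;,=]\s*$")

def strip_isbd(value: str) -> str:
    """Remove one trailing separator and surrounding whitespace from a subfield value."""
    return _TRAILING_PUNCT.sub("", value).strip()

# Subfields $a and $b of the 245 field shown earlier in the slides
subfield_a = "Early medieval history of Kashmir : "
subfield_b = "[with special reference to the Loharas] A.D. 1003-1171 /"

# Re-join the cleaned parts with an explicit separator chosen by the script
title = f"{strip_isbd(subfield_a)} : {strip_isbd(subfield_b)}"
print(title)
```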
How – data vocab RDF allows you to freely mix vocabularies
Emerging consensus on bibliographic description
Our conversion script is CSV customisable
BL and others leading the way
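One way a CSV-customisable conversion could work is a lookup table from Marc21 tag and subfield to predicate URI. The sketch below assumes a three-column CSV layout (tag, subfield, predicate) and invented sample fields; the layout of the project's own mapping file is not described in the slides. Literal values are assumed to contain no quotes or newlines.

```python
import csv
import io

# Hypothetical mapping file: MARC tag, subfield code, predicate URI.
MAPPING_CSV = """tag,subfield,predicate
245,a,http://purl.org/dc/terms/title
260,c,http://purl.org/dc/terms/issued
"""

def load_mapping(text: str) -> dict:
    """Read the CSV into a {(tag, subfield): predicate} lookup."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["tag"], row["subfield"]): row["predicate"] for row in reader}

def convert(subject: str, fields: list, mapping: dict) -> list:
    """Turn (tag, subfield, value) tuples into N-Triples lines, skipping unmapped fields."""
    lines = []
    for tag, subfield, value in fields:
        predicate = mapping.get((tag, subfield))
        if predicate:
            lines.append(f'<{subject}> <{predicate}> "{value}" .')
    return lines

mapping = load_mapping(MAPPING_CSV)
fields = [
    ("245", "a", "Early medieval history of Kashmir"),
    ("260", "c", "1981"),
    ("300", "a", "not mapped, so skipped"),
]
triples = convert("http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346", fields, mapping)
print("\n".join(triples))
```

Swapping vocabularies then means editing the CSV rather than the code.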
How - data publishing Bulk downloads
Queryable ‘endpoints’
Data and code at http://data.lib.cam.ac.uk
How – linking PHP script to match text against LOC
subject headings – enrich with LOC GUID
FAST / VIAF enrichment courtesy of OCLC
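The text-matching step can be sketched as normalising each part of a subdivided heading and looking it up in an index of known labels. The project used a PHP script; Python is used here only for illustration. The one-entry index assumes that sh85078149 identifies the "Lohars" component, as the hasPart triple on the earlier slide suggests.

```python
import re

# Tiny stand-in for an index of LOC labels; the label-to-URI pairing is an assumption.
LOC_INDEX = {"lohars": "http://id.loc.gov/authorities/sh85078149#concept"}

def normalise(label: str) -> str:
    """Lower-case and collapse punctuation and whitespace so similar labels compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()

def match_parts(heading: str, index: dict) -> list:
    """Split a subdivided heading on '--' and return URIs for the parts found in the index."""
    parts = [normalise(part) for part in heading.split("--")]
    return [index[part] for part in parts if part in index]

print(match_parts("Lohars -- History", LOC_INDEX))
```

Exact matching on normalised text is deliberately conservative: unmatched parts are left unlinked rather than guessed.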
Data - What did we learn ? Marc / AACR2 cannot translate well to
semantically rich formats
Need better container / transfer standards (not necessarily RDF)
What else?
RDF friendly database Called RDF stores, triplestores or
Quadstores
Vary in size, scale and scope
None are particularly admin / dev friendly right now …
How - SPARQL Query language for RDF stores
Still a work in progress
Some similarities with SQL
Bibliographic-centric tutorial
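The similarity with SQL shows in the SELECT ... WHERE shape of a query. As a minimal sketch, the code below builds a GET URL for a SPARQL endpoint that asks for the title of the record used earlier. The endpoint address is a placeholder, not the project's real endpoint, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Ask for the title of one entry; the WHERE clause is a triple pattern with a variable.
QUERY = """
SELECT ?title WHERE {
  <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346>
      <http://purl.org/dc/terms/title> ?title .
}
""".strip()

def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a GET URL that carries the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"

# 'http://example.org/sparql' is a placeholder endpoint used only for this example.
url = sparql_get_url("http://example.org/sparql", QUERY)
print(url)
```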
How – storage and access ARC2 - Lightweight MySQL / PHP solution
Good fit for a six month project
Great for around 3-500 k records
Not so good for 1 million plus
20 million + ?
Supporting tech - What did we learn? Triplestores are cumbersome
SPARQL alone does not do the trick
High entry barrier to RDF is partly a result of these accompanying technologies
What does this mean for Ex Libris Building whole systems around RDF is not really a
good idea
Need the flexibility to do this by dropping Marc21
GUIDS for records (or allow us to have our own) – resolvable ?
Ensure any RDF publishing capacity is flexible (as ours is)
RDF capability for Primo ?
Always add value to RDF … Standalone RDF is just fiddly Dublin Core, so …
Create HTTP URIs for entities
Link it to something useful (LOC, FAST, VIAF)
Endpoint (SPARQL?)
Don’t limit to the bibliographic
Beyond bibliographic
Bibliographic
Holdings
FAST subject headings
Libraries
Transactions
Special collections
Archives
Creator / entity
Place of publication
LCSH subject headings
Course lists
Language
Librarians
Do what Tim said …
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things
http://www.w3.org/DesignIssues/LinkedData.html
Questions? @edchamberlain
http://data.lib.cam.ac.uk
http://cul-comet.blogspot.com/