Interoperability Fundamentals: OAI-PMH and OAI-ORE
SUETr Interoperability Event, 9th December 2008, London School of Economics Library
Dr Robert Sanderson, Dept. of Computer Science, University of Liverpool, [email protected]
http://www.openarchives.org/ore/
http://foresite.cheshire3.org/
SUETr Interoperability Event 9th December Slide 1
Overview
OAI: Protocol for Metadata Harvesting
  Introduction
  Technical Details
  Example
OAI: Object Reuse and Exchange
  Introduction
  ORE for Repositories (Motivation)
  RDF and Atom
  Support
OAI: Protocol for Metadata Harvesting
Does pretty much what it says on the tin:
An XML over HTTP protocol ... that allows a client to harvest ... all of the metadata records in a repository.
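Architecture
[Diagram: the Service Provider sends OAI-PMH requests (ListIdentifiers, GetRecord) to the Data Provider, which fetches the records from its local store and returns them in the OAI-PMH response (Records); the Service Provider then stores the records locally.]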
Distinction between Data Provider (repository) and Service Provider (someone who does something with the data)
Most service providers aggregate more than one repository, eg for Search, Analysis, Summarization, Caching, Proxies, ...
Or could be used for inter-repository transfer/update, where the Service Provider is also a Data Provider.
Distinction between Centralized and Distributed architecture:
Centralized: Harvest everything into one place and then search (PMH)
Distributed: Leave data where it is and search remotely (Z39.50/SRU)
But the two can be combined: distributed search over both centralized databases that provide an SRU interface and single distributed databases
Technical Details
A single URL endpoint handles the whole protocol, eg: http://www.cheshire3.org/services/oai?
The operation (verb) is given as a parameter:
Identify: Tell me about yourself
ListMetadataFormats: Tell me which formats you support
ListSets: What sets of records do you support?
ListIdentifiers: Retrieve headers for records
ListRecords: Retrieve full records
GetRecord: Retrieve a single known record
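Each verb accepts only certain arguments, and a repository answers a malformed request with an error rather than data. A minimal Python sketch of that check, assuming the argument rules of OAI-PMH 2.0 (`resumptionToken` is an exclusive argument used to continue an earlier list request). The `VERB_ARGS` table and `check_request` helper are hypothetical; their messages borrow the protocol's `badVerb` and `badArgument` error codes.

```python
# Required and optional arguments for each OAI-PMH 2.0 verb.
VERB_ARGS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), {"resumptionToken"}),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set", "resumptionToken"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set", "resumptionToken"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}


def check_request(verb, args):
    """Return a list of problems with a request; an empty list means it looks valid."""
    if verb not in VERB_ARGS:
        return [f"badVerb: {verb}"]
    required, optional = VERB_ARGS[verb]
    names = set(args)
    if "resumptionToken" in names:
        # resumptionToken is exclusive: no other argument may accompany it
        if "resumptionToken" not in optional:
            return [f"badArgument: resumptionToken not allowed for {verb}"]
        extra = names - {"resumptionToken"}
        return [f"badArgument: {sorted(extra)} not allowed with resumptionToken"] if extra else []
    problems = [f"badArgument: missing {name}" for name in sorted(required - names)]
    problems += [f"badArgument: unexpected {name}" for name in sorted(names - required - optional)]
    return problems
```

For example, `check_request("GetRecords", {})` reports a bad verb, because the single-record verb is `GetRecord`.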
List operations can be restricted by the timestamp of the last update to each record:
...?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2008-12-01
Hence a harvester can ask only for records changed since it last harvested
Compare with RSS/Atom, where not even the order of entries is guaranteed!
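Incremental harvesting then comes down to remembering when the last harvest ran and passing that date as `from`. A minimal sketch using only the Python standard library; `harvest_url` and its `last_harvest` parameter are hypothetical names, and `http://repo.example.org/oai` is the placeholder endpoint used elsewhere in these slides.

```python
from urllib.parse import urlencode


def harvest_url(base_url, verb, last_harvest=None, **params):
    """Build an OAI-PMH request URL; `last_harvest` becomes the `from` argument."""
    args = {"verb": verb, **params}
    if last_harvest is not None:
        # 'from' is a Python keyword, so it is passed in under another name
        args["from"] = last_harvest
    return base_url + "?" + urlencode(args)


print(harvest_url("http://repo.example.org/oai", "ListIdentifiers",
                  last_harvest="2008-12-01", metadataPrefix="oai_dc"))
```

This reproduces the `ListIdentifiers` request above. Large result sets are returned in chunks with a `resumptionToken`, which this sketch does not follow.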
Support?
LOTS of libraries, as the protocol is easy to implement.
Simple Google stats for +oai-pmh +download +(language):
C#: 2,450
Perl: 4,520
C++: 5,440
Ruby: 19,800
Python: 21,700
Java: 28,000
PHP: 47,300
Okay, not all of these are implementations, but you get the picture!
Active mailing list (still!)
Repository Explorer / Conformance Tester
Lots of service providers looking to suck up data (eg OAIster)
Example Interaction
A harvester wants to fetch all of the metadata records added or changed in a repository since it last harvested, in the simple Dublin Core format.
Verb to use: ListRecords
http://repo.example.org/oai?verb=ListRecords&from=2008-11-01&metadataPrefix=oai_dc
Response:
<OAI-PMH>
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="ListRecords" from="2008-11-01" metadataPrefix="oai_dc">http://repo.example.org/oai?</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:arXiv.org:hep-th/9901001</identifier>
        <datestamp>2008-12-02</datestamp>
      </header>
      <metadata>
        <!-- Record Is Here -->
      </metadata>
    </record>
    <record> ... </record>
    ...
  </ListRecords>
</OAI-PMH>
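On the harvester side, a response like this is plain XML, so a standard parser is enough to pull out each record's identifier and datestamp. A minimal sketch with Python's `xml.etree.ElementTree`; it strips namespace prefixes so that it accepts both the un-namespaced snippet shown here and real responses, which use the OAI-PMH 2.0 namespace. The helper names are hypothetical, and `<metadata>` payloads and deleted-record status are ignored.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix, so namespaced and plain tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def record_headers(xml_text):
    """Return (identifier, datestamp) pairs for every <header> in a ListRecords response."""
    root = ET.fromstring(xml_text)  # the default parser discards XML comments
    headers = []
    for elem in root.iter():
        if local_name(elem.tag) == "header":
            fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
            headers.append((fields.get("identifier"), fields.get("datestamp")))
    return headers
```

The newest datestamp seen becomes a candidate `from` value for the next harvest.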
I Don't Want All This @#*&)#%*&!
Problem: In order to download the records you want, you have to download everything and then filter it. This just wastes everyone's time.
Solution (?):
There are server-defined sets of records (not nested).
Each record knows which sets it is a member of.
Can fetch only those records which are part of a named set.
How are the sets defined? By the server/repository admin...
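The set mechanism is simple to model: each record carries the names (setSpecs) of the sets it belongs to, and a request naming a set returns only its members. A minimal server-side sketch; the `Record` class, the `select_set` function and the set names are hypothetical. On the wire, the client adds a `set` argument, eg `...?verb=ListIdentifiers&metadataPrefix=oai_dc&set=physics`, where `physics` is a made-up setSpec.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str
    # setSpecs of the server-defined sets this record belongs to
    sets: list = field(default_factory=list)


def select_set(records, set_spec=None):
    """Return every identifier, or only those of records that are members of set_spec."""
    return [r.identifier for r in records if set_spec is None or set_spec in r.sets]


records = [Record("oai:x:1", ["physics"]), Record("oai:x:2", ["maths"]),
           Record("oai:x:3", ["physics", "maths"])]
print(select_set(records, "physics"))
```

Note that the filter only works along lines the repository admin has already drawn; anything finer-grained is a search problem.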
Many people have tried to add search functionality to OAI-PMH...
This is Wrong Wrong Wrong and shows a fundamental misunderstanding of the role of OAI-PMH in the overall information landscape!
For search, there's OpenSearch and SRU. (Another talk!)
What is ORE?
A method for making complex digital objects available over the web...
In order for the object and its component parts to be easily and seamlessly reused as parts of other objects and in other contexts...
And exchanged between organizations, infrastructures and services.
A set of projects funded by the Andrew W. Mellon Foundation, the Coalition for Networked Information, Microsoft, the National Science Foundation, and the Joint Information Systems Committee, under the Open Archives Initiative.
Who is Responsible?
Principal Investigators:
Carl Lagoze (Cornell University)
Herbert Van de Sompel (Los Alamos National Labs)
Editors:
Pete Johnston (Eduserv Foundation)
Michael Nelson (Old Dominion University)
Rob Sanderson (University of Liverpool)
Simeon Warner (Cornell University)
Technical and Advisory Boards:
Including: Liz Lyon, Peter Murray-Rust, Les Carr, Richard Jones, Julie Allinson, Andy Powell, Lorcan Dempsey, John Erickson, MacKenzie Smith, Tony Hammond, Savas Parastatidis, Robert Tansley, Jane Hunter, Tim Cole, Leigh Dodds, Tim DiLauro, Jeff Young, ...
Main Idea of ORE
Create a way to describe an Aggregation of Resources...
... and the relationships between them...
... without changing the way we do things...
... without changing the resources themselves...
... in a manner consistent with the web architecture
Add boundary information over top of the connected resources on the web
Publish this information using existing technologies... which we call a Resource Map
This concept is nothing new...
The Sky: An Aggregation of Stars
The Web
[Figure: the Web ... with Boundary Information (Aggregations, "Aggr", each described by a Resource Map, "ReM") ... and Additional Relationships]
ORE for Repositories
Key:
1 URI
2 Formats
3 Title
4 Authors
5 Creation Dates
6 Similar Objects
7 Versions
8 Links out
9 Citations in/out
a Abstract
b Journal
RDF
The ORE Data Model is defined as a Graph, and expressed in RDF.
We express these relationships as triples:
URI-ReM ore:describes URI-Aggr
URI-Aggr ore:aggregates URI-1 [...]
URI-Aggr ore:aggregates URI-6
URI-1 dcterms:references URI-2
URI-5 rdfs:seeAlso URI-X
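Since the model is just a graph, a set of (subject, predicate, object) tuples is enough to experiment with. A minimal sketch in plain Python, without an RDF library, using the placeholder URIs and prefixed predicate names from the triples above (the elided `[...]` members are left out). The `objects` and `aggregated_by_rem` helpers are hypothetical; real code would use full URIs and an RDF toolkit.

```python
# Prefixed names and URI-* placeholders are kept as plain strings for brevity.
TRIPLES = {
    ("URI-ReM", "ore:describes", "URI-Aggr"),
    ("URI-Aggr", "ore:aggregates", "URI-1"),
    ("URI-Aggr", "ore:aggregates", "URI-6"),
    ("URI-1", "dcterms:references", "URI-2"),
    ("URI-5", "rdfs:seeAlso", "URI-X"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def aggregated_by_rem(graph, rem):
    """Follow ore:describes from a Resource Map to its Aggregation, then ore:aggregates."""
    return sorted(r for aggr in objects(graph, rem, "ore:describes")
                  for r in objects(graph, aggr, "ore:aggregates"))


print(aggregated_by_rem(TRIPLES, "URI-ReM"))
```

Starting from the Resource Map alone, a client can find the Aggregation it describes and every resource inside its boundary.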
Where's the Data?
Triples can also have literal strings, numbers, dates etc:
URI-Aggr dcterms:modified “2008-12-09T10:30:00Z”
URI-Aggr dc:title “Rob's New Aggregation”
In our examples, the green aggregated resources are the different formats for the same work. That makes the Aggregation a resource that somehow represents the work in the abstract, and the Resource Map a description of that.
URI-ReM ore:describes URI-Aggr
URI-ReM dcterms:modified “2008-12-09T10:30:00Z”
URI-Aggr ore:aggregates URI-ps, URI-pdf, URI-html
URI-Aggr dc:title “Parametrization of ...”
URI-Aggr dcterms:modified “2006-01-18T06:30:00Z”
URI-Aggr ore:similarTo info:doi/10.1142/S02177...
URI-Aggr dcterms:creator URI-Hui
URI-Hui foaf:name “Hui Li”
URI-ps dc:format “application/postscript”
...
Serializations
RDF has MANY serializations, including simple triple formats, XML formats, and RDFa (a way to embed RDF in XHTML).
Recommended are RDF/XML and RDFa.
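The simplest of the triple formats, N-Triples, writes one triple per line: each URI in angle brackets, each literal in double quotes, and a terminating " .". A minimal serializer sketch; the `Literal` marker class and the `example.org` URIs are hypothetical, and typed literals (eg an `xsd:dateTime` for `dcterms:modified`) are deliberately left out. The ORE and Dublin Core namespace URIs are the published ones.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DC = "http://purl.org/dc/elements/1.1/"


class Literal(str):
    """Marks a value as an RDF literal; plain str values are treated as URIs."""


def term(value):
    if isinstance(value, Literal):
        # escape backslashes first, then quotes and newlines
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(triples):
    """One line per triple: subject predicate object ."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


rem, aggr = "http://example.org/rem", "http://example.org/aggr"
print(to_ntriples([(rem, ORE + "describes", aggr),
                   (aggr, DC + "title", Literal("Rob's New Aggregation"))]), end="")
```

The XML-based formats and RDFa carry the same triples; only the syntax differs.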
Also recommended is an Atom serialization:
Each Aggregation is an Atom <entry>, and the Atom elements are mapped to the predicate (middle) part of the triple, eg author → dcterms:creator
Aggregated Resources are referenced in <link> elements.
Anything that can't be expressed natively in Atom goes into an <ore:triples> extension block.
This allows Aggregations to sit in regular Atom feeds for discovery,
and plays nicely with other Atom-based protocols like OpenSearch or other GData-like systems
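A rough sketch of building such an entry with Python's `xml.etree.ElementTree`: `atom:author` standing in for `dcterms:creator`, and one `<link>` per Aggregated Resource carrying the `ore:aggregates` relation URI. This is not a validated ORE Atom document: putting the Aggregation URI in `atom:id` is an assumption, the `<ore:triples>` block is omitted, and the ORE Atom specification and validator remain authoritative. The `atom_entry` and `q` helpers and the `example.org` URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def q(name):
    """Qualified ElementTree tag name in the Atom namespace."""
    return f"{{{ATOM}}}{name}"


def atom_entry(aggr_uri, title, author, updated, aggregated):
    entry = ET.Element(q("entry"))
    ET.SubElement(entry, q("id")).text = aggr_uri  # assumption: Aggregation URI as atom:id
    ET.SubElement(entry, q("title")).text = title
    ET.SubElement(entry, q("updated")).text = updated
    # atom:author plays the role of dcterms:creator
    ET.SubElement(ET.SubElement(entry, q("author")), q("name")).text = author
    for uri in aggregated:
        # one atom:link per Aggregated Resource, typed with the ore:aggregates relation
        ET.SubElement(entry, q("link"), rel=ORE_AGGREGATES, href=uri)
    return entry


entry = atom_entry("http://example.org/aggr", "Rob's New Aggregation", "Rob Sanderson",
                   "2008-12-09T10:30:00Z",
                   ["http://example.org/paper.pdf", "http://example.org/paper.ps"])
print(ET.tostring(entry, encoding="unicode"))
```

Because the result is an ordinary Atom entry, it can be dropped into any existing feed.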
Support
Not as much as OAI-PMH ... yet! Version 1.0 was only released in October 2008.
Libraries: Foresite Toolkit
http://foresite-toolkit.googlecode.com/
Java (ORE 0.9, Richard Jones) and Python (ORE 1.0, me)
Idea: Build an object model on top of the RDF graph:
a = Aggregation()
a.title = "New Aggregation"
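The object-model idea can be illustrated in a few lines: attribute assignments are turned into triples about the Aggregation's URI. This is a hypothetical sketch, not Foresite's actual API; the attribute-to-predicate table is an illustrative choice, and unlike the one-line call above, the constructor here takes the Aggregation's URI explicitly, since every triple needs a subject.

```python
class Aggregation:
    """Hypothetical object model: every attribute assignment is stored
    as an RDF triple about the Aggregation's URI."""

    # which RDF predicate each Python attribute stands for (illustrative choice)
    PREDICATES = {"title": "dc:title", "creator": "dcterms:creator",
                  "modified": "dcterms:modified"}

    def __init__(self, uri):
        # bypass our own __setattr__ for the two real instance attributes
        object.__setattr__(self, "uri", uri)
        object.__setattr__(self, "triples", set())

    def __setattr__(self, name, value):
        predicate = self.PREDICATES.get(name)
        if predicate is None:
            raise AttributeError(f"no RDF predicate mapped for {name!r}")
        # drop any earlier value for this predicate, then record the new triple
        old = {t for t in self.triples if t[:2] == (self.uri, predicate)}
        self.triples.difference_update(old)
        self.triples.add((self.uri, predicate, value))

    def __getattr__(self, name):
        # only called when normal lookup fails, i.e. for mapped RDF properties
        predicate = self.PREDICATES.get(name)
        for s, p, o in self.__dict__.get("triples", ()):
            if predicate is not None and s == self.__dict__["uri"] and p == predicate:
                return o
        raise AttributeError(name)

    def aggregate(self, resource_uri):
        """Add an ore:aggregates triple for another resource."""
        self.triples.add((self.uri, "ore:aggregates", resource_uri))


a = Aggregation("http://example.org/aggr")
a.title = "New Aggregation"
a.aggregate("http://example.org/paper.pdf")
print(sorted(a.triples))
```

The graph stays the single source of truth, so it can be handed to any serializer unchanged.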
Validator: Atom/ORE Validator
http://www.openarchives.org/ore/1.0/atom-validator
From Los Alamos National Labs, plus other transforms
Generic RDF libraries and converters: available in most languages... talk to me about writing a Foresite library!
Repository Operations
Create: Send ORE in Atom via SWORD from client
Update: Send ORE in Atom via SWORD from client to existing URI
Search: Return ORE via OpenSearch/SRU
Harvest: Return ORE via OAI-PMH
Archive: Archive ORE Resource Map plus Aggregated Resources
Export: Export ORE (for others to create objects from)
Import: Create from ORE
Real Life Example
Wrapper around Flickr API to export Photos/Photosets (Rob)
ORE Importer into Omeka Digital Library Platform (Sean Hannan)
Ran the importer against the Flickr wrapper to import photos out of Flickr, along with metadata, different sizes, etc. Seamless Interoperability!
Other examples: DSpace, Fedora, MyExperiment, JSTOR, WordPress, ...
Thank You :)
Questions?
URLs:
Me: [email protected]
PMH: http://www.openarchives.org/pmh/
ORE: http://www.openarchives.org/ore/
Foresite: http://foresite-toolkit.googlecode.com/
This: http://www.csc.liv.ac.uk/~azaroth/papers/suetr-ore.pdf
(Bonus points for expressing the above as an ORE Aggregation!)