Interoperability Fundamentals: OAI-PMH and OAI-ORE

SUETr Interoperability Event
9th December 2008
London School of Economics Library

Dr Robert Sanderson
Dept. of Computer Science
University of Liverpool
[email protected]

http://www.openarchives.org/ore/
http://foresite.cheshire3.org/

SUETr Interoperability Event 9th December Slide 1


Overview

OAI: Protocol for Metadata Harvesting
  Introduction
  Technical Details
  Example

OAI: Object Reuse and Exchange
  Introduction
  ORE for Repositories (Motivation)
  RDF and Atom
  Support


OAI: Protocol for Metadata Harvesting

Does pretty much what it says on the tin:

An XML over HTTP protocol ... that allows a client to harvest ... all of the metadata records in a repository.


Architecture

[Diagram: Service Provider and Data Provider. Labels: OAI-PMH Request (ListIdentifiers, GetRecord); OAI-PMH Response (Records); Local Fetch Record; Local Store Record.]

Distinction between Data Provider (repository) and Service Provider (someone who does something with the data)

Most service providers are aggregators of more than one repository Eg: Search, Analysis, Summarization, Caching, Proxies, ...

Or could be used for inter-repository transfer/update, where the Service Provider is also a Data Provider.

Distinction between Centralized and Distributed architecture:
Centralized: Harvest everything into one place and then search (PMH)
Distributed: Leave data where it is and search remotely (Z39.50/SRU)

But the two can be combined: a distributed search can run over a centralized (harvested) database that provides an SRU interface alongside single distributed databases.


Technical Details

Single URL end point that handles protocol eg: http://www.cheshire3.org/services/oai?

Operation (verb) as a parameter:
Identify: Tell me about yourself
ListMetadataFormats: Tell me which formats you support
ListSets: What sets of records do you support
ListIdentifiers: Retrieve headers for records
ListRecords: Retrieve full records
GetRecord: Retrieve single known record

List operations by timestamp of update to the record:

...?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2008-12-01

Hence can ask only for changed records since you last harvested

Compare to RSS/Atom (even order isn't guaranteed!)
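As a rough illustration of the verb-as-parameter idea, here is a minimal Python sketch that assembles a request URL against the single end point. The function name build_request and its keyword arguments are made up for this example; only the verb, metadataPrefix and from parameters come from the slide.

```python
from urllib.parse import urlencode

def build_request(base_url, verb, metadata_prefix=None, from_date=None):
    """Return an OAI-PMH request URL (hypothetical helper, not a library API)."""
    query = {"verb": verb}
    # Only include optional arguments that were actually supplied.
    if metadata_prefix is not None:
        query["metadataPrefix"] = metadata_prefix
    if from_date is not None:
        # "from" is a Python keyword, hence the from_date argument name.
        query["from"] = from_date
    return base_url.rstrip("?") + "?" + urlencode(query)

# Ask only for headers of records changed since the last harvest.
url = build_request(
    "http://www.cheshire3.org/services/oai",
    "ListIdentifiers",
    metadata_prefix="oai_dc",
    from_date="2008-12-01",
)
print(url)
```

A harvester would typically remember the date of its last successful run and pass it as from_date next time.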


Support?

LOTS of libraries, as the protocol is easy to implement.
Simple Google stats for +oai-pmh +download +(language):

c#: 2,450
perl: 4,520
c++: 5,440
ruby: 19,800
python: 21,700
java: 28,000
php: 47,300

Okay not all are implementations, but you get the picture!

Active mailing list (still!)

Repository Explorer / Conformance Tester

Lots of service providers looking to suck up data (eg OAIster)


Example Interaction

Harvester wants to fetch all of the metadata records in a repository since it last harvested, in the simple Dublin Core format.
Verb to use: ListRecords
http://repo.example.org/oai?verb=ListRecords&from=2008-11-01&metadataPrefix=oai_dc

Response:


<OAI-PMH>
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="ListRecords" from="2008-11-01" metadataPrefix="oai_dc">http://repo.example.org/oai?</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:arXiv.org:hep-th/9901001</identifier>
        <datestamp>2008-12-02</datestamp>
      </header>
      <metadata>
        <!-- Record Is Here -->
      </metadata>
    </record>
    <record> ... </record>
    ...
  </ListRecords>
</OAI-PMH>
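To show what a harvester might do with such a response, here is a small Python sketch that pulls the identifier and datestamp out of each record. It parses the simplified, namespace-free XML from the slide; real responses declare the OAI-PMH XML namespace, so the element lookups would need to be namespace-qualified. The function name list_headers is an assumption for this example.

```python
import xml.etree.ElementTree as ET

# Simplified response as on the slide (no XML namespaces, one record).
RESPONSE = """<OAI-PMH>
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="ListRecords" from="2008-11-01" metadataPrefix="oai_dc">http://repo.example.org/oai?</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:arXiv.org:hep-th/9901001</identifier>
        <datestamp>2008-12-02</datestamp>
      </header>
      <metadata><!-- Record Is Here --></metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def list_headers(xml_text):
    """Return (identifier, datestamp) pairs for every record in the response."""
    root = ET.fromstring(xml_text)
    return [
        (rec.findtext("header/identifier"), rec.findtext("header/datestamp"))
        for rec in root.iter("record")
    ]

for identifier, datestamp in list_headers(RESPONSE):
    print(identifier, datestamp)
```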

I Don't Want All This @#*&)#%*&!

Problem: In order to download the records you want, you have to download everything and then filter it. This just wastes everyone's time.

Solution (?):
There are server defined sets of records (not nested).
Each record knows which sets it is a member of.
Can fetch only those records which are part of a named set.

How are the sets defined? By the server/repository admin...
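A sketch of selective harvesting by set, in Python. The set name "physics" and the second setSpec in the sample header are invented for illustration; actual set names are whatever the repository admin defined. The header XML is again shown without namespaces for brevity.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical set name; real values come from the repository's ListSets response.
SET_SPEC = "physics"

# Selective harvest: only records that are part of the named set.
url = "http://repo.example.org/oai?" + urlencode(
    {"verb": "ListRecords", "metadataPrefix": "oai_dc", "set": SET_SPEC}
)

# Each record header carries the sets the record is a member of.
HEADER = """<header>
  <identifier>oai:arXiv.org:hep-th/9901001</identifier>
  <datestamp>2008-12-02</datestamp>
  <setSpec>physics</setSpec>
  <setSpec>math</setSpec>
</header>"""

def set_memberships(header_xml):
    """Return the setSpec values listed in one record header."""
    return [el.text for el in ET.fromstring(header_xml).findall("setSpec")]

print(url)
print(set_memberships(HEADER))
```

Note that the client can only pick among the sets the server offers; it cannot define its own selection criteria.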

Many people have tried to add search functionality to OAI-PMH...

This is Wrong Wrong Wrong and shows a fundamental misunderstanding of the role of OAI-PMH in the overall information landscape!
For search, there's OpenSearch and SRU. (Another talk!)


What is ORE?

A method for making complex digital objects available over the web...

In order for the object and its component parts to be easily and seamlessly reused as parts of other objects and in other contexts...

And exchanged between organizations, infrastructures and services.

A set of projects funded by the Andrew W. Mellon Foundation, the Coalition for Networked Information, Microsoft, the National Science Foundation, and the Joint Information Systems Committee, under the Open Archives Initiative.


Who is Responsible?

Principal Investigators:
Carl Lagoze (Cornell University)
Herbert Van de Sompel (Los Alamos National Labs)

Editors:
Pete Johnston (Eduserv Foundation)
Michael Nelson (Old Dominion University)
Rob Sanderson (University of Liverpool)
Simeon Warner (Cornell University)

Technical and Advisory Boards:
Including: Liz Lyon, Peter Murray-Rust, Les Carr, Richard Jones, Julie Allinson, Andy Powell, Lorcan Dempsey, John Erickson, MacKenzie Smith, Tony Hammond, Savas Parastatidis, Robert Tansley, Jane Hunter, Tim Cole, Leigh Dodds, Tim DiLauro, Jeff Young, ...


Main Idea of ORE

Create a way to describe an Aggregation of Resources... and the relationships between them... without changing the way we do things... without changing the resources themselves... in a manner consistent with the web architecture

Add boundary information over top of the connected resources on the web

Publish this information using existing technologies... which we call a Resource Map

This concept is nothing new...


The Sky: An Aggregation of Stars

The Web ... with Boundary Information ... and Additional Relationships

[Figure: resources on the web grouped into Aggregations (Aggr), each described by a Resource Map (ReM).]

ORE for Repositories


Key:

1 URI
2 Formats
3 Title
4 Authors
5 Creation Dates
6 Similar Objects
7 Versions
8 Links out
9 Citations in/out

a Abstract
b Journal


RDF

The ORE Data Model is defined as a Graph, and expressed in RDF.

We express these relationships as triples:

[Figure: resources 1 to 6 and X, with Aggregation (Aggr) and Resource Map (ReM) nodes.]

Subject    Predicate            Object
URI-ReM    ore:describes        URI-Aggr
URI-Aggr   ore:aggregates       URI-1
[...]
URI-Aggr   ore:aggregates       URI-6
URI-1      dcterms:references   URI-2
URI-5      rdfs:seeAlso         URI-X
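One way to picture the graph is as a plain list of (subject, predicate, object) tuples. The Python sketch below is only an illustration of the data model, not an RDF library; the URI-... names are the placeholders from the slide, the helper name objects is invented, and seeAlso is written with the rdfs: prefix because that property belongs to the RDF Schema vocabulary.

```python
# Each triple is (subject, predicate, object), using the slide's placeholder names.
TRIPLES = [
    ("URI-ReM", "ore:describes", "URI-Aggr"),
    ("URI-Aggr", "ore:aggregates", "URI-1"),
    ("URI-Aggr", "ore:aggregates", "URI-6"),
    ("URI-1", "dcterms:references", "URI-2"),
    ("URI-5", "rdfs:seeAlso", "URI-X"),
]

def objects(triples, subject, predicate):
    """Return every object linked from subject by predicate, in list order."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Which Aggregation does the Resource Map describe, and what does it aggregate?
aggregation = objects(TRIPLES, "URI-ReM", "ore:describes")[0]
print(aggregation, objects(TRIPLES, aggregation, "ore:aggregates"))
```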

Where's the Data?

Triples can also have literal strings, numbers, dates etc:

URI-Aggr dcterms:modified “2008-12-09T10:30:00Z”
URI-Aggr dc:title “Rob's New Aggregation”

In our examples, the green aggregated resources are the different formats for the same work. That makes the Aggregation a resource that somehow represents the work in the abstract, and the Resource Map a description of that.

URI-ReM ore:describes URI-Aggr
URI-ReM dcterms:modified “2008-12-09T10:30:00Z”
URI-Aggr ore:aggregates URI-ps, URI-pdf, URI-html
URI-Aggr dc:title “Parametrization of ...”
URI-Aggr dcterms:modified “2006-01-18T06:30:00Z”
URI-Aggr ore:similarTo info:doi/10.1142/S02177...
URI-Aggr dcterms:creator URI-Hui
URI-Hui foaf:name “Hui Li”
URI-ps dc:format “application/postscript”
...
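To make the difference between resource and literal objects concrete, here is a hedged Python sketch that writes two such triples as N-Triples-style lines, one of the simple triple formats. The example.org URIs are placeholders standing in for URI-ReM and URI-Aggr, and the helper names expand and ntriple are invented for this example; the prefix expansions are the usual ones for these vocabularies.

```python
# Namespace URIs for the prefixes used on the slide.
PREFIXES = {
    "ore": "http://www.openarchives.org/ore/terms/",
    "dcterms": "http://purl.org/dc/terms/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def expand(curie):
    """Turn a prefixed name such as dc:title into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local

def ntriple(subject, predicate, obj, literal=False):
    """Format one triple; literals are quoted, resources go in angle brackets."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = '"' + escaped + '"'
    else:
        obj_term = "<" + obj + ">"
    return "<%s> <%s> %s ." % (subject, expand(predicate), obj_term)

# Placeholder URIs (assumptions), standing in for URI-ReM and URI-Aggr.
REM = "http://example.org/rem"
AGGR = "http://example.org/aggr"

lines = [
    ntriple(REM, "ore:describes", AGGR),
    ntriple(AGGR, "dc:title", "Rob's New Aggregation", literal=True),
]
print("\n".join(lines))
```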


Serializations

RDF has MANY serializations, including simple triple formats, XML formats, and RDFa (a way to embed RDF in XHTML).
Recommended are RDF/XML and RDFa.

Also recommended is an Atom serialization:

Each Aggregation is an Atom <entry>, and the Atom elements are mapped to the predicate (middle) part of the triple, eg author → dcterms:creator
Aggregated Resources are referenced in <link> elements.
Anything that can't be expressed natively in Atom goes into an <ore:triples> extension block.

This allows aggregations to sit in regular Atom feeds for discovery
And plays nicely with other Atom based protocols like OpenSearch or other GData like systems
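The mapping can be sketched in Python with the standard library's ElementTree. This is deliberately incomplete and not a valid ORE Atom document: it only shows a title, an author, and one <link> per Aggregated Resource. The rel value used here (the ore:aggregates term URI) is my assumption about how the links are typed, and the function name and example URIs are invented; check the ORE Atom specification for the required elements.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Assumption: aggregated resources are linked with the ore:aggregates term URI as rel.
AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"
ET.register_namespace("atom", ATOM)

def atom_entry(title, author, aggregated):
    """Build a partial Atom entry for an Aggregation (illustrative only)."""
    entry = ET.Element("{%s}entry" % ATOM)
    ET.SubElement(entry, "{%s}title" % ATOM).text = title
    # atom:author/atom:name plays the role of dcterms:creator in the triples.
    author_el = ET.SubElement(entry, "{%s}author" % ATOM)
    ET.SubElement(author_el, "{%s}name" % ATOM).text = author
    # One link element per Aggregated Resource.
    for uri in aggregated:
        ET.SubElement(entry, "{%s}link" % ATOM, {"rel": AGGREGATES, "href": uri})
    return entry

entry = atom_entry(
    "Rob's New Aggregation",
    "Hui Li",
    ["http://example.org/paper.ps", "http://example.org/paper.pdf"],
)
print(ET.tostring(entry, encoding="unicode"))
```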


Support

Not as much as OAI-PMH ... yet! Version 1.0 only released in October.

Libraries: Foresite Toolkit
http://foresite-toolkit.googlecode.com/

Java (ORE 0.9, Richard Jones) and Python (ORE 1.0, me)
Idea: Build an object model on top of RDF graph:

a = Aggregation()
a.title = “New Aggregation”
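The object-model idea can be sketched in a few lines of Python. This is a hypothetical stand-in written for illustration, not the Foresite API: the class below, its PREDICATES table and the add_resource method are all invented, and unlike the slide's snippet it takes the Aggregation's URI as a constructor argument.

```python
class Aggregation:
    """Toy object model: setting an attribute records a triple (not Foresite)."""

    # Attribute name -> predicate recorded when that attribute is set.
    PREDICATES = {
        "title": "dc:title",
        "creator": "dcterms:creator",
        "modified": "dcterms:modified",
    }

    def __init__(self, uri):
        # Bypass __setattr__ so internal state is stored as ordinary attributes.
        object.__setattr__(self, "uri", uri)
        object.__setattr__(self, "triples", [])

    def __setattr__(self, name, value):
        if name not in self.PREDICATES:
            raise AttributeError("no predicate mapped for %r" % name)
        self.triples.append((self.uri, self.PREDICATES[name], value))

    def add_resource(self, resource_uri):
        """Record that this Aggregation aggregates the given resource."""
        self.triples.append((self.uri, "ore:aggregates", resource_uri))

a = Aggregation("URI-Aggr")
a.title = "New Aggregation"
a.add_resource("URI-pdf")
print(a.triples)
```

The point of such a layer is that client code works with familiar attributes while the underlying graph stays the source of truth.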

Validator: Atom/ORE Validator
http://www.openarchives.org/ore/1.0/atom-validator

From Los Alamos National Labs, plus other transforms

Generic RDF Libraries, Converters:
Available in most languages... talk to me about writing a foresite library!


Repository Operations

Create: Send ORE in Atom via SWORD from client
Update: Send ORE in Atom via SWORD from client to existing URI
Search: Return ORE via OpenSearch/SRU
Harvest: Return ORE via OAI-PMH
Archive: Archive ORE Resource Map plus Aggregated Resources
Export: Export ORE (to be created by other)
Import: Create from ORE

Real Life Example

Wrapper around Flickr API to export Photos/Photosets (Rob)
ORE Importer into Omeka Digital Library Platform (Sean Hannan)
Ran importer against Flickr wrapper to import photos out of Flickr, along with metadata, different sizes, etc.
Seamless Interoperability!

Other examples: DSpace, Fedora, MyExperiment, JSTOR, WordPress, ...


Thank You :)

Questions?

URLs:
Me: [email protected]
PMH: http://www.openarchives.org/pmh/
ORE: http://www.openarchives.org/ore/
Foresite: http://foresite-toolkit.googlecode.com/
This: http://www.csc.liv.ac.uk/~azaroth/papers/suetr-ore.pdf

(Bonus points for expressing the above as an ORE Aggregation!)
