Open Archives Initiative – Protocol for Metadata Harvesting
Iztok Kavkler, University of Ljubljana
Some slides by Stefaan Ternier, KUL; Bram Vandenputte, KUL; Joris Klerkx, KUL



2

What is OAI?

Harvesting standard, documented at http://www.openarchives.org/OAI/openarchivesprotocol.html

Six service verbs
– Identify
– ListMetadataFormats
– GetRecord
– ListRecords
– ListIdentifiers
– ListSets

Allows multiple metadata formats
– DC (Dublin Core) format is mandatory

3

How OAI works

[Diagram: the HARVESTER (OAI service provider) sends an HTTP request carrying an OAI verb to the REPOSITORY (OAI metadata provider); the repository answers with an HTTP response containing valid XML.]

OAI “VERBS”
– Identify
– ListMetadataFormats
– GetRecord
– ListIdentifiers
– ListRecords
– ListSets
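The request half of this exchange is an ordinary HTTP GET whose query string carries the verb and its arguments. A minimal sketch of how a harvester might assemble such a URL is given here; the class and method names are hypothetical and not part of any OAI library, and the base URL is the demo address used later in these slides.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of OAICat): builds an OAI-PMH request URL.
final class OaiRequestUrl {

    // Appends the verb first, then every argument in insertion order.
    static String build(String baseUrl, String verb, Map<String, String> args) {
        StringBuilder url = new StringBuilder(baseUrl).append("?verb=").append(encode(verb));
        for (Map.Entry<String, String> e : args.entrySet()) {
            url.append('&').append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return url.toString();
    }

    // Percent-encodes one query component.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }

    public static void main(String[] unused) {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("metadataPrefix", "oai_lom");
        args.put("set", "exercises");
        System.out.println(build("http://localhost:8080/OAILREApp/oaiHandler", "ListRecords", args));
    }
}
```

The harvester would then fetch this URL with any HTTP client and parse the XML that comes back.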

4

Try it

Install Apache Tomcat or any other Java servlet container

Download the WAR file from http://fire.eun.org/Iztok/OAILREApp.war

Deploy the WAR

Demo HTML page: http://localhost:8080/OAILREApp/

Or type a service verb, e.g. http://localhost:8080/OAILREApp/oaiHandler?verb=Identify

5

The raw XML

By default, the resulting XML has a stylesheet attached for pretty rendering

To remove the stylesheet, comment out the line

OAIHandler.styleSheet=testoai/oaicat.xsl

in the file oaicat.properties (in the WAR file or the web-app dir)

6

OAI XML example

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
  <responseDate>2007-06-11T06:48:58Z</responseDate>
  <request metadataPrefix="oai_lom" verb="ListRecords">http://localhost:8080/OAILREApp/oaiHandler</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>
        <datestamp>2007-06-09T22:38:28Z</datestamp>
        <setSpec>exercises</setSpec>
      </header>
      <metadata>
        <lom xmlns=...> ... </lom>
      </metadata>
    </record>
    ....
    <resumptionToken expirationDate="2007-06-11T07:48:58Z" completeListSize="42" cursor="10">1181544538265</resumptionToken>
  </ListRecords>
</OAI-PMH>
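To illustrate how a harvester might read such a response, the sketch below parses a trimmed copy of the example (metadata omitted) with the JDK's DOM parser and pulls out the record identifiers and the resumption token. The class name and helper methods are hypothetical; only the element names and the OAI namespace come from the response shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical harvester-side helper: reads identifiers and the resumption token.
final class OaiResponseReader {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Trimmed version of the slide's example response (metadata left out).
    static final String SAMPLE =
        "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\">"
        + "<responseDate>2007-06-11T06:48:58Z</responseDate>"
        + "<ListRecords><record><header>"
        + "<identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>"
        + "<datestamp>2007-06-09T22:38:28Z</datestamp>"
        + "<setSpec>exercises</setSpec>"
        + "</header></record>"
        + "<resumptionToken completeListSize=\"42\" cursor=\"10\">1181544538265</resumptionToken>"
        + "</ListRecords></OAI-PMH>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // needed for the namespace-based lookups below
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Response is not well-formed XML", ex);
        }
    }

    // Only <identifier> elements in the OAI namespace, so identifiers
    // inside the metadata payload (for example LOM) are not picked up.
    static List<String> identifiers(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "identifier");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent().trim());
        }
        return ids;
    }

    // Returns null when the list is complete (no token or an empty token).
    static String resumptionToken(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
        if (nodes.getLength() == 0) return null;
        String token = nodes.item(0).getTextContent().trim();
        return token.isEmpty() ? null : token;
    }

    public static void main(String[] unused) {
        Document doc = parse(SAMPLE);
        System.out.println(identifiers(doc));
        System.out.println(resumptionToken(doc));
    }
}
```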

7

OAICat - a Java implementation

OAICat home at http://www.oclc.org/research/software/oai/cat.htm

Takes care of
– web service details
– OAI XML specification

The implementer has to provide three classes
– RepositoryOAICatalog
– RepositoryRecordFactory
– Repository2oai_dc (lom, ...), usually more than one

8

A sample implementation

(Source code and libs in http://fire.eun.org/Iztok/OAILREApp.zip)

Create a new web module

Add servlet oaiHandler to web.xml

<servlet>
  <servlet-name>LreOAIHandler</servlet-name>
  <servlet-class>ORG.oclc.oai.server.OAIHandler</servlet-class>
  <load-on-startup>5</load-on-startup>
</servlet>
<servlet-mapping>
  <servlet-name>LreOAIHandler</servlet-name>
  <url-pattern>/oaiHandler</url-pattern>
</servlet-mapping>

9

(cont)

Define properties file location

<context-param>
  <param-name>properties</param-name>
  <param-value>oaicat.properties</param-value>
</context-param>

Welcome file for testing

<welcome-file-list>
  <welcome-file>testoai/index.html</welcome-file>
</welcome-file-list>

10

Sample record

A record with basic fields: id, url, title, descr and date

SampleOAICatalog contains an array with 3 sample records

11

SampleOAICatalog.listIdentifiers

Parameters
– from – date to harvest from (String in ISO 8601 format); date or datetime, depending on granularity
– to – date to harvest to
– set – a set name; list only records from this set (if null, list all records). Set names classify objects in natural groups; every record may belong to multiple sets (or none)
– metadataPrefix – list only records that support this format (sample formats: oai_dc, oai_lom, ...)
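As an illustration of how these parameters could narrow down the result, here is a minimal sketch of the selection step. It assumes every datestamp and both bounds are UTC strings of the same granularity (in that case ISO 8601 strings sort chronologically when compared as text), treats null as "no restriction", and leaves out the metadataPrefix check. The Rec class and the select method are hypothetical stand-ins, not the SampleRecord from the sample code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical selection step for listIdentifiers/listRecords.
final class HarvestFilter {

    // Stand-in record: only the fields needed for selection.
    static final class Rec {
        final String id;
        final String datestamp;   // e.g. "2007-06-09T22:38:28Z"
        final List<String> sets;  // a record may be in several sets, or none

        Rec(String id, String datestamp, String... sets) {
            this.id = id;
            this.datestamp = datestamp;
            this.sets = Arrays.asList(sets);
        }
    }

    // Keeps records with from <= datestamp <= until that belong to the given set.
    static List<Rec> select(List<Rec> all, String from, String until, String set) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : all) {
            if (from != null && r.datestamp.compareTo(from) < 0) continue;
            if (until != null && r.datestamp.compareTo(until) > 0) continue;
            if (set != null && !r.sets.contains(set)) continue;
            out.add(r);
        }
        return out;
    }

    public static void main(String[] unused) {
        List<Rec> all = Arrays.asList(
            new Rec("r1", "2007-05-01T10:00:00Z", "exercises"),
            new Rec("r2", "2007-06-09T22:38:28Z", "exercises", "tests"),
            new Rec("r3", "2007-06-10T09:00:00Z"));
        for (Rec r : select(all, "2007-06-01T00:00:00Z", null, "exercises")) {
            System.out.println(r.id);
        }
    }
}
```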

12

SampleOAICatalog.listIdentifiers

Must return a map with two fields
– headers – a String iterator of OAI headers
– identifiers – a String iterator of OAI identifiers

Both are created by the call (rec is a SampleRecord)

// header[0] goes to the headers list, header[1] to the identifiers list
String[] header = getRecordFactory().createHeader(rec);
headers.add(header[0]);
identifiers.add(header[1]);

Create result

Map<String, Object> listIdMap = new HashMap<String, Object>();
listIdMap.put("headers", headers.iterator());
listIdMap.put("identifiers", identifiers.iterator());
return listIdMap;
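Put together, the fragments above might look like the following self-contained sketch. The createHeader method here is only a local stand-in for getRecordFactory().createHeader(rec), and the repository prefix oai:example.org: and the header layout are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Sketch of the map that listIdentifiers returns, with a stubbed record factory.
final class ListIdentifiersResult {

    // Stand-in for getRecordFactory().createHeader(rec):
    // element 0 is the header XML, element 1 the OAI identifier.
    static String[] createHeader(String localId, String datestamp) {
        String oaiId = "oai:example.org:" + localId; // hypothetical repository prefix
        String xml = "<header><identifier>" + oaiId + "</identifier>"
                   + "<datestamp>" + datestamp + "</datestamp></header>";
        return new String[] { xml, oaiId };
    }

    // Each input row is { localId, datestamp }.
    static Map<String, Object> build(String[][] recs) {
        List<String> headers = new ArrayList<>();
        List<String> identifiers = new ArrayList<>();
        for (String[] rec : recs) {
            String[] header = createHeader(rec[0], rec[1]);
            headers.add(header[0]);
            identifiers.add(header[1]);
        }
        Map<String, Object> listIdMap = new HashMap<>();
        listIdMap.put("headers", headers.iterator());
        listIdMap.put("identifiers", identifiers.iterator());
        return listIdMap;
    }

    public static void main(String[] unused) {
        Map<String, Object> result = build(new String[][] {
            { "a1", "2007-06-09T22:38:28Z" },
            { "a2", "2007-06-10T08:00:00Z" } });
        Iterator<?> ids = (Iterator<?>) result.get("identifiers");
        while (ids.hasNext()) System.out.println(ids.next());
    }
}
```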

13

getRecordFactory().createHeader(rec)

Creates the header by calling the methods in SampleRecordFactory

String getOAIIdentifier(Object rec)
– returns the full OAI identifier, e.g. “oai:oay.rep.com:id001”

String getDatestamp(Object rec)
– returns the date in ISO 8601 format

Iterator<String> getSetSpecs(Object rec)

ArrayList<String> list = new ArrayList<String>();
list.add(...);
return list.iterator();

Iterator<String> getAbouts(Object rec)

String fromOAIIdentifier(String id)
– helper method – converts the OAI identifier to a local id
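The identifier and datestamp methods are mostly string handling. The sketch below shows one possible shape for them; the class name, the prefix oai:example.org: and the use of java.time are assumptions for illustration and are not taken from SampleRecordFactory.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helpers mirroring getOAIIdentifier, fromOAIIdentifier and getDatestamp.
final class OaiIdentifiers {
    static final String PREFIX = "oai:example.org:"; // assumed repository prefix

    // Local id -> full OAI identifier.
    static String toOaiIdentifier(String localId) {
        return PREFIX + localId;
    }

    // Full OAI identifier -> local id; rejects identifiers of other repositories.
    static String fromOaiIdentifier(String oaiId) {
        if (oaiId == null || !oaiId.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not an identifier of this repository: " + oaiId);
        }
        return oaiId.substring(PREFIX.length());
    }

    // ISO 8601 UTC datestamp with seconds granularity.
    static String datestamp(Instant modified) {
        return modified.truncatedTo(ChronoUnit.SECONDS).toString();
    }

    public static void main(String[] unused) {
        String oaiId = toOaiIdentifier("id001");
        System.out.println(oaiId);
        System.out.println(fromOaiIdentifier(oaiId));
        System.out.println(datestamp(Instant.now()));
    }
}
```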

14

SampleOAICatalog.listSets

takes no parameters, returns the list of all sets in this repository
– each ListIdentifiers or ListRecords query may contain a set name, limiting the results to just one set

15

SampleOAICatalog.getSchemaLocations

like GetRecord, but returns the Vector of all metadata schema locations the record supports
– to obtain them, just call getRecordFactory().getSchemaLocations(rec);

16

SampleOAICatalog.getRecord

String getRecord(String id, String metadataPrefix)
– find the record and convert it to an XML string (<record> element)
– id is in global format – to get the local value call getRecordFactory().fromOAIIdentifier(id)
– throw IdDoesNotExistException if the record is not found
– to generate the XML use constructRecord

constructRecord(rec, metadataPrefix)
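The lookup flow described above can be sketched as follows. Everything here is a simplified stand-in: the in-memory map, the prefix oai:example.org:, the unchecked IdDoesNotExist exception (in place of OAICat's IdDoesNotExistException) and the local constructRecord are assumptions for illustration, not the library's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of getRecord: global id -> local id -> lookup -> <record> XML.
final class GetRecordSketch {

    // Unchecked stand-in for OAICat's IdDoesNotExistException.
    static final class IdDoesNotExist extends RuntimeException {
        IdDoesNotExist(String id) { super("No such record: " + id); }
    }

    static final String PREFIX = "oai:example.org:"; // assumed repository prefix

    // Local id -> already serialized metadata XML (what a crosswalk would produce).
    final Map<String, String> metadataByLocalId = new HashMap<>();

    String getRecord(String oaiId, String metadataPrefix) {
        if (oaiId == null || !oaiId.startsWith(PREFIX)) throw new IdDoesNotExist(oaiId);
        String localId = oaiId.substring(PREFIX.length());
        String metadata = metadataByLocalId.get(localId);
        if (metadata == null) throw new IdDoesNotExist(oaiId);
        return constructRecord(oaiId, metadata, metadataPrefix);
    }

    // Stand-in for constructRecord: wraps header and metadata in a <record> element.
    // metadataPrefix is unused here; a real catalog would pick the crosswalk by it.
    static String constructRecord(String oaiId, String metadataXml, String metadataPrefix) {
        return "<record><header><identifier>" + oaiId + "</identifier></header>"
             + "<metadata>" + metadataXml + "</metadata></record>";
    }

    public static void main(String[] unused) {
        GetRecordSketch catalog = new GetRecordSketch();
        catalog.metadataByLocalId.put("id001", "<dc/>");
        System.out.println(catalog.getRecord("oai:example.org:id001", "oai_dc"));
    }
}
```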

17

SampleOAICatalog.listRecords

just like ListIdentifiers, only generates a list of XML <record> elements

return a map with one element

Map<String, Object> listRecMap = new HashMap<String, Object>();
listRecMap.put("records", records.iterator());
return listRecMap;

18

Crosswalks

Conversions of the native record type to XML, like Sample2oai_lom or Sample2oai_dc

Only two methods per implementation
– boolean isAvailableFor(Object rec)
– String createMetadata(Object rec)

SampleRecord record = (SampleRecord) rec;
return LOMFormat.writeStringWithSchema(record.toLOM());

throw CannotDisseminateFormatException if the metadata is not available in this format
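For comparison with the LOM crosswalk, a Dublin Core crosswalk could be written by hand along the following lines. This is only a sketch: the Rec class stands in for SampleRecord, the two methods are static instead of overriding OAICat's crosswalk base class, an IllegalArgumentException replaces CannotDisseminateFormatException, and the schemaLocation attribute is left out.

```java
// Hand-written oai_dc crosswalk sketch with minimal XML escaping.
final class SampleToDcSketch {

    // Stand-in for SampleRecord with the fields used here.
    static final class Rec {
        final String title, descr, url;
        Rec(String title, String descr, String url) {
            this.title = title;
            this.descr = descr;
            this.url = url;
        }
    }

    // This sketch can only serialize records that have a title.
    static boolean isAvailableFor(Object rec) {
        return rec instanceof Rec && ((Rec) rec).title != null;
    }

    static String createMetadata(Object rec) {
        if (!isAvailableFor(rec)) {
            // A real crosswalk would throw CannotDisseminateFormatException here.
            throw new IllegalArgumentException("Record not available as oai_dc");
        }
        Rec r = (Rec) rec;
        StringBuilder sb = new StringBuilder();
        sb.append("<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\"")
          .append(" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">");
        sb.append("<dc:title>").append(escape(r.title)).append("</dc:title>");
        if (r.descr != null) sb.append("<dc:description>").append(escape(r.descr)).append("</dc:description>");
        if (r.url != null) sb.append("<dc:identifier>").append(escape(r.url)).append("</dc:identifier>");
        return sb.append("</oai_dc:dc>").toString();
    }

    // Escapes the characters that are unsafe in XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] unused) {
        System.out.println(createMetadata(new Rec("Fish & chips", null, "http://example.org/1")));
    }
}
```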

19

SampleRecord.toLOM

uses the LOM-j lib to quickly hack together a LOM: http://sourceforge.net/projects/lom-j/
– automatic serialization/deserialization of LOM and DC XML formats

Example

lom.newGeneral().newIdentifier(0).newCatalog().setString("lre");
lom.newGeneral().newIdentifier(0).newEntry().setString("sample:" + id);
lom.newTechnical().newLocation(-1).setString(url);
lom.newGeneral().newTitle().newString(0).newLanguage().setValue("en");
lom.newGeneral().newTitle().newString(0).setString(title);

20

Resumption

A repository usually has a fixed limit on the number of records to return in one call
– if more are available, it returns a resumption token that lets the harvester request the next packet
– implemented by the functions listIdentifiers(String resumptionToken) and listRecords(String resumptionToken)
– see XYZOAICatalog for details
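One simple way to implement this paging is to let the token encode the offset of the next packet. The sketch below does exactly that; it is an assumption made for illustration and not how XYZOAICatalog works (a real catalog also has to remember the from/until/set/metadataPrefix arguments of the original query and may expire tokens). All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Offset-based resumption tokens: the token is the index of the next item.
final class ResumptionSketch {

    // One packet of results plus the token for the next one (null when done).
    static final class Page {
        final List<String> items;
        final String resumptionToken;
        Page(List<String> items, String resumptionToken) {
            this.items = items;
            this.resumptionToken = resumptionToken;
        }
    }

    static Page firstPage(List<String> all, int pageSize) {
        return page(all, 0, pageSize);
    }

    // A malformed token raises NumberFormatException, an out-of-range one
    // IllegalArgumentException; a real catalog would report a bad resumption token.
    static Page nextPage(List<String> all, String token, int pageSize) {
        return page(all, Integer.parseInt(token), pageSize);
    }

    private static Page page(List<String> all, int offset, int pageSize) {
        if (offset < 0 || offset > all.size()) {
            throw new IllegalArgumentException("Bad resumption token: " + offset);
        }
        int end = Math.min(offset + pageSize, all.size());
        String token = end < all.size() ? Integer.toString(end) : null;
        return new Page(new ArrayList<>(all.subList(offset, end)), token);
    }

    public static void main(String[] unused) {
        List<String> all = Arrays.asList("a", "b", "c", "d", "e");
        Page p = firstPage(all, 2);
        System.out.println(p.items);
        while (p.resumptionToken != null) {
            p = nextPage(all, p.resumptionToken, 2);
            System.out.println(p.items);
        }
    }
}
```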

21

References

http://www.openarchives.org/OAI/openarchivesprotocol.html
http://www.fmf.uni-lj.si/~kavkler/
http://www.oclc.org/research/software/oai/cat.htm
http://www.cs.kuleuven.ac.be/~hmdb/SqiOaiMelt
http://sourceforge.net/projects/lom-j/
SIO/Trubar OAI url: http://sio.edus.si/LreTomcat/