http://www.ukoln.ac.uk/ Integrating metadata schema registries with digital preservation systems to support interoperability Michael Day UKOLN, University of Bath, UK [email protected] 2003 Dublin Core Conference, Seattle, Washington, USA 28 September - 2 October 2003

http://www.ukoln.ac.uk/

Integrating metadata schema registries with digital preservation systems to support interoperability

Michael Day
UKOLN, University of Bath, UK
[email protected]

2003 Dublin Core Conference, Seattle, Washington, USA
28 September - 2 October 2003

DC-2003, Seattle, USA, 29 September 2003

Presentation outline

– Preservation metadata
    • Purpose
    • Standards
– Concepts of interoperability
    • Metadata capture and inheritance
    • Object exchange
– Metadata schema registries
    • Definitions
    • Application to digital preservation systems

– Concluding thoughts

Preservation metadata (1)

• All digital preservation strategies depend - to some extent - on the creation, capture and maintenance of metadata

– "Preserving the right metadata is key to preserving digital objects" (ERPANET Briefing Paper, 2003)

• Defined as:
    – The various types of data that will allow the re-creation and interpretation of the structure and content of digital data over time (Ludäscher, Marciano & Moore, 2001)

Preservation metadata (2)

• Metadata fulfil various roles, e.g.:
    – "… to find, manage, control, understand or preserve … information over time" (Cunningham, 2000)

– Descriptive information; technical information about formats and structure; information about provenance and context; administrative information, e.g. for rights management

– Current schemas either very complex or only provide a basic framework (sometimes both!)

– Perception that different strategies and objects will need different metadata

Preservation metadata - standards

– Developed from many different perspectives:
    • Digital libraries:
        – METS, NISO Z39.87 (to support digitisation initiatives)
        – OCLC/RLG Framework, Cedars, NEDLIB, NLA, NLNZ
        – OAIS influence has been greatest in this area
    • Records management and archival description:
        – Pittsburgh BAC, RKMS, NAA, VERS, PRO, EAD, etc.
– Also standards not specifically developed for preservation, but with some overlap:
    • Multimedia:
        – MPEG-7, SMPTE, etc.
    • Rights management:
        – <indecs>, MPEG-21, etc.

The OAIS model

– Reference Model for an Open Archival Information System (OAIS)
    – ISO 14721:2003
    – Established a common framework of terms and concepts
    – Influential on the design of some schemas
        » e.g., OCLC/RLG Metadata Framework
    – Identified basic functions:
        » Ingest, Data Management, Archival Storage, Administration, Access, Preservation Planning

OAIS functional model

[Figure: OAIS Functional Entities (Figure 4-1). Functional entities: Ingest, Archival Storage, Data Management, Access, Administration, Preservation Planning. External entities: Producer, Management, Consumer. Labelled flows: SIP, AIP, DIP, descriptive info., queries, result sets, orders.]

OAIS information objects

• Information Object (basic concept)
    – Data Object (bit-stream)
    – Representation Information (permits “the full interpretation of Data Object into meaningful information”)
• Information Object Classes
    – Content Information
    – Preservation Description Information (PDI)
    – Packaging Information
    – Descriptive Information

OAIS information packages

• Information package:
    – Container that encapsulates Content Information and PDI
    – Packages for submission (SIP), archival storage (AIP) and dissemination (DIP)
        » AIP = “... a concise way of referring to a set of information that has, in principle, all of the qualities needed for permanent, or indefinite, Long Term Preservation of a designated Information Object”
    – PDI = other information (metadata) “which will allow the understanding of the Content Information over an indefinite period of time”
        » Reference, Provenance, Context, Fixity
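
The package structure described above can be sketched as a small data structure. This is only an illustration, not part of the OAIS model itself: the class names, field types, the `make_package` helper and the choice of SHA-256 for fixity are all assumptions made for the example.

```python
from dataclasses import dataclass
import hashlib


@dataclass
class PreservationDescriptionInformation:
    """The four PDI categories (field types are assumptions)."""
    reference: str     # identifier for the content
    provenance: list   # history of processing events
    context: str       # relationship to other material
    fixity: str        # SHA-256 hex digest of the data object (assumed algorithm)


@dataclass
class InformationPackage:
    """Container that encapsulates Content Information and PDI."""
    kind: str                  # "SIP", "AIP" or "DIP"
    data_object: bytes         # the bit-stream
    representation_info: str   # e.g. a format name, needed to interpret the bits
    pdi: PreservationDescriptionInformation

    def fixity_ok(self) -> bool:
        """Recompute the digest and compare it with the recorded fixity value."""
        return hashlib.sha256(self.data_object).hexdigest() == self.pdi.fixity


def make_package(kind, data, representation_info, reference, context=""):
    """Hypothetical helper: build a package and record its fixity value."""
    if kind not in {"SIP", "AIP", "DIP"}:
        raise ValueError(f"unknown package kind: {kind!r}")
    pdi = PreservationDescriptionInformation(
        reference=reference,
        provenance=[f"packaged as {kind}"],
        context=context,
        fixity=hashlib.sha256(data).hexdigest(),
    )
    return InformationPackage(kind, data, representation_info, pdi)
```

A call such as `make_package("AIP", b"hello", "text/plain", "example:1")` yields a package whose `fixity_ok()` check passes until the bit-stream is altered.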

Draft categorisation (1)

[Figure: metadata initiatives placed between "Conceptual" and "Practical": PRO, NEDLIB, DCMI, METS, RKMS, PITT, VERS, NLNZ, NLA, CEDARS, OCLC/RLG, MPEG-7, Z39.87, OAIS.]

Draft categorisation (2)

• Earliest schemas were largely conceptual in nature:

– e.g. Pittsburgh BAC model, Cedars outline specification, OCLC/RLG WG I

• Gradually moving towards a more practical focus:

– e.g., VERS, NLNZ, METS, PREMIS WG
– Convergence on XML (DTDs and Schemas)

• But there is an urgent need for all this practical experience to be shared

– e.g., published schemas, advice on implementation, etc.

Implementation

– We need to prove the practical value of metadata frameworks and 'outline specifications'

– It can be difficult for implementers to use these as a guide to the design of real systems?

– We need to move from the conceptual to the practical, and beyond proof-of-concept

– Positive signs:
    • METS/NISO Z39.87
    • OCLC/RLG PREMIS WG looking at implementation strategies for preservation metadata

Sustainability (1)

• Balance risks with costs:
    – There is a perception that metadata creation and maintenance will be expensive
    – But costs associated with data recovery are not trivial
    – Need to balance the risks of data loss with the cost of creating metadata
        » Cost/benefit analysis
        » Robust selection criteria
        » Co-operation between repositories
        » Re-use of existing metadata

Sustainability (2)

• Avoid imposing unnecessary costs:
    – Avoid large schemas (?)
    – Need to identify the right metadata - 'core metadata' (?)

Interoperability (1)

• Heterogeneity
    – The need to cope with a wide (and growing) range of preservation metadata standards, object types, formats, etc.
    – No realistic prospect of a single standard
    – Repositories will need to manage a range of metadata standards, at least within ingest and access functions

Interoperability (2)

• Metadata creation and capture
    – Created by humans or captured automatically?
    – Some metadata already exists, e.g.:
        » Embedded within objects
        » In separate databases
        » Generated by particular processes
    – Need for this metadata to be captured at creation, ingest, migration, and at other appropriate points in object life-cycle
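
As a rough illustration of metadata that can be captured automatically rather than keyed in by hand, the sketch below records a few technical properties of a file at a named life-cycle event. The function name, the dictionary keys and the use of SHA-256 are assumptions for the example, and guessing a format from the file extension is only a crude stand-in for proper format identification.

```python
import hashlib
import mimetypes
import os
from datetime import datetime, timezone


def capture_technical_metadata(path, event="ingest"):
    """Return metadata obtainable from the file itself, without human input."""
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    # guess_type only inspects the file name; it returns None for unknown extensions.
    guessed_type, _encoding = mimetypes.guess_type(path)
    return {
        "event": event,  # e.g. creation, ingest, migration
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "file_name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "format_guess": guessed_type,
        "sha256": digest,
    }
```

Calling the function again with `event="migration"` after a format migration would give a second record, so that the history of the object accumulates across its life-cycle.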

Interoperability (3)

• Benefits of interoperability:
    – To support the capture or inheritance of metadata, e.g. on ingest
    – To support the management of multiple formats and metadata schema within a digital preservation system
        » Current metadata specifications not entirely clear on how this should be done
    – To support the exchange of information packages outside the repository, e.g. by converting to standard 'exchange formats'
        » Networks of 'trusted repositories'

Registries (1)

• Registries of metadata schemas may offer one way to deal with this problem

• Parallel concept of format registries
    – There is "… a pressing need to establish reliable, sustained repositories of file format specifications, documentation, and related software" (Lawrence, et al., 2000)
    – DSpace 'bitstream format registry'
    – Typed Object Model (TOM) project
    – Digital Library Federation, et al. recently proposed a Global digital format registry

Registries (2)

• Metadata schema registries:
    – "… formal systems that can disclose authoritative information about the semantics and structure of the data elements that are included within a particular metadata scheme" (Heery, et al., 2000)
    – Existing registries include the XML.org Registry and Repository (OASIS), and metadata registries set up by DCMI and SMPTE
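
To make the definition concrete, a registry can be thought of as a lookup from (scheme, version, element) to a statement of that element's semantics and structure. The sketch below is a minimal in-memory illustration under that assumption; the class, its methods and the example scheme are hypothetical and do not reflect the interface of any existing registry.

```python
class SchemaRegistry:
    """Toy registry disclosing semantics and structure of data elements."""

    def __init__(self):
        self._schemes = {}  # (scheme, version) -> {element: description}

    def register(self, scheme, version, elements):
        """Record element descriptions for one version of a scheme.

        `elements` maps an element name to a dict such as
        {"definition": "...", "datatype": "..."} (an assumed layout).
        """
        self._schemes[(scheme, version)] = dict(elements)

    def describe(self, scheme, version, element):
        """Return the registered description of an element."""
        try:
            return self._schemes[(scheme, version)][element]
        except KeyError:
            raise LookupError(
                f"{scheme} {version} has no registered element {element!r}"
            ) from None

    def versions(self, scheme):
        """List the registered versions of a scheme, sorted as strings."""
        return sorted(v for (s, v) in self._schemes if s == scheme)


# Usage with a made-up scheme; the names and definitions are placeholders.
registry = SchemaRegistry()
registry.register("example-scheme", "1.0", {
    "title": {"definition": "A name given to the object", "datatype": "string"},
})
print(registry.describe("example-scheme", "1.0", "title")["datatype"])
```

Keeping the version as part of the key is a deliberate choice here: it lets several versions of one scheme coexist in the registry.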

Registry functions (1)

• Provides support for the ingest process
    – Support conversion and metadata capture tools

• May also provide support for the access function

– The export of Dissemination Information Packages

– The exchange of information packages (AIPs?) with other repositories; conversion to exchange standards

• Can link metadata where there are multiple instances within the system

• May help to manage schema evolution
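
One way to picture the conversion role is a crosswalk, held by the registry, that maps elements of a local schema to those of an exchange schema. The sketch below assumes a simple one-to-one mapping; the schema names, element names and the `convert` function are invented for illustration, and real crosswalks usually also need value transformations and one-to-many mappings.

```python
# Hypothetical crosswalk table, keyed by (source schema, target schema).
CROSSWALKS = {
    ("local", "exchange"): {
        "name": "title",
        "maker": "creator",
        "made": "date",
    },
}


def convert(record, crosswalk):
    """Rename the elements of a record according to a crosswalk.

    Returns (converted, unmapped): elements with no mapping are kept
    separately rather than silently dropped, so that nothing is lost
    when a package is exported.
    """
    converted, unmapped = {}, {}
    for element, value in record.items():
        target = crosswalk.get(element)
        if target is None:
            unmapped[element] = value
        else:
            converted[target] = value
    return converted, unmapped


record = {"name": "Annual report", "maker": "A. Author", "checksum": "abc123"}
converted, unmapped = convert(record, CROSSWALKS[("local", "exchange")])
print(converted)  # elements renamed into the exchange schema
print(unmapped)   # elements the crosswalk does not cover
```

Returning the unmapped elements makes gaps in a crosswalk visible, which is where a registry's knowledge of both schemas would be consulted.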

Registry functions (2)

[Figure, built up over three slides: the OAIS Functional Entities diagram (Figure 4-1), with a Registry and Schemas added in the later builds.]

Registries - organisational issues

• Registries are part of infrastructure
• Distributed vs. centralised approaches:
    – Concept of 'shared services'
    – Experimental distributed registries are based on Resource Description Framework (RDF)
        » CORES Registry
        » Encourage re-use of metadata (the 'application profile' concept)
        » Are other technologies more suitable?

• Who should be responsible for them?

Summing up

• Interoperability supports the reuse of metadata and the exchange of information objects

• There is a possible role for metadata registries to help manage these (and other) processes - but the concept needs extensive scoping and evaluation

• Registries are NOT a panacea

Acknowledgements

UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath, where it is based.