Digital Preservation. The Past is Prologue Developing Preservation Approaches

Digital Preservation

The Past is Prologue

Developing Preservation Approaches

Diagram by Nancy Y. McGovern based on PhD Research, March 2001

5 Stages of Digital Preservation

1. Digitization leads to understanding that digital content needs to be managed and protected

2. Digital Preservation Projects are initiated

3. Digital Preservation Projects segue into Programs

4. Digital Preservation Programs become comprehensive and coordinated

5. Institutional Programs embrace Inter-institutional Collaboration

Digital Preservation Officer

• First DPO appointed January 2002 http://www.library.cornell.edu/iris/dpo/

• coordinates digital preservation policy development and implementation

• serves as the liaison to digital preservation initiatives and projects

• develops a conceptual framework for a cohesive digital preservation program

Models and Standards

• Attributes of a Trusted Digital Repository (RLG-OCLC)

http://www.rlg.org/longterm/attributes01.pdf

• OAIS Reference Model (CCSDS) http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf

Models and Standards

• SIP Transfer Issues:

  • Producer-Archive Interface Methodology Abstract Standard (CCSDS)

    http://ssdoo.gsfc.nasa.gov/nost/isoas/CCSDS-651.0-W-1.pdf

• AIP Components (OCLC/RLG PMWG):

  • Content Information

  • Preservation Description Information

    http://www.oclc.org/research/pmwg/

• Format Issues:

  • Draft Standard - Data Dictionary - Technical Metadata for Digital Still Images (NISO)

    http://www.niso.org/committees/committee_au.html

Attributes of a Trusted Repository

1. Administrative responsibility

• Provide evidence of fundamental commitment to standards, best practices

• Commit to OAIS model

• Meet standards on environment (6)

• Share measurements with depositors (6)

• Involve external community experts in validating/certifying practices (6)

• Commit to transparency and accountability (6)

2. Organizational viability

• Demonstrate viability and trustworthiness (3)

• Reflect commitment to long-term retention/management in mission statements

• Have appropriate legal status, staff and professional development (1)(3)

• Establish transparent business practices, effective management policies (6)(3)

• Define inclusive agreements with depositors (6)

• Review/maintain policies and procedures (6)

• Undertake risk management, contingency and succession (trusted inheritors) planning (6)(3)

3. Financial sustainability

• Establish/maintain good business practices and an auditable business plan (1)(2)

• Demonstrate financial fitness and ongoing financial commitment (1)(2)

• Balance risk, benefit, investment, expenditure

• Maintain adequate budget and reserves and actively seek potential funding sources

4. Technological suitability

• Consider/adopt appropriate preservation strategies (6)

• Ensure appropriate infrastructure for acquisition, storage, access (5)

• Establish technology management policy for repository (2)(3)

• Comply with relevant standards and best practices, adequate expertise (6)

• Undergo regular external audits on system components and performance (6)

5. System security

• Assure security of systems for digital assets (3)

• Establish policies and procedures to meet requirements (4)(6)

• Stress processes that will detect, avoid and repair loss, document and notify of changes and resulting actions (4)(6)

6. Procedural accountability

• Enact policies and procedures for tasks and functions, document practices (1)(2)

• Establish monitoring mechanisms to ensure continued operation of systems and procedures (4)(5)

• Record/justify preservation strategies (1)(2)

• Set up feedback mechanisms for problem resolution; negotiate evolving requirements between providers and consumers (1)(2)

Framework Components

• Administrative Responsibility

• Organizational Viability

• Financial Sustainability

• Technological Suitability

• System Security

• Procedural Accountability

Diagram by Nancy Y. McGovern based upon the RLG-OCLC Attributes of a Trusted Repository

Open Archival Information System (OAIS)

Framework to Model

Overview of the OAIS Model

from Reference Model for an Open Archival Information System [4]

OAIS Categories

• [Data Object]

• Representation Information (Structure, Semantic, and Other Information)

• Content Information [1] (Data Object + Representation Information)

• Preservation Description Information [2] (Reference, Context, Provenance and Fixity Information)

• Descriptive Information (Content Information + PDI)

• Packaging Information [physically and logically binds]
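The categories above can be read as a nested data structure. The following is a minimal sketch in Python, not part of the OAIS Reference Model itself: the class and field names are hypothetical, and the use of SHA-256 for Fixity Information is an assumption made for illustration (the model does not prescribe a particular algorithm).

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class ContentInformation:
    """Data Object plus the Representation Information needed to interpret it."""
    data_object: bytes
    representation_information: str  # hypothetical: a short format description


@dataclass
class PreservationDescriptionInformation:
    """Reference, Context, Provenance and Fixity Information."""
    reference: str
    context: str
    provenance: List[str]
    fixity_sha256: str  # assumption: SHA-256 digest of the Data Object


@dataclass
class InformationPackage:
    """Content Information and PDI, bound together by Packaging Information."""
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    packaging_information: str


def make_package(data: bytes, representation: str,
                 reference: str, context: str) -> InformationPackage:
    """Build a package and record a fixity digest at creation time."""
    digest = hashlib.sha256(data).hexdigest()
    return InformationPackage(
        content=ContentInformation(data, representation),
        pdi=PreservationDescriptionInformation(
            reference=reference,
            context=context,
            provenance=["package created"],
            fixity_sha256=digest,
        ),
        packaging_information="in-memory object (illustrative only)",
    )


def fixity_ok(package: InformationPackage) -> bool:
    """Recompute the digest and compare it with the recorded fixity value."""
    current = hashlib.sha256(package.content.data_object).hexdigest()
    return current == package.pdi.fixity_sha256
```

Keeping the fixity value inside the PDI, separate from the Data Object, is what allows a later check to detect that the object has changed since the package was built.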

OAIS at Cornell

Preserving Essential Elements

• Content

• Context

• Structure

• Appearance

• Behavior

Emulation

• Jeff Rothenberg

• Dutch National Library

• IBM

• CAMiLEON Project

• David Bearman

Migration

• Risk Management of Digital Information: A File Format Investigation

• Charles Dollar

• Margaret Hedstrom

• CAMiLEON Project

• Dutch Testbed Project

XML and Object-Based

• NARA and SDSC

• Dutch Testbed Project

• Victoria Electronic Records Project (VERS)

• Harvard SIP proposal

Project Prism

CUL Research Team:

Anne R. Kenney

Nancy Y. McGovern

Peter Botticelli

Richard Entlich

Risk Management Stages

Typical Stages               Prism Stages

1. Risk identification       1. Data gathering
2. Risk classification          Characterization
3. Risk assessment           2. Simple risk declaration
4. Risk analysis             3. Contextualized declaration/detection
5. Program implementation    4. Automated enforcement

Levels of Context

• Web page

  • as a stand-alone object, ignoring its hyperlinks

  • in local context, considering the links into it and out from it

• Web site

  • as a semantically coherent set of linked Web pages

  • as an entity in a broader technical and organizational context
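The "local context" level can be illustrated with a small sketch. It assumes that links have already been harvested as (source URL, target URL) pairs and that two URLs belong to the same site when their host names match; both are simplifying assumptions, and the function names are hypothetical rather than taken from Project Prism.

```python
from urllib.parse import urlparse


def out_links(page, links):
    """URLs that the page links out to."""
    return sorted({target for source, target in links if source == page})


def in_links(page, links):
    """URLs of pages that link into the page."""
    return sorted({source for source, target in links if target == page})


def same_site(url_a, url_b):
    """Assumption: same host name means same Web site."""
    return urlparse(url_a).netloc.lower() == urlparse(url_b).netloc.lower()


def local_context(page, links):
    """Describe a page by the links into it and out from it."""
    outs = out_links(page, links)
    return {
        "in_links": in_links(page, links),
        "intra_site_out_links": [u for u in outs if same_site(page, u)],
        "external_out_links": [u for u in outs if not same_site(page, u)],
    }
```

Treating the page as a stand-alone object corresponds to ignoring `links` altogether; the site-level view would aggregate the same pairs over every page that shares a host.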

Page-level Monitoring

• Formatting: TIDY

• Standards compliance

• Document structure

• Metadata:

  • HTTP headers

  • HTML headers

• Changes

  • Content

  • Location

• Links

  • Out-link structure

  • In-link structure

  • Intra-site

  • Hub

  • Volatility

• Page provenance

  • URL parsing

• Log analysis
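Several of the page-level checks listed above (HTML header metadata, out-links, content changes) can be sketched with the Python standard library alone. This is an illustrative sketch, not the tooling used in Project Prism: it does not run TIDY, it reads only `<meta name=...>` tags and `<a href=...>` links, and it treats any byte-level difference as a content change.

```python
import hashlib
from html.parser import HTMLParser


class PageProbe(HTMLParser):
    """Collects out-links and <meta> name/content pairs from one page."""

    def __init__(self):
        super().__init__()
        self.out_links = []
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.out_links.append(attributes["href"])
        elif tag == "meta" and attributes.get("name"):
            self.meta[attributes["name"].lower()] = attributes.get("content") or ""


def probe(html):
    """Return the out-links, HTML header metadata and a content digest."""
    parser = PageProbe()
    parser.feed(html)
    parser.close()
    return {
        "out_links": parser.out_links,
        "meta": parser.meta,
        "content_sha256": hashlib.sha256(html.encode("utf-8")).hexdigest(),
    }


def content_changed(earlier, later):
    """Compare two probe results taken at different times."""
    return earlier["content_sha256"] != later["content_sha256"]
```

Running `probe` on successive captures of the same URL and comparing the results gives a crude change signal; a real monitor would also need to record HTTP headers and changes of location, which this sketch leaves out.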

Site-level Monitoring

• Graph analysis

• Static site analysis and Longitudinal study

• Aggregate page analyses

• Site maintenance indicators

• Backup and archiving policies and procedures

• Hardware and software environment

• Network configuration and maintenance
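Graph analysis and longitudinal study can be sketched over the same kind of link data. The example below assumes a site snapshot is a list of (source, target) link pairs. The hub threshold and the volatility measure (share of links that appear in only one of two snapshots) are hypothetical definitions chosen for illustration, not measures defined in the slides.

```python
from collections import Counter


def degree_counts(links):
    """Return (in-degree, out-degree) counters for a snapshot of links."""
    unique = set(links)
    in_degree = Counter(target for _, target in unique)
    out_degree = Counter(source for source, _ in unique)
    return in_degree, out_degree


def hubs(links, min_out=2):
    """Pages with at least min_out distinct out-links (hypothetical threshold)."""
    _, out_degree = degree_counts(links)
    return sorted(page for page, n in out_degree.items() if n >= min_out)


def link_volatility(earlier, later):
    """Fraction of links present in only one of the two snapshots."""
    a, b = set(earlier), set(later)
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0
```

A static site analysis uses one snapshot; a longitudinal study calls `link_volatility` on pairs of snapshots taken over time and watches for sudden jumps.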

Research Plan

• Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell’s Project Prism

By Anne R. Kenney, Nancy Y. McGovern, Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette

D-Lib Magazine, January 2002 http://www.dlib.org/dlib/january02/kenney/01kenney.html

Publisher-Based Digital Archives

Subject-Based Digital Archives

Intersection of Digital Archives

Format-based

Relevant Initiatives

• Metadata Encoding and Transmission Standard (METS) http://www.loc.gov/standards/mets/

[highlighted Web site in RLG DigiNews February 2002]

• Flexible and Extensible Digital Object and Repository Architecture (FEDORA)

• Mellon Fedora Project http://fedora.comm.nsdlib.org

Slides from January 2002 briefing: http://www.cs.cornell.edu/payette/presentations

Relevant External Projects

• NEDLIB

  • http://www.kb.nl/coop/nedlib/

• CAMiLEON (CEDARS)

  • http://www.si.umich.edu/CAMILEON/index.htm

  • http://www.leeds.ac.uk/cedars/

• PANDORA

  • http://pandora.nla.gov.au/index.html

• Harvard University LDI

  • http://hul.harvard.edu/ldi/

• NARA & SDSC

  • http://www.nara.gov/era/
