DPI: The Digital Preservation Imperative

David McKnight, Director, Digital Collections Program, McGill University

Access 2003 Conference, October 2, 2003, Vancouver, BC


One Take

• “In previous ages scribes, copyists, clerks, printers, human computers, and other information technicians all possessed guidelines for closure in their own idioms. Yet, today’s user of information technology finds no single, correct, classifying schema, no universal, mathematical “mapping” of a fixed, informational world that can serve as a guide through the contemporary morass that makes up our information overload. There are only data, endless data to be encoded and shuffled about.” Hobart and Schiffman, Information Ages (1998).


DPI: Digital Preservation Imperative

• Introduction

• The Problem

• Definitions

• Issues

• Solutions

• Metadata

• Best Practices

• Policies

• Conclusion


The Problem

• In New York, the corporate data of the Pennsylvania Railroad was erased.

• 20 percent of the 1976 NASA Viking Mars Mission data is unreadable.

• In Oregon, the primary database of people with disabilities vanished.

• WW II service records and Vietnam-era POW/MIA data have been lost.

• Every day, Web pages expire and are unavailable when needed.

• In several states, land use records are indecipherable due to missing software.


What is Digital Preservation?

• The term “digital preservation” refers to both preservation of materials that are created originally in digital form and never exist in print or analog form (also called “born digital” and “electronic records”) and the use of imaging and recording technologies to create digital surrogates of analog materials for access and preservation purposes…. Digital materials, regardless of whether they are created initially in digital form or converted to digital form, are threatened by technology obsolescence and physical deterioration. (Hedstrom)


Digital Authenticity

• For libraries, data centers, or any other organizations that need to preserve information objects over time, the ultimate outcome of the preservation process should be authentic preserved objects; that is, the outputs of a preservation process ought to be identical, in all respects, to what went into that process.


What is a Digital Object?

• A digital object is an information object, of any type of information or any format, expressed in digital form. (Kenneth Thibodeau)


Digital Object’s Three Properties

• A physical object

• A logical object

• A conceptual object

• The physical object represents an inscription on a physical medium. The logical object is an object that is recognized and processed by software. The conceptual object is the object as it is interpreted by a person or possibly by a computer application capable of executing a business transaction.
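As a rough illustration of the three properties, the sketch below looks at one small text file at each level. The class name `DigitalObject`, its fields, and the sample bytes are assumptions made for this example; they do not come from Thibodeau.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    """Hypothetical container used only to illustrate the three levels."""
    physical: bytes   # the inscription: raw bytes as stored on a medium
    encoding: str     # what software must know in order to process the bytes

    def logical(self) -> str:
        # Logical object: the bytes as recognized and processed by software.
        return self.physical.decode(self.encoding)

    def conceptual(self) -> dict:
        # Conceptual object: what a person or application takes the text to mean.
        title, _, body = self.logical().partition("\n")
        return {"title": title, "body": body}


# Made-up sample content.
obj = DigitalObject(physical=b"Annual Report\nFiled 2002.", encoding="ascii")
print(obj.conceptual()["title"])
```

The same conceptual object could be carried by different logical and physical objects, for example the same text stored in another encoding on another medium.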


Digital Objects: Examples

• Web Pages

• Geographic Information Systems

• Databases

• Multimedia

• Digital Images

• Digital Audio/Video Files

• Electronic Journals

• Electronic Theses

• Electronic Texts

• Fill in the blank


Digital Preservation Issues:

• Technical Obsolescence

• Standards

• Interoperability

• Metadata

• Information Security

• Software obsolescence


Digital Preservation Issues (cont’d)

• Rights Management and Intellectual Property

• Authenticity

• System Architecture

• Longevity of the Storage Medium

• Signal Degradation


PROPOSED SOLUTIONS

• Data Migration

• Emulation

• Encapsulation

• Universal Virtual Computer


Data Migration

• Description: Periodically convert digital data to next generation formats

• Pros: Data are "fresh" and instantly accessible

• Cons: Copies degrade from generation to generation
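A minimal sketch of one migration step, assuming the "old format" is Latin-1 text and the "next generation format" is UTF-8 (both chosen only for illustration). The helper names `migrate` and `round_trips` are hypothetical. The round-trip check is one simple way to notice the kind of generational loss listed under Cons.

```python
def migrate(data: bytes, old_encoding: str, new_encoding: str,
            errors: str = "strict") -> bytes:
    """Re-encode text bytes from an old encoding into a new one."""
    return data.decode(old_encoding).encode(new_encoding, errors=errors)


def round_trips(original: bytes, migrated: bytes,
                old_encoding: str, new_encoding: str) -> bool:
    """True if converting the migrated copy back reproduces the original bytes."""
    return migrated.decode(new_encoding).encode(old_encoding) == original


legacy = "Montréal".encode("latin-1")

# Lossless step: every Latin-1 character exists in UTF-8.
utf8_copy = migrate(legacy, "latin-1", "utf-8")

# Lossy step: ASCII cannot hold the accented character, so it is replaced.
ascii_copy = migrate(legacy, "latin-1", "ascii", errors="replace")

print(round_trips(legacy, utf8_copy, "latin-1", "utf-8"))
print(round_trips(legacy, ascii_copy, "latin-1", "ascii"))
```

Real format migrations (word-processor files, images, databases) are far more involved, but the same idea of checking each generation against its predecessor applies.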


Emulation

• Description: Write software that mimics older hardware or software, tricking old programs into thinking they are running on their original platforms

• Pros: Data don't need to be altered

• Cons: Mimicking is seldom perfect; chains of emulators eventually break down
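To make the idea concrete, here is a toy emulator for an invented three-instruction accumulator machine. The instruction set (`LOAD`, `ADD`, `PRINT`) and the function name are assumptions for this sketch and do not correspond to any real platform. The point is only that the old program is run unchanged by new software.

```python
def run_legacy_program(program):
    """Interpret a program written for a hypothetical accumulator machine."""
    accumulator = 0
    output = []
    for op, arg in program:
        if op == "LOAD":       # put a constant into the accumulator
            accumulator = arg
        elif op == "ADD":      # add a constant to the accumulator
            accumulator += arg
        elif op == "PRINT":    # emit the accumulator; the argument is ignored
            output.append(accumulator)
        else:
            # An instruction the emulator does not mimic: emulation is imperfect.
            raise ValueError(f"unknown instruction: {op}")
    return output


# The "old program" is preserved exactly as written and never converted.
old_program = [("LOAD", 2), ("ADD", 3), ("PRINT", None)]
print(run_legacy_program(old_program))
```

The `ValueError` branch hints at the Cons above: any behaviour of the original platform that the emulator omits becomes a point of failure.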


Encapsulation

• Description: Encase digital data in physical and software "wrappers," showing future users how to reconstruct them

• Pros: Details of interpreting data are never separated from the data themselves

• Cons: Must build new wrappers for every new format and software release; works poorly for nontextual data
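A software wrapper can be sketched as a single record that carries the data together with instructions for interpreting them. The JSON layout and the field names below (`format`, `instructions`, `payload_base64`) are assumptions for illustration, not an established wrapper format.

```python
import base64
import json


def wrap(data: bytes, format_name: str, instructions: str) -> str:
    """Bundle the data with a description of how to interpret them."""
    return json.dumps({
        "format": format_name,
        "instructions": instructions,
        "payload_base64": base64.b64encode(data).decode("ascii"),
    }, indent=2)


def unwrap(wrapper: str):
    """Return the original bytes and the interpretation instructions."""
    record = json.loads(wrapper)
    return base64.b64decode(record["payload_base64"]), record["instructions"]


wrapper = wrap(
    b"id,name\n1,Viking\n",
    format_name="CSV",
    instructions="Comma-separated text; first line holds the column names.",
)
data, instructions = unwrap(wrapper)
print(instructions)
```

Because the instructions travel inside the wrapper, they cannot be separated from the data, which is the advantage listed under Pros.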


Universal Virtual Computer

• Description: Archive paper copies of specifications for a simple, software-defined decoding machine; save all data in a format readable by the machine

• Pros: Paper lasts for centuries; machine is not tied to specific hardware or software

• Cons: Difficult to distill specifications into a brief paper document


METADATA

• Commonly we refer to three levels of metadata: descriptive, structural and administrative. Descriptive metadata facilitates resource discovery, which enables the user to engage with the “conceptual” properties of the digital object. Administrative and structural metadata record the physical and logical properties of the digital object. Knowing the technical components of the object is thus essential for future use and rendering.
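The three levels can be pictured as three sections of one record. The sketch below is only an assumed layout: the key names, the file names, and the helper `missing_levels` are hypothetical and are not drawn from any metadata standard.

```python
REQUIRED_LEVELS = ("descriptive", "structural", "administrative")


def missing_levels(record: dict) -> list:
    """List the metadata levels that are absent or empty in a record."""
    return [level for level in REQUIRED_LEVELS if not record.get(level)]


record = {
    # Descriptive: supports discovery of the conceptual object.
    "descriptive": {
        "title": "DPI: The Digital Preservation Imperative",
        "creator": "David McKnight",
    },
    # Structural: how the parts fit together (hypothetical file names).
    "structural": {"files": ["slide01.tif", "slide02.tif"]},
    # Administrative: technical facts needed for future rendering.
    "administrative": {"format": "image/tiff", "date_created": "2003-10-02"},
}

print(missing_levels(record))
print(missing_levels({"descriptive": {"title": "Untitled"}}))
```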


Metadata (cont’d)

• Metalanguages: SGML and XML

• Text markup languages
– HTML
– EAD
– ETD
– TEI
– CIMI


METADATA ELEMENT SETS

• METADATA ELEMENT SETS
– DUBLIN CORE
– MARC 21
– GILS
– CSDGM
– METS
– MPEG 7/21
– OTHERS
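As one example from the list, a simple Dublin Core description can be serialized as XML with Python's standard library. The namespace URI is the Dublin Core element set namespace; the wrapping `metadata` element and the function name `dublin_core_record` are assumptions for this sketch, since Dublin Core itself does not mandate a particular container.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Serialize a flat dict of Dublin Core elements as an XML string."""
    root = ET.Element("metadata")  # hypothetical container element
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_record({
    "title": "DPI: The Digital Preservation Imperative",
    "creator": "David McKnight",
    "date": "2003-10-02",
})
print(xml_text)
```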


RIGHTS METADATA

• DOI: DIGITAL OBJECT IDENTIFIER

• URN: UNIFORM RESOURCE NAME

• PICS: PLATFORM FOR INTERNET CONTENT SELECTION


WRAPPER TECHNOLOGIES

• RDF: RESOURCE DESCRIPTION FRAMEWORK

• UPF: UNIVERSAL PRESERVATION FORMAT


OAIS MODEL

• One model that is gaining wide acceptance is OAIS, the Open Archival Information System. The OAIS standard is published as ISO 14721:2002. It provides managers of digital objects with a framework for interoperability that is based on six functional areas:


FUNCTIONAL COMPONENTS

• A producer provides a submission information package (SIP) to the Ingest entity.

• An archival information package (AIP) is created and delivered to Archival Storage.

• Related descriptive information is provided to Data Management.


FUNCTIONAL COMPONENTS (CONT’D)

• A consumer searches for and requests information using appropriate descriptive information and access aids.

• The appropriate AIP is retrieved from Archival Storage and transformed by the Access entity into the appropriate dissemination information package (DIP) for delivery to the consumer.

• Activities are carried out under the guidance of the Administration entity.
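The flow described in these bullets (SIP in, AIP stored, DIP out) can be sketched as a toy archive. The entity names follow the slides; the classes `InformationPackage` and `Archive`, the identifier scheme, and the sample content are assumptions for illustration. The Administration entity is not modelled.

```python
from dataclasses import dataclass


@dataclass
class InformationPackage:
    """Used here for SIPs, AIPs and DIPs alike, to keep the sketch short."""
    content: bytes
    description: dict
    identifier: str = ""


class Archive:
    def __init__(self):
        self.archival_storage = {}   # identifier -> AIP
        self.data_management = {}    # identifier -> descriptive information

    def ingest(self, sip: InformationPackage) -> str:
        # Ingest: build an AIP from the producer's SIP and store it.
        identifier = f"aip-{len(self.archival_storage) + 1}"
        aip = InformationPackage(sip.content, dict(sip.description), identifier)
        self.archival_storage[identifier] = aip
        self.data_management[identifier] = aip.description
        return identifier

    def search(self, **criteria) -> list:
        # Data Management: match a consumer's query against the descriptions.
        return [
            identifier
            for identifier, description in self.data_management.items()
            if all(description.get(k) == v for k, v in criteria.items())
        ]

    def access(self, identifier: str) -> InformationPackage:
        # Access: retrieve the AIP and derive a separate DIP for the consumer.
        aip = self.archival_storage[identifier]
        return InformationPackage(aip.content, dict(aip.description), identifier)


archive = Archive()
aip_id = archive.ingest(InformationPackage(b"thesis text", {"title": "Sample thesis"}))
hits = archive.search(title="Sample thesis")
dip = archive.access(hits[0])
print(dip.identifier, dip.description["title"])
```

In this sketch the DIP is a plain copy of the AIP; a real Access entity might also convert the content into a delivery format.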


OAIS FUNCTIONAL ENTITIES


HODGES’ PRINCIPLES OF BEST PRACTICES:

• The Digital Object Life Cycle Model

• Creation

• Acquisition and Collection Development

• Cataloguing / Identification

• Storage

• Preservation

• Standards and Interoperability

• Access


CREATION

• Determine the long-term value of the information and embed a "preservation indicator" into the object, providing future users with the creators' assessment of the long-term intellectual and informational value of the object.

• At the outset, pay attention to issues of consistency, format, standardisation, and metadata description.

• Metadata is critical to the longevity of the object and therefore becomes obligatory if the object is to survive. Standards are emerging that make this easier to implement at the creation stage.


ACQUISITION AND COLLECTION DEVELOPMENT

• Create digital object collection policies for both born-digital and digitized collections. One area where digital objects differ from print objects is legal deposit. Legal deposit valorizes and protects print material under current copyright law.

• It is essential to establish a hierarchy of intellectual value to assess the significance of the digital object.


CATALOGUING / IDENTIFICATION

• Identification provides a unique key for finding the object and linking that object to other related objects.

• Cataloguing in the form of metadata supports organization, access and curation.

• Metadata is key to interoperability, and such initiatives as the Open Archival Information System (OAIS) Reference Model reinforce the heterogeneous nature of the problem and the solution.

• Ensure long-term access through the adoption of PURLs (persistent URLs).
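The idea behind a persistent URL is a level of indirection: the published identifier stays fixed while the location it resolves to can change. The sketch below is an in-memory stand-in for a resolver; the class `PurlResolver`, the path `/collection/item-42`, and the example.org addresses are all hypothetical.

```python
class PurlResolver:
    """Maps persistent identifiers to whatever location is current."""

    def __init__(self):
        self._targets = {}

    def register(self, purl: str, target_url: str) -> None:
        # Registering again for the same PURL simply updates its target.
        self._targets[purl] = target_url

    def resolve(self, purl: str) -> str:
        return self._targets[purl]


resolver = PurlResolver()
resolver.register("/collection/item-42", "http://old-server.example.org/items/42.html")

# The object moves to a new system; citations of the PURL keep working.
resolver.register("/collection/item-42", "http://new-server.example.org/objects/42")
print(resolver.resolve("/collection/item-42"))
```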


STORAGE

• Storage media and formats are constantly changing. Block sizes, tape sizes, tape drive mechanisms and operating systems have all changed over time.

• Migration rate: every three to five years to new storage media.

• Issue: Data loss and quality are problems that must be monitored very carefully.
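One common way to monitor data loss when copying to new media is to record a checksum before the move and compare it afterwards. The sketch below uses SHA-256 from Python's standard library; the choice of hash, the helper names, and the sample bytes are assumptions for illustration rather than a practice stated in the slides.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Return a SHA-256 checksum to record alongside the stored object."""
    return hashlib.sha256(data).hexdigest()


def copy_is_intact(recorded_checksum: str, copy: bytes) -> bool:
    """Compare a copy on new media against the checksum recorded earlier."""
    return fixity(copy) == recorded_checksum


master = b"archival master file contents"
recorded = fixity(master)

good_copy = bytes(master)          # a faithful copy on the new medium
bad_copy = master[:-1] + b"X"      # a copy with one corrupted byte

print(copy_is_intact(recorded, good_copy), copy_is_intact(recorded, bad_copy))
```

A checksum detects that a copy has changed; it does not repair the copy, so a second good copy is still needed for recovery.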


PRESERVATION

• "Preservation is the aspect of archival management that preserves the content as well as the look and feel of the digital object. While the study showed that there is no common agreement on the definition of long-term preservation, the time frame can be thought of as long enough to be concerned about changes in technology and changes in the user community. Depending on the particular technologies and subject disciplines involved...the estimated cycle for hardware/software migration is two to ten years."

• Issues: Developing a planned cycle of migration to new versions of software is essential to avoid the problem of backward compatibility.


STANDARDS AND INTEROPERABILITY

• Formats are decreasing in number and there is a greater awareness of the economic and "political" benefits of standardisation and of the adoption of an acknowledged set of interchangeable file formats and tagging languages, especially where text and images are concerned. Hence the wide adoption of SGML, XML, HTML and PDF for texts, TIFF and JPEG formats for images, and MPEG in the realm of sound and video.


ACCESS

• Access Mechanisms

• Rights Management

• Security Requirements


RESOURCES

• ART MUSEUM IMAGE CONSORTIUM (AMICO)

• CANADIAN CONSERVATION INSTITUTE

• CANADIAN INITIATIVE ON DIGITAL LIBRARIES (CIDL)

• COALITION FOR NETWORKED INFORMATION (CNI)

• COUNCIL ON LIBRARY AND INFORMATION RESOURCES (CLIR)


RESOURCES (CONT’D)

• DIGITAL LIBRARY FEDERATION (DLF)

• INSTITUTE OF MUSEUM AND LIBRARY SERVICES (IMLS)

• LIBRARY AND ARCHIVES OF CANADA

• ONLINE COMPUTER LIBRARY CENTER (OCLC)


SAMPLE INITIATIVES

• InterPARES (Preservation Task Force of the International Research on Permanent Authentic Records in Electronic Systems)

• International DOI Foundation

• Preserving Access to Digital Information (PADI)

• Preserving and Accessing Networked Documentary Resources of Australia (PANDORA)


CONCLUSION: A PRIMER ON DIGITAL PRESERVATION IMPERATIVES:

• Assume responsibility for your digital objects at the moment of creation

• Be proactive within the digital preservation community

• Create digital content based on the dual principles of interoperability and common standards

• Draft a digital preservation policy based on the life cycle of the object

• Educate your administrators on the importance of digital preservation


PRIMER….

• Find champions and stakeholders

• Give credence to the probability that data files created today will be unreadable tomorrow

• Heed proprietary solutions

• Invest in the long term access to your digital assets

• Just do it!
