National Geospatial Digital Archive

Greg Janée

Greg Janée • May 31, 2005 2

Outline

• Two preservation misadventures
• Digital preservation problems
• Genesis of NGDA
• Project approach/philosophy
• What it will mean to be a data provider

Domesday Book— 1086

Domesday Book— 1986

Meanwhile, back at NASA...

• 1976
  – Viking probes go to Mars
• 1999
  – USC neurobiologist Joseph Miller asks for data
  – tapes coded “in a format so old that the programmers who knew it had died”
  – works off of paper records

Preservation issues

• Physical
  – media
  – systems
• Contextual
  – format
  – semantics
  – authenticity
• Legal
  – copyright

Project genesis

• NDIIPP
  – Library of Congress, 2000
  – $100M
  – http://www.digitalpreservation.gov/
• NGDA
  – UCSB (MIL) & Stanford (Branner Library)
  – $2.6M, 3 years
  – archive geospatial data on a national scale
  – http://www.ngda.org/

Some philosophy

• Archival has to be cheap & easy
  – must be distributed
  – but reality is little incentive, no funding
• Archive definition:
  – offers access now & in the future
  – no mandatory services beyond simple access
• Policy separated from mechanism
• Archive includes data semantics
  – key differentiator from text, audio, video

Philosophy, cont.

• Curatorial, not archeological approach
  – assumption: content comes in discrete, self-contained chunks
• Preservation by format definition, archival & association
  – support for derivative forms, services
• Must support long-term preservation
  – need to migrate archive itself

MIT Media Lab
Stewart Brand, “How Buildings Learn,” p. 53

MIT Building 20
Ibid., p. 26

Typical repository architecture

[Diagram. Labels: system; storage; database; handle resolver. Annotation: “fragile”]

NGDA architecture

[Diagram. Labels: storage subsystem; standard, public data model; archival system; databases, caches, etc.; bulk loader; ingest; ADL, OAI, Web; access]

Post-NGDA architecture

[Diagram. Labels: storage subsystem; standard, public data model; Web]

Storage system requirements

• Req’s:
  – associate UUIDs/RIDs with bitstreams
  – retrieve global/local bitstream by UUID/RID
  – determine (parent) UUID of any bitstream
  – list all UUIDs
• Satisfied by:
  – any filesystem
  – tag URIs for UUIDs
    • tag:library.ucsb.edu,2005:identifier
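
The four requirements above can be sketched on an ordinary filesystem. This is a minimal illustration, not NGDA's implementation: the class name `FilesystemStore`, its method names, and the one-directory-per-UUID layout are all assumptions, as is reading an RID as a simple file name local to its parent object (the slides do not expand the acronym). Only the tag URI prefix comes from the slide.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote, unquote

# Tag URI prefix in the form shown on the slide; the suffix is the identifier.
TAG_PREFIX = "tag:library.ucsb.edu,2005:"


class FilesystemStore:
    """Hypothetical store: one directory per UUID, one file per RID."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _object_dir(self, uuid):
        # Percent-encode the tag URI so it is safe as a single directory name.
        return self.root / quote(uuid, safe="")

    def put(self, uuid, rid, data):
        """Associate a (UUID, RID) pair with a bitstream; return its path."""
        directory = self._object_dir(uuid)
        directory.mkdir(exist_ok=True)
        path = directory / rid  # assumes rid contains no path separators
        path.write_bytes(data)
        return path

    def get(self, uuid, rid):
        """Retrieve a bitstream by its global UUID and local RID."""
        return (self._object_dir(uuid) / rid).read_bytes()

    def parent_uuid(self, path):
        """Determine the parent UUID of a stored bitstream from its path."""
        return unquote(Path(path).parent.name)

    def list_uuids(self):
        """List every UUID known to the store."""
        return sorted(unquote(p.name) for p in self.root.iterdir() if p.is_dir())


# Usage against a throwaway directory; "example-object" is a made-up identifier.
with tempfile.TemporaryDirectory() as tmp:
    demo_store = FilesystemStore(tmp)
    demo_uuid = TAG_PREFIX + "example-object"
    demo_path = demo_store.put(demo_uuid, "x.tiff", b"raster bytes")
    demo_parent = demo_store.parent_uuid(demo_path)
    demo_listing = demo_store.list_uuids()
```

Because the identifier-to-bitstream mapping lives entirely in directory and file names, nothing in this sketch depends on a database or resolver service, which is the property the "any filesystem" bullet points at.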

Archival objects

[Diagram. Labels: directory, UUID; component, RID; UUID]

Example

[Diagram. Labels: USGS DOQQ; GeoTIFF; FGDC; Object x; x.tiff; x.fgdc; x.gif; metadata (twice); derived; TIFF; subtypeOf]
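
One way to read the example object is as a set of components plus typed relationships. The sketch below is a guess at that structure: the component names, the relation names (metadata, derived, subtypeOf), and the GeoTIFF/TIFF/FGDC formats come from the slide, while which component each relation connects, the "GIF" format name, the object identifier, and all class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Format definitions linked by "subtypeOf" (GeoTIFF -> TIFF appears on the slide).
SUBTYPE_OF = {"GeoTIFF": "TIFF"}


@dataclass
class ArchivalObject:
    uuid: str
    # RID -> format name, e.g. "x.tiff" -> "GeoTIFF"
    components: dict = field(default_factory=dict)
    # (subject RID, relation, object RID) triples
    relations: list = field(default_factory=list)

    def related(self, relation, target):
        """RIDs that stand in `relation` to the component `target`."""
        return [s for (s, r, o) in self.relations if r == relation and o == target]


def format_chain(name):
    """Follow subtypeOf links from a format up to its most general form."""
    chain = [name]
    while chain[-1] in SUBTYPE_OF:
        chain.append(SUBTYPE_OF[chain[-1]])
    return chain


object_x = ArchivalObject(
    uuid="tag:library.ucsb.edu,2005:x",  # hypothetical identifier
    components={"x.tiff": "GeoTIFF", "x.fgdc": "FGDC", "x.gif": "GIF"},  # "GIF" assumed
    relations=[
        ("x.fgdc", "metadata", "x.tiff"),  # assumed direction of the edge
        ("x.gif", "derived", "x.tiff"),    # assumed direction of the edge
    ],
)

tiff_formats = format_chain(object_x.components["x.tiff"])
derivatives = object_x.related("derived", "x.tiff")
```

Here `format_chain` shows why archiving format definitions alongside the data matters: a reader that understands only plain TIFF can still discover that the GeoTIFF component is one.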

Object types

• Data, other content
• Format definitions
• Semantic definitions
• Providers
• Organizational structures
  – collection
  – series
  – ingest session

Archive-provider agreement

• Defines
  – common structure of objects to be ingested
  – necessary validations
  – associations to other objects
    • assumes pre-loading of semantic definitions
  – policies, rights, etc.
• Represents choke point
  – requires human evaluation
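
The mechanical part of such an agreement could be sketched as a record that lists the expected object structure and checks each incoming object against it. Everything named here (`Agreement`, its fields, the suffix-based structure check) is hypothetical. The slides state only what an agreement defines, and they note that drawing one up requires human evaluation, which no code replaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    provider: str
    # Common structure: each object must contain one component per suffix.
    required_suffixes: tuple
    # Format and semantic definitions already loaded into the archive.
    known_definitions: frozenset

    def problems(self, components, associations):
        """Return a list of problems; an empty list means the object passes."""
        found = []
        for suffix in self.required_suffixes:
            if not any(name.endswith(suffix) for name in components):
                found.append(f"missing component with suffix {suffix}")
        for target in associations:
            if target not in self.known_definitions:
                found.append(f"association to unknown definition {target}")
        return found


# Hypothetical agreement for a provider of GeoTIFF imagery with FGDC metadata.
agreement = Agreement(
    provider="USGS",
    required_suffixes=(".tiff", ".fgdc"),
    known_definitions=frozenset({"GeoTIFF", "FGDC"}),
)

ok = agreement.problems(["x.tiff", "x.fgdc"], ["GeoTIFF", "FGDC"])
bad = agreement.problems(["x.tiff"], ["GeoTIFF", "NetCDF"])
```

The check on `known_definitions` mirrors the slide's point that associations assume the relevant definitions were loaded beforehand: an object that refers to a definition the archive has never seen is rejected at ingest rather than stored with a dangling reference.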