Bill Roberts, PresDB 07 Database Preservation: A success story and an unsolved problem Bill Roberts 23 March 2007 PresDB, Edinburgh




Digital preservation: why is it hard?

PEBKAC: problem exists between keyboard and chair


[Diagram: data life cycle]

Stage 1: Creation and free use
Stage 2: Controlled use
Stage 3: Active Preservation

Components, in order: Data Creator; Operational System (Working Files); Records Management System (Locked Files); Archiving System (Preserved Files); User

Labels: Me, Them


Databases: what to preserve?

• Contents of tables: the data

• Structure

• Semantics

• Context

• Business/scientific process

www.digitaleduurzaamheid.nl/bibliotheek/docs/volatility-permanence-databases-en.pdf

OAIS representation information
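A minimal sketch of the first two items (table contents and structure), assuming a SQLite source and CSV plus DDL text as the open target formats. The table, column names and the `semantics` note are hypothetical. Semantics, context and process cannot be read out of the database, so the sketch only leaves a slot where a curator would record them.

```python
import csv
import io
import json
import sqlite3


def export_database(conn):
    """Return {table: {"ddl", "columns", "csv"}} for every user table."""
    package = {}
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for name, ddl in tables:
        cursor = conn.execute(f'SELECT * FROM "{name}"')
        columns = [d[0] for d in cursor.description]  # structure: column names
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(cursor)  # contents: one CSV row per table row
        package[name] = {"ddl": ddl, "columns": columns, "csv": buffer.getvalue()}
    return package


# Hypothetical example database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, label TEXT)")
conn.execute("INSERT INTO sample VALUES (1, 'first')")

package = export_database(conn)
# Semantics are supplied by a person; the database does not hold them.
package["sample"]["semantics"] = {"label": "free-text name given by the data creator"}
print(json.dumps(package, indent=2))
```

The exported DDL and column list are a small part of what OAIS calls representation information; the hand-written semantics entry marks where the rest would have to be added.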


JET data preservation

• Similar experimental processes repeated many times since 1983
• Well defined format for processed data
• 2000: migration from IBM mainframe to Unix (~8 TB)
• New NetCDF/XDR file format + relational metadata database
• Old API still supported

All data still accessible

Fusion Engineering and Design, Volume 60, Issue 3, June 2002, 333-339. Richard Layne and Martin Wheatley
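The combination of a new file format, a relational metadata database and an unchanged old API can be pictured roughly as below. This is a sketch under stated assumptions, not the JET implementation: the function names, the `signal` table and the in-memory `file_store` are all hypothetical. Values are packed as big-endian 64-bit floats, which is the layout XDR uses for doubles.

```python
import sqlite3
import struct

# New-style storage: a relational metadata database plus binary payloads.
# A dict stands in for the file store; every name here is hypothetical.
metadata = sqlite3.connect(":memory:")
metadata.execute(
    "CREATE TABLE signal (pulse INTEGER, name TEXT, location TEXT, "
    "n_values INTEGER, PRIMARY KEY (pulse, name))"
)
file_store = {}


def store_signal(pulse, name, values):
    """Write values as big-endian doubles and index them in the metadata DB."""
    location = f"{pulse}/{name}.bin"
    file_store[location] = struct.pack(f">{len(values)}d", *values)
    metadata.execute(
        "INSERT INTO signal VALUES (?, ?, ?, ?)",
        (pulse, name, location, len(values)),
    )


def read_signal(pulse, name):
    """Legacy-style call: same arguments as before, new storage underneath."""
    row = metadata.execute(
        "SELECT location, n_values FROM signal WHERE pulse = ? AND name = ?",
        (pulse, name),
    ).fetchone()
    if row is None:
        raise KeyError((pulse, name))
    location, n_values = row
    return list(struct.unpack(f">{n_values}d", file_store[location]))


store_signal(1, "current", [0.0, 1.5, 3.0])
print(read_signal(1, "current"))
```

Callers of `read_signal` never see the storage layout, so the format underneath can change again without breaking existing analysis code.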


Why a success?

• Single organisation
• Small number of formats
• Carefully designed from the start
• Continuously managed
• Still in active use
• Data curators part of user community

Me Them


Multinational company data

• Regulatory
• IP protection
• Litigation
• Knowledge

• Office documents
• Instrument data
• Records of experiments
• Analysed data
• Regulatory submissions
• Lab notebooks

Mostly in relational databases


“Easy vs Hard”

Easy:
• Few activities
• Consistent approach
• Control of data formats
• Standardisation
• Record of data

Hard:
• Many activities
• Rapid changes of science, technology, methods, formats, management
• Formats driven externally
• Freedom to innovate
• Trail of analysis and basis of decisions


Solutions?


Active Preservation

Storage

Archive Management

Workflow Automation

Characterisation tools

Preservation action tools

Planning tools

Testbed


Data silos

Representation information!

‘Merge’ the silos:

• Interoperability now between groups

• Interoperability between now and future


RECOMMENDATIONS:

• Design for data interoperability and re-use

• Consider whole life-cycle cost

• Automate metadata harvesting

• Make it easy for data creators to do the right thing
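As an illustration of automated metadata harvesting, the sketch below collects the technical metadata that can be derived from a file without asking its creator anything. The function name, the field names and the example file are assumptions made for this sketch; a real archive would follow its own metadata schema.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def harvest_metadata(path):
    """Collect technical metadata that can be derived from the file alone."""
    stat = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "name": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified_utc": modified.isoformat(),
        "sha256": digest.hexdigest(),
    }


# Hypothetical example file, created only for this demonstration.
with tempfile.TemporaryDirectory() as folder:
    example = os.path.join(folder, "results.csv")
    with open(example, "wb") as handle:
        handle.write(b"a,b\n1,2\n")
    record = harvest_metadata(example)
    print(record)
```

Running something like this at the point where a file is saved or locked costs the data creator nothing, which is one way of making the right thing the easy thing.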


[email protected]

www.tessella.com