Bill Roberts, PresDB 07
Database Preservation: A success story and an unsolved problem
Bill Roberts
23 March 2007
PresDB, Edinburgh
Digital preservation: why is it hard?
PEBKAC:
Data Creator
• Stage 1: Creation and free use (Operational System, Working Files)
• Stage 2: Controlled use (Records Management System, Locked Files)
• Stage 3: Active Preservation (Archiving System, Preserved Files)
User
Me Them
Databases: what to preserve?
• Contents of tables: the data
• Structure
• Semantics
• Context
• Business/scientific process
www.digitaleduurzaamheid.nl/bibliotheek/docs/volatility-permanence-databases-en.pdf
OAIS representation information
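The first two items in the list above (table contents and structure) can be captured mechanically. The following is a minimal sketch, assuming an SQLite database and Python's standard `sqlite3` module; the function names `export_database` and `demo` and the table `shot` are hypothetical and not from the talk. Semantics, context and the business or scientific process are not captured by this kind of export and would still have to be documented separately as representation information.

```python
import json
import sqlite3


def export_database(conn):
    """Return the structure (DDL, column names) and contents of every user table."""
    snapshot = {}
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for name, ddl in tables:
        # Quote the identifier so unusual table names are handled safely.
        quoted = '"' + name.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        snapshot[name] = {
            "ddl": ddl,  # structure, as the original CREATE TABLE statement
            "columns": [d[0] for d in cursor.description],
            "rows": [list(row) for row in cursor.fetchall()],  # the data
        }
    return snapshot


def demo():
    """Build a tiny in-memory database (hypothetical example data) and export it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shot (id INTEGER PRIMARY KEY, label TEXT)")
    conn.executemany("INSERT INTO shot (label) VALUES (?)", [("a",), ("b",)])
    conn.commit()
    return export_database(conn)


if __name__ == "__main__":
    # A plain-text, software-independent rendering of data plus structure.
    print(json.dumps(demo(), indent=2))
```

Writing the result as JSON is one arbitrary choice of neutral format; the point is only that data and structure leave the database together.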
JET data preservation
• Similar experimental processes repeated many times since 1983
• Well defined format for processed data
• 2000: IBM mainframe to Unix (~8 TB)
• New NetCDF/XDR file format + relational metadata database
• Old API still supported
All data still accessible
Fusion Engineering and Design, Volume 60, Issue 3, June 2002, 333-339. Richard Layne and Martin Wheatley
Why a success?
• Single organisation
• Small number of formats
• Carefully designed from the start
• Continuously managed
• Still in active use
• Data curators part of user community
Me Them
Multinational company data
• Regulatory
• IP protection
• Litigation
• Knowledge

• Office documents
• Instrument data
• Records of experiments
• Analysed data
• Regulatory submissions
• Lab notebooks
Mostly in relational databases
“Easy vs Hard”
Easy:
• Few activities
• Consistent approach
• Control of data formats
• Standardisation
• Record of data

Hard:
• Many activities
• Rapid changes of science, technology, methods, formats, management
• Formats driven externally
• Freedom to innovate
• Trail of analysis and basis of decisions
Solutions?
Active Preservation
Storage
Archive Management
Workflow Automation
Characterisation tools
Preservation action tools
Planning tools
Testbed
Data silos
Representation information!
‘Merge’ the silos:
• Interoperability now between groups
• Interoperability between now and future
RECOMMENDATIONS:
• Design for data interoperability and re-use
• Consider whole life-cycle cost
• Automate metadata harvesting
• Make it easy for data creators to do the right thing
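As one illustration of "Automate metadata harvesting", the sketch below walks a directory and records basic technical metadata for each file without asking the data creator for anything. It is a minimal sketch under stated assumptions, not a tool described in the talk: the function names `harvest_metadata` and `demo`, the choice of fields, and the use of SHA-256 as the fixity checksum are all assumptions made for illustration.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def harvest_metadata(root):
    """Collect path, size, modification time and checksum for every file under root."""
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        records.append({
            "path": path.relative_to(root).as_posix(),
            "size_bytes": stat.st_size,
            "modified_utc": datetime.fromtimestamp(
                stat.st_mtime, tz=timezone.utc
            ).isoformat(),
            # Reads the whole file into memory; fine for a sketch,
            # large files would need chunked hashing.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return records


def demo():
    """Harvest metadata from a temporary directory holding one small file."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "notes.txt").write_bytes(b"hello")
        return harvest_metadata(tmp)


if __name__ == "__main__":
    for record in demo():
        print(record)
```

Run on a schedule or at the point of deposit, a harvester of this kind costs the data creator nothing, which is in the spirit of the last recommendation.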