ITHAKA Preservation Metadata 2.0: Revising the Event Model
A last-minute presentation on work currently in progress
Evan Owens
VP, Content Management
ITHAKA (JSTOR / Portico)
[email protected]
Background
•Portico Preservation Metadata designed & implemented in 2002-2003
– Inspired by PREMIS working group participation
– Operational before PREMIS was completed!
•Portico Archive as of October 2009
– >14 Million E-Journal Articles plus other content
– ~150 Million Files
– ~1 Billion Events
– Only ~1K manual events; >99.999% system-generated
– Over 1 TB of Preservation Metadata
•Portico / JSTOR / Ithaka merger in 2009
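The October 2009 figures above imply a few useful ratios. The sketch below simply recomputes them from the rounded slide numbers (treating "Over 1 TB" as exactly 10^12 bytes), so the results are order-of-magnitude estimates, not measured values. It gives about 6.7 events per file and about 1 KB of preservation metadata per event. Because the 1 TB covers all preservation metadata, not only event records, the per-event figure is an average across everything. The 1K manual events out of ~1 billion work out to a system-generated share of about 99.9999%.

```python
# Back-of-the-envelope ratios from the October 2009 archive figures.
# Inputs are the approximate, rounded numbers quoted on the slide.
FILES = 150_000_000          # "~150 Million Files"
EVENTS = 1_000_000_000       # "~1 Billion Events"
MANUAL_EVENTS = 1_000        # "Only 1K manual events"
PMD_BYTES = 10**12           # "Over 1 TB" of preservation metadata, taken as 10^12 bytes


def archive_ratios():
    """Return derived ratios; PMD bytes are averaged over all events."""
    return {
        "events_per_file": EVENTS / FILES,
        "manual_fraction": MANUAL_EVENTS / EVENTS,
        "system_generated_pct": 100 * (1 - MANUAL_EVENTS / EVENTS),
        "pmd_bytes_per_event": PMD_BYTES / EVENTS,
    }


for name, value in archive_ratios().items():
    print(f"{name}: {value:.6g}")
```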
PMD 2.0 Revision Project
•Begun in 2008; implementation now underway
•Design Goals for Revision to Events:
– Consistent editorial/coding practices (capitalization, verb tenses, etc.)
– Clarify which event goes with which object, and why
– Eliminate redundant information where possible
– Make explicit all data constraints not currently expressed in our schemas
– Synchronize event metadata with the high-level preservation metadata so that events properly document changes in the core metadata
– Establish a clean baseline for future expansion of events metadata
PMD 2.0 Design Choices
•Use our own data model / information architecture
– Optimized for Java, Oracle, and XML instantiations
– XML designed to reduce future versioning:
• XSD schema for frame (syntax) only
• All business rules (semantics) expressed in Schematron
– Not METS, not DIDL, not PREMIS XML
– PREMIS compliant
•Optimized for size and speed
– Fully relationally normalized
– Inheritable attributes / metadata
– Events attached to objects
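The split between an XSD "frame" for syntax and Schematron for business rules can be illustrated with a small stand-in. The sketch below uses only Python's standard library rather than a real XSD validator or Schematron engine. All element names (`objectRef`, `dateTime`, `outcome`) and rules are hypothetical, not the actual PMD 2.0 vocabulary. The point is that new business rules can be added to the rule list without touching the structural frame, which is the versioning benefit the slide describes.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical event serialization; names are illustrative only.
SAMPLE = """<event type="Check" subject="Fixity">
  <objectRef>file-0001</objectRef>
  <dateTime>2009-10-01T12:00:00</dateTime>
  <outcome>pass</outcome>
</event>"""

# Layer 1: the frame (syntax). An XSD would normally enforce this;
# here we only check that the required child elements exist.
REQUIRED_CHILDREN = ("objectRef", "dateTime", "outcome")


def check_frame(event):
    return [f"missing <{name}>" for name in REQUIRED_CHILDREN
            if event.find(name) is None]


def _is_iso_datetime(text):
    try:
        datetime.fromisoformat(text)
        return True
    except (TypeError, ValueError):
        return False


# Layer 2: business rules (semantics), kept apart from the frame in the
# spirit of Schematron assertions. Each rule: (id, predicate, message).
RULES = [
    ("outcome-values",
     lambda e: e.findtext("outcome") in {"pass", "fail"},
     "outcome must be 'pass' or 'fail'"),
    ("datetime-format",
     lambda e: _is_iso_datetime(e.findtext("dateTime")),
     "dateTime must be an ISO 8601 timestamp"),
    ("object-ref",
     lambda e: (e.findtext("objectRef") or "").strip() != "",
     "event must reference an object"),
]


def check_rules(event):
    return [f"{rule_id}: {msg}" for rule_id, ok, msg in RULES if not ok(event)]


def validate(xml_text):
    event = ET.fromstring(xml_text)
    errors = check_frame(event)
    # Evaluate semantic rules only once the frame is sound.
    return errors if errors else check_rules(event)
```

For example, `validate(SAMPLE)` returns an empty list, while changing the outcome to an unknown value reports only the `outcome-values` rule.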
Processing Record: “master” record for each processing pass
Brings together information common to all events from a given processing pass (e.g., initial ingest, a future migration)
Not a real event!
[Figure: example XML serialization showing all possible child elements, to illustrate the information model]
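One way to picture the processing record together with "fully relationally normalized" and "inheritable attributes" is a small relational sketch. The schema below is hypothetical (table and column names are invented, and SQLite stands in for Oracle). Pass-level data such as the pass type, agent, and start time is stored once per processing record. Each event row holds only what is specific to it and picks up the shared attributes through a join, exposed here as a view.

```python
import sqlite3

# Hypothetical normalized schema: per-pass data lives once in
# processing_record; each event stores only its own attributes.
SCHEMA = """
CREATE TABLE processing_record (
    id        INTEGER PRIMARY KEY,
    pass_type TEXT NOT NULL,   -- e.g. 'initial ingest', 'migration'
    agent     TEXT NOT NULL,   -- system that ran the pass
    started   TEXT NOT NULL    -- ISO 8601 timestamp
);
CREATE TABLE archival_object (
    id TEXT PRIMARY KEY
);
CREATE TABLE event (
    id        INTEGER PRIMARY KEY,
    record_id INTEGER NOT NULL REFERENCES processing_record(id),
    object_id TEXT NOT NULL REFERENCES archival_object(id),
    verb      TEXT NOT NULL,   -- e.g. 'Check'
    subject   TEXT NOT NULL,   -- e.g. 'Fixity'
    outcome   TEXT
);
-- Events read back together with the pass-level attributes they inherit.
CREATE VIEW event_full AS
SELECT e.id, e.object_id, e.verb, e.subject, e.outcome,
       p.pass_type, p.agent, p.started
FROM event e JOIN processing_record p ON p.id = e.record_id;
"""


def build_demo():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO processing_record VALUES "
               "(1, 'initial ingest', 'ingest-system', '2009-10-01T12:00:00')")
    db.executemany("INSERT INTO archival_object VALUES (?)",
                   [("file-0001",), ("file-0002",)])
    db.executemany(
        "INSERT INTO event (record_id, object_id, verb, subject, outcome) "
        "VALUES (1, ?, ?, ?, ?)",
        [("file-0001", "Check", "Virus", "pass"),
         ("file-0001", "Check", "Fixity", "pass"),
         ("file-0002", "Characterize", "File", "pass")])
    return db
```

Querying `event_full` returns every event with the same `pass_type`, `agent`, and `started` values, even though those values are stored only once.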
Event Types
•Check: Virus, Fixity, …
•Characterize: File, …
•Generate: Desc. MD, Tech. MD, Fixity, …
•Edit: Desc. MD, …
•Set: Status, Format, Preservation Level, …
•Ingest: into Archive
•Add, Create, Remove: File
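The event types follow a verb-plus-subject pattern. Combined with the design goal of consistent capitalization, this suggests a controlled vocabulary that normalizes labels at creation time. The sketch below is a hypothetical encoding of that idea. The verbs and subjects are taken from the slide, but the slide marks most lists as open-ended ("…"), so these sets are deliberately incomplete and would need the real vocabulary.

```python
from enum import Enum


class Verb(Enum):
    CHECK = "Check"
    CHARACTERIZE = "Characterize"
    GENERATE = "Generate"
    EDIT = "Edit"
    SET = "Set"
    INGEST = "Ingest"
    ADD = "Add"
    CREATE = "Create"
    REMOVE = "Remove"


# Subjects listed on the slide; incomplete by design (the slide says "…").
SUBJECTS = {
    Verb.CHECK: {"Virus", "Fixity"},
    Verb.CHARACTERIZE: {"File"},
    Verb.GENERATE: {"Desc. MD", "Tech. MD", "Fixity"},
    Verb.EDIT: {"Desc. MD"},
    Verb.SET: {"Status", "Format", "Preservation Level"},
    Verb.INGEST: {"Archive"},
    Verb.ADD: {"File"},
    Verb.CREATE: {"File"},
    Verb.REMOVE: {"File"},
}


def event_type(verb: str, subject: str) -> str:
    """Return a canonical 'Verb Subject' label, or raise ValueError."""
    v = Verb(verb.strip().capitalize())  # unknown verbs raise ValueError
    canonical = {s.lower(): s for s in SUBJECTS[v]}
    try:
        s = canonical[subject.strip().lower()]
    except KeyError:
        raise ValueError(f"{v.value} does not accept subject {subject!r}") from None
    return f"{v.value} {s}"
```

For instance, `event_type(" SET ", "preservation level")` yields `"Set Preservation Level"`, so differently typed inputs collapse to one spelling.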
Mapping PMD 2.0 to PREMIS
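The mapping table from this slide is not reproduced here. As a rough illustration of what such a mapping might involve, the sketch below exports one internal event to an element shaped like a PREMIS 2 `event`, using the standard PREMIS event semantic units (`eventIdentifier`, `eventType`, `eventDateTime`, `eventDetail`, `eventOutcomeInformation`, `linkingAgentIdentifier`, `linkingObjectIdentifier`). The input dictionaries, the identifier type `"local"`, and the field choices are all hypothetical. The PREMIS namespace and wrapper are omitted for brevity. One design point is that a standalone PREMIS event cannot inherit from a processing record, so the shared attributes are copied in at export time. An event-level timestamp is used if present, otherwise the pass start time.

```python
import xml.etree.ElementTree as ET


def _sub(parent, tag, text=None):
    el = ET.SubElement(parent, tag)
    if text is not None:
        el.text = text
    return el


def to_premis_event(event, record):
    """Map hypothetical internal dicts to a PREMIS-2-shaped <event>."""
    root = ET.Element("event")
    ident = _sub(root, "eventIdentifier")
    _sub(ident, "eventIdentifierType", "local")
    _sub(ident, "eventIdentifierValue", event["id"])
    _sub(root, "eventType", f'{event["verb"]} {event["subject"]}')
    # Inherited attribute: fall back to the processing record's start time.
    _sub(root, "eventDateTime", event.get("dateTime", record["started"]))
    _sub(root, "eventDetail", record["pass_type"])
    outcome = _sub(root, "eventOutcomeInformation")
    _sub(outcome, "eventOutcome", event["outcome"])
    agent = _sub(root, "linkingAgentIdentifier")
    _sub(agent, "linkingAgentIdentifierType", "local")
    _sub(agent, "linkingAgentIdentifierValue", record["agent"])
    obj = _sub(root, "linkingObjectIdentifier")
    _sub(obj, "linkingObjectIdentifierType", "local")
    _sub(obj, "linkingObjectIdentifierValue", event["object_id"])
    return root
```

`ET.tostring(to_premis_event(ev, rec), encoding="unicode")` serializes the result for inspection.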
Observations
•Large-scale automated events feel very different from human events
•ITHAKA archive will quadruple in 2010
– Likely 3-5 billion events...
•Every bit of metadata has to be justified by need
•Events have proved their value
– An entire talk on that subject alone
•Nothing is easy in quantities of billions
•We still have to work on full lifecycle events
•THIS IS STILL A WORK IN PROGRESS!
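The 3-5 billion event projection can be turned into rough storage and throughput numbers. The sketch rests on two assumptions that the slides do not state: the ~1 KB of preservation metadata per event seen in October 2009 stays constant, and the new events arrive evenly over one calendar year. Under those assumptions, 4 billion events (a quadrupling) means about 4 TB of metadata and roughly 95 new events per second, sustained around the clock.

```python
# Rough projection for the "3-5 billion events" estimate.
# Assumptions (not from the slides): ~1 KB of PMD per event, as in
# October 2009, and new events accrue evenly over one calendar year.
BYTES_PER_EVENT = 1_000
SECONDS_PER_YEAR = 365 * 24 * 3600


def projection(total_events, current_events=1_000_000_000):
    new_events = total_events - current_events
    return {
        "metadata_tb": total_events * BYTES_PER_EVENT / 10**12,
        "new_events_per_second": new_events / SECONDS_PER_YEAR,
    }


for total in (3_000_000_000, 4_000_000_000, 5_000_000_000):
    p = projection(total)
    print(f"{total:,} events: ~{p['metadata_tb']:.0f} TB, "
          f"~{p['new_events_per_second']:.0f} new events/s")
```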