Page 1
Content from the Library of Congress DPOE Baseline Modules, version 2.0, Nov 2011
Jody L. DeRidder, University of Alabama Libraries, jlderidder@ua.edu
July 16, 2012
An Introduction to Digital Preservation
Page 2
DPOE Modules
Identify - what digital content do you have?
Select - what portion of that content will be preserved?
Store - what issues are there for long term storage?
Protect - what steps are needed to protect your digital content?
Manage - what provisions are needed for long-term management?
Provide - what considerations are there for long-term access?
DPOE Baseline Modules: Intro, version 2.0, Nov 2011
Page 3
identify
select
store
protect
manage
provide
Managing Content Over Time
Page 4
Why do we identify content?
SCOPE!
DPOE Baseline Modules: Identify, version 2.0, Nov 2011
Image from: http://en.wikipedia.org/wiki/Telescopic_sight
• Preservation requires an explicit commitment of resources
• Effective planning is based on knowing the extent of what will be preserved
• Identifying content is a first step to planning for current and future preservation needs
• An explicit inventory is the best way to identify content
Page 5
Content Categories
Inventories should include all relevant material:
• Institutional records
• Special collections
• Scholarly content – licensed and open
• Research data
• Web content
• Digitized collections
Page 6
Example entry:
• Category: Special Collections
• Title/Description: Railroad Photographs, SE U.S.
• Type: images, digitized
• Format: TIFF
• Extent: 242 GB; 2,250 images
• Location: archival server in Room A, Central IT
• Coverage Dates: early 1900s
• Creation date: January-June 2006
• Inventoried: 12/15/2011, by Fred Jones
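The example entry above could be captured as a structured record so that inventories can be sorted, totaled, and exported. The following is only a sketch: the class name `InventoryEntry`, its field names, and the helper `average_item_mb` are hypothetical and not part of the DPOE modules, and the size calculation assumes 1 GB = 1024 MB.

```python
from dataclasses import dataclass


@dataclass
class InventoryEntry:
    """One line of a content inventory (field names are illustrative)."""
    category: str
    title: str
    content_type: str
    file_format: str
    extent_gb: float
    item_count: int
    location: str
    coverage_dates: str
    creation_date: str
    inventoried_on: str
    inventoried_by: str

    def average_item_mb(self) -> float:
        # Assumes 1 GB = 1024 MB; useful for rough storage planning.
        return self.extent_gb * 1024 / self.item_count


# Values taken from the example entry on the slide.
entry = InventoryEntry(
    category="Special Collections",
    title="Railroad Photographs, SE U.S.",
    content_type="images, digitized",
    file_format="TIFF",
    extent_gb=242,
    item_count=2250,
    location="archival server in Room A, Central IT",
    coverage_dates="early 1900s",
    creation_date="January-June 2006",
    inventoried_on="12/15/2011",
    inventoried_by="Fred Jones",
)

print(f"{entry.title}: about {entry.average_item_mb():.0f} MB per image")
```

Keeping extent as separate numeric fields (size and item count) rather than one text string makes it easier to total storage needs across the whole inventory.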
Page 7
Selecting Content for Preservation: Why do it?
• Storage may be cheap, but management is not … especially over time
• Sustaining the quality of content takes effort
• Continually changing discovery and dissemination services will be needed as hardware and software change
… think scale, scope, performance, sustainability
DPOE Baseline Modules: Select, version 2.0, Nov 2011
Page 8
Selection Criteria: matching mission to content…
• Acquisition or collection development policy
• Departmental criteria (priorities, precedents)
• Research criteria (interests, significance)
• Uniqueness (only source)
• Value (historical, evidential, can’t reproduce)
Page 9
Practical Considerations
Stop if or when the answer is ‘no’…
1. Content
– does the content have value?
– does it fit your scope?
2. Technical
– is it feasible for you to preserve the content?
3. Access
– is it possible to make the content available?
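The "stop when the answer is no" sequence above can be sketched as a short-circuit check that reports the first question that fails. This is an illustration only: the function name `first_failed_check` and the label strings are hypothetical, and a real selection decision would rest on documented policy rather than four booleans.

```python
from typing import Optional


def first_failed_check(
    has_value: bool,
    fits_scope: bool,
    feasible_to_preserve: bool,
    can_provide_access: bool,
) -> Optional[str]:
    """Return the first question answered 'no', or None if all pass."""
    # Order follows the slide: content, then technical, then access.
    checks = [
        ("content: has value", has_value),
        ("content: fits scope", fits_scope),
        ("technical: feasible to preserve", feasible_to_preserve),
        ("access: can be made available", can_provide_access),
    ]
    for label, answer in checks:
        if not answer:
            return label  # stop here; later questions are not considered
    return None


# A collection that is valuable and in scope but cannot be preserved.
print(first_failed_check(True, True, False, True))
```

Returning the failing question, rather than a bare yes or no, leaves a record of why material was not selected.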
Page 10
Selection starts at the beginning…
Contact content creators (as needed)
– Arrange a convenient time for them
– Prepare brief statement of outcomes
– Identify list of materials to review with them
– Send a reminder before the meeting
– Document the results and send them a copy
Prevent later headaches!
Page 11
STORAGE involves…
• What you store
– File Formats
– Metadata
• How you store it
– Number of copies
– Storage media
– Repository selection
Page 12
What are storage needs?
Archival Storage manages content as objects
Digital content (files + metadata = object):
• May include any types
– e.g., images, text, sound, video, maps
• Requires some identification and description
– Captured as metadata
DPOE Baseline Modules: Store, version 2.0, Nov 2011
Page 13
Selecting File Formats for Text
“…the agency must clearly define the purpose and the requirements for preservation…
The appropriate answer will depend on:
• the mission of the agency
• the kind of information to be preserved
• the uses to which the objects may be put in the future
• the expectations of current and future users, and
• how far into the future the objects are intended to remain useful.”
CENDI Digital Preservation Task Group. “Formats for Digital Preservation: A Review of Alternatives and Issues”, Revised Mar. 1, 2007. p. 22. http://www.cendi.gov/publications/CENDI_PresFormats_WhitePaper_03092007.pdf
For text: TIFF, XML, PDF/A
Page 14
Selecting File Formats for Images
Sustainability factors:
• Disclosure
• Adoption
• Transparency
• Self-documentation
• External dependencies
• Impact of patents
• Technical protection mechanisms
Bill LeFurgy, October 12, 2011. “Digital Preservation-Friendly File Formats for Scanned Images” http://blogs.loc.gov/digitalpreservation/2011/10/digital-preservation-file-formats-for-scanned-images/
For images: TIFF, JPEG 2000, PDF/A
Page 15
Which Formats Are Best?
Sustainability of Digital Formats: Planning for Library of Congress Collections
http://www.digitalpreservation.gov/formats/
Page 16
Importance of Metadata
• How do you know what an object is?
– Metadata uniquely identifies digital objects
• How do you use content in the future?
– Metadata makes digital objects understandable
• How do you know an object is authentic?
– Metadata allows objects to be traced over time
Metadata enables long-term preservation
Page 17
Preservation Metadata
Content (what), Fixity (unchanged), Provenance (life story), Reference (this thing), Context (relationships)
Administrative (manage)
Structural (understand, use)
Descriptive (find, use)
Object-level Metadata
Diagram courtesy DPM Workshops
Page 18
Object Metadata Characteristics
Content: preserve the substance
Fixity: demonstrate content is unchanged
Reference: identify as this content and no other
Provenance: trace to its origin (or to deposit)
Context: preserve linkages with other objects
Original source: Preserving Digital Information Report, 1996
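Fixity, as defined above, is commonly demonstrated by recording a checksum when an object is deposited and recomputing it later. The sketch below uses SHA-256 from Python's standard `hashlib` module. The slides do not prescribe an algorithm, so that choice, and the function names `sha256_of` and `is_unchanged`, are assumptions made for illustration.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: str, recorded_digest: str) -> bool:
    """Compare the file's current digest with the one stored as metadata."""
    return sha256_of(path) == recorded_digest
```

In practice the recorded digest would be stored with the object's metadata at deposit, and `is_unchanged` would be run on a schedule and after every copy or media refresh.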
Page 19
Number of Copies
How many copies are enough for you?
Minimum: two (2) copies in two locations
Optimum: six (6) copies
Examples of storage factors:
• Video files are too large to store 6 copies
• Possible legal restrictions (e.g., storage locations)
• Types of media used for storing the content
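The minimum stated above (two copies in two locations) and the storage cost of extra copies can be checked with a few lines of code. This is a sketch under stated assumptions: the `Copy` record, `meets_minimum`, and `total_storage_gb` are hypothetical names, and the 242 GB figure is borrowed from the inventory example earlier in the deck.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Copy:
    """One stored copy of a collection (illustrative fields)."""
    location: str
    medium: str


def meets_minimum(copies: List[Copy], min_copies: int = 2, min_locations: int = 2) -> bool:
    """True if there are enough copies spread over enough distinct locations."""
    distinct_locations = {copy.location for copy in copies}
    return len(copies) >= min_copies and len(distinct_locations) >= min_locations


def total_storage_gb(collection_gb: float, copy_count: int) -> float:
    """Total space needed to hold copy_count full copies."""
    return collection_gb * copy_count


same_room = [Copy("Room A", "disk"), Copy("Room A", "tape")]
two_sites = [Copy("Room A", "disk"), Copy("Partner site", "tape")]

print(meets_minimum(same_room))   # two copies, but only one location
print(meets_minimum(two_sites))
print(total_storage_gb(242, 6))   # space for the optimum of six copies
```

Two copies in the same room fail the check, which is the point of the "two locations" requirement: a single local disaster should not destroy every copy.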
Page 20
Storage Media Options
• Content (objects) are kept on storage media
• Options include: online, near-line, offline
• Factors for choosing options include
– Cost (available resources for preservation)
– Quantity (size and number of files)
– Expertise (skills required to manage)
– Partners (achieving geographic distribution)
– Services (outsourcing)
Page 21
Storage Considerations
• Multiple, geographically distributed copies
• Storage Partners or Hosted Services
Services and collaborations can make it easier for organizations to manage content over time
Page 22
Repository Selection
• Range of types to consider:
– general (any content) to special (format-specific)
– open source to proprietary
– unified to distributed
– easy to advanced installation and management
• Each option has pros and cons
• No system is fully compliant with standards
Select best option for your content – for now
Page 23
What are we protecting content from?
• Change and loss – accidental and intentional
• Obsolescence – as technology evolves
• Inappropriate access – e.g., confidential data
• Non-compliance – standards and requirements
• Disasters – emergencies of all kinds
DPOE Baseline Modules: Protect, version 2.0, Nov 2011
Page 24
Obsolescence Prevention Terminology
• Refresh: moving content to newer media
• Migrate: moving content to newer formats that can be accessed with current hardware and software
• Normalize: migrate to archival formats that meet your specifications
• Emulate: attempting to provide the original look and feel of the content with newer software
Page 25
Readiness
Proper planning should allow you to:
• Prevent – undesirable outcomes
• Predict – most likely risks and threats
• Detect – errors, problems, damage
• Respond – with appropriate measures
• Repair – damage or possible loss
Page 26
Everyday Protection
• Know where your content is located
– Onsite and offsite; online and offline
• Know who can have access to it
– DP staff, IT staff, others?
• Manage authentication information
– For staff, depositors, users
• Track and review usage, then adjust practices
– Web use, internal use and activities, maintenance
Page 27
Emergency Protection
• Engage in ongoing disaster planning
– Establish committee and share information
– Develop and maintain documents
• Identify possible outcomes and prepare
– e.g., server goes down, media is damaged
Page 28
Disaster Planning Resources
Page 29
Why do we emphasize management?
Preserving Digital Information (PDI), 1996
DPOE Baseline Modules: Manage, version 2.0, Nov 2011
• Rapid technological obsolescence
• Media fragility
• Legal and organizational environment in flux
• Complex practical issues
• Lack of clarity as to procedures and responsibilities
• Multiplicity of types of content in growing number of formats
• Massive amounts of content
Page 30
Balanced Management
An effective approach will address:
• Organizational requirements and objectives
• Technological opportunities and change
• Resources – funding, staff, equipment, etc.
Kenney and McGovern, 2003. “The Five Organizational Stages of Digital Preservation” http://www.dpworkshop.org/
Page 31
Organizational Requirements: Planning
• Preservation Planning (ongoing)
• Self-assessment (internal process)
• Audit (external review by peers)
• Business Continuity
• Disaster Planning
Page 32
Organizational Objectives: DP Standards
Standards emerging since the 1996 report:
• Trusted Digital Repositories, 2002
• Open Archival Information System (OAIS) Reference Model, 2003 and 2009 revision
• Preservation Metadata: Implementation Strategies (PREMIS), 2005 plus updates
• Trustworthy Repositories Audit and Certification (TRAC), 2011
Common practices are emerging and evolving
Page 33
Trusted Digital Repository
A TDR should have these characteristics:
• community standards (OAIS Compliance)
• commitment (Administrative Responsibility)
• management (Organizational Viability)
• resources (Financial Sustainability)
• infrastructure (Technological … Suitability)
• protection and control (System Security)
• documentation (Procedural Accountability)
Page 34
Community Expectations: Ten Principles
Available on the CRL website
1) Demonstrates organizational fitness (including financial, staffing, and processes) to fulfill its commitment.
2) Acquires and maintains requisite contractual and legal rights and fulfills responsibilities.
3) Has an effective and efficient policy framework.
4) Acquires and ingests digital objects based upon stated criteria that correspond to its commitments and capabilities.
Page 35
Community Expectations: Ten Principles
5) Maintains/ensures the integrity, authenticity and usability of digital objects it holds over time.
6) Creates and maintains requisite metadata about actions taken on digital objects during preservation and relevant contexts before preservation:
• production
• access
• usage
Page 36
Community Expectations: Ten Principles
7) Commits to continuing maintenance of digital objects for identified community/communities.
8) Fulfills requisite dissemination requirements.
9) Has a strategic program for preservation planning and action.
10) Has technical infrastructure adequate to continuing maintenance and security of its digital objects.
Page 37
Technological Opportunities: Investing in Technology
• Prioritize: weigh requirements to be met
• Assess: define criteria to select appropriate
• Sequence: identify steps to meet goals
• Fund: decide when to own/join/share
• Anticipate: look ahead, be prepared
• Evaluate: measure outcomes and success
Page 38
Technological Opportunities: Adopting Technologies
Characteristics of sound software:
• written in a well-documented language
• usable on a wide variety of platforms
• sustained support by creators/developers
• modular in design
• supports batch processing and workflows
• licenses support secondary use
Page 39
Resources: Designated Funding
• Funds set aside for digital preservation
• Measurable indication of intent to preserve
• Challenging to do, but important
• Over time, contributes to track record
• May not be explicit (e.g., budget line item)
… but must be able to make a compelling case
Page 40
Resources: Sustainable Access
Effective and sustainable DP programs address:
• Value – understand and stress content value
• Roles – identify stakeholders and involve them
• Incentives – identify “carrots” for preserving
Identify and address costs across life cycle
See: Blue Ribbon Task Force on Sustainable Digital Preservation and Access, final report
DPOE Baseline Modules: Provide, version 2.0, Nov 2011
Page 41
What is Long-term Access?
Preservation
• relies upon proven technologies to preserve digital objects across generations of technology
• accumulates metadata over the life cycle to trace and preserve content
Access
• relies on cutting edge technologies to provide best and fastest access at a point in time
• selects metadata needed to use and understand content
Preservation makes long-term access possible…
Page 42
What is Long-term Access?
Preservation
• preservation systems create new versions of digital objects for access to deliver as needs change over time
• purpose: ensure long-term access
• focus: future users
Access
• access systems deliver objects with user-oriented services to make the objects
• purpose: provide content to users
• focus: current users
Preservation makes long-term access possible…
Page 43
Understand Users
• Who are your users? Track and respond to them.
User expectations will change over time, and must be monitored.
• Preservation provides pathway from one generation of technology to the next
Digital content will need to be packaged in new ways for delivery over time.
Page 44
Access Policies: Issues
• Who is allowed to have access to content?
• Are access policies equal for all content?
• If not, how are categories managed?
• How are exceptions/special requests handled?
• How do users request/get access?
• What options (if any) do users have?
Consider using FAQs as a step to develop policies
Page 45
Managing Life Cycle Legal Issues
• Legal issues include copyright, but copyright is only a portion of legal issues in DP
• Legal questions emerge throughout lifecycle … and most of us are not lawyers
• Access raises legal issues, but manage from submission (or before) throughout lifecycle
• DP requires well-formed, valid documentation
– agreements, contracts, licenses, policies, etc.
• Good legal advice should enable well-formed evidential documentation and transparency
Page 46
Create and use Permissions Agreements
http://www.lib.ua.edu/wiki/digcoll/index.php/Digital_Services_Permission_Agreement
The Donor grants […] and its agents the right to:
• Digitize all submitted content, and create derivative representations for web access
• Reproduce and distribute reprints or derivative representations for noncommercial scholarly purposes
• Augment or create metadata to enhance accessibility and management of content
• Electronically view, present and display the full digital content to others, including providing open access via the web
• Electronically store, archive, copy and/or convert the digitized content for preservation and access purposes
Page 47
DPOE Baseline Principles (1-2)
1. Define the digital content within your scope of responsibility [Identify]
2. Specify the digital content you need/want to preserve [Select]
DPOE Baseline Modules: Wrap Up, version 2.0, Nov 2011
Page 48
DPOE Baseline Principles (3-6)
3. Establish requirements for storing files in preservation formats [Store]
4. Determine (and review) your best option for storing your content [Store]
5. Ensure that your content is secure during day-to-day activities [Protect]
6. Work to ensure that your content is prepared for an emergency [Protect]
Page 49
7. Develop (and review) plans for managing content over time [Manage]
8. Use policies to contain and develop your preservation program [Manage]
9. Remember that long-term access is the purpose of preservation [Provide]
10. Make sure the means to deliver content to users remains current [Provide]
DPOE Baseline Principles (7-10)
Page 50
Resources
• “Digital Preservation Management: Implementing Short-Term Strategies for Long-Term Problems”
Online Tutorial: http://www.dpworkshop.org/dpm-eng/eng_index.html
Survey of Institutional Readiness: http://www.dpworkshop.org/
• "Planning for Digital Preservation: 20 Questions for Providers of Digital Storage Services," Bernard Reilly, Center for Research Libraries http://www.nedcc.org/resources/digital/downloads/QuestionstoAskProvidersofDigitalStoragefinal.pdf
• "Digital Preservation Metadata Standards," Angela Dappert and Marcus Enders, Information Standards Quarterly, Spring 2010, Volume 22, Issue 2 http://www.loc.gov/standards/premis/FE_Dappert_Enders_MetadataStds_isqv22no2.pdf
Page 51
More Resources
• ICPSR Digital Curation: http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/
• Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist (2007): http://www.crl.edu/sites/default/files/attachments/pages/trac_0.pdf
[NOTE: ISO 16363 version of TRAC approved fall 2011]
• Center for Research Libraries Reports on Digital Archives and Repositories: http://www.crl.edu/archiving-preservation/digital-archives/digital-archive-reports
• “Digital Preservation Outreach and Education,” Library of Congress. http://www.digitalpreservation.gov/education/