Finding the Balance: An attempt at modeling differentiated storage for digitized collections: finding the balance between storage, costs and preservation of digitized publications. Trudie Stoutjesdijk, September 5th 2013

Finding the balance ipres2013



The Koninklijke Bibliotheek (KB) digitizes the national collection of the Netherlands. Digitization leads to multiple versions of a publication: a digital access file, a digital master file, back-ups of the digital versions and the physical original publication. This in turn quickly increases the need for storage capacity and raises questions such as: Should all versions be stored? Do all the versions need to be preserved in order to ensure permanent access, and if so, which ones should be preserved and how? Based on the collection care plan and the content strategy, a differentiated storage policy is set up in order to establish a relation between the physical object and its digital counterpart(s). This method assigns value to different collection lots and is used to find out how to apply collection care efficiently.


Page 1: Finding the balance ipres2013

Finding the Balance
An attempt at modeling differentiated storage for digitized collections: finding the balance between storage, costs and preservation of digitized publications.

Trudie Stoutjesdijk, September 5th 2013

Page 2: Finding the balance ipres2013

How to find the balance…

Digitization
• Multiple versions of a publication
• Which versions should be stored?
• What representation is the object of preservation?
• How can we reduce the need for storage?

Finding the balance Trudie Stoutjesdijk, September 5th 2013

Page 3: Finding the balance ipres2013

Agenda
• Who we are
• What we have
• Finding the balance

Page 4: Finding the balance ipres2013

Who we are
• National Library
• Strategic Plan 2010-2013
• We offer everyone access to everything published in and about the Netherlands
• We improve the national information infrastructure
• We guarantee long-term storage of digital information
• We maintain, present and strengthen our collection

Page 5: Finding the balance ipres2013

What we have
1. Collection Development Programme
2. Collection Care Plan
3. Storage Management
4. Digital Preservation System

Page 6: Finding the balance ipres2013

What we have
1. Collection development programme (2010-2013)
• Collect and preserve everything published in and about the Netherlands
• Transition from printed to digital format is a key priority
• Collect 50% of all Dutch born-digital publications
• Harvest 10,000 websites
• Digitization of all the books, periodicals and newspapers since 1470 (60 M pages before 2014)

Page 7: Finding the balance ipres2013

What we have
1. Collection development programme: Digitization

10% of all books, periodicals and newspapers (since 1470), digitized before 2014.

Output:
• Digital objects in JPEG2000.
• Different versions of an object: master, access, back-up, physical publication.

Rapid increase in the number of items and total cost for storage

Page 8: Finding the balance ipres2013

What we have
2. Collection Care Plan

Integrated, efficient and effective collection care for both physical and digital collections, based on the following principles:
• Integrated collection care for digital files and physical objects
• Value assessment of collections
• Risk identification
• Differentiated levels of collection care
• Care redirected from the most valuable collections to those where the biggest loss of value is expected.

Page 9: Finding the balance ipres2013

What we have
2. Collection Care

Differentiated collection care based on a rational selection tool: value assessment
• Divide the collections into different collection lots or categories
• Describe collection units
• Establish the definition of every criterion
• Rate every collection unit
• Calculate the average value
Result: The level and duration of collection care

Primary criteria | Secondary criteria
Informational value | Use
Aesthetic value | Completeness
Historical value | Condition
Social value | Provenance
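Note: the "rate every collection unit, then calculate the average value" steps could be sketched as below. This is only an illustration. The slide does not state the rating scale or any weighting, so the 1-to-5 scale, the unweighted mean over all eight criteria, and the names `PRIMARY`, `SECONDARY` and `average_value` are assumptions.

```python
from statistics import mean

# Criteria as listed on the slide.
PRIMARY = ("informational", "aesthetic", "historical", "social")
SECONDARY = ("use", "completeness", "condition", "provenance")


def average_value(ratings):
    """Return the unweighted mean rating of one collection unit.

    `ratings` maps each criterion name to a score. The scale (for example
    1 to 5) and the equal weighting are assumptions, not taken from the slide.
    """
    criteria = PRIMARY + SECONDARY
    missing = [name for name in criteria if name not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    return mean(ratings[name] for name in criteria)


# Hypothetical ratings for a single collection unit.
example_unit = {
    "informational": 4, "aesthetic": 2, "historical": 4, "social": 2,
    "use": 4, "completeness": 2, "condition": 4, "provenance": 2,
}
print(average_value(example_unit))
```

How the resulting average translates into a level and duration of collection care is not specified on the slide, so it is left out here.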

Page 10: Finding the balance ipres2013

What we have
3. Storage strategy

Hierarchical storage management (HSM)
• Using several tiers defining different levels of storage quality.
• Based on different needs.
• Use more than one type of media (HDD, magnetic tape).

Page 11: Finding the balance ipres2013

What we have
4. Digital Preservation System
• e-Depot system (DIAS) at the end of its natural life:
• New Digital Preservation System (DPS)
• 2012: migration from DIAS to the new DPS
• 2013: new ingest workflows for born-digital publications
• Next step: new ingest workflows for all the digitized collections

Page 12: Finding the balance ipres2013

How to find the balance…

It is impossible to preserve all the versions at the highest preservation level.

The value assessment provides insight into:
- The level and duration of collection care
- The relation between the physical object and its digital counterparts
- The relation between the state of the physical object and the necessity of preservation imaging and sustainable storage

Page 13: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

A differentiated storage policy has been applied to the digitized collections, based on the following secondary values:
• Use
- The availability of digital content for the customer
• Condition
- The vulnerability of the physical resources
- Sustainability of digital storage

In anticipation of the results of the value assessment we tried to identify classification levels.

Page 14: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Collection Care: Classification levels

Preservation level | 1 | 2 | 3 | 4 | 5
Representation available? | | | | |
- Digital master | No | No | Master light | Preservation master | Preservation master
- Access file | No | Yes | Yes | Yes | Yes
- Physical original | No | Yes | Yes | Yes | Yes
Preservation copy available? | No | No | Physical original | Preservation master | Physical original; preservation master
Effort of conservation / preservation care | | | | |
- Active | | | Physical original | Preservation master | Physical original and digital master
- Passive | | Physical original; access file | Master light | Physical original |
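Note: the five classification levels can be written down as a small lookup structure, for example to derive which digital files have to be stored per level. The sketch below only restates the table. The class name `PreservationLevel`, the field names and the helper `stored_digital_files` are hypothetical and not part of the KB model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PreservationLevel:
    digital_master: Optional[str]       # None, "master light" or "preservation master"
    access_file: bool
    physical_original: bool
    preservation_copy: Tuple[str, ...]  # what serves as the preservation copy
    active_care: Tuple[str, ...]        # representations under active care
    passive_care: Tuple[str, ...]       # representations under passive care


# Values transcribed from the classification table on the slide.
LEVELS = {
    1: PreservationLevel(None, False, False, (), (), ()),
    2: PreservationLevel(None, True, True, (), (),
                         ("physical original", "access file")),
    3: PreservationLevel("master light", True, True,
                         ("physical original",), ("physical original",),
                         ("master light",)),
    4: PreservationLevel("preservation master", True, True,
                         ("preservation master",), ("preservation master",),
                         ("physical original",)),
    5: PreservationLevel("preservation master", True, True,
                         ("physical original", "preservation master"),
                         ("physical original", "digital master"), ()),
}


def stored_digital_files(level):
    """List the digital files that exist, and so need storage, at a level."""
    entry = LEVELS[level]
    files = []
    if entry.digital_master is not None:
        files.append(entry.digital_master)
    if entry.access_file:
        files.append("access file")
    return files


for number in sorted(LEVELS):
    print(number, stored_digital_files(number))
```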

Page 15: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Collection Care: Classification levels

Level 1:
- Lowest imaginable level.
- For use only.
- Contains no representations and there’s nothing to preserve.
- Example: the reference collection which is being transformed from physical to digital.

Preservation level | 1
Representation available? |
- Digital master | No
- Access file | No
- Physical original | No
Preservation copy available? | No
Effort of conservation / preservation care |
- Active |
- Passive |

Page 16: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Collection Care: Classification levels

Level 2:
- Digitized for use.
- Contains publications that can be digitized more than once.
- Condition is good and will remain good under the current circumstances.
- No need for a digital master unless decay strikes.
- Example: all foreign titles of the Google project.

Preservation level | 2
Representation available? |
- Digital master | No
- Access file | Yes
- Physical original | Yes
Preservation copy available? | No
Effort of conservation / preservation care |
- Active |
- Passive | Physical original; access file

Page 17: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Collection Care: Classification levels

Level 3:
- Digitization for use.
- Contains objects that represent multiple values.
- Physical object is in quite good condition and can be digitized repeatedly.
- No need for a preservation image.
- Active preservation: physical original.
- Example: large parts of the special collection (18th century).

Preservation level | 3
Representation available? |
- Digital master | Master light
- Access file | Yes
- Physical original | Yes
Preservation copy available? | Physical original
Effort of conservation / preservation care |
- Active | Physical original
- Passive | Master light

Page 18: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Collection Care: Classification levels

Level 4:
- For use and preservation.
- Objects with high informational value but hardly any value as an object.
- The material can be fragile; digitization can sometimes be done only once.
- Maintenance of the physical object may not be possible in the future.
- Create a high-quality preservation master.
- Example: Metamorfoze, the national programme for the preservation of paper heritage.

Preservation level | 4
Representation available? |
- Digital master | Preservation master
- Access file | Yes
- Physical original | Yes
Preservation copy available? | Preservation master
Effort of conservation / preservation care |
- Active | Preservation master
- Passive | Physical original

Page 19: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Collection Care: Classification levels

Level 5:
- For use and preservation.
- Contains fragile, precious objects.
- Physical object represents primary values that might not be reflected in the digital master.
- Can only be digitized once.
- High-quality digital master.
- Example: Bookbinding of William the Silent.

Preservation level | 5
Representation available? |
- Digital master | Preservation master
- Access file | Yes
- Physical original | Yes
Preservation copy available? | Physical original; preservation master
Effort of conservation / preservation care |
- Active | Physical original and digital master
- Passive |

Page 20: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Digitized collections and storage costs.

Currently the output of the digitization process is a digital master and a digital access file.

Page 21: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Total costs per type of publication

Books | Storage / year | Digitization / page
Master | € 0.01 | € 0.72
Access file | € 0.008 | € 0.56
Master & Access | € 0.02 | € 1.28

Newspapers | Storage | Digitization
Master | € 0.02 | € 1.08
Access file | € 0.01 | € 0.93
Master & Access | € 0.05 | € 2.01

Journals | Storage | Digitization
Master | € 0.01 | € 0.77
Access file | € 0.009 | € 0.61
Master & Access | € 0.01 | € 1.38

Costs based on TCO storage & digitization
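Note: a rough way to combine the two cost columns is sketched below. It assumes that the storage figure is a cost per page per year for all three publication types (the slide labels the units only for books) and that the costs simply add up over time. The function name `total_cost` and the example volumes are hypothetical. The "Master & Access" rows are taken as printed on the slide, not recomputed from the separate rows.

```python
from decimal import Decimal as D

# (storage per year, digitization per page) in euro, as printed on the slide.
COSTS = {
    "books": {
        "master": (D("0.01"), D("0.72")),
        "access": (D("0.008"), D("0.56")),
        "master+access": (D("0.02"), D("1.28")),
    },
    "newspapers": {
        "master": (D("0.02"), D("1.08")),
        "access": (D("0.01"), D("0.93")),
        "master+access": (D("0.05"), D("2.01")),
    },
    "journals": {
        "master": (D("0.01"), D("0.77")),
        "access": (D("0.009"), D("0.61")),
        "master+access": (D("0.01"), D("1.38")),
    },
}


def total_cost(publication_type, version, pages, years):
    """Estimate one-off digitization plus storage over `years`, in euro.

    Assumes the storage figure applies per page per year.
    """
    storage, digitization = COSTS[publication_type][version]
    return pages * (digitization + storage * years)


# Hypothetical volume: 1000 book pages kept for 10 years.
print(total_cost("books", "master", 1000, 10))
print(total_cost("books", "access", 1000, 10))
```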

Page 22: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Classification levels & cost savings

Applying the five-level classification model reduces the storage costs of digitized publications at two levels:
• Level 2 will not contain digital master files. This could reduce the costs by 30-40%.
• At level 3 a digital master light will be created. A master light could require less image quality than a preservation master, which could reduce the size of a digitized publication and therefore the storage costs.

Page 23: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Alternatives for cost saving

New digital master or digital access files are needed when:
• the access file no longer meets the requirements of the user,
• technology offers new opportunities, possibly better and smaller digital masters,
• the decay of the physical original appears to be stronger than expected...

Rescanning and/or conversion?

Page 24: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Rescanning: i.e. re-digitization of (parts of) the collection.
• Level 1 has no objects.
• Level 2: when decay increases.
• Level 3 has 2 digital copies: when decay or obsolescence occurs.
• Level 4 & 5: rescanning is undesirable or impossible.

Conversion: generate a digital access file from the digital master.
• Can offer a solution for levels 4 and 5 (vulnerable physical collections).

Conversion on the fly: generate a digital access file on demand.
• Suitable for levels 3, 4 and 5; access files don’t need to be stored.
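Note: the options per level on this slide can be summarized in a small lookup, sketched below. It only restates the bullets. The names `RESCAN_LEVELS`, `CONVERSION_LEVELS`, `ON_THE_FLY_LEVELS` and `renewal_options` are hypothetical, and treating levels 2 and 3 as "rescanning possible" is an interpretation of the slide text.

```python
# Levels for which each method is named as an option on the slide.
RESCAN_LEVELS = {2, 3}           # level 1 has no objects; 4 and 5: undesirable or impossible
CONVERSION_LEVELS = {4, 5}       # access file generated from the digital master
ON_THE_FLY_LEVELS = {3, 4, 5}    # access file generated on demand, not stored


def renewal_options(level):
    """Return the ways to obtain a new access file at a classification level."""
    if level not in range(1, 6):
        raise ValueError("level must be between 1 and 5")
    options = []
    if level in RESCAN_LEVELS:
        options.append("rescanning")
    if level in CONVERSION_LEVELS:
        options.append("conversion")
    if level in ON_THE_FLY_LEVELS:
        options.append("conversion on the fly")
    return options


for level in range(1, 6):
    print(level, renewal_options(level))
```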

Page 25: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Conversion and/or on-the-fly conversion

Pros
• Appropriate and efficient method for permanent storage of and access to the collections.
• Good solution for the collections at level 4 & 5
• Probably cost saving on production
• Cost saving on storage

Cons
• A system-intensive activity that could create a bottleneck in the delivery to the end user
• Insufficient knowledge about the technique
• No insight into the costs

The Research Department has started research on conversion.

Page 26: Finding the balance ipres2013

Finding the balance: Differentiated storage model for digitized collections

Wrap up: Tried to realise a model

Lessons learned:
• Value assessment helps to gain insight into the value of collections
• USE and CONDITION of collections help to find a balance between permanent access and costs
• Transparency of the costs.
• Rescanning is not feasible for publications that are in a vulnerable state.
• Conversion might seem preferable to rescanning.
• Investigation of the conversion / on-the-fly conversion technique is necessary to gain insight into the benefits of this method, in particular with respect to applicability, performance and efficiency.

Page 27: Finding the balance ipres2013

Thank you!

Trudie Stoutjesdijk, September 5th 2013