Harvesting and DAMS
Glen Robson, DAMS Manager, National Library of Wales
What do we do when it gets here?
•Normalise metadata
•Migrate?
•Storage
•Access
Normalise Metadata
•Consistency
▫Convert to NLW standards (METS)
▫Consistent METS between projects
•Add technical metadata
▫Link file format to PRONOM registry
▫Automatic technical metadata: JHOVE or the NZ Metadata Extraction Tool
•Add preservation metadata (PREMIS)
▫Object's history
Harvesting
•Take a copy of metadata and thesis
•Different formats
▫PDF, Word and Text
•Complex objects
▫E.g. 1 PDF per chapter
Migration
•Input:
▫221 application/msword
▫4 application/octet-stream
▫114 application/pdf
▫3 application/vnd.ms-excel
▫340 text/plain
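The input counts above can be totalled to see how the harvest splits across formats. The following is a minimal sketch using only the figures from the slide; the names `harvested` and `share` are illustrative and not part of any NLW system.

```python
# MIME type counts as reported on the slide above.
harvested = {
    "application/msword": 221,
    "application/octet-stream": 4,
    "application/pdf": 114,
    "application/vnd.ms-excel": 3,
    "text/plain": 340,
}

# Total number of harvested files across all reported MIME types.
total = sum(harvested.values())


def share(mime_type: str) -> float:
    """Return the percentage of harvested files with the given MIME type."""
    return 100.0 * harvested[mime_type] / total


if __name__ == "__main__":
    # Print the formats from most to least common, with their share.
    for mime_type, count in sorted(harvested.items(), key=lambda kv: -kv[1]):
        print(f"{mime_type:28s} {count:4d}  {share(mime_type):5.1f}%")
    print(f"{'total':28s} {total:4d}")
```

A tally like this makes it easier to see which formats dominate before deciding where migration effort should go.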
Now or later?
•Migrate on ingest
▫How do you choose the format?
▫Storage cost
•Migrate on obsolescence
▫Tools available?
Migration
•Microsoft Word
▫Can open it now
▫Have to have a copy of Word
•application/octet-stream
▫Can’t open now
Storage
•LOCKSS
•University copy
•NLW copy
▫Archive copy on tape
▫Archive copy on optical disc
▫Archive copy offsite
▫Access copy
•Ethos copy
Access
•Convert to MARC
▫Digital and print in MARC
▫Single point of access for all collections
•Mostly automated
▫Best use of resources
Lessons Learnt and Problems Encountered
•Started using Fedora in 2004
▫Ingested 3 digitisation projects and 2 mass digitisation projects
▫Ingesting video and radio programmes
•Started with pilot
•Purchased VITAL, based on Fedora
•Project driven
Lesson 1: Physical carriers degrade or become obsolete
Why is this a problem for the library?
•Deposit
▫Sometimes no choice on carrier
▫Depositors aren’t in a position to change the carrier
Lesson 1: Physical carriers degrade or become obsolete
• Age
• Storage conditions
• Sunlight
• Temperature
• “Widely differing claims have been made for the life expectancy of CD-Rs, but it is generally accepted that they will last longer than the associated technology and are therefore suitable for preservation purposes. CD-Rs offer storage capacities of 650 MB to 700 MB. CD-RW is based upon a different recording process to CD-R, and is not recommended for archival storage.”
• http://www.nationalarchives.gov.uk/documents/media_care.rtf
Practical Example
•Deposit of CDs from Cliff McLucas and Brith Gof theatre company
•22% of the Cliff McLucas CDs and 60% of those from Brith Gof could not be copied or read.
•According to the sleeves, many of the Brith Gof discs contain material relating to performances between about 1989 and 1992.
•Only real solution is to copy data from the carrier as soon as possible
CDAS
Lesson 2: Digital can get BIG
• Wills Project
▫ 182,404 Wills
▫ 816,325 Images
▫ 998,729 Fedora Objects
• Welsh Journals
▫ 50 Titles
▫ Thousands of Pages
• Offair
▫ 40,000 Records
• SCIF Newspaper and Magazines
▫ 2 Million Pages
• Repository: 3 Million plus Objects
Problems
•Processing takes time
•Management
•Discovery
•Cost
•Cataloguing / Metadata
Lesson 2: Digital can get BIG
• Sgrîn – Cardiff Media Company
• Company closing down (2006)
• Collect data from shared drive
• Stats:
▫ 29.2 GB
▫ 68,446 files
  ▫ Microsoft Word Documents: 32,086
  ▫ JPEG Images: 18,093
  ▫ Rich Text Format: 2,707
  ▫ Microsoft Excel Documents: 2,498
  ▫ Microsoft Works Word Document: 2,127
  ▫ Files with missing file extension: 2,036
• Selection?
• Cataloguing?
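Statistics like those above can be gathered by walking the shared drive and counting files by extension. The sketch below is one possible way to do this in Python; the function name `survey_extensions` and the `"(none)"` key are assumptions made for illustration and do not describe the tool actually used at NLW.

```python
from collections import Counter
from pathlib import Path


def survey_extensions(root: str) -> Counter:
    """Count files under `root` by lower-cased extension.

    Files with no extension are counted under the key "(none)".
    """
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Survey the directory given on the command line (default: current directory).
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for ext, n in survey_extensions(root).most_common():
        print(f"{n:8d}  {ext}")
```

Extensions are only a rough guide to format. Files with a missing or wrong extension still need identification by content, for example against the PRONOM registry mentioned earlier.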
Lesson 3: Metadata is expensive
•Accessioning:
▫Depositor adds metadata (Roda)
▫Deposit comes with metadata (Ethos)
•Digitisation
▫Structure / Context
▫From catalogue
▫Write once, use many
•Automate as much as possible
Lesson 4: You can’t automate everything
•Offair Recording
•Original plan:
▫BOB System records programs (metadata from EPG)
▫Harvest from BOB, create MARC record
▫Ingest
•Totally automated
Lesson 4: You can’t automate everything
•Spanners in the works:
▫Duplicate recordings
▫Failed recordings
▫EPG errors
•New workflow:
▫BOB System records programs (metadata from EPG)
▫Fix failed validation records (human process)
▫Harvest from BOB, create MARC record
▫Ingest
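The validation step in the new workflow can be pictured as a triage that passes clean recordings on to harvest and routes the rest to a person. The sketch below is hypothetical: the `Recording` fields, the `triage` function and the three rules are assumptions chosen to mirror the problems listed above, not the BOB system's actual data model or API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Recording:
    """Hypothetical record for one off-air recording."""
    channel: str
    start: str        # start time taken from the EPG, e.g. "2009-03-01T19:00"
    title: str        # programme title taken from the EPG
    duration_s: int   # seconds actually recorded; 0 means the recording failed


def triage(
    recordings: Iterable[Recording],
) -> Tuple[List[Recording], List[Tuple[Recording, str]]]:
    """Split recordings into (ready_for_harvest, needs_human_review)."""
    ready: List[Recording] = []
    review: List[Tuple[Recording, str]] = []
    seen = set()  # (channel, start) slots already accepted for harvest
    for rec in recordings:
        key = (rec.channel, rec.start)
        if rec.duration_s <= 0:
            review.append((rec, "failed recording"))
        elif not rec.title.strip():
            review.append((rec, "EPG metadata missing"))
        elif key in seen:
            review.append((rec, "duplicate recording"))
        else:
            seen.add(key)
            ready.append(rec)
    return ready, review
```

Only the `ready` list would go on to MARC record creation and ingest; the `review` list is the human process.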
Lesson 5: Things Change
Ingest Early
•Items managed early
•Missing items picked up earlier
•Change / creation at the same point
•1 interface rather than 1 for creation and 1 for editing
•Preserve but allow change
▫Systems make it difficult
Lesson 6: Workflows not Projects
•Develop specific project-based workflows
•Have to be customised each time
•Symptom of project-based funding
•Digitisation workflow
•Generic services
▫Technical metadata
▫Checksums
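A checksum service is a good candidate for a generic, project-independent step. The following is a minimal sketch using Python's standard `hashlib`; the choice of SHA-256 and the function names `sha256_of` and `verify` are assumptions for illustration, as the slides do not say which algorithm or tool is used.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Return True if the file's current checksum matches the stored one."""
    return sha256_of(path) == expected_hex.lower()
```

The digest would be recorded at ingest and `verify` run again later, for instance after copying between the tape, optical disc and offsite copies.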
Preservation Paranoia
•Lesson we may learn:
▫How much metadata is too much?
▫How much technical metadata should we have?
▫Migrations MS-Word: PDF, Text, Image of each page, Open Office XML
Summary
•Physical carriers degrade or become obsolete
•Digital can get BIG
•Metadata is expensive
•You can’t automate everything
•Things change
•Workflows not Projects
•Preservation Paranoia
Questions