Bringing Order to Chaos: Preparation and Organization for Long-Term Access
Jody L. DeRidder, University of Alabama Libraries
Image courtesy of Life Magazine


Page 2

Software Changes...

File Formats change...

Question: What would it take to reconstruct YOUR digital library in another software system, from scratch?

Athley, Jake. 2009. “Understanding the Digital Asset Life Cycle.” Widen Enterprises.

ONLINE IS NOT ENOUGH!!

...and sometimes, we run out of money!

Page 3

Where's the TIFF?

No TIFF??

Reference to archival file missing in OAI exports

Not valid XML!

Page 4

Again, Where's the TIFF?

Page-level metadata AND the reference to the archival file are ALSO missing from CONTENTdm XML exports.

…so the tab-delimited text export is your only hope of reconstruction.

No TIFF??

Page 5

Identification…??

32 different file naming schemes, each with anomalies that did not fit the collection’s own pattern

10 possible fields in which to find an identifier:

<object><url></url><file></file><filenb></filenb><item></item><filena></filena><fullrs></fullrs><rm></rm><databa></databa><identi></identi>

Many metadata files had NO identifiers, or ones that did NOT match the filename.

Sometimes CONTENTdm changed the archival filename on upload…

Page 6

File storage is a lot like a basement closet...

Image courtesy of Teemo, Master of Clowning

Image courtesy of Life Magazine

What happens when it's time to move???

Page 7

Bringing Order to Chaos

1) Identification

2) Consistency

3) Organization

4) Documentation

University of Alabama Libraries

Holder ID: u0003

Collection ID: 0000023

Item ID: 0000007

Sequence ID: 0005

Archival File: u0003_0000023_0000007_0005.tif
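Because every segment has a fixed width, an archival file name can be assembled mechanically rather than typed by hand. The Python sketch below assumes the widths seen in the example above ("u" plus 4 digits, then 7, 7 and 4 digits) and that the numeric parts are kept as integers; `archival_filename` is a hypothetical helper, not part of any UA tool.

```python
def archival_filename(holder: str, collection: int, item: int, sequence: int,
                      ext: str = "tif") -> str:
    """Join identifier segments into an archival file name.

    Segment widths are inferred from u0003_0000023_0000007_0005.tif:
    holder = "u" + 4 digits, collection and item = 7 digits, sequence = 4 digits.
    """
    if not (len(holder) == 5 and holder.startswith("u") and holder[1:].isdigit()):
        raise ValueError(f"unexpected holder ID: {holder!r}")
    for label, value, width in (("collection", collection, 7),
                                ("item", item, 7),
                                ("sequence", sequence, 4)):
        # Refuse values that would silently widen a segment and break sorting.
        if not 0 <= value < 10 ** width:
            raise ValueError(f"{label} {value} does not fit in {width} digits")
    return f"{holder}_{collection:07d}_{item:07d}_{sequence:04d}.{ext}"


# Holder u0003, collection 23, item 7, sequence 5 (the example above).
print(archival_filename("u0003", 23, 7, 5))
```

Zero-padding keeps file names sorting in identifier order, and rejecting oversize values avoids quietly producing a segment of the wrong length.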

Page 8

u0003_0001980_0000001 is the first digitized item in the MSS 1980 collection

HOLDER ID

COLLECTION ID

Page 9

(Unambiguous) Identification

u0003_0001604_0000001_0004.tif

…depends on US!!! (not the software)

Tuscaloosa Service Men's Center Scrapbook, 1943-1946. MSS 1604, William Stanley Hoole Special Collections Library, University of Alabama.
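Unambiguous names also make the reverse step easy: a file name can be checked and split back into its parts. The pattern below is an assumption built from the examples in these slides (including the sub-page form u0003_0000001_0000003_0002_004.tif); `ID_PATTERN` and `parse_identifier` are illustrative, not a published UA specification.

```python
import re

# Pattern inferred from the examples in these slides: holder (u + 4 digits),
# collection (7), item (7), optional sequence (4), optional sub-page (3),
# optional file extension.
ID_PATTERN = re.compile(
    r"(?P<holder>u\d{4})_(?P<collection>\d{7})_(?P<item>\d{7})"
    r"(?:_(?P<sequence>\d{4})(?:_(?P<subpage>\d{3}))?)?"
    r"(?:\.(?P<ext>[A-Za-z0-9]+))?"
)


def parse_identifier(name: str) -> dict[str, str]:
    """Split an identifier or file name into its segments; reject anything else."""
    match = ID_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a well-formed identifier: {name!r}")
    # Leave out segments that are absent (an item-level ID has no sequence).
    return {key: value for key, value in match.groupdict().items() if value is not None}


print(parse_identifier("u0003_0001604_0000001_0004.tif"))
print(parse_identifier("u0003_0001980_0000001"))
```

Run over a set of exports, a check like this flags names that drift from the pattern before they are ingested, which is where the software cannot be relied on to help.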

Page 10

Consistency

1800-1860: Hugh Davis Farm Journals; Voyages dans l’Amerique Septentrionale; Jesse Griffin Letter, 1813 September; Nehemiah Denton papers, 1831-1844; F.H. Petrie Letters, 1831-1833

1861-1865: George S Smith Diary; Confederate Imprints; Sheet music; S. R. Norton Letters, 1864-1865

1866-1899: S. D. Cabaniss Papers; Joe Wheeler; Josiah and Amelia Gorgas Family Papers

1900-1919: Roland Harper; Railroad Timetables; Central Iron and Coal; Daphne Cunningham Diary; Eugene Allen Smith

Page 11

collection linking

Page 12

CONSISTENCY! In merging collections, you discover all the different metadata variations you have…

Item Identifier, Filename, Identifier, Title, Other Title, Cover Title, First Line of Text, First Line of Chorus, Masthead Title, Series Title, Special Issue, Title from Plate, Subject(s), Description, Biographic and Historical Note, Scope and Content, Transcript URL, Provenance, Funding Information, Abstract, Creator(s), Arranger(s), Author(s), Composer(s), Conductor(s), Diarist(s), Etcher(s), Instrumentalist(s), Interviewee(s), Lyricist(s), Photographer(s), Sender(s), Vocalist(s), Work(s), Publisher, Digital Publisher, Donor(s), Funder(s), Contributor(s), Editor(s), Interviewer(s), Performer(s), Recipient(s), Date(s), Date of Photograph, Performance Date, Date ISO, Type(s), Genre(s), Format, Album Number, Bibliographic Citation, Box Number, Call Number, Collection Number, Container Number, Folder Number, Plate Number, Photograph Number, Source, Language(s), Relation, Published In, Digital Collection, Repository, Repository Collections, Is Referenced By, Mode of Access, Coverage, Location, Performance Location, Place of Publication, Recipient Location, Sender Location, States Served, Rights, Terms, Audience, Sorting Number, Staff Notes, Transcript, Object File Name
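Merging is where these variations surface, so it can help to inventory field labels per collection before deciding how to map them. A minimal sketch, assuming each collection's field labels have already been pulled into a list (for example from its config.txt); the collection names and field lists in the demo are hypothetical.

```python
from collections import defaultdict


def field_inventory(collections: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each field label to the sorted list of collections that use it."""
    usage: dict[str, set[str]] = defaultdict(set)
    for collection, labels in collections.items():
        for label in labels:
            usage[label.strip()].add(collection)
    return {label: sorted(colls) for label, colls in sorted(usage.items())}


def inconsistent_fields(collections: dict[str, list[str]]) -> list[str]:
    """Labels used by some, but not all, of the collections."""
    inventory = field_inventory(collections)
    return [label for label, colls in inventory.items() if len(colls) < len(collections)]


# Hypothetical field lists for two collections about to be merged.
example = {
    "coll1": ["Title", "Creator(s)", "Date(s)", "Box Number"],
    "coll2": ["Title", "Author(s)", "Date of Photograph", "Box Number"],
}
inventory = field_inventory(example)
for label in inconsistent_fields(example):
    print(f"{label}: only in {', '.join(inventory[label])}")
```

The output is a worklist: each label that appears in only some collections is a candidate either for mapping onto a shared field or for keeping as a deliberate, documented difference.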

Page 13

Item Identifier:item:TEXT:SMALL:BLANK:BLANK:NOSEARCH:HIDE:NOVOCAB:BLANK
Filename:filena:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:VOCAB:identi
Identifier:identi:TEXT:SMALL:BLANK:BLANK:SEARCH:HIDE:VOCAB:identi
Title:title:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:title
Other Title:other:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea
Cover Title:cover:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea
First Line of Text:first:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea
First Line of Chorus:firsta:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea
Masthead Title:masthe:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea
Series Title:series:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea

Collection directory in /contentdbs:

/contentdbs/{coll}/
    image/
    supp/
    index/
        etc/
            config.txt

Configure it once... Then copy the config file to the other directories.

cp coll1/index/etc/config.txt coll2/index/etc/config.txt
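The `cp` above handles one collection at a time. A Python sketch of the same step for several collections, assuming the /contentdbs/{coll}/index/etc/config.txt layout shown here; `propagate_config`, its backup name (config.txt.bak), and the dry-run default are illustrative choices, not CONTENTdm features. The demo builds a throwaway copy of the layout instead of touching a live server.

```python
import shutil
import tempfile
from pathlib import Path


def propagate_config(root: Path, source: str, targets: list[str],
                     dry_run: bool = True) -> list[Path]:
    """Copy root/source/index/etc/config.txt over each target collection's config.txt.

    Returns the configs that differ from the source. With dry_run=False each
    differing file is first backed up as config.txt.bak, then overwritten.
    """
    src = root / source / "index" / "etc" / "config.txt"
    if not src.is_file():
        raise FileNotFoundError(src)
    reference = src.read_bytes()
    changed = []
    for target in targets:
        dest = root / target / "index" / "etc" / "config.txt"
        if not dest.parent.is_dir():
            raise FileNotFoundError(dest.parent)
        if dest.is_file() and dest.read_bytes() == reference:
            continue  # already consistent with the source collection
        changed.append(dest)
        if not dry_run:
            if dest.is_file():
                shutil.copy2(dest, dest.with_name("config.txt.bak"))
            shutil.copy2(src, dest)
    return changed


# Demo on a throwaway tree; coll1..coll3 are hypothetical collection aliases.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    searchable = "Title:title:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:title\n"
    not_searchable = "Title:title:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE:NOVOCAB:title\n"
    configs = {"coll1": searchable, "coll2": searchable, "coll3": not_searchable}
    for coll, text in configs.items():
        etc = root / coll / "index" / "etc"
        etc.mkdir(parents=True)
        (etc / "config.txt").write_text(text)
    preview = [p.relative_to(root).as_posix()
               for p in propagate_config(root, "coll1", ["coll2", "coll3"])]
    propagate_config(root, "coll1", ["coll2", "coll3"], dry_run=False)
    coll3_after = (root / "coll3" / "index" / "etc" / "config.txt").read_text()
    backup_kept = (root / "coll3" / "index" / "etc" / "config.txt.bak").is_file()
print(preview, repr(coll3_after), backup_kept)
```

Running with `dry_run=True` first lists which collections differ from the reference configuration before anything is overwritten.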

Page 14

Capturing ALL the metadata on EVERY level for preservation

<mods>
  <titleInfo displayLabel="title">
    <title>6th grade class picture</title>
  </titleInfo>
  <name type="corporate">
    <namePart>Ebsco Industries</namePart>
    <role>
      <roleTerm authority="marcRelator" type="text">Funder</roleTerm>
    </role>
  </name>
  <typeOfResource>still image</typeOfResource>
  <genre authority="bgtchm">Photographs</genre>
  <originInfo>
    <dateCreated>early 1900s</dateCreated>
  </originInfo>
  <physicalDescription>
    <extent>1 photograph : gelatin developing-out paper, black and white ; 5 x 7 in. on mount 5 x 7 in.</extent>
  </physicalDescription>
  <note displayLabel="Description">Jeff Coleman with his 6th grade classmates at Seth Mellew elementary school</note>
  <note displayLabel="Funding Information" type="sponsorship">The digitization of this collection was funded by a gift from EBSCO Industries.</note>
  <identifier type="local" displayLabel="Filename">u0001_2008002_0000001</identifier>
  <subject>
    <geographic>United States--Alabama--Sumter County--Livingston</geographic>
  </subject>
  <subject authority="lcsh">
    <topic>Coleman, Jefferson Jackson</topic>
  </subject>
  <subject authority="lcsh">
    <topic>Seth Mellew Elementary School</topic>
  </subject>
</mods>

Archivists Utility translates spreadsheet rows to MODS XML
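The slide credits the Archivists Utility with this translation; its internals are not described here. As a rough illustration of the idea only, the sketch below turns one spreadsheet row (a Python dict) into a MODS record with `xml.etree.ElementTree`. The column names are hypothetical, and only a handful of the elements in the record above are covered.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def row_to_mods(row: dict[str, str]) -> ET.Element:
    """Build a small MODS record from one spreadsheet row.

    Column names (Title, Funder, Date, Description, Filename) are hypothetical.
    Empty or missing optional columns are simply skipped.
    """
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = row["Title"]
    if row.get("Funder"):
        name = ET.SubElement(mods, q("name"), type="corporate")
        ET.SubElement(name, q("namePart")).text = row["Funder"]
        role = ET.SubElement(name, q("role"))
        ET.SubElement(role, q("roleTerm"), authority="marcRelator", type="text").text = "Funder"
    if row.get("Date"):
        origin = ET.SubElement(mods, q("originInfo"))
        ET.SubElement(origin, q("dateCreated")).text = row["Date"]
    if row.get("Description"):
        ET.SubElement(mods, q("note"), displayLabel="Description").text = row["Description"]
    ET.SubElement(mods, q("identifier"), type="local", displayLabel="Filename").text = row["Filename"]
    return mods


row = {
    "Title": "6th grade class picture",
    "Funder": "Ebsco Industries",
    "Date": "early 1900s",
    "Description": "Jeff Coleman with his 6th grade classmates at Seth Mellew elementary school",
    "Filename": "u0001_2008002_0000001",
}
record = row_to_mods(row)
print(ET.tostring(record, encoding="unicode"))
```

Generating the XML from structured rows, rather than typing it, is what keeps every record well-formed and every field in the same place.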

Page 15


Organization starts with the working area!

Before…

And after!

Page 16


A Collection Folder in the Working Area

Collection folders are named for the collection identifier. Allowed subfolders include: Admin, Metadata, Scans, Transcripts.

Compound objects have their own subfolders for pages, named for the item.
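A consistent working area can also be checked automatically. A minimal sketch, assuming the four subfolders listed above are the complete allowed set and that compound-object page folders sit below them, so only the top level of a collection folder is checked; `unexpected_subfolders` and the folder name 0000023 in the demo are hypothetical.

```python
import tempfile
from pathlib import Path

# Subfolder names listed on the slide; treated here as the complete allowed set.
ALLOWED_SUBFOLDERS = {"Admin", "Metadata", "Scans", "Transcripts"}


def unexpected_subfolders(collection_folder: Path) -> list[str]:
    """Names of top-level subfolders that are not on the allowed list."""
    return sorted(entry.name for entry in collection_folder.iterdir()
                  if entry.is_dir() and entry.name not in ALLOWED_SUBFOLDERS)


# Demo: a hypothetical collection folder with one stray subfolder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "0000023"
    for name in ("Admin", "Scans", "scans_old"):
        (folder / name).mkdir(parents=True)
    stray = unexpected_subfolders(folder)
print(stray)
```

Running a check like this over every collection folder before files move to storage catches naming drift while it is still cheap to fix.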

Page 17

Consistency and organization are cost-saving.

...and they let you AUTOMATE your work.

Page 18

An Example of the Lowest-Cost Model: The Alabama Digital Preservation Network http://www.adpn.org/

http://www.lockss.org/

Lots of Copies Keeps Stuff Safe!!

Page 19


Simple, Clear Hierarchical Organization:

Holder ID Collection ID Item ID Sequence ID

Page 20


Identification, Organization and Consistency

Each segment of numbers:

Holder ID Collection ID Item ID Sequence ID

is used in the directory structure.
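A sketch of how an archival file name might be turned into its storage location. It assumes one directory level per identifier segment except the last, so a page file lands in holder/collection/item/ and a sub-page file (such as u0003_0000001_0000003_0002_004.tif) lands one level deeper, under its sequence ID. The /storage root and `storage_path` are hypothetical; the exact UA layout may differ.

```python
from pathlib import PurePosixPath


def storage_path(root: str, filename: str) -> PurePosixPath:
    """Return root/<segment>/.../<filename>, one directory per segment except the last.

    u0003_0000023_0000007_0005.tif      -> root/u0003/0000023/0000007/<file>
    u0003_0000001_0000003_0002_004.tif  -> root/u0003/0000001/0000003/0002/<file>
    The file keeps its full identifier, so it is still identifiable if moved.
    """
    stem = filename.split(".", 1)[0]
    segments = stem.split("_")
    if len(segments) < 4:
        raise ValueError(f"expected holder, collection, item and sequence: {filename!r}")
    return PurePosixPath(root, *segments[:-1], filename)


print(storage_path("/storage", "u0003_0000023_0000007_0005.tif"))
```

Deriving the path from the name means no lookup table is needed to file or find anything: the identifier is the address.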

Page 21

Automated file storage and creation of LOCKSS Manifests:

… a VERY good thing!

Organization and Consistency Pay Off
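LOCKSS crawls content starting from a web-accessible manifest page, so a predictable directory tree lets that page be generated rather than written by hand. The sketch below is an illustration under assumptions: it emits a plain HTML list of links plus the permission statement LOCKSS commonly expects, but the exact wording and page requirements are set by the LOCKSS plugin your network (ADPN here) uses and should be checked against it. `manifest_page` is a hypothetical helper.

```python
import html

# Permission statement commonly required on LOCKSS manifest pages; confirm the
# exact wording against the plugin your LOCKSS network uses.
PERMISSION = "LOCKSS system has permission to collect, preserve, and serve this Archival Unit."


def manifest_page(title: str, relative_paths: list[str]) -> str:
    """Return a plain HTML manifest linking every file in one archival unit."""
    links = "\n".join(
        f'      <li><a href="{html.escape(path, quote=True)}">{html.escape(path)}</a></li>'
        for path in sorted(relative_paths)
    )
    return (
        "<html>\n"
        f"  <head><title>{html.escape(title)}</title></head>\n"
        "  <body>\n"
        f"    <h1>{html.escape(title)}</h1>\n"
        f"    <p>{PERMISSION}</p>\n"
        "    <ul>\n"
        f"{links}\n"
        "    </ul>\n"
        "  </body>\n"
        "</html>\n"
    )


# Hypothetical archival unit: one collection, paths relative to the manifest.
page = manifest_page("u0003_0000023", [
    "0000007/u0003_0000023_0000007_0005.tif",
    "0000007/u0003_0000023_0000007_0001.tif",
])
print(page)
```

Because the file list comes straight from the storage tree, the manifest stays in step with the files whenever it is regenerated.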

Page 22

DOCUMENTATION

http://www.lib.ua.edu/wiki/digcoll

Page 23

Documentation is a wonderful thing…

it helps your digital content survive … well into the future. http://www.formatregistry.org

Page 24

How do you know if your file has been altered?

Can you verify that this is the unchanged original?

(it’s not that hard)

http://www.thefreecountry.com/utilities/free-md5-sum-tools.shtml

Tuscaloosa Service Men's Center Scrapbook, 1943-1946. MSS 1604, William Stanley Hoole Special Collections Library, University of Alabama.
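Verifying a file comes down to recording a checksum when it is archived and recomputing it later. A Python sketch using `hashlib`'s MD5, in line with the MD5 tools linked above; `md5_of` and `verify` are illustrative names, and the demo writes a few stand-in bytes rather than a real TIFF.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in 1 MiB chunks so large masters need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, recorded: str) -> bool:
    """True if the file still matches the checksum recorded at archiving time."""
    return md5_of(path) == recorded.strip().lower()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "u0003_0001604_0000001_0004.tif"  # stand-in bytes, not a real TIFF
    sample.write_bytes(b"hello")
    recorded = md5_of(sample)             # store this alongside the file
    unchanged = verify(sample, recorded)
    sample.write_bytes(b"hellO")          # simulate a silent change
    altered = verify(sample, recorded)
print(recorded, unchanged, altered)
```

MD5 is adequate for detecting accidental change such as bit rot or a botched copy; it is not designed to resist deliberate tampering.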

Page 25

Get a CONTENTdm Standard XML Export

Page 26

California Digital Library 7train software

http://seventrain.sourceforge.net/

Page 27

CDL METS

Descriptive Metadata is in the dmdSec

Page 28

California Digital Library 7train on CONTENTdm Standard XML Export…

NO Item-level information beyond the title… but LOOK! You get the OCR!

Page 29

File System

LIVE Links …for web delivery

NOT intended for preservation.

What good is this in 50 years??

Page 30

/contentdbs/{coll}/index/description/desc.all

/contentdbs/{coll}/supp/{dmrecord number}

/contentdbs/{coll}/image/

Matching it all up!! Identification is a wonderful thing.

Where’s my JPEG?

Where’s my metadata?

(then look up the parent dmrecord number in desc.all)

Page 31

Page 32

Holder ID: u0003
  Collection ID: 0000001
    Item ID: 0000003
      Sequence ID: 0002
        Sub-Page: 004
          File: u0003_0000001_0000003_0002_004.tif

Metadata and Documentation stored at the applicable level

METS documents how files relate to one another in a hierarchical structure… which we already have!!!

Page 33

Dropping the Technical Metadata in… where it belongs

Makes METS creation a Piece of Cake!

(and redundant!)
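Since the hierarchy and the sequence already live in the file names, a structural map can be derived rather than authored. A minimal sketch with `xml.etree.ElementTree`, assuming every file of an item is a master TIFF and that sorting the names gives page order (true for fixed-width sequence IDs); `build_mets` is hypothetical and produces only a fileSec and a structMap, not a complete METS document.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(item_id: str, filenames: list[str]) -> ET.Element:
    """fileSec plus a physical structMap for one item, ordered by sequence ID.

    Fixed-width sequence IDs mean a plain sort of the file names gives page order.
    """
    mets = ET.Element(m("mets"), OBJID=item_id)
    file_sec = ET.SubElement(mets, m("fileSec"))
    file_grp = ET.SubElement(file_sec, m("fileGrp"), USE="image/master")
    struct_map = ET.SubElement(mets, m("structMap"), TYPE="physical")
    item_div = ET.SubElement(struct_map, m("div"), TYPE="item", LABEL=item_id)
    for n, name in enumerate(sorted(filenames), start=1):
        file_id = f"FID{n}"
        file_el = ET.SubElement(file_grp, m("file"), ID=file_id,
                                MIMETYPE="image/tiff", SEQ=str(n))
        ET.SubElement(file_el, m("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name})
        page = ET.SubElement(item_div, m("div"), TYPE="page", ORDER=str(n))
        ET.SubElement(page, m("fptr"), FILEID=file_id)
    return mets


mets = build_mets("u0003_0000023_0000007", [
    "u0003_0000023_0000007_0002.tif",
    "u0003_0000023_0000007_0001.tif",
])
print(ET.tostring(mets, encoding="unicode"))
```

If the METS file is ever lost, the same structure can be rebuilt from the names alone, which is the redundancy the slide points to.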

Page 34

Output →

XML Output →

Page 35

MIX: Metadata for Images in XML

http://www.loc.gov/standards/mix/

Page 36

AudioMD: Audio Technical Metadata

http://www.loc.gov/rr/mopic/avprot/

Page 37

Don’t forget to add the namespaces at the top!
xmlns:mix="http://www.loc.gov/standards/mix/"
xmlns:audioMD="http://www.loc.gov/standards/AudioMD/"

METS sections include:

• Descriptive Metadata section: dmdSec
• Administrative Metadata section: amdSec
• File section: fileSec
• Structural Map: structMap
• Behavior: behaviorSec

(METS also defines a header, metsHdr, and a Structural Links section, structLink.)

http://www.loc.gov/standards/mets/METSOverview.html

So where does this technical information GO??

Put it here!
<mets:amdSec> <mets:techMD ID="MIX1"> <mets:mdWrap MDTYPE="NISOIMG"> <mets:xmlData> <mix:mix> <mix:ImageCreation>

Refer to it here!
<mets:fileSec> <mets:fileGrp USE="image/master"> <mets:file ID="FID1" MIMETYPE="image/tiff" SEQ="1" CREATED="2003-01-22T00:00:00" ADMID="MIX1" GROUPID="GID1">
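The two fragments above are linked by ID: the techMD carries ID="MIX1" and the file points back to it through ADMID. A sketch of wiring that up programmatically, assuming a METS tree that already has a fileSec; `attach_tech_md` is hypothetical, and the MIX payload in the demo is an empty placeholder element in the MIX 2.0 namespace rather than real technical metadata, which would normally come from an image-characterization tool.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MIX_NS = "http://www.loc.gov/mix/v20"  # MIX 2.0 namespace
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mix", MIX_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def attach_tech_md(mets: ET.Element, file_id: str, tech_md_id: str,
                   md_type: str, payload: ET.Element) -> None:
    """Wrap payload in amdSec/techMD ("put it here") and add its ID
    to the matching file's ADMID ("refer to it here")."""
    file_el = mets.find(f".//{m('file')}[@ID='{file_id}']")
    if file_el is None:
        raise KeyError(f"no mets:file with ID {file_id!r}")
    amd_sec = mets.find(m("amdSec"))
    if amd_sec is None:
        # METS expects amdSec ahead of fileSec, so insert it just before fileSec.
        amd_sec = ET.Element(m("amdSec"))
        file_sec = mets.find(m("fileSec"))
        mets.insert(list(mets).index(file_sec), amd_sec)
    tech_md = ET.SubElement(amd_sec, m("techMD"), ID=tech_md_id)
    wrap = ET.SubElement(tech_md, m("mdWrap"), MDTYPE=md_type)
    ET.SubElement(wrap, m("xmlData")).append(payload)
    # ADMID is a space-separated list of IDs; keep any references already there.
    file_el.set("ADMID", " ".join(filter(None, [file_el.get("ADMID"), tech_md_id])))


mets = ET.fromstring(
    f'<mets:mets xmlns:mets="{METS_NS}">'
    '<mets:fileSec><mets:fileGrp USE="image/master">'
    '<mets:file ID="FID1" MIMETYPE="image/tiff" SEQ="1" GROUPID="GID1"/>'
    '</mets:fileGrp></mets:fileSec></mets:mets>'
)
mix = ET.Element(f"{{{MIX_NS}}}mix")  # placeholder for real MIX content
attach_tech_md(mets, "FID1", "MIX1", "NISOIMG", mix)
print(ET.tostring(mets, encoding="unicode"))
```

Keeping the ID pairing in code rather than in hand-edited XML avoids the dangling or mistyped ADMID references that make a METS file fail validation.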

Page 38

What’s confusing about this?

Simple, Clear, Low Cost, Scalable.

That’s a good thing.

Page 39

Bringing Order to Chaos

1) Identification

2) Consistency

3) Organization

4) Documentation

University of Alabama Libraries

Holder ID: u0003

Collection ID: 0000023

Item ID: 0000007

Sequence ID: 0005

Archival File: u0003_0000023_0000007_0005.tif

Jody L. DeRidder