Getting started with Digital Preservation in Your Library
Are we drowning, treading water or swimming across the river?
La Crosse Public Library Archives is housed in a medium-sized Wisconsin public library. We have a staff of two professional archivists, one full-time associate librarian and two part-time associate librarians. While the associate level does not require education beyond a college degree, all of the associates in the Archives happen to have master's degrees in Library Studies.
We have a small reading room that is supervised during open hours (53 hrs/wk).
In 2016 we answered over 13,000 questions.
We have 940 processed archival collections that include our local publications.
Not unlike many public libraries, our Archives collections focus on local history and genealogy. We also restrict our collecting geographically, but the boundaries depend on the collection. For example, our book collection is the most geographically diverse (local history plus some areas of western Wisconsin and southeastern Minnesota), while our photographic and archival collections are the most geographically confined.
Collections include books, manuscripts, public records, local publications, maps, photographic images and ephemera. These collections exist in a variety of formats, including VHS, 16mm, 8mm and slides; you get the picture.
Our collections that include electronic formats are primarily photographic images, manuscripts, public records, local publications and maps; however, the majority are still physical objects that require housing in a controlled environment.
We have both digitized (analog-to-digital) and born-digital content, so for some items we have both the original physical item and a digital representation, while other born-digital material exists only in electronic format.
We also have materials that are lent to us for scanning and then returned to the donor; in these cases we have an analog-to-digital copy without owning the original object.
Appraisal is the term archivists use for deciding whether an item or group of items is worth adding to the repository's holdings. I will not be discussing appraisal here; assume that the appraisal decisions have already been made.
We look at collecting holistically: in our minds we don't separate out different strategies for different formats. Meeting minutes are meeting minutes whether or not they are in electronic format. How we handle them and provide access is a procedural difference, not an intellectual one.
We created an email address for our electronically collected material, earchives@, so that when we subscribe to newsletters or join organizations, that address is the one added to the distribution lists. This way, when someone leaves our department, we don't have to resubscribe to everything.
We set up a structure on our storage server with access limited to Archives staff. It mirrors the structure of the collections we label as archival (i.e., not books, maps and other published items), so you can think of the directory folders as recreating our separate collection types.
Photographs are in a different place on the server.
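A folder tree like this can be scripted so it is created the same way every time. The sketch below is a minimal Python example; the folder names (`archival`, `local_publications`, `manuscripts`, `public_records`, `photographs`) are hypothetical placeholders, not our actual server layout. Limiting access to Archives staff is handled by server permissions, not by the script.

```python
from pathlib import Path

# Hypothetical folder names; substitute your own collection types.
ARCHIVAL_COLLECTION_TYPES = ["local_publications", "manuscripts", "public_records"]
PHOTOGRAPHS_FOLDER = "photographs"  # kept outside the archival tree


def build_archives_tree(root: Path) -> list[Path]:
    """Create one folder per archival collection type under root/archival,
    plus a separate photographs folder. Safe to run more than once."""
    created = []
    for name in ARCHIVAL_COLLECTION_TYPES:
        folder = root / "archival" / name
        folder.mkdir(parents=True, exist_ok=True)  # no error if it exists
        created.append(folder)
    photos = root / PHOTOGRAPHS_FOLDER
    photos.mkdir(parents=True, exist_ok=True)
    created.append(photos)
    return created
```

Because `exist_ok=True` is used, rerunning the script after adding a new collection type only creates the missing folder.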
La Crosse Series
We collect born-digital local publications in one of two ways: either they arrive by email or we download them from websites.
If the material is embedded or formatted inline in the email (i.e., not attached as a separate document), we save it as an HTML webpage, but only the first page. Why? Because we need to balance relevance against storage costs: eventually the "read more" links will go dead on the host server, but we will have saved the first page. This was an appraisal decision based on samples from the collection.
Most of the emails contain an attachment or direct us to the sender's webpage. We save these as PDF documents to preserve formatting and to have fewer version-migration issues later. PDF also lets us more readily assure customers that this is an authentic capture. At this point we also rename the file to conform to our naming convention, which begins with year_month_date.
Sample of the harvest workflow worksheet
Now that we have identified and isolated the electronic data we want to save, we use a file naming convention to help us locate specific issues. The first column shows part of the list of local publication files; the second column shows part of one of those titles.
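A date-first naming convention is easy to apply consistently with a small script. This is a minimal sketch assuming "date" in year_month_date means the day of the month, and that the rest of the name is a lowercase title with underscores; the title slug rules and the example title are hypothetical, not our exact convention.

```python
import re
from datetime import date


def publication_filename(issue_date: date, title: str, ext: str = "pdf") -> str:
    """Build a file name that begins with year_month_day.

    Assumed slug rules: lowercase the title and collapse any run of
    characters other than letters and digits into one underscore.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"{issue_date:%Y_%m_%d}_{slug}.{ext}"


# Zero-padded months and days mean that sorting names alphabetically
# also sorts the issues chronologically.
names = [
    publication_filename(date(2017, 11, 2), "Friends Newsletter"),
    publication_filename(date(2017, 3, 5), "Friends Newsletter"),
]
print(sorted(names))
```

Sorting puts `2017_03_05_friends_newsletter.pdf` before `2017_11_02_friends_newsletter.pdf`, which is the practical payoff of starting with the year and zero-padding the month and day.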
This approach is very similar to how we handle local publications. Again, it covers both analog material we have digitized (so we likely have both the physical and digital representation) and content that comes to us electronically.
The file structure is again similar, since it is based on the collection type and identifier, but in this case the files are meeting minutes instead of serial publications.
We use Archivists' Toolkit (AT) as our collection management software. We maintain lists of processed materials such as local publications, manuscripts and public records. There is also an accessioning module, as well as indexes for maintaining name authority and subject authority lists.
The following is an example of the finding aid in AT for the LLFA collection whose file structure we just looked at. AT is a staff-only tool; the public never has access to it. We push the Encoded Archival Description (EAD) out to our webpage for public access. The full finding aid, for example, can be found here:
What you are seeing here is the container list for the collection. However, there are many notes (abstract, scope and contents, etc.) at the collection-level description, including a general note that says meeting minutes 2012-2017 are available only in
Here is the nitty-gritty staff view of where the electronic content lives.
Here is an example of a mix of types of electronic materials within the same manuscript collection; this helps orient staff to find what the customer needs. Ideally, these would match the series names and structure of the finding aid.
We have physical and digital photographs in a variety of places. We maintain a Picture file that contains a wide variety of photographers, scenes, people, time periods and the like, largely unrelated to any other type of collection, although the picture file subject heading list may include a "see" reference to particular manuscripts or public record series.
The file naming convention we use for these begins with a picture file collection identifier, drawn from our subject heading list to help group photographic images together, followed by the box number, folder number and item number. For example, pc012-02-19-002.tif tells the staffer exactly which collection, box, folder and item to look for to find the original, if we have it in physical form.
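Because every part of the name has a fixed meaning, a script can turn a file name back into a shelf location. This is a minimal sketch; the regular expression is inferred from the single example pc012-02-19-002.tif, so the letters-then-digits collection prefix and the accepted `.tif`/`.tiff` extensions are assumptions, and `locate_original` is a hypothetical helper name.

```python
import re
from typing import NamedTuple


class PhotoLocation(NamedTuple):
    collection: str  # picture file collection identifier, e.g. "pc012"
    box: int
    folder: int
    item: int


# Inferred from one example; adjust if your identifiers differ.
PHOTO_NAME = re.compile(
    r"(?P<collection>[a-z]+\d+)-(?P<box>\d+)-(?P<folder>\d+)-(?P<item>\d+)\.tiff?"
)


def locate_original(filename: str) -> PhotoLocation:
    """Split a scan's file name into collection, box, folder and item."""
    m = PHOTO_NAME.fullmatch(filename.lower())
    if m is None:
        raise ValueError(f"not a collection-box-folder-item name: {filename!r}")
    return PhotoLocation(m["collection"], int(m["box"]), int(m["folder"]), int(m["item"]))


print(locate_original("pc012-02-19-002.tif"))
```

For the example name this yields collection `pc012`, box 2, folder 19, item 2. Names that don't follow the pattern, such as most born-digital donations, raise `ValueError` instead of producing a wrong location.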
For photographic collections from a single source, or with specific restrictions or crediting demands, we tend to create a manuscript collection. This gives us more control, and it makes sense to group these like materials together intellectually rather than leave them loose among many other photographic images.
If we receive born-digital photographs, we generally don't change the file name. If we are scanning, we use the file naming convention described above. You may have noticed it in the file structure of the local publications and LLFA minutes.
We describe photographs individually rather than as a collection or group of items, which is how we normally describe material in archival practice, because we need that granularity in the online search environment. We also need individual control over each photo because we do not hold the rights to every photograph.
We devised a metadata spreadsheet that gives us intellectual control as we work through scanning the physical collection. This same spreadsheet can be used for digitally mastered images as well.
The highlighted file is an electronic version, while the others listed in this example are physical holdings.
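Item-level control of this kind maps naturally onto one spreadsheet row per photograph. The sketch below writes such rows to CSV and flags items with no copyright determination yet. The column names are hypothetical; the talk does not list the spreadsheet's actual columns, so these only illustrate item-level description with rights tracking and a physical/electronic format field.

```python
import csv
import io

# Hypothetical columns for illustration only.
FIELDS = ["filename", "collection", "box", "folder", "item",
          "donor", "copyright_status", "format", "description"]


def metadata_to_csv(rows: list[dict]) -> str:
    """Write one row per photograph to CSV text.

    Missing fields become empty cells (restval=""); a key not in FIELDS
    raises ValueError (csv.DictWriter's default extrasaction="raise").
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def rows_missing_rights(rows: list[dict]) -> list[str]:
    """List file names with no copyright determination recorded yet."""
    return [r["filename"] for r in rows if not r.get("copyright_status")]
```

Rejecting unknown keys rather than silently dropping them is a deliberate choice: a misspelled column name then fails loudly instead of losing data.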
About every two years we face major migrations and software upgrades, which can make it harder to read older data created in that software. I have often run into formatting and other problems. You need to consider not only keeping backups of files but also the software needed to manage and access those backups and file versions. This is not an easy thing to impress upon your IT guru. If you are ingesting digital content created outside your control, this could be a major stumbling block.
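The software side of this problem can't be scripted away, but one part can be checked mechanically: whether a backup copy is still identical to its master. The sketch below compares SHA-256 checksums. It is a generic fixity check offered as an illustration, not a step described as part of our workflow, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large TIF masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(master: Path, backup: Path) -> bool:
    """True only if the backup is bit-for-bit identical to the master file."""
    return sha256_of(master) == sha256_of(backup)
```

Storing each master's checksum when it is created lets you re-verify copies later without needing the master at hand.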
Our collection management software, Archivists' Toolkit, is an open-source tool that is no longer supported. We are stuck using an older version of Java to run it, and the IT person wants us all on new Windows 10 PCs by the end of the year. So we are investigating a move to ArchivesSpace, but it's no longer free, and there will be learning curves and likely migration issues.
Where do our collections land on the NSDP chart?
We have dipped our toes in and even begun to wade out into the deeper water of digital preservation, but we know we cannot yet swim with the sharks.
The methods we have created to deal with the influx of donated, digitally mastered photographic material are very time-consuming. Gone are the days of "just add this to the brewery folder." Each item is accessioned and tracked with donor information, a copyright determination is made, any metadata that came with the photograph is verified, and then comes the actual scanning and physical processing.
While we have worked successfully with individual departments, such as the Fire Dept. and the Airport folks, building up trust to digitize their historical photographs, we hope to become more involved with City Hall and be a part of a larger discussion of born-digital records well beyond photographic images.
Some obstacles we face right now include the lack of a city manager; IT's lack of trust in including the Library on its enterprise system; and the fact that each department works independently, without much guidance or support. Some departments, such as the Legal Dept. and Finance, use proprietary software to meet their needs.
The city does not currently have a retention and disposition schedule, which is an even larger problem.
We experimented with the commercial service Archive-It but abandoned it after the trial; it was too cumbersome for staff and customers to find anything. This was several years ago, so we might revisit it in the future. If we did, we would certainly try to capture the City's website and perhaps select private or non-profit sites. The WHS is already capturing county-level information.
We might use Adobe Acrobat Professional to increase the number of levels we capture from selected websites.
Or, Don't do what I did
Rather than being overwhelmed or leaping into the water before you've had a swimming lesson, take a deep breath and think things through. Create a plan.
What material(s) do you want to start with? Why do you want to start with that?
Review your relationship with IT and think about how your needs, and those of your organization, fit within IT's structure. How much say and control do you have, if any? Do you need to enlist partners or advocates to state your case?
While you may not have a dark archive to hold your master files, can you control who has access to the storage place on the server that holds this data?
Employ best practices from the get-go. For instance, for photographic images the archival best practice is to save files as uncompressed TIF images. TIF is non-proprietary and should be considered your master format. When we scan, we create three tiers of images: raw, enhanced and derivatives. We also scan the backs at a lower resolution to capture the metadata written there. If you expect your clientele to want to use a photo for wallpaper, beer packaging or huge mural-size prints, consider bumping the resolution up to 1200 dpi, well beyond the norm. If you think that's a rare occurrence at your institution, the standard 600 dpi should work for you. We actually use different resolutions for black-and-white and color images.
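The storage cost of a higher resolution is easy to underestimate, because doubling the dpi doubles both dimensions. The sketch below estimates pixel dimensions and uncompressed file size, assuming a hypothetical 4 × 6 inch print, 8 bits per channel (3 bytes per pixel for color, 1 for grayscale), and ignoring TIFF header overhead; `scan_size` is an illustrative helper, not a standard tool.

```python
def scan_size(width_in: float, height_in: float, dpi: int,
              bytes_per_pixel: int = 3) -> tuple[int, int, int]:
    """Return (pixel width, pixel height, approximate uncompressed bytes).

    bytes_per_pixel=3 assumes 8-bit-per-channel RGB color; use 1 for
    8-bit grayscale. TIFF header and tag overhead is ignored.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


for dpi in (600, 1200):
    w, h, size = scan_size(4, 6, dpi)
    print(f"{dpi} dpi: {w} x {h} px, about {size / 1e6:.1f} MB uncompressed")
```

A 4 × 6 color print comes out at 2400 × 3600 pixels (about 25.9 MB) at 600 dpi and 4800 × 7200 pixels (about 103.7 MB) at 1200 dpi: twice the resolution, four times the storage. The same print scanned in 8-bit grayscale at 600 dpi needs only about 8.6 MB.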
Following best practices for file naming is important for consistency; ideally, file names should not contain spaces or special characters such as dashes, parentheses, commas, apostrophes and the like. While Windows can handle these now (remember the confines of the eight-character file name?), you may have issues when trying to read or recover data with other software or through an automated process.
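A quick check like the one below can catch problem names before files are added to the server. This is a minimal sketch; the allowed set (ASCII letters, digits, underscores and one period before the extension) is an assumption that follows the guidance above, so hyphens are rejected too. Loosen the pattern if your local convention allows them.

```python
import re

# Letters, digits and underscores, followed by a single extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9]+")


def is_safe_filename(name: str) -> bool:
    """True if the whole name matches the assumed safe pattern."""
    return SAFE_NAME.fullmatch(name) is not None


def problem_characters(name: str) -> set[str]:
    """Characters other than ASCII letters, digits, underscore and period."""
    return {c for c in name if not (c.isascii() and (c.isalnum() or c in "_."))}
```

For example, `Meeting Minutes (draft).docx` fails, and `problem_characters` reports the space and both parentheses so the name can be fixed rather than just rejected.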
Anita Taylor Doering
Senior Archivist and Archives Manager
La Crosse Public Library