The Journal of Academic Librarianship, Volume 29, Number 6, pages 405–410, November 2003


MANAGING TECHNOLOGY ● Being a Library of Record in a Digital Age

By Mark Cain

“All we want to do is keep the knowledge we think we will need intact and safe.” (Fahrenheit 451)

Almost twenty years ago, I was a [somewhat] young librarian, working on the program staff for what at that time was called the Council on Library Resources (now the Council on Library and Information Resources).¹ One day, in the large room that served the Council as both a library and meeting room, a collection of luminaries met. (I was not a luminary. I was a fly on the wall, invited as a staff member, not an active participant.) Among those represented were directors of ARL institutions, provosts, and heads of learned societies. Jim Haas, president of the Council and one of the finest minds in the profession, presided over a mini-think-tank on a topic of some interest to me: digital preservation.

Why was I so intrigued? First, this was the time when I was just becoming interested in library automation. In addition, not long before joining the Council staff, I had worked at The University of Texas at Austin, where among other duties I’d served as preservation coordinator for the General Libraries. So the topic of the session was an intersection of two of my interests.

From my days at Texas, I knew a fair bit about microfilming as a way to retain the intellectual if not artifactual value of the printed word. But that day we talked about an alternative: preserving an endangered work of scholarship through digital means. The technique seemed to hold great promise; once an image was captured, it could be retained indefinitely, with no further loss of quality. Even back then, the group members were well aware of the rapid obsolescence of information technologies, so they knew provisions would need to be made to migrate digital content from one storage medium to another, but compared to the sheer volume of embrittled volumes and deteriorating microfilm, this seemed more an annoyance than anything else.

How simplistic we were in our thinking. Eighteen years have passed, and we have realized that the issues are more complex. In 1985, we were focused on digitization as the technological successor to microfilming. It is that, I suppose, but it’s much more involved than that.

What It Means To Be Digital

Books, journals, photographs, even microfilm are analog media. I include microfilm, because with a magnifying glass and a good flashlight you can read it. An Edison wax cylinder is also an analog medium: a needle and a funnel will coax out the sounds.

But a digital object is something abstract; it has been converted into machine language, and the particular combination of zeros and ones is in and of itself meaningless unless one knows the rules for the encoding and either has the original encoding device (or something comparable) with which to unscramble the signal, or can use the rules to construct a new device capable of reading those bits and bytes.
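To make that point concrete, here is a minimal Python sketch, not part of the original column, in which the same four bytes are read under three different sets of rules. The byte values and the choice of encodings are illustrative assumptions; any other pairing would show the same thing.

```python
# The same four bytes, interpreted under three different (assumed) rule sets.
raw = bytes([0xE2, 0x80, 0x9C, 0x41])

as_utf8 = raw.decode("utf-8")            # a left curly quotation mark, then "A"
as_latin1 = raw.decode("latin-1")        # four unrelated single-byte characters
as_integer = int.from_bytes(raw, "big")  # one unsigned 32-bit number

print(repr(as_utf8), repr(as_latin1), as_integer)
```

Nothing in the bytes themselves says which reading is the intended one; that knowledge has to be preserved alongside them.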

When these simple facts seep into the consciousness, the ramifications of digital preservation begin to crystallize. The challenges, to put it bluntly, are staggering. They are conceptual, philosophical, technical, procedural, political, financial. And they may change from format to format.

As I said earlier, digital scanning certainly can be viewed as a successor to microfilming. There are still tens of millions of monographic and serial volumes in danger of destruction because of the paper they were printed on. As the years pass, the acid content in this paper increases; pages get increasingly brittle.

Yet the nature of libraries, or at least the nature of libraries in a collection-driven model, has been that many institutions hold copies of the same books. Not all of the same books to be sure: rare editions exist, and when one considers archives, there are plenty of unique materials that, save for the individual institutions that house and protect them, would someday turn to dust. But much of humankind’s written record exists in multiple copies. Unlike in the dystopian world of Bradbury’s Fahrenheit 451, more than two copies of Ecclesiastes exist in the world. (I probably have four copies in my house alone.) So while we should be concerned about deteriorating volumes in our book stacks, the presence of fifty disintegrating copies, scattered throughout the globe, of the same monograph, should give us some comfort.

But what about those items that began their lives as digital documents? There are many of these: digital images, digital sound and video recordings, unique databases in the scientific community, census records. And let’s not forget that massive array of electronic documents, most of which will never find their way to physical form. I’m speaking, of course, of the World Wide Web. The following quotation from Peter Lyman illustrates the scope and some of the challenges presented by preserving the Web:

The Web is the largest document ever written, with more than 4 billion public pages and an additional 550 billion connected documents on call in the “deep” Web. . . The Web is written in 220 languages (although 78% of it is in English) by authors from every nation. Ninety-five percent of Web pages are publicly accessible, a collection 50 times larger than the texts collected in the Library of Congress (LC), making the Web the information source of first resort for millions of readers. Nonetheless, the Web is still less than 10 years old, and the economic, social, and intellectual innovation it is causing is just beginning. . . . The average Web page contains 15 links to other pages or objects and five sourced objects, such as sounds or images. For this reason, the boundaries of the digital object are ambiguous. . . . The Web is growing quickly, adding more than 7 million pages daily. At the same time, it is continuously disappearing. The average life span of a Web page is only 44 days, and 44% of the Web sites found in 1998 could not be found in 1999.²

The preservation of Web pages assumes they are worth preserving to begin with. A host of preservation decisions must be made, an almost overwhelming task in and of itself. And one has to move quickly; Web pages are so ephemeral that, like will-o’-the-wisps, many will soon be gone. “As ubiquitous as the Web seems to be, it is also ephemeral, and much of today’s Web will have disappeared by tomorrow. The implication is clear: if we do not act to preserve today’s Web, it will disappear.”³

No doubt, many of these pages should disappear; they really are ephemeral. After all, Bob’s World War II site (“cuz I love stuff about war!”) probably can be passed over for something from the American Historical Society. The first order of business, then, is deciding what is worth preserving. The Web grows at such a rapid pace that collection decisions won’t be able to keep up with it, but the attempt must be made. However the collection choices are made, or not made, at least some of the Web must be preserved.

A digital document is almost the converse of a physical one. Exact copies of a printed and bound volume could reside in a thousand libraries but only be viewed by going to one of those libraries and physically picking it up. Conversely, a digital object can be viewed from almost anywhere, but it usually exists only as a single copy in a single location on a single server.

The point here is that, while the millions of embrittled physical volumes pose a daunting preservation challenge, their existence in multiple copies in many different locations is in itself a preservation strategy of sorts. Good thing, too, because we are going to have our hands full trying to preserve important electronic documents that will never exist in physical form and that cannot be perused without some form of automated intermediary, that is, a computer.

Methods of Digital Preservation

“Freeze” the technology. What if we had been storing our digital archives on 5-1/4 inch diskettes? Do we have many readers left for that format? The answer, today, is probably yes; those original IBM PCs were built to last. But a day is coming when the answer will be no. A simplistic approach to digital preservation is to preserve all the technologies that went into creating, displaying, and as appropriate manipulating the digital document. Yet draping Saran Wrap over an Osborne computer or an old VAX and sticking the machine in a corner somehow seems a less than optimal preservation strategy. Takes up a lot of real estate too.

In 2001, the Library of Congress conducted a series of formal interviews, conversations and e-mail exchanges on the topic of digital information preservation. Participants included publishers, authors, libraries, non-profit organizations, professional associations, in short, the range of individuals and institutions that had a stake in the topic. Among the respondents, “the longevity of the storage medium was a consistent concern. . . There are methods for error detection; however, at some point, there is concern that the integrity of the digital object is compromised.”⁴
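One common form of the error detection mentioned in that quotation is a stored checksum that is recomputed later and compared. The Python sketch below is only an illustration of the idea; the function names and the sample bytes are hypothetical, and the choice of SHA-256 is an assumption rather than anything the respondents specified.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest to record alongside the stored object."""
    return hashlib.sha256(data).hexdigest()

def is_intact(data: bytes, recorded: str) -> bool:
    """Recompute the digest and compare it with the one recorded earlier."""
    return fingerprint(data) == recorded

original = b"a digital object deposited in 2001"
recorded = fingerprint(original)

damaged = bytearray(original)
damaged[0] ^= 0x01  # simulate a single flipped bit on an aging medium

print(is_intact(original, recorded))        # True
print(is_intact(bytes(damaged), recorded))  # False
```

A check like this can say that an object has changed, but not how to restore it; that takes a second, undamaged copy.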

We really don’t know the archival life span of digital storage media. Tests have been run on CDs and diskettes, but unlike books, with which we have had centuries of experience, these newer storage media simply haven’t been around long enough for us to know for sure if they’ll last.

Emulation and Technology “Re-creation.” A better strategy might be to emulate the original technology. Here’s an example from real life: over the years, my son has obtained almost every type of video game playing system you could imagine: the original Nintendo, Sega GAME GEAR, Nintendo GAME BOY, Nintendo 64, X-Box. One system he never got, though, was Super Nintendo, so he went out on the Internet and found some emulation software that would run on our home computer and allow him to play downloaded versions of such Super Nintendo games as “Super Mario World” and “Streetfighter.” The emulator can “translate instructions from original software to execute on new platforms.”⁵

Instead of emulating the original platform through software, the hardware could be recreated. This could be done using a configurable chip.⁶

Data migration. Data can’t sit forever on the same platform, for like the shoes of a small child, technologies are outgrown before they are outworn. As Harvard’s Dale Flecker puts it, “The digital realm. . . is characterized by continual, rapid technological change. Unless investments are made regularly to move materials from platform to platform, and from format to format, older resources will become unreadable or unusable.”⁷ Yet there are challenges to this as well.

Another true story: I wrote my first book on yellow legal pads. Two years later, I decided to “word process” it, which I did, on a Kaypro 4, an early personal computer that I purchased in 1985. (It was the size of a sewing machine, weighing over 20 pounds. I used to lug it around, but that’s another tale.) The operating system was CP/M, a precursor to DOS, and the word processor was WordStar. It stored data on 5-1/4 inch diskettes.

It wasn’t long before I realized that CP/M was on its way out, but now I had a 100,000 word novel in digital form on an obsolescent technology. Nothing but another CP/M system could read the disks, so I needed to do something.

I had bought a Macintosh in 1986, but it couldn’t read the diskettes, not unless the 5-1/4 inch floppies could be folded in half and stuffed into a 3-1/2 inch drive, then magic invoked. Fortunately both machines had modems, so I hooked them up to two different telephone lines and, at 2400 bits per second, transferred the entire 100,000 words to the Mac (Macintosh, OS version 7.x). I’d lost all my formatting; in fact I had a bunch of funny characters embedded throughout the text, but the text itself had survived. I used my Mac word processor (MacWrite) to remove the errant characters, reformatted the text, and stored it all on double-sided, double-density 800 KB diskettes.

A few years later I moved to the PC platform. Fortunately, I was by this time able to use some software on somebody else’s newer Mac to read the MacWrite files, translate them to Microsoft Word 2.x, and store them on a 1.4 MB diskette that could be read by both Mac and PC platforms. Most, but not all, of the formatting came over. I lost my smart quotation marks (“ ”); they were turned into dumb ones (" "). Once again, I went through the entire book, and manually fixed what was lost in translation.

The PC was on Microsoft Windows 3.x when I did this. Then I moved through Windows 95 to Windows 98, not to mention several versions of Word. In the fifteen years from the time I first digitized the novel in 1985 to when I finally got it published in 2000, I had passed through three hardware platforms, three diskette types, five operating systems, three word processing applications and multiple versions of those applications.

My experience illustrates the challenges of data migration. As technologies matured and standardized, things got easier, but the migration was never perfect, and it was only because I was the author and knew the text well that I was able to maintain its integrity. Just migrating an item to another storage medium will not necessarily suffice. Digital preservation operations will have to contend with all of these issues.
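The “funny characters” in that story are typical of what a migration step has to detect and repair. The Python sketch below assumes, purely for illustration, that the stray characters came from bytes with the eighth bit set, which is one way WordStar document files marked up text; the anecdote does not say what actually caused them, and the helper names are hypothetical.

```python
def report_non_ascii(data: bytes) -> list[int]:
    """List the positions of bytes outside the 7-bit ASCII range."""
    return [i for i, b in enumerate(data) if b > 0x7F]

def strip_high_bit(data: bytes) -> bytes:
    """Clear the eighth bit of every byte, leaving 7-bit ASCII."""
    return bytes(b & 0x7F for b in data)

# "The end" with the last letter of each word flagged by a set high bit.
sample = bytes([ord("T"), ord("h"), ord("e") | 0x80, ord(" "),
                ord("e"), ord("n"), ord("d") | 0x80])

print(report_non_ascii(sample))  # [2, 6]
print(strip_high_bit(sample))    # b'The end'
```

Reporting the affected positions before changing anything matters: a repair like this is only safe when someone who knows the text can confirm the result.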

Rules Description. Perhaps one can carefully describe the rules for the interpretation and display of all digital documents, so that the conditions necessary to display them can be recreated as the need warrants. Remember, machine language is just a combination of zeroes and ones. There is a term for this in the field: persistent object preservation.

The opposite of migration, persistent object preservation (POP) entails explicitly declaring the properties (for example, content, structure, context, presentation) of the original digital information that ensure its persistence. Of the strategies listed here, POP is the only one that starts with and remains focused on preserving the digital information from its inception. Other strategies attempt to counter or overcome the generic technical problem of obsolescence.⁸

I think of POP as the “Rosetta Stone” approach.

In the long run, we will probably be lucky to preserve a digital item in any form we can, even if there are no guarantees that we can decipher it later. This physically preserves the bits and bytes, hoping that we can figure out how to logically interpret them later if we need to. “Because bit storage is possible and often not too expensive, this solution for physical preservation has much to recommend it. . .”⁹ With this strategy alone, the field of digital forensics could become a boom industry.

Different Media, Different Formats, Different Standards

Libraries have focused on the preservation of books and journals for many years, but of course there are many other material types out there needing attention. Museums and archives are our partners in shouldering this burden, at least for manuscripts; photographs; and art, cultural and historical artifacts. There are also video (including motion picture and television) and audio recordings; both the public and private sectors have a stake in preserving these.

Each of these media poses its own challenges. Unlike text, which is a linear medium, stored a character at a time, an image is a gestaltist format; you store it all at once. And the first time you do so, you have something less than the original; you have a copy, and image degradation has already taken place.

So we have to make choices. How good does a copy need to be? How many dots-per-inch are acceptable? Three hundred? Six hundred? Ten thousand? Are we willing to accept some quality loss that would result from compressing the file?
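The cost side of those choices is simple arithmetic. As a rough illustration, the Python sketch below computes the uncompressed size of one scanned page; the page size (8.5 by 11 inches), the 24-bit color depth, and the function name are assumptions chosen for the example, not figures from any particular scanning program.

```python
def scan_bytes(width_in: float, height_in: float, dpi: int,
               bits_per_pixel: int = 24) -> int:
    """Uncompressed size in bytes of a page scanned at the given resolution."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

for dpi in (300, 600):
    size = scan_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size / 1_000_000:.1f} MB")
# 300 dpi: 25.2 MB
# 600 dpi: 101.0 MB
```

Doubling the resolution quadruples the storage, which is why the compression question comes up so quickly.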

Similar issues obtain with video and sound recordings. These recordings are by definition copies of original performances, and choices that affect the quality of the reproduction have already been made. For example: How do you define full-motion video? Is it 24 frames per second, the traditional motion pictures standard, or is it 30 frames per second, as with modern video? Or is it something higher? Because digital video results in such large files, compression is almost always used when digitizing, inevitably resulting in some loss of quality. With sound, “there is a general consensus that the digital configuration of standard compact discs (44 [kHz], 16 bit) is inadequate, but debate [continues] over how high the sampling rate and word length of digital preservation should be. Many engineers and conservators argue for a sampling rate of 192 [kHz] and word length of 24 bits, at a minimum.”¹⁰
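To see what is at stake in the sampling-rate debate, the sketch below compares storage for one hour of uncompressed audio at the two configurations. It takes the rates as 44.1 kHz and 192 kHz (the kilohertz values in common use for audio) and assumes two-channel PCM; the function name is hypothetical.

```python
def pcm_bytes_per_hour(sample_rate_hz: int, bits_per_sample: int,
                       channels: int = 2) -> int:
    """Bytes needed for one hour of uncompressed PCM audio."""
    bytes_per_second = sample_rate_hz * bits_per_sample * channels // 8
    return bytes_per_second * 3600

cd = pcm_bytes_per_hour(44_100, 16)       # compact disc configuration
hi_res = pcm_bytes_per_hour(192_000, 24)  # proposed preservation minimum

print(f"CD quality:    {cd / 1_000_000:.1f} MB per hour")      # 635.0
print(f"192 kHz/24bit: {hi_res / 1_000_000:.1f} MB per hour")  # 4147.2
print(f"ratio: {hi_res / cd:.1f}x")                            # 6.5x
```

Under these assumptions the higher-fidelity master needs roughly six and a half times the storage of the CD-quality one.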

There is at least one other format worthy of mention. “Certain fields, such as genomics, are building massive databases that require the attention of information management specialists in an academic domain. . .”¹¹ No one would dispute that a Census database needs preserving, but we will need software that can interpret the file format and allow for the ability to fully manipulate it.

Over the past decade or two, each of the different media has gathered around it a set of file formats. Some have become standards, some merely de facto standards due to their broad use. HTML is a standard for the Web, but due to some of its inadequacies, it is changing to XML. Adobe Acrobat files (“PDF” for portable document format) are often used for rendering journal articles, but these same articles could appear in other formats. Still images can be stored in many different file types, and this is only a short list: BMP, PDF, JPEG, TIF, GIF. Video files are stored as MOV, AVI, QuickTime, MPEG, etc. Sound files include WAV, MP3, and so forth. Databases are flat file or relational, and while database management systems share common characteristics, differences exist. There are few standards upon which all agree. Who is going to sort all this out?

Fortunately, there is one standard that appears to have gained universal acceptance in the field of digital preservation. I am speaking of the Open Archival Information System (OAIS) reference model.

This model supplies a conceptual framework for discussing and describing archival practice. OAIS articulates the roles and interrelationships of the three groups that have a key stake in digital process, that is, creator or distributor, user, and repository. The reference model identifies preservation as a process that begins when digital information is created; this is a critical point of difference from the standard analog model, which considers preservation much later in the life cycle of an artifact. Finally, the OAIS model identifies the core functions and organizational features of a digital archival repository. This has influenced perceptions of what constitutes a trusted archives. OAIS is on the International Organization for Standardization (ISO) standards track and is the reference model of choice of those involved in digital preservation worldwide.¹²

A key element to this model is the metaphor of “information packages.” These packages include the digital objects themselves and the metadata describing them. In the OAIS model, trusted archives receive, store, and distribute these packages.¹³
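As a loose illustration of the package metaphor, the Python sketch below bundles an object’s bytes with descriptive metadata. The class name, the metadata fields, and the sample content are all hypothetical; the OAIS reference model defines its own, much richer vocabulary, and this is not an implementation of it.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class InformationPackage:
    """A digital object bundled with the metadata that describes it."""
    content: bytes
    metadata: dict = field(default_factory=dict)

def make_package(content: bytes, media_type: str, creator: str) -> InformationPackage:
    """Wrap content with a few illustrative (assumed) metadata fields."""
    metadata = {
        "media_type": media_type,   # how the bytes should be interpreted
        "creator": creator,         # who produced the object
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),  # fixity value
    }
    return InformationPackage(content=content, metadata=metadata)

package = make_package(b"<html><body>Hello</body></html>",
                       "text/html", "example publisher")
print(package.metadata["media_type"], package.metadata["size_bytes"])
```

The point of keeping the two together is that the object never travels without the information needed to interpret and verify it.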

Current Efforts

Needless to say, digital preservation, whether it be of materials that were born digital or born analog and converted to digital, is a massive undertaking. Text, image, video, sound, and the ubiquitous World Wide Web all need preserving, yet there is little agreement on how to do it, and there are few universally agreed-upon standards.

There are other issues as well. “Most digital information is owned by someone. . . . Archiving as we generally think of it would not be permitted under most contemporary use licenses.”¹⁴ The costs of digital preservation are staggering as well. It’s enough to make you give up in despair.

Fortunately, organizations and individuals are rising to the challenge. Publishers, national libraries and other governmental agencies, foundations and research libraries, non-profit organizations, and even some enterprising individuals are all assuming key roles in digital preservation.

Publishers. Publishers have strong proprietary interests in preserving their assets, especially if a possibility exists for reselling one of those assets in the future. Many major publishers, such as Oxford University Press, Reed Elsevier, the American Geophysical Union, and the American Physical Society have committed to digital access for some of their core publications.¹⁵ However, preserving content in a publisher’s own archives, which typically have closed and proprietary system architectures, is problematic. A better place to preserve a publisher’s intellectual assets is in an external archive that operates using open standards. Many publishers, however, are loath to do this.¹⁶

For those knowledgeable in the field, “the idea that digital preservation, or at least some of its key functions, would become the responsibility of commercial or nonprofit entities that come and go in the marketplace is unacceptable. Preservation, they argue, should be the responsibility of institutions that are buffered from the vicissitudes of business cycles.”¹⁷

National libraries and other governmental agencies. Several national libraries have extensive digital preservation programs. In 1996, the National Library of Australia established the PANDORA (Preserving and Accessing Networked Documentary Resources of Australia) archive. PANDORA is a highly-focused collection of online publications that relate to Australia.¹⁸ The National Library of the Netherlands is archiving many digital items, including the online journals of Elsevier Science and Kluwer Academic Publishers. It is wrestling with the challenge of establishing a mass storage system and providing leadership to the Networked European Deposit Library (NEDLIB) Project.¹⁹ A Digital Preservation Coalition has been established in the United Kingdom. The national libraries of the UK and France also have ambitious programs.²⁰

In 2000, the Library of Congress received a congressional mandate, and the National Digital Information Infrastructure and Preservation Program (NDIIPP) began. Funded with an initial appropriation of $100 million, the purpose of the NDIIPP was to develop, design, and implement a preservation infrastructure that would create the technical, legal, organizational, and economic means to enable a variety of preservation stakeholders to work collaboratively to ensure the persistence of digital heritage. . . LC has proposed that such sectors as higher education, science, and other academic and research enterprises take primary responsibility for collecting, curating, and ensuring the preservation of their own information assets, especially those that are not deposited for copyright protection.²¹

The LC plan conforms to the OAIS notion of a small number of trusted and certified repositories of digital content.²² These would be the libraries of record for the nation.

Other national agencies have also stepped into the fray. The National Science Foundation (NSF) has had a Digital Libraries Initiative (DLI) program in place for some time. NSF is also a principal funding agency for digital library research.²³ The National Archives and Records Administration is laying plans for preserving selected digital records of the U.S. government.²⁴

Foundations and research libraries. The Andrew W. Mellon Foundation has been interested in preservation for many years. The Foundation has established an archiving program for electronic journals, providing funding to seven research libraries to develop different approaches to the issue.²⁵

All of these projects are significant, but one research library initiative is, to me, of particular interest: LOCKSS. LOCKSS stands for Lots of Copies Keep Stuff Safe, and it is the brainchild of a Stanford librarian and a friend of hers who happens to be a distinguished engineer at Sun Microsystems. LOCKSS is based upon the notion that, just like physical library collections, electronic collections, to be secure, should exist in more than one location.²⁶ By creating an alliance between publishers and libraries, the LOCKSS initiative seeks to create a solution to the digital preservation problem by cost sharing and state-of-the-art technology. By participating in the LOCKSS Program, libraries and publishers can make an important contribution to future generations.

Libraries install LOCKSS software on inexpensive PCs. Librarians choose what titles they wish to collect and preserve for the long term in the LOCKSS user interface. In addition to being inexpensive, LOCKSS has the virtue of being fully automated. The LOCKSS system crawls the publisher’s Web site, collecting all HTTP delivered content, including a variety of file formats (PDF, HTML, JPEG, TIF, audio, video), as new material is published. This material is held in the local LOCKSS Web cache.

The content on a LOCKSS cache is automatically audited and preserved. LOCKSS caches continuously talk with each other to validate that the content is in good repair. If an error is found, an undamaged replacement is obtained from either the publisher or one of the other library LOCKSS caches.²⁷ One important point: LOCKSS collects presentation files, that is, the way an item appears on the Web, rather than the source files themselves.²⁸
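The general idea of caches checking one another can be sketched in a few lines of Python. This is a simplified majority-vote comparison written for illustration only; it is not the actual LOCKSS polling protocol, and the function name and cache labels are hypothetical.

```python
import hashlib
from collections import Counter

def audit_and_repair(caches: dict[str, bytes]) -> list[str]:
    """Compare copies by digest; overwrite minority copies with the majority one.

    Returns the names of the caches that were repaired.
    """
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in caches.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(caches) // 2:
        raise ValueError("no majority; cannot tell which copy is good")
    good = next(data for name, data in caches.items()
                if digests[name] == winner)
    repaired = []
    for name in caches:
        if digests[name] != winner:
            caches[name] = good  # fetch an undamaged replacement
            repaired.append(name)
    return repaired

caches = {
    "library_a": b"article text",
    "library_b": b"article text",
    "library_c": b"article t3xt",  # one damaged copy
}
print(audit_and_repair(caches))  # ['library_c']
```

The sketch also shows why lots of copies matter: with only two copies that disagree, there is no way to tell which one is damaged.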

Non-profit organizations. Other non-profits are also playing a role. The Inter-university Consortium for Political and Social Research (ICPSR) is collecting and preserving economic survey and data sets. At the Harvard-Smithsonian Center for Astrophysics, the Astrophysics Data System (ADS) is collecting and indexing astronomy research.²⁹ One of the most well-known digital preservation efforts is JSTOR. While it provides access for its member institutions to back issues of journals in a variety of disciplines, it has always defined itself primarily as a digital archiving operation.³⁰

An agency that does not itself preserve digital content but helps others in their efforts is the Digital Library Federation. The DLF is a non-profit consortium that includes a number of academic research libraries, the Library of Congress, the National Archives, the Los Alamos National Laboratory Research Library, the California Digital Library, CLIR, OCLC, the Research Libraries Group, and the Coalition for Networked Information.31

An individual effort. In 1996, Brewster Kahle, an inventor and the founder of Wide Area Information Servers, Inc. (WAIS), began an extremely ambitious effort that some have seen as the answer to the problem of archiving the World Wide Web. Kahle’s Internet Archive has been crawling the Web for seven years now, amassing more than 250 terabytes of data, over 2 billion pages and 40 million sites. This effort is a physical as opposed to logical preservation of Web pages, which may mean that some of the Web’s more interactive content will not be retrievable in years to come without first employing some elaborate digital forensics.32 Still, the Internet Archive is an amazing achievement, showing how great one committed individual’s contributions can be.

Research. There is a great deal of research underway on the subject of digital preservation. This research covers such topics as archival repositories, persistent identification of archived information, longevity testing of digital storage media (magnetic and optical), and ensuring the authenticity of archived information.33

What’s A Library To Do?

Libraries have spent the last decade creating digital libraries by cobbling together the electronic offerings of publishers, consortia, and so forth, but they have given very little time or attention to creating digital archives, which is something quite different. Digital libraries focus on access; digital archives focus on preservation.34

Being libraries of record in a digital age is clearly beyond the capability of most institutions. Fortunately, national libraries, major research libraries, the associations of scholarly disciplines, consortia, and other non-profit organizations seem willing to assume these roles. Remember, in the Library of Congress model for a National Digital Information Infrastructure and Preservation Program, only a few trusted repositories are necessary.

Yet if a library has unique items (such as those in its archives) that it considers worth preserving, and if that library intends to use digital means to preserve them, then it must give thought to the range of issues described in this article. Librarians aren’t nearly as invested in or as knowledgeable about this topic as they should be.

For the typical college or university library that does not aspire to be a national repository of digital information, another alternative is needed. Fortunately, OCLC and RLG are developing preservation services.35 One hopes that these services will include repositories, conforming to the OAIS model, into which libraries may deposit their own unique digital information.

NOTES AND REFERENCES

1. The Council’s Web site (www.clir.org) has a wonderful collection of materials on digital preservation. Many of these are proceedings of conferences sponsored by CLIR and other organizations. I am indebted to the Council for much of my understanding of this topic. Note that all URLs referenced were valid as of August 17, 2003.

2. Peter Lyman, “Archiving the World Wide Web,” in Building a National Strategy for Preservation: Issues in Digital Media Archiving, commissioned for and sponsored by the National Digital Information Infrastructure and Preservation Program, Library of Congress (Washington, DC, 2002). Available: http://www.clir.org/pubs/reports/pub106/web.html.

3. Ibid.

4. Amy Friedlander, “Summary of Findings,” in Building a National Strategy for Preservation. Available: http://www.clir.org/pubs/reports/pub106/summary.html.

5. Kenneth Thibodeau, “Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years,” in The State of Digital Preservation: An International Perspective, April 24-25, 2002 (Washington, DC: Council on Library and Information Resources, 2002). Available: http://www.clir.org/pubs/reports/pub107/thibodeau.html.

6. Ibid.

7. Dale Flecker, “Organizational Models for Digital Archiving,” in Abby Smith, ed., New-Model Scholarship: How Will It Survive? (Washington, DC, 2003). Available: http://www.clir.org/pubs/reports/pub114/appendix1.html.

8. Daniel Greenstein and Abby Smith, “Digital Preservation in the United States: Survey of Current Research, Practice, and Common Understanding,” in Abby Smith, ed., New-Model Scholarship. Available: http://www.clir.org/pubs/reports/pub114/appendix2.html.

9. Abby Smith, New-Model Scholarship. Available: http://www.clir.org/pubs/reports/pub114/newmod.html.

10. Samuel Brylawski, “Preservation of Digitally Recorded Sound,” in Building a National Strategy for Preservation. Available: http://www.clir.org/pubs/reports/pub106/sound.html.

11. Smith, New-Model Scholarship. Available: http://www.clir.org/pubs/reports/pub114/newmod.html.

12. Greenstein and Smith, “Digital Preservation in the United States: Survey of Current Research, Practice, and Common Understanding.”

13. Dale Flecker, “Preserving Digital Periodicals,” in Building a National Strategy for Preservation. Available: http://www.clir.org/pubs/reports/pub106/periodicals.html.

14. Flecker, “Organizational Models for Digital Archiving.”

15. Smith, New-Model Scholarship. Available: http://www.clir.org/pubs/reports/pub114/approaches.html.

16. Donald Waters, “Good Archives Make Good Scholars: Reflections on Recent Steps Toward the Archiving of Digital Information,” in The State of Digital Preservation. Available: http://www.clir.org/pubs/reports/pub107/waters.html.

17. Smith, New-Model Scholarship. Available: http://www.clir.org/pubs/reports/pub114/newmod.html.

18. Colin Webb, “Digital Preservation—A Many-Layered Thing: Experience at the National Library of Australia,” in The State of Digital Preservation. Available: http://www.clir.org/pubs/reports/pub107/webb.html.

19. Titia van der Werf, “Experience of the National Library of the Netherlands,” in The State of Digital Preservation. Available: http://www.clir.org/pubs/reports/pub107/vanderwerf.html.

20. Neil Beagrie, National Digital Preservation Initiatives: An Overview of Developments in Australia, France, the Netherlands, and the United Kingdom and of Related International Activity (Washington, DC: Council on Library and Information Resources and Library of Congress, 2003). Available: http://www.clir.org/pubs/reports/pub116/contents.html.

21. Library of Congress, 2003, quoted in Smith, New-Model Scholarship. Available: http://www.clir.org/pubs/reports/pub114/approaches.html.

22. Ibid.

23. Smith, New-Model Scholarship. Available: http://www.clir.org/pubs/reports/pub114/approaches.html.

24. Greenstein and Smith, “Digital Preservation in the United States: Survey of Current Research, Practice, and Common Understanding.”

25. The New York Public Library and the university libraries of Cornell, Harvard, Massachusetts Institute of Technology [MIT], Pennsylvania, Stanford, and Yale.

26. Chris Dobson, “The Story of LOCKSS,” Searcher 11 (2003): 50.

27. This description of how LOCKSS works is based upon correspondence in July 2003 between the author and Vicky Reich, the Stanford librarian who helped conceive the project.

28. Smith, New-Model Scholarship. Available: http://www.clir.org/pubs/reports/pub114/approaches.html.

29. Flecker, “Organizational Models for Digital Archiving.”

30. Smith, New-Model Scholarship. Available: http://www.clir.org/pubs/reports/pub114/approaches.html.

31. http://www.diglib.org/about.htm.

32. Smith, New-Model Scholarship. Available: http://www.clir.org/pubs/reports/pub114/approaches.html.

33. Greenstein and Smith, “Digital Preservation in the United States: Survey of Current Research, Practice, and Common Understanding.”

34. Deanna Marcum, “The Preservation of Digital Information,” The Journal of Academic Librarianship 22 (1996): 452.

35. Smith, New-Model Scholarship. Available: http://www.clir.org/pubs/reports/pub114/approaches.html.
