
Ontario Government documents repository D‐Space pilot project


Rea Devakos
Information Technology Services, University of Toronto, Toronto, Canada, and

Annemarie Toth-Waddell
Legislative Library, Toronto, Canada

Abstract

Purpose – This paper aims to describe a project to increase access and longevity of electronic government documents.

Design/methodology/approach – The Ontario Legislative Library has partnered with the Ontario Council of University Libraries to extend the longevity and accessibility of electronic government documents using DSpace.

Findings – Digital repository software, such as DSpace, can be used to extend access to, and longevity of, special collections.

Research limitations/implications – The case study may be specific to local practices and the institutions involved.

Practical implications – The consortial approach builds on existing practices to build a cost effective and sustainable service.

Originality/value – Many projects in electronic government document preservation and access require large investments. This project leverages existing practices and resources.

Keywords Document management, Archives management, Digital storage, Government, Canada

Paper type Case study

Digital preservation presents huge challenges for individual libraries and archives. Nowhere is this more manifest than in electronic government documents that vanish rapidly from web sites with governments seemingly giving little thought to longevity, reliability and authenticity. Governments worldwide are increasingly publishing documents electronically – sometimes to the exclusion of print. In turn, libraries worldwide are grappling with how to provide persistent access to electronic government documents in a timely and cost effective manner. Digital preservation strategies, particularly for e-government, have been described as “labour-intensive, time consuming and costly, and need to be applied from the moment of creation of the materials, and then continued indefinitely” (Cunningham and Phillips, 2005, p. 304). Furthermore, current strategies are seen as rudimentary. Hubbertz’s Cookbook for a Basic Collection of Web-based Government Information provides a starting point while recognizing that “a more fully satisfactory methodology would include systematic collection of preservation metadata and formal agreements with government institutions for collection and preservation of databases, interactive services, and similar material. That lies in the future” (Hubbertz, 2005, p. 16).

This paper describes a project that provides an initial, and admittedly incomplete, step toward addressing this critical issue. The Ontario Legislative Library (OLL) has partnered with the Ontario Council of University Libraries (OCUL) to provide long-lived worldwide access to electronic provincial government documents. Both institutions have built on existing expertise, and perhaps most importantly, on existing resources to build a sustainable service.

OCLC Systems & Services: International digital library perspectives, Vol. 24 No. 1, 2008, pp. 40-47. © Emerald Group Publishing Limited, 1065-075X. DOI 10.1108/10650750810847233. Received October/November 2006; revised November 2006-February 2007; accepted December 2006-March 2007. The current issue and full text archive of this journal is available at www.emeraldinsight.com/1065-075X.htm

Background

With 12.5 million people, Ontario is Canada’s most populous province. The Ontario Legislative Library’s 67 staff members provide research and information services to the province’s 103 Members of the Provincial Parliament (MPPs), legislative committees and Legislative Assembly staff. As the descendant of the parliamentary libraries of the province of Upper Canada (1792-1841) and of the united Province of Canada (1841-1867), the Library has the most extensive collection of Ontario documents available. These documents are collected to serve the Library’s primary clientele, but access is also made available to the public through the internet-accessible catalogue as well as through catalogue record extracts of newly published documents that are submitted to POOL, Publications Ontario’s online ordering tool. With the increase of government internet publishing in the late 1990s and the move towards eliminating print versions entirely, the Library was presented with the major challenge of finding a way to continue to build its Ontario documents collection while meeting clients’ increasing preference for timely, desktop access to these documents.

Origins of the project: OLL repository details

By 1999 various library organizations began to approach the Legislative Library to see if it could provide assistance with preserving Ontario documents in electronic form for the long term. The Library agreed the time had come to proceed with a pilot project to build a documents repository starting with monographs. With the advice and assistance of other libraries that had started archiving programs and after some extensive research on the internet, staff gradually developed a practical set of procedures for archiving documents that met the key requirements that these new procedures be simple, streamlined and integrated with acquisitions and cataloguing workflows.

In July 2000, the Library began to formally build a repository of born digital Ontario monographs that also included a few election campaign web sites. Since then the scope has expanded to include all of the Ontario serials the Library collects that can be captured in electronic format as well as all Ontario government press releases. Selection of documents is based on the collection policy goal of maintaining an extensive collection of Ontario documents that is not only of value to clients, but also preserves the publishing output of the province and has long-term significance for scholars, researchers, members of the public and public servants. In the rare instance that the Library collects a document that is not intended or not yet ready for public distribution, the archived document is stored on a restricted access portion of the repository server.

Initially, staff attempted to archive documents in the format they were posted in, but HTML documents proved to be difficult and time consuming to capture since many files and associated links to various other HTML, JPEG files, etc. are often required to build a complete document. The practical solution adopted was to provide staff with the Adobe Acrobat software program so that they could convert HTML files into PDF when necessary. PDF use was widespread and research indicated it was a format likely to be supported for the foreseeable future, so the decision was made that all files added to the repository would be in PDF format. The PDF standard has been published and is freely available, so it may be expected there will be sufficient demand for either a viewer and/or migration path to be developed for this format in the future.

A total of roughly 4,500 Ontario digitally born documents are catalogued each year, the majority of which are archived. Dynamic content and integrating resources continue to present problems for capture so these are catalogued and linked to the original site, but excluded from the repository. Currently, to build the Ontario documents repository, a team of six staff members monitors 85 Ontario government web sites and almost 1,000 pages on a daily basis using web site monitoring software. As soon as a document is discovered, it is captured or converted into PDF format. The goal is to catalogue and archive all new Ontario monographs within one day of their discovery. Libraries that rely on the repository for cataloguing copy can expect to find full catalogue records with LC subject access within one month of the document capture. The existing cataloguing policy of creating just one record for both the print and electronic format with the description being based on the print format has been maintained. This is being done partly for workload reasons, but also to meet the staff and clients’ stated preference for this type of access. This policy has proved problematic in the context of the DSpace pilot project.

Use of the repository continues to grow with visits from clients as well as library staff now reaching over 200,000 hits in the past year. As of March 2007, the Legislative Library has archived some 13,000 monographs, 900 serial titles, 7,500 individual serial titles and 7,200 press releases. Archiving procedures have been so tightly integrated with workflows that it is difficult to determine exactly how much time is spent on building the repository or to separate out the time spent on the acquisition and cataloguing of print titles versus electronic. However, a rough estimate of time spent to build and maintain both the print and electronic Ontario document repository is 2.8 FTE cataloguing technicians, two FTE acquisitions technicians and one FTE librarian. With the addition of only one cataloguing technician to their staff complement to allow them to keep up with the increase in internet publishing in general, the Library has been able to create a resource that provides timely access to Ontario internet documents.

OCUL & OLL partnership

By 2006, the Legislative Library had successfully reached a stage where they could reliably maintain the Ontario document repository and meet clients’ access needs. Nevertheless, library staff knew that still more needed to be done to ensure that captured documents were preserved in the long term. Concern among OCUL libraries that government documents were disappearing led to the first talks between OCUL and the Legislative Library in the spring of 2004 on the possibility of creating a shared repository of Ontario documents. The Legislative Library had an interest in exploring partnerships with other organizations that might ensure the repository they had built could be preserved in the long term as well as exploring the options for creating better public access to this valuable resource. OCUL offered the possibility of conducting a test using a DSpace repository, named OZone, to determine whether this software could be used. To this end, an informal agreement was struck to conduct a pilot project. A small working group was put together which included IT, technical and public service staff.

OCUL is a consortium of 20 public university libraries in the province of Ontario, Canada. Begun in 1967, OCUL member libraries serve over 360,000 FTE students. Libraries cooperate to enhance information services through resource sharing, collective purchasing, document delivery and many other similar activities. OCUL locally loads over 11 million articles from over 7,300 journals and supports over 21,000 registered users of RefWorks. Scholars Portal offers integrated single point access to 65 million citations from over 50 A&I databases. In 2005 OCUL received the Canadian Association of College and University Librarians’ Innovation Achievement Award.

DSpace

OZone, OCUL’s shared repository, is an installation of DSpace. DSpace is an open source institutional repository platform originally developed by MIT and Hewlett Packard. There are over 250 live installations of DSpace, including consortia or associations such as the Washington Research Library Consortium, the Alliance for Innovation in Science and Technology Information, and the Texas Digital Library. In addition to Ontario provincial government documents, the OZone repository also houses collections on information and data literacy.

For the purposes of this project, DSpace’s most important functions are to facilitate preservation and access to digital objects. DSpace has a number of preservation features including the ability for libraries to set preservation support by file type, checksums to ensure file authenticity, and persistent identifiers. A number of these archival features are also cited in Illinois’s Electronic Document Initiative (EDI) as increasing security and permanent access to state documents (Frankenfeld, 2005). DSpace uses open standards to facilitate interoperability and hence makes it easy to re-use metadata and for search services, such as Google, to crawl content.
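The role of checksums in confirming that an archived file has not changed can be illustrated with a short Python sketch. This is not DSpace's own code; the function names and the choice of SHA-256 as the default algorithm are assumptions made only for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded: str, algorithm: str = "sha256") -> bool:
    """Compare the current digest with the one recorded when the file was archived."""
    return file_checksum(path, algorithm) == recorded
```

A repository would record the digest once at ingest and recompute it on a schedule; any mismatch signals that the stored file has been altered or corrupted.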

DSpace is organized by the concept of communities – often corresponding to administrative units within an organization. Each community can have multiple collections and subcollections. Single items may reside in more than one community or collection. While this hierarchical and nested structure may seem ideal for government documents, it was quickly rejected by the working group. As most librarians well know, government departmental names and structures are far from stable. Since this project is being undertaken with no additional funding, required upkeep was deemed far too onerous. Instead, a single Ontario government documents collection and community is planned. Unfortunately, neither the OLL nor OCUL could commit the staffing to maintain a more appropriate and intricate structure.

Meshing practices

The OLL catalogues all government documents using MARC while DSpace’s metadata schema is Dublin Core. The working group reviewed the Library of Congress’ MARC to Dublin Core crosswalk to ensure that the MARC cataloguing metadata mapped to the appropriate DC elements. The mapping of the MARC catalogue records for monograph records proved to be fairly straightforward. However, OLL’s local practice of creating one record to describe both the print and electronic format results in some of the metadata relating to the print format appearing together with that applicable to the electronic format in the DSpace record (see Figures 1 and 2 for an example of an OLL record that has been mapped from MARC to Dublin Core).
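The mixing of print and electronic metadata described above can be seen in a small Python sketch of a crosswalk. The mapping table is a hypothetical, simplified subset chosen for illustration; it is not the project's actual mapping, nor a full rendering of the Library of Congress crosswalk, and the sample record is invented.

```python
# Hypothetical, simplified crosswalk: MARC (tag, subfield) -> Dublin Core element.
CROSSWALK = {
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("300", "a"): "format",      # physical extent, describes the print version
    ("650", "a"): "subject",
    ("856", "u"): "identifier",  # URL, describes the electronic version
}


def marc_to_dc(fields):
    """Map (tag, subfield, value) triples to a dict of Dublin Core value lists."""
    record = {}
    for tag, code, value in fields:
        element = CROSSWALK.get((tag, code))
        if element is None:
            continue  # unmapped fields are simply skipped in this sketch
        # Trim trailing ISBD punctuation commonly found in MARC subfields.
        record.setdefault(element, []).append(value.strip(" /:;,"))
    return record


# Invented sample: one MARC record covering both print and electronic formats.
sample = [
    ("245", "a", "Annual report /"),
    ("260", "b", "Ministry of Example,"),
    ("260", "c", "2006."),
    ("300", "a", "24 p. ;"),
    ("856", "u", "http://example.org/annual-report.pdf"),
]
dc = marc_to_dc(sample)
```

In the resulting record the print extent ("24 p.") sits next to the URL of the PDF, which mirrors the problem that a single combined catalogue record creates once it is mapped into the repository.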

Figure 1. MARC record as it appears in the OLL catalogue

Figure 2. MARC to Dublin Core mapping in OZone

For the purposes of this project, OLL is responsible for identifying, fully cataloguing and saving Ontario government documents to a local server. The original hope was that OCUL would host the “working” copy of the documents, but various issues made this impractical. The DSpace batch loading utility will be used periodically to harvest files and metadata. Unlike most American documents, copyright provisions vary between departments and agencies in Ontario, so OLL staff also verify and/or clear copyright before archiving. Crown copyright is retained by the province of Ontario. Publications Ontario has stipulated conditions of use including a prominent copyright statement. In addition, documents may only be used for non-commercial purposes and cannot be sold without the express written consent of the government. A notice to that effect accompanies every item.
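As a rough illustration of what preparing a document for batch loading might involve, the following Python sketch writes one item in the directory layout generally used by DSpace's Simple Archive Format: a folder holding the file, a `dublin_core.xml` metadata file and a `contents` manifest. The function name, folder names and metadata are assumptions for illustration; the paper does not describe the project's actual export process.

```python
import shutil
from pathlib import Path
from xml.etree import ElementTree as ET


def write_saf_item(batch_dir: Path, item_name: str, metadata: dict, pdf_path: Path) -> Path:
    """Write one item folder: metadata XML, the PDF itself, and a contents manifest."""
    item_dir = batch_dir / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # Metadata: one <dcvalue> per Dublin Core value, unqualified in this sketch.
    root = ET.Element("dublin_core")
    for element, values in metadata.items():
        for value in values:
            node = ET.SubElement(root, "dcvalue", element=element, qualifier="none")
            node.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy the archived file and list it in the manifest, one filename per line.
    shutil.copy2(pdf_path, item_dir / pdf_path.name)
    (item_dir / "contents").write_text(pdf_path.name + "\n", encoding="utf-8")
    return item_dir
```

A periodic job could call this once per newly archived document and then hand the whole batch folder to the repository's import tool.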

OCUL is responsible for maintaining access and preserving said documents through their repository. It is envisioned that OCUL government document librarians will supplement the collection development responsibilities of OLL staff by, for example, collecting municipal documents. It was thought prudent, however, to start with an existing collection. Metadata will be available for export to OCUL member library catalogues. The division of duties bears some similarity to the TRAIL repository for Texan government documents, jointly administered by the Texas State Library and Archives Commission and the University of North Texas Libraries (Hartman and Condrey, 2004).

Serials

As expected, serial archiving proved to be a major challenge. Other libraries consulted by the OLL either chose to create a bibliographic record for the serial title with links from item records to individual serial issues or they linked to an intermediate hard-coded HTML web page which served as an index page for the archived serial issues. The original procedure the Legislative Library chose to follow was to code HTML index pages for each serial title, but this soon proved to be far too labour intensive. A method was developed that eliminates the need to maintain this hard-coded index page by automating the index creation through a script process that uses a combination of ASP and Microsoft Access. Index pages are now created dynamically. Using this process, each serial title is given a folder name based on the record control number. When archiving, staff assign filenames to individual issues that allow the issues to file in descending chronological order and then place the file in the appropriate folder. When a catalogue user selects a serial to view, the request invokes a script that makes a call to a Microsoft Access database that supplies the “friendly name” of the serial. The script also lists results to the user’s screen in the form of an index page. If the serial has a frequent publication pattern, annual sub-folders can be created to further organize the issues. For an example, see the title Monthly Market Report in the Legislative Library’s OPAC: www.ontla.on.ca/web/go2.jsp?locale=en&Page=/lao-organization/catalogue/library_catalogue&menuItem=lao-organization
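The dynamic index approach can be sketched in Python. The Library's implementation uses ASP and Microsoft Access; here a dictionary stands in for the database lookup, the control number is invented, and date-based filenames such as `2006-11.pdf` are an assumed naming convention that lets a reverse sort produce descending chronological order.

```python
from pathlib import Path

# Stand-in for the database that supplies each serial's "friendly name".
# The control number used as the key is hypothetical.
FRIENDLY_NAMES = {"123456": "Monthly Market Report"}


def build_index(archive_root: Path, control_number: str) -> str:
    """List a serial's archived issues, newest first, under its friendly name."""
    folder = archive_root / control_number
    title = FRIENDLY_NAMES.get(control_number, control_number)
    # Assumed convention: filenames begin with the issue date, so sorting
    # them in reverse yields descending chronological order.
    issues = sorted((path.name for path in folder.glob("*.pdf")), reverse=True)
    return "\n".join([title] + [f"  {name}" for name in issues])
```

Because the listing is rebuilt from the folder contents on every request, adding a new issue requires only saving the file with the right name; no index page has to be edited by hand.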

Serial archiving was carefully examined. While several institutions, including the University of Toronto, use DSpace to archive serials, it was felt that the “fit” was not good enough. The OLL not only added electronic serials to existing cataloguing records for the print format, but also archived on an issue basis and added each issue to a single catalogue record. Most journal archiving in DSpace installations has been at the article level. On the surface, the DSpace equivalent was the multi-file item. However, once a multi-file object has been archived, additional files may not be added to the same record. Frequent title changes were also seen as problematic. OLL and OCUL are still looking for a viable solution. For similar reasons, dynamic sites and integrating resources have yet to be addressed.


Conclusion

In 2005 Hubbertz identified a number of barriers to starting a collection of electronic government documents including the lack of established standards, special purpose tools and recognized best practices. In response to this void, most Canadian legislative libraries’ digital collections were organized to maintain continuity with the print (Johnston, 2005). In many respects the current project continues the print tradition – the official collection remains at the Legislative Library. OCUL now provides satellite access in the tradition of depository libraries. For now. And “for now” is an important distinction in the virtual world. OCUL and OLL view their Ontario Government Documents Repository as an important first step in increasing the accessibility and longevity of provincial government documents. The hope is that this partnership will not only provide immediate benefits but also serve as a basis to jointly explore “a more fully satisfactory methodology” (Hubbertz, 2005, p. 16).

Web sites (as listed in the paper)

Ontario Legislative Library – www.ontla.on.ca/web/go2.jsp?locale=en&menuItem=lao_header&Page=/lao-organization/office_of_the_assembly_main

Catalogue – www.ontla.on.ca/web/go2.jsp?locale=en&Page=/lao-organization/catalogue/library_catalogue&menuItem=lao-organization

Publications Ontario On-Line (POOL) – www.publications.gov.on.ca/english/

Ontario Council of University Libraries – www.ocul.on.ca/

OZone – https://ospace.scholarsportal.info/

DSpace – www.dspace.org/

Dublin Core Metadata Initiative – http://dublincore.org/

Washington Research Library Consortium – www.wrlc.org

Alliance for Innovation in Science and Technology Information – https://repository.lanl.gov/index.jsp

Texas Digital Library – www.tdl.org/index.html

Dublin Core/MARC/GILS Crosswalk – www.loc.gov/marc/dccross.html

T-Space – http://tspace.library.utoronto.ca

References

Cunningham, A. and Phillips, M. (2005), “Accountability and accessibility: ensuring the evidence of e-governance in Australia”, ASLIB Proceedings, Vol. 57 No. 4, pp. 301-17.

Frankenfeld, C. (2005), “Electronic documents initiative”, Illinois Libraries, Vol. 85 No. 4, pp. 9-11.

Hartman, C.N. and Condrey, C. (2004), “TRAIL: from government information locator service to electronic depository program for Texas State publications”, DttP: Documents to the People, Vol. 32 No. 2, pp. 22-7.

Hubbertz, A. (2005), Cookbook for a Basic Collection of Web-based Government Information, Canadian Association of Research Libraries, Ottawa.

Johnston, L. (2005), “Preservation of born-digital government publications in Canadian jurisdictions”, DttP: Documents to the People, Vol. 33 No. 4, pp. 13-15.


About the authors

Rea Devakos is Coordinator, Scholarly Communication Initiatives, University of Toronto Libraries. Rea manages the scholarly communication initiatives of the University of Toronto including the institutional repository, journal and conference proceeding publishing platforms. In addition, she is involved in several consortial efforts in electronic publishing and archiving. Rea is the corresponding author and can be contacted at: [email protected]

Annemarie Toth-Waddell is Information Access Manager and RIM Manager for the Legislative Assembly, Legislative Library, Legislative Assembly of Ontario. Annemarie’s responsibilities range from managing the section responsible for creating the Library’s Ontario documents repository to working on projects to improve data organization and access to information and records at the Library and Assembly level.
