CLOCKSS, LOCKSS & Barrels of Stuff: Libraries and Publishers share the Task
Peter Burnhill, EDINA, University of Edinburgh
ICOLC, Rome, 13 October 2006
Overview
1. Introduction
2. What is CLOCKSS?
3. CLOCKSS Project
4. Progress Report
5. Recap & look to the future
6. How you can engage with CLOCKSS
1. Introduction
• Director, EDINA National Data Centre (1 of 2 in UK) www.edina.ac.uk
Based at University of Edinburgh
Designated & largely funded by the JISC
I could say a lot more about EDINA (if given half a chance…)
• SUNCAT (national serials union catalogue), OpenURL Router, Onix for Serials, etc.
• Member of Information Services Directorate (my other 50%), University of Edinburgh
Research-led university in Scotland’s capital city
• Active in CURL & SCURL
• Contributes to JISC work
• Keen to engage in international initiatives
Could also speak of the Digital Curation Centre …
• But here to speak of CLOCKSS, a University, Research Library, Learned Society and Publisher initiative
2. What is CLOCKSS?
Public good solution to problem of global significance
How to preserve & ensure continuing access to electronic scholarly content
• Partly an organisational solution
Collaboration and shared governance between Libraries & Publishers
• Partly a technical solution
LOCKSS technology
In short, it’s Controlled use of LOCKSS
So, what is LOCKSS?
“Lots of Copies Keep Stuff Safe”
Digital Preservation Infrastructure
Decentralized, Peer to Peer, Continuous Content Audit & Repair
“computers chattering away to one another across the Internet”
Open Source
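To make “Continuous Content Audit & Repair” concrete, here is a minimal sketch in Python. It is not the LOCKSS polling protocol itself: it assumes a plain majority vote over SHA-256 hashes, and the box names and the `audit_and_repair` function are hypothetical, chosen only for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(content: bytes) -> str:
    """Return a SHA-256 fingerprint of one box's copy."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(boxes: Dict[str, bytes]) -> List[str]:
    """Compare all copies by hash; overwrite minority copies with the majority copy.

    Returns the names of the boxes that were repaired.
    Assumption: a strict majority of boxes still hold the correct content.
    """
    votes = Counter(digest(copy) for copy in boxes.values())
    winning_hash, count = votes.most_common(1)[0]
    if count <= len(boxes) / 2:
        raise ValueError("no majority agreement; cannot repair safely")

    good_copy = next(c for c in boxes.values() if digest(c) == winning_hash)
    repaired = []
    for name, copy in list(boxes.items()):
        if digest(copy) != winning_hash:
            boxes[name] = good_copy  # repair from a peer that won the vote
            repaired.append(name)
    return repaired


# Four hypothetical boxes; one copy has suffered simulated damage.
boxes = {
    "edinburgh": b"article v1",
    "stanford": b"article v1",
    "indiana": b"article v1",
    "rice": b"articlX v1",
}
repaired = audit_and_repair(boxes)
print("repaired:", repaired)
```

The point of the sketch is that no box is the master copy: damage is detected by disagreement among peers and repaired from the peers that agree.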
3. The CLOCKSS Project
Two-year (2006 -) demonstrator project that intends:
• open reporting of progress & outcome
• a public demonstration that this solution really can be trusted for the long term
• scalability in terms of publisher content & library deployment
The project was first funded by its participants, now with additional NDIIPP grant support from the Library of Congress to assist reporting.
Who is in CLOCKSS?
Consortium acting on behalf of the wider community of libraries and publishers
• was 6 Libraries and 6 Publishers
– Including learned societies acting as publishers
• now 7 Libraries and 12 Publishers
• will be …. sufficient to cover the bases
Commitment based on stewardship of libraries & responsibility of publishers
Libraries
University of Edinburgh
New York Public Library
Indiana University
Rice University
Stanford University
University of Virginia
+ OCLC (recently joined as the 7th)
• Aim to add more to cover ‘tectonic plates’ of all types of geography
Publishers
Blackwell Publishing
Elsevier
Nature Publishing Group
Oxford University Press
SAGE Publications
Springer
Taylor and Francis
John Wiley & Sons
American Chemical Society
American Medical Association
American Physiological Society
Institute of Physics
+ aim to add all the rest …
Equal Partners
Librarians have made a strategic decision, with publishers, that retains their role as stewards, as memory institutions
Publishers have made strategic decision to trust and engage those libraries, committing to prospect of continuing access
Both are exploring social and technical models over an initial two-year period, working to build a full-scale production system
Costs of the initiative are shared equally between the parties, with additional funds to support audit & reporting from NDIIPP (National Digital Information Infrastructure and Preservation Program), administered by the US Library of Congress
Agreed Mission
“CLOCKSS is a not-for-profit community partnership between publishers and libraries that is developing a distributed, validated, comprehensive archive that preserves and ensures continuing access to electronic scholarly content”
Community Governance
• Governed by both library and publisher partners
• Each partner represents an organization but collectively represents each sector
Libraries & Publishers (& Learned Societies?)
• No single point of failure or institutional interest will prevent long-term governance
• Consensus driven, united for support of scholarly communication over the long term
Complementary to territorial arrangements for legal deposit
Format Migration
Ingest format from publishers (during the project) of both/either:
1. as delivered to the Web
2. as XML source files
Access format is “on the fly”
When content is requested
Process is transparent to the reader
http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html
Reduce the cost of ingest
• allowing more material to be preserved: value for money
Postpone costs of migration
• taking advantage both of the time value of money, and of the technology cost curve.
Migrate material upon reader request
• vastly lowering amount of content that needs to be processed
Allow what the reader sees to be the result of best available technology at time of access
Preserve the original look-and-feel (which can be a large part of the value)
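The idea of migrating “on the fly” can be sketched as follows. This is an illustration under stated assumptions, not the LOCKSS implementation: the format names (`legacy-text`, `html`), the `CONVERTERS` registry, the `ARCHIVE` dictionary and the `serve` function are all hypothetical. The archive keeps the bytes exactly as ingested and converts only when a reader asks for a format it does not hold.

```python
from typing import Callable, Dict, List, Tuple

Converter = Callable[[bytes], bytes]

# Hypothetical converter registry keyed by (stored format, delivered format).
CONVERTERS: Dict[Tuple[str, str], Converter] = {}


def register(stored: str, delivered: str) -> Callable[[Converter], Converter]:
    """Decorator that records a converter for one format pair."""
    def wrap(func: Converter) -> Converter:
        CONVERTERS[(stored, delivered)] = func
        return func
    return wrap


@register("legacy-text", "html")
def legacy_text_to_html(data: bytes) -> bytes:
    # Stand-in for a real converter; it only wraps the text in a paragraph.
    return b"<p>" + data + b"</p>"


# The archive holds the original bytes, tagged with the format they arrived in.
ARCHIVE: Dict[str, Tuple[str, bytes]] = {
    "/journal/article-1": ("legacy-text", b"Preserved article text"),
}


def serve(url: str, reader_formats: List[str]) -> Tuple[str, bytes]:
    """Return (format, bytes) for a reader, converting only at request time."""
    stored_format, original = ARCHIVE[url]
    if stored_format in reader_formats:
        return stored_format, original  # reader can still display the original
    for wanted in reader_formats:
        converter = CONVERTERS.get((stored_format, wanted))
        if converter is not None:
            return wanted, converter(original)  # stored copy is left untouched
    raise LookupError(f"no converter from {stored_format} to any of {reader_formats}")


print(serve("/journal/article-1", ["html"]))
```

Because conversion happens at access time, a better converter registered later benefits every future reader without touching the preserved original.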
4. Progress Report
• We are up and running, with two LOCKSS boxes per Library Partner, and one for observation at each Publisher
• We are ingesting content from the Publishers
• The (C)LOCKSS boxes are chattering away
• We meet via teleconference on a weekly basis
• We are readying to simulate our first ‘trigger event’
• We are beginning to report and preparing to audit
• The CLOCKSS is ticking …
Incoming email, 9 October 2006
Dear CLOCKSS technical group,
We are pleased to announce the release of more content to the CLOCKSS network.
Today we have released additional content for the four previously released titles, three are from Oxford University Press and one from SAGE Publications:
• Age and Ageing (OUP)
• Environment & Urbanization (SAGE)
• Journal of Experimental Botany (OUP)
• Toxicological Sciences (OUP)
CLOCKSS libraries
The CLOCKSS system works differently from the LOCKSS system. The titles will be automatically configured for your CLOCKSS boxes. You NEED NOT DO ANYTHING -- the content will be automatically ingested and preserved. … we will be paying close attention to make sure that harvesting and auditing is going smoothly. In the future we will transition some of this responsibility [to] each institution hosting CLOCKSS boxes. We will send details on how to do that in the future. We are working to develop and test plugins for titles that currently have manifest pages. We will be releasing these in the near future.
Thomas S. [email protected] Director & Technical Manager,
5. Recap & look to future
• What’s the Problem?
• What are the Threats?
• Need for Public Good
• Trust in Library Stewardship
• Purpose of CLOCKSS
• Strategies
• Transition to full production
What’s the Problem?
Coming of the digital and the Web accidentally changed the business relationship between librarians and publishers.
• With rare exceptions, libraries no longer take physical custody of the content, but provide access to web materials
• This has disrupted the role libraries have played in society for hundreds of years as trusted keepers of information and culture
• There is concern that what is now digital may cease to be available
Our digital cultural and intellectual heritage is at risk.
What are the Threats?
Continuous and Abrupt changes
• Technology: storage media, hardware, software, formats
• Commerce: here one day, gone the next
• Organizations (even Institutions): shifting priorities, politics, staffing
• Natural disasters
• Human folly: errors and attacks
Need for Public Good
To ensure
• Content kept safe on behalf of scholarly community
• Global access to content on a continuing basis
• The ‘trigger event’
Establish a self-sustaining large dark archive
• That keeps costs low, with revenue to sustain operations and access
Trust in Library Stewardship
Decentralized, to gain leverage from existing infrastructure of libraries
• The libraries hold content, act as custodians
• On trigger event
Board ensures content becomes available again
• No single point of failure, nor institutional interest, will prevent provision of access
• Libraries are here for the long term
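The relationship between a dark archive and a ‘trigger event’ can be sketched in a few lines of Python. This is only an illustration of the access rule described on this slide: the `DarkArchive` class and its method names are hypothetical, and the real governance process (a Board decision) is reduced here to a single method call.

```python
from typing import Dict, Set


class DarkArchive:
    """Holds content that stays unreadable until a trigger event is declared."""

    def __init__(self) -> None:
        self._content: Dict[str, bytes] = {}
        self._triggered: Set[str] = set()

    def ingest(self, title: str, data: bytes) -> None:
        """Preserve content; ingest alone never opens access."""
        self._content[title] = data

    def declare_trigger_event(self, title: str) -> None:
        """Stand-in for the Board's decision that a title must become available."""
        if title not in self._content:
            raise KeyError(title)
        self._triggered.add(title)

    def read(self, title: str) -> bytes:
        """Serve content only for titles with a declared trigger event."""
        if title not in self._triggered:
            raise PermissionError(f"{title!r} is dark: no trigger event declared")
        return self._content[title]


archive = DarkArchive()
archive.ingest("Journal of Examples", b"volume 1")

try:
    archive.read("Journal of Examples")
except PermissionError as err:
    print("before trigger:", err)

archive.declare_trigger_event("Journal of Examples")
print("after trigger:", archive.read("Journal of Examples"))
```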
Transition to full production
Review strategies
• Replication - more copies are safer
• Migration - move copies forward in time
• Transparency - open source software
• Diversity - no single point of failure
• Audit - to confirm data really is preserved
• Sustainable economics - cost effective processes, more materials preserved per Euro/Dollar/Pound
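A rough worked example of why “more copies are safer”, in Python. It rests on a strong assumption that is labelled in the code: each copy fails independently with the same probability. The 5% figure is purely illustrative, not a measured failure rate. Correlated failures (the same software bug, the same region, the same operator) break the assumption, which is why Diversity sits next to Replication in the list above.

```python
def prob_all_copies_lost(p_single: float, copies: int) -> float:
    """Probability that every copy is lost in the same period.

    Assumption: copies fail independently, each with probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_single ** copies


# Illustrative only: a 5% chance of losing any one copy in a given period.
for copies in (1, 2, 7, 14):
    print(copies, "copies ->", prob_all_copies_lost(0.05, copies))
```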
Review Vision
Comprehensive
• How much is all?
Need to define scope and ambition
• Once content is in the archive it stays in the archive
Stock and flow
• Will be available in perpetuity for use by the (dark) archive
Need to ensure access by ‘loss’ trigger
Need to investigate ‘end-licence’ trigger of back runs?
• Will serve as a secure backup to world-wide e-copies of material
Implications of these 4 statements are being investigated during the two-year project
Purpose of CLOCKSS
to preserve content over time, & ensure there is always prospect of service access
Being …
comprehensive of all electronic scholarly content
• with keen focus on published journal articles & the like
globally secure against regional disaster
• of all types: natural, commercial, political
• through deployment across ‘tectonic plates’ of all forms of geography
6. How you can engage with CLOCKSS
www.lockss.org/clockss/talkback
Be a clockss-watcher!
Let us have ideas & feedback; register interest to be an Associate; undertake advocacy
In turn, we will use your support to build the community archive to preserve and ensure continuing access to electronic scholarly content.
(Controlled) Lots of Copies
(to) Keep Stuff Safe
Publisher buy-in, shared governance
Many geographically distributed sites
• guarding against catastrophic failure, natural or man-made
Many independently administered repositories
• guard against “insider attacks”
Many open source software contributors (eyes and minds)
• guard against technical arrogance
I should end here
I do have additional slides on:
• Look and Feel to Readers
• Fancy graphics
But I should close & invite questions
Join Us
Look and Feel to Readers
• When content is served to the user from a LOCKSS & CLOCKSS Box
Look and feel is as close as possible to what the publisher published
Preserve content & presentation
Format Migration
Ingest format (during the project) both/either:
1. As delivered to the Web
2. XML source files
Access format is “on the fly”
When content is requested
Process is transparent to the reader
http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html
Reduce the cost of ingest
• allowing more material to be preserved (VFM)
Postpone costs of migration
• taking advantage both of the time value of money, and of the technology cost curve.
Migrate material upon reader request
• vastly lowering amount of content that needs to be processed
Allow what the reader sees to be the result of best available technology at time of access
Preserve the original look-and-feel, which can be a large part of the value
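The cost argument on this slide combines three effects: discounting (time value of money), falling technology costs, and migrating only what readers request. A small Python sketch shows how they multiply. Every number and parameter name here is an assumption made up for illustration; the slides give no figures.

```python
def eager_cost(items: int, unit_cost: float) -> float:
    """Cost of migrating everything today."""
    return items * unit_cost


def deferred_cost(
    items: int,
    unit_cost: float,
    years: int,
    discount_rate: float,
    annual_cost_decline: float,
    fraction_requested: float,
) -> float:
    """Present value of migrating only requested items, `years` from now.

    Assumptions: unit cost falls by a fixed fraction each year, money is
    discounted at a fixed annual rate, and all requests arrive in one year.
    """
    future_unit_cost = unit_cost * (1 - annual_cost_decline) ** years
    present_value_per_item = future_unit_cost / (1 + discount_rate) ** years
    return items * fraction_requested * present_value_per_item


# Illustrative inputs only: 1,000,000 items at 0.10 per item today,
# deferred 10 years, 5% discount rate, 10% yearly cost decline, 20% ever requested.
now = eager_cost(1_000_000, 0.10)
later = deferred_cost(1_000_000, 0.10, 10, 0.05, 0.10, 0.20)
print("migrate everything now:", now)
print("migrate on request (present value):", later)
```

With zero years of deferral, no cost decline and every item requested, the two functions agree, which is a quick sanity check on the formula.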
Comprehensive
• Once content is in the archive it stays in the archive
• Will be available in perpetuity for use by the archive
• Will serve as a secure backup to world-wide e-copies of the material