Preserving The Integrity of The Scholarly Record
http://www.flickr.com/photos/shinez/5000985919/
Peter Burnhill, EDINA @ University of Edinburgh
National Library of Scotland, George IV Bridge, 5.30pm, 16th February
Take Home Message: 1) Archive Streams of Issued Content 2) Avoid Reference Rot
The Scholarly Record & Serials … [a focus on the digital]
Continuing Resources, inc. Serials
‘The Scholarly Record’ has a fuzzy edge
Other ‘resources needed for scholarship’
Issued in Parts (Serials)
Content changes over time (Integrating)
‘e-journals’
Websites, Databases, Repositories
‘Book-length work’
‘Gov Docs’
The following questions are implicit:
1. What exactly is the scholarly record?
• What of that now ‘issued on the Web’?
• And what if we limit focus to what could get an ISSN?
2. Whose responsibility is it to act as steward?
• Each research library; library consortia; national/state libraries/archives?
• & is this a national, or a trans-national challenge?
An Article, once available in print on-shelf locally …
… is now online & accessed remotely,
‘anytime/anywhere’ => Improved Ease of Access :)
But what of Continuity of Access? Will it still be there tomorrow?
Libraries boast of ‘e-collections’, but maybe now they only have ‘e-connections’
Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/
=> real & present danger for the integrity of what is published as scholarly record
This is a global challenge: trans-national action
Percentage of 132,806 ISSNs issued for e-serials (December 2013)
US: 20%; UK: 8.6%; Rest of World: 71%
Researchers (& libraries/publishers) in any one country are dependent upon content written and published as serials in countries other than their own
So, who is offering digital shelving?
① Web-scale not-for-profit archiving agencies:
② National libraries …
③ Research libraries: consortia & specialist centres …
Ingesting content with archival intent …
National Science Library, Chinese Academy of Sciences
Many archiving organisations a Good Thing
“Digital information is best preserved by replicating it at multiple archives run by autonomous organizations”
B. Cooper and H. Garcia-Molina (2002)
Some bad stuff will happen!
A Project to Pilot an E-journal Preservation Registry Service
Need to know who is looking after what & how?
[Diagram: E-Journal Preservation Registry Service, shaped by user requirements, with ISSN-L as kernel field. It combines metadata on extant e-serials (ISSN Register) with metadata on preservation action (Digital Preservation Agencies).]
Pilot: CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance
The Keepers Registry
"Tales from the Keepers Registry"
Serials Review 39.1 (2013)
… to discover who is looking after what
thekeepers.org as Global Monitor
*New in 2014*
Library of Congress and Scholars Portal now reporting in
e-journals should be easy – right?
Some signs of Progress:
The Keepers Registry recorded titles ‘ingested & archived’ by at least 1 ‘keeper’:
in 2011, 16,558; in 2013, 21,557; in 2014, 26,195; now 26,712
9,731 ‘ingested & archived’ by 3+
… more archiving & more archives reporting into the Registry!
Written & produced by Julie Brown, 1989
“Are we there yet?” … “Don’t think so”
‘Ingest Ratio’ = titles being ingested by one or more Keeper / ‘online serials’ in ISSN Register
= 26,195 / 136,965 [in March 2014]
=> 19% (We do not know about 80% of all resources having ISSN)
‘KeepSafe Ratio’ = titles being ingested by 3+ Keepers / ‘online serials’ in ISSN Register
= 9,656 / 136,965
=> 7%
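The two ratios above are plain proportions, so the arithmetic can be checked with a short sketch. The function name `ratio_percent` is hypothetical; the counts are the March 2014 figures quoted on the slide.

```python
def ratio_percent(archived_titles: int, online_serials: int) -> float:
    """Share of online serials (as a percentage) covered by archived titles."""
    return 100.0 * archived_titles / online_serials


# Figures quoted on the slide (March 2014).
ONLINE_SERIALS_IN_ISSN_REGISTER = 136_965
INGESTED_BY_ONE_OR_MORE = 26_195
INGESTED_BY_THREE_OR_MORE = 9_656

ingest_ratio = ratio_percent(INGESTED_BY_ONE_OR_MORE, ONLINE_SERIALS_IN_ISSN_REGISTER)
keepsafe_ratio = ratio_percent(INGESTED_BY_THREE_OR_MORE, ONLINE_SERIALS_IN_ISSN_REGISTER)

print(f"Ingest Ratio:   {ingest_ratio:.0f}%")
print(f"KeepSafe Ratio: {keepsafe_ratio:.0f}%")
```

The same function applies to any title list, for example the titles a single library regards as important.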
Evidence on what libraries care about
In 2011/12 three major research libraries in the USA (Columbia, Cornell & Duke)
checked archival status of serial titles regarded as important
‘Ingest Ratio’ = 22% to 28%, i.e. about a quarter
=> fate of c.75% is unknown
very many ‘at risk’ e-journals from many small publishers
BIG publishers act early but incompletely
Priority: find economic way to archive content from …
Using Title List Comparison tool in Members Area of Keepers Registry. As reported in: P. Burnhill (2013) Tales from The Keepers Registry: Serial Issues About Archiving & the Web. Serials Review 39 (1), 3–20. http://www.sciencedirect.com/science/article/pii/S0098791313000178 & https://www.era.lib.ed.ac.uk/handle/1842/6682
Evidence based on what Researchers Use
… logs for the UK OpenURL Router*
• 8.5m full text requests in UK during 2012 => 53,311 online titles requested
Analysis in 2013:
‘Ingest Ratio’ = 32% (16,985/53,311)
=> over two thirds, 68% (36,326 titles), held by none!
* As reported in Keepers Registry Blog. The OpenURL Router passes ‘discovery’ requests to commercial OpenURL resolver services; developed & delivered by EDINA as part of Jisc support for UK universities & colleges
“I believe we've … a problem here.” [John Swigert, Jr.]
Another threat to the integrity of the record
Language Technology Group Funded by the Andrew W. Mellon Foundation
‘Reference Rot’: when what was referenced & cited ceases to say the same thing, or ‘has ceased to be’
http://www.snorgtees.com/this-parrot-has-ceased-to-be
Reference Rot = Link Rot + Content Drift
“when links to web resources no longer point to what they once did”
‘Link Rot’
+ Content Drift: What is at end of URI has changed, or gone!
http://dl00.org 2000
http://dl00.org 2004
http://dl00.org 2005
http://dl00.org 2008
(a) Dynamic content as values on webpage change over time
(b) Static content but very different (often unrelated) web pages
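The split of Reference Rot into Link Rot and Content Drift can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the method used in the Hiberlink study: the function names are hypothetical, and byte-for-byte hash comparison is a deliberately crude stand-in for judging whether a page still "says the same thing" (real pages change in trivial ways that would not count as drift).

```python
import hashlib
from typing import Optional


def fingerprint(content: bytes) -> str:
    """Hash of the page body, used here as a crude proxy for 'same content'."""
    return hashlib.sha256(content).hexdigest()


def classify_reference(cited_content: bytes, current_content: Optional[bytes]) -> str:
    """Compare what was cited with what the URI returns now.

    `current_content` is None when the URI no longer resolves.
    """
    if current_content is None:
        return "link rot"        # nothing at the end of the URI any more
    if fingerprint(current_content) != fingerprint(cited_content):
        return "content drift"   # something is there, but it has changed
    return "intact"


print(classify_reference(b"DL 2000 conference page", None))
print(classify_reference(b"DL 2000 conference page", b"unrelated page"))
print(classify_reference(b"DL 2000 conference page", b"DL 2000 conference page"))
```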
Hiberlink: Time Travel for The Scholarly Web
1. Threat: Creating evidence on extent of ‘Reference Rot’
– Main focus: references (& URIs) made in Journal Articles
• "Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot"
– PLOS One paper published on 26 December 2014.
• Harvard Law Library & Perma.cc: reference rot in Supreme Court judgments
• http://www.newyorker.com/magazine/2015/01/26/cobweb
– Also looked at Reference Rot & the e-Thesis, ETD2014
2. Remedy: Opportunities for productive intervention
– Identify workflows: preparation, publication, ingest
– Prototype tools to avoid or limit reference rot
– Pro-active or ‘transactional’ archiving as remedy
• Embedding such ‘solutions’ in existing tools & infrastructure
• Propose/test new infrastructure for temporal referencing
– supporting & using the Memento protocol
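For readers unfamiliar with the Memento protocol (RFC 7089): a client asks for a past version of a resource by sending an `Accept-Datetime` request header, formatted as an HTTP date in GMT. The sketch below only builds that header; the function name is hypothetical and no request is sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_headers(when: datetime) -> dict:
    """Build the request header asking for the version closest to `when`."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    # Memento expects an HTTP date expressed in GMT.
    utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}


# Date taken from the link example elsewhere in this talk.
print(accept_datetime_headers(datetime(2014, 5, 19, tzinfo=timezone.utc)))
```

Such headers would be sent to a Memento TimeGate for the original URI, which then redirects to an archived copy near the requested date.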
http://www.restfulliving.com/wp-content/uploads/2013/12/Time-1024x861.jpg
‘Infrastructure’ to Enable Remedy
• Robust Link - re-factor the HTML link that is returned
a) Take simple URI - to French National Library (say)
<a href="http://www.bnf.fr">
Link to the BNF
</a>
b) Augment Link with a set of Datetime & location pairs
<a href="http://www.bnf.fr"
mset="2014-05-19,
http://archive.today/zdpAn 2014-05-15 memento">
Link to the BNF
</a>
http://robustlinks.mementoweb.org/
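The augmentation step can be sketched as a small helper that turns a plain link into the augmented form. This is an illustration only: the function name and parameters are hypothetical, and the layout of the `mset` value (link date first, then "location datetime memento" entries) is an assumption read off the slide's single example, not a specification.

```python
from html import escape
from typing import List, Tuple


def robust_link(href: str, text: str, link_date: str,
                snapshots: List[Tuple[str, str]]) -> str:
    """Return an <a> element carrying an `mset` attribute as on the slide.

    `snapshots` is a list of (archived_uri, archive_date) pairs.
    """
    # Assumed layout: link date, then one "uri date memento" entry per snapshot.
    entries = [link_date] + [f"{uri} {date} memento" for uri, date in snapshots]
    mset = ", ".join(entries)
    return (f'<a href="{escape(href, quote=True)}" '
            f'mset="{escape(mset, quote=True)}">{escape(text)}</a>')


print(robust_link("http://www.bnf.fr", "Link to the BNF", "2014-05-19",
                  [("http://archive.today/zdpAn", "2014-05-15")]))
```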
Remedy for The Integrity of The Scholarly Record
Envisage the best opportunities for Intervention to make Remedy, to ‘flash-freeze’, either to avoid reference rot or to ‘stop the rot’.
3 basic workflows:
① Study: Preparation -> (Review) -> Submission
② Publication: Editorial -> (Revision) -> Acceptance -> Issue
③ Post-Publication: Deposit/Ingest -> Provide/Access -> Use
Identify the Actors involved in:
① Composition: author/creator
② Public Release: editor/referee/copy
③ Curation: librarian / repository manager / archivist
Two-step Remedy To Avoid Reference Rot
Hiberlink Plug-in: help authors & middle-folk do the right thing:
① Triggers archiving of referenced web content when it is noted in:
– Zotero - used by authors to manage references
https://www.zotero.org/
– Open Journal Systems (OJS) - used by OA publishers
https://pkp.sfu.ca/ojs/
② Returns Datetime URI for archived content that can be used in the citation
Time’s Up!
thekeepers.org hiberlink.org
See also:
• thekeepers.blogs.edina.ac.uk
• safenet.blogs.edina.ac.uk/
HelpDesk: [email protected]