Preserving the Integrity of the Scholarly Record


Preserving The Integrity of The Scholarly Record

http://www.flickr.com/photos/shinez/5000985919/

Peter Burnhill, EDINA @ University of Edinburgh

National Library of Scotland, George IV Bridge, 5.30pm 16th February


Take Home Message: 1) Archive Streams of Issued Content 2) Avoid Reference Rot

 

The Scholarly Record & Serials … [a focus on the digital]

‘The Scholarly Record’ has a fuzzy edge

Continuing Resources, inc. Serials

•  Issued in Parts (Serials)

•  Content changes over time (Integrating)

‘e-journals’

Websites, Databases, Repositories

‘Book-length work’

Other ‘resources needed for scholarship’

‘Gov Docs’

The following questions are implicit:

1.  What exactly is the scholarly record?

•  What of that now ‘issued on the Web’?

•  And what if we limit focus to what could get an ISSN?

2.  Whose responsibility is it to act as steward? Each research library; library consortia; national/state libraries/archives?

& is this a national, or a trans-national challenge?

An Article, once available in print on-shelf locally …

… is now online & accessed remotely,

‘anytime/anywhere’ => Improved Ease of Access

But what of Continuity of Access? Will it still be there tomorrow?

 

Libraries boast of ‘e-collections’, but maybe now they only have ‘e-connections’

Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/

=> real & present danger for the integrity of what is published as scholarly record


This is a global challenge: trans-national action

Percentage of the 132,806 ISSNs issued for e-serials (December 2013):

US: 20%; UK: 8.6%; Rest of World: 71%

Researchers (& libraries/publishers) in any one country are dependent upon content written and published as serials in countries other than their own

So, who is offering digital shelving?

①  Web-scale not-for-profit archiving agencies:

②  National libraries …

③  Research libraries: consortia & specialist centres …

Ingesting content with archival intent …

National Science Library, Chinese Academy of Sciences


Many archiving organisations a Good Thing

“Digital information is best preserved by replicating it at multiple archives run by autonomous organizations”

B. Cooper and H. Garcia-Molina (2002)

Some bad stuff will happen!

A Project to Pilot an E-journal Preservation Registry Service

Need to know who is looking after what & how?

[Diagram: E-J Preservation Registry Service. Inputs, with data flows labelled (a) and (b): METADATA on extant e-serials (ISSN Register) and METADATA on preservation action (Digital Preservation Agencies), with ISSN-L as kernel field. Output: E-Journal Preservation Registry, serving user requirements.]

Pilot: CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance
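The registry idea on this slide can be illustrated with a minimal Python sketch. It assumes two hypothetical feeds: serial metadata keyed by ISSN-L, and preservation reports from archiving agencies. The record fields, the sample identifiers, the keeper names and the function name are all illustrative assumptions, not the registry's actual schema.

```python
from collections import defaultdict

# Hypothetical metadata on extant e-serials, keyed by ISSN-L (made-up values).
serials = {
    "0000-0001": "Journal of Examples",
    "0000-0002": "Annals of Placeholders",
}

# Hypothetical metadata on preservation action: (ISSN-L, keeper, status).
reports = [
    ("0000-0001", "Keeper A", "ingested"),
    ("0000-0001", "Keeper B", "ingested"),
    ("0000-0002", "Keeper C", "in progress"),
]


def build_registry(serials, reports):
    """Join the two metadata feeds on ISSN-L, the kernel field."""
    keepers = defaultdict(list)
    for issn_l, keeper, status in reports:
        keepers[issn_l].append((keeper, status))
    return {
        issn_l: {"title": title, "keepers": keepers.get(issn_l, [])}
        for issn_l, title in serials.items()
    }


registry = build_registry(serials, reports)
for issn_l, entry in registry.items():
    print(issn_l, entry["title"], entry["keepers"])
```

Keying both feeds on one identifier is what lets a single lookup answer "who is looking after what".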


The Keepers Registry

"Tales from the Keepers Registry", Serials Review 39.1 (2013)

… to discover who is looking after what

thekeepers.org as Global Monitor

*New in 2014*: Library of Congress and Scholars Portal now reporting in

 

e-journals should be easy, right?

Some signs of Progress: the Keepers Registry recorded

In 2011, 16,558 titles ‘ingested & archived’ by at least 1 ‘keeper’

in 2013, 21,557; in 2014, 26,195; now 26,712

9,731 ‘ingested & archived’ by 3+

… more archiving & more archives reporting into the Registry!

Written & produced by Julie Brown, 1989

“Are we there yet?” … “Don’t think so”

‘Ingest Ratio’ = titles being ingested by one or more Keeper / ‘online serials’ in ISSN Register

= 26,195 / 136,965 [in March 2014]

=> 19% (we do not know the archival status of about 80% of online serials with an ISSN)

‘KeepSafe Ratio’ = titles being ingested by 3+ Keepers / ‘online serials’ in ISSN Register

= 9,656 / 136,965

=> 7%
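The two ratios above are simple proportions of the ISSN Register's count of online serials. A short Python check using only the figures quoted on this slide (the function and constant names are illustrative):

```python
def ratio_percent(numerator, denominator):
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


ONLINE_SERIALS = 136_965      # 'online serials' in ISSN Register, March 2014
INGESTED_BY_1_PLUS = 26_195   # titles ingested by one or more Keeper
INGESTED_BY_3_PLUS = 9_656    # titles ingested by 3+ Keepers

ingest_ratio = ratio_percent(INGESTED_BY_1_PLUS, ONLINE_SERIALS)
keepsafe_ratio = ratio_percent(INGESTED_BY_3_PLUS, ONLINE_SERIALS)

print(f"Ingest Ratio:   {ingest_ratio:.1f}%")
print(f"KeepSafe Ratio: {keepsafe_ratio:.1f}%")
```

Rounded to whole percentages, these agree with the 19% and 7% shown on the slide.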

Evidence on what libraries care about

Using Title List Comparison tool in Members Area of Keepers Registry. As reported in: P. Burnhill (2013) Tales from The Keepers Registry: Serial Issues About Archiving & the Web. Serials Review 39 (1), 3–20. http://www.sciencedirect.com/science/article/pii/S0098791313000178 & https://www.era.lib.ed.ac.uk/handle/1842/6682

In 2011/12 three major research libraries in the USA (Columbia, Cornell & Duke) checked archival status of serial titles regarded as important

‘Ingest Ratio’ = 22% to 28%, i.e. about a quarter

=> fate of c.75% is unknown

BIG publishers act early but incompletely

Priority: find economic way to archive content from … very many ‘at risk’ e-journals from many small publishers

Evidence based on what Researchers Use

… logs for the UK OpenURL Router*

•  8.5m full text requests in UK during 2012 => 53,311 online titles requested

Analysis in 2013:

‘Ingest Ratio’ = 32% (16,985/53,311)

=> over two thirds, 68% (36,326 titles), held by none!

* As reported in Keepers Registry Blog. OpenURL Router passes ‘discovery’ requests to commercial OpenURL resolver services; developed & delivered by EDINA as part of Jisc support for UK universities & colleges
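An analysis of this kind amounts to comparing a list of requested titles against what keepers report holding. The sketch below shows that comparison with Python sets on made-up identifiers; the function name, keeper names and sample data are assumptions for illustration, not the Router's or the Registry's actual tooling. The slide's figures follow the same arithmetic: 16,985 of 53,311 requested titles held, leaving 36,326 held by none.

```python
def compare_title_list(requested, keeper_holdings):
    """Split requested titles into those held by any keeper and those held by none."""
    requested = set(requested)
    archived = set().union(*keeper_holdings.values())
    held = requested & archived
    return held, requested - held


# Made-up identifiers standing in for titles seen in request logs.
requested_titles = ["1111-1111", "2222-2222", "3333-3333"]

# Made-up keeper holdings; a keeper may hold titles nobody requested.
holdings = {
    "Keeper A": {"1111-1111"},
    "Keeper B": {"1111-1111", "4444-4444"},
}

held, held_by_none = compare_title_list(requested_titles, holdings)
ingest_ratio = len(held) / len(requested_titles)
print(f"Ingest Ratio: {ingest_ratio:.0%}; held by none: {sorted(held_by_none)}")
```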


“I believe we've … a problem here.” [John Swigert, Jr.]

Another threat to the integrity of the record  

Language Technology Group  Funded by the Andrew W. Mellon Foundation

‘Reference Rot’: When what was referenced & cited ceases to say the same thing, or ‘has ceased to be’

http://www.snorgtees.com/this-parrot-has-ceased-to-be

Reference Rot = Link Rot + Content Drift

“when links to web resources no longer point to what they once did”


‘Link Rot’  

+ Content Drift: What is at end of URI has changed, or gone!

http://dl00.org 2000

http://dl00.org 2004

http://dl00.org 2005

http://dl00.org 2008

(a)  Dynamic content, as values on a webpage change over time

(b)  Static content, but very different (often unrelated) web pages
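The distinction between link rot and content drift can be sketched as a small classifier. This is a minimal illustration, assuming a fingerprint of the page was recorded when it was cited; the function names are hypothetical, and an exact hash comparison is a deliberately crude stand-in for the similarity measures a real study would need (it would flag every dynamic page as drifted).

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest standing in for 'what the page said'."""
    return hashlib.sha256(content).hexdigest()


def classify_reference(cited_fingerprint, current_content):
    """Classify a cited URI; current_content is None if it no longer resolves."""
    if current_content is None:
        return "link rot"
    if fingerprint(current_content) != cited_fingerprint:
        return "content drift"
    return "intact"


# Made-up page content recorded at citation time.
cited = fingerprint(b"<html>Conference home page</html>")

print(classify_reference(cited, None))
print(classify_reference(cited, b"<html>Unrelated page</html>"))
print(classify_reference(cited, b"<html>Conference home page</html>"))
```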

Hiberlink: Time Travel for The Scholarly Web

1.  Threat: Creating evidence on extent of ‘Reference Rot’

–  Main focus: references (& URIs) made in Journal Articles

•  "Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot"

–  PLOS One paper published on 26 December 2014.

•  Harvard Law Library & Perma.cc: reference rot in Supreme Court judgments

•  http://www.newyorker.com/magazine/2015/01/26/cobweb

–  Also looked at Reference Rot & the e-Thesis, ETD2014

2.  Remedy: Opportunities for productive intervention

–  Identify workflows: preparation, publication, ingest

–  Prototype tools to avoid or limit reference rot

–  Pro-active or ‘transactional’ archiving as remedy

•  Embedding such ‘solutions’ in existing tools & infrastructure

•  Propose/test new infrastructure for temporal referencing

–  supporting & using the Memento protocol
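For context on the last point: in the Memento protocol (RFC 7089), a client asks for the version of a resource closest to a given moment by sending an Accept-Datetime request header with an HTTP-style date. The sketch below only builds that header; the function name is illustrative, and no particular TimeGate service or HTTP client is assumed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Build an Accept-Datetime header for Memento datetime negotiation."""
    # The header carries an HTTP date in GMT, so normalise to UTC first.
    as_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


headers = accept_datetime_header(datetime(2014, 5, 19, tzinfo=timezone.utc))
print(headers)
```

The resulting dictionary can be passed as request headers to whichever HTTP library is in use.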


Picture credit: http://www.restfulliving.com/wp-content/uploads/2013/12/Time-1024x861.jpg


‘Infrastructure’ to Enable Remedy

•  Robust Link - re-factor the HTML link that is returned

a)  Take simple URI - to French National Library (say)

<a href="http://www.bnf.fr">
Link to the BNF
</a>

b)  Augment Link with a set of Datetime & location pairs

<a href="http://www.bnf.fr"
mset="2014-05-19,
http://archive.today/zdpAn 2014-05-15 memento">
Link to the BNF
</a>

http://robustlinks.mementoweb.org/
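Step (b) can be sketched as a small Python helper that decorates a plain link with the slide's mset attribute. The function name and its parameters are hypothetical, and the way the attribute value is serialised (a linking date, then comma-separated "URI date memento" entries) is an assumption read off the single example above; the slide does not define the syntax.

```python
from html import escape


def robust_link(href, text, link_date, snapshots):
    """Return an <a> element augmented with datetime & location pairs.

    snapshots is a list of (archived_uri, archived_date) tuples.
    """
    entries = [link_date]
    entries += [f"{uri} {archived} memento" for uri, archived in snapshots]
    mset = ", ".join(entries)
    return (
        f'<a href="{escape(href, quote=True)}" '
        f'mset="{escape(mset, quote=True)}">{escape(text)}</a>'
    )


link = robust_link(
    "http://www.bnf.fr",
    "Link to the BNF",
    "2014-05-19",
    [("http://archive.today/zdpAn", "2014-05-15")],
)
print(link)
```

Escaping the attribute values keeps the generated markup well-formed even when a URI or link text contains quotes or ampersands.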

Remedy for The Integrity of The Scholarly Record

Envisage the best opportunities for Intervention to make Remedy, to ‘flash-freeze’, either to avoid reference rot or to ‘stop the rot’.

3 basic workflows:

①  Study: Preparation -> (Review) -> Submission

②  Publication: Editorial -> (Revision) -> Acceptance -> Issue

③  Post-Publication: Deposit/Ingest -> Provide/Access -> Use

Identify the Actors involved in:

①  Composition: author/creator

②  Public Release: editor/referee/copy

③  Curation: librarian / repository manager / archivist

Two-step Remedy To Avoid Reference Rot

Hiberlink Plug-in: help authors & middle-folk do the right thing:

①  Triggers archiving of referenced web content when it is noted in:

–  Zotero - used by authors to manage references (https://www.zotero.org/)

–  Open Journal Systems (OJS) - used by OA publishers (https://pkp.sfu.ca/ojs/)

②  Returns Datetime URI for archived content that can be used in the citation
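The two steps can be pictured as a single hook that a reference manager or journal platform might call when a web reference is noted. This is only a sketch under stated assumptions: the function names, the citation format and the archive.example address are invented for illustration, and the archiving step is a stub rather than a call to any real archive or to the actual Hiberlink plug-in.

```python
from datetime import date


def cite_with_snapshot(url, archive, on=None):
    """Step 1: trigger archiving of url. Step 2: return a dated citation."""
    snapshot_uri = archive(url)                # step 1, supplied by the caller
    stamp = (on or date.today()).isoformat()   # datetime to pair with the URI
    return f"{url} [archived {stamp}: {snapshot_uri}]"


def stub_archive(url):
    """Stand-in for a real archiving service; returns a made-up snapshot URI."""
    return "https://archive.example/snapshot/1"


citation = cite_with_snapshot("http://www.bnf.fr", stub_archive, on=date(2014, 5, 19))
print(citation)
```

Passing the archiving step in as a function keeps the hook independent of which archive is used.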

Time’s Up!

thekeepers.org hiberlink.org

See also:

•  thekeepers.blogs.edina.ac.uk

•  safenet.blogs.edina.ac.uk/

HelpDesk: edina@ed.ac.uk
