Preserving The Integrity of The Scholarly Record. Peter Burnhill, EDINA @ University of Edinburgh. National Library of Scotland, George IV Bridge, 5.30pm, 16th February. Image: http://www.flickr.com/photos/shinez/5000985919/

Preserving the Integrity of the Scholarly Record


Page 1: Preserving the Integrity of the Scholarly Record

Preserving The Integrity of The Scholarly Record

http://www.flickr.com/photos/shinez/5000985919/

Peter Burnhill, EDINA @ University of Edinburgh

National Library of Scotland, George IV Bridge, 5.30pm, 16th February

Page 2: Preserving the Integrity of the Scholarly Record

Take Home Message: 1) Archive Streams of Issued Content; 2) Avoid Reference Rot

Page 3: Preserving the Integrity of the Scholarly Record

The Scholarly Record & Serials … [a focus on the digital]

‘The Scholarly Record’ has a fuzzy edge

‘e-journals’

Websites, Databases, Repositories

‘Book-length work’

Page 4: Preserving the Integrity of the Scholarly Record

Continuing Resources, inc. Serials

Page 5: Preserving the Integrity of the Scholarly Record

Issued in Parts (Serials)

Content changes over time (Integrating)

Page 6: Preserving the Integrity of the Scholarly Record

Other ‘resources needed for scholarship’

‘Gov Docs’

Page 7: Preserving the Integrity of the Scholarly Record

The following questions are implicit:

1. What exactly is the scholarly record?

• What of that now ‘issued on the Web’?

• And what if we limit focus to what could get an ISSN?

2. Whose responsibility is it to act as steward? Each research library; library consortia; national/state libraries/archives?

& is this a national, or a trans-national challenge?

Page 8: Preserving the Integrity of the Scholarly Record

An Article, once available in print on-shelf locally …

… is now online & accessed remotely,

‘anytime/anywhere’ => Improved Ease of Access

But what of Continuity of Access? Will it still be there tomorrow?

Page 9: Preserving the Integrity of the Scholarly Record

Libraries boast of ‘e-collections’, but maybe now they only have ‘e-connections’

Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/

=> a real & present danger to the integrity of what is published as the scholarly record

Page 10: Preserving the Integrity of the Scholarly Record


This is a global challenge: trans-national action

Share of the 132,806 ISSNs issued for e-serials, by country (December 2013):

US: 20% UK: 8.6%

Rest of World: 71%

Researchers (& libraries/publishers) in any one country are dependent upon content written and published as serials in countries other than their own.

Page 11: Preserving the Integrity of the Scholarly Record

So, who is offering digital shelving?

①  Web-scale not-for-profit archiving agencies:

②  National libraries …

③  Research libraries: consortia & specialist centres …

Ingesting content with archival intent …

National Science Library, Chinese Academy of Sciences


Page 12: Preserving the Integrity of the Scholarly Record

Many archiving organisations: a Good Thing

“Digital information is best preserved by replicating it at multiple archives run by autonomous organizations”

B. Cooper and H. Garcia-Molina (2002)

Some bad stuff will happen!
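A toy model can illustrate the Cooper & Garcia-Molina argument. Assume, purely for illustration, that each archive independently loses a given item with the same probability p over some period. The chance that every one of n archives loses it is then p to the power n. The function name and the value of p below are hypothetical and are not measured figures.

```python
def prob_all_copies_lost(p_loss: float, n_archives: int) -> float:
    """Chance that all n archives lose the same item.

    Toy model: each archive loses the item independently with
    probability p_loss. Independence is an assumption; archives that
    share funding, software or jurisdiction may fail together.
    """
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be between 0 and 1")
    if n_archives < 1:
        raise ValueError("n_archives must be at least 1")
    return p_loss ** n_archives


if __name__ == "__main__":
    p = 0.05  # hypothetical per-archive loss probability
    for n in (1, 2, 3):
        print(n, prob_all_copies_lost(p, n))
```

In this model each extra independent copy multiplies the risk by p. That is why the autonomy of the archives matters as much as the number of copies.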

Page 13: Preserving the Integrity of the Scholarly Record

A Project to Pilot an E-journal Preservation Registry Service

Need to know who is looking after what & how?    

Page 14: Preserving the Integrity of the Scholarly Record

[Diagram: the E-Journal Preservation Registry Service, built to user requirements. It combines (a) METADATA on extant e-serials from the ISSN Register with (b) METADATA on preservation action from Digital Preservation Agencies, using ISSN-L as the kernel field. Pilot: CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance]
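The registry idea in the diagram matches metadata about extant e-serials with metadata about preservation action through a shared ISSN-L key. At its simplest this is a join, sketched below. The record layouts, the function name and the sample ISSN-Ls are hypothetical placeholders and do not reflect the Keepers Registry's actual data model.

```python
from collections import defaultdict


def build_registry(issn_register, agency_reports):
    """Join (a) title metadata with (b) agency reports on ISSN-L.

    issn_register: dict mapping ISSN-L -> title (hypothetical layout)
    agency_reports: iterable of (agency_name, issn_l) pairs
    Returns a dict mapping ISSN-L -> {"title": ..., "keepers": [...]}.
    """
    keepers = defaultdict(set)
    for agency, issn_l in agency_reports:
        if issn_l in issn_register:  # ignore reports for unknown ISSN-Ls
            keepers[issn_l].add(agency)
    return {
        issn_l: {"title": title, "keepers": sorted(keepers.get(issn_l, ()))}
        for issn_l, title in issn_register.items()
    }


if __name__ == "__main__":
    register = {"1234-5678": "Journal A", "2345-6789": "Journal B"}
    reports = [("Agency X", "1234-5678"), ("Agency Y", "1234-5678")]
    registry = build_registry(register, reports)
    print(registry["1234-5678"]["keepers"])  # ['Agency X', 'Agency Y']
    print(registry["2345-6789"]["keepers"])  # []
```

Titles with an empty keeper list are the ones whose archival status is unknown to the registry.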


Page 15: Preserving the Integrity of the Scholarly Record


The Keepers Registry

"Tales from the Keepers Registry", Serials Review 39.1 (2013)

Page 16: Preserving the Integrity of the Scholarly Record

… to discover who is looking after what

thekeepers.org as Global Monitor

*New in 2014*

Library of Congress and Scholars Portal now reporting in

Page 17: Preserving the Integrity of the Scholarly Record

e-journals should be easy – right?

Some signs of Progress: the Keepers Registry recorded

In 2011, 16,558 titles ‘ingested & archived’ by at least 1 ‘keeper’

in 2013, 21,557; in 2014, 26,195; now 26,712

9,731 ‘ingested & archived’ by 3+

… more archiving & as more archives report into Registry!

Written & produced by Julie Brown, 1989

Page 18: Preserving the Integrity of the Scholarly Record

“Are we there yet?” … “Don’t think so”

‘Ingest Ratio’ = titles being ingested by one or more Keepers / ‘online serials’ in ISSN Register

= 26,195 / 136,965 [in March 2014]

=> 19% (for the other c.80% of online serials with an ISSN, we do not know whether they are being archived)

‘KeepSafe Ratio’ = titles being ingested by 3+ Keepers / ‘online serials’ in ISSN Register

= 9,656 / 136,965

=> 7%
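Both ratios are simple proportions. The short sketch below reproduces the March 2014 figures quoted above. The function and variable names are illustrative only.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Share of online serials in the ISSN Register meeting a criterion."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


ONLINE_SERIALS = 136_965  # 'online serials' in ISSN Register, March 2014
ingest_ratio = ratio(26_195, ONLINE_SERIALS)   # ingested by 1+ Keepers
keepsafe_ratio = ratio(9_656, ONLINE_SERIALS)  # ingested by 3+ Keepers

print(f"Ingest Ratio: {ingest_ratio:.0%}")      # Ingest Ratio: 19%
print(f"KeepSafe Ratio: {keepsafe_ratio:.0%}")  # KeepSafe Ratio: 7%
print(f"Status unknown: {1 - ingest_ratio:.0%}")  # Status unknown: 81%
```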

Page 19: Preserving the Integrity of the Scholarly Record

Evidence on what libraries care about

In 2011/12 three major research libraries in the USA (Columbia, Cornell & Duke) checked the archival status of serial titles they regarded as important, using the Title List Comparison tool in the Members Area of the Keepers Registry.

‘Ingest Ratio’ = 22% to 28%, i.e. about a quarter

=> fate of c.75% is unknown

As reported in: P. Burnhill (2013) Tales from The Keepers Registry: Serial Issues About Archiving & the Web. Serials Review 39 (1), 3–20. http://www.sciencedirect.com/science/article/pii/S0098791313000178 & https://www.era.lib.ed.ac.uk/handle/1842/6682
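The workings of the Title List Comparison tool are not described here. As a hedged illustration of the kind of calculation involved, the sketch below compares a library's list of important titles against the set of ISSNs reported as archived, after normalising ISSN formatting. The ISSNs are placeholders. Matching on a normalised ISSN string is an assumption about how such a comparison might work.

```python
def normalise_issn(issn: str) -> str:
    """Drop hyphens/spaces and upper-case a trailing 'x' check character."""
    return issn.replace("-", "").replace(" ", "").upper()


def compare_title_list(library_issns, archived_issns):
    """Return (ingest_ratio, unarchived_issns) for a library's title list."""
    wanted = {normalise_issn(i) for i in library_issns}
    archived = {normalise_issn(i) for i in archived_issns}
    if not wanted:
        raise ValueError("empty title list")
    unarchived = sorted(wanted - archived)
    ingest_ratio = (len(wanted) - len(unarchived)) / len(wanted)
    return ingest_ratio, unarchived


if __name__ == "__main__":
    library = ["1111-1111", "2222-2222", "3333-333x", "4444-4444"]
    archived = ["11111111", "3333-333X"]
    print(compare_title_list(library, archived))
    # (0.5, ['22222222', '44444444'])
```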

Page 20: Preserving the Integrity of the Scholarly Record

Priority: find an economic way to archive content from …

… very many ‘at risk’ e-journals from many small publishers

BIG publishers act early but incompletely

Page 21: Preserving the Integrity of the Scholarly Record

Evidence based on what Researchers Use

… logs for the UK OpenURL Router*

• 8.5m full-text requests in UK during 2012 => 53,311 online titles requested

Analysis in 2013:

‘Ingest Ratio’ = 32% (16,985/53,311)

=> over two thirds (68%, 36,326 titles) held by none!

* As reported in the Keepers Registry Blog. The OpenURL Router passes ‘discovery’ requests to commercial OpenURL resolver services; it is developed & delivered by EDINA as part of Jisc support for UK universities & colleges.
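Going from 8.5m requests to 53,311 titles is a de-duplication step, because many requests hit the same title. The minimal sketch below assumes, hypothetically, that each log entry has already been reduced to an identifier for the requested title. Real OpenURL Router log formats are not shown in the source, and the identifiers used here are placeholders.

```python
from collections import Counter


def summarise_requests(requested_titles, archived_titles):
    """Collapse requests to distinct titles and count those held by no Keeper."""
    per_title = Counter(requested_titles)  # number of requests per distinct title
    titles = set(per_title)
    held = titles & set(archived_titles)
    return {
        "requests": sum(per_title.values()),
        "titles": len(titles),
        "ingest_ratio": len(held) / len(titles) if titles else 0.0,
        "held_by_none": len(titles - held),
    }


if __name__ == "__main__":
    log = ["A", "A", "B", "C"]  # placeholder title identifiers
    summary = summarise_requests(log, archived_titles=["A"])
    print(summary["titles"], summary["held_by_none"])  # 3 2
    # Consistency check on the published figures: 53,311 - 16,985 = 36,326
    print(53_311 - 16_985)  # 36326
```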

Page 22: Preserving the Integrity of the Scholarly Record


“I believe we've … a problem here.” [John Swigert, Jr.]

Page 23: Preserving the Integrity of the Scholarly Record

Another threat to the integrity of the record  

Language Technology Group. Funded by the Andrew W. Mellon Foundation

‘Reference Rot’: when what was referenced & cited ceases to say the same thing, or ‘has ceased to be’

http://www.snorgtees.com/this-parrot-has-ceased-to-be

Reference Rot = Link Rot + Content Drift

“when links to web resources no longer point to what they once did”

Page 24: Preserving the Integrity of the Scholarly Record

Link Rot


Page 25: Preserving the Integrity of the Scholarly Record

+ Content Drift: What is at end of URI has changed, or gone!

[Screenshots of http://dl00.org in 2000, 2004, 2005 and 2008]

(a) Dynamic content: values on the webpage change over time

(b) Static content, but very different (often unrelated) web pages
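One way to make the Link Rot / Content Drift distinction operational is to compare what a URI returns now with a fingerprint of what it returned when it was cited. The hedged sketch below takes the HTTP status and page bytes as inputs and leaves out the fetching, so it runs offline. The three categories and the exact-hash comparison are simplifying assumptions. Exact hashing will also flag harmless changes, such as the dynamic values in case (a) above.

```python
import hashlib
from typing import Optional


def fingerprint(content: bytes) -> str:
    """SHA-256 digest of page content, recorded at citation time."""
    return hashlib.sha256(content).hexdigest()


def classify_reference(cited_digest: str, status: Optional[int],
                       current_content: Optional[bytes]) -> str:
    """Classify a cited URI as 'link rot', 'content drift' or 'intact'.

    status: final HTTP status after redirects, or None if the request
    failed outright (e.g. DNS error). Assumption: any 4xx/5xx is rot.
    """
    if status is None or status >= 400 or current_content is None:
        return "link rot"       # nothing usable at the end of the URI
    if fingerprint(current_content) != cited_digest:
        return "content drift"  # something is there, but not what was cited
    return "intact"
```

A server that answers "not found" with status 200 (a so-called soft 404) would show up here as content drift rather than link rot.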

Page 26: Preserving the Integrity of the Scholarly Record

Hiberlink: Time Travel for The Scholarly Web

1. Threat: Creating evidence on extent of ‘Reference Rot’

– Main focus: references (& URIs) made in Journal Articles

• "Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot"

– PLOS ONE paper published on 26 December 2014.

• Harvard Law Library & Perma.cc: reference rot in Supreme Court judgments

• http://www.newyorker.com/magazine/2015/01/26/cobweb

– Also looked at Reference Rot & the e-Thesis, ETD2014

2. Remedy: Opportunities for productive intervention

– Identify workflows: preparation, publication, ingest

– Prototype tools to avoid or limit reference rot

– Pro-active or ‘transactional’ archiving as remedy

• Embedding such ‘solutions’ in existing tools & infrastructure

• Propose/test new infrastructure for temporal referencing, supporting & using the Memento protocol
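The Memento protocol (RFC 7089) lets a client ask a TimeGate for the version of a resource from around a given datetime. The client sends that datetime in an Accept-Datetime header. The sketch below shows two client-side pieces: formatting that header, and picking the nearest memento from a known list of (datetime, URI) pairs. RFC 7089 leaves the actual selection rule to the TimeGate, so "nearest" is only one possible choice. The TimeGate base URL is an assumption for illustration. The request is built but never sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

TIMEGATE = "http://timetravel.mementoweb.org/timegate/"  # assumed endpoint


def timegate_request(uri: str, when: datetime) -> Request:
    """Build (but do not send) a Memento datetime-negotiation request."""
    header = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE + uri, headers={"Accept-Datetime": header})


def closest_memento(mementos, when: datetime):
    """Pick the (datetime, uri) pair nearest to `when` (one possible rule)."""
    return min(mementos, key=lambda m: abs(m[0] - when))


if __name__ == "__main__":
    cited = datetime(2014, 5, 15, tzinfo=timezone.utc)
    req = timegate_request("http://www.bnf.fr", cited)
    # urllib stores header names via str.capitalize(), hence the casing
    print(req.get_header("Accept-datetime"))  # Thu, 15 May 2014 00:00:00 GMT
```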

Page 27: Preserving the Integrity of the Scholarly Record

Peter Burnhill, EDINA @ University of Edinburgh

http://www.restfulliving.com/wp-content/uploads/2013/12/Time-1024x861.jpg

Preserving the integrity of the scholarly record

Page 28: Preserving the Integrity of the Scholarly Record

‘Infrastructure’ to Enable Remedy

•  Robust Link - re-factor the HTML link that is returned

a)  Take simple URI - to French National Library (say)

<a href="http://www.bnf.fr">Link to the BNF</a>

b)  Augment Link with a set of Datetime & location pairs

<a href="http://www.bnf.fr" mset="2014-05-19, http://archive.today/zdpAn 2014-05-15 memento">Link to the BNF</a>

http://robustlinks.mementoweb.org/
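A Robust Link keeps the original href and adds archival information alongside it. The slide's example packs everything into a single `mset` attribute. The sketch below instead writes separate `data-versionurl` and `data-versiondate` attributes, which is how the Robust Links specification at robustlinks.mementoweb.org later expressed the idea. Check the current specification for the exact attribute semantics before relying on them. Using 2014-05-19 from the slide as the datetime of linking is an interpretation.

```python
from html import escape


def robust_link(href: str, text: str, version_url: str, version_date: str) -> str:
    """Return an HTML anchor decorated with Robust Link-style attributes.

    href: the original (live-web) URI being cited
    version_url: URI of an archived snapshot (memento) of href
    version_date: datetime of linking, ISO 8601 (YYYY-MM-DD)
    """
    attrs = {
        "href": href,
        "data-versionurl": version_url,
        "data-versiondate": version_date,
    }
    rendered = " ".join(f'{k}="{escape(v, quote=True)}"' for k, v in attrs.items())
    return f"<a {rendered}>{escape(text)}</a>"


if __name__ == "__main__":
    print(robust_link("http://www.bnf.fr", "Link to the BNF",
                      "http://archive.today/zdpAn", "2014-05-19"))
    # <a href="http://www.bnf.fr" data-versionurl="http://archive.today/zdpAn" data-versiondate="2014-05-19">Link to the BNF</a>
```

If the live link later rots, a reader or a browser script can fall back to the `data-versionurl` snapshot.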

Page 29: Preserving the Integrity of the Scholarly Record

Remedy for The Integrity of The Scholarly Record

Envisage the best opportunities for Intervention to make Remedy, to ‘flash-freeze’, either to avoid reference rot or to ‘stop the rot’.

3 basic workflows:

① Study: Preparation -> (Review) -> Submission

② Publication: Editorial -> (Revision) -> Acceptance -> Issue

③ Post-Publication: Deposit/Ingest -> Provide/Access -> Use

Identify the Actors involved in:

① Composition: author/creator

② Public Release: editor/referee/copy

③ Curation: librarian / repository manager / archivist

Page 30: Preserving the Integrity of the Scholarly Record

Hiberlink Plug-in: help authors & middle-folk do the right thing:

①  Triggers archiving of referenced web content when it is noted in:

– Zotero - used by authors to manage references

https://www.zotero.org/

–  Open Journal System (OJS) - used by OA publishers

https://pkp.sfu.ca/ojs/

②  Returns Datetime URI for archived content that can be used in the citation

Two-step Remedy To Avoid Reference Rot
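The two steps can be sketched as a function that hands a URI to an archiving service and returns what a citation needs. The `archive` callable stands in for a real web-archiving service and is hypothetical. The stub below makes up a snapshot URI on a placeholder host with a fixed date, so the sketch runs offline. Nothing here reflects the actual Hiberlink plug-in code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Tuple


@dataclass(frozen=True)
class ArchivedReference:
    original_uri: str
    memento_uri: str
    archived_at: datetime


def cite_with_snapshot(uri: str,
                       archive: Callable[[str], Tuple[str, datetime]]) -> ArchivedReference:
    """Step 1: trigger archiving of `uri`; step 2: return a Datetime URI for the citation."""
    memento_uri, archived_at = archive(uri)
    return ArchivedReference(uri, memento_uri, archived_at)


def fake_archive(uri: str) -> Tuple[str, datetime]:
    """Hypothetical stand-in for an archiving service (no network access)."""
    when = datetime(2015, 1, 1, tzinfo=timezone.utc)  # fixed for reproducibility
    return f"https://archive.example.org/{when:%Y%m%d%H%M%S}/{uri}", when


if __name__ == "__main__":
    ref = cite_with_snapshot("http://www.bnf.fr", fake_archive)
    print(ref.memento_uri)  # https://archive.example.org/20150101000000/http://www.bnf.fr
```

Passing the archiver in as a parameter keeps the workflow independent of any one service. This matches the slide's aim of embedding the remedy in existing tools such as Zotero and OJS.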

Page 31: Preserving the Integrity of the Scholarly Record

Time’s Up!

thekeepers.org hiberlink.org

•  See also •  thekeepers.blogs.edina.ac.uk •  safenet.blogs.edina.ac.uk/

HelpDesk: [email protected]