To the Rescue of the Orphans of Scholarly Communication

Herbert Van de Sompel, Los Alamos National Laboratory, @hvdsomp, http://orcid.org/0000-0002-0715-6126
Michael L. Nelson, Old Dominion University, @phonedude_mln, http://orcid.org/0000-0003-3749-8116
Martin Klein, Los Alamos National Laboratory, @mart1nkle1n, http://orcid.org/0000-0003-0130-2097

CNI Spring 2017, Albuquerque, NM, 3 Apr 2017

The project is funded by the Andrew W. Mellon Foundation

Page 2: To the Rescue of the Orphans of Scholarly Communication

Outline

• Problem statement: scholarly objects are everywhere on the web, and are not systematically archived

• Project perspective: capturing objects using an institutional & web archiving paradigm

• Object capture flow:
  • Step 1: Discovering a researcher’s web identities
  • Step 2: Discovering artifacts per web identity
  • Step 3: Determining the web boundary per artifact
  • Step 4: Capturing resources in the artifacts’ web boundary

Page 3: To the Rescue of the Orphans of Scholarly Communication

Scholarship is Evolving

• The research process, not just its outcome, is becoming visible … on the web

• Massive extension of the scholarly record with an enormous variety of novel objects

• The objects are heterogeneous, dynamic, compound, inter-related and distributed across the web

• The objects are often hosted on common web platforms that are not dedicated to scholarship

Page 4: To the Rescue of the Orphans of Scholarly Communication

101 Innovations in Scholarly Communication

Bianca Kramer & Jeroen Bosman. 101 Innovations in Scholarly Communication
https://innoscholcomm.silk.co/

Page 5: To the Rescue of the Orphans of Scholarly Communication

The Evolving Scholarly Record

Brian Lavoie et al. (2014) The Evolving Scholarly Record
http://www.oclc.org/content/dam/research/publications/library/2014/oclcresearch-evolving-scholarly-record-2014-5-a4.pdf

Page 6: To the Rescue of the Orphans of Scholarly Communication

Web Platforms Record Scholarship

• Increasingly, common web platforms are used for scholarship
  • GitHub, Wikis, WordPress, etc.

• Many of these platforms have desirable characteristics
  • Versioning
  • Time stamping
  • Social embedding

• But these platforms record rather than archive

Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web
http://public.lanl.gov/herbertv/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf

Page 7: To the Rescue of the Orphans of Scholarly Communication

Recording is not Archiving

“GitHub reserves the right at any time and from time to time to modify or discontinue, temporarily or permanently, the Service (or any part thereof) with or without notice.”

GitHub Terms of Service
https://help.github.com/articles/github-terms-of-service/

Page 8: To the Rescue of the Orphans of Scholarly Communication

Recording is not Archiving

GitHub Terms of Service
http://help.github.com/articles/github-terms-of-service

https://opensource.googleblog.com/2015/03/farewell-to-google-code.html

Page 9: To the Rescue of the Orphans of Scholarly Communication

Recording versus Archiving

Recording                  Archiving
Short-term                 Longer-term
No guarantees provided     Attempt to provide guarantees
Write many/read many       Write once/read many
Scholarly process          Scholarly record

Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web
http://public.lanl.gov/herbertv/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf

Page 10: To the Rescue of the Orphans of Scholarly Communication

Meet Some New School Researchers

Ian Milligan Mark Matienzo

Page 11: To the Rescue of the Orphans of Scholarly Communication

Meet Some New School Researchers

Ian Milligan

https://ianmilligan.ca/
https://www.slideshare.net/IanMilligan1
https://github.com/ianmilligan1

Page 12: To the Rescue of the Orphans of Scholarly Communication

Meet Some New School Researchers

Mark Matienzo

http://matienzo.org/
https://www.slideshare.net/anarchivist/presentations
https://github.com/anarchivist
https://osf.io/tgr4k/
https://www.drupal.org/user/380762

Page 13: To the Rescue of the Orphans of Scholarly Communication

SlideShare Artifact: 0 Mementos

https://www.slideshare.net/IanMilligan1/resaw-geo-cities
http://timetravel.mementoweb.org/list/20140513211653/https://www.slideshare.net/IanMilligan1/resaw-geo-cities

Page 14: To the Rescue of the Orphans of Scholarly Communication

GitHub Artifact: 1 Memento

https://github.com/ianmilligan1/Historian-WARC-1
http://web.archive.org/web/*/https://github.com/ianmilligan1/Historian-WARC-1
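
One way memento counts like the ones above could be obtained programmatically is by parsing a TimeMap in link format (RFC 7089) and counting the entries whose relation type includes "memento". The sketch below is illustrative only: the sample TimeMap, its URIs, and the function name are made up, and it assumes quoted parameter values.

```python
import re

# Made-up TimeMap in link format; URIs and datetimes are illustrative only.
SAMPLE_TIMEMAP = '''<https://example.org/artifact>; rel="original",
<https://archive.example/timemap/link/https://example.org/artifact>; rel="self"; type="application/link-format",
<https://archive.example/timegate/https://example.org/artifact>; rel="timegate",
<https://archive.example/20130922192416/https://example.org/artifact>; rel="first last memento"; datetime="Sun, 22 Sep 2013 19:24:16 GMT"
'''

# One entry: <target> followed by zero or more ;name="value" parameters.
ENTRY_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def memento_uris(timemap_text):
    """Return the targets of all entries whose rel contains the token 'memento'."""
    uris = []
    for target, raw_params in ENTRY_RE.findall(timemap_text):
        params = dict(PARAM_RE.findall(raw_params))
        # rel may hold several tokens, e.g. "first last memento".
        if "memento" in params.get("rel", "").split():
            uris.append(target)
    return uris


if __name__ == "__main__":
    print(len(memento_uris(SAMPLE_TIMEMAP)), "memento(s) found")
```

An empty TimeMap yields an empty list, which corresponds to the "0 Mementos" case.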

Page 15: To the Rescue of the Orphans of Scholarly Communication

The Scholarly Orphans Project

• Funded by the Andrew W. Mellon Foundation
  • Los Alamos National Laboratory & New Mexico Consortium
  • Old Dominion University
  • 04/2016 - 03/2019

• How to capture scholarly orphans for long-term archiving?

• Project explores a paradigm inspired by web archiving
  • Scale of the problem
  • Bilateral agreements with most web portals unlikely

• Project explores an institution-driven paradigm
  • Institution should be interested in capturing the artifacts its scholars deposit on the web

Page 16: To the Rescue of the Orphans of Scholarly Communication

An Institutional & Web Archiving Perspective

Page 17: To the Rescue of the Orphans of Scholarly Communication

Related Work

• LOCKSS
  • Web crawling approach
  • Focused on journal literature

• Archive-It
  • On-demand, subscription-based web archiving
  • Not focused on scholarly orphans

• Institutional repository
  • Capture an institution’s output
  • Focused on manual upload (of journal literature)

• The Locker Project
  • Capture an individual’s web presence
  • Not focused on scholarly orphans

Page 18: To the Rescue of the Orphans of Scholarly Communication

Capture Flow

Page 19: To the Rescue of the Orphans of Scholarly Communication

Capture Flow – Step 1

Page 20: To the Rescue of the Orphans of Scholarly Communication

Algorithmic Discovery of Web Identities

James Powell et al. (2014) EgoSystem: Where are our alumni? In: code4lib
http://journal.code4lib.org/articles/9519

Page 21: To the Rescue of the Orphans of Scholarly Communication

Discovery of Web Identities via a Registry: ORCID

Martin Klein and Herbert Van de Sompel (2017) Discovering scholarly orphans using ORCID. In: JCDL 2017
https://arxiv.org/abs/1703.09343

Ian Milligan
http://orcid.org/0000-0002-1470-7723

Mark Matienzo
http://orcid.org/0000-0003-3270-1306

Page 22: To the Rescue of the Orphans of Scholarly Communication

Ian Milligan’s ORCID

• Web Identities: 0

http://orcid.org/0000-0002-1470-7723

Page 23: To the Rescue of the Orphans of Scholarly Communication

Mark Matienzo’s ORCID

• Web Identities: 3 (homepage, ScopusID, ResearcherID)

http://orcid.org/0000-0003-3270-1306

Page 24: To the Rescue of the Orphans of Scholarly Communication

Mark Matienzo’s Home Page

• URI to GitHub repository, Twitter

• Could be included in ORCID profile

http://matienzo.org/

Page 25: To the Rescue of the Orphans of Scholarly Communication

Discovery of Web Identities via a Registry: ORCID

• Evaluation of ORCID for automatic discovery of Web Identities

• How well does ORCID represent the global community of active researchers?
  • Adoption rate
  • Subject coverage
  • Geo-location coverage

• How well does ORCID score when it comes to listing Web Identities?

Page 26: To the Rescue of the Orphans of Scholarly Communication

ORCID - Adoption Rate

Page 27: To the Rescue of the Orphans of Scholarly Communication

ORCID - Subject Coverage

Page 28: To the Rescue of the Orphans of Scholarly Communication

ORCID - Geo-Location Coverage

Page 29: To the Rescue of the Orphans of Scholarly Communication

ORCID - Geo-Location Coverage

Page 30: To the Rescue of the Orphans of Scholarly Communication

ORCID - Geo-Location Coverage

Page 31: To the Rescue of the Orphans of Scholarly Communication

ORCID - Web Identities

Page 32: To the Rescue of the Orphans of Scholarly Communication

ORCID - Web Identities

Page 33: To the Rescue of the Orphans of Scholarly Communication

Discovery of Web Identities via a Registry: ORCID

• Adoption rate is increasing

• Subject coverage is focused, does not cover all disciplines equally

• Geo-Location coverage is good but not quite representative

• Web Identity coverage is poor; not usable for our purpose in its current state

Page 34: To the Rescue of the Orphans of Scholarly Communication

Capture Flow – Step 2

Page 35: To the Rescue of the Orphans of Scholarly Communication

Discovery of Artifacts per Web Identity

• Algorithmic approach

• Scrape artifacts from pages

http://matienzo.org/publications/
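
The scraping approach above could be sketched roughly as follows: collect the links on a researcher's page and keep those that point at platforms of interest. This is a minimal sketch, not the project's implementation; the host list, the sample page, and all names are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical set of platform hosts; a real deployment would maintain its own list.
PLATFORM_HOSTS = {"github.com", "www.slideshare.net", "osf.io"}

# Made-up page fragment standing in for a researcher's publications page.
SAMPLE_PAGE = '''<ul>
<li><a href="https://github.com/example-user/example-repo">Code</a></li>
<li><a href="/about/">About</a></li>
<li><a href="https://www.slideshare.net/example-user/example-talk">Slides</a></li>
</ul>'''


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def candidate_artifacts(html, base_url):
    """Return links on the page whose host is one of the known platforms."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [u for u in collector.links if urlparse(u).hostname in PLATFORM_HOSTS]


if __name__ == "__main__":
    for uri in candidate_artifacts(SAMPLE_PAGE, "http://example.org/publications/"):
        print(uri)
```

Filtering by host is a deliberately crude choice here; it finds candidate artifact URIs but says nothing about whether they belong to the researcher.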

Page 36: To the Rescue of the Orphans of Scholarly Communication

Discovery of Artifacts per Web Identity

• Notifications

• Subscribe to portal notifications about a researcher’s new artifacts

https://www.slideshare.net/anarchivist/presentations

Page 37: To the Rescue of the Orphans of Scholarly Communication

Discovery of Artifacts per Web Identity

• Artifact Registry

• 5 artifacts of interest (standards document, reports, book reviews)

http://orcid.org/0000-0003-3270-1306

Page 38: To the Rescue of the Orphans of Scholarly Communication

Capture Flow – Step 3

Page 39: To the Rescue of the Orphans of Scholarly Communication

Determination of Web Boundary per Artifact

http://signposting.org

Page 40: To the Rescue of the Orphans of Scholarly Communication

HTTP Links

Mark Nottingham (2010) RFC 5988: Web Linking. http://tools.ietf.org/rfc/rfc5988.txt

Page 41: To the Rescue of the Orphans of Scholarly Communication

HTTP Links

Page 42: To the Rescue of the Orphans of Scholarly Communication

HTTP Links

Page 43: To the Rescue of the Orphans of Scholarly Communication

Signposting - Publication Boundary Pattern

http://signposting.org/publication_boundary/oxford/
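
As a rough illustration of how a crawler might use such links, the sketch below parses an HTTP Link header value and collects the targets that carry a given relation type. It assumes that a landing page advertises the resources in its boundary with rel="item", which is my reading of the pattern rather than something stated on the slide. The header value, URIs, and function names are hypothetical, and the parser is a simplification of the Web Linking syntax.

```python
import re

# Made-up Link header value as it might be returned for a landing page.
SAMPLE_LINK = (
    '<http://example.org/article/1.pdf>; rel="item"; type="application/pdf", '
    '<http://example.org/article/1/data.csv>; rel="item"; type="text/csv", '
    '<http://example.org/article/1.bib>; rel="describedby"'
)

# <target> followed by ;name=value parameters (values quoted or bare).
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+=(?:"[^"]*"|[^\s;,]+))*)')
PARAM_RE = re.compile(r'([\w-]+)=(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Return a list of (target, {param: value}) pairs."""
    links = []
    for target, raw_params in LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


def targets_with_rel(value, rel):
    """Return targets whose rel parameter contains the given relation type."""
    return [
        target
        for target, params in parse_link_header(value)
        if rel in params.get("rel", "").split()
    ]


if __name__ == "__main__":
    print("boundary:", targets_with_rel(SAMPLE_LINK, "item"))
    print("metadata:", targets_with_rel(SAMPLE_LINK, "describedby"))
```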

Page 44: To the Rescue of the Orphans of Scholarly Communication

Signposting - Bibliographic Metadata Pattern

http://signposting.org/bibliographic_metadata/springer/

Page 45: To the Rescue of the Orphans of Scholarly Communication

Capture Flow – Step 4

Page 46: To the Rescue of the Orphans of Scholarly Communication

Challenges Regarding Capturing Web Artifacts

• Legal
  • robots.txt
  • Licenses

• Technical
  • Capture tools
  • Capture quality
  • Capture authenticity

Page 47: To the Rescue of the Orphans of Scholarly Communication

Legal Challenges re Capturing Artifacts – A Wake-Up Call

SlideShare
• robots.txt unclear, some pages disallowed
• License seems to prohibit archiving

GitHub
• robots.txt unclear, some pages disallowed
• License seems to allow archiving

Drupal
• robots.txt allows relevant URIs
• License seems to prohibit archiving

Open Science Framework
• robots.txt does not disallow crawlers
• License does not mention archiving, individual content may have specific license
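
The robots.txt part of this check can be automated with Python's standard `urllib.robotparser`. The sketch below is only an illustration: the robots.txt content, the user-agent string, and the URLs are made up and do not reflect any of the platforms listed above. License terms still need to be read by a person.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; real files are fetched from <site>/robots.txt.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /search
Allow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "example-capture-bot"  # hypothetical crawler name
    for url in ("https://example.org/user/talk", "https://example.org/search"):
        print(url, "->", "allowed" if allowed(SAMPLE_ROBOTS, agent, url) else "disallowed")
```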

Page 48: To the Rescue of the Orphans of Scholarly Communication

Capture Tools Challenges: Mark’s SlideShare

Live

Internet Archive

Webrecorder.io

https://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name
http://web.archive.org/web/20161229053246/http://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name
https://webrecorder.io/martinklein/cni_test/20170330014029/https://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name

Page 49: To the Rescue of the Orphans of Scholarly Communication

Capture Tools Challenges: Mark’s GitHub

Live

Internet Archive

Webrecorder.io

https://github.com/rightsstatements/rightsstatements.github.io
https://web.archive.org/web/20170328040646/https://github.com/rightsstatements/rightsstatements.github.io
https://webrecorder.io/martinklein/cni_test/20170330014135/https://github.com/rightsstatements/rightsstatements.github.io

Page 50: To the Rescue of the Orphans of Scholarly Communication

Capture Tools Challenges: Mark’s OSF

Live

Internet Archive

Webrecorder.io

https://osf.io/h4ru8/wiki/home/
http://web.archive.org/web/20170328042647/https://osf.io/h4ru8/wiki/home/
https://webrecorder.io/martinklein/cni_test/20170330014219/https://osf.io/h4ru8/wiki/home/

Page 51: To the Rescue of the Orphans of Scholarly Communication

Capture Quality - How well was this page archived?

• Continuing research on memento damage, first published at JCDL 2014

• Premise: simply reporting “9/10 embedded images were archived” is insufficient to describe how well the archive / replay system performed

• Use heuristics from Mechanical Turk testing to approximate human conception of damage, e.g.:
  o increase weight of missing images that are large, or centered in the viewport
  o stylesheets can be important! check for “ugly” results

J.F. Brunelle, M. Kelly, H. SalahEldeen, M. C. Weigle, and M. L. Nelson (2014) Not all mementos are created equal: Measuring the impact of missing resources. In: JCDL 2014
http://dx.doi.org/10.1109/JCDL.2014.6970187
http://dx.doi.org/10.1007/s00799-015-0150-6
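
To make the premise concrete, the sketch below contrasts a naive "fraction missing" score with a weighted one in which large or centered images count for more. The weighting scheme, the bonus factor, and the sample data are all hypothetical; this is not the published damage algorithm, only an illustration of why a plain count can understate damage.

```python
from dataclasses import dataclass


@dataclass
class EmbeddedImage:
    area_fraction: float  # share of the viewport the image covers, between 0 and 1
    centered: bool        # whether the image sits in the center of the viewport
    archived: bool        # whether the archive holds a copy


def image_weight(image, center_bonus=2.0):
    """Hypothetical weight: larger images count more, centered ones more still."""
    return image.area_fraction * (center_bonus if image.centered else 1.0)


def naive_damage(images):
    """Fraction of images that are missing, ignoring their importance."""
    if not images:
        return 0.0
    return sum(not image.archived for image in images) / len(images)


def weighted_damage(images):
    """Weight of the missing images divided by the weight of all images."""
    total = sum(image_weight(image) for image in images)
    if total == 0:
        return 0.0
    missing = sum(image_weight(image) for image in images if not image.archived)
    return missing / total


if __name__ == "__main__":
    # Nine small archived icons plus one large, centered image that is missing.
    page = [EmbeddedImage(0.01, False, True) for _ in range(9)]
    page.append(EmbeddedImage(0.40, True, False))
    print(f"naive damage:    {naive_damage(page):.2f}")
    print(f"weighted damage: {weighted_damage(page):.2f}")
```

In this made-up page, 9 of 10 images are archived, yet the weighted score is far higher than the naive one because the single missing image dominates the viewport.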

Page 52: To the Rescue of the Orphans of Scholarly Communication

Triptych CSS

“Regular” web pages have nearly equal distribution of content over each third of a page.

If a CSS is missing AND > 75% of the non-background color is in the left 2/3s of the page, then users consider this damaged.
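
The rule above can be expressed as a small check over a rendered page. The sketch assumes the page has already been reduced to a list giving, for each pixel column, the number of non-background pixels; how that list is produced (screenshot, background detection) is left out, and the function names are made up.

```python
def left_two_thirds_share(column_counts):
    """Share of non-background pixels that fall in the left two thirds of the page.

    column_counts[i] is the number of non-background pixels in pixel column i.
    """
    total = sum(column_counts)
    if total == 0:
        return 0.0
    cutoff = (2 * len(column_counts)) // 3
    return sum(column_counts[:cutoff]) / total


def looks_damaged(css_missing, column_counts, threshold=0.75):
    """Apply the rule: a stylesheet is missing AND content is skewed to the left."""
    return css_missing and left_two_thirds_share(column_counts) > threshold


if __name__ == "__main__":
    even_page = [10] * 30                # content spread evenly over all columns
    skewed_page = [10] * 20 + [1] * 10   # content piled up on the left
    print(looks_damaged(True, even_page))
    print(looks_damaged(True, skewed_page))
```

An evenly filled page has about two thirds of its content in the left two thirds, which stays below the 75% threshold, so a missing stylesheet alone does not trigger the rule.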

Page 53: To the Rescue of the Orphans of Scholarly Communication

A Memento Damage Service, Python Library, and Docker Image

Erika Siregar
http://memento-damage.cs.odu.edu/

https://github.com/erikaris/web-memento-damage

Page 54: To the Rescue of the Orphans of Scholarly Communication

Just a Little Bit of Damage…

Page 55: To the Rescue of the Orphans of Scholarly Communication

Moderate Damage…

Page 56: To the Rescue of the Orphans of Scholarly Communication

Significant Damage…

Page 57: To the Rescue of the Orphans of Scholarly Communication

Ian’s GitHub Memento…

https://github.com/ianmilligan1/Historian-WARC-1
http://web.archive.org/web/20130922192416/https://github.com/ianmilligan1/Historian-WARC-1

Page 58: To the Rescue of the Orphans of Scholarly Communication

… Has Slight Damage

Does not appear to violate the “75% / left-2/3s” rule

Page 59: To the Rescue of the Orphans of Scholarly Communication

Capture Authenticity - Has this page been tampered with?

• The days of implicitly trusting Brewster & IA are over
  o the people who brought you fake news will eventually bring you fake archives
  o “mo’ archives mo’ problems”

• Premise: use multiple, independent archives to record fixity information from dated observations of mementos

• Plans:
  o blockchain
  o provenance (i.e., a memento of a memento != 2 independent mementos)

https://climate.nasa.gov/vital-signs/carbon-dioxide/
http://web.archive.org/web/20170312201332/https://climate.nasa.gov/vital-signs/carbon-dioxide/
http://localhost:8282/michael/wayback/20170313023607/https://climate.nasa.gov/vital-signs/carbon-dioxide/

Page 60: To the Rescue of the Orphans of Scholarly Communication

Push a Web Page into Multiple Archives

Mohamed Aturban (2017) Archive Now (archivenow): A Python Library to Integrate On-Demand Archives http://ws-dl.blogspot.com/2017/02/2017-02-22-archive-now-archivenow.html

Page 61: To the Rescue of the Orphans of Scholarly Communication

Record Fixity in a Manifest File

Shawn Jones (2016) Mementos In the Raw, Take Two
http://ws-dl.blogspot.com/2016/08/2016-08-15-mementos-in-raw-take-two.html
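
A fixity manifest could look roughly like the sketch below: one digest per observed memento, serialized as JSON. The field names, the choice of SHA-256, and the sample values are assumptions made for illustration; they are not the manifest format used by the project or described in the cited post.

```python
import hashlib
import json


def fixity_entry(uri_m, body, observed_at):
    """Describe one dated observation of a memento (hypothetical field names)."""
    return {
        "uri_m": uri_m,
        "observed_at": observed_at,
        "algorithm": "sha256",
        "digest": hashlib.sha256(body).hexdigest(),
    }


def build_manifest(uri_r, entries):
    """Serialize the observations for one original resource as JSON text."""
    return json.dumps({"uri_r": uri_r, "mementos": entries}, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Made-up URIs and content standing in for a fetched memento.
    entry = fixity_entry(
        "https://archive.example/20170312201332/https://example.org/page",
        b"<html>captured content</html>",
        "2017-03-13T02:36:07Z",
    )
    print(build_manifest("https://example.org/page", [entry]))
```

A real manifest would also need to say which bytes were hashed (for example, raw content versus content rewritten by the archive), since that choice determines whether the calculation can be repeated later.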

Page 62: To the Rescue of the Orphans of Scholarly Communication

Publish Manifest to the Web

Page 63: To the Rescue of the Orphans of Scholarly Communication

Archive the Manifest

Page 64: To the Rescue of the Orphans of Scholarly Communication

“You can’t tell the players without a scorecard” – Harry M. Stevens

Page 65: To the Rescue of the Orphans of Scholarly Communication

Verifying the Authenticity of a Memento

• Given a Memento, URI-M, that we wish to verify
• Look up the URI-M at a manifest server
  o e.g., captureproject.org/{URI-M}
• Discover all the mementos of the manifest, and verify their integrity with “trusty URIs”
• For each URI-M listed in the manifest, repeat the fixity calculation as described in the manifest
• Vote if fixity matches (not tampered with) or if fixity doesn’t match (tampered with)
  o Majority vote wins (assuming independent archives)
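
The voting step above can be sketched as follows, under simplifying assumptions: the manifest is a list of entries with a recorded SHA-256 digest per URI-M, and `fetch` is a caller-supplied function that returns the current bytes for a URI-M. The entry fields, the function names, and the tie handling are hypothetical; manifest lookup and trusty-URI verification are not covered.

```python
import hashlib


def verify(manifest_entries, fetch):
    """Recompute fixity for every listed URI-M and let the majority decide.

    manifest_entries: list of dicts with "uri_m" and "digest" (hex SHA-256).
    fetch: callable taking a URI-M and returning the bytes currently served.
    """
    votes = [
        hashlib.sha256(fetch(entry["uri_m"])).hexdigest() == entry["digest"]
        for entry in manifest_entries
    ]
    matches = sum(votes)
    mismatches = len(votes) - matches
    if matches > mismatches:
        return "not tampered with"
    if mismatches > matches:
        return "tampered with"
    return "undecided"  # tie or empty manifest; the slide does not say what happens here


if __name__ == "__main__":
    original = b"<html>captured content</html>"
    digest = hashlib.sha256(original).hexdigest()
    # Three made-up archives; the third now serves altered content.
    served = {
        "https://archive-a.example/m1": original,
        "https://archive-b.example/m1": original,
        "https://archive-c.example/m1": b"<html>altered content</html>",
    }
    entries = [{"uri_m": uri, "digest": digest} for uri in served]
    print(verify(entries, served.__getitem__))
```

The result is only as meaningful as the independence of the archives: if one archive's copy was itself captured from another archive, their votes are not independent.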

Mohamed Aturban (2017) Summary of "Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data” http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html

Video at https://www.youtube.com/watch?v=EY15lj-7_lc

Page 66: To the Rescue of the Orphans of Scholarly Communication

Discussion
