DESCRIPTION
Keynote presented at IDCC13, Amsterdam, The Netherlands, January 16 2013.
Wanderer above the Sea of Fog – Caspar David Friedrich (1818) http://en.wikipedia.org/wiki/Wanderer_above_the_Sea_of_Fog
@hvdsomp #idcc13
Herbert Van de Sompel
IDCC 2013, Amsterdam, The Netherlands, January 16 2013
The Scholarly Record is Changing
• The scholarly record is extending with a wide range of non-traditional assets emerging from eScience and eHumanities
  • e.g. datasets, software, ontologies, workflows, online debate, slides, blogs, videos, etc.
• Many of these non-traditional assets:
  • Have a wide range of relationships with and dependencies on other assets – grouping assets
  • Are becoming increasingly dynamic, and do not have the sense of fixity that traditional assets such as journal articles or books have – versioning assets
grouping assets
versioning assets
discovering assets
1999
• OAI was a heroic effort to fundamentally transform scholarly communication
  • By promoting communication via preprints, non-peer-reviewed papers
• The OAI took a technical approach to achieve the goal
  • Make preprints easier to discover, access – Protocol for Metadata Harvesting
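As a rough illustration of what a Protocol for Metadata Harvesting request looks like, the sketch below builds a GetRecord request URL from a repository baseURL. The `verb`, `identifier` and `metadataPrefix` arguments come from OAI-PMH itself; the endpoint, the record identifier and the helper name `oai_pmh_request` are hypothetical and only serve the example.

```python
from urllib.parse import urlencode


def oai_pmh_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL: baseURL, then the verb and its arguments as a query string."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


# Hypothetical repository endpoint and record identifier (assumptions, not from the slides).
BASE_URL = "http://repository.example.org/oai"
url = oai_pmh_request(
    BASE_URL,
    "GetRecord",
    identifier="oai:repository.example.org:1234",
    metadataPrefix="oai_dc",
)
print(url)
```

Note that the record identifier travels as a query argument to the baseURL rather than being a URI that is dereferenced directly.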
HTTP GET on record identifier
An HTTP link
Don’t trust HTTP
Just another HTTP baseURL
grouping assets
versioning assets
2007
• OAI-ORE observation: Scholarly assets are rapidly becoming compound, consisting of multiple resources with various:
  • Relationships
  • Interdependencies
• How to convey this compound-ness in an interoperable manner so that applications can access, consume such assets?
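One way to picture the OAI-ORE answer is as a small set of RDF statements: a Resource Map describes an Aggregation, and the Aggregation lists the resources it aggregates. The sketch below emits those statements as N-Triples using the ORE vocabulary terms `ore:ResourceMap`, `ore:Aggregation`, `ore:describes` and `ore:aggregates`. All example.org URIs and both helper functions are hypothetical.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map_triples(rem_uri, aggregation_uri, aggregated_uris):
    """Return (subject, predicate, object) URI triples for a minimal Resource Map."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    for uri in aggregated_uris:
        triples.append((aggregation_uri, ORE + "aggregates", uri))
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical compound asset: a paper, its dataset and its slides.
triples = resource_map_triples(
    "http://example.org/rem/1",
    "http://example.org/aggregation/1",
    [
        "http://example.org/paper.pdf",
        "http://example.org/dataset.csv",
        "http://example.org/slides.ppt",
    ],
)
print(to_ntriples(triples))
```

An application that understands these four terms can discover the boundary of the compound asset without knowing anything else about the repository that published it.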
See e.g. http://www.ctwatch.org/quarterly/articles/2007/08/interoperability-for-the-discovery-use-and-re-use-of-units-of-scholarly-communication/8/index.html
grouping assets
versioning assets
2009
• Memento is about the Web and time:
  • Resources evolve over time
  • Only the current representation is available from a resource’s URI
  • How to seamlessly access prior representations, if they exist?
• Memento looks at this problem for the Web, in general
Digital Preservation Award 2010
• Memento has potential consequences for scholarly communication
• Observation: Scholarly assets are becoming increasingly dynamic, and do not have the sense of fixity that traditional assets such as journal articles or books have
  • Even traditional assets are becoming increasingly dynamic and dependent on other assets, which may themselves be dynamic
2009
Scientific Workflows, Services, Data, Workflow Engines
Carole Goble, JCDL 2012 Keynote https://dl.dropbox.com/u/617206/JCDL2012keynoteGoble.ppt
From The Version of Record to A Version of the Record
• The ever-evolving nature of some assets challenges the notion of fixity as “forever frozen” and calls for considering the notion of the “state of the scholarly record at a specific moment in time”
• It will become essential to be able to determine what the state of related and interdependent assets was at certain moments in time
Two Perspectives on Memento
URI-M - http://web.archive.org/web/20010911203610/http://www.cnn.com/
Web Archive
URI-R - http://www.cnn.com/
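In the Web archive example above, the URI-M visibly embeds both a 14-digit datetime and the URI-R. The sketch below splits such a URI-M into those two parts. It relies on the `/web/<timestamp>/<original URI>` layout seen in this one example, which is an assumption about that archive's URI scheme rather than something Memento requires. The function name is hypothetical.

```python
from datetime import datetime, timezone


def split_wayback_uri_m(uri_m):
    """Split a Wayback-style URI-M into (archival datetime in UTC, URI-R).

    Raises ValueError if the URI does not follow the assumed layout.
    """
    _, _, rest = uri_m.partition("/web/")
    stamp, _, uri_r = rest.partition("/")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return when, uri_r


when, uri_r = split_wayback_uri_m(
    "http://web.archive.org/web/20010911203610/http://www.cnn.com/"
)
print(when.isoformat(), uri_r)
```

The CMS example on the next slide shows why this cannot be the general mechanism: a wiki revision URI carries an opaque `oldid` instead of a datetime, so the relation between URI-R and URI-M has to be expressed by the protocol rather than inferred from the URI.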
Two Perspectives on Memento
URI-M - http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333
CMS
URI-R - http://en.wikipedia.org/wiki/September_11_attacks
• How to get to the time-specific resources from the generic resource?
• Memento addresses the problem in a resource-centric way:
  • Resource, URI, state, representation, link, content negotiation
2009
[Screenshot: date selection (Today, Sep 12 2010, Sep 16 2010), version served from BL Archive]
Access Versions via the original URI and datetime
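Accessing a version "via the original URI and datetime" is done in Memento through content negotiation in the datetime dimension: the client states the desired datetime in an `Accept-Datetime` request header. The sketch below only prepares such a request and does not send it. The target URI, the date (borrowed from the screenshot) and the helper name are illustrative assumptions.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def memento_request(uri, when):
    """Prepare (without sending) a request for the state of `uri` at `when`.

    `when` must be a timezone-aware datetime; it is converted to GMT and
    formatted as an HTTP date for the Accept-Datetime header.
    """
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(uri, headers={"Accept-Datetime": http_date})


# Hypothetical target: the original resource's URI, asked for as of Sep 12 2010.
req = memento_request(
    "http://www.example.org/", datetime(2010, 9, 12, tzinfo=timezone.utc)
)
# urllib normalizes the stored header name to "Accept-datetime";
# HTTP header names are case-insensitive, so this does not matter on the wire.
print(req.get_header("Accept-datetime"))
```

A server or archive that supports the protocol can then respond with, or redirect to, the version whose archival datetime is closest to the requested one.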
Recreating a Version of the Record
• Is it possible to reconstruct the Web-based scholarly record as it was at a certain point in time?
• Consider a special case: Given a paper, can one see the referenced materials as they were at the time of publication of the paper?
  • ti: Time of publication
  • Relationship: Cited resources
Published September 15 2004
Domain Gone
Archived copy December 5 2003
Current version
Archived copy December 11 2004
Resource gone
Archived copy December 5 2003
Resource gone
Archived copy unavailable
Pilot Study at Scale with Memento
• Papers from arXiv: 400,000 papers => 144,000 unique URIs
• Papers from UNT ETD repository: 3,600 papers => 18,000 URIs
• Referenced URIs of established scholarly repositories removed (e.g. http://dx.doi.org), i.e. focusing on the periphery of the scholarly record
• Study looks into:
  • Does the referenced resource still exist?
  • Are there archived versions of the referenced resource?
    • From around the time of publication of the citing paper?
• Study does not look into dynamic aspects:
  • If the referenced resource still exists, is its content the same as at ti?
  • Does an archived version have the same content as at ti?
Sanderson, R., Phillips, M., and Van de Sompel, H. (2011) Analyzing the Persistence of Referenced Web Resources with Memento. Open Repositories 2011; arXiv preprint arXiv:1105.3459; http://arxiv.org/abs/1105.3459
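The question "is there an archived version from around the time of publication?" can be sketched as a nearest-datetime lookup. This is not the study's actual method. The width of "around" (the `window` parameter) is an assumption chosen purely for illustration. The dates are borrowed from the example slide captions, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def closest_memento(memento_datetimes, t_i, window=timedelta(days=30)):
    """Return the archival datetime closest to t_i, or None if none falls within `window`."""
    if not memento_datetimes:
        return None
    best = min(memento_datetimes, key=lambda d: abs(d - t_i))
    return best if abs(best - t_i) <= window else None


t_i = datetime(2004, 9, 15)  # publication date from the example paper
archived = [datetime(2003, 12, 5), datetime(2004, 12, 11)]  # illustrative archival dates

print(closest_memento(archived, t_i))                              # 30-day window
print(closest_memento(archived, t_i, window=timedelta(days=90)))   # 90-day window
```

With these illustrative dates, the nearest copy is 87 days after publication, so the answer flips from "no archived version around ti" to "yes" depending on the window. Any count of "archived around ti" therefore depends on how the window is defined.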
UNT
arXiv
The Good News ™
• Despite there not being a pro-active effort to archive those resources, a considerable number were archived
o Because they had HTTP URIs and hence were archived as part of ongoing web archiving processes
o In The Wild archiving comes for free with the web infrastructure
• 404 resources exist in web archives and Memento can access them via their original HTTP URI
o Does that make an HTTP URI a PID?
The Bad News ™
• Many resources were not archived
• For many resources there were no archival versions around ti
Automatic Creation of Archival Snapshots
• There is a need for a more pro-active approach to archive dynamic, interdependent assets, e.g.:
o Web Archives as infrastructure
o Use CMS, wikis, datawikis with solid versioning mechanisms
o Archiving linked context at the time of publication
o Archive at the moment of use (social interaction, downloading, annotating, etc.)
o Delineate which resources are considered in/out of a scholarly asset (OAI-ORE) to understand what needs archiving
discovering assets
2012
• ResourceSync is about allowing 3rd party systems and applications to remain synchronized with a server’s evolving resources.
• Many use cases:
  • Mirroring repository content
  • Aggregating content
  • Replicating datasets
  • Exposing content to archives
  • Keeping linked data applications that leverage remote data up-to-date
• Differing needs regarding:
  • Coverage
  • Accuracy
  • Latency
ResourceSync Approach
• Resource centric; it’s all about the URI (again)
• Introduces a set of modular capabilities that a server can implement to allow 3rd parties to remain in sync with its resources. Recurrently publish:
o Resource Lists
o Change Lists
o Resource Dumps
o Change Dumps
• All capabilities based on the Sitemap document formats and extensions thereof
o Existing Sitemaps are off-the-shelf compliant
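Since the capabilities build on the Sitemap format, a Resource List can be pictured as an ordinary Sitemap. The sketch below publishes one and lets a third party work out which resources changed since its last synchronization. It uses only standard Sitemap elements (`urlset`, `url`, `loc`, `lastmod`); the ResourceSync-specific extensions are deliberately left out because the slides do not show them. The URIs, the timestamps and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_resource_list(resources):
    """Serialize (uri, lastmod) pairs as a plain Sitemap document (source side)."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for uri, lastmod in resources:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


def changed_since(sitemap_xml, last_sync):
    """Return URIs whose lastmod is later than last_sync (destination side).

    Assumes all timestamps share one UTC format, so string order equals time order.
    """
    ns = {"sm": SITEMAP_NS}
    root = ET.fromstring(sitemap_xml)
    return [
        url.findtext("sm:loc", namespaces=ns)
        for url in root.findall("sm:url", ns)
        if url.findtext("sm:lastmod", namespaces=ns) > last_sync
    ]


xml_doc = build_resource_list([
    ("http://example.org/a", "2013-01-02T00:00:00Z"),
    ("http://example.org/b", "2013-01-10T00:00:00Z"),
])
print(changed_since(xml_doc, "2013-01-05T00:00:00Z"))
```

Polling a full list like this trades latency and bandwidth for simplicity. This is presumably where Change Lists and the dump capabilities come in, for parties with stricter latency or coverage needs.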
ResourceSync Capabilities
2012
• Beta spec end 01/2013
  • http://www.openarchives.org/rs/
• Feedback
  • mailto:resourcesync@googlegroups.com
• Papers in D-Lib Magazine
  • http://dx.doi.org/10.1045/september2012-vandesompel
  • http://dx.doi.org/10.1045/january2013-klein
• Paper in Ariadne
  • http://www.ariadne.ac.uk/issue70/lewis-et-al
1998 - 2013
a stack of journals or a bunch of PDF files
a network of interconnected assets and actors
1998 - 2013
Conclusion
• OAI-ORE, Memento, ResourceSync illustrate the potential of leveraging the Web infrastructure for scholarly communication
• This suggests that other special requirements of scholarly communication (certification, archiving, persistence, trust, annotation, metrics, …) may be addressable in an interoperable manner by leveraging the Web infrastructure
• Wins:
  • Long Term Sustainability: Reuse of infrastructure (network, software, platforms, standards, etc.) that the entire world depends on
  • Integration of scholarly discourse with other Web-based discourse