(Talk at the 12th International Conference on Electronic Publishing, held in Toronto, Canada, 25-27 June 2008.)
ABSTRACT: Scholars are increasingly citing electronic “web references” which are not preserved in libraries or full text archives. WebCite is a new standard for citing web references. To “webcite” a document involves archiving the cited Web page through www.webcitation.org and citing the WebCite permalink instead of (or in addition to) the unstable live Web page. Almost 200 journals are already using the system. We discuss the rationale for WebCite, its technology, and how scholars, editors, and publishers can benefit from the service. Citing scholars initiate an archiving process of all cited Web references, ideally before they submit a manuscript. Authors of online documents and websites which are expected to be cited by others can ensure that their work is permanently available by creating an archived copy using WebCite and providing the citation information, including the WebCite link, on their Web document(s). Editors should ask their authors to cache all cited Web addresses (Uniform Resource Locators, or URLs) “prospectively” before submitting their manuscripts to their journal. Editors and publishers should also instruct their copyeditors to cache cited Web material if the author has not done so already. In addition, WebCite can process publisher-submitted “citing articles” (submitted for example as eXtensible Markup Language [XML] documents) to automatically archive all cited Web pages shortly before or on publication. Finally, WebCite can act as a focussed crawler, retrospectively caching references of already published articles. Copyright issues are addressed by honouring respective Internet standards (robot exclusion files, no-cache and no-archive tags). Long-term preservation is ensured by agreements with libraries and digital preservation organizations. The resulting WebCite Index may also have applications for research assessment exercises, being able to measure the impact of Web services and published Web documents through access and Web citation metrics.
FULL PAPER: Eysenbach, Gunther. Preserving the scholarly record with WebCite (www.webcitation.org): an archiving system for long-term digital preservation of cited webpages. In: ELPUB2008. Openness in Digital Publishing: Awareness, Discovery and Access - Proceedings of the 12th International Conference on Electronic Publishing held in Toronto, Canada 25-27 June 2008 / Edited by: Leslie Chan and Susanna Mornati. ISBN 978-0-7727-6315-0, 2008, pp. 378-389. http://elpub.scix.net/data/works/att/378_elpub2008.content.pdf
Gunther Eysenbach MD MPH
Editor/Publisher, J Med Internet Res
Associate Professor Department of Health Policy, Management and Evaluation, & KMDI, University of Toronto;
Senior Scientist, Centre for Global eHealth Innovation, Division of Medical Decision Making and Health Care Research; Toronto General Research Institute of the UHN, Toronto General Hospital, Canada
WebCite® (www.webcitation.org)
Mission
WebCite® is an on-demand archiving system (controlled by citing and cited authors, editors, and publishers), which enables long-term digital preservation and citability of any kind of Internet-accessible object*
* webpages, blogs, wikis, data files (e.g. spreadsheets), PDF reports, “grey” research reports, preprints etc.
E-publishing & Open Access Research Group at the CGEI, Toronto
• Journal of Medical Internet Research (www.jmir.org)
  – Living publishing lab
  – A pioneer in Open Access publishing (10 yrs)
  – Leading journal in its discipline (Impact Factor 3.0)
  – “triple-O” philosophy (open access, open source, open peer-review)
  – OS contributions include contributions to OJS and XML-typesetting software (originally © MJ Suhonos, G. Eysenbach, J Alperin; code released under GNU forms the basis for the PKP Lemon8 project)
• CIHR-funded research on the Impact of Open Access on Knowledge Translation (see e.g. Eysenbach. PLoS Biol 4(5): e157)
• Publishing innovations incl. WebCite® (www.webcitation.org)
www.jmir.org
Authors increasingly cite non-traditional (web)references
• Webpages (e.g. personal homepages)
• “grey” PDF reports (e.g. research progress reports, etc.)
• Blogs
• Wikis
• Datasets which are available online
Note: For the purpose of this talk I refer to “webpages” or webreferences - but what I really mean is any sort of electronic digital object that can be cited and which can be deemed non-traditional (not having a DOI)
Problem 1: URLs go “dead”
[Figure: Attrition rate of cited non-journal URLs. x-axis: years (0 to 12); y-axis: % URLs still working (0 to 100)]
Dellavalle RP, Hester EJ, Heilig LF, Drake AL, Kuntzman JW, Graber M, et al. Information science. Going, going, gone: lost Internet references. Science 2003 Oct 31;302(5646):787-788. DOI:10.1126/science.1088234
In one study published in the journal Science, 13% of Internet references in scholarly articles were inactive after only 27 months.
Problem 2: Even if URLs don’t go “dead”, their content may change
Eysenbach G. Towards quality management of medical information on the internet: evaluation, labelling, and filtering of information. BMJ 1998;317:1496-1502
Today, that site looks different…
medpics.org
Wikis and Blogs change constantly
The homepage of a blog shows the most recent posts only
Problem 3: Internet material not deemed “citable”
(impedes the use of blogs, wikis, online-sharing of datasets etc.)
Editors often discourage citing web material (including datasets)
URL:http://www.plantphysiol.org/misc/ifora.shtml. Accessed: 2008-06-26. (Archived by WebCite® at http://www.webcitation.org/5YsaBISU5)
Internet material not considered citable (deemed unstable, not archived)
Fear of plagiarism / not getting credits
Authors are reluctant to
– make data and datasets accessible online
– participate in collaborative projects (wikis)
– share information in blogs
Problem 4: Crawler-based archiving insufficient
Limitations of crawler based archiving
• No author-initiated on demand archiving on a given date/time
• “Shotgun” approach
• Crawler cannot go everywhere (“hidden web”)
• No impact statistics (how often has my archived copy been retrieved)
• Impossible to curate
WebCite = Web Archiving 2.0
The solution: WebCite®
• First mentioned as an idea and implemented as a prototype in 1998 (Eysenbach, BMJ 1998;317:1496-1502)
• Project idea revived in 2004/2005
• First implemented by J Med Internet Res
• Today, used by >200 journals and large publishers (including Biomed Central, Oxford University Press)
• Became member of the International Internet Preservation Consortium in 2008
[Diagram: WebCite® architecture. Actors: Citing Author, Cited Author (self-archiving), Publisher/Editor, Reader. WebCite® entry points: /archive, /bookmarklet, /comb. A sample citing paper (“What the world needs”, J. Author) with reference 1: Doe J. www.citedwebsite.com/exmpl [Accessed 1.1.2004] is submitted as an XML manuscript with DOI®. Three archiving modes: reverse (citation-triggered) archiving, self (author-triggered) archiving, and third-party archiving. Snapshots are mirrored to IA and Libraries/Digital Preservation Partners. Other components: DOI® server, link resolver, snapshot retrieval request (DOI with hash), CrossRef® Forward Linking XML, (optional) DOI assignment. Cited content may be dynamic or static.]
[Diagram: citing-author view. The Citing Author archives the cited page via WebCite® (/archive, /bookmarklet, /comb; third-party archiving); snapshots are mirrored to IA and Libraries/Digital Preservation Partners; the Reader sends a snapshot retrieval request.]
Two possible citation formats to cite the WebCite snapshot
Opaque (ID-based):
Eysenbach, Gunther. Gunther Eysenbach Random Research Rants Blog. 2008-06-26. URL:http://gunther-eysenbach.blogspot.com. Accessed: 2008-06-26. (Archived by WebCite® at http://www.webcitation.org/5YreMGRz7)
Transparent:
Eysenbach, Gunther. Gunther Eysenbach Random Research Rants Blog. 2008-06-26. http://www.webcitation.org/query?url=http%3A%2F%2Fgunther-eysenbach.blogspot.com&date=2008-06-26
(Note that there are also others: hash-based, and citing-document-DOI-based)
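The two link formats on this slide can be sketched in a few lines of Python. This is only an illustration derived from the two example links shown above; the function names are hypothetical, and the sketch assumes the transparent format is simply the percent-encoded cited URL plus an ISO date.

```python
from urllib.parse import quote

WEBCITE_BASE = "http://www.webcitation.org"


def opaque_webcite_url(snapshot_id):
    """Opaque (ID-based) link: the base address followed by the snapshot ID."""
    return f"{WEBCITE_BASE}/{snapshot_id}"


def transparent_webcite_url(cited_url, snapshot_date):
    """Transparent link: the cited URL (percent-encoded) and the date stay visible."""
    # safe="" also encodes ":" and "/", matching the example on the slide
    return f"{WEBCITE_BASE}/query?url={quote(cited_url, safe='')}&date={snapshot_date}"


print(opaque_webcite_url("5YreMGRz7"))
print(transparent_webcite_url("http://gunther-eysenbach.blogspot.com", "2008-06-26"))
```

The transparent form keeps the original address readable in the reference list, while the opaque form is shorter and points to exactly one snapshot.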
Reader point of view: for retrieving archived material the reader simply clicks on the WebCite link
Sample reference in the citing paper (“What the world needs”, J. Author): 1. Doe J. www.webcitation.org?cache_url=www.citedwebsite.com/exmpl&cache_date=31.1.2003 [Accessed 31.1.2004]
1. Reader clicks on cited webcitation URL (on 1.1.2005)
2. Request is redirected to webcitation.org
3. WebCite attempts to retrieve the “live” cited URL; if it is not found (www.citedwebsite.com/exmpl: ERROR: NOT FOUND), it displays the cached version (and/or other versions)
4. Displays cached version (timestamp 31.1.2004)
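The fallback behaviour in this reader flow (try the live URL, show the cached version if it is gone) can be sketched as below. This is not WebCite's server code: the function name and the two in-memory dictionaries standing in for the live Web and the archive are assumptions made purely for illustration.

```python
def resolve(cited_url, fetch_live, fetch_snapshot):
    """Return (source, content): the live page if reachable, else the cached snapshot."""
    try:
        return "live", fetch_live(cited_url)
    except LookupError:
        # Live URL is dead: fall back to the archived copy
        return "cached", fetch_snapshot(cited_url)


# Hypothetical stand-ins for the live Web and the snapshot archive
live_web = {"www.stillthere.org/page": "current page"}
archive = {
    "www.stillthere.org/page": "snapshot of page",
    "www.citedwebsite.com/exmpl": "snapshot taken 31.1.2004",
}

print(resolve("www.citedwebsite.com/exmpl", live_web.__getitem__, archive.__getitem__))
print(resolve("www.stillthere.org/page", live_web.__getitem__, archive.__getitem__))
```

Passing the two lookups in as functions keeps the sketch independent of any particular HTTP library or storage backend.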
Bookmarklet
Can be used to rapidly archive the currently viewed webpage
(bookmarklet hands over current URL and email address of the citing author to the WebCite server)
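What the bookmarklet hands over can be pictured as a single request URL carrying the page address and the author's email. The sketch below is a guess at the shape of such a request: the /archive path appears in the architecture diagram, but the query parameter names "url" and "email" and the function name are assumptions, not documented here.

```python
from urllib.parse import urlencode


def archive_request_url(page_url, email, base="http://www.webcitation.org/archive"):
    """Build the request a bookmarklet might send (parameter names are assumed)."""
    return base + "?" + urlencode({"url": page_url, "email": email})


print(archive_request_url("http://www.example.org/report.pdf", "author@example.org"))
```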
[Diagram: self-archiving view. The Cited Author self-archives via WebCite® (/archive, /bookmarklet); the Citing Author's paper is combed (/comb). Modes shown: reverse (citation-triggered) archiving, self (author-triggered) archiving, third-party archiving. Snapshots are mirrored to IA and Libraries/Digital Preservation Partners; the Reader sends a snapshot retrieval request. Cited content may be dynamic or static.]
As “potentially cited” author I can self-archive and add a static WebCite-enriched
reference as citation suggestion…
… or I provide a dynamic link to the WebCite archiving form
(“WebCite this!”)
Clicking on “WebCite this” populates the archiving form with metadata
from the cited author
(the same approach can be used by authors of wikis, datasets etc.)
Implementation from a publisher / editor point of view
Level 1-4 implementation (ordered by time since author saw the cited webdocument)
1. Author “webcites” document immediately (or reference manager takes care of this). Editors stipulate this in their Instructions for authors
2. Editor/Copyeditor “webcites” cited document before publication
3. WebCite® immediately archives cited webreferences on publication (combing XML files)
4. Retrospective focussed crawling of old articles
Level 1 implementation by journal editors: Instructions for authors
[Diagram: publisher/editor view. The Publisher/Editor submits the XML manuscript with DOI® to WebCite® (/archive, /comb); the Citing Author uses /archive and /bookmarklet. Modes shown: reverse (citation-triggered) archiving, self (author-triggered) archiving, third-party archiving. Snapshots are mirrored to IA and Libraries/Digital Preservation Partners; CrossRef® Forward Linking XML.]
Implemented by >200 journals
What’s next
Future developments
WebCite 2.0
• User accounts
• Enables users to view a list of the snapshots they created (and to categorize and export them, e.g. in BibTeX, RefMan etc.)
• Enables tagging, “crowdsourcing” of curation tasks such as metadata entering & reconciliation
• Recommender service (people who cited x also cited y)
• Post-publication peer-review (others can rate documents)
• For cited authors
  – WebCite® Impact Factor (access / citation statistics, which can be used for tenure & promotion purposes)
  – WebCitation-Alert service
Implementation of WebCite® in tools facilitating “archive as you cite”
• Bibliographic management systems (EndNote, Reference Manager) and shared bookmarks (Connotea, CiteULike)
• XML-editing software (Word 2007 XML-addin, Lemon8 etc.)
• Plugin for OJS and other manuscript management systems (allowing authors to automatically WebCite all references in their manuscript)
WebCite® works within the International Internet Preservation Consortium (IIPC)
• To collect and preserve a rich body of Internet content from around the world
• To foster the development and use of common tools, techniques and standards that enable the creation of international archives
• To encourage and support national libraries everywhere to address Internet collecting and preservation
http://netpreserve.org
2008 IIPC Members (38)
• Asia
  – Jewish National and University Library (Israel)
  – National Diet Library, Japan
  – National Library Board, Singapore
  – National Library of China
• Europe
  – Biblioteca de Catalunya (Library of Catalonia)
  – Biblioteca Nazionale Centrale di Firenze (National Library of Italy, Florence)
  – Biblioteka Narodowa (National Library of Poland)
  – Bibliothèque nationale de France (National Library of France)
  – British Library (U.K.)
  – Deutsche Nationalbibliothek (German National Library)
  – European Archive Foundation
  – Hanzo Archives Ltd. (U.K.)
  – Kansalliskirjasto (National Library of Finland)
  – Koninklijke Bibliotheek (National Library of the Netherlands)
  – Kungl. biblioteket (National Library of Sweden)
  – Landsbokasafn Islands – Haskolabokasafn (National and University Library of Iceland)
  – Latvijas Nacionālā bibliotēka (National Library of Latvia)
  – Nacionalna i sveučilišna knjižnica u Zagrebu (National and University Library in Zagreb, Croatia)
  – Narodna in univerzitetna knjižnica (National and University Library, Slovenia)
  – Národní knihovna České republiky (National Library of the Czech Republic)
  – Nasjonalbiblioteket (National Library of Norway)
  – National Archives (U.K.)
  – National Library of Scotland
  – Netarchive.dk (Royal Library and the State and University Library, Aarhus)
  – Österreichische Nationalbibliothek (Austrian National Library)
  – Schweizerische Nationalbibliothek (Swiss National Library)
  – Virtual Knowledge Studio – Royal Netherlands Academy for Arts and Sciences
• North America
  – Bibliothèque et Archives Nationales du Québec (BAnQ)
  – California Digital Library (U.S.)
  – Centre for Global eHealth Innovation, WebCite® Internet Citations Archiving Project (Canada)
  – Internet Archive (U.S.)
  – Library and Archives Canada
  – Library of Congress (U.S.)
  – Library of Virginia (U.S.)
  – United States Government Printing Office
  – University of North Texas Libraries (U.S.)
• Oceania
  – National Library of Australia
  – National Library of New Zealand
The vision
• A global infrastructure (standard APIs)
  – for cross-archive searching of cited URLs (by URL & date)
  – Decentralized storing of archived webmaterial
• Pilot project with WebCite®, Internet Archive, and Library and Archives Canada
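Cross-archive searching by URL and date could look roughly like the sketch below: each archive reports the snapshot dates it holds for a URL, and the lookup returns the snapshot closest to the date of citation. No such API is specified in this talk; the data layout, archive names, and function name are all hypothetical.

```python
from datetime import date


def closest_snapshot(archives, url, cited_on):
    """Return (archive_name, snapshot_date) nearest to cited_on, or None if no archive has the URL."""
    candidates = [
        (abs((snap_date - cited_on).days), name, snap_date)
        for name, index in archives.items()
        for snap_date in index.get(url, [])
    ]
    if not candidates:
        return None
    _, name, snap_date = min(candidates)  # smallest distance in days wins
    return name, snap_date


# Hypothetical holdings of two independent archives
archives = {
    "archive_a": {"http://www.example.org/": [date(2008, 1, 1), date(2008, 6, 20)]},
    "archive_b": {"http://www.example.org/": [date(2008, 6, 27)]},
}

print(closest_snapshot(archives, "http://www.example.org/", date(2008, 6, 26)))
```

Because the query is only a URL and a date, any archive that can answer it could take part, which is what decentralized storage would need.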
Summary: What WebCite® contributes
• Links/URLs no longer go 404 (dead)
• WebCite’d content does not change
• Internet material can be deemed citable and “archived”
  – Encourages “openness” (authors contribute to blogs, wikis etc., and make their datasets available)
  – Takes the submission load off journals: much of the scholarly communication can take place outside of journals
• Provides access/impact statistics for cited authors
• Enables one-click self-archiving
• “Internet Archiving 2.0”: Enables archiving of the “hidden/deep web” (where crawlers cannot go), collaborative assignment of metadata
Call for action
• If you are a citing author: use WebCite next time you cite a non-journal URL
• If you are a blogger or a (potentially cited) author publishing online in any other way, put a “WebCite this!” link on your page
• If you are an editor/publisher: Implement WebCite in your workflow (instructions for authors, copyeditors, XML production department)
• If you are a librarian: Contact us to become a long-term preservation partner
www.medicine20congress.com, Toronto, Sept 4-5th, 2008
Thank you!
Funding: Change Foundation, Canadian Institutes of Health Research, NSERC, European Union, SSHRC
Dr G. Eysenbach, Email: geysenba at uhnres.utoronto.ca or @gmail.com,
My peer-reviewed Journal: http://www.jmir.org
My Blog: http://gunther-eysenbach.blogspot.com
My Conferences: http://www.medicine20congress.com
http://www.ehealthcongresss.org
My Slides: http://www.slideshare.net/eysen
Appendix
Copyright Issues
• WebCite® honors robot exclusion standards and “no-archive” tags
• Copyright holders can request removal of material
• “Fair use” defence (used for non-profit/scholarly purposes, only a part of the site was archived, etc.)
• U.S. court ruled that Google’s caching does not constitute a copyright violation, because of fair use and an implied license (Field vs Google, US District Court, District of Nevada, CV-S-04-0413-RCJ-LRL)
• In the future, WebCite® may also
  – Allow copyright holders to specify a fee-per-access royalty fee
  – Long-term goal: WebCite® does not physically store anything but instead deposits the material in the respective National Libraries etc., who often have a legal deposit mandate*
* Legal deposit: a copy of any work published in COUNTRY must be deposited with the National Library of COUNTRY
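Honouring robot exclusion rules and “no-archive” tags before taking a snapshot can be sketched with the Python standard library. This is an illustration, not WebCite's implementation: the user agent name "ExampleArchiver" and the function name are hypothetical, and the sketch assumes the “no-archive” tag is the common `noarchive` directive in a robots meta tag.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def may_archive(url, robots_txt, html, user_agent="ExampleArchiver"):
    """True only if robots.txt allows fetching and the page carries no noarchive directive."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return False
    meta = _RobotsMetaParser()
    meta.feed(html)
    return "noarchive" not in meta.directives


robots = "User-agent: *\nDisallow: /private/\n"
print(may_archive("http://www.example.org/public/page.html", robots, "<html></html>"))
print(may_archive("http://www.example.org/private/page.html", robots, "<html></html>"))
```

Both checks are opt-outs controlled by the site owner, which is why they come before any copy is stored.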
WebCite® is a disruptive technology
• If online articles/material are– Permanently archived and “citable”– Findable– “Rankable” (post-publication peer-review)– (all of which WebCite® plans to implement)
• … what will be the role of the traditional scholarly journal publication?– Quality of pre-publication peer-review, editing,
copyediting is key– Value-added services (e.g. semantic markup,
curation)
<ref id="ref19">
  <label>19</label>
  <nlm-citation citation-type="web">
    <article-title>Who Gets ALS</article-title>
    <source>ALS Association</source>
    <access-date>2008 Apr 25</access-date>
    <comment><ext-link xlink:type="simple" xlink:href="http://www.alsa.org/als/who.cfm" ext-link-type="uri">http://www.alsa.org/als/who.cfm</ext-link></comment>
    <pub-id pub-id-type="other">5Y0NuDIU9</pub-id>
  </nlm-citation>
</ref>

<ref id="ref19">
  <label>19</label>
  <nlm-citation citation-type="web">
    <article-title>Who Gets ALS</article-title>
    <source>ALS Association</source>
    <access-date>2008 Apr 25</access-date>
    <comment><ext-link xlink:type="simple" xlink:href="http://www.webcitation.org/query?url=http://www.alsa.org/als/who.cfm&amp;date=2008-04-25" ext-link-type="uri">http://www.webcitation.org/query?url=http://www.alsa.org/als/who.cfm&amp;date=2008-04-25</ext-link></comment>
  </nlm-citation>
</ref>
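“Combing” a publisher-submitted XML file for cited web references might look like the following sketch, which walks the reference markup shown above and collects each web citation's URL, access date, and WebCite ID (if present). The wrapping `ref-list` element, the xlink namespace declaration, and the function name are assumptions added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

SAMPLE = """<ref-list xmlns:xlink="http://www.w3.org/1999/xlink">
  <ref id="ref19"><label>19</label>
    <nlm-citation citation-type="web">
      <article-title>Who Gets ALS</article-title>
      <source>ALS Association</source>
      <access-date>2008 Apr 25</access-date>
      <comment><ext-link xlink:type="simple" xlink:href="http://www.alsa.org/als/who.cfm" ext-link-type="uri">http://www.alsa.org/als/who.cfm</ext-link></comment>
      <pub-id pub-id-type="other">5Y0NuDIU9</pub-id>
    </nlm-citation>
  </ref>
</ref-list>"""


def comb_web_references(xml_text):
    """Return one dict per web citation: cited URL, access date, WebCite ID (or None)."""
    root = ET.fromstring(xml_text)
    found = []
    for citation in root.iter("nlm-citation"):
        if citation.get("citation-type") != "web":
            continue  # journal articles etc. are not archived
        link = citation.find(".//ext-link")
        if link is None:
            continue
        found.append({
            "url": link.get(XLINK_HREF),
            "accessed": citation.findtext("access-date"),
            "webcite_id": citation.findtext("pub-id[@pub-id-type='other']"),
        })
    return found


print(comb_web_references(SAMPLE))
```

References that come back with no WebCite ID are the ones still waiting to be archived.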