Lessons from the Open Citation Project
Presented by Steve Hitchcock, Southampton University
These slides were prepared for The Open Archives Initiative: application and exploitation, a one-day seminar on the application and exploitation of the OAI Protocol for Metadata Harvesting, May 14, 2003, London
A joint JISC-NSF International Digital Libraries Project 1999-2002
A post-Google information environment
Electronic journals exist in a post-Gutenberg and a post-Google information environment
The ability to locate a specified item of information precisely and instantly among the mass of information available on the Web has profound implications.
In the electronic environment the search engine has become the de facto interface to information, rather than the fragmented packages that have migrated from the print world.
About this presentation
• Citebase: citation-ranked search and impact discovery service
  – New scientometric indices
  – Evaluating Citebase
• EPrints.org software: free software to build and manage OAI-compliant eprint archives
• Growth of OAI, Eprints.org and institutional archives
• How to accelerate the growth of OAI eprint archives
Citebase, a discovery service with usage- and citation-based ranking
http://citebase.eprints.org/
“Google for the refereed literature”
Citebase is based on a citation database
• Harvests metadata using OAI-PMH
• Extracts and indexes citations from published research papers stored in the larger open access, OAI disciplinary archives - currently arXiv, CogPrints and BioMed Central
• Provides impact (and other)-ranked search based on reference data
• Re-exports metadata + references
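The harvesting step can be sketched in a few lines. This is a minimal illustration of an OAI-PMH ListRecords request and of reading unqualified Dublin Core from the response, not Citebase's own code. The base URL, the sample record and the function names are hypothetical. Citation extraction works on the papers themselves and is not part of OAI-PMH, so it is not shown.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract (identifier, title, creators) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        creators = [c.text for c in rec.iterfind(".//dc:creator", NS)]
        records.append((identifier, title, creators))
    return records


# Hypothetical response, trimmed to a single record for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:1</identifier>
        <datestamp>2003-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:creator>Author, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://archive.example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also fetch the URL over HTTP and follow resumption tokens for large result sets; both are left out to keep the sketch short.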
Some old and new scientometric (“publish or perish”) indices of research impact
• Quality-level and citation-counts of the journal in which the article appears
• Citation-counts for the article
• Citation-counts for the researcher
• Co-citations, co-text (cited with whom/what else?)
• Citation-counts for the preprint
• Usage-measures (“hits”, Webmetrics)
• Time-course analyses, early predictors, etc.
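Two of these indices, citation-counts per article and co-citations, can be computed directly from reference lists. The sketch below uses hypothetical toy data and function names. It shows the general technique only and is not how Citebase is implemented.

```python
from collections import Counter
from itertools import combinations


def citation_counts(references):
    """Count how many papers cite each paper.

    `references` maps a citing paper ID to the list of paper IDs it cites.
    """
    counts = Counter()
    for cited in references.values():
        counts.update(set(cited))  # count each citing paper at most once
    return counts


def cocitation_counts(references):
    """Count how often each pair of papers appears in the same reference list."""
    pairs = Counter()
    for cited in references.values():
        for a, b in combinations(sorted(set(cited)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical toy data: three citing papers and the papers they cite.
refs = {
    "p1": ["a", "b"],
    "p2": ["a", "b", "c"],
    "p3": ["a"],
}

counts = citation_counts(refs)
# Rank by citation count, highest first, breaking ties by paper ID.
ranked = sorted(counts, key=lambda p: (-counts[p], p))
print(ranked)
print(cocitation_counts(refs))
```

Sorting search results by `counts` is the simplest form of citation-ranked search. Usage measures could be ranked the same way by substituting download counts.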
Citebase, a new interface to the scholarly literature
Time-Course of Citations (red) and Usage (hits, green)
Witten, Edward (1998) String Theory and Noncommutative Geometry Adv. Theor. Math. Phys. 2 : 253
1. Preprint or Postprint appears.
2. It is downloaded (and sometimes read).
3. Eventually citations may follow (for more important papers).
4. This generates more downloads, etc.
“Perhaps the most important new information to become available for bibliometric studies is the per article readership information.”
Kurtz et al. (2003) "The NASA Astrophysics Data System: Sociology, Bibliometrics and Impact" http://cfa-www.harvard.edu/~kurtz/jasist-submitted.ps
Evaluating Citebase
http://opcit.eprints.org/opcitevaluation.shtml
• First detailed user evaluation of an open access Web citation indexing service
• The evaluation was aimed at users of arXiv, and all others who use bibliographic services to access the refereed journal literature.
• Citebase was evaluated by nearly 200 users from different backgrounds between June and October 2002
• Just prior to the evaluation Citebase had records for 230,000 papers, indexing 5.6 million references.
• By discipline, approximately 200,000 of these papers are classified within arXiv physics archives.
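As a rough check on what these figures imply, the short calculation below derives the mean number of indexed references per paper and the share of physics papers. It assumes the 5.6 million references all belong to the 230,000 records, which the slide does not state explicitly.

```python
papers = 230_000          # records in Citebase just before the evaluation
references = 5_600_000    # references indexed
physics_papers = 200_000  # approximate number classified in arXiv physics archives

refs_per_paper = references / papers
physics_share = physics_papers / papers

print(f"{refs_per_paper:.1f} references per paper")  # about 24.3
print(f"{physics_share:.0%} of papers in physics")   # about 87%
```

The physics share of roughly 87% is consistent with the coverage concerns that non-physicists raised in the evaluation.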
Results of Citebase evaluation
• Web-based citation indexing of open access eprint archives is closer to a state of readiness for serious use than had previously been realised
• Within the scope of its primary components, the search interface and services available from its rich bibliographic records, Citebase can be used simply and reliably for the purpose intended
• Tasks can be accomplished efficiently with Citebase regardless of the background of the user
• Links to citing and co-citing papers are features of Citebase that are valued by users
• Citebase compares favourably with other bibliographic services
• Coverage is seen as a limiting factor. Non-physicists were frustrated at the lack of papers from other sciences
Accomplishing tasks with Citebase
Tasks can be accomplished efficiently with Citebase regardless of the background of the user.
A key part of the evaluation assessed the usability of Citebase with a practical exercise to build a short bibliography based on a series of questions
[Chart panels: All users; Physicists only. Legend: yellow line, T = true; blue, F = false; purple, N = no response.]
Most useful features of Citebase
Links to citing and co-citing papers are features of Citebase that are valued by users
Citebase compares favourably with other bibliographic services
Growth of OAI, Eprints.org and Institutional Archives
How OAI Archives for institutional research output have been growing – and how to accelerate their growth
The following slides are taken from the presentation The Research Impact Cycle, which contains key data on the growth of open access through the self-archiving of institutional (peer-reviewed) research. These data can be freely used or adapted for other talks. Copy this PPT version for reuse.
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.ppt
Data collected and analysed by Tim Brody, Electronics and Computer Science, Southampton University
Growth in number of OAI Archives
(now 140+ Archives, but the average number of papers per Archive (9000) needs to grow faster!)
[Chart: Number of Archives and Mean Number of Papers Per Archive (all OAI Archives). Axes: Mean Records per Archive (0 to 10,000) and Archives (0 to 160). Series: Cumulative Mean Records per Archive; Cumulative Archives to Date.]
EPrints.org software
http://www.eprints.org/
Generates eprint archives that are compliant with the OAI Protocol for Metadata Harvesting.
Eprints.org software has been used to build institutional archives, and disciplinary archives.
In conjunction with OAI, Eprints.org has been a primary motivator for institutional archives
Eprints.org v. 2.0 released February 2002 (now on v. 2.2.1)
EPrints is free (GPL) software, aimed at organisations and communities.
Growth in number of Eprints.org Archives (c. 70)
(again, average number of papers per Archive [c. 120] needs to grow faster!)
[Chart: Cumulative Number of Eprints.org Archives and Mean Number of Papers Per Archive (- top 3). Time axis: March 2001 to November 2002, in two-month steps. Axes: Mean Records per Archive (0 to 140) and Archives (0 to 70). Series: Mean Records per Archive; Cumulative Archives to Date.]
Work that needs to be done to accelerate growth per archive
These curves must become convex upward: Institutional self-archiving policies are needed
[Chart: Latency of Record Additions to New EPrints Archives. Horizontal axis: Latency (30 Day Periods), 0 to 23. Axes: New Records (0 to 3000) and Mean (0 to 50). Series: New Records in Latency Period; Mean New Records per Archive.]
What have we learned from the Open Citation Project?
• OAI is gathering momentum
• Software for building OAI repositories is available
• Institutional archives are beginning to be created, but need to be filled by authors
• Attracting authors requires evidence of services that will improve the visibility and impact of their works
• Citation-ranked search and reference linking are examples of OAI services that do this
“Online or Invisible?” (Lawrence 2001)
“average of 336% more citations to online articles compared to offline articles published in the same venue”
Lawrence, S. (2001) “Free online availability substantially increases a paper's impact”. Nature, 411 (6837): 521
http://www.neci.nec.com/~lawrence/papers/online-nature01/
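Note that "336% more" is an increase on top of the baseline, so it corresponds to a ratio of 4.36, not 3.36. The small sketch below makes the conversion explicit. The function name and the baseline of 10 citations are hypothetical and are not taken from Lawrence's data.

```python
def ratio_from_percent_more(percent_more):
    """Convert 'X% more' into a multiplicative ratio over the baseline."""
    return 1 + percent_more / 100


ratio = ratio_from_percent_more(336)

# Hypothetical baseline: an offline article with 10 citations.
offline_citations = 10
online_citations = offline_citations * ratio

print(f"ratio: {ratio:.2f}")
print(f"online equivalent: {online_citations:.1f} citations")
```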
What is needed to fill the archives
1. Universities: Adopt a university-wide policy of self-archiving all university research output, e.g. Southampton (ECS) Research Self-Archiving Policy http://www.ecs.soton.ac.uk/~lac/archpol.html
2. Departments: Create Departmental OAI-compliant Eprint Archives
3. University Libraries: Provide digital library support for research self-archiving and archive-maintenance
4. Promotion Committees: Request a standardized online CV from all candidates, with refereed publications all linked to their full-texts in the Departmental Archives
5. Research Funders: Assess research impact online (from the online CVs)
Mandating online UK Research Assessment CVs linked to university eprint archives
"will set an example for the rest of the world that will almost certainly be emulated in terms of research assessment and research access"
Ariadne, issue 35, April 30, 2003 http://www.ariadne.ac.uk/issue35/harnad/
Exploiting OAI
• OAI has become the critical technical infrastructure for open access to author self-archived papers in institutional archives
• OAI enables cross-archive services such as Citebase
• Open access data and services promise increased visibility and impact for authors
• OAI resources will begin to grow significantly when authors realise this, and when research councils start mandating open access to the publication of results of funded research
Credits: Open Citation Project @ Southampton
• Principal Investigator is Stevan Harnad
• Technical development at Southampton is directed by Les Carr
• EPrints.org software is being developed by Chris Gutteridge
• Citebase is produced and managed by Tim Brody
• Project manager is Steve Hitchcock
A copy of these slides can be found on the OpCit Web site
http://opcit.eprints.org/. Look for Papers and Presentations
Contact Steve Hitchcock: [email protected]