Lessons from the Open Citation Project. Presented by Steve Hitchcock, Southampton University. These slides were prepared for The Open Archives Initiative: application and exploitation, a one-day seminar on the application and exploitation of the OAI Protocol for Metadata Harvesting, May 14, 2003, London. A joint JISC-NSF International Digital Libraries Project, 1999-2002.


Page 1: Lessons from the  Open Citation Project

Lessons from the Open Citation Project

Presented by Steve Hitchcock, Southampton University

These slides prepared for The Open Archives Initiative: application and exploitation, a one-day seminar on the application and exploitation of the OAI Protocol for Metadata Harvesting, May 14, 2003, London

A joint JISC-NSF International Digital Libraries Project 1999-2002

Page 3: Lessons from the  Open Citation Project

A post-Google information environment

Electronic journals exist in a post-Gutenberg and a post-Google information environment

The ability to locate a specified item of information precisely and instantly among the mass of information available on the Web has profound implications.

In the electronic environment the search engine has become the de facto interface to information, rather than the fragmented packages that have migrated from the print world.

Page 4: Lessons from the  Open Citation Project

About this presentation

• Citebase: citation-ranked search and impact discovery service
  – New scientometric indices
  – Evaluating Citebase

• EPrints.org software: free software to build and manage OAI-compliant eprint archives

• Growth of OAI, Eprints.org and institutional archives

• How to accelerate the growth of OAI eprint archives

Page 5: Lessons from the  Open Citation Project

Citebase, a discovery service with usage- and citation-based ranking

http://citebase.eprints.org/

“Google for the refereed literature”

Citebase is based on a citation database

• Harvests metadata using OAI-PMH

• Extracts and indexes citations from published research papers stored in the larger open access, OAI disciplinary archives - currently arXiv, CogPrints and BioMed Central

• Provides search ranked by impact (and by other criteria), based on reference data

• Re-exports metadata + references
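The harvesting step in the list above can be illustrated with a minimal Python sketch. It is not Citebase's code: it only builds a standard OAI-PMH `ListRecords` request URL and parses titles and creators out of a Dublin Core (`oai_dc`) response. The base URL `http://example.org/oai` and the sample record are hypothetical, and a real harvester would also need to fetch the URL and follow `resumptionToken` elements for large result sets.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (base_url is supplied by the caller)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, first title and creators from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [e.text for e in rec.iter(f"{{{DC_NS}}}title")]
        creators = [e.text for e in rec.iter(f"{{{DC_NS}}}creator")]
        records.append({
            "identifier": identifier,
            "title": titles[0] if titles else None,
            "creators": creators,
        })
    return records


# Hypothetical response, trimmed to the elements the parser reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
          <dc:creator>Author, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai", from_date="2003-05-14"))
    print(parse_records(SAMPLE_RESPONSE))
```

Citation extraction and ranking would run on the full texts that these metadata records point to; that step is outside this sketch.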

Page 6: Lessons from the  Open Citation Project

Some old and new scientometric (“publish or perish”) indices of research impact

• Quality-level and citation-counts of the journal in which the article appears

• Citation-counts for the article

• Citation-counts for the researcher

• Co-citations, co-text (cited with whom/what else?)

• Citation-counts for the preprint

• Usage-measures (“hits”, Webmetrics)

• Time-course analyses, early predictors, etc.
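Three of the indices listed above (citation counts for the article, citation counts for the researcher, and co-citations) can be derived from the same reference data. The following is a minimal sketch on a toy citation graph. The paper identifiers, author names and function names are all hypothetical, each paper is given a single author for simplicity, and this is not a description of how Citebase computes its indices.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference data: each citing paper maps to the papers it cites.
references = {
    "P1": ["A", "B"],
    "P2": ["A", "B", "C"],
    "P3": ["A"],
}

# Hypothetical authorship of the cited papers (one author each, for simplicity).
authors = {"A": "Smith", "B": "Jones", "C": "Smith"}


def article_citation_counts(references):
    """Count how many citing papers reference each cited paper."""
    counts = Counter()
    for cited in references.values():
        counts.update(set(cited))  # a paper cites another at most once
    return counts


def researcher_citation_counts(article_counts, authors):
    """Sum article citation counts over each researcher's papers."""
    totals = Counter()
    for paper, n in article_counts.items():
        totals[authors[paper]] += n
    return totals


def cocitation_counts(references):
    """Count how often two papers appear together in one reference list."""
    pairs = Counter()
    for cited in references.values():
        for a, b in combinations(sorted(set(cited)), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    per_article = article_citation_counts(references)
    print(per_article)
    print(researcher_citation_counts(per_article, authors))
    print(cocitation_counts(references))
```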

Page 7: Lessons from the  Open Citation Project

Citebase, a new interface to the scholarly literature

Page 8: Lessons from the  Open Citation Project

Time-Course of Citations (red) and Usage (hits, green)

Witten, Edward (1998) String Theory and Noncommutative Geometry Adv. Theor. Math. Phys. 2 : 253

1. Preprint or Postprint appears.
2. It is downloaded (and sometimes read).
3. Eventually citations may follow (for more important papers).
4. This generates more downloads, etc.

“Perhaps the most important new information to become available for bibliometric studies is the per article readership information.”

Kurtz et al. (2003) "The NASA Astrophysics Data System: Sociology, Bibliometrics and Impact" http://cfa-www.harvard.edu/~kurtz/jasist-submitted.ps

Page 9: Lessons from the  Open Citation Project

Evaluating Citebase

http://opcit.eprints.org/opcitevaluation.shtml

• First detailed user evaluation of an open access Web citation indexing service

• The evaluation was aimed at users of arXiv, and all others who use bibliographic services to access the refereed journal literature.

• Citebase was evaluated by nearly 200 users from different backgrounds between June and October 2002

• Just prior to the evaluation Citebase had records for 230,000 papers, indexing 5.6 million references.

• By discipline, approximately 200,000 of these papers are classified within arXiv physics archives.

Page 10: Lessons from the  Open Citation Project

Results of Citebase evaluation

• Web-based citation indexing of open access eprint archives is closer to a state of readiness for serious use than had previously been realised

• Within the scope of its primary components, the search interface and services available from its rich bibliographic records, Citebase can be used simply and reliably for the purpose intended

• Tasks can be accomplished efficiently with Citebase regardless of the background of the user

• Links to citing and co-citing papers are features of Citebase that are valued by users

• Citebase compares favourably with other bibliographic services

• Coverage is seen as a limiting factor. Non-physicists were frustrated at the lack of papers from other sciences

Page 11: Lessons from the  Open Citation Project

Accomplishing tasks with Citebase

Tasks can be accomplished efficiently with Citebase regardless of the background of the user.

A key part of the evaluation assessed the usability of Citebase with a practical exercise to build a short bibliography based on a series of questions

Charts: responses to the practical exercise, shown for all users and for physicists only. Legend: yellow line, T = true; blue, F = false; purple, N = no response.

Page 12: Lessons from the  Open Citation Project

Most useful features of Citebase

Links to citing and co-citing papers are features of Citebase that are valued by users

Page 13: Lessons from the  Open Citation Project

Citebase compares favourably with other bibliographic services

Page 14: Lessons from the  Open Citation Project

Growth of OAI, Eprints.org and Institutional Archives

How OAI Archives for institutional research output have been growing – and how to accelerate their growth

The following slides are taken from the presentation The Research Impact Cycle, which contains key data on the growth of open access through the self-archiving of institutional (peer-reviewed) research. These data can be freely used or adapted for other talks. Copy this PPT version for reuse.

http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.ppt

Data collected and analysed by Tim Brody, Electronics and Computer Science, Southampton University

Page 15: Lessons from the  Open Citation Project

Growth in number of OAI Archives

(now 140+ Archives, but the average number of papers per Archive (9000) needs to grow faster!)

Chart: Number of Archives and Mean Number of Papers Per Archive (all OAI Archives). Left axis: Mean Records per Archive (0 to 10000). Right axis: Archives (0 to 160). Series: Cumulative Mean Records per Archive; Cumulative Archives to Date.

Page 16: Lessons from the  Open Citation Project

EPrints.org software

http://www.eprints.org/

Generates eprint archives that are compliant with the OAI Protocol for Metadata Harvesting.

Eprints.org software has been used to build institutional archives, and disciplinary archives.

In conjunction with OAI, Eprints.org has been a primary motivator for institutional archives

Eprints.org v. 2.0 released February 2002 (now on v. 2.2.1)

EPrints is free (GPL) software, aimed at organisations and communities.

Page 17: Lessons from the  Open Citation Project

Growth in number of Eprints.org Archives (c. 70)

(again, average number of papers per Archive [c. 120] needs to grow faster!)

Chart: Cumulative Number of Eprints.org Archives and Mean Number of Papers Per Archive (- top 3). Horizontal axis: March 2001 to November 2002, in two-month steps. Left axis: Mean Records per Archive (0 to 140). Right axis: Archives (0 to 70). Series: Mean Records per Archive; Cumulative Archives to Date.
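A few very large archives can dominate a mean, which may be why the chart title carries the qualifier "(- top 3)". The sketch below assumes that qualifier means the three largest archives are left out of the average; the slides do not spell this out. The record counts are invented for illustration and are not Tim Brody's data.

```python
def mean_records(record_counts, exclude_top=0):
    """Mean records per archive, optionally leaving out the largest archives."""
    ordered = sorted(record_counts, reverse=True)
    kept = ordered[exclude_top:]
    if not kept:
        raise ValueError("no archives left after exclusion")
    return sum(kept) / len(kept)


# Hypothetical record counts: three large archives and four small ones.
counts = [200000, 9000, 5000, 300, 150, 90, 60]

if __name__ == "__main__":
    print(mean_records(counts))                 # dominated by the largest archive
    print(mean_records(counts, exclude_top=3))  # typical small archive
```

With these invented numbers the overall mean is in the tens of thousands while the mean without the top three is 150, which shows how a small number of large archives can hide slow growth in the rest.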

Page 18: Lessons from the  Open Citation Project

Work that needs to be done to accelerate growth per archive

These curves must become convex upward: Institutional self-archiving policies are needed

Chart: Latency of Record Additions to New EPrints Archives. Horizontal axis: Latency (30 Day Periods), 0 to 23. Left axis: New Records (0 to 3000). Right axis: Mean (0 to 50). Series: New Records in Latency Period; Mean New Records per Archive.
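One way to produce a latency chart of this kind is to measure every deposit in days from the start of its own archive and group the results into 30-day periods. The sketch below assumes that reading of "Latency (30 Day Periods)". The archive start dates and deposit dates are invented, and the mean is taken over all archives in the sample, which may differ from how the original analysis normalised it.

```python
from collections import Counter
from datetime import date


def latency_period(archive_start, deposit_date, period_days=30):
    """Index of the 30-day period, counted from the archive's start, that holds a deposit."""
    return (deposit_date - archive_start).days // period_days


def new_records_by_period(archives):
    """Total new records in each latency period across all archives."""
    totals = Counter()
    for start, deposits in archives:
        for deposit in deposits:
            totals[latency_period(start, deposit)] += 1
    return totals


def mean_new_records_by_period(archives):
    """Mean new records per archive in each period (assumption: divide by all archives)."""
    totals = new_records_by_period(archives)
    return {period: n / len(archives) for period, n in totals.items()}


# Hypothetical archives: (start date, list of deposit dates).
archives = [
    (date(2002, 1, 1), [date(2002, 1, 10), date(2002, 2, 5), date(2002, 3, 15)]),
    (date(2002, 6, 1), [date(2002, 6, 2), date(2002, 7, 15)]),
]

if __name__ == "__main__":
    print(new_records_by_period(archives))
    print(mean_new_records_by_period(archives))
```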

Page 19: Lessons from the  Open Citation Project

What have we learned from the Open Citation Project?

• OAI is gathering momentum

• Software for building OAI repositories is available

• Institutional archives are beginning to be created, but need to be filled by authors

• Attracting authors requires evidence of services that will improve the visibility and impact of their works

• Citation-ranked search and reference linking are examples of OAI services that do this
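Citation-ranked search, in its simplest form, means filtering records by the query and ordering the hits by how often each has been cited. The sketch below illustrates only that idea. The records, titles and citation counts are invented, the matching is naive substring matching on titles, and nothing here reflects Citebase's actual implementation.

```python
def citation_ranked_search(records, query):
    """Return records whose title contains every query term, most cited first."""
    terms = query.lower().split()
    hits = [r for r in records if all(t in r["title"].lower() for t in terms)]
    return sorted(hits, key=lambda r: r["citations"], reverse=True)


# Hypothetical records with invented citation counts.
records = [
    {"title": "Lattice models of quantum gravity", "citations": 12},
    {"title": "Quantum gravity and string duality", "citations": 85},
    {"title": "String duality in two dimensions", "citations": 40},
]

if __name__ == "__main__":
    for hit in citation_ranked_search(records, "quantum gravity"):
        print(hit["citations"], hit["title"])
```

Swapping the sort key for a usage count would give the usage-ranked variant described earlier in the slides.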

Page 20: Lessons from the  Open Citation Project

“Online or Invisible?” (Lawrence 2001)

“average of 336% more citations to online articles compared to offline articles published in the same venue”

Lawrence, S. (2001) “Free online availability substantially increases a paper's impact”. Nature, 411 (6837): 521

http://www.neci.nec.com/~lawrence/papers/online-nature01/

Page 21: Lessons from the  Open Citation Project

What is needed to fill the archives

1. Universities: Adopt a university-wide policy of self-archiving all university research output, e.g. Southampton (ECS) Research Self-Archiving Policy http://www.ecs.soton.ac.uk/~lac/archpol.html

2. Departments: Create Departmental OAI-compliant Eprint Archives

3. University Libraries: Provide digital library support for research self-archiving and archive-maintenance

4. Promotion Committees: Request a standardized online CV from all candidates, with refereed publications all linked to their full-texts in the Departmental Archives

5. Research Funders: Assess research impact online (from the online CVs)

Page 22: Lessons from the  Open Citation Project

Mandating online UK Research Assessment CVs linked to university eprint archives

"will set an example for the rest of the world that will almost certainly be emulated in terms of research assessment and research access"

Ariadne, issue 35, April 30, 2003 http://www.ariadne.ac.uk/issue35/harnad/

Page 23: Lessons from the  Open Citation Project

Exploiting OAI

• OAI has become the critical technical infrastructure for open access to author self-archived papers in institutional archives

• OAI enables cross-archive services such as Citebase

• Open access data and services promise increased visibility and impact for authors

• OAI resources will begin to grow significantly when authors realise this, and when research councils start mandating open access to the publication of results of funded research

Page 24: Lessons from the  Open Citation Project

Credits: Open Citation Project @ Southampton

• Principal Investigator is Stevan Harnad

• Technical development at Southampton is directed by Les Carr

• EPrints.org software is being developed by Chris Gutteridge

• Citebase is produced and managed by Tim Brody

• Project manager is Steve Hitchcock

A copy of these slides can be found on the OpCit Web site, http://opcit.eprints.org/. Look for Papers and Presentations.

Contact Steve Hitchcock: [email protected]