Archiving Research Data, Dryad, and Publishers Neil Beagrie, Charles Beagrie Ltd Bloomsbury Conference June 2010

Archiving Research Data, Dryad, and Publishers

Neil Beagrie, Charles Beagrie Ltd

Bloomsbury Conference June 2010

With contributions from Julia Chruszcz, Peter Williams, and Todd Vision

Overview

• The Challenge;

• The Dryad Consortium;

• Supplementary Data and Publishers;

• Research Data Preservation Costs (KRDS);

• The Future.

The Challenge

PRC Global Study

[Chart: 12 data series, sample sizes from n=841 to n=3759]

Source: PRC global study (forthcoming)

Requesting Data

• Wicherts et al. (2006 Am. Psychol. 61, 726) requested data from the 141 most recent articles in American Psychological Association (APA) journals.

“6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…”

Only 27% of authors shared their data

The Dryad Consortium of Scholarly Societies and publishers (and libraries)

Archiving at publication

• Avoids loss, corruption, and obsolescence of data files;

• The point in time when authors are best able to ensure the correctness of data and metadata;

• Authors have incentive to deposit their data in order to complete the publication process;

• Journals are best able to monitor compliance with policy;

• In short, the “GenBank model” works.

Incentives to authors

• Access to colleagues’ data

• Visibility and citability
– Another way for work to have high impact

• Integration
– Combinability with other data adds value

• Long-term preservation
– Including data format migration

• Ad hoc data sharing can be burdensome
– Deposition to multiple specialized repositories
– Fulfilling individual requests for data takes effort

Joint Data Archiving Policy

• DEPOSIT AT PUBLICATION – As a condition for publication, all data used in the paper should be archived in an appropriate public archive.

• REPEATABILITY – Data should be given with sufficient detail so that, together with the paper content, each result in the published paper may be re-created.

• EMBARGO – Authors may elect to have the data publicly available at the time of publication or, if the archive allows, opt to embargo access to the data.

• EXCEPTIONS – Exceptions may be granted at the discretion of the editor, especially for sensitive information such as the location of endangered species.

• COORDINATION – The aim is for the Dryad consortium of journals to adopt this policy simultaneously.
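One way to read the policy’s conditions is as a decision rule applied when a paper is accepted. The minimal Python sketch below encodes that reading; the `Submission` fields, the function name and the status strings are hypothetical and do not describe any actual Dryad or journal system. Where an author asks for an embargo but the archive cannot provide one, the sketch assumes the data simply become public.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    # Hypothetical fields describing an accepted paper's data situation.
    data_archived: bool           # DEPOSIT AT PUBLICATION: data in a public archive?
    exception_granted: bool       # EXCEPTIONS: editor has waived the requirement?
    wants_embargo: bool           # EMBARGO: author prefers delayed access?
    archive_allows_embargo: bool  # EMBARGO: only possible "if the archive allows"


def data_status_at_publication(sub: Submission) -> str:
    """Return a hypothetical status label under one reading of the policy."""
    if sub.exception_granted:
        # Editor discretion, e.g. locations of endangered species.
        return "exempt"
    if not sub.data_archived:
        # Deposit is a condition for publication.
        return "blocked: data must be archived before publication"
    if sub.wants_embargo and sub.archive_allows_embargo:
        return "archived, embargoed"
    # Default case, and the assumed fallback when the archive cannot embargo.
    return "archived, public"
```

For example, `data_status_at_publication(Submission(True, False, True, True))` returns `"archived, embargoed"`. The exception is checked first because the policy leaves exceptions to the editor’s discretion, ahead of the deposit requirement.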

That’s all well and good, but where’s this “appropriate public archive”?

A mosaic of specialized databases

• There is a growing number of databases to which deposition is encouraged or required (GenBank, TreeBASE)
– And others are emerging

• A world in which every data type had its own required database, each with its own submission system:
– Would be a huge burden on authors
– Would inevitably leave some data orphaned
– Might never be financially possible

Overcoming the submission burden

• Integrating journal submission and data submission
– Prepopulating bibliographic metadata
– “Handshaking” with specialized repositories

• Enhancing low-quality author-provided metadata
– Human curation
– Machine-assisted metadata enhancement
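Prepopulating bibliographic metadata amounts to copying what the journal already knows about a manuscript into the data deposit, plus a shared identifier so the two systems can recognise each other later. The sketch below assumes a hypothetical record layout and token scheme (a random UUID); the "Data from:" title prefix is likewise only illustrative. It is not Dryad’s actual submission integration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ManuscriptMetadata:
    # Bibliographic details the journal already holds (hypothetical schema).
    title: str
    authors: list[str]
    journal: str
    manuscript_id: str


@dataclass
class DataDeposit:
    # A hypothetical repository-side deposit record.
    title: str
    authors: list[str]
    journal: str
    manuscript_id: str
    handshake_token: str
    keywords: list[str] = field(default_factory=list)  # left for author or curator


def prepopulate_deposit(ms: ManuscriptMetadata) -> DataDeposit:
    """Copy bibliographic fields from the manuscript so the author does not
    re-enter them; the random token links the journal and repository records."""
    return DataDeposit(
        title=f"Data from: {ms.title}",
        authors=list(ms.authors),  # copy, so later edits do not alias
        journal=ms.journal,
        manuscript_id=ms.manuscript_id,
        handshake_token=uuid.uuid4().hex,
    )
```

Only the fields the journal cannot know, such as keywords describing the data, are left empty; those are where the human curation and machine-assisted enhancement listed above would apply.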

The Dryad Digital Repository

The Repository

• Dryad is a repository (at Duke) for datasets underlying scientific research articles;

• Its initial focus has been evolution and ecology;

• Participating journals subscribe to the Joint Data Archiving Policy;

• Dryad datasets will have Digital Object Identifiers (DOIs) and Creative Commons ‘CC-Zero’ (CC0) licenses;

• Project funded by the National Science Foundation, 2008-2012;

• Sustainability plan is a key deliverable.
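A DOI is cited as a prefix/suffix string and resolved through the https://doi.org/ proxy. The sketch below checks that a string looks like a DOI, using a pattern similar to one Crossref has published for matching most modern DOIs, and then builds its resolver URL. The function name is hypothetical, and the example DOI in the usage note is made up, not a Dryad identifier.

```python
import re

# Syntax check close to a pattern Crossref has suggested for modern DOIs.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a syntactically plausible DOI."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return f"https://doi.org/{doi}"
```

For example, `doi_to_url("10.1234/example.5678")` returns `"https://doi.org/10.1234/example.5678"`. The pattern checks syntax only; whether a DOI is actually registered can be confirmed only by resolving it.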

Supplementary Data and Publishers

Overview

• Consultancy for Dryad Sustainability: covered areas of the draft business plan and sustainability for Dryad;

• Presenting one of the contributions (publishers) to the section on Comparators and Costs;

• Outcomes from desk research and 12 interviews with publishers/data publishers, plus some additional input drawn from Keeping Research Data Safe;

• Very brief presentation: article in preparation for the October 2010 issue of Learned Publishing; KRDS2 available from JISC.

Interviewees

• Journal of Clinical Investigation

• Journal of the American Medical Association

• Molecular Phylogenetics and Evolution (Elsevier)

• Journal of Heredity (OUP)

• Ecological Society of America

• Wiley-Blackwell + Ecology Letters

• Royal Society

• Federation of American Societies for Experimental Biology

• OECD Publishing

• Internet Archaeology and Archaeology Data Service

• Pangaea: Publishing Network for Geoscientific & Environmental Data

• Dataverse Network (Social Sciences, Harvard)

Some Findings: growth

• Many interviewees stated that supplementary data and materials are showing rapid growth;

• Three gave figures: from 32 articles in 2000 to 251 in 2009, an increase of 684% (a roughly 7.8-fold rise); from 6% in 2005 to 38% in 2009; and from 2% a decade ago to 87% in 2009.
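The growth figures mix two kinds of comparison: counts of articles and shares of articles. The helpers below, whose names are illustrative, make the distinction explicit. 251 is about 784% of 32, which corresponds to a percentage increase of about 684%. For the shares, a move from 6% to 38% is a rise of 32 percentage points, or a relative increase of about 533%.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, in percent: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


def fold_change(old: float, new: float) -> float:
    """Ratio new / old; 2.0 means the value doubled."""
    return new / old


def point_change(old_pct: float, new_pct: float) -> float:
    """Absolute change between two percentages, in percentage points."""
    return new_pct - old_pct
```

With the first journal’s figures, `percent_increase(32, 251)` gives 684.375 and `fold_change(32, 251)` gives 7.84375.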

Some Findings: workflow

• Supplementary data have grown organically at the various journals investigated (author driven);

• Both the work and the costs are being absorbed into the daily running of journals;

• In 4 cases there was minimal impact on work duties; in 5 others there was a significant but often unquantified impact (two of these might be considered data publications, with a focus on publishing data papers or datasets); and in 3 cases the information was not available or unknown;

• These differences can be explained in terms of the level of effort or importance applied: the greatest levels of effort are associated with copy editing, format migration, addition of metadata, etc., whilst the least effort is required for simply hosting the material and/or where the workflow is highly automated.

Some Findings: costs

• These were in most cases unknown or only partially known;

• Costs mentioned but usually not quantified include digital storage costs, salary costs of journal staff, and long-term preservation costs;

• Detailed cost information was only really available from Internet Archaeology, via the Archaeology Data Service, which had participated in an activity-based costing study (KRDS2);

• Internet Archaeology archiving costs reflect those of a “dataset publisher”, so they are a comparator for only part of Dryad’s content: large datasets.

Some Findings: revenue

• Only author fees and journal subscription fees were mentioned as current revenue sources for supplementary materials in journals;

• 3 journals interviewed have author charges for supplementary materials (see next slide);

• The data archiving and sharing organisations interviewed relied primarily on (uncertain) research grants and temporary or recurrent core funding, but one had access to a small endowment and another has a charging policy for some depositors.

Some Findings: author charges

• Journal of Clinical Investigation: authors are charged $300 for supplemental data to appear online with accepted articles;

• Ecological Archives: submission of ‘appendices and supplements’ is free up to 10 MB. Above this, there is a fee of $250 for the first 1 GB and $50 for each subsequent GB. The fee for publication of a data paper is $250, covering publication of the abstract in the relevant journal plus publication of up to 10 MB in Ecological Archives. An additional $250 is charged for datasets between 10 MB and 1 GB, and for larger datasets there is an additional fee of $50 per GB;

• The Federation of American Societies for Experimental Biology (FASEB) charges $100 for each supplemental file.
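The Ecological Archives schedule is tiered by file size, which is easier to compare against the flat JCI charge and the per-file FASEB charge when written out as functions. The sketch below rests on assumptions the slide does not settle: sizes are in decimal megabytes (1 GB = 1000 MB), a partial gigabyte beyond the first is rounded up, and the data-paper “additional $50 per GB” applies to each GB beyond the first, on top of the $250 tier. The function names are hypothetical.

```python
import math

MB_PER_GB = 1000  # Assumption: decimal units; the slide does not say 1000 or 1024.


def extra_gb(size_mb: float) -> int:
    """Whole GB beyond the first, rounding a partial GB up (an assumption)."""
    return max(0, math.ceil((size_mb - MB_PER_GB) / MB_PER_GB))


def eco_archives_supplement_fee(size_mb: float) -> int:
    """Appendices/supplements: free up to 10 MB, $250 for the first GB,
    $50 for each subsequent GB."""
    if size_mb <= 10:
        return 0
    return 250 + 50 * extra_gb(size_mb)


def eco_archives_data_paper_fee(size_mb: float) -> int:
    """Data paper: $250 covers the abstract plus up to 10 MB; $250 more for
    10 MB-1 GB; larger datasets add $50 per GB beyond the first (assumed)."""
    fee = 250
    if size_mb > 10:
        fee += 250 + 50 * extra_gb(size_mb)
    return fee


def jci_fee() -> int:
    """JCI: flat $300 for supplemental data with an accepted article."""
    return 300


def faseb_fee(n_files: int) -> int:
    """FASEB: $100 per supplemental file."""
    return 100 * n_files
```

Under these assumptions a 2.5 GB supplement would cost $350 at Ecological Archives (`eco_archives_supplement_fee(2500)`), against a flat $300 at JCI or $100 per file at FASEB.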

Keeping Research Data Safe (KRDS1 & KRDS2):

JISC-funded studies of Research Data Preservation Costs

(separate Dryad costing project by Lori Eakin-Richards based on KRDS approach)

KRDS: what did we learn? Whole-of-service costing / seeing the “Big Picture”

Selection of 2009 Allocation of UKDA Activity Costs

Acquisition 5.8%

Ingest 21.5%

Archival Storage + Preservation Planning 3.1%

Access 16.9%
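The listed shares can be grouped to check the headline KRDS finding that getting data in and out costs far more than keeping it. The grouping below (acquisition, ingest and access as “in and out”; archival storage plus preservation planning as “keeping”) is an interpretive choice, and because the slide shows only a selection of activities, the listed shares do not sum to 100%. The variable and function names are illustrative.

```python
# Selected 2009 UKDA activity cost shares, in percent of total costs.
ukda_2009 = {
    "Acquisition": 5.8,
    "Ingest": 21.5,
    "Archival Storage + Preservation Planning": 3.1,
    "Access": 16.9,
}

# Interpretive grouping: "getting stuff in and out" versus "keeping it".
IN_AND_OUT = ("Acquisition", "Ingest", "Access")
KEEPING = ("Archival Storage + Preservation Planning",)


def share(costs: dict[str, float], activities: tuple[str, ...]) -> float:
    """Sum the percentage shares of the named activities."""
    return sum(costs[a] for a in activities)


in_out = share(ukda_2009, IN_AND_OUT)  # about 44.2
keeping = share(ukda_2009, KEEPING)    # 3.1
ratio = in_out / keeping               # about 14.3
shown = sum(ukda_2009.values())        # about 47.3; other activities not listed
```

With these figures the in-and-out activities account for about 44.2% of costs against 3.1% for storage and preservation planning, a ratio of roughly 14 to 1.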

KRDS: Implications

• Changing view of digital preservation costs:
– “Getting stuff in and out” costs much more than “keeping it” (bit preservation + migration);
– Staff costs c. 70% of total costs;
– Importance of economies of scale and automation;
– Findings of KRDS and the Dryad Repository’s own activity costing projections fed into Dryad sustainability planning.

Future Plans

• Dryad sustainability plan being put to Dryad member societies and publishers;

• Dryad extending the consortium to new members, achieving economies of scale;

• Bid to JISC to establish Dryad-UK;

• Extending KRDS research and implementations.

Further Information

Dryad: see www.datadryad.org

Keeping Research Data Safe 2 (KRDS2) webpage at www.beagrie.com/jisc.php

KRDS2 report available from JISC website http://www.jisc.ac.uk/publications/reports/2010/keepingresearchdatasafe2.aspx#downloads

Email: [email protected]