How can we ensure that data is reusable? The role of Publishers in Research Data Management

LEARN 2nd Workshop, Vienna

Catriona MacCallum, Senior Advocacy Manager, PLOS; Consulting Editor, PLOS ONE; Member of the Boards, OASPA and OpenAIRE

April 2016



PLOS – a publisher since 2003


PLOS ONE

• Multi-disciplinary

• Online only

• Open access (CC BY)

• Large, independent editorial board (>6000)

• Manuscripts assessed only on the rigour of the science, not the novelty/scope of the topic

• Enables publication of negative/inconclusive results (& data)


DATA


Data Availability

The odds of finding the data associated with a paper declined by 17% every year

Vines, Timothy et al. “The Availability of Research Data Declines Rapidly with Article Age.” Current Biology 24, no. 1 (January 6, 2014): 94–97. doi:10.1016/j.cub.2013.11.014.
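
To get a feel for what a 17% yearly decline means over the life of a paper, the decline can be compounded over article age. The cited paper reports the decline in terms of odds, so the sketch below converts a starting probability to odds, shrinks the odds by 17% per year, and converts back. The function name `availability_after` and the 50% starting point are assumptions made purely for illustration; treating the decline as a constant yearly multiplier is a simplification, not a reproduction of the paper's model.

```python
def availability_after(p0: float, years: int, yearly_odds_decline: float = 0.17) -> float:
    """Probability that data can still be found after `years`,
    assuming the odds shrink by a fixed fraction every year."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must be strictly between 0 and 1")
    odds = p0 / (1.0 - p0)                        # probability -> odds
    odds *= (1.0 - yearly_odds_decline) ** years  # compound the yearly decline
    return odds / (1.0 + odds)                    # odds -> probability


if __name__ == "__main__":
    # 0.5 is an illustrative starting probability, not a figure from the study.
    for age in (0, 5, 10, 20):
        print(age, round(availability_after(0.5, age), 3))
```

Working in odds rather than probabilities keeps the result between 0 and 1 however old the article is.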


What are publishers doing…?

Summary data in the paper

• In tables and figures within the final published article

• Data used to compile figures and tables generally not provided

• In supplementary material

Archiving

• Held by author, journal

• In institutional or other data repositories, at author’s discretion

A plethora of policies and non-policies

• Generally journal specific

• Sometimes publisher specific

• Generally not enforced


Major Publisher Policies

Wiley, Springer-Nature, Taylor & Francis

• Partnerships with Figshare, Dryad

• Sharing depends on journal policy

• Where required, enforcement by journal editors

Elsevier

• Encourages data sharing but no explicit partnerships

• Reuse depends on licence of article

• Testing which licences should be applied

• Open Data Pilot

• Hosting research data on ScienceDirect (CC BY)

Society Publishers?

• Journal/Discipline specific


Data Journals


Licences

Data in papers subject to the same copyright and licence as the paper

• Subscription journals restrict access

• Some journals have a different licence for supplementary information (Nature?)

Open Access licences vary

• Many hybrid articles restrict commercial re-use

• Most OA publishers apply CC BY

Bespoke licences

• STM Association licences

• Repository specific licences

Mostly incompatible with Text & Data Mining


PLOS data policy

PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.

When submitting a manuscript online, authors must provide a Data Availability Statement describing compliance with PLOS's policy. If the article is accepted for publication, the data availability statement will be published as part of the final article.

Since March 3, 2014


Data Availability Statement (DAS)

NB The DAS is openly available, and machine-readable as part of the PLOS search API
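
As a rough sketch of what machine-readable access could look like, the code below builds a query URL for the PLOS search API and pulls the statement out of a Solr-style JSON response. The `q`, `fl`, `rows` and `wt` parameters are standard Solr query parameters; the field name `data_availability` is an assumption that is not confirmed by the slide and should be checked against the API's published field list, and the DOI in the example is a placeholder. No network call is made here.

```python
import json
from urllib.parse import urlencode

PLOS_SEARCH_ENDPOINT = "https://api.plos.org/search"
DAS_FIELD = "data_availability"  # hypothetical field name; verify before use


def build_das_query(doi: str, das_field: str = DAS_FIELD) -> str:
    """Build a search URL asking for one article's id and its DAS field."""
    params = {
        "q": f'id:"{doi}"',        # look the article up by DOI
        "fl": f"id,{das_field}",   # fields to return
        "rows": 1,
        "wt": "json",
    }
    return f"{PLOS_SEARCH_ENDPOINT}?{urlencode(params)}"


def extract_das(response_text: str, das_field: str = DAS_FIELD) -> dict:
    """Map article id -> statement from a Solr-style JSON response body."""
    payload = json.loads(response_text)
    docs = payload.get("response", {}).get("docs", [])
    return {doc["id"]: doc.get(das_field) for doc in docs if "id" in doc}


if __name__ == "__main__":
    placeholder_doi = "10.1371/journal.pone.0000000"  # not a real article
    print(build_das_query(placeholder_doi))
    canned = json.dumps({"response": {"docs": [
        {"id": placeholder_doi, DAS_FIELD: "All relevant data are within the paper."}
    ]}})
    print(extract_das(canned))
```

Keeping URL construction separate from response parsing means the parsing step can be tested against canned responses without touching the live service.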


Guidance

General guidelines:

http://journals.plos.org/plosone/s/data-availability

PLOS External Data Advisory Group

PLOS Data Lead: Emma Ganley

Academic chair: Phil Bourne

Discipline specific guidelines:


Data Availability

PLOS Data Availability Policy

Define compliance

In 2015: ~95% of PLOS ONE papers have a Data Availability Statement

But what is true compliance?

Tim Vines, Richard Van Noorden


Anecdotes & Interpretation

50 fMRI studies in PLOS ONE¹

38 had shared the data

12 had not shared the data (completely anecdotal)

An increase in data sharing²:

- from 12% to 40%

- even up to as much as 76%

Not seeing full compliance but we are seeing a MASSIVE improvement

¹ Source: ‘Confusion over publisher’s pioneering open-data rules’ Nature 515, 478 (27 November 2014) doi:10.1038/515478a

² ‘Mandated data archiving greatly improves access to research data’ T. H. Vines et al. FASEB J 27, 1304-1308; Jan 2013


Where are the Data (PLOS)

Time        Papers with DAS  Data in Submission Files (#)  Data in Submission Files (%)  Data in Repositories (Estimate)  Data upon Request (Estimate)
Q2-Q4 2014  9491             7918                          74%                           11%                              10%
Q2-Q4 2015  22142            15382                         69%                           14%                              12%

Time            Dryad         Figshare      NCBI         GitHub
Q2-Q4 2014      152           210           551          37
Q2-Q4 2015      551           753           1229         174
Percent change  50% increase  54% increase  8% decrease  <1% consistent

DAS = Data availability statement


Data checks on PLOS ONE

Contractor (‘Editorial Office’) does initial check

• Flags any instance of not being able to share the data publicly

• Sends author a detailed request (template) with the decision letter.

• Escalates further concerns to PLOS Staff

14 Publication Assistants work on escalated data-related issues

• During peer-review and final checks.

• Currently amounts to two full time staff

• Problem papers escalated to internal editorial staff

• 1-2 a month (but time intensive)

• Post-publication concerns raised by readers

• So far 36 corrections

• 4 republications for identifying patient information


Major issues (PLOS)

• Most papers say data are within the paper and SI files but...

• Patient-identifying information: we check all clinical SI files on acceptance, but still too many papers reveal information

• Cohort/Consortia/Multi-institutional and Multi-national studies:

• Many have steering committees that do not permit public deposition of the data

• Restrictions are not always for ethical or legal reasons and some of these groups also require authorship for access to data.


PLOS Data Policy

OTHER ISSUES ENCOUNTERED ALONG THE WAY

• Concern re early sharing & scooping.

• How much data checking should editors/reviewers do?

• Which data are actually required?

• Lack of or inconsistent community standards

• Which repositories?

• Un-extractable data, proprietary file-types.

• Tension between patient privacy issues and data sharing

• Fibbing authors

• Field specific differences about what’s acceptable


Challenges

QUESTIONS WE DON’T KNOW ANSWERS TO YET

• Treatment of software/code

• How should materials sharing differ?

• What to do with big data?

• Do we need better/more aligned consenting for patient studies?

• Best practices for data access committees?

• How to fund data access committees?

• Preservation of obsolete formats?

• How to cite data & credit data reuse?

Michael Carroll. PLOS Biology 2015. Sharing Research Data and Intellectual Property Law: A Primer http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002235


CREDIT


On January 7, 2016, a coalition of publishers signed an Open Letter committing to start requiring ORCID iDs in 2016.

1. Implementing best practices for ORCID collection and auto-update of ORCID records upon publication

2. Requiring ORCID iDs for corresponding authors and encouraging them for co-authors
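
Collecting ORCID iDs at submission usually starts with a basic format check. ORCID documents the last character of an iD as an ISO 7064 MOD 11-2 check digit, and the sketch below implements that check. The function names are assumptions made for illustration, and the sample iD is the one ORCID uses for its fictitious researcher Josiah Carberry. Passing the check only shows that an iD is well formed, not that it is registered or belongs to the author.

```python
import re

# Four groups of four characters; only the last character may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """True if the iD has the hyphenated 16-character form and a valid check digit."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_well_formed_orcid("0000-0002-1825-0097"))  # ORCID's sample iD
    print(is_well_formed_orcid("0000-0002-1825-0098"))  # last digit altered
```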


CRediT – Contributor Roles Taxonomy

A simple taxonomy of research contributions developed under the auspices of CASRAI and NISO.

- Includes, but is not limited to, traditional authorship roles

- Makes contributions machine-readable and portable

- Meant to inspire development: Mozilla badges, VIVO-ISF ontology, JATS integration, ORCID integration


The CRediT taxonomy is by design simple, which may become limiting, but it provides an important framework for authorship discussions.

Ideal solution:

* includes a free text field for each contribution

* can be used upstream from submission, during research
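
One way to picture a machine-readable contribution with a free-text field is a small record type that validates the role against the 14 CRediT role names and serialises to JSON. This is a minimal sketch, not a format defined by CASRAI, NISO or PLOS: the class name `Contribution`, its field names and the JSON layout are assumptions made for illustration, and the contributor shown is invented.

```python
import json
from dataclasses import asdict, dataclass

# The 14 role names of the CRediT taxonomy.
CREDIT_ROLES = frozenset({
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
})


@dataclass(frozen=True)
class Contribution:
    contributor: str
    orcid: str
    role: str
    note: str = ""  # free-text detail about the contribution

    def __post_init__(self) -> None:
        if self.role not in CREDIT_ROLES:
            raise ValueError(f"not a CRediT role: {self.role!r}")


def to_json(contributions) -> str:
    """Serialise contributions to a JSON array (hypothetical layout)."""
    return json.dumps([asdict(c) for c in contributions],
                      indent=2, ensure_ascii=False)


if __name__ == "__main__":
    record = Contribution("Jane Example", "0000-0002-1825-0097",
                          "Data curation", "Cleaned and deposited the survey data")
    print(to_json([record]))
```

Because the record does not depend on a manuscript, the same structure could be filled in during the research itself and carried forward to submission.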


Persistent identifiers and metadata

• Data citation not standard practice

• Inability to link data to papers

• No separate identifiers for figures, tables, supplementary material etc.

• Low adoption of persistent identifiers for Researchers

• Persistent identifiers for Funders & Institutions in flux


The ecosystem of persistent identifiers is growing

Contributions in a machine-readable format can enrich this ecosystem

[Figure: diagram of linked persistent identifiers; labels: DOI, accession #]


OPEN CITATIONS will create services for authors, e.g. linking Europe PMC’s Open Citations to an ORCID iD


THOR

• EU-funded project (~€2 million)

• A consortium of partners

• British Library, ORCID, DataCite, CERN, EMBL-EBI, Pangaea, Australian National Data Service, Dryad, PLOS, Elsevier

• Project duration is 30 months (~June 2015-Dec 2017)

http://project-thor.eu/


Data Citation (1): credit for data producers and collectors

• Should comply with Force11 Data Citation Principles

• Minimum Requirements

• author names, repository name, date + persistent unique identifier (such as DOI or URI)

• citation should link to the dataset directly via the persistent identifier

• comprehensive, machine-readable landing pages for deposited data

• guidance to authors to include data in references

https://www.force11.org/group/joint-declaration-data-citation-principles-final
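
The minimum requirements listed above can be sketched as a small record that refuses incomplete citations and renders a reference whose link resolves through the persistent identifier. The class name `DataCitation`, the output layout and the example DOI `10.1234/example-dataset` are assumptions made for illustration; no particular journal's reference style is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    authors: tuple    # author names, in order
    year: int         # date of deposit or release
    repository: str   # repository name
    identifier: str   # persistent identifier: a bare DOI or a full URI

    def __post_init__(self) -> None:
        # Enforce the minimum elements; nothing may be left blank.
        if not self.authors or not self.repository.strip() or not self.identifier.strip():
            raise ValueError("authors, repository and identifier are all required")

    def resolver_url(self) -> str:
        """Link that leads directly to the dataset via its identifier."""
        if self.identifier.lower().startswith(("http://", "https://")):
            return self.identifier
        return "https://doi.org/" + self.identifier  # treat anything else as a DOI

    def format(self) -> str:
        names = ", ".join(self.authors)
        return f"{names} ({self.year}) {self.repository}. {self.resolver_url()}"


if __name__ == "__main__":
    citation = DataCitation(("Doe J", "Roe R"), 2015,
                            "Example Repository", "10.1234/example-dataset")
    print(citation.format())
```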


Data Citation (2): challenges

• Only ‘approved’ repositories?

• In main reference list or separate?

• Distinguish between data produced in the study versus reuse of data produced elsewhere?

• Datasets that are continuously updated

• Data citations not tagged by publishers (JATS)

• Data citation metrics

• Software citation


BEYOND THE ARTICLE


Protocols.io

• Database of experimental protocols

• Open access and free for users

• Desktop and mobile applications

• Functionality to

• Create

• Fork – create derivatives (keeps provenance)

• Run

• Annotate while running

• Keep date-stamped version of actual run

• Export to PDF, etc

10k registrants

1,000 private protocols


Peer review reports are data too: innovations


DATA INTEGRITY (publishing)


Retractions

• Retraction of a research article is a complete and permanent removal from the scientific record

• Although an article remains accessible to readers, it should no longer be cited.

• Reasons for retraction:

• Conclusions cannot be relied upon and are no longer supported (invalid results)

• Serious breach of research or publication ethics


Why are papers retracted?

Van Noorden, Nature 478, 26-28 (2011)


September 2009

Elizabeth Wager, Virginia Barbour, Steven Yentis, Sabine Kleinert on behalf of COPE Council

“The main purpose of retractions is to correct the literature and ensure its integrity rather than to punish authors who misbehave.”


Process and best practices

• Retractions have their own DOI.

• Permanent bi-directional linking & clear marking of the paper – syndicate to indexers.

• For corrections, clear indication if the paper was republished

CrossRef industry-wide initiative to provide a standard way for readers to locate the most up-to-date version of an article.

PLOS has adopted it

What about data retractions?


Editorial and peer review evaluation

Editorial office:

• Trial registration

• Data deposition

• Reporting guidelines

• Ethical approval

• Competing interests

• Financial disclosures

• Permissions

• Plagiarism

• Image integrity

Peer reviewers:

• Methodology and experimental design

• Analysis

• Statistics

• Conclusions

• Ethics

Limitations:

- Science has become more cross-disciplinary

- Confidential peer review can show biases

- Paper should live on after publication

- Still not enough incentives to publish all results


False expectations

Peer review is expected to police the literature but:

• Science has become more cross-disciplinary and more complicated (mammoth datasets)

• Are 2 or 3 reviewers + 1 editor sufficient?

• Anonymity conceals/engenders negativity and bias

• No incentive/reward for constructive collaboration

• Reviewers review for journals and editors – not for readers, colleagues or society

• Peer review is a black box – impossible to assess its effectiveness


Is science reliable?

• Poorly designed studies

• small sample sizes, lack of randomisation, blinding and controls

• Data not available to scrutinise/replicate

• ‘p-hacking’ (selective reporting) widespread¹

• Poorly reported methods & results²

• Negative results are not published

¹ Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD (2015) The Extent and Consequences of P-Hacking in Science. PLoS Biol 13(3): e1002106. doi:10.1371/journal.pbio.1002106

² Landis SC, et al. (2012) A call for transparent reporting to optimize the predictive value of preclinical research. Nature 490(7419): 187–191.


Does prestige ensure ‘quality’?

• Higher ranked journals have more papers retracted¹

• Papers in higher ranked journals are more likely to report either no or inappropriate statistics²,³

• Papers from highly ranked institutions have poorer reporting standards³

¹ Fang, Ferric C., and Arturo Casadevall. “Retracted Science and the Retraction Index.” Infection and Immunity 79, no. 10 (October 1, 2011): 3855–59. doi:10.1128/IAI.05661-11.

² Tressoldi PE, Giofre D, Sella F, Cumming G. High impact = high statistical standards? Not necessarily so. PLoS One 2013; 8(2):e56180. doi:10.1371/journal.pone.0056180 PMID: 23418533

³ Macleod MR, et al. (2015) Risk of Bias in Reports of In Vivo Research: A Focus for Improvement. PLoS Biol 13(10): e1002273. doi:10.1371/journal.pbio.1002273


“Current incentive structures in science are likely to lead rational scientists to adopt an approach to maximise their career advancement that is to the detriment of the advancement of scientific knowledge.”

Andrew Higginson and Marcus Munafò, in prep (cited with their permission)


• Researchers gain from publishing in ‘designer’ journals

• Journals gain financially from their brand/Journal Impact Factor

• Institutions gain financially by hiring and firing based on where researchers publish, not on what they publish (or the mission of the University)

• Research assessment by funders often based on very few publications and brand/impact factor (some are changing)


Declaration on Research Assessment

• A worldwide initiative, spearheaded by the ASCB (American Society for Cell Biology), together with scholarly journals and funders

• Focuses on the need to improve the way in which the outputs of scientific research are evaluated:

• the need to eliminate the use of journal-based metrics, such as Journal Impact Factors, in funding, appointment, and promotion considerations;

• “need to assess research on its own merits rather than on the basis of the journal in which the research is published”


Raise awareness & promote research


Joint initiatives


POLICY HARMONISATION


By the time a paper is submitted to a journal it’s generally too late


• Lack of incentives for authors to share data and software, or to transparently report details of data, methods etc. within articles

• Different data sharing expectations among co-authors in receipt of grants from different funders or locations or disciplines.

• The absence of a culture/lack of education within institutions to put data management and archiving at the centre of good lab practice.

• Lack of any coherent infrastructure (e.g. repositories, metadata standards).

• Licensing chaos (implications for Text and Data Mining).

• No clear definition of what the data underlying the paper means

• Lack of enforcement by different stakeholders in the chain (funders, institutions, publishers)

• No means of reporting compliance.


• Align policies between funders, publishers, institutions

• Reduce the burden on researchers

• Incentivise all players (sticks and carrots)

• Monitor progress towards common goals

• Create global community standards for open science

• Define ‘Open Science’

• COPE, TOP guidelines, Leiden Manifesto, HEFCE report on metrics

• Build the infrastructure to support open science

• Interoperable publicly available platforms

• New submission and reviewing tools that foster openness, transparency and collaboration

• The means to track, link & assign credit to all types of outputs

• Persistent identifiers for researchers, funders, institutions, licences etc. (ORCID, FundRef, DataCite DOIs for data, etc.)

• Apply the scientific method to scholarly communication itself

• ‘Evidence-based’ policy

• Publicly available data on metrics, indicators, evaluation

• Independent scrutiny


Open Access was just the start