ELPUB 2008: A review of journal policies for sharing research data

Abstract: Sharing data is a tenet of science, yet commonplace in only a few subdisciplines. Recognizing that a data sharing culture is unlikely to be achieved without policy guidance, some funders and journals have begun to request and require that investigators share their primary datasets with other researchers. The purpose of this study is to understand the current state of data sharing policies within journals, the features of journals that are associated with the strength of their data sharing policies, and whether the strength of data sharing policies impacts the observed prevalence of data sharing.

Methods: We investigated these relationships with respect to gene expression microarray data in the journals that most often publish studies about this type of data. We measured data sharing prevalence as the proportion of papers with submission links from NCBI’s Gene Expression Omnibus (GEO) database. We conducted univariate and linear multivariate regressions to understand the relationship between the strength of data sharing policy and journal impact factor, journal subdiscipline, journal publisher (academic societies vs. commercial), and publishing model (open vs. closed access).

Results: Of the 70 journal policies, 53 made some mention of sharing publication-related data within their Instruction to Author statements. Of the 40 policies with a data sharing policy applicable to gene expression microarrays, we classified 17 as weak and 23 as strong (strong policies required an accession number from database submission prior to publication). Existence of a data sharing policy was associated with the type of journal publisher: 46% of commercial journals had a data sharing policy, compared to 82% of journals published by an academic society. All five of the open-access journals had a data sharing policy. Policy strength was associated with impact factor: the journals with no data sharing policy, a weak policy, and a strong policy had respective median impact factors of 3.6, 4.9, and 6.2. Policy strength was positively associated with measured data sharing submission into the GEO database: the journals with no data sharing policy, a weak policy, and a strong policy had median data sharing prevalence of 8%, 20%, and 25%, respectively.

Conclusion: This review and analysis begin to quantify the relationship between journal policies and data sharing outcomes. We hope it contributes to assessing the incentives and initiatives designed to facilitate widespread, responsible, effective data sharing.

A review of journal policies for sharing research data

Heather Piwowar, Wendy Chapman

Department of Biomedical Informatics, University of Pittsburgh

ELPUB 2008

http://www.flickr.com/photos/cogdog/123072/

“An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …” http://www.nature.com/authors/editorial_policies/availability.html

http://www.nature.com/nature/journal/v453/n7197/index.html

Benefits for journal
– allows publications to be useful (and cited) in additional ways
– demonstrates commitment to quality research
– discourages fraud

Drawbacks for journal
– might decrease submissions
– administrative burden

Prior work in this area

•  McCain: 16% of 850 science+engineering journals have a policy about sharing research-related information (RRI)

•  NAS: 53% of 38 life sciences journals

But these reviews are dated, consider a variety of resources, and don’t correlate policy to behaviour

McCain. Science Communication, Vol. 16, No. 4 (1 June 1995), pp. 403-431.

NAS. Sharing Publication-Related Data and Materials (2003), p. 33.

•  In this study, we look at the data-sharing policies within Instruction to Author statements of 70 journals for a specific data type

•  We look at themes within the statements

•  We correlate the strength of the policy statements to the frequency with which the authors actually share their data

Data type: gene expression microarrays

http://en.wikipedia.org/wiki/Image:Heatmap.png

Three types of results

1.  Themes within data sharing policies

2.  Relative policy strength

3.  Observed data sharing behaviour

Themes within data sharing policies

•  statements of policy motivation

•  datatype-specific policies

•  requested vs. required

•  data location

•  data format

•  data completeness

•  timeliness of sharing

•  consequences for not sharing

•  exceptions

Relative policy strength

•  No applicable policy (43%)

•  Weak policy (24%)
– should, recommend, request
– must, but without database accession number

•  Strong policy (33%)
– must, required, condition of publication
– requires database accession number
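As a rough illustration of how the three categories and their percentages fit together, here is a minimal Python sketch. The helper names `classify_policy` and `percentage_shares` are hypothetical and are not the authors' tooling; the two boolean flags are assumed to come from a manual reading of each journal's statement. The counts (70 journals, 40 applicable policies, 17 weak, 23 strong) are taken from the abstract, and the 30 journals without an applicable policy are derived as 70 minus 40.

```python
from collections import Counter


def classify_policy(has_applicable_policy, requires_accession_number):
    """Map two hand-coded flags for one journal to a strength label.

    Hypothetical helper: the flags are assumed to come from a manual
    reading of the journal's Instruction to Author statement.
    """
    if not has_applicable_policy:
        return "none"
    # "Must" wording without an accession number requirement still counts as weak.
    return "strong" if requires_accession_number else "weak"


def percentage_shares(labels):
    """Return each label's share of all journals, rounded to a whole percent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts from the abstract: 70 journals, 40 applicable policies (17 weak, 23 strong).
labels = (
    [classify_policy(False, False)] * 30
    + [classify_policy(True, False)] * 17
    + [classify_policy(True, True)] * 23
)
shares = percentage_shares(labels)
print(shares)
```

Under these assumptions the rounded shares come out to 43%, 24%, and 33%, matching the figures on the slide.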

High-impact journals tend to have a strong data-sharing policy

What journal characteristics are associated with having a data-sharing policy?

Outcome: Journal has a data sharing policy?

Characteristics considered:

•  Impact Factor

•  Open Access?

•  Society Publisher?

•  Subdisciplines: Biochemistry & Molecular Biology, Oncology

Observed Sharing Behaviour

For each of the 70 journals, we measured % of papers with links to database submission entries

% of submission links is our proxy for % of publications with shared data
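The proxy described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code: the function names, the journal names, and the toy data are all hypothetical, and each paper is assumed to be reduced to one boolean (whether it has a GEO submission link).

```python
from statistics import median


def sharing_prevalence(has_link_flags):
    """Proportion of a journal's papers that have a database submission link."""
    flags = list(has_link_flags)
    if not flags:
        return None  # no papers, so prevalence is undefined
    return sum(flags) / len(flags)


def median_prevalence_by_policy(journals):
    """Group per-journal prevalence by policy strength and take the median."""
    by_policy = {}
    for info in journals.values():
        prevalence = sharing_prevalence(info["has_link"])
        if prevalence is not None:
            by_policy.setdefault(info["policy"], []).append(prevalence)
    return {policy: median(values) for policy, values in by_policy.items()}


# Hypothetical toy data, NOT the study's data: one flag per paper.
toy_journals = {
    "Journal A": {"policy": "strong", "has_link": [True, False, False, False]},
    "Journal B": {"policy": "strong", "has_link": [True, True, False, False]},
    "Journal C": {"policy": "none", "has_link": [False, False, False, False, True]},
}

result = median_prevalence_by_policy(toy_journals)
print(result)
```

Taking the median per policy group mirrors how the abstract reports prevalence (a median for journals with no policy, a weak policy, and a strong policy), but the numbers printed here reflect only the toy data.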

Articles published in journals with a strong data-sharing policy are more likely to have publicly available datasets

What journal characteristics are associated with data sharing behaviour?

Outcome: % of articles with shared data

Characteristics considered:

•  Impact Factor

•  Open Access?

•  Society Publisher?

•  Subdisciplines: Genetics & Heredity, Multidisciplinary Sciences

•  Having a data-sharing policy?

Limitation

•  Association does not imply causation

Take-home message

•  Many, but not all, journals require sharing of microarray data. Policies are very diverse.

•  Stronger data-sharing policies:
– high-impact journals
– open-access journals
– journals published by an academic society

•  Policy strength correlates with behaviour

•  Policies would benefit from improved clarity, scope, and accountability

Future work

•  Who shares data?

•  Who reuses data?

Hopefully the answers will inform our decisions about where to focus our energy to improve policies, tools, and incentives

Thank you

Advisor: Dr. Wendy Chapman

Funding: NLM for training grant, and Pitt DBMI department for travel grant

My shared data: www.dbmi.pitt.edu/piwowar

Share your research data too!

“Does anyone want your data? That’s hard to predict […] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay. Your data, too, may simply be awaiting an effective matchmaker.”

Got data? Nature Neuroscience 10, 931 (2007)
