Primary Data Archiving and Citation in Biomedical Research: a DCIP Progress Report

Tim Clark, PhD

Harvard Medical School & Massachusetts General Hospital

BD2K All Hands Meeting, Bethesda MD

November 29-30, 2016




Background

• Reproducibility crisis: Science policy makers & funders concerned.

• Policy studies: Recommend primary data archiving and citation.

• BD2K⇒FAIR: “Facilitate broad use of biomedical digital assets by making them discoverable, accessible and citable”.

• Opportunity: Technologies & recommendations now in place.

• bioCADDIE requirements: Need all the primary data + metadata.


Three Main Reasons to Cite Data

⇒ Better Science

⇒ Re-use & discovery

⇒ Cure Diseases


DCIP Goals

• Facilitate data citation in biomedical research as standard practice w/ common information models.

• Coordinate efforts amongst publishers, repositories, identifier services, bioCADDIE & NIH.

• Support development of the NIH bioCADDIE data discovery index software and ecosystem.


What is DCIP Based On?

• CODATA, National Academies & NIH recommendations.

• Joint Declaration of Data Citation Principles (JDDCP).

• Starr et al. 2015 “Achieving Human and Machine Accessibility of Cited Data” https://doi.org/10.7717/peerj-cs.1.

• Existing & emerging standards e.g. JATS, schema.org, DATS.

• Community participation by publishers, repositories, identifier and metadata services, standards groups.


Approach

• Coordinate early adopter best practices.

• Help establish standard benchmark implementations.

• Report on lessons learned to the community.

• Focus on primary biomedical research data.

• Make cited data discoverable and reusable.


Major Expected Outputs

• Publishers: Publisher’s Roadmap.

• Repositories: Repositories Roadmap.

• Identifiers: Harmonized compact ID resolution.

• FAQs: Guidance for common implementations.


[Figure: Data Discovery Indices (bioCADDIE)]


Publishers Roadmap Development

Elsevier

SpringerNature

∙ Leads: Amye Kenall & Helena Cousijn

∙ Participants: Elsevier, SpringerNature, eLife, PLOS, Frontiers, Wiley, et al.

∙ Roadmap: Now in final draft form. To be submitted to Nature Scientific Data.

∙ Implementation: Elsevier has implemented the JDDCP in 1,800 journals based on the Roadmap. SpringerNature plans to follow suit shortly in all their journals. Stay tuned ...


Repository Roadmap Development

Christian Haselgrove

Ian Fore

Philippe Rocca-Serra

Andy Jenkinson


Leads: Martin Fenner (DataCite), Merce Crosas (Dataverse)


Landing Page Metadata

Data Citation Metadata Element | Dublin Core   | Schema.org               | DataCite            | DATS
-------------------------------+---------------+--------------------------+---------------------+-----------
Dataset Identifier             | identifier    | @id / resource / itemid* | identifier          | identifier
Title                          | title         | name                     | title               | title
Creator                        | creator       | author                   | creator             | creator
Data repository or archive     | publisher     | publisher                | publisher           | publisher
Publication Date               | date          | datePublished            | publicationYear     | date
Version                        | <not defined> | version                  | version             | version
Type                           | type          | type                     | resourceTypeGeneral | type

* The name of the ID field depends on the schema.org serialization format: @id in JSON-LD, resource in RDFa, and itemid in microdata. JSON-LD is the preferred serialization for schema.org elements.
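The crosswalk above can be applied mechanically. The Python sketch below is a minimal illustration, not a DCIP reference implementation: it turns a generic citation record into schema.org `Dataset` markup in JSON-LD, the serialization the footnote identifies as preferred. The input keys (`identifier`, `title`, `creator`, and so on) and the example values are hypothetical; the Type row is handled by fixing `@type` to `Dataset`.

```python
import json

# Generic citation element -> schema.org property, following the crosswalk table.
# The identifier becomes "@id", the JSON-LD form of the schema.org ID field.
# The input keys on the left are hypothetical names chosen for this sketch.
SCHEMA_ORG_FIELDS = {
    "identifier": "@id",
    "title": "name",
    "creator": "author",
    "publisher": "publisher",
    "publication_date": "datePublished",
    "version": "version",
}


def to_schema_org_jsonld(record: dict) -> dict:
    """Map a generic data citation record to a schema.org Dataset in JSON-LD."""
    # The Type row: the resource type is expressed as @type.
    doc = {"@context": "https://schema.org", "@type": "Dataset"}
    for element, prop in SCHEMA_ORG_FIELDS.items():
        if element in record:
            doc[prop] = record[element]
    # schema.org expects people and organizations as typed objects, not bare strings.
    # "creator" is assumed to be a list of names.
    if "author" in doc:
        doc["author"] = [{"@type": "Person", "name": n} for n in doc["author"]]
    if "publisher" in doc:
        doc["publisher"] = {"@type": "Organization", "name": doc["publisher"]}
    return doc


# Usage with hypothetical example values.
record = {
    "identifier": "https://doi.org/10.5072/example",
    "title": "Example dataset",
    "creator": ["Jane Doe"],
    "publisher": "Example Repository",
    "publication_date": "2016-11-29",
    "version": "1.0",
}
print(json.dumps(to_schema_org_jsonld(record), indent=2))
```

Keeping the mapping in a single dictionary makes it easy to add parallel tables for the other vocabularies in the crosswalk (Dublin Core, DataCite, DATS) without changing the conversion logic.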


Landing Page Data Citation Metadata Should Be Human and Machine Readable
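One way to serve both audiences from a single record is to render the same metadata twice on the landing page: once as a visible citation and once as JSON-LD embedded in a `<script type="application/ld+json">` element. This is a sketch under stated assumptions: the record keys are hypothetical, the citation order is only loosely modeled on DataCite's recommended format (Creator (PublicationYear): Title. Version. Publisher. Identifier), the `citation` CSS class is an arbitrary choice, and the JSON-LD carries only a subset of the elements in the crosswalk table.

```python
import html
import json


def human_citation(rec: dict) -> str:
    """Format a citation string, loosely following DataCite's recommended order."""
    creators = "; ".join(rec["creators"])
    return (f"{creators} ({rec['year']}): {rec['title']}. "
            f"Version {rec['version']}. {rec['publisher']}. {rec['identifier']}")


def landing_page_snippet(rec: dict) -> str:
    """Return an HTML fragment with a visible citation plus embedded JSON-LD."""
    jsonld = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": rec["identifier"],
        "name": rec["title"],
        "version": rec["version"],
    }
    # Escape "</" as "<\/" (a valid JSON escape) so the payload cannot
    # terminate the <script> element early.
    payload = json.dumps(jsonld, indent=2).replace("</", "<\\/")
    return (
        f'<p class="citation">{html.escape(human_citation(rec))}</p>\n'
        f'<script type="application/ld+json">\n{payload}\n</script>'
    )
```

For a hypothetical record with creators `["Doe, J.", "Roe, R."]`, year 2016, and identifier `https://doi.org/10.5072/example`, the visible line reads `Doe, J.; Roe, R. (2016): Example dataset. Version 1.0. Example Repository. https://doi.org/10.5072/example`, while crawlers and indexers can read the same facts from the script element.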


Repository Metadata Status

• Required and supplemental metadata defined with alternative vocabularies and serializations specified.

• Backward and forward compatibility modes defined.

• Integration w/ reference managers (EndNote, Zotero) and the Citation Style Language (CSL).

• Roadmap in near-final draft.

• Moving forward: outreach to repositories.
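For the reference-manager integration, one possible route is CSL-JSON, the item format consumed by Citation Style Language processors, which includes a `dataset` item type. The sketch below is an assumption about how a repository might export such an item, not something specified in the roadmap; the record keys are hypothetical, and creators are supplied as (family, given) pairs because CSL splits names that way.

```python
import json


def to_csl_json(rec: dict, item_id: str) -> dict:
    """Build a CSL-JSON item of type "dataset" from a citation record."""
    item = {
        "id": item_id,
        "type": "dataset",
        "title": rec["title"],
        # CSL represents personal names as separate family/given parts.
        "author": [{"family": fam, "given": giv} for fam, giv in rec["creators"]],
        "publisher": rec["publisher"],
        # CSL dates are nested lists: [[year, month, day]], with trailing parts optional.
        "issued": {"date-parts": [[rec["year"]]]},
        "version": rec["version"],
    }
    doi = rec.get("doi")
    if doi:
        item["DOI"] = doi
        item["URL"] = f"https://doi.org/{doi}"
    return item


# Usage with hypothetical values; a CSL-JSON file is an array of items.
rec = {
    "title": "Example dataset",
    "creators": [("Doe", "Jane")],
    "publisher": "Example Repository",
    "year": 2016,
    "version": "1.0",
    "doi": "10.5072/example",
}
print(json.dumps([to_csl_json(rec, "example-2016")], indent=2))
```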


Compact Identifier Resolution

DCIP Identifiers Workshop, June 2, 2016, Harvard University, Cambridge MA

John Kunze (CDL), Niall Beard (Manchester), Tim Clark (Harvard), Nick Juty (EBI), Ian Fore (NIH), Julie McMurry (UCSB), Jeff Grethe (UCSD), Rafa Jimenez (ELIXIR), Sarala Wimalaratne (EBI)


Compact Identifier Resolution

• International collaboration of EBI, CDL, Prefix Commons & bioCADDIE throughout the past year.

• Technical approach for common prefix registry has been agreed and specification document is in near-final draft.

• Implementation is near-complete at both EBI and CDL.

• Extensive ongoing discussions with DataCite.
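The slides do not spell out the agreed prefix-registry specification, so the following is only a sketch of the general idea behind compact identifier resolution: split a `prefix:accession` string, check the prefix and the accession pattern against a shared registry, and build resolver URLs. The two registry entries, their regex patterns, the case-insensitive treatment of prefixes, and the resolver base URLs (identifiers.org at EBI and n2t.net at CDL) are illustrative assumptions, not values taken from the specification.

```python
import re
from dataclasses import dataclass

# Hypothetical excerpt of a shared prefix registry: prefix -> accession pattern.
# These entries are illustrative only, not copied from the EBI/CDL registries.
REGISTRY = {
    "taxonomy": re.compile(r"\d+"),
    "pdb": re.compile(r"[0-9][A-Za-z0-9]{3}"),
}

# Assumed resolver base URLs; each takes "prefix:accession" appended to the path.
RESOLVERS = ("https://identifiers.org/", "https://n2t.net/")


@dataclass(frozen=True)
class CompactId:
    prefix: str
    accession: str


def parse_compact_id(curie: str) -> CompactId:
    """Split 'prefix:accession' at the first colon and validate against the registry."""
    prefix, sep, accession = curie.strip().partition(":")
    prefix = prefix.lower()  # this sketch treats prefixes as case-insensitive
    if not sep or not accession:
        raise ValueError(f"not a compact identifier: {curie!r}")
    pattern = REGISTRY.get(prefix)
    if pattern is None:
        raise ValueError(f"unknown prefix: {prefix!r}")
    if not pattern.fullmatch(accession):
        raise ValueError(f"accession {accession!r} does not match pattern for {prefix!r}")
    return CompactId(prefix, accession)


def resolver_urls(curie: str) -> list[str]:
    """Return one candidate resolution URL per resolver for a validated compact ID."""
    cid = parse_compact_id(curie)
    return [f"{base}{cid.prefix}:{cid.accession}" for base in RESOLVERS]
```

Splitting at the first colon keeps accessions that embed their own namespace intact (for example, `go:GO:0006915` would yield the accession `GO:0006915`), although `go` is not in this toy registry. Validating against a common registry before redirecting is what lets two independent resolvers agree on the same prefix meanings.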


FAQ / Primer Group

UCSD

California Digital Library

• Communicates DCIP outcomes.

• Major Deliverables:

• FAQs for Repositories & Publishers ✔

• Data Citation Primer ✔

• Website in design phase; it will aggregate all specifications, roadmaps, and training materials, and provide ongoing implementation status.


Participants

And you!


Conclusions

• Publisher & repository Roadmap documents, Primer and FAQs have been created & are in final editing.

• Leading publishers are taking data citation very seriously.

• Elsevier will shortly announce it has implemented JDDCP for 1,800 journals. SpringerNature plans to follow suit soon.

• Compact identifier resolution implemented at CDL & EBI.

• Expanding repository outreach is a key issue for 2017.