Data Sharing & the NIH Data Catalog

Page 1

Contributing to the Big Data to Knowledge Initiative at the NIH

Data Sharing and the NIH Data Catalog

Page 2

Big Data to Knowledge (BD2K)

Page 3

Big Data 2 Knowledge

Data Catalog Frameworks and Standards

Policies and Data Sharing

Page 4

Data Sharing Repositories

All NIH-funded data sharing repositories that accept data submissions from any researcher worldwide, whether or not the researcher is funded by the NIH

Page 5

Data Sharing Policies

All NIH data sharing policies that help researchers develop a plan for sharing their research data

Page 6

Big Data 2 Knowledge

Data Catalog Frameworks and Standards

Policies and Data Sharing

Page 7

NIH Data Catalog

Bringing Data Into the Research Ecosystem

Page 8

Each dataset will be identified by a Data Unique Identifier (DUID), both in the NIH Data Catalog and in the associated journal

Datasets are described in the catalog using MeSH (including the creation of a dataset Publication Type)

Datasets are discoverable
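The slides do not specify the DUID format; the only example is DUID00001 in the sample citation on page 21. The sketch below assumes, for illustration only, a literal "DUID" prefix followed by a five-digit, zero-padded sequence number. The helper names `make_duid` and `parse_duid` are hypothetical.

```python
import re

# Hypothetical format inferred from the single example "DUID00001":
# the literal prefix "DUID" followed by a five-digit, zero-padded sequence number.
DUID_PATTERN = re.compile(r"DUID(\d{5})")


def make_duid(sequence: int) -> str:
    """Build an identifier such as DUID00001 from a positive sequence number."""
    if not 1 <= sequence <= 99999:
        raise ValueError("sequence must fit in five digits")
    return f"DUID{sequence:05d}"


def parse_duid(duid: str) -> int:
    """Return the sequence number encoded in a DUID, or raise ValueError."""
    match = DUID_PATTERN.fullmatch(duid)
    if match is None:
        raise ValueError(f"not a valid DUID: {duid!r}")
    return int(match.group(1))


print(make_duid(1))             # DUID00001
print(parse_duid("DUID00042"))  # 42
```

A fixed-width, zero-padded number keeps identifiers the same length and sortable as plain strings; a real catalog might instead adopt an existing persistent-identifier scheme.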

Page 9

NIH Data Catalog produces citable data publications

Citability + proper credit = incentives to submit and publish data

Datasets are citable

Page 10

Data citations in the NIH Data Catalog are linked to their associated scientific publications in PubMed/PubMed Central

Datasets are linked to the literature
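Linking datasets to the literature is a many-to-many relationship: one article can cite several datasets, and one dataset can support several articles. A minimal in-memory sketch, assuming datasets are keyed by DUID and articles by PMID (the class and method names are hypothetical), keeps an index in each direction so either side can be looked up directly.

```python
from collections import defaultdict


class DatasetLiteratureLinks:
    """Many-to-many links between catalog datasets (DUIDs) and articles (PMIDs)."""

    def __init__(self) -> None:
        self._pmids_by_duid: defaultdict[str, set[str]] = defaultdict(set)
        self._duids_by_pmid: defaultdict[str, set[str]] = defaultdict(set)

    def link(self, duid: str, pmid: str) -> None:
        # Record the link in both directions so either side can be queried.
        self._pmids_by_duid[duid].add(pmid)
        self._duids_by_pmid[pmid].add(duid)

    def articles_for(self, duid: str) -> set[str]:
        """PMIDs of articles linked to a dataset (empty if none)."""
        return set(self._pmids_by_duid.get(duid, set()))

    def datasets_for(self, pmid: str) -> set[str]:
        """DUIDs of datasets linked to an article (empty if none)."""
        return set(self._duids_by_pmid.get(pmid, set()))


links = DatasetLiteratureLinks()
links.link("DUID00001", "22123456")  # identifiers taken from the sample citation
print(links.datasets_for("22123456"))
```

Returning copies of the sets keeps callers from modifying the index by accident.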

Page 11

Analysis of trends, the impact of data, and the effect on NIH research funding

Datasets become information in the research ecosystem

Page 12

Common Metadata Elements

How do current data repositories describe their data?

Page 13

NIH Data Sharing Repositories

Page 14

Identifying Metadata Commonalities

Page 15

Identifying Metadata Commonalities

Common Metadata Elements

Authorship

Data Description

Date Information
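One way to identify commonalities is to count how many repositories record each element and keep those that most or all of them share. The repository names and element sets below are hypothetical placeholders, and the sketch assumes field names have already been normalized to common labels.

```python
from collections import Counter

# Hypothetical repositories and the (already normalized) elements each one records.
repository_elements = {
    "repository_a": {"Authorship", "Data Description", "Date Information", "File Types"},
    "repository_b": {"Authorship", "Data Description", "Date Information"},
    "repository_c": {"Authorship", "Data Description", "Date Information", "Organism"},
}


def common_elements(schemas: dict[str, set[str]], min_share: float = 1.0) -> list[str]:
    """Return elements used by at least `min_share` of the repositories, sorted by name."""
    counts = Counter(element for fields in schemas.values() for element in fields)
    threshold = min_share * len(schemas)
    return sorted(element for element, n in counts.items() if n >= threshold)


print(common_elements(repository_elements))
# ['Authorship', 'Data Description', 'Date Information']
```

Lowering `min_share` (for example to 0.5) surfaces elements shared by a majority rather than by every repository.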

Page 16

Building a Taxonomy of Metadata Descriptors

• Authorship
  o Attribution
  o Authors
  o Creator(s)
  o Data Authors
  o Data Owner
  o Data Attribution
  o Contributor(s)
  o PI Name(s)
  o Investigator(s)
  o Sequence Authors
  o Responsible Party
  o Data Provider
  o Submitter

• Title information
  o Name Title
  o Collection Type
  o Type of Deposit
  o Service Name
  o Image File Name
  o File Name
  o Data Collection Title
  o Dataset Title
  o Dataset Name and Accession
  o Submission Title
  o Lab Data Title
  o Research Objective
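The taxonomy above can be applied as a lookup table that maps each repository-specific field name to its descriptor category. This sketch uses only the variants listed on the slide; the function name `categorize` and the case-insensitive matching are assumptions.

```python
from typing import Optional

# Variant field names from the slide, grouped under their descriptor category.
TAXONOMY = {
    "Authorship": [
        "Attribution", "Authors", "Creator(s)", "Data Authors", "Data Owner",
        "Data Attribution", "Contributor(s)", "PI Name(s)", "Investigator(s)",
        "Sequence Authors", "Responsible Party", "Data Provider", "Submitter",
    ],
    "Title information": [
        "Name Title", "Collection Type", "Type of Deposit", "Service Name",
        "Image File Name", "File Name", "Data Collection Title", "Dataset Title",
        "Dataset Name and Accession", "Submission Title", "Lab Data Title",
        "Research Objective",
    ],
}

# Invert the taxonomy into a case-insensitive lookup: variant name -> category.
CATEGORY_BY_FIELD = {
    variant.casefold(): category
    for category, variants in TAXONOMY.items()
    for variant in variants
}


def categorize(field_name: str) -> Optional[str]:
    """Return the descriptor category for a repository field name, or None if unmapped."""
    return CATEGORY_BY_FIELD.get(field_name.strip().casefold())


print(categorize("PI Name(s)"))     # Authorship
print(categorize("dataset title"))  # Title information
```

Returning None for unknown names makes unmapped fields easy to collect and review, which is how the taxonomy would grow as more repositories are surveyed.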

Page 17

Common Metadata Elements

Mapping Metadata to Existing Standards

Page 18

Mapping to DataCite

• DataCite Metadata Schema
  o Identifier
  o Creator
  o Title
  o Publisher
  o PublicationYear
  o Subject
  o Contributor
  o Date
  o Resource Type
  o RelatedIdentifier
  o Rights
  o Description
  o Size, Format, Version

• Common Metadata Elements
  o Data Unique Identifier
  o Authorship
  o Data Title
  o Data Location
  o Data Completion/Release Date
  o Data Descriptors (controlled vocabulary)
  o Data Submitter/Affiliation
  o Date Information
  o Data File Types
  o Related Resources
  o Access Data Restrictions
  o Data Description (narrative)
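Read side by side, the two lists line up nearly one for one, and the Data Location definition on page 20 closely follows DataCite's description of Publisher. The crosswalk below is an assumed pairing based on that reading, not an official mapping; in particular, mapping Data File Types to Format (rather than Resource Type) is a guess. `to_datacite` is a hypothetical helper that renames keys and reports anything it cannot place.

```python
from typing import Any

# Assumed pairing, read off the two lists side by side; not an official crosswalk.
COMMON_TO_DATACITE = {
    "Data Unique Identifier": "Identifier",
    "Authorship": "Creator",
    "Data Title": "Title",
    "Data Location": "Publisher",
    "Data Completion/Release Date": "PublicationYear",
    "Data Descriptors": "Subject",
    "Data Submitter/Affiliation": "Contributor",
    "Date Information": "Date",
    "Data File Types": "Format",  # guess: could also be Resource Type
    "Related Resources": "RelatedIdentifier",
    "Access Data Restrictions": "Rights",
    "Data Description": "Description",
}


def to_datacite(record: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Rename common-element keys to DataCite property names.

    Returns the converted record plus the keys that have no assumed counterpart.
    """
    converted: dict[str, Any] = {}
    unmapped: list[str] = []
    for key, value in record.items():
        target = COMMON_TO_DATACITE.get(key)
        if target is None:
            unmapped.append(key)
        else:
            converted[target] = value
    return converted, unmapped
```

Reporting unmapped keys instead of dropping them silently shows where the common elements go beyond the DataCite schema.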

Page 19

Mapping to Dryad

• Dryad Metadata Schema
  o dcterms:identifier/Data Package Identifier
  o dcterms:creator/Author
  o dcterms:title/Data Package Title
  o dcterms:relation/Location of related content outside of Dryad
  o dcterms:available/Date Available
  o dcterms:description
  o dcterms:subject/Keyword
  o dwc:scientificName
  o dcterms:references/Associated Dryad publication record ID

• Common Metadata Elements
  o Data Unique Identifier
  o Authorship
  o Data Title
  o Data Location
  o Data Completion/Release Date
  o Data Descriptors (controlled vocabulary)
  o Data Submitter/Affiliation
  o Date Information
  o Data File Types
  o Related Resources
  o Access Data Restrictions
  o Data Description (narrative)
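Dryad's elements are namespaced Dublin Core terms (plus Darwin Core's scientificName), so an XML record can be read with Python's standard `xml.etree.ElementTree`, which reports tags in `{namespace}name` form. The sample record and the pairing to common elements below are illustrative assumptions, not Dryad's actual export format or an agreed crosswalk.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"
DWC = "http://rs.tdwg.org/dwc/terms/"

# Assumed pairing of Dryad elements to common metadata elements (illustrative only).
DRYAD_TO_COMMON = {
    f"{{{DCTERMS}}}identifier": "Data Unique Identifier",
    f"{{{DCTERMS}}}creator": "Authorship",
    f"{{{DCTERMS}}}title": "Data Title",
    f"{{{DCTERMS}}}relation": "Related Resources",
    f"{{{DCTERMS}}}available": "Data Completion/Release Date",
    f"{{{DCTERMS}}}description": "Data Description",
    f"{{{DCTERMS}}}subject": "Data Descriptors",
    f"{{{DWC}}}scientificName": "Data Descriptors",
    f"{{{DCTERMS}}}references": "Related Resources",
}


def dryad_to_common(xml_text: str) -> dict[str, list[str]]:
    """Collect the text of each recognized child element under its common element."""
    root = ET.fromstring(xml_text)
    result: dict[str, list[str]] = {}
    for child in root:
        element = DRYAD_TO_COMMON.get(child.tag)
        if element is not None and child.text:
            result.setdefault(element, []).append(child.text.strip())
    return result


# Hypothetical minimal record, not a real Dryad export.
SAMPLE = """<record xmlns:dcterms="http://purl.org/dc/terms/"
        xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <dcterms:title>Example data package</dcterms:title>
  <dcterms:creator>Doe J</dcterms:creator>
  <dcterms:subject>dental caries</dcterms:subject>
  <dwc:scientificName>Homo sapiens</dwc:scientificName>
</record>"""

print(dryad_to_common(SAMPLE))
```

Values are collected into lists because several Dryad elements (subject and scientificName here) can feed the same common element.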

Page 20

Mapping to MEDLINE

Common Metadata Elements and Proposed Definitions

• Data Unique Identifier: A unique ID string that identifies a dataset within the catalog
• Author: Individuals involved in producing or contributing to the data
• Affiliation: The affiliation of each author, associated with the corresponding author occurrence
• Data Title: The name or title by which the dataset is known
• Data Location: The name of the entity that holds, archives, publishes, distributes, releases, issues, or produces the data, with its associated accession number
• Date: The year, month, and day when the data was made available
• Data Description (structured narrative): A structured narrative description for efficient indexing
• Data Descriptors: Metadata describing data contents using controlled labels (e.g., Organism, Disease, Perturbation, Gender, Cell type)
• PMIDs: Identifiers that link the dataset to its associated article(s)
• Availability/Accessibility of Data: Whether the data is available for use and how to access it
• Award Number: Grant/award numbers associated with the dataset
• Version: The version of the dataset (each version represented as a unique record)
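The proposed elements translate naturally into a typed record. The sketch below is one possible shape, with hypothetical class and field names; it stores an affiliation per author occurrence, as the Affiliation definition requires, and treats each version as a separate record by copying rather than mutating. How DUIDs are assigned to new versions is left to the caller.

```python
from dataclasses import dataclass, field, replace
from datetime import date


@dataclass
class Author:
    name: str
    affiliation: str = ""  # kept per author occurrence


@dataclass
class CatalogRecord:
    """Hypothetical shape for one NIH Data Catalog entry."""

    duid: str                  # Data Unique Identifier
    title: str                 # Data Title
    authors: list[Author]      # Author + Affiliation, in citation order
    location: str              # holding entity plus accession number
    available: date            # year, month, and day the data became available
    description: str = ""      # structured narrative
    descriptors: dict[str, list[str]] = field(default_factory=dict)  # e.g. {"Organism": [...]}
    pmids: list[str] = field(default_factory=list)          # linked article(s)
    access: str = ""           # availability/accessibility of the data
    award_numbers: list[str] = field(default_factory=list)
    version: int = 1

    def new_version(self, **changes: object) -> "CatalogRecord":
        """Return a new record for the next version (a shallow copy with the given changes)."""
        return replace(self, version=self.version + 1, **changes)
```

Because `new_version` returns a new object, earlier versions stay available as distinct records, matching the Version definition.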

Page 21

Data Citation - ICMJE

Marazita ML, Weynat RJ, Feingold E, Weeks D, Crout R, McNeill D. Dental Caries: Whole Genome Association and Gene x Environment Studies. NIH Data Catalog. 2014 Jan;1(1):DUID00001. PubMed PMID: 22123456.

SI: dbGaP/pht002543.v2.p1

• Author: Marazita ML, Weynat RJ, Feingold E, Weeks D, Crout R, McNeill D
• Data Title: Dental Caries: Whole Genome Association and Gene x Environment Studies
• Data Location: NIH Data Catalog
• Date data is submitted and paper is ready to publish: 2014 Jan
• NIH Data Catalog Volume (Issue): 1(1)
• Data Unique Identifier: DUID00001
• PMID Assigned to NIH Data Catalog Record: 22123456
• Secondary source ID (Link to actual dataset): dbGaP/pht002543.v2.p1
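Because each citation component corresponds to one metadata element, the citation string can be generated from a record. This hypothetical `DataCitation` class reproduces the layout of the example citation on this slide; it is a sketch, not a full implementation of the ICMJE rules.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    authors: list[str]
    title: str
    location: str      # e.g. "NIH Data Catalog"
    date: str          # e.g. "2014 Jan"
    volume: int
    issue: int
    duid: str
    pmid: str
    secondary_id: str  # SI: link to the actual dataset

    def render(self) -> str:
        """Render the citation in the layout of the slide's example."""
        main = (
            f"{', '.join(self.authors)}. {self.title}. {self.location}. "
            f"{self.date};{self.volume}({self.issue}):{self.duid}. "
            f"PubMed PMID: {self.pmid}."
        )
        return f"{main}\nSI: {self.secondary_id}"
```

Keeping the components as separate fields means the same record can also feed the catalog entry and the PubMed link, rather than parsing them back out of a formatted string.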

Page 22

Page 23

Page 24

NIH Data Catalog Issues and Concerns

What are we missing?

Page 25

How many NIH datasets actually exist?

Page 26

How many unique NIH datasets are NOT represented in existing data repositories?

Page 27

Could these datasets be represented as a data publication instead of in a repository?

Page 28

If the datasets are already housed somewhere, do we need a one-stop shop?

Page 29

Is an NIH Data Catalog the best solution?

Page 30

Next Steps

• Find out how many datasets are currently in NIH data sharing repositories
  o How many datasets do these repositories process per year?
• How many datasets are unique and NOT housed in a repository?
  o Search PubMed and PubMed Central and assign categories
• MeSH
  o PT: Electronic Supplementary material
  o SH: Statistical and numerical data
  o MeSH: Databases, Factual
  o Statistical Analysis: exclude datasets that already have a location
• How do we manage these unique datasets?
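The PubMed and PubMed Central searches could be scripted against NCBI's E-utilities ESearch endpoint. The sketch below only builds the request URL (passing it to `urllib.request.urlopen` would run the search) and then performs the exclusion step as a set difference. The term strings follow the slide, except that the subheading is written in its MeSH form, "statistics and numerical data"; the publication-type wording is copied from the slide and should be verified against MeSH. Use `db="pmc"` for PubMed Central. The function names are hypothetical.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Category terms from the slide, expressed with PubMed field tags
# ([pt] publication type, [sh] subheading, [mh] MeSH heading). Verify before use.
CATEGORY_TERMS = [
    '"Electronic Supplementary material"[pt]',
    '"statistics and numerical data"[sh]',
    '"Databases, Factual"[mh]',
]


def esearch_count_url(terms: list[str], db: str = "pubmed") -> str:
    """Build an ESearch URL that asks only for the number of matching records."""
    query = " OR ".join(f"({term})" for term in terms)
    params = {"db": db, "term": query, "rettype": "count", "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def unique_candidates(candidate_pmids: list[str], pmids_with_location: set[str]) -> list[str]:
    """Drop candidates whose data already has a repository location; sort numerically."""
    return sorted(set(candidate_pmids) - pmids_with_location, key=int)


print(esearch_count_url(CATEGORY_TERMS))
```

Requesting only the count first gives a quick estimate of how many articles fall into each category before retrieving and classifying individual records.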

Page 31

Questions?

Thank you.