
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014


http://www.bdebate.org/sites/default/files/archivos/debate/bdebate_bib_big_data_opensimposium_program_121114_web_1.pdf


Page 1

High quality data publications: drives and needs

Susanna-Assunta Sansone, PhD

@biosharing @isatools @scientificdata

B-DEBATE: Big Data in Biomedicine. Challenges and Opportunities, 12 Nov, 2014

Data Consultant, Honorary Academic Editor

Associate Director, Principal Investigator

Page 2

https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/

Credit to:

Page 3

•  Over 50% of completed studies in biomedicine do not appear in the published literature

•  Often because results do not conform to the authors' hypotheses

“Only half the health-related studies funded by the European Union between 1998 and 2006 - an expenditure of €6 billion - led to identifiable reports”

Plagued by selective reporting of data and methods

Page 4

•  Big science efforts
   o  Data are often better organized, reported and shared

•  Small independent efforts, yielding a rich variety of specialty data sets
   o  Most of these data (such as null findings) are unpublished
   o  These dark data hold a potential wealth of knowledge

Incentivizing individual contributors to share data

Page 5

From made reproducible to born reproducible

“Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results”

Page 6

http://bd2k.nih.gov/workshops.html#ADDS

Worldwide movement for FAIR data

Page 7

Because of the importance of formal publications in the academic incentive structure

Publishers occupy a leverage point

Page 8

Serve as the implementation and/or enforcement arm at the point of publication

Role of publishers as “agents of change”

Page 9

Credit to: Iain Hrynaszkiewicz

2013

Page 10

Wang et al, Nature, 2013 doi:10.1038/nature12730

Data/reproducibility at NPG

•  Figure source data
   o  putting data behind figures/graphs

Page 11

Data/reproducibility at NPG

•  Figure source data
   o  putting data behind figures/graphs

•  Data citation
   o  tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group

•  Code reproducibility
   o  peer review, availability and reuse

•  NPG’s Linked Data release – CC0

•  A new data journal

Page 12

Role of data papers and data journals

•  Incentive, credit for sharing
•  Peer review focus
•  Value of data vs. analysis
•  Discoverability and reusability

Page 13

market research (2011)

•  What do researchers want from a data publication?
   o  96% - increased visibility and discovery
   o  95% - increased usability of their research data
   o  93% - credit mechanism for deposit of data
   o  80% - peer review of content/datasets

Respondent characteristics: 387 respondents (329 active researchers)
•  Physics (24%)
•  Earth and environmental science (21%)
•  Biology (20%)
•  Chemistry (19%)
•  Others (16%)

Page 14

Helping you publish, discover and reuse research data

Credit for sharing your data

Focused on reuse and reproducibility

Peer reviewed, curated

Promoting community data and code repositories

Open Access

•  Currently covering life, natural and environmental sciences

•  Big and small data
   o  the power of small data is in their aggregation and integration with other datasets

•  New and previously published individual datasets, curated collections and citizen science
   o  a fuller, more in-depth look at the data processing steps, additional data files, codes etc.
   o  tutorial-like information for scientists interested in reusing or integrating the data with their own

Page 15

Methods and technical analyses supporting the quality of the measurements:
What did I do to generate the data?
How was the data processed?
Where is the data?
Who did what when?
How can the data be used or reused?

Introducing a new content type: Data Descriptor

Designed to make data more discoverable, interpretable and reusable

Page 16

Scientific hypotheses:
Synthesis
Analysis
Conclusions

Methods and technical analyses supporting the quality of the measurements:
What did I do to generate the data?
How was the data processed?
Where is the data?
Who did what when?
How can the data be used or reused?

Relation with traditional article - content

Page 17

AFTER: expand on your research articles, adding further information for reuse of the data

AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)

OR BEFORE

Relation with traditional article - time

Publish Data

Page 18

Code in GitHub

Data in OpenfMRI

Share your data, get credited and cited

Page 19

Experimental metadata or structured component
(in-house curated, machine-readable formats)

Article or narrative component
(PDF and HTML)

Data Descriptor: narrative and structure

Page 20

Sections:
•  Title
•  Abstract
•  Background & Summary
•  Methods
•  Technical Validation
•  Data Records
•  Usage Notes
•  Figures & Tables
•  References
•  Data Citations

Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses

Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group

Data Descriptor: narrative
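As a rough illustration of how the section list on this slide could be used as a checklist, the sketch below flags sections that are absent or empty in a draft. The section names come from the slide; the `draft` dictionary, the function name and the idea that every section must be non-empty are assumptions made for the example, not a statement of the journal's submission rules.

```python
# Section names as listed on the slide; everything else is illustrative.
DESCRIPTOR_SECTIONS = [
    "Title", "Abstract", "Background & Summary", "Methods",
    "Technical Validation", "Data Records", "Usage Notes",
    "Figures & Tables", "References", "Data Citations",
]


def missing_sections(draft):
    """Return the listed sections that are absent or blank in a draft.

    `draft` is a hypothetical dict mapping section name -> section text.
    """
    return [name for name in DESCRIPTOR_SECTIONS
            if not draft.get(name, "").strip()]


# Hypothetical draft: "Methods" is present but contains only whitespace.
draft = {"Title": "An example dataset", "Abstract": "...", "Methods": "   "}
print(missing_sections(draft))
```

A check like this only confirms that each section exists; whether the content is sufficient for reuse remains a matter for editors and reviewers.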

Page 21

In traditional publications this information is not provided in a sufficiently detailed manner

However this information is essential for understanding, reusing, and reproducing datasets

Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses

Sections:
•  Title
•  Abstract
•  Background & Summary
•  Methods
•  Technical Validation
•  Data Records
•  Usage Notes
•  Figures & Tables
•  References
•  Data Citations

Data Descriptor: narrative

Page 22

In-house editorial curator:
•  assists users to submit the structured content via simple templates and an internal authoring tool
•  performs value-added semantic annotation of the experimental metadata

For advanced users/service providers willing to export ISA-Tab for direct submission, we have released a technical specification:

analysis
method
script

Data file or record in a database

Data Descriptor: structure (CC0)
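To give a feel for what machine-readable, tab-delimited experimental metadata makes possible, here is a minimal sketch that reads a small tab-separated table and links each sample to its data file. The table contents, column names and function names are invented for illustration; they are not taken from the technical specification mentioned on the slide and should not be read as the ISA-Tab layout.

```python
import csv
import io

# A tiny, made-up tab-delimited table standing in for experimental metadata.
SAMPLE_TABLE = (
    "Sample Name\tOrganism\tData File\n"
    "sample_1\tHomo sapiens\tscan_001.nii.gz\n"
    "sample_2\tHomo sapiens\tscan_002.nii.gz\n"
)


def read_metadata(text):
    """Parse tab-delimited text into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def files_by_sample(rows):
    """Map each sample name to the data file recorded for it."""
    return {row["Sample Name"]: row["Data File"] for row in rows}


rows = read_metadata(SAMPLE_TABLE)
print(files_by_sample(rows))
```

Because the metadata is structured rather than buried in narrative text, the link between a sample and its data file can be followed by a program as well as by a reader.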

Page 23

Research papers

Data records

Data Descriptors

We currently recognize over 60 public data repositories

Adding value to research articles and data records

Page 24

Citation of and link to data files and databases

Page 25

Evaluation is not based on the perceived impact or novelty of the findings or size of the data

•  Experimental rigour and technical data quality
   o  Methodologically sound
   o  Technical validation experiments and statistical analyses
   o  Depth, coverage, size, and/or completeness of data sufficient for the types of applications
•  Completeness of the description
   o  Sufficient details to allow others to reproduce the results, and to reuse the data or integrate it with other data
   o  Compliance with relevant minimum information or reporting standards
•  Integrity of the data files and repository record
   o  Data files match the descriptions in the Data Descriptor
   o  Deposited in the most appropriate available databases

Peer review process focused on quality and reuse
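The criterion that data files match their descriptions could, in principle, be supported by an automated comparison. The sketch below is one possible approach, assuming the description lists each file with a SHA-256 checksum; the slide does not say how this check is actually carried out, and the file names, checksum convention and function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def compare_records(described, deposited):
    """Compare described files against deposited files.

    described: file name -> expected checksum (from the description)
    deposited: file name -> file content as bytes (from the repository)
    Returns a list of (file name, problem) tuples; empty if all match.
    """
    problems = []
    for name, expected in described.items():
        if name not in deposited:
            problems.append((name, "missing from repository record"))
        elif sha256_hex(deposited[name]) != expected:
            problems.append((name, "checksum differs from description"))
    for name in deposited:
        if name not in described:
            problems.append((name, "not described"))
    return problems


# Hypothetical example: one matching file, one missing, one undescribed.
deposited = {"counts.csv": b"gene,count\nA,1\n", "extra.txt": b"notes"}
described = {
    "counts.csv": sha256_hex(b"gene,count\nA,1\n"),
    "readme.txt": sha256_hex(b"hello"),
}
print(compare_records(described, deposited))
```

A comparison like this covers only file-level integrity; judging whether the data were deposited in the most appropriate database still needs a human reviewer.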

Page 26

~156   ~70   ~334

Source: BioPortal

Databases implementing standards

MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT

MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB

AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO

Progressively refine guidance to authors and reviewers

Page 27

Mapping the landscape of standards and databases

Page 28

PI: Lucila Ohno-Machado, UCSD

biocaddie.org

Page 29

PI: Mark Musen, Stanford

metadatacenter.org

Page 30

Acknowledgements

Visit nature.com/scientificdata

Email [email protected]

Tweet @ScientificData

Honorary Academic Editor: Susanna-Assunta Sansone, PhD

Managing Editor: Andrew L Hufton, PhD

Editorial Curator: Varsha Khodiyar

Publisher: Iain Hrynaszkiewicz

Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators

and our Advisory Boards and Collaborators

Philippe Rocca-Serra, PhD

Alejandra Gonzalez-Beltran, PhD

Eamonn Maguire

Milo Thurston, PhD

Funds: