http://www.bdebate.org/sites/default/files/archivos/debate/bdebate_bib_big_data_opensimposium_program_121114_web_1.pdf
High quality data publications: drives and needs
Susanna-Assunta Sansone, PhD
@biosharing @isatools @scientificdata
B-DEBATE: Big Data in Biomedicine. Challenges and Opportunities, 12 Nov, 2014
Data Consultant, Honorary Academic Editor
Associate Director, Principal Investigator
Credit to: https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/
• Over 50% of completed studies in biomedicine do not appear in the published literature
• Often because results do not conform to the authors' hypotheses
“Only half the health-related studies funded by the European Union between 1998 and 2006 - an expenditure of €6 billion - led to identifiable reports”
Plagued by selective reporting of data and methods
• Big science efforts
  o data are often better organized, reported and shared
• Small independent efforts, yielding a rich variety of specialty data sets
  o Most of these data (such as null findings) are unpublished
  o These dark data hold a potential wealth of knowledge
Incentivizing individual contributors to share data
From made reproducible to born reproducible
“Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results”
http://bd2k.nih.gov/workshops.html#ADDS
Worldwide movement for FAIR data
Because of the importance of formal publications in the academic incentive structure
Publishers occupy a leverage point
Serve as the implementation and/or enforcement arm at the point of publication
Role of publishers as “agents of change”
Credit to: Iain Hrynaszkiewicz
2013
Wang et al, Nature, 2013 doi:10.1038/nature12730
Data/reproducibility at NPG
• Figure source data
  o putting data behind figures/graphs
• Data citation
  o tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
• Code reproducibility
  o peer review, availability and reuse
• NPG’s Linked Data release – CC0
• A new data journal
Role of data papers and data journals
• Incentive, credit for sharing
• Peer review focus
• Value of data vs. analysis
• Discoverability and reusability
market research (2011)
• What do researchers want from a data publication?
  o 96% - increased visibility and discovery
  o 95% - increased usability of their research data
  o 93% - credit mechanism for deposit of data
  o 80% - peer review of content/datasets
Respondent characteristics: 387 respondents (329 active researchers)
• Physics (24%)
• Earth and environmental science (21%)
• Biology (20%)
• Chemistry (19%)
• Others (16%)
Helping you publish, discover and reuse research data
Credit for sharing your data
Focused on reuse and reproducibility
Peer reviewed, curated
Promoting community data and code repositories
Open Access
• Currently covering life, natural and environmental sciences
• Big and small data
  o the power of small data is in their aggregation and integration with other datasets
• New and previously published individual datasets, curated collections and citizen science
  o a fuller, more in-depth look at the data processing steps, additional data files, codes etc.
  o tutorial-like information for scientists interested in reusing or integrating the data with their own
Methods and technical analyses supporting the quality of the measurements:
• What did I do to generate the data?
• How was the data processed?
• Where is the data?
• Who did what when?
• How can the data be used or reused?
Introducing a new content type: Data Descriptor
Designed to make data more discoverable, interpretable and reusable
Scientific hypotheses:
• Synthesis
• Analysis
• Conclusions
Methods and technical analyses supporting the quality of the measurements:
• What did I do to generate the data?
• How was the data processed?
• Where is the data?
• Who did what when?
• How can the data be used or reused?
Relation with traditional article - content
AFTER: expand on your research articles, adding further information for reuse of the data
AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)
OR BEFORE
Relation with traditional article - time
Publish Data
Code in GitHub
Data in OpenfMRI
Share your data, get credited and cited
Experimental metadata or structured component (in-house curated, machine-readable formats)
Article or narrative component (PDF and HTML)
Data Descriptor: narrative and structure
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations
Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses.
Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group
Data Descriptor: narrative
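The section list on this slide can double as a simple completeness checklist for a draft. The sketch below is illustrative only: the section names are copied from the slide, while the `missing_sections` helper, the dictionary representation of a draft, and the choice to treat every section as mandatory are assumptions, not part of any journal tooling.

```python
# Section names as listed on the slide. Treating all of them as mandatory
# is an assumption made for this sketch.
EXPECTED_SECTIONS = [
    "Title",
    "Abstract",
    "Background & Summary",
    "Methods",
    "Technical Validation",
    "Data Records",
    "Usage Notes",
    "Figures & Tables",
    "References",
    "Data Citations",
]


def missing_sections(draft):
    """Return the expected sections that are absent or empty in a draft.

    `draft` maps section name -> text (a hypothetical in-memory form).
    """
    return [
        name for name in EXPECTED_SECTIONS
        if not draft.get(name, "").strip()
    ]


if __name__ == "__main__":
    draft = {
        "Title": "An example dataset",
        "Abstract": "Short summary.",
        "Methods": "How the data were generated.",
        "Data Records": "   ",  # whitespace only, so it counts as empty
    }
    for name in missing_sections(draft):
        print("missing:", name)
```

An author could run such a check before submission to spot sections that are still empty.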
In traditional publications this information is not provided in a sufficiently detailed manner
However this information is essential for understanding, reusing, and reproducing datasets
In-house editorial curator:
• assists users to submit the structured content via simple templates and an internal authoring tool
• performs value-added semantic annotation of the experimental metadata
For advanced users/service providers willing to export ISA-Tab for direct submission, we have released a technical specification:
analysis, method, script
Data file or record in a database
Data Descriptor: structure (CC0)
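To give a feel for what tab-delimited, machine-readable experimental metadata looks like, here is a rough sketch that writes a small sample table. The column headers follow the ISA-Tab study-file convention, but the rows, the `study_table` helper and the choice of columns are made up for illustration. This is not the journal's template or the released technical specification, and a complete ISA-Tab archive also contains investigation and assay files.

```python
import csv
import io

# Column headers in the style of an ISA-Tab study file.
# The selection of columns is an assumption made for this sketch.
HEADER = [
    "Source Name",
    "Characteristics[organism]",
    "Protocol REF",
    "Sample Name",
]


def study_table(rows):
    """Serialise sample descriptions (a list of dicts) as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for row in rows:
        writer.writerow([row[column] for column in HEADER])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample; the values are not taken from any real dataset.
    example = [
        {
            "Source Name": "subject-01",
            "Characteristics[organism]": "Homo sapiens",
            "Protocol REF": "sample collection",
            "Sample Name": "subject-01.blood",
        }
    ]
    print(study_table(example), end="")
```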
Research papers
Data records
Data Descriptors
We currently recognize over 60 public data repositories
Adding value to research articles and data records
Citation of and link to data files and databases
Evaluation is not based on the perceived impact or novelty of the findings, or the size of the data
• Experimental rigour and technical data quality
  o Methodologically sound
  o Technical validation experiments and statistical analyses
  o Depth, coverage, size, and/or completeness of data sufficient for the types of applications
• Completeness of the description
  o Sufficient details to allow others to reproduce the results, reuse or integrate it with other data
  o Compliance with relevant minimum information or reporting standards
• Integrity of the data files and repository record
  o Data files match the descriptions in the Data Descriptor
  o Deposited in the most appropriate available databases
Peer review process focused on quality and reuse
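The integrity criterion (data files match the descriptions in the Data Descriptor) lends itself to an automated first pass. The following is a minimal sketch, assuming the description can be reduced to a mapping from file name to SHA-256 checksum. That manifest format and the `check_records` helper are hypothetical and do not describe the journal's actual review workflow.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_records(data_dir, described):
    """Compare files in `data_dir` with a {filename: sha256} mapping.

    Returns (missing, mismatched, undescribed) as sorted lists of names:
    described but absent, present with a different checksum, and present
    but not described.
    """
    data_dir = Path(data_dir)
    on_disk = {p.name for p in data_dir.iterdir() if p.is_file()}
    missing = sorted(set(described) - on_disk)
    mismatched = sorted(
        name for name in set(described) & on_disk
        if sha256_of(data_dir / name) != described[name]
    )
    undescribed = sorted(on_disk - set(described))
    return missing, mismatched, undescribed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        payload = b"id,value\n1,0.5\n"
        (Path(tmp) / "measurements.csv").write_bytes(payload)
        manifest = {
            "measurements.csv": hashlib.sha256(payload).hexdigest(),
            "readme.txt": "0" * 64,  # described but never deposited
        }
        print(check_records(tmp, manifest))
```

A check like this only covers file identity; whether the content is methodologically sound still needs a human reviewer.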
~ 156
~ 70
~ 334
Source: BioPortal
Databases implementing standards
MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT
MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB
AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO
Progressively refine guidance to authors and reviewers
Mapping the landscape of standards and databases
PI: Lucila Ohno-Machado, UCSD
biocaddie.org
PI: Mark Musen, Stanford
metadatacenter.org
Acknowledgements
Visit nature.com/scientificdata
Email [email protected]
Tweet @ScientificData
Honorary Academic Editor Susanna-Assunta Sansone, PhD
Managing Editor Andrew L Hufton, PhD
Editorial Curator Varsha Khodiyar
Publisher Iain Hrynaszkiewicz
Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators
and our Advisory Boards and Collaborators
Philippe Rocca-Serra, PhD
Alejandra Gonzalez-Beltran, PhD
Eamonn Maguire
Milo Thurston, PhD
Funds: