Describing and Connecting Standards, Databases and Policies Across Disciplines
Peter McQuilton, PhD
@fairsharing_org
International Workshop on Sharing, Citation and Publication of Scientific Data across Disciplines, Tachikawa, Tokyo, 5-7 December 2017
Credit to: https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/ (2014)
But we don’t handle data well
A set of principles for those wishing to enhance the value of their data holdings.
Designed and endorsed by a diverse set of stakeholders, representing academia, industry, funding agencies, and scholarly publishers.
FAIR
Findable
Accessible
Interoperable
Reusable
Visible, citable
Trackable
Community standards
Reproducible
These principles put emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.
Not FAIR – low findability and badly documented
• Not always well cited or stored
o Software, code and workflows are hard to find and access
• Poorly described for third-party reuse
o Different levels of detail and annotation
• Curation activities are perceived as time-consuming
o Collection and harmonization of detailed methods and experimental steps are rushed at the publication stage
To do better science, more efficiently, we need data that are…
• Available in a public repository
• Findable through some sort of search facility
• Retrievable in a standard format
• Self-described so that third parties can make sense of it
• Intended to outlive the experiment for which they were collected
My database is going offline. Where should I put the data, and in what format?
Before accepting my paper, this journal wants my data to be in a public repository, but which one?
My funder says I should deposit the data in a reputable repository. But which one?
I’m collecting in vivo animal testing data. What metadata should I curate?
I’m about to start a set of experiments. In what format should I record the data?
A web-based, curated, and searchable portal that monitors the development and evolution of standards* across all disciplines, interrelated with databases/repositories and data policies
* A standard is a formal community specification for reporting, sharing and citing data, metadata and other digital assets.
Initial focus on metadata (or content) standards
Content standards
• Models/Formats: conceptual model, conceptual schema, exchange formats
• Terminologies: controlled vocabularies, taxonomies, thesauri, ontologies, etc.
• Guidelines: minimum information reporting requirements, checklists
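The three kinds of content standard above can be modelled as a small type, which makes it easy to group a mixed list of standard names. This is a minimal sketch, not FAIRsharing's actual data model: the `StandardType` enum, the `EXAMPLES` table and `group_by_type` are hypothetical names chosen for illustration.

```python
from enum import Enum


class StandardType(Enum):
    """The three kinds of content standard (hypothetical encoding)."""
    MODEL_FORMAT = "model/format"  # conceptual models, schemas, exchange formats
    TERMINOLOGY = "terminology"    # controlled vocabularies, taxonomies, thesauri, ontologies
    GUIDELINE = "guideline"        # minimum information reporting requirements, checklists


# Hypothetical lookup table: a few well-known standards and their type.
EXAMPLES = {
    "FASTA": StandardType.MODEL_FORMAT,
    "DICOM": StandardType.MODEL_FORMAT,
    "mzML": StandardType.MODEL_FORMAT,
    "ChEBI": StandardType.TERMINOLOGY,
    "PATO": StandardType.TERMINOLOGY,
    "ENVO": StandardType.TERMINOLOGY,
    "MIAME": StandardType.GUIDELINE,
    "ARRIVE": StandardType.GUIDELINE,
    "CONSORT": StandardType.GUIDELINE,
}


def group_by_type(names):
    """Group standard names by type; names missing from the table go under None."""
    groups = {}
    for name in names:
        key = EXAMPLES.get(name)
        groups.setdefault(key, []).append(name)
    return groups
```

For example, `group_by_type(["MIAME", "FASTA", "PATO", "XYZ"])` puts each known name under its type and the unknown `"XYZ"` under `None`.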
[Figure: counts of content standards by type (240+, 119+, 709+; ~1500), with examples]
Formats: SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA, CML, MITAB…
Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO…
Guidelines: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT…
FAIRsharing enhances their findability
Mapping a complex and evolving landscape
[Figure: map linking content standards (formats, terminologies, guidelines), databases/repositories, and data policies by funders, journals and other organizations; preliminary information as of July 2017, paper in preparation]
Community-verified status indicators
All records are manually curated in-house and verified by the community behind each resource.
• Ready for use, implementation, or recommendation
• In development
• Status uncertain
• Deprecated as subsumed or superseded
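One way to picture the status indicators is as an enumerated field on every record, so that a tool can, for instance, offer only resources marked as ready when assembling a recommendation. This sketch assumes a hypothetical `Record` type holding just a name and a status; real FAIRsharing records carry more metadata, and the enum member names are paraphrases of the labels above.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Status indicators (labels paraphrased; hypothetical encoding)."""
    READY = "ready"                    # ready for use, implementation, or recommendation
    IN_DEVELOPMENT = "in development"
    UNCERTAIN = "uncertain"            # status uncertain
    DEPRECATED = "deprecated"          # subsumed or superseded


@dataclass
class Record:
    """Hypothetical, minimal record: a resource name plus its status."""
    name: str
    status: Status


def recommendable(records):
    """Return only the records whose status marks them as ready."""
    return [r for r in records if r.status is Status.READY]
```

Filtering on status rather than deleting deprecated records keeps them findable, which matches the idea of tracking how standards evolve over time.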
Grouping the data
Collections group together one or more types of resource by domain, project or organization.
Recommendations are a core set of resources that are selected and recommended by a funder or journal data policy.
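The difference between the two groupings can be sketched with two small types: a collection is defined by what it groups (a domain, project or organization), whereas a recommendation is tied to a single funder or journal policy. Intersecting them answers a question such as "which resources in this domain does this journal recommend?". All class, field and function names here are hypothetical, and records are reduced to plain identifier strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Collection:
    """Groups one or more resource types by domain, project or organization."""
    name: str
    grouped_by: str                          # e.g. "domain", "project", "organization"
    records: set[str] = field(default_factory=set)


@dataclass
class Recommendation:
    """A core set of resources selected by one funder or journal data policy."""
    policy: str
    records: set[str] = field(default_factory=set)


def recommended_within(collection: Collection, recommendation: Recommendation) -> list[str]:
    """Records in the collection that the policy also recommends, sorted by identifier."""
    return sorted(collection.records & recommendation.records)
```

For example, a collection holding `{"db-1", "std-1", "std-2"}` and a recommendation holding `{"db-1", "std-2", "db-9"}` share `["db-1", "std-2"]`.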
Making FAIRsharing FAIR: Interoperability/Accessibility
• Data annotation:
o Users/Maintainers – ORCID
o Organisations – FundRef
o Species – NCBI Taxon ontology
o Disciplines and Domains – re3data/EDAM/BRO
• API – Swagger (ELIXIR guidelines)
• DOIs for standards (coming soon)
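Annotating users and maintainers with ORCID iDs only helps machine interoperability if the identifiers are well formed. ORCID iDs end in a check character computed with ISO 7064 MOD 11-2, so a curation tool could reject mistyped iDs before storing them. The sketch below assumes iDs arrive in the hyphenated 16-character form; the function names are hypothetical, and it does not query ORCID to confirm that an iD is actually registered.

```python
import re

# Hyphenated ORCID iD: four blocks of four characters; the last character
# is a check character, which may be "X" (representing the value 10).
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the string is well formed and its check character is correct."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

With ORCID's documented example iD, `is_valid_orcid("0000-0002-1825-0097")` returns `True`, while changing the last digit to `8` makes it return `False`.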
Making FAIRsharing FAIR: Findable – Embeddable Widget
• Recommendation/Collection widget for embedding in third-party websites:
o Journal data policies (GigaScience, PLOS, Springer Nature…)
o Standard Developing Organisations (e.g. TDWG)
o Societies/Organisations (e.g. ELIXIR)
Dr Massimiliano Izzo
Working with and for the community
[Figure: logos of community partners (e.g. OBO), grouped as standard developing groups, journal publishers, cross-links and data exchange, societies and organisations, institutional RDM services, and projects and programmes]