www.d4science.org
D4SCIENCE DATA INFRASTRUCTURE Facilitator for a FAIR data management
Pasquale Pagano, CNR – ISTI (Pisa, Italy)
Outline
Context
Requirements
Virtual Research Environments
Dealing with complexity
FAIR principles
Conclusions
D4Science is a hybrid data infrastructure: technologies integrated to provide elastic access to and usage of data and data-management capabilities
• +55 VREs hosted
• +2,500 scientists in 44 countries
• +50 data providers
• +25,000 derivative data/month
• over a billion quality records
• +20,000 temporal datasets
• +50,000 spatial datasets
• 99.7% service availability
Humanities and Cultural Heritage
Social Mining
Environmental Studies
Biological and Ecological Studies
Communities’ needs
Not individual researchers but groups of researchers
• dynamically aggregated to address research questions/problems
• are multidisciplinary, involve members belonging to diverse organisations
• require access to data and services that are spread among many providers
• cannot rely on costly environments managed by dedicated organizations
• build and operate their own supporting environments
  the cost and time required to implement this approach largely exceed the available capacities
• wish to effectively inject open science into daily tasks
Requirements for IT systems
Support collaborative research and experimentation
Implement Reproducibility-Repeatability-Reusability
Allow sharing data and findings
Grant open access to produced scientific knowledge and data
Simplify access to existing computing and storage resources
Ensure low operational and maintenance costs
Manage heterogeneous data access policies
Virtual Research Environment
An operational environment
where a set of resources (data, services, computational and storage resources)
is assigned to a group of users via interfaces
for a limited timeframe
L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12
Created on demand
Regulated by tailored policies
No cost for the resource providers
Open to host and operate custom software
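The definition above can be read as a small data structure. The following is a minimal sketch under that reading, not the D4Science implementation: the class name, field names, and the example VRE are all hypothetical, and a real tailored policy would be richer than the membership and timeframe checks shown here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class VirtualResearchEnvironment:
    """Hypothetical model of a VRE: resources assigned to users for a limited time."""
    name: str
    valid_from: date
    valid_until: date
    resources: set[str] = field(default_factory=set)  # datasets, services, compute, storage
    members: set[str] = field(default_factory=set)

    def is_active(self, today: date) -> bool:
        # The environment only exists within its timeframe.
        return self.valid_from <= today <= self.valid_until

    def can_use(self, user: str, resource: str, today: date) -> bool:
        # Stand-in for a tailored policy: membership, assignment, and timeframe.
        return (
            self.is_active(today)
            and user in self.members
            and resource in self.resources
        )


# Hypothetical VRE created on demand for one year.
vre = VirtualResearchEnvironment(
    name="example-vre",
    valid_from=date(2017, 1, 1),
    valid_until=date(2017, 12, 31),
    resources={"diva-interpolation", "shared-workspace"},
    members={"alice"},
)
```

Calling `vre.can_use("alice", "diva-interpolation", date(2017, 6, 1))` succeeds, while the same call for a non-member or after the end date does not.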
D4Science Geospatial Interpolation
In situ observations from Copernicus Marine Environment Monitoring Service
Interpolation service: SeaDataNet Data-Interpolating Variational Analysis service (DIVA)
Estimates global, uniform distributions of environmental parameters from scattered observations
Exploit the global estimate and run niche modelling to calculate a species distribution
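To illustrate the step from scattered observations to a uniform gridded estimate, here is a minimal sketch. It deliberately does not implement DIVA's variational analysis; it uses inverse-distance weighting, a much simpler stand-in technique, and the observation values and grid are hypothetical.

```python
import math


def idw(observations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    observations: non-empty list of (x, y, value) tuples.
    This is a simple stand-in, not the variational analysis DIVA performs.
    """
    num = den = 0.0
    for ox, oy, value in observations:
        d = math.hypot(x - ox, y - oy)
        if d == 0.0:
            return value  # exactly on an observation point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def grid_estimate(observations, xs, ys):
    """Estimate the field on a regular grid; rows follow ys, columns follow xs."""
    return [[idw(observations, x, y) for x in xs] for y in ys]


# Hypothetical scattered observations (x, y, value).
obs = [(0.0, 0.0, 10.0), (2.0, 0.0, 14.0), (0.0, 2.0, 12.0)]
field = grid_estimate(obs, xs=[0.0, 1.0, 2.0], ys=[0.0, 1.0, 2.0])
```

The resulting grid reproduces the observed values at the observation points and blends them elsewhere, which is the kind of uniform input a niche-modelling step expects.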
The SeaDataNet-D4Science Connector Architecture
[Figure: architecture diagram. Labelled elements: User and Other user; VRE; Workspace; Data preparation; Input (computational parameters, NetCDF file); WPS and REST interfaces; Output file with provenance metadata (PROV-O); Sharing; Publication; Geospatial data infrastructure (WMS, WCS, GeoTIFF, NetCDF, OPeNDAP); Visualisation via OGC standards.]
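The architecture exposes the interpolation service through WPS. As a rough sketch of what a client call could look like, the snippet below builds a WPS 1.0.0 Execute request in key-value-pair encoding. The endpoint, process identifier, and input names are assumptions made up for illustration; they are not taken from the D4Science or SeaDataNet services.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wps_execute_url(endpoint, process_id, inputs):
    """Build a WPS 1.0.0 Execute request in key-value-pair (GET) encoding."""
    # DataInputs is a semicolon-separated list of name=value pairs.
    data_inputs = ";".join(f"{key}={value}" for key, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "DataInputs": data_inputs,
    })
    return f"{endpoint}?{query}"


# Hypothetical endpoint, process identifier, and input names.
url = wps_execute_url(
    "https://example.org/wps/WebProcessingService",
    "org.example.DIVA_INTERPOLATION",
    {
        "InputFile": "https://example.org/workspace/observations.nc",
        "Resolution": "0.5",
    },
)
```

A real deployment would typically also require an authorization token and would return a status document pointing at the output file and its provenance metadata.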
Findable
• F1. globally unique and eternally persistent identifier
• F2. rich metadata
• F3. indexed in a searchable resource
• F4. metadata specify the data identifier
Accessible
• A1. retrievable by their identifier using a standardized protocol
• A1.1. the protocol is open, free, and universally implementable
• A1.2. the protocol allows for an authentication and authorization procedure
• A2. metadata are accessible, even when the data are no longer available
Interoperable
• I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
• I2. (meta)data use vocabularies that follow FAIR principles
• I3. (meta)data include qualified references to other (meta)data
Re-usable
• R1. (meta)data have a plurality of accurate and relevant attributes
• R1.1. (meta)data are released with a clear and accessible data usage license
• R1.2. (meta)data are associated with their provenance
• R1.3. (meta)data meet domain-relevant community standards
D4Science: Findability
Findability is enabled
• by extending the concept of resources to datasets, methods/algorithms, research objects, and services
• by assigning to each of the D4Science managed resources
  • a unique identifier
  • rich and extensible metadata (including attribution, provenance and licence information)
• by publishing resources in tailored and global catalogues that support keyword, faceted and temporal/geospatial discovery
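The findability points above can be sketched as a tiny in-memory catalogue: every entry gets a unique identifier and descriptive metadata, and discovery combines keyword, facet, temporal, and geospatial filters. This is an illustrative sketch only; the class, field names, and example entries are hypothetical and do not reflect the D4Science catalogue API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    title: str
    resource_type: str                         # facet: "dataset", "algorithm", "service", ...
    keywords: set[str]
    year: int                                  # simplified temporal coverage
    bbox: tuple[float, float, float, float]    # (west, south, east, north)
    licence: str
    # Unique identifier assigned when the entry is created.
    identifier: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")


def search(entries, keyword=None, resource_type=None, year=None, point=None):
    """Filter entries by keyword, facet (type), year, and a (lon, lat) point."""
    results = []
    for e in entries:
        if keyword is not None and keyword not in e.keywords:
            continue
        if resource_type is not None and e.resource_type != resource_type:
            continue
        if year is not None and e.year != year:
            continue
        if point is not None:
            lon, lat = point
            west, south, east, north = e.bbox
            if not (west <= lon <= east and south <= lat <= north):
                continue
        results.append(e)
    return results


# Hypothetical catalogue content.
catalogue = [
    CatalogueEntry("Sea temperature grid", "dataset", {"temperature", "ocean"},
                   2016, (-10.0, 30.0, 40.0, 46.0), "CC-BY-4.0"),
    CatalogueEntry("Niche modelling", "algorithm", {"species", "ocean"},
                   2017, (-180.0, -90.0, 180.0, 90.0), "EUPL-1.1"),
]
```

For example, `search(catalogue, keyword="ocean", resource_type="dataset")` returns only the temperature grid, while `search(catalogue, point=(100.0, 0.0))` returns only the globally scoped algorithm.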
D4Science: Accessibility
Accessibility is obtained
• by making shared and published resources available through multiple protocols in order to maximise the set of potential exploitation cases
• by also providing transparent authentication and authorization whenever the published resource requires it
• by enabling policy enforcement
D4Science: Interoperability
Interoperability is facilitated
• by automatically enriching the resources with metadata in multiple formats
  • including ISO 19115, Darwin Core, Dublin Core, DCAT and application profiles
• by promoting exploitation of ontologies and controlled vocabularies
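As a rough illustration of exposing the same resource in more than one metadata format, the sketch below serialises one flat record both as a simple Dublin Core XML fragment and as a dictionary using a few DCAT and Dublin Core Terms property names. The record content, the `metadata` wrapper element, and the function names are assumptions for the example; the actual D4Science enrichment pipeline is not described in these slides.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(record):
    """Serialise a flat record to a simple Dublin Core XML fragment."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element in ("identifier", "title", "creator", "date", "rights"):
        if element in record:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = record[element]
    return ET.tostring(root, encoding="unicode")


def to_dcat(record):
    """Map the same record to a few DCAT / Dublin Core Terms properties."""
    return {
        "@type": "dcat:Dataset",
        "dct:identifier": record["identifier"],
        "dct:title": record["title"],
        "dct:creator": record["creator"],
        "dct:issued": record["date"],
        "dct:license": record["rights"],
    }


# Hypothetical record.
record = {
    "identifier": "urn:uuid:00000000-0000-0000-0000-000000000001",
    "title": "Sea temperature grid",
    "creator": "Example Research Group",
    "date": "2017-03-01",
    "rights": "https://creativecommons.org/licenses/by/4.0/",
}
dc_xml = to_dublin_core(record)
dcat = to_dcat(record)
```

Keeping one internal record and deriving each format from it is one way to keep the different metadata views consistent with each other.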
D4Science: Reusability
Reusability is promoted
• by systematically endowing shared and published resources with
  • a clear licence governing their use/re-use
  • citation and attribution statements
• by systematically generating provenance metadata
• by allowing, by design, the execution of the experiment in the same technical and contextual environment
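To make the provenance point concrete, here is a minimal sketch of a provenance record for one computation, written as a JSON-LD-like dictionary with a handful of PROV-O terms. The identifiers and the function itself are hypothetical; the slides do not show the records D4Science actually generates.

```python
import json
from datetime import datetime, timezone


def provenance_record(output_id, input_ids, activity_id, agent_id, started, ended):
    """Describe one computation with a few PROV-O terms, as a JSON-LD-like dict."""
    return {
        "@context": {"prov": "http://www.w3.org/ns/prov#"},
        "@graph": [
            {
                "@id": activity_id,
                "@type": "prov:Activity",
                "prov:startedAtTime": started.isoformat(),
                "prov:endedAtTime": ended.isoformat(),
                "prov:used": [{"@id": i} for i in input_ids],
                "prov:wasAssociatedWith": {"@id": agent_id},
            },
            {
                "@id": output_id,
                "@type": "prov:Entity",
                "prov:wasGeneratedBy": {"@id": activity_id},
                "prov:wasDerivedFrom": [{"@id": i} for i in input_ids],
                "prov:wasAttributedTo": {"@id": agent_id},
            },
            {"@id": agent_id, "@type": "prov:Agent"},
        ],
    }


# Hypothetical identifiers for an interpolation run.
record = provenance_record(
    output_id="urn:example:output/interpolated.nc",
    input_ids=["urn:example:input/observations.nc"],
    activity_id="urn:example:run/42",
    agent_id="urn:example:user/alice",
    started=datetime(2017, 3, 1, 10, 0, tzinfo=timezone.utc),
    ended=datetime(2017, 3, 1, 10, 5, tzinfo=timezone.utc),
)
serialised = json.dumps(record, indent=2)
```

A record of this shape links the output file to its inputs, the run that produced it, and the responsible user, which is the information needed to attribute and repeat an experiment.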
D4Science enacts FAIR because …
• Embrace as-a-Service approach
• Exploit communication standards
• Hide complexity of computational capabilities
• Enable access via VRE governed by tailored policies
• Facilitate provenance and attribution management
• Implement economy-of-scale and costs reduction
• Promote collaboration and sharing
• Enable re-usability