E. Dzalé Yeumo / IG Agriculture meeting, 21 September 2015
Agenda
Introduction
What data do we produce at INRA?
Current data management and sharing practices and pain points: the point of view of some researchers
The service offer under construction
Introduction: some key dates
2009: political awareness of the Inra CEO
2012: Report* of the Inra scientific council on « data management and sharing »: 9 recommendations, the first being to define an Inra policy
2013: Inra data sharing policy: 11 data management and sharing principles
Implementation:
- Domain-specific working groups: inventory, requirements
- Trans-disciplinary working groups: state of the art (legal/IP, technical, social issues), proposals
*Gaspin, C., Pontier, D., Colinet, L., Dardel, F., Franc, A., Hologne, O., Le Gall, O., Maurin, N., Perrière, G., Pichot, C., Rodolphe, F. (2012). Rapport du groupe de travail sur la gestion et le partage des données [Report of the working group on data management and sharing]. http://prodinra.inra.fr/record/206746
Current data sharing practices: the point of view of some scientists
E. Dzalé Yeumo, O. Hologne, 16 July 2015
Genetic & genomic
- In general, data is released once the producers have published a paper
- Many data sharing platforms exist at the national and international level
- Many metadata and data format standards exist
- Lots of data is produced in collaboration with public and private partners
Experimentation, observation and simulation
- Raw data may be of as much interest as processed data
- Data sharing rules depend on the nature, granularity, and origin of the data
- Metadata are of paramount importance for the reusability of the released data
- Lots of data is unique and can be captured only once
Social sciences
- A few data sharing platforms exist at the national and international level
- Much of the data is bought and can't be freely released
- Experimental economics data can be freely released most of the time
- Data documentation and statistical disclosure control are of paramount importance
- The longitudinal nature of the data (time range or historical depth) increases the need for long-term preservation
Current data management pain points: the point of view of some scientists
Genetic & genomic
- Data is more and more massive: an economic model is needed for long-term hosting
- Exchange, transfer and storage of large datasets
- Lack of human resources
- Gaps in semantic coverage
- Many existing public data repositories are congested, with long waiting times for dataset deposits
- Risk of conferring an economic advantage to our competitors or to our partners' competitors
Experimentation, observation and simulation
- Data standardization
- Exchange, transfer and storage of large datasets
- Capturing metadata automatically
- Gaps in semantic coverage
- Metadata may be strategic (e.g. protocols, methods)
- Some metadata or data are sensitive (e.g. geographic information in epidemiological data or GMO data)
Social sciences
- Data archiving: sustainability of the existing platforms
- Statistical disclosure control
- Lots of personal and sensitive data (social and economic survey data), with a risk of re-identification
- Legal and intellectual property issues are of paramount importance (e.g. data inferred from existing textual data through decision tools, data purchased from third parties)
What service offer to support data management and sharing?
The scientists' wish list
Genetic & genomic
- Provide templates for consortium agreements and identify special cases (biological material, long tradition of partnership, etc.)
- A data portal with access to all the data shared by INRA
- DOIs
- Metadata training
- Volunteering for species-related data repositories at the international level
Experimentation, observation and simulation
- Recognition of all the contributors
- Recognition of data sharing as a first-class skill at the institutional level
- Training in data papers
- A data portal with access to all the data shared by INRA
- Harmonization of metadata standards and vocabularies
Social sciences
- Guidance with regard to publishing derived data: when, how?
- Web scraping of data of interest that may be removed from the original sources
- DOIs
- A secure data platform that allows reviewers to access data and reproduce findings in compliance with legal and IP requirements
- Thematic active data management platforms
Legal, IP, social
Techniques, methods, tools
A digital repository
- Build on the IT environment provided by the IT department: two data centers and the NetApp Storage Virtual Machine technology
- Outsource long-term preservation
- Leverage the many existing data repositories
- Upgrade the existing repositories to trusted repositories in accordance with the Data Seal of Approval and Defra assessment grids: http://www.datasealofapproval.org/en/assessment/ and https://defradigital.blog.gov.uk/2015/02/09/are-you-a-mature-open-data-publisher/
- Conform to the OAIS reference model
- Cover both active and historical data
Training
- Technical school in data papers
- Technical school in Linked Data and how to publish data according to Linked Data principles
Appendix: 11 data sharing and management principles