
Sharing Big Data - Bob Jones

Page 1: Sharing Big Data - Bob Jones
Page 2: Sharing Big Data - Bob Jones

Sharing big data

15 June 2017

Bob Jones

CERN

Bob.Jones <at> cern.ch

Helix Nebula – The Science Cloud

Helix Nebula – The Science Cloud with Grant Agreement 687614 is a Pre-Commercial Procurement Action funded by H2020 Framework Programme

Page 3: Sharing Big Data - Bob Jones

Accelerating Science and Innovation

Page 4: Sharing Big Data - Bob Jones

Data in High-Energy Physics

Based on DPHEP Study Group (2009). Data Preservation in High Energy Physics. http://arxiv.org/abs/0912.0255

Patricia Herterich

Page 5: Sharing Big Data - Bob Jones

EPFL & SDSC visit 2017-03-24

CERN Open Data Portal

• 2015
  - 40 TB of 2010 data

• 2016
  - 320 TB of 2011 data

• Curation, release of
  - Simulated data (MC)
  - Trigger information
  - Configuration files

http://github.com/cernopendata

Page 6: Sharing Big Data - Bob Jones

Barend Mons, Leiden University Medical Center

Page 7: Sharing Big Data - Bob Jones

In the FAIR Data approach, data should be:

• Findable – Easy to find by both humans and computer systems, based on mandatory metadata descriptions that allow the discovery of interesting datasets

• Accessible – Stored for long term such that they can be easily accessed and/or downloaded with well-defined license and access conditions (Open Access when possible), whether at the level of metadata, or at the level of the actual data content

• Interoperable – Ready to be combined with other datasets by humans as well as computer systems

• Reusable – Ready to be used for future research and to be processed further using computational methods.

https://www.dtls.nl/fair-data/

Peter Doorn, Director DANS

https://www.force11.org/group/fairgroup/fairprinciples
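As a rough illustration of the four FAIR properties listed above, the sketch below checks a dataset's metadata record for gaps. The `DatasetRecord` fields, the mapping from fields to FAIR properties, and the example values are all assumptions made for illustration; they are not taken from the FAIR principles text or from any repository's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""      # persistent identifier, e.g. a DOI
    title: str = ""
    description: str = ""
    access_url: str = ""      # where the data (or its metadata) can be retrieved
    license: str = ""         # well-defined license and access conditions
    data_format: str = ""     # documented format that other tools can read
    provenance: str = ""      # how the data were produced
    keywords: list[str] = field(default_factory=list)


def fair_gaps(record: DatasetRecord) -> dict[str, list[str]]:
    """Return, per FAIR property, the (assumed) required fields that are empty."""
    required = {
        "Findable": ["identifier", "title", "description"],
        "Accessible": ["access_url", "license"],
        "Interoperable": ["data_format"],
        "Reusable": ["license", "provenance"],
    }
    return {
        prop: [name for name in names if not getattr(record, name)]
        for prop, names in required.items()
    }


# Example record with placeholder values (not a real DOI or URL).
record = DatasetRecord(
    identifier="doi:10.0000/example",
    title="Example collision dataset",
    description="Illustrative record only",
    access_url="https://example.org/data/1",
    data_format="ROOT",
)

for prop, missing in fair_gaps(record).items():
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{prop}: {status}")
```

A check like this only covers the presence of metadata; whether a dataset is genuinely interoperable or reusable still depends on community standards and human review.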

Page 8: Sharing Big Data - Bob Jones

27/06/2017

Page 9: Sharing Big Data - Bob Jones

The Hybrid Cloud Model

Brings together
• research organisations,
• data providers,
• publicly funded e-infrastructures,
• commercial cloud service providers

in a hybrid cloud with procurement and governance approaches suitable for the dynamic cloud market

(Diagram label: In-house)


Page 10: Sharing Big Data - Bob Jones

Data Commons is a Platform that fosters development of a digital Ecosystem

Treats products of research (data, software, methods, papers, training materials, etc.) as digital assets (objects)

Digital objects need to conform to FAIR principles

- Findable, Accessible, Interoperable, Reusable

Digital objects exist in a shared virtual space (initial)

- Find, Deposit, Manage, Share and Reuse digital assets

Enables interactions between Producers and Consumers of digital assets

Gives currency to digital assets and the people who develop and support them

Philip E. Bourne, Ph.D. FACMI

Associate Director for Data Science

National Institutes of Health, USA

Page 11: Sharing Big Data - Bob Jones

Data Commons Pilot – connecting the pieces

Co-location of large and/or highly utilized NIH-funded data on the cloud + commonly used tools for analyzing and sharing digital objects to create an interoperable resource for the research community.

Investigators will be able to collaborate and share digital objects within this environment and connect with others

Page 12: Sharing Big Data - Bob Jones

Impact

Biggest issuer of DOIs for software in the world

Reference material for publications

F1000, Wiley, eLife, PLoS, Elsevier, Nature, etc.

Recommended by EC and National programmes

https://www.zenodo.org/

Page 13: Sharing Big Data - Bob Jones

Summary

Sharing big data needs technology, processes & organisation, people

FAIR principles represent best practice

Findable, Accessible, Interoperable, Reusable

Research communities around the world are developing science commons to accelerate the sharing of digital assets
