Enabling data-intensive science through data infrastructures. LIBER 42nd Annual Conference, München, 27 June 2013. Carlos Morais Pires, European Commission, e-Infrastructures, DG CNECT.C1. Author’s views do not commit the European Commission.

Enabling Data-Intensive Science Through Data Infrastructures

DESCRIPTION

These slides are from a talk given at LIBER's 42nd Annual Conference by Carlos Morais Pires of the European Commission. In light of the current data deluge, and of the European Commission's plans to harness it by implementing e-infrastructures for data-driven science under Horizon 2020, Pires called on libraries to engage with the data infrastructure and to bring their unique and now much-needed competencies to bear on giving meaning to data-driven science and spreading the word about it.

Page 1: Enabling Data-Intensive Science Through Data Infrastructures

enabling data-intensive science through data infrastructures

LIBER 42nd Annual Conference, München, 27 June 2013

Carlos Morais Pires, European Commission

e-Infrastructures, DG CNECT.C1

Author’s views do not commit the European Commission

Page 2: Enabling Data-Intensive Science Through Data Infrastructures

summary

• engineers and librarians… all about communicating information

• data as infrastructure: Europe is "Riding the Wave"

• interoperable data infrastructure

• balancing community driven and service driven initiatives

• H2020 WP under construction (pending “trilogue” decisions)

• times of change & “influence the future”

Page 3: Enabling Data-Intensive Science Through Data Infrastructures

engineers…

The number pi (symbol: π) /paɪ/ is a mathematical constant that is the ratio of a circle's circumference to its diameter, and is approximately equal to 3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58…

in http://en.wikipedia.org/

22 divided by 7 = 3.1428571428571428571428571428571…
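The gap between the two numbers on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative and not part of the original slides; it uses only Python's standard library, and the variable names are chosen for this example.

```python
import math
from fractions import Fraction

# The engineer's shortcut for pi, kept as an exact fraction.
approx = Fraction(22, 7)

# Absolute and relative difference from the floating-point value of pi.
error = float(approx) - math.pi
relative_error = error / math.pi

print(f"22/7           = {float(approx):.15f}")
print(f"pi             = {math.pi:.15f}")
print(f"absolute error = {error:.6f}")
print(f"relative error = {relative_error:.4%}")
```

22/7 is slightly larger than π, and the two agree only in the first two decimal places.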

Page 4: Enabling Data-Intensive Science Through Data Infrastructures

it’s all about bits…

http://en.wikipedia.org/wiki/Entropy_%28information_theory%29#Definition

In information theory, entropy is a measure of the uncertainty in a random variable.

In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message.

Entropy is typically measured in bits.

Shannon entropy is the average unpredictability in a random variable, which is equivalent to its information content. The concept was introduced by Claude E. Shannon in his 1948 paper "A Mathematical Theory of Communication"…
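As an illustration of the definition quoted above, the following sketch computes Shannon entropy in bits, H = −Σ p·log2(p), for a discrete distribution. It is not from the original slides; the function names `shannon_entropy` and `message_entropy` are hypothetical, and estimating probabilities from symbol frequencies in a message is an assumption made for this example.

```python
import math
from collections import Counter

def shannon_entropy(probabilities):
    """Return H = -sum(p * log2(p)) in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def message_entropy(message):
    """Estimate entropy per symbol from the symbol frequencies of a message."""
    counts = Counter(message)
    total = len(message)
    return shannon_entropy(count / total for count in counts.values())

fair_coin = shannon_entropy([0.5, 0.5])     # maximally unpredictable two-way choice
biased_coin = shannon_entropy([0.9, 0.1])   # more predictable, so less information
print(fair_coin, biased_coin, message_entropy("abab"))
```

A fair coin carries exactly one bit per toss, while a biased coin carries less because its outcome is easier to predict.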

Page 5: Enabling Data-Intensive Science Through Data Infrastructures

it’s all about communicating information…

Page 6: Enabling Data-Intensive Science Through Data Infrastructures

Policy context

A Reinforced European Research Area Partnership for Excellence and Growth, COM(2012) 392 – July 2012

Towards better access to scientific information: boosting the benefits of public investments in research, COM(2012) 401 final – July 2012

Commission Recommendation on access to and preservation of scientific information, C(2012) 4890 final – July 2012

Page 7: Enabling Data-Intensive Science Through Data Infrastructures

data as infrastructure: Europe is Riding the Wave

The High Level Expert Group on Scientific Data presented Riding the Wave in October 2010

Vision: "data e-infrastructure that supports seamless access, use, re-use, and trust of data. In a sense, the physical and technical infrastructure becomes invisible and the data themselves become the infrastructure, a valuable asset on which science, technology, the economy and society can advance".

Page 8: Enabling Data-Intensive Science Through Data Infrastructures

useful definitions

Data: digitally recorded factual material commonly accepted in the scientific community as necessary to validate research findings

(does not include lab notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with peers, physical objects, lab specimens)

[cf. White House Memo on "Increasing Access to the Results of Federally Funded Scientific Research"]

Data infrastructures: services, applications, tools, knowledge and policies for research data to be discoverable, understandable, accessible, preserved and curated… and available 24/7

Page 9: Enabling Data-Intensive Science Through Data Infrastructures

implementing interoperable data infrastructure

(a) data generators: research projects, big research infrastructures, installations or medium-size laboratories, simulation centres, surveys or individual researchers

(b) discipline-specific data service providers, providing data and workflows as a service

(c) providers of generic common data services (computing centres, libraries)

(d) researchers as users, using the data for science and engineering

community driven data infrastructure, including ESFRI, ESFRI clusters and others

Page 10: Enabling Data-Intensive Science Through Data Infrastructures

network infrastructure, GÉANT

distributed computing/software infrastructure

scientific data infrastructure

data infrastructure: bridging islands

bridges

Page 11: Enabling Data-Intensive Science Through Data Infrastructures

consultation towards horizon2020

Page 12: Enabling Data-Intensive Science Through Data Infrastructures

consultation towards horizon2020

Page 13: Enabling Data-Intensive Science Through Data Infrastructures

What

Relevance, Strengths, Weaknesses

Propose additional areas of actions

How many

80+ replies from 100+ organisations

Who

Research organisations and associations, universities,…

LERU, LIBER, CNRS, COAR, EIROforum, OpenAIRE, CERN, APA, Volker Mehrmann (TU Berlin), European Bioinformatics Institute, Max Planck Society, Observatoire Astronomique de Strasbourg, Museum f. Naturkunde Berlin, Pensoft Publishers, University of Edinburgh, University of Göttingen, University of Florence, etc.

about the public consultation

Page 14: Enabling Data-Intensive Science Through Data Infrastructures

overall opinion

Page 15: Enabling Data-Intensive Science Through Data Infrastructures

responses…

involvement of all stakeholders across the fiches

relevance of long tail, universities have important role

preservation and access are related

look at areas that are less developed in IT

skills development

workflows for interaction researcher/data centres

match research and education

Page 16: Enabling Data-Intensive Science Through Data Infrastructures

H2020 workprogramme being prepared

please note that things may change as a result of the “trilogue”

H2020 Research Infrastructure: ensure that Europe has world-class research infrastructures, including e-infrastructures, accessible to all researchers in Europe and beyond.

It is a key area of the H2020 Excellence in Science priority.

e-infrastructures will make every European researcher digital.

5 challenges: (1) High Performance Computing, (2) Connectivity, (3) Data, (4) e-Infrastructure Integration and (5) Policy and International.

Page 17: Enabling Data-Intensive Science Through Data Infrastructures

H2020 workprogramme… (current version)

• Community data services

• Managing, preserving and computing with big research data

• E-Infrastructure for Open Access

• Towards global data e-Infrastructures (support RDA)

• e-Infrastructures for virtual research environments (VRE)

• Integration of Core and Basic Operations Services for e-Infrastructures

• Skills and professions for e-infrastructures

• Centres of Excellence for computing applications

• PRACE

• Network of Competence Centres for SMEs

• GÉANT

These lines are related to the content of the Framework for Action

Page 18: Enabling Data-Intensive Science Through Data Infrastructures

Research Data Alliance: Common Infrastructure, Policy and Practice Drives Data Sharing and Exchange throughout the Data Life Cycle

http://rd-alliance.org

From Prof. Fran Berman and Prof. John Wood, Members of the RDA Council

Page 19: Enabling Data-Intensive Science Through Data Infrastructures

the conference to shape the future…

Re-inventing the Library for the Future

Libraries and why we need them

The revolution in Open Science

Preparedness for digital preservation

New horizons for OA policies in Europe

Ten recommendations on Data Management

What challenges for libraries to adapt to the new Era

The future of Science Publishing Ego-System

Converging parallel universes

Developing Data Informatics Capability in Libraries

Etc.

Page 20: Enabling Data-Intensive Science Through Data Infrastructures

“The Times They Are A-Changin…”

Come gather 'round people
Wherever you roam
And admit that the waters
Around you have grown
And accept it that soon
You'll be drenched to the bone
If your time to you
Is worth savin'
Then you better start swimmin'
Or you'll sink like a stone
For the times they are a-changin'.

Bob Dylan

Page 21: Enabling Data-Intensive Science Through Data Infrastructures

back to engineering: looking at “change” and “amplitude modulation”

Page 22: Enabling Data-Intensive Science Through Data Infrastructures

Thank You!

Carlos Morais Pires

carlos.morais-pires[at]ec.europa.eu

@CarlosMPires