Friday 27 March 2009 - PRODIGUER: a distribution node for CMIP5 GIEC/IPCC data - Sébastien Denvil, Pôle de Modélisation, IPSL

Context: countdown to the GIEC/IPCC report

End of 2009 to fall 2010: climate simulations

End of 2010 (?): data distribution

End of 2010 to early 2012: scientific publications

Early 2013: publication of the GIEC/IPCC AR5 report (Assessment Report #5)

October 2013: Nobel Prize

Context: national and European projects

PRODIGUER: project submitted in September 2008 to the GIS Climat

In the wake of IS-ENES (FP7), the Virtual Earth System Modelling Resource Centre, and Metafor (FP7), a metadata standard for climate modelling

Implementation of these tools at the national level and integration into the international effort

Must be done in close collaboration with national computing centers

ESG/CMIP5 Timeline

2008: Design and implement core functionality: browse and search, registration, single sign-on / security, publication, distributed metadata, server-side processing.

Early 2009: Testbed. By early 2009 the testbed is expected to include at least seven centres in the US, Europe and Japan: Program for Climate Model Diagnosis and Intercomparison - PCMDI (U.S.), National Center for Atmospheric Research - NCAR (U.S.), Geophysical Fluid Dynamics Laboratory - GFDL (U.S.), Oak Ridge National Laboratory - ORNL (U.S.), British Atmospheric Data Centre - BADC (U.K.), Max Planck Institute for Meteorology - MPI (Germany), The University of Tokyo Center for Climate System Research (Japan).

2009: Deal with system integration issues and develop the production system. By summer 2009, the hardware and software requirements will be provided to centres that want to be Nodes.

2010: Modelling centres publish data

2011-2012: Research and journal article submissions

2013: IPCC Report

AR5 open issues

What is the set of runs to be done and, derived from that, what data volumes can we expect?

Expected participants: where will the data be hosted? (Who is going to step up and host the data nodes, and provide the expected level of support in terms of manpower and hardware capability?) This includes the minimum software and hardware requirements for a data holding site (e.g. ftp access and ESG authentication and authorization) and a help desk with skilled staff.

The AR5 archive is to be globally distributed with support for WG1, WG2, and WG3. Will there be a need for a central (or core) archive and what will it look like?

Replication of holdings: disaster protection, a desire to have a replica of the core data archive on every continent, etc.

Number of users and level of access: scientists, policy makers, economists, health officials, etc.

Orders of magnitude

Climate models, centennial runs. Resolutions used:

Atmosphere 2.5° (280 km): 144 x 143 x 39

Ocean 2° (220 km): 180 x 149 x 31
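The resolutions above can be cross-checked with a short calculation. This is a minimal sketch, not part of the original slides: it assumes an equatorial circumference of 40,075 km, and the helper names `degrees_to_km` and `grid_points` are hypothetical. The slide values (280 km, 220 km) appear to be rounded.

```python
# Assumption: equatorial circumference of the Earth, in km.
EARTH_EQUATORIAL_CIRCUMFERENCE_KM = 40075.0


def degrees_to_km(resolution_deg: float) -> float:
    """Approximate size at the equator of a grid cell spanning resolution_deg degrees."""
    return resolution_deg / 360.0 * EARTH_EQUATORIAL_CIRCUMFERENCE_KM


def grid_points(nx: int, ny: int, nz: int) -> int:
    """Total number of points in an nx x ny x nz grid."""
    return nx * ny * nz


# Grid dimensions as quoted on the slide.
grids = [
    ("Atmosphere", 2.5, (144, 143, 39)),
    ("Ocean", 2.0, (180, 149, 31)),
]

for name, resolution_deg, dims in grids:
    print(
        f"{name}: {resolution_deg}° is about {degrees_to_km(resolution_deg):.0f} km "
        f"at the equator, {grid_points(*dims):,} grid points"
    )
```

Both grids hold roughly 0.8 million points, which gives a sense of the per-field output size at each time step.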

Atm 2.5° - Ocean 2°: 20 GB/y, 300 years, 5.85 TB

Atm 1.0° - Ocean 2°: 60 GB/y, 300 years, 17.5 TB

Atm 0.5° - Ocean 0.5°: 400 GB/y, 30 years, 11.75 TB
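The totals follow from output rate times run length. The sketch below reproduces them under the assumption that 1 TB = 1024 GB, which matches the first two figures closely; the third comes out at about 11.72 TB rather than the quoted 11.75 TB, so the slide value is presumably rounded. The function name `run_volume_tb` is hypothetical.

```python
# Assumption: binary units, 1 TB = 1024 GB.
GB_PER_TB = 1024


def run_volume_tb(gb_per_year: float, years: int) -> float:
    """Total output volume in TB for a run producing gb_per_year over the given number of years."""
    return gb_per_year * years / GB_PER_TB


# (configuration, GB per simulated year, simulated years), as quoted on the slide.
configurations = [
    ("Atm 2.5° - Ocean 2°", 20, 300),
    ("Atm 1.0° - Ocean 2°", 60, 300),
    ("Atm 0.5° - Ocean 0.5°", 400, 30),
]

for name, gb_per_year, years in configurations:
    print(f"{name}: {run_volume_tb(gb_per_year, years):.2f} TB")
```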

Global data amount

Raw data volume, lower bound: 565 TB

Raw data volume, upper bound: 1000 TB

CMIP5 distribution (25-50% of raw data): 140-280 TB for the lower bound, 250-500 TB for the upper bound

Global storage (raw + distributed): 700-1500 TB

LMDz 0.5° (50 km)
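The storage estimate on this slide can be reconstructed as follows. This is a sketch under the assumption that the distributed CMIP5 subset is a second copy amounting to 25-50% of the raw volume, stored in addition to the raw data; the function name `distribution_range` is hypothetical, and the slide figures appear to be rounded.

```python
def distribution_range(raw_tb: float, low_share: float = 0.25, high_share: float = 0.50):
    """Return the (min, max) distributed volume in TB for a given raw volume."""
    return raw_tb * low_share, raw_tb * high_share


raw_low_tb, raw_high_tb = 565, 1000  # raw data bounds quoted on the slide

dist_for_low = distribution_range(raw_low_tb)    # slide: 140-280 TB
dist_for_high = distribution_range(raw_high_tb)  # slide: 250-500 TB

# Global storage = raw data + distributed copy.
total_min_tb = raw_low_tb + dist_for_low[0]    # slide: about 700 TB
total_max_tb = raw_high_tb + dist_for_high[1]  # slide: 1500 TB

print(f"Distribution (low raw bound): {dist_for_low[0]:.0f}-{dist_for_low[1]:.0f} TB")
print(f"Distribution (high raw bound): {dist_for_high[0]:.0f}-{dist_for_high[1]:.0f} TB")
print(f"Global storage: {total_min_tb:.0f}-{total_max_tb:.0f} TB")
```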

Data management as practised for years

Mainly centralised, stored on a SAN

OPeNDAP access at the supercomputing centre

Basic system of data retrieval

Access to raw data

Security / authentication / restriction of data access: not an issue

No on-demand post-processing

No metadata integration

No support for high-level database queries

Data management with Prodiguer

Move the data as little as possible and keep it close to the supercomputing centres where possible: data access protocol, strong links with computing centres

When data needs to be moved, do it quickly and with a minimum of human intervention: management of storage resources, fast network

Keep track of what we have, particularly what is on deep storage: metadata and data catalogues

Exploit the federation of sites: grid middleware, data grid?