INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Intelligent Distributed Data Management in Earth System Science S. Kindermann, DKRZ, Germany K. Ronneberger, DKRZ, Germany T. Brücher, University of Cologne, Germany H. Ramthun, M&D, Germany M. Stockhause, MPI-Met, IFM-Geomar, Germany

Page 1: Intelligent Distributed Data Management in  Earth System Science

INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

Intelligent Distributed Data Management in Earth System Science

S. Kindermann, DKRZ, Germany
K. Ronneberger, DKRZ, Germany
T. Brücher, University of Cologne, Germany
H. Ramthun, M&D, Germany
M. Stockhause, MPI-Met, IFM-Geomar, Germany

Page 2: Intelligent Distributed Data Management in  Earth System Science

EGEE User Forum `07 Manchester 2

Enabling Grids for E-sciencE

INFSO-RI-031688 Deutsches Klimarechenzentrum

Structure

• What is Earth System Science about?
  – Typical workflows
  – Traditional infrastructure

• Why can grid technology help?
  – Limits of the current practice

• How do we use this technology?
  – Conceptual outline of the developing infrastructure
  – Outline of the developed prototype

• Potential impact and vision
  – Next steps and challenges

Page 3: Intelligent Distributed Data Management in  Earth System Science

Motivation: Data in ESS

Model Output Data + Observation Data + Analysis Data

Scenario Data

Data related to geo-referenced physical variables

Page 4: Intelligent Distributed Data Management in  Earth System Science

ESS Data Management Nowadays

"I want to correlate model data from DKRZ with observation data from DWD and satellite data from DLR"

[Figure: workflow diagram linking Model Data, Observation Data and Scenario data to an Analysis Dataset and a Result Dataset through the four numbered steps below]

(1) Find & Select
• Contact each data provider
• Learn their data search utilities
• Find and select data

(2) Collect & Prepare
• Get access rights for datasets at each data provider
• Learn their data access / preprocessing services
• Get access to sufficient storage facilities
• Trigger preprocessing and download data

(3) Analyse
At central service provider:
• start analysis tools
• produce undocumented data

(4) Visualize
• copy to local resources
• create visualization

"Has somebody already done something similar to what I want? Can I reuse data for …?"

Page 5: Intelligent Distributed Data Management in  Earth System Science

Bridging C3Grid and EGEE

[Figure: the four workflow steps (1) Find & Select, (2) Collect & Prepare, (3) Analyse, (4) Visualize, with the data providers (World Data Centers, DKRZ, DWD, AWI, GKSS, …), the C3Grid Middleware, an Analysis Dataset and a Result Dataset]

C3Grid:

• Standardized metadata description
• Uniform discovery of German data providers
• Uniform data access
• Grid based data delivery

EGEE:

• established international collaboration platform
• secure data management

data analysis and data sharing platform

Key component: (ISO) metadata catalog for ESS data in EGEE

Page 6: Intelligent Distributed Data Management in  Earth System Science

C3Grid and EGEE - the components

[Figure: side labels naming the workflow steps: Find & select, Collect & prepare, Analyse, Visualize]

• Central web-portal: unique entrance point to common central metadata catalogue (Lucene index) and access facility

• Standardized Metadata: hierarchical description of discovery- and some use-aspects of the data (ISO 19115/ISO 19139)

• Standardized data request interface: hide the complexity of specific data access mechanisms and pre-processing functionality (webservice technology)

• Automatic update and republishing of metadata: metadata of data processing is logged, managed and can be harvested (AMGA + Java extension, OAI-PMH server)
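
The harvesting mentioned in the last component can be sketched as follows. This is a minimal illustration of an OAI-PMH `ListRecords` request and of reading record identifiers from a response, not the project's implementation. The base URL, the `iso19139` metadata prefix, the identifiers and the sample response are hypothetical; only the `verb`, `metadataPrefix` and `from` request parameters and the response namespace come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build the URL of an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def record_identifiers(response_xml):
    """Return the header identifier of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:record/oai:header/oai:identifier"
    return [el.text for el in root.iterfind(path, OAI_NS)]


# Hypothetical, heavily shortened response used for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:dataset-1</identifier></header></record>
    <record><header><identifier>oai:example.org:dataset-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
```

A harvester such as the portal would fetch the URL returned by `list_records_url` at regular intervals and feed the returned records into its index.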

Page 7: Intelligent Distributed Data Management in  Earth System Science

(1) EGEE and C3Grid: Discovery

[Figure: architecture diagram of metadata discovery across C3Grid and EGEE]

Components shown:
• EGEE side: UI, SE, CE with worker nodes (WN), LFC Catalog, AMGA Metadata Catalog
• C3Grid side: C3Grid data interface (Climate Data, Workspace), Data Resource, Metadata, Web Portal C3 with Lucene Index
• Connecting elements: Webservice Interfaces and OAI-PMH servers
• Data providers: WDC Climate, WDC RSAT, WDC Mare, DWD, AWI, PIK, IFM-Geomar, MPI-Met, GKSS

Labelled steps:
(a) Publish (ISO 19115/19139)
(b) Harvest (OAI-PMH)
(f) Publish (ISO 19115/19139)
(g) Harvest (OAI-PMH)
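
To make the "Publish (ISO 19115/19139)" steps more concrete, here is a small sketch of what an ISO 19139 encoded record looks like when built and read with Python's standard library. It is deliberately incomplete: a valid record needs many more mandatory elements, and the identifier and title values are made up. The `gmd` and `gco` namespaces and the element names are those of ISO 19139; the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by the ISO 19139 encoding of ISO 19115 metadata.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
NS = {"gmd": GMD, "gco": GCO}


def add_char_string(parent, tag, text):
    """Append <gmd:tag><gco:CharacterString>text</...></...> to parent."""
    element = ET.SubElement(parent, f"{{{GMD}}}{tag}")
    value = ET.SubElement(element, f"{{{GCO}}}CharacterString")
    value.text = text
    return element


def minimal_record(file_identifier, title):
    """Build a skeleton record with an identifier and a dataset title only."""
    root = ET.Element(f"{{{GMD}}}MD_Metadata")
    add_char_string(root, "fileIdentifier", file_identifier)
    info = ET.SubElement(root, f"{{{GMD}}}identificationInfo")
    data_id = ET.SubElement(info, f"{{{GMD}}}MD_DataIdentification")
    citation = ET.SubElement(data_id, f"{{{GMD}}}citation")
    ci_citation = ET.SubElement(citation, f"{{{GMD}}}CI_Citation")
    add_char_string(ci_citation, "title", title)
    return root


def read_title(record):
    """Extract the dataset title, the kind of field a discovery index stores."""
    path = ".//gmd:CI_Citation/gmd:title/gco:CharacterString"
    return record.findtext(path, namespaces=NS)
```

A discovery index would typically extract fields such as the title from each harvested record in this way and make them searchable.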

Page 8: Intelligent Distributed Data Management in  Earth System Science

(1) EGEE and C3Grid: Data Discovery

Page 9: Intelligent Distributed Data Management in  Earth System Science

(2) EGEE and C3Grid: Data Upload

Page 10: Intelligent Distributed Data Management in  Earth System Science

(2) EGEE and C3Grid: Data Upload

[Figure: architecture diagram of the data upload path from C3Grid into EGEE, covering the workflow steps (1) Find & Select and (2) Collect & Prepare]

Components shown:
• EGEE side: UI, SE, CE with worker nodes (WN), LFC Catalog, AMGA Metadata Catalog
• C3Grid side: C3Grid data interface (Climate Data, Workspace), Data Resource, Metadata, Web Portal C3 with Lucene Index
• Connecting elements: Webservice Interfaces and OAI-PMH servers

Labelled steps:
(a) Request (webservice)
(b) Retrieve (jdbc or archive)
(c) Stage & Provide
(d) Notify
(e) Request (webservice)
(f) Transfer & Register (lcg-tools)
(g) Register (Java-API)
(f) Publish (ISO 19115/19139)
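
The "Transfer & Register (lcg-tools)" step can be illustrated with a short sketch. It assumes that the step is carried out with `lcg-cr`, the lcg-utils command that copies a local file to a storage element and registers it in the file catalog; the slides only say "lcg-tools", so this choice is an assumption. The VO name, storage element host, logical file name and local path are all hypothetical.

```python
import shlex


def lcg_cr_command(vo, storage_element, lfn, local_path):
    """Build the argument list for copying a file to an SE and registering it.

    Assumption: the upload uses lcg-cr with its --vo, -d (destination SE)
    and -l (logical file name) options. The command is only built here,
    not executed.
    """
    return [
        "lcg-cr",
        "--vo", vo,
        "-d", storage_element,
        "-l", "lfn:" + lfn,
        "file:" + local_path,
    ]


def as_shell_line(argv):
    """Render an argument list as a safely quoted shell command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Hypothetical values for illustration only.
example = lcg_cr_command(
    vo="esr",
    storage_element="se.example.org",
    lfn="/grid/esr/c3/staged_dataset.nc",
    local_path="/tmp/staged_dataset.nc",
)
```

On a configured user interface node the resulting list could be passed to `subprocess.run`; the subsequent metadata registration in AMGA (step g) is a separate call and is not shown.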

Page 11: Intelligent Distributed Data Management in  Earth System Science

(3) EGEE and C3Grid: Data Analysis

Page 12: Intelligent Distributed Data Management in  Earth System Science

(3) EGEE and C3Grid: Data Analysis

[Figure: architecture diagram of the analysis path, covering the workflow steps (3) Analyse and (4) Visualize; the example job shown is qflux]

Components shown:
• EGEE side: UI, SE, CE with worker nodes (WN), LFC Catalog, AMGA Metadata Catalog
• C3Grid side: C3Grid data interface (Climate Data, Workspace), Data Resource, Metadata, Web Portal C3 with Lucene Index
• Connecting elements: Webservice Interfaces and OAI-PMH servers

Labelled steps:
(a) Request (webservice)
(b) Submit (glite)
(c) Retrieve (lcg-tools)
(d) Update (Java-API)
(e) Return graphic
(f) Publish (ISO 19115/19139)
(g) Harvest (OAI-PMH)
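
The "submit (glite)" step hands a job description to the gLite workload management. The sketch below generates a minimal job description in JDL for a qflux-like analysis job. The attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) are standard JDL; the script name, arguments and file names are hypothetical and do not describe the actual qflux job.

```python
def jdl_list(items):
    """Format a Python list of strings as a JDL list: {"a", "b"}."""
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def analysis_job_jdl(executable, arguments, input_files, output_files):
    """Return the text of a minimal JDL file for one analysis job."""
    outputs = list(output_files) + ["std.out", "std.err"]
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        # Files shipped with the job to the worker node.
        f"InputSandbox = {jdl_list(input_files)};",
        # Files returned to the submitter when the job has finished.
        f"OutputSandbox = {jdl_list(outputs)};",
    ]
    return "\n".join(lines)


# Hypothetical job: a wrapper script that reads a logical file name
# and writes a plot.
example_jdl = analysis_job_jdl(
    executable="qflux.sh",
    arguments="lfn:/grid/esr/c3/staged_dataset.nc",
    input_files=["qflux.sh"],
    output_files=["qflux.png"],
)
```

The generated text would be written to a `.jdl` file and submitted with the gLite job submission tools; the job itself would then retrieve its input data from the storage element (step c).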

Page 13: Intelligent Distributed Data Management in  Earth System Science

(3) Example Workflow

• Example: Humidity flux (QFLUX)

Page 14: Intelligent Distributed Data Management in  Earth System Science

Approach in international context

Compared projects: Earth System Grid project (USA), C3Grid/(EGEE), NERC data grid (UK)

Scope (project)
• Earth System Grid (USA): High performance access of climate model data
• C3Grid/(EGEE): Uniform & effective discovery and access of data of various disciplines & types
• NERC data grid (UK): Harmonized & detailed search and access of data of various disciplines & types

Data stock (status)
• Earth System Grid (USA): Homogeneous; flat-file storage
• C3Grid/(EGEE): Heterogeneous; databases & flat-file storage
• NERC data grid (UK): Heterogeneous; databases & flat-file storage

Data description (solution)
• Earth System Grid (USA): Use aspect of data, tools and models; e.g. NcML for netCDF data
• C3Grid/(EGEE): Discovery and some use aspects; ISO 19115/ISO 19139
• NERC data grid (UK): Content of the data in great detail; semantic data model (CSML, based on GML)

Data access (solution)
• Earth System Grid (USA): Different protocols; intelligence at portal
• C3Grid/(EGEE): Uniform access interface; intelligence at data provider / grid
• NERC data grid (UK): Different protocols; link to data provider

Page 15: Intelligent Distributed Data Management in  Earth System Science

Potential Impact

Potential impact on EGEE ESR-community:

Provide a framework to easily and consistently exchange and manage ESR data and tools between EGEE and traditional earth science data storage systems

Potential impact on international ESR-community:

Approach is based on international standards (ISO 19139, OAI-PMH) and uniform interfaces (Web services). Thus other data centers and infrastructures can be integrated uniformly.

Page 16: Intelligent Distributed Data Management in  Earth System Science

Next steps

• Expand the demonstrated prototype to a reliable and stable system

• Port further workflows and some pre-processing functionality to EGEE

• Enlarge the user community

Page 17: Intelligent Distributed Data Management in  Earth System Science

Future challenges or missing bricks

• Comprehensive and consistent security context to control access to (restricted) data with a single sign-on
  – Approach: federated AA infrastructure based on Shibboleth

• Analysis-services description to improve discovery, use and share possibilities
  – Approach: adapt ISO 19119/19139 as a common metadata format for analysis-tool description

• Modularized workflows to increase the flexibility and enable intelligent scheduling
  – Approach: implement a workflow information service
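
As an illustration of the last point, the sketch below shows how a workflow split into modules with declared dependencies can be scheduled in batches of independent steps. It is only a sketch of the idea, not a design for the proposed workflow information service. The module names are hypothetical and loosely follow the four workflow steps of the talk; the scheduling uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow modules, each mapped to the modules it depends on.
WORKFLOW = {
    "collect_model_data": {"find_select"},
    "collect_observation_data": {"find_select"},
    "analyse": {"collect_model_data", "collect_observation_data"},
    "visualize": {"analyse"},
}


def schedule_batches(dependencies):
    """Group modules into batches; modules within a batch can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    batches = []
    while sorter.is_active():
        # All modules whose dependencies are already finished.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # Mark the whole batch as finished so that dependants become ready.
        sorter.done(*ready)
    return batches
```

For the workflow above, the two collection modules end up in the same batch, so a scheduler could dispatch them to different resources at the same time.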

Page 18: Intelligent Distributed Data Management in  Earth System Science

Thank You

kindermann @ dkrz.de