INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
Intelligent Distributed Data Management in Earth System Science
K. Ronneberger, DKRZ, Germany
S. Kindermann, DKRZ, Germany
T. Brücher, University of Cologne, Germany
H. Ramthun, M&D, Germany
M. Stockhause, MPI-Met, IFM-Geomar, Germany
1st EU-Review, May 15-16, 2007
INFSO-RI-031688
QFLUX: Humidity flux calculation
Structure
• What is Earth system science about?
– Typical workflows
– Traditional infrastructure
• Why can grid technology help?
– Limits of the current practice
– Outline of possible and existing use areas
• How do we use this technology?
– Conceptual outline of the developing infrastructure
– Demo of an example workflow
• Potential impact and vision
– Next steps and challenges
Earth system sciences
• Goal: learn about the past, the present, and possible futures of the earth system
• Community: internationally distributed and interdisciplinary, but strongly interconnected
• Method: analysing, comparing and processing data
• Input: data from observations and/or other modelling studies
Typical workflow
(1) Find & Select
(2) Collect & Prepare
(3) Analyse
(4) Visualize
Data along the workflow: distributed climate data (model data, observation data, scenario data, data description), analysis dataset, result dataset
An example workflow: “qflux”
(1) Find & Select relevant & available datasets
(2) Collect & Prepare a temporal and spatial subset of the data
(3) Analyse the integrated transport of humidity between selected levels
(4) Visualize selected result
Data along the workflow: distributed climate data, analysis dataset, result dataset
Variables: wind speed, temperature, specific humidity
Data volume (in slide order): several PB; ~3.1 TB (300-500 files); ~10.3 GB (28 files); ~76 MB; ~6 MB; ~66 KB
Location (in slide order): various data centers & portals; institutional storage & computing facilities; local facilities; personal computer
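As a rough illustration of step (3), the sketch below estimates a vertically integrated humidity flux between two pressure levels, Q = (1/g) ∫ q·u dp, using the trapezoidal rule. The slides do not state which formula qflux uses, so this common textbook form, the function name and the sample profile are all assumptions, not the actual qflux code.

```python
G = 9.80665  # standard gravity in m/s^2


def integrated_humidity_flux(pressure_pa, specific_humidity, wind_speed):
    """Trapezoidal estimate of (1/g) * integral(q * u dp) in kg m^-1 s^-1.

    pressure_pa: pressure levels in Pa, ordered from top (low) to bottom (high)
    specific_humidity: q in kg/kg at each level
    wind_speed: horizontal wind component u in m/s at each level
    """
    if not (len(pressure_pa) == len(specific_humidity) == len(wind_speed)):
        raise ValueError("all profiles must have the same number of levels")
    total = 0.0
    for k in range(len(pressure_pa) - 1):
        dp = pressure_pa[k + 1] - pressure_pa[k]
        flux_upper = specific_humidity[k] * wind_speed[k]
        flux_lower = specific_humidity[k + 1] * wind_speed[k + 1]
        total += 0.5 * (flux_upper + flux_lower) * dp
    return total / G


# Hypothetical three-level profile between 500 hPa and 1000 hPa.
levels = [50000.0, 85000.0, 100000.0]
q = [0.001, 0.006, 0.010]
u = [15.0, 8.0, 5.0]
print(integrated_humidity_flux(levels, q, u))
```

In a real workflow the profiles would come from the analysis dataset prepared in step (2), and the calculation would run per grid point and time step.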
Potential use of grid technology
Current issues
• Search & select
– Different portals with different authentications and data descriptions
• Collect & prepare
– Different access mechanisms of the different providers
– Pre-processing requires sufficient local facilities
• Analyse
– Existing tools and already processed data are available locally and lack proper description
• Visualize
– Detached from the remaining workflow
Use of grid technology
• Central unique authentication to a common catalogue with standardized metadata
• Shared resources with standardized access hiding proprietary access mechanisms
• Commonly defined tool description
• Log processing steps and automatically republish processed data
• Integrate basic visualization (first peek) into the workflow
C3 Grid and EGEE - the components
Workflow steps: Find & select, Collect & prepare, Analyse, Visualize
• Central web portal: single entry point to a common central metadata catalogue (Lucene index) and access facility
• Standardized metadata: hierarchical description of discovery aspects and some use aspects of the data (ISO 19115/ISO 19139)
• Standardized access interface: hides the complexity of specific data access mechanisms and pre-processing functionalities (web service technology)
• Automatic update and republishing of metadata: metadata of data processing is logged and managed, and can be harvested (AMGA + Java extension, OAI-PMH server)
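To make the harvesting component more concrete, here is a minimal sketch of an OAI-PMH `ListRecords` request and of reading record identifiers from a response. The verb, the `metadataPrefix` and `from` arguments and the XML namespace come from the OAI-PMH 2.0 protocol. The base URL, the `iso19139` prefix value and the sample response are hypothetical and not taken from the C3Grid setup.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def record_identifiers(response_xml):
    """Return the header identifiers of all records in a ListRecords response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:record/oai:header/oai:identifier"
    return [element.text for element in root.findall(path, OAI_NS)]


# Hypothetical response, trimmed to the elements used here.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:dataset-1</identifier></header></record>
    <record><header><identifier>oai:example:dataset-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical endpoint; the real portal address is not given in the slides.
print(list_records_url("https://portal.example.org/oai", "iso19139", "2007-05-01"))
print(record_identifiers(SAMPLE_RESPONSE))
```

A harvester along these lines would fetch the URL, extract the ISO 19115/19139 payload of each record and feed it into the portal's index.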
Data access in ESR grid projects
Scope (project)
– Earth System Grid project (USA): high-performance access to climate model data
– C3 Grid (Germany): uniform & effective discovery and access of data of various disciplines & types
– NERC data grid (UK): harmonized & detailed search and access of data of various disciplines & types
Data stock (status)
– Earth System Grid project (USA): homogeneous; flat-file storage
– C3 Grid (Germany): heterogeneous; databases & flat-file storage
– NERC data grid (UK): heterogeneous; databases & flat-file storage
Data description (solution)
– Earth System Grid project (USA): use aspect of data, tools and models; e.g. NcML for netCDF data
– C3 Grid (Germany): discovery and some use aspects; ISO 19115/ISO 19139
– NERC data grid (UK): content of the data in great detail; semantic data model (CSML, based on GML)
Data access (solution)
– Earth System Grid project (USA): different protocols; intelligence at portal
– C3 Grid (Germany): uniform access interface; intelligence at data provider / grid
– NERC data grid (UK): different protocols; intelligence at portal
Bridging EGEE and C3
Diagram components:
• EGEE: UI, SE, CE, WNs, LFC Catalog, AMGA Metadata Catalog
• C3Grid data interface: Climate Data, Workspace, Data Resource, Metadata
• Web Portal C3: Lucene Index
• Connecting services: Webservice Interfaces, OAI-PMH servers
• German climate data providers: WDC Climate, WDC RSAT, WDC Mare, DWD, AWI, PIK, IFM-Geomar, MPI-Met, GKSS
Metadata flow: (a) Publish (ISO 19115/19139), (b) Harvest (OAI-PMH), (f) Publish (ISO 19115/19139), (g) Harvest (OAI-PMH)
Demo
(1) Search, discover, and select functionalities of the portal
(2) Upload and register data to EGEE
(3) Trigger the example workflow qflux from the portal
Upload pre-processed data to EGEE
Diagram components:
• EGEE: UI, SE, CE, WNs, LFC Catalog, AMGA Metadata Catalog
• C3Grid data interface: Climate Data, Workspace, Data Resource, Metadata
• Web Portal C3: Lucene Index
• Connecting services: Webservice Interfaces, OAI-PMH servers
Workflow steps covered: (1) Find & Select, (2) Collect & Prepare
(a) Request (webservice)
(b) Retrieve (JDBC or archive)
(c) Stage & Provide
(d) Notify
(e) Request (webservice)
(f) Transfer & Register (lcg-tools)
(g) Register (Java API)
Metadata: (f) Publish (ISO 19115/19139)
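The "Transfer & Register (lcg-tools)" step could look roughly like the sketch below, which only assembles the argument list for `lcg-cr`, the LCG data management command that copies a local file to a storage element and registers it in the file catalogue. The slide names only "lcg-tools", so the choice of `lcg-cr` is an assumption, and the VO name, storage element host, logical file name and local path are hypothetical.

```python
import shlex


def build_lcg_cr_command(vo, storage_element, lfn, local_path):
    """Return the argument list for an lcg-cr copy-and-register call."""
    if not lfn.startswith("lfn:"):
        raise ValueError("logical file name must start with 'lfn:'")
    return [
        "lcg-cr",
        "--vo", vo,             # virtual organisation of the user
        "-d", storage_element,  # destination storage element (SE)
        "-l", lfn,              # logical file name to register in the catalogue
        "file:" + local_path,   # local source file
    ]


# Hypothetical values for illustration only.
command = build_lcg_cr_command(
    "esr",
    "se.example.org",
    "lfn:/grid/esr/c3/qflux_input.nc",
    "/tmp/qflux_input.nc",
)
print(shlex.join(command))
```

The list form can be passed to `subprocess.run` on a machine with the grid client tools installed. The sketch does not execute anything itself.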
Trigger qflux workflow
Diagram components:
• EGEE: UI, SE, CE, WNs (running qflux), LFC Catalog, AMGA Metadata Catalog
• C3Grid data interface: Climate Data, Workspace, Data Resource, Metadata
• Web Portal C3: Lucene Index
• Connecting services: Webservice Interfaces, OAI-PMH servers
Workflow steps covered: (3) Analyse, (4) Visualize
(a) Request (webservice)
(b) Submit (gLite)
(c) Retrieve (lcg-tools)
(d) Update (Java API)
(e) Return graphic
(f) Publish (ISO 19115/19139)
(g) Harvest (OAI-PMH)
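For the "Submit (gLite)" step, a job is typically described in a JDL (Job Description Language) file. The sketch below generates such a description using the common JDL attributes `Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox` and `OutputSandbox`. The wrapper script name, the input file name and the output graphic name are hypothetical, and the slides do not show the JDL actually used in the demo.

```python
def _jdl_list(items):
    """Format a Python list as a JDL list of quoted strings."""
    return "{" + ", ".join('"%s"' % item for item in items) + "}"


def build_jdl(executable, arguments, input_files, output_files):
    """Return the text of a simple gLite job description."""
    lines = [
        'Executable = "%s";' % executable,
        'Arguments = "%s";' % " ".join(arguments),
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        "InputSandbox = %s;" % _jdl_list(input_files),
        "OutputSandbox = %s;" % _jdl_list(["std.out", "std.err"] + output_files),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical qflux job: a wrapper script that reads a registered input
# file and writes a graphic that is returned to the portal.
jdl_text = build_jdl(
    executable="qflux.sh",
    arguments=["lfn:/grid/esr/c3/qflux_input.nc"],
    input_files=["qflux.sh"],
    output_files=["qflux.png"],
)
print(jdl_text)
```

The generated text would be written to a file and handed to the gLite job submission command on the UI.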
Potential Impact
• On the German ESR community: ease and accelerate the search, discovery, access and processing of German ESR data
• On the current and potential EGEE ESR community: provide a framework to easily and consistently exchange and manage ESR data and tools between EGEE and traditional earth science data storage systems
• On the international ESR community: other portals or infrastructures can be integrated analogously to EGEE
• On other disciplines: built on international standards and thus easily adaptable and expandable by other disciplines and by further partners
Next steps
• Expand the demonstrated prototype to a reliable and stable system
• Port further workflows and some pre-processing functionalities to EGEE
• Enlarge the user community
Future challenges or missing bricks
• Establish a comprehensive and consistent security context to control access to (restricted) data with a single sign-on
– C3Grid starts to implement a federated AA infrastructure based on Shibboleth
• Describe analysis services to improve possibilities for discovery, use and sharing
– First approaches to adapt ISO 19119/19139 as a common metadata format for tool description
• Modularize workflows to increase flexibility and enable intelligent scheduling
– First steps to implement a workflow information service