INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
Intelligent Distributed Data Management in
Earth System Science
S. Kindermann, DKRZ, Germany
K. Ronneberger, DKRZ, Germany
T. Brücher, University of Cologne, Germany
H. Ramthun, M&D, Germany
M. Stockhause, MPI-Met, IFM-Geomar, Germany
EGEE User Forum '07, Manchester
INFSO-RI-031688 Deutsches Klimarechenzentrum
Structure
• What is Earth System Science about?
  – Typical workflows
  – Traditional infrastructure
• Why can grid technology help?
  – Limits of the current practice
• How do we use this technology?
  – Conceptual outline of the developing infrastructure
  – Outline of the developed prototype
• Potential impact and vision
  – Next steps and challenges
Motivation: Data in ESS
Model Output Data + Observation Data + Analysis Data
Scenario Data
Data related to geo-referenced physical variables
ESS Data Management Nowadays
Workflow steps: (1) Find & Select, (2) Collect & Prepare, (3) Analyse, (4) Visualize
Datasets in the diagram: Model Data, Observation Data, Scenario Data, Analysis Dataset, Result Dataset
"I want to correlate model data from DKRZ with observation data from DWD and satellite data from DLR"
(1) Find & Select:
• Contact each data provider
• Learn their data search utilities
• Find and select data
(2) Collect & Prepare:
• Get access rights for datasets at each data provider
• Learn their data access / preprocessing services
• Get access to sufficient storage facilities
• Trigger preprocessing and download data
(3) Analyse, at the central service provider:
• Start analysis tools
• Produce undocumented data
(4) Visualize:
• Copy to local resources
• Create visualization
"Has somebody done something similar to what I want? Can I reuse data for …?"
Bridging C3Grid and EGEE
Workflow steps: (1) Find & Select, (2) Collect & Prepare, (3) Analyse, (4) Visualize
Diagram labels: DKRZ, DWD; AWI, GKSS, …; World Data Centers; Analysis Dataset; Result Dataset; C3Grid Middleware
C3Grid:
• Standardized metadata description
• Uniform discovery of German data providers
• Uniform data access
• Grid-based data delivery
EGEE:
• Established international collaboration platform
• Secure data management
• Data analysis and data sharing platform
Key component: (ISO) metadata catalog for ESS data in EGEE
C3 Grid and EGEE - the components
Workflow steps: Find & select, Collect & prepare, Analyse, Visualize
• Central web portal: single entry point to the common central metadata catalogue (Lucene index) and access facility
• Standardized metadata: hierarchical description of discovery aspects and some use aspects of the data (ISO 19115/ISO 19139)
• Standardized data request interface: hides the complexity of specific data access mechanisms and pre-processing functionality (web service technology)
• Automatic update and republishing of metadata: metadata of data processing is logged and managed, and can be harvested (AMGA + Java extension, OAI-PMH server)
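The first two components on this slide (standardized metadata and a central searchable catalogue) can be illustrated with a minimal Python sketch. It assumes a heavily reduced ISO 19139 record that carries only an identifier, a title and an abstract. The sample record, its identifier and all function names are hypothetical, and the small in-memory inverted index only stands in for the Lucene index named above; it is not the portal's actual implementation.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespaces used by the ISO 19139 XML encoding of ISO 19115 metadata.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, heavily reduced record; real records carry far more detail.
SAMPLE_RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>example-0001</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Specific humidity, model run A</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:abstract>
        <gco:CharacterString>Monthly mean specific humidity on pressure levels.</gco:CharacterString>
      </gmd:abstract>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def _text(root, path):
    """Return the stripped text at `path`, or an empty string if it is absent."""
    node = root.find(path, NS)
    return (node.text or "").strip() if node is not None else ""


def parse_record(xml_text):
    """Extract the few discovery fields this sketch cares about."""
    root = ET.fromstring(xml_text)
    ident = "gmd:identificationInfo/gmd:MD_DataIdentification/"
    return {
        "id": _text(root, "gmd:fileIdentifier/gco:CharacterString"),
        "title": _text(
            root,
            ident + "gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
        ),
        "abstract": _text(root, ident + "gmd:abstract/gco:CharacterString"),
    }


def _tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each word in title/abstract to the set of record ids containing it."""
    index = defaultdict(set)
    for rec in records:
        for word in _tokens(rec["title"] + " " + rec["abstract"]):
            index[word].add(rec["id"])
    return index


def search(index, query):
    """Return ids of records that contain every word of the query."""
    words = _tokens(query)
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


if __name__ == "__main__":
    catalogue = build_index([parse_record(SAMPLE_RECORD)])
    print(search(catalogue, "humidity monthly"))
```

Because every provider describes its data in the same schema, one parser and one index are enough for all of them, which is the point of the standardized metadata bullet.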
(1) EGEE and C3Grid: Discovery
Diagram components:
• EGEE: UI, SE, CE, WN, LFC Catalog, AMGA Metadata Catalog
• C3Grid: Web Portal C3, Lucene Index, C3Grid data interface, Workspace, Climate Data, Data Resource, Metadata
• Interfaces (each shown twice): Webservice Interface, OAI-PMH server
• Data providers: WDC Climate, WDC RSAT, WDC Mare, DWD, AWI, PIK, IFM-Geomar, MPI-Met, GKSS
Diagram steps:
(a) Publish (ISO 19115/19139)
(b) Harvest (OAI-PMH)
(f) Publish (ISO 19115/19139)
(g) Harvest (OAI-PMH)
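The harvest steps in this diagram use OAI-PMH, which is a plain HTTP protocol with XML responses. The following Python sketch shows how a harvester might page through `ListRecords` responses. The base URL, the `iso19139` metadata prefix, the record identifiers and the two canned response pages are assumptions for illustration (each repository advertises its own prefixes), and the `fetch` callable is injected so the sketch runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical canned responses standing in for a real OAI-PMH server.
PAGE_1 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:ds-1</identifier></header></record>
    <record><header><identifier>oai:example.org:ds-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

PAGE_2 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:ds-3</identifier></header></record>
    <resumptionToken/>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: follow-up
    requests carry only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        header.findtext("oai:identifier", default="", namespaces=OAI_NS)
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return identifiers, (token.strip() or None)


def harvest(base_url, metadata_prefix, fetch):
    """Collect all record identifiers, following resumption tokens."""
    harvested, token = [], None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        identifiers, token = parse_list_records(fetch(url))
        harvested.extend(identifiers)
        if token is None:
            return harvested


def fake_fetch(url):
    """Offline stand-in for an HTTP GET against the repository."""
    return PAGE_2 if "resumptionToken=" in url else PAGE_1


if __name__ == "__main__":
    print(harvest("http://example.org/oai", "iso19139", fake_fetch))
```

In a real deployment `fetch` would perform the HTTP request, and each harvested record body would be handed to the metadata catalogue or index.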
(1) EGEE and C3Grid: Data Discovery
(2) EGEE and C3Grid: Data Upload
Workflow steps covered: (1) Find & Select, (2) Collect & Prepare
Diagram components:
• EGEE: UI, SE, CE, WN, LFC Catalog, AMGA Metadata Catalog
• C3Grid: Web Portal C3, Lucene Index, C3Grid data interface, Workspace, Climate Data, Data Resource, Metadata
• Interfaces: Webservice Interface, OAI-PMH server
Diagram steps:
(a) Request (webservice)
(b) Retrieve (JDBC or archive)
(c) Stage & Provide
(d) Notify
(e) Request (webservice)
(f) Transfer & Register (lcg-tools)
(g) Register (Java-API)
(f) Publish (ISO 19115/19139)
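The upload steps in this diagram can be read as a small pipeline: the provider stages the requested data, the grid side copies it to a storage element, registers it in the file catalog, and registers its metadata. The Python sketch below walks through that order with in-memory dictionaries. Everything in it is a hypothetical stand-in: the class and function names, the `lfn:/grid/example-vo/` prefix and the `se.example.org` location are invented for illustration, and no real web service, lcg-tools or AMGA call is made.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    """In-memory stand-in for the EGEE side of the diagram."""
    storage: dict = field(default_factory=dict)           # SE: location -> payload
    file_catalog: dict = field(default_factory=dict)      # LFC: logical name -> location
    metadata_catalog: dict = field(default_factory=dict)  # AMGA: logical name -> metadata


class DataProvider:
    """In-memory stand-in for a C3Grid data provider."""

    def __init__(self, archive):
        self.archive = archive    # dataset id -> payload (database or archive)
        self.workspace = {}       # staging area visible to the grid side

    def request(self, dataset_id):
        # (a) Request arrives through the provider's web service interface.
        payload = self.archive[dataset_id]       # (b) Retrieve from the archive
        self.workspace[dataset_id] = payload     # (c) Stage & provide
        return dataset_id                        # (d) Notify (here: a return value)


def upload(provider, grid, dataset_id, metadata):
    """Run steps (a) to (g) for one dataset and return its logical file name."""
    staged_id = provider.request(dataset_id)

    # (e) The grid side is asked to fetch the staged data.
    payload = provider.workspace[staged_id]

    # (f) Transfer & register: copy to the SE and record the replica.
    lfn = "lfn:/grid/example-vo/" + dataset_id   # hypothetical naming scheme
    location = "se.example.org:" + dataset_id    # hypothetical storage location
    grid.storage[location] = payload
    grid.file_catalog[lfn] = location

    # (g) Register the discovery metadata alongside the file entry.
    grid.metadata_catalog[lfn] = dict(metadata, lfn=lfn)
    return lfn


if __name__ == "__main__":
    provider = DataProvider({"ds-1": b"example-bytes"})
    grid = Grid()
    name = upload(provider, grid, "ds-1", {"title": "Specific humidity"})
    print(name, grid.metadata_catalog[name])
```

Keeping the file catalog entry and the metadata entry keyed by the same logical name is what lets a later metadata harvest point back to data that actually exists on the grid.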
(3) EGEE and C3Grid: Data Analysis
Workflow steps covered: (3) Analyse, (4) Visualize
Diagram components:
• EGEE: UI, SE, CE, WN, LFC Catalog, AMGA Metadata Catalog
• C3Grid: Web Portal C3, Lucene Index, C3Grid data interface, Workspace, Climate Data, Data Resource, Metadata
• Interfaces: Webservice Interface, OAI-PMH server
• Job shown in the diagram: qflux
Diagram steps:
(a) Request (webservice)
(b) Submit (gLite)
(c) Retrieve (lcg-tools)
(d) Update (Java-API)
(e) Return graphic
(f) Publish (ISO 19115/19139)
(g) Harvest (OAI-PMH)
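The update step in this diagram addresses the "undocumented data" problem of the traditional workflow: each analysis result gets a metadata entry that records where it came from. The Python sketch below illustrates that idea with plain dictionaries. The function name, the catalog layout, the `processing` field and the toy `mean` tool are all assumptions made for illustration; a real run would submit a grid job and call the catalog's own API instead.

```python
def run_analysis(storage, file_catalog, metadata_catalog, input_lfn, tool_name, tool):
    """Apply `tool` to one registered dataset and document the result.

    storage:          location -> payload          (stand-in for the SE)
    file_catalog:     logical name -> location     (stand-in for the LFC)
    metadata_catalog: logical name -> metadata     (stand-in for AMGA)
    """
    # (b) Submit: a real workflow would submit a grid job; here the tool runs in-process.
    # (c) Retrieve: resolve the logical name and read the input data.
    data = storage[file_catalog[input_lfn]]
    result = tool(data)

    # Store and register the result next to its input (hypothetical naming).
    output_lfn = f"{input_lfn}.{tool_name}"
    location = f"se.example.org:{output_lfn}"
    storage[location] = result
    file_catalog[output_lfn] = location

    # (d) Update: log the processing step so the result is documented
    # and can later be published and harvested like any other dataset.
    parent = metadata_catalog.get(input_lfn, {})
    metadata_catalog[output_lfn] = {
        "title": f"{tool_name} of {parent.get('title', input_lfn)}",
        "source": input_lfn,
        "processing": list(parent.get("processing", [])) + [tool_name],
    }
    return output_lfn


if __name__ == "__main__":
    in_lfn = "lfn:/grid/example-vo/in"
    storage = {"se.example.org:in": [1.0, 2.0, 3.0]}
    file_catalog = {in_lfn: "se.example.org:in"}
    metadata_catalog = {in_lfn: {"title": "Specific humidity", "processing": []}}

    out_lfn = run_analysis(
        storage, file_catalog, metadata_catalog,
        in_lfn, "mean", lambda values: sum(values) / len(values),
    )
    print(metadata_catalog[out_lfn])
```

Because the processing history is carried forward from the parent entry, chaining several tools yields a complete lineage for the final result.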
(3) Example Workflow
• Example: Humidity flux (QFLUX)
Approach in international context
Compared projects: Earth System Grid project (USA), C3 Grid/(EGEE), NERC data grid (UK)
Scope (project):
• Earth System Grid project (USA): High performance access of climate model data
• C3 Grid/(EGEE): Uniform & effective discovery and access of data of various disciplines & types
• NERC data grid (UK): Harmonized & detailed search and access of data of various disciplines & types
Data stock (status):
• Earth System Grid project (USA): Homogeneous; flat-file storage
• C3 Grid/(EGEE): Heterogeneous; databases & flat-file storage
• NERC data grid (UK): Heterogeneous; databases & flat-file storage
Data description (solution):
• Earth System Grid project (USA): Use aspect of data, tools and models; e.g. NcML for netCDF data
• C3 Grid/(EGEE): Discovery and some use aspects; ISO 19115/ISO 19139
• NERC data grid (UK): Content of the data in great detail; semantic data model (CSML, based on GML)
Data access (solution):
• Earth System Grid project (USA): Different protocols; intelligence at portal
• C3 Grid/(EGEE): Uniform access interface; intelligence at data provider / grid
• NERC data grid (UK): Different protocols; link to data provider
Potential Impact
Potential impact on EGEE ESR community:
Provide a framework to easily and consistently exchange and manage ESR data and tools between EGEE and traditional earth science data storage systems.
Potential impact on international ESR community:
The approach is based on international standards (ISO 19139, OAI-PMH) and uniform interfaces (web services), so other data centers and infrastructures can be integrated uniformly.
Next steps
• Expand the demonstrated prototype to a reliable and stable system
• Port further workflows and some pre-processing functionality to EGEE
• Enlarge the user community
Future challenges or missing bricks
• Comprehensive and consistent security context to control access to (restricted) data with a single sign-on
  – Approach: federated AA infrastructure based on Shibboleth
• Analysis-service descriptions to improve discovery, use and sharing possibilities
  – Approach: adapt ISO 19119/19139 as a common metadata format for analysis-tool description
• Modularized workflows to increase flexibility and enable intelligent scheduling
  – Approach: implement a workflow information service
Thank You
kindermann @ dkrz.de