AR5 Data and Product Access Architecture

Concepts for Discussion

Steve Hankin (NOAA/PMEL)

(Not including metadata architecture or security)


June '07, GO-ESSP

You’ve just heard Bryan’s thoughts on requirements (which probably resemble the following)

– User needs, by IT sophistication level (WG*)
  – WG1: physical processes
    – Raw files (on native grids)
    – CF subsets (potentially large, e.g. global)
      – Native grid and regridded
    – Broad range of analyses (scope tbd by science community)
    – Intercomparison on hi-res global fields
    – Visualizations, tables, animations, …
  – WG2,3: regional impacts on life and societies; mitigation
    – CF subsets (regional)
    – Basic analysis (e.g. area averages, extrema)
    – Intercomparison on regional scale
    – Visualizations, tables
      – tab-delimited (“Excel”)
      – viz on globe (e.g. Google Earth), animations, …


Requirements, cont’d

– Provider needs, by IT capabilities level
  – 20-30 (est. 28?) contributing orgs
  – Some providers not able to serve own data
  – Deployable AR5 components (if any) must install easily at various infrastructures
  – User authentication/access control
– Data volumes
  – 200+ TB (ESG proposal) to 20,000 TB (Bryan)


How AR4 did it
– Central DB
– Data sent on hard drives by postal service
– All data regridded to same grid
– QC via CMOR, run at sites (scalable)
– Some central analysis (summaries)
– Massive data distribution from a central point

AR4 Data Base:

• 30 Tbyte data collection

• 61,000 files


AR4 stumbling blocks

Show stoppers:
– Some ocean models could not be regridded to the AR4 grid without information loss (solved?)

Difficulties:
– Unreliable disk drives
– Headache to match CMOR requirements
– No doubt many other war stories …

Could we adapt the AR4 approach to AR5?

ESG proposal asserts, “No.”

“With an increasing number of users and an increasing quantity of data, it will no longer be feasible to carry out the requirements of AR5 with the centralized data management strategy utilized for AR4.”

Well, that’s the party line, anyway.

Assertion: if necessary, a centralized solution is again possible


Centralized approach

Ship disks again
– Disk drives today: $250 = 500 Gbytes
– By AR5 time (24 months?), say, 2-5 Tbytes of disk could reasonably be mailed from each modeling site
– With insistence on a standard drive model, might retain data on original disks
– Up to 150 Tbyte by this means
– Who would step forward to take this burden?
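A back-of-envelope check of the disk-shipping numbers, as a minimal Python sketch. It assumes 30 contributing sites (the top of the 20-30 estimate) and a flat price derived from the $250 per 500 Gbytes figure; disk prices by AR5 time would likely be lower, so treat the cost as an upper-end estimate.

```python
# Back-of-envelope sizing for the "ship disks again" option.
# ASSUMPTIONS: 30 contributing sites (top of the 20-30 estimate) and a
# flat disk price derived from "$250 = 500 Gbytes".

PRICE_PER_GB = 250 / 500  # dollars per GB, 2007 price point


def shipped_volume_tb(n_sites: int, tb_per_site: float) -> float:
    """Total Tbytes collected centrally if every site mails tb_per_site."""
    return n_sites * tb_per_site


def disk_cost_dollars(volume_tb: float, price_per_gb: float = PRICE_PER_GB) -> float:
    """Raw disk cost at a flat price per GB (decimal units: 1 TB = 1000 GB)."""
    return volume_tb * 1000 * price_per_gb


if __name__ == "__main__":
    for tb_per_site in (2, 5):
        total = shipped_volume_tb(30, tb_per_site)
        print(f"{tb_per_site} TB/site x 30 sites = {total:.0f} TB, "
              f"about ${disk_cost_dollars(total):,.0f} of disk")
```

With 5 Tbytes per site this reproduces the 150 Tbyte ceiling, at roughly $75,000 of raw disk at 2007 prices; the sketch ignores spares, enclosures and postage.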


Centralized approach

All data regridded to standard grid
– Accept a sub-optimal resolution, but add
  – GODAE-style hi-res fields (surface-only, selected sections and time series, etc.)
  – Hi-res analysis results, e.g. vertical integrals
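Pre-computing hi-res analysis results means reducing on the native grid before regridding. A minimal sketch of one such reduction, a layer-weighted vertical integral of a single model column, assuming the layer thicknesses are known; the function names and sample numbers are illustrative, not from any AR4/AR5 tool.

```python
from typing import Sequence


def vertical_integral(values: Sequence[float], thicknesses: Sequence[float]) -> float:
    """Approximate the vertical integral of one column as sum(value_k * dz_k)."""
    if len(values) != len(thicknesses):
        raise ValueError("values and thicknesses must have the same length")
    return sum(v * dz for v, dz in zip(values, thicknesses))


def vertical_mean(values: Sequence[float], thicknesses: Sequence[float]) -> float:
    """Thickness-weighted vertical average: integral divided by total depth."""
    return vertical_integral(values, thicknesses) / sum(thicknesses)


if __name__ == "__main__":
    # Illustrative column: three layers, values on the model's native levels.
    temps = [20.0, 15.0, 10.0]
    dz = [10.0, 40.0, 50.0]
    print(vertical_integral(temps, dz), vertical_mean(temps, dz))
```

For values 20, 15 and 10 in layers 10, 40 and 50 m thick, the integral is 1300 and the thickness-weighted mean is 13. Applied column by column, only the 2-D result would need to be stored centrally.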


Could we adapt the AR4 approach to AR5?

Major burdens on [whatever] host organization
– Financial
– Sysadmin headaches
– Network loads
– IO loads from subsetting

Compromises in the flexibility of analyses (due to pre-computed fields)

But it could work …


Why make this point?

The IT challenges that we are debating are an opportunity to demonstrate a new way of doing things
– The risk is that we disappoint ourselves (as much as AR5 science)

What we want to demonstrate:
– A “data grid”: a scalable, distributed approach
– The potential of IT to improve how science is done
– Enhanced collaboration


Time Tables

Distributed technology has to be demonstrated in time for AR5 planners to make decisions.

18 months from now (“early 2009” in the SciDAC proposal) for a functioning testbed

– Conclusions:
  – Few (if any) new “standards” can be considered. Must work with the ones we have.
  – Consider areas in need of further standardization as testing opportunities
  – Code components should be running at least at a BETA level by (?when? 12 months?) [group sense?]

Proposal: ESG Data and Product Access Stack

Layers and their services (protocols):
– netCDF-CF files: FTP
– atomic datasets (aggregations): OPeNDAP & WCS
– analyses (incl. regridding): OPeNDAP & WCS (*)
– products (viz, etc.): multiple (**)

* analysis embedded in URL. No syntax standard. (F-TDS?)
** LAS request protocol; TDS/netCDF “fileout”; WMS?
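At the atomic-dataset layer, a subset request is an OPeNDAP URL carrying a DAP2 hyperslab constraint (`var[start:stride:stop]`, one bracket per dimension). A minimal sketch of building one; the server path and variable name are hypothetical, and `.ascii` is just one of the DAP2 response types (`.dods` returns binary).

```python
from typing import Sequence, Tuple


def opendap_subset_url(dataset_url: str, variable: str,
                       slices: Sequence[Tuple[int, int, int]],
                       suffix: str = ".ascii") -> str:
    """Build a DAP2 request for a hyperslab of one variable.

    Each slice is (start, stride, stop) with an inclusive stop index, one per
    dimension in the variable's own dimension order. suffix selects the
    response type (".ascii" for text; ".dods" for binary DAP2).
    """
    parts = []
    for start, stride, stop in slices:
        if stride < 1 or start < 0 or stop < start:
            raise ValueError(f"bad slice {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return f"{dataset_url}{suffix}?{variable}{''.join(parts)}"


if __name__ == "__main__":
    # Hypothetical dataset: first 12 time steps, a lat/lon window, every 2nd lon.
    print(opendap_subset_url("http://server/thredds/dodsC/model_a/tas", "tas",
                             [(0, 1, 11), (10, 1, 20), (30, 2, 60)]))
```

Some HTTP clients require the brackets to be percent-encoded before the request is sent.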


ESG Data and Product Access Stack

Layers and the products each offers:
– netCDF-CF files: raw files
– atomic datasets (aggregations): desktop access & subsets
– analyses (incl. regridding): desktop access & subsets
– products (viz, etc.): visualizations, tables & scripts


Data suppliers

[Figure: two gateway nodes and three data nodes, linked over the internet]


How to distribute the layers on the nodes?

Size of single data requests, by layer:
– netCDF-CF files / raw files: O(1TB)
– atomic datasets (aggregations) / desktop access & subsets: O(10GB)
– analyses (incl. regridding) / desktop access & subsets: O(0.1-10GB)
– products (viz, etc.) / visualizations, tables & scripts: O(1-10MB)

Which operations are feasible over the internet?
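One rough way to answer the feasibility question is to divide each request size by the available bandwidth. A minimal sketch, assuming an ideal sustained 100 Mbit/s link (an illustrative figure, not a measurement) with no protocol overhead; the 60-second "interactive" cut-off is likewise arbitrary.

```python
# Ideal transfer times for the single-request sizes on the slide.
# ASSUMPTION: a sustained 100 Mbit/s link with no protocol overhead.

SIZES_BYTES = {
    "netCDF-CF files, O(1TB)": 1e12,
    "atomic datasets, O(10GB)": 10e9,
    "analyses, upper end of O(0.1-10GB)": 10e9,
    "products, upper end of O(1-10MB)": 10e6,
}


def transfer_seconds(size_bytes: float, link_mbit_per_s: float) -> float:
    """Seconds to move size_bytes over a link of link_mbit_per_s megabits/s."""
    return size_bytes * 8 / (link_mbit_per_s * 1e6)


def feasible_interactively(size_bytes: float, link_mbit_per_s: float,
                           max_seconds: float = 60.0) -> bool:
    """True if the request would finish within max_seconds (an arbitrary cut-off)."""
    return transfer_seconds(size_bytes, link_mbit_per_s) <= max_seconds


if __name__ == "__main__":
    for layer, size in SIZES_BYTES.items():
        secs = transfer_seconds(size, 100)
        print(f"{layer}: {secs / 3600:.2f} h, interactive: "
              f"{feasible_interactively(size, 100)}")
```

Under these assumptions an O(1TB) raw-file request takes about 22 hours, an O(10GB) subset about 13 minutes, and an O(10MB) product under a second: the kind of contrast behind deploying layers by output size.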


Proposed deployment of stack layers based on output sizes
– Gateway node: netCDF-CF files; atomic datasets (aggregations); analyses (incl. regridding); products (viz, etc.)
– Data node: netCDF-CF files; atomic datasets (aggregations); analyses (incl. regridding)

Server-side analysis

[Figure: two nodes (“any node”), each running the netCDF-CF files, atomic datasets (aggregations) and analyses (incl. regridding) layers; steps labeled “Regrid” and “Difference” operate across them]

Differencing: a standard analysis operation (and a perennial issue for model intercomparisons)

[Figure: a gateway node running all four layers, including products (viz, etc.), alongside nodes (“any node”) running the lower three layers; steps labeled “Regrid” and “Difference”]

Differencing: also doable in the product layer
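Whichever layer does the work, regrid-then-difference has the same shape: interpolate model B onto model A's grid, then subtract. A minimal one-dimensional sketch; it swaps in plain linear interpolation for brevity, whereas real model intercomparison usually calls for conservative regridding (compare the information-loss problem noted for AR4 ocean models). Grids and values are made up.

```python
from bisect import bisect_right
from typing import List, Sequence


def regrid_linear(src_x: Sequence[float], src_v: Sequence[float],
                  dst_x: Sequence[float]) -> List[float]:
    """Linearly interpolate src_v (given at increasing src_x) onto dst_x.

    Points outside the source range take the nearest end value.
    """
    out: List[float] = []
    for x in dst_x:
        if x <= src_x[0]:
            out.append(src_v[0])
        elif x >= src_x[-1]:
            out.append(src_v[-1])
        else:
            i = bisect_right(src_x, x)  # src_x[i-1] <= x < src_x[i]
            x0, x1 = src_x[i - 1], src_x[i]
            v0, v1 = src_v[i - 1], src_v[i]
            w = (x - x0) / (x1 - x0)
            out.append(v0 + w * (v1 - v0))
    return out


def difference_on_grid(a_x: Sequence[float], a_v: Sequence[float],
                       b_x: Sequence[float], b_v: Sequence[float]) -> List[float]:
    """Regrid model B onto model A's grid, then return A minus B."""
    b_on_a = regrid_linear(b_x, b_v, a_x)
    return [va - vb for va, vb in zip(a_v, b_on_a)]


if __name__ == "__main__":
    print(difference_on_grid([0, 1, 2, 3], [10, 11, 12, 13], [0, 2, 4], [10, 14, 18]))
```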


An Existing Implementation
– netCDF-CF files and atomic datasets (aggregations): TDS (w/ HYRAX?)
– analyses (incl. regridding): F-TDS, a TDS plug-in (“F” for Ferret, but applicable to other legacy apps, too)
– products (viz, etc.): LAS (using Ferret, CDAT and other legacy apps)

F-TDS

[Figure: F-TDS plugs into TDS through the IOServiceProvider interface and hands requests to Ferret (or another legacy app.)]

http://server/_expr_{levitus}{Tave=TEMP[Z=@AVE]}

http://server/_expr_{model(s)}{<expression>}
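A small helper that assembles URLs in the `_expr_{…}{…}` form shown above. It is a sketch, not F-TDS code: joining several datasets with commas is an assumption of this example (the slide only says `{model(s)}`), and whether a given server wants the braces percent-encoded is also an assumption, so encoding is left optional.

```python
from typing import Sequence
from urllib.parse import quote


def ftds_expr_url(server: str, datasets: Sequence[str], expression: str,
                  encode: bool = False) -> str:
    """Assemble server/_expr_{datasets}{expression}.

    ASSUMPTION: multiple datasets are joined with commas.
    """
    path = "_expr_{" + ",".join(datasets) + "}{" + expression + "}"
    if encode:
        # Percent-encode braces, brackets, '=' and '@'; keep '/' as-is.
        path = quote(path, safe="/")
    return server.rstrip("/") + "/" + path


if __name__ == "__main__":
    print(ftds_expr_url("http://server", ["levitus"], "Tave=TEMP[Z=@AVE]"))
```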

Data provider supplies own regridding and analysis tools.

[Figure: several Java servers, each backed by a different analysis tool: CDAT, Ferret, Matlab]

(We need to standardize an analysis expression language.)
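To illustrate what a standardized expression language would buy, here is a sketch that parses the one expression form used in the F-TDS example URL (`name=VAR[AXIS=@OP]`) into a backend-neutral record, which each provider's tool could then translate into its own syntax. The grammar covers only this single reduction form and is hypothetical; it is not a proposed standard.

```python
import re
from typing import NamedTuple


class Reduction(NamedTuple):
    """Backend-neutral form of one 'name=VAR[AXIS=@OP]' expression."""
    result: str    # name of the derived variable, e.g. "Tave"
    variable: str  # input variable, e.g. "TEMP"
    axis: str      # axis letter: X, Y, Z or T
    op: str        # reduction keyword, e.g. "AVE"


# HYPOTHETICAL mini-grammar covering a single reduction along one axis.
_REDUCTION = re.compile(r"^(\w+)=(\w+)\[([XYZT])=@(\w+)\]$")


def parse_reduction(expr: str) -> Reduction:
    """Parse a single-axis reduction; raise ValueError for anything else."""
    match = _REDUCTION.match(expr.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    return Reduction(*match.groups())


def to_ferret(r: Reduction) -> str:
    """Render back to the Ferret-style syntax used in the F-TDS URL."""
    return f"{r.result}={r.variable}[{r.axis}=@{r.op}]"


if __name__ == "__main__":
    print(parse_reduction("Tave=TEMP[Z=@AVE]"))
```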

LAS Architecture (v7)

[Figure: UI → LAS API → Product Server (workflow orchestration, driven by metadata) → back end requests (SOAP) through a Service API and service proxy → backend services: TDS/OPeNDAP (netCDF files), JDBC (SQL database), legacy CDAT, legacy Ferret, GIS services; metadata held in XML]
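The orchestration idea can be sketched as a product server that runs a named chain of backend services, each taking and returning a request state. In LAS v7 those calls go over SOAP to separate backend services; this sketch uses local Python functions instead, and every class, method and product name in it is hypothetical, not the LAS API.

```python
from typing import Callable, Dict, List

# A backend service takes a request state and returns an updated copy.
Service = Callable[[dict], dict]


class ProductServer:
    """Toy workflow orchestrator: each product is a named chain of services."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}
        self._workflows: Dict[str, List[str]] = {}

    def register_service(self, name: str, service: Service) -> None:
        self._services[name] = service

    def define_product(self, product: str, chain: List[str]) -> None:
        missing = [s for s in chain if s not in self._services]
        if missing:
            raise KeyError(f"unknown services: {missing}")
        self._workflows[product] = list(chain)

    def run(self, product: str, request: dict) -> dict:
        state = dict(request)  # never mutate the caller's request
        for name in self._workflows[product]:
            state = self._services[name](state)
        return state


if __name__ == "__main__":
    ps = ProductServer()
    # Stub services standing in for, e.g., an OPeNDAP fetch and an analysis step.
    ps.register_service("fetch", lambda r: {**r, "data": [1, 2, 3]})
    ps.register_service("mean", lambda r: {**r, "mean": sum(r["data"]) / len(r["data"])})
    ps.define_product("area_average", ["fetch", "mean"])
    print(ps.run("area_average", {"var": "tas"}))
```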


Desktop: Matlab, IDL, IDV, Ferret, GrADS, …

Information products: netCDF, ASCII, GIS layers
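For the WG2,3 requirement of tab-delimited (“Excel”) output, an ASCII product can be as simple as a two-column time series. A minimal sketch using Python's standard `csv` module; the column names and sample values are illustrative.

```python
import csv
import io
from typing import Sequence, Tuple


def to_tab_delimited(times: Sequence[str], values: Sequence[float],
                     header: Tuple[str, str] = ("time", "value")) -> str:
    """Return a tab-separated two-column table that spreadsheets can open."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for t, v in zip(times, values):
        writer.writerow([t, v])
    return buf.getvalue()


if __name__ == "__main__":
    # Illustrative regional area-average series.
    print(to_tab_delimited(["2001-01", "2001-02"], [1.5, 2.0]), end="")
```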


What products should AR5 offer?

A matter of policy tbd:
– Each gateway node offers distinct products (CDAT, NCL, BADC, Ferret, Matlab, …)
  or
– Standard set of products
  or
– Some combination of these


One style of user experience: access to native coordinates and regridded fields


Large subsets may be created in batch mode

Visual model intercomparison


Segue from browser to desktop


Plot on Google Earth
• Fine structure materializes as we zoom in

Display to Google Earth?
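Displaying a plot in Google Earth typically means wrapping the rendered image in KML. A minimal sketch that emits a KML 2.2 `GroundOverlay` pinning an image to a latitude/longitude box; the image URL and bounds are hypothetical, this is not necessarily how LAS itself produces Google Earth output, and boxes that cross the date line are not handled.

```python
from xml.sax.saxutils import escape


def ground_overlay_kml(name: str, image_href: str, north: float, south: float,
                       east: float, west: float) -> str:
    """Wrap a rendered plot image in a KML 2.2 GroundOverlay document."""
    if not -90 <= south < north <= 90:
        raise ValueError("need -90 <= south < north <= 90")
    if not -180 <= west < east <= 180:
        raise ValueError("need -180 <= west < east <= 180 (no date-line crossing)")
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <GroundOverlay>
    <name>{escape(name)}</name>
    <Icon><href>{escape(image_href)}</href></Icon>
    <LatLonBox>
      <north>{north}</north>
      <south>{south}</south>
      <east>{east}</east>
      <west>{west}</west>
    </LatLonBox>
  </GroundOverlay>
</kml>
"""


if __name__ == "__main__":
    print(ground_overlay_kml("tas difference", "http://server/plot.png",
                             10.0, -10.0, 40.0, 20.0))
```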


An AR5-wide UI through HTML smoke and mirrors (“sister servers”)

[Figure: a user's browser (Netscape) sees a single LAS user interface acting as a VIRTUAL server; behind it, LAS sites 1-4 each hold their own data and metadata (“Meta”), and the metadata from each site feeds the shared LAS user interface]
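The “sister servers” idea can be sketched as a virtual catalog: merge the dataset lists from each LAS site into one index, remember which site holds each dataset, and route requests there so the data never moves. All names, URLs and the request path below are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class DatasetEntry(NamedTuple):
    dataset_id: str
    title: str
    home_server: str  # base URL of the LAS site that actually holds the data


def merge_catalogs(site_catalogs: Dict[str, List[Tuple[str, str]]]) -> Dict[str, DatasetEntry]:
    """Build one virtual catalog from {server_url: [(dataset_id, title), ...]}."""
    merged: Dict[str, DatasetEntry] = {}
    for server, entries in site_catalogs.items():
        for dataset_id, title in entries:
            if dataset_id in merged:
                raise ValueError(f"dataset id {dataset_id!r} offered by two sites")
            merged[dataset_id] = DatasetEntry(dataset_id, title, server)
    return merged


def route_request(catalog: Dict[str, DatasetEntry], dataset_id: str, query: str) -> str:
    """Point a request at the dataset's home site (query path is hypothetical)."""
    entry = catalog[dataset_id]
    return f"{entry.home_server.rstrip('/')}/{query}"


if __name__ == "__main__":
    cat = merge_catalogs({
        "http://site1/las": [("model_a_tas", "Model A surface air temperature")],
        "http://site2/las/": [("model_b_tas", "Model B surface air temperature")],
    })
    print(route_request(cat, "model_b_tas", "product?var=tas"))
```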


Discussion (Thank you)