48
SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson, Benyang Tang, Gerald Manipon, Dominic Mazzoni, Amy Braverman, and Eric Fetzer Jet Propulsion Laboratory Do multi-instrument science by authoring a dataflow doc. for a reusable operator tree. Access scientific data by naming it.

SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

Embed Size (px)

Citation preview

Page 1: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFloTM: Scientific Knowledge Creation on the Grid Using a Semantically-Enabled

Dataflow Execution Environment

Brian Wilson, Tom Yunck, Elaine Dobinson, Benyang Tang, Gerald Manipon, Dominic Mazzoni, Amy Braverman, and Eric Fetzer Jet Propulsion Laboratory

Do multi-instrument science by authoring a dataflow doc. for a reusable operator tree. Access scientific data by naming it.

Page 2: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 2ESIP Federation Meeting, Jan. 4-6, 2005

GENESIS Project: Multi-Instrument Atmospheric Science

Characterize Earth’s varied behavior

Understand the Earth as an integrated system

PredictEarth’s response to complex forcings

Page 3: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 3ESIP Federation Meeting, Jan. 4-6, 2005

iEarth Vision will be enabled by the open-source SciFlo Engine.

Automate large-scale, multi-instrument science processing by authoring a dataflow document that specifies a tree of executable operators.

iEarth Visual Authoring Tool Distributed Dataflow Execution Engine Move operators (executables) to the data. Built-in reusable operators provided for many tasks such as

subsetting, co-registration, regridding, data fusion, etc. Custom operators easily plugged in by scientists. Leverage convergence of Web Services (SOAP) with Grid

Services (Globus v3.2).

Hierarchical namespace of objects, types, & operators. sciflo.data.EOS.AIRS.L2.atmosphericParameters sciflo.operator.EOS.coregistration.PointToSwath

Carbon Cycle

GENESIS SciFlo Engine

Page 4: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 4ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle

Distributed Computing Using SciFlo

Inject data query or flow execution requestinto SciFlo network from any node.

Front-endReceives Requests

Queue ofPending Flows

SOAP

Master ServerPlans

Execution

SciFlo Server at JPL

Parallel ExecutionEngine

SOAP

SOAP

SciFlo ServerAt Data Portal

RedirectFlow

GENESIS ESIP

Co-registrationSOAP Call

Goddard DataSubsetting

SciFlo ServerAt Langley

SOAPQuery

Execute

Grid-EnabledComputing

Cluster

CondorDAGMan

SubFlow

Page 5: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 5ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle Enabling Technologies

Web Services: SOAP Grid Services: OGSI & Globus v3.2 Parallel dataflow engines Semantic Web: OWL inference using metadata

SciFlo Distributed Dataflow System Loosely-coupled distributed computing using Web (SOAP)

and Grid services Specifying a processing stream as an XML document Dataflow engine for automated execution and load balancing

Multi-Instrument Earth Science Motivating Example: Compare the temperature & water vapor

profiles retrieved from AIRS (Atmospheric Infrared Sounder) swaths and GPS limb soundings.

Outline

Page 6: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 6ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle SOAP-based Web Computing & Semantic Web Exchange structured data in XML format (not HTML) Semantics or “meaning” kept with the data Emphasize programmatic interfaces Web (Grid) Services Leverage WS-Security and other WS-* standards

Simple Object Access Protocol (SOAP) Distributed Computing by Exchange of XML Messages Lightweight, Loosely-Coupled API Programming language independent Multiple Transport Protocols Possible (HTTP, P2P) Web Services Description Language (WSDL) Publish Services in catalogs for automated discovery

Third Generation of the Web

Page 7: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 7ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle

Evolving Grid Computing Standards

[From Globus Toolkit “Ecosystem”presentation at GGF11 by Lee Liming]

Page 8: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 8ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle History of Scientific Computing as a Utility

The Grid began as effort to tightly couple multiple super- or cluster computers together (e.g., Globus Toolkit v1 & v2).

Needed job scheduling, submission, monitoring, steering, etc. SETI@HOME success

OGSI: Open Grid Services Infrastructure WS-Resource Framework (WSRF): Capabilities treated as

storage or computing resources exposed on the web. Globus v3.2 is open-source implementation using Java/C. A service is Grid-enabled by inheriting from Java class. Standard is complex and growing. Challenge: Ease of installation & use.

SciFlo is a lighter weight peer-to-peer (P2P) approach.

Evolving Grid Computing Standards

Page 9: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 9ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle

Evolving Grid Computing Standards

[From Globus Toolkit “Ecosystem”presentation at GGF11 by Lee Liming]

Page 10: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 10ESIP Federation Meeting, Jan. 4-6, 2005

Grid Apps Using Globus Middleware

WebBrowser

ComputeServer

GlobusMCS/RLS

DataViewer

Tool

CertificateAuthority

CHEF ChatTeamlet

MyProxy

CHEF

ComputeServer

Resources implement standard access & management interfaces

Collective services aggregate &/or

virtualize resources

Users work with client applications

Application services organize VOs & enable

access to other services

Databaseservice

Databaseservice

Databaseservice

SimulationTool

Camera

Camera

TelepresenceMonitor

Globus IndexService

GlobusGRAM

GlobusGRAM

GlobusDAI

GlobusDAI

GlobusDAI

Application Developer

2

Off the Shelf 9

Globus Toolkit

4

Grid Community

4

Page 11: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 11ESIP Federation Meeting, Jan. 4-6, 2005

Planned Components in GT 4.0

GSI

WS-Security

CAS(WSRF)

SimpleCA

Data Managemen

tSecurity

WSCore

Resource Managemen

t

Information Services

Authz Framework

RFT(WSRF)

RLS

OGSI-DAI

New GridFTP

XIO

JAVAWS Core(WSRF)

C WS Core(WSRF)

MDS2

WS-Index(WSRF)

Pre-WSGRAM

WS-GRAM(WSRF)

CSF(contribution)

pyGlobus(contributed)

Page 12: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 12ESIP Federation Meeting, Jan. 4-6, 2005

Page 13: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 13ESIP Federation Meeting, Jan. 4-6, 2005

WS-* Alphabet Soup

Page 14: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 14ESIP Federation Meeting, Jan. 4-6, 2005

WS Architecture

Page 15: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 15ESIP Federation Meeting, Jan. 4-6, 2005

WS-Resource Framework Model

Page 16: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 16ESIP Federation Meeting, Jan. 4-6, 2005

Using a Web Service to Access a WS-Resource

Page 17: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 17ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle

Evolving Grid Computing Standards

[From Globus Toolkit “Ecosystem”presentation at GGF11 by Lee Liming]

Page 18: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 18ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle Grid:

Schedule & submit cluster computing jobs Operator tree is a Directed Acyclic Graph (DAG) CONDOR, CONDOR-G, DAGMan Globus Alliance Standards: GSI, GRAM, MDS, RLS, XIO, etc. Chimera -> Pegasus -> DAGMan -> Executing Grid Job

Web: Several web choreography standards IBM’s Business Process Execution Language (BPEL4WS)

Less convergence here than in OGSI/WSRF Marketplace winners? 10 workflow groups spoke at Global Grid Forum (GGF) meeting Sciflo will use some Globus capabilities via python bindings

(pyGlobus).

Dataflow / Workflow Engines

Page 19: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 19ESIP Federation Meeting, Jan. 4-6, 2005

Abstract (skeleton) Workflow is more easily authored.

Trivial data format & unit conversions auto-inserted.

As toolbox of known & reliable operators grows, even complex ops like regridding become trivial.

Could use other backends if desired (BPEL, DAGMan).

Elaborating Workflow Documents

Page 20: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 20ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle History

DAML: DARPA Agent Markup Language OWL: Ontology Web Language (from DAML+OIL) Numerous inference engines & ontologies being developed

Semantics for Web Services OWL-S: OWL-based Web service ontology Describe properties & capabilities in computer-interpretable form

(beyond WSDL).

SciFlo Semantics & Inference Use WSDL+ & OWL-S to describe local operators (executables),

remote services, & grid computing jobs. Discover & select operators to fill in missing steps in a dataflow.

Semantic Web

Page 21: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 21ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle

SciFlo’s Strength Lies in Combining Many Elements into a Single Open-Source System

Abstract XML dataflow documents translated to concrete flows. Parallel dataflow execution engine Semantic inference using XML metadata Move operators to the data. SOAP architecture, but also P2P functionality. Every node is both client & server; easy node replication. One-click installation onto server or desktop nodes. Initiate grid computations from your desktop.

Access data objects by naming them! P2P Distributed Namespace of data sources & operators

Server architecture Group of interacting SOAP services (replaceable modules) Implementation in XML, python, & C/C++ (not Java)

Strength in Numbers: Let a million nodes bloom!

Page 22: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 22ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle

P2P Server Infrastructure Distributed catalogs of data sources and operator bundles. Permanent hierarchical names for data & operators. Data & operator movement handled automatically by server.

SciFlo Execution Steps: Validate: Parse & validate XML doc. against schema. Queue: Push flow into an execution queue if slaves busy. Embellish: Infer concrete workflow from abstract workflow; insert

simple unit or format conversions, or more complex operators. Plan/Schedule: Construct execution plan & annotate flow

document accordingly. Start Execution: Parallel execution using master & slave servers. Forward: Forward partially evaluated flow to another server for

load balancing & locality optimization. Freeze/Thaw: Store execution state while waiting; wake up when

long-running operators complete. Deliver: Return URLs pointing to results (or fault) to SciFlo client,

which then pulls output files (via http, sftp, or GridFTP).

Server Implementation

Page 23: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 23ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle Parallelism at many levels during execution:

Create multiple processes on a single server node. Create multiple threads within each process to execute, monitor,

& respond to status queries. Launch a sub-flow or operator on a slave node within the local

computer cluster. Invoke a remote operator via a SOAP or http CGI call and wait for

results. Redirect a flow or sub-flow to a server that can access a large

data source locally. Partially evaluate a flow and then redirect. Launch a large, CPU-intensive operator on a Grid-enabled

cluster or supercomputer (Sciflo and Grid interoperability). Invoke the same flow multiple times if the source operators(s)

yield multiple input objects (implicit file parallelism). Switch execution between active flows while waiting for results

from “frozen” flows.

Parallel, Asynchronous Dataflow Execution

Page 24: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 24ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle Permanent Hierarchical Names (“Holy Grail”)

Naming Authority assigned at each namespace level Distributed P2P namespace (P2P catalog lookup)

Proper Names AIRS Level2 Parameter Retrieval Dataset (granules):

sciflo.data.EOS.AIRS.L2.atmosphericParameters (or metadata) Generic Point-To-Swath Co-registration Operator:

sciflo.operator.EOS.coregistration.PointToSwath

Generic Names Atmospheric Temperature Data:

sciflo.data.atmosphere.temperature.profile (or .grid)

Name resolves to list of EOS datasets

Semantics attached (3DGeoParameterGrid of temperature)

Data Access by Naming

Page 25: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 25ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle

AIRS/GPS Co-registration: Point to Swath

AIRS Level2Swaths over Pacific

GPS Level2Profile Locations

Page 26: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 26ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle getAirsMetaData(startTime, endTime, lat, lon)

Returns granule ID’s and geolocation info. for AIRS swaths that intersect lat/lon point.

getAirsUrlsFromGranuleIds(idList or xmlString) Redirection server – Returns URLs pointing to AIRS granules in

local cache or DataPool (ftp site at DAAC).

getGpsMetaData(startTime, endTime, NW, SW, SE, NE) Returns occultation ID’s and geolocation info. for GPS limb scans

that are inside a lat/lon region.

getGpsProfiles(idList or xmlString) Returns URL pointing to netCDF file containing GPS profiles for

all occultation ID’s in the list.

getGpsProfile(occID) Returns a single GPS profile in XML format.

SciFlo Data Access (SOAP) Services

Page 27: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 27ESIP Federation Meeting, Jan. 4-6, 2005

SciFlo Data Query/Access (SOAP) Services• GeoLocationQuery(startTime, endTime, lat, lon, timeTolerance,

distanceTolerace, variable, metadataGroups)– Returns granule ID’s and geolocation info. for AIRS L2 swaths that

intersect lat/lon point or are near enough.

• GeoRegionQuery(startTime, endTime, lat/lon region, . . .)– Returns granule ID’s and geolocation info. for AIRS L2 swaths that

intersect the lat/lon region or are near enough.

• QueryByMetadata(variable, ListOfConstraints, groups)– Returns granule ID’s and selected metadata for AIRS granules that

satisfy the metadata constraint expression:(min1 <= field1 <= max1 and/or min2 <= field2 <= max2 . . .).

• GetDataById(IdList, UrlOptions)– Given list of unique ID’s, returns list of ftp, http, or DODS URL’s

pointing to the granules (files). Uses cache and redirection server.

• -- Other semantic interfaces possible

Page 28: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 28ESIP Federation Meeting, Jan. 4-6, 2005

SciFlo Data Query/Access (SOAP) Services

• GetMetadata(type, groups)– Returns metadata describing scientific domain, dataset info, list of

metadata fields, generic name translation table, location of XML schema documents, location of related ontologies (OWL/RDF), etc.

• GetHelp(type, groups)– A SOAP service that itself returns help documentation describing

the available SOAP services.

• --These Services Combined Can Support:– Data Catalog, General Query, Data Access, Generic Names,– Data Discovery by Domain and Dataset Keywords– Type checking via XML schemas– Hooks to semantic web

Page 29: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 29ESIP Federation Meeting, Jan. 4-6, 2005

AIRS versus GPS Flowchart (part 1)

Page 30: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 30ESIP Federation Meeting, Jan. 4-6, 2005

AIRS versus GPS Flowchart (part 2)

Page 31: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 31ESIP Federation Meeting, Jan. 4-6, 2005

SciFlo Document

Page 32: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 32ESIP Federation Meeting, Jan. 4-6, 2005

AIRS & GPS Retrievals Matchup Demo

Interface: HTML web form auto- generated from XML dataflow doc.

Input: User enters start/end time & other co-registration criteria.

Flow Execution: Calls 4 SOAP data query services & total of 15 operators on 4 computers.

Page 33: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 33ESIP Federation Meeting, Jan. 4-6, 2005

AIRS & GPS Retrievals Matchup Demo

Results Page: Shows status updates during execution and then final results.

Caching: Reuse intermediate data products or force recompute.

Results: Merged data in netCDF file & plots as Flash movie.

Page 34: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 34ESIP Federation Meeting, Jan. 4-6, 2005

AIRS/GPS Matchups

Page 35: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 35ESIP Federation Meeting, Jan. 4-6, 2005

AIRS/GPS Temperature & Water Vapor Comparison Plots

Page 36: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 36ESIP Federation Meeting, Jan. 4-6, 2005

AIRS/GPS Temperature & Water Vapor Comparison Plots

Page 37: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 37ESIP Federation Meeting, Jan. 4-6, 2005

General Features:

• High latitude measurements (~73N deg) separated by ~2 deg in longitude and ~0.5 hours

• Sharper tropopause measured by GPS. AIRS overestimates tropopause temperature by ~5 deg.

• GPS measure a wavy structure in the stratosphere which is absent in AIRS

Temperature as a function of pressure from AIRS (blue) and a coincident GPS occultation (red)

Page 38: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 38ESIP Federation Meeting, Jan. 4-6, 2005

AIRS/GPS Temperature Difference Statistics

Mean temperature profilefor 41 matchups, latitude

region -60 to -70 deg.,Jan-Apr, 2003

Page 39: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 39ESIP Federation Meeting, Jan. 4-6, 2005

Compare/Contrast Template• Services Provided• Implemented Standards / Protocols• Inputs, Outputs, & Parameters• Metadata• Maturity

– Number of Users, Data Volume• Support

– Level, Type, Availability• Open / Proprietary

– Open? Proprietary? Published? Costs?• Activities

– Interoperability?• Future Plans

– Services to be Implemented

Page 40: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 40ESIP Federation Meeting, Jan. 4-6, 2005

Services Provided by SciFlo• Data Query / Access (SOAP services)

• GPS Occultation Database (atmospheric retrievals)• AIRS, MODIS, & MISR L2/L3 granules

• Grid Workflow Engine– Abstract XML dataflow documents translated to concrete

flows, with missing steps inserted automatically.– Parallel dataflow execution engine (scheduler)– Semantic inference using XML metadata– Operators can be SOAP services, local scripts, or local

native executables.– Move authenticated operators/executables to the data.– SOAP architecture, but also P2P functionality.– Every node is both client & server; easy node replication.– One-click installation onto server or desktop nodes.– Support Linux & Windows.– Initiate grid computations from your desktop.

Page 41: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 41ESIP Federation Meeting, Jan. 4-6, 2005

Implemented Standards / Protocols• XML schema• SOAP, WSDL, UDDI• WS-Resource Framework

• WS-ResourceProperties, -ResourceLifetime, -BaseFaults

• SciFlo Documents:• Dataflow docs• Operator registration/description docs• All described by XML schema

• SciFlo Ontologies:• Meaning of datasets & physical variables• Meaning of transformations (operations)• Using OWL/RDF

Page 42: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 42ESIP Federation Meeting, Jan. 4-6, 2005

Inputs, Outputs, & Parameters • Too numerous to list.• Described in XML schema docs, WSDL files, and

OWL/RDF ontologies.

Page 43: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 43ESIP Federation Meeting, Jan. 4-6, 2005

Metadata • Same.• ??

Page 44: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 44ESIP Federation Meeting, Jan. 4-6, 2005

Maturity • SciFlo framework & dataflow execution engine will be

rolled out by next ESIP Fed. meeting.

Page 45: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 45ESIP Federation Meeting, Jan. 4-6, 2005

Support • GENESIS ESIP• Open-source community• ??

Page 46: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 46ESIP Federation Meeting, Jan. 4-6, 2005

Open / Proprietary • Open source

Page 47: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 47ESIP Federation Meeting, Jan. 4-6, 2005

Future Plans • 10,000 SciFlo nodes!

Page 48: SciFlo TM : Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment Brian Wilson, Tom Yunck, Elaine Dobinson,

SciFlo: Scientific Knowledge Flow

Wilson et al., 12/15/04 48ESIP Federation Meeting, Jan. 4-6, 2005

Carbon Cycle SciFlo’s Innovation Lies in Combining Many Elements

into a Single Open-Source System Abstract XML dataflow documents Semantic inference using XML metadata Parallel dataflow execution engine Move operators to the data. Every node is both client & server; easy node replication. SOAP architecture, but also P2P functionality. Initiate grid computations from your desktop.

Goal: SciFlo nodes inside all Science Data Centers

Multi-Instrument Earth Science Instrument Cross-Comparisons Multi-Instrument Science Portals Large-scale multivariate statistical studies and verification of

weather/climate models.

Summary