Peter Bajcsy, Rob Kooper, Luigi Marini, Barbara Minsker and Jim Myers

National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign (UIUC)

POC: Peter Bajcsy, email: pbajcsy@ncsa.uiuc.edu

CyberIntegrator: A Meta-Workflow System Designed for Solving Complex Scientific Problems using Heterogeneous Tools

Outline

• Problem Formulation
– Meta-Workflow Definitions
– Past Work

• Design
– Workflow Requirements Driven by Environmental Observatories
– Architecture of NCSA Meta-workflow Prototype Called CyberIntegrator

• Implementation
– Key Capabilities of CyberIntegrator

• Use Cases
– Environmental and Hydrological Engineering

• Summary

Problem Formulation

Science Problem Formulation

System Problem Formulation

Work Flow Problem Formulation

Meta-Workflow Definition

• Meta-workflow (MWF) definitions in the past:
– (1) Workflow aspect: a workflow is an aggregation of tasks; a meta-workflow is an aggregation of workflows or a hierarchy of workflows
– (2) Process management aspect: large activities have to be integrated, executed and evaluated in the process of conducting electronic commerce

• Our meta-workflow definition spans multiple dimensions:
– (1) hierarchical structure and organization of software
  • combinatorial explosion of module connections
– (2) heterogeneity of software tools and computational resources
  • the number of different engines and software applications that people use, each for its own reason
– (3) usability of tool and workflow interfaces
– (4) community sharing of fragments and user-friendly security
– (5) community knowledge and provenance
– (6) execution and built-in fault tolerance, etc.
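The workflow aspect of the definition (a workflow aggregates tasks, a meta-workflow aggregates workflows) can be illustrated with a small composite structure. This is only a minimal sketch and not CyberIntegrator code: the class names `Task` and `Workflow`, the `execute` and `depth` methods, and the toy numeric tasks are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Task:
    """A single unit of work (hypothetical: wraps one callable)."""
    name: str
    run: Callable[[float], float]


@dataclass
class Workflow:
    """An aggregation of tasks, or of other workflows (a meta-workflow)."""
    name: str
    steps: List[Union[Task, "Workflow"]] = field(default_factory=list)

    def execute(self, value: float) -> float:
        # Run the steps in order, feeding each output into the next step.
        for step in self.steps:
            if isinstance(step, Workflow):
                value = step.execute(value)
            else:
                value = step.run(value)
        return value

    def depth(self) -> int:
        # 1 for a flat workflow of tasks; grows by one per nesting level.
        nested = [s.depth() for s in self.steps if isinstance(s, Workflow)]
        return 1 + max(nested, default=0)


# Two ordinary workflows, each an aggregation of (toy) tasks.
clean = Workflow("clean", [Task("offset", lambda x: x - 1.0)])
scale = Workflow("scale", [Task("double", lambda x: x * 2.0),
                           Task("halve_plus_one", lambda x: x / 2.0 + 1.0)])

# A meta-workflow: an aggregation of workflows.
meta = Workflow("meta", [clean, scale])
```

Because a `Workflow` may contain other `Workflow` objects, the same structure also covers the "hierarchy of workflows" reading of the definition; `meta.depth()` reports how many levels are nested.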

Previous Work

• Other efforts:
– Business process workflow architectures (FlowMark, WSFL and BPEL), serving the business community
– Scientific workflow architectures: DAGMan, Taverna, SciFlo, Kepler, D2K, OGRE, CCA, Pegasus, GridFlow, GridAnt, Triana and GSFL

• Comparison:
– Our work focuses on the simplicity of end-user interactions with information technologies while utilizing all execution mechanisms transparently (workflow by example).
– Our work builds provenance-to-recommendation pipelines for the benefit of a community (recommendations based on provenance information).

Research Topics

• Data Translations: Semantic and syntactic mapping of data structures

• Provenance Information: Granularity of gathered provenance information for recommendations, auditing and re-construction

• HCI: User interface design issues and community dependencies

• Meta-Data: Federation of distributed (data, tool, computational resource) registries

• Execution: Just-in-time data delivery with respect to remote computing; Cost-benefit analysis of data transfer vs. CPU requirements; Execution triggered by streaming data
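The cost-benefit question in the Execution topic (move the data to a faster remote machine, or compute where the data already is) can be made concrete with a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration, not a method from the slides: the function name `choose_site`, its parameters, and the simple "transfer time plus compute time" model are all hypothetical, and it ignores queueing, latency and result transfer.

```python
from typing import Tuple


def choose_site(data_mb: float,
                work_units: float,
                local_rate: float,
                remote_rate: float,
                bandwidth_mb_s: float) -> Tuple[str, float]:
    """Pick the site with the lower estimated completion time in seconds.

    Hypothetical model:
      local  = work_units / local_rate
      remote = data_mb / bandwidth_mb_s + work_units / remote_rate
    """
    local_time = work_units / local_rate
    remote_time = data_mb / bandwidth_mb_s + work_units / remote_rate
    if remote_time < local_time:
        return ("remote", remote_time)
    return ("local", local_time)


# Small data set, heavy computation: shipping the data pays off.
small_data = choose_site(data_mb=1000, work_units=10000,
                         local_rate=10, remote_rate=100, bandwidth_mb_s=10)

# Same computation, 100x more data: the transfer dominates.
large_data = choose_site(data_mb=100000, work_units=10000,
                         local_rate=10, remote_rate=100, bandwidth_mb_s=10)
```

With these made-up numbers the first case favors the remote machine and the second favors local execution, which is the trade-off the research topic refers to.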

Design

Design Goals

• Make scientific discoveries easier
– Workflow by example (step-by-step experimentation)
– Design friendly user interfaces
– Build seamless access to heterogeneous data/tools/resources
– Provide data and process provenance information
– Recommend data, tools and computational resources
– Derive higher level semantic tools

Meta-workflow Architecture

Implementation

Meta-Workflow Features

• Workflow by example

• Support of heterogeneous executors
– Workflows: GeoLearn, D2K, Kepler/Ptolemy
– Applications: MS Excel, Im2Learn, ArcGIS
– Web services: D2KWS

• Provenance
– Gathering & Meta-data repositories

• Recommendations
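One simple way to picture recommendations based on provenance information is to record the order in which users ran tools and then suggest the tools that most often followed the current one. The sketch below assumes exactly that and nothing more: the `ProvenanceLog` class, its methods and the step names are hypothetical, and the slides do not say how CyberIntegrator actually computes recommendations.

```python
from collections import Counter, defaultdict
from typing import Dict, List


class ProvenanceLog:
    """Hypothetical store of which step followed which in past sessions."""

    def __init__(self) -> None:
        self._followers: Dict[str, Counter] = defaultdict(Counter)

    def record(self, steps: List[str]) -> None:
        # Count every (current step, next step) pair in one recorded session.
        for current, following in zip(steps, steps[1:]):
            self._followers[current][following] += 1

    def recommend(self, step: str, k: int = 3) -> List[str]:
        # Return up to k steps that most often followed the given step.
        followers = self._followers.get(step, Counter())
        return [name for name, _ in followers.most_common(k)]


log = ProvenanceLog()
# Three made-up sessions of step-by-step experimentation.
log.record(["load_data", "image_calculator", "pie_chart"])
log.record(["load_data", "image_calculator", "export"])
log.record(["load_data", "image_calculator", "pie_chart"])

suggestions = log.recommend("image_calculator")
```

Here `suggestions` lists `pie_chart` before `export`, because two of the three recorded sessions continued with it.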

Meta-workflow Editor

Use Cases

Meta-Workflow R&D Drivers

• Community drivers:
– Environmental Science: CLEANER
– Hydrological Science: CUAHSI

• Science drivers:
– Environmental Modeling of Nutrient Distribution
  • Monte Carlo simulations of the maximum amount of pollution that a water body can receive each day and still retain its uses
– Understanding the Dynamic Evolution of Land-Surface Variables in the Illinois River Basin
  • Data-driven analyses of multi-variable relationships from remote sensing data

• Technology drivers:
– Collaboratory Cyberenvironments

Summary

• The problem of designing a highly interactive scientific meta-workflow system is very complex.

• Key capabilities of our meta-workflow prototype implementation called CyberIntegrator were demonstrated with two use cases.

• We plan on building and deploying a practical tool for multiple communities.

• Publications:
– Image Spatial Data Analysis Group at NCSA
– URL: http://isda.ncsa.uiuc.edu

• Questions:
– Peter Bajcsy; Email: pbajcsy@ncsa.uiuc.edu

Hydro-informatics

Backup

Meta-workflow System Information

Terminology

• Engines are stand-alone environments and applications that are used by many tools
– Examples: Matlab, MS Excel, D2K, Im2Learn, ArcGIS, Kepler

• Tools are solutions specific to a problem and consist of several algorithms
– Examples: Image Calculator in Im2Learn, Pie chart visualization in MS Excel, …

• Algorithms are code fragments that perform a specific operation in a tool
– Examples: image addition operation in Image Calculator
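The three terms nest: an engine hosts many tools, and a tool consists of several algorithms. A minimal data-model sketch of that relationship follows. The class names, fields and the list-based "image" are assumptions chosen for illustration; only the example names (Im2Learn, Image Calculator, image addition) come from the terminology above, and the code does not reflect the real Im2Learn API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Image = List[float]  # toy stand-in for pixel data


@dataclass(frozen=True)
class Algorithm:
    """A code fragment that performs one specific operation in a tool."""
    name: str
    apply: Callable[[Image, Image], Image]


@dataclass
class Tool:
    """A problem-specific solution made of several algorithms."""
    name: str
    engine: str  # name of the stand-alone environment hosting the tool
    algorithms: Dict[str, Algorithm]

    def run(self, algorithm: str, a: Image, b: Image) -> Image:
        return self.algorithms[algorithm].apply(a, b)


def image_add(a: Image, b: Image) -> Image:
    # Pixel-wise addition of two equally sized toy images.
    return [x + y for x, y in zip(a, b)]


def tools_for_engine(tools: List[Tool], engine: str) -> List[str]:
    # An engine is "used by many tools": list the tools it hosts.
    return [t.name for t in tools if t.engine == engine]


image_calculator = Tool(
    name="Image Calculator",
    engine="Im2Learn",
    algorithms={"add": Algorithm("image addition", image_add)},
)
pie_chart = Tool(name="Pie chart visualization", engine="MS Excel",
                 algorithms={})

total = image_calculator.run("add", [1.0, 2.0], [3.0, 4.0])
hosted = tools_for_engine([image_calculator, pie_chart], "Im2Learn")
```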

Environmental Science

Hydrological Science
