
Accessing Cloud Computing to Support Water Resources Modeling




Scott D. Christensen, [email protected]

Nathan R. Swain, [email protected]

E. James Nelson, [email protected]

Norman L. Jones, [email protected]

This material is based upon work supported by the National Science Foundation under Grant No. 1135483.

Background

Advances in water resources modeling are providing us with better information; however, the models require more computational power to run. Cloud computing enables universal access to cost-effective computing, yet a significant technical barrier to accessing these resources remains. Here we present a set of Python tools, TethysCluster and CondorPy, that have been developed to lower the barrier to modeling in the cloud by providing:

(1) programmatic access to dynamically scalable computing resources,

(2) a batch scheduling system to queue and dispatch jobs to the computing resources,

(3) data management for job inputs and outputs, and

(4) the ability to dynamically create, submit, and monitor computing jobs.

While TethysCluster and CondorPy can be used independently to provision computing resources and perform large modeling tasks, they have also been integrated into Tethys Platform, a development platform for water resources web apps, to enable computing support for modeling workflows and decision support systems deployed as web apps.

Summary

Two Python modules have been developed to lower the technical barrier to accessing cloud computing for large modeling tasks. TethysCluster automates the process of provisioning diverse cloud resources and configuring them with HTCondor. CondorPy interfaces with HTCondor so that computing jobs can be created, submitted, and monitored programmatically. CondorPy and TethysCluster have been integrated into Tethys Platform, enabling web apps to perform large computing tasks easily.

Stochastic Analysis

Uncertainty is inherent to hydrologic modeling and is often accounted for by performing a stochastic analysis, which requires running hundreds or thousands of model simulations. For spatially distributed, physics-based models such as GSSHA, running thousands of simulations one after another may take months or even years. TethysCluster and CondorPy make this type of analysis much faster by running the simulations in parallel on cloud computing resources.
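A probabilistic flood map of the kind produced by such an analysis is commonly built by counting, for each grid cell, the fraction of simulations in which that cell floods. The sketch below is a minimal illustration under stated assumptions: each run's output has already been reduced to a flattened list of maximum water depths, and the function name, input format, and depth threshold are hypothetical, not part of GSSHA, CondorPy, or TethysCluster.

```python
from typing import Sequence


def exceedance_probability(
    depth_grids: Sequence[Sequence[float]], threshold: float = 0.0
) -> list[float]:
    """Return, for each grid cell, the fraction of runs whose depth exceeds `threshold`.

    `depth_grids` holds one flattened grid of maximum flood depths per model run
    (a hypothetical, already-parsed format; GSSHA output files are not read here).
    """
    if not depth_grids:
        raise ValueError("need at least one model run")
    n_cells = len(depth_grids[0])
    flooded_counts = [0] * n_cells
    for grid in depth_grids:
        if len(grid) != n_cells:
            raise ValueError("all runs must use the same grid")
        # Count the runs in which each cell is flooded.
        for i, depth in enumerate(grid):
            if depth > threshold:
                flooded_counts[i] += 1
    return [count / len(depth_grids) for count in flooded_counts]


# Four hypothetical runs over a three-cell grid.
runs = [[0.0, 0.3, 1.2], [0.0, 0.0, 0.8], [0.1, 0.5, 0.9], [0.0, 0.2, 1.1]]
print(exceedance_probability(runs))  # [0.25, 0.75, 1.0]
```

Because every run contributes independently to the counts, the simulations themselves can be executed in any order on any machine, which is what makes this workload a good fit for high-throughput scheduling.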

Job Manager

CondorPy has been integrated into the Tethys Platform Python SDK in the form of a job manager that enables developers to define computing jobs and submit them to HTCondor pools to offload large computing tasks.

CondorPy

HTCondor is a software system that enables High Throughput Computing (HTC) by managing computing resources and scheduling computing jobs. It enables diverse computing systems to be linked together into a unified computing pool. CondorPy serves as a cross-platform, high-level interface for HTCondor and allows jobs to be created, submitted, and monitored from a Python scripting environment. This interface facilitates the use of HTCondor in a web environment like Tethys Platform (see panel D).
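CondorPy's own classes are not reproduced here. As a hedged illustration of what a Python wrapper around HTCondor ultimately produces, the sketch below renders a plain HTCondor submit description. The keys (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `should_transfer_files`, `when_to_transfer_output`, `queue`) and the `$(Process)` macro are standard HTCondor submit syntax; the function name, file names, and defaults are assumptions made for illustration.

```python
def build_submit_description(
    executable: str,
    arguments: str,
    job_name: str,
    num_jobs: int = 1,
    universe: str = "vanilla",
) -> str:
    """Render a minimal HTCondor submit description (not CondorPy's API)."""
    lines = [
        f"universe = {universe}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # $(Process) expands to 0 .. num_jobs-1, so each job writes its own files.
        f"output = {job_name}.$(Process).out",
        f"error = {job_name}.$(Process).err",
        f"log = {job_name}.log",
        # Let HTCondor copy inputs to, and outputs back from, the execute machine.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"queue {num_jobs}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: 100 model runs, each told its index via $(Process).
print(build_submit_description("run_model.sh", "$(Process)", "model_run", num_jobs=100))
```

Saved to a file, this text could be submitted with the `condor_submit` command. A high-level interface such as CondorPy lets a script or web app do the equivalent from Python and then monitor the resulting jobs, rather than writing and submitting such files by hand.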

TethysCluster

Large modeling tasks often require a large amount of computing resources. Commercial cloud providers such as Amazon Web Services (AWS) and Microsoft Azure provide on-demand, scalable resources; however, configuring them with HTCondor can prove challenging. StarCluster is a Python module that automatically provisions and configures Linux computing resources on AWS. TethysCluster is an adaptation of StarCluster that expands its functionality to work with both Linux and Windows resources on AWS as well as Azure.

ci-water.github.io/condorpy
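Deciding how many cloud nodes to provision is a short calculation. The sketch below estimates the smallest cluster that finishes a batch of independent model runs by a deadline. It does not use TethysCluster; the run count, run time, cores per node, and deadline are hypothetical inputs, and the one-core-per-run, uniform-runtime model is a simplifying assumption.

```python
import math


def nodes_needed(
    n_runs: int, hours_per_run: float, cores_per_node: int, deadline_hours: float
) -> int:
    """Smallest node count that finishes `n_runs` independent runs by the deadline.

    Simplifying assumptions: each run uses one core, every run takes
    `hours_per_run`, and runs execute in back-to-back waves.
    """
    if n_runs < 1 or cores_per_node < 1 or hours_per_run <= 0:
        raise ValueError("invalid inputs")
    if deadline_hours < hours_per_run:
        raise ValueError("deadline is shorter than a single run")
    waves = math.floor(deadline_hours / hours_per_run)  # full waves that fit
    # Need nodes * cores_per_node * waves >= n_runs.
    return math.ceil(n_runs / (waves * cores_per_node))


serial_hours = 5000 * 2
print(serial_hours)                  # 10000 hours, more than a year on one core
print(nodes_needed(5000, 2, 8, 24))  # 53
```

For 5,000 hypothetical two-hour runs on 8-core nodes, 12 waves fit in 24 hours, so 53 nodes suffice (52 nodes would complete only 4,992 runs). That is the kind of short-lived, large allocation that on-demand cloud provisioning makes practical.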


Tethys Platform

Tethys Platform is a water resources web development platform that lowers the barrier to creating web apps. Tethys Platform provides open-source web GIS and visualization tools, all integrated into a unified Python SDK.

Cluster Management

Cloud computing resources are easy to provision through the admin site of Tethys Portal, the web interface of Tethys Platform. TethysCluster works behind the scenes to automatically configure the cloud resources into an HTCondor computing pool.


Ensemble Forecast Processing

TethysCluster and CondorPy are used by the Streamflow Prediction Tool (a Tethys web app) to automatically process a 52-member ensemble forecast produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). Every 12 hours, when a new forecast becomes available, a scheduled Python script uses CondorPy to create 52 jobs, one for each ensemble member. TethysCluster can be used to automatically provision and de-provision the cloud computing resources that run these jobs.
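The per-cycle fan-out described above can be sketched as follows. This is a minimal illustration, assuming forecast cycles at 00 and 12 UTC, members numbered 1 to 52, and a processing script that accepts `--cycle` and `--member` arguments; the job-name format and argument names are hypothetical, and handing the jobs to HTCondor (for example through CondorPy) is omitted.

```python
from datetime import datetime, timezone


def latest_cycle(now: datetime) -> datetime:
    """Most recent 00 or 12 UTC forecast cycle at or before a timezone-aware `now`."""
    now = now.astimezone(timezone.utc)
    return now.replace(hour=0 if now.hour < 12 else 12, minute=0, second=0, microsecond=0)


def ensemble_jobs(cycle: datetime, n_members: int = 52) -> list[dict[str, str]]:
    """Build one (hypothetical) job description per ensemble member."""
    stamp = cycle.strftime("%Y%m%d.%H")
    return [
        {
            "name": f"forecast_{stamp}_m{member:02d}",
            "arguments": f"--cycle {stamp} --member {member}",
        }
        for member in range(1, n_members + 1)
    ]


cycle = latest_cycle(datetime(2016, 3, 1, 15, 30, tzinfo=timezone.utc))
jobs = ensemble_jobs(cycle)
print(len(jobs), jobs[0]["name"])  # 52 forecast_20160301.12_m01
```

Because the members are independent, the 52 jobs can run concurrently, so the wall-clock time per cycle is bounded by the slowest member rather than by the sum of all members.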

Hierarchical Modeling

Running high-fidelity models over large domains often requires powerful computers and a lot of time. One way to alleviate this problem is to partially parallelize the computation by decomposing the domain into smaller models. This results in a series of hierarchical models whose execution must be coordinated. CondorPy facilitates running this type of workflow with HTCondor in a parallel computing environment.
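One way to picture the coordination problem is as a dependency graph in which, assuming each sub-basin model needs the outflow of the models upstream of it, a model can start only after those upstream models finish. HTCondor's DAGMan expresses such graphs with `JOB` and `PARENT ... CHILD` lines. The sketch below groups models into levels that can run concurrently and writes a DAGMan input file; it does not use CondorPy, and the model names and per-model `<name>.sub` submit files are hypothetical.

```python
def dag_levels(upstream: dict[str, list[str]]) -> list[list[str]]:
    """Group models into levels; models within a level can run in parallel.

    `upstream` maps each model to the models whose output it needs.
    """
    models = set(upstream) | {dep for deps in upstream.values() for dep in deps}
    remaining = {m: set(upstream.get(m, [])) for m in models}
    done: set[str] = set()
    levels: list[list[str]] = []
    while remaining:
        # A model is ready once all of its upstream models have finished.
        ready = sorted(m for m, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cycle detected in the model hierarchy")
        levels.append(ready)
        done.update(ready)
        for m in ready:
            del remaining[m]
    return levels


def dagman_file(upstream: dict[str, list[str]]) -> str:
    """Render an HTCondor DAGMan input file; `<model>.sub` files are hypothetical."""
    lines = [f"JOB {m} {m}.sub" for level in dag_levels(upstream) for m in level]
    for child in sorted(upstream):
        for parent in sorted(upstream[child]):
            lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"


# Hypothetical hierarchy: sub_c drains to sub_b; sub_a and sub_b drain to the outlet.
hierarchy = {"outlet": ["sub_a", "sub_b"], "sub_b": ["sub_c"]}
print(dag_levels(hierarchy))  # [['sub_a', 'sub_c'], ['sub_b'], ['outlet']]
print(dagman_file(hierarchy), end="")
```

The levels show where the parallelism comes from: all headwater sub-basins run at once, and each later level waits only on the models it actually depends on. DAGMan enforces the same ordering from the `PARENT ... CHILD` lines alone, so the level grouping is mainly a way to inspect how much concurrency a given decomposition offers.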

ci-water.github.io/TethysCluster


Probabilistic flood map resulting from 5000 model runs using the spatially distributed, physics-based hydrologic model GSSHA.

Top: large watershed divided into hierarchical sub-basins. Bottom: diagram showing the parallelization and hierarchy of the models.

Screenshot of a Tethys web app, the Streamflow Prediction Tool, which uses CondorPy and TethysCluster to process ensemble forecasts.