105
Text CCT: Center for Computation & Technology SAGA: An Introductory Lecture

Session 40 : SAGA Overview and Introduction

Embed Size (px)

DESCRIPTION

Speaker: Shantenu Jha

Citation preview

Page 1: Session 40 : SAGA Overview and Introduction

Text

CCT: Center for Computation & Technology

SAGA: An Introductory Lecture

Page 2: Session 40 : SAGA Overview and Introduction

Text

CCT: Center for Computation & Technology

Developing Distributed Applications Using SAGA

Understanding the Landscape of Distributed Applications

Page 3: Session 40 : SAGA Overview and Introduction

Text

Outline (1)

  Simple API for Grid Applications (SAGA)   Introduction to Simple API   SAGA as an emerging Standard

  Understanding Distributed Applications (DA)   Challenges of DA ? Differ from HPC or || App?   Rough Taxonomy of Distributed Applications   Using SAGA to develop DA

i.  Distributed Execution Mode Legacy (Implicit) ii.  Explicit coordination and distribution iii.  Frameworks: support characteristics or patterns

  Understanding the SAGA Landscape   Interface/Specification, C++ Engine, Adaptors

Page 4: Session 40 : SAGA Overview and Introduction

Text

Outline (2)

  Applications Developed Using SAGA:   Replica-Exchange, C02 Sequestration (EnKF)   MapReduce

  Interoperabilty across Grids, Grids-Clouds

  Other SAGA Projects (Advantage of Standards)   Uptake by JSAGA and XtreemOS   Service Discovery (SD) and uptake gLite   DESHL (DEISA) -- Tool or Application?

  Session II: SAGA Hands-on   Using Mini-Examples to become familiar with the API   A few Real Application examples   Time to play with the code programmer’s manual

Page 5: Session 40 : SAGA Overview and Introduction

Text

A take on “What are Grids?”

Grids are:   Dynamic:

  Version, updates, new resources...   Heterogenous:

  Operating Systems, Libraries, software stack   Middleware service versions and semantics   Administrative policies – access, usage, upgrade

  Complex:   Production Grid Service with high QoS non-trivial   Derived from above as well as inherently

Not only are Grids difficult to operate etc., but it is very difficult to develop, program & deploy Grid applications

Page 6: Session 40 : SAGA Overview and Introduction

Text

SAGA: A Quick Tour

  There exists a lack of:   Programmatic approaches that provide common grid

functionality at the correct level of abstraction for applications   Ability to hide underlying complexity of infrastructure, varying

semantics, heterogeneity and changes from the application-developer

  Simple, integrated, stable, uniform and high-level interface   Simple: 80:20 restricted scope   Integrated: Similar semantics & style across commonly used

distributed functional requirements   Stable: Standard   Uniform: Same interface for different distributed systems

  SAGA: Provides the high-level abstraction that application developers need that will work across different distributed systems

  Shields details of lower level middle-ware and system issues   Enables the details of distribution to be left out

Page 7: Session 40 : SAGA Overview and Introduction

Text

Copy a File: Globus GASS

if (source_url.scheme_type == GLOBUS_URL_SCHEME_GSIFTP || source_url.scheme_type == GLOBUS_URL_SCHEME_FTP ) { globus_ftp_client_operationattr_init (&source_ftp_attr); globus_gass_copy_attr_set_ftp (&source_gass_copy_attr, &source_ftp_attr); } else { globus_gass_transfer_requestattr_init (&source_gass_attr, source_url.scheme); globus_gass_copy_attr_set_gass(&source_gass_copy_attr, &source_gass_attr); }

output_file = globus_libc_open ((char*) target, O_WRONLY | O_TRUNC | O_CREAT, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP); if ( output_file == -1 ) { printf ("could not open the file \"%s\"\n", target); return (-1); } /* convert stdout to be a globus_io_handle */ if ( globus_io_file_posix_convert (output_file, 0, &dest_io_handle) != GLOBUS_SUCCESS) { printf ("Error converting the file handle\n"); return (-1); }

result = globus_gass_copy_register_url_to_handle ( &gass_copy_handle, (char*)source_URL, &source_gass_copy_attr, &dest_io_handle, my_callback, NULL); if ( result != GLOBUS_SUCCESS ) { printf ("error: %s\n", globus_object_printable_to_string (globus_error_get (result))); return (-1); } globus_url_destroy (&source_url); return (0); }

int copy_file (char const* source, char const* target) { globus_url_t source_url; globus_io_handle_t dest_io_handle; globus_ftp_client_operationattr_t source_ftp_attr; globus_result_t result; globus_gass_transfer_requestattr_t source_gass_attr; globus_gass_copy_attr_t source_gass_copy_attr; globus_gass_copy_handle_t gass_copy_handle; globus_gass_copy_handleattr_t gass_copy_handleattr; globus_ftp_client_handleattr_t ftp_handleattr; globus_io_attr_t io_attr; int output_file = -1;

if ( globus_url_parse (source_URL, &source_url) != GLOBUS_SUCCESS ) { printf ("can not parse source_URL \"%s\"\n", source_URL); return (-1); }

if ( source_url.scheme_type != GLOBUS_URL_SCHEME_GSIFTP && source_url.scheme_type != GLOBUS_URL_SCHEME_FTP && source_url.scheme_type != GLOBUS_URL_SCHEME_HTTP && source_url.scheme_type != GLOBUS_URL_SCHEME_HTTPS ) { printf ("can not copy from %s - wrong prot\n", source_URL); return (-1); } globus_gass_copy_handleattr_init (&gass_copy_handleattr); globus_gass_copy_attr_init (&source_gass_copy_attr);

globus_ftp_client_handleattr_init (&ftp_handleattr); globus_io_fileattr_init (&io_attr);

globus_gass_copy_attr_set_io (&source_gass_copy_attr, &io_attr); &io_attr); globus_gass_copy_handleattr_set_ftp_attr (&gass_copy_handleattr, &ftp_handleattr); globus_gass_copy_handle_init (&gass_copy_handle, &gass_copy_handleattr);

Page 8: Session 40 : SAGA Overview and Introduction

Text

SAGA Example: Copy a File Application Level Interface

#include <string> #include <saga/saga.hpp>

void copy_file(std::string source_url, std::string target_url) { try { saga::file f(source_url); f.copy(target_url); } catch (saga::exception const &e) { std::cerr << e.what() << std::endl; } }

The interface is simple and the actual function calls remain the same

Page 9: Session 40 : SAGA Overview and Introduction

Text

SAGA Interface Job Submission API

01: // Submitting a simple job and wait for completion 02: // 03: saga::job_description jobdef; 04: jobdef.set_attribute ("Executable", "job.sh"); 05: 06: saga::job_service js; 07: saga::job job = js.create_job ("remote.host.net", jobdef); 08: 09: job.run(); 10: 11: while( job.get_state() == saga::job::Running )‏ 12: { 13: std::cout << “Job running with ID: “ 14: << job.get_attribute(“JobID”) << std::endl; 15: sleep(1); 16: }

The interface is simple and the actual function calls remain the same

Page 10: Session 40 : SAGA Overview and Introduction

Text

SAGA: Job Submission Role of Adaptors (middleware binding)‏

Page 11: Session 40 : SAGA Overview and Introduction

Text

File API Example

01: // Read the first 10 bytes of a file if file size > 10 bytes 02: // 03: saga::file my_file (“griftp://gridhub/~/result.dat); 04: 05: off_t size = my_file.get_size (); 06: 07: if ( size > 10 ) 08: { 09: char buffer[11]; 10: long bufflen; 11: 12: my_file.read (10, buffer, &bufflen); 13: 14: if ( bufflen == 10 ) 15: { 16: std::cout << buffer << std::endl; 17: } 18: }

Page 12: Session 40 : SAGA Overview and Introduction

Text

SAGA: Class Diagram

In the works: CPR, Information Services, Messaging, DAI,...

Page 13: Session 40 : SAGA Overview and Introduction

Text

SAGA: Scope

  Is:   Simple API for Grid-Aware Applications

  Deal with distributed infrastructure explicitly   Application-level functionality/interface   An uniform interface to different middleware(s)   Client-side software

  Is NOT:   Middleware   A service management interface!   Does not hide the resources - remote files, job (but

the details)

Page 14: Session 40 : SAGA Overview and Introduction

Text

SAGA API: Towards a Standard Standards promote Interoperability

•  The need for standard programming interface •  Trade-off “Go it alone” versus “Community” model •  Reinventing the wheel again, yet again, & then again •  MPI a useful analogy of community standard

•  Vendors (Resource Provider), Software developers, users.. •  social/historic parallels also important

•  Time to adoption, after specification .... •  OGF the natural choice (SAGA-RG, SAGA-WG)

•  Spin-off of the Applications Research Group •  Driven by UK, EU (German/Dutch), US •  Design derived from 23 Use Cases

•  different projects, applications and functionality •  biological, coastal modelling, visualization

•  Will discuss the advantage of SAGA as a standard specification

Page 15: Session 40 : SAGA Overview and Introduction

Text

Outline (1)

  Simple API for Grid Applications (SAGA)   Introduction to Simple API   SAGA as an emerging Standard

  Understanding Distributed Applications (DA)   Challenges of DA ? Differ from HPC or || App?   Rough Taxonomy of Distributed Applications   Using SAGA to develop DA

i.  Distributed Execution Mode Legacy (Implicit) ii.  Explicit coordination and distribution iii.  Frameworks: support characteristics or patterns

  Understanding the SAGA Landscape   Interface/Specification, C++ Engine, Adaptors

Page 16: Session 40 : SAGA Overview and Introduction

Text

Developing DA with SAGA: Parallel Programming Analogy

  Distributed programming today, || programming pre-MPI   Bewildering # of ways of doing the basic stuff (e.g., job

submission)   MPI was a “success” in that it helped many new applications

  MPI is simple (most things can be done with 6-8 calls)   MPI was a standard (stable and portable code!)

  SAGA conception & trajectory similar to MPI   SAGA focusses on simplicity (common vs corner case)   OGF specification; on path to becoming a standard

  Therefore, SAGA's Measure(s) of success:   Does SAGA enable “new” distributed applications?   Does it enable effective development of “old” distributed

application

Page 17: Session 40 : SAGA Overview and Introduction

Text

Programming Distributed Applications

•  Characteristics: •  Dynamical and Heterogenous resources •  Control (or lack thereof) on remote sytems

•  Computational Models of Distributed Computing not as well understood as parallel computing

•  Parallel Computing: Focus on performance! For DA? •  Greater range of distributed application... •  Underlying model of distributed infrastructure complex..

•  What should end-user control? Must control? Should not? •  What should the system support? Role of tools? •  No universal answers! Simplification and distillation

Page 18: Session 40 : SAGA Overview and Introduction

Challenges of Distributed Applications (1) How do they differ from traditional HPC applications?

  Performance Models:   Not “peak utilization” e.g., # of jobs   Manage scalability, faults, heterogenity

  Usage Modes:   How applications are run, deployed and utilized is

often determined by the infrastructure   Static versus Dynamic Execution:

  Varying resource conditions; distributed applications should be dynamic

Page 19: Session 40 : SAGA Overview and Introduction

Distributed Applications

  Ability to develop simple, novel or effective distributed applications lags behind; remains a very hard undertaking

  The Future is Distributed:   Basic Trends in Computing   Decoupling & Delocalization of Data Production-

Consumption; Components of computation; Services; People

  Therefore develop applications that require distribution, coordination, management & collaboration of compute-data-people over multiple & distributed sites:

  Logically or physically distributed   Scale-up and Scale-out

Page 20: Session 40 : SAGA Overview and Introduction

Distributed Applications (2)

  Challenge is about how to develop DA effectively and efficiently, with extensibility, scalability and generalization as first-class concerns

  Understand what can be done? What are the gaps?

  Understand characteristics and requirements DA

  Vectors: Axes representing application characteristics, the value of which help us understand the application and what influences the design/constraints of solutions, tools & programming systems that can be used

  What makes distributed applications hard to develop?

  What makes distributed infrastructure hard to use

Page 21: Session 40 : SAGA Overview and Introduction

Taxonomy of Distributed Applications (1) How are DA developed?

  Distributing a legacy Application:   Focus mechanisms to support distributed execution

  Science Gateways, Condor..   No-coupling or coordination between tasks

  E.g., Replica Ensemble   Distributed Applications with Multiple Components

  Naturally decomposed; aggregate (coordination)   E.g. DDDAS, sensor-based applications

  Decompose a Large problem into sub-components   Degree-of-coupling (e.g freq/vol of comm)   Flexibility (scheduling, ordering)

Page 22: Session 40 : SAGA Overview and Introduction

Taxonomy of Distributed Applications (2) Coordination

  Challenge: Coordination of multiple-components   Decompose a large problem into sub-problem   Designing De Novo Distributed Application

  E.g. GridSAT (Wolski), Distributed Reasoning (Bal)   Coupling components set coordination strategy

  Coupling is high:   Low flexibility in placement, ordering, resource

selection; or Communication (e.g. MPIg)   Coupling is low:

  High flexibility in placement, ordering.. E.g. (Identical?) Replica-Exchange

Page 23: Session 40 : SAGA Overview and Introduction

Distributed Applications (3) What makes a DA a DA?

  Vectors: Axes representing application characteristics, the values of which help us understand:   The application requirements, and   Design and Constraints of solutions, tools

  Application Vectors:   Executable Unit   Communication   Coordination   Execution Environment

Page 24: Session 40 : SAGA Overview and Introduction

Frameworks: Capturing Application Requirements, Characteristics & Patterns

  Pattern: Commonly recurring modes of computation   Abstraction: Mechanism to support patterns and application

characteristics   Development, Deployment and Execution

  Frameworks: i.  Support Patterns:

•  MapReduce, Master-Worker, Hierarchical Job-Submission

ii.  Provide the abstractions and support the requirements & characteristics

Page 25: Session 40 : SAGA Overview and Introduction

SAGA and Distributed Applications

Page 26: Session 40 : SAGA Overview and Introduction

SAGA: Bridging the Gap between Infrastructure and Applications

Focus on Application Development and

Characteristics, not infrastructure details

Page 27: Session 40 : SAGA Overview and Introduction

SAGA: Application Types   Example of Distributed Execution Mode:

  Implicitly Distributed   SAGA can be used to script 800 job submissions on

the TeraGrid for GridChem   Example of Explicit Coordination and Distribution

  Explicitly Distributed   EnKF-HM application in which application controls the

work-load and determines where & when the execution of a task is to take place

Page 28: Session 40 : SAGA Overview and Introduction

SAGA: Frameworks

  Frameworks can either support a specific programming pattern or run-time pattern, or a commonly occuring application characteristic, i.e. provides an implementation of the abstraction

  Three types don’t have to be mutually exclusive… Can have a framework that helps in the explicit coordination

Page 29: Session 40 : SAGA Overview and Introduction

Abstractions for Distributed Computing (1) BigJob: Container Task

Adaptive:

Type A: Fix number of replicas; vary cores assigned

to each replica.

Type B: Fix the size of replica, vary number of replicas

Page 30: Session 40 : SAGA Overview and Introduction

Abstractions for Distributed Computing (2) SAGA Pilot-Job (Glide-In)

Page 31: Session 40 : SAGA Overview and Introduction

Coordinate Deployment & Scheduling of Multiple Pilot-Jobs

FAUST production-level implementation coming:

http://macpro01.cct.lsu.edu/~oweidner/faust/

Page 32: Session 40 : SAGA Overview and Introduction

Text

Using SAGA-based Frameworks (1)‏

Page 33: Session 40 : SAGA Overview and Introduction

Text

Using SAGA-based Frameworks (2): Ibis

–  Programming

•  Divide-and-conquer (Satin)‏ –  Fault-tolerant, malleable

•  IPL: Java-centric grid communication layer

– Run-anywhere, connectivity – Deployment and Management

Slide Courtesy H. Bal

Page 34: Session 40 : SAGA Overview and Introduction

Application Class vs Type

• No unique mapping between Class & Type.. . Interesting/Unique?

•  The same application can often be either explict or implicitly distributed

Page 35: Session 40 : SAGA Overview and Introduction

Distributed: Implicit vs Explicit ?

  Most applications/application classes can be either. Which approach (implicit vs explicit) is used depends:

  How the application is used?   Need to control/marshall more than one resource?

  Why distributed resources are being used?   How much can be kept out of the application?

  Can’t predict in advance?   Not obvious what to do, application-specific

metric   Unless necessary, applications should not be explicitly

distributed

Page 36: Session 40 : SAGA Overview and Introduction

Text

Outline (1)

  Simple API for Grid Applications (SAGA)   Introduction to Simple API   SAGA as an emerging Standard

  Understanding Distributed Applications (DA)   Challenges of DA ? Differ from HPC or || App?   Rough Taxonomy of Distributed Applications   Using SAGA to develop DA

i.  Distributed Execution Mode Legacy (Implicit) ii.  Explicit coordination and distribution iii.  Frameworks: support characteristics or patterns

  Understanding the SAGA Landscape   Interface/Specification, C++ Engine, Adaptors

Page 37: Session 40 : SAGA Overview and Introduction

Text

SAGA: The Landscape

Page 38: Session 40 : SAGA Overview and Introduction

Text

SAGA Interface Hierarchy

Look and feel: Top level Interfaces; Core SAGA objects needed by other API packages that provide specific functionality -- capability providing packages e.g., jobs, files, streams, namespaces etc.

Page 39: Session 40 : SAGA Overview and Introduction

Text

SAGA Interface Tour

The common root for all SAGA classes. Provides unique ID to maintain a list of SAGA objects. Provides methods (get_id()) essential for all SAGA objects

Page 40: Session 40 : SAGA Overview and Introduction

Text

Errors and Exceptions

SAGA defines a hierarchy of exceptions (and allows implementations to fill in specific details)

Page 41: Session 40 : SAGA Overview and Introduction

Text

Session, Context, Permissions

Object provides functionality of a session handle and isolates independent sets of SAGA objects. Only needed if you wish to handle multiple credentials. Otherwise default context is used.

Page 42: Session 40 : SAGA Overview and Introduction

Text

Attributes

Where attributes need to be associated with objects, e.g. Job-submission. Key-value pairs, e.g. for resource descriptions attached to the object.

Page 43: Session 40 : SAGA Overview and Introduction

Text

Application Monitoring

Metric defines application-level data structure(s) that can be monitored and modified (steered). Also, task model requires state monitoring.

Page 44: Session 40 : SAGA Overview and Introduction

Text

Asynchronous Operations, Tasks

Most calls can be synchronous, asynchronous, or tasks (need explicit start.)

Page 45: Session 40 : SAGA Overview and Introduction

Text

Jobs

Jobs are submitted to run somewhere in the grid.

Page 46: Session 40 : SAGA Overview and Introduction

Text

Files, Directories, Name Spaces

Both for physical and replicated (“logical”) files

Page 47: Session 40 : SAGA Overview and Introduction

Text

Streams

Simple, data streaming end points.

Page 48: Session 40 : SAGA Overview and Introduction

Text

SAGA v1.0 Hierarchy - Full Glory

GridRPC - A rendering of GridRPC

Page 49: Session 40 : SAGA Overview and Introduction

Text

SAGA Task Model

  All SAGA objects implement the task model   Every method has three “flavors”

  Synchronous version - the implementation   Asynchronous version - synchronous version

wrapped in a task (thread) and started   Task version - synchronous version wrapped in a task

but not started (task handle returned)   Adaptor can implement own async. version

Page 50: Session 40 : SAGA Overview and Introduction

SAGA Task Model

Page 51: Session 40 : SAGA Overview and Introduction

Text

Functional packages post-v1.0

  Service Discovery   Adverts   Checkpointing/Recovery (Migol)   Information Service   Steering   MessageBus: Structured data transfer, also many-to-

many   Important for coupled, multi-physics applications

Page 52: Session 40 : SAGA Overview and Introduction

Text

SAGA: The Landscape

Page 53: Session 40 : SAGA Overview and Introduction

Text

And in pictures..

Page 54: Session 40 : SAGA Overview and Introduction

Text

SAGA-Condor: Campus Grids

Page 55: Session 40 : SAGA Overview and Introduction

Text

10/22/2006 LCSD'06 53

SAGA Implementation: Requirements

  Non-trivial set of requirements:   Allow heterogenous middleware to co-exist   Cope with evolving grid environments; dyn resources   Future SAGA API extensions   Portable, syntactically and semantically platform

independent; permit latency hiding mechanisms   Ease of deployment, configuration, multiple-language

support, documentation etc.   Provide synchronous, asynchronous & task versions Portability, modularity, flexibility, adaptabilty, extensibility

Page 56: Session 40 : SAGA Overview and Introduction

Text

10/22/2006 LCSD'06 54

SAGA Implementation: Extensibility

  Horizontal Extensibility – API Packages   Current packages:

  file management, job management, remote procedure calls, replica management, data streaming

  Steering, information services, checkpoint in pipeline

  Vertical Extensibility – Middleware Bindings   Different adaptors for different middleware   Set of ‘local’ adaptors

  Extensibility for Optimization and Features   Bulk optimization, modular design

Page 57: Session 40 : SAGA Overview and Introduction

Text

Is SAGA Simple?

  It depends: It is certainly not simple to implement! •  Grids are complex and the complexity needs to be

addressed somewhere, by someone! •  Pain using the middleware goes into the SAGA engine

and adaptors.   But it is simple to use!!

  Functional Packages (specific calls), Look & Feel   Somewhat like MPI - most users only need a very

small subset of calls

Page 58: Session 40 : SAGA Overview and Introduction

Text

Implementations

  OGF Standard: Two independent implementations of a specification are required

  LSU: C++   V1.0 release Sep 15   Uptake by NAREGI/KEK, gLite (SD)

  Java   VU (Amsterdam):

  Part of the OMII-UK Project   JSAGA   DEISA (DESHL)

Page 59: Session 40 : SAGA Overview and Introduction

Text

SAGA: C++ Status and Timelines http://saga.cct.lsu.edu

•  C++: Currently at v1.31 (July 2009) •  Monthly release cycles •  Release V1.0 September 15 2008 (OGF24)‏

  Both (OMII) JAVA and C++   Will mirror V1.0 of the specification   Cleaned up Build system for Linux, Mac & Win

•  Adaptors: •  Globus RLS, GRAM, GridSAM, Gridftp (available)‏ •  ssh, libcurl (in progress)‏ •  Condor, LSF •  CloudStore (KFS), HDFS

•  Extension packages: •  Service Discovery (document in public comment)‏ •  Checkpoint and Recovery (Migol)‏

•  Python bindings to the C++ available •  User Manual and Programmer's Guide (beta version available)‏

Page 60: Session 40 : SAGA Overview and Introduction

Text

Outline (2)

  Applications Developed Using SAGA:   Replica-Exchange, C02 Sequestration (EnKF)   MapReduce

  Interoperabilty across Grids, Grids-Clouds

  Other SAGA Projects (Advantage of Standards)   Uptake by JSAGA and XtreemOS   Service Discovery (SD) and uptake gLite   DESHL (DEISA) -- Tool or Application?

  Session II: SAGA Hands-on   Using mini-examples, become familiar with the API   A few Application examples   Time to play with the code programmer’s manual

Page 61: Session 40 : SAGA Overview and Introduction

Text

Replica Exchange: Hello Distributed World

•  Task Level Parallelism –  Embarrassingly

distributable! –  Loosely coupled

•  Create replicas of initial configuration

•  Spawn 'N' replicas over different machine

•  Run for time t ; Attempt configuration swap

•  Run for further time t; Repeat till finish

RN

R1 R2 R3

t

hot

300K Exchange attempts

T

t

Page 62: Session 40 : SAGA Overview and Introduction

Text

RE: Programming Requirements

•  RE can be implemented using following “primitives” – Read job description

•  # of processors, replicas, determine resources –  Submit jobs (local and remote)‏

•  Move files, job launch – Checkpoint and re-launch simulations

•  Exchange, RPC (logic to swap or not)‏ •  Implement above using “Grid primitives” provided by SAGA •  Separated “distributed” logic from “simulation” logic

–  Independent of underlying code/engine –  Science kernel is independent of details of distributed resource mgmt –  Need to coordinate resources integrated from Desktop to Supercomputers!!

Page 63: Session 40 : SAGA Overview and Introduction

Scale-Out: Dynamic Execution & Resource Aggregation

Page 64: Session 40 : SAGA Overview and Introduction

Using Multiple Resources Performance Enhancements

Page 65: Session 40 : SAGA Overview and Introduction

Ensemble Kalman Filters Heterogeneous Sub-Tasks

  Ensemble Kalman filters (EnKF), are recursive filters to handle large, noisy data; use the EnKF for history matching and reservoir characterization

  EnKF is a particularly interesting case of irregular, hard-to-predict run time characteristics:

Page 66: Session 40 : SAGA Overview and Introduction

EnKF C02 Sequestration: Data-Driven (Sensor) Distributed HPC

work with Yaakoub El-Khamra (TACC)

Page 67: Session 40 : SAGA Overview and Introduction

Performance Advantage from Scale-Out

But Why does BQP Help?

Page 68: Session 40 : SAGA Overview and Introduction

Text

Irregular Applications

  Many applications have well defined run-time characteristics:   Predictable (static) resource requirements and execution   Most policy, usage scenarios & support tools assume this   True, but not universal!

  Many Applications have irregular runtime characteristics:   Number of simulations (sub-jobs) varies   Variable resource: sub-job memory, wall-clock, px count..

  Class of applications with such characteristics:   GridSAT, learning algorithms, EnKF

  Are such applications supported on “isolated” single machines?   Is this is an application class suited for distributed infrastructure?

Page 69: Session 40 : SAGA Overview and Introduction

Text

Irregular Applications (2)‏

  Irregular resource requirement and availability:   Internal: application specific reasons   External: infrastructure, trajectory-dependence

  Grids are heterogenous and dynamic   Heterogenous due to the typical aggregation model   Dynamic due to changing resource availability & loads

  Hard to predict resource requirement and optimal mapping a priori   How can applications use dynamically determined resources?

  Development, Deployment and Scheduling   Many solutions, as always!

  Logic for adapting is provided at the application level!   Use SAGA to manage run time complexity for real application

  General purpose solution: applicable to different application   Extensible (eg features, type-of-irregularity) and scalable (resources)‏

Page 70: Session 40 : SAGA Overview and Introduction

Forward Vs Reverse Scheduling Adv of Ensemble of Loosely-Coupled

  Distributed Applications: When formulated using Loosely-Coupled Tasks:   Application keeps work ready to be fired, when resource

becomes available (ideal case)   Match resources to computational requirement (Forward)

  Take a workload; find a set of resources   Find a set of resources (first-to-run); tune workload

  Reverse Scheduling: Support for Dynamic Execution of Applications is underlying paradigm

  Late binding to resource and optimal resource configuration (not just resource selection)

  Unbalanced, irregular workload

Page 71: Session 40 : SAGA Overview and Introduction

Text

SAGA for your Distributed Application?

  Should you develop using SAGA or not? (As always) it depends:   Advantages:

  Simple yet powerful API, which supports many:   Applications types, but not all kinds of applications   Programming patterns, models (eg superscalar)‏   Usage modes (eg parameter sweep, task-farming..)‏

  Application are Infrastructure independent   Provides extensibility (eg Condor, fault-tolerance)   Interfaces well with other software (eg BQP)‏

  Disadvantages:   Overhead (trivial IMHO); “script it vs just do it”   Deployment of SAGA is currently non-trivial

  sagalite and work with RP's to ease deployment   Stable & Simple interface, but restricted functional scope

Page 72: Session 40 : SAGA Overview and Introduction

Some Motivation for Interoperabilty (1)

Courtesy: Rob

Grossman Decouple from vertical stack

Decouple from details of resource provisioning (cf: Walker)

Couple horizontal stacks

Page 73: Session 40 : SAGA Overview and Introduction

Application-level Interoperability Cloud-Cloud; Cloud-Grid

  Application-level (ALI) vs. System-level Interoperability (SLI)   Infrastructure Independence is Pre-requisite for ALI

  Programming Model should be Infrastructure & SLA agnostic:

  Same application, priced differently, for same performance   Same application, priced same, for different performance

  Availability zone   VM motion   Data-transfer cost vs. no cost

  Grids AND Clouds:   Hybrid & Heterogeneous workload: data-compute affinity differ   Complex data-flow dependency: need runtime determination

Page 74: Session 40 : SAGA Overview and Introduction

Power of Google’s MapReduce?

  MapReduce an important development but..   Currently tied to infrastructure

  MapReduce + BigTable + GFS or   MapReduce + Hbase + HDFS (Yahoo)

  Limited number of scientific applications that use this simple (though powerful) pattern

  Other patterns?

Is it possible to provide the power of MapReduce and other patterns in an infrastructure independent

Page 75: Session 40 : SAGA Overview and Introduction

SAGA: Role of Adaptors Use over different infrastructure

Page 76: Session 40 : SAGA Overview and Introduction

Controlling Relative Compute-Data Placement

Page 77: Session 40 : SAGA Overview and Introduction
Page 78: Session 40 : SAGA Overview and Introduction

Interoperability

Page 79: Session 40 : SAGA Overview and Introduction

SAGA: Extension to Clouds

  Is the API perfect, that nothing really changes?   Minor change:

  Grids: Job_service() is not coupled to app lifetime   Clouds: Job_service() is coupled to the VM lifetime

  When job_service() terminates, what happens to VM?   When VM terminates, what happens to job_service() ?   Changes to the SAGA Resource Discovery and

Management Package will address these shortcomings

Page 80: Session 40 : SAGA Overview and Introduction

Grids vs Clouds?

  Its not Grids vs. Clouds:   “…..such a good job of understanding parallel programming in the 90’s … very well prepared for multi-core..” (Ken Kennedy)

  Hard problems in distributed applications & computing, e.g., Coordination of dist. Data & computation   These remain, even if we change names..

  Its about (scientific) Applications: HLA & Support for Usage modes -- then Clouds vs. Grids is less critical   Interop of SAGA-MapReduce shows how – once correct abstractions and interfaces are in place -- much of work in easily translates to managing different back-end (adaptors)

Page 81: Session 40 : SAGA Overview and Introduction

Providing Fundamental Distributed Functionality for Applications

  Interop of SAGA-MapReduce shows how – once correct abstractions and interfaces are in place -- much of work in easily translates to managing different back-end (adaptors)

  Its not Grids vs Clouds: How can scientific applications be developed to utilize a broad range of DS, without vendor lock-in, or disruption, yet with the flexibility and performance that scientific applications demand?

  SAGA is the first comprehensive application programming system to attempt to address these issues.

Page 82: Session 40 : SAGA Overview and Introduction

Text

Outline (2)

  Applications Developed Using SAGA:   Replica-Exchange, C02 Sequestration (EnKF)   MapReduce

  Interoperabilty across Grids, Grids-Clouds

  Other SAGA Projects (Advantage of Standards)   Uptake by JSAGA and XtreemOS   Service Discovery (SD) and uptake gLite   DESHL (DEISA) -- Tool or Application?

  Session II: SAGA Hands-on   Using mini-examples, become familiar with the API   A few Application examples   Time to play with the code programmer’s manual

Page 83: Session 40 : SAGA Overview and Introduction

Text

56

http://saga.cct.lsu.edu

Page 84: Session 40 : SAGA Overview and Introduction

Text

SAGA: Other Projects

  XtreemOS   DEISA: Java library

  JRA7 DEISA (UNICORE) files and jobs   JSAGA from IN2P3 (Lyon)

  http://grid.in2p3.fr/jsaga/index.html   SD Specification

  With gLite adaptors   NAREGI, KEK

Advantage of Standards

Page 85: Session 40 : SAGA Overview and Introduction

Text

SAGA XtreemOS

  Challenges:   Linux applications should run with little (no) modifications   Grid applications should run with little (no) modifications   XtreemOS functionality must be provided to applications

Approach:   Use OGF-standardized SAGA API, enabling both Linux apps

(via POSIX semantics) and grid apps   Provide extensions packages to SAGA for XtreemOS

functionality and in the process contribute to standards

Page 86: Session 40 : SAGA Overview and Introduction

CCT: Center for Computation & Technology

Co-evolution of SAGA Standard & the XtreemOS API

(slide courtesy Thilo Kielmann)

Page 87: Session 40 : SAGA Overview and Introduction

Text

Tools as Applications

Page 88: Session 40 : SAGA Overview and Introduction

Text

Page 89: Session 40 : SAGA Overview and Introduction

Text

JSAGA

Page 90: Session 40 : SAGA Overview and Introduction

JSAGA 90

Motivations for using several grid infrastructures •  increasing the number of computing resources available to user •  need for resources with specific constraints

•  super-computer •  confidentiality (e.g. OpenPlast industrial grid) •  small overhead (e.g. consolidation) •  interactivity

•  availability, on a given grid, of: •  the software (license) •  the data

Page 91: Session 40 : SAGA Overview and Introduction

JSAGA 91

Page 92: Session 40 : SAGA Overview and Introduction

JSAGA 92

  Elis@ –  a web portal for submitting jobs to industrial and

research grid infrastructures

  SimExplorer –  a set of tools for managing simulation experiments –  includes a workflow engine that submit jobs to

heterogeneous distributed computing resources   JJS

–  a tool for running efficiently short-life jobs on EGEE

  JUX –  a multi-protocols file browser

Page 93: Session 40 : SAGA Overview and Introduction

JSAGA 93

input data

firew

all

job desc.

gLite plug-ins

Globus plug-ins

job

staging graph

job

OPlast EGEE

JDL RSL

Page 94: Session 40 : SAGA Overview and Introduction

EGEE-III INFSO-RI-222667

Enabling Grids for E-

sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

An Extension to the SAGA Service Discovery API Antony Wilson (Steve Fisher and Paul Livesey) 4th EGEE Users Forum – Catania 2009

Page 95: Session 40 : SAGA Overview and Introduction

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Service Discovery API

•  The specification for the Service Discovery has been finalised – GFD.144 –  http://www.ogf.org/documents/GFD.144.pdf –  Loosely based around the gLite Service Discovery –  Uses the GLUE (version 1.3) model of a service

  A Site may host many Services   A Service has multiple Service Data entries   Each Service Data entry is represented by a key and a value

•  APIs in C, C++, Java and Python complete but not releasd

•  Adapter for gLite under development –  Based around GLUE 1.3

  Once GLUE 2 is finalised the adapter will be updated

An Extension to the SAGA Service Discovery API - Catania 2009 95

Page 96: Session 40 : SAGA Overview and Introduction

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Service Discovery API

•  API allows selection based on three filters: –  serviceFilter – allows filtering on:

  type, name, uid, site, url, implementor and relatedService –  dataFilter – no predefined values

  Uses keys from Service Data entries –  authzFilter – authorization, no predefined values, useful values

include:   vo, dn, group and role (values dependent on adapter)

–  NB if an authzFilter is not provided then one is automatically constructed from the users security context   The gLite adapter will provide the VOMS proxy credentials as the

default value for the security context

•  Each of the filter strings uses SQL92 syntax •  The filters act as if they are part of a WHERE clause

An Extension to the SAGA Service Discovery API - Catania 2009 96

Page 97: Session 40 : SAGA Overview and Introduction

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Service Discovery

•  Selection returns a list of ServiceDescriptions –  Each description contains:

  type, name, uid, site, url, implementor, list of relatedServices and service data (key value pairs)

•  Example: discoverer = SDFactory.createDiscoverer( ) serviceDescriptions = discoverer.listServices(“type = ‘computing service’”, “”) loop serviceDescriptions description.getAttribute(“name”)

returns the value of the name attribute •  URL of information system can be passed in with the

constructor, or obtained from a conf file

An Extension to the SAGA Service Discovery API - Catania 2009 97

Page 98: Session 40 : SAGA Overview and Introduction

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Extended Service Discovery API

•  Problem: The service discovery API only gives basic information as it cannot represent the GLUE model in three tables

•  Purpose: To navigate the information model starting from a selected service

•  API will be independent of the underlying information system

•  Different information systems supported by means of adapters –  We will provide a gLite adapter

•  Navigation will be from entity to entity as expressed in the GLUE entity relationship models

An Extension to the SAGA Service Discovery API - Catania 2009 98

Page 99: Session 40 : SAGA Overview and Introduction
Page 100: Session 40 : SAGA Overview and Introduction
Page 101: Session 40 : SAGA Overview and Introduction
Page 102: Session 40 : SAGA Overview and Introduction
Page 103: Session 40 : SAGA Overview and Introduction
Page 104: Session 40 : SAGA Overview and Introduction

Acknowledgements

FundingAgencies:UKEPSRC:DPA,OMII‐UK,OMII‐UKPAL

USNSF:Cybertools,HPCOPS(TeraGrid),GridChem

NIH:LaCATS,INBRE‐LBRN

CCTInternalResources

People:SAGAD&D:HartmutKaiser,OleWeidner,AndreMerzky,JoohyunKim,Lukasz

Lacinski,JoãoAbecasis,ChrisMiceli,BetyRodriguez‐Milla

SpecialUsers:AndreLuckow,Yaakoubel‐Khamra,KateStamou,Cybertools(AbhinavThota,Jeff,N.Kim),OwainKenway

GoogleSoC:MichaelMiceli,SaurabhSehgal,MiklosErdelyi

CollaboratorsandContributors:SteveFisher&Group,ThiloKielman&Co,SylvainRenaud(JSAGA),GoIwai&YoshiyukiWatase(KEK)

Page 105: Session 40 : SAGA Overview and Introduction

SomeRelevantLinks

•  hWp://saga.cct.lsu.edu•  hWp://www.cct.lsu.edu/~sjha/select_publica\ons•  hWp://saga.cct.lsu.edu/projects

(outofdate;tobeupdated)

•  TeraGrid‐DEISAInteroperabiltyProject– hWp://www.teragridforum.org/mediawiki/index.php?\tle=LONI‐TeraGrid‐DEISA_Interoperabilty_Project