Addressing Complexity in Emerging Cyber-Ecosystems – Exploring the Role of Autonomics in E-Science Manish Parashar Center for Autonomic Computing The Applied Software Systems Laboratory Rutgers, The State University of New Jersey & Office of Cyberinfrastructure National Science Foundation


Page 1: Addressing Complexity in Emerging Cyber-Ecosystems – Exploring the Role of Autonomics in E-Science

Manish Parashar
Center for Autonomic Computing
The Applied Software Systems Laboratory
Rutgers, The State University of New Jersey
&
Office of Cyberinfrastructure
National Science Foundation

Page 2: Outline of My Presentation

5th EGEE User Forum – 04/13/10

• Computational Ecosystems
  – Unprecedented opportunities, challenges
• Autonomic computing – A pragmatic approach for addressing complexity!
• Experiments with autonomics for science and engineering
• Concluding Remarks

Page 3: Cyberinfrastructure => Cyber-Ecosystems

21st Century Science and Engineering: New Paradigms & Practices

• Transformed by CI
• End-to-end – seamless access, aggregation, interactions
• Fundamentally collaborative & data-driven/data intensive

• Unprecedented opportunities
• New requirements, challenges
• New thinking in/approaches to computational science

• How can it benefit current applications?
• How can it enable new thinking in science?

Page 4: The Instrumented Oil Field (with UT-CSM, UT-IG, OSU, UMD, ANL)

• Detect and track changes in data during production.
• Invert data for reservoir properties.
• Detect and track reservoir changes.
• Assimilate data & reservoir properties into the evolving reservoir model.
• Use simulation and optimization to guide future production.

Figure labels: Data Driven; Model Driven

Page 5: Many Application Areas …

• Hazard prevention, mitigation and response
  – Earthquakes, hurricanes, tornados, wild fires, floods, landslides, tsunamis, terrorist attacks
• Critical infrastructure systems
  – Condition monitoring and prediction of future capability
• Transportation of humans and goods
  – Safe, speedy, and cost effective transportation networks and vehicles (air, ground, space)
• Energy and environment
  – Safe and efficient power grids, safe and efficient operation of regional collections of buildings
• Health
  – Reliable and cost effective health care systems with improved outcomes
• Enterprise-wide decision making
  – Coordination of dynamic distributed decisions for supply chains under uncertainty
• Next generation communication systems
  – Reliable wireless networks for homes and businesses
• …

Report of the Workshop on Dynamic Data Driven Applications Systems, F. Darema et al., March 2006, www.dddas.org

Source: M. Rotea, NSF

Page 6: The Challenge: Managing Complexity, Uncertainty (I)

• Increasing application, data/information, system complexity
  – Scale, heterogeneity, dynamism, unreliability, …, disruptive trends, …
• New application formulations, practices
  – Data intensive and data driven, coupled, multiple physics/scales/resolution, adaptive, compositional, workflows, etc.
• Complexity/uncertainty must be simultaneously addressed at multiple levels
  – Algorithms/Application formulations
    • Asynchronous/chaotic, failure tolerant, …
  – Abstractions/Programming systems
    • Adaptive, application/system aware, proactive, …
  – Infrastructure/Systems
    • Decoupled, self-managing, resilient, …

Page 7: The Challenge: Managing Complexity, Uncertainty (II)

• The ability of scientists to realize the potential of computational ecosystems is severely hampered by the increasing complexity and dynamism of the applications and computing environments.

• To be productive, scientists often have to comprehend and manage complex computing configurations, software tools and libraries, as well as application parameters and behaviors.

• Can autonomics and self-* help? (With the “plumbing” for starters…)

Page 8: Outline of My Presentation

• Computational Ecosystems
  – Unprecedented opportunities, challenges
• Autonomic computing – A pragmatic approach for addressing complexity!
• Experiments with autonomics for science and engineering
• Concluding Remarks

Page 9: The Autonomic Computing Metaphor

• Current paradigms, mechanisms, management tools are inadequate to handle the scale, complexity, dynamism and heterogeneity of emerging systems and applications
• Nature has evolved to cope with scale, complexity, heterogeneity, dynamism and unpredictability, lack of guarantees
  – self configuring, self adapting, self optimizing, self healing, self protecting, highly decentralized, heterogeneous architectures that work !!!
• Goal of autonomic computing is to enable self-managing systems/applications that address these challenges using high level guidance
  – Unlike AI, duplication of human thought is not the ultimate goal!

“Autonomic Computing: An Overview,” M. Parashar and S. Hariri, Hot Topics, Lecture Notes in Computer Science, Springer Verlag, Vol. 3566, pp. 247-259, 2005.

Page 10: Motivations for Autonomic Computing

Key Challenge: Current levels of scale, complexity and dynamism make it infeasible for humans to effectively manage and control systems and applications

• 8/12/07: 20K people + 60 planes held at LAX after computer failure prevented customs from screening arrivals
• 8/3/07: (EPA) datacenter energy use by 2011 will cost $7.4 B, 15 power plants, 15 Gwatts/hour peak
• 2/27/07: Dow fell 546. Since worst plunge took place after 2:30 pm, trading limits were not activated
• 8/1/06: UK NHS hit with massive computer outage. 72 primary care + 8 acute hospital trusts affected.

Source: http:idc 2006
Source: http://www.almaden.ibm.com/almaden/talks/Morris_AC_10-02.pdf

Page 11: Autonomic Computing – A Pragmatic Approach

• Separation + Integration + Automation !
• Separation of knowledge, policies and mechanisms for adaptation
• The integration of self-configuration, -healing, -protection, -optimization, …
• Self-* behaviors build on automation concepts and mechanisms
  – Increased productivity, reduced operational costs, timely and effective response
• System/Application self-management is more than the sum of the self-management of its individual components

M. Parashar and S. Hariri, Autonomic Computing: Concepts, Infrastructure, and Applications, CRC Press, Taylor & Francis Group, ISBN 0-8493-9367-1, 2007.

Page 12: Autonomic Computing Theory

• Integrates and advances several fields
  – Distributed computing
    • Algorithms and architectures
  – Artificial intelligence
    • Models to characterize, predict and mine data and behaviors
  – Security and reliability
    • Designs and models of robust systems
  – Systems and software architecture
    • Designs and models of components at different IT layers
  – Control theory
    • Feedback-based control and estimation
  – Systems and signal processing theory
    • System and data models and optimization methods
• Requires experimental validation

(From S. Dobson et al., ACM Tr. on Autonomous & Adaptive Systems, Vol. 1, No. 2, Dec. 2006.)

Page 13: Autonomics for Science and Engineering?

• Manage application/information/system complexity
  • not just hide it!
• Enabling new thinking, formulations
  • how do I think about/formalize my problem differently?

Page 14: Existing Autonomic Practices in Computational Science (GMAC 09, SOAR 09, with S. Jha and O. Rana)

• Autonomic tuning by the application
• Autonomic tuning of the application

Page 15: Spatial, Temporal and Computational Heterogeneity and Dynamics in SAMR

Simulation of combustion based on SAMR (H2-Air mixture; ignition via 3 hot-spots)

Figure panels: Temperature; OH Profile; Temporal Heterogeneity; Spatial Heterogeneity

Courtesy: Sandia National Lab

Page 16: Autonomics in SAMR

• Tuning by the application
  – Application level: when and where to refine
  – Runtime/Middleware level: when, where, how to partition and load balance
  – Runtime level: when, where, how to partition and load balance
  – Resource level: allocate/de-allocate resources
• Tuning of the application, runtime
  – When/where to refine
  – Latency aware ghost synchronization
  – Heterogeneity/Load-aware partitioning and load-balancing
  – Checkpoint frequency
  – Asynchronous formulations
  – …
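As a rough illustration of heterogeneity-aware partitioning, the sketch below divides a fixed number of work units (for example, SAMR grid patches) across nodes in proportion to their relative capacity. It is not the partitioner used in the work presented here; the function name, the capacity values and the largest-remainder rounding are assumptions chosen for the example.

```python
def partition_by_capacity(work_units, capacities):
    """Split `work_units` across nodes in proportion to relative capacity.

    Hypothetical helper: uses largest-remainder rounding so the integer
    shares always sum to `work_units`.
    """
    total = sum(capacities)
    if total <= 0:
        raise ValueError("total capacity must be positive")
    exact = [work_units * c / total for c in capacities]
    shares = [int(x) for x in exact]
    leftover = work_units - sum(shares)
    # Hand the remaining units to the nodes with the largest fractional parts.
    order = sorted(range(len(capacities)),
                   key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# Three nodes, the middle one twice as fast as the others (assumed values).
shares = partition_by_capacity(100, [1.0, 2.0, 1.0])
print(shares)
```

An autonomic runtime would presumably re-run such a step whenever monitored capacities or the refined grid hierarchy change, which is where the "when, where, how" decisions on the slide come in.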

Page 17: Outline of My Presentation

• Computational Ecosystems
  – Unprecedented opportunities, challenges
• Autonomic computing – A pragmatic approach for addressing complexity!
• Experiments with autonomics for science and engineering
• Concluding Remarks

Page 18: Autonomics for Science and Engineering – Application-level Examples

Page 19: Coupled Fusion Simulations: A Data Intensive Workflow

Page 20: Autonomic Data Streaming and In-Transit Processing for Data-Intensive Workflows

• Workflow with coupled simulation codes, i.e., the edge turbulence particle-in-cell (PIC) code (GTC) and the microscopic MHD code (M3D), run simultaneously on separate HPC resources
• Data streamed and processed en route, e.g., data from the PIC codes filtered through “noise detection” processes before it can be coupled with the MHD code
• Efficient data streaming between live simulations, with data arriving just in time: if it arrives too early, time and resources are wasted buffering the data; if it arrives too late, the application wastes resources waiting for the data to come in
• Opportunistic use of in-transit resources

“A Self-Managing Wide-Area Data Streaming Service,” V. Bhat*, M. Parashar, H. Liu*, M. Khandekar*, N. Kandasamy, S. Klasky, and S. Abdelwahed, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Volume 10, Issue 7, pp. 365 – 383, December 2007.

Page 21: Autonomic Data Streaming & In-Transit Processing

– Application level
  • Proactive QoS management strategies using model-based LLC controller
  • Capture constraints for in-transit processing using slack metric
– In-transit level
  • Opportunistic data processing using dynamic in-transit resource overlay
  • Adaptive run-time management at in-transit nodes based on slack metric generated at application level
– Adaptive buffer management and forwarding

Figure labels: Application Level “Proactive” management; Simulation; LLC Controller; Slack metric Generator; In-Transit Level “Reactive” management; In-Transit node; Slack metric corrector; Budget estimation; Slack metric adjustment; metric updates; Coupling; Sink; Data flow
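The slides do not define the slack metric, so the following is only one plausible reading: slack is the time left before the consumer needs a block, minus the estimated time to deliver it. Under that assumption, an in-transit node processes a block only when the slack covers the processing cost and otherwise forwards it untouched. The `Block` type, both function names and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    size_mb: float     # payload size in megabytes
    deadline_s: float  # seconds from now until the consumer needs the block


def slack_s(block, bandwidth_mbps):
    """Assumed definition: time to deadline minus estimated transfer time."""
    transfer_s = block.size_mb * 8.0 / bandwidth_mbps
    return block.deadline_s - transfer_s


def in_transit_action(block, bandwidth_mbps, processing_s):
    """Process at the in-transit node only if the slack covers the cost."""
    if slack_s(block, bandwidth_mbps) >= processing_s:
        return "process_then_forward"
    return "forward_only"


block = Block(size_mb=100.0, deadline_s=20.0)
print(slack_s(block, 80.0))                 # 10.0 seconds of slack
print(in_transit_action(block, 80.0, 4.0))  # enough slack to process
print(in_transit_action(block, 80.0, 12.0)) # too tight, just forward
```

In this reading, the application level would generate the initial slack value and each in-transit node would correct it as bandwidth and queueing conditions change.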

Page 22: Autonomic Streaming: Implementation/Deployment

• Simulation Workflow
  – SS = Simulation Service (GTC)
  – ADSS = Autonomic Data Streaming Service
    • CBMS = LLC Controller based buffer management service
    • DTS = Data Transfer service
  – DAS = Data Analysis Service
  – SLAMS = Slack Manager Service
  – PS = Processing Service
  – BMS = Buffer Management Service
  – ArchS = Archiving data at sink

• Simulations execute on leadership class machines at ORNL and NERSC
• In-transit nodes located at PPPL and Rutgers

Figure (deployment diagram) labels: Data Producers (SS at NERSC, SS at ORNL, ADSS, CBMS, DTS, DAS); Data In-Transit (PPPL, Rutgers University; SLAMS, DTS, PS, BudjS; FFT, Sort data, Scale data); Data Consumers (Rutgers University; DAS, VisS, BMS, ArchS, Sink)

Page 23: Adaptive Data Transfer

• No congestion in intervals 1-9
  – Data transferred over WAN
• Congested at intervals 9-19
  – Controller recognizes this congestion and advises the Element Manager, which in turn adapts DTS to transfer data to local storage (LAN).
• Adaptation continues until the network is no longer congested
  – Data sent to the local storage by the DTS falls to zero at the 19th controller interval.

Figure: Data transferred by DTS (MB, 0-140) and bandwidth (Mb/sec, 0-120) versus controller interval (0-24). Series: DTS to WAN; DTS to LAN; Bandwidth; Congestion.
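The adaptation described on this slide can be sketched as a per-interval decision about where the DTS sends data. The controller in the work presented is a model-based LLC controller; the fixed bandwidth threshold below is a deliberately simpler stand-in, and the threshold value and bandwidth trace are made-up numbers.

```python
def choose_target(bandwidth_mbps, threshold_mbps):
    """Send over the WAN when bandwidth is adequate, else to local storage."""
    return "WAN" if bandwidth_mbps >= threshold_mbps else "LAN"


def run_controller(bandwidth_trace, threshold_mbps=50.0):
    """Return the transfer target chosen at each controller interval."""
    return [choose_target(b, threshold_mbps) for b in bandwidth_trace]


# Assumed trace: congestion in the middle intervals, then recovery.
trace = [100, 95, 40, 30, 45, 90]
decisions = run_controller(trace)
print(decisions)
```

Once the measured bandwidth recovers, the decision flips back to the WAN, which corresponds to the LAN transfers falling to zero in the plot.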

Page 24: Exploring Hybrid HPC-Grid/Cloud Usage Modes [eScience’09]

• Production computation infrastructures will be (are) hybrid, integrating HPC Grids and Clouds
• What are appropriate usage modes for hybrid infrastructure?
  – Acceleration
    • Clouds can be used as accelerators to improve the application time to completion
      – To alleviate the impact of queue wait times
      – “Strategically offload” appropriate tasks to Cloud resources
      – All whilst respecting budget constraints.
  – Conservation
    • Clouds can be used to conserve HPC Grid allocations, given appropriate runtime and budget constraints.
  – Resilience
    • Clouds can be used to handle:
      – General: Response to dynamic execution environments
      – Specific: Unanticipated HPC Grid downtime, inadequate allocations or unexpected queue delays/QoS change
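To make "strategically offload while respecting budget constraints" concrete, here is a minimal greedy sketch: the longest tasks go to the cloud until the budget would be exceeded, and the rest stay on the HPC Grid. This is not the autonomic scheduler from the cited work; the policy, the function name, the per-minute price and the task runtimes are all assumptions.

```python
def plan_offload(task_minutes, cloud_cost_per_minute, budget):
    """Greedily offload the longest tasks to the cloud within a budget.

    Returns (grid_tasks, cloud_tasks, spent). Hypothetical policy for
    illustration only.
    """
    grid, cloud, spent = [], [], 0.0
    for minutes in sorted(task_minutes, reverse=True):
        cost = minutes * cloud_cost_per_minute
        if spent + cost <= budget:
            cloud.append(minutes)
            spent += cost
        else:
            grid.append(minutes)
    return grid, cloud, spent


grid, cloud, spent = plan_offload([30, 10, 20, 40],
                                  cloud_cost_per_minute=0.5, budget=30.0)
print(grid, cloud, spent)
```

The same function covers the resilience mode in a crude way: if the grid becomes unavailable, the caller can rerun the plan with a larger budget so that more tasks land on cloud resources.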

Page 25: Reservoir Characterization: EnKF-based History Matching (with S. Jha)

• Black Oil Reservoir Simulator
  – simulates the movement of oil and gas in subsurface formations
• Ensemble Kalman Filter
  – computes the Kalman gain matrix and updates the model parameters of the ensembles
• Heterogeneous, dynamic workflows
• Based on Cactus, PETSc
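For reference, the Kalman gain and ensemble update mentioned above take the following form in the standard textbook EnKF. The notation is not from the slides: $x_i^f$ and $x_i^a$ are the forecast and analysis states of ensemble member $i$, $N$ is the ensemble size, $H$ the observation operator, $R$ the observation error covariance, and $d_i$ the (perturbed) observations for member $i$.

```latex
\begin{aligned}
\bar{x}^f &= \frac{1}{N}\sum_{i=1}^{N} x_i^f, \\
P^f &= \frac{1}{N-1}\sum_{i=1}^{N}
       \left(x_i^f-\bar{x}^f\right)\left(x_i^f-\bar{x}^f\right)^{\mathsf T}, \\
K &= P^f H^{\mathsf T}\left(H P^f H^{\mathsf T} + R\right)^{-1}, \\
x_i^a &= x_i^f + K\left(d_i - H x_i^f\right), \qquad i = 1,\dots,N.
\end{aligned}
```

Each ensemble member requires its own reservoir simulation run between updates, which is one reason the resulting workflow is heterogeneous and dynamic.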

Page 26: Exploring Hybrid HPC-Grid/Cloud Usage Modes using CometCloud

Figure (architecture diagram) labels: EnKF application; Workflow manager; Runtime estimator; Autonomic scheduler; Adaptivity Manager (Monitor, Analysis, Adaptation); Application adaptivity; Infrastructure adaptivity; CometCloud; Grid Agent; Cloud Agent; Push Tasks; Pull Tasks; Mgmt. Info.; HPC Grid; Cloud

Page 27: Objective I: Using Clouds as Accelerators for HPC Grids (2/2)

The TTC and TCC for Objective I with 16 TG CPUs and queuing times set to 5 and 10 minutes. As expected, the more VMs that are made available, the greater the acceleration, i.e., the lower the TTC. The reduction in TTC is roughly, but not perfectly, linear because of a complex interplay between the tasks in the workload and resource availability.

Page 28: Objective II: Using Clouds for Conserving CPU-Time on the TeraGrid

• Explore how to conserve a fixed allocation of CPU hours by offloading tasks that perhaps don’t need the specialized capabilities of the HPC Grid

Distribution of tasks across EC2 and TG, TTC and TCC, as the CPU-minute allocation on the TG is increased.

Page 29: Objective III: Response to Changing Operating Conditions (Resilience) (2/4)

Allocation of tasks to TG CPUs and EC2 nodes for usage mode III. As the 16 allocated TG CPUs become unavailable after only 70 minutes rather than the planned 800 minutes, the bulk of the tasks are completed by EC2 nodes.

Page 30: Objective III: Response to Changing Operating Conditions (Resilience) (3/4)

Number of TG cores and EC2 nodes as a function of time for usage mode III. Note that the TG CPU allocation goes to zero after about 70 minutes causing the autonomic scheduler to increase the EC2 nodes by 8.

Page 31: The Instrumented Oil Field

• Production of oil and gas can take advantage of installed sensors that will monitor the reservoir’s state as fluids are extracted
• Knowledge of the reservoir’s state during production can result in better engineering decisions
  – economical evaluation; physical characteristics (bypassed oil, high pressure zones); production techniques for safe operating conditions in complex and difficult areas

• Detect and track changes in data during production
• Invert data for reservoir properties
• Detect and track reservoir changes
• Assimilate data & reservoir properties into the evolving reservoir model
• Use simulation and optimization to guide future production, future data acquisition strategy

“Application of Grid-Enabled Technologies for Solving Optimization Problems in Data-Driven Reservoir Studies,” M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz and M. Wheeler, FGCS: The International Journal of Grid Computing: Theory, Methods and Applications, Elsevier Science Publishers, Vol. 21, Issue 1, pp. 19-26, 2005.

Page 32: Effective Oil Reservoir Management: Well Placement/Configuration

• Why is it important
  – Better utilization/cost-effectiveness of existing reservoirs
  – Minimizing adverse effects to the environment

Figure panels: Better Management, Less Bypassed Oil; Bad Management, Much Bypassed Oil

Page 33: Autonomic Reservoir Management: “Closing the Loop” using Optimization

Figure (closed-loop diagram) labels:
• Dynamic Decision System
• Dynamic Data-Driven Assimilation
• Autonomic Grid Middleware
• Grid Data Management; Processing Middleware
• START; Management decision
• Optimize: economic revenue, environmental hazard, …, based on the present subsurface knowledge and numerical model
• Plan optimal data acquisition
• Acquire remote sensing data
• Improve knowledge of subsurface to reduce uncertainty
• Update knowledge of model
• Improve numerical model
• Data assimilation; Subsurface characterization; Experimental design

Page 34: Autonomic Formulations/Programming

Figure (autonomic element and composition diagram) labels:
• Autonomic Element: Computational Element; Element Manager; Functional Port; Control Port; Operational Port
• Element Manager: Rules; Internal state; Contextual state; Event generation; Actuator invocation; Other Interface invocation
• Application workflow; Composition manager; Application strategies; Application requirements; Interaction rules; Behavior rules
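The element structure in this diagram can be approximated in a few lines: a computational element exposes its state, and an element manager evaluates rules against that state plus external context and invokes actuators when a rule matches. This is a loose sketch, not the programming system used in the work presented; every class, method and rule name below is hypothetical, and the checkpoint-frequency rule is borrowed from the SAMR tuning list earlier in the deck purely as an example.

```python
class ComputationalElement:
    """Stand-in for the managed computation; exposes state for sensing."""

    def __init__(self):
        self.checkpoint_interval = 100  # iterations between checkpoints

    def sense(self):
        return {"checkpoint_interval": self.checkpoint_interval}


class ElementManager:
    """Evaluates behavior rules and invokes actuators on a match."""

    def __init__(self):
        self.rules = []  # list of (condition, action) pairs

    def add_rule(self, condition, action):
        self.rules.append((condition, action))

    def step(self, element, context):
        fired = []
        state = element.sense()          # internal state
        for condition, action in self.rules:
            if condition(state, context):  # context = contextual state
                action(element)            # actuator invocation
                fired.append(action.__name__)
        return fired


def checkpoint_more_often(element):
    element.checkpoint_interval = max(10, element.checkpoint_interval // 2)


manager = ElementManager()
manager.add_rule(lambda state, ctx: ctx["failure_rate"] > 0.1,
                 checkpoint_more_often)
element = ComputationalElement()
fired = manager.step(element, {"failure_rate": 0.2})
print(fired, element.checkpoint_interval)
```

Keeping the rules as data separate from the actuators mirrors the separation of policies and mechanisms argued for earlier in the talk.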

Page 35: An Autonomic Well Placement/Configuration Workflow

Figure (workflow diagram) labels:
• Optimization Service (SPSA, VFSA, Exhaustive Search): generate guesses; send guesses
• MySQL Database: if guess in DB, send response to clients and get new guess from optimizer
• IPARS Factory: if guess not in DB, instantiate IPARS with guess as parameter; start parallel IPARS instances
• DISCOVER: instance connects to DISCOVER; DISCOVER notifies clients; clients interact with IPARS
• History/Archived Data; Sensor/Context Data; Oil prices, Weather, etc.
• AutoMate Programming System/Grid Middleware
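The guess/database/simulator loop in this workflow can be sketched as follows. A dictionary stands in for the MySQL database of evaluated guesses, a toy objective stands in for an IPARS run, and exhaustive search (one of the three optimizers named in the diagram) stands in for the optimization service. All function names and the objective are hypothetical.

```python
def evaluate(guess, db, simulate):
    """Return (value, ran_simulation); reuse the stored result if present."""
    if guess in db:
        return db[guess], False
    value = simulate(guess)  # stands in for instantiating a simulator run
    db[guess] = value
    return value, True


def exhaustive_search(candidates, db, simulate):
    """Try every candidate placement and count how many simulations ran."""
    best_guess, best_value, runs = None, float("-inf"), 0
    for guess in candidates:
        value, ran = evaluate(guess, db, simulate)
        runs += ran
        if value > best_value:
            best_guess, best_value = guess, value
    return best_guess, best_value, runs


def toy_revenue(guess):
    """Made-up objective with its maximum at well position (3, 1)."""
    x, y = guess
    return -((x - 3) ** 2 + (y - 1) ** 2)


grid = [(x, y) for x in range(5) for y in range(5)]
db = {}
first = exhaustive_search(grid, db, toy_revenue)
second = exhaustive_search(grid, db, toy_revenue)
print(first)   # all 25 candidates simulated
print(second)  # same answer, no new simulations
```

The second call illustrates why the database sits in front of the simulator factory: repeated guesses, whether from the same optimizer or a different one, cost nothing once their results are stored.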

Page 36: Autonomic Oil Well Placement/Configuration (VFSA)

“An Autonomic Reservoir Framework for the Stochastic Optimization of Well Placement,” V. Matossian, M. Parashar, W. Bangerth, H. Klie, and M. F. Wheeler, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Kluwer Academic Publishers, Vol. 8, No. 4, pp. 255 – 269, 2005.

“Autonomic Oil Reservoir Optimization on the Grid,” V. Matossian, V. Bhat, M. Parashar, M. Peszynska, M. Sen, P. Stoffa, and M. F. Wheeler, Concurrency and Computation: Practice and Experience, John Wiley and Sons, Volume 17, Issue 1, pp. 1 – 26, 2005.

Page 37: Summary

• CI and emerging computational ecosystems
  – Unprecedented opportunity
    • new thinking, practices in science and engineering
  – Unprecedented research challenges
    • scale, complexity, heterogeneity, dynamism, reliability, uncertainty, …
• Autonomic Computing can address complexity and uncertainty
  – Separation + Integration + Automation
• Experiments with Autonomics for science and engineering
  – Autonomic data streaming and in-transit data manipulation, Autonomic Workflows, Autonomic Runtime Management, …
• However, there are implications
  – Added uncertainty
  – Correctness, predictability, repeatability
  – Validation
  – New formulations necessary….

Page 38: Thank You!

Email: [email protected]