Dan Bennett Oct 26, 2005 Steering and Visualization of Batch Style Distributed Computations -or- What I did on my Summer Vacation

Page 1

Dan Bennett

Oct 26, 2005

Steering and Visualization of Batch Style Distributed Computations
-or-
What I did on my Summer Vacation

Page 2

Overview

● Review of Parallel and Distributed Computing
● STV middleware
● Applications of STV middleware
  – Steering and Visualization of an MD simulation
  – Model coupling, weather simulation
  – Checkpointing, liquid crystal simulation
● My ultimate project
● Conclusions

Page 3

Types of Parallel Computers

● Shared Memory Multiprocessor
● Cluster
● MPP
● Vector Processors
● Hybrid
● Supercomputer - anything on the Top 500 list

Rachel and Jonas at PSC, SMPs with 64 processors

Cray XT3 MPP at PSC, 2068 processors, custom interconnect, #33 Top 500 (July 05)

Lemieux at PSC, 750 4-processor node cluster, #68 Top 500

Retired T3E at PSC

Page 4

What I Work With

● Cluster, but perhaps of SMPs
● Message Passing (MPI, PVM)
● SPMD – Single Program Multiple Data
● Braveheart
● Interprocess communications dominate the computation
● CUMULVS

Braveheart, MCS Cluster

Page 5

Supercomputer Center

● Most large machines require batch submission
  – Allows for priority scheduling
  – And maximum utilization of resources
● PSC allows for interactive jobs, but they must be submitted to the batch queue
● OSC states: “The [fill in computer name here] has fixed usage limits for any interactive execution”

Page 6

Parallel Simulations in Batch

● Batch submission has the effect of removing the scientist from “the loop”
  – Set up Parameters
  – Submit Job
  – Post-process Output
● The computation becomes a black box
  – No way to observe what is happening inside

Page 7

Distributed Programs/Data

● Collecting data to a central location takes time.
● You need to collect it for visualization/inspection of the computation
● You don't always want to collect it for processing

In a weather simulation, data at processor borders (indicated in yellow) needs to be shared with the neighbors.
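The border sharing described in the caption is often called a halo (or ghost cell) exchange. The sketch below is a minimal single-process illustration in Python, not MPI code and not taken from the weather model: each "processor" is a list whose first and last entries are ghost cells, and the name `exchange_halos` and the 1D layout are assumptions made for the example.

```python
def exchange_halos(subdomains):
    """Copy border data between neighboring subdomains (1D layout assumed).

    Each subdomain is a list laid out as
    [left_ghost, interior..., right_ghost].
    """
    for i, sub in enumerate(subdomains):
        if i > 0:
            # Left ghost cell receives the left neighbor's last interior value.
            sub[0] = subdomains[i - 1][-2]
        if i < len(subdomains) - 1:
            # Right ghost cell receives the right neighbor's first interior value.
            sub[-1] = subdomains[i + 1][1]
    return subdomains


# Three hypothetical processors, two interior cells each, ghosts start at 0.0.
parts = [[0.0, 1.0, 2.0, 0.0], [0.0, 3.0, 4.0, 0.0], [0.0, 5.0, 6.0, 0.0]]
exchange_halos(parts)
```

Only the border values move between neighbors; the interior data stays where it is, which is why processing does not require collecting everything in one place.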

Page 8

Steering and Visualization Software

● Extracts data from distributed computation
● Minimally invasive (few lines of added code)
● Minimal impact on performance
● CUMULVS and others.

● CUMULVS is from ORNL
● It is part of the ACTS toolkit
● It is part of an ongoing research project
● It runs under PVM, but works with MPI

Page 9

CUMULVS

● Extracts the data from the distributed computation
● Synchronizes this data
● Delivers this data to a visualization client
● Receives data from a steering client
● Delivers this data to the distributed computation.

Page 10

Case 1, Steering and Visualization of ALCMD

● ALCMD - Ames Lab Classical Molecular Dynamics simulation
  – MPI based FORTRAN code, ~6K lines
  – Spatial decomposition of data, irregularly distributed

An MD computation spread over 8 nodes. The data from node four is shown in the exploded view.

Page 11

Case 1, Instrument the Code

● Insert commands to
  – Initialize the STV environment
      call stvfinit(FSIMPLGRP, TAG, nprocs, myproc, info)
  – Define parameters
    ● Class 1, global scalar steering parameters (e.g. temperature)
      call stvfparamdefine(temp,'Temp',STVDOUBLE,STVVIZONLY,parID(2))
    ● Class 2, distributed data (e.g. ID of each molecule)
      call stvfpfielddefne('id',STVINT,1,getid,1,0,0,STVVISONLY,ipd)

Page 12

Results

Conditions

● 16K atoms
● 4K time steps
● 6 processors

4 Runs

● No instrumentation
● Instrumentation, no extraction
● Extract a scalar value
● Extract 3 scalar values

          I      II     III    IV
MFLOPS    2448   2453   2043   2037
Time      100    100    120    120

A screen shot of ALCMD running on 8 nodes, along with visualizations of extracted data.
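The cost of extraction can be read off the table with a little arithmetic. The sketch below assumes that columns I to IV correspond to the four runs in the order listed and that run I (no instrumentation) is the baseline; the slide does not state either point explicitly, and the units of "Time" are not given.

```python
# Values copied from the results table; the column-to-run mapping is assumed.
mflops = {"I": 2448, "II": 2453, "III": 2043, "IV": 2037}
run_time = {"I": 100, "II": 100, "III": 120, "IV": 120}


def percent_change(baseline, value):
    """Relative change of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


for run in ("II", "III", "IV"):
    rate = percent_change(mflops["I"], mflops[run])
    elapsed = percent_change(run_time["I"], run_time[run])
    print(f"Run {run}: MFLOPS {rate:+.1f}%, time {elapsed:+.1f}%")
```

Under those assumptions, instrumentation alone costs essentially nothing, while extracting data lowers the MFLOPS rate by roughly a sixth and lengthens the run by a fifth, whether one or three scalars are extracted.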

Page 13

Further Results

● 6413 total lines of code
● 200 lines added
● 3% increase
● Found problems with CUMULVS
  – Poor documentation
  – Missing commands
  – Missing functionality
● But the package is usable for our purposes.

Page 14

Case 1, Work to Do

● Re-do steering of a scalar variable
  – This was working once, but never on a cluster
● Produce a number of steering clients
  – Scalar Change Widget
  – Delta Widget
  – Molecule Browser
● Improve visualization
● Add checkpointing
● Extract additional data
● Work on STV toolkit

Page 15

Why Did I Do This?

● To gain experience with CUMULVS
  – Look for areas of improvement
  – Understand how CUMULVS works
● Basis for future experimentation
● Plan to experiment with other STV packages; this is a “control” package.

Page 16

Case 2, Vorticity Budget of an MCV

● An opportunity for work is with Jim Kirk
● Jim studies Mesoscale Convective Vortices
● He wishes to perform Vorticity Budgets and Thermodynamic Budgets
● Data comes from MM5, which can run as a distributed computation.

Page 17

Case 2, MM5

● MM5 produces huge amounts of data each timestep (90 seconds of simulated time)
● Normally this is only saved every half hour of simulated time (at most)
● We don’t know when to save more often (i.e. when the MCV starts)
● Solution: use CUMULVS to extract the data, detect the MCV and save data.
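The idea of saving more often once the MCV appears can be sketched as follows. This is a hedged illustration in Python, not MM5 or CUMULVS code: the detector `looks_like_mcv`, its threshold test, and the flat list standing in for a vorticity field are all hypothetical. The routine interval of 20 steps follows from the numbers above, since half an hour is 1800 s and 1800 / 90 = 20 timesteps.

```python
def looks_like_mcv(vorticity_field, threshold):
    """Hypothetical detector: flag a step when peak vorticity exceeds a threshold."""
    return max(vorticity_field) > threshold


def choose_saved_steps(fields, threshold, routine_interval=20):
    """Return the timesteps whose data would be written to disk.

    A step is saved on the routine schedule, or whenever the detector fires.
    """
    saved = []
    for step, field in enumerate(fields):
        if step % routine_interval == 0 or looks_like_mcv(field, threshold):
            saved.append(step)
    return saved


# 60 quiet timesteps, with a made-up vortex signature during steps 25 to 30.
fields = [[0.1, 0.2] for _ in range(60)]
for step in range(25, 31):
    fields[step] = [0.1, 0.9]

saved_steps = choose_saved_steps(fields, threshold=0.5)
```

In a real setup the detector would run in a viewer attached to the simulation, so the simulation itself would not need to know anything about MCVs.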

Page 18

Case 2, Predicted Work

● This will involve:
  – Intelligent viewers to detect the MCV
  – Intelligent viewers to perform the various budgets
● Detecting an MCV may be a parallel computation as well.
● This is model coupling

Page 19

Model Coupling

● Build a viewer that is a parallel computation.
● Each node only extracts the data that it needs.
● This is a natural extension of STV software
● And is an active area of research

Page 20

Checkpointing

● Save a copy of a process so if it fails it can be restarted.
● The state includes registers and memory.
● To do this: Save the state of a process in execution
  – Save local states
  – Save messages in communication channels
  – We must be able to recreate a state of the computation
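A minimal sketch of the save and restore cycle, assuming the application state can be described as a plain Python dictionary. It works at the application level (named values plus pending messages) rather than capturing registers and raw memory, and the names `take_checkpoint`, `restore_checkpoint`, `advance`, and the `in_flight` message list are invented for the example.

```python
import pickle


def take_checkpoint(state):
    """Serialize the local state, including messages still in the channel."""
    return pickle.dumps(state)


def restore_checkpoint(blob):
    """Recreate the saved state from a checkpoint."""
    return pickle.loads(blob)


def advance(state):
    """Stand-in for one step of the real computation."""
    state["step"] += 1
    state["values"] = [v * 0.5 for v in state["values"]]
    return state


state = {"step": 0, "values": [8.0, 4.0], "in_flight": [("rank 1", "border row")]}

for _ in range(3):
    advance(state)
saved = take_checkpoint(state)        # checkpoint after step 3

for _ in range(2):
    advance(state)                    # work that is later lost or rejected

state = restore_checkpoint(saved)     # restart from the checkpoint
```

In a distributed run every process would take such a checkpoint, and the hard part is making the set of local checkpoints and in-flight messages mutually consistent.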

Page 21

Checkpointing and STV Software

● Checkpointing is tough because
  – You must describe which values are to be saved
  – You must collect the save file to a central location
● STV packages allow you to do this for visualization already
● This is a natural extension to these packages.
  – They need software to manage checkpoint files
  – They need software to restart a computation

Page 22

Case 3, Liquid Crystal Modelling

● Liquid crystalline material behaves both like a liquid and a crystal
  – Molecules are free to fit the container
  – But have orientational and some positional order

Visualization of a liquid crystal simulation

Relative energy is indicated by color

The eigenvalues of the Q tensor determine shape

The eigenvectors of the Q tensor determine orientation

Page 23

The Model

● Represented on a regular 3D grid by a 3x3 symmetric traceless tensor Q.
● The desire is to minimize the free energy F(Q,τ).
  – τ is a parameter equivalent to temperature.
  – Repeat: set τ, find equilibrium, calculate new τ
● The model encounters bifurcation points during this procedure.
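The "set τ, find equilibrium, calculate new τ" loop can be illustrated with a toy problem. The sketch below is an assumption-laden stand-in, not the liquid crystal model: it replaces the tensor Q with a single scalar q, uses the made-up free energy F(q, τ) = τq²/2 + q⁴/4, and finds each equilibrium by plain gradient descent. This toy energy has a bifurcation at τ = 0, where the minimum at q = √(-τ) merges into q = 0.

```python
def free_energy_gradient(q, tau):
    """dF/dq for the toy energy F(q, tau) = tau*q**2/2 + q**4/4."""
    return tau * q + q ** 3


def find_equilibrium(q, tau, step=0.1, tol=1e-8, max_iter=100000):
    """Gradient descent from the previous equilibrium until dF/dq is near zero."""
    for _ in range(max_iter):
        gradient = free_energy_gradient(q, tau)
        if abs(gradient) < tol:
            break
        q -= step * gradient
    return q


def follow_path(tau_start, tau_end, n_steps, q0):
    """Set tau, find equilibrium, calculate new tau; repeat."""
    q = q0
    path = []
    for i in range(n_steps + 1):
        tau = tau_start + (tau_end - tau_start) * i / n_steps
        q = find_equilibrium(q, tau)   # warm start from the last equilibrium
        path.append((tau, q))
    return path


path = follow_path(tau_start=-1.0, tau_end=1.0, n_steps=5, q0=1.0)
```

The chosen τ values deliberately step over τ = 0; landing exactly on the bifurcation point makes this simple descent converge extremely slowly.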

Page 24

Bifurcation

As I don't know it!

● Some systems of equations can experience a dramatic change in behavior from the slightest change of a parameter.
● The following represent the behavior of a continuously stirred tank reactor

Page 25

Small Changes in Lambda

Lead to large changes in the behavior of the model.

I used Runge-Kutta to solve the previous equations, varying lambda as shown.

Lambda = 0.128225884245

Lambda = 0.128225884244
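The reactor equations themselves did not survive in the text, so the sketch below applies the classical fourth-order Runge-Kutta method to a stand-in equation, dx/dt = λ + x², chosen only because it also changes character when λ crosses a critical value (here λ = 0). The function names and step size are assumptions for the example.

```python
def rk4_step(f, x, t, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, x + dt * k1 / 2)
    k3 = f(t + dt / 2, x + dt * k2 / 2)
    k4 = f(t + dt, x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate(lam, x0=0.0, t_end=12.0, dt=0.01):
    """Integrate the stand-in equation dx/dt = lam + x**2 from t = 0 to t_end."""
    def rhs(t, x):
        return lam + x * x

    x = x0
    for i in range(round(t_end / dt)):
        x = rk4_step(rhs, x, i * dt, dt)
    return x


x_above = integrate(0.01)    # lambda slightly above the critical value
x_below = integrate(-0.01)   # lambda slightly below the critical value
```

For λ = -0.01 the solution settles toward a steady state near -0.1, while for λ = +0.01 it keeps growing and eventually blows up, even though the two parameter values differ only slightly.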

Page 26

Bifurcation Points

● Occur when the Jacobian becomes singular
● Can be detected, but it seems a real bear to do so.
  – Probably involves some human intervention (now)
  – Read: Computational Steering
● Can apparently cause multiple “paths” that can be followed, which can lead to physically unrealistic situations in simulations.

Page 27

Bifurcation, and Liquid Crystals

● It happens, and we don’t like it!
● The solution to date: Pick some path out of the bifurcation point and follow it
● This leads to “useful” but not necessarily correct simulations.

Page 28

Finding Bifurcation Points in the Model

● The model yields a system of n nonlinear equations (n > 3x10^5)
● Solve this using Newton’s method.
● Things get really touchy near the bifurcation point
  – May need to switch numeric methods used.
● After the paths out of the bifurcation point are found, there is no way to predict the correct path without taking a few steps down the path.
● This leads to the need for checkpointing/rollback
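Why Newton's method gets touchy near a bifurcation point can be seen in one dimension, where the Jacobian is just the derivative. The sketch below is a toy, not the liquid crystal solver: it solves the made-up equation τx + x³ = 0, whose derivative τ + 3x² vanishes at the root x = 0 when τ = 0, and counts iterations. All names are assumptions for the example.

```python
def newton(f, df, x0, tol=1e-10, max_iter=100):
    """Newton's method for a scalar equation f(x) = 0; returns (root, iterations)."""
    x = x0
    for iteration in range(1, max_iter + 1):
        slope = df(x)                 # the 1x1 "Jacobian"
        if slope == 0.0:
            raise ZeroDivisionError("Jacobian is singular at the current iterate")
        x -= f(x) / slope
        if abs(f(x)) < tol:
            return x, iteration
    raise RuntimeError("Newton's method did not converge")


def solve(tau, x0=2.0):
    """Solve the toy equation tau*x + x**3 = 0 starting from x0."""
    return newton(lambda x: tau * x + x ** 3,
                  lambda x: tau + 3 * x * x,
                  x0)


root_regular, iters_regular = solve(-1.0)   # well away from the bifurcation
root_touchy, iters_touchy = solve(0.0)      # exactly at the bifurcation
```

Away from the bifurcation the iteration converges quadratically in a handful of steps; at τ = 0 each step only shrinks the error by a factor of 2/3, so many more iterations are needed. In a system with hundreds of thousands of unknowns the analogous loss of conditioning is one reason a different numerical method may be needed there.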

Page 29

Final Project

● Full steering and visualization to liquid crystal code
  – Including numerical methods
● Model coupling to determine bifurcation points
● Rollback/restart when the wrong path is chosen.
● Steering through bifurcation points
● Distributed rendering

Page 30

And …

● Improve some STV package
  – Make it more bifurcation friendly
    ● Specialized checkpointing/rollback features
    ● Other necessary improvements
  – Add other needed features
    ● The ability to document extracted data better
● Create an STV client toolkit
  – New steering clients
  – New visualization clients
● Whatever else I am told to do!

Page 31

The End

Questions?

Thanks!