Dan Bennett
Oct 26, 2005
Steering and Visualization of Batch-Style Distributed Computations
(or: What I Did on My Summer Vacation)
Overview
● Review of Parallel and Distributed Computing
● STV middleware
● Applications of STV middleware
  – Steering and visualization of an MD simulation
  – Model coupling: weather simulation
  – Checkpointing: liquid crystal simulation
● My ultimate project
● Conclusions
Types of Parallel Computers
● Shared memory multiprocessor
● Cluster
● MPP
● Vector processors
● Hybrid
● Supercomputer: anything on the Top 500 list
Rachel and Jonas at PSC: SMPs with 64 processors
Cray XT3 MPP at PSC: 2068 processors, custom interconnect, #33 on the Top 500 (July 05)
Lemieux at PSC: a cluster of 750 4-processor nodes, #68 on the Top 500
Retired T3E at PSC
What I Work With
● Clusters, possibly of SMPs
● Message passing (MPI, PVM)
● SPMD: Single Program, Multiple Data
● Braveheart
● Interprocess communication dominates the computation
● CUMULVS
Braveheart, MCS Cluster
Supercomputer Centers
● Most large machines require batch submission
  – Allows for priority scheduling
  – And maximum utilization of resources
● PSC allows interactive jobs, but they must be submitted through the batch queue
● OSC states: “The [fill in computer name here] has fixed usage limits for any interactive execution”
Parallel Simulations in Batch
● This has the effect of removing the scientist from “the loop”
  – Set up parameters
  – Submit job
  – Post-process output
● The computation becomes a black box
  – No way to observe what is happening inside
Distributed Programs/Data
● Collecting data to a central location takes time
● You need to collect it for visualization/inspection of the computation
● You don't always want to collect it for processing
In a weather simulation, data at processor borders (indicated in yellow) needs to be shared with the neighbors.
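Sharing border data with neighbors is commonly called a halo (ghost-cell) exchange. Below is a minimal sketch, assuming a periodic 1D field split evenly across processes. Message passing is simulated with Python lists rather than MPI, and `decompose`, `halo_exchange`, and `smooth` are hypothetical helper names. Each block borrows one cell from each neighbor, which lets a 3-point stencil match the serial result.

```python
def decompose(data, nprocs):
    """Split a global 1D array into nprocs contiguous blocks (assumes an even split)."""
    n = len(data) // nprocs
    return [data[p * n:(p + 1) * n] for p in range(nprocs)]


def halo_exchange(blocks):
    """Pad each block with one ghost cell from each neighbor (periodic boundaries)."""
    nprocs = len(blocks)
    padded = []
    for p, block in enumerate(blocks):
        left = blocks[(p - 1) % nprocs][-1]   # last cell owned by the left neighbor
        right = blocks[(p + 1) % nprocs][0]   # first cell owned by the right neighbor
        padded.append([left] + block + [right])
    return padded


def smooth(padded):
    """3-point average over the owned cells; border cells need the ghost values."""
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]


if __name__ == "__main__":
    field = [float(i) for i in range(8)]
    local_results = [smooth(b) for b in halo_exchange(decompose(field, 4))]
    print(local_results)
```

Only the border cells travel between processes. The bulk of each block stays local, which is why data is not collected centrally for this kind of processing.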
Steering and Visualization Software
● Extracts data from a distributed computation
● Minimally invasive (few lines of added code)
● Minimal impact on performance
● CUMULVS and others
● CUMULVS is from ORNL
● It is part of the ACTS toolkit
● It is part of an ongoing research project
● It runs under PVM, but works with MPI
CUMULVS
● Extracts the data from the distributed computation
● Synchronizes this data
● Delivers this data to a visualization client
● Receives data from a steering client
● Delivers this data to the distributed computation
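The extract, synchronize, deliver, and steer cycle above can be illustrated in a few lines. This is a hypothetical sketch of the data flow only. `extract_local`, `assemble_for_viewer`, and `apply_steering` are invented names, not the CUMULVS API, and processes are simulated as dictionaries holding a global offset, a local block, and parameters.

```python
def extract_local(proc, lo, hi):
    """Return {global_index: value} for the part of this process's block
    that lies inside the viewer's requested range [lo, hi)."""
    offset = proc["offset"]
    return {offset + i: v for i, v in enumerate(proc["field"])
            if lo <= offset + i < hi}


def assemble_for_viewer(procs, lo, hi):
    """Synchronize: merge every process's piece into one global frame."""
    frame = {}
    for proc in procs:
        frame.update(extract_local(proc, lo, hi))
    return [frame[i] for i in range(lo, hi)]


def apply_steering(procs, name, value):
    """Deliver a steering client's new parameter value to every process."""
    for proc in procs:
        proc["params"][name] = value


# Two simulated processes, each owning three cells of a 6-cell global field.
procs = [
    {"offset": 0, "field": [1, 2, 3], "params": {"temp": 300.0}},
    {"offset": 3, "field": [4, 5, 6], "params": {"temp": 300.0}},
]
print(assemble_for_viewer(procs, 2, 5))   # a viewer asking for cells 2..4
apply_steering(procs, "temp", 310.0)
```

The viewer's request spans both processes, so each contributes only its overlapping cells.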
Case 1, Steering and Visualization of ALCMD
● ALCMD: Ames Lab Classical Molecular Dynamics simulation
  – MPI-based FORTRAN code, ~6K lines
  – Spatial decomposition of data, irregularly distributed
An MD computation spread over 8 nodes. The data from node four is shown in the exploded view.
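Spatial decomposition assigns each atom to the processor that owns its region of space. Because atoms are not spread uniformly, processors end up holding different numbers of atoms. Below is a minimal sketch, assuming a cubic box cut into equal slabs along x; ALCMD's actual decomposition may differ, and `assign_atoms` is a hypothetical name.

```python
import random


def assign_atoms(positions, box_length, nprocs):
    """Map each atom to the processor whose x-slab contains it.
    Returns one list of atom indices per processor."""
    owners = [[] for _ in range(nprocs)]
    slab = box_length / nprocs
    for idx, (x, _y, _z) in enumerate(positions):
        p = min(int(x // slab), nprocs - 1)   # clamp atoms lying exactly on the far wall
        owners[p].append(idx)
    return owners


random.seed(0)
L = 10.0
atoms = [(random.uniform(0, L), random.uniform(0, L), random.uniform(0, L))
         for _ in range(100)]
owners = assign_atoms(atoms, L, 8)
print([len(o) for o in owners])   # per-processor counts generally differ
```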
Case 1, Instrument the Code
● Insert commands to
  – Initialize the STV environment
      call stvfinit(FSIMPLGRP, TAG, nprocs, myproc, info)
  – Define parameters
    ● Class 1: global scalar steering parameters (e.g., temperature)
      call stvfparamdefine(temp,'Temp',STVDOUBLE,STVVISONLY,parID(2))
    ● Class 2: distributed data (e.g., the ID of each molecule)
      call stvfpfielddefine('id',STVINT,1,getid,1,0,0,STVVISONLY,ipd)
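The instrumentation pattern above has three parts: initialize once, register scalar parameters and distributed fields by name, and let the library read them. It can be mimicked with a small registry. This is a hypothetical Python analogue for illustration only. `SteeringRegistry` and its methods are invented here and do not reproduce CUMULVS's calls or semantics. The accessor function plays a role similar to `getid` in the call above.

```python
class SteeringRegistry:
    """Hypothetical stand-in for an STV library's bookkeeping on one process."""

    def __init__(self, nprocs, myproc):
        self.nprocs = nprocs
        self.myproc = myproc
        self.params = {}   # name -> [current value, steerable?]
        self.fields = {}   # name -> accessor returning this process's local data

    def define_param(self, name, value, steerable=False):
        self.params[name] = [value, steerable]

    def define_field(self, name, accessor):
        self.fields[name] = accessor

    def steer(self, name, value):
        """Apply a steering client's change; view-only parameters are refused."""
        if not self.params[name][1]:
            raise PermissionError(f"{name} is view-only")
        self.params[name][0] = value

    def snapshot(self):
        """What a viewer would receive from this process after a time step."""
        return {
            "params": {k: v[0] for k, v in self.params.items()},
            "fields": {k: f() for k, f in self.fields.items()},
        }


ids = [101, 102, 103]                      # this process's molecule IDs (made up)
stv = SteeringRegistry(nprocs=6, myproc=0)
stv.define_param("Temp", 300.0)            # visualization-only parameter
stv.define_field("id", lambda: list(ids))  # accessor for distributed data
print(stv.snapshot())
```

Registering an accessor rather than a copy means the viewer always sees the current local data without the simulation code doing extra work each step.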
Results
Conditions
● 16K atoms
● 4K time steps
● 6 processors
4 Runs
Run   Configuration                    MFLOPS   Time
I     No instrumentation               2448     100
II    Instrumentation, no extraction   2453     100
III   Extract a scalar value           2043     120
IV    Extract 3 scalar values          2037     120
A screen shot of ALCMD running on 8 nodes, along with visualizations of extracted data.
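The table's numbers can be turned into relative overheads against the uninstrumented run I. Instrumentation alone (II) changes almost nothing. Extracting data (III, IV) adds 20% to the run time and lowers MFLOPS by roughly 16.5 to 16.8%. The time unit is not stated, so only ratios are computed. A short sketch of that arithmetic:

```python
# Measured values from the table: (MFLOPS, time); the time unit is not given.
RUNS = {
    "I": (2448, 100),    # no instrumentation
    "II": (2453, 100),   # instrumentation, no extraction
    "III": (2043, 120),  # extract a scalar value
    "IV": (2037, 120),   # extract 3 scalar values
}


def change(run, baseline="I"):
    """Percent change in run time and in MFLOPS relative to the baseline run."""
    mf0, t0 = RUNS[baseline]
    mf, t = RUNS[run]
    return 100 * (t - t0) / t0, 100 * (mf - mf0) / mf0


for name in RUNS:
    dt, dmf = change(name)
    print(f"Run {name}: time {dt:+.1f}%, MFLOPS {dmf:+.1f}%")
```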
Further Results
● 6413 total lines of code
● 200 lines added
● 3% increase
● Found problems with CUMULVS
  – Poor documentation
  – Missing commands
  – Missing functionality
● But the package is usable for our purposes
Case 1, Work to Do
● Re-do steering of a scalar variable
  – This was working once, but never on a cluster
● Produce a number of steering clients
  – Scalar change widget
  – Delta widget
  – Molecule browser
● Improve visualization
● Add checkpointing
● Extract additional data
● Work on STV toolkit
Why Did I Do This?
● To gain experience with CUMULVS
  – Look for areas of improvement
  – Understand how CUMULVS works
● Basis for future experimentation
● I plan to experiment with other STV packages; this is a “control” package
Case 2, Vorticity Budget of an MCV
● One opportunity is to work with Jim Kirk
● Jim studies Mesoscale Convective Vortices (MCVs)
● He wishes to perform vorticity budgets and thermodynamic budgets
● Data comes from MM5, which can run as a distributed computation
Case 2, MM5
● MM5 produces huge amounts of data each time step (90 seconds of simulated time)
● Normally this is saved only every half hour of simulated time (at most)
● We don't know when to save more often (i.e., when the MCV starts)
● Solution: use CUMULVS to extract the data, detect the MCV, and save data
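The proposed policy saves on the routine half-hour schedule, which is every 20 steps of 90 s, and saves every step while an MCV is present. It can be sketched as below. The detector is a toy placeholder: `detect_mcv`, its threshold, and the vorticity history are all invented for illustration. A real detector would be the "intelligent viewer" described on the next slide.

```python
TIMESTEP_S = 90          # simulated seconds per MM5 time step (from the slide)
ROUTINE_EVERY_S = 1800   # routine output interval: half an hour of simulated time


def should_save(step, mcv_detected):
    """Save every step while an MCV is detected, otherwise only on the
    routine half-hour boundary."""
    if mcv_detected:
        return True
    return (step * TIMESTEP_S) % ROUTINE_EVERY_S == 0


def detect_mcv(vorticity, threshold=1e-3):
    """Toy detector (hypothetical): peak vorticity above a made-up threshold."""
    return max(vorticity) > threshold


# Invented peak-vorticity history: quiet, then an "MCV" for 5 steps, then quiet.
history = [[1e-4]] * 30 + [[2e-3]] * 5 + [[1e-4]] * 10
saved = [step for step, vort in enumerate(history)
         if should_save(step, detect_mcv(vort))]
print(saved)
```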
Case 2, Predicted Work
● This will involve:
  – Intelligent viewers to detect the MCV
  – Intelligent viewers to perform the various budgets
● Detecting an MCV may be a parallel computation as well
● This is model coupling
Model Coupling
● Build a viewer that is itself a parallel computation
● Each node only extracts the data that it needs
● This is a natural extension of STV software
● And is an active area of research
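When both the simulation and the viewer are parallel, each viewer node must work out which simulation nodes hold the indices it needs. This is often called the M×N redistribution problem. Below is a minimal sketch, assuming a 1D block decomposition on both sides; real couplers also handle multidimensional and irregular layouts. `block_ranges` and `transfer_schedule` are hypothetical names.

```python
def block_ranges(n, parts):
    """Split indices [0, n) into `parts` contiguous blocks, spreading any remainder."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def transfer_schedule(n, m_producers, n_consumers):
    """List (producer, consumer, lo, hi) for every pair whose blocks overlap;
    the producer sends indices [lo, hi). Non-overlapping pairs exchange nothing."""
    schedule = []
    for i, (plo, phi) in enumerate(block_ranges(n, m_producers)):
        for j, (clo, chi) in enumerate(block_ranges(n, n_consumers)):
            lo, hi = max(plo, clo), min(phi, chi)
            if lo < hi:
                schedule.append((i, j, lo, hi))
    return schedule


print(transfer_schedule(12, 4, 3))   # 4 simulation nodes feeding 3 viewer nodes
```

Each viewer node receives messages only from the few producers that overlap its block, rather than from every simulation node.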
Checkpointing
● Save a copy of a process so that, if it fails, it can be restarted
● The state includes registers and memory
● To do this, save the state of the computation while it executes:
  – Save local states
  – Save messages in communication channels
  – We must be able to recreate a state of the computation
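Saving messages in channels matters because a message that has been sent but not yet received belongs to no process's local state. The sketch below is a toy message-passing system simulated in one Python process: tokens move between processes over FIFO channels, and their total is conserved, so a checkpoint that drops in-flight messages is easy to spot. It takes a stop-the-world checkpoint. A real distributed system cannot pause everything at once, which is the problem coordinated-snapshot algorithms such as Chandy-Lamport address. `TokenSystem` is a hypothetical name.

```python
import copy
from collections import deque


class TokenSystem:
    """Processes holding tokens and exchanging them over FIFO channels."""

    def __init__(self, tokens):
        self.local = list(tokens)                  # local state of each process
        n = len(tokens)
        self.channels = {(i, j): deque()           # messages in flight from i to j
                         for i in range(n) for j in range(n) if i != j}

    def send(self, src, dst, amount):
        self.local[src] -= amount
        self.channels[(src, dst)].append(amount)

    def deliver(self, src, dst):
        self.local[dst] += self.channels[(src, dst)].popleft()

    def in_flight(self):
        return sum(sum(q) for q in self.channels.values())

    def checkpoint(self):
        """Save local states AND channel contents (taken while nothing runs)."""
        return copy.deepcopy((self.local, self.channels))

    def restore(self, saved):
        self.local, self.channels = copy.deepcopy(saved)


world = TokenSystem([10, 10])
world.send(0, 1, 3)            # 3 tokens are now in the channel 0 -> 1
saved = world.checkpoint()
print(sum(saved[0]))           # local states alone account for only 17 of 20 tokens
world.deliver(0, 1)
world.restore(saved)           # roll back: the message is in flight again
world.deliver(0, 1)
print(world.local)
```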
Checkpointing and STV Software
● Checkpointing is tough because
  – You must describe what values are to be saved
  – You must collect the save file to a central location
● STV packages already allow you to do this for visualization
● This is a natural extension to these packages
  – They need software to manage checkpoint files
  – They need software to restart a computation
Case 3, Liquid Crystal Modelling
● Liquid crystalline material behaves both like a liquid and a crystal
  – Molecules are free to conform to the container
  – But have orientational and some positional order
Visualization of a liquid crystal simulation
Relative energy is indicated by color
The eigenvalues of the Q tensor determine shape
The eigenvectors of the Q tensor determine orientation
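The shape/orientation split can be checked on the simplest, uniaxial case: Q = S(n nᵀ - I/3), with scalar order parameter S and unit director n. This Q is symmetric and traceless. Its eigenvalues are 2S/3 (along n) and -S/3 (twice, perpendicular to n), and they encode the degree and shape of ordering. The eigenvector n gives the orientation. This uniaxial form is a standard special case, assumed here for illustration; the simulation's Q may be biaxial. `q_tensor` and `matvec` are hypothetical helpers.

```python
import math


def q_tensor(S, n):
    """Uniaxial Q = S (n n^T - I/3) for a unit director n and order parameter S."""
    return [[S * (n[i] * n[j] - (1.0 if i == j else 0.0) / 3.0) for j in range(3)]
            for i in range(3)]


def matvec(Q, v):
    return [sum(Q[i][j] * v[j] for j in range(3)) for i in range(3)]


S = 0.6
r = 1 / math.sqrt(3)
n = [r, r, r]                                    # director along (1, 1, 1)
Q = q_tensor(S, n)
print(sum(Q[i][i] for i in range(3)))            # trace: zero up to rounding
print(matvec(Q, n))                              # equals (2S/3) * n
v = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]   # a direction perpendicular to n
print(matvec(Q, v))                              # equals (-S/3) * v
```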
The Model
● Represented on a regular 3D grid by a 3x3 symmetric traceless tensor Q
● The desire is to minimize the free energy F(Q,τ)
  – τ is a parameter equivalent to temperature
Cycle: set τ → find equilibrium → calculate new τ
● The model encounters bifurcation points during this procedure
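The "set τ, find equilibrium, calculate new τ" cycle is a parameter-continuation loop. Each equilibrium is found by Newton's method, warm-started from the previous one. The sketch below uses a toy scalar Landau-style free energy, F(S) = τS²/2 + S⁴/4 - hS, in place of the liquid-crystal F(Q,τ). The functions and the small field h are assumptions for illustration. The toy has a pitchfork bifurcation at τ = 0. With h > 0 the sweep follows the ordered branch S > 0. With h = 0 it silently stays on S = 0, which stops being a minimum once τ < 0. That is a miniature of the path-selection trouble described on the next slides.

```python
def dF(S, tau, h):
    """Derivative of the toy free energy F(S) = tau*S**2/2 + S**4/4 - h*S."""
    return tau * S + S**3 - h


def d2F(S, tau):
    return tau + 3 * S**2


def find_equilibrium(S, tau, h, tol=1e-12, max_iter=100):
    """Newton's method on dF = 0, warm-started from the previous equilibrium."""
    for _ in range(max_iter):
        g = dF(S, tau, h)
        if abs(g) < tol:          # checked first, so a zero gradient never divides by d2F
            return S
        S -= g / d2F(S, tau)
    raise RuntimeError(f"Newton did not converge at tau={tau}")


def sweep(h, S=0.0, steps=21):
    """Set tau, find equilibrium, calculate new tau: lower tau from 1 to -1."""
    path = []
    for k in range(steps):
        tau = 1.0 - 0.1 * k
        S = find_equilibrium(S, tau, h)
        path.append((tau, S))
    return path


print(sweep(h=0.01)[-1])   # ordered branch, S close to 1 at tau = -1
print(sweep(h=0.0)[-1])    # stuck on S = 0, no longer a minimum for tau < 0
```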
Bifurcation
As I Don't Know It!
● Some systems of equations can experience a dramatic change in behavior from the slightest change of a parameter
● The following represent the behavior of a continuously stirred tank reactor
Small Changes in Lambda Lead to Large Changes in the Behavior of the Model
● I used Runge-Kutta to solve the previous equations, varying lambda as shown
[Figure: solutions for Lambda = 0.128225884245 and Lambda = 0.128225884244]
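The continuously stirred tank reactor equations are not reproduced in this copy of the slides. The sketch below therefore pairs the classical fourth-order Runge-Kutta method with a placeholder right-hand side, `placeholder_rhs` (hypothetical, y' = λy, not the reactor model), that takes lambda as a parameter. Swapping in the reactor's equations would reproduce a lambda sweep like the one plotted. The placeholder has a known exact solution, which makes the integrator easy to check.

```python
import math


def rk4(f, y0, t0, t1, n_steps, lam):
    """Classical 4th-order Runge-Kutta for y' = f(t, y, lam), with y a list of floats."""
    h = (t1 - t0) / n_steps
    t, y = t0, list(y0)
    for _ in range(n_steps):
        k1 = f(t, y, lam)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)], lam)
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)], lam)
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)], lam)
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def placeholder_rhs(t, y, lam):
    """Stand-in system (NOT the CSTR model): y' = lam * y."""
    return [lam * y[0]]


lam = -1.0
approx = rk4(placeholder_rhs, [1.0], 0.0, 1.0, 100, lam)[0]
print(approx, math.exp(lam))   # numerical vs. exact solution at t = 1
```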
Bifurcation Points
● Occur when the Jacobian becomes singular
● Can be detected, but it seems a real bear to do so
  – Probably involves some human intervention (for now)
  – Read Computational Steering
● Can apparently cause multiple “paths” that can be followed, which can lead to physically unrealistic situations in simulations
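A common way to detect a singular Jacobian along a solution branch is to watch for a sign change in det(J) as the parameter varies, then bisect. The sketch below applies this to an invented 2D toy system, F(x, y) = ((λ - 0.5)x - x³, -y), whose trivial branch (0, 0) loses stability at λ = 0.5. The system, its analytic Jacobian, and `locate_bifurcation` are assumptions for illustration. A sign change flags a singular point, which may be a fold or a branch point; telling them apart takes more work. At the model's scale (more than 3×10^5 unknowns), forming a determinant directly is impractical.

```python
def jacobian(x, y, lam):
    """Analytic Jacobian of the toy system F(x, y) = ((lam - 0.5)*x - x**3, -y)."""
    return [[(lam - 0.5) - 3 * x**2, 0.0],
            [0.0, -1.0]]


def det2(J):
    return J[0][0] * J[1][1] - J[0][1] * J[1][0]


def locate_bifurcation(lam_lo, lam_hi, branch, tol=1e-10):
    """Bisect on det(J) along a known solution branch (x, y) = branch(lam).
    Requires det(J) to change sign between lam_lo and lam_hi."""
    d_lo = det2(jacobian(*branch(lam_lo), lam_lo))
    if d_lo * det2(jacobian(*branch(lam_hi), lam_hi)) > 0:
        raise ValueError("no sign change in det(J) on this interval")
    while lam_hi - lam_lo > tol:
        mid = 0.5 * (lam_lo + lam_hi)
        d_mid = det2(jacobian(*branch(mid), mid))
        if d_lo * d_mid <= 0:
            lam_hi = mid
        else:
            lam_lo, d_lo = mid, d_mid
    return 0.5 * (lam_lo + lam_hi)


def trivial_branch(lam):
    """(0, 0) solves F = 0 for every lam."""
    return (0.0, 0.0)


print(locate_bifurcation(0.0, 1.0, trivial_branch))   # close to 0.5
```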
Bifurcation and Liquid Crystals
● It happens, and we don't like it!
● The solution to date:
  – Pick some path out of the bifurcation point and follow it
● This leads to “useful” but not necessarily correct simulations
Finding Bifurcation Points in the Model
● The model yields a system of n nonlinear equations (n > 3×10^5)
● Solve this using Newton's method
● Things get really touchy near the bifurcation point
  – May need to switch the numeric methods used
● After the paths out of the bifurcation point are found, there is no way to predict the correct path without taking a few steps down the path
● This leads to the need for checkpointing/rollback
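A one-dimensional illustration of "touchy": at a simple root, Newton's method converges quadratically. Where the derivative (the 1×1 Jacobian) vanishes at the root, as happens at a bifurcation point, convergence degrades to linear. The functions x² - 1 and x² are stand-ins chosen for illustration, not the liquid crystal equations. The iteration stops when the Newton step falls below a tolerance.

```python
def newton(f, df, x, tol=1e-12, max_iter=200):
    """Scalar Newton's method; returns (root, iterations used).
    Stops when the Newton step is smaller than tol."""
    for k in range(1, max_iter + 1):
        dx = f(x) / df(x)
        x -= dx
        if abs(dx) < tol:
            return x, k
    raise RuntimeError("Newton's method did not converge")


# Simple root: f'(1) = 2 is nonzero, so convergence is quadratic.
_, k_simple = newton(lambda x: x * x - 1.0, lambda x: 2.0 * x, 2.0)
# Double root: f'(0) = 0 (singular Jacobian); each step only halves x.
_, k_double = newton(lambda x: x * x, lambda x: 2.0 * x, 1.0)
print(k_simple, k_double)
```

The singular case needs several times as many iterations. This is one reason a different method may be needed near the bifurcation point.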
Final Project
● Full steering and visualization of the liquid crystal code
  – Including numerical methods
● Model coupling to determine bifurcation points
● Rollback/restart when the wrong path is chosen
● Steering through bifurcation points
● Distributed rendering
And …
● Improve some STV package
  – Make it more bifurcation-friendly
    ● Specialized checkpointing/rollback features
    ● Other necessary improvements
  – Add other needed features
    ● The ability to document extracted data better
● Create an STV client toolkit
  – New steering clients
  – New visualization clients
● Whatever else I am told to do!
The End
Questions?
Thanks!