Parallel Simulation of Continuous Systems: A Brief Introduction
Oct. 19, 2005
CS6236 Lecture
Background
Sample applications of continuous systems
- Civil engineering: building construction
- Aerospace engineering: aircraft design
- Mechanical engineering: machining
- Systems biology: heart simulations
- Computer engineering: semiconductor simulations
Computer simulations
- Discrete models
- Continuous models
Outline
Mathematical models and methods
Parallel algorithm methodology
Some active research areas
Mathematical Models
Ordinary/partial differential equations
- Laplace equation: ∇²u = 0
- Heat (diffusion) equation: ∂u/∂t = ∇²u
Steady-state vs. time-dependent problems
Convert into a discrete problem through numerical discretization
- Finite difference methods: structured grids
- Finite element methods: local basis functions
- Spectral methods: global basis functions
- Finite volume methods: conservation laws
Example: 1-D Laplace Equation
Laplace equation in one dimension:
  u''(x) = 0,  0 < x < 1
with boundary conditions u(0) = a, u(1) = b
Finite difference approximation on a uniform grid x_i = i/(n+1), i = 1, …, n:
  y_{i-1} - 2 y_i + y_{i+1} = 0
with Jacobi iteration
  y_i^(k+1) = ( y_{i-1}^(k) + y_{i+1}^(k) ) / 2
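The Jacobi iteration above can be sketched in a few lines of serial code. This is a minimal illustration (not from the slides); the function name and the choice of a unit interval are assumptions for the example.

```python
# Serial sketch of Jacobi iteration for the 1-D Laplace equation
# u'' = 0 on (0, 1) with boundary values a and b.  The exact solution
# is the straight line between the two boundary values, so the
# iterates should converge toward it.

def jacobi_1d(a, b, n, iters):
    """Jacobi iteration on n interior points with boundary values a, b."""
    y = [a] + [0.0] * n + [b]      # y[0] and y[n+1] hold the boundary
    for _ in range(iters):
        z = y[:]                   # next iterate, built from the old one
        for i in range(1, n + 1):
            z[i] = 0.5 * (y[i - 1] + y[i + 1])
        y = z
    return y

y = jacobi_1d(0.0, 1.0, 7, 2000)
# Interior values approach the exact solution u(x) = x.
```

Note that each update of y_i needs only its two neighbors, which is exactly the locality the parallel versions later in the lecture exploit.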
Example: 2-D Laplace Equation
Laplace equation in two dimensions:
  ∂²u/∂x² + ∂²u/∂y² = 0
with boundary conditions given on all four sides of the domain
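In two dimensions the analogous Jacobi update replaces each interior point by the average of its four neighbors (the 5-point stencil). A minimal serial sketch, with the square grid and boundary data chosen only for illustration:

```python
# Sketch of Jacobi iteration for the 2-D Laplace equation on a square
# grid, using the 5-point stencil: each interior point becomes the
# average of its four neighbors.  Boundary rows/columns are held fixed.

def jacobi_2d(u, iters):
    """u is a list of rows; row 0, row -1, column 0, column -1 are boundary."""
    n = len(u)
    for _ in range(iters):
        v = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                v[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                  + u[i][j-1] + u[i][j+1])
        u = v
    return u

# Example: top edge held at 1, the other three edges at 0.
n = 8
u0 = [[1.0 if i == 0 else 0.0 for _ in range(n)] for i in range(n)]
u = jacobi_2d(u0, 500)
```

By the maximum principle the converged interior values lie strictly between the boundary extremes, and the symmetric boundary data gives a solution symmetric about the vertical midline.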
Parallel Programming Model
Parallel computation: two or more tasks executing concurrently
Task encapsulates sequential program and local memory
Tasks can be mapped to processors in various ways, including multiple tasks per processor
Performance Considerations
- Load balance: work divided evenly
- Concurrency: work done simultaneously
- Overhead: work not present in serial computation
  - Communication
  - Synchronization
  - Redundant work
  - Speculative work
Example: 1-D Laplace Equation
Define n tasks, one for each y_i
Program for task i, i = 1, …, n

  initialize y_i
  for k = 1, …
    if i > 1, send y_i to task i-1
    if i < n, send y_i to task i+1
    if i < n, recv y_{i+1} from task i+1
    if i > 1, recv y_{i-1} from task i-1
    y_i = ( y_{i-1} + y_{i+1} ) / 2
  end
Design Methodology
Partition (Decomposition): decompose problem into fine-grained tasks to maximize potential parallelism
Communication: determine communication pattern among tasks
Agglomeration: combine into coarser-grained tasks, if necessary, to reduce communication requirements or other costs
Mapping: assign tasks to processors, subject to tradeoff between communication cost and concurrency
Types of Partitioning
- Domain decomposition: partition data
  Example: grid points in 1-, 2-, or 3-D mesh
- Functional decomposition: partition computation
  Example: components in climate model (atmosphere, ocean, land, etc.)
Example: Domain Decomposition
3-D mesh can be partitioned along any combination of one, two, or all three of its dimensions
Partitioning Checklist
- Identify at least an order of magnitude more tasks than processors in the target parallel system
- Avoid redundant computation or storage
- Make tasks reasonably uniform in size
- Number of tasks, rather than size of each task, should grow as problem size increases
Communication Issues
- Latency and bandwidth
- Routing and switching
- Contention, flow control, and aggregate bandwidth
- Collective communication
  - One-to-many: broadcast, scatter
  - Many-to-one: gather, reduction, scan
  - All-to-all
  - Barrier
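The many-to-one collectives are typically implemented as a tree so that p values combine in ⌈log₂ p⌉ rounds rather than p−1. A serial emulation of that pattern (an illustrative sketch, not a real message-passing code — in practice each "task" would be a separate process):

```python
# Sketch of a tree-based reduction, the pattern behind many-to-one
# collectives: in each round, task i (a multiple of 2*stride) absorbs
# the value held by task i + stride, halving the number of active
# tasks, so p values combine in ceil(log2 p) rounds.

def tree_reduce(values, op):
    vals = list(values)
    p = len(vals)
    stride = 1
    while stride < p:
        for i in range(0, p - stride, 2 * stride):
            vals[i] = op(vals[i], vals[i + stride])
        stride *= 2
    return vals[0]           # the root (task 0) holds the result

total = tree_reduce([1, 2, 3, 4, 5], lambda a, b: a + b)
```

The same tree, traversed in the opposite direction, gives a broadcast; `op` can be any associative operation (sum, max, min, etc.).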
Communication Checklist
Communication should be
- Reasonably uniform across tasks in frequency and volume
- As localized as possible
- Concurrent
- Overlapped with computation, if possible
- Not inhibiting concurrent execution of tasks
Agglomeration
Communication is proportional to surface area of subdomain, whereas computation is proportional to volume of subdomain
Higher-dimensional decompositions have more favorable communication-to-computation ratio
Increasing task sizes reduces communication but also reduces potential concurrency and flexibility
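The surface-to-volume argument can be made concrete with a back-of-the-envelope calculation. The grid size and task count below are arbitrary example values:

```python
# Communication volume per task for an n-by-n grid split among p
# tasks: a 1-D (strip) decomposition exchanges two full edges of n
# points, while a 2-D (block) decomposition exchanges four edges of
# n/sqrt(p) points.  Computation per task (n*n/p points) is the same
# either way, so the block decomposition has the better
# communication-to-computation ratio.
import math

def comm_per_task_1d(n, p):
    return 2 * n                  # two strip edges of n points each

def comm_per_task_2d(n, p):
    side = n / math.sqrt(p)       # block side length (assumes p is a square)
    return 4 * side               # four block edges

n, p = 1024, 64
# Strip: 2*1024 = 2048 points exchanged per task;
# block: 4*(1024/8) = 512 points exchanged per task.
```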
Surface-to-Volume Ratio
Example: Agglomeration
Define p tasks, each owning n/p of the y_i's
Program for task j, j = 1, …, p

  initialize y_l, …, y_h
  for k = 1, …
    if j > 1, send y_l to task j-1
    if j < p, send y_h to task j+1
    if j < p, recv y_{h+1} from task j+1
    if j > 1, recv y_{l-1} from task j-1
    for i = l to h
      z_i = ( y_{i-1} + y_{i+1} ) / 2
    end
    y = z
  end
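The agglomerated program can be emulated serially: each task's block is an array with two extra "halo" cells, and the send/recv pairs in the pseudocode become copies of boundary values between neighboring blocks. A sketch under those assumptions (names and the evenly-divisible block size are choices for the example):

```python
# Serial emulation of the agglomerated program: p tasks each own a
# contiguous block of m = n/p points plus two halo cells.  Each
# iteration first performs the halo exchange (the send/recv pairs in
# the pseudocode), then each block updates its own points.

def jacobi_blocked(a, b, n, p, iters):
    assert n % p == 0
    m = n // p
    # blocks[j] = [left halo, m owned points, right halo]
    blocks = [[0.0] * (m + 2) for _ in range(p)]
    for _ in range(iters):
        # halo exchange: copy each neighbor's edge value (or boundary)
        for j in range(p):
            blocks[j][0] = blocks[j - 1][m] if j > 0 else a
            blocks[j][m + 1] = blocks[j + 1][1] if j < p - 1 else b
        # local update: the inner i-loop of the pseudocode
        for j in range(p):
            z = blocks[j][:]
            for i in range(1, m + 1):
                z[i] = 0.5 * (blocks[j][i - 1] + blocks[j][i + 1])
            blocks[j] = z
    # gather the owned points back into one array
    return [x for blk in blocks for x in blk[1:m + 1]]

y = jacobi_blocked(0.0, 1.0, 8, 4, 2000)
# Converges to the same straight line as the one-point-per-task version.
```

Per iteration each task now exchanges only two values regardless of m, while doing m updates — the surface-to-volume payoff of agglomeration.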
Example: Overlap Comm/Comp
Program for task j, j = 1, …, p

  initialize y_l, …, y_h
  for k = 1, …
    if j > 1, send y_l to task j-1
    if j < p, send y_h to task j+1
    for i = l+1 to h-1
      z_i = ( y_{i-1} + y_{i+1} ) / 2
    end
    if j < p, recv y_{h+1} from task j+1
    z_h = ( y_{h-1} + y_{h+1} ) / 2
    if j > 1, recv y_{l-1} from task j-1
    z_l = ( y_{l-1} + y_{l+1} ) / 2
    y = z
  end
Mapping
Two basic strategies for assigning tasks to processors:
- Place tasks that can execute concurrently on different processors
- Place tasks that communicate frequently on the same processor
Problem: these two strategies often conflict
In general, finding an optimal solution to this tradeoff is NP-complete, so heuristics are used to find a reasonable compromise
Dynamic vs. static strategies
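One classic heuristic for the load-balance side of this tradeoff is longest-processing-time (LPT) list scheduling. The sketch below is illustrative only: it ignores communication cost entirely, which a real mapper would also have to weigh.

```python
# Sketch of a simple mapping heuristic: longest-processing-time (LPT)
# list scheduling.  Tasks are assigned largest-first, each going to
# the currently least-loaded processor (tracked with a min-heap).
# This balances load well but ignores communication locality.
import heapq

def lpt_map(task_costs, p):
    loads = [(0.0, proc) for proc in range(p)]   # (load, processor id)
    heapq.heapify(loads)
    assignment = {}
    for task, cost in sorted(enumerate(task_costs),
                             key=lambda t: -t[1]):
        load, proc = heapq.heappop(loads)        # least-loaded processor
        assignment[task] = proc
        heapq.heappush(loads, (load + cost, proc))
    return assignment

mapping = lpt_map([7, 5, 4, 3, 2, 2, 1], 3)
```

LPT is not optimal (the mapping problem is NP-complete, as the slide notes), but its makespan is provably within 4/3 of the optimum.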
Mapping Issues
- Partitioning
- Granularity
- Mapping
- Scheduling
- Load balancing
Particularly challenging for irregular problems
Some software tools: Metis, Chaco, Zoltan, etc.
Example: Atmosphere Model
Partitioning grid points in a 3-D finite difference model typically yields 10^5 to 10^7 tasks
Communication
- 9-point stencil horizontally and 3-point stencil vertically
- Physics computations in vertical columns
- Global operations to compute total mass
Other Equations
- Heat (diffusion) equation: ∂u/∂t = ∂²u/∂x²
- Laplace equation: ∂²u/∂x² + ∂²u/∂y² = 0
- Advection equation: ∂u/∂t + c ∂u/∂x = 0
- Wave equation: ∂²u/∂t² = c² ∂²u/∂x²
Classification of second-order equations: parabolic, hyperbolic, and elliptic
Methods for time-dependent equations
- Explicit vs. implicit
- Finite-difference, finite-volume, and finite-element
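The explicit/implicit distinction can be illustrated on the 1-D heat equation. The sketch below is an assumption-laden example (unit diffusivity, zero boundaries, function names invented here): the explicit step is a purely local update but is only stable for Δt ≤ Δx²/2, while the implicit step is stable for any Δt at the cost of solving a tridiagonal system each step.

```python
# Explicit vs. implicit time stepping for the 1-D heat equation
# u_t = u_xx with fixed boundary values, where r = dt/dx**2.

def explicit_step(u, r):
    """One forward-Euler step: local, but stable only for r <= 1/2."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = u[i] + r * (u[i-1] - 2*u[i] + u[i+1])
    return v

def implicit_step(u, r):
    """One backward-Euler step: solves (I - r*L) x = u_interior with
    the Thomas (tridiagonal) algorithm; stable for any r."""
    n = len(u) - 2                      # number of interior unknowns
    a, b, c = -r, 1 + 2*r, -r           # constant tridiagonal coefficients
    cp = [0.0] * n                      # forward-sweep workspace
    dp = [0.0] * n
    cp[0] = c / b
    dp[0] = u[1] / b
    for i in range(1, n):
        m = b - a * cp[i-1]
        cp[i] = c / m
        dp[i] = (u[i+1] - a * dp[i-1]) / m
    x = [0.0] * n                       # back substitution
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i+1]
    return [u[0]] + x + [u[-1]]
```

For parallel codes this is a real tradeoff: the explicit update has the same nearest-neighbor communication pattern as the Jacobi examples above, while the implicit solve couples all unknowns in each step.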
CFL Condition for Stability
Necessary condition for stability, named after Courant, Friedrichs, and Lewy
Computational domain of dependence must contain the physical domain of dependence
For an explicit scheme applied to the advection equation, this implies the time step must satisfy |c| Δt / Δx ≤ 1
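The CFL condition is easy to observe numerically. A sketch (parameters chosen only for illustration) using the first-order upwind scheme for the advection equation:

```python
# Demonstrating the CFL condition for u_t + c u_x = 0 (c > 0) with
# the first-order upwind scheme:
#     u_i <- u_i - nu * (u_i - u_{i-1}),   nu = c*dt/dx
# For nu <= 1 the update is a convex combination of old values, so
# the solution stays bounded; for nu > 1 oscillations grow without
# bound.

def upwind(u, nu, steps):
    for _ in range(steps):
        v = u[:]
        for i in range(1, len(u)):
            v[i] = u[i] - nu * (u[i] - u[i-1])
        u = v
    return u

u0 = [0.0] * 20
u0[5] = 1.0                       # initial spike
stable   = upwind(u0, 0.9, 50)    # nu <= 1: values remain in [0, 1]
unstable = upwind(u0, 1.5, 50)    # nu > 1: amplitudes blow up
```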
Active Research Areas
Discrete-event simulation (DES) of continuous systems
Coupling of different physics
- Different mathematical models
- Continuous vs. discrete techniques
Load balancing
- Manager-worker model
- Irregular/unstructured problems
- Dynamic load balancing
Summary
Mathematical models for continuous systems
- Ordinary and partial differential equations
- Finite difference, finite volume, and finite element methods
Parallel algorithm design
- Partitioning
- Communication
- Agglomeration
- Mapping
Active research areas
References
I. T. Foster, Designing and Building Parallel Programs, Addison-Wesley, 1995
A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, 2nd ed., Addison-Wesley, 2003
M. J. Quinn, Parallel Computing: Theory and Practice, McGraw-Hill, 1994
K. M. Chandy and J. Misra, Parallel Program Design: A Foundation, Addison-Wesley, 1988