Parallelizing stencil computations

Based on slides from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS267



Parallelism in the Game of Life

• The activities in this system are discrete events
• The simulation is synchronous
  • use two copies of the grid (old and new)
  • the value of each cell in the new grid depends only on the 9 cells (itself plus its 8 neighbors) in the old grid (“stencil computation”)
  • each grid cell update is independent: reordering or parallelism is OK
  • the simulation proceeds in timesteps, where (logically) each cell is evaluated at every timestep

[Figure: old world and new world grids]
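
The two-grid update can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `life_step`, the standard Conway birth/survival rules, and the choice to treat cells outside the grid as dead are all assumptions made for the example.

```python
def life_step(old):
    """Return the next generation as a new grid; `old` is never modified."""
    rows, cols = len(old), len(old[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Sum the 3x3 stencil in the OLD grid (cells off the grid count as dead).
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += old[rr][cc]
            neighbors = total - old[r][c]
            # Assumed rules: survive with 2 or 3 neighbors, be born with exactly 3.
            if old[r][c] == 1:
                new[r][c] = 1 if neighbors in (2, 3) else 0
            else:
                new[r][c] = 1 if neighbors == 3 else 0
    return new


blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
next_gen = life_step(blinker)
```

Because every write goes to `new` and every read comes from `old`, the two inner loops can run in any order, which is what makes the update safe to parallelize.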


Parallelism in Life

• Parallelism is straightforward
  • ocean is a regular data structure
  • even decomposition across processors gives load balance
• Locality is achieved by using large patches of the world
  • boundary values from neighboring patches are needed
• Optimization: visit only occupied cells (and neighbors)
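
The last optimization can be illustrated by storing only the occupied cells. The sketch below is an assumption-laden example rather than the slides' implementation: it represents the world as a set of `(row, col)` pairs on an unbounded grid, uses the standard Conway rules, and the name `sparse_life_step` is hypothetical.

```python
from collections import Counter


def sparse_life_step(live):
    """Advance one generation; `live` is a set of (row, col) occupied cells."""
    counts = Counter()
    # Only occupied cells contribute, so only they and their neighbors are visited.
    for (r, c) in live:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0):
                    counts[(r + dr, c + dc)] += 1
    # Assumed rules: birth on exactly 3 neighbors, survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}


blinker = {(1, 0), (1, 1), (1, 2)}
after_one = sparse_life_step(blinker)
```

The work per step is proportional to the number of occupied cells instead of the grid area. The trade-off is that an even spatial decomposition no longer guarantees load balance when the occupied cells are clustered.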


Two-dimensional block decomposition

• If each processor owns n²/p elements to update, …
• … the amount of data communicated, n/√p per neighbor, is relatively small if n >> √p
• This is less than the n per neighbor needed for a block column decomposition
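
A small worked calculation makes the comparison concrete. It assumes an n × n grid, a ghost layer one cell wide, and p processors arranged as a √p × √p grid, so each block has side n/√p; the function name `per_neighbor_cells` is hypothetical.

```python
import math


def per_neighbor_cells(n, p):
    """Cells owned, and cells sent to ONE neighbor, for an n x n grid on p processors."""
    side = math.isqrt(p)
    if side * side != p:
        raise ValueError("this sketch assumes p is a perfect square")
    return {
        "owned": n * n // p,     # elements each processor updates
        "block_column": n,       # a column strip shares a full column of n cells
        "block_2d": n // side,   # a square block shares one edge of n / sqrt(p) cells
    }


example = per_neighbor_cells(1024, 16)
```

For n = 1024 and p = 16, each processor updates 65,536 cells but exchanges only 256 cells with each 2D neighbor, compared with 1,024 per neighbor for column strips.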


Redundant “Ghost” Nodes in Stencil Computations

• Size of ghost region (and redundant computation) depends on network/memory speed vs. computation

• Can be used on unstructured meshes

[Figure: to compute the green nodes, copy the yellow nodes and compute the blue nodes]
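
The trade-off between ghost-region size and redundant computation can be sketched in one dimension. Everything specific here is an assumption for illustration: a 3-point sum stencil, periodic (wrap-around) boundaries, equal-sized chunks, and the hypothetical names `serial_steps` and `ghosted_steps`. Each chunk copies `g` ghost cells per side once, then takes `g` steps with no further communication.

```python
def serial_steps(u, steps):
    """Reference: apply a periodic 3-point sum stencil `steps` times."""
    n = len(u)
    for _ in range(steps):
        u = [u[(i - 1) % n] + u[i] + u[(i + 1) % n] for i in range(n)]
    return u


def ghosted_steps(u, parts, g):
    """Split u into `parts` chunks, copy g ghost cells per side once, take g steps."""
    n = len(u)
    size = n // parts  # assumes parts divides n evenly
    result = []
    for k in range(parts):
        lo, hi = k * size, (k + 1) * size
        # Copy step: owned cells plus g ghost cells on each side.
        local = [u[i % n] for i in range(lo - g, hi + g)]
        # Compute step: each sweep also updates ghost cells redundantly and
        # loses the outermost layer, whose inputs are no longer available.
        for _ in range(g):
            local = [local[i - 1] + local[i] + local[i + 1]
                     for i in range(1, len(local) - 1)]
        result.extend(local)  # exactly the owned cells remain after g sweeps
    return result


u0 = [1, 2, 3, 4, 5, 6, 7, 8]
same = ghosted_steps(u0, 2, 3) == serial_steps(u0, 3)
```

In this sketch each chunk performs g(g − 1) redundant cell updates in exchange for communicating once every g steps instead of every step, which is why the best ghost width depends on how network and memory speed compare with computation speed.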


Comments on practical meshes

• Regular 1D, 2D, 3D meshes
  • Important as building blocks for more complicated meshes
• Practical meshes are often irregular
  • Composite meshes, consisting of multiple “bent” regular meshes joined at edges
  • Unstructured meshes, with arbitrary mesh points and connectivities
  • Adaptive meshes, which change resolution during the solution process to put computational effort where needed


Parallelism in Regular meshes

• Computing a stencil on a regular mesh
  • need to communicate mesh points near the boundary to neighboring processors
  • often done with ghost regions
• Surface-to-volume ratio keeps communication down, but
  • still may be problematic in practice

[Figure: implemented using “ghost” regions, which add memory overhead]
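
The memory overhead of ghost regions follows directly from the surface-to-volume ratio. The helper below is a hypothetical sketch that assumes each processor owns a cubic block of `block_side` cells per dimension, surrounded by a ghost layer of `width` cells on every face.

```python
def ghost_overhead(block_side, dims, width=1):
    """Extra storage for the ghost layer, as a fraction of the owned cells."""
    owned = block_side ** dims
    stored = (block_side + 2 * width) ** dims
    return (stored - owned) / owned


overhead_2d = ghost_overhead(100, 2)  # large 2D block
overhead_3d = ghost_overhead(10, 3)   # small 3D block
```

A 100 × 100 block pays about 4% extra storage, while a 10 × 10 × 10 block pays about 73%, which shows how the overhead can become problematic for small blocks in three dimensions.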


Irregular mesh: NASA Airfoil in 2D


Composite Mesh from a Mechanical Structure


Converting the Mesh to a Matrix


Adaptive Mesh Refinement (AMR)

• Adaptive mesh around an explosion
• Refinement done by calculating errors
• Parallelism
  • Mostly between “patches,” dealt to processors for load balance
  • May exploit some within a patch (SMP)
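
One way to deal patches for load balance is a greedy heuristic: hand the most expensive remaining patch to the least-loaded processor. The slides do not say which strategy is used, so this is only an assumed illustration; the function name `deal_patches` and the per-patch cost estimates are hypothetical.

```python
import heapq


def deal_patches(patch_costs, num_procs):
    """Greedily assign patches (largest first) to the least-loaded processor."""
    heap = [(0, proc) for proc in range(num_procs)]  # (current load, processor id)
    assignment = {proc: [] for proc in range(num_procs)}
    # Visit patches from most to least expensive.
    for patch_id, cost in sorted(enumerate(patch_costs), key=lambda item: -item[1]):
        load, proc = heapq.heappop(heap)
        assignment[proc].append(patch_id)
        heapq.heappush(heap, (load + cost, proc))
    loads = {proc: sum(patch_costs[i] for i in ids)
             for proc, ids in assignment.items()}
    return assignment, loads


assignment, loads = deal_patches([4, 3, 3, 2], 2)
```

The heuristic does not guarantee an optimal split, but it is cheap enough to rerun whenever refinement changes the set of patches.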


Adaptive Mesh

Shock waves in gas dynamics using AMR (Adaptive Mesh Refinement). See: http://www.llnl.gov/CASC/SAMRAI/

[Figure: fluid density]


Irregular mesh: Tapered Tube (Multigrid)


Challenges of Irregular Meshes for PDEs

• How to generate them in the first place
  • E.g. Triangle, a 2D mesh generator by Jonathan Shewchuk
  • 3D harder! E.g. QMG by Stephen Vavasis
• How to partition them
  • ParMETIS, a parallel graph partitioner
• How to design iterative solvers
  • PETSc, the Portable, Extensible Toolkit for Scientific Computation
  • Prometheus, a multigrid solver for finite element problems on irregular meshes
• How to design direct solvers
  • SuperLU, parallel sparse Gaussian elimination
• These are challenges to do sequentially, more so in parallel