Domain Decomposition in High-Level Parallelization of PDE Codes

Xing Cai, University of Oslo


Outline of the Talk

Introduction and motivation

A simulator-parallel model

A generic programming framework

Applications

The Question

Starting point: sequential PDE simulators. How to do the parallelization?

Resulting parallel simulators should have:

good parallel performance

good overall numerical performance

a relatively simple parallelization process

We need:

a good parallelization strategy

a good implementation of the strategy


3 Key Words

Parallel Computing

faster solution, larger simulation

Domain Decomposition (additive Schwarz method)

good algorithmic efficiency

mathematical foundation of parallelization

Object-Oriented Programming

extensible sequential simulator

flexible implementation framework for parallelization


A Known Problem

“The hope among early domain decomposition workers was that one could write a simple controlling program which would call the old PDE software directly to perform the subdomain solves. This turned out to be unrealistic because most PDE packages are too rigid and inflexible.”

- Smith, Bjørstad and Gropp

The remedy:

Correct use of object-oriented programming techniques.

Additive Schwarz Method

Example: solving the Poisson problem on the unit square
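For reference, a sketch of the standard one-level additive Schwarz method in formula form. The notation is an assumption introduced here, not taken from the slide: $A$ is the global stiffness matrix, $R_i$ restricts a global vector to the nodes of overlapping subdomain $\Omega_i$, $M$ is the number of subdomains, and $g$ denotes Dirichlet boundary data.

```latex
% Subdomain solves in iteration k+1 are independent of each other,
% so they can run in parallel:
\begin{aligned}
-\Delta u_i^{k+1} &= f && \text{in } \Omega_i, \\
u_i^{k+1} &= g && \text{on } \partial\Omega_i \cap \partial\Omega, \\
u_i^{k+1} &= u^{k} && \text{on } \partial\Omega_i \setminus \partial\Omega .
\end{aligned}

% The same idea used as a preconditioner B^{-1} for a Krylov method:
B^{-1} = \sum_{i=1}^{M} R_i^{T} A_i^{-1} R_i ,
\qquad A_i = R_i A R_i^{T} .
```

Each term $R_i^{T} A_i^{-1} R_i$ is one subdomain solve, which is the work a sequential simulator can do unchanged.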


Parallelization

A simulator-parallel model

Each processor hosts an arbitrary number of subdomains

balance between algorithmic efficiency and load balancing

Each subdomain is assigned a sequential simulator

Flexibility - different linear system solvers, preconditioners, convergence monitors etc. can easily be chosen for different subproblems

Domain decomposition at the level of subdomain simulators!
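A minimal sketch of the mapping described above, assuming a simple round-robin distribution of subdomains over processors. The names `SubdomainConfig` and `assignSubdomains`, and the default solver labels, are hypothetical and not part of the framework presented in the talk.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-subdomain settings: every subdomain simulator may
// choose its own linear solver and preconditioner.
struct SubdomainConfig {
    int id;
    std::string solver;
    std::string preconditioner;
};

// Distribute numSubdomains subdomains over numProcs processors in
// round-robin order, so that the subdomain counts on any two processors
// differ by at most one.
std::vector<std::vector<SubdomainConfig>>
assignSubdomains(int numSubdomains, int numProcs)
{
    assert(numSubdomains >= 0 && numProcs > 0);
    std::vector<std::vector<SubdomainConfig>> perProc(numProcs);
    for (int s = 0; s < numSubdomains; ++s) {
        // Placeholder defaults; a real setup would read these per subdomain.
        perProc[s % numProcs].push_back({s, "CG", "none"});
    }
    return perProc;
}
```

With 32 subdomains and 8 processors, each processor hosts 4 subdomain simulators; with as many processors as subdomains, each hosts exactly one.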


The Simulator-Parallel Model

Reuse of existing sequential simulators

Data distribution is implied

No need for global data

Needs additional functionality for exchanging nodal values inside the overlapping regions

Needs some global administration

A Generic Programming Framework

An add-on library (SPMD model)

Use of object-oriented programming techniques

Flexibility and portability

Simplified parallelization process for the end user


The Administrator

Parameter interface: solution method or preconditioner, max iterations, stopping criterion, etc.

DD algorithm interface: access to predefined numerical algorithms, e.g. CG

Operation interface (standard codes & UDC): access to subdomain simulators, matrix-vector product, inner product, etc.
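The three interfaces can be illustrated with a small sketch in which a predefined algorithm (plain, unpreconditioned CG) sees the problem only through a matrix-vector product and an inner product. The types `Operations` and `Parameters` and the function `conjugateGradient` are hypothetical names chosen for illustration, not the administrator's actual API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical operation interface: the algorithm only calls these two
// operations and never touches the data distribution directly.
struct Operations {
    std::function<Vec(const Vec&)> matVec;                // y = A x
    std::function<double(const Vec&, const Vec&)> inner;  // (x, y)
};

// Hypothetical parameter interface.
struct Parameters {
    int maxIterations;
    double tolerance;  // stop when the residual norm is <= tolerance
};

// Unpreconditioned conjugate gradients for A x = b with A symmetric
// positive definite. Returns the number of iterations performed.
int conjugateGradient(const Operations& ops, const Parameters& prm,
                      const Vec& b, Vec& x)
{
    assert(x.size() == b.size());
    const std::size_t n = b.size();
    const Vec Ax = ops.matVec(x);
    Vec r(n);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ax[i];
    Vec p = r;
    double rr = ops.inner(r, r);
    int it = 0;
    while (it < prm.maxIterations && std::sqrt(rr) > prm.tolerance) {
        const Vec Ap = ops.matVec(p);
        const double alpha = rr / ops.inner(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rrNew = ops.inner(r, r);
        const double beta = rrNew / rr;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
        ++it;
    }
    return it;
}
```

In a parallel run, the same algorithm code could be reused if `matVec` and `inner` are supplied by the subdomain simulators together with the necessary communication.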


The Communicator

Encapsulation of communication related codes

Hidden concrete communication model

MPI in use, but easy to change

Communication pattern determination

Inter-processor communication

Intra-processor communication
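A sketch of how the concrete communication model can be hidden behind an abstract class. All names here (`Communicator`, `SerialCommunicator`, `globalInnerProduct`) are hypothetical. Only the intra-processor case is implemented so that the example runs without MPI; an MPI-backed subclass could implement the same method with a reduction such as `MPI_Allreduce`. The sketch also assumes each vector piece holds only the nodes owned by one subdomain, so overlapping nodes are not counted twice.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical abstract interface: numerical code depends only on this
// class, not on the message-passing library behind it.
class Communicator {
public:
    virtual ~Communicator() = default;
    // Combine one local contribution per subdomain into a global sum.
    virtual double globalSum(const std::vector<double>& localParts) const = 0;
};

// Intra-processor case: all subdomains share one address space, so the
// "communication" is an ordinary summation.
class SerialCommunicator : public Communicator {
public:
    double globalSum(const std::vector<double>& localParts) const override {
        return std::accumulate(localParts.begin(), localParts.end(), 0.0);
    }
};

// Global inner product assembled from per-subdomain pieces of x and y.
double globalInnerProduct(const Communicator& comm,
                          const std::vector<std::vector<double>>& x,
                          const std::vector<std::vector<double>>& y)
{
    assert(x.size() == y.size());
    std::vector<double> local(x.size(), 0.0);
    for (std::size_t s = 0; s < x.size(); ++s) {
        assert(x[s].size() == y[s].size());
        for (std::size_t i = 0; i < x[s].size(); ++i)
            local[s] += x[s][i] * y[s][i];
    }
    return comm.globalSum(local);
}
```

Swapping the communication model then means supplying a different `Communicator` subclass; `globalInnerProduct` stays unchanged.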


The Subdomain Simulator

Subdomain Simulator: a generic representation

C++ class hierarchy

Standard interface of generic member functions

Adaptation of Subdomain Simulator

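class NewSimulator : public SubdomainFEMSolver,
                     public OldSimulator
{
  // ...
  // Reuse the existing sequential code to build the subdomain system.
  virtual void createLocalMatrix ()
    { OldSimulator::makeSystem (); }
};

Class hierarchy diagram: SubdomainSimulator at the top; SubdomainFEMSolver and OldSimulator in the middle; NewSimulator at the bottom, inheriting from both SubdomainFEMSolver and OldSimulator.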
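A self-contained sketch of this adaptation pattern that compiles on its own. The bodies of `SubdomainSimulator`, `SubdomainFEMSolver` and `OldSimulator` are stand-in stubs written for this example (assumptions, not the real framework or Diffpack classes), and `buildSubdomainSystem` is a hypothetical caller.

```cpp
#include <cassert>

// Stub for the generic interface (hypothetical body).
class SubdomainSimulator {
public:
    virtual ~SubdomainSimulator() = default;
    virtual void createLocalMatrix() = 0;
};

// Stub for the FEM-specific layer of the hierarchy.
class SubdomainFEMSolver : public SubdomainSimulator {};

// Stub for an existing sequential simulator; it knows nothing about the
// parallel framework.
class OldSimulator {
public:
    virtual ~OldSimulator() = default;
    void makeSystem() { ++systemsBuilt; }  // stands in for real assembly
    int systemsBuilt = 0;
};

// The adapter: multiple inheritance maps the generic interface onto the
// old simulator's existing functionality.
class NewSimulator : public SubdomainFEMSolver, public OldSimulator {
public:
    void createLocalMatrix() override { OldSimulator::makeSystem(); }
};

// Generic framework code only sees the SubdomainSimulator interface.
void buildSubdomainSystem(SubdomainSimulator& sim)
{
    sim.createLocalMatrix();
}
```

The old simulator is reused without modification; only the thin `NewSimulator` class has to be written.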


Performance

Algorithmic efficiency

efficiency of original sequential simulator(s)

efficiency of domain decomposition method

Parallel efficiency

communication overhead (low)

coarse grid correction overhead (normally low)

load balancing: subproblem size, work on subdomain solves

Application

P     Sim. Time   Speedup   Efficiency
1     53.08       N/A       N/A
2     27.23       1.95      0.97
4     14.12       3.76      0.94
8     7.01        7.57      0.95
16    3.26        16.28     1.02
32    1.63        32.56     1.02

Test case: 2D Poisson problem on the unit square. Fixed number of subdomains, M=32, based on a 481 x 481 global grid. Straightforward parallelization of an existing simulator. Subdomain solves use CG+FFT.

P: number of processors.
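The speedup and efficiency columns follow the usual definitions, speedup = T(1)/T(P) and efficiency = speedup/P. A small sketch that recomputes them from the measured times; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Speedup relative to the one-processor run: S(P) = T(1) / T(P).
double speedup(double t1, double tp)
{
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

// Parallel efficiency: E(P) = S(P) / P.
double efficiency(double t1, double tp, int p)
{
    assert(p > 0);
    return speedup(t1, tp) / p;
}

// Round to two decimals, as in the table.
double roundTo2(double v)
{
    return std::round(v * 100.0) / 100.0;
}
```

For example, `roundTo2(speedup(53.08, 3.26))` gives 16.28 and `roundTo2(efficiency(53.08, 3.26, 16))` gives 1.02, matching the P=16 row.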

Application

Test case: 2D linear elasticity, 241 x 241 global grid.

Vector equation, $\mathbf{u} = (u_1, u_2)$:

$-\mu\,\Delta\mathbf{u} - (\mu+\lambda)\,\nabla(\nabla\cdot\mathbf{u}) = \mathbf{f}$

Straightforward parallelization based on an existing Diffpack simulator

2D Linear Elasticity

P: number of processors in use (P=M).

I: number of parallel BiCGStab iterations needed.

Multigrid V-cycle in subdomain solves

P     CPU     Speedup   I     Subgrid
1     66.01   N/A       19    241 x 241
2     24.64   2.68      12    129 x 241
4     14.97   4.41      14    129 x 129
8     5.96    11.08     11    69 x 129
16    3.58    18.44     13    69 x 69


Unstructured Grid

P     Subgrid      Time     Speedup   Efficiency
1     1,503,433    201.30   N/A       N/A
2     766,489      114.91   1.75      0.88
4     388,025      54.95    3.66      0.92
8     200,489      25.18    7.99      1.00
16    105,297      13.69    14.70     0.92
32    56,121       7.74     26.01     0.81

Application

Test case: two-phase porous media flow problem.

P     Total CPU   Subgrid      CPU PEQ   I      CPU SEQ
1     4053.33     241 x 241    3586.98   3.10   440.58
2     2497.43     129 x 241    2241.78   3.48   241.08
4     1244.29     129 x 129    1101.58   2.97   134.28
8     804.47      129 x 69     725.58    3.93   72.76
16    490.47      69 x 69      447.27    4.13   39.64

SEQ (saturation equation): $s_t + \mathbf{v}\cdot\nabla(f(s)) = 0$ in $\Omega\times(0,T]$

PEQ (pressure equation): $-\nabla\cdot(\lambda(s)\nabla p) = q$ in $\Omega\times(0,T]$

$\mathbf{v} = -\lambda(s)\nabla p$

I: average number of parallel BiCGStab iterations per step

Multigrid V-cycle in subdomain solves

Two-Phase Porous Media Flow

Simulation result obtained on 16 processors


Application

Test case: fully nonlinear 3D water wave problem.

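$-\nabla^2\varphi = 0$ in the water volume

$\eta_t + \varphi_x\eta_x + \varphi_y\eta_y - \varphi_z = 0$ on the water surface

$\varphi_t + \frac{1}{2}(\varphi_x^2 + \varphi_y^2 + \varphi_z^2) + g\eta = 0$ on the water surface

$\partial\varphi/\partial n = 0$ on solid walls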

Parallelization based on an existing Diffpack simulator.

Preliminary Results

Fixed number of subdomains, M=16. Subdomain grids from partitioning a global 41 x 41 x 41 grid. Simulation over 32 time steps. DD as preconditioner of CG for the Laplace equation. Multigrid V-cycle as subdomain solver.

P     Execution time   Speedup
1     1404.40          N/A
2     715.32           1.96
4     372.79           3.77
8     183.99           7.63
16    90.89            15.45

3D Water Waves

Summary

High-level parallelization of PDE codes through DD

Introduction of a simulator-parallel model

A generic implementation framework