
Hans De Sterck, Department of Applied Mathematics, University of Colorado at Boulder

Ulrike Meier Yang, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory

Reducing Complexity in Algebraic Multigrid


Outline

introduction: AMG

complexity growth when using classical coarsenings

Parallel Modified Independent Set (PMIS) coarsening

scaling results

conclusions and future work


Introduction

solve $Au = f$

$A$ from a 3D PDE – sparse!

large problems (10^9 dof) – parallel

unstructured grid problems


Algebraic Multigrid (AMG)

multi-level, iterative, algebraic: suitable for unstructured grids!


AMG building blocks

Setup Phase:

select coarse "grids"

define interpolation $P^{(m)}$, $m = 1, 2, \ldots$

define restriction and coarse-grid operators: $R^{(m)} = P^{(m)T}$, $A^{(m+1)} = P^{(m)T} A^{(m)} P^{(m)}$

Solve Phase (on level $m$):

Relax $A^{(m)} u^m = f^m$

Compute $r^m = f^m - A^{(m)} u^m$

Restrict $r^{m+1} = R^{(m)} r^m$

Solve $A^{(m+1)} e^{m+1} = r^{m+1}$

Interpolate $e^m = P^{(m)} e^{m+1}$

Correct $u^m \leftarrow u^m + e^m$

Relax $A^{(m)} u^m = f^m$
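Read as pseudocode, the solve phase above is a recursive V-cycle. Below is a minimal sketch under assumed inputs: lists A and P of scipy sparse level operators and interpolations produced by some setup phase, and a simple weighted-Jacobi relax standing in for the Gauss-Seidel smoothing used later in the talk. The names v_cycle and relax are illustrative, not the hypre/BoomerAMG API.

```python
# Minimal recursive V-cycle sketch (illustrative; not the hypre/BoomerAMG code).
# Assumes the setup phase produced operators A[0..L] and interpolations P[0..L-1],
# with R[m] = P[m].T and A[m+1] = P[m].T @ A[m] @ P[m], as on this slide.
import numpy as np
import scipy.sparse.linalg as spla


def relax(A, u, f, sweeps=1, omega=2.0 / 3.0):
    """Weighted Jacobi smoothing (a stand-in for the Gauss-Seidel used in the talk)."""
    Dinv = 1.0 / A.diagonal()
    for _ in range(sweeps):
        u = u + omega * Dinv * (f - A @ u)
    return u


def v_cycle(A, P, f, u, m=0):
    """One V-cycle on level m: relax, restrict, coarse solve, interpolate, correct, relax."""
    if m == len(P):                                     # coarsest level: solve directly
        return spla.spsolve(A[m].tocsc(), f)
    u = relax(A[m], u, f)                               # pre-smoothing
    r = f - A[m] @ u                                    # compute residual
    rc = P[m].T @ r                                     # restrict with R = P^T
    ec = v_cycle(A, P, rc, np.zeros_like(rc), m + 1)    # coarse-grid correction
    u = u + P[m] @ ec                                   # interpolate and correct
    u = relax(A[m], u, f)                               # post-smoothing
    return u
```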


AMG complexity - scalability

Operator complexity:

$$C_{op} = \frac{\sum_i \mathrm{nonzeros}(A^{(i)})}{\mathrm{nonzeros}(A^{(0)})}$$

e.g., 3D: $C_{op} = 1 + 1/8 + 1/64 + \ldots < 8/7$

measure of memory use, and of work in the solve phase

scalable algorithm:

O(n) operations per V-cycle ($C_{op}$ bounded)

AND number of V-cycles independent of n

(AMG convergence factor independent of n)
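The definition of $C_{op}$ is easy to evaluate once the level operators are in hand. The helper below is a hypothetical illustration (not part of hypre), taking a list of the sparse level matrices $A^{(i)}$, plus a quick numerical check of the 8/7 bound quoted above.

```python
def operator_complexity(levels):
    """C_op = (sum over levels i of nnz(A^(i))) / nnz(A^(0)).

    `levels` is assumed to be a list of scipy sparse matrices, finest first."""
    return sum(A.nnz for A in levels) / levels[0].nnz


# Sanity check of the 3D estimate on the slide: with ~1/8 of the points per level
# (and a comparable stencil), C_op tends to 1 + 1/8 + 1/64 + ... = 8/7.
print(sum((1.0 / 8.0) ** k for k in range(30)))   # ~1.1428..., i.e. < 8/7
```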


AMG Interpolation

after relaxation:

$Ae \approx 0$ (relative to $e$)

heuristic: error after interpolation should also satisfy this relation approximately

derive interpolation from:

$$a_{ii} e_i + \sum_{j \in C_i} a_{ij} e_j + \sum_{j \in F_i} a_{ij} e_j = 0 \qquad \forall i \in F$$


AMG interpolation

"large" $a_{ij}$ should be taken into account accurately

"strong connections": i strongly depends on j (and j strongly influences i) if

$$-a_{ij} \ge \theta \max_{k \ne i} \{-a_{ik}\}, \qquad 0 < \theta \le 1,$$

with strong threshold $\theta$

$$a_{ii} e_i + \sum_{j \in C_i} a_{ij} e_j + \sum_{j \in F_i} a_{ij} e_j = 0 \qquad \forall i \in F$$
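The strength criterion above translates directly into code. The sketch below is illustrative (not the hypre/BoomerAMG implementation): it builds a boolean strength matrix S from a scipy CSR matrix, assuming the usual sign convention of negative off-diagonals; the name strong_connections and the default theta are choices made here.

```python
# Illustrative strength-of-connection sketch (not the hypre implementation).
import numpy as np
import scipy.sparse as sp


def strong_connections(A, theta=0.25):
    """Boolean CSR matrix S with S[i, j] = True iff i strongly depends on j:
       -a_ij >= theta * max_{k != i} (-a_ik)."""
    A = sp.csr_matrix(A)
    n = A.shape[0]
    rows, cols = [], []
    for i in range(n):
        start, end = A.indptr[i], A.indptr[i + 1]
        js, vals = A.indices[start:end], A.data[start:end]
        off = js != i                               # exclude the diagonal entry
        if not np.any(off):
            continue
        threshold = theta * np.max(-vals[off])      # theta * max_k (-a_ik)
        if threshold <= 0:                          # no negative off-diagonals: no strong ties
            continue
        for j, a_ij in zip(js[off], vals[off]):
            if -a_ij >= threshold:
                rows.append(i)
                cols.append(j)
    return sp.csr_matrix((np.ones(len(rows), dtype=bool), (rows, cols)), shape=A.shape)
```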


AMG coarsening

(C1) Maximal Independent Set:

Independent: no two C-points are connected

Maximal: if one more C-point is added, the independence is lost

(C2) All F-F connections require connections to a common C-point (for good interpolation)

F-points have to be changed into C-points to ensure (C2); (C1) is then violated

more C-points, higher complexity


Classical Coarsenings

1. Ruge-Stueben (RS)
   - two passes: 2nd pass ensures that F-F pairs have a common C-point
   - disadvantage: highly sequential
2. CLJP
   - based on parallel independent set algorithms developed by Luby and later by Jones & Plassmann
   - also ensures that F-F pairs have a common C-point
3. hybrid RS-CLJP ("Falgout")
   - RS in processor interiors, CLJP at interprocessor boundaries


Classical coarsenings: complexity growth

proc   dof     Cop    max avg nonzeros per row   tsetup   tsolve   # iter   AMG conv. factor
1      32^3    3.76   161  (5/8)                 2.91     1.81     5        0.035
8      64^3    4.69   463  (6/11)                8.32     4.37     8        0.098
64     128^3   5.72   1223 (7/16)                36.28    8.59     10       0.172

example: hybrid RS-CLJP (Falgout), 7-point finite difference Laplacian in 3D, θ = 0.25

increased memory use, long solution times, long setup times: loss of scalability


our approach to reduce complexity

do not add C-points for strong F-F connections that do not have a common C-point

fewer C-points, reduced complexity, but worse convergence factors expected

can something be gained?


PMIS coarsening (De Sterck, Yang)

Parallel Modified Independent Set (PMIS)

do not enforce condition (C2)

weighted independent set algorithm:

points i that influence many equations (large measure) are good candidates for C-points

add a random number between 0 and 1 to the measure of i to break ties


PMIS coarsening

pick C-points with maximal measures (like in CLJP), then make all their neighbors fine (like in RS)

proceed until all points are either coarse or fine
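A serial sketch of the selection loop just described: the actual algorithm runs in parallel with boundary communication, and details such as the treatment of points that influence nothing are simplified here. The strength matrix S, the function name pmis_coarsening, and the ±1 coding of C/F points are illustrative choices, not the hypre API.

```python
# Serial, illustrative PMIS sketch (not the parallel hypre/BoomerAMG code).
import numpy as np
import scipy.sparse as sp


def pmis_coarsening(S, seed=0):
    """PMIS-style C/F splitting.

    S : sparse boolean matrix, S[i, j] = True iff point i strongly depends on j
        (no diagonal entries).  Returns cf with +1 for C-points, -1 for F-points."""
    S = sp.csr_matrix(S)
    ST = S.T.tocsr()                                   # ST[i, j]: i strongly influences j
    n = S.shape[0]
    rng = np.random.default_rng(seed)
    # measure = number of points i strongly influences + random tie-breaker in (0, 1)
    w = np.asarray(ST.sum(axis=1)).ravel() + rng.random(n)
    cf = np.zeros(n, dtype=int)                        # 0 = undecided

    while np.any(cf == 0):
        undecided = np.flatnonzero(cf == 0)
        # 1. select as C-points all undecided points whose measure is a local maximum
        #    among their undecided strong neighbors (in either direction)
        new_C = []
        for i in undecided:
            nbrs = np.concatenate((S.indices[S.indptr[i]:S.indptr[i + 1]],
                                   ST.indices[ST.indptr[i]:ST.indptr[i + 1]]))
            if all(w[i] > w[j] for j in nbrs if cf[j] == 0):
                new_C.append(i)
        cf[new_C] = +1
        # 2. make every undecided point that strongly depends on a new C-point an F-point
        for i in new_C:
            for j in ST.indices[ST.indptr[i]:ST.indptr[i + 1]]:
                if cf[j] == 0:
                    cf[j] = -1
    return cf


# Tiny usage example: a 1D chain where each point strongly depends on both neighbors.
n = 8
S = sp.lil_matrix((n, n), dtype=bool)
for i in range(n - 1):
    S[i, i + 1] = True
    S[i + 1, i] = True
print(pmis_coarsening(S.tocsr()))   # +1 entries are C-points, -1 entries are F-points
```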


PMIS select 1

select C-pts with maximal measure locally

make neighbor F-pts

remove neighbor edges

[example grid: point measures (number of strong influences + random tie-breaker)]

3.7  5.3  5.0  5.9  5.4  5.3  3.4
5.2  8.0  8.5  8.2  8.6  8.9  5.1
5.9  8.1  8.8  8.9  8.4  8.2  5.9
5.7  8.6  8.3  8.8  8.3  8.1  5.0
5.3  8.7  8.3  8.4  8.3  8.8  5.9
5.0  8.8  8.5  8.6  8.7  8.9  5.3
3.2  5.6  5.8  5.6  5.9  5.9  3.0


PMIS: remove and update 1

select C-pts with maximal measure locally

make neighbors F-pts

remove neighbor edges

[figure: remaining undecided points and their measures after the first selection and removal step]


PMIS: select 2

select C-pts with maximal measure locally

make neighbors F-pts

remove neighbor edges

[figure: remaining points with a second set of local-maximum C-points selected]


PMIS: remove and update 2

select C-pts with maximal measure locally

make neighbors F-pts

remove neighbor edges

[figure: remaining undecided points after the second removal step]


PMIS: final grid

select C-pts with maximal measure locally

make neighbor F-pts

remove neighbor edges


Preliminary results: 7-point 3D Laplacian on an unstructured grid (n = 76,527), serial, θ = 0.5, GS

Coarsening   Cop    # its   Time    with GMRES(10): # its   Time
RS           4.43   12      5.74    8                       5.37
CLJP         3.83   12      5.39    8                       5.15
PMIS         1.56   58      8.59    17                      4.53

Implementation in CASC/LLNL’s Hypre/BoomerAMG library (Falgout, Yang, Henson, Jones, …)


PMIS results: 27-point finite element Laplacian in 3D, 40^3 dof per proc (IBM Blue)

proc   Cop    max nz   levels   tsetup   tsolve   # iter   ttotal
Falgout (θ = 0.5):
1      1.62   124      12       8.39     6.48     7        14.87
8      2.19   446      16       16.65    11.20    8        27.85
64     2.43   697      17       37.95    16.89    9        54.84
256    2.51   804      19       48.36    18.42    9        66.78
384    2.55   857      19       57.70    21.66    10       79.36
PMIS-GMRES(10) (θ = 0.25):
1      1.11   36       6        4.60     8.42     10       13.02
8      1.11   42       7        5.81     13.27    13       19.08
64     1.11   48       8        6.84     16.83    15       23.67
256    1.12   51       9        7.23     18.50    16       25.73
384    1.12   52       9        7.56     19.62    17       27.18


PMIS results: 7-point finite difference Laplacian in 3D, 40^3 dof per proc (IBM Blue)

proc   Cop    max nz   levels   tsetup   tsolve   # iter   ttotal
Falgout (θ = 0.5):
1      3.62   104      11       4.91     4.02     6        8.99
8      4.45   292      14       10.52    6.95     7        17.47
64     5.07   450      16       21.82    11.89    9        33.71
256    5.28   521      19       26.72    13.18    9        39.90
384    5.35   549      19       30.41    14.13    10       44.54
PMIS-GMRES(10) (θ = 0.25):
1      2.32   56       7        2.28     8.05     13       10.36
8      2.37   65       8        3.78     13.30    17       17.08
64     2.39   71       9        5.19     19.44    21       24.63
256    2.40   73       10       5.59     20.47    22       26.06
384    2.41   74       10       5.97     22.56    24       28.53


Conclusions

PMIS leads to reduced, scalable complexities for large problems on parallel computers

using PMIS-GMRES, large problems can be solved efficiently, with good scalability, and without requiring much memory (Blue Gene/L)


Future work

parallel aggressive coarsening and multi-pass interpolation to further reduce complexity (using one-pass RS, PMIS, or hybrid)

improved interpolation formulas for more aggressively coarsened grids (Jacobi improvement, …), to reduce the need for GMRES

parallel First-Order System Least-Squares (FOSLS) AMG code for large-scale PDE problems

Blue Gene/L applications