Multilevel Algorithms for Generating Coarse Grids for Multigrid Methods
Irene Moulitsas, George Karypis
Department of Computer Science and Engineering
Army HPC Research Center, University of Minnesota
Work supported by: NSF, DOE, AHPCRC
Outline
Problem Definition - Motivation
Serial Multilevel Coarse Grid Construction
Parallel Formulation
Experimental Results
Summary
Motivation…
Geometric Multigrid Methods
Widely used because they exhibit fast convergence rates: O(n) work for a problem with n unknowns
Level 3
Coarse Grid
Level 2
Level 1
Fine Grid
…Motivation…
Structured vs. Unstructured Grids

For structured grids, there is an optimal way to generate the coarse grid.

Most real-life problems need unstructured grids. Generating a sequence of coarse unstructured grids is far from obvious!
…Motivation…
Performance of Geometric Multigrid Methods on Unstructured Grids is highly dependent on the quality of the grids.
Agglomeration Techniques

They use the connectivity of the dual graph.

They start from a vertex of this graph and fuse together some of its adjacent vertices into a new control volume. This is repeated until all vertices have been fused into control volumes.

The quality of the control volumes can be optimized.

Steve, Lallemand, Dervieux (Computers and Fluids, 1992); Venkatakrishnan, Mavriplis (NASA, 1994)
…Motivation
Limitations…
They are serial in nature
Greedy algorithms operate locally
Our contribution…
A multilevel approach for coarse grid construction
We formulate it as an optimization problem that:

Optimizes a particular measure of the overall quality of the coarse grid.

Is subject to the constraint that each control volume contains between Lmin and Lmax elements.
We use the multilevel paradigm to solve this optimization problem
Challenges…
Key Issues:

How to measure the quality of a coarse grid?

How to use the multilevel paradigm to optimize the quality of a coarse grid?

How to parallelize our algorithm?

Design Objectives:

Robust Algorithms

Highly Accurate

Computationally Efficient
How to Measure the Quality of the Coarse Grid
Individual Element Aspect Ratios
Aspect ratio A of an element (S: boundary surface, V: volume):

2D: A = S^2 / V

3D: A = S^(3/2) / V

Control Volume Aspect Ratios

F1 = max over i in NCoarse of A_i

F2 = sum over i in NCoarse of A_i

F3 = sum over i in NCoarse of w_i * A_i

F3_F2: a combination of F3 and F2
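As an illustration only (hypothetical helper names, not the MGridGen API), these quality measures can be sketched in Python, assuming the aspect ratio of a control volume is S^2/V in 2D and S^(3/2)/V in 3D:

```python
def aspect_ratio(surface, volume, dim):
    """Aspect ratio of one control volume: S^2/V in 2D, S^(3/2)/V in 3D."""
    if dim == 2:
        return surface ** 2 / volume
    return surface ** 1.5 / volume

def quality(surfaces, volumes, weights, dim=2):
    """Evaluate the slide's objectives over all coarse control volumes."""
    ars = [aspect_ratio(s, v, dim) for s, v in zip(surfaces, volumes)]
    f1 = max(ars)                                  # worst aspect ratio
    f2 = sum(ars)                                  # total aspect ratio
    f3 = sum(w * a for w, a in zip(weights, ars))  # weighted total
    return f1, f2, f3
```

Minimizing F1 targets the single worst control volume, while F2 and F3 spread the improvement over the whole coarse grid.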
How to use the Multilevel Paradigm to Optimize the Quality of the Coarse Grid?
The Multilevel Paradigm For Graph Partitioning
Coarsening Phase: a sequence of coarse graphs is constructed.

Initial Partitioning Phase: a partitioning of the coarsest graph is computed quickly.

Refinement Phase: the partitioning is successively projected back to the original graph; at each finer graph, a refinement algorithm is applied.

Used by: METIS, CHACO, JOSTLE
Modeling the Grid via a Graph…
Weighted Dual Graph

Vertex weights: the vertex's boundary surface and its volume

Edge weight: the surface shared between the two adjacent elements
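For intuition, a dual graph for a 2D triangular mesh can be sketched as follows (hypothetical names; the actual weighted dual graph also attaches the surface and volume weights listed above):

```python
from collections import defaultdict

def dual_graph(elements):
    """Dual graph of a 2D triangular mesh: one vertex per element, and an
    edge between every pair of elements that share a mesh edge (a 'face')."""
    face_owner = defaultdict(list)
    for e, nodes in enumerate(elements):
        n = len(nodes)
        for i in range(n):
            face = tuple(sorted((nodes[i], nodes[(i + 1) % n])))
            face_owner[face].append(e)
    adj = defaultdict(set)
    for owners in face_owner.values():
        if len(owners) == 2:           # interior face shared by two elements
            a, b = owners
            adj[a].add(b)
            adj[b].add(a)
    return adj
```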
How to use the Multilevel Paradigm to Optimize the Quality of the Coarse Grid?

Can our problem be solved using the existing graph partitioning algorithms? Is it an instance of a k-way partitioning problem in which k = N/Lmin?

No! The objectives of the two problems do not match!
Coarsening Phase
Compute a maximal matching of the vertices.
Collapse together matched vertices.
Ensure edge and vertex attributes of the coarsened graph accurately reflect those of the finer graph.
Random Matching

Globular Matching
Original graph
Matching computed
Graph coarsened
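The coarsening steps above can be sketched as a random maximal matching followed by a collapse that sums vertex attributes (a minimal illustration with made-up data structures, not the actual implementation):

```python
import random

def random_matching(adj):
    """Compute a maximal matching by visiting vertices in random order."""
    matched = {}
    order = list(adj)
    random.shuffle(order)
    for u in order:
        if u in matched:
            continue
        for v in adj[u]:               # try to pair u with an unmatched neighbour
            if v not in matched:
                matched[u], matched[v] = v, u
                break
        else:
            matched[u] = u             # u remains unmatched: it maps to itself
    return matched

def collapse(adj, vol, matched):
    """Collapse matched pairs into coarse vertices, summing volumes so the
    coarse graph's attributes reflect those of the finer graph."""
    rep = {u: min(u, matched[u]) for u in adj}   # representative of each pair
    cadj = {r: set() for r in set(rep.values())}
    cvol = {r: 0.0 for r in cadj}
    for u in adj:
        cvol[rep[u]] += vol[u]
        for v in adj[u]:
            if rep[v] != rep[u]:                 # keep only inter-pair edges
                cadj[rep[u]].add(rep[v])
    return cadj, cvol
```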
Refinement phase
Compute the gain of each vertex.
The gain of a vertex is equal to the reduction in the objective function that will result from moving it to a different control volume.
Move the highest gain vertex to the adjacent subdomain subject to control volume size constraints.
Update the gains of each neighboring vertex.
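A toy version of this refinement pass (hypothetical names; the real algorithm optimizes the aspect-ratio objective, whereas the plain edge cut is used here for brevity):

```python
def gain(adj, wgt, part, u, target):
    """Reduction in edge cut if vertex u moves to control volume `target`."""
    internal = sum(wgt[u, v] for v in adj[u] if part[v] == part[u])
    external = sum(wgt[u, v] for v in adj[u] if part[v] == target)
    return external - internal

def refine_once(adj, wgt, part, size, lmin, lmax):
    """Apply the single highest positive-gain move that keeps every
    control volume between lmin and lmax vertices."""
    best = None
    for u in adj:
        for t in {part[v] for v in adj[u]} - {part[u]}:
            if size[part[u]] - 1 < lmin or size[t] + 1 > lmax:
                continue                          # move would break size bounds
            g = gain(adj, wgt, part, u, t)
            if best is None or g > best[0]:
                best = (g, u, t)
    if best is not None and best[0] > 0:
        g, u, t = best
        size[part[u]] -= 1
        size[t] += 1
        part[u] = t
        return True
    return False
```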
Example: Graph Coarsening and Refining

(Figure: a small example graph with edge weights is coarsened and then refined.)
Enforcing the Lmin and Lmax constraints
The coarsening and refinement techniques do not guarantee that our size constraints are enforced.

A "merge" phase follows the last step of refinement: all undersized control volumes are merged.

If undersized control volumes still remain, a "contribute" phase follows.
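A sketch of the merge phase under these constraints (illustrative names; the choice of which neighbour absorbs an undersized control volume is simplified here):

```python
from collections import Counter

def merge_undersized(part, adj, lmin):
    """Merge every control volume with fewer than lmin elements
    into an adjacent control volume."""
    sizes = Counter(part.values())
    for cv, sz in list(sizes.items()):
        if sz >= lmin or sz == 0:
            continue
        members = [u for u in part if part[u] == cv]
        # any control volume adjacent to this one can absorb it
        neighbours = {part[v] for u in members for v in adj[u]} - {cv}
        if not neighbours:
            continue
        target = min(neighbours)        # deterministic choice for the sketch
        for u in members:
            part[u] = target
        sizes[target] += sz
        sizes[cv] = 0
    return part
```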
How to Parallelize our Algorithm…
Highly parallel formulations for multilevel graph partitioning already exist (e.g., ParMETIS).
Question: Can we use the basic structure of these algorithms to parallelize our coarse grid construction algorithm?

Answer: NO! The number of control volumes is too large.
…Parallel Formulation…
k-way partitioning algorithms are designed for problems in which k << n (n is the size of the problem)

These algorithms are highly unscalable when k is close to n

They cannot ensure the Lmin and Lmax size constraints
We need to find a new parallel formulation
Our Parallel Formulation…
Call a graph partitioning algorithm to find a good p-way partitioning of the grid.
The graph is then moved such that each partition becomes local to a single processor.
Each processor finds a good coarse grid for its locally stored elements.
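The three steps can be sketched as a pipeline (a schematic only; threads stand in for the processors, and `partition_fn` / `coarsen_fn` are hypothetical placeholders for the partitioner and the serial coarse-grid algorithm):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_coarsen(elements, p, partition_fn, coarsen_fn):
    """Sketch of the three-step scheme: (1) p-way partition the grid,
    (2) make each part local to one worker, (3) coarsen each part
    independently -- an embarrassingly parallel step."""
    parts = partition_fn(elements, p)                          # step 1
    local = [[e for e, q in zip(elements, parts) if q == i]    # step 2
             for i in range(p)]
    with ThreadPoolExecutor(max_workers=p) as pool:            # step 3
        coarse = list(pool.map(coarsen_fn, local))
    return coarse
```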
…Parallel Implementation…
Features:

Leads to good control volumes for most of the internal regions of the grid.

Embarrassingly parallel procedure; very fast.

However, the control volumes of the regions along and near the partition interfaces are not the best.
… Parallel Implementation…
We readjust the partition boundaries so that the former interface nodes become well interior to a partition.

We employ the adaptive repartitioning algorithm from ParMetis for this step.
…Parallel Implementation
(Figure: overview of the parallel formulation method.)
Experimental Results…
Evaluate the different objective functions
Evaluate the scalability of the parallel formulation
Data Sets

Name | #Elements | Description
M6 | 94,493 | M6 Wing
F22 | 428,748 | F22 Wing
F16 | 1,124,648 | F16 Wing
…Experimental Results…
We tested the quality of the coarse grids in the simulation of an unsteady flow of moving grids arising in aero-elasticity problems, using an edge based multigrid solver (from Daimler Chrysler Aerospace Military Aircraft, Germany)
Data Set | Residual Norm | #Coarse Levels | Lmin | Lmax
M6 | 10^-5 | 4 | 1 | 6
F22 | 10^-10 | 4 | 1 | 6
F16 | 10^-5 | 4 | 1 | 6
Serial Algorithm Evaluation

We evaluate the performance of our serial algorithm by looking at the number of iterations the multigrid algorithm needs to converge.

(Bar charts: iterations to convergence for Trad1, Trad2, ML_F1, ML_F2, ML_F3, and ML_F3_F2 on M6, F22, and F16; the full numbers are in the backup table "Serial Algorithm Evaluation".)

Trad1: Traditional Agglomeration Technique
Trad2: Agglomeration Technique based on Aspect Ratios
CRAY T3E Performance

We measured the actual time our multilevel algorithm needs to produce a coarse grid.

(Bar charts: run time in seconds versus number of processors on M6, F22, and F16; the full numbers are in the backup table "Parallel Algorithm Timings".)
Linux Cluster Performance

We measured the actual time our multilevel algorithm needs to produce a coarse grid.

(Bar charts: run time in seconds versus number of processors on M6 and F22; the full numbers are in the backup table "Parallel Algorithm Timings".)
Parallel Algorithm Evaluation

We evaluate the performance of our parallel algorithm by looking at the number of iterations the serial multigrid algorithm needs to converge for the set of coarse grids we produced in parallel.

(Bar charts: iterations to convergence for ML_F1, ML_F2, ML_F3, and ML_F3_F2 at p = 2, 4, 8, 16; the full numbers are in the backup table "Parallel Algorithm Evaluation".)
Quality Measures

Quality measures on Cray T3E:

#PEs | M6 F3 | M6 F2 | F22 F3 | F22 F2 | F16 F3 | F16 F2
1 | 24.3 | 1.82e+06 | - | - | - | -
2 | 22.6 | 1.82e+06 | - | - | - | -
4 | 22.5 | 1.82e+06 | 27.1 | 8.29e+06 | - | -
8 | 22.7 | 1.82e+06 | 29.3 | 8.28e+06 | - | -
16 | 22.6 | 1.81e+06 | 23.1 | 8.25e+06 | 22.4 | 2.02e+07
32 | 22.6 | 1.80e+06 | 23.6 | 8.23e+06 | 26.4 | 2.02e+07
64 | 22.6 | 1.80e+06 | 24.0 | 8.21e+06 | 71.1 | 2.01e+07
128 | 22.6 | 1.80e+06 | 23.1 | 8.20e+06 | 22.8 | 2.01e+07
256 | 23.0 | 1.79e+06 | 168 | 8.19e+06 | 28.4 | 2.01e+07
512 | 24.3 | 1.78e+06 | 45.7 | 8.18e+06 | 35.0 | 2.01e+07
Summary of Work
New multilevel algorithms for generating coarse grids: coarse grids with well-shaped elements.

Parallel multilevel algorithms: a highly scalable algorithm that creates coarse grids of the same quality as the serial algorithm.
MGridGen / ParMGridGen http://www.cs.umn.edu/~moulitsa/software.html
[email protected], [email protected]
Thank You !
Serial Algorithm Evaluation
Trad1 : Traditional Agglomeration Technique
Trad2 : Agglomeration Technique based on Aspect Ratios
Technique | M6 #Iterations | F22 #Iterations | F16 #Iterations
Trad1 | 215 | 181 | 399
Trad2 | 160 | 153 | 358
ML_F1 | 146 | 155 | 349
ML_F2 | 149 | 159 | 345
ML_F3 | 156 | 157 | 349
ML_F3_F2 | 148 | 160 | 339
Convergence of serial multigrid algorithm
Parallel Algorithm Evaluation
Technique | M6 #Iterations (p=2/4/8/16) | F22 #Iterations (p=2/4/8/16) | F16 #Iterations (p=2/4/8/16)
F1 | 146 / 147 / 149 / 150 | 158 / 159 / 160 / 157 | 353 / 352 / 350 / 344
F2 | 147 / 146 / 148 / 149 | 158 / 157 / 158 / 156 | 341 / 342 / 344 / 355
F3 | 148 / 148 / 150 / 152 | 157 / 156 / 156 / 156 | 339 / 348 / 345 / 354
F3_F2 | 146 / 146 / 147 / 152 | 159 / 158 / 161 / 158 | 338 / 341 / 349 / 357
Convergence of parallel multigrid algorithm
Parallel Algorithm Timings
#PEs | CRAY T3E: M6 | CRAY T3E: F22 | CRAY T3E: F16 | BEO: M6 | BEO: F22
1 | 80.66 | - | - | 32.74 | 173.63
2 | 103.17 | - | - | 46.14 | 246.66
4 | 50.43 | 256.13 | - | 28.21 | 153.21
8 | 22.30 | 125.21 | - | 14.18 | 74.71
16 | 9.95 | 61.06 | 163.14 | 18.64 | 50.55
32 | 4.38 | 29.71 | 90.08 | - | -
64 | 2.24 | 14.86 | 40.47 | - | -
128 | 1.68 | 7.11 | 19.15 | - | -
256 | 1.06 | 4.52 | 10.72 | - | -
512 | 1.06 | 3.55 | 7.11 | - | -
Run Times (in sec)