Multilevel Algorithms for Generating Coarse Grids for Multigrid Methods
Irene Moulitsas, George Karypis
Department of Computer Science and Engineering
Army HPC Research Center, University of Minnesota
Work supported by: NSF, DOE, AHPCRC
Outline
Problem Definition - Motivation
Serial Multilevel Coarse Grid Construction
Parallel Formulation
Experimental Results
Summary
Motivation…
Geometric Multigrid Methods
Widely used because they exhibit fast convergence rates: O(n) work for a problem with n unknowns
Level 3
Coarse Grid
Level 2
Level 1
Fine Grid
…Motivation…
Structured vs. Unstructured Grids

For structured grids, there is an optimal way to generate the coarse grid.

Most real-life problems need unstructured grids. Generating a sequence of coarse unstructured grids is far from obvious!
…Motivation…
Performance of Geometric Multigrid Methods on Unstructured Grids is highly dependent on the quality of the grids.
Agglomeration Techniques

They use the connectivity of the dual graph.

They start from a vertex of this graph and fuse together some of its adjacent vertices into a new control volume. This is repeated until all vertices have been fused into control volumes.

The quality of the control volumes can be optimized.

Steve, Lallemand, Dervieux (Computers and Fluids, 1992); Venkatakrishnan, Mavriplis (NASA, 1994)
…Motivation
Limitations…
They are serial in nature
Greedy algorithms operate locally
Our contribution…
A multilevel approach for coarse grid construction
We formulate it as an optimization problem that:

Optimizes a particular measure of the overall quality of the coarse grid.

Is subject to the constraint that each control volume contains between Lmin and Lmax elements.
We use the multilevel paradigm to solve this optimization problem
Challenges…
Key Issues:

How to measure the quality of a coarse grid?

How to use the multilevel paradigm to optimize the quality of a coarse grid?

How to parallelize our algorithm?

Design Objectives:

Robust Algorithms

Highly Accurate

Computationally Efficient
How to Measure the Quality of the Coarse Grid
Individual Element Aspect Ratios
Aspect ratio A of an element (S: boundary surface, V: volume):

2D: A = S^2 / V

3D: A = S^(3/2) / V

Control Volume Aspect Ratios

F1 = max over i in NCoarse of A_i

F2 = sum over i in NCoarse of A_i

F3 = sum over i in NCoarse of w_i * A_i

F3_F2: a combination of F3 and F2
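As an illustration only (hypothetical helper names, not the MGridGen API), these quality measures can be sketched in Python, assuming the aspect ratio of a control volume is S^2/V in 2D and S^(3/2)/V in 3D:

```python
def aspect_ratio(surface, volume, dim):
    """Aspect ratio of one control volume: S^2/V in 2D, S^(3/2)/V in 3D."""
    if dim == 2:
        return surface ** 2 / volume
    return surface ** 1.5 / volume

def quality(surfaces, volumes, weights, dim=2):
    """Evaluate the slide's objectives over all coarse control volumes."""
    ars = [aspect_ratio(s, v, dim) for s, v in zip(surfaces, volumes)]
    f1 = max(ars)                                  # worst aspect ratio
    f2 = sum(ars)                                  # total aspect ratio
    f3 = sum(w * a for w, a in zip(weights, ars))  # weighted total
    return f1, f2, f3
```

Minimizing F1 targets the single worst control volume, while F2 and F3 spread the improvement over the whole coarse grid.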
How to use the Multilevel Paradigm to Optimize the Quality of the Coarse Grid?
The Multilevel Paradigm For Graph Partitioning
Coarsening Phase: a sequence of coarse graphs is constructed.

Initial Partitioning Phase: a partitioning of the coarsest graph is computed quickly.

Refinement Phase: the partitioning is successively projected back to the original graph; at each finer graph, a refinement algorithm is applied.

Used by: METIS, CHACO, JOSTLE
Modeling the Grid via a Graph…
Weighted Dual Graph

Vertex weights: the vertex's boundary surface and its volume

Edge weight: the surface shared between the two adjacent elements
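For intuition, a dual graph for a 2D triangular mesh can be sketched as follows (hypothetical names; the actual weighted dual graph also attaches the surface and volume weights listed above):

```python
from collections import defaultdict

def dual_graph(elements):
    """Dual graph of a 2D triangular mesh: one vertex per element, and an
    edge between every pair of elements that share a mesh edge (a 'face')."""
    face_owner = defaultdict(list)
    for e, nodes in enumerate(elements):
        n = len(nodes)
        for i in range(n):
            face = tuple(sorted((nodes[i], nodes[(i + 1) % n])))
            face_owner[face].append(e)
    adj = defaultdict(set)
    for owners in face_owner.values():
        if len(owners) == 2:           # interior face shared by two elements
            a, b = owners
            adj[a].add(b)
            adj[b].add(a)
    return adj
```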
How to use the Multilevel Paradigm to Optimize the Quality of the Coarse Grid?

Can our problem be solved using the existing graph partitioning algorithms? Is it an instance of a k-way partitioning problem in which k = N/Lmin?

No! The objectives of the two problems do not match!
Coarsening Phase
Compute a maximal matching of the vertices.
Collapse together matched vertices.
Ensure edge and vertex attributes of the coarsened graph accurately reflect those of the finer graph.
Random Matching

Globular Matching
Original graph
Matching computed
Graph coarsened
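The coarsening steps above can be sketched as a random maximal matching followed by a collapse that sums vertex attributes (a minimal illustration with made-up data structures, not the actual implementation):

```python
import random

def random_matching(adj):
    """Compute a maximal matching by visiting vertices in random order."""
    matched = {}
    order = list(adj)
    random.shuffle(order)
    for u in order:
        if u in matched:
            continue
        for v in adj[u]:               # try to pair u with an unmatched neighbour
            if v not in matched:
                matched[u], matched[v] = v, u
                break
        else:
            matched[u] = u             # u remains unmatched: it maps to itself
    return matched

def collapse(adj, vol, matched):
    """Collapse matched pairs into coarse vertices, summing volumes so the
    coarse graph's attributes reflect those of the finer graph."""
    rep = {u: min(u, matched[u]) for u in adj}   # representative of each pair
    cadj = {r: set() for r in set(rep.values())}
    cvol = {r: 0.0 for r in cadj}
    for u in adj:
        cvol[rep[u]] += vol[u]
        for v in adj[u]:
            if rep[v] != rep[u]:                 # keep only inter-pair edges
                cadj[rep[u]].add(rep[v])
    return cadj, cvol
```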
Refinement phase
Compute the gain of each vertex.
The gain of a vertex is equal to the reduction in the objective function that will result from moving it to a different control volume.
Move the highest gain vertex to the adjacent subdomain subject to control volume size constraints.
Update the gains of each neighboring vertex.
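A toy version of this refinement pass (hypothetical names; the real algorithm optimizes the aspect-ratio objective, whereas the plain edge cut is used here for brevity):

```python
def gain(adj, wgt, part, u, target):
    """Reduction in edge cut if vertex u moves to control volume `target`."""
    internal = sum(wgt[u, v] for v in adj[u] if part[v] == part[u])
    external = sum(wgt[u, v] for v in adj[u] if part[v] == target)
    return external - internal

def refine_once(adj, wgt, part, size, lmin, lmax):
    """Apply the single highest positive-gain move that keeps every
    control volume between lmin and lmax vertices."""
    best = None
    for u in adj:
        for t in {part[v] for v in adj[u]} - {part[u]}:
            if size[part[u]] - 1 < lmin or size[t] + 1 > lmax:
                continue                          # move would break size bounds
            g = gain(adj, wgt, part, u, t)
            if best is None or g > best[0]:
                best = (g, u, t)
    if best is not None and best[0] > 0:
        g, u, t = best
        size[part[u]] -= 1
        size[t] += 1
        part[u] = t
        return True
    return False
```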
Example: Graph Coarsening and Refining

(Figure: a small example graph with edge weights is coarsened and then refined.)
Enforcing the Lmin and Lmax constraints
The coarsening and refinement techniques do not guarantee that our size constraints are enforced.

A "merge" phase follows the last step of refinement: all undersized control volumes are merged.

If undersized control volumes still remain, a "contribute" phase follows.
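A sketch of the merge phase under these constraints (illustrative names; the choice of which neighbour absorbs an undersized control volume is simplified here):

```python
from collections import Counter

def merge_undersized(part, adj, lmin):
    """Merge every control volume with fewer than lmin elements
    into an adjacent control volume."""
    sizes = Counter(part.values())
    for cv, sz in list(sizes.items()):
        if sz >= lmin or sz == 0:
            continue
        members = [u for u in part if part[u] == cv]
        # any control volume adjacent to this one can absorb it
        neighbours = {part[v] for u in members for v in adj[u]} - {cv}
        if not neighbours:
            continue
        target = min(neighbours)        # deterministic choice for the sketch
        for u in members:
            part[u] = target
        sizes[target] += sz
        sizes[cv] = 0
    return part
```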
How to Parallelize our Algorithm…
Highly parallel formulations for multilevel graph partitioning already exist (e.g., ParMETIS).
Question: Can we use the basic structure of these algorithms to parallelize our coarse grid construction algorithm?

Answer: NO! The number of control volumes is too large.
…Parallel Formulation…
k-way partitioning algorithms are designed for problems in which k << n (n is the size of the problem)

These algorithms are highly unscalable when k is close to n

They cannot ensure the Lmin and Lmax size constraints
We need to find a new parallel formulation
Our Parallel Formulation…
Call a graph partitioning algorithm to find a good p-way partitioning of the grid.
The graph is then moved such that each partition becomes local to a single processor.
Each processor finds a good coarse grid for its locally stored elements.
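The three steps can be sketched as a pipeline (a schematic only; threads stand in for the processors, and `partition_fn` / `coarsen_fn` are hypothetical placeholders for the partitioner and the serial coarse-grid algorithm):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_coarsen(elements, p, partition_fn, coarsen_fn):
    """Sketch of the three-step scheme: (1) p-way partition the grid,
    (2) make each part local to one worker, (3) coarsen each part
    independently -- an embarrassingly parallel step."""
    parts = partition_fn(elements, p)                          # step 1
    local = [[e for e, q in zip(elements, parts) if q == i]    # step 2
             for i in range(p)]
    with ThreadPoolExecutor(max_workers=p) as pool:            # step 3
        coarse = list(pool.map(coarsen_fn, local))
    return coarse
```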
…Parallel Implementation…
Features:

Leads to good control volumes for most of the internal regions of the grid.

Embarrassingly parallel procedure; very fast.

However, the control volumes of the regions along and near the partition interfaces are not the best.
… Parallel Implementation…
We readjust the partition boundaries so that the former interface nodes become well interior to a partition.

We employ the adaptive repartitioning algorithm from ParMetis for this step.
…Parallel Implementation
(Figure: overview of the parallel formulation method.)
Experimental Results…
Evaluate the different objective functions
Evaluate the scalability of the parallel formulation
Data Sets

Name | #Elements | Description
M6 | 94,493 | M6 Wing
F22 | 428,748 | F22 Wing
F16 | 1,124,648 | F16 Wing
…Experimental Results…
We tested the quality of the coarse grids in the simulation of an unsteady flow of moving grids arising in aero-elasticity problems, using an edge based multigrid solver (from Daimler Chrysler Aerospace Military Aircraft, Germany)
Data Set | Residual Norm | #Coarse Levels | Lmin | Lmax
M6 | 10^-5 | 4 | 1 | 6
F22 | 10^-10 | 4 | 1 | 6
F16 | 10^-5 | 4 | 1 | 6
Serial Algorithm Evaluation

We evaluate the performance of our serial algorithm by looking at the number of iterations the multigrid algorithm needs to converge.

(Bar charts: iterations to convergence for Trad1, Trad2, ML_F1, ML_F2, ML_F3, and ML_F3_F2 on M6, F22, and F16; the full numbers are in the backup table "Serial Algorithm Evaluation".)

Trad1: Traditional Agglomeration Technique
Trad2: Agglomeration Technique based on Aspect Ratios
CRAY T3E Performance

We measured the actual time our multilevel algorithm needs to produce a coarse grid.

(Bar charts: run time in seconds versus number of processors on M6, F22, and F16; the full numbers are in the backup table "Parallel Algorithm Timings".)
Linux Cluster Performance

We measured the actual time our multilevel algorithm needs to produce a coarse grid.

(Bar charts: run time in seconds versus number of processors on M6 and F22; the full numbers are in the backup table "Parallel Algorithm Timings".)
Parallel Algorithm Evaluation

We evaluate the performance of our parallel algorithm by looking at the number of iterations the serial multigrid algorithm needs to converge for the set of coarse grids we produced in parallel.

(Bar charts: iterations to convergence for ML_F1, ML_F2, ML_F3, and ML_F3_F2 at p = 2, 4, 8, 16; the full numbers are in the backup table "Parallel Algorithm Evaluation".)
Quality Measures

Quality measures on Cray T3E:

#PEs | M6 F3 | M6 F2 | F22 F3 | F22 F2 | F16 F3 | F16 F2
1 | 24.3 | 1.82e+06 | - | - | - | -
2 | 22.6 | 1.82e+06 | - | - | - | -
4 | 22.5 | 1.82e+06 | 27.1 | 8.29e+06 | - | -
8 | 22.7 | 1.82e+06 | 29.3 | 8.28e+06 | - | -
16 | 22.6 | 1.81e+06 | 23.1 | 8.25e+06 | 22.4 | 2.02e+07
32 | 22.6 | 1.80e+06 | 23.6 | 8.23e+06 | 26.4 | 2.02e+07
64 | 22.6 | 1.80e+06 | 24.0 | 8.21e+06 | 71.1 | 2.01e+07
128 | 22.6 | 1.80e+06 | 23.1 | 8.20e+06 | 22.8 | 2.01e+07
256 | 23.0 | 1.79e+06 | 168 | 8.19e+06 | 28.4 | 2.01e+07
512 | 24.3 | 1.78e+06 | 45.7 | 8.18e+06 | 35.0 | 2.01e+07
Summary of Work
New multilevel algorithms for generating coarse grids: coarse grids with well-shaped elements.

Parallel multilevel algorithms: a highly scalable algorithm that creates coarse grids of the same quality as the serial algorithm.
MGridGen / ParMGridGen http://www.cs.umn.edu/~moulitsa/software.html
[email protected], [email protected]
Thank You !
Serial Algorithm Evaluation
Trad1 : Traditional Agglomeration Technique
Trad2 : Agglomeration Technique based on Aspect Ratios
Technique | M6 #Iterations | F22 #Iterations | F16 #Iterations
Trad1 | 215 | 181 | 399
Trad2 | 160 | 153 | 358
ML_F1 | 146 | 155 | 349
ML_F2 | 149 | 159 | 345
ML_F3 | 156 | 157 | 349
ML_F3_F2 | 148 | 160 | 339
Convergence of serial multigrid algorithm
Parallel Algorithm Evaluation
Technique | M6 #Iterations (p=2/4/8/16) | F22 #Iterations (p=2/4/8/16) | F16 #Iterations (p=2/4/8/16)
F1 | 146 / 147 / 149 / 150 | 158 / 159 / 160 / 157 | 353 / 352 / 350 / 344
F2 | 147 / 146 / 148 / 149 | 158 / 157 / 158 / 156 | 341 / 342 / 344 / 355
F3 | 148 / 148 / 150 / 152 | 157 / 156 / 156 / 156 | 339 / 348 / 345 / 354
F3_F2 | 146 / 146 / 147 / 152 | 159 / 158 / 161 / 158 | 338 / 341 / 349 / 357
Convergence of parallel multigrid algorithm
Parallel Algorithm Timings
#PEs | CRAY T3E: M6 | CRAY T3E: F22 | CRAY T3E: F16 | BEO: M6 | BEO: F22
1 | 80.66 | - | - | 32.74 | 173.63
2 | 103.17 | - | - | 46.14 | 246.66
4 | 50.43 | 256.13 | - | 28.21 | 153.21
8 | 22.30 | 125.21 | - | 14.18 | 74.71
16 | 9.95 | 61.06 | 163.14 | 18.64 | 50.55
32 | 4.38 | 29.71 | 90.08 | - | -
64 | 2.24 | 14.86 | 40.47 | - | -
128 | 1.68 | 7.11 | 19.15 | - | -
256 | 1.06 | 4.52 | 10.72 | - | -
512 | 1.06 | 3.55 | 7.11 | - | -
Run Times (in sec)