A Comparison of Adaptive Algebraic
Multigrid and Lüscher’s Inexact Deflation
Andreas Frommer, Karsten Kahl, Stefan Krieg, Björn Leder and Matthias Rottmann
Bergische Universität Wuppertal
April 11, 2013
Ingredients Methods Numerical Results Conclusion & Outlook
Outline
Ingredients
  - SAP
  - Aggregation-based Interpolation
  - Inverse Iteration Setup
Methods
  - DD-αAMG
  - Inexact Deflation
Numerical Results
  - Comparison without Low Level Optimization
  - Comparison with Low Level Optimization
  - Comparison for gauge fields with more noise
  - Further Numerical Results
Conclusion & Outlook
M. Rottmann et al., AMG vs. Inexact Deflation 1/17
SAP: Schwarz Alternating Procedure [1,2]
Two-color (green/white) decomposition of the lattice $L$ into blocks:

  L2  L3
  L1  L4

- canonical injections $I_{L_i} : L_i \to L$
- block restrictions $D_{L_i} = I_{L_i}^\dagger D I_{L_i}$
- block inverses $B_{L_i} = I_{L_i} D_{L_i}^{-1} I_{L_i}^\dagger$

Algorithm 1 SAP
input: $\psi, \eta, \nu$; output: $\psi$
for $k = 1$ to $\nu$ do
    $r \leftarrow \eta - D\psi$
    for all green $L_i$ do
        $\psi \leftarrow \psi + B_{L_i} r$
    end for
    $r \leftarrow \eta - D\psi$
    for all white $L_i$ do
        $\psi \leftarrow \psi + B_{L_i} r$
    end for
end for

[1] Hermann Schwarz 1870
[2] Martin Lüscher 2003
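A minimal NumPy sketch of Algorithm 1 on a toy problem. Everything beyond the algorithm itself is an assumption made for illustration: a small real tridiagonal matrix stands in for D, a 1D chain of 16 sites is split into four blocks with alternating colors, and the block inverses are applied by exact local solves (a production code would use a few MR iterations instead). The names sap, blocks and colors are hypothetical.

```python
import numpy as np

def sap(D, eta, psi, blocks, colors, nu):
    """nu sweeps of two-color SAP for D psi = eta (dense toy version)."""
    psi = psi.copy()
    for _ in range(nu):
        for color in (0, 1):                  # 0 = "green", 1 = "white"
            r = eta - D @ psi                 # residual refreshed once per color
            for idx, c in zip(blocks, colors):
                if c == color:
                    # psi <- psi + B_{L_i} r, with B_{L_i} applied as an exact block solve
                    psi[idx] += np.linalg.solve(D[np.ix_(idx, idx)], r[idx])
    return psi

# Toy setup (assumption): shifted 1D Laplacian, four blocks of four sites each.
n, block_size = 16, 4
D = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
blocks = [np.arange(s, s + block_size) for s in range(0, n, block_size)]
colors = [i % 2 for i in range(len(blocks))]  # neighboring blocks get different colors

eta = np.ones(n)
psi = sap(D, eta, np.zeros(n), blocks, colors, nu=5)
rel_res = np.linalg.norm(eta - D @ psi) / np.linalg.norm(eta)
print("relative residual after 5 SAP sweeps:", rel_res)
```

Blocks of the same color do not couple in this toy matrix, so all of them can be updated from the same residual, which is what makes the two-color ordering attractive for parallel execution.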
Coarse Grid via Aggregation [Lüscher 2007], [Clark et al. 2010]
Coarse grid correction requires:
- coarse grid interpolation
- coarse grid operator

Aggregates: block of fine grid points → one coarse grid point
[Figure: fine grid partitioned into aggregates A1, A2, A3, A4; P maps from the coarse grid to the fine grid, R = P† maps back]
$P$: interpolation, $R = P^\dagger$: restriction

Note:
- DD-αAMG preserves $\gamma_5$ structure [Clark et al. 2010]
- Inexact deflation does not [Lüscher 2007]
Aggregation-based Interpolation
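Speeding up the respective method: range($P$) should contain the small EVs (eigenvectors belonging to small eigenvalues)
Construction:
- get test vectors $v_1, \dots, v_N$ (representing small EVs)
- decompose the test vectors over the aggregates $A_1, \dots, A_s$

With $V_{A_j}$ denoting the rows of $(v_1, \dots, v_N)$ that belong to aggregate $A_j$:

$(v_1, \dots, v_N) = \begin{pmatrix} V_{A_1} \\ V_{A_2} \\ \vdots \\ V_{A_s} \end{pmatrix} \quad \to \quad P = \begin{pmatrix} V_{A_1} & & & \\ & V_{A_2} & & \\ & & \ddots & \\ & & & V_{A_s} \end{pmatrix}$

Coarse grid operator: $D_c = P^\dagger D P$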
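The construction can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the implementation used for the results: the test vectors are random instead of approximate small EVs, the matrix D is a small real stand-in, and the local blocks are orthonormalized by a QR factorization (an assumption added here so that the columns of P are orthonormal). The function name build_interpolation is hypothetical.

```python
import numpy as np

def build_interpolation(V, aggregates):
    """Block-diagonal P: the test vectors V (n x N) cut over the aggregates."""
    n, N = V.shape
    P = np.zeros((n, N * len(aggregates)), dtype=V.dtype)
    for j, idx in enumerate(aggregates):
        # Orthonormalize the local pieces (assumption); needs len(idx) >= N.
        Q, _ = np.linalg.qr(V[idx, :])
        P[idx, j * N:(j + 1) * N] = Q
    return P

rng = np.random.default_rng(0)
n, N = 16, 2
D = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # toy stand-in for D
V = rng.standard_normal((n, N))                          # stand-in test vectors
aggregates = [np.arange(s, s + 4) for s in range(0, n, 4)]

P = build_interpolation(V, aggregates)
D_c = P.conj().T @ D @ P                                 # coarse grid operator
print(P.shape, D_c.shape)
```

Each aggregate contributes N coarse degrees of freedom, so the coarse operator has dimension N times the number of aggregates, and the test vectors themselves lie in range(P) by construction.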
Setup Procedure: How to Obtain Sufficient Test Vectors
Inverse iteration with the method itself [Lüscher 2007]

1. Start with random test vectors $\{v_1, \dots, v_N\}$.
2. Smooth all $v_i$ $\eta$ times.
3. Build $P$ and $D_c$.
4. Satisfied? If yes, stop.
5. If no, apply the method itself to all systems $D\psi_i = v_i$, set all $v_i = \psi_i / \|\psi_i\|$, and go back to step 3.
DD-αAMG [Frommer, Kahl, Krieg, Leder & Rottmann 2013, arXiv:1303.1377]
- Aggregation-based AMG for QCD works [Clark et al. 2010]
- Schwarz has good “smoothing properties”
[Figure: $\|(I - M^{-1} D) v_i\|_2$ plotted over the eigenvectors $v_i$, for red-black Schwarz]
→ use as smoother in AMG
DD-αAMG Approach
- Coarse grid correction and smoother as complementary components
- Solve $D\psi = \eta$ via FGMRES, right-preconditioned with a two-grid V-cycle
- Two-grid V-cycle correction with post-smoothing only: $w_k = M_{SAP} P D_c^{-1} P^\dagger w_k$ ($w_k$ the $k$-th Arnoldi vector)
- Approximating $D_c^{-1}$ with low accuracy is sufficient
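A small sketch of a two-grid cycle with post-smoothing only, used as a preconditioner. The assumptions are significant: a Gauss-Seidel splitting replaces SAP as the smoother, a preconditioned Richardson iteration replaces FGMRES, the coarse system is solved exactly, the interpolation is piecewise constant, and D is a real symmetric toy matrix. The name two_grid is hypothetical.

```python
import numpy as np

n = 16
D = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # toy stand-in for D
P = np.kron(np.eye(4), np.ones((4, 1))) / 2.0            # piecewise-constant P, orthonormal columns
D_c = P.T @ D @ P
M = np.tril(D)                                           # Gauss-Seidel splitting (stand-in for SAP)

def two_grid(w):
    """Coarse grid correction, then one post-smoothing step (no pre-smoothing)."""
    x = P @ np.linalg.solve(D_c, P.T @ w)                # coarse grid correction
    return x + np.linalg.solve(M, w - D @ x)             # smooth the remaining residual

# Preconditioned Richardson iteration: a simpler stand-in for FGMRES.
eta = np.ones(n)
psi = np.zeros(n)
for _ in range(30):
    psi = psi + two_grid(eta - D @ psi)
rel_res = np.linalg.norm(eta - D @ psi) / np.linalg.norm(eta)
print("relative residual:", rel_res)
```

The error propagator of one such cycle is $(I - M^{-1}D)(I - P D_c^{-1} P^\dagger D)$: the coarse grid correction removes error components in range(P), and the smoother is left with the rest, which is the sense in which the two components are complementary.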
The Inexact Deflation Method [Lüscher 2007]
Let $\pi_L = 1 - D P D_c^{-1} P^\dagger$ and $\pi_R = 1 - P D_c^{-1} P^\dagger D$.
Using these projections and $D \pi_R = \pi_L D$, we decompose $D\psi = \eta$ into

$D \pi_R \psi = \pi_L \eta$ and (1)
$D (1 - \pi_R) \psi = (1 - \pi_L) \eta$. (2)

(2) is simplified to $(1 - \pi_R)\psi = P D_c^{-1} P^\dagger \eta$.
(1) is solved via GCR preconditioned with SAP.

Note:
The occurrence of $D_c^{-1}$ in (1) and (2) means that $\|\eta - D\psi\| < tol$ requires $\|\eta_c - D_c\psi_c\| < tol$.
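The projector identities and the decomposition can be checked numerically on a small example. The sketch below assumes a random, well-conditioned, non-Hermitian real matrix as a stand-in for D and a random orthonormal P; equation (1) is solved with a dense least-squares solve instead of SAP-preconditioned GCR, and the coarse inverse is formed exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_c = 12, 4
I = np.eye(n)
D = rng.standard_normal((n, n)) + n * I                 # toy non-Hermitian D (assumption)
P = np.linalg.qr(rng.standard_normal((n, n_c)))[0]      # toy interpolation (assumption)
D_c = P.T @ D @ P

coarse = P @ np.linalg.inv(D_c) @ P.T                   # P D_c^{-1} P^dagger
pi_L = I - D @ coarse
pi_R = I - coarse @ D

eta = rng.standard_normal(n)
# Equation (1): D pi_R psi = pi_L eta. The operator pi_L D is singular but the
# system is consistent, so a least-squares solve returns one solution chi.
chi = np.linalg.lstsq(pi_L @ D, pi_L @ eta, rcond=None)[0]
# Equation (2): (1 - pi_R) psi = P D_c^{-1} P^dagger eta.
psi = pi_R @ chi + coarse @ eta

print("||eta - D psi|| =", np.linalg.norm(eta - D @ psi))
```

Both pieces of ψ contain $D_c^{-1}$ applied exactly here. If the coarse solve were only approximate, the identity $D\pi_R = \pi_L D$ would hold only approximately, which is why the coarse system has to be solved to the same tolerance as the fine system.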
Implementation Details and Parameter Tuning
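|                  | Inexact deflation      | DD-αAMG        |
|------------------|------------------------|----------------|
| SAP block solver | MINRES+odd-even        | MINRES         |
| coarse solver    | GCR+odd-even+deflation | GMRES+odd-even |
| SSE              | yes                    | no             |

Table of parameters ((∗): same in solver and setup)

| group         | parameter                                                                        | symbol      | DD-αAMG           | Inexact deflation |
|---------------|----------------------------------------------------------------------------------|-------------|-------------------|-------------------|
| setup         | number of iterations                                                             | $n_{inv}$   | 6                 | 6                 |
| setup         | number of test vectors                                                           | $N$         | 20                | 20                |
| setup         | size of lattice-blocks for aggregates                                            | size($A_i$) | $4^4$             | $4^4$             |
| coarse system | relative residual tolerance (stopping criterion for the coarse system)           | $c_{tol}$   | $5 \cdot 10^{-2}$ | $10^{-12}$        |
| solver        | restart length of FGMRES/GCR                                                     | $n_{kv}$    | 25                | 25                |
| solver        | relative residual tolerance (stopping criterion)                                 | $tol$       | $10^{-10}$        | $10^{-10}$        |
| SAP           | number of SAP iterations (∗)                                                     | $\nu$       | 5                 | 5                 |
| SAP           | size of lattice-blocks in SAP (∗)                                                | size($L_i$) | $2^4$             | $4^4$             |
| SAP           | number of Minimal Residual (MR) iterations to solve the local systems in SAP (∗) | $n_{mr}$    | 3                 | 4                 |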
Comparison without Low Level Optimization
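$48^4$ lattice [BMW-c], $m_\pi = 136$ MeV, 2,592 cores

| setup steps $n_{inv}$ | Inexact deflation: setup timing | Inexact deflation: iteration count (coarse) | Inexact deflation: solver timing | DD-αAMG: setup timing | DD-αAMG: iteration count (coarse) | DD-αAMG: solver timing |
|----|-------|-----------|-------|-------|-----------|-------|
| 1  | 2.03s | 233 (82)  | 18.3s | 1.99s | 350 (12)  | 19.7s |
| 2  | 3.19s | 155 (145) | 14.7s | 3.17s | 120 (29)  | 8.05s |
| 3  | 4.45s | 108 (224) | 12.1s | 4.58s | 52 (54)   | 4.43s |
| 4  | 5.88s | 84 (301)  | 10.5s | 6.95s | 32 (74)   | 3.28s |
| 5  | 7.71s | 70 (320)  | 8.86s | 10.3s | 25 (81)   | 2.60s |
| 6  | 9.22s | 63 (282)  | 7.30s | 14.2s | 23 (86)   | 2.50s |
| 7  | 10.6s | 58 (277)  | 6.53s | 18.3s | 22 (100)  | 2.62s |
| 8  | 12.3s | 54 (267)  | 6.07s | 22.7s | 22 (102)  | 2.61s |
| 9  | 14.1s | 51 (263)  | 5.53s | 24.9s | 21 (116)  | 2.70s |
| 10 | 15.9s | 49 (265)  | 5.44s | 27.6s | 21 (127)  | 2.83s |
| 11 | 17.5s | 50 (266)  | 5.48s | 30.6s | 22 (129)  | 3.06s |
| 12 | 19.5s | 53 (254)  | 5.72s | 33.8s | 23 (131)  | 3.27s |

Methods tuned equally, except size($L_i$) = $4^4$, $n_{mr} = 4$ and $c_{tol} = 10^{-12}$ for inexact deflation.
Both codes compiled only with icc -O3.
Parameters: $n_{inv}$ varied (see table), $N = 20$, size($A_i$) = $4^4$, $c_{tol} = 5 \cdot 10^{-2}$, $n_{kv} = 25$, $tol = 10^{-10}$, $\nu = 5$, size($L_i$) = $2^4$, $n_{mr} = 3$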
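The table trades setup cost against solver cost. A short arithmetic sketch of that trade-off, using only the timings above: the total cost is taken as setup timing plus the number of right-hand sides times the solver timing, where the cost model and the parameter n_rhs are assumptions added for illustration, as are all names in the code.

```python
# Timings in seconds, copied from the table (icc -O3 builds), index 0 is n_inv = 1.
setup_id = [2.03, 3.19, 4.45, 5.88, 7.71, 9.22, 10.6, 12.3, 14.1, 15.9, 17.5, 19.5]
solve_id = [18.3, 14.7, 12.1, 10.5, 8.86, 7.30, 6.53, 6.07, 5.53, 5.44, 5.48, 5.72]
setup_amg = [1.99, 3.17, 4.58, 6.95, 10.3, 14.2, 18.3, 22.7, 24.9, 27.6, 30.6, 33.8]
solve_amg = [19.7, 8.05, 4.43, 3.28, 2.60, 2.50, 2.62, 2.61, 2.70, 2.83, 3.06, 3.27]

def best_n_inv(setup, solve, n_rhs=1):
    """Return (n_inv, total time) minimizing setup + n_rhs * solve."""
    totals = [s + n_rhs * t for s, t in zip(setup, solve)]
    k = min(range(len(totals)), key=totals.__getitem__)
    return k + 1, totals[k]

for n_rhs in (1, 10):
    print(n_rhs, "rhs:",
          "inexact deflation", best_n_inv(setup_id, solve_id, n_rhs),
          "DD-alphaAMG", best_n_inv(setup_amg, solve_amg, n_rhs))
```

With more right-hand sides the setup cost is amortized, so a larger number of setup steps pays off.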
Comparison with Low Level Optimization
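$48^4$ lattice [BMW-c], $m_\pi = 136$ MeV, 2,592 cores

| setup steps $n_{inv}$ | Inexact deflation: setup timing | Inexact deflation: iteration count (coarse) | Inexact deflation: solver timing | DD-αAMG: setup timing | DD-αAMG: iteration count (coarse) | DD-αAMG: solver timing |
|----|-------|-----------|-------|-------|-----------|-------|
| 1  | 1.01s | 233 (82)  | 10.1s | 1.78s | 350 (12)  | 19.1s |
| 2  | 1.87s | 155 (145) | 10.2s | 2.76s | 122 (29)  | 7.66s |
| 3  | 2.69s | 108 (224) | 9.96s | 4.33s | 51 (55)   | 4.45s |
| 4  | 3.43s | 84 (301)  | 9.25s | 6.47s | 31 (73)   | 2.69s |
| 5  | 6.14s | 70 (320)  | 7.50s | 9.02s | 25 (80)   | 2.54s |
| 6  | 5.68s | 63 (282)  | 5.21s | 13.5s | 23 (86)   | 2.49s |
| 7  | 6.93s | 58 (277)  | 4.67s | 16.6s | 22 (96)   | 2.23s |
| 8  | 7.71s | 54 (267)  | 4.12s | 20.5s | 22 (108)  | 2.35s |
| 9  | 8.74s | 51 (263)  | 3.89s | 21.7s | 21 (118)  | 2.62s |
| 10 | 10.1s | 49 (265)  | 3.62s | 25.2s | 21 (126)  | 2.77s |
| 11 | 11.3s | 50 (266)  | 3.77s | 28.7s | 22 (129)  | 3.08s |
| 12 | 12.6s | 53 (254)  | 4.13s | 32.5s | 22 (132)  | 2.69s |

Methods tuned equally, except size($L_i$) = $4^4$, $n_{mr} = 4$ and $c_{tol} = 10^{-12}$ for inexact deflation.
Both codes compiled with their respective best compiler settings.
Parameters: $n_{inv}$ varied (see table), $N = 20$, size($A_i$) = $4^4$, $c_{tol} = 5 \cdot 10^{-2}$, $n_{kv} = 25$, $tol = 10^{-10}$, $\nu = 5$, size($L_i$) = $2^4$, $n_{mr} = 3$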
Comparison for a Non-Smeared Configuration
- BMW-c gauge fields are smeared and tree level clover improved
- for completeness we show results for a non-smeared clover improved CLS configuration

$128 \times 64^3$ lattice [CLS], $m_\pi = 270$ MeV, 8,192 cores

|            | Inexact deflation | DD-αAMG | speed up factor |
|------------|-------------------|---------|-----------------|
| setup iter | 6                 | 4       |                 |
| setup time | 22.3s             | 14.8s   | 1.51            |
| solve iter | 37                | 40      |                 |
| solve time | 16.6s             | 10.1s   | 1.64            |
| total time | 38.9s             | 24.9s   | 1.56            |

Methods tuned equally, except size($L_i$) = $4^4$, $n_{mr} = 4$ and $c_{tol} = 10^{-12}$ for inexact deflation. Both codes compiled only with icc -O3.
Parameters: $n_{inv} = 4$, $N = 20$, size($A_i$) = $4^4$, $c_{tol} = 5 \cdot 10^{-2}$, $n_{kv} = 25$, $tol = 10^{-10}$, $\nu = 5$, size($L_i$) = $2^4$, $n_{mr} = 3$
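The speed-up factors follow directly from the timings: each factor is the inexact deflation time divided by the DD-αAMG time, and the total time is setup time plus solve time. A small check of that arithmetic (variable names are illustrative):

```python
# (inexact deflation, DD-alphaAMG) timings in seconds, from the CLS table.
timings = {"setup time": (22.3, 14.8), "solve time": (16.6, 10.1)}

# Total time = setup time + solve time, per method.
total = tuple(round(sum(t[k] for t in timings.values()), 1) for k in (0, 1))
timings["total time"] = total

# Speed-up factor = inexact deflation time / DD-alphaAMG time.
speed_up = {name: round(t_id / t_amg, 2) for name, (t_id, t_amg) in timings.items()}
print(total, speed_up)
```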
Further Numerical Results
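Mass scaling behavior, $48^4$ lattice [BMW-c], 2,592 cores

| $m_0$    | CGNR: iteration count | CGNR: solver timing | DD-αAMG: iteration count | DD-αAMG: solver timing | coarse system: avg. iteration count | coarse system: timing (% solve time) |
|----------|--------|-------|----|------|----|-------------|
| −0.03933 | 1,597  | 14.1  | 15 | 0.83 | 10 | 0.13 (15.5) |
| −0.04933 | 1,937  | 17.2  | 16 | 0.89 | 11 | 0.15 (16.6) |
| −0.05933 | 2,454  | 21.8  | 17 | 0.95 | 13 | 0.18 (18.9) |
| −0.06933 | 3,320  | 29.4  | 18 | 1.04 | 16 | 0.22 (21.1) |
| −0.07933 | 5,102  | 45.3  | 18 | 1.13 | 20 | 0.29 (25.1) |
| −0.08933 | 10,294 | 91.5  | 20 | 1.44 | 31 | 0.50 (34.8) |
| −0.09033 | 11,305 | 100.3 | 20 | 1.47 | 33 | 0.53 (35.8) |
| −0.09133 | 12,527 | 111.2 | 20 | 1.43 | 36 | 0.53 (37.1) |
| −0.09233 | 14,009 | 124.4 | 20 | 1.48 | 38 | 0.57 (38.4) |
| −0.09333 | 15,869 | 141.3 | 21 | 1.68 | 41 | 0.67 (39.7) |
| −0.09433 | 18,608 | 165.5 | 21 | 1.68 | 45 | 0.71 (42.2) |
| −0.09533 | 22,580 | 201.2 | 21 | 1.70 | 49 | 0.75 (43.9) |
| −0.09633 | 27,434 | 244.4 | 21 | 1.79 | 54 | 0.82 (45.7) |
| −0.09733 | 33,276 | 296.5 | 22 | 2.15 | 63 | 1.08 (50.4) |
| −0.09833 | 42,067 | 373.7 | 22 | 2.30 | 74 | 1.24 (53.9) |
| −0.09933 | 53,932 | 480.4 | 23 | 2.60 | 86 | 1.49 (57.4) |

Parameters: $n_{inv} = 6$, $N = 20$, size($A_i$) = $4^4$, $c_{tol} = 5 \cdot 10^{-2}$, $n_{kv} = 25$, $tol = 10^{-10}$, $\nu = 5$, size($L_i$) = $2^4$, $n_{mr} = 3$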
Further Numerical Results
Mass scaling study for the error $e$ (where $e = \psi_{true} - \psi_{current}$)
[Figure: norm of the error at $tol = 10^{-10}$ (log scale, $10^{-14}$ to $10^{-4}$) versus $m_0$ (−0.04 to −0.1), for CGNR and DD-αAMG]
$48^4$ lattice [BMW-c], 2,592 cores
Further Numerical Results
Weak scaling study
[Figure: time (sec), 0 to 20, versus number of processes, $10^1$ to $10^4$; curves: DD-αAMG-setup(3,5) and DD-αAMG(100,5)]
Lattice sizes ranging from $16^2 \times 8^2$ to $64^4$, fixed local lattice size of $4 \times 8^3$ per process
Further Numerical Results
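Three different lattice sizes [BMW-c], $m_\pi = 250$ MeV, local lattice $8 \times 4^3$

| lattice size $N_t \times N_s^3$ | CGNR: iteration count | CGNR: solver timing | DD-αAMG: setup timing | DD-αAMG: iteration count | DD-αAMG: solver timing |
|------------------|--------|--------|-------|----|-------|
| $48 \times 16^3$ | 7,055  | 55.9s  | 4.14s | 22 | 1.32s |
| $48 \times 24^3$ | 11,664 | 96.2s  | 4.22s | 26 | 1.65s |
| $48 \times 32^3$ | 15,872 | 131.9s | 4.33s | 30 | 1.99s |

$48^4$ lattice [BMW-c], $m_\pi = 136$ MeV, 6 stochastically independent gauge fields

DD-αAMG iteration counts

| $n_{inv}$ | conf 1 | conf 2 | conf 3 | conf 4 | conf 5 | conf 6 |
|-----------|--------|--------|--------|--------|--------|--------|
| 3         | 54     | 58     | 56     | 55     | 54     | 65     |
| 6         | 23     | 24     | 24     | 24     | 23     | 23     |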
Conclusion & Outlook
Conclusion:
⊕ For the cases considered, DD-αAMG outperforms inexact deflation
  - for smeared and non-smeared configurations
  - in solver timing
  - in setup timing
  - in solver iteration count
  - in setup iteration count
⊕ DD-αAMG shows potential for additional speed-up
  - more levels
  - low level optimization
Outlook:
- implement additional levels
- optimized versions in simulation codes of our collaborators (within SFB TRR 55)
Acknowledgments:
- All results computed on Juropa at Jülich Supercomputing Centre (JSC)
- BMW-c: configurations [arXiv:1011.2403, 1011.2711]
- Work funded by Deutsche Forschungsgemeinschaft (DFG), Transregional Collaborative Research Centre 55 (SFB TRR 55)