
Peta-Scale Multigrid for the Stokes System with Finite Elements

9th International Conference on Large-Scale Scientific Computations

Björn Gmeiner, Ulrich Rüde

June, 2013

Department for Computer Science 10 (System Simulation)


Contents

Hierarchical Hybrid Grids

Comparison on Different HPC-Clusters

Stokes Solver within HHG


Hierarchical Hybrid Grids


HHG - Combining finite element and multigrid methods

The FE mesh may be unstructured. Which nodes should be removed for coarsening? Not straightforward! Why not start from the coarse grid instead?
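A small counting sketch (my own illustration, not from the talk) makes the idea concrete: refining each coarse macro-tetrahedron uniformly splits it into eight children per step, so every refinement step simultaneously defines one multigrid level, and the grid hierarchy is available by construction.

# Counting sketch (illustration only): uniform refinement of one tetrahedral
# macro-element.  Each step splits every tetrahedron into eight children, so
# the multigrid hierarchy falls out of the refinement process for free.

def hierarchy(levels):
    """(level, elements, vertices) for a single regularly refined macro-tet."""
    rows = []
    for L in range(levels + 1):
        n = 2**L                                      # edge subdivisions on level L
        elements = 8**L                               # eight children per refinement
        vertices = (n + 1) * (n + 2) * (n + 3) // 6   # structured tetrahedral lattice
        rows.append((L, elements, vertices))
    return rows

if __name__ == "__main__":
    for L, elems, verts in hierarchy(6):
        print(f"level {L}: {elems:>7} elements, {verts:>8} vertices")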


Properties of the HHG approach

Advantages

• Multigrid is straightforward

• Very memory efficient

• Stencil-based for fast execution

• Facilitates parallelization

⇒ 3 · 10^12 unknowns are possible

Limitations

• A coarse input grid is needed

• Limited adaptivity


Performance Comparison on Different HPC-Clusters


Benchmark Problem and Discretization

Scalar elliptic equation:

\[ \int_\Omega \varepsilon(x)\, \nabla u(x) \cdot \nabla v(x)\, dx = \int_\Omega f(x)\, v(x)\, dx \qquad (1) \]

Finite element settings (a small stencil sketch follows the list):

• linear (tetrahedral) finite elements

• constant stencil for constant coefficients ε(x)

• on-the-fly assembly for variable coefficients inside the macro-elements; the stencils on the interfaces are stored
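The "constant stencil" case can be illustrated with a minimal matrix-free sketch (my own illustration, not HHG code); for brevity it applies a 7-point Laplacian-type stencil on a cubic block, whereas the stencil for linear tetrahedral elements in HHG couples 15 nodes, but the access pattern is the same.

import numpy as np

# Matrix-free application of one fixed stencil to every interior node of a
# structured block, as happens inside a regularly refined macro-element when
# the coefficient eps is constant.  (Simplified 7-point stencil, not HHG code.)

def apply_constant_stencil(u, eps=1.0, h=1.0):
    """Apply eps * (-Laplacian) to the interior nodes of a structured block."""
    v = np.zeros_like(u)
    c = eps / h**2
    v[1:-1, 1:-1, 1:-1] = c * (
        6.0 * u[1:-1, 1:-1, 1:-1]
        - u[:-2, 1:-1, 1:-1] - u[2:, 1:-1, 1:-1]
        - u[1:-1, :-2, 1:-1] - u[1:-1, 2:, 1:-1]
        - u[1:-1, 1:-1, :-2] - u[1:-1, 1:-1, 2:]
    )
    return v

if __name__ == "__main__":
    u = np.random.rand(33, 33, 33)          # interior of one refined macro-element
    print(apply_constant_stencil(u, eps=2.0, h=1.0 / 32).shape)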


Multigrid Setup

Multigrid sub-algorithms and parameters (a schematic V-cycle sketch follows the list):

• three Gauss-Seidel sweeps for pre- and post-smoothing

• linear interpolation

• six multigrid levels

• parallel conjugate gradient algorithm on the coarsest grid

• direct coarse-grid approximation with coefficient averaging
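These components map onto a standard V-cycle. The following schematic sketch (a toy 1D Poisson problem of my own, not HHG code) combines three Gauss-Seidel pre- and post-smoothing sweeps, linear interpolation with full-weighting restriction (the restriction operator is an assumption of this sketch), rediscretization on the coarse grids, and a conjugate gradient solve on the coarsest level.

import numpy as np

# Schematic V-cycle on -u'' = f with homogeneous Dirichlet boundaries.
# Illustration of the listed components only, not HHG code.

def gauss_seidel(u, f, h, sweeps=3):
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def restrict(r):
    """Full-weighting restriction to the next coarser grid."""
    interior = 0.25 * r[1:-3:2] + 0.5 * r[2:-2:2] + 0.25 * r[3:-1:2]
    return np.concatenate(([0.0], interior, [0.0]))

def prolong(e):
    """Linear interpolation to the next finer grid."""
    fine = np.zeros(2 * (len(e) - 1) + 1)
    fine[::2] = e
    fine[1::2] = 0.5 * (e[:-1] + e[1:])
    return fine

def cg(A, b, tol=1e-12, maxiter=1000):
    """Plain conjugate gradients for the small coarsest-grid system."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    if rs == 0.0:
        return x
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def v_cycle(u, f, h, level):
    if level == 0:                                  # coarsest grid: CG solve
        n = len(u) - 2
        A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
        u[1:-1] = cg(A, f[1:-1])
        return u
    u = gauss_seidel(u, f, h, sweeps=3)             # pre-smoothing
    rc = restrict(residual(u, f, h))                # restrict residual
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h, level - 1)
    u += prolong(ec)                                # coarse-grid correction
    return gauss_seidel(u, f, h, sweeps=3)          # post-smoothing

if __name__ == "__main__":
    n = 2**8 + 1                                    # finest grid
    h = 1.0 / (n - 1)
    x = np.linspace(0.0, 1.0, n)
    f = np.pi**2 * np.sin(np.pi * x)                # exact solution u = sin(pi x)
    u = np.zeros(n)
    for it in range(8):
        u = v_cycle(u, f, h, level=5)               # six grids in total
        print(f"cycle {it + 1}: ||r|| = {np.linalg.norm(residual(u, f, h)):.2e}")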


Benchmark-Machine Overview

                          JUGENE             JUQUEEN            SuperMUC
System                    IBM BlueGene/P     IBM BlueGene/Q     IBM System x iDataPlex
Processor                 IBM PowerPC 450    IBM PowerPC A2     Intel Xeon E5-2680 8C
Clock Frequency           0.85 GHz           1.6 GHz            2.8 GHz
Number of Nodes           73 728             28 672             9 216
Cores × HW Threads/Node   4                  16 × 4             16 × 2
Memory per HW Thread      0.5 GB             0.25 GB            1 GB
Network Topology          3D Torus           5D Torus           Tree
GFlops per Watt           0.44               2.54               0.94

Table: System overview of the supercomputers JUGENE, JUQUEEN, and SuperMUC.


Performance comparison at ≈ 0.8 PFlop peak

                                              JUGENE     JUQUEEN    SuperMUC
Single node
  Peak FLOP/s (const. coeff. ε)                   6%          7%         12%
  Peak FLOP/s (var. coeff. ε)                     9%         10%         13%
  Peak bandwidth eq. (const. coeff. ε)           11%         53%         60%
Parallel efficiencies (at ≈ 0.8 PFlop peak)
  Scaling (const. coeff. ε)                      65%         64%         72%
  Scaling (var. coeff. ε)                        94%         93%         96%
  Number of processes                        262 144     262 144      32 768
Energy efficiency
  Improvement over JUGENE (const. coeff. ε)        1         6.6         4.7
  Improvement over JUGENE (var. coeff. ε)          1         6.4         3.2

Table: Single-node performance, parallel performance, and power consumption on the different clusters.


Parallel Efficiencies of HHG on Different Clusters

Figure: Parallel efficiency (0.5 to 1.0) versus problem size (10^7 to 10^13) for JUGENE, JUQUEEN, and SuperMUC; annotations mark 1 node card, 1 midplane, 1 island, and hybrid-parallel runs with 131 072, 262 144, and 393 216 cores.


Stokes Solver within HHG


Boussinesq Model for Mantle Convection Problems

\[ -\nabla \cdot (2\mu\, \varepsilon(u)) + \nabla p = \mathrm{Ra}\, T\, e_r, \]

\[ \nabla \cdot u = 0, \]

\[ \frac{\partial T}{\partial t} + u \cdot \nabla T - \nabla \cdot (\nabla T) = \gamma. \]

The model helps in understanding geophysical phenomena such as

• Continental drift

• Plate tectonics

• Earthquakes
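Since the momentum balance above contains no time derivative, the coupled system can be advanced by a simple operator splitting; the following is a sketch of one such scheme (an explicit Euler step of my own choosing, not necessarily the time integrator used in the talk):

\[ -\nabla \cdot (2\mu\, \varepsilon(u^n)) + \nabla p^n = \mathrm{Ra}\, T^n e_r, \qquad \nabla \cdot u^n = 0, \]

\[ T^{n+1} = T^n + \Delta t\, \bigl( \gamma + \nabla \cdot (\nabla T^n) - u^n \cdot \nabla T^n \bigr), \]

i.e. at every time step a Stokes problem is solved for (u^n, p^n) at the current temperature, and the temperature is then transported and diffused with the resulting velocity.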


Stokes System

\[ \nabla \cdot (2\mu \nabla u) - \nabla p = \mathrm{Ra}\, T\, e_r, \qquad (2) \]

\[ \nabla \cdot (\rho u) = 0. \qquad (3) \]

Decoupling with a pressure correction approach using the Schur complement leads to

\[ \begin{pmatrix} A & B^T \\ 0 & -B A^{-1} B^T \end{pmatrix} \begin{pmatrix} U \\ P \end{pmatrix} = \begin{pmatrix} F \\ 0 \end{pmatrix} \]

with the matrices

\[ A_{ij} = (\nabla \phi_i^u,\, 2\mu \nabla \phi_j^u), \qquad (4) \]

\[ B_{ij} = -(\phi_i^q,\, \nabla \cdot \phi_j^u), \qquad (5) \]

\[ F_i = (\phi_i^u,\, \mathrm{Ra}\, T\, e_r). \qquad (6) \]
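A minimal sketch of a pressure-correction iteration of this Schur-complement type is given below, as an Uzawa-type loop on a small random saddle-point toy problem (the matrices and sizes are invented for illustration; this is not HHG code, and it is not claimed to be the exact scheme used in the talk). In HHG the inner velocity solves A u = F − Bᵀ p would be carried out by the multigrid solver described earlier.

import numpy as np

# Uzawa-type pressure correction for the saddle-point system
#   [ A  B^T ] [U]   [F]
#   [ B   0  ] [P] = [0]
# on a small random toy problem (illustration only).

rng = np.random.default_rng(0)
n, m = 40, 15                              # toy velocity / pressure dimensions
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                # SPD velocity block
B = rng.standard_normal((m, n))            # discrete divergence (full row rank)
F = rng.standard_normal(n)

S = B @ np.linalg.solve(A, B.T)            # Schur complement B A^{-1} B^T (toy only)
lam = np.linalg.eigvalsh(S)
omega = 2.0 / (lam[0] + lam[-1])           # optimal relaxation for this toy

u, p = np.zeros(n), np.zeros(m)
for k in range(1, 1001):
    u = np.linalg.solve(A, F - B.T @ p)    # velocity solve (multigrid in HHG)
    div = B @ u                            # divergence residual
    p += omega * div                       # pressure correction
    if k % 200 == 0:
        print(f"iter {k:4d}: ||div u|| = {np.linalg.norm(div):.2e}")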


Spherical refinement of an icosahedron (0)


Discretization with prismatic elements


Spherical refinement of an icosahedron (0)


Spherical refinement of an icosahedron (1)


Spherical refinement of an icosahedron (2)


Spherical refinement of an icosahedron (3)


Spherical refinement of an icosahedron (4)


Icosahedral Finite Element Mesh
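The refinement sequence on the preceding slides can be reproduced with a short sketch (a surface-only illustration, not the HHG mesh generator): each triangle of the icosahedron is split into four by its edge midpoints and the new vertices are projected radially onto the sphere; radial layers of such surface meshes then give the prismatic elements shown earlier.

import numpy as np
from scipy.spatial import ConvexHull

# Spherical refinement of an icosahedron (illustration only): every step
# splits each triangle into four and reprojects new vertices to the sphere.

def icosahedron():
    """Unit icosahedron: 12 vertices, 20 triangular faces."""
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    v = np.array([(-1, phi, 0), (1, phi, 0), (-1, -phi, 0), (1, -phi, 0),
                  (0, -1, phi), (0, 1, phi), (0, -1, -phi), (0, 1, -phi),
                  (phi, 0, -1), (phi, 0, 1), (-phi, 0, -1), (-phi, 0, 1)], float)
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # place vertices on the sphere
    return v, ConvexHull(v).simplices               # hull facets = the 20 faces

def refine(verts, faces):
    """Split each triangle into four and reproject new vertices to the sphere."""
    verts = [np.asarray(p, float) for p in verts]
    cache = {}                                      # edge (i, j) -> midpoint index
    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            m = 0.5 * (verts[i] + verts[j])
            verts.append(m / np.linalg.norm(m))     # radial projection
            cache[key] = len(verts) - 1
        return cache[key]
    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return np.array(verts), np.array(new_faces)

if __name__ == "__main__":
    v, f = icosahedron()
    for level in range(5):
        print(f"refinement {level}: {len(v)} vertices, {len(f)} triangles")
        v, f = refine(v, f)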


Run-time breakdown of the parallel Stokes solver on JUQUEEN with 15 360 threads

Figure: Run-time split into computation and communication for the overall solver, the pressure correction, multigrid, multigrid without the finest level, the multigrid coarse levels (CL), and the multigrid CL scalar products.


JUQUEEN: Stokes - Weak Scaling

Nodes     Threads    Grid points    Resolution   Time (A)   Time (B)
1              30    2.1 · 10^7          32 km        30 s       89 s
4             240    1.6 · 10^8          16 km        38 s      114 s
30          1 920    1.3 · 10^9           8 km        40 s      121 s
240        15 360    1.1 · 10^10          4 km        44 s      133 s
1 920     122 880    8.5 · 10^10          2 km        48 s      153 s
15 360    983 040    6.9 · 10^11          1 km        54 s      170 s

Table: Weak scaling of the Stokes equation on the mantle geometry on JUQUEEN. The pressure residual is reduced by at least three orders of magnitude for (A), and eight orders of magnitude for (B).


SuperMUC: Stokes - Strong Scaling

Figure: Run-times of strong scaling experiments on SuperMUC from 1 920 to 30 720 cores. Run-times for case (A): 27 s, 15 s, 8.3 s, 4.7 s, 3.0 s; for case (B): 80 s, 44 s, 25 s, 14 s, 8.9 s at 1 920, 3 840, 7 680, 15 360, and 30 720 cores; the parallel efficiency of case (A) is plotted against the core count on a scale from 0 to 1.


Thermal advection in a thick spherical shell, simulated using our prototype of Terra-Neo


Conclusions and Outlook

• Comparison of a geometric multigrid solver on three different large-scale clusters

• First extensions of HHG to solve the Stokes system

• Utilize HHG as a test-bed for a new Earth mantle convection code (TERRA-NEO)


Number of Threads    Number of Unknowns    Time per V-cycle
64                   1.33 · 10^8                2.34 s
128                  2.67 · 10^8                2.41 s
256                  5.35 · 10^8                2.80 s
512                  1.07 · 10^9                2.82 s
1 024                2.14 · 10^9                2.82 s
2 048                4.29 · 10^9                2.84 s
4 096                8.58 · 10^9                2.96 s
8 192                1.72 · 10^10               3.09 s
16 384               3.43 · 10^10               3.15 s
32 768               6.87 · 10^10               3.28 s
65 536               1.37 · 10^11               3.39 s
131 072              2.75 · 10^11               3.56 s
262 144              5.50 · 10^11               3.68 s
524 288              1.10 · 10^12               3.76 s
1 048 576            2.20 · 10^12               4.07 s
1 572 864            3.29 · 10^12               4.03 s

Table: Weak scaling experiment on JUQUEEN solving a problem on the full machine with up to 3.29 · 10^12 unknowns.
