Scalability analysis of the distributed-memory implementation of the Aggregated unfitted Finite Element Method (AgFEM)

Alberto F. Martín∗, Santiago Badia, Francesc Verdugo, Eric Neiva

MWNDEA 2020, Melbourne, Australia, 12/02/2020

Embedded Finite elements

CutFEM, Finite Cell Method, AgFEM, X-FEM, ...

[Figure: body-fitted mesh versus unfitted mesh]

✓ Simplified mesh generation

✗ Dirichlet BC? ✗ Numerical integration? ✗ Ill-conditioning? (this talk)

Alberto F. Martín (Monash University) MWNDEA2020 2/35


Parallel distributed-memory simulation pipeline

1. Unfitted (adaptive) Cartesian grids (p4est)

2. Partition using space-filling curves (p4est)

3. Unfitted FE discretization (AgFEM)

4. AMG linear solver (PETSc)
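Step 2 of the pipeline can be illustrated with a minimal sketch of a space-filling-curve partition. It assumes a uniform 2D grid and a Morton (Z-order) curve; the function names `morton_index` and `partition_cells` are hypothetical and this is not the p4est API, which works on adaptive octree leaves.

```python
def morton_index(i, j, bits=16):
    """Interleave the bits of the cell coordinates (i, j) to get a Z-order index."""
    code = 0
    for b in range(bits):
        code |= ((i >> b) & 1) << (2 * b)        # bits of i go to even positions
        code |= ((j >> b) & 1) << (2 * b + 1)    # bits of j go to odd positions
    return code


def partition_cells(cells, nparts):
    """Order cells along the Z-curve and cut the curve into nparts contiguous chunks."""
    ordered = sorted(cells, key=lambda c: morton_index(*c))
    n = len(ordered)
    bounds = [(r * n) // nparts for r in range(nparts + 1)]
    return [ordered[bounds[r]:bounds[r + 1]] for r in range(nparts)]


# Usage: a 4 x 4 grid split among 4 processes.
cells = [(i, j) for i in range(4) for j in range(4)]
parts = partition_cells(cells, 4)
for rank, owned in enumerate(parts):
    print(rank, owned)
```

Because cells that are close along the curve tend to be close in space, equal-length chunks of the curve give balanced and reasonably compact subdomains without calling a graph partitioner.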


Unfitted methods at large scales: pros and cons

✓ Highly scalable mesh generation based on octrees (e.g. p4est)

✓ Highly scalable mesh partition with space-filling curves (ParMETIS not needed)

✓ Highly scalable adaptive mesh refinement + load balancing

✗ Not guaranteed that highly scalable linear solvers keep their optimal properties for cut elements.


PETSc CG + AMG preconditioner on unfitted meshes

Poisson equation (weak scaling test with 5 meshes)

[Figure: CG iterations (4 to 16) versus number of DOFs ($10^3$ to $10^8$), in two panels: AMG + AgFEM and AMG + naive unfitted FEM*]

* Nitsche BCs + modified integration in cut cells


Why are linear solvers affected by cut cells?

Condition number estimates (Poisson Eq.)

(a) Body-fitted case

$k_2(A) \sim h^{-2}$

(b) Naive unfitted FEM

$k_2(A) \sim |\eta|^{-(2p+1-2/d)}$

"small cut cell problem"

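The small cut cell effect can be reproduced in one dimension with a few lines of NumPy. This is only a sketch under simplifying assumptions: linear elements (p = 1, d = 1, so the exponent 2p+1-2/d evaluates to 1), a strong Dirichlet condition at the left end, and a natural boundary condition at the cut end instead of Nitsche terms. The helper name `cut_stiffness` is hypothetical.

```python
import numpy as np


def cut_stiffness(n, eta):
    """P1 stiffness matrix of -u'' on n full cells of size h = 1/n plus one cut
    cell of which only the fraction eta lies inside the domain.
    Node 0 carries a strong Dirichlet condition and is eliminated."""
    h = 1.0 / n
    fractions = [1.0] * n + [eta]            # integrated fraction of each cell
    A = np.zeros((n + 2, n + 2))
    for e, frac in enumerate(fractions):     # cell e joins nodes e and e + 1
        A[e:e + 2, e:e + 2] += frac / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return A[1:, 1:]


conds = {eta: np.linalg.cond(cut_stiffness(10, eta)) for eta in (1e-2, 1e-4, 1e-6)}
for eta, c in conds.items():
    print(f"eta = {eta:.0e}   cond(A) = {c:.3e}")
```

With the mesh size fixed, shrinking the cut fraction by a factor of 100 should increase the condition number by roughly the same factor, which is the behavior the estimate predicts for this case.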

Possible remedies

Fix the linear solver

Tailor your parallel solver to deal with $k_2(A) \sim |\eta|^{-(2p+1-2/d)}$

Example: [S. Badia, F. Verdugo. Robust and scalable domain decomposition solvers for unfitted finite element methods. Journal of Computational and Applied Mathematics (2018)].

Fix the linear system (this talk)

Enhance the unfitted FE method so that $k_2(A) \sim h^{-2}$

Use a standard scalable solver

Examples: CutFEM, AgFEM


Agenda

1. The AgFEM method (serial case)

2. Parallel implementation

3. Performance of parallel AgFEM + AMG solvers


AgFEM method for the Poisson Eq.

$-\Delta u = f$ in $\Omega$

$u = u_D$ on $\partial\Omega$


Weak imposition of Dirichlet BCs

Nitsche’s Method

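Find $u_h \in V_h$ such that

\[ a_h(u_h, v_h) = l_h(v_h) \quad \forall v_h \in V_h \]

($v_h$ does not vanish on $\partial\Omega$!)

where

\[
\begin{aligned}
a_h(u_h, v_h) := {} & \sum_{K \in \mathcal{T}_h^{\mathrm{act}}} \int_{K \cap \Omega} \nabla u_h \cdot \nabla v_h
  - \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \int_F (\nabla u_h \cdot n)\, v_h \\
& - \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \int_F u_h\, (\nabla v_h \cdot n)
  + \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \beta h_F^{-1} \int_F u_h\, v_h, \\
l_h(v_h) := {} & \sum_{K \in \mathcal{T}_h^{\mathrm{act}}} \int_{K \cap \Omega} f\, v_h
  + \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \beta h_F^{-1} \int_F u_D\, v_h
  - \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \int_F u_D\, (\nabla v_h \cdot n).
\end{aligned}
\]

The key feature of AgFEM is the definition of the discrete space $V_h$.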
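To see how the Nitsche terms enter an assembled system, here is a minimal 1D sketch on a fitted mesh, where the boundary "faces" reduce to the two end points. It is an illustration only: the function name `nitsche_poisson_1d`, the value of `beta`, and the trapezoid rule for the load are assumptions, and no cut cells are involved.

```python
import numpy as np


def nitsche_poisson_1d(n, f, u_left, u_right, beta=10.0):
    """Solve -u'' = f on (0, 1) with P1 elements and Nitsche boundary terms."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for e in range(n):                            # volume terms, cell by cell
        A[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        b[e:e + 2] += 0.5 * h * np.array([f(x[e]), f(x[e + 1])])  # trapezoid rule
    # Each boundary point: (boundary node, its neighbour, Dirichlet value).
    for bnd, nbr, g in ((0, 1, u_left), (n, n - 1, u_right)):
        # Outward normal derivatives of the two shape functions at that point.
        dn = {bnd: 1.0 / h, nbr: -1.0 / h}
        for j, d in dn.items():
            A[bnd, j] -= d           # -(grad u . n) v
            A[j, bnd] -= d           # -u (grad v . n)
            b[j] -= g * d            # -u_D (grad v . n)
        A[bnd, bnd] += beta / h      # penalty term beta h^-1 u v
        b[bnd] += beta / h * g       # penalty term beta h^-1 u_D v
    return x, np.linalg.solve(A, b)


# Usage: f = 0 with u(0) = 0 and u(1) = 1 has the exact solution u(x) = x.
x, u = nitsche_poisson_1d(8, lambda s: 0.0, 0.0, 1.0)
print(np.max(np.abs(u - x)))
```

Since the formulation is consistent and the exact solution of this test is linear, the discrete solution should match it up to round-off, even though no boundary value is imposed strongly.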

Starting point: "naive" FE space

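$V_h^{\mathrm{std}} := \{ u \in C^0(\Omega_{\mathrm{act}}) : u|_K \in Q_p(K) \ \ \forall K \in \mathcal{T}_h^{\mathrm{act}} \}$

[Figure: the active mesh $\mathcal{T}_h^{\mathrm{act}}$ with the active domain $\Omega_{\mathrm{act}}$, and the space $V_h^{\mathrm{std}}$]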


Aggregated FE space

Basic idea: improve conditioning by removing problematic DOFs

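$V_h^{\mathrm{agg}} := \{ u \in V_h : u_\times = \sum_{\bullet \in \mathrm{masters}(\times)} C_{\times\bullet}\, u_\bullet \ \ \forall \times \in P \}$

Legend: • well-posed DOFs, × problematic DOFs ($P$)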


Definition of constraints via cell aggregates

1. Generate cell aggregates (1 interior cell + several cut cells)

2. Define the DOF-to-root-cell map root(×) via the aggregates

3. Define constraints:

$u_\times = \sum_{\bullet \in \mathrm{dofs}(\mathrm{root}(\times))} \phi_\bullet^{\mathrm{root}(\times)}(x_\times)\, u_\bullet$
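The effect of step 3 can be sketched in 1D, where the constraint has a closed form. Assume linear elements on n full cells plus one cut cell with cut fraction eta; the only problematic DOF is the outer node of the cut cell, and its root cell is the last full cell, whose two shape functions evaluate to -1 and 2 at that node. The helper names `cut_stiffness` and `aggregation_constraints` are hypothetical, and boundary conditions are simplified (strong Dirichlet on the left, natural on the right).

```python
import numpy as np


def cut_stiffness(n, eta):
    """P1 stiffness matrix on n full cells (h = 1/n) plus one cut cell with
    integrated fraction eta; node 0 (strong Dirichlet) is eliminated."""
    h = 1.0 / n
    A = np.zeros((n + 2, n + 2))
    for e, frac in enumerate([1.0] * n + [eta]):
        A[e:e + 2, e:e + 2] += frac / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return A[1:, 1:]


def aggregation_constraints(n):
    """Matrix C mapping the n well-posed DOFs (nodes 1..n) to all n + 1 DOFs.
    Row n expresses the problematic node n + 1 through its root cell."""
    C = np.zeros((n + 1, n))
    C[:n, :n] = np.eye(n)
    C[n, n - 2] = -1.0    # shape function of node n - 1 evaluated at node n + 1
    C[n, n - 1] = 2.0     # shape function of node n evaluated at node n + 1
    return C


n, eta = 10, 1e-6
A = cut_stiffness(n, eta)
C = aggregation_constraints(n)
A_agg = C.T @ A @ C       # system restricted to the aggregated space
cond_std = np.linalg.cond(A)
cond_agg = np.linalg.cond(A_agg)
print(f"standard: {cond_std:.3e}   aggregated: {cond_agg:.3e}")
```

In this toy setting the constrained matrix is the stiffness matrix of functions extended linearly from the root cell into the cut cell, so its condition number should stay bounded as eta goes to zero while that of the unconstrained matrix blows up.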


Results for the unfitted aggregated FEM (Poisson Eq.) [1]

$\kappa(A) \le c_1 h^{-2}$ (Condition number bound)

$\beta \le c_2 h^{-2}$ (Nitsche's penalty coef.)

$\|u - u_h\|_{H^1(\Omega)} \le c_3 h^{p}$ (Optimal convergence order)

$\|u - u_h\|_{L^2(\Omega)} \le c_4 h^{p+1}$ (Optimal convergence order)

and others (inverse/trace inequalities, bound of aggregate size, bound of the extended solution, ...)

[1] Badia, Verdugo, Martín. The aggregated unfitted finite element method for elliptic problems. Comput. Methods Appl. Mech. Eng. (2018).


[Figure: log10(condest(A)), on a scale from 0 to 30, over a horizontal range from 0.3 to 0.7. Curves: p=1 standard, p=2 standard, p=1 aggr., p=2 aggr.]

Convergence test

[Figure: log10(error in energy norm) versus log10(h). Curves: p=1 standard, p=2 standard, p=1 aggr., p=2 aggr., with reference slopes 1 and 2. Panels: (a) 2D, (b) 3D]

Extension to the Stokes problem [2]

[Figure: velocity magnitude $|u|$, color scale from 0.0 to 75.0]

$-\Delta u + \nabla p = f$ in $\Omega$

$\nabla \cdot u = 0$ in $\Omega$

$u = 0$ on $\Gamma_D$

$(\nabla u - pI) \cdot n = g$ on $\Gamma_N$

[2] Badia, Martín, Verdugo. Mixed aggregated finite element methods for the unfitted discretization of the Stokes problem. SIAM J. Sci. Comput., 40(6), 2018.


[Figure: log10(condest(A)), on a scale from 0 to 40, versus ℓ (0.3 to 0.7), in two panels. Curves: Aggregated, Standard]

[Figure: relative errors versus log10(DOF^{1/d}) in three panels: log10 of $\|u_h - u\|_{H^1} / \|u\|_{H^1}$ (reference slope -2), log10 of $\|u_h - u\|_{L^2} / \|u\|_{L^2}$ (reference slope -3), and log10 of $\|p_h - p\|_{L^2} / \|p\|_{L^2}$ (reference slope -2). Curves: Aggregated, Standard]

2. Parallel implementation

Parallel mesh distribution

[Figure: (a) $D = \{D_1, D_2\}$. (b) View from $D_1$. (c) View from $D_2$.]

Main phases to be parallelized:

• Cell Aggregation

• Imposition of constraints


Cell aggregation (serial)

[Figure: successive steps of the serial cell aggregation. Legend: touched, untouched, aggregated]
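The touched/untouched marking in the legend suggests an iterative sweep over the cut cells. A simplified serial sketch, assuming a 2D Cartesian grid, facet (4-neighbor) connectivity, and a "closest root cell" rule for choosing among candidate aggregates; the function name `aggregate_cells` and the tie-breaking are assumptions, not the reference implementation.

```python
from math import dist


def aggregate_cells(status):
    """status maps a cell index (i, j) to 'interior' or 'cut' (exterior cells
    are simply absent). Returns a dict mapping every cell to its root cell."""
    # Interior cells start as touched and are the roots of their own aggregate.
    root = {c: c for c, s in status.items() if s == "interior"}
    untouched = {c for c, s in status.items() if s == "cut"}
    while untouched:
        newly = {}
        for (i, j) in sorted(untouched):
            neighbours = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
            candidates = [root[n] for n in neighbours if n in root]
            if candidates:
                # Join the aggregate whose root cell is closest to this cell.
                newly[(i, j)] = min(candidates, key=lambda r: (dist(r, (i, j)), r))
        if not newly:
            raise ValueError("some cut cells cannot reach an interior cell")
        root.update(newly)                 # mark as touched only after the sweep
        untouched.difference_update(newly)
    return root


# Usage: one interior cell followed by two cut cells in a row.
roots = aggregate_cells({(0, 0): "interior", (1, 0): "cut", (2, 0): "cut"})
print(roots)
```

Cells aggregated in one sweep only become visible to their neighbors in the next sweep, so the aggregates grow outward from the interior cells one layer at a time.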


Aggregates in 3D


Cell aggregation (parallel)

[Figure: (a) Step 1. (b) Step 2. (c) Comm. (d) Step 3.]

✓ Standard nearest-neighbor communication to determine root cells


Parallel imposition of constraints

[Figure: subdomains $D_{s'}$, $D_{s''}$ and $D_s$]

✗ Subdomain-local constraints not even possible in some cases

✓ At the end, only standard neighbor communication required


3. Performance of parallel AgFEM + AMG solvers

Weak scaling test setup

• Poisson eq.

• AgFEM vs "naive" unfitted FEM

• Linear solver: PCG from PETSc

• Preconditioner: smoothed-aggregation AMG from PETSc (GAMG)

• Up to 16K cores and 1000M background cells

Computed at Mare Nostrum 4


Number of PCG iterations (weak scaling)

[Figure: CG iterations versus number of DOFs ($10^3$ to $10^8$) for three geometries: (a) Popcorn, (b) Spiral, (c) Swiss Cheese. Curves: agg (load 1), agg (load 2), agg (load 3), std (load 1), std (load 2), std (load 3)]


Computational time (secs) AgFEM stages (weak scaling)

[Figure: wall clock time [s] versus number of DOFs ($10^5$ to $10^8$) for three geometries: (a) Popcorn, (b) Spiral, (c) Swiss Cheese. Stages: Cell aggregation (Alg. 2); Path reconstruction (Algs. 3 and 4); Import data from root cells (Alg. 5); Setup constraints (Sect. 3.8); Setup of local DOFs ids (Sect. 3.3); Setup of global DOFs ids (Sect. 3.7); FE integration + assembly (Table 2)]


Computational time (secs) AMG solver (weak scaling)

[Figure: wall clock time [s] versus number of DOFs for three geometries: (a) Popcorn, (b) Spiral, (c) Swiss Cheese. Curves: Linear solver setup, Linear solver run]

• Setup degradation (11.94x CPU time for 4,448x larger problem)

• Similar for standard FEM in a box (2.50x for 355.26x)


Conclusions

✓ Embedded FEM enables scalable octree-based meshes

✗ ... but can destroy the scalability of linear solvers

✓ AgFEM recovers the optimal scaling of the linear solver

✓ ... while keeping the optimal discretization order.


For more details, see papers:

F. Verdugo, A.F. Martín, S. Badia. Distributed-memory parallelization of the aggregated unfitted finite element method. CMAME, 357, 2019.

S. Badia, A.F. Martín, F. Verdugo. Mixed aggregated finite element methods for the unfitted discretization of the Stokes problem. SIAM J. Sci. Comput., 40(6), 2018.

S. Badia, F. Verdugo, A.F. Martín. The aggregated unfitted finite element method for elliptic problems. CMAME, 336, 2018.

https://github.com/fempar/fempar
