
Primal and dual-primal iterative substructuring

methods of stochastic PDEs

Waad Subber1 and Abhijit Sarkar2

Department of Civil and Environmental Engineering, Carleton University, Ottawa, Ontario K1S 5B6, Canada

E-mail: 1 wsubber@connect.carleton.ca, 2 abhijit.sarkar@carleton.ca

Abstract. A novel non-overlapping domain decomposition method is proposed to solve the large-scale linear system arising from the finite element discretization of stochastic partial differential equations (SPDEs). The methodology is based on a Schur complement based geometric decomposition and an orthogonal decomposition and projection of the stochastic processes using Polynomial Chaos expansion. The algorithm offers a direct approach to formulate a two-level scalable preconditioner. The proposed preconditioner strictly enforces the continuity condition on the corner nodes of the interface boundary while weakly satisfying the continuity condition over the remaining interface nodes. This approach relates to a primal version of an iterative substructuring method. Next, a Lagrange multiplier based dual-primal domain decomposition method is introduced in the context of SPDEs. In the dual-primal method, the continuity condition on the corner nodes is strictly satisfied while Lagrange multipliers are used to enforce continuity on the remaining part of the interface boundary. For numerical illustrations, a two-dimensional elliptic SPDE with non-Gaussian random coefficients is considered. The numerical results demonstrate the scalability of these algorithms with respect to the mesh size, subdomain size, fixed problem size per subdomain, order of Polynomial Chaos expansion and level of uncertainty in the input parameters. The numerical experiments are performed on a Linux cluster using MPI and PETSc libraries.

1 Introduction

A domain decomposition method of SPDEs is introduced in [1, 2] to quantify uncertainty in large-scale linear systems. The methodology is based on a Schur complement based geometric decomposition and an orthogonal decomposition and projection of the stochastic processes. A parallel preconditioned conjugate gradient method (PCGM) is adopted in [3] to solve the interface problem without explicitly constructing the Schur complement system. The parallel performance of the algorithms is demonstrated using a lumped preconditioner for non-Gaussian systems arising from a hydraulic problem having random soil permeability properties.

A one-level Neumann-Neumann domain decomposition preconditioner for SPDEs is introduced in [4] in order to enhance the performance of the parallel PCGM iterative solver in [3]. The implementation of the algorithm requires a local solve of a stochastic Dirichlet problem followed by a local solve of a stochastic Neumann problem in each iteration of the PCGM solver. The multilevel sparsity structure of the coefficient matrices of the stochastic system, namely (a) the sparsity structure due to the finite element discretization and (b) the block sparsity structure due to the Polynomial Chaos expansion, is exploited for computational efficiency.


The one-level Neumann-Neumann preconditioner in [4] demonstrates good (strong and weak) scalability for the moderate range of CPUs considered.

In this paper, we first describe a primal version of iterative substructuring methods for the solution of the large-scale linear system arising from the stochastic finite element method. The algorithm offers a straightforward approach to formulate a two-level scalable preconditioner. The continuity condition is strictly enforced on the corner nodes (nodes shared among more than two subdomains, including the nodes at the ends of the interface edges). For the remaining part of the interface boundary, the continuity condition is satisfied weakly (in an average sense). Note that the continuity of the solution field across the entire interface boundary is eventually satisfied at the convergence of the iterations. This approach naturally leads to a coarse grid which connects the subdomains globally via the corner nodes. The coarse grid provides a mechanism to propagate information globally, which makes the algorithm scalable with respect to the subdomain size. In the second part of the paper, a dual-primal iterative substructuring method is introduced for SPDEs, which may be viewed as an extension of the Dual-Primal Finite Element Tearing and Interconnecting method (FETI-DP) [5] in the context of SPDEs. In this approach, the continuity condition on the corner nodes is strictly satisfied by partial assembly, while Lagrange multipliers are used to enforce continuity on the remaining part of the interface boundary. The system of Lagrange multipliers (also called the dual variables) is solved iteratively using the PCGM equipped with a Dirichlet preconditioner. The PETSc [6] and MPI [7] libraries are used for efficient parallel implementation of the primal and dual-primal algorithms. The graph partitioning tool METIS [8] is used for optimal decomposition of the finite element mesh for load balancing and minimum interprocessor communication. The parallel performance of the algorithms is studied for a two-dimensional stochastic elliptic PDE with non-Gaussian random coefficients. The numerical experiments are performed using a Linux cluster.

2 Uncertainty representation by stochastic processes

This section provides a brief review of the theories of stochastic processes relevant to the subsequent developments of the paper [9, 1]. We assume the data induces a representation of the model parameters as random variables and processes which span the Hilbert space HG. A set of basis functions {ξi} is identified to characterize this space using the Karhunen-Loeve expansion. The state of the system resides in the Hilbert space HL, with basis functions {Ψi} being identified with the Polynomial Chaos expansion (PC). The Karhunen-Loeve expansion of a stochastic process α(x, θ) is based on the spectral expansion of its covariance function Rαα(x, y). The expansion takes the following form:

$$\alpha(x,\theta) = \bar{\alpha}(x) + \sum_{i=1}^{\infty} \sqrt{\lambda_i}\,\xi_i(\theta)\,\phi_i(x)$$

where \bar{\alpha}(x) is the mean of the stochastic process, θ represents the random dimension, and ξi(θ) is a set of uncorrelated (but not generally independent for non-Gaussian processes) random variables. φi(x) are the eigenfunctions and λi are the eigenvalues of the covariance kernel, which can be obtained as the solution to the following integral equation:

$$\int_D R_{\alpha\alpha}(x,y)\,\phi_i(y)\,dy = \lambda_i\,\phi_i(x)$$

where D denotes the spatial domain over which the process is defined. The covariance function of the solution process is not known a priori, and hence the Karhunen-Loeve expansion cannot be used to represent it. Therefore, a generic basis that is complete in the space of all second-order random variables will be identified and used in the approximation process.


Since the solution process is a function of the material properties, the nodal solution variables, denoted by u(θ), can be formally expressed as some nonlinear functional of the set ξi(θ) used to represent the material stochasticity. It has been shown that this functional dependence can be expanded in terms of polynomials in Gaussian random variables, namely Polynomial Chaos [9], as

$$u(\theta) = \sum_{j=0}^{N} \Psi_j(\theta)\,u_j$$

These polynomials are orthogonal in the sense that their inner product 〈Ψj Ψk〉, defined as the statistical average of their product, is equal to zero for j ≠ k.
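To make the two expansions concrete, the following minimal numpy sketch (not part of the original paper; the exponential covariance kernel, correlation length and 1-D grid are assumptions of the example) builds a truncated Karhunen-Loeve expansion by solving the discretized covariance eigenvalue problem and draws one realization of the process; a Polynomial Chaos expansion would then represent the solution in the resulting Gaussian variables ξi.

import numpy as np

# Minimal sketch: truncated Karhunen-Loeve expansion of a 1-D process with an
# assumed exponential covariance R(x, y) = sigma^2 exp(-|x - y| / b).
np.random.seed(0)
n, m = 200, 4                       # grid points, retained KL terms
x = np.linspace(0.0, 1.0, n)
sigma, b = 0.25, 0.5                # assumed std. deviation and correlation length
R = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / b)

# Discrete analogue of the covariance eigenvalue problem (quadrature weight h)
h = x[1] - x[0]
lam, phi = np.linalg.eigh(R * h)    # eigenvalues/eigenvectors of the kernel
idx = np.argsort(lam)[::-1][:m]     # keep the m dominant modes
lam, phi = lam[idx], phi[:, idx] / np.sqrt(h)

alpha_mean = np.ones(n)             # assumed mean of the process
xi = np.random.randn(m)             # uncorrelated standard Gaussian variables
alpha_sample = alpha_mean + phi @ (np.sqrt(lam) * xi)   # one KL realization
print(alpha_sample[:5])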

3 Review of Schur complement based domain decomposition method of SPDEs

Consider an elliptic stochastic PDE defined on a domain Ω with given boundary conditions on ∂Ω. Finite element discretization of the stochastic PDE leads to the following linear system:

A(θ)u(θ) = f (1)

where A(θ) is the stiffness matrix with random coefficients, u(θ) is the stochastic process representing the response vector and f is the applied force. For a large-scale system, Eq.(1) can be solved efficiently using the domain decomposition method [1, 2].

In the domain decomposition method, the spatial domain Ω is partitioned into n_s non-overlapping subdomains Ω_s, 1 ≤ s ≤ n_s, such that

$$\Omega = \bigcup_{s=1}^{n_s} \Omega_s, \qquad \Omega_s \cap \Omega_r = \emptyset \;\; (s \neq r), \qquad \text{and} \qquad \Gamma = \bigcup_{s=1}^{n_s} \Gamma_s, \quad \text{where } \Gamma_s = \partial\Omega_s \setminus \partial\Omega$$

For a typical subdomain Ω_s, the nodal vector u^s(θ) is partitioned into a set of interior unknowns u^s_I(θ), associated with nodes in the interior of Ω_s, and interface unknowns u^s_Γ(θ), associated with nodes that are shared among two or more subdomains, as shown in Fig.(1).

Consequently the subdomain equilibrium equation can be represented as

$$\begin{bmatrix} A^s_{II}(\theta) & A^s_{I\Gamma}(\theta) \\ A^s_{\Gamma I}(\theta) & A^s_{\Gamma\Gamma}(\theta) \end{bmatrix} \begin{Bmatrix} u^s_I(\theta) \\ u^s_\Gamma(\theta) \end{Bmatrix} = \begin{Bmatrix} f^s_I \\ f^s_\Gamma \end{Bmatrix}$$

The Polynomial Chaos expansion can be used to represent the uncertainty in the model parameters as

$$\sum_{i=0}^{L} \Psi_i \begin{bmatrix} A^s_{II,i} & A^s_{I\Gamma,i} \\ A^s_{\Gamma I,i} & A^s_{\Gamma\Gamma,i} \end{bmatrix} \begin{Bmatrix} u^s_I(\theta) \\ u^s_\Gamma(\theta) \end{Bmatrix} = \begin{Bmatrix} f^s_I \\ f^s_\Gamma \end{Bmatrix}$$

A Boolean restriction operator R_s of size (n^s_Γ × n_Γ), which maps the global interface vector u_Γ(θ) to the local interface vector u^s_Γ(θ), is defined as

$$u^s_\Gamma(\theta) = R_s\, u_\Gamma(\theta)$$


Figure 1. Partitioning of the domain nodes into interior and interface nodes.

Enforcing the transmission conditions (compatibility and equilibrium) along the interfaces, the global equilibrium equation of the stochastic system can be expressed in the following block linear system of equations:

$$\sum_{i=0}^{L} \Psi_i \begin{bmatrix} A^1_{II,i} & \cdots & 0 & A^1_{I\Gamma,i} R_1 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & A^{n_s}_{II,i} & A^{n_s}_{I\Gamma,i} R_{n_s} \\ R^T_1 A^1_{\Gamma I,i} & \cdots & R^T_{n_s} A^{n_s}_{\Gamma I,i} & \sum_{s=1}^{n_s} R^T_s A^s_{\Gamma\Gamma,i} R_s \end{bmatrix} \begin{Bmatrix} u^1_I(\theta) \\ \vdots \\ u^{n_s}_I(\theta) \\ u_\Gamma(\theta) \end{Bmatrix} = \begin{Bmatrix} f^1_I \\ \vdots \\ f^{n_s}_I \\ \sum_{s=1}^{n_s} R^T_s f^s_\Gamma \end{Bmatrix} \qquad (2)$$

The solution process can be expanded using the same Polynomial Chaos basis as

$$\begin{Bmatrix} u^1_I(\theta) \\ \vdots \\ u^{n_s}_I(\theta) \\ u_\Gamma(\theta) \end{Bmatrix} = \sum_{j=0}^{N} \Psi_j(\theta) \begin{Bmatrix} u^1_{I,j} \\ \vdots \\ u^{n_s}_{I,j} \\ u_{\Gamma,j} \end{Bmatrix} \qquad (3)$$

Substituting Eq.(3) into Eq.(2) and performing a Galerkin projection to minimize the error over the space spanned by the Polynomial Chaos basis [1], the following coupled deterministic system of equations is obtained:

$$\begin{bmatrix} A^1_{II} & \cdots & 0 & A^1_{I\Gamma} R_1 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & A^{n_s}_{II} & A^{n_s}_{I\Gamma} R_{n_s} \\ R^T_1 A^1_{\Gamma I} & \cdots & R^T_{n_s} A^{n_s}_{\Gamma I} & \sum_{s=1}^{n_s} R^T_s A^s_{\Gamma\Gamma} R_s \end{bmatrix} \begin{Bmatrix} U^1_I \\ \vdots \\ U^{n_s}_I \\ U_\Gamma \end{Bmatrix} = \begin{Bmatrix} F^1_I \\ \vdots \\ F^{n_s}_I \\ \sum_{s=1}^{n_s} R^T_s F^s_\Gamma \end{Bmatrix} \qquad (4)$$

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

4

where

$$[A^s_{\alpha\beta}]_{jk} = \sum_{i=0}^{L} \langle \Psi_i \Psi_j \Psi_k \rangle\, A^s_{\alpha\beta,i}, \qquad F^s_{\alpha,k} = \langle \Psi_k f^s_\alpha \rangle,$$
$$U^m_I = (u^m_{I,0}, \ldots, u^m_{I,N})^T, \qquad U_\Gamma = (u_{\Gamma,0}, \ldots, u_{\Gamma,N})^T,$$

and the subscripts α and β represent the indices I and Γ. The coefficient matrix in Eq.(4) is of order n(N+1) × n(N+1), where n and (N+1) denote the total number of degrees of freedom and the number of chaos coefficients, respectively. The stochastic counterpart of the restriction operator in Eq.(4) takes the following form:

$$R_s = \text{blockdiag}(R^0_s, \ldots, R^N_s)$$

where (R^0_s, ..., R^N_s) are the deterministic restriction operators. In the parallel implementation, R_s acts as a scatter operator while R^T_s acts as a gather operator; they are not constructed explicitly.

A block Gaussian elimination reduces the system in Eq.(4) to the following extended Schur complement system for the interface variable U_Γ:

S UΓ = GΓ (5)

where the global extended Schur complement matrix S is given by

$$S = \sum_{s=1}^{n_s} R^T_s \left[ A^s_{\Gamma\Gamma} - A^s_{\Gamma I} (A^s_{II})^{-1} A^s_{I\Gamma} \right] R_s$$

and the corresponding right hand vector GΓ is

$$G_\Gamma = \sum_{s=1}^{n_s} R^T_s \left[ F^s_\Gamma - A^s_{\Gamma I} (A^s_{II})^{-1} F^s_I \right]$$

Once the interface unknowns U_Γ are available, the interior unknowns can be obtained concurrently by solving the interior problem on each subdomain as

$$A^s_{II}\, U^s_I = F^s_I - A^s_{I\Gamma} R_s\, U_\Gamma$$
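Since only the action of the extended Schur complement on a vector is ever needed, each subdomain contribution can be evaluated matrix-free. The sketch below (illustrative Python, not the paper's PETSc implementation; the dense blocks and the test data are assumptions of the example) applies [A^s_ΓΓ − A^s_ΓI (A^s_II)^{-1} A^s_IΓ] to a vector with a single interior solve.

import numpy as np

def apply_subdomain_schur(A_II, A_IG, A_GI, A_GG, v_gamma):
    """Return [A_GG - A_GI A_II^{-1} A_IG] v_gamma without forming the
    Schur complement explicitly (one interior Dirichlet-type solve)."""
    w = np.linalg.solve(A_II, A_IG @ v_gamma)    # interior solve
    return A_GG @ v_gamma - A_GI @ w

# Tiny illustrative data (assumed symmetric positive-definite test matrix)
rng = np.random.default_rng(1)
ni, ng = 8, 3
M = rng.standard_normal((ni + ng, ni + ng))
A = M @ M.T + (ni + ng) * np.eye(ni + ng)
A_II, A_IG = A[:ni, :ni], A[:ni, ni:]
A_GI, A_GG = A[ni:, :ni], A[ni:, ni:]
v = rng.standard_normal(ng)
print(apply_subdomain_schur(A_II, A_IG, A_GI, A_GG, v))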

4 Solution methods for the extended Schur complement system

Solution methods for linear systems are broadly categorized into direct methods and iterative methods. The direct methods are generally based on sparse Gaussian elimination techniques and are popular for their robustness. However, they are expensive in computation time and memory requirements and therefore cannot be applied to the solution of large-scale linear systems [10]. On the other hand, the iterative methods generate a sequence of approximate solutions which converges to the true solution. In the iterative methods, the main arithmetic operation is the matrix-vector multiplication. Therefore, the linear system itself need not be constructed explicitly, and only a procedure for the matrix-vector product is required. This property makes iterative methods more suitable for parallel processing than direct methods.


4.1 Preconditioned Conjugate Gradient Method (PCGM)
The non-overlapping domain decomposition method, or iterative substructuring, can be viewed as a preconditioned iterative method to solve the Schur complement system of the form [11]

S UΓ = GΓ

For a symmetric positive-definite system, such as the Schur complement system, the Conjugate Gradient Method (CGM) is generally used. The performance of CGM mainly depends on the spectrum of the coefficient matrix. The rate of convergence of the iterative method can generally be improved by transforming the original system into an equivalent system whose coefficient matrix has better spectral properties (i.e., a lower condition number κ(S)). This transformation is called preconditioning, and the matrix used in the transformation is called the preconditioner. In other words, the transformed linear system becomes

$$M^{-1} S\, U_\Gamma = M^{-1} G_\Gamma$$

In general, κ(M⁻¹S) is much smaller than κ(S), and the eigenvalues of M⁻¹S are clustered near one. This procedure is known as the Preconditioned Conjugate Gradient Method (PCGM). In practice, the explicit construction of M⁻¹ is not needed. Instead, for a given vector r_Γ, a system of the following form is solved:

MZ = rΓ

The PCGM algorithm to solve the Schur complement system proceeds as follows [10]

Algorithm 1: The PCGM Algorithm

1. Initialize U_{Γ,0} = 0
2. Compute r_{Γ,0} = G_Γ − S U_{Γ,0}
3. Precondition Z_0 = M⁻¹ r_{Γ,0}
4. First search direction P_0 = Z_0
5. Initialize ρ_0 = (r_{Γ,0}, Z_0)
6. For j = 0, 1, ... until convergence Do
7.   Q_j = S P_j
8.   ρ_{tmp,j} = (Q_j, P_j)
9.   α_j = ρ_j / ρ_{tmp,j}
10.  U_{Γ,j+1} = U_{Γ,j} + α_j P_j
11.  r_{Γ,j+1} = r_{Γ,j} − α_j Q_j
12.  Z_{j+1} = M⁻¹ r_{Γ,j+1}
13.  ρ_{j+1} = (r_{Γ,j+1}, Z_{j+1})
14.  β_j = ρ_{j+1} / ρ_j
15.  P_{j+1} = Z_{j+1} + β_j P_j
16. EndDo
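A compact serial sketch of Algorithm 1 follows (illustrative only; the callables apply_S and apply_Minv stand in for the matrix-free Schur complement and preconditioner operators and are assumptions of this example, as is the Jacobi preconditioner used in the usage lines).

import numpy as np

def pcgm(apply_S, apply_Minv, G, tol=1e-5, max_iter=200):
    """Preconditioned conjugate gradient for S U = G (sketch of Algorithm 1)."""
    U = np.zeros_like(G)
    r = G - apply_S(U)
    Z = apply_Minv(r)
    P = Z.copy()
    rho = r @ Z
    for _ in range(max_iter):
        Q = apply_S(P)
        alpha = rho / (Q @ P)
        U += alpha * P
        r -= alpha * Q
        if np.linalg.norm(r) <= tol * np.linalg.norm(G):
            break
        Z = apply_Minv(r)
        rho_new = r @ Z
        P = Z + (rho_new / rho) * P
        rho = rho_new
    return U

# Example usage with a small SPD matrix and a Jacobi preconditioner (assumed data)
S = np.array([[4.0, 1.0], [1.0, 3.0]])
G = np.array([1.0, 2.0])
U = pcgm(lambda v: S @ v, lambda v: v / np.diag(S), G)
print(U, S @ U - G)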

The PCGM algorithm indicates that the main arithmetic operations are calculating the product Q = S P in step 7 and the preconditioned residual Z = M⁻¹ r_Γ in step 12. These operations can be performed in parallel as outlined next.


Given the subdomain Schur complement matrices S^s and a global vector P, the matrix-vector product Q = S P can be calculated in parallel as

$$Q = \sum_{s=1}^{n_s} R^T_s\, S^s R_s\, P$$

where n_s is the number of subdomains and R_s and R^T_s are the scatter and gather operators, respectively. The parallel implementation of this procedure is summarized in Algorithm (2).

Algorithm 2: Parallel Matrix-Vector Product Procedure

1. Input (P)
2. Scatter: P^s = R_s P
3. Local operation: Q^s = S^s P^s
4. Gather: Q = Σ_{s=1}^{n_s} R^T_s Q^s
5. Output (Q)

The working vectors Ps and Qs are defined on the subdomain level
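The sketch below mirrors Algorithm 2 with mpi4py (illustrative, not the paper's PETSc/MPI code; one subdomain per rank, with the local Schur complement Ss and the global interface indices rows assumed given). The scatter is a local restriction of P, and the gather is an Allreduce of the subdomain contributions.

import numpy as np
from mpi4py import MPI

def parallel_matvec(P, Ss, rows, comm):
    """Q = sum_s R_s^T (Ss (R_s P)) -- Algorithm 2 with one subdomain per rank."""
    Ps = P[rows]                              # scatter: restrict global vector
    Qs = Ss @ Ps                              # local Schur complement product
    Q_local = np.zeros_like(P)
    Q_local[rows] = Qs                        # place local contribution
    Q = np.zeros_like(P)
    comm.Allreduce(Q_local, Q, op=MPI.SUM)    # gather: sum all contributions
    return Q

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    n_gamma = 10                              # assumed global interface size
    P = np.arange(n_gamma, dtype=float)
    rows = np.array([comm.rank % n_gamma, (comm.rank + 1) % n_gamma])
    Ss = np.eye(len(rows))                    # placeholder local Schur complement
    print(comm.rank, parallel_matvec(P, Ss, rows, comm)[:4])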

Similarly, the effect of a parallel preconditioner on a residual vector, Z = M⁻¹ r_Γ, can be computed as

$$Z = \sum_{s=1}^{n_s} R^T_s\, M^{-1}_s R_s\, r_\Gamma$$

This procedure is outlined in the following algorithm.

Algorithm 3: Parallel Preconditioner Effect Procedure

1. Input (r_Γ)
2. Scatter: r^s_Γ = R_s r_Γ
3. Local Solve: M_s Z^s = r^s_Γ
4. Gather: Z = Σ_{s=1}^{n_s} R^T_s Z^s
5. Output (Z)

The local preconditioner M_s and the working vectors r^s_Γ and Z^s are defined on the subdomain level.

5 Iterative substructuring methods of SPDEs

The next sections describe the primal and dual-primal substructuring methods in the context of SPDEs. In the primal method, the interface problem is solved iteratively using the PCGM solver equipped with a scalable preconditioner. At each iteration of the iterative solver loop, local problems are solved on each subdomain in parallel. These local problems are used to construct a subdomain-level preconditioner.


Moreover, a coarse problem is required to propagate information globally across the subdomains. This global exchange of information leads to a scalable preconditioner. In the dual-primal method, a system of Lagrange multipliers that enforces continuity constraints across the interface boundary is solved iteratively using the PCGM solver. The global coarse problem is already embedded in the operator of the Lagrange multiplier system, and therefore a one-level preconditioner, such as the lumped or Dirichlet preconditioner, is sufficient for scalability. A framework of the primal and dual-primal iterative substructuring methods for SPDEs is detailed next.

6 A primal iterative substructuring method of SPDEs

In order to define local problems over each of the subdomains, we partition the subdomain nodal vector u^s(θ) into a set of interior unknowns u^s_i(θ), corner unknowns u^s_c(θ) and remaining unknowns u^s_r(θ), as schematically shown in Fig.(2).

Figure 2. Partitioning of the domain nodes into interior, remaining and corner (•) nodes.

According to this partitioning scheme, the subdomain equilibrium equation can be represented as

$$\begin{bmatrix} A^s_{ii}(\theta) & A^s_{ir}(\theta) & A^s_{ic}(\theta) \\ A^s_{ri}(\theta) & A^s_{rr}(\theta) & A^s_{rc}(\theta) \\ A^s_{ci}(\theta) & A^s_{cr}(\theta) & A^s_{cc}(\theta) \end{bmatrix} \begin{Bmatrix} u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta) \end{Bmatrix} = \begin{Bmatrix} f^s_i \\ f^s_r \\ f^s_c \end{Bmatrix}$$

The Polynomial Chaos representation of the uncertain model parameters leads to the following subdomain equilibrium equation:

$$\sum_{l=0}^{L} \Psi_l \begin{bmatrix} A^s_{ii,l} & A^s_{ir,l} & A^s_{ic,l} \\ A^s_{ri,l} & A^s_{rr,l} & A^s_{rc,l} \\ A^s_{ci,l} & A^s_{cr,l} & A^s_{cc,l} \end{bmatrix} \begin{Bmatrix} u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta) \end{Bmatrix} = \begin{Bmatrix} f^s_i \\ f^s_r \\ f^s_c \end{Bmatrix} \qquad (6)$$

The solution process is expressed using the same Polynomial Chaos basis as


$$\begin{Bmatrix} u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta) \end{Bmatrix} = \sum_{j=0}^{N} \Psi_j(\theta) \begin{Bmatrix} u^s_{i,j} \\ u^s_{r,j} \\ u^s_{c,j} \end{Bmatrix} \qquad (7)$$

Substituting Eq.(7) into Eq.(6) and performing a Galerkin projection leads to the following coupled deterministic system of equations:

$$\begin{bmatrix} A^s_{ii} & A^s_{ir} & A^s_{ic} \\ A^s_{ri} & A^s_{rr} & A^s_{rc} \\ A^s_{ci} & A^s_{cr} & A^s_{cc} \end{bmatrix} \begin{Bmatrix} U^s_i \\ U^s_r \\ U^s_c \end{Bmatrix} = \begin{Bmatrix} F^s_i \\ F^s_r \\ F^s_c \end{Bmatrix} \qquad (8)$$

where

$$[A^s_{\alpha\beta}]_{jk} = \sum_{l=0}^{L} \langle \Psi_l \Psi_j \Psi_k \rangle\, A^s_{\alpha\beta,l}, \qquad F^s_{\alpha,k} = \langle \Psi_k f^s_\alpha \rangle, \qquad U^s_\alpha = (u^s_{\alpha,0}, \cdots, u^s_{\alpha,N})^T,$$

and the subscripts α and β represent the indices i, r and c.

Enforcing the transmission conditions along the boundary interfaces, the subdomain equilibrium equation can be expressed as

$$\begin{bmatrix} A^s_{ii} & A^s_{ir}B^s_r & A^s_{ic}B^s_c \\ \sum_{s=1}^{n_s} B^{sT}_r A^s_{ri} & \sum_{s=1}^{n_s} B^{sT}_r A^s_{rr}B^s_r & \sum_{s=1}^{n_s} B^{sT}_r A^s_{rc}B^s_c \\ \sum_{s=1}^{n_s} B^{sT}_c A^s_{ci} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cr}B^s_r & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc}B^s_c \end{bmatrix} \begin{Bmatrix} U^s_i \\ U_r \\ U_c \end{Bmatrix} = \begin{Bmatrix} F^s_i \\ \sum_{s=1}^{n_s} B^{sT}_r F^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c F^s_c \end{Bmatrix} \qquad (9)$$

where B^s_r is a Boolean rectangular matrix that maps the global remaining vector U_r to the local remaining vector U^s_r as

$$U^s_r = B^s_r\, U_r \qquad (10)$$

Similarly, the restriction operator B^s_c is a Boolean rectangular matrix that maps the global corner vector U_c to the local corner vector U^s_c as

$$U^s_c = B^s_c\, U_c \qquad (11)$$

In the parallel implementation, both B^s_r and B^s_c act as scatter operators, while B^{sT}_r and B^{sT}_c act as gather operators.

The first block equation in Eq.(9) can be solved for U^s_i in parallel as

$$U^s_i = [A^s_{ii}]^{-1}(F^s_i - A^s_{ir}B^s_r\,U_r - A^s_{ic}B^s_c\,U_c) \qquad (12)$$

Substituting Eq.(12) into Eq.(9) leads to the following condensed system, which represents the Schur complement system in Eq.(5), but now with the interface boundary nodes split into remaining and corner nodes, as shown schematically in Fig.(3):

$$\begin{bmatrix} \sum_{s=1}^{n_s} B^{sT}_r S^s_{rr}B^s_r & \sum_{s=1}^{n_s} B^{sT}_r S^s_{rc}B^s_c \\ \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}B^s_r & \sum_{s=1}^{n_s} B^{sT}_c S^s_{cc}B^s_c \end{bmatrix} \begin{Bmatrix} U_r \\ U_c \end{Bmatrix} = \begin{Bmatrix} \sum_{s=1}^{n_s} B^{sT}_r G^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c G^s_c \end{Bmatrix} \qquad (13)$$

where

$$S^s_{\alpha\beta} = A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1}A^s_{i\beta}, \qquad G^s_\alpha = F^s_\alpha - A^s_{\alpha i}[A^s_{ii}]^{-1}F^s_i$$

Figure 3. The interface boundary nodes split into remaining and corner (•) nodes.

The corner nodal vector U_c in Eq.(13) is eliminated next to obtain the following (symmetric positive-definite) reduced interface problem:

$$(F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr})\,U_r = d_r - F_{rc}[F_{cc}]^{-1}d_c \qquad (14)$$

where

$$F_{\alpha\beta} = \sum_{s=1}^{n_s} B^{sT}_\alpha S^s_{\alpha\beta} B^s_\beta, \qquad d_\alpha = \sum_{s=1}^{n_s} B^{sT}_\alpha G^s_\alpha,$$

and α and β denote the subscripts r and c. The above system can be solved using PCGM with an appropriate preconditioner M⁻¹ defined in the next section.


6.1 A two-level preconditioner
As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that at the remaining interface boundary nodes is satisfied in a weak sense. This fact is schematically illustrated in Fig.(4) [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system:

Figure 4. Partial assembly of the corner nodes (•).

$$\begin{bmatrix} S^s_{rr} & S^s_{rc}B^s_c \\ \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}B^s_r & \sum_{s=1}^{n_s} B^{sT}_c S^s_{cc}B^s_c \end{bmatrix} \begin{Bmatrix} U^s_r \\ U_c \end{Bmatrix} = \begin{Bmatrix} F^s_r \\ 0 \end{Bmatrix} \qquad (15)$$

where

$$F^s_r = D^s_r B^s_r\, r_j$$

and r_j is the residual at the jth iteration of the PCGM, and D^s_r represents a block-diagonal weighting matrix which satisfies the following property:

$$\sum_{s=1}^{n_s} B^{sT}_r D^s_r B^s_r = I$$

The diagonal entries of each block of D^s_r are the reciprocal of the number of subdomains that share the interface boundary nodes.

The subdomain-level remaining unknown vector U^s_r can be eliminated in parallel from Eq.(15) as

$$U^s_r = [S^s_{rr}]^{-1}(F^s_r - S^s_{rc}B^s_c\,U_c) \qquad (16)$$


Substituting U^s_r into the second block of Eq.(15) leads to the following coarse problem:

$$F^*_{cc}\,U_c = d^*_c \qquad (17)$$

where

$$F^*_{cc} = \sum_{s=1}^{n_s} B^{sT}_c (S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc})B^s_c, \qquad d^*_c = -\sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1}F^s_r$$

The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as

$$U_r = \sum_{s=1}^{n_s} B^{sT}_r D^s_r\, U^s_r$$

After some algebraic manipulations, the preconditioner can be expressed as

$$M^{-1} = \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} D^s_r B^s_r + R^T_0 [F^*_{cc}]^{-1} R_0 \qquad (18)$$

where

$$R_0 = \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1} D^s_r B^s_r$$
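The action of the two-level preconditioner of Eq.(18) on a residual can be sketched as follows (illustrative Python, not the paper's implementation; the per-subdomain blocks Srr, Scr, the corner map Bc, the index map br, the diagonal weights Dr and the assembled coarse matrix Fcc_star are all assumed inputs, with Dr stored as a vector of diagonal weights).

import numpy as np

def apply_two_level_preconditioner(r, subs, Fcc_star):
    """Sketch of Eq.(18): Z = sum_s Br^T Dr [Srr]^-1 Dr Br r + R0^T [Fcc*]^-1 R0 r.
    Each entry of `subs` is an assumed dict with index map 'br', weights 'Dr',
    corner map 'Bc' (dense here) and blocks 'Srr', 'Scr'."""
    Z = np.zeros_like(r)
    rc = np.zeros(Fcc_star.shape[0])                  # accumulates R0 r
    for sd in subs:
        br, Dr = sd["br"], sd["Dr"]
        y = np.linalg.solve(sd["Srr"], Dr * r[br])    # [Srr]^-1 Dr Br r
        Z[br] += Dr * y                               # local (one-level) part
        rc += sd["Bc"].T @ (sd["Scr"] @ y)            # coarse residual R0 r
    zc = np.linalg.solve(Fcc_star, rc)                # coarse solve [Fcc*]^-1
    for sd in subs:
        br, Dr = sd["br"], sd["Dr"]
        # R0^T zc: Br^T Dr [Srr]^-1 Src Bc zc, using Src = Scr^T (symmetry)
        w = np.linalg.solve(sd["Srr"], sd["Scr"].T @ (sd["Bc"] @ zc))
        Z[br] += Dr * w
    return Z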

7 A dual-primal iterative substructuring method of SPDEs

In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] for SPDEs. In this approach, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.

Partial assembly of Eq(8) leads to the following subdomain equilibrium equation

$$\begin{bmatrix} A^s_{ii} & A^s_{ir} & A^s_{ic}B^s_c \\ A^s_{ri} & A^s_{rr} & A^s_{rc}B^s_c \\ \sum_{s=1}^{n_s} B^{sT}_c A^s_{ci} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc}B^s_c \end{bmatrix} \begin{Bmatrix} U^s_i \\ U^s_r \\ U_c \end{Bmatrix} = \begin{Bmatrix} F^s_i \\ F^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c F^s_c \end{Bmatrix} \qquad (19)$$

where B^s_c is a Boolean restriction operator that maps the global corner vector U_c to the local corner vector U^s_c as

$$U^s_c = B^s_c\, U_c$$


Eq.(19) can be rewritten in compact form as

$$A^s\, U^s = F^s \qquad (20)$$

Let B^s_r be a block-diagonal signed Boolean continuity matrix defined such that

$$\sum_{s=1}^{n_s} B^s_r\, U^s_r = 0$$

Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:

$$\frac{1}{2}\,U^T A\,U - U^T F \rightarrow \min \qquad (21)$$
$$\text{subject to } B\,U = 0$$

where

$$A = \text{blockdiag}(A^1, \ldots, A^s, \ldots, A^{n_s}), \qquad U = \{U^1, \ldots, U^s, \ldots, U^{n_s}\}^T, \qquad F = \{F^1, \ldots, F^s, \ldots, F^{n_s}\}^T,$$
$$B = \left[\, (0 \;\; B^1_r \;\; 0) \;\cdots\; (0 \;\; B^s_r \;\; 0) \;\cdots\; (0 \;\; B^{n_s}_r \;\; 0) \,\right]$$
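For illustration, the toy sketch below (not from the paper) assembles the signed Boolean jump operator for two subdomains whose remaining interface traces coincide: each shared degree of freedom receives +1 in one subdomain and -1 in its neighbour, so that B^1_r U^1_r + B^2_r U^2_r = 0 expresses continuity.

import numpy as np

# Two subdomains, each with 3 'remaining' interface dofs; local dofs 0..2 of
# subdomain 1 coincide with local dofs 0..2 of subdomain 2 (assumed toy layout).
n_lambda, n_r = 3, 3
B1 = np.zeros((n_lambda, n_r))
B2 = np.zeros((n_lambda, n_r))
for k in range(n_lambda):          # one Lagrange multiplier per shared pair
    B1[k, k] = +1.0                # +1 entry in the first subdomain
    B2[k, k] = -1.0                # -1 entry in the matching dof of the second

U1 = np.array([1.0, 2.0, 3.0])     # continuous solution: identical traces
U2 = np.array([1.0, 2.0, 3.0])
print(B1 @ U1 + B2 @ U2)           # zero jump => continuity is satisfied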

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle point formulation of Eq.(21) can be expressed as

$$L(U, \Lambda) = \frac{1}{2}\,U^T A\,U - U^T F + U^T B^T \Lambda \qquad (22)$$

Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system

$$\begin{bmatrix} A^s_{ii} & A^s_{ir} & A^s_{ic}B^s_c & 0 \\ A^s_{ri} & A^s_{rr} & A^s_{rc}B^s_c & B^{sT}_r \\ \sum_{s=1}^{n_s} B^{sT}_c A^s_{ci} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc}B^s_c & 0 \\ 0 & \sum_{s=1}^{n_s} B^s_r & 0 & 0 \end{bmatrix} \begin{Bmatrix} U^s_i \\ U^s_r \\ U_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} F^s_i \\ F^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c F^s_c \\ 0 \end{Bmatrix} \qquad (23)$$

where

$$\Lambda = \{\lambda_0, \ldots, \lambda_N\}^T$$


Figure 5. Lagrange multipliers are the forces required to connect the torn interface boundary.

and λ_j is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig.(5).

Eliminating the interior unknowns U^s_i from Eq.(23) as

$$U^s_i = [A^s_{ii}]^{-1}(F^s_i - A^s_{ir}U^s_r - A^s_{ic}B^s_c\,U_c) \qquad (24)$$

Substituting Eq(24) into Eq(23) leads to

$$\begin{bmatrix} S^s_{rr} & S^s_{rc}B^s_c & B^{sT}_r \\ \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c S^s_{cc}B^s_c & 0 \\ \sum_{s=1}^{n_s} B^s_r & 0 & 0 \end{bmatrix} \begin{Bmatrix} U^s_r \\ U_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} G^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c G^s_c \\ 0 \end{Bmatrix} \qquad (25)$$

where

$$S^s_{\alpha\beta} = A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1}A^s_{i\beta}, \qquad G^s_\alpha = F^s_\alpha - A^s_{\alpha i}[A^s_{ii}]^{-1}F^s_i$$

The subdomain-level remaining unknown vector U^s_r can be obtained in parallel from Eq.(25) as

$$U^s_r = [S^s_{rr}]^{-1}(G^s_r - S^s_{rc}B^s_c\,U_c - B^{sT}_r\Lambda) \qquad (26)$$

Substituting Eq.(26) into Eq.(25) leads to

$$\begin{bmatrix} F_{cc} & -F_{cr} \\ F_{rc} & F_{rr} \end{bmatrix} \begin{Bmatrix} U_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} d_c \\ d_r \end{Bmatrix} \qquad (27)$$

where


$$F_{cc} = \sum_{s=1}^{n_s} B^{sT}_c (S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc})B^s_c, \qquad F_{cr} = \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1}B^{sT}_r,$$
$$F_{rc} = \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1}S^s_{rc}B^s_c, \qquad F_{rr} = \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1}B^{sT}_r,$$
$$d_c = \sum_{s=1}^{n_s} B^{sT}_c (G^s_c - S^s_{cr}[S^s_{rr}]^{-1}G^s_r), \qquad d_r = \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1}G^s_r$$

Solving for U_c from Eq.(27) gives the following coarse problem:

$$F_{cc}\,U_c = (d_c + F_{cr}\Lambda) \qquad (28)$$

Substituting U_c into Eq.(27) leads to the following symmetric positive-definite Lagrange multiplier system:

$$(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr})\,\Lambda = d_r - F_{rc}[F_{cc}]^{-1}d_c \qquad (29)$$

Eq.(29) is solved using PCGM with a Dirichlet preconditioner defined as

$$M^{-1} = \sum_{s=1}^{n_s} B^s_r D^s_r\, S^s_{rr}\, D^s_r B^{sT}_r \qquad (30)$$

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq.(17) and for the dual-primal operator in Eq.(28) are the same and can be expressed as

$$F^*_{cc} = F_{cc} = \sum_{s=1}^{n_s} B^{sT}_c (S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc})B^s_c$$

Furthermore, the algebraic form of the primal preconditioner in Eq.(18) can be recast as

$$M^{-1} = \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} D^s_r B^s_r \;+\; \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} S^s_{rc} B^s_c \left[ \sum_{s=1}^{n_s} B^{sT}_c (S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc})B^s_c \right]^{-1} \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1} D^s_r B^s_r$$

which has the same form as the dual-primal operator in Eq.(29).


9 Parallel implementation

In this section, we give an outline of the parallel implementation of PCGM to solve the primal (Eq.(14)) and dual-primal (Eq.(29)) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global-level coarse problem.

9.1 Primal method
In this subsection, we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq.(14).

For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as

$$Q_j = (F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr})\,P_j,$$

can be computed using the following algorithm

Algorithm 4: Parallel Matrix-Vector Product for the Primal Method

1. Input (P)
2. Scatter: P^s = B^s_r P
3. Compute: v^s_1 = S^s_cr P^s
4. Gather: V_1 = Σ_{s=1}^{n_s} B^{sT}_c v^s_1
5. Global Solve: F_cc V_2 = V_1
6. Scatter: v^s_2 = B^s_c V_2
7. Compute: v^s_3 = S^s_rc v^s_2
8. Update: Q^s = S^s_rr P^s − v^s_3
9. Gather: Q = Σ_{s=1}^{n_s} B^{sT}_r Q^s
10. Output (Q)

Multiplication of a Schur complement matrix by a vector in step 3, step 7 and step 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem as

$$v^s_\alpha = S^s_{\alpha\beta}\, v^s_\beta = (A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1}A^s_{i\beta})\, v^s_\beta$$

This procedure is outlined in the following algorithm

Algorithm 5: Dirichlet Solver Procedure

1. Input (v^s_β)
2. Compute: v^s_1 = A^s_{iβ} v^s_β
3. Solve: A^s_{ii} v^s_2 = v^s_1
4. Compute: v^s_3 = A^s_{αi} v^s_2
5. Compute: v^s_4 = A^s_{αβ} v^s_β
6. Compute: v^s_α = v^s_4 − v^s_3
7. Output (v^s_α)

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner as

$$M^{-1}_{cc} F_{cc} V_2 = M^{-1}_{cc} V_1$$

where

$$M^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c$$

Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.

Algorithm 6: Two-Level Preconditioner Effect Procedure

1. Input (r_Γ)
2. Scatter: F^s_r = D^s_r B^s_r r_Γ
3. Local Solve: S^s_rr v^s_1 = F^s_r
4. Compute: d^s_c = S^s_cr v^s_1
5. Gather: d_c = Σ_{s=1}^{n_s} B^{sT}_c d^s_c
6. Global Solve: F*_cc Z_c = −d_c
7. Scatter: Z^s_c = B^s_c Z_c
8. Update: v^s_2 = F^s_r + S^s_rc Z^s_c
9. Local Solve: S^s_rr Z^s_f = v^s_2
10. Gather: Z = Σ_{s=1}^{n_s} B^{sT}_r D^s_r Z^s_f
11. Output (Z)

The local solves in step 3 and step 9 of Algorithm (6) constitute a subdomain-level Neumann problem of the form S^s_rr U^s_r = r^s_r, which can be solved using the following algorithm.

Algorithm 7: Neumann Solver Procedure

1. Input (r^s_r)
2. Solve:
$$\begin{bmatrix} A^s_{ii} & A^s_{ir} \\ A^s_{ri} & A^s_{rr} \end{bmatrix} \begin{Bmatrix} X^s \\ U^s_r \end{Bmatrix} = \begin{Bmatrix} 0 \\ r^s_r \end{Bmatrix}$$
3. Output (U^s_r)
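A dense-algebra sketch of Algorithm 7 is given below (illustrative; the subdomain blocks and the test data are assumed). Solving the coupled interior/remaining system with a zero interior right-hand side returns U^s_r = [S^s_rr]^{-1} r^s_r without ever forming S^s_rr, which the usage lines verify against the explicitly formed Schur complement.

import numpy as np

def neumann_solve(A_ii, A_ir, A_ri, A_rr, r_r):
    """Algorithm 7 sketch: solve [[A_ii, A_ir], [A_ri, A_rr]] [x; u_r] = [0; r_r]
    and return u_r, i.e. u_r = [S_rr]^{-1} r_r with S_rr = A_rr - A_ri A_ii^{-1} A_ir."""
    ni = A_ii.shape[0]
    K = np.block([[A_ii, A_ir], [A_ri, A_rr]])
    rhs = np.concatenate([np.zeros(ni), r_r])
    sol = np.linalg.solve(K, rhs)
    return sol[ni:]

# Tiny check against the explicitly formed Schur complement (assumed SPD data)
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6)); A = M @ M.T + 6.0 * np.eye(6)
A_ii, A_ir, A_ri, A_rr = A[:4, :4], A[:4, 4:], A[4:, :4], A[4:, 4:]
r_r = rng.standard_normal(2)
S_rr = A_rr - A_ri @ np.linalg.solve(A_ii, A_ir)
print(np.allclose(neumann_solve(A_ii, A_ir, A_ri, A_rr, r_r),
                  np.linalg.solve(S_rr, r_r)))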


The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner as

$$M^{-1}_{cc} F^*_{cc} Z_c = -M^{-1}_{cc} d_c$$

where

$$M^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c$$

Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the following flowchart.

Figure 6. Flowchart of the parallel PCGM to solve the primal interface problem.

9.2 Dual-primal method
In this subsection, we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem in Eq.(29).

For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as

$$Q_j = (F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr})\,P_j,$$

can be computed using the following algorithm

Algorithm 8: Parallel Matrix-Vector Product for the Dual-Primal Method

1. Input (P)
2. Scatter: P^s = B^{sT}_r P
3. Local Solve: S^s_rr v^s_1 = P^s
4. Compute: v^s_2 = S^s_cr v^s_1
5. Gather: V_2 = Σ_{s=1}^{n_s} B^{sT}_c v^s_2
6. Global Solve: F_cc V_3 = V_2
7. Scatter: v^s_3 = B^s_c V_3
8. Compute: v^s_4 = S^s_rc v^s_3
9. Update: v^s_5 = P^s + v^s_4
10. Local Solve: S^s_rr Q^s = v^s_5
11. Gather: Q = Σ_{s=1}^{n_s} B^s_r Q^s
12. Output (Q)

The local solves in step 3 and step 10 of Algorithm (8) are carried out by solving a subdomain-level Neumann problem as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for solving the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.

Algorithm 9: Dirichlet Preconditioner Effect Procedure

1. Input (r_Γ)
2. Scatter: r^s_Γ = D^s_r B^{sT}_r r_Γ
3. Compute: Z^s = S^s_rr r^s_Γ
4. Gather: Z = Σ_{s=1}^{n_s} B^s_r D^s_r Z^s
5. Output (Z)
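A serial sketch of Algorithm 9 follows (illustrative; one dictionary per subdomain with an assumed index map br, diagonal weights Dr and local block Srr; the signs of the jump operator are omitted for brevity). Each local step applies S^s_rr, i.e. a Dirichlet-type operation on the subdomain.

import numpy as np

def dirichlet_preconditioner(r_gamma, subs):
    """Algorithm 9 sketch: Z = sum_s Br^s Dr^s (Srr^s (Dr^s Br^sT r_gamma))."""
    Z = np.zeros_like(r_gamma)
    for sd in subs:
        br, Dr = sd["br"], sd["Dr"]          # index map and diagonal weights
        rs = Dr * r_gamma[br]                # scatter: D_r B_r^T r_gamma
        Zs = sd["Srr"] @ rs                  # local Schur complement product
        Z[br] += Dr * Zs                     # gather with weighting
    return Z

# Toy usage with two subdomains sharing one interface index (assumed data)
r = np.arange(4, dtype=float)
subs = [{"br": np.array([0, 1]), "Dr": np.array([1.0, 0.5]), "Srr": np.eye(2)},
        {"br": np.array([1, 2, 3]), "Dr": np.array([0.5, 1.0, 1.0]), "Srr": np.eye(3)}]
print(dirichlet_preconditioner(r, subs))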

We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the following flowchart.

Figure 7. Flowchart of the parallel PCGM to solve the dual-primal interface problem.


10 Numerical results

For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson's equation with randomly heterogeneous coefficients, given as

$$\frac{\partial}{\partial x}\left[c_x(x,y,\theta)\frac{\partial u(x,y,\theta)}{\partial x}\right] + \frac{\partial}{\partial y}\left[c_y(x,y,\theta)\frac{\partial u(x,y,\theta)}{\partial y}\right] = f(x,y) \quad \text{in } \Omega,$$

where the forcing term is

$$f(x,y) = 10$$

For simplicity, a homogeneous Dirichlet boundary condition is imposed as

$$u(x,y,\theta) = 0 \quad \text{on } \partial\Omega$$

The random coefficients c_x(x, y, θ) and c_y(x, y, θ) are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.

In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the L2 norms of the current and the initial residual is less than 10⁻⁵:

$$\frac{\| G_\Gamma - S\,U_{\Gamma,k} \|_2}{\| G_\Gamma \|_2} \leq 10^{-5}$$

Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.

10.1 Stochastic features
Finite element discretization with linear triangular elements results in 202,242 elements and 101,851 nodes. The random coefficients and the response are represented by a third-order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1,018,510. Fig.(8) shows a typical finite element mesh, while Fig.(9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig.(10) and Fig.(11), respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs.(12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs.(18-23), the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
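For reference, the number of Polynomial Chaos basis functions for d random variables and expansion order p is (d+p)!/(d!p!). The short snippet below (illustrative, not from the paper) shows that a third-order expansion in two random variables yields N + 1 = 10 chaos coefficients, consistent with the quoted system order 101851 × 10 = 1018510.

from math import comb

def pc_terms(d, p):
    """Number of Polynomial Chaos basis functions for d random variables, order p."""
    return comb(d + p, p)

# Third-order expansion in two random variables gives N + 1 = 10 terms,
# consistent with the system order 101851 x 10 = 1018510 quoted above.
print(pc_terms(2, 3))           # -> 10
print(101851 * pc_terms(2, 3))  # -> 1018510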


Figure 8. A typical FEM mesh. Figure 9. Mesh partitioning using METIS.

Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.

Figure 12. Chaos coefficients u0. Figure 13. Chaos coefficients u1.

10.2 Scalability study
Firstly, we study the scalability of the algorithms with respect to the problem size, where we fix the number of subdomains used to solve the problem to 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously, increasing the Polynomial Chaos order from first order to third order does not affect the performance of the methods. Note that for a given spatial problem size (n), using the first-order Polynomial Chaos expansion leads to a total problem size of (3 × n) and using the third-order Polynomial Chaos expansion leads to a total problem size of (10 × n).


Figure 14. Chaos coefficients u2. Figure 15. Chaos coefficients u3.

Figure 16. Chaos coefficients u4. Figure 17. Chaos coefficients u5.

Figure 18. Lagrange multipliers λ0. Figure 19. Lagrange multipliers λ1.


Secondly, we fix the problem size in the spatial domain to 71,386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table (2). The results are reported for first-, second- and third-order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than the starting initial residual in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs.(24-26).


Figure 20. Lagrange multipliers λ2. Figure 21. Lagrange multipliers λ3.

Figure 22. Lagrange multipliers λ4. Figure 23. Lagrange multipliers λ5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size    PDDM (1st, 2nd, 3rd)    DP-DDM (1st, 2nd, 3rd)
10051           10  10  10              8   8   8
20303           11  11  11              8   8   8
40811           11  12  12              8   9   9
59935           13  14  14              10  10  10
71386           12  12  12              9   9   9
80172           11  11  12              8   8   8
101851          12  12  12              9   9   9


Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table (3) shows the performance of the primal and the dual-primal methods for the first-, second- and third-order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.


Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71,386 dof.

CPUs    PDDM (1st, 2nd, 3rd)    DP-DDM (1st, 2nd, 3rd)
20      10  11  11              8   8   8
40      12  12  12              9   9   9
60      12  13  13              9   9   9
80      12  12  13              9   9   9
100     12  12  12              9   9   9
120     12  12  12              9   9   9
140     11  11  12              8   8   8
160     12  12  12              8   8   9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101,851 dof).

Subdomains    PDDM (1st, 2nd, 3rd)    DP-DDM (1st, 2nd, 3rd)
100           10  10  10              8   8   8
200           10  10  11              8   8   8
400           12  13  13              9   9   9
600           11  12  12              8   8   9
800           12  13  13              9   9   9


Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations is nearly constant.

Finally, it is worth mentioning that the performances of the primal method and the dual-primal method demonstrate a similar trend, and this fact points out the similarity (duality) between the two methods through numerical experiments.

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed primal method and the dual-primal method are numerically scalable with respect to problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e., the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables are required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.


Figure 24. The relative PCGM residual history (relative residual vs. iteration number, P-DDM and DP-DDM) for the case of 160 subdomains and first PC order.

Figure 25. The relative PCGM residual history (relative residual vs. iteration number, P-DDM and DP-DDM) for the case of 160 subdomains and second PC order.

Figure 26. The relative PCGM residual history (relative residual vs. iteration number, P-DDM and DP-DDM) for the case of 160 subdomains and third PC order.

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust, and Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.


Table 4. Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101,851 dofs) and fixed number of subdomains (100).

CoV     PDDM (1st, 2nd, 3rd)    DP-DDM (1st, 2nd, 3rd)
0.05    10  10  10              8   8   8
0.10    10  10  10              8   8   8
0.15    10  10  10              8   8   8
0.20    10  10  10              8   8   8
0.25    10  10  10              8   8   8
0.30    10  10  11              8   8   8
0.35    10  10  11              8   8   8
0.40    10  11  11              8   8   9
0.45    10  11  12              8   8   9
0.50    10  11  12              8   8   9

References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications, accepted
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed Mewhort D J K et al (Berlin Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250-271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 2: Primal and dual-primal iterative substructuring methods of stochastic

Primal and dual-primal iterative substructuring

methods of stochastic PDEs

Waad Subber1 and Abhijit Sarkar2

Department of Civil and Environmental Engineering Carleton University Ottawa OntarioK1S5B6 Canada

E-mail 1wsubberconnectcarletonca

2abhijit sarkarcarletonca

Abstract A novel non-overlapping domain decomposition method is proposed to solve thelarge-scale linear system arising from the finite element discretization of stochastic partialdifferential equations (SPDEs) The methodology is based on a Schur complement basedgeometric decomposition and an orthogonal decomposition and projection of the stochasticprocesses using Polynomial Chaos expansion The algorithm offers a direct approach toformulate a two-level scalable preconditioner The proposed preconditioner strictly enforcesthe continuity condition on the corner nodes of the interface boundary while weakly satisfyingthe continuity condition over the remaining interface nodes This approach relates to aprimal version of an iterative substructuring method Next a Lagrange multiplier based dual-primal domain decomposition method is introduced in the context of SPDEs In the dual-primal method the continuity condition on the corner nodes is strictly satisfied while Lagrangemultipliers are used to enforce continuity on the remaining part of the interface boundary Fornumerical illustrations a two dimensional elliptic SPDE with non-Gaussian random coefficientsis considered The numerical results demonstrate the scalability of these algorithms with respectto the mesh size subdomain size fixed problem size per subdomain order of Polynomial Chaosexpansion and level of uncertainty in the input parameters The numerical experiments areperformed on a Linux cluster using MPI and PETSc libraries

1 Introduction

A domain decomposition method of SPDEs is introduced in [1 2] to quantify uncertainty inlarge-scale linear systems The methodology is based on a Schur complement based geomet-ric decomposition and an orthogonal decomposition and projection of the stochastic processesA parallel preconditioned conjugate gradient method (PCGM) is adopted in [3] to solve theinterface problem without explicitly constructing the Schur complement system The parallelperformance of the algorithms is demonstrated using a lumped preconditioner for non-Gaussiansystems arising from a hydraulic problem having random soil permeability properties

A one-level Neumann-Neumann domain decomposition preconditioner for SPDEs is intro-duced in [4] in order to enhance the performance of the parallel PCGM iterative solver in [3]The implementation of the algorithm requires a local solve of a stochastic Dirichlet problem fol-lowed by a local solve of a stochastic Neumann problem in each iteration of the PCGM solverThe multilevel sparsity structure of the coefficient matrices of the stochastic system namely (a)the sparsity structure due to the finite element discretization and (b) the block sparsity structuredue to the Polynomial Chaos expansion is exploited for computational efficiency The one-level

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

ccopy 2010 IOP Publishing Ltd 1

Neumann-Neumann preconditioner in [4] demonstrates a good (strong and weak) scalability forthe moderate range of CPUs considered

In this paper we first describe a primal version of iterative substructuring methods for thesolution of the large-scale linear system arising from stochastic finite element method Thealgorithm offers a straightforward approach to formulate a two-level scalable preconditionerThe continuity condition is strictly enforced on the corner nodes (nodes shared among morethan two subdomains including the nodes at the ends of the interface edges) For the remainingpart of the interface boundary the continuity condition is satisfied weakly (in an average sense)Note that the continuity of the solution field across the entire interface boundary is eventuallysatisfied at the convergence of the iterations This approach naturally leads to a coarse gridwhich connects the subdomains globally via the corner nodes The coarse grid provides amechanism to propagate information globally which makes the algorithm scalable with respectto subdomain size In the second part of the paper a dual-primal iterative substructuringmethod is introduced for SPDEs which maybe viewed as an extension of the Dual-Primal FiniteElement Tearing and Interconnecting method (FETI-DP) [5] in the context of SPDEs In thisapproach the continuity condition on the corner nodes is strictly satisfied by partial assemblywhile Lagrange multipliers are used to enforce continuity on the remaining part of the interfaceboundary A system of Lagrange multiplier (also called the dual variable) is solved iterativelyusing PCGM method equipped with Dirichlet preconditioner PETSc [6] and MPI [7] librariesare used for efficient parallel implementation of the primal and dual-primal algorithms Thegraph partitioning tool METIS [8] is used for optimal decomposition of the finite element meshfor load balancing and minimum interprocessor communication The parallel performance of thealgorithms is studied for a two dimensional stochastic elliptic PDE with non-Gaussian randomcoefficients The numerical experiments are performed using a Linux cluster

2 Uncertainty representation by stochastic processes

This section provides a brief review of the theories of stochastic processes relevant to thesubsequent developments of the paper [9 1] We assume the data induces a representation of themodel parameters as random variables and processes which span the Hilbert space HG A set ofbasis functions ξi is identified to characterize this space using Karhunen-Loeve expansion Thestate of the system resides in the Hilbert space HL with basis functions Ψi being identifiedwith the Polynomial Chaos expansion (PC) The Karhunen-Loeve expansion of a stochasticprocess α(x θ) is based on the spectral expansion of its covariance function Rαα(x y) Theexpansion takes the following form

α(x θ) = α(x) +infin

sum

i=1

radic

λiξi(θ)φi(x)

where α(x) is the mean of the stochastic process θ represents the random dimension andξi(θ) is a set of uncorrelated (but not generally independent for non-Gaussian processes) randomvariables φi(x) are the eigenfunctions and λi are the eigenvalues of the covariance kernel whichcan be obtained as the solution to the following integral equation

int

D

Rαα(x y)φi(y)dy = λiφi(x)

where D denotes the spatial dimension over which the process is defined The covariance functionof the solution process is not known a priori and hence the Karhunen-Loeve expansion cannot beused to represent it Therefore a generic basis that is complete in the space of all second-orderrandom variables will be identified and used in the approximation process Since the solution

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

2

process is a function of the material properties nodal solution variables denoted by u(θ) can beformally expressed as some nonlinear functional of the set ξi(θ) used to represent the materialstochasticity It has been shown that this functional dependence can be expanded in terms ofpolynomials in Gaussian random variables namely Polynomial Chaos [9] as

u(θ) =N

sum

j=0

Ψj(θ)uj

These polynomials are orthogonal in the sense that their inner product 〈ΨjΨk〉 defined asthe statistical average of their product is equal to zero for j 6= k

3 Review of Schur complement based domain decomposition method of SPDEs

Consider an elliptic stochastic PDE defined on a domain Ω with a given boundary conditionson partΩ Finite element discretization of the stochastic PDE leads to the following linear system

A(θ)u(θ) = f (1)

where A(θ) is the stiffness matrix with random coefficients u(θ) is the stochastic processrepresenting the response vector and f is the applied force For large-scale system Eq(1) canbe solved efficiently using domain decomposition method [1 2]

In domain decomposition method the spatial domain Ω is partitioned into ns non-overlappingsubdomains Ωs 1 le s le ns such that

Ω =

ns⋃

s=1

Ωs Ωs

Ωr = 0 s 6= r

andΓ =

s=1

Γs where Γs = partΩspartΩ

For a typical subdomain Ωs the nodal vector us(θ) is partitioned into a set of interior un-knowns us

I(θ) associated with nodes in the interior of Ωs and interface unknowns usΓ(θ) associated

with nodes that are shared among two or more subdomains as shown in Fig(1)

Consequently the subdomain equilibrium equation can be represented as

[

AsII(θ) As

IΓ (θ)As

ΓI(θ) AsΓΓ

(θ)

]

usI(θ)

usΓ(θ)

=

f sI

f sΓ

The Polynomial Chaos expansion can be used to represent the uncertainty in the modelparameters as

Lsum

i=0

Ψi

[

AsIIi As

IΓ i

AsΓIi As

ΓΓ i

]

usI(θ)

usΓ(θ)

=

f sI

f sΓ

A Boolean restriction operator Rs of size (nsΓtimes nΓ) which maps the global interface vector

uΓ (θ) to the local interface vector usΓ(θ) is defined as

usΓ (θ) = RsuΓ (θ)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

3

Figure 1 Partitioning domain nodes into interior () and interface ()

Enforcing the transmission conditions (compatibility and equilibrium) along the interfacesthe global equilibrium equation of the stochastic system can be expressed in the following blocklinear systems of equations

Lsum

i=0

Ψi

A1IIi 0 A1

IΓ iR1

0 Ans

IIi Ans

IΓ iRns

RT1 A1

ΓIi RTns

Ans

ΓIi

nssum

s=1

RTs As

ΓΓ iRs

u1I(θ)

uns

I (θ)uΓ (θ)

=

f1I

fns

Inssum

s=1

RTs f s

Γ

(2)

The solution process can be expanded using the same Polynomial Chaos basis as

u1I(θ)

uns

I (θ)uΓ (θ)

=N

sum

j=0

Ψj(θ)

u1Ij

uns

Ij

uΓ j

(3)

Substituting Eq(3) into Eq(2) and performing Galerkin projection to minimize the errorover the space spanned by the Polynomial Chaos basis [1] the following coupled deterministicsystems of equations is obtained

A1II 0 A1

IΓR1

0 Ans

II Ans

IΓRns

RT1 A

1ΓI RT

nsAns

ΓI

nssum

s=1

RTs A

sΓΓRs

U1I

Uns

I

=

F1I

Fns

Inssum

s=1

RTs F

(4)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

4

where

[Asαβ]jk =

Lsum

i=0

〈ΨiΨjΨk〉Asαβi Fs

αk = 〈Ψkfsα〉

UmI = (um

I 0 umI N )T UΓ = (uΓ 0 uΓ N )T

the subscripts α and β represent the index I and Γ The coefficient matrix in Eq(4) is of ordern(N +1)timesn(N +1) where n and (N +1) denote the total number of the degrees of freedom andchaos coefficients respectively The stochastic counterpart of the restriction operator in Eq(4)takes the following form

Rs = blockdiag(R0s R

Ns )

where (R0s R

Ns ) are the deterministic restriction operators In parallel implementation

Rs acts as a scatter operator while RTs acts as a gather operator and are not constructed explic-

itly

A block Gaussian elimination reduces the system in Eq. (4) to the following extended Schur complement system for the interface variable $\mathcal{U}_\Gamma$:

\[ \mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{G}_\Gamma \quad (5) \]

where the global extended Schur complement matrix $\mathcal{S}$ is given by

\[ \mathcal{S} = \sum_{s=1}^{n_s} \mathcal{R}_s^T \left[ \mathcal{A}_{\Gamma\Gamma}^s - \mathcal{A}_{\Gamma I}^s (\mathcal{A}_{II}^s)^{-1} \mathcal{A}_{I\Gamma}^s \right] \mathcal{R}_s, \]

and the corresponding right-hand side vector $\mathcal{G}_\Gamma$ is

\[ \mathcal{G}_\Gamma = \sum_{s=1}^{n_s} \mathcal{R}_s^T \left[ \mathcal{F}_\Gamma^s - \mathcal{A}_{\Gamma I}^s (\mathcal{A}_{II}^s)^{-1} \mathcal{F}_I^s \right]. \]

Once the interface unknowns $\mathcal{U}_\Gamma$ are available, the interior unknowns can be obtained concurrently by solving the interior problem on each subdomain as

\[ \mathcal{A}_{II}^s\, \mathcal{U}_I^s = \mathcal{F}_I^s - \mathcal{A}_{I\Gamma}^s \mathcal{R}_s \mathcal{U}_\Gamma. \]
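As an illustration of Eq. (5) and the interior recovery above, the following dense NumPy sketch assembles the extended Schur complement and its right-hand side for a hypothetical two-subdomain problem and then recovers the interior unknowns. All sizes and matrices are invented for the example; it is a sketch of the algebra, not the authors' solver.

```python
import numpy as np

rng = np.random.default_rng(1)

def spd(n):
    """Random symmetric positive-definite matrix, standing in for a subdomain operator."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

ns, nI, nG = 2, 5, 3   # 2 subdomains, 5 interior dofs each, 3 interface dofs shared by both
A, f = [], []
for s in range(ns):
    K = spd(nI + nG)
    A.append({"II": K[:nI, :nI], "IG": K[:nI, nI:], "GI": K[nI:, :nI], "GG": K[nI:, nI:]})
    f.append({"I": rng.standard_normal(nI), "G": rng.standard_normal(nG)})

# Restriction operators R_s: identities here because the whole interface is shared.
R = [np.eye(nG) for _ in range(ns)]

# Extended Schur complement system S u_G = G (Eq. (5)).
S = sum(R[s].T @ (A[s]["GG"] - A[s]["GI"] @ np.linalg.solve(A[s]["II"], A[s]["IG"])) @ R[s]
        for s in range(ns))
G = sum(R[s].T @ (f[s]["G"] - A[s]["GI"] @ np.linalg.solve(A[s]["II"], f[s]["I"]))
        for s in range(ns))

u_G = np.linalg.solve(S, G)   # interface unknowns
u_I = [np.linalg.solve(A[s]["II"], f[s]["I"] - A[s]["IG"] @ (R[s] @ u_G))  # interior recovery
       for s in range(ns)]
print(u_G.shape, u_I[0].shape)
```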

4 Solution methods for the extended Schur complement system

Solution methods for linear systems are broadly categorized into direct methods and iterative methods. Direct methods are generally based on sparse Gaussian elimination techniques and are popular for their robustness. However, they are expensive in computation time and memory requirements and therefore cannot be applied to the solution of very large-scale linear systems [10]. On the other hand, iterative methods generate a sequence of approximate solutions which converge to the true solution. In iterative methods, the main arithmetic operation is the matrix-vector multiplication. Therefore the linear system itself need not be constructed explicitly, and only a procedure for the matrix-vector product is required. This property makes iterative methods better suited to parallel processing than direct methods.


4.1 Preconditioned Conjugate Gradient Method (PCGM)

Non-overlapping domain decomposition, or iterative substructuring, can be viewed as a preconditioned iterative method to solve the Schur complement system of the form [11]

\[ \mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{G}_\Gamma. \]

For a symmetric positive-definite system such as the Schur complement system, the Conjugate Gradient Method (CGM) is generally used. The performance of CGM mainly depends on the spectrum of the coefficient matrix. However, the rate of convergence of the iterative method can generally be improved by transforming the original system into an equivalent system that has better spectral properties (i.e. a lower condition number $\kappa(\mathcal{S})$) of the coefficient matrix. This transformation is called preconditioning, and the matrix used in the transformation is called the preconditioner. In other words, the transformed linear system becomes

\[ \mathcal{M}^{-1} \mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{M}^{-1} \mathcal{G}_\Gamma. \]

In general, $\kappa(\mathcal{M}^{-1}\mathcal{S})$ is much smaller than $\kappa(\mathcal{S})$ and the eigenvalues of $\mathcal{M}^{-1}\mathcal{S}$ are clustered near one. This procedure is known as the Preconditioned Conjugate Gradient Method (PCGM). In practice, the explicit construction of $\mathcal{M}^{-1}$ is not needed. Instead, for a given vector $r_\Gamma$, a system of the following form is solved:

\[ \mathcal{M} Z = r_\Gamma. \]

The PCGM algorithm to solve the Schur complement system proceeds as follows [10].

Algorithm 1 The PCGM Algorithm
1: Initialize $U_{\Gamma_0} = 0$
2: Compute $r_{\Gamma_0} = \mathcal{G}_\Gamma - \mathcal{S}\, U_{\Gamma_0}$
3: Precondition $Z_0 = \mathcal{M}^{-1} r_{\Gamma_0}$
4: First search direction $P_0 = Z_0$
5: Initialize $\rho_0 = (r_{\Gamma_0}, Z_0)$
6: For $j = 0, 1, \ldots$ until convergence Do
7:   $Q_j = \mathcal{S} P_j$
8:   $\rho_{tmp_j} = (Q_j, P_j)$
9:   $\alpha_j = \rho_j / \rho_{tmp_j}$
10:  $U_{\Gamma_{j+1}} = U_{\Gamma_j} + \alpha_j P_j$
11:  $r_{\Gamma_{j+1}} = r_{\Gamma_j} - \alpha_j Q_j$
12:  $Z_{j+1} = \mathcal{M}^{-1} r_{\Gamma_{j+1}}$
13:  $\rho_{j+1} = (r_{\Gamma_{j+1}}, Z_{j+1})$
14:  $\beta_j = \rho_{j+1} / \rho_j$
15:  $P_{j+1} = Z_{j+1} + \beta_j P_j$
16: EndDo
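For illustration, a minimal NumPy sketch of Algorithm (1) is given below, with the operator and the preconditioner passed as callables so that neither matrix needs to be formed explicitly. The small dense test system and the Jacobi-type preconditioner are assumptions made purely for the example.

```python
import numpy as np

def pcgm(apply_S, apply_Minv, G, tol=1e-5, max_it=200):
    """Preconditioned conjugate gradients for S U = G, following Algorithm 1.
    apply_S(x) and apply_Minv(x) return S @ x and M^{-1} @ x respectively."""
    U = np.zeros_like(G)
    r = G - apply_S(U)
    Z = apply_Minv(r)
    P = Z.copy()
    rho = r @ Z
    r0_norm = np.linalg.norm(r)
    for _ in range(max_it):
        Q = apply_S(P)
        alpha = rho / (Q @ P)
        U += alpha * P
        r -= alpha * Q
        if np.linalg.norm(r) / r0_norm < tol:   # relative-residual stopping test
            break
        Z = apply_Minv(r)
        rho_new = r @ Z
        P = Z + (rho_new / rho) * P
        rho = rho_new
    return U

# Usage on a small SPD system with a Jacobi (diagonal) preconditioner.
rng = np.random.default_rng(2)
M = rng.standard_normal((20, 20))
S = M @ M.T + 20 * np.eye(20)
G = rng.standard_normal(20)
U = pcgm(lambda x: S @ x, lambda x: x / np.diag(S), G)
print(np.linalg.norm(S @ U - G))   # small residual
```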

The PCGM algorithm indicates that the main arithmetic operations are calculating the product $Q = \mathcal{S} P$ in step 7 and the preconditioned residual $Z = \mathcal{M}^{-1} r_\Gamma$ in step 12. These operations can be performed in parallel, as outlined next.


Given the subdomain Schur complement matrices $\mathcal{S}^s$ and a global vector $P$, the matrix-vector product $Q = \mathcal{S} P$ can be calculated in parallel as

\[ Q = \sum_{s=1}^{n_s} \mathcal{R}_s^T \mathcal{S}^s \mathcal{R}_s P, \]

where $n_s$ is the number of subdomains and $\mathcal{R}_s$ and $\mathcal{R}_s^T$ are the scatter and gather operators, respectively. The parallel implementation of this procedure is summarized in Algorithm (2).

Algorithm 2 Parallel Matrix-Vector Product Procedure
1: Input ($P$)
2: Scatter: $P^s = \mathcal{R}_s P$
3: Local operation: $Q^s = \mathcal{S}^s P^s$
4: Gather: $Q = \sum_{s=1}^{n_s} \mathcal{R}_s^T Q^s$
5: Output ($Q$)

The working vectors $P^s$ and $Q^s$ are defined at the subdomain level.
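The scatter/gather structure of Algorithm (2) can be sketched in NumPy as below. The restriction operators are represented by index arrays and the subdomain loop stands in for the parallel execution; the interface layout and the local Schur complements are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

n_gamma = 8   # size of the global interface vector (hypothetical)
# Index form of the Boolean restriction operators: the global interface dofs owned
# by each of two subdomains, with dofs 3 and 4 shared between them.
idx = [np.array([0, 1, 2, 3, 4]), np.array([3, 4, 5, 6, 7])]

# Subdomain Schur complements as toy SPD matrices of matching local sizes.
S_loc = []
for ix in idx:
    M = rng.standard_normal((ix.size, ix.size))
    S_loc.append(M @ M.T + ix.size * np.eye(ix.size))

def schur_matvec(P):
    """Q = sum_s R_s^T S^s R_s P (Algorithm 2); the loop runs concurrently in practice."""
    Q = np.zeros_like(P)
    for ix, Ss in zip(idx, S_loc):
        Ps = P[ix]              # scatter: restrict to the subdomain interface dofs
        Qs = Ss @ Ps            # local Schur complement multiply
        np.add.at(Q, ix, Qs)    # gather: accumulate the subdomain contributions
    return Q

P = rng.standard_normal(n_gamma)
print(schur_matvec(P))
```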

Similarly, the effect of a parallel preconditioner on a residual vector, $Z = \mathcal{M}^{-1} r_\Gamma$, can be computed as

\[ Z = \sum_{s=1}^{n_s} \mathcal{R}_s^T \mathcal{M}_s^{-1} \mathcal{R}_s\, r_\Gamma. \]

This procedure is outlined in the following algorithm.

Algorithm 3 Parallel Preconditioner Effect Procedure
1: Input ($r_\Gamma$)
2: Scatter: $r_\Gamma^s = \mathcal{R}_s r_\Gamma$
3: Local solve: $\mathcal{M}_s Z^s = r_\Gamma^s$
4: Gather: $Z = \sum_{s=1}^{n_s} \mathcal{R}_s^T Z^s$
5: Output ($Z$)

The local preconditioner $\mathcal{M}_s$ and the working vectors $r_\Gamma^s$ and $Z^s$ are defined at the subdomain level.

5 Iterative substructuring methods of SPDEs

The next sections describe the primal and dual-primal substructuring methods in the context of SPDEs. In the primal method, the interface problem is solved iteratively using a PCGM solver equipped with a scalable preconditioner. At each iteration of the solver loop, local problems are solved on each subdomain in parallel. These local problems are used to construct a subdomain-level preconditioner. Moreover, a coarse problem is required to propagate information globally across the subdomains. This global exchange of information leads to a scalable preconditioner. In the dual-primal method, a system of Lagrange multipliers that enforces continuity constraints across the interface boundary is solved iteratively using a PCGM solver. The global coarse problem is already embedded in the operator of the Lagrange multiplier system, and therefore a one-level preconditioner such as the lumped or Dirichlet preconditioner is sufficient for scalability. A framework of the primal and dual-primal iterative substructuring methods for SPDEs is detailed next.

6 A primal iterative substructuring method of SPDEs

In order to define local problems over each of the subdomains, we partition the subdomain nodal vector $u^s(\theta)$ into a set of interior unknowns $u_i^s(\theta)$, corner unknowns $u_c^s(\theta)$ and remaining unknowns $u_r^s(\theta)$, as schematically shown in Fig. (2).

Figure 2. Partitioning of the domain nodes into interior, remaining and corner (•) nodes.

According to this partitioning scheme, the subdomain equilibrium equation can be represented as

\[
\begin{bmatrix}
A_{ii}^s(\theta) & A_{ir}^s(\theta) & A_{ic}^s(\theta) \\
A_{ri}^s(\theta) & A_{rr}^s(\theta) & A_{rc}^s(\theta) \\
A_{ci}^s(\theta) & A_{cr}^s(\theta) & A_{cc}^s(\theta)
\end{bmatrix}
\begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix}
=
\begin{Bmatrix} f_i^s \\ f_r^s \\ f_c^s \end{Bmatrix}.
\]

The Polynomial Chaos representation of the uncertain model parameters leads to the following subdomain equilibrium equation:

\[
\sum_{l=0}^{L} \Psi_l
\begin{bmatrix}
A_{ii,l}^s & A_{ir,l}^s & A_{ic,l}^s \\
A_{ri,l}^s & A_{rr,l}^s & A_{rc,l}^s \\
A_{ci,l}^s & A_{cr,l}^s & A_{cc,l}^s
\end{bmatrix}
\begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix}
=
\begin{Bmatrix} f_i^s \\ f_r^s \\ f_c^s \end{Bmatrix}
\quad (6)
\]

The solution process is expressed using the same Polynomial Chaos basis as

\[
\begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix}
= \sum_{j=0}^{N} \Psi_j(\theta)
\begin{Bmatrix} u_{i,j}^s \\ u_{r,j}^s \\ u_{c,j}^s \end{Bmatrix}
\quad (7)
\]

Substituting Eq. (7) into Eq. (6) and performing a Galerkin projection leads to the following coupled deterministic system of equations:

\[
\begin{bmatrix}
\mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s & \mathcal{A}_{ic}^s \\
\mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s & \mathcal{A}_{rc}^s \\
\mathcal{A}_{ci}^s & \mathcal{A}_{cr}^s & \mathcal{A}_{cc}^s
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}_i^s \\ \mathcal{U}_r^s \\ \mathcal{U}_c^s \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{F}_i^s \\ \mathcal{F}_r^s \\ \mathcal{F}_c^s \end{Bmatrix}
\quad (8)
\]

where

\[
[\mathcal{A}_{\alpha\beta}^s]_{jk} = \sum_{l=0}^{L} \langle \Psi_l \Psi_j \Psi_k \rangle A_{\alpha\beta,l}^s, \qquad
\mathcal{F}_{\alpha,k}^s = \langle \Psi_k f_\alpha^s \rangle, \qquad
\mathcal{U}_\alpha^s = (u_{\alpha,0}^s, \ldots, u_{\alpha,N}^s)^T,
\]

and the subscripts $\alpha$ and $\beta$ represent the indices $i$, $r$ and $c$.

Enforcing the transmission conditions along the interface boundaries, the subdomain equilibrium equation can be expressed as

\[
\begin{bmatrix}
\mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s \mathcal{B}_r^s & \mathcal{A}_{ic}^s \mathcal{B}_c^s \\
\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{A}_{ri}^s & \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{A}_{rr}^s \mathcal{B}_r^s & \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{A}_{rc}^s \mathcal{B}_c^s \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{ci}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cr}^s \mathcal{B}_r^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}_i^s \\ \mathcal{U}_r \\ \mathcal{U}_c \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{F}_i^s \\ \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{F}_r^s \\ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{F}_c^s \end{Bmatrix}
\quad (9)
\]

where $\mathcal{B}_r^s$ is a Boolean rectangular matrix that maps the global remaining vector $\mathcal{U}_r$ to the local remaining vector $\mathcal{U}_r^s$ as

\[ \mathcal{U}_r^s = \mathcal{B}_r^s\, \mathcal{U}_r. \quad (10) \]

Similarly, the restriction operator $\mathcal{B}_c^s$ is a Boolean rectangular matrix that maps the global corner vector $\mathcal{U}_c$ to the local corner vector $\mathcal{U}_c^s$ as

\[ \mathcal{U}_c^s = \mathcal{B}_c^s\, \mathcal{U}_c. \quad (11) \]

In a parallel implementation, both $\mathcal{B}_r^s$ and $\mathcal{B}_c^s$ act as scatter operators, while $\mathcal{B}_r^{sT}$ and $\mathcal{B}_c^{sT}$ act as gather operators.

The first block equation in Eq. (9) can be solved for $\mathcal{U}_i^s$ in parallel as

\[ \mathcal{U}_i^s = [\mathcal{A}_{ii}^s]^{-1} (\mathcal{F}_i^s - \mathcal{A}_{ir}^s \mathcal{B}_r^s \mathcal{U}_r - \mathcal{A}_{ic}^s \mathcal{B}_c^s \mathcal{U}_c). \quad (12) \]

Substituting Eq. (12) into Eq. (9) leads to the following condensed system, which represents the Schur complement system in Eq. (5) but with the interface boundary nodes split into remaining and corner nodes, as shown schematically in Fig. (3):

\[
\begin{bmatrix}
\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{S}_{rr}^s \mathcal{B}_r^s & \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{S}_{rc}^s \mathcal{B}_c^s \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s \mathcal{B}_r^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cc}^s \mathcal{B}_c^s
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}_r \\ \mathcal{U}_c \end{Bmatrix}
=
\begin{Bmatrix} \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{G}_r^s \\ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{G}_c^s \end{Bmatrix}
\quad (13)
\]

where

\[
\mathcal{S}_{\alpha\beta}^s = \mathcal{A}_{\alpha\beta}^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{A}_{i\beta}^s, \qquad
\mathcal{G}_\alpha^s = \mathcal{F}_\alpha^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{F}_i^s.
\]

Figure 3. The interface boundary nodes are split into remaining and corner (•) nodes.

The corner nodal vector $\mathcal{U}_c$ in Eq. (13) is eliminated next to obtain the following (symmetric positive definite) reduced interface problem:

\[ (\mathcal{F}_{rr} - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr})\, \mathcal{U}_r = d_r - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} d_c \quad (14) \]

where

\[
\mathcal{F}_{\alpha\beta} = \sum_{s=1}^{n_s} \mathcal{B}_\alpha^{sT} \mathcal{S}_{\alpha\beta}^s \mathcal{B}_\beta^s, \qquad
d_\alpha = \sum_{s=1}^{n_s} \mathcal{B}_\alpha^{sT} \mathcal{G}_\alpha^s,
\]

and $\alpha$ and $\beta$ denote the subscripts $r$ and $c$. The above system can be solved using PCGM with an appropriate preconditioner $\mathcal{M}^{-1}$, defined in the next section.

6.1 A two-level preconditioner

As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that for the remaining interface boundary nodes is satisfied in a weak sense. This fact is schematically illustrated in Fig. (4) [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system:

Figure 4. Partial assembly of the corner nodes (•).

\[
\begin{bmatrix}
\mathcal{S}_{rr}^s & \mathcal{S}_{rc}^s \mathcal{B}_c^s \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s \mathcal{B}_r^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cc}^s \mathcal{B}_c^s
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}_r^s \\ \mathcal{U}_c \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{F}_r^s \\ 0 \end{Bmatrix}
\quad (15)
\]

where

\[ \mathcal{F}_r^s = \mathcal{D}_r^s \mathcal{B}_r^s\, r_j, \]

$r_j$ is the residual at the $j$th PCGM iteration, and $\mathcal{D}_r^s$ represents a block-diagonal weighting matrix which satisfies the following property:

\[ \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s \mathcal{B}_r^s = \mathcal{I}. \]

The diagonal entries of each block of $\mathcal{D}_r^s$ are the reciprocals of the number of subdomains that share the corresponding interface boundary nodes.
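The weighting matrices can be illustrated with a short NumPy check of the partition-of-unity property above, using the same hypothetical two-subdomain interface layout as in the earlier sketch; the layout is an assumption made for the example.

```python
import numpy as np

# Hypothetical layout: 8 global remaining-interface dofs and two subdomains,
# with dofs 3 and 4 shared by both.
n_r = 8
idx = [np.array([0, 1, 2, 3, 4]), np.array([3, 4, 5, 6, 7])]

# Multiplicity of each global dof = number of subdomains sharing it.
mult = np.zeros(n_r)
for ix in idx:
    mult[ix] += 1

# Boolean restriction matrices B_r^s and weighting matrices D_r^s with
# reciprocal-multiplicity diagonal entries.
B = [np.eye(n_r)[ix] for ix in idx]          # each row selects one global dof
D = [np.diag(1.0 / mult[ix]) for ix in idx]

# Partition-of-unity check: sum_s B^T D B = I.
P = sum(Bs.T @ Ds @ Bs for Bs, Ds in zip(B, D))
print(np.allclose(P, np.eye(n_r)))           # True
```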

The subdomain-level remaining unknown vector $\mathcal{U}_r^s$ can be eliminated in parallel from Eq. (15) as

\[ \mathcal{U}_r^s = [\mathcal{S}_{rr}^s]^{-1} (\mathcal{F}_r^s - \mathcal{S}_{rc}^s \mathcal{B}_c^s \mathcal{U}_c). \quad (16) \]

Substituting $\mathcal{U}_r^s$ into the second block of Eq. (15) leads to the following coarse problem:

\[ \mathcal{F}_{cc}^{*}\, \mathcal{U}_c = d_c^{*} \quad (17) \]

where

\[
\mathcal{F}_{cc}^{*} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \left( \mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \right) \mathcal{B}_c^s, \qquad
d_c^{*} = -\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{F}_r^s.
\]

The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as

\[ \mathcal{U}_r = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s\, \mathcal{U}_r^s. \]

After some algebraic manipulations, the preconditioner can be expressed as

\[ \mathcal{M}^{-1} = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s + \mathcal{R}_0^T [\mathcal{F}_{cc}^{*}]^{-1} \mathcal{R}_0 \quad (18) \]

where

\[ \mathcal{R}_0 = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s. \]

7 A dual-primal iterative substructuring method of SPDEs

In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. In this approach, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.

Partial assembly of Eq. (8) leads to the following subdomain equilibrium equation:

\[
\begin{bmatrix}
\mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s & \mathcal{A}_{ic}^s \mathcal{B}_c^s \\
\mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s & \mathcal{A}_{rc}^s \mathcal{B}_c^s \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{ci}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cr}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}_i^s \\ \mathcal{U}_r^s \\ \mathcal{U}_c \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{F}_i^s \\ \mathcal{F}_r^s \\ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{F}_c^s \end{Bmatrix}
\quad (19)
\]

where $\mathcal{B}_c^s$ is a Boolean restriction operator that maps the global corner vector $\mathcal{U}_c$ to the local corner vector $\mathcal{U}_c^s$ as

\[ \mathcal{U}_c^s = \mathcal{B}_c^s\, \mathcal{U}_c. \]

Eq. (19) can be rewritten in compact form as

\[ \mathcal{A}^s \mathcal{U}^s = \mathcal{F}^s. \quad (20) \]

Let $\mathcal{B}_r^s$ be a block-diagonal, signed Boolean continuity matrix defined such that

\[ \sum_{s=1}^{n_s} \mathcal{B}_r^s\, \mathcal{U}_r^s = 0. \]
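The signed Boolean continuity matrices can be pictured with the following tiny NumPy sketch for two subdomains whose remaining-interface dofs coincide pairwise, so that $\sum_s \mathcal{B}_r^s u_r^s$ is the jump across the interface. The layout is a hypothetical example, not the general construction.

```python
import numpy as np

# Two subdomains sharing 3 remaining interface dofs; one continuity constraint
# per shared dof pair. The signed Boolean matrices implement the jump u_r^1 - u_r^2.
n_lambda = 3
B1 = np.eye(n_lambda)            # +1 entries for subdomain 1
B2 = -np.eye(n_lambda)           # -1 entries for subdomain 2

u1 = np.array([1.0, 2.0, 3.0])   # local remaining traces
u2 = np.array([1.0, 2.0, 3.0])

jump = B1 @ u1 + B2 @ u2         # sum_s B_r^s u_r^s
print(np.allclose(jump, 0.0))    # True: the continuity constraint is satisfied
```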

Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:

\[ \tfrac{1}{2}\, \mathcal{U}^T \mathcal{A}\, \mathcal{U} - \mathcal{U}^T \mathcal{F} \;\rightarrow\; \min \quad \text{subject to} \quad \mathcal{B}\,\mathcal{U} = 0 \quad (21) \]

where

\[
\mathcal{A} = \mathrm{blockdiag}(\mathcal{A}^1, \ldots, \mathcal{A}^s, \ldots, \mathcal{A}^{n_s}), \qquad
\mathcal{U} = \begin{Bmatrix} \mathcal{U}^1 \\ \vdots \\ \mathcal{U}^{n_s} \end{Bmatrix}, \qquad
\mathcal{F} = \begin{Bmatrix} \mathcal{F}^1 \\ \vdots \\ \mathcal{F}^{n_s} \end{Bmatrix},
\]
\[
\mathcal{B} = \left[ (0,\ \mathcal{B}_r^1,\ 0) \;\cdots\; (0,\ \mathcal{B}_r^s,\ 0) \;\cdots\; (0,\ \mathcal{B}_r^{n_s},\ 0) \right].
\]

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle point formulation of Eq. (21) can be expressed as

\[ \mathcal{L}(\mathcal{U}, \Lambda) = \tfrac{1}{2}\, \mathcal{U}^T \mathcal{A}\, \mathcal{U} - \mathcal{U}^T \mathcal{F} + \mathcal{U}^T \mathcal{B}^T \Lambda. \quad (22) \]

Minimizing Eq. (22) with respect to $\mathcal{U}$ and $\Lambda$ leads to the following equilibrium system:

\[
\begin{bmatrix}
\mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s & \mathcal{A}_{ic}^s \mathcal{B}_c^s & 0 \\
\mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s & \mathcal{A}_{rc}^s \mathcal{B}_c^s & \mathcal{B}_r^{sT} \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{ci}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cr}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s & 0 \\
0 & \sum_{s=1}^{n_s} \mathcal{B}_r^s & 0 & 0
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}_i^s \\ \mathcal{U}_r^s \\ \mathcal{U}_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{F}_i^s \\ \mathcal{F}_r^s \\ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{F}_c^s \\ 0 \end{Bmatrix}
\quad (23)
\]

where

\[ \Lambda = (\lambda_0, \ldots, \lambda_N)^T \]

Figure 5. Lagrange multipliers are the forces required to connect the torn interface boundary.

and $\lambda_j$ is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig. (5).

Eliminating the interior unknowns $\mathcal{U}_i^s$ from Eq. (23) as

\[ \mathcal{U}_i^s = [\mathcal{A}_{ii}^s]^{-1} (\mathcal{F}_i^s - \mathcal{A}_{ir}^s \mathcal{U}_r^s - \mathcal{A}_{ic}^s \mathcal{B}_c^s \mathcal{U}_c) \quad (24) \]

and substituting Eq. (24) into Eq. (23) leads to

\[
\begin{bmatrix}
\mathcal{S}_{rr}^s & \mathcal{S}_{rc}^s \mathcal{B}_c^s & \mathcal{B}_r^{sT} \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cc}^s \mathcal{B}_c^s & 0 \\
\sum_{s=1}^{n_s} \mathcal{B}_r^s & 0 & 0
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}_r^s \\ \mathcal{U}_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{G}_r^s \\ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{G}_c^s \\ 0 \end{Bmatrix}
\quad (25)
\]

where

\[
\mathcal{S}_{\alpha\beta}^s = \mathcal{A}_{\alpha\beta}^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{A}_{i\beta}^s, \qquad
\mathcal{G}_\alpha^s = \mathcal{F}_\alpha^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{F}_i^s.
\]

The subdomain-level remaining unknown vector $\mathcal{U}_r^s$ can be obtained in parallel from Eq. (25) as

\[ \mathcal{U}_r^s = [\mathcal{S}_{rr}^s]^{-1} (\mathcal{G}_r^s - \mathcal{S}_{rc}^s \mathcal{B}_c^s \mathcal{U}_c - \mathcal{B}_r^{sT} \Lambda). \quad (26) \]

Substituting Eq. (26) into Eq. (25) leads to

\[
\begin{bmatrix} \mathcal{F}_{cc} & -\mathcal{F}_{cr} \\ \mathcal{F}_{rc} & \mathcal{F}_{rr} \end{bmatrix}
\begin{Bmatrix} \mathcal{U}_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} d_c \\ d_r \end{Bmatrix}
\quad (27)
\]

where

\[
\begin{aligned}
\mathcal{F}_{cc} &= \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \left( \mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \right) \mathcal{B}_c^s, &
\mathcal{F}_{cr} &= \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{B}_r^{sT}, \\
\mathcal{F}_{rc} &= \sum_{s=1}^{n_s} \mathcal{B}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \mathcal{B}_c^s, &
\mathcal{F}_{rr} &= \sum_{s=1}^{n_s} \mathcal{B}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{B}_r^{sT}, \\
d_c &= \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \left( \mathcal{G}_c^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{G}_r^s \right), &
d_r &= \sum_{s=1}^{n_s} \mathcal{B}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{G}_r^s.
\end{aligned}
\]

Solving for $\mathcal{U}_c$ from Eq. (27) gives the following coarse problem:

\[ \mathcal{F}_{cc}\, \mathcal{U}_c = d_c + \mathcal{F}_{cr} \Lambda. \quad (28) \]

Substituting $\mathcal{U}_c$ into Eq. (27) leads to the following symmetric positive-definite Lagrange multiplier system:

\[ (\mathcal{F}_{rr} + \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr})\, \Lambda = d_r - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} d_c. \quad (29) \]

Eq. (29) is solved using PCGM with a Dirichlet preconditioner defined as

\[ \mathcal{M} = \sum_{s=1}^{n_s} \mathcal{B}_r^s \mathcal{D}_r^s \mathcal{S}_{rr}^s \mathcal{D}_r^s \mathcal{B}_r^{sT}. \quad (30) \]

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq. (17) and for the dual-primal operator in Eq. (28) are the same and can be expressed as

\[ \mathcal{F}_{cc}^{*} = \mathcal{F}_{cc} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \left( \mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \right) \mathcal{B}_c^s. \]

Furthermore, the algebraic form of the primal preconditioner in Eq. (18) can be recast as

\[
\begin{aligned}
\mathcal{M}^{-1} = {} & \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s \\
& + \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \mathcal{B}_c^s
\left[ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \left( \mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \right) \mathcal{B}_c^s \right]^{-1}
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s,
\end{aligned}
\]

which has the same form as the dual-primal operator in Eq. (29).

9 Parallel implementation

In this section we outline the parallel implementation of PCGM to solve the primal (Eq. (14)) and dual-primal (Eq. (29)) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global coarse problem.

9.1 Primal method

In this subsection we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq. (14).

For the $j$th iteration of Algorithm (1), the matrix-vector product in step 7, defined as

\[ Q_j = (\mathcal{F}_{rr} - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr}) P_j, \]

can be computed using the following algorithm.

Algorithm 4 Parallel Matrix-Vector Product for the Primal Method
1: Input ($P$)
2: Scatter: $P^s = \mathcal{B}_r^s P$
3: Compute: $v_1^s = \mathcal{S}_{cr}^s P^s$
4: Gather: $V_1 = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} v_1^s$
5: Global solve: $\mathcal{F}_{cc} V_2 = V_1$
6: Scatter: $v_2^s = \mathcal{B}_c^s V_2$
7: Compute: $v_3^s = \mathcal{S}_{rc}^s v_2^s$
8: Update: $Q^s = \mathcal{S}_{rr}^s P^s - v_3^s$
9: Gather: $Q = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} Q^s$
10: Output ($Q$)

Multiplication of a Schur complement matrix by a vector in step 3, step 7 and step 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem as

\[ v_\alpha^s = \mathcal{S}_{\alpha\beta}^s v_\beta^s = \left( \mathcal{A}_{\alpha\beta}^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{A}_{i\beta}^s \right) v_\beta^s. \]

This procedure is outlined in the following algorithm.

Algorithm 5 Dirichlet-Solver Procedure
1: Input ($v_\beta^s$)
2: Compute: $v_1^s = \mathcal{A}_{i\beta}^s v_\beta^s$
3: Solve: $\mathcal{A}_{ii}^s v_2^s = v_1^s$
4: Compute: $v_3^s = \mathcal{A}_{\alpha i}^s v_2^s$
5: Compute: $v_4^s = \mathcal{A}_{\alpha\beta}^s v_\beta^s$
6: Compute: $v_\alpha^s = v_4^s - v_3^s$
7: Output ($v_\alpha^s$)
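The Dirichlet-solver procedure can be illustrated with the following dense NumPy sketch, which applies $\mathcal{S}_{rr}^s$ to a vector without forming it and checks the result against the explicitly assembled Schur complement. The block sizes and matrices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical subdomain blocks (interior and remaining dofs), dense for clarity.
ni, nr = 6, 4
K = rng.standard_normal((ni + nr, ni + nr))
K = K @ K.T + (ni + nr) * np.eye(ni + nr)
A_ii, A_ir = K[:ni, :ni], K[:ni, ni:]
A_ri, A_rr = K[ni:, :ni], K[ni:, ni:]

def schur_apply(v):
    """v_out = (A_rr - A_ri A_ii^{-1} A_ir) v, following the steps of Algorithm 5."""
    v1 = A_ir @ v                     # step 2
    v2 = np.linalg.solve(A_ii, v1)    # step 3: interior (Dirichlet) solve
    v3 = A_ri @ v2                    # step 4
    v4 = A_rr @ v                     # step 5
    return v4 - v3                    # step 6

v = rng.standard_normal(nr)
S_rr = A_rr - A_ri @ np.linalg.solve(A_ii, A_ir)   # explicit Schur complement for checking
print(np.allclose(schur_apply(v), S_rr @ v))        # True
```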

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner as

\[ \mathcal{M}_{cc}^{-1} \mathcal{F}_{cc} V_2 = \mathcal{M}_{cc}^{-1} V_1, \qquad \text{where} \qquad \mathcal{M}_{cc}^{-1} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s. \]

Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.

Algorithm 6 Two-Level Preconditioner Effect Procedure
1: Input ($r_\Gamma$)
2: Scatter: $\mathcal{F}_r^s = \mathcal{D}_r^s \mathcal{B}_r^s r_\Gamma$
3: Local solve: $\mathcal{S}_{rr}^s v_1^s = \mathcal{F}_r^s$
4: Compute: $d_c^s = \mathcal{S}_{cr}^s v_1^s$
5: Gather: $d_c = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} d_c^s$
6: Global solve: $\mathcal{F}_{cc}^{*} Z_c = -d_c$
7: Scatter: $Z_c^s = \mathcal{B}_c^s Z_c$
8: Update: $v_2^s = \mathcal{F}_r^s + \mathcal{S}_{rc}^s Z_c^s$
9: Local solve: $\mathcal{S}_{rr}^s Z_f^s = v_2^s$
10: Gather: $Z = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s Z_f^s$
11: Output ($Z$)

The local solves in step 3 and step 9 of Algorithm (6) constitute a subdomain-level Neumann problem of the form $\mathcal{S}_{rr}^s \mathcal{U}_r^s = r_r^s$, which can be solved using the following algorithm.

Algorithm 7 Neumann-Solver Procedure
1: Input ($r_r^s$)
2: Solve:
\[
\begin{bmatrix} \mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s \\ \mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s \end{bmatrix}
\begin{Bmatrix} \mathcal{X}^s \\ \mathcal{U}_r^s \end{Bmatrix}
=
\begin{Bmatrix} 0 \\ r_r^s \end{Bmatrix}
\]
3: Output ($\mathcal{U}_r^s$)
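A short dense NumPy sketch below verifies the idea behind Algorithm (7): solving the two-by-two block system with a zero interior right-hand side returns the same result as applying $[\mathcal{S}_{rr}^s]^{-1}$ directly. The blocks are again hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)

ni, nr = 6, 4
K = rng.standard_normal((ni + nr, ni + nr))
K = K @ K.T + (ni + nr) * np.eye(ni + nr)
A_ii, A_ir = K[:ni, :ni], K[:ni, ni:]
A_ri, A_rr = K[ni:, :ni], K[ni:, ni:]

def neumann_solve(r_r):
    """Solve [[A_ii, A_ir], [A_ri, A_rr]] [x; u_r] = [0; r_r] and return u_r (Algorithm 7)."""
    rhs = np.concatenate([np.zeros(ni), r_r])
    sol = np.linalg.solve(K, rhs)
    return sol[ni:]

r = rng.standard_normal(nr)
S_rr = A_rr - A_ri @ np.linalg.solve(A_ii, A_ir)
print(np.allclose(neumann_solve(r), np.linalg.solve(S_rr, r)))   # True: u_r = S_rr^{-1} r_r
```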


The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner as

\[ \mathcal{M}_{cc}^{-1} \mathcal{F}_{cc}^{*} Z_c = -\mathcal{M}_{cc}^{-1} d_c, \qquad \text{where} \qquad \mathcal{M}_{cc}^{-1} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s. \]

Finally, we summarize the parallel implementation of PCGM to solve the primal interface problem in the following flowchart.

Figure 6. Flowchart of parallel PCGM to solve the primal interface problem.

9.2 Dual-primal method

In this subsection we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem in Eq. (29).

For the $j$th iteration of Algorithm (1), the matrix-vector product in step 7, defined as

\[ Q_j = (\mathcal{F}_{rr} + \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr}) P_j, \]

can be computed using the following algorithm.

Algorithm 8 Parallel Matrix-Vector Product for the Dual-Primal Method
1: Input ($P$)
2: Scatter: $P^s = \mathcal{B}_r^{sT} P$
3: Local solve: $\mathcal{S}_{rr}^s v_1^s = P^s$
4: Compute: $v_2^s = \mathcal{S}_{cr}^s v_1^s$
5: Gather: $V_2 = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} v_2^s$
6: Global solve: $\mathcal{F}_{cc} V_3 = V_2$
7: Scatter: $v_3^s = \mathcal{B}_c^s V_3$
8: Compute: $v_4^s = \mathcal{S}_{rc}^s v_3^s$
9: Update: $v_5^s = P^s + v_4^s$
10: Local solve: $\mathcal{S}_{rr}^s Q^s = v_5^s$
11: Gather: $Q = \sum_{s=1}^{n_s} \mathcal{B}_r^s Q^s$
12: Output ($Q$)

The local solves in step 3 and step 10 of Algorithm (8) are performed by solving a subdomain-level Neumann problem, as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.

Algorithm 9 Dirichlet Preconditioner Effect Procedure
1: Input ($r_\Gamma$)
2: Scatter: $r_\Gamma^s = \mathcal{D}_r^s \mathcal{B}_r^{sT} r_\Gamma$
3: Compute: $Z^s = \mathcal{S}_{rr}^s r_\Gamma^s$
4: Gather: $Z = \sum_{s=1}^{n_s} \mathcal{B}_r^s \mathcal{D}_r^s Z^s$
5: Output ($Z$)

We summarize the parallel implementation of PCGM to solve the dual-primal interface problem in the following flowchart.

Figure 7. Flowchart of parallel PCGM to solve the dual-primal interface problem.

10 Numerical results

For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients, given as

\[ \frac{\partial}{\partial x}\!\left[ c_x(x, y, \theta)\, \frac{\partial u(x, y, \theta)}{\partial x} \right] + \frac{\partial}{\partial y}\!\left[ c_y(x, y, \theta)\, \frac{\partial u(x, y, \theta)}{\partial y} \right] = f(x, y) \quad \text{in } \Omega, \]

where the forcing term is $f(x, y) = 10$. For simplicity, a homogeneous Dirichlet boundary condition is imposed:

\[ u(x, y, \theta) = 0 \quad \text{on } \partial\Omega. \]

The random coefficients $c_x(x, y, \theta)$ and $c_y(x, y, \theta)$ are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
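For illustration only, the lognormal coefficients described above can be sampled as in the following short NumPy sketch (a plain Monte Carlo draw, not the Polynomial Chaos representation used in the paper).

```python
import numpy as np

rng = np.random.default_rng(4)

# Underlying Gaussian with mean 1.0 and standard deviation 0.25, as in the text;
# the diffusion coefficients c_x and c_y are independent lognormal variables.
g_x = rng.normal(loc=1.0, scale=0.25, size=10_000)
g_y = rng.normal(loc=1.0, scale=0.25, size=10_000)
c_x, c_y = np.exp(g_x), np.exp(g_y)

print(c_x.mean(), c_x.std())   # sample statistics of the lognormal coefficient
```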

In the PCGM implementation, the forcing term is taken to be the initial residual and the iterations are terminated when the ratio of the $L_2$ norms of the current and the initial residual is less than $10^{-5}$:

\[ \frac{\lVert \mathcal{G}_\Gamma - \mathcal{S}\, \mathcal{U}_\Gamma^{k} \rVert_2}{\lVert \mathcal{G}_\Gamma^{0} \rVert_2} \leq 10^{-5}. \]

Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.

10.1 Stochastic features

Finite element discretization with linear triangular elements results in 202,242 elements and 101,851 nodes. The random coefficients and the response are represented by a third-order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1,018,510. Fig. (8) shows a typical finite element mesh, while Fig. (9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. (10) and Fig. (11), respectively. Clearly, the maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. (12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. (18-23), the Polynomial Chaos coefficients of the Lagrange multipliers are shown.

Figure 8. A typical FEM mesh.  Figure 9. Mesh partitioning using METIS.

Figure 10. The mean of the solution field.  Figure 11. The standard deviation of the solution field.

Figure 12. Chaos coefficients $u_0$.  Figure 13. Chaos coefficients $u_1$.

10.2 Scalability study

Firstly, we study the scalability of the algorithms with respect to the problem size: we fix the number of subdomains used to solve the problem to 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and dual-primal algorithms. Simultaneously, increasing the Polynomial Chaos order from first order to third order does not affect the performance of the methods. Note that for a given spatial problem size $n$, using a first-order Polynomial Chaos expansion leads to a total problem size of $(3 \times n)$ and using a third-order Polynomial Chaos expansion leads to


Figure 14. Chaos coefficients $u_2$.  Figure 15. Chaos coefficients $u_3$.

Figure 16. Chaos coefficients $u_4$.  Figure 17. Chaos coefficients $u_5$.

Figure 18. Lagrange multipliers $\lambda_0$.  Figure 19. Lagrange multipliers $\lambda_1$.

a total problem size of $(10 \times n)$.

Secondly, we fix the problem size in the spatial domain to 71,386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table (2). The results are reported for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations

Figure 20. Lagrange multipliers $\lambda_2$.  Figure 21. Lagrange multipliers $\lambda_3$.

Figure 22. Lagrange multipliers $\lambda_4$.  Figure 23. Lagrange multipliers $\lambda_5$.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size    PDDM              DP-DDM
                1st  2nd  3rd     1st  2nd  3rd
10051           10   10   10      8    8    8
20303           11   11   11      8    8    8
40811           11   12   12      8    9    9
59935           13   14   14      10   10   10
71386           12   12   12      9    9    9
80172           11   11   12      8    8    8
101851          12   12   12      9    9    9

to converge than the primal method. This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than the starting initial residual in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs. (24-26).

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding

Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71,386 dof.

CPUs    PDDM              DP-DDM
        1st  2nd  3rd     1st  2nd  3rd
20      10   11   11      8    8    8
40      12   12   12      9    9    9
60      12   13   13      9    9    9
80      12   12   13      9    9    9
100     12   12   12      9    9    9
120     12   12   12      9    9    9
140     11   11   12      8    8    8
160     12   12   12      8    8    9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101,851 dof).

Subdomains    PDDM              DP-DDM
              1st  2nd  3rd     1st  2nd  3rd
100           10   10   10      8    8    8
200           10   10   11      8    8    8
400           12   13   13      9    9    9
600           11   12   12      8    8    9
800           12   13   13      9    9    9

more subdomains. Table (3) shows the performance of the primal and the dual-primal methods for first, second and third order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.

Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation ($CoV$) of the random parameters is varied from 0.05 to 0.50. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.

Finally, it is worth mentioning that the performances of the primal method and the dual-primal method demonstrate a similar trend, and this fact points out the similarity (duality) between the two methods through numerical experiments.

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed

Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.

Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.

Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.

primal method and the dual-primal method are numerically scalable with respect to the problem size, the subdomain size and the number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both the primal and the dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust, and thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.

Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101,851 dofs) and a fixed number of subdomains (100).

CoV     PDDM              DP-DDM
        1st  2nd  3rd     1st  2nd  3rd
0.05    10   10   10      8    8    8
0.10    10   10   10      8    8    8
0.15    10   10   10      8    8    8
0.20    10   10   10      8    8    8
0.25    10   10   10      8    8    8
0.30    10   10   11      8    8    8
0.35    10   10   11      8    8    8
0.40    10   11   11      8    8    9
0.45    10   11   12      8    8    9
0.50    10   11   12      8    8    9

References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin, Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250-271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 3: Primal and dual-primal iterative substructuring methods of stochastic

Neumann-Neumann preconditioner in [4] demonstrates a good (strong and weak) scalability forthe moderate range of CPUs considered

In this paper we first describe a primal version of iterative substructuring methods for thesolution of the large-scale linear system arising from stochastic finite element method Thealgorithm offers a straightforward approach to formulate a two-level scalable preconditionerThe continuity condition is strictly enforced on the corner nodes (nodes shared among morethan two subdomains including the nodes at the ends of the interface edges) For the remainingpart of the interface boundary the continuity condition is satisfied weakly (in an average sense)Note that the continuity of the solution field across the entire interface boundary is eventuallysatisfied at the convergence of the iterations This approach naturally leads to a coarse gridwhich connects the subdomains globally via the corner nodes The coarse grid provides amechanism to propagate information globally which makes the algorithm scalable with respectto subdomain size In the second part of the paper a dual-primal iterative substructuringmethod is introduced for SPDEs which maybe viewed as an extension of the Dual-Primal FiniteElement Tearing and Interconnecting method (FETI-DP) [5] in the context of SPDEs In thisapproach the continuity condition on the corner nodes is strictly satisfied by partial assemblywhile Lagrange multipliers are used to enforce continuity on the remaining part of the interfaceboundary A system of Lagrange multiplier (also called the dual variable) is solved iterativelyusing PCGM method equipped with Dirichlet preconditioner PETSc [6] and MPI [7] librariesare used for efficient parallel implementation of the primal and dual-primal algorithms Thegraph partitioning tool METIS [8] is used for optimal decomposition of the finite element meshfor load balancing and minimum interprocessor communication The parallel performance of thealgorithms is studied for a two dimensional stochastic elliptic PDE with non-Gaussian randomcoefficients The numerical experiments are performed using a Linux cluster

2 Uncertainty representation by stochastic processes

This section provides a brief review of the theories of stochastic processes relevant to thesubsequent developments of the paper [9 1] We assume the data induces a representation of themodel parameters as random variables and processes which span the Hilbert space HG A set ofbasis functions ξi is identified to characterize this space using Karhunen-Loeve expansion Thestate of the system resides in the Hilbert space HL with basis functions Ψi being identifiedwith the Polynomial Chaos expansion (PC) The Karhunen-Loeve expansion of a stochasticprocess α(x θ) is based on the spectral expansion of its covariance function Rαα(x y) Theexpansion takes the following form

α(x θ) = α(x) +infin

sum

i=1

radic

λiξi(θ)φi(x)

where α(x) is the mean of the stochastic process θ represents the random dimension andξi(θ) is a set of uncorrelated (but not generally independent for non-Gaussian processes) randomvariables φi(x) are the eigenfunctions and λi are the eigenvalues of the covariance kernel whichcan be obtained as the solution to the following integral equation

int

D

Rαα(x y)φi(y)dy = λiφi(x)

where D denotes the spatial dimension over which the process is defined The covariance functionof the solution process is not known a priori and hence the Karhunen-Loeve expansion cannot beused to represent it Therefore a generic basis that is complete in the space of all second-orderrandom variables will be identified and used in the approximation process Since the solution

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

2

process is a function of the material properties nodal solution variables denoted by u(θ) can beformally expressed as some nonlinear functional of the set ξi(θ) used to represent the materialstochasticity It has been shown that this functional dependence can be expanded in terms ofpolynomials in Gaussian random variables namely Polynomial Chaos [9] as

u(θ) =N

sum

j=0

Ψj(θ)uj

These polynomials are orthogonal in the sense that their inner product 〈ΨjΨk〉 defined asthe statistical average of their product is equal to zero for j 6= k

3 Review of Schur complement based domain decomposition method of SPDEs

Consider an elliptic stochastic PDE defined on a domain Ω with a given boundary conditionson partΩ Finite element discretization of the stochastic PDE leads to the following linear system

A(θ)u(θ) = f (1)

where A(θ) is the stiffness matrix with random coefficients u(θ) is the stochastic processrepresenting the response vector and f is the applied force For large-scale system Eq(1) canbe solved efficiently using domain decomposition method [1 2]

In domain decomposition method the spatial domain Ω is partitioned into ns non-overlappingsubdomains Ωs 1 le s le ns such that

Ω =

ns⋃

s=1

Ωs Ωs

Ωr = 0 s 6= r

andΓ =

s=1

Γs where Γs = partΩspartΩ

For a typical subdomain Ωs the nodal vector us(θ) is partitioned into a set of interior un-knowns us

I(θ) associated with nodes in the interior of Ωs and interface unknowns usΓ(θ) associated

with nodes that are shared among two or more subdomains as shown in Fig(1)

Consequently the subdomain equilibrium equation can be represented as

[

AsII(θ) As

IΓ (θ)As

ΓI(θ) AsΓΓ

(θ)

]

usI(θ)

usΓ(θ)

=

f sI

f sΓ

The Polynomial Chaos expansion can be used to represent the uncertainty in the modelparameters as

Lsum

i=0

Ψi

[

AsIIi As

IΓ i

AsΓIi As

ΓΓ i

]

usI(θ)

usΓ(θ)

=

f sI

f sΓ

A Boolean restriction operator Rs of size (nsΓtimes nΓ) which maps the global interface vector

uΓ (θ) to the local interface vector usΓ(θ) is defined as

usΓ (θ) = RsuΓ (θ)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

3

Figure 1 Partitioning domain nodes into interior () and interface ()

Enforcing the transmission conditions (compatibility and equilibrium) along the interfacesthe global equilibrium equation of the stochastic system can be expressed in the following blocklinear systems of equations

Lsum

i=0

Ψi

A1IIi 0 A1

IΓ iR1

0 Ans

IIi Ans

IΓ iRns

RT1 A1

ΓIi RTns

Ans

ΓIi

nssum

s=1

RTs As

ΓΓ iRs

u1I(θ)

uns

I (θ)uΓ (θ)

=

f1I

fns

Inssum

s=1

RTs f s

Γ

(2)

The solution process can be expanded using the same Polynomial Chaos basis as

u1I(θ)

uns

I (θ)uΓ (θ)

=N

sum

j=0

Ψj(θ)

u1Ij

uns

Ij

uΓ j

(3)

Substituting Eq(3) into Eq(2) and performing Galerkin projection to minimize the errorover the space spanned by the Polynomial Chaos basis [1] the following coupled deterministicsystems of equations is obtained

A1II 0 A1

IΓR1

0 Ans

II Ans

IΓRns

RT1 A

1ΓI RT

nsAns

ΓI

nssum

s=1

RTs A

sΓΓRs

U1I

Uns

I

=

F1I

Fns

Inssum

s=1

RTs F

(4)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

4

where

[Asαβ]jk =

Lsum

i=0

〈ΨiΨjΨk〉Asαβi Fs

αk = 〈Ψkfsα〉

UmI = (um

I 0 umI N )T UΓ = (uΓ 0 uΓ N )T

the subscripts α and β represent the index I and Γ The coefficient matrix in Eq(4) is of ordern(N +1)timesn(N +1) where n and (N +1) denote the total number of the degrees of freedom andchaos coefficients respectively The stochastic counterpart of the restriction operator in Eq(4)takes the following form

Rs = blockdiag(R0s R

Ns )

where (R0s R

Ns ) are the deterministic restriction operators In parallel implementation

Rs acts as a scatter operator while RTs acts as a gather operator and are not constructed explic-

itly

A block Gaussian elimination reduces the system in Eq(4) to the following extended Schurcomplement system for the interface variable UΓ

S UΓ = GΓ (5)

where the global extended Schur complement matrix S is given by

S =

nssum

s=1

RTs [As

ΓΓ minusAsΓI (A

sII)

minus1AsIΓ ]Rs

and the corresponding right hand vector GΓ is

GΓ =

nssum

s=1

RTs [Fs

Γ minusAsΓI (A

sII)

minus1FsI ]

Once the interface unknowns UΓ is available the interior unknowns can be obtainedconcurrently by solving the interior problem on each subdomain as

AsII Us

I = FsI minusAs

ΓIRsUΓ

4 Solution methods for the extended Schur complement system

Solution methods for linear systems are broadly categorized into direct methods and iterativemethods The direct methods generally are based on sparse Gaussian elimination technique andare popular for their robustness However they are expensive in computation time and memoryrequirements and therefore cannot be applied to the solution of large-scale linear systems [10]On the other hand the iterative methods generate a sequences of approximate solutions whichconverge to the true solutions In the iterative methods the main arithmetic operation is thematrix-vector multiplication Therefore the linear system itself need not be constructed explicitlyand only a procedure for matrix-vector product is required This property makes iterativemethods more suitable to parallel processing than direct methods

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

5

41 Preconditioned Conjugate Gradient Method (PCGM)Non-overlapping domain decomposition method or iterative substructuring can be viewed as apreconditioned iterative method to solve the Schur complement system of the form [11]

S UΓ = GΓ

For symmetric positive-definite system such as Schur complement system the ConjugateGradient Method (CGM) is generally used The performance of CGM mainly depends on thespectrum of the coefficient matrix However the rate of convergence of the iterative methodcan generally be improved by transforming the original system into an equivalent system thathas better spectral properties (ie lower condition number κ(S)) of the coefficient matrix Thistransformation is called preconditioning and the matrix used in the transformation is called thepreconditioner In other words the transformed linear system becomes

Mminus1S UΓ = Mminus1GΓ

In general κ(Mminus1S) is much smaller than κ(S) and the eigenvalues of Mminus1S are clusterednear one This procedure known as Preconditioned Conjugate Gradient Method (PCGM) Inpractice the explicit construction of Mminus1 is not needed Instead for a given vector rΓ a systemof the the following form is solved

MZ = rΓ

The PCGM algorithm to solve the Schur complement system proceeds as follows [10]

Algorithm 1 The PCGM Algorithm

1 Initialize UΓ0= 0

2 Compute rΓ0= GΓ minus S UΓ0

3 Precondition Z0 = Mminus1rΓ0

4 First search direction P0 = Z0

5 Initialize ρ0 = (rΓ0Z0)

6 For j = 0 1 middot middot middot until convergence Do

7 Qj = SPj

8 ρtmpj= (Qj Pj)

9 αj = ρjρtmpj

10 UΓj+1= UΓj

+ αjPj

11 rΓj+1= rΓj

minus αjQj

12 Zj+1 = Mminus1rΓj+1

13 ρj+1 = (rΓj+1Zj+1)

14 βj = ρj+1ρj

15 Pj+1 = Zj+1 + βjPj

16 EndDo

The PCGM algorithm indicates that the main arithmetic operations are calculating the prod-uct Q = SP in step 7 and the preconditioned residual Z = Mminus1rΓ in step 12 These operationscan be performed in parallel as outlined next

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

6

Given the subdomain Schur complement matrices Ss and a global vector P the matrix-vectorproduct Q = SP can be calculated in parallel as

Q =

nssum

s=1

RTs SsRsP

where ns is the number of subdomains and Rs and RTs are scatter and gather operator re-

spectively The parallel implementation of this procedure is summarized in Algorithm (2)

Algorithm 2 Parallel Matrix-Vector Product Procedure

1 Input (P)

2 Scatter Ps = RsP

3 Local operation Qs = SsPs

4 Gather Q =

nssum

s=1

RTs Q

s

5 Output (Q)

The working vectors Ps and Qs are defined on the subdomain level

Similarly the effect of a parallel preconditioner on a residual vector Z = Mminus1rΓ can becomputed as

Z =

nssum

s=1

RTs M

minus1s RsrΓ

This procedure is outlined in following algorithm

Algorithm 3 Parallel Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= RsrΓ

3 Local Solve MsZs = rs

Γ

4 Gather Z =

nssum

s=1

RTs Z

s

5 Output (Z)

The local preconditioner Ms and the working vectors rsΓ

and Zs are defined on the subdo-main level

5 Iterative substructuring methods of SPDEs

Next sections describe the primal and dual-primal substructuring methods in the context ofSPDEs In the primal method the interface problem is solved iteratively using PCGM solverequipped with a scalable preconditioner At each iteration of the iterative solver loop localproblems are solved on each subdomain in parallel These local problems are used to construct asubdomain level precondtioner Moreover a coarse problem is required to propagate information

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

7

globally across the subdomains This global exchange of information leads to a scalablepreconditioner In the dual-primal method a system of Lagrange multiplier that enforcescontinuity constraints across the interface boundary is solved iteratively using PCGM solverThe global coarse problem is already embedded in the operator of the Lagrange multiplier systemand therefore a one-level preconditioner such as lumped or Dirichlet is sufficient for scalability Aframework of the primal and dual-primal iterative substructuring methods for SPDEs is detailednext

6 A primal iterative substructuring method of SPDEs

In order to define local problems over each of the subdomains we partition the subdomainnodal vector us(θ) into a set of interior unknowns us

i (θ) corner unknowns usc(θ) and remaining

unknowns usr(θ) as schematically shown in Fig(2)

Figure 2 Partitioning domain nodes into interior () remaining () and corner(bull) nodes

According to this partitioning scheme the subdomain equilibrium equation can berepresented as

Asii(θ) As

ir(θ) Asic(θ)

Asri(θ) As

rr(θ) Asrc(θ)

Asci(θ) As

cr(θ) Ascc(θ)

usi (θ)

usr(θ)

usc(θ)

=

f si

f sr

f sc

The Polynomial Chaos representation of uncertain model parameters leads to the followingsubdomain equilibrium equation

Lsum

l=0

Ψl

Asiil As

irl Asicl

Asril As

rrl Asrcl

Ascil As

crl Asccl

usi (θ)

usr(θ)

usc(θ)

=

f si

f sr

f sc

(6)

The solution process is expressed using the same Polynomial Chaos basis as

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

8

usi (θ)

usr(θ)

usc(θ)

=

Nsum

j=0

Ψj(θ)

usij

usrj

uscj

(7)

Substituting Eq(7) into Eq(6) and performing Galerkin projection leads to the followingcoupled deterministic systems of equations

Asii As

ir Asic

Asri As

rr Asrc

Asci As

cr Ascc

Usi

Usr

Usc

=

Fsi

Fsr

Fsc

(8)

where

[Asαβ ]jk =

Lsum

l=0

〈ΨlΨjΨk〉Asαβl

Fsαk = 〈Ψkf

sα〉

Usα = (us

α0 middot middot middot usαN )T

the subscripts α and β represent the index i r and c

Enforcing the transmission conditions along the boundary interfaces the subdomainequilibrium equation can be expressed as

Asii As

irBsr As

icBsc

nssum

s=1

BsrTAs

ri

nssum

s=1

BsrTAs

rrBsr

nssum

s=1

BsrTAs

rcBsc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

crBsr

nssum

s=1

BscTAs

ccBsc

Usi

Ur

Uc

=

Fsi

nssum

s=1

BsrTFs

r

nssum

s=1

BscTFs

c

(9)

where Bsr is a Boolean rectangular matrix that maps the global remaining vector Ur to the

local remaining vector Usr as

Usr = Bs

rUr (10)

Similarly the restriction operator Bsc is a Boolean rectangular matrix that maps the global

corner vector Uc to the local corner vector Usc as

Usc = Bs

cUc (11)

In parallel implementation both Bsr and Bs

c act as scatter operators while BsrT and Bs

cT act

as gather operators

The first block equation in Eq(9) can be solved for Usi in parallel as

Usi = [As

ii]minus1(Fs

i minusAsirB

srUr minusAs

icBscUc) (12)

Substituting Eq(12) into Eq(9) leads to the following condensed system which representsthe Schur compliment system in Eq(5) but now the interface boundary nodes are split into

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

9

remaining and corner nodes as shown schematically in Fig(3)

nssum

s=1

BsrTSs

rrBsr

nssum

s=1

BsrTSs

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Ur

Uc

=

nssum

s=1

BsrTGs

r

nssum

s=1

BscTGs

c

(13)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

Figure 3 The interface boundary nodes split into remainder () and corner(bull) nodes

The corner nodal vector Uc in Eq(13) is eliminated next to obtain the following (symmetricpositive definite) reduced interface problem

(Frr minus Frc[Fcc]minus1Fcr)Ur = dr minus Frc[Fcc]

minus1dc (14)

where

Fαβ =

nssum

s=1

Bsα

TSsαβB

dα =

nssum

s=1

Bsα

TGsα

and α and β denotes subscripts r and c The above system can be solved using PCGM withan appropriate preconditioner Mminus1 defined in the next section

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

10

61 A two-level preconditionerAs mentioned previously the continuity condition at the corner nodes is enforced strictly whilethose for the remaining interface boundary nodes is satisfied in a weak sense This fact isschematically illustrated in Fig(4) [12] As the iterations converge the continuity conditionat all interface nodes (both corner and remaining boundary nodes) is satisfied strictly Theassembly of the unknown vector at the corner nodes leads to the following partially assembledSchur complement system

Figure 4 Partial assembly of corner nodes (bull)

Ssrr Ss

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Usr

Uc

=

Fsr

0

(15)

where

Fsr = Ds

rBsrrj

and rj is the residual at the jth iteration of the PCGM and Dsr represents a block diagonal

weighting matrix which satisfies the following property

nssum

s=1

BsrTDs

rBsr = I

The diagonal entries of each block of Dsr are the reciprocal of the number of subdomains that

share the interface boundary nodes

The subdomain level remaining unknown vector Usr can be eliminated in parallel from Eq(15)

as

Usr = [Ss

rr]minus1(Fs

r minus SsrcB

scUc) (16)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

11

Substituting Usr into the second block of Eq(15) leads to the following coarse problem

F lowast

ccUc = dlowastc (17)

where

F lowast

cc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

dlowastc = minus

nssum

s=1

BscTSs

cr[Ssrr]

minus1Fsr

The continuity of the solution field over the remaining interface nodes is satisfied next byaveraging the local results as

Ur =

nssum

s=1

BsrTDs

rUsr

After some algebraic manipulations the preconditioner can be expressed as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr + RT

0 [F lowast

cc]minus1R0 (18)

where

R0 =

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

7 A dual-primal iterative substructuring method of SPDEs

In this section the dual-primal domain decomposition method is introduced in the context ofstochastic PDEs This approach is an extension of FETI-DP [5] for SPDEs In this approachthe continuity condition at the corner nodes is enforced strictly and Lagrange multipliers areused to enforce the continuity condition weakly over the remaining interface nodes

Partial assembly of Eq(8) leads to the following subdomain equilibrium equation

Asii As

ir AsicB

sc

Asri As

rr AsrcB

sc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc

Usi

Usr

Uc

=

Fsi

Fsr

nssum

s=1

BscTFs

c

(19)

where Bsc is a Boolean restriction operator that maps the global corner vector Uc to the local

corner vector Usc as

Usc = Bs

cUc

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

12

Eq(19) can be rewritten in compact form as

AsUs = Fs (20)

Let Bsr be a block diagonal signed Boolean continuity matrix defined as

s=nssum

s=1

BsrU

sr = 0

Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as

1

2UT AU minus UT F rarr min (21)

subject to BU = 0

where

A =

A1

As

Ans

U =

U1

Us

Uns

F =

F1

Fs

Fns

B =[

(0 B1r 0) middot middot middot (0 Bs

r 0) middot middot middot (0 Bnsr 0)

]

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as

L(U Λ) =1

2UT AU minus UT F + UT BT Λ (22)

Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system

Asii As

ir AsicB

sc 0

Asri As

rr AsrcB

sc Bs

rT

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc 0

0

nssum

s=1

Bsr 0 0

Usi

Usr

Uc

Λ

=

Fsi

Fsr

nssum

s=1

BscTFs

c

0

(23)

where

Λ =

λ0

λN

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

13

Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary

and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)

Eliminating the interior unknowns Usi from Eq(23) as

Usi = [As

ii]minus1(Fs

i minusAsirU

sr minusAs

icBscUc) (24)

Substituting Eq(24) into Eq(23) leads to

Ssrr Ss

rcBsc Bs

rT

nssum

s=1

BscTSs

cr

nssum

s=1

BscTSs

ccBsc 0

nssum

s=1

Bsr 0 0

Usr

Uc

Λ

=

Gsr

nssum

s=1

BscTGs

c

0

(25)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)

as

Usr = [Ss

rr]minus1(Gs

r minus SsrcB

scUc minus Bs

rT Λ) (26)

Substituting Eq(26) into Eq(25) leads to

[

Fcc minusFcr

Frc Frr

]

Uc

Λ

=

dc

dr

(27)

where

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

14

Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Fcr =

nssum

s=1

BscTSs

cr[Ssrr]

minus1BsrT

Frc =

nssum

s=1

Bsr [S

srr]

minus1SsrcB

sc

Frr =

nssum

s=1

Bsr [S

srr]

minus1BsrT

dc =

nssum

s=1

BscT (Gs

c minus Sscr[S

srr]

minus1Gsr)

dr =

nssum

s=1

Bsr [S

srr]

minus1Gsr

Solving for Uc from Eq(27) gives the following coarse problem

FccUc = (dc + FcrΛ) (28)

Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system

(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]

minus1dc (29)

Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as

M =

nssum

s=1

BsrD

srS

srrD

srB

srT (30)

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as

F lowast

cc = Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr

+

nssum

s=1

BsrTDs

r[Ssrr]

minus1SrcBsc

[

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

]minus1

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

which has the same form of the dual-primal operator in Eq(29)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

15

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

9.1 Primal method

In this subsection we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq(14).

For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as

Q_j = (F_{rr} - F_{rc} [F_{cc}]^{-1} F_{cr}) P_j

can be computed using the following algorithm.

Algorithm 4 Parallel Matrix-Vector Product for Primal Method

1. Input (P)
2. Scatter: P^s = B^s_r P
3. Compute: v^s_1 = S^s_{cr} P^s
4. Gather: V_1 = \sum_{s=1}^{n_s} B^{sT}_c v^s_1
5. Global Solve: F_{cc} V_2 = V_1
6. Scatter: v^s_2 = B^s_c V_2
7. Compute: v^s_3 = S^s_{rc} v^s_2
8. Update: Q^s = S^s_{rr} P^s - v^s_3
9. Gather: Q = \sum_{s=1}^{n_s} B^{sT}_r Q^s
10. Output (Q)
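A serial NumPy sketch of Algorithm (4) is given below, with the MPI scatter and gather steps replaced by an explicit loop over subdomains. The data layout is an assumption for illustration: subs[s] is a dictionary holding dense subdomain Schur blocks Srr, Src, Scr, while Br and Bc are the Boolean restriction operators and Fcc the assembled coarse operator; the dense coarse solve stands in for the inner iterative solve of step 5.

```python
import numpy as np

def primal_matvec(P, subs, Br, Bc, Fcc):
    # Q = (F_rr - F_rc [F_cc]^{-1} F_cr) P, computed subdomain by subdomain (Algorithm 4)
    ns = len(subs)
    # steps 2-4: scatter P, apply S^s_cr locally, gather the corner-space vector V1
    V1 = sum(Bc[s].T @ (subs[s]["Scr"] @ (Br[s] @ P)) for s in range(ns))
    # step 5: global coarse solve F_cc V2 = V1 (dense stand-in for the inner PCGM solve)
    V2 = np.linalg.solve(Fcc, V1)
    # steps 6-9: scatter V2, apply S^s_rc and S^s_rr locally, gather Q
    Q = np.zeros_like(P)
    for s in range(ns):
        Ps = Br[s] @ P
        v3 = subs[s]["Src"] @ (Bc[s] @ V2)
        Q += Br[s].T @ (subs[s]["Srr"] @ Ps - v3)
    return Q
```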

Multiplication of a Schur complement matrix by a vector in step 3, step 7 and step 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem as

v^s_{\alpha} = S^s_{\alpha\beta} v^s_{\beta}

v^s_{\alpha} = (A^s_{\alpha\beta} - A^s_{\alpha i} [A^s_{ii}]^{-1} A^s_{i\beta}) v^s_{\beta}

This procedure is outlined in the following algorithm

Algorithm 5 Dirichlet Solver Procedure

1. Input (v^s_\beta)
2. Compute: v^s_1 = A^s_{i\beta} v^s_\beta
3. Solve: A^s_{ii} v^s_2 = v^s_1
4. Compute: v^s_3 = A^s_{\alpha i} v^s_2
5. Compute: v^s_4 = A^s_{\alpha\beta} v^s_\beta
6. Compute: v^s_\alpha = v^s_4 - v^s_3
7. Output (v^s_\alpha)
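The Dirichlet solver applies a subdomain Schur block without ever forming it. A minimal sketch of Algorithm (5) follows, assuming the subdomain matrix is available as a dense array ordered as (interior, remaining, corner) and that the interior solve of step 3 would in practice reuse a prefactorized A^s_{ii}.

```python
import numpy as np

def dirichlet_apply(A, ni, alpha, beta, v_beta):
    # v_alpha = (A_ab - A_ai [A_ii]^{-1} A_ib) v_beta; alpha and beta are slices into the r or c blocks
    i = slice(0, ni)
    v1 = A[i, beta] @ v_beta            # step 2
    v2 = np.linalg.solve(A[i, i], v1)   # step 3: interior (Dirichlet) solve
    v3 = A[alpha, i] @ v2               # step 4
    v4 = A[alpha, beta] @ v_beta        # step 5
    return v4 - v3                      # step 6
```

For example, the product S^s_{rr} v of step 8 in Algorithm (4) is obtained by calling this routine with both alpha and beta set to the remaining-dof slice.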

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner as

M^{-1}_{cc} F_{cc} V_2 = M^{-1}_{cc} V_1

where

M^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c

Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.

Algorithm 6 Two-Level Preconditioner Effect Procedure

1. Input (r_\Gamma)
2. Scatter: F^s_r = D^s_r B^s_r r_\Gamma
3. Local Solve: S^s_{rr} v^s_1 = F^s_r
4. Compute: d^s_c = S^s_{cr} v^s_1
5. Gather: d_c = \sum_{s=1}^{n_s} B^{sT}_c d^s_c
6. Global Solve: F^{*}_{cc} Z_c = -d_c
7. Scatter: Z^s_c = B^s_c Z_c
8. Update: v^s_2 = F^s_r + S^s_{rc} Z^s_c
9. Local Solve: S^s_{rr} Z^s_f = v^s_2
10. Gather: Z = \sum_{s=1}^{n_s} B^{sT}_r D^s_r Z^s_f
11. Output (Z)
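A serial sketch of the two-level preconditioner application of Algorithm (6) is given below, under the same assumed data layout as the matrix-vector product sketch above (dense subdomain Schur blocks in subs, weighting matrices Dr, Boolean maps Br and Bc, and the assembled coarse operator Fcc_star). The dense local and coarse solves stand in for factorized Neumann solves and the inner coarse PCGM.

```python
import numpy as np

def two_level_precond(r_gamma, subs, Br, Bc, Dr, Fcc_star):
    # apply the two-level primal preconditioner of Algorithm (6) to a residual r_gamma
    ns = len(subs)
    Fr, v1 = [], []
    dc = 0.0
    for s in range(ns):
        Fr_s = Dr[s] @ (Br[s] @ r_gamma)                 # step 2: weighted scatter
        v1_s = np.linalg.solve(subs[s]["Srr"], Fr_s)     # step 3: local Neumann solve
        dc = dc + Bc[s].T @ (subs[s]["Scr"] @ v1_s)      # steps 4-5: gather the coarse rhs
        Fr.append(Fr_s); v1.append(v1_s)
    Zc = np.linalg.solve(Fcc_star, -dc)                  # step 6: global coarse solve
    Z = np.zeros_like(r_gamma)
    for s in range(ns):
        v2_s = Fr[s] + subs[s]["Src"] @ (Bc[s] @ Zc)     # steps 7-8
        Zf_s = np.linalg.solve(subs[s]["Srr"], v2_s)     # step 9: second local Neumann solve
        Z += Br[s].T @ (Dr[s] @ Zf_s)                    # step 10: weighted gather
    return Z
```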

The local solves in step 3 and step 9 of Algorithm (6) constitute a subdomain-level Neumann problem of the form S^s_{rr} U^s_r = r^s_r, which can be solved using the following algorithm.

Algorithm 7 Neumann Solver Procedure

1. Input (r^s_r)
2. Solve:
\begin{bmatrix} A^s_{ii} & A^s_{ir} \\ A^s_{ri} & A^s_{rr} \end{bmatrix}
\begin{Bmatrix} X^s \\ U^s_r \end{Bmatrix}
=
\begin{Bmatrix} 0 \\ r^s_r \end{Bmatrix}
3. Output (U^s_r)
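The Neumann solve of Algorithm (7) applies [S^s_{rr}]^{-1} through the assembled subdomain problem. A dense sketch follows; in practice a prefactorized sparse solver would be reused across iterations.

```python
import numpy as np

def neumann_solve(Aii, Air, Ari, Arr, rr):
    # solve [A_ii A_ir; A_ri A_rr] [x; u_r] = [0; r_r] and return u_r = [S_rr]^{-1} r_r
    n_i = Aii.shape[0]
    K = np.block([[Aii, Air], [Ari, Arr]])
    rhs = np.concatenate([np.zeros(n_i), rr])
    sol = np.linalg.solve(K, rhs)
    return sol[n_i:]
```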


The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner as

M^{-1}_{cc} F^{*}_{cc} Z_c = -M^{-1}_{cc} d_c

where

M^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c

Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the following flow chart.

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

9.2 Dual-primal method

In this subsection we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem in Eq(29).

For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as

Q_j = (F_{rr} + F_{rc} [F_{cc}]^{-1} F_{cr}) P_j

can be computed using the following algorithm.

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1. Input (P)
2. Scatter: P^s = B^{sT}_r P
3. Local Solve: S^s_{rr} v^s_1 = P^s
4. Compute: v^s_2 = S^s_{cr} v^s_1
5. Gather: V_2 = \sum_{s=1}^{n_s} B^{sT}_c v^s_2
6. Global Solve: F_{cc} V_3 = V_2
7. Scatter: v^s_3 = B^s_c V_3
8. Compute: v^s_4 = S^s_{rc} v^s_3
9. Update: v^s_5 = P^s + v^s_4
10. Local Solve: S^s_{rr} Q^s = v^s_5
11. Gather: Q = \sum_{s=1}^{n_s} B^s_r Q^s
12. Output (Q)
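A serial sketch of the dual-primal matrix-vector product of Algorithm (8) is shown below, under the same assumed data layout as the primal sketch; here Br are the signed Boolean jump operators acting on the Lagrange multiplier space, and the dense solves again stand in for factorized local solves and the inner coarse PCGM.

```python
import numpy as np

def dual_primal_matvec(P, subs, Br, Bc, Fcc):
    # Q = (F_rr + F_rc [F_cc]^{-1} F_cr) P for the dual-primal method (Algorithm 8)
    ns = len(subs)
    Ps, v1 = [], []
    V2 = 0.0
    for s in range(ns):
        Ps_s = Br[s].T @ P                               # step 2: map multipliers to local dofs
        v1_s = np.linalg.solve(subs[s]["Srr"], Ps_s)     # step 3: local Neumann solve
        V2 = V2 + Bc[s].T @ (subs[s]["Scr"] @ v1_s)      # steps 4-5
        Ps.append(Ps_s); v1.append(v1_s)
    V3 = np.linalg.solve(Fcc, V2)                        # step 6: global coarse solve
    Q = np.zeros_like(P)
    for s in range(ns):
        v4_s = subs[s]["Src"] @ (Bc[s] @ V3)             # steps 7-8
        Qs = np.linalg.solve(subs[s]["Srr"], Ps[s] + v4_s)   # steps 9-10
        Q += Br[s] @ Qs                                  # step 11
    return Q
```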

The local solves in step 3 and step 10 of Algorithm (8) are computed by solving a subdomain-level Neumann problem, as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for solving the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1. Input (r_\Gamma)
2. Scatter: r^s_\Gamma = D^s_r B^{sT}_r r_\Gamma
3. Compute: Z^s = S^s_{rr} r^s_\Gamma
4. Gather: Z = \sum_{s=1}^{n_s} B^s_r D^s_r Z^s
5. Output (Z)
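A sketch of Algorithm (9), again with assumed dense data; in practice the product S^s_{rr} r^s_\Gamma in step 3 is evaluated with the Dirichlet solver of Algorithm (5) rather than with a stored Schur complement.

```python
import numpy as np

def dirichlet_precond(r_gamma, subs, Br, Dr):
    # apply the Dirichlet preconditioner of Eq (30) / Algorithm (9) to a multiplier residual
    Z = np.zeros_like(r_gamma)
    for s in range(len(subs)):
        rs = Dr[s] @ (Br[s].T @ r_gamma)     # step 2: weighted map to local remaining dofs
        Zs = subs[s]["Srr"] @ rs             # step 3: local Schur (Dirichlet) product
        Z += Br[s] @ (Dr[s] @ Zs)            # step 4: weighted gather
    return Z
```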

We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the following flow chart.

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem


10 Numerical results

For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson's equation with randomly heterogeneous coefficients given as

\frac{\partial}{\partial x}\left[ c_x(x, y, \theta) \frac{\partial u(x, y, \theta)}{\partial x} \right] + \frac{\partial}{\partial y}\left[ c_y(x, y, \theta) \frac{\partial u(x, y, \theta)}{\partial y} \right] = f(x, y) \quad \text{in } \Omega

where the forcing term is

f(x, y) = 1.0

For simplicity, a homogeneous Dirichlet boundary condition is imposed as

u(x, y, \theta) = 0 \quad \text{on } \partial\Omega

The random coefficients c_x(x, y, \theta) and c_y(x, y, \theta) are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.

In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the L_2 norms of the current and the initial residual is less than 10^{-5}:

\frac{\| G_\Gamma - S U^k_\Gamma \|_2}{\| G^0_\Gamma \|_2} \le 10^{-5}
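A compact serial version of Algorithm (1) with this relative-residual stopping test is sketched below; matvec and precond are callables that would wrap, for example, Algorithms (4) and (6) for the primal interface problem or Algorithms (8) and (9) for the dual-primal one. The function signature and the maximum iteration count are illustrative assumptions.

```python
import numpy as np

def pcgm(matvec, precond, G, tol=1.0e-5, max_it=200):
    # PCGM with the relative-residual stopping test ||G - S U_k||_2 / ||G^0||_2 <= tol
    U = np.zeros_like(G)
    r = G - matvec(U)              # with U_0 = 0 the initial residual equals the forcing term
    r0_norm = np.linalg.norm(r)
    Z = precond(r)
    P = Z.copy()
    rho = r @ Z
    for _ in range(max_it):
        if np.linalg.norm(r) / r0_norm <= tol:
            break
        Q = matvec(P)              # step 7: operator action via subdomain and coarse solves
        alpha = rho / (P @ Q)
        U += alpha * P
        r -= alpha * Q
        Z = precond(r)             # step 12: preconditioner action
        rho_new = r @ Z
        P = Z + (rho_new / rho) * P
        rho = rho_new
    return U
```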

Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (2 Quad-Core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.

10.1 Stochastic features

Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig(8) shows a typical finite element mesh, while Fig(9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig(10) and Fig(11), respectively. Clearly, the maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs(12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs(18-23) the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
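The bookkeeping behind these numbers follows the standard Polynomial Chaos counting formula: with m random variables and expansion order p, the number of chaos terms is N + 1 = (m + p)!/(m! p!), and the block linear system has order n(N + 1). The short sketch below reproduces the 3 × n, 10 × n and 1018510 figures under the assumption m = 2 (one random variable each for c_x and c_y); that value of m is inferred from the quoted sizes rather than stated explicitly in this section.

```python
from math import comb

n_spatial = 101851            # spatial dofs of the fine mesh quoted above
m = 2                         # assumed number of random variables (c_x and c_y)
for p in (1, 2, 3):           # Polynomial Chaos order
    n_pc = comb(m + p, p)     # number of chaos terms, N + 1
    print(f"order {p}: N+1 = {n_pc:2d}, total system order = {n_spatial * n_pc}")
# third order gives N+1 = 10 and a system of order 1018510, as quoted in the text
```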


Figure 8 A typical FEM mesh
Figure 9 Mesh partitioning using METIS
Figure 10 The mean of the solution field
Figure 11 The standard deviation of the solution field
Figure 12 Chaos coefficients u_0
Figure 13 Chaos coefficients u_1

10.2 Scalability study

Firstly, we study the scalability of the algorithms with respect to the problem size, where we fix the number of subdomains used to solve the problem to 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table(1). Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first order to third order does not affect the performance of the methods. Note that for a given spatial problem size (n), using the first order Polynomial Chaos expansion leads to a total problem size of (3 × n) and using the third order Polynomial Chaos expansion leads to a total problem size of (10 × n).


Figure 14 Chaos coefficients u_2
Figure 15 Chaos coefficients u_3
Figure 16 Chaos coefficients u_4
Figure 17 Chaos coefficients u_5
Figure 18 Lagrange multipliers λ_0
Figure 19 Lagrange multipliers λ_1

Secondly, we fix the problem size in the spatial domain to 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table(2). The results are reported for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains.


Figure 20 Lagrange multipliers λ_2
Figure 21 Lagrange multipliers λ_3
Figure 22 Lagrange multipliers λ_4
Figure 23 Lagrange multipliers λ_5

Table 1 Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100)

Problem size    PDDM                DP-DDM
                1st   2nd   3rd     1st   2nd   3rd
10051           10    10    10      8     8     8
20303           11    11    11      8     8     8
40811           11    12    12      8     9     9
59935           13    14    14      10    10    10
71386           12    12    12      9     9     9
80172           11    11    12      8     8     8
101851          12    12    12      9     9     9

Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than the starting initial residual in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs(24-26).

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains.


Table 2 Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof

CPUs    PDDM                DP-DDM
        1st   2nd   3rd     1st   2nd   3rd
20      10    11    11      8     8     8
40      12    12    12      9     9     9
60      12    13    13      9     9     9
80      12    12    13      9     9     9
100     12    12    12      9     9     9
120     12    12    12      9     9     9
140     11    11    12      8     8     8
160     12    12    12      8     8     9

Table 3 Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof)

Subdomains    PDDM                DP-DDM
              1st   2nd   3rd     1st   2nd   3rd
100           10    10    10      8     8     8
200           10    10    11      8     8     8
400           12    13    13      9     9     9
600           11    12    12      8     8     9
800           12    13    13      9     9     9

Table (3) shows the performance of the primal and the dual-primal methods for first, second and third order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.

Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations is nearly constant.

Finally, it is worth mentioning that the performances of the primal method and the dual-primal method demonstrate a similar trend, and this fact points out the similarity (duality) between the two methods through numerical experiments.

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner.


Figure 24 The relative PCGM residual history for the case of 160 subdomains and first PC order
Figure 25 The relative PCGM residual history for the case of 160 subdomains and second PC order
Figure 26 The relative PCGM residual history for the case of 160 subdomains and third PC order

The numerical experiments illustrate that the proposed primal method and the dual-primal method are numerically scalable with respect to problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables are required to prescribe uncertainty in the input parameters. This aspect is currently being investigated by the authors.

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust, and Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.


Table 4 Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101851 dofs) and fixed number of subdomains (100)

CoV     PDDM                DP-DDM
        1st   2nd   3rd     1st   2nd   3rd
0.05    10    10    10      8     8     8
0.10    10    10    10      8     8     8
0.15    10    10    10      8     8     8
0.20    10    10    10      8     8     8
0.25    10    10    10      8     8     8
0.30    10    10    11      8     8     8
0.35    10    10    11      8     8     8
0.40    10    11    11      8     8     9
0.45    10    11    12      8     8     9
0.50    10    11    12      8     8     9

References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications Accepted
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Springer Berlin Heidelberg) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250-271


Page 4: Primal and dual-primal iterative substructuring methods of stochastic

process is a function of the material properties nodal solution variables denoted by u(θ) can beformally expressed as some nonlinear functional of the set ξi(θ) used to represent the materialstochasticity It has been shown that this functional dependence can be expanded in terms ofpolynomials in Gaussian random variables namely Polynomial Chaos [9] as

u(θ) =N

sum

j=0

Ψj(θ)uj

These polynomials are orthogonal in the sense that their inner product 〈ΨjΨk〉 defined asthe statistical average of their product is equal to zero for j 6= k

3 Review of Schur complement based domain decomposition method of SPDEs

Consider an elliptic stochastic PDE defined on a domain Ω with a given boundary conditionson partΩ Finite element discretization of the stochastic PDE leads to the following linear system

A(θ)u(θ) = f (1)

where A(θ) is the stiffness matrix with random coefficients u(θ) is the stochastic processrepresenting the response vector and f is the applied force For large-scale system Eq(1) canbe solved efficiently using domain decomposition method [1 2]

In domain decomposition method the spatial domain Ω is partitioned into ns non-overlappingsubdomains Ωs 1 le s le ns such that

Ω =

ns⋃

s=1

Ωs Ωs

Ωr = 0 s 6= r

andΓ =

s=1

Γs where Γs = partΩspartΩ

For a typical subdomain Ωs the nodal vector us(θ) is partitioned into a set of interior un-knowns us

I(θ) associated with nodes in the interior of Ωs and interface unknowns usΓ(θ) associated

with nodes that are shared among two or more subdomains as shown in Fig(1)

Consequently the subdomain equilibrium equation can be represented as

[

AsII(θ) As

IΓ (θ)As

ΓI(θ) AsΓΓ

(θ)

]

usI(θ)

usΓ(θ)

=

f sI

f sΓ

The Polynomial Chaos expansion can be used to represent the uncertainty in the modelparameters as

Lsum

i=0

Ψi

[

AsIIi As

IΓ i

AsΓIi As

ΓΓ i

]

usI(θ)

usΓ(θ)

=

f sI

f sΓ

A Boolean restriction operator Rs of size (nsΓtimes nΓ) which maps the global interface vector

uΓ (θ) to the local interface vector usΓ(θ) is defined as

usΓ (θ) = RsuΓ (θ)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

3

Figure 1 Partitioning domain nodes into interior () and interface ()

Enforcing the transmission conditions (compatibility and equilibrium) along the interfacesthe global equilibrium equation of the stochastic system can be expressed in the following blocklinear systems of equations

Lsum

i=0

Ψi

A1IIi 0 A1

IΓ iR1

0 Ans

IIi Ans

IΓ iRns

RT1 A1

ΓIi RTns

Ans

ΓIi

nssum

s=1

RTs As

ΓΓ iRs

u1I(θ)

uns

I (θ)uΓ (θ)

=

f1I

fns

Inssum

s=1

RTs f s

Γ

(2)

The solution process can be expanded using the same Polynomial Chaos basis as

u1I(θ)

uns

I (θ)uΓ (θ)

=N

sum

j=0

Ψj(θ)

u1Ij

uns

Ij

uΓ j

(3)

Substituting Eq(3) into Eq(2) and performing Galerkin projection to minimize the errorover the space spanned by the Polynomial Chaos basis [1] the following coupled deterministicsystems of equations is obtained

A1II 0 A1

IΓR1

0 Ans

II Ans

IΓRns

RT1 A

1ΓI RT

nsAns

ΓI

nssum

s=1

RTs A

sΓΓRs

U1I

Uns

I

=

F1I

Fns

Inssum

s=1

RTs F

(4)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

4

where

[Asαβ]jk =

Lsum

i=0

〈ΨiΨjΨk〉Asαβi Fs

αk = 〈Ψkfsα〉

UmI = (um

I 0 umI N )T UΓ = (uΓ 0 uΓ N )T

the subscripts α and β represent the index I and Γ The coefficient matrix in Eq(4) is of ordern(N +1)timesn(N +1) where n and (N +1) denote the total number of the degrees of freedom andchaos coefficients respectively The stochastic counterpart of the restriction operator in Eq(4)takes the following form

Rs = blockdiag(R0s R

Ns )

where (R0s R

Ns ) are the deterministic restriction operators In parallel implementation

Rs acts as a scatter operator while RTs acts as a gather operator and are not constructed explic-

itly

A block Gaussian elimination reduces the system in Eq(4) to the following extended Schurcomplement system for the interface variable UΓ

S UΓ = GΓ (5)

where the global extended Schur complement matrix S is given by

S =

nssum

s=1

RTs [As

ΓΓ minusAsΓI (A

sII)

minus1AsIΓ ]Rs

and the corresponding right hand vector GΓ is

GΓ =

nssum

s=1

RTs [Fs

Γ minusAsΓI (A

sII)

minus1FsI ]

Once the interface unknowns UΓ is available the interior unknowns can be obtainedconcurrently by solving the interior problem on each subdomain as

AsII Us

I = FsI minusAs

ΓIRsUΓ

4 Solution methods for the extended Schur complement system

Solution methods for linear systems are broadly categorized into direct methods and iterativemethods The direct methods generally are based on sparse Gaussian elimination technique andare popular for their robustness However they are expensive in computation time and memoryrequirements and therefore cannot be applied to the solution of large-scale linear systems [10]On the other hand the iterative methods generate a sequences of approximate solutions whichconverge to the true solutions In the iterative methods the main arithmetic operation is thematrix-vector multiplication Therefore the linear system itself need not be constructed explicitlyand only a procedure for matrix-vector product is required This property makes iterativemethods more suitable to parallel processing than direct methods

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

5

41 Preconditioned Conjugate Gradient Method (PCGM)Non-overlapping domain decomposition method or iterative substructuring can be viewed as apreconditioned iterative method to solve the Schur complement system of the form [11]

S UΓ = GΓ

For symmetric positive-definite system such as Schur complement system the ConjugateGradient Method (CGM) is generally used The performance of CGM mainly depends on thespectrum of the coefficient matrix However the rate of convergence of the iterative methodcan generally be improved by transforming the original system into an equivalent system thathas better spectral properties (ie lower condition number κ(S)) of the coefficient matrix Thistransformation is called preconditioning and the matrix used in the transformation is called thepreconditioner In other words the transformed linear system becomes

Mminus1S UΓ = Mminus1GΓ

In general κ(Mminus1S) is much smaller than κ(S) and the eigenvalues of Mminus1S are clusterednear one This procedure known as Preconditioned Conjugate Gradient Method (PCGM) Inpractice the explicit construction of Mminus1 is not needed Instead for a given vector rΓ a systemof the the following form is solved

MZ = rΓ

The PCGM algorithm to solve the Schur complement system proceeds as follows [10]

Algorithm 1 The PCGM Algorithm

1 Initialize UΓ0= 0

2 Compute rΓ0= GΓ minus S UΓ0

3 Precondition Z0 = Mminus1rΓ0

4 First search direction P0 = Z0

5 Initialize ρ0 = (rΓ0Z0)

6 For j = 0 1 middot middot middot until convergence Do

7 Qj = SPj

8 ρtmpj= (Qj Pj)

9 αj = ρjρtmpj

10 UΓj+1= UΓj

+ αjPj

11 rΓj+1= rΓj

minus αjQj

12 Zj+1 = Mminus1rΓj+1

13 ρj+1 = (rΓj+1Zj+1)

14 βj = ρj+1ρj

15 Pj+1 = Zj+1 + βjPj

16 EndDo

The PCGM algorithm indicates that the main arithmetic operations are calculating the prod-uct Q = SP in step 7 and the preconditioned residual Z = Mminus1rΓ in step 12 These operationscan be performed in parallel as outlined next

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

6

Given the subdomain Schur complement matrices Ss and a global vector P the matrix-vectorproduct Q = SP can be calculated in parallel as

Q =

nssum

s=1

RTs SsRsP

where ns is the number of subdomains and Rs and RTs are scatter and gather operator re-

spectively The parallel implementation of this procedure is summarized in Algorithm (2)

Algorithm 2 Parallel Matrix-Vector Product Procedure

1 Input (P)

2 Scatter Ps = RsP

3 Local operation Qs = SsPs

4 Gather Q =

nssum

s=1

RTs Q

s

5 Output (Q)

The working vectors Ps and Qs are defined on the subdomain level

Similarly the effect of a parallel preconditioner on a residual vector Z = Mminus1rΓ can becomputed as

Z =

nssum

s=1

RTs M

minus1s RsrΓ

This procedure is outlined in following algorithm

Algorithm 3 Parallel Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= RsrΓ

3 Local Solve MsZs = rs

Γ

4 Gather Z =

nssum

s=1

RTs Z

s

5 Output (Z)

The local preconditioner Ms and the working vectors rsΓ

and Zs are defined on the subdo-main level

5 Iterative substructuring methods of SPDEs

Next sections describe the primal and dual-primal substructuring methods in the context ofSPDEs In the primal method the interface problem is solved iteratively using PCGM solverequipped with a scalable preconditioner At each iteration of the iterative solver loop localproblems are solved on each subdomain in parallel These local problems are used to construct asubdomain level precondtioner Moreover a coarse problem is required to propagate information

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

7

globally across the subdomains This global exchange of information leads to a scalablepreconditioner In the dual-primal method a system of Lagrange multiplier that enforcescontinuity constraints across the interface boundary is solved iteratively using PCGM solverThe global coarse problem is already embedded in the operator of the Lagrange multiplier systemand therefore a one-level preconditioner such as lumped or Dirichlet is sufficient for scalability Aframework of the primal and dual-primal iterative substructuring methods for SPDEs is detailednext

6 A primal iterative substructuring method of SPDEs

In order to define local problems over each of the subdomains we partition the subdomainnodal vector us(θ) into a set of interior unknowns us

i (θ) corner unknowns usc(θ) and remaining

unknowns usr(θ) as schematically shown in Fig(2)

Figure 2 Partitioning domain nodes into interior () remaining () and corner(bull) nodes

According to this partitioning scheme the subdomain equilibrium equation can berepresented as

Asii(θ) As

ir(θ) Asic(θ)

Asri(θ) As

rr(θ) Asrc(θ)

Asci(θ) As

cr(θ) Ascc(θ)

usi (θ)

usr(θ)

usc(θ)

=

f si

f sr

f sc

The Polynomial Chaos representation of uncertain model parameters leads to the followingsubdomain equilibrium equation

Lsum

l=0

Ψl

Asiil As

irl Asicl

Asril As

rrl Asrcl

Ascil As

crl Asccl

usi (θ)

usr(θ)

usc(θ)

=

f si

f sr

f sc

(6)

The solution process is expressed using the same Polynomial Chaos basis as

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

8

usi (θ)

usr(θ)

usc(θ)

=

Nsum

j=0

Ψj(θ)

usij

usrj

uscj

(7)

Substituting Eq(7) into Eq(6) and performing Galerkin projection leads to the followingcoupled deterministic systems of equations

Asii As

ir Asic

Asri As

rr Asrc

Asci As

cr Ascc

Usi

Usr

Usc

=

Fsi

Fsr

Fsc

(8)

where

[Asαβ ]jk =

Lsum

l=0

〈ΨlΨjΨk〉Asαβl

Fsαk = 〈Ψkf

sα〉

Usα = (us

α0 middot middot middot usαN )T

the subscripts α and β represent the index i r and c

Enforcing the transmission conditions along the boundary interfaces the subdomainequilibrium equation can be expressed as

Asii As

irBsr As

icBsc

nssum

s=1

BsrTAs

ri

nssum

s=1

BsrTAs

rrBsr

nssum

s=1

BsrTAs

rcBsc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

crBsr

nssum

s=1

BscTAs

ccBsc

Usi

Ur

Uc

=

Fsi

nssum

s=1

BsrTFs

r

nssum

s=1

BscTFs

c

(9)

where Bsr is a Boolean rectangular matrix that maps the global remaining vector Ur to the

local remaining vector Usr as

Usr = Bs

rUr (10)

Similarly the restriction operator Bsc is a Boolean rectangular matrix that maps the global

corner vector Uc to the local corner vector Usc as

Usc = Bs

cUc (11)

In parallel implementation both Bsr and Bs

c act as scatter operators while BsrT and Bs

cT act

as gather operators

The first block equation in Eq(9) can be solved for Usi in parallel as

Usi = [As

ii]minus1(Fs

i minusAsirB

srUr minusAs

icBscUc) (12)

Substituting Eq(12) into Eq(9) leads to the following condensed system which representsthe Schur compliment system in Eq(5) but now the interface boundary nodes are split into

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

9

remaining and corner nodes as shown schematically in Fig(3)

nssum

s=1

BsrTSs

rrBsr

nssum

s=1

BsrTSs

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Ur

Uc

=

nssum

s=1

BsrTGs

r

nssum

s=1

BscTGs

c

(13)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

Figure 3 The interface boundary nodes split into remainder () and corner(bull) nodes

The corner nodal vector Uc in Eq(13) is eliminated next to obtain the following (symmetricpositive definite) reduced interface problem

(Frr minus Frc[Fcc]minus1Fcr)Ur = dr minus Frc[Fcc]

minus1dc (14)

where

Fαβ =

nssum

s=1

Bsα

TSsαβB

dα =

nssum

s=1

Bsα

TGsα

and α and β denotes subscripts r and c The above system can be solved using PCGM withan appropriate preconditioner Mminus1 defined in the next section

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

10

61 A two-level preconditionerAs mentioned previously the continuity condition at the corner nodes is enforced strictly whilethose for the remaining interface boundary nodes is satisfied in a weak sense This fact isschematically illustrated in Fig(4) [12] As the iterations converge the continuity conditionat all interface nodes (both corner and remaining boundary nodes) is satisfied strictly Theassembly of the unknown vector at the corner nodes leads to the following partially assembledSchur complement system

Figure 4 Partial assembly of corner nodes (bull)

Ssrr Ss

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Usr

Uc

=

Fsr

0

(15)

where

Fsr = Ds

rBsrrj

and rj is the residual at the jth iteration of the PCGM and Dsr represents a block diagonal

weighting matrix which satisfies the following property

nssum

s=1

BsrTDs

rBsr = I

The diagonal entries of each block of Dsr are the reciprocal of the number of subdomains that

share the interface boundary nodes

The subdomain level remaining unknown vector Usr can be eliminated in parallel from Eq(15)

as

Usr = [Ss

rr]minus1(Fs

r minus SsrcB

scUc) (16)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

11

Substituting Usr into the second block of Eq(15) leads to the following coarse problem

F lowast

ccUc = dlowastc (17)

where

F lowast

cc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

dlowastc = minus

nssum

s=1

BscTSs

cr[Ssrr]

minus1Fsr

The continuity of the solution field over the remaining interface nodes is satisfied next byaveraging the local results as

Ur =

nssum

s=1

BsrTDs

rUsr

After some algebraic manipulations the preconditioner can be expressed as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr + RT

0 [F lowast

cc]minus1R0 (18)

where

R0 =

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

7 A dual-primal iterative substructuring method of SPDEs

In this section the dual-primal domain decomposition method is introduced in the context ofstochastic PDEs This approach is an extension of FETI-DP [5] for SPDEs In this approachthe continuity condition at the corner nodes is enforced strictly and Lagrange multipliers areused to enforce the continuity condition weakly over the remaining interface nodes

Partial assembly of Eq(8) leads to the following subdomain equilibrium equation

Asii As

ir AsicB

sc

Asri As

rr AsrcB

sc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc

Usi

Usr

Uc

=

Fsi

Fsr

nssum

s=1

BscTFs

c

(19)

where Bsc is a Boolean restriction operator that maps the global corner vector Uc to the local

corner vector Usc as

Usc = Bs

cUc

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

12

Eq(19) can be rewritten in compact form as

AsUs = Fs (20)

Let Bsr be a block diagonal signed Boolean continuity matrix defined as

s=nssum

s=1

BsrU

sr = 0

Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as

1

2UT AU minus UT F rarr min (21)

subject to BU = 0

where

A =

A1

As

Ans

U =

U1

Us

Uns

F =

F1

Fs

Fns

B =[

(0 B1r 0) middot middot middot (0 Bs

r 0) middot middot middot (0 Bnsr 0)

]

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as

L(U Λ) =1

2UT AU minus UT F + UT BT Λ (22)

Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system

Asii As

ir AsicB

sc 0

Asri As

rr AsrcB

sc Bs

rT

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc 0

0

nssum

s=1

Bsr 0 0

Usi

Usr

Uc

Λ

=

Fsi

Fsr

nssum

s=1

BscTFs

c

0

(23)

where

Λ =

λ0

λN

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

13

Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary

and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)

Eliminating the interior unknowns Usi from Eq(23) as

Usi = [As

ii]minus1(Fs

i minusAsirU

sr minusAs

icBscUc) (24)

Substituting Eq(24) into Eq(23) leads to

Ssrr Ss

rcBsc Bs

rT

nssum

s=1

BscTSs

cr

nssum

s=1

BscTSs

ccBsc 0

nssum

s=1

Bsr 0 0

Usr

Uc

Λ

=

Gsr

nssum

s=1

BscTGs

c

0

(25)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)

as

Usr = [Ss

rr]minus1(Gs

r minus SsrcB

scUc minus Bs

rT Λ) (26)

Substituting Eq(26) into Eq(25) leads to

[

Fcc minusFcr

Frc Frr

]

Uc

Λ

=

dc

dr

(27)

where

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

14

Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Fcr =

nssum

s=1

BscTSs

cr[Ssrr]

minus1BsrT

Frc =

nssum

s=1

Bsr [S

srr]

minus1SsrcB

sc

Frr =

nssum

s=1

Bsr [S

srr]

minus1BsrT

dc =

nssum

s=1

BscT (Gs

c minus Sscr[S

srr]

minus1Gsr)

dr =

nssum

s=1

Bsr [S

srr]

minus1Gsr

Solving for Uc from Eq(27) gives the following coarse problem

FccUc = (dc + FcrΛ) (28)

Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system

(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]

minus1dc (29)

Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as

M =

nssum

s=1

BsrD

srS

srrD

srB

srT (30)

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as

F lowast

cc = Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr

+

nssum

s=1

BsrTDs

r[Ssrr]

minus1SrcBsc

[

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

]minus1

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

which has the same form of the dual-primal operator in Eq(29)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

15

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 4 Parallel Matrix-Vector Product for Primal Method

1 Input (P)

2 Scatter Ps = BsrP

3 Compute vs1 = Ss

crPs

4 Gather V1 =

nssum

s=1

BsrT vs

1

5 Global Solve FccV2 = V1

6 Scatter vs2 = Bs

cV2

7 Compute vs3 = Ss

rcvs2

8 Update Qs = SsrrP

s minus vs3

9 Gather Q =

nssum

s=1

BsrTQs

10 Output (Q)

Multiplication of Schur complement matrix by a vector in step 3 step 7 and step 8 inAlgorithm (4) is computed by solving a corresponding Dirichlet problem as

vsα = Ss

αβvsβ

vsα = (As

αβ minusAsαi[A

sii]

minus1Asiβ)vs

β

This procedure is outlined in the following algorithm

Algorithm 5 Dirichlet Solver Procedure

1 Input (vsβ)

2 Compute vs1 = As

iβvsβ

3 Solve Asiiv

s2 = vs

1

4 Compute vs3 = As

αivs2

5 Compute vs4 = As

αβvsβ

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

16

6 Compute vsα = vs

4 minus vs3

7 Output (vsα)

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equippedwith lumped preconditioner as

Mminus1cc FccV2 = Mminus1

cc V1

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Next the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed bysolving a subdomain level Neumann problem and a global coarse problem The procedure isoutlined in the following algorithm

Algorithm 6 Two-Level Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter Fsr = Ds

rBsrrΓ

3 Local Solve Ssrrv

s1 = Fs

r

4 Compute dsc = Ss

crvs1

5 Gather dc =

nssum

s=1

BscT ds

c

6 Global Solve F lowastccZc = minusdc

7 Scatter Zsc = BsZc

8 Update vs2 = Fs

r + SsrcZ

sc

9 Local Solve SsrrZ

sf = vs

2

10 Gather Z =

nssum

s=1

BsrTDs

rZsf

11 Output (Z)

The local solve in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumannproblem of the form Ss

rrUsr = rs

r which can be solved using the following algorithm

Algorithm 7 Neumann-Solver Procedure

1 Input (rsr)

2 Solve[

Asii As

ir

Asri As

rr

]

X s

Usr

=

0rsr

3 Output (Usr )

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

17

The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as

Mminus1cc F lowast

ccZc = minusMminus1cc dc

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr + Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1 Input (P)

2 Scatter Ps = BsrTP

3 Local Solve Ssrrv

s1 = Ps

4 Compute vs2 = Ss

crvs1

5 Gather V2 =

nssum

s=1

BscT vs

2

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

18

6 Global Solve FccV3 = V2

7 Scatter vs3 = Bs

cV3

8 Compute vs4 = Ss

rcvs3

9 Update vs5 = Ps + vs

4

10 Local Solve SsrrQ

s = vs5

11 Gather Q =

nssum

s=1

BsrQ

s

12 Output (Q)

The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner

Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= DsrB

srT rΓ

3 Compute Zs = Ssrrr

4 Gather Z =

nssum

s=1

BsrD

srZ

s

5 Output (Z)

We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

19

10 Numerical results

For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as

part

partx[cx(x y θ)

partu(x y θ)

partx] +

part

party[cy(x y θ)

partu(x y θ)

party] = f(x y) in Ω

where the forcing term is

f(x y) = 10

For simplicity a homogeneous Dirichlet boundary condition is imposed as

u(x y θ) = 0 on partΩ

The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025

In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5

GkΓminus SUk

Γ2

G0Γ2

6 10minus5

Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh

101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

20

Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS

Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed

Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1

102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

21

Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1

a total problem size of (10 times n)

Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

22

Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5

Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)

Problem size PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9

ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)

Thirdly we fix problem size per subdomain while increase the overall problem size by adding

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

23

Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof

CPUs PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9

Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)

Subdomains PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9

more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain

Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant

Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

24

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order

primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

25

Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)

CoV PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9

References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77

689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing

Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic

and Hydraulic

[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in

Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009

PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)

[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series

in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 5: Primal and dual-primal iterative substructuring methods of stochastic

Figure 1 Partitioning domain nodes into interior () and interface ()

Enforcing the transmission conditions (compatibility and equilibrium) along the interfacesthe global equilibrium equation of the stochastic system can be expressed in the following blocklinear systems of equations

Lsum

i=0

Ψi

A1IIi 0 A1

IΓ iR1

0 Ans

IIi Ans

IΓ iRns

RT1 A1

ΓIi RTns

Ans

ΓIi

nssum

s=1

RTs As

ΓΓ iRs

u1I(θ)

uns

I (θ)uΓ (θ)

=

f1I

fns

Inssum

s=1

RTs f s

Γ

(2)

The solution process can be expanded using the same Polynomial Chaos basis as

u1I(θ)

uns

I (θ)uΓ (θ)

=N

sum

j=0

Ψj(θ)

u1Ij

uns

Ij

uΓ j

(3)

Substituting Eq(3) into Eq(2) and performing Galerkin projection to minimize the errorover the space spanned by the Polynomial Chaos basis [1] the following coupled deterministicsystems of equations is obtained

A1II 0 A1

IΓR1

0 Ans

II Ans

IΓRns

RT1 A

1ΓI RT

nsAns

ΓI

nssum

s=1

RTs A

sΓΓRs

U1I

Uns

I

=

F1I

Fns

Inssum

s=1

RTs F

(4)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

4

where

[Asαβ]jk =

Lsum

i=0

〈ΨiΨjΨk〉Asαβi Fs

αk = 〈Ψkfsα〉

UmI = (um

I 0 umI N )T UΓ = (uΓ 0 uΓ N )T

the subscripts α and β represent the index I and Γ The coefficient matrix in Eq(4) is of ordern(N +1)timesn(N +1) where n and (N +1) denote the total number of the degrees of freedom andchaos coefficients respectively The stochastic counterpart of the restriction operator in Eq(4)takes the following form

Rs = blockdiag(R0s R

Ns )

where (R0s R

Ns ) are the deterministic restriction operators In parallel implementation

Rs acts as a scatter operator while RTs acts as a gather operator and are not constructed explic-

itly

A block Gaussian elimination reduces the system in Eq(4) to the following extended Schurcomplement system for the interface variable UΓ

S UΓ = GΓ (5)

where the global extended Schur complement matrix S is given by

S =

nssum

s=1

RTs [As

ΓΓ minusAsΓI (A

sII)

minus1AsIΓ ]Rs

and the corresponding right hand vector GΓ is

GΓ =

nssum

s=1

RTs [Fs

Γ minusAsΓI (A

sII)

minus1FsI ]

Once the interface unknowns UΓ is available the interior unknowns can be obtainedconcurrently by solving the interior problem on each subdomain as

AsII Us

I = FsI minusAs

ΓIRsUΓ

4 Solution methods for the extended Schur complement system

Solution methods for linear systems are broadly categorized into direct methods and iterativemethods The direct methods generally are based on sparse Gaussian elimination technique andare popular for their robustness However they are expensive in computation time and memoryrequirements and therefore cannot be applied to the solution of large-scale linear systems [10]On the other hand the iterative methods generate a sequences of approximate solutions whichconverge to the true solutions In the iterative methods the main arithmetic operation is thematrix-vector multiplication Therefore the linear system itself need not be constructed explicitlyand only a procedure for matrix-vector product is required This property makes iterativemethods more suitable to parallel processing than direct methods

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

5

4.1 Preconditioned Conjugate Gradient Method (PCGM)
Non-overlapping domain decomposition, or iterative substructuring, can be viewed as a preconditioned iterative method to solve the Schur complement system of the form [11]

\[ S\, U_\Gamma = G_\Gamma. \]

For a symmetric positive-definite system such as the Schur complement system, the Conjugate Gradient Method (CGM) is generally used. The performance of CGM mainly depends on the spectrum of the coefficient matrix. However, the rate of convergence of the iterative method can generally be improved by transforming the original system into an equivalent system whose coefficient matrix has better spectral properties (i.e. a lower condition number κ(S)). This transformation is called preconditioning, and the matrix used in the transformation is called the preconditioner. In other words, the transformed linear system becomes

\[ M^{-1} S\, U_\Gamma = M^{-1} G_\Gamma. \]

In general, κ(M^{-1}S) is much smaller than κ(S) and the eigenvalues of M^{-1}S are clustered near one. This procedure is known as the Preconditioned Conjugate Gradient Method (PCGM). In practice, the explicit construction of M^{-1} is not needed; instead, for a given vector r_Γ, a system of the following form is solved:

\[ M Z = r_\Gamma. \]

The PCGM algorithm to solve the Schur complement system proceeds as follows [10].

Algorithm 1 The PCGM Algorithm
1. Initialize U_{Γ,0} = 0
2. Compute r_{Γ,0} = G_Γ − S U_{Γ,0}
3. Precondition Z_0 = M^{-1} r_{Γ,0}
4. First search direction P_0 = Z_0
5. Initialize ρ_0 = (r_{Γ,0}, Z_0)
6. For j = 0, 1, ... until convergence Do
7.   Q_j = S P_j
8.   ρ_{tmp,j} = (Q_j, P_j)
9.   α_j = ρ_j / ρ_{tmp,j}
10.  U_{Γ,j+1} = U_{Γ,j} + α_j P_j
11.  r_{Γ,j+1} = r_{Γ,j} − α_j Q_j
12.  Z_{j+1} = M^{-1} r_{Γ,j+1}
13.  ρ_{j+1} = (r_{Γ,j+1}, Z_{j+1})
14.  β_j = ρ_{j+1} / ρ_j
15.  P_{j+1} = Z_{j+1} + β_j P_j
16. EndDo

The PCGM algorithm indicates that the main arithmetic operations are calculating the product Q = SP in step 7 and the preconditioned residual Z = M^{-1} r_Γ in step 12. These operations can be performed in parallel as outlined next.
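For concreteness, the following Python/NumPy sketch mirrors Algorithm 1 for a generic symmetric positive-definite operator. The callables apply_S and apply_Minv stand in for the parallel matrix-vector product and preconditioner procedures outlined next (Algorithms 2 and 3); they, and the stopping test, are assumptions of this illustration rather than the authors' implementation.

```python
import numpy as np

def pcgm(apply_S, apply_Minv, G, tol=1e-5, max_iter=200):
    """Preconditioned conjugate gradient for S U = G (Algorithm 1).

    apply_S(P)    : returns S P        (step 7)
    apply_Minv(r) : returns M^{-1} r   (step 12)
    """
    U = np.zeros_like(G)               # step 1
    r = G - apply_S(U)                 # step 2
    Z = apply_Minv(r)                  # step 3
    P = Z.copy()                       # step 4
    rho = r @ Z                        # step 5
    r0_norm = np.linalg.norm(G)        # initial residual (U_0 = 0)
    for j in range(max_iter):          # step 6
        Q = apply_S(P)                 # step 7
        alpha = rho / (Q @ P)          # steps 8-9
        U += alpha * P                 # step 10
        r -= alpha * Q                 # step 11
        if np.linalg.norm(r) / r0_norm < tol:
            break                      # relative-residual test (see Section 10)
        Z = apply_Minv(r)              # step 12
        rho_new = r @ Z                # step 13
        beta = rho_new / rho           # step 14
        P = Z + beta * P               # step 15
        rho = rho_new
    return U
```

A quick check with a small SPD matrix, e.g. apply_S = lambda p: A @ p with A = B.T @ B + I and a diagonal preconditioner, reproduces the direct solution.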


Given the subdomain Schur complement matrices S^s and a global vector P, the matrix-vector product Q = SP can be calculated in parallel as

\[ Q = \sum_{s=1}^{n_s} R_s^T S^s R_s P, \]

where n_s is the number of subdomains, and R_s and R_s^T are the scatter and gather operators, respectively. The parallel implementation of this procedure is summarized in Algorithm (2).

Algorithm 2 Parallel Matrix-Vector Product Procedure
1. Input (P)
2. Scatter: P^s = R_s P
3. Local operation: Q^s = S^s P^s
4. Gather: Q = Σ_{s=1}^{n_s} R_s^T Q^s
5. Output (Q)

The working vectors P^s and Q^s are defined on the subdomain level.

Similarly, the effect of a parallel preconditioner on a residual vector, Z = M^{-1} r_Γ, can be computed as

\[ Z = \sum_{s=1}^{n_s} R_s^T M_s^{-1} R_s\, r_\Gamma. \]

This procedure is outlined in the following algorithm.

Algorithm 3 Parallel Preconditioner Effect Procedure
1. Input (r_Γ)
2. Scatter: r_Γ^s = R_s r_Γ
3. Local Solve: M_s Z^s = r_Γ^s
4. Gather: Z = Σ_{s=1}^{n_s} R_s^T Z^s
5. Output (Z)

The local preconditioner M_s and the working vectors r_Γ^s and Z^s are defined on the subdomain level.
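As an illustration, the scatter-solve-gather pattern of Algorithms 2 and 3 can be sketched with the restriction operators stored as index arrays rather than explicit Boolean matrices. The variable names (dofs, S_local, Minv_local) are chosen for this sketch and are not taken from the paper's code.

```python
import numpy as np

def matvec_schur(P, S_local, dofs):
    """Q = sum_s R_s^T S^s R_s P  (Algorithm 2).

    S_local[s] : dense subdomain Schur complement S^s
    dofs[s]    : indices of subdomain s's interface unknowns in the
                 global interface vector (the action of R_s)
    """
    Q = np.zeros_like(P)
    for s, S_s in enumerate(S_local):
        P_s = P[dofs[s]]                  # scatter: P^s = R_s P
        Q_s = S_s @ P_s                   # local operation
        np.add.at(Q, dofs[s], Q_s)        # gather: Q += R_s^T Q^s
    return Q

def precondition(r, Minv_local, dofs):
    """Z = sum_s R_s^T M_s^{-1} R_s r  (Algorithm 3); M_s^{-1} is a callable solve."""
    Z = np.zeros_like(r)
    for s, solve_s in enumerate(Minv_local):
        Z_s = solve_s(r[dofs[s]])         # local solve M_s Z^s = r_Gamma^s
        np.add.at(Z, dofs[s], Z_s)        # gather
    return Z
```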

5 Iterative substructuring methods of SPDEs

The next sections describe the primal and dual-primal substructuring methods in the context of SPDEs. In the primal method, the interface problem is solved iteratively using a PCGM solver equipped with a scalable preconditioner. At each iteration of the solver loop, local problems are solved on each subdomain in parallel. These local problems are used to construct a subdomain level preconditioner. Moreover, a coarse problem is required to propagate information globally across the subdomains. This global exchange of information leads to a scalable preconditioner. In the dual-primal method, a system of Lagrange multipliers that enforces continuity constraints across the interface boundary is solved iteratively using a PCGM solver. The global coarse problem is already embedded in the operator of the Lagrange multiplier system, and therefore a one-level preconditioner, such as the lumped or Dirichlet preconditioner, is sufficient for scalability. A framework of the primal and dual-primal iterative substructuring methods for SPDEs is detailed next.

6 A primal iterative substructuring method of SPDEs

In order to define local problems over each of the subdomains, we partition the subdomain nodal vector u^s(θ) into a set of interior unknowns u_i^s(θ), corner unknowns u_c^s(θ) and remaining unknowns u_r^s(θ), as schematically shown in Fig.(2).

Figure 2. Partitioning of the domain nodes into interior, remaining and corner (•) nodes.

According to this partitioning scheme, the subdomain equilibrium equation can be represented as

\[
\begin{bmatrix}
A^s_{ii}(\theta) & A^s_{ir}(\theta) & A^s_{ic}(\theta) \\
A^s_{ri}(\theta) & A^s_{rr}(\theta) & A^s_{rc}(\theta) \\
A^s_{ci}(\theta) & A^s_{cr}(\theta) & A^s_{cc}(\theta)
\end{bmatrix}
\begin{Bmatrix}
u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta)
\end{Bmatrix}
=
\begin{Bmatrix}
f^s_i \\ f^s_r \\ f^s_c
\end{Bmatrix}.
\]

The Polynomial Chaos representation of the uncertain model parameters leads to the following subdomain equilibrium equation:

\[
\sum_{l=0}^{L} \Psi_l
\begin{bmatrix}
A^s_{ii,l} & A^s_{ir,l} & A^s_{ic,l} \\
A^s_{ri,l} & A^s_{rr,l} & A^s_{rc,l} \\
A^s_{ci,l} & A^s_{cr,l} & A^s_{cc,l}
\end{bmatrix}
\begin{Bmatrix}
u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta)
\end{Bmatrix}
=
\begin{Bmatrix}
f^s_i \\ f^s_r \\ f^s_c
\end{Bmatrix}. \qquad (6)
\]

The solution process is expressed using the same Polynomial Chaos basis as


\[
\begin{Bmatrix}
u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta)
\end{Bmatrix}
= \sum_{j=0}^{N} \Psi_j(\theta)
\begin{Bmatrix}
u^s_{i,j} \\ u^s_{r,j} \\ u^s_{c,j}
\end{Bmatrix}. \qquad (7)
\]

Substituting Eq.(7) into Eq.(6) and performing a Galerkin projection leads to the following coupled deterministic systems of equations:

\[
\begin{bmatrix}
A^s_{ii} & A^s_{ir} & A^s_{ic} \\
A^s_{ri} & A^s_{rr} & A^s_{rc} \\
A^s_{ci} & A^s_{cr} & A^s_{cc}
\end{bmatrix}
\begin{Bmatrix}
U^s_i \\ U^s_r \\ U^s_c
\end{Bmatrix}
=
\begin{Bmatrix}
F^s_i \\ F^s_r \\ F^s_c
\end{Bmatrix} \qquad (8)
\]

where

\[ [A^s_{\alpha\beta}]_{jk} = \sum_{l=0}^{L} \langle \Psi_l \Psi_j \Psi_k \rangle\, A^s_{\alpha\beta,l}, \qquad
   F^s_{\alpha,k} = \langle \Psi_k f^s_{\alpha} \rangle, \qquad
   U^s_{\alpha} = (u^s_{\alpha,0}, \cdots, u^s_{\alpha,N})^T, \]

and the subscripts α and β represent the indices i, r and c.
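A minimal sketch of this Galerkin projection for a single standard Gaussian random variable is given below. The triple products ⟨Ψ_l Ψ_j Ψ_k⟩ are evaluated by Gauss-Hermite quadrature for the probabilists' Hermite basis, and the deterministic blocks A_l are random placeholders, so the snippet only illustrates the block structure of Eq.(8), not the actual finite element matrices.

```python
import numpy as np
from numpy.polynomial import hermite_e as He   # probabilists' Hermite polynomials

def triple_products(L, N):
    """c[l, j, k] = <Psi_l Psi_j Psi_k> for a 1-D Hermite chaos basis."""
    x, w = He.hermegauss(2 * max(L, N) + 1)    # quadrature for weight exp(-x^2/2)
    w = w / np.sqrt(2.0 * np.pi)               # normalize to a probability measure
    P = max(L, N) + 1
    psi = np.array([He.hermeval(x, np.eye(P)[p]) for p in range(P)])
    return np.einsum('lq,jq,kq,q->ljk', psi[:L+1], psi[:N+1], psi[:N+1], w)

def galerkin_block(A_l, c):
    """Assemble [A]_{jk} = sum_l <Psi_l Psi_j Psi_k> A_l as one dense block matrix."""
    L1, N1, _ = c.shape
    n = A_l[0].shape[0]
    A = np.zeros((N1 * n, N1 * n))
    for j in range(N1):
        for k in range(N1):
            A[j*n:(j+1)*n, k*n:(k+1)*n] = sum(c[l, j, k] * A_l[l] for l in range(L1))
    return A

# toy usage: L = 2 input terms, N = 3 solution terms, 4x4 deterministic blocks
rng = np.random.default_rng(0)
A_l = [np.eye(4)] + [0.1 * rng.standard_normal((4, 4)) for _ in range(2)]
A = galerkin_block(A_l, triple_products(L=2, N=3))   # coupled (4(N+1)) x (4(N+1)) system
```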

Enforcing the transmission conditions along the boundary interfaces, the subdomain equilibrium equation can be expressed as

\[
\begin{bmatrix}
A^s_{ii} & A^s_{ir} B^s_r & A^s_{ic} B^s_c \\[4pt]
\displaystyle\sum_{s=1}^{n_s} B^{sT}_r A^s_{ri} & \displaystyle\sum_{s=1}^{n_s} B^{sT}_r A^s_{rr} B^s_r & \displaystyle\sum_{s=1}^{n_s} B^{sT}_r A^s_{rc} B^s_c \\[4pt]
\displaystyle\sum_{s=1}^{n_s} B^{sT}_c A^s_{ci} & \displaystyle\sum_{s=1}^{n_s} B^{sT}_c A^s_{cr} B^s_r & \displaystyle\sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c
\end{bmatrix}
\begin{Bmatrix}
U^s_i \\ U_r \\ U_c
\end{Bmatrix}
=
\begin{Bmatrix}
F^s_i \\[2pt]
\displaystyle\sum_{s=1}^{n_s} B^{sT}_r F^s_r \\[2pt]
\displaystyle\sum_{s=1}^{n_s} B^{sT}_c F^s_c
\end{Bmatrix} \qquad (9)
\]

where B^s_r is a Boolean rectangular matrix that maps the global remaining vector U_r to the local remaining vector U^s_r as

\[ U^s_r = B^s_r U_r. \qquad (10) \]

Similarly, the restriction operator B^s_c is a Boolean rectangular matrix that maps the global corner vector U_c to the local corner vector U^s_c as

\[ U^s_c = B^s_c U_c. \qquad (11) \]

In a parallel implementation, both B^s_r and B^s_c act as scatter operators, while B^{sT}_r and B^{sT}_c act as gather operators.

The first block equation in Eq.(9) can be solved for U^s_i in parallel as

\[ U^s_i = [A^s_{ii}]^{-1}\left( F^s_i - A^s_{ir} B^s_r U_r - A^s_{ic} B^s_c U_c \right). \qquad (12) \]

Substituting Eq.(12) into Eq.(9) leads to the following condensed system, which represents the Schur complement system in Eq.(5), but now the interface boundary nodes are split into remaining and corner nodes as shown schematically in Fig.(3):

\[
\begin{bmatrix}
\displaystyle\sum_{s=1}^{n_s} B^{sT}_r S^s_{rr} B^s_r & \displaystyle\sum_{s=1}^{n_s} B^{sT}_r S^s_{rc} B^s_c \\[4pt]
\displaystyle\sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} B^s_r & \displaystyle\sum_{s=1}^{n_s} B^{sT}_c S^s_{cc} B^s_c
\end{bmatrix}
\begin{Bmatrix}
U_r \\ U_c
\end{Bmatrix}
=
\begin{Bmatrix}
\displaystyle\sum_{s=1}^{n_s} B^{sT}_r G^s_r \\[2pt]
\displaystyle\sum_{s=1}^{n_s} B^{sT}_c G^s_c
\end{Bmatrix} \qquad (13)
\]

where

\[ S^s_{\alpha\beta} = A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1} A^s_{i\beta}, \qquad
   G^s_{\alpha} = F^s_{\alpha} - A^s_{\alpha i}[A^s_{ii}]^{-1} F^s_{i}. \]

Figure 3. The interface boundary nodes split into remainder and corner (•) nodes.

The corner nodal vector U_c in Eq.(13) is eliminated next to obtain the following (symmetric positive definite) reduced interface problem:

\[ \left( F_{rr} - F_{rc}[F_{cc}]^{-1} F_{cr} \right) U_r = d_r - F_{rc}[F_{cc}]^{-1} d_c \qquad (14) \]

where

\[ F_{\alpha\beta} = \sum_{s=1}^{n_s} B^{sT}_{\alpha} S^s_{\alpha\beta} B^s_{\beta}, \qquad
   d_{\alpha} = \sum_{s=1}^{n_s} B^{sT}_{\alpha} G^s_{\alpha}, \]

and α and β denote the subscripts r and c. The above system can be solved using PCGM with an appropriate preconditioner M^{-1} defined in the next section.
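As an illustration of how Eq.(14) can be handed to a Krylov solver without forming the reduced operator explicitly, the following sketch wraps the action of F_rr − F_rc F_cc^{-1} F_cr in a SciPy LinearOperator. The dense matrices Frr, Frc, Fcr, Fcc and vectors dr, dc are assumed to be small, already-assembled stand-ins for the stochastic operators; this is an illustrative sketch, not the authors' solver.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg
from scipy.linalg import cho_factor, cho_solve

def reduced_interface_solve(Frr, Frc, Fcr, Fcc, dr, dc):
    """Solve (Frr - Frc Fcc^{-1} Fcr) Ur = dr - Frc Fcc^{-1} dc  (Eq. 14)."""
    Fcc_chol = cho_factor(Fcc)               # coarse block factorized once (SPD)

    def apply(P):                            # action of the reduced operator on P
        return Frr @ P - Frc @ cho_solve(Fcc_chol, Fcr @ P)

    n = Frr.shape[0]
    A = LinearOperator((n, n), matvec=apply)
    rhs = dr - Frc @ cho_solve(Fcc_chol, dc)
    Ur, info = cg(A, rhs)                    # default tolerances; cf. Section 10
    assert info == 0, "CG did not converge"
    return Ur
```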


6.1 A two-level preconditioner
As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that for the remaining interface boundary nodes is satisfied in a weak sense. This is schematically illustrated in Fig.(4) [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system:

Figure 4. Partial assembly of the corner nodes (•).

\[
\begin{bmatrix}
S^s_{rr} & S^s_{rc} B^s_c \\[4pt]
\displaystyle\sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} B^s_r & \displaystyle\sum_{s=1}^{n_s} B^{sT}_c S^s_{cc} B^s_c
\end{bmatrix}
\begin{Bmatrix}
U^s_r \\ U_c
\end{Bmatrix}
=
\begin{Bmatrix}
F^s_r \\ 0
\end{Bmatrix} \qquad (15)
\]

where

\[ F^s_r = D^s_r B^s_r\, r_j, \]

r_j is the residual at the jth iteration of the PCGM, and D^s_r represents a block diagonal weighting matrix which satisfies the following property:

\[ \sum_{s=1}^{n_s} B^{sT}_r D^s_r B^s_r = I. \]

The diagonal entries of each block of D^s_r are the reciprocal of the number of subdomains that share the corresponding interface boundary node.

The subdomain level remaining unknown vector U^s_r can be eliminated in parallel from Eq.(15) as

\[ U^s_r = [S^s_{rr}]^{-1}\left( F^s_r - S^s_{rc} B^s_c U_c \right). \qquad (16) \]

Substituting U^s_r into the second block of Eq.(15) leads to the following coarse problem:

\[ F^{\ast}_{cc}\, U_c = d^{\ast}_c \qquad (17) \]

where

\[ F^{\ast}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c \left( S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1} S^s_{rc} \right) B^s_c, \qquad
   d^{\ast}_c = -\sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1} F^s_r. \]

The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as

\[ U_r = \sum_{s=1}^{n_s} B^{sT}_r D^s_r\, U^s_r. \]

After some algebraic manipulations, the preconditioner can be expressed as

\[ M^{-1} = \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} D^s_r B^s_r + R_0^T [F^{\ast}_{cc}]^{-1} R_0 \qquad (18) \]

where

\[ R_0 = \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} [S^s_{rr}]^{-1} D^s_r B^s_r. \]
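The action of the two-level preconditioner in Eq.(18) can be sketched as below, with the Boolean maps B_r^s and B_c^s represented by index arrays, the subdomain operators kept as dense matrices, and the coarse operator F*_cc assumed to have been assembled beforehand as in Eq.(17). All names are placeholders for this illustration; the sketch also uses the symmetry S_rc^s = (S_cr^s)^T of the Schur complement blocks.

```python
import numpy as np

def apply_two_level_preconditioner(r, sub, Fcc_star):
    """Z = sum_s B_r^sT D_r^s [S_rr^s]^{-1} D_r^s B_r^s r + R_0^T [F*_cc]^{-1} R_0 r.

    sub[s] is a dict with dense blocks 'Srr', 'Scr', index arrays 'Br', 'Bc'
    and the diagonal weights 'Dr' (stored as a vector) of subdomain s.
    """
    Z = np.zeros_like(r)
    coarse_rhs = np.zeros(Fcc_star.shape[0])
    local_sol = []
    for d in sub:
        rs = d['Dr'] * r[d['Br']]                     # D_r^s B_r^s r
        vs = np.linalg.solve(d['Srr'], rs)            # local Neumann-type solve
        local_sol.append(vs)
        np.add.at(coarse_rhs, d['Bc'], d['Scr'] @ vs) # assemble R_0 r
    Zc = np.linalg.solve(Fcc_star, coarse_rhs)        # coarse solve [F*_cc]^{-1} R_0 r
    for d, vs in zip(sub, local_sol):
        np.add.at(Z, d['Br'], d['Dr'] * vs)           # local part of Eq.(18)
        # coarse correction: B_r^sT D_r^s [S_rr^s]^{-1} S_rc^s B_c^s Zc
        ws = np.linalg.solve(d['Srr'], d['Scr'].T @ Zc[d['Bc']])
        np.add.at(Z, d['Br'], d['Dr'] * ws)
    return Z
```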

7 A dual-primal iterative substructuring method of SPDEs

In this section the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. In this approach, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.

Partial assembly of Eq.(8) leads to the following subdomain equilibrium equation:

\[
\begin{bmatrix}
A^s_{ii} & A^s_{ir} & A^s_{ic} B^s_c \\[4pt]
A^s_{ri} & A^s_{rr} & A^s_{rc} B^s_c \\[4pt]
\displaystyle\sum_{s=1}^{n_s} B^{sT}_c A^s_{ci} & \displaystyle\sum_{s=1}^{n_s} B^{sT}_c A^s_{cr} & \displaystyle\sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c
\end{bmatrix}
\begin{Bmatrix}
U^s_i \\ U^s_r \\ U_c
\end{Bmatrix}
=
\begin{Bmatrix}
F^s_i \\ F^s_r \\ \displaystyle\sum_{s=1}^{n_s} B^{sT}_c F^s_c
\end{Bmatrix} \qquad (19)
\]

where B^s_c is a Boolean restriction operator that maps the global corner vector U_c to the local corner vector U^s_c as

\[ U^s_c = B^s_c U_c. \]


Eq.(19) can be rewritten in compact form as

\[ A^s U^s = F^s. \qquad (20) \]

Let B^s_r be a block diagonal signed Boolean continuity matrix defined as

\[ \sum_{s=1}^{n_s} B^s_r\, U^s_r = 0. \]

Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:

\[ \tfrac{1}{2}\, U^T A\, U - U^T F \;\rightarrow\; \min \qquad (21) \]
\[ \text{subject to } B\, U = 0, \]

where

\[ A = \begin{bmatrix} A^1 & & \\ & \ddots & \\ & & A^{n_s} \end{bmatrix}, \qquad
   U = \begin{Bmatrix} U^1 \\ \vdots \\ U^{n_s} \end{Bmatrix}, \qquad
   F = \begin{Bmatrix} F^1 \\ \vdots \\ F^{n_s} \end{Bmatrix}, \]

\[ B = \left[ \,(0\;\; B^1_r\;\; 0) \;\cdots\; (0\;\; B^s_r\;\; 0) \;\cdots\; (0\;\; B^{n_s}_r\;\; 0)\, \right]. \]

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle point formulation of Eq.(21) can be expressed as

\[ L(U, \Lambda) = \tfrac{1}{2}\, U^T A\, U - U^T F + U^T B^T \Lambda. \qquad (22) \]
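The stationarity of Eq.(22) yields a saddle-point (KKT) system; the small NumPy check below, added here for illustration with arbitrary A, F and B (not data from the paper), verifies that the constrained solution indeed satisfies B U = 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3                                  # toy numbers of unknowns and constraints
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)                  # SPD stand-in for the block-diagonal A
F = rng.standard_normal(n)
B = rng.standard_normal((m, n))              # signed Boolean matrix in the paper; random here

# saddle-point system obtained by minimizing L(U, Lambda) over U and Lambda
K = np.block([[A, B.T],
              [B, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([F, np.zeros(m)]))
U, Lam = sol[:n], sol[n:]

assert np.allclose(B @ U, 0.0)               # weak compatibility B U = 0 holds
```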

Minimizing Eq.(22) with respect to U and Λ leads to the following equilibrium system:

\[
\begin{bmatrix}
A^s_{ii} & A^s_{ir} & A^s_{ic} B^s_c & 0 \\[4pt]
A^s_{ri} & A^s_{rr} & A^s_{rc} B^s_c & B^{sT}_r \\[4pt]
\displaystyle\sum_{s=1}^{n_s} B^{sT}_c A^s_{ci} & \displaystyle\sum_{s=1}^{n_s} B^{sT}_c A^s_{cr} & \displaystyle\sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c & 0 \\[4pt]
0 & \displaystyle\sum_{s=1}^{n_s} B^s_r & 0 & 0
\end{bmatrix}
\begin{Bmatrix}
U^s_i \\ U^s_r \\ U_c \\ \Lambda
\end{Bmatrix}
=
\begin{Bmatrix}
F^s_i \\ F^s_r \\ \displaystyle\sum_{s=1}^{n_s} B^{sT}_c F^s_c \\ 0
\end{Bmatrix} \qquad (23)
\]

where

\[ \Lambda = \begin{Bmatrix} \lambda_0 \\ \vdots \\ \lambda_N \end{Bmatrix} \]


and λ_j is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig.(5).

Figure 5. Lagrange multipliers are the forces required to connect the torn interface boundary.

Eliminating the interior unknowns U^s_i from Eq.(23) gives

\[ U^s_i = [A^s_{ii}]^{-1}\left( F^s_i - A^s_{ir} U^s_r - A^s_{ic} B^s_c U_c \right). \qquad (24) \]

Substituting Eq.(24) into Eq.(23) leads to

\[
\begin{bmatrix}
S^s_{rr} & S^s_{rc} B^s_c & B^{sT}_r \\[4pt]
\displaystyle\sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} & \displaystyle\sum_{s=1}^{n_s} B^{sT}_c S^s_{cc} B^s_c & 0 \\[4pt]
\displaystyle\sum_{s=1}^{n_s} B^s_r & 0 & 0
\end{bmatrix}
\begin{Bmatrix}
U^s_r \\ U_c \\ \Lambda
\end{Bmatrix}
=
\begin{Bmatrix}
G^s_r \\ \displaystyle\sum_{s=1}^{n_s} B^{sT}_c G^s_c \\ 0
\end{Bmatrix} \qquad (25)
\]

where

\[ S^s_{\alpha\beta} = A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1} A^s_{i\beta}, \qquad
   G^s_{\alpha} = F^s_{\alpha} - A^s_{\alpha i}[A^s_{ii}]^{-1} F^s_{i}. \]

The subdomain level remaining unknown vector U^s_r can be obtained in parallel from Eq.(25) as

\[ U^s_r = [S^s_{rr}]^{-1}\left( G^s_r - S^s_{rc} B^s_c U_c - B^{sT}_r \Lambda \right). \qquad (26) \]

Substituting Eq.(26) into Eq.(25) leads to

\[
\begin{bmatrix}
F_{cc} & -F_{cr} \\
F_{rc} & F_{rr}
\end{bmatrix}
\begin{Bmatrix}
U_c \\ \Lambda
\end{Bmatrix}
=
\begin{Bmatrix}
d_c \\ d_r
\end{Bmatrix} \qquad (27)
\]

where


\[
\begin{aligned}
F_{cc} &= \sum_{s=1}^{n_s} B^{sT}_c \left( S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1} S^s_{rc} \right) B^s_c, &
F_{cr} &= \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1} B^{sT}_r, \\
F_{rc} &= \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1} S^s_{rc} B^s_c, &
F_{rr} &= \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1} B^{sT}_r, \\
d_c &= \sum_{s=1}^{n_s} B^{sT}_c \left( G^s_c - S^s_{cr}[S^s_{rr}]^{-1} G^s_r \right), &
d_r &= \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1} G^s_r.
\end{aligned}
\]

Solving for U_c from Eq.(27) gives the following coarse problem:

\[ F_{cc}\, U_c = (d_c + F_{cr}\Lambda). \qquad (28) \]

Substituting U_c into Eq.(27) leads to the following symmetric positive-definite Lagrange multiplier system:

\[ \left( F_{rr} + F_{rc}[F_{cc}]^{-1} F_{cr} \right) \Lambda = d_r - F_{rc}[F_{cc}]^{-1} d_c. \qquad (29) \]

Eq.(29) is solved using PCGM with a Dirichlet preconditioner defined as

\[ M = \sum_{s=1}^{n_s} B^s_r D^s_r S^s_{rr} D^s_r B^{sT}_r. \qquad (30) \]

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq.(17) and for the dual-primal operator in Eq.(28) are the same and can be expressed as

\[ F^{\ast}_{cc} = F_{cc} = \sum_{s=1}^{n_s} B^{sT}_c \left( S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1} S^s_{rc} \right) B^s_c. \]

Furthermore, the algebraic form of the primal preconditioner in Eq.(18) can be recast as

\[
M^{-1} = \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} D^s_r B^s_r
+ \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} S^s_{rc} B^s_c
\left[ \sum_{s=1}^{n_s} B^{sT}_c \left( S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1} S^s_{rc} \right) B^s_c \right]^{-1}
\sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} [S^s_{rr}]^{-1} D^s_r B^s_r,
\]

which has the same form as the dual-primal operator in Eq.(29).


9 Parallel implementation

In this section we give an outline of the parallel implementation of PCGM to solve the primal, Eq.(14), and dual-primal, Eq.(29), interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain level problems (Dirichlet and Neumann) and a global level coarse problem.

9.1 Primal method
In this subsection we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq.(14).

For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as

\[ Q_j = \left( F_{rr} - F_{rc}[F_{cc}]^{-1} F_{cr} \right) P_j, \]

can be computed using the following algorithm.

Algorithm 4 Parallel Matrix-Vector Product for the Primal Method
1. Input (P)
2. Scatter: P^s = B_r^s P
3. Compute: v_1^s = S_cr^s P^s
4. Gather: V_1 = Σ_{s=1}^{n_s} B_c^{sT} v_1^s
5. Global Solve: F_cc V_2 = V_1
6. Scatter: v_2^s = B_c^s V_2
7. Compute: v_3^s = S_rc^s v_2^s
8. Update: Q^s = S_rr^s P^s − v_3^s
9. Gather: Q = Σ_{s=1}^{n_s} B_r^{sT} Q^s
10. Output (Q)

Multiplication of a Schur complement matrix by a vector in steps 3, 7 and 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem:

\[ v^s_{\alpha} = S^s_{\alpha\beta} v^s_{\beta}, \qquad
   v^s_{\alpha} = \left( A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1} A^s_{i\beta} \right) v^s_{\beta}. \]

This procedure is outlined in the following algorithm.

Algorithm 5 Dirichlet Solver Procedure
1. Input (v_β^s)
2. Compute: v_1^s = A_{iβ}^s v_β^s
3. Solve: A_{ii}^s v_2^s = v_1^s
4. Compute: v_3^s = A_{αi}^s v_2^s
5. Compute: v_4^s = A_{αβ}^s v_β^s
6. Compute: v_α^s = v_4^s − v_3^s
7. Output (v_α^s)
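A compact sketch of Algorithm 5, i.e. the action of the subdomain Schur complement S^s_{αβ} = A_{αβ} − A_{αi} A_{ii}^{-1} A_{iβ} on a vector, is shown below using a sparse LU factorization of the interior block; the block names are placeholders for whatever storage the actual solver uses.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

def make_schur_matvec(A_ab, A_ai, A_ii, A_ib):
    """Return a callable v_a = S_ab v_b without forming S_ab (Algorithm 5)."""
    A_ii_lu = splu(csc_matrix(A_ii))           # factorize the interior block once

    def apply(v_b):
        v1 = A_ib @ v_b                        # step 2
        v2 = A_ii_lu.solve(v1)                 # step 3: A_ii v2 = v1
        v3 = A_ai @ v2                         # step 4
        v4 = A_ab @ v_b                        # step 5
        return v4 - v3                         # step 6

    return apply
```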

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner:

\[ M^{-1}_{cc} F_{cc} V_2 = M^{-1}_{cc} V_1 \]

where

\[ M^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c. \]

Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.

Algorithm 6 Two-Level Preconditioner Effect Procedure
1. Input (r_Γ)
2. Scatter: F_r^s = D_r^s B_r^s r_Γ
3. Local Solve: S_rr^s v_1^s = F_r^s
4. Compute: d_c^s = S_cr^s v_1^s
5. Gather: d_c = Σ_{s=1}^{n_s} B_c^{sT} d_c^s
6. Global Solve: F*_cc Z_c = −d_c
7. Scatter: Z_c^s = B_c^s Z_c
8. Update: v_2^s = F_r^s + S_rc^s Z_c^s
9. Local Solve: S_rr^s Z_f^s = v_2^s
10. Gather: Z = Σ_{s=1}^{n_s} B_r^{sT} D_r^s Z_f^s
11. Output (Z)

The local solves in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumann problem of the form S_rr^s U_r^s = r_r^s, which can be solved using the following algorithm.

Algorithm 7 Neumann-Solver Procedure
1. Input (r_r^s)
2. Solve:
\[ \begin{bmatrix} A^s_{ii} & A^s_{ir} \\ A^s_{ri} & A^s_{rr} \end{bmatrix}
   \begin{Bmatrix} X^s \\ U^s_r \end{Bmatrix}
   = \begin{Bmatrix} 0 \\ r^s_r \end{Bmatrix} \]
3. Output (U_r^s)
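A sketch of Algorithm 7 is given below: the subdomain Neumann problem is solved through the bordered system rather than by forming [S_rr^s]^{-1}. The dense blocks are again illustrative placeholders.

```python
import numpy as np

def neumann_solve(A_ii, A_ir, A_ri, A_rr, r_r):
    """Solve S_rr u_r = r_r via the bordered system of Algorithm 7."""
    n_i = A_ii.shape[0]
    K = np.block([[A_ii, A_ir],
                  [A_ri, A_rr]])
    rhs = np.concatenate([np.zeros(n_i), r_r])
    sol = np.linalg.solve(K, rhs)
    return sol[n_i:]                 # discard the auxiliary interior part X^s
```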


The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner:

\[ M^{-1}_{cc} F^{\ast}_{cc} Z_c = -M^{-1}_{cc} d_c \]

where

\[ M^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c. \]

Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the following flow chart.

Figure 6. Flowchart of the parallel PCGM to solve the primal interface problem.

9.2 Dual-primal method
In this subsection we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem in Eq.(29).

For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as

\[ Q_j = \left( F_{rr} + F_{rc}[F_{cc}]^{-1} F_{cr} \right) P_j, \]

can be computed using the following algorithm.

Algorithm 8 Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input (P)
2. Scatter: P^s = B_r^{sT} P
3. Local Solve: S_rr^s v_1^s = P^s
4. Compute: v_2^s = S_cr^s v_1^s
5. Gather: V_2 = Σ_{s=1}^{n_s} B_c^{sT} v_2^s
6. Global Solve: F_cc V_3 = V_2
7. Scatter: v_3^s = B_c^s V_3
8. Compute: v_4^s = S_rc^s v_3^s
9. Update: v_5^s = P^s + v_4^s
10. Local Solve: S_rr^s Q^s = v_5^s
11. Gather: Q = Σ_{s=1}^{n_s} B_r^s Q^s
12. Output (Q)

The local solves in step 3 and step 10 of Algorithm (8) are carried out by solving a subdomain level Neumann problem as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for solving the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.

Algorithm 9 Dirichlet Preconditioner Effect Procedure
1. Input (r_Γ)
2. Scatter: r_Γ^s = D_r^s B_r^{sT} r_Γ
3. Compute: Z^s = S_rr^s r_Γ^s
4. Gather: Z = Σ_{s=1}^{n_s} B_r^s D_r^s Z^s
5. Output (Z)
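The Dirichlet preconditioner of Eq.(30) only requires local multiplications by S_rr^s. A sketch of Algorithm 9 with index-array Boolean maps and diagonal weights is given below; the names are invented for the illustration and the signed nature of B_r^s is ignored for brevity.

```python
import numpy as np

def apply_dirichlet_preconditioner(r, Srr_local, Br, Dr):
    """Z = sum_s B_r^s D_r^s S_rr^s D_r^s B_r^sT r   (Algorithm 9 / Eq. 30).

    Srr_local[s] : dense local Schur complement block S_rr^s
    Br[s]        : indices of subdomain s's remaining unknowns in the global numbering
    Dr[s]        : diagonal weights of D_r^s stored as a vector
    """
    Z = np.zeros_like(r)
    for S_s, idx, d in zip(Srr_local, Br, Dr):
        r_s = d * r[idx]                 # scatter: D_r^s B_r^sT r
        Z_s = S_s @ r_s                  # local multiply by S_rr^s
        np.add.at(Z, idx, d * Z_s)       # gather: B_r^s D_r^s Z^s
    return Z
```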

We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the following flow chart.

Figure 7. Flowchart of the parallel PCGM to solve the dual-primal interface problem.


10 Numerical results

For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson's equation with randomly heterogeneous coefficients given as

\[ \frac{\partial}{\partial x}\Big[ c_x(x,y,\theta)\, \frac{\partial u(x,y,\theta)}{\partial x} \Big]
 + \frac{\partial}{\partial y}\Big[ c_y(x,y,\theta)\, \frac{\partial u(x,y,\theta)}{\partial y} \Big]
 = f(x,y) \quad \text{in } \Omega, \]

where the forcing term is

\[ f(x,y) = 1.0. \]

For simplicity, a homogeneous Dirichlet boundary condition is imposed:

\[ u(x,y,\theta) = 0 \quad \text{on } \partial\Omega. \]

The random coefficients c_x(x,y,θ) and c_y(x,y,θ) are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
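To make the input model concrete, the following short sketch (an illustration added here, not code from the paper) samples such a lognormal coefficient and reports its mean, standard deviation and coefficient of variation, which is roughly 25% for these parameters.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 1.0, 0.25                       # underlying Gaussian parameters
g = rng.normal(mu, sigma, size=1_000_000)
c = np.exp(g)                               # lognormal coefficient c_x (or c_y)

print(f"mean ~ {c.mean():.3f}")             # analytically exp(mu + sigma^2/2) ~ 2.805
print(f"std  ~ {c.std():.3f}")
print(f"CoV  ~ {c.std() / c.mean():.3f}")   # ~ sqrt(exp(sigma^2) - 1) ~ 0.254
```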

In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the L2 norms of the current and the initial residual is less than 10^{-5}:

\[ \frac{\| G^k_\Gamma - S\, U^k_\Gamma \|_2}{\| G^0_\Gamma \|_2} \leq 10^{-5}. \]

Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (2 Quad-Core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.

10.1 Stochastic features
Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig.(8) shows a typical finite element mesh, while Fig.(9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig.(10) and Fig.(11), respectively. Clearly, the maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs.(12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs.(18-23) the Polynomial Chaos coefficients of the Lagrange multipliers are shown.


Figure 8. A typical FEM mesh.   Figure 9. Mesh partitioning using METIS.

Figure 10. The mean of the solution field.   Figure 11. The standard deviation of the solution field.

Figure 12. Chaos coefficients u_0.   Figure 13. Chaos coefficients u_1.

10.2 Scalability study
Firstly, we study the scalability of the algorithms with respect to the problem size, where we fix the number of subdomains used to solve the problem to 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of 10× does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first order to third order does not affect the performance of the methods. Note that for a given spatial problem size n, using the first order Polynomial Chaos expansion leads to a total problem size of (3 × n), and using the third order Polynomial Chaos expansion leads to


Figure 14. Chaos coefficients u_2.   Figure 15. Chaos coefficients u_3.

Figure 16. Chaos coefficients u_4.   Figure 17. Chaos coefficients u_5.

Figure 18. Lagrange multipliers λ_0.   Figure 19. Lagrange multipliers λ_1.

a total problem size of (10 × n).
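These counts follow from the dimension of the Polynomial Chaos basis, (m + p)!/(m! p!) for m random variables and order p; with the two random coefficients of this example (an inference, since the paper does not state m explicitly) this gives 3 terms at first order and 10 terms at third order per spatial degree of freedom. A quick cross-check, added here for illustration:

```python
from math import comb

def num_pc_terms(m, p):
    """Number of Polynomial Chaos terms N + 1 for m random variables and order p."""
    return comb(m + p, p)

# two random variables (c_x, c_y), as in Section 10
for order in (1, 2, 3):
    print(order, num_pc_terms(2, order))   # -> 3, 6, 10 terms per spatial dof
```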

Secondly, we fix the problem size in the spatial domain to 71389 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table (2). The results are reported for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method.


Figure 20. Lagrange multipliers λ_2.   Figure 21. Lagrange multipliers λ_3.

Figure 22. Lagrange multipliers λ_4.   Figure 23. Lagrange multipliers λ_5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

                      PDDM              DP-DDM
Problem size     1st  2nd  3rd     1st  2nd  3rd
10051             10   10   10       8    8    8
20303             11   11   11       8    8    8
40811             11   12   12       8    9    9
59935             13   14   14      10   10   10
71386             12   12   12       9    9    9
80172             11   11   12       8    8    8
101851            12   12   12       9    9    9

This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs.(24-26).

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding


Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.

              PDDM              DP-DDM
CPUs     1st  2nd  3rd     1st  2nd  3rd
20        10   11   11       8    8    8
40        12   12   12       9    9    9
60        12   13   13       9    9    9
80        12   12   13       9    9    9
100       12   12   12       9    9    9
120       12   12   12       9    9    9
140       11   11   12       8    8    8
160       12   12   12       8    8    9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).

                    PDDM              DP-DDM
Subdomains     1st  2nd  3rd     1st  2nd  3rd
100             10   10   10       8    8    8
200             10   10   11       8    8    8
400             12   13   13       9    9    9
600             11   12   12       8    8    9
800             12   13   13       9    9    9

more subdomains. Table (3) shows the performance of the primal and the dual-primal methods for first, second and third order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.

Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations is nearly constant.

Finally, it is worth mentioning that the performances of the primal method and the dual-primal method demonstrate a similar trend, and this fact points out the similarity (duality) between the two methods through numerical experiments.

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed


Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.

Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.

Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.

(Each plot shows the relative residual, from 10^0 down to 10^-5, versus the iteration number for P-DDM and DP-DDM.)

primal method and the dual-primal method are numerically scalable with respect to the problem size, the subdomain size and the number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables are required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust, and thank Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.


Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101851 dofs) and a fixed number of subdomains (100).

             PDDM              DP-DDM
CoV     1st  2nd  3rd     1st  2nd  3rd
0.05     10   10   10       8    8    8
0.10     10   10   10       8    8    8
0.15     10   10   10       8    8    8
0.20     10   10   10       8    8    8
0.25     10   10   10       8    8    8
0.30     10   10   11       8    8    8
0.35     10   10   11       8    8    8
0.40     10   11   11       8    8    9
0.45     10   11   12       8    8    9
0.50     10   11   12       8    8    9

References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689–701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Springer, Berlin Heidelberg) pp 251–268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687–714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250–271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 6: Primal and dual-primal iterative substructuring methods of stochastic

where

[Asαβ]jk =

Lsum

i=0

〈ΨiΨjΨk〉Asαβi Fs

αk = 〈Ψkfsα〉

UmI = (um

I 0 umI N )T UΓ = (uΓ 0 uΓ N )T

the subscripts α and β represent the index I and Γ The coefficient matrix in Eq(4) is of ordern(N +1)timesn(N +1) where n and (N +1) denote the total number of the degrees of freedom andchaos coefficients respectively The stochastic counterpart of the restriction operator in Eq(4)takes the following form

Rs = blockdiag(R0s R

Ns )

where (R0s R

Ns ) are the deterministic restriction operators In parallel implementation

Rs acts as a scatter operator while RTs acts as a gather operator and are not constructed explic-

itly

A block Gaussian elimination reduces the system in Eq(4) to the following extended Schurcomplement system for the interface variable UΓ

S UΓ = GΓ (5)

where the global extended Schur complement matrix S is given by

S =

nssum

s=1

RTs [As

ΓΓ minusAsΓI (A

sII)

minus1AsIΓ ]Rs

and the corresponding right hand vector GΓ is

GΓ =

nssum

s=1

RTs [Fs

Γ minusAsΓI (A

sII)

minus1FsI ]

Once the interface unknowns UΓ is available the interior unknowns can be obtainedconcurrently by solving the interior problem on each subdomain as

AsII Us

I = FsI minusAs

ΓIRsUΓ

4 Solution methods for the extended Schur complement system

Solution methods for linear systems are broadly categorized into direct methods and iterativemethods The direct methods generally are based on sparse Gaussian elimination technique andare popular for their robustness However they are expensive in computation time and memoryrequirements and therefore cannot be applied to the solution of large-scale linear systems [10]On the other hand the iterative methods generate a sequences of approximate solutions whichconverge to the true solutions In the iterative methods the main arithmetic operation is thematrix-vector multiplication Therefore the linear system itself need not be constructed explicitlyand only a procedure for matrix-vector product is required This property makes iterativemethods more suitable to parallel processing than direct methods

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

5

41 Preconditioned Conjugate Gradient Method (PCGM)Non-overlapping domain decomposition method or iterative substructuring can be viewed as apreconditioned iterative method to solve the Schur complement system of the form [11]

S UΓ = GΓ

For symmetric positive-definite system such as Schur complement system the ConjugateGradient Method (CGM) is generally used The performance of CGM mainly depends on thespectrum of the coefficient matrix However the rate of convergence of the iterative methodcan generally be improved by transforming the original system into an equivalent system thathas better spectral properties (ie lower condition number κ(S)) of the coefficient matrix Thistransformation is called preconditioning and the matrix used in the transformation is called thepreconditioner In other words the transformed linear system becomes

Mminus1S UΓ = Mminus1GΓ

In general κ(Mminus1S) is much smaller than κ(S) and the eigenvalues of Mminus1S are clusterednear one This procedure known as Preconditioned Conjugate Gradient Method (PCGM) Inpractice the explicit construction of Mminus1 is not needed Instead for a given vector rΓ a systemof the the following form is solved

MZ = rΓ

The PCGM algorithm to solve the Schur complement system proceeds as follows [10]

Algorithm 1 The PCGM Algorithm

1 Initialize UΓ0= 0

2 Compute rΓ0= GΓ minus S UΓ0

3 Precondition Z0 = Mminus1rΓ0

4 First search direction P0 = Z0

5 Initialize ρ0 = (rΓ0Z0)

6 For j = 0 1 middot middot middot until convergence Do

7 Qj = SPj

8 ρtmpj= (Qj Pj)

9 αj = ρjρtmpj

10 UΓj+1= UΓj

+ αjPj

11 rΓj+1= rΓj

minus αjQj

12 Zj+1 = Mminus1rΓj+1

13 ρj+1 = (rΓj+1Zj+1)

14 βj = ρj+1ρj

15 Pj+1 = Zj+1 + βjPj

16 EndDo

The PCGM algorithm indicates that the main arithmetic operations are calculating the prod-uct Q = SP in step 7 and the preconditioned residual Z = Mminus1rΓ in step 12 These operationscan be performed in parallel as outlined next

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

6

Given the subdomain Schur complement matrices Ss and a global vector P the matrix-vectorproduct Q = SP can be calculated in parallel as

Q =

nssum

s=1

RTs SsRsP

where ns is the number of subdomains and Rs and RTs are scatter and gather operator re-

spectively The parallel implementation of this procedure is summarized in Algorithm (2)

Algorithm 2 Parallel Matrix-Vector Product Procedure

1 Input (P)

2 Scatter Ps = RsP

3 Local operation Qs = SsPs

4 Gather Q =

nssum

s=1

RTs Q

s

5 Output (Q)

The working vectors Ps and Qs are defined on the subdomain level

Similarly the effect of a parallel preconditioner on a residual vector Z = Mminus1rΓ can becomputed as

Z =

nssum

s=1

RTs M

minus1s RsrΓ

This procedure is outlined in following algorithm

Algorithm 3 Parallel Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= RsrΓ

3 Local Solve MsZs = rs

Γ

4 Gather Z =

nssum

s=1

RTs Z

s

5 Output (Z)

The local preconditioner Ms and the working vectors rsΓ

and Zs are defined on the subdo-main level

5 Iterative substructuring methods of SPDEs

Next sections describe the primal and dual-primal substructuring methods in the context ofSPDEs In the primal method the interface problem is solved iteratively using PCGM solverequipped with a scalable preconditioner At each iteration of the iterative solver loop localproblems are solved on each subdomain in parallel These local problems are used to construct asubdomain level precondtioner Moreover a coarse problem is required to propagate information

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

7

globally across the subdomains This global exchange of information leads to a scalablepreconditioner In the dual-primal method a system of Lagrange multiplier that enforcescontinuity constraints across the interface boundary is solved iteratively using PCGM solverThe global coarse problem is already embedded in the operator of the Lagrange multiplier systemand therefore a one-level preconditioner such as lumped or Dirichlet is sufficient for scalability Aframework of the primal and dual-primal iterative substructuring methods for SPDEs is detailednext

6 A primal iterative substructuring method of SPDEs

In order to define local problems over each of the subdomains we partition the subdomainnodal vector us(θ) into a set of interior unknowns us

i (θ) corner unknowns usc(θ) and remaining

unknowns usr(θ) as schematically shown in Fig(2)

Figure 2 Partitioning domain nodes into interior () remaining () and corner(bull) nodes

According to this partitioning scheme the subdomain equilibrium equation can berepresented as

Asii(θ) As

ir(θ) Asic(θ)

Asri(θ) As

rr(θ) Asrc(θ)

Asci(θ) As

cr(θ) Ascc(θ)

usi (θ)

usr(θ)

usc(θ)

=

f si

f sr

f sc

The Polynomial Chaos representation of uncertain model parameters leads to the followingsubdomain equilibrium equation

Lsum

l=0

Ψl

Asiil As

irl Asicl

Asril As

rrl Asrcl

Ascil As

crl Asccl

usi (θ)

usr(θ)

usc(θ)

=

f si

f sr

f sc

(6)

The solution process is expressed using the same Polynomial Chaos basis as

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

8

usi (θ)

usr(θ)

usc(θ)

=

Nsum

j=0

Ψj(θ)

usij

usrj

uscj

(7)

Substituting Eq(7) into Eq(6) and performing Galerkin projection leads to the followingcoupled deterministic systems of equations

Asii As

ir Asic

Asri As

rr Asrc

Asci As

cr Ascc

Usi

Usr

Usc

=

Fsi

Fsr

Fsc

(8)

where

[Asαβ ]jk =

Lsum

l=0

〈ΨlΨjΨk〉Asαβl

Fsαk = 〈Ψkf

sα〉

Usα = (us

α0 middot middot middot usαN )T

the subscripts α and β represent the index i r and c

Enforcing the transmission conditions along the boundary interfaces the subdomainequilibrium equation can be expressed as

Asii As

irBsr As

icBsc

nssum

s=1

BsrTAs

ri

nssum

s=1

BsrTAs

rrBsr

nssum

s=1

BsrTAs

rcBsc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

crBsr

nssum

s=1

BscTAs

ccBsc

Usi

Ur

Uc

=

Fsi

nssum

s=1

BsrTFs

r

nssum

s=1

BscTFs

c

(9)

where Bsr is a Boolean rectangular matrix that maps the global remaining vector Ur to the

local remaining vector Usr as

Usr = Bs

rUr (10)

Similarly the restriction operator Bsc is a Boolean rectangular matrix that maps the global

corner vector Uc to the local corner vector Usc as

Usc = Bs

cUc (11)

In parallel implementation both Bsr and Bs

c act as scatter operators while BsrT and Bs

cT act

as gather operators

The first block equation in Eq(9) can be solved for Usi in parallel as

Usi = [As

ii]minus1(Fs

i minusAsirB

srUr minusAs

icBscUc) (12)

Substituting Eq(12) into Eq(9) leads to the following condensed system which representsthe Schur compliment system in Eq(5) but now the interface boundary nodes are split into

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

9

remaining and corner nodes as shown schematically in Fig(3)

nssum

s=1

BsrTSs

rrBsr

nssum

s=1

BsrTSs

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Ur

Uc

=

nssum

s=1

BsrTGs

r

nssum

s=1

BscTGs

c

(13)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

Figure 3 The interface boundary nodes split into remainder () and corner(bull) nodes

The corner nodal vector Uc in Eq(13) is eliminated next to obtain the following (symmetricpositive definite) reduced interface problem

(Frr minus Frc[Fcc]minus1Fcr)Ur = dr minus Frc[Fcc]

minus1dc (14)

where

Fαβ =

nssum

s=1

Bsα

TSsαβB

dα =

nssum

s=1

Bsα

TGsα

and α and β denotes subscripts r and c The above system can be solved using PCGM withan appropriate preconditioner Mminus1 defined in the next section

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

10

61 A two-level preconditionerAs mentioned previously the continuity condition at the corner nodes is enforced strictly whilethose for the remaining interface boundary nodes is satisfied in a weak sense This fact isschematically illustrated in Fig(4) [12] As the iterations converge the continuity conditionat all interface nodes (both corner and remaining boundary nodes) is satisfied strictly Theassembly of the unknown vector at the corner nodes leads to the following partially assembledSchur complement system

Figure 4 Partial assembly of corner nodes (bull)

Ssrr Ss

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Usr

Uc

=

Fsr

0

(15)

where

Fsr = Ds

rBsrrj

and rj is the residual at the jth iteration of the PCGM and Dsr represents a block diagonal

weighting matrix which satisfies the following property

nssum

s=1

BsrTDs

rBsr = I

The diagonal entries of each block of Dsr are the reciprocal of the number of subdomains that

share the interface boundary nodes

The subdomain level remaining unknown vector Usr can be eliminated in parallel from Eq(15)

as

Usr = [Ss

rr]minus1(Fs

r minus SsrcB

scUc) (16)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

11

Substituting Usr into the second block of Eq(15) leads to the following coarse problem

F lowast

ccUc = dlowastc (17)

where

F lowast

cc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

dlowastc = minus

nssum

s=1

BscTSs

cr[Ssrr]

minus1Fsr

The continuity of the solution field over the remaining interface nodes is satisfied next byaveraging the local results as

Ur =

nssum

s=1

BsrTDs

rUsr

After some algebraic manipulations the preconditioner can be expressed as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr + RT

0 [F lowast

cc]minus1R0 (18)

where

R0 =

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

7 A dual-primal iterative substructuring method of SPDEs

In this section the dual-primal domain decomposition method is introduced in the context ofstochastic PDEs This approach is an extension of FETI-DP [5] for SPDEs In this approachthe continuity condition at the corner nodes is enforced strictly and Lagrange multipliers areused to enforce the continuity condition weakly over the remaining interface nodes

Partial assembly of Eq(8) leads to the following subdomain equilibrium equation

Asii As

ir AsicB

sc

Asri As

rr AsrcB

sc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc

Usi

Usr

Uc

=

Fsi

Fsr

nssum

s=1

BscTFs

c

(19)

where Bsc is a Boolean restriction operator that maps the global corner vector Uc to the local

corner vector Usc as

Usc = Bs

cUc

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

12

Eq(19) can be rewritten in compact form as

AsUs = Fs (20)

Let Bsr be a block diagonal signed Boolean continuity matrix defined as

s=nssum

s=1

BsrU

sr = 0

Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as

1

2UT AU minus UT F rarr min (21)

subject to BU = 0

where

A =

A1

As

Ans

U =

U1

Us

Uns

F =

F1

Fs

Fns

B =[

(0 B1r 0) middot middot middot (0 Bs

r 0) middot middot middot (0 Bnsr 0)

]

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as

L(U Λ) =1

2UT AU minus UT F + UT BT Λ (22)

Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system

Asii As

ir AsicB

sc 0

Asri As

rr AsrcB

sc Bs

rT

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc 0

0

nssum

s=1

Bsr 0 0

Usi

Usr

Uc

Λ

=

Fsi

Fsr

nssum

s=1

BscTFs

c

0

(23)

where

Λ =

λ0

λN

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

13

Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary

and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)

Eliminating the interior unknowns Usi from Eq(23) as

Usi = [As

ii]minus1(Fs

i minusAsirU

sr minusAs

icBscUc) (24)

Substituting Eq(24) into Eq(23) leads to

Ssrr Ss

rcBsc Bs

rT

nssum

s=1

BscTSs

cr

nssum

s=1

BscTSs

ccBsc 0

nssum

s=1

Bsr 0 0

Usr

Uc

Λ

=

Gsr

nssum

s=1

BscTGs

c

0

(25)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)

as

Usr = [Ss

rr]minus1(Gs

r minus SsrcB

scUc minus Bs

rT Λ) (26)

Substituting Eq(26) into Eq(25) leads to

[

Fcc minusFcr

Frc Frr

]

Uc

Λ

=

dc

dr

(27)

where

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

14

Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Fcr =

nssum

s=1

BscTSs

cr[Ssrr]

minus1BsrT

Frc =

nssum

s=1

Bsr [S

srr]

minus1SsrcB

sc

Frr =

nssum

s=1

Bsr [S

srr]

minus1BsrT

dc =

nssum

s=1

BscT (Gs

c minus Sscr[S

srr]

minus1Gsr)

dr =

nssum

s=1

Bsr [S

srr]

minus1Gsr

Solving for Uc from Eq(27) gives the following coarse problem

FccUc = (dc + FcrΛ) (28)

Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system

(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]

minus1dc (29)

Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as

M =

nssum

s=1

BsrD

srS

srrD

srB

srT (30)

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as

F lowast

cc = Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr

+

nssum

s=1

BsrTDs

r[Ssrr]

minus1SrcBsc

[

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

]minus1

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

which has the same form of the dual-primal operator in Eq(29)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

15

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 4 Parallel Matrix-Vector Product for Primal Method

1 Input (P)

2 Scatter Ps = BsrP

3 Compute vs1 = Ss

crPs

4 Gather V1 =

nssum

s=1

BsrT vs

1

5 Global Solve FccV2 = V1

6 Scatter vs2 = Bs

cV2

7 Compute vs3 = Ss

rcvs2

8 Update Qs = SsrrP

s minus vs3

9 Gather Q =

nssum

s=1

BsrTQs

10 Output (Q)

Multiplication of Schur complement matrix by a vector in step 3 step 7 and step 8 inAlgorithm (4) is computed by solving a corresponding Dirichlet problem as

vsα = Ss

αβvsβ

vsα = (As

αβ minusAsαi[A

sii]

minus1Asiβ)vs

β

This procedure is outlined in the following algorithm

Algorithm 5 Dirichlet Solver Procedure

1 Input (vsβ)

2 Compute vs1 = As

iβvsβ

3 Solve Asiiv

s2 = vs

1

4 Compute vs3 = As

αivs2

5 Compute vs4 = As

αβvsβ

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

16

6 Compute vsα = vs

4 minus vs3

7 Output (vsα)

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equippedwith lumped preconditioner as

Mminus1cc FccV2 = Mminus1

cc V1

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Next the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed bysolving a subdomain level Neumann problem and a global coarse problem The procedure isoutlined in the following algorithm

Algorithm 6 Two-Level Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter Fsr = Ds

rBsrrΓ

3 Local Solve Ssrrv

s1 = Fs

r

4 Compute dsc = Ss

crvs1

5 Gather dc =

nssum

s=1

BscT ds

c

6 Global Solve F lowastccZc = minusdc

7 Scatter Zsc = BsZc

8 Update vs2 = Fs

r + SsrcZ

sc

9 Local Solve SsrrZ

sf = vs

2

10 Gather Z =

nssum

s=1

BsrTDs

rZsf

11 Output (Z)

The local solve in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumannproblem of the form Ss

rrUsr = rs

r which can be solved using the following algorithm

Algorithm 7 Neumann-Solver Procedure

1 Input (rsr)

2 Solve[

Asii As

ir

Asri As

rr

]

X s

Usr

=

0rsr

3 Output (Usr )

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

17

The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as

Mminus1cc F lowast

ccZc = minusMminus1cc dc

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr + Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1 Input (P)

2 Scatter Ps = BsrTP

3 Local Solve Ssrrv

s1 = Ps

4 Compute vs2 = Ss

crvs1

5 Gather V2 =

nssum

s=1

BscT vs

2

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

18

6 Global Solve FccV3 = V2

7 Scatter vs3 = Bs

cV3

8 Compute vs4 = Ss

rcvs3

9 Update vs5 = Ps + vs

4

10 Local Solve SsrrQ

s = vs5

11 Gather Q =

nssum

s=1

BsrQ

s

12 Output (Q)

The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner

Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= DsrB

srT rΓ

3 Compute Zs = Ssrrr

4 Gather Z =

nssum

s=1

BsrD

srZ

s

5 Output (Z)

We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

19

10 Numerical results

For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as

part

partx[cx(x y θ)

partu(x y θ)

partx] +

part

party[cy(x y θ)

partu(x y θ)

party] = f(x y) in Ω

where the forcing term is

f(x y) = 10

For simplicity a homogeneous Dirichlet boundary condition is imposed as

u(x y θ) = 0 on partΩ

The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025

In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5

GkΓminus SUk

Γ2

G0Γ2

6 10minus5

Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh

101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

20

Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS

Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed

Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1

102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

21

Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1

a total problem size of (10 times n)

Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

22

Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5

Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)

Problem size PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9

ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains.


Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.

               PDDM                DP-DDM
CPUs      1st   2nd   3rd     1st   2nd   3rd
20         10    11    11       8     8     8
40         12    12    12       9     9     9
60         12    13    13       9     9     9
80         12    12    13       9     9     9
100        12    12    12       9     9     9
120        12    12    12       9     9     9
140        11    11    12       8     8     8
160        12    12    12       8     8     9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).

                    PDDM                DP-DDM
Subdomains     1st   2nd   3rd     1st   2nd   3rd
100             10    10    10       8     8     8
200             10    10    11       8     8     8
400             12    13    13       9     9     9
600             11    12    12       8     8     9
800             12    13    13       9     9     9

Table 3 shows the performance of the primal and the dual-primal methods for first-, second- and third-order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.

Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.
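For lognormal input coefficients, as used in the model problem of this paper, one way such a CoV sweep can be parameterized is through the standard deviation of the underlying Gaussian, since a lognormal variable exp(g) with g ~ N(mu, sigma) has CoV = sqrt(exp(sigma^2) - 1). The snippet below is a hedged sketch of that relation only; it is not the authors' procedure for generating the cases of Table 4.

```python
from math import log, sqrt, exp

def gaussian_sigma_for_lognormal_cov(cov):
    """Std. dev. of the underlying Gaussian that gives a lognormal variable the target CoV."""
    return sqrt(log(1.0 + cov ** 2))

for cov in (0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
    sigma = gaussian_sigma_for_lognormal_cov(cov)
    # Consistency check: CoV of exp(g) is sqrt(exp(sigma^2) - 1), independent of the mean of g.
    assert abs(sqrt(exp(sigma ** 2) - 1.0) - cov) < 1e-12
    print(f"CoV = {cov:4.2f} -> underlying Gaussian sigma = {sigma:.4f}")
```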

Finally, it is worth mentioning that the primal and the dual-primal methods demonstrate similar performance trends; this fact highlights, through numerical experiments, the similarity (duality) between the two methods.

11. Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner.


Figure 24. The relative PCGM residual history (P-DDM vs. DP-DDM) for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history (P-DDM vs. DP-DDM) for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history (P-DDM vs. DP-DDM) for the case of 160 subdomains and third PC order.

The numerical experiments illustrate that the proposed primal method and dual-primal method are numerically scalable with respect to problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e., the level of uncertainty) and the Polynomial Chaos order. Both the primal and the dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust, and thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.


Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101851 dofs) and a fixed number of subdomains (100).

              PDDM                DP-DDM
CoV      1st   2nd   3rd     1st   2nd   3rd
0.05      10    10    10       8     8     8
0.10      10    10    10       8     8     8
0.15      10    10    10       8     8     8
0.20      10    10    10       8     8     8
0.25      10    10    10       8     8     8
0.30      10    10    11       8     8     8
0.35      10    10    11       8     8     8
0.40      10    11    11       8     8     9
0.45      10    11    12       8     8     9
0.50      10    11    12       8     8     9

References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250-271


Page 8: Primal and dual-primal iterative substructuring methods of stochastic

Given the subdomain Schur complement matrices Ss and a global vector P the matrix-vectorproduct Q = SP can be calculated in parallel as

Q =

nssum

s=1

RTs SsRsP

where ns is the number of subdomains and Rs and RTs are scatter and gather operator re-

spectively The parallel implementation of this procedure is summarized in Algorithm (2)

Algorithm 2 Parallel Matrix-Vector Product Procedure

1 Input (P)

2 Scatter Ps = RsP

3 Local operation Qs = SsPs

4 Gather Q =

nssum

s=1

RTs Q

s

5 Output (Q)

The working vectors Ps and Qs are defined on the subdomain level

Similarly the effect of a parallel preconditioner on a residual vector Z = Mminus1rΓ can becomputed as

Z =

nssum

s=1

RTs M

minus1s RsrΓ

This procedure is outlined in following algorithm

Algorithm 3 Parallel Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= RsrΓ

3 Local Solve MsZs = rs

Γ

4 Gather Z =

nssum

s=1

RTs Z

s

5 Output (Z)

The local preconditioner Ms and the working vectors rsΓ

and Zs are defined on the subdo-main level

5 Iterative substructuring methods of SPDEs

Next sections describe the primal and dual-primal substructuring methods in the context ofSPDEs In the primal method the interface problem is solved iteratively using PCGM solverequipped with a scalable preconditioner At each iteration of the iterative solver loop localproblems are solved on each subdomain in parallel These local problems are used to construct asubdomain level precondtioner Moreover a coarse problem is required to propagate information

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

7

globally across the subdomains This global exchange of information leads to a scalablepreconditioner In the dual-primal method a system of Lagrange multiplier that enforcescontinuity constraints across the interface boundary is solved iteratively using PCGM solverThe global coarse problem is already embedded in the operator of the Lagrange multiplier systemand therefore a one-level preconditioner such as lumped or Dirichlet is sufficient for scalability Aframework of the primal and dual-primal iterative substructuring methods for SPDEs is detailednext

6 A primal iterative substructuring method of SPDEs

In order to define local problems over each of the subdomains we partition the subdomainnodal vector us(θ) into a set of interior unknowns us

i (θ) corner unknowns usc(θ) and remaining

unknowns usr(θ) as schematically shown in Fig(2)

Figure 2 Partitioning domain nodes into interior () remaining () and corner(bull) nodes

According to this partitioning scheme the subdomain equilibrium equation can berepresented as

Asii(θ) As

ir(θ) Asic(θ)

Asri(θ) As

rr(θ) Asrc(θ)

Asci(θ) As

cr(θ) Ascc(θ)

usi (θ)

usr(θ)

usc(θ)

=

f si

f sr

f sc

The Polynomial Chaos representation of uncertain model parameters leads to the followingsubdomain equilibrium equation

Lsum

l=0

Ψl

Asiil As

irl Asicl

Asril As

rrl Asrcl

Ascil As

crl Asccl

usi (θ)

usr(θ)

usc(θ)

=

f si

f sr

f sc

(6)

The solution process is expressed using the same Polynomial Chaos basis as

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

8

usi (θ)

usr(θ)

usc(θ)

=

Nsum

j=0

Ψj(θ)

usij

usrj

uscj

(7)

Substituting Eq(7) into Eq(6) and performing Galerkin projection leads to the followingcoupled deterministic systems of equations

Asii As

ir Asic

Asri As

rr Asrc

Asci As

cr Ascc

Usi

Usr

Usc

=

Fsi

Fsr

Fsc

(8)

where

[Asαβ ]jk =

Lsum

l=0

〈ΨlΨjΨk〉Asαβl

Fsαk = 〈Ψkf

sα〉

Usα = (us

α0 middot middot middot usαN )T

the subscripts α and β represent the index i r and c

Enforcing the transmission conditions along the boundary interfaces the subdomainequilibrium equation can be expressed as

Asii As

irBsr As

icBsc

nssum

s=1

BsrTAs

ri

nssum

s=1

BsrTAs

rrBsr

nssum

s=1

BsrTAs

rcBsc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

crBsr

nssum

s=1

BscTAs

ccBsc

Usi

Ur

Uc

=

Fsi

nssum

s=1

BsrTFs

r

nssum

s=1

BscTFs

c

(9)

where Bsr is a Boolean rectangular matrix that maps the global remaining vector Ur to the

local remaining vector Usr as

Usr = Bs

rUr (10)

Similarly the restriction operator Bsc is a Boolean rectangular matrix that maps the global

corner vector Uc to the local corner vector Usc as

Usc = Bs

cUc (11)

In parallel implementation both Bsr and Bs

c act as scatter operators while BsrT and Bs

cT act

as gather operators

The first block equation in Eq(9) can be solved for Usi in parallel as

Usi = [As

ii]minus1(Fs

i minusAsirB

srUr minusAs

icBscUc) (12)

Substituting Eq(12) into Eq(9) leads to the following condensed system which representsthe Schur compliment system in Eq(5) but now the interface boundary nodes are split into

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

9

remaining and corner nodes as shown schematically in Fig(3)

nssum

s=1

BsrTSs

rrBsr

nssum

s=1

BsrTSs

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Ur

Uc

=

nssum

s=1

BsrTGs

r

nssum

s=1

BscTGs

c

(13)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

Figure 3 The interface boundary nodes split into remainder () and corner(bull) nodes

The corner nodal vector Uc in Eq(13) is eliminated next to obtain the following (symmetricpositive definite) reduced interface problem

(Frr minus Frc[Fcc]minus1Fcr)Ur = dr minus Frc[Fcc]

minus1dc (14)

where

Fαβ =

nssum

s=1

Bsα

TSsαβB

dα =

nssum

s=1

Bsα

TGsα

and α and β denotes subscripts r and c The above system can be solved using PCGM withan appropriate preconditioner Mminus1 defined in the next section

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

10

61 A two-level preconditionerAs mentioned previously the continuity condition at the corner nodes is enforced strictly whilethose for the remaining interface boundary nodes is satisfied in a weak sense This fact isschematically illustrated in Fig(4) [12] As the iterations converge the continuity conditionat all interface nodes (both corner and remaining boundary nodes) is satisfied strictly Theassembly of the unknown vector at the corner nodes leads to the following partially assembledSchur complement system

Figure 4 Partial assembly of corner nodes (bull)

Ssrr Ss

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Usr

Uc

=

Fsr

0

(15)

where

Fsr = Ds

rBsrrj

and rj is the residual at the jth iteration of the PCGM and Dsr represents a block diagonal

weighting matrix which satisfies the following property

nssum

s=1

BsrTDs

rBsr = I

The diagonal entries of each block of Dsr are the reciprocal of the number of subdomains that

share the interface boundary nodes

The subdomain level remaining unknown vector Usr can be eliminated in parallel from Eq(15)

as

Usr = [Ss

rr]minus1(Fs

r minus SsrcB

scUc) (16)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

11

Substituting Usr into the second block of Eq(15) leads to the following coarse problem

F lowast

ccUc = dlowastc (17)

where

F lowast

cc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

dlowastc = minus

nssum

s=1

BscTSs

cr[Ssrr]

minus1Fsr

The continuity of the solution field over the remaining interface nodes is satisfied next byaveraging the local results as

Ur =

nssum

s=1

BsrTDs

rUsr

After some algebraic manipulations the preconditioner can be expressed as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr + RT

0 [F lowast

cc]minus1R0 (18)

where

R0 =

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

7 A dual-primal iterative substructuring method of SPDEs

In this section the dual-primal domain decomposition method is introduced in the context ofstochastic PDEs This approach is an extension of FETI-DP [5] for SPDEs In this approachthe continuity condition at the corner nodes is enforced strictly and Lagrange multipliers areused to enforce the continuity condition weakly over the remaining interface nodes

Partial assembly of Eq(8) leads to the following subdomain equilibrium equation

Asii As

ir AsicB

sc

Asri As

rr AsrcB

sc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc

Usi

Usr

Uc

=

Fsi

Fsr

nssum

s=1

BscTFs

c

(19)

where Bsc is a Boolean restriction operator that maps the global corner vector Uc to the local

corner vector Usc as

Usc = Bs

cUc

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

12

Eq(19) can be rewritten in compact form as

AsUs = Fs (20)

Let Bsr be a block diagonal signed Boolean continuity matrix defined as

s=nssum

s=1

BsrU

sr = 0

Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as

1

2UT AU minus UT F rarr min (21)

subject to BU = 0

where

A =

A1

As

Ans

U =

U1

Us

Uns

F =

F1

Fs

Fns

B =[

(0 B1r 0) middot middot middot (0 Bs

r 0) middot middot middot (0 Bnsr 0)

]

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as

L(U Λ) =1

2UT AU minus UT F + UT BT Λ (22)

Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system

Asii As

ir AsicB

sc 0

Asri As

rr AsrcB

sc Bs

rT

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc 0

0

nssum

s=1

Bsr 0 0

Usi

Usr

Uc

Λ

=

Fsi

Fsr

nssum

s=1

BscTFs

c

0

(23)

where

Λ =

λ0

λN

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

13

Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary

and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)

Eliminating the interior unknowns Usi from Eq(23) as

Usi = [As

ii]minus1(Fs

i minusAsirU

sr minusAs

icBscUc) (24)

Substituting Eq(24) into Eq(23) leads to

Ssrr Ss

rcBsc Bs

rT

nssum

s=1

BscTSs

cr

nssum

s=1

BscTSs

ccBsc 0

nssum

s=1

Bsr 0 0

Usr

Uc

Λ

=

Gsr

nssum

s=1

BscTGs

c

0

(25)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)

as

Usr = [Ss

rr]minus1(Gs

r minus SsrcB

scUc minus Bs

rT Λ) (26)

Substituting Eq(26) into Eq(25) leads to

[

Fcc minusFcr

Frc Frr

]

Uc

Λ

=

dc

dr

(27)

where

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

14

Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Fcr =

nssum

s=1

BscTSs

cr[Ssrr]

minus1BsrT

Frc =

nssum

s=1

Bsr [S

srr]

minus1SsrcB

sc

Frr =

nssum

s=1

Bsr [S

srr]

minus1BsrT

dc =

nssum

s=1

BscT (Gs

c minus Sscr[S

srr]

minus1Gsr)

dr =

nssum

s=1

Bsr [S

srr]

minus1Gsr

Solving for Uc from Eq(27) gives the following coarse problem

FccUc = (dc + FcrΛ) (28)

Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system

(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]

minus1dc (29)

Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as

M =

nssum

s=1

BsrD

srS

srrD

srB

srT (30)

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as

F lowast

cc = Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr

+

nssum

s=1

BsrTDs

r[Ssrr]

minus1SrcBsc

[

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

]minus1

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

which has the same form of the dual-primal operator in Eq(29)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

15

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 4 Parallel Matrix-Vector Product for Primal Method

1 Input (P)

2 Scatter Ps = BsrP

3 Compute vs1 = Ss

crPs

4 Gather V1 =

nssum

s=1

BsrT vs

1

5 Global Solve FccV2 = V1

6 Scatter vs2 = Bs

cV2

7 Compute vs3 = Ss

rcvs2

8 Update Qs = SsrrP

s minus vs3

9 Gather Q =

nssum

s=1

BsrTQs

10 Output (Q)

Multiplication of Schur complement matrix by a vector in step 3 step 7 and step 8 inAlgorithm (4) is computed by solving a corresponding Dirichlet problem as

vsα = Ss

αβvsβ

vsα = (As

αβ minusAsαi[A

sii]

minus1Asiβ)vs

β

This procedure is outlined in the following algorithm

Algorithm 5 Dirichlet Solver Procedure

1 Input (vsβ)

2 Compute vs1 = As

iβvsβ

3 Solve Asiiv

s2 = vs

1

4 Compute vs3 = As

αivs2

5 Compute vs4 = As

αβvsβ

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

16

6 Compute vsα = vs

4 minus vs3

7 Output (vsα)

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equippedwith lumped preconditioner as

Mminus1cc FccV2 = Mminus1

cc V1

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Next the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed bysolving a subdomain level Neumann problem and a global coarse problem The procedure isoutlined in the following algorithm

Algorithm 6 Two-Level Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter Fsr = Ds

rBsrrΓ

3 Local Solve Ssrrv

s1 = Fs

r

4 Compute dsc = Ss

crvs1

5 Gather dc =

nssum

s=1

BscT ds

c

6 Global Solve F lowastccZc = minusdc

7 Scatter Zsc = BsZc

8 Update vs2 = Fs

r + SsrcZ

sc

9 Local Solve SsrrZ

sf = vs

2

10 Gather Z =

nssum

s=1

BsrTDs

rZsf

11 Output (Z)

The local solve in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumannproblem of the form Ss

rrUsr = rs

r which can be solved using the following algorithm

Algorithm 7 Neumann-Solver Procedure

1 Input (rsr)

2 Solve[

Asii As

ir

Asri As

rr

]

X s

Usr

=

0rsr

3 Output (Usr )

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

17

The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as

Mminus1cc F lowast

ccZc = minusMminus1cc dc

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr + Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1 Input (P)

2 Scatter Ps = BsrTP

3 Local Solve Ssrrv

s1 = Ps

4 Compute vs2 = Ss

crvs1

5 Gather V2 =

nssum

s=1

BscT vs

2

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

18

6 Global Solve FccV3 = V2

7 Scatter vs3 = Bs

cV3

8 Compute vs4 = Ss

rcvs3

9 Update vs5 = Ps + vs

4

10 Local Solve SsrrQ

s = vs5

11 Gather Q =

nssum

s=1

BsrQ

s

12 Output (Q)

The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner

Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= DsrB

srT rΓ

3 Compute Zs = Ssrrr

4 Gather Z =

nssum

s=1

BsrD

srZ

s

5 Output (Z)

We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

19

10 Numerical results

For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as

part

partx[cx(x y θ)

partu(x y θ)

partx] +

part

party[cy(x y θ)

partu(x y θ)

party] = f(x y) in Ω

where the forcing term is

f(x y) = 10

For simplicity a homogeneous Dirichlet boundary condition is imposed as

u(x y θ) = 0 on partΩ

The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025

In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5

GkΓminus SUk

Γ2

G0Γ2

6 10minus5

Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh

101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown


Figure 8 A typical FEM mesh
Figure 9 Mesh partitioning using METIS

Figure 10 The mean of the solution field
Figure 11 The standard deviation of the solution field

Figure 12 Chaos coefficients u0
Figure 13 Chaos coefficients u1

10.2 Scalability study
Firstly, we study the scalability of the algorithms with respect to the problem size, where we fix the number of subdomains used to solve the problem to 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously, increasing the Polynomial Chaos order from first order to third order does not affect the performance of the methods. Note that for a given spatial problem size (n), using the first order Polynomial Chaos expansion leads to a total problem size of (3 × n), and using the third order Polynomial Chaos expansion leads to


Figure 14 Chaos coefficients u2
Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4
Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0
Figure 19 Lagrange multipliers λ1

a total problem size of (10 × n).
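The factors of 3 and 10 follow from the size of the Polynomial Chaos basis: for $n$ random variables and expansion order $p$ the number of terms is $\binom{n+p}{p}$, which for the two input coefficients $c_x$ and $c_y$ gives 3 terms at first order and 10 at third order, and hence the total system order 1018510 = 10 × 101851 quoted in Section 10.1. The two-line check below is only an arithmetic aside.

```python
from math import comb

n_rv, spatial_dofs = 2, 101851            # two input random variables (c_x, c_y); largest mesh
for p in (1, 2, 3):
    terms = comb(n_rv + p, p)             # N + 1 = (n + p)! / (n! p!)
    print(f"order {p}: {terms} PC terms -> total system size {terms * spatial_dofs}")
```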

Secondly, we fix the problem size in the spatial domain to 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table (2). The results are reported for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer


Figure 20 Lagrange multipliers λ2
Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4
Figure 23 Lagrange multipliers λ5

Table 1 Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100)

Problem size      PDDM              DP-DDM
                  1st   2nd   3rd   1st   2nd   3rd
10051             10    10    10    8     8     8
20303             11    11    11    8     8     8
40811             11    12    12    8     9     9
59935             13    14    14    10    10    10
71386             12    12    12    9     9     9
80172             11    11    12    8     8     8
101851            12    12    12    9     9     9

iterations to converge than the primal method. This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than the starting initial residual in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs. (24-26).

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding


Table 2 Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof

CPUs      PDDM              DP-DDM
          1st   2nd   3rd   1st   2nd   3rd
20        10    11    11    8     8     8
40        12    12    12    9     9     9
60        12    13    13    9     9     9
80        12    12    13    9     9     9
100       12    12    12    9     9     9
120       12    12    12    9     9     9
140       11    11    12    8     8     8
160       12    12    12    8     8     9

Table 3 Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof)

Subdomains      PDDM              DP-DDM
                1st   2nd   3rd   1st   2nd   3rd
100             10    10    10    8     8     8
200             10    10    11    8     8     8
400             12    13    13    9     9     9
600             11    12    12    8     8     9
800             12    13    13    9     9     9

more subdomains. Table (3) shows the performance of the primal and the dual-primal methods for the first, second and third order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.

Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations is nearly constant.

Finally, it is worth mentioning that the primal method and the dual-primal method demonstrate similar performance trends, and this fact numerically highlights the similarity (duality) between the two methods.

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed


Figure 24 The relative PCGM residual history for the case of 160 subdomains and first PC order (relative residual versus iteration number; P-DDM and DP-DDM)

Figure 25 The relative PCGM residual history for the case of 160 subdomains and second PC order (relative residual versus iteration number; P-DDM and DP-DDM)

Figure 26 The relative PCGM residual history for the case of 160 subdomains and third PC order (relative residual versus iteration number; P-DDM and DP-DDM)

primal method and the dual-primal method are numerically scalable with respect to the problem size, the subdomain size and the number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe uncertainty in the input parameters. This aspect is currently being investigated by the authors.

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors also thank Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.


Table 4 Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101851 dofs) and fixed number of subdomains (100)

CoV       PDDM              DP-DDM
          1st   2nd   3rd   1st   2nd   3rd
0.05      10    10    10    8     8     8
0.10      10    10    10    8     8     8
0.15      10    10    10    8     8     8
0.20      10    10    10    8     8     8
0.25      10    10    10    8     8     8
0.30      10    10    11    8     8     8
0.35      10    10    11    8     8     8
0.40      10    11    11    8     8     9
0.45      10    11    12    8     8     9
0.50      10    11    12    8     8     9

References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689–701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin Heidelberg: Springer) pp 251–268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687–714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia, PA: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250–271






usi (θ)

usr(θ)

usc(θ)

=

Nsum

j=0

Ψj(θ)

usij

usrj

uscj

(7)

Substituting Eq(7) into Eq(6) and performing Galerkin projection leads to the followingcoupled deterministic systems of equations

Asii As

ir Asic

Asri As

rr Asrc

Asci As

cr Ascc

Usi

Usr

Usc

=

Fsi

Fsr

Fsc

(8)

where

[Asαβ ]jk =

Lsum

l=0

〈ΨlΨjΨk〉Asαβl

Fsαk = 〈Ψkf

sα〉

Usα = (us

α0 middot middot middot usαN )T

the subscripts α and β represent the index i r and c

Enforcing the transmission conditions along the boundary interfaces the subdomainequilibrium equation can be expressed as

Asii As

irBsr As

icBsc

nssum

s=1

BsrTAs

ri

nssum

s=1

BsrTAs

rrBsr

nssum

s=1

BsrTAs

rcBsc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

crBsr

nssum

s=1

BscTAs

ccBsc

Usi

Ur

Uc

=

Fsi

nssum

s=1

BsrTFs

r

nssum

s=1

BscTFs

c

(9)

where Bsr is a Boolean rectangular matrix that maps the global remaining vector Ur to the

local remaining vector Usr as

Usr = Bs

rUr (10)

Similarly the restriction operator Bsc is a Boolean rectangular matrix that maps the global

corner vector Uc to the local corner vector Usc as

Usc = Bs

cUc (11)

In parallel implementation both Bsr and Bs

c act as scatter operators while BsrT and Bs

cT act

as gather operators

The first block equation in Eq(9) can be solved for Usi in parallel as

Usi = [As

ii]minus1(Fs

i minusAsirB

srUr minusAs

icBscUc) (12)

Substituting Eq(12) into Eq(9) leads to the following condensed system which representsthe Schur compliment system in Eq(5) but now the interface boundary nodes are split into

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

9

remaining and corner nodes as shown schematically in Fig(3)

nssum

s=1

BsrTSs

rrBsr

nssum

s=1

BsrTSs

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Ur

Uc

=

nssum

s=1

BsrTGs

r

nssum

s=1

BscTGs

c

(13)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

Figure 3 The interface boundary nodes split into remainder () and corner(bull) nodes

The corner nodal vector Uc in Eq(13) is eliminated next to obtain the following (symmetricpositive definite) reduced interface problem

(Frr minus Frc[Fcc]minus1Fcr)Ur = dr minus Frc[Fcc]

minus1dc (14)

where

Fαβ =

nssum

s=1

Bsα

TSsαβB

dα =

nssum

s=1

Bsα

TGsα

and α and β denotes subscripts r and c The above system can be solved using PCGM withan appropriate preconditioner Mminus1 defined in the next section

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

10

61 A two-level preconditionerAs mentioned previously the continuity condition at the corner nodes is enforced strictly whilethose for the remaining interface boundary nodes is satisfied in a weak sense This fact isschematically illustrated in Fig(4) [12] As the iterations converge the continuity conditionat all interface nodes (both corner and remaining boundary nodes) is satisfied strictly Theassembly of the unknown vector at the corner nodes leads to the following partially assembledSchur complement system

Figure 4 Partial assembly of corner nodes (bull)

Ssrr Ss

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Usr

Uc

=

Fsr

0

(15)

where

Fsr = Ds

rBsrrj

and rj is the residual at the jth iteration of the PCGM and Dsr represents a block diagonal

weighting matrix which satisfies the following property

nssum

s=1

BsrTDs

rBsr = I

The diagonal entries of each block of Dsr are the reciprocal of the number of subdomains that

share the interface boundary nodes

The subdomain level remaining unknown vector Usr can be eliminated in parallel from Eq(15)

as

Usr = [Ss

rr]minus1(Fs

r minus SsrcB

scUc) (16)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

11

Substituting Usr into the second block of Eq(15) leads to the following coarse problem

F lowast

ccUc = dlowastc (17)

where

F lowast

cc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

dlowastc = minus

nssum

s=1

BscTSs

cr[Ssrr]

minus1Fsr

The continuity of the solution field over the remaining interface nodes is satisfied next byaveraging the local results as

Ur =

nssum

s=1

BsrTDs

rUsr

After some algebraic manipulations the preconditioner can be expressed as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr + RT

0 [F lowast

cc]minus1R0 (18)

where

R0 =

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

7 A dual-primal iterative substructuring method of SPDEs

In this section the dual-primal domain decomposition method is introduced in the context ofstochastic PDEs This approach is an extension of FETI-DP [5] for SPDEs In this approachthe continuity condition at the corner nodes is enforced strictly and Lagrange multipliers areused to enforce the continuity condition weakly over the remaining interface nodes

Partial assembly of Eq(8) leads to the following subdomain equilibrium equation

Asii As

ir AsicB

sc

Asri As

rr AsrcB

sc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc

Usi

Usr

Uc

=

Fsi

Fsr

nssum

s=1

BscTFs

c

(19)

where Bsc is a Boolean restriction operator that maps the global corner vector Uc to the local

corner vector Usc as

Usc = Bs

cUc

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

12

Eq(19) can be rewritten in compact form as

AsUs = Fs (20)

Let Bsr be a block diagonal signed Boolean continuity matrix defined as

s=nssum

s=1

BsrU

sr = 0

Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as

1

2UT AU minus UT F rarr min (21)

subject to BU = 0

where

A =

A1

As

Ans

U =

U1

Us

Uns

F =

F1

Fs

Fns

B =[

(0 B1r 0) middot middot middot (0 Bs

r 0) middot middot middot (0 Bnsr 0)

]

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as

L(U Λ) =1

2UT AU minus UT F + UT BT Λ (22)

Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system

Asii As

ir AsicB

sc 0

Asri As

rr AsrcB

sc Bs

rT

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc 0

0

nssum

s=1

Bsr 0 0

Usi

Usr

Uc

Λ

=

Fsi

Fsr

nssum

s=1

BscTFs

c

0

(23)

where

Λ =

λ0

λN

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

13

Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary

and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)

Eliminating the interior unknowns Usi from Eq(23) as

Usi = [As

ii]minus1(Fs

i minusAsirU

sr minusAs

icBscUc) (24)

Substituting Eq(24) into Eq(23) leads to

Ssrr Ss

rcBsc Bs

rT

nssum

s=1

BscTSs

cr

nssum

s=1

BscTSs

ccBsc 0

nssum

s=1

Bsr 0 0

Usr

Uc

Λ

=

Gsr

nssum

s=1

BscTGs

c

0

(25)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)

as

Usr = [Ss

rr]minus1(Gs

r minus SsrcB

scUc minus Bs

rT Λ) (26)

Substituting Eq(26) into Eq(25) leads to

[

Fcc minusFcr

Frc Frr

]

Uc

Λ

=

dc

dr

(27)

where

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

14

Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Fcr =

nssum

s=1

BscTSs

cr[Ssrr]

minus1BsrT

Frc =

nssum

s=1

Bsr [S

srr]

minus1SsrcB

sc

Frr =

nssum

s=1

Bsr [S

srr]

minus1BsrT

dc =

nssum

s=1

BscT (Gs

c minus Sscr[S

srr]

minus1Gsr)

dr =

nssum

s=1

Bsr [S

srr]

minus1Gsr

Solving for Uc from Eq(27) gives the following coarse problem

FccUc = (dc + FcrΛ) (28)

Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system

(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]

minus1dc (29)

Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as

M =

nssum

s=1

BsrD

srS

srrD

srB

srT (30)

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as

F lowast

cc = Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr

+

nssum

s=1

BsrTDs

r[Ssrr]

minus1SrcBsc

[

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

]minus1

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

which has the same form of the dual-primal operator in Eq(29)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

15

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 4 Parallel Matrix-Vector Product for Primal Method

1 Input (P)

2 Scatter Ps = BsrP

3 Compute vs1 = Ss

crPs

4 Gather V1 =

nssum

s=1

BsrT vs

1

5 Global Solve FccV2 = V1

6 Scatter vs2 = Bs

cV2

7 Compute vs3 = Ss

rcvs2

8 Update Qs = SsrrP

s minus vs3

9 Gather Q =

nssum

s=1

BsrTQs

10 Output (Q)

Multiplication of Schur complement matrix by a vector in step 3 step 7 and step 8 inAlgorithm (4) is computed by solving a corresponding Dirichlet problem as

vsα = Ss

αβvsβ

vsα = (As

αβ minusAsαi[A

sii]

minus1Asiβ)vs

β

This procedure is outlined in the following algorithm

Algorithm 5 Dirichlet Solver Procedure

1 Input (vsβ)

2 Compute vs1 = As

iβvsβ

3 Solve Asiiv

s2 = vs

1

4 Compute vs3 = As

αivs2

5 Compute vs4 = As

αβvsβ

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

16

6 Compute vsα = vs

4 minus vs3

7 Output (vsα)

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equippedwith lumped preconditioner as

Mminus1cc FccV2 = Mminus1

cc V1

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Next the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed bysolving a subdomain level Neumann problem and a global coarse problem The procedure isoutlined in the following algorithm

Algorithm 6 Two-Level Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter Fsr = Ds

rBsrrΓ

3 Local Solve Ssrrv

s1 = Fs

r

4 Compute dsc = Ss

crvs1

5 Gather dc =

nssum

s=1

BscT ds

c

6 Global Solve F lowastccZc = minusdc

7 Scatter Zsc = BsZc

8 Update vs2 = Fs

r + SsrcZ

sc

9 Local Solve SsrrZ

sf = vs

2

10 Gather Z =

nssum

s=1

BsrTDs

rZsf

11 Output (Z)

The local solve in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumannproblem of the form Ss

rrUsr = rs

r which can be solved using the following algorithm

Algorithm 7 Neumann-Solver Procedure

1 Input (rsr)

2 Solve[

Asii As

ir

Asri As

rr

]

X s

Usr

=

0rsr

3 Output (Usr )

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

17

The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as

Mminus1cc F lowast

ccZc = minusMminus1cc dc

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr + Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1 Input (P)

2 Scatter Ps = BsrTP

3 Local Solve Ssrrv

s1 = Ps

4 Compute vs2 = Ss

crvs1

5 Gather V2 =

nssum

s=1

BscT vs

2

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

18

6 Global Solve FccV3 = V2

7 Scatter vs3 = Bs

cV3

8 Compute vs4 = Ss

rcvs3

9 Update vs5 = Ps + vs

4

10 Local Solve SsrrQ

s = vs5

11 Gather Q =

nssum

s=1

BsrQ

s

12 Output (Q)

The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner

Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= DsrB

srT rΓ

3 Compute Zs = Ssrrr

4 Gather Z =

nssum

s=1

BsrD

srZ

s

5 Output (Z)

We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

19

10 Numerical results

For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as

part

partx[cx(x y θ)

partu(x y θ)

partx] +

part

party[cy(x y θ)

partu(x y θ)

party] = f(x y) in Ω

where the forcing term is

f(x y) = 10

For simplicity a homogeneous Dirichlet boundary condition is imposed as

u(x y θ) = 0 on partΩ

The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025

In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5

GkΓminus SUk

Γ2

G0Γ2

6 10minus5

Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh

101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

20

Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS

Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed

Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1

102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

21

Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1

a total problem size of (10 times n)

Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

22

Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5

Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)

Problem size PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9

ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)

Thirdly we fix problem size per subdomain while increase the overall problem size by adding

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

23

Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof

CPUs PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9

Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)

Subdomains PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9

more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain

Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant

Finally, it is worth mentioning that the primal method and the dual-primal method demonstrate similar performance trends; this observation highlights, through numerical experiments, the similarity (duality) between the two methods.

Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.

Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.

Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.

(Figures 24-26 plot the relative residual against the iteration number for P-DDM and DP-DDM on a logarithmic scale from 10^1 down to 10^-5.)

11. Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed primal method and the dual-primal method are numerically scalable with respect to problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables are required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.

Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101851 dof) and a fixed number of subdomains (100).

CoV    PDDM              DP-DDM
       1st   2nd   3rd   1st   2nd   3rd
0.05   10    10    10    8     8     8
0.10   10    10    10    8     8     8
0.15   10    10    10    8     8     8
0.20   10    10    10    8     8     8
0.25   10    10    10    8     8     8
0.30   10    10    11    8     8     8
0.35   10    10    11    8     8     8
0.40   10    11    11    8     8     9
0.45   10    11    12    8     8     9
0.50   10    11    12    8     8     9

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors also thank Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.

References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689–701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin, Heidelberg: Springer) pp 251–268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687–714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250–271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 11: Primal and dual-primal iterative substructuring methods of stochastic

remaining and corner nodes as shown schematically in Fig(3)

nssum

s=1

BsrTSs

rrBsr

nssum

s=1

BsrTSs

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Ur

Uc

=

nssum

s=1

BsrTGs

r

nssum

s=1

BscTGs

c

(13)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

Figure 3 The interface boundary nodes split into remainder () and corner(bull) nodes

The corner nodal vector Uc in Eq(13) is eliminated next to obtain the following (symmetricpositive definite) reduced interface problem

(Frr minus Frc[Fcc]minus1Fcr)Ur = dr minus Frc[Fcc]

minus1dc (14)

where

Fαβ =

nssum

s=1

Bsα

TSsαβB

dα =

nssum

s=1

Bsα

TGsα

and α and β denotes subscripts r and c The above system can be solved using PCGM withan appropriate preconditioner Mminus1 defined in the next section

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

10

61 A two-level preconditionerAs mentioned previously the continuity condition at the corner nodes is enforced strictly whilethose for the remaining interface boundary nodes is satisfied in a weak sense This fact isschematically illustrated in Fig(4) [12] As the iterations converge the continuity conditionat all interface nodes (both corner and remaining boundary nodes) is satisfied strictly Theassembly of the unknown vector at the corner nodes leads to the following partially assembledSchur complement system

Figure 4 Partial assembly of corner nodes (bull)

Ssrr Ss

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Usr

Uc

=

Fsr

0

(15)

where

Fsr = Ds

rBsrrj

and rj is the residual at the jth iteration of the PCGM and Dsr represents a block diagonal

weighting matrix which satisfies the following property

nssum

s=1

BsrTDs

rBsr = I

The diagonal entries of each block of Dsr are the reciprocal of the number of subdomains that

share the interface boundary nodes

The subdomain level remaining unknown vector Usr can be eliminated in parallel from Eq(15)

as

Usr = [Ss

rr]minus1(Fs

r minus SsrcB

scUc) (16)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

11

Substituting Usr into the second block of Eq(15) leads to the following coarse problem

F lowast

ccUc = dlowastc (17)

where

F lowast

cc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

dlowastc = minus

nssum

s=1

BscTSs

cr[Ssrr]

minus1Fsr

The continuity of the solution field over the remaining interface nodes is satisfied next byaveraging the local results as

Ur =

nssum

s=1

BsrTDs

rUsr

After some algebraic manipulations the preconditioner can be expressed as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr + RT

0 [F lowast

cc]minus1R0 (18)

where

R0 =

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

7 A dual-primal iterative substructuring method of SPDEs

In this section the dual-primal domain decomposition method is introduced in the context ofstochastic PDEs This approach is an extension of FETI-DP [5] for SPDEs In this approachthe continuity condition at the corner nodes is enforced strictly and Lagrange multipliers areused to enforce the continuity condition weakly over the remaining interface nodes

Partial assembly of Eq(8) leads to the following subdomain equilibrium equation

Asii As

ir AsicB

sc

Asri As

rr AsrcB

sc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc

Usi

Usr

Uc

=

Fsi

Fsr

nssum

s=1

BscTFs

c

(19)

where Bsc is a Boolean restriction operator that maps the global corner vector Uc to the local

corner vector Usc as

Usc = Bs

cUc

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

12

Eq(19) can be rewritten in compact form as

AsUs = Fs (20)

Let Bsr be a block diagonal signed Boolean continuity matrix defined as

s=nssum

s=1

BsrU

sr = 0

Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as

1

2UT AU minus UT F rarr min (21)

subject to BU = 0

where

A =

A1

As

Ans

U =

U1

Us

Uns

F =

F1

Fs

Fns

B =[

(0 B1r 0) middot middot middot (0 Bs

r 0) middot middot middot (0 Bnsr 0)

]

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as

L(U Λ) =1

2UT AU minus UT F + UT BT Λ (22)

Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system

Asii As

ir AsicB

sc 0

Asri As

rr AsrcB

sc Bs

rT

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc 0

0

nssum

s=1

Bsr 0 0

Usi

Usr

Uc

Λ

=

Fsi

Fsr

nssum

s=1

BscTFs

c

0

(23)

where

Λ =

λ0

λN

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

13

Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary

and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)

Eliminating the interior unknowns Usi from Eq(23) as

Usi = [As

ii]minus1(Fs

i minusAsirU

sr minusAs

icBscUc) (24)

Substituting Eq(24) into Eq(23) leads to

Ssrr Ss

rcBsc Bs

rT

nssum

s=1

BscTSs

cr

nssum

s=1

BscTSs

ccBsc 0

nssum

s=1

Bsr 0 0

Usr

Uc

Λ

=

Gsr

nssum

s=1

BscTGs

c

0

(25)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)

as

Usr = [Ss

rr]minus1(Gs

r minus SsrcB

scUc minus Bs

rT Λ) (26)

Substituting Eq(26) into Eq(25) leads to

[

Fcc minusFcr

Frc Frr

]

Uc

Λ

=

dc

dr

(27)

where

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

14

Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Fcr =

nssum

s=1

BscTSs

cr[Ssrr]

minus1BsrT

Frc =

nssum

s=1

Bsr [S

srr]

minus1SsrcB

sc

Frr =

nssum

s=1

Bsr [S

srr]

minus1BsrT

dc =

nssum

s=1

BscT (Gs

c minus Sscr[S

srr]

minus1Gsr)

dr =

nssum

s=1

Bsr [S

srr]

minus1Gsr

Solving for Uc from Eq(27) gives the following coarse problem

FccUc = (dc + FcrΛ) (28)

Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system

(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]

minus1dc (29)

Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as

M =

nssum

s=1

BsrD

srS

srrD

srB

srT (30)

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as

F lowast

cc = Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr

+

nssum

s=1

BsrTDs

r[Ssrr]

minus1SrcBsc

[

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

]minus1

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

which has the same form of the dual-primal operator in Eq(29)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

15

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 4 Parallel Matrix-Vector Product for Primal Method

1 Input (P)

2 Scatter Ps = BsrP

3 Compute vs1 = Ss

crPs

4 Gather V1 =

nssum

s=1

BsrT vs

1

5 Global Solve FccV2 = V1

6 Scatter vs2 = Bs

cV2

7 Compute vs3 = Ss

rcvs2

8 Update Qs = SsrrP

s minus vs3

9 Gather Q =

nssum

s=1

BsrTQs

10 Output (Q)

Multiplication of Schur complement matrix by a vector in step 3 step 7 and step 8 inAlgorithm (4) is computed by solving a corresponding Dirichlet problem as

vsα = Ss

αβvsβ

vsα = (As

αβ minusAsαi[A

sii]

minus1Asiβ)vs

β

This procedure is outlined in the following algorithm

Algorithm 5 Dirichlet Solver Procedure

1 Input (vsβ)

2 Compute vs1 = As

iβvsβ

3 Solve Asiiv

s2 = vs

1

4 Compute vs3 = As

αivs2

5 Compute vs4 = As

αβvsβ

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

16

6 Compute vsα = vs

4 minus vs3

7 Output (vsα)

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equippedwith lumped preconditioner as

Mminus1cc FccV2 = Mminus1

cc V1

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Next the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed bysolving a subdomain level Neumann problem and a global coarse problem The procedure isoutlined in the following algorithm

Algorithm 6 Two-Level Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter Fsr = Ds

rBsrrΓ

3 Local Solve Ssrrv

s1 = Fs

r

4 Compute dsc = Ss

crvs1

5 Gather dc =

nssum

s=1

BscT ds

c

6 Global Solve F lowastccZc = minusdc

7 Scatter Zsc = BsZc

8 Update vs2 = Fs

r + SsrcZ

sc

9 Local Solve SsrrZ

sf = vs

2

10 Gather Z =

nssum

s=1

BsrTDs

rZsf

11 Output (Z)

The local solve in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumannproblem of the form Ss

rrUsr = rs

r which can be solved using the following algorithm

Algorithm 7 Neumann-Solver Procedure

1 Input (rsr)

2 Solve[

Asii As

ir

Asri As

rr

]

X s

Usr

=

0rsr

3 Output (Usr )

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

17

The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as

Mminus1cc F lowast

ccZc = minusMminus1cc dc

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr + Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1 Input (P)

2 Scatter Ps = BsrTP

3 Local Solve Ssrrv

s1 = Ps

4 Compute vs2 = Ss

crvs1

5 Gather V2 =

nssum

s=1

BscT vs

2

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

18

6 Global Solve FccV3 = V2

7 Scatter vs3 = Bs

cV3

8 Compute vs4 = Ss

rcvs3

9 Update vs5 = Ps + vs

4

10 Local Solve SsrrQ

s = vs5

11 Gather Q =

nssum

s=1

BsrQ

s

12 Output (Q)

The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner

Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= DsrB

srT rΓ

3 Compute Zs = Ssrrr

4 Gather Z =

nssum

s=1

BsrD

srZ

s

5 Output (Z)

We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

19

10 Numerical results

For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as

part

partx[cx(x y θ)

partu(x y θ)

partx] +

part

party[cy(x y θ)

partu(x y θ)

party] = f(x y) in Ω

where the forcing term is

f(x y) = 10

For simplicity a homogeneous Dirichlet boundary condition is imposed as

u(x y θ) = 0 on partΩ

The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025

In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5

GkΓminus SUk

Γ2

G0Γ2

6 10minus5

Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh

101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

20

Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS

Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed

Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1

102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

21

Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1

a total problem size of (10 times n)

Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

22

Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5

Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)

Problem size PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9

ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)

Thirdly we fix problem size per subdomain while increase the overall problem size by adding

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

23

Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof

CPUs PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9

Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)

Subdomains PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9

more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain

Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant

Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

24

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order

primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

25

Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)

CoV PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9

References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77

689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing

Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic

and Hydraulic

[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in

Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009

PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)

[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series

in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 12: Primal and dual-primal iterative substructuring methods of stochastic

61 A two-level preconditionerAs mentioned previously the continuity condition at the corner nodes is enforced strictly whilethose for the remaining interface boundary nodes is satisfied in a weak sense This fact isschematically illustrated in Fig(4) [12] As the iterations converge the continuity conditionat all interface nodes (both corner and remaining boundary nodes) is satisfied strictly Theassembly of the unknown vector at the corner nodes leads to the following partially assembledSchur complement system

Figure 4 Partial assembly of corner nodes (bull)

Ssrr Ss

rcBsc

nssum

s=1

BscTSs

crBsr

nssum

s=1

BscTSs

ccBsc

Usr

Uc

=

Fsr

0

(15)

where

Fsr = Ds

rBsrrj

and rj is the residual at the jth iteration of the PCGM and Dsr represents a block diagonal

weighting matrix which satisfies the following property

nssum

s=1

BsrTDs

rBsr = I

The diagonal entries of each block of Dsr are the reciprocal of the number of subdomains that

share the interface boundary nodes

The subdomain level remaining unknown vector Usr can be eliminated in parallel from Eq(15)

as

Usr = [Ss

rr]minus1(Fs

r minus SsrcB

scUc) (16)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

11

Substituting Usr into the second block of Eq(15) leads to the following coarse problem

F lowast

ccUc = dlowastc (17)

where

F lowast

cc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

dlowastc = minus

nssum

s=1

BscTSs

cr[Ssrr]

minus1Fsr

The continuity of the solution field over the remaining interface nodes is satisfied next byaveraging the local results as

Ur =

nssum

s=1

BsrTDs

rUsr

After some algebraic manipulations the preconditioner can be expressed as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr + RT

0 [F lowast

cc]minus1R0 (18)

where

R0 =

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

7 A dual-primal iterative substructuring method of SPDEs

In this section the dual-primal domain decomposition method is introduced in the context ofstochastic PDEs This approach is an extension of FETI-DP [5] for SPDEs In this approachthe continuity condition at the corner nodes is enforced strictly and Lagrange multipliers areused to enforce the continuity condition weakly over the remaining interface nodes

Partial assembly of Eq(8) leads to the following subdomain equilibrium equation

Asii As

ir AsicB

sc

Asri As

rr AsrcB

sc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc

Usi

Usr

Uc

=

Fsi

Fsr

nssum

s=1

BscTFs

c

(19)

where Bsc is a Boolean restriction operator that maps the global corner vector Uc to the local

corner vector Usc as

Usc = Bs

cUc

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

12

Eq(19) can be rewritten in compact form as

AsUs = Fs (20)

Let Bsr be a block diagonal signed Boolean continuity matrix defined as

s=nssum

s=1

BsrU

sr = 0

Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as

1

2UT AU minus UT F rarr min (21)

subject to BU = 0

where

A =

A1

As

Ans

U =

U1

Us

Uns

F =

F1

Fs

Fns

B =[

(0 B1r 0) middot middot middot (0 Bs

r 0) middot middot middot (0 Bnsr 0)

]

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as

L(U Λ) =1

2UT AU minus UT F + UT BT Λ (22)

Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system

Asii As

ir AsicB

sc 0

Asri As

rr AsrcB

sc Bs

rT

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc 0

0

nssum

s=1

Bsr 0 0

Usi

Usr

Uc

Λ

=

Fsi

Fsr

nssum

s=1

BscTFs

c

0

(23)

where

Λ =

λ0

λN

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

13

Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary

and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)

Eliminating the interior unknowns Usi from Eq(23) as

Usi = [As

ii]minus1(Fs

i minusAsirU

sr minusAs

icBscUc) (24)

Substituting Eq(24) into Eq(23) leads to

Ssrr Ss

rcBsc Bs

rT

nssum

s=1

BscTSs

cr

nssum

s=1

BscTSs

ccBsc 0

nssum

s=1

Bsr 0 0

Usr

Uc

Λ

=

Gsr

nssum

s=1

BscTGs

c

0

(25)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)

as

Usr = [Ss

rr]minus1(Gs

r minus SsrcB

scUc minus Bs

rT Λ) (26)

Substituting Eq(26) into Eq(25) leads to

[

Fcc minusFcr

Frc Frr

]

Uc

Λ

=

dc

dr

(27)

where

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

14

Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Fcr =

nssum

s=1

BscTSs

cr[Ssrr]

minus1BsrT

Frc =

nssum

s=1

Bsr [S

srr]

minus1SsrcB

sc

Frr =

nssum

s=1

Bsr [S

srr]

minus1BsrT

dc =

nssum

s=1

BscT (Gs

c minus Sscr[S

srr]

minus1Gsr)

dr =

nssum

s=1

Bsr [S

srr]

minus1Gsr

Solving for Uc from Eq(27) gives the following coarse problem

FccUc = (dc + FcrΛ) (28)

Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system

(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]

minus1dc (29)

Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as

M =

nssum

s=1

BsrD

srS

srrD

srB

srT (30)

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as

F lowast

cc = Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr

+

nssum

s=1

BsrTDs

r[Ssrr]

minus1SrcBsc

[

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

]minus1

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

which has the same form of the dual-primal operator in Eq(29)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

15

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 4 Parallel Matrix-Vector Product for Primal Method

1 Input (P)

2 Scatter Ps = BsrP

3 Compute vs1 = Ss

crPs

4 Gather V1 =

nssum

s=1

BsrT vs

1

5 Global Solve FccV2 = V1

6 Scatter vs2 = Bs

cV2

7 Compute vs3 = Ss

rcvs2

8 Update Qs = SsrrP

s minus vs3

9 Gather Q =

nssum

s=1

BsrTQs

10 Output (Q)

Multiplication of Schur complement matrix by a vector in step 3 step 7 and step 8 inAlgorithm (4) is computed by solving a corresponding Dirichlet problem as

vsα = Ss

αβvsβ

vsα = (As

αβ minusAsαi[A

sii]

minus1Asiβ)vs

β

This procedure is outlined in the following algorithm

Algorithm 5 Dirichlet Solver Procedure

1 Input (vsβ)

2 Compute vs1 = As

iβvsβ

3 Solve Asiiv

s2 = vs

1

4 Compute vs3 = As

αivs2

5 Compute vs4 = As

αβvsβ

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

16

6 Compute vsα = vs

4 minus vs3

7 Output (vsα)

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equippedwith lumped preconditioner as

Mminus1cc FccV2 = Mminus1

cc V1

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Next the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed bysolving a subdomain level Neumann problem and a global coarse problem The procedure isoutlined in the following algorithm

Algorithm 6 Two-Level Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter Fsr = Ds

rBsrrΓ

3 Local Solve Ssrrv

s1 = Fs

r

4 Compute dsc = Ss

crvs1

5 Gather dc =

nssum

s=1

BscT ds

c

6 Global Solve F lowastccZc = minusdc

7 Scatter Zsc = BsZc

8 Update vs2 = Fs

r + SsrcZ

sc

9 Local Solve SsrrZ

sf = vs

2

10 Gather Z =

nssum

s=1

BsrTDs

rZsf

11 Output (Z)

The local solve in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumannproblem of the form Ss

rrUsr = rs

r which can be solved using the following algorithm

Algorithm 7 Neumann-Solver Procedure

1 Input (rsr)

2 Solve[

Asii As

ir

Asri As

rr

]

X s

Usr

=

0rsr

3 Output (Usr )

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

17

The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as

Mminus1cc F lowast

ccZc = minusMminus1cc dc

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr + Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1 Input (P)

2 Scatter Ps = BsrTP

3 Local Solve Ssrrv

s1 = Ps

4 Compute vs2 = Ss

crvs1

5 Gather V2 =

nssum

s=1

BscT vs

2

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

18

6 Global Solve FccV3 = V2

7 Scatter vs3 = Bs

cV3

8 Compute vs4 = Ss

rcvs3

9 Update vs5 = Ps + vs

4

10 Local Solve SsrrQ

s = vs5

11 Gather Q =

nssum

s=1

BsrQ

s

12 Output (Q)

The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner

Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= DsrB

srT rΓ

3 Compute Zs = Ssrrr

4 Gather Z =

nssum

s=1

BsrD

srZ

s

5 Output (Z)

We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

19

10 Numerical results

For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as

part

partx[cx(x y θ)

partu(x y θ)

partx] +

part

party[cy(x y θ)

partu(x y θ)

party] = f(x y) in Ω

where the forcing term is

f(x y) = 10

For simplicity a homogeneous Dirichlet boundary condition is imposed as

u(x y θ) = 0 on partΩ

The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025

In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5

GkΓminus SUk

Γ2

G0Γ2

6 10minus5

Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh

101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

20

Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS

Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed

Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1

102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

21

Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1

a total problem size of (10 times n)

Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

22

Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5

Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)

Problem size PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9

ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)

Thirdly we fix problem size per subdomain while increase the overall problem size by adding

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

23

Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof

CPUs PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9

Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)

Subdomains PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9

more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain

Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant

Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

24

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order

primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

25

Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)

CoV PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9

References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77

689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing

Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic

and Hydraulic

[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in

Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009

PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)

[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series

in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 13: Primal and dual-primal iterative substructuring methods of stochastic

Substituting Usr into the second block of Eq(15) leads to the following coarse problem

F lowast

ccUc = dlowastc (17)

where

F lowast

cc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

dlowastc = minus

nssum

s=1

BscTSs

cr[Ssrr]

minus1Fsr

The continuity of the solution field over the remaining interface nodes is satisfied next byaveraging the local results as

Ur =

nssum

s=1

BsrTDs

rUsr

After some algebraic manipulations the preconditioner can be expressed as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr + RT

0 [F lowast

cc]minus1R0 (18)

where

R0 =

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

7 A dual-primal iterative substructuring method of SPDEs

In this section the dual-primal domain decomposition method is introduced in the context ofstochastic PDEs This approach is an extension of FETI-DP [5] for SPDEs In this approachthe continuity condition at the corner nodes is enforced strictly and Lagrange multipliers areused to enforce the continuity condition weakly over the remaining interface nodes

Partial assembly of Eq(8) leads to the following subdomain equilibrium equation

Asii As

ir AsicB

sc

Asri As

rr AsrcB

sc

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc

Usi

Usr

Uc

=

Fsi

Fsr

nssum

s=1

BscTFs

c

(19)

where Bsc is a Boolean restriction operator that maps the global corner vector Uc to the local

corner vector Usc as

Usc = Bs

cUc

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

12

Eq(19) can be rewritten in compact form as

AsUs = Fs (20)

Let Bsr be a block diagonal signed Boolean continuity matrix defined as

s=nssum

s=1

BsrU

sr = 0

Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as

1

2UT AU minus UT F rarr min (21)

subject to BU = 0

where

A =

A1

As

Ans

U =

U1

Us

Uns

F =

F1

Fs

Fns

B =[

(0 B1r 0) middot middot middot (0 Bs

r 0) middot middot middot (0 Bnsr 0)

]

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as

L(U Λ) =1

2UT AU minus UT F + UT BT Λ (22)

Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system

Asii As

ir AsicB

sc 0

Asri As

rr AsrcB

sc Bs

rT

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc 0

0

nssum

s=1

Bsr 0 0

Usi

Usr

Uc

Λ

=

Fsi

Fsr

nssum

s=1

BscTFs

c

0

(23)

where

Λ =

λ0

λN

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

13

Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary

and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)

Eliminating the interior unknowns Usi from Eq(23) as

Usi = [As

ii]minus1(Fs

i minusAsirU

sr minusAs

icBscUc) (24)

Substituting Eq(24) into Eq(23) leads to

Ssrr Ss

rcBsc Bs

rT

nssum

s=1

BscTSs

cr

nssum

s=1

BscTSs

ccBsc 0

nssum

s=1

Bsr 0 0

Usr

Uc

Λ

=

Gsr

nssum

s=1

BscTGs

c

0

(25)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)

as

Usr = [Ss

rr]minus1(Gs

r minus SsrcB

scUc minus Bs

rT Λ) (26)

Substituting Eq(26) into Eq(25) leads to

[

Fcc minusFcr

Frc Frr

]

Uc

Λ

=

dc

dr

(27)

where

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

14

Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Fcr =

nssum

s=1

BscTSs

cr[Ssrr]

minus1BsrT

Frc =

nssum

s=1

Bsr [S

srr]

minus1SsrcB

sc

Frr =

nssum

s=1

Bsr [S

srr]

minus1BsrT

dc =

nssum

s=1

BscT (Gs

c minus Sscr[S

srr]

minus1Gsr)

dr =

nssum

s=1

Bsr [S

srr]

minus1Gsr

Solving for Uc from Eq(27) gives the following coarse problem

FccUc = (dc + FcrΛ) (28)

Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system

(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]

minus1dc (29)

Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as

M =

nssum

s=1

BsrD

srS

srrD

srB

srT (30)

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as

F lowast

cc = Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr

+

nssum

s=1

BsrTDs

r[Ssrr]

minus1SrcBsc

[

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

]minus1

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

which has the same form of the dual-primal operator in Eq(29)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

15

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

9.1 Primal method

In this subsection we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq. (14).

For the j-th iteration of Algorithm (1), the matrix-vector product in step 7, defined as

Q_j = \left( F_{rr} - F_{rc} [F_{cc}]^{-1} F_{cr} \right) P_j,

can be computed using the following algorithm.

Algorithm 4 Parallel Matrix-Vector Product for the Primal Method
1. Input (P)
2. Scatter: P^s = B_r^s P
3. Compute: v_1^s = S_{cr}^s P^s
4. Gather: V_1 = \sum_{s=1}^{n_s} B_c^{sT} v_1^s
5. Global solve: F_{cc} V_2 = V_1
6. Scatter: v_2^s = B_c^s V_2
7. Compute: v_3^s = S_{rc}^s v_2^s
8. Update: Q^s = S_{rr}^s P^s - v_3^s
9. Gather: Q = \sum_{s=1}^{n_s} B_r^{sT} Q^s
10. Output (Q)
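A serial NumPy transcription of Algorithm (4) is sketched below for illustration (it is not the MPI implementation). It assumes each subdomain is described by a dictionary of dense blocks Br, Bc, Srr, Src, Scr (Boolean restriction matrices and Schur complement blocks), and the global coarse solve is delegated to a callable solve_Fcc; the parallel scatter/gather is mimicked by the loop over subdomains.

```python
import numpy as np

def primal_matvec(P, subdomains, solve_Fcc):
    """Algorithm 4: Q = (F_rr - F_rc F_cc^{-1} F_cr) P, assembled subdomain
    by subdomain.  `subdomains` is a list of dicts with Br, Bc, Srr, Src, Scr."""
    # Steps 2-4: scatter P, apply S_cr, gather the corner-space vector V1.
    V1 = sum(d["Bc"].T @ (d["Scr"] @ (d["Br"] @ P)) for d in subdomains)
    # Step 5: global coarse solve F_cc V2 = V1.
    V2 = solve_Fcc(V1)
    # Steps 6-9: scatter V2, apply S_rc and S_rr, gather the result.
    Q = np.zeros_like(P)
    for d in subdomains:
        Ps = d["Br"] @ P                       # step 2 (local copy)
        vs3 = d["Src"] @ (d["Bc"] @ V2)        # steps 6-7
        Qs = d["Srr"] @ Ps - vs3               # step 8
        Q += d["Br"].T @ Qs                    # step 9
    return Q
```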

Multiplication of the Schur complement matrix by a vector in step 3, step 7, and step 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem as

v_\alpha^s = S_{\alpha\beta}^s v_\beta^s,
\qquad
v_\alpha^s = \left( A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s \right) v_\beta^s.

This procedure is outlined in the following algorithm.

Algorithm 5 Dirichlet Solver Procedure
1. Input (v_\beta^s)
2. Compute: v_1^s = A_{i\beta}^s v_\beta^s
3. Solve: A_{ii}^s v_2^s = v_1^s
4. Compute: v_3^s = A_{\alpha i}^s v_2^s
5. Compute: v_4^s = A_{\alpha\beta}^s v_\beta^s


6. Compute: v_\alpha^s = v_4^s - v_3^s
7. Output (v_\alpha^s)
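The following sketch implements this Dirichlet solve with dense NumPy blocks and verifies it against the explicitly formed Schur complement (an assumption made only for the check; in practice A^s_ii would be factorized once and the Schur complement never formed).

```python
import numpy as np

def apply_schur(Aab, Aai, Aii, Aib, vb):
    """Algorithm 5: v_a = S_ab v_b = (A_ab - A_ai A_ii^{-1} A_ib) v_b,
    without forming S_ab explicitly."""
    v1 = Aib @ vb                    # step 2
    v2 = np.linalg.solve(Aii, v1)    # step 3: interior (Dirichlet) solve
    v3 = Aai @ v2                    # step 4
    v4 = Aab @ vb                    # step 5
    return v4 - v3                   # step 6

rng = np.random.default_rng(3)
ni, na, nb = 6, 3, 4
Aii = rng.standard_normal((ni, ni)); Aii = Aii @ Aii.T + ni * np.eye(ni)
Aab = rng.standard_normal((na, nb))
Aai = rng.standard_normal((na, ni))
Aib = rng.standard_normal((ni, nb))
vb = rng.standard_normal(nb)
S_explicit = Aab - Aai @ np.linalg.solve(Aii, Aib)
print(np.allclose(apply_schur(Aab, Aai, Aii, Aib, vb), S_explicit @ vb))   # True
```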

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner as

M_{cc}^{-1} F_{cc} V_2 = M_{cc}^{-1} V_1,

where

M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s.

Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.

Algorithm 6 Two-Level Preconditioner Effect Procedure
1. Input (r_\Gamma)
2. Scatter: F_r^s = D_r^s B_r^s r_\Gamma
3. Local solve: S_{rr}^s v_1^s = F_r^s
4. Compute: d_c^s = S_{cr}^s v_1^s
5. Gather: d_c = \sum_{s=1}^{n_s} B_c^{sT} d_c^s
6. Global solve: F_{cc}^{*} Z_c = -d_c
7. Scatter: Z_c^s = B_c^s Z_c
8. Update: v_2^s = F_r^s + S_{rc}^s Z_c^s
9. Local solve: S_{rr}^s Z_f^s = v_2^s
10. Gather: Z = \sum_{s=1}^{n_s} B_r^{sT} D_r^s Z_f^s
11. Output (Z)
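A serial sketch of Algorithm (6) follows (again not the MPI code): each subdomain dictionary is assumed to carry the weight matrix Dr, the Boolean matrices Br and Bc, and dense Schur blocks Srr, Src, Scr, while the corner coarse solve is delegated to a callable solve_Fcc_star.

```python
import numpy as np

def two_level_precond(r_gamma, subdomains, solve_Fcc_star):
    """Algorithm 6: Z = M^{-1} r_Gamma via weighted subdomain Neumann solves
    (with S_rr) and one global corner coarse solve (with F*_cc)."""
    # Steps 2-5: weighted scatter, first Neumann solve, coarse residual d_c.
    cached = []
    dc = None
    for d in subdomains:
        Fr = d["Dr"] @ (d["Br"] @ r_gamma)               # step 2
        v1 = np.linalg.solve(d["Srr"], Fr)               # step 3
        cached.append(Fr)
        contrib = d["Bc"].T @ (d["Scr"] @ v1)            # steps 4-5
        dc = contrib if dc is None else dc + contrib
    Zc = solve_Fcc_star(-dc)                             # step 6: coarse solve
    # Steps 7-10: corner correction, second Neumann solve, weighted gather.
    Z = np.zeros_like(r_gamma)
    for d, Fr in zip(subdomains, cached):
        v2 = Fr + d["Src"] @ (d["Bc"] @ Zc)              # steps 7-8
        Zf = np.linalg.solve(d["Srr"], v2)               # step 9
        Z += d["Br"].T @ (d["Dr"] @ Zf)                  # step 10
    return Z
```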

The local solves in step 3 and step 9 of Algorithm (6) constitute a subdomain-level Neumann problem of the form S_{rr}^s U_r^s = r_r^s, which can be solved using the following algorithm.

Algorithm 7 Neumann-Solver Procedure
1. Input (r_r^s)
2. Solve:
\begin{bmatrix} A_{ii}^s & A_{ir}^s \\ A_{ri}^s & A_{rr}^s \end{bmatrix}
\begin{Bmatrix} X^s \\ U_r^s \end{Bmatrix}
=
\begin{Bmatrix} 0 \\ r_r^s \end{Bmatrix}
3. Output (U_r^s)
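A direct transcription of Algorithm (7) with dense blocks is sketched below: the action of [S^s_rr]^{-1} is obtained by solving the subdomain Neumann problem with a zero interior load, and the consistency check compares the result with an explicit inverse Schur complement solve (the explicit S_rr is formed only for the check).

```python
import numpy as np

def neumann_solve(Aii, Air, Ari, Arr, rr):
    """Algorithm 7: solve [[Aii, Air], [Ari, Arr]] [X; Ur] = [0; rr] and
    return Ur, which equals [S_rr]^{-1} rr."""
    ni = Aii.shape[0]
    K = np.block([[Aii, Air], [Ari, Arr]])
    rhs = np.concatenate([np.zeros(ni), rr])
    return np.linalg.solve(K, rhs)[ni:]

rng = np.random.default_rng(4)
ni, nr = 5, 3
K = rng.standard_normal((ni + nr, ni + nr)); K = K @ K.T + (ni + nr) * np.eye(ni + nr)
Aii, Air = K[:ni, :ni], K[:ni, ni:]
Ari, Arr = K[ni:, :ni], K[ni:, ni:]
rr = rng.standard_normal(nr)
Srr = Arr - Ari @ np.linalg.solve(Aii, Air)
print(np.allclose(neumann_solve(Aii, Air, Ari, Arr, rr), np.linalg.solve(Srr, rr)))   # True
```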


The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner as

M_{cc}^{-1} F_{cc}^{*} Z_c = -M_{cc}^{-1} d_c,

where

M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s.

Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the following flow chart.

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

9.2 Dual-primal method

In this subsection we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem, Eq. (29).

For the j-th iteration of Algorithm (1), the matrix-vector product in step 7, defined as

Q_j = \left( F_{rr} + F_{rc} [F_{cc}]^{-1} F_{cr} \right) P_j,

can be computed using the following algorithm.

Algorithm 8 Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input (P)
2. Scatter: P^s = B_r^{sT} P
3. Local solve: S_{rr}^s v_1^s = P^s
4. Compute: v_2^s = S_{cr}^s v_1^s
5. Gather: V_2 = \sum_{s=1}^{n_s} B_c^{sT} v_2^s


6. Global solve: F_{cc} V_3 = V_2
7. Scatter: v_3^s = B_c^s V_3
8. Compute: v_4^s = S_{rc}^s v_3^s
9. Update: v_5^s = P^s + v_4^s
10. Local solve: S_{rr}^s Q^s = v_5^s
11. Gather: Q = \sum_{s=1}^{n_s} B_r^s Q^s
12. Output (Q)
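A serial sketch of Algorithm (8), under the same assumptions as the earlier sketches (dense per-subdomain blocks in a dictionary and a callable coarse solver), is given below. The scatter uses B_r^T and the gather uses B_r because P lives in the Lagrange-multiplier (dual) space rather than in the assembled interface space.

```python
import numpy as np

def dual_primal_matvec(P, subdomains, solve_Fcc):
    """Algorithm 8: Q = (F_rr + F_rc F_cc^{-1} F_cr) P."""
    # Steps 2-5: scatter, first Neumann solve, build the corner vector V2.
    cached = []
    V2 = None
    for d in subdomains:
        Ps = d["Br"].T @ P                               # step 2
        v1 = np.linalg.solve(d["Srr"], Ps)               # step 3
        cached.append(Ps)
        contrib = d["Bc"].T @ (d["Scr"] @ v1)            # steps 4-5
        V2 = contrib if V2 is None else V2 + contrib
    V3 = solve_Fcc(V2)                                   # step 6: coarse solve
    # Steps 7-11: corner correction, second Neumann solve, dual-space gather.
    Q = np.zeros_like(P)
    for d, Ps in zip(subdomains, cached):
        v4 = d["Src"] @ (d["Bc"] @ V3)                   # steps 7-8
        v5 = Ps + v4                                     # step 9
        Qs = np.linalg.solve(d["Srr"], v5)               # step 10
        Q += d["Br"] @ Qs                                # step 11
    return Q
```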

The local solves in step 3 and step 10 of Algorithm (8) are carried out by solving a subdomain-level Neumann problem as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for solving the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.

Algorithm 9 Dirichlet Preconditioner Effect Procedure
1. Input (r_\Gamma)
2. Scatter: r_\Gamma^s = D_r^s B_r^{sT} r_\Gamma
3. Compute: Z^s = S_{rr}^s r_\Gamma^s
4. Gather: Z = \sum_{s=1}^{n_s} B_r^s D_r^s Z^s
5. Output (Z)
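The Dirichlet preconditioner application is cheap compared with the two-level preconditioner: it only multiplies by S^s_rr (a Dirichlet solve per Algorithm 5 when S^s_rr is not stored), with no local inverse. A sketch under the same dense-block assumptions as above:

```python
import numpy as np

def dirichlet_precond(r_gamma, subdomains):
    """Algorithm 9: Z = sum_s B_r D_r S_rr D_r B_r^T r_Gamma (Eq. (30))."""
    Z = np.zeros_like(r_gamma)
    for d in subdomains:
        rs = d["Dr"] @ (d["Br"].T @ r_gamma)     # step 2: weighted scatter
        Zs = d["Srr"] @ rs                       # step 3: apply S_rr
        Z += d["Br"] @ (d["Dr"] @ Zs)            # step 4: weighted gather
    return Z
```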

We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the following flow chart.

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem


10 Numerical results

For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients, given as

\frac{\partial}{\partial x}\left[ c_x(x, y, \theta)\, \frac{\partial u(x, y, \theta)}{\partial x} \right]
+ \frac{\partial}{\partial y}\left[ c_y(x, y, \theta)\, \frac{\partial u(x, y, \theta)}{\partial y} \right]
= f(x, y) \quad \text{in } \Omega,

where the forcing term is f(x, y) = 1.0. For simplicity, a homogeneous Dirichlet boundary condition is imposed:

u(x, y, \theta) = 0 \quad \text{on } \partial\Omega.

The random coefficients c_x(x, y, θ) and c_y(x, y, θ) are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
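This lognormal model can be reproduced with a short sketch (not the authors' code). If g = μ + σξ with ξ standard Gaussian, then c = exp(g) has one-dimensional Hermite Polynomial Chaos coefficients c_i = exp(μ + σ²/2) σ^i / i!, a standard result in the spectral stochastic finite element literature [9]; the values μ = 1.0 and σ = 0.25 below are the ones quoted above, and a Monte Carlo check confirms the zeroth coefficient (the mean).

```python
import numpy as np
from math import exp, factorial

mu, sigma = 1.0, 0.25                # mean / std of the underlying Gaussian
order = 3                            # PC order used for the coefficients here

# Hermite-chaos coefficients of a lognormal random variable exp(mu + sigma*xi).
pc_coeffs = [exp(mu + 0.5 * sigma**2) * sigma**i / factorial(i)
             for i in range(order + 1)]
print("PC coefficients of c:", pc_coeffs)

# Monte Carlo check of the mean against the zeroth PC coefficient.
rng = np.random.default_rng(5)
samples = np.exp(mu + sigma * rng.standard_normal(200_000))
print("MC mean:", samples.mean(), " PC mean:", pc_coeffs[0])
```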

In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the L2 norms of the current and the initial residual is less than 10^{-5}:

\frac{\| G_\Gamma - S U_\Gamma^k \|_2}{\| G_\Gamma^0 \|_2} \le 10^{-5}.

Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.

10.1 Stochastic features

Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third-order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig. (8) shows a typical finite element mesh, while Fig. (9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. (10) and Fig. (11), respectively. Clearly, the maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. (12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. (18-23) the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
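The quoted system size can be checked with the usual count of Polynomial Chaos terms, (d + p)!/(d! p!) for d random dimensions and order p. Interpreting the two independent lognormal coefficients as two Gaussian random dimensions is our assumption here (L = 7 is quoted from the text as given); with d = 2 and p = 3 the count is 10, consistent with N = 9 and with the block system order of 101851 × 10.

```python
from math import comb

def num_pc_terms(dims, order):
    """Number of Polynomial Chaos terms: (dims + order)! / (dims! * order!)."""
    return comb(dims + order, order)

spatial_dofs = 101_851
pc_terms = num_pc_terms(dims=2, order=3)   # two random coefficients, 3rd order
print(pc_terms)                            # 10, i.e. N = 9
print(spatial_dofs * pc_terms)             # 1018510, the quoted system order
```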


Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS

Figure 10 The mean of the solution field Figure 11 The standard deviation of the solution field

Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1

10.2 Scalability study

Firstly, we study the scalability of the algorithms with respect to the problem size, where we fix the number of subdomains used to solve the problem to 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size n, using the first-order Polynomial Chaos expansion leads to a total problem size of 3n, and using the third-order Polynomial Chaos expansion leads to a total problem size of 10n.


Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1


Secondly, we fix the problem size in the spatial domain to 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table (2). The results are reported for first-, second-, and third-order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method.


Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size   PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
10051          10  10  10              8   8   8
20303          11  11  11              8   8   8
40811          11  12  12              8   9   9
59935          13  14  14             10  10  10
71386          12  12  12              9   9   9
80172          11  11  12              8   8   8
101851         12  12  12              9   9   9

This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than the starting initial residual in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs. (24-26).

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains.


Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.

CPUs   PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
20     10  11  11              8   8   8
40     12  12  12              9   9   9
60     12  13  13              9   9   9
80     12  12  13              9   9   9
100    12  12  12              9   9   9
120    12  12  12              9   9   9
140    11  11  12              8   8   8
160    12  12  12              8   8   9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).

Subdomains   PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
100          10  10  10              8   8   8
200          10  10  11              8   8   8
400          12  13  13              9   9   9
600          11  12  12              8   8   9
800          12  13  13              9   9   9

Table (3) shows the performance of the primal and the dual-primal methods for the first-, second-, and third-order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.

Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations is nearly constant.

Finally, it is worth mentioning that the performances of the primal method and the dual-primal method demonstrate a similar trend, and this fact points out the similarity (duality) between the two methods through numerical experiments.

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed primal method and the dual-primal method are numerically scalable with respect to problem size, subdomain size, and number of subdomains.


Figure 24. The relative PCGM residual history (relative residual versus iteration number, for P-DDM and DP-DDM) for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.

Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e., the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables are required to prescribe uncertainty in the input parameters. This aspect is currently being investigated by the authors.

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation, and the Ontario Innovation Trust, and thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.


Table 4. Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101851 dofs), and fixed number of subdomains (100).

CoV    PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
0.05   10  10  10              8   8   8
0.10   10  10  10              8   8   8
0.15   10  10  10              8   8   8
0.20   10  10  10              8   8   8
0.25   10  10  10              8   8   8
0.30   10  10  11              8   8   8
0.35   10  10  11              8   8   8
0.40   10  11  11              8   8   9
0.45   10  11  12              8   8   9
0.50   10  11  12              8   8   9

References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250-271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 14: Primal and dual-primal iterative substructuring methods of stochastic

Eq(19) can be rewritten in compact form as

AsUs = Fs (20)

Let Bsr be a block diagonal signed Boolean continuity matrix defined as

s=nssum

s=1

BsrU

sr = 0

Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as

1

2UT AU minus UT F rarr min (21)

subject to BU = 0

where

A =

A1

As

Ans

U =

U1

Us

Uns

F =

F1

Fs

Fns

B =[

(0 B1r 0) middot middot middot (0 Bs

r 0) middot middot middot (0 Bnsr 0)

]

By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as

L(U Λ) =1

2UT AU minus UT F + UT BT Λ (22)

Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system

Asii As

ir AsicB

sc 0

Asri As

rr AsrcB

sc Bs

rT

nssum

s=1

BscTAs

ci

nssum

s=1

BscTAs

cr

nssum

s=1

BscTAs

ccBsc 0

0

nssum

s=1

Bsr 0 0

Usi

Usr

Uc

Λ

=

Fsi

Fsr

nssum

s=1

BscTFs

c

0

(23)

where

Λ =

λ0

λN

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

13

Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary

and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)

Eliminating the interior unknowns Usi from Eq(23) as

Usi = [As

ii]minus1(Fs

i minusAsirU

sr minusAs

icBscUc) (24)

Substituting Eq(24) into Eq(23) leads to

Ssrr Ss

rcBsc Bs

rT

nssum

s=1

BscTSs

cr

nssum

s=1

BscTSs

ccBsc 0

nssum

s=1

Bsr 0 0

Usr

Uc

Λ

=

Gsr

nssum

s=1

BscTGs

c

0

(25)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)

as

Usr = [Ss

rr]minus1(Gs

r minus SsrcB

scUc minus Bs

rT Λ) (26)

Substituting Eq(26) into Eq(25) leads to

[

Fcc minusFcr

Frc Frr

]

Uc

Λ

=

dc

dr

(27)

where

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

14

Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Fcr =

nssum

s=1

BscTSs

cr[Ssrr]

minus1BsrT

Frc =

nssum

s=1

Bsr [S

srr]

minus1SsrcB

sc

Frr =

nssum

s=1

Bsr [S

srr]

minus1BsrT

dc =

nssum

s=1

BscT (Gs

c minus Sscr[S

srr]

minus1Gsr)

dr =

nssum

s=1

Bsr [S

srr]

minus1Gsr

Solving for Uc from Eq(27) gives the following coarse problem

FccUc = (dc + FcrΛ) (28)

Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system

(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]

minus1dc (29)

Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as

M =

nssum

s=1

BsrD

srS

srrD

srB

srT (30)

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as

F lowast

cc = Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr

+

nssum

s=1

BsrTDs

r[Ssrr]

minus1SrcBsc

[

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

]minus1

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

which has the same form of the dual-primal operator in Eq(29)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

15

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 4 Parallel Matrix-Vector Product for Primal Method

1 Input (P)

2 Scatter Ps = BsrP

3 Compute vs1 = Ss

crPs

4 Gather V1 =

nssum

s=1

BsrT vs

1

5 Global Solve FccV2 = V1

6 Scatter vs2 = Bs

cV2

7 Compute vs3 = Ss

rcvs2

8 Update Qs = SsrrP

s minus vs3

9 Gather Q =

nssum

s=1

BsrTQs

10 Output (Q)

Multiplication of Schur complement matrix by a vector in step 3 step 7 and step 8 inAlgorithm (4) is computed by solving a corresponding Dirichlet problem as

vsα = Ss

αβvsβ

vsα = (As

αβ minusAsαi[A

sii]

minus1Asiβ)vs

β

This procedure is outlined in the following algorithm

Algorithm 5 Dirichlet Solver Procedure

1 Input (vsβ)

2 Compute vs1 = As

iβvsβ

3 Solve Asiiv

s2 = vs

1

4 Compute vs3 = As

αivs2

5 Compute vs4 = As

αβvsβ

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

16

6 Compute vsα = vs

4 minus vs3

7 Output (vsα)

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equippedwith lumped preconditioner as

Mminus1cc FccV2 = Mminus1

cc V1

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Next the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed bysolving a subdomain level Neumann problem and a global coarse problem The procedure isoutlined in the following algorithm

Algorithm 6 Two-Level Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter Fsr = Ds

rBsrrΓ

3 Local Solve Ssrrv

s1 = Fs

r

4 Compute dsc = Ss

crvs1

5 Gather dc =

nssum

s=1

BscT ds

c

6 Global Solve F lowastccZc = minusdc

7 Scatter Zsc = BsZc

8 Update vs2 = Fs

r + SsrcZ

sc

9 Local Solve SsrrZ

sf = vs

2

10 Gather Z =

nssum

s=1

BsrTDs

rZsf

11 Output (Z)

The local solve in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumannproblem of the form Ss

rrUsr = rs

r which can be solved using the following algorithm

Algorithm 7 Neumann-Solver Procedure

1 Input (rsr)

2 Solve[

Asii As

ir

Asri As

rr

]

X s

Usr

=

0rsr

3 Output (Usr )

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

17

The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as

Mminus1cc F lowast

ccZc = minusMminus1cc dc

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr + Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1 Input (P)

2 Scatter Ps = BsrTP

3 Local Solve Ssrrv

s1 = Ps

4 Compute vs2 = Ss

crvs1

5 Gather V2 =

nssum

s=1

BscT vs

2

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

18

6 Global Solve FccV3 = V2

7 Scatter vs3 = Bs

cV3

8 Compute vs4 = Ss

rcvs3

9 Update vs5 = Ps + vs

4

10 Local Solve SsrrQ

s = vs5

11 Gather Q =

nssum

s=1

BsrQ

s

12 Output (Q)

The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner

Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= DsrB

srT rΓ

3 Compute Zs = Ssrrr

4 Gather Z =

nssum

s=1

BsrD

srZ

s

5 Output (Z)

We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

19

10 Numerical results

For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as

part

partx[cx(x y θ)

partu(x y θ)

partx] +

part

party[cy(x y θ)

partu(x y θ)

party] = f(x y) in Ω

where the forcing term is

f(x y) = 10

For simplicity a homogeneous Dirichlet boundary condition is imposed as

u(x y θ) = 0 on partΩ

The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025

In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5

GkΓminus SUk

Γ2

G0Γ2

6 10minus5

Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh

101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

20

Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS

Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed

Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1

102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

21

Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1

a total problem size of (10 times n)

Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

22

Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5

Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)

Problem size PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9

ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)

Thirdly we fix problem size per subdomain while increase the overall problem size by adding

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

23

Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof

CPUs PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9

Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)

Subdomains PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9

more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain

Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant

Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

24

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order

primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

25

Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)

CoV PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9

References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77

689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing

Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic

and Hydraulic

[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in

Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009

PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)

[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series

in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 15: Primal and dual-primal iterative substructuring methods of stochastic

Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary

and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)

Eliminating the interior unknowns Usi from Eq(23) as

Usi = [As

ii]minus1(Fs

i minusAsirU

sr minusAs

icBscUc) (24)

Substituting Eq(24) into Eq(23) leads to

Ssrr Ss

rcBsc Bs

rT

nssum

s=1

BscTSs

cr

nssum

s=1

BscTSs

ccBsc 0

nssum

s=1

Bsr 0 0

Usr

Uc

Λ

=

Gsr

nssum

s=1

BscTGs

c

0

(25)

where

Ssαβ = As

αβ minusAsαi[A

sii]

minus1Asiβ

Gsα = Fs

α minusAsαi[A

sii]

minus1Fsi

The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)

as

Usr = [Ss

rr]minus1(Gs

r minus SsrcB

scUc minus Bs

rT Λ) (26)

Substituting Eq(26) into Eq(25) leads to

[

Fcc minusFcr

Frc Frr

]

Uc

Λ

=

dc

dr

(27)

where

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

14

Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Fcr =

nssum

s=1

BscTSs

cr[Ssrr]

minus1BsrT

Frc =

nssum

s=1

Bsr [S

srr]

minus1SsrcB

sc

Frr =

nssum

s=1

Bsr [S

srr]

minus1BsrT

dc =

nssum

s=1

BscT (Gs

c minus Sscr[S

srr]

minus1Gsr)

dr =

nssum

s=1

Bsr [S

srr]

minus1Gsr

Solving for Uc from Eq(27) gives the following coarse problem

FccUc = (dc + FcrΛ) (28)

Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system

(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]

minus1dc (29)

Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as

M =

nssum

s=1

BsrD

srS

srrD

srB

srT (30)

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as

F lowast

cc = Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr

+

nssum

s=1

BsrTDs

r[Ssrr]

minus1SrcBsc

[

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

]minus1

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

which has the same form of the dual-primal operator in Eq(29)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

15

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 4 Parallel Matrix-Vector Product for Primal Method

1 Input (P)

2 Scatter Ps = BsrP

3 Compute vs1 = Ss

crPs

4 Gather V1 =

nssum

s=1

BsrT vs

1

5 Global Solve FccV2 = V1

6 Scatter vs2 = Bs

cV2

7 Compute vs3 = Ss

rcvs2

8 Update Qs = SsrrP

s minus vs3

9 Gather Q =

nssum

s=1

BsrTQs

10 Output (Q)

Multiplication of Schur complement matrix by a vector in step 3 step 7 and step 8 inAlgorithm (4) is computed by solving a corresponding Dirichlet problem as

vsα = Ss

αβvsβ

vsα = (As

αβ minusAsαi[A

sii]

minus1Asiβ)vs

β

This procedure is outlined in the following algorithm

Algorithm 5 Dirichlet Solver Procedure

1 Input (vsβ)

2 Compute vs1 = As

iβvsβ

3 Solve Asiiv

s2 = vs

1

4 Compute vs3 = As

αivs2

5 Compute vs4 = As

αβvsβ

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

16

6 Compute vsα = vs

4 minus vs3

7 Output (vsα)

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equippedwith lumped preconditioner as

Mminus1cc FccV2 = Mminus1

cc V1

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Next the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed bysolving a subdomain level Neumann problem and a global coarse problem The procedure isoutlined in the following algorithm

Algorithm 6 Two-Level Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter Fsr = Ds

rBsrrΓ

3 Local Solve Ssrrv

s1 = Fs

r

4 Compute dsc = Ss

crvs1

5 Gather dc =

nssum

s=1

BscT ds

c

6 Global Solve F lowastccZc = minusdc

7 Scatter Zsc = BsZc

8 Update vs2 = Fs

r + SsrcZ

sc

9 Local Solve SsrrZ

sf = vs

2

10 Gather Z =

nssum

s=1

BsrTDs

rZsf

11 Output (Z)

The local solve in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumannproblem of the form Ss

rrUsr = rs

r which can be solved using the following algorithm

Algorithm 7 Neumann-Solver Procedure

1 Input (rsr)

2 Solve[

Asii As

ir

Asri As

rr

]

X s

Usr

=

0rsr

3 Output (Usr )

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

17

The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as

Mminus1cc F lowast

ccZc = minusMminus1cc dc

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr + Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1 Input (P)

2 Scatter Ps = BsrTP

3 Local Solve Ssrrv

s1 = Ps

4 Compute vs2 = Ss

crvs1

5 Gather V2 =

nssum

s=1

BscT vs

2

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

18

6 Global Solve FccV3 = V2

7 Scatter vs3 = Bs

cV3

8 Compute vs4 = Ss

rcvs3

9 Update vs5 = Ps + vs

4

10 Local Solve SsrrQ

s = vs5

11 Gather Q =

nssum

s=1

BsrQ

s

12 Output (Q)

The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner

Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= DsrB

srT rΓ

3 Compute Zs = Ssrrr

4 Gather Z =

nssum

s=1

BsrD

srZ

s

5 Output (Z)

We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

19

10 Numerical results

For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as

part

partx[cx(x y θ)

partu(x y θ)

partx] +

part

party[cy(x y θ)

partu(x y θ)

party] = f(x y) in Ω

where the forcing term is

f(x y) = 10

For simplicity a homogeneous Dirichlet boundary condition is imposed as

u(x y θ) = 0 on partΩ

The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025

In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5

GkΓminus SUk

Γ2

G0Γ2

6 10minus5

Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh

101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

20

Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS

Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed

Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1

102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

21

Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1

a total problem size of (10 times n)

Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

22

Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5

Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)

Problem size PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9

ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)

Thirdly we fix problem size per subdomain while increase the overall problem size by adding

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

23

Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof

CPUs PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9

Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)

Subdomains PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9

more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain

Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant

Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

24

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order

primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

25

Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)

CoV PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9

References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77

689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing

Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic

and Hydraulic

[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in

Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009

PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)

[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series

in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 16: Primal and dual-primal iterative substructuring methods of stochastic

Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Fcr =

nssum

s=1

BscTSs

cr[Ssrr]

minus1BsrT

Frc =

nssum

s=1

Bsr [S

srr]

minus1SsrcB

sc

Frr =

nssum

s=1

Bsr [S

srr]

minus1BsrT

dc =

nssum

s=1

BscT (Gs

c minus Sscr[S

srr]

minus1Gsr)

dr =

nssum

s=1

Bsr [S

srr]

minus1Gsr

Solving for Uc from Eq(27) gives the following coarse problem

FccUc = (dc + FcrΛ) (28)

Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system

(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]

minus1dc (29)

Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as

M =

nssum

s=1

BsrD

srS

srrD

srB

srT (30)

8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as

F lowast

cc = Fcc =

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as

Mminus1 =

nssum

s=1

BsrTDs

r[Ssrr]

minus1DsrB

sr

+

nssum

s=1

BsrTDs

r[Ssrr]

minus1SrcBsc

[

nssum

s=1

BscT (Ss

cc minus Sscr[S

srr]

minus1Ssrc)B

sc

]minus1

nssum

s=1

BscTScr[S

srr]

minus1DsrB

sr

which has the same form of the dual-primal operator in Eq(29)

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

15

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 4 Parallel Matrix-Vector Product for Primal Method
1. Input: P
2. Scatter: P^s = B_r^s P
3. Compute: v_1^s = S_{cr}^s P^s
4. Gather: V_1 = \sum_{s=1}^{n_s} B_c^{sT} v_1^s
5. Global solve: F_{cc} V_2 = V_1
6. Scatter: v_2^s = B_c^s V_2
7. Compute: v_3^s = S_{rc}^s v_2^s
8. Update: Q^s = S_{rr}^s P^s - v_3^s
9. Gather: Q = \sum_{s=1}^{n_s} B_r^{sT} Q^s
10. Output: Q
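A minimal serial sketch of Algorithm 4 is given below, assuming each entry of the list subdomains is a dictionary holding dense copies of the blocks S_{rr}^s, S_{rc}^s, S_{cr}^s (keys "Srr", "Src", "Scr") and index arrays "r_idx", "c_idx" that stand in for B_r^s and B_c^s; these names are illustrative. In the paper the Schur blocks are never formed explicitly; their action is obtained through the Dirichlet solver of Algorithm 5.

# Sketch of Algorithm 4 with dense toy subdomain data (illustrative layout only).
import numpy as np

def primal_matvec(P, subdomains, Fcc):
    """Matrix-free product Q = (F_rr - F_rc [F_cc]^{-1} F_cr) P, following Algorithm 4."""
    V1 = np.zeros(Fcc.shape[0])
    for sd in subdomains:                        # steps 2-4
        Ps = P[sd["r_idx"]]                      # scatter: P^s = B_r^s P
        V1[sd["c_idx"]] += sd["Scr"] @ Ps        # gather of v_1^s = S_cr^s P^s
    V2 = np.linalg.solve(Fcc, V1)                # step 5: coarse solve (PCGM in the paper)
    Q = np.zeros_like(P)
    for sd in subdomains:                        # steps 6-9
        v3 = sd["Src"] @ V2[sd["c_idx"]]         # v_3^s = S_rc^s (B_c^s V2)
        Qs = sd["Srr"] @ P[sd["r_idx"]] - v3     # Q^s = S_rr^s P^s - v_3^s
        Q[sd["r_idx"]] += Qs                     # gather with B_r^{sT}
    return Q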

Multiplication of a Schur complement block by a vector in steps 3, 7 and 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem:

v_\alpha^s = S_{\alpha\beta}^s v_\beta^s,

v_\alpha^s = \left( A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s \right) v_\beta^s.

This procedure is outlined in the following algorithm:

Algorithm 5 Dirichlet Solver Procedure
1. Input: v_\beta^s
2. Compute: v_1^s = A_{i\beta}^s v_\beta^s
3. Solve: A_{ii}^s v_2^s = v_1^s
4. Compute: v_3^s = A_{\alpha i}^s v_2^s
5. Compute: v_4^s = A_{\alpha\beta}^s v_\beta^s
6. Compute: v_\alpha^s = v_4^s - v_3^s
7. Output: v_\alpha^s
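A direct transcription of Algorithm 5, with dense A-blocks as stand-ins for the subdomain finite element matrices, might look as follows; in practice a factorization of A_{ii}^s would be computed once and reused across iterations.

# Sketch of Algorithm 5: apply S_ab^s = A_ab^s - A_ai^s [A_ii^s]^{-1} A_ib^s to a vector
# by one interior (Dirichlet) solve, without forming the Schur block explicitly.
import numpy as np

def apply_schur_block(A_ab, A_ai, A_ii, A_ib, v_b):
    v1 = A_ib @ v_b                      # step 2
    v2 = np.linalg.solve(A_ii, v1)       # step 3: interior Dirichlet solve
    v3 = A_ai @ v2                       # step 4
    v4 = A_ab @ v_b                      # step 5
    return v4 - v3                       # step 6: v_a^s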

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner:

M_{cc}^{-1} F_{cc} V_2 = M_{cc}^{-1} V_1, where M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^{s} B_c^{s}.
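The coarse solves above and the outer interface iteration both take the form of a preconditioned conjugate gradient loop. The following generic sketch uses callables apply_A and apply_Minv standing in for the matrix-free operator and the lumped preconditioner M_{cc}^{-1}; the names and default tolerance are illustrative, and the stopping test mirrors the relative-residual criterion quoted in Section 10.

# Generic preconditioned conjugate gradient (PCGM) sketch with a matrix-free operator.
import numpy as np

def pcg(apply_A, apply_Minv, b, tol=1e-5, maxit=200):
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    r0 = np.linalg.norm(b)
    for _ in range(maxit):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * r0:     # relative-residual test (cf. Section 10)
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x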

Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm:

Algorithm 6 Two-Level Preconditioner Effect Procedure
1. Input: r_\Gamma
2. Scatter: F_r^s = D_r^s B_r^s r_\Gamma
3. Local solve: S_{rr}^s v_1^s = F_r^s
4. Compute: d_c^s = S_{cr}^s v_1^s
5. Gather: d_c = \sum_{s=1}^{n_s} B_c^{sT} d_c^s
6. Global solve: F_{cc}^{*} Z_c = -d_c
7. Scatter: Z_c^s = B_c^s Z_c
8. Update: v_2^s = F_r^s + S_{rc}^s Z_c^s
9. Local solve: S_{rr}^s Z_f^s = v_2^s
10. Gather: Z = \sum_{s=1}^{n_s} B_r^{sT} D_r^s Z_f^s
11. Output: Z
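The two-level preconditioner of Algorithm 6 can be sketched as follows, using the same illustrative dictionary layout as before, with "D_r" holding the diagonal weights D_r^s as a vector and Fcc_star the assembled coarse operator F_{cc}^*. These names are assumptions for the sketch; in the parallel code the two loops run concurrently over subdomains and the coarse solve is itself a PCGM iteration.

# Sketch of Algorithm 6 (two-level preconditioner): local solves with S_rr^s plus a coarse correction.
import numpy as np

def two_level_precond(r_gamma, subdomains, Fcc_star):
    dc = np.zeros(Fcc_star.shape[0])
    Fr_all = []
    for sd in subdomains:                                  # steps 2-5
        Fr = sd["D_r"] * r_gamma[sd["r_idx"]]              # scatter with weights: F_r^s = D_r^s B_r^s r_Gamma
        v1 = np.linalg.solve(sd["Srr"], Fr)                # local solve S_rr^s v_1^s = F_r^s
        dc[sd["c_idx"]] += sd["Scr"] @ v1                  # gather d_c
        Fr_all.append(Fr)
    Zc = np.linalg.solve(Fcc_star, -dc)                    # step 6: coarse solve
    Z = np.zeros_like(r_gamma)
    for sd, Fr in zip(subdomains, Fr_all):                 # steps 7-10
        v2 = Fr + sd["Src"] @ Zc[sd["c_idx"]]              # v_2^s = F_r^s + S_rc^s Z_c^s
        Zf = np.linalg.solve(sd["Srr"], v2)                # local solve
        Z[sd["r_idx"]] += sd["D_r"] * Zf                   # weighted gather
    return Z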

The local solves in step 3 and step 9 of Algorithm (6) constitute a subdomain-level Neumann problem of the form S_{rr}^s U_r^s = r_r^s, which can be solved using the following algorithm:

Algorithm 7 Neumann-Solver Procedure
1. Input: r_r^s
2. Solve: \begin{bmatrix} A_{ii}^s & A_{ir}^s \\ A_{ri}^s & A_{rr}^s \end{bmatrix} \begin{bmatrix} X^s \\ U_r^s \end{bmatrix} = \begin{bmatrix} 0 \\ r_r^s \end{bmatrix}
3. Output: U_r^s
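Algorithm 7 amounts to solving the assembled subdomain system with a zero interior load and extracting the remainder part of the solution, as in this small dense sketch (the block names follow the paper; the dense direct solve is illustrative only).

# Sketch of Algorithm 7: the action of [S_rr^s]^{-1} via a subdomain Neumann solve.
import numpy as np

def neumann_solve(A_ii, A_ir, A_ri, A_rr, r_r):
    n_i = A_ii.shape[0]
    K = np.block([[A_ii, A_ir],
                  [A_ri, A_rr]])
    rhs = np.concatenate([np.zeros(n_i), r_r])   # zero interior load, r_r^s on the remainder dofs
    sol = np.linalg.solve(K, rhs)
    return sol[n_i:]                             # U_r^s (the interior part X^s is discarded)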


The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner:

M_{cc}^{-1} F_{cc}^{*} Z_c = -M_{cc}^{-1} d_c, where M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^{s} B_c^{s}.

Finally, we summarize the parallel implementation of PCGM to solve the primal interface problem in the following flowchart.

Figure 6. Flowchart of parallel PCGM to solve the primal interface problem.

9.2 Dual-primal method

In this subsection we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem in Eq. (29).

For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as

Q_j = \left( F_{rr} + F_{rc} [F_{cc}]^{-1} F_{cr} \right) P_j,

can be computed using the following algorithm:

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method
1. Input: P
2. Scatter: P^s = B_r^{sT} P
3. Local solve: S_{rr}^s v_1^s = P^s
4. Compute: v_2^s = S_{cr}^s v_1^s
5. Gather: V_2 = \sum_{s=1}^{n_s} B_c^{sT} v_2^s
6. Global solve: F_{cc} V_3 = V_2
7. Scatter: v_3^s = B_c^s V_3
8. Compute: v_4^s = S_{rc}^s v_3^s
9. Update: v_5^s = P^s + v_4^s
10. Local solve: S_{rr}^s Q^s = v_5^s
11. Gather: Q = \sum_{s=1}^{n_s} B_r^s Q^s
12. Output: Q
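A serial sketch of Algorithm 8 follows, with the dictionary key "Br" holding a dense stand-in for the signed Boolean operator B_r^s (all names are illustrative). Note the local Neumann-type solves with S_{rr}^s and the plus sign in the update, in contrast with the Dirichlet-type products of Algorithm 4.

# Sketch of Algorithm 8: Q = (F_rr + F_rc [F_cc]^{-1} F_cr) P with dense toy subdomain data.
import numpy as np

def dual_primal_matvec(P, subdomains, Fcc):
    V2 = np.zeros(Fcc.shape[0])
    Ps_all = []
    for sd in subdomains:                          # steps 2-5
        Ps = sd["Br"].T @ P                        # scatter: P^s = B_r^{sT} P
        v1 = np.linalg.solve(sd["Srr"], Ps)        # local Neumann solve (Algorithm 7 in practice)
        V2[sd["c_idx"]] += sd["Scr"] @ v1          # gather: V2 += B_c^{sT} S_cr^s v_1^s
        Ps_all.append(Ps)
    V3 = np.linalg.solve(Fcc, V2)                  # step 6: coarse solve
    Q = np.zeros_like(P)
    for sd, Ps in zip(subdomains, Ps_all):         # steps 7-11
        v5 = Ps + sd["Src"] @ V3[sd["c_idx"]]      # v_5^s = P^s + S_rc^s (B_c^s V3)
        Qs = np.linalg.solve(sd["Srr"], v5)        # local Neumann solve
        Q += sd["Br"] @ Qs                         # gather: Q += B_r^s Q^s
    return Q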

The local solves in step 3 and step 10 of Algorithm (8) are carried out as subdomain-level Neumann problems, as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, following the same procedure used for the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm:

Algorithm 9 Dirichlet Preconditioner Effect Procedure
1. Input: r_\Gamma
2. Scatter: r_\Gamma^s = D_r^s B_r^{sT} r_\Gamma
3. Compute: Z^s = S_{rr}^s r_\Gamma^s
4. Gather: Z = \sum_{s=1}^{n_s} B_r^s D_r^s Z^s
5. Output: Z
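The Dirichlet preconditioner of Algorithm 9 is only a weighted scatter, a multiplication by S_{rr}^s (via the Dirichlet solver of Algorithm 5 in practice) and a weighted gather, as in this sketch with the same illustrative data layout.

# Sketch of Algorithm 9: effect of the Dirichlet preconditioner on a residual r_Gamma.
import numpy as np

def dirichlet_precond(r_gamma, subdomains):
    Z = np.zeros_like(r_gamma)
    for sd in subdomains:
        rs = sd["D_r"] * (sd["Br"].T @ r_gamma)    # scatter: r^s = D_r^s B_r^{sT} r_Gamma
        Zs = sd["Srr"] @ rs                        # compute: Z^s = S_rr^s r^s
        Z += sd["Br"] @ (sd["D_r"] * Zs)           # gather:  Z += B_r^s D_r^s Z^s
    return Z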

We summarize the parallel implementation of PCGM to solve the dual-primal interface problem in the following flowchart.

Figure 7. Flowchart of parallel PCGM to solve the dual-primal interface problem.


10 Numerical results

For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients:

\frac{\partial}{\partial x}\left[ c_x(x,y,\theta)\, \frac{\partial u(x,y,\theta)}{\partial x} \right] + \frac{\partial}{\partial y}\left[ c_y(x,y,\theta)\, \frac{\partial u(x,y,\theta)}{\partial y} \right] = f(x,y) \quad \text{in } \Omega,

where the forcing term is f(x,y) = 1.0.

For simplicity, a homogeneous Dirichlet boundary condition is imposed:

u(x,y,\theta) = 0 \quad \text{on } \partial\Omega.

The random coefficients c_x(x,y,\theta) and c_y(x,y,\theta) are modeled as independent lognormal random variables. The underlying Gaussian random variable has mean 1.0 and standard deviation 0.25.
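As a hedged illustration of this input model (not taken from the paper), the one-dimensional Hermite chaos coefficients of a lognormal variable c = exp(g), with g Gaussian of mean 1.0 and standard deviation 0.25, admit the closed form c_k = exp(mu + sigma^2/2) sigma^k / k!; the following snippet checks a third-order truncation against the exact lognormal mean and variance.

# Illustration: Hermite PC coefficients of a 1-D lognormal random variable.
from math import exp, factorial

mu, sigma, order = 1.0, 0.25, 3
c = [exp(mu + 0.5 * sigma**2) * sigma**k / factorial(k) for k in range(order + 1)]

mean_pc = c[0]
var_pc = sum(c[k]**2 * factorial(k) for k in range(1, order + 1))   # since E[psi_k^2] = k!
mean_exact = exp(mu + 0.5 * sigma**2)
var_exact = (exp(sigma**2) - 1.0) * exp(2 * mu + sigma**2)
print(mean_pc, mean_exact)      # identical by construction
print(var_pc, var_exact)        # close for a third-order expansion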

In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the L_2 norms of the current and the initial residual falls below 10^{-5}:

\frac{\| G_\Gamma - S U_\Gamma^k \|_2}{\| G_\Gamma^0 \|_2} \leq 10^{-5}.

Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.

10.1 Stochastic features

Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third-order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig. 8 shows a typical finite element mesh, while Fig. 9 shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. 10 and Fig. 11, respectively. Clearly, the maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. 12-17 through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. 18-23 the Polynomial Chaos coefficients of the Lagrange multipliers are shown.


Figure 8. A typical FEM mesh.

Figure 9. Mesh partitioning using METIS.

Figure 10. The mean of the solution field.

Figure 11. The standard deviation of the solution field.

Figure 12. Chaos coefficient u_0.

Figure 13. Chaos coefficient u_1.

10.2 Scalability study

Firstly, we study the scalability of the algorithms with respect to the problem size: we fix the number of subdomains used to solve the problem at 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table 1. Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size n, the first-order Polynomial Chaos expansion leads to a total problem size of 3n, and the third-order Polynomial Chaos expansion leads to a total problem size of 10n; a short illustration of these term counts follows.
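These counts follow from the number of Polynomial Chaos terms, (d + p)!/(d! p!) for d random variables and expansion order p. Assuming the two independent random coefficients contribute two Gaussian random variables (an interpretation consistent with the 3n and 10n system sizes, though not stated explicitly in the paper), the first- and third-order expansions give 3 and 10 terms per spatial dof:

# Illustration of the PC term count behind the 3n and 10n total system sizes.
from math import comb

def num_pc_terms(d, p):
    # number of chaos terms for d random variables and order p: (d + p)! / (d! p!)
    return comb(d + p, p)

print(num_pc_terms(2, 1))   # 3  -> first order:  total size 3 * n
print(num_pc_terms(2, 3))   # 10 -> third order:  total size 10 * n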


Figure 14. Chaos coefficient u_2.

Figure 15. Chaos coefficient u_3.

Figure 16. Chaos coefficient u_4.

Figure 17. Chaos coefficient u_5.

Figure 18. Lagrange multipliers \lambda_0.

Figure 19. Lagrange multipliers \lambda_1.

Secondly, we fix the problem size in the spatial domain at 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table 2. The results are reported for first-, second- and third-order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method.


Figure 20. Lagrange multipliers \lambda_2.

Figure 21. Lagrange multipliers \lambda_3.

Figure 22. Lagrange multipliers \lambda_4.

Figure 23. Lagrange multipliers \lambda_5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size    PDDM               DP-DDM
                1st   2nd   3rd    1st   2nd   3rd
10051           10    10    10     8     8     8
20303           11    11    11     8     8     8
40811           11    12    12     8     9     9
59935           13    14    14     10    10    10
71386           12    12    12     9     9     9
80172           11    11    12     8     8     8
101851          12    12    12     9     9     9

This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than that in the primal method. However, the rates of convergence of the two methods are almost the same, as indicated in Figs. 24-26.

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains.


Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.

CPUs    PDDM               DP-DDM
        1st   2nd   3rd    1st   2nd   3rd
20      10    11    11     8     8     8
40      12    12    12     9     9     9
60      12    13    13     9     9     9
80      12    12    13     9     9     9
100     12    12    12     9     9     9
120     12    12    12     9     9     9
140     11    11    12     8     8     8
160     12    12    12     8     8     9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).

Subdomains    PDDM               DP-DDM
              1st   2nd   3rd    1st   2nd   3rd
100           10    10    10     8     8     8
200           10    10    11     8     8     8
400           12    13    13     9     9     9
600           11    12    12     8     8     9
800           12    13    13     9     9     9

Table 3 shows the performance of the primal and the dual-primal methods for the first-, second- and third-order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.

Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.

Finally, it is worth mentioning that the primal and the dual-primal methods demonstrate similar performance trends, and this fact points out the similarity (duality) between the two methods through numerical experiments.

Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.

Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.

Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed primal and dual-primal methods are numerically scalable with respect to problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.

Acknowledgments

The authors gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust, and thank Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.


Table 4. Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101851 dofs) and fixed number of subdomains (100).

CoV     PDDM               DP-DDM
        1st   2nd   3rd    1st   2nd   3rd
0.05    10    10    10     8     8     8
0.10    10    10    10     8     8     8
0.15    10    10    10     8     8     8
0.20    10    10    10     8     8     8
0.25    10    10    10     8     8     8
0.30    10    10    11     8     8     8
0.35    10    10    11     8     8     8
0.40    10    11    11     8     8     9
0.45    10    11    12     8     8     9
0.50    10    11    12     8     8     9

References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689–701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin, Heidelberg: Springer) pp 251–268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687–714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250–271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 17: Primal and dual-primal iterative substructuring methods of stochastic

9 Parallel implementation

In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem

91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 4 Parallel Matrix-Vector Product for Primal Method

1 Input (P)

2 Scatter Ps = BsrP

3 Compute vs1 = Ss

crPs

4 Gather V1 =

nssum

s=1

BsrT vs

1

5 Global Solve FccV2 = V1

6 Scatter vs2 = Bs

cV2

7 Compute vs3 = Ss

rcvs2

8 Update Qs = SsrrP

s minus vs3

9 Gather Q =

nssum

s=1

BsrTQs

10 Output (Q)

Multiplication of Schur complement matrix by a vector in step 3 step 7 and step 8 inAlgorithm (4) is computed by solving a corresponding Dirichlet problem as

vsα = Ss

αβvsβ

vsα = (As

αβ minusAsαi[A

sii]

minus1Asiβ)vs

β

This procedure is outlined in the following algorithm

Algorithm 5 Dirichlet Solver Procedure

1 Input (vsβ)

2 Compute vs1 = As

iβvsβ

3 Solve Asiiv

s2 = vs

1

4 Compute vs3 = As

αivs2

5 Compute vs4 = As

αβvsβ

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

16

6 Compute vsα = vs

4 minus vs3

7 Output (vsα)

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equippedwith lumped preconditioner as

Mminus1cc FccV2 = Mminus1

cc V1

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Next the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed bysolving a subdomain level Neumann problem and a global coarse problem The procedure isoutlined in the following algorithm

Algorithm 6 Two-Level Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter Fsr = Ds

rBsrrΓ

3 Local Solve Ssrrv

s1 = Fs

r

4 Compute dsc = Ss

crvs1

5 Gather dc =

nssum

s=1

BscT ds

c

6 Global Solve F lowastccZc = minusdc

7 Scatter Zsc = BsZc

8 Update vs2 = Fs

r + SsrcZ

sc

9 Local Solve SsrrZ

sf = vs

2

10 Gather Z =

nssum

s=1

BsrTDs

rZsf

11 Output (Z)

The local solve in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumannproblem of the form Ss

rrUsr = rs

r which can be solved using the following algorithm

Algorithm 7 Neumann-Solver Procedure

1 Input (rsr)

2 Solve[

Asii As

ir

Asri As

rr

]

X s

Usr

=

0rsr

3 Output (Usr )

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

17

The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as

Mminus1cc F lowast

ccZc = minusMminus1cc dc

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr + Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1 Input (P)

2 Scatter Ps = BsrTP

3 Local Solve Ssrrv

s1 = Ps

4 Compute vs2 = Ss

crvs1

5 Gather V2 =

nssum

s=1

BscT vs

2

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

18

6 Global Solve FccV3 = V2

7 Scatter vs3 = Bs

cV3

8 Compute vs4 = Ss

rcvs3

9 Update vs5 = Ps + vs

4

10 Local Solve SsrrQ

s = vs5

11 Gather Q =

nssum

s=1

BsrQ

s

12 Output (Q)

The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner

Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= DsrB

srT rΓ

3 Compute Zs = Ssrrr

4 Gather Z =

nssum

s=1

BsrD

srZ

s

5 Output (Z)

We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

19

10 Numerical results

For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as

part

partx[cx(x y θ)

partu(x y θ)

partx] +

part

party[cy(x y θ)

partu(x y θ)

party] = f(x y) in Ω

where the forcing term is

f(x y) = 10

For simplicity a homogeneous Dirichlet boundary condition is imposed as

u(x y θ) = 0 on partΩ

The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025

In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5

GkΓminus SUk

Γ2

G0Γ2

6 10minus5

Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh

101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

20

Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS

Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed

Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1

102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

21

Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1

a total problem size of (10 times n)

Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

22

Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5

Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)

Problem size PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9

ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)

Thirdly we fix problem size per subdomain while increase the overall problem size by adding

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

23

Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof

CPUs PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9

Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)

Subdomains PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9

more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain

Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant

Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

24

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order

primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

25

Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)

CoV PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9

References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77

689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing

Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic

and Hydraulic

[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in

Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009

PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)

[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series

in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 18: Primal and dual-primal iterative substructuring methods of stochastic

6 Compute vsα = vs

4 minus vs3

7 Output (vsα)

The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equippedwith lumped preconditioner as

Mminus1cc FccV2 = Mminus1

cc V1

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Next the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed bysolving a subdomain level Neumann problem and a global coarse problem The procedure isoutlined in the following algorithm

Algorithm 6 Two-Level Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter Fsr = Ds

rBsrrΓ

3 Local Solve Ssrrv

s1 = Fs

r

4 Compute dsc = Ss

crvs1

5 Gather dc =

nssum

s=1

BscT ds

c

6 Global Solve F lowastccZc = minusdc

7 Scatter Zsc = BsZc

8 Update vs2 = Fs

r + SsrcZ

sc

9 Local Solve SsrrZ

sf = vs

2

10 Gather Z =

nssum

s=1

BsrTDs

rZsf

11 Output (Z)

The local solve in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumannproblem of the form Ss

rrUsr = rs

r which can be solved using the following algorithm

Algorithm 7 Neumann-Solver Procedure

1 Input (rsr)

2 Solve[

Asii As

ir

Asri As

rr

]

X s

Usr

=

0rsr

3 Output (Usr )

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

17

The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as

Mminus1cc F lowast

ccZc = minusMminus1cc dc

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr + Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1 Input (P)

2 Scatter Ps = BsrTP

3 Local Solve Ssrrv

s1 = Ps

4 Compute vs2 = Ss

crvs1

5 Gather V2 =

nssum

s=1

BscT vs

2

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

18

6 Global Solve FccV3 = V2

7 Scatter vs3 = Bs

cV3

8 Compute vs4 = Ss

rcvs3

9 Update vs5 = Ps + vs

4

10 Local Solve SsrrQ

s = vs5

11 Gather Q =

nssum

s=1

BsrQ

s

12 Output (Q)

The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner

Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= DsrB

srT rΓ

3 Compute Zs = Ssrrr

4 Gather Z =

nssum

s=1

BsrD

srZ

s

5 Output (Z)

We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

19

10 Numerical results

For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as

part

partx[cx(x y θ)

partu(x y θ)

partx] +

part

party[cy(x y θ)

partu(x y θ)

party] = f(x y) in Ω

where the forcing term is

f(x y) = 10

For simplicity a homogeneous Dirichlet boundary condition is imposed as

u(x y θ) = 0 on partΩ

The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025

In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5

GkΓminus SUk

Γ2

G0Γ2

6 10minus5

Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh

101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

20

Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS

Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed

Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1

102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

21

Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1

a total problem size of (10 times n)

Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

22

Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5

Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)

Problem size PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9

ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)

Thirdly we fix problem size per subdomain while increase the overall problem size by adding

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

23

Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof

CPUs PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9

Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)

Subdomains PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9

more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain

Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant

Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

24

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order

primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

25

Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)

CoV PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9

References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77

689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing

Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic

and Hydraulic

[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in

Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009

PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)

[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series

in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

26

Page 19: Primal and dual-primal iterative substructuring methods of stochastic

The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as

Mminus1cc F lowast

ccZc = minusMminus1cc dc

where

Mminus1cc =

nssum

s=1

BscTAs

ccBsc

Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart

Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem

92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)

For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as

Qj = (Frr + Frc[Fcc]minus1Fcr)Pj

can be computed using the following algorithm

Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method

1 Input (P)

2 Scatter Ps = BsrTP

3 Local Solve Ssrrv

s1 = Ps

4 Compute vs2 = Ss

crvs1

5 Gather V2 =

nssum

s=1

BscT vs

2

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

18

6 Global Solve FccV3 = V2

7 Scatter vs3 = Bs

cV3

8 Compute vs4 = Ss

rcvs3

9 Update vs5 = Ps + vs

4

10 Local Solve SsrrQ

s = vs5

11 Gather Q =

nssum

s=1

BsrQ

s

12 Output (Q)

The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner

Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm

Algorithm 9 Dirichlet Preconditioner Effect Procedure

1 Input (rΓ )

2 Scatter rsΓ

= DsrB

srT rΓ

3 Compute Zs = Ssrrr

4 Gather Z =

nssum

s=1

BsrD

srZ

s

5 Output (Z)

We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart

Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

19

10 Numerical results

For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as

part

partx[cx(x y θ)

partu(x y θ)

partx] +

part

party[cy(x y θ)

partu(x y θ)

party] = f(x y) in Ω

where the forcing term is

f(x y) = 10

For simplicity a homogeneous Dirichlet boundary condition is imposed as

u(x y θ) = 0 on partΩ

The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025

In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5

GkΓminus SUk

Γ2

G0Γ2

6 10minus5

Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh

101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

20

Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS

Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed

Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1

102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

21

Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3

Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5

Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1

a total problem size of (10 times n)

Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

22

Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3

Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5

Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)

Problem size PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9

ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)

Thirdly we fix problem size per subdomain while increase the overall problem size by adding

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

23

Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof

CPUs PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9

Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)

Subdomains PDDM DP-DDM

1st 2nd 3rd 1st 2nd 3rd

100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9

more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain

Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant

Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed

High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001

24

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order

0 2 4 6 8 10 12 1410

minus5

10minus4

10minus3

10minus2

10minus1

100

101

Iteration number

Rel

ativ

e re

sidu

al

PminusDDMDPminusDDM

Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order

primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors

Acknowledgments

The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors also thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.


Table 4. Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101851 dof) and fixed number of subdomains (100).

CoV     PDDM              DP-DDM
        1st   2nd   3rd   1st   2nd   3rd
0.05    10    10    10    8     8     8
0.10    10    10    10    8     8     8
0.15    10    10    10    8     8     8
0.20    10    10    10    8     8     8
0.25    10    10    10    8     8     8
0.30    10    10    11    8     8     8
0.35    10    10    11    8     8     8
0.40    10    11    11    8     8     9
0.45    10    11    12    8     8     9
0.50    10    11    12    8     8     9

References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin, Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250-271


