
CHAPTER 4

ITERATIVE METHODS, PRECONDITIONERS, AND MULTIGRID

As with most numerical methods, the final step in the finite element solution of a boundary value problem is the solution of a system of linear (or nonlinear, if the original system is nonlinear) equations, with the degrees of freedom (DOF) in the finite element approximation as unknowns. For problems of relevance to practical engineering applications, the number of DOF and, hence, the dimension of the resulting system is too large for direct matrix inversion techniques to be of practical use. Therefore, iterative methods must be employed for its solution.

It is the purpose of this chapter to review the tools that the computational mathematics community has put in place to assist us in this objective. The literature on iterative methods for the numerical solution of the types of systems of equations resulting from approximations to BVPs using finite element methods is very rich. Classic texts such as [1] and [2] provide a comprehensive presentation of the state of the art in iterative methods for sparse linear systems, accompanied by a thorough list of references.

The presentation in this chapter will serve as a brief overview of some of the methods that we have found useful and employed for the classes of electromagnetic BVPs considered in this book. We begin the discussion with a review of key definitions and results from linear matrix theory. This is followed by a review of some very popular iterative solution processes, which have been found to be very effective in practical applications. Our presentation emphasizes the importance of preconditioning the matrix for improving the effectiveness and speed of convergence of the iterative process. This discussion leads naturally to the concept of multigrid as a preconditioning process. Even though the specifics of the implementation of multigrid in the context of the iterative solution of electromagnetic boundary value problems are presented in later chapters, this chapter concludes with a discussion of the basic ideas behind multigrid and its algorithmic implementation.

4.1 DEFINITIONS

4.1.1 Vector space, inner product, and norm

With $\mathbb{R}$ denoting the set of real numbers, the set of all $N$-tuples of real numbers is a vector space designated $\mathbb{R}^N$; the set of all $M \times N$ real matrices is a vector space denoted $\mathbb{R}^{M \times N}$. Similarly, with $\mathbb{C}$ denoting the set of complex numbers, the set of all $N$-tuples of complex numbers is a vector space designated $\mathbb{C}^N$; the set of all $M \times N$ complex matrices is a vector space denoted $\mathbb{C}^{M \times N}$.

The inner product is an operation on two vectors that produces a scalar. For the vector space $\mathbb{R}^N$, the inner product is a mapping $\mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}$. The standard Euclidean inner product is defined as

$$(x, y) = x^T y. \quad (4.1)$$

For the vector space $\mathbb{C}^N$, the inner product is a mapping $\mathbb{C}^N \times \mathbb{C}^N \to \mathbb{C}$. Motivated by the desire for the norm associated with a complex vector to be real, the inner product for complex vectors is usually taken to be the standard Hermitian inner product, defined as

$$(x, y) = x^H y. \quad (4.2)$$

The aforementioned inner products are used to define the following vector norm, known as the 2-norm:

$$\|x\|_2 = \sqrt{(x, x)}. \quad (4.3)$$

For a complex vector, the norm induced by the Hermitian inner product is a real number, while the one induced by the Euclidean inner product is, in general, complex.

Another norm often used in practice is the infinity norm, defined as

$$\|x\|_\infty = \max_{1 \le i \le N} |x_i|. \quad (4.4)$$

Finally, we will also make use of the $A$-norm, denoted $\|x\|_A$ and defined for a vector $x \in \mathbb{C}^N$ as

$$\|x\|_A = \sqrt{(x, Ax)}, \quad (4.5)$$

where $A \in \mathbb{C}^{N \times N}$.

Vector norms are used extensively for estimating the error in the numerical solution of matrix equations. For example, let $\hat{x}$ be an approximation to the vector $x$. Then the absolute error in $\hat{x}$ is calculated as

$$\epsilon_{\mathrm{abs}} = \|\hat{x} - x\|, \quad (4.6)$$

where either the 2-norm or the infinity norm is used. Similarly, the relative error in $\hat{x}$ is calculated as follows:

$$\epsilon_{\mathrm{rel}} = \frac{\|\hat{x} - x\|}{\|x\|}. \quad (4.7)$$
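As a quick illustration of the definitions above, the following minimal Python sketch (the test matrix and vectors are arbitrary, invented data) evaluates the 2-norm (4.3), the infinity norm (4.4), the $A$-norm (4.5), and the error measures (4.6) and (4.7) with NumPy:

```python
import numpy as np

# Arbitrary test data: a Hermitian positive definite A and two vectors.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B.conj().T @ B + 4 * np.eye(4)          # Hermitian positive definite
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
x_hat = x + 1e-3 * rng.standard_normal(4)   # a perturbed "approximation"

norm2 = np.sqrt(np.vdot(x, x).real)         # 2-norm from Hermitian inner product (4.3)
norm_inf = np.max(np.abs(x))                # infinity norm (4.4)
norm_A = np.sqrt(np.vdot(x, A @ x).real)    # A-norm (4.5); real since A is Hermitian

eps_abs = np.linalg.norm(x_hat - x)         # absolute error (4.6)
eps_rel = eps_abs / np.linalg.norm(x)       # relative error (4.7)
print(norm2, norm_inf, norm_A, eps_abs, eps_rel)
```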

Matrix norms are also used extensively in iterative algorithms for the numerical manipulation of matrices. Of particular relevance to the topics discussed in this book are the $p$-norms, which are defined through the following expression:

$$\|A\|_p = \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p}. \quad (4.8)$$

A useful property of the $p$-norms is the following:

$$\|Ax\|_p \le \|A\|_p\, \|x\|_p. \quad (4.9)$$

Examples of matrix $p$-norms are the 2-norm and the infinity norm mentioned above. In our subsequent discussion we will make use of the aforementioned vector and matrix norms without necessarily making specific reference to which norm is being used. Unless indicated otherwise, the 2-norm will be assumed to be the norm of choice in the development of all algorithms presented in this and later chapters.

4.1.2 Matrix eigenvalues and eigenvectors

The eigenvectors and eigenvalues of a matrix hold center stage in the development and assessment of iterative solution processes [3]. They are obtained from the solution of the following matrix equation:

$$A v_i = \lambda_i v_i, \quad (i = 1, 2, \dots, N), \quad (4.10)$$

where $A \in \mathbb{C}^{N \times N}$, $\lambda_i \in \mathbb{C}$ is an eigenvalue of $A$, and the vector $v_i \in \mathbb{C}^N$ is called the right eigenvector of $A$ associated with $\lambda_i$. The set of all eigenvalues of $A$ is referred to as the spectrum of $A$. An alternative statement of (4.10) is obtained by defining the matrix $V$ whose columns are the right eigenvectors of $A$, $V = [v_1, v_2, \dots, v_N]$:

$$A V = V \Lambda, \quad (4.11)$$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_N)$. The above constitutes the eigendecomposition statement of $A$.

The eigenvalues of $A$ are the roots of its characteristic polynomial, $p(s) = \det(sI - A)$, where $I$ denotes the identity matrix and $s \in \mathbb{C}$. Hence, it is

$$p(s) = \prod_{i=1}^{N}(s - \lambda_i) = s^N + a_1 s^{N-1} + a_2 s^{N-2} + \cdots + a_{N-1} s + a_N. \quad (4.12)$$

It follows immediately from the above equations that

$$\det(A) = \prod_{i=1}^{N} \lambda_i. \quad (4.13)$$

Furthermore, the following result may be deduced from (4.12) for the trace of $A$, defined as $\mathrm{tr}(A) = \sum_{i=1}^{N} A_{ii}$:

$$\mathrm{tr}(A) = \sum_{i=1}^{N} \lambda_i. \quad (4.14)$$

A matrix $A$ is symmetric if it remains unaltered under transposition; hence, $A^T = A$. If $A \in \mathbb{C}^{M \times N}$, the complex conjugate of its transpose is often used. In the following, we will use the term adjoint to refer to the Hermitian conjugate of a matrix, and we will use the shorthand notation $A^H$ for its symbolic representation; hence, $A^H = (A^*)^T$. A complex matrix is called Hermitian if it is equal to its adjoint, $A^H = A$. It follows immediately that the diagonal elements of a Hermitian matrix are real numbers.

The left eigenvector $w_i$ is defined as follows:

$$w_i^H A = \lambda_i w_i^H, \quad (i = 1, 2, \dots, N). \quad (4.15)$$

When written in matrix eigendecomposition form, the above equation becomes

$$W^H A = \Lambda W^H \quad \text{or} \quad A^H W = W \Lambda^*, \quad (4.16)$$

where $W = [w_1, w_2, \dots, w_N]$. Throughout the book, unless otherwise stated, "eigenvector" will imply "right eigenvector."

Next, we make use of the inner product to define a class of complex scalars that are most useful in practice, namely, the Rayleigh quotients, defined for any nonzero $x \in \mathbb{C}^N$ as follows:

$$r(x) = \frac{(x, Ax)}{(x, x)} = \frac{x^H A x}{x^H x}. \quad (4.17)$$

It follows immediately from (4.10) that an eigenvalue, $\lambda_i$, of a matrix $A$ may be written as the following Rayleigh quotient:

$$\lambda_i = \frac{v_i^H A v_i}{v_i^H v_i}, \quad (4.18)$$

where $v_i$ is the corresponding eigenvector.
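To make (4.17) and (4.18) concrete, the short sketch below, using an arbitrary symmetric test matrix, verifies numerically that the Rayleigh quotient of an eigenvector reproduces the corresponding eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B + B.T                                  # symmetric (hence Hermitian) test matrix

lam, V = np.linalg.eigh(A)                   # eigenvalues and orthonormal eigenvectors
v = V[:, 2]                                  # pick one eigenvector

# Rayleigh quotient (4.17)-(4.18): recovers the eigenvalue lam[2]
r = np.vdot(v, A @ v) / np.vdot(v, v)
print(r.real, lam[2])                        # the two values agree to round-off
```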

4.1.3 Properties of Hermitian matrices

If $A$ is Hermitian, it is straightforward to show that its eigenvalues are real and its eigenvectors form an orthonormal basis. The fact that the eigenvalues are real follows immediately from (4.10), for eigenvectors normalized to unit length, through the following sequence of operations:

$$\lambda_i = v_i^H A v_i = v_i^H A^H v_i = \left(v_i^H A v_i\right)^* = \lambda_i^*. \quad (4.19)$$

With all eigenvectors normalized such that $\|v_i\|_2 = 1$, $i = 1, 2, \dots, N$, let $v_i$ and $v_j$ be two eigenvectors corresponding to distinct eigenvalues $\lambda_i$ and $\lambda_j$, $\lambda_i \ne \lambda_j$. Then, making use of the fact that $A$ is Hermitian and its eigenvalues real, we have from (4.10)

$$\lambda_j v_i^H v_j = v_i^H A v_j = \left(A v_i\right)^H v_j = \lambda_i v_i^H v_j \;\Rightarrow\; v_i^H v_j = 0. \quad (4.20)$$

The above result may be written in matrix form as

$$V^H V = I. \quad (4.21)$$

A matrix satisfying (4.21) is called unitary. From (4.21) it follows immediately that the inverse of a unitary matrix is equal to its conjugate transpose.

Returning to the case of left eigenvectors, it follows immediately from (4.11) and (4.16) that, if $A$ is Hermitian, then $W = V$ and $\Lambda^* = \Lambda$.


4.1.4 Positive definite matrices

A matrix $A \in \mathbb{C}^{N \times N}$ that satisfies $(x, Ax) > 0$ for all $x \in \mathbb{C}^N$, $x \ne 0$, is called positive definite. From the discussion in the previous subsection it follows immediately that the eigenvalues of a Hermitian positive definite matrix are positive.

A matrix that satisfies $(x, Ax) \ge 0$ for all $x \in \mathbb{C}^N$ is called positive semi-definite. If $A$ is also Hermitian, it follows immediately that its eigenvalues are non-negative.

An example of a positive semi-definite matrix is the Hermitian matrix $A^H A$. Given a matrix $A \in \mathbb{C}^{N \times N}$, the square roots of the eigenvalues of $A^H A$ are called the singular values of $A$ and are denoted by $\sigma_i$.

4.1.5 Independence, invariant subspaces, and similarity transformations

A set of vectors $\{y_1, y_2, \dots, y_m\}$ in $\mathbb{C}^N$ is said to be linearly independent if the statement $\sum_{i=1}^{m} a_i y_i = 0$ with $a_i \in \mathbb{C}$ implies $a_i = 0$, $i = 1, 2, \dots, m$. It follows immediately from (4.21) that the eigenvectors of a Hermitian matrix are linearly independent.

A subspace of $\mathbb{C}^N$ is a subset that is also a vector space. Given a set of vectors $\{y_1, y_2, \dots, y_m\}$ in $\mathbb{C}^N$, the set of all linear combinations of these vectors forms a subspace $S$ that will be called the span of $\{y_1, y_2, \dots, y_m\}$. We write

$$S = \mathrm{span}\{y_1, y_2, \dots, y_m\} = \left\{\sum_{i=1}^{m} a_i y_i : a_i \in \mathbb{C}\right\}. \quad (4.22)$$

Of particular interest is the case where the vectors $\{y_1, y_2, \dots, y_m\}$ are linearly independent. In this case, they constitute a basis for $S$, and their number, $m$, is the dimension of $S$.

A subspace $S \subset \mathbb{C}^N$ such that

$$y \in S \;\Rightarrow\; A y \in S \quad (4.23)$$

is said to be invariant with respect to multiplication by $A$. For example, the subspace spanned by a set of eigenvectors of $A$ is invariant.

Next, let us consider the matrices $A \in \mathbb{C}^{N \times N}$, $P \in \mathbb{C}^{N \times n}$, and $B \in \mathbb{C}^{n \times n}$, satisfying the relation $AP = PB$. Let $y$ be an eigenvector of $B$ with corresponding eigenvalue $\lambda$, that is, $By = \lambda y$. In view of the above relationship, it is $APy = PBy = \lambda Py$. This implies that the eigenvalues of $B$ are also eigenvalues of $A$. For the special case where $P$ is a square, non-singular matrix (hence, $A$, $B$, and $P$ are all of the same dimension), the relation $AP = PB$ implies that $A$ and $B$ have the same eigenvalues. The matrices $A$ and $B$ are called similar, and $P$ is referred to as the similarity transformation between $A$ and $B$ according to the relationship

$$B = P^{-1} A P. \quad (4.24)$$

Similarity transformations are particularly useful for the reduction of a given matrix $A$ to any one of several canonical forms. One of the most important canonical forms of matrices is the Jordan form. The pertinent result for the Jordan decomposition of a matrix $A$ is the following. Given a matrix $A \in \mathbb{C}^{N \times N}$, there exists a nonsingular matrix $X \in \mathbb{C}^{N \times N}$ such that $X^{-1} A X = \mathrm{diag}(J_1, J_2, \dots, J_t)$, where the Jordan block

$$J_i = \begin{bmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{bmatrix} \quad (4.25)$$

is an $m_i \times m_i$ matrix and $\sum_{i=1}^{t} m_i = N$.

Another important canonical form, which will be found useful in dealing with the solution of matrix eigenvalue problems, is the Schur canonical form [3]. If $A \in \mathbb{C}^{N \times N}$, then there exists a unitary matrix $Q \in \mathbb{C}^{N \times N}$ such that

$$Q^H A Q = R, \quad (4.26)$$

where $R$ is upper triangular. The column vectors forming the matrix $Q$ are referred to as Schur vectors. If $A$ is Hermitian, the upper triangular matrix $R$ is actually the diagonal eigenvalue matrix $\Lambda$, and $W = V = Q$. Hence, for Hermitian matrices, the Schur decomposition and the eigendecomposition are the same.

4.2 ITERATIVE METHODS FOR THE SOLUTION OF LARGE MATRICES

The finite element approximation of a linear boundary value problem results in the complex matrix equation

$$A x = f, \quad (4.27)$$

where $A \in \mathbb{C}^{N \times N}$, $x \in \mathbb{C}^N$ is the unknown vector whose elements are the DOF in the approximation, and $f \in \mathbb{C}^N$ represents the excitation vector. When $N$ is sufficiently small, direct matrix decomposition methods, such as Gaussian elimination and Cholesky factorization, can be employed [3]. However, what is of concern here is the case where $N$ is large enough for such direct methods to be computationally prohibitive.

Because of the compact support of the expansion functions used in the development of finite element approximations of boundary value problems, the resulting matrices are very sparse. More specifically, the number of nonzero entries in each row of a finite element matrix is on the order of tens. Consequently, significant savings in memory can be achieved by storing only the nonzero entries of $A$, along with information about the row and column indices of each nonzero entry. Ways in which such compact storage is implemented are discussed in [1].
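As a minimal illustration of such compact storage, the fragment below builds a small sparse matrix in compressed sparse row (CSR) form with SciPy; the tridiagonal test matrix is an invented stand-in for a finite element matrix, and only the nonzero values plus the row/column bookkeeping arrays are stored:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small tridiagonal example standing in for a finite element matrix.
row = np.array([0, 0, 1, 1, 1, 2, 2])
col = np.array([0, 1, 0, 1, 2, 1, 2])
val = np.array([4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0])
A = csr_matrix((val, (row, col)), shape=(3, 3))

# CSR keeps three arrays: nonzero values, column indices, and row pointers.
print(A.data)       # [ 4. -1. -1.  4. -1. -1.  4.]
print(A.indices)    # column index of each stored value
print(A.indptr)     # index into data/indices where each row starts
y = A @ np.ones(3)  # sparse matrix-vector product touches only the nonzeros
```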

Another important finite element matrix operation that is often required prior to solution is re-ordering. Re-ordering of a matrix is prompted by the desire to contain the number of fill-ins during its Gaussian elimination (or its incomplete LU decomposition, which is often required in the iterative solution process). Numerous heuristic methods, such as the minimum degree algorithm, are available for identifying and implementing effective re-orderings of a given matrix. A good overview is given in [1].

When the number of unknowns is large enough for the direct solution of the matrix to be impractical, an iterative process is utilized for obtaining an approximation to the solution. There are two general classes of iterative matrix solution methods: stationary methods and non-stationary methods. Stationary methods include the Jacobi, Gauss-Seidel, and successive over-relaxation (SOR) processes. The convergence of these methods is not guaranteed for all types of matrices. However, they are known to be very effective when applied to matrices resulting from the finite element approximation of elliptic partial differential equations. Thus they are suitable for the solution of electrostatic and magnetostatic boundary value problems.

4.2.1 Stationary methods

To present the basic principles of stationary methods we will examine, briefly, some of the members of this class of methods. We begin by splitting the matrix A as follows:

$$A = D - L - U, \quad (4.28)$$

where $D$ is a diagonal matrix whose elements are the diagonal entries of $A$, $-L$ is the strict lower part of $A$, and $-U$ is its strict upper part. The Jacobi iteration determines the $(k+1)$-th iterate of the solution, $x_{k+1}$, from the previous iterate, $x_k$, through the diagonal sweep

$$x_{k+1} = D^{-1}\left[f + (L + U)\, x_k\right]. \quad (4.29)$$

Its implementation requires two separate arrays for storing $x_{k+1}$ and $x_k$.

A more compact form of the Jacobi iteration results by defining the Jacobi iteration matrix, $R_J$, as follows:

$$R_J = D^{-1}(L + U). \quad (4.30)$$

A simple matrix manipulation of the above equation, making use of (4.28), yields the following alternative form for $R_J$:

$$R_J = I - D^{-1} A, \quad (4.31)$$

where $I$ is the identity matrix. Using these definitions, (4.29) is cast in the following form:

$$x_{k+1} = R_J\, x_k + D^{-1} f. \quad (4.32)$$

The weighted Jacobi method is obtained from (4.32) through a slight modification of the iteration process as follows:

$$x_{k+1} = \left[(1 - \omega) I + \omega R_J\right] x_k + \omega D^{-1} f, \quad (4.33)$$

where the weighting factor $\omega$ is introduced with the objective of improving the convergence rate of the iteration. The weighted Jacobi iteration matrix

$$R_{J_\omega} = (1 - \omega) I + \omega R_J \quad (4.34)$$

is introduced to cast (4.33) in the more compact form

$$x_{k+1} = R_{J_\omega}\, x_k + \omega D^{-1} f. \quad (4.35)$$

It is useful to note that, in view of (4.31), the weighted Jacobi iteration matrix may be cast in the following form:

$$R_{J_\omega} = I - \omega D^{-1} A. \quad (4.36)$$

The Gauss-Seidel iteration determines $x_{k+1}$ from $x_k$ through either forward or backward substitution. For the forward Gauss-Seidel we have

$$(D - L)\, x_{k+1} = U x_k + f \;\Rightarrow\; x_{k+1} = (D - L)^{-1}\left[f + U x_k\right]. \quad (4.37)$$


On the other hand, the backward Gauss-Seidel performs the backward substitution

$$(D - U)\, x_{k+1} = L x_k + f \;\Rightarrow\; x_{k+1} = (D - U)^{-1}\left[f + L x_k\right]. \quad (4.38)$$

Clearly, a single array suffices for storing the updated iterate $x_{k+1}$ and the previous one, $x_k$. In the forward substitution the entries in the solution array are updated in ascending order; in the backward substitution they are updated in descending order.
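A minimal sketch of these stationary sweeps is given below; dense NumPy arrays and the function names weighted_jacobi and gauss_seidel are illustrative choices (a production code would operate on sparse storage):

```python
import numpy as np

def weighted_jacobi(A, f, x, omega=0.5, sweeps=1):
    """Weighted Jacobi (4.33), in the equivalent form x <- x + omega*D^{-1}(f - A x)."""
    d = np.diag(A)
    for _ in range(sweeps):
        x = x + omega * (f - A @ x) / d
    return x

def gauss_seidel(A, f, x, sweeps=1, forward=True):
    """Forward (4.37) or backward (4.38) Gauss-Seidel; updates x in place."""
    n = len(f)
    order = range(n) if forward else range(n - 1, -1, -1)
    for _ in range(sweeps):
        for i in order:
            # Solve row i for x[i] using the latest available values.
            x[i] = (f[i] - A[i, :] @ x + A[i, i] * x[i]) / A[i, i]
    return x

# Example: a small SPD test system.
A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
f = np.array([1.0, 2.0, 3.0])
x = gauss_seidel(A, f, np.zeros(3), sweeps=20)
```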

4.2.2 Convergence of iterative methods

From the above discussion it is clear that the development of an iterative method begins with casting the original matrix equation in the form

$$x = B x + g. \quad (4.39)$$

Subsequently, an approximation, $\hat{x}$, to $x$ is sought through the iteration

$$\hat{x}_{k+1} = B \hat{x}_k + g, \quad k = 0, 1, 2, \dots, \quad (4.40)$$

where the subscript $k$ denotes the $k$-th iterate of the approximation and $\hat{x}_0$ is an initial guess for $x$.

Let $e_k = x - \hat{x}_k$ denote the error in the approximation of $x$ at iteration $k$. Then it follows from (4.39) and (4.40) that the error after $k$ iterations satisfies the following equation:

$$e_k = B^k e_0. \quad (4.41)$$

A bound for the error after $k$ iterations can be obtained making use of (4.9):

$$\|e_k\| \le \|B\|^k \|e_0\|. \quad (4.42)$$

Clearly, convergence of the iteration process, manifested through reduction of the norm of the error, is achieved provided that $\|B\| < 1$. This result allows the definition of an asymptotic convergence factor for an iterative method in terms of the spectral radius, $\rho(B)$, of $B$, which is defined in terms of the eigenvalues of $B$ by

$$\rho(B) = \max_{1 \le i \le N} |\lambda_i|. \quad (4.43)$$

It can be shown [1] that, in any norm, it is

$$\rho(B) = \lim_{k \to \infty} \|B^k\|^{1/k}. \quad (4.44)$$

Thus, in view of (4.42), we conclude that the iterative method with iteration matrix $B$ will converge for any initial guess if and only if $\rho(B) < 1$.

The spectral radius is also useful for estimating the number of iterations, $k$, needed to reduce the error by a factor of $10^{-q}$. From (4.42), in view of (4.44), it is

$$k \ge \frac{q}{-\log_{10} \rho(B)}. \quad (4.45)$$

For the above results to be useful in practice, one must be able to either calculate or estimate the spectral radius of the matrix. For a very large matrix, calculation of its eigenvalues is computationally expensive. Thus an estimate of the largest eigenvalue is sought instead.


For this purpose, the following useful result, known as Gershgorin's theorem, can be used. Gershgorin's theorem states that any eigenvalue of a matrix $B \in \mathbb{C}^{N \times N}$ is located within one of the discs in the complex plane with centers $B_{ii}$ and radii $r_i$ given by

$$r_i = \sum_{j=1,\, j \ne i}^{N} |B_{ij}|. \quad (4.46)$$
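As a minimal sketch of such an estimate, the fragment below bounds the spectral radius of a weighted Jacobi iteration matrix using Gershgorin's theorem; the test matrix is arbitrary:

```python
import numpy as np

def gershgorin_bound(B):
    """Upper bound on the spectral radius: max over i of |B_ii| + r_i, with r_i from (4.46)."""
    radii = np.sum(np.abs(B), axis=1) - np.abs(np.diag(B))
    return np.max(np.abs(np.diag(B)) + radii)

# Iteration matrix of weighted Jacobi (4.36) for a small SPD test matrix.
A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
omega = 0.5
R = np.eye(3) - omega * A / np.diag(A)[:, None]

print(gershgorin_bound(R))                    # cheap bound on rho(R)
print(np.max(np.abs(np.linalg.eigvals(R))))   # exact rho(R), for comparison
```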

4.2.3 Non-stationary methods

Non-stationary methods are based on so-called projection techniques. Let $V = [v_1, \dots, v_m]$ denote an $N \times m$ matrix whose column vectors form a basis for the subspace $\mathcal{K}$, and $W = [w_1, \dots, w_m]$ an $N \times m$ matrix whose column vectors form a basis for the subspace $\mathcal{L}$. The objective of a projection method is, given an initial guess $x_0$, to find an approximate solution, $\hat{x}$, such that $\hat{x} \in x_0 + \mathcal{K}$ and $(f - A\hat{x}) \perp \mathcal{L}$. Therefore, $\mathcal{K}$ is referred to as the expansion space and $\mathcal{L}$ as the projection space. The approximate solution is written as

$$\hat{x} = x_0 + V y. \quad (4.47)$$

Thus the orthogonality condition $(f - A\hat{x}) \perp \mathcal{L}$ leads to the following equation for $y$:

$$W^T (f - A x_0 - A V y) = 0 \;\Rightarrow\; W^T A V y = W^T r_0, \quad (4.48)$$

where $r_0 = f - A x_0$ is the initial residual. In view of the above, the approximate solution is written in the form

$$\hat{x} = x_0 + V \left(W^T A V\right)^{-1} W^T r_0. \quad (4.49)$$

It must be noted that the matrix $W^T A V$, the inversion of which leads to the calculation of the approximate solution, is a square matrix of dimension $m$. The key objective of a projection method is to construct subspaces $\mathcal{K}$ and $\mathcal{L}$ of the smallest dimension possible in order to expedite the computation of the approximate solution.

One important class of projection methods is the class of Krylov subspace methods. In this class of methods the expansion space $\mathcal{K}$ is constructed as follows:

$$\mathcal{K}_m(A, r_0) = \mathrm{span}\left\{r_0, A r_0, A^2 r_0, \dots, A^{m-1} r_0\right\}. \quad (4.50)$$

This subspace will be referred to as the Krylov subspace. Two popular Krylov subspace-based iterative methods, namely, the generalized minimum residual method and the conjugate gradient method, are presented next.

4.3 GENERALIZED MINIMUM RESIDUAL METHOD

The generalized minimum residual method (GMRES) [1] attempts to find a vector $\hat{x}$ such that the norm of the residual $r = f - A\hat{x}$ is minimized. More specifically, using the notation and definitions of the previous section, let $x_0$ be an initial guess and $\hat{x} \in x_0 + \mathcal{K}$. Then it is

$$\hat{x} = x_0 + V y \quad (4.51)$$

and the norm of the residual is written as

$$\|r\| = \|f - A\hat{x}\| = \|r_0 - A V y\|, \quad (4.52)$$

where $r_0 = f - A x_0$. From the definition of the norm, it is straightforward to deduce that the minimization of the residual $\|r\|$ is equivalent to solving for $y$ such that the residual vector $r$ is perpendicular to $AV$:

$$\min_y \|r\| \;\Leftrightarrow\; (r_0 - A V y) \perp A V. \quad (4.53)$$

A comparison of (4.53) with (4.48) suggests that GMRES can be considered as a projection method in which the projection space $W$ is related to the expansion space $V$ as $W = AV$, with $V$ taken to be the $m$-th Krylov subspace of (4.50).

From this introductory discussion it becomes apparent that the GMRES method may be broken down into two steps. The first step involves the construction of the Krylov subspace. The second step deals with the least-squares minimization process.

The construction of the Krylov subspace $\mathcal{K}_m$ entails the generation of a set of orthonormal vectors $v_1, v_2, \dots, v_m$ for $\mathcal{K}_m$. These vectors are often referred to as Arnoldi vectors. The generation of the Arnoldi vectors from (4.50) is facilitated through the Gram-Schmidt formula

$$\tilde{v}_{i+1} = A v_i - h_{1,i} v_1 - h_{2,i} v_2 - \cdots - h_{i,i} v_i, \quad (4.54)$$

where the coefficients $h_{p,q}$ are chosen such that $\tilde{v}_{i+1}$ is orthogonal to the earlier Arnoldi vectors and, after normalization, $v_{i+1} = \tilde{v}_{i+1}/h_{i+1,i}$ has unit norm. It is straightforward to show that this is the case if

$$h_{j,i} = (v_j, A v_i), \quad j = 1, 2, \dots, i, \qquad h_{i+1,i} = \|\tilde{v}_{i+1}\|. \quad (4.55)$$

In matrix form, the above process after $m$ steps yields the following result:

$$A V_m = V_m H_{m,m} + h_{m+1,m}\, v_{m+1} e_m^T = V_{m+1} H_{m+1,m}, \quad (4.56)$$

where $V_m = [v_1\ v_2\ \cdots\ v_m]$ and $H_{m,m}$ is an $m \times m$ upper Hessenberg matrix. (A matrix $H$ is upper Hessenberg if $H(i,j) = 0$ for $i > j + 1$.) The matrix $H_{m+1,m}$ is simply $H_{m,m}$ augmented by one extra row, $h_{m+1,m}\, e_m^T$. The vector $e_m$ is a vector of zeros except for its $m$-th entry, which has a value of one.

The following is a pseudo-code description of the algorithm that can be used for the generation of the Arnoldi vectors $v_m$.

Algorithm (4.1): Generation of Arnoldi Vectors.
1  $v_1 \leftarrow r_0 / \|r_0\|$;
2  $i \leftarrow 1$;
3  for $i = 1, 2, \dots, m$
3.a  $\tilde{v}_{i+1} \leftarrow A v_i$;
3.b  for $j = 1, \dots, i$: $h_{j,i} = (v_j, \tilde{v}_{i+1})$; $\tilde{v}_{i+1} \leftarrow \tilde{v}_{i+1} - h_{j,i} v_j$;
3.c  $h_{i+1,i} = \|\tilde{v}_{i+1}\|$; $v_{i+1} = \tilde{v}_{i+1} / h_{i+1,i}$.
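A minimal NumPy sketch of Algorithm (4.1) is given below, using modified Gram-Schmidt orthogonalization as in steps 3.a-3.c; the sanity check at the end verifies relation (4.56) for an arbitrary test matrix:

```python
import numpy as np

def arnoldi(A, r0, m):
    """Generate m+1 Arnoldi vectors and the (m+1) x m Hessenberg matrix of (4.56)."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for i in range(m):
        w = A @ V[:, i]                    # step 3.a
        for j in range(i + 1):             # step 3.b: orthogonalize against v_1..v_i
            H[j, i] = V[:, j] @ w
            w = w - H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)    # step 3.c
        V[:, i + 1] = w / H[i + 1, i]
    return V, H

# Sanity check of (4.56): A V_m = V_{m+1} H_{m+1,m}.
A = np.random.default_rng(2).standard_normal((8, 8))
V, H = arnoldi(A, np.ones(8), 4)
print(np.allclose(A @ V[:, :4], V @ H))    # True
```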

Once the Arnoldi vectors have been generated, use of the relationship (4.56) in the expression (4.52) for the norm of the residual yields the following expression for the functional to be minimized:

$$\|r\| = \|r_0 - A V_m y\| = \|r_0 - V_{m+1} H_{m+1,m}\, y\| = \big\|\, \|r_0\| e_1 - H_{m+1,m}\, y \,\big\|, \quad (4.57)$$

where use was made of the fact that $V_{m+1}^T r_0 = \|r_0\| e_1$, due to Step 1 in Algorithm (4.1).

The second step of GMRES concerns the determination of the vector $y$ that minimizes $\|r\|$. The process for the solution of this least-squares minimization problem begins with the transformation of the upper Hessenberg matrix $H_{m+1,m}$ into an upper triangular matrix. This can be achieved through a series of left multiplications of the matrix by a properly chosen series of rotation matrices $R_i$ ($i = 1, 2, \dots, m$). To reveal the structure of these rotation matrices, consider the form of the transformed matrix after multiplication by the $(i-1)$-th rotation matrix:

$$R_{i-1} \cdots R_1 H_{m+1,m} = \begin{bmatrix} \tilde{h}_{1,1} & \tilde{h}_{1,2} & \cdots & \cdots & \tilde{h}_{1,m} \\ & \ddots & & & \vdots \\ & & \tilde{h}_{i,i} & \cdots & \tilde{h}_{i,m} \\ & & h_{i+1,i} & \cdots & h_{i+1,m} \\ & & & \ddots & \vdots \\ & & & & h_{m+1,m} \end{bmatrix}, \quad (4.58)$$

where the sub-diagonal elements $h_{j,j-1}$ ($j = 2, 3, \dots, i$) have been eliminated. The tilde on top of the elements in the resulting matrix, up to and including the $i$-th row, indicates that these elements have been transformed from their original values. The next step involves the elimination of $h_{i+1,i}$. For this, we left-multiply the matrix by the rotation matrix $R_i$,

$$R_i = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & c_i & s_i & & \\ & & -s_i & c_i & & \\ & & & & \ddots & \\ & & & & & 1 \end{bmatrix}, \quad (4.59)$$

where it is

$$c_i = \frac{\tilde{h}_{i,i}}{\sqrt{\tilde{h}_{i,i}^2 + h_{i+1,i}^2}}, \qquad s_i = \frac{h_{i+1,i}}{\sqrt{\tilde{h}_{i,i}^2 + h_{i+1,i}^2}}. \quad (4.60)$$

The rotation matrices $R_i$, which are often referred to as Givens rotation matrices, are unitary matrices. Left multiplication by $R_i$ eliminates $h_{i+1,i}$ and changes the entries in the $i$-th and $(i+1)$-th rows. Successive multiplication by the rotation matrices transforms $H_{m+1,m}$ to triangular form:

$$Q_m H_{m+1,m} = \begin{bmatrix} R_{m,m} \\ 0 \end{bmatrix} \equiv R_{m+1,m}, \qquad Q_m = R_m R_{m-1} \cdots R_1. \quad (4.61)$$

Thus the functional to be minimized assumes the simpler form (note that $\|r_0\| e_1$ is also multiplied on the left by the rotation matrices):

$$\min_y \|r\| = \min_y \big\|\, \|r_0\| e_1 - H_{m+1,m}\, y \,\big\| = \min_y \big\|\, \|r_0\| Q_m e_1 - R_{m+1,m}\, y \,\big\|, \quad (4.62)$$

where use was made of the fact that $Q_m$ is unitary. The least-squares solution to this minimization problem is straightforward to obtain by noticing that, if we set $\|r_0\| Q_m e_1 = g = [\tilde{g}_m^T,\ \delta]^T$, where the vector $\tilde{g}_m$ of length $m$ contains the first $m$ elements of $\|r_0\| Q_m e_1$, then it is

$$\big\|\, \|r_0\| Q_m e_1 - R_{m+1,m}\, y \,\big\|^2 = |\delta|^2 + \|\tilde{g}_m - R_{m,m}\, y\|^2, \quad (4.63)$$

where $R_{m,m}$ is obtained from $R_{m+1,m}$ by deleting the last row. From this last equation it is now evident that the minimum is achieved when the second term on the right-hand side is zero. Hence, it is

$$y = R_{m,m}^{-1}\, \tilde{g}_m. \quad (4.64)$$

Furthermore, the minimum of the residual norm $\big\|\, \|r_0\| e_1 - H_{m+1,m}\, y \,\big\|$ is $|\delta|$.

The GMRES algorithm results from the merging of the Arnoldi process for the construction of an orthonormal basis for the Krylov subspace with the aforementioned least-squares minimization process. A pseudo-code description of the algorithm is given below.

Algorithm (4.2): GMRES.
1  $r_0 = b - A x_0$; $v_1 \leftarrow r_0 / \|r_0\|$; $g = \|r_0\| e_1$;
2  $i \leftarrow 1$;
3  do
3.a  $\tilde{v}_{i+1} \leftarrow A v_i$;
3.b  for $j = 1, \dots, i$
3.b.1  $h_{j,i} = (v_j, \tilde{v}_{i+1})$; $\tilde{v}_{i+1} \leftarrow \tilde{v}_{i+1} - h_{j,i} v_j$;
3.c  $h_{i+1,i} = \|\tilde{v}_{i+1}\|$; $v_{i+1} = \tilde{v}_{i+1} / h_{i+1,i}$;
3.d  $[h_{1,i}, h_{2,i}, \dots, h_{i+1,i}]^T \leftarrow R_{i-1} \cdots R_1 [h_{1,i}, h_{2,i}, \dots, h_{i+1,i}]^T$;
3.e  construct the rotation matrix $R_i$ according to (4.59) and (4.60);
3.f  $[h_{1,i}, \dots, h_{i+1,i}]^T \leftarrow R_i [h_{1,i}, \dots, h_{i+1,i}]^T$; $g \leftarrow R_i g$;
3.g  if the last element of $g$ is sufficiently small, go to 4; else $i \leftarrow i + 1$.
4  Compute $y$ as the solution of $\tilde{H}_{m,m}\, y = g_m$, where $\tilde{H}_{m,m}$ results from the triangularized $\tilde{H}_{m+1,m}$ by deleting the last row, and $g_m$ is obtained from $g$ by deleting the last row.
5  $\hat{x} \leftarrow x_0 + V_m y$.
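The following is a minimal NumPy sketch of Algorithm (4.2); it applies the Givens rotations as in steps 3.d-3.f and solves the triangular system of step 4 at the end. Restarts and preconditioning are omitted, and the function name gmres and the test problem are illustrative:

```python
import numpy as np

def gmres(A, b, x0, m=50, tol=1e-8):
    """Plain GMRES(m): minimize ||b - Ax|| over x0 + K_m, per Algorithm (4.2)."""
    n = len(b)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1)); V[:, 0] = r0 / beta
    H = np.zeros((m + 1, m))
    g = np.zeros(m + 1); g[0] = beta              # g = ||r0|| e1
    cs, sn = np.zeros(m), np.zeros(m)             # stored Givens rotations
    for i in range(m):
        w = A @ V[:, i]                                       # 3.a
        for j in range(i + 1):                                # 3.b
            H[j, i] = V[:, j] @ w
            w -= H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)                       # 3.c
        V[:, i + 1] = w / H[i + 1, i]
        for j in range(i):                                    # 3.d: apply old rotations
            H[j, i], H[j + 1, i] = (cs[j] * H[j, i] + sn[j] * H[j + 1, i],
                                    -sn[j] * H[j, i] + cs[j] * H[j + 1, i])
        rho = np.hypot(H[i, i], H[i + 1, i])                  # 3.e: new rotation, (4.60)
        cs[i], sn[i] = H[i, i] / rho, H[i + 1, i] / rho
        H[i, i], H[i + 1, i] = rho, 0.0                       # 3.f: eliminate h_{i+1,i}
        g[i], g[i + 1] = cs[i] * g[i], -sn[i] * g[i]
        if abs(g[i + 1]) < tol * beta:                        # 3.g: |delta| is the residual
            m = i + 1
            break
    y = np.linalg.solve(H[:m, :m], g[:m])                     # step 4
    return x0 + V[:, :m] @ y                                  # step 5

A = np.eye(20) + 0.1 * np.random.default_rng(3).standard_normal((20, 20))
b = np.ones(20)
x = gmres(A, b, np.zeros(20))
print(np.linalg.norm(b - A @ x))
```

Note how storing only the sine/cosine pairs of the rotations, rather than full rotation matrices, keeps the per-iteration cost of the least-squares update small.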

Bad conditioning of the matrix $A$ can compromise the robustness of the Arnoldi process. Therefore, use of a preconditioning matrix is most appropriate, especially when dealing with very large matrices. Preconditioning matrices and their utilization for improving convergence of the iterative solution process are discussed in one of the following sections. For the purposes of this section it suffices to say that a preconditioner $M^{-1}$ is an approximation to the inverse of $A$. When $M^{-1}$ is available, we solve, instead, the following matrix equation:

$$M^{-1} A x = M^{-1} b. \quad (4.65)$$

The changes required in the GMRES process are straightforward. Step 1 in Algorithm (4.2) is changed to $r_0 = M^{-1}(b - A x_0)$, while the matrix operation in Step 3.a becomes $\tilde{v}_{i+1} \leftarrow M^{-1} A v_i$.

The major drawback of GMRES is that the CPU time and memory storage required per iteration both increase with the number of iterations. At each iteration, the new Arnoldi vector has to be made orthogonal to all previous Arnoldi vectors. Nevertheless, this cost may be acceptable if an effective preconditioner can be constructed to contain the number of iterations. The construction of such preconditioners using multigrid and multilevel methods is one of the key themes of this monograph and is discussed in detail later in this and subsequent chapters.

4.4 CONJUGATE GRADIENT METHOD

The conjugate gradient (CG) method is an effective method for the iterative solution of symmetric, positive definite systems [1]; however, it is also often used for the iterative solution of general symmetric systems. CG is the oldest and best known of the Krylov subspace-based iterative methods.

The method proceeds by generating sequences of approximate solutions, residuals corresponding to the approximate solutions, and search directions along which the next approximate solution is calculated and the new residual is computed. Although the length of these sequences increases with the number of iterations, only the most recent approximate solution, residual, and search direction are kept in memory. With regard to computational cost, each iteration requires only two inner products and one matrix-vector product.

The objective of the CG method is to find a vector $\hat{x} \in x_0 + \mathcal{K}$,

$$\hat{x} = x_0 + V y, \quad (4.66)$$

such that the $A$-norm (4.5) of the error is minimized,

$$\min_y \left\| A^{-1} r_0 - V y \right\|_A. \quad (4.67)$$

Such a minimization requires that $A^{-1} r_0 - V y$ be perpendicular to $V$ with respect to the $A$-inner product; hence,

$$\left(A^{-1} r_0 - V y\right) \perp_A V \;\Rightarrow\; V^T A V y = V^T r_0. \quad (4.68)$$

Contrasting this result with (4.53), it is immediately evident that GMRES minimizes the residual with respect to the Euclidean norm while CG minimizes the error with respect to the $A$-norm. Furthermore, considering the above result in the light of (4.48), we recognize CG as a projection method in which both the expansion and projection matrices are the same, $V = W$, both constructed from the same Krylov subspace $\mathcal{K}_m$.

Multiplication of both sides of (4.56) by $V_m^T$ yields

$$V_m^T A V_m = H_{m,m}. \quad (4.69)$$

If $A$ is symmetric, then $H_{m,m}$ is also symmetric and, being upper Hessenberg, tridiagonal. In this case, the product $A V_m$ is cast in the form

$$A V_m = V_m T_m + \beta_{m+1}\, v_{m+1} e_m^T, \qquad T_m = \begin{bmatrix} \alpha_1 & \beta_2 & & & \\ \beta_2 & \alpha_2 & \beta_3 & & \\ & \beta_3 & \alpha_3 & \ddots & \\ & & \ddots & \ddots & \beta_m \\ & & & \beta_m & \alpha_m \end{bmatrix}. \quad (4.70)$$

Algorithm (4.3): Lanczos Algorithm.
1  $v_0 \leftarrow 0$; $\beta_1 = \|r_0\|$; $v_1 \leftarrow r_0 / \beta_1$;
2  $i \leftarrow 1$;
3  for $i = 1, \dots, m$
3.a  $\tilde{v}_{i+1} \leftarrow A v_i$;
3.b  $\alpha_i = (v_i, \tilde{v}_{i+1})$;
3.c  $\tilde{v}_{i+1} \leftarrow \tilde{v}_{i+1} - \beta_i v_{i-1} - \alpha_i v_i$;
3.d  $\beta_{i+1} = \|\tilde{v}_{i+1}\|$; $v_{i+1} = \tilde{v}_{i+1} / \beta_{i+1}$.

The above Lanczos algorithm generates the column vectors in $V_m$, referred to as Lanczos vectors, and the nonzero entries $\alpha_i$, $\beta_i$ in $T_m$.

Since $W = V$, the approximate solution to the equation $Ax = f$ is obtained from (4.48) as follows:

$$V_m^T A V_m y = V_m^T r_0 \;\Rightarrow\; y = T_m^{-1} \|r_0\| e_1 \;\Rightarrow\; \hat{x} = x_0 + V_m y = x_0 + V_m T_m^{-1} \|r_0\| e_1, \quad (4.71)$$

where use was made of (4.70) and the fact that $V_m^T r_0 = \|r_0\| e_1$, which is due to Step 1 in the Lanczos algorithm. However, such a solution process is not efficient, because all the Lanczos vectors in $V_m$ must be stored and $T_m$ has to be inverted at each iteration. This is where the CG method comes in to provide a more efficient iterative solution process.

To explain, consider the equation that results after multiplying both sides of the matrix relation on the left of (4.70) by $y$, and making use of the fact that $y = T_m^{-1} \|r_0\| e_1$ (see (4.71)). It is

$$A V_m y = V_m T_m y + \beta_{m+1}\, v_{m+1} e_m^T y = V_m \|r_0\| e_1 + \beta_{m+1}\, v_{m+1} e_m^T T_m^{-1} \|r_0\| e_1. \quad (4.72)$$

In view of the fact that $V_m \|r_0\| e_1 = r_0$ and $\|r_0\| = \beta_1$, the above result may be cast in the following form:

$$r_m = r_0 - A V_m y = -\beta_1 \beta_{m+1} \left(e_m^T T_m^{-1} e_1\right) v_{m+1}. \quad (4.73)$$

To proceed further, use will be made of the symmetric, tridiagonal form of $T_m$ to construct the following factorization:

$$T_m = L_m \Lambda_m U_m = \begin{bmatrix} 1 & & & \\ \eta_2 & 1 & & \\ & \ddots & \ddots & \\ & & \eta_m & 1 \end{bmatrix} \begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_m \end{bmatrix} \begin{bmatrix} 1 & \eta_2 & & \\ & 1 & \ddots & \\ & & \ddots & \eta_m \\ & & & 1 \end{bmatrix}, \quad (4.74)$$

where $U_m = L_m^T$ and

$$\eta_i = \beta_i / d_{i-1}. \quad (4.75)$$

The inverse of $T_m$ is obtained as $U_m^{-1} \Lambda_m^{-1} L_m^{-1}$, with the inverses of $L_m$ and $U_m$ given by

$$L_m^{-1} = \begin{bmatrix} 1 & & & \\ -\eta_2 & 1 & & \\ \eta_2 \eta_3 & -\eta_3 & 1 & \\ \vdots & \ddots & \ddots & \ddots \end{bmatrix} \quad (4.76)$$

and

$$U_m^{-1} = \left(L_m^{-1}\right)^T. \quad (4.77)$$

The result (4.73) is most useful when interpreted in a matrix form that captures the sequence of residuals generated during the iteration process,

$$R_m \equiv \left[r_0\ r_1\ \cdots\ r_{m-1}\right] = V_m \Delta_m, \qquad \Delta_m = \mathrm{diag}(\delta_1, \delta_2, \dots, \delta_m). \quad (4.78)$$

In this form it is immediately evident that the residual vectors are orthogonal; hence, they constitute a scaled set of the Lanczos vectors. Thus one may use them in place of the vectors $V_m$ in the equation for the approximate solution. This yields

$$\hat{x} = x_0 + V_m y = x_0 + R_m \Delta_m^{-1} U_m^{-1} \Lambda_m^{-1} L_m^{-1} \|r_0\| e_1. \quad (4.79)$$

A more compact form of this result is obtained through the definition of the matrices $U_m'$, $P_m$ and the vector $\tilde{g}_m$:

$$U_m' = \Delta_m^{-1} U_m \Delta_m, \quad (4.80)$$

$$P_m = R_m U_m'^{-1}, \qquad \tilde{g}_m = \Delta_m^{-1} \Lambda_m^{-1} L_m^{-1} \|r_0\| e_1. \quad (4.81)$$

In view of the above, (4.79) assumes the form

$$\hat{x} = x_0 + P_m\, \tilde{g}_m. \quad (4.82)$$

The column vectors in $P_m$ are referred to as the search vectors. A slight manipulation of (4.81) yields a recursive equation for their construction,

$$R_m = P_m U_m' \;\Rightarrow\; p_i = r_i + \eta_{i+1}^2\, p_{i-1} \quad (i = 1, 2, \dots, m), \qquad (p_0 = r_0). \quad (4.83)$$

In addition, recursive relations for both the approximate solution and the residual vector can be obtained from (4.82):

$$x_m = x_0 + P_m\, \tilde{g}_m \;\Rightarrow\; x_i = x_{i-1} + \tilde{g}_i\, p_{i-1} \quad (i = 1, 2, \dots, m),$$

$$r_m = f - A x_m \;\Rightarrow\; r_i = r_{i-1} - \tilde{g}_i\, A p_{i-1} \quad (i = 1, 2, \dots, m), \quad (4.84)$$

where $\tilde{g}_i$ denotes the $i$-th entry of $\tilde{g}_m$. The relations in (4.83) and (4.84) can be combined to yield the following simple CG algorithm.

Algorithm (4.4): A simple CG.
1  Initial guess $x_0$; $r_0 \leftarrow f - A x_0$, and $p_0 \leftarrow r_0$;
2  do until convergence
2.a  build $T_i$ using Algorithm (4.3);
2.b  factorize $T_i$ to obtain $\Lambda_i$ and the $\eta_i$;
2.c  $x_i = x_{i-1} + \tilde{g}_i\, p_{i-1}$;
2.d  $r_i = f - A x_i$ and $p_i = r_i + \eta_{i+1}^2\, p_{i-1}$.

Despite its simplicity (which stems from avoiding the storage of the Lanczos vectors $V_m$), the above algorithm is hindered by the computational cost associated with the factorization of the matrices $T_i$. However, we can obtain the recursion coefficients without the factorization of $T_i$ by making use of the orthogonality of the residual vectors (see (4.78)) and the fact that the column vectors of $P_m$ are $A$-orthogonal. The latter is deduced from (4.80) and (4.84) as follows:

$$(A P_m, P_m) = P_m^T A P_m = U_m'^{-T} \Delta_m V_m^T A V_m \Delta_m U_m'^{-1} = U_m'^{-T} \Delta_m T_m \Delta_m U_m'^{-1} = \Delta_m \Lambda_m \Delta_m, \quad (4.85)$$

which is a diagonal matrix.

To see how these two orthogonality properties of $R_m$ and $P_m$ can be utilized for the development of recursive relations for $x_i$, $r_i$, and $p_i$ without the need to factor the matrix $T_i$, let us assume, in view of (4.83) and (4.84), that the desired recursive relations are as follows:

$$x_i = x_{i-1} + \alpha\, p_{i-1}, \quad (4.86)$$

$$r_i = r_{i-1} - \alpha\, A p_{i-1}, \quad (4.87)$$

$$p_i = r_i + \beta\, p_{i-1}. \quad (4.88)$$

The coefficients $\alpha$ and $\beta$ are computed using the orthogonality properties of the vectors of $R_m$ and $P_m$, as shown next.

The inner product of both sides of (4.87) with $r_{i-1}$, together with the orthogonality $(r_{i-1}, r_i) = 0$, yields the following formula for $\alpha$:

$$\alpha = \frac{(r_{i-1}, r_{i-1})}{(r_{i-1}, A p_{i-1})} = \frac{(r_{i-1}, r_{i-1})}{(p_{i-1}, A p_{i-1})}. \quad (4.89)$$

The inner product of both sides of (4.88) with $A p_{i-1}$, together with the $A$-orthogonality $(p_i, A p_{i-1}) = 0$, yields the following formula for $\beta$:

$$\beta = -\frac{(r_i, A p_{i-1})}{(p_{i-1}, A p_{i-1})} = \frac{(r_i, r_i)}{(r_{i-1}, r_{i-1})}, \quad (4.90)$$

where the results $A p_{i-1} = -(r_i - r_{i-1})/\alpha$ from (4.87) and $(p_{i-1}, A p_{i-1}) = (r_{i-1}, r_{i-1})/\alpha$ from (4.89) were used. The complete CG algorithm is obtained as the combination of (4.86)-(4.90).

Algorithm (4.5): Conjugate Gradient Method.
1  Initial guess $x_0$; $r_0 \leftarrow f - A x_0$; $p_0 \leftarrow r_0$;
2  do until convergence
2.a  $\alpha = (r_{i-1}, r_{i-1}) / (p_{i-1}, A p_{i-1})$;
2.b  $x_i = x_{i-1} + \alpha\, p_{i-1}$; $r_i = r_{i-1} - \alpha\, A p_{i-1}$;
2.c  $\beta = (r_i, r_i) / (r_{i-1}, r_{i-1})$;
2.d  $p_i = r_i + \beta\, p_{i-1}$;
2.e  if $r_i$ is small, converge and stop.

In the above algorithm each iteration involves the calculation of one matrix-vector product, $A p_{i-1}$, and two inner products, $(p_{i-1}, A p_{i-1})$ and $(r_i, r_i)$, since $(r_{i-1}, r_{i-1})$ is already available from the previous iteration.

When a symmetric preconditioner $M$ is available, the equation to be solved becomes

$$M^{-1} A x = M^{-1} f. \quad (4.91)$$

Since $M^{-1} A$ is, in general, not symmetric, the Euclidean inner product must be replaced by an $M$-inner product for the CG process to be applicable. This is easily seen by noting that it is

$$\left(M^{-1} A x, y\right)_M = (A x)^T y = x^T M M^{-1} A y = \left(x, M^{-1} A y\right)_M. \quad (4.92)$$

The resulting algorithm is the preconditioned CG algorithm [1].

Algorithm (4.6): Preconditioned Conjugate Gradient Algorithm (Version 1).
1  Initialize $x_0$; $r_0 \leftarrow f - A x_0$; $z_0 \leftarrow M^{-1} r_0$; $p_0 \leftarrow z_0$;
2  do until convergence
2.a  $\alpha = (z_{i-1}, z_{i-1})_M / (p_{i-1}, M^{-1} A p_{i-1})_M$;
2.b  $x_i = x_{i-1} + \alpha\, p_{i-1}$; $r_i = r_{i-1} - \alpha\, A p_{i-1}$; $z_i = M^{-1} r_i$;
2.c  $\beta = (z_i, z_i)_M / (z_{i-1}, z_{i-1})_M$;
2.d  $p_i = z_i + \beta\, p_{i-1}$;
2.e  if $r_i$ is small enough (convergence), stop.

Recognizing that $(z_i, z_i)_M = z_i^T M z_i = (z_i, r_i)$ and $(p_{i-1}, M^{-1} A p_{i-1})_M = (p_{i-1}, A p_{i-1})$, the above algorithm can be further simplified as follows:

Algorithm (4.7): Preconditioned Conjugate Gradient Method (Version 2).
1  Initialize $x_0$; $r_0 \leftarrow f - A x_0$; $z_0 \leftarrow M^{-1} r_0$; $p_0 \leftarrow z_0$;
2  do until convergence
2.a  $\alpha = (z_{i-1}, r_{i-1}) / (p_{i-1}, A p_{i-1})$;
2.b  $x_i = x_{i-1} + \alpha\, p_{i-1}$; $r_i = r_{i-1} - \alpha\, A p_{i-1}$; $z_i = M^{-1} r_i$;
2.c  $\beta = (z_i, r_i) / (z_{i-1}, r_{i-1})$;
2.d  $p_i = z_i + \beta\, p_{i-1}$;
2.e  if $r_i$ is small enough (convergence), stop.

Compared to the CG Algorithm (4.5), each iteration of the preconditioned CG involves one additional solution of the pseudo-residual equation $M z_i = r_i$.
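A minimal NumPy sketch of Algorithm (4.7) follows; the preconditioner is passed in as a function applying $M^{-1}$, and the diagonal (Jacobi) preconditioner used in the example is an arbitrary illustrative choice:

```python
import numpy as np

def pcg(A, f, x0, M_inv, tol=1e-10, max_iter=200):
    """Preconditioned CG, Algorithm (4.7); A is symmetric positive definite."""
    x = x0.copy()
    r = f - A @ x
    z = M_inv(r)                        # pseudo-residual equation M z = r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)           # step 2.a
        x += alpha * p                  # step 2.b
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:     # step 2.e
            break
        z = M_inv(r)
        rz_new = r @ z
        beta = rz_new / rz              # step 2.c
        p = z + beta * p                # step 2.d
        rz = rz_new
    return x

# Example: diagonal (Jacobi) preconditioner for a small SPD test system.
A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
f = np.array([1.0, 2.0, 3.0])
d = np.diag(A)
x = pcg(A, f, np.zeros(3), M_inv=lambda r: r / d)
```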

4.5 THE PRECONDITIONER MATRIX

A preconditioner matrix $M^{-1}$ is an approximation of the inverse of the matrix $A$ in the equation $Ax = f$. Thus the preconditioning operation $M^{-1} A x = M^{-1} f$ may be interpreted as an operation that makes use of an approximation of the solution $x$ of the original matrix equation. In practice, the objective of preconditioning is to produce a new matrix, $M^{-1} A$, the iterative solution of which exhibits faster convergence than that of the original matrix $A$. The obvious requirements for an effective preconditioner are that it must be computationally inexpensive to construct and apply, while at the same time being a good approximation to the inverse of $A$. Clearly, these are conflicting requirements. Thus the choice of a preconditioner comes from a compromise between the computational cost associated with its construction and application and the improvement in convergence resulting from its use. In the following we review three commonly used preconditioners, namely, the Jacobi preconditioner, the symmetric Gauss-Seidel preconditioner, and the incomplete LU factorization preconditioner.

4.5.1 The Jacobi preconditioner

The Jacobi preconditioner is constructed through the iterative Jacobi process of (4.32). In view of (4.32), the approximation after $m$ steps is

$$x_m = R_J^m\, x_0 + \left(\sum_{i=0}^{m-1} R_J^i\right) D^{-1} f. \quad (4.93)$$

With $x_0 = 0$ the above becomes

$$x_m = \left(\sum_{i=0}^{m-1} R_J^i\, D^{-1}\right) f. \quad (4.94)$$

Then the Jacobi preconditioner, $M_J^{-1}$, is defined by

$$M_J^{-1} = \sum_{i=0}^{m-1} R_J^i\, D^{-1}. \quad (4.95)$$

If $A$ is symmetric, then each term in the sum is symmetric; thus the Jacobi preconditioner is symmetric.

The simplest preconditioner is obtained for $m = 1$, $M_J^{-1} = D^{-1}$. Hence, in this case the Jacobi preconditioner is simply the inverse of the diagonal matrix whose elements are the diagonal elements of $A$. The resulting matrix $D^{-1} A$ is interpreted as the scaling of each row of $A$ by its diagonal element so as to make the diagonal elements of $D^{-1} A$ unity. This preconditioner is also referred to as the diagonal scaling preconditioner.
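A minimal sketch of applying the $m$-step Jacobi preconditioner (4.95) is shown below. Note that applying $M_J^{-1}$ to a vector $v$ amounts to running $m$ Jacobi sweeps (4.32) on $Az = v$ with zero initial guess, so the sum of matrix powers never needs to be formed explicitly:

```python
import numpy as np

def jacobi_preconditioner(A, v, m=3):
    """Apply M_J^{-1} of (4.95) to v: m Jacobi sweeps on A z = v from z = 0."""
    d = np.diag(A)
    z = v / d                           # m = 1 term: D^{-1} v (diagonal scaling)
    for _ in range(m - 1):
        z = v / d + z - (A @ z) / d     # z <- D^{-1} v + R_J z, with R_J = I - D^{-1} A
    return z

A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
v = np.array([1.0, 0.0, 1.0])
print(jacobi_preconditioner(A, v, m=1))  # plain diagonal scaling D^{-1} v
print(jacobi_preconditioner(A, v, m=3))  # closer approximation to A^{-1} v
```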


4.5.2 The symmetric Gauss-Seidel preconditioner

For the purposes of preconditioning the system $Ax = f$, the matrix resulting from either the forward or the backward iteration of (4.37) or (4.38), respectively, can be used as a preconditioner. However, if the matrix $A$ is symmetric and the preconditioned CG is the desired iterative solver, the construction of a symmetric Gauss-Seidel (S-GS) preconditioner is needed. The way S-GS is constructed is described next.

Each S-GS iteration consists of one forward GS step and one backward GS step. Hence, we write

$$\begin{aligned} x_{k+\frac{1}{2}} &= (D - L)^{-1} U x_k + (D - L)^{-1} f \quad &\text{(forward GS)}, \\ x_{k+1} &= (D - U)^{-1} L x_{k+\frac{1}{2}} + (D - U)^{-1} f \quad &\text{(backward GS)}. \end{aligned} \quad (4.96)$$

When combined into one, these two steps yield the following update equation:

$$x_{k+1} = \underbrace{(D - U)^{-1} L (D - L)^{-1} U}_{H_{S\text{-}GS}}\, x_k + (D - U)^{-1}\left[L (D - L)^{-1} + I\right] f = H_{S\text{-}GS}\, x_k + \underbrace{(D - U)^{-1} D (D - L)^{-1}}_{R_{S\text{-}GS}}\, f. \quad (4.97)$$

With $x_0 = 0$, at step $m$ the above equation yields

$$x_m = \left(\sum_{i=0}^{m-1} H_{S\text{-}GS}^i\, R_{S\text{-}GS}\right) f. \quad (4.98)$$

Thus the S-GS preconditioner is given by

$$M_{S\text{-}GS}^{-1} = \sum_{i=0}^{m-1} H_{S\text{-}GS}^i\, R_{S\text{-}GS}. \quad (4.99)$$

Let us consider the $i$-th term in the sum:

$$H_{S\text{-}GS}^i\, R_{S\text{-}GS} = \left[(D - U)^{-1} L (D - L)^{-1} U\right]^i (D - U)^{-1} D (D - L)^{-1} = (D - U)^{-1}\left[L (D - L)^{-1} U (D - U)^{-1}\right]^i D (D - L)^{-1}. \quad (4.100)$$

The proof of its symmetry is facilitated through the use of the following, easy to prove, identities:

$$D (D - U)^{-1} U = U (D - U)^{-1} D, \qquad D (D - L)^{-1} L = L (D - L)^{-1} D. \quad (4.101)$$

Hence, with each one of the terms in the sum symmetric, the symmetry of $M_{S\text{-}GS}^{-1}$ follows.

An alternative construction of a symmetric Gauss-Seidel preconditioner is possible through an $m$-step forward Gauss-Seidel iteration followed by an $m$-step backward one.

The relevant equations describing the construction process are

$$x_{k+1/2} = \underbrace{(D - L)^{-1} U}_{H_{f\text{-}GS}}\, x_k + \underbrace{(D - L)^{-1}}_{R_{f\text{-}GS}}\, f, \qquad x_{k+1} = \underbrace{(D - U)^{-1} L}_{H_{b\text{-}GS}}\, x_{k+1/2} + \underbrace{(D - U)^{-1}}_{R_{b\text{-}GS}}\, f. \quad (4.102)$$

Combination of the two $m$-step GS iterations with $x_0 = 0$ yields the following alternative S-GS preconditioner:

$$\tilde{M}_{S\text{-}GS}^{-1} = H_{b\text{-}GS}^m \sum_{i=0}^{m-1} H_{f\text{-}GS}^i\, R_{f\text{-}GS} + \sum_{i=0}^{m-1} H_{b\text{-}GS}^i\, R_{b\text{-}GS}. \quad (4.103)$$

The above expression for $\tilde{M}_{S\text{-}GS}^{-1}$ may be recast in a form that reveals its symmetry:

$$\begin{aligned} \tilde{M}_{S\text{-}GS}^{-1} &= H_{b\text{-}GS}^m \left(I - H_{f\text{-}GS}\right)^{-1}\left(I - H_{f\text{-}GS}^m\right) R_{f\text{-}GS} + \left(I - H_{b\text{-}GS}^m\right)\left(I - H_{b\text{-}GS}\right)^{-1} R_{b\text{-}GS} \\ &= H_{b\text{-}GS}^m A^{-1}(D - L)\left(I - H_{f\text{-}GS}^m\right) R_{f\text{-}GS} + \left(I - H_{b\text{-}GS}^m\right) A^{-1}(D - U) R_{b\text{-}GS} \\ &= A^{-1} - \left[(D - U)^{-1} L\right]^m A^{-1} \left[U (D - L)^{-1}\right]^m. \end{aligned} \quad (4.104)$$

When $m = 1$, the S-GS preconditioning matrix for both of the constructions considered above becomes $(D - U)^{-1} D (D - L)^{-1}$.
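A minimal sketch of applying the one-step ($m = 1$) S-GS preconditioner $(D - U)^{-1} D (D - L)^{-1}$ by two triangular solves and a diagonal scaling (dense storage and arbitrary test data, for illustration only):

```python
import numpy as np
from scipy.linalg import solve_triangular

def sgs_preconditioner(A, v):
    """Apply the m = 1 S-GS preconditioner (D-U)^{-1} D (D-L)^{-1} to v."""
    DL = np.tril(A)                  # D - L: lower triangle of A, including diagonal
    DU = np.triu(A)                  # D - U: upper triangle of A, including diagonal
    y = solve_triangular(DL, v, lower=True)      # forward sweep: (D-L)^{-1} v
    y = np.diag(A) * y                           # scale by D
    return solve_triangular(DU, y, lower=False)  # backward sweep: (D-U)^{-1}(...)

A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
v = np.array([1.0, 2.0, 3.0])
print(sgs_preconditioner(A, v))
```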

4.5.3 Incomplete LU factorization

As the name indicates, an incomplete LU factorization is obtained as an approximation of the complete LU factorization in which some of the entries of the matrices $L$ and $U$ are "dropped" (hence, set to zero) in order to reduce the computational cost associated with the construction and use of the preconditioner. We begin the discussion of the process used for the construction of the incomplete LU (ILU) preconditioner by reviewing the LU factorization algorithm.

Consider a general matrix $A_n \in \mathbb{C}^{n \times n}$. Its LU factorization is

$$A_n = \begin{bmatrix} a_{11} & u^T \\ l & A_{n-1} \end{bmatrix} = \begin{bmatrix} a_{11} & 0 \\ l & L_{n-1} \end{bmatrix} \begin{bmatrix} 1 & u^T / a_{11} \\ 0 & U_{n-1} \end{bmatrix}, \quad (4.105)$$

where

$$L_{n-1} U_{n-1} = A_{n-1} - \frac{1}{a_{11}}\, l\, u^T. \quad (4.106)$$


In view of the above relations, we can develop a recursive, in-place LU factorization algorithm in which $L_n$ is stored in the lower-triangular and diagonal parts of $A_n$, while $U_n$ is stored in its upper-triangular part. Since the diagonal elements of $U_n$ are all equal to 1, they are not stored.

If $A_n$ is symmetric, the matrix storage and computational cost can be reduced, approximately, by a factor of 2 by employing the Cholesky factorization,

$$A_n = C_n C_n^T, \quad (4.107)$$

where $C_n$ is a lower-triangular matrix. The Cholesky factorization has the form

$$A_n = \begin{bmatrix} a_{11} & l^T \\ l & A_{n-1} \end{bmatrix} = \begin{bmatrix} \sqrt{a_{11}} & 0 \\ l / \sqrt{a_{11}} & C_{n-1} \end{bmatrix} \begin{bmatrix} \sqrt{a_{11}} & l^T / \sqrt{a_{11}} \\ 0 & C_{n-1}^T \end{bmatrix}, \quad (4.108)$$

where

$$C_{n-1} C_{n-1}^T = A_{n-1} - \frac{1}{a_{11}}\, l\, l^T. \quad (4.109)$$

In view of the above equations, we can develop a recursive, in-place Cholesky factorization algorithm in which $C_n$ is stored in the lower-triangular and diagonal parts of $A_n$.

If $A_n$ is a sparse matrix, the terms $l u^T / a_{11}$ or $l l^T / a_{11}$ in (4.106) and (4.109) can change the original zero-entry pattern and introduce new fill-ins in the next submatrix $A_{n-1}$. As the factorization algorithm proceeds, the new fill-ins eventually render the subsequent submatrices dense, thus increasing the computational cost of the factorization process. Since the construction of the preconditioner does not require exact factorization, some fill-ins are dropped. The result is an incomplete LU factorization.

Depending on the criteria used for dropping the fill-ins, several variants of incomplete factorization schemes can be obtained. The simplest one is the zeroth-level incomplete LU factorization, often referred to as ILU(0), where all the new fill-ins introduced by $l u^T / a_{11}$ or $l l^T / a_{11}$ in the next submatrix $A_{n-1}$ or $C_{n-1}$ are dropped. Thus the $L$ and $U$ matrices have the same nonzero patterns as $A_n$.

The accuracy of ILU(0) may be insufficient to yield an adequately fast rate of convergence of the iterative solver in which the constructed matrix is used as a preconditioner. More accurate ILU schemes can be constructed by maintaining more fill-ins according to one of several strategies. For example, some ILU schemes use a threshold to decide on the "value" of a new fill-in: if the absolute value of the fill-in is below the threshold, the fill-in is dropped. A detailed discussion of numerous popular ILU schemes is given in [1].

Jacobi, symmetric GS, and incomplete factorization preconditioners are effective for the finite element matrices obtained from the discretization of the elliptic-type partial differential equations that govern electrostatic and magnetostatic BVPs. However, they are less effective, and often of unpredictable performance, when used for the iterative solution of the finite element matrices resulting from the discretization of the Helmholtz-type electric or magnetic field partial differential equations that govern electrodynamic boundary value problems.
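As a usage sketch, SciPy exposes an incomplete LU factorization through spilu, whose drop_tol parameter implements the threshold-based dropping described above; note that this is SuperLU's thresholded ILU rather than the ILU(0) scheme, which keeps exactly the sparsity pattern of $A$:

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spilu, LinearOperator, gmres

# Small sparse SPD test matrix.
A = csc_matrix(np.array([[4.0, -1, 0, 0], [-1, 4, -1, 0],
                         [0, -1, 4, -1], [0, 0, -1, 4]]))
b = np.ones(4)

ilu = spilu(A, drop_tol=1e-4)           # incomplete factors; small fill-ins dropped
M = LinearOperator(A.shape, ilu.solve)  # preconditioner M^{-1} as an operator

x, info = gmres(A, b, M=M)              # ILU-preconditioned GMRES
print(info, np.linalg.norm(b - A @ x))
```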

4.6 MULTIGRID PROCESS AND ITS USE AS A PRECONDITIONER

In our discussion of matrix preconditioning, very little has been said about the impact of grid size on the convergence rate of the iterative process used for the solution of the equation $Ax = f$ and its preconditioned form $M^{-1} A x = M^{-1} f$. However, there is strong theoretical evidence and numerical experience that the convergence of preconditioned iterative processes, including the class of preconditioned Krylov subspace methods, deteriorates with increasing dimension of the system, particularly when this increase in size is driven by the introduction of fine grids over portions of the computational domain.

In the context of the finite element solution of electromagnetic BVPs, the use of fine grids is driven by two factors. The first one is the presence of fine geometric features in the computational domain (e.g., thin slots in conducting plates, wires attached to platforms, thin dielectric layers), proper resolution of which during meshing is essential for solution accuracy. The second one concerns the utilization of a fine enough grid to contain numerical dispersion error and thus ensure the accuracy of the numerical approximation of the wave interactions inside the computational domain. The numerical dispersion error is understood as the error in the numerical wave solution due to the fact that the numerical wavenumber of the sampled wave is different from the exact wavenumber [4]. For the case of linear elements it can be shown that the phase error in the finite element solution due to numerical dispersion is $O[(h/\lambda)^2]$, where $h$ is the grid size and $\lambda$ is the wavelength. Hence, the smaller the grid size, the smaller the phase error. In particular, it was shown in [4] that the larger the electrical size (i.e., the size in wavelengths) of the computational domain, the smaller the grid size (or, equivalently, the higher the order of the finite element interpolation) must be to contain the numerical dispersion error below a desirable threshold.

Irrespective of which one of the aforementioned two factors is the dominant one, it is to be expected that the need for a very fine finite element grid will arise often in the electromagnetic analysis of structures of practical interest. Clearly, the finer the grid size, the larger the dimension of the resulting finite element system. The consequences with regard to computational cost are obvious, namely, increased computer memory requirements and solution time. In the context of iterative methods, increased computation time manifests itself as deterioration in convergence rate. Thus the development of iterative methods for which the convergence rate is independent of grid size is of particular importance. Multigrid methods were developed with this objective in mind.

The literature on multigrid methods is vast and continues to grow as new applications to both linear and nonlinear problems are being explored. Multigrid methods are loosely divided into two broad classes, namely, geometric and algebraic multigrid. While the former exploit the geometric structure of the finite difference or finite element grid used for the discretization of the partial differential equation of interest to develop a fast iterative solution of the sparse matrix equation, the latter rely solely upon algebraic information contained in the system for achieving faster convergence. From the numerous references on multigrid methods in the mathematics literature, we identify [5] and [6] as hands-on, intuitive, yet mathematically well-founded presentations of the key ideas behind multigrid methods. Reference [7] offers a more in-depth presentation of multigrid methods, including thorough discussions of the parallel implementation of multigrid methods and of algebraic multigrid. Up-to-date information on this rapidly evolving field can be found at the website http://www.mgnet.org, sponsored by the National Science Foundation, Yale University, as well as numerous other universities and research organizations.

In this section, a brief overview is provided of the way multigrid can be used as an effective preconditioner for the iterative solution of finite element systems. Our presentation takes advantage of the illustrative power of geometric multigrid to motivate its use and highlight the key steps associated with its computer implementation. In subsequent chapters, a more in-depth discussion of geometric multigrid methods is provided in the specific context of their application as effective preconditioners for the iterative solution of the large sparse matrices resulting from the finite element approximation of electromagnetic BVPs.

4.6.1 Motivation for multigrid

For the purposes of this section, a one-dimensional model problem is used. More specifically, we will concern ourselves with the solution of the linear system resulting from the finite element approximation of the one-dimensional Helmholtz equation

$$\frac{d^2 E}{dz^2} + \beta^2 E = g(z), \quad 0 \le z \le 1, \quad (4.110)$$

with homogeneous Dirichlet boundary conditions $E(0) = E(1) = 0$, using linear elements. The forcing function, $g(z)$, will be assumed known; however, for the purposes of our discussion its exact form is not required.

Recognizing that $\sin(m\pi z)$ are the eigenfunctions of the operator $\mathcal{L} = \frac{d^2}{dz^2} + \beta^2$ over the domain $0 \le z \le 1$ with the indicated homogeneous Dirichlet boundary conditions, with corresponding eigenvalues $\lambda_m = \beta^2 - (m\pi)^2$, the analytic solution of (4.110) is easily found to be

$$E(z) = \sum_{m=1}^{\infty} \frac{g_m}{\beta^2 - (m\pi)^2} \sin(m\pi z), \quad (4.111)$$

where $g_m$ denotes the $m$-th coefficient in the expansion of $g(z)$ in terms of the eigenfunctions $\sin(m\pi z)$.

The finite element approximation of (4.110) over a uniform grid, $\Omega^h$, of grid size $h = 1/(N+1)$ with $N$ interior nodes leads to the matrix equation $Ax = f$, where

$$A = \begin{bmatrix} a & b & & & \\ b & a & b & & \\ & \ddots & \ddots & \ddots & \\ & & b & a & b \\ & & & b & a \end{bmatrix}, \qquad f = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_{N-1} \\ f_N \end{bmatrix}, \quad (4.112)$$

with

$$a = \frac{2}{h} - \frac{2\beta^2 h}{3}, \qquad b = -\frac{1}{h} - \frac{\beta^2 h}{6}, \quad (4.113)$$

and

$$f_n = -\int_0^1 g(z)\, w_n(z)\, dz, \quad n = 1, 2, \dots, N, \quad (4.114)$$

with $w_n(z)$ denoting the $n$-th linear expansion function associated with node $n$.

The eigenvalues and eigenvectors of $A$ can be found in closed form. Let $v_n^{(h)} = [\sin\theta_n, \sin 2\theta_n, \dots, \sin N\theta_n]^T$, where the superscript $(h)$ is used to indicate the grid size of $\Omega^h$. It is straightforward to show that

$$\left[A - (a + 2b\cos\theta_n)\, I\right] v_n^{(h)} = -b \sin\left((N+1)\theta_n\right) e_N, \quad (4.115)$$

where $e_N$ is the $N$-th column of the identity matrix, $I$, of dimension $N$. It is immediately recognized that the vector $v_n^{(h)}$ will be an eigenvector of $A$, with corresponding eigenvalue $\lambda_n = a + 2b\cos\theta_n$, for $\theta_n$ such that the right-hand side of (4.115) is zero. This is the case for

$$\theta_n = \frac{n\pi}{N+1}, \quad n = 1, 2, \dots, N. \quad (4.116)$$

We conclude that the eigenvectors of $A$ are

$$v_n^{(h)} = \left[\sin(n\pi z_1), \sin(n\pi z_2), \dots, \sin(n\pi z_N)\right]^T, \quad n = 1, 2, \dots, N, \quad (4.117)$$

where $z_i = i/(N+1)$, $i = 1, 2, \dots, N$. The corresponding eigenvalues are given by

$$\lambda_n = a + 2b\cos\left(\frac{n\pi}{N+1}\right), \quad n = 1, 2, \dots, N. \quad (4.118)$$

Depicted in the left columns of Fig. 4.1 and Fig. 4.2 are the nine eigenvectors of $A$ for the case $N = 9$, where circles are used to indicate that their values are associated with the nine interior nodes of the finite element grid, $\Omega^h$, used for the approximation of (4.110). It is evident from the plots that the corresponding numerical modes, resulting from the finite element interpolation over the domain $0 \le z \le 1$, may be split into two groups. The first group includes the first four modes (i.e., modes with $n < (N+1)/2$). These modes are well resolved by the grid, closely resembling the corresponding analytic eigenfunctions. They will be referred to as the smooth modes. The second group contains the modes with $n \ge (N+1)/2$. These modes are not well resolved by the grid and will be referred to as the high-frequency or oscillatory modes.

Next, let us assume that the weighted Jacobi process is used for the iterative solution of the finite element system. In view of (4.41), the error after k iterations is given by

$$e^{(k)} = \left(R_{J\omega}\right)^k e^{(0)}. \qquad (4.119)$$

Let us expand e^(0) in terms of the eigenvectors of R_{Jω}. In view of (4.36) and the fact that the elements of the diagonal matrix D are all equal, R_{Jω} and A have the same eigenvectors. The corresponding eigenvalues, μ_n, n = 1, 2, ..., N, of R_{Jω} are readily obtained from those of A through the relation

$$\mu_n = 1 - \omega\,\frac{\lambda_n}{a} = 1 - \omega\left(1 - \frac{1 + (\beta h)^2/6}{1 - (\beta h)^2/3}\,\cos\theta_n\right), \qquad (4.120)$$

where use was made of (4.113). The eigenvector expansion of e^(0) is then

$$e^{(0)} = \sum_{n=1}^{N} c_n\, v_n^{(h)}. \qquad (4.121)$$



Figure 4.1 Left column (from top to bottom): Plots of the smooth eigenvectors (modes) of A for a uniform grid, Ω_h, with 9 interior nodes. Right column (from top to bottom): Projection of these modes onto a coarser grid, Ω_2h, of grid size twice that of Ω_h.

Use of (4.121) in (4.119) yields

$$e^{(k)} = \left(R_{J\omega}\right)^k \sum_{n=1}^{N} c_n\, v_n^{(h)} = \sum_{n=1}^{N} c_n \left(\mu_n\right)^k v_n^{(h)}. \qquad (4.122)$$

This result indicates that the nth component in the eigenvector expansion of the error has been reduced by the factor (μ_n)^k after k iterations. As already stated earlier, convergence requires that |μ_n| < 1, n = 1, 2, ..., N. More importantly, the result in (4.122) suggests that different components in the eigenvector expansion of the error are reduced at different rates and, hence, the different components in the eigenvector expansion of the finite element approximation will converge differently. In particular, the fastest-converging components are the ones with eigenvalues of smallest magnitude.
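As an illustration of (4.122), the following sketch evaluates μ_n from (4.120) and the per-mode reduction factor |μ_n|^k after k = 10 weighted Jacobi sweeps; it reuses the model-problem parameters of the earlier sketch, and the choice ω = 0.5 anticipates Table 4.1.

    import numpy as np

    N, omega, beta, k = 9, 0.5, 4.0 / 3.0, 10
    h = 1.0 / (N + 1)
    a = (2.0 / h) * (1.0 - (beta * h) ** 2 / 3.0)
    b = -(1.0 / h) * (1.0 + (beta * h) ** 2 / 6.0)
    n = np.arange(1, N + 1)

    # Eigenvalues of the weighted Jacobi iteration matrix, eq. (4.120).
    mu = 1.0 - omega * (a + 2.0 * b * np.cos(n * np.pi / (N + 1))) / a
    for nn, m in zip(n, mu):
        print(f"mode {nn}: mu = {m:+.4f}, "
              f"reduction after {k} sweeps = {abs(m) ** k:.2e}")

The smooth modes (small n) barely decay, while the oscillatory modes (n ≥ 5) are damped by several orders of magnitude, which is the behavior quantified next.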

The availability of the eigenvalues of R_{Jω} in closed form from (4.120) makes possible the quantification of the convergence rate of the two classes of numerical modes identified above, namely, the smooth and the oscillatory modes. To facilitate the discussion, (4.120) is recast in the following form:

$$\mu_n = 1 - \omega\,\frac{2\left(1 + p/2\right)\sin^2(\theta_n/2) - 3p/2}{1 - p}, \qquad p = \frac{(\beta h)^2}{3}. \qquad (4.123)$$

For the oscillatory modes it is n ≥ (N+1)/2; hence, sin²(θ_n/2) = sin²(nπ/(2(N+1))) ≥ 1/2, and (4.123) yields in this case

$$\mu_n \le 1 - \omega, \qquad n \ge (N+1)/2. \qquad (4.124)$$

This result suggests that if 0 < ω < 1 the error in the high-frequency modes can be made to decrease sufficiently fast. It is confirmed by the data presented in Table 4.1 for the eigenvalues of the nine modes, calculated for ω = 0.5 and several values of β.



Figure 4.2 Left column (from top to bottom): Plots of the oscillatory eigenvectors (modes) of A for a uniform grid, Ω_h, with 9 interior nodes. Right column (from top to bottom): Projection of these modes onto a coarser grid, Ω_2h, of grid size twice that of Ω_h.

In all cases, μ_n ≤ 0.5 for all high-frequency modes. Hence, the high-frequency components of the error undergo good reduction at each step of the iteration. Furthermore, in view of (4.124), this reduction rate is independent of the grid size h.

Table 4.1 Eigenvalues of R_{Jω} for Grid Ω_h, with ω = 0.5

Mode    β = 4/3    β = 2      β = 4      β = 6
  1     0.9798     0.9852     1.0157     1.0728
  2     0.9081     0.9127     0.9387     0.9872
  3     0.7965     0.7998     0.8187     0.8540
  4     0.6559     0.6576     0.6676     0.6861
  5     0.5000     0.5000     0.5000     0.5000
  6     0.3441     0.3424     0.3324     0.3139
  7     0.2035     0.2002     0.1813     0.1460
  8     0.0919     0.0873     0.0613     0.0128
  9     0.0202     0.0148    -0.0157    -0.0728
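The table entries can be regenerated directly from (4.120); the short sketch below does so for the four values of β (a minimal illustration with made-up variable names):

    import numpy as np

    N, omega, h = 9, 0.5, 0.1
    betas = [4.0 / 3.0, 2.0, 4.0, 6.0]
    print("mode  " + "  ".join(f"beta={bta:5.3g}" for bta in betas))
    for n in range(1, N + 1):
        row = []
        for beta in betas:
            a = (2.0 / h) * (1.0 - (beta * h) ** 2 / 3.0)
            b = -(1.0 / h) * (1.0 + (beta * h) ** 2 / 6.0)
            row.append(1.0 - omega * (a + 2.0 * b * np.cos(n * np.pi / (N + 1))) / a)
        print(f"{n:4d}  " + "  ".join(f"{m:+8.4f}" for m in row))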

With regard to the rate at which the error in the smooth modes is reduced, the data in Table 4.1 suggest that the smoother the mode, the slower the reduction rate. Furthermore, for the given choice of ω and depending on the value of β, the eigenvalue of the smoothest of the modes can become larger than 1.

To explore this point further, let us consider the smoothest (best-resolved) numerical mode, n = 1. To begin with, it is useful to recall that the constant β appearing in the one-dimensional Helmholtz equation (4.110) is the wavenumber of the time-harmonic wave in the domain of interest. Its constant value for our purposes indicates a homogeneous domain.


Its expression in terms of the wavelength, λ, in the medium, β = 2π/λ, is of particular relevance to this discussion. More specifically, it is noted that in the expressions (4.118) and (4.120) for the numerical eigenvalues of A and R_{Jω}, respectively, the wavenumber enters through the term βh = 2π(h/λ). Thus (βh) serves as a measure of how well the wavelength at the operating frequency is resolved by the finite element grid. For example, since h = 0.1 for the grid Ω_h, (βh) assumes the values 0.133, 0.2, 0.4, and 0.6, respectively, for the values of β of 1.333, 2.0, 4.0, and 6.0 used for obtaining the data in Table 4.1. The smaller values of (βh) represent a well-resolved wavelength. We mention that in most practical applications of the finite element method for the modeling of wave phenomena, a resolution of at least ten degrees of freedom per wavelength is used; for linear elements, this is equivalent to at least ten elements per wavelength. The largest value of β considered (β = 6.0) corresponds to a wavelength of π/3; hence, the grid size h = 0.1 amounts to a wavelength resolution of approximately λ/10. Thus, for all four cases considered in Table 4.1 the wavelength is well resolved by the finite element grid.

For n = 1 it is sin²(θ₁/2) = sin²(πh/2) ≈ (πh/2)². Furthermore, the term p = (1/3)(βh)² in the denominator of (4.123) is small enough for well-resolved wavelengths to allow for the approximation (1 − p)⁻¹ ≈ 1 + p. Use of these approximations in (4.123) yields for n = 1 the following approximate expression for μ₁:

$$\mu_1 \approx 1 + \frac{\omega}{2}\left(\beta^2 - \pi^2\right) h^2 + O(h^4). \qquad (4.125)$$

This result is consistent with the data presented in Table 4.1, indicating that, for 0 < ω < 1, the eigenvalue of the smoothest mode of the error remains close to 1 for wavelengths well resolved on the grid, and that it can even become greater than 1 for β > π. This leads us to the important conclusion that the attenuation rate of the smooth components of the error is slow, and that their reduction may even stall as the wavelength resolution worsens. Multigrid attempts to rectify this situation by introducing a coarser grid on which the smooth modes appear as high-frequency ones. Thus, in view of (4.124), their iteration on the coarser grid is expected to enhance their attenuation rate and, hence, enhance solution convergence.
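A quick numerical check of the approximation (4.125) against the exact eigenvalue from (4.120) can be carried out as follows (an illustrative sketch with the same model parameters):

    import numpy as np

    N, omega, h = 9, 0.5, 0.1
    for beta in [4.0 / 3.0, 2.0, 4.0, 6.0]:
        a = (2.0 / h) * (1.0 - (beta * h) ** 2 / 3.0)
        b = -(1.0 / h) * (1.0 + (beta * h) ** 2 / 6.0)
        exact = 1.0 - omega * (a + 2.0 * b * np.cos(np.pi / (N + 1))) / a
        approx = 1.0 + 0.5 * omega * (beta ** 2 - np.pi ** 2) * h ** 2
        print(f"beta = {beta:5.3f}: exact mu_1 = {exact:.4f}, approx = {approx:.4f}")

For β = 4/3 the two values agree to four digits (0.9798), and for β = 6 the O(h⁴) remainder accounts for the small discrepancy; in all cases μ₁ crosses 1 as β exceeds π.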

The multigrid idea may be explored further with the aid of Figs. 4.1 and 4.2. Let us define the coarse grid, Ω_2h, over the domain 0 ≤ z ≤ 1, with element grid size 2h. Hence, there are four interior nodes for Ω_2h, with coordinates z_i^(2h) = i(2h), i = 1, 2, 3, 4. With the nine interior nodes of Ω_h given by z_j^(h) = jh, j = 1, 2, ..., 9, it is z_i^(2h) = z_{2i}^(h), i = 1, 2, 3, 4. The right column of Figs. 4.1 and 4.2 depicts the projection of the nine modes of the fine grid, Ω_h, onto the coarse grid, Ω_2h. Clearly, as seen in the right column of Fig. 4.1, the smooth modes of Ω_h become more oscillatory on Ω_2h. More precisely, considering the elements of the nth mode, v_n^(h), 1 ≤ n < (N+1)/2, of Ω_h, associated with the even-numbered grid points, 2j, j = 1, 2, 3, 4, we have

$$v_{n,2j}^{(h)} = \sin\!\left(n\pi z_{2j}^{(h)}\right) = \sin\!\left(n\pi z_j^{(2h)}\right) = v_{n,j}^{(2h)}, \qquad j = 1, 2, 3, 4, \qquad (4.126)$$

where v_n^(2h) is recognized as the nth eigenvector, 1 ≤ n < (N+1)/2, of the grid Ω_2h. Furthermore, for the case of N odd considered in our example, the fine-grid mode v_{(N+1)/2}^(h) is not represented on Ω_2h, as seen clearly in the top plot of the right column of Fig. 4.2. Finally, the fine-grid, high-frequency modes (i.e., the modes with n > (N+1)/2) are no longer properly represented on Ω_2h. More precisely, through the phenomenon of aliasing, these


modes are misrepresented as smooth modes on Ω_2h, as depicted clearly in the right column of Fig. 4.2.

Returning to the smooth modes of the error on Ω_h, the observation that these modes are more oscillatory on the coarser grid Ω_2h suggests that, rather than continuing their iterative reduction on the fine grid, we project them onto Ω_2h and attempt what will hopefully be a more expedient reduction on the coarser grid.
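The projection and aliasing behavior is easy to observe by simply sampling the fine-grid modes at the coarse-grid nodes, as in the sketch below (illustrative only; node indexing as in the example above):

    import numpy as np

    # Fine-grid interior nodes (h = 0.1) and the coarse-grid nodes z_{2j}.
    N = 9
    zh = np.arange(1, N + 1) / (N + 1)
    z2h = zh[1::2]                     # z = 0.2, 0.4, 0.6, 0.8
    for n in (1, 2, 3, 4, 5, 7):
        print(f"n = {n}: values at coarse nodes =",
              np.round(np.sin(n * np.pi * z2h), 3))
    # n = 1..4 reproduce the coarse-grid modes, eq. (4.126); n = 5
    # vanishes at all coarse nodes; n = 7 aliases onto the smooth coarse
    # mode n' = 3 with opposite sign.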

For example, for the model problem considered, we find from (4.123), with ω = 0.5 and for β = 4/3, that the fine-grid smooth modes n = 3 and n = 4, when projected onto Ω_2h, have eigenvalues of 0.3399 and 0.0808, respectively, a vast improvement over their fine-grid values listed in the first data column of Table 4.1. Hence, a much faster attenuation is expected by iterating on them on Ω_2h rather than on Ω_h.

On the other hand, the smoothest fine-grid modes, n = 1 and n = 2, when projected onto Ω_2h, have eigenvalues of 0.9192 and 0.6601, respectively, which, even though better than their fine-grid values, may not be sufficiently smaller than 1 to result in fast enough damping of the corresponding components of the error. The remedy should be obvious. It involves the introduction of an even coarser grid, Ω_4h, of grid size twice that of Ω_2h. From (4.123) (again, with ω = 0.5 and for β = 4/3) we find that the eigenvalues of the two smoothest fine-grid modes, n = 1 and n = 2, when projected onto Ω_4h, are 0.7764 and 0.2236, respectively, promising a faster damping compared to that obtained through their reduction on grid Ω_2h.

At this point the recursive nature of the multigrid process starts becoming evident. More specifically, the geometric multigrid process we have been discussing in the previous paragraphs calls for the definition of a hierarchy of grids of progressively decreasing density, starting with a finest grid, Ω_h, and ending with a coarsest grid, Ω_H. The approximate solution is sought in an iterative manner through a process aimed at the expedient reduction of the error in an initial guess for the solution, by exploiting the fact that the damping rate for the smooth components of the error is larger on coarser grids. Clearly, such an iterative process requires intergrid transfer operators for moving back and forth between fine and coarse grids in the defined grid hierarchy. A sketch of one common choice of such operators in one dimension is given below. Along with these intergrid transfer operators, an effective smoother (i.e., a relaxation process such as the weighted Jacobi iteration used above) is required to enable the expedient damping of the oscillatory components of the error at each grid. The detailed procedure that leads to the algorithmic implementation of a multigrid process is discussed next.
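As a concrete example of intergrid transfer operators, the sketch below builds the standard one-dimensional linear-interpolation prolongation and its companion full-weighting restriction. These particular stencils are a common textbook choice and are used here only for illustration; the operators appropriate for the electromagnetic problems of later chapters are constructed differently.

    import numpy as np

    def prolongation_1d(Nc):
        """Linear interpolation from a 1-D coarse grid with Nc interior
        nodes to the fine grid with Nf = 2*Nc + 1 interior nodes."""
        Nf = 2 * Nc + 1
        P = np.zeros((Nf, Nc))
        for j in range(Nc):
            P[2 * j, j] = 0.5        # fine node left of coarse node j
            P[2 * j + 1, j] = 1.0    # fine node coinciding with coarse node j
            P[2 * j + 2, j] = 0.5    # fine node right of coarse node j
        return P

    P = prolongation_1d(4)           # Omega_2h (4 nodes) -> Omega_h (9 nodes)
    R = 0.5 * P.T                    # full-weighting restriction, R = c * P^T

The fact that the restriction is, up to a scale factor, the transpose of the prolongation anticipates the Galerkin condition discussed in the next section.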

4.6.2 The two-grid process

We begin with some notation. Let

$$A_h x_h = f_h \qquad (4.127)$$

denote the finite element approximation of the boundary value problem of interest on a grid, Ω_h, of average grid size h. Let x̃_h denote the approximate solution, and e_h and r_h the error and the residual, respectively; hence,

$$e_h = x_h - \tilde{x}_h, \qquad r_h = f_h - A_h \tilde{x}_h. \qquad (4.128)$$

From the above definitions it follows that the error satisfies the following residual equation:

$$A_h e_h = r_h. \qquad (4.129)$$


In a similar manner, let x_H be the approximation on a coarser grid, Ω_H, with average grid size H. The finite element approximation of the boundary value problem on Ω_H has the form

$$A_H x_H = f_H. \qquad (4.130)$$

Let x̃_H denote the approximate solution of this system. Then the error e_H and the residual r_H, given by

$$e_H = x_H - \tilde{x}_H, \qquad r_H = f_H - A_H \tilde{x}_H, \qquad (4.131)$$

satisfy the coarse-grid residual equation

$$A_H e_H = r_H. \qquad (4.132)$$

Equations (4.129) and (4.132) are most useful since they allow the multigrid process to be performed directly in terms of the residual and error vectors.

As mentioned in the last few paragraphs of the previous section, the purpose of the coarse grid is to enable a more effective reduction of the smooth components of the error. Hence, once the residual is calculated on Ω_h, it must be transferred to the coarse grid for the smooth components of the error to be effectively processed there. This brings us to the introduction of one of the two intergrid transfer operators needed for going back and forth between the two grids, namely, the restriction operator. The restriction operator, I_h^H, restricts the residual r_h of the fine grid to the coarser grid; hence, it is

$$r_H = I_h^H\, r_h. \qquad (4.133)$$

With r_H known, let us assume that (4.132) can be solved for e_H. Then a prolongation (or interpolation) operator I_H^h is needed to interpolate the correction e_H back to the fine grid:

$$e_h = I_H^h\, e_H. \qquad (4.134)$$

In practice, both I_h^H and I_H^h are chosen to be linear operators.

The coarse-grid correction process described above completes with the new approximation, x̃_h, to the unknown vector, x_h, obtained through the correction operation

$$\tilde{x}_h \leftarrow \tilde{x}_h + e_h. \qquad (4.135)$$

The following provides a compact summary of the algorithm.

Algorithm (4.8): x̃_h ← Coarse-Grid-Correction(x̃_h, f_h)
1  r_h = f_h − A_h x̃_h;      (Compute residual on fine grid)
2  r_H = I_h^H r_h;           (Restrict residual to coarse grid)
3  e_H = A_H^{−1} r_H;        (Solve exactly on coarse grid)
4  e_h = I_H^h e_H;           (Prolongate correction to fine grid)
5  x̃_h ← x̃_h + e_h;          (Correct solution on fine grid)

In compact mathematical form, the above coarse-grid correction is written as follows:

$$\tilde{x}_h \leftarrow \tilde{x}_h + I_H^h A_H^{-1} I_h^H \left(f_h - A_h \tilde{x}_h\right). \qquad (4.136)$$
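Algorithm (4.8) translates directly into a few lines of Python; the sketch below assumes dense NumPy arrays Ah and AH for the two grid matrices and matrices R and P for the restriction and prolongation operators (all names illustrative):

    import numpy as np

    def coarse_grid_correction(x, f, Ah, AH, R, P):
        """One pass of Algorithm (4.8): correct the fine-grid iterate x."""
        r = f - Ah @ x                   # residual on the fine grid
        rH = R @ r                       # restrict to the coarse grid
        eH = np.linalg.solve(AH, rH)     # solve the coarse residual equation
        return x + P @ eH                # prolongate and correct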

It should be clear from the above process that, for the intergrid transfer to be effective, the residual that is being transferred back and forth between the two grids must be smooth enough to allow for accurate interpolation. Hence, a smoothing process is needed


prior to intergrid transfer, to damp sufficiently the high-frequency components of the error and thus improve the smoothness of the residual.

In the correction process described above, under the assumption that the residual equation is solved exactly on the coarse grid, such smoothing is not needed prior to the application of the prolongation operator. However, it is needed on the fine grid, prior to applying the restriction operator. As already stated in the previous section, a stationary iterative process, such as weighted Jacobi or Gauss–Seidel, can be employed for this smoothing (or relaxation) step. For example, the combination of the symmetric GS of (4.103) with the coarse-grid correction of (4.136) leads to the following two-grid process for the approximate solution of A_h x_h = f_h:

Algorithm (4.9): x̃_h ← Two-Grid-Process(x̃_h, f_h)
1  x̃_h ← H^m_{F-GS} x̃_h + Σ_{i=0}^{m−1} H^i_{F-GS} R_{F-GS} f_h;   (Pre-smoothing)
2  r_h = f_h − A_h x̃_h;      (Compute residual on fine grid)
3  r_H = I_h^H r_h;           (Restrict residual to coarse grid)
4  e_H = A_H^{−1} r_H;        (Solve exactly on coarse grid)
5  e_h = I_H^h e_H;           (Prolongate correction to fine grid)
6  x̃_h ← x̃_h + e_h;          (Update solution)
7  x̃_h ← H^m_{B-GS} x̃_h + Σ_{i=0}^{m−1} H^i_{B-GS} R_{B-GS} f_h;   (Post-smoothing)
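A minimal Python realization of Algorithm (4.9) follows, with the forward and backward Gauss–Seidel sweeps written out explicitly for clarity (an illustrative sketch; a production smoother would exploit sparsity):

    import numpy as np

    def gauss_seidel(A, x, f, sweeps, forward=True):
        """Forward or backward Gauss-Seidel sweeps (the smoother)."""
        n = len(f)
        order = range(n) if forward else range(n - 1, -1, -1)
        for _ in range(sweeps):
            for i in order:
                x[i] += (f[i] - A[i, :] @ x) / A[i, i]
        return x

    def two_grid(x, f, Ah, AH, R, P, m=1):
        """Algorithm (4.9): pre-smooth, coarse-grid correction, post-smooth."""
        x = gauss_seidel(Ah, x, f, m, forward=True)       # pre-smoothing
        r = f - Ah @ x                                    # fine-grid residual
        eH = np.linalg.solve(AH, R @ r)                   # coarse solve
        x = x + P @ eH                                    # prolongate, correct
        return gauss_seidel(Ah, x, f, m, forward=False)   # post-smoothing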

Comparison of Algorithm (4.9) with Algorithm (4.8) reveals that the coarse-grid correction step is inserted between the forward and the backward GS iterations. With the initial guess for the approximate solution x̃_h set to zero, the mathematical statement of the two-grid method is cast in the form

$$\tilde{x}_h = \left[\sum_{i=0}^{m-1} H_{B\text{-}GS}^i R_{B\text{-}GS} + H_{B\text{-}GS}^m \sum_{i=0}^{m-1} H_{F\text{-}GS}^i R_{F\text{-}GS} + H_{B\text{-}GS}^m\, I_H^h A_H^{-1} I_h^H \left(I - A_h \sum_{i=0}^{m-1} H_{F\text{-}GS}^i R_{F\text{-}GS}\right)\right] f_h. \qquad (4.137)$$

The two-grid process (and, more generally, a multigrid process utilizing a hierarchy of several grids) constitutes in itself a stand-alone iterative solver. However, it is also most useful as a convergence acceleration technique, that is, as a preconditioner for Krylov-subspace-based iterative solvers such as GMRES or CG.
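For instance, one two-grid pass applied to a residual, starting from a zero initial guess, realizes the action r → M^{−1} r and can be wrapped as a preconditioner for SciPy's Krylov solvers. A sketch, reusing the model matrix A, the prolongation P, and the two_grid routine of the earlier sketches, with a hypothetical right-hand side fh; it anticipates the Galerkin construction discussed below, and for an indefinite A_h a GMRES call would replace cg:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    R = P.T                        # transpose pair, anticipating (4.141)
    AH = R @ A @ P                 # coarse-grid matrix, anticipating (4.142)
    M = LinearOperator(A.shape,
                       matvec=lambda r: two_grid(np.zeros_like(r), r,
                                                 A, AH, R, P, m=1))
    x, info = cg(A, fh, M=M)       # info == 0 signals convergence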

As we have already discussed, symmetry is required of a preconditioner that is to be used in conjunction with CG. From (4.137), the two-grid preconditioner M_{2G}^{−1} is

$$M_{2G}^{-1} = \sum_{i=0}^{m-1} H_{B\text{-}GS}^i R_{B\text{-}GS} + H_{B\text{-}GS}^m \sum_{i=0}^{m-1} H_{F\text{-}GS}^i R_{F\text{-}GS} + H_{B\text{-}GS}^m\, I_H^h A_H^{-1} I_h^H \left(I - A_h \sum_{i=0}^{m-1} H_{F\text{-}GS}^i R_{F\text{-}GS}\right). \qquad (4.138)$$

Compared with the symmetric Gauss–Seidel preconditioner M_{SGS}^{−1}, the coarse-grid correction introduces the extra third term in (4.138). The symmetry of the sum of the first two terms in (4.138) has already been shown in (4.103). Thus it remains to show that the third term


is symmetric also. Since it is

$$I - A_h \sum_{i=0}^{m-1} H_{F\text{-}GS}^i R_{F\text{-}GS} = I - A_h \left(I - H_{F\text{-}GS}^m\right)\left(I - H_{F\text{-}GS}\right)^{-1} R_{F\text{-}GS} = A_h H_{F\text{-}GS}^m A_h^{-1} = \left(U (D - L)^{-1}\right)^m, \qquad (4.139)$$

the third term in (4.138) may be cast in the form

$$H_{B\text{-}GS}^m\, I_H^h A_H^{-1} I_h^H \left(I - A_h \sum_{i=0}^{m-1} H_{F\text{-}GS}^i R_{F\text{-}GS}\right) = \left((D - U)^{-1} L\right)^m I_H^h A_H^{-1} I_h^H \left(U (D - L)^{-1}\right)^m. \qquad (4.140)$$

If the restriction operator I_h^H is the transpose of the prolongation operator I_H^h, that is, if

$$I_h^H = \left(I_H^h\right)^T, \qquad (4.141)$$

then the following relation holds between the coarse-grid matrix A_H and the fine-grid matrix A_h:

$$A_H = I_h^H A_h I_H^h. \qquad (4.142)$$

In this case the multigrid scheme is often referred to as being of the Galerkin type. Under this condition, and for symmetric A_h, it is straightforward to show that the matrix in (4.140) is symmetric, and thus the two-grid preconditioner M_{2G}^{−1} is symmetric.
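Continuing the running example, the sketch below forms the Galerkin coarse-grid operator of (4.142) with I_h^H = (I_H^h)^T and verifies the symmetry of the two-grid preconditioner numerically by applying it to the columns of the identity (A, P, and two_grid are those of the earlier sketches):

    import numpy as np

    R = P.T                                     # eq. (4.141)
    AH = R @ A @ P                              # eq. (4.142), Galerkin type

    # Assemble M_2G^{-1} column by column and test M^{-1} = (M^{-1})^T.
    Minv = np.column_stack([two_grid(np.zeros(A.shape[0]), e, A, AH, R, P, m=1)
                            for e in np.eye(A.shape[0])])
    print("M_2G^{-1} symmetric:", np.allclose(Minv, Minv.T))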

4.6.3 The multigrid process

The extension of the two-grid process to a multigrid one is straightforward. Instead of solving the coarse-grid residual equation exactly in Step 4 of Algorithm (4.9), the residual is restricted, following a smoothing process, onto a still coarser grid. This process can be continued down to a grid coarse enough for the exact solution of the residual equation to be computationally inexpensive.

Consider a hierarchy of N grids, Ω_ih, i = 1, 2, ..., N, with average element size varying from h at the finest grid to Nh at the coarsest grid. We will often refer to such a multigrid process as an N-level process, with the highest level corresponding to the finest grid and the lowest level corresponding to the coarsest grid. The matrix equation at the nth level is A_{nh} x_{nh} = f_{nh}, while e_{nh} and r_{nh} are, respectively, the error and the residual at the nth level. The N-level multigrid process is described by the following algorithm.


Algorithm (4.10): x̃_{nh} ← Multigrid(x̃_{nh}, f_{nh}, n)
1    if n = N:  x̃_{Nh} = A_{Nh}^{−1} f_{Nh};      (Solve exactly on coarsest grid)
2    else:
2.1    x̃_{nh} ← pre-smooth using forward GS m times;
2.2    r_{nh} = f_{nh} − A_{nh} x̃_{nh};           (Compute residual)
2.3    r_{(n+1)h} = I_{nh}^{(n+1)h} r_{nh};        (Restrict residual)
2.4    e_{(n+1)h} ← Multigrid(0, r_{(n+1)h}, n+1), repeated α times;
2.5    e_{nh} = I_{(n+1)h}^{nh} e_{(n+1)h};        (Prolongate correction)
2.6    x̃_{nh} ← x̃_{nh} + e_{nh};                  (Update solution)
2.7    x̃_{nh} ← post-smooth using backward GS m times

The operators I_{nh}^{(n+1)h} and I_{(n+1)h}^{nh} are the intergrid transfer operators between grids Ω_nh and Ω_(n+1)h. It is easy to show recursively that the resulting multigrid preconditioner is symmetric. The parameter α determines the shape of the multigrid cycle. To elaborate, the transition from the finest grid down to the coarsest grid and back to the finest grid again is called a cycle. If α = 1, Step 2.4 in Algorithm (4.10) is performed only once per iteration; this results in the V-cycle multigrid process, depicted in Fig. 4.3. If α = 2, Step 2.4 is performed twice per iteration; this results in the W-cycle multigrid process, depicted in Fig. 4.4.

Figure 4.3 V-cycle: N = 4, α = 1.    Figure 4.4 W-cycle: N = 4, α = 2.
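For reference, the recursion underlying Algorithm (4.10) may be sketched in Python as follows (illustrative only; the lists A, R, and P hold the per-level matrices and transfer operators, and gauss_seidel is the smoother defined in an earlier sketch):

    import numpy as np

    def mg_cycle(level, x, f, A, R, P, alpha=1, m=1):
        """One multigrid cycle in the spirit of Algorithm (4.10).
        A[l]: matrix on level l (0 = finest); R[l], P[l]: transfer
        operators between levels l and l+1."""
        if level == len(A) - 1:
            return np.linalg.solve(A[level], f)              # coarsest: solve exactly
        x = gauss_seidel(A[level], x, f, m, forward=True)    # pre-smooth
        rc = R[level] @ (f - A[level] @ x)                   # restrict residual
        e = np.zeros(A[level + 1].shape[0])
        for _ in range(alpha):                # alpha = 1: V-cycle, alpha = 2: W-cycle
            e = mg_cycle(level + 1, e, rc, A, R, P, alpha, m)
        x = x + P[level] @ e                                 # prolongate, correct
        return gauss_seidel(A[level], x, f, m, forward=False)  # post-smooth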

The key to deriving an effective and robust multigrid algorithm is the selection of appropriate relaxation processes (smoothers) and the construction of Galerkin-type intergrid transfer operators. Closely related to the former are the properties of the matrix A, and in particular its spectrum. For our model problem we chose the one-dimensional Helmholtz operator, whose finite element approximation need not be a positive-definite matrix, as is immediately evident from its eigenvalues in (4.118). At this point, it is useful to point out that the positive definiteness of the finite element matrix approximation is guaranteed only when β = 0, in which case the Helmholtz operator reduces to the elliptic Laplace operator. While multigrid techniques have a proven record of success in the solution of finite-difference and finite-element approximations of elliptic BVPs, their application to the iterative solution of finite approximations of electrodynamic BVPs governed by Helmholtz-like operators has not been straightforward. However, recent progress in the understanding of the properties of the spectrum of discrete approximations to such problems has paved the way toward the establishment of effective multigrid-based methodologies and algorithms for the robust iterative solution of the associated matrix equations. These methodologies are discussed in detail in the remainder of this book.

REFERENCES

1. Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed., Philadelphia: SIAM, 2003.

2. R. Barrett et al., Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, Philadelphia: SIAM, 1994.

3. G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., Baltimore: Johns Hopkins University Press, 1996.


4. R. Lee and A. C. Cangellaris, "A study of the discretization error in the finite element approximation of wave solutions," IEEE Trans. Antennas Propagat., vol. AP-40, pp. 542-548, May 1992.

5. W. L. Briggs, V. E. Henson, and S. F. McCormick, A Multigrid Tutorial, 2nd ed., Philadelphia: SIAM, 2000.

6. S. McCormick, Multigrid Methods, vol. 3 of SIAM Frontiers in Applied Mathematics, Philadelphia: SIAM, 1987.

7. U. Trottenberg, C. W. Oosterlee, and A. Schüller, Multigrid, London: Academic Press, 2001.