Source: iacs-courses.seas.harvard.edu/courses/am225/slides/am225_lec09.pdf

Applied Math 225: lecture 9

- Last time:
  - Linking libraries in C++
  - BLAS and LAPACK
  - Radial basis function example

- Today:
  - Krylov methods
  - Fast Fourier transform

Krylov methods revisited [1]

Given a matrix $A$ and vector $b$, a Krylov sequence is the set of vectors

$$\{b, Ab, A^2 b, A^3 b, \ldots\}$$

The corresponding Krylov subspaces are the spaces spanned by successive groups of these vectors

$$\mathcal{K}_m(A, b) \equiv \operatorname{span}\{b, Ab, A^2 b, \ldots, A^{m-1} b\}$$

An important advantage: Krylov methods do not deal directly with $A$, but rather with matrix–vector products involving $A$

This is particularly helpful when $A$ is large and sparse, since matrix–vector multiplications are relatively cheap

[1] This is a quick review of AM205 lecture 24.

http://iacs-courses.seas.harvard.edu/courses/am205/slides/am205_lec24.pdf
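As a minimal sketch of the point that only matrix–vector products are needed, the following Python snippet builds the first few Krylov vectors from a function that applies $A$. The test matrix (tridiagonal with 2 on the diagonal and -1 next to it) and the names `tridiag_matvec` and `krylov_sequence` are illustrative assumptions, not part of the lecture code.

```python
def tridiag_matvec(x):
    """Apply the matrix with 2 on the diagonal and -1 on the off-diagonals
    to x, without ever forming the matrix."""
    n = len(x)
    y = [2.0 * xi for xi in x]
    for i in range(n - 1):
        y[i] -= x[i + 1]
        y[i + 1] -= x[i]
    return y


def krylov_sequence(matvec, b, m):
    """Return [b, Ab, ..., A^(m-1) b] using only matrix-vector products."""
    vectors = [list(b)]
    for _ in range(m - 1):
        vectors.append(matvec(vectors[-1]))
    return vectors


seq = krylov_sequence(tridiag_matvec, [1.0, 0.0, 0.0, 0.0], 3)
for vec in seq:
    print(vec)
```

Each product costs $O(n)$ work for this sparse matrix, and the matrix itself is never stored.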

Arnoldi Iteration

We define a matrix as being in Hessenberg form in the following way:

- $A$ is called upper-Hessenberg if $a_{ij} = 0$ for all $i > j + 1$

- $A$ is called lower-Hessenberg if $a_{ij} = 0$ for all $j > i + 1$

The Arnoldi iteration is a Krylov subspace iterative method that reduces $A$ to upper-Hessenberg form


For $A \in \mathbb{C}^{n \times n}$, we want to compute $A = QHQ^*$, where $H$ is upper Hessenberg and $Q$ is unitary (i.e. $QQ^* = I$)

However, we suppose that $n$ is huge! Hence we do not try to compute the full factorization

Instead, let us consider just the first $m \ll n$ columns of the factorization $AQ = QH$

Therefore, on the left-hand side, we only need the matrix $Q_m \in \mathbb{C}^{n \times m}$:

$$Q_m = \begin{bmatrix} q_1 & q_2 & \cdots & q_m \end{bmatrix}$$


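On the right-hand side, we only need the first $m$ columns of $H$

More specifically, due to the upper-Hessenberg structure, we only need $\tilde{H}_m$, which is the $(m+1) \times m$ upper-left section of $H$:

$$\tilde{H}_m = \begin{bmatrix} h_{11} & \cdots & \cdots & h_{1m} \\ h_{21} & h_{22} & & \vdots \\ & \ddots & \ddots & \vdots \\ & & h_{m,m-1} & h_{mm} \\ & & & h_{m+1,m} \end{bmatrix}$$

$\tilde{H}_m$ only interacts with the first $m + 1$ columns of $Q$, hence we have

$$A Q_m = Q_{m+1} \tilde{H}_m$$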


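$$A \begin{bmatrix} q_1 & \cdots & q_m \end{bmatrix} = \begin{bmatrix} q_1 & \cdots & q_{m+1} \end{bmatrix} \begin{bmatrix} h_{11} & \cdots & h_{1m} \\ h_{21} & \cdots & h_{2m} \\ & \ddots & \vdots \\ & & h_{m+1,m} \end{bmatrix}$$

The $m$th column can be written as

$$A q_m = h_{1m} q_1 + \cdots + h_{mm} q_m + h_{m+1,m} q_{m+1}$$

Or, equivalently

$$q_{m+1} = (A q_m - h_{1m} q_1 - \cdots - h_{mm} q_m)/h_{m+1,m}$$

Arnoldi iteration is just the Gram–Schmidt method that constructs the $h_{ij}$ and the (orthonormal) vectors $q_j$, $j = 1, 2, \ldots$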


1: choose $b$ arbitrarily, then $q_1 = b/\|b\|_2$
2: for $m = 1, 2, 3, \ldots$ do
3:     $v = A q_m$
4:     for $j = 1, 2, \ldots, m$ do
5:         $h_{jm} = q_j^* v$
6:         $v = v - h_{jm} q_j$
7:     end for
8:     $h_{m+1,m} = \|v\|_2$
9:     $q_{m+1} = v/h_{m+1,m}$
10: end for

This is akin to the modified Gram–Schmidt method because the updated vector $v$ is used in line 5 (vs. the "raw vector" $A q_m$)

Also, we only need to evaluate $A q_m$ and perform some vector operations in each iteration
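The Arnoldi steps above can be sketched in plain Python for a real matrix (so $q_j^*$ becomes $q_j^T$). This is an illustrative sketch, not the course implementation: the function name `arnoldi`, the breakdown threshold, and the small nonsymmetric test matrix are all assumptions made for the example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def arnoldi(matvec, b, m):
    """Run m Arnoldi steps for a real matrix given only through matvec.

    Returns (Q, H): Q is a list of up to m+1 orthonormal vectors and H is
    the (m+1) x m upper-Hessenberg matrix stored as a list of rows."""
    norm_b = math.sqrt(dot(b, b))
    Q = [[x / norm_b for x in b]]
    H = [[0.0] * m for _ in range(m + 1)]
    for k in range(m):
        v = matvec(Q[k])
        for j in range(k + 1):
            # Modified Gram-Schmidt: project using the updated v.
            H[j][k] = dot(Q[j], v)
            v = [vi - H[j][k] * qi for vi, qi in zip(v, Q[j])]
        H[k + 1][k] = math.sqrt(dot(v, v))
        if H[k + 1][k] < 1e-14:  # breakdown: Krylov subspace is invariant
            break
        Q.append([vi / H[k + 1][k] for vi in v])
    return Q, H


# Hypothetical 4 x 4 nonsymmetric test matrix.
A = [[2.0, 1.0, 0.0, 0.0],
     [0.0, 3.0, 1.0, 0.0],
     [1.0, 0.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 5.0]]


def matvec(x):
    return [dot(row, x) for row in A]


Q, H = arnoldi(matvec, [1.0, 1.0, 1.0, 1.0], 3)
```

The returned quantities satisfy $A Q_m = Q_{m+1} \tilde{H}_m$ up to rounding error, which is a convenient check on any implementation.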

Lanczos Iteration

Lanczos iteration is the Arnoldi iteration in the special case that $A$ is Hermitian

We obtain some significant computational savings in this special case

Let us suppose for simplicity that $A$ is symmetric with real entries, and hence has real eigenvalues

Then $H_m = Q_m^T A Q_m$ is also symmetric. Since it is both symmetric and upper Hessenberg, it must be tridiagonal


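Since $H_m$ is now tridiagonal, we shall write it as

$$T_m = \begin{bmatrix} \alpha_1 & \beta_1 & & & \\ \beta_1 & \alpha_2 & \beta_2 & & \\ & \beta_2 & \alpha_3 & \ddots & \\ & & \ddots & \ddots & \beta_{m-1} \\ & & & \beta_{m-1} & \alpha_m \end{bmatrix}$$

The consequence of tridiagonality: each new vector only needs to be orthogonalized against the previous two, so Lanczos iteration is much cheaper than Arnoldi iteration!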


This leads to the Lanczos iteration

$\beta_0 = 0$, $q_0 = 0$
choose $b$ arbitrarily, then $q_1 = b/\|b\|_2$
for $m = 1, 2, 3, \ldots$ do
    $v = A q_m$
    $\alpha_m = q_m^T v$
    $v = v - \beta_{m-1} q_{m-1} - \alpha_m q_m$
    $\beta_m = \|v\|_2$
    $q_{m+1} = v/\beta_m$
end for
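A minimal Python sketch of the Lanczos iteration is given below, assuming a real symmetric matrix that is available only through a matrix–vector product. The function name `lanczos`, the breakdown threshold, and the tridiagonal test matrix are assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def tridiag_matvec(x):
    """Symmetric test matrix: 2 on the diagonal, -1 on the off-diagonals."""
    y = [2.0 * xi for xi in x]
    for i in range(len(x) - 1):
        y[i] -= x[i + 1]
        y[i + 1] -= x[i]
    return y


def lanczos(matvec, b, m):
    """Run up to m Lanczos steps; return the diagonal (alpha) and
    off-diagonal (beta) entries of the tridiagonal matrix."""
    norm_b = math.sqrt(dot(b, b))
    q_prev = [0.0] * len(b)
    q = [x / norm_b for x in b]
    beta_prev = 0.0
    alpha, beta = [], []
    for _ in range(m):
        v = matvec(q)
        a = dot(q, v)
        # Three-term recurrence: only the previous two vectors are needed.
        v = [vi - beta_prev * qp - a * qi for vi, qp, qi in zip(v, q_prev, q)]
        b_new = math.sqrt(dot(v, v))
        alpha.append(a)
        beta.append(b_new)
        if b_new < 1e-14:  # breakdown: Krylov subspace is invariant
            break
        q_prev, q = q, [vi / b_new for vi in v]
        beta_prev = b_new
    return alpha, beta


alpha, beta = lanczos(tridiag_matvec, [1.0, 0.0, 0.0, 0.0], 3)
print(alpha, beta)
```

Unlike the Arnoldi sketch, only two previous vectors are kept in memory, which is where the savings come from.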

Solving linear systems with Krylov methods

We aim to use Krylov methods to solve linear systems $Ax = b$

The only place to look for a solution is the Krylov subspace, so we try $x_k \in \mathcal{K}_k$. Suppose the true solution is $x = A^{-1} b$ and the residual is $r_k = b - A x_k$. We could aim for:

- Minimizing $\|x_k - x\|_2$. There is not enough information in the Krylov subspace to do this.

- Minimizing $\|r_k\|_2$. This leads to algorithms such as MINRES for symmetric $A$ and GMRES for nonsymmetric $A$.

- Minimizing $\|x - x_k\|_A$, where $\|e\|_A = \sqrt{e^T A e}$ is the norm defined by a symmetric positive definite $A$. This results in the conjugate gradient method.

Conjugate Gradient Method

The CG algorithm is given by

$x_0 = 0$, $r_0 = b$, $p_1 = b$
for $k = 1, 2, 3, \ldots$ do
    $z = A p_k$
    $\nu_k = (r_{k-1}^T r_{k-1})/(p_k^T z)$
    $x_k = x_{k-1} + \nu_k p_k$
    $r_k = r_{k-1} - \nu_k z$
    $\mu_k = (r_k^T r_k)/(r_{k-1}^T r_{k-1})$
    $p_{k+1} = r_k + \mu_k p_k$
end for

See AM205 lecture 24 for a full discussion of this algorithm. At every stage $x_k$ minimizes $\|x_k - x\|_A$ within $\mathcal{K}_k(A, b)$.

http://iacs-courses.seas.harvard.edu/courses/am205/slides/am205_lec24.pdf
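The CG recurrences can be sketched in a few lines of Python. This is a hedged illustration rather than the course's `basic_cg_test` program: the dense list-of-rows matrix storage, the stopping tolerance, and the $2 \times 2$ test system are assumptions chosen to keep the example short.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-12, max_iter=None):
    """Solve Ax = b for symmetric positive definite A, starting from x = 0.

    Returns the solution and the history of residual 2-norms."""
    n = len(b)
    x = [0.0] * n
    r = list(b)
    p = list(b)
    rr = dot(r, r)
    history = [math.sqrt(rr)]
    for _ in range(max_iter or n):
        if math.sqrt(rr) <= tol:
            break
        z = matvec(A, p)
        nu = rr / dot(p, z)
        x = [xi + nu * pi for xi, pi in zip(x, p)]
        r = [ri - nu * zi for ri, zi in zip(r, z)]
        rr_new = dot(r, r)
        mu = rr_new / rr
        p = [ri + mu * pi for ri, pi in zip(r, p)]
        rr = rr_new
        history.append(math.sqrt(rr))
    return x, history


# Hypothetical SPD test system with exact solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, history = conjugate_gradient(A, b)
print(x, history)
```

In exact arithmetic the loop terminates after at most $n$ iterations, since by then the Krylov subspace contains the solution.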

Basic conjugate gradient example

Consider the one-dimensional Poisson equation for $u(x)$,

$$\frac{\partial^2 u}{\partial x^2} = f$$

on the interval $[0, 1]$, with Dirichlet conditions $u(0) = u(1) = 0$.

Discretize as $u_j = u(jh)$, $f_j = f(jh)$ where $h = 1/(n-1)$. Hence $u_0 = u_{n-1} = 0$ and

$$\frac{u_{j+1} - 2u_j + u_{j-1}}{h^2} = f_j$$

for $j = 1, \ldots, n-2$.


scafell:unit2/lec7+8% ./basic_cg_test
# Iter 0, residual 3
# Iter 1, residual 5.61249
# Iter 2, residual 4.1833
# Iter 3, residual 2.73861
# Iter 4, residual 1.22474
# Iter 5, residual 0

The residuals eventually decrease, although non-monotonic behavior is typical: here the residual rises from 3 to 5.61249 at the first iteration.

After five iterations, the solution $x$ is contained within the Krylov subspace, and the residual decreases to zero.

Compactly supported radial basis functions

We return to the radial basis function example. Since the conjugate gradient method is best suited to sparse matrices, we use radial functions of compact support. Define

$$(1-r)_+^k = \begin{cases} (1-r)^k & \text{for } 0 \le r < 1, \\ 0 & \text{for } r \ge 1 \end{cases}$$

Wendland's functions are compactly supported, $k$ times continuously differentiable, and positive definite. [2]

$\phi(r)$                                   $k$
$(1-r)_+^2$                                 0
$(1-r)_+^4 (4r + 1)$                        2
$(1-r)_+^6 (35r^2 + 18r + 3)$               4
$(1-r)_+^8 (32r^3 + 25r^2 + 8r + 1)$        6

[2] The functions given here are positive definite in up to three dimensions.
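As a small sketch, the Wendland functions can be evaluated as below. The function name `wendland` is hypothetical, and the $k = 2$ entry uses the standard form $(1-r)_+^4 (4r + 1)$.

```python
def wendland(r, k):
    """Evaluate Wendland's compactly supported radial function with
    smoothness index k (0, 2, 4 or 6) at radius r >= 0."""
    if r >= 1.0:
        return 0.0  # outside the support
    s = 1.0 - r
    if k == 0:
        return s ** 2
    if k == 2:
        return s ** 4 * (4.0 * r + 1.0)
    if k == 4:
        return s ** 6 * (35.0 * r * r + 18.0 * r + 3.0)
    if k == 6:
        return s ** 8 * (32.0 * r ** 3 + 25.0 * r * r + 8.0 * r + 1.0)
    raise ValueError("k must be 0, 2, 4 or 6")


for k in (0, 2, 4, 6):
    print(k, wendland(0.0, k), wendland(0.5, k), wendland(1.0, k))
```

To use a support radius $\rho$ other than 1, one would evaluate `wendland(r / rho, k)`, so that entries of the interpolation matrix vanish whenever two points are further than $\rho$ apart.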

Convergence

Convergence of the conjugate gradient method is better when the matrix $A$ has a small condition number

A way to improve convergence is to use preconditioning. We find a matrix $M$ that is an approximation to $A$, and solve $M^{-1} A x = M^{-1} b$. We want

- $M$ to be symmetric and positive definite

- $M^{-1} A$ to be well conditioned, with few extreme eigenvalues

- $Mx = b$ to be easy to solve

Preconditioned Conjugate Gradient Method

The preconditioned CG algorithm is given by

$x_0 = 0$, $r_0 = b$, $p_1 = M^{-1} b$, $y_0 = M^{-1} r_0$
for $k = 1, 2, 3, \ldots$ do
    $z = A p_k$
    $\nu_k = (y_{k-1}^T r_{k-1})/(p_k^T z)$
    $x_k = x_{k-1} + \nu_k p_k$
    $r_k = r_{k-1} - \nu_k z$
    $y_k = M^{-1} r_k$
    $\mu_k = (y_k^T r_k)/(y_{k-1}^T r_{k-1})$
    $p_{k+1} = y_k + \mu_k p_k$
end for
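The preconditioned recurrences can be sketched in Python as follows. The preconditioner is passed in as a function that applies $M^{-1}$ to a vector. The function names, the stopping tolerance, the $3 \times 3$ test system, and the choice of a diagonal (Jacobi) preconditioner for the demonstration are all assumptions made for this example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def preconditioned_cg(A, b, apply_Minv, tol=1e-12, max_iter=100):
    """Preconditioned CG for SPD A, starting from x = 0.

    apply_Minv(r) must return M^{-1} r for an SPD preconditioner M."""
    x = [0.0] * len(b)
    r = list(b)
    y = apply_Minv(r)
    p = list(y)
    yr = dot(y, r)
    iterations = 0
    while iterations < max_iter and math.sqrt(dot(r, r)) > tol:
        z = matvec(A, p)
        nu = yr / dot(p, z)
        x = [xi + nu * pi for xi, pi in zip(x, p)]
        r = [ri - nu * zi for ri, zi in zip(r, z)]
        y = apply_Minv(r)
        yr_new = dot(y, r)
        mu = yr_new / yr
        p = [yi + mu * pi for yi, pi in zip(y, p)]
        yr = yr_new
        iterations += 1
    return x, iterations


# Hypothetical SPD test system whose exact solution is (1, 2, 3).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]
diagonal = [A[i][i] for i in range(len(A))]


def jacobi(r):
    """Apply the inverse of M = diag(a_11, ..., a_nn)."""
    return [ri / di for ri, di in zip(r, diagonal)]


x, iterations = preconditioned_cg(A, b, jacobi)
print(x, iterations)
```

Passing the preconditioner as a function means a different choice of $M$ can be swapped in without changing the solver.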

Examples of preconditioning

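- Diagonal (Jacobi) preconditioning: define $M = \operatorname{diag}(a_{11}, a_{22}, \ldots, a_{nn})$. Straightforward to invert.

- Block Jacobi preconditioning: write the matrix in block form as

$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \cdots & A_{kk} \end{bmatrix}$$

Define

$$M = \begin{bmatrix} A_{11} & & \\ & \ddots & \\ & & A_{kk} \end{bmatrix}$$

Applying $M^{-1}$ requires inverting each block, which is much faster than solving with the original matrix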


- Incomplete LU/Cholesky factorization: a full LU or Cholesky factorization of a sparse matrix results in fill-in of the zero entries. Adjust the algorithm to obtain an approximate result with minimal fill-in.

- Multigrid: the multigrid algorithm is an iterative procedure for solving matrix problems by applying successive V-cycles. Let $M^{-1}$ be the matrix applying one V-cycle, which is a good approximation to the inverse of $A$.

Radial basis function timing example

Tested the RBF example using numbers of points from $n = 10$ to $n = 10^4$.

Use compact Wendland functions with a radius of $5/\sqrt{n}$. This gives approximately 15 non-zero entries per row of the matrix.

Two solution algorithms:

- LAPACK: dense linear algebra

- Preconditioned CG: use a block Jacobi preconditioner with blocks of size $\sqrt{n}$.


[Figure: log-log plot of wall clock time (s), from $10^{-6}$ to 1, against number of points, from 10 to 10000, for LAPACK and preconditioned CG.]


For small systems with $n < 800$, dense linear algebra is faster.

For large systems with $n \ge 800$, the $O(n^3)$ scaling of LAPACK makes it inefficient.

Preconditioned CG has $O(n^{2.37})$ scaling in this example, and therefore becomes the best choice for large numbers of points.

This timing comparison is heavily dependent on the matrix structure and sparsity. LAPACK does better for denser matrices.

Other Krylov methods

The conjugate gradient method only applies to symmetric positive definite linear systems.

There are many related algorithms for solving different types of linear systems. The following flow chart from the textbook [3] illustrates some of the different possibilities.

[3] J. W. Demmel, Applied Numerical Linear Algebra, SIAM 1997.

GMRES: Generalized Minimum RESidual method

Consider a general matrix $A$ that may not be symmetric. The short recurrence no longer holds, so we must use the Arnoldi algorithm to obtain

$$H_k = Q_k^T A Q_k$$

where $Q_k$ has orthonormal columns and $H_k$ is upper Hessenberg.

Choose $x_k = Q_k y_k \in \mathcal{K}_k(A, b)$ to minimize the residual $\|r_k\|_2$.

Manipulating the residual gives

$$\begin{aligned}
\|r_k\|_2 &= \|b - A x_k\|_2 \\
&= \|b - A Q_k y_k\|_2 \\
&= \|b - (Q H Q^T) Q_k y_k\|_2 \\
&= \|Q^T b - H Q^T Q_k y_k\|_2 \\
&= \left\| e_1 \|b\|_2 - \begin{pmatrix} H_k & H_{uk} \\ H_{ku} & H_u \end{pmatrix} \begin{pmatrix} y_k \\ 0 \end{pmatrix} \right\|_2 \\
&= \left\| e_1 \|b\|_2 - \begin{pmatrix} H_k \\ H_{ku} \end{pmatrix} y_k \right\|_2
\end{aligned}$$

Here the $u$ subscript refers to the remaining parts of the full matrix $H$ that are not in $H_k$, and $e_1$ is the first unit vector.

The final line is a linear least-squares problem for $y_k$, which can be solved using a QR factorization.

GMRES: solving the least-squares problem

Normally, performing a QR factorization would require $O(k^3)$ operations.

But here, we require the QR factorization of the $(k+1) \times k$ Hessenberg matrix. We can perform the QR factorization by performing $k$ Givens rotations to rotate out the terms below the diagonal.

GMRES requires $O(kn)$ memory to store the vectors $Q_k$. A variant to limit the growth of computation and storage is to stop after $k$ steps, and restart by solving $Ad = r_k = b - A x_k$, after which the solution is given by $d + x_k$.

This is called GMRES($k$). It is still more expensive than conjugate gradient.
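The following Python sketch combines the Arnoldi iteration with Givens rotations in the way described above: each new Hessenberg column is rotated as it is produced, so the least-squares problem reduces to a triangular back substitution. It is an unrestarted version for small dense real matrices. The function name `gmres`, the tolerances, and the $3 \times 3$ test matrix are assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def gmres(A, b, max_iter=None, tol=1e-12):
    """Unrestarted GMRES with x0 = 0; returns (x, residual norm)."""
    n = len(b)
    beta = math.sqrt(dot(b, b))
    Q = [[bi / beta for bi in b]]
    R = []          # R[k] holds column k of the rotated (triangular) matrix
    cs, sn = [], []  # Givens rotation cosines and sines
    g = [beta]      # rotated right-hand side e1 * ||b||
    for k in range(max_iter or n):
        # Arnoldi step: build column k of the Hessenberg matrix.
        v = matvec(A, Q[k])
        h = []
        for j in range(k + 1):
            h.append(dot(Q[j], v))
            v = [vi - h[j] * qi for vi, qi in zip(v, Q[j])]
        h_next = math.sqrt(dot(v, v))
        # Apply the previous rotations to the new column.
        for j in range(k):
            t = cs[j] * h[j] + sn[j] * h[j + 1]
            h[j + 1] = -sn[j] * h[j] + cs[j] * h[j + 1]
            h[j] = t
        # New rotation that zeroes the subdiagonal entry h_next.
        d = math.hypot(h[k], h_next)
        c, s = h[k] / d, h_next / d
        cs.append(c)
        sn.append(s)
        h[k] = d
        R.append(h)
        g.append(-s * g[k])
        g[k] = c * g[k]
        if abs(g[k + 1]) <= tol or h_next < 1e-14:
            break
        Q.append([vi / h_next for vi in v])
    # Back substitution for y, then x = Q_k y.
    k = len(R)
    y = [0.0] * k
    for i in reversed(range(k)):
        acc = g[i] - sum(R[j][i] * y[j] for j in range(i + 1, k))
        y[i] = acc / R[i][i]
    x = [sum(y[j] * Q[j][i] for j in range(k)) for i in range(n)]
    return x, abs(g[k])


# Hypothetical nonsymmetric test system with exact solution (1, -1, 2).
A = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
b = [1.0, -1.0, 9.0]
x_gmres, res = gmres(A, b)
print(x_gmres, res)
```

A useful by-product is that the last entry of the rotated right-hand side gives the residual norm at every step without forming $x_k$.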

The Fast Fourier Transform

Consider a one-dimensional Poisson problem

$$-\frac{d^2 v}{dx^2} = f(x)$$

for a function $v(x)$ on $[0, 1]$ with boundary conditions $v(0) = v(1) = 0$.

Discretize with $N + 2$ evenly spaced points with grid spacing $h = 1/(N+1)$, so that $x_i = hi$.

Second-order centered finite difference gives

$$-v_{i-1} + 2v_i - v_{i+1} = h^2 f_i$$

for $i = 1, \ldots, N$.

Matrix formulation

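Writing all equations in a linear system yields

$$T_N \begin{bmatrix} v_1 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} 2 & -1 & & 0 \\ -1 & \ddots & \ddots & \\ & \ddots & \ddots & -1 \\ 0 & & -1 & 2 \end{bmatrix} \begin{bmatrix} v_1 \\ \vdots \\ v_N \end{bmatrix} = h^2 \begin{bmatrix} f_1 \\ \vdots \\ f_N \end{bmatrix}.$$

The eigenvectors of $T_N$ are

$$z_j(k) = \sqrt{\frac{2}{N+1}} \sin \frac{jk\pi}{N+1}$$

with corresponding eigenvalues

$$\lambda_j = 2 \left( 1 - \cos \frac{\pi j}{N+1} \right).$$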
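The eigenpairs of $T_N$ can be checked numerically with a short Python sketch. The function names `tn_eigenpairs` and `apply_tn` and the choice $N = 6$ are assumptions made for the example.

```python
import math


def tn_eigenpairs(N):
    """Return (lam, Z) where lam[j-1] is lambda_j and Z[j-1] is z_j."""
    lam = [2.0 * (1.0 - math.cos(math.pi * j / (N + 1)))
           for j in range(1, N + 1)]
    scale = math.sqrt(2.0 / (N + 1))
    Z = [[scale * math.sin(j * k * math.pi / (N + 1))
          for k in range(1, N + 1)] for j in range(1, N + 1)]
    return lam, Z


def apply_tn(v):
    """Multiply by T_N (2 on the diagonal, -1 on the off-diagonals)."""
    y = [2.0 * vi for vi in v]
    for i in range(len(v) - 1):
        y[i] -= v[i + 1]
        y[i + 1] -= v[i]
    return y


N = 6
lam, Z = tn_eigenpairs(N)
# Largest entry of T_N z_j - lambda_j z_j over all j.
max_err = max(abs(t - lam[j] * z)
              for j in range(N)
              for t, z in zip(apply_tn(Z[j]), Z[j]))
print(max_err)
```

The same table of sines serves as both the rows and the columns of the eigenvector matrix $Z$, since $z_j(k) = z_k(j)$.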

Poisson’s equation in two dimensions

Now consider the two-dimensional Poisson problem

$$-\frac{\partial^2 v}{\partial x^2} - \frac{\partial^2 v}{\partial y^2} = f(x, y)$$

on the unit square $[0, 1]^2$ with $v = 0$ on the boundary.

Discretize using an $(N+2) \times (N+2)$ grid with $x_j = jh$ and $y_k = kh$ with $h = 1/(N+1)$. Write

$$v_{jk} = v(jh, kh), \qquad f_{jk} = f(jh, kh).$$

Equations in the linear system are

$$4v_{jk} - v_{j-1,k} - v_{j+1,k} - v_{j,k-1} - v_{j,k+1} = h^2 f_{jk}.$$

Matrix formulation

Rewrite the unknowns $v_{jk}$ as occupying an $N \times N$ matrix $V$. Then

$$2v_{jk} - v_{j-1,k} - v_{j+1,k} = (T_N V)_{jk},$$

$$2v_{jk} - v_{j,k-1} - v_{j,k+1} = (V T_N)_{jk}.$$

Hence the problem can be written as

$$T_N V + V T_N = h^2 F$$

where $F$ is an $N \times N$ matrix with entries $f_{jk}$.

Eigenvectors and eigenvalues for 2D problem

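Let $V = z_j z_k^T$. Then

$$\begin{aligned}
T_N V + V T_N &= (T_N z_j) z_k^T + z_j (z_k^T T_N) \\
&= (\lambda_j z_j) z_k^T + z_j (z_k^T \lambda_k) \\
&= (\lambda_j + \lambda_k) z_j z_k^T \\
&= (\lambda_j + \lambda_k) V
\end{aligned}$$

and hence $z_j z_k^T$ is an eigenvector of the 2D problem with eigenvalue $\lambda_j + \lambda_k$. We obtain a full set of $N^2$ eigenvectors for the problem.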

Solving the equation via an eigendecomposition

Let $T_N = Z \Lambda Z^T$ be the eigendecomposition of $T_N$. Note that $Z^T Z = I$ since $Z$ is orthogonal. Then

$$Z \Lambda Z^T V + V (Z \Lambda Z^T) = h^2 F$$

and

$$Z^T Z \Lambda Z^T V Z + Z^T V (Z \Lambda Z^T) Z = h^2 Z^T F Z,$$

which becomes

$$\Lambda V' + V' \Lambda = h^2 F'$$

where $V' = Z^T V Z$ and $F' = Z^T F Z$.


Hence

$$\lambda_j v'_{jk} + v'_{jk} \lambda_k = h^2 f'_{jk}$$

and so

$$v'_{jk} = \frac{h^2 f'_{jk}}{\lambda_j + \lambda_k}.$$

Three steps to obtain a solution:

1. Compute $F' = Z^T F Z$ ($O(N^3)$ operations [4])

2. Find $v'_{jk} = h^2 f'_{jk}/(\lambda_j + \lambda_k)$ ($O(N^2)$ operations)

3. Compute $V = Z V' Z^T$ ($O(N^3)$ operations)

However, we will soon see that the Fast Fourier Transform allows steps 1 and 3 to be performed in $O(N^2 \log N)$ operations, turning this into a practical algorithm.

[4] Assuming a conventional matrix–matrix multiplication routine.
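The three steps can be sketched in Python using conventional $O(N^3)$ matrix products, before any FFT acceleration. The function names, the grid size $N = 6$, and the source term $f(x, y) = 1 + xy$ are assumptions made for the example. Since $Z$ is symmetric, $Z^T$ is replaced by $Z$ in the code.

```python
import math


def matmul(A, B):
    """Conventional O(N^3) product of two square lists-of-rows."""
    n = len(B)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(len(A))]


def poisson2d_eig(F, h):
    """Solve T_N V + V T_N = h^2 F via the eigendecomposition of T_N."""
    N = len(F)
    lam = [2.0 * (1.0 - math.cos(math.pi * j / (N + 1)))
           for j in range(1, N + 1)]
    scale = math.sqrt(2.0 / (N + 1))
    Z = [[scale * math.sin(j * k * math.pi / (N + 1))
          for k in range(1, N + 1)] for j in range(1, N + 1)]
    Fp = matmul(matmul(Z, F), Z)                       # step 1: F' = Z^T F Z
    Vp = [[h * h * Fp[j][k] / (lam[j] + lam[k])        # step 2: divide
           for k in range(N)] for j in range(N)]
    return matmul(matmul(Z, Vp), Z)                    # step 3: V = Z V' Z^T


def max_residual(V, F, h):
    """Largest violation of the five-point equations, with v = 0 outside."""
    N = len(V)

    def v(j, k):
        return V[j][k] if 0 <= j < N and 0 <= k < N else 0.0

    return max(abs(4.0 * v(j, k) - v(j - 1, k) - v(j + 1, k)
                   - v(j, k - 1) - v(j, k + 1) - h * h * F[j][k])
               for j in range(N) for k in range(N))


N = 6
h = 1.0 / (N + 1)
F = [[1.0 + (j + 1) * h * (k + 1) * h for k in range(N)] for j in range(N)]
V = poisson2d_eig(F, h)
print(max_residual(V, F, h))
```

The residual check substitutes the computed $V$ back into the five-point equations, so it tests the solver independently of the eigendecomposition.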

Alternative viewpoint: the Kronecker product

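Let $\operatorname{vec}(V)$ denote the operator that converts the $N \times N$ matrix into an $N^2$-vector of unknowns. Write

$$T_{N \times N} = I \otimes T_N + T_N \otimes I = (Z \otimes Z)(I \otimes \Lambda + \Lambda \otimes I)(Z \otimes Z)^T.$$

Then

$$\begin{aligned}
\operatorname{vec}(V) &= (T_{N \times N})^{-1} \operatorname{vec}(h^2 F) \\
&= \left( (Z \otimes Z)(I \otimes \Lambda + \Lambda \otimes I)(Z \otimes Z)^T \right)^{-1} \operatorname{vec}(h^2 F) \\
&= (Z \otimes Z)(I \otimes \Lambda + \Lambda \otimes I)^{-1}(Z^T \otimes Z^T) \operatorname{vec}(h^2 F).
\end{aligned}$$

While this is less notationally elegant, it makes it clear that the solution procedure could be extended to arbitrary dimensions.

For example, in 3D, we would consider $(Z \otimes Z \otimes Z)$, applying the matrix $Z$ to field values in each coordinate direction.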
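As a worked illustration of the 3D case, and assuming the same grid spacing $h$ and $N$ interior points in each direction, the operator and its eigendecomposition would read

$$\begin{aligned}
T_{N \times N \times N} &= I \otimes I \otimes T_N + I \otimes T_N \otimes I + T_N \otimes I \otimes I \\
&= (Z \otimes Z \otimes Z)(I \otimes I \otimes \Lambda + I \otimes \Lambda \otimes I + \Lambda \otimes I \otimes I)(Z \otimes Z \otimes Z)^T.
\end{aligned}$$

The middle factor is diagonal with entries $\lambda_i + \lambda_j + \lambda_k$, so in the transformed variables the solution is

$$v'_{ijk} = \frac{h^2 f'_{ijk}}{\lambda_i + \lambda_j + \lambda_k}.$$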
