Applied Math 225: lecture 9

Slides: iacs-courses.seas.harvard.edu/courses/am225/slides/am225_lec09.pdf

- Last time:
    - Linking libraries in C++
    - BLAS and LAPACK
    - Radial basis function example

- Today:
    - Krylov methods
    - Fast Fourier transform

Krylov methods revisited [1]

Given a matrix $A$ and vector $b$, a Krylov sequence is the set of vectors

\[ \{b, Ab, A^2 b, A^3 b, \ldots\} \]

The corresponding Krylov subspaces are the spaces spanned by successive groups of these vectors

\[ \mathcal{K}_m(A, b) \equiv \operatorname{span}\{b, Ab, A^2 b, \ldots, A^{m-1} b\} \]

An important advantage: Krylov methods do not deal directly with $A$, but rather with matrix–vector products involving $A$

This is particularly helpful when $A$ is large and sparse, since matrix–vector multiplications are relatively cheap

[1] This is a quick review of AM205 lecture 24.

Arnoldi Iteration

We define a matrix as being in Hessenberg form in the following way:

- $A$ is called upper-Hessenberg if $a_{ij} = 0$ for all $i > j + 1$

- $A$ is called lower-Hessenberg if $a_{ij} = 0$ for all $j > i + 1$

The Arnoldi iteration is a Krylov subspace iterative method that reduces $A$ to upper-Hessenberg form

For $A \in \mathbb{C}^{n \times n}$, we want to compute $A = QHQ^*$, where $H$ is upper Hessenberg and $Q$ is unitary (i.e. $QQ^* = I$)

However, we suppose that $n$ is huge! Hence we do not try to compute the full factorization

Instead, let us consider just the first $m \ll n$ columns of the factorization $AQ = QH$

Therefore, on the left-hand side, we only need the matrix $Q_m \in \mathbb{C}^{n \times m}$:

\[ Q_m = \begin{bmatrix} q_1 & q_2 & \cdots & q_m \end{bmatrix} \]

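On the right-hand side, we only need the first $m$ columns of $H$

More specifically, due to upper-Hessenberg structure, we only need $\tilde{H}_m$, which is the $(m + 1) \times m$ upper-left section of $H$:

\[
\tilde{H}_m =
\begin{bmatrix}
h_{11} & \cdots & \cdots & h_{1m} \\
h_{21} & h_{22} & & \vdots \\
 & \ddots & \ddots & \vdots \\
 & & h_{m,m-1} & h_{mm} \\
 & & & h_{m+1,m}
\end{bmatrix}
\]

$\tilde{H}_m$ only interacts with the first $m + 1$ columns of $Q$, hence we have

\[ AQ_m = Q_{m+1} \tilde{H}_m \]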

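\[
A \begin{bmatrix} q_1 & \cdots & q_m \end{bmatrix}
=
\begin{bmatrix} q_1 & \cdots & q_{m+1} \end{bmatrix}
\begin{bmatrix}
h_{11} & \cdots & h_{1m} \\
h_{21} & \cdots & h_{2m} \\
 & \ddots & \vdots \\
 & & h_{m+1,m}
\end{bmatrix}
\]

The $m$th column can be written as

\[ Aq_m = h_{1m} q_1 + \cdots + h_{mm} q_m + h_{m+1,m} q_{m+1} \]

Or, equivalently

\[ q_{m+1} = (Aq_m - h_{1m} q_1 - \cdots - h_{mm} q_m)/h_{m+1,m} \]

Arnoldi iteration is just the Gram–Schmidt method that constructs the $h_{ij}$ and the (orthonormal) vectors $q_j$, $j = 1, 2, \ldots$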

     1: choose $b$ arbitrarily, then $q_1 = b/\|b\|_2$
     2: for $m = 1, 2, 3, \ldots$ do
     3:     $v = Aq_m$
     4:     for $j = 1, 2, \ldots, m$ do
     5:         $h_{jm} = q_j^* v$
     6:         $v = v - h_{jm} q_j$
     7:     end for
     8:     $h_{m+1,m} = \|v\|_2$
     9:     $q_{m+1} = v/h_{m+1,m}$
    10: end for

This is akin to the modified Gram–Schmidt method because the updated vector $v$ is used in line 5 (vs. the “raw vector” $Aq_m$)

Also, we only need to evaluate $Aq_m$ and perform some vector operations in each iteration
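The iteration above can be sketched in a few lines of Python. This is a minimal illustration rather than the course's implementation: it assumes a real matrix (so $q_j^*$ becomes a transpose), stores $A$ as a dense list of rows purely for simplicity, and the names `matvec`, `dot`, and `arnoldi` are hypothetical helpers introduced here. Only matrix–vector products with $A$ are used, so `matvec` could be swapped for a sparse routine.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product (A is a list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def arnoldi(A, b, m):
    """Run up to m Arnoldi steps on a real matrix A.

    Returns Q (list of orthonormal vectors q_1..q_{m+1}) and
    H (the (m+1) x m upper-Hessenberg matrix as a list of rows).
    """
    norm_b = math.sqrt(dot(b, b))
    Q = [[bi / norm_b for bi in b]]
    H = [[0.0] * m for _ in range(m + 1)]
    for k in range(m):
        v = matvec(A, Q[k])
        for j in range(k + 1):
            # The updated v is used here, as in modified Gram-Schmidt.
            H[j][k] = dot(Q[j], v)
            v = [vi - H[j][k] * qi for vi, qi in zip(v, Q[j])]
        H[k + 1][k] = math.sqrt(dot(v, v))
        if H[k + 1][k] < 1e-14:
            break  # the Krylov subspace is invariant; stop early
        Q.append([vi / H[k + 1][k] for vi in v])
    return Q, H

# Small nonsymmetric example (values chosen arbitrarily for illustration).
A = [[2.0, 1.0, 0.0, 0.0],
     [0.0, 3.0, 1.0, 0.0],
     [1.0, 0.0, 4.0, 1.0],
     [0.0, 2.0, 0.0, 5.0]]
Q, H = arnoldi(A, [1.0, 0.0, 0.0, 0.0], 3)
```

The returned `Q` and `H` can be used to check the relation $AQ_m = Q_{m+1}\tilde{H}_m$ column by column.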

Lanczos Iteration

Lanczos iteration is the Arnoldi iteration in the special case that $A$ is Hermitian

However, we obtain some significant computational savings in this special case

Let us suppose for simplicity that $A$ is symmetric with real entries, and hence has real eigenvalues

Then $H_m = Q_m^T A Q_m$ is also symmetric, and hence must be tridiagonal (a symmetric upper-Hessenberg matrix is also lower-Hessenberg)

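Since $H_m$ is now tridiagonal, we shall write it as

\[
T_m =
\begin{bmatrix}
\alpha_1 & \beta_1 & & & \\
\beta_1 & \alpha_2 & \beta_2 & & \\
 & \beta_2 & \alpha_3 & \ddots & \\
 & & \ddots & \ddots & \beta_{m-1} \\
 & & & \beta_{m-1} & \alpha_m
\end{bmatrix}
\]

The consequence of tridiagonality: Lanczos iteration is much cheaper than Arnoldi iteration!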

This leads to the Lanczos iteration

    $\beta_0 = 0$, $q_0 = 0$
    choose $b$ arbitrarily, then $q_1 = b/\|b\|_2$
    for $m = 1, 2, 3, \ldots$ do
        $v = Aq_m$
        $\alpha_m = q_m^T v$
        $v = v - \beta_{m-1} q_{m-1} - \alpha_m q_m$
        $\beta_m = \|v\|_2$
        $q_{m+1} = v/\beta_m$
    end for
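As a rough sketch of the Lanczos recurrence, the Python below follows the same steps for a real symmetric matrix. The function names `lanczos`, `matvec`, and `dot`, and the dense storage of $A$, are assumptions made for illustration; the point is that each step only needs the two most recent vectors $q_{m-1}$ and $q_m$.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product (A is a list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lanczos(A, b, m):
    """Run up to m Lanczos steps on a real symmetric matrix A.

    Returns the diagonal entries alpha, the off-diagonal entries beta,
    and the list Q of orthonormal Lanczos vectors.
    """
    norm_b = math.sqrt(dot(b, b))
    q_prev = [0.0] * len(b)          # q_0 = 0
    q = [bi / norm_b for bi in b]    # q_1 = b / ||b||
    beta_prev = 0.0                  # beta_0 = 0
    Q, alpha, beta = [q], [], []
    for _ in range(m):
        v = matvec(A, q)
        a = dot(q, v)
        # Three-term recurrence: only q_{m-1} and q_m are subtracted.
        v = [vi - beta_prev * qp - a * qi
             for vi, qp, qi in zip(v, q_prev, q)]
        b_new = math.sqrt(dot(v, v))
        alpha.append(a)
        beta.append(b_new)
        if b_new < 1e-14:
            break  # invariant subspace found
        q_prev, q = q, [vi / b_new for vi in v]
        Q.append(q)
        beta_prev = b_new
    return alpha, beta, Q

# Example: A is already tridiagonal, so starting from e_1 the recurrence
# recovers its diagonal and the magnitudes of its off-diagonal entries.
T4 = [[2.0, -1.0, 0.0, 0.0],
      [-1.0, 2.0, -1.0, 0.0],
      [0.0, -1.0, 2.0, -1.0],
      [0.0, 0.0, -1.0, 2.0]]
alpha, beta, Q = lanczos(T4, [1.0, 0.0, 0.0, 0.0], 3)
```

In floating-point arithmetic the Lanczos vectors can gradually lose orthogonality for larger problems; this sketch makes no attempt to correct for that.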

Solving linear systems with Krylov methods

We aim to use Krylov methods to solve linear systems Ax = b

The only place to look is in the Krylov subspace. Try a solution $x_k \in \mathcal{K}_k$. Suppose the true solution is $x = A^{-1}b$ and the residual is $r_k = b - Ax_k$. We could aim for:

- Minimizing $\|x_k - x\|_2$. There is not enough information in the Krylov subspace to do this.

- Minimizing $\|r_k\|_2$. This leads to algorithms such as MINRES for symmetric $A$ and GMRES for nonsymmetric $A$.

- For symmetric positive definite $A$, define the norm $\|y\|_A = \sqrt{y^T A y}$. Minimizing $\|x - x_k\|_A$ results in the conjugate gradient method.

Conjugate Gradient Method

The CG algorithm is given by

    $x_0 = 0$, $r_0 = b$, $p_1 = b$
    for $k = 1, 2, 3, \ldots$ do
        $z = Ap_k$
        $\nu_k = (r_{k-1}^T r_{k-1})/(p_k^T z)$
        $x_k = x_{k-1} + \nu_k p_k$
        $r_k = r_{k-1} - \nu_k z$
        $\mu_k = (r_k^T r_k)/(r_{k-1}^T r_{k-1})$
        $p_{k+1} = r_k + \mu_k p_k$
    end for

See AM205 lecture 24 for a full discussion of this algorithm. At every stage $x_k$ minimizes $\|x_k - x\|_A$ within $\mathcal{K}_k(A, b)$.

Basic conjugate gradient example

Consider the one-dimensional Poisson equation for $u(x)$,

\[ \frac{\partial^2 u}{\partial x^2} = f \]

on the interval $[0, 1]$, with Dirichlet conditions $u(0) = u(1) = 0$.

Discretize as $u_j = u(jh)$, $f_j = f(jh)$ where $h = 1/(n-1)$. Hence $u_0 = u_{n-1} = 0$ and

\[ \frac{u_{j+1} - 2u_j + u_{j-1}}{h^2} = f_j \]

for $j = 1, \ldots, n - 2$.

    scafell:unit2/lec7+8% ./basic_cg_test
    # Iter 0, residual 3
    # Iter 1, residual 5.61249
    # Iter 2, residual 4.1833
    # Iter 3, residual 2.73861
    # Iter 4, residual 1.22474
    # Iter 5, residual 0

Residuals decrease overall, although it is typical to see non-monotonic behavior (here the residual rises at the first iteration).

After five iterations, the solution $x$ is contained within the Krylov subspace, and the residual decreases to zero.
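A minimal Python sketch of the CG loop applied to a small discretized Poisson problem is given below. It is not the `basic_cg_test` program, whose source and right-hand side are not shown in these slides; the names `cg` and `apply_neg_laplacian`, the choice of five unknowns, and the right-hand side of ones are all assumptions for illustration. Because CG needs a symmetric positive definite matrix, the sketch applies tridiag(−1, 2, −1), i.e. the finite-difference equation multiplied through by $-h^2$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply_neg_laplacian(u):
    """Apply tridiag(-1, 2, -1) without storing the matrix (zero Dirichlet ends)."""
    n = len(u)
    return [2.0 * u[i]
            - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def cg(apply_A, b, tol=1e-12, max_iter=None):
    """Conjugate gradient from x0 = 0; returns the solution and residual norms."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # r_0 = b
    p = list(b)                      # p_1 = b
    rr = dot(r, r)
    history = [math.sqrt(rr)]
    for _ in range(max_iter or 2 * n):
        if history[-1] <= tol:       # converged
            break
        z = apply_A(p)
        nu = rr / dot(p, z)
        x = [xi + nu * pi for xi, pi in zip(x, p)]
        r = [ri - nu * zi for ri, zi in zip(r, z)]
        rr_new = dot(r, r)
        mu = rr_new / rr
        p = [ri + mu * pi for ri, pi in zip(r, p)]
        rr = rr_new
        history.append(math.sqrt(rr))
    return x, history

x, history = cg(apply_neg_laplacian, [1.0] * 5)
for k, res in enumerate(history):
    print(f"# Iter {k}, residual {res:.6g}")
```

Passing the matrix as a function (`apply_A`) reflects the point made earlier: the method only ever needs matrix–vector products.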

Compactly supported radial basis functions

We return to the radial basis function example. Since the conjugate gradient method is best suited to sparse matrices, we use radial functions of compact support. Define

\[
(1 - r)_+^k =
\begin{cases}
(1 - r)^k & \text{for } 0 \le r < 1, \\
0 & \text{for } r \ge 1
\end{cases}
\]

Wendland’s functions are compactly supported, $k$ times differentiable, and positive definite. [2]

    $\phi(r)$                                   $k$
    $(1 - r)_+^2$                               0
    $(1 - r)_+^4 (4r + 1)$                      2
    $(1 - r)_+^6 (35r^2 + 18r + 3)$             4
    $(1 - r)_+^8 (32r^3 + 25r^2 + 8r + 1)$      6

[2] The set given here is positive definite in up to three dimensions.
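For reference, the tabulated functions could be evaluated as in the following sketch. The function name `wendland` and its interface are hypothetical, and the $k = 2$ entry is coded in its standard form $(1 - r)_+^4 (4r + 1)$.

```python
def wendland(r, k=2):
    """Evaluate a Wendland function of smoothness k (0, 2, 4 or 6) at r >= 0.

    The function is identically zero for r >= 1, which is what makes the
    resulting interpolation matrix sparse.
    """
    if r >= 1.0:
        return 0.0
    s = 1.0 - r
    if k == 0:
        return s ** 2
    if k == 2:
        return s ** 4 * (4.0 * r + 1.0)
    if k == 4:
        return s ** 6 * (35.0 * r * r + 18.0 * r + 3.0)
    if k == 6:
        return s ** 8 * (32.0 * r ** 3 + 25.0 * r * r + 8.0 * r + 1.0)
    raise ValueError("k must be one of 0, 2, 4, 6")

def scaled_wendland(dist, radius, k=2):
    """Same function with support rescaled to the given radius."""
    return wendland(dist / radius, k)
```

Rescaling the argument by a support radius, as in `scaled_wendland`, controls how many neighbouring points contribute non-zero entries to each matrix row.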

Convergence

Convergence of the conjugate gradient method is better when the matrix $A$ has a small condition number

A way to improve convergence is to use preconditioning. We find a matrix $M$ that is an approximation to $A$, and solve $M^{-1}Ax = M^{-1}b$. We want:

- $M$ is symmetric and positive definite

- $M^{-1}A$ is well conditioned and has few extreme eigenvalues

- $Mx = b$ is easy to solve

Preconditioned Conjugate Gradient Method

The preconditioned CG algorithm is given by

    $x_0 = 0$, $r_0 = b$, $p_1 = M^{-1}b$, $y_0 = M^{-1}r_0$
    for $k = 1, 2, 3, \ldots$ do
        $z = Ap_k$
        $\nu_k = (y_{k-1}^T r_{k-1})/(p_k^T z)$
        $x_k = x_{k-1} + \nu_k p_k$
        $r_k = r_{k-1} - \nu_k z$
        $y_k = M^{-1} r_k$
        $\mu_k = (y_k^T r_k)/(y_{k-1}^T r_{k-1})$
        $p_{k+1} = y_k + \mu_k p_k$
    end for
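The preconditioned loop can be sketched in Python as follows. This is an illustrative version under stated assumptions: $A$ is stored densely, the preconditioner is passed as a function applying $M^{-1}$, and the names `pcg` and `jacobi` are hypothetical. The diagonal choice of $M$ used here is only one simple option.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Dense matrix-vector product (A is a list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def jacobi(A):
    """Return a function applying M^{-1} for M = diag(a_11, ..., a_nn)."""
    d = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, d)]

def pcg(A, b, apply_Minv, tol=1e-12, max_iter=None):
    """Preconditioned CG from x0 = 0; returns the solution and iteration count."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                  # r_0 = b
    y = apply_Minv(r)            # y_0 = M^{-1} r_0
    p = list(y)                  # p_1 = M^{-1} b
    yr = dot(y, r)
    iters = 0
    for _ in range(max_iter or 2 * n):
        if math.sqrt(dot(r, r)) <= tol:
            break
        z = matvec(A, p)
        nu = yr / dot(p, z)
        x = [xi + nu * pi for xi, pi in zip(x, p)]
        r = [ri - nu * zi for ri, zi in zip(r, z)]
        y = apply_Minv(r)
        yr_new = dot(y, r)
        mu = yr_new / yr
        p = [yi + mu * pi for yi, pi in zip(y, p)]
        yr = yr_new
        iters += 1
    return x, iters

# Small symmetric positive definite example (values chosen arbitrarily).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iters = pcg(A, b, jacobi(A))
```

Swapping `jacobi(A)` for another function that applies $M^{-1}$ changes the preconditioner without touching the loop.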

Examples of preconditioning

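- Diagonal (Jacobi) preconditioning: define $M = \operatorname{diag}(a_{11}, a_{22}, \ldots, a_{nn})$. Straightforward to invert.

- Block Jacobi preconditioning: write the matrix in block form as

\[
A =
\begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1k} \\
\vdots & \vdots & \ddots & \vdots \\
A_{k1} & A_{k2} & \cdots & A_{kk}
\end{bmatrix}
\]

Define

\[
M =
\begin{bmatrix}
A_{11} & & \\
 & \ddots & \\
 & & A_{kk}
\end{bmatrix}
\]

Applying $M^{-1}$ requires inverting each block, which is much faster than solving the original system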

- Incomplete LU/Cholesky factorization: a full LU or Cholesky factorization of a sparse matrix results in fill-in of the zero entries. Adjust the algorithm to obtain an approximate result with minimal fill-in.

- Multigrid: the multigrid algorithm is an iterative procedure for solving matrix problems by applying successive V-cycles. Let $M^{-1}$ be the matrix applying one V-cycle, which is a good approximation to the inverse of $A$.

Radial basis function timing example

Tested RBF example using points from $n = 10$ to $n = 10^4$.

Use compact Wendland functions with a radius of $5/\sqrt{n}$. Gives approximately 15 non-zero entries per row of matrix.

Two solution algorithms:

- LAPACK – dense linear algebra

- Preconditioned CG – use block Jacobi preconditioner with blocks of size $\sqrt{n}$.

[Figure: log-log plot of wall clock time (s), from $10^{-6}$ to 1, against number of points, from 10 to 10000, for LAPACK and preconditioned CG.]

For small systems with $n < 800$, dense linear algebra is faster.

For large systems with $n \ge 800$, the $O(n^3)$ scaling of LAPACK makes it inefficient.

Preconditioned CG has $O(n^{2.37})$ scaling in this example, and therefore becomes the best choice for large numbers of points.

This timing comparison is heavily dependent on the matrix structure and sparsity. LAPACK does better for denser matrices.

Other Krylov methods

The conjugate gradient method only applies to symmetric positive definite linear systems.

There are many related algorithms for solving different types of linear systems. The following flow chart from the textbook [3] illustrates some of the different possibilities.

[3] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, 1997.

GMRES: Generalized Minimum RESidual method

Consider a general matrix $A$ that may not be symmetric. The short recurrence no longer holds, so we must use the Arnoldi algorithm to obtain

\[ H_k = Q_k^T A Q_k \]

where $Q_k$ has orthonormal columns and $H_k$ is upper Hessenberg.

Choose $x_k = Q_k y_k \in \mathcal{K}_k(A, b)$ to minimize the residual $\|r_k\|_2$.

Manipulating the residual gives

\[
\begin{aligned}
\|r_k\|_2 &= \|b - Ax_k\|_2 \\
&= \|b - AQ_k y_k\|_2 \\
&= \|b - (QHQ^T) Q_k y_k\|_2 \\
&= \|Q^T b - HQ^T Q_k y_k\|_2 \\
&= \left\| e_1 \|b\|_2 - \begin{pmatrix} H_k & H_{uk} \\ H_{ku} & H_u \end{pmatrix} \begin{pmatrix} y_k \\ 0 \end{pmatrix} \right\|_2 \\
&= \left\| e_1 \|b\|_2 - \begin{pmatrix} H_k \\ H_{ku} \end{pmatrix} y_k \right\|_2
\end{aligned}
\]

Here the $u$ subscript refers to the remaining parts of the full matrix $H$ that are not in $H_k$. $e_1$ is the first unit vector.

The final line is a linear least-squares problem for $y_k$, which can be solved using a QR factorization.

GMRES: solving the least-squares problem

Normally, performing a QR factorization would require $O(k^3)$ operations.

But here, we require the QR factorization of the $(k + 1) \times k$ Hessenberg matrix. We can perform the QR factorization by performing $k$ Givens rotations to rotate out the terms below the diagonal.

GMRES requires $O(kn)$ memory to store the vectors $Q_k$. A variant to minimize the growth of computation and storage is to stop after $k$ steps, and restart by solving $Ad = r_k = b - Ax_k$, after which the solution is given by $d + x_k$.

This is called GMRES($k$). It is still more expensive than conjugate gradient.
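To make the combination of Arnoldi steps and Givens rotations concrete, here is a compact Python sketch of unrestarted GMRES starting from $x_0 = 0$. It is an illustration under stated assumptions rather than a reference implementation: $A$ is real and stored densely, the names `gmres`, `matvec`, and `dot` are hypothetical, and the degenerate case where a rotation has zero radius (which would indicate a singular matrix) is not handled.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product (A is a list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gmres(A, b, m):
    """Up to m GMRES steps from x0 = 0; returns x and the residual norm estimate."""
    n = len(b)
    beta = math.sqrt(dot(b, b))
    Q = [[bi / beta for bi in b]]
    R = []              # R[k] is column k of the rotated (triangular) Hessenberg matrix
    cs, sn = [], []     # Givens cosines and sines
    g = [beta]          # rotated right-hand side, initially e_1 * ||b||
    for k in range(m):
        # Arnoldi step (modified Gram-Schmidt) producing column k of H.
        v = matvec(A, Q[k])
        h = []
        for j in range(k + 1):
            h.append(dot(Q[j], v))
            v = [vi - h[j] * qi for vi, qi in zip(v, Q[j])]
        sub = math.sqrt(dot(v, v))
        h.append(sub)
        # Apply the previous rotations to the new column.
        for j in range(k):
            h[j], h[j + 1] = (cs[j] * h[j] + sn[j] * h[j + 1],
                              -sn[j] * h[j] + cs[j] * h[j + 1])
        # New rotation that zeroes the subdiagonal entry.
        rho = math.hypot(h[k], h[k + 1])
        cs.append(h[k] / rho)
        sn.append(h[k + 1] / rho)
        h[k], h[k + 1] = rho, 0.0
        g.append(-sn[k] * g[k])
        g[k] = cs[k] * g[k]
        R.append(h)
        if abs(g[k + 1]) < 1e-12 or sub < 1e-14:
            break       # converged, or the Krylov subspace is invariant
        Q.append([vi / sub for vi in v])
    # Back substitution for the triangular least-squares system R y = g.
    k = len(R)
    y = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = g[i] - sum(R[j][i] * y[j] for j in range(i + 1, k))
        y[i] = s / R[i][i]
    x = [sum(y[j] * Q[j][i] for j in range(k)) for i in range(n)]
    return x, abs(g[k])

# Small nonsymmetric example (values chosen arbitrarily for illustration).
A = [[2.0, 1.0, 0.0, 0.0],
     [0.0, 3.0, 1.0, 0.0],
     [1.0, 0.0, 4.0, 1.0],
     [0.0, 2.0, 0.0, 5.0]]
b = [1.0, 2.0, 3.0, 4.0]
x, res = gmres(A, b, 4)
```

Because each new column needs only one additional rotation, the magnitude of the last entry of the rotated right-hand side gives the current residual norm without forming $x_k$. A restarted variant would call this routine repeatedly on the residual equation.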

The Fast Fourier Transform

Consider a one-dimensional Poisson problem

\[ -\frac{d^2 v}{dx^2} = f(x) \]

for a function $v(x)$ on $[0, 1]$ with boundary conditions $v(0) = v(1) = 0$.

Discretize with $N + 2$ evenly spaced points with grid spacing $h = 1/(N + 1)$, so that $x_i = hi$.

Second-order centered finite difference gives

\[ -v_{i-1} + 2v_i - v_{i+1} = h^2 f_i \]

for $i = 1, \ldots, N$.

Matrix formulation

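Writing all equations in a linear system yields

\[
T_N \begin{bmatrix} v_1 \\ \vdots \\ v_N \end{bmatrix}
=
\begin{bmatrix}
2 & -1 & & 0 \\
-1 & \ddots & \ddots & \\
 & \ddots & \ddots & -1 \\
0 & & -1 & 2
\end{bmatrix}
\begin{bmatrix} v_1 \\ \vdots \\ v_N \end{bmatrix}
= h^2 \begin{bmatrix} f_1 \\ \vdots \\ f_N \end{bmatrix}.
\]

The eigenvectors of $T_N$ are

\[ z_j(k) = \sqrt{\frac{2}{N + 1}} \sin \frac{jk\pi}{N + 1} \]

where $z_j(k)$ is the $k$th component of the $j$th eigenvector, with corresponding eigenvalues

\[ \lambda_j = 2 \left( 1 - \cos \frac{\pi j}{N + 1} \right). \]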
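These formulas are easy to check numerically. The short Python sketch below builds each eigenpair and applies $T_N$ as a stencil; the names `eigpair` and `apply_T` and the choice $N = 6$ are assumptions for illustration.

```python
import math

def eigpair(N, j):
    """Return (lambda_j, z_j) for T_N, with j = 1, ..., N."""
    lam = 2.0 * (1.0 - math.cos(math.pi * j / (N + 1)))
    scale = math.sqrt(2.0 / (N + 1))
    z = [scale * math.sin(j * k * math.pi / (N + 1)) for k in range(1, N + 1)]
    return lam, z

def apply_T(v):
    """Apply T_N = tridiag(-1, 2, -1) to a vector."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

N = 6
max_err = 0.0
for j in range(1, N + 1):
    lam, z = eigpair(N, j)
    Tz = apply_T(z)
    # Largest component of T z - lambda z over all eigenpairs.
    max_err = max(max_err, max(abs(t - lam * zi) for t, zi in zip(Tz, z)))
print("max |T z - lambda z| =", max_err)
```

The normalization factor $\sqrt{2/(N+1)}$ makes the eigenvectors orthonormal, which is what later allows $Z^T$ to serve as the inverse of $Z$.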

Poisson’s equation in two dimensions

Now consider the two-dimensional Poisson problem

\[ -\frac{\partial^2 v}{\partial x^2} - \frac{\partial^2 v}{\partial y^2} = f(x, y) \]

on the unit square $[0, 1]^2$ with $v = 0$ on the boundary. Discretize using an $(N + 2) \times (N + 2)$ grid with $x_j = jh$ and $y_k = kh$ with $h = 1/(N + 1)$. Write

\[ v_{jk} = v(jh, kh), \qquad f_{jk} = f(jh, kh). \]

Equations in the linear system are

\[ 4v_{jk} - v_{j-1,k} - v_{j+1,k} - v_{j,k-1} - v_{j,k+1} = h^2 f_{jk}. \]

Matrix formulation

Rewrite unknowns $v_{jk}$ as occupying an $N \times N$ matrix $V$. Then

\[ 2v_{jk} - v_{j-1,k} - v_{j+1,k} = (T_N V)_{jk}, \]

\[ 2v_{jk} - v_{j,k-1} - v_{j,k+1} = (V T_N)_{jk}. \]

Hence the problem can be written as

\[ T_N V + V T_N = h^2 F \]

where $F$ is an $N \times N$ matrix with entries $f_{jk}$.

Eigenvectors and eigenvalues for 2D problem

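Let $V = z_j z_k^T$. Then

\[
\begin{aligned}
T_N V + V T_N &= (T_N z_j) z_k^T + z_j (z_k^T T_N) \\
&= (\lambda_j z_j) z_k^T + z_j (z_k^T \lambda_k) \\
&= (\lambda_j + \lambda_k) z_j z_k^T \\
&= (\lambda_j + \lambda_k) V
\end{aligned}
\]

and hence $z_j z_k^T$ is an eigenvector of the 2D problem with eigenvalue $\lambda_j + \lambda_k$. We obtain a full set of $N^2$ eigenvectors for the problem.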

Solving the equation via an eigendecomposition

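Let $T_N = Z \Lambda Z^T$ be the eigendecomposition of $T_N$. Note that $Z^T Z = I$ since $Z$ is orthogonal. Then

\[ Z \Lambda Z^T V + V (Z \Lambda Z^T) = h^2 F \]

and

\[ Z^T Z \Lambda Z^T V Z + Z^T V (Z \Lambda Z^T) Z = h^2 Z^T F Z, \]

which becomes

\[ \Lambda V' + V' \Lambda = h^2 F' \]

where $V' = Z^T V Z$ and $F' = Z^T F Z$.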

Hence

\[ \lambda_j v'_{jk} + v'_{jk} \lambda_k = h^2 f'_{jk} \]

and so

\[ v'_{jk} = \frac{h^2 f'_{jk}}{\lambda_j + \lambda_k}. \]

Three steps to obtain a solution:

1. Compute $F' = Z^T F Z$ ($O(N^3)$ operations [4])

2. Find $v'_{jk} = h^2 f'_{jk}/(\lambda_j + \lambda_k)$ ($O(N^2)$ operations)

3. Compute $V = Z V' Z^T$ ($O(N^3)$ operations)

However, we will soon see that the Fast Fourier Transform allows steps 1 and 3 to be performed in $O(N^2 \log N)$ operations, turning this into a practical algorithm.

[4] Assuming a conventional matrix–matrix multiplication routine.
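The three steps can be sketched directly in Python using conventional $O(N^3)$ matrix products, i.e. before any FFT acceleration. The names `solve_poisson_2d` and `matmul` are hypothetical, and the sketch relies on the fact that the matrix $Z$ with entries $\sqrt{2/(N+1)} \sin(jk\pi/(N+1))$ is symmetric, so $Z^T = Z$.

```python
import math

def matmul(A, B):
    """Conventional dense matrix-matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def solve_poisson_2d(F, h):
    """Solve T_N V + V T_N = h^2 F via the eigendecomposition of T_N."""
    N = len(F)
    scale = math.sqrt(2.0 / (N + 1))
    # Z[k][j] is component k+1 of eigenvector j+1; Z is symmetric.
    Z = [[scale * math.sin((j + 1) * (k + 1) * math.pi / (N + 1))
          for j in range(N)] for k in range(N)]
    lam = [2.0 * (1.0 - math.cos(math.pi * (j + 1) / (N + 1)))
           for j in range(N)]
    Fp = matmul(matmul(Z, F), Z)                      # step 1: F' = Z^T F Z
    Vp = [[h * h * Fp[j][k] / (lam[j] + lam[k])       # step 2: divide by eigenvalue sums
           for k in range(N)] for j in range(N)]
    return matmul(matmul(Z, Vp), Z)                   # step 3: V = Z V' Z^T

# Manufactured test: pick V, form F from the five-point stencil, then recover V.
N = 4
h = 1.0 / (N + 1)
V_true = [[math.sin(j + 1.0) + 0.5 * k * k for k in range(N)] for j in range(N)]

def at(j, k):
    """Grid value with zero boundary conditions."""
    return V_true[j][k] if 0 <= j < N and 0 <= k < N else 0.0

F = [[(4.0 * at(j, k) - at(j - 1, k) - at(j + 1, k)
       - at(j, k - 1) - at(j, k + 1)) / (h * h)
      for k in range(N)] for j in range(N)]
V = solve_poisson_2d(F, h)
```

In a practical code the two `matmul` calls on either side would be replaced by fast sine transforms, which is where the $O(N^2 \log N)$ cost comes from.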

Alternative viewpoint: the Kronecker product

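Write $\operatorname{vec}(V)$ for the $N^2$-vector of unknowns obtained by stacking the entries of the $N \times N$ matrix $V$. Write

\[ T_{N \times N} = I \otimes T_N + T_N \otimes I = (Z \otimes Z)(I \otimes \Lambda + \Lambda \otimes I)(Z \otimes Z)^T. \]

Then

\[
\begin{aligned}
\operatorname{vec}(V) &= (T_{N \times N})^{-1} \operatorname{vec}(h^2 F) \\
&= \left( (Z \otimes Z)(I \otimes \Lambda + \Lambda \otimes I)(Z \otimes Z)^T \right)^{-1} \operatorname{vec}(h^2 F) \\
&= (Z \otimes Z)(I \otimes \Lambda + \Lambda \otimes I)^{-1} (Z^T \otimes Z^T) \operatorname{vec}(h^2 F).
\end{aligned}
\]

While this is less notationally elegant, it makes it clear that the solution procedure could be extended to arbitrary dimensions.

For example, in 3D we would consider $(Z \otimes Z \otimes Z)$, applying the matrix $Z$ to field values in each coordinate direction.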
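As a sketch of that extension, the three-dimensional operator and its factorization would read as follows, where the symbol $T_{N \times N \times N}$ is notation introduced here by analogy with the 2D case and does not appear in the original slides:

\[
\begin{aligned}
T_{N \times N \times N} &= I \otimes I \otimes T_N + I \otimes T_N \otimes I + T_N \otimes I \otimes I \\
&= (Z \otimes Z \otimes Z)(I \otimes I \otimes \Lambda + I \otimes \Lambda \otimes I + \Lambda \otimes I \otimes I)(Z \otimes Z \otimes Z)^T.
\end{aligned}
\]

The middle factor is diagonal with entries $\lambda_i + \lambda_j + \lambda_k$, so the solve again reduces to transforming the right-hand side, dividing by these sums, and transforming back.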