Iterative methods continued
Conjugate gradient on normal equations
● The conjugate gradient method (see last lecture) can only be applied to a symmetric and positive-definite matrix A
● The simplest approach towards more general systems is to transform the system to a symmetric and definite one and then apply the CGM:
– The CGNE method solves the system (AAT)y = b for y and then computes the solution x = ATy
– The CGNR method solves the system (ATA)x = b' for the solution x, where b' = ATb
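The CGNR idea can be sketched in a few lines: run plain conjugate gradients on the normal equations (ATA)x = ATb, applying A and AT separately so that ATA is never formed explicitly. This is a minimal pure-Python illustration; the helper names, test matrix, and tolerance are made up for this example.

```python
# Minimal CGNR sketch: CG applied to (A^T A)x = A^T b.
# Illustrative only; not from the lecture material.

def matvec(M, v):
    # dense matrix-vector product
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cgnr(A, b, tol=1e-12, maxiter=100):
    At = transpose(A)
    bp = matvec(At, b)                 # b' = A^T b
    x = [0.0] * len(bp)
    r = bp[:]                          # residual of the normal equations
    p = r[:]
    rs = dot(r, r)
    for _ in range(maxiter):
        if rs < tol:
            break
        Ap = matvec(At, matvec(A, p))  # (A^T A)p as two matrix-vector products
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Non-symmetric example system with exact solution x = (1, 2)
A = [[2.0, 1.0], [0.0, 1.0]]
b = [4.0, 2.0]
x = cgnr(A, b)
```

Note that each iteration costs two matrix-vector products (one with A, one with AT), and the conditioning of ATA is the square of that of A, which is why CGNR can converge slowly on ill-conditioned problems.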
Non-symmetric methods
● Generalized Minimal Residual Algorithm (GMRES)
– Construct iterates xk for which
∥b − Axk∥ = min_{x ∈ x0 + Kk(r0, A)} ∥b − Ax∥
– The minimization problem in iteration k is a least-squares problem of size (k+1) × k
– For the kth iterate all the previous k−1 iterates are needed
– GMRES restarted using only m previous iterates is called GMRES(m)
Non-symmetric methods
● Bi-conjugate gradient method (BiCG)
– Updates two sequences of residuals rk and r̃k and two sequences of search directions pk and p̃k:
rk = rk−1 − αk Apk        r̃k = r̃k−1 − αk ATp̃k
pk = rk−1 + βk−1 pk−1     p̃k = r̃k−1 + βk−1 p̃k−1
● Conjugate Gradient Squared (CGS) is a method closely related to BiCG
● Bi-conjugate gradient stabilized (BiCGSTAB) is a generalization of CGS
– Usually converges faster and more monotonically than either BiCG or CGS
● Quasi-Minimal Residual Method (QMR)
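The BiCG recurrences above can be sketched directly in pure Python, with the common choice r̃0 = r0 for the shadow residual. This is an illustrative sketch for a tiny non-symmetric system, not production code; names and the test matrix are invented, and no safeguard against breakdown (ρ or p̃ᵀAp becoming zero) is included.

```python
# BiCG sketch: two coupled residual/search-direction recurrences,
# one driven by A and one by A^T.  Illustrative only.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bicg(A, b, tol=1e-12, maxiter=100):
    At = transpose(A)
    x = [0.0] * len(b)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    rt = r[:]                      # shadow residual r~, here r~0 = r0
    p, pt = r[:], rt[:]            # search directions p and p~
    rho = dot(rt, r)
    for _ in range(maxiter):
        if dot(r, r) < tol:
            break
        Ap = matvec(A, p)
        alpha = rho / dot(pt, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]      # r_k update
        Atpt = matvec(At, pt)
        rt = [ri - alpha * qi for ri, qi in zip(rt, Atpt)]  # r~_k update
        rho_new = dot(rt, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]        # p_k update
        pt = [ri + beta * pi for ri, pi in zip(rt, pt)]     # p~_k update
        rho = rho_new
    return x

A = [[2.0, 1.0], [0.0, 1.0]]   # non-symmetric
b = [4.0, 2.0]                 # exact solution x = (1, 2)
x = bicg(A, b)
```

The need for AT in the r̃ update is exactly why the solver-selection chart later asks whether AT is available; CGS and BiCGSTAB were developed partly to avoid it.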
On the convergence of iterative methods
Iterative methods compared
Method        dot prod   ax+y   mxv   storage
Jacobi        –          –      1     3N
Gauss-Seidel  –          1      1     2N
CGM           2          3      1     6N
GMRES(m)      m+1        m+1    1     (m+5)N
QMR           2          12     2     16N
Preconditioning
● We can accelerate the convergence of the Krylov-type methods by preconditioning, that is, by multiplying the system from the left with the inverse of a suitable matrix M: M−1Ax = M−1b
– If M has enough similarities with A, the algorithms will converge faster for the preconditioned system
– The extreme cases are M = A and M = I; a useful M is something in between
● In practice, one still operates with the original coefficient matrix, but the algorithm is adjusted to include additional steps involving M−1
Preconditioning methods
● Diagonal preconditioning
– M is a diagonal matrix having the same diagonal elements as A
● Incomplete LU factorization
– Control the emergence of non-zero elements when forming the LU factorization and precondition with M = LU, where L and U are the incomplete factors
● Approximate inverse
– Form a matrix H ≈ A−1 by (pre)defining which elements are computed and then solving the minimization problem
min ∥I − AH∥F²
– Here ∥·∥F denotes the so-called Frobenius norm of a matrix
– Construction of H parallelizes very well
Preconditioned conjugate gradient method
● The preconditioned conjugate gradient method (PCG) is

  r = b - matmul(A,x)
  call solver(M,z,r)    ! solves the system Mz = r
  p = z
  k = 0
  do while (norm(r) > eps)
     Ap = matmul(A,p)
     res = dot_product(r,z)
     alpha = res/dot_product(p,Ap)
     x = x + alpha*p
     r = r - alpha*Ap
     call solver(M,z,r) ! solves the system Mz = r
     beta = dot_product(r,z)/res
     p = z + beta*p
     k = k + 1
  end do
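The PCG loop above can be exercised with the simplest choice of preconditioner, M = diag(A) (diagonal/Jacobi preconditioning), for which the solve Mz = r is just an elementwise division. The following pure-Python sketch mirrors the pseudocode; the test matrix and names are illustrative.

```python
# PCG sketch with a diagonal (Jacobi) preconditioner M = diag(A).
# Illustrative only; not the lecture's Fortran code.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, eps=1e-10, maxiter=100):
    x = [0.0] * len(b)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    z = [ri / A[i][i] for i, ri in enumerate(r)]   # solve Mz = r, M = diag(A)
    p = z[:]
    for k in range(maxiter):
        if dot(r, r) <= eps * eps:
            break
        Ap = matvec(A, p)
        res = dot(r, z)
        alpha = res / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        z = [ri / A[i][i] for i, ri in enumerate(r)]   # solve Mz = r again
        beta = dot(r, z) / res
        p = [zi + beta * pi for zi, pi in zip(z, p)]   # note: z, not r
    return x

A = [[4.0, 1.0], [1.0, 3.0]]   # symmetric positive-definite
b = [1.0, 2.0]
x = pcg(A, b)
```

Note the two places where the preconditioner enters: the solve Mz = r after each residual update, and the search-direction update p = z + beta*p, which uses the preconditioned residual z rather than r.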
Which iterative solver to choose?
● Is A symmetric?
– No → Is AT available?
   – Yes → QMR
   – No → GMRES
– Yes → Is A positive-definite?
   – Yes → PCG
   – No → CGNR
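The selection logic can be captured in a tiny helper function. This is a hypothetical sketch encoding one reading of the decision chart; the function and argument names are invented.

```python
# Hypothetical encoding of the solver-selection chart; illustrative only.
def choose_solver(symmetric, positive_definite=False, at_available=False):
    if symmetric:
        # symmetric: PCG if also positive-definite, otherwise fall back
        # to the normal-equations approach
        return "PCG" if positive_definite else "CGNR"
    # non-symmetric: QMR needs A^T; otherwise GMRES
    return "QMR" if at_available else "GMRES"
```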
Numerical libraries
● Libraries are useful because they
– help ensure program correctness
– hide complexities
– guarantee high-quality implementations
– minimize repetitive work
● Numerical libraries for linear systems
– Serial: LAPACK
– Parallel: ScaLAPACK, PETSc, Trilinos, ...
ScaLAPACK
● Parallel counterpart of the LAPACK linear algebra subroutine library
– basic operations on matrices and vectors
– systems of linear equations
– eigenvalue problems
– singular value problems, etc.
● Also contains a parallel subset of the BLAS routines, PBLAS
● Portable, can be used from Fortran and C programs
● Open source, freely available for several platforms
● www.netlib.org/scalapack
ScaLAPACK
● Communication layer BLACS (Basic Linear Algebra Communication Subsystem)
– Message-passing interface for linear algebra
– BLACS operates with 2D arrays on 2D process grids
● Distributed matrices
● Parallel BLAS
ScaLAPACK
[Figure: ScaLAPACK software hierarchy. Global layer: ScaLAPACK, PBLAS. Local layer: LAPACK, BLAS, BLACS. Platform specific: MPI/PVM/...]
Process grid and context
● The p processes are organized into a 2D process grid with prow × pcol = p
● A process can be referenced by its coordinates on the process grid
● Each process grid is enclosed within a context
● A context is the equivalent of an MPI communicator
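The mapping between a linear process rank and its grid coordinates can be sketched as follows, assuming a row-major ordering of the grid (BLACS also supports column-major grids; the function name is invented for this illustration).

```python
# Sketch: linear rank -> 2D grid coordinates on a prow x pcol grid,
# assuming row-major ordering, i.e. rank = row * pcol + col.
def grid_coords(rank, prow, pcol):
    return rank // pcol, rank % pcol

# e.g. on a 2 x 3 grid, rank 5 sits at row 1, column 2
row, col = grid_coords(5, 2, 3)
```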
Data distribution
● ScaLAPACK uses the block-cyclic distribution
– Example: distributing an 8×8 matrix to 6 processes
● It is the user's responsibility to distribute the matrices according to the chosen block sizes and process grid
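The core of the 2D block-cyclic distribution is a simple index formula: global index i belongs to block i // nb, and blocks are dealt out cyclically over the grid dimension. A minimal sketch (names and the 8×8 example parameters are illustrative):

```python
# Which process grid coordinate owns global index i, for block size nb
# dealt cyclically over np processes in that grid dimension.
def owner(i, nb, np):
    return (i // nb) % np

# Rows/columns of an 8x8 matrix, block size 2, on a 2 x 3 process grid:
row_owners = [owner(i, 2, 2) for i in range(8)]  # -> [0, 0, 1, 1, 0, 0, 1, 1]
col_owners = [owner(j, 2, 3) for j in range(8)]  # -> [0, 0, 1, 1, 2, 2, 0, 0]
```

Applying the row and column formulas together gives the (row, col) process coordinate that stores each matrix element, which is exactly what the array descriptors encode.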
Using ScaLAPACK
● Once the matrices are distributed and array descriptors are created, ScaLAPACK calls are close to LAPACK and BLAS equivalents
● General translation from LAPACK/BLAS:
– insert P in front of the routine name
– replace leading dimensions with array descriptors
– insert global indices as separate arguments
● For example
– PDGEMM: general matrix-matrix multiplication C = αAB + βC
– PDGESV: general solver for systems of linear equations
– PSSYEV: eigenvalues and eigenvectors of a symmetric matrix
See matmul-scalapack.f90
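The first translation rule is purely mechanical and can be illustrated at the string level. This is only a naming sketch, not a real API; the descriptor and index changes from the other two rules still have to be made by hand in the actual call.

```python
# Toy illustration of the "insert P in front of the routine name" rule.
def scalapack_name(lapack_or_blas_name):
    return "P" + lapack_or_blas_name

examples = {name: scalapack_name(name) for name in ("DGEMM", "DGESV", "SSYEV")}
# DGEMM -> PDGEMM, DGESV -> PDGESV, SSYEV -> PSSYEV
```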
ScaLAPACK summary
● ScaLAPACK provides routines for parallel linear algebra
● Block-cyclic 2D data distribution
● ScaLAPACK usage:
1) create a BLACS process grid
2) distribute matrices
3) create array descriptors
4) call the ScaLAPACK routine