Iterative methods continued
Conjugate gradient on normal equations
● The conjugate gradient method (see last lecture) can only be applied to a symmetric and positive-definite matrix A
● The simplest approach towards more general systems is to transform the system to a symmetric and definite one and then apply the CGM:
– The CGNE method solves the system (AAT)y = b for y and then computes the solution x = ATy
– The CGNR method solves the system (ATA)x = b' for the solution x, where b' = ATb
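The CGNR idea can be sketched in a few lines: run plain conjugate gradients on the normal equations (ATA)x = ATb, applying A and AT separately so that ATA is never formed explicitly. This is a minimal pure-Python illustration; the helper names, test matrix, and tolerance are made up for this example.

```python
# Minimal CGNR sketch: CG applied to (A^T A)x = A^T b.
# Illustrative only; not from the lecture material.

def matvec(M, v):
    # dense matrix-vector product
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cgnr(A, b, tol=1e-12, maxiter=100):
    At = transpose(A)
    bp = matvec(At, b)                 # b' = A^T b
    x = [0.0] * len(bp)
    r = bp[:]                          # residual of the normal equations
    p = r[:]
    rs = dot(r, r)
    for _ in range(maxiter):
        if rs < tol:
            break
        Ap = matvec(At, matvec(A, p))  # (A^T A)p as two matrix-vector products
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Non-symmetric example system with exact solution x = (1, 2)
A = [[2.0, 1.0], [0.0, 1.0]]
b = [4.0, 2.0]
x = cgnr(A, b)
```

Note that each iteration costs two matrix-vector products (one with A, one with AT), and the conditioning of ATA is the square of that of A, which is why CGNR can converge slowly on ill-conditioned problems.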
Non-symmetric methods
● Generalized Minimal Residual Algorithm (GMRES)
– Construct iterates xk for which
∥b − Axk∥ = min_{x ∈ x0 + Kk(r0, A)} ∥b − Ax∥
– The minimization problem in iteration k is a least-squares problem of size (k+1) × k
– For the kth iterate all the previous k−1 iterates are needed
– GMRES restarted using only m previous iterates is called GMRES(m)
Non-symmetric methods
● Bi-conjugate gradient method (BiCG)
– Updates two sequences of residuals rk and r̃k and two sequences of search directions pk and p̃k:
rk = rk−1 − αk Apk        r̃k = r̃k−1 − αk ATp̃k
pk = rk−1 + βk−1 pk−1     p̃k = r̃k−1 + βk−1 p̃k−1
● Conjugate Gradient Squared (CGS) is a method closely related to BiCG
● Bi-conjugate gradient stabilized (BiCGSTAB) is a generalization of CGS
– Usually converges faster and more monotonically than either BiCG or CGS
● Quasi-Minimal Residual Method (QMR)
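The BiCG recurrences above can be sketched directly in pure Python, with the common choice r̃0 = r0 for the shadow residual. This is an illustrative sketch for a tiny non-symmetric system, not production code; names and the test matrix are invented, and no safeguard against breakdown (ρ or p̃ᵀAp becoming zero) is included.

```python
# BiCG sketch: two coupled residual/search-direction recurrences,
# one driven by A and one by A^T.  Illustrative only.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bicg(A, b, tol=1e-12, maxiter=100):
    At = transpose(A)
    x = [0.0] * len(b)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    rt = r[:]                      # shadow residual r~, here r~0 = r0
    p, pt = r[:], rt[:]            # search directions p and p~
    rho = dot(rt, r)
    for _ in range(maxiter):
        if dot(r, r) < tol:
            break
        Ap = matvec(A, p)
        alpha = rho / dot(pt, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]      # r_k update
        Atpt = matvec(At, pt)
        rt = [ri - alpha * qi for ri, qi in zip(rt, Atpt)]  # r~_k update
        rho_new = dot(rt, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]        # p_k update
        pt = [ri + beta * pi for ri, pi in zip(rt, pt)]     # p~_k update
        rho = rho_new
    return x

A = [[2.0, 1.0], [0.0, 1.0]]   # non-symmetric
b = [4.0, 2.0]                 # exact solution x = (1, 2)
x = bicg(A, b)
```

The need for AT in the r̃ update is exactly why the solver-selection chart later asks whether AT is available; CGS and BiCGSTAB were developed partly to avoid it.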
On the convergence of iterative methods
Iterative methods compared
Method        dot prod   ax+y   mxv   storage
Jacobi        –          –      1     3N
Gauss-Seidel  –          1      1     2N
CGM           2          3      1     6N
GMRES(m)      m+1        m+1    1     (m+5)N
QMR           2          12     2     16N
Preconditioning
● We can accelerate the convergence of the Krylov-type methods by preconditioning, that is, by multiplying the system from the left with the inverse of a suitable matrix M: M−1Ax = M−1b
– If M has enough similarities with A, the algorithms will converge faster for the preconditioned system
– The extreme cases are M = A and M = I; a useful M is something in between
● In practice, one still operates with the original coefficient matrix, but the algorithm is adjusted to include additional steps involving M−1
Preconditioning methods
● Diagonal preconditioning
– M is a diagonal matrix having the same diagonal elements as A
● Incomplete LU factorization
– Control the emergence of non-zero elements when forming the LU factorization and precondition with M = LU, where L and U are the incomplete factors
● Approximate inverse
– Form a matrix H ≈ A−1 by (pre)defining which elements are computed and then solving the minimization problem
min ∥I − AH∥F²
– Here ∥·∥F denotes the so-called Frobenius norm of a matrix
– Construction of H parallelizes very well
Preconditioned conjugate gradient method
● The preconditioned conjugate gradient method (PCG) is

  r = b - matmul(A,x)
  call solver(M,z,r)    ! solves the system Mz = r
  p = z
  k = 0
  do while (norm(r) > eps)
     Ap = matmul(A,p)
     res = dot_product(r,z)
     alpha = res/dot_product(p,Ap)
     x = x + alpha*p
     r = r - alpha*Ap
     call solver(M,z,r) ! solves the system Mz = r
     beta = dot_product(r,z)/res
     p = z + beta*p
     k = k + 1
  end do
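The PCG loop above can be exercised with the simplest choice of preconditioner, M = diag(A) (diagonal/Jacobi preconditioning), for which the solve Mz = r is just an elementwise division. The following pure-Python sketch mirrors the pseudocode; the test matrix and names are illustrative.

```python
# PCG sketch with a diagonal (Jacobi) preconditioner M = diag(A).
# Illustrative only; not the lecture's Fortran code.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, eps=1e-10, maxiter=100):
    x = [0.0] * len(b)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    z = [ri / A[i][i] for i, ri in enumerate(r)]   # solve Mz = r, M = diag(A)
    p = z[:]
    for k in range(maxiter):
        if dot(r, r) <= eps * eps:
            break
        Ap = matvec(A, p)
        res = dot(r, z)
        alpha = res / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        z = [ri / A[i][i] for i, ri in enumerate(r)]   # solve Mz = r again
        beta = dot(r, z) / res
        p = [zi + beta * pi for zi, pi in zip(z, p)]   # note: z, not r
    return x

A = [[4.0, 1.0], [1.0, 3.0]]   # symmetric positive-definite
b = [1.0, 2.0]
x = pcg(A, b)
```

Note the two places where the preconditioner enters: the solve Mz = r after each residual update, and the search-direction update p = z + beta*p, which uses the preconditioned residual z rather than r.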
Which iterative solver to choose?
● Is A symmetric?
– No → Is AT available?
   – Yes → QMR
   – No → GMRES
– Yes → Is A positive-definite?
   – Yes → PCG
   – No → CGNR
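The selection logic can be captured in a tiny helper function. This is a hypothetical sketch encoding one reading of the decision chart; the function and argument names are invented.

```python
# Hypothetical encoding of the solver-selection chart; illustrative only.
def choose_solver(symmetric, positive_definite=False, at_available=False):
    if symmetric:
        # symmetric: PCG if also positive-definite, otherwise fall back
        # to the normal-equations approach
        return "PCG" if positive_definite else "CGNR"
    # non-symmetric: QMR needs A^T; otherwise GMRES
    return "QMR" if at_available else "GMRES"
```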
Numerical libraries
● Libraries are useful because they
– help ensure program correctness
– hide complexities
– guarantee high-quality implementations
– minimize repetitive work
● Numerical libraries for linear systems
– Serial: LAPACK
– Parallel: ScaLAPACK, PETSc, Trilinos, ...
ScaLAPACK
● Parallel counterpart of the LAPACK linear algebra subroutine library
– basic operations on matrices and vectors
– systems of linear equations
– eigenvalue problems
– singular value problems, etc.
● Also contains a parallel subset of the BLAS routines, PBLAS
● Portable, can be used from Fortran and C programs
● Open source, freely available for several platforms
● www.netlib.org/scalapack
ScaLAPACK
● Communication layer BLACS (Basic Linear Algebra Communication Subsystem)
– Message-passing interface for linear algebra
– BLACS operates with 2D arrays on 2D process grids
● Distributed matrices
● Parallel BLAS
ScaLAPACK
[Figure: ScaLAPACK software hierarchy. Global layer: ScaLAPACK, PBLAS. Local layer: LAPACK, BLAS, BLACS. Platform specific: MPI/PVM/...]
Process grid and context
● The p processes are organized into a 2D process grid with prow × pcol = p
● A process can be referenced by its coordinates on the process grid
● Each process grid is enclosed within a context
● A context is the equivalent of an MPI communicator
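The mapping between a linear process rank and its grid coordinates can be sketched as follows, assuming a row-major ordering of the grid (BLACS also supports column-major grids; the function name is invented for this illustration).

```python
# Sketch: linear rank -> 2D grid coordinates on a prow x pcol grid,
# assuming row-major ordering, i.e. rank = row * pcol + col.
def grid_coords(rank, prow, pcol):
    return rank // pcol, rank % pcol

# e.g. on a 2 x 3 grid, rank 5 sits at row 1, column 2
row, col = grid_coords(5, 2, 3)
```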
Data distribution
● ScaLAPACK uses the block-cyclic distribution
– Example: distributing an 8×8 matrix to 6 processes
● It is the user's responsibility to distribute the matrices according to the chosen block sizes and process grid
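The core of the 2D block-cyclic distribution is a simple index formula: global index i belongs to block i // nb, and blocks are dealt out cyclically over the grid dimension. A minimal sketch (names and the 8×8 example parameters are illustrative):

```python
# Which process grid coordinate owns global index i, for block size nb
# dealt cyclically over np processes in that grid dimension.
def owner(i, nb, np):
    return (i // nb) % np

# Rows/columns of an 8x8 matrix, block size 2, on a 2 x 3 process grid:
row_owners = [owner(i, 2, 2) for i in range(8)]  # -> [0, 0, 1, 1, 0, 0, 1, 1]
col_owners = [owner(j, 2, 3) for j in range(8)]  # -> [0, 0, 1, 1, 2, 2, 0, 0]
```

Applying the row and column formulas together gives the (row, col) process coordinate that stores each matrix element, which is exactly what the array descriptors encode.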
Using ScaLAPACK
● Once the matrices are distributed and array descriptors are created, ScaLAPACK calls are close to LAPACK and BLAS equivalents
● General translation from LAPACK/BLAS:
– insert P in front of the routine name
– replace leading dimensions with array descriptors
– insert global indices as separate arguments
● For example
– PDGEMM: general matrix-matrix multiplication C = αAB + βC
– PDGESV: general solver for systems of linear equations
– PSSYEV: eigenvalues and eigenvectors of a symmetric matrix
See matmul-scalapack.f90
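The first translation rule is purely mechanical and can be illustrated at the string level. This is only a naming sketch, not a real API; the descriptor and index changes from the other two rules still have to be made by hand in the actual call.

```python
# Toy illustration of the "insert P in front of the routine name" rule.
def scalapack_name(lapack_or_blas_name):
    return "P" + lapack_or_blas_name

examples = {name: scalapack_name(name) for name in ("DGEMM", "DGESV", "SSYEV")}
# DGEMM -> PDGEMM, DGESV -> PDGESV, SSYEV -> PSSYEV
```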
ScaLAPACK summary
● ScaLAPACK provides routines for parallel linear algebra
● Block-cyclic 2D data distribution
● ScaLAPACK usage:
1) create a BLACS process grid
2) distribute matrices
3) create array descriptors
4) call the ScaLAPACK routine