32
Implementation of Block Algebraic Iterative Reconstruction Methods Per Christian Hansen joint work with Hans Henrik B. Sørensen

Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

Implementation of Block Algebraic Iterative Reconstruction Methods

Per Christian Hansen joint work with

Hans Henrik B. Sørensen

Page 2: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 2 P. C. Hansen – Implementation of Block AIR Methods

About Me …

• Interests: inverse problems, tomography, regularization algorithms, matrix compu-tations, image deblurring, signal processing, Matlab software, …

• Head of the project High-Definition Tomography, funded by an ERC Advanced Research Grant.

• Author of several Matlab software packages. • Author of four books.

Forward problem

Page 3: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 3 P. C. Hansen – Implementation of Block AIR Methods

Outline of Talk

We consider reconstruction problems in computed tomography: reconstruct a 2D or 3D object from its projections, i.e., (noisy) measurements of the damping of rays that go through the domain.

We obtain a very large system of equations A x = b with a very sparse matrix A, which must be solved by an iterative method.

1. Classical iterative reconstruction techniques

2. Performance considerations

3. An overview of block methods

4. How to compare the block methods

5. Numerical results

Page 4: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 4 P. C. Hansen – Implementation of Block AIR Methods

Analogy: the “Sudoku” Problem – 数独

3

7

4 6

This matrix in rank deficient and there are infinitely many solutions.

0

BB@

1 0 1 00 1 0 11 1 0 00 0 1 1

1

CCA

0

BB@

x1

x2

x3

x4

1

CCA =

0

BB@

3746

1

CCA

3

7

4 6

0

BBBB@

1 0 1 00 1 0 11 1 0 00 0 1 11 0 0 1

1

CCCCA

0

BB@

x1

x2

x3

x4

1

CCA =

0

BBBB@

37465

1

CCCCA

5

Unique solution!

Page 5: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 5 P. C. Hansen – Implementation of Block AIR Methods

3D Tomography Test Problem

• Parallel X-rays are sent through the object.

• The object is discretized in an array of N×N×N voxels.

• Projections are recorded on detectors with p×p pixels.

• The directions of the rays are ”evenly” distributed over the half-sphere using Lebedev quadrature points.

Page 6: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 6 P. C. Hansen – Implementation of Block AIR Methods

Setting Up the Algebraic Model

Damping of the i-th X-ray through domain (Beer’s law):

bi =Rrayi

Â(s) d`; Â(s) = attenuation coef.

Discretization leads to a large, sparse, ill-conditioned system:

A x = b

Geometry

Image

Projections Noise

¹b = A ¹x

b = ¹b + e

2D example:

Page 7: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 7 P. C. Hansen – Implementation of Block AIR Methods

A Note About the 3D Algebraic Model

Each ray corresponds to a particular row of the matrix A.

Each ray intersects only a very small number of voxels.

Hence, many rows of A are structurally orthogonal.

Page 8: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 8 P. C. Hansen – Implementation of Block AIR Methods

Noise Sensitivity

Assume that A has full rank, and consider the two problems:

A ¹x = ¹b (no noise) A x ¼ b = ¹b + e

The last term dominates because A is very ill conditioned! We must use regularization to compute an approximate solution that is less senstive to the noise.

xnaive = A¡1b = xexact + A¡1e; kA¡1ek À kxexactkLet us define the ”naive” solution:

Page 9: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 9 P. C. Hansen – Implementation of Block AIR Methods

Some Large-Scale Reconstruction Algorithms

Bayesian Methods My knowledge here is very limited …

Transform-Based Methods The forward problem is formulated as a certain transform

→ find a stable way to compute the inverse transform. Examples: the inverse Radon transform for tomography

→ filtered back-projection, FDK.

Algebraic Iterative Methods The forward problem is formulated as a discretized problem

→ solve A x = b using an iterative method. Examples: Cimmino, Kaczmarz, CGLS.

Page 10: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 10 P. C. Hansen – Implementation of Block AIR Methods

ART (Algebraic Reconstruction Technique)

Relaxation parameter

Parallelism at the level of an inner product

Algorithm: ART (Classical Kaczmarz)

Let x0 = 0Repeat the above for k = 1; 2; 3; : : :

Algorithm: xk à ART-sweep (¸; A; b; xk¡1)

xk;0 = xk¡1

xk;i = P

µxk;i¡1 + ¸

bi ¡ aTi xk;i¡1

kaik22ai

¶; i = 1; : : : ; m

xk = xk;m.

Page 11: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 11 P. C. Hansen – Implementation of Block AIR Methods

SIRT (Simultaneous Iter. Reconstr. Tech.)

matrix that defines the specific method

Algorithm: SIRT

Let x0 = 0For k = 1; 2; 3; : : :

xk =¡

xk¡1 + ¸ ATM (b¡A xk¡1)¢

Relaxation parameter

Parallelism at the level of a matrix-vector product

No evidence that x0 6= 0 gives better solutions or smaller computing time.

Cimmino:M = 1

mdiag(1=kaik22).

Page 12: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 12 P. C. Hansen – Implementation of Block AIR Methods

Performance

ART and SIRT (Cimmino) for very small λ = 0.01 and for ”optimal” λ.

Slow convergence.

ART can converge a lot faster than SIRT.

kxk¡

¹xk 2

=k¹xk 2

Page 13: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 13 P. C. Hansen – Implementation of Block AIR Methods

Performance

Iterations k

Rela

tive

erro

r

Test Problem: • Parallel-beam tomography. • 13 projections. • 3D Shepp-Logan phantom, Schabel (2006).

kxk ¡ ¹xk2=k¹xk2

ART

Page 14: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 14 P. C. Hansen – Implementation of Block AIR Methods

Performance 1 core

Intel Xeon E5620 2.40 GHz (1 core)

Same number of flops! The difference is due to the cache: ART uses row ai twice once it is loaded.

ART SIRT

Page 15: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 15 P. C. Hansen – Implementation of Block AIR Methods

Intel Xeon E5620 2.40 GHz (4 cores)

Performance 4 cores

ART SIRT

Four cores are better suited for block matrix-vector operations.

Page 16: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 16 P. C. Hansen – Implementation of Block AIR Methods

Our Dilemma

ART has faster convergence than SIRT – i.e., more reduction of the error per iteration.

SIRT can better take advantage of multi-core architecture than ART.

How to achieve the ”best of both worlds?” → Block methods!

Page 17: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 17 P. C. Hansen – Implementation of Block AIR Methods

Block Methods

In each iteration we can: • Treat the blocks sequentially or simultaneously (i.e., in parallel). • Treat each block by an iterative or by a direct computation.

We obtain several methods: • Sequential processing + ART on each block → classical ART • Sequential processing + SIRT on each block • Sequential processing + pseudoinverse of Aℓ • Parallel processing + ART on each block • Parallel processing + SIRT on each block → classical SIRT • Parallel processing + pseudoinverse of Aℓ

Page 18: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 18 P. C. Hansen – Implementation of Block AIR Methods

The convergence depends on the number of blocks p: If p = 1, we recover SIRT If p = m, we recover ART

Block-Sequential Methods

Eggermont, Herman, Lent (1981) Elfving (1980)

Parallelism given by the tradeoff:

Algorithm: Block-Sequential

Initialization: choose an arbitrary x0 2 Rn

Iteration: for k = 0; 1; 2; : : :

xk;0 = xk¡1

xk;` = P¡

xk;`¡1 + ¸ AT` M` (b` ¡A` xk;`¡1)

¢; ` = 1; 2; : : : ; p

xk = xk¡1;p

M` = (A`AT` )y ) AT

` M` = Ay`Variant by Elfving (1980):

Page 19: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 19 P. C. Hansen – Implementation of Block AIR Methods

The convergence depends on p: If p = 1, we recover ART

If p = m, we recover SIRT

Block-Parallel Methods

Algorithm: Block-Parallel

Initialization: choose an arbitrary x0 2 Rn

Iteration: for k = 0; 1; 2; : : :

for ` = 1; : : : ; p execute in parallel

xk;` = ART-sweep(¸; A`; b`; xk¡1)

xk = 1=pPp

`=1 xk;`.

Variants: Elfving (1980) – inner step:

CARP algorithm, Gordon & Gordon (2005): xk;` = P

¡xk¡1;` + ¸ Ay

`(b` ¡ A` xk¡1;`)¢

xk =Pp

`=1 D` xk;`; D` depends on sparsity structure

Censor, Elfving, Herman (2001)

Parallelism is given by:

Page 20: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 20 P. C. Hansen – Implementation of Block AIR Methods

Block Sequential

4 blocks

The ”building blocks” are SIRT iterations, suited for multicore. The blocks are treated sequentailly! Hence the error reduc-tion per iteration is close to that of ART.

ART SIRT Block-Seq.

Intel Xeon E5620

2.40 GHz (4 cores)

Page 21: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 21 P. C. Hansen – Implementation of Block AIR Methods

Block Parallel

ART SIRT Block Seq.

Block Par.

Intel Xeon E5620

2.40 GHz (4 cores)

Page 22: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 22 P. C. Hansen – Implementation of Block AIR Methods

Fair Comparison of the Methods …

It is quite easy to make an unfair comparison between the methods: choose a bad λ for the method you don’t like.

To make a fair comparison between the methods, we choose the value of λ that is (near) optimal for each method!

What do we mean by ”(near) optimal”? – Choose a test problem with a known solution. – Find the parameter λ that gives fastest semi-convergence.

The relaxation parameter λ makes comparisons difficult …

xnaive = A¡1b = xexact + A¡1e; kA¡1ek À kxexactk

Recall that we do not want the ”naive” solution:

Page 23: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 23 P. C. Hansen – Implementation of Block AIR Methods

Illustration of Semi-Convergence

A¡1b

Page 24: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 24 P. C. Hansen – Implementation of Block AIR Methods

Semi-convergence and relaxation parameter λ

Optimal λ reaches min. error in fewest iterations

Training for Optimal λ

Optimal λ

Iteration k

Page 25: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 25 P. C. Hansen – Implementation of Block AIR Methods

Convergence Results I

Only convergence is considered here, the number of cores is irrelevant.

Page 26: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 26 P. C. Hansen – Implementation of Block AIR Methods

Convergence Results II

Only convergence is considered here, the number of cores is irrelevant.

Page 27: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 27 P. C. Hansen – Implementation of Block AIR Methods

Blocks of Structurally Orthogonal Rows

When a block has structurally orthogonal rows then ART, SIRT and ”pinv” are equivalent. It is worthwhile to utilize this!

PART algorithm, Gordon (2006)

In 3D tomography, it is easy to find sets of rows that are orthogonal due to the structure of zeros/nonzeros.

Thus, a re-ordering of the rows can produce blocks with mutually orthogonal rows (= the traces of rays are non-overlapping).

Page 28: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 28 P. C. Hansen – Implementation of Block AIR Methods

Single-Core Results

Intel Core i7-3820 3.60 GHz (1 core)

Block-Seq: block-sequential-SIRT Block-Par: block-parallel-ART (Censor, Elfving, Herman) CARP: block-parallel-ART (Gordon, Gordon) PART – utilizes struct. orthog. ART (1 thread)

Page 29: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 29 P. C. Hansen – Implementation of Block AIR Methods

Multi-Core Performance – 4 cores

Block-seq-SIRT Block-par-ART (Censor, Elfving, Herman) Block-par-ART (Gordon, Gordon) PART – utilizes struct. orthog. ART (1 thread)

Page 30: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 30 P. C. Hansen – Implementation of Block AIR Methods

Multi-core Results – 4 Cores

Intel Core i7-3820 3.60 GHz (4 cores)

The advantage of PART over standard ART is due to the improved use of multicore architecture.

Block-Seq: block-sequential-SIRT Block-Par: block-parallel-ART (Censor, Elfving, Herman) CARP: block-parallel-ART (Gordon, Gordon) PART – utilizes struct. orthog. ART (1 thread)

Page 31: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 31 P. C. Hansen – Implementation of Block AIR Methods

Multi-core Results – 32 Cores

4 socket AMD Opteron 6282 SE 2.60 GHz (32 cores)

With many cores, PART is a clear winner. Block-Seq: block-sequential-SIRT

Block-Par: block-parallel-ART (Censor, Elfving, Herman) CARP: block-parallel-ART (Gordon, Gordon) PART – utilizes struct. orthog. ART (1 thread)

Page 32: Implementation of Block Algebraic Iterative Reconstruction ...pcha/HDtomo/SC/BlockAIR.pdf8 P. C. Hansen – Implementation of Block AIR Methods May 2014 Noise Sensitivity . Assume

May 2014 32 P. C. Hansen – Implementation of Block AIR Methods

Conclusions

Block algebraic iterative reconstruction techniques are able to achieve initial convergence rate similar to that of ART,

and with the smaller computing time of SIRT, because we can utilize the multicore architecture.

With a suitable row ordering and choice of blocks, we can produce blocks of structurally orthogonal rows.

PART has identical convergence to ART and very good scaling properties in practice.

Next step: target GPUs (up to 2688 cores).