75
Iterative Reconstruction in CT and MRI Jeffrey A. Fessler EECS Dept., BME Dept., Dept. of Radiology University of Michigan web.eecs.umich.edu/ ˜ fessler Fully 3D Conference 02 June 2015 1 / 74

Iterative Reconstruction in CT and MRIweb.eecs.umich.edu/~fessler/papers/lists/files/talk/15/f3d-keynote.pdf · Iterative Reconstruction in CT and MRI (and a bit of PET and SPECT)

Embed Size (px)

Citation preview

  • Iterative Reconstruction in CT and MRI

    (and a bit of PET and SPECT)

    Jeffrey A. Fessler

    EECS Dept., BME Dept., Dept. of RadiologyUniversity of Michigan

    web.eecs.umich.edu/fessler

    Fully 3D Conference

    02 June 2015

    1 / 74

    web.eecs.umich.edu/~fessler

  • Iterative Reconstruction in CT and MRI

    (and a bit of PET and SPECT)

    Jeffrey A. Fessler

    EECS Dept., BME Dept., Dept. of RadiologyUniversity of Michigan

    web.eecs.umich.edu/fessler

    Fully 3D Conference

    02 June 2015

    1 / 74

    web.eecs.umich.edu/~fessler

  • Disclosure

    Research support from GE Healthcare Supported in part by NIH grants P01 CA-87634, U01 EB018753 Equipment support from Intel Corporation

    Acknowledgment:many collaborators and many students and post-docs

    2 / 74

  • Outline

    WhatCTMRI

    WhyWhy CT iterativeWhy MRI iterative

    HowOptimization transferSeparable quadratic surrogatesMomentumOrdered subsets

    Parallelization

    3 / 74

  • Outline

    WhatCTMRI

    WhyWhy CT iterativeWhy MRI iterative

    HowOptimization transferSeparable quadratic surrogatesMomentumOrdered subsets

    Parallelization

    4 / 74

  • X-ray CT scans

    CT image reconstruction problem:Determine unknown attenuation map x given sinogram data yusing system matrix A.cf. SPECT with orbiting gamma camera

    5 / 74

  • MRI scans

    (No moving partsto animate)

    MR image reconstruction problem:Determine unknown magnetization image x given k-space data yusing system matrix ADefer motion for now...

    6 / 74

  • Inverse problems

    Unknownobject

    x Imagingsystem

    Datay Recon

    Imagex

    How to reconstruct object x from data y?Non-iterative methods: analytical / direct Filtered back-projection (FBP) for CT (textbook: Radon transform) Inverse FFT for MRI (textbook: FFT)

    idealized description of the system (textbook model) geometry / sampling disregards noise and simplifies physics

    typically fastIterative methods: model-based / statistical based on reasonably accurate models for physics and statistics usually much slower

    7 / 74

  • Statistical image reconstruction: CT example

    A picture is worth 1000 words (and perhaps several 1000 seconds of computation?)

    Thin-slice FBP ASIR (denoise) StatisticalSeconds A bit longer Much longer

    (Same sinogram, so all at same dose)

    8 / 74

  • Outline

    WhatCTMRI

    WhyWhy CT iterativeWhy MRI iterative

    HowOptimization transferSeparable quadratic surrogatesMomentumOrdered subsets

    Parallelization

    9 / 74

  • Why statistical/iterative methods for CT?

    Accurate physics models X-ray spectrum, beam-hardening, scatter, ...

    = reduced artifacts? quantitative CT? X-ray detector spatial response, focal spot size, ...

    = improved spatial resolution? detector spectral response (e.g., photon-counting detectors)

    = improved contrast between distinct material types?

    Nonstandard geometries transaxial truncation (wide patients) long-object problem in helical CT irregular sampling in next-generation geometries coarse angular sampling in image-guidance applications limited angular range (tomosynthesis) missing data, e.g., bad pixels in flat-panel systems

    10 / 74

  • Why iterative for CT ... continued

    Appropriate models of (data dependent) measurement statistics weighting reduces influence of photon-starved rays (cf. FBP)

    = reducing image noise or X-ray dose

    Object constraints / priors nonnegativity object support piecewise smoothness object sparsity (e.g., angiography) sparsity in some basis motion models dynamic models ...

    Henry Gray, Anatomy ofthe Human Body, 1918,Fig. 413.

    Constraints may help reduce image artifacts or noise or dose.

    Similar motivations/benefits in PET and SPECT.

    11 / 74

  • Disadvantages of iterative methods for CT?

    I Computation timeI Must reconstruct entire FOVI Complexity of models and softwareI Algorithm nonlinearities Difficult to analyze resolution/noise properties (cf. FBP) Tuning parameters Challenging to characterize performance / assess IQ

    12 / 74

  • Sub-mSv example

    3D helical X-ray CT scan of abdomen/pelvis:100 kVp, 25-38 mA, 0.4 second rotation, 0.625 mm slice, 0.6 mSv.

    FBP ASIR Statistical

    13 / 74

  • MBIR example: Chest CT

    Helical chest CT study with dose = 0.09 mSv.Typical CXR effective dose is about 0.06 mSv.(Health Physics Soc.: http://www.hps.org/publicinformation/ate/q2372.html)

    FBP MBIRVeo (MBIR) images courtesy of Jiang Hsieh, GE Healthcare

    14 / 74

    http://www.hps.org/publicinformation/ate/q2372.html

  • History: Statistical reconstruction for X-ray CT

    Iterative method for X-ray CT (Hounsfield, 1968) ART for tomography (Gordon, Bender, Herman, JTB, 1970) ... Roughness regularized LS for tomography (Kashyap & Mittal, 1975) Poisson likelihood (transmission) (Rockmore and Macovski, TNS, 1977) EM algorithm for Poisson transmission (Lange and Carson, JCAT, 1984) Iterative coordinate descent (ICD) (Sauer and Bouman, T-SP, 1993) Ordered-subsets algorithms (Manglos et al., PMB 1995)

    (Kamphuis & Beekman, T-MI, 1998)(Erdogan & Fessler, PMB, 1999)

    ... Commercial OS for Philips BrightView SPECT-CT (2010) Commercial ICD for GE CT scanners (circa 2010) FDA 510(k) clearance of Veo (Sep. 2011) First Veo installation in USA (at UM) (Jan. 2012)

    ( numerous omissions, including many denoising methods)

    15 / 74

  • Statistical image reconstruction for CT: Formulation

    Optimization problem formulation: x = arg minx0 (x)

    (x) cost

    function

    ,12 y Ax

    2W

    data-fit termphysics & statistics

    +N

    j=1

    kNj

    (xj xk) regularizer

    prior models

    y : measured data (sinogram)A : system matrix (physics / geometry)W : weighting matrix (statistics)x : unknown image (attenuation map) : regularization parameter(s)Nj : neighborhood of jth voxel : edge-preserving potential function(piece-wise smoothness / gradient sparsity)

    16 / 74

  • Statistical image reconstruction for CT: Research

    x = arg minx0

    (x), (x) , 12 y Ax2W +

    j

    k

    j,k (xj xk)

    Apparent topics: regularization design / parameter selection , jk statistical modeling W , system modeling A optimization algorithms (arg min) assessing IQ of x

    Other topics: system design motion spectral dose ...

    17 / 74

  • MRI: Why iterative reconstruction?

    Inverse FFT is fast (like FBP). Why change?(Joint work with D. Noll, J. Nielsen, ...)

    Recall rationale for CT/PET/SPECT:I physics modeling reduce artifacts improve resolution improve contrast

    I noise modeling: (dose, variability)I sampling: non-standard geometriesI constraints on object

    Which of these matter for MRI?

    18 / 74

  • MRI why iterative: PhysicsPhysics modeling (e.g., field inhomogeneity) = reduced artifacts

    Example: T2*-weighted imaging (Sutton et al., IEEE T-MI, 03)

    uncorrected traditional iterative field map

    x = arg minx

    12 y Ax

    22 + R(x)

    System matrix A depends on (measured) field map:

    aij = ej ti e2~i ~rj

    No analytical inverse of A. cf. nonuniform attenuation correction in SPECT19 / 74

    http://dx.doi.org/10.1109/TMI.2002.808360

  • MRI why iterative: PhysicsJoint estimation of field map and magnetization image x:

    (x, ) = arg minx,

    12 y A()x

    22 + 1 R1(x) +2 R2()

    Useful when field map drifts in dynamic imaging.(Sutton et al., MRM 04) (Olafsson et al., T-MI 08)

    cf. joint estimation of attenuation map and activity image in SPECT, PET and TOF-PET.(Censor et al., T-NS 79) (Clinthorne et al., NSS 91) (Rezaei, Defrise, Nuyts, T-MI 14)

    20 / 74

  • MRI why iterative: Physics

    RF pulse design

    RF pulseb Bloch Eqn

    Excited magnetizationm

    Small-tip approximation: m AbIterative RF pulse design (with RF power regularization):

    arg minbm Ab22 + b

    22

    Minimize using CG. (Yip et al., MRM, Oct. 2005)

    d. Non-iterative:e. Iterative:

    21 / 74

  • MRI why iterative: Noise

    I MRI measurements: (complex) AWGN = easy !?

    I Variance of image phase depends on image magnitude.I Image phase useful in some applications, e.g., B1 mapping:

    Unregularized vs regularized phase estimate. (Zhao et al., T-MI 14)

    22 / 74

  • MRI why iterative: Noise

    I MRI measurements: (complex) AWGN = easy !?I Variance of image phase depends on image magnitude.I Image phase useful in some applications, e.g., B1 mapping:

    Unregularized vs regularized phase estimate. (Zhao et al., T-MI 14)23 / 74

  • MRI why iterative: Sampling

    I Reducing k-space sampling = reduced scan timeI Especially compelling for dynamic imaging (cf. CT and SPECT)I Popular under-sampled patterns: (cf. sparse-view CT)

    Random Cartesian

    kx

    kyRadial

    I Solution strategies Multiple receive coils Object model assumptions (e.g., sparsity) iterative reconstruction (compressed sensing)

    24 / 74

  • Parallel MRI

    Under-sampled Cartesian k-space: use multiple receive coils withindividual spatial sensitivity patterns. (Pruessmann et al., MRM, 1999)

    Array coil images

    1 64

    1

    64 A =

    FS1FS2...

    Compressed sensing parallel MRI (random) under-samplingLustig et al., IEEE Sig. Proc. Mag., Mar. 2008cf. multiple-source CT (speed) or multi-camera SPECT (counts)

    25 / 74

    http://dx.doi.org/10.1002/(SICI)1522-2594(199911)42:53.0.CO;2-Shttp://dx.doi.org/10.1109/MSP.2007.914728

  • Model-based image reconstruction in parallel MRI

    Regularized estimator:

    x = arg minx

    12 y FSx

    22

    data fit

    + Rxp sparsity

    .

    F is under-sampled DFT matrix (wide)Features: coil sensitivity matrix S is block diagonal F F is circulant (for Cartesian sampling)

    Challenges: Data-fit Hessian S F FS is highly shift variant due to coil

    sensitivity maps Non-quadratic (edge-preserving) regularization p Non-smooth regularization 1 (cf. sparse view CT) Complex quantities Large problem size (if 3D or dynamic or many coils)

    26 / 74

  • 2.5D parallel MR image reconstruction

    Example of compressed sensing MRI reconstruction:

    0.4

    32

    63.6

    95.1

    126.7

    158.2

    189.8

    221.3

    252.9

    284.4

    316

    347.5

    0.4

    32

    63.6

    95.1

    126.7

    158.2

    189.8

    221.3

    252.9

    284.4

    316

    347.5

    0.4

    32

    63.6

    95.1

    126.7

    158.2

    189.8

    221.3

    252.9

    284.4

    316

    347.5

    0

    8.4

    16.8

    25.2

    33.5

    41.9

    50.3

    58.7

    67

    75.4

    83.8

    92.2

    original k-space IFFT iterative difference

    Fully sampled body coil image of human brain (144 128) Poisson-disk-based k-space sampling, 16% sampling (acceleration 6.25) Square-root of sum-of-squares inverse FFT of zero-filled k-space data for 8 coils Regularized reconstruction x ()

    combined TV and `1 norm of two-level undecimated Haar wavelets Difference image magnitude

    (Sathish Ramani & JF, IEEE T-MI, Mar. 2011)

    27 / 74

    http://dx.doi.org/10.1109/TMI.2010.2093536

  • Summary of What and Why

    I CT and MRI both involve inverse problemsI Some similarities in motivations and formulationsI Some similarities in computation challengesI Some opportunities for cross-fertilizationI Caution: MRI reconstruction field is crowded!

    28 / 74

  • Outline

    WhatCTMRI

    WhyWhy CT iterativeWhy MRI iterative

    HowOptimization transferSeparable quadratic surrogatesMomentumOrdered subsets

    Parallelization

    29 / 74

  • SIR for CT: Optimization challenges

    x = arg minx0

    (x), (x) , 12 y Ax2W +

    Nj=1

    k

    j,k (xj xk)

    Optimization challenges: large problem size: x R512512600, y R888647000 A is sparse but still too large to store; compute Ax on-the-fly W has enormous dynamic range (1 to exp(9) 1.2 104) Gram matrix AWA highly shift variant is non-quadratic but convex (and often smooth) nonnegativity constraint data size grows: dual-source CT, spectral CT, wide-cone CT, ... Moores law insufficient

    30 / 74

  • Optimization transfer (Majorize-Minimize) methods: 1D(x) (x)

    (n)(x)

    x(n)

    x(n+1)

    x

    Surrogate function (majorizer)Cost function

    (n)(x (n)) = (x (n))(n)(x) (x)

    cf. ML-EM

    x (n+1) = arg minx

    (n)(x)

    31 / 74

  • Optimization transfer (Majorize-Minimize) methods: 2D

    32 / 74

  • Outline

    WhatCTMRI

    WhyWhy CT iterativeWhy MRI iterative

    HowOptimization transferSeparable quadratic surrogatesMomentumOrdered subsets

    Parallelization

    33 / 74

  • Separable Quadratic Surrogates (SQS): Math

    L(x) = 12 y Ax2W

    = L(x (n)

    )+ L

    (x (n)

    )(x x (n)) + 12

    (x x (n)

    )AWA (x x (n)) non-separable

    L(x (n)

    )+ L

    (x (n)

    )(x x (n)) + 12

    (x x (n)

    )D (x x (n)) separable

    , (n)L (x), a SQS,

    where AWA D = diag{AWA1} . (De Pierro, T-MI, Mar. 1995)Proofs: Convexity of x2 Gersgorin disk theorem Cauchy-Schwarz inequality

    34 / 74

  • Separable Quadratic Surrogates (SQS): Pictures

    Find minimizer of L(x): challenging Find minimizer of (n)L (x): easy (separate 1D problems)

    35 / 74

  • WLS-SQS: Iteration

    General optimization transfer (majorize-minimize) method:

    x (n+1) = arg minx

    (n)L (x)

    For SQS:

    (n)L (x) = L(x (n)

    )+ L

    (x (n)

    )(x x (n)) + 12

    (x x (n)

    )D (x x (n))(n)L (x) = L

    (x (n)

    )+D

    (x x (n)

    )0 = (n)L

    (x (n+1)

    )= L

    (x (n)

    )+D

    (x (n+1) x (n)

    )x (n+1) = x (n) D1 L

    (x (n)

    )diagonally preconditioned gradient descent

    (Erdogan & JF, PMB, 1999)

    36 / 74

  • SQS versus GD: Math

    Ordinary gradient descent (GD) for WLS:

    x (n+1) = x (n) L(x (n)

    )= x (n) AW (Ax (n) y),

    where textbook step size is reciprocal of Lipschitz constant:

    = 1max(AWA)

    .

    WLS-GD is equivalent to WLS-SQS with isotropic majorizerHessian:

    D = max(AWA)I.

    Drawbacks: max(AWA) usually impractical to compute (in CT) Usually slower convergence due to smaller step sizes

    37 / 74

  • SQS versus GD: Pictures

    38 / 74

  • SQS versus GD: Pictures

    39 / 74

  • Outline

    WhatCTMRI

    WhyWhy CT iterativeWhy MRI iterative

    HowOptimization transferSeparable quadratic surrogatesMomentumOrdered subsets

    Parallelization

    40 / 74

  • Classical gradient descent (GD)Assumptions: is convex (need not be strictly convex) has non-empty set of global minimizers

    x X ={

    x (?) RN : (x (?)) (x), x RN}

    is smooth (differentiable with L-Lipschitz gradient)(x)(z)2 L x z2 , x, z RN

    GD with step size 1/L ensures monotonic descent of :

    x (n+1) = x (n) 1L (x (n)

    ).

    Drori & Teboulle (2014) derive tightest inaccuracy bound:

    (x (n)

    )

    (x (?)

    ) inaccuracy

    Lx (0) x (?)22

    4n + 2 .

    For a Huber-like function , GD achieves that (tight) bound.O(1/n) rate is undesirably slow.

    41 / 74

  • Nesterovs fast gradient method (FGM1)

    Nesterov (1983) iteration: Initialize: t0 = 1, z (0) = x (0)

    z (n+1) = x (n) 1L (x (n)

    )(usual GD update)

    tn+1 =12

    (1 +

    1 + 4t2n

    )(magic momentum factors)

    x (n+1) = z (n+1) + tn 1tn+1

    (z (n+1) z (n)

    )(update with momentum)

    I Reverts to GD if tn = 1, n.I Comparable computation as GDI Store one additional image-sized vector z (n)

    42 / 74

  • FGM1 properties

    FGM1 shown by Nesterov to be O(1/n2) for primary sequence:

    (z (n)

    )

    (x (?)

    )

    2Lx (0) x (?)22(n + 1)2 .

    Nesterov constructed a function such that any first-ordermethod achieves

    332L

    x (0) x (?)22(n + 1)2

    (x (n)

    )

    (x (?)

    ).

    Thus O(1/n2) rate of FGM1 is optimal.Donghwan Kim (2014) analyzed secondary sequence:

    (x (n)

    )

    (x (?)

    )

    2Lx (0) x (?)22(n + 2)2 .

    43 / 74

  • SQS plus momentum for parallel MRI

    I Traditional iterative soft thresholding algorithm (ISTA)uses (global) Lipschitz constant of data-fit term:

    2 12 y FS22 = S

    F FS S S maxI, max = maxj[S S

    ]j,j

    max is maximum sum-of-squares value of sensitivity maps.I Augmented Lagrangian (AL) methods converge faster than

    ISTA, FISTA, MFISTA (Ramani & JF, T-MI, 2011)I BARISTA (B1-based, adaptive restart, ISTA)

    (Muckley, Noll, JF, T-MI, 2015)For synthesis operator x = Qz with z sparse:

    2 12 y FSQ22 = Q

    S F FSQ QS SQ D

    for a suitable diagonal matrix D. (cf., SQS)I D1 becomes voxel-dependent step size, akin to SQS in CT

    44 / 74

    http://dx.doi.org/10.1109/TMI.2014.2363034

  • BARISTA convergence rates

    Compressed sensing MRI reconstruction:Total variation (TV) regularizer Undecimated Haar Wavelets

    Corresponding D for each of the two cases:BARISTA requires no algorithm parameter tuning, unlike AL.Includes momentum with adaptive restart of ODonoghue andCandes (2014).

    45 / 74

  • Generalizing Nesterovs FGM

    FGM1 is in the general class of first-order methods:

    x (n+1) = x (n) 1L

    nk=0

    hn+1,k (x (k)

    )where the step-size factors {hn,k} are

    1 0 0 0 0 00 1.25 0 0 0 00 0.10 1.40 0 0 00 0.05 0.20 1.50 0 00 0.03 0.11 0.29 1.57 0...

    . . .

    Use of previous gradients = momentumIs this the optimal choice for {hn,k} ?Can we improve on the constant 2 in worst-case convergence rate?Drori & Teboulle (2014) numerically found 2 better {hn,k}

    46 / 74

  • Optimized gradient method (OGM1)New approach by optimizing {hn,k} analyticallyInitialize: t0 = 1, z (0) = x (0) (Donghwan Kim and JF; 2014, 2015)

    z (n+1) = x (n) 1L (x (n)

    )(usual GD update)

    tn+1 =12

    (1 +

    1 + 4t2n

    )(momentum factors)

    x (n+1) = z (n+1) + tn 1tn+1

    (z (n+1) z (n)

    )+ tntn+1

    (z (n+1) x (n)

    )

    new momentum

    Smaller (worst-case) convergence bound than Nesterov by 2:

    (z (n)

    )

    (x (?)

    )

    1Lx (0) x (?)22(n + 1)2 .

    Recently DK found a Huber-like function for which OGM1 achieves that upper bound(thus tight), inspired by numerical work of Taylor et al. (2015).

    47 / 74

  • Example: Image restoration (!?)

    Truex

    Blurryy

    Restoredx

    Rate

    0 50 100 150 200

    102

    100

    GM

    FGM

    OGM

    (x (n))(x) vs iteration n

    48 / 74

  • Outline

    WhatCTMRI

    WhyWhy CT iterativeWhy MRI iterative

    HowOptimization transferSeparable quadratic surrogatesMomentumOrdered subsets

    Parallelization

    49 / 74

  • Ordered subsets approximation

    I Data decomposition (aka incremental gradients, cf. stochastic GD):

    (x) =M

    m=1m(x), m(x) ,

    12 ym Amx

    2Wm

    1/Mth of measurements

    + 1M R(x)

    I Key idea. For x far from minimizer: (x) Mm(x)I SQS:

    x (n+1) = x (n) D1(x (n)

    )I OS-SQS:

    for n = 0, 1, . . . (iteration)for m = 1, . . . ,M (subset)

    k = nM + m (subiteration)

    xk+1 = xk D1Mm(xk)

    less work

    I Coil-wise in parallel MRI (Muckley, Noll, JF, ISMRM 2014)50 / 74

  • Ordered subsets version of OGM1

    For more acceleration, combine OGM1 with ordered subsets (OS).

    OS-OGM1:Initialize: t0 = 1, z (0) = x (0)for n = 0, 1, . . . (iteration)

    for m = 1, . . . ,M (subset)

    k = nM + m (subiteration)

    zk+1 =[xk D1Mm

    (xk)]

    +(typical OS-SQS)

    tk+1 =12

    (1 +

    1 + 4t2k

    )xk+1 = zk+1 + tk 1tk+1

    (zk+1 zk

    )+ tktk+1

    (zk+1 xk

    )

    51 / 74

  • OS-OGM1 properties

    I Approximate convergence rate for : O(

    1n2M2

    )(Donghwan Kim and JF; CT 2014)

    I Same compute per iteration as other OS methods(One forward / backward projection and M regularizer gradients per iteration)

    I Same memory as OGM1 (two more images than OS-SQS)I Guaranteed convergence for M = 1I No convergence theory for M > 1 unstable for large M small M preferable for parallelization

    I Now fast enough to show X-ray CT examples...

    52 / 74

  • OS-OGM1 results: data

    3D cone-beam helical X-ray CT scan pitch 0.5 image x: 512 512 109 with 70 cm FOV and 0.625 mm slices sinogram : y 888 detectors 32 rows 7146 views

    53 / 74

  • OS-OGM1 results: convergence rate

    Root mean square difference (RMSD) between x (n) and x () overROI (in HU), versus iteration.(Compute times per iteration are very similar.)

    54 / 74

  • OS-OGM1 results: images

    At iteration n = 10 with M = 12 subsets.

    55 / 74

  • OS divergence example

    1 1

    4

    one-pixel image three intersecting rays

    A =

    114

    x = 2, y = Ax =

    228

    condition number of AA = 1 consistent system of eqns.

    56 / 74

  • OS divergence example

    OS-SQS-LS for M = 3 subsets:

    xnew = xold D13mxold = xold D13A(Axold y)

    D = diag{AA1} = 12 + 12 + 42 = 18After 3 updates:

    x (n+1) x =(

    1 31812)(

    1 31812)(

    1 31842) (

    x (n) x)

    = 2(15/18)3(x (n) x

    )= 125108

    (x (n) x

    )Divergence of OS-SQS-LS is possible even in well-conditioned,consistent case

    57 / 74

  • Outline

    WhatCTMRI

    WhyWhy CT iterativeWhy MRI iterative

    HowOptimization transferSeparable quadratic surrogatesMomentumOrdered subsets

    Parallelization

    58 / 74

  • Amazon Cloud version of OS-OGMDistribute long object (320 useful slices) into (overlapping) slabs(128 slices each) across 5 separate clusters, each with 10 nodeshaving 16 cores.Use MPI (message passing interface) for within-clustercommunication:

    Forward Projection

    Back Projection

    Regularization UpdateForward

    Projection

    . . .Broadcast

    CommunicationBroadcast

    Communication

    1 16 324864 8096 128 160 192 2241

    16

    32

    48

    64

    80

    96

    128

    160

    192

    224

    Numbers of Cores

    Sp

    ee

    du

    p

    Ideal Speedup

    Observed Speedup

    1 32 64 96 128 160 192 22410

    1

    102

    103

    Number of Cores

    Tim

    e p

    er

    Ite

    rati

    on

    (s

    ec

    )

    FP BP Reg Update Comm0

    5

    10

    15

    20

    25

    30

    35

    Tim

    e (

    se

    c)

    10 nodes

    1 node

    Linearly scaled10 nodes

    Rosen, Wu, Wenisch, JF (Fully 3D, 2013) Overlapping slabs is inefficient Communication time (within cluster, after every subset) is

    serious bottleneck59 / 74

  • Block-separable surrogates for distributed reconstruction

    Conventional OS approach uses a voxel-wise SQS:

    (x) (x (n)

    )+

    (x (n)

    )(x x (n)) + 12(x x

    (n))D(x x (n))

    = (x (n)

    )+

    Nj=1

    xj(x (n)

    )(xj x (n)j ) +

    12 dj

    (xj x (n)j

    )2Diagonal matrix D majorizes the Hessian of : 2 (x) D.Distributed computing alternative: slab-separable surrogate:

    (x)(x (n)

    )

    Bb=1

    b(xb)

    b(xb) , xb (x (n)

    )(xb x (n)b ) +

    12(xb x (n)b

    )Hb(xb x (n)b

    )Block diagonal matrix H = diag{H1, . . . ,HB} majorizes 2 (x) .

    60 / 74

  • BSS continued

    b(xb) , xb (x (n)

    )(xb x (n)b ) +

    12(xb x (n)b

    )Hb(xb x (n)b

    )

    Hb , AbW bAb, b , diag{A1 Ab1b}

    Updates parallelizable across blocks (slabs):

    x (n+1)b , arg minxb0

    b(xb) .

    I Reduces communication.I (Apply favorite optimization method within slab.)I (Donghwan Kim and JF; Fully 3D, 2015) [Mo18]

    61 / 74

  • Block-separable surrogate (BSS) OS-OGM

    62 / 74

  • BSS OS-OGM: data

    256 256 160 XCAT phantom (Segars et al., 2008) Simulated helical CT, 444 32 492 M = 12 subsets, B = 10 blocks, L = 5 inner iterations Matlab emulation

    FBP initializer x (0) Converged x ()

    63 / 74

  • BSS OS-OGM: rates

    Outer loop interrupts momentum= BSS is slower per iteration than OS-OGM

    Reduced communication reduces overall time

    64 / 74

  • BSS OS-OGM: images

    Comparable images Algorithm designed for distributed computation

    65 / 74

  • Duality approach for using GPU Data transfer between system RAM and GPU can be bottleneck Hide communication time by overlapping with computation

    Algorithm synopsis: (Madison McGaffin and JF; Fully 3D, 2015) [Wed. AM] Write cost function (x) in terms of dual variables v and u for

    data-fit and regularizer:

    (x) =M

    i=1hi ([Ax]i ) +

    k([Cx]k)

    x (n+1) = arg minx

    supu,v(

    A u +C v) x M

    i=1hi (ui )

    k(vk) +

    2x x (n)22

    hi and denote convex conjugates of hi and Alternate between updating several projection view dual variables {ui} dual variables for one regularization direction {vk}

    Using dual variables decouples regularizer and data terms OS-like method with convergence theorem More details in Dr. McGaffins talk

    66 / 74

  • Duality-GPU: data

    3D cone-beam helical X-ray CT scan pitch 0.5 image x: 512 512 109 with 70 cm FOV and 0.625 mm slices sinogram : y 888 detectors 32 rows 7146 views OpenCL on aging NVIDIA GTX 480 GPU with 2.5 GB RAM

    FBP initializer x (0) Converged x ()

    67 / 74

  • Duality-GPU: timing results

    Algorithm designed specifically for GPU architecturecharacteristics Future work: combine with BSS for multiple nodes ?

    68 / 74

  • Duality-GPU: image results

    69 / 74

  • Summary

    I Model-based image reconstruction can improve image quality for low-dose X-ray CT enable faster MRI scans via under-sampling

    I Much more: dynamic image reconstruction, motioncompensation, ...

    I Computation time remains a significant challengeI Moores law will not solve the problemI Algorithms designed for distributed computation are essential Block-separable surrogates to reduce communication

    (Donghwan Kim and JF; Fully 3D, 2015) [Mo18] Duality approach to overlap communication with

    computationAlso provides a OS-like algorithm with convergence theory(Madison McGaffin and JF; Fully 3D, 2015) [Wed. AM]

    70 / 74

  • Bibliography I

    * G. Hounsfield, A method of apparatus for examination of a body by radiation such as x-ray or gammaradiation, US Patent 1283915. British patent 1283915, London., 1972.

    * R. Gordon, R. Bender, and G. T. Herman, Algebraic reconstruction techniques (ART) for thethree-dimensional electron microscopy and X-ray photography, J. Theor. Biol., vol. 29, no. 3, 47181,Dec. 1970.

    * R. Gordon and G. T. Herman, Reconstruction of pictures from their projections, Comm. ACM, vol.14, no. 12, 75968, Dec. 1971.

    * G. T. Herman, A. Lent, and S. W. Rowland, ART: mathematics and applications (a report on themathematical foundations and on the applicability to real data of the algebraic reconstructiontechniques), J. Theor. Biol., vol. 42, no. 1, 132, Nov. 1973.

    * R. Gordon, A tutorial on ART (algebraic reconstruction techniques), IEEE Trans. Nuc. Sci., vol. 21,no. 3, 7893, Jun. 1974.

    * R. L. Kashyap and M. C. Mittal, Picture reconstruction from projections, IEEE Trans. Comp., vol. 24,no. 9, 91523, Sep. 1975.

    * A. J. Rockmore and A. Macovski, A maximum likelihood approach to transmission imagereconstruction from projections, IEEE Trans. Nuc. Sci., vol. 24, no. 3, 192935, Jun. 1977.

    * K. Lange and R. Carson, EM reconstruction algorithms for emission and transmission tomography, J.Comp. Assisted Tomo., vol. 8, no. 2, 30616, Apr. 1984.

    * K. Sauer and C. Bouman, A local update strategy for iterative reconstruction from projections, IEEETrans. Sig. Proc., vol. 41, no. 2, 53448, Feb. 1993.

    * S. H. Manglos, G. M. Gagne, A. Krol, F. D. Thomas, and R. Narayanaswamy, Transmissionmaximum-likelihood reconstruction with ordered subsets for cone beam CT, Phys. Med. Biol., vol. 40,no. 7, 122541, Jul. 1995.

    71 / 74

  • Bibliography II

    * C. Kamphuis and F. J. Beekman, Accelerated iterative transmission CT reconstruction using an orderedsubsets convex algorithm, IEEE Trans. Med. Imag., vol. 17, no. 6, 10015, Dec. 1998.

    * H. Erdogan and J. A. Fessler, Ordered subsets algorithms for transmission tomography, Phys. Med.Biol., vol. 44, no. 11, 283551, Nov. 1999.

    * E. Hansis, J. Bredno, D. Sowards-Emmerd, and L. Shao, Iterative reconstruction for circular cone-beamCT with an offset flat-panel detector, in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., 2010, 222831.

    * B. P. Sutton, D. C. Noll, and J. A. Fessler, Fast, iterative image reconstruction for MRI in the presenceof field inhomogeneities, IEEE Trans. Med. Imag., vol. 22, no. 2, 17888, Feb. 2003.

    * ,Dynamic field map estimation using a spiral-in / spiral-out acquisition, Mag. Res. Med., vol. 51,no. 6, 1194204, Jun. 2004.

    * V. T. Olafsson, D. C. Noll, and J. A. Fessler, Fast joint reconstruction of dynamic R2 and field maps infunctional MRI, IEEE Trans. Med. Imag., vol. 27, no. 9, 117788, Sep. 2008.

    * Y. Censor, D. E. Gustafson, A. Lent, and H. Tuy, A new approach to the emission computerizedtomography problem: simultaneous calculation of attenuation and activity coefficients, IEEE Trans.Nuc. Sci., vol. 26, no. 2, 27759, Apr. 1979.

    * N. H. Clinthorne, J. A. Fessler, G. D. Hutchins, and W. L. Rogers, Joint maximum likelihood estimationof emission and attenuation densities in PET, in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., vol. 3,1991, 192732.

    * H. Erdogan and J. A. Fessler, Algorithms for joint estimation of attenuation and emission images inPET, in Proc. IEEE Conf. Acoust. Speech Sig. Proc., vol. 6, 2000, 37836.

    * A. Rezaei, M. Defrise, and J. Nuyts, ML-reconstruction for TOF-PET with simultaneous estimation ofthe attenuation factors, IEEE Trans. Med. Imag., vol. 33, no. 7, 156372, Jul. 2014.

    * C. Yip, J. A. Fessler, and D. C. Noll, Iterative RF pulse design for multidimensional, small-tip-angleselective excitation, Mag. Res. Med., vol. 54, no. 4, 90817, Oct. 2005.

    72 / 74

  • Bibliography III

    * F. Zhao, J. A. Fessler, S. M. Wright, and D. C. Noll, Regularized estimation of magnitude and phase ofmulti-coil B1 field via Bloch-Siegert B1 mapping and coil combination optimizations, IEEE Trans.Med. Imag., vol. 33, no. 10, 202030, Oct. 2014.

    * S. S. Vasanawala, M. T. Alley, B. A. Hargreaves, R. A. Barth, J. M. Pauly, and M. Lustig, Improvedpediatric MR imaging with compressed sensing, Radiology, vol. 256, 60716, 2010.

    * K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger, SENSE: sensitivity encoding forfast MRI, Mag. Res. Med., vol. 42, no. 5, 95262, Nov. 1999.

    * M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly, Compressed sensing MRI, IEEE Sig. Proc.Mag., vol. 25, no. 2, 7282, Mar. 2008.

    * S. Ramani and J. A. Fessler, Parallel MR image reconstruction using augmented Lagrangian methods,IEEE Trans. Med. Imag., vol. 30, no. 3, 694706, Mar. 2011.

    * A. R. De Pierro, A modified expectation maximization algorithm for penalized likelihood estimation inemission tomography, IEEE Trans. Med. Imag., vol. 14, no. 1, 1327, Mar. 1995.

    * Y. Drori and M. Teboulle, Performance of first-order methods for smooth convex minimization: A novelapproach, Mathematical Programming, vol. 145, no. 1-2, 45182, Jun. 2014.

    * Y. Nesterov, A method for unconstrained convex minimization problem with the rate of convergenceO(1/k2), Dokl. Akad. Nauk. USSR, vol. 269, no. 3, 5437, 1983.

    * ,Smooth minimization of non-smooth functions, Mathematical Programming, vol. 103, no. 1,12752, May 2005.

    * D. Kim and J. A. Fessler, Optimized first-order methods for smooth convex minimization, arxiv1406.5468, 2014.

    * ,Optimized first-order methods for smooth convex minimization, Mathematical Programming,2015, Submitted.

    73 / 74

  • Bibliography IV

    * M. J. Muckley, D. C. Noll, and J. A. Fessler, Fast parallel MR image reconstruction via B1-based,adaptive restart, iterative soft thresholding algorithms (BARISTA), IEEE Trans. Med. Imag., vol. 34,no. 2, 57888, Feb. 2015.

    * B. ODonoghue and E. Candes, Adaptive restart for accelerated gradient schemes, Found.Computational Math., 2014.

    * D. Kim and J. A. Fessler, An optimized first-order method for image restoration, in Proc. IEEE Intl.Conf. on Image Processing, To appear., 2015.

    * A. B. Taylor, J. M. Hendrickx, and Francois. Glineur, Smooth strongly convex interpolation and exactworst-case performance of first- order methods, arxiv 1502.05666, 2015.

    * M. Muckley, D. C. Noll, and J. A. Fessler, Accelerating SENSE-type MR image reconstructionalgorithms with incremental gradients, in Proc. Intl. Soc. Mag. Res. Med., 2014, p. 4400.

    * D. Kim and J. A. Fessler, Optimized momentum steps for accelerating X-ray CT ordered subsets imagereconstruction, in Proc. 3rd Intl. Mtg. on image formation in X-ray CT, 2014, 1036.

    * J. M. Rosen, J. Wu, T. F. Wenisch, and J. A. Fessler, Iterative helical CT reconstruction in the cloudfor ten dollars in five minutes, in Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med,2013, 2414.

    * D. Kim and J. A. Fessler, Distributed block-separable ordered subsets for helical X-ray CT imagereconstruction, in Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med, To appear., 2015.

    * W. P. Segars, M. Mahesh, T. J. Beck, E. C. Frey, and B. M. W. Tsui, Realistic CT simulation using the4D XCAT phantom, Med. Phys., vol. 35, no. 8, 38008, Aug. 2008.

    * M. G. McGaffin and J. A. Fessler, Fast GPU-driven model-based X-ray CT image reconstruction viaalternating dual updates, in Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med, Toappear., 2015.

    74 / 74

    WhatCTMRI

    WhyWhy CT iterativeWhy MRI iterative

    HowOptimization transferSeparable quadratic surrogatesMomentumOrdered subsets

    Parallelization

    fd@rm@0: fd@rm@1: