Parallelization of System Matrix generation code

Mahmoud Abdallah, Antall Fernandes

SPECT System

Inverse Cone

Back Projection

Figure source: Bill Amini, Magnus Björklund, Ron Dror, Anders Nygren, "Tomographic Reconstruction of SPECT Data"

Filtered Back Projection (FBP) applies a ramp filter to the projection data and then back projects the filtered projections. It is still widely used because it is fast and easy to implement.
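In the usual formulation the ramp filter acts on each 1D projection before back projection. The sketch below shows only that filtering step, assuming a short projection stored as a Python list and a plain (slow) DFT for clarity; the name `ramp_filter` is hypothetical, and real FBP code would use an FFT, zero-padding and usually a windowed ramp.

```python
import cmath


def ramp_filter(projection):
    """Apply an unwindowed ramp (|frequency|) filter to one 1D projection.

    Sketch only: uses an O(n^2) DFT instead of an FFT and no zero-padding.
    """
    n = len(projection)
    # Forward DFT: spectrum[k] = sum_t p[t] * exp(-2*pi*i*k*t/n)
    spectrum = [
        sum(projection[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Ramp weights |f|; indices k > n/2 represent negative frequencies
    weights = [min(k, n - k) / n for k in range(n)]
    filtered = [s * w for s, w in zip(spectrum, weights)]
    # Inverse DFT; the weights are symmetric, so the result is real up to rounding
    return [
        (sum(filtered[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

Because the ramp weight at zero frequency is 0, a constant projection such as `ramp_filter([1.0] * 8)` comes back as values close to zero, while an isolated peak is sharpened with negative side lobes. Those negative lobes are what cancel the blurring that plain back projection would otherwise introduce.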

Maximum Likelihood-Expectation Maximization Algorithm

Refines the image iteratively and is found to reduce noise in the reconstruction

An iterative algorithm is used to solve the following linear problem:

FX = P

where
P: vector of projection data
X: voxelized image
F: projection matrix operator (the system matrix)

Needs a large number of iterations to reconstruct an image

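EM Algorithm

The EM algorithm is given by

$$X_i^{(n+1)} = \frac{X_i^{(n)}}{\sum_j F_{ji}} \sum_j F_{ji} \frac{P_j}{\sum_k F_{jk} X_k^{(n)}}$$

where n is the iteration number, i and k index voxels, and j indexes projection bins.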

The summation over k is the projection operation.

The summation over j is the back projection operation.
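The EM update can be sketched directly from its two summations. This is a minimal illustration, assuming a small dense system matrix `F` stored as a list of rows (one row per projection bin j, one column per voxel); the names `mlem_step` and `mlem` are hypothetical, and a real SPECT system matrix would be stored sparsely.

```python
def mlem_step(F, X, P, eps=1e-12):
    """One EM update: X_i <- X_i / sum_j F[j][i] * sum_j F[j][i] * P[j] / (F X)[j]."""
    n_bins, n_voxels = len(F), len(X)
    # Projection (summation over k): forward-project the current estimate
    proj = [sum(F[j][k] * X[k] for k in range(n_voxels)) for j in range(n_bins)]
    # Ratio of measured to estimated projection data, guarded against division by zero
    ratio = [P[j] / max(proj[j], eps) for j in range(n_bins)]
    new_X = []
    for i in range(n_voxels):
        # Sensitivity of voxel i: sum of column i of F
        sens = sum(F[j][i] for j in range(n_bins))
        # Back projection (summation over j) of the ratio
        back = sum(F[j][i] * ratio[j] for j in range(n_bins))
        new_X.append(X[i] * back / sens if sens > 0 else 0.0)
    return new_X


def mlem(F, P, n_iter):
    """Run n_iter EM updates from a uniform, strictly positive starting image."""
    X = [1.0] * len(F[0])
    for _ in range(n_iter):
        X = mlem_step(F, X, P)
    return X
```

For example, with `F = [[1, 1], [1, 0], [0, 1]]` and consistent data `P = [7, 2, 5]`, the estimate approaches `[2, 5]`, with the error halving on each iteration for this particular system. The multiplicative update keeps every voxel non-negative, which is one reason the EM algorithm suits emission data, but it also explains why many iterations are needed.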

System Matrix

Maps the image space to the data space

Takes detector geometry as input

Generates the detector data for every bin at each angle (usually 72 angles/frames)
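One way to picture the system matrix as a map from image space to data space is one row per (angle, U bin, V bin) triple, holding only the voxels that contribute to that bin. The sketch below assumes that layout and a sparse dictionary of rows; the names `row_index` and `forward_project` and the row ordering are hypothetical, not taken from the original code.

```python
def row_index(angle, u, v, n_u, n_v):
    """Flatten (angle, U bin, V bin) into a single system-matrix row index."""
    return (angle * n_u + u) * n_v + v


def forward_project(sys_mat, image, n_rows):
    """Map image space to data space: P[row] = sum of weight * image[voxel].

    sys_mat is a dict mapping a row index to a list of (voxel index, weight)
    pairs; rows missing from the dict produce zero projection data.
    """
    P = [0.0] * n_rows
    for row, entries in sys_mat.items():
        P[row] = sum(w * image[voxel] for voxel, w in entries)
    return P
```

With 72 angles and roughly 14 x 64 bins per angle, this layout gives 72 x 14 x 64 = 64,512 rows, and the last bin maps to `row_index(71, 13, 63, 14, 64) == 64511`. Storing each row sparsely keeps memory proportional to the number of voxels each ray actually touches.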

System Matrix Algorithm

for each angle do                                          // number of angles = 72
    for each detector bin in U direction do                // bins: around 14
        for each detector bin in V direction do            // bins: around 64
            for each row in the inverse cone grid do               // <= 99
                for each column in the inverse cone grid do        // <= 99
                    for each voxel intersected by the ray do
                        calculate point response
                    end
                end
            end
        end
    end
end
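The innermost "for each voxel intersected by the ray" loop needs a ray traversal. As an illustration only, the sketch below uses Siddon's method in 2D and returns the intersection length of the ray with each pixel as its weight; this is a swapped-in technique, since the slides do not specify the traversal or the point-response model, and the name `ray_voxel_lengths` is hypothetical.

```python
import math


def ray_voxel_lengths(x0, y0, x1, y1, nx, ny, size=1.0):
    """Return {(ix, iy): length} for pixels of an nx-by-ny grid crossed by the
    segment (x0, y0) -> (x1, y1). The grid covers [0, nx*size] x [0, ny*size].

    Siddon-style: collect the parametric values alpha in (0, 1) where the
    segment crosses grid lines; each consecutive pair bounds one pixel.
    """
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    alphas = {0.0, 1.0}
    if dx != 0:
        for i in range(nx + 1):
            a = (i * size - x0) / dx
            if 0.0 < a < 1.0:
                alphas.add(a)
    if dy != 0:
        for j in range(ny + 1):
            a = (j * size - y0) / dy
            if 0.0 < a < 1.0:
                alphas.add(a)
    alphas = sorted(alphas)
    result = {}
    for a_start, a_end in zip(alphas, alphas[1:]):
        # The midpoint of each sub-segment identifies the pixel it lies in
        mid = 0.5 * (a_start + a_end)
        ix = math.floor((x0 + mid * dx) / size)
        iy = math.floor((y0 + mid * dy) / size)
        if 0 <= ix < nx and 0 <= iy < ny:
            result[(ix, iy)] = result.get((ix, iy), 0.0) + (a_end - a_start) * length
    return result
```

A horizontal ray at y = 0.5 from x = -1 to x = 5 across a 4 x 4 grid crosses pixels (0, 0) to (3, 0), each with a length of about 1. The same idea extends to 3D voxels by adding the z-plane crossings.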

Number of (row, column) loop iterations, at most = 72 x 14 x 64 x 99 x 99 = 632,282,112, before the per-voxel loop
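The iteration count can be checked by running the loop nest on small bounds and multiplying the slide's bounds. This is a sketch; the function name `count_ray_iterations` is hypothetical, and the U/V bin counts are the approximate values given on the slide.

```python
import math

# Loop bounds from the slide (U and V bin counts are approximate; 99 is an upper bound)
N_ANGLES, N_U, N_V, N_ROWS, N_COLS = 72, 14, 64, 99, 99


def count_ray_iterations(n_angles, n_u, n_v, n_rows, n_cols):
    """Count (row, column) iterations of the loop nest by actually running it."""
    count = 0
    for _angle in range(n_angles):
        for _u in range(n_u):
            for _v in range(n_v):
                for _row in range(n_rows):
                    for _col in range(n_cols):
                        count += 1  # the per-voxel work would go here
    return count


# Full-size count computed as a product rather than by looping
total = math.prod([N_ANGLES, N_U, N_V, N_ROWS, N_COLS])
```

`total` evaluates to 632,282,112. Dividing by the 72 angles leaves 14 x 64 x 99 x 99 = 8,781,696 iterations per angle, which is the work available to spread across parallel hardware within a single angle.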

System Matrix Parallelization

Observation: at each angle, the calculations for each bin are independent of those for the other bins.

Proposal: parallelize all bin calculations within each angle.

For example, on a GPU.
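The independence observation can be demonstrated on a CPU before moving to a GPU: if each bin only reads its own inputs and returns its own result, processing all bins of one angle concurrently must give the same answer as a serial loop. In this sketch `compute_bin` is a hypothetical stand-in for the real per-bin work, and the thread pool is only an illustration of the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_bin(angle, u, v):
    """Hypothetical stand-in for the per-bin work (rows, columns and voxels).

    It depends only on its own (angle, u, v) and returns its own result,
    which is what makes bins independent of one another.
    """
    return (angle * 1000 + u * 100 + v) % 7


def angle_serial(angle, n_u, n_v):
    return {(u, v): compute_bin(angle, u, v) for u in range(n_u) for v in range(n_v)}


def angle_parallel(angle, n_u, n_v, workers=8):
    bins = [(u, v) for u in range(n_u) for v in range(n_v)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # All bins are submitted at once; map returns results in input order
        results = pool.map(lambda b: compute_bin(angle, *b), bins)
        return dict(zip(bins, results))
```

`angle_parallel(3, 14, 64) == angle_serial(3, 14, 64)` holds for all 896 bins. Python threads do not speed up CPU-bound code because of the global interpreter lock, so this only shows that the bins can be reordered freely; the speedup itself comes from hardware such as a GPU.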

System Matrix Parallelization on GPU

Parallelized System Matrix Algorithm

Host program:

for each angle do
    run the kernel for all bins at the same time
end

GPU kernel (one instance per bin):

for each voxel intersected by the ray do
    calculate attenuation and store it in SysMat
end
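The host/kernel split can be emulated sequentially to show how each kernel instance finds its bin and why no synchronization is needed. In OpenCL C a work-item would obtain its flat id from `get_global_id(0)`; here `gid` is just a loop variable. The names `bin_kernel` and `host_program` and the placeholder row contents are hypothetical, since the slides do not show the attenuation calculation.

```python
def bin_kernel(gid, angle, n_v, sys_mat):
    """Hypothetical per-work-item kernel body, written in Python for illustration.

    gid plays the role of a 1D global work-item id. Each work-item handles one
    detector bin and writes only to its own row of sys_mat, so no two
    work-items ever write to the same location.
    """
    u, v = divmod(gid, n_v)  # recover the (U, V) bin from the flat id
    # Placeholder: a real kernel would loop over the voxels intersected by the
    # ray and store their attenuation-weighted contributions here.
    sys_mat[gid] = [(angle, u, v)]


def host_program(n_angles, n_u, n_v):
    sys_mat_per_angle = []
    for angle in range(n_angles):
        sys_mat = [None] * (n_u * n_v)
        # On a GPU this inner loop is a single kernel launch with n_u * n_v work-items
        for gid in range(n_u * n_v):
            bin_kernel(gid, angle, n_v, sys_mat)
        sys_mat_per_angle.append(sys_mat)
    return sys_mat_per_angle
```

Giving every work-item its own output row is the design choice that lets all bins of an angle run at the same time without atomics or locks.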

SIMD (Architecture of GPU)

From: Advanced Micro Devices, Inc. (AMD), Introduction to OpenCL Programming, 2010

OpenCL

The kernel language (OpenCL C) is based on ISO C99, with some extensions and restrictions

Provides parallel computing through both task-based and data-based parallelism

Architecture (figure: host program and kernel)

Program Architecture

Host program: executes on the host system and sends kernels to OpenCL™ devices for execution through a command queue.

Kernels: similar to C functions; executed on OpenCL™ devices (e.g. the GPU).

Thank You
