FPGA Accelerated 3-D Tomography Richard Dorrance Progress Update: 09/07/12

Outline

Introduction to Tomography

Reconstruction Methods
  – Analytical
      o Backprojection
      o Filtered Backprojection
  – Algebraic
      o Algebraic Reconstruction Technique (ART)
      o Simultaneous Iterative Reconstruction Technique (SIRT)
      o Simultaneous Algebraic Reconstruction Technique (SART)

Modeling Performance of Reconstruction Methods

Future Work

Tomography

Cross-sectional imaging technique using transmission or reflection data collected from multiple angles

Basis for CAT scan, MRI, PET, SPECT, ET, etc.

Computed Tomography (CT): a form of tomographic reconstruction performed on computers

Cross-Sections by X-Ray Projections

Project X-rays through biological tissue; measure the total absorption of each ray by the tissue

Projection Pθ(t) is the Radon transform of the object function f(x,y):

$$P_\theta(t) = \iint f(x,y)\,\delta\left(x\cos\theta + y\sin\theta - t\right)\,dx\,dy$$

The total set of projections is called the sinogram

Shepp-Logan Phantom

Standard test image for tomographic reconstructions

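Example Image with Projections

Image:

1 3 1
1 2 1
4 1 1

Projections (ray sums):
  – Rows: 5, 4, 6
  – Main diagonal (starting at the bottom-left corner): 4, 2, 4, 4, 1
  – Columns: 6, 6, 3
  – Anti-diagonal (starting at the top-left corner): 1, 4, 7, 2, 1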
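The ray sums of this 3×3 example can be reproduced with a few lines of C. This is a minimal sketch, not the author's code. It assumes a pixel-sum model in which each ray simply adds up the pixels it crosses, with no ray-length weighting. The helper `ray_bin` and its bin ordering are hypothetical choices made so that the four directions match the example.

```c
#define N 3              /* example image is N x N pixels */
#define NANG 4           /* ray directions, in the order used in the example */
#define NBIN (2 * N - 1) /* enough bins for the diagonal directions */

/* Hypothetical helper: detector bin hit by pixel (row i, column j) for
   direction a: 0 = rows, 1 = main diagonal, 2 = columns, 3 = anti-diagonal. */
static int ray_bin(int a, int i, int j)
{
    switch (a) {
    case 0:  return i;             /* one horizontal ray per row */
    case 1:  return j - i + N - 1; /* bin 0 = bottom-left corner */
    case 2:  return j;             /* one vertical ray per column */
    default: return i + j;         /* bin 0 = top-left corner */
    }
}

/* Discrete forward projection: each ray adds up the pixels it crosses.
   Bins a direction does not use (rows/columns need only N) stay 0. */
void project(double img[N][N], double sino[NANG][NBIN])
{
    for (int a = 0; a < NANG; a++)
        for (int n = 0; n < NBIN; n++)
            sino[a][n] = 0.0;

    for (int a = 0; a < NANG; a++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sino[a][ray_bin(a, i, j)] += img[i][j];
}
```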

CT Reconstruction

Restore the image from projection data
  – Inverse Radon transform

Most common algorithm is filtered backprojection
  – "Smear" each projection over the image plane

Accuracy of the reconstruction depends on the number of detectors and projection angles

[Figure: original image and reconstructions from 4, 16, 64, and 256 projection angles]

Analytical Reconstruction Methods

(Filtered) Backprojection pseudo code:
  – Input: sinogram sino(θ, N)
  – Output: image img(x,y)

for each θ
    filter sino(θ,:)                       ; only for FBP
    for each x
        for each y
            n = x*cos(θ) + y*sin(θ)        ; round or interpolate to a detector bin
            img(x,y) = img(x,y) + sino(θ,n)

Backprojection (Step 1)

Start from an empty image:

0 0 0
0 0 0
0 0 0

Backprojection (Step 2)

Smear the row projection (5, 4, 6) along each row:

5 5 5
4 4 4
6 6 6

Backprojection (Step 3)

Add the main-diagonal projection (4, 2, 4, 4, 1, starting at the bottom-left corner) along each diagonal:

9 9 6
6 8 8
10 8 10

Backprojection (Step 4)

Add the column projection (6, 6, 3) down each column:

15 15 9
12 14 11
16 14 13

Backprojection (Step 5)

Add the anti-diagonal projection (1, 4, 7, 2, 1, starting at the top-left corner) along each anti-diagonal:

16 19 16
16 21 13
23 16 14

Backprojection vs. Original

Final step: normalize image power
  – Divide each pixel by θ·N (here 4 · 3 = 12)

Backprojection:

1.33 1.58 1.33
1.33 1.75 1.08
1.92 1.33 1.17

Original:

1 3 1
1 2 1
4 1 1
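The hand-worked backprojection steps can be checked with a short program. The sketch below assumes the same hypothetical four-direction geometry as the example: rows, main diagonal, columns, and anti-diagonal, with bins numbered from the bottom-left and top-left corners respectively. It adds all four projections into every pixel and can optionally divide by θ·N = 12. With the example sinogram, the unnormalized result is 16 19 16 / 16 21 13 / 23 16 14. The names `ray_bin` and `backproject_example` are illustrative only.

```c
#define N 3
#define NANG 4
#define NBIN (2 * N - 1)

/* Hypothetical helper (same geometry as the example): detector bin of pixel
   (row i, column j) for direction a = 0 rows, 1 main diagonal, 2 columns,
   3 anti-diagonal. */
static int ray_bin(int a, int i, int j)
{
    switch (a) {
    case 0:  return i;
    case 1:  return j - i + N - 1;
    case 2:  return j;
    default: return i + j;
    }
}

/* Unfiltered backprojection: every pixel receives the projection value of
   each ray through it; if normalize is nonzero, divide by
   (number of angles) * N as on the slide. */
void backproject_example(double sino[NANG][NBIN], double img[N][N], int normalize)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double acc = 0.0;
            for (int a = 0; a < NANG; a++)
                acc += sino[a][ray_bin(a, i, j)];
            img[i][j] = normalize ? acc / (NANG * N) : acc;
        }
}
```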

Note on Filtering

[Figure: reconstruction without filtering vs. with filtering]
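The slides do not say which filter produced the filtered projections in the next example. As one common choice, the sketch below applies the spatial-domain Ram-Lak (ramp) kernel for unit detector spacing by direct convolution. It is an illustration only and is not expected to reproduce the filtered values that follow. The names `ramlak` and `filter_projection` are hypothetical, and samples outside the detector are treated as zero.

```c
#define PI 3.14159265358979323846

/* Ram-Lak kernel for unit detector spacing:
   h(0) = 1/4, h(n) = 0 for even n != 0, h(n) = -1/(pi^2 n^2) for odd n. */
double ramlak(int n)
{
    if (n == 0)
        return 0.25;
    if (n % 2 == 0)
        return 0.0;
    return -1.0 / (PI * PI * (double)n * (double)n);
}

/* Filter one projection p[0..len-1] into q by direct convolution with the
   kernel; samples outside the detector are treated as zero. */
void filter_projection(const double *p, double *q, int len)
{
    for (int t = 0; t < len; t++) {
        double acc = 0.0;
        for (int k = 0; k < len; k++)
            acc += p[k] * ramlak(t - k);
        q[t] = acc;
    }
}
```

The kernel is symmetric and its values sum to zero over all n. The filter therefore removes the constant component of a projection, which is what suppresses the blur visible in the unfiltered reconstruction.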

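Filtered Backprojection (Step 1)

Start from an empty image:

0 0 0
0 0 0
0 0 0

Filtered projections:
  – Rows: 1.22, -0.73, 1.61
  – Main diagonal (starting at the bottom-left corner): 1.06, -0.84, 0.39, 0.61, -1.22
  – Columns: 1.16, 0.49, 0
  – Anti-diagonal (starting at the top-left corner): -0.55, -0.06, 1.55, -0.84, -0.11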

Filtered Backprojection (Step 2)

Smear the filtered row projection (1.22, -0.73, 1.61) along each row:

1.22 1.22 1.22
-0.73 -0.73 -0.73
1.61 1.61 1.61

Filtered Backprojection (Step 3)

Add the filtered main-diagonal projection (1.06, -0.84, 0.39, 0.61, -1.22, starting at the bottom-left corner) along each diagonal:

1.61 1.83 0
-1.57 -0.34 -0.12
2.67 0.77 2

Filtered Backprojection (Step 4)

Add the filtered column projection (1.16, 0.49, 0) down each column:

2.77 2.32 0
-0.41 0.15 -0.12
3.83 1.26 2

Filtered Backprojection (Step 5)

Add the filtered anti-diagonal projection (-0.55, -0.06, 1.55, -0.84, -0.11, starting at the top-left corner) along each anti-diagonal:

2.22 2.26 1.55
-0.47 1.7 -0.96
5.38 0.42 1.89

Filtered Backprojection vs. Original

Filtered backprojection:

2.22 2.26 1.55
-0.47 1.7 -0.96
5.38 0.42 1.89

Original:

1 3 1
1 2 1
4 1 1

Conventional Algebraic Reconstruction Methods

Problem Formulation

Formulate the reconstruction as a linear inverse problem:

$$Ax = b$$

where x is a column vector of length N² representing the pixels of the original image, A is an M × N² matrix representing the data acquisition process, and b is a column vector of length M representing the measured projection data.

We want to find a solution such that:

$$x = A^{-1}_{\text{left}}\, b, \qquad A^{-1}_{\text{left}} = \left(A^T A\right)^{-1} A^T$$

Notes on the Discretized Image x

The discretized image is denoted by:

$$X \in \mathbb{R}^{N \times N}$$

and by:

$$x = \mathrm{vec}(X) \in \mathbb{R}^{N^2 \times 1}$$

where x is obtained by stacking the columns of X.

Notes on the Projection Data b

There are a total of d detectors and θ projection angles, so that a total of M = d·θ measurements are used.

The measured projection data is then denoted by:

$$B \in \mathbb{R}^{d \times \theta}$$

and by:

$$b = \mathrm{vec}(B) \in \mathbb{R}^{d\theta \times 1} = \mathbb{R}^{M \times 1}$$

where b is obtained by stacking the columns of B.

Notes on the Acquisition Matrix A

The acquisition of projection data b from x is modeled by:

$$b_i = \sum_{j=1}^{N^2} a_{i,j}\, x_j, \qquad i = 1, 2, \ldots, M$$

where:

$$A = \left[a_{i,j}\right] \in \mathbb{R}^{M \times N^2}$$

and a_{i,j} is the contribution of pixel j to projection i.

Also, let:

$$A_i = A(i,:)^T$$

be a column vector that represents the ith ray, which computes the value of the ith projection (b_i = A_i^T x).
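For the 3×3 example the acquisition matrix can be written out explicitly. The sketch below is one possible construction, built on several assumptions:
  – a_{i,j} = 1 when ray i passes through pixel j, and 0 otherwise
  – indices are 0-based
  – rays are ordered direction by direction (rows, main diagonal, columns, anti-diagonal)
  – pixels are numbered by stacking the columns of X, as in x = vec(X)

`build_A`, `vec`, `forward`, `ray_bin`, and `ray_offset` are hypothetical names.

```c
#define N 3
#define NPIX (N * N) /* columns of A: one per pixel */
#define NANG 4
#define M 16         /* rays: 3 rows + 5 diagonals + 3 columns + 5 anti-diagonals */

/* First ray index of each direction (rows, main diagonal, columns, anti-diagonal). */
static const int ray_offset[NANG] = {0, 3, 8, 11};

/* Hypothetical helper: bin of pixel (row r, column c) within direction a. */
static int ray_bin(int a, int r, int c)
{
    switch (a) {
    case 0:  return r;
    case 1:  return c - r + N - 1;
    case 2:  return c;
    default: return r + c;
    }
}

/* a_{i,j} = 1 if ray i crosses pixel j (j = column-stacked pixel index). */
void build_A(double A[M][NPIX])
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < NPIX; j++)
            A[i][j] = 0.0;
    for (int a = 0; a < NANG; a++)
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++)
                A[ray_offset[a] + ray_bin(a, r, c)][c * N + r] = 1.0;
}

/* x = vec(X): stack the columns of X. */
void vec(double X[N][N], double *x)
{
    for (int c = 0; c < N; c++)
        for (int r = 0; r < N; r++)
            x[c * N + r] = X[r][c];
}

/* b_i = A_i^T x for every ray i. */
void forward(double A[M][NPIX], const double *x, double *b)
{
    for (int i = 0; i < M; i++) {
        b[i] = 0.0;
        for (int j = 0; j < NPIX; j++)
            b[i] += A[i][j] * x[j];
    }
}
```

Each row of this A sums to its ray length, and each column sums to 4 because every pixel lies on exactly one ray per direction. Real scanners produce fractional a_{i,j} from intersection lengths or interpolation weights instead.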

Iterative Reconstruction Algorithm

Let x^(k) denote the kth estimate of the reconstruction. Then:

$$x^{(k+1)} = x^{(k)} + \lambda A^T\left(b - A x^{(k)}\right)$$

where the relaxation factor λ is a scalar.
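Each step of this update needs only a residual and one product with A^T. The following is a minimal dense-matrix sketch with the hypothetical name `landweber_step` and a caller-supplied scratch buffer `r`. A real tomography code would store A sparsely.

```c
/* One step of x <- x + lambda * A^T (b - A x) for a dense row-major A (m x p).
   r is caller-provided scratch of length m. */
void landweber_step(const double *A, const double *b, double *x, double *r,
                    int m, int p, double lambda)
{
    for (int i = 0; i < m; i++) {            /* r = b - A x */
        double ax = 0.0;
        for (int j = 0; j < p; j++)
            ax += A[i * p + j] * x[j];
        r[i] = b[i] - ax;
    }
    for (int j = 0; j < p; j++) {            /* x += lambda * A^T r */
        double g = 0.0;
        for (int i = 0; i < m; i++)
            g += A[i * p + j] * r[i];
        x[j] += lambda * g;
    }
}
```

For example, with A = [[1,0],[0,1],[1,1]], b = (1, 2, 3), and λ = 0.3, repeated steps from x = 0 converge to x = (1, 2).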

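Proof of Convergence [1]

Let:

$$\Delta = I - \lambda A^T A$$

Then:

$$\begin{aligned}
x^{(k+1)} &= x^{(k)} + \lambda A^T\left(b - A x^{(k)}\right) \\
&= \Delta\, x^{(k)} + \lambda A^T b \\
&= \Delta^{k+1} x^{(0)} + \lambda\left(I + \Delta + \Delta^2 + \cdots + \Delta^k\right) A^T b
\end{aligned}$$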

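Proof of Convergence [2]

If A^T A is positive definite and λ is chosen so that the spectral radius of Δ is less than 1, then:

$$\lim_{k \to \infty} \Delta^{k+1} = 0$$

and

$$\lim_{k \to \infty} \left(I + \Delta + \Delta^2 + \cdots + \Delta^k\right) = (I - \Delta)^{-1}$$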

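Proof of Convergence [3]

Therefore:

$$\lim_{k \to \infty} x^{(k+1)} = \lambda (I - \Delta)^{-1} A^T b = \lambda\left(\lambda A^T A\right)^{-1} A^T b = \left(A^T A\right)^{-1} A^T b = A^{-1}_{\text{left}}\, b$$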
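The spectral-radius condition can be made concrete. The eigenvalues of Δ = I − λA^T A are 1 − λμ for each eigenvalue μ of A^T A. When A^T A is positive definite, the spectral radius of Δ is therefore below 1 exactly when 0 < λ < 2/μ_max.

One rough way to pick λ is to estimate μ_max by power iteration. The sketch below does that under the following assumptions:
  – The name `largest_eig_AtA` is hypothetical.
  – Vectors are normalised by their largest absolute entry, so no library calls are needed.
  – The start vector v has a largest absolute entry of 1 and a component along the dominant eigenvector.

```c
/* Estimate mu_max, the largest eigenvalue of A^T A, for a dense row-major
   A (m x p) by power iteration with max-abs normalisation.
   v (length p) is the start vector, overwritten with the estimated
   eigenvector; t (length m) and w (length p) are scratch. */
double largest_eig_AtA(const double *A, int m, int p, double *v,
                       double *t, double *w, int iters)
{
    double est = 0.0;
    for (int it = 0; it < iters; it++) {
        for (int i = 0; i < m; i++) {          /* t = A v */
            t[i] = 0.0;
            for (int j = 0; j < p; j++)
                t[i] += A[i * p + j] * v[j];
        }
        double big = 0.0;
        for (int j = 0; j < p; j++) {          /* w = A^T t = (A^T A) v */
            w[j] = 0.0;
            for (int i = 0; i < m; i++)
                w[j] += A[i * p + j] * t[i];
            const double aw = (w[j] < 0.0) ? -w[j] : w[j];
            if (aw > big)
                big = aw;
        }
        if (big == 0.0)
            return 0.0;                        /* v lies in the null space */
        for (int j = 0; j < p; j++)
            v[j] = w[j] / big;                 /* now max |v_j| = 1 */
        est = big;                             /* approaches mu_max as v converges */
    }
    return est;
}
```

For the small matrix A = [[1,0],[0,1],[1,1]], A^T A = [[2,1],[1,2]] has μ_max = 3, so any λ with 0 < λ < 2/3 satisfies the condition.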

# of Projections Needed for ART

Reconstruction on a square grid (N×N) with N detectors

Assuming a circular reconstruction region, we can ignore pixels outside this region:

$$\#\text{pixels} = \frac{\pi N^2}{4}$$

$$N_{ART} = \frac{\#\text{pixels}}{\#\text{detectors}} = \frac{\pi N^2 / 4}{N} = \frac{\pi N}{4}$$

# of Projections Needed for FBP [1]

Reconstructing a region with diameter L

The frequency-domain sampling interval must be at most:

$$\Delta\omega = \frac{1}{L}$$

with a maximum frequency of:

$$\omega_{max} = \frac{N}{2L}$$

Due to polar sampling, the density of samples decreases as we go outward on the polar grid

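# of Projections Needed for FBP [2]

To ensure a sample spacing of at most Δω everywhere, the angular spacing π/N_FBP at the outermost radius ω_max must satisfy:

$$\omega_{max} \cdot \frac{\pi}{N_{FBP}} \le \Delta\omega \quad\Longleftrightarrow\quad \frac{N}{2L} \cdot \frac{\pi}{N_{FBP}} \le \frac{1}{L}$$

therefore (minimum number of projections):

$$N_{FBP} = \frac{\pi N}{2}$$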
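The two estimates N_ART = πN/4 and N_FBP = πN/2 are easy to evaluate. The helpers below use hypothetical names and round up to whole projection angles. For N = 512 they give 403 and 805 angles, and for N = 1024 they give 805 and 1609.

```c
#define PI 3.14159265358979323846

/* Round a non-negative value up to the next integer
   (keeps the sketch free of <math.h>). */
static long ceil_pos(double v)
{
    long k = (long)v;
    return (v > (double)k) ? k + 1 : k;
}

/* Projection-count estimates for an N x N grid with N detectors. */
long n_art(int n) { return ceil_pos(PI * n / 4.0); } /* pi N / 4 */
long n_fbp(int n) { return ceil_pos(PI * n / 2.0); } /* pi N / 2 */
```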

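Matrix Formulation with Normalization

Introduce diagonal matrices V and W:
  – V: diagonal matrix of the inverse of the row sums
  – W: diagonal matrix of the inverse of the column sums

$$x^{(k+1)} = x^{(k)} + \lambda W A^T V\left(b - A x^{(k)}\right)$$

$$V = \mathrm{diag}\left(v_{i,i}\right), \qquad v_{i,i} = \frac{1}{\sum_{j=1}^{N^2} a_{i,j}}$$

$$W = \mathrm{diag}\left(w_{j,j}\right), \qquad w_{j,j} = \frac{1}{\sum_{i=1}^{M} a_{i,j}}$$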
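The diagonals of V and W are simply the inverse row sums and inverse column sums of A. The following is a minimal dense sketch with the hypothetical name `normalization_weights`. The slide does not say how zero sums are handled; giving a zero-sum row or column a weight of 0 is an assumption made here.

```c
/* Diagonals of V (inverse row sums) and W (inverse column sums) of a dense
   row-major A (m x p).  A zero row or column sum gets weight 0 (assumption). */
void normalization_weights(const double *A, int m, int p, double *v, double *w)
{
    for (int j = 0; j < p; j++)
        w[j] = 0.0;                       /* accumulate column sums first */
    for (int i = 0; i < m; i++) {
        double rs = 0.0;
        for (int j = 0; j < p; j++) {
            rs += A[i * p + j];
            w[j] += A[i * p + j];
        }
        v[i] = (rs != 0.0) ? 1.0 / rs : 0.0;
    }
    for (int j = 0; j < p; j++)
        w[j] = (w[j] != 0.0) ? 1.0 / w[j] : 0.0;
}
```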

Reconstruction Methods

Algebraic Reconstruction Technique (ART)
  – Update image after each ray is processed

Simultaneous Iterative Reconstruction Technique (SIRT)
  – Update image after all rays are processed

Simultaneous Algebraic Reconstruction Technique (SART)
  – Update image after all rays in a single projection angle are processed

ART

Image update method:
  – After each ray is processed

Pseudocode:

for k = 1:K
    x^(0) = x^(k)
    for i = 1:M
        x^(i) = x^(i-1) + λ V_ii W A_i (b_i - A_i^T x^(i-1))
    end
    x^(k+1) = x^(M)
end
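A per-ray ART sweep can be sketched in C as follows. The sketch keeps the row-sum normalization V_ii but takes W as the identity, which is an assumption. When a ray's weights a_{i,j} are all 0 or 1, its row sum equals its squared row norm. With λ = 1 each update is then the classical Kaczmarz projection onto that ray's equation. The dense row-major layout of A and the name `art_sweep` are also assumptions.

```c
/* One ART sweep over the m rays of a dense row-major A (m x p).
   After each ray i:  x <- x + lambda * v_i * A_i * (b_i - A_i^T x),
   with v_i = 1 / (row sum of A_i).  W is taken as the identity here. */
void art_sweep(const double *A, const double *b, double *x,
               int m, int p, double lambda)
{
    for (int i = 0; i < m; i++) {
        const double *ai = A + (long)i * p;   /* row i of A, i.e. A_i^T */
        double dot = 0.0, rowsum = 0.0;
        for (int j = 0; j < p; j++) {
            dot += ai[j] * x[j];
            rowsum += ai[j];
        }
        if (rowsum == 0.0)
            continue;                         /* ray misses every pixel */
        const double step = lambda * (b[i] - dot) / rowsum;
        for (int j = 0; j < p; j++)
            x[j] += step * ai[j];             /* image changes immediately */
    }
}
```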

ART (Iterations 1-6)

Iteration 1:

1 3 0.83
1 1.83 0.75
4.33 1.25 1

Iteration 2:

1 3.03 1.06
0.97 2 1.03
3.94 0.97 1

Iteration 3:

1 2.99 0.98
1.01 2 0.99
4.02 1.01 1

Iterations 4, 5, and 6 (identical to two decimal places):

1 3 1
1 2 1
4 1 1

SIRT

Image update method:
  – After all rays are processed

Pseudocode:

for k = 1:K
    x^(k+1) = x^(k) + λ W A^T V (b - A x^(k))
end
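The SIRT update differs from ART only in that every residual is computed from the same x before any pixel changes. The following is a minimal dense sketch with the hypothetical name `sirt_step`. It recomputes V and W on the fly and skips zero row or column sums, which is an assumption.

```c
/* One SIRT iteration on a dense row-major A (m x p):
   x <- x + lambda * W A^T V (b - A x).
   V and W (inverse row and column sums) are recomputed on the fly; rows or
   columns summing to zero are skipped.  r is scratch of length m. */
void sirt_step(const double *A, const double *b, double *x, double *r,
               int m, int p, double lambda)
{
    /* r = V (b - A x), using the x from the start of the iteration */
    for (int i = 0; i < m; i++) {
        double ax = 0.0, rs = 0.0;
        for (int j = 0; j < p; j++) {
            ax += A[i * p + j] * x[j];
            rs += A[i * p + j];
        }
        r[i] = (rs != 0.0) ? (b[i] - ax) / rs : 0.0;
    }
    /* x += lambda * W A^T r: pixels change only after all rays are processed */
    for (int j = 0; j < p; j++) {
        double bp = 0.0, cs = 0.0;
        for (int i = 0; i < m; i++) {
            bp += A[i * p + j] * r[i];
            cs += A[i * p + j];
        }
        if (cs != 0.0)
            x[j] += lambda * bp / cs;
    }
}
```

For A = [[1,0],[0,1],[1,1]], b = (1, 2, 3), λ = 1, and x = 0, one call gives x = (1.25, 1.75). Further calls converge to (1, 2).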

SIRT (Iterations 1-6, λ = 0.5)

Iteration 1:

0.67 3.5 0.66
0.83 2.17 0.33
5.83 0.67 0.33

Iteration 2:

0.78 3.43 0.86
0.76 2.08 1.01
4.28 0.85 0.94

Iteration 3:

0.94 3.2 0.91
0.87 2.04 0.99
4.12 0.91 1.02

Iteration 4:

0.97 3.1 0.95
0.94 2.02 1
4.05 0.96 1.01

Iteration 5:

0.99 3.05 0.97
0.97 2.01 1
4.03 0.98 1.01

Iteration 6:

0.99 3.03 0.99
0.98 2.01 1
4.01 0.99 1

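SART

Image update method:
  – After all rays in a single projection angle are processed

Pseudocode:

for k = 1:K
    x^(0) = x^(k)
    for θ = 1:Θ
        x^(θ) = x^(θ-1) + λ W_θ A_θ^T V_θ (b_θ - A_θ x^(θ-1))
    end
    x^(k+1) = x^(Θ)
end

where A_θ and b_θ are the rows of A and the entries of b for the rays at projection angle θ, and V_θ and W_θ are the inverse row and column sums of A_θ.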
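The worked SART example that follows can be reproduced in C. This sketch assumes:
  – λ = 1
  – the same hypothetical four-direction geometry as the 3×3 example
  – a uniform starting image of 15/9 ≈ 1.67, the value shown on the first SART step

Every pixel lies on exactly one ray per angle in this example, so the per-angle column sums in W_θ are all 1. Starting from that image, one call produces 1 3 0.83 / 1 1.83 0.75 / 4.33 1.25 1.

```c
#define N 3
#define NANG 4
#define NBIN (2 * N - 1)

/* Hypothetical helper, same geometry as the example: 0 = rows,
   1 = main diagonal, 2 = columns, 3 = anti-diagonal. */
static int ray_bin(int a, int i, int j)
{
    switch (a) {
    case 0:  return i;
    case 1:  return j - i + N - 1;
    case 2:  return j;
    default: return i + j;
    }
}

/* One SART iteration with lambda = 1: for each angle, compute all ray sums
   from the current image (Step 1), then add (measured - computed) / ray length
   to every pixel on each ray (Step 2). */
void sart_iteration(double sino[NANG][NBIN], double img[N][N])
{
    for (int a = 0; a < NANG; a++) {
        double sum[NBIN] = {0}, len[NBIN] = {0};
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                const int n = ray_bin(a, i, j);
                sum[n] += img[i][j];          /* A_theta x */
                len[n] += 1.0;                /* row sums of A_theta */
            }
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                const int n = ray_bin(a, i, j);
                img[i][j] += (sino[a][n] - sum[n]) / len[n];
            }
    }
}
```

The rays within one angle are disjoint in this example, so processing them one at a time (ART) gives exactly the same numbers as processing them together (SART).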

SART (Step 1, Iteration 1, Theta 1)

Current image:

1.67 1.67 1.67
1.67 1.67 1.67
1.67 1.67 1.67

Computed row sums: 5, 5, 5

SART (Step 2, Iteration 1, Theta 1)

Corrections, (measured - computed) / ray length: (5 - 5)/3 = 0, (4 - 5)/3 = -0.33, (6 - 5)/3 = 0.33

Updated image:

1.67 1.67 1.67
1.33 1.33 1.33
2 2 2

SART (Step 1, Iteration 1, Theta 2)

Current image:

1.67 1.67 1.67
1.33 1.33 1.33
2 2 2

Computed main-diagonal sums (starting at the bottom-left corner): 2, 3.33, 5, 3, 1.67

SART (Step 2, Iteration 1, Theta 2)

Corrections: (4 - 2)/1 = 2, (2 - 3.33)/2 = -0.67, (4 - 5)/3 = -0.33, (4 - 3)/2 = 0.5, (1 - 1.67)/1 = -0.67

Updated image:

1.33 2.17 1
0.67 1 1.83
4 1.33 1.67

SART (Step 1, Iteration 1, Theta 3)

Current image:

1.33 2.17 1
0.67 1 1.83
4 1.33 1.67

Computed column sums: 6, 4.5, 4.5

SART (Step 2, Iteration 1, Theta 3)

Corrections: (6 - 6)/3 = 0, (6 - 4.5)/3 = 0.5, (3 - 4.5)/3 = -0.5

Updated image:

1.33 2.67 0.5
0.67 1.5 1.33
4 1.83 1.17

SART (Step 1, Iteration 1, Theta 4)

Current image:

1.33 2.67 0.5
0.67 1.5 1.33
4 1.83 1.17

Computed anti-diagonal sums (starting at the top-left corner): 1.33, 3.33, 6, 3.17, 1.17

SART (Step 2, Iteration 1, Theta 4)

Corrections: (1 - 1.33)/1 = -0.33, (4 - 3.33)/2 = 0.33, (7 - 6)/3 = 0.33, (2 - 3.17)/2 = -0.58, (1 - 1.17)/1 = -0.17

Updated image (end of iteration 1):

1 3 0.83
1 1.83 0.75
4.33 1.25 1

SART (Iterations 1-6)

Iteration 1:

1 3 0.83
1 1.83 0.75
4.33 1.25 1

Iteration 2:

1 3.03 1.06
0.97 2 1.03
3.94 0.97 1

Iteration 3:

1 2.99 0.98
1.01 2 0.99
4.02 1.01 1

Iterations 4, 5, and 6 (identical to two decimal places):

1 3 1
1 2 1
4 1 1

Modeling Performance (CPU, GPU, FPGA)

Write C pseudo code for matrix-vector multiplication and vector-vector addition

Convert the C pseudo code to application-specific pseudo code (CPU = x86, GPU = OpenCL/CUDA)

Model latency and throughput of the pseudo code given:
  – CPU architecture:
      o Cache structure, frequency, total # of threads, etc.
  – Image reconstruction problem:
      o N, d, θ, A matrix sparsity (α), # of iterations, etc.
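The following is a rough illustration of the kind of model described above. It is a hypothetical sketch, not the model that produced the estimates in the tables that follow, and it rests on these assumptions:
  – runtime is dominated by two sparse matrix-vector products per iteration (A x and A^T r)
  – A stores α·M·N² nonzeros, with M = d·θ
  – each nonzero costs a fixed, hypothetical `cycles_per_nnz`
  – threads scale perfectly

Cache effects, which the slide lists, are deliberately left out.

```c
/* Hypothetical back-of-envelope model (not the model used for the tables):
   runtime = 2 sparse mat-vecs per iteration * nonzeros * cycles per nonzero
             / (clock frequency * threads). */

/* Nonzeros of A: alpha * M * N^2 with M = d * theta rays and N^2 pixels. */
double nnz_of_A(int n, int d, int nangles, double alpha)
{
    return alpha * (double)d * nangles * (double)n * n;
}

/* cycles_per_nnz is a hypothetical per-nonzero cost; perfect thread scaling
   and no cache effects are assumed. */
double model_seconds(int n, int d, int nangles, double alpha, int iters,
                     double cycles_per_nnz, double freq_hz, int threads)
{
    const double work = 2.0 * nnz_of_A(n, d, nangles, alpha) * iters;
    return work * cycles_per_nnz / (freq_hz * threads);
}
```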

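C Pseudo Code (Ax = b)

/* A: dense M x P matrix stored row by row, with P = N*N pixels (columns of A);
   x: P pixel values; b: M ray sums. */
void matvec(const float *A, const float *x, float *b, int M, int P)
{
    const float *Apos = &A[0];             /* walks A row by row          */
    for (int i = 0; i < M; i++)
    {
        const float *xpos = &x[0];         /* restart at x[0] for each row */
        float btemp = 0;
        for (int j = 0; j < P; j++)        /* one multiply-add per column  */
        {
            btemp += (*Apos++) * (*xpos++);
        }
        b[i] = btemp;
    }
}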
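The vector-vector addition mentioned on the previous slide is not shown in the deck. The following sketch computes y ← y + αx in the same pointer-walking style. The name `axpy` and its argument order are assumptions, echoing the BLAS routine of that name. In the iterative methods, the final x^(k+1) = x^(k) + λ(…) update is one such call.

```c
/* y <- y + alpha * x over n elements, in the pointer-walking style of the
   mat-vec above. */
void axpy(float alpha, const float *x, float *y, int n)
{
    const float *xpos = &x[0];
    float *ypos = &y[0];
    for (int j = 0; j < n; j++)
        *ypos++ += alpha * (*xpos++);
}
```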

x86 Pseudo Code (Ax = b)

loop_i:
        mov   ecx, edi              ; xpos = &x[0] (assumes edi holds &x[0])
        fldz                        ; btemp = 0
        mov   eax, hXXXX            ; j = number of columns of A
loop_j:
        fld   dword ptr [edx]       ; A_ij
        add   edx, 4                ; Apos++
        fmul  dword ptr [ecx]       ; A_ij*x_j
        add   ecx, 4                ; xpos++
        faddp st(1), st             ; btemp = btemp + A_ij*x_j
        dec   eax                   ; j--
        jnz   short loop_j          ; loop while j != 0
        fstp  dword ptr [ebx]       ; b_i = btemp (pop keeps the x87 stack balanced)
        add   ebx, 4                ; bpos++
        dec   esi                   ; i--
        jnz   short loop_i          ; loop while i != 0

Results for CPUs [1]

Processor                   | Xeon E5405 [1] | Xeon E5405 [1] | Xeon E5405 [1] | Xeon E5405 [1]
Architecture                | Penryn         | Penryn         | Penryn         | Penryn
Operating Frequency         | 2.00 GHz       | 2.00 GHz       | 2.00 GHz       | 2.00 GHz
Number of Cores             | 4              | 4              | 4              | 4
Number of Threads per Core  | 1              | 1              | 1              | 1
Total Threads Used          | 1              | 1              | 1              | 1

Reconstruction Specifics
Number of Pixels (NxN)      | 1024x1024      | 1024x1024      | 512x512        | 512x512
Number of Detectors (d)     | 1024           | 1024           | 512            | 512
Number of Angles (θ)        | 140            | 140            | 140            | 140
Matrix Sparsity (α)         | 0.098%         | 0.098%         | 0.195%         | 0.195%
Number of Iterations        | 30             | 30             | 30             | 30
Loop Unrolling              | Yes            | Yes            | Yes            | Yes
SIMD or Floating Point?     | Floating Point | SIMD           | Floating Point | SIMD

Reconstruction Time
Reported [s]                | 24.174         | 6.639          | 6.087          | 1.650
Estimated [s]               | 22.478         | 6.307          | 5.613          | 1.570
Accuracy (Est. / Reported)  | 92.982%        | 94.987%        | 92.214%        | 95.180%

[1] J.I. Agulleiro, E.M. Garzon, I. Garcia, J.J. Fernandez, "Multi-core Desktop Processors Make Possible Real-Time Electron Tomography," 2011 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 127-132, Feb. 2011.

Results for CPUs [2]

Processor                   | Xeon 3.4 [2]   | Xeon 3.4 [2]   | Xeon 3.4 [2]
Architecture                | NetBurst       | NetBurst       | NetBurst
Operating Frequency         | 3.40 GHz       | 3.40 GHz       | 3.40 GHz
Number of Cores             | 1              | 1              | 1
Number of Threads per Core  | 1              | 1              | 1
Total Threads Used          | 1              | 1              | 1

Reconstruction Specifics
Number of Pixels (NxN)      | 2048x2048      | 1024x1024      | 512x512
Number of Detectors (d)     | 2048           | 1024           | 512
Number of Angles (θ)        | 88             | 88             | 88
Matrix Sparsity (α)         | 0.049%         | 0.195%         | 0.977%
Number of Iterations        | 10             | 10             | 10
Loop Unrolling              | Yes            | Yes            | Yes
SIMD or Floating Point?     | Floating Point | Floating Point | Floating Point

Reconstruction Time
Reported [s]                | 4.512          | 2.227          | 1.336
Estimated [s]               | 5.488          | 2.558          | 1.509
Accuracy (Est. / Reported)  | 121.630%       | 114.875%       | 112.953%

[2] D.C. Diez, H. Mueller, A.S. Frangakis, "Implementation and Performance Evaluation of Reconstruction Algorithms on Graphics Processors," Journal of Structural Biology, vol. 157, no. 1, pp. 288-295, Jan. 2007.

Results for CPUs [3]

Processor                   | P4 2.40A [2]   | P4 2.40A [2]   | P4 2.40A [2]
Architecture                | Prescott       | Prescott       | Prescott
Operating Frequency         | 2.40 GHz       | 2.40 GHz       | 2.40 GHz
Number of Cores             | 1              | 1              | 1
Number of Threads per Core  | 2              | 2              | 2
Total Threads Used          | 2              | 2              | 2

Reconstruction Specifics
Number of Pixels (NxN)      | 2048x2048      | 1024x1024      | 512x512
Number of Detectors (d)     | 2048           | 1024           | 512
Number of Angles (θ)        | 88             | 88             | 88
Matrix Sparsity (α)         | 0.049%         | 0.195%         | 0.977%
Number of Iterations        | 10             | 10             | 10
Loop Unrolling              | Yes            | Yes            | Yes
SIMD or Floating Point?     | Floating Point | Floating Point | Floating Point

Reconstruction Time
Reported [s]                | 5.449          | 2.637          | 1.609
Estimated [s]               | 5.687          | 2.542          | 1.448
Accuracy (Est. / Reported)  | 104.355%       | 96.401%        | 89.958%

[2] D.C. Diez, H. Mueller, A.S. Frangakis, "Implementation and Performance Evaluation of Reconstruction Algorithms on Graphics Processors," Journal of Structural Biology, vol. 157, no. 1, pp. 288-295, Jan. 2007.

Results for CPUs [4]

Processor                   | 2x X5550 [3]   | 4x X7460 [3]   | 4x X7560 [3]
Architecture                | Nehalem        | Core           | Nehalem
Operating Frequency         | 2.66 GHz       | 2.66 GHz       | 2.27 GHz
Number of Cores             | 4              | 6              | 8
Number of Threads per Core  | 2              | 1              | 2
Total Threads Used          | 16             | 24             | 64

Reconstruction Specifics
Number of Pixels (NxN)      | 512x512        | 512x512        | 512x512
Number of Detectors (d)     | 512            | 512            | 512
Number of Angles (θ)        | 414            | 414            | 414
Matrix Sparsity (α)         | 0.391%         | 0.391%         | 0.391%
Number of Iterations        | 1              | 1              | 1
Loop Unrolling              | No             | No             | No
SIMD or Floating Point?     | Floating Point | Floating Point | Floating Point

Reconstruction Time
Reported [s]                | 0.138          | 0.099          | 0.045
Estimated [s]               | 0.150          | 0.098          | 0.044
Accuracy (Est. / Reported)  | 109.059%       | 98.765%        | 97.041%

[3] H.G. Hofmann, B. Keck, C. Rohkohl, J. Hornegger, "Comparing Performance of Many-core CPUs and GPUs for Static and Motion Compensated Reconstruction of C-arm CT Data," Medical Physics, vol. 38, no. 1, pp. 468-473, Jan. 2011.

Future Work

Modeling performance of SART on GPUs and FPGAs