FPGA Accelerated 3-D Tomography
Richard Dorrance
Progress Update: 09/07/12
2
Outline
Introduction to Tomography
Reconstruction Methods
– Analytical
o Backprojection
o Filtered Backprojection
– Algebraic
o Algebraic Reconstruction Technique (ART)
o Simultaneous Iterative Reconstruction Technique (SIRT)
o Simultaneous Algebraic Reconstruction Technique (SART)
Modeling Performance of Reconstruction Methods
Future Work
3
Tomography
Cross-sectional imaging technique using transmission or reflection data from multiple angles
Basis for CAT scan, MRI, PET, SPECT, ET, etc.
Computed Tomography (CT): a form of tomographic reconstruction on computers
4
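Cross-Sections by X-Ray Projections
Project X-rays through biological tissue; measure the total absorption of each ray by the tissue
Projection $P_\theta(t)$ is the Radon transform of the object function $f(x,y)$:
$P_\theta(t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\delta(x\cos\theta + y\sin\theta - t)\,dx\,dy$
The total set of projections is called the sinogram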
5
Shepp-Logan Phantom
Standard test image for tomographic reconstructions
6
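Example Image with Projections
Image:
1 3 1
1 2 1
4 1 1
Projections (ray sums):
Rows (top to bottom): 5 4 6
Diagonals (top-right corner to bottom-left corner): 1 4 4 2 4
Columns (left to right): 6 6 3
Anti-diagonals (top-left corner to bottom-right corner): 1 4 7 2 1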
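The ray sums of the 3×3 example can be reproduced with a few lines of C. This is a minimal sketch, not code from the slides: the function name `project` and the index conventions (diagonals indexed by `r - c + N - 1`, anti-diagonals by `r + c`) are assumptions made for illustration.

```c
#include <assert.h>

#define N 3                  /* image is N x N */
#define NDIAG (2 * N - 1)    /* rays per diagonal direction */

/* Sum an N x N image (row-major) along rows, columns, diagonals and
   anti-diagonals. diag[0] is the top-right corner pixel, anti[0] the
   top-left corner pixel (illustrative conventions). */
void project(const int *img, int rows[N], int cols[N],
             int diag[NDIAG], int anti[NDIAG])
{
    assert(img && rows && cols && diag && anti);
    for (int i = 0; i < N; i++) rows[i] = cols[i] = 0;
    for (int i = 0; i < NDIAG; i++) diag[i] = anti[i] = 0;

    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            int v = img[r * N + c];
            rows[r] += v;
            cols[c] += v;
            diag[r - c + N - 1] += v;
            anti[r + c] += v;
        }
    }
}
```

Calling `project` on the image 1 3 1 / 1 2 1 / 4 1 1 fills the four arrays with the row, column, diagonal, and anti-diagonal sums of the example.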
7
CT Reconstruction
Restore image from projection data
Inverse Radon transform
Most common algorithm is filtered backprojection
– “Smear” each projection over image plane
Accuracy of reconstruction depends on the number of detectors and projection angles
Figure panels: Original, 4 Angles, 16 Angles, 64 Angles, 256 Angles
8
Analytical Reconstruction Methods
(Filtered) Backprojection Pseudo Code:
– Input: sinogram sino(θ, n), with N detector samples per angle
– Output: image img(x,y)
for each θ
    filter sino(θ,:)                          ; only for FBP
    for each x
        for each y
            n = round(x*cos(θ) + y*sin(θ))    ; nearest detector bin for pixel (x,y)
            img(x,y) = sino(θ,n) + img(x,y)
9
Backprojection (Step 1)
0 0 0
0 0 0
0 0 0
Projections to be smeared: rows 5 4 6; diagonals 1 4 4 2 4; columns 6 6 3; anti-diagonals 1 4 7 2 1
10
Backprojection (Step 2)
5 5 5
4 4 4
6 6 6
Smeared in this step: row projection 5 4 6
11
Backprojection (Step 3)
9 9 6
6 8 8
10 8 10
Smeared in this step: diagonal projection 1 4 4 2 4 (top-right corner to bottom-left corner)
12
Backprojection (Step 4)
15 15 9
12 14 11
16 14 13
Smeared in this step: column projection 6 6 3
13
Backprojection (Step 5)
16 19 16
16 21 13
23 16 14
Smeared in this step: anti-diagonal projection 1 4 7 2 1 (top-left corner to bottom-right corner)
14
Backprojection vs. Original
Final Step: normalize image power
– Divide each pixel by θ·N (number of angles × image width; here 4·3 = 12)
Backprojection (normalized):
1.33 1.58 1.33
1.33 1.75 1.08
1.92 1.33 1.17
Original:
1 3 1
1 2 1
4 1 1
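The five backprojection steps and the final normalization can be checked numerically. The sketch below is an illustration for the 3×3 example only: it assumes the four ray directions of the example (rows, diagonals, columns, anti-diagonals), and the name `backproject4` and the diagonal index conventions are hypothetical.

```c
#include <assert.h>

#define N 3                  /* image is N x N */
#define NDIAG (2 * N - 1)    /* rays per diagonal direction */

/* Smear four projections over an N x N image and normalise by
   (number of angles * N). diag is indexed by r - c + N - 1
   (top-right corner first), anti by r + c (top-left corner first). */
void backproject4(const int rows[N], const int cols[N],
                  const int diag[NDIAG], const int anti[NDIAG],
                  double out[N * N])
{
    const int n_angles = 4;
    assert(rows && cols && diag && anti && out);
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            int sum = rows[r] + diag[r - c + N - 1] + cols[c] + anti[r + c];
            out[r * N + c] = (double)sum / (n_angles * N);
        }
    }
}
```

For the example projections the unnormalized sums are 16 19 16 / 16 21 13 / 23 16 14, and dividing by 12 gives the normalized grid shown on the slide.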
15
Note On Filtering
Figure panels: No Filtering, With Filtering
16
Filtered Backprojection (Step 1)
0 0 0
0 0 0
0 0 0
Filtered projections to be smeared: rows 1.22 -0.73 1.61; diagonals -1.22 0.61 0.39 -0.84 1.06; columns 1.16 0.49 0; anti-diagonals -0.55 -0.06 1.55 -0.84 -0.11
17
Filtered Backprojection (Step 2)
1.22 1.22 1.22
-0.73 -0.73 -0.73
1.61 1.61 1.61
Smeared in this step: filtered row projection 1.22 -0.73 1.61
18
Filtered Backprojection (Step 3)
1.61 1.83 0
-1.57 -0.34 -0.12
2.67 0.77 2
Smeared in this step: filtered diagonal projection -1.22 0.61 0.39 -0.84 1.06 (top-right corner to bottom-left corner)
19
Filtered Backprojection (Step 4)
0.45 2.32 0
-0.41 0.15 -0.12
3.83 1.26 2
Smeared in this step: filtered column projection 1.16 0.49 0
20
Filtered Backprojection (Step 5)
-0.1 2.26 1.55
-0.47 1.7 -0.96
5.38 0.42 1.89
Smeared in this step: filtered anti-diagonal projection -0.55 -0.06 1.55 -0.84 -0.11 (top-left corner to bottom-right corner)
21
Filtered Backprojection vs. Original
Filtered backprojection:
-0.1 2.26 1.55
-0.47 1.7 -0.96
5.38 0.42 1.89
Original:
1 3 1
1 2 1
4 1 1
22
Conventional Algebraic Reconstruction Methods
23
Problem Formulation
We want to formulate it as a Linear Inverse Problem:
$Ax = b$
where x is a column vector of length $N^2$ representing the pixels of the original image, A is an $M \times N^2$ matrix representing the data acquisition process, and b is a column vector of length M representing the measured projection data.
We want to find a solution such that:
$x = A_{\mathrm{left}}^{-1} b$
24
Notes on the Discretized Image x
The discretized image is denoted by:
$X \in \mathbb{R}^{N \times N}$
and by:
$x = \mathrm{vec}(X) \in \mathbb{R}^{N^2 \times 1}$
where x is obtained by stacking the columns of X.
25
Notes on the Projection Data b
There are a total of d detectors and θ projection angles, so that a total of M = d·θ measurements are used.
Then the measured projection data is denoted by:
$B \in \mathbb{R}^{d \times \theta}$
and by:
$b = \mathrm{vec}(B) \in \mathbb{R}^{d\theta \times 1} = \mathbb{R}^{M \times 1}$
where b is obtained by stacking the columns of B.
26
Notes on the Acquisition Matrix A
The acquisition of projection data b from x is modeled by:
$A \in \mathbb{R}^{M \times N^2}$
where:
$b_i = \sum_{j=1}^{N^2} a_{i,j}\,x_j, \quad i = 1, 2, \ldots, M$
$a_{i,j}$ is the contribution of pixel j to projection i.
Also, let:
$A_i = (A_{i,:})^T$
be a column vector that represents the ith ray, which computes the value of the ith projection.
27
Iterative Reconstruction Algorithm
Let $x^{(k)}$ denote the kth estimate of the reconstruction.
Then:
$x^{(k+1)} = x^{(k)} + \lambda A^T (b - A x^{(k)})$
where the relaxation factor λ is a scalar.
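The update rule above can be sketched in C for a small dense system. This is a minimal illustration, not the author's code: the name `landweber`, the row-major dense storage, and the fixed iteration count are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Run `iters` steps of x <- x + lambda * A^T (b - A x).
   A is m x n, row-major. x holds the initial estimate on entry and
   the final estimate on return. Returns 0 on success, -1 if the
   scratch buffer cannot be allocated. */
int landweber(const double *A, const double *b, double *x,
              int m, int n, double lambda, int iters)
{
    assert(m > 0 && n > 0 && iters >= 0);
    double *r = malloc((size_t)m * sizeof *r);   /* residual b - A x */
    if (r == NULL)
        return -1;

    for (int k = 0; k < iters; k++) {
        for (int i = 0; i < m; i++) {            /* r = b - A x */
            double ax = 0.0;
            for (int j = 0; j < n; j++)
                ax += A[i * n + j] * x[j];
            r[i] = b[i] - ax;
        }
        for (int j = 0; j < n; j++) {            /* x += lambda * A^T r */
            double g = 0.0;
            for (int i = 0; i < m; i++)
                g += A[i * n + j] * r[i];
            x[j] += lambda * g;
        }
    }
    free(r);
    return 0;
}
```

For example, with the 3×2 matrix whose rows are (1,0), (0,1), (1,1) and b = (1,2,3), the eigenvalues of $A^T A$ are 1 and 3, so λ = 0.5 gives an iteration that contracts by a factor of 0.5 per step and approaches x = (1,2).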
28
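Proof of Convergence [1]
Let
$\Delta = I - \lambda A^T A$
Then
$x^{(k+1)} = \Delta x^{(k)} + \lambda A^T b$
$= \Delta^2 x^{(k-1)} + \lambda (\Delta + I) A^T b$
$= \Delta^{k+1} x^{(0)} + \lambda (\Delta^k + \cdots + \Delta + I) A^T b$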
29
Proof of Convergence [2]
If $A^T A$ is positive definite and λ is chosen so that the spectral radius of Δ is less than 1, then:
$\lim_{k \to \infty} \Delta^{k+1} = 0$
and
$\lim_{k \to \infty} (\Delta^k + \cdots + \Delta + I) = (I - \Delta)^{-1}$
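The spectral-radius condition can be turned into an explicit range for λ. The worked step below is an added illustration; the symbols $\mu_j$ for the eigenvalues of $A^T A$ and $\rho$ for the spectral radius are notation introduced here, not taken from the slides. Because $A^T A$ is assumed positive definite, every $\mu_j$ is positive.

```latex
\Delta = I - \lambda A^T A
\;\Rightarrow\;
\rho(\Delta) = \max_j \lvert 1 - \lambda \mu_j \rvert

\rho(\Delta) < 1
\iff 0 < \lambda \mu_j < 2 \quad \text{for all } j
\iff 0 < \lambda < \frac{2}{\mu_{\max}}
```

In other words, any positive relaxation factor below $2/\mu_{\max}$ satisfies the condition, where $\mu_{\max}$ is the largest eigenvalue of $A^T A$.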
30
Proof of Convergence [3]
Therefore:
$\lim_{k \to \infty} x^{(k+1)} = \lambda (I - \Delta)^{-1} A^T b$
$= (A^T A)^{-1} A^T b$
$= A_{\mathrm{left}}^{-1} b$
31
# of Projections needed for ART
Reconstruction on a square grid (N×N) with N detectors
Assuming a circular reconstruction region, we can ignore pixels outside this region, which leaves about $\pi N^2 / 4$ pixels
$N_{ART} = \dfrac{\text{\# of pixels}}{\text{\# of detectors}} = \dfrac{\pi N^2 / 4}{N} = \dfrac{\pi N}{4}$
32
# of Projections needed for FBP [1]
Reconstructing region with diameter L
Sampling interval is at least:
$\Delta\omega = \dfrac{1}{L}$
with a maximum frequency of:
$\omega_{\max} = \dfrac{N}{2L}$
Due to polar sampling, the density of samples decreases as we go outward on the polar grid
33
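# of Projections needed for FBP [2]
To ensure a sampling rate of at least Δω everywhere:
$\omega_{\max} \cdot \dfrac{\pi}{N_{FBP}} \le \Delta\omega = \dfrac{1}{L}$
therefore:
$N_{FBP} \ge \pi L\,\omega_{\max} = \dfrac{\pi N}{2}$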
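To get a feel for the two projection-count estimates, $\pi N / 4$ for ART and $\pi N / 2$ for FBP, they can be evaluated for a concrete detector count. The helper names below are hypothetical, and the functions only evaluate these two formulas.

```c
#include <assert.h>
#include <math.h>

/* Estimated number of projection angles for an N x N image with
   N detectors, using the rules of thumb pi*N/4 (ART) and pi*N/2 (FBP). */
double n_projections_art(int n)
{
    assert(n > 0);
    return acos(-1.0) * n / 4.0;   /* acos(-1.0) is pi */
}

double n_projections_fbp(int n)
{
    assert(n > 0);
    return acos(-1.0) * n / 2.0;
}
```

For N = 512 this gives roughly 402 angles for ART and roughly 804 for FBP, so under these estimates ART needs about half as many projections.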
34
Matrix Formulation with Normalization
Introduce diagonal matrices V and W:
$x^{(k+1)} = x^{(k)} + \lambda W A^T V (b - A x^{(k)})$
V: diagonal matrix of the inverse of the row sums
$V_{i,i} = \dfrac{1}{\sum_{j=1}^{N^2} a_{i,j}}$
W: diagonal matrix of the inverse of the column sums
$W_{j,j} = \dfrac{1}{\sum_{i=1}^{M} a_{i,j}}$
35
Reconstruction Methods
Algebraic Reconstruction Technique (ART)
– Update image after each ray is processed
Simultaneous Iterative Reconstruction Technique (SIRT)
– Update image after all rays are processed
Simultaneous Algebraic Reconstruction Technique (SART)
– Update image after all rays in a single projection angle are processed
36
ART
Image update method:
– After each ray is processed
Pseudocode:
for k = 1:K
    for i = 1:M          ; with $x_0 = x^{(k)}$
        $x_i = x_{i-1} + \lambda W_i A_i V_i (b_i - A_i^T x_{i-1})$
    end
    $x^{(k+1)} = x_M$
end
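A ray-by-ray update of this kind can be sketched in C on the 3×3 example. Several things here are assumptions rather than facts from the slides: the system matrix is taken to be binary (a ray either crosses a pixel or not), so the per-ray column normalization reduces to 1 and only the row-sum normalization remains; the ray order is rows, diagonals, columns, anti-diagonals; and all function names are hypothetical.

```c
#include <assert.h>

#define N 3                              /* image is N x N */
#define NPIX (N * N)                     /* number of unknown pixels */
#define NRAY (2 * N + 2 * (2 * N - 1))   /* 3 rows + 5 diagonals + 3 columns + 5 anti-diagonals */

/* Binary system matrix: a[i][j] = 1 when ray i crosses pixel j.
   Ray order: rows, diagonals, columns, anti-diagonals. */
void build_system(double a[NRAY][NPIX])
{
    for (int i = 0; i < NRAY; i++)
        for (int j = 0; j < NPIX; j++)
            a[i][j] = 0.0;
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            int j = r * N + c;
            a[r][j] = 1.0;                       /* row ray           */
            a[N + (r - c + N - 1)][j] = 1.0;     /* diagonal ray      */
            a[3 * N - 1 + c][j] = 1.0;           /* column ray        */
            a[4 * N - 1 + (r + c)][j] = 1.0;     /* anti-diagonal ray */
        }
    }
}

/* Forward projection b = A x. */
void forward(double a[NRAY][NPIX], const double x[NPIX], double b[NRAY])
{
    for (int i = 0; i < NRAY; i++) {
        b[i] = 0.0;
        for (int j = 0; j < NPIX; j++)
            b[i] += a[i][j] * x[j];
    }
}

/* One ART pass: the image is corrected after every single ray. */
void art_pass(double a[NRAY][NPIX], const double b[NRAY],
              double x[NPIX], double lambda)
{
    for (int i = 0; i < NRAY; i++) {
        double proj = 0.0, row_sum = 0.0;
        for (int j = 0; j < NPIX; j++) {
            proj += a[i][j] * x[j];
            row_sum += a[i][j];
        }
        assert(row_sum > 0.0);                   /* every ray crosses a pixel */
        double corr = lambda * (b[i] - proj) / row_sum;
        for (int j = 0; j < NPIX; j++)
            x[j] += a[i][j] * corr;              /* spread correction along the ray */
    }
}
```

Starting from a uniform image with every pixel equal to 15/9 ≈ 1.67 (the image mean, assumed here as the initial estimate) and λ = 1, one call to `art_pass` yields 1 3 0.83 / 1 1.83 0.75 / 4.33 1.25 1.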
37
ART (Iterations 1-6)
Iteration 1:
1 3 0.83
1 1.83 0.75
4.33 1.25 1
Iteration 2:
1 3.03 1.06
0.97 2 1.03
3.94 0.97 1
Iteration 3:
1 2.99 0.98
1.01 2 0.99
4.02 1.01 1
Iterations 4, 5, and 6 (identical to two decimals):
1 3 1
1 2 1
4 1 1
38
SIRT
Image update method:
– After all rays are processed
Pseudocode:
for k = 1:K
    $x^{(k+1)} = x^{(k)} + \lambda W A^T V (b - A x^{(k)})$
end
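The SIRT update can be sketched in C for a small dense system, with V and W built from the row and column sums of A. This is an illustrative version only: the name `sirt`, the row-major dense storage, and the choice to treat an all-zero row or column as weight 0 are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Run `iters` steps of x <- x + lambda * W A^T V (b - A x), where
   V holds the inverse row sums and W the inverse column sums of A.
   A is m x n, row-major. Returns 0 on success, -1 on allocation failure. */
int sirt(const double *A, const double *b, double *x,
         int m, int n, double lambda, int iters)
{
    assert(m > 0 && n > 0 && iters >= 0);
    double *v = malloc((size_t)m * sizeof *v);   /* diagonal of V     */
    double *w = malloc((size_t)n * sizeof *w);   /* diagonal of W     */
    double *r = malloc((size_t)m * sizeof *r);   /* V (b - A x)       */
    if (!v || !w || !r) {
        free(v); free(w); free(r);
        return -1;
    }
    for (int i = 0; i < m; i++) {                /* inverse row sums */
        double s = 0.0;
        for (int j = 0; j < n; j++) s += A[i * n + j];
        v[i] = (s != 0.0) ? 1.0 / s : 0.0;
    }
    for (int j = 0; j < n; j++) {                /* inverse column sums */
        double s = 0.0;
        for (int i = 0; i < m; i++) s += A[i * n + j];
        w[j] = (s != 0.0) ? 1.0 / s : 0.0;
    }
    for (int k = 0; k < iters; k++) {
        for (int i = 0; i < m; i++) {            /* weighted residual, all rays */
            double ax = 0.0;
            for (int j = 0; j < n; j++) ax += A[i * n + j] * x[j];
            r[i] = v[i] * (b[i] - ax);
        }
        for (int j = 0; j < n; j++) {            /* single image update */
            double g = 0.0;
            for (int i = 0; i < m; i++) g += A[i * n + j] * r[i];
            x[j] += lambda * w[j] * g;
        }
    }
    free(v); free(w); free(r);
    return 0;
}
```

Unlike ART, the image is touched only once per iteration, after every ray has contributed to the residual, which makes the inner loops straightforward to parallelize.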
39
SIRT (Iterations 1-6, λ = 0.5)
Iteration 1:
0.67 3.5 0.66
0.83 2.17 0.33
5.83 0.67 0.33
Iteration 2:
0.78 3.43 0.86
0.76 2.08 1.01
4.28 0.85 0.94
Iteration 3:
0.94 3.2 0.91
0.87 2.04 0.99
4.12 0.91 1.02
Iteration 4:
0.97 3.1 0.95
0.94 2.02 1
4.05 0.96 1.01
Iteration 5:
0.99 3.05 0.97
0.97 2.01 1
4.03 0.98 1.01
Iteration 6:
0.99 3.03 0.99
0.98 2.01 1
4.01 0.99 1
40
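SART
Image update method:
– After all rays in a single projection angle are processed
Pseudocode:
for k = 1:K
    for θ = 1:Θ          ; with $x_0 = x^{(k)}$
        $x_\theta = x_{\theta-1} + \lambda W_\theta A_\theta^T V_\theta (b_\theta - A_\theta x_{\theta-1})$
    end
    $x^{(k+1)} = x_\Theta$
end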
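The per-angle update can be sketched in C on the 3×3 example with four angles. As assumptions for illustration: the system matrix is binary, the angles are taken in the order rows, diagonals, columns, anti-diagonals, V and W are computed from the rays of the current angle only, and all names (`build_system`, `forward`, `sart_iteration`, `first_ray`) are hypothetical.

```c
#include <assert.h>

#define N 3                              /* image is N x N */
#define NPIX (N * N)                     /* number of unknown pixels */
#define NRAY (2 * N + 2 * (2 * N - 1))   /* 16 rays in total */
#define NANGLE 4

/* first_ray[t] .. first_ray[t+1]-1 are the rays of angle t. */
static const int first_ray[NANGLE + 1] = {0, N, 3 * N - 1, 4 * N - 1, NRAY};

/* Binary system matrix: a[i][j] = 1 when ray i crosses pixel j. */
void build_system(double a[NRAY][NPIX])
{
    for (int i = 0; i < NRAY; i++)
        for (int j = 0; j < NPIX; j++)
            a[i][j] = 0.0;
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            int j = r * N + c;
            a[r][j] = 1.0;                       /* angle 1: rows           */
            a[N + (r - c + N - 1)][j] = 1.0;     /* angle 2: diagonals      */
            a[3 * N - 1 + c][j] = 1.0;           /* angle 3: columns        */
            a[4 * N - 1 + (r + c)][j] = 1.0;     /* angle 4: anti-diagonals */
        }
    }
}

/* Forward projection b = A x. */
void forward(double a[NRAY][NPIX], const double x[NPIX], double b[NRAY])
{
    for (int i = 0; i < NRAY; i++) {
        b[i] = 0.0;
        for (int j = 0; j < NPIX; j++)
            b[i] += a[i][j] * x[j];
    }
}

/* One SART iteration: the image is updated once per projection angle. */
void sart_iteration(double a[NRAY][NPIX], const double b[NRAY],
                    double x[NPIX], double lambda)
{
    for (int t = 0; t < NANGLE; t++) {
        double num[NPIX] = {0.0};   /* A_t^T V_t (b_t - A_t x) */
        double den[NPIX] = {0.0};   /* column sums of A_t      */
        for (int i = first_ray[t]; i < first_ray[t + 1]; i++) {
            double proj = 0.0, row_sum = 0.0;
            for (int j = 0; j < NPIX; j++) {
                proj += a[i][j] * x[j];
                row_sum += a[i][j];
            }
            assert(row_sum > 0.0);
            double res = (b[i] - proj) / row_sum;
            for (int j = 0; j < NPIX; j++) {
                num[j] += a[i][j] * res;
                den[j] += a[i][j];
            }
        }
        for (int j = 0; j < NPIX; j++)           /* apply W_t and lambda */
            if (den[j] > 0.0)
                x[j] += lambda * num[j] / den[j];
    }
}
```

Starting from a uniform image of 15/9 ≈ 1.67 with λ = 1, one call to `sart_iteration` gives 1 3 0.83 / 1 1.83 0.75 / 4.33 1.25 1. In this example the rays of one angle never share a pixel, so the per-angle update coincides with a ray-by-ray update.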
41
SART (Step 1, Iteration 1, Theta 1)
Current estimate:
1.67 1.67 1.67
1.67 1.67 1.67
1.67 1.67 1.67
Row projections of the estimate: 5 5 5
42
SART (Step 2, Iteration 1, Theta 1)
Updated estimate:
1.67 1.67 1.67
1.33 1.33 1.33
2 2 2
Row corrections: (5 - 5)/3 = 0, (4 - 5)/3 = -0.33, (6 - 5)/3 = 0.33
43
SART (Step 1, Iteration 1, Theta 2)
Current estimate:
1.67 1.67 1.67
1.33 1.33 1.33
2 2 2
Diagonal projections of the estimate: 1.67 3 5 3.33 2
44
SART (Step 2, Iteration 1, Theta 2)
Updated estimate:
1.33 2.17 1
0.67 1 1.83
4 1.33 1.67
Diagonal corrections: (1 - 1.67)/1 = -0.67, (4 - 3)/2 = 0.5, (4 - 5)/3 = -0.33, (2 - 3.33)/2 = -0.67, (4 - 2)/1 = 2
45
SART (Step 1, Iteration 1, Theta 3)
Current estimate:
1.33 2.17 1
0.67 1 1.83
4 1.33 1.67
Column projections of the estimate: 6 4.5 4.5
46
SART (Step 2, Iteration 1, Theta 3)
Updated estimate:
1.33 2.67 0.5
0.67 1.5 1.33
4 1.83 1.17
Column corrections: (6 - 6)/3 = 0, (6 - 4.5)/3 = 0.5, (3 - 4.5)/3 = -0.5
47
SART (Step 1, Iteration 1, Theta 4)
Current estimate:
1.33 2.67 0.5
0.67 1.5 1.33
4 1.83 1.17
Anti-diagonal projections of the estimate: 1.33 3.33 6 3.17 1.17
48
SART (Step 2, Iteration 1, Theta 4)
Updated estimate:
1 3 0.83
1 1.83 0.75
4.33 1.25 1
Anti-diagonal corrections: (1 - 1.33)/1 = -0.33, (4 - 3.33)/2 = 0.33, (7 - 6)/3 = 0.33, (2 - 3.17)/2 = -0.59, (1 - 1.17)/1 = -0.17
49
SART (Iterations 1-6)
Iteration 1:
1 3 0.83
1 1.83 0.75
4.33 1.25 1
Iteration 2:
1 3.03 1.06
0.97 2 1.03
3.94 0.97 1
Iteration 3:
1 2.99 0.98
1.01 2 0.99
4.02 1.01 1
Iterations 4, 5, and 6 (identical to two decimals):
1 3 1
1 2 1
4 1 1
50
Modeling Performance (CPU, GPU, FPGA)
Write C pseudo code for Matrix-Vector multiplication and Vector-Vector addition
Convert C pseudo code to application-specific pseudo code (CPU = x86, GPU = OpenCL/CUDA)
Model latency and throughput of pseudo code given:
– CPU architecture:
o Cache structure, frequency, total # of threads, etc.
– Image reconstruction problem:
o N, d, θ, A matrix sparsity (α), # of iterations, etc.
51
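C Pseudo Code (Ax = b)
/* Dense matrix-vector product b = A*x.
   A is M x N, stored row-major and contiguously; N here is the number
   of columns of A (the number of pixels in the earlier notation). */
void matvec(const float *A, const float *x, float *b, int M, int N)
{
    const float *Apos = A;              /* walks A one row after another */
    for (int i = 0; i < M; i++)
    {
        const float *xpos = x;          /* restart at x[0] for every row */
        float btemp = 0.0f;             /* accumulator for b[i] */
        for (int j = 0; j < N; j++)
        {
            btemp += (*Apos++) * (*xpos++);
        }
        b[i] = btemp;
    }
}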
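The modeling plan also lists vector-vector addition next to the matrix-vector product. A minimal sketch of that second kernel, in the same pointer style, could look like the following; the name `vec_axpy` and the scaled form y += alpha*x (which covers the relaxation factor in the iterative updates) are assumptions, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += alpha * x[i] for i in [0, n): the vector update that follows
   each matrix-vector product in the iterative methods. */
void vec_axpy(float alpha, const float *x, float *y, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

With alpha = 1 this is a plain vector addition; with alpha set to the relaxation factor it applies a scaled correction to the current image estimate.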
52
x86 Pseudo Code (Ax = b)
loop_i:                          ;
  fldz                           ; btemp = 0
  mov   ecx, hXXXX               ; xpos = &x[0] (placeholder address)
  mov   eax, hXXXX               ; j = N (number of columns; placeholder)
loop_j:                          ;
  fld   dword ptr [edx]          ; A_ij
  add   edx, 4                   ; Apos++
  fmul  dword ptr [ecx]          ; A_ij*x_j
  add   ecx, 4                   ; xpos++
  faddp st(1), st                ; btemp = btemp + A_ij*x_j
  dec   eax                      ; j--
  jnz   short loop_j             ; loop if j != 0
  fstp  dword ptr [ebx]          ; b_i = btemp (pop to keep the FPU stack balanced)
  add   ebx, 4                   ; bpos++
  dec   esi                      ; i--
  jnz   short loop_i             ; loop if i != 0
53
Results for CPUs [1]
Processor | Xeon E5405 [1] | Xeon E5405 [1] | Xeon E5405 [1] | Xeon E5405 [1]
Architecture | Penryn | Penryn | Penryn | Penryn
Operating Frequency | 2.00 GHz | 2.00 GHz | 2.00 GHz | 2.00 GHz
Number of Cores | 4 | 4 | 4 | 4
Number of Threads per Core | 1 | 1 | 1 | 1
Total Threads Used | 1 | 1 | 1 | 1
Reconstruction Specifics
Number of Pixels (NxN) | 1024x1024 | 1024x1024 | 512x512 | 512x512
Number of Detectors (D) | 1024 | 1024 | 512 | 512
Number of Angles (θ) | 140 | 140 | 140 | 140
Matrix Sparsity (α) | 0.098% | 0.098% | 0.195% | 0.195%
Number of Iterations | 30 | 30 | 30 | 30
Loop Unrolling | Yes | Yes | Yes | Yes
SIMD or Floating Point? | Floating Point | SIMD | Floating Point | SIMD
Reconstruction Time
Reported [s] | 24.174 | 6.639 | 6.087 | 1.650
Estimated [s] | 22.478 | 6.307 | 5.613 | 1.570
Accuracy (Estimated / Reported) [%] | 92.982% | 94.987% | 92.214% | 95.180%
[1] J.I. Agulleiro, E.M. Garzon, I. Garcia, J.J. Fernandez, "Multi-core Desktop Processors Make Possible Real-Time Electron Tomography," 2011 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 127-132, Feb. 2011.
54
Results for CPUs [2]
Processor | Xeon 3.4 [2] | Xeon 3.4 [2] | Xeon 3.4 [2]
Architecture | NetBurst | NetBurst | NetBurst
Operating Frequency | 3.40 GHz | 3.40 GHz | 3.40 GHz
Number of Cores | 1 | 1 | 1
Number of Threads per Core | 1 | 1 | 1
Total Threads Used | 1 | 1 | 1
Reconstruction Specifics
Number of Pixels (NxN) | 2048x2048 | 1024x1024 | 512x512
Number of Detectors (D) | 2048 | 1024 | 512
Number of Angles (θ) | 88 | 88 | 88
Matrix Sparsity (α) | 0.049% | 0.195% | 0.977%
Number of Iterations | 10 | 10 | 10
Loop Unrolling | Yes | Yes | Yes
SIMD or Floating Point? | Floating Point | Floating Point | Floating Point
Reconstruction Time
Reported [s] | 4.512 | 2.227 | 1.336
Estimated [s] | 5.488 | 2.558 | 1.509
Accuracy (Estimated / Reported) [%] | 121.630% | 114.875% | 112.953%
[2] D.C. Diez, H. Mueller, A.S. Frangakis, "Implementation and Performance Evaluation of Reconstruction Algorithms on Graphics Processors," Journal of Structural Biology, vol. 157, no. 1, pp. 288-295, Jan. 2007.
55
Results for CPUs [3]
Processor | P4 2.40A [2] | P4 2.40A [2] | P4 2.40A [2]
Architecture | Prescott | Prescott | Prescott
Operating Frequency | 2.40 GHz | 2.40 GHz | 2.40 GHz
Number of Cores | 1 | 1 | 1
Number of Threads per Core | 2 | 2 | 2
Total Threads Used | 2 | 2 | 2
Reconstruction Specifics
Number of Pixels (NxN) | 2048x2048 | 1024x1024 | 512x512
Number of Detectors (D) | 2048 | 1024 | 512
Number of Angles (θ) | 88 | 88 | 88
Matrix Sparsity (α) | 0.049% | 0.195% | 0.977%
Number of Iterations | 10 | 10 | 10
Loop Unrolling | Yes | Yes | Yes
SIMD or Floating Point? | Floating Point | Floating Point | Floating Point
Reconstruction Time
Reported [s] | 5.449 | 2.637 | 1.609
Estimated [s] | 5.687 | 2.542 | 1.448
Accuracy (Estimated / Reported) [%] | 104.355% | 96.401% | 89.958%
[2] D.C. Diez, H. Mueller, A.S. Frangakis, "Implementation and Performance Evaluation of Reconstruction Algorithms on Graphics Processors," Journal of Structural Biology, vol. 157, no. 1, pp. 288-295, Jan. 2007.
56
Results for CPUs [4]
Processor | 2x X5550 [3] | 4x X7460 [3] | 4x X7560 [3]
Architecture | Nehalem | Core | Nehalem
Operating Frequency | 2.66 GHz | 2.66 GHz | 2.27 GHz
Number of Cores | 4 | 6 | 8
Number of Threads per Core | 2 | 1 | 2
Total Threads Used | 16 | 24 | 64
Reconstruction Specifics
Number of Pixels (NxN) | 512x512 | 512x512 | 512x512
Number of Detectors (D) | 512 | 512 | 512
Number of Angles (θ) | 414 | 414 | 414
Matrix Sparsity (α) | 0.391% | 0.391% | 0.391%
Number of Iterations | 1 | 1 | 1
Loop Unrolling | No | No | No
SIMD or Floating Point? | Floating Point | Floating Point | Floating Point
Reconstruction Time
Reported [s] | 0.138 | 0.099 | 0.045
Estimated [s] | 0.150 | 0.098 | 0.044
Accuracy (Estimated / Reported) [%] | 109.059% | 98.765% | 97.041%
[3] H.G. Hofmann, B. Keck, C. Rohkohl, J. Hornegger, "Comparing Performance of Many-core CPUs and GPUs for Static and Motion Compensated Reconstruction of C-arm CT Data," Medical Physics, vol. 38, no. 1, pp. 468-473, Jan. 2011.
57
Future Work
Modeling performance of SART on GPUs and FPGAs