This article was downloaded by: [University of California, San Francisco]
On: 30 September 2014, At: 01:25
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales. Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Parallel Algorithms and Applications
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gpaa19

An experiment of parallel SPECT data reconstruction
E. Loli Piccolomini & F. Zama
Department of Mathematics, University of Bologna, Piazza Porta S. Donato 5, 40127, Bologna, Italy
Published online: 17 Oct 2011.

To cite this article: E. Loli Piccolomini & F. Zama (2003) An experiment of parallel SPECT data reconstruction, Parallel Algorithms and Applications, 18:3, 107-119, DOI: 10.1080/1063719032000115335

To link to this article: http://dx.doi.org/10.1080/1063719032000115335
AN EXPERIMENT OF PARALLEL SPECT DATA RECONSTRUCTION
E. LOLI PICCOLOMINI and F. ZAMA*
Department of Mathematics, University of Bologna, Piazza Porta S. Donato 5, 40127, Bologna, Italy
(Received 22 March 2001; In final form May 2002)
In this work we use a massively parallel architecture for solving the problem of reconstructing human brain sections from experimental data obtained with a Gamma camera equipped with parallel-hole collimators. We compute regularized least-squares solutions by means of weighted conjugate gradient iterations coupled with a suitable stopping rule. The computations are distributed among the CRAY T3E parallel processors following two different decomposition strategies, obtaining high speed-up values. These decomposition strategies can be easily extended to a wide family of iterative algebraic reconstruction methods.
Keywords: SPECT; Reconstruction; NPES; Distributed memory computing; Conjugate gradients; Computed tomography
INTRODUCTION
Different computed tomography (CT) problems, such as single photon emission tomography (SPECT) and positron emission tomography (PET), are modelled by first-kind Fredholm integral equations:
$$g(u, \theta) = \int_V K(u, r, \theta)\, f(r)\, dr \qquad (1)$$
where the unknown function f represents the emission phenomenon at the coordinate points r = (x, y, z) over the domain V ⊂ R³. The function g is the volume integral of the activity weighted by the probability of detection and the total attenuation at each point. In particular, SPECT consists of the detection of the gamma rays emitted singly by radioactive atoms (radionuclides) contained in a radiopharmaceutical injected into a particular organ to determine its functional activity. The detector, called a Gamma camera, rotates around the object describing a circular orbit in the plane x, y. The surface of the Gamma camera is described by the coordinates u = (t, ℓ) (see Fig. 1) and is parallel to the z axis. For each angle 0 ≤ θ ≤ 2π, the photons that pass through the holes of the collimator, placed on the surface of the Gamma camera, are counted and constitute the values of g. The kernel function K weights the contribution of each point of the object f and models the physical phenomena involved in the acquisition phase.
ISSN 1063-7192 print/ISSN 1029-032X online © 2003 Taylor & Francis Ltd
DOI: 10.1080/1063719032000115335
*Corresponding author. E-mail: [email protected]
Parallel Algorithms and Applications, Vol. 18 (3) September 2003, pp. 107–119
Fast and low-storage direct methods, such as the Filtered Back Projection, based on a line-integral model, are not suitable for CT problems with high noise rates such as SPECT [3,6,9]. Iterative model-based methods, such as Conjugate Gradient (CG) [10,11], Expectation Maximization-Maximum Likelihood (EM-ML) [12] and Ordered Subset Expectation Maximization (OSEM) [13], offer greater flexibility, allowing better modelling of acquisition and noise, but require a greater computational burden in terms of time and storage. In this work we use the CG method, since it finds a good solution within few iterations. The merits of the Conjugate Gradient iterations for the solution of tomographic problems will not be discussed here in detail; comparisons with other methods can be found in many papers, such as [3,6,11].
Our purpose here is to evaluate two parallel algorithms on real data and to show the benefit, in terms of faster reconstructions, that can be achieved using parallel distributed memory architectures. As shown in [14], distributed memory parallel architectures can be usefully applied to reduce the computational times necessary for clinical applications while preserving accurate modelling. In this work we apply a decomposition technique to distribute data and computations among several CRAY T3E processors. Since the communication primitives are written using free software, namely the Parallel Virtual Machine (PVM) [8], the application can be run in any laboratory with a network of workstations or PCs with PVM installed.
The model and the algorithm used to solve the discrete problem are analyzed in the second
section. In the third section we report details of the parallel implementation and in the fourth
section we show some meaningful results obtained in numerical experiments.
THE MATHEMATICAL MODEL
In the mathematical model used in this paper the kernel function in Eq. (1) accounts for attenuation and the geometrical system response:

$$K(u, r, \theta) = \exp(-\mu_r p)\, G(u, r, \theta).$$
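A point evaluation of this kernel can be sketched as follows; the function names and the scalar interface are assumptions of this sketch, with G the Gaussian response function defined below:

```python
import numpy as np

def gaussian_response(t, l, t_r, l_r, sigma_q):
    """2D Gaussian detector response G centred on the projected point (t_r, l_r);
    sigma_q stands for the depth-dependent width sigma(q)."""
    du = (t - t_r) ** 2 + (l - l_r) ** 2
    return np.exp(-du / (2.0 * sigma_q ** 2)) / (np.sqrt(2.0 * np.pi) * sigma_q)

def kernel(t, l, t_r, l_r, sigma_q, mu_r, p):
    """K = exp(-mu_r * p) * G: attenuation factor times geometric response."""
    return np.exp(-mu_r * p) * gaussian_response(t, l, t_r, l_r, sigma_q)
```

With no attenuation (mu_r = 0) and the detector bin exactly on the projected point, the kernel reduces to the Gaussian peak value 1/(√(2π)·σ(q)).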
FIGURE 1 Model of the acquisition for the 3D SPECT emission problem. The object to be reconstructed is represented by the sphere. The projection of one single point (x_r, y_r, z_r) is represented by the coordinates (ℓ_r, t_r) onto the rotated projection plane T_θ.
The term exp(−μ_r p) represents the total cross-section attenuation and decreases the number of counts relative to what it would be if the activity were in air. It depends on the value of p, the attenuation thickness between the collimator plate and the point r, and on the attenuation coefficient μ_r, which is related to the tissue examined.
The function G expresses the Gaussian response function:

$$G(u, r, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma(q)} \exp\!\left(-\frac{\Delta u}{2\sigma(q)^2}\right)$$

where q is the distance of the point r = (x, y, z) from the Gamma camera collimator surface,

$$\Delta u = (t - t_r)^2 + (\ell - \ell_r)^2$$

where:

$$t_r = -x \sin\theta + y \cos\theta, \qquad \ell_r = z \qquad (2)$$

i.e. (t_r, ℓ_r) represents the bin at the minimum distance from the object point r = (x, y, z) (see Fig. 1). Substituting in Eq. (1) we obtain:

$$g(t, \ell, \theta) = \int_V \frac{1}{\sqrt{2\pi}\,\sigma(q)} \exp\!\left(-\frac{(t - t_r)^2 + (\ell - \ell_r)^2}{2\sigma(q)^2}\right) \exp(-\mu_r p)\, f(r)\, dr.$$
The discrete measures of the function g that we obtain in the acquisition phase are relative to the bins, represented by (t_u, ℓ_v):

$$t_u = u\,\delta_u,\quad u = 1, \ldots, N_u; \qquad \ell_v = v\,\delta_v,\quad v = 1, \ldots, N_v$$

(δ_u, δ_v, N_u, N_v are determined by the acquisition process) and to the N_θ rotation angles θ_n, where:

$$\theta_n = n\,\delta_\theta, \qquad \delta_\theta = 2\pi/N_\theta.$$
If we consider:

$$V = \{(x, y, z) : x \in [-r_x, r_x],\ y \in [-r_y, r_y],\ z \in [-r_z, r_z]\}$$

and sample the object domain at points (x_i, y_j, z_k), where:

$$x_i = i\,\delta_x,\quad i = 1, \ldots, N_x,\quad \delta_x = r_x/N_x$$
$$y_j = j\,\delta_y,\quad j = 1, \ldots, N_y,\quad \delta_y = r_y/N_y$$
$$z_k = k\,\delta_z,\quad k = 1, \ldots, N_z,\quad \delta_z = r_z/N_z$$
we obtain the equations:
gðtu; ‘v; unÞ ¼XNx
i¼1
XNy
j¼1
XNz
k¼1
1ffiffiffiffiffiffi2p
ps ðqÞ
exp 2ðtu 2 trÞ
2 þ ð‘v 2 ‘rÞ2
2s ðqÞ2
� �e2mrpf ðxi; yj; zkÞ:
We obtain a set of Nu·Nv·Ng linear equations in Nx·Ny·Nz unknowns:
Kf ¼ g: ð3Þ
From Eq. (2) we observe that t_r depends on x_i, y_j, θ_n, while ℓ_r = z_k, and the distance q from the Gamma camera surface does not depend on z_k. Under the hypothesis that δ_z = δ_v is sufficiently large,

$$\exp\!\left(-\frac{(\ell_v - \ell_r)^2}{2\sigma(q)^2}\right) \approx 0 \quad \text{if } \ell_v \neq \ell_r$$

and the model equations become (k = 1, ..., N_z):

$$g(t_u, z_k, \theta_n) = \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \frac{1}{\sqrt{2\pi}\,\sigma(q)} \exp\!\left(-\frac{(t_u - t_r)^2}{2\sigma(q)^2}\right) \exp(-\mu_r p)\, f(x_i, y_j, z_k). \qquad (4)$$
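A minimal sketch of assembling the per-slice system matrix of Eq. (4). The attenuation factor is set to 1, σ is held constant, and the grid and bin sizes are illustrative; all of these are assumptions of the sketch, not the paper's acquisition parameters:

```python
import numpy as np

def build_projection_matrix(Nx, Ny, N_theta, N_bins, sigma=1.0, r=1.0):
    """Assemble a per-slice SPECT matrix K: rows index (angle, bin) pairs,
    columns index the Nx*Ny pixels of one transaxial section."""
    xs = np.linspace(-r, r, Nx)
    ys = np.linspace(-r, r, Ny)
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    bins = np.linspace(-r, r, N_bins)
    thetas = 2.0 * np.pi * np.arange(N_theta) / N_theta
    K = np.zeros((N_theta * N_bins, Nx * Ny))
    for nth, th in enumerate(thetas):
        # Projection of each pixel centre onto the rotated detector axis, Eq. (2).
        t_r = (-X * np.sin(th) + Y * np.cos(th)).ravel()
        for u, t_u in enumerate(bins):
            # Gaussian response weight of every pixel for bin (t_u, theta_n).
            K[nth * N_bins + u, :] = np.exp(
                -(t_u - t_r) ** 2 / (2.0 * sigma ** 2)
            ) / (np.sqrt(2.0 * np.pi) * sigma)
    return K

K = build_projection_matrix(Nx=8, Ny=8, N_theta=10, N_bins=9)
```

With these sizes K has 10·9 = 90 rows and 8·8 = 64 columns, mirroring (on a toy scale) the N_u·N_θ by N_x·N_y shape discussed below.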
The response function for this case is represented in Fig. 2. The value of N_z is the number of transaxial sections to be reconstructed. Equation (4) shows that every section f_k = f(x_i, y_j, z_k), with i = 1, ..., N_x, j = 1, ..., N_y and k fixed, can be reconstructed by solving a linear system with the same matrix K and right-hand side g_k = g(t_u, z_k, θ_n), with u = 1, ..., N_u, n = 1, ..., N_θ and k fixed, i.e.:

$$K f_k = g_k, \qquad k = 1, \ldots, N_z$$

where K has N_u·N_θ rows and N_x·N_y columns. This formulation is equivalent to Eq. (3), where:

$$\mathcal{K} = \begin{pmatrix} K & 0 & \cdots & 0 \\ 0 & K & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & K \end{pmatrix} \qquad (5)$$

and f = (f_1, ..., f_{N_z})^t, g = (g_1, ..., g_{N_z})^t. The linear system (3) is obtained by discretizing the ill-posed problem (1). It is well known that ordinary algorithms cannot properly solve this ill-conditioned problem [16], and regularization algorithms must be used. In this work we compute a regularized solution by performing several conjugate gradient iterations applied to the least-squares problem (CGLS):

$$\min_f \| \mathcal{K} f - g \|_2.$$
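Because 𝒦 is block diagonal with identical blocks, the least-squares problem decouples into N_z independent per-slice solves. A small numerical check of this decoupling, with illustrative sizes and NumPy's dense lstsq standing in for the regularized CGLS solver:

```python
import numpy as np

rng = np.random.default_rng(0)
Nz, m, n = 3, 8, 5                       # 3 slices; illustrative block size 8 x 5
K = rng.standard_normal((m, n))
K_big = np.kron(np.eye(Nz), K)           # the block-diagonal matrix of Eq. (5)

f_true = rng.standard_normal(Nz * n)
g = K_big @ f_true

# One least-squares solve of the full system ...
f_full = np.linalg.lstsq(K_big, g, rcond=None)[0]
# ... equals Nz independent per-slice solves with the same block K.
f_slices = np.concatenate(
    [np.linalg.lstsq(K, g[k * m:(k + 1) * m], rcond=None)[0] for k in range(Nz)]
)
assert np.allclose(f_full, f_slices)
```

This decoupling is what makes the slice-wise data distribution of the following sections natural.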
FIGURE 2 Model of the 2D Gaussian response function.
It has been observed that the first few iterations exhibit a converging behaviour before the iterates start to oscillate and finally diverge [5]. This phenomenon, known as semiconvergence, suggests introducing a suitable stopping criterion to obtain a regularized solution [4]. Different stopping criteria are analysed in [7] for different tomography problems; in this work we use a stopping criterion based on the behaviour of the residual and error norms computed during the CGLS iterations.
IMPLEMENTATION NOTES
In this section we report some details about the implementation of the CGLS method on parallel distributed memory architectures characterized by a high interprocessor communication bandwidth, such as the CRAY T3E. Each CGLS iteration requires two matrix-vector products, which constitute the most time-consuming steps of the algorithm [1,2], as shown in the following pseudo-code (where the starting value is x(0) = 0).
    r(0) = g − K x(0);   s(0) = K^t r(0);   Δ = ‖s(0)‖²
    d(0) = s(0)
    for k = 0, 1, ...
        q = K d(k)                           Projection step
        Γ = ‖q‖²
        α = Δ/Γ;   Γ = Δ
        x(k+1) = x(k) + α d(k)
        r(k+1) = r(k) − α q
        s(k+1) = K^t r(k+1)                  Backprojection step
        Δ = ‖s(k+1)‖²
        β = Δ/Γ
        d(k+1) = s(k+1) + β d(k)
    end                                      (6)
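The pseudo-code above translates almost line by line into NumPy. The sketch below is a dense, single-processor version; the tol guard against division by zero is an addition of this sketch, not part of the paper's algorithm:

```python
import numpy as np

def cgls(K, g, n_iter, tol=1e-28):
    """CGLS iterations for min ||K f - g||_2 starting from x(0) = 0,
    following the pseudo-code above."""
    x = np.zeros(K.shape[1])
    r = g.astype(float).copy()       # r(0) = g - K x(0), with x(0) = 0
    s = K.T @ r                      # backprojected residual s(0)
    d = s.copy()
    delta = float(s @ s)
    for _ in range(n_iter):
        q = K @ d                    # projection step
        gamma = float(q @ q)
        if gamma <= tol:             # converged search direction; stop
            break
        alpha = delta / gamma
        x += alpha * d
        r -= alpha * q
        s = K.T @ r                  # backprojection step
        delta_new = float(s @ s)
        beta = delta_new / delta
        delta = delta_new
        d = s + beta * d
    return x

# Consistent test problem: CGLS recovers f from g = K f.
rng = np.random.default_rng(0)
K = rng.standard_normal((10, 5))
f = rng.standard_normal(5)
f_rec = cgls(K, K @ f, 25)
```

On noisy tomographic data the iteration count would instead be capped by the stopping rules discussed later, exploiting semiconvergence.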
The projection and backprojection steps, highlighted in Eq. (6), are modified by
distributing the rows or columns of the matrix K among the NPES processors which
constitute the parallel architecture. We observe that, in our model, the matrix K has the block
diagonal structure (5). This structure can be exploited by partitioning the diagonal block K
among the different processors as shown in the following paragraphs.
Column Partition Scheme
The columns of the matrix K are distributed among the NPES processors as shown in Fig. 3. Partitioning the matrix K by columns we have

$$K = [K_1, K_2, \ldots, K_{NPES}]$$

where K_j has N_u·N_θ rows and (N_x·N_y)/NPES columns. Setting:

$$m = N_u \cdot N_v \cdot N_\theta, \quad n = N_x \cdot N_y \cdot N_z, \quad m_p = m/NPES, \quad n_p = n/NPES \qquad (7)$$
we obtain the CGLS_col iteration given in Eq. (8).

    r(0) = g;   s_j(0) = K_j^t r(0);   Δ = GlobalSum(‖s_j(0)‖²)
    d_j(0) = s_j(0)
    for k = 0, 1, ...
        z_j(k) = K_j d_j(k)                  Parallel projection step
        q = GlobalSum(z_j(k))                Communication/synchronization
        Γ = ‖q‖²
        α = Δ/Γ;   Γ = Δ
        x_j(k+1) = x_j(k) + α d_j(k)
        r(k+1) = r(k) − α q
        s_j(k+1) = K_j^t r(k+1)              Parallel backprojection step
        δ_j = ‖s_j(k+1)‖²
        Δ = GlobalSum(δ_j)                   Communication/synchronization
        β = Δ/Γ
        d_j(k+1) = s_j(k+1) + β d_j(k)
    end                                      (8)
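The column scheme can be checked numerically without PVM by simulating the NPES local computations in a loop; np.array_split and the explicit sums below stand in for the data distribution and the GlobalSum primitive, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
NPES, m, n = 4, 12, 8                        # illustrative sizes
K = rng.standard_normal((m, n))
cols = np.array_split(np.arange(n), NPES)    # column block owned by processor j

d = rng.standard_normal(n)
r = rng.standard_normal(m)

# Parallel projection step: each processor forms z_j = K_j d_j locally; a
# GlobalSum of the length-m vectors z_j reproduces q = K d.
z = [K[:, c] @ d[c] for c in cols]
q = np.sum(z, axis=0)
assert np.allclose(q, K @ d)

# Parallel backprojection step: s_j = K_j^t r stays local; only the scalars
# delta_j = ||s_j||^2 need a GlobalSum to form Delta = ||K^t r||^2.
s = [K[:, c].T @ r for c in cols]
delta = sum(float(sj @ sj) for sj in s)
assert np.isclose(delta, float((K.T @ r) @ (K.T @ r)))
```

Note that the vector GlobalSum (length m) is in the projection step, which is what makes this scheme communication-heavier than the row scheme below.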
Each processor j, j = 1, ..., NPES, computes the local partitions of the vectors x(k), d(k), s(k), given by:

$$x_j^{(k)},\ d_j^{(k)},\ s_j^{(k)} \in R^{n_p}$$

and its own local copy of the vectors r, q ∈ R^m and z_j ∈ R^m. In order to obtain the scalar factors Γ and Δ consistent with the scalar algorithm (6), it is necessary to perform GlobalSum operations, i.e. to sum up elements or vectors locally computed by each processor. In fact, the projection step is written as:

$$q = K d = \sum_{j=1}^{NPES} z_j^{(k)}$$
FIGURE 3 Column partition scheme.
where the products z_j(k) = K_j d_j(k) are computed by processors j = 1, ..., NPES simultaneously. This step requires a global sum of vectors of length m, which is defined as GlobalSum(z_j(k)).
In the backprojection step we have:

$$s^{(k+1)} = K^t r^{(k+1)} = \begin{pmatrix} s_1^{(k+1)} \\ s_2^{(k+1)} \\ \vdots \\ s_{NPES}^{(k+1)} \end{pmatrix} = \begin{pmatrix} K_1^t r^{(k+1)} \\ K_2^t r^{(k+1)} \\ \vdots \\ K_{NPES}^t r^{(k+1)} \end{pmatrix}$$

Then the value of Δ is given by:

$$\Delta = \sum_{j=1}^{NPES} \delta_j, \qquad \delta_j = \| s_j^{(k+1)} \|^2$$

where the δ_j are computed in parallel by each processor. This step requires a global sum of the scalars δ_j, which is defined as GlobalSum(δ_j).
The GlobalSum operation is performed by means of PVM communication/synchronization primitives; its computation time is proportional to the length of the vectors to be summed and to the number of processors involved. The computations involved in each iteration of CGLS_col are approximately O(2(m × n_p) + 3n_p + 3m).
Row Partition Scheme
When the matrix is partitioned among the NPES processors by rows, each processor is assigned a block of rows of K, as shown in Fig. 4, where each block K_j has (N_u·N_θ)/NPES rows and N_x·N_y columns. Each processor j, j = 1, ..., NPES, computes the local partitions of the vectors x(k), r(k), q(k), given by x_j(k), r_j(k) and q_j(k), respectively. Using the definitions in Eq. (7) we have:

$$x_j \in R^{n_p}, \qquad r_j, q_j \in R^{m_p}.$$
FIGURE 4 Row partition scheme.
Each processor also computes its own local copy of the vectors d and s, and the vector z_j ∈ R^n, as shown in Eq. (9).

    r_j(0) = g_j;   z_j = K_j^t r_j(0);   s(0) = GlobalSum(z_j);   Δ = ‖s(0)‖²
    d(0) = s(0)
    for k = 0, 1, ...
        q_j = K_j d(k)                       Parallel projection step
        γ_j = ‖q_j‖²
        Γ = GlobalSum(γ_j)                   Communication/synchronization
        α = Δ/Γ;   Γ = Δ
        x_j(k+1) = x_j(k) + α d_j(k)
        r_j(k+1) = r_j(k) − α q_j
        z_j = K_j^t r_j(k+1)                 Parallel backprojection step
        s(k+1) = GlobalSum(z_j)              Communication/synchronization
        Δ = ‖s(k+1)‖²
        β = Δ/Γ
        d(k+1) = s(k+1) + β d(k)
    end                                      (9)
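The row scheme can be simulated in the same loop-based fashion; here the projection step needs only a scalar GlobalSum, while the vector GlobalSum (length n) moves to the backprojection step. Sizes are again illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
NPES, m, n = 4, 12, 8                        # illustrative sizes
K = rng.standard_normal((m, n))
rows = np.array_split(np.arange(m), NPES)    # row block owned by processor j

d = rng.standard_normal(n)
r = rng.standard_normal(m)

# Parallel projection step: q_j = K_j d is local; only the scalars
# gamma_j = ||q_j||^2 need a GlobalSum to form Gamma = ||K d||^2.
q = [K[rb, :] @ d for rb in rows]
gamma = sum(float(qj @ qj) for qj in q)
assert np.isclose(gamma, float((K @ d) @ (K @ d)))

# Parallel backprojection step: z_j = K_j^t r_j, and a GlobalSum of the
# length-n vectors z_j reproduces s = K^t r.
r_blocks = [r[rb] for rb in rows]
s = np.sum([K[rb, :].T @ rj for rb, rj in zip(rows, r_blocks)], axis=0)
assert np.allclose(s, K.T @ r)
```

Since n (here N_x·N_y·N_z) is typically smaller than m and one of the two GlobalSums is scalar, this matches the lower communication cost observed for CGLS_row in the experiments.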
Also in this case the communication/synchronization operations are required after the projection/backprojection steps. The scalar Γ, computed after the projection step, is given by:

$$\Gamma = \sum_{j=1}^{NPES} \gamma_j, \qquad \gamma_j = \| q_j \|^2$$

where each processor computes its own γ_j simultaneously. Then Γ is computed by means of a GlobalSum of one scalar.
The backprojection step is written as:

$$s^{(k+1)} = K^t r^{(k+1)} = \sum_{j=1}^{NPES} z_j$$

where the products z_j = K_j^t r_j(k+1) are computed by processors j = 1, ..., NPES simultaneously. This step requires a global sum of vectors of length n, which is defined as GlobalSum(z_j).
The communication/synchronization operations required by each single iteration of CGLS_row are two GlobalSums, of length 1 and n, respectively. The computations per iteration are approximately O(2(m_p × n) + 2m_p + n_p + 3n).
NUMERICAL RESULTS
In this section, we report some results of numerical experiments carried out on 256 processors of the CRAY T3E at CINECA. The parallel architecture is composed of 128 DEC Alpha processors (600 MHz), each with 128 Mbyte of local RAM, and 128 DEC 21164 Alpha processors, each with 256 Mbyte of local RAM. All the processors are connected in a 3D torus topology. The source code is written in Fortran and the communications are performed by means of PVM primitives [8].
The test acquisitions were obtained by the Department of Physiopathology of the Careggi Hospital in Florence. Each dataset contains 90 stationary projections equispaced over 2π with 64 × 64 sampling.

T1 Phantom data obtained with HMPAO-99mTc, 800,000 photon counts per slice of thickness 6 mm.

T2 Patient data obtained with HMPAO-99mTc, 250,000 photon counts per slice of thickness 9 mm.

The dimensions of the whole matrix 𝒦 are 362,880 × 262,144. The dimensions of the diagonal blocks K are 5760 × 4096, with about 10% non-zero elements.
In order to evaluate the efficiency of the parallel algorithms CGLS_col and CGLS_row, the test problems have been reconstructed using algorithms (8), (9) and (6), and the times required for the execution of one single iteration have been compared. The total computation time is composed of the computation and the communication time. These quantities have been measured for both algorithms, varying the number of processors from 2 to 128 by powers of 2, as required by the T3E architecture configuration. As we can see in Fig. 5,
FIGURE 5 Computation and communication times per iteration.
the amount of communication time (dotted line) increases with the number of processors, while the total computation time decreases to a minimum value, which is slightly better for the CGLS_row algorithm (1.30 s) than for the CGLS_col algorithm (1.58 s). It is interesting to observe that these minima do not occur at the same number of processors: the CGLS_row algorithm achieves its minimum with 64 processors, while CGLS_col achieves its minimum computation time with 32 processors. This means that the communication/synchronization operations have a greater impact on the CGLS_col computation time than on CGLS_row.
The parallel performance is evaluated by means of the speed-up (SU) and efficiency (E) measures:

$$SU(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{SU(p)}{p}$$

where T(p) is the total computation time of one iteration of the parallel algorithm run on p processors. The value T(1) is the computation time of one iteration of the CGLS algorithm (6), namely 27.01 s. The values of speed-up and efficiency are reported in Table I and confirm the better efficiency of CGLS_row with respect to CGLS_col.
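A quick check of the two measures against the timings reported above, using T(1) = 27.01 s and the 1.30 s CGLS_row minimum on 64 processors:

```python
def speedup_efficiency(t1, tp, p):
    """Speed-up SU(p) = T(1)/T(p) and efficiency E(p) = SU(p)/p."""
    su = t1 / tp
    return su, su / p

# T(1) = 27.01 s (serial CGLS); 1.30 s is the CGLS_row minimum on 64 processors.
su, eff = speedup_efficiency(27.01, 1.30, 64)
print(round(su, 1), round(eff, 3))   # 20.8 0.325
```

These values agree (up to rounding) with the p = 64 CGLS_row row of Table I.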
The total execution time of each algorithm depends upon the termination criterion adopted to regularize the CGLS iterations. Different stopping rules are analyzed in [7,15]. The rules R1 and R2 are defined as follows:

$$R1:\ \| r_k \| < \gamma \| r_{k-1} \|$$
$$R2:\ \| x_k - x_{k-1} \| < \delta \| x_k \|.$$

A suitable value for the parameter γ is γ = 1.5, which gives 11 iterations for test problem T1 and 6 iterations for test problem T2. For the stopping rule R2 we have δ = 0.003 (12 iterations) for test problem T1 and δ = 0.03 (7 iterations) for test problem T2.
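Both rules are cheap to evaluate inside the CGLS loop. A sketch, assuming the reading of R1 and R2 given above; the function names and defaults are ours:

```python
import numpy as np

def rule_R1(r_k, r_prev, gamma=1.5):
    """R1: stop once ||r_k|| < gamma * ||r_{k-1}|| (our reading of the rule)."""
    return float(np.linalg.norm(r_k)) < gamma * float(np.linalg.norm(r_prev))

def rule_R2(x_k, x_prev, delta=0.003):
    """R2: stop once the relative update ||x_k - x_{k-1}|| < delta * ||x_k||."""
    return float(np.linalg.norm(x_k - x_prev)) < delta * float(np.linalg.norm(x_k))
```

Each rule needs only the current and previous iterate (or residual), so no extra matrix-vector products are required.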
The total computation time required by the reconstruction with CGLS is 297 s for test
problem T1 and 162 s for test problem T2. Using CGLS_row we can compute the full brain
study in 14.3 s for test problem T1 and 7.8 s for test problem T2. The reconstructions shown
in Figs. 6 and 7 are relative to one section of the whole brain study with activity and with
attenuation. In the case of T1 we report the reconstructions from high photon counts
TABLE I Speed-up and efficiency of one iteration of the CGLS_col and CGLS_row algorithms run on p processors

            CGLS_col                CGLS_row
   p    Speed-up  Efficiency    Speed-up  Efficiency
   2      1.99      0.996         1.99      0.997
   4      3.71      0.927         3.71      0.929
   8      6.94      0.867         7.10      0.887
  16     12.2       0.765        13.5       0.844
  32     17.1       0.535        20.2       0.631
  64     16.2       0.253        20.8       0.324
FIGURE 6 (a) Phantom data T1 reconstructed with CGLS_row and stopping rule R1. (b) Phantom with low photon counts (400,000 per slice) reconstructed with CGLS_row and stopping rule R1 (10 iterations).

FIGURE 7 Single section of dataset T2 reconstructed with CGLS_row and stopping rule R1.
(Fig. 6(a)) and from low photon counts (Fig. 6(b)). The reconstruction of the noisy data is clearly better when attenuation is modelled. Some meaningful sections of the brain study T2 are shown in Fig. 8 for the transaxial view.
CONCLUSION
We conclude by observing that this program can be applied to other inverse problems where the kernel of the underlying integral equation is not of Dirac type. Moreover, the program can be used to solve the problem in the fully 3D case without modifying its structure and with the same amount of communication. In the 3D case, the matrix 𝒦 has the block structure:

$$\mathcal{K} = \begin{pmatrix} K_{1,1} & K_{1,2} & \cdots & K_{1,N_z} \\ K_{2,1} & K_{2,2} & \cdots & K_{2,N_z} \\ \vdots & \vdots & \ddots & \vdots \\ K_{N_z,1} & K_{N_z,2} & \cdots & K_{N_z,N_z} \end{pmatrix}$$

where K_{i,i} = K and K_{i,j} represents the activity of the j-th slice that affects the values relative to slice i. These blocks are expected to be sparse, with the number of zero elements increasing with |i − j| (due to the Gaussian response function). We expect the amount of computation in the projection/backprojection steps to increase on each processor; higher speed-up and efficiency values should then be obtained.
FIGURE 8 Section from the full brain study in the patient case (T2).
Finally, this approach can be easily extended to obtain parallel implementations of other iterative reconstruction methods, such as EM-ML and OSEM, and of any method where a projection/backprojection step can be performed in terms of matrix-vector products.
References

[1] D'Azevedo, E., Eijkhout, V. and Romine, C. (1992) "Lapack Working Note 56: Reducing communication costs in the Conjugate Gradient algorithm on distributed memory multiprocessors". Technical report, http://netlib.org
[2] Field, M.R. (1998) "Optimizing a parallel conjugate gradient solver", SIAM J. Sci. Comput. 19, 27-37.
[3] Formiconi, A.R., Pupi, A. and Passeri, A. (1989) "Compensation of spatial system response in SPECT with conjugate gradient reconstruction technique", Phys. Med. Biol. 34, 69-84.
[4] Hanke, M. (1995) "Conjugate gradient type methods for ill posed problems", Pitman Research Notes in Mathematics Series (Longman, London).
[5] Hansen, P.C. and Hanke, M. (1998) "Regularization methods for large-scale problems", Surv. Math. Ind. 3, 253-315.
[6] Passeri, A., Formiconi, A.R. and Meldolesi, U. (1992) "Physical modelling (geometrical system response, Compton scattering and attenuation) in brain SPECT using the conjugate gradients reconstruction method", Phys. Med. Biol. 37, 1727-1744.
[7] Loli Piccolomini, E. and Zama, F. (1999) "The conjugate gradient regularization method in computed tomography problems", Appl. Math. Comput. 102, 87-99.
[8] Message Passing Toolkit: PVM Programmer's Manual.
[9] Zama, F. and Loli Piccolomini, E. (1997) "Regularization algorithms for image reconstruction from projections", In: Ciarlini, P., Cox, M.G., Pavese, F. and Richter, D., eds, Series on Advances in Mathematics for Applied Science (World Scientific, Singapore), Vol. 45.
[10] Hestenes, M.R. and Stiefel, E. (1952) "Methods of conjugate gradients for solving linear systems", J. Res. Natl Bur. Stand. 49, 409-436.
[11] Tsui, B.M.W., Zhao, X.D., Frey, E. and Gullberg, G. (1991) "Comparison between ML-EM and WLS-CG algorithms for SPECT image reconstruction", IEEE Trans. Nucl. Sci. 38, 1766-1772.
[12] Shepp, L. and Vardi, Y. (1982) "Maximum likelihood reconstruction for emission tomography", IEEE Trans. Med. Imaging 1, 113-122.
[13] Hudson, H.M. and Larkin, R.S. (1994) "Accelerated EM reconstruction using ordered subsets of projection data", IEEE Trans. Med. Imaging 13, 601-609.
[14] Formiconi, A.R., Passeri, A., Guelfi, M.R., Masoni, M., Pupi, A., Meldolesi, U., Malfetti, P., Calori, L. and Guidazzoli, A. (1997) "World wide web interface for advanced SPECT reconstruction algorithms implemented on a remote massively parallel computer", Int. J. Med. Inf. 47, 125-138.
[15] Zama, F. and Loli Piccolomini, E. (2000) "Numerical solution of some problems in tomography", Annali dell'Universita di Ferrara, Sezione VII, Scienze Matematiche, suppl. Vol. XLVI. Available at: http://www.dm.unibo.it/~zama
[16] Hansen, P.C. (1997) "Rank-Deficient and Discrete Ill-Posed Problems" (SIAM, Philadelphia).