This article was downloaded by: [University of California, San Francisco] On: 30 September 2014, At: 01:25 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Parallel Algorithms and Applications Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gpaa19 An experiment of parallel SPECT data reconstruction E. Loli Piccolomini & F. Zama a Department of Mathematics , University of Bologna , Piazza Porta S. Donato 5, 40127, Bologna, Italy Published online: 17 Oct 2011. To cite this article: E. Loli Piccolomini & F. Zama (2003) An experiment of parallel SPECT data reconstruction , Parallel Algorithms and Applications, 18:3, 107-119, DOI: 10.1080/1063719032000115335 To link to this article: http://dx.doi.org/10.1080/1063719032000115335 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. 
Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions


AN EXPERIMENT OF PARALLEL SPECT DATA RECONSTRUCTION

E. LOLI PICCOLOMINI and F. ZAMA*

Department of Mathematics, University of Bologna, Piazza Porta S. Donato 5, 40127, Bologna, Italy

(Received 22 March 2001; In final form May 2002)

In this work we use a massively parallel architecture for solving the problem of reconstructing human brain sections from experimental data obtained from a Gamma camera equipped with parallel-hole collimators. We compute least-squares regularized solutions by means of weighted conjugate gradient iterations coupled with a suitable stopping rule. The computations are distributed to the CRAY T3E parallel processors following two different decomposition strategies, obtaining high speed up values. This decomposition strategy can be easily extended to a wide family of iterative algebraic reconstruction methods.

Keywords: SPECT; Reconstruction; NPES; Distributed memory computing; Conjugate gradients; Computed tomography

INTRODUCTION

Different computed tomography (CT) problems, such as single photon emission tomography (SPECT) and positron emission tomography (PET), are modelled by first kind Fredholm integral equations:

$$g(u, \theta) = \int_V K(u, r, \theta)\, f(r)\, dr \qquad (1)$$

where the unknown function $f$ represents the emission phenomenon at the coordinate points $r = (x, y, z)$ over the domain $V \subset \mathbb{R}^3$. The function $g$ is the volume integral of the activity weighted by the probability of detection and total attenuation at each point. In particular, SPECT consists in the detection of the gamma rays emitted singly from radioactive atoms (radionuclides) contained in the radiopharmaceutical injected into a particular organ to determine its functional activity. The detector, called Gamma camera, rotates around the object describing a circular orbit in the $x, y$ plane. The surface of the Gamma camera is described by the coordinates $u = (t, \ell)$ (see Fig. 1) and is parallel to the $z$ axis. For each angle $0 \le \theta \le 2\pi$, the photons that pass through the holes of the collimator, placed on the surface of the Gamma camera, are counted and constitute the values of $g$. The kernel function $K$ weights the contribution of each point of the object $f$ and models the physical phenomena involved in the acquisition phase.

ISSN 1063-7192 print/ISSN 1029-032X online © 2003 Taylor & Francis Ltd

DOI: 10.1080/1063719032000115335

*Corresponding author. E-mail: [email protected]

Parallel Algorithms and Applications, Vol. 18 (3) September 2003, pp. 107–119


Fast and low storage direct methods, such as the Filtered Back Projection, based on a line-integral model, are not suitable for CT problems with high noise rates such as SPECT [3,6,9]. The introduction of iterative model-based methods, such as Conjugate Gradient (CG) [10,11], Expectation Maximization-Maximum Likelihood (EM-ML) [12] and Ordered Subset Expectation Maximization (OSEM) [13], offers greater flexibility, allowing better modelling of acquisition and noise, but carries a greater computational burden in terms of time and storage. In this work we use the CG method, since it finds a good solution in few iterations. The merits of the CG iterations for the solution of tomographic problems will not be discussed here in detail; comparisons with other methods can be found in many papers, such as [3,6,11].

Our purpose here is to evaluate two parallel algorithms on real data and to show the benefit, in terms of faster reconstructions, that can be achieved using parallel distributed memory architectures. As shown in [14], distributed memory parallel architectures can be usefully applied to reduce the computational times necessary for clinical applications while preserving accurate modelling. In this work we apply a decomposition technique to distribute data and computations among several CRAY T3E processors. Since the communication primitives are written using free software, the Parallel Virtual Machine (PVM) [8], the application can be run in any laboratory that has a network of workstations or PCs with PVM installed.

The model and the algorithm used to solve the discrete problem are analyzed in the second

section. In the third section we report details of the parallel implementation and in the fourth

section we show some meaningful results obtained in numerical experiments.

THE MATHEMATICAL MODEL

In the mathematical model used in this paper the kernel function in Eq. (1) accounts for attenuation and geometrical system response:

$$K(u, r, \theta) = \exp(-\mu_r p) \cdot G(u, r, \theta).$$

FIGURE 1 Model of the acquisition for the 3D SPECT emission problem. The object to be reconstructed is represented by the sphere. The projection of one single point $(x_r, y_r, z_r)$ is represented by the coordinates $(\ell_r, t_r)$ on the rotated projection plane $T_\theta$.


The term $\exp(-\mu_r p)$ represents the total cross section attenuation and decreases the number of counts with respect to the number that would be recorded if the activity were in air. It depends on the value of $p$, which is the attenuation thickness between the collimator plate and the point $r$, and on the value of the attenuation coefficient $\mu_r$, which is related to the tissue examined.

The function $G$ expresses the Gaussian response function:

$$G(u, r, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma(q)} \exp\left(-\frac{\Delta u}{2\sigma(q)^2}\right)$$

where $q$ is the distance of the point $r = (x, y, z)$ from the Gamma camera collimator surface,

$$\Delta u = (t - t_r)^2 + (\ell - \ell_r)^2$$

where:

$$t_r = -x \sin\theta + y \cos\theta, \qquad \ell_r = z \qquad (2)$$

i.e. $(t_r, \ell_r)$ represents the bin at the minimum distance from the object point $r = (x, y, z)$ (see Fig. 1). Substituting in Eq. (1) we obtain:

$$g(t, \ell, \theta) = \int_V \frac{1}{\sqrt{2\pi}\,\sigma(q)} \exp\left(-\frac{(t - t_r)^2 + (\ell - \ell_r)^2}{2\sigma(q)^2}\right) \exp(-\mu_r p)\, f(r)\, dr.$$

The discrete measures of the function $g$ that we obtain in the acquisition phase are relative to the bins, represented by $(t_u, \ell_v)$:

$$t_u = u\,\delta_u,\quad u = 1, \ldots, N_u; \qquad \ell_v = v\,\delta_v,\quad v = 1, \ldots, N_v;$$

($\delta_u$, $\delta_v$, $N_u$, $N_v$ are determined by the acquisition process) and to the $N_\theta$ rotation angles $\theta_n$, where:

$$\theta_n = n\,\delta_\theta, \qquad \delta_\theta = 2\pi/N_\theta.$$

If we consider:

$$V = \{(x, y, z) : x \in [-r_x, r_x],\ y \in [-r_y, r_y],\ z \in [-r_z, r_z]\}$$

and sample the object domain in points $(x_i, y_j, z_k)$ where:

$$x_i = i\,\delta_x,\quad i = 1, \ldots, N_x,\quad \delta_x = r_x/N_x$$
$$y_j = j\,\delta_y,\quad j = 1, \ldots, N_y,\quad \delta_y = r_y/N_y$$
$$z_k = k\,\delta_z,\quad k = 1, \ldots, N_z,\quad \delta_z = r_z/N_z$$

we obtain the equations:

$$g(t_u, \ell_v, \theta_n) = \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \sum_{k=1}^{N_z} \frac{1}{\sqrt{2\pi}\,\sigma(q)} \exp\left(-\frac{(t_u - t_r)^2 + (\ell_v - \ell_r)^2}{2\sigma(q)^2}\right) e^{-\mu_r p}\, f(x_i, y_j, z_k).$$

We obtain a set of $N_u \cdot N_v \cdot N_\theta$ linear equations in $N_x \cdot N_y \cdot N_z$ unknowns:

$$\mathbf{K} f = g. \qquad (3)$$


From Eq. (2) we observe that $t_r$ depends on $x_i$, $y_j$, $\theta_n$, while $\ell_r = z_k$ and the distance $q$ from the Gamma camera surface does not depend on $z_k$. Under the hypothesis that $\delta_z = \delta_v$ is sufficiently large,

$$\exp\{-(\ell_v - \ell_r)^2\} \approx 0 \quad \text{if } \ell_v \ne \ell_r$$

and the model equations become ($k = 1, \ldots, N_z$):

$$g(t_u, z_k, \theta_n) = \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \frac{1}{\sqrt{2\pi}\,\sigma(q)} \exp\left(-\frac{(t_u - t_r)^2}{2\sigma(q)^2}\right) \exp(-\mu_r p)\, f(x_i, y_j, z_k) \qquad (4)$$

The response function for this case is represented in Fig. 2. The value of $N_z$ is the number of transaxial sections to be reconstructed. Eq. (4) suggests that every section $f_k = f(x_i, y_j, z_k)$, with $i = 1, \ldots, N_x$, $j = 1, \ldots, N_y$ and $k$ fixed, can be reconstructed by solving a linear system with the same matrix $K$ and right hand side $g_k = g(t_u, z_k, \theta_n)$, with $u = 1, \ldots, N_u$, $n = 1, \ldots, N_\theta$ and $k$ fixed, i.e.:

$$K f_k = g_k, \qquad k = 1, \ldots, N_z$$

where $K$ has $N_u \cdot N_\theta$ rows and $N_x \cdot N_y$ columns. This formulation is equivalent to Eq. (3) where:

$$\mathbf{K} = \begin{pmatrix} K & 0 & 0 & \ldots \\ 0 & K & 0 & \ldots \\ \ldots & \ldots & \ldots & \ldots \\ 0 & \ldots & 0 & K \end{pmatrix} \qquad (5)$$

and where $f = (f_1, \ldots, f_{N_z})^t$, $g = (g_1, \ldots, g_{N_z})^t$. The linear system (3) is obtained by discretizing the ill-posed problem (1). It is well known that ordinary algorithms cannot properly solve this problem, which is ill conditioned [16], and regularization algorithms must be used. In this work we compute a regularized solution by performing several conjugate gradient iterations applied to the least squares problem (CGLS):

$$\min_f \|\mathbf{K} f - g\|^2.$$

FIGURE 2 Model of the 2D Gaussian response function.
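As an illustration of how the slice matrix $K$ of Eq. (4) could be assembled, the following Python sketch fills it entry by entry from the Gaussian response and the attenuation factor. Several ingredients are assumptions made only for this example and do not come from the paper: the function name `slice_matrix`, the pixel-centred sampling grid, the linear blur model `sigma0 + sigma1 * q`, a uniform attenuation coefficient `mu`, a detector at a fixed distance `radius` from the rotation axis, and the attenuation thickness $p$ approximated by $q$.

```python
import math

def slice_matrix(nx, ny, nu, ntheta, half_width=1.0, radius=2.0,
                 mu=0.1, sigma0=0.05, sigma1=0.05):
    """Dense matrix for one transaxial slice.

    Rows are ordered by (angle n, bin u), columns by (pixel i, pixel j).
    All physical parameters are illustrative assumptions.
    """
    # Pixel centres over [-half_width, half_width] (assumed sampling).
    xs = [-half_width + (i + 0.5) * 2 * half_width / nx for i in range(nx)]
    ys = [-half_width + (j + 0.5) * 2 * half_width / ny for j in range(ny)]
    # Detector wide enough to see the square object at every angle.
    t_half = half_width * math.sqrt(2.0)
    ts = [-t_half + (u + 0.5) * 2 * t_half / nu for u in range(nu)]

    rows = []
    for n in range(ntheta):
        theta = 2 * math.pi * n / ntheta
        c, s = math.cos(theta), math.sin(theta)
        for t_u in ts:
            row = []
            for x in xs:
                for y in ys:
                    t_r = -x * s + y * c           # Eq. (2)
                    q = radius - (x * c + y * s)   # assumed distance to the detector plane
                    sigma = sigma0 + sigma1 * q    # assumed depth-dependent blur
                    gauss = math.exp(-(t_u - t_r) ** 2 / (2 * sigma ** 2))
                    gauss /= math.sqrt(2 * math.pi) * sigma
                    row.append(gauss * math.exp(-mu * q))  # p approximated by q
            rows.append(row)
    return rows

K = slice_matrix(nx=4, ny=4, nu=6, ntheta=8)
print(len(K), "rows x", len(K[0]), "columns")
```

With these toy sizes the matrix has `nu * ntheta` rows and `nx * ny` columns, mirroring the $N_u \cdot N_\theta$ by $N_x \cdot N_y$ shape stated in the text.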


It has been observed that the first few iterations have a converging behaviour before they start to oscillate and finally diverge [5]. This phenomenon, known as semiconvergence, suggests introducing a suitable stopping criterion to obtain a regularized solution [4]. Different stopping criteria are analysed in [7] for different tomography problems; in this work we use a stopping criterion based on the behaviour of the residuals and the error norms computed during the CGLS iterations.

IMPLEMENTATION NOTES

In this section we report some details about the implementation of the CGLS method on parallel distributed memory architectures characterized by high interprocessor communication bandwidth, such as the CRAY T3E. Each CGLS iteration requires two matrix vector products, which constitute the most time consuming steps of the algorithm [1,2], as shown in the following pseudo-code (where the starting value is $x^{(0)} = 0$).

    $r^{(0)} = g - \mathbf{K} x^{(0)}$;  $s^{(0)} = \mathbf{K}^t r^{(0)}$;  $\Delta = \|s^{(0)}\|^2$;
    $d^{(0)} = s^{(0)}$
    for $k = 0, 1, \ldots$
        $q = \mathbf{K} d^{(k)}$                          Projection Step
        $\Gamma = \|q\|^2$
        $\alpha = \Delta/\Gamma$;  $\Gamma = \Delta$
        $x^{(k+1)} = x^{(k)} + \alpha d^{(k)}$
        $r^{(k+1)} = r^{(k)} - \alpha q$
        $s^{(k+1)} = \mathbf{K}^t r^{(k+1)}$              Backprojection Step
        $\Delta = \|s^{(k+1)}\|^2$
        $\beta = \Delta/\Gamma$
        $d^{(k+1)} = s^{(k+1)} + \beta d^{(k)}$
    end
                                                          (6)
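A minimal serial sketch of the CGLS iteration in plain Python is given here for orientation. It assumes the standard CGLS initialisation, in which the first search direction is the backprojected residual $K^t r^{(0)}$, and it runs a fixed number of iterations instead of applying a stopping rule. The helper names (`matvec`, `rmatvec`, `dot`, `cgls`) and the tiny test system are hypothetical; the authors' code is written in Fortran and is not reproduced here.

```python
def matvec(A, x):
    """Projection: A @ x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, y):
    """Backprojection: A^T @ y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def cgls(K, g, iterations):
    """Run a fixed number of CGLS iterations starting from x = 0."""
    x = [0.0] * len(K[0])
    r = list(g)                  # r = g - K x with x = 0
    s = rmatvec(K, r)            # backprojected residual
    d = list(s)
    delta = dot(s, s)
    for _ in range(iterations):
        if delta == 0.0:         # normal-equation residual is already zero
            break
        q = matvec(K, d)         # projection step
        alpha = delta / dot(q, q)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(K, r)        # backprojection step
        gamma, delta = delta, dot(s, s)
        beta = delta / gamma
        d = [si + beta * di for si, di in zip(s, d)]
    return x

# Tiny consistent system: 3 equations, 2 unknowns, exact solution (1, 2).
K = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
g = matvec(K, [1.0, 2.0])
x = cgls(K, g, iterations=2)
print(x)
```

For a problem with two unknowns, conjugate gradients reach the least-squares solution in at most two iterations in exact arithmetic, which makes this toy case easy to check by hand.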

The projection and backprojection steps, highlighted in Eq. (6), are modified by

distributing the rows or columns of the matrix K among the NPES processors which

constitute the parallel architecture. We observe that, in our model, the matrix K has the block

diagonal structure (5). This structure can be exploited by partitioning the diagonal block K

among the different processors as shown in the following paragraphs.
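The practical meaning of the block diagonal structure (5) can be checked on a toy example: applying the large matrix to the stacked slices gives the same result as applying the diagonal block $K$ to each slice separately. The sketch below is illustrative only; `block_diag`, `matvec` and the small numbers are made up for the example.

```python
def matvec(A, x):
    """Dense matrix-vector product, A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def block_diag(K, nz):
    """Build the block diagonal matrix with nz copies of K on the diagonal."""
    m, n = len(K), len(K[0])
    big = [[0.0] * (n * nz) for _ in range(m * nz)]
    for k in range(nz):
        for i in range(m):
            for j in range(n):
                big[k * m + i][k * n + j] = K[i][j]
    return big

K = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]      # one diagonal block (3 x 2)
slices = [[1.0, 1.0], [2.0, -1.0]]            # f_1 and f_2

stacked = [v for f_k in slices for v in f_k]  # f = (f_1, f_2)^t
g_big = matvec(block_diag(K, len(slices)), stacked)
g_per_slice = [v for f_k in slices for v in matvec(K, f_k)]
print(g_big == g_per_slice)
```

Only the block $K$ ever needs to be stored, which is why the partitioning described next acts on $K$ and not on the full matrix.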

Column Partition Scheme

The columns of matrix $K$ are distributed among the NPES processors as shown in Fig. 3. Partitioning the matrix $K$ by columns we have

$$K = [K_1, K_2, \ldots, K_{NPES}]$$

where $K_i$ has $N_u \cdot N_\theta$ rows and $(N_x \cdot N_y)/NPES$ columns. Setting:

$$m = N_u \cdot N_v \cdot N_\theta, \qquad n = N_x \cdot N_y \cdot N_z, \qquad m_p = m/NPES, \qquad n_p = n/NPES \qquad (7)$$


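we obtain the CGLS_col iteration given in Eq. (8).

    $r^{(0)} = g$;  $s_j^{(0)} = K_j^t r^{(0)}$;  $\delta_j = \|s_j^{(0)}\|^2$;  $\Delta = \mathrm{GlobalSum}(\delta_j)$;
    $d_j^{(0)} = s_j^{(0)}$
    for $k = 0, 1, \ldots$
        $z_j^{(k)} = K_j d_j^{(k)}$                       Parallel Projection Step
        $q = \mathrm{GlobalSum}(z_j^{(k)})$               Communication/Synchronization
        $\Gamma = \|q\|^2$
        $\alpha = \Delta/\Gamma$;  $\Gamma = \Delta$
        $x_j^{(k+1)} = x_j^{(k)} + \alpha d_j^{(k)}$
        $r^{(k+1)} = r^{(k)} - \alpha q$
        $s_j^{(k+1)} = K_j^t r^{(k+1)}$                   Parallel Backprojection Step
        $\delta_j = \|s_j^{(k+1)}\|^2$
        $\Delta = \mathrm{GlobalSum}(\delta_j)$           Communication/Synchronization
        $\beta = \Delta/\Gamma$
        $d_j^{(k+1)} = s_j^{(k+1)} + \beta d_j^{(k)}$
    end
                                                          (8)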

Each processor $j$, $j = 1, \ldots, NPES$, computes the local partitions of the vectors $x^{(k)}$, $d^{(k)}$, $s^{(k)}$ given by:

$$x_j^{(k)},\ d_j^{(k)},\ s_j^{(k)} \in \mathbb{R}^{n_p}$$

and its own local copy of the vectors $r, q \in \mathbb{R}^m$ and $z_j \in \mathbb{R}^m$. In order to obtain the scalar factors $\Gamma$ and $\Delta$ consistent with the scalar algorithm (6) it is necessary to perform GlobalSum operations, i.e. to sum up elements or vectors locally computed by each processor. In fact, the projection step is written as:

$$q = K d = \sum_{j=1}^{NPES} z_j^{(k)}$$

FIGURE 3 Column partition scheme.


where the products $z_j^{(k)} = K_j d_j$ are computed by processors $j = 1, \ldots, NPES$ simultaneously. This step requires a global sum of vectors of length $m$, which is defined as GlobalSum($z_j^{(k)}$).

In the backprojection step we have:

$$s^{(k+1)} = K^t r^{(k+1)} = \begin{pmatrix} s_1^{(k+1)} \\ s_2^{(k+1)} \\ \vdots \\ s_{NPES}^{(k+1)} \end{pmatrix} = \begin{pmatrix} K_1^t r^{(k+1)} \\ K_2^t r^{(k+1)} \\ \vdots \\ K_{NPES}^t r^{(k+1)} \end{pmatrix}$$

Then the value of $\Delta$ is given by:

$$\Delta = \sum_{j=1}^{NPES} \delta_j, \qquad \delta_j = \|s_j^{(k+1)}\|^2$$

where the $\delta_j$ are computed in parallel by each processor. This step requires a global sum of the scalar $\delta_j$, which is defined as GlobalSum($\delta_j$).

The GlobalSum operation is performed by means of PVM communication/synchronization primitives; its computation time is proportional to the length of the vectors to be summed up and to the number of processors involved. The computations involved in each iteration of CGLS_col are approximately $O(2(m \times n_p) + 3n_p + 3m)$.
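The column partition scheme can be mimicked in a single Python process by looping over the column blocks in place of the processors. This is only a sketch under stated assumptions: `global_sum` is a stand-in for the PVM-based GlobalSum (no message passing takes place), the helper names are hypothetical, the initial direction is the backprojected residual as in standard CGLS, and the number of columns is assumed to be divisible by `npes`.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, y):
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def global_sum(parts):
    """Stand-in for GlobalSum: add the contributions of all processors."""
    if isinstance(parts[0], list):
        return [sum(vals) for vals in zip(*parts)]
    return sum(parts)

def cgls_col(K, g, npes, iterations):
    n_p = len(K[0]) // npes                    # columns per processor
    blocks = [[row[j * n_p:(j + 1) * n_p] for row in K] for j in range(npes)]
    x = [[0.0] * n_p for _ in range(npes)]     # x_j, one part per processor
    r = list(g)                                # replicated on every processor
    s = [rmatvec(Kj, r) for Kj in blocks]
    d = [list(sj) for sj in s]
    delta = global_sum([dot(sj, sj) for sj in s])
    for _ in range(iterations):
        if delta == 0.0:
            break
        z = [matvec(Kj, dj) for Kj, dj in zip(blocks, d)]   # parallel projection
        q = global_sum(z)                                   # vector sum, length m
        alpha = delta / dot(q, q)
        x = [[xi + alpha * di for xi, di in zip(xj, dj)] for xj, dj in zip(x, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = [rmatvec(Kj, r) for Kj in blocks]               # parallel backprojection
        gamma, delta = delta, global_sum([dot(sj, sj) for sj in s])  # scalar sum
        beta = delta / gamma
        d = [[si + beta * di for si, di in zip(sj, dj)] for sj, dj in zip(s, d)]
    return [v for xj in x for v in xj]         # gather the x_j

K = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
g = matvec(K, [1.0, 2.0])
x = cgls_col(K, g, npes=2, iterations=2)
print(x)
```

Each iteration performs one vector GlobalSum of length $m$ and one scalar GlobalSum, matching the communication pattern described above.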

Row Partition Scheme

When the matrix is partitioned on NPES processors by rows, each processor is assigned a block of rows of $K$ as shown in Fig. 4, where each block $K_j$ has $(N_u \cdot N_\theta)/NPES$ rows and $N_x \cdot N_y$ columns. Each processor $j$, $j = 1, \ldots, NPES$, computes the local partitions of the vectors $x^{(k)}$, $r^{(k)}$, $q^{(k)}$ given by $x_j^{(k)}$, $r_j^{(k)}$ and $q_j^{(k)}$, respectively. Using the definition in Eq. (7) we have:

$$x_j \in \mathbb{R}^{n_p}; \qquad r_j,\ q_j \in \mathbb{R}^{m_p}.$$

FIGURE 4 Row partition scheme.


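Each processor also computes its own local copy of the vectors $d$ and $s$ and the vector $z_j \in \mathbb{R}^n$, as shown in Eq. (9), where $g_j$ denotes the rows of $g$ assigned to processor $j$.

    $r_j^{(0)} = g_j$;  $z_j = K_j^t r_j^{(0)}$;  $s^{(0)} = \mathrm{GlobalSum}(z_j)$;  $\Delta = \|s^{(0)}\|^2$;
    $d^{(0)} = s^{(0)}$
    for $k = 0, 1, \ldots$
        $q_j = K_j d^{(k)}$                               Parallel Projection Step
        $\gamma_j = \|q_j\|^2$
        $\Gamma = \mathrm{GlobalSum}(\gamma_j)$           Communication/Synchronization
        $\alpha = \Delta/\Gamma$;  $\Gamma = \Delta$
        $x_j^{(k+1)} = x_j^{(k)} + \alpha d_j^{(k)}$
        $r_j^{(k+1)} = r_j^{(k)} - \alpha q_j$
        $z_j = K_j^t r_j^{(k+1)}$                         Parallel Backprojection Step
        $s^{(k+1)} = \mathrm{GlobalSum}(z_j)$             Communication/Synchronization
        $\Delta = \|s^{(k+1)}\|^2$
        $\beta = \Delta/\Gamma$
        $d^{(k+1)} = s^{(k+1)} + \beta d^{(k)}$
    end
                                                          (9)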

Also in this case the communication/synchronization operations are required after the projection/backprojection steps. The scalar $\Gamma$, computed after the projection step, is given by:

$$\Gamma = \sum_{j=1}^{NPES} \gamma_j, \qquad \gamma_j = \|q_j\|^2$$

where each processor computes its own $\gamma_j$ simultaneously. Then $\Gamma$ is computed by means of a GlobalSum of one scalar.

The backprojection step is written as:

$$s^{(k+1)} = K^t r^{(k+1)} = \sum_{j=1}^{NPES} z_j$$

where the products $z_j = K_j^t r_j^{(k+1)}$ are computed by processors $j = 1, \ldots, NPES$ simultaneously. This step requires a global sum of vectors of length $n$, which is defined as GlobalSum($z_j$).
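The row partition scheme can be sketched in the same single-process style, with each row block standing in for one processor. As before, this is an illustration under assumptions and not the authors' Fortran/PVM code: `global_sum` only imitates GlobalSum, the helper names are hypothetical, the standard CGLS initialisation is used, the number of rows is assumed to be divisible by `npes`, and the solution vector is kept whole for brevity where the scheme above lets each processor update only its own part $x_j$.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, y):
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def global_sum(parts):
    """Stand-in for GlobalSum: add the contributions of all processors."""
    if isinstance(parts[0], list):
        return [sum(vals) for vals in zip(*parts)]
    return sum(parts)

def cgls_row(K, g, npes, iterations):
    m_p = len(K) // npes                                    # rows per processor
    blocks = [K[j * m_p:(j + 1) * m_p] for j in range(npes)]
    r = [list(g[j * m_p:(j + 1) * m_p]) for j in range(npes)]   # r_j
    x = [0.0] * len(K[0])                                   # kept whole for brevity
    s = global_sum([rmatvec(Kj, rj) for Kj, rj in zip(blocks, r)])
    d = list(s)                                             # replicated direction
    delta = dot(s, s)
    for _ in range(iterations):
        if delta == 0.0:
            break
        q = [matvec(Kj, d) for Kj in blocks]                # parallel projection
        big_gamma = global_sum([dot(qj, qj) for qj in q])   # scalar sum
        alpha = delta / big_gamma
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [[ri - alpha * qi for ri, qi in zip(rj, qj)] for rj, qj in zip(r, q)]
        z = [rmatvec(Kj, rj) for Kj, rj in zip(blocks, r)]  # parallel backprojection
        s = global_sum(z)                                   # vector sum, length n
        previous, delta = delta, dot(s, s)
        beta = delta / previous
        d = [si + beta * di for si, di in zip(s, d)]
    return x

K = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
g = matvec(K, [1.0, 2.0])
x = cgls_row(K, g, npes=2, iterations=2)
print(x)
```

Here the vector GlobalSum has length $n$ instead of $m$, so when $n < m$ the row scheme moves less data per iteration than the column scheme.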


The communication/synchronization operations required by each single step of CGLS_row are two GlobalSum operations of length 1 and $n$, respectively. The computations per iteration are approximately $O(2(m_p \times n) + 2m_p + n_p + 3n)$.
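The two operation counts can be compared numerically. The sketch below simply evaluates the two expressions quoted in the text for the overall dimensions reported in the numerical results ($m$ = 362,880 and $n$ = 262,144); it treats the expressions at face value, ignoring sparsity and communication time, and the function names are hypothetical.

```python
def work_col(m, n, npes):
    """Per-iteration count quoted for CGLS_col: 2(m x n_p) + 3 n_p + 3 m."""
    n_p = n / npes
    return 2 * m * n_p + 3 * n_p + 3 * m

def work_row(m, n, npes):
    """Per-iteration count quoted for CGLS_row: 2(m_p x n) + 2 m_p + n_p + 3 n."""
    m_p = m / npes
    n_p = n / npes
    return 2 * m_p * n + 2 * m_p + n_p + 3 * n

m, n = 362_880, 262_144   # dimensions of the whole matrix, as reported in the paper

for npes in (2, 4, 8, 16, 32, 64, 128):
    col, row = work_col(m, n, npes), work_row(m, n, npes)
    # The dominant term 2*m*n/npes is identical; the remainder differs.
    print(f"NPES={npes:3d}  col={col:.4e}  row={row:.4e}  col-row={col - row:.3e}")
```

The matrix-vector term $2mn/NPES$ is the same in both schemes, so the difference comes from the vector updates that are not divided among processors ($3m$ for columns, $3n$ for rows). Since $m > n$ here, the row count is slightly smaller, and the vector GlobalSum is also shorter ($n$ against $m$).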

NUMERICAL RESULTS

In this section, we report some results of numerical experiments carried out on 256 processors of the CRAY T3E at CINECA. The parallel architecture is composed of 128 Dec Alpha processors (600 MHz), each with 128 Mbyte of local RAM, and 128 Dec 21164 Alpha processors, each with 256 Mbyte of local RAM. All the processors are connected in a 3D torus topology. The source code is written in Fortran and the communications are performed by means of PVM primitives [8].

The test acquisitions were obtained by the Department of Physiopathology of the Careggi Hospital in Florence. Each dataset contains 90 stationary projections equispaced over $2\pi$ with $64 \times 64$ sampling.

T1 Phantom data obtained with HMPAO $^{99}$Tc$^{m}$, 800,000 photon counts per slice of thickness 6 mm.

T2 Patient data obtained with HMPAO $^{99}$Tc$^{m}$, 250,000 photon counts per slice of thickness 9 mm.

The dimensions of the whole matrix $\mathbf{K}$ are 362,880 × 262,144. The dimensions of the diagonal blocks $K$ are 5760 × 4096, with about 10% of non zero elements.

In order to evaluate the efficiency of the parallel algorithms CGLS_col and CGLS_row, the test problems have been reconstructed using algorithms (8), (9) and (6), and the times required for the execution of one single iteration have been compared. The total computation time is composed of the computation time and the communication time. These quantities have been measured for both algorithms, varying the number of processors from 2 to 128 by powers of 2, as required by the T3E architecture configuration. As we can see in Fig. 5, the amount of communication time (dotted line) increases with the number of processors, while the total computation time decreases to a minimum value which is slightly better for the CGLS_row algorithm (1.30 s) than for the CGLS_col algorithm (1.58 s). It is interesting to observe that these minima do not occur with the same number of processors. In the case of the CGLS_row algorithm the minimum value is achieved with 64 processors, while CGLS_col achieves its minimum computation time with 32 processors. This means that the communication/synchronization operations have a greater impact on the computation time of CGLS_col than on that of CGLS_row.

FIGURE 5 Computation and communication times per iteration.

The parallel performance is evaluated by means of the speed up (SU) and efficiency (E) measures:

$$SU(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{SU(p)}{p}$$

where $T(p)$ is the total computation time of one iteration of the parallel algorithm run on $p$ processors. The value of $T(1)$ is the computation time of one iteration of the CGLS algorithm (6) and is 27.01 s. The values of speed up and efficiency are reported in Table I and confirm the better efficiency of CGLS_row with respect to CGLS_col.
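As a cross-check of these measures, the per-iteration times can be recovered from the speed-up values of Table I and the reported $T(1)$ = 27.01 s, taking speed-up as $T(1)/T(p)$, the convention consistent with the table. The short Python sketch below does this; the dictionary layout and function names are illustrative choices, and the numbers are copied from Table I.

```python
T1 = 27.01  # seconds for one serial CGLS iteration, as reported

# Speed-up values copied from Table I (p -> SU).
speedup = {
    "CGLS_col": {2: 1.99, 4: 3.71, 8: 6.94, 16: 12.2, 32: 17.1, 64: 16.2},
    "CGLS_row": {2: 1.99, 4: 3.71, 8: 7.10, 16: 13.5, 32: 20.2, 64: 20.8},
}

def time_per_iteration(su):
    """T(p) recovered from SU(p) = T(1) / T(p)."""
    return T1 / su

def efficiency(su, p):
    """E(p) = SU(p) / p."""
    return su / p

best = {}
for name, table in speedup.items():
    p_best = max(table, key=table.get)          # processor count with largest speed-up
    best[name] = (p_best, round(time_per_iteration(table[p_best]), 2))
    for p, su in table.items():
        print(f"{name}  p={p:2d}  T(p)={time_per_iteration(su):6.2f} s  E={efficiency(su, p):.3f}")

print(best)
```

The best entries reproduce the minimum times quoted in the text: about 1.58 s with 32 processors for CGLS_col and about 1.30 s with 64 processors for CGLS_row.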

The total execution time of each algorithm depends upon the termination criterion adopted

to regularize the CGLS iterations. Different stopping rules are analyzed in [7,15]. The rules

R1 and R2 are defined as follows:

R1 ¼ krkk , gksrk21k

$$R_2 = \|x_k - x_{k-1}\| < \delta\,\|x_k\|.$$

A suitable value for the parameter $\gamma$ is $\gamma = 1.5$, which gives 11 iterations for test problem T1 and 6 iterations for test problem T2. For the stopping rule R2 we have $\delta = 0.003$ (12 iterations) for test problem T1 and $\delta = 0.03$ (7 iterations) for test problem T2.
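Rule R2 is a relative-change test on successive iterates and is straightforward to express in code. The sketch below is a hypothetical helper, not the authors' implementation, and the list of iterates is made-up data chosen only to show how the threshold acts.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def r2_satisfied(x_new, x_old, delta):
    """Rule R2: stop when ||x_k - x_{k-1}|| < delta * ||x_k||."""
    diff = [a - b for a, b in zip(x_new, x_old)]
    return norm(diff) < delta * norm(x_new)

def first_stop(iterates, delta):
    """Index k of the first iterate that satisfies R2, or None if none does."""
    for k in range(1, len(iterates)):
        if r2_satisfied(iterates[k], iterates[k - 1], delta):
            return k
    return None

# Made-up iterates, for illustration only.
iterates = [[1.0, 0.0], [1.5, 0.0], [1.51, 0.0], [1.511, 0.0]]
k_stop = first_stop(iterates, delta=0.003)
print(k_stop)
```

A larger threshold stops earlier on the same sequence, in line with the text, where $\delta = 0.03$ gives fewer iterations than $\delta = 0.003$ (on different datasets).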

The total computation time required by the reconstruction with CGLS is 297 s for test problem T1 and 162 s for test problem T2. Using CGLS_row we can compute the full brain study in 14.3 s for test problem T1 and 7.8 s for test problem T2. The reconstructions shown in Figs. 6 and 7 are relative to one section of the whole brain study with activity and with attenuation. In the case of T1 we report the reconstructions from high photon counts (Fig. 6(a)) and from low photon counts (Fig. 6(b)). It is evident that the reconstruction of noisy data is better when attenuation is taken into account.

TABLE I Speed up and efficiency of one iteration of CGLS_col and CGLS_row algorithms run on p processors

          CGLS_col                     CGLS_row
p     Speed up      Efficiency     Speed up      Efficiency
2     1.99          9.96 × 10⁻¹    1.99          9.97 × 10⁻¹
4     3.71          9.27 × 10⁻¹    3.71          9.29 × 10⁻¹
8     6.94          8.67 × 10⁻¹    7.10          8.87 × 10⁻¹
16    1.22 × 10¹    7.65 × 10⁻¹    1.35 × 10¹    8.44 × 10⁻¹
32    1.71 × 10¹    5.35 × 10⁻¹    2.02 × 10¹    6.31 × 10⁻¹
64    1.62 × 10¹    2.53 × 10⁻¹    2.08 × 10¹    3.24 × 10⁻¹

FIGURE 6 (a) Phantom data T1 reconstructed with CGLS_row and stopping rule R1. (b) Phantom with low photon counts (400,000 per slice) reconstructed with CGLS_row and stopping rule R1 (10 iterations).

FIGURE 7 Single section of dataset T2 reconstructed with CGLS_row and stopping rule R1.

Some meaningful sections of the brain study T2 are shown in Fig. 8 for the

transaxial view.

CONCLUSION

We conclude by observing that this program can be applied to other inverse problems where the kernel of the underlying integral equation is not of Dirac type. Moreover, the program can be used to solve the problem in the fully 3D case without modifying its structure and with the same amount of communications. In the 3D case, the matrix $\mathbf{K}$ has the block structure shown:

$$\mathbf{K} = \begin{pmatrix}
K_{1,1} & K_{1,2} & \ldots & K_{1,2N_x-1} & K_{1,2N_x} \\
K_{2,1} & K_{2,2} & \ldots & K_{2,2N_x-1} & K_{2,2N_x} \\
\vdots & \vdots & & \vdots & \vdots \\
K_{N_\ell-1,1} & K_{N_\ell-1,2} & \ldots & K_{N_\ell-1,2N_x-1} & K_{N_\ell-1,2N_x} \\
K_{N_\ell,1} & K_{N_\ell,2} & \ldots & K_{N_\ell,2N_x-1} & K_{N_\ell,2N_x}
\end{pmatrix}$$

where $K_{i,i} = K$ and $K_{i,j}$ represents the activity of the $j$-th slice that affects the values relative to slice $i$. These blocks are expected to be sparse, with the number of zero elements increasing with $|i - j|$ (due to the Gaussian response function). We expect that the amount of computation in the projection/backprojection increases in each processor. Then, higher values of speed-up and efficiency should be obtained.

FIGURE 8 Section from the full brain study in the patient case (T2).


Finally, this approach could easily be extended to obtain parallel implementations of other iterative reconstruction methods, such as ML, ML-EM and OSEM, and of any method in which a projection/backprojection step can be performed in terms of a matrix vector product.

References

[1] D’Azevedo, E., Eijkhout, V. and Romine, C. (1992) “Lapack Working Note 56 reducing communication costs in Conjugate Gradient algorithm memory multiprocessors”. Technical report, http://netlib.org

[2] Field, M.R. (1998) “Optimizing a parallel conjugate gradient solver”, SIAM J. Sci. Comput. 19, 27–37.

[3] Formiconi, A.R., Pupi, A. and Passeri, A. (1989) “Compensation of spatial system response in SPECT with conjugate gradient reconstruction technique”, Phys. Med. Biol. 34, 69–84.

[4] Hanke, M. (1995) “Conjugate gradient type methods for ill posed problems”, Pitman Research Notes in Mathematics Series (Longman, London).

[5] Hansen, P.C. and Hanke, M. (1998) “Regularization methods for large-scale problems”, Surv. Math. Ind. 3, 253–315.

[6] Passeri, A., Formiconi, A.R. and Meldolesi, U. (1992) “Physical modelling (geometrical system response, Compton scattering and attenuation) in brain SPECT using the conjugate gradients reconstruction method”, Phys. Med. Biol. 37, 172–1744.

[7] Loli Piccolomini, E. and Zama, F. (1999) “The conjugate gradient regularization method in computed tomography problems”, Appl. Math. Comput. 102, 87–99.

[8] Message Passing Toolkit: PVM Programmer’s Manual.

[9] Zama, F. and Loli Piccolomini, E. (1997) “Regularization algorithms for image reconstruction from projections”, In: Ciarlini, P., Cox, M.G., Pavese, F. and Richter, D., eds, Series on Advances in Mathematics for Applied Science (World Scientific, Singapore), Vol. 45.

[10] Hestenes, M.R. and Stiefel, E. (1952) “Methods of conjugate gradients for solving linear systems”, J. Res. Natl Bur. Stand. 49, 409–436.

[11] Tsui, B.M.W., Zhao, X.D., Frey, E. and Gullberg, G. (1991) “Comparison between ML-EM and WLS-CG algorithms for SPECT image reconstruction”, IEEE Trans. Nucl. Sci. 38, 1766–1772.

[12] Shepp, L. and Vardi, Y. (1982) “Maximum likelihood reconstruction for emission tomography”, IEEE Trans. Med. Imaging 1, 113–122.

[13] Hudson, H.M. and Larkin, R.S. (1994) “Accelerated EM reconstruction using ordered subsets of projection data”, IEEE Trans. Med. Imaging 13, 601–609.

[14] Formiconi, A.R., Passeri, A., Guelfi, M.R., Masoni, M., Pupi, A., Meldolesi, U., Malfetti, P., Calori, L. and Guidazzoli, A. (1997) “World wide web interface for advanced SPECT reconstruction algorithms implemented on a remote massively parallel computer”, Int. J. Med. Inf. 47, 125–138.

[15] Zama, F. and Loli Piccolomini, E. “Numerical solution of some problems in tomography”, Annali dell’Universita di Ferrara, Sezione VII Scienze Matematiche, suppl. Vol. XLVI, Ferrara 2000. Available in: http://www.dm.unibo.it/~zama

[16] Hansen, P.C. (1997) “Rank-deficient and discrete ill-posed problems”, SIAM.
