
Nonparametric Statistics, 2002, Vol. 14(1-2), pp. 169–180

A BAYESIAN METHOD FOR MULTISPECTRAL IMAGE DATA CLASSIFICATION

GIOVANNI SEBASTIANI a,* and SIGRUNN HOLBEK SØRBYE b,†

a Istituto per le Applicazioni del Calcolo "M. Picone", Consiglio Nazionale delle Ricerche, Roma, Italy; b Institute of Mathematical and Physical Sciences, University of Tromsø, Tromsø, Norway

(Received January 1999; In final form April 2000)

The problem of classifying multispectral image data is studied here. We propose a new Bayesian method for this. The method uses "a priori" spatial information modeled by means of a suitable Markov random field. The image data for each class are assumed to be i.i.d. following a multivariate Gaussian model with unknown mean and unknown diagonal covariance matrix. When the prior information is not used and the variances of the Gaussian model are equal, the method reduces to the standard K-means algorithm. All the parameters appearing in the posterior model are estimated simultaneously. The prior normalizing constant is approximated on the basis of the expectation of the energy function as obtained by means of Markov Chain Monte Carlo simulations. Some experimental results suggest calculating this expectation from a "standard" function by simple multiplication by the minimum value of the energy. A local solution to the problem of maximizing the posterior distribution is obtained by using the Iterated Conditional Modes algorithm. The implementation of this method is easy and the required computations are carried out quickly. The method was applied with success to classify simulated image data and real dynamic Magnetic Resonance Imaging data.

Keywords: Image analysis; Classification; Bayesian statistics; Markov random fields; K-means algorithm

1 INTRODUCTION

The problem addressed here is the classification of multispectral image data. This type of data consists of several, say $m$, images describing different "attributes" of the same underlying scene. As an example, we can consider a given region of Earth's surface described by satellite images of its emitted radiation in the ultraviolet, visible and infrared portions of the electromagnetic spectrum. In the simplest situation, any given spatial voxel of the physical object we are describing is associated with an image element (pixel), which has the same location in each of the $m$ different types of measured images. In more complicated situations, e.g. medical tomographic images of the same slice of the human body from different modalities (Computer Assisted Tomography, Magnetic Resonance Imaging, Positron Emission Tomography), this may not happen. In these more complicated cases, suitable models should be used in order to link to each other pixels in different images that correspond to the same physical voxel. In this paper we deal with data, either original or transformed, of the first type.

* Corresponding author. E-mail: [email protected]
† E-mail: [email protected]

ISSN 1048-5252 print; ISSN 1029-0311 online © 2002 Taylor & Francis Ltd
DOI: 10.1080/10485250290026899


One may then want to classify this type of data. This means producing an image that assigns to pixel $i$ an integer $k_i \in \{1, \ldots, K\}$ that identifies to which of $K$ possible "classes" that pixel belongs. We want pixels belonging to the same class to be in some sense "equivalent" to each other. This equivalence should be induced by some relation between the attributes of pixels assigned to a particular class. Ideally, we may wish that all pixels assigned to a particular class have the same value for each attribute, with the value of at least one attribute different between classes. The $K$ classes could then be represented as $K$ different points in the $m$-dimensional Euclidean space $D$ with coordinates proportional to the values of the $m$ attributes. Two pixels with the same classification would then correspond to the same point in $D$. In practice, due to variability present in the data, this definition would very often lead to a situation in which the number of classes $K$ could be as large as the total number of pixels $n$. More realistically, when representing the image data in $D$, the $K$ points become $K$ "clouds" or "clusters" of points. We observe that, if we represent the image data in this way, its spatial nature has completely disappeared. In fact, any permutation of pixel values performed in the same way in all $m$ images would lead to the same representation in $D$. Another simple situation would be where there exist $K$ non-intersecting regions in $D$, e.g. spheres, with all the points of each cluster contained in one region. In this case, a pixel would be straightforwardly classified according to which sphere its corresponding point in $D$ belongs. Unfortunately, this situation very rarely occurs in practice. Accordingly, alternative criteria for image classification must be used.

In general, classification criteria can themselves be assigned to two main categories depending on whether or not the number of classes $K$ is known in advance. Of course, classification is usually expected to be an easier task when $K$ is assumed known. In some applications, this can reasonably be assumed. Such an application is the real situation from dynamic MR imaging considered in this paper. Accordingly, we will only discuss the case when the value of $K$ is known. However, the Bayesian approach could also be applied when $K$ is unknown by using recent Markov Chain Monte Carlo (MCMC) algorithms [1] instead of the deterministic scheme adopted here to maximize the posterior probability.

One of the methods present in the literature that is often applied for classification is the so-called K-means algorithm [2, 3]. This algorithm is widely used because it is very fast and it is implemented as an internal function in popular statistical packages, such as S-plus. The K-means algorithm classifies multispectral data by providing a local solution to a minimization problem. Let us denote the image data point $i$ in $D$ by $d_i$, with $i = 1, \ldots, n$ and $d_i = (d_{1,i}, \ldots, d_{m,i})$. This minimization problem consists of finding the location of $K$ cluster "centres" $c_l = (c_{1,l}, \ldots, c_{m,l})$ in $D$ and determining to which cluster $k_i$ each data point $d_i$ belongs in such a way that the total "internal" cluster squared variation $v$, given by

$$v = \sum_{i=1}^{n} \lVert d_i - c_{k_i} \rVert_2^2, \qquad (1)$$

is minimized. Let us introduce the vectors $d = (d_1, \ldots, d_n)$, $c = (c_1, \ldots, c_K)$ and $k = (k_1, \ldots, k_n)$. The solution of the above minimization problem maximizes a very special type of Gaussian model $p(d \mid c, k)$ for the image data. This model is given by

$$p(d \mid c, k) = (2\pi V)^{-nm/2} \exp\left( -\frac{v}{2V} \right). \qquad (2)$$
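As a point of reference, a minimal sketch of this minimization (a Lloyd-type alternation between nearest-centre assignment and centre updates) might look as follows in Python; the function and variable names are illustrative and this is not the authors' implementation.

```python
import numpy as np

def kmeans(d, K, n_iter=100, seed=0):
    """Local minimization of the total internal cluster squared variation v in (1).

    d : (n, m) array of image data points in D (one row per pixel).
    Returns cluster centres c (K, m), labels k (n,), and the value of v.
    """
    rng = np.random.default_rng(seed)
    n, m = d.shape
    c = d[rng.choice(n, size=K, replace=False)]          # initial centres
    for _ in range(n_iter):
        # assign each pixel to its nearest centre (minimizes v for fixed c)
        dist2 = ((d[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        k = dist2.argmin(axis=1)
        # move each centre to the mean of its cluster (minimizes v for fixed k)
        new_c = np.array([d[k == l].mean(axis=0) if np.any(k == l) else c[l]
                          for l in range(K)])
        if np.allclose(new_c, c):
            break
        c = new_c
    v = ((d - c[k]) ** 2).sum()                           # value of (1) at the local solution
    return c, k, v
```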

Model (2) assumes that all the measured values $d_{j,i}$ are i.i.d. Gaussian. Furthermore, the expected value of $d_i$ is taken to be equal to $c_{k_i}$, depending on the class $k_i$ to which pixel $i$ is assigned, while the variance $V$ is the same for each attribute and class. The assumption of equality of variances is often unrealistic, especially for the variances of the different attributes within a class. Furthermore, even when considering only one attribute, the variances of different classes can be different. In those cases, it is expected that minimizing the total squared variation (1) will provide on average less correct classifications than maximizing the proper data model. Therefore, we prefer to base the classification procedure on the more general Gaussian image data model

$$p(d \mid c, k, V) = \prod_{i=1}^{n} \prod_{j=1}^{m} (2\pi V_{j,k_i})^{-1/2} \exp\left( -\frac{(d_{j,i} - c_{j,k_i})^2}{2 V_{j,k_i}} \right), \qquad (3)$$

where $V = (V_1, \ldots, V_K)$, $V_l = (V_{1,l}, \ldots, V_{m,l})$ and $V_{j,l}$ is the variance of the $j$th attribute relative to class $l$. However, up to now, we have not taken into account the spatial nature of the imaging data. Among the various ways to take into account specific information from spatial data, the Bayesian approach based on a priori Markov random field (MRF) image models has proved very successful for many different imaging tasks [4–9].
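Before the spatial prior is introduced, it may help to see the data term (3) on its own. The following sketch (illustrative names only, with 0-based labels) evaluates its logarithm for given per-class, per-attribute means and variances.

```python
import numpy as np

def log_data_model(d, c, V, k):
    """Log of the Gaussian data model (3).

    d : (n, m) data, c : (K, m) class means, V : (K, m) class variances,
    k : (n,) class label of each pixel (0-based here for convenience).
    """
    mu = c[k]                      # (n, m) expected values c_{j, k_i}
    var = V[k]                     # (n, m) variances V_{j, k_i}
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (d - mu) ** 2 / var)
```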

In the specific context of image classification, several Bayesian methods modeling a priori spatial information on $k$ by means of MRFs have been proposed [10–13]. In [12], a generalization of the K-means algorithm is presented, which takes into account spatial information by using an MRF model. However, only the case of equal variances is considered there. In [11], the case of possibly unequal variances is considered for the classification of multispectral Magnetic Resonance (MR) images. The expected values and variances of the different attributes for each class are considered known. These were estimated from training data from regions of interest preliminarily drawn from zones of the measured images that were assumed to be representative of the $K$ possible classes. Furthermore, in [12], the problem of estimating the hyper-parameter appearing in the prior model is not addressed. However, this can be a very relevant issue since the classification result also depends on the value used for this parameter.

In this paper, we propose a new Bayesian method for the classification of multispectral image data. As described earlier, we assume that the measured image data are i.i.d. Gaussian with unknown means and unknown and possibly unequal variances. The same MRF (Potts) model as in [12] is used here to take into account the spatial a priori information on $k$. Furthermore, following the fully Bayesian approach to image analysis [14], the prior hyper-parameter is estimated at the same time as the parameters $c_{j,l}$ and $V_{j,l}$, $j = 1, \ldots, m$, $l = 1, \ldots, K$, of the image data model. A non-informative prior model for the prior hyper-parameter is assumed. Inference on the above quantities can be performed by means of the Maximum a Posteriori (MAP) estimator [6]. Unfortunately, for our problem, the computational cost of finding the MAP estimator is too high. Accordingly, we find an approximation to it by means of the Iterated Conditional Modes (ICM) algorithm [15]. This approximation corresponds to a local maximum of the posterior distribution. The paper is organized as follows. In Section 2, the proposed method is described in detail. In Section 3, results from the application of the method to the classification of both simulated and real data from dynamic Magnetic Resonance (MR) imaging are described and discussed. In Section 4, some conclusions are outlined.

2 DESCRIPTION OF THE METHOD

The image data model adopted here is given in (3), where all the parameters $c$, $V$ and $k$ have to be estimated. Following the fully Bayesian approach to image analysis [14], we aim to estimate $c$, $V$, $k$ and the prior hyper-parameter $\beta$ based on the posterior distribution $p(c, V, k, \beta \mid d)$. The expression of the posterior distribution in terms of the data model $p(d \mid c, V, k, \beta) = p(d \mid c, V, k)$ and of the a priori model $p(c, V, k, \beta)$ can be obtained easily by using Bayes' theorem

$$p(c, V, k, \beta \mid d) = \frac{p(d \mid c, V, k, \beta)\, p(c, V, k, \beta)}{p(d)}. \qquad (4)$$

All our a priori information regarding the quantities to be estimated is contained in the term $p(c, V, k, \beta)$. This term can be written in general as $p(c, V, k, \beta) = p(k \mid c, V, \beta)\, p(c, V, \beta)$. The term $p(k \mid c, V, \beta)$ contains our prior information about the spatial nature of the classification parameter vector $k$. The prior information considered here is the "continuity" of the classified image $k$. More precisely, we expect that it is more likely for pixels that are close to each other to belong to the same class than to different classes. A suitable MRF model that takes into account such a property for $k$ is the Potts model [10]:

$$p(k \mid c, V, \beta) = p(k \mid \beta) = Z_\beta^{-1} \exp[-\beta U(k)] = Z_\beta^{-1} \exp\!\left( \beta \sum_{\langle s\, t \rangle} \delta_{k_s, k_t} \right), \qquad (5)$$

where the symbol $\langle s\, t \rangle$ denotes any possible pair of "neighbouring" pixels in the image. The symbol $\delta_{u,w}$ denotes the Kronecker index, which is equal to one when $u = w$ and zero otherwise. We adopt a second-order neighbourhood structure, in which two pixels are considered neighbours when the squared Euclidean distance between them is less than or equal to two. Therefore, given a pixel, its neighbours are the eight pixels closest to it, with obvious corrections at the boundaries. The symbol $Z_\beta$ denotes the normalizing constant. We observe that the Potts model is usually written as $\tilde{Z}_{\beta'}^{-1} \exp[\beta' \sum_{\langle s\, t \rangle} (2\delta_{k_s,k_t} - 1)]$. However, by setting $\beta' = \beta/2$ we have the Potts model in (5). The model (5) is a generalization of the well-known Ising model in Statistical Physics and the function $U$ appearing in it is called the "energy" function. The MRF model (5) has been widely adopted in the context of Bayesian image analysis, especially for classification purposes [6, 10, 12]. This model assigns, for a fixed value of the smoothing parameter $\beta$, maximum probability to all the $K$ "constant" configurations, while it penalizes more and more configurations with less and less continuity. For example, in the case of first-order neighbours (the four closest horizontal or vertical) and of a regular rectangular lattice of pixels, the minimum probability is associated with the "chess" configurations.
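As an illustration of this neighbourhood structure, a small sketch evaluating the energy $U(k) = -\sum_{\langle s\, t \rangle} \delta_{k_s,k_t}$ over the second-order pairs, restricted to an arbitrary image domain, might look as follows (names are illustrative):

```python
import numpy as np

# Offsets generating every second-order (8-neighbour) pair exactly once:
# right, down, down-right, down-left (squared Euclidean distance <= 2).
PAIR_OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def potts_energy(k, mask=None):
    """Energy U(k) = -sum over neighbour pairs of delta(k_s, k_t).

    k    : 2-D integer array of class labels.
    mask : optional boolean array marking the image domain (e.g. an ellipse);
           a pair contributes only if both of its pixels lie inside the domain.
    """
    if mask is None:
        mask = np.ones(k.shape, dtype=bool)
    rows, cols = k.shape
    U = 0
    for dr, dc in PAIR_OFFSETS:
        r0, r1 = max(0, -dr), rows - max(0, dr)
        c0, c1 = max(0, -dc), cols - max(0, dc)
        a, b = k[r0:r1, c0:c1], k[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
        ma, mb = mask[r0:r1, c0:c1], mask[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
        U -= np.sum((a == b) & ma & mb)
    return U

def n_pairs(mask):
    """Total number of neighbour pairs in the domain (so that U_min = -n_pairs)."""
    return -potts_energy(np.zeros(mask.shape, dtype=int), mask)
```

The helper n_pairs gives the total number of neighbour pairs in the domain, i.e. $-U_{\min}$, a quantity that reappears in Section 3.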

The normalizing constant $Z_\beta$ appearing in the posterior model is given by $Z_\beta = \sum_k \exp[-\beta U(k)]$, where the summation is performed over all possible configurations of the classification vector $k$. In practice, the exact calculation of $Z_\beta$ is impossible for almost all MRF models used in Bayesian image analysis due to the very high number $K^n$ of possible configurations. This is not a problem in the case usually considered, when the smoothing parameter is not estimated at the same time as the other relevant quantities. In that situation, the classification is based on the posterior probability of the relevant quantities given the data and $\beta$. Then, the MAP estimator will also be equal to the maximizer of the posterior distribution multiplied by any factor not depending on the quantities to be estimated, such as $Z_\beta$, and so the value of $Z_\beta$ need not be known. Furthermore, approximate versions of other estimators can usually be obtained from a finite sample drawn from the posterior distribution by using MCMC algorithms [16], which only require the posterior model to be known up to a multiplicative factor. This also means that we do not need to know the expression for $p(d)$ that appears in (4). In practice, the value of $\beta$ is usually pre-assigned on the basis of a "training set" of data by using different criteria [15]. However, our aim is to estimate $\beta$ at the same time as the other quantities relevant to our problem. Therefore, we need to have an expression for $Z_\beta$ as a function of $\beta$. For this we take advantage of the following relation

$$\frac{d \ln Z_\beta}{d\beta} = Z_\beta^{-1} \frac{dZ_\beta}{d\beta} = \frac{-\sum_k U(k)\, \exp[-\beta U(k)]}{Z_\beta} = -E_{p_\beta}[U], \qquad (6)$$

where $E_{p_\beta}[U]$ denotes the expected value of the energy function $U$ under the model (5), indicated here as $p_\beta$ [17]. By integrating Eq. (6) with respect to $\beta$, we obtain

$$\log(Z_\beta) - \log(Z_{\beta_0}) = -\int_{\beta_0}^{\beta} E_{p_{\beta'}}[U]\, d\beta'. \qquad (7)$$

Equation (7) has been commonly used in the field of Statistical Physics since the late 1970s for computing the free energy difference between two molecular-dynamic systems [18]. This method is known as thermodynamic integration. Algorithms based on MCMC can provide approximations of expected values, under a probability distribution, of functions over the configuration space, by averaging the values of the functions over a finite sample following that distribution [16]. We therefore propose to approximate the function $E_{p_\beta}[U]$ by using standard MCMC algorithms. In particular, we used the Metropolis algorithm [10, 16, 19]. More details about this are given in Section 3. We approximated $E_{p_\beta}[U]$ for a finite number of values of $\beta$ in an appropriate range. This allows us to calculate the integral that appears in (7) numerically using Simpson's rule. However, as we shall describe in Section 3, $E_{p_\beta}[U]$ seems to have a simple form resembling a logistic function of $\beta$. Therefore, after the approximation step, a least squares procedure could be applied to estimate the parameters of a suitable model for $E_{p_\beta}[U]$. In this way we could obtain an analytical approximation to the integral that appears in (7). We observe that once $\log(Z_\beta) - \log(Z_{\beta_0})$ has been calculated for one image domain (e.g. a square $64 \times 64$ image), it can be used in all cases with the same domain. Moreover, in Section 3 we shall see some examples that suggest calculating $\log(Z_\beta) - \log(Z_{\beta_0})$ for two domains which differ from each other in both shape and total number of pixels on the basis of a "standard" function of $\beta$.
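A sketch of this thermodynamic-integration step is given below; it assumes the MCMC estimates of $E_{p_\beta}[U]$ have already been tabulated on the regular grid $\beta = 0, 0.1, \ldots, 1.5$ and uses SciPy's Simpson rule for the numerical integration (function and variable names are illustrative).

```python
import numpy as np
from scipy.integrate import simpson  # Simpson's rule on a sampled grid (scipy >= 1.6)

def log_Z_difference(betas, E_U):
    """Approximate log(Z_beta) - log(Z_beta0) on a grid of beta values via Eq. (7).

    betas : increasing 1-D grid with betas[0] = beta_0, e.g. 0.0, 0.1, ..., 1.5.
    E_U   : MCMC estimates of E_{p_beta}[U] at each grid value.
    """
    log_dZ = np.zeros_like(E_U, dtype=float)
    for i in range(1, len(betas)):
        # integrate the estimated expectation from beta_0 up to betas[i]
        log_dZ[i] = -simpson(E_U[:i + 1], x=betas[:i + 1])
    return log_dZ
```

Alternatively, exploiting the roughly logistic shape of $E_{p_\beta}[U]$ mentioned above, the tabulated values could be fitted by least squares and the integral in (7) evaluated analytically.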

The prior distribution will be completely determined by assigning a model for $p(c, V, \beta)$. We assume a completely non-informative model for $p(c, V, \beta)$, so that the prior reduces to the product between model (5) and a uniform distribution over suitably large product intervals for $c$, $V$ and $\beta$. A local solution to the problem of maximizing the posterior distribution is obtained by using the ICM algorithm [15]. The algorithm starts from an initial configuration for $k$. Estimation first of $c$ and then of $V$ is performed by maximizing their corresponding conditional probability with respect to all other quantities to be estimated. It is easy to see that the conditional probability of $c_{j,l}$ given all the other quantities is proportional to $\exp[-\sum_{s \in B_l} (d_{j,s} - c_{j,l})^2 / (2V_{j,l})]$, where $B_l$ is the set of pixels that are currently assigned to class $l$. By maximizing this expression with respect to $c_{j,l}$, we obtain the sample mean for the $j$th attribute in the subset of pixels that are currently assigned to class $l$. The conditional probability of $V_{j,l}$ given all the other quantities is proportional to $V_{j,l}^{-|B_l|/2} \exp[-\sum_{s \in B_l} (d_{j,s} - c_{j,l})^2 / (2V_{j,l})]$, where $|B_l|$ indicates the cardinality of $B_l$. By maximizing this expression with respect to $V_{j,l}$, we obtain the sampling second moment, with respect to the current value of $c_{j,l}$, for the $j$th attribute in the subset of pixels that are currently assigned to class $l$.
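In code, these two closed-form updates might be written as follows (illustrative names, 0-based labels):

```python
import numpy as np

def update_class_parameters(d, k, K):
    """Conditional maximizers of c_{j,l} and V_{j,l} given the current classification.

    d : (n, m) data, k : (n,) current labels (0-based), K : number of classes.
    Returns c (K, m) sample means and V (K, m) second moments about those means.
    """
    n, m = d.shape
    c = np.zeros((K, m))
    V = np.ones((K, m))
    for l in range(K):
        B_l = d[k == l]                              # pixels currently assigned to class l
        if len(B_l) == 0:
            continue                                 # empty class: keep placeholder values
        c[l] = B_l.mean(axis=0)                      # sample mean of each attribute
        V[l] = ((B_l - c[l]) ** 2).mean(axis=0)      # second moment about the current mean
    return c, V
```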


After these steps, the value of $\beta$ is updated by maximizing the conditional probability of $\beta$ given all the other quantities, which is proportional to $Z_\beta^{-1} \exp(\beta \sum_{\langle s\, t \rangle} \delta_{k_s,k_t})$. We performed this maximization over values of $\beta$ in a finite set, although it could have been done more generally over continuous values of $\beta$. Each value $k_i$, $i = 1, \ldots, n$, is then updated by maximizing the conditional probability of $k_i$ given all the other quantities to be estimated, which is proportional to $\prod_{j=1}^{m} V_{j,k_i}^{-1/2} \exp[-\sum_{j=1}^{m} (d_{j,i} - c_{j,k_i})^2 / (2V_{j,k_i}) + \beta \sum_{s \in q_i} \delta_{k_s,k_i}]$, where $q_i$ is the set of neighbours of pixel $i$. Then, with a new configuration for $k$, the whole procedure is iteratively applied until no further changes occur for $k$. All the above computations can be performed very fast.
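Putting the pieces together, one ICM pass over $\beta$ and the pixel labels might look as follows. This is only a sketch: it reuses the illustrative potts_energy helper and the log_dZ table from the earlier sketches, and it assumes 0-based labels stored in a 2-D array.

```python
import numpy as np

def icm_sweep(d_img, k_img, mask, c, V, betas, log_dZ):
    """One ICM pass: update beta over a finite grid, then update every pixel label.

    d_img : (rows, cols, m) data, k_img : (rows, cols) labels, mask : domain.
    betas, log_dZ : grid of beta values and the corresponding log(Z_beta) - log(Z_beta0)
                    obtained by thermodynamic integration (earlier sketch).
    """
    # beta update: maximize -log(Z_beta) + beta * (number of agreeing neighbour pairs)
    agree = -potts_energy(k_img, mask)            # sum of delta(k_s, k_t) over pairs
    beta = betas[np.argmax(-log_dZ + betas * agree)]

    offsets8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    rows, cols, m = d_img.shape
    K = c.shape[0]
    for i in range(rows):
        for j in range(cols):
            if not mask[i, j]:
                continue
            # count, for each class, the neighbours currently assigned to that class
            neigh = np.zeros(K)
            for di, dj in offsets8:
                r, s = i + di, j + dj
                if 0 <= r < rows and 0 <= s < cols and mask[r, s]:
                    neigh[k_img[r, s]] += 1
            # log of the conditional probability of k_i, up to a constant, for each class
            log_p = (-0.5 * np.sum(np.log(V), axis=1)
                     - np.sum((d_img[i, j] - c) ** 2 / (2 * V), axis=1)
                     + beta * neigh)
            k_img[i, j] = np.argmax(log_p)
    return k_img, beta
```

In the full procedure this sweep alternates with the c and V updates above, starting from the initial configuration described next, until $k$ stops changing.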

The ICM algorithm adopted here provides in general a local maximum of the posterior distribution. In some situations, large differences between the real MAP and the ICM solutions may appear [20]. Furthermore, there may be quite some dependence of the ICM solution on the initial configuration from which the algorithm starts. Therefore, a suitable starting point for the ICM algorithm must be chosen. We obtained the initial configuration as follows. The first principal component of the measured values of the $m$ attributes was calculated. Then, an optimization procedure was performed in order to find the location of $K$ centres such that, after classifying each data point to the class corresponding to its nearest centre, the sum of the $K$ variances (with respect to the centres) calculated from the $K$ clusters of points is minimized. The nearest-centre classification corresponding to the optimal location of the centres is adopted as the starting configuration for $k$.
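One way to realize this initialization, under the assumption that the $K$ centres are sought along the first principal component scores and located by an exhaustive search over a coarse grid (all names illustrative):

```python
import numpy as np
from itertools import combinations

def initial_classification(d, K, n_grid=50):
    """Starting configuration for ICM: nearest-centre labels along the first
    principal component, with centres chosen to minimize the sum of the K
    within-cluster variances computed with respect to the centres.

    d : (n, m) data. Returns labels (n,) in {0, ..., K-1}.
    """
    # first principal component score of each pixel
    x = d - d.mean(axis=0)
    _, _, Vt = np.linalg.svd(x, full_matrices=False)
    score = x @ Vt[0]

    # exhaustive search of K centres over a coarse grid of candidate locations
    grid = np.linspace(score.min(), score.max(), n_grid)
    best, best_labels = np.inf, None
    for centres in combinations(grid, K):
        centres = np.asarray(centres)
        labels = np.abs(score[:, None] - centres[None, :]).argmin(axis=1)
        crit = sum(np.mean((score[labels == l] - centres[l]) ** 2)
                   for l in range(K) if np.any(labels == l))
        if crit < best:
            best, best_labels = crit, labels
    return best_labels
```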

3 RESULTS AND DISCUSSION

In this section, we will present some results from the application of the proposed method to the classification of both real data from dynamic MR imaging and simulated data resembling the real data situation. The results will also be compared to those from the K-means algorithm [3]. We start with simulated data and then turn to the real dynamic MR image data.

Our simulated data is based on a true scene, shown in Figure 1a, comprising three different regions within an elliptical domain with a total number of 1,063 pixels. Based on the true scene, two noisy images were generated following model (3), which represent two different attributes of the true scene. The expected values we used for the two attributes in the three regions were 10, 16, 24 and 10, 8, 7, respectively. The standard deviations used for the two attributes in the three regions were taken to be 2.5, 5.0, 3.0 and 2.3, 1.9, 1.4, respectively. A realization of the first attribute is shown in Figure 1b, while Figure 1c shows a realization of the second attribute.
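A sketch of how such data could be generated, assuming a labelled true scene and a domain mask are given (the specific elliptical scene of Figure 1a is not reproduced here; all names are illustrative):

```python
import numpy as np

def simulate_attributes(true_scene, mask, means, sds, seed=0):
    """Generate noisy attribute images from a labelled true scene, following model (3).

    true_scene : (rows, cols) labels in {0, 1, 2}; mask : boolean domain (e.g. the ellipse).
    means, sds : (K, m) per-class expected values and standard deviations, e.g.
                 means = [[10, 10], [16, 8], [24, 7]],
                 sds   = [[2.5, 2.3], [5.0, 1.9], [3.0, 1.4]].
    Returns a (rows, cols, m) array of simulated data (zero outside the domain).
    """
    rng = np.random.default_rng(seed)
    means, sds = np.asarray(means, float), np.asarray(sds, float)
    mu = means[true_scene]                      # (rows, cols, m) class means per pixel
    sigma = sds[true_scene]                     # (rows, cols, m) class SDs per pixel
    d = mu + sigma * rng.standard_normal(mu.shape)
    d[~mask] = 0.0
    return d
```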

Figure 1d shows the result obtained by applying the ordinary K-means algorithm to the data from Figures 1b and 1c. The quality of this result is poor as judged both visually and by calculating the misclassification rate, which was equal to 43%. The initial classification was determined as follows. After calculating the first principal component scores $\xi_i$, $i = 1, \ldots, n$, of the data, the range of these values is divided into three sub-intervals of the same length. Then, the initial classification $k_i$, $i = 1, \ldots, n$, is obtained according to which of these three sub-intervals the value $\xi_i$ belongs. Since the K-means algorithm provides only a local solution to the minimization of the total "internal" cluster squared variation (1), we also tried to find a less local solution. This was done by calculating $v$ for many different locations of the class centres on a finite grid. However, this procedure did not improve the quality of the resulting classification. Since the K-means classification depends on the starting point, we also tried to determine a "better" initial configuration. This was done by performing the same optimization procedure as above, but replacing $v$ by a weighted total squared variation, where each of the three internal squared variations was divided by the number of pixels in the corresponding class. The initial classification, shown in Figure 3a, clearly improved, giving a misclassification rate of 14%. However, only minor changes appeared in the result from the K-means algorithm when using this initial classification. We shall use this initial classification for the proposed Bayesian approach.

The classification results shown above indicate the need for modeling the spatial interaction between the pixels. We begin by using the Bayesian model described above without estimating $\beta$. The resulting classifications for different values of $\beta$ are shown in Figure 2. The figure clearly shows a large improvement when taking spatial interaction into account. As expected, if the value of $\beta$ is too large, the smallest of the three sub-regions tends to be smoothed away. Figure 2 also shows that there is not a large sensitivity to the choice of $\beta$. Furthermore, for each value of $\beta$ in the range 0.6 to 1.1 a good classification result is obtained.

In order to apply the proposed Bayesian approach, we need to have an expression for $E_{p_\beta}[U]$ as a function of $\beta$. This was obtained here, by using the Metropolis algorithm [10, 16, 19], for $\beta = 0, 0.1, \ldots, 1.5$, as follows. Starting from a random initial value for $k$, the values $k_i$, $i = 1, \ldots, n$, are updated one at a time following a raster-scan pixel visiting procedure. The value $k_i$ is changed to $k_i' \neq k_i$ with probability $p = \frac{1}{K-1} \min\{1,\, p(k' \mid \beta)/p(k \mid \beta)\}$, where the prior model is the one in (5). The procedure is repeated iteratively until each pixel in the domain of the image (the largest ellipse in the simulated example) has been updated following the chosen visiting scheme. An update of all $n$ pixels in the image is called a "sweep". The whole procedure is repeated $N$ times. Then, an approximation of $E_{p_\beta}[U]$ is calculated by averaging the values of the energy function corresponding to the last $N/2$ sweeps. The dotted line in Figure 4 shows the result of the estimated $E_{p_\beta}[U]$ divided by the minimum value $U_{\min}$ of $U$ over the state space (equal to the negative value of the total number of neighbour pixel pairs). This normalization will be useful to compare the results corresponding to domains with either different shape or total number of pixels. We used $N = 10{,}000$ for Figure 4. No significant changes in the values were obtained when using $N = 20{,}000$ or changing the initial point. The behaviour of $E_{p_\beta}[U]/U_{\min}$ as a function of $\beta$ resembles a logistic function. By including the optimization of $\beta$ in the Bayesian classification approach, the resulting optimal value of $\beta$ was found to be 0.7. We note that this value lies within the range 0.6 to 1.1 identified above. The resulting classification is shown in Figure 3b. In this case the misclassification rate was lower than 1 percent (9/1063).

FIGURE 1  The simulated image data: the true scene (a); the image of the first attribute (b); the image of the second attribute (c); the classification obtained by applying the K-means algorithm (d).

FIGURE 2  The classifications that result from using the Bayesian method with $\beta = 0, 0.1, \ldots, 1.5$ to process the simulated data.

FIGURE 3  The initial classification used to process the simulated data by the proposed method (a). The result of applying the proposed method (b).
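A sketch of this Metropolis scheme for the Potts prior alone is given below; it reuses the illustrative potts_energy helper from Section 2 and is written for clarity rather than speed.

```python
import numpy as np

def estimate_E_U(shape_mask, K, beta, n_sweeps=10_000, seed=0):
    """Metropolis estimate of E_{p_beta}[U] for the Potts prior (5) on a given domain.

    shape_mask : boolean array defining the image domain; K : number of classes.
    Each visited pixel proposes one of the other K-1 labels uniformly at random;
    the average of U over the last n_sweeps/2 sweeps approximates E_{p_beta}[U].
    """
    rng = np.random.default_rng(seed)
    rows, cols = shape_mask.shape
    k = rng.integers(0, K, size=(rows, cols))          # random initial configuration
    pixels = [(i, j) for i in range(rows) for j in range(cols) if shape_mask[i, j]]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

    def local_agreement(i, j, label):
        # neighbours of (i, j) inside the domain currently carrying the given label
        return sum(1 for di, dj in offsets
                   if 0 <= i + di < rows and 0 <= j + dj < cols
                   and shape_mask[i + di, j + dj] and k[i + di, j + dj] == label)

    energies = []
    for sweep in range(n_sweeps):
        for i, j in pixels:                            # raster-scan visiting scheme
            new = (k[i, j] + rng.integers(1, K)) % K   # uniform over the K-1 other labels
            # log prior ratio: beta * (change in the number of agreeing neighbour pairs)
            delta = beta * (local_agreement(i, j, new) - local_agreement(i, j, k[i, j]))
            if np.log(rng.random()) < min(0.0, delta):
                k[i, j] = new
        if sweep >= n_sweeps // 2:
            energies.append(potts_energy(k, shape_mask))   # U(k) from the earlier sketch
    return float(np.mean(energies))
```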


We move on to applying the proposed method to real dynamic MR data. Dynamic MR imaging of the first passage of an injected contrast agent through an organ gives information about local changes in tissue perfusion, including the supply of the necessary oxygen and glucose to the tissue. Dynamic MR imaging may identify regions with reduced perfusion and may be used for diagnosis and for therapy evaluation. The "peak intensity" and the "peak delay" images that we shall consider are shown in Figures 5a and 5b. These quantities represent one useful way of extracting information from a typical dynamic MR time series [21]. The images display the brain of an ischaemic rat. The approximately oval region is the brain, and regional ischaemia appears in its left part. This type of pathology is well studied, and medical experience suggests using three classes corresponding to absence of brain damage, a low/medium degree and a high degree of damage, respectively. We only classified the pixels inside the region corresponding to the brain, which contains a total of 473 pixels. The resulting classification from the proposed method is shown in Figure 5c. The ischaemic region, containing the two subregions called "core" (high degree of damage) and "penumbra" (low/medium degree of damage), is well described in the classified image. For comparison, the result from the K-means algorithm is shown in Figure 5d.

In this real example the optimal value of $\beta$ was found to be 0.6. The plot of $E_{p_\beta}[U]/U_{\min}$ is shown in Figure 4 by means of the solid line. An interesting point is that the curves of $E_{p_\beta}[U]/U_{\min}$ for the simulated and real data examples are very similar to each other. The plot of $E_{p_\beta}[U]/U_{\min}$ for a $64 \times 64$ square image is also very similar to the other two curves, as shown in Figure 4 by means of the dashed line.

FIGURE 4  The expected value of the normalized energy as a function of $\beta$ for the simulated data (dotted line), the real data (solid line) and a $64 \times 64$ square image (dashed line).

The proposed method was implemented using the Fortran-77 language and the experiments were run on an HP workstation (9000/800) with a clock frequency of 80 MHz. Once the expectation of the energy function was calculated for the different values of $\beta$, the execution time for the proposed method was a few seconds of CPU time for the examples included here. The calculation of the expectation of the energy function required a total CPU time of the order of tens of minutes to hours for the domain with 473 pixels and the square domain with 4096 pixels, respectively. This strongly influences the total CPU time required by the proposed method. This is of course not a problem for a square image, since the dimensions in specific applications are typically fixed (e.g. $256 \times 256$). In fact, in this situation the computation of the expectation of the energy function can be performed just once, with the result that the total CPU time for the proposed method is dramatically reduced. Furthermore, this also seems to be the case in situations where the domain is varying, as when processing hand-drawn regions of interest. In fact, switching $E_{p_\beta}[U]/U_{\min}$ between the real and the simulated examples when applying the proposed method gave exactly the same classification result in both cases. This means that in practice $E_{p_\beta}[U]$ for a given image could always be calculated based on a "standard" function by simple multiplication by the total number of neighbour pixel pairs.
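An illustrative sketch of this shortcut, reusing the helpers from the earlier sketches: the "standard" normalized curve is assumed to have been estimated once (e.g. on a $64 \times 64$ square), and square_mask and brain_mask stand for the two domains; they are assumptions for the example, not quantities defined in the paper.

```python
import numpy as np

# Grid of beta values used throughout (as in the paper: 0.0, 0.1, ..., 1.5).
betas = np.arange(0.0, 1.51, 0.1)

# "Standard" curve E_{p_beta}[U] / U_min, estimated once by MCMC on a reference domain.
standard_curve = np.array([estimate_E_U(square_mask, K=3, beta=b) for b in betas])
standard_curve /= -n_pairs(square_mask)        # U_min = -(number of neighbour pairs)

# Transfer to a new domain (e.g. the 473-pixel brain region) by rescaling with its U_min,
# then feed the rescaled expectations into the thermodynamic integration.
E_U_new = standard_curve * (-n_pairs(brain_mask))
log_dZ_new = log_Z_difference(betas, E_U_new)
```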

FIGURE 5  Real dynamic MR imaging data: the peak intensity image (a); the peak delay image (b); the classification obtained from applying the proposed method (c) (the region where pathology is not present appears in grey, while the penumbra and core regions are shown in dark grey and black, respectively); the classification obtained from applying the K-means algorithm (d).

We observe that the proposed method is a natural and simple spatial generalization of the K-means algorithm. The a priori spatial model adopted here is also very simple. More complex models could be adopted whenever more information about the local spatial structure of the classification scene is available [22]. This would only increase the computational cost of the method to a higher but still acceptable level. Other possible generalizations of the method involve the image data model. The more general case, in which the deviations of the $m$ attributes from their corresponding means at each pixel are dependent on each other, could be considered. For example, a Gaussian model with a non-diagonal covariance matrix could be adopted for the data in each class. The step of maximizing the posterior distribution with respect to $c_l$ is equivalent in this case to the minimization of a quadratic form. Instead, in order to maximize the posterior distribution with respect to the elements of the covariance matrix of each class, non-linear equations must be solved. Concerning the prior distribution assumed for the variances $V_{j,l}$, another choice could be made, with $\log(V_{j,l})$ uniformly distributed. This choice has better theoretical justification than the one made here with $V_{j,l}$ uniformly distributed.

4 CONCLUSIONS

We have proposed a new Bayesian method for classifying multispectral image data. The method is automatic and estimates all the posterior model parameters simultaneously. The method is both easy to implement and fast. Successful results were obtained when applying the method to classify simulated and real multispectral image data.

Acknowledgments

The authors wish to thank Piero Barone, Fred Godtliebsen, Julian Stander and a referee for their very valuable comments and suggestions, and Olav Haraldseth for providing the real MR imaging data.

References

[1] Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–732.
[2] MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. University of California Press, Berkeley, California, pp. 281–297.
[3] Johnson, R. A. and Wichern, D. W. (1988). Applied Multivariate Statistical Analysis. Prentice-Hall, Englewood Cliffs, New Jersey, pp. 566–570.
[4] Cross, G. C. and Jain, A. K. (1983). Markov random fields texture models. IEEE Trans. Pat. Anal. Mach. Intell., 5, 25–39.
[5] Grenander, U. (1983). Tutorial in pattern theory. Report, Division of Applied Mathematics, Brown University, Providence, RI.
[6] Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pat. Anal. Mach. Intell., 6, 721–741.
[7] Geman, S. and McClure, D. (1987). Statistical methods for tomographic image reconstructions. Bull. Int. Statist. Inst., 52, 4–20.
[8] Marroquin, J., Mitter, S. and Poggio, T. (1987). Probabilistic solution of ill-posed problems in computational vision. J. Am. Statist. Assoc., 82, 76–89.
[9] Green, P. (1990). Bayesian reconstructions from emission tomography data using a modified EM algorithm. IEEE Trans. Med. Imag., 9, 84–93.
[10] Winkler, G. (1995). Image Analysis, Random Fields and Dynamic Monte Carlo Methods. Springer-Verlag, New York.
[11] Choi, H. S., Haynor, D. R. and Kim, Y. (1989). Partial volume tissue classification of multichannel magnetic resonance images – A mixed model. IEEE Trans. Med. Imag., 10, 395–407.
[12] Pappas, T. N. (1992). An adaptive clustering algorithm for image segmentation. IEEE Trans. Sign. Proc., 40, 901–914.
[13] Geman, D. (1990). Random Fields and Inverse Problems in Imaging. Lecture Notes in Mathematics, Springer-Verlag, Berlin.
[14] Besag, J. (1989). Towards Bayesian image analysis. J. Appl. Stat., 16, 395–407.
[15] Besag, J. (1986). On the statistical analysis of dirty pictures (with discussion). J. Royal Stat. Soc. B, 48, 259–302.
[16] Gilks, W. R., Richardson, S. and Spiegelhalter, D. J. (Eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall, London.
[17] Davidson, N. (1962). Statistical Mechanics. McGraw-Hill, New York, pp. 244–245.
[18] Binder, K. (Ed.) (1986). Monte Carlo Methods in Statistical Physics, Topics in Current Physics, Vol. 7. Springer-Verlag, Berlin.
[19] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equations of state calculations by fast computing machines. J. Chem. Phys., 21, 1087–1091.


[20] Greig, D. M., Porteous, B. T. and Seheult, A. H. (1989). Exact maximum a posteriori estimation for binary images. J. Royal Stat. Soc. B, 51, 271–279.
[21] Sebastiani, G., Godtliebsen, F., Jones, R. A., Haraldseth, O., Muller, T. B. and Rinck, P. A. (1996). Analysis of dynamic magnetic resonance images. IEEE Trans. Med. Imag., 15, 268–277.
[22] Tjelmeland, H. and Besag, J. (1998). Markov random fields with higher-order interactions. Scandinavian Journal of Statistics, 25, 415–433.
