
A Support Vector Method for Estimating Joint Density of Medical Images

Jesús Serrano, Pedro J. García-Laencina, Jorge Larrey-Ruiz, and José-Luis Sancho-Gómez

Dpto. Tecnologías de la Información y las Comunicaciones, Universidad Politécnica de Cartagena,
Plaza del Hospital 1, 30202 Cartagena (Murcia), Spain
[email protected]

Abstract. Human learning inspires a large number of algorithms and techniques to solve problems in image understanding. Supervised learning algorithms based on support vector machines are currently among the most effective methods in machine learning. A support vector approach is used in this paper† to solve a typical problem in image registration, namely the estimation of the joint probability density function needed for image registration by maximization of mutual information. Results estimating the joint probability density function of a CT and a PET image demonstrate the advantages of the proposed approach over classical histogram estimation.

1 Introduction

Human learning can be defined as the change in a subject's behaviour as a result of experience, through the establishment of associations between stimuli and responses by means of practice. This practice implies an iterative trial-and-error process in which better responses are given to known stimuli. New responses are even generated for unknown stimuli, as a result of evaluating similar, previously learned ones. These learning processes are not limited to humankind; on the contrary, they are shared with other living beings. It is possible to extrapolate all these concepts concerning biological learning to the field of artificial intelligence, the result being the research area named machine learning [1]. This broad subfield, inspired by these biological learning processes, is in charge of developing algorithms and techniques that enable computers to "learn".

One of the most relevant types of algorithms in machine learning is supervised learning, in which the aim is to build a function from training data consisting of pairs of input objects and desired outputs. Support Vector Machines (SVM) are powerful supervised learning methods [2], which are "cousins" of artificial neural networks.

† This work is partially supported by Ministerio de Educación y Ciencia under grant TEC2006-13338/TCM, and by Consejería de Educación y Cultura de Murcia under grant 03122/PI/05.

SVM are able to solve classification and regression problems from a training data set, supporting the solution on a small subset of it.

Probability density function estimation is a particular application of SV regression. Density estimation is a common task in many image understanding problems, particularly in those that need to compute the Mutual Information (MI) between images as a similarity measure. This is the case of medical image registration by Maximization of Mutual Information (MMI) [3, 4]. This method requires estimating the joint probability density function of two or more medical images, which are usually 3-D digital images. The joint histogram is normally used to estimate the joint density, but it is an inaccurate estimator. In this work a support vector density estimator is used, based on the support vector density estimation method described in [5], which provides a smooth and sparse solution.

This paper is structured as follows: Section 2 describes the support vector density estimation method; Section 3 presents the application of this method to medical images, with a brief introduction to the image registration problem and to medical image registration using MMI in Subsections 3.1 and 3.2 respectively, and the implementation of the support vector joint density estimation in Subsection 3.3; Section 4 shows the results of the described method; finally, the conclusions and references close the paper.

2 Support Vector Density Estimation

Let p(x) denote the density function to be estimated from a data set x_1, ..., x_l. The distribution function is

F(x) = P(X ≤ x) = ∫_{−∞}^{x} p(t) dt    (1)

Thus, finding the density requires solving a linear operator equation Ap = F, where A is the linear operator.

The empirical distribution function is computed from the data set using

F_l(x) = (1/l) ∑_{i=1}^{l} θ(x − x_i)    (2)

with

θ(x) = 1 if x > 0, and 0 otherwise    (3)

The obtained distribution is introduced in (1). The Support Vector (SV) method can be used to solve linear operator equations Ap(t) = F(x) [6], using F_l(x_i) as the desired output y_i. For any fixed point x, F_l(x) is an unbiased estimate of F(x) and its standard deviation is

σ = √( (1/l) F(x)(1 − F(x)) )    (4)

so the accuracy of the approximation can be characterized as

ε_i = σ_i = √( (1/l) F_l(x_i)(1 − F_l(x_i)) )    (5)

Thus, in order to solve the regression problem we consider the triples (x_i, F_l(x_i), ε_i).
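As an illustration only (this code is not part of the original paper), the following Python sketch builds the regression triples from a one-dimensional sample set; it assumes NumPy, and the function name regression_triples is hypothetical.

import numpy as np

def regression_triples(x):
    # Build the triples (x_i, F_l(x_i), eps_i) used as regression data.
    # x is a 1-D array holding the samples x_1, ..., x_l.
    x = np.sort(np.asarray(x, dtype=float))
    l = len(x)
    # Empirical distribution of Eq. (2): F_l(x_i) counts samples strictly
    # below x_i, following the step function theta of Eq. (3).
    F = np.array([(x < xi).sum() / l for xi in x])
    # Accuracy of Eq. (5); note it vanishes at the extreme sample points.
    eps = np.sqrt(F * (1.0 - F) / l)
    return x, F, eps

# Hypothetical usage with synthetic intensity samples
samples = np.random.default_rng(0).normal(loc=100.0, scale=15.0, size=200)
xi, Fl, eps = regression_triples(samples)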

This method does not guarantee that the obtained density will always be positive, because the kernel used may be non-monotonic. It is possible to choose monotonic kernels, but the drawback is that monotonic functions expressed with Mercer kernels [2] do not have the desired shapes to perform an accurate regression. An accurate regression can instead be obtained using classical density estimation kernels, even though these kernels do not satisfy Mercer's condition, by means of a linear programming approach to solving the SV regression [7].

If it is desired to approximate the density by a mixture of Gaussian-like shapes, the regression in the image space can be a mixture of sigmoids [5]. Then the kernels may have the form

K(x_sv, x) = 1 / (1 + e^{γ(x_sv − x)})    (6)

where x_sv is the kernel centre and γ the kernel width, which is a pre-fixed parameter, and the cross-kernel [5] derived from the kernel K is

K̄(x_sv, x) = γ / (2 + e^{γ(x_sv − x)} + e^{−γ(x_sv − x)})    (7)
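A minimal Python sketch (an assumed illustration, not taken from the paper) of the sigmoid kernel of Eq. (6) and its cross-kernel of Eq. (7); since the cross-kernel is the derivative of the sigmoid with respect to x, each support vector contributes a Gaussian-like bump to the density. The function names are hypothetical and NumPy is assumed.

import numpy as np

def sigmoid_kernel(x_sv, x, gamma):
    # Kernel of Eq. (6), used to approximate the distribution function F.
    return 1.0 / (1.0 + np.exp(gamma * (x_sv - x)))

def cross_kernel(x_sv, x, gamma):
    # Cross-kernel of Eq. (7): the derivative dK/dx, used to recover the density p.
    return gamma / (2.0 + np.exp(gamma * (x_sv - x)) + np.exp(-gamma * (x_sv - x)))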

Instead of having a fixed parameter γ, an adaptive kernel width can be used. This is achieved by means of a dictionary of κ kernels for each SV centre, resulting in an approximation of the distribution function as the following expansion over support vectors:

F(x) = ∑_{i=1}^{l} ∑_{k=0}^{κ} α_i^k K_k(x_i, x)    (8)

and, from these support vectors, it is possible to obtain the desired density

p(t) = ∑_{i=1}^{l} ∑_{k=0}^{κ} α_i^k K̄_k(x_i, t)    (9)

where each kernel K_k and cross-kernel K̄_k has a width γ_k. Thus, generalizing the linear programming SV density estimation, we obtain the following optimization problem [5]:

min ( ∑_{i=1}^{l} ∑_{k=0}^{κ} α_i^k + C ∑_{i=1}^{l} ξ_i + C ∑_{i=1}^{l} ξ*_i )    (10)

under the constraints

y_i − ε_i − ξ_i ≤ ∑_{j=1}^{l} ∑_{k=0}^{κ} α_j^k K_k(x_j, x_i) ≤ y_i + ε_i + ξ*_i ,   i = 1, . . . , l    (11)

∑_{i=1}^{l} ∑_{k=0}^{κ} α_i^k = 1    (12)

α_i^k ≥ 0,  ξ_i ≥ 0,  ξ*_i ≥ 0,   i = 1, . . . , l    (13)

where the regularizer Ω(α) = ∑_{i=1}^{l} ∑_{k=0}^{κ} α_i^k can be replaced by other regularizers that provide better results, such as a weighted sum of the coefficients

∑_{i=1}^{l} ∑_{k=0}^{κ} w_k α_i^k    (14)
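To make the optimization concrete, the sketch below sets up (10)-(13) with the weighted regularizer (14) as a standard linear program and solves it with SciPy. This is an assumed illustration under the stated variable layout, not the authors' implementation; the function name sv_density_lp and the value of C are hypothetical.

import numpy as np
from scipy.optimize import linprog

def sv_density_lp(Kmat, y, eps, weights, C=10.0):
    # Kmat[i, j]: value at sample x_i of the j-th basis kernel K_k(x_centre, .)
    #             (one column per (centre, width) pair of the dictionary).
    # y, eps    : empirical distribution values F_l(x_i) and accuracies eps_i.
    # weights   : weights w_k of the regularizer (14), one per column of Kmat.
    l, m = Kmat.shape
    # Variable layout: [alpha (m), xi (l), xi_star (l)]; objective of Eqs. (10)/(14).
    c = np.concatenate([weights, C * np.ones(l), C * np.ones(l)])
    # Lower half of Eq. (11):  K @ alpha + xi >= y - eps
    A1 = np.hstack([-Kmat, -np.eye(l), np.zeros((l, l))])
    b1 = -(y - eps)
    # Upper half of Eq. (11):  K @ alpha - xi_star <= y + eps
    A2 = np.hstack([Kmat, np.zeros((l, l)), -np.eye(l)])
    b2 = y + eps
    # Normalization constraint of Eq. (12): the coefficients sum to one.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 2 * l))])
    res = linprog(c, A_ub=np.vstack([A1, A2]), b_ub=np.concatenate([b1, b2]),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (m + 2 * l), method="highs")
    return res.x[:m]   # sparse coefficient vector; non-zeros mark the support vectors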

3 Support Vector Joint Density Estimation of Medical Images

3.1 Image Registration Problem

Medical images are commonly represented as sequences of 2-D cross-sectional slices, which are used to construct a 3-D volume from the geometrical relationships between the slices, and there is no information about the relative positions of the patients in the scanners. The problem generally consists in the retrospective registration of one or several 3-D images to another one, obtaining the parameters involved in the transformation. It is desirable that this process be automated and accurate [8]. It is not a trivial problem, because of the large number of variables to take into account, such as the different positioning of the patient in the scanners, the different image resolutions, distortions introduced in the images which are specific to each modality, and so on. There are many approaches to retrospective image registration, but one of them has acquired special importance in recent years because of its robustness, simplicity and mathematical elegance. This method is based on MMI, and postulates that the Mutual Information (MI) of the images to be registered becomes maximal when they are geometrically aligned [3, 4].

3.2 Medical Image Registration using MMI

Let R (the reference image) and F (the floating image) denote two images related by a registration transformation T_α with parameters α, such that voxels p in R with intensity r physically correspond to voxels T_α(p) in F with intensity f [9]. The information that one value contains about the other is measured by the mutual information I(F, R) of the variables F = f and R = r,

r = R(p),   f = F(T_α(p))

I(F, R) = ∑_{f,r} p_{FR}(f, r) log [ p_{FR}(f, r) / (p_F(f) · p_R(r)) ]    (15)

where p_{FR}(f, r), p_R(r) and p_F(f) are the joint and marginal densities, respectively. These distributions are usually computed by simple normalization of the joint histogram. Since the joint density p_{FR}(f, r) and, in general, the marginal densities p_R(r) and p_F(f) depend on the mapping T_α(p), so does the mutual information I(F, R). The mutual information criterion postulates that the images are geometrically aligned by the transformation T_α*(p) for which I(F, R) is maximal:

α* = arg max_α I(F, R)    (16)
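For concreteness, a small Python sketch (an assumed illustration, not part of the paper) of how Eq. (15) can be evaluated once a discretized joint density is available; the function name mutual_information is hypothetical and NumPy is assumed.

import numpy as np

def mutual_information(p_fr):
    # p_fr: 2-D array holding the discretized joint density p_FR(f, r),
    #       with non-negative entries that sum to one.
    p_f = p_fr.sum(axis=1, keepdims=True)   # marginal density of F
    p_r = p_fr.sum(axis=0, keepdims=True)   # marginal density of R
    mask = p_fr > 0                          # skip empty bins to avoid log(0)
    return np.sum(p_fr[mask] * np.log(p_fr[mask] / (p_f @ p_r)[mask]))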

3.3 SV Joint Density Estimation Method

The density estimation method previously described applies to one-dimensional densities. This section extends it to a multi-dimensional problem, namely the joint density estimation of medical images. This is a 2-dimensional problem, in which we estimate the density p(x) with its corresponding distribution function

F(x) = P(X ≤ x) = ∫_{−∞}^{x_1} ∫_{−∞}^{x_2} p(t) dt_1 dt_2

where x_1 and x_2 are the image intensity values. Let R and F denote the reference and floating images, which have fixed resolutions ρ_R and ρ_F with, in general, ρ_R ≠ ρ_F. First, the intensity values are linearly rescaled into the continuous ranges (0, n_R − 1) and (0, n_F − 1) with n_R = n_F = 256. These ranges must be continuous rather than discrete, because rounding to the nearest integer may cause a loss of information. The next step is the computation of the empirical joint distribution function, in which the different image resolutions ρ_R and ρ_F are taken into account. Since T_α(s) will generally not coincide with a grid point of R, an interpolation is needed. The classical interpolation schemes (Nearest Neighbour, Trilinear or Partial Volume) only extend the influence of the voxel sample s to the nearest neighbours of T_α(s), without considering that the images may have very different resolutions. In this case the influence region of a voxel in one of the images affects, in the other one, either a larger or a smaller number of voxels than the number of nearest neighbours. In this work an interpolation scheme based on the volume overlap of the voxels is proposed, assuming that each voxel represents a parallelepiped centred on it, whose dimensions and orientation depend respectively on the slice separation and pixel spacing in each image, and on the grid positions and transformation parameters α. Figure 1 shows a 2-D projection of this concept. Each voxel of the reference and floating images has a fixed volume V_R and V_F, and each pair of voxels T_α(s) and n_i has an overlap volume V_{s,n_i}.

The joint distribution function is computed using the ratio of the overlap volume to the total volume V:

∀f ≥ f(s), f ∈ F  ∧  ∀r ≥ r(n_i), r ∈ R   ⇒   F(f, r) += V_{s,n_i} / V    (17)

where F and R are discrete ranges between 0 and n_F − 1 or n_R − 1, respectively. The number of points in these ranges is a parameter to be chosen, depending on whether a coarse or a fine estimation is desired.
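The following Python sketch (an assumed illustration; it presumes the overlap volume V_{s,n_i} has already been computed geometrically, and the function name add_overlap_contribution is hypothetical) shows how a single voxel pair updates the empirical joint distribution according to Eq. (17).

import numpy as np

def add_overlap_contribution(F_emp, f_levels, r_levels, f_s, r_ni, v_overlap, v_total):
    # F_emp    : 2-D array F(f, r) sampled on the grid f_levels x r_levels.
    # f_s, r_ni: intensities of the floating voxel T_alpha(s) and reference voxel n_i.
    # Every grid point with f >= f_s and r >= r_ni receives the overlap ratio, Eq. (17).
    mask = (f_levels[:, None] >= f_s) & (r_levels[None, :] >= r_ni)
    F_emp[mask] += v_overlap / v_total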

Fig. 1. 2-D projection of the volume overlap based interpolation.

Since the problem is 2-dimensional, a 2-dimensional kernel is used. In this work the bi-dimensional kernel is chosen to be a tensor product of the one-dimensional Gaussian-like kernels shown in Section 2:

K(x_sv, x) = 1 / (1 + e^{γ_1(x_{sv,1} − x_1)}) × 1 / (1 + e^{γ_2(x_{sv,2} − x_2)})    (18)

where x_sv = (x_{sv,1}, x_{sv,2}) is the centre of the kernel, and γ = (γ_1, γ_2) is a vector containing the kernel widths in each of the two dimensions. Therefore, the cross-kernel results in:

K̄(x_sv, x) = γ_1 / (2 + e^{γ_1(x_{sv,1} − x_1)} + e^{−γ_1(x_{sv,1} − x_1)}) × γ_2 / (2 + e^{γ_2(x_{sv,2} − x_2)} + e^{−γ_2(x_{sv,2} − x_2)})    (19)

The regularizing term used in (10) is the same as in (14), i.e., a weighted sum of the SV coefficients. These weights are chosen to penalize the kernels with small widths, in order to obtain a smooth solution. In addition, since high accuracy is desired, the ε_i used is chosen equal to 0.1 σ_i.
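A short Python sketch (assumed for illustration, with hypothetical function names, NumPy assumed) of the tensor-product kernel of Eq. (18) and its cross-kernel of Eq. (19), which is the mixed partial derivative of Eq. (18) with respect to x_1 and x_2.

import numpy as np

def kernel_2d(x_sv, x, gamma):
    # Tensor product of two one-dimensional sigmoid kernels, Eq. (18).
    g1, g2 = gamma
    return (1.0 / (1.0 + np.exp(g1 * (x_sv[0] - x[0])))
            * 1.0 / (1.0 + np.exp(g2 * (x_sv[1] - x[1]))))

def cross_kernel_2d(x_sv, x, gamma):
    # Cross-kernel of Eq. (19), used to evaluate the estimated joint density.
    g1, g2 = gamma
    return (g1 / (2.0 + np.exp(g1 * (x_sv[0] - x[0])) + np.exp(-g1 * (x_sv[0] - x[0])))
            * g2 / (2.0 + np.exp(g2 * (x_sv[1] - x[1])) + np.exp(-g2 * (x_sv[1] - x[1]))))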

4 Results

The described method is tested on two thorax images from a dual CT-PET scanner, obtained from the OsiriX open source site‡. The CT image consists of 41 slices with a resolution of 512 × 512, and the PET image has the same number of slices as the CT image and a resolution of 128 × 128.

‡ http://homepage.mac.com/rossetantoine/osirix/Index2.html

Fig. 2. Cross slices of the CT (a) and PET (b) images used.

Two cross slices of both images are shown in Figure 2. After computing the empirical distribution, the SV joint density estimation is carried out, obtaining 249 support vectors, which corresponds to 16.45% of the whole data set. Therefore, the SV density estimation results in a data compression, which allows working with fewer points than the 256 × 256 histogram bins. We compare our method with the classical and simple estimation procedure based on the histogram.

Figures 3 and 4 show that the obtained result resembles the histogram shape, but it is smoother. An important advantage of our approach is that, since the obtained density is a sum of Gaussian-like functions, analytic expressions can be derived for stochastic approximations of the mutual information and its gradient. Therefore, the mutual information derivatives can be easily obtained in order to optimize it by means of a gradient-based or similar procedure.

Fig. 3. CT-PET joint histogram (a) and SV joint probability density estimation (b).

Fig. 4. 2-D projections of the joint histogram (a) and the support vector density estimation (b).

5 Conclusions

This paper describes an SV method for estimating the joint density of medical images. The method performs a regression of the empirical distribution to find the support vectors and, after that, the linear operator that relates the distribution to the density is applied to the kernels. In addition, the empirical distribution is obtained using a novel interpolation method that takes into account the amount of overlap volume between voxels of both images. In this approach, a sigmoidal kernel is needed to achieve a correct distribution regression, which implies obtaining a density as a sum of Gaussian-like functions. Therefore, since the obtained result is a kernel mixture, analytic expressions of the mutual information gradient can be derived to perform gradient-based maximization approaches, in order to implement an efficient MMI-based image registration. The resulting density is smooth and sparse, in contrast to estimations based on histograms or other methods such as Parzen windows.

This work will stimulate future work in several directions. Examples are the inclusion of the SV estimation method in an MMI registration procedure, and a comparative study of the developed volume overlap interpolation approach against the most common interpolation schemes used in the literature.

References

1. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)

2. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)

3. Collignon, A., Maes, F., Delaere, D., Vandermeulen, D., Suetens, P., Marchal, G.: Automated Multi-modality Image Registration Based on Information Theory. In Bizais, Y., Barillot, C., Di Paola, R., eds.: Proceedings XIVth International Conference on Information Processing in Medical Imaging – IPMI'95. Volume 3 of Computational Imaging and Vision, Ile de Berder, France, Kluwer Academic Publishers (June 1995) 263–274

4. Viola, P., Wells, W.M.: Alignment by Maximization of Mutual Information. In: Proceedings of the 5th International Conference on Computer Vision, Cambridge, MA (1995) 16–23

5. Weston, J.A.E.: Extensions to the Support Vector Method. PhD thesis, University of London (1999)

6. Vapnik, V., Golowich, S.E., Smola, A.: Support Vector Method for Function Approximation, Regression Estimation and Signal Processing. In Mozer, M.C., Jordan, M.I., Petsche, T., eds.: Advances in Neural Information Processing Systems. Volume 9, Cambridge, MA, MIT Press (1997) 281–287

7. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)

8. Maintz, J.B.A., Viergever, M.A.: A Survey of Medical Image Registration. Medical Image Analysis 2(1) (1998) 1–36

9. Maes, F., Vandermeulen, D., Suetens, P.: Medical Image Registration Using Mutual Information. In: Proceedings of the IEEE – Special Issue on Emerging Medical Imaging Technology. Volume 91 (2003) 1699–1722 (invited paper)