
Estimation of Markov Random Field prior parameters using Markov chain Monte Carlo Maximum Likelihood

Xavier Descombes, Robin Morris, Josiane Zerubia, Marc Berthod

Thème 3: human-machine interaction, images, data, knowledge
Projet PASTIS
Rapport de recherche no. 3015, October 1996, 28 pages

Abstract: Recent developments in statistics now allow maximum likelihood estimators for the parameters of Markov Random Fields to be constructed. We detail the theory required, and present an algorithm which is easily implemented and practical in terms of computation time. We demonstrate this algorithm on three MRF models: the standard Potts model, an inhomogeneous variation of the Potts model, and a long-range interaction model better adapted to modeling real-world images. We estimate the parameters from a synthetic and a real image, and then resynthesise the models to demonstrate which features of the image have been captured by the model. Segmentations are computed based on the estimated parameters and conclusions drawn.

Key-words: Estimation, Maximum Likelihood, Potts model, Chien-model, image segmentation, image restoration

* R.D. Morris was supported by a grant from the Commission of the European Communities under the HCM program.
** email: [email protected]

Unité de recherche INRIA Sophia-Antipolis

2004 route des Lucioles, BP 93, 06902 Sophia-Antipolis Cedex (France)
Telephone: (33) 93 65 77 77; Fax: (33) 93 65 77 65

Maximum likelihood estimation of the parameters of a Markovian model by a Monte Carlo method

Résumé: Recent developments in statistics now make it possible to estimate, in the maximum likelihood sense, the parameters associated with Markov random fields. In this report, we detail the required theory and present an algorithm which is easy to implement and affordable in terms of CPU time. We apply this algorithm to three Markovian models: the Potts model, an inhomogeneous variant of this model, and the chien-model (a model involving interactions between more than two pixels, better suited to modeling real images). We estimate the parameters from synthetic and real images, then resynthesise the models to study which characteristics of the image are captured by the model. We also compute segmentations using the estimated parameters, and draw conclusions.

Key-words: Estimation, Maximum Likelihood, Potts model, chien-model, image segmentation, image restoration

Contents

1 Introduction
2 Maximum Likelihood estimators
  2.1 The log-likelihood
  2.2 Importance sampling
3 An MCMCML algorithm
  3.1 Estimating a robustness criterion
  3.2 Estimation algorithm
4 Validation on Markovian priors
  4.1 The Potts model
  4.2 An inhomogeneous variation of the Potts model
  4.3 The chien-model
5 Pros and cons of the priors
  5.1 Image constraints modeled by priors
  5.2 Bayesian inference using Markovian priors
  5.3 Sampling considerations
6 Conclusion


1 Introduction

Early vision algorithms extract some information from observed data without any specific knowledge about the scene. However, these data (remote sensing data, medical images, ...) are usually disturbed by noise. To improve the algorithms, regularization techniques are used, incorporating constraints on the solution. These constraints represent general knowledge about what a natural scene should be. A popular way to define these constraints is to consider a probabilistic model (the prior) of the expected result. Using the Bayesian approach, we search for a realization which optimizes the probability of the solution, given the data. A key point in obtaining unsupervised algorithms in this paradigm is to be able to estimate the different parameters involved in the prior. Accurate estimators of these parameters are necessary to control the impact of the prior on the properties desired for the solution.

Because of their ability to model global properties using local constraints, Markov Random Fields (MRFs) are very popular priors. Several optimization algorithms converging either toward a global minimum of the energy [1] or a local one [2], [3] are now well defined. But accurate estimation of the parameters is still an open issue. Indeed, the partition function (normalization constant) leads to intractable computation. Parameter estimation methods are therefore either devoted to very specific models [4], [5] or based on approximations such as Maximum Pseudo-Likelihood (MPL) [6], [7]. Unfortunately, these approximations lead to inaccurate estimators for the prior parameters. Maximum Likelihood Estimators (MLEs) have more interesting properties. Gidas proves in [8] that MLEs are consistent. Asymptotic normality can be reached in some special cases, such as the 2-dimensional Ising model. For high-dependency models, the MPL gives poor results, as reported in [9]. The aim of MRFs in image processing is to obtain regularized solutions, and high dependencies are required to get homogeneous realizations. Thus, MLEs should improve image segmentation and image restoration algorithms based on Markovian priors.

Markov Chain Monte Carlo (MCMC) algorithms [10] are very popular in image processing for deriving optimization methods when using a Markovian prior [2], [1]. In fact, MCMC algorithms can be developed for purposes other than Bayesian inference: they can be used to derive MLEs, since the partition function of a Gibbs field can be estimated using an MCMC procedure.

A Maximum Likelihood estimation using an MCMC algorithm is proposed in [11]. Geyer proves the convergence in probability of the MCMC estimator toward the MLE. This method can be applied to a wide range of models, such as point processes [12] or Markov Random Fields. In this paper, we propose an estimation algorithm for Markovian prior parameters based on an MCMCML procedure. We validate this algorithm on three different priors.

In section 2, we compute the Maximum Likelihood estimators of a given Gibbs field whose energy is linear with respect to the parameters. Importance sampling is also introduced; it allows us to compute statistics of the model associated with given parameters using samples obtained with other parameter values. These results lead to an MCMCML estimation method described in section 3. Results are detailed in section 4, where we consider three different Markovian priors: the Potts model, an inhomogeneous variation of the Potts model and the chien-model. This Maximum Likelihood estimation allows us to derive some comments about the priors. Section 5 is devoted to a comparison between the priors considered. Finally, we conclude in section 6.

2 Maximum Likelihood estimators

2.1 The log-likelihood

Let $P_\theta$ be a random field defined on $S$ and parameterized by the vector $\theta = (\theta_i)$. We consider $P_\theta$ to be a Gibbs field whose energy is linear with respect to the parameters $\theta_i$. We then have:
$$P_\theta(Y) = \frac{1}{Z(\theta)} \exp\left(-\langle \theta, Y \rangle\right) = \frac{1}{Z(\theta)} \exp\left[-\sum_i \theta_i N_i(Y)\right], \qquad (1)$$
where the $N_i(Y)$ are functions of the configuration $Y$. In this paper, we consider a continuous framework for the state space $\Lambda$; the results remain valid in the discrete case by changing integrals into sums. The partition function $Z(\theta)$ is then written:
$$Z(\theta) = \int_{\Lambda^S} \exp\left[-\sum_i \theta_i N_i(X)\right] dX, \qquad (2)$$
where $S$ is the site set and $\Lambda$ is the state space.

We consider that we have data $Y$ and want to fit the model to the data. The log-likelihood is then defined by:
$$\log P(Y|\theta) = \log\left[\frac{1}{Z(\theta)} \exp\left(-\sum_i \theta_i N_i(Y)\right)\right], \qquad (3)$$
$$\log P(Y|\theta) = -\sum_i \theta_i N_i(Y) - \log Z(\theta). \qquad (4)$$
The maximum likelihood estimators are obtained by maximizing the log-likelihood. We then have:
$$\forall i, \quad \left.\frac{\partial \log P(Y|\theta)}{\partial \theta_i}\right|_{\hat\theta} = 0, \qquad (5)$$
and then:
$$\forall i, \quad -N_i(Y) + \frac{\int_{\Lambda^S} N_i(X) \exp\left[-\sum_j \hat\theta_j N_j(X)\right] dX}{\int_{\Lambda^S} \exp\left[-\sum_j \hat\theta_j N_j(X)\right] dX} = 0, \qquad (6)$$
where $\hat\theta_i$ is the maximum likelihood estimator of $\theta_i$. Denoting by $\langle a(X)\rangle_\theta$ the expectation of $a(X)$ with respect to $P_\theta$, we finally get:
$$\forall i, \quad \langle N_i(X)\rangle_{\hat\theta} = N_i(Y). \qquad (7)$$
To evaluate the log-likelihood function, we have to compute the partition function, and the partial derivatives of the log-likelihood require the computation of the different $\langle N_i(X)\rangle_\theta$. Unfortunately, the computation of these quantities is intractable. We can estimate the $\langle N_i(X)\rangle_\theta$ by sampling the distribution; nevertheless, sampling the distribution for each value of $\theta$ is inconceivable from CPU time considerations.

2.2 Importance sampling

We introduce importance sampling to avoid having to sample the model for each value of $\theta$. Indeed, importance sampling allows us to estimate statistical moments corresponding to $P_\theta$ using samples obtained from $P_\psi$. Consider first the partition function. We have:
$$Z(\theta) = \int \exp\left[-\sum_i \theta_i N_i(X)\right] dX, \qquad (8)$$

then:
$$Z(\theta) = \int \exp\left[-\sum_i (\theta_i - \psi_i) N_i(X)\right] \exp\left[-\sum_i \psi_i N_i(X)\right] dX = \int \exp\left[-\sum_i (\theta_i - \psi_i) N_i(X)\right] Z(\psi)\, dP_\psi(X). \qquad (9)$$
For each couple $(\theta, \psi)$, the ratio of the partition functions is given by:
$$\frac{Z(\theta)}{Z(\psi)} = E_\psi\left(\exp\left[-\sum_i (\theta_i - \psi_i) N_i(X)\right]\right), \qquad (10)$$
where $E_\psi$ refers to the expectation with respect to the law $P_\psi$.

The partition function corresponding to $P_\theta$ can thus be estimated from a sampling of $P_\psi$: we just have to sample the law with parameter $\psi$ to get an estimator of the ratio $Z(\theta)/Z(\psi)$ for all $\theta$, by computing from the samples the expectation given by formula (10).

Consider now the log-likelihood. The maximum likelihood estimator is given by the vector $\theta$ which maximizes formula (3). This is equivalent to minimizing the following expression (up to the additive constant $\log Z(\psi)$):
$$-\log P_\theta(Y) = \sum_i \theta_i N_i(Y) + \log\frac{Z(\theta)}{Z(\psi)}. \qquad (11)$$
The partial derivative of the partition function can be written:
$$\frac{\partial Z(\theta)}{\partial \theta_i} = \int -N_i(X) \exp\left[-\sum_j (\theta_j - \psi_j) N_j(X)\right] Z(\psi)\, dP_\psi(X) = -Z(\psi)\, E_\psi\left(N_i(X) \exp\left[-\sum_j (\theta_j - \psi_j) N_j(X)\right]\right). \qquad (12)$$
Then, we have:
$$\frac{\partial \left(-\log P_\theta(Y)\right)}{\partial \theta_i} = N_i(Y) - \frac{E_\psi\left(N_i(X) \exp\left[-\sum_j (\theta_j - \psi_j) N_j(X)\right]\right)}{E_\psi\left(\exp\left[-\sum_j (\theta_j - \psi_j) N_j(X)\right]\right)}. \qquad (13)$$
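As a concrete illustration of how equations (10), (11) and (13) are used in practice, the following sketch (not from the report; NumPy assumed, names hypothetical) evaluates the negative log-likelihood, up to the constant $\log Z(\psi)$, and its gradient from statistics $N_i(X_k)$ precomputed on samples $X_k$ drawn from $P_\psi$:

```python
import numpy as np

def mcmcml_estimates(theta, psi, N_samples, N_data):
    """Importance-sampling estimates of eqs. (10), (11) and (13).

    theta, psi : parameter vectors, shape (p,)
    N_samples  : statistics N_i(X_k) of samples X_k ~ P_psi, shape (K, p)
    N_data     : statistics N_i(Y) of the observed image, shape (p,)
    Returns an estimate of -log P_theta(Y) (up to log Z(psi)) and its gradient.
    """
    # omega_k = sum_i (theta_i - psi_i) N_i(X_k)
    omega = N_samples @ (theta - psi)
    # importance weights exp(-omega_k), shifted by omega.min() for numerical stability
    w = np.exp(-(omega - omega.min()))
    # eq. (10): log of the Monte Carlo estimate of Z(theta)/Z(psi)
    log_ratio = np.log(w.mean()) - omega.min()
    # eq. (13): gradient of -log P_theta(Y)
    grad = N_data - (w[:, None] * N_samples).sum(axis=0) / w.sum()
    # eq. (11): negative log-likelihood up to the additive constant log Z(psi)
    neg_loglik = float(theta @ N_data) + log_ratio
    return neg_loglik, grad
```

The Hessian of equation (14) below can be estimated from the same weights by also accumulating the weighted products $N_i N_j$.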

From a sampling of $P_\psi$ we can thus theoretically estimate the log-likelihood of $P_\theta$ and its partial derivatives for all $\theta$. The same kind of computation allows us to compute the Hessian, and we have:
$$\frac{\partial^2\left(-\log P_\theta(Y)\right)}{\partial\theta_i\,\partial\theta_j} = \frac{E_\psi\left(N_i(X)N_j(X)\,e^{-\sum_k(\theta_k-\psi_k)N_k(X)}\right)}{E_\psi\left(e^{-\sum_k(\theta_k-\psi_k)N_k(X)}\right)} - \frac{E_\psi\left(N_i(X)\,e^{-\sum_k(\theta_k-\psi_k)N_k(X)}\right)E_\psi\left(N_j(X)\,e^{-\sum_k(\theta_k-\psi_k)N_k(X)}\right)}{\left[E_\psi\left(e^{-\sum_k(\theta_k-\psi_k)N_k(X)}\right)\right]^2}. \qquad (14)$$

3 An MCMCML algorithm

3.1 Estimating a robustness criterion

Consider an image $Y$ from which we wish to compute the maximum likelihood estimator of $\theta$ using $P_\theta$ as a model. From the image we can extract the values of the $N_i(Y)$. Then, we can sample the law $P_\psi$ for a given $\psi$. From the samples, the different expectations involved in formulas (10) and (13) can be estimated. Then, for all $\theta$, we can estimate the log-likelihood and its derivatives. An optimization algorithm (gradient descent or conjugate gradient, for example) then leads to the maximum likelihood estimator of $\theta$, when the log-likelihood is a convex function.

Nevertheless, if the two parameters $\theta$ and $\psi$ are too far from each other, the estimation of the expectations will be inaccurate. Indeed, the robustness of the estimation of the expectations requires the overlap between the two distributions $P_\theta$ and $P_\psi$ to be large enough. The proposed method is practically valid only in a neighborhood of the parameter $\psi$. During the optimization, when the current value of $\theta$ is too far from $\psi$, we have to re-sample the model using a new value for $\psi$ (we take the current value of $\theta$ for $\psi$). Such a sampling requires CPU time, so we need a criterion to define the neighborhood of $\psi$ on which the estimation is robust, in order to avoid unnecessary sampling. A first idea is to use a metric between the distributions $P_\psi$ and $P_\theta$ given by:
$$d(P_\theta, P_\psi) = \frac{1}{2} \int \left|P_\theta(X) - P_\psi(X)\right| dX. \qquad (15)$$
By definition, we have:
$$P_\theta(X) - P_\psi(X) = \frac{\exp\left(-\langle X, \theta\rangle\right)}{Z(\theta)} - \frac{\exp\left(-\langle X, \psi\rangle\right)}{Z(\psi)}, \qquad (16)$$

$$P_\theta(X) - P_\psi(X) = \left(\frac{Z(\psi)}{Z(\theta)} \exp\left[-\langle X, \theta\rangle + \langle X, \psi\rangle\right] - 1\right) \frac{\exp\left(-\langle X, \psi\rangle\right)}{Z(\psi)}. \qquad (17)$$
Thus, the distance between the distributions can be written:
$$\int \left|P_\theta(X) - P_\psi(X)\right| dX = \int \left|\frac{Z(\psi)}{Z(\theta)} \exp\left[-\langle X, \theta\rangle + \langle X, \psi\rangle\right] - 1\right| dP_\psi(X) = E_\psi\left(\left|\frac{Z(\psi)}{Z(\theta)} \exp\left[-\langle X, \theta\rangle + \langle X, \psi\rangle\right] - 1\right|\right). \qquad (18)$$
Using formula (10) to estimate the ratio $Z(\psi)/Z(\theta)$, we then have:
$$d(P_\theta, P_\psi) = \frac{1}{2}\, E_\psi\left(\left|\frac{Z(\psi)}{Z(\theta)} \exp\left[-\langle X, \theta\rangle + \langle X, \psi\rangle\right] - 1\right|\right). \qquad (19)$$
We can compute this distance to test the robustness of the estimates and decide whether or not we should sample the model once more. However, this distance is itself estimated and can be biased. Therefore, we also define a heuristic criterion based on the current samples used for estimating the expectations. For each sample $X_i$, we define a weight by:
$$\omega_i = U_\theta(X_i) - U_\psi(X_i), \qquad (20)$$
$$\omega_i = \sum_j (\theta_j - \psi_j) N_j(X_i). \qquad (21)$$
When computing the expectations, we use the following standard trick to avoid overflow and numerical instabilities:
$$\sum_{\text{samples } i} \exp\left[-\omega_i\right] = \exp\left[-\omega_{\min}\right] \sum_{\text{samples } i} \exp\left[-(\omega_i - \omega_{\min})\right], \qquad (22)$$
where $\omega_{\min} = \min_{\text{samples } i} \omega_i$, so that every term in the sum lies in $(0, 1]$.

Consider the different samples. If most of the $\omega_i$ are far from $\omega_{\min}$, the corresponding quantities $\exp\left[-(\omega_i - \omega_{\min})\right]$ will be close to 0: the sums are then dominated by a few samples and the estimation is poor. So, we can consider the estimation robust only if $\omega_{\max} - \omega_{\min}$ is lower than a given threshold.
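For illustration, a minimal sketch (assumptions as before: statistics of samples from $P_\psi$ stored row-wise in a NumPy array, names hypothetical) of the two robustness checks just described, the Monte Carlo estimate of the distance (19) and the heuristic spread criterion on the weights $\omega_i$:

```python
import numpy as np

def estimated_distance(theta, psi, N_samples):
    """Monte Carlo estimate of d(P_theta, P_psi), eq. (19)."""
    omega = N_samples @ (theta - psi)          # eq. (21)
    w = np.exp(-(omega - omega.min()))         # shifted weights, all in (0, 1]
    # w / w.mean() estimates (Z(psi)/Z(theta)) * exp(-omega); the shift cancels
    return 0.5 * float(np.abs(w / w.mean() - 1.0).mean())

def is_robust(theta, psi, N_samples, spread_threshold=10.0):
    """Heuristic criterion of section 3.1: trust the estimates only while the
    spread omega_max - omega_min of the weights stays below a threshold."""
    omega = N_samples @ (theta - psi)
    return float(omega.max() - omega.min()) < spread_threshold
```

When either check fails, the model is re-sampled with $\psi$ set to the current value of $\theta$; the threshold values are tuning parameters.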

3.2 Estimation algorithm

We can now derive an algorithm based on the conjugate gradient principle. Consider the current parameter estimate $\theta$ and a sampling of $P_\theta$. We can estimate the gradient and the Hessian of the log-likelihood function at $\theta$. We then compute the conjugate directions [13]. Along each conjugate direction we define an interval, using the distance defined by equation (19), on which the log-likelihood estimation is robust. We then maximize the log-likelihood along these intervals. The algorithm can be written as follows (a simplified sketch in code is given after the listing):

1. Compute the $N_i(Y)$.
2. Initialize $\theta^0$; set $n = 0$.
3. Sample the distribution $P_{\theta^n}$.
4. Estimate the gradient and the Hessian of the log-likelihood at $\theta^n$, using equations (13) and (14).
5. Compute the conjugate directions $\delta_i$.
6. For each conjugate direction, define a search interval $I_i = \left[\theta^n - \lambda_i^- \delta_i,\ \theta^n + \lambda_i^+ \delta_i\right]$, using either the distance defined in equation (19), with
$$\lambda_i^- = \sup_{\lambda \in \mathbb{R}^+} \left\{\lambda : d\left(P_{\theta^n}, P_{\theta^n - \lambda\delta_i}\right) < T\right\}, \qquad \lambda_i^+ = \sup_{\lambda \in \mathbb{R}^+} \left\{\lambda : d\left(P_{\theta^n}, P_{\theta^n + \lambda\delta_i}\right) < T\right\},$$
where $T$ is a threshold, or the proposed heuristic criterion.
7. Compute $\theta^{n+1}$ by maximizing the log-likelihood along each search interval using golden section search [13].
8. If $\|\theta^{n+1} - \theta^n\| > T_2$, where $T_2$ is another threshold, set $n = n + 1$ and go back to 3.
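The following simplified sketch shows the overall re-sampling structure of the algorithm. It is not the report's implementation: plain gradient descent replaces the conjugate directions and golden-section searches of steps 5-7, it reuses the `mcmcml_estimates` and `is_robust` helpers sketched above, and `sample_model` is a hypothetical MCMC sampler returning the statistics $N_i(X_k)$ of samples drawn at the given parameter.

```python
import numpy as np

def mcmcml(N_data, theta0, sample_model, step=1e-3, tol=1e-4, max_outer=50):
    """Simplified MCMCML loop following section 3.2."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_outer):
        psi = theta.copy()
        N_samples = sample_model(psi)              # step 3: MCMC sampling at psi
        while True:
            _, grad = mcmcml_estimates(theta, psi, N_samples, N_data)  # eqs. (11), (13)
            theta_new = theta - step * grad        # descend -log P_theta(Y)
            # stop moving when the importance-sampling estimates are no longer
            # trusted (a full implementation would use the search intervals and
            # golden-section searches of steps 6-7, or shrink the step instead)
            if not is_robust(theta_new, psi, N_samples) or np.linalg.norm(step * grad) < tol:
                break
            theta = theta_new
        if np.linalg.norm(theta - psi) < tol:      # step 8: theta hardly moved, stop
            return theta
    return theta
```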

4 Validation on Markovian priors

In this section, we consider different Markov models used as priors in image processing. We validate the estimation method on these models and demonstrate the generality of its applicability.

4.1 The Potts model

The Potts model is commonly used as a prior in image segmentation. It depends on a single parameter $\beta$ and is defined by:
$$P_\beta(X) = \frac{1}{Z(\beta)} \exp\left[-\beta \sum_{c=\{s,s'\}\in C} \delta_{x_s \neq x_{s'}}\right], \qquad (23)$$
where $C$ is the set of cliques. In this case, a clique consists of two neighboring pixels. We consider the case where the lattice $S$ is a subset of $\mathbb{Z}^2$; for simulations and estimations we have considered the 4 nearest neighbors. The Potts model can be embedded in the general form of equation (1):
$$P_\beta(X) = \frac{1}{Z(\beta)} \exp\left[-\beta N_0(X)\right], \qquad (24)$$
where $N_0(X)$ is the number of inhomogeneous cliques in the configuration $X$. The model depends on one single parameter, so the proposed algorithm is simplified as we do not have to compute the conjugate directions. Table 1 shows estimates obtained using the MCMCML method for different values of $\beta$.

4.2 An inhomogeneous variation of the Potts model

We can extend the procedure to the case of a non-stationary Potts model. Consider a Potts model for which the parameter $\beta$ depends on the localization of the clique. We suppose for simplicity that this dependency is linear and that $\beta$ is written, for a clique $c = \{x_{i,j}, x_{p,q}\}$:
$$\beta_c = a\left(\frac{i+p}{2}\right) + b\left(\frac{j+q}{2}\right) + c. \qquad (25)$$

12 X. Descombes, R. Morris, J. Zerubia, M. BerthodParameters � = 0:53 (N0 = 48070) � = 0:5493 (N0 = 35699)Estimates � = 0:529 (hN0i = 48072) � = 0:5488 (hN0i = 35699)Sample

To estimate $\beta$ we have to estimate $a$, $b$ and $c$. The associated distribution is written:
$$P_{a,b,c}(X) = \frac{1}{Z(a,b,c)} \exp\left[-\sum_{c=\{s,s'\}\in C} \left(a\left(\frac{i+p}{2}\right) + b\left(\frac{j+q}{2}\right) + c\right) \delta_{x_s \neq x_{s'}}\right], \qquad (26)$$
where $s = (i,j)$ and $s' = (p,q)$. This model can be written in the form of equation (1):
$$P_{a,b,c}(X) = \frac{1}{Z(a,b,c)} \exp\left[-c N_0(X) - b N_1(X) - a N_2(X)\right], \qquad (27)$$
where:
$$N_0(X) = \text{the number of inhomogeneous cliques},$$
$$N_1(X) = \sum_{\text{inhomogeneous cliques}} \frac{j+q}{2} = N_0(X)\, \overline{\left(\frac{j+q}{2}\right)}_{\text{inh.cl.}},$$
$$N_2(X) = \sum_{\text{inhomogeneous cliques}} \frac{i+p}{2} = N_0(X)\, \overline{\left(\frac{i+p}{2}\right)}_{\text{inh.cl.}}.$$
Table 2 shows samples from this model and the parameters estimated from these samples.


             Experiment 1                     Experiment 2
Parameters   c 0.3     N0 26603               c 0.5     N0 29171
             b 0.005   N1 1703201             b 0.005   N1 1816390
             a 0.002   N2 2777495             a 0.0     N2 3704620
Estimates    c 0.3002  <N0> 27960             c 0.535   <N0> 30495
             b 0.0052  <N1> 1771015           b 0.0052  <N1> 1859293
             a 0.0025  <N2> 2757615           a 0.0003  <N2> 3760871
Sample       [image]                          [image]

Table 2: Estimation of an inhomogeneous variation of the Potts model.
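For concreteness, a small sketch (illustrative only, NumPy assumed) of how the sufficient statistics of equations (24) and (27), used in Tables 1 and 2, can be computed on a label image with 4-nearest-neighbour cliques:

```python
import numpy as np

def potts_statistics(labels):
    """Sufficient statistics of the Potts model and of its inhomogeneous
    variation, eqs. (24) and (27): N0 = number of inhomogeneous cliques,
    N1 = sum of (j+q)/2 and N2 = sum of (i+p)/2 over those cliques.

    labels : 2-D integer array of class labels x_{i,j} (i = row, j = column)."""
    x = np.asarray(labels)
    rows, cols = np.indices(x.shape)
    v = x[:-1, :] != x[1:, :]      # vertical cliques {(i, j), (i+1, j)}
    h = x[:, :-1] != x[:, 1:]      # horizontal cliques {(i, j), (i, j+1)}
    N0 = v.sum() + h.sum()
    N1 = cols[:-1, :][v].sum() + ((cols[:, :-1] + cols[:, 1:]) / 2.0)[h].sum()
    N2 = ((rows[:-1, :] + rows[1:, :]) / 2.0)[v].sum() + rows[:, :-1][h].sum()
    return float(N0), float(N1), float(N2)
```

For the homogeneous Potts model only $N_0$ is needed; applied to MCMC samples, these statistics feed directly into the estimators of section 3.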


4.3 The chien-model

To improve segmentations, some more complex models have been proposed in the last few years. These models consider cliques of more than two pixels, to define more accurately the local configurations and their contribution to the model. Such a model based on $3\times 3$ cliques was proposed in [14]. Another model, on a hexagonal lattice, can be found in [15]. These models consider only the clique configurations. In [16], a binary model (the chien-model) taking into account links between neighboring cliques is proposed. This model has been generalized to the m-ary case in [17]. This model, although regularizing, preserves fine structures and linear shapes in images. In this model, the set of cliques is composed of $3\times 3$ squares. The chien-model is defined from the discrimination between noise, lines and edges; three parameters ($n$, $l$ and $e$) are associated with these patterns.

Before constructing the model, the different configurations induced by a $3\times 3$ square are classified using the symmetries (black-white symmetry, rotations, etc.). This classification and the number of elements in each class are described in Figure 1.

[Figure 1: The different classes induced by a binary 3 × 3 model and their number of elements.]

A parameter is associated with each class and refers to the value of the potential function for the considered configuration.

So, under the hypothesis of isotropy of the model, which induces the symmetries, we have for such a topology (cliques of $3\times 3$) fifty-one degrees of freedom. The construction of the model consists in imposing constraints by relations between its parameters. Two energy functions which differ only by a constant are equivalent, so we suppose that the minimum of the energy is equal to 0. The global realizations of 0 energy are called the ground states of the model and represent the realizations of maximal probability. We suppose that uniform realizations are ground states, so we have the first equation for the parameters, given by $C(1) = 0$. We then define the different constraints with respect to those two uniform realizations. The first class of constraints concerns the energy of edges, which is noted $e$ per unit of length. Due to symmetries and rotations, we just have to define three orientations of edges, corresponding to the eight ones induced by the size of the cliques. These constraints and the derived equations are represented in Figure 2.

[Figure 2: Equations associated with the edge constraints.]

These constraints are defined for features of width at least equal to 3. For other features, we have to define other constraints. When the width of the considered object is one pixel, it is referred to as a line and has energy $l$ per unit length. For larger features (double and triple lines), we consider that the energy per unit length is given by $2e$, which corresponds to a left and a right edge, and we get more equations. All these constraints induce eleven equations which depend on fourteen parameters, leading to the following solution (see [16] for full details):
$$C(3) = C(5) = \frac{e}{4}, \quad C(26) = l - e, \quad C(14) = \frac{\sqrt{2}\,e}{4}, \quad C(16) = C(23) = \frac{\sqrt{5}}{6}\,l - \frac{e}{4}, \quad C(11) = C(28) = \frac{\sqrt{2}\,l}{3} - \frac{e}{6},$$
$$C(35) = \frac{\sqrt{2}\,e}{2}, \quad C(29) = \frac{\sqrt{5}\,e}{6}, \quad C(13) = C(9) = C(19) = \frac{e}{2}.$$
Noise is defined by assigning a positive value $n$ to every other configuration.

To extend the binary chien-model to an m-ary model, we define the energy of a given configuration as the sum of several energies given by the binary model. Consider a configuration and a given label $\lambda_0$. We set every pixel of the configuration which is in state $\lambda_0$ to 0 and the others to 1; we then have a binary configuration. The energy of the m-ary model is the sum of the energies obtained from these deduced binary configurations for the $m$ labels (see Figure 3).

[Figure 3: M-ary extension of the chien-model.]

The potential associated with each configuration is then a linear combination of the three parameters $e$, $l$ and $n$:
$$\forall i = 1, \ldots, 51, \quad C(i) = \alpha(i)\, e + \beta(i)\, l + \gamma(i)\, n. \qquad (28)$$
The resulting distribution is written:
$$P_{e,l,n}(X) = \frac{1}{Z(e,l,n)} \exp\left[-e N_0(X) - l N_1(X) - n N_2(X)\right], \qquad (29)$$
where:
$$N_0(X) = \sum_{i=1,\ldots,51} \alpha(i)\, \#_i(X), \qquad N_1(X) = \sum_{i=1,\ldots,51} \beta(i)\, \#_i(X), \qquad N_2(X) = \sum_{i=1,\ldots,51} \gamma(i)\, \#_i(X),$$
$\#_i(X)$ being the number of configurations of type $i$ in the realization $X$.

Results are summarized in Table 3. The chosen parameter values show the properties of the model: we can control the total length of edges as well as the total length of lines, and the parameter $n$ allows us to control the amount of noise. The table also shows that we can accurately estimate the parameters of this model.



             Experiment 1                     Experiment 2
Parameters   e 0.2     N0 32950               e 0.4     N0 21089
             l 0.4     N1 11724               l 0.8     N1 1710
             n 0.6     N2 49708               n 1.0     N2 2880
Estimates    e 0.2008  <N0> 32857             e 0.3905  <N0> 21358
             l 0.4038  <N1> 11669             l 0.7843  <N1> 1585
             n 0.5997  <N2> 49606             n 0.9993  <N2> 2947
Sample       [image]                          [image]

Table 3: Estimation of chien-model parameters
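Equations (28) and (29) reduce the statistics of the chien-model to weighted histograms of the $3\times 3$ configuration classes. A minimal sketch follows; the coefficient arrays alpha, beta, gamma and the class histogram are placeholders, since computing them requires the class table of Figure 1, not reproduced here.

```python
import numpy as np

def chien_statistics(class_counts, alpha, beta, gamma):
    """Sufficient statistics of the m-ary chien-model, eq. (29).

    class_counts : #_i(X), number of 3x3 configurations of each class i in X
    alpha, beta, gamma : per-class coefficients of eq. (28),
                         C(i) = alpha(i) e + beta(i) l + gamma(i) n."""
    counts = np.asarray(class_counts, dtype=float)
    N0 = float(counts @ np.asarray(alpha, dtype=float))
    N1 = float(counts @ np.asarray(beta, dtype=float))
    N2 = float(counts @ np.asarray(gamma, dtype=float))
    return N0, N1, N2

def chien_energy(class_counts, alpha, beta, gamma, e, l, n):
    """Energy -log P_{e,l,n}(X), up to the constant log Z(e,l,n)."""
    N0, N1, N2 = chien_statistics(class_counts, alpha, beta, gamma)
    return e * N0 + l * N1 + n * N2
```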


5 Pros and cons of the priors

5.1 Image constraints modeled by priors

In this section, we provide tests to evaluate the properties that the different models can incorporate into, for example, a segmentation result. First, we consider a binary synthetic image (see Figure 4.a). A segmented SPOT image is the second proposed test (see Figure 4.b). We estimate the corresponding parameters for each model and then synthesize the model using the estimated parameters. In this way, we can observe the properties of the initial image which are captured by the different models.

We first consider the synthetic image shown in Figure 4.a. Results obtained for the Potts and chien models using an MCMCML estimation are summarized in Table 4. As we consider a general model for segmented images, the realizations of the models are visually far from the original image. Nevertheless, we can point out that in the case of the Potts model, the only image characteristic represented in the simulation is the number of inhomogeneous cliques. For a given number of inhomogeneous cliques, there are many more configurations composed of a uniform background with noise than configurations composed of several homogeneous shapes. Therefore, using estimated parameters, the Potts model does not seem to be a regularizing prior. As it considers cliques of $3\times 3$ pixels, the chien-model allows us to define edge and line lengths. The realization of the chien-model obtained with estimated parameters contains different shapes, and the global edge and line lengths are the same as in the original image. Therefore, it seems more appropriate as a prior for image modeling.

5.2 Bayesian inference using Markovian priors

We consider in this subsection a classical application to validate the previous assertions concerning the priors in this study. We first consider a noisy version of the binary synthetic image shown in Figure 4.a. The original image is corrupted by a channel noise of ratio 0.15 (15% of the original pixels are reversed) (see Figure 5). We perform a restoration in a Bayesian framework.

Denote the noisy image by $X = (x_s)_{s\in S}$ and the restored image by $Y = (y_s)_{s\in S}$, where $S$ is the lattice and $s = (i,j)$ is a pixel. The data are $X$ and we search for $Y$ which minimizes some cost function under the probability $P(Y|X)$.


Figure 4: Test images, 256 × 256. a: binary synthetic image. b: segmented SPOT image, 5 classes.

Figure 5: Test images, 256 × 256. a: channel-noise corrupted image (ratio 0.15). b: SPOT image.

Model   N0     Es. par.   <N0>    N1     Es. par.   <N1>
Potts   8706   0.4981     8734
Chien   3259   0.82       3261    1462   1.5        1464

Model   N2     Es. par.   <N2>
Potts
Chien   1101   1.6        1094

Model   Sample with estimated parameters   Comments
Potts   [image]   Inhomogeneous cliques are represented. The simulation is far from the original image: it represents noise in a homogeneous background.
Chien   [image]   Edge and line lengths are captured. The mean surface of the black areas is greater than in the original image, due to the double and triple lines in the original image (bone, birds, horizontal line, ...).

Table 4: Comparison of the Potts model and the chien-model as priors for the binary synthetic image


Model   N0      Es. par.   <N0>     N1     Es. par.   <N1>
Potts   18915   0.6043     18129
Chien   17856   0.4261     18585    1605   1.0203     1651

Model   N2      Es. par.   <N2>
Potts
Chien   11175   0.7353     11750

Model   Sample with estimated parameters   Comments
Potts   [image]   The Potts model results in a noisy uniform image. The amount of noise is defined by the number of inhomogeneous cliques in the SPOT image.
Chien   [image]   The resulting image is composed of homogeneous areas. The edge length is given by the SPOT image. The lines in the SPOT image are represented by little segments in the synthetic realization.

Table 5: Comparison of the Potts model and the chien-model as priors for a segmented SPOT image


Using Bayes' law, we have:
$$P(Y|X) = \frac{P(X|Y)\, P(Y)}{P(X)} \propto P(X|Y)\, P(Y). \qquad (30)$$
$P(Y)$ is defined by the prior, whereas $P(X|Y)$ represents the data attachment term. As we have considered an uncorrelated channel noise of ratio 0.15, we have:
$$\forall s \in S, \quad \begin{cases} p(x_s = 0 \mid y_s = 0) = 0.85 \\ p(x_s = 1 \mid y_s = 0) = 0.15 \\ p(x_s = 0 \mid y_s = 1) = 0.15 \\ p(x_s = 1 \mid y_s = 1) = 0.85 \end{cases} \qquad (31)$$
The global probability $P(X|Y)$ is written as follows:
$$P(X|Y) = \prod_{s\in S} p(x_s|y_s) = \exp \sum_{s\in S} \ln p(x_s|y_s). \qquad (32)$$
We consider a prior given by equation (1). We then have:
$$P(Y|X) \propto \frac{1}{Z(\theta)} \exp\left[-\sum_i \theta_i N_i(Y) + \sum_{s\in S} \ln p(x_s|y_s)\right]. \qquad (33)$$
As $P(Y)$ is a Markov Random Field, $N_i(Y)$ is written as a sum of local potentials, $\sum_{c\in C} V_c(y_s, s\in c)$. We use the Maximiser of the Posterior Marginal (MPM) estimate, found using a Gibbs sampler [18].

The results obtained are shown in Table 6. The restored image obtained with the Potts model is still noisy. This is not surprising when we consider the simulation shown in Table 4. However, we can find better restorations using a Potts model as a prior in the literature; in those cases the parameter $\beta$ is increased in order to over-regularize the solution. By using this trick, the noise is erased but details are lost (see Figure 6). In some cases, the parameter $\beta$ is estimated using a Maximum Pseudo-Likelihood criterion [19]; such an estimator tends to over-estimate the parameter [11]. The chien-model, however, successfully regularises the segmentation: the noise is removed and, for the most part, the structures in the image are retained. The model has truly captured the salient characteristics of the original image.

Model   Restored image   Comments
Potts   [image]   The result is still noisy. We cannot both have the right number of inhomogeneous cliques and obtain a regularized solution; to regularize the result, we have to increase the parameter β.
Chien   [image]   The chien-model is more adapted to restoration, as it controls edge and line lengths. The prior really models some of the image characteristics.

Table 6: Segmentation of a noisy binary image using MCMCML estimators for the prior parameters

Figure 6: Restoration of Figure 5 with an over-regularizing Potts model (β = 1.0)
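To make the restoration set-up concrete, here is a didactic sketch (not the report's implementation) of an MPM restoration of the channel-noise-corrupted binary image, combining the likelihood of equations (31)-(32) with a Potts prior and a single-site Gibbs sampler; the marginal posterior modes are approximated by counting labels over the post-burn-in sweeps.

```python
import numpy as np

def restore_mpm(x, beta, flip_prob=0.15, n_burn=100, n_samples=200, rng=None):
    """MPM restoration of a binary image corrupted by channel noise,
    using the Potts prior and the likelihood of eqs. (31)-(32).
    x : observed binary image (integer array of 0s and 1s)."""
    rng = np.random.default_rng() if rng is None else rng
    y = x.copy()                                   # initialise the estimate at the data
    counts = np.zeros(x.shape, dtype=float)        # per-pixel counts of label 1
    log_lik = np.log(np.array([[1.0 - flip_prob, flip_prob],     # rows: y_s = 0, 1
                               [flip_prob, 1.0 - flip_prob]]))   # cols: x_s = 0, 1
    H, W = x.shape
    for sweep in range(n_burn + n_samples):
        for i in range(H):
            for j in range(W):
                nb = [y[i - 1, j] if i > 0 else None, y[i + 1, j] if i < H - 1 else None,
                      y[i, j - 1] if j > 0 else None, y[i, j + 1] if j < W - 1 else None]
                nb = [v for v in nb if v is not None]
                # local conditional: exp(-beta * #disagreeing neighbours + ln p(x_s | y_s))
                log_p = np.array([-beta * sum(v != lab for v in nb) + log_lik[lab, x[i, j]]
                                  for lab in (0, 1)])
                p1 = 1.0 / (1.0 + np.exp(log_p[0] - log_p[1]))
                y[i, j] = int(rng.random() < p1)
        if sweep >= n_burn:
            counts += y                            # accumulate marginals after burn-in
    return (counts / n_samples > 0.5).astype(x.dtype)   # MPM: per-pixel majority label
```

In the experiment of Table 6, β would be the MCMCML estimate obtained for the binary synthetic image (0.4981 in Table 4).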

We now consider the original SPOT image (see Figure 5.b) and compare the behavior of the models being studied when segmenting this image. We suppose that each of the five classes can be represented by a normal law. The means and variances of these classes are estimated using the empirical estimators on the subsets defined in Figure 4.b. The conditional probability $P(Y|X)$ is then written as follows:
$$P(Y|X) \propto \frac{1}{Z(\theta)} \exp\left[-\sum_i \theta_i N_i(Y) + \sum_{s\in S} \sum_{\text{class } c} \left(-\frac{(x_s - \mu_c)^2}{2\sigma_c^2} - \ln\left(\sqrt{2\pi}\,\sigma_c\right)\right) \delta_{y_s = c}\right]. \qquad (34)$$
This distribution is sampled and an MPM estimation performed. The resulting segmentations using the Potts model and the chien-model are given in Table 7. Both segmentations are close to that shown in Figure 4: the data (Figure 5.b) are very clean and noise free, so the image is in effect easy to segment, and little regularisation is required. The errors in the segmentations occur at the edges of the regions. This partially explains the apparent success in the literature of segmentations performed with the Potts model as prior: in many cases the precise form of the regularisation is unimportant. As we have seen with the binary image corrupted by channel noise, this is not always true, and accurate prior modeling is important in these cases.

5.3 Sampling considerations

In the proposed MCMCML algorithm, more than 99% of the required CPU time is spent sampling the model. Computing the log-likelihood and its derivatives is very fast once we have the samples; indeed, once the samples are in place, performing the maximum likelihood estimation takes less than a minute on a Sun-20. This sampling is obtained using a Metropolis-Hastings algorithm. We first have to iterate the algorithm until we reach convergence, and then perform enough iterations to get accurate estimates of the different statistical moments involved in the MCMCML estimation. Among Metropolis-Hastings algorithms, the Gibbs sampler is the most used in image processing. However, when considering the Potts model, the Swendsen-Wang algorithm [20] is more efficient. The Swendsen-Wang algorithm considers clusters instead of pixels; the convergence rate is then faster than for a single-site updating algorithm. Moreover, as it moves freely within the distribution, we need fewer samples to obtain accurate estimates of the statistical moments than when using a Gibbs sampler.

Model   Segmented image   Comments
Potts   [image]   The segmentation using the Potts model consists mainly of homogeneous zones. The classification errors are along the region boundaries. The data are sufficiently good that little regularisation is needed.
Chien   [image]   The chien-model segmentation is visually very similar. The model is well matched to the data, but the data are sufficiently good that the exact form of the regularisation is unimportant.

Table 7: Segmentation of a SPOT image using MCMCML estimators for the prior parameters


Unfortunately, this algorithm cannot be applied to the chien-model: finding an auxiliary variable to define clusters in this case is still an open issue. From a CPU-time point of view, users may therefore prefer the Potts model.

Notice that, for given parameters, we only ever have to sample the model once. To compute the estimates, we need for each sample the values of the $N_i$; these values can be stored in a database. The parameter space can be discretized, with a discretization step that depends on the robustness of the importance sampling. Once we have sampled the model for each value of the discrete parameter space, the proposed algorithm requires a few seconds on a Sun-20: we can initialize the parameters with the values corresponding to the closest $\langle N_i\rangle$ in the database and derive the estimators without further sampling of the model.

6 Conclusion

In this paper, we have used recent developments in statistics to propose an algorithm performing maximum likelihood estimation of Markovian prior parameters. Using importance sampling, the proposed algorithm avoids excessive sampling, which would require huge CPU time. Moreover, a database can be computed which suppresses sampling altogether; we are currently working on such a database.

Using the maximum likelihood criterion leads to accurate estimators of the prior parameters. Therefore, we can compare the different priors and the regularizing properties they carry. In this paper, we have considered three Markovian priors: the Potts model, a non-stationary variation of this model, and the chien-model. The property handled by the first two models is essentially the number of homogeneous cliques. The chien-model seems more appropriate for image processing, as it controls image features (edge length, line length, noise) independently. On the other hand, this model requires more CPU time, as it considers higher-order interactions. The quality of the data can also influence, on a practical level, the choice of the a priori model.

References

[1] S. Geman, D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. on Pattern Analysis and Machine Intelligence, 6(6):721-741, 1984.

[2] J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, 36:192-236, 1974.

[3] J. Zerubia, R. Chellappa. Mean field approximation using Compound Gauss-Markov Random Fields for edge detection and image estimation. IEEE Trans. on Neural Networks, 8(4):703-709, July 1993.

[4] J.K. Goutsias. Mutually compatible Gibbs random fields. IEEE Trans. on Information Theory, 35:1233-1249, 1989.

[5] J.K. Goutsias. Unilateral approximation of Gibbs random field images. Computer Vision, Graphics and Image Processing: Graphical Models and Image Processing, 53:240-257, 1991.

[6] B. Chalmond. Image restoration using an estimated Markov model. Signal Processing, 15:115-129, 1988.

[7] A.J. Gray, J.W. Kay, D.M. Titterington. On the estimation of noisy binary Markov Random Fields. Pattern Recognition, 25:749-768, 1992.

[8] B. Gidas. Parameter estimation for Gibbs distributions from fully observed data. In R. Chellappa, A.K. Jain, editors, Markov Random Fields: Theory and Application, chapter 17, pages 471-498. Academic Press, 1993.

[9] C.J. Geyer. Markov Chain Monte Carlo Maximum Likelihood. School of Statistics, University of Minnesota, Minneapolis, MN 55455, 1993.

[10] W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97-109, 1970.

[11] C.J. Geyer, E.A. Thompson. Constrained Monte Carlo Maximum Likelihood for dependent data. Journal of the Royal Statistical Society, Series B, 54(3):657-699, 1992.

[12] Y. Ogata, M. Tanemura. Estimation for interaction potentials of spatial point patterns through the maximum likelihood procedure. Annals of the Institute of Statistical Mathematics, Series B, 33:315-338, 1981.

[13] W. Press, S. Teukolsky, W. Vetterling, B. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 2nd edition, 1992.

[14] G. Wolberg, T. Pavlidis. Restoration of binary images using stochastic relaxation with annealing. Pattern Recognition Letters, 3:375-388, 1985.

[15] H. Tjelmeland, J. Besag. Markov Random Fields with higher order interactions. Preprint, submitted to JASA.

[16] X. Descombes, J.F. Mangin, E. Pechersky, M. Sigelle. Fine structures preserving model for image processing. In Proc. 9th SCIA, Uppsala, Sweden, pages 349-356, 1995.

[17] X. Descombes. Application of stochastic techniques in image processing for automatic tissue classification in MRI and blood vessel restoration in MRA. Technical Report KUL/ESAT/MI2/9603, Laboratory for Medical Imaging Research (ESAT-Radiology), K.U.Leuven, Belgium, 1996.

[18] X. Descombes, R. Morris, J. Zerubia. Quelques améliorations à la segmentation d'images bayésienne. Première partie : modélisation. Submitted to Traitement du Signal, 1996.

[19] J. Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B, 48:259-302, 1986.

[20] R.H. Swendsen, J.S. Wang. Nonuniversal critical dynamics in Monte Carlo simulations. Physical Review Letters, 58(2):86-88, January 1987.
