ISSN 1751-9659 Probabilistic relevance feedback approach ...arly/papers/IET09.pdf · This is known in the CBIR community as the ‘semantic gap’ problem [13]. Relevance feedback

10

& T

www.ietdl.org

Published in IET Image ProcessingReceived on 21st January 2008Revised on 16th September 2008doi: 10.1049/iet-ipr:20080012

ISSN 1751-9659

Probabilistic relevance feedback approachfor content-based image retrieval basedon gaussian mixture modelsA. Marakakis1 N. Galatsanos2 A. Likas2 A. Stafylopatis1

1School of Electrical and Computer Engineering, National Technical University of Athens, 15773 Athens, Greece2Department of Computer Science, University of Ioannina, 45110 Ioannina, GreeceE-mail: [email protected]

Abstract: A new relevance feedback (RF) approach for content-based image retrieval is presented. This approachuses Gaussian mixture (GM) models of the image features and a query that is updated in a probabilistic manner.This update reflects the preferences of the user and is based on the models of both the positive and negativefeedback images. The retrieval is based on a recently proposed distance measure between probability densityfunctions, which can be computed in closed form for GM models. The proposed approach takes advantageof the form of this distance measure and updates it very efficiently based on the models of the user-specified relevant and irrelevant images. It is also shown that this RF framework is fairly general and can beapplied in case other image models or distance measures are used instead of those proposed in this work.Finally, comparative numerical experiments are provided, which that demonstrate the merits of the proposedRF methodology and the use of the distance measure, and also the advantages of using GMs for imagemodelling.

1 IntroductionThe target of content-based image retrieval (CBIR) [1] is toretrieve relevant images from an image database based onthe similarity of their visual content with one or morequery images. These query images are submitted bythe user as examples of his/her preferences. Then, theCBIR system ranks the database images and displaysthe retrieved results ordered with respect to their similaritywith the query images. Most CBIR systems [2–9] modeleach image using a combination of low-level featuresand, then, define a distance measure that is used toquantify the similarity between the image models. A lot ofeffort has been devoted to developing features andstrategies that capture the human perception of imagesimilarity in order to enable efficient indexing andretrieval [6, 10–12]. Nevertheless, low-level image featurescannot always capture the human perception of imagesimilarity. In other words, it is difficult using only low-levelimage features to describe the semantic content of an image.

he Institution of Engineering and Technology 2009

Authorized licensed use limited to: Illinois Institute of Technology. Downloaded on February 1

This is known in the CBIR community as the ‘semanticgap’ problem [13].

Relevance feedback (RF) has been proposed as amethodology to alleviate this problem [2–4, 7–9]. RFattempts to insert the subjective human perception ofimage similarity into a CBIR system. Thus, RF is aninteractive process that refines the distance between thequery and the database images through interaction with theuser and taking into account his/her preferences. Toaccomplish this, during a round of RF, the user is requiredto rate the relevance of the retrieved images according tohis/her preferences. Then, the retrieval system updates thematching criterion based on the user’s feedback [2–4, 7–9,12, 14].

Gaussian mixtures (GM) are a well-establishedmethodology to model probability density functions (pdf).The advantages of this methodology, such as adaptabilityto the data, modelling flexibility and robustness, have made

IET Image Process., 2009, Vol. 3, Iss. 1, pp. 10–25doi: 10.1049/iet-ipr:20080012

6, 2009 at 02:34 from IEEE Xplore. Restrictions apply.

IETdoi:

www.ietdl.org

GM models attractive for a wide range of applications e.g.[15, 16]. Because of the above merits, GM models havealready been employed for the CBIR problem [5, 15, 17].The main challenge when using a GM model in CBIR isto define a distance measure between GMs, whichseparates the different models well and, in addition, can becomputed efficiently. The traditionally used distancemeasure between pdfs is the Kullback–Leibler (KL)divergence that cannot be computed in closed form forGM models. Thus, we have to resort to random samplingMonte Carlo methods to compute this measure for GMs.This makes its use impractical for CBIR, where retrievaltime is an important issue. In [17], the earth moversdistance (EMD) was proposed as an alternative distancemetric for GM models. Although the EMD metric hasgood separation properties and is much faster to computethan the KL divergence (in the GM case), it still requiresthe solution of a linear program. Consequently, it is notcomputable in closed form and it is not fast enough for anRF-based CBIR system, which requires online interactionwith the user. Moreover, in [18], the asymptotic likelihoodapproximation (ALA) was proposed as a measure that,under certain assumptions, approximates the KL divergenceand can be computed in closed form for GMs.

With regard to RF approaches proposed in the literature,much work has been done during the last years. This workcan be classified in two main categories. The first categoryconcerns classifier-based methods, that is, it includes themethods which are based on some learning model (usuallySVMs) in order to train a classifier to distinguish betweenthe positive and negative feedback examples. Several SVMvariations have been proposed for RF and a number ofdifferent SVM kernels have been adopted, from typicalones, like radial basis functions based on some Lp norm, tomore sophisticated ones that are based on the EMD [7, 19,20]. The main drawback of the classifier-based approach isthat, for every feedback round, a new classifier must betrained taking into account both the previously presentedexamples and the new ones presented in the last feedbackround.

The second category of RF methods (data distribution-based methods) includes those that attempt to model thestatistical distribution of feedback examples in the featurespace. These methods can be further divided in twosubcategories.

The first subcategory includes methods that make theassumption that the feedback examples form one cluster inthe feature space. The cornerstone of such methods isMindReader [2]. Other methods that work under thisassumption are presented in [12, 21–23]. Usually, in thesemethods, the new query is formed as a linear combinationof the positive feedback examples and the distance betweenthe query and the database images is expressed in the formof a Mahalanobis distance. Unfortunately, in most cases,the single cluster assumption is very restrictive even for the

Image Process., 2009, Vol. 3, Iss. 1, pp. 10–2510.1049/iet-ipr:20080012


set of positive examples. Moreover, the negative feedbackexamples cannot be taken into account, because theynaturally spread across different semantic categories and,thus, it cannot be claimed that they form one cluster.Nevertheless, in [22], a solution to this problem isproposed based on a ‘two-step’ retrieval approach. In thefirst step, only the positive examples are used to determinea reduced set of database images very similar to them. Inthe second step, these images are re-ranked based on boththe positive and negative examples, for the final ranking tobe produced.

The second subcategory of the data distribution-based RFtechniques includes methods, which assume that thefeedback examples (either positive or negative) form morethan one cluster [15, 24–27]. The methods, which arebased on GM models, belong to this category. GM modelsare used in the aforementioned methods in two differentways. First, given an image description in a multi-dimensional space, GM models are used in order toapproximate the distribution of the user examples, whichare encountered as points in the feature space. The rankingof the database images can be performed based on thelikelihood of the formed GM model for each of the images[15, 24, 25]. The drawbacks associated with these methodsare, on the one hand, that we must re-train a new GMmodel for every RF round based on both the new and theold examples and, on the other hand, that usually a globalimage description based either on a single or on a fewvectors per image is adopted. Alternatively, one GM can beused to model each image based on a set of feature vectorsextracted from the image pixels or regions. This is theapproach adopted in this work. In [26, 27], the posteriorprobability of the model of each image, given the feedback,is used in order to rank the database images. However, themain problem regarding these methods is the definition ofan easy to compute distance measure between GM models,which can be expressed in a closed form, in order toaccelerate the retrieval and feedback process.

In this paper, as a solution to the above problem, wepropose the use of an alternative distance measure betweenpdfs, called C2 divergence, which was recently proposed in[28]. This measure can be computed in a closed form forGM models. Moreover, we analyse thoroughly theproperties of this measure and provide a bound on thedifference between the C2 divergence and the symmetricKL divergence. Additionally, we propose an efficientprobabilistic RF technique, which relies on a suitable,straightforward and intuitive update of the GM model ofthe query after each RF round, using the relevance of theretrieved images. Furthermore, we propose an effectivestrategy that requires very few computations to update thedistance measure after each RF round. In particular, thedistance of the new query from the database images can beeasily computed using some pre-computed quantitiesrelated to the C2 divergence, and is incrementally updatedduring the RF rounds. This query update methodology and

11

& The Institution of Engineering and Technology 2009


12

&

www.ietdl.org

efficient distance computation strategy can be applied notonly in the case of GM models, but also with other pdfmodels of the images. For example, and for reasons ofcomparison with GMs, we have also applied thisframework when the images are modelled using histograms.Moreover, the proposed query update methodology,because of its elegant form, can be applied very efficientlyusing other distance measures as well. To obtain acomparative evaluation of the C2 divergence in this context,we also develop and test an incremental scheme to updatethe ALA measure.

A preliminary version of this work has been presentedin [29]. In the present paper, we further elaborate onthe approach and provide an in-depth experimental studyfor performance assessment of the proposed method. Theexperimental evaluation is based on comparative results onlarge-scale data sets using an enriched set of image features.

The rest of this paper is organised as follows. In Section 2,we describe GMs in the context of image modelling forCBIR, and we give more details about the most populardistance measures between GM models. We also elaboratefurther on the difficulties that arise from the use of thesedistance measures for RF based on GMs. In Section 3, wepresent the C2 divergence as a promising alternativedistance measure for GMs, we analyse its properties and wepresent the proposed C2-based RF scheme. In Section 4,we first present the other methods with which our methodis compared and, then, we provide the details and theresults of the experiments. Finally, in Section 5, we presentconclusions and directions for future research.

2 CBIR using GM models2.1 Definition of GM models

GM models have been used extensively in many data-modelling applications. Using them for the CBIR problemsallows us to bring to bear several powerful features of theGM-modelling methodology, such as modelling flexibilityand easy training, that make it attractive for a wide range ofapplications [15, 16, 30]. GM models have been usedpreviously for CBIR [5, 17] as probability density modelsof the features that are used to describe the images. In thisframework, each image is described as a bag of featurevectors that are computed locally (e.g. a feature vector foreach pixel or region of the image). This bag of featurevectors is, subsequently, used to train (in a maximumlikelihood manner) a GM that models the probabilitydensity of the image features in the feature space. A GMmodel for the image feature vectors x [ Rd is defined as

p xð Þ ¼XK

j¼1

pjf xjuj

� �(1:1)

uj ¼ mj , Sj

� �(1:2)

The Institution of Engineering and Technology 2009


f xjuj

� �¼ N xjuj

� �

¼1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

2pð Þd Sj

�� r e�

1

2x� mj

� �T

S�1j x� mj

� �(1:3)

where K is the number of Gaussian components in themodel, 0 � pj � 1 the mixing probabilities withPK

j¼1 pj ¼ 1, and f xjuj

� �a Gaussian pdf with a mean mj

and covariance Sj .

2.2 Distance measures betweenGM models

To describe the similarity between images in this context, adistance measure must be defined. The KL divergence [11]is the most commonly used distance measure between pdfs.For two pdfs, p1 xð Þ and p2 xð Þ, the KL divergence is defined as

KL p1jjp2

� �¼

ðp1 xð Þ log

p1 xð Þ

p2 xð Þdx (2)

However, this distance measure cannot be computed in aclosed form for GMs. Thus, one has to resort to timeconsuming random sampling Monte Carlo methods, whichmake the use of this distance measure impractical forCBIR. As far as RF is concerned, the situation is evenworse, because RF is based on online interaction with theuser, which demands rapid distance updates at every RFround. To overcome these difficulties, some alternativeshave been proposed.

In [17], the EMD metric between GMs was proposed.This metric is based on considering the probability mass ofthe one GM as piles of earth and of the other GM asholes in the ground and, then, finding the least worknecessary to fill the holes with the earth in the piles. Morespecifically, for two GMs, p1 xð Þ ¼

PK1i¼1 p1if xju1i

� �and

p2 xð Þ ¼PK2

j¼1 p2jf xju2j

� �

EMD p1, p2

� �¼

PK1i¼1

PK2j¼1 fijdground f xju1i

� �, f xju2j

� �� PK1

i¼1

PK2j¼1 fij

(3:1)

where fij � 0 are selected so as to minimise the numerator inthe EMD definition subject to the following constraints

XK2

j¼1

fij � p1i,XK1

i¼1

fij � p2j ,

XK1

i¼1

XK2

j¼1

fij ¼ minXK1

i¼1

p1i,XK2

j¼1

p2j

!¼ 1

(3:2)

dground(,) denotes a ground distance metric quantifying thedistance between two Gaussians. The EMD is an effective



IETdoi:

www.ietdl.org

metric for CBIR, however, it cannot be computed in a closedform and requires the solution of a linear program each timethat must be computed. Thus, it is slow and cumbersome touse for RF, where the query changes after each RF epoch.

In [18], the ALA was proposed as a similarity measurefor GMs. More specifically, for two GMs, p1 xð Þ ¼PK1

i¼1 p1if xju1i

� �and p2 xð Þ ¼

PK2j¼1 p2jf xju2j

� �

ALA p1jjp2

� �¼XK1

i¼1

p1i log p2b12 ið Þ þ log f m1iju2b12 ið Þ

� �hn

�1

2trace S

�12b12 ið ÞS1i

� ��(4:1)

where

x� m 2

S¼ x� mð Þ

TS�1

x� mð Þ (4:2)

b12 ið Þ ¼ k, m1i � m2 k

2

S2 k� log p2k

, m1i � m2l

2

S2l� log p2l , 8l = k (4:3)

This measure has the favourable property that, under certainassumptions, it approximates the KL divergence. Thedifficulty with the ALA measure is that it requires thecomputation of a correspondence function, b12 ið Þ, betweenthe components of the query and those of the databaseimage model. As the query model changes after everyfeedback round, there is a need for re-computation of thiscorrespondence function, which can burden the distancecomputation taking into account that the new query modelmay be much more complex than the initial one, in orderto incorporate the models corresponding to the feedbackexamples. Nevertheless, in Section 4.1, we present anincremental scheme based on the proposed methodology ofquery model update, which enables the fast update of thismeasure after each round of RF.

3 The C2 divergence and itsapplication to RF3.1 Definition of the C2 divergence

To alleviate the aforementioned difficulties, which areencountered by most of the popular distance measures inthe case of GMs, a new distance measure was proposed in[28]. This measure between two pdfs, p1 xð Þ and p2 xð Þ, isdefined as

C2 p1, p2

� �¼ � log

2Sp1p2

Sp1p1þ Sp2p2

(5:1)

with

Spmpl¼ Sml ¼

ðpm xð Þpl xð Þdx (5:2)

Image Process., 2009, Vol. 3, Iss. 1, pp. 10–2510.1049/iet-ipr:20080012

Authorized licensed use limited to: Illinois Institute of Technology. Downloaded on February

and it can be computed in a closed form when p1 xð Þ and p2 xð Þare GMs. In this case, we have (see proof in Appendix 8.1)

Sml ¼1ffiffiffiffiffiffiffiffiffiffiffi2pð Þd

p XKm

i¼1

XKl

j¼1

pmipljffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCml i, jð Þ�� ekml i,;jð Þ

q (6:1)

where

Cml i, jð Þ ¼ Smi þ Slj (6:2)

kml i, jð Þ ¼ mmi � mlj

� �T

C�1ml i, jð Þ mmi � mlj

� �(6:3)

pmi, mmi andSmi are the mixing weight, the mean and thecovariance matrix, respectively, of the ith Gaussian kernelof pm; d is the dimension of the feature vector x, and Km isthe number of Gaussian components in pm.

3.2 Properties of the C2 divergence

For the C2 divergence, the following properties hold:

1. C2(p1, p2) � 0

2. C2(p1, p2) ¼ 0, p1 xð Þ ¼ p2 xð Þ

3. C2(p1, p2) ¼ C2(p2, p1)

4. The triangular inequality C2(p1, p3) � C2(p1, p2)þC2(p2, p3) does not hold; therefore the C2 is not a metricas is also the case with the KL divergence.

With regard to the relation between the C2 divergence andthe symmetric KL divergence defined by

SKL p1, p2

� �¼

1

2KL p1jjp2

� �þ KL p2jjp1

� �� (7)

the following inequality can be proved for arbitrary pdfs, p1

and p2 (the proof is given in Appendix 8.2).

Remark 1: The difference between the SKL and the C2 isbounded by

SKL p1, p2

� �� C2 p1, p2

� ��

1

2D1 p1

� �þ D2 p2

� �� þ log 2

(8:1)

where

Di(pi) ¼ E[ log pi]pi� log E[pi]pi

� 0 (8:2)

because of the Jensen’s inequality. It must be noted that theJensen’s inequality becomes an equality when the logfunction is replaced by a linear function. Since the logfunction can be locally approximated by a linear function, itcan be inferred that the bound becomes tighter (i.e. Di(pi)becomes small) when pi exhibits small variance.

13



14

&

www.ietdl.org

Furthermore, with regard to the relation between the C2divergence and the L2 norm, it is straightforward to show that

C2 f , g� �

¼ � log2 , f , g .

L22 f� �þ L2

2 g� �

¼ � log 1�L2

2 f � g� �

L22 f� �þ L2

2 g� �

! (9:1)

where

L2 f� �¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðf 2 xð Þdx

s(9:2)

and

, f , g .¼

ðf xð Þg xð Þdx (9:3)

are the definitions of the L2 norm and the inner product,respectively, for real continuous functions. Given thatlog x ’ x� 1 for x ’1

C2 f , g� �

’L2

2 f � g� �

L22 f� �þ L2

2 g� � for

L22 f � g� �

� L22 f� �þ L2

2 g� � (9:4)

3.3 RF based on the C2 divergence

For a distance measure to be useful in RF, it is crucial that itbe easily updated based on the feedback images provided bythe user. Thus, assume we have a query, q, modelled by q xð Þ(e.g. a GM model) and that the ith database image ismodelled by i xð Þ, for i ¼ 1, . . . , N . The search based onthis query requires the calculation of an N � 1 table withthe values C2 q, i

� �. Also, assume that, from the retrieved

images, the user decides that the images with modelsrm xð Þ, m ¼ 1, 2 . . . M , are the most relevant and desires toupdate his query based on them. One simple and intuitiveway to go about it, is to generate a new query model given by

q0 xð Þ ¼ 1� Lð Þq xð Þ þXMm¼1

lmrm xð Þ (10)

where 0 � lm � 1,PM

m¼1 lm ¼ L, 0 � L � 1, lm is therelevance assigned by the user to the image rm and 1� L isthe weight of contribution of the previous query to theformation of the new query. The attractive feature of thisdefinition of the new query model is that it is consistentwith the probabilistic framework adopted for imagemodelling, with each lm having a clear probabilisticmeaning. Moreover, lm has a physical meaning as well; itis proportional to the relevance degree assigned by the userto the image rm. From the above considerations, it becomes



apparent that (10) defines a ‘composite model’ thatincorporates the preferences of the user.

Furthermore, it is desirable to efficiently compute thedistances between the database image models,i xð Þ, i ¼ 1, 2 . . . N , and the new query model, q0 xð Þ.Taking into account (5.x) and (10), the updated distancemeasure for the new query, q0, is given by (the proof isgiven in Appendix 8.3)

C2 q0, i� �

¼ � log2Sq0i

Sq0q0 þ Sii

(11:1)

Sq0i ¼ 1� Lð ÞSqi þXMm¼1

lmSrmi (11:2)

Sq0q0 ¼ 1� Lð Þ2Sqq þ 2 1� Lð Þ

XMm¼1

lmSqrm

þXMm¼1

XMm0¼1

lmlm0Srmrm0(11:3)

Since the Sml values can be computed (and stored) a priori forall the images of the database, and, additionally, Sqq and allSqi have already been computed for the previous query, it isobvious that the computation of the distance between q0 xð Þand the database image models is very fast. This constitutesa notable advantage of our method. Indeed, computing thedistance for the new query involves only re-scalingoperations based on the relevance probabilities lm.Actually, the new query model in (10) does not need to beconstructed. To update the distance between the query andthe database images, only the computations of (11.x) areneeded, and those computations only implicitly involve thenew query model. It should be mentioned that, even if wedo not pre-compute the Sml values for all the databaseimages (because of the lack of space or time when hugeimage databases are considered, for example), thesequantities can be computed online very fast for GMs, sincethey are defined in a closed form in (6.x). Thus, in thiscase too, by incrementally updating Sqq and Sqi using (11.2)and (11.3), the C2 divergence between the new query andthe database images can be computed fast and without theneed of constructing the new query model.

Another favourable property of the proposed RF scheme isthat it can be easily used with many types of pdf models andnot just with the GMs, proposed in this work. In otherwords, the models of the database images and the queryneed not be GMs. Any other image model for which thecomputation of the S-measure, defined by (5.2), is notprohibitively time consuming can be adopted without anymodification to the RF scheme.

The images retrieved by the system at each retrieval epoch,which are not selected by the user as relevant, can beconsidered as irrelevant. Except for this implicit way of



IETdoi

www.ietdl.org

selection of irrelevant images, another one, more rigorous andaccurate, can be considered in which the user will be asked todefine them explicitly. The user-selected irrelevant imagescan also be incorporated, as negative feedback, into the RFprocess. In this context, we define a query model forirrelevant images, which we call ‘negative’ query model, andupdate it by a method similar to (10). Specifically, the newnegative query model is given by

n0 xð Þ ¼ 1� Ln

� �n xð Þ þ

XMn

m¼1

lnmrn

m xð Þ (12)

where n and n0 correspond to the previous and new negativequeries, respectively; Ln and ln

m are analogous to thepreviously mentioned L and lm; and rn

m, m ¼ 1, 2, . . . , Mn,are the negative examples. The negative query is initially‘empty’, contrary to the positive one that includes the initialquery selected by the user.

The best images to retrieve can be found by combiningboth the positive and negative RF. This can be done byminimising the following distance measure as

c ið Þ ¼ aposd q, i� �

� 1� apos

� �d n, ið Þ (13:1)

d q, i� �

¼ C2 q, i� �

(13:2) (13:2)

d n, ið Þ ¼ C2 n, ið Þ (13:3)

with 0 � apos � 1 being the relative weight given to thepositive feedback. After computing the measure c ið Þ forevery database image, we can retrieve the images with thelowest values for this measure. Such images will be similarto the user ‘ideal’ query, which is determined by the initialquery and the positive examples, and dissimilar to the user-provided negative examples.

4 Experiments4.1 Methods of comparison

In this work, we propose the use of GMs in order to modelthe database images, along with the RF scheme based on theC2 divergence, described in Section 3.3. To demonstrate themerits of this method, we conducted a number ofexperiments.

As we mentioned above, our RF scheme can also bedirectly applied when the database images have beenmodelled using some other pdf model instead of GMs. Todemonstrate the strength of GMs for image modelling, wecompared our method with a histogram-based variation. Inother words, we implemented a variation of our method,which uses histograms instead of GMs as image models.

In the case of modelling the pdf of image features usinghistograms, the range of values of each feature is dividedinto a number of disjoint intervals. The simplest choice for

Image Process., 2009, Vol. 3, Iss. 1, pp. 10–25: 10.1049/iet-ipr:20080012


those intervals is to assume that they have equal width.Thus, for image features f1, f2, . . . , fL (ai � fi � bi) andwith the range of each feature fi divided into zi intervals,we obtain an L-dimensional space quantised intoZ ¼ z1z2 � � � zL bins. Then, assuming that we describeeach image with one feature vector per image pixel, we canapproximate the pdf of the image features by simplycounting the pixels the features of which fall into eachbin and normalising by dividing with the total numberof pixels. Thus, we can represent the histogram of animage in vectorial form as p ¼ h1, h2, . . . , hZ

� �T, withPZ

i¼1 hi ¼ 1.

Taking into account (5.x), in order to compute the C2divergence between histogram models p1 and p2, it issufficient to compute Sml ¼

Ðpm xð Þpl xð Þdx, m, l ¼ 1, 2. It

is trivial to show that, for histograms

Sml ¼XZ

i¼1

hmihli (14)

where pm ¼ hm1, hm2, . . . , hmZ

� �Tand pl ¼ hl1, hl2, . . . ,

�hlZÞ

T. Using (14) to compute Sml and under theassumption that we use (10) and (12) to update the positiveand negative queries (based on the feedback given by theuser), RF, in the case of histograms, is performed in exactlythe same manner as for GMs. Thus, again, the histogramsfor the new query models do not need to be constructed.

To demonstrate the benefits of using the C2 divergence foran RF based on GMs, we compared our method with avariation that uses the ALA measure to rank the images.Using (10) and (12) to update the models of the queries, itis easy to prove (the proof is given in Appendix 8.4) thatthe ALA measure for the new query models can beincrementally updated by

ALA q0jji� �

¼ 1� Lð ÞALA qjji� �

þXMm¼1

lmALA rmjji� �

(15)

A similar equation holds in the case of the negative query.To obtain this equation, the only thing to be done is toreplace q, q0, lm, L, M and rm by n, n0, ln

m, Ln, Mn and rnm,

respectively, in (15). Taking into account that the ALA is asimilarity measure, the definitions of the distances d q, i

� �and d n, ið Þ of (13.x) were modified, taking the followingform

d q, i� �

¼ �ALA qjji� �

(16:1)

d n, ið Þ ¼ �ALA njjið Þ (16:2)

Additionally, for a comparative evaluation of the proposedRF methodology, we have implemented the methoddescribed in [22]. This method incorporates both positiveand negative examples using a ‘two step’ retrieval scheme.

15



16

&

www.ietdl.org

In the first step, only the positive examples are used in orderto rank the database images. Then, only a relatively smallnumber of the database images, which are very similar tothe positive examples, are retained and re-ranked in thesecond step which takes into account both the positive andnegative examples, to produce the final ranking from whichthe top images, placed near to the positive and far from thenegative examples, are presented to the user. Both theretrieval steps are based on the Lagrange optimisation inorder to estimate the distance parameters that minimise thewithin class distance and maximise the between classdistance of the positive and negative examples.

4.2 Image databases and features

To test the validity of the proposed approach, we used twoimage databases. The first image set (DB I) contains 3740colour images of size 640 � 480 from the image databasein [31]. These images have been classified into 17 semanticcategories according to their content (e.g. airplanes, cars,birds, windows etc.). Generally, we adopted thecategorisation specified by the database provider, except fora few cases where categories, which are semantically veryclose to each other, were merged. The second database(DB II) is a subset of the Corel image database. It contains9923 images of size 256 � 384, which are classified into 42semantic categories. Although the Corel database isprofessionally annotated and categorised, many imagescontaining the same semantic content are distributed acrossdifferent Corel categories. Hence, we decided to mergesome Corel categories, thus producing our own semanticcategorisation, which is considered as the ground truth.

For each image pixel in the aforementioned databases, a setof features has been extracted. These features are of threetypes and are described below:

1. Position: the (x, y)-coordinates of each pixel normalised bythe image width (xmax) for the x-dimension and by the imageheight (ymax) for the y-dimension.

2. Colour: we chose the CIE-Lab [17] colour space asbeing approximately perceptually uniform, a property veryuseful for retrieval and, thus, three colour features (L�,a�, b�) per pixel were used. To take the final values[ L�ð Þsmoothed, a�ð Þsmoothed, b�ð Þsmoothed] for these features,local Gaussian smoothing is performed according to thetexture scale which is mentioned below. In this way, wedecorrelate the colour and the texture of the image.

3. Texture: As a scheme to extract texture features from theimages, we selected the one presented analytically in [5].The gradient of the L� colour component of the image isestimated. The method is based on the eigenstructure ofthe second moment matrix derived using theaforementioned gradient. One measure of the local texturescale along with the anisotropy (A), the polarity (P) andthe contrast (C ), corresponding to this scale, are estimated



for each pixel. Taking into account that the polarity andanisotropy values are meaningless in regions of lowcontrast, the texture features that were selected are:AC ¼ A � C, PC ¼ P � C and C.

Thus, finally, we have an eight-dimensional vector, x, foreach image pixel, with

x ¼ f1, f2, . . . , f8� �T

¼ xnorm, ynorm, L�ð Þsmoothed,�

a�ð Þsmoothed, b�ð Þsmoothed, AC, PC, C�T

(17:1)

xnorm¼

x

xmax

(17:2)

ynorm¼

y

ymax

(17:3)

Prior to feature extraction, pixel sub-sampling was performedusing a spatially uniform grid. In DB I, only 15% of theimage pixels were used for feature extraction and GMtraining. On the other hand, in DB II, 50% of the imagepixels were retained after sub-sampling, because the imagesare of lower resolution. Prior to sub-sampling, the imageswere smoothed, using a Gaussian kernel, in order to avoidaliasing. Sub-sampling was necessary in order to reduce thecomputational load required by the feature extraction andmodel parameter estimation steps.

4.3 Implementation parameters

Regarding our method, the parameters of the GM model ofeach image were estimated using a variation of the wellknown EM algorithm, called greedy EM [32]. Thisalgorithm avoids the problem of parameter initialisation,which is critical for the normal EM. The greedy EMalgorithm starts with a single component and addscomponents sequentially until a maximum number isreached. In all the experiments, we chose to use fullcovariance parametrisation for the GM components. Wealso performed several experiments with GM models ofdifferent number of components and we concludedempirically that a good choice for this parameter is to haveten components per GM model. Before applying the EMalgorithm, we normalise each feature to have zero meanand unit standard deviation taking into account all thefeature vectors of the image. After the application of theEM, we convert the produced model to the initial featurespace (before feature normalisation). This is necessary sothat the subsequent comparison between the produced GMmodels are fair.

The parameters lm and lnm for the positive and negative

examples are given equal values regardless of m, because,for the simulations described analytically in the nextsection, the ground truth of the aforementioned pre-categorised databases is used in order to select the positiveand negative examples. Thus, using the strict ground truth



IETdoi

www.ietdl.org

categorisation, it is meaningless to assume different relevancedegrees for each example. The exact values of theseparameters for each RF round are determined as follows,based on the number of positive and negative examplesalready provided by the user until the current RF round,and aiming at assigning equal weights to all the examples.

Assuming that we are at the rth RF round and that thepositive and the negative examples provided by the user atthis RF round are kr and kn

r , respectively, the total numberof positive and negative examples given during all the RFrounds until now are tr and tn

r , respectively, where

tr ¼ 1þXr

j¼1

kj (18:1)

tnr ¼

Xr

j¼1

knj (18:2)

In the case of positive examples, in order to include the initialquery in the computation, we increment by one the totalnumber of positive examples given by the user during theRF rounds. Based on the above considerations, theparameters lm and ln

m for the current feedback round, r, are

lm

� �r¼

1

tr

, m ¼ 1 . . . kr (18:3)

lnm

� �r¼

1

tnr

, m ¼ 1 . . . knr (18:4)

Regarding the histogram-based variation of our method, it iswell known that the main difficulty with histograms is the so-called ‘curse of dimensionality’. In other words, as we increasethe number of intervals for each feature, the number ofhistogram bins increases exponentially, resulting in ashortage of data for robust density estimation, given thelimited number of available feature vectors per image. Forthe eight-dimensional feature space used in theseexperiments, even if we use a small number of intervals foreach feature, the resulting total number of bins of an eight-dimensional histogram is prohibitively large.

For this purpose, instead of one eight-dimensionalhistogram, we decided to use two five-dimensional ones,one with the position and colour features and another withthe position and texture features. The decision of using oneposition-colour histogram and one position-texturehistogram was motivated by the fact that, on the one hand,it is meaningless to consider the position features separatelyfrom the other types of features and, on the other hand,both the colour and texture feature distributions depend onthe position in the image. Thus, the only reasonable choiceis to combine the position features with both the colourand texture features. Finally, it should be mentioned thatthe straightforward inclusion of the position features in thehistograms is consistent with the previously described GM



modelling, which makes no discrimination betweenposition, colour and texture features.

In this context, in order to compute the overall distancebetween the query and the database images, a convexcombination of the computed distances for the twohistogram models was used. More specifically, thedefinition of the distances d q, i

� �and d n, ið Þ of (13.x) was

modified taking the following form

d q, i� �

¼ lpos�colourC2pos�colour q, i� �

þ lpos�textC2pos�text q, i� �

(19:1)

d n, ið Þ ¼ lpos�colourC2pos�colour n, ið Þ

þ lpos�textC2pos�text n, ið Þ (19:2)

lpos�colour þ lpos�text ¼ 1 (19:3)

with C2pos�colour ,ð Þ and C2pos�text ,ð Þ denoting the C2distances for the position-colour and position-texturehistograms, respectively, and lpos�colour and lpos�text beingtwo positive constants summing up to one.

Regarding the quantisation of the feature space, severalexperiments were performed. We present here the resultsobtained for two distinct quantisation settings (denotedHist1, Hist2), as being the most representative. Specifically,Hist1 uses a 3� 3� 4� 8� 8 histogram for the position-colour feature combination, xnorm

� ynorm L�ð Þsmoothed�

� a�ð Þsmoothed� b�ð Þsmoothed, and a 3� 3� 4� 4� 4

histogram for the position-texture feature combination,xnorm

� ynorm� AC � PC � C. Respectively, Hist2 uses a

5� 5� 8� 16� 16 position-colour histogram and a5� 5� 8� 8� 8 position-texture histogram.

Regarding the ALA-based variation of our method, theimage models and the RF scheme are exactly the same asthose used in our method, except for the ranking measureincremental update, which is performed using (15), and thedefinition of the distances d q, i

� �and d n, ið Þ of (13.1),

which is given by (16.x).

To extract features appropriate for the method proposed in[22], we partitioned each database image in nine sub-images,dividing each of the x and y image axes in three equal widthintervals, and, for each sub-image, we estimated a4� 8� 8 L�ð Þsmoothed

� a�ð Þsmoothed� b�ð Þsmoothed

� �colour

histogram and a 4� 4� 4 AC � PC � Cð Þ texturehistogram. Thus, we obtain an image description of 18feature vectors, nine 256-dimensional colour histogramvectors and nine 64-dimensional texture histogram vectors.As it can easily be seen, this feature extraction process is inclear correspondence with the previously described Hist1quantisation setting (the features corresponding to theHist2 quantisation cannot be used for the methodproposed in [22], because of the rapidly increasing with thefeature space dimensionality time complexity of this

17



18

&

www.ietdl.org

method). Moreover, the relevance degrees pkn [22] are given

equal values, independent of n, for the same reason forwhich we give equal values to the parameters lm and ln

m ofour method. Finally, for reasons of comparison with ourmethod, we selected to set p̃1

¼ apos and p̃ 2¼ 1� apos,

because p̃1 and p̃2, in [22], are in essence the relativeweights given to the positive and negative examples,respectively. For our method, this is exactly the role of theparameters apos and 1� apos.

4.4 Simulations and results

To quantify the performance of the proposed RF system, wehave resorted to RF simulation. Specifically, two simulationschemes were implemented. In both schemes, a query set isformed including for each database category a percentage ofthe images belonging to it. Each image in this set is usedas an initial query. In this context, the relevant (irrelevant)images are identified, using the ground truth categorisationof the image databases, as those which (do not) belong tothe database category of the initial query. In the firstsimulation scheme, for each query, similar images areretrieved until a specific recall level [33] is reached. Then,for this recall level, the corresponding precision [33] iscomputed. The retrieved relevant and irrelevant images arealso identified, with the percentage of them used for RF inthe simulation denoted by rprc and nrprc, respectively. Therelevant and irrelevant images used in RF are selected atrandom from the sets of retrieved relevant and irrelevantimages, respectively, when the aforementioned percentagesare ,1. In Figs. 1 and 2, we show the progression inprecision averaged over the set of initial queries duringdifferent rounds of RF, for a recall level equal to 0.3. Inthese figures, ‘GMMþ C2’ indicates the method proposedin this paper and ‘Hist1þ C2’ (‘Hist2þ C2’) indicates thehistogram-based variation when the Hist1 (Hist2) feature

Figure 1 Image models evaluation (first experiment):average precision for recall level ¼ 0.3, during differentrounds of RF (DB I)



space quantisation is used. Also, ‘pN’ indicates that rprcequals N and ‘nN’ indicates that nrprc equals N. Theresults in Figs. 1 and 2 were obtained using DB I, wherethe set of initial queries included every image in thedatabase. When negative feedback was given, apos equals0.55. With regard to the histogram-based variation of ourmethod, we set lpos�colour ¼ lpos�text ¼ 0:5.

The second simulation scheme is similar to that proposedin [12]. The accuracy [21] is measured as the ratio of relevantimages among the top T retrieved images. At each feedbackstep, at most Kp relevant images are selected from the top Rretrieved images, as positive feedback. If the number ofrelevant images in the top R retrieved images is greaterthan Kp, then, we select randomly Kp of them as feedback,else we select all the relevant retrieved images. In case wewish to provide the system with negative feedback, wefollow a similar procedure providing as feedback at most Kn

of the irrelevant images retrieved in the first R retrievedimages. We provide experiments with this methodology inboth databases. In DB I, each image was used once as theinitial query, whereas in DB II, the query set included 40%of the database images (a percentage sufficient from astatistical point of view). The average accuracy wascomputed for all the images in the set of initial queries andfor each RF round. In both databases, for reasons ofcomparison, we considered several choices regarding theimage models, the RF methodology and the distancemeasures between the images. The results are given inFigs. 3–6. The choices for the simulation parameters were:T ¼ 20, R ¼ 150, Kp ¼ 10, Kn ¼ 10. When negativefeedback was provided, apos ¼ 0:65 (except for the case ofthe ALA-based variation, which gives the best results whenapos ¼ 0:8). For the histogram-based variation, we set againlpos�colour ¼ lpos�text ¼ 0:5. For the method proposed in [22],when negative feedback was provided, the R (¼150)

Figure 2 Image models evaluation (second experiment):average precision for recall Level ¼ 0.3, during differentrounds of RF (DB I)



IEdo

www.ietdl.org

top-ranked images in the first step of retrieval, were retained inthe second step. In Figs. 3–6, ‘GMMþ C2’ denotes ourmethod, ‘Hist1þ C2’ the histogram-based variation with theHist1 feature space quantisation, ‘GMMþALA’ the ALA-based variation and ‘KZB_Meth’ the method proposed in[22]. Also, ‘p’ indicates the use of positive feedback and ‘n’indicates the use of negative feedback.

From Figs. 1–3, it can be observed that, when GMs areused, the precision (first simulation) or the accuracy(second simulation) is always higher than in the histogramcase, except for the initial round (before RF) of retrievaland rarely the initial one or two RF rounds. This is truewhen either the first (Hist1) or the second (Hist2)quantisation for the histograms is considered. After severalRF rounds, the precision (accuracy) level obtained usingGMs is significantly higher compared with that of thehistogram-based variation.

Figure 3 RF schemes evaluation: average accuracy in scopeT ¼ 20 during different rounds of RF (DB I)

Figure 4 Distance measures evaluation: average accuracyin scope T ¼ 20 during different rounds of RF (DB I)

T Image Process., 2009, Vol. 3, Iss. 1, pp. 10–25i: 10.1049/iet-ipr:20080012


Furthermore, from Figs. 4 and 6, it can be observed thatthe use of the C2 divergence as a distance measure betweenGMs leads to a very significant improvement comparedwith the accuracy obtained using the ALA measure inexactly the same context.

Additionally, as illustrated in Figs. 3 and 5, our methodalmost always significantly outperforms the methodproposed in [22] after the initial (one or two) RF rounds.Moreover, when compared with the method proposed in[22], the histogram-based variation of our method (usingthe Hist1 quantisation) also results in higher levels ofaccuracy. Taking into account that the image featuresused for both the histogram-based variation and themethod proposed in [22] are exactly the same in thisexperiment, this is an indication of the superiority of ourRF scheme.

Figure 5 RF schemes evaluation: average accuracy in scopeT ¼ 20 during different rounds of RF (DB II)

Figure 6 Distance measures evaluation: average accuracyin scope T ¼ 20 during different rounds of RF (DB II)

19



20

&

www.ietdl.org

Finally, it should be mentioned that, when we use onlypositive feedback (and, more generally, when we assign ahigh weight (apos) to the positive feedback), we observe arapid increase in precision (accuracy) at the initial RFrounds and, then, the precision (accuracy) stabilises to themaximum attainable level. On the other hand, when boththe positive and negative examples are used (and as theweight of the positive feedback decreases, always remaining.0.5), the increase in precision (accuracy) becomessmoother, a greater number of rounds are required forconvergence, however, in many cases, we can achieve ahigher level of precision (accuracy) than before.Additionally, in the first simulation scheme, the use of asubset of the available images for feedback (when rprc ornrprc is ,1) leads, in many cases, to slower convergenceand to lower levels of precision.

To provide an estimate of the retrieval time of the proposedmethod, it can be noted that, for DB I and for the secondsimulation scheme (with both positive and negative feedbackand with the simulation parameters having values asspecified above), it takes about 166 s of computation time(on a 3 GHz PC using Matlab) to execute, for 3740 imagespresented as initial queries, the initial retrieval plus sixepochs of RF. This means that the average retrieval time perquery image with six rounds of RF is 0.044 s. Regarding thehistogram-based variation, it takes about double the time forthe same simulation scenario, because each image isdescribed by two histogram models instead of the one GMmodel used by the proposed method. Moreover, the ALA-based variation, using the incremental update formula of(15), results in an execution as fast as that of our method.On the other hand, using the same simulation scenario, themethod proposed in [22] takes about 19 000 s for 3740images used as initial queries, which means an average ofabout 5 s per query. This significant difference in the timecomplexity demonstrates the noticeable ability of our RFscheme to rapidly update the distances between the databaseimages and the query and to provide an instantaneousresponse to the user.

5 Conclusions – future workA probabilistic framework for RF based on GM models wasproposed in this paper. The main advantages of theproposed methodology are accuracy as indicated by oursimulation study results, speed of implementation andflexibility. Incorporation of both positive and negativefeedback examples is performed in a very intuitive mannerwhich, in combination with the simple and easy to updateform of the proposed C2 divergence, allows for real-timeevaluation of the image-ranking criterion. In this way, fastretrieval is achieved after the user feedback has beenprovided.

Furthermore, we compared GM and histogram-basedimage models, in order to access the value of GMmodelling within the context of RF. To use the same



number of features in both the histogram and GM cases,we were forced to partition the feature space in twosubspaces and to use two histograms for each imagebecause of the ‘curse of dimensionality’. In contrast, whenusing GMs, there is no need for such partitioning. As faras the experimental results are concerned, it is clear thatsignificant performance improvement is obtained when weuse GM models, compared with the performance achievedwhen we use histograms to model the images.Furthermore, the combination of our RF scheme eitherwith GMs or with histograms leads to superior resultswhen compared to the method in [22]. Finally, wecompared our method with a variation using the ALAsimilarity measure instead of the proposed C2 divergence,in order to demonstrate the advantages of the latter. Thecomparative experimental results demonstrate that asuperior performance can be achieved with the C2divergence.

In the future, we intend to provide the user with thepossibility to determine explicitly the degree of relevanceof the feedback examples, by implementing asophisticated interactive system based on GMs and usingour RF scheme. In addition, we aim to generalise our RFscheme to support region-based similarity and retrieval.Furthermore, we aim to attempt to apply techniques todetermine automatically the appropriate number ofkernels for each mixture. Moreover, we plan to test theperformance of our method using more sophisticatedimage features. Finally, we would like to test thescalability of the proposed method using even largerimage databases.

6 AcknowledgmentsThis work was partly supported by Public Funds under thePENED 2003 Project. The Project is co-funded by theEuropean Social Fund (80%) and National Resources(20%) from the Hellenic Ministry of Development -General Secretariat for Research and Technology.

7 References

[1] DATTA R., LI J., ZE WANG J.: ‘Content-based image retrieval:approaches and trends of the new age’. MultimediaInformation Retrieval, 2005, pp. 253–262

[2] ISHIKAWA Y., SUBRAMANYA R., FALOUTSOS C.: ‘MindReader:querying databases through multiple examples’. Proc.VLDB Conf, 1998

[3] COX I.J., MILLER M.L., MINKA T.P., PAPATHOMAS T.V., YIANILOS P.N.:‘The Bayesian image retrieval system, PicHunter: theory,implementation, and psychophysical experiments’, IEEETrans. Image Process., 2000, 9, (1), pp. 20–37



IETdoi

www.ietdl.org

[4] RUI Y., HUANG T.S., ORTEGA M., MEHROTRA S.: ‘Relevancefeedback: a power tool for interactive content-basedimage retrieval’, IEEE Trans. Circuits Syst. Video Technol,1998, 8, (5), pp. 644–655

[5] CARSON C., BELONGIE S., GREENSPAN H., MALIK J.:‘Blobworld: image segmentation using expectation-maximization and its application to image querying’, IEEETrans. Pattern Anal. Mach. Intell, 2002, 24, (8),pp. 1026–1038

[6] CHEN Y., WANG J.Z.: ‘A region-based fuzzy featurematching approach to content-based image retrieval’, IEEETrans. Pattern Anal. Mach. Intell., 2002, 24, (9),pp. 1252–1267

[7] GUO G.D., JAIN A.K., MA W.Y., ZHANG H.J.: ‘Learning similaritymeasure for natural image retrieval with relevancefeedback’, IEEE Trans. Neural Netw., 2002, 13, (4),pp. 811–820

[8] AGGARWAL G., ASHWIN T.V., GHOSAL S.: ‘An image retrievalsystem with automatic query modification’, IEEE Trans.Multimedia, 2002, 4, pp. 201–214

[9] JING F., LI M., ZHANG H.J., ZHANG B.: ‘An efficient andeffective region-based image retrieval framework’, IEEETrans. Image Process., 2004, 13, (5), pp. 699–709

[10] MANJUNATH B.S., OHM J.R., VASUDEVAN V.V., YAMADA A.: ‘Colorand texture descriptors’, IEEE Trans. Circuits Syst. VideoTechnol, 2001, 11, (6), pp. 703–715

[11] DO M.N., VETTERLI M.: ‘Wavelet-based texture retrievalusing generalized Gaussian density and Kullback-Leiblerdistance’, IEEE Trans. Image Process., 2002, 11, (2),pp. 146–158

[12] HSU C.T., LI C.Y.: ‘Relevance feedback using generalizedbayesian framework with region-based optimizationlearning’, IEEE Trans. Image Process., 2005, 14, (10),pp. 1617–1631

[13] RUI Y., HUANG T.S., CHANG S.F.: ‘Image retrieval: currenttechniques, promising directions, and open issues’, J. Vis.Commun. Image Represen, 1999, 10, pp. 39–62

[14] EL-NAQA I., YANG Y., GALATSANOS N., WERNICK M.: ‘A Similaritylearning approach to content based image retrieval:application to digital mammography’, IEEE Trans. Med.Imaging, 2004, 23, (10), pp. 1233–1244

[15] QIAN F., LI M., ZHANG L., ZHANG H.J., ZHANG B.: ‘Gaussianmixture model for relevance feedback in image retrieval’.Proc. IEEE ICME, August 2002

[16] BISHOP C.M.: ‘Neural networks for pattern recognition’(Oxford University Press Inc, New York, 1995)



[17] GREENSPAN H., DVIR G., RUBNER Y.: ‘Context-dependentsegmentation and matching in image databases’, Comput.Vis. Image Underst., 2004, 93, pp. 86–109

[18] VASCONCELOS N.: ‘On the efficient evaluation ofprobabilistic similarity functions for image retrieval’, IEEETrans. Inf. Theory, 2004, 50, (7), pp. 1482–1496

[19] JING F., LI M., ZHANG H-J., ZHANG B.: ‘Relevance feedback inregion-based image retrieval’, IEEE Trans. Circuits Syst.Video Technol., 2004, 14, (5), pp. 672–681

[20] TONG S., CHANG E.: ‘Support vector machine activelearning for image retrieval’. ACM Multimedia, 2001

[21] SU Z., ZHANG H., LI S., MA S.: ‘Relevance feedback incontent-based image retrieval: Bayesian framework,feature subspaces, and progressive learning’, IEEE Trans.Image Process., 2003, 12, (8), pp. 924–937

[22] KHERFI M.L., ZIOU D., BERNARDI A.: ‘Combining positive andnegative examples in relevance feedback for content-based image retrieval’, J. Vis. Commun. Image Represent.,2003, 14, pp. 428–457

[23] RUI Y., HUANG T.S.: ‘Optimizing learning in image retrieval’.IEEE Int. Conf. Computer Vision and Pattern Recognition,Hilton Head, SC, USA, 2000

[24] KHERFI M.L., ZIOU D.: ‘Relevance feedback for CBIR: a newapproach based on probabilistic feature weighting withpositive and negative examples’, IEEE Trans. ImageProcess., 2006, 15, (4), pp. 1017–1030

[25] MUNEESAWANG P., GUAN L.: ‘Automatic machineinteractions for content-based image retrieval using aself-organizing tree map architecture’, IEEE Trans. NeuralNetw., 2002, 13, (4), pp. 821–834

[26] VASCONCELOS N., LIPPMAN A.: ‘Learning over multipletemporal scales in image databases’. European Conf.Computer Vision, 2000

[27] VASCONCELOS N., LIPPMAN A.: ‘Learning from user feedbackin image retrieval systems’. Advances in Neural InformationProcessing Systems, 1999

[28] SFIKAS G., CONSTANTINOPOULOS C., LIKAS A., GALATSANOS N.P.: ‘Ananalytic distance metric for Gaussian mixture modelswith application in image retrieval’. Proc. Int. Conf.Artificial Neural Networks (ICANN 2005), Warsaw, Poland,2005

[29] MARAKAKIS A., GALATSANOS N., LIKAS A., STAFYLOPATIS A.: ‘Arelevance feedback approach for content based imageretrieval using Gaussian mixture models’. Proc. Int. Conf.Artificial Neural Networks (ICANN 2006), Athens, Greece,September 2006

21



22

&

www.ietdl.org

[30] McLACHLAN G.M., PEEL D.: ‘Finite mixture models’ (JohnWiley and Sons Inc, New York, 2001)

[31] Microsoft Research Cambridge Object RecognitionImage Database: ‘Version 1.0’. http://research.microsoft.com/research/downloads/Details/b94de342-60dc-45d0-830b-9f6eff91b301/Details.aspx

[32] VLASSIS N., LIKAS A.: ‘A greedy EM algorithm for Gaussianmixture learning’, Neural Process. Lett., 2002, 15,pp. 77–87

[33] BIMBO A.D.: ‘Visual information retrieval’ (MorganKaufmann, San Mateo, CA, 1999)

8 Appendix8.1 Proof of (6.x)

We have

pm xð Þ ¼XKm

i¼1

pmif xjumi

� �and pl xð Þ ¼

XKl

j¼1

pljf xjulj

� �

with

unk ¼ mnk, Snk

� �and f xjunk

� �¼ N xjunk

� �¼

1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pð Þd Snk

�� q e�1=2 x�mnkð ÞTS�1nk x�mnkð Þ

Thus

ðpm xð Þpl xð Þdx ¼

ð XKm

i¼1

pmif xjumi

� � !"

�XKl

j¼1

pljf xjulj

� � !#dx

)

ðpm xð Þpl xð Þdx ¼

XKm

i¼1

XKl

j¼1

pmiplj

ðf xjumi

� �f xjulj

� �dx

(20)

However

ðf xjumi

� �f xjulj

� �dx

¼

ð1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

2pð Þd Smi

�� q e�1=2 x�mmið ÞTS�1mi x�mmið Þ



�1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

2pð Þd Slj

�� r e�1=2 x�mljð ÞTS�1lj x�mljð Þdx

¼1

2pð ÞdffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiSmi

�� Slj

�� r

�

ðe�

12 x�mmið Þ

TS�1mi x�mmið Þþ x�mljð Þ

TS�1lj x�mljð Þ

� �dx

¼e�1=2 mT

miS�1mi mmiþm

Tlj S�1lj mlj

� �2pð Þd

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiSmi

�� Slj

�� r

�

ðe�1=2 xT

S�1x�xT

S�1M�MT

S�1x

� �dx (21)

with

S ¼ S�1mi þ S

�1lj

� ��1

(22)

and

M ¼ S S�1mi mmi þ S

�1lj mlj

� �(23)

Moreover, given that Q ¼ M, S½ , we know that

ðf xjQð Þdx ¼

ð1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

2pð Þd S�� q e�1=2 x�Mð Þ

TS�1 x�Mð Þdx ¼ 1

)

ðe�

12 xT

S�1x�xT

S�1M�MT

S�1x

� �dx ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pð Þd S

�� qe�1=2MTS

�1M(24)

From (21) and (24), we have

ðf xjumi

� �f xjulj

� �dx ¼

e�1=2 mTmiS�1mi mmiþm

Tlj S�1lj mlj

� �2pð Þd

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiSmi

�� Slj

�� r

�

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2pð Þd S

�� qe�1=2MTS

�1M

¼1ffiffiffiffiffiffiffiffiffiffiffi2pð Þd

pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

S��

Smi

�� Slj

�� eR

vuut (25)

with R defined by

R ¼ mTmiS�1mi mmi þ mT

lj S�1lj mlj �MTS

�1M

¼ mTmiS�1mi mmi þ mT

lj S�1lj mlj � mT

miS�1mi þ mT

lj S�1lj

� �

� S�1mi þ S

�1lj

� ��1

S�1mi mmi þ S

�1lj mlj

� �(26)



IETdoi

www.ietdl.org

From the matrix inversion lemma, we have

S�1mi þ S

�1lj

� ��1

¼ Smi � Smi Smi þ Slj

� ��1

Smi (27)

and

S�1mi þ S

�1lj

� ��1

¼ Slj � Slj Smi þ Slj

� ��1

Slj (28)

Furthermore

S�1mi S

�1mi þ S

�1lj

� ��1

S�1lj ¼ Slj S

�1mi þ S

�1lj

� �Smi

h i�1

¼ Smi þ Slj

� ��1

) S�1mi þ S

�1lj

� ��1

¼ Smi Smi þ Slj

� ��1

Slj (29)

Similarly, we obtain

S�1mi þ S

�1lj

� ��1

¼ Slj Smi þ Slj

� ��1

Smi (30)

Combining (26) – (30), we obtain

R ¼ mTmiS�1mi mmi þ mT

lj S�1lj mlj

�mTmiS�1mi S

�1mi þ S

�1lj

� ��1

S�1mi mmi � mT

lj S�1lj

� S�1mi þ S

�1lj

� ��1

S�1lj mlj

� mTmiS�1mi S

�1mi þ S

�1lj

� ��1

S�1lj mlj � mT

lj S�1lj

� S�1mi þ S

�1lj

� ��1

S�1mi mmi

) R ¼ mmi � mlj

� �T

Smi þ Slj

� ��1

mmi � mlj

� �(31)

Finally, from (20), (22), (25) and (31), we obtain

ðpm xð Þpl xð Þdx

¼XKm

i¼1

XKl

j¼1

pmiplj

1ffiffiffiffiffiffiffiffiffiffiffi2pð Þd

pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

S��

Smi

�� Slj

�� eR

vuut




p XKm

i¼1

XKl

j¼1

pmiplj

�

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiS�1mi þ S

�1lj

� ��1��

��Smi

�� Slj

�� e mmi�mljð ÞT

SmiþSljð Þ�1

mmi�mljð Þ

vuuuut


p XKm

i¼1

XKl

j¼1

�pmipljffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

Smi þ Slj

�� e mmi�mljð ÞT

SmiþSljð Þ�1

mmi�mljð Þ

r

Thus, from (5.2), we have

Sml ¼

ðpm xð Þpl xð Þdx

) Sml ¼1ffiffiffiffiffiffiffiffiffiffiffi2pð Þd

p XKm

i¼1

XKl

j¼1

pmipljffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiCml i, jð Þ�� ekml i,;jð Þ

q

with

Cml i, jð Þ ¼ Smi þ Slj

kml i, jð Þ ¼ mmi � mlj

� �T

C�1ml i, jð Þ mmi � mlj

� �

8.2 Proof of Remark 1

Because of Jensen’s inequality it holds that

log

ðp1(x)p2(x)dx �

ðp1(x) log p2(x)dx and

log

ðp1(x)p2(x)dx �

ðp2(x) log p1(x)dx

From (2) and (5.x), we have

KL p1jjp2

� �¼

ðp1 xð Þ log

p1 xð Þ

p2 xð Þdx and C2 p1, p2

� �

¼ � log2Ð

p1 xð Þp2 xð ÞdxÐp2

1 xð ÞdxþÐ

p22 xð Þdx

Therefore it holds that

KL p1jjp2

� �� C2 p1, p2

� �� log 2þ

ðp1 xð Þ log p1 xð Þdx

� log

ðp2

1 xð Þ þ p22 xð Þ

� �dx

23



24

&

www.ietdl.org

which leads to

KL(p1jjp2)� C2(p1, p2) � log 2þ

ðp1(x) log p1(x)dx

� log

ðp2

1(x)dx

By defining Di(pi) ¼Ð

pi(x) log pi(x)dx� logÐ

p2i (x)dx ¼

E[ log pi]pi� log E[pi]pi

� 0 (because of Jensen’sinequality) we, finally, obtain

KL(p1jjp2)� C2(p1, p2) � log 2þ D1(p1)

Similarly

KL(p2jjp1)� C2(p1, p2) � log 2þ D2(p2)

Adding the last two inequalities, we obtain

SKL p1, p2

� �� C2 p1, p2

� ��

1

2D1 p1

� �þ D2 p2

� �� þ log 2

8.3 Proof of (11.x)

From (10), we have


lmrm xð Þ

From (5.1), the C2 divergence between the new query andthe ith database image is given by

C2 q0, i� �

¼ � log2Sq0i

Sq0q0 þ Sii

Moreover, from (5.2), we obtain

Sq0i ¼

ðq0 xð Þi xð Þdx

¼

ð1� Lð Þq xð Þ þ

XMm¼1

lmrm xð Þ

" #i xð Þdx

) Sq0i ¼ 1� Lð Þ

ðq xð Þi xð Þdx

þXMm¼1

lm

ðrm xð Þi xð Þdx

) Sq0i ¼ 1� Lð ÞSqi þXMm¼1

lmSrmi



Furthermore

Sq0q0 ¼

ðq0 xð Þq0 xð Þdx

¼

ð1� Lð Þq xð Þ þ

XMm¼1

lmrm xð Þ

" #

� 1� Lð Þq xð Þ þXMm0¼1

lm0rm0 xð Þ

" #dx

¼ 1� Lð Þ2

ðq2 xð Þdxþ 2 1� Lð Þ

�XMm¼1

lm

ðq xð Þrm xð Þdx

þXMm¼1

XMm0¼1

lmlm0

ðrm xð Þrm0 xð Þdx

) Sq0q0 ¼ 1� Lð Þ2Sqq þ 2 1� Lð Þ

XMm¼1

lmSqrm

þXMm¼1

XMm0¼1

lmlm0Srmrm0

Of course, Sii ¼Ð

i2 xð Þdx and it does not change from oneRF round to another.

8.4 Proof of (15)

From (10), we have


lmrm xð Þ

Given that

q xð Þ ¼XNq

j¼1

pq jf xjuq j

� �with uq j ¼ mq j , Sq j

� �

rm xð Þ ¼XNm

j¼1

pmjf xjumj

� �with umj ¼ mmj , Smj

� �

and

i xð Þ ¼XNi

j¼1

pijf xjuij

� �with uij ¼ mij , Sij

� �

the new query model is given by

q0 xð Þ ¼XNq0

j¼1

pq0jf xjuq0j

� �

¼XNq

j¼1

1� Lð Þpq jf xjuq j

� �þXN1

j¼1

l1p1jf xju1j

� �

þ . . .þXNM

j¼1

lMpMjf xjuMj

� �(32)

with Nq0 ¼ Nq þ N1 þ . . .þ NM .



IETdoi

www.ietdl.org

Moreover, from (4.x), we have

ALA q0jji� �

¼XNq0

j¼1

pq0j log pibq0 ijð Þ þ log f mq0j juibq0 i jð Þ

� �hn

�1

2trace S

�1ibq0 i jð ÞSq0j

� ��

with

bq0i jð Þ ¼ k, mq0j � mik

2

Sik

� log pik

, mq0j � mil

2

Sil

� log pil , 8l = k

Taking into account (32), the sets of pq0j , mq0j and Sq0j are

Cp ¼ 1�Lð Þpq1, . . . , 1�Lð ÞpqNq, l1p11,

n. . . , l1p1 N1

, . . . , lMpM1, . . . , lMpMNM

o

Cm ¼ mq1, . . . , mqNq, m11, . . . , m1 N1

, . . . , mM1, . . . , mMNM

n oCS ¼ Sq1, . . . , SqNq

, S11, . . . , S1 N1, . . . , SM1, . . . , SMNM

n o

respectively. Moreover, each correspondence functionvalue, bq0i jð Þ, depends only on mq0j [ Cm and on



pik, mik, Sik, k¼ 1, . . . , Ni. Thus, the set of bq0i jð Þ is

Cb ¼ bqi 1ð Þ, . . . , bqi Nq

� �, b1i 1ð Þ, . . . ,

nb1i N1

� �, . . . , bMi 1ð Þ, . . . , bMi NM

� � and, for the ALA q0jji

� �, it holds that

ALA(q0ki)¼XNq

j¼1

(1�L)pq j log pibqi ( j)

n

þ log f mq j juibq i( j)

� �h�

1

2trace S

�1ibqi( j)Sq j

� ��

þXN1

j¼1

l1p1j log pib1i( j)þ log f m1j juib1i( j)

� �hn

�1

2trace S

�1ib1i( j)S1j

� ��

þ . . .þXNM

j¼1

lMpMj log pibMi( j)

n

þ log f mMj juibMi( j)

� �h

�1

2trace S

�1ibMi jð ÞSMj

� ��

)ALA(q0ki)¼ (1�L)ALA(qki)

þXMm¼1

lmALA(rmki)

25



Documents

ISSN 1751-9659 Probabilistic relevance feedback approach ...arly/papers/IET09.pdf · This is known in the CBIR community as the ‘semantic gap’ problem [13]. Relevance feedback