1999 - Invariant Content-based Image Retrieval Using a Complete Set of Fourier-Mellin Descriptors


Invariant Content-based Image Retrieval Using a Complete Set of Fourier-Mellin Descriptors

Stéphane Derrode, Mohamed Daoudi
Groupe de Recherche Images et Formes
ENIC/INT - Cité scientifique - rue G. Marconi, 59658 Villeneuve d'Ascq Cedex, France
{derrode, daoudi}@enic.fr

Faouzi Ghorbel
Ecole Nationale des Sciences de l'Informatique
CERT/DER - 42, rue Asdrubal, 1002 Tunis, Tunisia
Faouzi.Ghorbel@cert.mincom.tn

Abstract

Image retrieval from a large database is an important and emerging research area. This retrieval requires the choice of a suitable set of image features, a method to extract them correctly, and a measure of the similarity between features that can be computed in real-time. This paper presents a complete set of Fourier-Mellin descriptors for object storage and retrieval. Our approach is translation, rotation and scale invariant. Several retrieval examples in a large image database are presented.

1. Introduction

Recent advances in computing and communication technology are taking the current information processing tools to their limits. The last years have seen an overwhelming accumulation of digital data such as images, video, and audio. The Internet is an excellent example of distributed databases containing several millions of images. Other cases of large image databases include satellite and medical imagery, where it is often difficult to describe or to annotate the image content.

Techniques dealing with traditional information systems have been adequate for many applications involving alphanumeric records: they can be ordered, indexed and searched for matching patterns in a straightforward manner. However, in many scientific database applications, the information content of images is not explicit, and it is not easily suitable for direct indexing, classification and retrieval. In particular, large-scale image databases emerge as the most challenging problem in the field of scientific databases. Visual Information Retrieval (VIR) systems are concerned with efficient storage and record retrieval. In general, a VIR system is useful only if it can retrieve acceptable matches in real-time. In addition to human-assigned keywords, VIR systems can use the visual content of the images as indexes, e.g. color, texture and shape features. Recently, several systems have combined heterogeneous attributes to improve discrimination and classification results: QBIC [1], Photobook [2], Virage [3], [4]. One point is to determine the appropriate weight given to each attribute. The features determine the type of queries that can be expressed, and VIR systems tend to be application specific because each application will have different retrieval needs. Retrieving acceptable matches in real time requires the choice of a suitable set of image features, a method for correctly extracting them, and a feature distance measure that can be computed in real time. Thus, this paper focuses on the use of global features for the retrieval of isolated gray-level objects present on a uniform background.

We propose that for a VIR system some properties may be useful to improve the representation:

- Invariance for a given set of geometric transformations,

- Stability under small shape distortions and numerical approximations,
- Simplicity and real-time computation.

We base our study on these properties together with another criterion called completeness. The latter ensures that two objects will have the same shape if and only if all their descriptors are equal [5]. In Section 2 we present a complete set of global gray-level image descriptors that satisfies all the criteria cited above. This set is extracted from the analytical Fourier-Mellin transform, and it is invariant to translation, rotation and scale changes of the images. From this solid theoretical background, we are able to define a real distance between invariant descriptors that can be computed in almost real-time. Section 3 introduces the application of the invariant distance in a retrieval scheme. Several classification and retrieval results from two real gray-level object databases are also discussed. Finally, we present the main conclusions and future work from the study.

0-7695-0253-9/99 $10.00 © 1999 IEEE

2. Gray-level shape invariant representation

In this section, we describe a method to compute a complete and invariant representation of gray-level images that is suitable for content-based retrieval from image databases. The representation is performed using the analytical Fourier-Mellin transform and it is invariant to image translation, rotation and scaling, i.e. plane similarity transformations. Based on the properties of the invariant set, we use a true mathematical distance between shapes as a similarity measure to compare two images in a database.

2.1. The analytical Fourier-Mellin transform

In the late 70's, the optical research community introduced the Fourier-Mellin Transform (FMT) for pattern recognition purposes [6]. In the meanwhile, the Mellin transform was studied for target identification in relation to signal translation and scaling [7]. A few years ago, it was pointed out that the crucial numerical difficulties in computing the Fourier-Mellin transform of an image might be solved by using the Analytical Fourier-Mellin Transform [8]. Let us recall the main bases.

Let f(r, θ) be the irradiance function representing a gray-level image defined over a compact set of R². The origin of the polar coordinates is located at the image centroid in order to offset translation. The analytical Fourier-Mellin transform (AFMT) of f is given by equation (1) [9]:

M_f(k, v) = \frac{1}{2\pi} \int_0^{+\infty} \int_0^{2\pi} f(r, \theta) \, r^{\sigma - iv} \, e^{-ik\theta} \, d\theta \, \frac{dr}{r}    (1)

for all k in Z, v in R, and σ > 0; f is assumed to be square summable under the measure dθ dr/r. The AFMT of an object f can be seen as the usual FMT of the distorted object f_σ(r, θ) = r^σ f(r, θ). The AFMT gives a complete description of gray-level objects since f can be retrieved by its inverse transform, given by:

f(r, \theta) = \frac{1}{2\pi} \sum_{k \in Z} \int_{-\infty}^{+\infty} M_f(k, v) \, r^{-\sigma + iv} \, e^{ik\theta} \, dv    (2)

for r ∈ R₊, θ ∈ S¹ and σ > 0.

Figure 1. (a) A 128x128 butterfly image and (b) its log-polar re-sampling (128x128). (c) Magnitude of the central Fourier-Mellin coefficients obtained from the fast AFMT algorithm (S_{10,10} = 221).

Since no discrete transform exists, three approximations of the AFMT have been designed: the direct, the Cartesian and the fast algorithms [9]. For image content-based retrieval applications, real-time is crucial, and thus the fast algorithm is used. With a change of variable in the integral (q = ln(r) instead of r), equation (1) can be rewritten in terms of Fourier transforms:

M_f(k, v) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_0^{2\pi} e^{\sigma q} f(e^q, \theta) \, e^{-i(vq + k\theta)} \, d\theta \, dq    (3)

A fast algorithm is obtained by computing a two-dimensional Fast Fourier Transform of the log-polar distorted object e^{σq} f(e^q, θ). The log-polar sampling is built from the points obtained by the intersection between N beams originating from the image centroid and M concentric circles with exponentially increasing radii. In our tests, we have chosen N = 128 and M = 128.

The AFMT of an object is theoretically infinite in extent. When dealing with computers, only a finite set of coefficients is available, so that part of the original image content is lost. However, the FMT goes to 0 as |v| and |k| go to infinity, so the information lost by numerical truncation can be made as weak as required. Let K and V be the boundaries of the finite-extent AFMT, so that M_f(k, v) is available for k ∈ [-K..K] and v ∈ [-V..V], with a sampling step over the v axis set to 1, and σ = 0.5. Due to the symmetry property of the Fourier-Mellin transform, the effective size of this representation is S_{K,V} = [(2K+1)(2V+1)+1]/2. Figure 1 illustrates the fast AFMT approximation algorithm on a real gray-level image. The computation time for such an algorithm is provided in the next section.
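As a sketch, the fast algorithm (log-polar re-sampling followed by a 2D FFT of the distorted object e^{σq} f(e^q, θ), as in Eq. (3)) can be written with numpy. The nearest-neighbour sampling, the choice of log-radius range, and the FFT normalisation below are our own simplifications, not the authors' exact implementation; the returned bins are FFT frequencies rather than an exact unit step over v.

```python
import numpy as np

def fast_afmt(img, sigma=0.5, N=128, M=128, K=10, V=10):
    """Fast AFMT approximation: log-polar re-sampling + 2D FFT.

    Returns the central (2V+1) x (2K+1) block of Fourier-Mellin
    coefficients (rows index v, columns index k); for a real image the
    block satisfies the conjugate symmetry M(-k, -v) = conj(M(k, v)).
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Origin of the polar coordinates: the image centroid (offsets translation).
    ys, xs = np.mgrid[0:h, 0:w]
    mass = img.sum()
    cy, cx = (ys * img).sum() / mass, (xs * img).sum() / mass
    r_max = min(cy, cx, h - 1 - cy, w - 1 - cx)
    # M concentric circles with exponentially increasing radii, N beams.
    q = np.linspace(np.log(r_max) - 5.0, np.log(r_max), M)  # q = ln(r)
    theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
    r = np.exp(q)[:, None]
    yy = np.clip((cy + r * np.sin(theta)[None, :]).round().astype(int), 0, h - 1)
    xx = np.clip((cx + r * np.cos(theta)[None, :]).round().astype(int), 0, w - 1)
    samples = img[yy, xx]                      # f(e^q, theta) on the log-polar grid
    g = np.exp(sigma * q)[:, None] * samples   # distorted object e^{sigma q} f(e^q, theta)
    spec = np.fft.fftshift(np.fft.fft2(g)) / (M * N)
    cv, ck = M // 2, N // 2                    # bin (cv, ck) holds (v, k) = (0, 0)
    return spec[cv - V:cv + V + 1, ck - K:ck + K + 1]
```

With K = V = 10 this yields a 21 x 21 block of coefficients, of which only S_{10,10} = 221 are effectively distinct because of the conjugate symmetry.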

Figure 2. Log-polar and Cartesian reconstructions of the butterfly in Fig. 1.



Similarly, the original gray-level image can be retrieved from its numerical AFMT by using a fast approximation of the inverse transform (Eq. (2)) together with a log-polar to Cartesian coordinates conversion. Figure 2 shows the reconstructed Cartesian image of Fig. 1 obtained by using K = 64 and V = 64. By increasing the values of K and V, the quality of the reconstructed image can be improved. Extensive experiments on AFMT approximation and reconstruction can be found in [9].

Finally, let us recall the transformation law of the analytical Fourier-Mellin transform for planar similarities. Let g be the orientation and size change of an object f by the angle β ∈ [0; 2π[ and the scale factor α ∈ R₊*, i.e. g(r, θ) = f(αr, θ + β). These two objects have the same shape and are called similar objects. One can easily show that the AFMTs of g and f are related by:

M_g(k, v) = \alpha^{-\sigma + iv} \, e^{ik\beta} \, M_f(k, v)    (4)

for all k in Z, v in R and σ > 0. Equation (4) is called the shift theorem and suggests that the AFMT is well suited for the computation of

global shape features which are invariant to the object position, orientation and size.

2.2. A complete set of Fourier-Mellin features

Since the usual Fourier-Mellin transforms of two similar objects only differ by a phase factor (Eq. (4) without the α term), a set of global invariant descriptors, regardless of the object position, orientation and size, is generally extracted by computing the modulus of some Fourier-Mellin coefficients [9]. A set like this is not complete, since the phase information is lost and it only represents a signature of the shape. Due to the lack of completeness, one can find distinct objects with identical descriptor values, and a classification process may mix up objects, which is critical for content-based retrieval from image databases (both false positive and true negative matches).

Recently, a complete family of similarity invariant descriptors based on the AFMT has been suggested [4]. This family can be easily written and applied to any strictly positive σ value as follows:

I_f(k, v) = M_f(k, v) \, \left[ M_f(0, 0) \right]^{\frac{iv}{\sigma} - 1} \, e^{-ik \arg M_f(1, 0)}    (5)
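This invariant family transcribes directly into numpy; the grid layout (rows index v ∈ [-V..V] with unit step, columns index k ∈ [-K..K]) and the function name are our assumptions:

```python
import numpy as np

def fm_invariants(M_grid, sigma=0.5):
    """Complete similarity invariants I(k, v) of Eq. (5).

    M_grid[v + V, k + K] holds the AFMT coefficient M_f(k, v).
    M_f(0,0) (assumed real and positive, as for a non-negative image)
    normalises scale; the phase of M_f(1,0) normalises rotation.
    """
    nV, nK = M_grid.shape
    V, K = (nV - 1) // 2, (nK - 1) // 2
    M00 = np.abs(M_grid[V, K])          # coefficient at (k, v) = (0, 0)
    arg10 = np.angle(M_grid[V, K + 1])  # phase at (k, v) = (1, 0)
    k = np.arange(-K, K + 1)[None, :]
    v = np.arange(-V, V + 1)[:, None]
    # M(0,0)^{iv/sigma - 1} = exp(i (v/sigma) ln M(0,0)) / M(0,0)
    scale_norm = np.exp(1j * (v / sigma) * np.log(M00)) / M00
    rot_norm = np.exp(-1j * k * arg10)
    return M_grid * scale_norm * rot_norm
```

A quick sanity check is to apply the shift theorem (4) to a coefficient grid and verify that the invariants do not change, which is exactly the compensation described in the text below Eq. (5).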

Each feature I(k, v) is constructed in order to compensate the α^{-σ+iv} e^{ikβ} term that appears in the shift theorem (4). The compensation is achieved via the two Fourier-Mellin coefficients M(0, 0) and M(1, 0), which are the normalization parameters. The set in Eq. (5) is complete since it is possible (i) to recover the FMT of an object from all of its invariant descriptors and the two normalization parameters by inverting Eq. (5), and (ii) to reconstruct the original gray-level image by the inverse AFMT (see Fig. 2). Figure 3 shows the magnitude of the central invariant descriptors obtained from the butterfly in Fig. 1. We obtain as many invariant descriptors as Fourier-Mellin coefficients (S_{10,10} = 221 invariants).

2.3. A true distance between shapes

For pattern recognition purposes, the classification of an unknown object into a set of reference patterns is achieved by several comparison methods: the direct comparison of a couple of features, neural networks, or statistical classifiers, by means of intra- and inter-class similarity measures.

Since the invariant set (5) is also convergent for square summable functions, it can be shown that the following function defines a true mathematical distance between shapes [5]:

d^2(f, g) = \sum_{k \in Z} \int_{-\infty}^{+\infty} \left| I_f(k, v) - I_g(k, v) \right|^2 \, dv    (6)

This distance is a Euclidean distance expressed in the invariant domain. Theoretically, it is zero if and only if the objects are identical up to a similarity transformation. Due to numerical sampling and approximation, we never obtain exactly zero, and the value of the distance is used to quantify the similarity between objects, regardless of their position, orientation and size in the image.
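Discretised with a unit step over v, the distance above reduces to a plain Euclidean norm on the two invariant grids; a minimal sketch (the function name is ours):

```python
import numpy as np

def shape_distance_sq(I_f, I_g):
    """Squared shape distance of Eq. (6): the Euclidean metric applied
    to the complete invariant descriptor grids of two objects."""
    diff = np.asarray(I_f) - np.asarray(I_g)
    return float(np.sum(np.abs(diff) ** 2))
```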


Figure 3. Magnitude of some central invariant descriptors extracted from the butterfly in Fig. 1.



3. Application to image archival and retrieval

In this section, we use the results obtained in the previous section for content-based image retrieval. Firstly, the two databases used for the experiments are presented. Then, the main algorithm for retrieval and classification is sketched. Finally, we show different retrieval results and examine the main properties of the invariant set.

3.1. The test database content

The first database includes a collection of 91 images of butterflies (Fig. 4). 69 images represent distinct butterflies and 22 images show translation, rotation and scale changes of one of the 69 images. This database has been built to test the similarity transformation invariance of our descriptor set according to the number of invariants used in the experiments. It must be pointed out that searching such a set is quite difficult, since the global aspect of butterflies shows prevailing shape characteristics and the main difference between butterflies essentially comes from the texture of the wings. Thus, a contour-based retrieval scheme might be confused.

The second model collection is the well-known Columbia database, which contains 1440 images of 20 different 3D objects: 72 images per object, taken at 5-degree steps in pose. In this collection, camera and object motion clearly violate the similarity transformation model underlying the image representation. This database is used to test the robustness and stability of our descriptor set to shape distortions.

3.2. Algorithm and computation time

According to Section 2, the retrieval of an unknown image from a set of P models can be split into the following two stages:

1. At archival time (off-line), each model f_i, i ∈ [1..P], is represented by its bi-dimensional matrix of invariant features {I_{f_i}(k, v)}, with k ∈ [-K..K] and v ∈ [-V..V], by using algorithms derived from Eqs. (3) and (5).

2. At query time (on-line), the classification of an unknown object g is achieved as follows:
   - computing its invariant features (if the query does not belong to the database),
   - estimating the distance between the input object g and the P models f_i by using Eq. (6),
   - sorting and selecting the models which give the smallest distance to g.
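The two stages above can be sketched as follows (the in-memory dict index and function names are our illustration; a real system would persist the invariants as indexes at archival time):

```python
import numpy as np

def archive(models):
    """Off-line stage: index each model by its grid of invariant
    features {I_f(k, v)} (e.g. computed from Eqs. (3) and (5))."""
    return {name: np.asarray(inv) for name, inv in models.items()}

def retrieve(db, I_q, top=12):
    """On-line stage: rank every model by the squared distance of
    Eq. (6) to the query invariants; return the `top` closest names."""
    d2 = {name: float(np.sum(np.abs(inv - I_q) ** 2)) for name, inv in db.items()}
    return sorted(d2, key=d2.get)[:top]
```

Note that adding a model only inserts one entry into the index; nothing already stored changes, which matches the remark below that no re-indexing or voting algorithm is needed.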

In this way, all images in the database can be compared with any query image and ranked by the value of d². A fixed number of top-ranked images can then be displayed to the user, enabling browsing through the database. Since invariants are used as indexes, it should be noted that a new model can be added to the database without modifying the models already stored. Furthermore, no voting algorithm for selecting the best model is required.

Concerning the experiments, we compute S_{15,15} = 481 invariant descriptors for each 128x128 gray-level image. The computation time for deriving the invariant representation is about 4 s on a 200 MHz PC (including file I/O, log-polar re-sampling, 2D FFT computation and invariant extraction). It only depends on the size of the image, not on its complexity. At query time, the search through the database is exhaustive and performed linearly. The distance estimation and sorting is almost in real-time.

3.3. The butterflies database

Figure 4 shows three sample sets of the top 12 ranked images which have been retrieved from the same query image with different numbers of invariants.

Figure 4. Results of database retrieval using the query image shown in the top-left corner. Images are ranked from left to right, top to bottom, by decreasing similarity (increasing distance measure). Left: S_{3,3} = 25. Middle: S_{7,7} = 113. Right: S_{15,15} = 481.

For the right query (S_{15,15} = 481), the method retrieves in priority all 4 images similar to the query image. Taking fewer than 481 descriptors for the computation of the distance can produce confusion between shapes. The mix-up can be seen in the middle (S_{7,7} = 113) and left (S_{3,3} = 25) retrievals, where the rank of the 4 similar butterflies increases as the

number of invariants decreases. Other experiments have shown that this problem occurs when the scale factor between the query and the target is large. The position of the other ranked images also changes according to S; however, most of them are present in the 3 retrievals and show a significant resemblance with the query.



3.4. The Columbia database

Figure 5 shows the top 12 ranked images retrieved from the same query image by increasingly larger sets of invariants (from left to right). Retrieval results become better as the dimension of the invariant feature set enlarges, since more and more images of the query object are retrieved.

Figure 5. Results of retrieval using the query shown in the top-left corner.

The top 12 ranked images of the right query (481 features) all represent images of the query object. The last row of matches is interesting since it presents the back side of the Vaseline bottle; this demonstrates the robustness of the method to small distortions (non-similarity transformations). The first non-similar object is ranked at the 18th position.

4. Conclusion

We have proposed the use of a complete set of gray-level invariant features, regardless of the object position, orientation and size, for the retrieval of an unknown object within a set of models. An efficient algorithm was presented for the automatic extraction of a large set of invariant features, and a true invariant distance was then tested for content-based retrieval on two databases. Experimental results have confirmed that classification results can be improved by increasing the dimension of the feature space (with a small additional computing time at query time). The feature set has shown high-quality numerical invariance and good retrieval results even when the camera and object motion clearly violate the similarity transformation model underlying the image representation.

Since our set works best for databases containing isolated objects on a uniform background, it seems well-suited for professional databases such as medical, biology, or telecommunication (MPEG-7) collections. When the object essentially presents local deformations, such as faces, gray-level local invariants should provide better retrieval results.

Our complete invariant set can be used to store and uniquely encode complex gray-level shapes, since we have shown in Section 2 that it is possible to reconstruct an object from its invariant descriptors. Future work will include improving the efficiency of indexing and searching by means of feature quantization and hierarchical search strategies, to avoid the exhaustive search through the database as proposed here. Multiple image queries can also be studied to refine retrievals by adding the distances obtained from each individual query.

5. References

[1] M. Flickner et al., Query by image and video content: the QBIC system, IEEE Computer, pp. 23-32, 1995.

[2] A. Pentland, R.W. Picard and S. Sclaroff, Photobook: tools for content-based manipulation of image databases, in Proc. of SPIE, Storage and Retrieval for Image and Video Databases II, San Jose, CA, pp. 34-47, 1994.

[3] J.R. Bach et al., Virage image search engine: an open framework for image management, in Proc. of SPIE, Storage and Retrieval for Image and Video Databases IV, San Jose, CA, pp. 76-87, 1994.

[4] S. Matusiak, M. Daoudi, T. Blu and O. Avaro, Sketch-based image database retrieval, in S. Jajodia, M. T. Ozsu and A. Dogac, editors, Proceedings of the Workshop on Multimedia Information Systems, volume 1508 of Lecture Notes in Computer Science, pages 185-191, Istanbul, Turkey, September 24-26.

[5] F. Ghorbel, A complete invariant description for grey-level images by the harmonic analysis approach, Pattern Recognition Letters 15, pp. 1043-1051, 1994.

[6] D. Casasent and D. Psaltis, Scale invariant optical transform, Opt. Eng. 15(3), pp. 258-261, 1976.

[7] P.E. Zwicke and Z. Kiss, A new implementation of the Mellin transform and its application to radar classification, IEEE Trans. Pattern Analysis Mach. Intell. 5(2), pp. 191-199, 1983.

[8] Y. Sheng and H.H. Arsenault, Experiments on pattern recognition using invariant Fourier-Mellin descriptors, J. Opt. Soc. Am. A 3(6), pp. 771-776, 1986.

[9] S. Derrode and F. Ghorbel, Robust and efficient Fourier-Mellin transform approximations for invariant gray-level image description and reconstruction, submitted to Pattern Recognition.
