
PicSOM – content-based image retrieval with self-organizing maps


Jorma Laaksonen, Markus Koskela, Sami Laakso, Erkki Oja

Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, Fin-02015 HUT, Espoo, Finland
E-mail addresses: jorma.laaksonen@hut.fi (J. Laaksonen), markus.koskela@hut.fi (M. Koskela), sami.laakso@hut.fi (S. Laakso), erkki.oja@hut.fi (E. Oja).

Abstract

We have developed a novel system for content-based image retrieval in large, unannotated databases. The system is called PicSOM, and it is based on tree structured self-organizing maps (TS-SOMs). Given a set of reference images, PicSOM is able to retrieve another set of images which are similar to the given ones. Each TS-SOM is formed with a different image feature representation like color, texture, or shape. A new technique introduced in PicSOM facilitates automatic combination of responses from multiple TS-SOMs and their hierarchical levels. This mechanism adapts to the user's preferences in selecting which images resemble each other. Thus, the mechanism implements a relevance feedback technique on content-based image retrieval. The image queries are performed through the World Wide Web and the queries are iteratively refined as the system exposes more images to the user. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Content-based image retrieval; Image databases; Neural networks; Self-organizing map

1. Introduction

Content-based image retrieval (CBIR) from unannotated image databases has been an object of ongoing research for a long period (Rui et al., 1999). Digital image and video libraries are becoming more widely used as more visual information is produced at a rapidly growing rate. The technologies needed for retrieving and browsing this growing amount of information are still, however, quite immature and limited.

Many projects have been started in recent years to research and develop efficient systems for content-based image retrieval. The best-known CBIR system is probably Query by Image Content (QBIC) (Flickner et al., 1995), developed at the IBM Almaden Research Center. Other notable systems include MIT's Photobook (Pentland et al., 1994) and its more recent version, FourEyes (Minka, 1996), the search engine family of VisualSEEk, WebSEEk, and MetaSEEk (Smith and Chang, 1996; Beigi et al., 1998), which are all developed at Columbia University, and Virage (Bach et al., 1996), a commercial content-based search engine developed at Virage Technologies.

We have implemented an image-retrieval system that uses a World Wide Web browser as the user interface and the tree structured self-organizing map (TS-SOM) (Koikkalainen and Oja, 1990; Koikkalainen, 1994) as the image similarity scoring method. The retrieval method is based on the relevance feedback approach (Salton and McGill, 1983) adopted from traditional, text-based information retrieval. In relevance feedback, the previous human–computer interaction is used to refine subsequent queries to better approximate the need of the user.

As far as the current authors are aware, there have until now been no notable image retrieval applications based on the self-organizing map (SOM) (Kohonen, 1997).

2. Basic operation of PicSOM

PicSOM is intended as a general framework for multi-purpose content-based image retrieval. The system is designed to be open and able to adapt to different kinds of image databases, ranging from small and domain-specific picture sets to large, general-purpose image collections. The features may be chosen separately for each specific task.

From a user's point of view, the basic operation of the PicSOM image retrieval is as follows. (1) The user connects to the World Wide Web server providing the search engine with her web browser. (2) The system presents a list of databases and feature extraction methods available to that particular user. (3) After the user has made the selections, the system presents an initial set of tentative images scaled to a small "thumbnail" size. The user then indicates the subset of these images which best matches her expectations and, to some degree of relevance, fits her purposes. Then, she hits the "Continue Query" button which sends the information on the selected images back to the search engine. (4) The system marks the images selected by the user with a positive value and the non-selected images with a negative value in its internal data structure. Based on this data, the system then presents the user a new set of images together with the images selected thus far. (5) The user again selects the relevant images and the iteration continues. Hopefully, the fraction of relevant images increases in each image set presented to the user and, finally, one of them is exactly what she was originally looking for.

2.1. Feature extraction

PicSOM may use one or several types of statistical features for image querying. Separate feature vectors can be formed for describing the color, texture, shape, and structure of the images. A separate TS-SOM (see Section 2.2) is then constructed for each feature. These maps are used in parallel to find from the database those images which are most similar to the images selected by the user. The feature selection is not restricted in any way, and new features can be included as long as the feature vectors are of fixed dimensionality and the Euclidean metric can be used to measure distances between them.

In the first stage, five different feature extraction methods were applied to the images, and the corresponding TS-SOMs were created. The feature types used in this study included two different color and shape features and a simple texture feature. All except one feature type were calculated in five separate zones of the image. The zones are formed by first extracting from the center of the image a circular zone whose size is approximately one-fifth of the area of the image. Then the remaining area is divided into four zones with two diagonal lines.
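As an illustration, the zone division might be implemented along the following lines. This is a minimal NumPy sketch under our own assumptions (exact circle radius and tie-breaking for pixels lying on the diagonals), not the authors' code:

```python
import numpy as np

def zone_masks(h, w):
    """Boolean masks for the five zones: a central circular zone of
    roughly one-fifth of the image area, plus the four outer zones
    cut by the two image diagonals."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Radius chosen so the circle covers ~1/5 of the image area.
    r = np.sqrt(h * w / (5.0 * np.pi))
    center = (ys - cy) ** 2 + (xs - cx) ** 2 <= r ** 2
    # The two diagonals split the remaining area into four zones.
    above_main = (ys / h) < (xs / w)        # above the main diagonal
    above_anti = (ys / h) < 1 - (xs / w)    # above the anti-diagonal
    top    = ~center &  above_main &  above_anti
    right  = ~center &  above_main & ~above_anti
    bottom = ~center & ~above_main & ~above_anti
    left   = ~center & ~above_main &  above_anti
    return [center, top, right, bottom, left]
```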

Average color (cavg in Tables 1 and 2) is obtained by calculating average R-, G- and B-values in the five zones of the image. The resulting 3 × 5 = 15-dimensional feature vector thus describes the average color of the image and gives rough information on the spatial color composition.

Color moments (cmom) were introduced by Stricker and Orengo (1995). The color moment features are computed by treating the color values in different color channels in each zone as separate probability distributions and then calculating the first three moments (mean, variance, and skewness) of each color channel. This results in a 3 × 3 × 5 = 45-dimensional feature vector. Due to the varying dynamic ranges, the feature values are normalized to zero mean and unit variance.
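Both color features could be sketched as below, reusing the zone masks from above. The raw third central moment stands in for skewness here, and the database-wide normalization of cmom is omitted, so the details are our assumptions rather than the published implementation:

```python
import numpy as np

def color_features(rgb, masks):
    """Per-zone average color (15-dim) and color moments (45-dim) for
    an RGB image of shape (h, w, 3) with float values."""
    cavg, cmom = [], []
    for m in masks:                      # five zones
        for c in range(3):               # R, G, B channels
            vals = rgb[..., c][m]
            mean = vals.mean()
            var = vals.var()
            skew = ((vals - mean) ** 3).mean()   # raw third moment
            cavg.append(mean)
            cmom.extend([mean, var, skew])
    # Normalization over the whole database is omitted in this sketch.
    return np.array(cavg), np.array(cmom)
```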

Texture neighborhood (texture) feature in PicSOM is also calculated in the same five zones. The Y-values of the YIQ color representation of every pixel's 8-neighborhood are examined, and the estimated probabilities for each neighbor being brighter than the center pixel are used as features. When combined, this results in an 8 × 5 = 40-dimensional feature vector.
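A sketch of this texture feature under the same caveats (border pixels are simply skipped here, which the paper does not specify):

```python
import numpy as np

def texture_feature(y, masks):
    """8-neighborhood brightness comparison on the luminance channel:
    for each zone, the estimated probability that each of the eight
    neighbors is brighter than the center pixel (8 x 5 = 40 dims)."""
    h, w = y.shape
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    inner = np.zeros((h, w), dtype=bool)
    inner[1:-1, 1:-1] = True             # skip border pixels
    feats = []
    for m in masks:
        valid = m & inner
        for dy, dx in shifts:
            # Value of the neighbor at offset (dy, dx) of each pixel.
            neighbor = np.roll(np.roll(y, -dy, axis=0), -dx, axis=1)
            feats.append((neighbor[valid] > y[valid]).mean())
    return np.array(feats)
```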

Shape histogram (shist) feature is based on the histogram of the eight quantized directions of edges in the image. When the histogram is separately formed in the same five zones as before, an 8 × 5 = 40-dimensional feature vector is obtained. It describes the distribution of edge directions in various parts of the image and thus reveals the shape in a low-level statistical manner (Brandt et al., 2000).

Shape FFT (sFFT) feature is based on the Fourier transform of the binarized edge image. The image is normalized to 512 × 512 pixels before the FFT. No zone division is made. The magnitude image of the spectrum is low-pass filtered and decimated by a factor of 32, resulting in a 128-dimensional feature vector (Brandt et al., 2000).
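The edge-direction histogram might look roughly as follows. The central-difference gradients, the 180-degree direction quantization, and the magnitude threshold are all our guesses; Brandt et al. (2000) define the actual features:

```python
import numpy as np

def shape_histogram(y, masks, n_dirs=8):
    """Histogram of eight quantized edge directions per zone
    (8 x 5 = 40 dims); gradients stand in for whatever edge
    detector the original system used."""
    gy, gx = np.gradient(y)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi          # direction modulo 180 deg
    bins = np.minimum((ang / np.pi * n_dirs).astype(int), n_dirs - 1)
    feats = []
    for m in masks:
        sel = m & (mag > mag.mean())          # crude edge threshold
        hist = np.bincount(bins[sel], minlength=n_dirs).astype(float)
        feats.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(feats)
```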

2.2. Tree structured SOM (TS-SOM)

The five separate feature vectors obtained from each image in the database have to be indexed to facilitate similarity-based queries. For the indexing, similarity-preserving clustering using SOMs (Kohonen, 1997) is used. Due to the high dimensionality of the feature vectors and the large size of the image database, computational complexity is a major issue.

The TS-SOM (Koikkalainen and Oja, 1990; Koikkalainen, 1994) is a tree-structured vector quantization algorithm that uses SOMs at each of its hierarchical levels. In PicSOM, all TS-SOM maps are two-dimensional. The number of map units increases when moving downwards in the TS-SOM. The search space on the underlying SOM level is restricted to a predefined portion just below the best-matching unit on the above SOM. Therefore, the complexity of the searches in the TS-SOM is lower than if the whole bottommost SOM level had been accessed without the tree structure.

The computational lightness of the TS-SOM facilitates the creation and use of huge SOMs which, in our PicSOM system, are used to hold the images stored in the image database. The TS-SOMs for all features are sized 4 × 4, 16 × 16, 64 × 64, and 256 × 256, from top to bottom. The feature vectors calculated from the images are used to train the levels of the TS-SOMs beginning from the top level. During the training, each feature vector is presented to the map multiple, say 100, times. In the process, the model vectors stored in the map units are modified to match the distribution and topological ordering of the feature vector space. After the training phase, each unit of the TS-SOMs contains a model vector which may be regarded as the average of all feature vectors mapped to that particular unit. In PicSOM, we then search in the corresponding dataset for the feature vector which best matches the stored model vector and associate the corresponding image to that map unit. Consequently, a tree-structured, hierarchical representation of all the images in the database is formed. In an ideal situation, there should be a one-to-one correspondence between the images and the TS-SOM units at the bottom level of each map.
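The restricted top-down search could be sketched as follows. The one-unit widening around the children of the best-matching unit is our assumption, as the paper only says the search is restricted to a predefined portion below the best-matching unit of the level above:

```python
import numpy as np

def tssom_bmu(levels, x, radius=1):
    """Descend a TS-SOM, searching each level only in a small window
    below the best-matching unit (BMU) of the level above.

    `levels` is a list of model-vector arrays of shape (side, side, d)
    ordered from top to bottom, e.g. sides 4, 16, 64 and 256."""
    by, bx, prev_side = 0, 0, None
    for codebook in levels:
        side = codebook.shape[0]
        if prev_side is None:
            y0, y1, x0, x1 = 0, side, 0, side   # search the whole top level
        else:
            s = side // prev_side               # branching factor per axis
            # Children of unit (by, bx) occupy an s-by-s block; widen it
            # by `radius` units on each side.
            y0, x0 = max(s * by - radius, 0), max(s * bx - radius, 0)
            y1 = min(s * (by + 1) + radius, side)
            x1 = min(s * (bx + 1) + radius, side)
        window = codebook[y0:y1, x0:x1]
        d2 = ((window - x) ** 2).sum(axis=-1)   # squared Euclidean distances
        iy, ix = np.unravel_index(np.argmin(d2), d2.shape)
        by, bx, prev_side = y0 + iy, x0 + ix, side
    return by, bx
```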

2.3. Using multiple TS-SOMs

In an image-based query, the five feature vectors computed from the images presented to the user are passed to the respective TS-SOMs, and the best-matching units are found at each level of the maps. Combining the results from several maps can be done in a number of ways. A simple method would be to ask the user to enter weights for different maps and then calculate a weighted average. This, however, requires the user to give information which she normally does not have. Generally, it is a difficult task to give low-level features such weights which would coincide with a human's perception of images at a more conceptual level. Therefore, a better solution is to use the relevance feedback approach, in which the results of multiple maps are combined automatically by using the implicit information from the user's responses during the query session. The PicSOM system thus tries to learn the user's preferences over the five features from the interaction with her and then sets its own responses accordingly.

The rationale behind our approach is as follows: if the images selected by the user are located close to each other on one of the five TS-SOM maps, it seems that the corresponding feature performs well on the present query, and the relative weight of its opinion should be increased. This can be implemented simply by marking the shown images on the maps with positive and negative values depending on whether the user has selected or rejected them, respectively. The positive and negative responses are then normalized so that their sum equals zero.

The combined effect of positively marked units residing near each other can then be enhanced by convolving the maps with a simple, low-pass filtering mask. As a result, those areas which have many positively marked images close together spread the positive response to their neighboring map units. The images associated with the units having the largest positive values after the convolution are then good candidates for the next images to be shown to the user.
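Put together, the feedback marking, normalization, and convolution might look like this; the 3 × 3 uniform mask is our stand-in for the unspecified low-pass filter:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def response_surface(side, positives, negatives):
    """Turn user feedback into a low-pass filtered response map for
    one SOM level. `positives`/`negatives` are lists of (row, col)
    BMU coordinates of the accepted / rejected images."""
    surface = np.zeros((side, side))
    for y, x in positives:
        surface[y, x] += 1.0
    for y, x in negatives:
        surface[y, x] -= 1.0
    # Normalize so that the positive and negative responses sum to zero.
    pos, neg = surface[surface > 0].sum(), -surface[surface < 0].sum()
    if pos > 0:
        surface[surface > 0] /= pos
    if neg > 0:
        surface[surface < 0] /= neg
    # A simple low-pass mask spreads responses to neighboring units.
    return uniform_filter(surface, size=3, mode="constant")
```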

The conversion from the positively and negatively marked images to the convolutions in a three-level TS-SOM is visualized in Fig. 1. The sizes of the SOM levels are 4 × 4, 16 × 16, and 64 × 64, from top to bottom. First, SOMs displaying the positive map units as white and negative as black are shown on the left. These maps are then low-pass filtered, and the resulting map surfaces are shown on the right. It is seen that a cluster of positive images resides at the lower edge of the map.

2.4. Refining queries

A typical retrieval session with PicSOM consists of a number of subsequent queries during which the retrieval is focused more accurately on images resembling those shown images which the user has accepted. In the current implementation, all positive values on all convolved TS-SOM levels are sorted in descending order in one list. Then, a preset number, e.g., 16, of the best candidate images which have not been shown to the user before are output as the new image selection.
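This candidate selection step might be sketched as follows; the data layout, in particular the hypothetical unit_to_image mapping, is ours:

```python
import numpy as np

def next_images(levels, shown, count=16):
    """Collect candidates from all convolved levels of all maps.
    `levels` is a list of (surface, unit_to_image) pairs, where
    `unit_to_image` maps a (row, col) unit to its image index;
    `shown` is the set of image indices already presented."""
    scored = []
    for surface, unit_to_image in levels:
        for (y, x), score in np.ndenumerate(surface):
            img = unit_to_image.get((y, x))
            if img is not None and score > 0 and img not in shown:
                scored.append((score, img))
    scored.sort(reverse=True)             # descending by convolved score
    return [img for _, img in scored[:count]]
```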

Initially, the query begins with a set of reference images picked from the top levels of the TS-SOMs. In the early stages of the image query, the system tends to present the user images from the upper TS-SOM levels. As soon as the convolutions begin to produce large positive values also on lower map levels, the images from these levels are shown to the user. The images are, therefore, gradually picked more and more from the lower map levels as the query continues.

The inherent property of PicSOM to use more than one reference image as the input information for retrievals is important and makes PicSOM differ from other content-based image retrieval systems, such as QBIC (Flickner et al., 1995), which use only one reference image at a time.

3. Implementation

We have wanted to make our PicSOM search engine available to the public by implementing it on the World Wide Web. This also makes the queries on the databases machine independent, because standard web browsers can be used. A demonstration of the system and further information on it are available at http://www.cis.hut.fi/picsom.

Fig. 1. An example of converting the positive and negative map units to convolved maps in a three-level TS-SOM. Map surfaces displaying the positive (white) and negative (black) map units are shown on the left. The resulting convolved maps are shown on the right.

The PicSOM user interface in the midst of an ongoing query is displayed in Fig. 2. First, the three parallel TS-SOM map structures represent three map levels of SOMs trained with RGB color, texture, and shape features, from left to right. The sizes of the SOM levels are 4 × 4, 16 × 16, and 64 × 64, from top to bottom. On color terminals, positive map points are seen as red and negative as blue. More saturated shades represent stronger responses, and white points are zero-valued.

Below the convolved SOMs, the first set of 16 images consists of images selected by the user on the previous rounds of the retrieval process. These images may then be unselected on any subsequent round. In this example, images representing buildings have been selected as positive. The next images, separated by a horizontal line, are the 16 best-scoring new images in this round obtained from the convolved units in the TS-SOMs. The relevant images are marked by checking the appropriate checkboxes. Finally, the page has some user-modifiable settings and a "Continue Query" button which submits the new selections back to the search engine.

The user can at any time switch from the iterative queries to examining the TS-SOM surfaces simply by clicking on the map images. A portion of the map around the chosen viewpoint is then shown before the other images. This allows direct browsing in the image space. Relevant images on the map surface can then also be selected for continuing queries.

4. Databases

Currently, we have made our experiments with two image databases of different sizes. The first database contains 4350 miscellaneous images downloaded from the image collection residing at ftp://ftp.sunet.se/pub/pictures/. Most of the images are color photographs in JPEG format. Figs. 1 and 2 were produced using this database.

The second image database we have used is the image collection from the Corel Gallery 1 000 000 product (Corel, 1999). It contains 59 995 photographs and artificial images with a very wide variety of subjects. When this collection is mapped on a 256 × 256 SOM, each map unit approximately corresponds to one image. All the images are either of size 256 × 384 or 384 × 256 pixels. The majority of the images are in color, but there is also a small number of grayscale images. The experiments described in the next sections were performed using this dataset.

5. Performance measures

A number of measures for evaluating the ability of various visual features to reveal image similarity are presented in this section. Assume a database $D$ containing a total of $N$ images and an image class $C \subset D$ with $N_C$ images. Such a class can contain, for example, images of aircraft, buildings, nature, human faces, and so on. The process of gathering a class is naturally arbitrary, as there are no distinct and objective boundaries between classes. If the images in the database already have reliable textual information about the contents of the images, it can be used directly; otherwise, manual classification is needed. Then, the a priori probability $q_C$ of the class $C$ is $q_C = N_C/N$. An ideal performance measure should be independent of the a priori probability and the type of images in the image class.

Fig. 2. The web-based user interface of PicSOM.


5.1. Observed probability

For each image $I \in C$ with a feature vector $\mathbf{f}^I$, we calculate the Euclidean distance $d_{L_2}(I, J)$ of $\mathbf{f}^I$ and the feature vectors $\mathbf{f}^J$ of the other images $J \in D \setminus \{I\}$ in the database. Then, we sort the images based on their ascending distance from the image $I$ and store the indices of the images in an $(N-1)$-sized vector $\mathbf{g}^I$. By $g^I_i$, we denote the $i$th component of $\mathbf{g}^I$.

Next, for all images $I \in C$, we define a vector $\mathbf{h}^I$ as follows:

$$\forall i \in \{0, \ldots, N-2\}: \quad h^I_i = \begin{cases} 1, & \text{if } g^I_i \in C, \\ 0, & \text{otherwise.} \end{cases} \qquad (1)$$

The vector $\mathbf{h}^I$ thus has value one at location $i$ if the corresponding image belongs to the class $C$. As $C$ has $N_C$ images, of which one is the image $I$ itself, each vector $\mathbf{h}^I$ contains exactly $N_C - 1$ ones. In the optimal case, the closest $N_C - 1$ images to image $I$ are also in the same class $C$; thus $h^I_i = 1$ for $i = 0, \ldots, N_C - 2$, and 0 otherwise.

We can now define the observed probability $p_i$:

$$\forall i \in \{0, \ldots, N-2\}: \quad p_i = \frac{1}{N_C} \sum_{K \in C} h^K_i. \qquad (2)$$

The observed probability $p_i$ is a measure of the probability that an image in $C$ has, as the $i$th nearest image, according to the feature extraction $f$, another image belonging to the same class.
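A brute-force computation of $p_i$ directly from these definitions might read as follows (a sketch using exhaustive pairwise distances):

```python
import numpy as np

def observed_probability(features, in_class):
    """Estimate p_i of Eq. (2). `features` is an (N, d) array and
    `in_class` an (N,) boolean mask for the class C."""
    N = len(features)
    idx = np.where(in_class)[0]
    p = np.zeros(N - 1)
    for i in idx:
        d = np.linalg.norm(features - features[i], axis=1)
        order = np.argsort(d)             # ascending distance from image i
        order = order[order != i]         # drop the image itself
        p += in_class[order]              # the vector h^I of Eq. (1)
    return p / len(idx)                   # divide by N_C
```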

5.2. Forming scalars from the observed probability

The observed probability $p_i$ is a function of the index $i$, so it cannot easily be used to compare two different feature extractions. Therefore, it is necessary to derive scalar measures from $p_i$ to enable us to perform such comparisons.

We chose to use three figures of merit to describe the performance of individual feature types. First, a local measure was calculated as the average of the observed probability $p_i$ for the first 50 retrieved images, i.e.:

$$\eta_{\text{local}} = \frac{\sum_{i=0}^{49} p_i}{50}. \qquad (3)$$

The $\eta_{\text{local}}$ measure obtains values between zero and one.

For a global figure of merit we used the weighted sum of the observed probability $p_i$ calculated as:

$$\eta_{\text{global}} = \mathrm{Re}\left\{ \frac{\sum_{i=0}^{N-2} p_i \, e^{j\pi i/(N-1)}}{N_C - 1} \right\}. \qquad (4)$$

Also $\eta_{\text{global}}$ attains values between zero and one. It favors observed probabilities that are concentrated in small indices and punishes large probabilities at large index values.

The third figure of merit, $\eta_{\text{half}}$, measures the total fraction of images in $C$ found when the first half of the $p_i$ sequence is considered:

$$\eta_{\text{half}} = \frac{\sum_{i=0}^{N/2} p_i}{N_C - 1}. \qquad (5)$$

$\eta_{\text{half}}$ obviously yields a value of one in the optimal case and a value of one half with the a priori distribution of images.

For all three figures of merit, $\eta_{\text{local}}$, $\eta_{\text{global}}$, and $\eta_{\text{half}}$, the larger the value, the better the discrimination ability of the feature extraction.
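From the $p_i$ sequence, the three scalars of Eqs. (3)–(5) follow directly; a minimal sketch:

```python
import numpy as np

def figures_of_merit(p, n_c):
    """eta_local, eta_global and eta_half of Eqs. (3)-(5) from the
    observed-probability sequence p (length N-1, assumed >= 50)
    and the class size n_c."""
    N = len(p) + 1
    eta_local = p[:50].sum() / 50.0
    i = np.arange(N - 1)
    # Real part of the phase-weighted sum; cos(pi*i/(N-1)) rewards
    # probability mass at small indices and penalizes it at large ones.
    eta_global = np.real(np.sum(p * np.exp(1j * np.pi * i / (N - 1)))
                         / (n_c - 1))
    eta_half = p[: N // 2 + 1].sum() / (n_c - 1)
    return eta_local, eta_global, eta_half
```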

5.3. τ measure

We have applied another quantitative figure, denoted as the τ measure, which describes the performance of the whole CBIR system instead of a single feature type. It is based on measuring the average number of images the system retrieves before the correct one is found. The τ measure resembles the "target testing" method presented in (Cox et al., 1996), but instead of using human test users, the τ measure is fully automatic.

For obtaining the τ measure we use the same subset $C$ of images as before for the single features. We have implemented an "ideal screener", a computer program which simulates a human user by examining the output of the retrieval system and marking the retrieved images as either relevant or non-relevant according to whether the images belong to $C$. The iterative query processing can thus be simulated and performance data collected without any human intervention.


For each of the images in the class $C$, we then record the total number of images presented by the system until that particular image is shown. From this data, we calculate the average number of shown images before the hit. After division by $N$, this figure yields a value

$$\tau \in \left[\frac{q_C}{2},\; 1 - \frac{q_C}{2}\right]. \qquad (6)$$

For values $\tau < 0.5$, the performance of the system is thus better than random picking of images and, in general, the smaller the $\tau$ value, the better the performance.
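The automated evaluation loop can be outlined as below; `retrieve` is a hypothetical stand-in for a whole PicSOM query round, and the exact bookkeeping of "images shown before the hit" is our interpretation of the text:

```python
import numpy as np

def tau_measure(retrieve, in_class, batch=16):
    """Simulated 'ideal screener' for the tau measure of Eq. (6).

    `retrieve(positives, negatives, batch)` stands for the CBIR system:
    it must always return `batch` images not shown before. `in_class`
    is a boolean array over the N database images marking class C."""
    N = len(in_class)
    pos, neg = [], []
    shown, shown_before_hit = 0, []
    remaining = int(np.sum(in_class))
    while remaining > 0:
        for img in retrieve(pos, neg, batch):
            shown += 1
            if in_class[img]:                      # relevant: a hit
                shown_before_hit.append(shown - 1)
                pos.append(img)
                remaining -= 1
            else:                                  # non-relevant
                neg.append(img)
    # Average number of images shown before each hit, divided by N.
    return np.mean(shown_before_hit) / N
```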

6. Results

In order to evaluate the performance of the single features and the whole PicSOM system with different types of images, three separate image classes were picked manually from the 59 995-image Corel database. The selected classes were planes, faces, and cars, of which the database contains 292, 1115, and 864 images, respectively. The corresponding a priori probabilities are 0.5%, 1.9%, and 1.4%. In the retrieval experiments, these classes were thus not competing against each other but mainly against the "background" of the 57 724 (i.e., 96.2%) other images.

Table 1 shows the results of forming the three scalar measures, $\eta_{\text{local}}$, $\eta_{\text{global}}$, and $\eta_{\text{half}}$, from the measured observed probabilities. It can be seen that the $\eta_{\text{local}}$ measure is always larger than the corresponding a priori probability. Also, the shape features shist and sFFT seem to outperform the other feature types for every image class and every performance measure. Otherwise, it is not yet clear which one of the three performance measures would be the most suitable as a single descriptor.

The results of the experiments with the whole PicSOM system are shown in Table 2. First, each feature was used alone and then different combinations of them were tested. The two shape features again yield better results than the color and texture features, which can be seen from the first five rows in Table 2.

By examining the results with all the tested classes, it can be seen that the general trend is that using a larger set of features yields better results than using a smaller set. Most notably, using all features gives better results than using any one feature alone.

The implicit weighting of the relative importance of the different features models the semantic similarity of the images selected by the user. This kind of automatic adaptation is desirable, as it is generally not known which feature combination would perform best for a certain image query. To alleviate the situation, the PicSOM approach provides a robust method for using a set of different features, and image maps formed thereof, in parallel so that the result exceeds the performances of all the single features.

However, it also seems that if one feature vector type has clearly worse retrieval performance than the others, it may be more beneficial to exclude that particular TS-SOM from the retrieval process. Therefore, it is necessary for the proper operation of the PicSOM system that the used features are well balanced, i.e., they should, on average, perform quite similarly by themselves.

Table 1
Comparison of the performances of different feature extraction methods for different image classes. Each entry gives three performance figures (η_local/η_global/η_half).

Features   Planes           Faces            Cars
cavg       0.06/0.16/0.59   0.05/0.10/0.56   0.03/0.21/0.63
cmom       0.06/0.16/0.59   0.05/0.10/0.56   0.04/0.21/0.63
texture    0.04/0.04/0.52   0.06/0.16/0.57   0.07/0.22/0.63
shist      0.11/0.62/0.84   0.10/0.54/0.82   0.13/0.34/0.68
sFFT       0.04/0.49/0.78   0.07/0.39/0.72   0.10/0.30/0.65


7. Conclusions and future plans

We have in this paper introduced the PicSOM approach to content-based image retrieval and methods for the quantitative evaluation of its performance. As a single visual feature cannot classify images into semantic classes, we need to gather the information provided by multiple features to achieve good retrieval performance. The results of our experiments show that the PicSOM system is able to effectively select from a set of parallel TS-SOMs a combination which coincides with the user's conceptual view of image similarity.

One obvious direction for increasing PicSOM's retrieval performance is to carry out an extensive study of different feature representations to find a set of well-balanced features which, on average, perform as well as possible.

As a vast collection of unclassified images is available on the Internet, we have also made preparations for using PicSOM as an image search engine for the World Wide Web.

Acknowledgements

This work was supported by the Finnish Centre of Excellence Programme (2000–2005) of the Academy of Finland, project New information processing principles, 44886.

References

Bach, J.R., Fuller, C., Gupta, A., et al., 1996. The Virage image search engine: an open framework for image management. In: Sethi, I.K., Jain, R.J. (Eds.), Storage and Retrieval for Image and Video Databases IV. SPIE Proceedings Series, Vol. 2670. San Jose, CA, USA.

Beigi, M., Benitez, A., Chang, S.-F., 1998. MetaSEEk: a content-based metasearch engine for images. In: Storage and Retrieval for Image and Video Databases VI (SPIE). SPIE Proceedings Series, Vol. 3312. San Jose, CA, USA.

Brandt, S., Laaksonen, J., Oja, E., 2000. Statistical shape features in content-based image retrieval. In: Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, Vol. 2, pp. 1066–1069.

Corel, 1999. The Corel Corporation world wide web home page, http://www.corel.com.

Cox, I.J., Miller, M.L., Omohundro, S.M., Yianilos, P.N., 1996. Target testing and the PicHunter Bayesian multimedia retrieval system. In: Advanced Digital Libraries ADL'96 Forum. Washington, DC.

Flickner, M., Sawhney, H., Niblack, W., et al., 1995. Query by image and video content: the QBIC system. IEEE Computer, September, 23–31.

Kohonen, T., 1997. Self-Organizing Maps, second extended ed. Springer Series in Information Sciences, Vol. 30. Springer, Berlin.

Koikkalainen, P., 1994. Progress with the tree-structured self-organizing map. In: Cohn, A.G. (Ed.), 11th European Conference on Artificial Intelligence. European Committee for Artificial Intelligence (ECCAI). Wiley, New York.

Koikkalainen, P., Oja, E., 1990. Self-organizing hierarchical feature maps. In: Proceedings of the 1990 International Joint Conference on Neural Networks, Vol. II. IEEE, INNS, San Diego, CA.

Table 2
The resulting τ values in the experiments. Each row marks with × the features used in combination.

cavg   cmom   texture   shist   sFFT      Planes   Faces   Cars
×                                         0.30     0.35    0.39
       ×                                  0.31     0.43    0.34
              ×                           0.26     0.26    0.34
                        ×                 0.16     0.22    0.18
                                ×         0.19     0.22    0.18
×             ×         ×                 0.16     0.21    0.18
×             ×                 ×         0.17     0.23    0.18
×             ×         ×       ×         0.14     0.21    0.16
       ×      ×         ×                 0.15     0.21    0.18
       ×      ×                 ×         0.18     0.22    0.19
       ×      ×         ×       ×         0.14     0.20    0.16
×      ×      ×         ×       ×         0.14     0.20    0.16


Minka, T.P., 1996. An image database browser that learns from user interaction. Master's thesis, M.I.T., Cambridge, MA.

Pentland, A., Picard, R.W., Sclaroff, S., 1994. Photobook: tools for content-based manipulation of image databases. In: Storage and Retrieval for Image and Video Databases II. SPIE Proceedings Series, Vol. 2185. San Jose, CA, USA.

Rui, Y., Huang, T.S., Chang, S.-F., 1999. Image retrieval: current techniques, promising directions, and open issues. Journal of Visual Communication and Image Representation 10 (1), 39–62.

Salton, G., McGill, M.J., 1983. Introduction to Modern Information Retrieval. Computer Science Series. McGraw-Hill, New York.

Smith, J.R., Chang, S.-F., 1996. VisualSEEk: a fully automated content-based image query system. In: Proceedings of the ACM Multimedia 1996. Boston, MA.

Stricker, M., Orengo, M., 1995. Similarity of color images. In: Storage and Retrieval for Image and Video Databases III (SPIE). SPIE Proceedings Series, Vol. 2420. San Jose, CA, USA.
