
Region-Based Image Retrieval with Perceptual Colors

Ying Liu1, Dengsheng Zhang1, Guojun Lu1, and Wei-Ying Ma2

1 Gippsland School of Computing and Information Technology, Monash University, Vic, 3842, Australia,

{ying.liu, dengsheng.zhang, guojun.lu}@infotech.monash.edu.au,
2 Microsoft Research Asia, No. 49 ZhiChun Road, Beijing, 100080, China,

[email protected]

Abstract. Due to the ‘semantic gap’ between low-level visual features and the rich semantics in the user’s mind, the performance of traditional content-based image retrieval systems is far from users’ expectations. In an attempt to reduce the ‘semantic gap’, this paper introduces a region-based image retrieval system that uses high-level semantic color names. For each segmented region, we define a perceptual color as the low-level color feature of the region. This perceptual color is then converted to a semantic color name. In this way, the system reduces the ‘semantic gap’ between numerical image features and the richness of human semantics. Four different ways to calculate the perceptual color are studied. Experimental results confirm the improved performance of the proposed system compared to traditional CBIR systems.

1 Introduction

To overcome the drawback of traditional text-based image retrieval systems, which require a considerable amount of human labor, content-based image retrieval (CBIR) was introduced in the early 1990s. CBIR indexes images by their low-level features, such as color, shape, and texture. Commercial products and experimental prototype systems developed in the past decade include the QBIC system [1], Photobook system [2], Netra system [3], SIMPLIcity system [4], etc. However, extensive experiments on CBIR systems show that in many cases low-level image features cannot describe the high-level semantic concepts in the user’s mind. Hence, the performance of CBIR is still far from users’ expectations [5][6]. ‘The discrepancy between the relatively limited descriptive power of low-level imagery features and the richness of user semantics’ is referred to as the ‘semantic gap’ [7].

In order to improve the retrieval accuracy of CBIR systems, the research focus in CBIR has shifted from designing sophisticated feature extraction algorithms to reducing the ‘semantic gap’ [8]. Recent work on narrowing the ‘semantic gap’ can be roughly classified into three categories: 1) using region-based image retrieval (RBIR), which represents images at region level with the intention of being closer to the perception of the human visual system [9]; 2) introducing

K. Aizawa, Y. Nakamura, and S. Satoh (Eds.): PCM 2004, LNCS 3332, pp. 931–938, 2004.
© Springer-Verlag Berlin Heidelberg 2004



relevance feedback into the image retrieval system for continuous learning through on-line interaction with users to improve retrieval accuracy [7]; 3) extracting semantic features from low-level image features using machine learning or data mining techniques [5].

We intend to develop an RBIR system with high-level concepts obtained from numerical region features such as color, texture, and spatial position. This paper presents our initial experimental results using semantic color names. Firstly, each database image is segmented into homogeneous regions. Then, for each region, a perceptual color is defined. This is different from conventional methods using color histograms or color moments [4][9]. The perceptual color is then converted to a semantic color name (for example, ‘grass green’, ‘sky blue’). In this way, the ‘semantic gap’ is reduced. Another advantage of the system is that it allows users to perform queries by keyword (for example, ‘find images with sky blue regions’).

The remainder of the paper is organized as follows. In Section 2, we describe our system in detail. Section 3 explains the test data set and the performance evaluation model. Experimental results are given in Section 4. Finally, Section 5 concludes the paper.

2 System Description

Our system includes three components: image segmentation, color naming, and query processing.

2.1 Image Segmentation

Natural scenes are rich in both color and texture, and a wide range of natural images can be considered a mosaic of regions with different colors and textures. We intend to relate low-level region features to high-level semantics such as color names used in daily life (pink, green, sky blue, etc.) and real-world texture patterns (grass, sky, trees, etc.). For this purpose, we first use ‘JSEG’ [10] to segment images into regions homogeneous in color and texture. Fig. 1 gives a few examples.

2.2 Color Naming

Perceptual Colors: Instead of using traditional color features such as color moments or color histograms [4][9], we define a perceptual color for each segmented region with the intention of relating it to semantic color names.

Although millions of colors can be defined in a computer system, the colors that can be named by users are limited [11]. For example, the first two colors in Fig. 2 correspond to two different points in HSV (Hue, Saturation, Value) space, but users are likely to name them both as ‘pink’. Similarly, both of the next two colors could be named ‘sky blue’. The HSV values (with ranges [0,360], [0,100], [0,100], respectively) of the four colors are given below.

Pink: (H,S,V) = (326, 42, 100), (330, 40, 100)
Sky Blue: (H,S,V) = (200, 42, 93), (202, 40, 100)



HSV is perceptually the most natural of the common color spaces. We define a perceptual color in HSV space for each region and then convert it to a semantic color name. Four different ways to define the perceptual color are studied.

– We use the average HSV value of all the pixels in a region as its perceptual color (referred to as ‘Ave-cl’). This is reasonable, as most regions obtained using JSEG are homogeneous in color.

– Hue is a circular quantity; for example, both ‘0’ and ‘360’ represent ‘red’. Averaging Hue values may therefore result in a color very different from what we expect. For example, (0+360)/2 = 180, which would make the average of two ‘red’ pixels ‘cyan’. To solve this problem, we first calculate the average RGB value of a region and then convert it to the HSV domain. This result is referred to as ‘RGB-cl’.

– Due to inaccuracy in image segmentation, pixels not belonging to the region of interest might be included in the ‘Ave-cl’ calculation and result in a color perceptually different from that of the region. Hence, we consider using the dominant color of a region as the perceptual color. For this, we first calculate the color histogram (10*4*4 bins) of a region and select the bin with the maximum size. The average HSV value of all the pixels in the selected bin is used as the dominant color, referred to as ‘Dm-cl’.

– Considering that the histogram of a region may contain more than one bin of large size, we calculate the average HSV value of all the pixels from the M (>1) large bins as the perceptual color. Experimentally, we select all those bins with size no less than 68% of the maximum-size bin. The result is referred to as ‘Dmm-cl’.
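The four definitions above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the 10*4*4 binning and the 68% threshold follow the description in the text, while the function and variable names are our own, and HSV components are kept in [0, 1] as `colorsys` returns them.

```python
import colorsys
import numpy as np

def perceptual_colors(region_rgb):
    """Sketch of the four perceptual-color definitions for one region.
    `region_rgb` is an (N, 3) float array of the region's pixels in [0, 1]."""
    hsv = np.array([colorsys.rgb_to_hsv(*p) for p in region_rgb])  # H, S, V in [0, 1]

    # 'Ave-cl': plain average of the HSV values (can mislead for reds,
    # since hue is circular).
    ave_cl = hsv.mean(axis=0)

    # 'RGB-cl': average in RGB first, then convert, which avoids the
    # circular-hue problem.
    rgb_cl = colorsys.rgb_to_hsv(*region_rgb.mean(axis=0))

    # 'Dm-cl' / 'Dmm-cl': 10x4x4 HSV histogram; average the pixels of the
    # largest bin (Dm-cl), or of every bin holding at least 68% of the
    # largest bin's pixel count (Dmm-cl).
    bins = np.minimum((hsv * [10, 4, 4]).astype(int), [9, 3, 3]).dot([16, 4, 1])
    counts = np.bincount(bins, minlength=160)
    dm_cl = hsv[bins == counts.argmax()].mean(axis=0)
    big = np.flatnonzero(counts >= 0.68 * counts.max())
    dmm_cl = hsv[np.isin(bins, big)].mean(axis=0)
    return ave_cl, rgb_cl, dm_cl, dmm_cl
```

For a region that is truly homogeneous, all four definitions coincide; they diverge exactly in the mixed-pixel cases the text discusses next.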

We observed that in most cases the four perceptual colors are very similar, as in Fig. 3(1), 3(2). However, in some special cases, ‘Ave-cl’ results in a color visually very different from that of the original region. For example, in region 3(a), due to inaccuracy in segmentation, a small part of the green background (left side) is included in the flower. In addition, some pixels are not pink, but dark yellow (at the center of the flower) or gray (in between the petals). As a result, the ‘Ave-cl’ in 3(b) turns out to be different from the color of the region in 3(a).

Color Naming: Color naming maps a numerical color space to semantic color names used in natural language. A quantization-based color naming model is often used, in which the Hue value is quantized into a small set of about 10-20 base color names [12]. In [12], the author uniformly quantized the Hue value into 10 base colors, such as red, orange, yellow, etc. Saturation and Luminance are quantized into 4 bins each as adjectives signifying the richness and brightness of the color. There are two problems with the model used in [12]. Firstly, uniform quantization of the Hue value is not proper, as colors in the HSV space are not uniformly distributed (refer to Fig. 4). The reason is that different colors have different wave bandwidths. For example, the wave bands of yellow and blue are 565-590nm and 450-500nm, respectively. The second problem is that in [12], ‘red’



corresponds to Hue values from 0 to 36 (normalized to 0-0.1 in [12]). However, we notice that the Hue of ‘red’ can be around either 0 or 360.

Considering the above-mentioned problems, we design a color naming model as follows. Firstly, we define 8 base colors: red, orange, yellow, green, cyan, blue, purple, and magenta, with Hue ranges [0,8) or [345,360], [8,36), [36,80), [80,160), [160,188), [188,262), [262,315), and [315,345), respectively. Saturation and Value are quantized into 3 bins as in Fig. 5, with the corresponding adjectives shown in Table 1. The asterisks indicate special cases. When S=0 and V=1, we have ‘grey’. When S=0 and V>80, we have ‘white’. When V=0, we always get ‘black’. Base color names with their adjectives can be simplified to other commonly used color names. For instance, ‘pale magenta’ is named ‘pink’.

Finally, we obtain 8*2*2+3 = 35 different colors. For example, the first two colors in Fig. 2 are both named ‘pink’. Similarly, the other two colors are named ‘sky blue’.
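The naming model can be sketched as below. The hue ranges are the ones given above; the S/V adjective names and the achromatic (black/white/grey) cutoffs are illustrative assumptions on our part, since Table 1 and Fig. 5 are not reproduced here.

```python
# Base colors with the paper's hue ranges (degrees); 'red' additionally
# covers [345, 360].
BASE = [('red', 0, 8), ('orange', 8, 36), ('yellow', 36, 80),
        ('green', 80, 160), ('cyan', 160, 188), ('blue', 188, 262),
        ('purple', 262, 315), ('magenta', 315, 345)]

def color_name(h, s, v):
    """H in [0, 360], S and V in [0, 100]. The achromatic thresholds and
    the 'pale'/'deep' adjectives below are illustrative placeholders for
    the paper's Table 1 quantization."""
    if v < 10:                       # near-zero Value: black
        return 'black'
    if s < 10:                       # near-zero Saturation: white or grey
        return 'white' if v > 80 else 'grey'
    base = 'red' if h >= 345 else next(n for n, lo, hi in BASE if lo <= h < hi)
    adjective = 'pale' if s < 50 else 'deep' if v < 50 else ''
    return (adjective + ' ' + base).strip()
```

Under these assumed thresholds, the two ‘pink’ points of Fig. 2 both map to ‘pale magenta’ (simplified to ‘pink’) and the two ‘sky blue’ points both fall in the pale blue bin, which is the behavior the naming model is designed to achieve.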

In this way, the low-level color features are mapped to high-level semanticcolor names, thus reducing the ‘semantic gap’.

2.3 Query Processing

All database images are segmented into regions, and their low-level color features and color names are stored for retrieval purposes. The system can support different types of queries.

1) Query by specified region – The user selects a region of interest from an image as the query region. The system calculates the low-level color feature and color name of the query region. All images containing region(s) of the same color name are selected to form a candidate set C. Then, the images in C are further ranked according to their EMD [13] distance to the query image. With the region distance defined as the Euclidean distance between region color features, EMD measures the overall distance between two images.

2) Query by keyword – The keyword is selected from the 35 semantic colors defined. In this case, the system returns all images containing region(s) of the color name specified by the keyword.

In this paper, we work on the first case, which is more complex.
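The region-query pipeline above (filter by color name, then rank by image distance) can be sketched as follows. The paper ranks candidates by EMD [13]; as a simplification, the sketch matches equally weighted regions with an optimal assignment, which coincides with EMD only when both images contribute the same number of equal-weight regions. The `database` layout and all identifiers are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def image_distance(regions_a, regions_b):
    """Simplified stand-in for EMD between two images, each given as an
    (n, d) array of region color features: optimal one-to-one region
    matching under Euclidean ground distance."""
    cost = np.linalg.norm(regions_a[:, None, :] - regions_b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

def query(query_region, query_image, database):
    """database: list of (image_id, regions, names) tuples. Keep images
    containing a region with the query region's color name (candidate
    set C), then rank the candidates by distance to the query image."""
    target = query_region['name']
    candidates = [(image_distance(query_image, regions), image_id)
                  for image_id, regions, names in database if target in names]
    return [image_id for _, image_id in sorted(candidates)]
```

The color-name filter is what makes the first stage cheap: the (comparatively expensive) distance computation runs only on the candidate set C, not the whole database.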

3 Database and Performance Evaluation

The Corel data set is often used to evaluate the performance of image retrieval systems due to its large size, heterogeneous content, and available human-annotated ground truth. However, to be used as a test set for an image retrieval system, some pre-processing is necessary for the following two reasons: 1) some images with similar content are divided into different categories, for example, the images in ‘Ballon1’ and ‘Ballon2’; 2) some category labels are very abstract, and the images within a category can vary largely in content. For instance, the category ‘Australia’ includes pictures of city buildings, Australian wild animals, etc. A few examples are given in Fig. 6.



Fig. 1. JSEG segmentation results

Fig. 2. Example colors

Fig. 3. Region perceptual colors: (a) original region, (b) ‘Ave-cl’, (c) ‘RGB-cl’, (d) ‘Dm-cl’, (e) ‘Dmm-cl’

Fig. 4. HSV color space (H, S)

Fig. 5. Quantization of S,V

Fig. 6. Example images from category ‘Australia’

Fig. 7. Query images/regions examples



Hence, it is better to select a subset of Corel images with ground truth data available, or to make the necessary changes when setting the ground truth data.

We selected 5,000 Corel images as our test set (ground truth available). ‘JSEG’ segmentation produces 29,187 regions (5.84 regions per image on average) with size no less than 3% of the original image. We ignore smaller regions, considering that regions should be large enough for us to study their texture patterns later.

Precision and recall are often used in CBIR systems to measure retrieval performance. Precision (Pr) is defined as the ratio of the number of relevant images retrieved, Nrel, to the total number of retrieved images, N. Recall (Re) is defined as the number of relevant images retrieved, Nrel, over the total number of relevant images available in the database, Nall. We calculate the average Pr and Re of 30 queries with N = 10, 20, ..., 100, and obtain the Pr~Re curve. A few query images and the specified regions are displayed in Fig. 7.
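The Pr~Re curve for one query can be computed as below; this is a straightforward sketch of the definitions just given, with our own function and variable names.

```python
def pr_re_curve(ranked_ids, relevant_ids, cutoffs=range(10, 101, 10)):
    """Precision Pr = Nrel/N and recall Re = Nrel/Nall at each cutoff N,
    for one query's ranked result list."""
    n_all = len(relevant_ids)
    curve = []
    for n in cutoffs:
        n_rel = sum(1 for i in ranked_ids[:n] if i in relevant_ids)
        curve.append((n_rel / n, n_rel / n_all))
    return curve
```

Averaging these per-query curves point-wise over the 30 queries yields the Pr~Re curves shown in Fig. 8.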

4 Experimental Results

Firstly, we compare the performance of our RBIR system using ‘Ave-cl’, ‘RGB-cl’, ‘Dm-cl’, and ‘Dmm-cl’, respectively. The Pr~Re curves are given in Fig. 8(a). The results show that ‘Dm-cl’ and ‘Dmm-cl’ perform better than ‘Ave-cl’ does. ‘RGB-cl’ works better than ‘Ave-cl’ but not as well as ‘Dm-cl’ and ‘Dmm-cl’. In addition, the performance of ‘Dm-cl’ is very close to that of ‘Dmm-cl’. In this work, we use ‘Dmm-cl’. Fig. 9 compares the retrieval results for query 1 using ‘Ave-cl’ and ‘Dmm-cl’.

Our experiments also show that the proposed color naming system works better than the one used in [12]. Due to space limitations, we do not give those results here.

In addition, we compare our system (denoted ‘R’) with a CBIR system using a global color histogram (referred to as ‘G’). In system ‘G’, images are represented by their HSV-space color histogram, with H, S, and V uniformly quantized into 18, 4, and 4 bins, respectively. The similarity of two images is measured by the Euclidean distance between their color histograms.
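A minimal sketch of the ‘G’ baseline, assuming pixels are given as RGB triples in [0, 1]; normalizing the histogram by pixel count is our assumption, as the paper does not state how histograms are normalized before the Euclidean comparison.

```python
import colorsys
import numpy as np

def global_histogram(pixels_rgb):
    """Global HSV histogram with 18x4x4 uniform bins, normalized by
    pixel count and flattened into a 288-dimensional feature vector."""
    hsv = np.array([colorsys.rgb_to_hsv(*p) for p in pixels_rgb])
    idx = np.minimum((hsv * [18, 4, 4]).astype(int), [17, 3, 3]).dot([16, 4, 1])
    hist = np.bincount(idx, minlength=288).astype(float)
    return hist / len(pixels_rgb)

def g_distance(img_a, img_b):
    """System 'G' similarity: Euclidean distance between histograms."""
    return np.linalg.norm(global_histogram(img_a) - global_histogram(img_b))
```

Unlike ‘R’, this baseline has no notion of regions, so it cannot single out the region of interest; that difference drives the query 2 comparison below.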

We observed that ‘R’ works well when the region of interest is recognized and the color names defined can describe it well. For example, in query 2, the query region is the ‘eagle’. ‘R’ recognizes the ‘eagle’ and successfully finds many relevant images. Fig. 10 gives the retrieval results, with ‘R’ returning 8 relevant images within the top 10 retrieved, while ‘G’ finds only 3.

In other cases, such as query 3, both ‘R’ and ‘G’ work well. Due to the large green background in the query image and the relevant database images, the retrieval accuracy of ‘G’ is very high. On the other hand, the color name ‘grass green’ can represent the grass region well. Hence, the retrieval performance of ‘R’ is also very good. Among the first 10 images retrieved, the numbers of relevant images retrieved by ‘G’ and ‘R’ are both 10.

Fig. 8(b) compares the performance of ‘G’ and ‘R’ over 30 queries.



Fig. 8. (a) Using different perceptual colors, (b) ‘G’-‘R’ over 30 queries

Fig. 9. Retrieval results for query 1. The first image is the query image. ‘Q’ refers to the query region. ‘T’ refers to the relevant images selected.

Fig. 10. Retrieval results for query 2. The first image is the query image. ‘Q’ refers to the query region. ‘T’ refers to the relevant images selected.

5 Conclusions

This paper presents a region-based image retrieval system using high-level semantic color names. For each segmented region, a perceptual color is defined,



which is then converted to a semantic color name using our color naming algorithm. In this way, the system reduces the ‘semantic gap’ between numerical image features and the richness of human semantics. Experimental results confirm the improved performance of the proposed system over conventional CBIR systems.

In our future work, we will make use of multiple types of low-level image features to extract more accurate semantics. We expect the performance of our system to be further improved.

References

1. C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, and W. Equitz, “Efficient and Effective Querying by Image Content,” J. Intell. Inform. Syst., vol. 3, no. 3-4, pp. 231-262, 1994.

2. A. Pentland, R. W. Picard, and S. Sclaroff, “Photobook: Content-based Manipulation for Image Databases,” Inter. Jour. Computer Vision, vol. 18, no. 3, pp. 233-254, 1996.

3. W. Y. Ma and B. Manjunath, “Netra: A Toolbox for Navigating Large Image Databases,” Proc. of ICIP, pp. 568-571, 1997.

4. J. Z. Wang, J. Li, and G. Wiederhold, “SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture Libraries,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 947-963, 2001.

5. A. Mojsilovic and B. Rogowitz, “Capturing Image Semantics with Low-Level Descriptors,” Proc. of ICIP, pp. 18-21, 2001.

6. X. S. Zhou and T. S. Huang, “CBIR: From Low-Level Features to High-Level Semantics,” Proc. SPIE Image and Video Communication and Processing, San Jose, CA, Jan. 24-28, 2000.

7. Yixin Chen, J. Z. Wang, and R. Krovetz, “An Unsupervised Learning Approach to Content-based Image Retrieval,” IEEE Proc. Inter. Symposium on Signal Processing and Its Applications, pp. 197-200, July 2003.

8. Arnold W. M. Smeulders, Marcel Worring, Amarnath Gupta, and Ramesh Jain, “Content-based Image Retrieval at the End of the Early Years,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, Dec. 2000.

9. Feng Jing, Mingjing Li, Lei Zhang, Hong-Jiang Zhang, and Bo Zhang, “Learning in Region-based Image Retrieval,” Proc. Inter. Conf. on Image and Video Retrieval (CIVR 2003), 2003.

10. Y. Deng, B. S. Manjunath, and H. Shin, “Color Image Segmentation,” Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR ’99, Fort Collins, CO, vol. 2, pp. 446-451, June 1999.

11. E. B. Goldstein, Sensation and Perception, 5th Edition, Brooks/Cole, 1999.

12. D. M. Conway, “An Experimental Comparison of Three Natural Language Color Naming Models,” Proc. East-West International Conference on Human-Computer Interactions, St. Petersburg, Russia, pp. 328-339, 1992.

13. Y. Rubner, C. Tomasi, and L. Guibas, “A Metric for Distributions with Applications to Image Databases,” Proc. of the 1998 IEEE Inter. Conf. on Computer Vision, Jan. 1998.