
[IEEE 2007 14th International Workshop on Systems, Signals and Image Processing and 6th EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services]



Combining Novel Features for Content-Based Image Retrieval

K. Satya Sai Prakash1, RMD. Sundaram2 1Department of Computer Science and Engineering

Amrita Vishwa Vidyapeetham Coimbatore, Tamil Nadu, India-641 105

Phone: +91(422)2656422 Fax: +91(422)2656274 E-mail: [email protected] 2Wipro Technologies

111, Anna Salai, Guindy, Chennai, Tamil Nadu, India-600 032 Phone: 09994282183 E-mail: [email protected]

Keywords: Fuzzy Color Histogram, Tamura Features, Phase Congruency.

Abstract – Recent advances in technology have made a tremendous amount of multimedia information available on the Internet. Content-Based Image Retrieval (CBIR) aims at retrieving the set of images in a database that are most similar to the user's query. To do so, a set of features needs to be extracted from the images and stored in the database before user queries are accepted. In this paper we introduce a system for interactive image retrieval that combines different approaches to feature-based queries. The proposed technique is tested on two datasets, the first consisting of different animals and the second consisting of birds, flowers, and buildings. The retrieval accuracy is 96.4% and 92.2%, respectively, for a database size of 530 each.

1. INTRODUCTION

Because images represent a particularly large volume of information, efficient and possibly intelligent browsing of images based on visual content is becoming increasingly important in application fields that use large image databases, such as diagnostic medical imaging, remote sensing, and entertainment. Content-based image retrieval uses the visual contents of an image, such as color, shape, texture, and spatial layout, to represent and index the image. In typical content-based image retrieval systems (Fig. 1), the visual contents of the images in the database are extracted and described by multi-dimensional feature vectors. To retrieve images, users provide the retrieval system with example images or sketched figures. The system then converts these examples into its internal feature-vector representation. The similarities (distances) between the feature vectors of the query example or sketch and those of the images in the database are then calculated, and retrieval is performed with the aid of an indexing scheme. We introduce novelty in these basic features as well as combine them to yield better results.

2. IMAGE CONTENT DESCRIPTORS

Image content includes both visual and semantic content. Visual content can be either very general or domain specific. General visual content includes color, texture, shape, spatial relationships, etc. Domain-specific visual content, such as human faces, is application dependent and may involve domain knowledge. Semantic content is obtained either by textual annotation or by complex inference procedures based on visual content. This work concentrates on general visual content descriptors.

Fig. 1. Content-Based Image Retrieval System

3. FEATURE EXTRACTION

A good image content descriptor should be invariant to the accidental variance introduced by the imaging system (e.g., variation in the illumination of the scene). However, there is a trade-off between invariance and the discriminative power of the extracted visual features, since a very wide class of invariants loses the ability to discriminate between essential differences [1]. Invariant description has been studied extensively in computer vision (e.g., for object recognition), but is relatively new in image retrieval [2]. In the following sections, we discuss the various novel features extracted from the image and how we combine them to yield better results.

3.1 Fuzzy Color Histogram

Color is the most widely used visual content for retrieving images because it does not depend on image size or orientation. A histogram is a standard statistical description of a distribution in terms of occurrence frequencies of different event classes; for color, the event classes are regions in color space. A Conventional Color Histogram (CCH) considers neither the color similarity across different bins nor the color dissimilarity within the same bin, and is therefore sensitive to noisy interference such as illumination changes and quantization errors. Moreover,



a CCH's large number of histogram bins makes histogram comparison computationally expensive. To address these concerns, this paper presents a new color histogram representation, called the Fuzzy Color Histogram (FCH), which relates each pixel's color to all the histogram bins through a fuzzy-set membership function. In comparison with the conventional color histogram (CCH), which assigns each pixel to exactly one bin, the FCH spreads each pixel's total membership value over all the histogram bins. To reduce the computational complexity, we use the Fuzzy C-Means (FCM) clustering algorithm. Taking a color space containing n different color bins, the color histogram of an image I containing N pixels is represented as H(I) = [h_1, h_2, ..., h_n], where h_i = N_i / N is the probability of a pixel in the image belonging to the i-th color bin, and N_i is the total number of pixels in the i-th color bin. By probability theory, h_i can be written as

h_i = \sum_{j=1}^{N} P_{i|j} P_j = (1/N) \sum_{j=1}^{N} P_{i|j}    (1)

where P_j is the probability of choosing the j-th pixel from image I, which is 1/N, and P_{i|j} is the conditional probability that the chosen j-th pixel belongs to the i-th color bin. The FCH, on the other hand, relates each of the N pixels in image I to all n color bins via a fuzzy-set membership function, so that the degree of "belongingness" of the j-th pixel to the i-th color bin is given by the membership value \mu_{ij}. The FCH of image I can thus be expressed as F(I) = [f_1, f_2, ..., f_n], where

f_i = \sum_{j=1}^{N} \mu_{ij} P_j = (1/N) \sum_{j=1}^{N} \mu_{ij}    (2)

Compared to the CCH, the FCH therefore accounts both for the similarity of colors across different bins and for the dissimilarity of colors within the same bin. To compute the membership values, the fuzzy C-means clustering algorithm [3] is applied to the color components recorded in the perceptually uniform CIELAB color space. Compared to RGB and CMYK, efficient color corrections are often easier to make in Lab.
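As a minimal illustration of Eqs. (1) and (2), the sketch below computes a CCH by hard-assigning pixels to their nearest color bin, and an FCH by spreading fuzzy C-means-style memberships over all bins. The function names, toy pixel values, and bin centers are illustrative only, and the memberships are computed for fixed bin centers (a full FCM run would also update the centers iteratively):

```python
import numpy as np

def cch(pixels, centers):
    """Conventional color histogram, Eq. (1): each pixel is hard-assigned
    to its nearest bin, so h_i = N_i / N."""
    d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    counts = np.bincount(np.argmin(d, axis=1), minlength=len(centers))
    return counts / len(pixels)

def fch(pixels, centers, m=2.0, eps=1e-12):
    """Fuzzy color histogram, Eq. (2): each pixel spreads a total
    membership of 1 over all bins, f_i = (1/N) * sum_j mu_ij.
    Memberships use the fuzzy C-means formula for fixed centers."""
    d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2) + eps
    # mu[j, i] = 1 / sum_k (d[j, i] / d[j, k]) ** (2 / (m - 1))
    mu = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
    return mu.mean(axis=0)

# Toy example: 4 pixels and 2 color bins in a 3-D (CIELAB-like) space
pixels = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0],
                   [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])
centers = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
h = cch(pixels, centers)
f = fch(pixels, centers)
print(h)  # hard assignment: 3 of 4 pixels fall in the first bin
print(f)  # soft memberships: smoother, but still sums to 1
```

Note how the pixel at [0.5, 0.5, 0.5] contributes wholly to one bin in the CCH but splits its membership equally between the two bins in the FCH, which is exactly the robustness to quantization effects argued for above.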

3.2 Texture

Texture is another important property of images. Texture measures look for visual patterns in images and how they are spatially defined. Textures are represented by texels, which are then grouped into a number of sets depending on how many textures are detected in the image. These sets define not only the texture but also where in the image the texture is located. Texture representation methods can be classified into two categories: structural and statistical. Structural methods, including morphological operators and adjacency graphs [4], describe texture by identifying structural primitives and their placement rules. Statistical methods, including Fourier power spectra, Tamura features, and multi-resolution filtering techniques such as Gabor and

wavelet transforms, characterize texture by the statistical distribution of the image intensity. Among these, we concentrate on the Tamura features and multi-resolution filtering techniques, which are expected to give better results. The Tamura features, comprising coarseness, contrast, directionality, linelikeness, regularity, and roughness, are designed in accordance with psychological studies on the human perception of texture. The basic idea of using Gabor filters to extract texture features is as follows. A two-dimensional Gabor function g(x, y) is defined as

g(x, y) = (1 / 2\pi\sigma_x\sigma_y) \exp[ -(1/2)(x^2/\sigma_x^2 + y^2/\sigma_y^2) + 2\pi j W x ]    (3)

where \sigma_x and \sigma_y are the standard deviations of the Gaussian envelope along the x and y directions and W is the centre frequency. Given an image I(x, y), its Gabor transform is defined as

W_{mn}(x, y) = \int I(x_1, y_1) \, g^{*}(x - x_1, y - y_1) \, dx_1 \, dy_1    (4)

The mean and the standard deviation of the magnitude of W_{mn}(x, y) can then be used to represent the texture feature of a homogeneous texture region.

3.3 Shape Information from Phase Congruency

Compared with color and texture features, shape features are usually described after images have been segmented into regions or objects. Shape does not refer to the shape of an image but to the shape of the particular region that is being sought. Image features such as step edges, lines, and Mach bands all give rise to points where the Fourier components of the image are maximally in phase. The use of phase congruency for marking features has significant advantages over gradient-based methods [5]. Phase congruency is a dimensionless quantity that is invariant to changes in image brightness or contrast; hence, it provides an absolute measure of the significance of feature points, allowing universal threshold values that can be applied over wide classes of images. Gradient-based edge-detection methods such as those of Sobel, Marr and Hildreth, and Canny [5] are sensitive to variations in image illumination, blurring, and magnification. The gradient values that correspond to significant edges are usually determined empirically, although a limited number of efforts have been made to determine threshold values automatically. Congruency of phase at any angle produces a clearly perceived feature.
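The Gabor texture descriptor of Section 3.2 (Eqs. (3) and (4)) can be sketched as follows. The kernel size, \sigma values, and centre frequency W are illustrative choices rather than parameters taken from the paper, and the convolution is carried out in the frequency domain with NumPy's FFT:

```python
import numpy as np

def gabor_kernel(sigma_x, sigma_y, W, size=15):
    """2-D Gabor function of Eq. (3): a Gaussian envelope modulated by a
    complex exponential of centre frequency W along the x axis."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-0.5 * (x**2 / sigma_x**2 + y**2 / sigma_y**2))
    return (envelope / (2 * np.pi * sigma_x * sigma_y)) * np.exp(2j * np.pi * W * x)

def gabor_texture_feature(image, sigma_x=2.0, sigma_y=2.0, W=0.25):
    """Texture descriptor: mean and standard deviation of |W(x, y)|,
    the magnitude of the Gabor-filtered image, as in Eq. (4).
    Convolution is done via FFT with the kernel zero-padded to the
    image size."""
    g = gabor_kernel(sigma_x, sigma_y, W)
    response = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(g, s=image.shape))
    mag = np.abs(response)
    return mag.mean(), mag.std()

# Toy texture: vertical stripes matching the filter's centre frequency
x = np.arange(64)
image = np.tile(np.cos(2 * np.pi * 0.25 * x), (64, 1))
mean_mag, std_mag = gabor_texture_feature(image)
print(mean_mag, std_mag)
```

In a full descriptor, the (mean, std) pair would be collected for a bank of such filters over several scales m and orientations n.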

(5)

where \phi, the offset at which congruence of phase occurs, is varied from 0 to \pi/2. A model of feature perception called the Local Energy Model has been developed by Morrone et al. and Morrone and Owens [5]. This model postulates that features are perceived at points in an image where the Fourier components are maximally in phase. The phase congruency function, in terms of the Fourier series expansion of an image at some location x, is defined as



PC(x) = \max_{\bar\phi(x) \in [0, 2\pi]} \left( \sum_n A_n \cos(\phi_n(x) - \bar\phi(x)) \right) / \sum_n A_n    (6)

where A_n represents the amplitude of the n-th Fourier component and \phi_n(x) the local phase of that component at position x [6]. The value of \bar\phi(x) that maximizes this expression is the amplitude-weighted mean local phase angle of all the Fourier terms at the point being considered. Phase congruency varies from a maximum of 1 (a highly significant feature) down to 0 (no significance). This allows a threshold to be specified for picking out features before an image is seen. As it stands, phase congruency is an awkward quantity to compute. Points of maximum phase congruency can be found equivalently by searching for peaks in the local energy function [5]. For a one-dimensional luminance profile I(x), the local energy function is defined as

E(x) = \sqrt{F^2(x) + H^2(x)}    (7)

where F(x) is the signal I(x) with its DC component removed, and H(x) is the Hilbert transform of F(x) (a 90-degree phase shift of F(x)). In practice, approximations to F(x) and H(x) are obtained by convolving the signal with a quadrature pair of filters. The relationship between phase congruency, energy, and the sum of the Fourier amplitudes can be seen geometrically in Fig. 2.
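The local energy computation above can be sketched in one dimension. This is a minimal global-Fourier version of Eq. (7) that obtains H(x) through the analytic signal, whereas practical implementations such as Kovesi's [5] use banks of quadrature filters over several scales and orientations:

```python
import numpy as np

def phase_congruency_1d(signal):
    """Local energy sketch of Eq. (7): F(x) is the DC-removed signal,
    H(x) its Hilbert transform, E(x) = sqrt(F^2 + H^2); dividing by the
    sum of Fourier amplitudes yields PC(x), bounded by 1."""
    n = len(signal)
    F = signal - signal.mean()                     # remove the DC component
    spectrum = np.fft.fft(F)
    # analytic-signal mask: keep DC/Nyquist, double positive frequencies
    mask = np.zeros(n)
    mask[0] = 1.0
    mask[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        mask[n // 2] = 1.0
    analytic = np.fft.ifft(spectrum * mask)        # F(x) + i H(x)
    energy = np.abs(analytic)                      # E(x) = sqrt(F^2 + H^2)
    amp_sum = (np.abs(spectrum) * mask).sum() / n  # sum of Fourier amplitudes
    return energy / amp_sum                        # PC(x) = E(x) / sum A_n

# A step edge: every Fourier component is in phase at the discontinuity
signal = np.concatenate([np.zeros(32), np.ones(32)])
pc = phase_congruency_1d(signal)
print(round(float(pc[32]), 3), round(float(pc[16]), 3))
```

For the step signal, PC is large at the discontinuity (index 32) and small in the middle of the flat region (index 16), illustrating why phase congruency marks edges independently of contrast.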

Fig. 2. Polar diagram showing the Fourier components at a location in the signal plotted head to tail.

Hence the energy equals phase congruency scaled by the sum of the Fourier amplitudes; that is,

E(x) = PC(x) \sum_n A_n    (8)

Thus the local energy function is directly proportional to the phase congruency function, so peaks in local energy correspond to peaks in phase congruency. The raw phase congruency images were obtained by applying Equation (6) to the images. Local frequency information was obtained using two-octave-bandwidth

filters over four scales and six orientations. These values are compared against the database images and the ranking is completed.

4. COMBINING THE EXTRACTED FEATURES

Voting is a well-established technique not only in social and political life but also in computational systems. In our case each agent, according to its specialization, votes for the similarity of the query image to images in a gallery [8]. Each agent scores the images according to the voting methodology applied, and the choice of voting scheme strongly influences the final result. We adopt a simple technique for combining multiple features, called voting, which corresponds to taking a linear combination of the learners. Let w_j denote the weight of learner j. The final output is then computed as

y = \sum_j w_j d_j    (9)

where d_j is the vote of the corresponding feature. In retrieval this is plurality voting: the image with the maximum number of votes is the winner. To evaluate the performance of a retrieval system, two measurements, recall and precision [7], are borrowed from traditional information retrieval. For a query q, the set of images in the database that are relevant to q is denoted R(q), and the retrieval result of q is denoted Q(q). The precision of the retrieval is the fraction of the retrieved images that are indeed relevant to the query:

p(q) = |Q(q) \cap R(q)| / |Q(q)|    (10)

The recall is the fraction of relevant images that is returned by the query.
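The weighted vote of Eq. (9) together with precision and the recall measure defined here can be sketched as below; the feature names, weights, and similarity scores are purely illustrative:

```python
def combine_votes(votes, weights):
    """Weighted voting, Eq. (9): y = sum_j w_j * d_j, accumulated per
    candidate image; the image with the highest combined score is the
    winner (plurality voting)."""
    scores = {}
    for feature, per_image in votes.items():
        for image, d in per_image.items():
            scores[image] = scores.get(image, 0.0) + weights[feature] * d
    return max(scores, key=scores.get), scores

def precision_recall(retrieved, relevant):
    """Precision = |Q(q) ∩ R(q)| / |Q(q)| and
    recall = |Q(q) ∩ R(q)| / |R(q)| for retrieved set Q(q) and
    relevant set R(q)."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

# Per-feature similarity votes for three candidate images (toy values)
votes = {
    "color":   {"img_a": 0.9, "img_b": 0.6, "img_c": 0.2},
    "texture": {"img_a": 0.4, "img_b": 0.8, "img_c": 0.3},
    "shape":   {"img_a": 0.7, "img_b": 0.5, "img_c": 0.9},
}
weights = {"color": 0.4, "texture": 0.3, "shape": 0.3}
winner, scores = combine_votes(votes, weights)

# Toy evaluation: 5 images retrieved, 4 of them among the 8 relevant ones
p, r = precision_recall(["i1", "i2", "i3", "i4", "i9"],
                        ["i1", "i2", "i3", "i4", "i5", "i6", "i7", "i8"])
print(winner, p, r)  # img_a 0.8 0.5
```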

r(q) = |Q(q) \cap R(q)| / |R(q)|    (11)

Usually a trade-off must be made between these two measures, since improving one sacrifices the other. In typical retrieval systems, recall tends to increase with the number of retrieved items, while precision is likely to decrease. In addition, the choice of the relevant set R(q) is unstable, owing to the varying interpretations of the images. Further, when the number of relevant images exceeds the number of retrieved images, recall cannot reach 1. As a result, precision and recall are only rough descriptions of the performance of a retrieval system.

5. RESULTS

We made use of the "Animal" data set for finding the similarity between the images. The total size of the



database is 530 images. The images below represent some of the query images for the developed system.

Input  | Color | Texture | Shape | Combination of features
-------|-------|---------|-------|------------------------
Lion   |  94   |   92    |  93   |  98
Monkey |  88   |   92    |  85   |  94
Tiger  |  96   |   96    |  98   | 100
Horse  |  94   |   90    |  94   |  94
Cat    |  92   |   94    |  88   |  96

Table 1. Retrieval accuracy in percentile score (rounded off)

6. CONCLUSION

Combining the extracted features thus yields an intelligent and automatic solution for efficient searching of images in a huge database. Image retrieval systems rely greatly on user interaction: on the one hand, the images to be retrieved are determined by the user's query; on the other hand, query results can be refined to include more relevant candidates through the user's relevance feedback. The experimental results are indicative of the many possibilities and promising for further work. Updating the retrieval results based on the user's feedback can be achieved by updating the images, the feature models, and the weights of features in the similarity distance, and by selecting different similarity measures. New techniques for semantic description of visual contents will be the future direction. Preliminary results in this application domain are very encouraging, and our current aim is to develop an efficient and possibly intelligent content-based image retrieval system with user feedback.

REFERENCES

[1] J. Z. Wang, J. Li, and G. Wiederhold, "SIMPLIcity: Semantics-sensitive Integrated Matching for Picture Libraries," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 947-963, 2001.

[2] K. Grauman and T. Darrell, "Efficient Image Matching with Distributions of Local Invariant Features," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, June 2005.

[3] J. van de Weijer and J. M. Geusebroek, "Color Edge and Corner Detection by Photometric Quasi-invariants," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 4, pp. 625-630, 2005.

[4] N. Sebe and M. S. Lew, "Texture Features for Content-based Retrieval," in Principles of Visual Information Retrieval, Springer-Verlag, 2001.

[5] P. Kovesi, "Image Features from Phase Congruency," Videre: Journal of Computer Vision Research, vol. 1, no. 3, pp. 2-27, Summer 1999.

[6] K. Satya Sai Prakash and RMD. Sundaram, "Shape Information from Phase Congruency and its Application in Content-based Retrieval of Digital Multimedia Databases," in Proceedings of the 3rd Workshop on Computer Vision, Graphics and Image Processing (WCVGIP 2006), Hyderabad, pp. 88-93, 2006.

[7] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based Image Retrieval at the End of the Early Years," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec. 2000.

[8] G. Halkiadakis, "Agent Architecture for a Voting System," MSc Thesis, University of Crete, 1999.
