CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
An important task of a Content-Based Image Retrieval (CBIR) system is to search the image database and retrieve the required images quickly and efficiently. The first and most important task in any CBIR application is the extraction of suitable visual features from the images; these features form the basis of all CBIR applications. Various methods have been studied for extracting different types of visual features from images. In fact, most early CBIR systems attempted to find the best possible visual features to represent different kinds of images for effective retrieval. In this chapter the different types of visual features are discussed and the related work on each of them is reviewed.
Digitized images are stored in databases as two-dimensional arrays of pixel intensities without inherent meaning. Extracting meaningful, useful and accurate information from this raw data is therefore the main issue; this process is called feature extraction, and the extracted descriptors are called image features. A feature is defined to capture a certain visual property of an image, either globally for the entire image or locally for a small group of pixels. The most commonly used features reflect color, texture, shape, and salient points in images. Existing image features can be grouped into two categories:
Global features
Local features
In global extraction, features are computed to capture the overall characteristics of an image and are represented in the form of histograms or statistical models. The advantage of global extraction is its high speed, both for extracting features and for computing similarity. Popular global features include Edge Histograms [LC01], MPEG-7 color descriptors [LC01], Color Correlograms [HRM97], Local Binary Patterns (LBP) [TMT02], Gabor filtering features [MJW02], Color Moments [SO95], Color Histograms [SB91], and Tamura features [TMY78].
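As a concrete illustration of one of these descriptors, the basic 8-neighbour LBP code of a single pixel can be sketched as follows (this is the original 3 x 3 formulation, not the later multi-scale variants; the neighbour ordering is an arbitrary choice here):

```python
def lbp_code(image, y, x):
    # Basic 8-neighbour Local Binary Pattern: threshold each neighbour of
    # (y, x) against the centre pixel and pack the results into one byte.
    center = image[y][x]
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                  (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(neighbours):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code
```

A global LBP feature is then the histogram of these codes over all interior pixels of the image.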
Local features are small sub-images or patches extracted from the original image. In local feature extraction, a set of features is computed for every pixel using its neighborhood. To reduce computation, an image may be divided into small, non-overlapping blocks, and features are computed individually for every block. The features are still local because of the small block size, but the amount of computation is only a fraction of that required for computing features around every pixel. Local features can yield good results in various classification tasks [CWK05]. They also have properties that are attractive for image recognition, e.g. they are inherently robust to translation; these properties are equally useful for image retrieval.
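The block-wise extraction described above can be sketched generically; in this minimal illustration the block size and the per-tile mean feature are only placeholders for whatever descriptor a real system would compute:

```python
def block_features(image, block, feature_fn):
    # Split a 2-D grey image into non-overlapping block x block tiles and
    # compute one feature (value or vector) per tile via feature_fn.
    feats = []
    for y in range(0, len(image) - block + 1, block):
        for x in range(0, len(image[0]) - block + 1, block):
            tile = [row[x:x + block] for row in image[y:y + block]]
            feats.append(feature_fn(tile))
    return feats

def tile_mean(tile):
    # Example per-tile feature: the mean intensity.
    vals = [v for row in tile for v in row]
    return sum(vals) / len(vals)
```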
2.2 Content-Based Image Retrieval using Global features
2.2.1 Colour
The color feature is one of the most widely used visual features in image
retrieval. It is relatively robust to background complication and independent of image
size and orientation. Representative studies of color perception and color spaces are available in the literature [MMD76] [WYA97].
The color histogram is the most commonly used color feature representation in image retrieval systems. Statistically, the color histogram denotes the joint probability of the intensities of the three color channels. Swain and Ballard proposed histogram intersection, an L1 metric, as the similarity measure for color histograms [SB91]. To take into account the similarity between similar but not identical colors, Ioka [MI89] and Niblack et al. [NB94] introduced an L2-related metric for comparing histograms. Furthermore, considering that most color histograms are very sparse and thus sensitive to noise, Stricker and Orengo proposed using the cumulated color histogram.
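A minimal sketch of a quantized RGB histogram and the Swain-Ballard intersection measure, assuming 8-bit pixels; the bin count and bucket layout here are illustrative, not those of [SB91]:

```python
def color_histogram(pixels, bins=8):
    # Quantize each (r, g, b) value in 0..255 into one of bins**3 buckets
    # and return a normalized histogram.
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    n = float(len(pixels))
    return [h / n for h in hist]

def histogram_intersection(h1, h2):
    # Swain-Ballard histogram intersection: sum of bin-wise minima.
    # Equals 1.0 for identical normalized histograms, 0.0 for disjoint ones.
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Because the intersection of normalized histograms lies in [0, 1], it can be used directly as a similarity score for ranking database images.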
Stricker and Orengo [SO95] proposed the color moments approach to overcome the quantization effects inherent in the color histogram. The mathematical
foundation of this approach is that any color distribution can be characterized by its
moments. Furthermore, since most of the information is concentrated on the low-
order moments, only the first moment (mean), and the second and third central
moments (variance and skewness) were extracted as the color feature representation.
Weighted Euclidean distance was used to calculate the color similarity.
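A sketch of these three moments for one colour channel, together with the weighted Euclidean distance between two such feature vectors; the weights passed in are hypothetical, not values from [SO95]:

```python
import math

def color_moments(channel):
    # channel: flat list of intensity values for one colour plane.
    n = len(channel)
    mean = sum(channel) / n                                  # 1st moment
    var = sum((x - mean) ** 2 for x in channel) / n
    std = math.sqrt(var)                                     # 2nd central moment
    third = sum((x - mean) ** 3 for x in channel) / n
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)   # 3rd central moment
    return [mean, std, skew]

def weighted_euclidean(f1, f2, weights):
    # Weighted Euclidean distance between two moment feature vectors.
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, f1, f2)))
```

Concatenating the three moments for each of the three colour channels yields the compact 9-element feature vector that makes this approach so fast to compare.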
To facilitate fast search over large-scale image collections, Smith and Chang
proposed color sets as an approximation to the color histogram [SC95]. They first
transformed the (R, G, B) color space into a perceptually uniform space, such as
HSV, and then quantized the transformed color space into M bins. A color set is
defined as a selection of colors from the quantized color space. Because color set
feature vectors were binary, a binary search tree was constructed to allow a fast
search.
In 1996, histogram refinement based on color coherence vectors was proposed [GR96]. The technique considers spatial information and classifies the pixels of each histogram bucket as coherent if they belong to a large uniformly-colored region and incoherent otherwise. The color coherence vector technique improves the performance of histogram-based matching but is computationally expensive.
The color correlogram feature was proposed in [HKM97]; it takes into account the local spatial correlation of colors as well as the global distribution of this spatial correlation. The correlogram captures how the spatial correlation of pairs of colors changes with distance and hence outperforms classical histogram-based techniques.
The fuzzy color histogram [VB00] was proposed to revisit the use of color image content as an image descriptor through the introduction of fuzziness, which naturally arises from the imprecision of pixel color values and human perception. In 2000 the authors proposed the use of both fuzzy color histograms and their corresponding fuzzy distances for the retrieval of color images from various databases.
The color histogram is an efficient tool that is widely used in CBIR. The classic method of color histogram creation results in very large histograms with large variations between neighboring bins, so small changes in the image can produce great changes in the histogram. Moreover, since each color space consists of three components, it leads to 3-dimensional histograms, and manipulating and comparing 3D histograms is a complicated and computationally expensive procedure. To overcome these problems, Konstantinidis [KGA05] proposed projecting the 3D histogram onto a single one-dimensional histogram, a step called histogram linking. In their method, a new fuzzy linking method of color histogram creation based on the L*a*b* color space provides a histogram that contains only 10 bins. The method was assessed by the performance achieved in retrieving similar images from a widely diverse image collection. The experimental results show that the proposed method is less sensitive to various changes in the images (such as lighting variations, occlusions and noise) than other methods of histogram creation. A query image and its fuzzy linked histogram are shown in Figure 2.1.
Figure 2.1: Fuzzy linked histogram
Seong-Yong Hong et al. [HC06] utilized HSI values, converted from RGB values, to generate FMV-index values for image searching. The HSI color space is described through a hue range from 0° to 360°, as shown in Figure 2.2.
Figure 2.2: HSI color model
Three color characteristics, hue (H), saturation (S), and intensity (I) or lightness (L),
are defined to distinguish color components. Hue describes the actual wavelength of
a color by representing the color name, such as red or yellow. Saturation is a measure of the purity of a color, indicating how much white light has been added to a pure color.
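One common RGB-to-HSI conversion can be sketched as follows; this is a standard textbook formulation, not necessarily the exact conversion used in [HC06]:

```python
import math

def rgb_to_hsi(r, g, b):
    # r, g, b are normalized to [0, 1]; returns hue in degrees [0, 360),
    # saturation in [0, 1] and intensity in [0, 1].
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    h = 0.0 if den == 0 else math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    if b > g:                     # hue lies in the lower half of the circle
        h = 360.0 - h
    return h, s, i
```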
First, the RGB histograms are extracted from color images and are stored into
databases. These RGB values are converted into HSI values and FMV-index is
produced from it. FMV-index is stored in an emotional categories table. To search an
image the emotional adjectives are applied to color images and fuzzy values are
automatically obtained based on human visual features. The FMV-index searching scheme supports semantic-based retrieval that depends upon human sensation (e.g., “cool” images) and emotion (e.g., “soft” images), as well as traditional color-based retrieval. Emotional concepts are classified into twelve classes
according to emotional expression and images are classified into these categories as
well. The image searching speed was improved by assigning FMV index value to
each class. As a result, more user-friendly and accurate searching, with emotional
expression, was realized in their experiment.
Color histograms are the most commonly used features for image retrieval due
to their translation and rotation invariance. However, traditional color histogram features usually lack spatial structural information, which may cause erroneous retrieval results. In [XSB08] image retrieval was proposed
based on the Weighted DCT Spatial Combination Histogram (WDSCH). Equalized DCT coefficient histograms for each divided image block are first constructed using the DCT transformation and a histogram equalization method. The Weighted DCT
Spatial Combination Histogram features for the entire image are then extracted in the
HSV color space with the weights of the spatial information and energy distribution.
Experimental results on the three image datasets with different conditions show that
the compact histogram features containing spatial structural relations are robust to
image translation, scaling and rotation, and can bring about good retrieval precision
and speed simultaneously.
Bo Gun et al. [BKS08] extracted a perceptual representation of the original color image, a statistical signature obtained by modifying the general color signature, which consists of a set of points with statistical volume. The authors also used a dissimilarity measure for statistical signatures called the Perceptually Modified Hausdorff Distance (PMHD), which is based on the Hausdorff distance. The PMHD is insensitive to changes in the mean color features of a signature and is theoretically sound for incorporating human perception into the metric. To deal with partial matching, a partial PMHD was defined, which explicitly removes outliers using an outlier detection function. Their experimental results show that the retrieval precision of the PMHD is, on average, 20–30% higher than that of the next best method.
Figure 2.3: An example of perceptual dissimilarity based on the densities of two color features.
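For reference, the classical Hausdorff distance that the PMHD modifies can be sketched as:

```python
import math

def hausdorff_distance(a, b):
    # Classical (undirected) Hausdorff distance between two 2-D point sets:
    # the largest distance from any point in one set to its nearest
    # neighbour in the other set.
    def directed(p, q):
        return max(min(math.hypot(x1 - x2, y1 - y2) for (x2, y2) in q)
                   for (x1, y1) in p)
    return max(directed(a, b), directed(b, a))
```

The PMHD replaces the plain point-to-point distance with a perceptually weighted one over signature points, but the max-of-min structure above is the same.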
In [SMM09] the image was split into red, green and blue components. For each component the average was calculated, and every component image was split to obtain RH, RL, GH, GL, BH and BL images. RH is obtained by taking only the red components of all pixels in the image that are above the red average, and RL by taking only the red components of all pixels that are below the red average. Color moments were then applied to each divided component, i.e. RH, RL, GH, GL, BH and BL, and the images were classified using the K-means clustering algorithm.
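The per-channel high/low split of [SMM09] can be sketched for one colour plane; how values exactly at the average are assigned is an assumption here:

```python
def split_high_low(channel):
    # Split one colour plane into the values above its average (the "high"
    # image, e.g. RH) and the values at or below it (the "low" image, e.g. RL).
    avg = sum(channel) / len(channel)
    high = [v for v in channel if v > avg]
    low = [v for v in channel if v <= avg]
    return high, low
```

Applying this to each of the R, G and B planes yields the six component images on which the colour moments are then computed.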
In another approach, [XZY09] applied quotient space granularity theory to image retrieval. Image retrieval based on quotient space granularity theory includes two parts: first, attribute features are obtained by describing images under different granularities; then, by combining the attribute functions at different levels, a combined attribute feature is formed. In their approach, a global color feature based on the histogram and a local color feature based on DCT and SVD were considered as attributes under different granularities to describe image color, and the attribute functions under different granularities were combined to realize image retrieval. Their experiments report the precision of 100 random queries at a recall of 60%.
[SSM09] proposed a method in which an invariant color histogram technique was used for image retrieval. In their approach, gradients in different channels weight the influence of a pixel on the histogram in order to cancel out the changes induced by deformations. Their experimental results showed that the system was able to retrieve the closest match when a rotated image was given as input. To reduce the large variations between neighboring bins of conventional color histograms, fuzzy color histograms consisting of only ten bins were adopted. Moreover, their system was less sensitive to various changes in the images, such as lighting variations and noise.
Recently, Rachid et al. [RSM09] proposed an image retrieval method based on a spatial color indexing methodology using two descriptors. The first descriptor, CSE (Color Spatial Entropy), provides a mechanism to transform local histogram features into an entropy feature; the second descriptor, CHE (Color Hybrid Entropy), integrates the spatial relationships of colors in multi-resolution images. The multi-resolution images were obtained with different filters, such as mean, median, Laplacian and Gaussian filters of different sizes. Experimental results indicated that the proposed system was quite robust and provided high precision, but took more querying time than the local histogram system. The multi-resolution local histograms using median filtering are shown in Figure 2.4.
Figure 2.4 : Multiresolution Local Histograms
Abdelhamid Abdesselam et al. [AWN10] proposed a CBIR system in which a spiral representation of color was extracted from the image to retrieve rotated as well as scaled images. To extract the spiral representation of the color content, the image underwent a labeling process that assigns to every pixel the closest color from a predefined color table. The output, a Color Cluster Mapping (CCM) image, was equally divided into m x m sub-areas and a Single Color Mapping (SCM) image was derived for each predefined color. The Spiral Bit-string Representation of Color (SBRC) was helpful in capturing the color information of an image as well as its spatial distribution. Their experiments showed that the use of SBRC in retrieving rotated images was efficient and could easily be incorporated into real-life CBIR systems. The disadvantage of their system was that the algorithm used for sub-image retrieval had to be optimized before it could be used in a real-life CBIR system.
H. B. Kekre et al. [KST10] presented an image retrieval technique based on augmentation of colour averaging. The reflection of the original image was taken across the horizontal and vertical directions to obtain a flipped image, and the even part of the image was obtained by adding the original and flipped images. In their experiments, the combination of the original image with its even part performed better than retrieval using the original image alone, while the combination of the original image with the odd part showed the worst results. The colour averaging techniques row & column mean (RCM), forward diagonal mean (FDM) and row, column & forward diagonal mean (RCFDM) were therefore applied to the original image and the even part of the image. The techniques were tested on a generic image database of 1000 images spread across 11 categories; for each proposed CBIR technique, 55 queries (5 per category) were fired at the image database. Their experimental results showed improved performance compared to feature vectors from the original image alone.
Figure 2.5 a): Example of Row & Column Mean (RCM) vector
Figure 2.5 b): Example of Forward Diagonal Mean (FDM) vector
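The RCM vector illustrated above can be sketched for a single colour plane as:

```python
def row_column_mean(image):
    # RCM feature vector: the mean of every row followed by the mean of
    # every column of a single colour plane.
    row_means = [sum(row) / len(row) for row in image]
    col_means = [sum(image[y][x] for y in range(len(image))) / len(image)
                 for x in range(len(image[0]))]
    return row_means + col_means
```

For an M x N plane this gives a compact M + N element vector, which is why the colour averaging family of features is so cheap to compare.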
In another approach, H. B. Kekre et al. [KD10] used the Walsh transform to generate the feature vector for content-based image retrieval. The complex Walsh transform was conceived by multiplying the Sal functions by j and combining them with the Cal functions of the same sequence. The angle, calculated as the inverse tangent of sal/cal in the range 0 to 360 degrees, was divided into 4 sectors. The means of the real and imaginary values of these sectors in all three color planes were considered to design a feature vector of 24 components altogether. The Walsh transform of the color image was calculated in all three R, G and B planes. The experiment was carried out over a database of 270 images spread over 11 different classes, with the Euclidean distance used as the similarity measure. Average precision and recall were calculated for the performance evaluation; the overall average of the cross-over points of precision and recall is above 50%.
2.2.2 Texture
Texture refers to the visual patterns that have properties of homogeneity that
do not result from the presence of only a single color or intensity. It is an innate
property of virtually all surfaces, including clouds, trees, bricks, hair, and fabric. It
contains important information about the structural arrangement of surfaces and their
relationship to the surrounding environment. Because of its importance and usefulness in pattern recognition and computer vision, rich research results have accumulated over the past three decades, and many CBIR systems using texture features have been proposed.
In the early 1970s, Haralick et al. proposed the co-occurrence matrix
representation of texture features [HSD73]. This approach explored the gray level
spatial dependence of texture. It first constructed a co-occurrence matrix based on the
orientation and distance between image pixels and then extracted meaningful statistics
from the matrix as the texture representation.
Many other researchers followed the same line and proposed enhanced versions. For example, Gotlieb and Kreyszig studied the statistics originally proposed in [HSD73] and found experimentally that contrast, inverse difference moment, and entropy had the greatest discriminatory power.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
41
Motivated by psychological studies of human visual perception of texture, Tamura et al. explored texture representation from a different angle [TMY78]. They developed computational approximations to the visual texture properties found to be important in psychological studies. The six visual texture properties were coarseness, contrast, directionality, linelikeness, regularity, and roughness. One
major distinction between the Tamura texture representation and the co-occurrence
matrix representation is that all the texture properties in Tamura representation are
visually meaningful, whereas some of the texture properties used in co-occurrence
matrix representation may not be (for example, entropy). This characteristic makes
the Tamura texture representation very attractive in image retrieval.
In the early 1990s, after the wavelet transform was introduced and its
theoretical framework was established, many researchers began to study the use of the
wavelet transform in texture representation [LF93] [GKL94] [TNP94].
In [SC94] Smith and Chang used the statistics (mean and variance) extracted
from the wavelet subbands as the texture representation. This approach achieved over
90% accuracy on the 112 Brodatz texture images. To explore the middle-band
characteristics, a tree-structured wavelet transform was used by Chang and Kuo in
[CK93] to further improve the classification accuracy. The wavelet transform was
also combined with other techniques to achieve better performance. Gross et al. used
the wavelet transform, together with KL expansion and Kohonen maps, to perform
texture analysis in [GKL94]. Thyagarajan et al. [TNP94] and Kundu et al. [KC92]
combined the wavelet transform with a co-occurrence matrix to take advantage of
both statistics-based and transform-based texture analyses.
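One level of a Haar wavelet decomposition and the per-subband statistics used by Smith and Chang can be sketched as follows; even image dimensions are assumed, and the 1/4 normalization is one common convention:

```python
def haar2d(image):
    # One-level 2-D Haar transform: average (LL) plus horizontal (LH),
    # vertical (HL) and diagonal (HH) detail subbands.
    ll, lh, hl, hh = [], [], [], []
    for y in range(0, len(image), 2):
        rll, rlh, rhl, rhh = [], [], [], []
        for x in range(0, len(image[0]), 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            rll.append((a + b + c + d) / 4.0)
            rlh.append((a - b + c - d) / 4.0)
            rhl.append((a + b - c - d) / 4.0)
            rhh.append((a - b - c + d) / 4.0)
        ll.append(rll); lh.append(rlh); hl.append(rhl); hh.append(rhh)
    return ll, lh, hl, hh

def subband_stats(band):
    # Mean and standard deviation of one subband, as in [SC94].
    vals = [v for row in band for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var ** 0.5
```

Recursing on the LL subband gives further levels, and collecting (mean, std) over all subbands yields the texture feature vector.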
Ma and Manjunath [MM95] evaluated the texture image annotation by various
wavelet transform representations, including orthogonal and bi-orthogonal wavelet
transforms, the tree-structured wavelet transform, and the Gabor wavelet transform.
They found that the Gabor transform performed best among the tested candidates, which matches human vision study results [SC96].
Tomasz [TA05] proposed Gabor filters for extracting the texture features needed to characterize images in a database. First, image normalization was performed to extract salient points and texture features that do not change with contrast and illumination. Multichannel Gabor filtering and the idea of hierarchical image representation were then applied, and the texture features were computed only in a neighborhood of the salient points. Around the extracted salient points, Regions of Interest (ROIs) were created in which color histograms and Zernike moments were calculated. These features were used as the image description, and images were retrieved based on feature similarity, where the features of the query specification are compared with features from the image database to determine which images best match the given features.
In [VNS06] a shift-invariant complex directional filter bank (CDFB) was proposed for texture image retrieval. By combining the Laplacian pyramid and the CDFB, a new image representation with an overcomplete ratio of less than 8/3 was obtained. The coefficients of the directional subbands are used to form a feature vector for classification. The texture retrieval performance of the proposed representation was compared with those of conventional transforms including the Gabor wavelet, the contourlet and the steerable pyramid. The overcomplete ratio of the proposed complex directional pyramid is about twice that of the contourlet, and much lower than those of the other two transforms. The experiments show that the technique is able to retrieve images with an overall precision of 71.53%.
Figure 2.6: Frequency Partition of Directional Filter Banks: a) CDFB b) Gabor wavelets c) Contourlet Transform d) Steerable Pyramid
Conventional Gabor representations and the features extracted from them often perform fairly poorly in retrieving rotated and scaled versions of the texture image under query. To address this issue, Ju Han [HM07] proposed a method using rotation-invariant and scale-invariant Gabor representations, where each representation requires only a few summations over the conventional Gabor filter impulse responses. The experiment was conducted on the Brodatz data set. Features are extracted from these new representations for rotation-invariant or scale-invariant texture image retrieval. Since the dimension of the new feature space is much reduced, it requires much less metadata storage and allows faster on-line computation of the similarity measure. The system proved robust for rotation-invariant and scale-invariant texture image retrieval.
In another approach, An P. N. Vo et al. [VON08] proposed a new statistical model for modeling natural images in the transform domain. They demonstrated that the von Mises (VM) distribution accurately fits the behavior of relative phases in the complex directional wavelet subbands of different natural images. An image feature was extracted based on the VM model and used for texture image retrieval. Each image in the database was decomposed using three decompositions: the curvelet, the 2-D Gabor transform and the CDFB. The Gabor wavelet and the curvelet were applied with four scales and six orientations per scale, while the CDFB had three scales of eight orientations. For each subband, the mean and standard deviation of the absolute values of the coefficients were calculated, and the relative phase (RP) matrix of each complex subband was created to construct the RP-based feature vectors. For each query image, the N nearest neighbors are selected, and the number of retrieved textures belonging to the same class as the query texture, excluding the query itself, out of the fifteen other members of its class gives the retrieval rate; the performance for an entire class is obtained by averaging this rate over the sixteen members belonging to the same texture class. The texture database used in the experiment contains 40 images from the VisTex database; each 512 × 512 image is divided into sixteen 128 × 128 non-overlapping sub-images, creating a database of 640 texture samples. Their experimental results showed that the VM-based feature yielded higher retrieval accuracy than the energy features and the relative phase features.
N. Gnaneswara Rao et al. [GVV09] presented a CBIR system based on wavelet transformations, using the Haar wavelet transformation to capture local-level texture features. First, the texture feature vectors for all images in the database were computed using the Haar wavelet transformation, and the images were clustered on these feature vectors using a modified ROCK clustering algorithm. The feature vector for the query image was then computed and compared against the indexed database to identify the closest cluster for the query image and retrieve its images. The system operated only at the primitive feature level (i.e. texture) and failed to retrieve images using other features such as colour and shape.
Figure 2.7: Wavelet Decomposition
Jianhua Wu et al. [JZY10] combined color and texture features for content-based image retrieval. The texture information of the image is represented by the Dual-Tree Complex Wavelet Transform (DT-CWT) and a rotated wavelet filter (RWF), while color histograms in the RGB and HSV color spaces are chosen as the color feature. The experimental results show that their method is efficient for simple images but fails to retrieve images that are complex in nature.
In the approach of Balamurugan et al. [BAN10], the colour feature of each query image was extracted and a quadratic distance measure was computed to retrieve similar images from the database. The 2D-DWT was applied to the query and database images, and the texture features of the query and database images were compared using the Euclidean distance measure. The images with similar texture features were selected and sorted using an integrated approach: the relevant images from the color-based and texture-based retrieval were compared to obtain the most relevant images from the database. Their system was able to retrieve relevant images without any transformation, with a precision-recall rate of 98%.
More recently, Kavitha et al. [KBP11] proposed a method in which both the query image and the images in the database were divided into 6 equal blocks. For each block an HSV colour histogram was constructed, statistical features (energy, contrast, entropy and inverse difference) were obtained from the GLCM, and a combined feature vector for color and texture was constructed. To obtain similar images from the database, the normalized Euclidean distance between the feature vector of the query image and the feature vectors of the target images was used. Their system retrieved the first 20 similar images with minimum Euclidean distance.
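The normalized Euclidean matching step can be sketched as follows; normalizing each dimension by its standard deviation over the database is one common convention, assumed here:

```python
def normalized_euclidean(f1, f2, stddevs):
    # Euclidean distance with each feature dimension scaled by its
    # standard deviation over the database, so no dimension dominates.
    return sum(((a - b) / s) ** 2
               for a, b, s in zip(f1, f2, stddevs)) ** 0.5

def top_k(query, database, stddevs, k=20):
    # Return the k database feature vectors closest to the query.
    ranked = sorted(database,
                    key=lambda f: normalized_euclidean(query, f, stddevs))
    return ranked[:k]
```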
Zhengli Zhu et al. [ZCY11] presented an approach for rotation-invariant texture image retrieval based on combined feature sets. The approach relied on nonsubsampled contourlet coefficients together with the angular second moment, moment of inertia, inverse difference moment and correlation of the grey-level co-occurrence matrices of the images. The nonsubsampled contourlet transform provides anisotropy and translation invariance, while the grey-level co-occurrence matrix reflects information about direction, adjacency spacing, and range of variation; the combined feature sets therefore capture more image information. Their experimental results show that the system was able to retrieve rotated images with a retrieval accuracy of 76.31%.
Figure 2.8: Two-level Nonsubsampled Contourlet Transform Decomposition
2.2.3 Shape
Shape is one of the most commonly used visual features in computer vision
and multimedia applications. Shape is relatively robust to background complication
and independent of image size and orientation. In general, the shape representations
can be divided into two categories, boundary-based and region-based. The former
uses only the outer boundary of the shape while the latter uses the entire shape region
[RSH96]. Many shape-based image retrieval systems have been proposed in the literature.
The Fourier descriptor uses the Fourier-transformed boundary as the shape feature. To take into account digitization noise in the image domain, Rui et al. proposed a modified Fourier descriptor that is both robust to noise and invariant to geometric transformations [RSH96]. Moment invariants, computed over the shape region and invariant to transformations, have also been used as shape features.
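A minimal Fourier-descriptor sketch on a closed boundary; the normalization choices here (dropping the DC term for translation invariance, dividing by the first harmonic for scale invariance, keeping magnitudes for rotation invariance) are standard but illustrative, not the modified descriptor of [RSH96]:

```python
import cmath

def fourier_descriptor(boundary, k=3):
    # boundary: list of (x, y) contour points treated as complex numbers.
    z = [complex(x, y) for x, y in boundary]
    n = len(z)
    coeffs = [sum(z[t] * cmath.exp(-2j * cmath.pi * u * t / n)
                  for t in range(n)) / n for u in range(k + 1)]
    mags = [abs(c) for c in coeffs[1:]]   # drop DC term -> translation invariant
    scale = mags[0] or 1.0
    return [m / scale for m in mags]      # scale by 1st harmonic; magnitudes
                                          # discard phase -> rotation invariant
```

A change of starting point or an in-plane rotation of the contour only rotates the phases of the coefficients, so the magnitude descriptor is unchanged.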
In [MKL97] , Babu et al. compared the performance of boundary-based
representations (chain code, Fourier descriptor, UNL Fourier descriptor), region-
based representations (moment invariants, Zernike moments, pseudo-Zernike
moments), and combined representations (moment invariants and Fourier descriptor,
moment invariants and UNL Fourier descriptor).
In addition to 2D shape representations, there were many methods developed
for 3D shape representations. In [WM81], Wallace and Mitchell proposed using a
hybrid structural/statistical local shape analysis algorithm for 3D shape
representation.
Zhiyong Wang et al. [ZWCF00] presented a two-step approach of using a
shape characterization function called centroid-contour distance curve and the object
eccentricity (or elongation) for leaf image retrieval. Both the centroid-contour
distance curve and the eccentricity of a leaf image are scale, rotation, and translation
invariant after proper normalizations. In the first step, the eccentricity was used to
rank leaf images, and the top scored images were further ranked using the centroid-
contour distance curve together with the eccentricity in the second step. A thinning-based method was used to locate the start point(s) in order to reduce the matching time. The method retrieves images based on global shape information.
Image characterization with fewer features and lower computational cost is always desirable, and the edge is a strong feature for characterizing an image. Minakshi Banerjee presented a technique for extracting the edge map of an image, followed by the computation of a global feature (fuzzy compactness) using the gray level as well as the shape information of the edge map [MK03]. Unlike other existing techniques, it does not require pre-segmentation for the computation of features.
C. Sheng et al. [SX05] used a different measure of shape similarity for image retrieval, based on the shape matrix, which is invariant and unique under rigid motions. Combined with the snake model, the original template shape is deformed to adjust itself to the shape images; the deformation spent by the original template in matching the shape images and the degree of matching are used to evaluate the similarity between them.
S. Arivazhagan used the Canny operator to detect the edge points of the image
[AGS07]. The contour of the image was traced by scanning the edge image, and
re-sampling was done to avoid discontinuities in the contour representation. The
resulting image was swept line by line, and the neighborhood of every pixel was
explored to detect the number of surrounding points and to derive the shape features.
The approach of Shan Li et al. [LLP09] is based on a set of orthogonal
complex moments of images known as Zernike moments (ZMs). Because rotating an
image affects its ZM phase coefficients, existing proposals normally use
magnitude-only ZMs as the image feature. Shan Li et al. combined both the
magnitude and the phase coefficients to form a new shape descriptor, referred to
as the invariant ZM descriptor (IZMD). Scale and translation invariance of the
IZMD were obtained by pre-normalizing the image using geometric moments. To make
the phase invariant to rotation, a phase correction was performed while
extracting the IZMD features. Experimental results showed that the proposed
shape feature is, in general, robust to changes caused by rotation, translation,
and/or scaling of the image shape, and performed better than the commonly used
magnitude-only ZMD.
Figure 2.9: (a) Original image. (b) Reconstructed image using ZMs. (c) Reconstructed image using complex ZMs.
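The rotation invariance of ZM magnitudes can be demonstrated with a short numerical sketch. This is a textbook-style implementation of a single Zernike moment, not the IZMD of [LLP09]; the per-pixel averaging used as normalization is my own choice and does not affect the invariance property.

```python
import numpy as np
from math import factorial

def zernike_moment(img, n, m):
    """Complex Zernike moment Z_nm of a square grayscale image.

    The image is mapped onto the unit disk; pixels outside the disk
    are ignored. Rotating the image by an angle a multiplies Z_nm by
    exp(-j*m*a), so the magnitude |Z_nm| is rotation invariant.
    """
    N = img.shape[0]
    y, x = np.mgrid[0:N, 0:N]
    # Map pixel indices to the square [-1, 1] x [-1, 1].
    x = (2 * x - N + 1) / (N - 1)
    y = (2 * y - N + 1) / (N - 1)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0
    # Radial polynomial R_nm(rho).
    R = np.zeros_like(rho)
    for k in range((n - abs(m)) // 2 + 1):
        c = ((-1) ** k * factorial(n - k)
             / (factorial(k)
                * factorial((n + abs(m)) // 2 - k)
                * factorial((n - abs(m)) // 2 - k)))
        R += c * rho ** (n - 2 * k)
    V = R * np.exp(-1j * m * theta)
    return (n + 1) / np.pi * np.sum(img[mask] * V[mask]) / mask.sum()
```

Rotating the image by 90 degrees (an exact rotation on this sampling grid) leaves the magnitude unchanged, while the phase shifts by m times the rotation angle.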
2.3 Content-Based Image Retrieval using Discrete Cosine Transform
Lay et al. [LG99] proposed energy histograms of the low-frequency DCT
coefficients as image features. Since the low-frequency DCT coefficients carry
most of the energy in a DCT block, six blocks containing different combinations
of low-frequency DCT coefficients are selected, and the six DCT coefficient
histograms extracted from these blocks serve as the image features.
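The building blocks of such DCT-based features can be sketched as below. This is a generic illustration, assuming an orthonormal 8x8 DCT-II and a simple energy histogram over blocks; the exact coefficient groupings of [LG99] are not reproduced.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (no SciPy needed)."""
    N = block.shape[0]
    k = np.arange(N)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
    C *= np.sqrt(2.0 / N)
    C[0, :] = np.sqrt(1.0 / N)   # DC basis row
    return C @ block @ C.T

def low_freq_energy_histogram(img, n_bins=8):
    """Histogram of the low-frequency AC energy of the 8x8 DCT blocks
    of a grayscale image (a simplified reading of the energy-histogram
    idea)."""
    h, w = (s - s % 8 for s in img.shape)
    energies = []
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            d = dct2(img[i:i + 8, j:j + 8])
            lf = d[:3, :3].ravel()[1:]   # top-left region minus DC
            energies.append(np.sum(lf ** 2))
    hist, _ = np.histogram(energies, bins=n_bins)
    return hist / hist.sum()
```

Because the transform is orthonormal, it preserves energy (Parseval), which is what makes "energy carried by the low-frequency coefficients" a meaningful quantity.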
H. Yu [HY99] introduced a Q-metric on the DCT coefficients to measure
image similarity. A sub-image comprising all the DCT coefficients at a given
position of the 8×8 blocks is generated for each position, and the wavelet
transform is applied. When calculating the distance between a query and a target
image, the Q-metric counts the total number of coefficients that are higher than
a predefined threshold in the corresponding sub-images of both the query and the
target images. The overall distance is a weighted summation of the counts from
all sub-images. The Q-metric thus measures how many significant coefficients the
two images have in common.
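The counting step can be sketched as follows. This is my reading of the description above (counting positions that are significant in both images and weighting per sub-image); it omits the wavelet transform stage and is not the exact metric of [HY99].

```python
import numpy as np

def q_metric(query_subs, target_subs, weights, thresh=1.0):
    """Weighted count of coefficient positions that are significant
    (above `thresh` in magnitude) in BOTH images, summed over the
    corresponding sub-images. Higher scores mean the two images share
    more significant coefficients."""
    score = 0.0
    for q, t, w in zip(query_subs, target_subs, weights):
        common = (np.abs(q) > thresh) & (np.abs(t) > thresh)
        score += w * np.count_nonzero(common)
    return score
```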
Ngo et al. [NPC01] developed an image indexing technique via
reorganization of the DCT coefficients in the Mandala domain, representing the
color, shape, and texture features in the compressed domain. Their work
demonstrated advantages in indexing speed, but at the cost of significantly
reduced retrieval accuracy. As the DCT compacts the image energy into the
lower-order coefficients, they considered only the first nine AC coefficients of
each 8x8 DCT block, and the variance of these nine AC coefficients was used to
index the image. Although a minimal number of features is always a desirable
property for characterizing images, a single feature failed to achieve the
desired accuracy.
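The per-block index value can be sketched as below, assuming the first nine AC coefficients are taken in JPEG zigzag order; the function name is my own, not from [NPC01].

```python
import numpy as np

def zigzag_indices(n=8):
    """JPEG-style zigzag scan order of an n x n block: traverse the
    anti-diagonals, alternating direction."""
    s = lambda p: p[0] + p[1]
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (s(p), p[0] if s(p) % 2 else p[1]))

def ngo_block_feature(dct_block):
    """Single index value per 8x8 DCT block: the variance of the first
    nine AC coefficients in zigzag order."""
    zz = zigzag_indices(8)
    ac9 = np.array([dct_block[i, j] for i, j in zz[1:10]])
    return ac9.var()
```

Collapsing each block to one variance value is what makes the index compact, and also why the paragraph above notes that a single feature could not reach the desired accuracy.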
In [NS04] the features were extracted directly in the DCT domain. For
each 8x8 block of a color image in the DCT domain, a feature vector was
extracted. The k-means algorithm was then applied to cluster the feature vectors
of all the blocks of an image into groups, each cluster representing a distinct
object of the image. After clustering, the clusters with the largest membership
are selected, and the centroids of the selected clusters are taken as feature
vectors and indexed into the database. This increases the size of the feature
database and takes considerable time to index an image. Since judging image
retrieval results is a subjective matter, an evaluation of retrieval performance
was not reported.
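The cluster-then-keep-largest step can be sketched with a minimal k-means. This is a generic illustration of the [NS04] idea under my own simplifications (plain Lloyd iterations, random initialization); it is not the authors' code.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means (Lloyd's algorithm); returns centroids, labels."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            pts = X[labels == c]
            if len(pts):
                centroids[c] = pts.mean(axis=0)
    return centroids, labels

def block_signature(block_features, k=3, n_keep=2):
    """Cluster per-block feature vectors and keep the centroids of the
    n_keep largest clusters as the image signature."""
    centroids, labels = kmeans(block_features, k)
    sizes = np.bincount(labels, minlength=k)
    top = np.argsort(sizes)[::-1][:n_keep]
    return centroids[top]
```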
Lu et al. [LB05] proposed a new method to extract a vector quantization
index histogram from the DCT coefficients. First, the 64 DCT coefficients in
each 8 × 8 block are divided into four groups. For each color channel (Y, Cb,
Cr), four codebooks are trained from a randomly selected training image set.
After obtaining these 12 codebooks, the images in the data set are processed in
the same manner as the training images: the DCT sequences are divided into 12
groups and encoded with the corresponding codebooks, and the indexes of the DCT
coefficients from each codebook are jointly assembled into the quantization
index histograms.
Lu et al. [LLB06] approximated the pixel-domain color and texture
features directly from the DCT coefficients. The color features of the image are
calculated directly from the DCT coefficients by partially decoding the JPEG
image. Each 8 × 8 DCT block is divided into four sub-blocks, as shown in Figure
2.10, whose average color values are denoted M11, M12, M21, M22; these values
are approximated from the four upper-left coefficients of the 8 × 8 DCT block.
The texture information is a vector extracted from selected DCT coefficients in
six groups: group 1 is the DC coefficient, groups 2 and 3 carry frequency
information, and groups 4, 5, and 6 carry the vertical, horizontal, and diagonal
direction information. The mean and the standard deviation of all the
coefficients in each group are extracted as the texture features. The Euclidean
distance was used to evaluate the distance between the query image and the
images in the database.
Figure 2.10: Four sub-blocks of the 8x8 DCT block
In another approach, Tsai et al. [TCH08] proposed distance threshold
pruning (DTP) to alleviate the computational burden of CBIR without sacrificing
accuracy. In their approach, the images are converted into the YUV color space
and then transformed into discrete cosine transform (DCT) coefficients.
Benefiting from the energy-compaction property of the DCT, only the
low-frequency DCT coefficients of the Y, U, and V components are stored. When
querying an image, at the first stage the DTP serves as a filter to remove those
candidates with widely distinct features; at the second stage, a detailed
similarity comparison (DSC) is performed on the remaining candidates that pass
through the first stage. The experimental results showed that both high efficacy
and a high data-reduction rate can be achieved simultaneously with this
approach.
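The two-stage structure can be sketched as follows. This is a generic illustration of threshold pruning followed by detailed comparison; the coarse L1 pruning distance and the function names are my own choices, not the exact criteria of [TCH08].

```python
import numpy as np

def dtp_search(query_feat, db_feats, prune_thresh, top_k=5,
               detailed_distance=None):
    """Two-stage retrieval: a cheap distance-threshold pruning (DTP)
    pass removes clearly dissimilar candidates, then a detailed
    similarity comparison (DSC) ranks only the survivors."""
    if detailed_distance is None:
        detailed_distance = lambda a, b: float(np.linalg.norm(a - b))
    # Stage 1: prune candidates whose coarse (L1) distance exceeds
    # the threshold.
    survivors = [i for i, f in enumerate(db_feats)
                 if np.abs(query_feat - f).sum() <= prune_thresh]
    # Stage 2: detailed comparison on the survivors only.
    ranked = sorted(survivors,
                    key=lambda i: detailed_distance(query_feat, db_feats[i]))
    return ranked[:top_k]
```

The saving comes from the fact that the expensive `detailed_distance` is evaluated only on the candidates that survive the cheap first pass.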
Kekre et al. [KST09] proposed another technique for image retrieval using
color and texture features extracted from images with vector quantization based
on Kekre's fast codebook generation, which gave better discrimination capability
for Content-Based Image Retrieval (CBIR). The database image is divided into 2x2
pixel windows to obtain 12 color descriptors per window (red, green, and blue
per pixel), which form a vector; the collection of all such vectors is a
training set. Kekre's Fast Codebook Generation (KFCG) is then applied to this
set to obtain 16 code vectors. The Discrete Cosine Transform (DCT) is applied to
these code vectors after converting them to a column vector, and the transformed
vector is used as the image signature (feature vector) for retrieval. The method
requires fewer computations than a conventional DCT applied to the complete
image, gives the color-texture features of the image database at a reduced
feature set size, and avoids the resizing of images that is required by other
transform-based feature extraction methods.
The image database used in the experiments in [KST09] is a subset of the
Columbia Object Image Library (COIL-100) [NNM96], which contains 100 different
objects (classes), each rotated through 72 different angles, resulting in 7200
images. To test the system, 15 categories of images (1080 images in total) were
randomly selected from the COIL-100 database [NNM96]; the net average
precision/recall over the retrieved images for all categories was 82%.
Recently, H.B. Kekre et al. [KTA10] proposed image retrieval based on
feature vectors formed from fractional coefficients of transformed images using
the DCT and Walsh transforms. The feature vector size per image is greatly
reduced by taking fractional coefficients of the transformed image. In addition
to the full set of coefficients of the transformed image, fourteen reduced
coefficient sets (50%, 25%, 12.5%, 6.25%, 3.125%, 1.5625%, 0.7813%, 0.39%,
0.195%, 0.097%, 0.048%, 0.024%, 0.012% and 0.06% of the complete transformed
image) are considered as feature vectors. The two transforms are applied to the
gray-level equivalents and to the color components of the images to extract gray
and RGB feature sets, respectively. Using these fourteen reduced coefficient
sets for the gray as well as the RGB feature vectors, instead of all the
coefficients of the transformed images, resulted in better performance and lower
computation. The results showed a performance improvement (higher precision and
recall values) with fractional coefficients compared to the complete transform
of the image, at reduced computation and hence faster retrieval. Finally, the
Walsh transform surpasses the DCT in performance, with the highest precision and
recall values for fractional coefficients and the minimum number of computations
down to 0.097%, after which the DCT takes over.
Figure 2.11: The fractional coefficients of transformed images
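The idea of keeping only a fraction of the transformed coefficients can be sketched as below. This is one simple reading of "fractional coefficients" (keeping the upper-left block whose area is the requested fraction, where the transform concentrates the energy); [KTA10] may partition the coefficients differently.

```python
import numpy as np

def fractional_coefficients(T, p):
    """Feature vector from the upper-left block of a transformed
    (e.g. DCT or Walsh) image T, containing roughly a fraction p of
    all coefficients."""
    h = max(1, int(round(T.shape[0] * np.sqrt(p))))
    w = max(1, int(round(T.shape[1] * np.sqrt(p))))
    return T[:h, :w].ravel()
```

For p = 0.25 this keeps half the rows and half the columns, i.e. a quarter of the coefficients, which shrinks the feature vector and every subsequent distance computation by the same factor.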
The system in [ASN11] exploits both the global and the regional features
of the images, extracted using the DCT and central moments. The idea is to
decouple the color components of the image using the YCbCr space, transform them
into DCT coefficients, and measure similarity on them; in addition, a
region-based shape descriptor is extracted by computing the central moments of
the image, with the help of edge information and morphological operations, to
obtain normalized feature vectors. The similarity is then calculated on the
combined features of quantized DCT color coefficients and normalized moment
feature vectors to retrieve images.
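Central moments themselves are standard and can be sketched directly; the sketch below shows why they are translation invariant (coordinates are measured from the intensity centroid) and how scale normalization is usually added. The edge and morphology stages of [ASN11] are omitted.

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a grayscale image; translation invariant
    because coordinates are taken relative to the intensity centroid."""
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xbar = (x * img).sum() / m00
    ybar = (y * img).sum() / m00
    return (((x - xbar) ** p) * ((y - ybar) ** q) * img).sum()

def normalized_central_moment(img, p, q):
    """Scale-normalized moment eta_pq = mu_pq / mu_00^(1 + (p+q)/2)."""
    mu00 = central_moment(img, 0, 0)
    return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)
```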
The existing research has mainly used the global features of an image for
image retrieval. The drawback of global features is that much of the detailed
information in the images is lost, for instance when looking for images that
contain the same object or the same scene viewed from different viewpoints. They
are therefore usually not suitable for partial image matching, or for object
recognition and retrieval in cluttered and complex scenes. To overcome this
problem, local features are used, which are extracted only from regions of
interest or objects in the image, without segmentation.
Well-known local features include SIFT [GL99], PCA-SIFT [KR04], GLOH
[MS05], SURF [HAT08], etc. Local features were initially proposed to solve
problems in computer vision applications such as wide-baseline matching [SZ02],
object recognition [FTV04], texture recognition [LSP03], robot localization
[LL02], video data mining [SZ03], building panoramas [BL03], and recognition of
object categories [DS03] [FPZ03]. In recent years, local features have been
employed in many CBIR systems.
2.4 Content-Based Image Retrieval using Scale Invariant Feature Transform
Xu Wangming et al. [WJX08] used SIFT in a CBIR system. In this system,
the visual contents of the query image and the database images are extracted and
described by 128-dimensional SIFT feature vectors. A KD-tree with best bin first
(BBF), an approximate nearest-neighbor (ANN) search algorithm, was used to index
and match the SIFT features. A modified voting scheme called nearest-neighbor
distance ratio scoring (NNDRS) was used to calculate the aggregate scores of the
corresponding candidate images in the database; after sorting the database
images by their aggregate scores in descending order, the top few similar images
were retrieved as results.
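The distance-ratio idea behind such scoring can be sketched with a brute-force matcher. This is a stand-in for the KD-tree/BBF search of [WJX08]: it applies Lowe's nearest-neighbor distance ratio test, accepting a match only when the nearest neighbor is clearly closer than the second nearest.

```python
import numpy as np

def nndr_match(query_desc, db_desc, ratio=0.8):
    """Match descriptors with the nearest-neighbor distance ratio test.

    query_desc: iterable of descriptor vectors; db_desc: (M, D) array.
    A query descriptor is matched to its nearest database descriptor
    only if d1 / d2 < ratio, where d1 and d2 are the distances to the
    first and second nearest neighbors. Returns (query_idx, db_idx)
    pairs.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        d = np.linalg.norm(db_desc - q, axis=1)
        i1, i2 = np.argsort(d)[:2]
        if d[i1] < ratio * d[i2]:
            matches.append((qi, int(i1)))
    return matches
```

The ratio test discards ambiguous matches: when two database descriptors are almost equally close, neither can be trusted, so no vote is cast for either candidate image.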
In another system, Sathya Bama et al. [SMRA11] used SIFT in a
computer-aided plant image retrieval method based on leaf images, using shape,
color, and texture features, aimed at the medical, botanical gardening, and
cosmetic industries. The HSV color space was used to extract the various color
features of the leaves, the Log-Gabor wavelet was applied to the input image for
texture feature extraction, and the Scale Invariant Feature Transform (SIFT) was
incorporated to extract the feature points of the leaf image.
Anil Balaji Gonde et al. [AMB11] used the Scale Invariant Feature
Transform (SIFT) for image retrieval. SIFT was used to extract the local
features of an image; such features can be extracted more accurately with SIFT
than with color, texture, shape, or spatial relations. The SIFT descriptor
vectors of each image were indexed using a vocabulary tree. Further, a relevance
feedback technique was used to bridge the gap between low-level features and
high-level concepts. The proposed method showed a significant improvement in
average precision and average recall rates.
2.5 Content-Based Image Retrieval using Histogram of Oriented Gradients
Histogram of Oriented Gradients (HOG) [DT05] is a feature descriptor used
in computer vision and image processing for the purpose of object detection. The
technique counts occurrences of gradient orientations in localized portions of
an image. It is similar to edge orientation histograms and scale-invariant
feature transform descriptors, but differs in that it is computed on a dense
grid of uniformly spaced cells and uses overlapping local contrast normalization
for improved accuracy.
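The core of the descriptor, the per-cell orientation histogram on a dense grid, can be sketched as below. This is a simplified illustration of [DT05]: it uses unsigned orientations and hard bin assignment, and omits the block-level contrast normalization mentioned above.

```python
import numpy as np

def hog_cells(img, cell=8, n_bins=9):
    """Gradient-orientation histograms on a dense grid of cells.

    Each pixel votes its gradient magnitude into one of n_bins unsigned
    orientation bins (0-180 degrees) of the cell it falls in.
    Returns an array of shape (rows_of_cells, cols_of_cells, n_bins).
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    h, w = (s // cell for s in img.shape)
    hists = np.zeros((h, w, n_bins))
    for i in range(h):
        for j in range(w):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            bins = (a / (180.0 / n_bins)).astype(int) % n_bins
            for b, wgt in zip(bins, m):
                hists[i, j, b] += wgt
    return hists
```

On a horizontal intensity ramp, every pixel has a purely horizontal gradient, so all the weight falls into the first orientation bin of each cell.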
Megha Agarwal et al. [MM10] applied the HOG feature descriptor to
Content-Based Image Retrieval (CBIR). To measure similarity over a large
database, a vocabulary tree was used. They performed a comparative analysis of
retrieval systems based on the HOG feature descriptor and on the Gabor transform
feature descriptor: the HOG-based retrieval system improved the Average
Precision (AP) and Average Recall (AR) (56.75% and 38.45%, respectively) over
the Gabor-transform-based retrieval system (41.20% and 25.41%, respectively).
The experiment was performed on the Corel 1000 natural image database.
2.6 Conclusion
In this chapter, the low-level global image features used in CBIR systems
were reviewed, along with the related work using local features for image
retrieval. It was found that some CBIR systems use local features for image
retrieval, and that local features are widely used in many computer vision
applications. The proposed systems are therefore implemented using local
features, since local features are robust and invariant to transformations.