Dottorato di Ricerca in Informatica
XIII ciclo
Università di Salerno
Compression and Indexing of Digital Images
Riccardo Distasi
December 19, 2001
Coordinatore: Prof. A. De Santis
Relatore: Prof. G. Tortora
Contents
Front Matter
    Title Page
    Table of Contents
    List of Figures
    List of Tables
Thanks!
1 Introduction
    1.1 Guided Tour of this Thesis
2 Speeding Up Fractal Coding: Split Decision Functions
    2.1 Background
    2.2 Split-Decision Functions
    2.3 Experimental Results
    2.4 Conclusions
3 Fractal Indexing with Robust Extensions
    3.1 Background
    3.2 The Technique
        3.2.1 Basics of IFS image encoding
        3.2.2 The Index
    3.3 Invariance and Robustness
        3.3.1 Contrast Scaling
        3.3.2 Luminance Shifting
        3.3.3 Color Change
        3.3.4 Rotations and Reflections
    3.4 Experimental Results
    3.5 Conclusions
4 A Hierarchical Representation for Image Retrieval
    4.1 Background
    4.2 HER
        4.2.1 Computing Time Evaluation
    4.3 HER for Contours
        4.3.1 Properties of HER
    4.4 HER for Textures
        4.4.1 Invariance Properties
        4.4.2 Experimental Results
        4.4.3 Comparison with a Wavelet Based Method
    4.5 Experimental Results
    4.6 Concluding Remarks
References
A Additional Details
    A.1 Fractal Index Invariance to Contrast Scaling
        A.1.1 Center of Mass
        A.1.2 Higher Deviates
    A.2 Fractal Index Invariance to Luminance Shifting
        A.2.1 Center of Mass
Index
List of Figures
2.1 Rate-distortion curves of the 512 × 512 lena image obtained using a 4-level quadtree
3.1 Obtaining the canonical form of an image. (a) Original image with key points highlighted; (b) Image flipped vertically; (c) Final image rotated into canonical form
3.2 Compression to SNR comparison between first and fire
3.3 AVRR comparison between first, fire and PicToSeek. The horizontal line represents ideal AVRR
3.4 Querying fire: contrast scaling invariance and robustness
3.5 Querying fire: luminance shifting combined with rotation and reflection
3.6 Querying fire: color change
3.7 Querying fire: rotations and reflections
3.8 Querying fire: elephants
3.9 Querying PicToSeek: elephants
4.1 A high level sketch of the her algorithm
4.2 Obtaining the her representation of a real-life 1-D input signal
4.3 Sampling a 2-D contour at fixed angle increments can destroy information
4.4 Sampling a 2-D contour pixel-by-pixel
4.5 The correspondence between a contour and its her representation
4.6 Approximation of a shape by the first M coefficients of its dft
4.7 Approximation of a shape by its M largest maxima using her
4.8 Converting a 2-D texture into a 1-D time series. (A): what the texture looks like; (B): the partition element (texture tile); (C): the spiral; (D): the resulting 1-D signal
4.9 Rotating the partition element yields different local maxima
4.10 A selection of 256 tiles from the complete texture database utilized for the experiments
4.11 Results of a sample query: Metal.0000 (Element #2 in Fig. 4.10)
4.12 Results of a sample query: Bark.0000 (Element #1 in Fig. 4.10)
4.13 Distances from Fabric.0001 (#36 in Fig. 4.10) to the closest 250 matches in the database
4.14 Relation of the energy fraction used to the number of maxima found (index size)
4.15 A graphical view of the outcome of texture-based retrieval performed on the extended Brodatz data set
4.16 An example of retrieval using heri
4.17 An example of retrieval using Euclidean Distance
4.18 An example of heri's ability to retrieve rotated versions of the query
4.19 An example of retrieval utilizing a moment-based technique
List of Tables
2.1 Speed-up achieved using adaptive entropy against rms on the 512 × 512 lena image
4.1 Invariance properties of her for contours
4.2 Invariance properties of her for textures
4.3 Tabular results of texture-based retrieval performed on the extended Brodatz data set
4.4 Quick comparison between hs and heat
4.5 Comparison between heri, Euclidean distance (ed) and a moment-based technique (mbt) in terms of normalized recall
4.6 Detailed comparative results for Euclidean Distance (ed), heri and moment-based technique (mbt)
Thanks!
I’d like to thank the people who, in one way or another, helped me to carry out my work.
First, a word of thanks is due to my advisor, prof. Genny Tortora, and to prof. Maurizio
Tucci. They make up the “there would be no thesis at all without them” category. They
gave me their trust, and I am very grateful for all the advice and encouragement they
provided. Next there is the “this thesis would be very different without them, and not for
the better” category: working with prof. Sergio Vitulano has been an enriching experience.
Several colleagues and friends also provided help in unique ways: in particular, the
long and frequent discussions with Michele Nappi were the nursery where many of the
ideas presented here were born or took shape.
Many more people helped, each in her or his own way. I cannot mention every single
name here, but you know who you are. Thanks.
Last but not least, a final word of thanks is due to my family, especially my parents.
Their constant support has been invaluable. They make up the most exclusive category:
“without them, there would be no thesis at all—and even the author would not be there.”
Chapter 1
Introduction
This thesis is about a topic that is very 'hot' today: the compression and indexing of
digital images. Memory prices are dropping sharply, and that is even more true for
mass storage. When we additionally consider the effective compression techniques
available today, the cost of storing large databases of multimedia data, and
images in particular, has never been so low. However, as database size increases, the
main problem we face today is that of retrieval.
If we throw the Internet into the pot, we have even more examples of a large body of
image data which is accessible but not necessarily available to everyone, precisely because
the cost of effectively searching for a specific image is so high as to make retrieval nearly
impossible: in the worst case, we have to wade manually through a huge amount of data
obtained from some ‘image search’ engine.
In addition, it is important to devise systems that are usable by the general public.
This might seem obvious, but in fact many of the best working image indexing systems are
research prototypes, and as such have many parameters that must be accurately ‘tweaked’
in order to obtain the best results. Usability is a major goal, and not a trivial one at that.
For these reasons, finding more efficient ways to compress, index, and retrieve images
has become an active research field. Many researchers are investing their energies into the
quest of particular and general solutions to this important problem.
1.1 Guided Tour of this Thesis
Fractal image compression has proven to be a very effective way of achieving high compres-
sion ratios with very little distortion. The main problem with fractal-based techniques is
the long computing time required. Therefore, several acceleration methods are being
investigated. The careful choice of a split-decision function is one such method. Chapter 2
describes one possible choice, which is shown to achieve a significant speedup.
While being a useful compression technique, fractal coding can be successfully used to
index a database of images. The very same features that can describe an image in such
a way to achieve good compression—basically spatial relations among different image
regions—can be processed into a usable index. The resulting indexing system has inter-
esting properties: in particular, several useful invariances and the fact that the database
can be kept in (fractal-)compressed form throughout the whole process. This is addressed
in Chapter 3. The features that are relevant for fractal compression and indexing are not
easy for the human eye to see: self-similarity occurs at too small a scale, and the match
for the 'best' self-similarity (best in the rms sense) is computed with a least-squares best
fit whose results are often not apparent to the eye.
Another approach to indexing an image database, quite differently, focuses on specific
human-perceivable features, especially object contours and textures. Chapter 4 describes
a representation for 2-d data such as contours and textures that, besides having many
desirable invariance properties, has the nice characteristic of being reasonably predictable
in its operation. As a consequence, the human operator who is not an expert in the field of
image processing can easily relate to the method, and the resulting usability of the system
is significantly enhanced.
Chapter 2
Speeding Up Fractal Coding: Split Decision
Functions
2.1 Background
In fractal image compression, an image is partitioned into a set of image blocks called
ranges. A pool of larger image blocks called domains is used as a codebook from which
ranges are approximated with affine mappings of the intensity value. It is common practice
to use square blocks for both ranges and domains, as well as to enlarge the codebook by
including all rotations and reflections of each domain. For further details, see [21] and [40].
Bit rates and image quality are strongly related to range block size: large ranges are
hard to approximate but lead to low bit rates, while small ranges are easily approximated
but yield higher bit rates.
In order to achieve both good fidelity and low bit rates, a possible solution employs
variable size partitions that tune range size to the complexity of the different image areas.
The most popular partition mechanism uses a quadtree scheme [21]: a square block is
recursively broken up in four quadrants until the resulting blocks are considered sufficiently
simple to be approximated by a domain chosen from the codebook. This decision is
usually taken by comparing the value of a function of the block—the so-called split-decision
function—against a given threshold. The choice of both function and threshold affects the
fidelity of the reconstructed image as well as the coding time.
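The recursive quadtree partitioning described above can be sketched as follows. This is a minimal illustration rather than the thesis code: the function and variable names are ours, and `split_decision` stands in for any of the concrete criteria discussed in the next section.

```python
import numpy as np

def quadtree_partition(block, level, max_level, threshold, split_decision):
    """Recursively partition `block` until it is simple enough to encode.

    `split_decision` maps a block to a scalar; if the value exceeds
    `threshold` (and further subdivision is still allowed), the block is
    broken into its four quadrants. Returns the list of leaf ranges.
    """
    n = block.shape[0]
    if level < max_level and split_decision(block) > threshold:
        h = n // 2
        leaves = []
        for i in (0, h):
            for j in (0, h):
                leaves += quadtree_partition(block[i:i + h, j:j + h],
                                             level + 1, max_level,
                                             threshold, split_decision)
        return leaves
    return [block]

# Example: plain variance as the split-decision function on a toy image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32)).astype(float)
ranges = quadtree_partition(image, 0, 3, 100.0, np.var)
```

An adaptive threshold scheme would simply pass a per-level threshold instead of the fixed scalar used here.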
We now compare the most widespread split-decision functions and propose a new
split-decision function based on entropy, also addressing the choice of the threshold.
2.2 Split-Decision Functions
The aim of a fractal coder is to minimise the root-mean-square (rms) error between
the original and the transformed image. To this aim, the natural way of driving the
partition process is that of adopting rms error as the splitting function. The classical
split-decision function computes the rms error between the range R and the optimal
transformed domain D∗:
\[
S_1(R) = \mathrm{rms}(R, D^*) = \frac{\|R - D^*\|}{\sqrt{n}},
\]
where n is the area in pixels. Using the function S1 implies that, before subdividing, every
attempt is made to encode bigger ranges, thus leading to optimal rate-distortion curves.
On the other hand, because of the unsuccessful attempts, a lot of computation is wasted.
To avoid this, it would be better to use a function which can be computed before even
trying to encode the current range block.
In [41], the authors consider a splitting function called n-fold variance that for a generic
block B (range or domain) is defined as
\[
S_2(B) = n\,\mathrm{Var}(B) = \sum_{k=1}^{n} \left(B_k - \mu(B)\right)^2,
\]
where µ(B) is the mean value of the intensities B1, . . . , Bn. This choice not only accelerates
the encoding process but also outperforms S1 in terms of rate-distortion curves. This might
sound strange, as S1 should be optimal from the point of view of fidelity. The explanation
is that S2 is an adaptive criterion, while S1 is not. In fact, the n-fold variance S2 takes
into account the size of the block: on the average, its value on a given block is 4 times
bigger than on a block of half its linear size. If the threshold is fixed for all levels of the
quadtree, then the subdivision of bigger blocks—which are more difficult to approximate
accurately—is favoured over that of smaller ones. The overall effect is a better rate-
distortion curve.
Using n-fold variance with a fixed threshold is equivalent to the use of standard variance
\[
S_3(B) = \mathrm{Var}(B) = \frac{1}{n} \sum_{k=1}^{n} \left(B_k - \mu(B)\right)^2
\]
with an adaptive threshold Ti = 4Ti−1, where Ti is the threshold at level i of the quadtree.
The fact that adaptive thresholds improve image quality has also been observed in [8] for
rms-based split-decision functions.
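This equivalence is easy to check numerically. The sketch below is illustrative (not from the thesis); it verifies on a random block that comparing the n-fold variance against a fixed threshold T gives the same split decision as comparing the plain variance against T/n.

```python
import numpy as np

def n_fold_variance(block):
    """S2: the n-fold variance n*Var(B), i.e. the sum of squared
    deviations of the pixel intensities from the block mean."""
    return float(np.sum((block - block.mean()) ** 2))

rng = np.random.default_rng(1)
B = rng.normal(size=(16, 16))

# Comparing S2 against a fixed threshold T is the same as comparing the
# plain variance S3 against the level-adaptive threshold T / n, where n
# (the pixel count) shrinks by a factor of 4 per quadtree level.
T = 50.0
same_decision = (n_fold_variance(B) > T) == (np.var(B) > T / B.size)
```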
Chapter 2. Speeding Up Fractal Coding: Split Decision Functions 5
Entropy is another interesting splitting function. For an 8-bit gray scale block R, it is
defined as
\[
S_4(R) = H(R) = -\sum_{k=0}^{255} f_k \lg(f_k),
\]
where fk is the frequency within R of the intensity value k. Like variance, entropy saves
computing time by avoiding computation for unsuccessful attempts.
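As an illustrative sketch (the helper name is ours, not the thesis's), the entropy criterion can be computed directly from the block's intensity histogram; terms with zero frequency contribute nothing:

```python
import numpy as np

def block_entropy(block):
    """S4: Shannon entropy (base 2) of the block's 8-bit intensity
    histogram; terms with f_k = 0 are taken as contributing 0."""
    hist = np.bincount(block.astype(np.uint8).ravel(), minlength=256)
    f = hist / hist.sum()
    nz = f[f > 0]
    return float(-np.sum(nz * np.log2(nz)))

flat = np.full((8, 8), 128)           # a single intensity: entropy 0
varied = np.arange(64).reshape(8, 8)  # 64 distinct intensities: lg 64 = 6
```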
Regarding threshold adaptivity, there seems to be a close correlation between the
split-decision function scaling factor on adjacent levels of the quadtree and the optimal
thresholds on those levels. In fact, for the variance criterion, Ti = 4Ti−1 yields the best
rate-distortion curve, while for rms the best curve is obtained by Ti = 2Ti−1.
For entropy, things are a bit different. Entropy is a convex function that reaches its
maximum when the frequency distribution is uniform. Since we are dealing with 8-bit
images, with 256 intensity values, the distribution inside a small block (e.g., 8 × 8) is likely to be noticeably less uniform than the distribution in a larger block (e.g., 32 × 32): the number of possible pixel values is comparable to the number of pixels in a block. This
means that larger blocks tend to have higher entropy; entropy is thus intrinsically adaptive.
Furthermore, being a logarithmic function, it does not have a scaling factor. For all these
reasons, the optimal threshold on level i should depend not only on that on level i − 1,but also on the size of the block itself—or equivalently on i. In fact, we experimentally
obtained the best curve using the threshold relation Ti = Ti−1 + 1/i. However, both
entropy and variance share the difficulty of tuning the threshold value precisely as to
obtain the desired degree of fidelity.
2.3 Experimental Results
Experiments were made using 4-level quadtrees, with range block linear sizes of 32, 16, 8
and 4 pixels, on the 512×512 lena image. In order to give no advantage to any particular
method, we did not employ any lossy speed-up technique such as block classification.
Furthermore, we considered codebooks of the same size for all levels of the quadtree.
Indeed, when the codebook size increases with the block size, rms has a relative advantage;
when codebook size decreases, the advantage is on entropy’s and variance’s side.
Table 2.1 shows the speed-ups achieved using adaptive entropy against adaptive rms.
Each row shows an encoding of lena at about the same compression ratio (CR) and
Table 2.1: Speed-up achieved using adaptive entropy against rms on the 512 × 512 lena image

  CR   Entropy snr   rms snr   Speed-up
  12      34.6         35.2      2.28
  35      30.1         30.9      1.95
  50      29.1         29.7      1.83
quality. Depending on the rate, the speed-up ranges from 2.28 to 1.83. Note that the
speed-up increases at lower bit rates; this reflects the fact that more computations are
avoided in the first levels of the quadtree.
Figure 2.1 compares the rate-distortion curves of the three analyzed splitting functions.
For each method there are two curves: one is obtained with a fixed threshold, while
the other is obtained with the best adaptive threshold relation. As expected, adaptive
rms is optimal, yielding the best rate-distortion curve. However, fixed-threshold entropy
outperforms standard rms because of its intrinsic adaptivity. Adaptive variance and
adaptive entropy are comparable and very close to the optimum—that is, adaptive rms.
From the plot it can be seen that adaptive thresholds provide a definite improvement
independently from the chosen splitting-decision function.
The main difference between variance and entropy is in the kind of noise added to the
reconstructed image: at low bit rates, variance causes low-contrast (“flat”) zones to get
coded poorly; by contrast, with entropy it is high-contrast (sharp-edged) zones that get
coded poorly. The latter kind of degradation is significantly less noticeable to the eye.
Comparing the two curves for each method in Fig. 2.1, we observe a dramatic improvement brought by adaptive thresholds. This is even more noticeable from a visual point of
view. In fact, adaptive thresholds yield a visual quality far better than fixed ones. This
improvement is basically due to the almost complete lack of blockiness.
2.4 Conclusions
In this chapter, we compared several split-decision functions for quadtree-based fractal
image coders. In addition, the effect of using adaptive thresholds on the various levels of
Figure 2.1: Rate-distortion curves of the 512 × 512 lena image obtained using a 4-level quadtree. (Axes: PSNR (dB) vs. compression ratio; curves: entropy, adaptive entropy, RMS, adaptive RMS, variance, adaptive variance.)
the quadtree has been investigated. The experiments show that adaptivity yields far
better rate-distortion curves and attenuates the effect of blockiness. However, it should
be pointed out that in the rms-error based criterion the threshold for obtaining a given
image quality is independent of the particular image. That is, for a given threshold
value we obtain almost the same visual image quality for every image. Unfortunately, this
is not the case with entropy- or variance-based criteria. This makes it difficult to obtain
an a priori estimate of the threshold value for encoding a given image at a given quality
level.
Chapter 3
Fractal Indexing with Robust Extensions
3.1 Background
Recently, the diffusion of multimedia computing systems has aroused significant interest
in research on multimedia database management. In particular, searching image
databases is a complex issue. Desirable performance characteristics of an ideal indexing
system include precise retrieval, small index size and easy computability. Currently, there
are several solutions to this problem, most of them tailored to a specific field of application.
As a broad distinction, homogeneous databases—such as those required by biomedical
applications—are characterized by having very small differences among the images (think
of an archive of liver CT-scans); the most effective approaches to date are based on object
contour shapes and spatial relationships among them [10]. The most popular tools of
the trade are Attributed Relational Graphs [36] and 2-D Strings [7, 9, 32]. Images in
heterogeneous databases, on the other hand, can be represented by coarser global features,
such as texture or color percentage [20,22]. Some systems combine the two approaches to
restrict the query answer set. A survey of content-based indexing systems can be found
in [12].
Fractal-based encoding—also called IFS-based encoding from the Iterated Function
Systems that are at its foundations—has already proven to be a reliable technique that
exploits the self similarity present in an image by representing it as a collection of affine
contractive transformations [21]. In much the same way, it is possible to utilize the same
collection of transformations as low-level features which can be organized into a signa-
ture that identifies the image and allows it to be retrieved. As a result, fractal encod-
ing is able to provide an effective indexing technique for heterogeneous image databases.
There is an added benefit: the whole database can be searched and manipulated with-
out ever decompressing the images. This ability makes fractal-based indexing suitable
to the handling of large databases. The efficiency of these systems—as opposed to their
effectiveness—depends heavily on the compression speed and also on how the low level
features extracted from the images are organized into indices. The techniques available to
avoid a linear search of the database include spatial access methods such as K-d-trees and
R*-trees [3, 4, 46] and general-purpose methods such as hash tables.
In particular, the image indexing technique called first [33] is based on fractal en-
coding. The features used for indexing are mainly the histograms of the most salient
contractive transformation parameters. These are organized in a spatial access structure
by means of R*-trees. The first system does reasonably well under the aforementioned
performance aspects and also provides good image compression ‘for free.’
However, it is often desirable to have an indexing system that is invariant—or at least
robust—to several image transformations: geometrical transformations such as changes
in the viewpoint or in the orientation of the object; pixel intensity transformations such
as those produced by a change in the illumination or in the sensitivity of the medium;
transformations due to transmission, such as added noise, and so on. first, while being
quite accurate and stable as far as image retrieval is considered, was not invariant to any
such transformation in a precise sense.
This chapter shows how a technique originally designed to speed up fractal compres-
sion [37] can be effectively employed to obtain desirable features (i.e., invariance) in a
fractal indexing retrieval system. The new system is an heir of first that significantly
improves on its efficiency and robustness, providing invariance under a large class of pixel
intensity transformations as well as isometrical geometrical transformations. As we shall
see, this is accomplished by modifying the relevant set of indexed features, removing irrel-
evant data and representing relevant data in a more abstract way. The experiments show
that these modifications in the index structure, besides providing the desired invariances,
do not cause any performance degradation; rather, the index is smaller and the quality
of retrieval is increased. The new system is called fire (Fractal Indexing with Robust
Extensions).
The chapter is organized as follows: Section 3.2 introduces the necessary concepts of
IFS-based encoding and then explains how the fire index is created; Section 3.3 then
discusses the invariance properties of the technique, providing proof sketches. Finally,
the technique is put in context: a few experimental results are shown in Section 3.4.
These results show fire’s improvement over its parent first and compare both with the
most similar (feature-wise) state-of-the-art method. Sections A.1 and A.2 in the appendix
contain a few details to complete the proofs of invariance that are sketched in this main
text.
3.2 The Technique
In order for this discussion to be self-contained, we briefly review the relevant concepts of
IFS encoding before explaining the details of index construction.
3.2.1 Basics of IFS image encoding
The image to be encoded is partitioned into a set R of blocks called ranges. Another
set D of image blocks called domains is used as a codebook to approximate each range by
means of an affine transformation. The domains, with sides twice as long as the ranges,
are shrunken to range size by pixel averaging. For each range r ∈ R, we have to find the domain d and two real numbers α and β that give
\[
\min_{d \in \mathcal{D}} \left\{ \min_{\alpha,\beta} \, \bigl\| r - (\alpha d + \beta \mathbf{1}) \bigr\| \right\}, \tag{3.1}
\]
where 1 is a constant block of intensity 1. This minimizes the root-mean-square (rms)
error in the approximation of r by an affine image of d:
\[
r \approx \alpha d + \beta \mathbf{1}. \tag{3.2}
\]
The inner minimum in Eq. (3.1) can be computed directly as the solution to a least squares
problem:
\[
\alpha = \frac{\sum_{1 \le i,j \le n} (r_{i,j} - \bar{r})(d_{i,j} - \bar{d})}
             {\sum_{1 \le i,j \le n} (d_{i,j} - \bar{d})^2}; \tag{3.3}
\]
\[
\beta = \bar{r} - \alpha \bar{d}, \tag{3.4}
\]
where n is the side length of both range and shrunken domain, while \(\bar{r} = \sum_{1 \le i,j \le n} r_{i,j}/n^2\) and \(\bar{d} = \sum_{1 \le i,j \le n} d_{i,j}/n^2\) are the average intensities in r and d respectively. Computation of
the outer minimum, however, is rather heavy if D is to be exhaustively searched. In order
to reduce the search space, the blocks (both ranges and domains) undergo a classification
process that yields a feature vector for each block [37]. Later, when approximating a
range r, we only search a domain class centered on r’s feature vector. The spatial access
method used for this search is based on R*-trees.
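The inner least-squares minimum of Eqs. (3.3)–(3.4) can be sketched as follows. This is an illustrative numpy version; `optimal_affine` is our name, not the thesis's, and the test data is synthetic:

```python
import numpy as np

def optimal_affine(r, d):
    """Least-squares solution of Eqs. (3.3)-(3.4): alpha is the
    covariance of r and d divided by the variance of d; beta then
    follows from the mean intensities."""
    r_bar, d_bar = r.mean(), d.mean()
    denom = np.sum((d - d_bar) ** 2)
    alpha = np.sum((r - r_bar) * (d - d_bar)) / denom if denom else 0.0
    beta = r_bar - alpha * d_bar
    return alpha, beta

rng = np.random.default_rng(2)
d = rng.normal(loc=100.0, scale=20.0, size=(8, 8))
r = 0.7 * d + 30.0          # a range that is exactly an affine image of d
alpha, beta = optimal_affine(r, d)
```

Since `r` is constructed as an exact affine image of `d`, the fit recovers α = 0.7 and β = 30 up to floating-point error.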
How are feature vectors computed? Given an n × n block b, we set k ← 0, b^{(k)} ← b, and we draw a similarity between pixel intensity and mass to compute the block's mass center coordinates:
\[
x_k = \frac{1}{M_k} \sum_{1 \le i,j \le n} i\, b^{(k)}_{i,j}; \qquad
y_k = \frac{1}{M_k} \sum_{1 \le i,j \le n} j\, b^{(k)}_{i,j}, \tag{3.5}
\]
where \(M_k = \sum_{1 \le i,j \le n} b^{(k)}_{i,j}\) is the block's mass.
We then consider the 'deviate' block given by
\[
b^{(k+1)}_{i,j} = \left(b^{(k)}_{i,j} - \mu_k\right)^2, \tag{3.6}
\]
where \(\mu_k = M_k/n^2\) is the average mass per pixel in \(b^{(k)}\). We set k ← k + 1 and go back to
Eq. (3.5) to compute the new mass center coordinates. This procedure can be carried out
an arbitrary number of times, but experiments have shown that twice (k = 0 and k = 1)
is enough to characterize the blocks adequately [14].
The resulting points (x0, y0) and (x1, y1) are then expressed in a polar coordinate
system whose origin lies in the block’s center, yielding (ρ0, θ0) and (ρ1, θ1); after discarding
ρ0 and ρ1, we are left with the final feature vector (θ0, θ1).
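The classification procedure can be sketched as follows. This is a simplified illustration under our own conventions (1-based pixel coordinates, angles measured with `arctan2` about the block center), not the thesis implementation:

```python
import numpy as np

def feature_vector(b):
    """Sketch of the classification features (theta0, theta1): the polar
    angles, about the block center, of the mass center of the block and
    of its squared-deviate block (Eqs. 3.5-3.6)."""
    n = b.shape[0]
    i, j = np.mgrid[1:n + 1, 1:n + 1]   # 1-based pixel coordinates
    center = (n + 1) / 2.0              # the block's geometric center
    thetas = []
    for _ in range(2):                  # k = 0 and k = 1 suffice [14]
        M = b.sum()                     # the block's 'mass' M_k
        x = (i * b).sum() / M           # mass center, Eq. (3.5)
        y = (j * b).sum() / M
        thetas.append(float(np.arctan2(y - center, x - center)))
        b = (b - M / n ** 2) ** 2       # deviate block, Eq. (3.6)
    return tuple(thetas)

block = np.zeros((8, 8))
block[0, 0] = 1.0                       # all mass in one corner
theta0, theta1 = feature_vector(block)
```

With all mass in the upper-left corner, both mass centers lie on the block diagonal, so both angles point toward that corner.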
Once each range in R has been approximated by a domain in D, the image can be
encoded by the affine mappings from domains to ranges. The reconstruction of the original
image A can be accomplished by the iteration of this system of transformations, since it
converges to A regardless of the starting data [28].
A further shortcut can be taken for multichannel images in YIQ form: by exploiting
the human visual system’s peculiarities, it is possible to encode the luminance channel
with maximum accuracy and then utilize the same range-domain correspondence also for
the encoding of the chrominance channels. In other words, referring to Eq. (3.1), once
the optimal d has been fixed by the luminance encoding, for the remaining chrominance
channels we only recompute the inner minimum over α and β: the hard outer minimum
over d ∈ D is computed only once. This yields a definite saving in both encoding time
and compressed length with a slight loss in terms of distortion.
In fractal encoding, it is customary to enlarge the domain pool by considering all
reflections and rotations of the domains along with the original versions. The fire system
does utilize these block isometries for encoding, but not for indexing.
Another concept that is widely used in plain fractal encoding is that of variable, dy-
namic partitioning: the set R usually grows larger as a candidate range is seen to be
best approximated by being recursively divided. The whole process yields a hierarchical
partition, usually in the form of a quadtree [39]. However, as shall be seen later, fire
does not utilize dynamic partitioning, for the sake of invariance to contrast scaling.
3.2.2 The Index
Once the image has been encoded, the index is obtained from the data gathered from the
encoding phase.
The index has a hierarchical structure, divided into two levels. The first level has to
do with the class distribution of the ranges composing the image, while the second level
depends on the specific affine mappings that encode them.
In order to classify the ranges, we quantize their (θ0, θ1) feature vector. The quan-
tized feature vectors are then histogrammed over all ranges, giving the first level of the
index. By tuning the number of bits utilized for quantization, it is possible to decide
how many classes there will be, and therefore control the size of the histogram. This is
the first important difference between fire and first: the previous system employed a
grid-quantized representation of the center of mass for each range rather than the polar
angle.
The second level of the index consists of another range-by-range histogram: that of the
(P(d), α) vector that represents the affine mapping utilized for the range’s encoding. The
domain is represented by the pixel position P(d) of its upper left corner, while α is the value
resulting from Eq. (3.3), appropriately quantized. This is another important difference
between fire and first: the latter utilized a 3-component vector including P(d), α and β,
while the new system only utilizes P(d) and α. As shall be seen, this is an improvement
over the previous version: the absence of β from the index, along with the adoption of
static partitioning, ensures index invariance under several pixel transformations to be
examined in the next section.
The two histograms would be too big to be a usable data structure, especially for large
databases; for this reason, the whole image index undergoes a discrete Fourier transform,
after which only the lower frequency coefficients are kept—typically 3 coefficients have
been empirically verified to be enough [1]. This operation significantly reduces the size of
the index.
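The truncation step can be sketched as follows, assuming a real DFT and Euclidean distance between the retained coefficients (the function names are invented for illustration):

```python
import numpy as np

def compact_index(hist, n_coeff=3):
    """Keep only the lowest-frequency DFT coefficients of an index
    histogram; 3 coefficients are the empirical choice cited in the text."""
    return np.fft.rfft(np.asarray(hist, dtype=float))[:n_coeff]

def index_distance(c1, c2):
    # Euclidean distance between truncated spectra (a sketch assumption)
    return np.linalg.norm(c1 - c2)
```

A 64-bin histogram thus shrinks to 3 complex coefficients, regardless of the number of classes.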
Multichannel images—either YIQ or RGB—can be handled in two ways: we can either
have a distinct index for each channel or we may pick a ‘privileged’ channel to have a full
index and condense the indices for the remaining two channels.
In the case of YIQ images, the privileged channel is luminance: the index for the
luminance component has its full two levels, while the I and Q channels have the second
level only. This choice is motivated by two considerations:
• the necessity of keeping the size of index data reasonably small; the choice of luminance
as the privileged channel is justified by the nature of the human visual system,
which is much more sensitive to luminance information than it is to chrominance;

• the inner workings of the encoding phase, which, as stated in the previous section,
has been designed to incorporate a similar shortcut: when dealing with 3-channel
images in the ‘cheaper’ way, the optimal domain for each target range is selected on
the basis of luminance information only.
RGB images are usually handled by having three separate full indices. However, with
a linear combination of the three channels, it is possible to build a luminance channel ‘on
the fly’ to be used as a privileged component and then choose arbitrarily two components
among R, G and B, building only the second index level for these two. One of the original
3 channels is therefore discarded altogether, but it can be reconstructed from the combined
luminance and the remaining two original channels when the image is decoded.
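A sketch of building such a luminance channel; the BT.601-style weights are an assumption, since the text only specifies ‘a linear combination of the three channels’:

```python
import numpy as np

def synthetic_luminance(r, g, b):
    """Luminance built 'on the fly' from R, G, B. The 0.299/0.587/0.114
    weights are assumed (BT.601 luma); the thesis does not fix them."""
    r, g, b = (np.asarray(c, dtype=float) for c in (r, g, b))
    return 0.299 * r + 0.587 * g + 0.114 * b
```

Since the weights sum to 1, a gray pixel keeps its value, and any two of the original channels plus this luminance suffice to recover the third at decoding time.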
Examining how the index is calculated, note that identical encodings yield identical
indices. In other words, the index is entirely determined by the structure of the encoding
and contains no extra information. Therefore, any image transformation that leaves the
encoding unchanged obviously leaves the index unchanged. Furthermore, referring to the
affine approximation in (3.2), any variation in the β parameters alone, while changing
the reconstructed image, has no effect on the index.
3.3 Invariance and Robustness
Without loss of generality, let us consider a true-color (24-bit) n×n image A. Its red, green
and blue components shall be denoted by AR, AG and AB. The image transformations
that we are about to examine include the following pixel value mappings:
• Contrast scaling. The transformed image is given by
A′ = wA, (3.7)
where w is a positive real number.
• Luminance shifting. The transformed image is given by

A′R = AR + mR1; A′G = AG + mG1; A′B = AB + mB1, (3.8)
where 1 is a suitably sized constant image of intensity 1, while mR, mG and mB are
arbitrary real numbers.
• Color change. The transformed image is given by

A′R = wRAR; A′G = wGAG; A′B = wBAB, (3.9)
where wR, wG and wB are positive real numbers.
Due to the standard representation of digital images, it should be observed that what
really gets into the transformed image is min(A′, 255) rather than simply A′. Conversely,
in the case of luminance shifting with a negative offset, the transformed image gets
max(0, A′). In other words, the final result is clipped to the interval [0, 255]. Our dis-
cussion assumes that the parameter values are in such a range that the transformations
have linear effects—i.e., that no clipping occurs. Indeed, in the case of severe clipping,
all image information is destroyed and there is no method that can ensure invariance:
any image would simply turn into a series of 255’s. On the other hand, if ‘mild’ clipping
occurs at a few points, fire retains its robustness: the transformed image is no longer at
distance 0 from the original, but the distance is small enough for the transformed image
to rank high in the answer set. Generally, the main reason why the user wants invariance
to pixel-value transformations is the ability to deal with small discrepancies in the image
acquisition devices (e.g., scanner calibration). Therefore, it makes sense to assume that
in most practical cases the actual parameter values are indeed reasonably small.
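The pixel value mappings in Eqs. (3.7)–(3.9), together with the clipping just discussed, can be sketched as:

```python
import numpy as np

def contrast_scale(img, w):
    """A' = wA (Eq. 3.7), clipped to [0, 255] as described in the text."""
    return np.clip(w * np.asarray(img, dtype=float), 0, 255)

def luminance_shift(img, m):
    """A' = A + m*1 (one channel of Eq. 3.8), clipped to [0, 255]."""
    return np.clip(np.asarray(img, dtype=float) + m, 0, 255)

# mild clipping: one pixel saturates, the rest transform linearly
out = contrast_scale([100, 200], 1.5)
```

When `w` or `m` is large enough to saturate most pixels, the linearity assumption above no longer holds and invariance degrades, as the text points out.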
In addition to pixel value mappings, we are interested in geometric image transforma-
tions:
• ‘Integer’ rotations by an angle ω multiple of π/2. Depending on the angle of rotation,
the transformed image is given by

A′i,j = Aj,n−i+1 when ω = π/2; (3.10)

A′i,j = An−i+1,n−j+1 when ω = π; (3.11)

A′i,j = An−j+1,i when ω = 3π/2. (3.12)
• Reflections. According to the type of reflection, the transformed image is given by

A′i,j = An−i+1,j (‘vertical flip’ about the horizontal axis); (3.13)

A′i,j = Ai,n−j+1 (‘horizontal flip’ about the vertical axis); (3.14)

A′i,j = Aj,i (‘NE/SW flip’ about the main diagonal); (3.15)

A′i,j = An−j+1,n−i+1 (‘NW/SE flip’ about the secondary diagonal). (3.16)
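With 0-based indices (the equations above are 1-based), these eight isometries map onto standard NumPy operations; a sketch:

```python
import numpy as np

A = np.arange(16).reshape(4, 4)    # toy n x n image
n = A.shape[0]

rot_90  = np.rot90(A, 1)           # Eq. (3.10): A'[i,j] = A[j, n-1-i]
rot_180 = np.rot90(A, 2)           # Eq. (3.11): A'[i,j] = A[n-1-i, n-1-j]
rot_270 = np.rot90(A, 3)           # Eq. (3.12): A'[i,j] = A[n-1-j, i]
v_flip  = np.flipud(A)             # Eq. (3.13): flip about the horizontal axis
h_flip  = np.fliplr(A)             # Eq. (3.14): flip about the vertical axis
ne_sw   = A.T                      # Eq. (3.15): flip about the main diagonal
nw_se   = np.rot90(A.T, 2)         # Eq. (3.16): flip about the secondary diagonal
```

Note that `np.rot90` rotates counterclockwise, which matches the orientation implied by Eq. (3.10).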
3.3.1 Contrast scaling
Let us examine what happens when the image A undergoes a contrast scaling transforma-
tion as in Eq. (3.7). The first thing to be noticed is that the positions of the blocks’ mass
centers do not change. In terms of the intensity/mass simile, what happens is that the
density changes by a factor w, but, since this happens uniformly over the whole image, the
values of x0 and y0 in Eq. (3.5) remain the same as they were before the transformation.
When k is increased for the calculation of x1 and y1 (and possibly higher-order devi-
ates), there is a further scaling of the density by a factor of w, but this does not affect
the resulting x and y coordinates. Therefore, the whole calculation of feature vectors is
unaffected.
As for the rest of the encoding process, multiplying all the blocks—both ranges and
domains—by the same scalar w > 0 simply means that the approximation in Eq. (3.2)
gets replaced by
r′ ≈ αd′ + β′1, (3.17)
where all the ‘primed’ quantities are multiplied by w. Of course, the rms error is also
scaled by w; however, the minimum error for a given range will be obtained with the same
domain and the same value of α as before, since all errors are scaled uniformly.
As a consequence, all ranges in the transformed image A′ will be encoded by the same
domains as in the original image A; in addition, the positions of the mass centers—and
therefore the (θ0, θ1) feature vectors—remain unchanged.
The bottom line is that the encodings of A and A′ differ only in their β offsets—in the
encoding of A′, they are all multiplied by w. However, β does not appear anywhere in the
index; what this implies is that the fire index is identical for A and A′, as desired.
It should be noted that the invariance of fire under contrast scaling transformations is
dependent on the adoption of a predetermined, fixed image partitioning: when using any
locally adaptive partitioning (such as the popular quadtree-based schemes), if w > 1 there
is no guarantee that the rms error in Eq. (3.17) cannot be improved upon by dividing the
range into smaller units. In this event, the whole encoding changes and so does the index.
3.3.2 Luminance Shifting
In the case of a luminance shifting transformation such as that described in Eq. (3.8),
the value of α that solves the least squares problem in (3.3) remains unaltered, but the
optimal value of β as given by Eq. (3.4) is affected by the appropriate additive constant
among those appearing in Eq. (3.8)—e.g., mR for the red channel and so on. As it happens
in the case of contrast scaling, however, the index is unaffected because β does not appear
in it.
Luminance shifting transformations also affect the position of the ranges’ centers of
mass, which might seem to affect the class histogram portion of the index. However,
recalling that the relation between the original block b and the transformed block b′ is
b′ = b+m1, it can be seen that the effect is that of moving the center of mass towards
the geometric center of the block. It is as if a slab of constant density were attached to
the bottom of the block. The heavier the slab gets, the more the center of mass moves
toward the geometric center.
What this means is that the angle θ is unchanged; it is the distance ρ that gets smaller.
Since only θ is used for classification and index construction, the technique is invariant
under luminance shifting. Unlike the case of contrast scaling, for this type of transformation
the invariance does not even depend on the adoption of a fixed-partitioning scheme.
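The slab argument can be checked numerically. The sketch below assumes the center of mass is measured from the block's geometric center, with (ρ, θ) its polar coordinates; adding a constant leaves θ untouched while shrinking ρ:

```python
import numpy as np

def center_of_mass_polar(b):
    """Polar coordinates (rho, theta) of a block's center of mass,
    relative to its geometric center (a sketch, not the thesis code)."""
    n = b.shape[0]
    ys, xs = np.mgrid[0:n, 0:n]
    m = b.sum()
    dx = (xs * b).sum() / m - (n - 1) / 2
    dy = (ys * b).sum() / m - (n - 1) / 2
    return np.hypot(dx, dy), np.arctan2(dy, dx)

b = np.full((8, 8), 10.0)
b[1, 6] += 50.0                               # an off-center bright spot
rho0, th0 = center_of_mass_polar(b)
rho1, th1 = center_of_mass_polar(b + 100.0)   # luminance shift
# theta is unchanged; rho has moved toward the geometric center
```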
3.3.3 Color Change
Examine a color change transformation such as the one described in Eq. (3.9). On a
channel-by-channel basis, this is mathematically identical to a contrast scaling, and therefore
the fire technique is invariant to color change just as it is to contrast scaling, as far
as single-channel 8-bit images are concerned.
However, in the case of multi-channel images, the technique stays invariant only as
long as each channel has its own index. In other words, it is not possible to condense
the indices for different channels into a single index while keeping fire invariant to color
change.
3.3.4 Rotations and Reflections
The first thing to be noticed is that any combination of reflections and integer rotations
can be obtained by a suitable subset of transformations that include the four rotations
and reflection about the vertical axis. For this reason, we shall restrict our attention to
these transformations.
The approach used in fire to handle rotated and reflected versions of the query image
is that of reducing all the images in the database to a ‘canonical form,’ that is, one
particular rotation and reflection among all the possibilities.
In order to choose a canonical form for the image A, we consider three key points: (1) Its
center of mass P = CM(A); (2) The center of mass of the deviate image Q = CM(A(1));
(3) The geometric center O.
The canonical form of an image is defined by the following two properties:
Figure 3.1: Obtaining the canonical form of an image. (a) Original image with key points
highlighted; (b) Image flipped vertically; (c) Final image rotated into canonical form.
1. the point P lies in the northwestern quadrant of the image.
2. the oriented angle ϕ = POQ satisfies 0 ≤ ϕ < π.
The first property selects the right rotation, while the second selects the right reflection.
Each image A, therefore, undergoes the following normalization procedure: first, Property 2
is enforced by flipping A if necessary; then, Property 1 is enforced by rotating A by
the appropriate multiple of π/2 so as to bring its center of mass P into the northwestern
quadrant.
This process is illustrated in Fig. 3.1.
As a result, all isometric variants of A yield the same index. Therefore, the fire index
is invariant under integer rotations and reflections.
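A sketch of this normalization; the deviate image A(1) is passed in as a function, since its definition is not repeated here, and the sign convention for the oriented angle ϕ is an assumption:

```python
import numpy as np

def center_of_mass(A):
    """Center of mass of image A in (x, y) pixel coordinates."""
    ys, xs = np.mgrid[0:A.shape[0], 0:A.shape[1]]
    m = A.sum()
    return np.array([(xs * A).sum() / m, (ys * A).sum() / m])

def canonical_form(A, deviate):
    """Sketch of the normalization in Fig. 3.1: flip to enforce
    0 <= phi < pi, then rotate until P lies in the NW quadrant."""
    n = A.shape[0]
    O = np.array([(n - 1) / 2, (n - 1) / 2])
    P = center_of_mass(A) - O
    Q = center_of_mass(deviate(A)) - O
    if P[0] * Q[1] - P[1] * Q[0] < 0:    # oriented angle POQ is negative
        A = np.fliplr(A)                 # Property 2: flip if necessary
    for _ in range(4):                   # Property 1: rotate into NW
        P = center_of_mass(A) - O
        if P[0] < 0 and P[1] < 0:        # NW: left of and above the center
            break
        A = np.rot90(A)
    return A
```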
3.4 Experimental Results
This section focuses on the improvements achieved by fire over its parent, first. Since
one of the characteristics of these two systems is that of providing both image compression
and image indexing, our tests have been directed to both aspects. Additionally, in order
to assess fire’s performance with respect to methods based on different theoretic foun-
dations, we also have compared fire to PicToSeek [23]. The choice fell on this method
because, similarly to fire, it has been designed with the purpose of providing several
invariance properties.
The fire system has been implemented as a Java program under Windows 98. The
hardware platform is a Pentium II based PC. All the tests have been performed on a
heterogeneous database of about 2000 256×256 images at 24 or 8 bits/pixel that includes
the following categories: tools, animals, human faces, CT-scans, NMRs, landscapes, art
images and pasta. The database has grown over time by accumulation and basically con-
sists of the Smeulders dataset [23] and the Sclaroff dataset [42,43], plus additional images
from our own test dataset. In order to verify the invariance and robustness properties
of the system, the database also contains several transformed versions of some of the im-
ages. The transformed versions have been obtained from the originals by applying various
combinations of pixel value mappings and rotation/reflection.
The quality of the encoding can be evaluated by plotting a measure of distortion against
a measure of compression. We utilize SNR (signal-to-noise ratio) as a measure of inverse
distortion and the bit rate in bits per pixel as a measure of inverse compression.
As can be seen from Fig. 3.2, there is an improvement in compression that derives from
fire’s better classification scheme: the earlier system utilized a modified grid method in
Cartesian coordinates instead of the present quantized polar coordinates. The (θ0, θ1)
feature vectors have been quantized to 6 bits, yielding 64 classes. The classification im-
provement even makes up for the weaker partitioning scheme of fire, which has abandoned
quadtree dynamic partitioning, resorting to a fixed partitioning scheme. The data shown
in Fig. 3.2 have been gathered by averaging the results over a representative selection
including about 20 images for each category in our database.
As for the discriminating ability, this also has improved significantly. It should not be
surprising that better compression goes along with better indexing: in fact, better com-
pression for equal SNR means that the internal self-similarity of the image is represented
more accurately; this allows it to be exploited more effectively also for indexing.
There are several measures of the accuracy with which images can be retrieved; the
one utilized in this chapter is defined below.
The averaged rank over the set of relevant images (AVRR) is just what its name
implies: given a query, consider an answer set S of predetermined size—say p. Then
the ideal answer set contains the p “best” images for that query, ranked therefore from
0 to p − 1. As is customary in image indexing, the ranks are subjectively assigned by an
external human operator with some experience in the field of application from which the
[Plot: PSNR (dB) versus bit rate (bpp); curves for first and fire.]
Figure 3.2: Compression to SNR comparison between first and fire.
images are drawn.
In real life, of course, the obtained ranks will be worse than optimal. Their average is
just the AVRR of the answer set S. In other words,

AVRR(S) = (1/p) ∑_{s∈S} rank(s), (3.18)
where S has cardinality p. The best AVRR is therefore the lowest. The lower bound for
AVRR is called ideal AVRR (IAVRR) and depends on p only:
IAVRR(p) = (1/p) ∑_{i=0}^{p−1} i = (p − 1)/2. (3.19)
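Eqs. (3.18) and (3.19) amount to the following, with ranks starting at 0:

```python
def avrr(ranks):
    """Average rank of the answer set, Eq. (3.18)."""
    return sum(ranks) / len(ranks)

def iavrr(p):
    """Ideal AVRR, Eq. (3.19): the mean of the ranks 0 .. p-1."""
    return (p - 1) / 2
```

A perfect answer set of size p reaches the ideal value: for instance, `avrr([0, 1, 2, 3])` equals `iavrr(4)`.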
Fig. 3.3 shows how the two systems perform as for AVRR. The results shown here have
been obtained by averaging 15 different queries over databases of several sizes with answer
set size p = 20.
These measures of accuracy should be interpreted with some caution, because they
depend heavily on the particular database being used for the tests. To make matters
worse, the researchers in this field have not yet agreed on a standard database to be
employed for this kind of tests. However, as database size increases, the measures get
more and more reliable in describing performance.
[Plot: AVRR versus database size; curves for first, fire, PicToSeek, and the IAVRR baseline.]
Figure 3.3: AVRR comparison between first, fire and PicToSeek. The horizontal line
represents ideal AVRR.
Fig. 3.4 shows the outcome of a query. The query image is in the upper left, while
the 11 best ranking images in the answer set follow. In this case, our database contained
8 transformed versions of the query image, and all are retrieved. The first retrieved images
are at distance 0 from the query; starting from the second image in the second row, the
distance is different from 0, since for these images the contrast has been increased beyond
the clipping point. From the third row on, the retrieved image is a different one altogether
(a bear instead of the monkey), along with a few transformed versions. Fig. 3.5 shows
invariance/robustness to luminance shifting combined with rotation and reflection, while
Fig. 3.6 shows color change.
A query returning mainly rotated and reflected versions of the query image is depicted
in Fig. 3.7. In this case, too, the first two rows return all the 0-distance matches in an
arbitrary order, while the last row shows the next best matches.
To get a feel for the different personalities of fire and PicToSeek, look at Figs. 3.8
and 3.9. Here, the two systems are queried with the same elephant image. As can be
seen, fire (Fig. 3.8) behaves as if it were color-blind and returns a 0-distance match first, even if
the colors and the orientation are radically different. Furthermore, notice how the first
three images in the second row are very similar to a reflection of the query image. On the
Figure 3.4: Querying fire: contrast scaling invariance and robustness.
Figure 3.5: Querying fire: luminance shifting combined with rotation and reflection.
Figure 3.6: Querying fire: color change.
Figure 3.7: Querying fire: rotations and reflections.
Figure 3.8: Querying fire: elephants.
other hand, PicToSeek (Fig. 3.9) with the ‘color invariant’ option selected is much more
attentive to hue values. As a result, all the returned images show a very similar cumulative
color distribution, which is not the case with fire. A ‘nearly-rotated’ version of the query
image does appear (first place in the fourth row), but not before several lions.
As the last example suggests, the indexing method that should be preferred is largely
dependent on the specific application and user needs, especially when considering that the
two systems have similar overall performance as measured by AVRR.
3.5 Conclusions
This chapter has illustrated a system for image indexing that is based on fractal image
coding. The images are always kept in compressed form, and in fact the encoding itself
provides the information that makes up the index. For this reason, the indexing method
proposed is suitable for use with large image databases.
Figure 3.9: Querying PicToSeek: elephants.
The system has been designed to be invariant to three classes of pixel intensity trans-
formations: contrast scaling, luminance shifting and color change, and to all isometric
transformations such as rotations, reflections and their combinations.
These features make the system robust with respect to a wide class of image distortions
that are likely to happen in real applications.
The experiments show that fire performs adequately to be employed with large image
databases: the compression it achieves with fractal encoding is nearly top notch and its
retrieval accuracy compares well with today’s standards.
Chapter 4
A Hierarchical Representation for Image Retrieval
4.1 Background
In the last few years, due to the steady progress of multimedia processing, the interest
of the scientific community in multimedia database systems has significantly grown. In
particular, the study of effective representations suitable for obtaining approximate retrieval
by content has received great attention [24,38].
Human beings are extremely good at recognizing objects independently of their position
and orientation. Finding a solution for the same problem in machine vision, however,
turns out to be a very complex and difficult task. The
main task of pattern recognition is that of comparing a measured image in an unknown
position to different prototypes. A direct brute-force solution to this problem compares
the prototypes in all possible positions and extracts the optimal match.
If we use Euclidean distance for comparison (which under certain assumptions yields a
maximum likelihood estimator), we end up calculating the maximum of a high order cor-
relation function, which is a rather time consuming operation. The time required grows
exponentially with the number of parameters describing the coordinate transformations
induced by the motion.
A more elegant way to solve the problem involves the use of mappings that are able to
extract position-invariant intrinsic features of the object. The method of Fourier descrip-
tors is known to work reasonably well for the recognition of object contours independent
of position, orientation and size [25]. There are works that show the results of the Fourier
approximation of polygons for different numbers of Fourier coefficients [49]. As it turns
out, it is possible to achieve a good approximation of a polygon by using 15–30 coefficients.
Even with few coefficients, the Fourier series yields an acceptable approximation to the
original curve, because the low frequencies contain the most significant information about
the object.
Other techniques resort to the minimization of the contour’s moments with respect to
an orthogonal coordinate system centered at the object’s center. Generally, only the first
two moments are used: as pointed out in [30], higher-order moments add little information
content. However, this approach does not appear to be particularly effective: indeed, it
requires a great amount of information and long computing times.
In this chapter, we present a novel time-series indexing system, heri (Hierarchical
Entropy-based Representation for Indexing), useful for efficient retrieval by content. heri
is based on her [15,19] (Hierarchical Entropy-based Representation), which is employed in
order to describe a 1-D signal by means of a few coefficients. As a matter of fact, effective
retrieval by content implies a representation of pictorial data that is approximate in order
to save computing time during retrieval, but still retains enough relevant information to
allow for a discriminating retrieval. Many techniques, therefore, compress information
deemed to be ‘relevant’ into a few coefficients.
her too falls into this category. The particular method used by her reconstructs the
energy distribution of the given signal along the independent variable axis. The focal point
of this technique is that we select the most relevant local maxima based on the area, and
therefore the energy, associated with each maximum. An interesting consequence is the
generality of the resulting representation. Indeed, heri is a good candidate for content
based retrieval whenever the information can be accurately represented by a 1-D signal.
The structure of the chapter is as follows. In Section 4.2, we present the theoretical
foundation underlying her and describe the correspondence between the input signal and
the representation vector. We also outline the link between the properties of an input shape
and the resulting coefficients. Moreover, the invariance properties of heri are discussed.
Section 4.5 shows the results of our experiments in terms of well-known objective measures.
Finally, Section 4.6 contains a few concluding remarks.
4.2 HER
her is based on a subset of the 1-D signal samples (namely, the signal’s local maxima)
and on their associated energy. The interesting peculiarity of this representation is that it
allows us to reconstruct the signal’s energy distribution along the time axis by using only
a few coefficients.
Let x(n) be a monodimensional signal, time-discrete and finite in the time domain, i.e.,
x(n) = 0 for n ∉ [0, N − 1]. Let us define the energy of the i-th sample as E(i) = |x(i)|².
The total energy of x(n) can then be defined as the sample-by-sample sum of the individual
energies:

E = ∑_{i=0}^{N−1} E(i) = ∑_{i=0}^{N−1} |x(i)|². (4.1)
Now, consider the difference operator ∆i = x(i) − x(i−1). A local signal maximum occurs
wherever ∆ turns negative:

x(k) is a local maximum ⇔ ∆k ≥ 0 and ∆k+1 < 0. (4.2)

If we have several adjacent samples x(k), . . . , x(k+ℓ) such that ∆k = · · · = ∆k+ℓ = 0,
then we have a signal plateau. In this case, if ∆k+ℓ+1 < 0, we pick the plateau
midpoint x(k + ℓ/2) as the local maximum.
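A minimal sketch of this maximum detector, including the plateau rule (boundary samples are ignored, an assumption of the sketch):

```python
def local_maxima(x):
    """Indices of the local maxima of a discrete signal, following
    Eq. (4.2): a rise followed by a fall; a plateau yields its midpoint."""
    maxima, i, n = [], 1, len(x)
    while i < n:
        if x[i] - x[i - 1] > 0:                    # delta_i > 0: rising edge
            j = i
            while j + 1 < n and x[j + 1] == x[j]:  # walk along a plateau
                j += 1
            if j + 1 < n and x[j + 1] < x[j]:      # falling edge after it
                maxima.append((i + j) // 2)        # midpoint of the plateau
            i = j + 1
        else:
            i += 1
    return maxima
```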
Suppose x(i) is a local maximum for the signal under study; we compute its relative
energy, weighted by the total signal energy, as

Er(i) = E(i)² / (E − E(i)). (4.3)
We now consider x(i) as the midpoint of a Gaussian distribution. The choice of as-
sociating a Gaussian function with each maximum has been made after considering the
following facts:
• Given a set P of points, for large |P|, if the number of local maxima m is such that
m ≪ |P|, then any distribution tends to a Gaussian.
• If the contour is corrupted by 0-mean Gaussian noise, then a Gaussian distribution
is obviously the best fit for the resulting distribution.
• The experiments have shown that choosing a distribution other than a Gaussian has
little effect on both the energy associated with each maximum and the final normalized
recall achieved.
• The symmetry of Gaussian distributions allows us to associate with each maximum,
in a natural way, the energy contained in an interval that has the maximum as its
midpoint.
The standard deviation of the Gaussian associated with the maximum at x(i) is then

σ(i) = 1 / (√(2π) (Er(i))²). (4.4)
We then calculate the entropy relative to the maximum x(i) as the quantity

S(i) = (1/x(i)) ∑_{k=−σ(i)}^{σ(i)} |x(i + k)|. (4.5)
S(i) can be considered as a relative measure of the signal energy in the range
[i − σ(i), i + σ(i)] with respect to the energy in x(i). If a signal has m maxima, its total
entropy is therefore

S = ∑_{i=1}^{m} S(i). (4.6)
We now consider a vector x containing the m maxima of x(i) in decreasing order:
x ≡ (x(i1), . . . , x(im)), with x(i1) ≥ x(i2) ≥ · · · ≥ x(im). (4.7)
Then, the representation y of the signal x is ultimately obtained as the union of all intervals
around the maxima appearing in x:

y = ⋃_{k=1}^{m} [x(ik − σk), x(ik + σk)]. (4.8)
The signal y is uniquely determined by the first m triples of the vector y, containing
all the maxima and their associated energy and defined as

y = ( i1, E(i1), [x(i1 − σ1), . . . , x(i1 + σ1)],
      . . . ,
      im, E(im), [x(im − σm), . . . , x(im + σm)],
      . . . ). (4.9)
The vector y is the her representation of the signal x. More formally, supposing we
have a time series x(·) with N points, let us define the energy of the i-th sample as
E(i) = |x(i)|². The total energy of x(·) is simply E = ∑_{i=0}^{N−1} E(i), while the relative
energy of x(i) is Er(i) = E(i)²/(E − E(i)). A summary of the whole process is presented
in Figure 4.1, which shows a high level description of the algorithm to obtain the her
form of a given signal.
An alternate form for Step H6 stops iterating when the fraction of the total energy
remaining in the signal x(·) falls below a given threshold. In most cases, the alternate test
offers more control on index accuracy at the expense of unpredictable index size. In order
to perform our tests with preset index sizes, the simpler ‘number of maxima’ test has been
preferred.
Another possibility is that, instead of calculating σ in Step H4, it can be treated as
a parameter and therefore set to the same fixed value for all the maxima. This has the
effect of establishing an overall minimum spacing distance between consecutive maxima.
Figure 4.2 shows a specific example of how the her representation is obtained starting
with a real-life input signal.
In order to perform an approximate comparison between two given signals x1(·) and x2(·),
we can compute the distance between their her representations y1(·) and y2(·) or,
equivalently, between the vectors y1 and y2, defined as follows:

D(y1, y2) = ∑_{i=0}^{∞} |y1i − y2i|. (4.10)
The distance between any signal and itself is obviously equal to zero.
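Assuming the triples are compared element-wise, Eq. (4.10) reduces to an L1 distance over the flattened representation vectors:

```python
def her_distance(y1, y2):
    """L1 distance between two HER vectors (Eq. 4.10); the triples are
    flattened and compared element-wise (a sketch assumption)."""
    f1 = [v for triple in y1 for v in triple]
    f2 = [v for triple in y2 for v in triple]
    return sum(abs(a - b) for a, b in zip(f1, f2))
```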
It should be stressed that her is not meant to exactly reproduce the input signal, as
would be appropriate for strict signal compression applications. Rather, her extracts a
number of features that can be used to retrieve signals similar to the one under consider-
ation. Even the case σ = 0 does not imply that every single signal point will be sampled
by her: the algorithm always limits itself to the signal maxima. If, on the other hand,
we select all the points (not only the maxima) and we define the instantaneous energy
density as
ρE(n, ℓ) = (1/(2ℓ)) ∑_{i=−ℓ}^{ℓ} x(n + i), (4.11)

then ρE does converge to the original signal:

lim_{ℓ→1} ρE(n, ℓ) = x(n) (in ℓ2). (4.12)
The HER Algorithm
Here is how the her representation vector y of the sequence x(i), 0 ≤ i ≤ N − 1 is obtained.
H1. [Initialize counter and output vector. Compute the total energy.]

k ← 0; y ← ( ); E = ∑_{i=0}^{N−1} |x(i)|²
H2. [Find the m signal maxima and put them in a queue Q in decreasing magnitude
order, along with their x-axis position and their distance from the first (largest)
maximum.]

Q ← ( (i1, 0, x(i1)), . . . , (im, dm, x(im)) );
H3. [Pop the largest maximum from Q.]

(t, x(t)) ← pop(Q);

[Compute its relative energy Er(t).]

E(t) = |x(t)|²; Er(t) = E(t)²/(E − E(t));
H4. [Compute the standard deviation relative to the current maximum.]

σ(t) = 1/(√(2π) (Er(t))²);

[In other words, we are considering x(t) to be the midpoint of a Gaussian
distribution. Now compute its relative entropy.]

S(t) = (1/x(t)) ∑_{i=−σ(t)}^{σ(t)} |x(t + i)|;
H5. [Append the newly found values to the her output vector y.]
y = y ⊗ (x(t), S(t), dt);
H6. [Go back to Step H3 until we have used a predefined number M of maxima.]
k ← k + 1;
If k < M go to Step H3, else output y.
Figure 4.1: A high level sketch of the her algorithm
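Steps H1–H6 can be sketched as follows; plateau handling and the distance bookkeeping of step H2 are simplified, and truncating σ to an integer interval radius is an assumption:

```python
import math

def her(x, M=3):
    """Simplified sketch of the HER algorithm of Fig. 4.1."""
    N = len(x)
    E = sum(v * v for v in x)                      # H1: total energy
    maxima = [i for i in range(1, N - 1)           # H2: local maxima ...
              if x[i] - x[i - 1] >= 0 and x[i + 1] - x[i] < 0]
    maxima.sort(key=lambda i: x[i], reverse=True)  # ... by decreasing magnitude
    y = []
    for t in maxima[:M]:                           # H6: stop after M maxima
        Et = x[t] * x[t]                           # H3: relative energy
        Er = Et * Et / (E - Et)
        sigma = 1.0 / (math.sqrt(2 * math.pi) * Er * Er)   # H4
        r = int(sigma)
        lo, hi = max(0, t - r), min(N - 1, t + r)
        S = sum(abs(x[j]) for j in range(lo, hi + 1)) / x[t]
        y.append((x[t], S, t - maxima[0]))         # H5: triple (x(t), S(t), d_t)
    return y
```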
Figure 4.2: Obtaining the her representation of a real-life 1-D input signal
4.2.1 Computing Time Evaluation
A cursory analysis of the algorithm shows that its time complexity is basically linear in
the number of points N . More precisely, let us consider an N -pixel signal. Assuming that
the number of relevant maxima is m (typically m� N), Step H1 requires constant time,
while Step H2 can be carried out by performing one sequential scan of the N -pixel input
and sorting the local maxima—that is, time c1N + c2m logm. The loop including Steps
H3 through H6 is executed m times. Steps H3, H5 and H6 require constant time, while
Step H4 takes a time proportional to σ(t), which has N as a tight upper bound—for all
practical purposes, however, σ(t) can be considered as a constant. This yields a running
time of c3 + c4σ(t) for one iteration. Therefore, the total running time for the algorithm
is O(N + m log m + Nm). Since m is usually fixed to some low value such as 5 or 6, the whole expression reduces to O(N).
4.3 HER for Contours
We can now apply the proposed model to signals in order to analyze and classify closed
contours of objects and regions of a pictorial scene.
We first need to obtain a 1-d representation of a 2-d contour. In order to obtain a
1-d time series from 2-d contour data, two different approaches come to mind. The first
approach is that of sampling the contour at fixed angle increments, recording the distance
between the contour and the mass center. This approach has the advantage of keeping
the index small because, once the angle increment ε is fixed, the number of sample points
is 2π/ε for any contour. However, this approach has at least one serious shortcoming:
for non-convex contours such as the one depicted in Fig. 4.3 (A), there are several angles
where there is no single contour point: for the angle θ, there are 3 points that intercept a straight line starting from the center O. As a consequence, at least 2 out of these 3 points cannot be represented, and the reconstruction is inevitably wrong (B). The exact type of error depends on the strategy adopted in the case of multiple points for the same angle: do we record the minimum, maximum or average distance?
The second approach involves scanning the contour pixel by pixel rather than angle
by angle. One conceivable disadvantage is that our representation will have one point for
each contour pixel; therefore, the data size can get large for images at high resolution.
Figure 4.3: Sampling a 2-D contour at fixed angle increments can destroy information
However, the advantage is that any contour can be represented in a lossless, reversible
way. For this reason, this approach has been preferred. We scan the contour clockwise
starting from its top left pixel, recording the distance between each pixel and the center
of mass, as shown in Fig. 4.4. The contour (A) is sampled pixel by pixel and this yields a
periodic time series (B) with as many points as there are pixels in the object contour.
To do so, we choose our frame of reference to be a coordinate system centered in the
center G of the object under exam, computed as follows:
(xG, yG) ≡ ((1/k) ∑_{i=1}^{k} xi, (1/k) ∑_{i=1}^{k} yi), (4.13)
where xi and yi are the coordinates of a pixel Pi belonging to the contour, while k is the
number of contour pixels.
Next, we compute the d4 distance between the center G and the uppermost and left-
most pixel P1 of the contour:
d4(P1, G) = |x1 − xG|+ |y1 − yG|. (4.14)
Repeated application of Eq. (4.14) for all contour pixels according to a predefined direction
(e.g., clockwise) yields a representation γ(s) of the contour in curvilinear coordinates. Such a representation is unambiguous, since it is possible to reconstruct the original 2-D contour shape from it without loss of information.
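Under the conventions just described, obtaining γ(s) from an ordered contour can be sketched as follows (illustrative; the function name and the (x, y) tuple representation are our own):

```python
def contour_signal(contour):
    """Map an ordered list of contour pixels (x, y) to the 1-d signal gamma(s).

    Each sample is the d4 (city-block) distance between a contour pixel and
    the center of mass G of the contour, as in Eqs. (4.13)-(4.14).
    """
    k = len(contour)
    xG = sum(x for x, _ in contour) / k                  # Eq. (4.13)
    yG = sum(y for _, y in contour) / k
    return [abs(x - xG) + abs(y - yG) for x, y in contour]   # Eq. (4.14)
```

For a square contour scanned clockwise, the maxima of the resulting signal fall on the four corners, the points farthest (in d4 distance) from the center.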
If we apply the model we have just described to γ(s), we observe that the maxima
of γ(s) correspond to the points of the contour having the greatest distance from the
Figure 4.4: Sampling a 2-D contour pixel-by-pixel
center G. Figure 4.4 depicts the concept graphically in the case of a theoretical con-
tour, while Figure 4.5 shows the correspondence between a real-life contour and its her
representation.
The entropy associated with each maximum in γ(s) can be interpreted as the 'signature' of the distribution of contour pixels in the neighborhood of the considered maximum: if r(P1, P2) denotes the straight line passing through the points P1 and P2, then the area enclosed by the curve γ(si − σi, si + σi) ∪ r(si − σi, si + σi) is the quantity that gets
The discrete form of the Fourier Transform [34] is also often used as a shape descriptor. It has several well-known mathematical properties, most importantly linearity. Therefore, by invoking Parseval's theorem, it can be proven that searching with a Fourier index, however reduced, can produce no false dismissals. In other words, images that lie within the specified distance from the query image will never fail to appear in the answer set. The reason is that, since many Fourier coefficients are discarded, the distance between two items in feature space is never greater than the original distance in pixel space. This indeed makes sure that there are no false dismissals, but on the other hand it might introduce some false alarms that must be filtered out in a postprocessing step.
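This lower-bounding behavior is easy to check numerically: with an orthonormal DFT, the distance computed on any truncated set of coefficients never exceeds the distance between the full signals. A small sketch (our own, not from the thesis; the example vectors are arbitrary):

```python
import cmath
import math

def dft(x):
    """Orthonormal discrete Fourier transform (1/sqrt(N) normalization)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) / math.sqrt(N)
            for k in range(N)]

def dist(a, b):
    """Euclidean distance between two (possibly complex) vectors."""
    return math.sqrt(sum(abs(u - v) ** 2 for u, v in zip(a, b)))

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
y = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]
X, Y = dft(x), dft(y)

full = dist(x, y)                                    # distance in pixel space
for alpha in range(1, len(x) + 1):
    trunc = dist(X[:alpha], Y[:alpha])               # keep first alpha coefficients
    assert trunc <= full + 1e-9                      # hence no false dismissals
```

Since the orthonormal DFT preserves Euclidean norms (Parseval), the truncated feature-space distance is always a lower bound on the pixel-space distance.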
Although the dft and the closely related dct (Discrete Cosine Transform), used by
jpeg, are able to capture a good deal of information about images, sharply straight lines
Figure 4.5: The correspondence between a contour and its her representation
Figure 4.6: Approximation of a shape by the first M coefficients of its dft (M = 1, 4, 10, 60)
Figure 4.7: Approximation of a shape by its M largest maxima using her (M = 3, 6, 8, 10)
cannot be represented effectively unless we are willing to use enough coefficients. As shown by Zahn and Roskies [49], an adequate approximation of a polygon requires 15–30 coefficients. When the object has highly irregular or jagged contours, even 30 coefficients are not enough to characterize the shape adequately for accurate reconstruction. The increasingly good approximation of a shape by its dft is shown graphically in Figure 4.6.
Unlike Fourier-based methods, the her representation was never meant for reconstructing the signal. However, it is indeed possible to reconstruct the contour if one feature is added to the her representation: the angle made by the current maximum and some reference line, say, the positive X axis. This feature might be employed to enhance the system's usability by providing the user with feedback about the actual appearance of the query shape, at the cost of a 33% increase in index size. In this case, Figure 4.7 shows how a her reconstruction changes when increasing the number M of maxima.
In Figure 4.7, it is assumed that all interpolation between maxima is done by straight line segments. In principle it is possible to use curves to fit the position i where each maximum x(i) occurs, but in practice the final effect is usually not worth the extra effort.
4.3.1 Properties of HER
her for shape contours has several nice invariance properties. In particular, we list the
most important below.
• Translation invariance. her is obviously invariant to any translation of the object in
the image space, since the reference system gets translated together with the center
of the object.
• Rotation invariance. her is invariant to object rotation by any integer multiple of π/2. Indeed, in the case of continuous signals, it is invariant for any angle: a rotation of the object only causes a change of phase in γ(s) and therefore in the her representation.
• Reflection invariance. Mirror reflection is equivalent to a change in the direction
of the contour scan used to build the 1-D signal. However, the maxima get or-
dered according to their magnitude, not their order of occurrence in the scan; as a
consequence, there’s no change in the vector of ordered maxima.
• Scaling invariance. Suppose that an object A has two adjacent maxima m1 and m2 in the curvilinear coordinate representation of its contour C. Let d be the distance between m1 and m2. Now perform a scaling transformation on A obtaining A′ (and m′1, m′2, C′ and d′). Then, there is the following relation between the lengths of the contours |C| and |C′| in the original object and its scaled version:

|C| / |C′| = d / d′. (4.15)

If we represent C and C′ in curvilinear coordinates, then the following is also true:

|γ| / |γ′| = d / d′, (4.16)

where d and d′ represent the distance between any two maxima in A and A′ respectively. Summing up, if we want the system to be fully invariant to scaling, all we have to do is substitute the test for 'approximate equality' between y1 and y2 with a test for 'approximate linear dependence.'
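The thesis does not fix a specific 'approximate linear dependence' test; one natural sketch uses the cosine of the angle between the two feature vectors (the function name and the tolerance are our own assumptions):

```python
import math

def approx_linear_dependent(y1, y2, tol=1e-3):
    """True when y1 and y2 are (nearly) scalar multiples of each other."""
    dot = sum(a * b for a, b in zip(y1, y2))
    n1 = math.sqrt(sum(a * a for a in y1))
    n2 = math.sqrt(sum(b * b for b in y2))
    if n1 == 0 or n2 == 0:
        return n1 == n2                       # only the zero vector matches itself
    return 1.0 - abs(dot) / (n1 * n2) < tol   # |cos(angle)| close to 1
```

Under this test, the her vector of a shape and that of its uniformly scaled version, being approximately proportional by Eq. (4.16), are accepted as a match, while unrelated vectors are not.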
Table 4.1: Invariance properties of her for contours
Contrast Scaling YES
Luminance Shifting YES
Rotation YES
Reflection YES
Translation YES
Zoom YES*
These invariance properties are summarized in Table 4.1. The asterisk in the Zoom
invariance row means ‘yes, if we change the test to include (approximate) linear depen-
dence.’
A few words about the 'approximate equality' test are in order. Techniques for approximate matching based on feature extraction usually adopt the distance in feature space as the main indicator of similarity.
Let T be a mapping from image space I to feature space F. T is said to be a complete mapping iff any element XI ∈ I has exactly one image XF = T(XI) ∈ F. Non-complete mappings may still be usable for pattern recognition tasks if they meet the weaker (and more fuzzily defined) condition of separating clusters in feature space. Let XI and YI be two objects in I and let XF = T(XI) and YF = T(YI) be their images in feature space; let D(·, ·) and G(·, ·) be distance metrics in I and F respectively.

When the transformation T represents, for instance, a Fourier transform with truncation of all but the first α harmonics, Parseval's theorem tells us that

Gα(XF, YF) ≤ D(XI, YI) (4.17)

lim_{α→N−1} Gα(XF, YF) = D(XI, YI), (4.18)
where N is the number of points in the input signals—in this context, supposing we are
dealing with contour signals, the number of pixels in the contours of XI and YI .
When T denotes the her representation, we have a similar result:
Gα(XF, YF) ≤ D(XI, YI) (4.19)

lim_{α→N} Gα(XF, YF) = D(XI, YI), (4.20)
Figure 4.8: Converting a 2-d texture into a 1-d time series. (A): what the texture looks
like; (B): the partition element (texture tile); (C): the spiral; (D): the resulting 1-d signal.
only that in this case α represents the number of maxima utilized in the her representation
rather than the number of Fourier harmonics.
The value of α that provides the best tradeoff between ease of computation and effectiveness of the results depends on the specific shapes of the objects we are dealing with. This is true for both her and Fourier-transform-based methods.
4.4 HER for Textures
How do we generate a time series—that is, a sequence—from intrinsically 2-d texture
data? Our solution is that of following a spiral path in the texture element, as shown in
Fig. 4.8.
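An outside-in spiral scan of a tile can be sketched as follows (illustrative; the thesis does not spell out the start corner or direction, so a clockwise scan starting at the top-left pixel is our assumption):

```python
def spiral_scan(tile):
    """Flatten a 2-d texture tile (list of rows) into a 1-d sequence,
    following an outside-in clockwise spiral from the top-left pixel."""
    rows = [list(r) for r in tile]        # work on a mutable copy
    out = []
    while rows:
        out.extend(rows.pop(0))           # top row, left to right
        if rows and rows[0]:
            for r in rows:
                out.append(r.pop())       # right column, top to bottom
        if rows:
            out.extend(reversed(rows.pop()))   # bottom row, right to left
        if rows and rows[0]:
            for r in reversed(rows):
                out.append(r.pop(0))      # left column, bottom to top
    return out
```

For the 3 × 3 element of Fig. 4.9(a) this yields the sequence 1 2 3 6 9 8 7 4 5.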
4.4.1 Invariance Properties
Once the index data is extracted as shown above, it is organized into a spatial data access
structure for subsequent searching. The structure of choice in this case is a k-d-tree.
As stated earlier, this method is invariant to some types of image transformations.
Here are a few more details about the existing invariances.
Figure 4.9: Rotating the partition element yields different local maxima. The spiral scan of the original 3 × 3 element (a) reads 1 2 3 6 9* 8 7 4 5* - 1 …, while the scan of its rotated version (b) reads 7* 4 1 2 3 6 9* 8 5 - 7 … (asterisks mark local maxima in the 1-d sequence).
Let us consider pixel transformations first. Contrast scaling—that is, multiplying all
image pixel values by some constant—does not change the relative energy of any pixel. For
this reason, none of the data generated by the algorithm undergoes any change. Similarly,
luminance shifting—that is, summing a constant to all the pixel values—does not change
the relative order of the maxima and therefore does not modify the index.
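Both invariances can be checked directly: contrast scaling (with a positive factor) and luminance shifting are monotone pixel maps, so they preserve the positions and the ranking of the local maxima. A small sketch, with assumed helper names of our own:

```python
def ranked_maxima(x):
    """Positions of interior local maxima, sorted by decreasing value."""
    idx = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    return sorted(idx, key=lambda i: x[i], reverse=True)

x = [10, 40, 20, 80, 30, 60, 10]
a, b = 1.7, 25                             # contrast scale (> 0), luminance shift
y = [a * v + b for v in x]                 # monotone pixel transform
assert ranked_maxima(x) == ranked_maxima(y)   # same maxima, same ranking
```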
As for geometric transformations, translation is not even ‘seen’ by our method, since
it only deals with data coming from a segmentation phase. Image rotation/reflection and
zooming, however, do change the partition element (texture tile) and in general yield
different index data. Consider the theoretical partition element shown in Fig. 4.9 as an
example. The local maxima in the sequence are marked by asterisks. As a clockwise
rotation of π/2 is applied, the maxima reported by the spiral method change. However, it
can be seen that ‘real’ 2-d maxima (i.e., the value 9) are always reported; what happens for
different rotated versions of the element is that ‘spurious’ 1-d maxima appear. A similar
line of reasoning comes into play with mirror reflections and zoom.
Summing up, it is possible to make her for textures invariant to both rotations and
zoom if the maxima are located with a true 2-d algorithm and if the computation of
relative energy (Step H3 in the algorithm) is done with a 2-d, rather than 1-d, Gaussian.
Table 4.2 summarizes the invariance properties for texture-based retrieval. The aster-
isks denote issues that the method can be made invariant to, provided some caution is
exercised (e.g., the 4-distance in Step H2 should be normalized with respect to the size of
the element). Even as it is now, however, the method appears to be robust with respect
to transformations—such as rotation—for which there is no theoretical (that is, absolute)
Table 4.2: Invariance properties of her for textures
Contrast Scaling YES
Luminance Shifting YES
Rotation NO*
Reflection NO*
Translation YES
Zoom NO*
invariance. This is illustrated in more detail in Section 4.4.2, which deals with the results
obtained by experimenting with the method.
4.4.2 Experimental Results
Several experiments have been performed in order to assess the validity of her for textures.
For these tests, the main database used was the Brodatz set of textures [6].
The original Brodatz dataset included 167 textures, but we added several transformed
versions in order to test the robustness of retrieval. Furthermore, we have scaled the textures down to 32 × 32 pixels from the original 128 × 128, and kept only the luminance channel, thus obtaining 8-bit images. There are several reasons for this choice, the main one being that in most practical applications the texture element is usually limited to sizes in the range of 16 × 16 to 64 × 64.
The experiments were mainly aimed at assessing the robustness to pixel transformations and geometric transformations. For this reason, the database was augmented with
variations of the original textures including the following:
• Luminance-shifted versions (made brighter and darker by different amounts);
• Color-reduced versions, where the colors were reduced in number from the original 256 to 16;
• JPEG encoded and decoded versions (average compression factor ranging from 1:20 to 1:30);
• Contrast-scaled versions, with different amounts of scaling;
• Noisy versions, with different amounts of Gaussian noise added—10%, 20% or 50% of
the total dynamic range (that is, since we are dealing with 8-bit images, average 25.5,
average 50 and average 128);
• Mirror-reflected versions;
• Rotated versions (only integer multiples of π/2 were considered).
A part of the augmented database can be seen in Fig. 4.10. As an example, Element
#1 is Bark.0000 and its transformed versions are in positions 3–16 inclusive; Element #2
is Metal.0000 and its transformed versions are in positions 172–185.
In the first set of experiments, we used one of the original textures as the query and
looked at the returned results to see which transformed versions came up highest (closest)
in the answer set. In all cases, there were several matches at distance 0 in feature space
from the query texture. In particular, color reduction, contrast scaling and luminance
shifting do not change the value of the index and therefore come out at distance 0. Other
transformations do change the index, but the change is small enough for the transformed
image to be returned in the top positions of the answer set, just after the images at
distance 0.
The results of a sample query are depicted in Fig. 4.11. As can be seen, the first
8 matches consist entirely of transformed versions of the query image. The following
8 matches (that is, those in the second row) contain several spurious elements, including
Fabric.0005 and some of its transformed versions. Similar results were obtained for all
the 15 query tiles that were used in our full-database retrieval experiments.
Another example is shown in Fig. 4.12. In this case the query image is Bark.0000 ;
as before, the first row consists entirely of transformations of the query image. Some
spurious results such as Food.0000 and Fabric.0000 appear in the second and third rows,
along with other representatives of the Bark family.
A nice side effect of any query is that of partitioning the database into bins (clusters)
which are plainly visible by graphing the distances. As an example, Fig. 4.13 shows the
250 smallest distances from the images in the database to the query image—in this case,
Fabric.0001 (Element #36 in Fig. 4.10). The distances are sorted in increasing order.
The apparent plateaus or quasi-plateaus in the graph point to clusters of similar (or
index-identical) images in the database.
Figure 4.10: A selection of 256 tiles from the complete texture database utilized for the
experiments
Figure 4.11: Results of a sample query: Metal.0000 (Element #2 in Fig. 4.10)
Figure 4.12: Results of a sample query: Bark.0000 (Element #1 in Fig. 4.10)
Figure 4.13: Distances from Fabric.0001 (#36 in Fig. 4.10) to the closest 250 matches
in the database (x-axis: rank; y-axis: distance)
Another set of experiments (‘focus experiments’) was performed on restricted versions
of the database, containing the full set of original textures and only the transformations
of a single tile. In these cases the query tile was one of the transformed versions. We
found that in most cases the original and all transformed versions of the query image were
retrieved in the very first positions of the answer set, the one exception being the versions
with added Gaussian noise. This was to be expected, since the addition of substantial
amounts of noise tends to distort the textures significantly, especially considering the
small size of the tiles (32 × 32 pixels).
Table 4.3 summarizes the results obtained using 10 query images. For each query,
we fixed the parameter σ at several different values; furthermore, we fixed the “residual
energy” termination parameter (the threshold for alternate termination in Step H6 of the
algorithm) at different values, too. Each original image in the Brodatz dataset has several
transformed versions present in our database; the average number of such variations is 16.
Let us define a 'relevant match' for a given query image Q as another image, present in the database, which can be obtained from Q by applying one of the above-mentioned transformations. For instance, if the query image is a rotated version of Bark.0000, then the original Bark.0000 is a relevant match, as is any differently rotated version.
The column labeled ‘match/12’ in Table 4.3 reports how many relevant matches are
found in the first 12 elements of the answer set. It should be noted that such ‘closest’
relevant matches are always consecutive, with no false alarm in between. Given the in-
variance and robustness properties described above, this is to be expected. The columns
‘match/24’ and ‘match/36’ give similar information for larger answer sets. In this case,
there are intervening false alarms.
As a last remark about Table 4.3, the column labeled σ reports the value at which σ was fixed. This has a direct impact on the number of maxima found by the
algorithm. In effect, this parameter is a normalized quantity, between 0 and 1, proportional
to the minimum spacing between maxima, in the sense that two local maxima falling less
than σ ·N pixels apart will count as a single one.
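The role of σ as a minimum spacing can be sketched as follows (our own greedy formulation: maxima are scanned in decreasing value order, and any maximum closer than σ·N samples to one already kept is discarded):

```python
def thin_maxima(positions, values, sigma, N):
    """Keep only maxima at least sigma*N samples apart, preferring larger ones.

    positions: sample indices of the local maxima
    values:    corresponding signal values
    sigma:     normalized minimum-spacing parameter in [0, 1]
    N:         signal length in samples
    """
    order = sorted(range(len(positions)),
                   key=lambda i: values[i], reverse=True)
    kept = []
    for i in order:
        if all(abs(positions[i] - positions[j]) >= sigma * N for j in kept):
            kept.append(i)                   # far enough from all kept maxima
    return sorted(positions[i] for i in kept)
```

With σ = 0.1 and N = 100, two maxima 2 samples apart collapse into the larger one; with a small enough σ, all maxima survive.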
When all the maxima in the signal are significant, even if very close to one another, then
small values of σ are better suited; on the contrary, when not all maxima are significant,
higher values of σ can help to reduce the influence of noise. In the case of contours,
all maxima are usually important, since a maximum generally represents a maximum
E% σ match/12 match/24 match/36
5 .1 9 12 13
10 .1 8 8 8
15 .1 8 8 13
20 .1 8 9 13
25 .1 8 13 13
30 .1 9 12 13
35 .1 8 8 9
5 .15 10 10 10
10 .15 10 11 11
15 .15 11 12 12
20 .15 9 10 10
25 .15 9 9 9
30 .15 8 8 9
35 .15 9 11 14
5 .2 9 12 13
10 .2 9 12 12
15 .2 8 12 13
20 .2 9 9 12
25 .2 8 8 10
30 .2 12 12 12
35 .2 12 12 12
5 .25 8 8 8
10 .25 10 12 12
15 .25 9 9 11
20 .25 10 11 12
25 .25 9 11 11
30 .25 10 11 12
35 .25 11 11 11
Table 4.3: Tabular results of texture-based retrieval performed on the extended Brodatz
data set
elongation point of the contour, that is, a vertex of the object [15]. In the case of textures,
as illustrated in Table 4.3, going from small to large values of σ yields the expected decrease
in the number of maxima, but does not affect the quality of the answer set in an apparent
way.
We made several experiments using specific cut-down versions of the database. For
each experiment, the databases contained 168 elements: the original 167 Brodatz tiles,
plus a single transformed version of one of them. In each case, querying the system
with the transformed version yields the original (or vice versa). The only exception is
with Gaussian noise addition: this particular transformation has a strong effect on the
resulting texture and irreversibly destroys information, even more so since we are dealing
with small texture tiles (32 × 32). In the case of Gaussian noise addition of 10% and 20% intensity, the average rank of the relevant match was a little more than 4 overall: about 3 for 10% noise, about 5 for 20% noise.
Another remark that can be made is that increasing the processed fraction of signal
energy—and then the number of maxima, i.e., of features—does not significantly improve
the quality of the retrieval. Indeed, in some cases, the number of relevant matches de-
creases. Looking at Figure 4.14, the near-linear relation between signal energy and number
of maxima is apparent. However, the 4 graphs in Figure 4.15 show that a larger number
of maxima has no evident relation to the quality of the answer set. This is due to the
fact that increasing the energy introduces into the index some lesser maxima, which do not characterize the element and therefore give rise to additional false alarms. In other words, it is not the number of features, but their relevance, that makes the difference.
4.4.3 Comparison with a Wavelet Based Method
There are several methods available for image retrieval, and the methods based on the
multiresolution formulation of wavelet transforms are among the most reliable and ro-
bust [44, 45]. The scientific literature features many techniques based on wavelets (for a
sample, see [27,47,5]), and for this comparison we picked one method that shows interest-
ing characteristics and good performance. Its name is hs (Hierarchical Signature) [2].
The comparison was aimed at assessing the efficiency and effectiveness of the retrieval.
In particular, efficiency is related to the computational requirements and to the index size,
while the effectiveness has to do with the quality of the answer set. Methods based on
Figure 4.14: Relation of the energy fraction used to the number of maxima found (index
size); one curve per value of σ = 0.10, 0.15, 0.20, 0.25 (x-axis: energy %; y-axis: number of maxima)
Figure 4.15: A graphical view of the outcome of texture-based retrieval performed on
the extended Brodatz data set; one panel per value of σ = 0.10, 0.15, 0.20, 0.25 (x-axis: number of maxima; y-axis: relevant matches)
the wavelet transform require a time of at least O(N logN), where N is the number of
pixels in the image. Most methods, including hs, go as far as O(N2), due to the additional
processing involved in index construction.
In the case of such multiresolution approaches, the size of the index data for a single
image depends heavily on the so-called ‘detail level’ where the match is performed. Usually,
the match is performed on all levels, in a hierarchical fashion or following some other
scheme. hs, however, chooses an ‘optimal level’. Usually, it is the deepest level, that is,
the biggest, but which level is optimal does depend on the image and the whole database,
so that it is not easy to decide a priori. Therefore, the usual course of action for hs is
having the index made up with data from all the levels, then choosing the optimal one for
matching at run time only. This has the disadvantage of yielding a bigger index, but at
the same time in principle it allows lossless compression and decompression of the image
to be integrated with the database management system, which is often a desirable feature.
As for the quality of the retrieval, wavelet-based approaches are very robust and tolerate even the addition of Gaussian noise to the query texture without overly negative consequences. As an example, the average rank of Gaussian-perturbed textures in 'focus experiments' is 2 (it is 4 for heat). With regard to other transformations, the results are very similar to those of heat. In the usual working conditions, as few as 4 or 5 maxima are usually
enough to characterize a texture in an effective way. Indeed, having too many maxima in
the index does not improve on the performance, as shown in Fig. 4.15. As a consequence,
the typical size of heat indices is rather small.
On the other hand, a typical wavelet-based index requires about a hundred coefficients
to work with good accuracy. Summing up, heat's performance in terms of quality is very close to that of methods based on the wavelet transform, but it is much less costly in terms
of computing resources and index size. Additionally, as stated above, this representation
can be effectively used for different kinds of data; in particular, it was originally designed to
work with object contours! Table 4.4 summarizes the different strengths of heat and hs.
Summing up, it might be suggested that hs would be a more appropriate choice whenever we have to deal with JPEG images, when it is desirable to have a database of compressed images (in which case the compression could be integrated into the DBMS paradigm), or when the pattern of induced noise is Gaussian-like (transmission errors). This suggests the retrieval of images from the Internet as a very likely scenario.
Table 4.4: Quick comparison between hs and heat

Feature                    hs    heat
Efficiency
  Time                           √
  Space                          √
Effectiveness
  Luminance shift                √
  Contrast scaling               √
  JPEG                     √
  Gaussian noise           √
  Color reduction                √
Interoperability
  Integrated compression   √
  Integrated contour             √
On the other hand, heat works with small index sizes and is very quick in both index
construction and matching. It is invariant to transformations such as luminance shifting
and contrast scaling, and it is very easily combined with other matches based on e.g.,
contour data. All this suggests a DBMS dealing with medical images as an ideal field of
application: the majority of relevant objects in medical images are characterized by their
texture and their shape (contour); furthermore, luminance shifting and contrast scaling
are typical disturbances that are likely to be introduced by medical imaging systems such
as tomography.
4.5 Experimental Results
In order to test the performance of heri, we implemented it on a 233MHz Pentium-II
system using Matlab under Windows 98. Our experiments have been performed on a
number of databases of different size. As a general result, it appears that five maxima are
generally enough to capture a significant amount of signal energy for the signal sizes we
dealt with. However, it must be said that the percentage of energy needed to describe the
contours with adequate precision is strongly dependent on the type of signal examined
and the type of application.
Before showing the detailed results, we briefly review the evaluation criteria usually
adopted for the testing of retrieval systems. The recall measures the system’s ability to
retrieve all relevant objects, while the precision measures the system's ability to retrieve only relevant objects. Another indicator that is often used is the normalized recall (NR) [38], which is defined as follows.
Suppose we have a database D of |D| objects, where the number of objects relevant to our query is N < |D|. Furthermore, suppose that the relevant objects are sorted a priori so that the most relevant object is X1, down to the least relevant object XN. We perform a query that returns an ordered answer set A. Let ri be the rank of Xi in the answer set A.
The ideal rank (IR) is then defined as

IR = (1/N) ∑_{i=1}^{N} i = (N + 1)/2 (4.21)

(note that it does not depend on A); the average rank (AR) of A is

AR = (1/N) ∑_{i=1}^{N} ri. (4.22)

The difference AR − IR gives a measure of the effectiveness of the system. It is usually normalized and complemented in order to obtain a value between 0 and 1, with 1 as the ideal score, known as normalized recall (NR):

NR = 1 − (AR − IR)/(|D| − N). (4.23)
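Eqs. (4.21)–(4.23) translate directly into code. This is a sketch; `ranks` holds the ranks r_i of the relevant objects in the answer set, and we use Salton's complemented form 1 − (AR − IR)/(|D| − N), so that 1 is a perfect score, consistent with Table 4.5 where larger NR values are better:

```python
def normalized_recall(ranks, db_size):
    """Normalized recall from the ranks of the N relevant objects."""
    N = len(ranks)
    IR = (N + 1) / 2                       # ideal rank, Eq. (4.21)
    AR = sum(ranks) / N                    # average rank, Eq. (4.22)
    return 1 - (AR - IR) / (db_size - N)   # Eq. (4.23), complemented form
```

For instance, if the 3 relevant objects of a 10-object database come back in the first 3 positions, NR is 1; if they come back last, NR is 0.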
We have evaluated heri’s performance with the use of 10 heterogeneous query time series
selected from a database of 1500. For each of the 10 query time series, we manually ranked
the 20 most similar objects in the database in order to compute the NR. The number of
maxima utilized for our tests is 10. We also performed the same queries using Euclidean distance as a similarity measure. As can be seen in Table 4.5, heri outperforms ed even if the former utilizes only 10 coefficients against the latter's 15.
A visual comparison between heri and ed (see resp. Figures 4.16 and 4.17) shows the
effectiveness of heri, which is able to capture the ‘thin-shaped fish’ quality in the answer
set. ed, on the contrary, returns a few relevant objects, but also a false alarm (a rabbit).
Moreover, Figure 4.18 shows the results obtained with a database where rotated versions
of the query had been added. heri’s ability in retrieving rotated versions of the query
Table 4.5: Comparison between heri, Euclidean distance (ed) and a moment-based
technique (mbt) in terms of normalized recall
Size of DB: 1500; number of queries: 10.
Normalized recall: heri 0.984, ed 0.971, mbt 0.960
is apparent. The rotated replicas of the query appear first, since they are at distance 0 from the query; this is a consequence of her's invariance to rotations by integer multiples of π/2, discussed in Section 4.3.1.
In order to evaluate heri's efficiency, we have also performed comparative tests against the moment-based technique [30, 36], considering the first two moments and their ratio. As shown in Table 4.5 and Figure 4.19, this is the worst-performing technique, exhibiting many false dismissals.
Table 4.6 shows the results of a different set of queries over a slightly larger database (1600 objects). For this experiment, the answer set size was 70 throughout. The data in the table are to be read as follows. '% Enrg.' is the fraction of the total signal energy utilized for constructing the index; the value 10 for heri means that the threshold in Figure 4.1, Step H6 has been set to 90%. The 'Avg. dist.' column contains the average distance between all database objects in feature space. In image space, the average distance is 5399.5, as can be deduced from the 'ed' row (ed is calculated on the unmodified contours). The 'Avg. A.S. dist.' column shows the average inter-object distance (in feature space) inside the answer set only; this quantity gives some insight into the average cluster size. Finally, the 'NR' column shows the normalized recall obtained.
As can be seen, heri achieves good results while using only a small fraction of the total energy present in the signal. The energy utilized is roughly related to the number of actual signal samples used in the index. What this ultimately means is that the whole index is much lighter, since it contains less information. We have found that in many cases, 50% (or even 10%!) of the signal energy is enough to achieve good retrieval results. Euclidean distance, on the other hand, is calculated by taking into account all of the signal's samples.
Figure 4.16: An example of retrieval using heri

Table 4.6: Detailed comparative results for Euclidean distance (ed), heri and the moment-based technique (mbt)

Technique    % Enrg.    Avg. dist.    Avg. A.S. dist.    NR
ed             100        5399.50         15.86          0.74
heri            10         611.07          7.73          0.84
mbt              —        5960.43         16.02          0.62

Figure 4.17: An example of retrieval using Euclidean distance

Figure 4.18: An example of heri's ability to retrieve rotated versions of the query

Figure 4.19: An example of retrieval utilizing a moment-based technique

However, the optimal threshold to be used for the representation, and therefore the actual amount of signal energy used, is strongly dependent on the data contained in the database.
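The effect of the energy threshold can be sketched as follows: retain the highest-energy signal samples until the chosen fraction of the total energy is covered. This is only an illustrative stand-in for Step H6; the function name and the toy signal are ours:

```python
def select_by_energy(samples, threshold=0.90):
    """Keep the highest-energy samples until `threshold` of the
    total signal energy is covered; return their indices."""
    energies = [(x * x, i) for i, x in enumerate(samples)]
    total = sum(e for e, _ in energies)
    kept, accumulated = [], 0.0
    for energy, index in sorted(energies, reverse=True):
        if accumulated >= threshold * total:
            break
        kept.append(index)
        accumulated += energy
    return sorted(kept)

# A few large maxima dominate the energy, so the index stays small.
signal = [0.1, 9.0, 0.2, 8.5, 0.1, 0.3, 7.9, 0.2]
print(select_by_energy(signal))   # -> [1, 3, 6]
```

With this toy signal, three samples already carry more than 90% of the energy, which mirrors the observation above that a small energy fraction suffices.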
Another interesting fact emerging from the table is that the moment-based mapping of image space into feature space is really an expansion rather than a contraction: the average distance in feature space is greater than in image space.
Finally, in order to attain a more efficient search, we utilized a spatial access structure. Spatial access methods are techniques that allow faster access to spatially organized data [3, 26, 29, 31, 39, 46]. heri utilizes k-d trees [4] as its spatial access method.
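As a rough illustration of the access method, a minimal k-d tree with nearest-neighbor search might look like the following. This is a toy sketch, not heri's actual implementation:

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree; each node is (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return the tree point closest to `query` (Euclidean distance)."""
    if node is None:
        return best
    point, left, right = node
    if best is None or math.dist(point, query) < math.dist(best, query):
        best = point
    axis = depth % len(query)
    near, far = (left, right) if query[axis] < point[axis] else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Only descend into the far subtree if the splitting plane is closer
    # than the best match found so far.
    if abs(query[axis] - point[axis]) < math.dist(best, query):
        best = nearest(far, query, depth + 1, best)
    return best

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))   # -> (8, 1)
```

The pruning test on the splitting plane is what lets the search skip most of the database instead of scanning it linearly.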
4.6 Concluding Remarks
This chapter presented heri, a novel technique for content-based retrieval of time series. heri is based on a Hierarchical Entropy-based Representation (her) for 1-dimensional signals that utilizes the local signal maxima as well as the entropy associated with each of them. The experiments show that this representation is both effective and very general. In fact, in this chapter the time series were actually curvilinear representations of the contours of shapes. More generally, this model can be profitably employed whenever it is possible to obtain a 1-D representation of the patterns to be retrieved by content.
References
[1] R. Agrawal, C. Faloutsos, A. Swami. “Efficient similarity search in sequence
databases.” Proc. Foundations of Data Organization and Algorithms (FODO)
Evanston, IL, Oct. 1993. 14
[2] M. G. Albanesi, M. Ferretti, A. Giancane. “Robust hierarchical indexing based on
texture features.” Journal of Visual Languages and Computing 11, pp. 383–404, 2000.
52
[3] N. Beckmann, H.-P. Kriegel, R. Schneider, B. Seeger. "The R∗-tree: an efficient and robust access method for points and rectangles." Proc. ACM SIGMOD, pp. 322–331, May 1990. 10, 63
[4] J. L. Bentley, “Multidimensional binary search trees used for associative searching,”
Comm. ACM, Vol. 18, No. 9, pp. 509–517, Sept. 1975. 10, 63
[5] C. Brambilla, A. D. Ventura, I. Gagliardi, R. Schettini. “Multiresolution wavelet
transform and supervised learning for content-based image retrieval.” IEEE Int’l Con-
ference on Multimedia Computing and Systems, Vol. 1, 1999, pp. 183–188. 52
[6] P. Brodatz, Textures: A Photographic Album for Artists and Designers, Dover Publications, New York, 1966. Available (128 × 128) in a single tar file: ftp://ftp.cps.msu.edu/pub/prip/textures/ 45
[7] S. K. Chang, Q. Y. Shi, C. W. Yan. “Iconic indexing by 2D-strings.” IEEE Trans.
Pattern Analysis Mach. Intell., 9(3), pp. 413–427, 1987. 9
[8] Y. C. Chang, B. K. Shyu, S. J. Wang, “Region-based fractal image compression with
quadtree segmentation”, Proc. ICASSP’97, Munich, 1997. 4
[9] A. Del Bimbo, M. Campanai, P. Nesi. “A 3-dimensional iconic environment for image
database querying.” IEEE Trans. Soft. Eng. 19(10), pp. 997–1011, March 1993. 9
[10] A. Del Bimbo, P. Pala. Visual image retrieval by elastic matching of user sketches.
IEEE Trans. Pattern Analysis Mach. Intell., 19(2), Feb. 1997. 9
[11] A. Del Bimbo, M. De Marsico, S. Levialdi, G. Peritore, "Query by dialog: an interactive approach to pictorial querying," Image and Vision Computing 16, pp. 557–569, Elsevier, 1998.
[12] M. De Marsico, L. Cinque, S. Levialdi. "Indexing pictorial documents by their content: A survey of current techniques." Image and Vision Computing, Vol. 15, pp. 119–141, 1997. 9
[13] R. Distasi, M. Nappi, S. Vitulano. "Speeding up fractal encoding of images using a block indexing technique." Proc. ICIAP'97, Lecture Notes in Computer Science vol. 1311, pp. 101–107, Springer-Verlag, Florence, Sep. 1997.
[14] R. Distasi, M. Polvere, M. Nappi. “Split-decision functions in fractal image coding.”
IEE Electronics Letters 34(8), pp. 751–753, Apr. 1998. 12
[15] R. Distasi, D. Vitulano, S. Vitulano, “A hierarchical representation for content based
image retrieval,” Journal of Visual Languages and Computing, Special Issue on Mul-
timedia Databases and Image Communication, Vol. 5, n. 8, Aug. 2000. 30, 52
[16] R. Distasi, M. Nappi, M. Tucci, S. Vitulano. “Image indexing by contour analysis: a
comparison” Proc. IVWF4, Ischia, 2001, IEEE.
[17] R. Distasi, S. Vitulano. “Robust image retrieval based on texture information” Proc.
MDIC2001, Lecture Notes in Computer Science vol. 2184, Springer-Verlag, Amalfi,
2001.
[18] R. Distasi, M. Nappi, M. Tucci, S. Vitulano. “ConText: A technique for image re-
trieval integrating CONtour and TEXTure information” Proc. ICIAP2001, Palermo,
2001, IEEE.
[19] R. Distasi, S. Vitulano. “A Hierarchical Entropy Based Representation for Medical
Signals”, in V. Cantoni et al., eds., Human and Machine Perception 3: Thinking,
Deciding and Acting, Kluwer/Plenum Academic Press, New York, 2001. 30
[20] C. Faloutsos, W. Equitz, M. Flickner, W. Niblack, D. Petkovic, R. Barber. Efficient
and effective querying by image content. Journal of Intelligent Inf. Systems, 3(3/4),
p. 231–262, July 1994. 9
[21] Y. Fisher, Fractal Image Compression—Theory and Application, Springer–Verlag,
New York, 1994. 3, 9
[22] M. Flickner et al. "Query by image and video content: the QBIC system." IEEE Computer, Special Issue "Finding the Right Image" on Content-Based Image Retrieval Systems, 28(9), pp. 23–32, Sep. 1995. 9
[23] T. Gevers, A. W. M. Smeulders. "PicToSeek: a color image invariant retrieval system." In A. W. M. Smeulders, R. Jain, eds., Image Databases and Multi-Media Search, Series on Software Engineering and Knowledge Engineering, vol. 8, pp. 25–37, World Scientific, 1997. 19, 20
[24] U. Glavitsch, P. Schauble, M. Wechsler, “Metadata for integrating speech documents
in a text retrieval system,” Sigmod Record, Vol. 23, No. 4, Dec. 1994. 29
[25] G. H. Granlund, "Fourier preprocessing for hand print character recognition," IEEE Trans. Computers, Vol. C-21, 1972. 29
[26] A. Guttman, "R-trees: A dynamic index structure for spatial searching," Proc. ACM SIGMOD, Boston, pp. 47–57, June 1984. 63
[27] C. E. Jacobs, A. Finkelstein, D. H. Salesin. “Fast multiresolution image querying,”
In Proc. ACM SIGGRAPH 95, NY, 1995, pp. 278–280. 52
[28] A. E. Jacquin. Image coding based on a fractal theory of iterated contractive image
transformations. IEEE Trans. Image Proc., vol. 1, pp. 18–30, Jan. 1992. 12
[29] H. V. Jagadish, “Linear clustering of objects with multiple attributes,” Proc. ACM
SIGMOD, pp. 332–342, Atlantic City, May 1990. 63
[30] A. K. Jain, Fundamentals of Digital Image Processing, Computer Science Press,
Rockville, 1989. 30, 58
[31] I. Kamel, C. Faloutsos, “On packing R-trees,” Proc. CIKM, 2nd International Conf.
on Information Knowledge Management, Nov. 1993. 63
[32] S. Y. Lee, F. J. Hsu. “Spatial reasoning and similarity retrieval of image using 2D
C-String knowledge representation.” Pattern Recognition, 25(3), pp. 305–318, 1992.
9
[33] M. Nappi, G. Polese, G. Tortora. "FIRST: Fractal Indexing and Retrieval SysTem for image databases." Image and Vision Computing 16(14), pp. 1019–1031, Elsevier Science, Dec. 1998. 10
[34] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing, Prentice-Hall, Engle-
wood Cliffs, NJ, USA, 1975. 38
[35] M. Otterman, “Approximate matching with high dimensionality R-trees,” M.Sc.
scholarly paper, Dept. of Computer Science, Univ. of Maryland, MD, USA, 1992.
[36] E. G. M. Petrakis, C. Faloutsos. “Similarity searching in medical image databases.”
IEEE Trans. Knowledge and Data Eng. 9(3), pp. 435–447, May/June 1997. 9, 58
[37] M. Polvere, M. Nappi. “Speed-up in fractal image coding: comparison of methods.”
IEEE Trans. Image Processing 9(6), pp. 1002–1009, June 2000. 10, 12
[38] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, McGraw-
Hill, 1983. 29, 57
[39] H. Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1989.
13, 63
[40] D. Saupe, R. Hamzaoui, H. Hartenstein, “Fractal Image Compression–An Introduc-
tory Overview,” in Saupe D. and Hart J. (Eds): “Fractal Models For Image Synthesis,
Compression and Analysis”. SIGGRAPH 96 Course Notes, ACM, New Orleans, 1996.
3
[41] D. Saupe, S. Jacob, “Variance-based quadtrees in fractal image compression”, Elec-
tronics Letters 33,1, pp. 46–48, 1997. 4
[42] S. Sclaroff. “Distance to deformable prototypes: encoding shape categories for efficient
search.” In A. W. M. Smeulders, R. Jain, eds., Image Databases and Multi-Media
Search, Series on Software Engineering and Knowledge Engineering, vol. 8, 1997,
pp. 25–37, World Scientific. 20
[43] S. Sclaroff. Image database used in shape-based retrieval experiments available via
ftp at ftp://cs-ftp.bu.edu/sclaroff/pictures.tar.Z. 20
[44] G. Strang, T. Nguyen. Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley, MA, 1997. 52
[45] P. N. Topiwala. Wavelet Image and Video Compression. Kluwer Academic, 1998. 52
[46] J. D. Ullman. Principles of database and knowledge-based systems. Computer Science
Press, Rockville, MD, USA, 1988. 10, 63
[47] G. van de Wouwer, P. Scheunders, S. Livens and D. Van Dyck. "Wavelet correlation signatures for color texture characterization," Pattern Recognition 32, 1999, pp. 443–451. 52
[48] Various Authors, Finding the Right Image. IEEE Computer. Special Issue on Content
Based Image Retrieval Systems, 28(9), Sep. 1995.
[49] C. T. Zahn and R. Z. Roskies, “Fourier descriptors for plane closed curves,” IEEE
Trans. Computers, Vol. C21, 1972. 29, 40
Appendix A
Additional Details
A.1 Fractal Index Invariance to Contrast Scaling
Consider the original block b and the transformed block b′ = wb.
A.1.1 Center of mass
We can limit our attention to the x coordinate, since an identical line of reasoning applies to y. For b, we have
\[
x = \frac{1}{M} \sum_{\substack{1 \le i \le n \\ 1 \le j \le n}} i\, b_{i,j}.
\]
Obviously,
\[
M' = \sum_{\substack{1 \le i \le n \\ 1 \le j \le n}} b'_{i,j} = \sum_{\substack{1 \le i \le n \\ 1 \le j \le n}} w\, b_{i,j} = wM.
\]
The x coordinate of the center of mass of the transformed block b′ is then
\[
x' = \frac{1}{M'} \sum_{\substack{1 \le i \le n \\ 1 \le j \le n}} i\, b'_{i,j} = \frac{1}{wM} \sum_{\substack{1 \le i \le n \\ 1 \le j \le n}} i\, (w\, b_{i,j}) = x. \tag{A.1}
\]
This simple argument proves invariance of mass center position for the first iteration
(k = 0) of Eq. (3.5). Subsequent iterations (k > 0) can be handled as follows.
A.1.2 Higher Deviates
The following relation is easy to prove by induction:
\[
b'^{(k)}_{i,j} = w^{2^k}\, b^{(k)}_{i,j}. \tag{A.2}
\]
The case k = 0 has been shown in the previous discussion. For k > 0, assume Eq. (A.2) as the induction hypothesis and observe that it implies \mu'_k = w^{2^k} \mu_k. We then have
\[
b'^{(k+1)}_{i,j} = \bigl(b'^{(k)}_{i,j} - \mu'_k\bigr)^2 = \bigl(w^{2^k} b^{(k)}_{i,j} - w^{2^k} \mu_k\bigr)^2 = w^{2^{k+1}} \bigl(b^{(k)}_{i,j} - \mu_k\bigr)^2 = w^{2^{k+1}}\, b^{(k+1)}_{i,j}.
\]
In words, the higher deviates of the original and the transformed block are still proportional, albeit with a factor that depends on k. Therefore, we can apply Eq. (A.1) to b^{(k)} and b'^{(k)} and state that (x'_k, y'_k) = (x_k, y_k) for all k ≥ 0.
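The invariance just proved is easy to check numerically. The sketch below is illustrative only (1-based indices as in the derivation above); it compares the center of mass of a small block before and after contrast scaling with w = 0.5:

```python
def center_of_mass(block):
    """(x, y) center of mass of an n x n block, pixel values as masses."""
    mass = sum(sum(row) for row in block)
    x = sum(i * v for i, row in enumerate(block, 1) for v in row)
    y = sum(j * v for row in block for j, v in enumerate(row, 1))
    return (x / mass, y / mass)

b = [[1.0, 2.0], [3.0, 4.0]]
w = 0.5
b_scaled = [[w * v for v in row] for row in b]   # b' = w b
print(center_of_mass(b))          # -> (1.7, 1.6)
print(center_of_mass(b_scaled))   # -> (1.7, 1.6)
```

Scaling every pixel by w multiplies both the mass and the weighted sums by w, so the quotient is unchanged, exactly as in Eq. (A.1).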
A.2 Fractal Index Invariance to Luminance Shifting
Consider the original block b and the transformed block b′ = b+m1.
A.2.1 Center of Mass
We utilize the following notation: CM(b) indicates the center of mass of the n × n block b, while M(b) indicates its mass. The symbol 1 stands for the n × n block filled with 1's. The calculations are done in an orthogonal coordinate system centered at the geometric center of the block.
\[
\mathrm{CM}(b') = \mathrm{CM}(b + m\mathbf{1}) = \frac{M(b)\,\mathrm{CM}(b) + M(m\mathbf{1})\,\mathrm{CM}(m\mathbf{1})}{M(b) + M(m\mathbf{1})} = \frac{M(b)\,\mathrm{CM}(b) + m\,M(\mathbf{1})\,(0,0)}{M(b) + m\,M(\mathbf{1})} = \frac{M(b)}{M(b) + mn^2}\,\mathrm{CM}(b).
\]
Since CM(b′) is a scalar multiple of CM(b), their polar angles are equal.
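Numerically, the scalar-multiple relation implies that the polar angle of the center of mass is unchanged by a luminance shift. A small illustrative check (0-based indices recentered at the geometric center; function name is ours):

```python
import math

def center_of_mass_centered(block):
    """Center of mass in coordinates centered at the block's geometric center."""
    n = len(block)
    mass = sum(sum(row) for row in block)
    x = sum((i - (n - 1) / 2) * v for i, row in enumerate(block) for v in row)
    y = sum((j - (n - 1) / 2) * v for row in block for j, v in enumerate(row))
    return (x / mass, y / mass)

b = [[1.0, 2.0], [3.0, 4.0]]
m = 10.0
b_shifted = [[v + m for v in row] for row in b]   # b' = b + m 1

# Polar angle of the center of mass before and after the shift.
angle = math.atan2(*reversed(center_of_mass_centered(b)))
angle_shifted = math.atan2(*reversed(center_of_mass_centered(b_shifted)))
print(math.isclose(angle, angle_shifted))   # -> True
```

The magnitude of the center-of-mass vector shrinks (the block gets heavier), but its direction, and hence the polar angle used by the index, is preserved.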
Index
AVRR
definition, 20
Brodatz, 45
canonical form, 18
center of mass, 71, 72
color change, 15, 18
color reduction, 45
contrast scaling, 15, 16, 44, 45, 71
databases
homogeneous vs. heterogeneous, 9
deviates, 71
domains, 11
entropy
as a split-decision function, 5
Euclidean distance, 29
feature vectors, 12, 13
‘focus experiments’, 50
Fourier descriptors, 29, 38
fractal coding
basics, 11
Gaussian distribution, 31
Gaussian noise, 46, 55
Hierarchical Signature, 52
histogram, 13
image partition, 3
invariance
in indexing, 10, 17, 18, 41, 43
isometries, 19
in fractal coding, 13
JPEG, 45
luminance shifting, 15, 17, 44, 45, 72
mass center, see center of mass
moments, 30
multichannel images, see RGB, YIQ
Parseval’s theorem, 42
partition, 4, 13, 17
PicToSeek, 19, 22
quadtree, 3, 13, 17
ranges, 11
reflections, 16, 18, 41, 44, 46
RGB, 14
vs. YIQ, 12
RMS error, 4
rotations, 16, 18, 41, 44, 46
scaling (size), see zoom
sigma (σ), 33
spiral, 43
split decision function, 3
standard deviation, 32
threshold, 33
adaptive vs. fixed, 6
translations, 41, 44
variance
n-fold, 4
standard, 4
with fixed threshold, 4
wavelets, 52
YIQ, 14
vs. RGB, 12
zoom, 41, 44