A Clustering Method based on the Estimation of the Probability Density
Function and on the Skeleton by Influence Zones.
Application to Image Processing
M. Herbin, N. Bonnet *, and P. Vautrot
* Université de Reims, 21, rue Clément Ader, F 51100 Reims, France.
Abstract : This paper investigates a new approach for data clustering. The probability density
function (p.d.f.) is estimated by using the Parzen window technique. The p.d.f. thresholding
permits the segmentation of the data space by influence zones (SKIZ algorithm). A bottom-up
thresholding procedure is iterated to refine the segmentation. As a result, a complete partition
of the data space is obtained in parallel to the clustering of the data samples. In addition, an
estimation of the intrinsic number of clusters of the data set is provided. This approach to
clustering is tested with simulated data and applied to color image data.
Keywords : Clustering, Probability Density Function, Skeleton by Influence Zones.
I Introduction
The aim of clustering is to partition a data set into subsets of "similar" data in an unsupervised
manner. The data to be classified are represented by points in an n-dimensional space. The
coordinates of these points correspond to the n observed features. Many approaches to
clustering have been suggested (Duda and Hart, 1973; Fukunaga, 1972). Some of them use an
estimation of the probability density function (p.d.f.) and others do not. Among the latter
methods are the hierarchical clustering approaches: a similarity criterion between data is
defined and aggregation algorithms are used to produce a hierarchy of classes (Sokal and
Sneath, 1963). By cutting this hierarchy at a certain level, classes of objects can be obtained.
Among the methods based on the estimation of the p.d.f., some are
parametric approaches. These approaches assume that the shapes of the individual p.d.f. are
known and only the related parameters have to be estimated. This assumption is often
implicit: this is the case for the K-means or ISODATA procedures, which assume that the
cluster shape is hyperspherical (when the Euclidean distance is used) or hyperelliptical (when
the Mahalanobis distance is used). The fuzzy C-means procedure (Dunn, 1973) belongs to the
same category. Unfortunately, real world clusters may possess a shape which is far from being
hyperelliptical, and the classical clustering methods listed above often fail. Therefore, there is
a need for clustering algorithms which do not make any (implicit or explicit) hypothesis
concerning the shape of the p.d.f. Such algorithms have been suggested, among which: the
density gradient-based approach (Fukunaga and Hostetler, 1975; Devijver and Kittler, 1982),
the maximum entropy clustering approach (Rose, Gurewitz, and Fox, 1990), the blurring (or
mean-shift) approach (Cheng and Wan, 1992; Cheng, 1995), mode detection and convexity
analysis (Touzani and Postaire, 1988; Postaire and Olejnik, 1994), and the dynamic procedure
of splitting (Garcia et al, 1995). A detailed comparison of these methods under different
conditions of application has yet to be made.
The purpose of this paper is to suggest and experiment with another method belonging to this
category. Like the valley-seeking approach (Koontz and Fukunaga, 1972), our method is based
on an estimation of the p.d.f. through the Parzen window technique. But since gradient
analysis can be very sensitive to local irregularities of the estimated p.d.f. (Kittler, 1976), we
do not follow this way of analysis. Instead, we have found it more valuable to iteratively
threshold the estimated p.d.f., starting from low threshold values and proceeding upwards to
high threshold values (i.e. from one class to the largest number of classes corresponding to the
modes of the estimated p.d.f.). At each level of thresholding, the
connected regions are detected in the n-dimensional space. The boundaries between these
connected regions (i.e. clusters) are estimated by using a classical image processing
transformation : the SKIZ (Skeleton by influence zones) (Serra, 1982). Once the highest level
thresholding is performed, the data space is partitioned. The boundaries defining the clusters
are marked. Not only are the data classified, but in addition the intrinsic number of clusters of
the data set can be estimated during the course of the clustering process.
The outline of the paper is the following : in the next section, we describe the discretization of
the data space and the p.d.f. estimation. Then the iterative algorithm is described by defining
the clustering procedure. Section 3 is devoted to testing this procedure with simulated data, as
well as proposing applications in the field of image processing. A discussion and conclusion
are developed in the last section.
II Clustering method
The method we suggest is composed of three steps : the discretization of the data space, the
estimation of the p.d.f., and the segmentation of the discretized data space to determine the
clusters.
A Discretization of the Data Space
Let Q be the set of n-dimensional data to be clustered: Q = {x_1, ..., x_j, ..., x_q}, and q the
number of data. Each datum x_j (1 ≤ j ≤ q) is an n-dimensional point:
x_j = (x_{j,1}, ..., x_{j,i}, ..., x_{j,n}) with 1 ≤ i ≤ n.
The clustering domain D is a hypercube of the data space. We assume that all data x_j
(1 ≤ j ≤ q) belong to this domain D, and:
D = ∏_{1≤i≤n} [a_i, b_i[   with   a_i ≤ min_{1≤j≤q}(x_{j,i})   and   b_i > max_{1≤j≤q}(x_{j,i}).   (1)
In practical applications, outliers could be excluded from D.
Each interval [a_i, b_i[ is partitioned into intervals of length λ_i. Let p_i equal the number of
these intervals. The partition of each [a_i, b_i[ is defined as:
[a_i, b_i[ = ∪_{0≤k<p_i} [a_i + kλ_i, a_i + (k+1)λ_i[   with   λ_i = (b_i − a_i) / p_i   (2)
These partitions permit the p tiles to be defined, with p = ∏_{1≤i≤n} p_i. These tiles cover the
domain D. Note that rectangular tiles are used in dimension 2 (i.e. n = 2).
Each tile P_r (1 ≤ r ≤ p) is defined as:
P_r = ∏_{1≤i≤n} [α_{r,i}, α_{r,i} + λ_i[   with   α_{r,i} = a_i + p_{r,i} λ_i   and   0 ≤ p_{r,i} < p_i.   (3)
The center of each tile P_r is the n-dimensional point x_r with:
x_r = (x_{r,1}, ..., x_{r,i}, ..., x_{r,n})   and   x_{r,i} = α_{r,i} + λ_i / 2   with   1 ≤ i ≤ n.   (4)
Note that the tile centers are the nodes of a rectangular grid in dimension two (n = 2). These p
points x_r (1 ≤ r ≤ p) define the set S, a discrete sampling of the data space. Each initial data
point x_j (1 ≤ j ≤ q) belongs to one tile of the domain D and is associated with the center point
of this tile. Thus each datum is associated with one point of S.
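This discretization step can be sketched as follows (a minimal sketch assuming NumPy; the function name and the choice of a common number p_0 of tiles for all dimensions are ours, and the upper bound b_i is made strictly larger than the maximum by a small epsilon, as required by Eq. (1)):

```python
import numpy as np

def discretize(data, p0):
    """Bin q n-dimensional data points into a grid of p0 tiles per dimension.

    Returns the per-tile counts (#P_r) and the tile edge lengths lambda_i.
    """
    data = np.asarray(data, dtype=float)
    a = data.min(axis=0)                 # a_i <= min over the data
    b = data.max(axis=0) + 1e-9          # b_i strictly above the maximum
    lam = (b - a) / p0                   # lambda_i = (b_i - a_i) / p_i
    # index of the tile containing each datum, per dimension
    idx = np.floor((data - a) / lam).astype(int)
    counts = np.zeros((p0,) * data.shape[1])
    np.add.at(counts, tuple(idx.T), 1)   # accumulate one count per datum
    return counts, lam
```

Each datum then inherits the center of its tile as its representative point of S.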
B Estimation of probability density function
The probability density function (p.d.f.) is estimated for each point x_r (1 ≤ r ≤ p) of the set S
(the nodes of a rectangular grid if n = 2). S is an n-dimensional set of points and each point has
2n neighbours (except the border points). Using this neighbourhood connection, S is
considered as an n-dimensional network with p_i elements per dimension (1 ≤ i ≤ n).
Each point x_r of this network is the center of the tile P_r, and each tile has the same volume v,
with v = ∏_{1≤i≤n} λ_i. Thus, the density of data in each tile is d_r = #P_r / v, where #P_r is the
percentage of initial data in the tile.
Using the classical Parzen windows approach (Duda and Hart, 1973), the p.d.f. is estimated at
each point x_r by interpolation:
p.d.f.(x_r) = Σ_{1≤s≤p} d_s γ(x_r − x_s)   (5)
where γ(x_r − x_s) is an interpolation kernel.
We choose for γ an n-dimensional Gaussian kernel. Note that the interpolation is a classical
discrete convolution. As the Gaussian filter is separable, the n-dimensional convolution is
obtained by n iterations of one-dimensional convolution, one on each dimension. Thus, we
compute only n one-dimensional Gaussian kernels to estimate the p.d.f. values. Fig.1 shows a
set of 900 data in a two-dimensional domain, obtained from three Gaussian distributions of
300 data each. The domain is discretized and represented by the image A of 512*512 pixels
(i.e. a partition of D with 512*512 rectangular tiles): the higher the p.d.f., the higher the grey
level values of the image pixels. Three p.d.f. estimations are given, using three standard
deviation values in the Parzen window interpolation: σ = 18, σ = 26, and σ = 51 in pixel units
for images B, C, and D respectively.
Let M be the maximum value of the p.d.f. estimation on the set S. L levels are selected
between 0 and M. Fig.1 shows the p.d.f. using 32 levels from 0 to 31 (L = 32). The level at
each point x_r is the integer value defined as:
level(x_r) = INT( L × p.d.f.(x_r) / M )   (with 1 ≤ r ≤ p).   (6)
In the following, these levels are used in place of the p.d.f. values. Note that these levels
correspond to thresholds of the estimated p.d.f.
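The estimation and quantization steps above can be sketched as follows (a minimal sketch assuming NumPy/SciPy: `gaussian_filter` plays the role of the separable n-dimensional Gaussian convolution of Eq. (5), and the clamp to L−1 reflects our reading that the levels run from 0 to L−1, the maximum point receiving the top level):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pdf_levels(counts, sigma, L=32):
    """Estimate the p.d.f. on the grid S (Eq. 5) and quantize it into
    L integer levels between 0 and L-1 (Eq. 6)."""
    d = counts / counts.sum()            # tile densities d_r (up to 1/v)
    pdf = gaussian_filter(d, sigma)      # separable n-dim Gaussian convolution
    M = pdf.max()                        # maximum of the p.d.f. estimate
    levels = (L * pdf / M).astype(int)   # INT(L * p.d.f. / M)
    return np.minimum(levels, L - 1)     # the maximum point gets level L-1
```

Because the filter is separable, the cost grows linearly with the dimension n rather than with the full kernel volume.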
C Clustering Algorithm
We have L levels for the p.d.f., from 0 to L-1. At each level e (0 ≤ e < L), let S_e be the set of
points with a density level of at least e. Thus S_e ⊆ S. We use this set to define temporary
clusters of S at each level of density. First, coarse clusters are defined. Then the clustering is
refined by increasing the level. This refinement is obtained by splitting the previous clusters
using the SKIZ procedure described below.
Using the n-dimensional network defined with neighbourhood relations between the points of
S, the set Se is composed of connected regions (i.e. sets of connected points). c(Se) is defined
as the number of connected regions. The clustering algorithm is described using an example
(Fig.2) which shows the clustering process with four levels (L=4) in a two-dimensional data
set (n=2). The set S has 14*12 points. Each point is the center of one tile of the domain
partition, the grey grid shows the tile boundaries. The black grid shows the neighbourhood
connections between points (i.e. the network). For initialization (at level e = 0), we have
S_0 = S. Thus only one connected region is detected at this initial level (c(S_0) = 1) and there is
only one cluster, as shown in grey in Fig.2A. Let us describe the iteration from level e to level
e+1. Note that S_{e+1} is included in S_e (S_{e+1} ⊆ S_e ⊆ S); thus a connected region at level e can
become disconnected at level e+1. In Fig.2B, the grey part shows S_1: we have S_1 ⊆ S_0 ⊆ S,
and S_1 has two connected regions. In a case of disconnection, there is a split. A cluster C of
level e is split into new clusters only if C contains more than one connected region of level
e+1 (i.e. c(C ∩ S_{e+1}) > 1). The number of new clusters is equal to
c(C ∩ S_{e+1}) (i.e. the number of connected regions inside C). These connected regions are used
as seeds (i.e. markers) for the splitting procedure. A city-block distance is used in the network
S. As in the image processing field, the use of distance and markers permits us to define
influence zones, and the boundary between influence zones is known as SKIZ (skeleton by
influence zone) (Serra, 1982). The SKIZ principle is easily described: each point of the old
cluster C is associated with the nearest connected region (nearest in terms of the city-block
distance). For each connected region, a distance function is computed in C. The influence zone
of a connected region contains all the points in C for which the associated distance function is
smaller than the distance functions of the other regions. Fig.2C shows a split of the initial
cluster into two new clusters; the black line in bold shows the border between the two new
clusters. At each iteration of the algorithm, all the clusters are checked to decide on a possible
split. If a cluster C contains either only one connected region of level e+1 (c(C ∩ S_{e+1}) = 1)
or no connected region of level e+1 (c(C ∩ S_{e+1}) = 0), then this cluster C is kept without splitting.
At level 2 of the example (Fig.2D), one of the initial clusters contains only one connected
region, so this cluster is kept. The second initial cluster contains two connected regions;
therefore, this cluster is split into two parts. Thus three clusters are detected at level 2 (Fig.2E).
At level 3, these three clusters are checked; since they contain one or zero connected region,
they are kept without splitting. At the end of this example three clusters are obtained.
After the L iterations, we obtain a partition of S. This partition also causes a partition of the
data space D. Each part of D is composed of the tiles whose centers are in the same part of the
S partition. Thus each initial datum of Q belongs to one and only one part of D, and so the
clustering of Q is performed.
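The whole iterative procedure of section C can be sketched as follows (a minimal sketch assuming NumPy/SciPy: `label` finds the connected regions with 2n-connectivity and the city-block distance transform provides the influence zones; unlike the paper, the distances here are computed on the whole grid rather than geodesically inside the old cluster, which is a simplifying assumption):

```python
import numpy as np
from scipy.ndimage import label, distance_transform_cdt

def skiz_clustering(levels, L):
    """Partition the grid S by iterative thresholding of the density levels."""
    clusters = np.zeros(levels.shape, dtype=int)   # level 0: one cluster
    next_id = 1
    for e in range(1, L):
        seeds, _ = label(levels >= e)              # connected regions of S_e
        for c in np.unique(clusters):
            mask = clusters == c
            ids = np.unique(seeds[mask])
            ids = ids[ids != 0]
            if len(ids) <= 1:
                continue                           # 0 or 1 region: keep cluster
            # split: assign every point of the old cluster to the nearest seed
            best = np.full(levels.shape, np.inf)
            nearest = np.zeros(levels.shape, dtype=int)
            for k in ids:
                # city-block distance to seed region k
                d = distance_transform_cdt(seeds != k, metric='taxicab')
                closer = mask & (d < best)
                best[closer] = d[closer]
                nearest[closer] = k
            for k in ids:
                clusters[mask & (nearest == k)] = next_id
                next_id += 1
    return clusters
```

The resulting label array partitions S, and hence the domain D, exactly as described above: each tile inherits the cluster of its center.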
D Implementation
The result of clustering depends on three parameters : the data space discretization step, the
Parzen window width, and the number of levels L.
First, we assume that the discretization is as fine as the device capacity permits, as explained
below. The domain D is defined as the smallest hypercube containing the data set Q. The
n-dimensional domain D is partitioned into p hypercubic tiles with p = p_0^n. Therefore the
discretization step λ_i of each dimension i (1 ≤ i ≤ n) is the ratio of the width of D to the value
p_0. p_0 is the discretization parameter for clustering; it defines the number of discretization
steps in each dimension. When the dimension n is equal to 2, we select a large value for p_0
(p_0 = 512 for instance). But when the dimension n is equal to 4, the use of a
512×512×512×512 discrete set requires a lot of memory and the algorithm needs a lot of
CPU time; thus we select a small value for p_0 (p_0 = 32 for instance). Two strategies are then
developed according to this discretization parameter, and the estimation of the p.d.f. depends
on the selection of p_0.
For each dimension i (1 ≤ i ≤ n), we have to select the standard deviation of the Gaussian
filter used for estimating the p.d.f. As the initial distribution of data is unknown, these standard
deviations s_i are selected by an empirical approach. The discretization steps are used as units
in each dimension i. The standard deviations have a constant value s for each dimension:
s_i = s (1 ≤ i ≤ n). Thus the Parzen window width is defined through this value s. The result of
clustering depends to a large extent on the ratio ρ = s / p_0.
Concerning the strategy for selecting ρ in the case of a large p_0 value (p_0 = 512): if ρ is too
large, the estimated p.d.f. is too "smoothed" and tends toward a unimodal function. On the
other hand, if ρ is too small, the p.d.f. estimation is "noisy", describing "the superposition of
sharp pulses" (Duda and Hart, 1973), as we observe in Fig.1. Therefore the number of detected
clusters depends on the value of ρ: the smaller ρ is, the higher the number of detected clusters
will be. We suggest taking advantage of this relationship to select the final ρ value. Fig. 3
shows the number of clusters as a function of ρ. The curve is obtained with the same data as in
Fig.1, using the algorithm with 8 iterations (i.e. 8 p.d.f. thresholds) and ρ between 1% and
10%. The number of clusters remains constant and equal to 3 over a large plateau, showing
that the intrinsic number of clusters of the problem is equal to 3. Thus we propose selecting
the ρ value in the middle of this largest range (i.e. ρ = 7%). Note that a similar empirical
procedure has already been used in cluster analysis to select this kind of tuning parameter
(Zhang and Postaire, 1994).
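This plateau-based selection can be sketched as follows (a pure-Python sketch; the `counts` list, i.e. the number of clusters obtained for each ρ, is assumed to have been produced by running the clustering once per ρ value):

```python
def plateau_rho(rhos, counts):
    """Given cluster counts for an increasing list of rho values,
    return the midpoint of the longest constant run (the plateau)
    and the corresponding number of clusters."""
    best_start, best_len = 0, 1
    start = 0
    for i in range(1, len(counts)):
        if counts[i] != counts[start]:
            start = i                       # a new constant run begins
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    mid = best_start + (best_len - 1) // 2  # middle of the widest plateau
    return rhos[mid], counts[best_start]
```

When several plateaus of equal width exist, this sketch simply keeps the first one; the paper leaves that choice to the user.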
If p_0 has a small value (p_0 = 32), then the discretization steps are large with respect to the
domain size, and each tile of the domain is large. Each tile can integrate many samples of the
initial data set, and the estimated p.d.f. is already "smoothed" without any interpolation
procedure. Thus it can be difficult to observe a large range with a constant number of
detected clusters. Table 1 shows the number of detected clusters as a function of the ρ values,
using the data shown in Fig.1 and the algorithm with 16 iterations (i.e. 16 p.d.f. thresholds).
Note that the higher p_0 is, the larger the plateau is, and the percentage of labelling errors
decreases within this plateau. If such a plateau is not detected, a small value of ρ is required;
we propose ρ ≤ 1%.
The clustering depends on the number L of algorithm iterations. The number L of p.d.f.
threshold levels should be as high as a reasonable computing time allows. Eight iterations
(L=8) seem to be a good compromise; the clustering improvement obtained by increasing L
(using L=16) does not seem significant in the following examples.
III Applications
Our strategy has been first to test this clustering procedure with simulated data, and then to
propose applications in image processing.
A Example of overlap between clusters of unequal weights
The goal of the study is to compare the results of this clustering procedure with those of the
theoretical Bayes decision. Toward this end, three Gaussian distributions of two-dimensional
data are simulated. Three clusters of 900, 600, and 300 data are generated. The mean values of
the Gaussian distributions are equal to (0, 0), (40, 0), and (20, 30), and the standard deviations
have the same value of 10. Fig.4A shows these simulated data distributions with unequal
weights and an overlap between the clusters. The data space is discretized in 512*512
elements. The p.d.f. is estimated using a ρ value equal to 5 %. Fig.4B shows a representation
of this p.d.f. estimation. The clustering procedure is defined with 8 iterations (i.e. 8 p.d.f.
thresholds). For ρ values between 3% and 7%, the procedure produces three influence zones.
For these values, the data space is partitioned into three parts. As described above, the middle
of the largest plateau is selected (ρ = 5%). Fig.4A shows the three detected
parts with three grey levels. Through this clustering procedure, the error rate is equal to 5.4 %
(98 labelling errors). Using Bayes theoretical decision, the error rate should be equal to 3.7 %
(66 errors induced by the overlap). Note that the result of our clustering approach is not far
from that of the Bayes decision, yet our approach does not use any a priori knowledge about
the number of clusters or the probability distribution of the cluster data. Using the 16-iteration
algorithm, a slight improvement is observed; the error rate is reduced to 4.1 % (73 labelling
errors).
B Example of non-linearly separable clusters
The clustering procedure is particularly well suited to non-linearly separable clusters where
classical procedures fail. This example shows the results in such a case. Two non-linearly
separable clusters are simulated in dimension two with 300 and 400 data (Fig.5A). The data
space is discretized in 512*512 elements. Fig.5B shows the p.d.f. estimation with ρ = 6%. The
clustering procedure is used with 8 iterations. Fig.5A shows the result of the clustering. Note
that the procedure gives two clusters for ρ values between 2% and 10%. The error rate is
found to be equal to 4.7 % (33 labelling errors). Using our procedure with 16 iterations, the
error rate is equal to 4.3 % (30 labelling errors). Such clustering cannot be obtained with
classical procedures such as K-means using either the Euclidean or the Mahalanobis distance,
because the decision boundary between clusters is then either a straight line or a quadric
(Duda and Hart, 1973).
C An example in four-dimensions
Imaging instruments are nowadays able to record multiple images (multispectral images for
instance, or multiple maps of chemical species in X-ray microanalysis (Bonnet and Quintana,
1994), or Secondary Ion Mass Spectroscopy (Van Espen et al, 1992) ). The classification of
pixels according to their multiple components can sometimes be performed by supervised
procedures as in aerial imagery for instance, but this is not affordable for studies at a
microscopic level. There is thus a need to develop unsupervised classification procedures for
the analysis of these multidimensional images (Bonnet, 1995). As a preliminary investigation
of this problem, we simulated 128*128 images in four dimensions, supposed to represent four
elemental maps corrupted by Poisson noise. Figs.6A, 6B, 6C, and 6D show these four maps.
The data space is discretized into 32*32*32*32 elements. The p.d.f. is estimated in this 4-
dimensional space with a ρ value equal to 1 %. Neither this data space nor the estimated p.d.f.
can be easily displayed, which prevents any interactive partitioning method from being applied.
Using 8 p.d.f. thresholds, the procedure detects 10 clusters (Fig.6E), as expected. In this
multidimensional test, the proposed unsupervised procedure thus yields the expected result
with non-Gaussian distributions. Note that one expected cluster contains more than 50 % of
the data (the background in Fig.6E), while each of the 9 other clusters contains about 5 % of
the data. Despite the unequal weights of the clusters, the clustering procedure does not fail.
D Application in Color Image Processing
Each pixel of a color image has three values : red (R), green (G) and blue (B). These values are
an integer between 0 and 255 (i.e. 8 bit depth code). A classical problem in color image
processing is to determine how many "colors" are present in an image. This problem has no
simple answer because of the difficulties met when defining the term "color" in the image
processing field. The discrimination between two colors is not a trivial procedure (Herbin,
Venot, Devaux et al, 1990). In this paper, a datum is an (R, G, B) point in the color space. A
color image is considered as a set of 3-dimensional data to which we can apply the clustering
analysis. Note that we do not use the spatial distribution of the data inside the image; thus this
clustering approach is only considered as a pre-processing of color images. The goal is to
study the ability of this clustering algorithm to discriminate color phenomena. We propose
distinguishing 18 greyish color patches on a neutral grey background (Fig.7). White lines
separate the color patches. The classical transformation of the RGB space into the IHS space
(Intensity, Hue, Saturation) is not used in such a case, because the hue is very noisy or cannot
be computed for grey, white, and greyish colors (Ohta, Kanade and Sakai, 1980). Note
that the background represents more than half of the data. Thus many transformations, such as
the Karhunen-Loeve transformation, are not well suited to this data processing
because they increase the importance of small differences between background grey values
and decrease the importance of small differences between patch color values. Fig.7 shows the
three components of the color images: images A, B and C are respectively the Red, Green, and
Blue components. The 512*512 data of the color image are discretized in 64*64*64 elements.
The p.d.f. is estimated using 8 levels. The ρ value is taken as 0.5 % and 19 clusters are
detected with the clustering procedure. The cluster results are shown with 19 grey levels in
Fig.7D. Each color patch and the background receive a different label. These results can be
compared to those of the K-means procedure with the Euclidean distance (Fig.7E). Over many
random initializations, the K-means procedure searching for 19 clusters always fails, because
of the large number of background data compared to the small number of color patch data.
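The pre-processing of the color data can be sketched as follows (a minimal sketch assuming NumPy; the function name is ours, and p_0 = 64 matches the 64*64*64 discretization used above):

```python
import numpy as np

def rgb_to_tiles(image, p0=64):
    """Map 8-bit (R, G, B) pixels to tile counts in a p0^3 color grid."""
    data = image.reshape(-1, 3)              # q data points in dimension 3
    idx = (data.astype(int) * p0) // 256     # bin index 0..p0-1 per channel
    counts = np.zeros((p0, p0, p0), dtype=int)
    np.add.at(counts, tuple(idx.T), 1)       # accumulate one count per pixel
    return counts
```

The resulting 3-dimensional histogram is the input of the p.d.f. estimation; the spatial positions of the pixels are deliberately discarded, as in the paper.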
IV Discussion and Conclusion
The aim of this paper was to investigate a new approach for data clustering, with a specific
emphasis on applications in the field of image analysis. The new approach is intended to avoid
the a priori knowledge assumed by many classical clustering procedures concerning the shape
and/or the number of clusters present in the data set.
Concerning the shape of clusters : our approach is based on the estimate of the probability
density functions (p.d.f.) according to the Parzen window technique. Thus, the clusters can
have any shape, including irregular ones which are difficult to describe analytically.
Concerning the clustering process : the samples are gathered in different clusters according to
a bottom-up iterative thresholding of the p.d.f. The successively detected parts define a
hierarchy of regions, of which the last one is used for the clustering. Since neither the concept
of center of cluster nor the concept of class variance is used, the process remains independent
of any assumption about the shape of class distributions.
Concerning the number of clusters : the whole process is in fact repeated for different values
of one parameter involved in the estimate of the p.d.f.; this parameter is the standard deviation
in the case of a Gaussian interpolating function. Plotting the number of clusters obtained for
different values of this parameter reveals (as a plateau) the intrinsic number of clusters of the
data set. Of course, in some situations, several plateaus are observed, revealing the hierarchical
nature of the clustering (main classes can be subdivided into sub-classes). In this case, the user
has to choose between several clustering alternatives.
A detailed comparison between this approach and previously described algorithms is beyond
the scope of this paper and will be the subject of a forthcoming article. However, some
qualitative indications can be given here. We concentrate the discussion on methods which do
not hypothesize concerning the shape of clusters, because classical clustering algorithms such
as K-means, fuzzy C-means or ISODATA are disqualified in the general case of arbitrary
shaped clusters.
One suggested method which is relatively close to ours is the "dynamic procedure of splitting"
of Garcia et al (Garcia et al, 1995). The idea of this method is that two data belong to the same
cluster if one can go from the first one to the second one without having to jump a large step.
This idea can be restated as : the two data points belong to the same region of the p.d.f.
without any valley between the two points. There is thus a common background shared by the
two methods. However, the explicit use of the p.d.f. in our method allows us to handle
situations with overlapping distributions, which cannot be handled by the "dynamic procedure
of splitting".
Another method which does not make any assumptions relative to the shape of clusters is the
"mean shift" clustering (Cheng, 1995). It has been shown that this approach is equivalent to a
hill-climbing of the p.d.f. Thus, there is also a common basis with our approach. Detailed
comparisons, including numerical figures of merit in various situations, have to be performed
before a more definitive conclusion can be drawn. We have the impression that the
computation time is higher in the case of mean shift than with our algorithm.
Another method which deserves attention is the method of Postaire and his co-workers
(Touzani and Postaire, 1988; Zhang and Postaire, 1994). The idea is also to start with an
estimation of the p.d.f.. The main difference concerns the second step : clustering according to
the p.d.f. estimation. They proceed by searching the modes of the p.d.f. as convex parts of the
p.d.f.; the points belonging to the concave parts are then assigned to the different regions by
an equi-partition of the data space through the influence zones of the detected modes. Although once again
additional work is needed for comparison, we can make two comments about this approach.
First, in order to maintain the connectivity of the regions, the data space has to be discretized
into a small number of points. In these conditions, a precise partition of the data space cannot
be obtained. One important advantage of our bottom-up approach is that the connectivity of
regions is always preserved. Second, we can notice that a bias is introduced during the
partition of the data space when the modes approach is used in the case of classes with rather
different variances. This is illustrated in Fig.8. In Fig.8A, two modes of the p.d.f. are first
detected and then the remaining data space is split into two influence zones. One can notice
that too much space on the right of the boundary is allocated to the smallest class. In Fig.8B,
which corresponds to our approach, the p.d.f. is first thresholded; the iterative thresholding
ensures that a suitable threshold will be found. Then the remaining space is divided into two
parts. The misclassified samples should be less numerous in this case.
Although some work remains to be done to evaluate our method more deeply, we consider
that these preliminary results are encouraging and that the method could be of great interest
for solving many clustering problems arising in image processing applications, where the
clusters may have arbitrary shapes and where the classical clustering methods (K-means,
fuzzy C-means, ISODATA, ...) often fail.
In this preliminary version, the iterative algorithm uses a fixed number of predetermined
equally-spaced thresholds of the p.d.f. Future work is intended to develop a strategy for
automatically selecting non-equally spaced thresholds adapted to specific clustering problems.
Acknowledgements :
The authors thank the anonymous referees for their helpful suggestions and corrections of the
first version of the manuscript.
Figure Captions
Fig. 1 : Estimation of probability density function.
Example of p.d.f. estimation with Parzen window function and Gaussian interpolation.
Image A : Distributions of 1200 data in a two-dimensional space (512*512 square units)
Images B, C, and D: Estimates of p.d.f. with σ equal to 18, 26, and 51 units, respectively.
Fig. 2 : Iterative SKIZ
Iterative partitioning through influence zones: example of a four-level process in a two-
dimensional space. The grey grid limits the rectangular tiles in the search domain. The black
grid shows the network connecting the tiles through neighbourhood relations.
Image A : At level 0, only one connected region appears. The grey part overlaps the whole
domain.
Image B : At level 1, two grey parts appear; there are two connected regions.
Image C : The black line in bold separates the two influence zones associated with the
connected regions. It was computed as the skeleton by influence zones (SKIZ) of the
connected regions.
Image D : At level 2, three grey parts show three connected regions.
Image E : The part containing two connected regions is split. A new black line in bold
appears as the SKIZ between the new created regions. The second part remains the same.
Image F : At level 3, the three previous parts are maintained because they contain only zero
or one connected region.
Fig. 3 : Number of clusters
The number of detected clusters as a function of the tuning parameter ρ. ρ is the ratio
(expressed as a percentage) of the standard deviation of the Parzen window function to the
number p0 of discretization steps in the data space.
Fig. 4 : Clustering of three Gaussian distributions.
Image A : Three simulated Gaussian distributions of 900, 600, and 300 data. The grey
backgrounds show the zones of influence attributed to the three clusters.
Image B : The p.d.f. estimation used for this clustering.
Fig. 5 : Clustering of non linearly separable distributions.
Image A : Two distributions of 400 and 300 data. The grey backgrounds show the zones
of influence attributed to the two clusters (Note that the shape of the boundary is neither linear
nor quadric).
Image B : The p.d.f. estimation used for this clustering.
Fig. 6 : Clustering in a four-dimensional space.
Images A, B, C, and D : The four components of 128*128 images.
Image E : The 10 grey values show the 10 detected clusters. Some labelling errors could
be reduced by relaxation procedures.
Fig. 7 : Clustering of color image data.
Images A, B, and C : The three color components (Red, Green, and Blue respectively) of the
image (512*512 pixels).
Image D : The 19 grey values show the 19 detected clusters.
Image E : The 19 clusters detected using a K-means procedure with random initialization.
Fig. 8 : Partition of the space through p.d.f. analysis
Comparison of the mode analysis approach with our thresholding approach: example of a p.d.f.
with two modes in a one-dimensional space.
A : Detection of the p.d.f. modes by convexity analysis and partition of the space into two
clusters defined by influence zones. Misclassifications occur for the points on the right side of
the estimated boundary.
B : Thresholding of the p.d.f. and partition of the space into two clusters defined by influence
zones. Misclassifications are much less frequent than in A.
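The thresholding scheme of panel B can be sketched in one dimension as follows (a hypothetical illustration with names of our own choosing, assuming the p.d.f. is sampled on a regular grid):

```python
import numpy as np

def partition_1d(pdf, t):
    """Threshold a sampled 1-D p.d.f. at level t, label the
    connected intervals lying above the threshold, then assign
    every position on the axis to its nearest interval: each
    interval's influence zone, as in panel B of Fig. 8."""
    above = pdf > t
    labels = np.zeros(len(pdf), dtype=int)
    k = 0
    for i, a in enumerate(above):
        if a:
            if i == 0 or not above[i - 1]:
                k += 1                      # a new interval starts
            labels[i] = k
    zones = np.zeros(len(pdf), dtype=int)
    if k == 0:
        return zones, 0
    pos = np.arange(len(pdf))
    supports = [pos[labels == j] for j in range(1, k + 1)]
    for i in range(len(pdf)):
        dists = [np.abs(s - i).min() for s in supports]
        zones[i] = 1 + int(np.argmin(dists))
    return zones, k
```

Because the boundary falls midway between the thresholded intervals rather than at the inflection points used by convexity analysis, fewer points end up on the wrong side of it.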
Table 1 : Number of clusters and error rate.
Number of detected clusters as a function of the two parameters p0 and ρ (using a 16-iteration
procedure and the data shown in Fig. 1): p0 is the discretization parameter, defined as the
number of discretization steps in the data space; ρ is the tuning parameter for the p.d.f.
estimation, defined as the standard deviation of the Parzen window function expressed as a
percentage of the data space size. When the number of detected clusters equals 3, the
percentage of labelling errors is also computed.
References
Bonnet, N. and C. Quintana (1994), Multivariate Statistical Analysis applied to X-ray Spectra
and X-ray Mapping of Liver Cell Nuclei. Scanning Microscopy 8 (3), 563-586.
Bonnet, N. (1995), Preliminary Investigation of two Methods for the Automatic Handling of
Multivariate Maps in Microanalysis. Ultramicroscopy 57, 17-27.
Cheng, Y. and S. Wan (1992), Analysis of the Blurring Process. In: T. Petsche et al., Eds.,
Computational Learning Theory and Natural Learning Systems 3, MIT Press, London,
257-276.
Cheng, Y. (1995), Mean Shift, Mode Seeking, and Clustering. IEEE Trans. Pattern Anal.
Machine Intell. 17 (8), 790-799.
Devijver, P.A. and J. Kittler (1982), Pattern Recognition: a Statistical Approach. Prentice-Hall,
Englewood Cliffs, NJ.
Duda, R.O. and P.E. Hart (1973), Pattern Classification and Scene Analysis. Wiley, New
York.
Dunn, J.C. (1973), A Fuzzy Relative of the ISODATA Process and Its Use in Detecting
Compact Well-Separated Clusters. J. Cybernetics 3 (3), 32-57.
Fukunaga, K. (1972), Introduction to Statistical Pattern Recognition. Academic, New York.
Fukunaga, K. and L.D. Hostetler (1975), The Estimation of the Gradient of a Density
Function, with Applications in Pattern Recognition. IEEE Trans. Inf. Theory 21 (1), 32-40.
Garcia, J.A., J. Fdez-Valdivia, F.J. Cortijo, and R. Molina (1995), A Dynamic Approach for
Clustering Data. Signal Processing 44, 181-196.
Herbin, M., A. Venot, J.Y. Devaux, and C. Piette (1990), Color Quantitation Through Image
Processing in Dermatology. IEEE Trans. Medical Imaging 9 (3), 262-269.
Kittler, J. (1976), A Locally Sensitive Method for Cluster Analysis. Pattern Recognition 8, 23-
33.
Koontz, W.L. and K. Fukunaga (1972), A Nonparametric Valley-Seeking Technique for
Cluster Analysis. IEEE Trans. Comput. 21 (2), 171-178.
Ohta, Y., T. Kanade and T. Sakai (1980), Color Information for Region Segmentation.
Comput. Graphics Image Process. 13, 222-241.
Postaire, J.G. and S. Olejnik (1994), A Relaxation Scheme for Improving a Convexity Based
Clustering Method. Pattern Recognition Letters 15, 1211-1221.
Rose, K., E. Gurewitz, and G.C. Fox (1990), Statistical Mechanics and Phase Transitions in
Clustering. Physical Review Letters 65, 945-948.
Serra, J. (1982), Image Analysis and Mathematical Morphology. Academic Press, New York.
Sokal, R.R. and P.H. Sneath (1963), Principles of Numerical Taxonomy. W. H. Freeman, San
Francisco.
Touzani, A. and J.-G. Postaire (1988), Mode Detection by Relaxation. IEEE Trans. Pattern
Anal. Machine Intell. 10, 970-978.
Van Espen, P., G. Janssens, W. Vanhoolst, and P. Geladi (1992), Analusis 20, 81.
Zhang, R.D. and J.-G. Postaire (1994), Convexity Dependent Morphological Transformations
for Mode Detection in Cluster Analysis. Pattern Recognition 27 (1), 135-148.
FIG. 1 : Estimation of Probability Density Function (panels A, B, C, and D)
FIG. 2 : Iterative SKIZ (panels A, B, C, D, E, and F)
FIG. 3 : Number of clusters (plot of the number of clusters, 0 to 10, versus the parameter for p.d.f. estimation, 0% to 14%)
FIG. 4 : Clustering of three Gaussian Distributions (panels A and B)
FIG. 5 : Clustering of non-linearly separable distributions (panels A and B)
FIG. 6 : Clustering in a four-dimensional space (panels A, B, C, D, and E)
FIG. 7 : Clustering of color image data (panels A, B, C, D, and E)
FIG. 8 : Partition of the space through p.d.f. analysis
A - Analyzing the p.d.f. through convexity (annotations: first cluster, second cluster, convexity, boundary between clusters)
B - Analyzing the p.d.f. through thresholding (annotations: first cluster, second cluster, threshold of p.d.f., thresholded zones, boundary between clusters)