Color based image segmentation as edge preserving ﬁltering ...€¦ · Gauss Transform based mean shift [14] for Normal ker-2. In the recent papers, the original “mean shift”

1

Color based image segmentationas edge preserving filtering and grouping

✦

Abstract —In this paper we contend that color based image segmenta-tion can be performed in two stages: an edge preserving filtering stagefollowed by pixel grouping. Furthermore, for the first step, we introducea framework under which many current filtering approaches (and somenovel ones) can be classified. We present experiments where a newmethod, called Color Mean Shift, outperforms all the other methods inproducing more uniform regions and preserving the edge boundaries.Then, applying standard clustering methods for the grouping step, wepresent extended experimental results for the Berkeley segmentationdataset of the combined filtering+grouping segmentation methods. Weshow that this new approach to image segmentation produces bettersegmentations compared to current state-of-the-art methods based ongrouping only.

Index Terms —Image filtering, image segmentation, mean shift, bilateralfiltering.

1 INTRODUCTION

We consider the problem of image segmentation, basedonly on the intensity values of an image. Color basedsegmentation is a fundamental and well studied problemin computer vision and many algorithms exist in theliterature. Although many different methods have beensuggested, all of them perform some kind of grouping ofthe pixels in the image based on some pairwise similaritycriteria.This paper does not argue against this common view

of segmentation. On the contrary we believe that thegrouping of pixels is a necessary step of the segmenta-tion process. The main contention of the paper is thatan edge preserving filtering step1 should proceed thegrouping step. As a result, we perceive segmentationas a two-step process; a smoothing step followed by agrouping step. Intuitively, the smoothing step attempts tobring closer intensities of neighboring pixels that belongto the same segment, while preserving (or even enhanc-ing) the intensity difference across segment boundaries.The grouping step, on the other hand, makes the finaldecision whether two neighboring pixels belong to thesame segment or not. We also argue that both steps areequally important, even though current methods onlyconcentrate on one step of the process. As expected, theircombination affects the final result.

1. In the rest of the paper we use the terms filtering and smoothinginterchangably. In our view both terms indicate the process of smooth-ing the image while preserving the strong edges.

In the first part of the paper, we study a number ofsmoothing techniques; the original mean shift [1] andits modified version[2], [3]2, bilateral filtering [4],[5],local mode filtering [6] and anisotropic diffusion [7].We present all the above techniques as variations of ageneral optimization problem. Using such a formulationthe similarities and differences between them are madeclear. This framework also provides a natural way toclassify them using two criteria. Using the classificationcriteria we propose three novel methods. Two of them(color mean shift and spatial mean shift) are variations ofthe mean shift filtering and the third one is an extensionof bilateral filtering. Filtering experiments show thatcolor mean shift actually outperforms the other filteringmethods in smoothing the images, while preserving theedges.In the second part of the paper, we present segmen-

tation methods by combining the previously describedfiltering methods with four grouping methods. We per-form a number of experiments using the Berkeley [8]and Weizmann Institute [9] datasets and answer thefollowing critical questions; Is the filtering step importartfor the segmentation and does filtering in different colorspaces and using different kernel functions matter?

1.1 Related Work

Our work on filtering methods is motivated by themean shift algorithm so first we present related workon mean shift. Following the success of Comaniciu andMeer’s version of mean shift [3] the same basic algorithmfor non parametric clustering has been used for objecttracking [10], 3D reconstruction [11], image filtering [3],texture classification [12] and video segmentation [13]among other problems. The relatively high computa-tional cost of a naive implementation of the methodcombined with the need for fast image processing ledresearchers to propose fast approximate variations of it.Most notably, two solutions for finding pairs of pointswithin a radius have been proposed; the Improved FastGauss Transform based mean shift [14] for Normal ker-

2. In the recent papers, the original “mean shift” approach is called“blurring mean shift”. We use a different name for the mean shiftvariant used in computer vision, namely “mode finding”. So in therest of this chapter the term Mode Finding refers to Comaniciu andMeer’s version of mean shift and is abbrievated as CMMS.

2

nels and the Locality Sensitive Hashing based mean shift[12].Cheng [2] was the first to recognize the equivalence of

mean shift to a step-varying gradient ascent optimizationproblem, and much later Fashing and Tomashi [15]showed that it is equivalent to Newton’s method withpiecewise constant kernels, and is a quadratic boundmaximization for all other kernels. Yuan and Li [16]prove that mean shift is a half quadratic optimizationfor density mode detection when the profiles of thekernel functions are convex. Finally, Carreira-Perpinan[17] proves that it is equivalent to an EM algorithm whenthe kernel is the Normal function.Concerning the grouping methods, we use three al-

gorithms: the grouping method used by Comaniciu andMeer [3] (in 3D and 5D) and the method of Felzenszwalband Huttenlocher[19]. We have chosen the first twobecause they are directly related to the filtering methods.The fourth method while not directly related to filteringis still a fast, local method that is considered the state ofthe art in color based segmentation. A number of othergrouping (i.e., segmentation) algorithms exist in the liter-ature; energy minimization [20], spectral clustering [21],[22], algebraic multigrid [23] based methods to name afew. The reason why we did not include them in ourcomparison was because they were either a) slow and/orb) hard to parameterize and/or c) difficult to implement.Nevertheless, we believe that the grouping methods weused were sufficient to prove our points.

1.2 Paper organization

This paper is organized as follows. After a short sectiondescribing the notation and some necessary mathemati-cal prerequisites we proceed to describe the frameworkfor the filtering algorithms. Then, we present the criteriaused to classify the methods as well as the individualmethods themselves. We conclude the first part of thepaper with a number of experiments applying the meth-ods to different images. The second part, begins with theintroduction of the grouping methods we used and themeasures to quantify the segmentation quality. Then weproceed with the experiments. We conclude this paperwith the final conclusions and future work.

2 NOTATIONAL PRELIMINARIES

We represent the color image as a mapping S from the2D space of the pixel coordinates to the 3D space ofthe intensity values (for color images). xi is a 2D vectorrepresenting the spatial coordinates of pixel i(i = 1 . . .N)and S(xi) is a vector that represents the three color chan-nels. To simplify the notation we denote the intensitiesfor a pixel xi with a subscript, so S(xi) = Si. We alsodenote the set of all pixels as X and the whole imageS (X). The cardinality of X is N .In the following sections we use bold letters to represent

vectors and the notation [xi,Si]T to indicate a concatena-

tion of vectors. When we want to indicate the evolution

of a vector over time we use superscripts, e.g. [x0i ,S

0i ]

indicates the initial values of pixel xi having intensitySi.

2.1 Kernel Functions

Definition(Kernel Function):Let X be a d-dimensionalEuclidean space and x ∈ X. We denote with xi the ith

component of x. The L2 norm of x is a non-negative

number ||x|| such that ||x||2 =∑d

i=1 x2i . A function K :

X → R is a kernel if and only if there exists anotherfunction k : [0 · · ·+∞]→ R such that

K(x) = k(||x||2) (1)

and

1) k is non negative2) k is non increasing i.e.,

k(a) ≥ k(b), if a < b (2)

3) k is piecewise continuous and

ˆ +∞

0

k(a)da < +∞ (3)

Function k(x) is called the profile of the kernel K(x).

Often the kernel function is normalized i.e.,ˆ

X

K(x)dx = 1. (4)

Even though kernel functions are mostly used forkernel density estimation, we use them in order to defineoptimization problems that we subsequently solve usingstandard gradient descent methods. Thus, we are notonly interested in the kernel function K(x) but also

on its partial derivatives ∂K(x)∂x

. Next we define twokernel functions that we use; the Epanechnikov and theGaussian kernel.

2.1.1 Epanechnikov kernel

The Epanechnikov kernel [24] has the analytic form

KE(x) =

{

cE(1 − xTx) x

Tx ≤ 1

0 otherwise(5)

where cE =d + 2

2πd/2Γ(

d + 2

2) is the normalization constant.

Fig. 1(a) presents this kernel in the 1−D case. The partialderivative of KE(x) with respect to element xi of vectorx is

∂KE(x)

∂xi=

{

−2 · cE · xi −1 < xi < 1

0 |xi| > 1(6)

and is depicted in Fig. 1(b).

3

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2−1

−0.5

0

0.5

1

1.5

2

x

KE

1D Epanechnikov kernel

(a) 1-D Epanechnikov Kernel

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

xdK

E/d

x

Derivative of the Epanechnikov kernel

(b) Derivative of 1-D EpanechnikovKernel

Figure 1: 1−D Epanechnikov kernel.

−5 0 5−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

x

KN

1D Normal kernel

(a) 1-D normal kernel

−5 0 5−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

x

dKN

/dx

Derivative of the Normal kernel

(b) Derivative of 1-D normal kernel

Figure 2: 1−D Normal kernel.

2.1.2 Multivariate Normal (Gaussian) kernelThe multivariate Normal kernel with variance 1 has theanalytic form

KN(x) = (2π)−d2 exp(−

1

2x

Tx). (7)

In Fig. 2(a) a 1−D Normal kernel is displayed.The partial derivative of KE(x) with respect to ele-

ment xi of vector x is

∂KN (x)

∂xi= −xi · (2π)−

d2 exp(−

1

2x

Tx) = −xi ·KN (x) (8)

and is depicted in Fig. 2(b).The Normal kernel is often symmetrically truncated to

obtain a kernel with finite support.

3 EDGE PRESERVING FILTERING

3.1 A taxonomy of filtering methods

In Fig. 3 we present our scheme for classifying thevarious edge preserving filtering methods. The figurecan be split into two parts. On the left part optimiza-tion based methods are shown, while on the right partfiltering methods. The only difference between filteringand optimization methods is that the former methodsperform a single iteration of the corresponding optimiza-tion problem. The three new methods are spatial meanshift, color mean shift and joined bilateral filtering.

3.2 Classification criteria

Careful examination of the previous defined optimiza-tion problems reveal that there are only two differencesin their objective functions; the presence of [xi,Si] or[Si] as the optimization argument; and the comparisonagainst the points in the original image [x0

j ,S0j ] or the

points on the previous iteration [xj ,Sj ]. Finally twoof the methods (bilateral filtering and joined bilateralfiltering) are an one-iteration methods, while all the othermethods perform multiple iterations till convergence.Next we explain in details these differences.

3.2.1 arg min[xi,Si]

vs arg minSi

In the first case the optimization problem is defined overthe joint spatial and range domain (5−D), i.e. both theposition of the pixels as well as their intensities change ineach iteration. In the second case, where the optimizationis over the range domain (3−D), only the intensities ofthe pixels change while their position remain the same.This is not to be confused with the use of [xi,Si] inthe objective function. While the position of the pixelis always considered in the computation of the objectivefunction, that position might change or not (dependingon the method).

At this point we should also make clear that theoptimization is defined for the whole image, that is thevalues of all the pixels change. For the sake of simplicitywe do not make this explicit when we write down theoptimization equation.

3.2.2 [x0j ,S

0j ] vs [xj ,Sj ]

With a subscript we denote the value of the pixels at aspecific iteration, so [x0

j ,S0j ] is the value of pixel xj at

the very beginning, i.e. in the original image. The lackof a superscript denotes the current value of pixels, i.e.the value of the pixel at a previous iteration. Two pairsof algorithms (mean shift/mode finding and local modefiltering/anisotropic diffusion) only differ in whether wecompare the current value of a pixel against the originalimage or the image obtained in the previous iteration. Aswe will demonstrate in the experiments, the results varysignificantly because of that (also see [25] for a theoreticalanalysis and justification).

Furthermore, there are two valid hybrid combinationsthat have not been proposed before.

• [x0j ,Sj ] : In this case the comparison is performed

against the original position of the pixels and thepreviously computed range image.

• [xj ,S0j ] : In this case the position of the pixels in the

previous iteration is used along with their originalintensity values.

Apparently the previous cases only make a differencewhen the optimization is defined over the joint spa-tial/range domain. Otherwise the position of the pixelsnever changes, thus [xj ] ≡ [x0

j ].

4

Classification Scheme

Optimize over

Compare against Compare against

One−pass filter over

Bilateral

filtering

Mode

finding

Spatial

mean

shift

Color

mean

shift

Mean

shift

Local

mode

filtering

Anisotropic

diffusion

Joined

bilateral

filtering

[xi,Si][xi,Si] [Si][Si]

[x0j ,S

0j ][x0

j ,S0j ] [xj ,S

0j ] [x0

j ,Sj ] [xj ,Sj ][xj ,Sj ]

Figure 3: Classification of various filtering methods.

3.3 Filtering methods

In the following subsections we define a number ofimage filtering techniques as optimization problems. Inprevious formulations these methods were defined asthe result of applying an algorithm to an image. Usingour formulation we aim to achieve two goals; to simplifythe methods (since we only need a single equationto describe it) and to describe all the methods in auniform way. Note that some methods (i.e. mean shiftand mode finding) are defined for any kernel function,while others (i.e., bilateral filtering, local mode filteringand anisotropic diffusion) are only defined with respectto the Normal kernel KN(x).

3.3.1 Mean Shift (MS)

The original mean shift formulation [1] (applied to acolor image) treats the image as a set of 5 − D points(i.e., 2 dimensions for the spatial coordinates and 3dimensions for the color values). Each point is iterativelymoved proportionally to the weighted average of itsneighboring points. At the end, clusters of points areformed. We define mean shift to be the gradient descentsolution of the optimization problem

arg min[xi,Si]

−∑

i,j

K([xi,Si]− [xj ,Sj]), (9)

where∑

i,j

defines the summation over all pairs of pix-

els in the image. Note that this problem has a globalmaximum when all the pixels “collapse” into a singlepoint. We seek a local minimum instead. That’s why weinitialize the features [xi, si] with the original position

and color of the pixels of the image and perform gradientdescent iterations till we reach the local minimum.

3.3.2 Mode Finding or Comaniciu/Meer Mean Shift(CMMS)The modified mean shift formulation proposed by Co-maniciu and Meer [3] (henceforth called “mode finding”and denoted as CMMS) can also be expressed as agradient descent solution of the optimization problem

arg min[xi,Si]

−∑

i,j

K([xi,Si]− [x0j ,S

0j ]) (10)

There is a subtle difference between mode finding andmean shift, that significantly affects the performance. Inthe former formulation each current point is comparedagainst the original set of 5 − D points [x0

j ,S0j ], while

in the latter case the point is compared against the setof points from the previous iteration [xj ,Sj ]. In a recentpaper [25] S. Rao et al. study those two variations froman information theoretic perspective and conclude thatmean shift is not stable and hence should not be usedfor clustering.Fig. 4 presents the results of both methods in a

smoothly varying intensity image. Notice that the gra-dient of the kernel function is zero everywhere butin the boundaries. Thus, mode finding filtering onlychanges the intensity on the boundaries (that changeis not very visible in Fig. 4). Mean shift, on the otherhand, produces artificial segments of uniform intensity.Intuitively, each iteration of the process results in moreclustered data which in turn results in better clusteringresults for the next iteration. On the downside, a fastmean shift implementation is challenging due to the fact

5

that the feature points and the comparison points do notlie on a regular spatial grid anymore. Thus in a naiveimplementation one would have to compare the currentfeature [xi,Si] against all the remaining feature points.

3.3.3 Spatial Mean-Shift (SMS)One of our proposed methods that lies between meanshift and mode finding, spatial mean shift performsmean shift in the spatial dimensions and mode finding inthe color dimensions. SMS can be viewed as the gradientdescent solution of the optimization problem

arg min[xi,Si]

−∑

i,j

K([xi,Si]− [xj ,S0j ]). (11)

Spatial mean shift suffers from the same computationalproblems as mean shift, so it is mentioned here for thesake of completeness. We exclude the results of bothmean shift and spatial mean shift in our filtering andsegmentation experiments.

3.3.4 Color Mean-Shift (CMS)Color mean shift is our proposed method that alleviatesthe computational problem of mean shift by using theoriginal spatial location of the points for comparison,while it uses the updated intensity values of the previousiteration for improved clustering ability. In a sense, meanshift is performed on the color dimensions and modefinding on the spatial dimensions (that is the reason fornaming the method “color mean shift”). As above, CMScan be expressed as the gradient descent solution of theoptimization problem

arg min[xi,Si]

−∑

i,j

K([xi,Si]− [x0j ,Sj ]). (12)

3.3.5 Local Mode Filtering (LMF)Local mode filtering [6] was introduced as a method tofind the local mode in the range domain of each pixelof the image. A generalization of the spatial Gaussianfiltering to a spatial and range Gaussian filter is used toiterate to the local mode (on the 3−D color domain). Oneach iteration the intensity of each pixel is replaced by aweighted average of its neighbors. From an optimizationpoint of view the problem can be expressed as

arg minSi

−∑

i,j

KN([xi,Si]− [x0j ,S

0j ]). (13)

3.3.6 Bilateral Filtering (BF)In bilateral filtering [4],[5] the intensity of each pixel isreplaced by a weighted average of its neighbors. Theweight assigned to each neighbor decreases with boththe distance in the image plane (spatial domain) and thedistance on the intensity axes (range domain). Formallythe intensity at each pixel Si takes the value

Si =

∑

j SjKN ([xi,Si]− [x0j ,S

0j ])

∑

j KN ([xi,Si]− [x0j ,S

0j ])

. (14)

(a) Mode Finding (b) Spatial Mean Shift (c) Color Mean Shift

(d) Mean Shift (e) Local Mode Filtering(f) Anisotropic Diffusion

Figure 4: All the described algorithms applied on asmoothly varying image. All the filtering algorithmswere executed with spatial resolution hs = 21 and rangeresolution hr = 10 and used a Normal kernel.

Bilateral filtering can be considered as the first iteration oflocal mode filtering with a specific step size (Sec. 3.4).

3.3.7 Joined Bilateral filtering

In this variation of the bilateral filtering both the inten-sity and position of each pixel is replaced by a weightedaverage of its neighbors. Formally, the new coordinatesand color of each pixel are

[xi,Si] =

∑

j [xi,Si]KN ([xi,Si]− [x0j ,S

0j ])

∑

j KN([xi,Si]− [x0j ,S

0j ])

. (15)

Analogous to bilateral filtering this method can beconsidered as the first iteration of mode finding witha specific step size.

3.3.8 Anisotropic Diffusion (AD)

Anisotropic diffusion is a non-linear process introducedby Perona and Malik [7] for edge preserving smoothing.In the original formulation a diffusion process with amonotonically decreasing diffusion function of the imagegradient magnitude is used to smooth the image whilepreserving strong edges. Since then other functions havebeen proposed and the equivalence of this techniqueto robust statistics has been established [26]. In [6] theconnection with local mode filtering was also made.Here we provide an alternative view of the diffusionprocess as an optimization problem

arg minSi

−∑

i,j

KN([xi,Si]− [xj ,Sj ]). (16)

The difference between this method and local modefiltering is analogous to the difference between theoriginal mean shift and mode finding. Namely in localmode filtering the current point is compared againstthe original image pixels [x0

j ,S0j ], while in anisotropic

diffusion the comparison is against the intensity valueof the pixels in the previous iteration [xj ,Sj ].

6

Color Mean Shift (CMS)

Input:set of pixels x

0i with intensities S

0i

a function gOutput:feature vector [xi,Si]

Algorithm:initialize feature points [xi,Si]← [x0

i ,S0i ]

repeat until convergencefor all features [xi,Si]

[xi,Si]←P

j[xj ,Sj]g(||[xi,Si]−[x0

j ,Sj]||2)

P

jg(||[xi,Si]−[x0

j,Sj ]||2)

Mode Finding (CMMS)

Input:set of pixels x

0i with intensities S

0i

a function gOutput:feature vector [xi,Si]

Algorithm:initialize feature points [xi,Si]← [x0

i ,S0i ]

for all features [xi,Si]repeat until convergence

[xi,Si]←P

j[xj ,Sj]g(||[xi,Si]−[x0

j ,S0

j ]||2)P

j g(||[xi,Si]−[x0

j,S0

j]||2)

Figure 5: The algorithms that we use in the experiments. Note that g(x) = [x ≤ 1] (indicator function in Iversonnotation) for the Epanechnikov kernel and g(x) = exp(−x/2) for the Normal kernel. Local mode filtering isperformed in a similar way as mode finding and mean shift, anisotropic diffusion are performed in a similarway as color mean shift.

3.4 Optimization steps sizes

From the above optimization problems mean shift, spatialmean shift, color mean shift and anisotropic diffusion arejoint optimization problems i.e., the whole image needsto be optimized simultaneously. In mode finding andlocal mode filtering, on the other hand, each pixel canbe optimized independently from the rest of the image.Next we present two claims concerning the step size ofthese optimization problems.

Claim 1: Local mode filtering (and mode finding witha Gaussian kernel) can be considered as gradient de-scend methods for solving the corresponding optimiza-tion problem (Eqs. 13 and 10 respectively) with a stepsize at iteration t of

γti = −

1∑

j KN ([xi,Sti]− [xj ,S0

j ]). (17)

Claim 2: Mode finding with an Epanechnikov kernelcan be considered as a gradient descend method forsolving the corresponding optimization problem (Eq. 10)with a step size at iteration t of

γti = −

1

2cE

∑

j,||[xti,St

i]−[x0

j,S0

j]||<1 1

(18)

As a consequence the result after one iteration of thegradient descent is

[xt+1i ,St+1

i ] =

∑

j,||[xti,St

i]−[x0

j,S0

j]||<1[x

0j ,S

0j ]

∑

j,||[xti,St

i]−[x0

j,S0

j]||<1 1

. (19)

In the Appendix we provide the proof of the firstclaim along with a table (Table 4) that summarizes theoptimization step sizes for each method along with theresults after one iteration. Note that in the case of meanshift and anisotropic diffusion we are using the block

gradient descent method and optimize one pixel vectorat a time3.

4 FILTERING EXPERIMENTS

Following the example of Comaniciu and Meer [3], wenormalize the spatial and color coordinates of each pixelvector by dividing by the spatial (hs) and color (hc)resolution. Thus, the original feature vector [xi,Si] istransformed to [ xi

hs, Si

hr] (not included in the optimization

equations for simplicity reasons). Then, we perform theoptimization; one pixel at a time in the case of modefinding (Fig. 5, top right), or one iteration of the wholefeature set at a time in the mean shift and color meanshift cases (Fig. 5, top left). Fig. 6 displays the originalimages that we use for all the experiments in the rest ofthe section.

4.1 Epanechnikov vs Normal Kernel

First we present some filtering results when using dif-ferent kernels; namely the Epanechnikov and Normalkernel (Figs. 7,8). Each column of the figures depictsthe filtering result with a different algorithm; CMMS,LMF, CMS and AD stand for mode filtering, localmode filtering, color mean shift and anisotropic diffusionrespectively. In all cases the Normal kernel producessmoother results, while preserving edge discontinuities.As a matter of fact the color resolution hr is the one thatdefines the gradient magnitude above which there is anedge (to be preserved). So for the “hand” image, a colorrange of hr = 19 results in smoothing most of the textureon the background, while a value of hr = 10 retains mostthe texture (in RGB color space with a Normal kernel).In all the images mode finding and local mode filtering

produced very similar results. Furthermore color mean

3. We use the symbols xj , Sj to denote the current value of pixel pj .These might be the values of pixel pj at iteration t or t +1 dependingon whether pj is processed after or before pi.

7

(a) Hand (b) Workers

(c) Woman (d) Houses

Figure 6: The original images we use for the filteringexperiments. The first image is taken from Comaniciuand Meer’s mean shift segmentation paper, while theremaining are training images of the Berkeley segmen-tation database collection. Their sizes are 303× 243 and481× 321 pixels respectively.

shift and anisotropic diffusion gave similar results. Colormean shift seems to produce more crisp edges whileanisotropic diffusion smooths some of the edges. Over-all, color mean shift and anisotropic diffusion producemore uniform regions (e.g. suppresses the skin colorvariation on the “hand” image) and more crisp bound-aries between segments compared to mode finding andlocal mode filtering. The latter is particularly importantfor the segmentation step. We further investigate thisphenomenon in subsection 4.3.

For the remaining filtering experiments we use aNormal kernel.

4.2 RGB vs Luv Color Space

In Figs. 9, 10 we present the results when filtering inthe RGB and Luv color space. In general, filtering inLuv color space produces smoother images. This is dueto two facts. The euclidean distance between two Luvvalues is perceptually meaningful, i.e. it is proportionalto the distance of the colors as perceived by a humanobserver. This is not true in RGB, where very similarcolors might be located far away and the opposite.Furthermore the range of values for each component(L, u, v) is different (for example in our implementationL ∈ [0 . . . 100], u ∈ [−100 . . .180], v ∈ [−135 . . .110].),while each of the Red, Green and Blue components havevalues from 0 to 255.

In these experiments, mode finding and local modefiltering seem to produce almost identical images, whilecolor mean shift preserves the boundaries better than

(a) CMMS withEpanechnikovkernel

(b) LMF withEpanechnikovkernel

(c) CMS withEpanechnikovkernel

(d) AD withEpanechnikovkernel

(e) CMMS withNormal kernel

(f) LMF with Nor-mal kernel

(g) CMS withNormal kernel

(h) AD with Nor-mal kernel

(i) CMMS withEpanechnikovkernel

(j) LMF withEpanechnikovkernel

(k) CMS withEpanechnikovkernel

(l) AD withEpanechnikovkernel

(m) CMMS withNormal kernel

(n) LMF withNormal kernel

(o) CMS withNormal kernel

(p) AD with Nor-mal kernel

Figure 7: Epanechnikov vs Normal kernel experiment.We use hs = 5 (resulting in a window of 11× 11 pixels)and hr = 19. All the images are processed in RGB colorspace.

anisotropic diffusion. Both latter methods smooth theimage considerably more than the former ones.

4.3 Color uniformity of regions after filtering

Next we compare the ability of the filtering algorithmsto suppress texture and produce uniform regions. Oneissue is how to measure the color uniformity of regions.Here, we use the zero order (i.e., color histograms) andfirst order (i.e., gradient histograms) statistics to measurethe intensity variation in an filtered image. Then, wecompute the entropy of the two histograms. The entropydefinition4 measures how “random” an image is. Thus,an image created by sampling each pixel’s color valuefrom a uniform random distribution is expected to havea large entropy value, while a single uniform color imagehas an entropy of 0. In general lower entropy valuesindicate more uniform colored images, i.e. images withless number of segments of more uniform color.In Table 1 we display the entropy measures for each

method with the different kernels and color spaces (and

4. If X is a discrete random variable with possible val-ues {x1, . . . , xn} then the entropy is defined as H(X) =−

Pni=1 p(xi) logb p(xi), where b is the base of the logarithm (in our

case we use b = 2).

8

Table 1: Entropy measures for the color and gradient histograms for the four images after performing the filteringwith different methods and different kernels in the two color spaces. The first number is the entropy for the colorand the second for the gradient histogram. The lower the values the smaller the variation.

Hand Image Mode finding Local Mode filtering Color Mean Shift Anisotropic Diffusion

Epanechnikov, RGB 6.14, 12.97 6.14, 12.97 6.14, 12.97 6.14, 12.97Epanechnikov, Luv 7.02, 12.91 7.02, 12.91 7.42, 12.82 7.50, 12.83

Normal, RGB 7.15, 12.68 7.32, 12.59 8.91, 11.89 9.32, 11.94Normal, Luv 10.47, 10.85 11.20, 11.02 9.84,8.87 10.93, 9.16

Workers Image Mode finding Local Mode filtering Color Mean Shift Anisotropic Diffusion


Normal, RGB 12.46, 8.47 14.16, 8.48 10.82, 7.85 12.61, 8.14Normal, Luv 12.74, 7.05 14.31, 7.16 11.80, 6.17 13.16, 6.28

Woman Image Mode finding Local Mode filtering Color Mean Shift Anisotropic Diffusion



Houses Image Mode finding Local Mode filtering Color Mean Shift Anisotropic Diffusion



(a) CMMS withEpanechnikovkernel

(b) LMF withEpanechnikovkernel

(c) CMS withEpanechnikovkernel

(d) AD withEpanechnikovkernel

(e) CMMS withNormal kernel

(f) LMF with Nor-mal kernel

(g) CMS withNormal kernel

(h) AD with Nor-mal kernel

(i) CMMS withEpanechnikovkernel

(j) LMF withEpanechnikovkernel

(k) CMS withEpanechnikovkernel

(l) AD withEpanechnikovkernel

(m) CMMS withNormal kernel

(n) LMF withNormal kernel

(o) CMS withNormal kernel

(p) AD with Nor-mal kernel

Figure 8: Epanechnikov vs Normal kernel experiment.We use hs = 5 (resulting in a window of 11× 11 pixels)and hr = 19. All the images are processed in RGB colorspace.

(a) CMMS on RGBcolor space

(b) LMF on RGBcolor space

(c) CMS on RGBcolor space

(d) AD on RGBcolor space

(e) CMMS onLUV color space

(f) LMF on LUVcolor space

(g) CMS on LUVcolor space

(h) AD on LUVcolor space

(i) CMMS on RGBcolor space

(j) LMF on RGBcolor space

(k) CMS on RGBcolor space

(l) AD on RGBcolor space

(m) CMMS onLUV color space

(n) LMF on LUVcolor space

(o) CMS on LUVcolor space

(p) AD on LUVcolor space

Figure 9: RGB vs Luv color space experiments (1/2). Weuse hs = 5 (resulting in a window of 11 × 11 pixels)and hr = 5. All the images are processed with a Normalkernel.

9

(a) CMMS on RGBcolor space

(b) LMF on RGBcolor space

(c) CMS on RGBcolor space

(d) AD on RGBcolor space

(e) CMMS onLUV color space

(f) LMF on LUVcolor space

(g) CMS on LUVcolor space

(h) AD on LUVcolor space

(i) CMMS on RGBcolor space

(j) LMF on RGBcolor space

(k) CMS on RGBcolor space

(l) AD on RGBcolor space

(m) CMMS onLUV color space

(n) LMF on LUVcolor space

(o) CMS on LUVcolor space

(p) AD on LUVcolor space

Figure 10: RGB vs Luv color space experiments (2/2).We use hs = 5 (resulting in a window of 11× 11 pixels)and hr = 5. All the images are processed with a Normalkernel.

constant spatial and color resolutions hs = 5, hr = 5).From the results of Table 1 we observe that Color MeanShift with Normal kernel gives the smallest entropyvalues for all the images, hence producing the mostuniform regions. Anisotropic diffusion follows, whileMode finding and local mode filtering produce verysimilar results. A natural question to ask is whether theabove results are due to over smoothing. From the samplefiltering results presented above this does not seem tobe the case. The only way to verify that though is toperform the segmentation and then compare the resultsagainst human segmented images. In Sec. 8 we presentthese experiments. As we discuss there the segmentationresults for color mean shift are better than the ones forthe other filtering methods, thus we can safely concludethat color mean shift produces more uniform regions withoutover smoothing the original image.

4.4 Filtering speed comparison

An objective comparison of the filtering speed of thedifferent methods is not a simple task. Besides the im-plementation details that greatly affect the speed, thereis also a number of algorithmic parameters that cansignificantly speedup or slow down the convergence ofthe optimization procedure. We start our comparison byevaluating the role of these parameters and then wediscuss whether general speed up techniques that have

Figure 11: The filtering speed as a function of the imagesize (i.e., number of pixels) for all four methods. We usethe "workers" image (whose original size is 321 × 481pixels) and perform the filtering on the RGB color spacewith an Epanechnikov kernel with spatial and colorresolutions hs = 5, hr = 15 respectively. We also limit thenumber of iterations to 20 and the convergence thresholdis 0.001. We perform the filtering 5 times for each imagesize and only plot the median value.

been proposed in the literature can be applied to thedifferent methods or not. For fairness sake, we use ourown implementation of all the filtering methods thatconsists of Matlab files for the image handling and thegeneral input/output interface, while the optimizationcode is written in C. We perform all the experimentson a desktop computer with an Intel Core2 Quad CPU@3GHz5.

4.4.1 Image sizeThe number of pixels directly affect the filtering speed. Intheory, the complexity of the algorithm increases linearlywith the number of pixels, since each pixel represents afeature vector that needs to be processed. The theoreticalprediction is verified in practice as Fig. 11 shows.

4.4.2 Spatial resolution (hs)Theoretically, all the filtering methods (but Mean Shiftand Spatial Mean Shift) depend quadratically on thespatial bandwidth. In practice, other parameters, ex-plained below, make the dependence less than quadratic.Fig. 12 displays the filtering speed with respect to thespatial resolution for the methods, when all the otherparameters are the same.

4.4.3 Epanechnikov vs Normal kernelFor each pair of pixels, computation of the weight usingthe Epanechnikov kernel only requires a comparison,while the calculation of an exponential number is nec-essary for the case of the Normal kernel. As a resultthe former operation is much cheaper than the latter

5. Due to Matlab’s limitation only one core is used in the experi-ments.

10

Figure 12: The filtering speed as a function of the spatialresolution (hs) for all four methods. We use the "workers"image (321×481 pixels) and perform the filtering on theRGB color space with an Epanechnikov kernel (continu-ous line) or Normal kernel (dotted line). We also limit thenumber of iterations to 20 and stop the optimization forpixels that move less than 0.001 between two iterations.We perform the filtering 5 times for each value of hs andonly plot the median value.

and thus filtering with an Epanechnikov kernel is fastercompared to filtering with a Normal kernel as is shownin Fig. 12. Other researchers (e.g. [27]) have proposedthe use of lookup tables to approximately compute theexponents much faster.At this point we should note that the overall speed of

the segmentation process is also affected by the qualityof the result of the filtering process. We experimentallyfound, that using a normal kernel produced better resultsand as a consequence sped up the grouping step. Overallthe use of a Normal kernel still resulted in slowersegmentation times, but the time difference was not aslarge as Fig. 12 shows.

4.4.4 Convergence thresholdAs described above, on each iteration of the optimizationprocedure each pixel vector is compared against itsneighbors and shifted. If this shift is less than a pre-defined value (denoted convergence threshold) then weignore that pixel in subsequent iterations of the optimiza-tion procedure. Intuitively the convergence thresholddenotes how close to the “true” solution the optimizationshould reach before termination. At this point we wouldlike to emphasize that for the mode finding and thelocal mode filtering methods the shift of each pixelis a monotonically decreasing function of the iterationnumber, while for color mean shift and anisotropic dif-fusion it is not. Fig. 13 displays the filtering speed withrespect to the convergence threshold. As expected thehigher the threshold the faster the filtering. Especiallyfor thresholds less than 0.1 the filtering time decreasesalmost exponentially. According to this graph and all theprevious ones, local mode filtering is the fastest filteringoperation followed by anisotropic diffusion, and then

Figure 13: The filtering speed as a function of the con-vergence threshold for all four methods. We use the"workers" image (321 × 481 pixels) and perform thefiltering on the RGB color space with an Epanechnikovkernel with spatial and color resolution hs = 5, hr = 15respectively. We also limit the number of iterations to 50.We perform the filtering 5 times for each value of theconvergence threshold and only plot the median value.Notice that the X-axis is on logarithmic scale.

mode finding, while color mean shift is slightly slower.This is expected due to the extra number of calculationsneeded to estimate the 5D feature vector instead of the3D feature vector in the other methods.

4.4.5 Filtering speed optimizationIn the tests above, we use our own implementationof all the filtering methods, that is a straightforwardtranslation of Table 4 to Matlab and C code, to performthe speed experiments. A number of methods can beused to perform the filtering faster.In the core of all the filtering algorithms the pairwise

distance between feature points needs to be computedfor all pairs of points. As suggested in [3] employing datastructures and algorithms for multidimensional rangesearching can speed up the filtering. This technique canbe used in all the filtering methods and is expected tosignificantly improve the speed of slow methods such asmean shift and spatial mean shift.In mode finding the trajectory of most feature points

lay along the path of other feature points. Christoudiaset al. in [18] report a speed up of about five times relativeto the original algorithm when they “merge” the featurepoints together. This trick can directly be used in localmode filtering. A variation of the same concept couldalso be used to speed up the filtering in all the othermethods.The introduction of the multicore CPUs and, espe-

cially, GPUs has provided new way to improve theexecution speed of algorithms through a parallel im-plementation. From Table 4 and Fig. 5 it is clear thatthe filtering of each feature point can be performed inparallel. We expect that a careful implementation of anyof the four algorithms (i.e. mode finding, color mean

11

Table 2: Synopsis of the filtering results

• Normal kernel gives smoother images compared to Epanech-nikov kernel

• Luv color space produces smoother filtering results compared toRGB color space.

• Mode finding and local mode finding produce similar filteringresults. Mode finding performs slightly better filtering.

• Color mean shift and anisotropic diffusion produce similar fil-tering results. Color mean shift preserves the edges better thananisotropic diffusion.

• 3 − D filtering (i.e. local mode filtering) is almost equivalentto 5 − D filtering (i.e. mode finding) when the original imageis used for the comparison. When the image obtained in theprevious iteration is used then 5 − D filtering (i.e. color meanshift) preserves edges better than 3− D filtering (i.e. anisotropicdiffusion).

• Whether we use the original image for comparison or not affectsthe filtering more than whether we perform it in 3−D or 5−D.

• Local mode filtering is the fastest; mode finding and local modefiltering are a little bit slower; color mean shift is even slower.All the methods are fast enough to perform the filtering in realtime for a reasonably large image when implemented in GPUs.

shift, local mode filtering and anisotropic diffusion) ona modern GPU will run in real time for VGA or largerimages.

5 FILTERING CONCLUSIONS

So far, we presented a unifying framework under whichwe can express different filtering algorithms. Using thenew understanding of filtering, we developed three newedge preserving filtering methods, that we named ColorMean Shift, Spatial Mean Shift and Joined Bilateral Fil-tering. The first one exhibits similar clustering character-istics with the original Mean Shift method while beingalmost as computationally efficient as the Mode Findingmethod, so it was included in our filtering comparison.We performed a comparison of four different methods(Mode Finding, Color Mean Shift, Local Mode Filter-ing and Anisotropic diffusion) on a number of imageswith different configurations for the color space and thekernel function. Overall we noticed that Color MeanShift outperforms (i.e. creates more uniform segmentswith better boundary separation) than the other methodswith the drawback of being slightly slower. Table 2synopsizes the results of the experimental comparisonfor performing edge preserving filtering.

6 GROUPING METHODS

A variety of grouping methods exist in the literature forimage segmentation. As a matter of fact almost all thecolor based image segmentation methods are groupingmethods. Next, we describe the three methods that wehave chosen to use in the segmentation experiments.The first two methods are based on a simple connectedcomponents algorithm with a global threshold, whilethe last method is an extension of that algorithm. Allmethods are simple, namely they don’t require the useof complicated tuning parameters and they are usedwidely for image segmentation. Another advantage is

that they are fast so they can be used for (almost) realtime segmentation.

6.1 Greedy Connected Components grouping(CC3D and CC5D)

This is the same strategy that Comaniciu and Meerimplicitly use in their image segmentation algorithm [3].The method is a good starting point for our comparison;its simplicity allows us to compare the smoothing algo-rithms for the task of segmentation without worryingthat the result has been “changed” by the grouping al-gorithm. Thus, the quality of the segmentation is directlyrelated to the quality of the filtering.In a nutshell, the algorithm groups neighboring pixels

together if and only if their Euclidean distance is withina user defined threshold. Note that there is a 3 − Dand a 5 − D variant of this algorithm since pixel xi isrepresented by either a 3 − D vector (Si) or a 5 − Dvector ([xi,Si]) (Fig. 14). In our implementation we usean union-find data structure to perform the merging sothe complexity of the algorithm is almost linear on thenumber of pixels. A similar implementation was used inthe EDISON system [18].The biggest problem with this simple grouping

method is the “segment diffusion” problem, when twoquite different segments are merged together becausethere is a single weak (blurry) edge between them (e.g.the clouds and the sky are merged into a single segmentin the first images of the top row of Fig. 16). In orderto reduce the impact of this problem we reduce thegrouping threshold (t in Fig. 14, top row) to 0.5.

6.2 Grouping with an Adaptive Threshold (GAT)

Felzenszwalb and Huttenlocher in [19] present a vari-ation of the connected component algorithm where anadaptive threshold for merging segments is used. Eachsegment Ci keeps track of the maximum distance be-tween two pixels belonging to it6(denoted Int(Ci)) andtwo segments Ci, Cj are merged only if the mini-mum distance between the pixels belonging to theircommon boundary is smaller than the internal distanceInt(Ci), Int(Cj). The method is described in Fig. 14. Thisalgorithm is also linear on the number of pixels.

7 SEGMENTATION AS FILTERING PLUSGROUPING

The notion of segmentation consisting of a filteringfollowed by a grouping step is not new, but it is under-emphasized in the literature. Most image segmentation(i.e. grouping) algorithms operate on the original image,while the filtering algorithms are usually applied tothe problems of edge preserving smoothing or noiseremoval. Comaniciu and Meer [3] talk about “segmen-tation consisting of a filtering and a fusion step”, but

6. Only the edges belonging to the minimum spanning tree of thesegment are considered

12

Connected Components 3D (CC3D)

Input:set of pixels xi with intensities Si

a grouping threshold tOutput:a set of labels (label li for xi)

Algorithm:for all pixels xi

assign label lirepeat until convergence

for all pixels xi

for all pixels xj

if ||Si − Sj || < t and li 6= ljmerge the labels of xi and xj (li ≡ lj)

Connected Components 5D (CC5D)

Input:set of pixels xi with intensities Si

a grouping threshold tOutput:a set of labels (label li for xi)

Algorithm:for all pixels xi

assign label lirepeat until convergence

for all pixels xi

for all pixels xj

if ||[xi,Si]− [xj ,Sj ]|| < t and li 6= ljmerge the labels of xi and xj (li ≡ lj)

Grouping with an Adaptive Threshold (GAT)

Input:An image as a graph G = (V, E) with n vertices and m edges

Output:A segmentation of V into components S = (C1, ...Cr)

Algorithm:sort E into π = (o1, . . . , om) by non decreasing edge weightin the initial segmentation S0 each vertex vi is its own segmentfor q = 1, . . . , m construct Sq given Sq−1 as follows

let vi, vj be the vertices connected by the qth edge oq = (vi, vj)let pixels vi, vj belong to components Ci, Cj with|Ci|, |Cj | number of elements respectivelylet Int(Ci), Int(Cj) be the maximum edge weights of the minimum spanning tree of components Ci, Cj

let eq be the weight of edge oq

if vi, vj belong to different components Ci, Cj and eq < min{Int(Ci) + k|Ci|

, Int(Cj) + k|Cj|}

merge Ci, Cj

return S = Sm

Figure 14: The grouping algorithms that we use in the segmentation experiments.

they focus on the filtering step and they use the simpleconnected component algorithm of Fig. 14 top left, toobtain the final segments. Subsequent work from thesame group [18] focuses on how to bring edge informa-tion into the filtering and grouping step, but they stilluse a similar connected components algorithm. Close toour philosophy is the work of Unnikrisnan et al. [28]where they combine the filtering algorithm of [18] withthe grouping algorithm of [19]. Their focus, thought, isto introduce a new measure called Normalized Proba-bilistic Rand to compare the quality of segmentation.

One of the main points of this paper is that both stepsare important to obtain good segmentation results. InFigs. 15, 16, for example, we present the segmentationresults we obtained using different combinations of fil-tering and grouping methods. First, we use the samegrouping method, namely CC3D, along with the fourdifferent grouping algorithms. It is clear that dependingon the filtering method the sky is merged with the grass

(a) CMMS+CC3D (b) CMS+CC3D (c) LMF+CC3D (d) AD+CC3D

Figure 15: We present the segmentation results when weuse the same grouping method (CC3D) coupled withdifferent filtering methods. The filtering is performed onthe RGB color space with an Epanechnikov kernel withspatial and color resolution hs = 5, hr = 4 respectively.

or not. On the second figure the filtering method is keptconstant (color mean shift) while the grouping methodchanges. Here the results significantly depend on themethod, with the adaptive threshold method producingthe most intuitive segments. In the next section we exper-imentally study the problem of color based segmentation

13

(a) CMS+CC3D (b) CMS+CC5D (c) CMS+GAT

Figure 16: We present the segmentation results whenwe use the same filtering method (Color mean shift)followed by a different grouping method. The filteringis performed on the RGB color space with an Epanech-nikov kernel with spatial and color resolution hs =5, hr = 4 respectively.

by comparing different combinations of filtering andgrouping algorithms. More specifically we couple eachof the four filtering algorithms that we studied abovewith the three grouping algorithms that we introducedin the previous section to obtain a new segmentationmethod.

8 SEGMENTATION COMPARISON

There is little effort to classify image segmentation al-gorithms and compare their characteristics due to twomain factors. The multiplicity of methods each having anumber of parameters make the comparison extremelytedious. Moreover, the “right” segmentation is hard todefine, since there are many levels of detail in an imageand therefore multiple different meaningful segmenta-tions. S. Paris [29] for example, creates a hierarchicalstructure of segmentations where starting from a largenumber of segments, regions are merged together to cre-ate more coarse segmentations. Furthermore, in complexscenes the evaluation of a given segmentation mostlyrelies on subjective criteria. Borra and Shankar [30], forexample, go as far as suggesting that the proper seg-mentation is task and domain specific. The difficulty offormally defining the quality of a segmentation explainsthe lack of segmentation databases for natural images.The most complete attempt at comparing segmenta-

tion algorithms is presented on the Berkeley databaseand segmentation website [8]. A large set of imagesalong with human created segmentations are made avail-able for segmentation evaluation. This is the testbedwe use in this paper for the evaluation of the differentsegmentation methods7. More specifically we use the200 training images along with the 1087 human cre-ated segmentations. Next, we first describe the differentmeasures that we use for the comparison, and then wepresent the segmentation results.

8.1 Comparison measures

A number of measures have been proposed in the litera-ture in order to compare two different segmentations of

7. In Appendix ?? we also present segmentation results using theWeizmann Institute dataset [9].

the same image. In general the segmentation measurescan be classified in two categories; region based andboundary based. The first group includes measures, suchas the Global Consistency Error [8], the Variation ofInformation [31],[32] and the Probabilistic Rand index[33], that consider the overlap of the segments in the twosegmentations, while the second consists of measuresthat count the overlap or the distance of the boundaries,such as the Boundary Displacement Error [34]. We com-pared the segmentations using all the above measures,but we report results on the Probabilistic Rand index andthe Boundary Displacement Error only. This is due notonly on the lack of space, but mainly because the othermeasures were either not discriminative (Variation ofInformation) or misleading (Global Consistency Error).Boundary Displacement Error (BDE) This quantity

measures the average displacement error of the bound-ary pixels between two segmented images. Particularly,it defines the error of one boundary pixel in one segmen-tation as the distance between the pixel and the closestpixel in the other segmentation. BDE is not symmetric,thus we use it to measure the average distance of thehuman segmentation to the computer generated one.Intuitively, the lower the BDE value the more similarthe two segmentations are. A BDE measure of 0 indicatesthat all the boundaries of the human segmentation arecovered by the boundaries of the computer one, but notvice versa.Probabilistic Rand Index (PR) This measure counts

the fraction of pairs of pixels whose labellings are con-sistent between the computed segmentation and theground truth, averaging across multiple ground truthsegmentations to account for scale variation in humanperception. PR is a measure of similarity and as sucha value of 0 indicates no similarity, while a value of 1indicates the highest similarity.

8.2 Methodology

To produce the following segmentation figures we onlyvary the value of the color resolution hr of the fil-tering methods. More specifically, we let hr to obtainvalues from 0.6 to 20 on increments of 0.3. We keepthe remaining filtering parameters constant i.e., the max-imum number of iterations for convergence is set to20 and the convergence threshold to 0.1. We also usea spatial resolution of hs = 5, resulting on a 11 × 11smoothing window around each pixel. Furthermore, weutilize constant parameters for the grouping methods.More specifically the grouping threshold (parameter tof Fig. 5) is set to 1 and 0.5 for the CC5D and CC3Dgrouping algorithms respectively. We use the excellentC++ code provided by Felzenszwalb and Huttenlocher[19] with two different sets of parameters to implementthe grouping with the adaptive threshold (GAT). Onthe static setting we used σ = 0.5 and the k = 500as suggested in their paper. On the dynamic (varyingk setting) we change the value of k depending on the

14

value of hr. In the following experiments we use a linearrelation between hr and k8, namely

k = 45.83 ∗ hr + 142.5. (20)

We computed the comparison measures for each im-age of the database and further aggregated the resultsfor the whole database using the median value9. Thesevalues are plotted on the Y-axis of each figure. On theX-axis we plot the average segment size, instead of thecolor resolution hr. Thus all the plots below show theimplicit curve of one comparison measure with respectto the average segment size. The motivation behind thischoice is the following; a major goal of a segmentationalgorithm is to create as large segments as possible with-out merging areas belonging to different objects. Thusthe measures described above in conjunction with thesegment size only, can indicate whether a segmentationis good and useful. For the computation of the BoundaryDisplacement Error and the Probabilistic Rand Index weuse the code provided by J. Wright and A. Yang [35].

8.3 Filtering+Grouping vs Grouping

In the first set of experiments we compare the seg-mentation methods with and without filtering. We startwith the simple segmentation method of connectedcomponents (CC5D) in Fig. 17. Note that filtering theimage before performing the final grouping improvesthe segmentation results in both measures. In Fig. 18we present similar results when the GAT grouping isused. As mentioned in 8.2 there are two variations ofthe GAT; one with a constant parameter k = 500 and onewhere k changes according to Eq. 20. We observe that inboth cases the results when we performed the filteringand the grouping were better than when we performedthe grouping on the original images only. Especially inthe case of GAT with varying k there was a significantimprovement on both measures.

8.4 Epanechnikov vs Gaussian kernel and RGB vsLuv color space

On our previous work [36] we presented a comparisonbetween the two kernels; Epanechnikov and Gaussian,and the two color spaces; RGB and Luv. As it is shownin Fig. 19 significantly better results were obtained withthe Luv/Normal kernel combination.In Fig. 20 we extend the results for the case where

the GAT algorithm is used for grouping. Confirming ourprevious observation the best combination is also Luvcolor space and Normal kernel function. Furthermore,

8. Out of the infinite number of combinations for the pair (k, hr) wematch the average segment size obtained with CMS + CC5D withthe one obtained by GAT only to compute the coefficients. Thus, wecalculated the coefficients of the linear system by solving the systemof (k, hr) for values (170, 0.6) and (1050, 19.8).9. Since the comparison measures vary significantly for different

images we choose the median value as opposed to the mean valuebecause it is more robust to outliers.

for average segment sizes greater than 250 pixels theclear winner for the filtering algorithm is CMMS. Thisis contrary to the results for CC5D where CMS outper-formed CMMS in all cases. Thus, it is evident that toobtain the best segmentation results one needs to considerthe combination of filtering and grouping algorithms.

9 CONCLUSIONS

In this paper we presented our position that the problemof color based segmentation should be subdivided intoa filtering and a grouping component. We used theBerkeley segmentation dataset to validate our position.Furthermore, we created a number of new segmentationalgorithms by combining existing and new filtering andgrouping methods and we evaluated all the methodsextensively. Table 3 synopsizes the results of the ex-perimental comparison for performing edge preservingfiltering and color based segmentation.There are two main results that we want to emphasize

here. In all the experiments, processing the image withan edge preserving filter before using a grouping methodproduced significantly better results. Thus it is beneficialto consider the segmentation process to be a combination ofa filtering and a grouping step.Second, depending on the grouping method that is

used, a different filtering process produces best results.For grouping with a hard threshold (i.e. CC3D andCC5D methods) Color Mean Shift filtering worked best.When grouping with an adaptive threshold (i.e. GATmethod) Mode Finding proved to be the best method.As a conclusion, when considering the problem of colorbased segmentation, one should study the combination ofthe filtering and the grouping method to obtain the bestresults. Studying only one component in isolation is notsufficient.Our overall comparison showed that for the Berkeley

dataset the best method to use is a combination of ModeFinding with Grouping with Adaptive Threshold (withvariable k). Furthermore the results are better when thefiltering is performed in Luv color space with a Normalkernel.There are many interesting directions for future re-

search. In this paper we focused on the filtering morethan the grouping step. It would be interesting to per-form the comparison using a more wide range of group-ing methods, namely global energy minimization meth-ods (e.g. graph cut), eigenvector based methods (e.g.normalized cuts) and soft assignment methods based onalgebraic multigrid.

15

Figure 17: Comparison of CC5D grouping with and without filtering the images. The filtering in the plots wasperformed on the RGB color space with an Epanechnikov kernel. Similar results were obtained on the Luv colorspace and with the Gaussian kernel. Both BDE and PR measures show that filtering improves the quality ofsegmentation.

Figure 18: Comparison of GAT grouping with and without filtering the images. The filtering in the plots wasperformed on the Luv color space with a Gaussian kernel. Similar results were obtained on the RGB color space andwith the Epanechnikov kernel. Both BDE and PR measures show that filtering improves the quality of segmentation.

16

Figure 19: Comparison of segmentation methods when performing the filtering on different color spaces usingdifferent kernels. Two color spaces (RGB and Luv) and two kernel functions (Epanechnikov and Normal) werecompared. We used the two best filtering methods, namely CMMS and CMS and the CC5D grouping method.

Figure 20: Comparison of segmentation methods when performing the filtering on different color spaces usingdifferent kernels. Two color spaces (RGB and Luv) and two kernel functions (Epanechnikov and Normal) werecompared. We used the two best filtering methods, namely CMMS and CMS and the GAT grouping method.

17

APPENDIX

Claim 3: Local mode filtering (and mode finding witha Gaussian kernel) can be considered as gradient de-scend methods for solving the corresponding optimiza-tion problem (Eqs. 13 and 10 respectively) with a stepsize at iteration t of

γti = −

1∑


j ]). (21)

Proof: A proof for local mode filtering follows. Eachpixel pi is optimized separately. So if we replace the stepsize γi in the general gradient descent algorithm we get

St+1i = S

ti − γt

i∇∑

j

KN ([xi,Sti]− [xj ,S

0j ]) (22)

St+1i = S

ti − γt

i

∑

j

∇KN ([xi,Sti]− [xj ,S

0j ]) (23)

St+1i = S

ti − γt

i

∑

j

KN([xi,Sti]− [xj ,S

0j ])[S

0j − S

ti] (24)

St+1i = S

ti + (γt

i

∑

j KN([xi,Si]− [xj ,S0j ]))S

ti

−γti

∑

j KN ([xi,Sti]− [xj ,S

0j ])S

ti

(25)

St+1i =

∑

j KN([xi,Sti]− [xj ,S

0j ])S

0j

∑


j ])(26)

that is exactly the intensity values for pixel xi at the nextiteration t + 1.

To prove the claim for mode finding with a Gaussiankernel one only needs to replace the occurrence of

Table 3: Synopsis of the filtering results

• Segmentations obtained by grouping methods alone havemuch lower quality than the ones obtained using a com-bination of a filtering and a grouping method.

• All segmentation methods are very sensitive to imagevariations. The methods based on Grouping with anAdaptive Threshold (GAT) are the least sensitive to interimage variation. They also exhibit the least sensitivity tothe segmentation parameters (hr, k) when segmenting thesame image.

• Segmentation methods based on GAT grouping are notmonotonic.

• Segmentation methods based on GAT grouping outper-form , on average, all the other segmentation methods.

• Segmentation methods based on GAT grouping are themost stable to color resolution changes i.e., exhibit lessvariation of the average segment size.

• Segmentation methods based on CC3D and CC5D group-ing have very similar performance, with the CC3D onesproducing slightly better segmentation results.

• The graphs of the Probabilistic Rand Index (PR) andBoundary Displacement Error (BDE) measures are themost discriminative.

• Color Mean Shift (CMS) based segmentation methodsoutperform all the other filtering methods when they arecombined with CC3D or CC5D grouping methods.

• When using GAT grouping with varying parameter kMode Finding (CMMS) produces the best results.

• Filtering in Luv produces much larger segments thanfiltering in RGB for a given color resolution hr . Filteringwith a Normal kernel results in larger segments comparedto using a Epanechnikov kernel.

• The selection of the kernel function seems to be very im-portant for the segmentation results. More specifically, weobtained the best segmentation results when the filteringwas performed with a Normal kernel in the Luv colorspace. The second best configuration is a Normal kernelwith an RGB color space, while the results obtained withan Epanechnikov kernel in either RGB or Luv color spacesare much worse.

Sti, S

t+1i ,S0

j with [xti,S

ti], [xt+1

i ,St+1i ], [x0

j ,S0j ] respec-

tively, because the optimization is performed on the 5−Ddomain.

REFERENCES

[1] K. Fukunaga and L. Hostetler, “The estimation of the gradient ofa density function with applications in pattern recognition,” IEEETrans. Information Theory, vol. 21, pp. 32–40, 1975.

[2] Y. Cheng, “Mean shift, mode seeking, and clustering,” PAMI,vol. 17, pp. 790–799, 1995.

[3] D. Comaniciu and P. Meer, “Mean shift: A robust approachtoward feature space analysis,” IEEE Trans. on PAMI, pp. 603–619,2002.

[4] S. Smith and J. Brady, “Susan a new approach to low level imageprocessing,” IJCV, vol. 23, pp. 45–78, 1997.

[5] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and colorimages,” ICCV, pp. 839–846, 1998.

[6] J. van de Weijer and R. van den Boomgaard, “Local modefiltering,” CVPR, vol. 2, pp. 428–432, 2001.

[7] P. Perona and J. Malik, “Scale-space and edge detection usinganisotropic diffusion,” PAMI, vol. 12, no. 7, pp. 629–639, 1990.

[8] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of humansegmented natural images and its application to evaluating seg-mentation algorithms and measuring ecological statistics,” ICCV,vol. 2, pp. 416–423, 2001.

18

Method Step Size Single iteration result

Mode Finding with KE γti = −

1

2cE

P

j,||[xti,St

i]−[x0

j,S0

j]||<1 1

[xt+1i ,S

t+1i ] =

P

j,||[xti,St

i]−[x0

j,S0

j]||<1[x

0j ,S

0j ]

P

j,||[xti,St

i]−[x0

j,S0

j]||<1 1

Mode Finding with KN γti = −

1P

jKN ([xt

i,Sti] − [x0

j , S0j ])

[xt+1i ,S

t+1i ] =

P

jKN ([xt

i,Sti] − [x0

j ,S0j ])[x

0j ,S

0j ]

P

jKN ([xt

i,Sti] − [x0

j ,S0j ])

Mean Shift with KE γti = −

1

4cE

P

j,||[xti,St

i]−[xj,S

j]||<1 1

[xt+1i ,S

t+1i ] =

P

j,||[xti,St

i]−[xj,Sj ]||<1[xj ,Sj ]

P

j,||[xti,St

i]−[xj,Sj ]||<1 1

Mean Shift with KN γti = −

1

2P

jKN([xt

i,Sti] − [xj ,Sj ])

[xt+1i ,S

t+1i ] =

P

jKN ([xt

i,Sti] − [xj ,Sj ])[xj ,Sj ]

P

jKN ([xt


Spatial Mean Shift with KE γti = −

1

2cE

P

j,||[xti,St

i]−[xj ,S0

j]||<1 1

[xt+1i ,S

t+1i ] =

P

j,||[xti,St

i]−[xj,S0

j]||<1[xj , S

0j ]

P

j,||[xti,St

i]−[xj ,S0

j]||<1 1

Spatial Mean Shift with KN γti = −

1P

j KN ([xti,S

ti] − [xj , S

0j ])

[xt+1i ,S

t+1i ] =

P

jKN ([xt

i,Sti] − [xj ,S

0j ])[xj ,S

0j ]

P

j KN ([xti,S

ti] − [xj ,S

0j ])

Color Mean Shift with KE γti = −

1

2cE

P

j,||[xti,St

i]−[x0

j,S

j]||<1 1

[xt+1i ,S

t+1i ] =

P

j,||[xti,St

i]−[x0

j,Sj ]||<1[x

0j , Sj ]

P

j,||[xti,St

i]−[x0

j,S

j]||<1 1

Color Mean Shift with KN γti = −

1P

jKN ([xt

i,Sti] − [x0

j ,Sj ])[xt+1

i ,St+1i ] =

P

jKN ([xt

i,Sti] − [x0

j ,Sj ])[x0j ,Sj ]

P

jKN ([xt

i,Sti] − [x0

j ,Sj ])

Local Mode Filtering with KN γti = −

1P

jKN ([xt

i,Sti] − [x0

j , S0j ])

St+1i =

P

jKN ([xt

i,Sti] − [x0

j ,S0j ])S

0j

P

jKN ([xt

i, Sti] − [x0

j ,S0j ])

Anisotropic Diffusion with KN γti = −

1

2P

jKN([xt


St+1i =

P

jKN ([xt

i,Sti] − [xj ,Sj ])Sj

P

jKN ([xt

i, Sti] − [xj ,Sj ])

Table 4: Step sizes and iteration results for the different filtering methods with different kernels.

[9] S. Alpert, M. Galun, R. Basri, and A. Brandt, “Image segmentationby probabilistic bottom-up aggregation and cue integration,”CVPR, pp. 1–8, 2007.

[10] D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based objecttracking,” PAMI, vol. 25, pp. 564– 577, 2003.

[11] Y. Wei and L. Quan, “Region-based progressive stereo matching,”CVPR, pp. 106–113, 2004.

[12] B. Georgescu, I. Shimshoni, and P. Meer, “Mean shift basedclustering in high dimensions: A texture classification example,”ICCV, pp. 456–463, 2003.

[13] D. DeMenthon and R. Megret, “Spatio-temporal segmentation ofvideo by hierarchical mean shift analysis,” tech. rep., 2002.

[14] C. Yang, R. Duraiswami, N. Gumerov, and L. Davis, “Improvedfast gauss transform and efficient kernel density estimation,”ICCV, pp. 464–471, 2003.

[15] M. Fashing and C. Tomasi, “Mean shift is a bound optimization,”PAMI, vol. 27, pp. 471–474, 2005.

[16] X. Yuan and S. Z. Li, “Half quadratic analysis for mean shift: withextension to a sequential data mode-seeking method,” ComputerVision, IEEE International Conference on, vol. 0, pp. 1–8, 2007.

[17] M. Carreira-Perpinan, “Gaussian mean-shift is an em algorithm,”IEEE Trans. PAMI, vol. 29, no. 5, pp. 767–776, 2007.

[18] C. Christoudias, B. Georgescu, and P. Meer, “Synergism in low-level vision,” ICPR, vol. 4, pp. 150–155, August 2002.

[19] P. Felzenszwalb and D. Huttenlocher, “Efficient graph-based im-age segmentation,” IJCV, vol. 59, no. 2, pp. 167–181, 2004.

[20] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energyminimization via graph cuts,” IEEE Trans. on PAMI, vol. 23, no. 11,pp. 1222–1239, 2001.

[21] J. Shi and J. Malik, “Normalized cuts and image segmentation,”IEEE PAMI, vol. 22, no. 8, pp. 888–905, 2000.

[22] Y. Weiss, “Segmentation using eigenvectors: a unifying view.,”ICCV, pp. 975–982, 1999.

[23] E. Sharon, A. Brandt, and R. Basri, “Fast multiscale image seg-mentation,” CVPR, pp. 70–77, 2000.

[24] V. Epanechnikov, “Nonparametric estimation of a multivariateprobability density,” Theory Prob. Appl. (USSR), vol. 14, pp. 153–158, 1969.

[25] S. Rao, A. Martins, and J. Principe, “Mean shift: An informationtheoretic perspective,” Pattern Recognition Letters, vol. 30, no. 3,pp. 222 – 230, 2009.

[26] M. Black, G. Sapiro, D. Marimont, and D. Heeger, “Robustanisotropic diffusion,” IEEE Transactions on Image Processing,vol. 7, no. 3, pp. 421–432, 1998.

[27] D. Defour, F. D. Dinechin, and J. Muller, “A new scheme for table-based evaluation of functions,” Tech. Rep. ISSN 0249-6399, InstitutNational de Recherche en Informatique et en Automatique, 2002.

[28] R. Unnikrishnan, C. Pantofaru, and M. Hebert, “Toward objectiveevaluation of image segmentation algorithms,” PAMI, vol. 29,no. 6, pp. 929–944, 2007.

[29] S. Paris and F. Durand, “A topological approach to hierarchicalsegmentation using mean shift.,” CVPR, 2007.

[30] S. Borra and S. Sarkar, “A framework for performance character-ization of intermediate level grouping modules,” PAMI, vol. 19,no. 11, pp. 1306–1312, 1997.

[31] M. Meila, “Comparing clusterings by the variation of informa-tion,” Conference Learning Theory, 2003.

[32] M. Meila, “Comparing clusterings: an axiomatic view,” ICML,pp. 577 – 584, 2005.

[33] R. Unnikrishnan, C. Pantofaru, and M. Hebert, “A measure forobjective evaluation of image segmentation algorithms,”Workshopon Empirical Evaluation Methods in Computer Vision, CVPR, 2005.

[34] J. Freixenet, X. Munoz, D. Raba, J. Marti, and X. Cuff, “Yet anothersurvey on image segmentation: Region and boundary informationintegration,” ECCV, pp. 408–422, 2002.

[35] A. Y. Yang, J. Wright, Y. Ma, and S. Sastry, “Unsupervised seg-

19

mentation of natural images via lossy data compression,” Comput.Vis. Image Underst., vol. 110, no. 2, pp. 212–225, 2008.

[36] K. Bitsakos, C. Fermüller, and Y. Aloimonos, “An experimentalstudy of color-based segmentation algorithms based on the mean-shift concept,” ECCV, vol. 2, pp. 506–519, 2010.

Documents

Color based image segmentation as edge preserving ﬁltering ...€¦ · Gauss Transform based mean shift [14] for Normal ker-2. In the recent papers, the original “mean shift”