

www.elsevier.com/locate/patrec

Pattern Recognition Letters 28 (2007) 523–533

Unsupervised multiscale segmentation of color images

Cláudio Rosito Jung *

UNISINOS – Universidade do Vale do Rio dos Sinos, PIPCA – Graduate School of Applied Computing, Av. UNISINOS, 950,

São Leopoldo 93022-000, RS, Brazil

Received 30 September 2005; received in revised form 17 July 2006; available online 27 November 2006

Communicated by Y.J. Zhang

Abstract

This paper proposes a new multiresolution technique for color image representation and segmentation, particularly suited for noisy images. A decimated wavelet transform is initially applied to each color channel of the image, and a multiresolution representation is built up to a selected scale 2^J. Color gradient magnitudes are computed at the coarsest scale 2^J, and an adaptive threshold is used to remove spurious responses. An initial segmentation is then computed by applying the watershed transform to the thresholded magnitudes, and this initial segmentation is projected to finer resolutions using inverse wavelet transforms and contour refinements, until the full resolution 2^0 is achieved. Finally, a region merging technique is applied to combine adjacent regions with similar colors. Experimental results show that the proposed technique produces results comparable to other state-of-the-art algorithms for natural images, and performs better for noisy images.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Segmentation; Watersheds; Wavelets; Multiresolution; Color images; Region merging

1. Introduction

Image segmentation consists of partitioning an image into isolated regions, such that each region shares common properties and represents a different object. Such a task is typically the first step in more advanced vision systems, in which object representation and recognition are needed. Although isolating different objects in a scene may be easy for humans, it is still surprisingly difficult for computers. An additional problem arises when dealing with color images, due to the variety of representations (color spaces) that can be used to characterize color similarity.

Several authors have tackled the problem of color image segmentation, using a variety of approaches, such as active contours, clustering, wavelets and watersheds, among others. Some of these techniques are reviewed next.

0167-8655/$ - see front matter © 2006 Elsevier B.V. All rights reserved.

doi:10.1016/j.patrec.2006.10.001

* Tel.: +55 51 3591 1122x1626; fax: +55 51 3590 8162. E-mail address: [email protected]

Sapiro (1997) proposed a framework for object segmentation in vector-valued images, called color snakes. As in the original snakes formulation for monochromatic images, color snakes present the nice property of smooth contours, but require a manual initialization and may face convergence problems.

Comaniciu and Meer (2002) proposed a unified approach for color image denoising and segmentation based on the mean shift. A kernel in the joint spatial-range domain is used to filter image pixels in the CIELUV color space, and filtered pixels are clustered to obtain segmented objects. Although this technique presents good results, it requires a manual selection of spatial (h_s) and color (h_r) bandwidths, and optionally a minimum area parameter (M) for region merging.

Liapis et al. (2004) proposed a wavelet-based algorithm for image segmentation based on color and texture properties. A multichannel scale/orientation decomposition using wavelet frame analysis is performed for texture feature selection, and histograms in the CIELAB color space are


used for color feature extraction. Two labelling algorithms are proposed to obtain the final segmentation results based on either or both features. This technique also achieves nice segmentation results for natural complex images, but the number of different color–texture classes must be selected by the user, which may not be easy to define in practical applications. Ma and Manjunath (2000) and Deng and Manjunath (2001) also proposed approaches for image segmentation based on color and texture. In (Ma and Manjunath, 2000), a predictive coding model was created to identify the direction of change in color and texture at each image location at a given scale, and object boundaries are detected where propagated "edge flows" meet. In JSEG (Deng and Manjunath, 2001), a color quantization scheme is initially applied to simplify the image. Then, local windows are used to compute J-images, which return high values near object boundaries and low values in their interior. Finally, a multiscale region growing procedure is applied to obtain the final segmentation. It is important to notice that JSEG is intended to be an unsupervised segmentation method, meaning that it is free of user-defined parameters.

Nock and Nielsen (2003, 2004, 2005) proposed fast segmentation techniques based on statistical properties of color images. These approaches take into account expected homogeneity and separability properties of image objects to obtain the final segmentation through region merging. In particular, the techniques described in (Nock and Nielsen, 2003, 2004) are unsupervised and well suited for noisy images, while the method presented in (Nock and Nielsen, 2005) requires some user assistance.

Other authors have combined watershed segmentation with multiresolution image representations. Scheunders and Sijbers (2002) used a non-decimated wavelet transform to build a multiscale color edge map, which was filtered by a multivalued anisotropic diffusion. The watershed transform is then applied at each scale, and a hierarchical region merging procedure is applied to connect segmented regions at different scales. Despite the denoising power of both the wavelet transform and anisotropic diffusion, the experimental results presented in their paper indicate a considerably large number of segmented regions. Vanhamel et al. (2003) also explored multiscale image representations and watersheds for color image segmentation. In their approach, the scale-space is based on a vector-valued diffusion scheme, and color gradients in the YUV color space are computed at each scale. After applying the watershed transform, the dynamics of contours in scale-space are used to validate detected contours. Results presented in the paper are visually pleasant, but noisy images were not tested. Kazanov (2004) proposed a multiscale watershed-based approach for detecting both small and large objects, focused on scanned pages of color magazines. For detecting small objects, a small-support edge detector is used. For larger objects, a multiscale version of the gradient estimator is computed. One potential problem of this technique is its high sensitivity to noise/texture, due to the use of small-support edge detectors.

Although this bibliographical review was mostly focused on techniques involving wavelets, watersheds or multiresolution analysis, there are several other recent competitive approaches for color image segmentation, such as Makrogiamis et al. (2003), Chen et al. (2004), Nikolaev and Nikolayev (2004), Marfil et al. (2004), and Navon et al. (2005). Also, it can be noticed that most authors have not considered the problem of noisy color image segmentation, especially for large amounts of noise contamination.

This work extends the procedure proposed in (Jung, 2003) for multiscale segmentation of color images with several improvements, such as the formulation of a statistical model for gradient magnitudes of color images using joint information of color channels, the automatic thresholding of color gradient magnitudes based on a posteriori probabilities, and the inclusion of a similarity metric in the CIELAB color space for region merging, based on perceived contrast between colors. It should be noticed that the approach in (Jung, 2003) can only be applied to monochromatic images.

In the proposed approach, a decimated wavelet transform (WT) is initially applied to each color channel, producing a multiresolution image representation up to a selected scale 2^J. A color gradient magnitude image is computed at the coarsest resolution 2^J, and an adaptive threshold is used to remove spurious responses. The watershed transform is applied to the thresholded magnitudes, obtaining an initial segmentation. The inverse wavelet transform (IWT) is then used to project this initial segmentation to finer scales, until the full resolution image is achieved. Finally, a region merging procedure based on CIELAB color distances is applied to obtain the final segmentation. As will be discussed throughout this manuscript, the denoising power of the WT combined with the probabilistic edge estimator effectively reduces the well-known oversegmentation problem of the watershed transform, even for images with significant noise contamination. Also, the same set of default parameters produces good results for most images, indicating that the proposed technique can be used for unsupervised image segmentation.

The remainder of this paper is organized as follows. Section 2 provides a very brief review of wavelets and watersheds. The proposed method is described in Section 3, and experimental results are provided in Section 4. Finally, conclusions are drawn in the final section.

2. The wavelet transform and the watershed transform

2.1. Decimated wavelet transforms

In a few words, the WT of an intensity image up to the scale 2^J is a set of detail subimages W^h_{2^j}, W^v_{2^j}, W^d_{2^j}, for j = 1, ..., J, and a smoothed and downsampled version A_{2^J} of the original image, called in this work the approximation image. According to Mallat's pyramid algorithm (Mallat, 1989), such subimages can be obtained by a combination of convolutions with low-pass and high-pass filters associated with a mother wavelet, followed by downsamplings. It is important to notice that the original image can be reconstructed using the images {W^h_{2^j}, W^v_{2^j}, W^d_{2^j}}_{j=1,2,...,J} and A_{2^J}. For color images, such a multiscale representation can be obtained for each color channel (the RGB color space was used in this work).

Although there is a great variety of mother wavelets to choose from, the well-known Haar wavelet (Strang and Nguyen, 1996) was selected in this work, mainly due to its small support (hence providing good localization in space). Also, the wavelet decomposition with the Haar wavelet has small computational complexity (linear with respect to the size of the input image). The importance of using a small-support wavelet basis will become more evident in Section 3.2.
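To make the decomposition concrete, here is a minimal numpy sketch of a one-level 2-D Haar analysis/synthesis pair (our own illustration, not the paper's code; the averaging normalization and the h/v/d orientation naming are our conventions):

```python
import numpy as np

def haar2d(a):
    """One level of a 2-D Haar decomposition (averaging convention).

    Returns the approximation A and horizontal, vertical and diagonal
    detail subimages, each half the size of the input (even dims assumed).
    """
    a = a.astype(float)
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0   # column low-pass
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0   # column high-pass
    A  = (lo[0::2] + lo[1::2]) / 2.0       # row low-pass of lo
    Wv = (lo[0::2] - lo[1::2]) / 2.0
    Wh = (hi[0::2] + hi[1::2]) / 2.0
    Wd = (hi[0::2] - hi[1::2]) / 2.0
    return A, Wh, Wv, Wd

def ihaar2d(A, Wh, Wv, Wd):
    """Exact inverse of haar2d: undo the row step, then the column step."""
    lo = np.empty((A.shape[0] * 2, A.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2], lo[1::2] = A + Wv, A - Wv
    hi[0::2], hi[1::2] = Wh + Wd, Wh - Wd
    out = np.empty((lo.shape[0], lo.shape[1] * 2))
    out[:, 0::2], out[:, 1::2] = lo + hi, lo - hi
    return out
```

For a color image the same pair is simply applied to each channel; iterating haar2d on A yields the representation up to scale 2^J.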

2.2. The watershed transform

A powerful tool for image processing based on mathematical morphology is the watershed transform (Meyer and Beucher, 1990). In particular, watersheds are useful to obtain image partitions, and they are attractive solutions for image segmentation when applied to edge maps (grayscale or color). Such edge maps may be regarded as topographical reliefs that are flooded starting at local minima. When water coming from different regions meets, a "dam" is built. At the end of the flooding, the watershed transform produces a set of connected regions (segmented objects) separated by dams (object contours).

A well-known problem associated with watersheds is oversegmentation. Images typically contain texture and/or noise, which produce spurious gradients (and also local minima), generating a large number of segmented objects. There are several approaches to reduce the oversegmentation problem of watersheds, such as markers (Vincent and Soille, 1991), statistical methods (Gies and Bernard, 2004) and fuzzy techniques (Patino, 2005), just to name a few. In this work, oversegmentation is reduced by removing small gradient magnitudes according to an adaptive threshold.
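A small synthetic experiment (ours, not from the paper) illustrates why such thresholding helps: counting the strict local minima of a gradient map approximates the number of catchment basins a flooding would produce, and zeroing small magnitudes collapses the noise floor into a single plateau:

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic "gradient magnitude" map: one strong edge plus noise.
n = 64
g = np.zeros((n, n))
g[:, n // 2] = 10.0
g_noisy = g + 0.5 * np.abs(rng.standard_normal((n, n)))

def count_strict_minima(m):
    """Count interior pixels strictly smaller than all 8 neighbors.
    Each such minimum seeds one catchment basin during flooding."""
    c = m[1:-1, 1:-1]
    is_min = np.ones(c.shape, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            is_min &= c < m[1 + dy:m.shape[0] - 1 + dy,
                            1 + dx:m.shape[1] - 1 + dx]
    return int(is_min.sum())

before = count_strict_minima(g_noisy)
# Zeroing small magnitudes merges the noise floor into one flat plateau,
# leaving far fewer strict minima to flood from.
after = count_strict_minima(np.where(g_noisy < 2.0, 0.0, g_noisy))
```

On the noisy map the minima count is in the hundreds; after the (here, fixed) threshold it drops drastically, mirroring the drop in segmented regions reported later.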

3. The proposed algorithm

The proposed algorithm can be summarized in the following steps:

(1) select a desired scale 2^J, and compute the WT of each color channel up to this scale;

(2) compute and threshold the color gradient magnitude of the approximation image at the coarsest resolution;

(3) apply the watershed transform to the thresholded magnitudes, and obtain an image representation at scale 2^J;

(4) project the image representation to the full resolution 2^0, using the IWT;

(5) apply a contrast-based region merging technique.

These steps are detailed next.
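The dataflow of steps (1)-(3) can be sketched as a toy numpy pipeline; every helper here is a deliberate simplification (2x2 block averaging instead of the Haar WT, connected-component labelling of low-gradient pixels instead of the watershed), intended only to show how the stages chain together:

```python
import numpy as np

def toy_pipeline(img, J=1, thresh=1.0):
    """Minimal runnable sketch of steps (1)-(3); not the paper's method."""
    # (1) approximation at scale 2^J by repeated 2x2 block averaging
    a = img.astype(float)
    for _ in range(J):
        a = (a[0::2, 0::2] + a[0::2, 1::2]
             + a[1::2, 0::2] + a[1::2, 1::2]) / 4.0
    # (2) color gradient magnitude from forward differences, all channels
    dv = np.diff(a, axis=0)[:, :-1, :]
    dh = np.diff(a, axis=1)[:-1, :, :]
    mag = np.sqrt((dv ** 2 + dh ** 2).sum(axis=2))
    # (3) threshold, then label the remaining flat zones (the regions a
    # watershed flooding would recover, minus the watershed lines)
    flat = mag < thresh
    labels = np.zeros(flat.shape, dtype=int)
    n = 0
    for y0 in range(flat.shape[0]):
        for x0 in range(flat.shape[1]):
            if flat[y0, x0] and labels[y0, x0] == 0:
                n += 1
                stack = [(y0, x0)]
                labels[y0, x0] = n
                while stack:
                    y, x = stack.pop()
                    for yy, xx in ((y - 1, x), (y + 1, x),
                                   (y, x - 1), (y, x + 1)):
                        if (0 <= yy < flat.shape[0]
                                and 0 <= xx < flat.shape[1]
                                and flat[yy, xx] and labels[yy, xx] == 0):
                            labels[yy, xx] = n
                            stack.append((yy, xx))
    return labels, n
```

Running this on a two-tone image returns two labelled regions separated by the high-gradient boundary, which is the coarse-scale starting point that steps (4)-(5) then refine.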

3.1. Thresholded color magnitudes

Let {W̃^h_{2^j}, W̃^v_{2^j}, W̃^d_{2^j}}_{j=1,2,...,J} and Ã_{2^J} denote the multiresolution wavelet decomposition of the original color image Ĩ into J scales, where

Ã_{2^J} = (A^R_{2^J}, A^G_{2^J}, A^B_{2^J}),  W̃^s_{2^j} = (W^{s,R}_{2^j}, W^{s,G}_{2^j}, W^{s,B}_{2^j}),  s ∈ {h, v, d}    (1)

represent the approximation and detail images corresponding to each individual color channel R, G or B. If D^c_v and D^c_h denote the vertical and horizontal differences of the Prewitt operator applied to the images A^c_{2^J}, then a color gradient magnitude at the coarsest scale 2^J is obtained through

M_{2^J} = √( Σ_{c ∈ {R,G,B}} (D^c_v)^2 + (D^c_h)^2 ).    (2)
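A direct numpy transcription of Eq. (2) might look as follows (our sketch; `convolve2d_same` is a hypothetical helper written out so the block is self-contained):

```python
import numpy as np

# Prewitt kernels for the horizontal and vertical differences of Eq. (2).
PREWITT_H = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=float)
PREWITT_V = PREWITT_H.T

def convolve2d_same(img, k):
    """Tiny 'same'-size 2-D correlation with zero padding."""
    p = k.shape[0] // 2
    padded = np.pad(img, p)
    out = np.zeros_like(img, dtype=float)
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * padded[dy:dy + img.shape[0],
                                      dx:dx + img.shape[1]]
    return out

def color_gradient_magnitude(approx):
    """Eq. (2): sum squared Prewitt responses over the three channels
    of the (H, W, 3) approximation image, then take the square root."""
    total = np.zeros(approx.shape[:2])
    for c in range(3):
        dh = convolve2d_same(approx[:, :, c], PREWITT_H)
        dv = convolve2d_same(approx[:, :, c], PREWITT_V)
        total += dh ** 2 + dv ** 2
    return np.sqrt(total)  # M_{2^J}
```

On a step edge this yields zero response in flat areas and a strong response along the boundary, which is exactly the map the watershed floods.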

Although the images A^c_{2^J} were denoised by the low-pass filtering involved in the WT, they still contain residual noise. This residual noise typically produces small magnitude values in M_{2^J} that are responsible for oversegmentation during the watershed transform. To remove such small gradients, we apply an adaptive threshold based on the expected distribution of noise-related and edge-related magnitudes, similarly to the procedure adopted in (Henstock and Chelberg, 1996).

If p_n(r) and p_e(r) are the distributions of gradient magnitudes associated with noise and edges, respectively, then

p(r) = w_n p_n(r) + (1 − w_n) p_e(r)    (3)

provides an adequate fit for the histogram of magnitudes (here, w_n represents the a priori probability of distribution p_n). According to Bayes' rule, a certain pixel with gradient magnitude r has the following posterior probability of being an edge:

p(edge | r) = (1 − w_n) p_e(r) / p(r).    (4)

Given a certain probability threshold 0 ≤ P ≤ 1, such a pixel is qualified as an edge if p(edge | r) ≥ P. For grayscale images, Henstock and Chelberg (1996) assumed Gaussian distributions for horizontal and vertical differences of both edge and noise pixels, used gamma probability density functions to model both p_n(r) and p_e(r), and chose P = 0.5 as the optimal threshold.

The Gaussian assumption is widely accepted to model local differences of noise-related pixels. On the other hand, local intensity differences of natural images are not normally distributed, mostly due to a sharp peak near the origin related to homogeneous regions (Srivastava et al., 2003). However, Scharcanski et al. (2002) showed that horizontal and vertical intensity differences related exclusively to edges (disregarding homogeneous regions) can be approximated by a normal distribution. Hence, we assume that local differences for both noise and edges are normally distributed in each color channel.


Fig. 1. (a) Original house image. (b) Noisy house image.


Also, it is known that if X_1, X_2, ..., X_m are m independent identically-distributed (i.i.d.) random variables with normal distribution, then Y = √(X_1^2 + X_2^2 + ... + X_m^2) is a random variable with a chi distribution. It should be noticed that the intensity differences D^c_v and D^c_h for edge-related coefficients present some correlation in most natural images, but we assume the simplification of statistical independence to obtain a simple and closed-form expression for p_n(r) and p_e(r) based on chi distributions:^1

p_n(r) = (1 / (8σ_n^6)) r^5 e^{−r^2 / 2σ_n^2},  p_e(r) = (1 / (8σ_e^6)) r^5 e^{−r^2 / 2σ_e^2},    (5)

where σ_n^2 is the variance of noisy coefficients and σ_e^2 is the variance of coefficients related to edges. The parameters σ_n, σ_e and w_n required in Eq. (3) are estimated by ML (maximum likelihood) (Casella and Berger, 1990), and the thresholded magnitude image M^t_{2^J} is given by

M^t_{2^J}(x, y) = { 0,              if p(edge | M_{2^J}(x, y)) < P;
                    M_{2^J}(x, y),  if p(edge | M_{2^J}(x, y)) ≥ P.    (6)

The watershed transform of M^t_{2^J} is then computed, and the initial segmentation at the coarsest resolution 2^J is obtained. Larger values of P result in fewer spurious edges, but may also remove relevant small-contrast edges, producing contour gaps (and leakage during watersheds). On the other hand, smaller values of P result in less noise removal and also fewer contour gaps (increasing oversegmentation). In this work, we selected P = 0.25, aiming to keep small-contrast edges. Although there is a slight increase in the number of segmented regions compared to the more intuitive choice P = 0.5, this problem can be solved using a post-processing algorithm based on region merging, as discussed in Section 3.3.
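Eqs. (3)-(6) can be transcribed numerically as below (our sketch; the σ_n, σ_e and w_n defaults are placeholder values standing in for the ML fit described above):

```python
import numpy as np

# Chi models with 6 degrees of freedom (3 channels x 2 Prewitt
# directions) for noise- and edge-related magnitudes, Eq. (5).

def chi6_pdf(r, sigma):
    """p(r) = r^5 exp(-r^2 / 2 sigma^2) / (8 sigma^6)."""
    return r ** 5 * np.exp(-r ** 2 / (2.0 * sigma ** 2)) / (8.0 * sigma ** 6)

def edge_posterior(r, sigma_n, sigma_e, w_n):
    """p(edge | r) of Eq. (4), with the mixture p(r) of Eq. (3)."""
    pn = chi6_pdf(r, sigma_n)
    pe = chi6_pdf(r, sigma_e)
    mix = w_n * pn + (1.0 - w_n) * pe
    return (1.0 - w_n) * pe / np.maximum(mix, 1e-300)

def threshold_magnitudes(M, sigma_n=1.0, sigma_e=5.0, w_n=0.8, P=0.25):
    """Eq. (6): zero out magnitudes whose edge posterior is below P.
    The parameter defaults are illustrative, not ML estimates."""
    post = edge_posterior(M, sigma_n, sigma_e, w_n)
    return np.where(post >= P, M, 0.0)
```

Small magnitudes fall in the noise-dominated part of the mixture and are zeroed, while large magnitudes receive a posterior near one and survive.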

To illustrate our thresholding procedure, let us consider the 256 × 256 house image and its noisy version (PSNR = 16.90 dB)^2 shown in Fig. 1. Fig. 2(a) and (b) illustrate color gradient magnitudes for the original house image at scale 2^1 before and after thresholding, and Fig. 2(c) and (d) show magnitudes at scale 2^2. Fig. 2(e)-(h) illustrate analogous results for the noisy version. As can be observed, our thresholding procedure did not destroy any relevant edge in the "clean" image, and it was able to reduce spurious responses in the noisy version. It can also be noticed that noise is smoothed out as the scale 2^J gets larger, due to the low-pass filtering involved in the wavelet decomposition.

Fig. 3 illustrates how oversegmentation is attenuated by the proposed thresholding technique. Watershed lines corresponding to the noisy magnitude images shown in Fig. 2(e)-(h) are illustrated in Fig. 3(a)-(d). It can be noticed that applying the watershed transform to raw color magnitudes results in several small segmented regions, as expected. On the other hand, the proposed thresholding procedure can significantly reduce the number of segmented regions at scales 2^1 and 2^2.

^1 It should be noticed that more generic distributions to model edge-related magnitudes (such as the gamma) were tested, with no significant differences compared to the chi distribution.

^2 The original image can be found at the USC-SIPI Image Database (http://sipi.usc.edu/database).

3.2. Initial segmentation and projection to finer resolutions

The output of the previous step is an initial segmentation at scale 2^J, consisting of a set of labelled connected components separated by watershed lines. It should be noticed that the size of the segmented image is about 2^{−J} of the size of the original image, due to the downsampling used in the WT.

Now, let us consider a representation R̃_{2^J} of the approximation image Ã_{2^J}, obtained by replacing each labelled region of the watershed transform with the color average of the corresponding region in Ã_{2^J}, and setting R̃_{2^J} = 0̃ at watershed lines. As noticed by Kim and Kim (2003), a simple pixel duplication procedure to restore the full resolution image produces an unpleasing blocking effect, and edge definition is lost. However, such a problem can be overcome if we compute inverse wavelet transforms, similarly to the approach adopted in (Jung, 2003).

In order to project R̃_{2^J} into the adjacent scale 2^{J−1}, the detail coefficients W^{h,c}_{2^J}, W^{v,c}_{2^J} and W^{d,c}_{2^J} are needed, for c ∈ {R, G, B}. To keep a piecewise constant representation of the approximation image Ã_{2^J}, we use only detail coefficients related to region borders. In fact, due to the good localization property of the WT (particularly, the very small support of Haar wavelets), we can determine exactly the region of influence of each wavelet coefficient during the IWT. Let us consider the updated wavelet coefficients

UA^c_{2^J}(x, y) = { A^c_{2^J}(x, y),  if (x, y) belongs to a region border;
                     R^c_{2^J}(x, y),  otherwise,    (7)

UW^{s,c}_{2^J}(x, y) = { W^{s,c}_{2^J}(x, y),  if (x, y) belongs to a region border;
                         0,                    otherwise,    (8)

for s ∈ {h, v, d}, c ∈ {R, G, B}. The IWT is applied to UA^c_{2^J} and UW^{s,c}_{2^J} for each color channel, generating a new set of higher resolution images S^c_{2^{J−1}}. To illustrate this procedure, let us consider the watershed segmentation of the noisy house image produced at scale 2^2, shown in Fig. 3(d).
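A possible numpy rendering of Eqs. (7) and (8) for one color channel is sketched below; the `border_mask` helper and the dict layout for the detail images are our own conventions, not the paper's:

```python
import numpy as np

def border_mask(labels):
    """Pixels on a watershed line (label 0) or touching another label."""
    b = labels == 0
    b[:-1, :] |= labels[:-1, :] != labels[1:, :]
    b[1:, :]  |= labels[1:, :] != labels[:-1, :]
    b[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    b[:, 1:]  |= labels[:, 1:] != labels[:, :-1]
    return b

def updated_coefficients(A, R, W_details, labels):
    """A, R: approximation and piecewise-constant images of one channel;
    W_details: dict of detail images, e.g. {'h': ..., 'v': ..., 'd': ...}.
    Keeps original values only at region borders (Eqs. (7)-(8))."""
    border = border_mask(labels)
    UA = np.where(border, A, R)                                   # Eq. (7)
    UW = {s: np.where(border, W, 0.0)
          for s, W in W_details.items()}                          # Eq. (8)
    return UA, UW
```

Feeding UA and UW to the inverse transform reproduces sharp values only around contours, which is what creates the "fuzzy watershed pixels" discussed next.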


Fig. 2. Color gradient magnitude images before and after thresholding for the house image at different scales. (a)-(d) Original image. (e)-(h) Noisy version.

Fig. 3. Watershed lines corresponding to the noisy magnitude images shown in Fig. 2(e)-(h), before and after thresholding.


The corresponding piecewise constant representation image R̃_{2^J} (with J = 2) is shown in Fig. 4(a), and the updated approximation image UÃ_{2^J} is illustrated in Fig. 4(b). The result of the IWT is shown in Fig. 4(c), and it can be observed that the image S̃_{2^{J−1}} is almost piecewise constant: each plateau of R̃_{2^J} was mapped to a plateau in S̃_{2^{J−1}}, but the updated detail coefficients generate pixel fluctuations between adjacent plateaus when the IWT is computed. Such pixels are responsible for reconstructing the contour definition of segmented regions, and are called "fuzzy watershed pixels".

The following step is to assign these "fuzzy pixels" to a neighboring homogeneous region. Let us consider a fuzzy pixel (x, y) and its 8-connected neighborhood. Some of these neighboring pixels belong to existing homogeneous regions, while others may also be fuzzy pixels. Let N_h(x, y) ≤ 8 denote the number of pixels adjacent to (x, y) that effectively belong to existing homogeneous regions, and let (x_k, y_k) denote the positions of such pixels, for k = 1, ..., N_h(x, y). The color dissimilarity d_k between the fuzzy pixel (x, y) and its neighbor (x_k, y_k) is computed through

d_k = D(R̃_{2^{J−1}}(x, y), R̃_{2^{J−1}}(x_k, y_k)),  k = 1, ..., N_h(x, y),    (9)

where D(·,·) represents the distance between two colors according to a given vector norm. Then, (x, y) is assigned to the region for which the value d_k is the smallest. In other words, the value of R̃_{2^{J−1}} at position (x, y) is redefined as

R̃_{2^{J−1}}(x, y) = R̃_{2^{J−1}}(x_l, y_l),  where l = argmin_k d_k.    (10)

Fig. 4. (a) Initial segmentation result of the noisy house image at scale 2^2. (b) Updated approximation image UÃ_{2^2}. (c) Projection of UÃ_{2^2} to scale 2^1. (d) Piecewise constant representation image R̃_{2^1} at scale 2^1.
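The assignment rule of Eqs. (9) and (10) can be sketched as follows (our illustration; the outer iteration over still-unresolved pixels handles fuzzy pixels whose neighbors are themselves fuzzy, a detail the paper leaves implicit):

```python
import numpy as np

def resolve_fuzzy_pixels(R, fuzzy):
    """Assign each fuzzy pixel to the 8-connected neighbor region with
    the closest color (L2 norm, Eqs. (9)-(10)). R is an (H, W, 3) array;
    fuzzy is a boolean mask (mutated in place). Returns an updated copy
    of R."""
    R = R.astype(float).copy()
    H, W = fuzzy.shape
    pending = list(zip(*np.nonzero(fuzzy)))
    while pending:
        still = []
        for (y, x) in pending:
            best, best_d = None, np.inf
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy, dx) == (0, 0) or not (0 <= ny < H and 0 <= nx < W):
                        continue
                    if fuzzy[ny, nx]:
                        continue              # neighbor still unresolved
                    d = np.linalg.norm(R[y, x] - R[ny, nx])   # Eq. (9)
                    if d < best_d:
                        best_d, best = d, (ny, nx)
            if best is None:
                still.append((y, x))          # retry in the next pass
            else:
                R[y, x] = R[best]             # Eq. (10)
                fuzzy[y, x] = False
        if len(still) == len(pending):
            break                             # isolated fuzzy blob; stop
        pending = still
    return R
```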

When computing d_k in Eq. (9), there are several choices for the distance measure D(·,·), as well as several color space representations. In particular, the CIELAB color space arises as a natural candidate, since it is roughly uniform and color dissimilarity can be approximated using the L2 vector norm (Euclidean distance), as noted in (Ohta et al., 1980). However, the L2 norm in CIELAB was tested, and produced inferior results compared to the L2 norm in RGB (object contours using CIELAB presented more jaggedness than contours obtained with RGB). A possible explanation for this behavior is that noise contamination can significantly affect the chromaticity component in the RGB → CIELAB transformation, leading to erroneous color matching. An additional advantage of using RGB is that no color space transformations are performed, reducing the computational burden. Yet another possibility would be to employ the L1 norm (sum of absolute differences) in RGB, since it is faster to compute than the L2 norm. The L1 norm in RGB was also tested, but the computational gain was not significant in the proposed approach, and quantitative results (in terms of PSNR of segmented images) using the L1 norm were slightly inferior to results obtained with the L2 norm. Hence, the L2 vector norm in RGB coordinates was chosen to compute color dissimilarity, as in the segmentation procedure described in (Fuh et al., 2000).

By the end of this step, a piecewise constant representation R̃_{2^{J−1}} at scale 2^{J−1} is obtained, as shown in Fig. 4(d). Since R̃_{2^{J−1}} is formed by homogeneous regions, it is trivial to find region borders and obtain the updated wavelet coefficients UÃ_{2^{J−1}} and UW̃^s_{2^{J−1}}, according to Eqs. (7) and (8). The IWT is computed, fuzzy pixels are resolved, and this process is repeated until full resolution is restored (i.e., until scale 2^0 is reached). The final segmentation result for the noisy house image (i.e. the image R̃_{2^0}) obtained using 2^2 as the initial scale is depicted in Fig. 5(a). It can be observed that all relevant structures were preserved, and only 64 regions were segmented.

3.3. Region merging

The algorithm described so far produces a partition of the original image into connected regions with uniform colors. Although the adaptive magnitude thresholding procedure indeed reduces oversegmentation, adjacent regions with similar colors may persist, and a region merging algorithm can be applied. We adopt an approach similar to Cheng and Sun's (2000), as described next.

Fig. 5. Segmentation result for the noisy house image starting at scale 2^2. (a) Without post-processing (64 regions). (b) With post-processing (28 regions).

^3 One JND is an estimate of the minimum distance that leads to a perceptual color difference.

Let N_s denote the number of segmented regions obtained by the algorithm described in the previous section. For each pair of adjacent regions B_i and B_j with CIELAB color vectors c̃_i and c̃_j, the color difference measure d(i, j) is computed using the L2 norm

d(i, j) = ΔE*_ab = ||c̃_i − c̃_j||_2.    (11)

It should be noticed that the representative color of each segmented region was obtained by averaging colors in the corresponding regions of the image Ã_{2^J}, as described in Section 3.2. This averaging process significantly reduces noise influence, and the undesired chromaticity distortion observed when assigning fuzzy pixels in CIELAB does not happen at this stage. Hence, it is convenient to use the CIELAB color space for region merging instead of the RGB color space. Furthermore, only N_s color transformations are required, resulting in practically no increase in the computational burden of the proposed method.

The first step of the region merging algorithm is to locate the pair of adjacent regions having the smallest color difference. If this difference is smaller than a certain color threshold T_c, the two regions are merged. The color of the new region is computed again, and the mean color value is assigned to the pixels within this region. The new global minimum of d(i, j) is then computed, and the procedure is repeated until all pairs of adjacent regions reach the minimum contrast T_c.
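The merging loop can be sketched as below (our illustration; the region colors, areas and adjacency sets are assumed to be precomputed from the segmentation, and the quadratic pair search is kept naive for clarity):

```python
import numpy as np

def merge_regions(colors, areas, adjacency, Tc=10.0):
    """Greedy merging by smallest CIELAB distance (Eq. (11)).
    colors: {region_id: np.array([L, a, b])}; areas: {region_id: int};
    adjacency: {region_id: set of neighboring region ids}."""
    def closest_pair():
        best, best_d = None, np.inf
        for i in colors:
            for j in adjacency[i]:
                if j <= i:
                    continue
                d = np.linalg.norm(colors[i] - colors[j])  # Eq. (11)
                if d < best_d:
                    best_d, best = d, (i, j)
        return best, best_d

    while True:
        pair, d = closest_pair()
        if pair is None or d >= Tc:
            return colors
        i, j = pair
        # area-weighted mean color of the merged region, kept under id i
        total = areas[i] + areas[j]
        colors[i] = (areas[i] * colors[i] + areas[j] * colors[j]) / total
        areas[i] = total
        adjacency[i] = (adjacency[i] | adjacency.pop(j)) - {i, j}
        for k in adjacency[i]:
            adjacency[k].discard(j)
            adjacency[k].add(i)
        del colors[j], areas[j]
```

Each iteration removes exactly one region, so the loop terminates once every surviving adjacent pair differs by at least T_c.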

In (Cheng and Sun, 2000), the color threshold is the difference of the mean and standard deviation of all distances d(i, j), i.e. T_c = μ − σ. Hence, the threshold is highly dependent on the distribution of colors obtained in the segmentation procedure. In particular, perceptually different adjacent regions may be erroneously merged if the original image contains a large variety of different colors, since the adaptive threshold T_c may be relatively large. For example, the adaptive threshold T_c = μ − σ is approximately 31 for the noisy synthetic image shown in the bottom row of Fig. 8, which leads to erroneous region merging.

On the other hand, it is known that the CIELAB color space is roughly uniform, and a difference value d(i, j) around 2.3 corresponds to a JND^3 (just noticeable difference) (Mahy et al., 1994). Also, color differences around 5 units represent an accuracy acceptable for high-quality color reproduction (Berns, 2001). In this work, the selected default threshold is twice the acceptable value for high-quality color reproduction, i.e. T_c = 10. It should be noticed that smaller values of T_c lead to less region merging (hence, more segmented regions), while larger values of T_c lead to the opposite result.

The result of the region merging procedure applied to Fig. 5(a) is shown in Fig. 5(b). Low-contrast regions were merged, and the number of segmented regions dropped from 64 to 28 after the post-processing.

3.4. Selection of the scale 2^J

In the procedure described so far, an initial scale 2^J must be provided for the initial segmentation using watersheds. The selection of 2^J depends on the detail resolution desired by the user (which is usually application-dependent), the amount of noise contamination and the dimensions of the original image. When a small value of J is selected, the proposed technique tends to retrieve a larger number of segmented regions, being able to capture small image details. As J increases, the segmentation procedure tends to produce fewer (and larger) regions, providing a coarser image representation.

The influence of the initial scale 2^J for the original and noisy house images shown in Fig. 1 is illustrated in Fig. 6. Segmentation results starting at scales 2^1, 2^2 and 2^3 for the original house image are shown in the first row, and analogous results for the noisy version are illustrated in the second row. As can be observed, finer resolutions produce good results for the clean image, and even small details and texture changes are detected. However, a coarser initial scale 2^J is required to reduce noise influence and avoid broken contours in the noisy version. Visual inspection indicates that scale 2^2 provides good results for both the original and noisy versions, keeping relevant structures.

Although the selection of J is application- and image-dependent, it is important to provide a default value to achieve unsupervised segmentation. A subjective evaluation indicated that a good compromise between noise removal and detail preservation was obtained using J = 1 for 128 × 128 (or smaller) images, J = 2 for 256 × 256 images, and J = 3 for 512 × 512 images (i.e. the initial scale appears to be proportional to the logarithm of the image dimensions). Hence, for a color image with dimensions n × m, the default value for parameter J is

J = max{1, 1 + rnd(log_2(min{n, m}/128))},    (12)
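Eq. (12) translates directly into code; a minimal sketch (the function name `default_scale` is illustrative):

```python
import math

def default_scale(n, m):
    """Default number of dyadic scales J for an n x m image, per Eq. (12):
    J = max(1, 1 + rnd(log2(min(n, m) / 128))).
    128x128 or smaller gives J = 1, 256x256 gives J = 2, 512x512 gives J = 3.
    """
    return max(1, 1 + round(math.log2(min(n, m) / 128)))
```

For the 321 × 481 Berkeley images used later, this rule yields J = 2.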


Fig. 6. Segmentation results for the original house image and its noisy version, using different initial scales 2^J.

Region contours were ignored when computing the PSNR of segmented images.

C.R. Jung / Pattern Recognition Letters 28 (2007) 523–533 529

where rnd(x) rounds x to the nearest integer. In this work, images up to 512 × 512 pixels were tested, leading to a maximum wavelet scale J = 3. However, there is no image size limitation for the proposed segmentation method.

4. Experimental results

This section presents results of the proposed technique (called Waveseg) applied to several images containing both natural and artificial noise. We also compared Waveseg with three state-of-the-art segmentation methods, namely Mean-shift (Comaniciu and Meer, 2002), JSEG (Deng and Manjunath, 2001) and SRM (Nock and Nielsen, 2004). For Mean-shift, we manually selected the required parameters hs, hr and M to obtain good visual results, following the guidelines provided in (Comaniciu and Meer, 2002). In particular, Mean-shift seemed to be an attractive technique for segmenting noisy images, since the method can also be used for image denoising. Segmentation using JSEG involves three parameters: a threshold for color quantization qJ, the number of scales lJ, and a threshold for region merging mJ. We used the default values described in (Deng and Manjunath, 2001) for all images, unless stated otherwise. For SRM, we used the default parameters indicated in (Nock and Nielsen, 2004). We ran Waveseg in unsupervised mode, meaning that J was selected using Eq. (12) and the region merging threshold was fixed at Tc = 10.

In the first experiment, we applied JSEG, Mean-shift and SRM to the original and noisy versions of the house image, and results are shown in Fig. 7. As can be observed, JSEG is efficient for segmenting the original image (although the window was missed). However, the default set of parameters for JSEG returned no segmented region at all for the noisy version, and the region merging parameter mJ had to be fine-tuned to 0.1 to obtain the corresponding result displayed in Fig. 7. It can also be noticed that Mean-shift produced an excellent result for the original image (using hs = 8, hr = 12 and M = 100). However, the same set of parameters resulted in a very poor segmentation of the noisy version, and they had to be re-adjusted (hs = 12, hr = 21 and M = 300) in order to get a barely acceptable segmentation, as illustrated in Fig. 7 (center-bottom). SRM also missed part of the window in the "clean" version, and segmented all relevant objects in the noisy version (however, the lower rooftop was blended with the bricks). A visual comparison of these results with the proposed approach using default values (J = 2, central image in Fig. 6) indicates that Waveseg produces a better segmentation of the noisy image. The Waveseg result for the original image seems superior to segmentation with JSEG and SRM, and perhaps slightly inferior to Mean-shift. For a quantitative analysis of noisy images, we computed the PSNR of segmentation results and the number of segmented regions.
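The PSNR of a segmentation result can be computed by comparing the clean image against the piecewise-constant segmented image (region contours excluded, as noted in the footnote). A sketch, assuming 8-bit images (the function name is illustrative):

```python
import numpy as np

def segmentation_psnr(original, segmented, contour_mask=None):
    """PSNR (dB) between an 8-bit original image and its piecewise-constant
    segmented version. Pixels where contour_mask is True are excluded."""
    o = np.asarray(original, dtype=float)
    s = np.asarray(segmented, dtype=float)
    if contour_mask is not None:
        o, s = o[~contour_mask], s[~contour_mask]
    mse = np.mean((o - s) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
```

A higher PSNR means the representative region colors reproduce the clean image more faithfully, which is how Table 1 ranks the methods.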

Other examples of noisy image segmentation are illustrated in Fig. 8. The first column illustrates a noisy version (PSNR = 13.31 dB) of the 512 × 512 peppers image and a 330 × 330 noisy synthetic image containing 144 squares based on the MacBeth color chart (PSNR = 15.65 dB). The following columns show segmentation results using Waveseg, JSEG, Mean-shift and SRM, respectively. For the peppers image, JSEG's default parameters returned no segmented region at all (as happened with the noisy house image), and we had to set mJ = 0.05 to obtain the


Fig. 7. From left to right: segmentation results for the house image using JSEG, Mean-shift and SRM. Top row: original house image. Bottom row: noisy house image.

Fig. 8. Segmentation results for noisy images using Waveseg (second column), JSEG (third column), Mean-shift (fourth column) and SRM (last column).


result shown in Fig. 8(c). For Mean-shift, we used the parameter set (hs, hr, M) = (16, 27, 10,000) to obtain the segmented image. Visual inspection indicates that JSEG produced a very coarse image representation, and Mean-shift returned very noisy contours. The proposed technique performed similarly to SRM, presenting smooth contours and more details than JSEG and Mean-shift. However, a contour gap at the bottom-left portion of the central green pepper resulted in leakage during the watershed, leading to an erroneous contour. For the MacBeth image, JSEG was applied with mJ = 0.05, since the default values resulted in just a few segmented squares. For Mean-shift, we used (hs, hr, M) = (20, 22, 150). It can be observed that all four methods failed to detect the lighter grey squares, due to low contrast differences. However, Waveseg was able to segment most of the squares (126 out of 144), returned the largest PSNR, produced better-defined contours (especially for the smallest squares) than its competitors, and did not produce spurious regions.

The MacBeth synthetic image shown in the bottom row of Fig. 8 is particularly useful for quantitative comparisons, since it is easy to obtain ground-truth segmentation results (as opposed to most natural images). Several noisy versions of this synthetic image were produced with increasing amounts of additive Gaussian noise, and segmentation results obtained with Waveseg, JSEG and SRM were compared (Mean-shift was not included, since it requires too many parameter adjustments). A quantitative analysis of these techniques in terms of PSNR and number of segmented regions is summarized in Table 1. As noise contamination increases, the number of segmented regions for all methods decreases, as does the


Table 1. PSNR and number of segmented regions for noisy versions of the MacBeth image, using Waveseg, JSEG and SRM

Noisy     1 dB       28.92 dB   23.07 dB   19.75 dB   17.45 dB   15.73 dB   14.37 dB   13.28 dB   12.36 dB

Waveseg   31.14 dB   31.10 dB   30.17 dB   28.92 dB   27.58 dB   26.03 dB   24.21 dB   22.72 dB   20.71 dB
          133 reg.   133 reg.   133 reg.   133 reg.   128 reg.   127 reg.   123 reg.   121 reg.   119 reg.

JSEG      24.14 dB   28.74 dB   28.82 dB   26.27 dB   15.32 dB   11.23 dB   10.28 dB   10.39 dB   10.23 dB
          114 reg.   123 reg.   127 reg.   114 reg.   47 reg.    11 reg.    5 reg.     3 reg.     3 reg.

SRM       24.49 dB   26.25 dB   26.39 dB   25.72 dB   23.92 dB   22.55 dB   22.06 dB   20.89 dB   20.07 dB
          89 reg.    94 reg.    95 reg.    94 reg.    90 reg.    84 reg.    83 reg.    87 reg.    80 reg.


resulting PSNR. However, Waveseg presented the number of segmented regions closest to the ground truth (144 regions), and also produced the highest PSNR for all levels of noise corruption.
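The noisy test images described above can be generated with additive Gaussian noise of increasing strength; a minimal sketch (the function name and the fixed seed are illustrative choices, not from the paper):

```python
import numpy as np

def add_gaussian_noise(image, sigma, seed=0):
    """Return an 8-bit copy of `image` corrupted with zero-mean additive
    Gaussian noise of standard deviation `sigma` (in gray levels)."""
    rng = np.random.default_rng(seed)
    noisy = image.astype(float) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

Sweeping `sigma` upward reproduces the decreasing input-PSNR columns of Table 1.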

We also applied our technique to several 321 × 481 natural images contained in the Berkeley dataset (Martin et al., 2001), and compared results with JSEG, Mean-shift, SRM and the manual delineations provided with the database (darker contours were marked by more human subjects).

Fig. 9. Segmentation of several natural images. Top row: original images. Second row: segmentation using the proposed method. Third row: segmentation using JSEG. Fourth row: segmentation using Mean-shift. Fifth row: segmentation using SRM. Last row: manual contour delineation.

An overall visual inspection indicates that all algorithms achieve a similar performance, being able to detect roughly the same relevant objects marked by human subjects, as shown in Fig. 9. They also present similar problems, such as the misdetection of the giraffes' legs in the third image. It is also interesting to notice that no manual delineation includes clouds in the first and third images, but clouds are strongly marked as relevant objects in the fourth image.


Table 2. Running times (in seconds) for Waveseg, Waveseg without region merging, JSEG, Mean-shift and SRM

                  House   Noisy house   Peppers   MacBeth   Horses   Landscape   Giraffes   Pyramids
Waveseg           6.0     5.4           31.6      13.9      12.2     13.9        16.2       20.1
Waveseg (w/o)     3.8     3.4           16.4      7.4       6.9      7.9         8.4        10.5
JSEG              4.35    8.3           46.1      20.1      20.7     16.5        9.6        16.3
Mean-shift        1.7     16.7          130.3     85.6      20.7     27.5        18.9       18.3
SRM               <1      <1            <2        <1        <1       <1          <1         <1


All results regarding Waveseg were obtained through a MATLAB implementation of the proposed algorithm (available for download at http://www.inf.unisinos.br/~crjung/research.htm), while results of JSEG, Mean-shift and SRM were obtained using C/C++ implementations provided by the authors. Running times on a portable PC with a Centrino 1.7 GHz processor are provided in Table 2. Although MATLAB implementations tend to be slower than C/C++, it can be observed that running times for Waveseg are similar to (or smaller than) those of JSEG and Mean-shift, but larger than those of SRM. In particular, the most time-consuming parts of our MATLAB implementation are the assignment of fuzzy pixels (since a loop is used, and loops are a known bottleneck in MATLAB programming) and the region merging procedure (due to the non-optimized data structure used to find and merge adjacent regions). For the sake of comparison, running times for Waveseg without the region merging procedure were also included in Table 2. It is expected that an optimized implementation of Waveseg in a compiled language would significantly reduce the execution time.

5. Discussion and conclusions

In this work, a multiresolution color segmentation algorithm was presented. The WT is applied up to a selected scale 2^J, and color gradient magnitudes are computed at the coarsest resolution. An adaptive threshold based on statistical properties of the gradient magnitudes is estimated, and watersheds are applied to obtain an initial segmentation at the coarsest resolution 2^J. The initial segmentation is then projected to finer resolutions using the IWT, until the final segmentation at scale 2^0 is obtained. Finally, a region-merging post-processing step is applied to blend regions with similar colors.

The proposed method involves the selection of two parameters: the number of dyadic scales J and the threshold for color region merging Tc. However, default values were used in all comparisons with competing techniques, since we computed J automatically according to Eq. (12) and fixed Tc = 10. It is clear that even better results can be achieved by fine-tuning J and Tc for each individual image, but our intent was to show that very good results can be obtained without user intervention. Indeed, experimental results indicated that Waveseg produced results comparable to other state-of-the-art techniques for natural images, and superior results for noisy images (in particular, the unsupervised version of JSEG failed to segment most noisy images, needing manual parameter adjustments). Furthermore, it should be noticed that the fine-tuning of J and Tc is not only image-dependent, but also context-dependent (since the ultimate evaluation of a segmentation technique depends on the desired application).

Although the procedure described in this paper was intended for color image segmentation, the extension to multivalued images (such as some types of medical images and multispectral satellite images) is straightforward. The only required changes are (i) to increase the number of Gaussian random variables used to deduce Eq. (5); and (ii) to adapt the similarity metric of the region merging algorithm.

Future work will concentrate on combining segmentation results obtained with different initial scales 2^J, and on improving the region merging procedure.

Acknowledgement

This work was developed in collaboration with HP Brazil R&D and the Brazilian research agency CNPq. The author would like to thank the anonymous reviewers for their valuable contributions.

References

Berns, R.S., 2001. The science of digitizing paintings for color-accurate image archives: A review. J. Imaging Sci. Technol. 45 (4), 305–325.

Casella, G., Berger, R.L., 1990. Statistical Inference. Wadsworth and Brooks/Cole, Pacific Grove.

Chen, J., Pappas, T., Mojsilovic, A., Rogowitz, B., 2004. Perceptually-tuned multiscale color–texture segmentation. In: Internat. Conf. Image Process., pp. II: 921–924.

Cheng, H., Sun, Y., 2000. A hierarchical approach to color image segmentation using homogeneity. IEEE Trans. Image Process. 9 (12), 2071–2082.

Comaniciu, D., Meer, P., 2002. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Machine Intell. 24 (5), 603–619. Available from: <www.caip.rutgers.edu/riul/research/code/EDISON/>.

Deng, Y., Manjunath, B.S., 2001. Unsupervised segmentation of color–texture regions in images and video. IEEE Trans. Pattern Anal. Machine Intell. 23 (8), 800–810. Available from: <vision.ece.ucsb.edu/segmentation/jseg/software/index.htm>.

Fuh, C., Cho, S., Essig, K., 2000. Hierarchical color image region segmentation for content-based image retrieval system. IEEE Trans. Image Process. 9 (1), 156–162.

Gies, V., Bernard, T., 2004. Statistical solution to watershed over-segmentation. In: Internat. Conf. Image Process., pp. III: 1863–1866.



Henstock, P., Chelberg, D.M., 1996. Automatic gradient threshold determination for edge detection. IEEE Trans. Image Process. 5 (5), 784–787.

Jung, C.R., 2003. Multiscale image segmentation using wavelets and watersheds. In: Proc. SIBGRAPI. IEEE Press, Sao Carlos, SP, pp. 278–284.

Kazanov, M., 2004. A new color image segmentation algorithm based on watershed transformation. In: Internat. Conf. Pattern Recognition, pp. II: 590–593.

Kim, J.B., Kim, H.J., 2003. Multiresolution-based watersheds for efficient image segmentation. Pattern Recognition Lett. 24, 473–488.

Liapis, S., Sifakis, E., Tziritas, G., 2004. Colour and texture segmentation using wavelet frame analysis, deterministic relaxation, and fast marching algorithms. J. Visual Comm. Image Represent. 15 (1), 1–26.

Ma, W.Y., Manjunath, B.S., 2000. Edge flow: A technique for boundary detection and image segmentation. IEEE Trans. Image Process. 9 (8), 1375–1388.

Mahy, M., Van Eycken, L., Oosterlinck, A., 1994. Evaluation of uniform color spaces developed after the adoption of CIELAB and CIELUV. Color Res. Appl. 19 (2), 105–121.

Makrogiannis, S., Theoharatos, C., Economou, G., Fotopoulos, S., 2003. Color image segmentation using multiscale fuzzy c-means and graph theoretic merging. In: Internat. Conf. Image Process., pp. I: 985–988.

Mallat, S.G., 1989. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Machine Intell. 11 (7), 674–693.

Marfil, R., Rodriguez, J., Bandera, A., Sandoval, F., 2004. Bounded irregular pyramid: A new structure for color image segmentation. Pattern Recognition 37 (3), 623–626.

Martin, D., Fowlkes, C., Tal, D., Malik, J., 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proc. 8th Internat. Conf. Comput. Vision, vol. 2, pp. 416–423. Available from: <www.cs.berkeley.edu/projects/vision/grouping/segbench>.

Meyer, F., Beucher, S., 1990. Morphological segmentation. J. Visual Comm. Image Represent. 1, 21–46.

Navon, E., Miller, O., Averbuch, A., 2005. Color image segmentation based on adaptive local thresholds. Image Vision Comput. 23 (1), 69–85.

Nikolaev, D., Nikolayev, P., 2004. Linear color segmentation and its implementation. Computer Vision and Image Understanding 94 (1–3), 115–139.

Nock, R., Nielsen, F., 2003. On region merging: The statistical soundness of fast sorting, with applications. In: Conf. Comput. Vision Pattern Recognition, pp. II: 19–26.

Nock, R., Nielsen, F., 2004. Statistical region merging. IEEE Trans. Pattern Anal. Machine Intell. 26 (11), 1452–1458.

Nock, R., Nielsen, F., 2005. Semi-supervised statistical region refinement for color image segmentation. Pattern Recognition 38 (6), 835–846.

Ohta, Y., Kanade, T., Sakai, T., 1980. Color information for region segmentation. Comput. Graph. Image Process. 13 (3), 222–241.

Patino, L., 2005. Fuzzy relations applied to minimize over segmentation in watershed algorithms. Pattern Recognition Lett. 26 (6), 819–828.

Sapiro, G., 1997. Color snakes. Computer Vision and Image Understanding 68 (2), 247–253.

Scharcanski, J., Jung, C.R., Clarke, R.T., 2002. Adaptive image denoising using scale and space consistency. IEEE Trans. Image Process. 11 (9), 1092–1101.

Scheunders, P., Sijbers, J., 2002. Multiscale watershed segmentation of multivalued images. In: Internat. Conf. Pattern Recognition, pp. III: 855–858.

Srivastava, A., Lee, A.B., Simoncelli, E.P., Zhu, S.C., 2003. On advances in statistical modeling of natural images. J. Math. Imaging Vision 18 (1), 17–33.

Strang, G., Nguyen, T., 1996. Wavelets and Filter Banks. Wellesley-Cambridge Press.

Vanhamel, I., Pratikakis, I., Sahli, H., 2003. Multiscale gradient watersheds of color images. IEEE Trans. Image Process. 12 (6), 617–626.

Vincent, L., Soille, P., 1991. Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Machine Intell. 13 (6), 583–598.