
Adaptive texture and color segmentation for tracking moving objects



Pattern Recognition 35 (2002) 2013–2029
www.elsevier.com/locate/patcog


Ercan Ozyildiz, Nils Krahnstöver, Rajeev Sharma∗

Department of Computer Science and Engineering, Pennsylvania State University, 220 Pond Lab, University Park, PA 16802-6106, USA

Received 9 November 2000; received in revised form 26 June 2001

∗ Corresponding author. Tel.: +1-814-865-9505; fax: +1-814-865-3176. E-mail addresses: [email protected] (E. Ozyildiz), [email protected] (N. Krahnstöver), [email protected] (R. Sharma).

Abstract

Color segmentation is a very popular technique for real-time object tracking. However, even with adaptive color segmentation schemes, under varying environmental conditions in video sequences the tracking tends to be unreliable. To overcome this problem, many multiple cue fusion techniques have been suggested. One of the cues that complements color nicely is texture. However, texture segmentation has not been used for object tracking, mainly because of the computational complexity of texture segmentation. This paper presents a formulation for fusing texture and color in a manner that makes the segmentation reliable while keeping the computational cost low, with the goal of real-time target tracking. An autobinomial Gibbs Markov random field is used for modeling the texture and a 2D Gaussian distribution is used for modeling the color. This allows a probabilistic fusion of the texture and color cues and for adapting both the texture and color over time for target tracking. Experiments with both static images and dynamic image sequences establish the feasibility of the proposed approach. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Visual tracking; Color segmentation; Texture segmentation; Cue fusion

1. Introduction

Some of the most popular methods for real-time visual tracking of moving objects are based on color segmentation [1–3]. The main reason for choosing color-based segmentation is that the color cue is relatively invariant to scale, illumination and viewing direction while being computationally efficient.

Although the different color-based tracking approaches reported in the literature demonstrate a certain degree of adaptation to object color changes, for complex backgrounds and changing environments the reliance on one cue can lead to poor tracking performance. This suggests the use of multiple cues for tracking.

Texture provides a way of characterizing the spatial structure of an object and can complement the use of color for reliable object tracking. In the presence of uncertainty, pixel neighborhood information can be exploited for more robust segmentation. Unlike pixel-based color segmentation, texture segmentation is a non-local process that uses neighborhood statistics. However, texture segmentation is computationally demanding. Hence a combination of a color segmentation scheme with texture segmentation can be used advantageously to achieve real-time robust object tracking.



There is a large body of literature that addresses different aspects of the texture segmentation problem. However, texture segmentation has not been considered in the context of real-time object tracking, mainly for two reasons. First, texture parameter estimation and segmentation is computationally inefficient, making real-time implementation difficult. Second, unlike color, obtaining scale and rotation invariant texture information in a dynamic environment is very difficult.

In this work a new approach for tracking a moving object using a combination of adaptive texture and color segmentation is proposed. A Gibbs Markov random field (GMRF) is used for modeling the texture and a 2D Gaussian distribution is used for modeling the color. This allows a probabilistic framework for fusing the texture and color cues at the region level and for adapting both the texture and color segmentation over time for target tracking. The fusion of texture and color makes the segmentation reliable while keeping the computation efficient, with the goal of real-time target tracking. A Kalman filter-based motion estimation and prediction further helps in improving performance. Experiments with both static images and dynamic image sequences are used for establishing the feasibility of the proposed approach. The experiments were conducted for both indoor and outdoor scenes. Scenes with complex backgrounds and mixtures of similar object colors and textures were particularly chosen to test the algorithm. The experiments show that the probabilistic fusion of texture and color information at the region level improves the robustness of the tracking system under difficult visual conditions.

The rest of the paper is organized as follows. Section 2 discusses background and related work on color and texture segmentation in the context of target tracking. Section 3 explains the characterization of textured images using the GMRF, the autobinomial model and the parameter estimation method. Section 4 gives a formulation of color segmentation using a Gaussian distribution that is appropriate for adaptive segmentation. Section 5 presents a formulation for fusing color and texture. Section 6 proposes an approach for adapting the texture and color segmentation during the tracking process. Section 7 presents the experimental results and a performance analysis of color and texture segmentation on both static images and dynamic image sequences. Section 8 concludes with a discussion of the issues involved in improving color and texture segmentation for more efficient and reliable target tracking.

2. Background and related work

2.1. Color segmentation

In recent years, the analysis of color images has been playing an important role in computer vision because color can provide an efficient cue for focus of attention, object tracking and recognition, allowing real-time performance to be obtained using only modest hardware. Color segmentation is also computationally efficient and relatively robust to changes in illumination, viewing direction and scale. Robustness is achieved if the color components are efficiently separated from luminance in the original image and the color distribution is represented by a suitable mathematical model for thresholding [4].

Sharbek et al. [5] reviewed color segmentation algorithms and categorized them as pixel-, area-, edge- and physics-based segmentation according to their attributes. There is no single winner among color segmentation algorithms; the effectiveness of an algorithm changes with the application. Several researchers have compared different color spaces for the application of skin detection [4,6]. For example, Refs. [7,8] present comparisons of unsupervised and supervised color segmentation algorithms, respectively.

For online applications, two color segmentation methods are mostly used. The first method uses Gaussian mixture models to characterize the color distribution of an object [1,3,4]. The second method employs a histogram model [9,10]. The histogram-based methods are basically non-parametric forms of density estimation in color space.

Although color is an efficient cue for computer vision applications, a number of viewing factors, such as light sources, background colors and luminance levels, have a great impact on the change in color appearance. Most color-based systems are sensitive to these changes. There are two major approaches for handling environmental changes. The first approach finds the effects that change the color and uses them as an inverse filter to obtain the true color [11–13]. The second approach, adaptive color segmentation, adapts a previously developed color model to the changing environment [1,3]. This approach is more suitable for compensating for changes in the natural environment. However, adapting color alone may not achieve the desired robustness, motivating multiple cue approaches.

2.2. Texture segmentation

Texture has been widely accepted as a very important feature in image processing and computer vision since it provides unique information about the physical characteristics of surfaces, objects and scenes [14]. Numerous texture segmentation methods have been proposed in the past. Reed et al. [15] categorized texture feature extraction methods as feature-based, model-based and structural. In particular, stochastic image models based on the Gibbs distribution have received a lot of attention and have been applied in ecology, sociology, statistical mechanics and statistical image modeling and analysis [16,17].

GMRF theory provides a foundation for the characterization of contextual constraints and the derivation of the probability distribution of interacting features [18]. A comprehensive discussion of the use of MRFs in computer vision and the statistical aspects of images is given in Refs. [18,19]. GRFs were applied to textured images for modeling and segmentation by Derin and Elliott [20]. Cross and Jain [21] explored the use of GMRFs as texture models. They used the binomial model, where each point in the texture has a binomial distribution with a parameter controlled by its neighbors and a "number of trials" equal to the number of gray levels. Schroder et al. [17] recently used the autobinomial GMRF model as a powerful, robust descriptor of spatial information in typical remote-sensing image data.

Panjwani et al. presented a model that extracted texture information from the interaction within and between color bands [22]. The disadvantage of this method is its computational complexity. Dubuisson et al. combined texture and color separately by using maximum likelihood estimation [23]. Because of this separation they decreased the interaction of the cues.

2.3. Tracking with multiple cue fusion

Recent developments in computing devices, video processing hardware and sensors enable the construction of more reliable and faster visual tracking systems. Despite these advances, most visual tracking systems are brittle. In particular, systems that rely on a single cue or methodology for locating their targets are easily confused in commonly occurring visual situations. Some color-based tracking techniques try to increase reliability by updating the color cues over time [1–3]. Other approaches employ multiple cues to obtain more reliable information [14,24–26].

Yang et al. introduced a tracking algorithm which constructs a self-adapting model from the detected moving object, using features such as shape, texture, color and edgedness [25]. In work by Murrieta et al., color and texture information was used to characterize and track specific landmarks [24]. Paschos et al. described a visual monitoring system that performs scene segmentation based on color and texture information [14]. Color information was combined with texture to detect and measure changes in a given space or environment over a period of time. Deng et al. used a combination of color, texture and motion to analyze and retrieve video objects [26]. However, none of these reported works utilizes a combination of texture and color for tracking moving objects.

3. Formulation of texture segmentation

In this work, Gibbs random fields (GRF) are used to model the texture information. This section first briefly reviews GRF theory, then discusses the autobinomial GRF, followed by a description of the linear parameter estimation process.

Fig. 1. Definition of the local neighborhood around the pixel site $x_s$.

The image is assumed to be the realization of a random field $x_s$ of pixel sites. This random field is called Markovian if the probability density function of the pixel $x_s$ is completely determined by the values of the pixels in the local neighborhood $N(x_s)$. Fig. 1 shows a particular neighborhood $N(x_s) = \{x_i, \hat{x}_i\}$, where the index $i$ labels the two-pixel cliques consisting of two facing neighbors $x_i$ and $\hat{x}_i$.

Such a Markov field can be described as a Gibbs field with a local energy function $H(x_s, N(x_s), \theta)$, with $\theta$ being a vector of scalar parameters reflecting the influence of the different cliques $\{x_i, \hat{x}_i\}$. For a single pixel site $x_s$, this approach results in the conditional probability distribution

$$p(x_s \mid N(x_s)) = \frac{1}{Z}\, e^{-H(x_s, N(x_s), \theta)}. \tag{1}$$

The variable $Z$, the partition sum, normalizes the distribution. The parameter vector $\theta$ weights the contributions of the different neighborhood pixels and is a parameterization of the image content.

The form of the energy function should be chosen depending on the image model. In this work, the autobinomial model, recently used by Schroder et al. [17] with a different scaling of parameters, is used for modeling the image. With this model the parameter estimation can be done efficiently while still characterizing real images adequately. In the following, the notation and formalism are adapted from the work presented in Ref. [17].

In the autobinomial model, the energy function is defined as

$$H(x_s, N(x_s), \theta) = -\ln \binom{G}{x_s} - x_s\, \eta(\theta, x_s), \tag{2}$$


where $G$ denotes the maximum gray value and $\binom{n}{m}$ denotes the binomial coefficient. The influence of the neighboring pixels is represented by the linear expression

$$\eta(\theta, x_s) = \theta_0 + \sum_i \theta_i\, \frac{x_{s_i} + \hat{x}_{s_i}}{G}. \tag{3}$$

The coefficients

$$\theta = [\theta_0, \theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \ldots]^T \tag{4}$$

determine the interaction between the pixels and hence the statistical properties of the random field $x_s$. A typical realization of the random field corresponds to a typical appearance of the image or, in our context, to the spatial properties of the texture.

For the autobinomial model, the partition sum $Z$ is calculated as

$$Z = \sum_{x=0}^{G} e^{-H(x, \theta)} = (1 + e^{\eta})^G. \tag{5}$$

Therefore, the PDF of the autobinomial model can be written as

$$p(x_s \mid N(x_s), \theta) = \binom{G}{x_s}\, \pi^{x_s} (1 - \pi)^{G - x_s} \tag{6}$$

with $\pi = 1/(1 + e^{-\eta})$.
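For concreteness, Eq. (6) is simply a binomial PMF whose success probability $\pi$ is driven by the neighborhood through $\eta$. The following minimal sketch (not from the paper; the function name, clique ordering and default $G$ are illustrative assumptions) evaluates the conditional PDF for one pixel site:

```python
import numpy as np
from scipy.stats import binom

def autobinomial_pmf(x_s, neighbor_pairs, theta, G=255):
    """Evaluate the autobinomial conditional PDF p(x_s | N(x_s), theta), Eq. (6).

    neighbor_pairs: (x_i, x_i_hat) gray values of each two-pixel clique of
    facing neighbors (ordering assumed to follow Fig. 1).
    theta: parameter vector [theta_0, theta_1, ...] as in Eq. (4).
    """
    # Linear neighbor influence eta(theta, x_s), Eq. (3)
    eta = theta[0] + sum(
        t * (xi + xi_hat) / G for t, (xi, xi_hat) in zip(theta[1:], neighbor_pairs)
    )
    pi = 1.0 / (1.0 + np.exp(-eta))   # success probability of Eq. (6)
    return binom.pmf(x_s, G, pi)      # binomial with G "trials"
```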

The mean and variance are found to be

$$E[x] = \frac{G}{1 + e^{-\eta}} \tag{7}$$

and

$$\mathrm{Var}[x] = E[x^2] - E[x]^2 = \frac{G\, e^{-\eta}}{(1 + e^{-\eta})^2}. \tag{8}$$

For $\eta = 0$, the mean and variance are obtained as $E[x] = G/2$ and $\mathrm{Var}[x] = G/4$. Before the parameter estimation, the data is normalized to these values. Therefore, the estimated parameters depend on spatial information more than on radiometric intensity properties.

The parameter vector $\theta$ of the model parameterizes the spatial information of the image. Reliable and robust MRF parameter estimation can in general be quite complex [18], but for the autobinomial model an efficient closed form conditional least squares (CLS) solution exists and was recently applied to remote sensing data retrieval [17]. The CLS estimator is defined as

$$\hat{\theta} = \arg\min_{\theta} \sum_s (x_s - E[x_s])^2. \tag{9}$$

Using the expectation (7), Eq. (9) can be written as

$$\hat{\theta} = \arg\min_{\theta} \sum_s \left(x_s - \frac{G}{1 + e^{-\eta}}\right)^2. \tag{10}$$

In Ref. [17] it was shown how, using a mean field approximation, $\eta$ can be approximated as

$$\eta \approx -\ln\left(\frac{G}{x_s} - 1\right) + N(0, \sigma), \tag{11}$$

where $N(\cdot\,, \cdot)$ is a Gaussian noise term of zero mean and some small variance $\sigma$. With the definition of $\eta$, Eq. (3), the parameter estimation process is reduced to the overdetermined set of linear equations

$$\mathbf{G}\theta = d + n, \tag{12}$$

where $\theta$ is the unknown parameter vector,

$$d_s = -\ln\left(\frac{G}{x_s} - 1\right) \tag{13}$$

and $n$ is a Gaussian noise vector. $\mathbf{G}$ is a matrix containing the neighboring pixel values

$$\mathbf{G}_{s,t} = \begin{cases} 1 & \text{if } t = 0, \\[1mm] \dfrac{x_{s_t} + \hat{x}_{s_t}}{G} & \text{otherwise.} \end{cases} \tag{14}$$

The values $x_{s_t}$ and $\hat{x}_{s_t}$ denote the two $t$th neighbors of the pixel $x_s$ according to the definition in Fig. 1. With the assumption that the noise term $n$ is Gaussian, the maximum likelihood estimate of $\theta$ is given by

$$\hat{\theta} = (\mathbf{G}^T \mathbf{G})^{-1} \mathbf{G}^T d. \tag{15}$$

Thus, this provides a closed form solution for the tex-ture parameters allowing to e?ciently map texture im-age information to a low dimensional parameter space,making it appropriate for target tracking.
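As an illustration of how compact this estimator is, the sketch below assembles the linear system (12)–(14) from a gray-value patch and solves Eq. (15) by least squares. It is a minimal example under stated assumptions (four facing-neighbor cliques; values clipped away from 0 and $G$ to keep the logarithm in Eq. (13) finite), not the authors' exact implementation:

```python
import numpy as np

def estimate_texture_params(patch, G=255):
    """Closed-form CLS estimate of the autobinomial parameters, Eqs. (12)-(15).

    patch: 2D array of gray values in [0, G]. Returns theta_hat.
    Assumes four two-pixel cliques: horizontal, vertical and the two diagonals.
    """
    patch = patch.astype(float)
    H, W = patch.shape
    # Offsets (dy, dx) of one neighbor of each facing pair; the partner
    # neighbor sits at the mirrored offset.
    offsets = [(0, 1), (1, 0), (1, 1), (1, -1)]

    x = patch[1:-1, 1:-1]
    x = np.clip(x, 0.5, G - 0.5)              # avoid log(0) cases in Eq. (13)
    d = -np.log(G / x - 1.0).ravel()          # right-hand side, Eq. (13)

    cols = [np.ones_like(d)]                  # column multiplying theta_0
    for dy, dx in offsets:
        nb = patch[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        nb_hat = patch[1 - dy:H - 1 - dy, 1 - dx:W - 1 - dx]
        cols.append(((nb + nb_hat) / G).ravel())   # matrix entries, Eq. (14)
    Gmat = np.stack(cols, axis=1)

    # Least squares solution of G theta = d, equivalent to Eq. (15)
    theta_hat, *_ = np.linalg.lstsq(Gmat, d, rcond=None)
    return theta_hat
```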

4. Formulation of a color space and model

The selection of the color space is one of the key factors for efficient color information extraction. A number of color space comparisons are presented in the literature, as noted in Section 2.1. After experimentally observing the effect of different color spaces on the segmentation results, the YES space was selected as the most appropriate color space. For details, the reader can consult Ref. [27].

YES is a luminance–chrominance space, where "Y" represents the luminance channel and "E" and "S" represent the chrominance components. The YES space is defined by a linear transformation of the Society of Motion Picture and Television Engineers (SMPTE)

RGB coordinates, given by

$$\begin{bmatrix} Y \\ E \\ S \end{bmatrix} = \begin{bmatrix} 0.253 & 0.684 & 0.063 \\ 0.500 & -0.500 & 0.000 \\ 0.250 & 0.250 & -0.500 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}. \tag{16}$$
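Applied to an image, the transform (16) is a single matrix product per pixel. A minimal sketch (assuming an (H, W, 3) RGB array; only the E and S channels feed the color model below):

```python
import numpy as np

# SMPTE RGB -> YES transform matrix of Eq. (16)
RGB_TO_YES = np.array([
    [0.253,  0.684,  0.063],
    [0.500, -0.500,  0.000],
    [0.250,  0.250, -0.500],
])

def rgb_to_yes(rgb):
    """Convert an (H, W, 3) RGB image to the YES space."""
    return rgb.astype(float) @ RGB_TO_YES.T
```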

The second important element of color segmentation is the choice of the model for the object color distribution. Color histograms and Gaussian models have both been used successfully in real-time color segmentation systems. Color histograms [28] are a simple and non-parametric method for modeling color, but they need sufficiently large datasets in order to work reliably. A second drawback is the difficulty of adapting the model over time. A more effective approach is to model the color distribution with a 2D Gaussian. With the Gaussian model, the color distribution can be represented using relatively little data, and the model is also suitable for adaptive color segmentation.

Based on the chrominance components $E$ and $S$, a bivariate (2D) Gaussian distribution $N(\mu_c, \Sigma_c)$ with mean $\mu_c$ and covariance $\Sigma_c$ is used to represent the distribution of the object color. The 2D Gaussian probability density function is

$$p(z_c \mid \text{object color}) = \frac{1}{2\pi\, |\Sigma_c|^{1/2}} \exp\!\left(-\tfrac{1}{2}\, [z_c - \mu_c]^T \Sigma_c^{-1} [z_c - \mu_c]\right), \tag{17}$$

where

$$z_c = \begin{bmatrix} E \\ S \end{bmatrix}, \qquad \mu_c = \begin{bmatrix} \mu_E \\ \mu_S \end{bmatrix}, \qquad \Sigma_c = \begin{bmatrix} \sigma_E^2 & \sigma_{ES} \\ \sigma_{SE} & \sigma_S^2 \end{bmatrix}.$$

Taking the natural logarithm reduces $p(z_c \mid \text{object color})$, up to constants, to the Mahalanobis distance $[z_c - \mu_c]^T \Sigma_c^{-1} [z_c - \mu_c]$. Color segmentation is performed by calculating the Mahalanobis distance for the pixels in a small region and comparing its value to a threshold $\tau_c$.
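A minimal sketch of this test (assuming the YES image from the transform above; the threshold is an input, obtained in the paper from the user or from ROC analysis):

```python
import numpy as np

def color_segment(yes_img, mu_c, sigma_c, tau_c):
    """Per-pixel Mahalanobis test under the 2D Gaussian color model, Eq. (17).

    yes_img: (H, W, 3) image in YES space; mu_c: mean [mu_E, mu_S];
    sigma_c: 2x2 chrominance covariance; tau_c: segmentation threshold.
    """
    zc = yes_img[..., 1:3] - mu_c                     # chrominance residuals
    inv = np.linalg.inv(sigma_c)
    # Mahalanobis distance [z - mu]^T Sigma^{-1} [z - mu] for every pixel
    dist = np.einsum('...i,ij,...j->...', zc, inv, zc)
    return dist < tau_c                               # boolean object mask
```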

5. Fusion of texture and color

In this work, color and texture segmentation are integrated by estimating their joint probability distribution function (PDF). Using a joint probability function of the cues enables one cue to support the other if it becomes unreliable due to environmental changes. The texture and color PDFs are combined at the region level because reliable texture information extraction is only possible with sufficient sample sizes. The region $R(x_s)$ is defined as the 3 × 3 sub-window within the 5 × 5 texture region window centered around the pixel site $x_s$, as shown in Fig. 2. Each window element itself contains $M \times M$ pixels, for typical values $M = 5 \ldots 15$ depending on the texture scale.

Fig. 2. The sub-center window $N(x_s)$ with $N = 9$ neighbor windows. Each window element has size $M \times M$ pixels. The color parameter vector is estimated based on the region $N(x_s)$. For the texture analysis, a parameter vector is estimated for each of the neighbor windows and subsequently averaged.

Under the assumption that the color and texture features are conditionally independent, the probability of the region belonging to class $\omega$ can be expressed as

$$p(z_{t,s}, z_{c,s} \mid \omega) = p(z_{c,s} \mid \omega)\, p(z_{t,s} \mid \omega), \tag{18}$$

where $z_{c,s} = [E\ S]^T$ denotes the average chrominance vector and $z_{t,s}$ denotes the average texture feature vector in the neighborhood $R(x_s)$ around the pixel site $x_s$. With the above definition of $R(x_s)$, this means the average over $9M^2$ pixel samples. To keep the notation in the remainder of this section simple, the pixel site index $s$ is dropped.

As described in Section 4, the color distribution is modeled by a Gaussian:

$$p(z_c \mid \omega_i) = \frac{1}{2\pi\, |\Sigma_{c,i}|^{1/2}} \exp\!\left(-\tfrac{1}{2}\, \psi_{c,i}\right), \tag{19}$$

where

$$\psi_{c,i} = [z_c - \mu_{c,i}]^T \Sigma_{c,i}^{-1} [z_c - \mu_{c,i}]. \tag{20}$$

$\mu_{c,i}$ and $\Sigma_{c,i}$ denote the mean vector and covariance matrix of $z_c$, respectively, for class $\omega_i$.

Let $\theta^n$ denote the texture parameter vector estimated from the $n$th neighbor window of the region centered around $x_s$. As for the color distribution, this estimate is based on $9M^2$ pixel samples. The texture feature distribution is modeled by a Gaussian probability distribution.


The probability of the $n$th neighbor window region belonging to class $\omega_i$ can be written as

$$P_i^n = p(\theta^n \mid \omega_i) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_{t,i}|^{1/2}} \exp\!\left(-\tfrac{1}{2}\, \psi_{t,i}^n\right), \tag{21}$$

where

$$\psi_{t,i}^n = [\theta^n - \mu_{t,i}]^T \Sigma_{t,i}^{-1} [\theta^n - \mu_{t,i}] \tag{22}$$

and $\mu_{t,i}$ and $\Sigma_{t,i}$ denote the mean vector and covariance matrix of $\theta$, respectively, for class $\omega_i$.

The texture parameters are estimated for each of the $N = 9$ neighbor windows, as shown in Fig. 2. Therefore, $p(z_t \mid \omega_i)$ can be expressed as a function of the $N$ neighbor probabilities:

$$p(z_t \mid \omega_i) = F(P_i^1, P_i^2, P_i^3, \ldots, P_i^N). \tag{23}$$

The experimental results show that the distance of the region texture vector $z_t$ to the class $\omega_i$, denoted $\psi_{t,i}$, can be expressed as the average of the $N$ neighbor distances $\psi_{t,i}^n$:

$$p(z_t \mid \omega_i) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_{t,i}|^{1/2}} \exp\!\left(-\frac{\sum_{n=1}^{N} \psi_{t,i}^n}{2N}\right) \tag{24}$$

$$= \frac{1}{(2\pi)^{d/2}\, |\Sigma_{t,i}|^{1/2}} \exp\!\left(-\tfrac{1}{2}\, \psi_{t,i}\right). \tag{25}$$

With the assumption of conditional independence of the color and texture cues, the joint PDF $p(z_c, z_t \mid \omega_i)$ can be approximated as

$$p(z_c, z_t \mid \omega_i) \propto \exp\!\left(-\tfrac{1}{2}\, (\psi_{c,i} + \psi_{t,i})\right). \tag{26}$$

Because a dynamic and complex background is assumed, an explicit representation of the background is very difficult. Therefore, the region classification proceeds with the following hypothesis test:

$$x \in \begin{cases} \omega_{\text{target}} & \text{if}\ \left(\dfrac{\psi_{c,\text{target}}}{\tau_c} + \dfrac{\psi_{t,\text{target}}}{\tau_t}\right) < \tau_{\text{fusion}}, \\[2mm] \omega_{\text{background}} & \text{otherwise}, \end{cases} \tag{27}$$

where $\tau_c$ and $\tau_t$ are, respectively, the color and texture segmentation thresholds, which are specified by the user or computed by the system using ROC curve analysis at the initialization stage, as described in Refs. [29,30]. When the desired segmentation is accomplished for both cues, the value of the combined distances becomes smaller than 2. The segmentation is obtained for $\tau_{\text{fusion}}$ values in the range $[1.5, 2.5]$.
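The decision rule itself is a one-liner once the two region-level Mahalanobis distances are available; a sketch (argument names are illustrative):

```python
def classify_region(psi_c, psi_t, tau_c, tau_t, tau_fusion=2.0):
    """Hypothesis test of Eq. (27): label a region as target when the
    threshold-normalized sum of color and texture distances is small.
    tau_fusion is typically chosen in [1.5, 2.5]."""
    return (psi_c / tau_c + psi_t / tau_t) < tau_fusion   # True -> target
```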

6. Adaptation of texture and color for target tracking

The appearance of the target object varies over time. If the tracking system does not compensate for these changes, it may lose the object. Under the assumption that the appearance changes gradually, a statistical adaptation scheme is proposed. Basically, the adaptation scheme updates the mean and the covariance of the texture and color feature vectors over time. The process keeps the $L$ previous mean and covariance estimates together with the number of sample points for each mean and covariance. The new estimates at time $l + 1$ are calculated as follows:

$$\hat{\mu}_c^{l+1} = \frac{\mu_c^0 N_c^0 + \sum_{s=l-L+1}^{l-1} \hat{\mu}_c^s N_c^s + \mu_c^l N_c^l}{N_c^0 + \sum_{s=l-L+1}^{l} N_c^s}, \tag{28}$$

$$\hat{\Sigma}_c^{l+1} = \frac{\Sigma_c^0 M_c^0 + \sum_{s=l-L+1}^{l-1} \hat{\Sigma}_c^s M_c^s + \Sigma_c^l M_c^l}{M_c^0 + \sum_{s=l-L+1}^{l} M_c^s}, \tag{29}$$

where $N_c^l$ and $M_c^l$ denote the number of sample points for calculating the color feature vector mean and covariance at time $l$, respectively. The texture mean and covariance are calculated in the same manner:

$$\hat{\mu}_t^{l+1} = \frac{\mu_t^0 N_t^0 + \sum_{s=l-L+1}^{l-1} \hat{\mu}_t^s N_t^s + \mu_t^l N_t^l}{N_t^0 + \sum_{s=l-L+1}^{l} N_t^s}, \tag{30}$$

$$\hat{\Sigma}_t^{l+1} = \frac{\Sigma_t^0 M_t^0 + \sum_{s=l-L+1}^{l-1} \hat{\Sigma}_t^s M_t^s + \Sigma_t^l M_t^l}{M_t^0 + \sum_{s=l-L+1}^{l} M_t^s}, \tag{31}$$

with $N_t^l$ and $M_t^l$ denoting the number of sample points for calculating the texture parameter vector mean and covariance at time $l$, respectively.

Protecting the system from adapting to a wrong target is the most important problem for a self-adapting system. The tracking system presented here has several levels of protection. Color mean samples are collected at the pixel level and texture mean samples at the region level only after passing an additional test with threshold $\tau_{c/t} = 2$. Samples for the covariances are collected only after passing the fusion threshold. The means at time $l = 0$ are included in the above estimates to make the system remember the initial values. This way, the resulting estimates always remain in close proximity to the initial values.
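The mean updates (28)–(31) are sample-count-weighted averages over a sliding window of the last $L$ estimates, anchored by the initial estimate. A sketch for a generic feature mean (the same scheme serves color and texture; the class and its names are illustrative):

```python
from collections import deque

import numpy as np

class AdaptiveMean:
    """Sliding-window, sample-weighted mean update in the style of Eq. (28).

    The initial estimate (mu0 with n0 samples) enters every update, so the
    adapted mean stays anchored near its initial value. Covariances can be
    updated analogously (Eqs. (29) and (31)) or held constant.
    """
    def __init__(self, mu0, n0, L=10):
        self.mu0 = np.asarray(mu0, dtype=float)
        self.n0 = n0
        self.history = deque(maxlen=L)        # recent (mean, n_samples) pairs

    def update(self, mu_l, n_l):
        mu_l = np.asarray(mu_l, dtype=float)
        num = self.mu0 * self.n0 + sum(m * n for m, n in self.history) + mu_l * n_l
        den = self.n0 + sum(n for _, n in self.history) + n_l
        self.history.append((mu_l, n_l))
        return num / den                      # new estimate, cf. Eq. (28)
```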

6.1. Algorithm

This section describes the overall system (Fig. 3). First, a region of the color image, bounded by a tracking window, is sub-sampled and color segmented with the method described in Section 4. The biggest connected component is selected as the region of interest. Within this region, the color image is segmented by fusing texture and color as described in Section 5, and the biggest connected component is again taken as the final target ROI. This two-step process effectively reduces the size of the region on which the texture analysis has to be performed.


Fig. 3. Schematic overview of the system for tracking moving targets with adaptive color and texture segmentation.

Suitable samples for the color and texture adaptation process are obtained from within the target ROI. The adaptation process updates the color and the texture model before the next frame. Using predicted and measured locations, a Kalman filter estimates the next location of the target. A window centered at the estimated location is used as the initial region for the next frame. The window size is allowed to change with the size of the target.
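The paper does not spell out the Kalman filter design; a common choice consistent with predicting the next window location is a constant-velocity model on the window center. The following self-contained sketch (state [x, y, vx, vy]; the noise magnitudes are illustrative assumptions) shows the predict/update cycle:

```python
import numpy as np

class CentroidKalman:
    """Constant-velocity Kalman filter for the tracking window center.

    State is [x, y, vx, vy]; only the position (the centroid of the
    segmented target ROI) is measured. Noise levels are illustrative.
    """
    def __init__(self, xy0, q=1.0, r=4.0):
        self.x = np.array([xy0[0], xy0[1], 0.0, 0.0])
        self.P = np.eye(4) * 10.0
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0     # dt = 1 frame
        self.H = np.zeros((2, 4))
        self.H[0, 0] = self.H[1, 1] = 1.0     # measure position only
        self.Q = np.eye(4) * q                # process noise
        self.R = np.eye(2) * r                # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                     # predicted window center

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x     # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```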

7. Experimental evaluation

This section describes the experiments for evaluating the performance of the proposed adaptive texture and color segmentation method. Two test databases were created for this purpose. The first database contains static images with various objects, each having different textures and colors, and was used to fine-tune and develop the basic texture and color segmentation algorithm. The experimental results and observations are described in Section 7.1. The second database consists of a set of dynamic image sequences and was used for testing the adaptation of color and texture for moving target tracking. The results of these experiments are shown in Section 7.2.

7.1. Experiments with static images

This section presents experimental results and observations for the segmentation of static images using color and texture fusion. The main purpose of these experiments is to show how the fusion of cues enhances the reliability of the segmentation. The proposed algorithm is tested on real color texture images. The images were chosen to have objects with colors and textures similar to the background.

Three real image examples were selected to show the importance and the effectiveness of color and texture fusion. The segmentation result for an image containing three different wooden textures is shown in Fig. 4. The system was initialized by taking a sample from the upper left texture in Fig. 4(a). Fig. 4(b) shows that color segmentation by itself cannot separate the left textured wooden surface from the right one. The segmentation that is achieved by combining texture and color is shown in Fig. 4(c). The result shows that the texture cue clearly helps to discriminate the object from the background.

Fig. 4. Experiment performed on a natural image containing three wooden textures (a). The result of performing a color segmentation can be seen in image (b). (c) shows the segmentation that is obtained by fusing texture and color.

Fig. 5 shows a scene containing a tree and a patch of lawn. The goal of this experiment is to separate the tree from the background of the image. An appropriate sample from the tree is taken to initialize the procedure. Fig. 5(b) shows that the color segmentation of the tree object is very noisy. No clear discrimination between the tree and the background can be seen because both have very similar color distributions. In Fig. 5(c) the image is thresholded using texture segmentation alone. This gives a better result compared to the color segmentation because the tree and the lawn differ more in texture than in color. However, small false positive components can still be seen. The image in Fig. 5(d) shows the result of fusing the texture and color cues. The tree is now clearly separated from the background. The frame was taken from a compressed movie file; the compression algorithm therefore produces a noise pattern on the image, which can be seen in Fig. 8(b).

Fig. 5. Experiment performed on a natural image containing natural textures (a). The result of performing a color segmentation can be seen in image (b). (c) shows the segmentation that is obtained by using texture alone. The fusion of texture and color results in image (d).

The image in Fig. 6 serves to demonstrate how the proposed algorithm increases the separation of a target object from the background. An image sample was taken from the center of the ball. The goal of this experiment is to investigate, for each location $(x, y)$ in the image, the similarity $z = 1/d(x, y)$ between that location and the ball (i.e., the image sample that was taken from the ball). In particular, this experiment shows the advantage of averaging the texture parameters over the nine neighbor windows for each image location. As the similarity measure, the inverse Mahalanobis distance $1/\psi_{t,\text{ball}}$ according to Eq. (25) was used. Fig. 7 shows the similarity measurement viewed along the $y$-axis. Fig. 7(a) shows the similarity map obtained by using texture segmentation without parameter averaging (i.e., the texture parameters are estimated only from the sub-centered region $N(x_s)$). Even though a separation between the objects can be seen in the graph, it is not enough to discriminate the ball from the background. Fig. 7(b) shows the result of averaging the nine texture distances obtained from the neighboring region windows. It can be seen that the separation is increased and the block variation is decreased. In Fig. 7(c), the similarity map is obtained by using the proposed model for the integration of color and texture. As seen in Fig. 7(c), the color integration enhances the separation of the ball quite well.

Fig. 6. Image of a football that is used for examining the discrimination power of the texture and color fusion approach. The football has a spatially varying texture and color due to its shape and the lighting conditions.

Fig. 7. Similarity measures after segmentation based on (a) one texture sample region obtained from the sub-centered window, (b) the average of the texture estimates based on the nine neighbor windows, (c) the combination of color and texture.

7.2. Experiments with dynamic image sequences

This section presents experimental results and observations for the proposed tracking system applied to dynamic image sequences. The main purpose of these experiments is to show how the tracking system performs in dynamic environments. The tracking system is tested on various synthetic and natural image sequences. Each experiment starts with an initialization process in which a small area is selected for estimating the initial parameters of the color and texture models. After the parameter estimation, the tracking process begins with the segmentation step inside the selected area.

The first type of image sequence is captured from natural outdoor scenes. Several scenes that included trees and grass were selected for two reasons. Firstly, trees and grass have distinct textures while often being very similar in color. Secondly, due to their self-similarity at different scales, trees provide good textures at any viewing distance. Motion is introduced by changing the camera location in the scene. Variations in color and texture are obtained from the change in viewing direction and from the digitization process. The algorithm operated on decompressed image data, which exhibits a noise pattern that changes with the motion of the camera. As an example, the difference between sample images taken from the same location of the tree object can be seen in Fig. 8(a) for the stationary camera and in Fig. 8(b) for the moving camera.

Fig. 8. Image showing a section of the tree for a stationary camera (a) and a moving camera (b). The full tree tracking sequence can be seen in (c).

Among the natural sequences, one image sequence is selected to demonstrate the result of the cue integration, the adaptation of the model parameters and the tracking performance. Fig. 8(c) shows the result of the tracking algorithm with the tree as the target. Fig. 9(a) and (b) show the adaptation of the first component of the mean color vector and the first parameter of the texture model vector, respectively.

Fig. 9. The adaptation of the E component of the color parameter vector (a) and the first component of the texture parameter vector (b) during the tracking of the tree in Fig. 8.

Fig. 10 shows the result of tracking the tree with the tracking window (incorrectly) centered on the grass. Fig. 11(a) and (b) again show the adaptation of the first components of the color and texture vectors. It can be seen how the parameter vectors are adapted continuously over time during the sequence. In Fig. 10 the tracker initially cannot find the target object and does not perform any adaptation until frame 75, when it finds the tree object and begins the adaptation. The adaptation is made especially challenging by vertical camera motion between frames 85 and 180, which leads to fluctuations in the color model. The tracker needs some time to zero in on the color model of the tree until it finally stabilizes after frame 140. This experiment also shows the independence of the texture and color cues: while at certain times the color parameters have to be adapted to account for changes in the color properties of the target object, the texture parameters remain stable, and vice versa.

Fig. 10. Sequence showing the capture and subsequent tracking of the tree. The sequence is the same as in Fig. 8.

Fig. 11. The adaptation of the E component of the color parameter vector (a) and the first component of the texture parameter vector (b) during the tracking of the tree in Fig. 10. It can clearly be seen how the parameters are not adapted while the tree is not captured by the tracker.

To obtain a more controlled situation, the second type of sequence consists of synthetically combined images of natural textures. Figs. 12 and 14 demonstrate how the integration makes the segmentation more reliable. The images consist of background and foreground textures, which are real texture images obtained from the Columbia University and Utrecht University reflectance and texture database [31]. A background texture moves along two paths from the upper side of the image to the bottom left of the image. On its path, it is occluded by several foreground textures. The foreground textures are selected according to their color and texture similarity. The parameter vectors are initialized with samples from the upper right corner of the background texture. The velocity vector is initially set to a non-zero value towards the bottom left corner of the image. Figs. 12 and 14 show the same experiment with different paths and tracking window sizes. Fig. 13(a) and (b) show the adaptation of the first parameter of the mean color vector and the first parameter of the texture model vector, respectively. Figs. 12 and 14 show that the target texture gets partially occluded by the foreground textures. However, the algorithm manages to continue tracking the target without adapting to the color and texture of the foreground. This shows how the combination of cue fusion, adaptation and prediction makes the tracking system reliable, even in the presence of occlusion.

Fig. 12. Synthetic dynamic tracking sequence of a large textured object partially occluded by foreground textures.

Fig. 13. The adaptation of the E component of the color parameter vector (a) and the first component of the texture parameter vector (b) during the tracking of the texture in Fig. 12.

Fig. 14. Synthetic dynamic tracking sequence of a small textured object partially occluded by foreground textures.

8. Discussion

The experiments show that the combination of color and texture cues can provide more information than either cue alone. The main reason is their independence from each other. While color is processed based on the chromaticity information, the texture parameters are calculated from the luminance. In addition, the cues are extracted at different scales. Color is defined on a per-pixel basis and is processed at that level. Texture, however, is extracted at the region level, which allows more information to be obtained about the spatial relationships of the underlying structure.

The local information, supported by the neighborhood information, results in a high entropy extraction per pixel. However, the different scales of the information make the probabilistic combination difficult. In this work, color and texture are therefore both combined at the region level.



Because the texture information is calculated over a set of regions, it gives more reliable results than when calculated from a single region.

Choosing the right window element size $M$ for the texture parameter calculation is very important, especially for obtaining a good representation of the parameter covariance matrix. Larger values yield better texture parameter estimates but increase the computational cost. In general, the window element size depends on the texture scale of the target object. After the experiments and cost calculations, $M = 5$ was selected as the most suitable window element size for the system.

In this work, the average of the Mahalanobis distances calculated from the parameter vectors is used. Other forms of the function (23) are possible and were investigated, but the average of distances gives the best result both in terms of performance and cost efficiency.

Although the adaptation to environmental changes makes the tracking system more flexible, it has important drawbacks. There is a risk of adapting the model parameters to wrong targets or to the background, which can result in a complete loss of the target. However, the independence of the cues prevents, to some extent, the adaptation to a wrong target, because changes in the environment are less likely to cause changes in both cues at the same time (Figs. 9 and 11). For example, scaling in general changes the texture of an object but leaves the object color unchanged. Furthermore, the use of texture and color cues can also aid when tracking multiple targets, because the chance of targets being similar in both color and texture is smaller than that of similarity in a single cue alone. In this work, additional countermeasures were proposed to prevent adaptation to the wrong target after the color and texture segmentation, as explained in Section 6. When the tracking target and the background both show very little actual texture, the system reduces to an adaptive color tracker. Nevertheless, even the absence of texture is modeled by the system, and the two-cue approach aids in situations where non-textured targets are moving in front of a textured background.

Further experiments will have to show to what extent the adaptation of the covariance matrices is necessary. First tests indicate that the covariance matrices change little over time and that the tracking algorithm performs well even if the covariances are not adapted. Assuming constant covariances would be a great advantage because the recalculation of the covariance matrices is computationally very expensive.



One of the difficulties faced in this work is the quantitative performance evaluation of the results, because establishing ground truth for this task is very difficult. For the segmentation results, error and accuracy percentages were not computed because the main purpose of the system is to detect the presence of the target object. A quantitative measure of the segmentation does not express the performance of the proposed tracking scheme.

It is important to note that our work so far has only been aimed at investigating the feasibility of fusing texture and color for target tracking. Depending on the parameters, especially the window element size $M$, the algorithm runs at five frames per second on a 200 MHz SGI O2. A careful implementation of the algorithm should be able to run at near real-time speed on the hardware described above. To achieve real-time performance, additional strategies can be implemented. For example, a compromise between robustness and performance could be achieved by performing the texture analysis on every second frame while doing the color segmentation for every frame.

The experiments have shown that the texture segmentation gives very high error distances near object boundaries. The cause of this might be unbalanced textures at these points. This observation could in the future be used for controlling the boundary location of the target object. The integration method can be improved by changing the thresholds $\tau_c$ and $\tau_t$ adaptively over time, for example by using threshold histograms. The Kalman filter could also be used for updating the parameter vectors, but this would require modeling the color and texture vector changes parametrically.

9. Conclusion

This paper presents a novel technique for combining texture and color segmentation in a manner that makes the segmentation robust under varying environmental conditions while keeping the computation efficient for real-time target tracking. A probabilistic basis is used for combining the texture and color cues, using a Gibbs Markov random field for the texture and a 2D Gaussian distribution for the color segmentation. Both the color and the texture segmentation are adapted over time in a manner that increases the probability of correct segmentation while not "drifting" with the changing background. Extensive experiments with both static and dynamic image sequences were used to establish the feasibility of the proposed approach. The work demonstrates the advantages of fusing multiple cues within a stochastic formulation while providing a scheme for the practical implementation of target tracking applications.

10. Summary

The capability to visually track moving objects in real time is important and fundamental for many computer vision systems. For example, in gesture-based human–computer interaction systems, it is necessary to track fast moving body parts in real time in highly ambiguous and cluttered environments. This work presents a new technique to make real-time visual tracking more reliable.

Many methods for real-time visual tracking of moving objects are based on color segmentation, because color is to some degree invariant to scale, illumination and viewing direction while being computationally efficient. To account for object color changes due to complex backgrounds and changing environments, different color-based tracking approaches reported in the literature have demonstrated a certain degree of adaptation to object color changes. However, to achieve robust tracking over long periods of time in changing environments, it is often not sufficient to rely on only a single cue.

To overcome this problem, many multiple cue fusion techniques have been suggested. One cue that complements color nicely is texture. However, texture segmentation has until now not been considered for real-time object tracking. This is mainly due to two reasons. First, texture parameter estimation and segmentation is computationally inefficient, making real-time implementation difficult. Second, unlike color, obtaining scale and rotation invariant texture information in a dynamic environment is very difficult. This work presents a probabilistic formulation for fusing texture and color in a manner that makes the segmentation reliable while keeping the computation efficient, with the goal of real-time target tracking.

The texture information is modeled using Gibbs Markov random field (GMRF) theory. GMRFs are able to capture contextual constraints and yield probability distributions for interacting texture features.



The particular autobinomial GMRF model used in this work has recently been shown to be a powerful, robust descriptor of spatial information in remote-sensing image data [17]. It furthermore has the advantage of providing a closed form solution for the texture parameters, which allows texture information to be exploited in a computationally efficient way. Color information is modeled using a two-dimensional Gaussian distribution. The fusion of the color and texture cues is achieved by estimating the joint probability distribution of the two models.

To account for changes in the appearance of the object under varying environmental conditions, the tracking algorithm adapts the texture and color parameters of the target object in a statistical framework. In addition, the efficiency is improved by using a Kalman filter to predict the dynamics of the target.

Extensive experiments with both static images and dynamic image sequences from synthetic as well as natural scenes demonstrate the feasibility of the proposed approach. The results show that the fusion of texture and color can yield more robust and still computationally efficient tracking.

Acknowledgements

This work was supported in part by the following grants: National Science Foundation CAREER Grant IIS-97-33644, NSF Grant IIS-0081935, and US Army Research Laboratory Cooperative Agreement No. DAAL01-96-2-0003.

References

[1] S.J. McKenna, S. Gong, Y. Raja, Modelling facial colour and identity with Gaussian mixtures, Pattern Recognition 31 (12) (1998) 1883–1892.

[2] R. Schuster, Color object tracking with adaptive modeling, Proceedings of the IEEE Symposium on Visual Languages, St. Louis, Missouri, 1994, pp. 91–96.

[3] J. Yang, W. Lu, A. Waibel, Skin-color modeling and adaptation, Proceedings of the Third Asian Conference on Computer Vision, Hong Kong, China, 1998, pp. 687–694.

[4] J.C. Terrillon, M. David, S. Akamatsu, Automatic detection of human faces in natural scene images by use of a skin color model and invariant moments, Proceedings of the International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 1998, pp. 112–117.

[5] W. Sharbek, A. Koschan, Color segmentation survey, Technical Report, University of Berlin, 1994.

[6] S.E. Umbaugh, R.H. Moss, W.V. Stoecker, A.G. Hence, Automatic color segmentation algorithms, IEEE Eng. Med. Biol. 12 (3) (1993) 75–82.

[7] P.W. Power, R.S. Clist, Comparison of supervised learning techniques applied to color segmentation of fruit images, Proc. SPIE (1996) 370–381.

[8] A.G. Hence, S.E. Umbaugh, R.H. Moss, W.V. Stoecker, Unsupervised color image segmentation, IEEE Eng. Med. Biol. 15 (1) (1996) 104–111.

[9] M.J. Jones, J.M. Rehg, Statistical color models with applications to skin detection, Technical Report, Cambridge Research Laboratory, 1998.

[10] R. Kieldsen, J. Kender, Finding skin in color images, Proceedings of the International Conference on Automatic Face and Gesture Recognition, 1996, pp. 312–317.

[11] D.A. Forsyth, A novel approach to color constancy, Int. J. Comput. Vis. 5 (1) (1990) 5–36.

[12] G. Healey, D. Slater, Global color constancy: recognition of objects by use of illumination-invariant properties of color distributions, J. Opt. Soc. Am. A 11 (11) (1994) 3003–3010.

[13] W.C. Huang, C.H. Wu, Adaptive color image processing and recognition for varying backgrounds and illumination conditions, IEEE Trans. Industrial Electronics 45 (2) (1998) 351–357.

[14] G. Paschos, K.P. Valavanis, A color texture based visual monitoring system for automated surveillance, IEEE Trans. Systems, Man, Cybernetics 29 (1) (1999) 298–307.

[15] T.R. Reed, J.M. Hans Du Buf, A review of recent texture segmentation and feature extraction techniques, CVGIP: Image Understanding 57 (3) (1993) 359–372.

[16] J. Goutsias, Mutually compatible Gibbs images: properties, simulation and identification, IEEE Trans. Inform. Theory 35 (6) (1989) 1233–1249.

[17] M. Schroder, H. Rehrauer, K. Seidel, M. Datcu, Spatial information retrieval from remote-sensing images—part 2: Gibbs–Markov random fields, IEEE Trans. Geosci. Remote Sensing 36 (5) (1998) 1446–1455.

[18] S.Z. Li, Markov Random Field Modeling in Computer Vision, 1st Edition, Springer, Berlin, 1995.

[19] K.V. Mardia, G.K. Kanji, Statistics and Images, 1st Edition, Abingdon, Oxfordshire, UK, 1993.

[20] H. Derin, H. Elliott, Modeling and segmentation of noisy and textured images using Gibbs random fields, IEEE Trans. Pattern Anal. Mach. Intell. 9 (1) (1987) 39–55.

[21] G.R. Cross, A.K. Jain, Markov random field texture models, IEEE Trans. Pattern Anal. Mach. Intell. 5 (1) (1983) 25–39.

[22] D.K. Kumar, G. Healey, Markov random field models for unsupervised segmentation of textured color images, IEEE Trans. Pattern Anal. Mach. Intell. 17 (10) (1995) 939–954.

[23] M.P.D. Jolly, A. Gupta, Color and texture fusion: application to aerial image segmentation and GIS updating, Proceedings of the Third IEEE Workshop on Applications of Computer Vision, Sarasota, Florida, 1996, pp. 2–7.

[24] R. Murrieta-Cid, M. Briot, N. Vandapel, Landmark identification and tracking in natural environment, IEEE/RSJ International Conference on Intelligent Robots and Systems, Victoria, B.C., Canada, 1998, pp. 738–740.

[25] D.S. Jang, H.I. Choi, Moving object tracking by optimizing models, Proceedings of the International Conference on Pattern Recognition, Brisbane, Australia, 1998, pp. 738–740.

[26] Y. Deng, B.S. Manjunath, NeTra-V: toward an object-based video representation, IEEE Trans. Circuits Systems Video Technol. 8 (5) (1998) 616–627.

[27] E. Ozyildiz, Adaptive texture and color segmentation for tracking moving objects, Master's thesis, Pennsylvania State University, 1999.

[28] M.J. Swain, D.H. Ballard, Color indexing, Int. J. Comput. Vis. 7 (1) (1991) 11–32.

[29] E. Saber, A.M. Tekalp, Integration of color, shape, and texture for image annotation and retrieval, Proceedings of the International Conference on Image Processing, Lausanne, Switzerland, 1996, pp. 851–854.

[30] T. Pavlidis, Y.T. Liow, Integrating region growing and edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 12 (3) (1990) 225–233.

[31] S.K. Nayar, K.J. Dana, B.V. Ginneken, J.J. Koenderink, Columbia-Utrecht reflectance and texture database, http://www.cs.columbia.edu/CAVE/curet, 1999.

About the Author—ERCAN OZYILDIZ received the M.S. degree in Electrical Engineering from the Pennsylvania State University in 1999. He is currently working as a system engineer at Telcordia Technologies. His research interests include network performance management, network design, and the real-world applications of computer vision.

About the Author—NILS KRAHNSTÖVER is currently pursuing a Ph.D. degree at the Department of Computer Science and Engineering at the Pennsylvania State University. He obtained his M.S. in Physics at the Christian-Albrechts-University of Kiel, Germany in 1998. He has worked for the Philips Research Laboratories in Hamburg, Germany and for Advanced Interface Technologies, Inc. His research interests include computer vision-based human–computer interfaces, in particular the detection, analysis and classification of articulated motion and visual tracking.


About the Author—RAJEEV SHARMA received the Ph.D. degree in Computer Science from the University of Maryland in 1993. He is currently an Associate Professor in the Department of Computer Science and Engineering at the Pennsylvania State University. Prior to this he spent three years at the University of Illinois at Urbana-Champaign as a Beckman Fellow and Adjunct Assistant Professor in the Department of Electrical and Computer Engineering. His research interests lie in studying the role of computer vision in robotics and advanced human–computer interfaces. Dr. Sharma received the NSF CAREER award in 1998. He was a recipient of the ACM Samuel Alexander Doctoral Dissertation Award and an IBM pre-doctoral fellowship. He has served on the program committees of several conferences, as Program Co-Chair for the 1999 IEEE International Symposium on Assembly and Task Planning, and is an Associate Editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence.