
Visual tracking by separability-maximum boosting



Jie Hou, Yao-bin Mao, Jin-sheng Sun

Downloaded From: http://electronicimaging.spiedigitallibrary.org/ on 08/30/2013 Terms of Use: http://spiedl.org/terms


Nanjing University of Science and Technology, School of Automation

Nanjing, China
E-mail: [email protected]

Abstract. Recently, visual tracking has been formulated as a classification problem whose task is to detect the object from the scene with a binary classifier. Boosting-based online feature selection methods, which adapt the classifier to appearance changes by choosing the most discriminative features, have been demonstrated to be effective for visual tracking. A major problem of such online feature selection methods is that an inaccurate classifier may give imprecise tracking windows. Tracking error accumulates when the tracker trains the classifier with misaligned samples, and finally leads to drifting. Separability-maximum boosting (SMBoost), an alternative form of AdaBoost which characterizes the separability between the object and the scene by their means and covariance matrices, is proposed. SMBoost needs only the means and covariance matrices during training, and can easily be adapted to online learning problems by estimating the statistics incrementally. An experiment on UCI machine learning datasets shows that SMBoost is as accurate as offline AdaBoost, and significantly outperforms Oza's online boosting. The accurate classifier stabilizes the tracker on challenging video sequences. Empirical results also demonstrate improvements in terms of tracking precision and speed, comparing our trackers to state-of-the-art ones. © 2013 SPIE and IS&T [DOI: 10.1117/1.JEI.22.4.041108]

1 Introduction

Visual tracking is a very useful computer vision technology that is widely used in video surveillance, industrial automation, and human-computer interfaces. Tracking algorithms locate the object-of-interest in each incoming video frame, so that high-level vision tasks such as target recognition or behavior analysis can be invoked on the object. In the past decade, computer vision researchers have brought various high-level image and video understanding tasks into reality, which has led to rising demands for real-time robust visual tracking in real-world environments. However, it is still a very challenging task due to object appearance variations (e.g., scale variation, deformation, blur, and rotation) and cluttered scenes (e.g., illumination changes, occlusions, and textures similar to the object-of-interest).

Numerous adaptive appearance models have been proposed to handle object appearance variations. The WSL tracker1 uses a wavelet-based appearance model and employs an online expectation maximization (EM) algorithm to update the model parameters. Inspired by Black's offline subspace appearance model,2 Ross developed an incremental subspace method.3 In Ref. 4, an online sparse representation method is proposed for handling partial occlusions and pose changes. These adaptive approaches are also categorized as generative models in Ref. 5, as all of them learn a representation of the object-of-interest directly and adapt to appearance variations by online learning. Usually, adaptive appearance models locate the object-of-interest by searching for the image patch with the most similar representation [see Fig. 1(a)].

To improve tracking precision in real-world environments, discriminative appearance models,6–10 which handle cluttered scenes as well as object appearance variations simultaneously, have also been developed [see Fig. 1(b)]. These approaches train a binary classifier to separate the object-of-interest from its surrounding background. They are also called tracking-by-detection, for they treat visual tracking as redetecting the object-of-interest over succeeding frames, and typically utilize pixel-wise sliding windows to search for the best target location. Most tracking-by-detection methods use boosting-based online feature selection.5,7–10

The original online boosting11 is an ensemble learning algorithm which combines weak online learners chosen a priori into a stronger one. Grabner7 pointed out that online boosting lacks a feature selection mechanism at the ensemble classifier level. He proposed to perform feature selection over a feature pool with "selectors"7,12 at the weak classifier level. Therefore, the tracker is able to adapt to object appearance variations and scene changes by choosing the most discriminative features. The major issue of Grabner's tracker is that accumulated tracking error affects its robustness.8 When the boosting classifier gives an imprecise target location, the tracker takes a misaligned positive sample. The misaligned samples degrade classifier accuracy if they are put into classifier training, and finally lead to tracker drifting. In their later work,8 online semi-supervised boosting is proposed to solve this problem. Samples collected during tracking are learned as unlabeled data, so that the classifier will not be affected by imprecise tracking windows. In Ref. 13, P-N learning is proposed to identify noisy samples and outliers during tracking. The samples are put back into classifier training with the correct labels that the P-N constraints expect. In Ref. 9, Babenko formulates visual tracking as a multiple instance learning problem on ambiguous samples, and puts the samples into positive bags and negative bags. A multi-instance boosting algorithm that trains the classifier

Paper 13169SS received Apr. 10, 2013; revised manuscript received Jun. 28, 2013; accepted for publication Jul. 11, 2013; published online Aug. 19, 2013.

0091-3286/2013/$25.00 © 2013 SPIE and IS&T

Journal of Electronic Imaging 041108-1 Oct–Dec 2013/Vol. 22(4)

Journal of Electronic Imaging 22(4), 041108 (Oct–Dec 2013)


with a likelihood function over the bags is developed in their work.9 The MIL tracker has demonstrated good performance in handling drifting. Zeisl combined the ideas of semi-supervised learning and multi-instance learning, and proposed an online semi-supervised multiple-instance boosting in Ref. 10. It is worth noticing that all the above approaches address the drifting problem by stabilizing the tracker on noisy data. However, the classifier accuracy of these improved boosting algorithms has not been discussed in the literature.7–10 As reported in Oza's thesis,11 online boosting is significantly less accurate than its offline counterpart when given finite data. With limited samples, it is quite difficult for online boosting to choose the most discriminative features.9

In this paper, separability-maximum boosting (SMBoost), an alternative form of AdaBoost which characterizes the separability between the object and the scene by their means and covariance matrices, is proposed. This leads to the novel appearance model presented in Fig. 1(c). The appearance model is split into two parts: a statistical representation and a prediction rule. We first represent the object and the background with their means and covariance matrices (the statistical representations), then train a boosting classifier (the prediction rule) with SMBoost to separate the object from the scene. We extend SMBoost into a family of feature selection methods, and adapt them to online learning problems by estimating the statistical representations incrementally. An experiment on UCI machine learning datasets shows that SMBoost is as accurate as offline AdaBoost, and significantly outperforms Oza's online boosting. This is a major advantage of SMBoost. Experimental results show that the accurate classifier reduces tracking error on each video frame and solves the drifting problem. Tracking results on challenging video sequences also demonstrate improvements in terms of tracking precision and speed, comparing SMBoost-based trackers to state-of-the-art ones.

The remainder of the paper is organized as follows: in Sec. 2, a brief review of visual tracking with online boosting is given. In Sec. 3, we introduce SMBoost, the major contribution of this paper, in detail, and extend it to a family of online feature selection methods for visual tracking problems. Experimental results on UCI machine learning datasets and tracking results on test video sequences are reported in Sec. 4. Finally, the conclusion is drawn in Sec. 5.

2 Tracking with Online Boosting

2.1 System Overview

The basic flow of a tracking system using online boosting5,7–10 is illustrated in Fig. 1(b). Let $l(x)$ denote the location of image patch $x$ and $F(x)$ the classifier that is trained to separate the object from the scene. The detailed procedures for the t'th frame are as follows:

1. Searching stage: searching for the optimal target location $l^*_t$ with

$l^*_t = l(\arg\max_x F(x)), \quad x \in X_s,$ (1)

where $X_s = \{x \mid s > \|l(x) - l^*_{t-1}\|\}$ is the set of sliding-window images within the searching region [see Fig. 2(a)], and $s$ is the searching radius. From Eq. (1), it is easy to see that the tracking precision of online boosting based approaches relies heavily on the classification accuracy of $F(x)$.

2. Updating stage: cropping image patches that satisfy $x \in X_p = \{x \mid r > \|l^*_t - l(x)\|\}$ as positive samples, and image patches within the surrounding region $x \in X_n = \{x \mid \alpha < \|l^*_t - l(x)\| < \beta\}$ as negative samples [see Fig. 2(b)]. $r < \alpha < \beta$ are radii that define the regions from which training samples are drawn. Then the tracker updates the classifier $F(x)$ with the new samples

[Fig. 1 panels: (a) adaptive appearance model; (b) discriminative appearance model; (c) appearance model used in this paper.]

Fig. 1 Overview of appearance models in visual tracking problems. The red box represents the object-of-interest and the blue box represents its surrounding background. An adaptive appearance model represents the object with a representation of good invariance; a discriminative appearance model describes object appearance with a discriminative classifier; in this paper, we split the appearance model into two parts: a statistical representation for capturing appearance invariance and a discriminative classifier for high classification accuracy.


$\{(x, y) \mid y = +1 \text{ if } x \in X_p, \text{ and } y = -1 \text{ if } x \in X_n\}.$ (2)
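The two stages above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `crop_sample_sets`, the grid of candidate centers, and the Euclidean distance are assumptions (the paper's experiments later use the L1 norm with radii α = 8 and β = 50).

```python
import numpy as np

def crop_sample_sets(center, positions, r=1, alpha=8, beta=50):
    """Partition candidate window centers into positive and negative
    training sets by their distance to the target center, following
    the radius rule of Eq. (2): d < r -> positive, alpha < d < beta
    -> negative, everything else unused."""
    d = np.linalg.norm(positions - center, axis=1)
    positives = positions[d < r]                      # x in Xp
    negatives = positions[(d > alpha) & (d < beta)]   # x in Xn
    return positives, negatives

# toy grid of candidate window centers around a target at (0, 0)
grid = np.stack(np.meshgrid(np.arange(-60, 61),
                            np.arange(-60, 61)), -1).reshape(-1, 2)
pos, neg = crop_sample_sets(np.array([0, 0]), grid)
```

With r = 1 only the tracking window itself survives as a positive sample, mirroring the one-target-per-frame situation discussed later in the paper.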

2.2 Feature Selection with AdaBoost and its Online Variation

The term "boosting" refers to the process of taking weak classifiers with accuracy greater than random guessing, and boosting their performance by combining them into a joint prediction rule. In visual tracking problems, boosting is used to combine weak classifiers $f_i(x)$ into an additive strong classifier

$F(x) = \sum_{i=1}^{T} w_i f_i(x),$ (3)

where the $w_i$ are scalar weights. Typically, $f_i$ is chosen by greedy search from a set of decision stumps on Haar-like features,7–10 $f_i \in H = \{h_i(x) \mid h_i = \mathrm{sign}[\mathrm{haar}_i(x)], \; \mathrm{haar}(x): R^n \mapsto R\}_{i=1}^{M}$. The Haar-like features used in visual trackers7–10 are described in Fig. 3.

Since each weak classifier corresponds to a Haar-like feature, each sample $x$ has an image $H(x) = [h_1(x), h_2(x), \ldots, h_M(x)]^T$ in the feature space $H$. The strong classifier Eq. (3) can be represented as a linear combination of all the weak classifiers:14

$F(x) = w^T H(x), \quad w \ge 0, \quad \text{and} \quad \|w\|_0 = T,$ (4)

where $w$ is the vector of weak classifier weights, $w \ge 0$ means that each $w_i$ in $w$ satisfies $w_i \ge 0$, and $\|w\|_0 = T$ means that there are $T$ nonzero entries in $w$. Typically, $M$ is much larger than $T$; thus learning Eq. (4) can be viewed as a feature selection problem that chooses the most discriminative features from the feature pool $\{\mathrm{haar}_i(x)\}$. Suppose a labeled training set $S = \{(x_i, y_i) \mid x_i \in X, y_i = \pm 1\}_{i=1}^{N}$ is given; AdaBoost has been proved to minimize the exponential loss function of the classification margin $yF(x)$:15

$F^* = \arg\min_F \sum_{i=1}^{N} e^{-y_i F(x_i)}.$ (5)

Different from batch learning, only one sample at a time is used for training in online learning. The major difficulty of online AdaBoost is estimating $\sum_{i=1}^{N} e^{-y_i F(x_i)}$ with a changing classifier $F(x)$ when the entire training set is not available at one time. In 2001, Oza presented an online variant of AdaBoost that simulates the reweighting procedure of offline AdaBoost.11 In his variant, the weak classifiers $f_i(x)$ are preselected and updated online for each incoming sample. The algorithm estimates the weak classifier weights by keeping the running average error of each weak classifier. For naïve Bayes learners, Oza proved that his online boosting and offline AdaBoost converge to the same classifier when given infinitely many samples. However, we are not able to take infinitely many positive samples in tracking problems, since there is only one target window in each frame. With limited samples, Oza's online boosting is significantly less accurate than offline AdaBoost.11 Moreover, Oza's method lacks a feature selection mechanism, which has been demonstrated to be necessary for good tracking performance.16,17 Grabner extended Oza's algorithm with selectors in Ref. 12. The selector performs feature selection over a feature pool at the weak classifier level. Thus the classifier is able to select the most discriminative features in an online manner. This extension is also employed by online MIL9 and MILSER.10 However, it lacks theoretical and empirical analysis of classification accuracy. A second issue of this extension is that it updates all the weak classifiers in the pool after choosing a feature, which would be much too time-consuming if $T$ is large.5

3 Tracking with SMBoost

Grabner uses Oza's online variant of AdaBoost to train a classifier to separate the object from the scene. One main question of interest is how the tracker measures the separability between the object and the scene. In this section, we prove that a tracker using AdaBoost maximizes $\mu - \frac{1}{2}\sigma^2$, where $\mu$ is the expectation of the classification margin $yF(x)$ and $\sigma^2$ is its variance. Thus, we can use $\mu - \frac{1}{2}\sigma^2$ to characterize the separability. A detailed explanation is given below.

Assuming that all the training examples are drawn randomly and independently, it can be proved that the classification margin of AdaBoost is approximately Gaussian distributed (see the detailed proof of Lemma 3.1 in Shen's work14 and the empirical study in Ref. 18). Suppose $yF(x) \sim N(\mu, \sigma)$; the moment-generating function of the probability distribution $p[yF(x)]$ should be $e^{\mu t + \frac{1}{2}\sigma^2 t^2}$. According to the definition of the moment-generating function, we have

$M(t) = \int p[yF(x)] \, e^{t\,yF(x)} \, d[yF(x)] = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$ (6)

However, the distribution parameters ($\mu$ and $\sigma^2$) of $yF(x)$ are unknown; thus we use the empirical moment-generating function $\hat{M}(t) = \frac{1}{N}\sum_{i=1}^{N} e^{t y_i F(x_i)}$ of the samples $\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ instead:19

$M(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2} \approx \hat{M}(t) = \frac{1}{N}\sum_{i=1}^{N} e^{t y_i F(x_i)}.$ (7)

Fig. 2 An illustration of the detailed procedures for tracking with online boosting.

Fig. 3 Haar-like features. The original Haar-like features used in Ref. 7 are generated from four templates, by moving and scaling the templates inside the tracking window. Babenko9 extended the feature and used random rectangles instead of fixed templates.
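The approximation in Eq. (7) can be checked numerically: for margins drawn from a Gaussian, the empirical moment-generating function closes in on the closed form $e^{\mu t + \frac{1}{2}\sigma^2 t^2}$. A minimal sketch, with assumed distribution parameters and sample size:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8                             # assumed margin distribution
margins = rng.normal(mu, sigma, size=100_000)    # simulated yF(x) values

t = -1.0                                         # the point used later in Eq. (9)
mgf_true = np.exp(mu * t + 0.5 * sigma**2 * t**2)  # closed-form Gaussian MGF
mgf_emp = np.mean(np.exp(t * margins))             # empirical MGF, Eq. (7)
```

At $t = -1$ the two quantities agree to within sampling noise, which is exactly the substitution the derivation of Eq. (9) relies on.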

Consider a tracker that uses AdaBoost to train the classifier:7

$F^* = \arg\min_F \sum_{i=1}^{N} e^{-y_i F(x_i)}.$ (8)

With Eq. (7), we have

$F^* = \arg\min_F \frac{1}{N}\sum_{i=1}^{N} e^{-y_i F(x_i)} \approx \arg\min_F M(-1) = \arg\min_F e^{-\mu + \frac{1}{2}\sigma^2} = \arg\max_F \left(\mu - \frac{1}{2}\sigma^2\right),$ (9)

where $F^*$ is the classifier and $\mu - \frac{1}{2}\sigma^2$ is the "separability" the classifier tries to maximize. Equation (9) is an alternative form of AdaBoost. We call it SMBoost, as it maximizes the "separability" $\mu - \frac{1}{2}\sigma^2$ directly.

For classifier Eq. (4), $\mu$ and $\sigma^2$ can be calculated from the mean and the covariance matrix of $yH(x)$. Suppose $m = \frac{1}{N}\sum_{i=1}^{N} y_i H(x_i)$ and $\Sigma = \frac{1}{N}\sum_{i=1}^{N} [y_i H(x_i) - m][y_i H(x_i) - m]^T$; then we have

$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} y_i F(x_i) = \frac{1}{N}\sum_{i=1}^{N} y_i w^T H(x_i) = w^T \frac{1}{N}\sum_{i=1}^{N} y_i H(x_i) = w^T m,$

$\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} [y_i F(x_i) - \hat{\mu}][y_i F(x_i) - \hat{\mu}]^T = \frac{1}{N}\sum_{i=1}^{N} w^T [y_i H(x_i) - m][y_i H(x_i) - m]^T w = w^T \left\{ \frac{1}{N}\sum_{i=1}^{N} [y_i H(x_i) - m][y_i H(x_i) - m]^T \right\} w = w^T \Sigma w.$ (10)

Thus, SMBoost Eq. (9) can be rewritten as

$w^* = \arg\min_w e^{-w^T m + \frac{1}{2} w^T \Sigma w}, \quad w \ge 0, \quad \|w\|_0 = T.$ (11)
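The identities of Eq. (10), on which Eq. (11) rests, can be verified directly on a toy sample set: the margin mean and variance computed from raw samples coincide with $w^T m$ and $w^T \Sigma w$. All names and sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 500, 20
H = rng.choice([-1.0, 1.0], size=(N, M))     # weak classifier outputs h_j(x_i)
y = rng.choice([-1.0, 1.0], size=N)          # labels
w = np.abs(rng.normal(size=M))               # nonnegative weights

yH = y[:, None] * H                          # rows are y_i H(x_i)
m = yH.mean(axis=0)                          # sample mean of yH(x)
Sigma = np.cov(yH, rowvar=False, bias=True)  # covariance with 1/N normalization

margins = yH @ w                             # y_i F(x_i) = y_i w^T H(x_i)
mu_hat = w @ m                               # Eq. (10): mu = w^T m
var_hat = w @ Sigma @ w                      # Eq. (10): sigma^2 = w^T Sigma w
```

This is why SMBoost only ever needs $(m, \Sigma)$: once they are known, the separability of any candidate $w$ follows without revisiting the samples.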

SMBoost Eq. (11) is a novel feature selection method that maximizes the separability $\mu - \frac{1}{2}\sigma^2$ over all possible weights $w$. When applied to visual tracking, it leads to the novel statistical appearance model presented in Fig. 1(c). In this new paradigm, we first characterize the object-of-interest and the background with $m$ and $\Sigma$, and then perform feature selection with SMBoost Eq. (11). We call $(m, \Sigma)$ the statistical representation of the object and the background. $(m, \Sigma)$ is not a representation of good invariance. However, it has been shown to be able to capture appearance invariance, and it allows efficient fusion of different types of features.20 We notice that the online learning paradigm only affects the updating of the statistical representation. Online feature selection with SMBoost Eq. (11) would be as discriminative as offline AdaBoost if the statistical representations were precise. Without loss of generality, we still use Eq. (9) when discussing boosting algorithms in the rest of the paper.

In the rest of the section, we first introduce the statistical representation and SMBoost, then extend SMBoost into a family of online feature selection methods within the framework presented in Fig. 1(c), and finally apply them to visual tracking problems.

3.1 Statistical Representation

The first stage of our appearance model is updating the statistical representations. The mean $m$ can be estimated by the running average method. For each incoming sample $x$, we update $m$ with learning parameter $\gamma$:

$m \leftarrow \gamma m + (1 - \gamma)\, yH(x),$ (12)

and the covariance matrix $\Sigma$ can be calculated by $\Sigma = E\{yH(x)[yH(x)]^T\} - m m^T$. Letting $m_2 = E\{yH(x)[yH(x)]^T\}$, it can also be updated with

$m_2 \leftarrow \gamma m_2 + (1 - \gamma)\, yH(x)[yH(x)]^T.$ (13)

In tracking problems, we update the statistical representations with a fixed learning parameter. This assigns exponentially decreasing weights over time and adapts the tracker to appearance changes. For learning problems, we use learning parameter $\gamma = n/(n+1)$, so that Eqs. (12) and (13) become a nonoblivious estimating scheme which assigns the same weight to every sample.
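The updates of Eqs. (12) and (13) can be sketched as a small estimator class. With $\gamma = n/(n+1)$ the recursion reduces exactly to the plain sample mean and the biased sample covariance; the class name and test data below are illustrative assumptions:

```python
import numpy as np

class StatisticalRepresentation:
    """Running estimate of (m, Sigma) for yH(x), per Eqs. (12)-(13).
    A fixed gamma decays old samples exponentially; gamma = n/(n+1)
    weights all samples equally (the nonoblivious scheme)."""

    def __init__(self, dim):
        self.m = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))

    def update(self, yH, gamma):
        self.m = gamma * self.m + (1 - gamma) * yH                   # Eq. (12)
        self.m2 = gamma * self.m2 + (1 - gamma) * np.outer(yH, yH)   # Eq. (13)

    @property
    def Sigma(self):
        # Sigma = E{yH(x) [yH(x)]^T} - m m^T
        return self.m2 - np.outer(self.m, self.m)

rep = StatisticalRepresentation(3)
rng = np.random.default_rng(2)
data = rng.normal(size=(1000, 3))
for n, x in enumerate(data):
    rep.update(x, gamma=n / (n + 1))   # nonoblivious scheme: equal weights
```

After the loop, `rep.m` and `rep.Sigma` match the batch statistics of `data`, which is the sense in which online updating "only affects the statistical representation" while leaving the feature selection step unchanged.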

3.2 Separability-Maximum Boosting

We have demonstrated that Grabner's tracker characterizes the separability of the object-of-interest and the background with

$\mathrm{separability}(\mu, \sigma) = \mu - \frac{1}{2}\sigma^2.$ (14)

The classification error of a voting classifier is $P[F(x) \neq y] = P[yF(x) < 0]$.15 Figure 4 shows the relationship between the margin distribution and the classification error $P[yF(x) < 0]$, under the assumption that $yF(x)$ is Gaussian distributed. It suggests that we can reduce the classification error of $F(x)$ by maximizing its average margin or minimizing its margin variation. Therefore, maximizing the separability Eq. (14) also improves classifier accuracy.

As we explained at the beginning of this section, AdaBoost equivalently solves the following separability-maximum problem:

$F^* = \arg\min_F e^{-\mu + \frac{1}{2}\sigma^2}.$ (15)

We call Eq. (15) SMBoost.Ada. In visual tracking problems, there are some practical issues to be considered, e.g., training samples and computational speed. Thus, three extensions of SMBoost.Ada are also developed in this paper.

3.2.1 Balanced SMBoost

In Ref. 7, the tracker crops the tracking window as a positive sample and four nearby windows as negative samples in each incoming frame (see Fig. 5). The ratio of positive samples to negative samples is 0.25. Therefore, feature selection in Ref. 7 is a learning problem on imbalanced training data.21 Imbalanced training data usually lead to high classification error on the positive samples (researchers conventionally assume that the positive class is the minority in imbalanced datasets), because a general learning algorithm only minimizes the overall classification error.21 Typically, cost-sensitive loss functions are suggested for boosting algorithms when learning from imbalanced training data.22,23 Boosting-based feature selection with a low false rejection rate (classification error on positive samples) has been discussed in Refs. 18 and 24. In this paper, we use a balanced cost function that involves the separability of each class instead of the overall separability:

$F^* = \arg\min_F \sum_{c \in \{+1, -1\}} e^{-\mathrm{separability}(c)}.$ (16)

In Eq. (16), separability(c) is the separability for the samples with label $c$, and we use $e^{-t}$ to maintain the convexity of the objective function. Denoting by $\mu^{(c)} = E[yF(x) \mid y = c]$ the average margin of the samples with label $c$, and by $\sigma^2_{(c)} = \sigma^2[yF(x) \mid y = c]$ their margin variance, the balanced version of SMBoost.Ada Eq. (15) writes

SMBoost.Balance:

$F^* = \arg\min_F \sum_{c \in \{+1, -1\}} e^{-\mu^{(c)} + \frac{1}{2}\sigma^2_{(c)}}.$ (17)

Equation (17) is the first extension to SMBoost.Ada Eq. (15), and it will be used to explain the necessity of considering imbalanced data in tracking problems (see our experiments in Sec. 4). $\mu^{(c)}$ and $\sigma^2_{(c)}$ can be calculated from an extended statistical representation $(m^{(c)}, \Sigma^{(c)})$, where $m^{(c)} = \frac{1}{N}\sum_{i=1}^{N} y_i H(x_i)\big|_{y_i = c}$ is the mean of the samples with label $c$ and $\Sigma^{(c)} = \frac{1}{N}\sum_{i=1}^{N} [y_i H(x_i) - m^{(c)}][y_i H(x_i) - m^{(c)}]^T\big|_{y_i = c}$ is their covariance matrix.
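The extended per-class statistics and the balanced cost of Eq. (17) can be sketched as follows. The helper names, the toy feature matrix, and the 1-in-5 positive labeling (echoing the 0.25 ratio of Ref. 7) are illustrative assumptions:

```python
import numpy as np

def class_stats(H, y, c):
    """Per-class mean m^(c) and covariance Sigma^(c) of yH(x) over the
    samples with label c (the extended statistical representation)."""
    yH = (y[:, None] * H)[y == c]
    return yH.mean(axis=0), np.cov(yH, rowvar=False, bias=True)

def balanced_objective(w, stats):
    """SMBoost.Balance cost, Eq. (17): sum_c exp(-mu_c + sigma_c^2 / 2),
    with mu_c = w^T m^(c) and sigma_c^2 = w^T Sigma^(c) w."""
    return sum(np.exp(-(w @ m) + 0.5 * (w @ S @ w)) for m, S in stats)

rng = np.random.default_rng(3)
H = rng.choice([-1.0, 1.0], size=(300, 10))
y = np.where(np.arange(300) % 5 == 0, 1.0, -1.0)   # imbalanced: 1 positive in 5
stats = [class_stats(H, y, c) for c in (+1.0, -1.0)]
cost = balanced_objective(np.full(10, 0.1), stats)
```

Because each class contributes its own exponential term, a classifier cannot drive the cost down by separating only the majority class, which is the point of the balanced objective.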

3.2.2 Discussions on separability

We have discussed the measurement of separability for Grabner's method.7 However, tracking-by-detection methods are not limited to the AdaBoost algorithm, and the definition of separability is not unique. In fact, the appearance model presented in Fig. 1(c) allows us to extend separability to other forms that can be calculated from the statistical representation. An obvious alternative choice of separability is

$\mathrm{separability}_{\mathrm{ratio}}(\mu, \sigma) = \frac{\mu}{\sigma}.$ (18)

This leads to the second extension of SMBoost,

SMBoost.Ratio:

$F^* = \arg\min_F \sum_{c \in \{+1, -1\}} e^{-\mu^{(c)} / \sqrt{\sigma^2_{(c)}}}.$ (19)

A third extension of SMBoost uses the Fisher discriminant ratio25 to characterize the separability between the object and the scene:

SMBoost.LDA:

$F^* = \arg\min_F e^{-\frac{[\mu^{(+1)} - \mu^{(-1)}]^2}{\sigma^2_{(+1)} + \sigma^2_{(-1)}}}.$ (20)

For classifier Eq. (4), we can rewrite Eq. (20) as

$w^* = \arg\max_w \exp\left(\frac{w^T S_b w}{w^T S_w w}\right), \quad w \ge 0, \quad \|w\|_0 = T,$ (21)

where $S_b = [m^{(+1)} - m^{(-1)}][m^{(+1)} - m^{(-1)}]^T$ is the between-class scatter matrix and $S_w = \Sigma^{(+1)} + \Sigma^{(-1)}$ is the within-class scatter matrix.
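The scatter matrices in Eq. (21) follow directly from the class statistics. A sketch with made-up two-dimensional class means and identity covariances (none of these numbers come from the paper):

```python
import numpy as np

def fisher_separability(w, m_pos, m_neg, Sigma_pos, Sigma_neg):
    """Fisher discriminant ratio of Eq. (21): (w^T S_b w) / (w^T S_w w)
    with between-class scatter S_b and within-class scatter S_w."""
    diff = m_pos - m_neg
    S_b = np.outer(diff, diff)        # between-class scatter
    S_w = Sigma_pos + Sigma_neg       # within-class scatter
    return (w @ S_b @ w) / (w @ S_w @ w)

# toy two-class statistics
m_pos, m_neg = np.array([1.0, 0.5]), np.array([-1.0, -0.5])
Sig = np.eye(2)
r_good = fisher_separability(np.array([1.0, 0.5]), m_pos, m_neg, Sig, Sig)
r_bad = fisher_separability(np.array([0.5, -1.0]), m_pos, m_neg, Sig, Sig)
```

A weight vector aligned with the class-mean difference scores a high ratio, while one orthogonal to it scores zero, which is the separation behavior Eq. (20) rewards.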

3.3 Separability-Maximum Feature Selection

In visual tracking, we use classifier Eq. (4) and perform feature selection by choosing the optimal weak classifier weights $w$ that maximize the separability between the object and the scene. Unlike the approach of Ref. 7, which updates weak classifier weights incrementally, SMBoost algorithms train a new $w$ for each incoming frame.

Fig. 4 Margin distribution of the boosting classifier $F(x)$. We can reduce the classification error $P[yF(x) < 0]$ (red part of the distribution) by maximizing the average margin $\mu$ and minimizing the margin variation $\sigma$.

Fig. 5 Sampling scheme used in Ref. 7. For each incoming frame, the tracker crops one positive sample inside the tracking window, and four negative samples around the target.


Suppose the mean and covariance matrix of the samples are given; SMBoost algorithms minimize their objective functions $\Phi(w)$ within Friedman's framework of gradient boosting.26 In every iteration, we first choose a feature in whose "direction" $\Phi$ descends most rapidly:

$\Delta w^* = \arg\max_{\Delta w} \left| \Delta w^T \left. \frac{d\Phi}{dw} \right|_{w = w_i} \right|, \quad \Delta w \in \{e_i\},$ (22)

where $\{e_i\}$ is the standard basis of the linear space $R^M$. Once the direction is decided, we choose the step $\alpha^*$ that descends the objective function as much as possible:

$\alpha^* = \arg\min_{\alpha} \Phi(w_i + \alpha \Delta w^*).$ (23)

Then, we add the new feature to the classifier: $w_{i+1} = w_i + \alpha^* \Delta w^*$.

The pseudocode of separability-maximum feature selection is shown in Algorithm 1. There are some minor differences between the different versions of the SMBoost algorithms: the gradient $d\Phi/dw$ and the optimal descent step $\alpha^*$ depend on the objective function $\Phi$ we choose.

SMBoost.Ada:

$\frac{d\Phi}{dw} = e^{-\mu + \frac{1}{2}\sigma^2} (-m + \Sigma w),$ (24)

$\alpha^* = \frac{\Delta w^T (m - \Sigma w)}{\Delta w^T \Sigma \Delta w}.$ (25)

SMBoost.Balance:

$\frac{d\Phi}{dw} = \sum_{c \in \{+1, -1\}} e^{-\mu^{(c)} + \frac{1}{2}\sigma^2_{(c)}} \left(-m^{(c)} + \Sigma^{(c)} w\right).$ (26)

The optimal descent step $\alpha^*$ can be chosen with line search.26

SMBoost.Ratio:

$\frac{d\Phi}{dw} = \sum_{c \in \{+1, -1\}} -c \, e^{-\mu^{(c)} / \sigma^{(c)}} \, \frac{\sigma^{(c)} m^{(c)} - \mu^{(c)} \Sigma^{(c)} w}{(\sigma^{(c)})^3}.$ (27)

The optimal descent step $\alpha^*$ can be chosen with line search.

SMBoost.LDA:

$\frac{d\Phi}{dw} = e^{-\frac{w^T S_b w}{w^T S_w w}} \cdot \frac{S_b w (w^T S_w w) - S_w w (w^T S_b w)}{(w^T S_w w)^2}.$ (28)

The optimal descent step $\alpha^*$ is one of the roots of

$(S_b^2 S_w^1 - S_b^1 S_w^2)\alpha^2 + (S_b^2 S_w^0 - S_b^0 S_w^2)\alpha + (S_b^1 S_w^0 - S_b^0 S_w^1) = 0,$ (29)

where

$S_b^0 = w_i^T S_b w_i, \quad S_b^1 = w_i^T S_b \Delta w, \quad S_b^2 = \Delta w^T S_b \Delta w,$
$S_w^0 = w_i^T S_w w_i, \quad S_w^1 = w_i^T S_w \Delta w, \quad S_w^2 = \Delta w^T S_w \Delta w.$ (30)

Solving the quadratic Eq. (29) is much faster than the line search in SMBoost.Balance and SMBoost.Ratio.
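A sketch of this closed-form step: build the six scalars of Eq. (30), solve the quadratic Eq. (29), and keep the real root that maximizes the Fisher ratio along the direction. The 2-D matrices below are toy values, not from the paper:

```python
import numpy as np

def lda_step_candidates(w, dw, S_b, S_w):
    """Real roots of the quadratic Eq. (29), built from the scalars of
    Eq. (30); these are the critical points of the Fisher ratio along
    the search direction dw."""
    S0b, S1b, S2b = w @ S_b @ w, w @ S_b @ dw, dw @ S_b @ dw
    S0w, S1w, S2w = w @ S_w @ w, w @ S_w @ dw, dw @ S_w @ dw
    coeffs = [S2b * S1w - S1b * S2w,
              S2b * S0w - S0b * S2w,
              S1b * S0w - S0b * S1w]
    roots = np.roots(coeffs)            # np.roots drops a leading zero coefficient
    return roots[np.isreal(roots)].real

# toy 2-D example: S_b from class-mean difference [1, 0], S_w = identity
S_b = np.array([[1.0, 0.0], [0.0, 0.0]])
S_w = np.eye(2)
w, dw = np.array([1.0, 1.0]), np.array([0.0, 1.0])

def ratio(a):
    v = w + a * dw
    return (v @ S_b @ v) / (v @ S_w @ v)

alpha = max(lda_step_candidates(w, dw, S_b, S_w), key=ratio)
```

Setting the derivative of the ratio of two quadratics in $\alpha$ to zero cancels the cubic terms, which is why the candidates reduce to the roots of a single quadratic and no iterative search is needed.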

3.4 Discussions

We have demonstrated that the boosting-based online feature selection of Ref. 7 actually characterizes the separability of the object and the background with $\mu - \frac{1}{2}\sigma^2$, and we have proposed a novel boosting algorithm, Eq. (15), that maximizes this separability directly. However, we will not use Eq. (15) in visual tracking directly, due to some practical problems, e.g., data distribution and tracker speed. Thus, several extensions are also proposed. SMBoost.Balance Eq. (17) is designed to handle imbalanced training data, and SMBoost.Ratio Eq. (19) is designed to test whether we can achieve higher tracking precision with an alternative definition of separability. Both Eqs. (17) and (19) need line search during training, which might lead to a slow tracker. SMBoost.LDA Eq. (20) is a third extension that uses the Fisher discriminant ratio to characterize the separability. The experiment in Sec. 4 shows that SMBoost.LDA also performs well on imbalanced data and trains very fast.

3.5 Tracking with SMBoost

The main procedures of our tracker are similar to those of Ref. 7 (see Sec. 2). Given a bounding box as the initial tracking window in the first frame, the tracker randomly generates $M$ Haar-like features inside the window. In order to make a fair comparison, we use the four types of Haar-like features of the original paper.7 The features are converted into decision stumps with a fixed threshold (0 in this paper). We crop positive samples in a rectangular region $r > \|l(x) - l^*_t\|_1$ and negative samples within a surrounding region $\alpha < \|l(x) - l^*_t\|_1 < \beta$, then update the statistical representations with learning parameter $\gamma$, and finally perform feature selection with one of the SMBoost variants introduced above. SMBoost iterates $T$ times, so that it trains a classifier combining $T$ features. If SMBoost.Ada is used, we update

Algorithm 1 Separability-maximum feature selection.

Input: mean $m$ and covariance matrix $\Sigma$ of the samples ($m^{(c)}$ and $\Sigma^{(c)}$ for balanced algorithms).

1 Initialize: $w_0 \leftarrow \arg\min_{w} \Phi(m, \sigma; w), \; w \in \{e_i\}$; /* $\Phi(m^{(+1)}, m^{(-1)}, \sigma^{(+1)}, \sigma^{(-1)}; w)$ for balanced algorithms */

2 for $i = 2$ to $T$ do

3   select a feature: $\Delta w^*_i \leftarrow \arg\max_{\Delta w} \left| \Delta w^T \left. \frac{d\Phi}{dw} \right|_{w = w_{i-1}} \right|$;

4   choose descent step: $\alpha^*_i \leftarrow \arg\min_{\alpha} \Phi(m, \sigma; w_{i-1} + \alpha \Delta w^*_i)$;

5   update $w$: $w_i \leftarrow w_{i-1} + \alpha^*_i \Delta w^*_i$;

6 end
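Algorithm 1 specialized to the SMBoost.Ada objective can be sketched in a few lines of Python (an assumed sketch; the paper's tracker is implemented in C++). The gradient of Eq. (24) picks the coordinate and Eq. (25) gives the exact minimizing step along it, so the objective never increases; for brevity the sketch ignores the nonnegativity constraint $w \ge 0$ of Eq. (11).

```python
import numpy as np

def smboost_ada(m, Sigma, T):
    """Greedy feature selection (Algorithm 1) for the SMBoost.Ada
    objective Phi(w) = exp(-w^T m + w^T Sigma w / 2)."""
    M = len(m)
    w = np.zeros(M)
    # line 1: best single-coordinate start (exact minimizer along one axis)
    j0 = int(np.argmax(m ** 2 / np.diag(Sigma)))
    w[j0] = m[j0] / Sigma[j0, j0]
    for _ in range(T - 1):
        # Eq. (24): gradient of Phi at the current w
        grad = np.exp(-w @ m + 0.5 * w @ Sigma @ w) * (-m + Sigma @ w)
        j = int(np.argmax(np.abs(grad)))      # steepest coordinate, Eq. (22)
        dw = np.zeros(M)
        dw[j] = 1.0
        # Eq. (25): closed-form optimal step along dw
        alpha = (dw @ (m - Sigma @ w)) / (dw @ Sigma @ dw)
        w = w + alpha * dw
    return w

def separability(w, m, Sigma):
    return w @ m - 0.5 * (w @ Sigma @ w)      # mu - sigma^2 / 2

rng = np.random.default_rng(0)
yH = rng.normal(0.3, 1.0, size=(400, 12))     # toy margin samples y_i H(x_i)
m, Sigma = yH.mean(axis=0), np.cov(yH, rowvar=False, bias=True)
w1, w5 = smboost_ada(m, Sigma, 1), smboost_ada(m, Sigma, 5)
```

Because each step is an exact line minimization of a convex objective, the separability of `w5` is at least that of the one-feature classifier `w1`, matching the greedy-improvement behavior Algorithm 1 is built on.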


$m$ with $m \leftarrow \gamma m + \frac{1-\gamma}{2} H(x^+) + \frac{1-\gamma}{2} \sum_{i=1}^{m} [-H(x^-_i)]$, where $x^+$ is the positive sample cropped from the current frame and $x^-_i$, $i = 1, 2, \ldots, m$, are the negative samples. This makes SMBoost.Ada less sensitive to imbalanced data. At the searching stage, the tracker searches for the object with a fixed-size tracking window in a rectangular searching region $s > \|l(x) - l^*_{t-1}\|_1$.

4 Experiments

In this section, we first evaluate SMBoost algorithms on UCI machine learning datasets27 to demonstrate their classification accuracy. Then we compare trackers using SMBoost with five state-of-the-art trackers and present benchmark results on some challenging video sequences. The experiment on UCI datasets is carried out in MATLAB, and the tracking programs are implemented in C++. All the experiments are performed on a 2.5 GHz PC that runs 64-bit Linux with 2 GB RAM.

4.1 SMBoost

We first evaluate SMBoost.Ada (SMBa), SMBoost.Balance (SMBb), SMBoost.Ratio (SMBr), SMBoost.LDA (SMBl), AdaBoost (ADA), and Oza's online boosting (OB) on UCI machine learning datasets with fivefold cross-validation. The algorithms of Refs. 7–10 are not included in this experiment because the authors did not provide results or code for common machine learning tasks. Seven datasets are used in our experiment; their descriptions are shown in Table 1. We use fixed decision stumps as weak classifiers, so that the learning problems turn into feature selection problems. For the SMBoost algorithms, we estimate the mean and covariance matrix incrementally with the nonoblivious scheme. The result of OB is taken from Oza's thesis,11 whose experimental setup is the same as ours. The results are reported in Table 2.

As we expected, there are no significant differences between ADA and SMBa, for they actually optimize the same cost function, Eq. (9). Their performances on each dataset are not exactly the same because there are some minor differences in the training processes. The performance of SMBa and SMBb is almost the same except for the datasets German and car. SMBb achieves better classification accuracy on these two imbalanced datasets with the balanced objective function Eq. (17). Further analysis on balancing will be presented later. SMBl has similar accuracy to SMBb, and SMBr yields the best performance. As reported in Oza's thesis,11 OB is less accurate than ADA on most datasets. SMBa, SMBb, SMBr, and SMBl also significantly outperform OB. The experimental results show that the SMBoost algorithms are at least as accurate as offline AdaBoost, and significantly outperform Oza's online boosting.

From Table 1, we can see that there are three imbalanced datasets in our experiment: breast-cancer, German, and car. For these datasets, the ratio of positive to negative samples is between 0.32 and 0.53. Table 3 shows classification accuracy with different objective functions. There is no significant difference between ADA and the balanced algorithms on the normal datasets. However, ADA and SMBa have very low classification accuracy (a high false rejection rate) on the positive samples of the datasets German and car. As we explained in Sec. 3.2, ADA and SMBa minimize the overall classification error; thus, they focus mainly on the negative samples when the dataset is imbalanced. Comparing the results on breast-cancer, German, and car, it is easy to see that the classifier has a higher false rejection rate when the ratio of positive to negative samples is lower. Thus the situation will be even worse for Ref. 7, in which the ratio of positive to negative samples is only 0.25. For the balanced algorithms, there is no significant difference between positive samples and negative samples. This experiment suggests the use of a balanced objective function in visual tracking.

4.2 Tracking with SMBoost

We compare trackers using SMBoost with four state-of-the-art online boosting trackers, namely, Grabner's online boosting tracker (OB),7 the semi-supervised boosting tracker (SEMI),8 the multiple instance boosting tracker (MIL),9 and the MILSER tracker (MILSER).10 A recent tracking approach, compressive tracking (CT),28 is also included in the experiment. The following SMBoost-based trackers are evaluated: the tracker using SMBoost.Ada (SMBa), the tracker using SMBoost.Balanced (SMBb), the tracker using SMBoost.Ratio (SMBr), and the tracker using SMBoost.LDA (SMBl). We evaluate the trackers on eight challenging video sequences.10 As suggested in Ref. 9, we use α = 8 and β = 50 for cropping negative samples (X_n = {x | α < ‖l(x) − l_t*(x)‖_1 < β}; see the detailed explanation of the parameters in Sec. 2). We crop nine positive samples on initialization (X_p = {x | ‖l(x) − l_t*(x)‖_1 ≤ 1}), and crop the tracking window as a positive sample on succeeding frames (X_p = {x | l(x) = l_t*(x)}). The tracker uses all the positive samples and 50 random negative samples to train the classifier. For most test sequences, we found that the trackers are robust if the learning parameter is set to γ = 0.75–0.95. From Eqs. (12) and (13) we can see that a smaller learning parameter weights current samples more and adapts the classifier to appearance variations more quickly, while a larger learning parameter weights previous data more and keeps the tracker stable against occlusions. We use γ = 0.85 for common video sequences, a smaller learning parameter γ = 0.75 for Seq. sylv, and a larger learning

Table 1 Description of datasets.

Dataset #Positive #Negative Positive/negative

promoters 53 53 1.0

balance-scale 288 288 1.0

breast-cancer 241 458 0.53

German 300 700 0.43

car 384 1210 0.32

chess 1669 1527 0.91

mushroom 4208 3916 1.1

Journal of Electronic Imaging 041108-7 Oct–Dec 2013/Vol. 22(4)

Hou, Mao, and Sun: Visual tracking by separability-maximum boosting

Downloaded From: http://electronicimaging.spiedigitallibrary.org/ on 08/30/2013 Terms of Use: http://spiedl.org/terms


Table 2 Classification accuracy on test data (100 weak classifiers). Bold fonts indicate the best performance.

Dataset ADA SMBa SMBb SMBr SMBl OB

promoters 0.8500 0.8500 0.8700 0.8450 0.8000 0.7380

balance-scale 0.9259 0.9845 0.9138 0.9810 0.9672 0.7114

breast-cancer 0.9231 0.9385 0.9527 0.9571 0.9648 0.7861

German 0.6193 0.5779 0.7043 0.7189 0.7221 0.6686

car 0.5781 0.5843 0.8979 0.8979 0.9000 0.6985

chess 0.8990 0.9216 0.8866 0.9416 0.9367 0.7597

mushroom 0.9828 0.9540 0.9496 0.9893 0.9857 0.5423

Average 0.8255 0.8301 0.8821 0.9044 0.8966 0.7007

Table 3 Classification accuracy on positive samples/negative samples (100 weak classifiers).

Dataset ADA SMBa SMBb SMBr SMBl

promoters 0.8400/0.8600 0.8400/0.8600 0.8800/0.8600 0.9000/0.6600 0.7600/0.8200

balance-scale 0.8793/0.9724 0.9862/0.9828 0.8931/0.9345 0.9793/0.9828 0.9655/0.9690

breast-cancer 0.8813/0.9648 0.9495/0.9275 0.9495/0.9560 0.9582/0.9560 0.9670/0.9626

German 0.3229/0.9157 0.2043/0.9514 0.7600/0.6486 0.6757/0.7200 0.6471/0.7357

car 0.1570/0.9992 0.1702/0.9983 1.0000/0.7959 1.0000/0.7959 0.8810/0.9190

chess 0.9010/0.8970 0.9377/0.9056 0.9167/0.8564 0.9534/0.9298 0.9534/0.9200

mushroom 0.9748/0.9907 0.9195/0.9886 0.9311/0.9682 0.9793/0.9993 0.9758/0.9957

Fig. 6 Tracking result of SMBa[20/250, 0.85] (red) and OB[50/250] (yellow) on the colored version of Seq. David (frames #003 to #750).

Fig. 7 Positive samples collected during tracking by SMBa (top) and OB (bottom), frames #000 to #680. Centers of the image patches are indicated with red crosses; the red cross should match the nose of the face. We highlight "bad" target windows with blue boxes.


parameter γ = 0.95 for Seq. coke11. Further discussion of learning parameters is presented later.
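The positive/negative cropping rules quoted earlier in this subsection can be sketched in a few lines (an illustrative sketch for the succeeding-frame rule; function and variable names are ours, not the authors' implementation, and the initialization rule with ‖l(x) − l_t*(x)‖_1 ≤ 1 is omitted):

```python
import random

def crop_samples(candidates, target_loc, alpha=8, beta=50, n_neg=50):
    """Sketch of the sample-cropping rule: `candidates` is a list of
    (location, patch) pairs; `target_loc` is the tracked window
    location l_t*(x)."""
    def l1(a, b):
        return sum(abs(u - v) for u, v in zip(a, b))

    # Positive sample on succeeding frames: the tracking window itself.
    positives = [p for loc, p in candidates if l1(loc, target_loc) == 0]
    # Negatives: windows whose L1 center distance lies in (alpha, beta).
    ring = [p for loc, p in candidates if alpha < l1(loc, target_loc) < beta]
    negatives = random.sample(ring, min(n_neg, len(ring)))
    return positives, negatives
```

The classifier is then trained with all positives and the 50 randomly drawn negatives, as described above.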

For clarity, we encode the important parameters into the tracker name as Tracker[T/M, γ], where M is the size of the feature pool, T is the number of algorithm iterations (see Algorithm 1), and γ is the learning parameter. The tracker uses M features to represent a sample in feature space, and chooses the T most discriminative features when training the classifier. For example, SMBa[25/250, 0.85] denotes an SMBa tracker that uses a feature pool of 250 Haar-like features to track the target; for each incoming frame, the tracker updates the statistical representation with learning parameter 0.85 and chooses 25 features when training the classifier. When we only want to discuss one of the parameters, we may also write Tracker[T], Tracker[T/M], or Tracker[γ]. Unless otherwise specified, the trackers use M = 250 and γ = 0.85 in our experiments. For OB, SEMI, MIL,

Table 4 Tracking precision (T = 50). Bold fonts indicate the best performance.

Dataset SMBa SMBb SMBr SMBl MILSER MIL SEMI OFS CT

sylv 0.36 0.64 0.68 0.66 0.63 0.61 0.46 0.50 0.64

David 0.66 0.80 0.73 0.77 0.71 0.54 0.31 0.32 0.75

faceocc 0.87 0.86 0.88 0.87 0.68 0.63 0.71 0.47 0.69

faceocc2 0.84 0.81 0.86 0.84 0.78 0.65 0.63 0.64 0.73

tiger1 0.30 0.48 0.32 0.51 0.60 0.51 0.17 0.27 0.64

tiger2 0.43 0.41 0.50 0.52 0.46 0.50 0.08 0.25 0.53

coke11 0.37 0.40 0.54 0.64 0.18 0.29 0.12 0.20 0.43

girl 0.57 0.48 0.69 0.52 0.64 0.53 0.69 0.38 0.48

Average 0.55 0.61 0.65 0.67 0.58 0.53 0.40 0.38 0.61

Fig. 8 Tracking results on video sequences with challenging illumination, scale, and pose changes. Object centers are indicated with red crosses.

Fig. 9 Tracking result on video sequences with occlusions.


and MILSER, the trackers update all the weak classifiers in the feature pool. For SMBoost-based trackers, we use a fixed threshold for the Haar-like features and do not update the weak classifiers.

4.2.1 Qualitative analysis of tracking precision

The first experiment explains how SMBoost solves the drifting problem by improving classifier accuracy. We compare the SMBa tracker with the OB tracker (the binary program is available at the authors' website) on Seq. David. According to the analysis in Sec. 3, the two trackers actually optimize the same objective function; the only difference is that SMBa showed higher classification accuracy than OB in our previous experiment on the UCI datasets. We track the face with T = 50 and M = 250. The tracking result is shown in Fig. 6. The tracking precision of the two algorithms is very close at the beginning of the sequence, and both algorithms are able to adapt to illumination, scale, and pose changes by online learning. However, the tracking precision of OB decreases at #200, and the tracker becomes unstable at #300. Finally, the tracker starts drifting at #400 and stops working at frame #688.

Figure 7 shows the positive samples collected by the trackers. The nose should be located at the center of the tracking window. Indicating patch centers with crosses, we can see that the tracking errors of both SMBa and OB are small at the beginning of the video sequence. However, OB becomes less accurate and begins to give imprecise tracking windows (#40, #60, #160). With the misaligned samples cropped from the imprecise tracking windows, the classifier becomes more and more inaccurate. The accumulated classifier error leads to larger tracking error (#200, #260, #280, #300, #360); finally, OB starts drifting at #400 and stops working at #688. Meanwhile, the SMBa classifier is more accurate (as the previous experiment on the UCI datasets shows) and gives more precise tracking results. Tracking error does not accumulate, as the exponentially weighted running averages, Eqs. (12) and (13), reduce the effect of the misaligned samples (#060) when learning new samples.

4.2.2 Quantitative analysis of tracking precision

To quantitatively analyze the tracking precision of SMBoost-based trackers, several public benchmark sequences9,10 are chosen for evaluation. In this experiment, we compute the overlap criterion of the visual object classes (VOC) challenge,29

overlap = area(roi_tracking ∩ roi_ground-truth) / area(roi_tracking ∪ roi_ground-truth). (31)
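The VOC overlap criterion is the standard intersection-over-union measure; for axis-aligned boxes given as (x, y, w, h) tuples, a minimal implementation might look like this (a sketch; the paper provides no code):

```python
def voc_overlap(a, b):
    """VOC overlap criterion: area(a ∩ b) / area(a ∪ b) for
    axis-aligned boxes (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width/height of the intersection rectangle (0 if disjoint).
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

The score is 1 for a perfect match, 0 for disjoint windows, and degrades smoothly as the tracking window drifts or mis-scales.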

Zeisl suggests that the overlap criterion also reflects detection accuracy, compared with raw pixel locating error,10 and is more stationary when object scale changes are considered. We first evaluate the trackers with parameters T = 50 and M = 250. We run the trackers five times on each sequence and use the median for comparison. Tracking precision is reported in Table 4. For MILSER and MIL, we use the experimental results reported in Ref. 10.

From Table 4, we can see that the SMBoost-based trackers achieve high tracking precision on most test sequences, as

Fig. 10 Tracking results on video sequences with both fast motion and frequent occlusions.

Fig. 11 Tracking results of SMBa, SMBl, MIL, and CT on video sequences with both occlusions and rotations: (a) Seq. coke11; (b) Seq. girl.


expected. SMBa significantly outperforms OB. This can be explained by higher classifier accuracy, as shown in the previous experiment. For further improvement, we can handle the imbalance of the training data with SMBb, the balanced variation of SMBa. SMBr is more accurate than SMBa and SMBb; this suggests that the separability of Eq. (18) is a better choice for visual tracking problems. SMBl yields the highest tracking precision in this experiment. It is worth noting that the average tracking precision (see the last row of Table 4) is closely related to the average classification accuracy reported in Table 2. The empirical study of classification performance gives insight into how the improved learning algorithms improve tracking precision. The experimental result demonstrates that feature selection with SMBoost is more effective than feature selection with traditional online boosting algorithms. It also indicates that fixed-threshold decision stumps on Haar-like features already provide enough information to separate the object from its surrounding background; thus, weak classifier updating is omitted in our trackers. Detailed discussions of the video sequences are presented below.

Sylv and David: These two video sequences contain challenging illumination, scale, and pose changes. SMBa is significantly worse than SMBb. This is because SMBa has low classification accuracy on positive samples and large locating error when the target pose changes. Figure 8 shows the tracking windows

Fig. 12 Tracking precision plot for the test video sequences.

(a) Original test sequence

(b) Accelerated test sequence

Fig. 13 Adapting a tracker to fast appearance variation with a smaller learning parameter. SMBl[20/250, 0.85] (red), OB[50/250] (aqua), and SEMI[100/250] (blue) are stable on the original test sequence, but defeated by the accelerated one. SMBl[20/250, 0.7] (yellow) can handle the appearance variations by using a smaller learning parameter.


for SMBa, SMBb, MIL, and CT. To avoid too much plotting on a single frame, only a good tracking result is shown for comparison.

Faceocc and Faceocc2: A major difficulty in these two video sequences is occlusion. Seq. faceocc2 is more challenging because of the in-plane rotations added by the author.9 From Table 4, we can see that all the SMBoost-based trackers achieve high tracking precision on these two video sequences. The tracking result is shown in Fig. 9.

tiger1 and tiger2: These two sequences contain fast motion and frequent occlusions. It is difficult to stabilize the tracker by using a large learning parameter, for the tracker will then fail to capture fast appearance changes; if we use a small learning parameter to capture the appearance changes, the tracker becomes unstable against occlusions. SMBl, MIL, and CT show good performance on these two sequences. We show the tracking windows of these three algorithms in Fig. 10.

coke11 and girl: The major challenges in these two video sequences are out-of-plane rotations and occlusions. The target in Seq. coke11 is small; thus, the tracker may have a small center locating error even when it loses the target. The target in Seq. girl is large; thus, even accurate tracking windows may have a large center locating error. This is the main reason why we use the overlap criterion in this paper. SMBa, SMBr, SMBl, MIL, and CT achieve good performance on these two video sequences. The tracking result is shown in Fig. 11.

Tracking precision plots are shown in Fig. 12. From the curves, it is easy to see that frequent small tracking errors at the beginning of a sequence usually degrade the tracking precision on the rest of the sequence.

4.2.3 Choosing the learning parameter

A very important parameter in our trackers is the learning parameter γ. For challenging video sequences, this parameter should be chosen carefully. The first example is video sequences with fast object appearance variations. We test OB[50/250], SEMI[100/250], and SMBl[20/250, 0.85] on an accelerated version of Seq. David (see Fig. 13). The trackers are stable on the original video sequence; however, the accelerated sequence defeats all of them, including SEMI[100]. We made SMBl work again by using a smaller learning parameter [see Fig. 13(b)]. This experiment indicates that assigning higher weights to newly incoming samples is important for adapting to fast object appearance variations.
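The trade-off governed by γ can be sketched with a toy exponentially weighted running average in the spirit of Eqs. (12) and (13) (the paper's update also maintains covariance matrices; this illustration tracks only a scalar mean, and the numbers are ours):

```python
def ema_update(mean, sample, gamma):
    """Exponentially weighted running average: gamma weights the
    history, (1 - gamma) weights the new sample."""
    return [gamma * m + (1 - gamma) * s for m, s in zip(mean, sample)]

# A sudden appearance change: the feature mean jumps from 0 to 1.
fast, slow = [0.0], [0.0]
for _ in range(10):
    fast = ema_update(fast, [1.0], gamma=0.75)  # adapts quickly to the change
    slow = ema_update(slow, [1.0], gamma=0.95)  # mostly keeps the old appearance
```

After ten frames the γ = 0.75 estimate has nearly converged to the new appearance (1 − 0.75¹⁰ ≈ 0.94) while the γ = 0.95 estimate lags far behind (1 − 0.95¹⁰ ≈ 0.40), which is exactly why a smaller γ handles fast variation and a larger γ resists occlusion-induced noise.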

Figure 14 shows SMBl handling occlusions by using a larger learning parameter. In cluttered environments, SMBl can be stabilized by weighting new samples less. When occlusion happens, however, the tracker crops noisy samples even from an accurate tracking window. Weakly supervised learning (e.g., semi-supervised boosting, multiple instance learning), which can identify bad samples at the learning stage, should be employed to handle such situations.

4.2.4 Efficiency

Efficiency here refers to the number of Haar-like features used in the classifier to achieve the same tracking precision. In Ref. 5, Zhang showed that a tracker can achieve good tracking precision with only 15 Haar-like features, which is much more efficient than the MIL tracker that uses 50 features. We also design an experiment to illustrate the efficiency of the SMBoost algorithms. The experimental setup is the same as the previous one except for the parameter T (the number of features chosen by the classifier). We find that the SMBoost-based trackers can achieve acceptable accuracy with T = 25; the tracking result is reported in Table 5. MIL and MILSER are not included in this experiment, for the authors do not provide tracking results with 25 features. OB and SEMI do not work when T = 25.

From the average tracking precision, we can see that the SMBoost-based trackers significantly outperform OB[50/250] and SEMI[100/250]. We achieve higher tracking precision than OB with fewer features, and SMBl[25/250]

Fig. 14 Tracking result of SMBl[20/250, 0.85] (red) and SMBl[20/250, 0.95] (green) on Seq. coke11. SMBl can handle occlusions by using a larger learning parameter.

Table 5 Tracking precision (T = 25). Bold fonts indicate the best performance.

Dataset SMBa SMBb SMBr SMBl CT

sylv 0.27 0.36 0.37 0.59 0.39

David 0.41 0.64 0.75 0.74 0.54

faceocc 0.85 0.66 0.79 0.79 0.67

faceocc2 0.80 0.83 0.67 0.77 0.63

tiger1 0.24 0.30 0.12 0.40 0.21

tiger2 0.38 0.38 0.43 0.26 0.42

coke11 0.44 0.21 0.32 0.14 0.32

girl 0.37 0.43 0.60 0.59 0.40

Average 0.47 0.48 0.51 0.54 0.45


has tracking precision similar to MIL[50/250]. Compared with selector-based online boosting, SMBoost has a significant advantage in efficiency.

4.2.5 Speed benchmark

A detailed analysis of computation speed is made in this experiment. We perform a speed benchmark on Seq. David, whose resolution is 320 × 240 and whose duration is 51 s. We test SMBoost-based trackers with different T and feature pool sizes M. The benchmark result is reported in Table 6. Boldfaced entries indicate fast trackers with acceptable tracking precision for practical applications. From Table 6, we can see that SMBa[25/250] runs very fast, achieving 134 FPS on the experiment computer. It is 1.74 times faster than OB[25/250] (which does not actually work on the practical test video sequences) and seven times faster than OB[50/250]. SMBl[25/250] runs at 109 FPS; considering that its tracking precision is good on most of the test videos (see Table 5), we suggest this tracker for less challenging video sequences. SMBa[50/250] and SMBl[50/250] are suggested for difficult video sequences. SMBb and SMBr are much slower, for they need a linear search during feature selection.

SMBoost has a great advantage in computation speed compared with the state-of-the-art approaches. The main reason is that we use fixed thresholds for the decision stumps. In Ref. 5, Zhang also notes that weak classifier updating is the main obstacle to a fast tracker, and speeds up the tracker by using a smaller feature pool (M = 150). Moreover, w is a sparse vector, which allows extremely fast matrix operations.
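As an illustration of the last point, evaluating the strong classifier with a sparse w touches only the T selected features rather than all M feature responses (a sketch; the dictionary representation of w is an illustrative assumption, not the paper's data structure):

```python
def sparse_score(w_nonzero, responses):
    """Strong-classifier response using only the T nonzero entries of w,
    stored as {feature index: weight}, instead of a dense dot product
    over all M feature responses."""
    return sum(weight * responses[i] for i, weight in w_nonzero.items())

# T = 2 selected features out of M = 5 feature responses.
w = {0: 0.5, 3: -1.0}
responses = [2.0, 9.0, 9.0, 1.0, 9.0]
score = sparse_score(w, responses)
```

With T = 25 and M = 250, such a sparse evaluation does one-tenth of the work of a dense dot product, consistent with the speed advantage reported in Table 6.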

4.2.6 Failure cases analysis

In the previous experiments, we showed that tracking precision can be improved by using a more accurate classifier. However, ambiguity may reside in the tracking problem itself,9 and cannot be resolved by simply improving classifier accuracy. For example, the object may switch to another surface during tracking. In such a situation, the tracker lacks the high-level information needed to switch to the new surface. SMBoost-based trackers track a single surface very well and are stable against large appearance changes [see Fig. 15(a)]. Although nearly half of the surface texture changes in Seq. dollar, our tracker is still stable. But when the object switches to another surface, SMBoost-based trackers continue to track the previous surface and lose their targets when that surface disappears [see Fig. 15(b)]. Such ambiguity can be removed by multiple instance learning. It is impressive that MIL is able to switch to the new surface in Seq. twinnings, but stays on the original surface in Seq. dollar. This experiment indicates that weakly supervised learning is still necessary even with an accurate classifier.

5 Conclusion and Future Work

In this paper, we proposed a family of online boosting algorithms (SMBoost) for visual tracking. Our algorithms characterize the separability between the object and the background with their means and covariance matrices, and achieve higher classification accuracy than Oza's online boosting. Theoretical and empirical studies show that the SMBoost algorithms are as accurate as batch AdaBoost. Experimental results on challenging video sequences show improvements in tracking precision, efficiency, and robustness compared with previous appearance models. Moreover, our algorithms can be applied to other online feature selection problems in computer vision.

References

1. A. Jepson, D. Fleet, and T. El-Maraghi, “Robust online appearancemodels for visual tracking,” IEEE Trans. Pattern Anal. Mach. Intell.25(10), 1296–1311 (2003).

2. M. Black and A. Jepson, “Eigentracking: robust matching and trackingof articulated objects using a view-based representation,” Int. J.Comput. Vis. 26(1), 63–84 (1998).

3. D. Ross et al., “Incremental learning for robust visual tracking,” Int. J.Comput. Vis. 77(1), 125–141 (2008).

4. X. Mei and H. Ling, "Robust visual tracking using l1 minimization," in Proc. IEEE Int. Conf. Computer Vision (ICCV), pp. 1436–1443, IEEE, Kyoto (2009).

5. K. Zhang and H. Song, “Real-time visual tracking via online weightedmultiple instance learning,” Pattern Recogn. 46(1), 397–411 (2013).

6. S. Avidan, “Ensemble tracking,” IEEE Trans. Pattern Anal. Mach.Intell. 29(2), 261–271 (2007).

7. H. Grabner, M. Grabner, and H. Bischof, "Real-time tracking via on-line boosting," in Proc. BMVC, Vol. 1, pp. 47–56 (2006).

8. H. Grabner, C. Leistner, and H. Bischof, “Semi-supervised on-lineboosting for robust tracking,” Lec. Notes Comput. Sci. 5302, 234–247 (2008).

Fig. 15 Tracking result of SMBl (red) and MIL (aqua). (a) Seq. dollar: SMBl tracks a single surface of the object very well and is stable against large appearance changes; both SMBl and MIL stay on the original surface. (b) Seq. twinnings: when the box rotates, SMBl still tracks the previous surface, but MIL switches to a new surface.

Table 6 Benchmark result (in fps).

SMBa[25/250] SMBb[25/250] SMBr[25/250] SMBl[25/250] OB[25/250] CT[25]

134 43 46 109 77 42

SMBa[50/250] SMBb[50/250] SMBr[50/250] SMBl[50/250] OB[50/250] CT[50]

97 24 26 69 22 33


9. B. Babenko, M.-H. Yang, and S. Belongie, "Robust object tracking with online multiple instance learning," IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1619–1632 (2011).

10. B. Zeisl et al., “On-line semi-supervised multiple-instance boosting,” inIEEE Conf. Computer Vision and Pattern Recognition (CVPR 2010),p. 1879, IEEE, San Francisco, California (2010).

11. N. Oza and S. Russell, Online Ensemble Learning, University ofCalifornia, Berkeley (2001).

12. H. Grabner and H. Bischof, "On-line boosting and vision," in IEEE Computer Society Conf. Computer Vision and Pattern Recognition, Vol. 1, pp. 260–267, IEEE (2006).

13. Z. Kalal, J. Matas, and K. Mikolajczyk, "P-N learning: bootstrapping binary classifiers by structural constraints," in IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2010), pp. 49–56, IEEE, San Francisco, California (2010).

14. C. Shen and H. Li, “On the dual formulation of boosting algorithms,”IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2216–2231 (2010).

15. R. E. Schapire et al., “Boosting the margin: a new explanation for theeffectiveness of voting methods,” Ann. Stat. 26(5), 1651–1686 (1998).

16. M. Yang and Y. Wu, “Tracking non-stationary appearances anddynamic feature selection,” in IEEE Computer Society Conf.Computer Vision and Pattern Recognition (CVPR 2005), Vol. 2,pp. 1059–1066, IEEE (2005).

17. R. Collins, Y. Liu, and M. Leordeanu, “Online selection of discrimina-tive tracking features,” IEEE Trans. Pattern Anal. Mach. Intell. 27(10),1631–1643 (2005).

18. J. Wu et al., “Fast asymmetric learning for cascade face detection,”IEEE Trans. Pattern Anal. Mach. Intell. 30(3), 369–382 (2007).

19. G. J. Székely, M. L. Rizzo, and N. K. Bakirov, “Measuring and testingdependence by correlation of distances,” Ann. Stat. 35(6), 2769–2794(2007).

20. F. Porikli, O. Tuzel, and P. Meer, "Covariance tracking using model update based on means on Riemannian manifolds," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Vol. 1, pp. 728–735, IEEE (2006).

21. H. He and E. Garcia, “Learning from imbalanced data,” IEEE Trans.Knowledge Data Eng. 21(9), 1263–1284 (2009).

22. W. Fan et al., “AdaCost: misclassification cost-sensitive boosting,”in Int. Conf. Machine Learning, pp. 97–105, Morgan KaufmannPublishers Inc., San Francisco, California (1999).

23. Y. Sun et al., "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognit. 40(12), 3358–3378 (2007).

24. M. Pham and T. Cham, “Online learning asymmetric boosted classifiersfor object detection,” in IEEE Conf. Computer Vision and PatternRecognition (CVPR 2007), pp. 1–18, IEEE, Minneapolis (2007).

25. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, JohnWiley & Sons, Inc., New York (2002).

26. J. H. Friedman, “Greedy function approximation: a gradient boostingmachine,” Ann. Stat. 29(5), 1189–1232 (2001).

27. A. Frank and A. Asuncion, UCI Machine Learning Repository,University of California, Irvine, School of Information and ComputerSciences (2010).

28. K. Zhang, L. Zhang, and M.-H. Yang, "Real-time compressive tracking," in Computer Vision – ECCV 2012, D. Hutchison et al., Eds., Vol. 7574, pp. 864–877, Springer, Berlin, Heidelberg (2012).

29. M. Everingham et al., "The PASCAL visual object classes (VOC) challenge," Int. J. Comput. Vis. 88(2), 303–338 (2010).

Biographies and photographs of the authors are not available.
