

Vis Comput
DOI 10.1007/s00371-013-0793-5

ORIGINAL ARTICLE

Object joint detection and tracking using adaptive multiple motion models

Zhijie Wang · Mohamed Ben Salah · Hong Zhang

Z. Wang · M. Ben Salah (✉) · H. Zhang
Department of Computing Science, University of Alberta, Edmonton, Canada
e-mail: [email protected]

Z. Wang
e-mail: [email protected]

H. Zhang
e-mail: [email protected]

© Springer-Verlag Berlin Heidelberg 2013

Abstract This paper deals with the problem of detecting objects that may switch between different motion models. In order to accurately detect these moving objects taking into account possible changing motion models, we propose an adaptive multi-motion model in the joint detection and tracking (JDT) framework. The proposed technique differs from the existing JDT-based methods mainly in two ways. First we express the solution in the JDT framework via a formulation in the multiple motion model setting. Second, we introduce a new motion model prediction function which exploits the correlation between the motion model and object kinematic state. Experiments on both synthetic and real videos demonstrate that the JDT method employing the proposed adaptive multi-motion model can detect objects more accurately than the existing peer methods when objects change their motion models.

Keywords Joint detection and tracking · Multi-motion model · Object kinematic state

1 Introduction

In object detection and tracking, two main problems attract researchers’ attention:

(1) variable motion models of the moving objects, and
(2) varying number of moving objects.

In this work, we investigate both topics, i.e., we tackle the problem of detecting a varying number of objects undergoing various motion models. These problems have generally been posed in the literature as a hybrid state estimation problem, that is, estimation of a partially observed stochastic process with discrete- and continuous-valued states [1]. Among all the hybrid state estimation schemes, the interacting multiple model (IMM) estimator [2, 3] is the most cost-effective and has been widely adopted. IMM can estimate the state of a dynamic system with several behavior modes which can “switch” from one to another [1]. Thus, it can be used to model a variety of problems with multiple behavior modes. In fact, IMM has been employed in some work to tackle each of the two previously mentioned problems independently: variable motion models and a varying number of objects.

Concerning the first problem, many IMM estimators employ a discrete state variable indexing the motion models, along with a continuous state variable consisting of kinematic components, to track objects that change their motion models. For example, IMM is used with Kalman filters for describing maneuvering target dynamics, such as IMM-EKF, IMM-UKF, etc. [4, 5]. Also, to handle problems with nonlinear dynamics and measurements, IMM is combined with particle filters (PF) in [6–8] to track maneuvering targets. In these PF implementations, one potential problem is that the number of particles corresponding to a specific mode is proportional to the mode probability. If the mode probability is very low, only a very small fraction of the particles resides in that mode, and this causes numerical problems [9]. For this reason, the authors in [9] combined IMM with a regularized particle filter so that a fixed number of particles is assigned to each mode.


In addition to these traditional dynamical systems, IMM has also been combined with Gaussian Process Dynamic Models (GPDM) [10]. This mainly aims to handle the high dimensionality of the motion states effectively. All these methods are intended to deal with the problem of variable motion models. However, they do not take into account the second problem, a varying number of objects.

Regarding the second problem, detection of a varying number of objects, many methods have been proposed based on IMM estimators and employing a discrete variable indicating the number of present objects. Such methods are commonly referred to as joint detection and tracking (JDT) methods. This is mainly because the detection decision is made by tracking the discrete object state corresponding to the number of present objects. In other words, detection and tracking of objects are performed together in a joint framework. Many methods have been proposed on various issues of JDT since Ristic et al. introduced the basic JDT framework and its applications in their book [11]. Rutten et al. proposed two different particle filters for JDT [12]. The first one is an orthodox SIR filter augmenting the target state space with an existence variable. The second, in contrast, is derived such that the probability of existence is calculated using the weights of the particles, so that the target existence does not need to be included explicitly as a part of the state vector. JDT methods were then generalized from single object detection to multiple object detection in [13] and [14]. They have also been applied in many applications for detecting different types of objects. For example, [15] jointly detected and tracked unresolved targets with monopulse radar. Additionally, in [16] and [17], Czyz et al. applied a JDT method to detect objects in color videos by using a color observation model. A scale-invariant feature transform (SIFT)-based particle filter algorithm has also been presented in [18] for joint detection and tracking of independently moving objects in stereo sequences observed by uncalibrated moving cameras. Although all these methods deal with a varying number of objects of interest, they assume the objects undergo a single motion model. Hence, they are not suitable for problems where objects experience various motion model changes.

In this paper, we propose a component, formulated in the JDT framework, which addresses the two previously mentioned problems. The novelty of the proposed method can be summarized on two levels. First, we use an IMM estimator formulated in the JDT framework which not only embeds multiple motion models, but also takes into account the varying number of objects. Second, a novel motion model prediction function is introduced. With this function, prediction of the current motion model is based not on the previous model alone, but also on the object kinematic state. Recall that in the existing IMM estimators, prediction of the interacting motion models follows the basic assumption of the Markov chain. In other words, the current active model is correlated only with the previous model, and the correlation is described by a fixed model transition matrix. For example, in [9, 19, 20] the IMM estimator is employed to manage switching between different motion models, and a fixed transition matrix is used to determine the probability of transition from one motion model to another. In [10, 21, 22], the IMM estimator is instead employed to manage switching between different gesture and expression states. A fixed transition matrix is also employed to define the probability of transition from one gesture/expression to another. Furthermore, IMM is also used to estimate the varying number of objects in [13–18]. In these methods, the probability of transition from a given number of objects to another one is defined by a fixed transition matrix. In all the methods mentioned above, the prediction of the current interacting model is based only on the previous model. However, in real applications, the interacting model is generally strongly correlated with the object kinematic state. Taking this into account, we introduce here a model prediction function where the current model is also correlated with the previous object kinematic state. In this way, any prior knowledge about model switching can be exploited to achieve an accurate model prediction, and eventually accurate object detection results. Notice that this new motion model prediction function is general enough to be adaptable to various interacting models. In this paper, for the sake of clarity, we limit ourselves to interacting motion models.

The remainder of this paper is organized as follows. In Sect. 2, the general IMM solution for joint detection and tracking of objects with a single motion model is first described, and then the new solution using adaptive multiple motion models is introduced. The implementation of the proposed solution with a particle filter is detailed in Sect. 3. Section 4 includes the experimental results that illustrate the performance of the proposed method, and finally a conclusion is drawn in Sect. 5.

2 Joint detection and tracking using adaptive multiple motion models

In this section, we first introduce the existing IMM solution for joint detection and tracking of objects using a single motion model. Then, we describe how this formulation is generalized to support detection with multiple motion models.

2.1 Object detection with a single motion model

In the JDT framework, the IMM estimator tackles the object detection problem by recursively computing the posterior probability density function (pdf) $p(X_t \mid \mathbf{Z}_t)$. $\mathbf{Z}_t$ includes all the observations up to time $t$, $\{Z_1, \ldots, Z_t\}$. $X_t$ is the hybrid object state defined as follows:

$$X_t = [x_t \;\; E_t]. \tag{1}$$

Fig. 1 The graphical model showing the relationship between the observations and the object states

$x_t$ describes the objects’ kinematic state at time $t$, such as location and velocity. $E_t \in \{0, 1, \ldots, M_e\}$ is a discrete existence variable associated with a set of interacting models indicating the number of objects present at time instant $t$. Here, we set the maximum number $M_e$ to one for simplicity of explanation; for further details regarding the extension to multiple objects, please refer to [16]. In this way, $E_t \in \{0, 1\}$ indicates whether the object is present at time $t$ or not, and the decision is reached by maximizing the conditional probability $P(E_t \mid \mathbf{Z}_t)$, which is estimated as explained in the following.

The probability of an object being present at the current frame is estimated by marginalizing the posterior pdf $p(x_t, E_t = 1 \mid \mathbf{Z}_t)$ over the kinematic state $x_t$,

$$P(E_t = 1 \mid \mathbf{Z}_t) = \int p(x_t, E_t = 1 \mid \mathbf{Z}_t)\, dx_t. \tag{2}$$

$p(x_t, E_t = 1 \mid \mathbf{Z}_t)$ is, in turn, expanded using the Bayes rule and the fact that $Z_t$ is conditionally independent from $\mathbf{Z}_{t-1}$ given $x_t$ and $E_t$ (refer to the graphical model in Fig. 1),¹

$$p(x_t, E_t = 1 \mid \mathbf{Z}_t) = \frac{p(Z_t \mid x_t, E_t = 1)\, p(x_t, E_t = 1 \mid \mathbf{Z}_{t-1})}{p(Z_t \mid \mathbf{Z}_{t-1})}. \tag{3}$$

$p(Z_t \mid x_t, E_t = 1)$ is the object appearance model, which updates the predicted hybrid state according to the current observation and can be defined according to the application. $p(x_t, E_t = 1 \mid \mathbf{Z}_{t-1})$ is the predicted hybrid state function, estimated by marginalizing the mixed probability distribution $p(x_t, E_t = 1, E_{t-1} \mid \mathbf{Z}_{t-1})$ over the previous existence variable $E_{t-1}$ as follows:

$$
\begin{aligned}
p(x_t, E_t = 1 \mid \mathbf{Z}_{t-1})
&= p(x_t, E_t = 1, E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) + p(x_t, E_t = 1, E_{t-1} = 1 \mid \mathbf{Z}_{t-1}) \\
&= p(x_t, E_t = 1, E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) + \int p(x_t, E_t = 1, x_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1})\, dx_{t-1}.
\end{aligned}
$$

¹ Note that, following the notation system in [2–4], we use the notation $P$ (capital p) for probability mass functions (pmf), i.e., when random variables are discrete such as $E_t$. The notation $p$ (lower-case) is used both for probability density functions (pdf) and for probabilities on mixed joint variables (discrete and continuous).

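The first term on the right-hand side of the above equality can be simplified as follows:

$$
\begin{aligned}
p(x_t, E_t = 1, E_{t-1} = 0 \mid \mathbf{Z}_{t-1})
&= p(x_t \mid E_t = 1, E_{t-1} = 0, \mathbf{Z}_{t-1}) \, P(E_t = 1, E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) \\
&= p(x_t \mid E_t = 1, E_{t-1} = 0, \mathbf{Z}_{t-1}) \, P(E_t = 1 \mid E_{t-1} = 0, \mathbf{Z}_{t-1}) \, P(E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) \\
&= p_b(x_t)\, P_b\, P(E_{t-1} = 0 \mid \mathbf{Z}_{t-1}).
\end{aligned}
$$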

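The integral term on the right-hand side is simplified by assuming that $x_t$ is conditionally independent from $\mathbf{Z}_{t-1}$ given $x_{t-1}$ and either $E_{t-1}$ or $E_t$, and by the fact that $E_t$ is conditionally independent from $\mathbf{Z}_{t-1}$ given $x_{t-1}$ and $E_{t-1}$. This gives

$$
\begin{aligned}
p(x_t, E_t = 1, x_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1})
&= p(x_t \mid x_{t-1}, E_t = 1, E_{t-1} = 1, \mathbf{Z}_{t-1}) \, p(x_{t-1}, E_t = 1, E_{t-1} = 1 \mid \mathbf{Z}_{t-1}) \\
&= p(x_t \mid x_{t-1}, E_t = 1, E_{t-1} = 1) \, P(E_t = 1 \mid E_{t-1} = 1, x_{t-1}) \, p(x_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1}).
\end{aligned}
$$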

Given that the kinematic state $x_t$ corresponds only to cases where the object is present, and also for simplicity, we henceforth use $p(x_t \mid x_{t-1})$ instead of $p(x_t \mid x_{t-1}, E_t = 1, E_{t-1} = 1)$. The previous two simplifications together give rise to:

$$
p(x_t, E_t = 1 \mid \mathbf{Z}_{t-1}) = p_b(x_t)\, P_b\, P(E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) + \int p(x_t \mid x_{t-1})\,(1 - P_d)\, p(x_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1})\, dx_{t-1}. \tag{4}
$$

Intuitively, the above equation says that the object is either newborn in the current frame (when $E_{t-1} = 0$ while $E_t = 1$), or coming from the previous frame (when $E_{t-1} = 1$ and $E_t = 1$). The transition between $E_{t-1}$ and $E_t$, given the past observations, follows a Markov chain which is specified by a transitional probability matrix (TPM) of this form

$$
T = \begin{bmatrix} 1 - P_b & P_b \\ P_d & 1 - P_d \end{bmatrix}. \tag{5}
$$

$P_b$ denotes the probability of object birth and $P_d$ denotes the probability of object death. $p(x_t \mid x_{t-1})$ in Eq. (4) is the transition function of the object kinematic state specified by the object’s motion model. $p(x_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1})$ is the previous posterior pdf and $P(E_{t-1} = 0 \mid \mathbf{Z}_{t-1})$ is the previous object absence probability. $p_b(x_t)$ is the initial object pdf, where subscript $b$ stands for “birth”. If no prior knowledge is available, it is assumed to be the uniform distribution.

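Similarly, the probability of an object being absent at the current frame is estimated according to

$$
P(E_t = 0 \mid \mathbf{Z}_t) = \frac{p(E_t = 0, Z_t \mid \mathbf{Z}_{t-1})}{p(Z_t \mid \mathbf{Z}_{t-1})}, \tag{6}
$$

$$
\begin{aligned}
p(E_t = 0, Z_t \mid \mathbf{Z}_{t-1})
&= p(Z_t \mid E_t = 0)\, P(E_t = 0 \mid \mathbf{Z}_{t-1}) \\
&= p(Z_t \mid E_t = 0)\, \big( P(E_t = 0, E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) + P(E_t = 0, E_{t-1} = 1 \mid \mathbf{Z}_{t-1}) \big) \\
&= p(Z_t \mid E_t = 0)\, \big( (1 - P_b)\, P(E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) + P_d\, P(E_{t-1} = 1 \mid \mathbf{Z}_{t-1}) \big).
\end{aligned} \tag{7}
$$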

Recall here the notation adopted for the probability mass and density functions explained previously. $p(Z_t \mid E_t = 0)$ is the likelihood function when the object is absent, and it is independent of the object state. This function is normally defined to be a constant. At this point, the probability of an object being present, $P(E_t = 1 \mid \mathbf{Z}_t)$, or absent, $P(E_t = 0 \mid \mathbf{Z}_t)$, can be estimated following Eqs. (2) and (6), and an object is detected if $P(E_t = 1 \mid \mathbf{Z}_t)$ is greater than $P(E_t = 0 \mid \mathbf{Z}_t)$.
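The recursion of Eqs. (2) to (7) can be sketched on a discretized one-dimensional state space, where the integrals become sums over grid cells. This is only an illustrative sketch: the function name `jdt_step`, the three-cell grid, and all numerical values are assumptions made for the example and are not taken from the paper, which implements the filter with particles (Sect. 3).

```python
from typing import List, Tuple


def jdt_step(
    prev_post: List[float],       # p(x_{t-1}, E_{t-1}=1 | Z_{1..t-1}) per grid cell
    prev_absent: float,           # P(E_{t-1}=0 | Z_{1..t-1})
    trans: List[List[float]],     # trans[j][i] = p(x_t=i | x_{t-1}=j), rows sum to 1
    birth: List[float],           # p_b(x_t), sums to 1
    lik_present: List[float],     # p(Z_t | x_t, E_t=1) per grid cell
    lik_absent: float,            # p(Z_t | E_t=0), a constant
    p_birth: float,               # P_b
    p_death: float,               # P_d
) -> Tuple[List[float], float]:
    """One predict/update cycle; returns (posterior over cells, P(absent))."""
    n = len(prev_post)
    # Eq. (4): birth term plus surviving mass moved by the motion model.
    pred = []
    for i in range(n):
        survive = sum(trans[j][i] * prev_post[j] for j in range(n))
        pred.append(birth[i] * p_birth * prev_absent + (1.0 - p_death) * survive)
    # Bracketed term of Eq. (7): predicted probability of absence.
    prev_present = sum(prev_post)
    pred_absent = (1.0 - p_birth) * prev_absent + p_death * prev_present
    # Eqs. (3) and (6): weight by the likelihoods and normalise by the evidence.
    unnorm = [lik_present[i] * pred[i] for i in range(n)]
    unnorm_absent = lik_absent * pred_absent
    evidence = sum(unnorm) + unnorm_absent
    return [u / evidence for u in unnorm], unnorm_absent / evidence


# Hypothetical example: object initially absent, strong evidence in cell 2.
n = 3
stay = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
post, absent = jdt_step(
    prev_post=[0.0, 0.0, 0.0], prev_absent=1.0, trans=stay,
    birth=[1.0 / n] * n, lik_present=[0.1, 0.1, 50.0], lik_absent=1.0,
    p_birth=0.1, p_death=0.05,
)
present = sum(post)           # Eq. (2): marginalise over the kinematic state
detected = present > absent   # detection rule stated in the text
```

With these assumed numbers, the strong likelihood in the last cell is enough to outweigh the low birth probability, so the object is declared present after a single frame.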

The JDT method described above assumes that the object kinematic state transition function in Eq. (4), $p(x_t \mid x_{t-1})$, is specified by a dynamic model $x_t = f(x_{t-1}, w(t-1))$. The function $f$ is assumed to be known and determined by a motion model, and $w(t-1)$ is the process noise. However, in many problems, objects may undergo more than one motion model. For example, an aircraft moving at a constant velocity may suddenly accelerate. In this case, a motion model which assumes constant velocity would be ineffective. Indeed, during the acceleration phase the constant velocity assumption would be violated and, thus, the tracker would lose the aircraft. Therefore, to tackle this problem, the above JDT method is extended with adaptive multiple motion models in the following subsection.
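The mismatch described above can be illustrated with a small noise-free sketch: a target that follows a constant-acceleration model is predicted with a constant-velocity model, and the position error grows at every step. The two functions below are hypothetical instances of the dynamic model $f$ (with the process noise set to zero), and the initial state and acceleration are arbitrary values chosen for the example.

```python
from typing import List, Tuple

State = Tuple[float, float]  # (position, velocity)


def f_constant_velocity(state: State, dt: float = 1.0) -> State:
    """Noise-free constant-velocity motion model."""
    pos, vel = state
    return (pos + vel * dt, vel)


def f_constant_acceleration(state: State, accel: float, dt: float = 1.0) -> State:
    """Noise-free constant-acceleration motion model."""
    pos, vel = state
    return (pos + vel * dt + 0.5 * accel * dt * dt, vel + accel * dt)


truth: State = (0.0, 10.0)   # the target actually accelerates
pred: State = (0.0, 10.0)    # the predictor assumes constant velocity
errors: List[float] = []
for _ in range(5):
    truth = f_constant_acceleration(truth, accel=2.0)
    pred = f_constant_velocity(pred)
    errors.append(truth[0] - pred[0])

print(errors)  # [1.0, 4.0, 9.0, 16.0, 25.0]
```

Without correction, the error grows quadratically in the number of steps, which is why a single pre-fixed motion model is not enough once the target changes its behavior.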

Fig. 2 (a) The prediction strategies of the object kinematic state, $x_t$, in the existing JDT methods, and of the motion model, $\alpha_t$, in the existing model prediction functions. (b) The prediction strategy of the object kinematic state and the motion model in the proposed JDT method

2.2 Object detection with multiple motion models

To improve the above JDT method so that it can accurately detect objects that undergo multiple motion models, two major modifications are introduced. In summary, the first modification combines two object state formulations: $[x_t, E_t]$, which tackles the varying number of objects, and $[x_t, \alpha_t]$, which tackles the varying object motion models. The combined object state $[x_t, \alpha_t, E_t]$, along with a new object kinematic state transition function $p(x_t \mid x_{t-1}, \alpha_{t-1})$, makes it possible to take multiple motion models into account in the JDT framework. Intuitively, the prediction of the object kinematic state $x_t$ depends not only on the previous object kinematic state $x_{t-1}$ (as is generally assumed in the original transition function $p(x_t \mid x_{t-1})$), but also on the previous motion model $\alpha_{t-1}$ (see the illustration in Fig. 2). In the second modification, we formulate the JDT with multiple motion models in an adaptive way by employing a novel motion model prediction function $P(\alpha_t \mid x_{t-1}, \alpha_{t-1})$, which extends the original one, $P(\alpha_t \mid \alpha_{t-1})$. Intuitively, the prediction of the object motion model $\alpha_t$ depends not only on the previous motion model $\alpha_{t-1}$, but also on the previous object kinematic state $x_{t-1}$. Each of these two modifications is explained in detail in the following.

2.2.1 Multiple motion model formulation

First, we formulate the JDT framework with multiple motion models. As mentioned before, there is normally one interacting model set in each IMM estimator. This model set either embeds multiple motion models, whose purpose is tracking objects that undergo different types of motion during a single sequence, or it is used to detect a varying number of objects (mainly to detect newly present objects in the scene). However, when these two problems occur simultaneously, i.e., when objects that may change their motion models need to be detected, one model set is not enough. Therefore, as the first modification, we formulate the IMM estimator in JDT methods to contain an additional set of motion models besides its original set of object presence models.

To this end, a discrete variable $\alpha_t$ that allows the JDT framework to switch between different motion models is added to the object state $[x_t \;\; E_t]$. Hence, the original hybrid object state is expanded to

$$X_t = [x_t \;\; \alpha_t \;\; E_t], \tag{8}$$

where $\alpha_t \in \{1, \ldots, M_m\}$ and $M_m$ is the number of possible motion models. Thanks to the new variable $\alpha_t$, the filter can switch between different motion models and select the one which best fits the object’s actual motion, thus leading to better tracking. When the model set is chosen properly, the object should be tracked accurately even when its motion varies.

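By taking into account this additional motion model variable, the original predicted hybrid state function in Eq. (4) is rewritten as follows:

$$
\begin{aligned}
p(x_t, \alpha_t, E_t = 1 \mid \mathbf{Z}_{t-1})
&= p(x_t, \alpha_t, E_t = 1, E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) + p(x_t, \alpha_t, E_t = 1, E_{t-1} = 1 \mid \mathbf{Z}_{t-1}) \\
&= p(x_t, \alpha_t, E_t = 1, E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) + \sum_{\alpha_{t-1}} \int p(x_t, \alpha_t, E_t = 1, x_{t-1}, \alpha_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1})\, dx_{t-1}.
\end{aligned}
$$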

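Similar to the derivation of Eq. (4), the first term on the right-hand side of the above equality can be simplified as follows:

$$
\begin{aligned}
p(x_t, \alpha_t, E_t = 1, E_{t-1} = 0 \mid \mathbf{Z}_{t-1})
&= p(x_t \mid \alpha_t, E_t = 1, E_{t-1} = 0, \mathbf{Z}_{t-1}) \, P(\alpha_t, E_t = 1, E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) \\
&= p(x_t \mid \alpha_t, E_t = 1, E_{t-1} = 0, \mathbf{Z}_{t-1}) \, P(\alpha_t \mid E_t = 1, E_{t-1} = 0, \mathbf{Z}_{t-1}) \, P(E_t = 1, E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) \\
&= p(x_t \mid \alpha_t, E_t = 1, E_{t-1} = 0, \mathbf{Z}_{t-1}) \, P(\alpha_t \mid E_t = 1, E_{t-1} = 0, \mathbf{Z}_{t-1}) \, P(E_t = 1 \mid E_{t-1} = 0, \mathbf{Z}_{t-1}) \, P(E_{t-1} = 0 \mid \mathbf{Z}_{t-1}) \\
&= p_b(x_t)\, P_b(\alpha_t)\, P_b\, P(E_{t-1} = 0 \mid \mathbf{Z}_{t-1}).
\end{aligned}
$$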

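The integral and summation term on the right-hand side is simplified by the following assumptions. First, $x_t$ is assumed to be conditionally independent from $\alpha_t$ and $\mathbf{Z}_{t-1}$ given $x_{t-1}$, $\alpha_{t-1}$ and either $E_{t-1}$ or $E_t$. Second, $\alpha_t$ is independent of $\mathbf{Z}_{t-1}$ given $x_{t-1}$, $\alpha_{t-1}$ and either $E_{t-1}$ or $E_t$. Third, the existence variable follows the first-order Markov assumption, i.e., $E_t$ depends only on the previous state $E_{t-1}$. This gives

$$
\begin{aligned}
&p(x_t, \alpha_t, E_t = 1, x_{t-1}, \alpha_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1}) \\
&\quad= p(x_t \mid \alpha_t, E_t = 1, x_{t-1}, \alpha_{t-1}, E_{t-1} = 1, \mathbf{Z}_{t-1}) \, p(\alpha_t, E_t = 1, x_{t-1}, \alpha_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1}) \\
&\quad= p(x_t \mid \alpha_t, E_t = 1, x_{t-1}, \alpha_{t-1}, E_{t-1} = 1, \mathbf{Z}_{t-1}) \, P(\alpha_t \mid E_t = 1, x_{t-1}, \alpha_{t-1}, E_{t-1} = 1, \mathbf{Z}_{t-1}) \\
&\qquad\times p(E_t = 1, x_{t-1}, \alpha_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1}) \\
&\quad= p(x_t \mid \alpha_t, E_t = 1, x_{t-1}, \alpha_{t-1}, E_{t-1} = 1, \mathbf{Z}_{t-1}) \, P(\alpha_t \mid E_t = 1, x_{t-1}, \alpha_{t-1}, E_{t-1} = 1, \mathbf{Z}_{t-1}) \\
&\qquad\times P(E_t = 1 \mid x_{t-1}, \alpha_{t-1}, E_{t-1} = 1, \mathbf{Z}_{t-1}) \, p(x_{t-1}, \alpha_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1}) \\
&\quad= p(x_t \mid x_{t-1}, \alpha_{t-1}) \, P(\alpha_t \mid x_{t-1}, \alpha_{t-1}) \, (1 - P_d) \, p(x_{t-1}, \alpha_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1}).
\end{aligned}
$$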

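Following Eq. (4), we use $p(x_t \mid x_{t-1}, \alpha_{t-1})$ instead of $p(x_t \mid x_{t-1}, \alpha_{t-1}, E_t = 1, E_{t-1} = 1)$. The previous two simplifications together give rise to

$$
\begin{aligned}
p(x_t, \alpha_t, E_t = 1 \mid \mathbf{Z}_{t-1})
&= \sum_{\alpha_{t-1}} \int p(x_t \mid x_{t-1}, \alpha_{t-1}) \, P(\alpha_t \mid x_{t-1}, \alpha_{t-1}) \, (1 - P_d) \, p(x_{t-1}, \alpha_{t-1}, E_{t-1} = 1 \mid \mathbf{Z}_{t-1})\, dx_{t-1} \\
&\quad + p_b(x_t)\, P_b(\alpha_t)\, P_b\, P(E_{t-1} = 0 \mid \mathbf{Z}_{t-1}).
\end{aligned} \tag{9}
$$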

As before, $P_b(\alpha_t)$ is the probability of the initial object motion model, where subscript $b$ stands for “birth”; if no prior knowledge is available, it may be assumed to be uniformly distributed. Unlike the original predicted hybrid state function in Eq. (4), the object kinematic state transition function $p(x_t \mid x_{t-1}, \alpha_{t-1}, E_t = 1, E_{t-1} = 1)$ in Eq. (9) is characterized by its $\alpha_{t-1}$th motion model. Here $\alpha_{t-1}$ is one of the object’s possible motion models, rather than a unique pre-fixed motion model as in the original formulation.
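To make the structure of Eq. (9) concrete, the prediction step can be sketched on a discretized state space, with the integral over $x_{t-1}$ replaced by a sum over grid cells. The function name `predict_multi_model`, the two toy motion models ("stay" and "shift right"), and all numbers are assumptions made for this illustration; the paper's own implementation uses a particle filter (Sect. 3).

```python
from typing import Callable, List


def predict_multi_model(
    prev_post: List[List[float]],    # prev_post[a][j] = p(x_{t-1}=j, alpha_{t-1}=a, E_{t-1}=1 | past obs.)
    prev_absent: float,              # P(E_{t-1}=0 | past obs.)
    trans: List[List[List[float]]],  # trans[a][j][i] = p(x_t=i | x_{t-1}=j, alpha_{t-1}=a)
    model_pred: Callable[[int, int, int], float],  # model_pred(b, j, a) = P(alpha_t=b | x_{t-1}=j, alpha_{t-1}=a)
    birth_x: List[float],            # p_b(x_t)
    birth_model: List[float],        # P_b(alpha_t)
    p_birth: float,                  # P_b
    p_death: float,                  # P_d
) -> List[List[float]]:
    """Discretised Eq. (9): returns pred[b][i] = p(x_t=i, alpha_t=b, E_t=1 | past obs.)."""
    n_models, n = len(prev_post), len(birth_x)
    pred = [[0.0] * n for _ in range(n_models)]
    for b in range(n_models):
        for i in range(n):
            survive = 0.0
            for a in range(n_models):        # sum over alpha_{t-1}
                for j in range(n):           # "integral" over x_{t-1}
                    survive += trans[a][j][i] * model_pred(b, j, a) * prev_post[a][j]
            born = birth_x[i] * birth_model[b] * p_birth * prev_absent
            pred[b][i] = (1.0 - p_death) * survive + born
    return pred


# Hypothetical setup: model 0 keeps the cell, model 1 shifts one cell to the right.
stay = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
PI = [[0.9, 0.1], [0.0, 1.0]]  # assumed fixed model transition matrix


def fixed_model_pred(cur: int, x_prev: int, prev: int) -> float:
    """Markov-chain prediction of Eq. (10): ignores the kinematic state."""
    return PI[prev][cur]


pred = predict_multi_model(
    prev_post=[[0.8, 0.0, 0.0], [0.0, 0.0, 0.0]], prev_absent=0.2,
    trans=[stay, shift], model_pred=fixed_model_pred,
    birth_x=[1.0 / 3] * 3, birth_model=[0.5, 0.5],
    p_birth=0.1, p_death=0.05,
)
total_present = sum(sum(row) for row in pred)  # (1 - P_d) * 0.8 + P_b * 0.2
```

Passing the model prediction as a callable is a design choice of this sketch: the fixed transition matrix of Eq. (10) can later be swapped for a state-dependent function without touching the prediction loop.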

2.2.2 Motion model prediction function

Once the formulation has been extended to handle multiple motion models, we introduce a novel adaptive motion model prediction function. In the existing approaches, the transition of the motion model variable $\alpha_t$ is modeled by a Markov chain, as discussed in Sect. 1. In other words, the motion model prediction function $P(\alpha_t \mid x_{t-1}, \alpha_{t-1})$ is assumed to depend only on the previous model:

$$P(\alpha_t \mid x_{t-1}, \alpha_{t-1}) \equiv P(\alpha_t \mid \alpha_{t-1}), \tag{10}$$

and the transitions are specified by an $M_m \times M_m$ transitional probability matrix, $\Pi = [\pi_{ij}]$, where

$$\pi_{ij} = P(\alpha_t = j \mid \alpha_{t-1} = i), \quad i, j \in \{1, \ldots, M_m\}. \tag{11}$$
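As a brief illustration of Eqs. (10) and (11), the following sketch propagates a vector of motion model probabilities through a fixed transition matrix. The matrix entries and the helper name `propagate_modes` are assumptions for the example only; the point is that the predicted probabilities do not depend on where the object is.

```python
from typing import List


def propagate_modes(mode_probs: List[float], tpm: List[List[float]]) -> List[float]:
    """P(alpha_t = j) = sum_i P(alpha_{t-1} = i) * pi_ij, with tpm[i][j] = pi_ij."""
    m = len(tpm)
    return [sum(mode_probs[i] * tpm[i][j] for i in range(m)) for j in range(m)]


# Assumed two-model matrix: model 0 rarely switches to model 1, model 1 never switches back.
PI = [[0.97, 0.03],
      [0.00, 1.00]]

after_one = propagate_modes([1.0, 0.0], PI)   # one step from "certainly model 0"
after_two = propagate_modes(after_one, PI)    # two steps
```

Under this fixed matrix the probability of being in the second model increases by the same rule at every step, regardless of the kinematic state, which is the limitation that the adaptive prediction function is meant to address.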

In this paper, we rewrite the above prediction function to take into account the object’s previous kinematic state in addition to the previous motion model. In this way, prior knowledge on when and/or where to switch to a certain motion model can be exploited to predict the current model. Applying the Bayes rule, and assuming for simplicity that $x_{t-1}$ and $\alpha_{t-1}$ are conditionally independent given $\alpha_t$, the new motion model prediction function is written as

$$
P(\alpha_t \mid x_{t-1}, \alpha_{t-1}) = \frac{p(x_{t-1} \mid \alpha_t)\, P(\alpha_{t-1}, \alpha_t)}{\sum_{\alpha_t} p(x_{t-1} \mid \alpha_t)\, P(\alpha_{t-1}, \alpha_t)}. \tag{12}
$$

Compared with the corresponding function in Eq. (10), the new function embeds an additional correlation between the object motion model and the kinematic state. This function indicates the likelihood of switching to a certain model $\alpha_t$ given both the previous model $\alpha_{t-1}$ and the kinematic state $x_{t-1}$. It can easily be seen that when the second correlation is not taken into account, Eq. (12) simply degenerates to Eq. (10). To determine this prediction function, the two terms on the right-hand side of Eq. (12), $P(\alpha_{t-1}, \alpha_t)$ and $p(x_{t-1} \mid \alpha_t)$, need to be estimated following the next three steps.
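For completeness, the intermediate steps behind Eq. (12) can be spelled out as follows. This is a reconstruction from the stated assumptions (the Bayes rule and the conditional independence of $x_{t-1}$ and $\alpha_{t-1}$ given $\alpha_t$), not a derivation quoted from the paper:

$$
\begin{aligned}
P(\alpha_t \mid x_{t-1}, \alpha_{t-1})
&= \frac{p(x_{t-1}, \alpha_{t-1} \mid \alpha_t)\, P(\alpha_t)}{\sum_{\alpha_t} p(x_{t-1}, \alpha_{t-1} \mid \alpha_t)\, P(\alpha_t)} \\
&= \frac{p(x_{t-1} \mid \alpha_t)\, P(\alpha_{t-1} \mid \alpha_t)\, P(\alpha_t)}{\sum_{\alpha_t} p(x_{t-1} \mid \alpha_t)\, P(\alpha_{t-1} \mid \alpha_t)\, P(\alpha_t)} \\
&= \frac{p(x_{t-1} \mid \alpha_t)\, P(\alpha_{t-1}, \alpha_t)}{\sum_{\alpha_t} p(x_{t-1} \mid \alpha_t)\, P(\alpha_{t-1}, \alpha_t)}.
\end{aligned}
$$

If the kinematic state carries no information about the model, i.e., $p(x_{t-1} \mid \alpha_t) = c$ for every $\alpha_t$, the constant cancels and

$$
P(\alpha_t \mid x_{t-1}, \alpha_{t-1}) = \frac{P(\alpha_{t-1}, \alpha_t)}{\sum_{\alpha_t} P(\alpha_{t-1}, \alpha_t)} = \frac{P(\alpha_{t-1}, \alpha_t)}{P(\alpha_{t-1})} = P(\alpha_t \mid \alpha_{t-1}),
$$

which is the Markov-chain prediction of Eq. (10).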

• Collecting Training Samples: A training set of $M$ samples needs to be collected from a sequence as follows:

$$
\big\{ (\alpha_t^1, x_{t-1}^1, \alpha_{t-1}^1),\; (\alpha_t^2, x_{t-1}^2, \alpha_{t-1}^2),\; \ldots,\; (\alpha_t^M, x_{t-1}^M, \alpha_{t-1}^M) \big\}.
$$

Each sample triplet contains the motion model at a certain time step, and the object kinematic state and motion model at the previous time step.

• Learning $P(\alpha_{t-1}, \alpha_t)$: The first term that needs to be determined is the joint probability of $\alpha_{t-1}$ and $\alpha_t$, denoted by $P(\alpha_{t-1}, \alpha_t)$. It can be estimated as the relative frequency of the event $(\alpha_{t-1}, \alpha_t)$ occurring in the training set, by the standard method for maximum-likelihood parameter learning [23]:

$$
P(\alpha_{t-1} = i, \alpha_t = j) = \frac{\sum_s \delta\big( (\alpha_t^s, x_{t-1}^s, \alpha_{t-1}^s),\, (j, x_{t-1}^s, i) \big)}{M}, \tag{13}
$$

where $\delta$ is the Kronecker delta function: $\delta(a, b) = 1$ if $a = b$, and zero otherwise.

• Learning $p(x_{t-1} \mid \alpha_t)$: The second term that needs to be determined, $p(x_{t-1} \mid \alpha_t)$, is a probability density which can be estimated by various parametric and nonparametric methods [23]. Parametric methods assume a certain model for the pdf to be estimated, and then fit the parameters of that family of models to the observed data set. Examples of the parametric models include Gaussian, Poisson, and Beta distributions. The problem with the parametric methods is that they often oversimplify the underlying distribution complexity of the data from the real world. In contrast to parametric methods, nonparametric methods allow the underlying distribution complexity to grow with the data. Therefore they are preferred in estimating the probability density function $p(x_{t-1} \mid \alpha_t)$, because the correlation conveyed by this distribution is often complicated and varies with the applications. Popular nonparametric methods include histogram and kernel density estimation (KDE), and the latter is employed in our work because it provides better performance than the former [24]. Hence, an individual probability density function $p(x_{t-1} \mid \alpha_t = i)$ can be estimated for each model from the training subset $Y(i) = \{ (\alpha_t^s, x_{t-1}^s, \alpha_{t-1}^s) \mid \alpha_t^s = i,\ s = 1 : M \}$:

$$
p(x_{t-1} \mid \alpha_t = i) = \frac{1}{|Y(i)|} \sum_{(\alpha_t^s,\, x_{t-1}^s,\, \alpha_{t-1}^s) \in Y(i)} K(x_{t-1}, x_{t-1}^s), \tag{14}
$$

where $K(x_{t-1}, x_{t-1}^s)$ is a kernel function, and the most popular one is the Gaussian [23]. At this point, the adaptive motion model prediction function $P(\alpha_t \mid x_{t-1}, \alpha_{t-1})$ can be determined.

Fig. 3 Detecting large lumps in an oil sands video stream. (a) A view of the scene where a large lump is present in the feed just before falling in the crusher. (b) Region of interest which mainly comprises the dirt that is to be processed. (c) Detection and localization of the large lump
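The three learning steps above can be sketched end to end for a scalar kinematic state. The sketch assumes a Gaussian kernel with a hand-picked bandwidth `h`, and the six training triplets are invented for illustration; neither the bandwidth nor the data come from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[int, float, int]  # (alpha_t, x_{t-1}, alpha_{t-1})


def learn_joint(samples: List[Sample]) -> Dict[Tuple[int, int], float]:
    """Eq. (13): relative frequency of each (alpha_{t-1}, alpha_t) pair."""
    counts = Counter((prev, cur) for cur, _x, prev in samples)
    return {pair: c / len(samples) for pair, c in counts.items()}


def gaussian_kernel(x: float, xs: float, h: float) -> float:
    """Gaussian kernel K(x, xs) with bandwidth h (an assumed choice)."""
    z = (x - xs) / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))


def kde(x: float, samples: List[Sample], model: int, h: float) -> float:
    """Eq. (14): density of x_{t-1} given alpha_t = model, from the subset Y(model)."""
    xs = [x_prev for cur, x_prev, _prev in samples if cur == model]
    if not xs:
        return 0.0
    return sum(gaussian_kernel(x, s, h) for s in xs) / len(xs)


def model_prediction(cur: int, x_prev: float, prev: int, samples: List[Sample],
                     joint: Dict[Tuple[int, int], float], models: List[int],
                     h: float) -> float:
    """Eq. (12): P(alpha_t = cur | x_{t-1} = x_prev, alpha_{t-1} = prev)."""
    scores = {m: kde(x_prev, samples, m, h) * joint.get((prev, m), 0.0) for m in models}
    total = sum(scores.values())
    return scores[cur] / total if total > 0.0 else 0.0


# Invented training triplets: model 2 is only ever entered at large x_{t-1}.
samples: List[Sample] = [
    (1, 10.0, 1), (1, 50.0, 1), (1, 100.0, 1), (1, 150.0, 1),
    (2, 250.0, 1), (2, 260.0, 2),
]
joint = learn_joint(samples)
far = model_prediction(2, 20.0, 1, samples, joint, [1, 2], h=20.0)
near = model_prediction(2, 240.0, 1, samples, joint, [1, 2], h=20.0)
```

In this toy data set the learned function assigns a switch from model 1 to model 2 a probability close to zero at `x_prev = 20` and close to one at `x_prev = 240`, whereas a fixed transition matrix would return the same value at both locations.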

In the following, an example from a real application, large lump detection, is used to illustrate the proposed motion model prediction function more clearly. In the oil sands mining industry, it is extremely important to detect the presence of large frozen lumps in the feed to a crusher, so that precautionary operational procedures can be taken to minimize the risk of crusher jamming [25]. Figure 3 is an example of a large lump present in the feed just before falling into the crusher. The purpose is to detect this event, in real time, by locating the lump as shown in Fig. 3(c).

The object of interest in this problem, a large lump, generally moves from top to bottom along a vertical line, and its motion may be described roughly by two models: a slow vertical acceleration model $m_1$ due to the perspective distortion when a lump moves on the belt (as shown by the two consecutive frames in Figs. 4(a) and 4(b)), and a fast vertical acceleration model $m_2$ due to gravity when the lump reaches the end of the belt and falls down (as shown by the two consecutive frames in Figs. 4(b) and 4(c)). Therefore, an interacting model set containing two motion models $\{m_1, m_2\}$ is needed, and the probability of switching to each motion model is not the same everywhere in the image. For example, it is much more probable that the lump switches to the fast acceleration model $m_2$ when it gets close to the end of the belt than in any other location. Therefore, the kinematic state (the lump location on the row axis in this case) is correlated with the motion model, and this correlation, when learned, helps in predicting the motion model accurately. To achieve this, a training set is collected as described previously,

$$
\{ (m_1, 1, m_1),\; (m_1, 10, m_1),\; \ldots,\; (m_1, 260, m_2) \}.
$$

Fig. 4 Three consecutive frames from the LLD application

Table 1 The joint probability $P(\alpha_{t-1}, \alpha_t)$ in the large lump detection example

Pr(α_{t−1}, α_t)     α_t = m1     α_t = m2
α_{t−1} = m1         0.94         0.03
α_{t−1} = m2         0            0.03

Each sample triplet contains the lump’s motion model at a certain time step, its location on the row axis, and its motion model in the previous time step. In this example, the only kinematic state correlated with the motion model is the lump’s location on the row axis. Based on the training set, the joint probability P(αt−1, αt) is estimated as illustrated in Table 1. Furthermore, the probability density function p(xt−1|αt) can also be estimated by the kernel density estimation method. Figure 5(a) shows the two individually estimated density functions p(xt−1|αt = m1) and p(xt−1|αt = m2). At this point, the motion model prediction function P(αt|xt−1, αt−1) is completely determined. Figure 5(b) compares this prediction function, which takes into account the kinematic state, with the original prediction function, which uses only the transition probability between different motion models. As shown in the figure, the new prediction function P(αt = m2|xt−1, αt−1 = m1) varies depending on the object kinematic state xt−1 (the lump’s location on the row axis). In contrast, the original prediction function P(αt = m2|αt−1 = m1) does not embed such correlation information and is therefore a horizontal line corresponding to a constant transition probability.
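The following Python sketch illustrates how a state-dependent prediction function of this kind could be assembled from the two learned quantities. It is only an illustration under stated assumptions: the combination rule used here, P(αt|xt−1, αt−1) ∝ p(xt−1|αt) P(αt−1, αt), presumes that xt−1 is conditionally independent of αt−1 given αt and may differ from the exact expression derived earlier in the paper. The joint probabilities are those of Table 1, whereas the training row locations, the kernel bandwidth, and all function names are hypothetical.

```python
import numpy as np

# Joint probability P(alpha_{t-1}, alpha_t) from Table 1.
# Rows: previous model (m1, m2); columns: current model (m1, m2).
JOINT = np.array([[0.94, 0.03],
                  [0.00, 0.03]])

def gaussian_kde_1d(samples, bandwidth):
    """Return a function that evaluates a 1-D Gaussian kernel density estimate."""
    samples = np.asarray(samples, dtype=float)
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)

    def pdf(x):
        z = (x - samples) / bandwidth
        return np.exp(-0.5 * z ** 2).sum() / norm

    return pdf

# Hypothetical training rows (NOT the authors' data): the lump location on the
# row axis at time t-1, grouped by the motion model observed at time t.
ROWS_GIVEN_M1 = np.arange(1, 250, 10)                    # lump still on the belt
ROWS_GIVEN_M2 = np.array([250, 255, 260, 265, 270])      # lump near the belt end

DENSITIES = [gaussian_kde_1d(ROWS_GIVEN_M1, 15.0),       # p(x_{t-1} | alpha_t = m1)
             gaussian_kde_1d(ROWS_GIVEN_M2, 15.0)]       # p(x_{t-1} | alpha_t = m2)

def predict_model(x_prev, alpha_prev):
    """State-dependent prediction P(alpha_t | x_{t-1}, alpha_{t-1}).

    Assumption of this sketch: x_{t-1} is conditionally independent of
    alpha_{t-1} given alpha_t, so the result is proportional to
    p(x_{t-1} | alpha_t) * P(alpha_{t-1}, alpha_t).
    """
    likelihood = np.array([pdf(x_prev) for pdf in DENSITIES])
    unnormalized = likelihood * JOINT[alpha_prev]
    return unnormalized / unnormalized.sum()

def constant_prediction(alpha_prev):
    """Original prediction P(alpha_t | alpha_{t-1}), independent of x_{t-1}."""
    return JOINT[alpha_prev] / JOINT[alpha_prev].sum()

if __name__ == "__main__":
    for row in (50.0, 150.0, 260.0):
        print(row, predict_model(row, 0), constant_prediction(0))
```

With the values of Table 1, the constant prediction P(αt = m2|αt−1 = m1) equals 0.03/0.97 at every row, whereas the state-dependent prediction in this sketch rises as the hypothetical lump location approaches the rows associated with m2.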

In summary, the new JDT method takes advantage of two kinds of prior motion knowledge: the possible motion models that the object may undergo, and the probabilities of switching between these models. Thus, the proposed JDT method should achieve better performance than the existing methods when dealing with objects with variable motion models.

Fig. 5 (a) The two estimated probability density functions of the lump location on the row axis given the two potential motion models in the large lump example. The dashed line corresponds to p(xt−1|αt = m1) and the solid line corresponds to p(xt−1|αt = m2). (b) The comparison between the proposed motion model prediction function P(αt|xt−1, αt−1) and the original prediction function P(αt|αt−1), using the probability of the transition between the previous motion model m1 and the current motion model m2 as an example


Input: $\{(x_{t-1}^i, \alpha_{t-1}^i, E_{t-1}^i, w_{t-1}^i)\}_{i=1}^N$, $Z_t$

Output: $\{(x_t^i, \alpha_t^i, E_t^i, w_t^i)\}_{i=1}^N$

1. Given $\{E_{t-1}^i\}_{i=1}^N$, generate $\{E_t^i\}_{i=1}^N$ according to the TPM specified by Eq. (5).

2. Based on $\{E_{t-1}^i\}_{i=1}^N$ and $\{E_t^i\}_{i=1}^N$, generate $\{(x_t^i, \alpha_t^i, E_t^i)\}_{i=1}^N$ from $\{(x_{t-1}^i, \alpha_{t-1}^i, E_{t-1}^i)\}_{i=1}^N$ according to Eq. (9).

3. Given $Z_t$, compute the weights $\{w_t^i\}_{i=1}^N$ for $\{(x_t^i, \alpha_t^i, E_t^i)\}_{i=1}^N$ according to Eq. (15).

4. Normalize $\{w_t^i\}_{i=1}^N$ to $\{\tilde{w}_t^i\}_{i=1}^N$.

5. Resample from $\{(x_t^i, \alpha_t^i, E_t^i, \tilde{w}_t^i)\}_{i=1}^N$ for N times to obtain a new set of particles $\{(x_t^i, \alpha_t^i, E_t^i, 1/N)\}_{i=1}^N$.

Fig. 6 The particle filter implementation of the proposed JDT method

Fig. 7 (a) Experimental scenario. The object is represented by a small square with the texture shown in the right panel. (b) Object appearance used in the experiment. Note that in this experiment an object is modeled by its intensity histogram in a rectangular region, although with modification, other types of object model can be accommodated by the algorithm

3 Particle filter implementation

The JDT method derived in the previous section is implemented with a particle filter, described by the pseudo-code in Fig. 6. The posterior pdf at the previous time is approximated by a set of weighted particles $\{(x_{t-1}^i, \alpha_{t-1}^i, E_{t-1}^i, w_{t-1}^i)\}_{i=1}^N$, where $w_{t-1}^i$ is the ith particle weight at time t − 1. The proposed algorithm takes as input the set of particles at the previous time instant and the current observed image, and outputs the set of particles at the current time. In the following, we explain the algorithm in further detail.

• The first step predicts the current existence variable according to the previous existence variable and the TPM specified by Eq. (5) (refer to the pseudo-code in Fig. 6).

• Given the predicted existence variable, the second step predicts the object’s current state based on the previous state and the state prediction function defined in Eq. (9).

• The third step weights the particles representing the predicted state given the current observed image. To achieve the detection mechanism conveniently, the likelihood ratio is used as the particle weight here rather than the measurement model p(Zt|Xt). The likelihood ratio can be calculated as

$$
L(Z_t \mid X_t) =
\begin{cases}
\dfrac{p(Z_t \mid x_t, \alpha_t, E_t = 1)}{p(Z_t \mid E_t = 0)} & \text{for } E_t = 1, \\[2ex]
1 & \text{for } E_t = 0.
\end{cases}
\tag{15}
$$

• The fourth step performs normalization.

• The fifth and last step is the standard resampling step, which converts the set of weighted particles back to an equivalent set of unweighted particles approximating the current posterior pdf.
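The five steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: Eq. (5) and Eq. (9) are not reproduced in this section, so the existence transition is replaced by hypothetical birth and death probabilities (`p_birth`, `p_death`), and the state prediction and the likelihood ratio of Eq. (15) are passed in as user-supplied callbacks (`predict_state`, `likelihood_ratio`). All names are assumptions made for the example.

```python
import numpy as np

def jdt_particle_filter_step(states, models, exists, frame, rng,
                             predict_state, likelihood_ratio,
                             p_birth=0.05, p_death=0.05):
    """One iteration of a JDT-style particle filter (illustrative sketch).

    states : (N, d) float array, kinematic state of each particle
    models : (N,) int array, motion model index of each particle
    exists : (N,) int array, existence variable (0 or 1) of each particle
    """
    n = len(exists)

    # Step 1: predict the existence variable. The two-state transition used
    # here (hypothetical p_birth / p_death) stands in for the TPM of Eq. (5).
    u = rng.random(n)
    new_exists = np.where(exists == 1, u >= p_death, u < p_birth).astype(int)

    # Step 2: predict state and motion model for particles that now exist.
    # predict_state is a placeholder for the prediction function of Eq. (9).
    new_states = states.copy()
    new_models = models.copy()
    for i in np.flatnonzero(new_exists):
        new_states[i], new_models[i] = predict_state(
            states[i], models[i], bool(exists[i]), rng)

    # Step 3: weight with the likelihood ratio of Eq. (15); particles with
    # E_t = 0 keep a weight of 1.
    weights = np.ones(n)
    for i in np.flatnonzero(new_exists):
        weights[i] = likelihood_ratio(frame, new_states[i], new_models[i])

    # Step 4: normalize the weights.
    weights /= weights.sum()

    # Step 5: multinomial resampling back to N equally weighted particles.
    idx = rng.choice(n, size=n, p=weights)
    return new_states[idx], new_models[idx], new_exists[idx]
```

Because particles with Et = 0 always receive weight 1, the normalization in step 4 is well defined even when few particles carry an object hypothesis; the resampling in step 5 then shifts particles towards Et = 1 only when the image evidence yields likelihood ratios above 1.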

Once the posterior pdf p(xt, αt, Et|Zt) is approximated by the set of particles, the probability that the object exists is estimated based on Eq. (2) as

$$
P(E_t = 1 \mid Z_t) = \frac{1}{N} \sum_{i=1}^{N} \delta(E_t^i, 1).
\tag{16}
$$
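As a small illustration of this estimate, the Python sketch below computes the fraction of resampled particles whose existence variable equals 1. The function names and the decision rule with a 0.5 threshold are assumptions made for this example; they are not a statement of the authors' exact detection rule.

```python
import numpy as np

def existence_probability(exists):
    """Fraction of equally weighted particles with E_t = 1."""
    exists = np.asarray(exists)
    return float(np.mean(exists == 1))

def object_detected(exists, threshold=0.5):
    """Hypothetical decision rule: declare a detection above the threshold."""
    return existence_probability(exists) > threshold

if __name__ == "__main__":
    particles_exist = [1, 1, 0, 1]
    print(existence_probability(particles_exist), object_detected(particles_exist))
```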

4 Experimental results

In this section, we demonstrate the flexibility of the proposed JDT method for detecting objects with various sets of interacting motion models: (1) constant velocity model (vt = vt−1 up to an additional noise wt−1) and bouncing model (vt = −vt−1 up to an additional noise wt−1), (2) acceleration model (vt = vt−1 + a up to an additional noise wt−1) and bouncing model, (3) acceleration model and random walk model (Xt = Xt−1 + wt−1), and finally (4) acceleration models with different acceleration rates. Here vt is the velocity, a is the acceleration rate, and Xt = (x, y)t represents the object location. In order to assess the effect of the proposed technique, referred to as Adaptive MMMJDT, we compare it against existing JDT techniques that assume a single motion model, referred to as SMMJDTs. To this end, we consider experiments where prior knowledge about the correlation between motion changes and the object kinematic state is available. All the appearance models adopted in the following are based on color histograms as in [16], except in the large lump detection example, where the appearance model is the one specifically proposed for this application in [26].
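The four motion models listed above can be written as simple one-step update functions, as in the Python sketch below. The velocity updates follow the expressions given in the text; the position update Xt = Xt−1 + vt with a unit time step is an assumption of this sketch, as are the function names and the convention of passing the noise sample explicitly.

```python
import numpy as np

def constant_velocity(pos, vel, noise):
    """v_t = v_{t-1} + w_{t-1}; position advanced by the new velocity (assumed)."""
    new_vel = vel + noise
    return pos + new_vel, new_vel

def bouncing(pos, vel, noise):
    """v_t = -v_{t-1} + w_{t-1}: the velocity is reversed at the bounce."""
    new_vel = -vel + noise
    return pos + new_vel, new_vel

def acceleration(pos, vel, noise, a):
    """v_t = v_{t-1} + a + w_{t-1}, with acceleration rate a."""
    new_vel = vel + a + noise
    return pos + new_vel, new_vel

def random_walk(pos, vel, noise):
    """X_t = X_{t-1} + w_{t-1}; the velocity is left untouched."""
    return pos + noise, vel

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos, vel = np.array([10.0, 20.0]), np.array([2.0, 0.0])
    noise = rng.normal(0.0, 0.1, size=2)
    print(constant_velocity(pos, vel, noise))
    print(bouncing(pos, vel, noise))
```

An interacting model set is then simply a list of such functions indexed by the motion model variable αt.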


Fig. 8 (a) The two estimated probability density functions of the synthetic object’s location on the column axis given the two potential motion models, the constant velocity model (m1) and the bouncing model (m2). The dashed line corresponds to p(xt−1|αt = m1) and the solid line corresponds to p(xt−1|αt = m2). (b) The comparison between the proposed motion model prediction function P(αt|xt−1, αt−1) and the original prediction function P(αt|αt−1), using the probability of the transition between the previous motion model m1 and the current motion model m2 as an example

4.1 Constant velocity model with bouncing model

In the first experiment, we demonstrate examples of detecting objects with a constant velocity model and a bouncing model. We first use a synthetic example to illustrate clearly the problem of interest, and then give another example on a ping-pong video.

The synthetic example creates a scenario where an object moves back and forth as shown in Fig. 7(a). The object of interest is a square with the pattern shown in Fig. 7(b). The object is generated to move at a constant velocity, perturbed by random noise, from one side to the other over 10 frames before it bounces. It is worth mentioning that this information is not provided to the algorithm, i.e., the algorithm does not know when and where exactly the object bounces. In this case, the object motion can be described by a constant velocity model (m1) and a bouncing model (m2). Our proposed method, Adaptive MMMJDT, integrates both motion models and exploits the prior knowledge that the object is more likely to switch from its constant velocity model to the bouncing model around the left and right borders of the image. This prior knowledge is learned from the training set as presented in Sect. 2.2, and it is illustrated in Fig. 8(a) by the two individually estimated density functions p(xt−1|αt = m1) and p(xt−1|αt = m2). In this specific case, the kinematic state xt−1 is the object location on the column axis. By embedding this prior knowledge, the new motion model prediction function can predict the current motion model adaptively based on the previous motion model and kinematic state, as shown in Fig. 8(b). However, when such knowledge is ignored, the original prediction function predicts the current motion model based on the previous one with a fixed transition probability. As a result, the detection performance of Adaptive MMMJDT is much enhanced in comparison to two SMMJDTs, as shown in Fig. 9. SMMJDT1 employs a random walk model and SMMJDT2 employs a constant velocity model. The results show that Adaptive MMMJDT can detect the object consistently with an existence probability above 0.5, as the embedded multiple motion models handle the object’s motion changes adaptively. In contrast, SMMJDT1 and SMMJDT2 lose track when the object moves with a motion pattern inconsistent with their assumed single motion model and, as a result, their existence probabilities drop repeatedly every 10 frames, leading to incorrect detection decisions.

Fig. 9 Existence probability curves of Adaptive MMMJDT, SMMJDT1 (random walk method) and SMMJDT2 (constant velocity model method)
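A synthetic track of this kind could be generated as in the Python sketch below. It is only meant to make the scenario concrete: the 10-frame period between bounces comes from the text, whereas the speed, the noise level, the number of frames, and the function name are hypothetical values chosen for illustration.

```python
import numpy as np

def simulate_bouncing_track(num_frames=40, frames_per_leg=10, speed=8.0,
                            noise_std=0.5, seed=0):
    """Column position of an object that reverses direction every 10 frames.

    Returns the noisy column positions and the motion model active at each
    frame ("m1" = constant velocity, "m2" = bouncing).
    """
    rng = np.random.default_rng(seed)
    col, velocity = 0.0, speed
    cols, models = [], []
    for t in range(num_frames):
        if t > 0 and t % frames_per_leg == 0:
            velocity = -velocity          # bouncing model: v_t = -v_{t-1}
            models.append("m2")
        else:
            models.append("m1")           # constant velocity: v_t = v_{t-1}
        col += velocity + rng.normal(0.0, noise_std)
        cols.append(col)
    return np.array(cols), models

if __name__ == "__main__":
    cols, models = simulate_bouncing_track()
    print(np.round(cols[:12], 1), models[:12])
```

Pairs of consecutive positions and model labels from such a track form the kind of training samples from which the densities p(xt−1|αt) are estimated.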

Furthermore, we also tested the previous three methods on the same video but perturbed with a high level of noise. The results, in terms of existence probability, are depicted in Fig. 10. In spite of the high level of noise in this case, Adaptive MMMJDT is able to accumulate the evidence consistently even though the object changes its model, and finally detects the object after strong enough evidence is accumulated (after approximately 15 frames). In contrast, the constant velocity model method, SMMJDT2, loses its accumulated evidence every time the object changes its model, and as a result is unable to detect the object. The same behavior is observed for the random walk method, SMMJDT1, when significant noise exists, indicating that conservative prediction is unable to detect the object. Consistent with the results in Fig. 9, we can see that the proposed JDT method has the best performance in situations where the object changes its motion model.

Fig. 10 Existence probability curves of Adaptive MMMJDT, SMMJDT1 (random walk method) and SMMJDT2 (constant velocity model method) when significant noise exists in the video

Fig. 11 An example frame of the sequence ping pong

In the second example, we detect a ping-pong ball in a sequence containing 200 frames. Figure 11 shows a sample frame where the ball is highlighted by a big rectangle for better visualization. In this case, the ping-pong ball’s motion can be described by three models: a constant velocity model (m1), a vertical direction bouncing model (m2) and a horizontal direction bouncing model (m3). The latter two models are due to the ball hitting the table and the bats. The difficulty of this sequence is that the object switches among three motion models frequently. To tackle this difficulty, our proposed method first embeds the three motion models into the JDT framework. Then it exploits the correlation between these models and the object kinematic state when predicting the current motion model. In this case, the object of interest is more likely to switch to each one of the three motion models at some specific locations. For instance, the object is more likely to switch to the vertical bouncing model, m2, in the vicinity of the table, while it tends to switch to the horizontal bouncing model, m3, close to the ping-pong bats. This correlation has been learned from the training sequence and is illustrated in Fig. 12 by the three individually estimated density functions p(xt−1|αt = m1), p(xt−1|αt = m2) and p(xt−1|αt = m3). In this case, the object kinematic state xt−1 is simply the ping-pong ball location. As shown in Fig. 12, the kinematic state distributions corresponding to the second and third motion models, p(xt−1|αt = m2) and p(xt−1|αt = m3), peak at certain locations, while the distribution corresponding to the first, constant velocity, motion model is not biased towards any specific location. Using this correlation prior knowledge produces a prediction function P(αt|xt−1, αt−1) that adapts to the object kinematic state, as illustrated in Fig. 12(d). In contrast, the original motion model prediction function P(αt|αt−1) depends only on the previous motion model and remains constant with respect to the previous object kinematic state. Figure 13 shows the detection results of Adaptive MMMJDT and two SMMJDTs. The latter two methods employ a model with constant velocity in both the vertical and horizontal directions (indicated by SMMJDT1) and a model with constant velocity in the horizontal direction and constant acceleration in the vertical direction (indicated by SMMJDT2). From this figure, it can be seen that with a single motion model, SMMJDT1 and SMMJDT2 cannot accumulate the detection evidence properly. In other words, the ball cannot be detected consistently. However, by embedding multiple motion models in JDT adaptively, a better detection performance can be achieved, and this is illustrated by the higher estimated existence probability.

4.2 Constant acceleration model with bouncing model

In this experiment, we perform object detection with a constant acceleration model and a bouncing model. We consider an example on a soccer sequence containing 130 frames where a girl juggles a soccer ball with her feet. A sample frame of this sequence is shown in Fig. 14. In this example, the soccer ball’s motion can be described roughly by a constant acceleration model (m1) and a vertical direction bouncing model (m2). Our proposed method embeds both motion models and also exploits the prior knowledge that the faster the ball falls, the higher the probability that it switches to the second motion model. This knowledge is illustrated in Fig. 15(a) by the two individually estimated density functions p(xt−1|αt = m1) and p(xt−1|αt = m2). In this case, the object kinematic state xt−1 is the vertical velocity.² By exploiting such correlation, the probability of predicting the current motion model now depends on the previous object kinematic state. Again, this is illustrated by Fig. 15(b) via a comparison against the original prediction function using a constant probability. Figure 16 shows the performance comparison between SMMJDT and Adaptive MMMJDT. From this figure, we can see that with a single motion model, the existing JDT method fails to detect the soccer ball every time the ball bounces. Therefore, it is important to use multiple motion models adaptively in JDT to detect objects that change motions.

² Note that the kinematic state employed here, the vertical velocity, is different from the previous examples. The formulation is general enough and not limited to object locations.

Fig. 12 (a) The estimated probability density function of the ping-pong ball’s location given the constant velocity model, p(xt−1|αt = m1). It is not biased towards any specific location. (b) The estimated probability density function of the ping-pong ball’s location given the vertically bouncing model, p(xt−1|αt = m2). It peaks at certain locations as illustrated. (c) The estimated probability density function of the ping-pong ball’s location given the horizontally bouncing model, p(xt−1|αt = m3). It peaks at certain locations as shown in the figure. (d) The comparison between the proposed motion model prediction function P(αt|xt−1, αt−1) and the original prediction function P(αt|αt−1), using the probability of the transition between the previous motion model m1 and the current motion model m3 as an example

Fig. 13 Existence probability curves of Adaptive MMMJDT and SMMJDTs (two single motion model JDT methods) detecting the ping-pong ball in sequence ping pong

Fig. 14 An example frame of the sequence soccer


Fig. 15 (a) The two estimated probability density functions of the soccer ball’s vertical velocity given the two potential motion models, p(xt−1|αt = m1) (corresponding to the vertical acceleration model) and p(xt−1|αt = m2) (corresponding to the vertical bouncing model). The dashed line corresponds to p(xt−1|αt = m1) and the solid line corresponds to p(xt−1|αt = m2). (b) The comparison between the proposed motion model prediction function P(αt|xt−1, αt−1) and the original prediction function P(αt|αt−1), using the probability of the transition between the previous motion model m1 and the current motion model m2 as an example

Fig. 16 Existence probability curves of Adaptive MMMJDT and SMMJDT (constant acceleration model JDT method) detecting the soccer ball in sequence soccer

4.3 Constant acceleration model with random walk model

In the third experiment, we show an example of detecting objects with a constant acceleration model and a random walk model. We use a sequence, slide, which contains 28 frames where a person slides down, as shown in a sample frame in Fig. 17. The person is highlighted by the yellow rectangle in the figure for visualization. In this case, the object changes its motion from an acceleration model (m1) on the slide to a random walk model (m2) off the slide. Our proposed JDT method first embeds both models in the framework so that it can switch the system model when the person actually changes motion. Then, to predict the motion model accurately, it exploits the prior knowledge that the person is more likely to move according to a specific motion model around certain locations. This prior knowledge can be learned following Eq. (12), and it is depicted in Fig. 18(a) by the two individually estimated density functions p(xt−1|αt = m1) and p(xt−1|αt = m2). In this case, the object kinematic state xt−1 is the object location along the vertical axis. Similarly to the previous experiments, Fig. 18(b) shows the comparison between the new motion model prediction function P(αt|xt−1, αt−1), which exploits the correlation information described above, and the original motion model prediction function P(αt|αt−1). The former adapts its prediction of the motion model to the object state, while the latter does not. Figure 19 demonstrates that embedding two motion models adaptively in JDT provides a better detection performance than using a single motion model. Three single motion models are employed in this comparison: constant velocity, constant acceleration, and random walk. None of them was able to describe the actual object motion as accurately as Adaptive MMMJDT.

Fig. 17 An example frame of the sequence slide

4.4 Multiple constant acceleration models

In this experiment, we show an example of detecting objects with multiple motion models in a real application, large lump detection, which has been introduced in Sect. 2.2.2. As discussed before, the objects in this problem, lumps, may be considered to have two motion models, a small constant acceleration model on the conveyor and a large constant acceleration model off the conveyor. We utilize these two motion models adaptively in our method (as previously described in Sect. 2.2.2) to detect lumps and compare the results with classic JDT methods using only one motion model. Figure 20 shows a few consecutive frames where one lump moves down into the crusher. Initially, it moves very slowly, for example from frame 1 (Fig. 20(a)) to frame 2 (Fig. 20(b)), and then gradually faster until frame 7 (Fig. 20(c)). From frame 7 (Fig. 20(c)) to frame 8 (Fig. 20(d)) the lump moves much faster than before. Figure 21 shows the comparison of four methods on this example: Adaptive MMMJDT using two constant acceleration models, and three SMMJDTs with, respectively, a small constant acceleration model, a constant velocity model, and a constant position model. The three SMMJDT methods do not match the actual lump motion well, especially from frame 7 to frame 8 when the lump changes its acceleration rate, and they lose both the tracking and the detection of the lump. Adaptive MMMJDT, in contrast, continues accumulating the evidence and finally detects the lump. Even from frame 7 to frame 8, it can still adjust itself by switching the filter’s model to fit the actual lump motion model.

Fig. 18 (a) The two estimated probability density functions of the object’s vertical location given the two potential motion models, p(xt−1|αt = m1) (corresponding to the random walk model) and p(xt−1|αt = m2) (corresponding to the vertical acceleration model). The dashed line corresponds to p(xt−1|αt = m1) and the solid line corresponds to p(xt−1|αt = m2). (b) The comparison between the proposed motion model prediction function P(αt|xt−1, αt−1) and the original prediction function P(αt|αt−1), using the probability of the transition between the previous motion model m1 and the current motion model m2 as an example

Fig. 19 Existence probability curves of Adaptive MMMJDT and SMMJDTs (three single motion model JDT methods) detecting the person in sequence slide

5 Conclusion

An adaptive multi-motion model JDT method is proposed for detecting objects that may undergo sudden changes in motion. The proposed method simultaneously tackles the problems of detecting objects and handling their sudden changes in motion, and formulates them in a probabilistic framework. This is achieved by employing two sets of interacting models in an IMM estimator, one for handling a varying number of objects and the other for handling variable motion models. The proposed method then exploits the correlation between the motion model and the object kinematic state, which is not exploited in the existing IMM estimators, to predict the motion model accurately. Experiments on both synthetic and real examples show that the proposed JDT method detects objects more accurately than the existing JDT methods when objects change their motions.

Fig. 20 Sequence of a lump. (a) Frame 1, (b) frame 2, (c) frame 7, (d) frame 8

Fig. 21 Existence probability curves of Adaptive MMMJDT and SMMJDTs (three single motion model JDT methods) detecting the large lump in Fig. 20

References

1. Mazor, E., Averbuch, A., Bar-Shalom, Y., Dayan, J.: Interacting multiple model methods in target tracking: a survey. IEEE Trans. Aerosp. Electron. Syst. 34(1), 103–123 (1998)

2. Blom, H., Bar-Shalom, Y.: The interacting multiple model algorithm for systems with Markovian switching coefficients. IEEE Trans. Autom. Control 33(8), 780–783 (1988)

3. Bar-Shalom, Y., Li, X.R., Kirubarajan, T.: Estimation with Applications to Tracking and Navigation. Wiley-Interscience, New York (2001)

4. Arulampalam, M.S., Ristic, B., Gordon, N., Mansell, T.: Bearings-only tracking of manoeuvring targets using particle filters. EURASIP J. Appl. Signal Process. 2004(1) (2004)

5. Ristic, B., Arulampalam, M.S.: Tracking a manoeuvring target using angle-only measurements: algorithms and performance. Signal Process. 83(6) (2003)

6. Isard, M., Blake, A.: A mixed-state condensation tracker with automatic model-switching. In: IEEE International Conference on Computer Vision, pp. 107–112 (1998)

7. Du, S.c., Shi, Z.g., Zang, W., Chen, K.s.: Using interacting multiple model particle filter to track airborne targets hidden in blind doppler. J. Zhejiang Univ. Sci. A 8(8), 1277–1282 (2007)

8. McGinnity, S., Irwin, G.W.: Multiple model bootstrap filter for maneuvering target tracking. IEEE Trans. Aerosp. Electron. Syst. 36, 1006–1012 (2000)

9. Boers, Y., Driessen, J.: Interacting multiple model particle filter. IEE Proc. Radar Sonar Navig. 150(5), 344–349 (2003)

10. Chen, J., Kim, M., Wang, Y., Ji, Q.: Switching Gaussian process dynamic models for simultaneous composite motion tracking and recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2655–2662 (2009)

11. Ristic, B., Arulampalam, S., Gordon, N.J.: Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House, Norwood (2004)

12. Rutten, M.G., Ristic, B., Gordon, N.J.: A comparison of particle filters for recursive track before detect. In: International Conference on Information Fusion, vol. 1, pp. 169–175 (2005)

13. Isard, M., MacCormick, J.: Bramble: a Bayesian multiple-blob tracker. In: IEEE International Conference on Computer Vision, vol. 2, pp. 34–41 (2001)

14. Ng, W., Li, J., Godsill, S., Vermaak, J.: A hybrid approach for online joint detection and tracking for multiple targets. In: IEEE Aerospace Conference, pp. 2126–2141 (2005)

15. Nandakumaran, N., Sinha, A., Kirubarajan, T.: Joint detection and tracking of unresolved targets with monopulse radar. IEEE Trans. Aerosp. Electron. Syst. 44(4), 1326–1341 (2008)

16. Czyz, J., Ristic, B., Macq, B.: A color-based particle filter for joint detection and tracking of multiple objects. In: IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 217–220 (2005)

17. Czyz, J., Ristic, B., Macq, B.M.: A particle filter for joint detection and tracking of color objects. Image Vis. Comput. 25(8), 1271–1281 (2007)

18. Sun, H., Wang, C., Wang, B., El-Sheimy, N.: Independently moving object detection and tracking using stereo vision. In: IEEE International Conference on Information and Automation, pp. 1936–1941 (2010)

19. Punithakumar, K., Kirubarajan, T., Sinha, A.: Multiple-model probability hypothesis density filter for tracking maneuvering targets. IEEE Trans. Aerosp. Electron. Syst. 44(1), 87–98 (2008)

20. Bi, S., Ren, X.Y.: Maneuvering target doppler-bearing tracking with signal time delay using interacting multiple model algorithms. Prog. Electromagn. Res. 87, 15–41 (2008)

21. Jaeggli, T., Koller-Meier, E., Gool, L.V.: Learning generative models for multi-activity body pose estimation. Int. J. Comput. Vis. 83, 121–134 (2009)

22. Black, M., Jepson, A.: Recognizing temporal trajectories using the condensation algorithm. In: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, pp. 16–21 (1998)

23. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, New York (2009)

24. Corder, G.W., Foreman, D.I.: Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach. Wiley, New York (2009)

25. Zhang, H.: Image processing for the oil sands mining industry. IEEE Signal Process. Mag. 25(6), 198–200 (2008)

26. Wang, Z., Zhang, H.: Large lump detection using a particle filter of hybrid state variable. In: International Conference on Advances in Pattern Recognition, pp. 14–17 (2009)

Zhijie Wang received the Ph.D. degree in computer science from the University of Alberta in January 2012. He was a postdoctoral fellow at the University of Waterloo in 2012 and has been at the University of Western Ontario since January 2013. His recent research interests are in machine vision and medical imaging.

Mohamed Ben Salah was born in 1980. He received the Ph.D. degree in computer science from the National Institute of Scientific Research (INRS-EMT), Montreal, Quebec, Canada, in January 2011. He joined the Department of Computing Science, University of Alberta, in February 2011 as a postdoctoral fellow. His research interests are in image and motion segmentation with a focus on level set and graph cut methods, and shape priors. Currently he is working on brain tumor detection and growth prediction and on other problems in medical imaging.

Hong Zhang has been with the Department of Computing Science, University of Alberta, since 1988, where he is currently a Professor. He has held an NSERC Industrial Research Chair (IRC), supported jointly by Syncrude Canada Ltd. (SCL) and Alberta Innovates, since 2003. He is affiliated with the Centre for Intelligent Mining Systems (CIMS) and the Robotics Research Laboratory. His research interests span robotics, computer vision, and image processing.