
Optical Flow-Based Tracking of Deformable Objects Using a Non-prior Training Active Feature Model




Sangjin Kim1, Jinyoung Kang1, Jeongho Shin1, Seongwon Lee1, Joonki Paik1, Sangkyu Kang2, Besma Abidi2, and Mongi Abidi2

1 Image Processing and Intelligent Systems Laboratory, Department of Image Engineering, Graduate School of Advanced Imaging Science, Multimedia, and Film, Chung-Ang University, 221 Huksuk-Dong, Tongjak-Ku, Seoul 156-756, Korea
[email protected], http://ipis.cau.ac.kr

2 Imaging, Robotics, and Intelligent Systems Laboratory, Department of Electrical and Computer Engineering, The University of Tennessee, Knoxville, TN 37996-2100, USA
http://imaging.utk.edu

Abstract. This paper presents a feature point tracking algorithm using optical flow under the non-prior training active feature model (NPT-AFM) framework. The proposed algorithm mainly focuses on the analysis of deformable objects and provides real-time, robust tracking. The proposed object tracking procedure can be divided into two steps: (i) optical flow-based tracking of feature points and (ii) NPT-AFM for robust tracking. In order to handle occlusion in object tracking, feature points inside an object are estimated instead of the shape boundary used by the conventional active contour model (ACM) or active shape model (ASM), and are updated as elements of the training set for the AFM. The proposed NPT-AFM framework enables the tracking of occluded objects in complicated backgrounds. Experimental results show that the proposed NPT-AFM-based algorithm can track deformable objects in real-time.

1 Introduction

The problem of deformable object tracking by analyzing motion and shape in two-dimensional (2D) video is of increasing importance in a wide range of application areas, including computer vision, video surveillance, motion analysis and extraction for computer animation, human-computer interface (HCI), and object-based video compression [1,2,3,4].

There have been various research results on object extraction and tracking. One of the simplest methods is to track difference regions within a pair of consecutive frames [1], and its performance can be improved by using adaptive background generation and subtraction. Based on the assumption of a stationary background, Wren et al. proposed a real-time blob tracking algorithm, where the blob can be obtained from the object's histogram [5,6].

* This work was supported by the Korean Ministry of Science and Technology under the National Research Lab. Project, by the Korean Ministry of Education under the Brain Korea 21 Project, by the University Research Program in Robotics under grant DOE-R01-1344148, by the DOD/TACOM/NAC/ARC Program R01-1344-18, and by the FAA/NSSA Program, R01-1344-48/49.

K. Aizawa, Y. Nakamura, and S. Satoh (Eds.): PCM 2004, LNCS 3333, pp. 69–78, 2004.
© Springer-Verlag Berlin Heidelberg 2004

Shape-based tracking obtains a priori shape information of an object-of-interest and projects a trained shape onto the closest shape in a given image frame. This type of method includes contour-based methods [7,8,9], the active shape model (ASM) [10], state-space sampling, and the condensation algorithm [9]. Although existing shape-based algorithms can commonly deal with partial occlusion, they exhibit several serious problems in practical applications, such as (i) a priori training of the shape of a target object and (ii) an iterative modeling procedure for convergence. The first problem prevents the original shape-based method from being applied to tracking objects of unpredictable shapes. The second problem becomes a major bottleneck for real-time implementation.

This paper presents a non-prior training active feature model (NPT-AFM) that generates training shapes in real-time without pre-processing. The proposed AFM can track a deformable object by using a greatly reduced number of feature points rather than the entire shape. The NPT-AFM algorithm extracts an object using motion segmentation and determines feature points inside the object. Such feature points tend to approach strong edges or the boundary of an object. The selected feature points in the next frame are predicted by optical flow. If a feature point is missing or fails to be tracked, an additional compensation process restores it.

In summary, the major contribution of the proposed NPT-AFM algorithm is twofold: (i) a real-time implementation framework obtained by removing the a priori training process and (ii) AFM-based occlusion handling using a significantly reduced number of feature points.

The remaining part of this paper is organized as follows. In Section 2, an overview of the proposed tracking framework is given. In Section 3, optical flow-based tracking of feature points is presented. In Section 4, the NPT-AFM-based tracking algorithm for occlusion handling is proposed. Experimental results are provided in Section 5, and Section 6 concludes the paper.

2 Overview of the Feature-Based Tracking Framework

The proposed feature-based tracking algorithm is shown as a flowchart in Fig. 1. The dotted box represents the real-time feature tracking, prediction, and correction processes from the t-th frame to the (t+1)-st frame. In the object segmentation step we extract an object based on motion direction by using motion-based segmentation and labeling.

We classify the object's movement into four directions, extract feature points suitable for tracking, and predict the corresponding feature points in the next frame. A feature point that goes missing during the tracking process is checked and restored. If over 60% of the feature points are restored, we decide that the set of feature points is not suitable for tracking and redefine a new set of points. We detect occlusion by using labeling information and motion direction, and the NPT-AFM


Fig. 1. The proposed optical flow-based tracking algorithm

process, which updates the training set at each frame with up to 70 elements, restores the entire shape from the occluded input.

The advantages of the proposed tracking algorithm can be summarized as follows: (i) It can track both rigid and deformable objects without an a priori training process, and updating the training set at each frame enables real-time, robust tracking. (ii) It is robust against an object's sudden motion because both motion direction and feature points are tracked at the same time. (iii) Its tracking performance is not degraded even with a complicated background because feature points are assigned inside the object, near its boundary. (iv) It contains the NPT-AFM procedure, which can handle partial occlusion in real-time.

3 Optical Flow-Based Tracking of Feature Points

The proposed algorithm tracks feature points based on optical flow. A feature point that goes missing during tracking is restored by using both temporal and spatial information inside the predicted region.

3.1 Feature Point Initialization and Extraction

We extract motion from a video sequence and segment regions based on the direction of motion using Lucas-Kanade's optical flow method [11]. Due to the nature of optical flow, an extracted region has noise and holes, which are removed by morphological operations.
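The morphological clean-up step can be sketched as follows. This is a minimal NumPy illustration, assuming binary opening (to remove speckle noise) followed by closing (to fill small holes) with a 3 × 3 structuring element; the paper does not specify the exact operations or kernel, so these are assumptions.

```python
import numpy as np

def _neighborhood(mask, reduce_fn):
    # apply reduce_fn over each pixel's 3x3 neighborhood (edge-replicated border)
    p = np.pad(mask, 1, mode="edge")
    h, w = mask.shape
    windows = [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return reduce_fn(np.stack(windows), axis=0)

def erode(mask):
    return _neighborhood(mask, np.min)

def dilate(mask):
    return _neighborhood(mask, np.max)

def clean_motion_mask(mask):
    # opening (erode, then dilate) removes isolated noise pixels;
    # closing (dilate, then erode) fills small holes inside the region
    opened = dilate(erode(mask))
    return erode(dilate(opened))
```

With a 12 × 12 mask containing an 8 × 8 object region, a one-pixel hole, and one isolated noise pixel, the noise pixel is removed and the hole is filled while the region's extent is preserved.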

After segmentation of an object from the background, we extract a set of feature points inside the object by using Shi-Tomasi's feature tracking algorithm [13]. The corresponding location of each feature point in the following frame is predicted by using optical flow. These procedures are summarized in the following algorithm.


1. Preprocess the region-of-interest using a Gaussian lowpass filter.

2. Compute the deformation matrix using directional derivatives at each pixel in the region as

   D = \begin{bmatrix} d_{xx} & d_{xy} \\ d_{yx} & d_{yy} \end{bmatrix},  (1)

   where, for example, d_{xx} represents the 2nd-order derivative in the x direction.

3. Compute the eigenvalues of the deformation matrix, and perform non-maxima suppression. In this work we assume that local maxima exist in the 5 × 5 neighborhood.

4. Discard eigenvalues that are smaller than a pre-specified threshold, and discard predicted feature points that do not satisfy the threshold distance.

5. Predict the corresponding set of feature points in the next frame using optical flow.
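The feature-scoring steps above can be illustrated as follows. Note that this sketch uses the first-order windowed gradient matrix of Shi-Tomasi [13] rather than the second-order derivatives of Eq. (1); both yield a per-pixel 2 × 2 matrix whose smallest eigenvalue serves as the corner score. The threshold value and window sizes are illustrative assumptions.

```python
import numpy as np

def _box3(a):
    # sum over each pixel's 3x3 neighborhood (zero padding)
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def feature_candidates(img, thresh):
    gy, gx = np.gradient(img.astype(float))        # first-order derivatives
    # windowed 2x2 gradient matrix per pixel: [[A, B], [B, C]]
    A, B, C = _box3(gx * gx), _box3(gx * gy), _box3(gy * gy)
    # smallest eigenvalue of the symmetric 2x2 matrix (Shi-Tomasi score)
    score = 0.5 * (A + C) - np.sqrt((0.5 * (A - C)) ** 2 + B ** 2)
    # steps 3-4: keep thresholded local maxima within a 5x5 neighborhood
    p = np.pad(score, 2, constant_values=-np.inf)
    h, w = score.shape
    local_max = np.max(np.stack([p[i:i + h, j:j + w]
                                 for i in range(5) for j in range(5)]), axis=0)
    keep = (score >= local_max) & (score > thresh)
    return list(zip(*np.nonzero(keep)))
```

On a synthetic white square, only the four corners survive the eigenvalue threshold and 5 × 5 suppression, since edge pixels have a near-zero smallest eigenvalue.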

Due to the nature of motion estimation, the motion-based segmentation usually becomes slightly larger than the real object, which results in false extraction of feature points outside the object. These outside feature points are removed by considering the distance between predicted feature points, as given in

d = \sum_{t=1}^{N} \sum_{i=1}^{M} \sqrt{(x^{i}_{t+1} - x^{i}_{t})^2 + (y^{i}_{t+1} - y^{i}_{t})^2} < t_n,  (2)

where t is the frame index, i is the feature-point index, N is the number of frames, and M is the number of feature points. The results of outside point removal are shown in Fig. 2.

(a) The 2nd frame (b) The 4th frame (c) The 7th frame

Fig. 2. Results of outside feature point removal. (Two outside feature points, highlighted by circles in (a), are removed in (b) and (c).)
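The displacement test of Eq. (2) amounts to discarding points whose predicted motion between consecutive frames is implausibly large. A minimal per-point sketch, with the threshold t_n left as a free parameter:

```python
import numpy as np

def remove_outside_points(pts_t, pts_t1, t_n):
    # per-point Euclidean displacement between frames t and t+1 (cf. Eq. (2));
    # points whose displacement exceeds the threshold t_n are treated as
    # outside points and discarded
    pts_t = np.asarray(pts_t, dtype=float)
    pts_t1 = np.asarray(pts_t1, dtype=float)
    d = np.sqrt(((pts_t1 - pts_t) ** 2).sum(axis=1))
    keep = d < t_n
    return pts_t1[keep], keep
```

For instance, a point that jumps by roughly 28 pixels while its neighbors move by 1-2 pixels is rejected with t_n = 5.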

3.2 Feature Point Prediction and Correction

In many real-time, continuous video tracking applications, feature-based tracking algorithms fail for the following reasons: (i) self- or partial occlusion of the object and (ii) feature points on or outside the boundary of the object, which are affected by the changing background.


In order to deal with tracking failure, we should correct erroneously predicted feature points by using the locations of the previous feature points and the inter-pixel relationship between the predicted points. Here we summarize the prediction algorithm proposed in [12].

1. Temporal Prediction: Let the location of a feature block at frame t, which was not tracked to frame t+1, be v^{i}_{t}, i ∈ {1, ..., M_t}. Its location is predicted using the average of its motion vectors in the previous K frames as

   \hat{v}^{i}_{t+1} = v^{i}_{t} + \frac{1}{K} \sum_{k=0}^{K-1} m^{i}_{t-k},  (3)

   where m^{i}_{t} = v^{i}_{t} - v^{i}_{t-1} denotes the motion vector of feature block i at frame t, and K represents the number of frames for motion averaging. The parameter K may be adjusted depending on the activity present in the scene.

2. Spatial Prediction: We can correct an erroneous prediction by replacing it with the average motion vector of the successfully predicted feature points.

3. Re-Investigation of the Predicted Feature Point: Assign a region including the predicted-corrected feature point. If a feature point is extracted in the next frame, it is updated as a new feature point. If more than 60% of the feature points had to be predicted rather than tracked, feature extraction is repeated.

Temporal prediction is suitable for deformable objects, while spatial prediction is good for non-deformable objects. Both temporal and spatial prediction results can also be combined with proper weights. In this work, we used K = 7 for temporal prediction.
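Both prediction rules can be sketched directly from the description above; the helper names are illustrative, not from the paper.

```python
import numpy as np

def temporal_predict(v_t, motion_history, K=7):
    # Eq. (3): predict the lost feature's location at frame t+1 by adding
    # the average of its own motion vectors over the previous K frames
    m = np.asarray(motion_history, dtype=float)[-K:]
    return np.asarray(v_t, dtype=float) + m.mean(axis=0)

def spatial_predict(v_t, motions_of_good_points):
    # replace the erroneous prediction using the average motion vector of
    # the successfully predicted feature points in the same frame
    m = np.asarray(motions_of_good_points, dtype=float)
    return np.asarray(v_t, dtype=float) + m.mean(axis=0)
```

A feature at (10, 20) that has moved by (2, 1) in each of the last seven frames is predicted at (12, 21).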

4 NPT-AFM for Robust Tracking

The most popular approach to tracking 2D deformable objects is to use the object's boundary. ASM-based tracking falls into this category. The ASM can analyze and synthesize an a priori trained shape of an object even if the input is noisy or occluded [14]. On the other hand, a priori generation of training sets and iterative convergence prevent the ASM from being used for real-time, robust tracking. We propose a real-time updating method for the training set instead of off-line preprocessing, and also modify the ASM by using only a few feature points instead of the entire set of landmark points. NPT-AFM refers to the proposed real-time, efficient modeling method.

4.1 Landmark Point Assignment Using Feature Points and AFM

The existing ASM algorithm manually assigns landmark points on the object's boundary to make a training set [14]. A good landmark point has a balanced distance to adjacent landmark points and resides on either a high-curvature or 'T'-junction position. A good feature point, however, has a different requirement


from that of a good landmark point. In other words, a feature point should be located inside the object, because a feature point on the boundary of the object easily fails in optical flow or block matching-based tracking [15] due to the effect of a changing, complicated background.

Consider n feature points from an element shape in the training set. We update this training set at each frame of the input video, and at the same time align the shapes onto the image coordinate system using Procrustes analysis [16]. In this work the training set has 70 element shapes. Given a set of feature points, the input feature can be modeled by using principal component analysis (PCA).
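The PCA shape model can be sketched as follows, assuming each training element is flattened into a 2n-dimensional vector after Procrustes alignment; the retained-variance fraction is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def build_afm(shapes, var_keep=0.98):
    # shapes: (num_shapes, 2n) matrix of aligned feature-point vectors
    # (x1, y1, ..., xn, yn); returns the mean shape and the principal
    # modes covering var_keep of the total variance
    X = np.asarray(shapes, dtype=float)
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var = S ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_keep)) + 1
    return mean, Vt[:k]

def fit_to_model(mean, P, shape):
    # project a candidate shape onto the model subspace: x ~ mean + P^T b
    b = P @ (np.asarray(shape, dtype=float) - mean)
    return mean + P.T @ b
```

With synthetic shapes generated from a single deformation mode, the model recovers that mode and reconstructs an unseen in-subspace shape exactly.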

In order to track the target object, we have to find the best feature-based landmark points which match the object and the model. In each iteration, the feature-based landmark points selected by the PCA algorithm are relocated to new positions by local feature fitting. The local feature fitting algorithm uses a block-based correlation between the object and the model. The best parameters, which represent the optimal locations of the feature points of the object, can be obtained by matching the feature points in the training set to those of the real image. Here, existing block matching algorithms can be used for the block-based correlation. Figure 3 shows the result of optical flow-based model fitting with 51 training sets.

(a) (b) (c)

Fig. 3. Model fitting procedure of NPT-AFM: (a) optical flow-based feature tracking at the 40th frame, (b) model fitting at the 74th frame, and (c) model fitting at the 92nd frame.
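The block-based correlation at the heart of the local feature fitting can be sketched with normalized cross-correlation and an exhaustive local search; the search radius and block size here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def ncc(a, b):
    # normalized cross-correlation between two equal-sized blocks
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_match(template, image, center, radius):
    # exhaustively search a (2*radius+1)^2 window around `center` for the
    # block most correlated with `template`
    h, w = template.shape
    cy, cx = center
    best_score, best_pos = -2.0, center
    for y in range(max(cy - radius, 0), cy + radius + 1):
        for x in range(max(cx - radius, 0), cx + radius + 1):
            patch = image[y:y + h, x:x + w]
            if patch.shape != template.shape:
                continue
            s = ncc(template, patch)
            if s > best_score:
                best_score, best_pos = s, (y, x)
    return best_pos, best_score
```

Searching near a displaced starting guess recovers the true location of a distinctive block with a correlation score of 1.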

4.2 Reconstruction of Feature Model and Occlusion Handling

In spite of the theoretical completeness of the AFM algorithm, a feature model obtained from the local feature fitting step does not always match the real object, because it has been constructed using a training set of features from the previous frame. A few mismatches between the feature model and the real object can be found in Fig. 3.


(a) (b) (c)

Fig. 4. Reconstruction of the feature model: (a) feature model fitting result, (b) relocation of an outside feature point for feature reconstruction, and (c) result of feature reconstruction.

The proposed feature reconstruction algorithm moves an outside feature point toward the average position of all feasible feature points, i.e., the feature points inside the object. While moving the outside feature point, we search for the best path among three directions toward the average position. If the number of outside feature points is more than 60% of the total number of feature points, the feature extraction process is repeated. The feature reconstruction process is depicted in Fig. 4.
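A simplified sketch of the reconstruction rule follows. The paper's three-direction path search is replaced here by a straight-line walk toward the average position, so the step size, the iteration cap, and the is_inside predicate are all assumptions made for illustration.

```python
import numpy as np

def reconstruct_feature(point, inside_points, is_inside, step=0.25, max_iter=50):
    # move an outside feature point toward the average position of the
    # feasible (inside) feature points until it falls inside the object
    target = np.asarray(inside_points, dtype=float).mean(axis=0)
    p = np.asarray(point, dtype=float)
    for _ in range(max_iter):
        if is_inside(p):
            return p
        p = p + step * (target - p)
    return target  # fall back to the average position itself

def too_many_outside(num_outside, num_total, ratio=0.6):
    # if more than 60% of the feature points lie outside, the feature
    # extraction process is repeated
    return num_outside > ratio * num_total
```

An outside point is pulled toward the centroid of the inside points and stops as soon as it tests inside.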

In addition to reconstructing the feature model, occlusion handling is another important function in a realistic tracking algorithm. The proposed NPT-AFM-based occlusion handling algorithm first detects occlusion if the labeled region is 1.6 times larger than the original labeled region. The decision is made with additional information such as motion direction and size in the correspondingly labeled object region. If an occlusion is detected, we preserve the previous labeling information to keep multiple objects' feature models separate. After handling the occlusion, the feature model should be reconstructed each time. This reconstruction process is performed when the size of the labeled region is between 0.8L and 1.2L, where L represents the original size of the labeled region.
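The two area-ratio tests described above reduce to simple threshold checks on the labeled region's size relative to its reference size L:

```python
def occlusion_detected(region_area, reference_area, grow_ratio=1.6):
    # occlusion is flagged when the labeled region becomes more than
    # 1.6 times larger than the original labeled region
    return region_area > grow_ratio * reference_area

def reconstruction_allowed(region_area, reference_area):
    # the feature model is reconstructed once the labeled region returns
    # to between 0.8L and 1.2L of the original size L
    return 0.8 * reference_area <= region_area <= 1.2 * reference_area
```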

5 Experimental Results

We used 320 × 240 indoor and outdoor video sequences to test tracking of both rigid and deformable objects. In most experimental images, bright (yellow) circles represent successfully tracked feature points, while dark (blue) circles represent corrected feature points.

For rigid object tracking, we captured an indoor robot video sequence using a Pelco Spectra pan-tilt-zoom (PTZ) camera. We applied the proposed tracking algorithm while the PTZ camera was not moving. Once the camera view changed, we received the relative coordinate information from the camera and restarted tracking in the compensated image coordinate system. The result of tracking is shown in Fig. 5.


(a) (b) (c)

(d)

Fig. 5. Feature tracking of a rigid object and the resulting trajectory: (a) motion-based segmentation result of the 3rd frame, (b) the 10th frame, (c) the 34th frame, and (d) the corresponding trajectory of each frame.

For deformable object tracking, we captured indoor and outdoor human sequences using a SONY 3CCD DC-393 color video camera with auto-iris function. Tracking results using the proposed algorithm are shown in Fig. 6. Predicted feature points are classified into two classes, successful and reconstructed, which are displayed separately in Fig. 6.

In order to track a deformable object under occlusion, we applied the proposed NPT-AFM-based tracking algorithm. Results of occlusion handling by the proposed NPT-AFM are shown in Fig. 7. By using the NPT-AFM, the proposed tracking algorithm could successfully track an object with occlusion of up to 85%.

6 Conclusions

We presented a novel method for tracking both rigid and deformable objects in video sequences. The proposed tracking algorithm segments the object's region based on motion, extracts feature points, predicts the corresponding feature points in the next frame using optical flow, corrects and reconstructs incorrectly predicted feature points, and finally applies the NPT-AFM to handle occlusion problems.

The NPT-AFM, which is the major contribution of this paper, removes the off-line preprocessing step for generating an a priori training set. The training set used for model fitting can be updated at each frame, making the object's shape model more robust under occlusion. The on-line updating of the training set can realize a real-time, robust tracking system. Experimental results prove that the


(a) Man-a 4th frame (b) Man-a 34th frame (c) Man-a 57th frame

(d) Man-b 27th frame (e) Man-b 75th frame (f) Man-b 113th frame

(g) Man-c 5th frame (h) Man-c 31st frame (i) Man-c 43rd frame

Fig. 6. Feature tracking of deformable objects in both indoor and outdoor sequences. Bright (yellow) circles represent successfully predicted feature points, while dark (blue) circles represent corrected or reconstructed points.

(a) 106th frame (b) 125th frame (c) 165th frame

Fig. 7. Occlusion handling results using the proposed NPT-AFM algorithm

proposed algorithm can track both rigid and deformable objects under various conditions, and it can also track the object-of-interest with partial occlusion and a complicated background.


References

1. Haritaoglu, I., Harwood, D., Davis, L.: W4: Real-Time Surveillance of People and Their Activities. IEEE Trans. on Pattern Analysis and Machine Intelligence (2000) 809–830

2. McKenna, S., Raja, Y., Gong, S.: Tracking Contour Objects Using Adaptive Mixture Models. Image and Vision Computing (1999) 225–231

3. Plankers, R., Fua, P.: Tracking and Modeling People in Video Sequences. Computer Vision and Image Understanding (2001) 285–302

4. Comaniciu, D., Ramesh, V., Meer, P.: Kernel-Based Object Tracking. IEEE Trans. on Pattern Analysis and Machine Intelligence (2003) 564–577

5. Wren, C., Azarbayejani, A., Darrell, T., Pentland, A.: Pfinder: Real-Time Tracking of the Human Body. IEEE Trans. on Pattern Analysis and Machine Intelligence (1997) 780–785

6. Comaniciu, D., Ramesh, V., Meer, P.: Real-Time Tracking of Non-Rigid Objects Using Mean Shift. Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition (2000) 142–149

7. Baumberg, A.: Learning Deformable Models for Tracking Human Motion. Ph.D. Dissertation, School of Computer Studies (1995)

8. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. International Journal of Computer Vision (1988) 321–331

9. Blake, A., Isard, M.: Active Contours. Springer, London, England (1998)

10. Cootes, T., Cooper, D., Taylor, C., Graham, J.: Active Shape Models - Their Training and Application. Computer Vision and Image Understanding 61 (1995) 38–59

11. Lucas, B. D., Kanade, T.: An Iterative Image Registration Technique with an Application to Stereo Vision. Proc. DARPA Image Understanding Workshop (1981) 121–130

12. Erdem, C. E., Tekalp, A. M., Sankur, B.: Non-Rigid Object Tracking Using Performance Evaluation Measures as Feedback. Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition (2001) 323–330

13. Shi, J., Tomasi, C.: Good Features to Track. Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition (1994) 593–600

14. Koschan, A., Kang, S., Paik, J., Abidi, B., Abidi, M.: Color Active Shape Models for Tracking Non-Rigid Objects. Pattern Recognition Letters (2003) 1751–1765

15. Gharavi, H., Mills, M.: Block-Matching Motion Estimation Algorithms: New Results. IEEE Trans. Circuits and Systems (1990) 649–651

16. Goodall, C.: Procrustes Methods in the Statistical Analysis of Shape. Journal of the Royal Statistical Society B (1991) 285–339