
Research Article

Single-Object Tracking Algorithm Based on Two-Step Spatiotemporal Deep Feature Fusion in a Complex Surveillance Scenario

Yanyan Chen 1 and Rui Sheng 2

1 Jiuzhou Polytechnic, Xuzhou 221116, China
2 Southwest China Institute of Electronic Technology, Chengdu 610036, China

Correspondence should be addressed to Yanyan Chen; chenyanyan@jzp.edu.cn

Received 3 November 2020; Revised 18 December 2020; Accepted 26 December 2020; Published 5 January 2021

Academic Editor: Yi-Zhang Jiang

Copyright © 2021 Yanyan Chen and Rui Sheng. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Object tracking has been one of the most active research directions in the field of computer vision. In this paper, an effective single-object tracking algorithm based on two-step spatiotemporal feature fusion is proposed, which combines deep learning detection with the kernelized correlation filtering (KCF) tracking algorithm. Deep learning detection is adopted to obtain more accurate spatial position and scale information and to reduce the cumulative error. In addition, the improved KCF algorithm is adopted to track and to compute the temporal correlation of gradient features between video frames, so as to reduce the probability of missed detection and maintain the running speed. In the process of tracking, the spatiotemporal information is fused through feature analysis. Extensive experimental results show that our proposed algorithm achieves better tracking performance than the traditional KCF algorithm and can efficiently and continuously detect and track objects in different complex scenes, which makes it suitable for engineering application.

1. Introduction

With the rapid development of computer vision technology, video-based object tracking algorithms have become a research hotspot in research institutes and universities at home and abroad [1]. Object tracking technology usually builds a robust model from the object and its background information in the video to predict the shape, size, position, trajectory, and other motion states of the object, which enables more advanced tasks such as behavior prediction, scene understanding, and situation awareness [2]. Object tracking currently has a wide range of application fields, including video surveillance [3], unmanned driving [4], military guidance [5], UAV reconnaissance, intelligent transportation, and human-computer interaction [6]. It has important research value.

In recent years, many effective object tracking algorithms have been proposed. Generally speaking, object tracking algorithms are divided into generative tracking algorithms and discriminative tracking algorithms according to different judgment methods [7]. The current main research direction is focused on discriminative tracking algorithms, which have gradually occupied a dominant position in the field of visual object tracking and produced a series of excellent research models. Different from generative tracking algorithms, discriminative tracking algorithms do not ignore the background information but regard object tracking as a binary classification problem, where the object area of the current frame is tracked by designing a classifier to distinguish the object from the background area [8].

The Struck tracking algorithm proposed by Hare et al. [9] in 2011 directly outputs the tracking results by introducing an output feature space mapping and uses a support vector machine to train the classifier, which improves the tracking accuracy and further accelerates the tracking speed of the algorithm.


Kalal et al. proposed the tracking-learning-detection (TLD) algorithm on the basis of online learning, which has a better tracking effect for long-term tracking under complex background [10]. Bolme et al. proposed the minimum output sum of squared error (MOSSE) tracking algorithm and introduced correlation filtering into object tracking for the first time, but the grayscale features used are too simple to adapt to all scenarios [11]. Therefore, many algorithms have since improved on it. Henriques et al. introduced kernel function mapping into the original MOSSE algorithm, proposed a circulant structure of tracking-by-detection with kernels (CSK), and adopted the cyclic shifting method for dense sampling [12]. However, the CSK tracking algorithm did not improve the selection of features but still used image gray features, which makes the feature characterization ability of the object weak. On the basis of the CSK algorithm, Henriques et al. [13] used multichannel HOG features instead of single-channel gray features, proposed the kernelized correlation filtering (KCF) tracking algorithm, and enhanced the robustness of the existing tracking algorithm. Moreover, the KCF algorithm uses a circulant matrix for sampling, which reduces the complexity of the algorithm and improves the tracking speed. However, the KCF algorithm has a poor tracking effect under scale variations [14]. In order to solve these problems, Li and Zhu [15] proposed the scale adaptive kernel correlation filter (SAMF) tracking algorithm, which introduced the concept of scale pooling for the first time. Its tracking effect on objects with scale changes is better than that of the KCF algorithm; however, detection is performed on images of several scales, so the tracking speed of the SAMF algorithm is very slow, which cannot meet real-time requirements. In 2017, Danelljan et al. [16] proposed the context-aware correlation filtering (CALF) algorithm, where the filter was trained by strengthening background information, so that the CALF algorithm can maintain better performance for object tracking with complex background. On the basis of the SRDCF tracking algorithm, the spatial-temporal regularized correlation filter (STRCF) was proposed, in which a temporal regularization term is introduced into the SRDCF algorithm and can effectively suppress the boundary effect [17].

With the continuous development of neural networks and deep learning, the deep features learned by machines can better extract the most essential image information. Therefore, some scholars have proposed a series of object tracking algorithms based on deep features. The hierarchical convolutional features (HCF) tracking algorithm used three convolutional layers of the VGG network to obtain deep image features, and three different templates are obtained through training [18]; the three resulting confidence maps are then weighted and fused to obtain the object position [19]. Similarly, Danelljan et al. used deep features to replace those of the original SRDCF algorithm and proposed the DeepSRDCF tracking algorithm, which greatly improved the tracking accuracy. The deep model tracking algorithms above all use image deep features extracted by a convolutional neural network for object tracking. In addition, the fully convolutional network (FCT) tracking algorithm uses a regression network based on deep learning to predict the object position, so as to accurately track the object. In 2018, Zhong et al. [20] proposed the unveiling the power of deep tracking (UPDT) algorithm on the basis of the ECO algorithm; by analyzing the impact of deep features and shallow features on tracking accuracy, a novel feature fusion strategy was proposed to improve the tracking performance. Xue and Wang [21] proposed the SiamRPN algorithm, a Siamese network structure based on RPN, giving up traditional multiscale training and online tracking, thereby improving the tracking speed to a certain extent. In CVPR 2019, Wang et al. proposed the accurate tracking by overlap maximization (ATOM) algorithm, which introduced the idea of IoUNet object detection together with an object classification module, giving the tracker more powerful discrimination ability [22].

It can be seen from the above analysis that the traditional algorithms have high tracking speed, but their anti-interference ability is still insufficient. The tracking algorithms based on a deep model can adapt to most complex scenes, but they consume a lot of hardware resources and have poor real-time tracking performance. In this paper, an object tracking model based on two-step spatiotemporal information fusion is proposed, which uses deep learning detection to obtain more accurate spatial position and scale information, reducing the cumulative error. In addition, the algorithm uses KCF to track and to compute the temporal correlation of gradient features between video frames, so as to reduce the probability of missed detection and maintain the running speed. In the process of tracking, detection is rerun after a certain number of image frames, and the spatiotemporal information is fused through feature analysis. While ensuring tracking speed and accuracy, the method can also detect new objects in a complex video in time and track continuously for a long time.

2. Problem Description for Object Tracking

In this paper, we mainly study single-object tracking in a complex video. As shown in Figure 1, the basic framework of the single-object tracking algorithm mainly includes four parts: feature model, motion model, observation model, and online updating mechanism. Each part has its own special role; in other words, the four aspects are mutually reinforcing and indispensable parts of an integral whole. The feature model is designed to use image processing technology to obtain information that can characterize the appearance of the object and serve the construction of the observation model; the features suitable for object tracking are the gray feature, color feature, histogram of oriented gradient feature, deep feature, etc. The motion model mainly provides a set of candidate states in which the object may appear in the current frame, based on the context information of the object. The role of the observation model is to predict the state of the object on the basis of the candidate states provided by the feature model and the motion model. The online updating mechanism allows the observation model to adapt to

2 Mathematical Problems in Engineering

the changes of the object and background and ensures that the observation model does not degenerate.

There are many interference factors in the video tracking task, and practical tracking applications face a series of difficulties such as appearance change, illumination variation, partial occlusion, and complex background. Object appearance change refers to the change of the tracked object's appearance or of the camera's shooting angle during the movement, as shown in Figure 2(a). Illumination variation refers to the change of video imaging gray levels due to changes in the light source or the surrounding environment, as shown in Figure 2(b). Scale change refers to the change of the pixel size of the object in the video due to the movement of the object or the change of distance, as shown in Figure 2(c). Partial occlusion or object loss refers to an interference phenomenon where the object is affected by the background or moves out of the field of view, resulting in an incomplete appearance or complete disappearance from the field of view, as shown in Figure 2(d). Complex background refers to a large number of interference factors (such as many similar objects) in the background, which interfere with the object observation model. In addition, there are other interference factors such as fast movement, small objects, and blurring during the tracking process. These interference factors limit the performance of the tracking model to varying degrees, resulting in a decrease in overall accuracy. With the development of object tracking technology, some problems have been solved, such as the use of HOG features to effectively handle illumination changes in tracking tasks, but many problems still need to be solved in actual application. In this paper, we mainly focus on solving the problems of partial occlusion and object recapture in the process of object tracking.

3. Our Proposed Tracking Algorithms

Object detection and tracking based on spatiotemporal information fusion is mainly divided into three parts: object detection based on deep spatial information, KCF tracking based on temporal information, and fusion of spatiotemporal information. Firstly, the You Only Look Once (YOLO-V3) detector is used to detect the object, and then the KCF tracking model is used to track the object in a complex surveillance video [23]. After tracking a certain number of frames, the YOLO-V3 detection mechanism is adopted again to compare the confidence of the old tracking bounding box and the new detection bounding box. Through the spatiotemporal information fusion strategy, the appropriate bounding box is obtained to continue tracking. If a new object is detected in the field of view, the new object is tracked at the same time. The overall detection and tracking system is shown in Figure 3.
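To make the control flow concrete, the following is a minimal sketch of the detect-then-track loop just described, not the authors' implementation: KCF runs frame by frame, and the detector is rerun periodically so the two bounding box sources can be fused. The names `detect_objects`, `KCFTracker`, `iou`, and `fuse` are hypothetical placeholders for the modules detailed in Sections 3.1-3.3.

```python
# A minimal sketch of the two-step detect-then-track loop (assumed
# interfaces; `detect_objects`, `KCFTracker`, `iou`, and `fuse` are
# hypothetical placeholders, not the paper's actual code).

DETECT_EVERY = 50  # frames between two detection runs (see Section 3.3)

def track_video(frames, detect_objects, KCFTracker, iou, fuse):
    # Initialize from a detection on the first frame.
    box = detect_objects(frames[0])[0]
    tracker = KCFTracker(frames[0], box)
    results = [box]
    for t, frame in enumerate(frames[1:], start=1):
        box = tracker.update(frame)           # temporal step (KCF)
        if t % DETECT_EVERY == 0:             # spatial step (re-detection)
            detections = detect_objects(frame)
            box = fuse(box, detections, iou)  # spatiotemporal fusion
            tracker = KCFTracker(frame, box)  # re-initialize on fused box
        results.append(box)
    return results
```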

3.1. Object Detection Based on Deep Spatial Information

In this paper, we use the framework of the YOLO-V3 deep model to realize object detection, and we also redesign the bounding box selective search method to improve the detection accuracy of the object spatial information. Firstly, the input image features are fully extracted by the basic network through iterative convolution operations, and then further feature extraction and analysis are carried out through the additional network. The object position offset is predicted and classified by using a convolution predictor. Finally, redundancy is removed by the nonmaximum suppression method. The basic network uses an improved VGG structure as the feature extraction network: two convolution layers are used at the end of the network to replace the two fully connected layers of the original VGG network, and eight additional networks are added to further improve the feature extraction ability. It is widely known that feature maps of different depths have different receptive fields and respond differently to objects of different scales. The network structure is shown in Figure 4.

The detection of multiscale objects is divided into 3 steps: default boxes with different aspect ratios and the same area are generated on feature maps of different scales; after training on a large number of samples, the convolution predictor uses the abstract features in the default box as input to predict the offset of the default bounding box; and nonmaximum suppression is used to remove redundant bounding boxes with low confidence.

The default bounding box generation method is improved as follows. Firstly, assuming that predictions are to be made on a total of m feature maps, the area (scale) s_k of the default bounding box on the k-th feature map can be written as follows:

$$ s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}\,(k - 1), \quad k \in [1, m] \quad (1) $$

where m = 6, the minimum area s_min is 0.2, and the maximum area s_max is 0.95. In this paper, the K-means clustering algorithm is used to process the aspect ratios of all suspected objects in

Figure 1: Basic framework of the single-object tracking algorithm (initial position, motion model, feature model, observation model, online updating, object location).


the dataset, and 5 cluster centers are obtained. Therefore, the new aspect ratio set is denoted as $ar \in \{1, 1.8, 2.5, 3.8, 5\}$, which provides a better initial bounding box for object detection.
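As an illustration, a short sketch of the default box generation under equation (1) and the clustered aspect ratios is given below. The width/height convention w = s_k·√ar, h = s_k/√ar is the common SSD-style choice and is an assumption here, since the paper does not spell it out.

```python
import numpy as np

def default_boxes(m=6, s_min=0.2, s_max=0.95,
                  aspect_ratios=(1.0, 1.8, 2.5, 3.8, 5.0)):
    """Per-feature-map default box shapes: scale from eq. (1), aspect
    ratios from the K-means clusters; returns a list of (w, h) lists."""
    boxes = []
    for k in range(1, m + 1):
        s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)  # eq. (1)
        boxes.append([(s_k * np.sqrt(ar), s_k / np.sqrt(ar))
                      for ar in aspect_ratios])
    return boxes
```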

The YOLO-V3 convolutional network is used to obtain the coordinate offset from the fixed default bounding box to the actual benchmark value, together with the category score, and the loss function is obtained through the normalization and weighting of the category score and the coordinate offset.

Figure 3: Overall detection and tracking framework for our model (initial bounding box; YOLO-V3 deep learning spatial information; KCF tracking temporal information; fusion of spatiotemporal information; adaptive confidence discrimination; recapture mechanism).

Figure 4: The network structure for object detection.

Figure 2: Samples for different interference factors. (a) Scale change. (b) Illumination variation. (c) Appearance change. (d) Partial occlusion.


Therefore, the loss function can be described as follows:

$$ L(x_{ij}^k, c, l, g) = \frac{1}{N}\Big[ L_{\mathrm{conf}}(x_{ij}^k, c) + \alpha\, L_{\mathrm{loc}}(x_{ij}^k, l, g) \Big] \quad (2) $$

where $x_{ij}^k = 1$ means that candidate bounding box $i$ successfully matches the real object bounding box $j$ with category $p$, and otherwise $x_{ij}^k = 0$ means the match fails; $N$ is the number of candidate bounding boxes that can be matched with the true value; $L_{\mathrm{loc}}$ is the position loss function (smooth L1 loss); and $\alpha$ is set to 1. The network parameters can be optimized according to the result of the loss function.
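A minimal sketch of how the two terms of equation (2) combine is shown below; the smooth L1 helper is the standard definition, while the confidence loss is assumed to be computed elsewhere, so this is only the outer weighting, not the full detection loss.

```python
import numpy as np

def smooth_l1(pred, target):
    """Standard smooth L1 loss, used for the localization term L_loc."""
    diff = np.abs(np.asarray(pred) - np.asarray(target))
    return float(np.sum(np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)))

def detection_loss(conf_loss, loc_loss, num_matched, alpha=1.0):
    """Eq. (2): normalized sum of confidence loss and weighted
    localization loss; defined as 0 when no default box matched."""
    if num_matched == 0:
        return 0.0
    return (conf_loss + alpha * loc_loss) / num_matched
```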

3.2. KCF Tracking Based on Temporal Information

The KCF algorithm is a classical discriminative object tracking algorithm with good performance in both tracking speed and tracking accuracy. In the tracking process, the object bounding box of the KCF algorithm is fixed, and the object scale does not change from beginning to end. However, the object size often changes in a tracking video sequence, which leads to drift of the bounding box during tracking and may even result in tracking failure. In addition, the KCF algorithm cannot deal with occlusion of the object during tracking, which leads to feature extraction errors when training the filter model. When the object moves rapidly, some object features cannot be extracted because of the fixed size of the search box, so the quality of the detection model is reduced and tracking fails when the model is updated. In order to solve the tracking failures caused by the KCF algorithm in the above situations, some scholars have improved the KCF algorithm and proposed novel yet effective object tracking algorithms based on deep learning detection, and a large number of experimental results show that the improved algorithms have better accuracy and robustness than the original KCF algorithm.

For complex monitoring applications, the real-time performance of object tracking is very important. We select KCF as the basic tracking algorithm, which has a great advantage in speed. In addition, considering the large changes in object scale, a multiscale adaptive module is added to KCF. HOG features are adopted to train the classifier, which is transformed into a ridge regression model, so as to establish the mapping relationship between the input sample variable x and the output response y. The ridge regression objective function can be written as follows:

$$ \min_{\omega} \sum_i \big( f(x_i) - y_i \big)^2 + \lambda \lVert \omega \rVert^2 \quad (3) $$

where $\lambda$ ($\lambda \geq 0$) is a regularization parameter. The regularization term is added to avoid overfitting during optimization. In order to minimize the gap between the sample label predicted by the regression model and the real label, a weight coefficient is assigned to each sample to obtain a closed-form solution for the regression parameters. Therefore, the analytical solution $\omega$ can be deduced and represented as

$$ \omega = \big( X^T X + \lambda I \big)^{-1} X^T y \quad (4) $$
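Equation (4) is the standard closed form of ridge regression; a direct NumPy rendering (using a linear solve rather than an explicit inverse, for numerical stability) is:

```python
import numpy as np

def ridge_regression(X, y, lam=1e-4):
    """Closed-form ridge solution of eq. (4): w = (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```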

Due to the time-consuming calculation of dense sampling in equation (3), cyclic shifting is used to construct training samples, and the problem domain is transformed into the discrete Fourier domain. The properties of the circulant matrix avoid matrix inversion and accelerate feature space learning. The circulant matrix can be diagonalized as follows:

$$ X = F\, \mathrm{diag}(\hat{x})\, F^H \quad (5) $$
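Equation (5) can be checked numerically in a few lines: the snippet builds a circulant matrix from the cyclic shifts of a random base sample and verifies the DFT diagonalization (the unitary DFT convention is assumed).

```python
import numpy as np

# Numerical check of eq. (5): a circulant matrix X whose rows are the
# cyclic shifts of a base sample x is diagonalized by the DFT.
n = 8
x = np.random.randn(n)
X = np.stack([np.roll(x, i) for i in range(n)])   # rows = cyclic shifts
F = np.fft.fft(np.eye(n)) / np.sqrt(n)            # unitary DFT matrix
X_rebuilt = F @ np.diag(np.fft.fft(x)) @ F.conj().T
print(np.allclose(X, X_rebuilt))                  # True
```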

In order to simplify the calculation, the features obtained by ridge regression in linear space are mapped to a nonlinear space through the kernel function, and a dual problem is solved in the nonlinear space. Through the mapping function $\phi(x)$, the classifier can be denoted as follows:

$$ f(x_i) = \omega^T \phi(x_i) \quad (6) $$

Given $\omega = \sum_i \alpha_i \phi(x_i)$, the solution of $\omega$ can be transformed into the solution of $\alpha$. Therefore, on the basis of the kernel function $K = \phi(X)\phi(X)^T$, we can get the solution of ridge regression under the kernel function, namely,

$$ \alpha = (K + \lambda I)^{-1} y \quad (7) $$

Finally, we can get the response results of all test samples in the Fourier domain:

$$ f(z) = \hat{k}^{xz} \odot \hat{\alpha} \quad (8) $$

The sample with the strongest response is selected as the object position in the current frame.
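A compact single-channel sketch of the kernelized training and detection steps of equations (7) and (8) follows, with the Gaussian kernel correlation computed in the Fourier domain as in the KCF formulation; multichannel HOG inputs, cosine windowing, and online template updating are omitted for brevity.

```python
import numpy as np

def gaussian_correlation(x1, x2, sigma=0.5):
    """Gaussian kernel correlation k^{x1 x2} evaluated for all cyclic
    shifts at once via FFT (single-channel sketch)."""
    c = np.fft.ifft2(np.fft.fft2(x1) * np.conj(np.fft.fft2(x2))).real
    d2 = (x1 ** 2).sum() + (x2 ** 2).sum() - 2.0 * c
    return np.exp(-np.maximum(d2, 0.0) / (sigma ** 2 * x1.size))

def train(x, y, lam=1e-4):
    """Eq. (7) in the Fourier domain: alpha^ = y^ / (k^{xx}^ + lam)."""
    k = gaussian_correlation(x, x)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

def detect(alphaf, x, z):
    """Eq. (8): response map f(z) = IDFT(k^{xz}^ * alpha^); the argmax
    of the map gives the object translation in the current frame."""
    k = gaussian_correlation(z, x)
    return np.fft.ifft2(np.fft.fft2(k) * alphaf).real
```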

The overall framework of the tracking algorithm is shown in Figure 5. First, the object is initialized in the first frame, the features of the object are extracted, and the ridge regression model is trained to obtain the optimal filter parameters. Then, in the process of object tracking, features are extracted from the current frame and convolved with the filter template trained on the previous frame. We thus get the response map, where the maximum correlation value indicates the object position.

In order to adapt to changes of the object scale, a scale adaptive strategy is developed to ensure the stability of tracking. Taking the object position as the center, rectangular bounding boxes with different scales are selected as samples, and their HOG features are extracted respectively. Therefore, we can get the respective sample responses $R_0$, $R_{+1}$, and $R_{-1}$ after running the classifier and obtain the strongest response after comparison:

$$ R = \max\big( R_0, R_{+1}, R_{-1} \big) \quad (9) $$

The rectangular bounding box corresponding to the sample with the strongest response gives the current object scale, so the improved KCF can perform multiscale adaptive selection with a small, efficient, and feasible amount of calculation.
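A sketch of the scale selection of equation (9) is given below; the scale factors and the `tracker_response` helper (returning the peak filter response for a given search window) are illustrative assumptions, since the paper does not fix the scale step.

```python
def select_scale(tracker_response, frame, center, base_size,
                 scales=(1.0, 1.05, 0.95)):
    """Eq. (9): evaluate the filter on three scaled windows around the
    current center and keep the scale with the strongest response.
    `tracker_response(frame, center, size)` is a hypothetical helper."""
    return max(scales,
               key=lambda s: tracker_response(
                   frame, center, (base_size[0] * s, base_size[1] * s)))
```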


3.3. Object Detection and Tracking for Spatiotemporal Fusion

As is well known, using deep learning for object detection to extract single-frame image features has high accuracy, can identify and classify unknown objects, and has high robustness. However, object detection does not exploit the temporal relationship between consecutive frames of the video, which may lead to missed detections and slow running speed. KCF tracking is achieved by extracting features from continuous frame images to train filters via ridge regression, where the computation is small and the processing speed is fast. However, it easily accumulates errors because of tracking drift and is easily affected by object occlusion and background interference. Therefore, the fusion of temporal information and spatial information can make full use of the advantages of deep learning and KCF, improve overall performance, and achieve more accurate and stable detection and tracking with both robustness and real-time performance.

In the process of information fusion, the spatial position of the object is determined by the deep learning-based object detection algorithm in the first frame; the position of the object in the first frame is then used as the input of the KCF tracking algorithm, which tracks the object in the following frames. After tracking a fixed number of frames, the detection mechanism is run to ensure the accuracy of continuous detection and tracking through the YOLO-V3 detection algorithm. The number of tracking frames between two detection operations can be determined by experiment; generally, it can be set to 50 frames. In addition, we can also use the confidence of the detection results as the basis for template refresh and recapture.

After running the redetection mechanism, it is not certain whether the tracking candidate bounding boxes or the detection candidate bounding boxes obtained by the redetection module are better. Therefore, this paper designs a candidate box selection strategy. Firstly, the overlap ratio between detection candidate bounding box $S_i$ and tracking candidate bounding box $K_j$ is calculated to judge whether the detected and tracked objects are the same. In this paper, the intersection over union (IOU) is used as the criterion of overlap

ratio. The IOU of two candidate bounding boxes can be written as follows:

$$ \mathrm{IOU} = \frac{\lvert S_i \cap K_j \rvert}{\lvert S_i \cup K_j \rvert} \quad (10) $$

If $\forall K_j,\ \mathrm{IOU}(S_i, K_j) < 0.4$, $S_i$ is regarded as a new object and output to initialize the tracking algorithm. If $\exists K_j,\ \mathrm{IOU}(S_i, K_j) \geq 0.4$, it is considered that the detection bounding box $S_i$ and the tracking bounding box $K_j$ have detected the same object; then the confidence level $\mathrm{conf}(S_i)$ of the detection bounding box is compared with the normalized response $\mathrm{conf}(K_j)$ of the tracking bounding box. Finally, the bounding box with the higher confidence is taken as the output of the system.
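The candidate selection strategy can be summarized as follows; the (x1, y1, x2, y2) box format and the simple list-based matching are illustrative assumptions.

```python
def iou(a, b):
    """Eq. (10): intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def select_candidates(detections, tracks, conf_det, conf_trk, thr=0.4):
    """Unmatched detections start new tracks; for matched pairs the
    box with the higher confidence is kept as the system output."""
    new_objects, fused = [], []
    for i, s in enumerate(detections):
        matches = [j for j, k in enumerate(tracks) if iou(s, k) >= thr]
        if not matches:
            new_objects.append(s)  # a new object entering the view
        else:
            j = matches[0]
            fused.append(s if conf_det[i] > conf_trk[j] else tracks[j])
    return new_objects, fused
```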

4. Experimental Results and Analysis

4.1. Dataset and Verification Platform

In order to improve the accuracy and robustness of the detection and tracking algorithm in the video surveillance task, this experiment constructs a surveillance dataset with 321,550 images. To facilitate performance analysis, all data are labeled frame by frame in scale and position and classified according to the interference state.

The improved detection and tracking model is divided into three parts: object detection based on deep spatial information, KCF tracking based on temporal information, and fusion of spatiotemporal information. The parameters of each part are consistent with the original model. During offline training, all convolution layers are updated; during online updating, the parameters of the shallow convolution layers are fixed, and the last two convolution layers are fine-tuned according to the test data. During training, the YOLO-V3 model trained on Pascal VOC2007 [24] is used as the initial weight parameter to fine-tune the network, where the learning rate is set to 0.001 and the weight decay to 0.0005; 30,000 training iterations were conducted on an NVIDIA GeForce GTX 1080 Ti. The KCF module uses the peak-to-sidelobe ratio to select the optimal tracking point, and the threshold of the normalized response is set to 0.65.

Figure 5: The overall framework of the tracking algorithm (first frame: feature extraction, extracting HOG features, training tracking filter, KCF tracking model; T-th frame: extracting HOG features, convolution, filter response map, maximum response value, output; training and updating).


If the regression response score is less than 0.65, the tracking is considered to have failed, and the improved YOLO-V3 detection network is used to recapture the optimal object.
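A sketch of this failure check and of the peak-to-sidelobe ratio mentioned above follows; the 11x11 exclusion window around the peak is a common choice assumed here, not one specified in the paper.

```python
import numpy as np

def psr(response):
    """Peak-to-sidelobe ratio of a correlation response map; the
    sidelobe excludes an 11x11 window around the peak (assumed size)."""
    r = np.asarray(response, dtype=float).copy()
    py, px = np.unravel_index(np.argmax(r), r.shape)
    peak = r[py, px]
    r[max(0, py - 5):py + 6, max(0, px - 5):px + 6] = np.nan
    side = r[~np.isnan(r)]
    return (peak - side.mean()) / (side.std() + 1e-12)

def tracking_failed(norm_response, threshold=0.65):
    """Redetection trigger: a normalized response below 0.65 hands
    control back to the improved YOLO-V3 detector."""
    return norm_response < threshold
```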

In this paper, eight representative subsets from video surveillance are selected for verification, where the characteristics of the sequences are described in Table 1. For example, video 1 features a similar background, occlusion, and fast motion; video 2 features a similar background, fast motion, and rotation; videos 3 and 4 feature occlusion, rotation, and attitude change; and video 5 features fast motion, illumination change, and a similar background. The simulation platform is an AMD Ryzen 5 3500U host with 3.1 GHz and 8 GB RAM.

In this paper, center error (CE) and overlap rate (OR) are used to compare and analyze the experimental results [19]. The former is the relative number of frames whose center position error is less than a certain threshold, and the latter is the percentage of frames whose object bounding box overlap rate exceeds a threshold. In this paper, a position error of 20 pixels and an overlap rate of 0.6 are selected as the thresholds of tracking success. Because quantitative results differ greatly under different thresholds, precision plots and success plots are used to quantitatively analyze the performance of the comparison algorithms.
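The two success criteria can be computed as below; the 20-pixel and 0.6 thresholds follow the text, and the IoU helper repeats the definition of equation (10).

```python
import numpy as np

def overlap(a, b):
    """IoU of two (x1, y1, x2, y2) boxes, as in eq. (10)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_success(pred_centers, gt_centers, pred_boxes, gt_boxes,
                      ce_thr=20.0, or_thr=0.6):
    """Precision: fraction of frames with center error below 20 pixels.
    Success: fraction of frames with bounding-box overlap above 0.6."""
    ce = np.linalg.norm(np.asarray(pred_centers, float)
                        - np.asarray(gt_centers, float), axis=1)
    ov = np.array([overlap(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return float((ce < ce_thr).mean()), float((ov > or_thr).mean())
```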

4.2. Ablation Analysis

Our proposed method is an improved tracking method based on KCF that achieves scale adaptation. To illustrate its effectiveness, the comparison experiments in this paper select tracking methods with adaptive scale capabilities, such as KCF, SAMF, DSST, CFNet [23], SiamRPN [24], and DKCF [25], where precision refers to the error between the tracking point and the labeled point. KCF only updates the position of the object (x, y) while the size of the object remains unchanged, so its adaptability to changes of object scale is relatively poor. SAMF is also a modified algorithm based on KCF, whose object features add color features (color name, CN), meaning that HOG features and CN features are combined; in addition, the multiscales {1, 0.985, 0.99, 0.995, 1.005, 1.01, 1.015} are added to the scale pool, and the optimal scale is cyclically selected at the expense of tracking speed. DSST uses two mutually independent filters for scale calculation and object positioning, where 17 scale change factors and 33 interpolated scale change factors are established for scale evaluation and object positioning. SiamFC is an object tracking algorithm based on a fully convolutional Siamese network, where multiscale object fusion is implemented through a pyramid strategy to improve tracking accuracy. Our proposed algorithm is a detect-before-track model that uses deep neural networks for template updating and scale adaptation. The results of object detection and tracking under different environmental influences are shown in Table 2, and the precision plots and success plots of detection and tracking on 8 different video sequences are shown in Figure 6. It can be seen from Table 2 and Figure 6 that, compared with video 1, the tracking success rates of videos 2, 3, 4, and 5 decline to different degrees. Occlusion, scale change, motion blur, and illumination all affect detection and tracking, of which occlusion and illumination changes have the greatest impact; different degrees of motion blur have different effects. When the object overlap rate threshold is set to 0.6, the average detection and tracking accuracy is 76.17%, and the average speed reaches 18 FPS. The slower speed on video 2 is caused by the appearance of new objects in the field of view; the object scale in video 4 is larger, so detection and tracking take longer.

Video 2, with object occlusion, and video 5, with illumination changes, are selected for comparative experiments, where our proposed tracking algorithm is compared with a single tracking algorithm and a single detection algorithm. Video 2 exhibits object occlusion; the experimental results are shown in Table 2 and Figure 6. In terms of center error and overlap rate, the fusion algorithm is obviously better. A deep learning detection algorithm may not detect objects at too small a scale, resulting in a low recall rate; in long-term detection and tracking, the correlation filtering tracking algorithm accumulates errors, resulting in poor accuracy, and tracking drift easily occurs, especially for occluded objects. These factors keep the center error and overlap rate of a single detection or tracking algorithm low. The fusion algorithm ensures a high recall rate through KCF tracking and corrects the cumulative error by YOLO-V3 detection. After the object is occluded, it can still recapture the object and keep tracking, which solves the object-loss problem in object detection and tracking.

There is illumination change in video 5; the experimental results are shown in Table 3 and Figure 7. Due to the influence of illumination change, it is difficult to distinguish light and shade between the object edge and the background, which makes the object bounding box hard to determine for detection and tracking.

Table 1: Characteristic for partial sequences.

Sequences           Characteristic
Benchmark video 1   Similarity background, occlusion, fast motion
Benchmark video 2   Similarity background, fast motion, rotation
Benchmark video 3   Occlusion, rotation
Benchmark video 4   Fast motion, attitude change
Benchmark video 5   Fast motion, illumination, similarity background
Benchmark video 6   Occlusion, similarity background, blurry
Benchmark video 7   Occlusion, scale change, angle of view
Benchmark video 8   Occlusion, rotation, illumination


Even if the object position can be detected and tracked, the judgment of the object scale is not accurate. Therefore, the accuracy in terms of center position error is higher, but the overlap rate is lower for the KCF tracking algorithm. The YOLO-V3 detection algorithm has strong robustness but suffers from missed detections. Therefore, the simulation results show that our proposed fusion algorithm has better detection and tracking performance in the complex environment.

4.3. Comparative Experiment and Analysis

In this paper, we select different detection and tracking algorithms for comparative experiments on single-object videos, where the widely used SSD and YOLO-V3 algorithms are selected in the spatial dimension, and the classic single-object tracking algorithms DSST, KCF, and SAMF are selected in the temporal dimension. The experiment is divided into two parts: the first part compares a single spatial detection algorithm or a temporal tracking algorithm with our proposed algorithm; the second part compares different detection and tracking algorithm combinations based on the fusion strategy. Table 4 shows the comparison results of the single algorithms. Comparing the detection algorithms separately, the detection accuracy of the YOLO-V3 algorithm is higher. Overall, the success rate of a single algorithm is much lower than that of the YOLO-V3 + KCF fusion algorithm. This is because the detection algorithm is affected by the complex background, resulting in a large number of missed detections, while the temporal algorithm is affected by motion blur, and the accumulated error causes tracking drift, making the IOU between the tracking result and the ground truth less than 0.6.

Table 5 compares the fusion effects of different algorithm combinations. It can be seen from Table 5 that the YOLO-V3 + KCF algorithm has the best effect. Because the KCF algorithm has the best effect among the tracking algorithms in Table 4, the overall effect of YOLO-V3 + KCF is also better than SSD + DSST and SSD + SAMF. Because the tracking algorithm uses temporal information to eliminate the missed detections of the detection algorithm, and the detection algorithm corrects the drift of the tracking result by accurately detecting the single object, the success rate of the fusion algorithm exceeds that of the single algorithms in Table 4.

Figure 8 shows the qualitative results of the different comparison algorithms, and Table 6 gives a quantitative comparison on different sequences. In Figure 8(a), there are factors such as object scale changes, illumination changes, and background interference; over the whole tracking process, only DKCF, SiamRPN, and our proposed algorithm achieve good tracking results.

Table 2: The quantitative analysis for testing sequences.

Indexes   Video 1  Video 2  Video 3  Video 4  Video 5  Video 6  Video 7  Video 8
CLE       10.6     29.6     31.1     23.8     18.9     21.0     17.5     8.7
OR        0.88     0.72     0.62     0.81     0.68     0.71     0.58     0.73
FPS       17.4     12.0     20.9     16.8     14.1     19.1     18.5     17.2

Figure 6: The tracking results for different testing sequences. (a) Precision plot and (b) success plot.

Table 3: Detection and tracking results for different modules.

Index (average)    YOLO-V3  KCF    Fusion
CLE (pixel)        16.9     14.6   10.5
OR                 0.816    0.752  0.861
Frame rate (FPS)   8.7      46.0   16.1


However, due to the continuous change of object scale, background interference information introduced into the KCF tracking template gradually accumulates, and finally there is a large tracking deviation (such as in the 640th frame). Our proposed algorithm can automatically adjust the tracking bounding box size according to the object scale change, thereby reducing background interference, so it can always estimate the location and the scale of the object. The object in video 7 undergoes dramatic changes in illumination and scale (frames 65, 110, and 351 in Figure 8(b)); over the whole tracking process, only our proposed algorithm and SiamRPN can track the entire video, and the other methods cannot adapt to the drastic changes in illumination and scale. The object in video 6 has certain scale and posture changes, where KCF, SAMF, DSST, CFNet, SiamRPN, and our

Figure 7: Ablation analysis for different modules in video 5. (a) Precision plot and (b) success plot.

Table 4: Comparison results for the single tracking algorithms.

Modules            Object detection     Object tracking          Fusion
Model              SSD      YOLO-V3     DSST    SAMF    KCF      YOLO-V3 + KCF
CLE (pixel)        21.1     16.9        17.27   13.28   14.6     10.1
OR                 0.524    0.816       0.711   0.625   0.752    0.841
Frame rate (FPS)   6.5      8.7         34.2    26.1    46.0     16.1

Table 5: Tracking performance of different module combinations.

Index (average)    YOLO-V3 + KCF  YOLO-V3 + DSST  YOLO-V3 + SAMF  SSD + KCF  SSD + DSST  SSD + SAMF
CLE (pixel)        10.5           13.6            16.5            15.2       18.3        17.5
OR                 0.861          0.782           0.771           0.837      0.825       0.776
Frame rate (FPS)   16.1           13.2            9.1             12.2       10.9        8.7


proposed algorithm all have good tracking performance, but our algorithm achieves the best OR and CLE.

5. Conclusion

In a complex surveillance video, object detection and tracking usually suffer from various environmental interferences, especially scale changes, occlusion, illumination changes, and motion blur. This paper proposes an object detection and tracking model based on spatiotemporal information fusion, which uses deep learning to detect and extract spatial information, improving detection accuracy and avoiding object position drift; an improved KCF tracker is then used to exploit temporal information, so as to avoid missed detections; finally, a spatiotemporal information fusion strategy is designed to make the detection information and tracking information complement each other. The results show that our proposed algorithm can efficiently and continuously detect and track objects in different complex scenes and, to a certain extent, can cope with the influence of the abovementioned environmental interference factors with robust and stable performance. However, the detection and tracking effect for very small objects is slightly worse, so the next step will be to make improvements in this regard.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 8: Qualitative comparison of different tracking algorithms (Proposed, SAMF, SiamRPN, KCF, CFNet, DKCF, DSST) in different scenarios: (a) video 8, (b) video 7, and (c) video 6.

Table 6: Quantitative comparison for different sequences.

Success rate:
Sequences  KCF    SAMF   DSST   DKCF   CFNet  SiamRPN  Our
Video 8    0.78   0.56   0.62   0.70   0.75   0.79     0.86
Video 7    0.60   0.48   0.65   0.61   0.78   0.81     0.81
Video 6    0.79   0.67   0.58   0.82   0.67   0.79     0.79
Video 5    0.61   0.52   0.54   0.77   0.47   0.40     0.42
Video 4    0.73   0.68   0.71   0.75   0.75   0.88     0.82
Video 3    0.68   0.70   0.72   0.76   0.79   0.83     0.87
Video 2    0.59   0.57   0.62   0.65   0.75   0.81     0.85
Video 1    0.68   0.62   0.66   0.72   0.61   0.79     0.79

Center location error:
Sequences  KCF    SAMF   DSST   DKCF   CFNet  SiamRPN  Our
Video 8    25.3   42.3   24.4   24.9   17.0   14.4     9.1
Video 7    21.7   28.6   9.2    37.4   21.2   10.8     7.5
Video 6    12.7   7.1    6.3    12.6   16.8   15.6     9.3
Video 5    14.6   25.0   11.4   17.1   15.2   2.9      8.1
Video 4    27.3   22.3   14.4   24.9   17.0   14.9     10.1
Video 3    31.7   28.3   9.2    37.4   21.2   10.8     8.4
Video 2    12.5   9.1    6.3    12.6   16.8   15.6     8.9
Video 1    74.6   23.0   21.4   27.1   15.2   19.9     18.2


References

[1] C. Wu, H. Sun, H. Wang et al., "Online multi-object tracking via combining discriminative correlation filters with making decision," IEEE Access, vol. 6, pp. 43499–43512, 2018.

[2] Y. Qi, C. Wu, D. Chen, and Y. Lu, "Superpixel tracking based on sparse representation," Journal of Electronics & Information Technology, vol. 37, no. 3, pp. 529–535, 2015.

[3] G. Yuan and M. Xue, "Visual tracking based on sparse dense structure representation and online robust dictionary learning," Journal of Electronics & Information Technology, vol. 37, no. 3, pp. 536–542, 2015.

[4] H. Luo, B.-K. Zhong, and F.-S. Kong, "Tracking using weighted block compressed sensing and location prediction," Journal of Electronics & Information Technology, vol. 37, no. 5, pp. 1160–1166, 2015.

[5] Z.-Q. Hou, A.-Q. Huang, W.-S. Yu, and X. Liu, "Visual object tracking method based on local patch model and model update," Journal of Electronics & Information Technology, vol. 37, no. 6, pp. 1357–1364, 2015.

[6] M. Xue, H. Zhu, and G.-L. Yuan, "Robust visual tracking based on online discrimination dictionary learning," Journal of Electronics & Information Technology, vol. 37, no. 7, pp. 1654–1659, 2015.

[7] L. Matthews, T. Ishikawa, S. Baker et al., "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 810–815, 2004.

[8] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1-3, pp. 125–141, 2008.

[9] S. Hare, S. Golodetz, A. Saffari et al., "Struck: structured output tracking with kernels," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 2096–2109, 2016.

[10] Z. Kalal, K. Mikolajczyk, and J. Matas, "Tracking-learning-detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 1, pp. 1409–1422, 2010.

[11] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, "Visual object tracking using adaptive correlation filters," in Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2544–2550, San Francisco, CA, USA, June 2010.

[12] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of the 12th European Conference on Computer Vision (ECCV 2012), Part IV, vol. 75, no. 1, pp. 702–715, Florence, Italy, October 2012.

[13] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2015.

[14] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of the British Machine Vision Conference, pp. 590–604, BMVA Press, Nottingham, England, September 2014.

[15] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in Proceedings of the 2014 European Conference on Computer Vision (ECCV), pp. 254–265, Springer, Zurich, Switzerland, September 2014.

[16] M. Danelljan, G. Hager, F. S. Khan, and M. Felsberg, "Learning spatially regularized correlation filters for visual tracking," in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4310–4318, IEEE, Santiago, Chile, December 2015.

[17] L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, and P. H. S. Torr, "Staple: complementary learners for real-time tracking," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1401–1409, Las Vegas, NV, USA, June 2016.

[18] F. Li, C. Tian, W. Zuo, L. Zhang, and M.-H. Yang, "Learning spatial-temporal regularized correlation filters for visual tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4904–4913, Salt Lake City, UT, USA, June 2018.

[19] M. Mueller, N. Smith, and B. Ghanem, "Context-aware correlation filter tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1396–1404, Honolulu, HI, USA, July 2017.

[20] W. Zhong, H. Lu, and M. H. Yang, "Robust object tracking via sparse collaborative appearance model," IEEE Transactions on Image Processing, vol. 23, no. 5, pp. 2356–2368, 2014.

[21] Y.-Z. Xue and T. Wang, "Object tracking based on cost-sensitive Adaboost algorithm," Chinese Journal of Graphic Arts, vol. 21, no. 5, pp. 544–555, 2016.

[22] Q. Wang, J. Gao, J. Xing, M. Zhang, and W. Hu, "DCFNet: discriminant correlation filters network for visual tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017.

[23] J. Valmadre, L. Bertinetto, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, "End-to-end representation learning for correlation filter based tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5000–5008, Honolulu, HI, USA, July 2017.

[24] B. Li, J. Yan, W. Wu, Z. Zhu, and X. Hu, "High performance tracking with Siamese region proposal network," in Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8971–8980, Salt Lake City, UT, USA, June 2018.

[25] B. Uzkent and Y. W. Seo, "EnKCF: ensemble of kernelized correlation filters for high-speed object tracking," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, pp. 77–89, Lake Tahoe, NV, USA, March 2018.

Mathematical Problems in Engineering 11

algorithm Kalal et al proposed a tracking learning detection(TLD) algorithm on the basis of online learning which has abetter tracking effect for long-term tracking under complexbackground [10] Bolme et al proposed the minimumoutput sum of squared error (MOSSE) tracking algorithmand introduced correlation filtering into the object trackingalgorithm for the first time but the used grayscale featuresare too simple to adapt all scenarios [11]erefore there aremany algorithms to improve on it since then Henriqueset al introduced the kernel function mapping into theoriginal MOSSE algorithm and proposed a circulantstructure of tracking by detection with kernels (CSK) andadopted the cycle shifting method for dense sampling [12]However the CSK tracking algorithm did not improve theselection of features but still used the image gray featureswhich makes the feature characterization ability of the objectnot strong On the basis of the CSK algorithm Henriqueset al [13] used multichannel HOG features instead of single-channel gray features and proposed the kernelized corre-lation filtering (KCF) tracking algorithm and enhanced therobustness of the existing tracking algorithm Moreover theKCF algorithm uses a circulant matrix for sampling whichreduces the complexity of the algorithm and improves thespeed of tracking However the KCF algorithm has a poortracking effect on scale variations [14] In order to solve theseproblems Li and Zhu [15] proposed the scale adaptive kernelcorrelation filter (SAMF) tracking algorithm which intro-duced the concept of scale pooling for the first time etracking effect of objects with scale changes is better than theKCF algorithm e detection is performed on images ofseveral scales so the tracking speed of the SAMF algorithm isvery slow which cannot meet the real-time requirements In2017 Danelljan et al [16] proposed the context awarecorrelation filtering (CALF) algorithm where the filter wastrained by strengthening background information so thatthe CALF algorithm can maintain better performance forobject tracking with complex background On the basis ofthe SRDCF tracking algorithm the spatial-temporal regu-larized correlation filter (STRCF) was proposed in which atemporal regularization term is introduced into the SRDCFalgorithm and can effectively suppress the boundary effect[17]

With the continuous development of neural networksand deep learning the deep features learned by machinescan better extract the most essential image informationerefore some scholars have proposed a series of objecttracking algorithms based on deep features e hierarchicalconvolutional features (HCF) tracking algorithm used threeconvolutional layers in the VGG network to obtain imagedeep features and three different templates are obtainedthrough training [18] then the obtained three confidencemaps are weighted and fused to obtain the object position[19] Similarly Danelljan et al used deep features to replacethe original SRDCF algorithm and proposed theDeepSRDCF tracking algorithm which greatly improved thetracking accuracy of the object tracking algorithm e deepmodel tracking algorithms proposed above all use the imagedeep features extracted by the convolutional neural networkfor object tracking In addition the fully convolutional

network (FCT) tracking algorithm uses the regressionnetwork based on deep learning to predict the object po-sition so as to accurately track the object In 2018 Zhonget al [20] proposed the unveiling the power of deep tracking(UPDT) algorithm on the basis of the ECO algorithm Byanalyzing the impact of deep features and shallow featureson tracking accuracy a novel feature fusion strategy wasproposed to improve the tracking performance of the al-gorithm Xue and Wang [21] proposed a SiamRPN algo-rithm and Siamese network structure based on RPN givingup the use of traditional multiscale training and onlinetracking thereby improving the tracking speed to a certainextent In CVPR2019 Wang et al proposed an accuratetracking by overlap maximization (ATOM) algorithmwhich introduced the idea of IoUNet object detection andthe object classification module so as to have more powerfuldiscrimination ability for the tracker [22]

It can be seen from the above analysis that the traditionalalgorithms have high tracking speed but their anti-inter-ference ability is still insufficient e tracking algorithmsbased on a deep model can be adapted to most complexscenes but they consume a lot of hardware resources andhave poor real-time tracking performance In this paper anobject tracking model based on two-step spatiotemporalinformation fusion is proposed which uses deep learningdetection to obtain more accurate spatial position and scaleinformation reducing the cumulative error In addition thealgorithm uses KCF to track and calculate the temporalinformation correlation of gradient features between videoframes so as to reduce the probability of missing detectionand ensure the running speed In the process of tracking thedetection is run after a certain number of image frames andthe spatiotemporal information is fused through featureanalysis Under the condition of ensuring the tracking speedand accuracy it can also detect the new object in the complexvideo in time and track continuously for a long time

2 Problem Description for Object Tracking

In this paper we mainly study single-object tracking in acomplex video As shown in Figure 1 the basic framework ofthe single-object tracking algorithm mainly includes fourparts feature model motion model observation model andonline updating mechanism Each part has its own specialrole In other words the four aspects are mutually rein-forcing and indispensable parts of an integral whole efeature model is designed to use image processing tech-nology to obtain information that can characterize theappearance of the object and serve the construction of theobservation model e features suitable for object trackingare gray feature color feature histogram of oriented gra-dient feature deep feature etc the motion model mainlyprovides a set of candidate states that the object may appearin the current frame based on the context information of theobject the role of the observation model is to predict thestate of the object on the basis of the candidate state providedby the feature model and the motion model the onlineupdating mechanism allows the observation model to adapt

2 Mathematical Problems in Engineering

the changes of the object and background and ensures thatthe observation model does not degenerate

ere are many interference factors in the video trackingtask and it faces a series of difficulties in practical trackingapplications such as appearance change illumination var-iation partial occlusion and complex background In objectappearance changes it refers to the change of the trackedobjectrsquos appearance or the shooting angle of the cameraduring the movement as shown in Figure 2(a) e illu-mination variation refers to the change of video imaginggray due to changes in the light source or the surroundingenvironment as shown in Figure 2(b) Scale changes refer tothe change of the pixel size of the object in the video due tothe movement of the object or the change of the distance asshown in Figure 2(c) Partial occlusion or object losing refersto an interference phenomenon where the object is affectedby the background or moved out of the field of viewresulting in an incomplete appearance or completely out ofthe field of view as shown in Figure 2(d) e complexbackground refers to a large number of interference factors(such as a large number of similar objects) in the back-ground which causes interference to the object observationmodel In addition there are other interference factors suchas fast movement small objects and blurring during thetracking process ese interference factors limit the per-formance of the tracking model to varying degrees resultingin a decrease in the overall accuracy With the developmentof object tracking technology although some problems havebeen solved such as the use of HOG features to effectivelysolve the problem of illumination changes in tracking tasksthere are still many problems need to be solved in the actualapplication process In this paper we mainly focus onsolving the problem of partial occlusion and object recapturein the process of object tracking

3. Our Proposed Tracking Algorithms

Object detection and tracking based on spatiotemporal information fusion is mainly divided into three parts: object detection based on deep spatial information, KCF tracking based on temporal information, and fusion of spatiotemporal information. Firstly, the You Only Look Once (YOLO-V3) detector is used to detect the object, and then the KCF tracking model is used to track the object in a complex surveillance video [23]. After tracking a certain number of frames, the YOLO-V3 detection mechanism is adopted again to compare the confidence of the old tracking bounding box and the new detection bounding box. Through the spatiotemporal information fusion strategy, the appropriate bounding box is obtained to continue tracking. If a new object is detected in the field of view, the new object is tracked at the same time. The overall detection and tracking system is shown in Figure 3.

3.1. Object Detection Based on Deep Spatial Information. In this paper, we use the framework of the YOLO-V3 deep model to realize object detection, and we also redesign the bounding box selective search method to improve the detection accuracy of the object's spatial information. Firstly, the input image features are fully extracted by the basic network through iterative convolution operations, and then further feature extraction and analysis are carried out through the additional network. The object position offset is predicted and classified by a convolution predictor. Finally, redundancy is removed by the nonmaximum suppression method. The basic network uses an improved VGG structure as the feature extraction network: two convolution layers are used at the end of the network to replace the two fully connected layers of the original VGG network, and eight additional layers are added to further improve the feature extraction ability. It is widely known that feature maps of different depths have different receptive fields and respond differently to objects of different scales. The network structure is shown in Figure 4.

The detection of multiscale objects is divided into three steps: (1) default boxes with different aspect ratios and the same area are generated on feature maps of different scales; (2) after training on a large number of samples, the convolution predictor uses the abstract features in the default box as input to predict the offset of the default bounding box; (3) nonmaximum suppression is used to remove redundant bounding boxes with low confidence, as sketched below.
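For illustration, the following is a minimal sketch of the greedy nonmaximum suppression of step (3); the (x1, y1, x2, y2) box format and the 0.45 suppression threshold are assumptions, not values given in this paper.

```python
import numpy as np

def iou_xyxy(a, b):
    # Overlap of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, iou_thr=0.45):
    # Greedily keep the highest-scoring box, then drop lower-confidence
    # boxes that overlap it by more than the threshold.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        ious = np.array([iou_xyxy(boxes[i], boxes[j]) for j in rest])
        order = rest[ious < iou_thr]
    return keep
```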

The default bounding box generation method is improved as follows. Firstly, assuming that predictions must be made on a total of m feature maps, the area (scale) $s_k$ of the default bounding box on the k-th feature map can be written as

$$s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}\,(k - 1), \quad k \in [1, m], \qquad (1)$$

where m = 6, the minimum area $s_{\min}$ is 0.2, and the maximum area $s_{\max}$ is 0.95. In this paper, the K-means clustering algorithm is used to process the aspect ratios of all suspected objects in the dataset, and 5 cluster centers are obtained.

Figure 1: Basic framework of the single-object tracking algorithm (initial position → motion model → feature model → observation model → object location, with online updating).


Therefore, the new aspect ratio set is denoted as $a_r \in \{1, 1.8, 2.5, 3.8, 5\}$, which provides a better initial bounding box for object detection.
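To make equation (1) and the clustered aspect ratios concrete, here is a small sketch of how the default boxes could be enumerated; pairing each equal-area box with an aspect ratio via width = s·√a_r, height = s/√a_r is a common convention assumed here, not a detail stated in the paper.

```python
import math

def default_box_scales(m=6, s_min=0.2, s_max=0.95):
    # Equation (1): linearly spaced scales over the m prediction feature maps.
    return [s_min + (s_max - s_min) / (m - 1) * (k - 1) for k in range(1, m + 1)]

def default_boxes(scale, aspect_ratios=(1, 1.8, 2.5, 3.8, 5)):
    # Boxes of equal area scale^2 whose shape varies with the clustered ratios.
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar)) for ar in aspect_ratios]

for s in default_box_scales():
    print([f"{w:.2f} x {h:.2f}" for (w, h) in default_boxes(s)])
```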

The YOLO-V3 convolutional network is used to obtain the coordinate offsets from the fixed default bounding boxes to the ground-truth values together with the category scores, and the loss function is obtained through the normalized weighting of the category score and the coordinate offset.

Figure 3: Overall detection and tracking framework for our model (initial bounding box → YOLO-V3 deep learning for spatial information and KCF tracking for temporal information → fusion of spatiotemporal information, with adaptive confidence discrimination and a recapture mechanism).

Figure 4: The network structure for object detection (stacked 3×3 convolution and 2×2 pooling layers, Conv1–Conv9 and Pool1–Pool5, a passthrough layer, the prediction module, and NMS).

Figure 2: Samples for different interference factors: (a) scale change; (b) illumination variation; (c) appearance change; (d) partial occlusion.


Therefore, the loss function can be described as follows:

$$L\left(x_{ij}^{k}, c, l, g\right) = \frac{1}{N}\left[L_{\mathrm{conf}}\left(x_{ij}^{k}, c\right) + \alpha L_{\mathrm{loc}}\left(x_{ij}^{k}, l, g\right)\right], \qquad (2)$$

where $x_{ij}^{k} = 1$ means that candidate bounding box i successfully matches the real object bounding box j of category p, and otherwise $x_{ij}^{k} = 0$ means the match fails; N is the number of candidate bounding boxes that can be matched with the ground truth; $L_{\mathrm{loc}}$ is the position loss function (smooth L1 loss); and α is set to 1. The network parameters can be optimized according to the result of the loss function.
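A minimal PyTorch-style sketch of this weighted loss follows, assuming per-box class logits and offset regression targets; the tensor layout and the hard-assignment mask are illustrative choices, not details fixed by the paper.

```python
import torch
import torch.nn.functional as F

def detection_loss(conf_logits, labels, loc_pred, loc_target, matched, alpha=1.0):
    # matched: boolean mask with x_ij^k = 1 where a candidate box matches a
    # ground-truth box; N in equation (2) is the number of matched candidates.
    n = matched.sum().clamp(min=1).float()
    l_conf = F.cross_entropy(conf_logits, labels, reduction="sum")
    # Smooth L1 position loss, computed only on matched candidates.
    l_loc = F.smooth_l1_loss(loc_pred[matched], loc_target[matched], reduction="sum")
    return (l_conf + alpha * l_loc) / n
```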

3.2. KCF Tracking Based on Temporal Information. The KCF algorithm is a classical discriminative object tracking algorithm with good performance in both tracking speed and tracking accuracy. In the tracking process, the object bounding box of the KCF algorithm is set once, and the object scale remains unchanged from beginning to end. However, the object size often changes in a tracking video sequence, which leads to drift of the bounding box during tracking and can even result in tracking failure. In addition, the KCF algorithm cannot deal with occlusion of the object during tracking, which leads to feature extraction errors when training the filter model. When the object moves rapidly, some object features cannot be extracted because of the fixed size of the search box, so the quality of the detection model is reduced and tracking failure is caused when updating the model. In order to solve the tracking failures caused by the KCF algorithm in the above situations, some scholars have improved the KCF algorithm and proposed novel yet effective object tracking algorithms based on deep learning detection, and a large number of experimental results show that the improved algorithms have better accuracy and robustness than the original KCF algorithm.

For complex monitoring applications, the real-time performance of object tracking is very important. We select KCF as the basic tracking algorithm, which has a great advantage in speed. In addition, considering the large changes in object scale, a multiscale adaptive module is added to KCF. HOG features are adopted to train the classifier, which is formulated as a ridge regression model, so as to establish the mapping relationship between the input sample variable x and the output response y. The ridge regression objective function can be written as follows:

$$\min_{\omega} \sum_{i} \left(f(x_i) - y_i\right)^2 + \lambda \|\omega\|^2, \qquad (3)$$

where λ (λ ≥ 0) is a regularization parameter. The regularization term is added to avoid overfitting during optimization. In order to minimize the gap between the sample label predicted by the regression model and the real label, a weight coefficient is assigned to each sample to obtain a closed-form solution for the regression parameters. Therefore, the analytical solution ω can be deduced and represented as

$$\omega = \left(X^{T}X + \lambda I\right)^{-1} X^{T} y. \qquad (4)$$

Due to the time-consuming calculation of dense sampling in equation (3), cyclic shifting is used to construct the training samples, and the problem is transformed into the discrete Fourier domain. The properties of the circulant matrix avoid explicit matrix inversion and accelerate learning in the feature space. The circulant matrix can be diagonalized, which can be described as follows:

$$X = F\,\mathrm{diag}(\hat{x})\,F^{H}, \qquad (5)$$

where F is the discrete Fourier transform matrix and $\hat{x}$ is the Fourier transform of the base sample x.
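Substituting (5) into (4) reduces the linear ridge regression to elementwise operations in the Fourier domain. The sketch below shows this for a 1-D base sample, a standard consequence of the circulant structure; the λ value is assumed.

```python
import numpy as np

def train_linear_filter(x, y, lam=1e-4):
    # With X = F diag(x_hat) F^H, equation (4) becomes an elementwise division:
    # w_hat = conj(x_hat) * y_hat / (conj(x_hat) * x_hat + lambda).
    x_hat, y_hat = np.fft.fft(x), np.fft.fft(y)
    return np.conj(x_hat) * y_hat / (np.conj(x_hat) * x_hat + lam)

def correlate(w_hat, z):
    # Response of the learned filter over all cyclic shifts of a test patch z.
    return np.fft.ifft(w_hat * np.fft.fft(z)).real
```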

In order to simplify the calculation, the features obtained by ridge regression in linear space are mapped to a nonlinear space through a kernel function, and a dual problem is solved in the nonlinear space. Through the mapping function φ(x), the classifier can be denoted as follows:

$$f(x_i) = \omega^{T}\phi(x_i). \qquad (6)$$

Given $\omega = \sum_i \alpha_i \phi(x_i)$, the solution of ω can be transformed into the solution of α. Therefore, on the basis of the kernel matrix $K = \phi(X)\phi(X)^{T}$, we obtain the ridge regression solution under the kernel function, namely,

$$\alpha = (K + \lambda I)^{-1} y. \qquad (7)$$

Finally, we can obtain the response of all test samples in the Fourier domain:

$$f(z) = \hat{k}^{xz} \odot \hat{\alpha}, \qquad (8)$$

where $\hat{k}^{xz}$ is the Fourier transform of the kernel correlation between the template x and the test patch z, and ⊙ denotes elementwise multiplication.

The sample with the strongest response is selected as the object position in the current frame.
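Equations (7) and (8) admit a compact Fourier-domain implementation. The sketch below assumes a Gaussian kernel and 1-D feature vectors for brevity; the kernel choice, σ, and λ are assumptions, since the paper does not fix them here.

```python
import numpy as np

def gaussian_correlation(x, z, sigma=0.5):
    # k^{xz}: Gaussian kernel evaluated between z and all cyclic shifts of x,
    # computed efficiently with FFTs.
    c = np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(z)).real
    d = (x ** 2).sum() + (z ** 2).sum() - 2 * c
    return np.exp(-np.maximum(d, 0) / (sigma ** 2 * x.size))

def kcf_train(x, y, lam=1e-4, sigma=0.5):
    # Equation (7) in the Fourier domain: alpha_hat = y_hat / (k_hat^{xx} + lambda).
    k_xx = gaussian_correlation(x, x, sigma)
    return np.fft.fft(y) / (np.fft.fft(k_xx) + lam)

def kcf_detect(alpha_hat, x, z, sigma=0.5):
    # Equation (8): response = IFFT(k_hat^{xz} * alpha_hat); the argmax of the
    # response gives the new object position.
    k_xz = gaussian_correlation(x, z, sigma)
    response = np.fft.ifft(np.fft.fft(k_xz) * alpha_hat).real
    return int(response.argmax()), response
```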

The overall framework of the tracking algorithm is shown in Figure 5. First, the object is initialized in the first frame, its features are extracted, and the ridge regression model is trained to obtain the optimal filter parameters. Then, during tracking, features are extracted from the current frame and a convolution operation is performed with the filter template trained on the previous frame. We obtain a response map in which the location of the maximum correlation value is the object position.

In order to adapt to changes in the object scale, a scale-adaptive strategy is developed to ensure the stability of tracking. Taking the object position as the center, rectangular bounding boxes with different scales are selected as samples, and their HOG features are extracted respectively. Therefore, we obtain the sample responses $R_0$, $R_{+1}$, and $R_{-1}$ after running the classifier and take the strongest response after comparison:

$$R = \max\left(R_0, R_{+1}, R_{-1}\right). \qquad (9)$$

The rectangular bounding box corresponding to the sample with the strongest response gives the current object scale. In this way, the improved KCF performs multiscale adaptive selection with a small amount of extra computation, which is efficient and feasible.
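A minimal sketch of this three-sample scale search follows; the 0.95/1.05 scale factors stand in for the paper's $R_{-1}$/$R_{+1}$ samples and are assumptions, as is the `peak_response` helper that evaluates the trained filter on a resized candidate box.

```python
def select_scale(peak_response, frame, center, base_size, factors=(0.95, 1.0, 1.05)):
    # Evaluate the trained KCF filter on candidate boxes at three scales and
    # keep the one with the strongest peak response R (equation (9)).
    responses = []
    for f in factors:
        size = (int(base_size[0] * f), int(base_size[1] * f))
        responses.append((peak_response(frame, center, size), f))
    best_response, best_factor = max(responses)
    return best_factor, best_response
```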


3.3. Object Detection and Tracking for Spatiotemporal Fusion. Using deep learning for object detection extracts single-frame image features with high accuracy, can identify and classify unknown objects, and is highly robust. However, detection alone does not exploit the temporal relationship between consecutive frames in the video, which may lead to missed detections and a slow running speed. KCF tracking is achieved by extracting the features of consecutive frames to train filters via ridge regression, where the calculation is small and the processing speed is fast. However, it easily accumulates errors because of tracking drift and is easily affected by object occlusion and background interference. Therefore, the fusion of temporal and spatial information can make full use of the advantages of both deep learning and KCF, improve overall performance, and achieve more accurate and stable detection and tracking while retaining robustness and real-time performance.

In the process of information fusion, the spatial position of the object is determined by the deep learning-based object detection algorithm in the first frame; the position of the object in the first frame is then used as the input of the KCF tracking algorithm, which tracks the object in the following frames. After tracking a fixed number of frames, the YOLO-V3 detection mechanism is run again to ensure the accuracy of continuous detection and tracking. The number of tracking frames between two detection operations can be determined by experiment; generally, it can be set to 50 frames. In addition, the confidence of the detection results can be used as the basis for template refresh and recapture, as sketched below.
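The detect-then-track alternation can be written as a simple loop. This sketch assumes a `detector` callable and a KCF-style `tracker` with `init`/`update` methods; `fuse_boxes` stands in for the candidate-box selection strategy sketched after equation (10) below. All names are illustrative.

```python
def detect_and_track(frames, detector, tracker, fuse_boxes, redetect_every=50):
    # Frame 1: deep detection supplies the initial spatial position; KCF then
    # tracks frame by frame, and every `redetect_every` frames the detector is
    # re-run and its output fused with the tracker's candidate box.
    box = detector(frames[0])
    tracker.init(frames[0], box)
    for i, frame in enumerate(frames[1:], start=1):
        box = tracker.update(frame)
        if i % redetect_every == 0:
            detections = detector(frame)
            box = fuse_boxes(detections, box)  # spatiotemporal fusion step
            tracker.init(frame, box)           # refresh the template
        yield box
```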

After running the redetection mechanism, it is not clear whether the tracking candidate bounding box or the detection candidate bounding box obtained by the redetection module is better. Therefore, this paper designs a candidate box selection strategy. Firstly, the overlap ratio between the detection candidate bounding box $S_i$ and the tracking candidate bounding box $K_j$ is calculated to judge whether the detected and tracked objects are the same. In this paper, the intersection over union (IOU) is used as the criterion of overlap

ratio. The IOU of two candidate bounding boxes can be written as follows:

$$\mathrm{IOU} = \frac{\left|S_i \cap K_j\right|}{\left|S_i \cup K_j\right|}. \qquad (10)$$

If $\forall K_j,\ \mathrm{IOU}(S_i, K_j) < 0.4$, $S_i$ is regarded as a new object and output to initialize the tracking algorithm. If $\exists K_j,\ \mathrm{IOU}(S_i, K_j) \geq 0.4$, the detection bounding box $S_i$ and the tracking bounding box $K_j$ are considered to have detected the same object; then the confidence level $\mathrm{conf}(S_i)$ of the detection bounding box is compared with the normalized response $\mathrm{conf}(K_j)$ of the tracking bounding box. Finally, the bounding box with the higher confidence is taken as the output of the system.
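A direct transcription of this selection rule into code might look as follows; boxes are assumed to be (x1, y1, x2, y2) tuples paired with a confidence score, and the 0.4 threshold is the one given above.

```python
def iou(a, b):
    # Equation (10): |A ∩ B| / |A ∪ B| for boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def select_candidates(detections, tracks, thr=0.4):
    # detections, tracks: lists of (box, confidence) pairs.
    outputs, new_objects = [], []
    for s_box, s_conf in detections:
        overlaps = [(iou(s_box, k_box), k_box, k_conf) for k_box, k_conf in tracks]
        if not overlaps or max(o[0] for o in overlaps) < thr:
            new_objects.append(s_box)  # no tracker overlaps S_i: treat as new object
        else:
            _, k_box, k_conf = max(overlaps, key=lambda o: o[0])
            outputs.append(s_box if s_conf > k_conf else k_box)  # higher confidence wins
    return outputs, new_objects
```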

4. Experimental Results and Analysis

4.1. Dataset and Verification Platform. In order to evaluate the accuracy and robustness of the detection and tracking algorithm in the video surveillance task, this experiment constructs a surveillance dataset with 321,550 images. To facilitate performance analysis, all data are labeled frame by frame with scale and position and classified according to the interference state.

The improved detection and tracking model is divided into three parts: object detection based on deep spatial information, KCF tracking based on temporal information, and fusion of spatiotemporal information. The parameters of each part are consistent with the original models. During offline training, all convolution layers are updated; during online updating, the parameters of the shallow convolution layers are fixed and the last two convolution layers are fine-tuned according to the test data. During training, the YOLO-V3 model trained on Pascal VOC2007 [24] is used as the initial weight to fine-tune the network, where the learning rate is set to 0.001 and the weight decay to 0.0005; 30,000 training iterations were conducted on an NVIDIA GeForce GTX 1080Ti. The KCF module uses the peak-to-sidelobe ratio to select the optimal tracking point, and the threshold of the normalized response is set to 0.65. If the regression response score is less than 0.65, the tracking is considered to have failed.

Figure 5: The overall framework of the tracking algorithm (HOG features are extracted from the first frame to train the tracking filter; for the T-th frame, HOG features are convolved with the KCF model to produce a filter response map, whose maximum response value gives the output, followed by training and updating).


When tracking fails, the improved YOLO-V3 detection network is used to recapture the optimal object.
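For reference, a common way to compute a peak-to-sidelobe ratio over a correlation response map is sketched below; the exclusion-window size is an assumption, and the paper's normalized-response threshold of 0.65 is a separate criterion applied to the regression score.

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=5):
    # PSR: (peak - mean of sidelobe) / std of sidelobe, where the sidelobe is
    # the response map with a small window around the peak masked out.
    peak = response.max()
    py, px = np.unravel_index(response.argmax(), response.shape)
    mask = np.ones_like(response, dtype=bool)
    mask[max(0, py - exclude):py + exclude + 1,
         max(0, px - exclude):px + exclude + 1] = False
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-9)
```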

In this paper, eight representative subsets from video surveillance are selected for verification; the characteristics of the sequences are described in Table 1. For example, video 1 contains a similar background, occlusion, and fast motion; video 2 contains a similar background, fast motion, and rotation; videos 3 and 4 contain occlusion, rotation, and attitude change; and video 5 contains fast motion, illumination change, and a similar background. The simulation platform is an AMD Ryzen 5 3500U host at 3.1 GHz with 8 GB RAM.

Center error (CE) and overlap rate (OR) are used to compare and analyze the experimental results [19]. The former is the relative number of frames whose center position error is less than a certain threshold, and the latter is the percentage of frames whose bounding-box overlap rate exceeds a threshold. In this paper, a position error of 20 pixels and an overlap rate of 0.6 are selected as the thresholds of tracking success. Because results differ greatly across thresholds, precision plots and success plots are used to quantitatively analyze the performance of the compared algorithms.
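These two measures reduce to simple per-frame statistics. A short sketch, assuming per-frame center errors (in pixels) and overlap rates have already been computed:

```python
import numpy as np

def precision_and_success(center_errors, overlaps, ce_thr=20.0, or_thr=0.6):
    # Precision: fraction of frames with center location error below 20 px.
    # Success: fraction of frames with bounding-box overlap above 0.6.
    precision = float(np.mean(np.asarray(center_errors) < ce_thr))
    success = float(np.mean(np.asarray(overlaps) > or_thr))
    return precision, success
```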

4.2. Ablation Analysis. Our proposed method is an improved tracking method based on KCF that achieves scale adaptation. To illustrate its effectiveness, the comparison experiment selects tracking methods with adaptive scale capabilities, namely KCF, SAMF, DSST, CFNet [23], SiamRPN [24], and DKCF [25], where precision refers to the error between the tracking point and the labeled point. KCF only updates the position (x, y) of the object while the object size remains unchanged, so its adaptability to scale change is relatively poor. SAMF is also a modified algorithm based on KCF in which the object features add color name (CN) features, meaning that HOG and CN features are combined; in addition, a scale pool {0.985, 0.99, 0.995, 1, 1.005, 1.01, 1.015} is used, and the optimal scale is cyclically selected at the expense of tracking speed. DSST uses two mutually independent filters for scale calculation and object positioning, where 17 scale change factors and 33 interpolated scale change factors are established for scale evaluation and object positioning. SiamFC is an object tracking algorithm based on a fully convolutional Siamese network, where multiscale object fusion is implemented through a pyramid strategy to improve tracking accuracy. Our proposed algorithm is a detect-before-track model that uses deep neural networks in template updating and scale adaptation. The results of object detection and tracking under different environments are shown in Table 2, and the precision and success plots for the 8 video sequences are shown in Figure 6. It can be seen from Table 2 and Figure 6 that, compared with video 1, the tracking success rates of

videos 2, 3, 4, and 5 decline to different degrees. Occlusion, scale change, motion blur, and illumination all affect detection and tracking, of which occlusion and illumination changes have the greatest impact, while different degrees of motion blur have different effects. When the object overlap rate threshold is set to 0.6, the average detection and tracking accuracy is 76.17%, and the average speed reaches 18 FPS. The slower speed on video 2 is caused by the appearance of new objects in the field of view; the object scale in video 4 is larger, so detection and tracking take longer.

Video 2 (object occlusion) and video 5 (illumination changes) are selected for comparative experiments, in which our proposed tracking algorithm is compared with a single tracking algorithm and a single detection algorithm. Video 2 exhibits object occlusion; the experimental results are shown in Table 2 and Figure 6. In terms of center error and overlap rate, the fusion algorithm is clearly better. The deep learning detection algorithm may fail to detect objects at too small a scale, resulting in a low recall rate, while in long-term detection and tracking the correlation filter tracker accumulates errors, resulting in poor accuracy; in particular, tracking drift easily occurs for occluded objects. These factors keep the center error and overlap rate of any single detection or tracking algorithm low. The fusion algorithm ensures a high recall rate through KCF tracking and corrects the cumulative error by YOLO-V3 detection. After the object is occluded, it can still recapture the object and keep tracking, which solves the object-loss problem in object detection and tracking.

There is illumination change in video 5; the experimental results are shown in Table 3 and Figure 7. Under illumination change it is difficult to distinguish light and shade between the object edge and the background, so the object bounding box cannot be reliably determined for detection and tracking. Even if the object position can be detected and tracked, the judgment of the object scale is not accurate.

Table 1: Characteristics for partial sequences.

Sequence             Characteristics
Benchmark video 1    Similarity background, occlusion, fast motion
Benchmark video 2    Similarity background, fast motion, rotation
Benchmark video 3    Occlusion, rotation
Benchmark video 4    Fast motion, attitude change
Benchmark video 5    Fast motion, illumination, similarity background
Benchmark video 6    Occlusion, similarity background, blurry
Benchmark video 7    Occlusion, scale change, angle of view
Benchmark video 8    Occlusion, rotation, illumination


Therefore, the center-position accuracy of the KCF tracking algorithm is higher, but its overlap rate is lower. The YOLO-V3 detection algorithm is strongly robust but suffers from missed detections. The simulation results thus show that our proposed fusion algorithm has better detection and tracking performance in complex environments.

4.3. Comparative Experiment and Analysis. In this paper, we select different detection and tracking algorithms to conduct comparative experiments on single-object videos, where the widely used SSD and YOLO-V3 algorithms are selected in the spatial dimension and the classic single-object tracking algorithms DSST, KCF, and SAMF are selected in the temporal dimension. The experiment is divided into two parts: the first compares a single spatial detection algorithm or temporal tracking algorithm with our proposed algorithm; the second compares different detection and tracking algorithm combinations based on the fusion strategy.

Table 4 shows the comparison results for the single algorithms. Comparing the detection algorithms separately, the detection accuracy of YOLO-V3 is higher. Overall, the success rate of any single algorithm is much lower than that of the YOLO-V3 + KCF fusion algorithm. This is because the detection algorithm is affected by the complex background, resulting in a large number of missed detections, while the temporal algorithm is affected by motion blur, whose accumulated error causes tracking drift, making the IOU between the tracking result and the ground truth less than 0.6.

Table 5 compares the fusion effects of different algorithm combinations. It can be seen from Table 5 that the YOLO-V3 + KCF algorithm performs best. Because the KCF algorithm in Table 4 is the best of the tracking algorithms, the overall effect of YOLO-V3 + KCF is also better than SSD + DSST and SSD + SAMF. Because the tracking algorithm uses temporal information to eliminate the missed detections of the detection algorithm, and the detection algorithm corrects the drift of the tracking result by accurately detecting the single object, the success rate of each fusion algorithm exceeds that of the corresponding single algorithm in Table 4.

Figure 8 shows the qualitative results of the different comparison algorithms, and Table 6 gives a quantitative comparison on the different sequences. In Figure 8(a), there are factors such as object scale changes, illumination changes, and background interference. In the whole tracking process, only DKCF, SiamRPN, and our proposed algorithm obtain good tracking results.

Table 2: The quantitative analysis for testing sequences.

Indexes       Video 1   Video 2   Video 3   Video 4   Video 5   Video 6   Video 7   Video 8
CLE (pixel)   10.6      29.6      31.1      23.8      18.9      21.0      17.5      8.7
OR            0.88      0.72      0.62      0.81      0.68      0.71      0.58      0.73
FPS           17.4      12.0      20.9      16.8      14.1      19.1      18.5      17.2

Figure 6: The tracking results for the different testing sequences (videos 1–8): (a) precision plot (precision vs. location error threshold); (b) success plot (success rate vs. overlap rate threshold).

Table 3: Detection and tracking results for different modules.

Index (average)     YOLO-V3   KCF     Fusion
CLE (pixel)         16.9      14.6    10.5
OR                  0.816     0.752   0.861
Frame rate (FPS)    8.7       46.0    16.1


However, due to the continuous change of the object scale, background interference introduced into the KCF tracking template gradually accumulates, and finally there is a large tracking deviation (e.g., the 640th frame). Our proposed algorithm can automatically adjust the tracking bounding box size according to the object scale change, thereby reducing the background interference, so it can always estimate both the location and the scale of the object. The object in video 7 undergoes dramatic changes in illumination and scale (frames 65, 110, and 351 in Figure 8(b)); in the whole tracking process, only our proposed algorithm and SiamRPN complete the tracking of the entire video, while the other methods cannot adapt to drastic changes in illumination and scale. The object in video 6 has certain scale and posture changes, where KCF, SAMF, DSST, CFNet, SiamRPN, and our proposed algorithm all achieve good tracking performance.

Figure 7: Ablation analysis for the different modules in video 5 (YOLO-V3 + KCF, YOLO-V3, and the KCF model): (a) precision plot; (b) success plot.

Table 4: Comparison results for the single algorithms.

Modules             Object detection      Object tracking               Fusion
Model               SSD      YOLO-V3      DSST     SAMF     KCF        YOLO-V3 + KCF
CLE (pixel)         21.1     16.9         17.27    13.28    14.6       10.1
OR                  0.524    0.816        0.711    0.625    0.752      0.841
Frame rate (FPS)    6.5      8.7          34.2     26.1     46.0       16.1

Table 5: Tracking performance of different module combinations.

Index (average)     YOLO-V3 + KCF   YOLO-V3 + DSST   YOLO-V3 + SAMF   SSD + KCF   SSD + DSST   SSD + SAMF
CLE (pixel)         10.5            13.6             16.5             15.2        18.3         17.5
OR                  0.861           0.782            0.771            0.837       0.825        0.776
Frame rate (FPS)    16.1            13.2             9.1              12.2        10.9         8.7


Among them, however, our method achieves the best overall OR and CLE.

5. Conclusion

In a complex surveillance video, object detection and tracking usually suffer from various environmental interferences, especially scale changes, occlusion, illumination changes, and motion blur. This paper proposes an object detection and tracking model based on spatiotemporal information fusion, which uses deep learning to detect and extract spatial information, improving detection accuracy and avoiding object position drift; an improved KCF tracker then exploits temporal information so as to avoid missed detections; finally, the spatiotemporal information fusion strategy makes the detection information and tracking information complement each other.

The results show that our proposed algorithm can efficiently and continuously detect and track objects in different complex scenes. To a certain extent, it can cope with the influence of the abovementioned environmental interference factors with both robustness and stable performance. However, the detection and tracking effect for objects at too small a scale is slightly worse, so the next step will be to improve on this.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 8: Qualitative comparison of different tracking algorithms (Proposed, SAMF, SiamRPN, KCF, CFNet, DKCF, DSST) in different scenarios: (a) video 8; (b) video 7; (c) video 6.

Table 6: Quantitative comparison for different sequences.

            Success rate                                        Center location error (pixel)
Sequences   KCF    SAMF   DSST   DKCF   CFNet  SiamRPN  Our     KCF    SAMF   DSST   DKCF   CFNet  SiamRPN  Our
Video 8     0.78   0.56   0.62   0.70   0.75   0.79     0.86    25.3   42.3   24.4   24.9   17.0   14.4     9.1
Video 7     0.60   0.48   0.65   0.61   0.78   0.81     0.81    21.7   28.6   9.2    37.4   21.2   10.8     7.5
Video 6     0.79   0.67   0.58   0.82   0.67   0.79     0.79    12.7   7.1    6.3    12.6   16.8   15.6     9.3
Video 5     0.61   0.52   0.54   0.77   0.47   0.40     0.42    14.6   25.0   11.4   17.1   15.2   2.9      8.1
Video 4     0.73   0.68   0.71   0.75   0.75   0.88     0.82    27.3   22.3   14.4   24.9   17.0   14.9     10.1
Video 3     0.68   0.70   0.72   0.76   0.79   0.83     0.87    31.7   28.3   9.2    37.4   21.2   10.8     8.4
Video 2     0.59   0.57   0.62   0.65   0.75   0.81     0.85    12.5   9.1    6.3    12.6   16.8   15.6     8.9
Video 1     0.68   0.62   0.66   0.72   0.61   0.79     0.79    74.6   23.0   21.4   27.1   15.2   19.9     18.2


References

[1] C. Wu, H. Sun, H. Wang et al., "Online multi-object tracking via combining discriminative correlation filters with making decision," IEEE Access, vol. 6, pp. 43499–43512, 2018.
[2] Y. Qi, C. Wu, D. Chen, and Y. Lu, "Superpixel tracking based on sparse representation," Journal of Electronics and Information Technology, vol. 37, no. 3, pp. 529–535, 2015.
[3] G. Yuan and M. Xue, "Visual tracking based on sparse dense structure representation and online robust dictionary learning," Journal of Electronics & Information Technology, vol. 37, no. 3, pp. 536–542, 2015.
[4] H. Luo, B.-K. Zhong, and F.-S. Kong, "Tracking using weighted block compressed sensing and location prediction," Journal of Electronics & Information Technology, vol. 37, no. 5, pp. 1160–1166, 2015.
[5] Z.-Q. Hou, A.-Q. Huang, W.-S. Yu, and X. Liu, "Visual object tracking method based on local patch model and model update," Journal of Electronics & Information Technology, vol. 37, no. 6, pp. 1357–1364, 2015.
[6] M. Xue, H. Zhu, and G.-L. Yuan, "Robust visual tracking based on online discrimination dictionary learning," Journal of Electronics & Information Technology, vol. 37, no. 7, pp. 1654–1659, 2015.
[7] L. Matthews, T. Ishikawa, S. Baker et al., "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 810–815, 2004.
[8] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1-3, pp. 125–141, 2008.
[9] S. Hare, S. Golodetz, A. Saffari et al., "Struck: structured output tracking with kernels," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 2096–2109, 2016.
[10] Z. Kalal, K. Mikolajczyk, and J. Matas, "Tracking-learning-detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 1, pp. 1409–1422, 2010.
[11] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, "Visual object tracking using adaptive correlation filters," in Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2544–2550, San Francisco, CA, USA, June 2010.
[12] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of the 12th European Conference on Computer Vision – Volume Part IV – ECCV 2012, vol. 75, no. 1, pp. 702–715, Florence, Italy, October 2012.
[13] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2015.
[14] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of the British Machine Vision Conference, pp. 590–604, BMVA Press, Nottingham, England, September 2014.
[15] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in Proceedings of the 2014 European Conference on Computer Vision (ECCV), pp. 254–265, Springer, Zurich, Switzerland, September 2014.
[16] M. Danelljan, G. Hager, F. S. Khan, and M. Felsberg, "Learning spatially regularized correlation filters for visual tracking," in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4310–4318, IEEE, Santiago, Chile, December 2015.
[17] L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, and P. H. S. Torr, "Staple: complementary learners for real-time tracking," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1401–1409, Las Vegas, NV, USA, June 2016.
[18] F. Li, C. Tian, W. Zuo, L. Zhang, and M.-H. Yang, "Learning spatial-temporal regularized correlation filters for visual tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4904–4913, Salt Lake City, UT, USA, June 2018.
[19] M. Mueller, N. Smith, and B. Ghanem, "Context-aware correlation filter tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1396–1404, Honolulu, HI, USA, July 2017.
[20] W. Zhong, H. Lu, and M.-H. Yang, "Robust object tracking via sparse collaborative appearance model," IEEE Transactions on Image Processing, vol. 23, no. 5, pp. 2356–2368, 2014.
[21] Y.-Z. Xue and T. Wang, "Object tracking based on cost-sensitive Adaboost algorithm," Chinese Journal of Graphic Arts, vol. 21, no. 5, pp. 544–555, 2016.
[22] Q. Wang, J. Gao, J. Xing, M. Zhang, and W. Hu, "DCFNet: discriminant correlation filters network for visual tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017.
[23] J. Valmadre, L. Bertinetto, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, "End-to-end representation learning for correlation filter based tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5000–5008, Honolulu, HI, USA, July 2017.
[24] B. Li, J. Yan, W. Wu, Z. Zhu, and X. Hu, "High performance tracking with Siamese region proposal network," in Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8971–8980, Salt Lake City, UT, USA, June 2018.
[25] B. Uzkent and Y. W. Seo, "EnKCF: ensemble of kernelized correlation filters for high-speed object tracking," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, pp. 77–89, Lake Tahoe, NV, USA, March 2018.


the changes of the object and background and ensures thatthe observation model does not degenerate

ere are many interference factors in the video trackingtask and it faces a series of difficulties in practical trackingapplications such as appearance change illumination var-iation partial occlusion and complex background In objectappearance changes it refers to the change of the trackedobjectrsquos appearance or the shooting angle of the cameraduring the movement as shown in Figure 2(a) e illu-mination variation refers to the change of video imaginggray due to changes in the light source or the surroundingenvironment as shown in Figure 2(b) Scale changes refer tothe change of the pixel size of the object in the video due tothe movement of the object or the change of the distance asshown in Figure 2(c) Partial occlusion or object losing refersto an interference phenomenon where the object is affectedby the background or moved out of the field of viewresulting in an incomplete appearance or completely out ofthe field of view as shown in Figure 2(d) e complexbackground refers to a large number of interference factors(such as a large number of similar objects) in the back-ground which causes interference to the object observationmodel In addition there are other interference factors suchas fast movement small objects and blurring during thetracking process ese interference factors limit the per-formance of the tracking model to varying degrees resultingin a decrease in the overall accuracy With the developmentof object tracking technology although some problems havebeen solved such as the use of HOG features to effectivelysolve the problem of illumination changes in tracking tasksthere are still many problems need to be solved in the actualapplication process In this paper we mainly focus onsolving the problem of partial occlusion and object recapturein the process of object tracking

3 Our Proposed Tracking Algorithms

Object detection and tracking based on spatiotemporalinformation fusion is mainly divided into three parts objectdetection based on deep spatial information KCF trackingbased on temporal information and fusion of spatiotem-poral information Firstly the You Only Look Once (YOLO-V3) detector is used to detect the object And then the KCFtracking model is used to track the object in a complexsurveillance video [23] After tracking a certain number offrames the YOLO-V3 detection mechanism is adoptedagain to compare the confidence of the old tracking

bounding box and the new detection bounding boxrough the spatiotemporal information fusion strategy theappropriate bounding box is obtained to continue trackingIf a new object is detected in the field of view the new objectis tracked at the same time e overall detection andtracking system is shown in Figure 3

31 Object Detection Based on Deep Spatial InformationIn this paper we use the framework of the YOLO-V3 deepmodel to realize the object detection and we also redesignthe bounding box selective search method to improve thedetection accuracy of the object spatial information Firstlythe input image features are fully extracted by the basicnetwork through iterative convolution operation and thenfurther feature extraction and analysis are carried outthrough the additional network e object position offset ispredicted and classified by using a convolution predictorFinally the redundancy is removed by the nonmaximumsuppression method e basic network uses the improvedVGG structure as the feature extraction network Twoconvolution layers are used at the end of the network toreplace the two fully connected layers of the original VGGnetwork and eight additional networks are added to furtherimprove the feature extraction ability It is widely knownthat different depth feature maps have different receptivefields and different responses to different scale objects enetwork structure is shown in Figure 4

e detection of multiscale objects is divided into 3 stepsdefault boxes with the different aspect ratio and same areaare generated on different scale feature maps after training alarge number of samples the convolution predictor uses theabstract features in the default box as an input to predict theoffset of the default bounding box nonmaximum sup-pression is used to remove redundant bounding boxes withlow confidence

e default bounding box generation method is im-proved as follows Firstly assuming that it is necessary tomake predictions on a total of m feature maps the area(scale) sk of the default bounding box on the first k minus thfeature map can be written as follows

sk smin +smax minus smin( 1113857

(m minus 1)(k minus 1) k isin [1 m] (1)

where m 6 the minimum area is 02 and the maximumarea is 095 In this paper the K-means clustering algorithmis used to process the aspect ratio of all suspected objects in

Initial position Motion model Feature model Observation model

Online updating

Object location

Figure 1 Basic framework of the single-object tracking algorithm

Mathematical Problems in Engineering 3

the dataset and 5 cluster centers are obtained ereforethe new aspect ratio is denoted as ar isin 1 18 25 38 5 which provides a better initial bounding box for objectdetection

YOLO-V3 convolutional network is used to obtain thecoordinate offset from the fixed default bounding box to theactual benchmark value and the category score and obtainthe loss function through the normalization and weighting

Initial bounding box

YOLO-V3 KCF trackingTemporal information

Deep learningSpatial information

Fusion of spatiotemporalinformation

Adaptive confidence discrimination

Recapture mechanism

Figure 3 Overall detection and tracking framework for our model

3 times 3

2 times 2

3 times 3

3 times 3

Conv1

3 times 3

2 times 2

2 times 2

2 times 2

3 times 3

3 times 3

3 times 33 times 3

3 times 3

3 times 3

2 times 2

2 times 2

Passthrough layer

3 times 3

3 times 3

3 times 3

3 times 3

3 times 3

3 times 3

3 times 3

3 times 3

3 times 3

3 times 3

3 times 3

Pool2

Conv2Pool1Conv4Pool3 Conv6 Conv9Pool5

Conv3 Conv5 Conv7 Conv8Pool4

Pred

ictio

n m

odul

e

NM

S

Figure 4 e network structure for object detection

(a) (b)

(c) (d)

Figure 2 Samples for different interference factors (a) Scale change (b) Illumination variation (c) Appearance change (d) Partialocclusion

4 Mathematical Problems in Engineering

of the category score and the coordinate offset ereforethe loss function can be described as follows

L xkij c l g1113872 1113873

1N

Lconf xkij c1113872 1113873 + αLloc x

kij l g1113872 11138731113960 1113961 (2)

where xkij 1 means that the candidate bounding box i

matches the object real bounding box j with category p

successfully and otherwise xkij 0 means the match fails N

is the number of candidate bounding box that can bematched with the true value Lloc is the position loss functionsmooth L1 loss and α is set to 1e network parameters canbe optimized according to the result of the loss function

32 KCF Tracking Based on Temporal Information KCFalgorithm is a classical discriminative-based object trackingalgorithm which has good performance in tracking speed andtracking accuracy In the tracking process the objectbounding box of the KCF algorithm has been set and the sizeof the object scale has not changed from beginning to endHowever the object size often changes in the tracking videosequence which will lead to the drift of the bounding box inthe tracking process of the tracker even resulting in trackingfailure In addition the KCF algorithm cannot deal with theocclusion of the object in the tracking process which will leadto the feature extraction error when training the filter modelWhen the object moves rapidly some object features cannotbe extracted because of the fixed size of the searching boxwhere the quality of the detection model will be reduced andthe tracking failure will be caused when updating the modelIn order to solve the problem of tracking failure caused by theKCF algorithm in the above situations some scholars im-proved the KCF algorithm and proposed some novel yeteffective object tracking algorithms based on deep learningdetection and a large number of experiment results show thatthe improved algorithm has better accuracy and robustnessthan the original KCF algorithm

As for complex monitoring applications the real-timeperformance of object tracking is very important We selectKCF as the basic tracking algorithm which has a greateradvantage in speed In addition considering the charac-teristics of large changes in object scale a multiscale adaptivemodule is added in KCF HOG features are adopted to trainthe classifier and transform it into a ridge regression modelso as to establish themapping relationship between the inputsample variable x and the output response y e ridgeregression objective function can be rewritten as follows

minω

1113944i

f xi( 1113857 minus yi( 11138572

+ λω2 (3)

where λ(λge 0) is a regularization parameter e regulari-zation term is added to avoid the occurrence of overfitting inoptimization In order to minimize the gap between thesample label predicted by the regression model and the reallabel a weight coefficient is assigned to each sample to obtaina closed solution formula for the regression parameterserefore the analytical solution ω can be deduced andrepresented as

ω XTX + λI1113872 1113873

minus1X

Ty (4)

Due to the time-consuming calculation of dense sam-pling in equation (3) cyclic shifting is used to constructtraining samples and the problem domain is transformedinto the discrete Fourier domain e characteristics of thecirculant matrix can avoid the process of matrix inversionand accelerate feature space learning e circulant matrixcan be diagonalized and this can be described as follows

X Fdiag(1113954x)FH

(5)

In order to simplify the calculation the features obtainedby ridge regression with linear space are mapped to thenonlinear space through the kernel function and a dualproblem is solved in the nonlinear spacerough themappingfunction ϕ(x) the classifier can be denoted as follows

f xi( 1113857 ωTϕ xi( 1113857 (6)

Given ω 1113936 iaiϕ(xi) the solution of ω can be trans-formed into the solution of α erefore on the basis of thekernel function K ϕ(X)ϕ(X)T we can get the solutionbased on the ridge regression under the kernel functionnamely

α (K + λI)minus1

y (7)

Finally we can get the response results of all test samplesin the Fourier domain

f(z) 1113954kxzΘ1113954α (8)

e sample with the strongest response is selected as theobject position in the current frame

e overall framework of the tracking algorithm isshown in Figure 5 First the object is initialized in the firstframe and the features of the object are extracted and thenthe ridge regression model is trained to obtain the optimalfilter parameters then in the process of object tracking thefeature is extracted on the current frame and convolutionoperation is performed with the filter template trained in theprevious frame We can get the response map where themaximum correlation value is the object position

In order to adapt the change of the object scale a scaleadaptive strategy is developed to ensure the stability oftracking Taking the object position as the center rectan-gular bounding boxes with different scales are selected assamples and their HOG features are extracted respectivelyerefore we can get the respective sample responsesR0 R+1 andRminus1 after tracking the classifier and obtain thestrongest response after comparison

R max R0 R+1 Rminus1( 1113857 (9)

e rectangular bounding box corresponding to thesample with the strongest response is the current objectscale where the improved KCF can be used for multiscaleadaptation selection and the amount of calculation is smalland efficient and feasible

Mathematical Problems in Engineering 5

33 Object Detection and Tracking for Spatiotemporal FusionAs we all know using deep learning for object detection toextract single-frame image features has high accuracy canidentify and classify unknown objects and has high ro-bustness However the object detection does not combinethe temporal information relationship between the con-secutive frame in the video which may lead to the misseddetection and slow running speed KCF tracking isachieved by extracting the characteristics of continuousframe images to train filters in ridge regression where thecalculation is small and the processing speed is also fastHowever it is easy to accumulate errors because oftracking drift and be easily affected by object occlusionand background interference erefore the fusion oftemporal information and spatial information can makefull use of the advantages of deep learning and KCFimprove overall performance and achieve more accurateand stable detection and tracking on the basis of ro-bustness and real-time performance

In the process of information fusion the spatial positioninformation of the object is determined by the deeplearning-based object detection algorithm in the first frameand then the position of the object in the first frame is used asthe input of the KCF tracking algorithm and the trackingalgorithm is used to track the object in the following framesAfter tracking a fixed number of frames the detectionmechanism is run to ensure the accuracy of continuousdetection and tracking through the YOLO-V3 detectionalgorithm e number of tracking frames between the twodetection operations can be determined by experimentGenerally it can be set to 50 frames In addition we can alsouse the confidence of the detection results as the basis oftemplate refresh and recapture

After running the redetection mechanism it is not surewhich one is better to track candidate bounding boxes ordetect candidate bounding boxes obtained by the redetectionmodule erefore this paper designs a candidate frameselection strategy Firstly the overlap ratio between detec-tion candidate bounding box Si and tracking candidatebounding box Kj is calculated to judge whether the detectedand tracked objects are the same In this paper the inter-section over union (IOU) is used as the criterion of overlap

ratio e IOU of two candidate bounding boxes can bewritten as follows

IOU Si capKj

Si capKj (10)

If forallKj IOU(Si Kj)lt 04 Si will be regarded as a newobject and output to achieve the initialization of the trackingalgorithm If existKj IOU(Si Kj)lt 04 it is considered that thedetection bounding box Si and the tracking bounding boxKj

have detected the same object then the confidence levelconf(Si) of the bounding box of the detection algorithm iscompared with the normalized response conf(Kj) of thebounding box of the tracking algorithm Finally thebounding box with higher confidence is taken as the outputof the system

4 Experimental Results and Analysis

41 Dataset and Verification Platform In order to improvethe accuracy and robustness of the detection and trackingalgorithm in the video surveillance task this experimentconstructs a surveillance dataset with 321550 images Tofacilitate performance analysis all data are labeled frame byframe in scale and position and classified according to theinterference state

e improved detection and tracking model is dividedinto three parts object detection based on deep spatial in-formation KCF tracking based on temporal informationand fusion of spatiotemporal informatione parameters ofeach part are consistent with the original model Duringoffline training all convolution layers will be updated Afteronline updating the parameters of the shallow convolutionlayer are fixed and the last two convolution layers will befine-tuned according to the test data During the trainingthe YOLO-V3 model trained by Pascal VOC2007 [24] isused as the initial weight parameter to fine-tune the networkwhere the learning rate is set to 0001 and the weight at-tenuation was 00005 30000 iteration times in training wereconducted on NVIDIA Geforce GTX 1080TI e KCFmodule uses the peak-side-lobe ratio to select the optimaltracking point and the threshold of normalized response isset to 065 If the regression response score is less than 065 it

Training and updating

Extracting hogfeatures

Training trackingfilter

KCF trackingmodel

Convolution Filter responsemap

Maximumresponse value Output

Feature extraction1st frame

T-th frame

Extracting hogfeatures

Figure 5 e overall framework of the tracking algorithm

6 Mathematical Problems in Engineering

is considered that the tracking is failed and the improvedYOLO-V3 detection network is used to recapture the op-timal object

In this paper eight representative subsets from videosurveillance are selected for verification where characteristicfor partial sequences is described in Table 1 For examplevideo 1 shows the similarity background occlusion and fastmotion video 2 shows the similarity background fastmotion and rotation video 3 and 4 show the occlusionrotation and attitude change and video 5 shows the fastmotion illumination and similarity background esimulation platform is AMD Ryzen 5 3500U host with31GHz and 8GB RAM

In this paper center error (CE) and overlap rate (OR) areused to compare and analyze the experimental results [19]e former is the relative number of frames whose centerposition error is less than a certain threshold and the latter isthe percentage of frames whose overlap rate of the objectbounding box exceeds the threshold In this paper theposition error of 20 and the overlap rate of 06 are selected asthe threshold of tracking success Because of the differentthresholds there are great differences in quantitative anal-ysis erefore precision plot and success plot are used toquantitatively analyze the performance of the comparisonalgorithms

42AblationAnalysis Our proposed method in this paper isan improved tracking method based on KCF to achieve theeffect of scale adaptation In order to illustrate the effec-tiveness the comparison experiment in this paper selectstracking methods with adaptive scale capabilities for com-parison such as KCF SAMF DSST CFNet [23] SiamRPN[24] and DKCF [25] where precision refers to the errorbetween the tracking point and the labeled point It can beknown that the result of KCF only updates the position of theobject (x y) and the size of the object remains unchanged sothe adaptability to the change of the object scale is relativelypoor SAMF is also a modified algorithm on the basis ofKCF and the object feature adds color features (color nameCN) which means that HOG features and CN features arecombined In addition multiscales 1 0985 099 0995 1005101 101 1015 are added to the scale pooling and theoptimal scale is cyclically selected at the expense of trackingspeed DSST uses two mutually independent filters for scalecalculation and object positioning where 17 scale changefactors and 33 interpolated scale change factors are estab-lished for scale evaluation and object positioning SiamFC isan object tracking algorithm based on a fully convolutionSiamese network where multiscale object fusion is imple-mented through a pyramid strategy to improve trackingaccuracy our proposed algorithm is a detect-before-trackmodel that uses deep neural networks in template updatingand scale adaptation e results of object detection andtracking under the influence of different environments areshown in Table 2 and the precision plot and success plot ofdetection and tracking in 8 different video sequences areshown in Figure 6 It can be seen from Table 2 and Figure 6that compared with video 1 the tracking success rates of

videos 2 3 4 and 5 have different degrees of decline It canbe seen that occlusion scale change motion blur and il-lumination have an impact on the detection and trackingeffect of which occlusion and illumination changes have agreater impact Different degrees of motion blur have dif-ferent effects on detection and tracking When the objectoverlap rate threshold is set to 06 the average detection andtracking accuracy is 7617 and the average speed can reach18 FPS e slower speed of video 2 is caused by the ap-pearance of new objects in the field of view e object scalein video 4 is larger so the detection and tracking time islonger

Video 2 under the condition of object occlusion andvideo 5 under the condition of illumination changes areselected for comparative experiments Our proposedtracking algorithm is compared with a single tracking al-gorithm and detection algorithm Video 2 has the phe-nomenon of object occlusion e experimental results areshown in Table 2 and Figure 6 In terms of center error andoverlap rate the fusion algorithm is obviously better Deeplearning detection algorithm may not be able to detect theobject with too small scale resulting in low recall rate In thelong-term detection and tracking the correlation filteringtracking algorithm will accumulate errors resulting in pooraccuracy Especially for the occluded object the trackingdrift phenomenon is easy to occur ese reasons make thecenter error and overlap rate of the single detection ortracking algorithm not high e fusion algorithm ensures ahigh recall rate through KCF tracking and corrects thecumulative error by YOLO-V3 detection After the object isoccluded it can still recapture the object again and keeptracking which solves the object lost problem in objectdetection and tracking

ere is illumination change in video 5 e experi-mental results are shown in Table 3 and Figure 7 Due tothe influence of illumination change it is difficult todistinguish the illumination and shade between the objectedge and the background which makes the objectbounding box cannot be determined for detection andtracking Even if the object position can be detected and

Table 1 Characteristic for partial sequences

Sequences CharacteristicBenchmarkvideo 1 Similarity background occlusion fast motion

Benchmarkvideo 2 Similarity background fast motion rotation

Benchmarkvideo 3 Occlusion rotation

Benchmarkvideo 4 Fast motion attitude change

Benchmarkvideo 5 Fast motion illumination similarity background

Benchmarkvideo 6 Occlusion similarity background blurry

Benchmarkvideo 7 Occlusion scale change angle of view

Benchmarkvideo 8 Occlusion rotation illumination

Mathematical Problems in Engineering 7

tracked the judgment of the object scale is not accurateerefore the accuracy of center position error is higherbut the overlap rate is lower in the KCF tracking algo-rithm YOLO-V3 detection algorithm has strong ro-bustness but it has the phenomenon of missing detectionerefore simulation results show that our proposedfusion algorithm has better detection and tracking per-formance in the complex environment

43ComparativeExperimentandAnalysis In this paper weselect different detection and tracking algorithms toconduct comparative experiments on single-object videoswhere the SSD and YOLO-V3 algorithms that are widelyused are selected in the spatial dimension and the classicsingle-object tracking algorithms DSST KCF and SAMFare selected in the temporal dimension e experiment isdivided into two parts e first part is a comparison of asingle spatial detection algorithm or a temporal trackingalgorithm with our proposed algorithm the second part isa comparison of different detection and tracking algo-rithm combinations based on the fusion strategy Table 4

shows the comparison results of a single algorithm If thedetection algorithm is compared separately the detectionaccuracy of the YOLO-V3 algorithm is higher Overall thesuccess rate of a single algorithm is much lower than theYOLO-V3 + KCF fusion algorithm is is because thedetection algorithm is affected by the complex back-ground resulting in a large number of missed detectionsthe temporal algorithm will be affected by motion blurand the accumulated error will cause the tracking driftmaking the IOU between tracking result and ground truthless than 06

Table 5 compares the fusion performance of the different module combinations. It can be seen from Table 5 that the YOLO-V3 + KCF algorithm performs best. Because KCF is the strongest tracker in Table 4, YOLO-V3 + KCF also outperforms SSD + DSST and SSD + SAMF. Since the tracking algorithm uses temporal information to compensate for the detector's missed detections, while the detection algorithm corrects tracking drift by accurately re-detecting the single object, the success rate of each fusion algorithm exceeds that of the corresponding single algorithm in Table 4, as sketched below.
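Concretely, this complementation follows the selection rule of Section 3.3: an unmatched detection (IoU below 0.4 with the tracking box) starts a new track, while a matched pair keeps whichever bounding box is more confident. A minimal sketch, reusing overlap_rate() from the earlier snippet:

```python
# Fusion selection rule (cf. Section 3.3): IoU < 0.4 means the detector
# found a new object; otherwise output the more confident of the two
# boxes. det_conf is the detector score; trk_conf is the normalized KCF
# response. overlap_rate() is the IoU helper defined above.

def select_output(det_box, det_conf, trk_box, trk_conf,
                  iou_threshold: float = 0.4):
    """Return (box, is_new_object)."""
    if det_box is None:                       # missed detection: trust the tracker
        return trk_box, False
    if overlap_rate(det_box, trk_box) < iou_threshold:
        return det_box, True                  # new object: reinitialize tracking on it
    if det_conf > trk_conf:                   # same object seen by both modules
        return det_box, False
    return trk_box, False
```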

Figure 8 shows the qualitative results of the different comparison algorithms, and Table 6 gives a quantitative comparison on the same sequences. In Figure 8(a), there are factors such as object scale change, illumination change, and background interference. Over the whole tracking process, only DKCF, SiamRPN, and our proposed algorithm obtain good tracking results.

Table 2: The quantitative analysis for the testing sequences.

Index | Video 1 | Video 2 | Video 3 | Video 4 | Video 5 | Video 6 | Video 7 | Video 8
CLE (pixel) | 10.6 | 29.6 | 31.1 | 23.8 | 18.9 | 21.0 | 17.5 | 8.7
OR | 0.88 | 0.72 | 0.62 | 0.81 | 0.68 | 0.71 | 0.58 | 0.73
FPS | 17.4 | 12.0 | 20.9 | 16.8 | 14.1 | 19.1 | 18.5 | 17.2

Figure 6: The tracking results for the different testing sequences (videos 1-8). (a) Precision plot (precision versus location error threshold); (b) success plot (success rate versus overlap rate threshold).

Table 3: Detection and tracking results for the different modules.

Index (average) | YOLO-V3 | KCF | Fusion
CLE (pixel) | 16.9 | 14.6 | 10.5
OR | 0.816 | 0.752 | 0.861
Frame rate (FPS) | 8.7 | 46.0 | 16.1


However, due to the continuous change of the object scale, the background interference introduced into the KCF tracking template gradually accumulates, and a large tracking deviation finally appears (e.g., at the 640th frame). Our proposed algorithm automatically adjusts the size of the tracking bounding box according to the object scale change, reducing the background interference, so it can always estimate both the location and the scale of the object; the sketch below illustrates this scale search.
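The scale adjustment can be pictured as the three-scale search of Section 3.2, R = max(R0, R+1, R-1): score a slightly shrunk, unchanged, and slightly grown bounding box and keep the strongest response. In the sketch below, response_at is a hypothetical stand-in for evaluating the trained KCF filter on a rescaled patch, and the 1.05 scale step is an assumed value, not one given in the paper.

```python
# Three-scale adaptation: score the current box size against one smaller
# and one larger candidate and keep the size with the strongest response.

def adapt_scale(frame, center, size, response_at, scale_step: float = 1.05):
    """Return (best_size, best_response) over the {R-1, R0, R+1} candidates."""
    w, h = size
    candidates = [(w / scale_step, h / scale_step),  # R-1: shrink
                  (w, h),                            # R0: keep current scale
                  (w * scale_step, h * scale_step)]  # R+1: grow
    scored = [(response_at(frame, center, c), c) for c in candidates]
    best_response, best_size = max(scored, key=lambda rc: rc[0])
    return best_size, best_response
```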

The object in video 7 undergoes dramatic changes in illumination and scale (frames 65, 110, and 351 in Figure 8(b)); over the whole sequence, only our proposed algorithm and SiamRPN complete tracking of the entire video, while the other methods cannot adapt to such drastic changes. The object in video 6 shows certain scale and posture changes, and KCF, SAMF, DSST, CFNet, SiamRPN, and our proposed algorithm all track it successfully.

Figure 7: Ablation analysis for the different modules (YOLO-V3 + KCF, YOLO-V3, and KCF) in video 5. (a) Precision plot; (b) success plot.

Table 4: Comparison results for the single algorithms and the fusion.

Model | SSD | YOLO-V3 | DSST | SAMF | KCF | YOLO-V3+KCF
Module | Detection | Detection | Tracking | Tracking | Tracking | Fusion
CLE (pixel) | 21.1 | 16.9 | 17.27 | 13.28 | 14.6 | 10.1
OR | 0.524 | 0.816 | 0.711 | 0.625 | 0.752 | 0.841
Frame rate (FPS) | 6.5 | 8.7 | 34.2 | 26.1 | 46.0 | 16.1

Table 5: Tracking performance of different module combinations.

Index (average) | YOLO-V3+KCF | YOLO-V3+DSST | YOLO-V3+SAMF | SSD+KCF | SSD+DSST | SSD+SAMF
CLE (pixel) | 10.5 | 13.6 | 16.5 | 15.2 | 18.3 | 17.5
OR | 0.861 | 0.782 | 0.771 | 0.837 | 0.825 | 0.776
Frame rate (FPS) | 16.1 | 13.2 | 9.1 | 12.2 | 10.9 | 8.7


Among these, our algorithm achieves among the best OR and CLE.

5. Conclusion

In complex surveillance video, object detection and tracking usually suffer from various environmental interferences, especially scale changes, occlusion, illumination changes, and motion blur. This paper proposes an object detection and tracking model based on spatiotemporal information fusion: deep learning detection extracts spatial information, improving detection accuracy and avoiding object position drift; an improved KCF tracker then exploits temporal information to avoid missed detections; finally, a spatiotemporal fusion strategy makes the detection and tracking information complement each other.

The results show that our proposed algorithm can efficiently and continuously detect and track objects in different complex scenes and, to a certain extent, copes with the abovementioned environmental interference factors with robust and stable performance. However, detection and tracking of very small objects is still slightly worse, and improving this is the next step of our work.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Figure 8: Qualitative comparison of different tracking algorithms in different scenarios: (a) video 8, (b) video 7, and (c) video 6. The compared trackers are the proposed algorithm, SAMF, SiamRPN, KCF, CFNet, DKCF, and DSST.

Table 6: Quantitative comparison for different sequences.

Success rate:
Sequence | KCF | SAMF | DSST | DKCF | CFNet | SiamRPN | Ours
Video 8 | 0.78 | 0.56 | 0.62 | 0.70 | 0.75 | 0.79 | 0.86
Video 7 | 0.60 | 0.48 | 0.65 | 0.61 | 0.78 | 0.81 | 0.81
Video 6 | 0.79 | 0.67 | 0.58 | 0.82 | 0.67 | 0.79 | 0.79
Video 5 | 0.61 | 0.52 | 0.54 | 0.77 | 0.47 | 0.40 | 0.42
Video 4 | 0.73 | 0.68 | 0.71 | 0.75 | 0.75 | 0.88 | 0.82
Video 3 | 0.68 | 0.70 | 0.72 | 0.76 | 0.79 | 0.83 | 0.87
Video 2 | 0.59 | 0.57 | 0.62 | 0.65 | 0.75 | 0.81 | 0.85
Video 1 | 0.68 | 0.62 | 0.66 | 0.72 | 0.61 | 0.79 | 0.79

Center location error (pixel):
Sequence | KCF | SAMF | DSST | DKCF | CFNet | SiamRPN | Ours
Video 8 | 25.3 | 42.3 | 24.4 | 24.9 | 17.0 | 14.4 | 9.1
Video 7 | 21.7 | 28.6 | 9.2 | 37.4 | 21.2 | 10.8 | 7.5
Video 6 | 12.7 | 7.1 | 6.3 | 12.6 | 16.8 | 15.6 | 9.3
Video 5 | 14.6 | 25.0 | 11.4 | 17.1 | 15.2 | 2.9 | 8.1
Video 4 | 27.3 | 22.3 | 14.4 | 24.9 | 17.0 | 14.9 | 10.1
Video 3 | 31.7 | 28.3 | 9.2 | 37.4 | 21.2 | 10.8 | 8.4
Video 2 | 12.5 | 9.1 | 6.3 | 12.6 | 16.8 | 15.6 | 8.9
Video 1 | 74.6 | 23.0 | 21.4 | 27.1 | 15.2 | 19.9 | 18.2


References

[1] C. Wu, H. Sun, H. Wang, et al., "Online multi-object tracking via combining discriminative correlation filters with making decision," IEEE Access, vol. 6, pp. 43499-43512, 2018.
[2] Y. Qi, C. Wu, D. Chen, and Y. Lu, "Superpixel tracking based on sparse representation," Journal of Electronics & Information Technology, vol. 37, no. 3, pp. 529-535, 2015.
[3] G. Yuan and M. Xue, "Visual tracking based on sparse dense structure representation and online robust dictionary learning," Journal of Electronics & Information Technology, vol. 37, no. 3, pp. 536-542, 2015.
[4] H. Luo, B.-K. Zhong, and F.-S. Kong, "Tracking using weighted block compressed sensing and location prediction," Journal of Electronics & Information Technology, vol. 37, no. 5, pp. 1160-1166, 2015.
[5] Z.-Q. Hou, A.-Q. Huang, W.-S. Yu, and X. Liu, "Visual object tracking method based on local patch model and model update," Journal of Electronics & Information Technology, vol. 37, no. 6, pp. 1357-1364, 2015.
[6] M. Xue, H. Zhu, and G.-L. Yuan, "Robust visual tracking based on online discrimination dictionary learning," Journal of Electronics & Information Technology, vol. 37, no. 7, pp. 1654-1659, 2015.
[7] L. Matthews, T. Ishikawa, S. Baker, et al., "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 810-815, 2004.
[8] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1-3, pp. 125-141, 2008.
[9] S. Hare, S. Golodetz, A. Saffari, et al., "Struck: structured output tracking with kernels," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 2096-2109, 2016.
[10] Z. Kalal, K. Mikolajczyk, and J. Matas, "Tracking-learning-detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1409-1422, 2012.
[11] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, "Visual object tracking using adaptive correlation filters," in Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2544-2550, San Francisco, CA, USA, June 2010.
[12] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of the 12th European Conference on Computer Vision (ECCV 2012), Part IV, pp. 702-715, Florence, Italy, October 2012.
[13] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, 2015.
[14] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of the British Machine Vision Conference, pp. 590-604, BMVA Press, Nottingham, England, September 2014.
[15] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in Proceedings of the 2014 European Conference on Computer Vision (ECCV), pp. 254-265, Springer, Zurich, Switzerland, September 2014.
[16] M. Danelljan, G. Hager, F. S. Khan, and M. Felsberg, "Learning spatially regularized correlation filters for visual tracking," in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4310-4318, IEEE, Santiago, Chile, December 2015.
[17] L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, and P. H. S. Torr, "Staple: complementary learners for real-time tracking," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1401-1409, Las Vegas, NV, USA, June 2016.
[18] F. Li, C. Tian, W. Zuo, L. Zhang, and M.-H. Yang, "Learning spatial-temporal regularized correlation filters for visual tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4904-4913, Salt Lake City, UT, USA, June 2018.
[19] M. Mueller, N. Smith, and B. Ghanem, "Context-aware correlation filter tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1396-1404, Honolulu, HI, USA, July 2017.
[20] W. Zhong, H. Lu, and M.-H. Yang, "Robust object tracking via sparse collaborative appearance model," IEEE Transactions on Image Processing, vol. 23, no. 5, pp. 2356-2368, 2014.
[21] Y.-Z. Xue and T. Wang, "Object tracking based on cost-sensitive Adaboost algorithm," Chinese Journal of Graphic Arts, vol. 21, no. 5, pp. 544-555, 2016.
[22] Q. Wang, J. Gao, J. Xing, M. Zhang, and W. Hu, "DCFNet: discriminant correlation filters network for visual tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017.
[23] J. Valmadre, L. Bertinetto, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, "End-to-end representation learning for correlation filter based tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5000-5008, Honolulu, HI, USA, July 2017.
[24] B. Li, J. Yan, W. Wu, Z. Zhu, and X. Hu, "High performance tracking with Siamese region proposal network," in Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8971-8980, Salt Lake City, UT, USA, June 2018.
[25] B. Uzkent and Y. W. Seo, "EnKCF: ensemble of kernelized correlation filters for high-speed object tracking," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 77-89, Lake Tahoe, NV, USA, March 2018.


Figure 6 e tracking results for different testing sequences (a) Precision plot and (b) success plot

Table 3 Detection and tracking results for different modules

Index (average) YOLO-V3 KCF FusionCLE (pixel) 169 146 105OR 0816 0752 0861Frame rate (FPS) 87 460 161

8 Mathematical Problems in Engineering

continuous change of object scale the KCF trackingtemplate-introduced background interference informa-tion gradually accumulates and finally there is a largetracking deviation (such as the 640th frame) Our pro-posed algorithm can automatically adjust the trackingbounding box size according to the object scale changethereby reducing the background interference informa-tion so it can always estimate the location and the scale of

the object the object in video 7 has dramatic changes inillumination and scale (frames 65 110 and 351 inFigure 8(b)) In the whole tracking process only ourproposed algorithm and SiamRPN can complete thetracking of the entire video and other methods cannotadapt to drastic changes in illumination and scale theobject in video 6 has a certain scale and posture changewhere KCF SAMF DSST CFNet SiamRPN and our

10

08

06

04

02

000 10 20 30 40 50

Location error threshold

Prec

ision

YOLO-V3 + KCFYOLO-V3KCF model

(a)

10

08

06

04

02

000 10 20 30 40 50

Overlap rate threshold

Succ

ess r

ate

YOLO-V3 + KCFYOLOKCF model

(b)

Figure 7 Ablation analysis for different modules in video 5 (a) Precision plot and (b) success plot

Table 4 Comparison results for the single tracking algorithm

Modules Object detection Object tracking FusionModel SSD YOLO-V3 DSST SAMF KCF YOLO-V3 +KCFCLE (pixel) 211 169 1727 1328 146 101OR 0524 0816 0711 0625 0752 0841Frame rate (FPS) 65 87 342 261 460 161

Table 5 Tracking performance of different module combinations

Index (average) YOLO-V3+KCF YOLO-V3 +DSST YOLO-V3+SAMF SSD+KCF SSD+KCF SSD+KCFCLE (pixel) 105 136 165 152 183 175OR 0861 0782 0771 0837 0825 0776Frame rate (FPS) 161 132 91 122 109 87

Mathematical Problems in Engineering 9

proposed algorithm have better tracking performance butour OR and CLE are the highest

5 Conclusion

In a complex surveillance video object detection andtracking usually suffers from various environmental in-terference especially scale changes occlusion illuminationchanges and motion blur is paper proposes an objectdetection and tracking model based on spatiotemporalinformation fusion which uses deep learning to detect andextract spatial information improve detection accuracyand avoid object position drift and then an improved KCFtracking is used to track temporal information so as toavoid missed detection finally the spatiotemporal infor-mation fusion strategy is designed to make detection in-formation and tracking information complementation

e results show that our proposed algorithm can effi-ciently continuously detect and track objects in differentcomplex scenes To a certain extent it can cope with theinfluence of the abovementioned environmental interfer-ence factors has both robustness and stable performanceHowever the detection and tracking effect with too smallscale is slightly worse so the next step will be to makeimprovements on it

Data Availability

e labeled dataset used to support the findings of this studyis available from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

ProposedSAMFSiamRPNKCF

CFNetDKCFDSST

(a)

(b)

(c)

Figure 8 Qualitative comparison of different tracking algorithms in different scenarios (a) Video 8 (b) video 7 and (c) video 6

Table 6 Quantitative comparison for different sequences

SequencesSuccess rate Center location error

KCF SAMF DSST DKCF CFNet SiamRPN Our KCF SAMF DSST DKCF CFNet SiamRPN OurVideo 8 078 056 062 070 075 079 086 253 423 244 249 170 144 91Video 7 060 048 065 061 078 081 081 217 286 92 374 212 108 75Video 6 079 067 058 082 067 079 079 127 71 63 126 168 156 93Video 5 061 052 054 077 047 040 042 146 250 114 171 152 29 81Video 4 073 068 071 075 075 088 082 273 223 144 249 170 149 101Video 3 068 070 072 076 079 083 087 317 283 92 374 212 108 84Video 2 059 057 062 065 075 081 085 125 91 63 126 168 156 89Video 1 068 062 066 072 061 079 079 746 230 214 271 152 199 182

10 Mathematical Problems in Engineering

References

[1] C Wu H Sun H Wang et al ldquoOnline multi-object trackingvia combining discriminative correlation filters with makingdecisionrdquo IEEE Access vol 6 pp 43499ndash43512 2018

[2] Y Qi C Wu D Chen and Y Lu Superpixel tracking basedon sparse representationrdquo Journal of Electronics and Infor-mation Technology vol 37 no 3 pp 529ndash535 2015

[3] G Yuan and M Xue ldquoVisual tracking based on sparse densestructure representation and online robust dictionary learn-ingrdquo Journal of Electronics amp Information Technology vol 37no 3 pp 536ndash542 2015

[4] H Luo B-K Zhong and F-S Kong ldquoTracking usingweighted block compressed sensing and location predictionrdquoJournal of Electronics amp Information Technology vol 37 no 5pp 1160ndash1166 2015

[5] Z-Q Hou A-Q Huang W-S Yu and X Liu ldquoVisual objecttracking method based on local patch model and modelupdaterdquo Journal of Electronics amp Information Technologyvol 37 no 6 pp 1357ndash1364 2015

[6] M Xue H Zhu and G-L Yuan ldquoRobust visual trackingbased on online discrimination dictionary learningrdquo Journalof Electronics amp Information Technology vol 37 no 7pp 1654ndash1659 2015

[7] L Matthews T Ishikawa S Baker et al ldquoe template updateproblemrdquo IEEE Transactions on Pattern Analysis andMachineIntelligence vol 26 no 6 pp 810ndash815 2004

[8] D A Ross J Lim R-S Lin and M-H Yang ldquoIncrementallearning for robust visual trackingrdquo International Journal ofComputer Vision vol 77 no 1-3 pp 125ndash141 2008

[9] S Hare S Golodetz A Saffari et al ldquoStruck structuredoutput tracking with kernelsrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 38 no 10 pp 2096ndash2109 2016

[10] Z Kalal K Mikolajczyk and J Matas ldquoTracking-learning-detectionrdquo IEEE Transactions on Pattern Analysis and Ma-chine Intelligence vol 6 no 1 pp 1409ndash1422 2010

[11] D S Bolme J R Beveridge B A Draper and Y M LuildquoVisual object tracking using adaptive correlation filtersrdquo inProceedings of the 2010 IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR)pp 2544ndash2550 San Francisco CA USA June 2010

[12] J F Henriques R Caseiro P Martins and J BatistaldquoExploiting the circulant structure of tracking-by-detectionwith kernelsrdquo in Proceedings of the 12th European conferenceon Computer Vision - Volume Part IV - ECCV 2012 vol 75no 1 pp 702ndash715 Florence Italy October 2012

[13] J F Henriques R Caseiro P Martins and J Batista ldquoHigh-speed tracking with kernelized correlation filtersrdquo IEEETransactions on Pattern Analysis amp Machine Intelligencevol 37 no 3 pp 583ndash596 2015

[14] M Danelljan G Hager F Khan and M Felsberg ldquoAccuratescale estimation for robust visual trackingrdquo in Proceedings ofthe British Machine Vision Conference pp 590ndash604 BMVAPress Nottingham England September 2014

[15] Y Li and J Zhu ldquoA scale adaptive kernel correlation filtertracker with feature integrationrdquo in Proceedings of the 2014European Conference on Computer Vision (ECCV) pp 254ndash265 Springer Zurich Switzerland September 2014

[16] M Danelljan G Hager F S Khan and M FelsbergldquoLearning spatially regularized correlation filters for visualtrackingrdquo in Proceedings of the 2015 IEEE InternationalConference on Computer Vision (ICCV) pp 4310ndash4318 IEEESantiago Chile December 2015

[17] L Bertinetto J Valmadre S Golodetz O Miksik andP H S Torr ldquoStaple complementary learners for real-timetrackingrdquo in Proceedings of the 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) pp 1401ndash1409 Las Vegas NV USA June 2016

[18] F Li C Tian W Zuo L Zhang and M-H Yang ldquoLearningspatial-temporal regularized correlation filters for visualtrackingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 4904ndash4913 Salt Lake CityUT USA June 2018

[19] M Mueller N Smith and B Ghanem ldquoContext-awarecorrelation filter trackingrdquo in Proceedings of the 2017 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 1396ndash1404 Honolulu HI USA July 2017

[20] W Zhong H Lu andM H Yang ldquoRobust object tracking viasparse collaborative appearance modelrdquo IEEE Transactions onImage Processing A Publication of the IEEE Signal ProcessingSociety vol 23 no 5 pp 2356ndash2368 2014

[21] Y-Z Xue and T Wang ldquoObject tracking based on cost-sensitive Adaboost algorithmrdquo Chinese Journal of GraphicArts vol 21 no 5 pp 544ndash555 2016

[22] Q Wang J Gao J Xing M Zhang and W Hu ldquoDCFnetdiscriminant correlation filters network for visual trackingrdquoin Proceedings of the 2017 IEEE Conference on ComputerVision and Pattern Recognition (CVPR) Honolulu HI USAJuly 2017

[23] J Valmadre L Bertinetto J F Henriques A Vedaldi andP H S Torr ldquoEnd-to-end representation learning for cor-relation filter based trackingrdquo in Proceedings of the 2017 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 5000ndash5008 Honolulu HI USA July 2017

[24] B Li J Yan W Wu Z Zhu and X Hu ldquoHigh performancetracking with Siamese region proposal networkrdquo in Pro-ceedings of the 2018 IEEECVF Conference on Computer Visionand Pattern Recognition pp 8971ndash8980 Salt Lake City UTUSA June 2018

[25] B Uzkent and Y W Seo ldquoEnKCF ensemble of kernelizedcorrelation filters for high-speed object trackingrdquo in Pro-ceedings of the 2018 IEEEWinter Conference on Applications ofComputer Vision pp 77ndash89 Lake Tahoe NV USA March2018

Mathematical Problems in Engineering 11

3.3. Object Detection and Tracking for Spatiotemporal Fusion. Deep learning object detection extracts single-frame image features with high accuracy, can identify and classify unknown objects, and is highly robust. However, detection alone does not exploit the temporal relationship between consecutive video frames, which can lead to missed detections and slow running speed. KCF tracking trains filters by ridge regression on features extracted from consecutive frames, so its computation is light and its processing speed is high; however, it accumulates errors through tracking drift and is easily disturbed by object occlusion and background interference. Fusing temporal and spatial information therefore exploits the complementary advantages of deep learning and KCF, improving overall performance and achieving more accurate and stable detection and tracking while keeping robustness and real-time performance.

In the fusion process, the spatial position of the object is determined by the deep learning detector in the first frame; this position initializes the KCF tracker, which tracks the object through the following frames. After a fixed number of tracked frames, the YOLO-V3 detector is run again so that continuous detection and tracking stay accurate. The number of tracked frames between two detection runs can be determined experimentally and is generally set to 50. In addition, the confidence of the detection results can serve as the basis for template refresh and object recapture, as sketched below.
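As a rough illustration, this detect-then-track schedule might look as follows in Python. `Detector` and `KCFTracker` are hypothetical wrappers, not the paper's code; `arbitrate` stands for the candidate-selection strategy of equation (10), which is sketched further below.

```python
# Minimal sketch of the detect-then-track schedule (hypothetical wrappers):
# detector.detect(frame) -> (box, confidence) or None
# tracker.init(frame, box); tracker.update(frame) -> (box, response)

DETECT_INTERVAL = 50  # tracked frames between two detection runs (paper's default)

def fuse_detect_track(frames, detector, tracker):
    boxes = []
    box, _ = detector.detect(frames[0])         # spatial position from frame 1
    tracker.init(frames[0], box)                # initialize KCF with the detection
    for t, frame in enumerate(frames[1:], start=1):
        box, response = tracker.update(frame)   # temporal tracking
        if t % DETECT_INTERVAL == 0:            # periodic redetection
            det = detector.detect(frame)
            if det is not None:
                box = arbitrate(det, (box, response))  # Section 3.3 selection rule
                tracker.init(frame, box)               # refresh the KCF template
        boxes.append(box)
    return boxes
```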

After the redetection mechanism runs, it is not obvious whether the tracking candidate bounding box or the detection candidate bounding box returned by the redetection module is the better choice. This paper therefore designs a candidate bounding box selection strategy. First, the overlap ratio between the detection candidate bounding box S_i and the tracking candidate bounding box K_j is calculated to judge whether the detected and tracked objects are the same; the intersection over union (IOU) is used as the overlap criterion. The IOU of two candidate bounding boxes can be written as

\mathrm{IOU}\left(S_i, K_j\right) = \frac{\left|S_i \cap K_j\right|}{\left|S_i \cup K_j\right|}. \qquad (10)

If IOU(S_i, K_j) < 0.4 for every tracking candidate K_j, then S_i is regarded as a new object and is output to initialize the tracking algorithm. If instead there exists a K_j with IOU(S_i, K_j) ≥ 0.4, the detection bounding box S_i and the tracking bounding box K_j are considered to have detected the same object; the confidence conf(S_i) of the bounding box of the detection algorithm is then compared with the normalized response conf(K_j) of the bounding box of the tracking algorithm, and the bounding box with the higher confidence is taken as the output of the system.
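A minimal sketch of this selection rule, assuming axis-aligned boxes given as (x1, y1, x2, y2) corners and confidences normalized to [0, 1]:

```python
# Sketch of the candidate-selection strategy (equation (10) and the 0.4 rule).

def iou(a, b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def arbitrate(detection, track, thr=0.4):
    """Choose between detection and tracking candidates (Section 3.3 rule)."""
    det_box, det_conf = detection
    trk_box, trk_conf = track
    if iou(det_box, trk_box) < thr:   # no overlap: treat detection as a new object
        return det_box                # and reinitialize tracking on it
    # same object: keep whichever candidate is more confident
    return det_box if det_conf >= trk_conf else trk_box
```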

4. Experimental Results and Analysis

4.1. Dataset and Verification Platform. To verify the accuracy and robustness of the detection and tracking algorithm in the video surveillance task, this experiment constructs a surveillance dataset of 321,550 images. To facilitate performance analysis, all data are labeled frame by frame with scale and position and are classified according to the interference state.

The improved detection and tracking model has three parts: object detection based on deep spatial information, KCF tracking based on temporal information, and fusion of spatiotemporal information. The parameters of each part are consistent with the original models. During offline training, all convolution layers are updated; during online updating, the parameters of the shallow convolution layers are fixed and the last two convolution layers are fine-tuned according to the test data. The YOLO-V3 model trained on Pascal VOC2007 [24] provides the initial weights for fine-tuning the network, with the learning rate set to 0.001 and the weight decay to 0.0005; 30,000 training iterations were run on an NVIDIA GeForce GTX 1080 Ti. The KCF module uses the peak-to-sidelobe ratio to select the optimal tracking point, and the threshold of the normalized response is set to 0.65.
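The layer-freezing policy can be illustrated with a short PyTorch sketch; the tiny `backbone` below is only a placeholder standing in for YOLO-V3, not the actual network:

```python
# Illustrative sketch: freeze the shallow convolution layers, fine-tune only
# the last two, with the paper's learning rate and weight decay.
import torch
import torch.nn as nn

backbone = nn.Sequential(          # placeholder; a real YOLO-V3 is far larger
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 255, 1),
)

convs = [m for m in backbone if isinstance(m, nn.Conv2d)]
for conv in convs[:-2]:            # fix the shallow convolution layers
    for p in conv.parameters():
        p.requires_grad = False

optimizer = torch.optim.SGD(       # lr = 0.001, weight decay = 0.0005
    (p for p in backbone.parameters() if p.requires_grad),
    lr=0.001, weight_decay=0.0005,
)
```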

Figure 5: The overall framework of the tracking algorithm (HOG features are extracted from the 1st and T-th frames; the tracking filter is trained and updated in the KCF tracking model; convolution yields the filter response map, whose maximum response value gives the output).


If the regression response score falls below 0.65, the tracking is considered to have failed, and the improved YOLO-V3 detection network is used to recapture the optimal object.
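This failure test can be sketched as follows. The paper does not spell out its normalization, so the peak-to-sidelobe ratio below uses one common formulation, and the 0.65 test assumes a response map normalized to [0, 1]:

```python
# Sketch of a confidence check on the 2-D KCF correlation response map.
import numpy as np

def psr(response, exclude=5):
    """Peak-to-sidelobe ratio: peak vs. statistics of the off-peak region."""
    peak = response.max()
    py, px = np.unravel_index(response.argmax(), response.shape)
    mask = np.ones_like(response, dtype=bool)
    mask[max(0, py - exclude):py + exclude + 1,
         max(0, px - exclude):px + exclude + 1] = False  # exclude the peak window
    side = response[mask]
    return (peak - side.mean()) / (side.std() + 1e-9)

def tracking_failed(response, threshold=0.65):
    # Assumes the response map is normalized to [0, 1], as in the paper's test.
    return response.max() < threshold
```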

Eight representative subsets of surveillance video are selected for verification; the characteristics of the sequences are described in Table 1. For example, video 1 contains a similar background, occlusion, and fast motion; video 2 contains a similar background, fast motion, and rotation; videos 3 and 4 contain occlusion, rotation, and attitude change; and video 5 contains fast motion, illumination change, and a similar background. The simulation platform is an AMD Ryzen 5 3500U host at 3.1 GHz with 8 GB of RAM.

Center error (CE) and overlap rate (OR) are used to compare and analyze the experimental results [19]. The former is the fraction of frames whose center position error is below a given threshold, and the latter is the percentage of frames whose bounding-box overlap rate exceeds a given threshold. A position error of 20 pixels and an overlap rate of 0.6 are selected as the thresholds for tracking success. Because results differ greatly across thresholds, precision plots and success plots are used to quantitatively analyze the performance of the compared algorithms.
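Assuming per-frame center errors and IOUs have already been computed, the two curves and the two single-number scores reduce to simple threshold sweeps, for example:

```python
# Sketch of the precision/success curves used in the plots below
# (function names here are illustrative, not from the paper).
import numpy as np

def precision_curve(center_errors, thresholds=np.arange(0, 51)):
    """Fraction of frames whose center location error <= each threshold."""
    e = np.asarray(center_errors, dtype=float)
    return np.array([(e <= t).mean() for t in thresholds])

def success_curve(overlaps, thresholds=np.linspace(0.0, 1.0, 51)):
    """Fraction of frames whose IOU with the ground truth > each threshold."""
    o = np.asarray(overlaps, dtype=float)
    return np.array([(o > t).mean() for t in thresholds])

# Single-number scores used in this paper:
def precision_at_20px(center_errors):
    return (np.asarray(center_errors, dtype=float) <= 20).mean()

def success_at_iou_06(overlaps):
    return (np.asarray(overlaps, dtype=float) > 0.6).mean()
```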

4.2. Ablation Analysis. Our proposed method is an improved KCF-based tracker that achieves scale adaptation. To illustrate its effectiveness, the comparison experiment selects tracking methods with scale-adaptive capability: KCF, SAMF, DSST, CFNet [23], SiamRPN [24], and DKCF [25]; here precision refers to the error between the tracked point and the labeled point. KCF only updates the object position (x, y), while the object size remains unchanged, so its adaptability to object scale change is relatively poor. SAMF is also a modification of KCF: color name (CN) features are added to the object representation, so HOG and CN features are combined, and a scale pool {0.985, 0.99, 0.995, 1, 1.005, 1.01, 1.015} is added, from which the optimal scale is cyclically selected at the expense of tracking speed (a sketch of this scale-pool search follows below). DSST uses two mutually independent filters for scale calculation and object positioning, establishing 17 scale change factors and 33 interpolated scale change factors for scale evaluation and positioning. SiamFC is an object tracking algorithm based on a fully convolutional Siamese network, in which multiscale object fusion is implemented through a pyramid strategy to improve tracking accuracy. Our proposed algorithm is a detect-before-track model that uses deep neural networks for template updating and scale adaptation. The results of object detection and tracking under different environmental influences are shown in Table 2, and the precision and success plots for the 8 video sequences are shown in Figure 6. Table 2 and Figure 6 show that, compared with video 1, the tracking success rates of videos 2, 3, 4, and 5 decline to different degrees: occlusion, scale change, motion blur, and illumination all affect detection and tracking, with occlusion and illumination changes having the greatest impact, and different degrees of motion blur affecting the results differently. When the object overlap rate threshold is set to 0.6, the average detection and tracking accuracy is 76.17% and the average speed reaches 18 FPS. The slower speed on video 2 is caused by new objects appearing in the field of view, and the object scale in video 4 is larger, so its detection and tracking take longer.
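The SAMF-style scale-pool search mentioned above can be sketched as follows; `extract_patch` and `filter_response` are hypothetical helpers for cropping a scaled window and evaluating the trained correlation filter:

```python
# Sketch of scale pooling: evaluate the filter at each candidate scale and
# keep the scale whose response peak is highest.
import numpy as np

SCALE_POOL = (0.985, 0.99, 0.995, 1.0, 1.005, 1.01, 1.015)

def best_scale(frame, center, base_size, extract_patch, filter_response):
    best, best_peak = 1.0, -np.inf
    for s in SCALE_POOL:
        size = (base_size[0] * s, base_size[1] * s)
        patch = extract_patch(frame, center, size)  # resampled to model size
        peak = filter_response(patch).max()         # correlation peak
        if peak > best_peak:
            best, best_peak = s, peak
    return best
```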

Video 2 (object occlusion) and video 5 (illumination change) are selected for comparative experiments in which our proposed algorithm is compared with a single tracking algorithm and a single detection algorithm. Video 2 exhibits object occlusion; the experimental results are shown in Table 2 and Figure 6. In terms of center error and overlap rate, the fusion algorithm is clearly better. A deep learning detector may fail on objects of too small a scale, lowering the recall rate, and in long-term detection and tracking the correlation filter accumulates errors, degrading accuracy; for occluded objects in particular, tracking drift easily occurs. These factors keep the center error and overlap rate of any single detection or tracking algorithm low. The fusion algorithm maintains a high recall rate through KCF tracking and corrects the accumulated error through YOLO-V3 detection; after the object is occluded, it can recapture the object and keep tracking, which solves the lost-object problem in object detection and tracking.

There is illumination change in video 5; the experimental results are shown in Table 3 and Figure 7. Under illumination change it is difficult to distinguish light and shade between the object edge and the background, so the object bounding box cannot be determined reliably for detection and tracking.

Table 1: Characteristics of partial sequences.

Benchmark video 1: similar background, occlusion, fast motion
Benchmark video 2: similar background, fast motion, rotation
Benchmark video 3: occlusion, rotation
Benchmark video 4: fast motion, attitude change
Benchmark video 5: fast motion, illumination, similar background
Benchmark video 6: occlusion, similar background, blur
Benchmark video 7: occlusion, scale change, viewing angle
Benchmark video 8: occlusion, rotation, illumination


Even if the object position can be detected and tracked, the judgment of the object scale is not accurate. Therefore, the KCF tracking algorithm attains higher center-position precision but a lower overlap rate. The YOLO-V3 detection algorithm is strongly robust but suffers from missed detections. The simulation results therefore show that our proposed fusion algorithm achieves better detection and tracking performance in this complex environment.

4.3. Comparative Experiment and Analysis. Different detection and tracking algorithms are selected for comparative experiments on single-object videos: the widely used SSD and YOLO-V3 algorithms in the spatial dimension and the classic single-object trackers DSST, KCF, and SAMF in the temporal dimension. The experiment has two parts: the first compares a single spatial detection algorithm or temporal tracking algorithm with our proposed algorithm; the second compares different detection and tracking combinations based on the fusion strategy. Table 4 shows the comparison results for single algorithms. Compared separately, YOLO-V3 is the more accurate detection algorithm. Overall, the success rate of any single algorithm is much lower than that of the YOLO-V3 + KCF fusion algorithm. This is because the detection algorithm is affected by the complex background, producing many missed detections, while the temporal algorithm is affected by motion blur, and its accumulated error causes tracking drift, making the IOU between the tracking result and the ground truth fall below 0.6.

Table 5 compares the fusion performance of different algorithm combinations; YOLO-V3 + KCF performs best. Because KCF is the strongest tracker in Table 4, YOLO-V3 + KCF also outperforms SSD + DSST and SSD + SAMF. Since the tracking algorithm uses temporal information to compensate for the detector's missed detections, and the detection algorithm corrects the drift of the tracking result by accurately detecting the single object, the success rate of the fusion algorithms exceeds that of the single algorithms in Table 4.

Figure 8 shows qualitative results for the compared algorithms, and Table 6 gives a quantitative comparison over the different sequences. In Figure 8(a) there are object scale changes, illumination changes, and background interference; over the whole tracking process, only DKCF, SiamRPN, and our proposed algorithm track well.

Table 2: Quantitative analysis for the testing sequences.

Index              Video 1  Video 2  Video 3  Video 4  Video 5  Video 6  Video 7  Video 8
CLE (pixel)        10.6     29.6     31.1     23.8     18.9     21.0     17.5     8.7
OR                 0.88     0.72     0.62     0.81     0.68     0.71     0.58     0.73
Frame rate (FPS)   17.4     12.0     20.9     16.8     14.1     19.1     18.5     17.2

Figure 6: Tracking results for the eight testing sequences: (a) precision plot (precision vs. location error threshold) and (b) success plot (success rate vs. overlap rate threshold).

Table 3: Detection and tracking results for different modules.

Index (average)    YOLO-V3  KCF    Fusion
CLE (pixel)        16.9     14.6   10.5
OR                 0.816    0.752  0.861
Frame rate (FPS)   8.7      46.0   16.1


However, due to the continuous change of object scale, background interference introduced into the KCF tracking template gradually accumulates, finally producing a large tracking deviation (e.g., at the 640th frame). Our proposed algorithm automatically adjusts the size of the tracking bounding box according to the object scale change, reducing background interference, so it can always estimate both the location and the scale of the object. The object in video 7 undergoes dramatic changes in illumination and scale (frames 65, 110, and 351 in Figure 8(b)); over the whole sequence, only our proposed algorithm and SiamRPN complete the tracking, while the other methods cannot adapt to such drastic changes in illumination and scale.

Figure 7: Ablation analysis of the different modules on video 5, comparing YOLO-V3 + KCF, YOLO-V3, and the KCF model: (a) precision plot and (b) success plot.

Table 4: Comparison results for single algorithms.

Modules            Object detection    Object tracking           Fusion
Model              SSD     YOLO-V3     DSST    SAMF    KCF       YOLO-V3 + KCF
CLE (pixel)        21.1    16.9        17.27   13.28   14.6      10.1
OR                 0.524   0.816       0.711   0.625   0.752     0.841
Frame rate (FPS)   6.5     8.7         34.2    26.1    46.0      16.1

Table 5: Tracking performance of different module combinations.

Index (average)    YOLO-V3+KCF  YOLO-V3+DSST  YOLO-V3+SAMF  SSD+KCF  SSD+DSST  SSD+SAMF
CLE (pixel)        10.5         13.6          16.5          15.2     18.3      17.5
OR                 0.861        0.782         0.771         0.837    0.825     0.776
Frame rate (FPS)   16.1         13.2          9.1           12.2     10.9      8.7


The object in video 6 undergoes a certain amount of scale and posture change; KCF, SAMF, DSST, CFNet, SiamRPN, and our proposed algorithm all achieve good tracking performance, but our OR and CLE are the best.

5. Conclusion

In complex surveillance video, object detection and tracking suffer from various environmental interferences, especially scale changes, occlusion, illumination changes, and motion blur. This paper proposes an object detection and tracking model based on spatiotemporal information fusion: deep learning detection extracts spatial information, improving detection accuracy and avoiding object position drift; an improved KCF tracker then exploits temporal information to avoid missed detections; and finally a spatiotemporal information fusion strategy makes the detection information and tracking information complementary.

The results show that our proposed algorithm can efficiently and continuously detect and track objects in different complex scenes and, to a certain extent, can cope with the environmental interference factors above with robust and stable performance. However, detection and tracking of objects at too small a scale remain slightly worse, and the next step will be to improve on this.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Figure 8: Qualitative comparison of the tracking algorithms (Proposed, SAMF, SiamRPN, KCF, CFNet, DKCF, DSST) in different scenarios: (a) video 8, (b) video 7, and (c) video 6.

Table 6: Quantitative comparison for different sequences (success rate and center location error in pixels).

Success rate
Sequence   KCF    SAMF   DSST   DKCF   CFNet  SiamRPN  Ours
Video 8    0.78   0.56   0.62   0.70   0.75   0.79     0.86
Video 7    0.60   0.48   0.65   0.61   0.78   0.81     0.81
Video 6    0.79   0.67   0.58   0.82   0.67   0.79     0.79
Video 5    0.61   0.52   0.54   0.77   0.47   0.40     0.42
Video 4    0.73   0.68   0.71   0.75   0.75   0.88     0.82
Video 3    0.68   0.70   0.72   0.76   0.79   0.83     0.87
Video 2    0.59   0.57   0.62   0.65   0.75   0.81     0.85
Video 1    0.68   0.62   0.66   0.72   0.61   0.79     0.79

Center location error (pixels)
Sequence   KCF    SAMF   DSST   DKCF   CFNet  SiamRPN  Ours
Video 8    25.3   42.3   24.4   24.9   17.0   14.4     9.1
Video 7    21.7   28.6   9.2    37.4   21.2   10.8     7.5
Video 6    12.7   7.1    6.3    12.6   16.8   15.6     9.3
Video 5    14.6   25.0   11.4   17.1   15.2   2.9      8.1
Video 4    27.3   22.3   14.4   24.9   17.0   14.9     10.1
Video 3    31.7   28.3   9.2    37.4   21.2   10.8     8.4
Video 2    12.5   9.1    6.3    12.6   16.8   15.6     8.9
Video 1    74.6   23.0   21.4   27.1   15.2   19.9     18.2


References

[1] C. Wu, H. Sun, H. Wang et al., "Online multi-object tracking via combining discriminative correlation filters with making decision," IEEE Access, vol. 6, pp. 43499–43512, 2018.

[2] Y. Qi, C. Wu, D. Chen, and Y. Lu, "Superpixel tracking based on sparse representation," Journal of Electronics & Information Technology, vol. 37, no. 3, pp. 529–535, 2015.

[3] G. Yuan and M. Xue, "Visual tracking based on sparse dense structure representation and online robust dictionary learning," Journal of Electronics & Information Technology, vol. 37, no. 3, pp. 536–542, 2015.

[4] H. Luo, B.-K. Zhong, and F.-S. Kong, "Tracking using weighted block compressed sensing and location prediction," Journal of Electronics & Information Technology, vol. 37, no. 5, pp. 1160–1166, 2015.

[5] Z.-Q. Hou, A.-Q. Huang, W.-S. Yu, and X. Liu, "Visual object tracking method based on local patch model and model update," Journal of Electronics & Information Technology, vol. 37, no. 6, pp. 1357–1364, 2015.

[6] M. Xue, H. Zhu, and G.-L. Yuan, "Robust visual tracking based on online discrimination dictionary learning," Journal of Electronics & Information Technology, vol. 37, no. 7, pp. 1654–1659, 2015.

[7] L. Matthews, T. Ishikawa, S. Baker et al., "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 810–815, 2004.

[8] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1–3, pp. 125–141, 2008.

[9] S. Hare, S. Golodetz, A. Saffari et al., "Struck: structured output tracking with kernels," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 2096–2109, 2016.

[10] Z. Kalal, K. Mikolajczyk, and J. Matas, "Tracking-learning-detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 1, pp. 1409–1422, 2010.

[11] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, "Visual object tracking using adaptive correlation filters," in Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2544–2550, San Francisco, CA, USA, June 2010.

[12] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of the 12th European Conference on Computer Vision (ECCV 2012), Part IV, vol. 75, no. 1, pp. 702–715, Florence, Italy, October 2012.

[13] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2015.

[14] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of the British Machine Vision Conference, pp. 590–604, BMVA Press, Nottingham, England, September 2014.

[15] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in Proceedings of the 2014 European Conference on Computer Vision (ECCV), pp. 254–265, Springer, Zurich, Switzerland, September 2014.

[16] M. Danelljan, G. Hager, F. S. Khan, and M. Felsberg, "Learning spatially regularized correlation filters for visual tracking," in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4310–4318, IEEE, Santiago, Chile, December 2015.

[17] L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, and P. H. S. Torr, "Staple: complementary learners for real-time tracking," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1401–1409, Las Vegas, NV, USA, June 2016.

[18] F. Li, C. Tian, W. Zuo, L. Zhang, and M.-H. Yang, "Learning spatial-temporal regularized correlation filters for visual tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4904–4913, Salt Lake City, UT, USA, June 2018.

[19] M. Mueller, N. Smith, and B. Ghanem, "Context-aware correlation filter tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1396–1404, Honolulu, HI, USA, July 2017.

[20] W. Zhong, H. Lu, and M. H. Yang, "Robust object tracking via sparse collaborative appearance model," IEEE Transactions on Image Processing, vol. 23, no. 5, pp. 2356–2368, 2014.

[21] Y.-Z. Xue and T. Wang, "Object tracking based on cost-sensitive Adaboost algorithm," Chinese Journal of Graphic Arts, vol. 21, no. 5, pp. 544–555, 2016.

[22] Q. Wang, J. Gao, J. Xing, M. Zhang, and W. Hu, "DCFNet: discriminant correlation filters network for visual tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017.

[23] J. Valmadre, L. Bertinetto, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, "End-to-end representation learning for correlation filter based tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5000–5008, Honolulu, HI, USA, July 2017.

[24] B. Li, J. Yan, W. Wu, Z. Zhu, and X. Hu, "High performance tracking with Siamese region proposal network," in Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8971–8980, Salt Lake City, UT, USA, June 2018.

[25] B. Uzkent and Y. W. Seo, "EnKCF: ensemble of kernelized correlation filters for high-speed object tracking," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, pp. 77–89, Lake Tahoe, NV, USA, March 2018.


is considered that the tracking is failed and the improvedYOLO-V3 detection network is used to recapture the op-timal object

In this paper eight representative subsets from videosurveillance are selected for verification where characteristicfor partial sequences is described in Table 1 For examplevideo 1 shows the similarity background occlusion and fastmotion video 2 shows the similarity background fastmotion and rotation video 3 and 4 show the occlusionrotation and attitude change and video 5 shows the fastmotion illumination and similarity background esimulation platform is AMD Ryzen 5 3500U host with31GHz and 8GB RAM

In this paper center error (CE) and overlap rate (OR) areused to compare and analyze the experimental results [19]e former is the relative number of frames whose centerposition error is less than a certain threshold and the latter isthe percentage of frames whose overlap rate of the objectbounding box exceeds the threshold In this paper theposition error of 20 and the overlap rate of 06 are selected asthe threshold of tracking success Because of the differentthresholds there are great differences in quantitative anal-ysis erefore precision plot and success plot are used toquantitatively analyze the performance of the comparisonalgorithms

42AblationAnalysis Our proposed method in this paper isan improved tracking method based on KCF to achieve theeffect of scale adaptation In order to illustrate the effec-tiveness the comparison experiment in this paper selectstracking methods with adaptive scale capabilities for com-parison such as KCF SAMF DSST CFNet [23] SiamRPN[24] and DKCF [25] where precision refers to the errorbetween the tracking point and the labeled point It can beknown that the result of KCF only updates the position of theobject (x y) and the size of the object remains unchanged sothe adaptability to the change of the object scale is relativelypoor SAMF is also a modified algorithm on the basis ofKCF and the object feature adds color features (color nameCN) which means that HOG features and CN features arecombined In addition multiscales 1 0985 099 0995 1005101 101 1015 are added to the scale pooling and theoptimal scale is cyclically selected at the expense of trackingspeed DSST uses two mutually independent filters for scalecalculation and object positioning where 17 scale changefactors and 33 interpolated scale change factors are estab-lished for scale evaluation and object positioning SiamFC isan object tracking algorithm based on a fully convolutionSiamese network where multiscale object fusion is imple-mented through a pyramid strategy to improve trackingaccuracy our proposed algorithm is a detect-before-trackmodel that uses deep neural networks in template updatingand scale adaptation e results of object detection andtracking under the influence of different environments areshown in Table 2 and the precision plot and success plot ofdetection and tracking in 8 different video sequences areshown in Figure 6 It can be seen from Table 2 and Figure 6that compared with video 1 the tracking success rates of

videos 2 3 4 and 5 have different degrees of decline It canbe seen that occlusion scale change motion blur and il-lumination have an impact on the detection and trackingeffect of which occlusion and illumination changes have agreater impact Different degrees of motion blur have dif-ferent effects on detection and tracking When the objectoverlap rate threshold is set to 06 the average detection andtracking accuracy is 7617 and the average speed can reach18 FPS e slower speed of video 2 is caused by the ap-pearance of new objects in the field of view e object scalein video 4 is larger so the detection and tracking time islonger

Video 2 under the condition of object occlusion andvideo 5 under the condition of illumination changes areselected for comparative experiments Our proposedtracking algorithm is compared with a single tracking al-gorithm and detection algorithm Video 2 has the phe-nomenon of object occlusion e experimental results areshown in Table 2 and Figure 6 In terms of center error andoverlap rate the fusion algorithm is obviously better Deeplearning detection algorithm may not be able to detect theobject with too small scale resulting in low recall rate In thelong-term detection and tracking the correlation filteringtracking algorithm will accumulate errors resulting in pooraccuracy Especially for the occluded object the trackingdrift phenomenon is easy to occur ese reasons make thecenter error and overlap rate of the single detection ortracking algorithm not high e fusion algorithm ensures ahigh recall rate through KCF tracking and corrects thecumulative error by YOLO-V3 detection After the object isoccluded it can still recapture the object again and keeptracking which solves the object lost problem in objectdetection and tracking

ere is illumination change in video 5 e experi-mental results are shown in Table 3 and Figure 7 Due tothe influence of illumination change it is difficult todistinguish the illumination and shade between the objectedge and the background which makes the objectbounding box cannot be determined for detection andtracking Even if the object position can be detected and

Table 1 Characteristic for partial sequences

Sequences CharacteristicBenchmarkvideo 1 Similarity background occlusion fast motion

Benchmarkvideo 2 Similarity background fast motion rotation

Benchmarkvideo 3 Occlusion rotation

Benchmarkvideo 4 Fast motion attitude change

Benchmarkvideo 5 Fast motion illumination similarity background

Benchmarkvideo 6 Occlusion similarity background blurry

Benchmarkvideo 7 Occlusion scale change angle of view

Benchmarkvideo 8 Occlusion rotation illumination

Mathematical Problems in Engineering 7

tracked the judgment of the object scale is not accurateerefore the accuracy of center position error is higherbut the overlap rate is lower in the KCF tracking algo-rithm YOLO-V3 detection algorithm has strong ro-bustness but it has the phenomenon of missing detectionerefore simulation results show that our proposedfusion algorithm has better detection and tracking per-formance in the complex environment

43ComparativeExperimentandAnalysis In this paper weselect different detection and tracking algorithms toconduct comparative experiments on single-object videoswhere the SSD and YOLO-V3 algorithms that are widelyused are selected in the spatial dimension and the classicsingle-object tracking algorithms DSST KCF and SAMFare selected in the temporal dimension e experiment isdivided into two parts e first part is a comparison of asingle spatial detection algorithm or a temporal trackingalgorithm with our proposed algorithm the second part isa comparison of different detection and tracking algo-rithm combinations based on the fusion strategy Table 4

shows the comparison results of a single algorithm If thedetection algorithm is compared separately the detectionaccuracy of the YOLO-V3 algorithm is higher Overall thesuccess rate of a single algorithm is much lower than theYOLO-V3 + KCF fusion algorithm is is because thedetection algorithm is affected by the complex back-ground resulting in a large number of missed detectionsthe temporal algorithm will be affected by motion blurand the accumulated error will cause the tracking driftmaking the IOU between tracking result and ground truthless than 06

Table 5 compares the fusion effects of different algorithmsIt can be seen from Table 5 that the YOLO-V3+KCF al-gorithm has the best effect Because the KCF algorithm inTable 4 has a better effect in the tracking algorithm the overalleffect of the YOLO-V3+KCF is also better than SSD+DSSTand SSD+SAMF Because the tracking algorithm usestemporal information to eliminate the missing detection ofthe detection algorithm and the detection algorithm correctsthe drift of the tracking result by accurately detecting a singleobject the success rate of the fusion algorithm detection ismore than that of the single algorithm in Table 4

Figure 8 shows the qualitative results of differentcomparison algorithms Table 6 is a quantitative com-parison for different sequences In Figure 8(a) there arefactors such as object scale changes illumination changesand background interference In the whole trackingprocess only DKCF SiamRPN and our proposed algo-rithm have better tracking results However due to the

Table 2 e quantitative analysis for testing sequences

Indexes Video 1 Video 2 Video 3 Video 4 Video 5 Video 6 Video 7 Video 8CLE 106 296 311 238 189 210 175 87OR 088 072 062 081 068 071 058 073FPS 174 120 209 168 141 191 185 172

10

08

06

04

02

000 10 20 30 40 50

Location error threshold

Prec

ision

Video 2Video 1Video 5

Video 7Video 8Video 3

Video 6Video 4

(a)

10

08

06

04

02

000 10 20 30 40 50

Overlap rate thresholdSu

cces

s rat

e

Video 2Video 1Video 5

Video 7Video 8Video 3

Video 6Video 4

(b)

Figure 6 e tracking results for different testing sequences (a) Precision plot and (b) success plot

Table 3 Detection and tracking results for different modules

Index (average) YOLO-V3 KCF FusionCLE (pixel) 169 146 105OR 0816 0752 0861Frame rate (FPS) 87 460 161

8 Mathematical Problems in Engineering

continuous change of object scale the KCF trackingtemplate-introduced background interference informa-tion gradually accumulates and finally there is a largetracking deviation (such as the 640th frame) Our pro-posed algorithm can automatically adjust the trackingbounding box size according to the object scale changethereby reducing the background interference informa-tion so it can always estimate the location and the scale of

the object the object in video 7 has dramatic changes inillumination and scale (frames 65 110 and 351 inFigure 8(b)) In the whole tracking process only ourproposed algorithm and SiamRPN can complete thetracking of the entire video and other methods cannotadapt to drastic changes in illumination and scale theobject in video 6 has a certain scale and posture changewhere KCF SAMF DSST CFNet SiamRPN and our

10

08

06

04

02

000 10 20 30 40 50

Location error threshold

Prec

ision

YOLO-V3 + KCFYOLO-V3KCF model

(a)

10

08

06

04

02

000 10 20 30 40 50

Overlap rate threshold

Succ

ess r

ate

YOLO-V3 + KCFYOLOKCF model

(b)

Figure 7 Ablation analysis for different modules in video 5 (a) Precision plot and (b) success plot

Table 4 Comparison results for the single tracking algorithm

Modules Object detection Object tracking FusionModel SSD YOLO-V3 DSST SAMF KCF YOLO-V3 +KCFCLE (pixel) 211 169 1727 1328 146 101OR 0524 0816 0711 0625 0752 0841Frame rate (FPS) 65 87 342 261 460 161

Table 5 Tracking performance of different module combinations

Index (average) YOLO-V3+KCF YOLO-V3 +DSST YOLO-V3+SAMF SSD+KCF SSD+KCF SSD+KCFCLE (pixel) 105 136 165 152 183 175OR 0861 0782 0771 0837 0825 0776Frame rate (FPS) 161 132 91 122 109 87

Mathematical Problems in Engineering 9

proposed algorithm have better tracking performance butour OR and CLE are the highest

5 Conclusion

In a complex surveillance video object detection andtracking usually suffers from various environmental in-terference especially scale changes occlusion illuminationchanges and motion blur is paper proposes an objectdetection and tracking model based on spatiotemporalinformation fusion which uses deep learning to detect andextract spatial information improve detection accuracyand avoid object position drift and then an improved KCFtracking is used to track temporal information so as toavoid missed detection finally the spatiotemporal infor-mation fusion strategy is designed to make detection in-formation and tracking information complementation

e results show that our proposed algorithm can effi-ciently continuously detect and track objects in differentcomplex scenes To a certain extent it can cope with theinfluence of the abovementioned environmental interfer-ence factors has both robustness and stable performanceHowever the detection and tracking effect with too smallscale is slightly worse so the next step will be to makeimprovements on it

Data Availability

e labeled dataset used to support the findings of this studyis available from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

ProposedSAMFSiamRPNKCF

CFNetDKCFDSST

(a)

(b)

(c)

Figure 8 Qualitative comparison of different tracking algorithms in different scenarios (a) Video 8 (b) video 7 and (c) video 6

Table 6 Quantitative comparison for different sequences

SequencesSuccess rate Center location error

KCF SAMF DSST DKCF CFNet SiamRPN Our KCF SAMF DSST DKCF CFNet SiamRPN OurVideo 8 078 056 062 070 075 079 086 253 423 244 249 170 144 91Video 7 060 048 065 061 078 081 081 217 286 92 374 212 108 75Video 6 079 067 058 082 067 079 079 127 71 63 126 168 156 93Video 5 061 052 054 077 047 040 042 146 250 114 171 152 29 81Video 4 073 068 071 075 075 088 082 273 223 144 249 170 149 101Video 3 068 070 072 076 079 083 087 317 283 92 374 212 108 84Video 2 059 057 062 065 075 081 085 125 91 63 126 168 156 89Video 1 068 062 066 072 061 079 079 746 230 214 271 152 199 182

10 Mathematical Problems in Engineering

References

[1] C Wu H Sun H Wang et al ldquoOnline multi-object trackingvia combining discriminative correlation filters with makingdecisionrdquo IEEE Access vol 6 pp 43499ndash43512 2018

[2] Y Qi C Wu D Chen and Y Lu Superpixel tracking basedon sparse representationrdquo Journal of Electronics and Infor-mation Technology vol 37 no 3 pp 529ndash535 2015

[3] G Yuan and M Xue ldquoVisual tracking based on sparse densestructure representation and online robust dictionary learn-ingrdquo Journal of Electronics amp Information Technology vol 37no 3 pp 536ndash542 2015

[4] H Luo B-K Zhong and F-S Kong ldquoTracking usingweighted block compressed sensing and location predictionrdquoJournal of Electronics amp Information Technology vol 37 no 5pp 1160ndash1166 2015

[5] Z-Q Hou A-Q Huang W-S Yu and X Liu ldquoVisual objecttracking method based on local patch model and modelupdaterdquo Journal of Electronics amp Information Technologyvol 37 no 6 pp 1357ndash1364 2015

[6] M Xue H Zhu and G-L Yuan ldquoRobust visual trackingbased on online discrimination dictionary learningrdquo Journalof Electronics amp Information Technology vol 37 no 7pp 1654ndash1659 2015

[7] L Matthews T Ishikawa S Baker et al ldquoe template updateproblemrdquo IEEE Transactions on Pattern Analysis andMachineIntelligence vol 26 no 6 pp 810ndash815 2004

[8] D A Ross J Lim R-S Lin and M-H Yang ldquoIncrementallearning for robust visual trackingrdquo International Journal ofComputer Vision vol 77 no 1-3 pp 125ndash141 2008

[9] S Hare S Golodetz A Saffari et al ldquoStruck structuredoutput tracking with kernelsrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 38 no 10 pp 2096ndash2109 2016

[10] Z Kalal K Mikolajczyk and J Matas ldquoTracking-learning-detectionrdquo IEEE Transactions on Pattern Analysis and Ma-chine Intelligence vol 6 no 1 pp 1409ndash1422 2010

[11] D S Bolme J R Beveridge B A Draper and Y M LuildquoVisual object tracking using adaptive correlation filtersrdquo inProceedings of the 2010 IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR)pp 2544ndash2550 San Francisco CA USA June 2010

[12] J F Henriques R Caseiro P Martins and J BatistaldquoExploiting the circulant structure of tracking-by-detectionwith kernelsrdquo in Proceedings of the 12th European conferenceon Computer Vision - Volume Part IV - ECCV 2012 vol 75no 1 pp 702ndash715 Florence Italy October 2012

[13] J F Henriques R Caseiro P Martins and J Batista ldquoHigh-speed tracking with kernelized correlation filtersrdquo IEEETransactions on Pattern Analysis amp Machine Intelligencevol 37 no 3 pp 583ndash596 2015

[14] M Danelljan G Hager F Khan and M Felsberg ldquoAccuratescale estimation for robust visual trackingrdquo in Proceedings ofthe British Machine Vision Conference pp 590ndash604 BMVAPress Nottingham England September 2014

[15] Y Li and J Zhu ldquoA scale adaptive kernel correlation filtertracker with feature integrationrdquo in Proceedings of the 2014European Conference on Computer Vision (ECCV) pp 254ndash265 Springer Zurich Switzerland September 2014

[16] M Danelljan G Hager F S Khan and M FelsbergldquoLearning spatially regularized correlation filters for visualtrackingrdquo in Proceedings of the 2015 IEEE InternationalConference on Computer Vision (ICCV) pp 4310ndash4318 IEEESantiago Chile December 2015

[17] L Bertinetto J Valmadre S Golodetz O Miksik andP H S Torr ldquoStaple complementary learners for real-timetrackingrdquo in Proceedings of the 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) pp 1401ndash1409 Las Vegas NV USA June 2016

[18] F Li C Tian W Zuo L Zhang and M-H Yang ldquoLearningspatial-temporal regularized correlation filters for visualtrackingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 4904ndash4913 Salt Lake CityUT USA June 2018

[19] M Mueller N Smith and B Ghanem ldquoContext-awarecorrelation filter trackingrdquo in Proceedings of the 2017 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 1396ndash1404 Honolulu HI USA July 2017

[20] W Zhong H Lu andM H Yang ldquoRobust object tracking viasparse collaborative appearance modelrdquo IEEE Transactions onImage Processing A Publication of the IEEE Signal ProcessingSociety vol 23 no 5 pp 2356ndash2368 2014

[21] Y-Z Xue and T Wang ldquoObject tracking based on cost-sensitive Adaboost algorithmrdquo Chinese Journal of GraphicArts vol 21 no 5 pp 544ndash555 2016

[22] Q Wang J Gao J Xing M Zhang and W Hu ldquoDCFnetdiscriminant correlation filters network for visual trackingrdquoin Proceedings of the 2017 IEEE Conference on ComputerVision and Pattern Recognition (CVPR) Honolulu HI USAJuly 2017

[23] J Valmadre L Bertinetto J F Henriques A Vedaldi andP H S Torr ldquoEnd-to-end representation learning for cor-relation filter based trackingrdquo in Proceedings of the 2017 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 5000ndash5008 Honolulu HI USA July 2017

[24] B Li J Yan W Wu Z Zhu and X Hu ldquoHigh performancetracking with Siamese region proposal networkrdquo in Pro-ceedings of the 2018 IEEECVF Conference on Computer Visionand Pattern Recognition pp 8971ndash8980 Salt Lake City UTUSA June 2018

[25] B Uzkent and Y W Seo ldquoEnKCF ensemble of kernelizedcorrelation filters for high-speed object trackingrdquo in Pro-ceedings of the 2018 IEEEWinter Conference on Applications ofComputer Vision pp 77ndash89 Lake Tahoe NV USA March2018

Mathematical Problems in Engineering 11

tracked the judgment of the object scale is not accurateerefore the accuracy of center position error is higherbut the overlap rate is lower in the KCF tracking algo-rithm YOLO-V3 detection algorithm has strong ro-bustness but it has the phenomenon of missing detectionerefore simulation results show that our proposedfusion algorithm has better detection and tracking per-formance in the complex environment

43ComparativeExperimentandAnalysis In this paper weselect different detection and tracking algorithms toconduct comparative experiments on single-object videoswhere the SSD and YOLO-V3 algorithms that are widelyused are selected in the spatial dimension and the classicsingle-object tracking algorithms DSST KCF and SAMFare selected in the temporal dimension e experiment isdivided into two parts e first part is a comparison of asingle spatial detection algorithm or a temporal trackingalgorithm with our proposed algorithm the second part isa comparison of different detection and tracking algo-rithm combinations based on the fusion strategy Table 4

shows the comparison results of a single algorithm If thedetection algorithm is compared separately the detectionaccuracy of the YOLO-V3 algorithm is higher Overall thesuccess rate of a single algorithm is much lower than theYOLO-V3 + KCF fusion algorithm is is because thedetection algorithm is affected by the complex back-ground resulting in a large number of missed detectionsthe temporal algorithm will be affected by motion blurand the accumulated error will cause the tracking driftmaking the IOU between tracking result and ground truthless than 06

Table 5 compares the fusion effects of the different algorithm combinations. It can be seen from Table 5 that the YOLO-V3 + KCF algorithm has the best effect. Because the KCF algorithm performs best among the tracking algorithms in Table 4, the overall effect of YOLO-V3 + KCF is also better than that of SSD + DSST and SSD + SAMF. Because the tracking algorithm uses temporal information to eliminate the missed detections of the detection algorithm, and the detection algorithm corrects the drift of the tracking result by accurately detecting the single object, the success rate of the fusion algorithm is higher than that of any single algorithm in Table 4.
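The complementary behaviour described above amounts to a simple per-frame decision rule. Below is a hedged sketch of such a fusion loop, reusing the `iou` helper from the previous sketch; the `tracker` and `detector` interfaces and the `iou_gate` threshold are illustrative placeholders, not the authors' actual implementation:

```python
def fuse_step(frame, tracker, detector, iou_gate=0.5):
    """One frame of a detection/tracking fusion loop (illustrative only).

    tracker  -- KCF-style tracker exposing predict(frame) and init(frame, box)
    detector -- YOLO-style detector returning a list of (x, y, w, h) boxes
    """
    track_box = tracker.predict(frame)   # temporal branch: always produces a box
    detections = detector(frame)         # spatial branch: may miss the object

    if not detections:
        # Missed detection: keep the temporal prediction to avoid losing the object.
        return track_box

    # Take the detection that agrees most with the temporal prediction.
    best = max(detections, key=lambda box: iou(box, track_box))
    if iou(best, track_box) >= iou_gate:
        # Agreement: let the accurate detection correct the tracker's
        # accumulated position/scale drift by re-initialising its template.
        tracker.init(frame, best)
        return best
    return track_box
```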

Figure 8 shows the qualitative results of the different comparison algorithms, and Table 6 gives a quantitative comparison for the same sequences. In Figure 8(a), there are factors such as object scale changes, illumination changes, and background interference. Over the whole tracking process, only DKCF, SiamRPN, and our proposed algorithm produce good tracking results. However, due to the continuous change of object scale, the background interference information introduced into the KCF tracking template gradually accumulates, and a large tracking deviation finally appears (e.g., at the 640th frame). Our proposed algorithm can automatically adjust the tracking bounding box size according to the object scale change, thereby reducing the background interference information, so it can always estimate both the location and the scale of the object. The object in video 7 undergoes dramatic changes in illumination and scale (frames 65, 110, and 351 in Figure 8(b)); over the whole video, only our proposed algorithm and SiamRPN complete the tracking, while the other methods cannot adapt to such drastic changes. The object in video 6 shows certain scale and posture changes; here KCF, SAMF, DSST, CFNet, SiamRPN, and our proposed algorithm all have good tracking performance, with our algorithm among the best in both OR and CLE.
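As an illustration of the scale handling described above, the tracker's template size can be blended toward the detector's box so that a single noisy detection does not destabilise the template. This is a sketch of ours under that assumption; the smoothing factor is illustrative, not a value from the paper:

```python
def update_template_size(current_size, detected_size, alpha=0.6):
    """Blend the tracked template size toward the detected object size.

    current_size, detected_size -- (width, height) tuples
    alpha -- weight given to the fresh detection (illustrative value)
    """
    w = alpha * detected_size[0] + (1.0 - alpha) * current_size[0]
    h = alpha * detected_size[1] + (1.0 - alpha) * current_size[1]
    return (w, h)

# Example: the tracked box grows as the object approaches the camera.
print(update_template_size((50.0, 80.0), (64.0, 100.0)))  # (58.4, 92.0)
```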

Table 2: The quantitative analysis for the testing sequences.

Indexes        Video 1   Video 2   Video 3   Video 4   Video 5   Video 6   Video 7   Video 8
CLE (pixel)      10.6      29.6      31.1      23.8      18.9      21.0      17.5       8.7
OR                0.88      0.72      0.62      0.81      0.68      0.71      0.58      0.73
FPS              17.4      12.0      20.9      16.8      14.1      19.1      18.5      17.2
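Under the usual definitions assumed here (CLE: mean Euclidean distance between the predicted and ground-truth box centres, in pixels; OR: mean IOU; FPS: frames processed per second), the three indexes reported in Tables 2–5 can be reproduced along the following lines. This is an illustrative sketch, not the authors' evaluation script:

```python
import math
import time

def evaluate_sequence(frames, ground_truth, run_tracker):
    """Return (CLE, OR, FPS) for one sequence (illustrative sketch).

    frames       -- list of images
    ground_truth -- one (x, y, w, h) box per frame
    run_tracker  -- callable mapping a frame to a predicted (x, y, w, h) box
    """
    cle_sum = or_sum = 0.0
    start = time.perf_counter()
    for frame, gt in zip(frames, ground_truth):
        pred = run_tracker(frame)
        # Centre location error: Euclidean distance between box centres (pixels).
        dx = (pred[0] + pred[2] / 2.0) - (gt[0] + gt[2] / 2.0)
        dy = (pred[1] + pred[3] / 2.0) - (gt[1] + gt[3] / 2.0)
        cle_sum += math.hypot(dx, dy)
        or_sum += iou(pred, gt)  # iou helper from the earlier sketch
    elapsed = time.perf_counter() - start
    n = len(ground_truth)
    return cle_sum / n, or_sum / n, n / elapsed
```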

[Figure 6: The tracking results for the different testing sequences. (a) Precision plot (precision vs. location error threshold) and (b) success plot (success rate vs. overlap rate threshold); curves shown for videos 1–8.]
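The two plots are threshold sweeps over the per-frame errors: precision is the fraction of frames whose centre location error falls within a given pixel threshold, and the success rate is the fraction of frames whose overlap rate exceeds a given IOU threshold. A brief sketch of how such curves are typically generated, assuming the per-frame errors and overlaps have already been collected as in the evaluation sketch above:

```python
import numpy as np

def precision_curve(center_errors, thresholds=np.arange(0, 51)):
    """Fraction of frames whose centre location error is within each pixel threshold."""
    errors = np.asarray(center_errors, dtype=float)
    return [(errors <= t).mean() for t in thresholds]

def success_curve(overlaps, thresholds=np.linspace(0.0, 1.0, 51)):
    """Fraction of frames whose overlap rate (IOU) exceeds each threshold."""
    ious = np.asarray(overlaps, dtype=float)
    return [(ious > t).mean() for t in thresholds]
```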

Table 3: Detection and tracking results for the different modules.

Index (average)     YOLO-V3    KCF     Fusion
CLE (pixel)           16.9     14.6     10.5
OR                     0.816    0.752    0.861
Frame rate (FPS)       8.7     46.0     16.1



[Figure 7: Ablation analysis for the different modules on video 5. (a) Precision plot (precision vs. location error threshold) and (b) success plot (success rate vs. overlap rate threshold); curves shown for YOLO-V3 + KCF, YOLO-V3, and the KCF model.]

Table 4: Comparison results for the single algorithms.

                    Object detection          Object tracking               Fusion
Model               SSD      YOLO-V3       DSST      SAMF      KCF      YOLO-V3 + KCF
CLE (pixel)         21.1      16.9        172.7     132.8      14.6         10.1
OR                   0.524     0.816        0.711     0.625     0.752        0.841
Frame rate (FPS)     6.5       8.7         34.2      26.1      46.0         16.1

Table 5: Tracking performance of the different module combinations.

Index (average)     YOLO-V3 + KCF   YOLO-V3 + DSST   YOLO-V3 + SAMF   SSD + KCF   SSD + DSST   SSD + SAMF
CLE (pixel)              10.5            13.6             16.5           15.2        18.3         17.5
OR                        0.861           0.782            0.771          0.837       0.825        0.776
Frame rate (FPS)         16.1            13.2              9.1           12.2        10.9          8.7


[Figure 8: Qualitative comparison of the different tracking algorithms in different scenarios: (a) video 8, (b) video 7, and (c) video 6. Compared trackers: proposed, SAMF, SiamRPN, KCF, CFNet, DKCF, and DSST.]

Table 6: Quantitative comparison for the different sequences.

            Success rate                                          Center location error (pixel)
Sequences   KCF    SAMF   DSST   DKCF   CFNet  SiamRPN  Our      KCF    SAMF   DSST   DKCF   CFNet  SiamRPN  Our
Video 8     0.78   0.56   0.62   0.70   0.75   0.79     0.86     25.3   42.3   24.4   24.9   17.0   14.4     9.1
Video 7     0.60   0.48   0.65   0.61   0.78   0.81     0.81     21.7   28.6    9.2   37.4   21.2   10.8     7.5
Video 6     0.79   0.67   0.58   0.82   0.67   0.79     0.79     12.7    7.1    6.3   12.6   16.8   15.6     9.3
Video 5     0.61   0.52   0.54   0.77   0.47   0.40     0.42     14.6   25.0   11.4   17.1   15.2    2.9     8.1
Video 4     0.73   0.68   0.71   0.75   0.75   0.88     0.82     27.3   22.3   14.4   24.9   17.0   14.9    10.1
Video 3     0.68   0.70   0.72   0.76   0.79   0.83     0.87     31.7   28.3    9.2   37.4   21.2   10.8     8.4
Video 2     0.59   0.57   0.62   0.65   0.75   0.81     0.85     12.5    9.1    6.3   12.6   16.8   15.6     8.9
Video 1     0.68   0.62   0.66   0.72   0.61   0.79     0.79     74.6   23.0   21.4   27.1   15.2   19.9    18.2

5. Conclusion

In a complex surveillance video, object detection and tracking usually suffer from various environmental interferences, especially scale changes, occlusion, illumination changes, and motion blur. This paper proposes an object detection and tracking model based on spatiotemporal information fusion. Deep learning detection is used to extract spatial information, improve detection accuracy, and avoid object position drift; an improved KCF tracker is then used to exploit temporal information and avoid missed detections; finally, a spatiotemporal information fusion strategy is designed so that the detection information and the tracking information complement each other.

The results show that our proposed algorithm can efficiently and continuously detect and track objects in different complex scenes. To a certain extent, it copes with the influence of the abovementioned environmental interference factors and delivers both robust and stable performance. However, the detection and tracking of objects at very small scales is slightly worse, so our next step will be to improve on this.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.



References

[1] C. Wu, H. Sun, H. Wang et al., "Online multi-object tracking via combining discriminative correlation filters with making decision," IEEE Access, vol. 6, pp. 43499–43512, 2018.

[2] Y. Qi, C. Wu, D. Chen, and Y. Lu, "Superpixel tracking based on sparse representation," Journal of Electronics & Information Technology, vol. 37, no. 3, pp. 529–535, 2015.

[3] G. Yuan and M. Xue, "Visual tracking based on sparse dense structure representation and online robust dictionary learning," Journal of Electronics & Information Technology, vol. 37, no. 3, pp. 536–542, 2015.

[4] H. Luo, B.-K. Zhong, and F.-S. Kong, "Tracking using weighted block compressed sensing and location prediction," Journal of Electronics & Information Technology, vol. 37, no. 5, pp. 1160–1166, 2015.

[5] Z.-Q. Hou, A.-Q. Huang, W.-S. Yu, and X. Liu, "Visual object tracking method based on local patch model and model update," Journal of Electronics & Information Technology, vol. 37, no. 6, pp. 1357–1364, 2015.

[6] M. Xue, H. Zhu, and G.-L. Yuan, "Robust visual tracking based on online discrimination dictionary learning," Journal of Electronics & Information Technology, vol. 37, no. 7, pp. 1654–1659, 2015.

[7] L. Matthews, T. Ishikawa, S. Baker et al., "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 810–815, 2004.

[8] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1–3, pp. 125–141, 2008.

[9] S. Hare, S. Golodetz, A. Saffari et al., "Struck: structured output tracking with kernels," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 2096–2109, 2016.

[10] Z. Kalal, K. Mikolajczyk, and J. Matas, "Tracking-learning-detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1409–1422, 2012.

[11] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, "Visual object tracking using adaptive correlation filters," in Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2544–2550, San Francisco, CA, USA, June 2010.

[12] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of the 12th European Conference on Computer Vision (ECCV), Part IV, pp. 702–715, Florence, Italy, October 2012.

[13] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2015.

[14] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of the British Machine Vision Conference (BMVC), pp. 590–604, BMVA Press, Nottingham, England, September 2014.

[15] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in Proceedings of the 2014 European Conference on Computer Vision (ECCV), pp. 254–265, Springer, Zurich, Switzerland, September 2014.

[16] M. Danelljan, G. Hager, F. S. Khan, and M. Felsberg, "Learning spatially regularized correlation filters for visual tracking," in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4310–4318, IEEE, Santiago, Chile, December 2015.

[17] L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, and P. H. S. Torr, "Staple: complementary learners for real-time tracking," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1401–1409, Las Vegas, NV, USA, June 2016.

[18] F. Li, C. Tian, W. Zuo, L. Zhang, and M.-H. Yang, "Learning spatial-temporal regularized correlation filters for visual tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4904–4913, Salt Lake City, UT, USA, June 2018.

[19] M. Mueller, N. Smith, and B. Ghanem, "Context-aware correlation filter tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1396–1404, Honolulu, HI, USA, July 2017.

[20] W. Zhong, H. Lu, and M.-H. Yang, "Robust object tracking via sparse collaborative appearance model," IEEE Transactions on Image Processing, vol. 23, no. 5, pp. 2356–2368, 2014.

[21] Y.-Z. Xue and T. Wang, "Object tracking based on cost-sensitive Adaboost algorithm," Chinese Journal of Graphic Arts, vol. 21, no. 5, pp. 544–555, 2016.

[22] Q. Wang, J. Gao, J. Xing, M. Zhang, and W. Hu, "DCFNet: discriminant correlation filters network for visual tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017.

[23] J. Valmadre, L. Bertinetto, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, "End-to-end representation learning for correlation filter based tracking," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5000–5008, Honolulu, HI, USA, July 2017.

[24] B. Li, J. Yan, W. Wu, Z. Zhu, and X. Hu, "High performance visual tracking with Siamese region proposal network," in Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8971–8980, Salt Lake City, UT, USA, June 2018.

[25] B. Uzkent and Y. W. Seo, "EnKCF: ensemble of kernelized correlation filters for high-speed object tracking," in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 77–89, Lake Tahoe, NV, USA, March 2018.

