
VIDEO-BASED ROAD TRAFFIC MONITORING

PROGRESS REPORT

2013

By Li LI

School of Computer Science

Page 2: VIDEO-BASED ROAD TRAFFIC MONITORING PROGRESS REPORTstudentnet.cs.manchester.ac.uk/resources/library/... · Figure 1.1: Function diagram of VRTM System 1.3 Overview of VRTM System

Contents

Abstract

1 Introduction
   1.1 Introduction
   1.2 Aim and Objectives
   1.3 Overview of VRTM System
   1.4 Challenges
   1.5 Report Outline

2 Background Research
   2.1 Vehicle Detection
      2.1.1 Frame Differential Method
      2.1.2 Optical Flow Field Method
      2.1.3 Background Subtraction
      2.1.4 Scan-Line Based Method
   2.2 Shadow Suppression Methods
      2.2.1 HSV Colour Space Method
      2.2.2 Edges Analysis Method
   2.3 Vehicle Tracking
      2.3.1 Template Matching
      2.3.2 Kalman Filter
   2.4 Vehicle Classification
      2.4.1 Feature Selection Based on Prior Knowledge
      2.4.2 Scale-Invariant Feature Transform (SIFT)
      2.4.3 Bag-of-Feature Model
   2.5 Conclusion

3 Methodology
   3.1 Nonparametric Background Generation
   3.2 Vehicle Segmentation
      3.2.1 Morphological Processing
      3.2.2 Blob Searching
   3.3 Vehicle Tracking
      3.3.1 Configuration of Kalman Filter
      3.3.2 Region-based Vehicle Tracking Algorithm
   3.4 Vehicle Classification
      3.4.1 Feature Selection Based on Prior Knowledge
      3.4.2 Bag-of-Feature Model

4 Progress
   4.1 Aim
   4.2 Objectives
   4.3 Evaluation Plan
      4.3.1 Background Generation
      4.3.2 Vehicle Segmentation
      4.3.3 Vehicle Tracking
      4.3.4 Shadow Removal
      4.3.5 Vehicle Classification
   4.4 Time Plan
   4.5 Current Progress and Future Work

Bibliography

Word Count: 10,300


Abstract

In this report, we propose feasible methods for vehicle detection, tracking and classification for building a video-based road traffic monitoring (VRTM) system. The system monitors traffic flow by analysing videos gathered from a stationary camera and outputs vehicle trajectories, counts and types. The whole system comprises three main modules: object detection, blob tracking and vehicle classification. Numerous methods proposed in the literature have been reviewed and selected for the purpose of building a robust and effective system. Specifically, the object detection method, based on a most reliable background mode (MRBM), is able to extract an idealised background image under changeable circumstances and systematic noise. As for blob tracking, the proposed method is based on the theory of the Kalman Filter, which estimates the current state of a blob given the current input measurements and the previous state. According to the regular motion pattern of vehicles between two frames, we introduce a new motion model that optimises the matching results for the same vehicle in different frames. Regarding classification, the features of each type of vehicle are collected by the Scale-Invariant Feature Transform (SIFT) descriptor. The contribution in this field is twofold: first, the features are detectable even under changes in image scale, noise and illumination; second, the method relies on a supervised learning stage using the Bag-of-Words model to classify the vehicles. Finally, a detailed evaluation plan was produced to ensure that the proposed methodologies are robust enough to handle different vehicle types and challenging situations, like weather variations, systematic noise and pedestrian interference.


Chapter 1

Introduction

In this chapter, we briefly introduce the motivation for using video-based traffic surveillance applications and their advantages. In addition, a brief overview of this project, including its aims, objectives, system structure and challenges, is given. Finally, we give the outline of this report.

1.1 Introduction

Nowadays, with the growth of motorisation and urbanisation, traffic congestion and traffic violation have become two urgent problems in transport and traffic management, as they greatly increase the burden on governments and local authorities. Generally speaking, a straightforward solution to these problems is to emphasise infrastructure construction, which may enhance the traffic capacity of the road network. However, given the restrictions of limited space, money and other resources, people are now focusing on improving operational efficiency by employing various high-technology systems, like Intelligent Transportation Systems (ITS) [WP00].

ITS is a worldwide research hotspot in the domain of transportation analysis. It combines advanced computer processing technology, information technology and electronic control technology. These advanced technologies aim to provide innovative services to different modes of transportation and traffic management, like building a real-time, accurate and highly efficient integrated traffic management system [PoJ10]. Typically, traffic data can be collected via radar, magnetic sensors, LIDAR or video cameras [JA12]. Compared to approaches based on other data sources, the video-based method has its own special advantages, as follows:

• Detect multiple vehicles on different lanes at the same time.


• Compared with other sensors, a stationary camera has a more extensive field of vision.

• Low installation and maintenance cost, and installation and maintenance do not interrupt traffic [KW11].

• Obtain traffic information, including traffic volume, vehicle speed and vehicle types, synchronously.

• Easy to store, retrieve and maintain traffic data through the Internet and intranet.

1.2 Aim and Objectives

The aim of this project is to develop a Video-based Road Traffic Monitoring (VRTM) system which is able to detect vehicles, gather traffic information and classify vehicle types by analysing traffic video.

The objectives of our project are as follows:

• Identify a robust background generation method that is able to produce high-quality background images by overcoming the interference of weather variations, illumination changes and stationary objects.

• Improve the existing background subtraction algorithm in order to obtain more complete vehicle blobs and overcome the blob overlapping problem.

• The system should be able to robustly suppress vehicle shadows.

• Deploy the Kalman Filter [Del98] and the region-based matching [Qia09] algorithm to track vehicle blobs and obtain vehicle counts and trajectories.

• Compare the overall performance of the feature selection method based on prior knowledge with that of the Bag-of-Feature model [Sch11] in terms of vehicle classification.

• The vehicle tracking and classification modules should satisfy the real-time requirement by using algorithms of low computational complexity, such as the Kalman Filter and a simple linear-kernel SVM classifier [CC95].

• Users should be able to obtain the overall traffic information and a given vehicle's trajectory through interacting with the interface.


Figure 1.1: Function diagram of VRTM System

1.3 Overview of VRTM System

Since the quality and comprehensiveness of traffic data is crucial to ITS, much important traffic information, such as vehicle trajectories, number of vehicles passing the road, average velocity, vehicle types, lane occupancy ratio and occurrence of accidents [HZ04], should be collected effectively by the traffic surveillance system. To achieve this, the VRTM system is composed of three modules: an object detection module, a blob tracking module and a vehicle classification module. The object detection module is responsible for segmenting foreground objects of interest from the background. The tracking module matches the same image blob appearing in a sequence of frames, and the classification module identifies the type of a segmented vehicle according to its features. The detailed functionality of VRTM is shown in figure 1.1.

We name a connected component which satisfies specific conditions a blob, while objects that are detected by mistake are referred to as ghosts. The information of a blob, including width, length and the position of the centroid, is considered the preliminary data of a vehicle.

These three modules can be further divided into six parts: background generation, vehicle segmentation, shadow removal, vehicle tracking, feature selection and feature matching. Figure 1.2 shows the overall flow chart of the VRTM system. Before deploying the VRTM system to collect traffic data, users must configure it by specifying the number of samples used for training the background, the minimal weight coefficient, the minimal difference, and the extrema of the pixel-based length of a detectable blob. The background image is then generated from a sequence of frames and updated at regular intervals. After that, the system starts to segment vehicle blobs from each frame and tries to match them with the corresponding blobs in the previous frame. During this process, the Kalman Filter set up for the current blob is updated so as to estimate its next state. When a moving blob is about to leave the detection area, the vehicle image is extracted and processed by the shadow removal and classification algorithms; the input data for the classification module are vehicle images with interference removed as much as possible. In this project, the VRTM system should divide vehicles into three groups: Motorcycle, Car & Lorry, and Bus & Truck. There are two main steps here: feature selection and feature matching. Finally, the traffic information is updated and displayed on the user interface.

Figure 1.2: Flow Chart of VRTM System

1.4 Challenges

According to the objectives listed above, the major challenges in implementing the VRTM system are listed below:

1. The background model generated by the system can be largely affected by changes in the external environment, such as illumination, stationary objects and system noise.

2. Vehicle occlusions are prevalent from most observation angles and are one of the most difficult factors to overcome. Occlusions usually occur when one vehicle appears next to another and covers it partially or completely [MW08]. This results in miscounted vehicles, as the system takes two occluded vehicles as one. Therefore, occlusion problems should be properly solved in order to improve the accuracy of vehicle detection.


3. A vehicle may only partially appear in the binary image, as some of its pixels may have grey values similar to those of the background image. This may cause failures in vehicle segmentation and classification.

4. When a vehicle changes its velocity substantially, the tracking algorithm may lock onto a wrong target.

5. In most traffic monitoring systems, the shadows of vehicles can greatly affect the performance of vehicle identification. An obvious shadow results in inaccurate information about the detected vehicle, such as height and width, or even in false targets.

6. Since vehicles of the same type may have various shapes and colours, it is difficult for the system to classify them.

7. The classifier should be robust enough to handle vehicle images taken from different angles.

As regards solutions, a wide range of background research was conducted in order to find appropriate methods that can overcome these issues. A detailed background research report and a statement of the methodologies are given in the following sections.

1.5 Report Outline

The remainder of this report is organised as follows: background concepts and theories related to each module of our system are reviewed in Chapter 2, followed by a detailed description of the proposed methodologies applied in our system in Chapter 3. In Chapter 4, we describe the overall progress of our project, including the project's aim, objectives, time plan and a detailed evaluation plan for each module. Finally, we conclude the research done so far and give the plan for the project over the next several months.


Chapter 2

Background Research

In this chapter, some commonly used theories and algorithms related to each module are discussed from a wider perspective. Specifically, we illustrate the characteristics of these background theories and summarise the advantages and shortcomings of each of them. This allows us to select techniques that are suitable for the system and to improve them further.

2.1 Vehicle Detection

The main purpose of vehicle detection is to extract the vehicle image from a traffic video and to remove interference as much as possible. Owing to the interference factors in real situations, such as lighting changes, interference from moving background objects, vehicle shadows, camera vibration and occlusion by other vehicles, the accuracy and robustness of a vehicle detection algorithm can be greatly impacted. Therefore, we have to take these situations into account when building the VRTM system.

Among all the procedures in traffic data collection, vehicle detection is the key problem, since an accurate segmentation of moving objects will greatly enhance the performance of vehicle tracking and classification. Currently, many methods have been adopted for moving object detection, like optical flow [BR78], HSV colour background modelling [ZL03] and background subtraction based on time difference [Liu06]. Even though many effective methods have been provided in the literature [YS08, MW09], the fundamental problems of detection precision are still not completely solved. In the following sections, we describe four commonly used vehicle detection methods: the Frame Differential Method [Sek00], the Optical Flow Field Method [Gib50], Background Subtraction [Liu06] and the Scan-Line Based Method [T84].

2.1.1 Frame Differential Method

The theory of the Frame Differential Method is based on the close relationship between successive motion images [Sek00]. The algorithm is as follows: first, we subtract the grey values of frame k−1 from the grey values of frame k in the video sequence; then we select a threshold and transform the difference image into a binary image. If the difference value of a pixel is higher than the given threshold, the pixel is treated as foreground, otherwise as background. The blobs selected from the foreground pixels are considered candidates for actual moving vehicles. The two main equations are as follows:

D_k(x,y) = |F_k(x,y) - F_{k-1}(x,y)| \qquad (2.1)

V_k(x,y) = \begin{cases} 1 & \text{if } D_k(x,y) > t \\ 0 & \text{otherwise} \end{cases} \qquad (2.2)

where D_k(x,y) is the absolute difference between the two adjoining frames [KW11], F_k(x,y) refers to the grey value of pixel (x,y) in frame k, and the foreground mask V_k(x,y) is decided via the threshold t.

As for its advantages, the Frame Differential method is insensitive to illumination, and it is suited to dynamic environments and real-time systems due to its low computational complexity. Nevertheless, this algorithm also has some disadvantages. Owing to its dynamic nature, it is difficult for the algorithm to detect stationary objects or objects moving at low speed. As for high-speed objects, the segmentation result will be split apart, and the interior of the detected blob is likely to contain a lot of noise and hollows. An improved algorithm has been presented by Collins [CR99], which uses multiple-frame differencing instead of the two-frame version; Collins also adopted an adaptive, statistically optimal binarisation method instead of using a fixed threshold [CR99].
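As a minimal sketch of how equations 2.1 and 2.2 can be realised in practice, the following Python/OpenCV fragment differences consecutive greyscale frames and binarises the result. The video filename and the threshold t = 25 are illustrative assumptions, not values from this report.

```python
import cv2

def frame_difference_mask(prev_gray, curr_gray, t=25):
    """Binarised frame difference following equations 2.1 and 2.2."""
    diff = cv2.absdiff(curr_gray, prev_gray)           # D_k = |F_k - F_{k-1}|
    _, mask = cv2.threshold(diff, t, 255, cv2.THRESH_BINARY)
    return mask                                        # V_k: 255 = foreground

cap = cv2.VideoCapture("traffic.avi")                  # hypothetical input file
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = frame_difference_mask(prev, gray)
    prev = gray
```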


2.1.2 Optical Flow Field Method

The concept of the Optical Flow Field method was introduced by Gibson in 1950 [Gib50]. Kun Wu proposed that optical flow represents the velocity of a pattern within an image [KW11]. The optical flow field is a two-dimensional instantaneous velocity field, the projection of the visible three-dimensional velocity vectors onto the imaging plane. The Optical Flow method treats the image within the detection area as a vector field of velocity; each vector represents the transient variation of the position of a pixel in the scene. Therefore, the optical flow field carries abundant information about relative motions and the three-dimensional structure of the scene, and it can be used to decide whether there are moving vehicles in the detection area.

By deploying the Optical Flow method, we can detect moving targets even from a moving camera. However, this method also needs other features, like colour, grey level and edges, to enhance the accuracy of segmentation, which makes it very sensitive to system noise. Therefore, it cannot be utilised in a real-time processing system without special hardware support. Moreover, the inner points of a large homogeneous object (e.g., a vehicle of a single colour) cannot be characterised by optical flow [HZ04].
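For illustration, a dense optical flow field between two frames can be computed with OpenCV's Farnebäck implementation; the motion-magnitude threshold below is an assumed value used only to flag moving pixels, not a parameter from the report.

```python
import cv2

def moving_pixels(prev_gray, curr_gray, mag_threshold=1.0):
    """Dense Farneback optical flow; pixels whose motion magnitude
    exceeds the (assumed) threshold are flagged as moving."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return magnitude > mag_threshold
```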

2.1.3 Background Subtraction

The Background Subtraction method [Liu06] is a classical yet practical approach to vehicle detection. The principle of the algorithm is very simple: build a background model (the mean is the simplest model) over a sequence of frames to obtain the background image, and then subtract the background image from the current frame. After that we select a threshold and transform the subtracted image into a binary image, the same as in the Frame Differential Method. The pixels whose grey-level difference is greater than the threshold are considered foreground points.

The background subtraction approaches can typically be classified into parametric and nonparametric methods [Liu06]. For parametric background generation, the most commonly used method is the Gaussian mixture model. In [YS08], the algorithm classifies Gaussian distributions into reliable and unreliable distributions, and an unreliable distribution can evolve into a reliable one by accumulating pixel values. As a matter of fact, the performance of the Gaussian mixture model is not stable when objects move at low speed or stop briefly. As for nonparametric techniques, Elgammal et al. [MED00] build a nonparametric background model by kernel density estimation, which is robust enough to handle situations where the background exhibits small motions, like tree branches and litter.

Furthermore, the background subtraction method is insensitive to the speed of moving objects. Therefore, it is able to obtain a correct segmentation result for excessively fast and slow objects, and even for static objects. Typically, the foreground information provided by background subtraction is more complete than that of other methods, and it has great practical value due to its low computational complexity. However, this method is sensitive to illumination, weather and other extraneous factors. As a result, it is necessary to build different background generation models and adaptive updating methods to fit the surroundings. In this section, we describe three commonly used background generation models: the Average Model [Y12], the Gaussian Mixture Model [PK01] and the Kernel Density Estimation Model [MED00].

Average Model

The average method takes the average grey value of a pixel over a sequence of frames as the background value of that pixel [Y12]. The equation is as follows:

B_k = \frac{1}{N}\left(f_{k-N+1} + f_{k-N+2} + \cdots + f_k\right) \qquad (2.3)

where B_k represents the updated background image, N is the number of training frames, and f_k is the image at frame k.

In terms of the background update, as the pixels of the current frame can be divided into foreground and background, we use the background pixels of the current frame to correct the current background image. In addition, a weight should be set as the updating rate. The update of the background model is denoted as follows:

B_{k+1} = (1-\beta)B_k + \beta f_k \qquad (2.4)

where β is the updating rate. A small β makes the background change slowly, as the previous background image carries more weight than the current frame; conversely, a large β updates the background image quickly. Normally, taking β as 0.05 achieves a good update effect [KW11]. The update of the background should be applied in two situations (a code sketch follows this list):

1. Reduce the deviation brought about by lighting changes.


2. The background should be updated in situations of occlusion.
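The following sketch implements equations 2.3 and 2.4, with the selective updating of background pixels described above. The update rate β = 0.05 follows the value suggested in the text; the class name and the frame format are illustrative assumptions.

```python
import numpy as np

class AverageBackground:
    """Running-average background model for equations 2.3 and 2.4."""

    def __init__(self, training_frames, beta=0.05):
        # Equation 2.3: mean of the N training frames.
        self.background = np.mean(np.stack(training_frames), axis=0)
        self.beta = beta

    def update(self, frame, foreground_mask):
        # Equation 2.4, applied only where the pixel was judged background.
        blended = (1 - self.beta) * self.background + self.beta * frame
        self.background = np.where(foreground_mask, self.background, blended)
        return self.background
```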

Gaussian Mixture Model

The algorithm used in the Gaussian Mixture Model [PK01] can be divided into two parts. The first part is adaptive background modelling, which describes the background pixels using multiple Gaussian models and updates the models with new pixels. The second part is the Expectation Maximisation (EM) algorithm, which can be seen as an improvement upon the updating strategy of the background model.

Within the adaptive background image, each pixel is described by K Gaussian distributions. The method that builds such models is called Gaussian mixture modelling. For each pixel, the Gaussian mixture models are updated constantly over time. We assume that there are K Gaussian models for one pixel, and that the weight of the k-th model is W_k. Therefore, at time N, the normal distribution of the model is η(X_N; θ_k), and the background model can be described as:

p(X_N) = \sum_{j=1}^{K} W_j \, \eta(X_N; \theta_j) \qquad (2.5)

After we obtain the background model, each pixel is described by a sequence of Gaussian models with weights W_k. We sort these models by the value of W_k/σ_k and pick the first B models whose cumulative weight is larger than a threshold T, while the remaining models are considered invalid. This procedure helps us to update the background models and filter out the invalid ones. B can be expressed mathematically as:

B = \arg\min_{b} \left( \sum_{j=1}^{b} W_j > T \right) \qquad (2.6)

For a pixel X_N in a new frame N, we traverse the sequence of background models to find the first-ranked matching model which satisfies the condition |X_N − µ_k| ≤ 2.5σ_k. The updating strategy of the Gaussian mixture model is as follows:

W_k' = (1-\alpha)W_k + \alpha p
\mu_k' = (1-\rho)\mu_k + \rho\, x_N
\Sigma_k' = (1-\rho)\Sigma_k + \rho\,(x_N - \mu_k')(x_N - \mu_k')^T \qquad (2.7)


Among these equations, the parameter ρ equals αη(x_N; µ_k, Σ_k). As for the parameter p, if the pixel matches the background model, p equals 1; otherwise p equals 0. The parameter α is the learning rate, which controls the proportion of new pixels used in updating: a large value makes the Gaussian background models update faster, and the previous background is eliminated more quickly. If no matching model is found, the lowest-ranked model is replaced by a new model whose centre is located at X_N.

Kernel Density Estimation Method

Elgammal proposed a nonparametric background model based on kernel density estimation [MED00]. This method evaluates the video sample data through kernel functions and selects the sample value with the maximal probability density as the background. Unlike the Gaussian Mixture Model, Kernel Density Estimation makes the best use of the preceding frames to build the background model. It is able to handle frequent shifts of pixel values within a short time, which lets the algorithm obtain a more precise result. However, since noise and uninteresting foreground points are also estimated, the algorithm is more complex than other methods.

We assume that there are M pixels in a video frame and that each pixel has N background samples. At time t, the value of pixel i is x(t)_i, and the value of the j-th background sample of pixel i is x(t)_{i,j}. The probability of pixel i can be calculated by the following equation:

P(x(t)_i) = \frac{1}{N} \sum_{j=1}^{N} K\big(x(t)_i - x(t)_{i,j}\big) \qquad (2.8)

where K is the kernel estimator. Assuming that K obeys a Gaussian distribution, we take the R, G, B components as feature values. If they are independent of each other, the probability over the N samples can be written as:

P(x(t)_i) = \frac{1}{N} \sum_{j=1}^{N} \prod_{m=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_{i,m}} \exp\!\left(-\frac{\big(x(t)_{i,m} - x(t)_{(i,m),j}\big)^2}{2\sigma_{i,m}^2}\right) \qquad (2.9)

where d is the feature dimension and σ_{i,m} is the kernel width of feature m. If the probability satisfies P(x(t)_i) < T_f, the pixel x(t)_i is considered foreground. T_f is a global threshold over the whole image.
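A minimal sketch of the per-pixel kernel density test of equations 2.8 and 2.9, restricted to a single grey-level feature (d = 1); the kernel width σ and the threshold T_f below are illustrative assumptions, not values from the report.

```python
import numpy as np

def kde_foreground(samples, frame, sigma=10.0, t_f=1e-3):
    """Per-pixel Gaussian kernel density test; `samples` holds the N
    background samples with shape (N, H, W), `frame` has shape (H, W)."""
    diff = frame[None, ...] - samples                 # x(t)_i - x(t)_{i,j}
    kernel = np.exp(-diff**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    prob = kernel.mean(axis=0)                        # average over N samples
    return prob < t_f                                 # True = foreground pixel
```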


2.1.4 Scan-Line Based Method

Like an induction coil sensor, the detection line should be set at an appropriate position, perpendicular to the vehicles' direction of travel. The algorithm stores all the information on the detection line, such as the background image and edges. When an object crosses the line, the image in the detection area is overlapped. Therefore, if the length of the overlapped area is larger than a threshold, we may consider this object a moving vehicle.

Abramczuk et al. [T84] adopted an artificial detection line three pixels wide, while another scan-line approach, adopted by Koller et al. [KD93], considers the white stripes on the road as marks for detecting vehicles. After obtaining the binary edge image of the labelled region, we compare it with a reference edge image in order to verify whether the road marks have been covered. If the difference value is greater than the threshold, we may conclude that the covering object is a moving vehicle. In addition, this method is able to eliminate the interference of shadows.

Generally speaking, the Scan-Line Based Method is a simple vehicle detection algorithm suitable for real-time applications and certain traffic conditions, such as highways and one-way roads. However, it does not perform well at complex crossroads, where vehicles move in different directions.

2.2 Shadow Suppression Methods

In the traffic environment, shadows are produced by stationary objects and moving vehicles. As interference, they cannot be suppressed by background subtraction or the other methods mentioned above. In addition, a shadow is attached to the edge of a vehicle and may stick multiple vehicles together, which poses a great threat to vehicle detection and to the measurement of vehicles' geometric characteristics. Currently, shadow suppression algorithms are mainly based on the HSV colour space [RC01] and on edge analysis [ZYh05].

2.2.1 HSV Colour Space Method

The HSV colour space is not only close to human colour perception, but also reflects hue information and grey scale more precisely, especially for bright and dark objects. Typically, we compare the pixel values of a candidate shadow with the values of the same pixel in the background image. If a pixel's chrominance and grey values are both below the thresholds, it is considered shadow. The algorithm based on HSV colour [RC01] is as follows:

V_k(x,y) = \begin{cases}
1 & \text{if } \beta \le \dfrac{V_{new}(i,j)}{V_{model}(i,j)} \le \gamma \;\wedge\; S_{new}(i,j) - S_{model}(i,j) \le T_S \;\wedge\; |H_{new}(i,j) - H_{model}(i,j)| \le T_H \\
0 & \text{otherwise}
\end{cases} \qquad (2.10)

where T_S and T_H represent the thresholds for saturation and hue respectively. Since the V value of a shadow point is always smaller than that of a non-shadow point, γ is no bigger than 1. As for β, its value gets smaller as the light becomes more intensive, and vice versa. Shadow points usually have a lower value in the S channel; moreover, the difference between shadow and background model is often negative. The H channel is considered in order to obtain a better result. The selection of T_S and T_H should be decided by actual tests.

2.2.2 Edges Analysis Method

Shadows attached to a vehicle blob have the following features:

• Vehicle blobs are usually rectangular.

• The shadow attaches to the vehicle's edge.

• The shaded area is darker than other areas.

• The edge of the shaded area is blurry.

The procedure of the algorithm is as follows. First, we create a tracking list for the points on the edges of the vehicle blob. Then these points are analysed to check whether they meet the above conditions. The conditions can be written mathematically as [ZYh05]:

I(i,j) - I_b(i,j) < 0 \;\wedge\; |I(i,j) - I(i-1,j)| < t_x \;\wedge\; |I(i,j) - I(i+1,j)| < t_x \;\wedge\; |I(i,j) - I(i,j-1)| < t_y \;\wedge\; |I(i,j) - I(i,j+1)| < t_y \qquad (2.11)


where I(i,j) is the grey value of a point in the tracking list and I_b(i,j) is the grey value of the corresponding background point. The first condition states that the current point is darker than the same point in the background image. The remaining conditions use horizontal and vertical differences to measure the smoothness around point (i,j); t_x and t_y are the thresholds for the horizontal and vertical differences respectively, and their values decide the sensitivity of shadow detection. A large threshold will erode the vehicle edges, while a small one will reduce the accuracy; the selection of t_x and t_y needs to be decided by experimental tests. Finally, a point which satisfies all the inequalities is considered a shadow point.
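A vectorised sketch of the five conditions in equation 2.11, here applied to every pixel rather than only to the blob-edge points in the tracking list; t_x and t_y are illustrative thresholds, and border pixels (which wrap around under np.roll) should be ignored in practice.

```python
import numpy as np

def edge_shadow_points(gray, background, t_x=8, t_y=8):
    """Pixels satisfying all five inequalities of equation 2.11."""
    g = gray.astype(np.int32)
    darker = (g - background.astype(np.int32)) < 0    # I(i,j) - I_b(i,j) < 0
    dx1 = np.abs(g - np.roll(g, 1, axis=0)) < t_x     # |I(i,j) - I(i-1,j)|
    dx2 = np.abs(g - np.roll(g, -1, axis=0)) < t_x    # |I(i,j) - I(i+1,j)|
    dy1 = np.abs(g - np.roll(g, 1, axis=1)) < t_y     # |I(i,j) - I(i,j-1)|
    dy2 = np.abs(g - np.roll(g, -1, axis=1)) < t_y    # |I(i,j) - I(i,j+1)|
    return darker & dx1 & dx2 & dy1 & dy2
```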

2.3 Vehicle Tracking

The main purpose of vehicle tracking is to obtain certain traffic information, such as velocity and traffic volume. Most vehicle tracking methods follow a fundamental principle: use a spatial distance to judge whether two blobs in adjacent frames describe the same vehicle. This achieves vehicle tracking in the time domain. Here the distance refers not only to the Euclidean distance but also to other criteria, such as the Hausdorff distance. In our system, the tracking algorithm should be capable of addressing certain situations, like vehicle occlusion and temporary disappearance.

Typically, vehicle tracking methods can be divided into different types, as a vehicle can be described in many ways, e.g., by model, blob, edge or features [DCl09]. The procedure of vehicle tracking can be seen as the process of matching targets across successive frames based on the vehicle's model, area, edge or features. Several typical mathematical tools, such as template matching and the Kalman Filter, are widely used in the object tracking domain. Below are introductions to these technologies.

2.3.1 Template matching

The tracking algorithm based on template matching finds a known pattern within the testing vehicle images [CY96]. It can be applied to vehicle detection and tracking for both static and motion images. Typically, template matching is easy to implement but computationally intensive [CY96].

We assume that the dimensions of the template image T and the testing image S are M×N and L×W respectively. Image T is the searching window which slides over image S, and the searched region is called the subgraph S_{i,j}, where (i,j) is the coordinate of the top-left corner of the subgraph in image S. The ranges of i and j are 1 ≤ i ≤ L−M+1 and 1 ≤ j ≤ W−N+1. We use the following equation [CY96] to measure the similarity of T and S_{i,j}:

D(i,j) = \sum_{m=1}^{M} \sum_{n=1}^{N} \big[ S_{i,j}(m,n) - T(m,n) \big]^2 \qquad (2.12)

After expanding and normalising 2.12, we obtain 2.13:

R(i,j) = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N} S_{i,j}(m,n)\,T(m,n)}{\left(\sum_{m=1}^{M}\sum_{n=1}^{N} S_{i,j}(m,n)^2\right)^{1/2} \left(\sum_{m=1}^{M}\sum_{n=1}^{N} T(m,n)^2\right)^{1/2}} \qquad (2.13)

According to the Cauchy–Schwarz inequality [Bit01], 0 ≤ R(i,j) ≤ 1. If the value of R(i,j) is bigger than the threshold, then S_{i,j} is matched with T; otherwise, it is not.

The template matching algorithm can tolerate differences in brightness; however, it has a high computational complexity.
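In practice, the normalised correlation of equation 2.13 is available in OpenCV as a template-matching mode. A minimal sketch follows, with hypothetical input files and an assumed matching threshold of 0.9.

```python
import cv2

test_image = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)          # hypothetical
template = cv2.imread("vehicle_template.png", cv2.IMREAD_GRAYSCALE) # hypothetical
# TM_CCORR_NORMED computes the normalised cross-correlation of 2.13,
# so 0 <= R <= 1 holds as in the text.
result = cv2.matchTemplate(test_image, template, cv2.TM_CCORR_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
if max_val > 0.9:              # illustrative matching threshold
    top_left = max_loc         # position (i, j) of the matched subgraph
```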

2.3.2 Kalman Filter

The tracking algorithm based on the Kalman Filter [Del98] does not require additional past information, unlike the template matching method. Instead, according to the measured value of the current state and the previous state, the object's current state can be calculated through a set of recursive formulas. The Kalman Filter has low computational complexity and memory requirements, which makes it suitable for a real-time system.

In order to understand its principle, we introduce a discrete control system, which can be described by a linear stochastic difference equation:

X(k) = A\,X(k-1) + B\,U(k) + W(k) \qquad (2.14)

where X(k) is the system's state and U(k) is the control vector. Matrices A and B are the state transition model and the control-input model. The system's observation at time k can be expressed as follows:

Z(k) = H\,X(k) + V(k) \qquad (2.15)

where Z(k) is the measured value at time k and matrix H is the observation model of the system. W(k) and V(k) are the process noise and observation noise respectively; both are assumed to be white Gaussian noise.

The recursive process of Kalman filtering can be divided into two phases: the prediction phase and the update phase. The two formulas that support the prediction phase are shown in 2.16 and 2.17:

X(k \mid k-1) = A\,X(k-1 \mid k-1) + B\,U(k) \qquad (2.16)

P(k \mid k-1) = A\,P(k-1 \mid k-1)\,A^T + Q \qquad (2.17)

Equations 2.18, 2.19 and 2.20 show the three formulas of the update phase [Del98]:

X(k \mid k) = X(k \mid k-1) + K_g(k)\big(Z(k) - H\,X(k \mid k-1)\big) \qquad (2.18)

K_g(k) = \frac{P(k \mid k-1)\,H^T}{H\,P(k \mid k-1)\,H^T + R} \qquad (2.19)

P(k \mid k) = \big(I - K_g(k)\,H\big)\,P(k \mid k-1) \qquad (2.20)

where X(k | k) refers to the a posteriori state estimate at time k given the observation at time k, P(k | k) is the corresponding covariance of X(k | k), and K_g is the Kalman gain. We obtain the optimal estimate X(k | k) through the preceding procedure. When the system enters state k+1, P(k | k) becomes the P(k−1 | k−1) of 2.17.
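The recursion of equations 2.16–2.20 fits in a few lines of NumPy; this generic sketch assumes the matrices are already defined with consistent shapes (the fraction of 2.19 becomes a matrix inverse).

```python
import numpy as np

def kalman_step(x, P, z, A, B, u, H, Q, R):
    """One predict-update cycle implementing equations 2.16-2.20."""
    # Prediction phase (2.16, 2.17).
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + Q
    # Update phase (2.18-2.20).
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```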

Dellaert [Del98] used the Kalman Filter to predict the searching region in the next frame, which greatly reduces the search band and diminishes the computational complexity. Moreover, if the blob is partially occluded, the estimates produced by the Kalman filter can replace the optimal matching points; therefore, the system does not lose a vehicle even when part of it is overlapped by interference. Generally speaking, the Kalman filter has good accuracy and stability in most traffic scenarios.

Yuan et al. [YJW06] proposed a tracking algorithm based on the grey prediction model GM(1,1). This algorithm finds the law of motion of the target by updating the grey prediction model continuously. This novel method overcomes a deficiency of the Kalman Filter, which needs assumptions about the object's motion and noise characteristics, and it can produce faster and more precise tracking results.


2.4 Vehicle Classification

There are two main steps in the vehicle classification module: feature selection and feature matching. The main purpose of feature selection is to extract particular image features of the different types of vehicles and then pass them to the feature matching module. Two typical feature selection methods are based on prior knowledge [RR05] and on the Scale-Invariant Feature Transform (SIFT) [Low99b].

In terms of feature matching, the Support Vector Machine (SVM) [CC95] and the Bag-of-Features model [Sch11] are two commonly used supervised learning approaches for image classification. The Bag-of-Features model derives from the bag-of-words model [LFFT09], which describes the frequencies of words from a dictionary, and SVM is considered a strong classifier for two-class or multi-class non-linear data sets. Below are introductions to these technologies.

2.4.1 Feature Selection based on prior knowledge

Roya Rad et al. [RR05] proposed a feature selection method based on prior knowledge, using the width, length, dimension and velocity of the detected vehicle. Before we start to collect the vehicle features, the system should recognise the overall direction of the moving vehicle so as to adjust the weights of the features. The overall feature of a vehicle can be expressed by 2.21:

F(i) = \alpha W(i) + \beta L(i) + \gamma D(i) + \mu V(i) \qquad (2.21)

where W(i) is the width of the blob, L(i) its length, D(i) its dimension and V(i) the velocity of the detected blob. The parameters α, β, γ and µ are the weights of these features. Each vehicle blob can be expressed by the value of F(i). By clustering the training data, we may obtain typical values of F(i) for the different types of vehicles.

Since the clustering algorithm can be affected by noise and outliers, we calculate the frequency of occurrence of each vehicle type and select the type with the highest frequency of occurrence as the true vehicle type.
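A toy sketch of this scheme: compute F(i) from equation 2.21 and assign the nearest cluster centre. All numbers below (weights and class centres) are made-up placeholders; real values would come from clustering the training data.

```python
import numpy as np

# Illustrative class centres for F(i), e.g. obtained by clustering
# training blobs; the numbers are made up for demonstration only.
centres = {"Motorcycle": 40.0, "Car & Lorry": 110.0, "Bus & Truck": 260.0}
weights = (0.3, 0.3, 0.3, 0.1)            # alpha, beta, gamma, mu (assumed)

def classify_blob(w, l, d, v):
    f = np.dot(weights, (w, l, d, v))     # F(i) = aW + bL + cD + uV
    return min(centres, key=lambda k: abs(centres[k] - f))
```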

This method is easy to implement and suitable for a real-time system due to its low computational complexity. However, the algorithm only works well under certain traffic conditions, such as highways and one-way roads, and it needs prior knowledge about the direction of the moving vehicles, which can be obtained through manual input or a vehicle tracking algorithm.

2.4.2 Scale-invariant feature transform (SIFT)

Lowe proposed an image feature generation method which transforms an image into a large collection of feature vectors [Low99a]. SIFT is able to solve the matching problems caused by image translation, rotation and affine transformation. The SIFT descriptor is constructed by the following steps.

First, we generate the scale space by using the Gaussian kernel G(x,y,σ):

L(x,y,\sigma) = G(x,y,\sigma) * I(x,y) \qquad (2.22)

where σ is the scale in the scale space and L(x,y,σ) represents the Gaussian-blurred image; σ decides the degree of blurring after applying the Gaussian kernel. Lowe introduced the Difference of Gaussians (DoG) in order to identify the preliminary keypoints of the image. The equation of the DoG space is shown in 2.23:

D(x,y,\sigma) = L(x,y,k\sigma) - L(x,y,\sigma) \qquad (2.23)

A preliminary keypoint is defined as an extreme point within the DoG space, i.e., one whose pixel value is bigger or smaller than all of its neighbours. In the next step, since these preliminary keypoints are still unstable, we need to localise the position and scale of the keypoints more accurately. Specifically, the accurate locations of the extrema can be calculated through the derivative of 2.23, as follows:

D(x) = D + \frac{\partial D^T}{\partial x} x + \frac{1}{2} x^T \frac{\partial^2 D}{\partial x^2} x \qquad (2.24)

In the next step, the algorithm eliminates keypoints which have high edge responses but low stability. The second-order Hessian matrix shown in 2.25 expresses the curvatures across and along the edge:

H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix} \qquad (2.25)

Figure 2.1: Generation of Keypoint descriptor [Low99a]

The trace of H is D_{xx} + D_{yy}, while its determinant is D_{xx}D_{yy} - D_{xy}^2. The ratio R is shown below:

R = \frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(r_{th}+1)^2}{r_{th}} \qquad (2.26)

If R is bigger than the threshold, the corresponding point is rejected; generally, we adopt r_{th} = 10. After that, in order to achieve stable rotation, the algorithm assigns orientation parameters to each keypoint. The gradient magnitude and direction for a point (x,y) are shown in 2.27 and 2.28:

m(x,y) = \sqrt{\big(L(x+1,y) - L(x-1,y)\big)^2 + \big(L(x,y+1) - L(x,y-1)\big)^2} \qquad (2.27)

\theta(x,y) = \operatorname{atan2}\big(L(x,y+1) - L(x,y-1),\; L(x+1,y) - L(x-1,y)\big) \qquad (2.28)

The gradient magnitude and direction of each pixel in a neighbouring region around the keypoint are calculated, generating an orientation histogram with 36 bins (10 degrees per bin). The peak of the histogram corresponds to the dominant orientation, and any bin that reaches 80% of the maximum bin is also considered an auxiliary orientation [Low99a].

In the final step, the SIFT keypoint descriptor is generated based on the location, scale and rotation of a keypoint. We take a 16×16 neighbouring region around the keypoint as the sampling window, divide it into 4×4 subregions and calculate an 8-direction histogram in each, which forms 4×4 seed points describing the keypoint. Therefore, each feature vector has 4×4×8 = 128 dimensions.
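For reference, OpenCV ships a ready-made SIFT implementation (cv2.SIFT_create, available in OpenCV 4.4 and later); each descriptor row is a 128-dimensional vector matching the 4×4×8 layout described above. The image filename below is hypothetical.

```python
import cv2

gray = cv2.imread("vehicle.png", cv2.IMREAD_GRAYSCALE)   # hypothetical image
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(descriptors.shape)   # (number of keypoints, 128)
```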


Figure 2.2: Generation of Codeword [Sch11]

Figure 2.3: Histograms of testing images [SL12]

2.4.3 Bag-of-Feature Model

The bag-of-words model [SL12] was originally used in document classification, where the codebook counts the occurrences of words and can be seen as a histogram over the vocabulary. In the domain of computer vision, a bag of visual words counts the occurrences of local image features; therefore, the approach is also called the Bag-of-Features model. The key stages of this method are as follows.

First, all descriptors produced by SIFT are collected together. Then the K-means clustering algorithm is deployed to generate a codebook for all the training images; similar features are clustered into a single codeword, as shown in figure 2.2. After that, each training image is represented by counting the appearances of each codeword in the codebook, giving the histograms shown in figure 2.3. In the final step, according to the value of each bin in the histogram of a new testing image, we count the frequency with which each codeword appears in the image and assign the testing data the class associated with the most frequent codeword.

The Bag-of-Features model performs well at classifying images according to the objects they contain [Sch11]. Moreover, it is invariant to the position and orientation of the object shown in the image [Sch11]. On the other hand, the BoW model ignores the spatial relationships among the visual words, which makes it poor at localising objects within an image [SSC06].
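A compact sketch of this pipeline using scikit-learn: cluster SIFT descriptors into a codebook, represent each image as a codeword histogram, and train a classifier on the histograms. The linear-kernel SVM shown matches the classifier named in the project objectives; the codebook size k = 100 is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def build_codebook(all_descriptors, k=100):
    """Cluster SIFT descriptors into k codewords; k is an assumption."""
    return KMeans(n_clusters=k, n_init=10).fit(all_descriptors)

def bow_histogram(descriptors, codebook):
    """Represent one image as a normalised codeword histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

# train_hists / train_labels would come from the labelled vehicle images:
# clf = LinearSVC().fit(train_hists, train_labels)
# prediction = clf.predict([bow_histogram(test_descriptors, codebook)])
```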

2.5 Conclusion

In practice, some of the methods introduced above are not applicable to the VRTM system. Specifically, for vehicle detection, the Frame Differential method has relatively low accuracy when the system is analysing a complex traffic situation, and the Optical Flow Field method has high hardware requirements; therefore, Background Subtraction is a good compromise. Regarding shadow suppression, the Edge Analysis method is more suitable than the method based on the HSV colour space, as the former takes the traffic situation into account. In terms of vehicle tracking, the Kalman Filter is the first choice, as it has been widely used in the object tracking domain and has been proved to meet the needs of a real-time system [Qia09]. As for vehicle classification, the proposed methods need to be evaluated through this project.

Although the particular advantages of these methods can help to achieve the overall goals of vehicle detection, tracking and classification, we should not apply the algorithms unmodified, as our system has its own performance requirements and applicable environments. As a result, we take their basic ideas as references and improve them in order to achieve the objectives of our system.


Chapter 3

Methodology

Based on the merits and demerits of the preceding methodologies, we adopt parts of them and propose new algorithms to meet the practical needs of the VRTM system. In this chapter, the primary algorithms of the vehicle detection, tracking and classification approaches applied in our system are illustrated.

3.1 Nonparametric Background Generation

In terms of background generation, we adopt a nonparametric method proposed by Liu [Liu06]. The basic computational model of this method is mean shift [CM02], which is an efficient way to find the modes of the underlying density, where the gradient is zero [Liu06]. The detailed algorithm is as follows:

1. Select preliminary samples. Users should specify the number of frames used for background generation. Here we introduce equation 3.1 [Liu06] to represent the values of a pixel within a video sequence:

S = \{x_i\},\; i = 1,\dots,n \qquad (3.1)

where x_i is the grey level of pixel x in frame i, and n is the number of samples.

2. Select representative points. In order to alleviate the computational intensity of the algorithm, we calculate local means over groups of samples and denote the set of means by equation 3.2 [Liu06]:

P = \{p_i\},\; i = 1,\dots,m \qquad (3.2)


where m ≪ n.

3. Apply mean shift. By applying mean shift to the representative points in P, we obtain m convergence points. As some convergence points are very similar or even identical to each other, we cluster them together into classes. Equation 3.3 gives the cluster centres of the q classes [Liu06]:

C = \{(c_i, w_i)\},\; i = 1,\dots,q \qquad (3.3)

where c_i is the grey level and w_i the weight of each cluster centre; the value of q is much less than m. The weight w_i is calculated by equation 3.4 [Liu06]:

w_i = \frac{l_i}{m},\; i = 1,\dots,q \qquad (3.4)

where l_i is the number of points in each cluster.

4. Obtain the most reliable background mode. The background image is generated by selecting, for each pixel, the c_i which has the highest weight w_i. The final result is the most reliable background mode (MRBM).

Figure 3.1: Algorithm of Background Generation Method [Liu06]

Figure 3.1 shows the overall algorithm of the nonparametric method for one pixel across a sequence of frames. The initial samples are reduced to representative points, and these points are then converged and clustered into the candidate background modes. Finally, the grey level of the pixel with the highest weight is selected as the background pixel.

To dynamically adapt to illumination changes and the influence of static objects, the background image should be updated periodically. Each pixel in a new frame is allocated to the nearest cluster centre; the weight of each cluster is therefore updated, which may change the value of the corresponding background pixel.
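A sketch of the per-pixel procedure under stated assumptions: the kernel bandwidth, number of representative points, iteration count and merge tolerance below are illustrative choices, not values from [Liu06].

```python
import numpy as np

def mrbm_pixel(samples, bandwidth=10.0, n_rep=20, iters=30, merge_tol=2.0):
    """Most-reliable-background-mode estimate for one pixel's grey-level
    samples, following the four steps of section 3.1."""
    samples = np.asarray(samples, dtype=np.float64)
    # Step 2: representative points = local means of sample groups.
    groups = np.array_split(np.sort(samples), n_rep)
    reps = np.array([g.mean() for g in groups if len(g)])
    # Step 3: mean shift with a Gaussian kernel on each representative.
    modes = reps.copy()
    for _ in range(iters):
        w = np.exp(-0.5 * ((modes[:, None] - samples[None, :]) / bandwidth) ** 2)
        modes = (w * samples[None, :]).sum(axis=1) / w.sum(axis=1)
    # Cluster near-identical convergence points; weight = cluster size.
    centres, weights = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if abs(m - c) < merge_tol:
                weights[i] += 1
                break
        else:
            centres.append(m)
            weights.append(1)
    # Step 4: the centre with the highest weight is the background value.
    return centres[int(np.argmax(weights))]
```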

3.2 Vehicle Segmentation

In the vehicle segmentation module, we adopt the method mentioned in section 2.1.3: subtracting the background image from the current video frame and transforming the foreground image into a binary image. The foreground image with moving vehicles contains massive noise in the edge areas, and there are numerous discontinuous hollows in the interior of the vehicle areas. To solve this problem, we utilise several morphological operations to process the grey image. In addition, in order to tackle the problem of vehicle occlusion, we introduce a region growing method.

3.2.1 Morphological Processing

The basic morphological operations are erosion and dilation [BK08]. Erosion removes the boundary points of an object, shrinking its region by one pixel along the perimeter [BK08], while dilation combines the object with its connected background points, expanding the region of the object [BK08]. More general morphology includes opening and closing, which are different combinations of erosion and dilation. Opening applies dilation after erosion; it is used to eliminate small objects, separate objects joined by thin connections and smooth the boundary of large objects while retaining their dimensions [BK08]. Closing applies erosion after dilation, which fills small hollows within an object, connects adjacent objects and smooths the boundary while retaining the dimensions [BK08]. In the vehicle segmentation module, we adopt consecutive opening and closing operations to eliminate noise and fill the small hollows within the vehicle blobs.
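A minimal sketch of this step with OpenCV, assuming binary_mask is the thresholded foreground image; the 3×3 rectangular structuring element and the input filename are illustrative choices.

```python
import cv2

# binary_mask: thresholded foreground image from the subtraction step.
binary_mask = cv2.imread("foreground_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
# Opening (erosion then dilation) removes small noise blobs.
cleaned = cv2.morphologyEx(binary_mask, cv2.MORPH_OPEN, kernel)
# Closing (dilation then erosion) fills small hollows inside vehicle blobs.
cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
```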


Figure 3.2: Result of the 8×8 regional searching algorithm

3.2.2 Blob Searching

After applying a threshold to the grey image, we obtain a binary image with multiple connected regions. The usual method for searching these regions operates at the pixel level, which can be greatly affected by noise. To overcome this problem, we introduce a new searching method based on N×N pixel regions.

First, we split each frame into numerous N×N regions and scan them in order from left to right and top to bottom. Then we calculate the proportion of foreground pixels in each region; if the proportion is larger than a predefined threshold, the region is considered a foreground region. Finally, we combine all regions which are connected to each other and obtain the vehicle blobs from them. Figure 3.2 shows the scanning result of vehicle blobs using the 8×8 regional searching algorithm; the vehicles are properly segmented.

The proposed method enhances the accuracy and robustness of the vehicle segmentation algorithm by diminishing the influence of noise. Moreover, it helps us to merge several blobs which belong to one vehicle, and to split a blob containing multiple vehicles into separate pieces, by analysing the minimal bounding rectangles of the detected blobs.
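The block test can be written as one NumPy reduction; a sketch assuming n = 8 and an assumed foreground-proportion threshold of 0.5:

```python
import numpy as np

def region_mask(binary, n=8, ratio=0.5):
    """Mark each n x n block as foreground when the proportion of
    foreground pixels exceeds `ratio` (an illustrative value)."""
    h, w = binary.shape
    blocks = (binary[:h - h % n, :w - w % n] > 0).reshape(
        h // n, n, w // n, n)
    return blocks.mean(axis=(1, 3)) > ratio   # one cell per n x n region

# Connected foreground cells can then be grouped into blobs, e.g. with
# cv2.connectedComponents applied to the coarse mask.
```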

3.3 Vehicle Tracking

Since the tracking module works in a real-time situation, it is impossible for the system to scan each pixel in the frame to find the position of the matching object. Likewise, some model-matching algorithms and methods based on prior knowledge are not suitable for the system due to their intensive computational complexity. In order to simplify the matching process and enhance computational efficiency, we estimate the vehicle's current position by analysing its past trajectory, and we search for the matching target near the predicted point. The substantial reduction of the searching area not only decreases the computational complexity but also diminishes the impact of other moving objects. The proposed method for vehicle tracking is reliable in most traffic scenarios and robust enough to overcome significant interference by producing a reasonable estimation.

Moreover, as vehicle blobs are collected from the object detection module, a tracking method based on image blobs should be adopted, so the commonly used Kalman filter is applied in our system. The predicted value estimated by the Kalman filter not only reduces the searching area but also overcomes partial occlusion when the VRTM system is trying to find the matching point of a certain blob. In addition, we introduce a novel tracking algorithm based on regional analysis.

3.3.1 Configuration of Kalman Filter

Before applying the Kalman Filter to the detected blobs, we need to gather information from each blob: the position of the centroid and the blob's length and width. These records are taken as the definition of the system state used in the Kalman Filter. The state X(k) is a four-dimensional vector (x(k), y(k), V_x(k), V_y(k)), referring to the position of the target's centroid and its velocity along the x and y axes. According to equation 2.14, we define the transition model A and the control-input model B as follows:

$$
A = \begin{bmatrix}
1 & 0 & \Delta t & 0 \\
0 & 1 & 0 & \Delta t \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \Delta t \end{bmatrix}
\tag{3.5}
$$

where $\Delta t$ is the time interval between two adjacent frames. In addition, we define the observation matrix $H$ as in equation 3.6.

$$
H = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{bmatrix}
\tag{3.6}
$$


According to the work of Zhang et al. [ZYh05], the noise covariances $Q$ and $R$ can be set as follows:

$$
R = \begin{bmatrix}
0.03 & 0.005 \\
0.005 & 0.3
\end{bmatrix},
\qquad
Q = 0.01 \times E
\tag{3.7}
$$

The detailed equations deployed in the Kalman Filter algorithm are given in section 2.3.2.
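As an illustration, the configuration above can be expressed with OpenCV's `cv2.KalmanFilter`. This is a minimal sketch under stated assumptions: the frame interval `dt` (here taken for a 25 fps video), the reading of $E$ in equation 3.7 as the identity matrix, and the example measurement are placeholders rather than values fixed by the report.

```python
import numpy as np
import cv2

dt = 1.0 / 25.0  # assumed frame interval for a 25 fps video

# State: (x, y, Vx, Vy); measurement: the blob centroid (x, y).
kf = cv2.KalmanFilter(4, 2, 1)
kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                [0, 1, 0, dt],
                                [0, 0, 1,  0],
                                [0, 0, 0,  1]], np.float32)     # A, eq. 3.5
kf.controlMatrix = np.array([[0], [0], [0], [dt]], np.float32)  # B, eq. 3.5
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)     # H, eq. 3.6
kf.measurementNoiseCov = np.array([[0.03, 0.005],
                                   [0.005, 0.3]], np.float32)   # R, eq. 3.7
kf.processNoiseCov = 0.01 * np.eye(4, dtype=np.float32)         # Q = 0.01 E

# Per frame: predict the centroid, then correct with the matched blob.
predicted = kf.predict()        # search for the blob near (predicted[0], predicted[1])
measurement = np.array([[123.0], [45.0]], np.float32)  # example matched centroid
kf.correct(measurement)
```

The predicted centroid defines the small search window discussed above, so matching never needs to scan the whole frame.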

3.3.2 Region-based vehicle tracking algorithm

The process of object tracking builds a relationship between the targets in two adjacent frames. Typically, the whole tracking process can be divided into three parts: extraction of moving regions, region prediction, and region matching and updating. The former two are handled by the vehicle segmentation module and the Kalman Filter; we now discuss the third part in detail.

Qiao [Qia09] proposed a matching criterion, which is shown in equation 3.8.

$$
\begin{aligned}
R(i,j) &= \alpha D(i,j) + \beta S(i,j) \\
D(i,j) &= \sqrt{\left(x_k^i - x_{k+1}^j\right)^2 + \left(y_k^i - y_{k+1}^j\right)^2} \\
S(i,j) &= \left| L_k^i W_k^i - L_{k+1}^j W_{k+1}^j \right|
\end{aligned}
\tag{3.8}
$$

where $x_k^i, y_k^i, L_k^i, W_k^i$ are the position, length and width of the $i$th moving object in frame $k$. $D(i,j)$ is the Euclidean distance between the centroid of object $i$ in frame $k$ and that of candidate $j$ in frame $k+1$, and $S(i,j)$ is the difference in dimensions between frames $k$ and $k+1$. $\alpha$ and $\beta$ are coefficients that weight the distance and the degree of deformation in the matching criterion $R(i,j)$.

The moving blob that has the minimal value of $R(i,j)$ is selected as the matched target for object $i$. Moreover, a maximal threshold for $R(i,j)$ should be set, so that blobs matching no existing object can be detected as new vehicles. The overall flowchart of the tracking algorithm is shown in figure 3.3.
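The sketch below shows how this matching step might be implemented. It is illustrative only: the blob representation (a dict with `x`, `y`, `L`, `W` keys), the weights `ALPHA` and `BETA`, the threshold `R_MAX`, and the greedy assignment strategy are our own assumptions; the report does not fix these values.

```python
import math

ALPHA, BETA = 1.0, 0.5   # illustrative weights for distance and deformation
R_MAX = 40.0             # illustrative threshold for declaring a new vehicle

def match_cost(obj, blob):
    """R(i, j) = alpha * D(i, j) + beta * S(i, j), as in equation 3.8.
    `obj` is a tracked object's Kalman-predicted state for frame k+1;
    `blob` is a candidate blob detected in frame k+1."""
    d = math.hypot(obj["x"] - blob["x"], obj["y"] - blob["y"])  # D(i, j)
    s = abs(obj["L"] * obj["W"] - blob["L"] * blob["W"])        # S(i, j)
    return ALPHA * d + BETA * s

def match_blobs(tracked, blobs):
    """Greedy region matching: each tracked object takes the unused blob
    with the minimal R(i, j); blobs left unmatched start new tracks."""
    matches, used = {}, set()
    for i, obj in enumerate(tracked):
        costs = [(match_cost(obj, b), j) for j, b in enumerate(blobs)
                 if j not in used]
        if costs:
            cost, j = min(costs)
            if cost <= R_MAX:        # otherwise the object is lost or occluded
                matches[i] = j
                used.add(j)
    new_tracks = [j for j in range(len(blobs)) if j not in used]
    return matches, new_tracks
```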

3.4 Vehicle Classification

There are two main steps in vehicle classification: feature selection and feature classification. In the first step, the selected features should be characteristic of the different types of vehicles. Under some ideal road conditions, such as a highway or a one-lane route, the length or dimensions of a blob and its average velocity are sufficient to classify the vehicle types, provided that the system works with a single stationary camera.

Figure 3.3: Flowchart of region-based vehicle tracking algorithm

In real situations, however, the VRTM system should be capable of handling various conditions. Therefore, we adopt the Scale-invariant feature transform (SIFT) [Low99b] descriptor to describe the features of a vehicle image. The advantages of the SIFT algorithm are its invariance to scaling and to changes of viewing angle and brightness between two images of the same object.

As for the second step, since there is insufficient evidence in the literature to establish which image classification method best suits the traffic scenario, we compare the performance of the Bag-of-Feature model with that of the method based on prior knowledge. The detailed algorithms of the latter are given in section 2.4, while here we focus on how to apply the Bag-of-Feature model in our system.

3.4.1 Feature Selection based on prior knowledge

Before applying the algorithm, we supply prior knowledge to the system, which may include the direction of moving objects, typical sizes of long and short vehicles, and the general velocity of different types of vehicles. This knowledge is applied in equation 2.21 and transformed into a composite mathematical value.

In addition, an SVM can be trained on the vehicle information so as to classify the vehicles in new frames. It is important to note that the training data should be collected from the traffic video stream rather than from other car image repositories. Therefore, users must input the exact type of each vehicle into the system until the system obtains a strong classifier.

In order to avoid the impact of noise, we predefine a threshold for the value of F(i). If a blob's F(i) value is lower than the threshold, the blob is considered a noise sample.
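A rough sketch of this filtering-and-training step is given below. It is heavily hypothetical: equation 2.21, which defines F(i), appears in an earlier chapter and is not reproduced here, so the weighted combination, feature keys and threshold value below are invented placeholders; only the overall shape (compute F(i), discard low-F(i) blobs as noise, train a linear SVM on the rest) follows the text.

```python
from sklearn.svm import LinearSVC

F_NOISE = 0.2  # assumed noise threshold for F(i)

def composite_value(blob, weights=(0.5, 0.3, 0.2)):
    """Placeholder for equation 2.21: combine prior-knowledge features
    into a single composite value F(i). Weights and keys are illustrative."""
    w1, w2, w3 = weights
    return (w1 * blob["area"] + w2 * blob["speed"]
            + w3 * blob["direction_score"])

def train_prior_knowledge_classifier(blobs, labels):
    """Discard blobs whose F(i) is below the threshold as noise samples,
    then train a linear-kernel SVM on the user-labelled remainder."""
    kept = [(b, l) for b, l in zip(blobs, labels)
            if composite_value(b) >= F_NOISE]
    X = [[b["area"], b["speed"], b["direction_score"]] for b, _ in kept]
    y = [l for _, l in kept]
    return LinearSVC().fit(X, y)
```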

3.4.2 Bag-of-Feature Model

The training data set used for constructing the Bag-of-Feature model is gathered from the PASCAL Object Classes repository, which contains thousands of images of bicycles, buses, cars and motorbikes. Before we start to analyse the traffic video, the features of each vehicle image are extracted using the SIFT descriptor and then taken as the input data for building the Bag-of-Feature model. The detailed algorithm has been described in section 2.4.3.

To enhance the accuracy of the Bag-of-Feature model, we employ an SVM classifier in the final step of the procedure. The values of the codewords in a histogram are taken as the training data, and the vehicle type is taken as the label.

Figure 3.4: Flowchart of vehicle classification algorithm

After we obtain a vehicle image, SIFT is applied to produce descriptors for the image. Each descriptor is allocated to its nearest cluster centre (codeword) in the codebook; the resulting histogram for the test image is then classified by the trained SVM. Figure 3.4 illustrates the overall flowchart of the algorithm.
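A compact sketch of this pipeline is shown below, assuming OpenCV's SIFT implementation (available as `cv2.SIFT_create` in recent OpenCV releases) and scikit-learn for clustering and classification. The codebook size `K` is an assumed value, not one fixed by the report; the linear kernel follows the objective in section 4.2 of using a simple linear-kernel SVM.

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

K = 200  # assumed codebook size
sift = cv2.SIFT_create()

def sift_descriptors(image):
    """Extract SIFT descriptors (one 128-d vector per keypoint)."""
    _, desc = sift.detectAndCompute(image, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

def build_codebook(train_images):
    """Cluster all training descriptors; cluster centres are the codewords."""
    all_desc = np.vstack([sift_descriptors(img) for img in train_images])
    return KMeans(n_clusters=K, n_init=4).fit(all_desc)

def bof_histogram(image, codebook):
    """Allocate each descriptor to its nearest codeword and histogram them."""
    words = codebook.predict(sift_descriptors(image))
    hist = np.bincount(words, minlength=K).astype(np.float32)
    return hist / max(hist.sum(), 1.0)  # normalise for varying keypoint counts

# Training on labelled PASCAL vehicle images, then testing on a blob:
# codebook = build_codebook(train_images)
# clf = LinearSVC().fit([bof_histogram(i, codebook) for i in train_images],
#                       train_labels)
# vehicle_type = clf.predict([bof_histogram(blob_image, codebook)])[0]
```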


Chapter 4

Progress

In this chapter, we explain the aims and objectives for implementing the VRTM system. For the objectives listed below, carefully chosen methodologies are introduced to meet the requirements of our system. Then an overview of the VRTM system will be presented as the final deliverable of this project. Moreover, a summative evaluation plan is designed to measure the overall performance of the system in the debugging stage. Finally, the project schedule is illustrated through a Gantt chart in which several milestones are marked.

4.1 Aim

The aim of this project is to develop a Video-based Road Traffic Monitoring (VRTM) system which is able to detect vehicles, gather traffic information and classify vehicle types by analysing traffic video.

4.2 Objectives

The objectives of our project are as follows:

• Identify a robust background generation method that is able to produce high-quality background images by overcoming the interference of weather variations, illumination changes and stationary objects.

• Improve the existing background subtraction algorithm in order to obtain more complete vehicle blobs and overcome the blob overlapping problem.


• The system should be able to robustly suppress vehicles' shadows.

• Deploy the Kalman Filter and a region-based matching algorithm to track the vehicle blobs and obtain the count and trajectories of vehicles.

• Compare the overall performance of the prior-knowledge feature selection method and the Bag-of-Feature model for vehicle classification.

• The vehicle tracking and classification modules should satisfy the real-time requirement by using algorithms of low computational complexity, such as the Kalman Filter and a simple linear-kernel SVM classifier.

• Users should be able to obtain overall traffic information and the trajectory of a given vehicle by interacting with the interface.

4.3 Evaluation Plan

Since the VRTM system has to tackle various interference factors throughout the analysis process, a comprehensive evaluation plan is needed to measure the outcome of each module and to decide whether it has achieved its objectives.

For each of the proposed modules, we describe the incoming data, the intended outcome and the evaluation method.

4.3.1 Background Generation

Incoming data

• The number of samples N is an input parameter. It varies with the level of traffic congestion, since more samples are needed when the road is congested; its value is usually between 100 and 500.

• The number of representative points M is used by the mean shift operator; its value is a tenth of N.

• The minimal differential T is the threshold for turning the background image into a binary mask; its value is decided through practical experiments.

• N consecutive frames picked from a traffic video are taken as the training data for generating the background.


Intended outcome

• An image that shows all the stationary objects within the training frames and excludes any moving objects. The objects shown in this image should be identical to those in the actual background; only a small amount of noise is allowable.

• A background image that is updated regularly, so that it can respond to considerable changes in illumination and in the stationary objects.

Evaluation Method

The motivation for evaluating the background generation algorithm is twofold. Firstly, the computational complexity of the background updating algorithm should be controlled in order to meet the real-time requirement. Secondly, the accuracy of vehicle segmentation is mainly determined by the quality of the background image. Since it is difficult to measure the quality of the background model in isolation, we take the performance of the vehicle segmentation module as a proxy measure.

4.3.2 Vehicle Segmentation

Incoming data

• An accurate background mask gathered from the previous module.

• A grey image of the current frame.

Intended outcome

• Image blobs that contain only the vehicles and their shadows, with each blob fitting exactly one vehicle. Any blob that includes multiple vehicles or other objects will be counted as an error.

Evaluation Method

The overall performance of the vehicle detection module is measured by the proportion of correctly detected vehicles. The expected accuracy of vehicle detection should be no less than 95%.


4.3.3 Vehicle Tracking

Incoming data

• Information gathered from the detected vehicle blobs: the positions of their centroids and the blobs' lengths and widths.

• An initialised tracking list that is used to store all vehicles' information and their trajectories.

Intended outcome

• Trajectories of vehicles composed of their positions in every frame. Specifically, every position a vehicle occupies within the detectable area should be recorded; any incorrect or missing positions count as a tracking failure.

• The correct vehicle count over the whole traffic video.

Evaluation Method

The actual vehicle count of the video is calculated manually. It should be identical to the number reported by the VRTM system.

In addition, each vehicle's trajectory should be checked by hand, and any trajectory that is not reasonable, such as an incomplete or disordered path, will be counted as an error. The error rate of vehicle trajectories should be no more than 10%.

4.3.4 Shadow removal

Incoming data

• Images of vehicle blobs that have left the detectable area.

Intended outcome

• A segmented vehicle image in which the shadow part has been greatly suppressed.

• Updated data of the segmented vehicle image.


Evaluation Method

We plan to use a traffic video containing twenty vehicles with obvious shadows to measure the performance of the shadow suppression algorithm. A result in which no more than 20% of the shadow remains is acceptable for our system.

4.3.5 Vehicle classification

Incoming data

• Vehicle categories.

• Vehicle data passed from the previous module, including each blob's length, width and position.

• A threshold for the minimal dimension of each vehicle type, determined through practical experiments.

• Hundreds of vehicle images collected from the PASCAL Object Classes repository, which are taken as the training data set for the Bag-of-Feature model.

• Vehicle images passed from the previous module, which are taken as the testing data set for the Bag-of-Feature model.

Intended outcome

• For the method based on prior knowledge, a strong SVM classifier should be generated in order to classify the types of vehicles.

• For the Bag-of-Feature method, a strong Bag-of-Feature model should be trained.

• The preceding algorithms should satisfy the real-time requirement.

Evaluation Method

By comparing the generalization abilities of the two classification methods, we can estimate which of them is more effective for vehicle classification. In addition, p-values can be used to decide whether the difference between them is statistically significant.


Figure 4.1: Schedule for doing the project

The duration of model training and testing should also be evaluated, as the classification process must keep pace with the video frames.

4.4 Time Plan

Figure 4.1 shows the overall timetable for the project. Three labelled milestones set short-term goals that help to focus effort and structure our work.

4.5 Current Progress and Future work

So far we have implemented preliminary versions of the vehicle detection and vehicle tracking modules. In tests on two traffic videos with minor interference, the vehicle detection module is able to generate a satisfactory background model by analysing a few sample frames, and about 90% of the vehicles can be segmented completely, while the rest fail due to occlusion and system noise.

As a preliminary result, about 60% of the vehicles are correctly tracked throughout the traffic video. The region-based tracking algorithm therefore needs to be improved to cope with the dynamic factors that occur in traffic situations, such as occlusion and changes in vehicle direction and speed. Some brief experimental results are illustrated in figure 4.2.

Figure 4.2: Experimental results: (a) video frame; (b) background image; (c) segmentation result; (d) tracking result

In the following two months, we will mainly focus on implementing the vehicle classification module, as selecting training data and building the classifier may take considerable time. Specifically, we need to manually select hundreds of typical vehicle images from the PASCAL repository and train on them using the Bag-of-words model and an SVM. Meanwhile, a thorough testing and debugging process is critical to our project, as it will provide important experimental data for improving the algorithms we have proposed so far and for properly setting their input parameters.


Bibliography

[Bit01] V. I. Bityutskov. Bunyakovskii inequality. In M. Hazewinkel, editor, Encyclopedia of Mathematics, 2001.

[BK08] Gary Bradski and Adrian Kaehler. Learning OpenCV. O'Reilly Media, USA, 2008.

[BR78] Andrew Burton and John Radford. Thinking in Perspective: Critical Essays in the Study of Thought Processes. Routledge, 1978.

[CC95] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3), May 1995.

[CM02] D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 2002.

[CR99] R. Collins, A. Lipton, and T. Kanade. A system for vehicle surveillance and monitoring. In Proceedings of the 8th International Topical Meeting on Robotics and Remote Systems, Pittsburgh: ANS, 1999.

[CY96] Y. Cui and J. Weng. Hand segmentation using learning-based prediction and verification for hand sign recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 1996.

[DCl09] Dong Chun-li and Dong Yu-ning. Survey on video based vehicle detection and tracking algorithms. Journal of Nanjing University of Posts and Telecommunications (Natural Science), 29(2), 2009.

[Del98] F. Dellaert and C. Thorpe. Robust car tracking using Kalman filtering and Bayesian templates. In Conference on Intelligent Transportation Systems, 1998.

[Gib50] James J. Gibson. The Perception of the Visual World. Houghton Mifflin, 1950.

[HZ04] He Zhiwei, Liu Jilin, and Li Peihong. New method of background update for video-based vehicle detection. In IEEE Intelligent Transportation Systems Conference, (36), October 2004.

[JA12] Jon Arróspide, Luis Salgado, and Marcos Nieto. Video analysis-based vehicle detection and tracking using an MCMC sampling framework. EURASIP Journal on Advances in Signal Processing, (2), 2012.

[KD93] D. Koller, K. Daniilidis, H. Nagel, et al. Model based object tracking in monocular image sequences of road traffic scenes. International Journal of Computer Vision, (3), 1993.

[KW11] Kun Wu, Haiying Zhang, Tianmao Xu, and Ju Song. Overview of video-based vehicle detection technologies. Computer Science Education, (2), August 2011.

[LFFT09] L. Fei-Fei, R. Fergus, and A. Torralba. Recognizing and learning object categories. http://people.csail.mit.edu/torralba/shortCourseRLOC/index.html, 2009.

[Liu06] Ya-Zhou Liu. Nonparametric background generation. In Pattern Recognition, 2006 (ICPR 2006), 18th International Conference on, volume 4, 2006.

[Low99a] David G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, 1999.

[Low99b] David G. Lowe. Video-based vehicle detection and tracking using spatio-temporal maps. Journal of the Transportation Research Board, (2), 1999.

[MED00] A. Elgammal, D. Harwood, and L. Davis. Non-parametric model for background subtraction. In European Conference on Computer Vision, Dublin, Ireland, 2000.

[MW08] Yegor Malinovskiy and Yao-Jan Wu. Video-based vehicle detection and tracking using spatio-temporal maps. Journal of the Transportation Research Board, (1), August 2008.

[MW09] Minjian Wu and Pei Lin. Study of background generation in video traffic scene. Journal of Highway and Transportation Research and Development, 26(8), August 2009.

[PK01] P. KadewTraKuPong and R. Bowden. An improved adaptive background mixture model for real-time tracking with shadow detection. In 2nd European Workshop on Advanced Video-Based Surveillance Systems, 2001.

[PoJ10] Directive 2010/40/EU of the European Parliament and of the Council of 7 July 2010 on the framework for the deployment of intelligent transport systems in the field of road transport and for interfaces with other modes of transport. Official Journal of the European Union, (207), 2010.

[Qia09] Pengfei Qiao. Research on key technology in vehicle flow extraction based on traffic video. Master's thesis, Shandong University of Science and Technology, June 2009.

[RC01] Rita Cucchiara, Costantino Grana, Massimo Piccardi, et al. Improving shadow suppression in moving object detection with HSV color information. In IEEE Intelligent Transportation Systems Conference Proceedings, Oakland, CA, 2001.

[RR05] Roya Rad and Mansour Jamzad. Real time classification and tracking of multiple vehicles in highway. Pattern Recognition Letters, (26), 2005.

[Sch11] Cordelia Schmid. Bag-of-features for category classification. http://www.di.ens.fr/willow/events/cvml2011/materials/CVML2011_Cordelia_bof.pdf, 2011.

[Sek00] M. Seki, H. Fujiwara, and K. Sumi. A robust background subtraction method for changing background. In Applications of Computer Vision, Fifth IEEE Workshop on, 2000.

[SL12] S. Lazebnik, A. Torralba, L. Fei-Fei, D. Lowe, and C. Szurka. Bag-of-words models. http://cs.nyu.edu/~fergus/teaching/vision_2012/9_BoW.pdf, 2012.

[SSC06] S. Savarese, J. Winn, and A. Criminisi. Discriminative object class models of appearance and shape by correlatons. In Proceedings of IEEE Computer Vision and Pattern Recognition, 2006.

[T84] T. Abramczuk. A microcomputer based TV detector for road traffic. Symposium on Road Research Program, 3(2), 1984.

[WP00] R. J. Weiland and L. B. Purser. Intelligent transportation systems. Transportation in the New Millennium, (2), 2000.

[Y12] Y. Gu. Video vehicle detection based on self-adaptive background update. Journal of Nanjing Institute of Technology (Natural Science Edition), 10(2), 2012.

[YJW06] Yuan Ji-Wei and Shi Zhong-ke. A method of vehicle tracking based on GM(1,1). Control and Decision, 2(3), 2006.

[YS08] Yeqin Shao and Minwu Ren. Background generation algorithm based on Gaussian distribution. Computer Engineering, 34(13), July 2008.

[ZL03] Zhang Li and Li Zhi-neng. Adaptive HSV colour background modelling for real-time vehicle tracking with shadow detection in traffic surveillance. Journal of Image and Graphics, 8(7), July 2003.

[ZYh05] Zhang Yi-hui, Xu Xiao-xia, et al. A vehicle detection system with adaptive background update and shadow suppression. Journal of Shanghai University (Natural Science), 11(5), October 2005.