
Department of Electrical Engineering

Master’s Thesis

Using Homographies for Vehicle Motion Estimation

Pär Lundgren

LiTH-ISY-EX–15/4846–SE
Linköping 2015

Department of Electrical Engineering
Linköping University

SE-581 83 Linköping, Sweden

Using Homographies for Vehicle Motion Estimation

Master’s Thesis in Automatic Control
completed at The Institute of Technology, Linköping University,

by

Pär Lundgren

LiTH-ISY-EX–15/4846–SE

Supervisors: Michael Roth, ISY, Linköpings universitet

Daniel Ankelhed, Autoliv Electronics AB

Examiner: Martin Enqvist, ISY, Linköpings universitet

Linköping, June 10, 2015

Keywords: Tracking, Unscented Kalman filter, homographies

Abstract

This master’s thesis describes a way to represent vehicles when tracking them through an image sequence. Vehicles are described with a state containing their position, velocity, size, etc. The thesis highlights the properties of homographies due to their suitability for estimation of projective transformations. The idea is to approximately represent vehicles with planes based on feature points found on the vehicles. The purpose of this approach is to estimate the displacement of a vehicle by estimating the transformation of these planes. Thus, when a vehicle is observed from behind, one plane approximates features found on the back and one plane approximates features found on the side, if the side of the vehicle is visible. The projective transformations of the planes are obtained by measuring the displacement of feature points.

The approach presented in this thesis builds on the prerequisite that a camera placed on a vehicle provides an image of its field of view. It does not cover how to find vehicles in an image and thus it requires that the patch which contains the vehicle is provided.

Even though this thesis covers large parts of image processing functionality, the focus is on how to represent vehicles and how to design an appropriate filter for improving estimates of vehicle displacement. Due to noisy feature points, approximation of planes, and estimated homographies, the obtained measurements are likely to be noisy. This requires a filter that can handle corrupt measurements and still use those that are not corrupt.

An unscented Kalman filter, UKF, is utilized in this implementation. The UKF is an approximate solution to nonlinear filtering problems and is here used to update the vehicle’s states by using measurements obtained from homographies. The unscented Kalman filter was chosen because of its ease of implementation and its potentially good performance.

The result is not a finished implementation for tracking of vehicles, but rather a first attempt at this approach. The result is not better than the existing approach, which might depend on one or several factors such as poorly estimated homographies, unreliable feature points and bad performance of the UKF.


Contents

1 Introduction
  1.1 Background
  1.2 Objective
  1.3 Limitations
  1.4 Related work
    1.4.1 Image processing
    1.4.2 Filtering methods
  1.5 Outline

2 Image Processing
  2.1 Background
  2.2 Feature point extraction
  2.3 Optical flow calculation
    2.3.1 Image registration
    2.3.2 Pyramidal implementation

3 Filtering
  3.1 Extended Kalman filter
  3.2 Unscented Kalman filter
  3.3 Filter properties

4 Projective geometry
  4.1 Background
  4.2 Projective geometry
    4.2.1 Homographies
  4.3 RANSAC

5 Implementation
  5.1 Background
  5.2 Initiating track
  5.3 Tracking
    5.3.1 Sigma points from predicted state
    5.3.2 Predicted measurements
    5.3.3 Optical flow of feature points
    5.3.4 Measurement from homography
    5.3.5 Measurement update
  5.4 Overview

6 Experiments and results
  6.1 Experiment description
  6.2 Obtained result
    6.2.1 Obtained tracks
  6.3 Discussion
  6.4 Evaluation

7 Conclusion
  7.1 Conclusions and remarks
  7.2 Future work

Bibliography

1 Introduction

This thesis is about target tracking of vehicles in the field of view of a camera that is mounted in a car. The goal is to construct a tracker that makes use of feature points in 2D images to create robust estimates of a vehicle’s position, velocity, pose, etc. The objective is to obtain a tracker that can provide the driver with accurate information regarding the poses of oncoming and preceding vehicles.

1.1 Background

By utilizing powerful computational resources, a large number of sensors and sophisticated dynamical models, it is today possible to obtain good estimates of what the surrounding environment looks like. Autoliv Electronics currently detects vehicles and pedestrians in 2D images and generates models of them. This enables a safety system that is capable of providing drivers with helpful information about the road ahead.

The existing system at Autoliv makes use of data from a stereo camera to retrieve information about the road ahead. The two cameras together provide two images, which are combined into a depth image, a 3D image, of the field of view, FOV, see Figure 1.1. From this it is possible to extract information regarding visible vehicles, for example the distance to a vehicle and its yaw rate.

Badino et al. [2009] introduced the stixel representation for 3D-world representation. By the use of stixels, that is, columns of pixels, the current system can estimate the length and the direction of vehicles. Stixels are obtained from the depth image. The concept is that if several adjacent pixels in an image column have roughly the same depth values, then they together constitute a stixel. Hence a stixel indicates that something is perpendicular to the camera axis.

[Figure 1.1: top-view diagram with the labels Camera, Camera axis, Field of view, Visible sides, Heading direction and Yaw.]

Figure 1.1: The camera is drawn solid with the camera axis as a dashed line. The vehicle is the rectangle heading in the direction of the arrow. The yaw angle is drawn between the heading direction of the vehicle and the dotted line that is parallel to the camera axis.

If stixels are detected in several neighbouring columns they can be grouped to represent a patch of the image. Grouping stixels requires that neighbouring stixels consist of similar depth values. When observing a vehicle, the stixels that are obtained from one visible side can be grouped into a patch that approximately represents a rectangle. When two sides of a vehicle are visible it is possible to approximate two rectangular patches. The choice of breaking point, where the two patches are divided, is not described here since the exact description of how stixels are grouped lies beyond the scope of this thesis. The stixels in each patch might vary in depth depending on the vehicle’s pose. Seen from above, the rectangles are represented by an L-shaped line consisting of the two sides that are facing the camera. This L-shape is determined by the existing system.

The heading direction of the vehicle is determined from the pose and the velocity vector of the obtained L-shape. The yaw rate is the time derivative of the estimated yaw angle, which is illustrated in Figure 1.1.

Due to the uncertainty of the currently used L-shape corner, the direction of the vehicle is uncertain as well. This motivates Autoliv to investigate additional methods to improve their current system and provide a more accurate representation of the surrounding vehicles.

The first step to track an object through a sequence of images with the help of feature points is to locate the object of interest in the images. This task is done by using a classifier. Since this is outside the scope of this thesis it is not further explained here. Given the patch of interest in the image, the feature points can be extracted.

Figure 1.2: Feature points located on the side are obtained on a much smaller part of the image even though they might be located further apart in the world. Since the length of the car is greater than the width, the opposite relationship would be desirable.

It is possible to obtain information regarding the object of interest by locating the same features in two consecutive images. The retrieved information depends on the approach that is used to represent the object and the assumptions that are made.

One might choose among different approaches to make use of the data retrieved from tracking feature points. Fundamental for utilizing feature points is to relate their movement in 2D images to the vehicle’s movement on the road. This requires an approach that can utilize the relationship among feature points and approximate the overall displacement between consecutive frames. This will introduce noise and uncertainty due to unreliable tracks of individual feature points. Hence it is important that the chosen approach can handle uncertainty and unreliable measurements.

First of all, individual feature points that are related to the vehicle are extracted and then tracked between images. When tracking feature points individually an important problem arises: one must relate the displacement of the tracked feature points between images to the vehicle’s motion. Since the obtained displacement of features has limited accuracy it is desirable to use a representation that is robust even though unreliable displacements are obtained.

Since vehicles are usually observed from behind in a slightly skewed position, the long side of the vehicle is seen within a small patch of the image. This is illustrated in Figure 1.1, where the long side refers to the left of the two visible sides. From this follows that feature points from the long side are more sensitive to changes in the yaw angle of the vehicle than features from the back of the vehicle. Figure 1.2 illustrates how the vehicle is observed by the camera in Figure 1.1, with feature points drawn.

Figure 1.1 and Figure 1.2 emphasize another problem: the feature points found on the side of the vehicle will transform in a different manner than those on the back. Therefore one must treat the feature points found on each side separately.

Feature points found on one side of the vehicle can be approximated as lying in a plane, hence feature points from two sides can be approximated with two planes. With this approximation one can describe the transformation of feature points between frames by the transformation of planes. Hence it is possible to approximately describe the change of pose of the vehicle from the approximated transformations of the planes.

1.2 Objective

The main objective of this thesis is to find a way to use the tracks of individual feature points for describing the vehicle motion accurately. A crucial aspect is the way that one chooses to represent the vehicle, since it determines how well the feature points can be used for describing projective transformations. The goal is to make use of feature points from both of the vehicle’s visible sides and to design a filter that utilizes the representation of vehicles by two planes to estimate states from the obtained measurements.

The transformation of feature points from both the back and the side of the vehicle should be described by a projective transformation of a plane. This should make it possible to capture any motion of the vehicle. Since the obtained displacements of feature points are likely to be noisy and since all feature points on one side are approximated as lying in a plane, a filter that handles uncertainty is required.

1.3 Limitations

The focus in this thesis is on tracking that involves filtering, modelling and handling of noise. Therefore, the functionality used for image processing is obtained from open source libraries.

One initial goal of the thesis was to evaluate different potential filtering methods. The aim was to evaluate the performance of the extended Kalman filter, EKF, and the unscented Kalman filter, UKF, separately and then compare them. This has not been done since the representation of vehicles was considered more interesting to investigate.

Another initial idea for this thesis was to make use of a depth image by augmenting feature points with data from this image and to use each feature point’s depth when tracking. This idea was discarded in favour of other approaches that were considered more interesting.

This thesis project was carried out at Autoliv Electronics, Linköping, and their existing system has been used to support this project’s implementation with data and functionality. The data provided by Autoliv’s existing system consist of images and the prediction of each vehicle’s state and state covariance. Two functions are provided: one that determines the region of interest, ROI, for a vehicle and one transition function. The ROI is derived from a state and it represents a patch in the image where the vehicle is located. The transition function performs the time update of a state. The use of a given transition function in this thesis is motivated by the objective of utilizing homographies for measurement updates. Therefore no effort has been made to increase the accuracy of Autoliv’s existing transition function.

1.4 Related work

Tracking objects in image sequences has been investigated for some time and a variety of approaches exist.

1.4.1 Image processing

Computer vision is today a highly active research area and there is a wide range of approaches to determine the best representation of the real world from images.

Since the goal of this thesis is to track vehicles through image sequences it is vital to find good features to track. Shi and Tomasi [1994] proposed a method for feature point extraction from 2D intensity images. The method is today well recognized and has been widely used for many years. The implementation in this thesis therefore uses Shi and Tomasi’s method to extract and select feature points.

When tracking feature points there are a number of considerations to take into account. Lucas and Kanade [1981] presented a feature point tracker that performs image registration through a local search. For example, they described how to utilize linearisation of image properties. Bouguet [2001] introduced an algorithm that performs tracking on images represented in a pyramidal form. By using a pyramidal representation, his algorithm was able to utilize the approximations introduced by Lucas and Kanade [1981] even when the displacement of image features was large between the frames.

These methods are well recognized and generally accepted as high-performance methods. Due to their availability and reliability, they are used for the underlying image processing in this thesis.

Autoliv uses a stereo camera to obtain a depth image, in addition to the gray-scale image. Many studies have been performed to evaluate different techniques that utilize the depth information, for example finding feature points in depth images. Among those are Loop Closure by the Help of Surface Elements, surfels [Weise et al., 2009] and Shape Index Mapping, SIM [Gedik and Alatan, 2013, He et al., 2013].

1.4.2 Filtering methods

Since the measurements obtained from tracking feature points can be unreliable it is desirable to have a filter that is able to handle unreliable measurements. A Kalman filter is likely to handle such circumstances rather well and is therefore chosen in this thesis [Gustafsson, 2010].

The Extended Kalman Filter, EKF, is a filter developed for handling nonlinear systems, which is the general case for real systems [Welch and Bishop, 1995]. The EKF linearises around the current mean to describe nonlinearities and hence it only approximates the actual system. Since nonlinearities are common when one observes vehicles in traffic the EKF was a potential candidate for use in this project.

Another filter that is developed for handling nonlinearities is the Unscented Kalman Filter, UKF. The UKF was introduced by Julier and Uhlmann [1997] and has since been further refined and given alternative designs. For example, the scaled UKF was introduced by Julier [2002].

The UKF approaches the problem of approximating nonlinearities in a different manner. It predicts the covariance of the system based on a few samples and hence it only approximates the system, as does the EKF. An important difference from the EKF is that the UKF does not linearise around the current mean to estimate the nonlinearities. In some cases the UKF might have advantages over the EKF [Wan and Van Der Merwe].

1.5 Outline

The outline of the thesis is as follows. The three following chapters, Chapters 2-4, contain the theory that is used in this thesis. Chapter 2 contains theory about image processing. It describes the means that are used for extracting features from images and tracking them through image sequences. Chapter 3 describes the filter theories that are of interest. It describes two fundamental filters that are used for nonlinear applications and contains a short part regarding the differences between the two and the arguments for the selected filter. Chapter 4 includes the theory used for the vehicle representation. It describes some parts of the mathematical concept known as projective geometry and the use of homographies. The three remaining chapters, Chapters 5-7, describe the implementation, results and conclusions. Chapter 5 describes how the implementation was performed. It relates in large part to the theory described in Chapters 2-4. Chapter 6 contains an evaluation method and the obtained results. It mentions possible causes for the lack of performance. Chapter 7 contains conclusions that have been drawn during the thesis and some reflections on the work. It suggests future work to improve the existing implementation and also an alternative approach.

2 Image Processing

This chapter describes the image processing methods that are utilized in this thesis. More specifically it handles topics such as feature point extraction and tracking of feature points.

2.1 Background

The images used in this thesis are gray-scale with 8-bit resolution. Image motion is derived from tracking of feature points, which means that one finds characteristic points in the image and follows these through a sequence of images.

Autoliv utilizes a classifier that supports the tracker with information about the region of the image in which it is interesting to search for features. This area is only a small part of the entire image and it enables the tracker to find features that are related to the object. The tracker can then extract feature points within this area to find reliable features to use for tracking. The area is referred to as the region of interest, ROI.

When determining image motion by feature point tracking, it is essential to choose reliable feature points. The first step when selecting feature points from images is to determine which information distinguishes good feature points from the rest of the image. Shi and Tomasi [1994] proposed a feature point criterion that is based upon the functionality of the tracker. Their focus was on determining the affine changes of features that arise when one performs tracking of moving objects.

It is assumed that the information that distinguishes a feature point is not transformed in a way that an affine transform cannot describe. This is assumed since a match between feature points only depends on a small area surrounding the feature and not on a large patch of the image, which is the case when matching larger objects between images. The feature points depend on a small area since they are chosen based on the characteristics of their immediate neighbourhood.

To begin with, image motion can be described as

J(x, y) = I(x − ξ(x, y), y − η(x, y)), (2.1)

where the latter image J can be obtained from the previous image I by moving every pixel, (x, y), from the previous image by the image displacement δ = (ξ, η) of the point x = (x, y).

Shi and Tomasi [1994] emphasized the importance of using an affine motion model that handles the fact that different points can move in different ways within an image. They show that this is superior to the alternative of only describing pure translation for image motion. Affine motion can be represented with

$$
\delta = Dx + d, \qquad \text{where } D = \begin{bmatrix} d_{xx} & d_{xy} \\ d_{yx} & d_{yy} \end{bmatrix} \tag{2.2}
$$

is a deformation matrix and d is the pure translation of the image centre. Point x in the first image I then moves to point Ax + d in the second image J, see Figure 2.1, where A = 1 + D and 1 is the identity matrix. Given this, two images can be related as

J(Ax + d) = I(x). (2.3)

The affine transformation that is mentioned here is not to be confused with the projective transformation that is used for modelling the entire vehicle, see Chapter 4.
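As a small numerical illustration of (2.2) and (2.3), the sketch below evaluates the displacement δ = Dx + d and the mapped point Ax + d for one point. The function name `affine_displacement` and all numerical values are hypothetical and chosen only for illustration; they are not taken from the thesis implementation.

```python
import numpy as np

def affine_displacement(x, D, d):
    """Return delta = D x + d and the mapped point A x + d, with A = 1 + D."""
    x = np.asarray(x, dtype=float)
    D = np.asarray(D, dtype=float)
    d = np.asarray(d, dtype=float)
    delta = D @ x + d          # displacement of the point, as in (2.2)
    A = np.eye(2) + D          # A = 1 + D, where 1 is the identity matrix
    return delta, A @ x + d

# Hypothetical values: a 1 % isotropic scaling plus a translation of (2, 1) pixels.
delta, mapped = affine_displacement(x=[10.0, 20.0],
                                    D=[[0.01, 0.0], [0.0, 0.01]],
                                    d=[2.0, 1.0])
```

The mapped point equals x + δ, which is the relation used when the two images are compared in (2.3).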

2.2 Feature point extraction

Shi and Tomasi [1994] brought forward the necessity of finding features that contain enough information to be tracked reliably. They proposed a feature point criterion that is optimal by construction since it selects features based on the functionality of the tracker.

The objective of tracking a feature is to find the matrix A and the vector d that minimize the dissimilarity

$$
\varepsilon = \int_W \bigl( J(Ax + d) - I(x) \bigr)^2 w(x)\,dx, \tag{2.4}
$$

where W is a window that is used to describe the feature’s characteristics and w(x) is a weighting function which for simplicity can be set to 1 [Shi and Tomasi, 1994]. Figure 2.1 shows the displacement vector for the feature window in image I and J, respectively.

Figure 2.1: Displacement vector Ax + d in image J corresponds to displacement vector x in image I. The car is just for illustration purposes; in practice it is not an appropriate feature.

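Linearisation of J(Ax + d) via Taylor expansion gives

$$
J(Ax + d) \approx J(x) + g^T \delta, \tag{2.5}
$$

where $g = \begin{bmatrix} g_x & g_y \end{bmatrix}^T = \begin{bmatrix} \dfrac{\partial J}{\partial x} & \dfrac{\partial J}{\partial y} \end{bmatrix}^T$ and δ = Dx + d. In accordance with Shi and Tomasi [1993] this yields the 6 × 6 linear system

$$
T z = a, \tag{2.6}
$$

where $z^T = \begin{bmatrix} d_{xx} & d_{yx} & d_{xy} & d_{yy} & d_x & d_y \end{bmatrix}$ contains the values of the deformation matrix D and the displacement d, a is an error vector and T is shown in (2.8). The error vector a depends on the differences of the images as

$$
a = \int_W \bigl[ I(x) - J(x) \bigr]
\begin{bmatrix} x g_x \\ x g_y \\ y g_x \\ y g_y \\ g_x \\ g_y \end{bmatrix}
w(x)\,dx. \tag{2.7}
$$

The 6 × 6 matrix T, which is obtained from one image, is derived as

$$
T = \int_W \begin{bmatrix} U & V \\ V^T & Z \end{bmatrix} w(x)\,dx, \tag{2.8}
$$

where

$$
U = \begin{bmatrix}
x^2 g_x^2 & x^2 g_x g_y & xy\,g_x^2 & xy\,g_x g_y \\
x^2 g_x g_y & x^2 g_y^2 & xy\,g_x g_y & xy\,g_y^2 \\
xy\,g_x^2 & xy\,g_x g_y & y^2 g_x^2 & y^2 g_x g_y \\
xy\,g_x g_y & xy\,g_y^2 & y^2 g_x g_y & y^2 g_y^2
\end{bmatrix},
\qquad
V = \begin{bmatrix}
x g_x^2 & x g_x g_y \\
x g_x g_y & x g_y^2 \\
y g_x^2 & y g_x g_y \\
y g_x g_y & y g_y^2
\end{bmatrix}
$$

and

$$
Z = \begin{bmatrix} g_x^2 & g_x g_y \\ g_x g_y & g_y^2 \end{bmatrix}.
$$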
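The block structure of T in (2.8) can be checked numerically. The sketch below is only an illustration under stated assumptions: the integral over W is replaced by a sum over a 5 × 5 window, w(x) is set to 1, the gradients are random numbers rather than real image gradients, and the name `build_T` is hypothetical. It forms T as the sum of outer products of the vector that appears in (2.7) and compares its lower-right block with Z.

```python
import numpy as np

def build_T(gx, gy, xs, ys):
    """Sum of v v^T over the window, with v = [x gx, x gy, y gx, y gy, gx, gy]."""
    v = np.stack([xs * gx, xs * gy, ys * gx, ys * gy, gx, gy], axis=-1)
    v = v.reshape(-1, 6)       # one row per pixel in the window
    return v.T @ v             # 6 x 6 matrix

# Hypothetical window: random gradients on a 5 x 5 grid centred at the origin.
rng = np.random.default_rng(0)
gx = rng.normal(size=(5, 5))
gy = rng.normal(size=(5, 5))
ys, xs = np.mgrid[-2:3, -2:3].astype(float)

T = build_T(gx, gy, xs, ys)
Z = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
              [np.sum(gx * gy), np.sum(gy * gy)]])
```

Here `T[4:, 4:]` coincides with `Z`, and `T[:4, :4]` and `T[:4, 4:]` correspond to the summed U and V blocks.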

However, since the deformation of a feature between two frames is assumed to be small, the deformation matrix D can be set to a zero matrix. Attempting to determine the deformation can actually lead to poor displacement solutions according to Shi and Tomasi [1994].

With D set to zero, the error vector a reduces to the error e, which contains only the last two entries of a. This provides

$$
Z d = e, \qquad \text{where } d^T = \begin{bmatrix} d_x & d_y \end{bmatrix}, \tag{2.9}
$$

which can be used to determine the displacement d.

Z is determined from one image, here image J, and is used to select features. It is necessary that the eigenvalues of the 2×2 matrix Z are larger than a certain value. This is to ensure that the feature’s characteristics are reliable to use for tracking. Neither can the eigenvalues of Z differ too much in magnitude, since that could correspond to a unidirectional texture pattern.

Two large eigenvalues could for example represent a corner that could be tracked reliably. One large eigenvalue could correspond to a line in the image. In practice, a predefined threshold λ works as a bound for which features are chosen and which are not. Thus, the patch is chosen to represent a feature if

min(λ1, λ2) > λ (2.10)

and discarded otherwise.
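The criterion (2.10) can be sketched as follows. This is a minimal illustration and not the open source routine used in the thesis: gradients are approximated with `numpy.gradient`, w(x) is set to 1, the whole patch is used as the window W, and the synthetic patches, the helper name `min_eigenvalue` and the threshold value are hypothetical.

```python
import numpy as np

def min_eigenvalue(patch):
    """Smaller eigenvalue of Z for a gray-scale patch, with w(x) = 1."""
    gy, gx = np.gradient(patch.astype(float))   # derivatives along rows (y) and columns (x)
    Z = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    return np.linalg.eigvalsh(Z)[0]              # eigenvalues are returned in ascending order

# Synthetic 9 x 9 patches: a corner, a vertical edge and a flat region.
corner = np.zeros((9, 9))
corner[4:, 4:] = 255.0
edge = np.zeros((9, 9))
edge[:, 4:] = 255.0
flat = np.full((9, 9), 128.0)

lam = 1e3  # hypothetical threshold, playing the role of lambda in (2.10)
selected = [min_eigenvalue(p) > lam for p in (corner, edge, flat)]
```

Only the corner patch passes the test. The edge has one large and one vanishing eigenvalue, and the flat patch has two vanishing eigenvalues, which matches the interpretation given above.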

In this project features are drawn from the part of the image that contains the vehicle. The best features are chosen and saved together with the vehicle’s state. The state of a vehicle contains data regarding the vehicle’s position, velocity, etc.

2.3 Optical flow calculation

When good features are found and related to a state in one image they can be used in the next, consecutive, image by finding the corresponding feature points there. The displacement of each feature between the two images can then be obtained, and this is called the optical flow. The Lucas-Kanade tracker [Lucas and Kanade, 1981] is a method used to determine the optical flow.

Figure 2.2: Displacement vector x + d in image J corresponds to displacement vector x in image I. The car is just for illustration purposes.

2.3.1 Image registration

By utilizing certain similarity measures one can map features between images, which is known as image registration. By minimizing those measures it is possible to perform image registration with the desired precision. The image registration procedure is what enables the computation of the optical flow.

Measures of similarity between images can be derived by taking either the L1 or the L2 norm as

$$
L_{1,\text{norm}} = \sum_{x \in W} \bigl| I(x) - J(x + d) \bigr|
\quad \text{and} \quad
L_{2,\text{norm}} = \Bigl( \sum_{x \in W} \bigl| I(x) - J(x + d) \bigr|^2 \Bigr)^{1/2}, \tag{2.11}
$$

where W is the integration window, that is, the part of the image that is interesting to compare.
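The two measures in (2.11) can be evaluated directly for a candidate displacement. The sketch below assumes an integer displacement and a square window that lies fully inside both images. The function name `window_norms` and the synthetic test images are hypothetical.

```python
import numpy as np

def window_norms(I, J, u, d, w):
    """L1 and L2 norms of I(x) - J(x + d) over a (2w+1) x (2w+1) window centred at u."""
    ux, uy = u
    dx, dy = d
    win_I = I[uy - w:uy + w + 1, ux - w:ux + w + 1].astype(float)
    win_J = J[uy + dy - w:uy + dy + w + 1, ux + dx - w:ux + dx + w + 1].astype(float)
    diff = win_I - win_J
    return np.abs(diff).sum(), np.sqrt((diff ** 2).sum())

# Synthetic example: J is I moved two pixels in x and one pixel in y.
rng = np.random.default_rng(1)
I = rng.integers(0, 256, size=(20, 20))
J = np.roll(I, shift=(1, 2), axis=(0, 1))

l1_true, l2_true = window_norms(I, J, u=(10, 10), d=(2, 1), w=3)
l1_zero, l2_zero = window_norms(I, J, u=(10, 10), d=(0, 0), w=3)
```

Both norms vanish at the true displacement and are larger for the wrong one, which is what the minimization in image registration relies on.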

The image registration procedure used by the Lucas-Kanade tracker aims to find the displacement vector d = [dx, dy] that minimizes the L2 norm, and hence the error function ε(d), for a window that surrounds the point u = (ux, uy). The error function is formulated as

$$
\varepsilon(d) = \varepsilon(d_x, d_y) = \bigl( L_{2,\text{norm}} \bigr)^2
= \sum_{x = u_x - w_x}^{u_x + w_x} \; \sum_{y = u_y - w_y}^{u_y + w_y}
\bigl( I(x, y) - J(x + d_x, y + d_y) \bigr)^2, \tag{2.12}
$$

which gives the size of the integration window as (2wx + 1) · (2wy + 1) pixels.

The optimal solution is derived by taking the first derivative of ε(d) with respect to d and finding its zero point as

$$
\frac{\partial \varepsilon(d)}{\partial d} = \begin{bmatrix} 0 & 0 \end{bmatrix}^T. \tag{2.13}
$$

By approximating J linearly, equation (2.12) becomes quadratic in dx and dy, and the minimum is then obtained by solving a linear system of equations.
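A single linearised step of this kind can be sketched as follows. With J(x + d) approximated by J(x) + gᵀd, setting the derivative in (2.13) to zero gives the 2 × 2 system Zd = e from (2.9). The code is a simplified illustration and not the library routine used in the thesis: it takes one step without iteration or pyramid, uses `numpy.gradient` for the image derivatives, and the name `lk_step` and the Gaussian test image are assumptions made for the example.

```python
import numpy as np

def lk_step(I, J, u, w):
    """One linearised step: solve Z d = e over a (2w+1) x (2w+1) window centred at u."""
    ux, uy = u
    rows = slice(uy - w, uy + w + 1)
    cols = slice(ux - w, ux + w + 1)
    gy, gx = np.gradient(J.astype(float))        # gradients of image J
    gx, gy = gx[rows, cols], gy[rows, cols]
    diff = I[rows, cols].astype(float) - J[rows, cols].astype(float)
    Z = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    e = np.array([np.sum(diff * gx), np.sum(diff * gy)])
    return np.linalg.solve(Z, e)

def blob(x, y, cx=20.0, cy=20.0, sigma=6.0):
    """Smooth synthetic intensity pattern (hypothetical test image)."""
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))

ys, xs = np.mgrid[0:41, 0:41].astype(float)
I = blob(xs, ys)
J = blob(xs - 0.3, ys + 0.2)     # J(x + d) = I(x) with d = (0.3, -0.2)
d_est = lk_step(I, J, u=(20, 20), w=7)
```

For this small, sub-pixel displacement the estimate is close to (0.3, -0.2). For larger displacements the linearisation breaks down, which is the motivation for the pyramidal implementation in the next section.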

2.3.2 Pyramidal implementation

Due to the first-order Taylor expansion, the Lucas-Kanade feature point tracker is only a good approximation when the displacement of feature points is small. Bouguet [2001] introduced a method that is able to utilize the properties of the Lucas-Kanade tracker and also handle large displacements of feature points. This method is based on a pyramidal image representation.

Pyramid image

Given the image I with a resolution of nx × ny, an image pyramid is derived by generating subimages recursively. Each pyramid level has half the resolution of its preceding level.

The native image is hence at level 0 and the coarsest image is at level Lm. Each level L is derived as

$$
\begin{aligned}
I^L(x, y) ={} & \tfrac{1}{4}\, I^{L-1}(2x, 2y) \\
& + \tfrac{1}{8} \bigl( I^{L-1}(2x - 1, 2y) + I^{L-1}(2x + 1, 2y) + I^{L-1}(2x, 2y - 1) + I^{L-1}(2x, 2y + 1) \bigr) \\
& + \tfrac{1}{16} \bigl( I^{L-1}(2x - 1, 2y - 1) + I^{L-1}(2x + 1, 2y + 1) + I^{L-1}(2x - 1, 2y + 1) + I^{L-1}(2x + 1, 2y - 1) \bigr).
\end{aligned} \tag{2.14}
$$

An implementation according to equation (2.14) makes use of the lowpass filter $[\,1/4 \;\; 1/2 \;\; 1/4\,] \times [\,1/4 \;\; 1/2 \;\; 1/4\,]^T$ for anti-aliasing purposes. However, in practice a larger lowpass filter such as $[\,1/16 \;\; 1/4 \;\; 3/8 \;\; 1/4 \;\; 1/16\,] \times [\,1/16 \;\; 1/4 \;\; 3/8 \;\; 1/4 \;\; 1/16\,]^T$ is used, according to the implementation proposed by Bouguet [2001].

Bouguet [2001] defines dummy image values one pixel around the image $I^{L-1}$ according to

$$
\begin{aligned}
I^{L-1}(-1, y) &= I^{L-1}(0, y), \\
I^{L-1}(x, -1) &= I^{L-1}(x, 0), \\
I^{L-1}(n_x^{L-1}, y) &= I^{L-1}(n_x^{L-1} - 1, y), \\
I^{L-1}(x, n_y^{L-1}) &= I^{L-1}(x, n_y^{L-1} - 1), \\
I^{L-1}(n_x^{L-1}, n_y^{L-1}) &= I^{L-1}(n_x^{L-1} - 1, n_y^{L-1} - 1),
\end{aligned}
$$

for $0 \le x \le n_x^{L-1} - 1$ and $0 \le y \le n_y^{L-1} - 1$.


From this follows that equation (2.14) is defined for $x$ and $y$ that satisfy $0 \le 2x \le n_x^{L-1} - 1$ and $0 \le 2y \le n_y^{L-1} - 1$. Hence, the width $n_x^L$ and height $n_y^L$ of image $I^L$ are the largest integers that fulfil the two criteria

$$
n_x^L \le \frac{n_x^{L-1} + 1}{2}, \qquad n_y^L \le \frac{n_y^{L-1} + 1}{2}. \tag{2.15}
$$

From the image pyramid determined by (2.14) and (2.15) it is possible to handle large pixel motions in the image, and still keep the window used for integration small in the subimages. This enables the tracker to utilize the first order Taylor expansion. Typical numbers of levels that are used are 2, 3 or 4, depending on the maximum expected optical flow.
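The construction of one pyramid level can be sketched in a few lines of Python. This is a minimal illustration of (2.14), the border rule and the size rule (2.15), not the implementation used in the thesis. The function names `reduce_level` and `build_pyramid` are hypothetical, and images are assumed to be plain lists of rows indexed as `img[y][x]`.

```python
def reduce_level(img):
    """Compute pyramid level L from level L-1 following (2.14).

    img is a list of rows, indexed img[y][x]. Pixels one step outside
    the image are replaced by the nearest border pixel.
    """
    ny, nx = len(img), len(img[0])

    def px(x, y):
        # Border replication: clamp the coordinates into the valid range.
        return img[min(max(y, 0), ny - 1)][min(max(x, 0), nx - 1)]

    # Largest integers allowed by (2.15).
    out_nx, out_ny = (nx + 1) // 2, (ny + 1) // 2
    out = []
    for y in range(out_ny):
        row = []
        for x in range(out_nx):
            cx, cy = 2 * x, 2 * y
            centre = px(cx, cy)
            edges = (px(cx - 1, cy) + px(cx + 1, cy)
                     + px(cx, cy - 1) + px(cx, cy + 1))
            corners = (px(cx - 1, cy - 1) + px(cx + 1, cy + 1)
                       + px(cx - 1, cy + 1) + px(cx + 1, cy - 1))
            row.append(centre / 4 + edges / 8 + corners / 16)
        out.append(row)
    return out


def build_pyramid(img, levels):
    """Return the list [I^0, I^1, ..., I^levels]."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid
```

Since the filter weights sum to one, a constant image stays constant on every level, which gives a simple sanity check for an implementation.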

Pyramid tracking

For a given point $x$ in image $I$, the corresponding point $x^L = [x^L_x\ \ x^L_y]^T$ in image $I^L$ is

$$
x^L = \frac{x}{2^L}. \tag{2.16}
$$

The residual pixel displacement for each image level $L$ is derived analogously to equation (2.12). In addition to (2.12), each pyramid level $L$ is provided an initial guess, $g^L = [g^L_x\ \ g^L_y]^T$, of the optical flow. This gives the expression

$$
\varepsilon^L(d^L) = \varepsilon^L(d^L_x, d^L_y) = \sum_{x=x^L_x-w_x}^{x^L_x+w_x} \sum_{y=x^L_y-w_y}^{x^L_y+w_y} \big(I^L(x, y) - J^L(x + g^L_x + d^L_x, y + g^L_y + d^L_y)\big)^2 \tag{2.17}
$$

to minimize with respect to $d^L$. The initial guess, $g^L$, for each image level depends on $g^{L+1}$ and the displacement vector $d^{L+1}$ found in the previously evaluated level, $L + 1$, according to

$$
g^L = 2\,(g^{L+1} + d^{L+1}). \tag{2.18}
$$

Using the initial guess $g^{L_m} = [0\ \ 0]^T$, the final optical flow can be expressed as

$$
d = \sum_{L=0}^{L_m} 2^L d^L. \tag{2.19}
$$
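That the recursion (2.18) and the closed form (2.19) agree can be checked numerically. The sketch below is only an illustration; the function names and the residual values in the usage note are made up, and `residuals[L]` is assumed to hold the residual flow $d^L$ found at level $L$.

```python
def propagate(residuals):
    """Run (2.18) from the top level down and return g^0 + d^0.

    residuals[L] is the residual flow d^L = (dx, dy) found at level L.
    """
    g = (0.0, 0.0)  # initial guess g^{Lm} at the coarsest level
    for level in range(len(residuals) - 1, 0, -1):
        d = residuals[level]
        # g^{L-1} = 2 (g^L + d^L)
        g = (2 * (g[0] + d[0]), 2 * (g[1] + d[1]))
    d0 = residuals[0]
    return (g[0] + d0[0], g[1] + d0[1])


def closed_form(residuals):
    """Evaluate the sum in (2.19) directly."""
    return (sum(2 ** level * d[0] for level, d in enumerate(residuals)),
            sum(2 ** level * d[1] for level, d in enumerate(residuals)))
```

For example, with the made-up residuals `[(0.5, -0.25), (1.0, 0.5), (-0.5, 0.75)]` for levels 0, 1 and 2, both functions return the same total flow.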

Iterative computation

Provided the pyramidal representation of the images, $\{I^L\}_{L=0,\dots,L_m}$ and $\{J^L\}_{L=0,\dots,L_m}$, the goal for every level $L$ is to minimize (2.17) to obtain the optimal solution as in (2.13). The solution is obtained by iterative computations in a Newton-Raphson style. That is, the optimal solution is found by iterating in the direction of the minimum $L_2$ norm.

Provided the initial guess $v^{k-1} = [v^{k-1}_x\ \ v^{k-1}_y]^T$ for each iteration $k$ and by denoting $J^L_k(x, y) = J^L(x + v^{k-1}_x, y + v^{k-1}_y)$, the solution for iteration $k$ is obtained by minimizing the error function $\varepsilon^k$,

$$
\varepsilon^k(n^k) = \varepsilon^k(n^k_x, n^k_y) = \sum_{x=x^L_x-w_x}^{x^L_x+w_x} \sum_{y=x^L_y-w_y}^{x^L_y+w_y} \big(I^L(x, y) - J^L_k(x + n^k_x, y + n^k_y)\big)^2, \tag{2.20}
$$

with respect to $n^k = (n^k_x, n^k_y)$.

The vector $v$ is updated in each iteration according to

$$
v^k = v^{k-1} + n^k. \tag{2.21}
$$

If the norm of the vector $n^k$ is smaller than the accuracy threshold, the iteration can be terminated and the solution $v^k$ can be used for the initial guess of the next pyramidal level, $L - 1$. The residual displacement for level $L$ is then

$$
d^L = v^k, \tag{2.22}
$$

which is used according to (2.18) to update the initial guess $g^{L-1}$ for the next pyramidal level, $L - 1$.

Optical flow algorithm

Algorithm 1 summarizes the sequence of derivations made when finding the optimum optical flow according to the pyramidal implementation of the Lucas-Kanade tracker described by Bouguet [2001].


Algorithm 1 Find the point $v$ in image $J$ that corresponds to the point $u$ in image $I$.

    Build pyramid representations of I and J: $\{I^L\}_{L=0,\dots,L_m}$ and $\{J^L\}_{L=0,\dots,L_m}$
    Initial pyramidal guess: $g^{L_m} = [g^{L_m}_x\ \ g^{L_m}_y]^T = [0\ \ 0]^T$
    for L = L_m to 0 step -1 do
        Location of u in image $I^L$: $u^L = [u_x\ \ u_y]^T = u / 2^L$
        Estimate the derivative of $I^L$ with respect to x: $I_x(x, y) = \big(I^L(x+1, y) - I^L(x-1, y)\big) / 2$
        Estimate the derivative of $I^L$ with respect to y: $I_y(x, y) = \big(I^L(x, y+1) - I^L(x, y-1)\big) / 2$
        Spatial gradient matrix: $G = \sum_{x=u_x-w_x}^{u_x+w_x} \sum_{y=u_y-w_y}^{u_y+w_y} \begin{bmatrix} I_x^2(x, y) & I_x(x, y) I_y(x, y) \\ I_x(x, y) I_y(x, y) & I_y^2(x, y) \end{bmatrix}$
        Initialization of the iteration at level L: $v^0 = [0\ \ 0]^T$
        for k = 1 to K step 1 (or until $\|n^k\|$ < accuracy threshold) do
            Image difference: $\delta I_k(x, y) = I^L(x, y) - J^L(x + g^L_x + v^{k-1}_x, y + g^L_y + v^{k-1}_y)$
            Image mismatch vector: $b_k = \sum_{x=u_x-w_x}^{u_x+w_x} \sum_{y=u_y-w_y}^{u_y+w_y} \begin{bmatrix} \delta I_k(x, y) I_x(x, y) \\ \delta I_k(x, y) I_y(x, y) \end{bmatrix}$
            Optical flow (according to Lucas-Kanade): $n^k = G^{-1} b_k$
            Guess for next iteration: $v^k = v^{k-1} + n^k$
        end for
        Final optical flow for level L: $d^L = v^K$
        Guess for next level L − 1: $g^{L-1} = [g^{L-1}_x\ \ g^{L-1}_y]^T = 2\,(g^L + d^L)$
    end for
    Final optical flow: $d = g^0 + d^0$
    Location of point u in image J: $v = u + d$
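The inner loop of Algorithm 1 can be illustrated for a single pyramid level (so that $g^L = 0$). The following is a minimal sketch and not the OpenCV implementation used later in the thesis. It assumes that the images are given as Python callables that return an intensity for real-valued coordinates, which avoids the sub-pixel interpolation a real tracker needs; the function name `track_point` and its parameters are hypothetical.

```python
import math


def track_point(I, J, u, w=3, max_iter=20, eps=1e-6):
    """Single-level Lucas-Kanade iteration for one point.

    I, J: callables giving the image intensity at real-valued (x, y).
    u:    integer pixel (ux, uy) in image I.
    Returns the estimated displacement (vx, vy) of u from I to J.
    """
    ux, uy = u
    window = [(x, y) for x in range(ux - w, ux + w + 1)
              for y in range(uy - w, uy + w + 1)]
    # Central-difference gradients of I, computed once for the level.
    grads = [((I(x + 1, y) - I(x - 1, y)) / 2,
              (I(x, y + 1) - I(x, y - 1)) / 2) for x, y in window]
    # Entries of the spatial gradient matrix G.
    gxx = sum(ix * ix for ix, _ in grads)
    gxy = sum(ix * iy for ix, iy in grads)
    gyy = sum(iy * iy for _, iy in grads)
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        raise ValueError("G is singular: not enough texture in the window")

    vx = vy = 0.0
    for _ in range(max_iter):
        bx = by = 0.0
        for (x, y), (ix, iy) in zip(window, grads):
            diff = I(x, y) - J(x + vx, y + vy)  # image difference
            bx += diff * ix
            by += diff * iy
        # n = G^{-1} b for the symmetric 2x2 matrix G.
        nx = (gyy * bx - gxy * by) / det
        ny = (gxx * by - gxy * bx) / det
        vx, vy = vx + nx, vy + ny
        if math.hypot(nx, ny) < eps:
            break
    return vx, vy
```

With a smooth synthetic image such as `I(x, y) = sin(0.3 x) + cos(0.25 y)` and a copy `J` shifted by a fraction of a pixel, the iteration is expected to recover the shift after a few steps.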

3 Filtering

This chapter describes the two filters that are of interest for this implementation. It describes their properties and compares their suitability for handling measurements obtained from feature point tracking.

In this thesis, the objective is to obtain a filter that handles measurements of feature points obtained by the tracker described in Chapter 2 [Bouguet, 2001]. The filter is required to handle non-linearities such as the non-linear displacement of feature points between frames. The goal is to use the obtained measurements in accordance with their reliability, depending on which measurement model is used.

Two filters for approximate filtering in non-linear models have been considered for this thesis. Autoliv currently uses an extended Kalman filter (EKF) for target tracking. The other alternative of interest is the unscented Kalman filter (UKF), which has become rather popular during the last decades.

3.1 Extended Kalman filter

The EKF is a non-linear version of the Kalman filter. The EKF is widely used and has become a standard technique in a number of non-linear estimation applications during the last decades, often with good success [Wan and Van Der Merwe, Julier and Uhlmann, 2004].

The EKF can be applied to a nonlinear state-space model

\begin{align}
x_k &= f(x_{k-1}, u_{k-1}) + w_{k-1} \tag{3.1a} \\
z_k &= h(x_k) + v_k, \tag{3.1b}
\end{align}

where $f$ is the transition function and $h$ the observation model. Furthermore, $x_k$ and $z_k$ are the state and measurement at time $k$, respectively. The process and observation noises, $w_k$ and $v_k$, are assumed to be zero-mean multivariate Gaussian noises with covariances $Q_k$ and $R_k$, respectively.

The EKF time update is made according to

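\begin{align}
\hat{x}_{k|k-1} &= f(\hat{x}_{k-1|k-1}, u_{k-1}) \tag{3.2a} \\
P_{k|k-1} &= F_{k-1} P_{k-1|k-1} F_{k-1}^T + Q_k \tag{3.2b}
\end{align}

where

$$
F_{k-1} = \left. \frac{\partial f}{\partial x} \right|_{\hat{x}_{k-1|k-1},\, u_{k-1}}.
$$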

The measurement update is made according to

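\begin{align}
y_k &= z_k - h(\hat{x}_{k|k-1}) \tag{3.3a} \\
S_k &= H_k P_{k|k-1} H_k^T + R_k \tag{3.3b} \\
K_k &= P_{k|k-1} H_k^T S_k^{-1} \tag{3.3c} \\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k y_k \tag{3.3d} \\
P_{k|k} &= (I - K_k H_k) P_{k|k-1}, \tag{3.3e}
\end{align}

where

$$
H_k = \left. \frac{\partial h}{\partial x} \right|_{\hat{x}_{k|k-1}}.
$$

The matrices $P_{k|k-1}$ and $P_{k|k}$ are the predicted and corrected covariance, respectively, for the state estimation error $x - \hat{x}$.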

The EKF’s weak point is the linearisation around the state estimate, which might be inadequate. According to Julier and Uhlmann [2004], this is what has driven researchers to find alternative methods. Even though the EKF has some known flaws, it has been widely used because it is simple, has relatively low computational complexity and often works well.
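As an illustration of (3.2) and (3.3), the sketch below performs one EKF cycle for a scalar state and a scalar measurement, so that all matrices reduce to numbers. It is a simplified example under stated assumptions and not Autoliv's filter: the control input $u$ is omitted, and the function name `ekf_step` and its argument names are hypothetical.

```python
def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One EKF cycle for a scalar state and a scalar measurement.

    f, h: transition and observation functions.
    F, H: their derivatives with respect to the state.
    Q, R: process and measurement noise variances.
    """
    # Time update, cf. (3.2); F is evaluated at the previous estimate.
    x_pred = f(x)
    F_k = F(x)
    P_pred = F_k * P * F_k + Q

    # Measurement update, cf. (3.3); H is evaluated at the prediction.
    H_k = H(x_pred)
    y = z - h(x_pred)              # innovation
    S = H_k * P_pred * H_k + R     # innovation variance
    K = P_pred * H_k / S           # Kalman gain
    x_new = x_pred + K * y
    P_new = (1 - K * H_k) * P_pred
    return x_new, P_new
```

A non-linear model is used by passing, for example, `f=lambda x: x + 0.1 * math.sin(x)` together with its derivative `F=lambda x: 1 + 0.1 * math.cos(x)`. With identity models the step reduces to the ordinary Kalman filter.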

3.2 Unscented Kalman filter

Julier and Uhlmann [1997] proposed the UKF as a new extension of the Kalman filter for non-linear systems. They claimed the performance of the UKF to be equivalent to that of a Kalman filter in the linear case. The UKF was developed to address the weakness of the EKF, according to Julier and Uhlmann [2004], and it makes no use of linearisation via Taylor expansion.


Unscented transform

Uhlmann [1995] presented in his doctoral dissertation the unscented transform, UT, which would be fundamental for the filter that he and Simon Julier would later introduce. The UT is based on the intuition that it is easier to estimate a probability distribution than it is to estimate a non-linear function or transformation [Uhlmann, 1994]. The UT estimates the effect of a non-linear transformation by applying the transformation to a finite set of samples, known as sigma points. The sigma points are distributed around the last estimated state so that their mean and covariance are in accordance with the estimated mean and covariance of the state.

The sigma points are hence likely to represent some aspects of the probability distribution of the given state. When the non-linear transformation is applied to the state and the sigma points, the idea is that the result will reflect the transformation of the entire probability distribution, see Figure 3.1.

Figure 3.1: The unscented transform aims to reflect the transformation of the entire probability distribution of the given state.

Each sigma point, $\chi_i$, is assigned a weight, $W_i$, which can take any value, positive or negative, under the condition that

$$
\sum_{i=0}^{2L} W_i = 1, \tag{3.4}
$$

where $L$ is the dimension of the state [Julier and Uhlmann, 2004]. In total, $2L + 1$ sigma points are generated, one at the mean and $2L$ on the contour of the covariance. This is to retrieve an unbiased transformation of the state and can be achieved through a number of variants of weights.

Time update

The UKF utilizes the unscented transform to update the state estimate and its covariance. When doing the time update of the state, the first stage is to generate sigma points corresponding to the previous estimate of the state, $\hat{x}_{k-1|k-1}$, and covariance, $P_{k-1|k-1}$. The sigma points can be distributed according to the scaled unscented transform as

$$
\begin{aligned}
\chi^0_{k-1|k-1} &= \hat{x}_{k-1|k-1} \\
\chi^i_{k-1|k-1} &= \hat{x}_{k-1|k-1} + \Big(\sqrt{(L + \lambda) P_{k-1|k-1}}\Big)_i, && i = 1, \dots, L \\
\chi^i_{k-1|k-1} &= \hat{x}_{k-1|k-1} - \Big(\sqrt{(L + \lambda) P_{k-1|k-1}}\Big)_{i-L}, && i = L + 1, \dots, 2L
\end{aligned} \tag{3.5}
$$

where the subscripts $i$ and $i - L$ indicate the column of the matrix within parentheses. The scaling parameter is $\lambda = \alpha^2 (L + \kappa) - L$, where $\kappa$ usually is set to 0 and $\alpha$ to $10^{-3}$ [Julier, 2002].

The square root of $P_{k-1|k-1}$ is obtained through Cholesky decomposition of the covariance matrix as

$$
P_{k-1|k-1} = A A^T \;\Rightarrow\; \sqrt{P_{k-1|k-1}} = A^T. \tag{3.6}
$$

Thereafter, the sigma points are propagated through the transition function $f$ and are time updated according to

$$
\chi^i_{k|k-1} = f(\chi^i_{k-1|k-1}), \quad i = 0, \dots, 2L. \tag{3.7}
$$

The sigma points are assigned weights according to the scaled unscented transform proposed by Julier [2002] as

$$
\begin{aligned}
W^0_m &= \frac{\lambda}{L + \lambda} \\
W^0_c &= \frac{\lambda}{L + \lambda} + (1 - \alpha^2 + \beta) \\
W^i_m &= W^i_c = \frac{1}{2 (L + \lambda)}, \quad i = 1, \dots, 2L
\end{aligned} \tag{3.8}
$$

where $\beta$ is set to 2, which is optimal for the case with Gaussian distribution [Julier, 2002].
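The generation of sigma points and weights can be sketched as follows. This is an illustration and not the thesis implementation. The function names are hypothetical, and the sketch makes one explicit assumption about the matrix square root: it uses the columns of the lower-triangular Cholesky factor $A$ with $P = A A^T$, a convention under which the weighted sample covariance of the sigma points reproduces $P$.

```python
import math


def cholesky(P):
    """Lower-triangular A with P = A A^T (P symmetric positive definite)."""
    n = len(P)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = P[i][j] - sum(A[i][k] * A[j][k] for k in range(j))
            A[i][j] = math.sqrt(s) if i == j else s / A[j][j]
    return A


def sigma_points(x, P, alpha=1e-3, kappa=0.0, beta=2.0):
    """Sigma points as in (3.5) with the weights of (3.8).

    Assumption: the columns of the lower-triangular factor A are used
    as the columns of the matrix square root.
    """
    n = len(x)
    lam = alpha ** 2 * (n + kappa) - n
    A = cholesky(P)
    scale = math.sqrt(n + lam)
    cols = [[scale * A[r][c] for r in range(n)] for c in range(n)]
    points = [list(x)]
    points += [[x[r] + col[r] for r in range(n)] for col in cols]
    points += [[x[r] - col[r] for r in range(n)] for col in cols]
    wm = [lam / (n + lam)] + [1 / (2 * (n + lam))] * (2 * n)
    wc = list(wm)
    wc[0] += 1 - alpha ** 2 + beta
    return points, wm, wc
```

For a two-dimensional state the function returns five points. Note that with $\alpha = 10^{-3}$ the centre weight becomes a large negative number while the weights still sum to one.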

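The weights are used to provide the predicted state and the predicted covariance from the sigma points as

\begin{align}
\hat{x}_{k|k-1} &= \sum_{i=0}^{2L} W^i_m \chi^i_{k|k-1} \tag{3.9a} \\
P_{k|k-1} &= \sum_{i=0}^{2L} W^i_c \big(\chi^i_{k|k-1} - \hat{x}_{k|k-1}\big) \big(\chi^i_{k|k-1} - \hat{x}_{k|k-1}\big)^T. \tag{3.9b}
\end{align}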

Measurement update

The next step in the UKF algorithm is to predict the measurement, $\hat{z}_k$. The result of (3.7) is propagated through the observation model,

$$
\gamma^i_k = h(\chi^i_{k|k-1}), \quad i = 0, \dots, 2L. \tag{3.10}
$$

The observations, $\gamma^i_k$, combined with their respective weights generate the predicted measurement, $\hat{z}_k$. This follows from

$$
\hat{z}_k = \sum_{i=0}^{2L} W^i_m \gamma^i_k. \tag{3.11}
$$

The estimated measurement covariance is computed according to

$$
P_{z_k z_k} = \sum_{i=0}^{2L} W^i_c (\gamma^i_k - \hat{z}_k)(\gamma^i_k - \hat{z}_k)^T. \tag{3.12}
$$

The state-measurement cross-covariance is obtained from

$$
P_{x_k z_k} = \sum_{i=0}^{2L} W^i_c (\chi^i_{k|k-1} - \hat{x}_{k|k-1})(\gamma^i_k - \hat{z}_k)^T \tag{3.13}
$$

and the Kalman gain is obtained as

$$
K_k = P_{x_k z_k} P_{z_k z_k}^{-1} \tag{3.14}
$$

where R is the covariance matrix of the measurement noise.

The state is then updated by adding the innovation weighted by the Kalman gain to the predicted state, $\hat{x}_{k|k-1}$. The innovation comes from the difference between the predicted measurement, $\hat{z}_k$, and the obtained measurement, $z_k$. The update is carried out according to

$$
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (z_k - \hat{z}_k) \tag{3.15}
$$

and the covariance is updated as

$$
P_{k|k} = P_{k|k-1} - K_k P_{z_k z_k} K_k^T. \tag{3.16}
$$
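The complete UKF cycle can be illustrated for a scalar state ($L = 1$), where all covariances are plain numbers. This is a sketch under stated assumptions and not the filter implemented in the thesis. It assumes additive noise, so that the variances `Q` and `R` are added to the predicted state and measurement covariances (a step the equations above do not show explicitly), and it draws a fresh set of sigma points from the predicted state before the measurement update. All function names are hypothetical.

```python
import math


def sigma_points_1d(x, P, alpha=1e-3, kappa=0.0, beta=2.0):
    """Sigma points and weights for a scalar state, cf. (3.5) and (3.8)."""
    L = 1
    lam = alpha ** 2 * (L + kappa) - L
    s = math.sqrt((L + lam) * P)
    pts = [x, x + s, x - s]
    wm = [lam / (L + lam), 0.5 / (L + lam), 0.5 / (L + lam)]
    wc = [wm[0] + 1 - alpha ** 2 + beta, wm[1], wm[2]]
    return pts, wm, wc


def ukf_predict(x, P, f, Q):
    """Time update, cf. (3.7) and (3.9). Assumption: additive noise Q."""
    pts, wm, wc = sigma_points_1d(x, P)
    prop = [f(p) for p in pts]
    x_pred = sum(w * p for w, p in zip(wm, prop))
    P_pred = sum(w * (p - x_pred) ** 2 for w, p in zip(wc, prop)) + Q
    return x_pred, P_pred


def ukf_update(x_pred, P_pred, z, h, R):
    """Measurement update, cf. (3.10)-(3.16). Assumption: additive noise R."""
    pts, wm, wc = sigma_points_1d(x_pred, P_pred)
    gam = [h(p) for p in pts]
    z_hat = sum(w * g for w, g in zip(wm, gam))
    P_zz = sum(w * (g - z_hat) ** 2 for w, g in zip(wc, gam)) + R
    P_xz = sum(w * (p - x_pred) * (g - z_hat)
               for w, p, g in zip(wc, pts, gam))
    K = P_xz / P_zz
    return x_pred + K * (z - z_hat), P_pred - K * P_zz * K
```

For linear models such as `f = lambda x: 0.9 * x` and `h = lambda x: 2.0 * x`, the result should agree with the ordinary Kalman filter up to rounding errors, which is a convenient check of an implementation.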

3.3 Filter properties

Regarding the handling of non-linearities, both the EKF and the UKF are limited since they both utilize approximations that introduce a loss in performance. When non-linearities occur, approximations cannot be more than qualified guesses, and it is therefore hard to tell which of the filters is best.

Even though the UKF has gained popularity in recent years, there is no conclusive result about whether it is better or worse than the EKF. Julier and Uhlmann [1997] claim that the performance of the UKF is better than the performance of the EKF. Another point of view is given by Gustafsson and Hendeby [2012]. They show that the claimed performance of the UKF does not hold for all cases, but they also show that the UKF succeeds in providing a good approximation for many common sensor models.

The UKF has been chosen in this project due to its ease of implementation and also due to its potentially good performance. The implemented UKF is described in Chapter 5. Details regarding the observation model are described there, i.e. how obtained measurements from feature points are handled.

4 Projective geometry

This chapter describes some parts of the mathematical concept known as projective geometry. In large parts it relates to the theory described by Hartley and Zisserman [2004].

More specifically, it focuses on the means that are used in this thesis for describing transformations of vehicle models. The transformation of interest appears when vehicles are tracked and one tries to match their appearance between frames. The projective transformation is important when matching vehicles that are observed in different poses and from different angles.

4.1 Background

The projective transformation extends the properties of the affine transformation that was mentioned in Chapter 2. For example, when an affine transformation is used for tracking objects, the parallel lines belonging to the object are preserved. This is not desirable since it is not realistic to assume that parallel lines remain parallel between frames. When, for example, a vehicle seen from behind changes pose, that is, appears from a different angle, the part rotated away from the camera becomes smaller in the image. Similarly, the part that is rotated towards the camera is enlarged. In traffic, these so-called cornerstone effects are likely to appear for vehicles that are observed by a camera.

If the visible sides of a vehicle are represented with two planes, the projective transformation of those planes is a good estimate of how the vehicle is moving. The idea is to represent planes with feature points found on the vehicle and then obtain the projective transformation by tracking the features and matching the planes. Figure 4.1 shows the projective transformation of a rotated plane. This is the kind of transformation that is desired to estimate by tracking feature points from vehicles between frames.

Figure 4.1: Cornerstone effect of a rectangle. The right part is rotated inwards and the left part outwards. The reference is drawn with dashed lines.

4.2 Projective geometry

Projective geometry is the field of geometry where geometric properties are invariant under projective transformations. The projective space, $\mathbb{P}^2$, is what enables the projective geometry, and it consists of a set of lines that pass through the origin of a vector space. One property of the projective space is that two lines that are parallel in Euclidean space are said to intersect at infinity. This is analogous to a railway track meeting at the horizon, see Figure 4.2. From this follows that angles are not relevant in a projective space since they are not invariant under projective transformation.

Figure 4.2: Picture from Pixabay [2015]. The parallel tracks intersect at the horizon. In a projective space each line represents a point.

Homogeneous coordinates, also known as projective coordinates, are utilized in projective transformation. A point $\mathbf{x} = (x, y)$ in 2D space is represented in homogeneous coordinates as $\mathbf{x} = (x_1, x_2, x_3)^T$ where the scale factor $x_3 \neq 0$. The point $\mathbf{x}$ can then be described in inhomogeneous coordinates as

$$
x = \frac{x_1}{x_3}, \qquad y = \frac{x_2}{x_3}. \tag{4.1}
$$

Multiplying a point with a non-zero constant in homogeneous coordinates results in the same point since it is still on the same line in the projective space. For example, $\mathbf{x} = (x, y, 1)$ is a point in homogeneous coordinates which is considered to be equivalent to $\mathbf{x} = (2x, 2y, 2)$.
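The conversion in (4.1) and the scale invariance can be illustrated with two small helper functions. The function names are hypothetical and only serve as an example.

```python
def to_homogeneous(p, w=1.0):
    """Represent the 2D point p = (x, y) with scale factor w (non-zero)."""
    x, y = p
    return (w * x, w * y, w)


def from_homogeneous(q):
    """Convert (x1, x2, x3) back to inhomogeneous coordinates, cf. (4.1)."""
    x1, x2, x3 = q
    if x3 == 0:
        raise ValueError("a point at infinity has no inhomogeneous form")
    return (x1 / x3, x2 / x3)
```

Both `from_homogeneous((3.0, 4.0, 1.0))` and `from_homogeneous((6.0, 8.0, 2.0))` describe the same image point, since the two triples only differ by a scale factor.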

4.2.1 Homographies

Hartley and Zisserman [2004], page 32, define a homography as an invertible mapping $h$ from $\mathbb{P}^2$ to itself such that three points $\mathbf{x}_1$, $\mathbf{x}_2$ and $\mathbf{x}_3$ lie on the same line if and only if $h(\mathbf{x}_1)$, $h(\mathbf{x}_2)$ and $h(\mathbf{x}_3)$ do. See Figure 4.3, where each pair of collinear points in the planes $P_1$ and $P_2$ occurs where a line from the origin $n$ intersects the planes.

Figure 4.3: Each projective line intersects the corresponding points in both planes, $P_1$ and $P_2$, and all lines intersect in $n$, the origin of the projective space.

Hartley and Zisserman [2004] define a homography as a linear transformation on homogeneous 3-vectors represented by a nonsingular $3 \times 3$ matrix according to

$$
\mathbf{x}^i_2 = H \mathbf{x}^i_1 \tag{4.2}
$$

where

$$
H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}
$$

and where the equality does not hold element by element but in terms of direction. The left- and right-hand side expressions are hence equal in the projective space and can differ in magnitude.

A more convenient way of describing the relation between $\mathbf{x}^i_1$ and $\mathbf{x}^i_2$ is therefore the cross-product equation

$$
\mathbf{x}^i_2 \times H \mathbf{x}^i_1 = \mathbf{0} \tag{4.3}
$$

where the zero vector comes from the fact that the two vectors point in the same direction.

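The text in this part follows the theory presented by Hartley and Zisserman [2004], pages 32-33 and 87-93. Following the notation of Hartley and Zisserman [2004], the $j$-th row of the matrix $H$ can be denoted $\mathbf{h}^{jT}$ and thus

$$
H \mathbf{x}^i = \begin{bmatrix} \mathbf{h}^{1T} \mathbf{x}^i \\ \mathbf{h}^{2T} \mathbf{x}^i \\ \mathbf{h}^{3T} \mathbf{x}^i \end{bmatrix}. \tag{4.4}
$$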

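Denoting $\mathbf{x}^i_2 = [x^i_2, y^i_2, w^i_2]^T$, the cross-product can be written as

$$
\mathbf{x}^i_2 \times H \mathbf{x}^i_1 = \begin{bmatrix}
y^i_2 \mathbf{h}^{3T} \mathbf{x}^i_1 - w^i_2 \mathbf{h}^{2T} \mathbf{x}^i_1 \\
w^i_2 \mathbf{h}^{1T} \mathbf{x}^i_1 - x^i_2 \mathbf{h}^{3T} \mathbf{x}^i_1 \\
x^i_2 \mathbf{h}^{2T} \mathbf{x}^i_1 - y^i_2 \mathbf{h}^{1T} \mathbf{x}^i_1
\end{bmatrix} = \mathbf{0}. \tag{4.5}
$$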

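Exploiting that $\mathbf{h}^{jT} \mathbf{x}^i_1 = \mathbf{x}^{iT}_1 \mathbf{h}^j$ and denoting $\mathbf{0} = [0, 0, 0]^T$, (4.5) can be rewritten as

$$
\underbrace{\begin{bmatrix}
\mathbf{0}^T & -w^i_2 \mathbf{x}^{iT}_1 & y^i_2 \mathbf{x}^{iT}_1 \\
w^i_2 \mathbf{x}^{iT}_1 & \mathbf{0}^T & -x^i_2 \mathbf{x}^{iT}_1 \\
-y^i_2 \mathbf{x}^{iT}_1 & x^i_2 \mathbf{x}^{iT}_1 & \mathbf{0}^T
\end{bmatrix}}_{= A_i}
\underbrace{\begin{bmatrix} \mathbf{h}^1 \\ \mathbf{h}^2 \\ \mathbf{h}^3 \end{bmatrix}}_{= \mathbf{h}}
= \mathbf{0}. \tag{4.6}
$$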

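This provides three equations, of which only two are linearly independent. Since only two rows of $A_i$ are linearly independent, the expression can be reduced to

$$
\underbrace{\begin{bmatrix}
\mathbf{0}^T & -w^i_2 \mathbf{x}^{iT}_1 & y^i_2 \mathbf{x}^{iT}_1 \\
w^i_2 \mathbf{x}^{iT}_1 & \mathbf{0}^T & -x^i_2 \mathbf{x}^{iT}_1
\end{bmatrix}}_{= A'_i}
\begin{bmatrix} \mathbf{h}^1 \\ \mathbf{h}^2 \\ \mathbf{h}^3 \end{bmatrix}
= \mathbf{0}. \tag{4.7}
$$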

This way of expressing the equation is valid for any homogeneous coordinates defined as $\mathbf{x}^i_1 = [x^i_1, y^i_1, w^i_1]^T$ and $\mathbf{x}^i_2 = [x^i_2, y^i_2, w^i_2]^T$. The scaling parameters $w^i_1$ and $w^i_2$ can be arbitrarily chosen, but for convenience $w^i_1 = w^i_2 = 1$ are good values.
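The two rows of $A'_i$ in (4.7) can be formed directly from one point correspondence. The sketch below is only an illustration with hypothetical function names; the homography used in the usage note is a made-up example. It builds the rows and can be used to check that they vanish when multiplied with the stacked rows of the true $H$, regardless of the scale of the second point.

```python
def dlt_rows(p1, p2):
    """Two rows of A'_i in (4.7) for the correspondence p1 -> p2.

    p1 and p2 are homogeneous triples (x, y, w).
    """
    x2, y2, w2 = p2
    zero = [0.0, 0.0, 0.0]
    row1 = zero + [-w2 * c for c in p1] + [y2 * c for c in p1]
    row2 = [w2 * c for c in p1] + zero + [-x2 * c for c in p1]
    return [row1, row2]


def apply_h(H, p):
    """Multiply the 3x3 matrix H (list of rows) with the triple p."""
    return tuple(sum(H[r][c] * p[c] for c in range(3)) for r in range(3))
```

As a usage example, take `H = [[1.1, 0.1, 0.5], [-0.05, 0.95, -0.3], [0.01, 0.02, 1.0]]` and `p1 = (2.0, 3.0, 1.0)`, compute `p2 = apply_h(H, p1)` and rescale `p2` by any non-zero factor. The dot product of each row of `dlt_rows(p1, p2)` with the nine stacked entries of `H` is then zero up to rounding.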

Finding H

Since $H$ has nine entries, one might expect nine degrees of freedom, but because $H$ is only determined up to scale, eight degrees of freedom remain. The scaling can be arbitrarily chosen, for example as $\|\mathbf{h}\| = 1$.

Mapping one corresponding point between two different frames determines two degrees of freedom, since a point has one $x$ and one $y$ component and each mapped point therefore adds two constraints. Hence four points are required for solving $H$.

Provided four points, one has four sets of equations $A'_i \mathbf{h} = \mathbf{0}$, according to (4.7). Stacking the rows of the four matrices $A'_i$ beneath each other gives $A \mathbf{h} = \mathbf{0}$, which provides eight equations in total for finding the unknown vector $\mathbf{h}$. The solution is obtained by finding the null vector of $A$.

When more than four points are used to find $H$, the system of equations is overdetermined. If the points are from image coordinates, they are likely to contain noise. From this follows that there is in general no exact solution to $A \mathbf{h} = \mathbf{0}$. The solution of interest is hence found by minimizing the norm $\|A \mathbf{h}\|$ subject to $\|\mathbf{h}\| = 1$, which can be achieved by finding the unit singular vector corresponding to the smallest singular value of $A$.

From tracked feature points, described in Chapter 2, the idea is that it is possible to derive a homography that relates the features between the frames. The number of features is required to be more than four in this implementation, and hence an overdetermined system of equations is provided. Due to unreliable feature points it is necessary to detect outliers. A possible way to do this is to utilize a RANSAC method.

4.3 RANSAC

Kovesi [2014] provides a function named ransacfithomography that uses a RANSAC (RANdom SAmple Consensus) method to detect outliers. The function by Kovesi makes use of four randomly chosen points to derive a homography. The homography is then applied to all other points, and the distances between the plane and the points are calculated. Those points that lie within a given threshold are considered inliers and those that do not are considered outliers.

After a fixed number of trials, the homography that fits the largest number of inliers is chosen. The inliers for the best model are used to derive a new homography from an overdetermined system of equations, since it is based on more than four points. The obtained homography is then used to obtain the measured transformation of the ROI.
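The RANSAC procedure can be sketched in plain Python. This is not Kovesi's ransacfithomography but a simplified illustration under several assumptions: the homography is fitted by fixing $h_9 = 1$ and solving the normal equations by Gaussian elimination (instead of the singular vector approach of Section 4.2.1), the inlier test uses the one-way transfer distance in the second image, and the function names, the number of trials and the threshold are hypothetical.

```python
import random


def solve(M, b):
    """Gaussian elimination with partial pivoting; None if singular."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_homography(pairs):
    """Least-squares H with h9 = 1 from >= 4 pairs ((x, y), (u, v))."""
    rows, rhs = [], []
    for (x, y), (u, v) in pairs:
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    # Normal equations A^T A h = A^T b for the eight unknowns.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(8)]
           for i in range(8)]
    Atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve(AtA, Atb)
    if h is None:
        return None
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def transfer(H, p):
    """Map the point p through H; None if it is mapped to infinity."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return None
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def ransac_homography(pairs, trials=200, threshold=1e-3, seed=0):
    """Return (H, inliers) for the largest consensus set found."""
    rng = random.Random(seed)
    best = []
    for _ in range(trials):
        H = fit_homography(rng.sample(pairs, 4))
        if H is None:
            continue  # degenerate sample, e.g. collinear points
        inliers = []
        for p, q in pairs:
            m = transfer(H, p)
            if m is not None and ((m[0] - q[0]) ** 2
                                  + (m[1] - q[1]) ** 2) < threshold ** 2:
                inliers.append((p, q))
        if len(inliers) > len(best):
            best = inliers
    # Refit on all inliers of the best model (overdetermined system).
    return fit_homography(best), best
```

Fixing $h_9 = 1$ keeps the example free of external libraries, but it excludes homographies with $h_9 = 0$; an implementation based on the singular value decomposition does not have this restriction.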

5 Implementation

This chapter describes the algorithms and the structure of the implementation. The chapter relates to the theory that is described in the previous chapters. The implementation suits the overall goal of the thesis well in theory since it utilizes both visible sides of vehicles to determine displacement.

5.1 Background

How to represent the objects of interest came to be the most vital part of the implementation. The object of interest is a vehicle observed in a more or less skewed pose. The implementation is provided with images and with predictions of vehicles’ states and state covariances from Autoliv’s existing system. The predicted states are estimates of vehicles in terms of position, speed, heading, yaw rate and size. A vehicle’s position and heading direction are relative to the ego vehicle, but a vehicle’s speed and yaw rate are relative to the world. The existing system also provides a transition function and a function to determine the region of interest, ROI. It is Autoliv’s intention not to disclose any details of how they choose to describe the state of a vehicle or how their transition function is designed.

The ROI is a patch of the image where the vehicle is located, and the ROI is derived from the vehicle’s state. Two quadrangles represent the ROI, one for the back of the vehicle and one for the side. The properties of the ROI depend on the vehicle’s position and heading direction. The transition function determines an estimate of a state forward in time. Autoliv’s existing transition function is also applicable inversely. By applying the inverse transition function one can estimate a state backwards in time. Since the prerequisites in this project provided a state that was updated in time, it was Autoliv’s intention to use the inverse transition function to determine the previous state. In this implementation the ROI is used in combination with the inversely applied transition function to predict measurements.

An interface is used to utilize data and functionalities from the existing system. The interface enables one to send data to and receive data from the existing system.

When a predicted state and predicted covariance are received from Autoliv’s system, the implementation searches among existing tracks for a related state. Existing tracks consist of estimated states of vehicles and coordinates of the vehicle’s corresponding feature points. Depending on the Euclidean distance to previously tracked vehicles, the tracker will either initiate a new track or update an existing one.
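The association step can be sketched as a nearest-neighbour search with a distance gate. The sketch is a hypothetical illustration: the thesis does not state the threshold or the data layout, so the function name `associate`, the dictionary key `"position"` and the parameter `gate` are assumptions.

```python
import math


def associate(predicted_pos, tracks, gate):
    """Index of the closest track within `gate`, or None.

    predicted_pos: (x, y) position of the predicted state.
    tracks:        list of dicts with a "position" entry (assumed layout).
    gate:          maximum Euclidean distance for a match (assumed value).
    None means that a new track should be initiated.
    """
    best, best_dist = None, gate
    for idx, track in enumerate(tracks):
        tx, ty = track["position"]
        dist = math.hypot(predicted_pos[0] - tx, predicted_pos[1] - ty)
        if dist <= best_dist:
            best, best_dist = idx, dist
    return best
```

For two tracks at `(10.0, 0.0)` and `(30.0, 2.0)` and a gate of 3.0, a prediction at `(29.0, 2.5)` is matched to the second track, while a prediction at `(50.0, 0.0)` returns `None`.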

5.2 Initiating track

If no existing track has a state that is close enough to the predicted state, in terms of position, a new track is initiated. The position of a state is described in world coordinates where the origin is set at the ego vehicle, the vehicle where the camera is attached. The estimated position of a vehicle corresponds to the location of the point in Figure 1.1 from where the arrow for heading direction originates.

When a new track is initiated, the tracker locates new features within the determined ROI or, more specifically, within the area of interest, AOI, which is a more restricted area inside the ROI. This is shown in Figure 5.1, where the AOI is given by the two smaller rectangles that are inside the ROI. The state and the covariance of the initiated track are set equal to the predicted state and covariance as

$$
\begin{aligned}
\hat{x}_{k|k} &= \hat{x}_{k|k-1}, \\
P_{k|k} &= P_{k|k-1},
\end{aligned} \tag{5.1}
$$

where $\hat{x}_{k|k-1}$ and $P_{k|k-1}$ are the predicted state and covariance received from the existing system. The updated state and covariance, $\hat{x}_{k|k}$ and $P_{k|k}$ respectively, are saved and also sent back to Autoliv’s system. The image is saved so that one can track the located features in a following image.

The implementation makes use of a function from OpenCV [2014b] to extract feature points. Features are required to be positioned at a minimum distance, pixel-wise, from the closest neighbouring feature. This prevents feature points from being located too close to each other, since there is otherwise a risk that they will be exchanged with each other. Exchanged feature points would decrease the reliability of the tracker since obtained measurements would contain more noise.

Figure 5.1: Vehicle spotted from behind with feature points initiated on back and side. The ROI is drawn in white and divided into two parts, one representing the back and one the side. The AOI is drawn in green and can be seen as the two inner rectangles.

A relative quality measure is used for feature points. This measure requires that no feature is allowed to have less than 10 percent of the quality obtained from the best feature. The quality is obtained from the left-hand side expression of equation (2.10).
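The two selection rules, the relative quality threshold and the minimum distance, can be sketched as a greedy filter over candidate points. This is an illustration and not the OpenCV function used in the implementation; the function name `select_features`, the candidate format `(x, y, quality)` and the example values are hypothetical.

```python
def select_features(candidates, min_distance, quality_level=0.1):
    """Greedy feature selection.

    candidates:    list of (x, y, quality) tuples.
    min_distance:  minimum pixel distance between accepted features.
    quality_level: fraction of the best quality a feature must reach.
    """
    if not candidates:
        return []
    best = max(q for _, _, q in candidates)
    # Keep candidates with at least quality_level of the best quality,
    # strongest first.
    strong = sorted((c for c in candidates if c[2] >= quality_level * best),
                    key=lambda c: c[2], reverse=True)
    kept = []
    for x, y, q in strong:
        # Accept a candidate only if it is far enough from all kept ones.
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_distance ** 2
               for kx, ky, _ in kept):
            kept.append((x, y, q))
    return kept
```

With the made-up candidates `[(10, 10, 1.0), (11, 10, 0.9), (30, 30, 0.5), (50, 50, 0.05)]` and `min_distance=5`, the second candidate is rejected for being too close to the first and the last one for having less than 10 percent of the best quality.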

The AOI is used to ensure that found feature points belong to the vehicle and to exclude corners. Feature points located on corners might change characteristics depending on reflection and variation in light. Such feature points are therefore less reliable and not desirable to rely on when estimating the state of a vehicle. A feature point located on a corner could also be difficult to assign to any side of the vehicle since it might not fit any of the planes that represent the vehicle’s sides.

Feature points from the side of the vehicle are used only if the side is represented in a patch that is wide enough. Figure 5.1 shows this situation, where feature points are drawn on both the back and the side of the vehicle. The image patch representing the side in Figure 5.1 is on the verge of not being utilized for tracking feature points since it is very narrow.


5.3 Tracking

When the Euclidean distance between the predicted state’s position and any existing track’s position is small enough, the tracker will make use of the matched track’s feature points. The feature points belonging to that track are located in the current image, and from the tracked feature points the measurement is derived. The idea is to describe the displacement of feature points between images with homographies and apply the same homographies on the ROI that is derived from the predicted state. The result from applying a homography on a ROI is a new ROI. The considered measurement in this implementation is the ROI that is retrieved from applying the measured homography on the predicted ROI.

5.3.1 Sigma points from predicted state

The tracker determines a predicted measurement, based on the properties of the UKF, from the state’s ROI. The predicted state, $\hat{x}_{k|k-1}$, and the predicted covariance, $P_{k|k-1}$, are used to generate sigma points. The sigma points are distributed according to (3.5), but $\hat{x}_{k|k-1}$ and $P_{k|k-1}$ are used instead of $\hat{x}_{k-1|k-1}$ and $P_{k-1|k-1}$.

Since the idea of the observation model is to estimate the previous state’s ROI, the observation model utilizes that the transition can be applied inversely to retrieve the previous state. The observation model applies the inverse transition function on each sigma point according to

$$
f^{-1}(\chi^i_{k|k-1}) = \chi^i_{k-1|k-1}. \tag{5.2}
$$

The obtained sigma points, $\chi^i_{k-1|k-1}$, are used to determine their respective ROIs. This is possible since a sigma point contains the same type of information as a state. The corners of their ROIs are the observations, $\gamma^i_k$, as in (3.10), and they are used to determine the predicted measurement, $\hat{z}_k$, according to (3.11). The predicted measurement, $\hat{z}_k$, is hence an estimate of the previous state’s ROI.

Figure 5.2 shows the corners that are used to describe a ROI. There are only six points needed to represent a ROI since the two in the middle are used both for the back and for the side. By moving the corners of a ROI one obtains a transformed ROI that is possible to describe by applying homographies on the original.

Depending on whether the side of the vehicle is used or not, the measurement will consist of either four or eight points. The side is not used if the patch that constitutes the side in the image is too narrow. When both sides are used, there are two sets of points that contain the corner points of the L-shape. The points at the corner are duplicated since the implementation separates the measurements obtained from the back and from the side of the vehicle.


Figure 5.2: Vehicle spotted from behind with ROI drawn. Points at the corners of the ROI are used to describe the transformation of the vehicle.

5.3.2 Predicted measurements

The obtained sets of corner points are the observations, $\gamma^i_k$, of the sigma states, $h(\chi^i_{k|k-1})$, in accordance with (3.10). Observation $\gamma^i_k$ consists of four pairs of $u$ and $v$ coordinates for the corners as

$$
\gamma^i_k = \begin{bmatrix} u^i_{k,1} & v^i_{k,1} & u^i_{k,2} & v^i_{k,2} & u^i_{k,3} & v^i_{k,3} & u^i_{k,4} & v^i_{k,4} \end{bmatrix}^T. \tag{5.3}
$$

The predicted measurement is determined both for the back and for the side of the ROI. Hence two separate observations are obtained for each sigma point.

The predicted measurement, $\hat{z}_k$, is thereafter obtained by using (3.11) and the predicted measurement covariance, $P_{z_k z_k}$, according to (3.12).

An alternative approach for the predicted measurement would be to apply the transition function inversely on the predicted state $\hat{x}_{k|k-1}$ to retrieve $\hat{x}_{k-1|k-1}$, then generate sigma points around $\hat{x}_{k-1|k-1}$ and propagate those through the transition function. Doing so would provide a predicted measurement $\hat{z}_k$ for the current state’s ROI.

The weights that are used in the filter are chosen according to (3.8), and the Kalman gain is derived as in (3.14), where the covariance matrix $R$ is estimated from data. The matrix $R$ is further tuned manually to give the filter satisfactory properties.

5.3.3 Optical flow of feature points

The coordinates of the related track’s feature points and the previous image are used to determine the optical flow. The feature points belonging to the related track, with state $\hat{x}_{k-1|k-1}$, are searched for within the ROI that belongs to the predicted state, $\hat{x}_{k|k-1}$. The search for every feature is restricted to the neighbourhood of the feature’s location in the previous frame. The matched features in the current frame are required to be within the ROI in order to prevent erroneous features from affecting the performance of the filter.

In summary, the implementation finds new coordinates for the feature points from the previous image within the new image. The feature points’ displacements between the images are the optical flow. The search for matching features follows the approach described in Chapter 2 and is performed through the implementation from OpenCV [2014a]. See Algorithm 1 in Chapter 2 for an overview.

5.3.4 Measurement from homography

The matched feature points are used to derive an estimate of the vehicle’s trans-formation. A homography is computed to match the displacement of the featurepoints such that it describes the displacement from the current frame to the pre-vious. If both visible sides are used for the track, two homographies are derivedindependently of each other.

The feature points are converted into homogeneous coordinates with scale fac-tor one, see page 25. The homography is derived with the implementation fromKovesi [2014] which uses the theory described in Chapter 4. The minimum num-ber of feature points are eight and hence an overdetermined system of equationsis provided, see page 27 for details. The requirement of having at least eighttracked feature points regards each side of the vehicle. The requirement is set tohave a more robust estimate of the vehicle’s transformation between images sinceit is assumed that some feature points are tracked erroneously.

This stage introduces uncertainty since it is assumed that all features belong to a flat plane, which in general is not the case with features found in a frame. For example, a feature located on the towbar or the rear-view mirror lies at a distance from the plane that is represented by the features located on the licence plate or front door.

The assumption that all feature points lie in a plane requires that outliers are detected and removed so that only reliable feature points are used to estimate the homography. This implementation makes use of the ransacfithomography function written by Kovesi [2014] to sort out outliers and provide a homography that is determined from inliers. Figure 5.3 shows a track with two outliers.
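The separation into inliers and outliers can be sketched as the consensus step that any RANSAC scheme performs for a candidate homography. The sketch below is an illustration only: it uses a one-way transfer error and a pixel threshold, both of which are assumptions, and it does not reproduce the internals of ransacfithomography. The names `apply_homography` and `split_inliers` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_homography(H: Matrix3, p: Point) -> Point:
    """Map an image point through H using homogeneous coordinates with scale one."""
    u, v = p
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)  # normalise so that the scale factor is one again


def split_inliers(H: Matrix3, src: List[Point], dst: List[Point],
                  threshold: float) -> Tuple[List[int], List[int]]:
    """Return (inlier indices, outlier indices) based on the transfer error."""
    inliers, outliers = [], []
    for i, (p, q) in enumerate(zip(src, dst)):
        pu, pv = apply_homography(H, p)
        err = ((pu - q[0]) ** 2 + (pv - q[1]) ** 2) ** 0.5
        (inliers if err <= threshold else outliers).append(i)
    return inliers, outliers
```

A feature on the towbar or a mirror would typically end up in the outlier list, because its displacement does not agree with the plane that the majority of the features define.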

Figure 5.3: Bird’s-eye view of a vehicle represented by two planes with feature points. Two outliers exist.

The homographies obtained from the tracked feature points are used to provide the measurement, $z_k$. The measurement is obtained by applying the homographies to the predicted state’s ROI, so $z_k$ consists of the corners of a displaced ROI. When feature points are tracked on both the back and the side of the vehicle, two homographies are obtained and hence $z_k$ consists of eight points, four for the back and four for the side. The two points in the ROI’s middle are duplicated, but since the copies are transformed with two different homographies, $z_k$ consists of eight individual points. Likewise, when only features on the back are tracked, $z_k$ consists of four points.
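As a minimal sketch of how such a measurement could be assembled, the function below stacks the displaced ROI corners into one vector. The corner ordering, the flat layout `[u1, v1, u2, v2, ...]` and the names `apply_homography` and `build_measurement` are assumptions for illustration; the thesis does not specify them here.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_homography(H: Matrix3, p: Point) -> Point:
    """Map a point through H and normalise the homogeneous scale to one."""
    u, v = p
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return ((H[0][0] * u + H[0][1] * v + H[0][2]) / w,
            (H[1][0] * u + H[1][1] * v + H[1][2]) / w)


def build_measurement(back_corners: List[Point], H_back: Matrix3,
                      side_corners: Optional[List[Point]] = None,
                      H_side: Optional[Matrix3] = None) -> List[float]:
    """Stack displaced ROI corners: 4 points (back only) or 8 points (back and side)."""
    z: List[float] = []
    for c in back_corners:
        z.extend(apply_homography(H_back, c))
    if side_corners is not None and H_side is not None:
        for c in side_corners:  # the shared middle corners are transformed again
            z.extend(apply_homography(H_side, c))
    return z
```

With two different homographies, a middle corner that is shared by the back and the side maps to two different image points, which is why the stacked vector holds eight individual points.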

If too few features can be tracked or if a poor homography is obtained, no measurement is obtained. In that case a new track is initiated so that a measurement can be obtained in the following image. The new track follows the procedure described earlier in this chapter, i.e. extracting new feature points and using the predicted state for state updates.

5.3.5 Measurement update

The state is updated according to (3.15), where the innovation comes from the difference between the predicted measurement and the obtained measurement, $\hat{z}_k$ and $z_k$ respectively. The covariance is updated according to (3.16).
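Equations (3.15) and (3.16) are not reproduced in this chapter. For orientation, the textbook form of a Kalman-type measurement update, as used in the UKF, is sketched below. The symbols $\nu_k$ (innovation), $S_k$ (innovation covariance), $P^{xz}_k$ (cross-covariance between state and measurement) and $K_k$ (gain) are assumptions about notation and may differ from Chapter 3.

```latex
\nu_k = z_k - \hat{z}_k
K_k = P^{xz}_k S_k^{-1}
x_{k|k} = x_{k|k-1} + K_k \nu_k
P_{k|k} = P_{k|k-1} - K_k S_k K_k^{T}
```

The larger the innovation covariance $S_k$ is relative to the cross-covariance, the less weight the measurement receives in the updated state.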

The updated state is saved so that it is available to match with states that are provided to the implementation in following images. The state and the covariance are provided to the existing system, which performs the time update of the state and the covariance. The time-updated state and covariance can then be provided to the implementation in the following image. Whether the implementation is provided with the time-updated state and covariance depends on Autoliv’s existing system and hence that decision is outside the scope of this thesis.


5.4 Overview

A flowchart of the implementation is shown in Figure 5.4. Two types of inputs are provided, either an updated image or the predicted state and covariance.

[Flowchart. Node labels: Image; Initiate and save track; Predicted state and predicted covariance; Find related track? (Found / Not found); Track features and derive homography; Enough features tracked and reliable homography? (Yes / No); Predict measurements using filter characteristics; Fuse measurements and save track; Updated state.]

Figure 5.4: A flowchart that describes the logic of the tracker. The existing tracks are saved between images and are available when trying to match the provided predicted state.

6 Experiments and results

This chapter describes the result and the experiments made to obtain it. Moreover, the chapter brings up potential sources of error and evaluates the result.

6.1 Experiment description

Experiments are carried out by running the implementation on two recorded image sequences. For the video sequences one and two there are 42 obtained tracks in total, lasting from a pair of images up to 13 images. The average length of a track is 5.14 images and the variance is 11.24. The recorded sequences are obtained with a camera that is fixed to a car. The image sequences contain images from city traffic where vehicles are spotted at different angles in different traffic situations. The car with the camera is referred to as the ego vehicle.

The performance of the implementation is evaluated based on the accuracy of the visualized representation that consists of the estimated ROI drawn on top of the image. Since no ground truth exists for the recorded sequences it is not possible to verify the accuracy of the result quantitatively. When no feature point tracking is active the state of a vehicle is based on other image evaluation methods that are performed by Autoliv’s existing system. An optimal and desired result is that the ROI is drawn over the part of the image that contains the vehicle and that the break point of the L-shape is at the vehicle’s corner.

Autoliv’s current implementation utilizes feature point tracking to determine affine transformations of the vehicles between images and it sometimes projects the ROI inaccurately. For example, the existing implementation sometimes fails to represent a vehicle when the yaw angle changes, i.e. when the vehicle changes heading direction relative to the ego vehicle. When this occurs, the ROI faces another direction than the vehicle does and hence the ROI does not represent the sides of the vehicle. The existing implementation does not obtain any measurements of the projective transformation from the gray-scale images and hence there is room for improvement.

A result is considered bad when the ROI is drawn with decreased accuracy compared to when no feature point tracking is used. One example is when the estimated ROI contains parts of the image that do not belong to the vehicle. If that is the case the tracker might use features that are found outside the vehicle to model the vehicle’s displacement between frames. The result is also considered to be bad when the ROI is drawn on a smaller part than what is visibly considered to be accurate. A small ROI is not desired since it limits the area that the tracker can utilize for finding and tracking feature points.

To evaluate whether the obtained homography reflects the projective transformation, the ROI must follow a vehicle while it changes pose in an image sequence. Essential for a good result is also that the side of the vehicle is obtained in the part of the ROI that represents the side. Neither more nor less than the side of the vehicle should be inside that part.

Unfortunately the system does not provide the ability to compare Autoliv’s existing feature point tracker with the implementation made in this project. However, the obtained result makes it clear that the existing implementation performs better.

6.2 Obtained result

The developed implementation performs poorly. There are rare occasions where the tracker manages to follow a vehicle through a sequence of images, but the tracker never manages to follow a vehicle through an entire traffic situation, i.e. through the whole sequence of images where the vehicle is visible.

Figures 6.1 and 6.2 demonstrate desired results from the implemented tracker. In Figure 6.1, the ROI fits the vehicle well with feature points both from the back and from the side. In Figure 6.2, the side of the ROI is drawn inside the back, in accordance with the heading direction of the vehicle.

Examples of bad performance of tracks are demonstrated in Figures 6.3 and 6.4. The effect in Figure 6.3 is the result of an erroneously estimated homography where the obtained translation is greater than the actual one. Figure 6.4 shows the result of a homography that estimates the affine scaling incorrectly.

Figure 6.1: An ROI aligned correctly to the vehicle with feature points from both back and side.

Figure 6.2: An ROI in accordance with the vehicle. The side of the vehicle is represented inside the part that represents the back and is in accordance with the heading direction of the car.

By observing the feature points through an image sequence one can see the weaknesses of the feature points. Bad matches of feature points between frames occur and can cause poor estimates of the homographies, as can be seen in Figure 6.3. Feature points sometimes drift in ways that are not related to the vehicle’s displacement, which is likely due to changes of illumination and weaknesses of the chosen algorithm. If the optical flow that is obtained from tracked feature points is poor, the obtained measurements can be unreliable. Even though the implementation is designed to handle outliers, it is affected when few features are available since that limits the ability to distinguish outliers. Drifting feature points are observed occasionally but this is not considered to be the crucial problem for this implementation, since the tracker can still fail even when feature points are tracked reliably.

The implementation has also been evaluated on image sequences where the feature points are consistent between frames but the tracker still fails to follow the vehicle. It is considered an error when the ROI does not match the patch of the image that contains the vehicle. Such an error is considered the major weakness of this implementation. It is hard to identify the source of the poor ROI since the derived homographies are results from a random sample consensus function.

Figure 6.3: A frame where the ROI is displaced compared to the vehicle. In this case the features are tracked reliably but the ROI is displaced due to an unreliable measurement.

Figure 6.4: A frame where the ROI is zoomed out relative to the vehicle. In this frame the ROI is badly scaled; it is larger than the vehicle.

The UKF seems to handle the circumstances well when reliable measurements are obtained but it ceases to function when unreliable measurements are obtained. Unfortunately unreliable measurements occur frequently, which can be concluded from performing a $\chi^2$-test on the obtained innovation.

The innovation, used in (3.15), is taken as the difference between the obtained measurement and the predicted measurement, from (3.11).
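A common way to carry out such a test is to compare the normalized innovation squared with a quantile of the chi-square distribution. The sketch below assumes this form of the test; the 95 % level, the measurement dimensions 8 and 16 (four or eight image points with two coordinates each) and the names `solve`, `nis` and `is_consistent` are assumptions, since the thesis does not state these details here.

```python
from typing import List, Sequence

# 95 % chi-square quantiles for 8 and 16 degrees of freedom (assumed test level).
CHI2_95 = {8: 15.507, 16: 26.296}


def solve(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("innovation covariance is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def nis(z: Sequence[float], z_pred: Sequence[float],
        S: Sequence[Sequence[float]]) -> float:
    """Normalized innovation squared: nu^T S^{-1} nu with nu = z - z_pred."""
    nu = [a - b for a, b in zip(z, z_pred)]
    return sum(n_i * w_i for n_i, w_i in zip(nu, solve(S, nu)))


def is_consistent(z, z_pred, S, threshold: float) -> bool:
    """True when the innovation is plausible under the filter's own covariance."""
    return nis(z, z_pred, S) <= threshold
```

A measurement for which `is_consistent(z, z_pred, S, CHI2_95[len(z)])` is false would be flagged as unreliable in this sketch. Whether such a measurement should be discarded or merely down-weighted is a design decision that is not taken here.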

Uncertainties in the measurements are due to various sources, among which are the reliability of feature points and the robustness of a RANSAC function to find the best homography. Those aspects are discussed further in Section 6.3. The predicted measurement also introduces uncertainties due to the properties of the UKF, even though this is not considered as critical as the uncertainties in the measurements.

6.2.1 Obtained tracks

The graphs below illustrate some of the tracks that were obtained with the tracker. From the graphs one can see that the tracker is able to utilize the feature points, the homographies and the UKF to track vehicles in image sequences of limited length. However, none of the tracks lasts for more than 20 frames and hence one can conclude that this tracker is not able to provide a driver with necessary information regarding the surrounding vehicles in traffic.

The different parameters in the graphs are interpreted as follows,

• The x parameter shows the vehicle’s distance ahead relative to the ego vehicle.

• The y parameter shows the vehicle’s distance to the left relative to the ego vehicle.

• The z parameter shows the vehicle’s vertical position relative to the ego vehicle.

• The speed and yaw rate of the vehicle are relative to the world.

• The heading direction of the vehicle is relative to the ego vehicle, see Figure 1.1 where the heading direction is represented with the yaw angle.

• The width, length and height of the vehicle are estimated continuously during each track.

For example, when viewing the graphs one should interpret a decreasing x value as a vehicle that is getting closer. That could be the case for an oncoming vehicle, when the vehicle stops while the ego vehicle keeps going, or when the ego vehicle drives faster than the vehicle and therefore is approaching it. An increasing value of the y parameter indicates that the vehicle drives to the left relative to the ego vehicle, and this should correspond to a positive value of the vehicle’s heading direction as in Figure 6.9.

Figure 6.5: Graphs of a track where the vehicle is turning left relative to the ego vehicle, seen on the y parameter. The positive heading direction also indicates this.

Figure 6.6: Graphs of a track where the y parameter decreases somewhat even though the heading direction is positive. This indicates that the ego vehicle is also turning left, slightly sharper than the tracked vehicle.

Figure 6.7: Graphs of a track where the distance x to the vehicle is increasing. The speed is also increasing, which indicates that the ego vehicle does not accelerate as much as the vehicle in front.

Figure 6.8: Graphs of a track where the distance to the vehicle ahead is quite constant, within 0.1 meter, and the speed is decreasing. This indicates that the ego vehicle is also slowing down. The increasing heading direction results in a small displacement to the left.

Figure 6.9: Graphs of a track where the x parameter increases and the speed is relatively constant, indicating that the ego vehicle is driving somewhat slower than the tracked vehicle.

One should keep in mind that these graphs only display the tracks that are obtained with the implementation. Whether they actually provide accurate estimates of the vehicles’ position and movement is hard to conclude since no ground truth exists.

It is hard to tell from the graphs why the tracker fails. It is not obvious that there is any specific factor, or parameter, that is fatal for the tracker and makes it lose the tracks. For example, if all graphs had ended when the graph representing the y parameter increased, it would have been possible to claim that those specific scenarios were fatal for the tracker. Unfortunately, the obtained data do not show any indication of that kind.

What makes the tracker lose the tracks is considered to be due to several different factors in the implementation. Among the considered weaknesses in the implementation are the UKF, the reliability of the homographies estimated from tracked feature points and the robustness of the RANSAC algorithm to find the best homographies. The potential impact and error introduced by those aspects are further discussed in Section 6.3.

6.3 Discussion

There might be many reasons for the lack of performance in this implementation and there are three vital aspects to evaluate among the used methods:

• The performance of the UKF.

• The reliability of the homographies derived from the displacements of feature points.

• The RANSAC algorithm used to find the best homographies.

Regarding the UKF, its general properties are well suited to estimating the state of a vehicle. However, the choice of sigma points in this implementation is a possible source of error. When sigma points are created they are distributed according to the predicted covariance of the predicted state. The distribution of the noise is assumed to be Gaussian and hence the sigma points are created accordingly. If the noise is not Gaussian, the created sigma points do not reflect the correct probability distribution of the state and hence the approximation of the probability distribution will be wrong. In other words, if the chosen sigma points are samples that are unlikely to appear, it is hard to approximate the distribution accurately.
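How sigma points follow from a mean and a covariance can be sketched with the scaled unscented transformation. The parameter values `alpha`, `beta` and `kappa` below are placeholders and not the tuning used in the thesis, and the function names are hypothetical. The sketch only covers the generation of points and weights, not the full filter.

```python
import math
from typing import List, Sequence, Tuple


def cholesky(P: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = P, for a symmetric positive definite P."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def sigma_points(mean: Sequence[float], P: Sequence[Sequence[float]],
                 alpha: float = 1.0, beta: float = 0.0, kappa: float = 1.0
                 ) -> Tuple[List[List[float]], List[float], List[float]]:
    """Return 2n+1 sigma points with their mean weights and covariance weights."""
    n = len(mean)
    lam = alpha ** 2 * (n + kappa) - n
    # Columns of the matrix square root of (n + lambda) P spread the points.
    L = cholesky([[(n + lam) * P[i][j] for j in range(n)] for i in range(n)])
    points = [list(mean)]
    for j in range(n):
        col = [L[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, col)])
        points.append([m - c for m, c in zip(mean, col)])
    w0_mean = lam / (n + lam)
    w0_cov = w0_mean + (1.0 - alpha ** 2 + beta)
    wi = 1.0 / (2.0 * (n + lam))
    return points, [w0_mean] + [wi] * (2 * n), [w0_cov] + [wi] * (2 * n)
```

The points are symmetric around the mean and reproduce the given mean and covariance exactly, so they carry no information about skewness or heavy tails. This is the sense in which a non-Gaussian noise distribution is not reflected by the sigma points.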

When the system is evaluated it is observed that poor predictions of the measurements are sometimes generated. This is likely due to a poor state update, which might originate from a bad measurement in the previous image. Even though the root cause is an inaccurate estimate of a homography, it is unfortunate that the UKF suffers from this. It could indicate that the estimated homographies do not provide good measurements and that the implemented UKF does not handle noisy measurements.


Another possible source of error is the design parameter and covariance matrix R used in the UKF. The lack of ground truth is a weakness since it affects the choice of covariance matrix R. If ground truth existed one could obtain the correct measurement noise and determine the accurate covariance matrix R.

Pixel-wise, the measurement noise variance differs between a feature’s u and v coordinates, i.e. horizontal and vertical coordinates, respectively. It is larger for u coordinates than for v coordinates. Different view-points of the same features are likely to be affected by reflections of sunlight and other surrounding elements. The influence of reflection can cause poor feature point tracking which can result in bad measurements.

When a vehicle is tracked the feature points that belong to the vehicle are matched between frames. Since there are uncertainties whether the match of feature points is correct the results are unreliable. For example, when a feature belongs to something light-sensitive, typically a reflection, there is a risk that the feature drifts aside due to a change of reflecting light and not because of a displacement of the vehicle itself. This affects the result with increased risk of poor homography estimates. Attempting to increase the requirements for matching feature points between frames makes it more likely to lose track of a feature. When a feature point is lost the tracker is required to initiate new feature points, which prevents the implementation from measuring the displacement of vehicles.

Features concentrated to a small area of the ROI can cause errors since the implementation is sensitive to the behaviour of those adjacent features. For example, a homography determined from features on the licence plate is likely to be more sensitive to noise than if features drawn from both the tail lights are used. This is a possible source of error in this implementation where there might be room for improvement.

Feature points drawn from a vehicle are hard to utilize without specific restrictions. To use features for the purpose of homography estimation one could probably gain performance by requiring more from the distribution of the feature points and by defining relations between them. However, how to find and define relationships between feature points is not covered by this thesis.

The use of the RANSAC algorithm to find the best set of inliers that estimates the best homography is also a possible source of error. RANSAC is used when one tries to map the previous image’s features to the current image’s features. When finding the set of points that best fits the transformation between the images there is only a limited number of attempts made. Since the features are selected randomly, there is no guarantee that the best homography is found. However, the number of attempts to find the best set of features was increased without observing any improved performance.
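The relation between the number of attempts and the chance of drawing at least one outlier-free sample follows the standard RANSAC formula $N = \log(1 - p) / \log(1 - w^s)$, where $w$ is the inlier ratio, $s$ the sample size and $p$ the desired confidence. The sketch below evaluates it; the function name and the default values are assumptions, with $s = 4$ chosen because four correspondences are the minimal sample for a homography.

```python
import math


def ransac_trials(inlier_ratio: float, sample_size: int = 4,
                  confidence: float = 0.99) -> int:
    """Number of random samples needed to draw one all-inlier sample with the given confidence."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    p_good = inlier_ratio ** sample_size  # probability that one sample is outlier-free
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_good))
```

For example, `ransac_trials(0.5)` gives 72 and `ransac_trials(0.8)` gives 9. With moderate outlier ratios a modest number of attempts is already enough, so increasing it further would not be expected to help much, which is consistent with the observation above.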

The performance of the implementation in this project is worse than that of Autoliv’s existing tracker. Whether it is relevant to describe projective transformations with homographies that are based on tracked feature points from vehicles is not concluded here; that decision lies beyond the scope of this thesis.

6.4 Evaluation

The obtained result leaves more to be desired and hence there are things that could be improved in the implementation.

The approach of using homographies is considered interesting since it can describe more information than a method that only describes affine transformations. The approximation of planes from feature points that are drawn from either side of a vehicle is likely an approach where there is room for improvement, for example by requiring more from the distribution of the feature points.

The lack of accurately described measurement noise is one factor that affects the performance of the tracker, but as was mentioned in the discussion there are other factors as well.

7 Conclusion

7.1 Conclusions and remarks

This thesis has investigated the possibility of utilizing homographies for determining displacement of vehicles between images. The approach can (and should) be looked at as a first attempt where the concept is evaluated.

The lack of performance in this implementation seems to be a result of many factors, as discussed in the previous chapter, and hence it is hard to identify anything specific. Even though there are factors that could be changed to improve the overall performance there are also limits to what one can achieve with this approach. For example, the approximation of planes from feature points found on a vehicle is not likely to provide an exact solution. It is desirable to have robustness towards erroneous measurements so that it is possible to utilize the approximations.

7.2 Future work

In this implementation the homographies were estimated from the displacement of features that are randomly distributed on the side of a vehicle. Another possible way to use homographies in a similar sense would be to take features from different areas of the ROI to ensure better accuracy of the estimated homographies. By forcing feature point extraction from different parts of the ROI the estimated homographies are likely to better approximate the displacement.
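One conceivable way to force such a spread is to divide the ROI into a grid and keep only the strongest candidates in each cell. The sketch below illustrates the idea and has not been evaluated in this thesis; the axis-aligned ROI, the grid size, the score attached to each candidate and the name `select_spread_features` are all assumptions.

```python
from typing import Dict, List, Tuple

Feature = Tuple[float, float, float]     # (u, v, score), score from a hypothetical detector
Box = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max), assumed non-degenerate


def select_spread_features(candidates: List[Feature], roi: Box,
                           rows: int = 2, cols: int = 3,
                           per_cell: int = 2) -> List[Feature]:
    """Keep at most per_cell of the highest-scoring candidates in every grid cell of the ROI."""
    u_min, v_min, u_max, v_max = roi
    cell_w = (u_max - u_min) / cols
    cell_h = (v_max - v_min) / rows
    cells: Dict[Tuple[int, int], List[Feature]] = {}
    for u, v, score in candidates:
        if not (u_min <= u <= u_max and v_min <= v <= v_max):
            continue  # ignore candidates outside the ROI
        c = min(int((u - u_min) / cell_w), cols - 1)
        r = min(int((v - v_min) / cell_h), rows - 1)
        cells.setdefault((r, c), []).append((u, v, score))
    selected: List[Feature] = []
    for key in sorted(cells):
        best = sorted(cells[key], key=lambda f: f[2], reverse=True)[:per_cell]
        selected.extend(best)
    return selected
```

A cluster of strong features on, say, the licence plate would then contribute at most `per_cell` points, leaving room for weaker features elsewhere in the ROI.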

A potential source of the lack of performance could be the underlying logic of the implementation, described in Chapter 5. There might be room for improvement by developing the approach further.


Instead of using homographies an alternative approach could be to approximate planes from feature points extended with depth values. By tracking features in gray scale images and updating the depth with values from depth images it would be possible to determine the displacement in a similar way. This requires reliable depth values, which possibly could be obtained by filtering the depth images.

Bibliography

Hernán Badino, Uwe Franke, and David Pfeiffer. The stixel world - a compact medium level representation of the 3D-world. In Pattern Recognition, volume 31, pages 51–60. Springer, 2009. Cited on page 1.

J. Bouguet. Pyramidal implementation of the Lucas-Kanade feature tracker: Description of the algorithm. Technical report, Intel Corp., Microprocessor Research Labs, 2001. Cited on pages 5, 12, 14, and 17.

O. Serdar Gedik and A. Aydin Alatan. Fusing 2D and 3D Clues for 3D Tracking Using Visual and Range Data. In Proceedings of the 16th IEEE Conference on Information Fusion, pages 1966–1973. IEEE, 2013. Cited on page 5.

Fredrik Gustafsson. Statistical sensor fusion. Studentlitteratur, 2010. Cited on page 5.

Fredrik Gustafsson and Gustaf Hendeby. Some Relations Between Extended and Unscented Kalman Filters. Signal Processing, IEEE Transactions, 60(2):545–555, 2012. Cited on page 21.

Richard Hartley and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge University Press, 2004. Cited on pages 23, 25, and 26.

Bingwei He, Zeming Lin, and Youfu F. Li. An automatic registration algorithm for the scattered point clouds based on the curvature feature. Optics & Laser Technology, 46:53–60, 2013. Cited on page 5.

Simon J. Julier. The scaled unscented transformation. In Proceedings of the 2002 American Control Conference, volume 6, pages 4555–4559. IEEE, 2002. Cited on pages 6 and 20.

Simon J. Julier and Jeffrey K. Uhlmann. A New Extension of the Kalman Filter to Nonlinear Systems. In AeroSense, pages 182–193. International Society for Optics and Photonics, 1997. Cited on pages 6, 18, and 21.

Simon J. Julier and Jeffrey K. Uhlmann. Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3):401–422, 2004. Cited on pages 17, 18, and 19.


Peter Kovesi. Matlab functions, 2014. URL http://www.csse.uwa.edu.au/~pk/research/matlabfns/. Cited on pages 27 and 34.

Bruce D. Lucas and Takeo Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, volume 2, pages 674–679, 1981. Cited on pages 5 and 10.

OpenCV. calcopticalflowpyrlk(), 2014a. URL http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html. Cited on page 34.

OpenCV. goodfeaturestotrack(), 2014b. URL http://docs.opencv.org/modules/imgproc/doc/feature_detection.html. Cited on page 30.

Pixabay. Railway rails. https://pixabay.com/en/railway-rails-seemed-gleise-train-711567/, 2015. Online, accessed: 2015-06-05. Cited on page 24.

Jianbo Shi and Carlo Tomasi. Good Features to Track. Technical report, Cornell University, 1993. Cited on page 9.

Jianbo Shi and Carlo Tomasi. Good features to track. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 593–600. IEEE, Computer Society Conference, 1994. Cited on pages 5, 7, 8, 9, and 10.

Jeffrey K. Uhlmann. Simultaneous Map Building and Localization for Real Time Applications. Transfer thesis, University of Oxford, 1994. Cited on page 19.

Jeffrey K. Uhlmann. Dynamic map building and localization: New theoretical foundations. PhD thesis, University of Oxford, 1995. Cited on page 19.

Eric A. Wan and Rudolph Van Der Merwe. The Unscented Kalman Filter for Nonlinear Estimation. In Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium, pages 153–158. Cited on pages 6 and 17.

Thibaut Weise, Thomas Wismer, Bastian Leibe, and Luc Van Gool. In-hand Scanning with Online Loop Closure. In Proceedings of Computer Vision Workshops, pages 1630–1637. IEEE, 2009. Cited on page 5.

Greg Welch and Gary Bishop. An introduction to the Kalman filter. Technical Report NC 27599-3175, Department of Computer Science, University of North Carolina at Chapel Hill, 1995. Cited on page 6.



Copyright

The publishers will keep this document online on the Internet (or its possible replacement) for a period of 25 years from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for his/her own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/

© Pär Lundgren
