Technical report from Automatic Control at Linköpings universitet

Initialisation and Estimation Methods for Batch Optimisation of Inertial/Visual SLAM

Martin A. Skoglund, Zoran Sjanic, Fredrik Gustafsson
Division of Automatic Control
E-mail: [email protected], [email protected], [email protected]

9th September 2013

Report no.: LiTH-ISY-R-3065
Submitted to Transactions on Robotics – T-RO

Address:
Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping, Sweden

WWW: http://www.control.isy.liu.se

AUTOMATIC CONTROL
REGLERTEKNIK

LINKÖPINGS UNIVERSITET

Technical reports from the Automatic Control group in Linköping are available from http://www.control.isy.liu.se/publications.


Abstract
Simultaneous Localisation and Mapping (SLAM) denotes the problem of jointly localising a moving platform and mapping the environment. This work studies the SLAM problem using a combination of inertial sensors, measuring the platform's accelerations and angular velocities, and a monocular camera observing the environment. We formulate the SLAM problem in a nonlinear least squares (NLS) batch form, whose solution provides a smoothed estimate of the motion and map. The NLS problem is highly nonconvex in practice, so a good initial estimate is required. We propose a multi-stage iterative procedure that utilises the fact that the SLAM problem is linear if the platform's rotations are known. The map is initialised with camera feature detections only, by utilising feature tracking and clustering of feature tracks. In this way, loop closures are automatically detected. The initialisation method and subsequent NLS refinement are demonstrated on both simulated and real data.

Keywords: Simultaneous localisation and mapping, optimisation, inertial measurement unit, monocular camera


Initialisation and Estimation Methods for Batch Optimisation of Inertial/Visual SLAM

Martin A. Skoglund, Zoran Sjanic, Fredrik Gustafsson, Fellow, IEEE

Abstract—Simultaneous Localisation and Mapping (SLAM) denotes the problem of jointly localising a moving platform and mapping the environment. This work studies the SLAM problem using a combination of inertial sensors, measuring the platform's accelerations and angular velocities, and a monocular camera observing the environment. We formulate the SLAM problem in a nonlinear least squares (NLS) batch form, whose solution provides a smoothed estimate of the motion and map. The NLS problem is highly nonconvex in practice, so a good initial estimate is required. We propose a multi-stage iterative procedure that utilises the fact that the SLAM problem is linear if the platform's rotations are known. The map is initialised with camera feature detections only, by utilising feature tracking and clustering of feature tracks. In this way, loop closures are automatically detected. The initialisation method and subsequent NLS refinement are demonstrated on both simulated and real data.

I. INTRODUCTION

The goal in Simultaneous Localisation and Mapping (SLAM) is to estimate a map of the surrounding environment from a moving platform, while simultaneously localising the platform, or more generally, to estimate the state of the platform including both position and orientation. For more than twenty years SLAM has been a popular field of research and is considered an important enabler for autonomous robotics. An excellent introduction to SLAM is given in the two-part tutorial by [1], [2].

There are many types of sensors used in SLAM applications; laser range scanners are probably the most popular, while cameras have increased in popularity in recent years, see [3] for a thorough overview of visual SLAM. Bundle Adjustment (BA), [4]–[6], belongs to these methods. The idea is to minimise the reprojection error for all camera poses and the structure as a (potentially) large nonlinear least-squares (NLS) problem. The online (filtering) methods, which handle sequential data, are known as Structure from Motion (SfM) [7], [8]. A typical pipeline combining SfM and BA initialises the BA optimiser with a solution obtained from running SfM on the complete data. Without any other sensors measuring the platform's dynamics, the trajectory and map can only be estimated up to a universal scale. It is therefore necessary to enforce a scale (typically unit) when solving a BA system, since otherwise all coordinates will converge to a point. The underlying idea in visual/inertial SLAM is to implicitly estimate the scale using the inertial measurements.

Division of Automatic Control, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden. Email: {ms, zoran, fredrik}@isy.liu.se

Figure 1: The setup with inertial and visual sensors. The camera is observing the environment represented by point landmarks, m1, . . . , m8, and the inertial sensors are measuring accelerations and angular rates in the body coordinate system, (b), which moves together with the camera. Also, a global, fixed navigation coordinate system, (n), is drawn.

By adding an Inertial Measurement Unit (IMU) measuring accelerations and angular rates (up to an unknown bias and noise), this otherwise unknown scale can be resolved, see e.g., [9]–[12]. Inertial/visual SLAM is a branch of research with many applications in entertainment, augmented reality and autonomous robotics. There are both on-line and batch solutions to the SLAM problem. EKF-SLAM [13] and FastSLAM [14] belong to the class of on-line algorithms. They are quite efficient for ground robotics, which was the driving application when the SLAM problem was originally formulated. However, these algorithms scale badly with the dimension of the map and platform state and are difficult to apply in their standard formulations. Batch algorithms can potentially overcome this limitation. The SLAM problem is easily formulated in an NLS framework, but the NLS cost function is highly non-convex. Thus, proper initialisation is needed. Although there are many batch formulations of the SLAM problem in the literature, see for instance [15]–[18], initialisation is not discussed in detail. This paper is entirely devoted to the initialisation problem for a combination of monocular camera and inertial sensors. See Figure 1 for an illustration of this setup.

A multi-stage procedure for estimation of initial motion, map and data association based on images only as well as combined visual/IMU methods is devised in this work. The initialisation utilises a reformulation of the standard reprojection error given that the platform's rotations are known, resulting in an almost linear formulation of the SLAM problem. That is, without measurement noise the problem would be completely linear, see [9], [10], [19], but in our case the noise is parameter dependent and is estimated in an iterative fashion. Moreover, another important issue related to the visual part is the data association between camera measurements and map landmarks. These associations should be as accurate as possible since erroneous labeling can cause inconsistency in the estimate. The correspondence problem is fundamental in both SLAM and Bundle Adjustment. It is however somewhat easier with sequential data because features can be tracked in order to find the relative displacements of the platform and to predict positions of the tracked features. While locally consistent correspondences are rather easy to obtain, consistent loop closures are more difficult and there are no standard methods. In the proposed approach data association is based on data clustering, see for instance [20], of feature tracks. The feature tracks, see e.g., [21], are estimated via a linear program formulation of the assignment problem. Outliers are efficiently eliminated using an iterative procedure on the reprojection errors using the IMU data. This initial estimate of the motion and map together with the data association can be used for warm starting, for example, an NLS SLAM procedure or other nonlinear estimators where all the parameters, including rotations, are treated as unknown. The initialisation procedure suggested in this paper provides such a value and leads to a better total estimate than, for example, naïve initialisation based on measurements only.

The block diagrams in Figure 2 serve as an illustrative overview of the proposed method. The initialisation procedure computes a set of landmarks with their corresponding measurements (this includes the loop closure candidates), an estimate of the trajectory and velocity using IMU data, and a camera-only rotation estimate. The flow of this procedure is illustrated in Figure 2b. The landmark initialisation in Figure 2b represents the image processing operations such as feature tracking and track clustering. Feature tracks are extracted from matching correspondence pairs in the image sequence. The feature tracks are then clustered based upon their average feature descriptor in order to find loop closure candidates. The clustered feature tracks are then used to initialise 3D point landmarks.

The paper is organised as follows: Section II describes the models of the different sensors used in the formulation of the problem. Section III covers the initialisation procedure based on the almost linear formulation of the visual/inertial SLAM problem. Here, all steps for the initialisation of the trajectory, orientation, landmarks and data association are described. Each subsection represents a particular block in Figures 2b and 2c. In Section IV a nonlinear refinement method is used, where the initial point is given by the proposed initialisation procedure. In Section V some comments and discussion that motivate the linear initialisation method are given. Finally, results on both simulated and real data are shown in Sections VI and VII respectively, and some conclusions and future work are discussed in Section VIII.

II. MODELS

The sensors of interest are a monocular camera and 6-DOF inertial sensors, gyroscopes and accelerometers, contained in a single unit. A standard Cartesian 3D point landmark parametrisation is used and its measurement is given by the pinhole projection model. In this work we assume that both the camera and the relative pose of the camera optical center with respect to the center of the IMU are calibrated. The camera calibration implies that image pixel coordinates can be transformed to metric coordinates and all the inertial measurements can be assumed to measure the camera's rotation and acceleration.

Figure 2: An overview of the proposed method: (a) high-level abstraction of the computational flow; (b) camera and inertial data initialisation procedure; (c) image processing procedure. Most blocks correspond to a subsection in Section III and Section IV with the same name.

A. Position and Orientation

Given the accelerations, a = [ax, ay, az]^T, and angular velocities, ω = [ωx, ωy, ωz]^T, of a moving and rotating object expressed in the non-moving frame, the so-called navigation (or world or earth) frame, the position, velocity and orientation (parametrised as a unit quaternion q = [q0, q1, q2, q3]^T, q^T q = 1) of the object in the navigation frame, [p, v, q], can be written as a discrete-time dynamic model as

\begin{align}
p_t &= p_{t-1} + T v_{t-1} + \tfrac{T^2}{2} a_t, \tag{1a}\\
v_t &= v_{t-1} + T a_t, \tag{1b}\\
q_t &= \exp\!\left(\tfrac{T}{2} S_\omega(\omega_t)\right) q_{t-1}, \tag{1c}
\end{align}

where $T$ is the sampling time, the skew-symmetric matrix

\begin{equation}
S_\omega(\omega) =
\begin{bmatrix}
0 & -\omega_x & -\omega_y & -\omega_z \\
\omega_x & 0 & \omega_z & -\omega_y \\
\omega_y & -\omega_z & 0 & \omega_x \\
\omega_z & \omega_y & -\omega_x & 0
\end{bmatrix}, \tag{2}
\end{equation}

parametrises the quaternion dynamics and here $\exp(\cdot)$ denotes the matrix exponential.
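As an illustration of (1)–(2), the following sketch propagates position, velocity and orientation one sample forward with NumPy/SciPy. It is a minimal sketch of the stated model, not the authors' implementation; the function and variable names are chosen here for clarity.

```python
import numpy as np
from scipy.linalg import expm

def S_omega(w):
    """Skew-symmetric matrix in (2) parametrising the quaternion dynamics."""
    wx, wy, wz = w
    return np.array([[0., -wx, -wy, -wz],
                     [wx,  0.,  wz, -wy],
                     [wy, -wz,  0.,  wx],
                     [wz,  wy, -wx,  0.]])

def propagate(p, v, q, a, w, T):
    """One step of the discrete-time model (1a)-(1c).

    p, v : position and velocity in the navigation frame
    q    : unit quaternion [q0, q1, q2, q3]
    a, w : acceleration and angular velocity at time t
    T    : sampling time
    """
    p_next = p + T * v + 0.5 * T**2 * a           # (1a)
    v_next = v + T * a                            # (1b)
    q_next = expm(0.5 * T * S_omega(w)) @ q       # (1c), matrix exponential
    return p_next, v_next, q_next / np.linalg.norm(q_next)  # re-normalise for numerical safety
```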

B. IMU Measurements

The IMU measures the specific force and rotation speed in a frame attached to the IMU body, denoted b. Usually these measurements are imperfect and contain both biases and measurement noise. The biases are assumed constant, which is usually only a good approximation for a short period of time since in practice the biases will vary due to, e.g., temperature. Under these assumptions the measurements can be described as

\begin{align}
y^a_t &= R^{be}(q_t)\,(a^e_t - g^e) + b_a + e^a_t, \tag{3a}\\
y^\omega_t &= \omega_t + b_\omega + e^\omega_t, \tag{3b}
\end{align}

where $g^e = [0, 0, -g]$ is the local gravity vector expressed in the navigation frame, with $g \approx 9.82$, $R^{be}(q)$ is the rotation matrix parametrisation of the quaternion, and the measurement noises are assumed i.i.d. Gaussian with zero mean and time-invariant covariances $R_a$ and $R_\omega$, i.e., $e^a_t \sim \mathcal{N}(0, R_a)$ and $e^\omega_t \sim \mathcal{N}(0, R_\omega)$.
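To make the measurement model concrete, this is a small sketch of how synthetic accelerometer and gyro samples could be generated from (3a)–(3b), assuming i.i.d. Gaussian noise with the stated covariances; the function and argument names are ours and purely illustrative.

```python
import numpy as np

G_E = np.array([0.0, 0.0, -9.82])  # local gravity in the navigation frame, g ≈ 9.82

def imu_measurement(R_be, a_e, w, b_a, b_w, R_a, R_w, rng=None):
    """Simulate one accelerometer/gyro sample according to (3a)-(3b).

    R_be : 3x3 rotation matrix from navigation (e) to body (b) frame
    a_e  : platform acceleration expressed in the navigation frame
    w    : angular velocity
    b_a, b_w : constant accelerometer and gyro biases
    R_a, R_w : 3x3 noise covariances
    """
    rng = np.random.default_rng() if rng is None else rng
    e_a = rng.multivariate_normal(np.zeros(3), R_a)
    e_w = rng.multivariate_normal(np.zeros(3), R_w)
    y_a = R_be @ (a_e - G_E) + b_a + e_a   # (3a): specific force with bias and noise
    y_w = w + b_w + e_w                    # (3b): angular rate with bias and noise
    return y_a, y_w
```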

C. Camera Measurements

The monocular camera is modeled as a standard pinhole camera, cf. [4]. The camera calibration matrix and lens distortion need to be estimated prior to usage. Since the calibration and distortion are known, the distorted pixels can be pre-multiplied with the inverse of the camera matrix and the distortion can be compensated for. Thus, the camera then works as a projective map in Euclidean space, $P : \mathbb{R}^3 \to \mathbb{R}^2$. The projection is defined as $P([X, Y, Z]^T) = [X/Z,\, Y/Z]^T$ and the normalised camera measurement $y^m_t = [u_t, v_t]^T$ of a landmark, $m$, at time $t$ is then

\begin{equation}
y^m_t = P\big(R^{ce}(q_t)(m - p_t)\big) + e^m_t, \tag{4}
\end{equation}

which relates the absolute pose of the camera w.r.t. the 3D location of the point. The measurement noise is assumed i.i.d. Gaussian, $e^m_t \sim \mathcal{N}(0, R_m)$. The correspondence variables at time $t$, $c^i_t$, encode the measurement-landmark assignment, $y^i_t \leftrightarrow m^j$, which gives a subset of all $M$ landmarks at time $t$, $\mathcal{M}_t = \{m^j\},\ j \in \{1, \ldots, M \mid c^i_t = j\}$. At time $t$ the stacked measurement equation is then

\begin{equation}
\underbrace{\begin{bmatrix} u^1_t \\ v^1_t \\ \vdots \\ u^{N_y}_t \\ v^{N_y}_t \end{bmatrix}}_{y_t}
=
\underbrace{\begin{bmatrix} P\big(R^{ce}(q_t)(m^{c^1_t} - p_t)\big) \\ \vdots \\ P\big(R^{ce}(q_t)(m^{c^{N_y}_t} - p_t)\big) \end{bmatrix}}_{h_t(x_t,\theta)}
+\, e^m_t, \tag{5}
\end{equation}

where $c^i_t$ denotes the index of the corresponding landmark and $N_y$ is the number of measurements, which of course varies over time. Methods for estimation of the correspondence variables are discussed in Section III-B.
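A minimal sketch of the projection (4) and the stacked measurement prediction $h_t$ in (5), assuming calibrated (normalised) image coordinates; the helper names are ours and purely illustrative.

```python
import numpy as np

def project(R_ce, p, m):
    """Pinhole projection (4): landmark m seen from a camera at position p with rotation R_ce."""
    dx, dy, dz = R_ce @ (m - p)   # landmark expressed in the camera frame
    return np.array([dx / dz, dy / dz])

def h_t(R_ce, p_t, landmarks, assoc_t):
    """Stacked measurement prediction in (5) for the landmarks assigned at time t.

    landmarks : (M, 3) array of landmark positions
    assoc_t   : list of landmark indices c_t^i observed at time t
    """
    return np.concatenate([project(R_ce, p_t, landmarks[j]) for j in assoc_t])
```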

III. SLAM INITIALISATION

In this section a method intended for initialisation of monocular visual/inertial SLAM from sequential data is described. The output of the method is a landmark map and the motion of the platform. It also establishes local correspondences via assignment variables using image feature descriptors. Classical algorithms that solve assignment problems are the Hungarian (Munkres) algorithm [22] and the popular Auction algorithm [23]. Here a slightly different approach is adopted, which results in a sequence of linear optimisation problems. In the landmark initialisation procedure we use appearance-based correspondence matching, see e.g., [24], [25]. It aims at finding similar features corresponding to the same physical object in different images. Appearance-based matching relies on feature descriptors that are distinctive and hold some invariance properties. For instance, image intensity invariance can be important in outdoor environments where lighting conditions may change, while matching over large baselines requires invariance against scale and rotation, and possibly also against change of viewpoint. In the following subsections we will describe the total initialisation procedure in detail and provide algorithms that implement these steps.

A. Feature Tracks

Feature tracks are established from the appearance of correspondences over multiple views by a matching scheme. Feature descriptor vectors, $f$, from the popular Scale-Invariant Feature Transform (SIFT) [26] are used for establishing correspondences. Given a sequence of images $I_t$, $t \in \{1, \ldots, K\}$, the feature matching problem consists of assigning a subset of feature measurements from image $I_t$, $f^i_t$, $i \in \{1, \ldots, N_t\}$, to a subset of feature measurements from image $I_{t+1}$, $f^j_{t+1}$, $j \in \{1, \ldots, M_{t+1}\}$, such that each measurement gets assigned to exactly one, unique, other measurement. In a manner similar to the measurement-landmark assignment described before, these assignments are also encoded by correspondence variables (which are binary in this case), $c^{ij}_t \in \{0, 1\}$, which are collected into $c_t$, and the assignments for all images are collected into $c$. Furthermore, each assignment is associated with a matching cost $G^{ij}_t$ as

\begin{equation}
G^{ij}_t = -\|f^i_t - f^j_{t+1}\|_2^{-1}, \tag{6}
\end{equation}

which is the negative inverse Euclidean distance between the feature descriptor vectors. The costs are used to construct a matrix and to find pairwise matches in the image sequence. This is done by solving the assignment problem, which can be formulated as the following binary program (BP)

\begin{equation}
\begin{aligned}
c_t = \arg\min_{c^{ij}_t} \quad & \sum_{i=1}^{N_t} \sum_{j=1}^{M_{t+1}} G^{ij}_t c^{ij}_t \\
\text{subject to} \quad & \sum_{i=1}^{N_t} c^{ij}_t \le 1, \ \forall j \\
& \sum_{j=1}^{M_{t+1}} c^{ij}_t \le 1, \ \forall i \\
& c^{ij}_t \in \{0, 1\}
\end{aligned} \tag{7}
\end{equation}

which is typically hard to solve. A standard method is to relax the binary constraints $c^{ij}_t \in \{0, 1\}$ to $0 \le c^{ij}_t \le 1$. With this relaxation, (7) becomes a linear program (LP), which is much easier to solve. A compact representation of the relaxed BP assignment problem (7) in matrix form is (omitting the time index for readability)

\begin{equation}
\begin{aligned}
\mathbf{c} = \arg\min_{\mathbf{c}} \quad & \mathbf{G}^T \mathbf{c} \\
\text{subject to} \quad & A\mathbf{c} \le \mathbf{1}_{(N+M)\times 1} \\
& \mathbf{0}_{NM\times 1} \le \mathbf{c} \le \mathbf{1}_{NM\times 1}
\end{aligned} \tag{8}
\end{equation}

where $\mathbf{G}$ and $\mathbf{c}$ are the vectorised versions of the matrices $G$ and $c$, where the columns are stacked on top of each other. The matrices $\mathbf{1}_{i\times j}$ and $\mathbf{0}_{i\times j}$ are the $i \times j$ matrices of ones and zeros, respectively. The matrix $A$, which has dimension $(N+M) \times NM$, has a specific structure as follows: the first $M$ rows look like

\begin{equation}
A_1 = I_M \otimes \mathbf{1}_{1\times N}, \tag{9a}
\end{equation}

and the last N rows look like

\begin{equation}
A_2 = \mathbf{1}_{1\times M} \otimes I_N, \tag{9b}
\end{equation}

and $A = [A_1^T \ A_2^T]^T$. Here $\otimes$ denotes the Kronecker matrix product. This constraint matrix $A$ is totally unimodular, that is, every square submatrix has determinant $0$ or $\pm 1$. Since the constraint matrix of the relaxed problem is totally unimodular, the LP problem is integral, i.e., its optimum has an integer value corresponding to the optimum of the original BP problem [27]. This means the assignment problems are simple to solve since good and fast LP solvers are readily available. In this work the Gurobi Optimizer [28] is used. The computational bottleneck for these problems is creating the cost matrix $G$, since each element must be calculated.

The solution to the assignment problem will always use all the measurements from the smaller set, no matter how bad the fit is. The reason is that the cost will always decrease by assigning one more variable, no matter how small the decrease is. This is not desired behaviour since these matches can in principle be arbitrarily bad. It is therefore necessary to model features which are unique for each measurement such that they do not end up being assigned. One way of doing this is to add a regularisation term to the assignment cost as

\begin{equation}
G^{ij}_t = -\|f^i_t - f^j_{t+1}\|_2^{-1} + \eta, \tag{10}
\end{equation}

Figure 3: Example of time-disjoint track clustering where only Track 1 and Track 3 are allowed to be clustered. Subscripts on the descriptors, f, are the time indices and superscripts are the enumeration of features at each time instant.

where η > 0 is a tuning parameter which controls the rejection of the excess assignments that are bad. Thus, η will force the cost of certain, unlikely, assignments to become positive, implying that those assignments will never be chosen since they would increase the total cost.
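As an illustration of (7)–(10), the following sketch builds the regularised cost matrix and solves the relaxed assignment LP with SciPy's linprog rather than the Gurobi Optimizer used in the paper; due to total unimodularity the LP solution is integral. The function name, the default value of eta and the choice of solver are ours.

```python
import numpy as np
from scipy.optimize import linprog

def match_features(F_t, F_t1, eta=0.1):
    """Match descriptor sets F_t (N x d) and F_t1 (M x d) via the relaxed LP (8).

    eta is the regularisation in (10) that rejects overly bad assignments.
    Returns the pairs (i, j) with c_ij = 1.
    """
    F_t, F_t1 = np.asarray(F_t, float), np.asarray(F_t1, float)
    N, M = len(F_t), len(F_t1)

    # Cost (6) with regularisation (10): negative inverse Euclidean distance + eta
    D = np.linalg.norm(F_t[:, None, :] - F_t1[None, :, :], axis=2)
    D = np.maximum(D, 1e-12)              # avoid division by zero for identical descriptors
    G = -1.0 / D + eta

    # Constraint matrix (9): at most one match per row and per column of c
    A1 = np.kron(np.eye(M), np.ones((1, N)))   # first M rows
    A2 = np.kron(np.ones((1, M)), np.eye(N))   # last N rows
    A = np.vstack([A1, A2])

    # Columns of G stacked on top of each other, the vectorisation used in (8)
    g = G.flatten(order='F')
    res = linprog(g, A_ub=A, b_ub=np.ones(N + M), bounds=(0, 1), method='highs')
    c = res.x.reshape((N, M), order='F')
    return [(i, j) for i in range(N) for j in range(M) if c[i, j] > 0.5]
```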

B. Track Clustering

Feature tracks are defined as a time sequence of pairwise matching feature correspondences $C_{t:t+s} = [f_t, f_{t+1}, \ldots, f_{t+s}]$. The minimum length of a track is then a pair, because a feature without a match is not useful. The length of a track has a twofold interpretation: either the feature descriptor is unique with respect to the others in the sequence, i.e., the feature has a distinctive surrounding, or the camera is stationary and the scene has been observed for a long time. However, in case of a moving camera, feature tracks may be lost due to, e.g., occlusion or change of perspective. Therefore, new feature tracks may represent previously initiated tracks. To cater for this, a track clustering scheme is employed, joining tracks that may represent observations of the same feature. For simplicity of calculation, each track is represented by the mean value, $\bar{C}$, of all the descriptors that constitute that track. The distance between tracks used for clustering is then

\begin{equation}
d_{ij} = \|\bar{C}_i - \bar{C}_j\|_2. \tag{11}
\end{equation}

Since tracks have a temporal meaning and each feature can be measured only once in each image, valid clusters of tracks contain only time-disjoint tracks. That is, $C_{i:j}$ and $C_{k:l}$ are allowed to be clustered together only if $\{i:j\} \cap \{k:l\} = \emptyset$. Time-disjoint clustering is illustrated in Figure 3. Handling these constraints can be done by simply removing time-overlapping tracks within clusters. Since the number of landmarks is unknown, a clustering method where the number of clusters is not explicitly given is used. One such method is single-linkage clustering, where the clustering stops when some condition on the between-cluster distance is fulfilled, [20]. This distance is viewed as a tuning parameter. Another benefit of single-linkage clustering is its speed, since good implementations are available. Furthermore, data reduction and automatic loop closure detection are obtained, since loops are defined by clusters containing more than one track.


Algorithm 1 Track Clustering
Input: The set of all tracks $\{C_i\}_{i=1}^{N_C}$, where $C_i = [f_k, f_{k+1}, \ldots, f_{k+l}]$, $l - k > 1$.
Output: Track clusters $\mathcal{C}$
1: Compute track means: $\bar{C}_i = \frac{1}{l-k}\sum_{t=k}^{l} f_t$, $i = 1, \ldots, N_C$
2: Cluster the tracks: $\mathcal{C} = \text{Cluster\_data}(\bar{C}_i)$, $i = 1, \ldots, N_C$
3: for all clusters do
4:   if tracks within the cluster are time disjoint then
5:     keep the cluster
6:   end if
7: end for
8: Remaining clusters are $\mathcal{C}$

This implies that the between-cluster distance used for termination of the clustering controls the quality of the loop closures; if stopping too soon, there will be many small clusters and some loop closures will be missed, and if stopping too late, the risk of clustering wrong tracks together is increased. The track clustering algorithm is summarised in Algorithm 1. In this way, a set of landmarks has been obtained representing the initial map of the environment. Errors introduced in the clustering and in the feature tracks, i.e., outlier measurements, should be removed. This will be done according to Algorithm 3.
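A possible realisation of the clustering step in Algorithm 1 using SciPy's single-linkage hierarchical clustering; the distance threshold plays the role of the between-cluster stopping distance and is a tuning parameter. This is our sketch under those assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_tracks(tracks, spans, dist_threshold):
    """Cluster feature tracks by their mean descriptors (Algorithm 1).

    tracks         : list of (L_i, d) arrays of SIFT descriptors, one per track
    spans          : list of (start, end) time intervals, one per track
    dist_threshold : between-cluster distance at which the clustering stops
    Returns clusters as lists of track indices, keeping only time-disjoint clusters.
    """
    means = np.array([t.mean(axis=0) for t in tracks])    # step 1: track means
    Z = linkage(means, method='single')                    # step 2: single-linkage clustering
    labels = fcluster(Z, t=dist_threshold, criterion='distance')

    clusters = []
    for lbl in np.unique(labels):
        idx = list(np.where(labels == lbl)[0])
        # steps 3-7: keep a cluster only if all member tracks are time disjoint
        intervals = sorted(spans[i] for i in idx)
        disjoint = all(intervals[k][1] < intervals[k + 1][0] for k in range(len(intervals) - 1))
        if disjoint:
            clusters.append(idx)
    return clusters
```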

C. Rotation Initialisation

From the sets of correspondences, estimated as in Section III-A, for each image pair in the image sequence the relative transformations (up to a scale of the translation) can be obtained. This is done with the Eight-Point Algorithm, see e.g., [4], resulting in a rotation sequence

\begin{equation}
R^c = \{R^c_i\}_{i=1}^{K-1}, \tag{12}
\end{equation}

where $R^c_i$ is the relative rotation of the cameras, (c), from time $i$ to $i+1$. The global rotation, from world to camera, (ce), at any time $j$ can be calculated by the matrix product

\begin{equation}
R^{ce}_j = \prod_{i=j}^{0} R^c_i. \tag{13}
\end{equation}

Note that rotation matrices do not generally commute, meaning that the product must be taken in reverse time order. Also note that the rotations are initialised with the camera only and the gyro is not used. This is because the dominating error from the gyro is the bias, giving drifting rotation estimates. Errors from camera-estimated rotations have a more random behaviour, resulting in random-walk-like errors. There may also be correspondence errors in the feature tracks resulting in bad rotations. The sensitivity to initial rotation errors is further analysed in Section VI-B.

D. Linear SLAM

Linear methods for 3D structure estimation from image point correspondences are well known. The basic idea is to form an overdetermined triangulation problem, which is linear in the unknowns, and solve it by linear least squares. This is essentially the Direct Linear Transformation (DLT) [29], which may work well in practice. However, instead of minimising the discrepancy between measured image coordinates and the back projection of points, an algebraic error without good geometrical interpretation is minimised [4]. It is therefore common to proceed with a nonlinear optimisation over the reprojection residuals with the linear method as a starting point. Given the correct weighting of the measurements, the linear method minimises the reprojection error, see for instance [30]. This weighting does however depend on the unknown depth of the points, but iterative re-weighting often improves the linear solution.

The projection is a convex operation and this can be exploited in many ways. Optimal approaches to reconstruction consider reprojection errors under the $L_\infty$ norm [31], [32], since it preserves (quasi-)convexity, see [33], [34], and has a single optimum, which is typically not the case for the $L_2$ cost. These approaches assume outlier-free measurements, because otherwise the maximum error will correspond to an outlier; using this insight, an iterative method for outlier removal under $L_\infty$ was proposed [35], which however does not scale to large problems [36]. Common to these approaches is that rotations are assumed known beforehand, and in many situations this is a reasonable alternative since rotations may be estimated from point correspondences. Known (or error-free) orientation was considered for fusion of vision and inertial sensors in [10] and for the case of visual odometry with inertial measurements in [9], [37]. The assumption of known orientation will also be used here, which results in an almost linear method. Assuming known rotations it is possible to rewrite (4) in the following way (omitting the time index)

\begin{align}
\delta &= R^{ce}(q)(m - p), \tag{14}\\
\begin{bmatrix} u \\ v \end{bmatrix} &= \begin{bmatrix} \delta_x/\delta_z \\ \delta_y/\delta_z \end{bmatrix} + \begin{bmatrix} e_u \\ e_v \end{bmatrix} \ \Rightarrow \tag{15}\\
\begin{bmatrix} u\,\delta_z \\ v\,\delta_z \end{bmatrix} &= \begin{bmatrix} \delta_x \\ \delta_y \end{bmatrix} + \begin{bmatrix} e_u\,\delta_z \\ e_v\,\delta_z \end{bmatrix}, \tag{16}
\end{align}

where $\delta(p, m) = [\delta_x, \delta_y, \delta_z]^T$ is the difference between landmark and camera positions expressed in the camera coordinate system. Equation (16) is linear in the unknown parameters $m$ and $p$, but has noise that is dependent on the parameters. With $\delta$ explicit, (16) becomes

\begin{equation}
R_{3,:}(m - p)\begin{bmatrix} u - e_u \\ v - e_v \end{bmatrix} = \begin{bmatrix} R_{1,:}(m - p) \\ R_{2,:}(m - p) \end{bmatrix}, \tag{17}
\end{equation}

where $R_{i,:}$ denotes the $i$:th row of the rotation matrix $R^{ce}$. The accelerometer measurements with bias are as in Section II-B,

\begin{equation}
y^a_t = R^{ce}_t (a_t - g^e) + b_a + e^a_t, \tag{18}
\end{equation}

where $g^e$, $b_a$ are assumed constant and $e^a_t \sim \mathcal{N}(0, R_a)$. Usually, the sampling rate of an IMU is faster than that of a camera, which can be handled in a straightforward fashion, e.g., by averaging the accelerations between the camera samples,

\begin{equation}
y^a_t = \frac{1}{S_t - S_{t-1}} \sum_{s=S_{t-1}}^{S_t} y^a_s, \tag{19}
\end{equation}

where $S_t$ maps the time indices between the camera and the IMU. Treating the accelerations, the accelerometer bias, the initial velocity and the landmarks as unknown parameters, and defining $\theta = [a_{1:N}, v_0, b_a, m]^T$, the following formulation is proposed

\begin{equation}
\begin{aligned}
\theta^{\text{init}} = \arg\min_{\theta} \quad & \sum_{t=1}^{N} \|y^a_t - R^{ce}_t(a_t - g^e) - b_a\|^2_{R_a^{-1}} + \|y^m_t\,\delta_z(p_t, m) - \delta_{x,y}(p_t, m)\|^2_{\bar{R}_m^{-1}} \\
\text{subject to} \quad & \begin{bmatrix} p_t \\ v_t \end{bmatrix} = F^t \begin{bmatrix} p_0 \\ v_0 \end{bmatrix} + \sum_{i=1}^{t} F^{i-1} B a_i, \qquad
F = \begin{bmatrix} I_3 & T I_3 \\ 0 & I_3 \end{bmatrix}, \quad B = \begin{bmatrix} \tfrac{T^2}{2} I_3 \\ T I_3 \end{bmatrix},
\end{aligned} \tag{20}
\end{equation}

where $\bar{R}_m = \delta_z(p^0_t, m^0)^2 R_m$ and superscript $0$ indicates that the parameters used for weighting are fixed for each iteration. The constraints represent the second-order linear dynamics introduced in (1). In turn, this can also be interpreted as a second-order interpolation of the trajectory. The formulation in (20) would be a constrained weighted linear least squares (WLS) problem if the measurement noise in the camera did not depend on the landmark. Now, since any $p_t$ can be expressed as a (linear) function of $v_0$ and $a_{1:t}$, the constraint can be directly substituted into the cost function, resulting in the following unconstrained problem

\begin{equation}
\theta^{\text{init}} = \arg\min_{\theta} \sum_{t=1}^{N} \|y^a_t - R^{ce}_t(a_t - g^e) - b_a\|^2_{R_a^{-1}} + \|y^m_t\,\delta_z(v_0, a_{1:t}, m) - \delta_{x,y}(v_0, a_{1:t}, m)\|^2_{\bar{R}_m^{-1}}, \tag{21}
\end{equation}

where $\bar{R}_m = \delta_z(v_0, a_{1:t}, m)^2 R_m$. The only difference between this problem and an ordinary WLS problem is the parameter-dependent noise for the landmark measurements. This can be treated in an iterative fashion, where $\delta_z$, evaluated at the parameter values from the previous iteration, is used for weighting the noise covariance. This approach is also known as Iterative Reweighted Least Squares (IRLS), which is a well-known method, see e.g., [38]. The procedure is described in Algorithm 2. It usually converges after a few iterations and in our implementation three iterations ($K = 3$) were a suitable choice.

The problem (21) can be augmented with linear terms for initial gyro bias estimation, $\|q_t - \tfrac{T}{2} S_\omega(\omega_t + b_\omega) q_{t-1}\|^2_{R_q^{-1}}$, which is the first-order approximation of (1c). Using the bilinear relation $S_\omega(b)q = S_q(q)b$, this can be written as $\|q_t - \tfrac{T}{2} S_\omega(\omega_t) q_{t-1} - \tfrac{T}{2} S_q(q_{t-1}) b_\omega\|^2_{R_q^{-1}}$, which is also a linear function of $b_\omega$, since the rotations are assumed to be known. These terms are decoupled from the rest of the parameters and the WLS problem defined by them can be solved separately if required.

Algorithm 2 Iterative Reweighted Least Squares
Input: IMU measurements $y^a_{1:N}$ ($\omega_{1:N}$), feature measurements $y^m_{1:N}$, rotations $R^{ce}_{1:N}$, data associations $c$, initial parameters $\theta_0 = [a_{1:N}, v_0, b_a, (b_\omega), m]^T$, number of iterations $K$
Output: Parameter estimates $\theta^{\text{init}}$
1: for $k := 1 \ldots K$ do
2:   Create the WLS problem according to (21) with $\bar{R}_m = \delta_z(\theta_{k-1})^2 R_m$
3:   Solve the WLS problem giving $\theta_k$
4: end for
5: $\theta^{\text{init}} := \theta_K$
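The reweighting loop of Algorithm 2 can be sketched generically as follows; the actual problem (21) would supply the regression matrix (accelerometer and camera rows, linear in the parameters for known rotations) and the depth function $\delta_z$, which are abstracted away here. All names and the interface are ours.

```python
import numpy as np

def irls(A, y, delta_z, theta0, K=3, base_var=1.0):
    """Iterative Reweighted Least Squares in the spirit of Algorithm 2.

    Solves min_theta ||y - A theta||^2 with per-residual weights 1 / (delta_z(theta)^2 * base_var),
    where the weights depend on the parameters themselves, cf. (21).

    A        : (n, p) regression matrix (linear in theta for known rotations)
    y        : (n,) stacked measurement vector
    delta_z  : callable returning per-residual depths delta_z(theta), shape (n,)
    theta0   : initial parameter guess
    K        : number of reweighting iterations (K = 3 in the paper's implementation)
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(K):
        # weights from the previous iterate, held fixed while solving the WLS problem
        w = 1.0 / (delta_z(theta) ** 2 * base_var)
        W_sqrt = np.sqrt(w)
        theta, *_ = np.linalg.lstsq(A * W_sqrt[:, None], y * W_sqrt, rcond=None)
    return theta
```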

Algorithm 3 Iterative Outlier Rejection
Input: IMU measurements $y^a_{1:N}$, feature measurements $y^m_{1:N}$, rotations $R^{ce}_{1:N}$, data associations $c$, initial parameters $a_{1:N}$, $v_0$, $b_a$, $m$, rejection threshold $\lambda$
Output: Data associations $c$
1: terminate := false
2: $k := 0$
3: $c^k := c$
4: while not terminate do
5:   Solve the problem according to Algorithm 2 given the assignments $c^k$ and all the parameters, and produce a set of landmark residuals for each image $i$, $\varepsilon^i_m$
6:   for each image $i$ do
7:     $\varepsilon := \varepsilon^i_m \setminus \max(\varepsilon^i_m)$
8:     if all $\max(\varepsilon^i_m) \oslash \varepsilon > \lambda$ then
9:       remove the assignment associated with $\max(\varepsilon^i_m)$ from $c^k$
10:    end if
11:  end for
12:  if no assignments were removed from $c^k$ then
13:    terminate := true
14:  else
15:    $k := k + 1$
16:  end if
17: end while
18: $c := c^k$

E. Iterative Outlier Removal

The landmark initialisation produced by Algorithm 1 will introduce erroneous associations due to the unavoidable ambiguity of the feature descriptors. These associations should be considered outliers. It is difficult to discriminate outliers based on descriptors alone. However, given the IMU data, which describe the motion independently of cluster appearance, a strategy for inertial-based outlier rejection can be devised according to the pseudo-code in Algorithm 3. This procedure terminates when all of the residuals are of similar size, where similar is defined by the rejection threshold λ. The operator $\oslash$ denotes element-wise division.
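A sketch of the inner rejection test of Algorithm 3 (steps 6–11); the outer loop re-solving Algorithm 2 until no assignment is removed is omitted, and the data layout and names are ours.

```python
import numpy as np

def reject_outliers(residuals_per_image, assignments, lam):
    """One pass of the outlier test in Algorithm 3 (steps 6-11).

    residuals_per_image : dict image -> array of landmark reprojection residual norms
    assignments         : dict image -> list of (measurement, landmark) pairs, same order as residuals
    lam                 : rejection threshold lambda
    Returns the assignments with at most one rejection per image removed.
    """
    pruned = {}
    for img, eps in residuals_per_image.items():
        keep = list(assignments[img])
        if len(eps) > 1:
            worst = int(np.argmax(eps))
            rest = np.delete(eps, worst)
            # reject the worst residual if it dominates all remaining residuals by the factor lambda
            if np.all(eps[worst] / rest > lam):
                del keep[worst]
        pruned[img] = keep
    return pruned
```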

IV. NONLINEAR LEAST-SQUARES SLAM

Accelerations, initial velocity, landmarks and biases estimated with the linear method described in Section III-D are used as an initial value for the NLS problem. Here the reprojection error is formulated in its original form, that is

\begin{equation}
y^m_t = P\big(R^{ce}(q_t)(m - p_t)\big) + e_t, \tag{22}
\end{equation}

where the operator $P$ is defined as in (4). The second difference is that the rotations are no longer fixed, but are estimated together with the rest of the parameters. This is done by adding the angular velocities $\omega_{1:N}$ to the parameter set. Now, the measurement relation from Section II-B can be used,

\begin{equation}
y^\omega_t = \omega_t + b_\omega + e^\omega_t, \tag{23}
\end{equation}

and the rotations can be calculated by using the relation (1c) as

\begin{equation}
q_t = \left[\prod_{k=1}^{t} \exp\!\left(\frac{T}{2} S_\omega(\omega_k)\right)\right] q_0, \tag{24}
\end{equation}

where $q_0$ is assumed given. These two modifications give the new parameter vector $\theta = [a_{1:N}, \omega_{1:N}, v_0, b_a, b_\omega, m]^T$ and the new nonlinear least squares (NLS) formulation of the problem according to

\begin{equation}
\theta = \arg\min_{\theta} \sum_{t=1}^{N} \|y^a_t - R^{ce}_t(a_t - g^e) - b_a\|^2_{R_a^{-1}} + \|y^\omega_t - \omega_t - b_\omega\|^2_{R_\omega^{-1}} + \left\| y^m_t - \frac{\delta_{x,y}(v_0, a_{1:t}, \omega_{1:t}, m)}{\delta_z(v_0, a_{1:t}, \omega_{1:t}, m)} \right\|^2_{R_m^{-1}}, \tag{25}
\end{equation}

where $R^{ce}_t$ is now a function of $\omega_{1:t}$ and $\delta$ is defined in (14). This problem can be solved efficiently with, e.g., a standard Levenberg-Marquardt solver [39].
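As a hedged sketch, the refinement (25) could be handed to SciPy's Levenberg-Marquardt implementation as below; `residual_fn` is a placeholder we introduce for the stacked, covariance-whitened residual vector of (25) and is not part of the paper.

```python
import numpy as np
from scipy.optimize import least_squares

def refine(residual_fn, theta_init):
    """Nonlinear least-squares refinement of the initial estimate, cf. (25).

    residual_fn : callable theta -> stacked, covariance-whitened residual vector
                  (accelerometer, gyro and reprojection terms of (25))
    theta_init  : initial parameters from the linear initialisation (Algorithm 2)
    """
    # Levenberg-Marquardt as referenced in the paper ([39]); the 'lm' method requires
    # more residuals than parameters, which holds for well-posed SLAM problems.
    sol = least_squares(residual_fn, np.asarray(theta_init, float), method='lm')
    return sol.x
```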

V. HEURISTIC MOTIVATION OF THE LINEAR INITIALISATION

In tracking and navigation the measurement models are often nonlinear, and so are most SLAM systems, stemming from, e.g., transformations between reference frames by rotations and the perspective division, among others. In practice this means that there exist local minima which should be avoided. In order to reach the global minimum, the initial point should be in the proximity of the global minimum, or at least the function should be monotone between the initial point and the global minimum; even better is of course if it is also convex along this direction. In fact, it is sufficient that the function is convex on the path that the minimisation procedure will take in order to end up in the global minimum. Here we propose an initialisation procedure based on the almost linear method. We will use a simple heuristic to motivate that the initial point created in this manner is better than an initial point created using the available measurements.

The definition of a convex function $f(\theta) : \mathbb{R}^n \to \mathbb{R}$ is

\begin{align}
f(\lambda\theta_1 + (1-\lambda)\theta_2) &\le \lambda f(\theta_1) + (1-\lambda) f(\theta_2), \tag{26a}\\
0 &\le \lambda \le 1, \tag{26b}\\
\forall\, \theta_1, \theta_2 &\in \operatorname{dom} f \subset \mathbb{R}^n. \tag{26c}
\end{align}

Geometrically, this is interpreted as the line segment between the points $(\theta_1, f(\theta_1))$ and $(\theta_2, f(\theta_2))$ always lying above the function $f$. If this is fulfilled for all $\theta$ then the function is convex. Convex functions have the property that there is only one global minimum, so any minimisation procedure can be used to obtain it. For example, linear least squares problems fall into this category. However, many functions, although non-convex on the whole domain, are convex on a subset of the domain, usually in the proximity of the local minima. This can be motivated by the fact that the Taylor expansion around a local minimum is a quadratic function plus a rest term of higher degree. If this rest term does not dominate the quadratic term, the function is (locally) convex in this region. As stated before, in order to apply a local minimisation procedure and successfully obtain the global minimum, the path between the initial point and the minimum should also fulfill the convexity property (26).

The main idea is to check the local convexity of the NLS cost function given the initial point produced with the linear initialisation procedure. This can be done approximately by using a dense sampling of the cost function along the search direction given by the initialisation and then evaluating whether the path is convex according to (26). In addition, we also require that the search direction at the initial point, $p_0$, is well aligned with the direction from the initial point to the true solution, $\theta^* - \theta_0$. The intuition behind this is that a good initial search direction is crucial for convergence to a good solution. This is determined with the angle between these two directions, defined as

\begin{align}
\gamma &= \arccos\!\left(\frac{p_0^T(\theta^* - \theta_0)}{\|p_0\|_2\,\|\theta^* - \theta_0\|_2}\right), \tag{27a}\\
p_0 &= -(J_0^T J_0)^{-1} J_0^T \varepsilon_0, \tag{27b}
\end{align}

where $J_0$ is the Jacobian matrix of the residual vector evaluated at $\theta_0$ and $\varepsilon_0$ is the residual at $\theta_0$. These criteria will be evaluated in a Monte Carlo fashion for the linear initialisation procedure and compared to other initialisation approaches, for example using measurements only.
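One way the angle (27) and the sampled convexity check of (26) could be implemented is sketched below; the function names, the tolerance and the discrete second-difference convexity test are our choices.

```python
import numpy as np

def initial_angle(J0, eps0, theta0, theta_star):
    """Angle gamma in (27a) between the Gauss-Newton direction (27b) at theta0
    and the direction towards the true solution theta_star, in degrees."""
    p0 = -np.linalg.solve(J0.T @ J0, J0.T @ eps0)          # (27b)
    d = theta_star - theta0
    cosang = (p0 @ d) / (np.linalg.norm(p0) * np.linalg.norm(d))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def path_is_convex(cost, theta0, theta_star, n=100):
    """Approximate check of (26) along the segment theta0 -> theta_star:
    densely sample the cost and require nonnegative discrete second differences."""
    lam = np.linspace(0.0, 1.0, n)
    vals = np.array([cost(theta0 + l * (theta_star - theta0)) for l in lam])
    return bool(np.all(np.diff(vals, 2) >= -1e-9))
```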

VI. MONTE CARLO SIMULATIONS

Monte Carlo (MC) simulations are used to evaluate the whole initialisation approach, that is, to find out if the linear method gives a good starting point for the nonlinear optimisation and if the outlier rejection procedure behaves reasonably. For the initialisation of the landmarks, i.e., the clustering approach, only real data are used since it is complicated to simulate SIFT features; these results are presented in Section VII.

A. Efficiency of the Linear Initialisation

Here, the method according to Section III-D is evaluated. In the first set of simulations a trajectory and a set of landmarks are created, and different white Gaussian noise is simulated and added to the measurements for each MC simulation. No outliers are present. The initial rotation used for the linear optimisation is also randomly perturbed, but fixed together with the trajectory and the scene. The linear solution produced in this way is compared with a naïve guess, where the initial point is created from the measurements and randomised landmark positions.


Initialisation method                        Linear        Naïve
Average initial angle γ [°]                  17.7          66.6
Average error ‖θ − θ*‖/dim(θ*)               1.6 · 10^-3   221 · 10^-3
# of non-convex paths                        0             0

Table I: MC simulation results for the initialisation method, see Section VI-A, where the measurement noise was varied. The angle γ is defined in (27a), θ is defined in (25) and θ* is the true solution.

Perturbation magnitude [°/s]                 0.1             1       5
Average initial angle γ [°]                  1.64            40.2    84.0
Average error ‖θ − θ*‖/dim(θ*)               8.34 · 10^-9    69.1    1.0
Average error ‖θ_init − θ*‖/dim(θ*)          0.08            0.63    0.78
# of non-convex paths                        0               3       18

Table II: MC simulation results for the initialisation method, see Section VI-B, where the initial rotation error was varied. The angle γ is defined in (27a), the NLS estimate θ is defined in (25), the initial estimate θ_init is defined in (21) and θ* is the true solution.

The first set of MC simulations consists of 50 simulations performed on a data set with 20 landmarks and 30 time points. The landmarks were placed in a general configuration, with varying distance to the camera. The noise standard deviations were fixed and set to $\sigma_a = 10^{-3}$ m/s², $\sigma_m = 10^{-4}$ m and $\sigma_\omega = 0.5$ °/s. The results are listed in Table I. Both the proposed initialisation method and the naïve one have convex paths between the initial and the true point, but the initial angle between the search direction and the direction to the true solution is much larger (approximately four times) for the measurement-initialised optimisation. This causes the average error of the solution to be much larger (approximately 140 times) for the naïve initialisation.

B. Sensitivity to Initial Rotation Errors

In the second set, consisting of 3 times 25 realisations, the trajectory and the scene are fixed as before, no noise is added to the measurements and no outliers are present. Here the initial rotations are varied randomly with different perturbation magnitudes, 0.1, 1 and 5 °/s. The solutions to the linear initialisation given the different rotation perturbations are then used to initialise the nonlinear optimisation problem, and the results are given in Table II. Here it can be seen that both the error in the linear solution and the initial angle grow with the magnitude of the rotation error. As a consequence, the number of non-convex paths also increases with the perturbation magnitude and the average error of the estimate is large. The reason for the very large average for the perturbation of 1 degree per second is the presence of 4 really bad solutions. With these removed the value is 0.45, which is more reasonable. This evaluation shows the importance of the initial estimation of the rotations.

Figure 4: Sensor unit containing both camera and inertial sensors.

C. Iterative Outlier Removal

The third set is used to evaluate the performance of the outlier rejection method proposed in Algorithm 3. Here the outlier measurement rate is varied while the trajectory, the scene and the initial rotations are fixed. The performance is evaluated according to the amount of outlier measurements that are left and the amount of inlier measurements that are removed.

Results are presented in Table III; in this case the scene is fixed to 30 landmarks in 21 images and the outlier rate is varied, both as the number of landmark outliers and as the number of images where outliers are present. The outliers are created by randomly flipping a pair of associations in an image. For example, if the number of landmarks to be outliers is two, then in all images where outliers are present the measurements from these two landmarks are flipped with two other randomly chosen landmarks. This strategy is chosen because it is the behaviour of the data association method employed here. Three different rejection thresholds are used in the experiments, λ = {3, 5, 8}. The result shows that, as a general trend and as expected, the rejection threshold governs the amount of outliers that are left and the number of inliers that are rejected. It is of general interest to reject as many outliers as possible and to keep as many inliers as possible. A lower threshold means better outlier rejection, but the price is that more inliers are also rejected. This also emphasises the importance of having many landmark measurements in order to be resilient to removing inliers. Note that these statistics are conservative in the sense that only measurements that are created as outliers are considered as true outliers. In many cases all measurements for a landmark that is creating outliers are removed, implying that this landmark is no longer deemed usable and can be excluded from the statistics. This means that the results in Table III would be somewhat better if this were taken into account.

VII. REAL DATA EXPERIMENTS

In the real data experiments, a sensor unit, see Figure 4, equipped with a monocular monochrome VGA camera (Point Grey Firefly) and three-axis inertial sensors (gyroscopes and accelerometers) was used. It also contains magnetometers and a temperature sensor which is used for internal calibration. In one of the experiments the sensor unit was mounted at the tool position of an IRB-1400 industrial robot from ABB for the purpose of an accurately known ground truth.


λ = 3          Outliers left [%]            Inliers removed [%]
IO \ LO        0.07    0.13    0.2          0.07    0.13    0.2
0.05           0       0       0            13      22      22
0.10           0       0       0            18      23      31
0.14           0       0       0            17      26      41
0.19           0       6.3     0            29      36      40
0.24           0       0       0            20      20      52
0.29           0       0       0            20      46      48
0.33           0       0       2.4          23      44      52
0.38           6.3     0       2.1          42      50      54

λ = 5          Outliers left [%]            Inliers removed [%]
IO \ LO        0.07    0.13    0.2          0.07    0.13    0.2
0.05           0       50      0            9.1     12      14
0.10           0       0       0            11      21      32
0.14           0       8.3     5.6          17      25      42
0.19           0       0       0            21      34      38
0.24           0       5       0            19      25      54
0.29           0       0       0            16      36      49
0.33           7       0       0            21      34      51
0.38           6.3     0       0            39      35      54

λ = 8          Outliers left [%]            Inliers removed [%]
IO \ LO        0.07    0.13    0.2          0.07    0.13    0.2
0.05           0       0       8.3          8.1     9.2     15
0.10           0       0       0            17      20      27
0.14           0       0       11           16      18      26
0.19           0       0       0            15      25      35
0.24           0       5       10           22      34      58
0.29           8.3     0       0            16      25      42
0.33           0       0       2.4          27      42      38
0.38           6.3     0       8.3          34      31      66

Table III: Outlier rejection simulation results. IO is the fraction of images in which outliers are present. LO is the fraction of the landmark set in which outliers are present. Three different rejection thresholds λ are used.

Also, a small scene with objects of known size was created so that the estimated scene could be compared with respect to its size. In the second experiment, a free-hand movement of the camera in a room is used, and no accurate ground truth is available. In this case the accuracy is evaluated based on approximate measurements in the room and on approximately "knowing" where the camera was. Prior to usage, the camera was calibrated using the toolbox [40] and the relative pose of the camera centre with respect to the IMU centre was calibrated as described in [41]. An open source SIFT implementation from [42] was used.

In Figure 6 the trajectory estimates from the linear initialisation and the nonlinear refinement are plotted together with the ground truth trajectory for the first data set. An image with the feature measurements is illustrated in Figure 5. The velocity in the x, y-plane is shown in Figure 7. Since the true rotations are known, the errors in quaternions and Euler angles are depicted in Figures 8 and 9, respectively. Landmark estimates for the initial and the nonlinear estimation are shown in Figure 10.

For the second data set (free-hand run), an example image together with plotted features is shown in Figure 11. The resulting trajectory estimate is shown in Figure 12. The x, y-plane velocity, quaternions, Euler angles and landmark estimates are plotted in Figures 13, 14, 15 and 16, respectively.

Figure 5: Example image from the robot run with extracted features shown as red stars.

Figure 6: Trajectory from the linear estimation (green), from nonlinear refinement (red) and ground truth (blue).


In both cases the nonlinear refinement improves the initial estimate for both rotations and kinematic parameters. In the first data set the landmark estimate is much better than the initial estimate; for the second data set this is harder to assess, but the positions of the landmarks look reasonable given the environment. For example, we see that there are three levels in the bookshelf with distinct features, which can be seen in both Figure 11 and Figure 16.

A. Clustering Results

For the landmark initialisation approach (track clustering) a helicopter data set recorded with a Yamaha RMAX helicopter is used. In this data set only the images were available, i.e., no IMU data, implying that neither outlier rejection nor complete SLAM could be performed; this data set is only used to evaluate the clustering performance. The flight is performed in a circle, meaning that the helicopter revisits the same place. The total length of one such loop was 325 images taken at 4 Hz, giving a total time of 81.25 s. An example of a loop closure is illustrated in Figure 17, where two tracks, one from the beginning and one from the end of the loop, are clustered together.


Figure 7: Estimated horizontal velocity from the linear estimation (green) and from the nonlinear refinement (red). True velocity is 0.1 m/s except in three time points when it is zero.

Figure 8: Estimated quaternion error from camera (dash-dotted) and after nonlinear refinement (solid).

Figure 9: Estimated Euler angles error in degrees from camera (dash-dotted) and after nonlinear refinement (solid). Blue is the yaw, green is the pitch and red is the roll angle.

Figure 10: Landmark estimates from the linear estimation (green) and from nonlinear refinement (red). Most of the landmarks should lie on the −0.5 m plane and some should be higher up.

Figure 11: Example image from the free-hand run with extracted features shown as red stars.

Figure 12: Trajectory from the linear estimation (green) and from nonlinear refinement (red).


Figure 13: Estimated horizontal velocity from the linear estimation (green) and from the nonlinear refinement (red).

Figure 14: Estimated rotations from camera (dashed) and after nonlinear refinement (solid).

Figure 15: Estimated Euler angles in degrees from camera (dash-dotted) and after nonlinear refinement (solid). Blue is the yaw, green is the pitch and red is the roll angle.

Figure 16: Landmark estimates from the linear estimation (green) and from nonlinear refinement (red). Three distinct layers can be recognised, which basically come from the bookshelf's levels.

With this cluster tuning the fraction of loop closures is about 4%. This should be compared to the fraction of images showing the overlapping environment, which was 7% of the total number of images. For the other two data sets this number is higher, especially for the free-hand run data set, where large parts of the environment were visible the whole time. In general, most of the clusterings are local loops where, for example, a feature is lost for a few images and is then tracked again. One such example can be seen in Figure 18.

Some outliers are also introduced in the clustering process, which is, as explained earlier, expected; one such case is depicted in Figure 19. The total fraction of outlier landmarks was about 10%, which is similar for the other two data sets. Many of the outliers are caused by a too self-similar environment; for the example in Figure 19 the road edge looks similar to SIFT and the descriptors are too alike, which causes the erroneous clustering.

VIII. CONCLUSIONS AND FUTURE WORK

In this work we presented a method for initialisation of optimisation-based visual/inertial SLAM on batch form. This sensor combination makes it possible to obtain a full Euclidean reconstruction of the environment and trajectory. The method is based on a multistage strategy where visual methods, such as the Eight-Point Algorithm, feature extraction and clustering of feature tracks, are used for rotation and landmark initialisation. Inertial data are used for data association, including outlier rejection, and for initialisation of trajectory and landmark location parameters. The method exploits the conditional linearity of visual/inertial SLAM. The experiments on simulated and real data sets show that the initialisation method gives a better starting point for the subsequent full nonlinear optimisation than naïve initialisation with measurements only.

Also, the landmark initialisation method based on clustering of the tracked features gives quite promising results, where many possible loop closures are identified while the number of wrong associations is rather low. This allows for the iterative outlier rejection method with aid from the inertial data. Even this method shows good results, with efficient outlier removal while keeping the inlier fraction relatively high.


Figure 17: Example of successful loop-closure clustering with the helicopter data: (a) Image 7, (b) Image 8, (c) Image 9, (d) Image 321, (e) Image 322, (f) Image 323, (g) Image 324, (h) Image 325. The tracked feature is marked with the red star.


It must be pointed out that this approach requires a large number of landmark measurements in order to produce good results, i.e., the equation system must be highly overdetermined. On top of that, since a camera is a bearings-only sensor, there is also a demand for sufficient viewpoint change, also known as parallax, in order to accurately estimate the landmark positions. The trajectory estimation can be compensated with the inertial data, but even there a good SNR is required.

In the future it could be interesting to use a proper constrained clustering algorithm instead of discarding clusters with overlapping times, as is done now. Alternative feature detectors might also be used to see whether their descriptors behave differently compared to SIFT.

Figure 18: Example of clustering creating local loops with the helicopter data: (a) Image 3, (b) Image 4, (c) Image 12, (d) Image 13. The tracked feature is marked with the red star.

Figure 19: Example of erroneous clustering causing an outlier with the helicopter data: (a) Image 299, (b) Image 300, (c) Image 314, (d) Image 315. The tracked feature is marked with the red star.



REFERENCES

[1] H. Durrant-Whyte and T. Bailey, "Simultaneous Localization and Mapping: Part I," IEEE Robotics & Automation Magazine, vol. 13, no. 2, pp. 99–110, June 2006.
[2] T. Bailey and H. Durrant-Whyte, "Simultaneous Localization and Mapping (SLAM): Part II," IEEE Robotics & Automation Magazine, vol. 13, no. 3, pp. 108–117, Sep. 2006.
[3] M. Chli, "Applying Information Theory to Efficient SLAM," Ph.D. dissertation, Imperial College London, 2009.
[4] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004.
[5] B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon, "Bundle adjustment – a modern synthesis," in Vision Algorithms: Theory and Practice, ser. Lecture Notes in Computer Science, B. Triggs, A. Zisserman, and R. Szeliski, Eds., vol. 1883. Springer-Verlag, 2000, pp. 298–372.
[6] S. Agarwal, N. Snavely, I. Simon, S. Seitz, and R. Szeliski, "Building Rome in a Day," in Computer Vision, 2009 IEEE 12th International Conference on, Oct. 2009, pp. 72–79.
[7] A. W. Fitzgibbon and A. Zisserman, "Automatic Camera Recovery for Closed or Open Image Sequences," in ECCV (1), 1998, pp. 311–326.
[8] C. Taylor, D. Kriegman, and P. Anandan, "Structure and Motion in Two Dimensions from Multiple Images: A Least Squares Approach," in Proceedings of the IEEE Workshop on Visual Motion, Princeton, NJ, USA, October 1991, pp. 242–248.
[9] L. Kneip, A. Martinelli, S. Weiss, D. Scaramuzza, and R. Siegwart, "Closed-Form Solution for Absolute Scale Velocity Determination Combining Inertial Measurements and a Single Feature Correspondence," in International Conference on Robotics and Automation, Shanghai, China, May 2011. [Online]. Available: http://hal.inria.fr/hal-00641772
[10] A. Martinelli, "Vision and IMU Data Fusion: Closed-Form Solutions for Attitude, Speed, Absolute Scale, and Bias Determination," Robotics, IEEE Transactions on, vol. 28, no. 1, pp. 44–60, Feb. 2012.
[11] M. Bryson, M. Johnson-Roberson, and S. Sukkarieh, "Airborne smoothing and mapping using vision and inertial sensors," in Proceedings of the International Conference on Robotics and Automation (ICRA). Kobe, Japan: IEEE Press, 2009, pp. 3143–3148.
[12] T. Lupton and S. Sukkarieh, "Visual-Inertial-Aided Navigation for High-Dynamic Motion in Built Environments Without Initial Conditions," Robotics, IEEE Transactions on, vol. 28, no. 1, pp. 61–76, Feb. 2012.
[13] P. Moutarlier and R. Chatila, "Stochastic multisensory data fusion for mobile robot location and environment modelling," in 5th International Symposium on Robotics Research, Tokyo, Japan, 1989, pp. 207–216.
[14] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, "FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem," in Proceedings of the AAAI National Conference on Artificial Intelligence. Edmonton, Canada: AAAI, 2002.
[15] F. Dellaert and M. Kaess, "Square Root SAM: Simultaneous Localization and Mapping via Square Root Information Smoothing," International Journal of Robotics Research, vol. 25, no. 12, pp. 1181–1203, 2006. [Online]. Available: http://dx.doi.org/10.1177/0278364906072768
[16] S. Thrun and M. Montemerlo, "The GraphSLAM algorithm with applications to large-scale mapping of urban structures," International Journal of Robotics Research, vol. 25, no. 5, pp. 403–430, 2006.
[17] G. Grisetti, S. Grzonka, C. Stachniss, P. Pfaff, and W. Burgard, "Efficient estimation of accurate maximum likelihood maps in 3D," in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2007, pp. 3472–3478.
[18] S.-H. Jung and C. Taylor, "Camera trajectory estimation using inertial sensor measurements and structure from motion results," in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 2, 2001, pp. II-732–II-737.
[19] S. N. Sinha, D. Steedly, and R. Szeliski, "A Multi-staged Linear Approach to Structure from Motion," in RMLE-ECCV Workshop, 2010.
[20] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, 2nd ed. Springer, 2009.
[21] T. Thormählen, N. Hasler, M. Wand, and H.-P. Seidel, "Merging of Feature Tracks for Camera Motion Estimation from Video," in 5th European Conference on Visual Media Production (CVMP 2008). Hertfordshire, UK: IET, Jan. 2008, pp. 1–8.
[22] H. W. Kuhn, "The Hungarian Method for the Assignment Problem," Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955. [Online]. Available: http://dx.doi.org/10.1002/nav.3800020109
[23] D. P. Bertsekas, "An Auction Algorithm for Shortest Paths," SIAM Journal on Optimization, vol. 1, no. 4, pp. 425–447, 1991. [Online]. Available: http://dx.doi.org/doi/10.1137/0801026
[24] M. Cummins and P. Newman, "Appearance-only SLAM at large scale with FAB-MAP 2.0," The International Journal of Robotics Research, 2010. [Online]. Available: http://ijr.sagepub.com/content/early/2010/11/11/0278364910385483.abstract
[25] K. L. Ho and P. M. Newman, "Loop closure detection in SLAM by combining visual and spatial appearance," Robotics and Autonomous Systems, vol. 54, no. 9, pp. 740–749, 2006.
[26] D. Lowe, "Object Recognition from Local Scale-Invariant Features," in Proceedings of the Seventh International Conference on Computer Vision (ICCV'99), Corfu, Greece, 1999, pp. 1150–1157.
[27] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1982.
[28] Gurobi Optimization, Inc., "Gurobi optimizer reference manual," 2013. [Online]. Available: http://www.gurobi.com
[29] Y. Abdel-Aziz and H. Karara, "Direct Linear Transformation from Comparator to Object Space Coordinates in Close-Range Photogrammetry," in Proceedings of the American Society of Photogrammetry Symposium on Close-Range Photogrammetry, Falls Church, VA, USA, 1971, pp. 1–18.
[30] Z. Zhang and T. Kanade, "Determining the Epipolar Geometry and its Uncertainty: A Review," International Journal of Computer Vision, vol. 27, pp. 161–195, 1998.
[31] R. Hartley and F. Schaffalitzky, "L∞ minimization in geometric reconstruction problems," in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 1, June–July 2004, pp. I-504–I-509.
[32] F. Kahl and D. Henrion, "Globally Optimal Estimates for Geometric Reconstruction Problems," Int. J. Comput. Vision, vol. 74, no. 1, pp. 3–15, Aug. 2007. [Online]. Available: http://dx.doi.org/10.1007/s11263-006-0015-y
[33] C. Olsson and F. Kahl, "Generalized Convexity in Multiple View Geometry," Journal of Mathematical Imaging and Vision, vol. 38, pp. 35–51, 2010. [Online]. Available: http://dx.doi.org/10.1007/s10851-010-0207-5
[34] F. Kahl, "Multiple view geometry and the L∞-norm," in Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, vol. 2, Oct. 2005, pp. 1002–1009.
[35] K. Sim and R. Hartley, "Removing Outliers Using The L∞ Norm," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 1, June 2006, pp. 485–494.
[36] S. Agarwal, N. Snavely, and S. M. Seitz, "Fast algorithms for L∞ problems in multiview geometry," in Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[37] L. Kneip, M. Chli, and R. Siegwart, "Robust Real-Time Visual Odometry with a Single Camera and an IMU," in Proceedings of the British Machine Vision Conference. BMVA Press, 2011, pp. 16.1–16.11. [Online]. Available: http://dx.doi.org/10.5244/C.25.16
[38] Å. Björck, Numerical Methods for Least Squares Problems. SIAM, 1996.
[39] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed. New York: Springer, 2006.
[40] J.-Y. Bouguet, "Camera Calibration Toolbox for Matlab," www.vision.caltech.edu/bouguetj/calib_doc/, 2010.
[41] J. Hol, T. B. Schön, and F. Gustafsson, "Modeling and Calibration of Inertial and Vision Sensors," The International Journal of Robotics Research, vol. 29, no. 2, pp. 231–244, 2010.
[42] A. Vedaldi and B. Fulkerson, "VLFeat: An Open and Portable Library of Computer Vision Algorithms," 2008. [Online]. Available: http://www.vlfeat.org/