Download pdf - Pose Fusion with Chain Pose Graphs for Automated Driving - IPB · Pose Fusion with Chain Pose Graphs for Automated Driving Christian Merfels Cyrill Stachniss Abstract Automated driving

Pose Fusion with Chain Pose Graphs for Automated Driving

Christian Merfels Cyrill Stachniss

Abstract— Automated driving relies on fast, recent, accurate,and highly available pose estimates. A single localization system,however, can commonly ensure this only to some extent. In thispaper, we propose a multi-sensor fusion approach that resolvesthis by combining multiple localization systems in a plug andplay manner. We formulate our approach as a sliding windowpose graph and enforce a particular graph structure whichenables efficient optimization and a novel form of marginaliza-tion. Our pose fusion approach scales from a filtering-based toa batch solution by increasing the size of the sliding window.We evaluate our approach on simulated data as well as onreal data gathered with a prototype vehicle and demonstratethat our solution runs comfortably at 20 Hz, provides timelyestimates, is accurate, and yields a high availability.

I. INTRODUCTION

Navigation for automated vehicles requires a preciseknowledge of the car’s pose to make informed drivingdecisions. A large variety of systems and algorithms has beenproposed in the past to solve the localization task, includingsystems based on Global Positioning System (GPS), vision,and lidar. It is important to realize that the overwhelmingpart of localization systems operates within limited systemboundaries and can not guarantee 100% availability underreal-world conditions. A GPS-based localization system forexample will likely fail in satellite-denied regions suchas in tunnels or parking garages, whereas a vision-basedlocalization system is likely to fail in darkness or otherextreme lighting conditions. There is the potential that theavailability, robustness, and accuracy of the localization in-crease if multiple pose estimation procedures with orthogonalsensors are combined, adding also to the versatility and fail-safe behavior of the system.

In this paper, we present an approach to multi-sensor datafusion that decouples the localization from the fusion taskand can be executed online. Merfels et al. [1] propose a con-cept for sliding window pose graph fusion that we developto the more sophisticated chain pose graph approach in thispaper. The proposed system, called the PoseGraphFusion,enables the combination of multiple global pose sources(e.g., GPS) with multiple odometry sources (e.g., wheelodometry) to estimate the current pose, as illustrated inFig. 1 and Fig. 2. A key advantage of decoupling thefusion from the localization is the ability to incorporatethird-party localization modules for which source code isunavailable. The contribution of this paper is an online poseestimation algorithm based on fusing multiple pose sources.

Christian Merfels is with Volkswagen Group Research, Wolfsburg, andInstitute of Geodesy and Geoinformation, University of Bonn, Germany.Cyrill Stachniss is with Institute of Geodesy and Geoinformation, Universityof Bonn, Germany.

POSEGRAPHFUSION

graphconstruction

xaxb

xc

xd

graphoptimi-zation

odometry1...

odometryN

global pose1...

global poseM

pose estimate

Fig. 1. Overview of the proposed multi-sensor data fusion: multipleodometry and global pose sources are fused in a graph-based optimizationto provide a single pose estimate.

GPSvisual localizationwheel odometryPoseGraphFusionreference trajectory

Fig. 2. A coarse localization (red triangles), a precise but only temporaryavailable localization (blue triangles), and odometry as dead reckoningtrajectory (blue) are used to estimate the true trajectory (red) of a vehicle.The estimated poses are shown as black triangles: the goal is to approximatethe unknown red line as closely as possible with the black triangles.

The algorithm avoids overconfidence by performing delayedmarginalization. The pose estimation is formulated as asliding window graph-based optimization, which leads to themaximum likelihood (ML) estimate over the joint probabilityof vehicle poses in the current window. It converges to theonline ML estimate for increasing sizes of the sliding win-dow. Our pose fusion combines exchangeable input sourcesin a generic way. It deals with multi-rate sources, noncon-stant input frequencies, out-of-sequence estimates, and time-varying latencies in a straightforward manner. Efficiency isa major design criterion as the proposed system runs onlinein an automated vehicle. Different parametrizations makeit possible to scale from the (iterated) Extended KalmanFilter (EKF) to the online batch solution and thus to balanceruntime versus accuracy.

In brief, the key contributions of this paper are:• the presentation of an efficient sensor fusion algorithm

with generic odometry and global pose inputs offeringan intuitive architecture for pose estimation, whicheffortlessly resolves typical timing issues;

• the description of a graph construction algorithm de-signed to produce a sparse block-tridiagonal structure ofthe system matrix, which therefore offers a fast solution;

• the insight that marginalization on such a matrix struc-ture can exactly and efficiently be carried out withoutan additional fill-in, so that marginalization can beinterpreted as adding a prior node to the graph.

2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)Daejeon Convention CenterOctober 9-14, 2016, Daejeon, Korea

978-1-5090-3761-2/16/$31.00 ©2016 IEEE 3116

II. RELATED WORK

Multi-sensor data fusion for navigation systems enablesthe integration of information from multiple sources to esti-mate the state of the system. This is conventionally achievedby using filtering-based approaches such as the Kalman filterand its variants or, alternatively, sliding window smoothingalgorithms. For example, Kubelka et al. [2] use an errorstate EKF to fuse information from four different odometrysources. Weiss et al. [3] propose an EKF to fuse inertialmeasurement unit (IMU) with GPS data and a camera-based pose estimate. Their work is generalized by Lynenet al. [4] to a Multi-Sensor-Fusion EKF. The filtering-basedapproaches have in common that they rely at a very earlystage on the Markov assumption and marginalize all olderinformation, thus prematurely incorporating the linearizationerror. Strasdat et al. [5] show that mainly because of thatreason filtering performs suboptimal even for short timeframes when compared to smoothing.

In contrast to filtering techniques, smoothing approachescompute the ML estimate by nonlinear least squares op-timization to a Bayesian network, Markov random field(MRF), or factor graph. Offline batch optimization of thisform assumes additive, white Gaussian noise with zero mean.It considers past and future measurements. Online batchoptimization only takes into account all past states up to thecurrent one. Although these are both computationally expen-sive operations, online batch optimization becomes feasiblethrough the usage of incremental smoothing techniques, suchas iSAM2 [6], that recalculate only the part of the graph thatis affected by new measurements.

In this context, Chiu et al. [7] combine a long-termsmoother using iSAM2 and a short-term smoother using so-called Sliding-Window Factor Graphs to fuse pose sources.Indelman et al. [8] use the incremental smoothing tech-nique [6] to fuse multiple odometry and pose sources.Cucci and Matteucci propose the graph-based ROAMFREEframework [9] for multi-sensor pose tracking and sensorcalibration. They keep the size of the graph bounded bysimply discarding older nodes and edges, thus potentiallyobtaining overconfident estimates.

The approach that we present in this paper is from amethodical point of view closest to the approach of Sibleyet al. [10], who are the first to introduce the concept of a slid-ing window filter in the context of robotics. They apply it toplanetary entry, descent, and landing scenarios, in which theyestimate surface structure with a stereo camera setup. Ourcontribution is based on a similar methodology but appliesit to our use case of pose fusion. Furthermore, the structureof our problem is explicitly kept less complex by design andthus sparser due to the way our graph is constructed. Thisleads to a faster way of solving the nonlinear least squaresequations, performing marginalization, and estimating theuncertainty of the output. Furthermore, we provide a wayof semantically reasoning about the prior information arisingfrom marginalization by deriving a prior node.

III. POSEGRAPHFUSION

The input data to the sensor fusion are pose measurements,which are subject to noise. We assume the noise to be addi-tive, white, and normally distributed with zero mean. Manyestimation and data fitting problems can be formulated asnonlinear least squares problems. Our approach exploits thestate-of-the-art graph optimization framework g2o [11] andfor the most part, we adopt the notation of Kummerle et al.

The key idea is that given the state vectorx = (x>1 , . . . ,x

>m)> and a set of measurements, where zij

is the mean and Ωij the information matrix of a singlemeasurement relating xi to xj (with C being all pairs ofindices for which a measurement is available), least squaresestimation seeks the state

x∗ = argminx

∑

〈ij〉∈C

e>ijΩijeij (1)

that best explains all measurements given the `2 norm. Thevector error function e(xi,xj , zij) measures how well theconstraint from the measurement zij is satisfied and weabbreviate it as eij = e(xi,xj , zij). Solving (1) requiresiteratively solving a linear system with the system matrixH and the right-hand side vector b such that

H =∑

〈ij〉∈C

Jij(x)>ΩijJij(x), (2)

b> =∑

〈ij〉∈C

e>ijΩijJij(x), (3)

where Jij(x) refers to the Jacobian of the error functioncomputed in state x. For more details, we refer the readerto [11].

A. Sliding Window Chain Pose Graph Fusion

In contrast to general nonlinear least squares estimation,which is commonly taking into account all available in-formation within the full pose graph, it is necessary foran online state estimation system to limit the consideredinformation to keep the problem computationally tractable.The proposed approach achieves this by marginalizing outprior state variables and thus only considering a fixedamount of state variables. More formally, the state vectorx in a sliding window pose graph is reduced to the Mmost recent states x = (x>t−M+1, . . . ,x

>t )>. The size of the

system matrix H is therefore bounded by R3M×3M . Theoptimization result from the last time step provides the initialguess for the optimization in the current time step. This leadsto an effective and efficient solution in practice as a singleoptimization is usually sufficient to integrate the additionalinformation of the current time step.

The state variables consist of a position plus a heading.They are defined in a two-dimensional Cartesian coordinatesystem for which we choose the Universal Transverse Mer-cator coordinate system. We refer to pose sources, whichmeasure poses within this coordinate system, as global posesources (e.g., GPS), and poses in this system as global poses.Pose sources, which measure spatial transformations relative

3117

to the previous pose, are dubbed local pose sources or simplyodometry.

In the spirit of MRFs, we refer to nodes representing statevariables as hidden nodes. In contrast, global pose constraintsare encoded in so-called observed nodes and denoted with x.They are connected to hidden nodes to constrain them in theglobal coordinate frame. The observed nodes therefore “pull”the hidden nodes towards them. Their constraints equal zeroif and only if the hidden nodes are identical to the observednodes. Similarly, we map odometry measurements to edgesbetween hidden nodes. They “push” the hidden nodes torelative poses which are equal to the relative transformationencoded in the edge.

In contrast to related graph-based approaches [9], [10], weneither generate a hidden node every time a measurementarrives nor tie their generation to a specific pose source.Instead, we construct a hidden node every time step, i.e., ∆tseconds (the temporal resolution). For each hidden node,we query all global pose sources for measurements andinterpolate one observed node per source at the timestamp ofthe hidden node if measurements are available. Additionally,we query each odometry source to interpolate the edgesbetween all two successive hidden nodes. The motivationbehind this is to enforce a certain matrix structure forH , to include all measurement sources in a generic wayindependently of their specific output frequencies, and toa priori relate the number of state variables to the length ofthe interval of the sliding window. We refer to the resultingform of the graph as chain pose graph.

The block structure of H reflects the connections of posesand edges in the graph as it is its adjacency matrix [12].Its structure changes slightly with the availability of mea-surements. In general, the block structure of a chain posegraph is a block-tridiagonal matrix. An example graph inFig. 3 illustrates how to integrate multiple hidden and ob-served nodes as well as odometry constraints. It additionallyshows the resulting block-tridiagonal matrix structure of thecorresponding system matrix. The diagonal entries in thesystem matrix are influenced by odometry and global poseconstraints, while the off-diagonal entries are only affectedby odometry constraints. Loop closures are not consideredas they arise rarely when driving straight from destination totarget and break with the treatment of input sources as blackboxes.

The block-tridiagonal structure is a consequence of thelinear temporal ordering of the state variables combinedwith the fact that edges are at most constructed betweensuccessive nodes. We specifically design our solution toproduce a block-tridiagonal matrix because this structuredoes not produce fill-in in H after marginalization of theoldest state variables. Beyond that, even the Cholesky fac-torization H = R>R, which we perform to solve the linearsystem, does not suffer from fill-in in its triangular matrixR. In fact, R becomes a band matrix. As a consequence,costly variable reordering techniques are unnecessary as Ralready contains the minimum number of nonzero elementsnecessary to reconstruct H .

xaxb

xc

xd

(a) A chain pose graph.

(b) Corresponding block-tridiagonal structure of thesystem matrix.

Fig. 3. A chain pose graph and the corresponding structure of the systemmatrix. The black circles are hidden nodes, the dashed blue circles are globalpose measurements from two different sources, the non-dashed blue circlesare observed nodes, and the green edges are odometry constraints. Notehow the raw global pose measurements are interpolated (dashed blue lines)at the same timestamps as the hidden nodes to obtain the observed nodes.

Furthermore, the computational complexity of theCholesky factorization of a block-tridiagonal matrix is O(n)(with n being the number of nonzero entries in H). Thisis a substantial improvement as for arbitrary (dense) matrixstructures their decomposition or inversion becomes as costlyas approximately O(n2.4). For experiments concerning theruntime for general graph-based optimization with g2o, werefer the reader to the runtime evaluations in [11, Fig. 8],where the time complexity for optimizing the Manhattan3500dataset is clearly higher than linear in the number of nodesalthough the same sparse matrix techniques have been usedthat are being used in our implementation. For our approach,we demonstrate the linear time complexity for a practicalexperiment in the evaluations section.

In summary, our chain pose graph approach prevents fill-in after marginalization in H , during the factorization inR, makes common variable reodering strategies unnecessary,and is efficiently solvable in O(n).

B. Time behavior

After detailing how and for which timestamps we con-struct hidden and observed nodes, we turn our attention tothe questions how our system handles the time behavior ofinput sources and how we design the time behavior of thepose fusion. In this paper, we define time behavior as thelatency, frequency, and availability of estimates.

Integrating input sources with unknown time behavioris difficult as we deal with multi-rate sources, noncon-stant input frequencies, out-of-sequence estimates, and time-varying latencies. Our approach consists in buffering allincoming data and preprocessing it. This does not introduceany delay as we do not need the measurements before thenext graph construction phase. The preprocessing includesdetecting missing pose estimates by estimating the recentinput frequency of each source and comparing the number ofestimates we should have received to the number of estimateswe actually received. Sorting the data by time enables theintegration of out-of-sequence data.

Instead of being data-triggered, the output of thePoseGraphFusion is time-triggered. Its output frequency fis decoupled from the temporal resolution ∆t. Every 1/fthe cycle of graph construction and optimization is triggered.

3118

t

source 0

source 1

source 2

PoseGraphFusion

Fig. 4. Time behavior of the input data for three examples sources 0 to 2and the corresponding time behavior of the output of the PoseGraphFusion.The input sources show occasional activity (source 0), data dropouts (source1), and different data rates. The PoseGraphFusion is able to handle thesecharacteristics and incorporates all input data by first buffering it. At the startof each cycle (directly after the red vertical line), the graph is constructedand optimized. The time necessary for that is the computation time (tealhatched). After the optimization, the estimated pose is propagated to thebeginning of the next cycle (green arrows) so that it can be transmittedimmediately.

The time spent for constructing and optimizing the graph issummed up as the computation time. The most recent state isestimated with a sliding window pose graph over the currentset of measurements. It is subsequently propagated into thefuture to the start of the next cycle as depicted in Fig. 4 witha constant turn rate and velocity model. The propagationensures that at that point, the now already computed poseestimate can directly be sent out. This in turn guarantees alow latency of the pose estimate, where we define the latencyto be the difference of the time when the pose has beencomputed and the time for which it is valid. Depending onthe application, a slightly higher latency might be tolerablein exchange for a propagation-free pose estimate, in whichcase the pose propagation is turned off.

A conventional Kalman filtering approach has difficulty togenerically incorporate multiple input sources with unknowndata rates. To integrate a new measurement, it has to prop-agate its state back in time to the time of the measurement,apply the measurement, and re-apply all other stored mea-surements. A pose graph approach with the described char-acteristics does not suffer from repeated backward-forwardcomputations and is able to elegantly resolve time behaviorissues and treat all sources in a homogeneous manner.

C. Marginalization in the form of a prior nodeAs stated in Section III-A, it is mandatory to limit the

amount of hidden nodes to maintain constant runtime com-plexity. Simply removing edges and nodes leads to informa-tion loss and is equivalent to conditioning, which potentiallyleads to overconfidence. Therefore, we marginalize the oldestnodes. This truncates the graph but retains the same informa-tion (given the linearization point). The common approachfor that is computing the Schur complement on the systemmatrix H . In general, the disadvantage of this operation isthe introduction of conditional dependencies between statevariables that are connected. As we design our problemstructure to be a chain pose graph, we are able to retainthe same sparsity pattern and do not suffer from a densersystem matrix after marginalization.

We examine the effect of the Schur complement and showthat we can exploit the knowledge of the particular block-tridiagonal matrix structure to derive the concept of a priornode, which carries the same information as introduced by

the Schur complement. In general, using a representationin the form of a graph is beneficial compared to directlysolving the least squares equations because of the possibilityto visually understand the problem, more possibilities fordata inspection, and a more intuitive way to manipulate theproblem structure. These are the same reasons why it isadvantageous to construct a prior node for marginalization in-stead of performing the Schur complement. The user has thepossibility to understand how the prior information affectsthe rest of the graph, thus allowing him to manipulate thisinformation if desired. If one was to repeatedly perform theSchur complement, it would become untraceable to under-stand the optimization result of the graph as not all necessaryinformation is conceptually represented herein. The conceptof a prior node is also supported by a more pragmatic reason:it allows us to store and load the optimization problemwith solely the help of its graph representation. Furthermore,it opens up the possibility to explicitly apply a robustkernel on the cost function of the prior node and to adjustthe uncertainty of the prior information based on context.In total, marginalization by using a prior node is makingthe graph construction logic aware of the marginalizationprocess, allows the understanding and manipulation of theprior information, and is thus preferable over marginalizationby performing the Schur complement on the system matrix.

In the following, we will first analyze the impact of theSchur complement on our chain pose graphs and subse-quently show, how to compute the uncertainty and meanestimate of the prior node to obtain the same result. Thederivation is detailed for the marginalization of a singlehidden node but can easily be applied iteratively if multiplenodes shall be marginalized.

1) Effect of the Schur complement on chain pose graphs:It is useful for the derivation to start with a graph withonly two hidden nodes (see Fig. 5a) and ignore the node xafor now to see how the different terms are affected by themarginalization. Consider a graph Gsmall where xb is linkedto xc and additionally to one or more observed nodes (Fig. 5ashows an example for two connected observed nodes, seethe blue circles). The corresponding system matrix Hsmall

is given as a block matrix and the optimization solves theequation

[Hsmallbb Hsmall

bc

Hsmallcb Hsmall

cc

]∆xsmall = −

[bsmallb

bsmallc

]. (4)

If we additionally include the hidden node xa, the graphstructure and the system matrix change. Let xa also beconnected to one or more observed nodes and consider theone or more odometry constraints between xa and xb. Theresulting graph Gfull (see Fig. 5b) is defined by the systemmatrix

H full =

Haa + Haa Hab

Hba Hsmallbb + Hbb Hsmall

bc

Hsmallcb Hsmall

cc

(5)

3119

xb

xc

(a) Graph Gsmall.

xaxb

xc

(b) Graph Gfull.

xpxb

xc

(c) Graph Gprior.

Fig. 5. Graph marginalization can be understood as prepending a priornode xp to the graph. Blue non-dashed circles represent observed nodes.For understanding how the hidden node xa influences the optimization, it isuseful to start with a graph in (a) without xa. Considering xa in (b) leadsto additional terms in the system matrix. Marginalization in (c) with a priornode xp leads to the same system matrix as performing the conventionalSchur complement on Gfull.

and the coefficient vector

bfull =

bfulla

bfullb

bfullc

=

ba + babsmallb + bbbsmallc

, (6)

where Hij and Hij capture the sum of all entries of Hij

stemming from observed nodes and odometry constraints,respectively. The terms bi and bi have the same function forthe vector entry bi. All of these terms are readily availableas they have been computed in the last iteration, includingany robust cost function applied on them.

The common way to marginalize xa consists in computingthe Schur complement of H full and bfull. This leads to themarginalized system matrix Hmarg, which is identical toHsmall except for the upper left block which changes to

Hmargbb = Hsmall

bb + Hschur, (7)

Hschur = Hbb − Hba(Haa + Haa)−1Hab. (8)

The corresponding marginalized coefficient vector is

bmarg =

[bsmallb + bschur

bsmallc

], (9)

bschur = bb − Hba(Haa + Haa)−1(ba + ba). (10)

We remark that after the marginalization, the structure of thesystem matrix is still block-tridiagonal due to the particulardesign of our chain pose graph, meaning that the sparsitypattern of H is retained after marginalization without fill-in. Moreover, the information is conserved in a consistentway. Any other marginalization method that claims to beexact, needs to produce the same result. In the remainderof this section, we show that an alternative marginalizationtechnique consists in replacing xa and its connected observednodes and edges with a prior node xp, which behaves likean observed node. To this end, two questions have to beanswered: how to compute its correct mean estimate andhow to compute the uncertainty of this prior node?

2) Derivation of the prior node: Our goal is now to derivethe equations for computing the prior node xp as depictedin Fig. 5c. First of all, we observe that in the special casesthat xa is not directly connected to any observed node orthat there exist no edges between xa and xb, xa does notinfluence the estimate of xb and we can omit constructing a

prior node. The derivations in the following treat the generalcase in which xa is connected to at least one observed node(which can for example be the prior node from the last cycle)and is additionally linked via at least one edge to xb.

The first step is to remove xa as well as all observednodes and edges connected to it. Connecting the prior nodexp with the information matrix Ωp to xb leads to the graphGprior (see Fig. 5c). The prepended node yields an additionaladdend Hp in the matrix Hprior of Gprior such that

Hprior =

[Hsmallbb + Hp Hsmall

bc

Hsmallcb Hsmall

cc

], (11)

and an additional addend bp in bprior such that

bprior =

[bsmallb + bpbsmallc

]. (12)

We want to create the prior node so that it behaves like anyother observed node in the graph. This already determinesthat the mean estimate of the edge, which relates xp toxb, is equal to the mean estimate of any observed node,i.e., (0, 0, 0). This trick simplifies the calculation of the theerror function epb and consequently its Jacobian Jpb(x)while not limiting the generality of the solution. We obtain

epb =

[R>θp 0

0> 1

](xb − xp), (13)

Jpb(x) =

[R>θp 0

0> 1

], (14)

where Rθp is the standard two-dimensional rotation matrix.We need these two terms to compute Hp and bp:

Hp = Jpb(x)>ΩpJpb(x), (15)

bp = Jpb(x)>ΩpJpb(x)(xb − xp). (16)

In total, we derived how xp influences Hprior, bprior throughHp, bp. We have also shown that the Schur complementinfluences Hmarg, bmarg through Hschur, bschur. The laststep consists in postulating Hp = Hschur and bp = bschurto guarantee that the effect of the prior node is equivalent tothe exact marginalization. We solve the resulting system ofequations by inserting Hp into bp, which leads to

bp = Hp(xb − xp) (17)⇔ bschur = Hschur(xb − xp) (18)

⇔ xp = xb −H−1schurbschur. (19)

As this allows us to calculate xp by using (8) and (10), wecan now compute Ωp:

Ωp = (Jpb(x)>)−1HschurJpb(x)−1. (20)

These analytic closed-form expressions for xp and Ωp

allow us to position the prior node xp in such a way thatthe resulting pose estimates for the rest of the graph areidentical to the Schur complement marginalization. The meanin combination with the uncertainty estimate provide us withthe insight how the marginalization affects the graph. Asmotivated above, this knowledge can (amongst others) beused to visualize the prior information, adapt its uncertainty,or apply a robust cost function on it.

3120

D. Assessing the uncertainty of the fused estimate

Modeling uncertainty is a key element of probabilisticrobotics and obtaining an estimate of the uncertainty of thefused pose estimate is vital for most sophisticated motionplanners. We described in the previous sections the process ofpreprocessing the input data (Section III-B) and constructinga sliding window chain pose graph (Section III-A). Theoptimization assigns to the hidden nodes the global poseswhich best satisfy all constraints. In this section we show thatwe can efficiently recover the uncertainty of the optimizedhidden nodes.

Solving the nonlinear least squares problem of the posefusion with a sparse solver typically involves computing theCholesky factorization H = R>R, where R is an uppertriangular matrix with entries rij . Following [13], the uncer-tainty matrix of the hidden nodes H−1 with entries H−1ij isobtained by the recursive formula

H−1ii =1

rii

1

rii−

n∑

k=i+1

rik 6=0

rikH−1ki

, (21)

H−1ij =1

rii

−

j∑

k=i+1

rik 6=0

rikH−1kj −

n∑

k=j+1

rik 6=0

rikH−1jk

. (22)

The formula yields an O(n) time complexity (with n beingthe number of state variables) because it operates on thenonzero entries of the sparse band matrix R. It becomesconstant time for sliding window pose graphs as the numberof state variables is upper bounded by a constant value.As we are only interested in the estimated uncertainty ofthe last hidden node, only a single and comparably trivialcalculation has to be performed. We can therefore computethe uncertainty of our pose estimates, and even more, do soefficiently.

E. Noise correlations between input sources

We need to apply appropriate preprocessing if noise iscorrelated between input sources. This occurs when differentinput sources build up on the same sensor data or when thesame algorithm runs on two physically different sensors. Weexplicitly account for correlated noise between sources toavoid overconfident estimates. As our approach is agnosticto the specific type of input source, we perform fusion underunknown correlation and employ covariance intersection [14]in a preprocessing step (see [1] for further details). To thisend, we identify groups of sources with correlated noise,buffer their measurements, and combine them into a singleconsistent estimate per group using covariance intersection.These estimates are subsequently entered into the pose fusionprocess. This results in a reduced number of input sources,in conservative covariance estimates, and avoids the potentialthreat of divergent or overconfident fusion estimates.

IV. EVALUATION

The experimental section is designed to support our claimsthat the presented sensor fusion approach is an efficient

pose estimation algorithm, converges accurately towards theonline ML estimate, scales from a filtering to a batchleast squares solution, and handles different time behaviorswithout difficulty. We provide experiments on data gatheredon a real prototype vehicle and on simulated data.

A. Experiment on a real prototype vehicle

The following experiment is designed to show that thepose fusion is able to run online on a car and effectively gen-erate pose estimates given a set of real-world input sources.The prototype vehicle is an Audi A6 Avant equipped with afront-facing monocular camera, a front-facing automotive-grade lidar scanner, and four fisheye top view cameras.Three localization systems are incorporated as global posesources and one system is considered for odometry. The firstglobal pose source (referenced as pose source 0) is computedby matching coarse lidar scans to a globally referencedpoint cloud. The second global pose source is a GPS (posesource 1), whereas the third source (pose source 2) is avisual localization system, which computes a pose based oncomparisons of visual features with a globally referencedfeature map. The odometry source is provided by wheelodometry.

The PoseGraphFusion takes these four sources as inputand computes pose estimates for the planner. As the noisesof the odometry source and the global pose sources are notcorrelated, we directly fuse them and do not need to applycovariance intersection in this setup. We use M = 1000hidden nodes and set the temporal resolution to ∆t = 25 ms,which results in a sliding window over the last 25 s. The testdrive was carried out on a route of about 16 km in rural andurban areas in Germany.

In practice, these sources do not exhibit white Gaussiannoises with zero mean. To show this, we evaluate the auto-correlations of their noises and find significant componentsother than the zero component, meaning that the noisesare not independently distributed. Furthermore, the meansare unequal to zero. Moreover, pose sources 0 and 1 arethird-party black box modules without proper uncertaintymodels. We therefore assume constant uncertainties for theirestimates. Despite this input data not being ideal for thepresented fusion approach, we demonstrate the applicabilityof the fusion and show a decent performance. A detailedanalysis with the help of simulated data in Section IV-Bevaluates the performance for controlled noise.

1) Pose estimation quality: We evaluate the estimationquality of the PoseGraphFusion by comparing its output tothe batch solution. Fig. 6 shows the development of theposition error over time. Pose sources 0 and 1 provide onlycoarse localization estimates with RMS errors of 1.06 m and1.23 m respectively, whereas pose source 2 performs betterwith an RMS error of 0.28 m. The PoseGraphFusion staysclose to the performance of the best pose source with anRMS position error of 0.38 m and heading error of 1.16.Theoretically, the output of the fusion is supposed to improvecompared to the input sources. However, the noises are notindependently distributed with zero mean and the uncertainty

3121

0 200 400 600 8000

0.5

1

1.5

2

2.5

3

3.5

time [s]

posi

tion e

rror

[m]

pose source 0

pose source 1

pose source 2

PoseGraphFusion

Fig. 6. Position error over time for the three input sources and thefused estimate. The performance of the PoseGraphFusion is linked to theperformance of the input sources. Its RMS error is 0.38m.

models are rough as mentioned above. Additionally, the inputpose sources are providing estimates in the past and are notalways available as we will see in the next evaluation.

2) Latency and availability: As for the current real-worldexperiment, we demonstrate the capability of our approach tocombine the different time behaviors of the pose sources andgenerate a consistent behavior as output. For this purpose, weexamine the latencies of their estimates: the pose sources 0,1, and 2 all exhibit a latency of up to 0.3 s. This implies thattheir pose estimates are too old to directly feed them into ourplanner. In 95% of the time the PoseGraphFusion exhibits apose latency of 10 ms or less. Moreover, the pose sources arenot always able to calculate and provide a pose estimate dueto for example sensor or model failures. Consequently, thepose sources have a limited availability, ranging from 66.98%(pose source 0) over 97.76% (pose source 2) to 100% (posesource 1 and odometry) of the time of this experiment. ThePoseGraphFusion achieves an availability of 100% of thetime. Note that this influences the position accuracy as thepose sources can simply refuse to send a pose estimate indifficult situations whereas the PoseGraphFusion is bound todeliver.

Having shown that our system is capable of integratingdifferent time behaviors into a single consistent output be-havior, we study the question whether the proposed algorithmruns fast enough for online usage on a car.

3) Runtime performance: Next we concern ourselves withthe runtime performance of the PoseGraphFusion. The soft-ware was repeatedly run on a single core of a laptop withan Intel i7-4800QM processor. Fig. 7 shows the computationtime at each time step for different numbers of hidden nodesin the sliding window pose graph. The red curve illustratesthe need for limiting the size of the graph as otherwisethe computation time grows unboundedly and gets quicklytoo demanding for a decent output frequency. A choiceof M = 4000 hidden nodes allows us to set the outputfrequency to f ≈ 1

50ms = 20 Hz.The near-constant computation time once the graph attains

its full size is expected as the optimization of a chainpose graph of fixed size is constant as detailed in previous

0 200 400 600 8000

25

50

75

100

125

150

time [s]

com

puta

tion t

ime

[ms]

unlimited nodes

4000 nodes

2000 nodes

1000 nodes

20 nodes

Fig. 7. Computation time for four input sources. The red curve correspondsto a setup in which the number of hidden nodes is unlimited such thatnodes never get marginalized out. The resulting unbounded demand forcomputation time disqualifies it for online usage. The computation time isroughly constant in all other configurations.

0 20 40 60 80 100 120 140 160 180 200

2000

4000

6000

8000

10000

output frequency f [Hz]

Mm

ax(f

)

Fig. 8. Parameter space (assuming four input sources). The red curve marksthe maximum number of hidden nodes Mmax that can still be processedfast enough to attain the corresponding output frequency f .

sections. Also, the linear increase in computation time beforethe number of nodes equals M is in line with our theoreticalexpectations as the solution of a problem in the form of achain pose graph has a runtime complexity of O(n). Anempirical analysis of the data depicted in Fig. 7 revealsthe reciprocal relationship between the maximum numberof hidden nodes Mmax and the attainable output frequencyf , as shown in Fig. 8. Different parametrizations allow us tobalance the need for a high output frequency versus the desirefor more hidden nodes. For our requirements we usuallychoose an output frequency first and subsequently set thenumber of hidden nodes.

In total, the PoseGraphFusion generates a smooth tra-jectory and continuously provides accurate pose estimateswith a low latency at a configurable output frequency underreal-world conditions. Two last questions remain unansweredso far: how many hidden nodes are needed at least tofulfill a given accuracy requirement, and do more hiddennodes actually reduce the estimation error? To answer thesequestions, we perform two experiments with simulated inputdata.

B. Simulated input data

The next experiment is conducted on simulated inputdata and serves to investigate the estimation quality as afunction of the number of hidden nodes M . This engages thequestion of the required number of hidden nodes, illustratesthe need of delayed marginalization, and most importantlydemonstrates that the sliding window estimate converges to

3122

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

position error [m]

cum

ula

tive

dis

trib

uti

on

batch opt.

16000 nodes

4000 nodes

2000 nodes

1000 nodes

500 nodes

50 nodes

input sources

Fig. 9. The cumulative distribution of the position error for differentnumbers of hidden nodes M . The batch optimization (dashed black line)depicts the theoretically attainable upper bound of accuracy if futureinformation were available. The other configurations demonstrate that theaccuracy increases for higher values of M , thus showing the need fordelayed marginalization.

the online batch solution for an increasing number of hiddennodes. All inputs are sampled around the true values froma Gaussian distribution with a standard deviation of 3.0 m,3.0 m and 4 respectively in lateral and longitudinal directionand heading orientation of the vehicle. Fig. 9 shows thatchoices for M with a decent resulting accuracy lie wellwithin the space of possible parametrizations. Furthermore,the cumulative distribution of the position error improves formore hidden nodes and approaches the batch optimization.

The second experiment with simulated data is designedto show that the position error decreases for an increasingnumber of sources. The noise terms are identical for allsources and equal to the values mentioned above. Fig. 10shows how the fusion of inaccurate pose estimates (redcurve) leads to a much more accurate estimate and improvesfurther for increasing number of input sources. The onlinecombination of eight global pose and four odometry sourcesis even roughly as accurate as the offline batch optimizationof two global pose sources and one odometry source. Thisresult reinforces the motivation for a pose fusion algorithm.

In summary, a higher number of hidden nodes and moreinput sources are both advantageous with respect to theposition accuracy given certain noise assumptions.

V. CONCLUSION

The pose fusion concept presented in this paper is mo-tivated by the need for fast, recent, accurate, and highlyavailable pose estimates. We approached these requirementsby proposing a sliding window graph-based optimizationscheme, which scales from a fast incremental solution toan online batch solution by increasing the sliding windowsize. Furthermore, we described the design of chain posegraphs for efficient optimization. These graphs do not sufferfrom additional fill-in after marginalization and allow usto collapse all marginalization information in a prior node.We have shown with experiments on a real vehicle thatthe PoseGraphFusion is fast, provides timely estimates, isaccurate, and yields a high availability. The proposed system

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

position error [m]

cum

ula

tive

dis

trib

uti

on

8 + 4 sources batch opt.



8 + 4 sources

4 + 2 sources

2 + 1 sources

input sources

Fig. 10. The cumulative distribution of the position error for differentnumbers of input sources. The legend entries “a + b sources” denote aglobal pose sources (e.g., GPS) and b odometry sources. The dashed curvesare the offline batch optimization results. The red curve shows an examplefor the quality of a noisy input source. Two of these sources paired withan odometry source suffice as input to the PoseGraphFusion to obtain theaccuracy presented as black curve.

works with generic input sources and is general enough tobe applied to other autonomous systems.

REFERENCES

[1] C. Merfels, T. Riemenschneider, and C. Stachniss, “Pose Fusionwith Biased and Dependent Data for Automated Driving,” in Proc.Conf. Positioning and Navigation for Intelligent Transportation Syst.(POSNAV), 2016, ISSN: 2191-8287.

[2] V. Kubelka, L. Oswald, F. Pomerleau, F. Colas, T. Svoboda, andM. Reinstein, “Robust Data Fusion of Multimodal Sensory Informa-tion for Mobile Robots,” Journal of Field Robotics, pp. 447–473, 2014.

[3] S. Weiss, M. W. Achtelik, M. Chli, and R. Siegwart, “VersatileDistributed Pose Estimation and Sensor Self-calibration for an Au-tonomous MAV,” in Proc. IEEE Int. Conf. Robotics and Automation(ICRA), 2012, pp. 31–38.

[4] S. Lynen, M. W. Achtelik, S. Weiss, M. Chli, and R. Siegwart, “ARobust and Modular Multi-Sensor Fusion Approach Applied to MAVNavigation,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots andSystems (IROS), 2013, pp. 3923–3929.

[5] H. Strasdat, J. Montiel, and A. J. Davison, “Visual SLAM: Why filter?”Image and Vision Computing, vol. 30, no. 2, pp. 65–77, 2012.

[6] M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. Leonard, and F. Del-laert, “iSAM2: Incremental Smoothing and Mapping using the Bayestree,” Int. Journal of Robotics Research, pp. 216–235, 2012.

[7] H. P. Chiu, S. Williams, F. Dellaert, S. Samarasekera, and R. Ku-mar, “Robust vision-aided Navigation using Sliding-Window FactorGraphs,” in Proc. IEEE Int. Conf. Robotics and Automation (ICRA),2013, pp. 46–53.

[8] V. Indelman, S. Williams, M. Kaess, and F. Dellaert, “Factor GraphBased Incremental Smoothing in Inertial Navigation Systems,” in Proc.Int. Conf. Inform. Fusion (FUSION), 2012, pp. 2154–2161.

[9] D. Cucci and M. Matteucci, “A Flexible Framework for Mobile RobotPose Estimation and Multi-Sensor Self-Calibration,” in Proc. Int. Conf.Inform. in Control, Automation and Robotics (ICINCO), 2013, pp.361–368.

[10] G. Sibley, L. Matthies, and G. Sukhatme, “Sliding Window Filter withApplication to Planetary Landing,” Journal of Field Robotics, vol. 27,no. 5, pp. 587–608, 2010.

[11] R. Kummerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard,“g2o: A General Framework for Graph Optimization,” in Proc. IEEEInt. Conf. Robotics and Automation (ICRA), 2011, pp. 3607–3613.

[12] R. Kummerle, “State Estimation and Optimization for Mobile RobotNavigation,” Ph.D. dissertation, University of Freiburg, 2013.

[13] M. Kaess and F. Dellaert, “Covariance Recovery from a Square RootInformation Matrix for Data Association,” Robotics and AutonomousSystems, pp. 1198–1210, 2009.

[14] S. J. Julier and J. K. Uhlmann, “A non-divergent estimation algorithmin the presence of unknown correlations,” in Proc. Amer. Control Conf.(ACC), 1997, pp. 2369–2373.

3123