Vision Aided Inertial Navigation


Advanced Robotics, 2014
Vol. 28, No. 11, 717–729, http://dx.doi.org/10.1080/01691864.2013.870496

FULL PAPER

Tightly-coupled stereo vision-aided inertial navigation using feature-based motion sensors

E. Asadi (a) and C.L. Bottasso (a,b)
(a) Department of Aerospace Science and Technology, Politecnico di Milano, Milano 20156, Italy; (b) Wind Energy Institute, Technische Universität München, Garching bei München 85748, Germany

(Received 31 March 2013; revised 6 August 2013; accepted 22 September 2013)
Published online: 13 Jan 2014.
Corresponding author. Email: [email protected]

A tightly-coupled stereo vision-aided inertial navigation system is proposed in this work, as a synergistic incorporation of vision with other sensors. In order to avoid the loss of information possibly resulting from visual preprocessing, a set of feature-based motion sensors and an inertial measurement unit are directly fused together to estimate the vehicle state. Two alternative feature-based observation models are considered within the proposed fusion architecture. The first model uses the trifocal tensor to propagate feature points by homography, so as to express geometric constraints among three consecutive scenes. The second one is derived by using a rigid body motion model applied to three-dimensional (3D) reconstructed feature points. A kinematic model accounts for the vehicle motion, and a Sigma-Point Kalman filter is used to achieve a robust state estimation in the presence of non-linearities. The proposed formulation is derived for a general platform-independent 3D problem, and it is tested and demonstrated with a real dynamic indoor data-set alongside a simulation experiment. Results show improved estimates with respect to a classical visual odometry approach and to a loosely-coupled stereo vision-aided inertial navigation system, even in GPS (Global Positioning System)-denied conditions and when magnetometer measurements are not reliable.

Keywords: vision-aided inertial navigation; sensor fusion; tight-coupling; trifocal constraint

1. Introduction

It appears that some animal species have evolved sophisticated behavioral capabilities by, among other means, fusing together specialized multiple sensory systems, as for example in the case of some flying insects.[1] Mimicking this situation, the sensor fusion concept is being actively exploited in many robotics applications using both proprioceptive and exteroceptive sensors.[2]

Stereo vision systems [3] are amongst the most information-rich sensors available today, and perform in a manner similar to animal binocular vision.[4] Eyes provide the brain with two views of a scene, which are in turn combined and processed by the brain itself to build a complex representation of the environment, including features and depth information. Vision-based systems have been gaining increased attention in the field of autonomous control and navigation in ground,[5–7] underwater,[8] and also aerial [9,10] applications. Several attempts have already been documented in the design and implementation of robust visual odometry systems relying on monocular or binocular vision.[11–13] Egomotion estimation through the use of the trifocal tensor with monocular vision was described in [14], and then extended in [12] to stereo visual odometry by considering a simple constant velocity motion. Visual odometry approaches based on a quadrifocal relationship have also been proposed.[15]

However, reliance on the sole visual information is a potential weakness of visual odometry methods for many applications, as scenes may occasionally become not sufficiently information rich and vision may be impaired by several external factors. Some authors have addressed this problem by combining vision and inertial sensor information, as this may enhance the quality of state estimates, with a positive cascading effect on the control systems that make use of such estimates. Inertial measurements have been used as model inputs [16] or states,[17–19] using variants of the Kalman filtering approach to robustly estimate motion. For example, a combination of vision-based motion sensors with inertial navigation via three-dimensional (3D) reconstruction was presented in [20]. An interesting characteristic of the vision-aided inertial approach is that it may significantly improve estimation even in difficult situations, such as GPS (Global Positioning System)-denied conditions, for example under vegetation cover or indoors, or in the presence of disruptions in the tracking of visual features, for example when the vehicle undergoes rapid motions.

As described in [21], there are two possible architectures for the fusion of an inertial navigation system (INS) with a vision unit. The first, termed the loosely-coupled approach,[22–24] uses separate INS and structure from motion (SfM) blocks that exchange information. In the second, termed the tightly-coupled approach,[20,25] all information comprising the raw vision data and the inertial measurements is fused within a single high-order estimation filter.

In most tightly-coupled vision-aided systems, vehicle states as well as visual landmarks are jointly estimated,[17,25–28] similarly to simultaneous localization and mapping (SLAM) approaches.[29] These methods are capable of handling the displacement measurement provided by the vision system, while bounding the position estimation error; however, their computational requirements can be significant due to the state vector augmentation that they perform. This may hinder their use in real-time demanding applications.

There are also approximate SLAM solutions such as [30,31], providing less computational complexity and suboptimal estimates. Other SLAM approaches may reduce the amount of information that they process by incorporating only high-level features (e.g. features such as corners, junctions, and lines), in order to achieve real-time capabilities. This way, the problem of data association is simplified by applying the correspondence search over a smaller number of features. As considered in [32,33], revisiting landmarks that were previously tracked would not be of significant benefit for long-distance navigation. The idea explored in [32,33] is based on incorporating high-level landmarks that can be easily tracked for long periods of time using a track maintenance algorithm. This way only high-level landmarks are augmented to the state vector, while stale landmarks (i.e. correspondences that have been unsuccessful for a given period of time) are pruned by removing the associated filter states. However, the resulting estimates of the vehicle states are suboptimal compared to those obtained by incorporating all feature points, including both low- and high-level features.

There are some other approaches [19,34] incorporating only frame-to-frame measurements with a reduced computational complexity in comparison to SLAM. In this case, relative-state measurements are processed via a sole localization approach, while delayed states corresponding to previous poses are maintained by expanding the state vector. A formal approach to this problem was also proposed in [2], which developed a stochastic cloning framework based on the indirect form of the Kalman filter. This method also considers augmenting the state vector with two instances of the vehicle states together with a copy of the last measurements.

This work proposes a tightly-coupled stereo vision-aided inertial navigation system (TC-SVA-INS), resulting in an algorithm that is suitable for real-time applications with fast dynamics, as for example the flight of small autonomous vehicles. In order to avoid the loss of information resulting from the preprocessing of visual data,[35] this paper uses a tightly-coupled approach that relies on directly fusing tracked feature points together with inertial measurements to yield state estimates. In this work, two alternative feature-based observation models are embedded within the proposed tightly-coupled fusion architecture. The first model is derived via the trifocal tensor, which encapsulates geometric constraints amongst three different scenes. Some preliminary results concerning this model are presented in [36], while a comprehensive assessment of its performance is presented here in comparison to other possible alternative approaches. The second model is derived via rigid body motion applied to 3D reconstructed feature points. In order to reduce the computational cost of processing relative-state measurements,[2] here the standard state estimation method is used in conjunction with a simplified observation model. A comparison of these two tightly-coupled solutions with a loosely-coupled one is provided as well.

The proposed approach is shown to limit the degradation of position estimation in GPS-denied conditions. Differently from other visual odometry approaches, a kinematic motion model is used here to better reflect the actual dynamic behavior of the vehicle. The formulation, being derived for arbitrary 3D motion using a quaternion parameterization of finite rotations, is independent of the platform and can be applied to any vehicle. Outlier rejection is achieved via egomotion estimation, which is performed outside of the sensor fusion loop. The Sigma-Point Kalman filter (SPKF) is used to perform data fusion because of its robustness in the presence of non-linearities. The TC-SVA-INS method could also be used in parallel with low-frequency SLAM approaches in a decentralized architecture to enhance its precision, where information is distributed between two loops,[37,38] although this possibility is not explored here.

While efficiency in terms of computational effort and reduced complexity is of the utmost importance in a real-time multi-sensory application, a straightforward implementation of the standard estimator without any state augmentation is highly advisable. Our approach does not utilize any state vector augmentation, neither the one that SLAM methods perform nor the form of augmentation proposed in localization methods. This way, the approach allows for a rather high number of relative-state measurements to be fused with inertial information via a tightly-coupled architecture using a single standard estimator. This results in a reliable frame-to-frame displacement estimation and robust outlier rejection by benefiting from all available visual information (i.e. both low- and high-level features). Additionally, a reduced information loss is achieved in comparison with a loosely-coupled architecture. The computational cost of the method does not grow with the environment size. Data association is simple and readily provided by a tracking algorithm, as only frame-to-frame features are used. The tracking system in our work does not use any kind of maintenance algorithm. This way, the correlation between consecutive measurements is neglected, as previously assumed in [39,40] and as suggested in [2] to reduce computational complexity.

The effectiveness of the proposed formulation is demonstrated with the help of both an aerial simulation experiment and a real data-set gathered in a dynamic indoor case by the Rawseeds project.[41]

2. Classical inertial navigation

In this work, tracked feature points are fused via a tightly-coupled architecture with a standard inertial measurement unit based on triaxial accelerometers and gyros, in addition to a triaxial magnetometer, a sonar altimeter, and a GPS. The inertial measurement formulation is based on previously presented work,[20,42] which is briefly reviewed next.

2.1. Kinematics

The inertial frame of reference is centered at point O and denoted by a triad of unit vectors E := (e_1, e_2, e_3), pointing North, East, and Down (NED navigational system). A body-attached frame has origin in the generic material point B of the vehicle and has a triad of unit vectors B := (b_1, b_2, b_3) (see Figure 1).

The components of the acceleration in the body-attached frame are sensed by an accelerometer located at point A on the vehicle. The accelerometer yields a reading a_acc affected by noise n_acc. Gyroscopes measure the body-attached components of the angular velocity vector, yielding a reading ω_gyro affected by a noise disturbance n_gyro. The kinematic equations, describing the motion of the body-attached reference frame with respect to the inertial one, can be written as

\dot{v}_B^E = g^E - R \left( a_{acc} + \omega^B \times (\omega^B \times r_{BA}^B) + h(\omega_{gyro}, n_{gyro}) \times r_{BA}^B \right) + R\, n_{acc},   (1a)
\dot{\omega}^B = h(\omega_{gyro}, n_{gyro}),   (1b)
\dot{r}_{OB}^E = v_B^E,   (1c)
\dot{q} = T(\omega^B)\, q,   (1d)

where v_B is the velocity of point B, ω is the angular velocity, and ω̇ the angular acceleration. r_BA is the position vector from point B to point A, while r_OB is from point O to point B. In the previous expressions, g^E = (0, 0, g)^T indicates the components of the acceleration of gravity, R = R(q) are the components of the rotation tensor that brings triad E into triad B, while q are rotation parameters, which are chosen here as quaternions, so that matrix T can be written as

T(\omega^B) = \frac{1}{2} \begin{pmatrix} 0 & -\omega^{B\,T} \\ \omega^B & -\omega^B \times \end{pmatrix}.   (2)

The angular acceleration at time t_k is computed according to the following three-point stencil formula based on a parabolic interpolation

h(t_k) = \left( 3\,\omega_{gyro}(t_k) - 4\,\omega_{gyro}(t_{k-1}) + \omega_{gyro}(t_{k-2}) \right) / (2\,\Delta t),   (3)

where Δt = t_k − t_{k−1} = t_{k−1} − t_{k−2} (cf. [20]).
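To make the discrete-time use of Equations (1)–(3) concrete, the following minimal Python sketch shows one possible implementation of the quaternion kinematics and the three-point angular-acceleration stencil. The function names, the explicit-Euler integration, and the sign convention adopted for the lever-arm terms of Eq. (1a) are assumptions made for illustration only, not the authors' code.

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix, so that skew(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def quat_rates(q, w_b):
    """Quaternion derivative q_dot = T(w^B) q of Eqs. (1d)-(2), with q = (q0, qx, qy, qz)."""
    T = 0.5 * np.block([[np.zeros((1, 1)), -w_b.reshape(1, 3)],
                        [w_b.reshape(3, 1), -skew(w_b)]])
    return T @ q

def angular_accel(w_k, w_km1, w_km2, dt):
    """Three-point backward stencil of Eq. (3) for the angular acceleration at t_k."""
    return (3.0 * w_k - 4.0 * w_km1 + w_km2) / (2.0 * dt)

def propagate(x, a_acc, w_dot, R, dt, r_BA=np.zeros(3), g=np.array([0.0, 0.0, 9.81])):
    """One explicit-Euler step of the rigid-body kinematics (1a)-(1d).
    x = (v^E (3), w^B (3), r^E (3), q (4)); R maps body components into NED components.
    Lever-arm and sign conventions follow the reconstruction of Eq. (1a) above (assumed)."""
    v, w, r, q = x[:3], x[3:6], x[6:9], x[9:13]
    v_dot = g - R @ (a_acc + np.cross(w, np.cross(w, r_BA)) + np.cross(w_dot, r_BA))
    x_new = np.concatenate([v + dt * v_dot, w + dt * w_dot, r + dt * v, q + dt * quat_rates(q, w)])
    x_new[9:13] /= np.linalg.norm(x_new[9:13])  # keep the quaternion normalized
    return x_new
```

In practice a higher-order integrator could replace the Euler step; the sketch only illustrates how the continuous model maps to a discrete filter prediction.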

2.2. Sensors

A GPS is located at point G on the vehicle (see Figure 1). The inertial components of the velocity and position vectors of point G, noted respectively v_G^E and r_OG^E, can be expressed in terms of v_B^E, ω^B, r_OB^E, and q as

v_G^E = v_B^E + R \left( \omega^B \times r_{BG}^B \right),   (4a)
r_{OG}^E = r_{OB}^E + R\, r_{BG}^B,   (4b)

where r_BG is the position vector from point B to point G. The GPS yields measurements of the position and velocity of point G affected by their respective noise vectors n_{r_gps} and n_{v_gps}, i.e.

v_{gps} = v_G^E + n_{v_{gps}},   (5a)
r_{gps} = r_{OG}^E + n_{r_{gps}}.   (5b)

A sonar altimeter measures the distance h along the body-attached vector b_3, between its location at point S on the vehicle and point T on the terrain, as shown in Figure 1, resulting in the expression

h = -\, r_{OB_3}^E / R_{33} - s,   (6)

where r_{OB_3}^E = r_{OB} \cdot e_3 and R = [R_{ij}], i, j = 1, 2, 3. The sonar altimeter yields a reading h_sonar affected by noise n_sonar, i.e.

h_{sonar} = h + n_{sonar}.   (7)

Figure 1. Reference frames and location of sensors.

Finally, we consider a magnetometer that senses the magnetic field m of the Earth in the body-attached system B. The inertial components m^E are assumed to be known and constant in the area of operation of the vehicle. Hence, we get

m^B = R^T m^E.   (8)

The magnetometer yields a measurement m_magn affected by noise n_magn, i.e.

m_{magn} = m^B + n_{magn}.   (9)
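As a companion to Equations (4)–(9), the sketch below illustrates how the GPS, magnetometer, and sonar observation models might be evaluated from the state. It is a simplified illustration with assumed names, and the sonar sign convention follows the reconstruction of Eq. (6) above rather than a verified source.

```python
import numpy as np

def h_gps(v_E, r_E, w_B, R, r_BG):
    """GPS observation model, Eqs. (4a)-(4b): velocity and position of the antenna point G."""
    v_G = v_E + R @ np.cross(w_B, r_BG)
    r_G = r_E + R @ r_BG
    return v_G, r_G

def h_magnetometer(R, m_E):
    """Magnetometer observation model, Eq. (8): known Earth field rotated into body axes."""
    return R.T @ m_E

def h_sonar(r_E, R, s=0.0):
    """Sonar altimeter model in the spirit of Eq. (6) (sign convention assumed):
    NED altitude above flat terrain is -r_E[2], divided by the b3 tilt factor R[2, 2],
    minus the sensor offset s along b3."""
    return -r_E[2] / R[2, 2] - s
```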

3. Vision system

Considering a pair of stereo cameras located on the vehicle (see Figure 1), a triad of unit vectors C := (c_1, c_2, c_3) has origin at the optical center C of the left camera, where c_1 is directed along the horizontal scanlines of the image plane and c_3 is parallel to the optical axis, pointing towards the scene. The right camera has its optical axis and scanlines parallel to those of the left camera, i.e. C′ ∥ C, where we use the symbol (·)′ to indicate quantities of the right camera. The origin C′ of triad C′ is at a distance b = b c_1 from C, where b is the stereo baseline.

The SVA-INS architecture investigated here is based on tracking scene points among stereo images and across time steps. The identification and matching of feature points begins with the acquisition of the images and their rectification. In this work, strong corners are extracted and then tracked using a dense disparity map obtained by the efficient large-scale stereo (ELAS) matching method.[43]

3.1. Outlier detection

Having a set of corresponding feature points between left and right images and from the previous frame to the current one, various methods can be used to detect and remove outliers. This process is here performed by an optimization approach based on the minimization of the sum of reprojection errors, merged into the random sample consensus (RANSAC) algorithm, similarly to what is done in [44]. However, differently from [44], this procedure is used here to reliably identify outliers in dynamic environments, instead of estimating camera motion.

Consider a feature point P, whose position vectors with respect to frames C and C′ are noted d and d′, respectively. The projections of P onto the image planes are noted p^C and p′^{C′}. Assuming an ideal pinhole camera model, the homogeneous components of the position vectors are indicated by the symbol (~) and noted \tilde{d}^C = (d_x, d_y, d_z, 1)^T and \tilde{p}^C = (p_u, p_v, 1)^T, and they are related by perspective projection.

Detection of outliers via egomotion estimation starts with a 3D reconstruction that implies the projection of each feature point from the stereo image planes to the 3D world, using the triangulation method [45] and a disparity map dp obtained by ELAS. This results in the expressions

d_z^{C_{k-1}} = f\, b / dp,   (10a)
d_x^{C_{k-1}} = p_v^{C_{k-1}}\, d_z^{C_{k-1}} / f,   (10b)
d_y^{C_{k-1}} = p_u^{C_{k-1}}\, d_z^{C_{k-1}} / f,   (10c)

where the camera calibration parameters, focal length f and optical center c_u and c_v, are assumed to be known.
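A minimal sketch of the triangulation step of Eq. (10) is given below. The pairing between image coordinates and camera axes is taken from the reconstructed equations and should be treated as an assumption, as are the function and parameter names.

```python
import numpy as np

def triangulate(p_u, p_v, disparity, f, b):
    """Stereo triangulation of one tracked feature following Eq. (10),
    returning its 3D position in the left-camera frame C_{k-1}."""
    d_z = f * b / disparity   # Eq. (10a): depth from disparity
    d_x = p_v * d_z / f       # Eq. (10b)
    d_y = p_u * d_z / f       # Eq. (10c)
    return np.array([d_x, d_y, d_z])
```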

Next, each projected 3D point in the camera frame C_{k-1} is transformed to the new frame C_k according to the camera translational (Δr_ego) and rotational (Δθ_ego) motion parameters. The projected 3D point in the new frame is then re-projected to its image plane to obtain \hat{p}^{C_k}. These steps can be expressed as

\begin{pmatrix} \hat{p}_u^{C_k} \\ \hat{p}_v^{C_k} \\ 1 \end{pmatrix} = \begin{pmatrix} f & 0 & c_u \\ 0 & f & c_v \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R(\Delta\theta_{ego}) & \Delta r_{ego} \end{pmatrix} \begin{pmatrix} d_x^{C_{k-1}} \\ d_y^{C_{k-1}} \\ d_z^{C_{k-1}} \\ 1 \end{pmatrix},   (11)

where R is the rotation matrix associated with the rotation parameters Δθ_ego. An identical relationship is obtained for the right camera.

The position of a feature obtained by the tracking procedure at the same time step t_k is noted p_{vsn}^{C_k}. Thus, the re-projection error Δp for all feature points in both views can be readily calculated as

\Delta p = \sum_{i=1}^{N} \left( \left\| p_{vsn}^{C_k}(i) - \hat{p}^{C_k} \right\|^2 + \left\| p_{vsn}^{\prime C_k}(i) - \hat{p}^{\prime C_k} \right\|^2 \right).   (12)

A Gauss–Newton approach is used to iteratively minimize (12). Motion parameters are independently estimated based on a given number (typically 50 here) of triple sample points selected at random, to find the best subset of inliers with reprojection errors smaller than a given threshold. Finally, a subset of inliers is selected to be incorporated in the data fusion algorithm. These inliers are chosen by looking at corresponding points in stereo views that satisfy the equation p^{\prime T} F_{ba}\, p = 0, up to a given tight threshold, where F_{ba} is the fundamental matrix between stereo frames.
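The following sketch outlines a RANSAC-style selection of inliers based on the re-projection error of Eqs. (11)–(12). It is simplified to a single (left) view, `estimate_motion` stands in for the Gauss–Newton egomotion refinement (not reproduced here), and all names as well as the sign convention of the translation are assumptions.

```python
import numpy as np

def reproject(K, R_ego, dr_ego, d_points):
    """Re-project 3D points from frame C_{k-1} into frame C_k, in the spirit of Eq. (11).
    d_points: (N, 3) array of triangulated points; returns (N, 2) pixel coordinates."""
    d_k = d_points @ R_ego.T + dr_ego           # rigid transform into C_k (sign of dr_ego assumed)
    p_hom = d_k @ K.T                           # apply the calibration matrix
    return p_hom[:, :2] / p_hom[:, 2:3]         # perspective normalization

def ransac_inliers(K, d_points, p_tracked, estimate_motion, n_trials=50, threshold=1.0):
    """RANSAC-style outlier rejection on the re-projection error of Eq. (12).
    `estimate_motion(idx)` is a user-supplied Gauss-Newton egomotion solver (hypothetical)
    returning (R_ego, dr_ego) from the three sampled correspondences."""
    best_inliers = np.zeros(len(d_points), dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(n_trials):
        idx = rng.choice(len(d_points), size=3, replace=False)   # triple sample
        R_ego, dr_ego = estimate_motion(idx)
        err = np.linalg.norm(reproject(K, R_ego, dr_ego, d_points) - p_tracked, axis=1)
        inliers = err < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

The epipolar check p'ᵀ F_ba p = 0 mentioned above would then be applied to the surviving correspondences before passing them to the fusion algorithm.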

4. Tightly-coupled vision-aided inertial navigation

Each pair of tracked feature points across two consecutive frames provides an implicit measure of motion, this way effectively playing the role of a feature-based motion sensor. A tightly-coupled architecture is developed by incorporating in the process the selected subset of tracked feature points to yield an estimate of the vehicle states.

The estimator is based on the following state-space model:

\dot{x}(t) = f\left( x(t), u(t), \nu(t) \right),   (13a)
y(t_k) = h\left( x(t_k) \right),   (13b)
z(t_k) = y(t_k) + \mu(t_k),   (13c)

where the state vector x is defined as

x := \left( v_B^{E\,T}, \omega^{B\,T}, r_{OB}^{E\,T}, q^T \right)^T.   (14)

Function f(·,·,·) in (13a) represents in compact form the rigid body kinematics expressed by (1). The input vector u is defined by the measurements provided by the accelerometers a_acc and gyros ω_gyro, i.e. u := (a_acc^T, ω_gyro^T)^T, while ν := (n_acc^T, n_gyro^T)^T is its associated measurement noise vector. The observation models for GPS, sonar, and magnetometer are given by Equations (4), (6), and (8), respectively.

For the implementation of the vision system, a discrete model is needed that provides a relationship among feature points in three scenes. Hence, a set of feature points in two old scenes is propagated by an observation model to their respective positions in a new frame. Two alternative vision-based models are considered within the proposed tightly-coupled fusion architecture by using:

• a trifocal tensor to propagate feature points by geometric constraints among three scenes; or
• a rigid body model applied to 3D reconstructed feature points.

The two approaches are described next.
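As an illustration of the state-space structure of Eqs. (13)–(14), a possible layout of the 13-element state vector is sketched below; the class and field names are assumptions introduced for clarity.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class State:
    """State vector of Eq. (14): x = (v_B^E, w^B, r_OB^E, q)."""
    v: np.ndarray   # inertial (NED) velocity of point B, shape (3,)
    w: np.ndarray   # body-frame angular velocity, shape (3,)
    r: np.ndarray   # inertial position of B with respect to O, shape (3,)
    q: np.ndarray   # attitude quaternion bringing triad E into triad B, shape (4,)

    def pack(self):
        """Flatten into the 13-element vector handled by the filter."""
        return np.concatenate([self.v, self.w, self.r, self.q])

    @staticmethod
    def unpack(x):
        """Rebuild the structured state from a flat 13-element vector."""
        return State(x[0:3], x[3:6], x[6:9], x[9:13])
```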

4.1. Vision-based model via trifocal constraints

Given a pair of matched points in two scenes, p^{C_{k-1}} ↔ p′^{C_{k-1}}, and the camera projection matrices, it is possible to extract the position of the points in a third scene by the use of the trifocal tensor.[12,45] In order to implement this property as a vision-based observation model directly within the estimator, it is useful to derive the resulting projection matrices in terms of v_B^E, ω^B, and q, which are the elements of the estimator state vector x.

Consider a feature point P projected onto the image planes of one stereo camera at time t_{k-1} (Figure 2), where its position components with respect to the left camera local reference frame are d(t_{k-1})^{C_{k-1}}. The components can be transformed to the right frame at t_{k-1} by a translation along the stereo baseline, and to the left frame at t_k via translational (Δr_c) and rotational (R_c) transformations between the two consecutive frames, as

d(t_{k-1})^{C'_{k-1}} = d(t_{k-1})^{C_{k-1}} - b,   (15a)
d(t_k)^{C_k} = R_c\, d(t_{k-1})^{C_{k-1}} - \Delta r_c.   (15b)

Figure 2. Geometry for the derivation of the discrete vision-based model via trifocal constraints.

By expressing the velocity of the camera optical center C in terms of the velocity of point B, the translational components can be expressed in terms of the states x, i.e.

\Delta r_c = \Delta t\, C^T \left( R(t_k)^T v_B^E(t_k) + \omega^B(t_k) \times c^B \right),   (16)

where matrix C indicates the constant-in-time rotation tensor that brings triad B into C, while c := r_{BC} is the position vector of point C with respect to point B.

The rotational motion between two consecutive frames can be derived by composing the corresponding rotation tensors, as

R_c = C^T R(t_k)^T R(t_{k-1})\, C.   (17)

By introducing the angular velocity according to I - R(t_k)^T R(t_{k-1}) ≈ \Delta t\, \omega(t_k)^B \times, the rotational motion can be expressed in terms of the states x as

R_c = I - \Delta t\, C^T \omega(t_k)^B \times C.   (18)

Projections that map feature coordinates in local reference frames to the corresponding homogeneous image coordinates in each of the three scenes can be obtained via (15) as

\tilde{p}(t_{k-1})^{C_{k-1}} = K_L \left[ I \;\; 0 \right] \tilde{d}(t_{k-1})^{C_{k-1}},   (19a)
\tilde{p}(t_{k-1})^{C'_{k-1}} = K_R \left[ I \;\; -b \right] \tilde{d}(t_{k-1})^{C_{k-1}},   (19b)
\tilde{p}(t_k)^{C_k} = K_L \left[ R_c \;\; -\Delta r_c \right] \tilde{d}(t_{k-1})^{C_{k-1}}.   (19c)

Here, P_a = K_L [ I  0 ], P_b = K_R [ I  -b ], and P_c = K_L [ R_c  -Δr_c ] are projection matrices, while K_L and K_R are the left and the right camera calibration matrices, respectively.

The trifocal tensor, T, can be derived from the projection matrices,[45] using standard matrix-vector notation (see Note 1), yielding

T_i^{qr} = (-1)^{i+1} \det \begin{bmatrix} {\sim}P_a^i \\ P_b^q \\ P_c^r \end{bmatrix}.   (20)

Next, point p_a in the first frame is transferred to the third one, \hat{p}_c, by a homography mapping via a line l_b passing through the corresponding point p_b in the second frame, i.e.

\hat{p}_c^r = p_a^i\, l_b^q\, T_i^{qr}.   (21)

A convenient choice for l_b was suggested in [45], as the line perpendicular to F_{ba}\, p(t_{k-1}). Equations (16)–(21) provide a non-linear discrete observation model, which can be used to predict the output of the feature-based motion sensors p(t_k)^{C_k}, as described later on in Section 4.3.
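The computation of the trifocal tensor of Eq. (20) and the point transfer of Eq. (21) can be written compactly as below. This is a sketch of the standard construction from Hartley and Zisserman [45] under the notation reconstructed above; the choice of the line l_b and the homogeneous normalization of the transferred point are omitted, and all names are assumptions.

```python
import numpy as np

def trifocal_tensor(Pa, Pb, Pc):
    """Trifocal tensor from the three 3x4 projection matrices, following Eq. (20):
    T[i, q, r] = (-1)^(i+1) det([Pa without row i; row q of Pb; row r of Pc]),
    with the 1-based row index i of the equation mapped to 0-based indexing here."""
    T = np.zeros((3, 3, 3))
    for i in range(3):
        Pa_wo_i = np.delete(Pa, i, axis=0)                           # 2x4 block: Pa with row i removed
        for q in range(3):
            for r in range(3):
                M = np.vstack([Pa_wo_i, Pb[q:q + 1], Pc[r:r + 1]])   # 4x4 matrix
                T[i, q, r] = (-1.0) ** i * np.linalg.det(M)          # (-1)^(i+1) for the 1-based index
    return T

def transfer_point(T, p_a, l_b):
    """Homography point transfer of Eq. (21): p_c^r = p_a^i l_b^q T_i^{qr} (summation implied).
    p_a is the homogeneous point in the first view, l_b a line through its match in the second."""
    return np.einsum('i,q,iqr->r', p_a, l_b, T)
```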

4.2. Vision-based model via 3D reconstruction

An alternative to the trifocal tensor approach is discussed here. In this model, the reconstructed 3D points, obtained by (10), are propagated across time by directly using the equations of rigid body motion. A preliminary version of this model was previously presented in [20] and evaluated via a different framework and simulation experiments. The input/output structure of the model is modified in this work from 3D–3D to 3D–two-dimensional (2D), which not only reduces the computational effort but also enhances the quality of the estimates.

Considering two consecutive time instants t_{k-1} and t_k, the following vector closure relationship (see Figure 3) can be derived for both the left and right cameras

r(t_k) + c(t_k) + d(t_k) = r(t_{k-1}) + c(t_{k-1}) + d(t_{k-1}),   (22)

where r := r_OB is the position vector of the reference point B on the vehicle with respect to the origin O of the inertial frame.

Figure 3. Geometry for the derivation of the discrete vision-based model via 3D reconstruction.

By expressing each of the terms of Equation (22) in convenient reference frames, we obtain

d(t_k)^{C_k} = -C^T \left( R(t_k)^T \left( r(t_k)^E - r(t_{k-1})^E \right) + \left( I - R(t_k)^T R(t_{k-1}) \right) \left( c^B + C\, d(t_{k-1})^{C_{k-1}} \right) \right) + d(t_{k-1})^{C_{k-1}}.   (23)

This expression depends on the absolute position vector r, a quantity that however cannot be observed by a vision system, which only senses relative distances. Hence, since in the absence of GPS measurements the observed absolute position will drift away from the true one, it is advisable to rewrite the above equation in terms of velocities. By setting

I - R(t_k)^T R(t_{k-1}) ≈ \Delta t\, \omega(t_k)^B \times,   (24a)
r(t_k)^E - r(t_{k-1})^E ≈ \Delta t\, v^E(t_k),   (24b)

we have

d(t_k)^{C_k} = -\Delta t\, C^T \left( R(t_k)^T v^E(t_k) + \omega^B(t_k) \times \left( c^B + C\, d(t_{k-1})^{C_{k-1}} \right) \right) + d(t_{k-1})^{C_{k-1}}.   (25)

The left-hand side of Equation (25), d(t_k)^{C_k}, may be computed by stereo reconstruction from tracked feature points at time t_k using the triangulation approach, as proposed in [20]. In other words, the 2D feature points are first mapped to the 3D world and then used as visual observations within the filter. This way, however, unknown errors are propagated to the observations.

To avoid this problem, the model is modified here by re-projecting the output of Equation (25), i.e. the predicted 3D feature points, to the 2D image plane of the current scene, so as to compute \hat{p}(t_k)^{C_k} as the final output of the vision-based model, giving

\begin{pmatrix} \hat{p}_u(t_k)^{C_k} \\ \hat{p}_v(t_k)^{C_k} \end{pmatrix} = \begin{pmatrix} f / d_z(t_k)^{C_k} & 0 & c_u \\ 0 & f / d_z(t_k)^{C_k} & c_v \end{pmatrix} \begin{pmatrix} d_x(t_k)^{C_k} \\ d_y(t_k)^{C_k} \\ 1 \end{pmatrix}.   (26)

In this case, the left-hand side of Equation (26), p(t_k)^{C_k}, is directly available from the tracking system at time t_k.

Equations (25) and (26) provide an alternative non-linear discrete model to predict the output of the feature-based motion sensors p(t_k)^{C_k}, and its use for multi-sensor fusion is described next.
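A compact sketch of the resulting observation model is given below: Eq. (25) propagates a reconstructed point into the current camera frame from the filter states, and Eq. (26) re-projects it onto the image plane. Names and sign conventions follow the reconstruction above and are assumptions, not the authors' implementation.

```python
import numpy as np

def predict_feature(d_prev, v_E, w_B, R_k, C, c_B, dt, f, c_u, c_v):
    """Predicted image position of a tracked feature via Eqs. (25)-(26).
    d_prev: 3D point in the previous left-camera frame C_{k-1};
    R_k maps body axes to inertial axes at t_k, C maps camera axes to body axes."""
    # Eq. (25): rigid-body propagation of the reconstructed point into frame C_k
    d_k = d_prev - dt * C.T @ (R_k.T @ v_E + np.cross(w_B, c_B + C @ d_prev))
    # Eq. (26): perspective re-projection onto the current left image plane
    p_u = f * d_k[0] / d_k[2] + c_u
    p_v = f * d_k[1] / d_k[2] + c_v
    return np.array([p_u, p_v])
```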

4.3. Vision-augmented fusion

The two alternative vision-based models presented above are separately incorporated in the sensor fusion algorithm. The first implementation, termed tightly-coupled stereo vision-aided inertial navigation via trifocal constraints (TC-SVA-INS-TC), is based on Equations (16)–(21), which can be appended to Equations (4), (6), and (8), defining a vision-augmented observation model, expressed in compact form by function h(·) in (13b). The other, called tightly-coupled stereo vision-aided inertial navigation via 3D reconstruction (TC-SVA-INS-3D), is based on Equations (10) and (25)–(26). As for the previous method, these equations can be appended to Equations (4), (6), and (8) to form the vision-augmented observation model.

For each feature point in the subset of best inliers, we include in the augmented vector a new output for both the left and right cameras:

y = \left( v_G^{E\,T}, r_{OG}^{E\,T}, h, m^{B\,T}, \ldots, p(t_k)^{C_k\,T}, p'(t_k)^{C_k\,T}, \ldots \right)^T.   (27)

Considering the tracking results at time t_k, these yield visual measurements p_vsn affected by noise n_vsn:

p_{vsn} = p(t_k)^{C_k} + n_{vsn}.   (28)

Accordingly, (28) is appended to the measurement equations, where the vision-augmented measurement and noise vectors are defined as

z := \left( v_{gps}^T, r_{gps}^T, h_{sonar}, m_{magn}^T, \ldots, p_{vsn}^T, p_{vsn}^{\prime T}, \ldots \right)^T,   (29a)
\mu := \left( n_{v_{gps}}^T, n_{r_{gps}}^T, n_{sonar}, n_{magn}^T, \ldots, n_{vsn}^T, n_{vsn}^{\prime T}, \ldots \right)^T.   (29b)

The SPKF [46,47] is here preferred to the extended Kalman filter, and it is used for handling non-linearities in the observation model.
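The sketch below shows how the vision-augmented output vector of Eq. (27) might be assembled for the SPKF update. Here `x_outputs` collects the GPS, sonar, and magnetometer predictions and `predicted_pixels` the per-feature predictions of either vision-based model; both names are assumptions. The measurement vector z of Eq. (29a) would be stacked in the same order from the actual GPS, sonar, magnetometer, and tracked-feature readings.

```python
import numpy as np

def h_augmented(x_outputs, predicted_pixels):
    """Vision-augmented output vector of Eq. (27):
    y = (v_G, r_OG, h, m_B, ..., p_i, p'_i, ...) over the current subset of best inliers."""
    v_G, r_G, h_alt, m_B = x_outputs
    parts = [v_G, r_G, np.atleast_1d(h_alt), m_B]
    for p_left, p_right in predicted_pixels:
        parts.extend([p_left, p_right])       # one left/right pixel pair per inlier feature
    return np.concatenate(parts)
```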

5. Loosely-coupled vision-aided inertial navigation

In contrast to the TC-SVA-INS method, one may estimate the motion parameters directly using a separate SfM block and subsequently fuse its output with the INS, obtaining a loosely-coupled stereo vision-aided inertial navigation system (LC-SVA-INS). Although less computational effort is required by the loosely-coupled approach, the preprocessing of visual measurements results in the approximation of statistical distributions,[35] in turn leading to a loss of information and accuracy, as will be shown later on in the results section. Nonetheless, this method is considered here for comparison purposes, and it is therefore briefly reviewed next.

The camera motion parameters are computed at the end of the outlier detection procedure by refining the egomotion estimation based on the best subset of inliers (the same used in TC-SVA-INS). These estimated parameters are then fed to the Kalman filter as visual information.

The estimator is based on the state-space model expressed by Equations (13). The relative camera motion between two consecutive frames, noted, respectively, Δr_c for the displacement and Δθ_c for the rotation, can be expressed in terms of the states x as

\Delta r_c = \Delta t\, C^T \left( R^T v_B^E + \omega^B \times c^B \right),   (30a)
\Delta\theta_c = \Delta t\, C^T \omega^B.   (30b)

These equations represent a vision-based observation model that can be appended to the GPS observation model (5), defining h(·) in Equation (13b). Accordingly, the vector of outputs y is defined as

y = \left( v_G^{E\,T}, r_{OG}^{E\,T}, \ldots, \Delta r_c^T, \Delta\theta_c^T \right)^T.   (31)

The egomotion estimation yields measurements of the relative motion of the camera affected by noise, i.e.

\Delta r_{ego} = \Delta r_c + n_{r_{ego}},   (32a)
\Delta\theta_{ego} = \Delta\theta_c + n_{\theta_{ego}}.   (32b)

The definition of model (13) is complemented by the vector of measurements z and the associated noise vector μ,

z := \left( v_{gps}^T, r_{gps}^T, \ldots, \Delta r_{ego}^T, \Delta\theta_{ego}^T \right)^T,   (33a)
\mu := \left( n_{v_{gps}}^T, n_{r_{gps}}^T, \ldots, n_{r_{ego}}^T, n_{\theta_{ego}}^T \right)^T.   (33b)
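For comparison with the tightly-coupled models, the loosely-coupled observation model of Eqs. (30a)–(30b) reduces to the short sketch below (names assumed), which predicts the relative camera displacement and rotation from the filter states so that they can be compared with the egomotion measurements of Eq. (32).

```python
import numpy as np

def h_relative_motion(v_E, w_B, R, C, c_B, dt):
    """Loosely-coupled vision observation model, Eqs. (30a)-(30b): predicted relative
    camera displacement and rotation between consecutive frames, from the filter states."""
    dr_c = dt * C.T @ (R.T @ v_E + np.cross(w_B, c_B))   # Eq. (30a)
    dtheta_c = dt * C.T @ w_B                            # Eq. (30b)
    return dr_c, dtheta_c
```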

6. Experiments and results

6.1. Simulation test

A Matlab/Simulink simulator was developed that includes a flight mechanics model of a small RUAV, models of the inertial navigation sensors, magnetometer, GPS, and their noise models. The simulator is used in conjunction with the OGRE graphics engine, for rendering a virtual environment scene and simulating the image acquisition process. All sensor measurements are simulated (see Table 1) as the helicopter flies at an altitude of 2 m following a rectangular path at a constant speed of 2 m/s within a small village, composed of houses and several other objects with realistic textures (see Figure 4). Navigation measurements are provided at a rate of 100 Hz, and stereo images at a rate of 2 Hz. The GPS is available at a rate of 1 Hz. To show the capability of the system in ensuring an accurate estimation of the vehicle states even without GPS, four temporary GPS signal losses are simulated to occur at t = 15, 40, 65, 90 s, with the signal regained after 10 s at times t = 25, 50, 75, 100 s.

Table 1. Sensors and vibration noise levels.

Sensors          Noise level
Gyro             50 (deg/s)^2
Accelerometer    0.5 (m/s^2)^2
Magnetometer     1 x 10^-4 (Gauss)^2
GPS              2 (m)^2

Figure 4. Simulated small village environment and flight trajectory.

The estimated position and velocity obtained with the TC-SVA-INS-3D method are shown in Figures 5(a) and (c), respectively, using a thick solid line, while the estimated position and velocity obtained with the TC-SVA-INS-TC method are plotted using a thin solid line. Similar plots are also provided for the errors of the position and velocity estimates in Figures 5(b) and (d), respectively. The two vertical dashed and solid lines indicate the time instants when the GPS signal is lost and reacquired, respectively. The results show a successful performance of both approaches in case of temporary loss of the GPS signal. The error of the position estimates, as shown in Figure 5(b), smoothly increases while the GPS signal is masked, but the outage does not appear to cause any appreciable degradation in the quality of the velocity and position estimates.

Figure 5. Results for the simulation experiment using two approaches: TC-SVA-INS-TC and TC-SVA-INS-3D. The vertical dashed lines and vertical solid lines indicate the time instants when the GPS signal is lost and reacquired, respectively.

6.2. Real case study

The formulation given above was derived for a general 3D motion, such as that of an aerial vehicle. However, lacking a 3D data-set, a ground robot data-set was used here to validate the concept and illustrate the basic performance of the proposed method. The example is taken from the Rawseeds project,[41] which provides several multi-sensor data-sets with their associated ground truths. Specifically, a dynamic indoor data-set was used to assess the performance of a pure vision-aided inertial system in GPS-denied conditions and with unreliable magnetometer readings. The location explored by the vehicle includes two nearby buildings, shown together with the reference path in Figure 6. The data-set is illustrative of practical realistic conditions, since the number of tracked points decreases in a very marked way when the vehicle reaches the end of corridors, because of textureless walls, and tracking is occasionally disrupted when the vehicle turns rapidly. In all subsequent examples, stereo images are processed and fused at 5 Hz, while inertial measurements are incorporated at 120 Hz.

Figure 6. Plan view of the building at the Università di Milano-Bicocca explored by the ground vehicle, together with the matched reference path.

The estimated trajectory obtained with the visual odometry method is depicted in Figure 7. The method exhibits a significant degradation of the estimation quality caused by the loss of tracking continuity.

Figure 7. Results for the visual odometry approach.[12] Thin line: ground truth; thick line: trajectory estimate; stars: points where feature tracking is disrupted.

Several views of the dynamic environment captured by the left camera are depicted in Figure 8, which also reports tracked feature points across time and detected outliers. The figure shows that the outlier detection method is capable of successfully identifying tracked feature points on moving objects.

Figure 8. Performance of the outlier rejection algorithm in a dynamic environment. Tracked feature points are colored based on their distance with respect to the camera, while detected outliers are shown in blue.

The estimated trajectory obtained with the LC-SVA-INS method is shown in Figure 9(a) using a thick solid line, while the ground truth is plotted using a thin solid line. Since the reconstruction was performed without absolute position information, trajectories are depicted after realignment with the ground truth. Figure 9(b) shows a time history plot of the velocity estimates in the x direction (thick solid line), together with the error of the velocity estimates with respect to the ground truth (thin solid line). Despite locally weak tracking at difficult points in the path, and particularly at the corners of corridors, estimates are significantly better than in the visual odometry case, because of the incorporation of inertial measurements. However, the position estimation errors are considerable in parts of the path where the quality of the egomotion estimation is low.

Figure 9. Results for the loosely-coupled stereo vision-aided INS.

Figures 10(a) and 11(a) show the trajectories obtained with the proposed tightly-coupled methods, respectively TC-SVA-INS-TC and TC-SVA-INS-3D. Similarly, Figures 10(b) and 11(d) show time history plots of the x velocity component. In both cases, path and velocity estimates are reasonably reliable and significantly better than in the visual odometry and LC-SVA-INS cases, because of the use of inertial measurements and thanks to the tightly-coupled fusion process. The position estimate errors keep growing throughout the navigation, which is not an unexpected outcome because of the lack of absolute positions in the sensor fusion process. Nevertheless, the quality of the state estimates, as well as of the velocities, is significantly improved with respect to the other methods considered in this work. Furthermore, it appears that TC-SVA-INS-3D is slightly superior to TC-SVA-INS-TC.

Figure 10. Results for the tightly-coupled stereo vision-aided inertial navigation via trifocal constraints.

Figure 11. Results for the tightly-coupled stereo vision-aided inertial navigation via 3D reconstruction.

7. Conclusions

A tightly-coupled vision-aided INS was proposed in this work, which synergistically incorporates vision with other sensors. The approach reduces the loss of information by processing all effective visual data directly within the fusion algorithm. Two alternative feature-based observation models were described and included in the proposed fusion architecture; one uses the geometric constraints encapsulated by the trifocal tensor, while the other uses rigid body motion applied to 3D reconstructed feature points. The formulation was derived for an arbitrary 3D motion, making the algorithm platform-independent.

The proposed approach was demonstrated in the case of temporary loss of the GPS signal using a simulation of an autonomous helicopter flying in an urban environment. Furthermore, a real data-set from the Rawseeds project was used to assess the performance of the approach in an indoor 2D case, in GPS-denied conditions and without reliable magnetometer measurements. Results indicate that the proposed TC-SVA-INS methods significantly improve the quality of state estimation with respect to a classical visual odometry approach and also with respect to a loosely-coupled implementation. Of the two alternative feature-based observation models, the one using 3D reconstruction and rigid body motion appears to offer the best results in terms of accuracy.

Additional work is being conducted to further optimize the performance of the proposed method by the implementation of exact models via stochastic cloning. These efforts will also consider the mitigation of the effects on the estimator performance caused by delays due to the online processing of visual information.

If the computational burden is not of particular concern, a combination of the proposed TC-SVA-INS approach with SLAM can be considered, in order to enhance the precision of the SLAM method. This combination may be developed within a decentralized or a centralized architecture. In a decentralized form, the TC-SVA-INS method can be used in parallel with a low-frequency SLAM approach, where the information is distributed between two loops. A decentralized implementation is more suitable for real-time applications compared to a centralized one. In a centralized combination, high-level features can be included in the state vector in a similar way to the SLAM approach, while low-level features are incorporated in the same estimator via the proposed TC-SVA-INS formulation. These ideas are currently being investigated and will be reported in future publications.

Note

1. The ij entry of matrix A is denoted by a_i^j, index i being the row number and index j the column number. a^i indicates the i-th row of matrix A, and ∼a^i represents matrix A without row i.

Notes on contributors

E. Asadi received the BSc degree from Kashan University, Kashan, Iran, in 2002, and the MSc degree from Yazd University, Yazd, Iran, in 2005, both in Mechanical Engineering. Currently, he is a PhD student in Aerospace Engineering at Politecnico di Milano, Milan, Italy, where he works with the POLI-Rotorcraft research group. His current research interests include sensor fusion, Kalman filtering, vision-aided inertial navigation, and simultaneous localization and mapping (SLAM).

Carlo L. Bottasso received a PhD degree in Aerospace Engineering from the Politecnico di Milano, Italy, in 1993. Currently, he is the Chair of Wind Energy at TUM, Technische Universität München, Germany, and Professor of Flight Mechanics with the Department of Aerospace Science and Technology, Politecnico di Milano, Italy. He has held visiting positions at various institutions, including Rensselaer Polytechnic Institute, Georgia Institute of Technology, Lawrence Livermore National Laboratory, NASA Langley, and NREL, among others. His research interests and areas of expertise include the flight mechanics and aeroelasticity of rotorcraft vehicles, aeroelasticity and active control of wind turbines, and flexible multibody dynamics.

References

[1] Campolo D, Schenato L, Pi L, Deng X, Guglielmelli E. Attitude estimation of a biologically inspired robotic housefly via multimodal sensor fusion. Adv. Robotics. 2009;23:955–977.
[2] Mourikis AI, Roumeliotis SI, Burdick JW. SC-KF mobile robot localization: a stochastic cloning-Kalman filter for processing relative-state measurements. IEEE Trans. Robotics. 2007;23:717–730.
[3] Konolige K, Agrawal M, Bolles R, Cowan C, Fischler M, Gerkey B. Experimental robotics: outdoor mapping and navigation using stereo vision. Berlin: Springer; 2008. p. 179–190.
[4] Jung BS, Choi SB, Ban SW, Lee M. A biologically inspired active stereo vision system using a bottom-up saliency map model. In: Artificial Intelligence and Soft Computing – ICAISC; 2004. p. 730–735.
[5] Bonin-Font F, Ortiz A, Oliver G. Visual navigation for mobile robots: a survey. J. Intell. Rob. Syst. 2008;53:263–296.
[6] Jia S, Sheng J, Shang E, Takase K. Robot localization in indoor environments using radio frequency identification technology and stereo vision. Adv. Robotics. 2008;22:279–297.
[7] Irie K, Yoshida T, Tomono M. Outdoor localization using stereo vision under various illumination conditions. Adv. Robotics. 2012;26:327–348.
[8] Dalgleish FR, Tetlow JW, Allwood RL. Vision-based navigation of unmanned underwater vehicles: a survey. Part 2: Vision-based station-keeping and positioning. IMarEST Proc., Part B: Marine Design and Operat. 2005;8:13–19.
[9] Liu YC, Dai QH. Vision aided unmanned aerial vehicle autonomy: an overview. Proceedings of the 3rd International Congress on Image and Signal Processing; Yantai, China; 2010. p. 417–421.
[10] Moore RJD, Thurrowgood S, Bland D, Soccol D, Srinivasan MV. A bio-inspired stereo vision system for guidance of autonomous aircraft. Advances in Theory and Applications of Stereo Vision, ISBN: 978-953-307-516-7; 2011.
[11] Nister D, Naroditsky O, Bergen J. Visual odometry for ground vehicle applications. J. Field Robotics. 2006;23:3–20.
[12] Kitt B, Geiger A, Lategahn H. Visual odometry based on stereo image sequences with RANSAC-based outlier rejection scheme. Proceedings of the IEEE Intelligent Vehicles Symposium; San Diego (CA), USA; 2010. p. 486–492.
[13] Iida F. Biologically inspired visual odometer for navigation of a flying robot. Robot. Auton. Syst. 2003;44:201–208.
[14] Yu YK, Wong KH, Chang MY, Or SH. Recursive camera-motion estimation with the trifocal tensor. IEEE Trans. Syst. Man Cybernet. B. 2006;36:1081–1090.
[15] Comport AI, Malis E, Rives P. Real-time quadrifocal visual odometry. Int. J. Robot. Res. 2010;29:245–266.
[16] Roumeliotis SI, Johnson AE, Montgomery JF. Augmenting inertial navigation with image-based motion estimation. Proceedings of the IEEE International Conference on Robotics and Automation; Washington, DC, USA; 2002. p. 4326–4333.
[17] Qian G, Chellappa R, Zheng Q. Robust structure from motion estimation using inertial data. Opt. Soc. Am. 2001;18:2982–2997.
[18] Veth MJ, Raquet JF, Pachter M. Stochastic constraints for efficient image correspondence search. IEEE Trans. Aero. Electron. Syst. 2006;42:973–982.
[19] Mourikis AI, Roumeliotis SI. A multi-state constraint Kalman filter for vision-aided inertial navigation. Proceedings of the IEEE International Conference on Robotics and Automation; Roma, Italy; 2007. p. 3565–3572.
[20] Bottasso CL, Leonello D. Vision-aided inertial navigation by sensor fusion for an autonomous rotorcraft vehicle. Proceedings of the AHS International Specialists Meeting on Unmanned Rotorcraft; Scottsdale (AZ), USA; 2009. p. 324–334.
[21] Corke P, Lobo J, Dias J. An introduction to inertial and visual sensing. Int. J. Robot. Res. 2007;26:519–535.
[22] Weiss S, Siegwart R. Real-time metric state estimation for modular vision-inertial systems. Proceedings of the IEEE International Conference on Robotics and Automation; Shanghai, China; 2011. p. 4531–4537.
[23] Tardif JP, George M, Laverne M, Kelly A, Stentz A. A new approach to vision-aided inertial navigation. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems; 2010. p. 4161–4168.
[24] Goulding JR. Biologically-inspired image-based sensor fusion approach to compensate gyro sensor drift in mobile robot systems that balance. Proceedings of the Multisensor Fusion and Integration for Intelligent Systems; Arizona, USA; 2010. p. 102–108.
[25] Strelow D, Singh S. Motion estimation from image and inertial measurements. Int. J. Robot. Res. 2004;23:1157–1195.
[26] Veth M, Anderson R, Webber F, Nielsen M. Tightly-coupled INS, GPS, and imaging sensors for precision geolocation. Proceedings of the Institute of Navigation National Technical Meeting; San Diego, California; 2008.
[27] Ong LL, Ridley M, Kim JH, Nettleton E, Sukkarieh S. Six DoF decentralised SLAM. Proceedings of the Australasian Conference on Robotics and Automation; 2003. p. 1016.
[28] Chen J, Pinz A. Structure and motion by fusion of inertial and vision-based tracking. Proceedings of the 28th OAGM/AAPR Conference, Digital Imaging in Media and Education; 2004. p. 55–62.
[29] Durrant-Whyte H, Bailey T. Simultaneous localisation and mapping (SLAM): Part I the essential algorithms. IEEE Robotics Autom. Mag. 2006;13:99–110.
[30] Newman P, Leonard J, Tardos JD, Neira J. Explore and return: experimental validation of real-time concurrent mapping and localization. Proceedings of the IEEE International Conference on Robotics and Automation; Washington, DC; 2002. p. 1802–1809.
[31] Julier SJ, Uhlmann JL. Simultaneous localisation and map building using split covariance intersection. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems; Maui, HI; 2001. p. 1257–1262.
[32] Veth M, Raquet J. Two-dimensional stochastic projections for tight integration of optical and inertial sensors for navigation. Proceedings of the Institute of Navigation National Technical Meeting; 2006. p. 587–596.
[33] Ebcin S, Veth M. Tightly-coupled image-aided inertial navigation using the unscented Kalman filter. Proceedings of the 20th International Technical Meeting of the Satellite Division of The Institute of Navigation; Fort Worth, TX; 2007. p. 1851–1860.
[34] Roumeliotis SI, Burdick JW. Stochastic cloning: a generalized framework for processing relative state measurements. Proceedings of the IEEE International Conference on Robotics and Automation; 2002. p. 1788–1795.
[35] Dijkstra F, Luinge H, Schon TB. Tightly coupled UWB/IMU pose estimation. Proceedings of the IEEE International Conference on Ultra-Wideband; 2009. p. 688–692.
[36] Asadi E, Bottasso CL. Tightly-coupled vision-aided inertial navigation via trifocal constraints. Proceedings of the IEEE International Conference on Robotics and Biomimetics; Guangzhou, China; 2012. p. 85–90.
[37] Asadi E, Bozorg M. A decentralized architecture for simultaneous localization and mapping. IEEE/ASME Trans. Mech. 2009;14:64–71.
[38] Weiss S, Achtelik M, Lynen S, Chli M, Siegwart R. Real-time onboard visual-inertial state estimation and self-calibration of MAVs in unknown environments. Proceedings of the IEEE International Conference on Robotics and Automation; St. Paul, Minnesota, USA; 2012. p. 957–964.
[39] Konolige K. Large-scale map-making. Proceedings of the AAAI National Conference on Artificial Intelligence; San Jose, CA; 2004. p. 457–463.
[40] Hoffman BD, Baumgartner ET, Huntsberger TL, Schenker PS. Improved state estimation in challenging terrain. Auton. Robot. 1999;6:113–130.
[41] Bonarini A, Burgard W, Fontana G, Matteucci M, Sorrenti DG, Tardos JD. RAWSEEDS: robotics advancement through web-publishing of sensorial and elaborated extensive data sets. Proceedings of the IROS'06 Workshop on Benchmarks in Robotics Research; 2006.
[42] Willner D, Chang CB, Dunn KP. Kalman filter algorithms for a multi-sensor system. Proceedings of the IEEE Conference on Decision and Control; 1976. p. 570–574.
[43] Geiger A, Roser M, Urtasun R. Efficient large-scale stereo matching. Lecture Notes in Computer Science. Springer; 2011. p. 25–38.
[44] Geiger A, Ziegler J, Stiller C. StereoScan: dense 3D reconstruction in real-time. Proceedings of the IEEE Intelligent Vehicles Symposium; Baden-Baden, Germany; 2011. p. 963–968.
[45] Hartley R, Zisserman A. Multiple view geometry in computer vision. Cambridge University Press; 2004.
[46] Van Der Merwe R, Wan E, Julier SJ. Sigma-Point Kalman filters for nonlinear estimation and sensor fusion: applications to integrated navigation. Proceedings of the AIAA GNC Conference and Exhibition; Providence, RI; 2004. p. 1735–1764.
[47] Xiong K, Wei CL, Liu LD. Robust unscented Kalman filtering for nonlinear uncertain systems. Asian J. Control. 2010;12:426–433.