
Visual odometry within the STEPS project

Aldo Cumani

[email protected]

Istituto Nazionale di Ricerca Metrologica

Panoramica INRiM

15 May 2013

A. Cumani (INRiM) Visual odometry Panoramica INRiM 2013 1 / 33

Outline

1 Introduction

STEPS

Odometry

2 INRiM algorithm

Generic algorithm

Features (visual landmarks)

Motion estimation

3 Results

4 Thank you for your attention


Introduction: STEPS

STEPS regional project

Sistemi e Tecnologie per l'EsPlorazione Spaziale (Systems and Technologies for Space Exploration)

R&D on space exploration technologies, aimed at promoting internationally the technological excellence present in the Piedmont region.

Participants

Thales Alenia Space (lead partner), PoliTo, Università di Torino, Università del Piemonte Orientale, ALTEC, INRiM, 24 Piedmontese SMEs

Results (STEPS 1, 2009-2012, 20M)

Enabling technologies and demonstrators (virtual and physical), aimed in particular at developing a system for soft landing (lander) and surface mobility (rover), applicable to missions to the Moon and Mars.



INRiM's contribution to STEPS

INRiM took part in Work Package 1B of STEPS, for an overall contribution of about 93 k€, with the following tasks:

Study, development, implementation and testing of a Visual Odometry method based on the processing of images taken by a stereo rig onboard the rover.

Integration of the Visual Odometry algorithm into the STEPS demonstrator software.

Help to other partners in WP1B with the development of stereovision-based algorithms for 3D reconstruction (DEM building).


Introduction: Odometry

Visual Odometry

Odometry

From the Greek ὁδός (road) and μέτρον (measure): computing the length of a vehicle's path from the number of wheel revolutions.

Visual Odometry

Quantitative reconstruction, at least in 2D but preferably in 3D, of a vehicle's path from image sequences taken on board the vehicle itself.

[Figure: Leonardo's odometer (Codex Atlanticus, 1478-1518)]



Visual odometry belongs to the field of so-called autonomous navigation, i.e. the set of techniques that allow a mobile robotic platform to move, without intervention by human operators, in an unstructured environment (for example, automatic exploration of a planet's surface).

"A fully autonomous robot has the ability to

Gain information about the environment (Rule #1)

Work for an extended period without human intervention (Rule #2)

Move either all or part of itself throughout its operating environment without human assistance (Rule #3)

Avoid situations that are harmful to people, property, or itself unless those are part of its design specifications (Rule #4)"

(Wikipedia, Autonomous robot)



Why VO?

Wheel odometry is not accurate (slippage...)

And in any case it cannot provide a 3D estimate of the trajectory, unlike vision-based methods

How is it done?

To estimate egomotion with vision methods, the scene must contain (nearby) reference points that are visible both before and after the displacement

As a consequence, all visual odometry algorithms are incremental: the robot trajectory results from the sum of many elementary displacements, each estimated in some way from the images taken before and after the displacement
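The incremental nature of visual odometry can be illustrated with a minimal sketch. It assumes a planar (2D) rover pose (x, y, heading) and displacements expressed in the rover frame; the function names `compose` and `integrate` are hypothetical and are not taken from the STEPS software.

```python
import math

def compose(pose, step):
    """Compose a global pose (x, y, heading) with a step expressed in the rover frame."""
    x, y, th = pose
    dx, dy, dth = step
    # Rotate the local displacement into the world frame, then add it.
    xw = x + math.cos(th) * dx - math.sin(th) * dy
    yw = y + math.sin(th) * dx + math.cos(th) * dy
    return (xw, yw, th + dth)

def integrate(steps, start=(0.0, 0.0, 0.0)):
    """Accumulate elementary displacements into a trajectory (list of poses)."""
    trajectory = [start]
    for step in steps:
        trajectory.append(compose(trajectory[-1], step))
    return trajectory

# Four identical steps (1 m forward, then a 90 degree left turn) close a square.
square = integrate([(1.0, 0.0, math.pi / 2)] * 4)

# The same path with a hypothetical 0.01 rad heading bias in every step.
drifted = integrate([(1.0, 0.0, math.pi / 2 + 0.01)] * 4)
```

Since every step is composed with the previous estimate, a small error in one elementary displacement is carried along the rest of the path: `square` returns to the origin, while `drifted` does not.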



How many cameras?

one (monocular vision): feasible, but monocular vision does not provide the scale factor ⇒ it must be obtained by other means (e.g. by observing landmarks of known size) or from other sensors (e.g. wheel odometry, blah!)

two or more (stereo or multi-camera): OK, two (calibrated) cameras make it possible to estimate both the 3D structure of the environment and the motion of the rover within it, including the scale factor
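The missing scale factor of monocular vision can be checked numerically. The sketch below assumes an ideal pinhole camera with unit focal length and a pure translation between the two views; all names and numbers are illustrative.

```python
def project(point, f=1.0):
    """Pinhole projection of a 3D point (x, y, z) given in the camera frame."""
    x, y, z = point
    return (f * x / z, f * y / z)

def translate(point, t):
    """Express a point in the frame of a camera displaced by t (no rotation)."""
    return tuple(p - ti for p, ti in zip(point, t))

def views(point, motion, scale):
    """Images of a scaled scene point before and after a scaled camera translation."""
    p = tuple(scale * c for c in point)
    t = tuple(scale * c for c in motion)
    return project(p), project(translate(p, t))

point = (0.5, -0.2, 4.0)
motion = (0.1, 0.0, 0.3)
small = views(point, motion, 1.0)    # original scene and motion
large = views(point, motion, 10.0)   # everything ten times larger
```

`small` and `large` contain the same image coordinates (up to rounding): a single camera cannot tell a small scene with a short displacement from a large scene with a long one. With a stereo rig, the known baseline fixes the scale.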


INRiM algorithm: Generic algorithm

(Almost) generic Stereo Visual Odometry Algorithm

Feature extraction and tracking: At suitably spaced keyframes, visual landmarks are extracted and matched (left-right and to the previous keyframe).

Motion estimation: The relative motion of the rover between the current keyframe and the previous one is estimated from the matched features.

Possible loop closure correction: Features and pose estimates are periodically saved. When the rover believes it is near a saved position, the observed features are compared to the stored ones, and a pose correction is computed if possible.


INRiM algorithm: Features (visual landmarks)

Features (visual landmarks)

What kind of features?

Point (2 linear image coordinates)

Line (1 linear, 1 angular coordinate)

Line segment (4 linear... but unreliable!)

Other...

In an unstructured environment like the surface of Mars or the Moon, point features are the only reasonable choice.

Obviously, two coordinates (x, y) are not enough to identify a feature: we need a descriptor of the image behaviour around the given (x, y) in order to compare points in different images.



Point features

Detection

Localisation (x, y) of visually salient points in the image, and determination of the apparent size (scale σ) of the visual feature. Detectors typically search image space (x, y) and scale space (σ) for extrema of some local operator (Harris, Hessian, Laplacian etc.)

Description

Compact representation of the image behaviour around the detected point: N values → point in E^N → similarity from Euclidean distance. Descriptors encode the behaviour of the luminance in a region around (x, y) of size ∝ σ into N parameters obtained from some transformation (e.g. wavelets)



Feature matching

Matching relies essentially on a similarity measure based on the distance between descriptor vectors in feature space. A nearest-neighbor-ratio approach is used, with a bidirectional matching check.

for stereo matching, positive disparity and epipolar constraints can be used to restrict the search for matches

for tracking, blind matching is used, although previous knowledge about 3D structure and predicted rover motion could be used to restrict the search areas

blind matching is also needed for cyclic corrections (in this case, the relative motion estimate is quite unreliable)
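A nearest-neighbour-ratio test with a bidirectional check might look like the sketch below. It is a brute-force illustration, not the INRiM implementation: the function names are hypothetical, the ratio threshold of 0.7 is an assumed value, and the toy descriptors are two-dimensional only to keep the example readable.

```python
import math

def nn_ratio_matches(desc_a, desc_b, ratio=0.7):
    """For each descriptor in desc_a, return the index of its best match in desc_b
    if it passes the nearest-neighbour-ratio test, else None."""
    matches = []
    for a in desc_a:
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        # Accept only if the best distance is clearly smaller than the second best.
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append(dists[0][1])
        else:
            matches.append(None)
    return matches

def mutual_matches(desc_a, desc_b, ratio=0.7):
    """Keep only pairs (i, j) accepted in both matching directions."""
    ab = nn_ratio_matches(desc_a, desc_b, ratio)
    ba = nn_ratio_matches(desc_b, desc_a, ratio)
    return [(i, j) for i, j in enumerate(ab) if j is not None and ba[j] == i]

left = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
right = [(1.05, 0.0), (0.0, 0.1), (5.1, 5.0), (5.0, 5.1)]
pairs = mutual_matches(left, right)
```

Here the third left descriptor has two almost equally close candidates on the right, so the ratio test rejects it as ambiguous and only the first two descriptors are matched.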



Choosing the right features

Detectors

Harris-Laplace detector (harlap)

Hessian-Laplace detector (heslap)

Harris-Affine detector (haraff)

Hessian-Affine detector (hesaff)

Harris-Hessian-Laplace detector (harhes)

Edge-Laplace detector (sedgelap)

Descriptors

Freeman's steerable filters (jla)

Lowe's Scale Invariant Feature Transform (sift)

Gradient Location-Orientation Histogram (extended SIFT) (gloh)

Van Gool's moment invariants (mom)

Spin image (spin)

cross-correlation of image patches (cc)

Speeded Up Robust Features (SURF) (Bay, Tuytelaars, Van Gool, 9th ECCV, 2006)

Fast CVL features



CVL Features

Simplified Speeded Up Robust Features (SURF) (Bay, Tuytelaars, Van Gool, 9th ECCV, 2006):

detector: pixel-space and scale-space maxima of the normalized Hessian

$H(\sigma) = \sigma^2 \left( I_{xx}(\sigma)\, I_{yy}(\sigma) - D(\sigma)\, I_{xy}^2(\sigma) \right)$

descriptor: normalized 8 × 8 resampled luminance in a 10σ area around the maximum

Fast detection (as in SURF) using the integral image and box filter approximations for the derivatives

[Figure: discretised and cropped Gaussian (σ = 1.2) second derivative filter masks in yy and xy (left), and their box approximations (right) (from Bay et al. 2006)]
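The reason box filters are fast is that the integral image (summed-area table) gives the sum over any rectangle from four look-ups. The following is a generic sketch of that idea, with hypothetical function names and a tiny test image; it is not the CVL detector itself.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column:
    ii[r][c] is the sum of all pixels above row r and left of column c."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of the pixels in rows r0..r1-1 and columns c0..c1-1,
    from four table look-ups whatever the box size."""
    return ii[r1][c1] - ii[r0][c1] - ii[r1][c0] + ii[r0][c0]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
inner = box_sum(ii, 1, 1, 3, 3)   # 6 + 7 + 10 + 11
total = box_sum(ii, 0, 0, 3, 4)   # whole image
```

A box approximation of a second-derivative mask is then a weighted combination of a few such box sums, so its cost does not grow with the scale σ.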



CVL Features 2

Fast computation of descriptors, again using the integral image

Drawbacks: Invariant to translation, scale and affine illumination changes, but NOT invariant to rotation and uneven scaling

But much faster:

Method       points   time (ms)   pts/frame   good pts
SURF         1474     1605        2451        155
U-SURF       1474     808         2451        204
New method   1448     391         2401        151

Performance comparison of new method and SURF
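The invariance properties listed above can be illustrated with a toy descriptor. The sketch assumes that "normalized" means zero mean and unit Euclidean norm, which is one common choice and is not confirmed by the slides; it uses a 3 × 3 patch instead of 8 × 8 for brevity, and the function names are hypothetical.

```python
import math

def normalise(patch):
    """Flatten a luminance patch and map it to zero mean and unit Euclidean norm."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centred))
    if norm == 0.0:
        return centred  # flat patch: no texture to describe
    return [c / norm for c in centred]

def descriptor_distance(patch_a, patch_b):
    """Euclidean distance between the normalised descriptors of two patches."""
    return math.dist(normalise(patch_a), normalise(patch_b))

patch = [[10, 20, 30], [40, 60, 50], [90, 70, 80]]
brighter = [[2.0 * v + 15.0 for v in row] for row in patch]   # gain and offset change
rotated = [list(row) for row in zip(*patch[::-1])]            # 90 degree rotation

same = descriptor_distance(patch, brighter)
different = descriptor_distance(patch, rotated)
```

`same` is zero up to rounding, because the gain and offset are removed by the normalisation, whereas `different` is large: a plain resampled-luminance descriptor of this kind does not survive an image rotation.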



Synthetic world

Mars-like environment simulated by POV-Ray

advantage: the ground truth rover pose and the link between rover and world coordinates are known with absolute accuracy

[Figure: simulated rover trajectories, circular path and waving path]



Synthetic world results

I (circular path, various detectors)

[Figure: simulated circular path. Left: position error (m), right: rotation error (rad), vs. path length (m) for various detectors: cvl, surf, heslap/sift, haraff/sift, harlap/sift, hesaff/sift, harhes/sift, sedgelap/sift]



Synthetic world results - some statistics

[Figure: bar charts for the circular path and the waving path, showing step (0..20), total (0..5000), 4-match (0..500) and inliers (0..250) for haraff/cc, haraff/gloh, haraff/jla, haraff/mom, haraff/sift, haraff/spin, harhes/sift, harlap/sift, hesaff/sift, heslap/sift, sedgelap/sift, surf and cvl]

step: average inter-keyframe step

total: average total number of features detected in each image

4-match: average number of features matched over all 4 images of a keyframe pair

inliers: average number of usable 4-matched features in a keyframe pair


INRiM algorithm: Motion estimation

Motion estimation I

Registration in 3D

First estimate the 3D positions of the observed points in the reference frames of the stereo head before and after the motion

Then estimate the rototranslation by trying to align the two point clouds

Used on the first Mars Exploration Rovers (Maimone et al., Journal of Field Robotics 24/3, 2007)

Drawbacks:

The fitting tolerance depends on the point distance and direction

It is difficult to handle outliers



Motion estimation II

Registration on the image plane

Begin by estimating the 3D positions of the observed points and the rototranslation as before, but only as a starting approximation

Then optimize the estimate by minimizing the image plane error, i.e. the difference between the backprojected points and the actually observed ones

Advantages:

The fitting tolerance does not depend on point distance or direction, and this greatly eases the handling of outliers

This technique has been well known to photogrammetrists since the 1950s as bundle adjustment. Indeed, it consists in adjusting the bundles of rays from each camera so that the image plane error is minimized.



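Motion estimation by bundle adjustment

Given a pair of keyframes:

$\mathbf{x}_{i,1L} = P_L \mathbf{X}_i$

$\mathbf{x}_{i,1R} = P_R M_S \mathbf{X}_i$

$\mathbf{x}_{i,2L} = P_L M_{12} \mathbf{X}_i$

$\mathbf{x}_{i,2R} = P_R M_S M_{12} \mathbf{X}_i$

$M = \begin{bmatrix} e^{[\mathbf{r}]_\times} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} x \\ y \\ 1 \\ t \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} u \\ v \\ w \end{bmatrix}$

Cost function:

$J(\mathbf{p}) = \sum_i \sum_q f(\|\mathbf{e}_{iq}\|^2) = \sum_i \sum_q f(\|\mathbf{u}_{iq} - \mathbf{u}^*_{iq}\|^2)$

$\mathbf{u} = \begin{bmatrix} u/w \\ v/w \end{bmatrix}, \quad \mathbf{p} = [\mathbf{r}_{12}, \mathbf{t}_{12}, x_1, y_1, t_1, \ldots, x_N, y_N, t_N]^\top$ (6 + 3N unknowns)

$f(e^2) = \log(1 + e^2/\sigma^2)$ (robust Lorentzian cost!)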
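The motion matrix M is parameterised by a rotation vector r, whose matrix exponential can be evaluated in closed form with the Rodrigues formula. The sketch below shows one way to build M from (r, t) and apply it to a homogeneous point; the function names are hypothetical and the example motion is arbitrary.

```python
import math

def rotation_from_vector(r):
    """Rodrigues formula: R = exp([r]x) for a rotation vector r = angle * axis."""
    theta = math.sqrt(sum(c * c for c in r))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in r)
    # Cross-product matrix of the unit axis and its square.
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
    s, one_minus_cos = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + one_minus_cos * K2[i][j]
             for j in range(3)] for i in range(3)]

def motion_matrix(r, t):
    """4x4 homogeneous matrix M = [[exp([r]x), t], [0, 1]]."""
    R = rotation_from_vector(r)
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]

def apply(M, X):
    """Multiply a 4x4 matrix by a homogeneous 4-vector."""
    return [sum(M[i][j] * X[j] for j in range(4)) for i in range(4)]

# 90 degree rotation about z plus a 1 m translation along x.
M12 = motion_matrix((0.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0))
moved = apply(M12, [1.0, 0.0, 0.0, 1.0])
```

The point (1, 0, 0) is rotated onto the y axis and then shifted along x, giving approximately (1, 1, 0). The three components of r and the three of t are the six motion unknowns counted in p.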

Optimization by efficient sparse Levenberg-Marquardt. Two-pass optimisation (first with the Lorentzian cost, second with the standard sum-of-squares cost on inliers only)
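The two-pass idea can be tried on a toy problem. The sketch below estimates a single scalar instead of a pose, and replaces Levenberg-Marquardt with a simple iteratively reweighted average, which is a swapped-in technique chosen only to keep the example short. The 3σ inlier gate and all names are assumptions.

```python
import math

def lorentzian(e2, sigma):
    """Robust cost f(e^2) = log(1 + e^2 / sigma^2)."""
    return math.log(1.0 + e2 / sigma ** 2)

def robust_location(samples, sigma, iterations=50):
    """Pass 1: minimise the summed Lorentzian cost by iteratively reweighted averaging."""
    mu = sorted(samples)[len(samples) // 2]  # median as starting approximation
    for _ in range(iterations):
        # Setting the derivative of the cost to zero gives these weights.
        weights = [1.0 / (1.0 + (z - mu) ** 2 / sigma ** 2) for z in samples]
        mu = sum(w * z for w, z in zip(weights, samples)) / sum(weights)
    return mu

def two_pass_location(samples, sigma, gate=3.0):
    """Pass 2: plain least squares (the mean) restricted to the inliers of pass 1."""
    mu = robust_location(samples, sigma)
    inliers = [z for z in samples if abs(z - mu) <= gate * sigma]
    return sum(inliers) / len(inliers), inliers

samples = [1.9, 2.0, 2.1, 2.05, 1.95, 9.0]   # the last value plays the role of a wrong match
plain_mean = sum(samples) / len(samples)
estimate, inliers = two_pass_location(samples, sigma=0.1)
cost = sum(lorentzian((z - estimate) ** 2, 0.1) for z in samples)
```

The plain mean is pulled above 3 by the single outlier, while the two-pass estimate stays at about 2: the Lorentzian pass gives the outlier a negligible weight, and the second pass refines the result on the five inliers with an ordinary sum-of-squares cost.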

Results

INRiM data I

[Figure: INRIM campus data I]


INRiM data I (cont.)

[Figure: INRIM campus data I, grassy plain path. Bar chart of step (0..20), total (0..5000), 4-match (0..500) and inliers (0..250) for cvl (0.7), cvl (0.9) and haraff/mom; plot with curves cvl (raw), cvl (dejavu) and haraff/mom (raw)]


INRiM data II

[Figure: INRIM campus data II. Plots of y [m] and z [m] vs. x [m]]


INRiM data II (cont.)

[Figure: INRIM campus data II, paved road path. Bar chart of step (0..20), total (0..5000), 4-match (0..500) and inliers (0..250) for cvl (0.7) and haraff/mom; return position error [m] vs. estimated path length [m] for cvl 0.7 and haraff/mom 0.9]


Results (CVL) on Oxford data

New College Dataset: 52478 stereo pairs over a total path length of about 2840 m

[Figure: two plots of the results on the New College Dataset]


Results (CVL) on Karlsruhe data

[Figure: three plots, one for each of the sequences 20090908drive21, 20100304drive21 and 20090908drive19]


Thank you for your attention


Odometer

[Figure: Leonardo's odometer (Codex Atlanticus, 1478-1518)]


Bundle adjustment

from: http://www.geodetic.com/Whatis.htm

given the observations of N unknown 3D points in M images, express the image plane errors, i.e. the distances of the 2D projected points from the observed ones, as a function of the unknown parameters: the 3D point coordinates, plus possibly imaging geometry parameters (e.g. the poses of the M cameras)

define a cumulative image plane error J as a suitable function (e.g. sum of squares) of the above errors, and estimate the unknown parameters (3D structure and motion) by seeking a minimum of J

for J = sum of squares and Gaussian disturbances, the estimate is optimal (ML)

the least squares solution is highly sensitive to outliers (e.g. due to wrong matches), so it is generally safer to use a more robust cost function, or to perform accurate outlier detection (or both)

Single View Geometry (Pinhole Camera)

[Figure: pinhole camera geometry, with camera center C, principal axis z, image plane (u, v), 3D point X and its image x]

$\mathbf{X} = \begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} u \\ v \\ w \end{bmatrix}, \quad \mathbf{x} = P \mathbf{X}$

$P = [\, M \mid \mathbf{p}_4 \,] = K [\, R \mid \mathbf{t} \,], \quad K = \begin{bmatrix} f_u & s & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}$

in the general case, P is a rank-3 3×4 matrix (projective camera). If M is nonsingular, it is a finite camera. P is defined up to scale, so it has 11 DOF.

the matrix K is the intrinsic calibration matrix of the camera: $f_u$ and $f_v$ are the focal lengths, $s$ the skew, $u_0$ and $v_0$ the image center
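As a small worked example of x = PX with P = K[R | t], the sketch below projects one homogeneous point. The intrinsic values (focal lengths of 800 pixels, zero skew, image center at (320, 240)) are made up for illustration, and the helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def camera_matrix(K, R, t):
    """P = K [R | t], a 3x4 projection matrix."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]
    return matmul(K, Rt)

def project(P, X):
    """Map a homogeneous 3D point X = (x, y, z, t) to pixel coordinates (u/w, v/w)."""
    u, v, w = (sum(p * x for p, x in zip(row, X)) for row in P)
    return (u / w, v / w)

# Illustrative intrinsics: fu = fv = 800, s = 0, (u0, v0) = (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

P = camera_matrix(K, R, t)
pixel = project(P, [0.5, -0.25, 2.0, 1.0])

# P is defined up to scale: multiplying it by a constant changes nothing.
scaled = [[3.0 * p for p in row] for row in P]
pixel_scaled = project(scaled, [0.5, -0.25, 2.0, 1.0])
```

Both calls give the pixel (520, 140). Since a common factor on the 12 entries of P cancels in u/w and v/w, only 11 degrees of freedom remain.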


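Parallel Axes Stereo

$P_L = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} [\, I_3 \mid \mathbf{0}_3 \,], \quad P_R = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -b \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$

$u_L = f x / z, \quad u_R = f (x - b) / z$

$v_L = v_R = f y / z$

$d = u_L - u_R = f b / z$

$d = u_L - u_R$ is the stereo disparity, which directly gives the depth z of the observed point

$z = f b / d \Rightarrow \delta z = -\dfrac{z^2}{f b}\, \delta d$, i.e. for a given image plane error the error in depth increases quadratically with z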
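The disparity relations can be checked with a few numbers. The focal length (800 pixels), the baseline (0.25 m) and the half-pixel disparity error used below are illustrative assumptions, not parameters of the STEPS stereo rig.

```python
def disparity(x, z, f, b):
    """Disparity d = uL - uR = f*b/z for a point at lateral offset x and depth z."""
    u_left = f * x / z
    u_right = f * (x - b) / z
    return u_left - u_right

def depth_from_disparity(d, f, b):
    """Invert d = f*b/z."""
    return f * b / d

def depth_error(z, f, b, disparity_error):
    """First-order depth error |dz| = z^2 / (f*b) * |dd|."""
    return z * z / (f * b) * abs(disparity_error)

f, b = 800.0, 0.25   # illustrative focal length (pixels) and baseline (metres)
for z in (2.0, 4.0, 8.0):
    d = disparity(0.3, z, f, b)
    # Depth, disparity, recovered depth, depth error for a 0.5 pixel disparity error.
    print(z, d, depth_from_disparity(d, f, b), depth_error(z, f, b, 0.5))
```

Each doubling of the depth halves the disparity and quadruples the depth error, which is why only nearby points constrain the motion well and why the fitting tolerance of a 3D registration depends on point distance.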


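Parallel Axes Stereo (2)

Let's do some rotations

$\mathbf{X} = \begin{bmatrix} R_1 & \mathbf{0} \\ \mathbf{0}^\top & 1 \end{bmatrix} \mathbf{X}', \quad \mathbf{x}_L = R_1 \mathbf{x}'_L, \quad \mathbf{x}_R = R_2 \mathbf{x}'_R$

then

$\mathbf{x}'_L = [\, I \mid \mathbf{0} \,] \mathbf{X}'$

$\mathbf{x}'_R = [\, R_2^\top R R_1 \mid R_2^\top \mathbf{t} \,] \mathbf{X}'$

it is always possible to determine $R_1$ and $R_2$ so that $R_2^\top R R_1 = I$ and $R_2^\top \mathbf{t} = [-b, 0, 0]^\top$, i.e. the parallel axes case.

the transformations $R_1$ and $R_2$ can be used to warp the image data, obtaining a so-called rectified stereo pair (note that warping may include corrections for lens distortion). This is mostly useful for stereo algorithms seeking dense disparity maps by direct correlation of luminance data.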

