
Introduction: Robot Vision - Philippe Martinet

Unifying Vision and Control - Selim Benhimane

Efficient Keypoint Recognition - Vincent Lepetit

Multi-camera and Model-based Robot Vision - Andrew Comport

Visual SLAM for Spatially Aware Robots - Walterio Mayol-Cuevas

Outdoor Visual SLAM for Robotics - Kurt Konolige

Advanced Vision in Deformable Environments - Adrien Bartoli

Tutorial organized by Andrew Comport and Adrien Bartoli. Nice, September 22.

Visual SLAM and Spatial Awareness

SLAM = Simultaneous Localisation and Mapping

An overview of some methods currently used for SLAM using computer vision.

Recent work on enabling more stable and/or robust mapping in real-time.

Work aiming to provide better scene understanding in the context of SLAM: Spatial Awareness.

Here we concentrate on “Small” working areas where GPS, odometry and other traditional sensors are not operational or available.

Spatial Awareness

SA: A key cognitive competence that permits efficient motion and task planning.

Even from an early age we use spatial awareness: the toy has not vanished, it is behind the sofa.

I can point to where the entrance to the building is, but can't tell how many doors there are from here to there.

SLAM offers a rigorous way to implement and manage SA.

Wearable personal assistants

Mayol, Davison and Murray 2003

Video at http://www.robots.ox.ac.uk/ActiveVision/Projects/Vslam/vslam.02/Videos/wearableslam2.mpg

SLAM: key historical reference

Smith, R.C. and Cheeseman, P., "On the Representation and Estimation of Spatial Uncertainty". The International Journal of Robotics Research 5 (4): 56-68, 1986.

Proposed a stochastic framework to maintain the relationships (uncertainties) between features in the map.

“Our knowledge of the spatial relationships among objects is inherently uncertain. A manmade object does not match its geometric model exactly because of manufacturing tolerances. Even if it did, a sensor could not measure the geometric features, and thus locate the object exactly, because of measurement errors. And even if it could, a robot using the sensor cannot manipulate the object exactly as intended, because of hand positioning errors…” [Smith, Self and Cheeseman 1986]

SLAM

A problem that has been studied for many years, central to mobile robot navigation and branching into other fields like wearable computing and augmented reality.

SLAM – Simultaneous Localisation And Mapping

[Diagram: a camera observes 3D point features through perspective projection; as the camera moves, feature locations are predicted, then the feature positions and the camera location are updated.]

Aim to:

• Localise the camera (6DOF: rotation and translation from a reference view).

• Simultaneously estimate a 3D map of features (e.g. 3D points).

Implemented using:

Extended Kalman Filter, particle filters, SIFT, edgelets, etc.

State representation

as in [Davison 2003]

SLAM with first order uncertainty representation

as in [Davison 2003]
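For reference, a minimal sketch of this formulation, following the structure of Davison's MonoSLAM (the exact notation here is illustrative): the state stacks the camera parameters and all feature positions, and a single full covariance matrix carries the first-order uncertainty, including every camera-feature and feature-feature correlation:

```latex
% Camera state: 3D position r, orientation quaternion q,
% linear velocity v and angular velocity omega (13 parameters).
x_v = \left( r^W,\; q^{WR},\; v^W,\; \omega^R \right)^T
% Full state: camera plus n point features y_i (3 parameters each).
x = \left( x_v,\; y_1,\; \dots,\; y_n \right)^T
% First-order uncertainty: one full covariance over the whole state.
P = \begin{pmatrix}
  P_{x_v x_v} & P_{x_v y_1} & \cdots & P_{x_v y_n} \\
  P_{y_1 x_v} & P_{y_1 y_1} & \cdots & P_{y_1 y_n} \\
  \vdots      & \vdots      & \ddots & \vdots      \\
  P_{y_n x_v} & P_{y_n y_1} & \cdots & P_{y_n y_n}
\end{pmatrix}
```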

Challenges for visual SLAM

On the computer vision side, improving data association: Ensuring a match is a true positive.

Representations and parameterizations that enhance mapping while staying within real-time constraints.

Alternative frameworks for mapping: can we extend the area of operation? Can we achieve better scene understanding?

For data association, an earlier approach:

Small (e.g. 11x11) image patches around salient points to represent features.

Normalized Cross Correlation (NCC) to detect features.

Small patches + accurate search regions lead to fast camera pose estimation.

Depth is estimated by projecting hypotheses at different depths.

See: A. Davison, Real-Time Simultaneous Localisation and Mapping with a Single Camera, ICCV 2003.
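As an illustration of this matching step, here is a minimal NCC patch search in Python/NumPy (a toy sketch, not the tutorial's code; the function names and the exhaustive search are my own):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_patch(image, template, region, min_score=0.8):
    """Exhaustively search an (x0, y0, x1, y1) region for the template.

    A small search region, predicted by the filter, is what keeps this fast.
    """
    h, w = template.shape
    x0, y0, x1, y1 = region
    best_score, best_xy = min_score, None
    for y in range(y0, y1 - h + 1):
        for x in range(x0, x1 - w + 1):
            s = ncc(image[y:y + h, x:x + w].astype(float),
                    template.astype(float))
            if s > best_score:
                best_score, best_xy = s, (x, y)
    return best_xy, best_score
```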

However

Simple patches are insufficient for large view point or scale variations.

Small patches help speed but are prone to mismatches.

Search regions can’t always be trusted (camera occlusion, motion blur).

Possible solutions: use better feature descriptors or other types of features, e.g. edge information.

SIFT [D. Lowe, IJCV 2004]

Find maxima in scale space to locate keypoints.

Around each keypoint, build an invariant local descriptor (a 128-element vector) using gradient histograms.

If used for tracking, this may be wasteful!
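To make the descriptor idea concrete, here is a toy gradient-orientation histogram in Python (my own simplification: real SIFT concatenates 4x4 spatial cells of 8-bin histograms like this one, giving the 128-element vector):

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Magnitude-weighted histogram of gradient orientations for one cell."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)        # orientations in [0, 2*pi)
    idx = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-9)   # normalize for invariance
```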

• Uses SIFT-like descriptors (histograms of gradients) around Harris corners.

• Gets scale from SLAM = “predictive SIFT”.

[Chekhlov, Pupilli, Mayol and Calway, ISVC06/CVPR07]


Video at http://www.cs.bris.ac.uk/Publications/attachment-delivery.jsp?id=9

[Eade and Drummond, BMVC2006]

Edgelets:

• Locally straight sections of the gradient image.

• Parameterized as a 3D point + direction.

• Avoid regions of conflict (e.g. close parallel edges).

• Deal with multiple matches through robust estimation.

Video at http://mi.eng.cam.ac.uk/~ee231/bmvcmovie.avi

RANSAC [Fischler and Bolles 1981]

RANdom SAmple Consensus

[Figure: 2D points with gross “outliers”; the least squares fit is skewed by the outliers, while the RANSAC fit recovers the underlying line.]

• Select a random sample of points.

• Propose a model (hypothesis) based on the sample.

• Assess the fitness of the hypothesis on the rest of the data.

• Repeat until the maximum number of iterations or a fitness threshold is reached.

• Keep the best hypothesis and potentially refine it with all inliers (see the sketch below).
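A minimal RANSAC line fit in Python (illustrative only; the thresholds and the 2-point minimal sample are my choices for the line-fitting case pictured above):

```python
import numpy as np

def ransac_line(points, n_iters=200, tol=1.0):
    """Robustly fit y = m*x + c to an (N, 2) array with gross outliers."""
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Select a random minimal sample: two points define a line.
        i, j = np.random.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # vertical sample; skip for this simple model
        # Propose a model (hypothesis) from the sample.
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        # Assess fitness: count points within tolerance of the hypothesis.
        inliers = np.abs(points[:, 1] - (m * points[:, 0] + c)) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 2:
        raise ValueError("no consensus found")
    # Refine the best hypothesis with all of its inliers (least squares).
    m, c = np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)
    return m, c, best_inliers
```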

OK but…

Having rich descriptors or even multiple kinds of features may still lead to wrong data associations (mismatches).

If we pass every measurement we think is good to the SLAM system, the result can be catastrophic.

Better to be able to recover from failure than to think it won’t fail!

[Williams, Smith and Reid ICRA2007]

• Camera relocalization using small 2D patches + RANSAC to compute pose.

• Adds a “supervisor” between visual measurements and the SLAM system.

Use the three-point algorithm -> up to 4 possible poses. Verify using Matas' Td,d test.

Also see recent work [Williams, Klein and Reid ICCV2007] using randomised trees rather than simple 2D patches.

In brief, while within the real-time limit do:

[Flowchart: Is lost? -> select 3 matches -> compute pose -> consistent? If yes, carry on; if no, select 3 new matches. A sketch of the consistency test follows below.]

[Williams, Smith and Reid ICRA2007]

Video at http://www.robots.ox.ac.uk/ActiveVision/Projects/Vslam/vslam.04/Videos/relocalisation_icra_07.mpg
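The consistency check can be as simple as a Matas-style T(d,d) test: accept a pose hypothesis only if d further randomly chosen matches all agree with it. A sketch in Python (the `reproj_err` callback and the parameter values are hypothetical placeholders, not the paper's exact implementation):

```python
import random

def td_d_consistent(pose, matches, reproj_err, d=10, tol=2.0):
    """T(d,d) test: the hypothesis must pass ALL d random extra checks.

    reproj_err(pose, match) -> reprojection error in pixels (user supplied).
    The test is cheap and rejects bad poses early, which is what makes
    hypothesise-and-verify viable within the real-time budget.
    """
    sample = random.sample(matches, min(d, len(matches)))
    return all(reproj_err(pose, m) < tol for m in sample)
```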

Relocalisation based on appearance hashing

Use a hash function to index similar descriptors (Brown et al 2005).

Fast and memory efficient (only an index needs to be saved per descriptor).

Chekhlov et al 2008

Quantize the results of Haar masks to form the hash key.

Video at: http://www.cs.bris.ac.uk/Publications/pub_master.jsp?id=2000939
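A toy version of the indexing idea in Python (my own simplified masks and quantization, not the exact scheme of Chekhlov et al.): a few Haar-like responses are quantized and packed into one integer key, so only an index needs to be stored per descriptor:

```python
import numpy as np

def haar_hash(patch, n_levels=4):
    """Quantize two Haar-mask responses into one hash-table key."""
    p = patch.astype(float)
    p = (p - p.min()) / (p.max() - p.min() + 1e-9)    # normalize to [0, 1]
    h, w = p.shape
    responses = [
        p[:, :w // 2].mean() - p[:, w // 2:].mean(),  # horizontal mask
        p[:h // 2, :].mean() - p[h // 2:, :].mean(),  # vertical mask
    ]
    key = 0
    for r in responses:                               # each r lies in [-1, 1]
        q = int(np.clip((r + 1.0) / 2.0 * n_levels, 0, n_levels - 1))
        key = key * n_levels + q                      # pack quantized levels
    return key
```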

Parallel Tracking and Mapping

[Klein and Murray, Parallel Tracking and Mapping for Small AR Workspaces Proc. International Symposium on Mixed and Augmented Reality. 2007]

Decouple Mapping from Tracking; run them in separate threads on a multi-core CPU.

Mapping is based on key-frames, processed using batch Bundle Adjustment.

Map is initialised from a stereo pair (using the 5-point algorithm).

New points are initialised with epipolar search.

Large numbers (thousands) of points can be mapped in a small workspace.

[Klein and Murray, 2007]
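The core design can be sketched with two Python threads and a key-frame queue (a toy stand-in with sleeps in place of real tracking and bundle adjustment; this is not Klein and Murray's code):

```python
import queue
import threading
import time

key_frames = queue.Queue()

def tracking_loop(n_frames=100):
    """Fast per-frame thread: track the camera, occasionally emit key-frames."""
    for i in range(n_frames):
        time.sleep(0.01)        # stand-in: detect features, compute camera pose
        if i % 10 == 0:
            key_frames.put(i)   # hand a key-frame to the mapper
    key_frames.put(None)        # signal shutdown

def mapping_loop():
    """Slow background thread: integrate key-frames, refine the map in batch."""
    while key_frames.get() is not None:
        time.sleep(0.2)         # stand-in: epipolar search + bundle adjustment

t = threading.Thread(target=tracking_loop)
m = threading.Thread(target=mapping_loop)
t.start(); m.start()
t.join(); m.join()
```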

Parallel Tracking and Mapping

[Diagram: CPU1 runs the per-frame tracking loop (detect features, compute camera pose, draw graphics) while CPU2 runs the mapping thread (update map) in parallel.]

Video at http://www.robots.ox.ac.uk/ActiveVision/Videos/index.html

So far we have mentioned that

Maps are sparse collections of low-level features:

• Points (Davison et al., Chekhlov et al.)

• Edgelets (Eade and Drummond)

• Lines (Smith et al., Gee and Mayol-Cuevas)

Full correlation between features and camera:

• Maintain the full covariance matrix.

• Loop closure: effects of measurements are propagated to all features in the map.

• Increase in state size limits the number of features (see the cost note after this list).

Emphasis on localization and less on the mapping output.

SLAM should avoid making “beautiful” maps (there are other better methods for that!).

Very few examples exist of improving the awareness element, e.g. Castle and Murray BMVC 07 on known-object recognition within SLAM.
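To see why state size bites, a back-of-the-envelope count (assuming MonoSLAM's 13-parameter camera state and 3-parameter point features, as sketched earlier):

```latex
% n point features give a state of 13 + 3n entries, so the covariance is
% (13 + 3n) x (13 + 3n); each EKF update touches the whole matrix, an
% O(n^2) cost that bounds how many features fit in the real-time budget.
\dim(x) = 13 + 3n, \qquad P \in \mathbb{R}^{(13+3n)\times(13+3n)}
```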

Commonly in Visual SLAM

Better spatial awareness through higher level structural inference

Types of Structure

• Coplanar points → planes

• Collinear edgelets → lines

• Intersecting lines → junctions

Our Contribution

• Method for augmenting the SLAM map with planar and line structures.

• Evaluation of the method in a simulated scene: discovering the trade-off between efficiency and accuracy.

Discovering structure within SLAM

Gee, Chekhlov, Calway and Mayol-Cuevas, 2008

Plane Representation

Plane parameters: a 3D point on the plane plus two basis vectors, each given by a pair of angles:

m = (x, y, z, θ1, φ1, θ2, φ2)^T

Basis vectors: c(θ,φ) = (cos θ cos φ, sin φ, sin θ cos φ)^T, so the plane normal is c(θ1,φ1) × c(θ2,φ2).

[Diagram: the camera and the world frame (origin O) observing a plane through the point (x,y,z), with basis vectors c(θ1,φ1) and c(θ2,φ2) and the plane normal.]

Gee et al 2007
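A small NumPy rendering of this parameterization (assuming the component order reconstructed above):

```python
import numpy as np

def basis(theta, phi):
    """Unit basis vector c(theta, phi) = (cos t cos p, sin p, sin t cos p)^T."""
    return np.array([np.cos(theta) * np.cos(phi),
                     np.sin(phi),
                     np.sin(theta) * np.cos(phi)])

def plane_normal(theta1, phi1, theta2, phi2):
    """Plane normal as the normalized cross product of the two basis vectors."""
    n = np.cross(basis(theta1, phi1), basis(theta2, phi2))
    return n / np.linalg.norm(n)
```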

Plane Initialisation

1. Discover planes using RANSAC over a thresholded subset of the map.

2. Initialise the plane in the state using the best-fit plane parameters found from an SVD of the inliers.

3. Augment the state covariance P with the new plane:

P_new = J [ P 0 ; 0 R0 ] J^T

Append the measurement covariance R0 to the covariance matrix; multiplication with the Jacobian J populates the cross-covariance terms (see the sketch below). State size increases by 7 after adding a plane.

Gee et al 2007
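The augmentation step in NumPy terms (a sketch; here J is assumed to be the Jacobian of the function mapping the old state and the measurement to the augmented state, with an identity block over the untouched entries):

```python
import numpy as np

def augment_covariance(P, R0, J):
    """Return the augmented covariance after appending a new plane.

    Appending the measurement covariance R0 block-diagonally and
    multiplying by the Jacobian J populates the cross-covariance
    terms between the new plane and the existing state.
    """
    n, m = P.shape[0], R0.shape[0]
    big = np.zeros((n + m, n + m))
    big[:n, :n] = P        # existing state covariance
    big[n:, n:] = R0       # appended measurement covariance
    return J @ big @ J.T
```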

Adding Points to Plane

1. Decide whether a point lies on the plane (e.g. within a distance threshold σmax of the plane).

2. Add the point by projecting it onto the plane and transforming the state and covariance: P_new = J P J^T.

3. Decide whether to fix the point on the plane.

State size decreases by 1 after adding a point to the plane; fixing a point in the plane reduces the state size by 2 for each fixed point. The state becomes smaller than the original if more than 7 points are added to a plane.

Gee et al 2007
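The projection in step 2 is just the orthogonal projection onto the plane; an illustrative one-liner in NumPy:

```python
import numpy as np

def project_to_plane(p, origin, n):
    """Orthogonally project point p onto the plane through `origin`
    with unit normal `n` (the first part of step 2 above)."""
    return p - np.dot(p - origin, n) * n
```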

Plane Observation

1. Cannot make a direct observation of the plane.

2. Transform points to 3D world space.

3. Project points into the image and match with predicted observations.

4. The covariance matrix embodies the constraints between plane, camera and points.

Gee et al 2007

Discovering planes in SLAM

Gee et al. 2007. Video at: http://www.cs.bris.ac.uk/~gee


Mean error and state reduction for planes, averaged over 30 runs.

Gee et al 2008

Discovering 3D lines

Video at: http://www.cs.bris.ac.uk/~gee

An example application

Chekhlov et al. 2007. Video at http://www.cs.bris.ac.uk/Publications/pub_master.jsp?id=2000745

Other interesting recent work

Active search and matching: knowing what to measure. Davison ICCV 2005 and Chli and Davison ECCV 2008.

Submapping: managing the scalability problem better. Clemente et al RSS 2007; Eade and Drummond BMVC 2008.

And the work presented in this tutorial: randomised trees (Vincent Lepetit); SFM (Andrew Comport).

Software tools:

http://www.doc.ic.ac.uk/~ajd/Scene/index.html

<MonoSLAM code for Linux, works out of the box>

http://www.robots.ox.ac.uk/~gk/PTAM/

<Parallel tracking and mapping>

http://www.openslam.org/

<for SLAM algorithms mainly from robotics community>

http://www.robots.ox.ac.uk/~SSS06/

<SLAM literature and some software in Matlab>

Recommended intro reading:

Yaakov Bar-Shalom, X. Rong Li and Thiagalingam Kirubarajan, Estimation with Applications to Tracking and Navigation, Wiley-Interscience, 2001.

Hugh Durrant-Whyte and Tim Bailey, Simultaneous Localisation and Mapping (SLAM): Part I The Essential Algorithms. Robotics and Automation Magazine, June, 2006.

Tim Bailey and Hugh Durrant-Whyte, Simultaneous Localisation and Mapping (SLAM): Part II State of the Art. Robotics and Automation Magazine, September, 2006.

Andrew Davison, Ian Reid, Nicholas Molton and Olivier Stasse MonoSLAM: Real-Time Single Camera SLAM, IEEE Trans. PAMI 2007.

Andrew Calway, Andrew Davison and Walterio Mayol-Cuevas, Slides of Tutorial on Visual SLAM, BMVC 2007, available at:

http://www.cs.bris.ac.uk/Research/Vision/Realtime/bmvctutorial/

Some Challenges

Deal with larger maps.

Obtain maps that are task-meaningful (manipulation, AR, metrology).

Use different kinds of features in an informed way.

Benefit from other approaches such as SFM while keeping efficiency.

Incorporate semantics and beyond-geometric scene understanding.

Fin