Stereo matching
Reading: Chapter 11 In Szeliski’s book
Schedule (tentative)
#   date    topic
1   Sep.22  Introduction and geometry
2   Sep.29  Invariant features
3   Oct.6   Camera models and calibration
4   Oct.13  Multiple-view geometry
5   Oct.20  Model fitting (RANSAC, EM, …)
6   Oct.27  Stereo Matching
7   Nov.3   2D Shape Matching
8   Nov.10  Segmentation
9   Nov.17  Shape from Silhouettes
10  Nov.24  Structure from motion
11  Dec.1   Optical Flow
12  Dec.8   Tracking (Kalman, particle filter)
13  Dec.15  Object category recognition
14  Dec.22  Specific object recognition
An Application: Mobile Robot Navigation
The Stanford Cart, H. Moravec, 1979.
The INRIA Mobile Robot, 1990.
Courtesy O. Faugeras and H. Moravec.
Stereo
NASA Mars Rover
Standard stereo geometry
pure translation along X-axis
PointGrey Bumblebee
Standard stereo geometry
Stereo: basic idea
error
depth
Stereo matching
• Search is limited to the epipolar line (1D)
• Look for the most similar pixel
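The basic idea above can be sketched as a minimal block matcher: slide a window along the same scanline of the other image and keep the most similar position. A sketch only — the function name, window size, and SSD cost are illustrative, not from the slides:

```python
import numpy as np

def match_pixel_ssd(left, right, y, x, max_disp, win=5):
    """Disparity of left-image pixel (y, x): slide a win x win window
    along the same scanline (epipolar line) of the right image and
    keep the minimum-SSD (most similar) position."""
    h = win // 2
    patch = left[y-h:y+h+1, x-h:x+h+1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        if x - d - h < 0:          # window would leave the image
            break
        cand = right[y-h:y+h+1, x-d-h:x-d+h+1].astype(float)
        cost = np.sum((patch - cand) ** 2)   # SSD dissimilarity
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```

For a right image that is the left image shifted by 3 pixels, the routine recovers disparity 3.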
Stereopsis
Figure from US Navy Manual of Basic Optics and Optical Instruments, prepared by Bureau of Naval Personnel. Reprinted by Dover Publications, Inc., 1969.
Human Stereopsis: Reconstruction
Disparity: d = r − l = D − F.
The set of points with d = 0 forms, in 3D, the horopter.
Human Stereopsis: experimental horopter…
Iso-disparity curves: planar retinas
(Figure: cameras C1 and C2 and the iso-disparity curves through scene points X0, X1, …, Xi, Xj, …, X∞; derivation omitted.)
The retina acts as if it were flat!
Human Stereopsis: Reconstruction
What if F is not known?
Helmholtz (1909):
• There is evidence showing the vergence angles cannot be measured precisely.
• Humans get fooled by bas-relief sculptures.
• There is an analytical explanation for this.
• Relative depth can be judged accurately.
Human Stereopsis: Binocular Fusion
How are the correspondences established?
Julesz (1971): Is the mechanism for binocular fusion a monocular or a binocular process?
• There is anecdotal evidence for the latter (camouflage).
• Random dot stereograms provide an objective answer: it is a binocular process!
A Cooperative Model (Marr and Poggio, 1976)
Excitatory connections: continuity
Inhibitory connections: uniqueness
Iterate: C ← Σ_e C − w Σ_i C + C_0 (sum over excitatory neighbors, minus weighted sum over inhibitory neighbors, plus the initial state).
Reprinted from Vision: A Computational Investigation into the Human Representation and Processing of Visual Information by David Marr. © 1982 by David Marr. Reprinted by permission of Henry Holt and Company, LLC.
Correlation Methods (1970--)
Minimize |w − w′|².
Slide the window along the epipolar line until w · w′ is maximized.
Normalized Correlation: minimize the angle θ between w and w′ instead.
Correlation Methods: Foreshortening Problems
Solution: add a second pass using disparity estimates to warp the correlation windows, e.g. Devernay and Faugeras (1994).
Reprinted from “Computing Differential Properties of 3D Shapes from Stereopsis without 3D Models,” by F. Devernay and O. Faugeras, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (1994). 1994 IEEE.
Multi-Scale Edge Matching (Marr, Poggio and Grimson, 1979-81)
• Edges are found by repeatedly smoothing the image and detecting the zero crossings of the second derivative (Laplacian).
• Matches at coarse scales are used to offset the search for matches at fine scales (equivalent to eye movements).
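The zero-crossing step can be sketched as follows — a sketch only, operating on an already-smoothed Laplacian response (the smoothing itself and the function name are illustrative):

```python
import numpy as np

def zero_crossings(lap):
    """Mark pixels where the (smoothed) Laplacian response changes sign
    with a horizontal or vertical neighbor — the edge locations that are
    matched at each scale in the Marr-Poggio-Grimson scheme."""
    zc = np.zeros(lap.shape, dtype=bool)
    # sign change between horizontal neighbors
    zc[:, :-1] |= np.signbit(lap[:, :-1]) != np.signbit(lap[:, 1:])
    # sign change between vertical neighbors
    zc[:-1, :] |= np.signbit(lap[:-1, :]) != np.signbit(lap[1:, :])
    return zc
```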
Multi-Scale Edge Matching (Marr, Poggio and Grimson, 1979-81)
One of the two input images
Image Laplacian
Zeros of the Laplacian
Reprinted from Vision: A Computational Investigation into the Human Representation and Processing of Visual Information by David Marr. 1982 by David Marr. Reprinted by permission of Henry Holt and Company, LLC.
Multi-Scale Edge Matching (Marr, Poggio and Grimson, 1979-81)
Reprinted from Vision: A Computational Investigation into the Human Representation and Processing of Visual Information by David Marr. 1982 by David Marr. Reprinted by permission of Henry Holt and Company, LLC.
Pixel Dissimilarity
• Absolute difference of intensities: c = |I1(x,y) − I2(x−d,y)|
• Interval matching [Birchfield 98]
  – Considers sensor integration
  – Represents pixels as intervals
Alternative Dissimilarity Measures
• Rank and Census transforms [Zabih ECCV94]
• Rank transform:
  – Define a window containing R pixels around each pixel
  – Count the number of pixels with lower intensities than the center pixel in the window
  – Replace the intensity with the rank (0..R−1)
  – Compute SAD on the rank-transformed images
• Census transform:
  – Use a bit string, defined by the neighbors, instead of a scalar rank
• Robust against illumination changes
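The two transforms above can be sketched directly from their definitions (window size and function names are illustrative):

```python
import numpy as np

def rank_transform(img, win=3):
    """Rank transform [Zabih ECCV94]: replace each pixel by the number
    of pixels in its win x win window that are darker than it (0..R-1)."""
    h = win // 2
    img = img.astype(int)
    out = np.zeros(img.shape, dtype=int)
    H, W = img.shape
    for y in range(h, H - h):
        for x in range(h, W - h):
            out[y, x] = np.sum(img[y-h:y+h+1, x-h:x+h+1] < img[y, x])
    return out

def census_transform(img, win=3):
    """Census transform: encode each pixel as a bit string with one bit
    per neighbor, set when that neighbor is darker than the center."""
    h = win // 2
    img = img.astype(int)
    H, W = img.shape
    out = np.zeros((H, W), dtype=int)
    for y in range(h, H - h):
        for x in range(h, W - h):
            bits = 0
            for dy in range(-h, h + 1):
                for dx in range(-h, h + 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the center pixel itself
                    bits = (bits << 1) | int(img[y+dy, x+dx] < img[y, x])
            out[y, x] = bits
    return out
```

Both depend only on intensity orderings, which is why they are robust to gain and bias changes.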
Rank and Census Transform Results
• Noise-free random dot stereograms
• Different gain and bias
Systematic Errors of Area-based Stereo
• Ambiguous matches in textureless regions
• Surface over-extension [Okutomi IJCV02]
Surface Over-extension
• Expected value E[(x−y)²], for x in the left and y in the right image, is:
  – Case A: σF² + σB² + (μF − μB)² for w/2 − λ pixels in each row
  – Case B: 2σB² for w/2 + λ pixels in each row
(Figure: left and right images; disparity of the back surface.)
Surface Over-extension
• Discontinuity perpendicular to epipolar lines
• Discontinuity parallel to epipolar lines
(Figure: left and right images; disparity of the back surface.)
Over-extension and shrinkage
• It turns out that −w/2 ≤ λ ≤ w/6 for discontinuities perpendicular to epipolar lines,
• and −w/2 ≤ λ ≤ w/2 for discontinuities parallel to epipolar lines.
Random Dot Stereogram Experiments
Random Dot Stereogram Experiments
Offset Windows
Equivalent to using the minimum nearby cost.
Result: loss of depth accuracy.
Discontinuity Detection
• Use offset windows only where appropriate
  – Bi-modal distribution of SSD
  – Pixel of interest different from the mode within the window
Compact Windows
• [Veksler CVPR03]: Adapt window size based on:
  – Average matching error per pixel
  – Variance of matching error
  – Window size (to bias towards larger windows)
• Pick the window that minimizes the cost
Integral Image
Sum of shaded part
Compute an integral image for pixel dissimilarity at each possible disparity
With integral-image values A, B, C, D at the four corners of the window:
shaded area = A + D − B − C — independent of window size.
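The constant-time window sum can be sketched in a few lines (function names are illustrative):

```python
import numpy as np

def integral_image(img):
    """Cumulative sums over rows then columns: each entry holds the sum
    of all pixels above and to the left (inclusive)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def window_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] from four integral-image lookups
    (A + D - B - C), independent of the window size."""
    s = ii[y1, x1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s
```

Computing one integral image of pixel dissimilarities per candidate disparity makes any aggregation window O(1) per pixel.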
Results using Compact Windows
Rod-shaped filters
• Instead of square windows aggregate cost in rod-shaped shiftable windows [Kim CVPR05]
• Search for one that minimizes the cost (assume that it is an iso-disparity curve)
• Typically use 36 orientations
Locally Adaptive Support
Apply weights to the contributions of neighboring pixels according to similarity and proximity [Yoon CVPR05]
Locally Adaptive Support
• Similarity: color distance Δc in CIE Lab color space
• Proximity: Euclidean image distance Δg
• Weights: w(p,q) = exp(−(Δc/γc + Δg/γp))
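A sketch of the adaptive support weights for one window — the γ values and the function name are illustrative, not prescribed by the slides:

```python
import numpy as np

def support_weights(patch_lab, gamma_c=7.0, gamma_p=36.0):
    """Weight of every neighbor q w.r.t. the window center p:
    w(p, q) = exp(-(dc/gamma_c + dg/gamma_p)), where dc is the color
    distance in CIE Lab and dg the Euclidean distance in the image.
    patch_lab: (win, win, 3) Lab patch centered on p."""
    h = patch_lab.shape[0] // 2
    center = patch_lab[h, h]
    dc = np.linalg.norm(patch_lab - center, axis=-1)  # color distance
    ys, xs = np.mgrid[-h:h + 1, -h:h + 1]
    dg = np.hypot(ys, xs)                             # spatial distance
    return np.exp(-(dc / gamma_c + dg / gamma_p))
```

Matching costs are then aggregated as a weighted sum over the window, so similar nearby pixels dominate.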
Locally Adaptive Support: Results
Locally Adaptive Support
Locally Adaptive Support: Results
Cool ideas
• Space-time stereo (varying illumination, not shape)
Challenges
• Ill-posed inverse problem
  – Recover 3-D structure from 2-D information
• Difficulties
  – Uniform regions
  – Half-occluded pixels
Occlusions
(Slide from Pascal Fua)
Exploiting scene constraints
The Ordering Constraint
In general the points are in the same order on both epipolar lines.
But it is not always the case.
Ordering constraint
(Figure: a surface slice viewed as a path; occlusion right, occlusion left.)
Uniqueness constraint
• In an image pair each pixel has at most one corresponding pixel
  – In general, one corresponding pixel
  – In case of occlusion, there is none
Disparity constraint
surface slice surface as a path
bounding box
use reconstructed features to determine bounding box
constant disparity surfaces
Dynamic Programming (Baker and Binford, 1981)
Find the minimum-cost path going monotonically down and right from the top-left corner of the graph to its bottom-right corner.
• Nodes = matched feature points (e.g., edge points).• Arcs = matched intervals along the epipolar lines.• Arc cost = discrepancy between intervals.
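A minimal scanline-DP sketch in the same spirit (working directly on a grid of pixel dissimilarities with a fixed occlusion penalty, rather than on matched feature intervals; names and the penalty are illustrative):

```python
import numpy as np

def dp_scanline(costs, occlusion=1.0):
    """Minimum-cost monotone path through the disparity-space image of
    one scanline.  costs[i, j] = dissimilarity of left pixel i and right
    pixel j; moves are a match (diagonal) or an occlusion in either
    image (down/right, each paying the occlusion penalty)."""
    n, m = costs.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = occlusion * np.arange(m + 1)   # leading right occlusions
    D[:, 0] = occlusion * np.arange(n + 1)   # leading left occlusions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i-1, j-1] + costs[i-1, j-1],  # match
                          D[i-1, j] + occlusion,          # left occluded
                          D[i, j-1] + occlusion)          # right occluded
    return D[n, m]   # total cost; backtracking would give disparities
```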
Stereo matching
Optimal path(dynamic programming )
Similarity measure(SSD or NCC)
Constraints
• epipolar
• ordering
• uniqueness
• disparity limit
• disparity gradient limit
Trade-off
• Matching cost (data)
• Discontinuities (prior)
(Cox et al. CVGIP’96; Koch’96; Falkenhagen´97;Van Meerbergen,Vergauwen,Pollefeys,VanGool IJCV‘02)
Dynamic Programming (Ohta and Kanade, 1985)
Reprinted from “Stereo by Intra- and Inter-Scanline Search,” by Y. Ohta and T. Kanade, IEEE Trans. on Pattern Analysis and Machine Intelligence, 7(2):139-154 (1985). © 1985 IEEE.
Hierarchical stereo matching
Downsampling (Gaussian pyramid); disparity propagation.
Allows faster computation.
Deals with large disparity ranges.
(Falkenhagen´97;Van Meerbergen,Vergauwen,Pollefeys,VanGool IJCV‘02)
Disparity map
image I(x,y), image I′(x′,y′), disparity map D(x,y):
(x′, y′) = (x + D(x,y), y)
Example: reconstruct image from neighboring images
Semi-global optimization
• Optimize E = E_data + E(|Dp−Dq| = 1) + E(|Dp−Dq| > 1) [Hirschmüller CVPR05]
  – Use mutual information as the matching cost
• The full 2-D optimization is NP-hard (approximated with graph cuts or belief propagation)
• Instead, do dynamic programming along many directions
  – Don't use visibility or ordering constraints
  – Enforce uniqueness
  – Add up the costs from all directions
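The per-direction recursion can be sketched as follows — a sketch only; P1/P2 values and the function name are illustrative:

```python
import numpy as np

def sgm_aggregate_1d(cost_row, P1=1.0, P2=4.0):
    """Semi-global matching aggregation along one scanline direction:
    L(p,d) = C(p,d) + min(L(p-1,d),
                          L(p-1,d-1)+P1, L(p-1,d+1)+P1,
                          min_d' L(p-1,d') + P2) - min_d' L(p-1,d')
    cost_row: (num_pixels, num_disparities) matching costs."""
    n, D = cost_row.shape
    L = np.zeros_like(cost_row, dtype=float)
    L[0] = cost_row[0]
    for p in range(1, n):
        prev = L[p - 1]
        m = prev.min()                       # best previous cost overall
        for d in range(D):
            best = prev[d]                   # same disparity, no penalty
            if d > 0:
                best = min(best, prev[d - 1] + P1)   # small jump
            if d < D - 1:
                best = min(best, prev[d + 1] + P1)   # small jump
            best = min(best, m + P2)                 # large jump
            L[p, d] = cost_row[p, d] + best - m      # keep L bounded
    return L
```

The final cost per pixel/disparity is the sum of such aggregations over many directions (typically 8 or 16).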
Results of Semi-global optimization
Results of Semi-global optimization
No. 1 overall in the Middlebury evaluation (at 0.5 error threshold, as of Sep. 2006)
Energy minimization
(Slide from Pascal Fua)
Graph Cut
(Slide from Pascal Fua)
(general formulation requires multi-way cut!)
(Boykov et al ICCV‘99)
(Roy and Cox ICCV‘98)
Simplified graph cut
Belief Propagation
Each pixel's belief combines messages from its left, right, up, and down neighbors.
First iteration, then subsequent iterations (2, 3, 4, 5, … 20).
The belief of one node about another gets propagated through messages (the full pdf, not just the most likely state).
(adapted from J. Coughlan slides)
Rectification
All epipolar lines are parallel in the rectified image plane.
Image pair rectification
Simplify stereo matching by warping the images.
Apply a projective transformation so that epipolar lines correspond to horizontal scanlines:
H e = (1, 0, 0)ᵀ  (map the epipole e to infinity)
Try to minimize image distortion.
Problem when the epipole is in (or close to) the image.
Planar rectification
Bring the two views to the standard stereo setup (moves the epipole to ∞; not possible when the epipole is in or close to the image).
Rectified image size ~ original image size.
Calibrated: standard approach. Uncalibrated: distortion minimization.
Polar rectification (Pollefeys et al. ICCV’99)
• Polar re-parameterization around the epipoles
• Requires only (oriented) epipolar geometry
• Preserves the length of epipolar lines
• Choose Δθ so that no pixels are compressed
• Works for all relative motions
• Guarantees minimal image size
polar rectification: example
polar rectification: example
Example: Béguinage of Leuven
Does not work with standard Homography-based approaches
Example: Béguinage of Leuven
General iso-disparity surfaces (Pollefeys and Sinha, ECCV’04)
Example: polar rectification preserves disp.
Application: Active vision
Also interesting relation to human horopter
Reconstruction from Rectified Images
Disparity: d=u’-u. Depth: z = -B/d.
Three Views
The third eye can be used for verification.
width of a pixel
Choosing the stereo baseline
• What's the optimal baseline?
  – Too small: large depth error
  – Too large: difficult search problem, more occlusions
Large Baseline Small Baseline
all of these points project to the same pair of pixels
The Effect of Baseline on Depth Estimation
(Okutomi and Kanade PAMI’93)
I1 I2 I10
Reprinted from “A Multiple-Baseline Stereo System,” by M. Okutomi and T. Kanade, IEEE Trans. on Pattern Analysis and Machine Intelligence, 15(4):353-363 (1993). © 1993 IEEE.
Some Solutions
• Multi-view linking (Koch et al. ECCV98)
• Best of left or right (Kang et al. CVPR01)
• Best k out of n, …
Problem: visibility
• Multi-baseline, multi-resolution
• At each depth, baseline and resolution are selected proportional to that depth
• Allows keeping the depth accuracy constant!
Variable Baseline/Resolution Stereo(Gallup et al., CVPR08)
Variable Baseline/Resolution Stereo: comparison
Multi-view depth fusion
• Compute a depth for every pixel of the reference image
  – Triangulation
  – Use multiple views
  – Up- and down-sequence
  – Use a Kalman filter
(Koch, Pollefeys and Van Gool. ECCV‘98)
Allows computing robust texture
Real-time stereo on graphics hardware
• Computes Sum-of-Square-Differences
• Hardware mip-map generation used to aggregate results over the support region
• Trade-off between small and large support windows
Yang and Pollefeys CVPR03
140M disparity hypotheses/sec on a Radeon 9700 Pro, e.g. 512×512×20 disparities at 30 Hz
Shape of a kernel for summing up 6 levels
(1×1), (1×1+2×2), (1×1+2×2+4×4+8×8), (1×1+2×2+4×4+8×8+16×16)
Combine multiple aggregation windows using hardware mipmaps and multiple texture units in a single pass
Plane-sweep multi-view matching
• Simple algorithm for multiple cameras
• No rectification necessary
• Doesn't deal with occlusions
Collins’96; Roy and Cox’98 (GC); Yang et al.’02/’03 (GPU)
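The scoring step of plane sweeping can be sketched as follows. This is a sketch only: the per-plane warping (a plane-induced homography per view) is assumed to have happened already, and the function name and SAD score are illustrative:

```python
import numpy as np

def plane_sweep_scores(ref, warped_per_plane):
    """For each hypothesized plane, the other views have been projected
    onto that plane and resampled in the reference view.  Score each
    plane per pixel by mean absolute difference to the reference image;
    the best (minimum-cost) plane index per pixel encodes its depth.
    warped_per_plane: list over planes of lists of warped views."""
    costs = []
    for warped in warped_per_plane:
        sad = np.mean([np.abs(ref.astype(float) - w) for w in warped],
                      axis=0)
        costs.append(sad)
    return np.argmin(np.stack(costs), axis=0)  # best plane per pixel
```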
Fast GPU-based plane-sweeping stereo
Plane-sweep multi-view depth estimation(Yang & Pollefeys, CVPR’03)
Plane Sweep Stereo
Plane Sweep Matching Score
Images Projected on Plane
Plane Sweep Stereo Result
• Ideal for GPU processing
• 512×384 depthmap, 50 planes, 5 images: 50 Hz
Plane Sweep Stereo
Plane Sweep Matching Score (SAD)
Images Projected on Plane
Window-Based Matching - Local Stereo
• Pixel-to-pixel matching is ambiguous for local stereo
• Use a matching window (SAD/SSD/NCC)
• Assumes all pixels in window are on the plane
Multi-directional plane-sweeping stereo
• Select best-cost over depths AND orientation
3D model from 11 video frames (hand-held)
Automatic computation of dominant orientations
(Gallup, Frahm & Pollefeys, CVPR07)
Plane Priors and Quicksweep
without plane prior with plane prior
P(πᵢ | cᵢ) ∝ P(cᵢ | πᵢ) P(πᵢ)
Use SfM points as a prior for stereo
For real-time: test only the planes with high prior probability
Selecting the Best Depth / Surface Normal: Combining Multiple Sweeps
Sweep 1 Sweep 2 Sweep 3
How to combine sweeps?
Global labeling
Graph Cut:1-2 seconds
Continuous Formulation: 45 ms
Local best cost
OR
(Zach, Gallup, Frahm,Niethammer VMV 2008)
Results
1474-frame reconstruction, 11 images in stereo, 512×384 grayscale, 48 plane hypotheses (quick sweep), processing rate 20.0 Hz (Nvidia 8800 GTX)
Fast visibility-based fusion of depth-maps (Merrell, Akbarzadeh, Wang, Mordohai, Frahm, Yang, Nistér, Pollefeys ICCV07)
Visibility constraints.
Two fusion approaches: stability-based fusion and confidence-based fusion (O(n²) vs. O(n)).
Fast on GPU, as the key operation is rendering depth-maps into other views.
• Complements very fast multi-view stereo
Fast video-based modeling of cities
• Fast video processing pipeline
  – up to 26 Hz on a single CPU/GPU
  – most image processing on the GPU (×10-×100 faster)
  – exploits urban structure
  – generates a textured 3D mesh
(Pollefeys et al. IJCV, 2008)
Building surveyed to 6 mm accuracy.
(Error histogram.) RMS error: 13.4 cm; mean error: 6.8 cm; median error: 3.0 cm.
3D-from-video evaluation: Firestone building
3D-from-video evaluation: Middlebury Multi-View Stereo Evaluation Benchmark
Ring datasets: 47 images
Results competitive but much, much faster(30 minutes → 30 seconds)
1.3 million video frames(Chapel Hill, NC)
• 1.3 million frames (2 cameras per side)
• 26 Hz reconstruction frame rate
Computation time on 1 PC (3 GHz CPU + Nvidia 8800 GTX):
14 hrs @ 26 fps; 2 weeks @ 1 fps; 2.5 years @ 1 fpm
Another Scene
view 1 … view N
Depthmap
view 1 … reference view … view N
3D Model: Standard Stereo
How Accurate Is Stereo?
With b = baseline, f = focal length, d = disparity, z = depth:
    z = b·f / d
    Δz ≈ (b·f / d²) Δd = (z² / (b·f)) Δd
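The depth formula and its first-order error propagation can be written out directly (function name and units are illustrative):

```python
def depth_and_error(b, f, d, delta_d=1.0):
    """Triangulated depth z = b*f/d and its first-order uncertainty
    dz ~ (z**2 / (b*f)) * delta_d for a disparity error delta_d:
    depth error grows quadratically with depth."""
    z = b * f / d
    dz = (z * z) / (b * f) * delta_d
    return z, dz
```

For example, with b = 0.1, f = 1000 (pixels), and d = 10 pixels, the depth is 10 units and a one-pixel disparity error maps to a one-unit depth error.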
Variable Baseline/Resolution Stereo (Gallup et al., CVPR08)
• Multi-baseline, multi-resolution
• At each depth, baseline and resolution are selected proportional to that depth
• Allows keeping the depth accuracy constant!
• Analysis of matching scores at various baselines and scales
  – Location of the global minimum unchanged
  – Value of the global minimum preserved
Verification of consistent results
Variable Baseline/Resolution Stereo: comparison
Results
Depth Resolution Limits
Piecewise Planar Stereo
• Fit planes to the scene
• But also handle non-planar regions
• Use planar appearance
• Ensure consistency over video sequences
3D Reconstruction from Video
Street-Side Video Real-Time Stereo
Method (flowchart): Video Frame → Depthmap with RANSAC planes → Planar Class Probability Map → Graph-Cut Labeling → 3D Model
(Gallup et al. CVPR 2010)
Heightmap Model
• Enforces vertical facades
• One continuous surface, no holes
• 2.5D: fast to compute, easy to store
(Gallup et al. 3DPVT 2010)
Occupancy Voting
For a voxel at signed distance d along the ray from the camera through the measured depth:
    V = exp(−d/σ)   (vote near the measured depth)
    V = −λ          (voxels in front of the measured depth vote empty)
This is a log-odds likelihood.
n-layer heightmap fusion
• Generalize to an n-layer heightmap
• Each layer is a transition from full to empty or from empty to full
• Compute layer positions with dynamic programming
• Use model selection (BIC) to determine the number of layers
(Gallup et al. DAGM’10)
n-layer heightmap fusion
ICVSS 2010
Reconstruction from Photo Collections
• Millions of images downloaded from Flickr
• Camera poses computed using (Li et al. ECCV’08)
• Depthmaps from GPU plane-sweep
Results
More on stereo …
The Middlebury Stereo Vision Research Page: http://cat.middlebury.edu/stereo/
Recommended reading
D. Scharstein and R. Szeliski. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. IJCV 47(1/2/3):7-42, April-June 2002. Also Microsoft Research Technical Report MSR-TR-2001-81, November 2001.
Next week: 2D Shape Matching