Stereo matching
Reading: Chapter 11 In Szeliski’s book
Schedule (tentative)
#   date    topic
1   Sep.22  Introduction and geometry
2   Sep.29  Invariant features
3   Oct.6   Camera models and calibration
4   Oct.13  Multiple-view geometry
5   Oct.20  Model fitting (RANSAC, EM, …)
6   Oct.27  Stereo Matching
7   Nov.3   2D Shape Matching
8   Nov.10  Segmentation
9   Nov.17  Shape from Silhouettes
10  Nov.24  Structure from motion
11  Dec.1   Optical Flow
12  Dec.8   Tracking (Kalman, particle filter)
13  Dec.15  Object category recognition
14  Dec.22  Specific object recognition
An Application: Mobile Robot Navigation
The Stanford Cart, H. Moravec, 1979.
The INRIA Mobile Robot, 1990.
Courtesy O. Faugeras and H. Moravec.
Stereo
NASA Mars Rover
Standard stereo geometry
pure translation along X-axis
PointGrey Bumblebee
Standard stereo geometry
Stereo: basic idea
error
depth
Stereo matching
• Search is limited to the epipolar line (1D)
• Look for the most similar pixel
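The basic idea above can be sketched as a minimal block matcher: slide a window along the same scanline of the other image and keep the most similar position. A sketch only — the function name, window size, and SSD cost are illustrative, not from the slides:

```python
import numpy as np

def match_pixel_ssd(left, right, y, x, max_disp, win=5):
    """Disparity of left-image pixel (y, x): slide a win x win window
    along the same scanline (epipolar line) of the right image and
    keep the minimum-SSD (most similar) position."""
    h = win // 2
    patch = left[y-h:y+h+1, x-h:x+h+1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        if x - d - h < 0:          # window would leave the image
            break
        cand = right[y-h:y+h+1, x-d-h:x-d+h+1].astype(float)
        cost = np.sum((patch - cand) ** 2)   # SSD dissimilarity
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```

For a right image that is the left image shifted by 3 pixels, the routine recovers disparity 3.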
Stereopsis
Figure from US Navy Manual of Basic Optics and Optical Instruments, prepared by Bureau of Naval Personnel. Reprinted by Dover Publications, Inc., 1969.
Human Stereopsis: Reconstruction
Disparity: d = r − l = D − F.
The set of points with d = 0 forms, in 3D, the horopter.
Human Stereopsis: experimental horopter…
Iso-disparity curves: planar retinas
(Figure: cameras C1 and C2 and the iso-disparity curves through scene points X0, X1, …, Xi, Xj, …, X∞; derivation omitted.)
The retina acts as if it were flat!
Human Stereopsis: Reconstruction
What if F is not known?
Helmholtz (1909):
• There is evidence showing the vergence angles cannot be measured precisely.
• Humans get fooled by bas-relief sculptures.
• There is an analytical explanation for this.
• Relative depth can be judged accurately.
Human Stereopsis: Binocular Fusion
How are the correspondences established?
Julesz (1971): Is the mechanism for binocular fusion a monocular or a binocular process?
• There is anecdotal evidence for the latter (camouflage).
• Random dot stereograms provide an objective answer: it is a binocular process!
A Cooperative Model (Marr and Poggio, 1976)
Excitatory connections: continuity
Inhibitory connections: uniqueness
Iterate: C ← Σ_e C − w Σ_i C + C_0 (sum over excitatory neighbors, minus weighted sum over inhibitory neighbors, plus the initial state).
Reprinted from Vision: A Computational Investigation into the Human Representation and Processing of Visual Information by David Marr. © 1982 by David Marr. Reprinted by permission of Henry Holt and Company, LLC.
Correlation Methods (1970--)
Minimize |w − w′|².
Slide the window along the epipolar line until w · w′ is maximized.
Normalized Correlation: minimize the angle θ between w and w′ instead.
Correlation Methods: Foreshortening Problems
Solution: add a second pass using disparity estimates to warp the correlation windows, e.g. Devernay and Faugeras (1994).
Reprinted from “Computing Differential Properties of 3D Shapes from Stereopsis without 3D Models,” by F. Devernay and O. Faugeras, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (1994). 1994 IEEE.
Multi-Scale Edge Matching (Marr, Poggio and Grimson, 1979-81)
• Edges are found by repeatedly smoothing the image and detecting the zero crossings of the second derivative (Laplacian).
• Matches at coarse scales are used to offset the search for matches at fine scales (equivalent to eye movements).
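The zero-crossing step can be sketched as follows — a sketch only, operating on an already-smoothed Laplacian response (the smoothing itself and the function name are illustrative):

```python
import numpy as np

def zero_crossings(lap):
    """Mark pixels where the (smoothed) Laplacian response changes sign
    with a horizontal or vertical neighbor — the edge locations that are
    matched at each scale in the Marr-Poggio-Grimson scheme."""
    zc = np.zeros(lap.shape, dtype=bool)
    # sign change between horizontal neighbors
    zc[:, :-1] |= np.signbit(lap[:, :-1]) != np.signbit(lap[:, 1:])
    # sign change between vertical neighbors
    zc[:-1, :] |= np.signbit(lap[:-1, :]) != np.signbit(lap[1:, :])
    return zc
```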
Multi-Scale Edge Matching (Marr, Poggio and Grimson, 1979-81)
One of the two input images
Image Laplacian
Zeros of the Laplacian
Reprinted from Vision: A Computational Investigation into the Human Representation and Processing of Visual Information by David Marr. 1982 by David Marr. Reprinted by permission of Henry Holt and Company, LLC.
Multi-Scale Edge Matching (Marr, Poggio and Grimson, 1979-81)
Reprinted from Vision: A Computational Investigation into the Human Representation and Processing of Visual Information by David Marr. 1982 by David Marr. Reprinted by permission of Henry Holt and Company, LLC.
Pixel Dissimilarity
• Absolute difference of intensities: c = |I1(x,y) − I2(x−d,y)|
• Interval matching [Birchfield 98]
  – Considers sensor integration
  – Represents pixels as intervals
Alternative Dissimilarity Measures
• Rank and Census transforms [Zabih ECCV94]
• Rank transform:
  – Define a window containing R pixels around each pixel
  – Count the number of pixels with lower intensities than the center pixel in the window
  – Replace the intensity with the rank (0..R−1)
  – Compute SAD on the rank-transformed images
• Census transform:
  – Use a bit string, defined by the neighbors, instead of a scalar rank
• Robust against illumination changes
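The two transforms above can be sketched directly from their definitions (window size and function names are illustrative):

```python
import numpy as np

def rank_transform(img, win=3):
    """Rank transform [Zabih ECCV94]: replace each pixel by the number
    of pixels in its win x win window that are darker than it (0..R-1)."""
    h = win // 2
    img = img.astype(int)
    out = np.zeros(img.shape, dtype=int)
    H, W = img.shape
    for y in range(h, H - h):
        for x in range(h, W - h):
            out[y, x] = np.sum(img[y-h:y+h+1, x-h:x+h+1] < img[y, x])
    return out

def census_transform(img, win=3):
    """Census transform: encode each pixel as a bit string with one bit
    per neighbor, set when that neighbor is darker than the center."""
    h = win // 2
    img = img.astype(int)
    H, W = img.shape
    out = np.zeros((H, W), dtype=int)
    for y in range(h, H - h):
        for x in range(h, W - h):
            bits = 0
            for dy in range(-h, h + 1):
                for dx in range(-h, h + 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the center pixel itself
                    bits = (bits << 1) | int(img[y+dy, x+dx] < img[y, x])
            out[y, x] = bits
    return out
```

Both depend only on intensity orderings, which is why they are robust to gain and bias changes.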
Rank and Census Transform Results
• Noise-free random dot stereograms
• Different gain and bias
Systematic Errors of Area-based Stereo
• Ambiguous matches in textureless regions
• Surface over-extension [Okutomi IJCV02]
Surface Over-extension
• Expected value E[(x−y)²], for x in the left and y in the right image, is:
  – Case A: σF² + σB² + (μF − μB)² for w/2 − λ pixels in each row
  – Case B: 2σB² for w/2 + λ pixels in each row
(Figure: left and right images; disparity of the back surface.)
Surface Over-extension
• Discontinuity perpendicular to epipolar lines
• Discontinuity parallel to epipolar lines
(Figure: left and right images; disparity of the back surface.)
Over-extension and shrinkage
• It turns out that −w/2 ≤ λ ≤ w/6 for discontinuities perpendicular to epipolar lines,
• and −w/2 ≤ λ ≤ w/2 for discontinuities parallel to epipolar lines.
Random Dot Stereogram Experiments
Random Dot Stereogram Experiments
Offset Windows
Equivalent to using the minimum nearby cost.
Result: loss of depth accuracy.
Discontinuity Detection
• Use offset windows only where appropriate
  – Bi-modal distribution of SSD
  – Pixel of interest different from the mode within the window
Compact Windows
• [Veksler CVPR03]: Adapt window size based on:
  – Average matching error per pixel
  – Variance of matching error
  – Window size (to bias towards larger windows)
• Pick the window that minimizes the cost
Integral Image
Sum of shaded part
Compute an integral image for pixel dissimilarity at each possible disparity
With integral-image values A, B, C, D at the four corners of the window:
shaded area = A + D − B − C — independent of window size.
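The constant-time window sum can be sketched in a few lines (function names are illustrative):

```python
import numpy as np

def integral_image(img):
    """Cumulative sums over rows then columns: each entry holds the sum
    of all pixels above and to the left (inclusive)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def window_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] from four integral-image lookups
    (A + D - B - C), independent of the window size."""
    s = ii[y1, x1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s
```

Computing one integral image of pixel dissimilarities per candidate disparity makes any aggregation window O(1) per pixel.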
Results using Compact Windows
Rod-shaped filters
• Instead of square windows aggregate cost in rod-shaped shiftable windows [Kim CVPR05]
• Search for one that minimizes the cost (assume that it is an iso-disparity curve)
• Typically use 36 orientations
Locally Adaptive Support
Apply weights to the contributions of neighboring pixels according to similarity and proximity [Yoon CVPR05]
Locally Adaptive Support
• Similarity: color distance Δc in CIE Lab color space
• Proximity: Euclidean image distance Δg
• Weights: w(p,q) = exp(−(Δc/γc + Δg/γp))
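A sketch of the adaptive support weights for one window — the γ values and the function name are illustrative, not prescribed by the slides:

```python
import numpy as np

def support_weights(patch_lab, gamma_c=7.0, gamma_p=36.0):
    """Weight of every neighbor q w.r.t. the window center p:
    w(p, q) = exp(-(dc/gamma_c + dg/gamma_p)), where dc is the color
    distance in CIE Lab and dg the Euclidean distance in the image.
    patch_lab: (win, win, 3) Lab patch centered on p."""
    h = patch_lab.shape[0] // 2
    center = patch_lab[h, h]
    dc = np.linalg.norm(patch_lab - center, axis=-1)  # color distance
    ys, xs = np.mgrid[-h:h + 1, -h:h + 1]
    dg = np.hypot(ys, xs)                             # spatial distance
    return np.exp(-(dc / gamma_c + dg / gamma_p))
```

Matching costs are then aggregated as a weighted sum over the window, so similar nearby pixels dominate.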
Locally Adaptive Support: Results
Locally Adaptive Support
Locally Adaptive Support: Results
Cool ideas
• Space-time stereo (varying illumination, not shape)
Challenges
• Ill-posed inverse problem
  – Recover 3-D structure from 2-D information
• Difficulties
  – Uniform regions
  – Half-occluded pixels
Occlusions
(Slide from Pascal Fua)
Exploiting scene constraints
The Ordering Constraint
In general the points are in the same order on both epipolar lines.
But it is not always the case.
Ordering constraint
(Figure: a surface slice viewed as a path; occlusion right, occlusion left.)
Uniqueness constraint
• In an image pair each pixel has at most one corresponding pixel
  – In general, one corresponding pixel
  – In case of occlusion, there is none
Disparity constraint
surface slice surface as a path
bounding box
use reconstructed features to determine bounding box
constant disparity surfaces
Dynamic Programming (Baker and Binford, 1981)
Find the minimum-cost path going monotonically down and right from the top-left corner of the graph to its bottom-right corner.
• Nodes = matched feature points (e.g., edge points).• Arcs = matched intervals along the epipolar lines.• Arc cost = discrepancy between intervals.
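A minimal scanline-DP sketch in the same spirit (working directly on a grid of pixel dissimilarities with a fixed occlusion penalty, rather than on matched feature intervals; names and the penalty are illustrative):

```python
import numpy as np

def dp_scanline(costs, occlusion=1.0):
    """Minimum-cost monotone path through the disparity-space image of
    one scanline.  costs[i, j] = dissimilarity of left pixel i and right
    pixel j; moves are a match (diagonal) or an occlusion in either
    image (down/right, each paying the occlusion penalty)."""
    n, m = costs.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = occlusion * np.arange(m + 1)   # leading right occlusions
    D[:, 0] = occlusion * np.arange(n + 1)   # leading left occlusions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i-1, j-1] + costs[i-1, j-1],  # match
                          D[i-1, j] + occlusion,          # left occluded
                          D[i, j-1] + occlusion)          # right occluded
    return D[n, m]   # total cost; backtracking would give disparities
```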
Stereo matching
Optimal path(dynamic programming )
Similarity measure(SSD or NCC)
Constraints
• epipolar
• ordering
• uniqueness
• disparity limit
• disparity gradient limit
Trade-off
• Matching cost (data)
• Discontinuities (prior)
(Cox et al. CVGIP’96; Koch’96; Falkenhagen´97;Van Meerbergen,Vergauwen,Pollefeys,VanGool IJCV‘02)
Dynamic Programming (Ohta and Kanade, 1985)
Reprinted from “Stereo by Intra- and Inter-Scanline Search,” by Y. Ohta and T. Kanade, IEEE Trans. on Pattern Analysis and Machine Intelligence, 7(2):139-154 (1985). © 1985 IEEE.
Hierarchical stereo matching
Downsampling (Gaussian pyramid); disparity propagation.
Allows faster computation.
Deals with large disparity ranges.
(Falkenhagen´97;Van Meerbergen,Vergauwen,Pollefeys,VanGool IJCV‘02)
Disparity map
image I(x,y), image I′(x′,y′), disparity map D(x,y):
(x′, y′) = (x + D(x,y), y)
Example: reconstruct image from neighboring images
Semi-global optimization
• Optimize E = E_data + E(|Dp−Dq| = 1) + E(|Dp−Dq| > 1) [Hirschmüller CVPR05]
  – Use mutual information as the matching cost
• The full 2-D optimization is NP-hard (approximated with graph cuts or belief propagation)
• Instead, do dynamic programming along many directions
  – Don't use visibility or ordering constraints
  – Enforce uniqueness
  – Add up the costs from all directions
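The per-direction recursion can be sketched as follows — a sketch only; P1/P2 values and the function name are illustrative:

```python
import numpy as np

def sgm_aggregate_1d(cost_row, P1=1.0, P2=4.0):
    """Semi-global matching aggregation along one scanline direction:
    L(p,d) = C(p,d) + min(L(p-1,d),
                          L(p-1,d-1)+P1, L(p-1,d+1)+P1,
                          min_d' L(p-1,d') + P2) - min_d' L(p-1,d')
    cost_row: (num_pixels, num_disparities) matching costs."""
    n, D = cost_row.shape
    L = np.zeros_like(cost_row, dtype=float)
    L[0] = cost_row[0]
    for p in range(1, n):
        prev = L[p - 1]
        m = prev.min()                       # best previous cost overall
        for d in range(D):
            best = prev[d]                   # same disparity, no penalty
            if d > 0:
                best = min(best, prev[d - 1] + P1)   # small jump
            if d < D - 1:
                best = min(best, prev[d + 1] + P1)   # small jump
            best = min(best, m + P2)                 # large jump
            L[p, d] = cost_row[p, d] + best - m      # keep L bounded
    return L
```

The final cost per pixel/disparity is the sum of such aggregations over many directions (typically 8 or 16).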
Results of Semi-global optimization
Results of Semi-global optimization
No. 1 overall in the Middlebury evaluation (at 0.5 error threshold, as of Sep. 2006)
Energy minimization
(Slide from Pascal Fua)
Graph Cut
(Slide from Pascal Fua)
(general formulation requires multi-way cut!)
(Boykov et al ICCV‘99)
(Roy and Cox ICCV‘98)
Simplified graph cut
Belief Propagation
Each pixel's belief combines messages from its left, right, up, and down neighbors.
First iteration, then subsequent iterations (2, 3, 4, 5, … 20).
The belief of one node about another gets propagated through messages (the full pdf, not just the most likely state).
(adapted from J. Coughlan slides)
Rectification
All epipolar lines are parallel in the rectified image plane.
Image pair rectification
Simplify stereo matching by warping the images.
Apply a projective transformation so that epipolar lines correspond to horizontal scanlines:
H e = (1, 0, 0)ᵀ  (map the epipole e to infinity)
Try to minimize image distortion.
Problem when the epipole is in (or close to) the image.
Planar rectification
Bring the two views to the standard stereo setup (moves the epipole to ∞; not possible when the epipole is in or close to the image).
Rectified image size ~ original image size.
Calibrated: standard approach. Uncalibrated: distortion minimization.
Polar rectification (Pollefeys et al. ICCV’99)
• Polar re-parameterization around the epipoles
• Requires only (oriented) epipolar geometry
• Preserves the length of epipolar lines
• Choose Δθ so that no pixels are compressed
• Works for all relative motions
• Guarantees minimal image size
polar rectification: example
polar rectification: example
Example: Béguinage of Leuven
Does not work with standard Homography-based approaches
Example: Béguinage of Leuven
General iso-disparity surfaces (Pollefeys and Sinha, ECCV’04)
Example: polar rectification preserves disp.
Application: Active vision
Also interesting relation to human horopter
Reconstruction from Rectified Images
Disparity: d=u’-u. Depth: z = -B/d.
Three Views
The third eye can be used for verification.
width of a pixel
Choosing the stereo baseline
• What's the optimal baseline?
  – Too small: large depth error
  – Too large: difficult search problem, more occlusions
Large Baseline Small Baseline
all of these points project to the same pair of pixels
The Effect of Baseline on Depth Estimation
(Okutomi and Kanade PAMI’93)
I1 I2 I10
Reprinted from “A Multiple-Baseline Stereo System,” by M. Okutomi and T. Kanade, IEEE Trans. on Pattern Analysis and Machine Intelligence, 15(4):353-363 (1993). © 1993 IEEE.
Some Solutions
• Multi-view linking (Koch et al. ECCV98)
• Best of left or right (Kang et al. CVPR01)
• Best k out of n, …
Problem: visibility
• Multi-baseline, multi-resolution
• At each depth, baseline and resolution are selected proportional to that depth
• Allows keeping the depth accuracy constant!
Variable Baseline/Resolution Stereo(Gallup et al., CVPR08)
Variable Baseline/Resolution Stereo: comparison
Multi-view depth fusion
• Compute a depth for every pixel of the reference image
  – Triangulation
  – Use multiple views
  – Up- and down-sequence
  – Use a Kalman filter
(Koch, Pollefeys and Van Gool. ECCV‘98)
Allows computing robust texture
Real-time stereo on graphics hardware
• Computes Sum-of-Square-Differences
• Hardware mip-map generation used to aggregate results over the support region
• Trade-off between small and large support windows
Yang and Pollefeys CVPR03
140M disparity hypotheses/sec on a Radeon 9700 Pro, e.g. 512×512×20 disparities at 30 Hz
Shape of a kernel for summing up 6 levels
(1×1), (1×1+2×2), (1×1+2×2+4×4+8×8), (1×1+2×2+4×4+8×8+16×16)
Combine multiple aggregation windows using hardware mipmaps and multiple texture units in a single pass
Plane-sweep multi-view matching
• Simple algorithm for multiple cameras
• No rectification necessary
• Doesn't deal with occlusions
Collins’96; Roy and Cox’98 (GC); Yang et al.’02/’03 (GPU)
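The scoring step of plane sweeping can be sketched as follows. This is a sketch only: the per-plane warping (a plane-induced homography per view) is assumed to have happened already, and the function name and SAD score are illustrative:

```python
import numpy as np

def plane_sweep_scores(ref, warped_per_plane):
    """For each hypothesized plane, the other views have been projected
    onto that plane and resampled in the reference view.  Score each
    plane per pixel by mean absolute difference to the reference image;
    the best (minimum-cost) plane index per pixel encodes its depth.
    warped_per_plane: list over planes of lists of warped views."""
    costs = []
    for warped in warped_per_plane:
        sad = np.mean([np.abs(ref.astype(float) - w) for w in warped],
                      axis=0)
        costs.append(sad)
    return np.argmin(np.stack(costs), axis=0)  # best plane per pixel
```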
Fast GPU-based plane-sweeping stereo
Plane-sweep multi-view depth estimation(Yang & Pollefeys, CVPR’03)
Plane Sweep Stereo
Plane Sweep Matching Score
Images Projected on Plane
Plane Sweep Stereo Result
• Ideal for GPU processing
• 512×384 depthmap, 50 planes, 5 images: 50 Hz
Plane Sweep Stereo
Plane Sweep Matching Score (SAD)
Images Projected on Plane
Window-Based Matching - Local Stereo
• Pixel-to-pixel matching is ambiguous for local stereo
• Use a matching window (SAD/SSD/NCC)
• Assumes all pixels in window are on the plane
Multi-directional plane-sweeping stereo
• Select best-cost over depths AND orientation
3D model from 11 video frames (hand-held)
Automatic computation of dominant orientations
(Gallup, Frahm & Pollefeys, CVPR07)
Plane Priors and Quicksweep
without plane prior with plane prior
P(πᵢ | cᵢ) ∝ P(cᵢ | πᵢ) P(πᵢ)
Use SfM points as a prior for stereo
For real-time: test only the planes with high prior probability
Selecting the Best Depth / Surface Normal: Combining Multiple Sweeps
Sweep 1 Sweep 2 Sweep 3
How to combine sweeps?
Global labeling
Graph Cut:1-2 seconds
Continuous Formulation: 45 ms
Local best cost
OR
(Zach, Gallup, Frahm,Niethammer VMV 2008)
Results
1474-frame reconstruction, 11 images in stereo, 512×384 grayscale, 48 plane hypotheses (quick sweep), processing rate 20.0 Hz (Nvidia 8800 GTX)
Fast visibility-based fusion of depth-maps (Merrell, Akbarzadeh, Wang, Mordohai, Frahm, Yang, Nistér, Pollefeys ICCV07)
Visibility constraints.
Two fusion approaches: stability-based fusion and confidence-based fusion (O(n²) vs. O(n)).
Fast on GPU, as the key operation is rendering depth-maps into other views.
• Complements very fast multi-view stereo
Fast video-based modeling of cities
• Fast video processing pipeline
  – up to 26 Hz on a single CPU/GPU
  – most image processing on the GPU (×10-×100 faster)
  – exploits urban structure
  – generates a textured 3D mesh
(Pollefeys et al. IJCV, 2008)
Building surveyed to 6 mm accuracy.
(Error histogram.) RMS error: 13.4 cm; mean error: 6.8 cm; median error: 3.0 cm.
3D-from-video evaluation: Firestone building
3D-from-video evaluation: Middlebury Multi-View Stereo Evaluation Benchmark
Ring datasets: 47 images
Results competitive but much, much faster(30 minutes → 30 seconds)
1.3 million video frames(Chapel Hill, NC)
• 1.3 million frames (2 cameras per side)
• 26 Hz reconstruction frame rate
Computation time on 1 PC (3 GHz CPU + Nvidia 8800 GTX):
14 hrs @ 26 fps; 2 weeks @ 1 fps; 2.5 years @ 1 fpm
Another Scene
view 1 … view N
Depthmap
view 1 … reference view … view N
3D Model: Standard Stereo
How Accurate Is Stereo?
With b = baseline, f = focal length, d = disparity, z = depth:
    z = b·f / d
    Δz ≈ (b·f / d²) Δd = (z² / (b·f)) Δd
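The depth formula and its first-order error propagation can be written out directly (function name and units are illustrative):

```python
def depth_and_error(b, f, d, delta_d=1.0):
    """Triangulated depth z = b*f/d and its first-order uncertainty
    dz ~ (z**2 / (b*f)) * delta_d for a disparity error delta_d:
    depth error grows quadratically with depth."""
    z = b * f / d
    dz = (z * z) / (b * f) * delta_d
    return z, dz
```

For example, with b = 0.1, f = 1000 (pixels), and d = 10 pixels, the depth is 10 units and a one-pixel disparity error maps to a one-unit depth error.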
Variable Baseline/Resolution Stereo (Gallup et al., CVPR08)
• Multi-baseline, multi-resolution
• At each depth, baseline and resolution are selected proportional to that depth
• Allows keeping the depth accuracy constant!
• Analysis of matching scores at various baselines and scales
  – Location of the global minimum unchanged
  – Value of the global minimum preserved
Verification of consistent results
Variable Baseline/Resolution Stereo: comparison
Results
Depth Resolution Limits
Piecewise Planar Stereo
• Fit planes to the scene
• But also handle non-planar regions
• Use planar appearance
• Ensure consistency over video sequences
3D Reconstruction from Video
Street-Side Video Real-Time Stereo
Method (flowchart): Video Frame → Depthmap with RANSAC planes → Planar Class Probability Map → Graph-Cut Labeling → 3D Model
(Gallup et al. CVPR 2010)
Heightmap Model
• Enforces vertical facades
• One continuous surface, no holes
• 2.5D: fast to compute, easy to store
(Gallup et al. 3DPVT 2010)
Occupancy Voting
For a voxel at signed distance d along the ray from the camera through the measured depth:
    V = exp(−d/σ)   (vote near the measured depth)
    V = −λ          (voxels in front of the measured depth vote empty)
This is a log-odds likelihood.
n-layer heightmap fusion
• Generalize to an n-layer heightmap
• Each layer is a transition from full to empty or from empty to full
• Compute layer positions with dynamic programming
• Use model selection (BIC) to determine the number of layers
(Gallup et al. DAGM’10)
n-layer heightmap fusion
ICVSS 2010
Reconstruction from Photo Collections
• Millions of images downloaded from Flickr
• Camera poses computed using (Li et al. ECCV’08)
• Depthmaps from GPU plane-sweep
Results
More on stereo …
The Middlebury Stereo Vision Research Page: http://cat.middlebury.edu/stereo/
Recommended reading
D. Scharstein and R. Szeliski. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. IJCV 47(1/2/3):7-42, April-June 2002. Also Microsoft Research Technical Report MSR-TR-2001-81, November 2001.
Next week: 2D Shape Matching