1
A Global Linear Method for Camera Pose Registration
Nianjuan Jiang*¹, Zhaopeng Cui*², Ping Tan²
¹ Advanced Digital Sciences Center, Singapore
² National University of Singapore
*Joint first authors
2
Structure from Motion (SfM)
Simultaneously recover both 3D scene points and camera poses
3
SfM Pipeline
Step 1. Epipolar geometry;
compute relative motion between 2 or 3 cameras
• 6-point method [Quan 1995]
• 7-point method [Torr & Murray 1997]
• 8-point method (normalized) [Hartley 1997]
• 5-point method [Nister 2004]
Images with matched feature points
4
SfM Pipeline
Step 1. Epipolar geometry;
Step 2. Camera registration;
put all cameras in the same coordinate system (auto-calibration if needed [Pollefeys et al. 1998])
• [Fitzgibbon & Zisserman 1998]
• [Pollefeys et al. 2004]
5
SfM Pipeline
Step 1. Epipolar geometry;
Step 2. Camera registration;
Step 3. Bundle adjustment.
optimize all cameras and points
• [Triggs et al. 1999]
6
“The Black Art”
Step 1. Epipolar geometry;
Step 2. Camera registration;
Step 3. Bundle adjustment.
The state of the art:
1. Steps 1 and 3 are well studied, with elegant theories and algorithms.
2. Step 2 is often ad hoc and heuristic.
The camera registration to initialize bundle adjustment “… is still to some extent a black art…”.
Page 452, Chapter 18.6
8
Typical Solutions
Hierarchical solution: iteratively merge sub-sequences
• [Fitzgibbon & Zisserman 1998]
• [Lhuillier & Quan 2005]
Incremental solution: iteratively add cameras one by one
• [Pollefeys et al. 2004]
• [Snavely et al. 2006]
9
Pain of Existing Solutions
The block diagram (for the incremental solution):
Initial Reconstruction (2 cameras) → Add Cameras → Bundle Adjustment → More Cameras?
Drawbacks:
1. Bundle adjustment is called repeatedly → inefficiency: 90% of the total computation time is spent on bundle adjustment.
2. Some cameras are fixed before the others → the asymmetric formulation leads to inferior results.
Our objective:
Simultaneously register all cameras to initialize the bundle adjustment.
Step 1: Epipolar Geometry → Register All Cameras in a Single Step → Step 3: Bundle Adjustment
10
Previous Works
[Govindu 2001]
[Martinec et al. 2007] [Arie-Nachimson et al. 2012] [Kahl 2005]
linear global solution to rotations
[Hartley et al. 2013]
elegant quasi-convex optimization
linear global solution to translations
[Crandall et al. 2011]
discrete-continuous optimization
Limitations among these methods:
• cannot solve translations
• sensitive to outliers
• require coplanar cameras
• degenerate at collinear motion
Desirable features:
1. Solve both rotations & translations;
2. Linear & robust solution;
3. No degeneracy.
11
The Input: Epipolar Geometry
The essential matrix $E_{ij}$ encodes the relative motion $R_{ij}$ and $t_{ij}$:
$E_{ij} = [t_{ij}]_\times R_{ij}$
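As a small illustration of the relation E_ij = [t_ij]× R_ij, the sketch below composes an essential matrix from a relative rotation and translation. The helper names `skew` and `essential_matrix` and the numeric values are assumptions made for this example; they do not come from the slides.

```python
import numpy as np

def skew(t):
    """Return the cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_matrix(R_ij, t_ij):
    """Compose E_ij = [t_ij]_x R_ij from a relative rotation and translation."""
    return skew(t_ij) @ R_ij

# Illustrative values (assumed): a 10-degree rotation about the y axis
# and a unit translation along the x axis.
a = np.deg2rad(10.0)
R_ij = np.array([[np.cos(a), 0.0, np.sin(a)],
                 [0.0, 1.0, 0.0],
                 [-np.sin(a), 0.0, np.cos(a)]])
t_ij = np.array([1.0, 0.0, 0.0])
E_ij = essential_matrix(R_ij, t_ij)
```

Because t_ij is in the left null space of [t_ij]×, the resulting matrix has rank 2, which is a quick sanity check on any essential matrix built this way.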
12
Rotation Registration
A linear equation from every two cameras [Martinec et al. 2007]:
$R_j = R_{ij} R_i$, with $R_i = [r_1^i, r_2^i, r_3^i]$
{cam 1, cam 2}: $R_2 = R_{12} R_1$
{cam 2, cam 3}: $R_3 = R_{23} R_2$
……
{cam m, cam n}: $R_n = R_{mn} R_m$
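The pairwise equations R_j = R_ij R_i can be stacked into one homogeneous linear system. The sketch below is one possible way to solve it, not necessarily the procedure of the cited work: it takes the null space by SVD, fixes the gauge so that the first camera has identity rotation, and projects each 3×3 block back onto a rotation. The function names and the three-camera test data are assumptions for illustration.

```python
import numpy as np

def nearest_rotation(M):
    """Project a 3x3 matrix onto the closest rotation (Frobenius norm)."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt

def register_rotations(n, relative):
    """relative maps (i, j) -> R_ij with R_j = R_ij @ R_i.
    Returns n global rotations, with camera 0 fixed to the identity."""
    A = np.zeros((3 * len(relative), 3 * n))
    for row, ((i, j), R_ij) in enumerate(relative.items()):
        # One block row per camera pair: R_j - R_ij R_i = 0.
        A[3 * row:3 * row + 3, 3 * i:3 * i + 3] = -R_ij
        A[3 * row:3 * row + 3, 3 * j:3 * j + 3] = np.eye(3)
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-3:].T                      # 3n x 3 basis of the null space
    X0_inv = np.linalg.inv(X[0:3])     # gauge: make camera 0 the identity
    return [nearest_rotation(X[3 * i:3 * i + 3] @ X0_inv) for i in range(n)]

def rot_x(deg):
    a = np.deg2rad(deg)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a), np.cos(a)]])

def rot_z(deg):
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

# Assumed noise-free test data: three cameras and all three pairs.
truth = [np.eye(3), rot_x(20.0) @ rot_z(30.0), rot_z(-45.0) @ rot_x(10.0)]
relative = {(i, j): truth[j] @ truth[i].T for (i, j) in [(0, 1), (1, 2), (0, 2)]}
estimate = register_rotations(3, relative)
```

With noisy relative rotations the null space is only approximate, which is why each block is projected onto a rotation at the end.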
13
Translation Registration (3 cameras)
Input:
Relative translations: $c_{ij}$, $c_{ik}$, $c_{jk}$
Output:
Camera positions: $c_i$, $c_j$, $c_k$
14
Translation Registration (3 cameras)
Suppose $c_i$, $c_j$ are known; $c_k$ can be computed by a linear equation:
$c_k - c_i = R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i)$
1. $R_i(\theta_i)$: rotate $c_j - c_i$ to match the orientation of $c_k - c_i$
2. $s_{ij}^{ik}$: shrink/grow $c_j - c_i$ to match the length of $c_k - c_i$
both are easy to compute
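The slide only states that the rotation and the scale are easy to compute. The sketch below shows one way to obtain them from the three unit direction vectors, under assumptions that are not in the slides: the rotation axis is taken as c_ij × c_ik, the scale comes from the law of sines, and the three cameras are not collinear. All names and the test positions are hypothetical.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for a rotation of `angle` radians about `axis`."""
    n = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def angle_between(u, v):
    """Angle in [0, pi] between two vectors."""
    return np.arctan2(np.linalg.norm(np.cross(u, v)), np.dot(u, v))

def locate_third_camera(c_i, c_j, c_ij, c_ik, c_jk):
    """Evaluate c_k = c_i + s * R_i(theta_i) (c_j - c_i) from unit directions.
    Assumes non-collinear cameras (sin(theta_k) != 0)."""
    theta_i = angle_between(c_ij, c_ik)    # angle at camera i
    theta_j = angle_between(-c_ij, c_jk)   # angle at camera j
    theta_k = angle_between(c_ik, c_jk)    # angle at camera k
    s = np.sin(theta_j) / np.sin(theta_k)  # |c_k - c_i| / |c_j - c_i| (law of sines)
    R = rodrigues(np.cross(c_ij, c_ik), theta_i)
    return c_i + s * (R @ (c_j - c_i))

def unit(v):
    return v / np.linalg.norm(v)

# Assumed noise-free test configuration.
c_i = np.array([0.0, 0.0, 0.0])
c_j = np.array([2.0, 0.0, 0.0])
c_k = np.array([1.0, 3.0, 0.5])
estimate = locate_third_camera(c_i, c_j, unit(c_j - c_i), unit(c_k - c_i), unit(c_k - c_j))
```

For noise-free directions the estimate coincides with the true c_k. The sine-based scale breaks down for collinear cameras, so a different way of obtaining the scale would be needed there.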
15
Translation Registration (3 cameras)
A similar linear equation by matching $c_i - c_j$ and $c_k - c_j$:
$c_k - c_j = R_j(-\theta_j)\, s_{ij}^{jk}\, (c_i - c_j)$
16
Translation Registration (3 cameras)
A geometric explanation
$c_k - c_i = R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i)$
$c_k - c_j = R_j(-\theta_j)\, s_{ij}^{jk}\, (c_i - c_j)$
$\pi_1$: plane spanned by $c_{ij}$ and $c_{ik}$
$\pi_2$: plane spanned by $c_{ij}$ and $c_{jk}$
$c_{ik}$ and $c_{jk}$ are non-coplanar
17
Translation Registration (3 cameras)
A geometric explanation
$c_k - c_i = R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i)$
$c_k - c_j = R_j(-\theta_j)\, s_{ij}^{jk}\, (c_i - c_j)$
$AB$: the mutual perpendicular line
$c_k$: the middle point of $AB$
$c_k = c_i + R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i) \approx A$
$c_k = c_j + R_j(-\theta_j)\, s_{ij}^{jk}\, (c_i - c_j) \approx B$
Our linear equations minimize an approximate geometric error!
see derivation in the paper
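One way to read the midpoint statement, assuming $c_i$ and $c_j$ are held fixed so that the two right-hand sides are fixed points $A$ and $B$: solving both equations for $c_k$ in the least-squares sense gives their midpoint. This is only a sketch of the intuition; the full derivation is in the paper.

```latex
\begin{aligned}
c_k^\ast &= \arg\min_{c_k} \; \|c_k - A\|^2 + \|c_k - B\|^2, \\
0 &= 2\,(c_k^\ast - A) + 2\,(c_k^\ast - B)
\quad\Longrightarrow\quad
c_k^\ast = \tfrac{1}{2}\,(A + B).
\end{aligned}
```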
18
Translation Registration (3 cameras)
No degeneracy with collinear motion
$c_k - c_i = R_i(0)\, s_{ij}^{ik}\, (c_j - c_i)$
$c_k - c_j = R_j(0)\, s_{ij}^{jk}\, (c_i - c_j)$
19
Translation Registration (3 cameras)
Suppose $c_i$, $c_k$ are known; $c_j$ can be computed by:
$c_j - c_i = R_i(-\theta_i)\, s_{ik}^{ij}\, (c_k - c_i)$
$c_j - c_k = R_k(\theta_k)\, s_{ik}^{jk}\, (c_i - c_k)$
20
Translation Registration (3 cameras)
Suppose $c_j$, $c_k$ are known; $c_i$ can be computed by:
$c_i - c_k = R_k(-\theta_k)\, s_{jk}^{ik}\, (c_j - c_k)$
$c_i - c_j = R_j(\theta_j)\, s_{jk}^{ij}\, (c_k - c_j)$
21
Translation Registration (3 cameras)
Collecting all six equations:
$B_{ijk} \begin{pmatrix} c_i \\ c_j \\ c_k \end{pmatrix} = 0$
Translation Registration (n cameras)
Generalize to n cameras
1. Collect equations from all triangles in the match graph.
$B_1 \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = 0$, $B_2 \begin{pmatrix} c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0$
2. Solve all equations:
$B Y = 0$, $Y = (c_1, c_2, c_3, c_4, c_5, c_6, c_7, c_8, c_9)^\top$
The match graph: each camera is a vertex; connect two cameras if their relative motion is known.
cameras can be non-coplanar.
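The two steps above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the slides: relative directions are noise-free unit vectors already expressed in the global frame, scales come from the law of sines, the gauge is fixed by placing camera 0 at the origin, and the null vector is taken by SVD. The names `similarity`, `register_positions` and `direction` are hypothetical.

```python
import itertools
import numpy as np

def rodrigues(axis, angle):
    n = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -n[2], n[1]], [n[2], 0.0, -n[0]], [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def angle_between(u, v):
    return np.arctan2(np.linalg.norm(np.cross(u, v)), np.dot(u, v))

def similarity(d_ab, d_ac, d_bc):
    """Return M = s * R such that c_c - c_a = M @ (c_b - c_a)."""
    theta_a = angle_between(d_ab, d_ac)
    theta_b = angle_between(-d_ab, d_bc)
    theta_c = angle_between(d_ac, d_bc)
    s = np.sin(theta_b) / np.sin(theta_c)   # law of sines, non-collinear cameras
    return s * rodrigues(np.cross(d_ab, d_ac), theta_a)

def register_positions(n, triangles, direction):
    """direction(a, b) returns the unit vector from camera a to camera b.
    Returns an n x 3 array of positions, up to scale, with camera 0 at the origin."""
    blocks = []
    for tri in triangles:
        # Six equations per triangle: every ordered choice of (a, b, c).
        for a, b, c in itertools.permutations(tri):
            M = similarity(direction(a, b), direction(a, c), direction(b, c))
            block = np.zeros((3, 3 * n))
            block[:, 3 * c:3 * c + 3] += np.eye(3)
            block[:, 3 * b:3 * b + 3] -= M
            block[:, 3 * a:3 * a + 3] += M - np.eye(3)
            blocks.append(block)
    B = np.vstack(blocks)
    _, _, Vt = np.linalg.svd(B[:, 3:])      # drop camera 0 columns: c_0 = 0
    Y = np.concatenate([np.zeros(3), Vt[-1]]).reshape(n, 3)
    a, b = triangles[0][0], triangles[0][1]
    if np.dot(Y[b] - Y[a], direction(a, b)) < 0:
        Y = -Y                              # resolve the sign of the null vector
    return Y

# Assumed noise-free test data: four non-coplanar cameras.
truth = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 3.0, 0.5], [3.0, 2.0, -1.0]])

def direction(a, b):
    v = truth[b] - truth[a]
    return v / np.linalg.norm(v)

triangles = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
estimate = register_positions(4, triangles, direction)
scale = np.linalg.norm(truth[1]) / np.linalg.norm(estimate[1])
```

The recovered positions match the ground truth after multiplying by `scale`, since the linear system determines them only up to a global scale. In this sketch, if every triangle lay in the same plane, a rotation about the common normal would also satisfy the equations, so the example deliberately uses non-coplanar cameras.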
23
Triangulation
Once cameras are fixed, triangulate matched corners to generate 3D points.
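The slides do not say which triangulation method was used. As an illustration only, the sketch below uses the standard linear (DLT) triangulation from two or more views. The function name `triangulate_dlt`, the projection matrices and the test point are assumptions made for this example.

```python
import numpy as np

def triangulate_dlt(projections, pixels):
    """Linear triangulation of one 3D point.
    projections: list of 3x4 camera matrices; pixels: list of (x, y) observations."""
    rows = []
    for P, (x, y) in zip(projections, pixels):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 camera matrix and return (x, y)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Assumed setup: two normalized cameras, the second centered at (1, 0, 0).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = np.array([0.5, 0.2, 4.0])
recovered = triangulate_dlt([P1, P2], [project(P1, point), project(P2, point)])
```

With exact observations the recovered point equals the original; with noisy matches the result is typically refined afterwards by bundle adjustment.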
24
Robustness Issues
• Exclude unreliable triplets
• More consistency checks in the paper
Check if $\hat{c}_{ij}$, $\hat{c}_{ik}$, $\hat{c}_{jk}$ agree with $c_{ij}$, $c_{ik}$, $c_{jk}$?
Results
Accuracy evaluation: compare with recent methods on data with known ground truth.

                              Fountain-P11              Herz-Jesu-P25             Castle-P30
                              c (meters)   R (degrees)  c (meters)   R (degrees)  c (meters)   R (degrees)
Ours                          0.0139       0.1954       0.0636       0.1880       0.2345       0.4800
[Arie-Nachimson et al. 2012]  0.0226       0.4211       0.0479       0.3125       -            -
[Sinha et al. 2010]           0.1317       -            0.2538       -            -            -
VisualSFM                     0.0364       0.2794       0.0551       0.2868       0.2639       0.3980

[Images: Fountain-P11, Herz-Jesu-P25, Castle-P30]
All results are after the final bundle adjustment.
Results
Efficiency evaluation:

                            Building (128)          Notre Dame (371)        Pisa (481)              Trevi Fountain (1259)
                            Our Method  Visual-SFM  Our Method  Visual-SFM  Our Method  Visual-SFM  Our Method  Visual-SFM
Total running time (s)*     17          62          49          479         69          479         135         1790
BA time (s)                 11          57          20          442         52          444         61          1715
Registration time (s)       6           5           29          37          17          12          74          75
# of reconstructed images   128         128         362         365         479         480         1255        1253
# of reconstructed points   91,290      78,100      103,629     104,657     134,555     129,484     297,766     292,277

* The total running time excludes the time spent on feature matching and epipolar geometry computation.
[Images: Building, Notre Dame, Pisa, Trevi Fountain]
27
Conclusions
• A global solution for orientations & positions;
• Linear, robust & geometrically meaningful;
• No degeneracy.
Thanks!
code & data available at: http://www.ece.nus.edu.sg/stfpage/eletp/
29
A large scale scene
Results
Quasi-dense points generated by CMVS [Furukawa et al. 2010] for better visualization.