A Global Linear Method for Camera Pose Registration


Nianjuan Jiang*¹, Zhaopeng Cui*², Ping Tan²

¹ Advanced Digital Sciences Center, Singapore
² National University of Singapore

*Joint first authors


Structure from Motion (SfM)

Simultaneously recover both 3D scene points and camera poses

SfM Pipeline

Step 1. Epipolar geometry: compute relative motion between 2 or 3 cameras
• 6-point method [Quan 1995]
• 7-point method [Torr & Murray 1997]
• 8-point method (normalized) [Hartley 1997]
• 5-point method [Nister 2004]

Images with matched feature points

Step 2. Camera registration: put all cameras in the same coordinate system (auto-calibration if needed [Pollefeys et al. 1998])
• [Fitzgibbon & Zisserman 1998]
• [Pollefeys et al. 2004]

Step 3. Bundle adjustment: optimize all cameras and points
• [Triggs et al. 1999]

“The Black Art”

The state of the art:
1. Steps 1 and 3 are very well studied, with elegant theories and algorithms.
2. Step 2 is often ad hoc and heuristic.

The camera registration to initialize bundle adjustment “… is still to some extent a black art…” (Page 452, Chapter 18.6).

Typical Solutions

Hierarchical solution: iteratively merge sub-sequences
• [Fitzgibbon & Zisserman 1998]
• [Lhuillier & Quan 2005]

Incremental solution: iteratively add cameras one by one
• [Pollefeys et al. 2004]
• [Snavely et al. 2006]

Pain of Existing Solutions

The block diagram (for the incremental solution):
Initial Reconstruction (2 cameras) → Add Cameras → Bundle Adjustment → More Cameras?

Drawbacks:
1. Repetitively calling bundle adjustment → inefficiency: 90% of the total computation time is spent on bundle adjustment.
2. Some cameras are fixed before the others → the asymmetric formulation leads to inferior results.

Our objective: simultaneously register all cameras to initialize the bundle adjustment.
Step 1: Epipolar Geometry → Register All Cameras in a Single Step → Step 3: Bundle Adjustment

Previous Works

Cited methods: [Govindu 2001], [Martinec et al. 2007], [Arie-Nachimson et al. 2012], [Kahl 2005], [Hartley et al. 2013], [Crandall et al. 2011]

Approaches:
• linear global solution to rotations
• linear global solution to translations
• elegant quasi-convex optimization
• discrete-continuous optimization

Limitations:
• cannot solve translations
• sensitive to outliers
• require coplanar cameras
• degenerate at collinear motion

Desirable features:
1. Solve both rotations & translations;
2. Linear & robust solution;
3. No degeneracy.

The Input: Epipolar Geometry

The essential matrix encodes the relative motion:

$E_{ij} = [t_{ij}]_\times R_{ij}$

$E_{ij}$ gives the relative rotation $R_{ij}$ and the relative translation $t_{ij}$.
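A minimal sketch of how $E_{ij} = [t_{ij}]_\times R_{ij}$ encodes the relative motion. The convention $x_j \sim R_{ij} x_i + t_{ij}$ (normalized image coordinates) and all numeric values are assumptions made for illustration; the slides do not state them.

```python
import numpy as np

def skew(t):
    """Return the cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def essential_from_motion(R_ij, t_ij):
    """E_ij = [t_ij]_x R_ij for a relative rotation R_ij and translation t_ij."""
    return skew(t_ij) @ R_ij

# Hypothetical relative motion: 10-degree rotation about the y-axis plus a translation.
a = np.deg2rad(10.0)
R_ij = np.array([[np.cos(a), 0.0, np.sin(a)],
                 [0.0, 1.0, 0.0],
                 [-np.sin(a), 0.0, np.cos(a)]])
t_ij = np.array([1.0, 0.0, 0.2])
E_ij = essential_from_motion(R_ij, t_ij)

# A 3D point X (in the frame of camera i) and its two normalized projections.
X = np.array([0.3, -0.2, 4.0])
x_i = X / X[2]
X_in_j = R_ij @ X + t_ij          # assumed convention for the relative motion
x_j = X_in_j / X_in_j[2]

# Matched points satisfy the epipolar constraint x_j^T E_ij x_i = 0.
residual = float(x_j @ E_ij @ x_i)
```

The residual is zero up to floating-point error for any point X, which is why matched feature points are enough to estimate $E_{ij}$ in Step 1.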

Rotation Registration

A linear equation from every two cameras [Martinec et al. 2007]:

$R_j = R_{ij} R_i$, where $R_i = [r_1^i, r_2^i, r_3^i]$

One equation per camera pair:

{cam 1, cam 2}: $R_2 = R_{12} R_1$
{cam 2, cam 3}: $R_3 = R_{23} R_2$
……
{cam m, cam n}: $R_n = R_{mn} R_m$
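A hedged sketch of how the pairwise equations $R_j = R_{ij} R_i$ could be stacked and solved linearly. The null-space solve, the SVD projection back to rotations, and fixing the first camera to the identity are illustrative choices, not necessarily the exact procedure of [Martinec et al. 2007]. Cameras are 0-indexed here, and the function names and test data are hypothetical.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for a rotation of `angle` radians about `axis`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def register_rotations(n, rel_rotations):
    """rel_rotations maps (i, j) -> R_ij with R_j = R_ij R_i. Returns n rotations, R_0 = I."""
    A = np.zeros((3 * len(rel_rotations), 3 * n))
    for row, ((i, j), R_ij) in enumerate(rel_rotations.items()):
        A[3 * row:3 * row + 3, 3 * j:3 * j + 3] = np.eye(3)   # + R_j
        A[3 * row:3 * row + 3, 3 * i:3 * i + 3] = -R_ij       # - R_ij R_i
    # The stacked columns of all R_i span the 3-dimensional null space of A.
    _, _, Vt = np.linalg.svd(A)
    V = Vt[-3:].T                       # shape (3n, 3)
    if np.linalg.det(V[:3]) < 0:        # pick the sign that gives proper rotations
        V = -V
    rotations = []
    for i in range(n):
        U, _, Wt = np.linalg.svd(V[3 * i:3 * i + 3])
        R = U @ Wt                      # nearest orthogonal matrix to the 3x3 block
        if np.linalg.det(R) < 0:
            U[:, -1] *= -1
            R = U @ Wt
        rotations.append(R)
    R0_inv = rotations[0].T             # remove the global rotation ambiguity
    return [R @ R0_inv for R in rotations]

# Hypothetical 4-camera example with noise-free relative rotations.
truth = [np.eye(3), rodrigues([0, 1, 0], 0.3), rodrigues([1, 0, 0], -0.5), rodrigues([1, 1, 1], 0.8)]
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
rel = {(i, j): truth[j] @ truth[i].T for i, j in edges}
estimate = register_rotations(4, rel)
```

All pairs enter the system symmetrically, so no camera is fixed before the others; only the final gauge choice singles out camera 0.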

Translation Registration (3 cameras)

Input: relative translations $c_{ij}$, $c_{ik}$, $c_{jk}$
Output: camera positions $c_i$, $c_j$, $c_k$

Suppose $c_i$, $c_j$ are known; $c_k$ can be computed by a linear equation:

$c_k - c_i = R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i)$

1. $R_i(\theta_i)$: rotate $c_{ij}$ to match the orientation of $c_{ik}$;
2. $s_{ij}^{ik}$: shrink/grow $c_{ij}$ to match the length of $c_{ik}$.

Both $R_i(\theta_i)$ and $s_{ij}^{ik}$ are easy to compute.
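A small sketch of the rotate-then-scale equation for one triplet. It assumes the relative translations are unit direction vectors in a common frame, takes the rotation about the axis $c_{ij} \times c_{ik}$, and gets the length ratio from the sine rule of the camera triangle; these are assumptions for illustration, and the collinear case (where the sines vanish) is not handled here. All names and positions are hypothetical.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def rotation_between(a, b):
    """Rotation about the axis a x b that takes unit vector a onto unit vector b."""
    axis = np.cross(a, b)
    sin_t, cos_t = np.linalg.norm(axis), float(a @ b)
    k = axis / sin_t                      # undefined for parallel a, b (not handled)
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + sin_t * K + (1 - cos_t) * (K @ K)

def triplet_motion(c_ij, c_ik, c_jk):
    """Return (R_i(theta_i), s_ij^ik) from the unit directions of the three baselines."""
    theta_j = np.arccos(np.clip(-c_ij @ c_jk, -1.0, 1.0))  # interior angle at camera j
    theta_k = np.arccos(np.clip(c_ik @ c_jk, -1.0, 1.0))   # interior angle at camera k
    # Sine rule: |c_k - c_i| / |c_j - c_i| = sin(theta_j) / sin(theta_k).
    return rotation_between(c_ij, c_ik), np.sin(theta_j) / np.sin(theta_k)

# Hypothetical camera positions, used only to generate consistent directions.
c_i, c_j, c_k = np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0]), np.array([0.5, 1.5, 0.3])
R, s = triplet_motion(unit(c_j - c_i), unit(c_k - c_i), unit(c_k - c_j))

# c_k - c_i = R_i(theta_i) s_ij^ik (c_j - c_i)
c_k_from_equation = c_i + s * (R @ (c_j - c_i))
```

Note that R and s depend only on the three directions, so the equation stays linear in the unknown positions.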

A similar linear equation by matching $c_{ij}$ and $c_{jk}$:

$c_k - c_j = R_j(-\theta_j)\, s_{ij}^{jk}\, (c_i - c_j)$

A geometric explanation

• $\pi_1$: plane spanned by $c_{ij}$ and $c_{ik}$
• $\pi_2$: plane spanned by $c_{ij}$ and $c_{jk}$
• $c_{ik}$ and $c_{jk}$ are non-coplanar
• $AB$: the mutual perpendicular line
• $c_k$: the middle point of $AB$

$c_k = c_i + R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i) \approx A$
$c_k = c_j + R_j(-\theta_j)\, s_{ij}^{jk}\, (c_i - c_j) \approx B$

Our linear equations minimize an approximate geometric error! (See the derivation in the paper.)
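To make the mutual perpendicular line concrete, here is a sketch that finds the closest points A and B on two non-parallel lines and their midpoint. Treating the two lines as the rays from $c_i$ along $c_{ik}$ and from $c_j$ along $c_{jk}$ is an interpretation of the slide; the function name and the numbers are hypothetical.

```python
import numpy as np

def mutual_perpendicular(p1, d1, p2, d2):
    """Closest points A on the line p1 + s*d1 and B on the line p2 + t*d2, plus their midpoint.

    The lines must not be parallel (the denominator below would be zero).
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w = p1 - p2
    b = d1 @ d2
    denom = 1.0 - b * b
    # Setting the gradient of |w + s*d1 - t*d2|^2 to zero gives s and t.
    s = (b * (d2 @ w) - (d1 @ w)) / denom
    t = ((d2 @ w) - b * (d1 @ w)) / denom
    A = p1 + s * d1
    B = p2 + t * d2
    return A, B, 0.5 * (A + B)

# Two skew lines: the x-axis, and a line through (0, 1, 1) along the y direction.
A, B, midpoint = mutual_perpendicular(np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
                                      np.array([0.0, 1.0, 1.0]), np.array([0.0, 1.0, 0.0]))
```

When the two rays intersect exactly (noise-free data), A, B and the midpoint coincide.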

Translation Registration (3 cameras): no degeneracy with collinear motion

$c_k - c_i = R_i(0)\, s_{ij}^{ik}\, (c_j - c_i)$
$c_k - c_j = R_j(0)\, s_{ij}^{jk}\, (c_i - c_j)$

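Similarly, computing $c_j$ from $c_i$ and $c_k$:

$c_j - c_i = R_i(-\theta_i)\, s_{ik}^{ij}\, (c_k - c_i)$
$c_j - c_k = R_k(\theta_k)\, s_{ik}^{jk}\, (c_i - c_k)$

And computing $c_i$ from $c_j$ and $c_k$:

$c_i - c_k = R_k(-\theta_k)\, s_{jk}^{ik}\, (c_j - c_k)$
$c_i - c_j = R_j(\theta_j)\, s_{jk}^{ij}\, (c_k - c_j)$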

Collecting all six equations:

$B_{ijk}\, (c_i, c_j, c_k)^\top = 0$

Translation Registration (n cameras)

Generalize to n cameras using the match graph: each camera is a vertex; connect two cameras if their relative motion is known.

1. Collect equations from all triangles in the match graph, e.g. $B_1 (c_1, c_2, c_3)^\top = 0$, $B_2 (c_2, c_3, c_4)^\top = 0$.
2. Solve all equations: $BY = 0$, where $Y = (c_1, c_2, \ldots, c_9)^\top$ for the 9-camera example.

Cameras can be non-coplanar.
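A hedged sketch of the global step: collect the six equations from every triangle into one matrix B and solve BY = 0. The sine-rule scales, the gauge choice (camera 0 at the origin, smallest singular vector, sign fixed from one known direction) and all names and data are assumptions for illustration, not the authors' released implementation; collinear triplets and robustness checks are left out.

```python
import itertools
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def rotation_between(a, b):
    """Rotation about the axis a x b taking unit vector a onto unit vector b (non-parallel)."""
    axis = np.cross(a, b)
    sin_t, cos_t = np.linalg.norm(axis), float(a @ b)
    k = axis / sin_t
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + sin_t * K + (1 - cos_t) * (K @ K)

def angle(a, b):
    return np.arccos(np.clip(a @ b, -1.0, 1.0))

def register_translations(n, dirs, triangles):
    """dirs[(a, b)] is the unit direction from camera a to camera b in the global frame."""
    rows = []
    for tri in triangles:
        # Six equations per triangle: c_c - c_a = M (c_b - c_a) for every ordering (a, b, c).
        for a, b, c in itertools.permutations(tri):
            d_ab, d_ac, d_bc = dirs[(a, b)], dirs[(a, c)], dirs[(b, c)]
            scale = np.sin(angle(-d_ab, d_bc)) / np.sin(angle(d_ac, d_bc))  # sine rule
            M = scale * rotation_between(d_ab, d_ac)
            block = np.zeros((3, 3 * n))
            block[:, 3 * c:3 * c + 3] += np.eye(3)
            block[:, 3 * b:3 * b + 3] -= M
            block[:, 3 * a:3 * a + 3] += M - np.eye(3)
            rows.append(block)
    B = np.vstack(rows)
    # Fix camera 0 at the origin (drop its columns), then take the smallest singular vector.
    _, _, Vt = np.linalg.svd(B[:, 3:])
    Y = np.concatenate([np.zeros(3), Vt[-1]]).reshape(n, 3)
    a, b = triangles[0][0], triangles[0][1]
    if (Y[b] - Y[a]) @ dirs[(a, b)] < 0:   # resolve the overall sign
        Y = -Y
    return Y

# Hypothetical non-coplanar cameras; directions are generated from the ground truth.
truth = np.array([[0, 0, 0], [2, 0.1, 0], [1, 1.5, 0.3], [2.5, 1.8, -0.7], [0.5, 3.0, 1.0]], dtype=float)
triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
dirs = {(a, b): unit(truth[b] - truth[a])
        for tri in triangles for a, b in itertools.permutations(tri, 2)}
positions = register_translations(len(truth), dirs, triangles)
```

The recovered positions match the ground truth only up to a global scale and translation, which is the usual gauge freedom left for bundle adjustment.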

Triangulation

Once cameras are fixed, triangulate matched corners to generate 3D points.
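The slides do not say which triangulation method is used. As one possible sketch, the block below swaps in standard linear (DLT) triangulation, assuming calibrated cameras with projection matrix P = R [I | -c] and normalized image coordinates; the function names and the two-camera setup are hypothetical.

```python
import numpy as np

def projection_matrix(R, c):
    """Calibrated camera P = R [I | -c] for orientation R and position c (assumed convention)."""
    return R @ np.hstack([np.eye(3), -c.reshape(3, 1)])

def triangulate(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more normalized observations."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])   # from u = (P[0] . X) / (P[2] . X)
        rows.append(v * P[2] - P[1])   # from v = (P[1] . X) / (P[2] . X)
    _, _, Vt = np.linalg.svd(np.array(rows))
    X_h = Vt[-1]                       # homogeneous solution of the stacked system
    return X_h[:3] / X_h[3]

# Hypothetical two-camera setup observing one point.
a = 0.2
R2 = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
P1 = projection_matrix(np.eye(3), np.zeros(3))
P2 = projection_matrix(R2, np.array([1.0, 0.0, 0.0]))
X_true = np.array([0.2, -0.1, 5.0])
observations = []
for P in (P1, P2):
    x = P @ np.append(X_true, 1.0)
    observations.append(x[:2] / x[2])
X_est = triangulate([P1, P2], observations)
```

With noisy matches the linear solution is only an initial estimate, to be refined by the final bundle adjustment.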

Robustness Issues

• Exclude unreliable triplets
• More consistency checks in the paper

Figure: a camera triplet labeled with $c_{ij}$, $c_{ik}$, $c_{jk}$ and $\hat{c}_{ij}$, $\hat{c}_{ik}$, $\hat{c}_{jk}$.

Check if ??

Results: accuracy evaluation

Compare with recent methods on data with known ground truth (c: camera position error in meters; R: camera rotation error in degrees).

                               Fountain-P11         Herz-Jesu-P25        Castle-P30
Method                         c (m)     R (deg)    c (m)     R (deg)    c (m)     R (deg)
Ours                           0.0139    0.1954     0.0636    0.1880     0.2345    0.4800
[Arie-Nachimson et al. 2012]   0.0226    0.4211     0.0479    0.3125     -         -
[Sinha et al. 2010]            0.1317    -          0.2538    -          -         -
VisualSFM                      0.0364    0.2794     0.0551    0.2868     0.2639    0.3980

All results are after the final bundle adjustment.

Results: efficiency evaluation

                            Building (128)          Notre Dame (371)        Pisa (481)              Trevi Fountain (1259)
                            Our Method  Visual-SFM  Our Method  Visual-SFM  Our Method  Visual-SFM  Our Method  Visual-SFM
Total running time (s)*     17          62          49          479         69          479         135         1790
BA time (s)                 11          57          20          442         52          444         61          1715
Registration time (s)       6           5           29          37          17          12          74          75
# of reconstructed images   128         128         362         365         479         480         1255        1253
# of reconstructed points   91,290      78,100      103,629     104,657     134,555     129,484     297,766     292,277

* The total running time excludes the time spent on feature matching and epipolar geometry computation.
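A quick arithmetic check on the efficiency numbers, using only the timings listed in the efficiency table: the share of Visual-SFM's running time spent in bundle adjustment, and the total-time speedup of the proposed method. The dictionary layout and variable names are illustrative.

```python
# Timings in seconds copied from the efficiency table: (total running time, BA time).
visual_sfm = {"Building": (62, 57), "Notre Dame": (479, 442),
              "Pisa": (479, 444), "Trevi Fountain": (1790, 1715)}
ours = {"Building": (17, 11), "Notre Dame": (49, 20),
        "Pisa": (69, 52), "Trevi Fountain": (135, 61)}

# Fraction of Visual-SFM's total time that goes into bundle adjustment.
ba_share = {name: ba / total for name, (total, ba) in visual_sfm.items()}

# Ratio of total running times (Visual-SFM / our method).
speedup = {name: visual_sfm[name][0] / ours[name][0] for name in ours}

for name in ours:
    print(f"{name}: Visual-SFM BA share {ba_share[name]:.0%}, total speedup {speedup[name]:.1f}x")
```

On these four datasets the BA share of Visual-SFM is above 90% in every case, in line with the drawback listed for incremental solutions, and the total speedup ranges from roughly 3.6x to 13.3x.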

Conclusions

• A global solution for orientations & positions;
• Linear, robust & geometrically meaningful;
• No degeneracy.

Thanks!

Code & data available at: http://www.ece.nus.edu.sg/stfpage/eletp/

Results: a large-scale scene

Quasi-dense points generated by CMVS [Furukawa et al. 2010] for better visualization.
