CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Page 1: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

CSCE 643 Computer Vision: Structure from Motion

Jinxiang Chai

Page 3: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Stereo reconstruction

Given two or more images of the same scene or object, compute a representation of its shape

known camera viewpoints

How to estimate camera parameters?

- where is the camera?

- where is it pointing?

- what are internal parameters, e.g. focal length?

Page 4: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Calibration from 2D motion

Structure from motion (SFM)

- track points over a sequence of images

- estimate 3D point positions and camera positions

- calibrate intrinsic camera parameters beforehand

Self-calibration

- solve for both intrinsic and extrinsic camera parameters

Page 5: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM = Holy Grail of 3D Reconstruction

Take movie of object

Reconstruct 3D model

Would be commercially highly viable

Page 6: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

How to Get Feature Correspondences

Feature-based approach

- good for images

- feature detection (corners or SIFT features)

- feature matching using RANSAC (epipolar line)

Pixel-based approach

- good for video sequences

- patch-based registration with the Lucas-Kanade algorithm

- register features across the entire sequence

Page 7: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

A Brief Introduction on Feature-based Matching

Find a few important features (aka Interest Points)

Match them across two images

Compute image transformation function h

Page 8: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Feature Detection

- Two images taken at the same place with different angles

- Projective transformation $H_{3\times 3}$

Page 11: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Feature Matching

- Two images taken at the same place with different angles

- Projective transformation $H_{3\times 3}$

How do we match features across images? Any criterion?

Page 12: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Feature Matching

Intensity/color similarity

• The pixels around corresponding features should have similar intensity

Page 16: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Feature Matching

Feature similarity (intensity or SIFT signature)

• The pixels around corresponding features should have similar intensity

• Cross-correlation, SSD

Distance constraint

• The displacement of a feature should be smaller than a given threshold

Epipolar line constraint

• Corresponding pixels satisfy the epipolar line constraint

Fundamental matrix H
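The first two criteria can be illustrated with a minimal sketch. It assumes image patches are given as small nested lists of grey values; the function names and the thresholds `max_disp` and `max_ssd` are hypothetical choices for illustration, not values from the lecture.

```python
import math

def ssd(patch_a, patch_b):
    """Sum of squared differences between two equally sized patches (lower is better)."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(patch_a, patch_b)
               for a, b in zip(row_a, row_b))

def ncc(patch_a, patch_b):
    """Normalized cross-correlation in [-1, 1] (higher is better)."""
    a = [x for row in patch_a for x in row]
    b = [x for row in patch_b for x in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0.0:
        return 0.0  # flat patches carry no correlation information
    return sum(x * y for x, y in zip(da, db)) / denom

def is_candidate_match(p, q, patch_p, patch_q, max_disp=40.0, max_ssd=500.0):
    """Distance constraint first, then the similarity test (thresholds are assumptions)."""
    displacement = math.hypot(q[0] - p[0], q[1] - p[1])
    return displacement < max_disp and ssd(patch_p, patch_q) < max_ssd
```

Cross-correlation is unaffected by a uniform brightness offset between the two images, whereas SSD is not, which is one reason to prefer it when exposure differs.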

Page 17: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Feature-space Outlier Rejection

bad

Good

Page 22: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Feature-space Outlier Rejection

Can we now compute $H_{3\times 3}$ from the blue points?

• No! Still too many outliers…

• What can we do?

Robust estimation!

Page 23: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Robust Estimation: A Toy Example

How to fit a line based on a set of 2D points?
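One possible answer to the toy question is a RANSAC line fit, sketched below. The threshold, iteration count, seed and function names are assumptions made for this example; the final refit assumes the line is not vertical.

```python
import math
import random

def line_through(p, q):
    """Return (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1."""
    a = q[1] - p[1]
    b = p[0] - q[0]
    norm = math.hypot(a, b)
    return a / norm, b / norm, -(a * p[0] + b * p[1]) / norm

def ransac_line(points, threshold=0.5, iterations=200, seed=0):
    """Return the largest set of points lying within `threshold` of a sampled line."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)      # minimal sample: two points define a line
        if p == q:
            continue                      # identical coordinates give no line
        a, b, c = line_through(p, q)
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

def fit_least_squares(points):
    """Ordinary least-squares fit y = slope * x + intercept on the inliers."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A plain least-squares fit on all points would be pulled towards the outliers; sampling minimal subsets and voting keeps them out of the final fit.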

Page 24: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

RANSAC for Estimating Projective Transformation

RANSAC loop:

1. Select four feature pairs (at random)

2. Compute the transformation matrix H (exact)

3. Compute inliers where SSD(p_i', H p_i) < ε

4. Keep the largest set of inliers

5. Re-compute the least-squares H estimate on all of the inliers

For more detail, check

- http://research.microsoft.com/en-us/um/people/zhang/INRIA/software-FMatrix.html

- Philip H. S. Torr (1997). "The Development and Comparison of Robust Methods for Estimating the Fundamental Matrix". International Journal of Computer Vision 24 (3): 271–300
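The RANSAC loop for a projective transformation can be sketched with NumPy as follows. This is an illustrative version under stated assumptions: H is estimated with an unnormalized direct linear transform, the error is the squared pixel distance between p_i' and H p_i, and `eps`, `iterations` and `seed` are hypothetical defaults.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: H (up to scale) with dst ~ H @ src, from >= 4 pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)           # singular vector of the smallest singular value

def apply_homography(h, pts):
    """Map N x 2 points through H and divide by the homogeneous coordinate."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    proj = pts_h @ h.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return proj[:, :2] / proj[:, 2:3]

def ransac_homography(src, dst, eps=1.0, iterations=500, seed=0):
    """Return (H refit on the largest inlier set, boolean inlier mask)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        idx = rng.choice(len(src), size=4, replace=False)   # four random feature pairs
        h = fit_homography(src[idx], dst[idx])               # exact fit to the sample
        with np.errstate(all="ignore"):
            err = np.sum((apply_homography(h, src) - dst) ** 2, axis=1)
        inliers = err < eps                                   # NaN errors count as outliers
        if inliers.sum() > best.sum():
            best = inliers                                    # keep the largest inlier set
    return fit_homography(src[best], dst[best]), best        # least-squares refit
```

The returned H is defined only up to scale; dividing it by its bottom-right entry gives the usual normalized form when that entry is nonzero.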

Page 25: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Structure from Motion

Two principal solutions

• Bundle adjustment (nonlinear optimization)

• Factorization (SVD, through orthographic approximation, affine geometry)

Page 26: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Projection Matrix

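Perspective projection:

$$\begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix} \cong \underbrace{\begin{pmatrix} f_x & \gamma & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}}_{K} \underbrace{\begin{pmatrix} r_1^T & t_1 \\ r_2^T & t_2 \\ r_3^T & t_3 \end{pmatrix}}_{(R \;\; T)} \begin{pmatrix} x_i \\ y_i \\ z_i \\ 1 \end{pmatrix}$$

The 2D coordinates of a point are a nonlinear function of its 3D coordinates $P_i = (x_i, y_i, z_i)^T$ and the camera parameters:

$$u_i = \frac{(f_x r_1^T + \gamma r_2^T + u_0 r_3^T) P_i + f_x t_1 + \gamma t_2 + u_0 t_3}{r_3^T P_i + t_3} = f(K, R, T; P_i)$$

$$v_i = \frac{(f_y r_2^T + v_0 r_3^T) P_i + f_y t_2 + v_0 t_3}{r_3^T P_i + t_3} = g(K, R, T; P_i)$$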
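As a small numeric illustration of the projection functions f and g, the sketch below projects one point with NumPy. The function name and all numeric values are assumptions chosen for the example.

```python
import numpy as np

def project(K, R, T, P):
    """Perspective projection of a 3D world point P to pixel coordinates (u, v)."""
    p_cam = R @ P + T            # world -> camera coordinates
    uvw = K @ p_cam              # homogeneous image coordinates
    return uvw[:2] / uvw[2]      # divide by the depth r3^T P + t3

# Hypothetical camera: 800-pixel focal length, principal point (320, 240), no skew.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                    # camera axes aligned with the world axes
T = np.array([0.0, 0.0, 5.0])    # world origin lies 5 units in front of the camera
u, v = project(K, R, T, np.array([1.0, -0.5, 5.0]))
```

The division by depth in the last line of `project` is what makes f and g nonlinear in the unknowns.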

Page 31: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Nonlinear Approach for SFM

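What's the difference between camera calibration and SFM?

- camera calibration: known 3D and 2D

$$\arg\min_{K,\{R_j\},\{T_j\}} \sum_{j=1}^{M} \sum_{i=1}^{N} \left(u_i^j - f(K, R_j, T_j; P_i)\right)^2 + \left(v_i^j - g(K, R_j, T_j; P_i)\right)^2$$

- SFM: unknown 3D and known 2D

$$\arg\min_{K,\{R_j\},\{T_j\},\{P_i\}} \sum_{j=1}^{M} \sum_{i=1}^{N} \left(u_i^j - f(K, R_j, T_j, P_i)\right)^2 + \left(v_i^j - g(K, R_j, T_j, P_i)\right)^2$$

- what is the 3D-to-2D registration problem?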

Page 32: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM: Bundle Adjustment

SFM = nonlinear least-squares problem

$$\arg\min_{K,\{R_j\},\{T_j\},\{P_i\}} \sum_{j=1}^{M} \sum_{i=1}^{N} \left(u_i^j - f(K, R_j, T_j, P_i)\right)^2 + \left(v_i^j - g(K, R_j, T_j, P_i)\right)^2$$

Minimize through

• Gradient descent

• Conjugate gradient

• Gauss-Newton

• Levenberg-Marquardt (the common method)

Prone to local minima
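To make the objective concrete, the sketch below evaluates the stacked reprojection residuals and their sum of squares with NumPy. It is an illustration only: rotations are passed as 3×3 matrices rather than a minimal 3-parameter form, and every function and argument name is an assumption.

```python
import numpy as np

def reprojection_residuals(K, rotations, translations, points, observations):
    """Stack u_i^j - f(...) and v_i^j - g(...) over all cameras j and points i.

    rotations / translations: one 3x3 matrix and one 3-vector per camera.
    points: N x 3 array of 3D points. observations: one N x 2 array per camera.
    """
    residuals = []
    for R, T, obs in zip(rotations, translations, observations):
        cam = points @ R.T + T                     # N x 3 camera coordinates
        uvw = cam @ K.T                            # homogeneous pixel coordinates
        predicted = uvw[:, :2] / uvw[:, 2:3]       # perspective division
        residuals.append((obs - predicted).ravel())
    return np.concatenate(residuals)               # length 2 * M * N

def sfm_cost(K, rotations, translations, points, observations):
    """The bundle adjustment objective: sum of squared reprojection errors."""
    r = reprojection_residuals(K, rotations, translations, points, observations)
    return float(np.sum(r ** 2))
```

A residual function of this shape is what a nonlinear least-squares solver minimizes; for example, `scipy.optimize.least_squares` offers a Levenberg-Marquardt mode, provided the unknowns are first packed into a single parameter vector.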

Page 34: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Count # Constraints vs #Unknowns

M camera poses

N points

2MN point constraints

6M + 3N + 4 unknowns (6 per camera pose, 3 per point, 4 intrinsic camera parameters)

Suggests: need $2MN \ge 6M + 3N + 4$

But: can we really recover all parameters?

• Can't recover origin, orientation (6 params)

• Can't recover scale (1 param)

Thus, we need $2MN \ge 6M + 3N + 4 - 7$
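The counting argument can be checked with a few lines of arithmetic. The helper names below are hypothetical; the defaults of 4 intrinsic parameters and 7 unrecoverable degrees of freedom follow the counts above.

```python
def sfm_counts(num_cameras, num_points, num_intrinsics=4, gauge_freedoms=7):
    """Return (constraints, unknowns) for M cameras observing N points."""
    constraints = 2 * num_cameras * num_points
    unknowns = 6 * num_cameras + 3 * num_points + num_intrinsics - gauge_freedoms
    return constraints, unknowns

def min_points(num_cameras):
    """Smallest N with 2MN >= 6M + 3N + 4 - 7 for a given M."""
    if num_cameras < 2:
        raise ValueError("one view never yields enough constraints")
    n = 1
    while True:
        constraints, unknowns = sfm_counts(num_cameras, n)
        if constraints >= unknowns:
            return n
        n += 1
```

With two cameras the count balances at nine points (36 constraints against 36 unknowns), and with three cameras five points already suffice by this count. Counting alone is only a necessary condition; degenerate configurations can still fail.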

Page 35: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Are We Done?

No, bundle adjustment has many local minima.

Page 37: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM Using Factorization

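Assume an orthographic camera, which maps world coordinates to image coordinates:

$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} = \begin{pmatrix} r_1^T & t_1 \\ r_2^T & t_2 \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ z_i \\ 1 \end{pmatrix}$$

Subtract the mean (with the world origin placed at the centroid of the 3D points), which removes the translation:

$$\begin{pmatrix} \tilde u_i \\ \tilde v_i \end{pmatrix} = \begin{pmatrix} u_i - \frac{1}{N} \sum_{k=1}^{N} u_k \\ v_i - \frac{1}{N} \sum_{k=1}^{N} v_k \end{pmatrix} = \begin{pmatrix} r_1^T \\ r_2^T \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ z_i \end{pmatrix}$$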

Page 40: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM Using Factorization

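Stack all the features from the same frame:

$$\begin{pmatrix} \tilde u_1 & \tilde u_2 & \cdots & \tilde u_N \\ \tilde v_1 & \tilde v_2 & \cdots & \tilde v_N \end{pmatrix} = \begin{pmatrix} r_1^T \\ r_2^T \end{pmatrix} \begin{pmatrix} x_1 & x_2 & \cdots & x_N \\ y_1 & y_2 & \cdots & y_N \\ z_1 & z_2 & \cdots & z_N \end{pmatrix}$$

Stack all the features from all the images:

$$\underbrace{\begin{pmatrix} \tilde u_{1,1} & \tilde u_{1,2} & \cdots & \tilde u_{1,N} \\ \tilde v_{1,1} & \tilde v_{1,2} & \cdots & \tilde v_{1,N} \\ \vdots & \vdots & & \vdots \\ \tilde u_{F,1} & \tilde u_{F,2} & \cdots & \tilde u_{F,N} \\ \tilde v_{F,1} & \tilde v_{F,2} & \cdots & \tilde v_{F,N} \end{pmatrix}}_{\tilde W_{2F \times N}} = \underbrace{\begin{pmatrix} r_{1,1}^T \\ r_{1,2}^T \\ \vdots \\ r_{F,1}^T \\ r_{F,2}^T \end{pmatrix}}_{M_{2F \times 3}} \underbrace{\begin{pmatrix} x_1 & x_2 & \cdots & x_N \\ y_1 & y_2 & \cdots & y_N \\ z_1 & z_2 & \cdots & z_N \end{pmatrix}}_{S_{3 \times N}}$$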

Page 43: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM Using Factorization

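Stack all the features from all the images:

$$\tilde W_{2F \times N} = M_{2F \times 3}\, S_{3 \times N}$$

Frame $f$ contributes the rows $(\tilde u_{f,1}, \dots, \tilde u_{f,N})$ and $(\tilde v_{f,1}, \dots, \tilde v_{f,N})$ to $\tilde W$ and the rows $r_{f,1}^T$, $r_{f,2}^T$ to $M$; column $i$ of $S$ is $(x_i, y_i, z_i)^T$.

Factorize the matrix into two matrices using SVD, keeping the three largest singular values:

$$\tilde W_{2F \times N} = U \Sigma V^T, \qquad \tilde M_{2F \times 3} = U \Sigma^{1/2}, \qquad \tilde S_{3 \times N} = \Sigma^{1/2} V^T$$

The two factors are determined only up to an invertible matrix $Q_{3 \times 3}$:

$$M_{2F \times 3} = \tilde M_{2F \times 3}\, Q_{3 \times 3}, \qquad S_{3 \times N} = Q_{3 \times 3}^{-1}\, \tilde S_{3 \times N}$$

How to compute the matrix $Q_{3 \times 3}$?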

Page 46: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM Using Factorization

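$M$ is the stack of rotation matrices:

$$M_{2F \times 3} = \begin{pmatrix} r_{1,1}^T \\ r_{1,2}^T \\ \vdots \\ r_{F,1}^T \\ r_{F,2}^T \end{pmatrix}, \qquad M_{3 \times 2F}^T = \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{F,1} & r_{F,2} \end{pmatrix}$$

Orthogonality constraints from the rotation matrices: the $2 \times 2$ diagonal blocks of $M M^T$ are identity matrices (entries marked $\ast$ are unconstrained),

$$M M^T = \begin{pmatrix} B_1 & \ast & \ast \\ \ast & \ddots & \ast \\ \ast & \ast & B_F \end{pmatrix}, \qquad B_f = \begin{pmatrix} r_{f,1}^T r_{f,1} & r_{f,1}^T r_{f,2} \\ r_{f,2}^T r_{f,1} & r_{f,2}^T r_{f,2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

$$M M^T = \tilde M_{2F \times 3}\, Q_{3 \times 3} Q_{3 \times 3}^T\, \tilde M_{3 \times 2F}^T$$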

Page 50: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM Using Factorization

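Orthogonality constraints from the rotation matrices: for every frame $f$, the $2 \times 2$ diagonal block of

$$\tilde M_{2F \times 3}\, Q_{3 \times 3} Q_{3 \times 3}^T\, \tilde M_{3 \times 2F}^T$$

must satisfy

$$\begin{pmatrix} r_{f,1}^T r_{f,1} & r_{f,1}^T r_{f,2} \\ r_{f,2}^T r_{f,1} & r_{f,2}^T r_{f,2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

$QQ^T$: symmetric 3 by 3 matrix

How to compute $QQ^T$?

- least-squares solution

- 4F linear constraints (3F of them independent, because each $2 \times 2$ block is symmetric), 9 unknowns (6 independent because $QQ^T$ is symmetric)

How to compute $Q$ from $QQ^T$?

- SVD again: $QQ^T = U \Sigma V^T$, so $Q = U \Sigma^{1/2}$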
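To see why the constraints are linear, the following worked expansion may help. The symbols $L$, $\ell_1, \dots, \ell_6$, $a_f$ and $b_f$ are introduced here for illustration only and do not appear in the slides; $a_f^T$ and $b_f^T$ denote the two rows of $\tilde M$ that belong to frame $f$.

```latex
L = QQ^T = \begin{pmatrix} \ell_1 & \ell_2 & \ell_3 \\ \ell_2 & \ell_4 & \ell_5 \\ \ell_3 & \ell_5 & \ell_6 \end{pmatrix}

% Three independent equations per frame f:
a_f^T L\, a_f = 1, \qquad b_f^T L\, b_f = 1, \qquad a_f^T L\, b_f = 0

% Each is linear in (\ell_1, \dots, \ell_6), since for any a, b in R^3:
a^T L\, b = a_1 b_1 \ell_1 + (a_1 b_2 + a_2 b_1)\,\ell_2 + (a_1 b_3 + a_3 b_1)\,\ell_3
          + a_2 b_2 \ell_4 + (a_2 b_3 + a_3 b_2)\,\ell_5 + a_3 b_3 \ell_6
```

Stacking the three equations for all $F$ frames gives a $3F \times 6$ linear system, which is overdetermined once $F \ge 3$ and can be solved by least squares.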

Page 51: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM Using Factorization

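$M$ is the stack of rotation matrices: $M_{2F \times 3} = \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{F,1} & r_{F,2} \end{pmatrix}^T$

Orthogonality constraints from the rotation matrices, for every frame $f = 1, \dots, F$:

$$r_{f,1}^T r_{f,1} = 1, \qquad r_{f,2}^T r_{f,2} = 1, \qquad r_{f,1}^T r_{f,2} = 0$$

Since $M = \tilde M Q$, these are constraints on the diagonal blocks of $\tilde M_{2F \times 3}\, Q_{3 \times 3} Q_{3 \times 3}^T\, \tilde M_{3 \times 2F}^T$.

$QQ^T$: symmetric 3 by 3 matrix

Computing $QQ^T$ is easy:

- 3F linear equations

- 6 independent unknowns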

Page 52: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM Using Factorization

1. Form the measurement matrix $\tilde W_{2F \times N}$

2. Decompose the matrix into two matrices $\tilde M_{2F \times 3}$ and $\tilde S_{3 \times N}$ using SVD

3. Compute the matrix $Q$ with least squares and SVD

4. Compute the rotation matrix and shape matrix: $M = \tilde M_{2F \times 3}\, Q$ and $S = Q^{-1}\, \tilde S_{3 \times N}$
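The four steps can be sketched end to end with NumPy. This is a minimal illustration under stated assumptions: complete, noise-free tracks; rows of W ordered u_1, v_1, u_2, v_2, and so on; at least three frames with distinct rotations; and an estimated $QQ^T$ that is positive definite, so that its SVD yields a valid $Q$. The function and variable names are hypothetical.

```python
import numpy as np

def factorize(W):
    """Orthographic factorization of a 2F x N measurement matrix W into (M, S)."""
    # Step 1: subtract the per-row mean to form the centered measurement matrix.
    W_tilde = W - W.mean(axis=1, keepdims=True)

    # Step 2: rank-3 SVD, with the singular values split between both factors.
    U, s, Vt = np.linalg.svd(W_tilde, full_matrices=False)
    root = np.sqrt(s[:3])
    M_tilde = U[:, :3] * root
    S_tilde = root[:, None] * Vt[:3]

    # Step 3a: linear least squares for the 6 entries of the symmetric L = Q Q^T.
    def coeffs(a, b):
        return [a[0] * b[0], a[0] * b[1] + a[1] * b[0], a[0] * b[2] + a[2] * b[0],
                a[1] * b[1], a[1] * b[2] + a[2] * b[1], a[2] * b[2]]

    A, rhs = [], []
    for f in range(W.shape[0] // 2):
        a, b = M_tilde[2 * f], M_tilde[2 * f + 1]
        A += [coeffs(a, a), coeffs(b, b), coeffs(a, b)]   # unit norms, orthogonality
        rhs += [1.0, 1.0, 0.0]
    l = np.linalg.lstsq(np.asarray(A), np.asarray(rhs), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])

    # Step 3b: Q from L via SVD (assumes L is positive definite).
    Uq, sq, _ = np.linalg.svd(L)
    Q = Uq * np.sqrt(sq)

    # Step 4: motion M = M_tilde Q and shape S = Q^{-1} S_tilde.
    return M_tilde @ Q, np.linalg.solve(Q, S_tilde)
```

The result is defined only up to a global rotation or reflection, so a check should test that each frame's two rows of M are orthonormal and that M S reproduces the centered measurements, rather than compare against ground-truth poses directly.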

Page 53: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Weak-perspective Projection

Factorization also works for weak-perspective projection (scaled orthographic projection):

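$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} = \frac{d}{z_0} \begin{pmatrix} r_1^T & t_1 \\ r_2^T & t_2 \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ z_i \\ 1 \end{pmatrix}$$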

Page 54: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

Factorization for Full-perspective Cameras

[Han and Kanade]

Page 55: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM for Deformable Objects


Page 56: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM for Articulated Objects


Page 57: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

SFM Using Factorization

Bundle adjustment (nonlinear optimization)

- works with the perspective camera model

- works with incomplete data

- prone to local minima

Factorization

- closed-form solution for the weak-perspective camera

- simple and efficient

- usually needs complete data

- becomes complicated for the full-perspective camera model

Phil Torr's structure from motion toolkit in MATLAB

Voodoo Camera Tracker

Page 58: CSCE 643 Computer Vision: Structure from Motion Jinxiang Chai

All Together Video


- feature detection

- feature matching (epipolar geometry)

- structure from motion

- stereo reconstruction

- triangulation

- texture mapping