Low Complexity Keypoint Recognition and Pose Estimation

Vincent Lepetit

Real-Time 3D Object Detection

Runs at 15 Hz


Keypoint Recognition

The general approach [Lowe, Matas, Mikolajczyk] is a particular case of classification:

• Pre-processing: make the actual classification easier;
• Nearest-neighbor classification: search in the database.

One class per keypoint: the set of the keypoint's possible appearances under various perspectives, lighting conditions, noise, ...

The training phase produces a classifier, which is used at run time to recognize the keypoints.

A New Classifier: Ferns

Joint Work with Mustafa Özuysal

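We are looking for
$$\hat{c} = \arg\max_i P(C = c_i \mid \text{patch}).$$

If the patch can be represented by a set of image features $\{f_j\}$:
$$P(C = c_i \mid \text{patch}) = P(C = c_i \mid f_1, f_2, \ldots, f_n, f_{n+1}, \ldots, f_N),$$
which, by Bayes' rule and assuming a uniform prior $P(C)$, is proportional to
$$P(f_1, f_2, \ldots, f_N \mid C = c_i),$$
but a complete representation of this joint distribution is infeasible.

Naive Bayes ignores the correlation between the features:
$$P(f_1, \ldots, f_N \mid C = c_i) \approx \prod_{j=1}^{N} P(f_j \mid C = c_i).$$

Compromise: split the features into $M$ small groups $F_1, \ldots, F_M$ (the ferns), model the joint distribution within each group, and assume the groups are independent:
$$P(f_1, \ldots, f_N \mid C = c_i) \approx \prod_{k=1}^{M} P(F_k \mid C = c_i).$$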
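To see why the full joint distribution is out of reach while the ferns are not, the sketch below counts the probabilities that must be stored per class for N binary features under each model, with the N features split into M ferns of S features. The function names and the example values (M = 30, S = 10) are hypothetical, chosen only for illustration.

```c
#include <stdio.h>

/* Probabilities stored per class for N = M * S binary features. */
typedef struct {
    unsigned full_joint_log2;   /* full joint table: 2^N entries (too many for an integer) */
    unsigned long long naive;   /* naive Bayes: 2 entries per feature */
    unsigned long long ferns;   /* ferns: 2^S entries per fern */
} ParamCount;

/* Hypothetical helper: M ferns of S binary features each (requires S < 64). */
ParamCount count_params(unsigned M, unsigned S)
{
    ParamCount c;
    unsigned N = M * S;
    c.full_joint_log2 = N;
    c.naive = 2ULL * N;
    c.ferns = (unsigned long long)M << S;   /* M * 2^S */
    return c;
}

void print_params(unsigned M, unsigned S)
{
    ParamCount c = count_params(M, S);
    printf("joint: 2^%u, naive Bayes: %llu, ferns: %llu\n",
           c.full_joint_log2, c.naive, c.ferns);
}
```

With these values, `print_params(30, 10)` prints `joint: 2^300, naive Bayes: 600, ferns: 30720`: the ferns stay tractable while still modeling the dependencies inside each group of S features.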

Presentation on an Example

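Ferns: Training

The tests compare the intensities of two pixels around the keypoint:
$$f_j = \begin{cases} 1 & \text{if } I(\mathbf{m}_{j,1}) < I(\mathbf{m}_{j,2}), \\ 0 & \text{otherwise.} \end{cases}$$

These tests are invariant to any light change that can be modeled by a monotonically increasing function of the intensities.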
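The sketch below evaluates the S binary tests of one fern and packs their outcomes into a leaf index, so that the invariance claim can be checked on a tiny image. The `PixelTest` type, the row-major offset convention, and the example transformation I -> 2I + 10 are assumptions made for illustration.

```c
#include <stddef.h>

/* One binary test: two pixel offsets relative to the keypoint,
   in a row-major image (hypothetical representation). */
typedef struct { ptrdiff_t a, b; } PixelTest;

/* Evaluates the S tests of one fern on the keypoint at `center` and packs
   the S resulting bits into an integer in [0, 2^S). */
unsigned fern_index(const int *center, const PixelTest *tests, int S)
{
    unsigned index = 0;
    for (int j = 0; j < S; j++) {
        index <<= 1;                                  /* room for the next bit */
        if (center[tests[j].a] < center[tests[j].b])  /* f_j = 1 iff I(a) < I(b) */
            index |= 1u;
    }
    return index;
}

/* Applies a strictly increasing intensity change, I -> 2I + 10
   (an arbitrary example), to every pixel. */
void brighten(const int *src, int *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = 2 * src[i] + 10;
}
```

Because a strictly increasing function preserves every strict inequality `I(a) < I(b)`, `fern_index` returns the same leaf before and after `brighten`; only the ordering of the two intensities matters, not their values.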

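Posterior probabilities: for each fern $F_k$ and each class $c_i$, the distribution $P(F_k \mid C = c_i)$ is estimated by counting how many training samples of class $c_i$ fall into each leaf of $F_k$.

[Figure: binary test outcomes of the training samples; each sample increments (++) the counter of the leaf it reaches.]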

Ferns: Training Results

Ferns: Recognition

It Really Works






Ferns Outperform Trees

500 classes. No orientation or perspective correction.

[Plot: recognition rate vs. number of structures, for FERNS and TREES.]

Ferns responses are combined multiplicatively (naive Bayesian rule).

Trees responses are combined additively (average).
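The two combination rules can be written side by side as below. The probability values in the usage note are made up to show that the rules can disagree; they are not taken from the experiment above.

```c
/* probs[s * H + c] is the probability of class c according to structure s
   (n structures, H classes). Both functions return the winning class. */

/* Ferns: multiply the responses (naive Bayesian rule). */
int combine_multiplicative(const double *probs, int n, int H)
{
    int best = 0;
    double best_score = -1.0;
    for (int c = 0; c < H; c++) {
        double score = 1.0;
        for (int s = 0; s < n; s++) score *= probs[s * H + c];
        if (score > best_score) { best_score = score; best = c; }
    }
    return best;
}

/* Trees: average the responses. */
int combine_additive(const double *probs, int n, int H)
{
    int best = 0;
    double best_score = -1.0;
    for (int c = 0; c < H; c++) {
        double score = 0.0;
        for (int s = 0; s < n; s++) score += probs[s * H + c];
        score /= n;
        if (score > best_score) { best_score = score; best = c; }
    }
    return best;
}
```

With `probs = {0.90, 0.05, 0.05, 0.02, 0.50, 0.48}` (two structures, three classes), the products are 0.018, 0.025 and 0.024, so the multiplicative rule returns class 1, while the averages are 0.46, 0.275 and 0.265, so the additive rule returns class 0: under the product, one structure that finds a class very unlikely is enough to reject it. With many structures an implementation would sum log-probabilities instead, presumably what the `P[i] += p[i]` accumulation in the ten-line classifier does, to avoid numerical underflow.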

Optimized Locations versus Random Locations: We Can Use Random Tests

Comparison of the recognition rates for 200 keypoints:

[Plot: recognition rate vs. number of trees, with tests selected by information gain optimization and with random tests.]

We Can Use Random Tests

For a small number of classes, we can try several tests and retain the best one according to some criterion.

When the number of classes is large, any test does a decent job.
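Drawing random tests is then all the fern structure needs. The sketch below picks pairs of distinct pixels uniformly in a square patch around the keypoint and stores them as offsets in a row-major image, two offsets per test, as in the `D` array of the ten-line classifier. The patch shape, the uniform sampling, and the xorshift generator are assumptions; the slides do not say how the random locations are drawn.

```c
#include <stdint.h>

/* xorshift32: a tiny deterministic PRNG, used only to keep the example
   reproducible (the state must be non-zero). */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fills D with `count` random tests: each test is a pair of distinct pixels
   drawn uniformly in the (2r+1) x (2r+1) patch centered on the keypoint,
   stored as offsets into a row-major image of width `width` (> 2r). */
void random_tests(int *D, int count, int r, int width, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;
    const uint32_t side = (uint32_t)(2 * r + 1), area = side * side;
    for (int t = 0; t < count; t++) {
        uint32_t a, b;
        do {
            a = xorshift32(&state) % area;
            b = xorshift32(&state) % area;
        } while (a == b);              /* comparing a pixel with itself is useless */
        D[2 * t]     = ((int)(a / side) - r) * width + ((int)(a % side) - r);
        D[2 * t + 1] = ((int)(b / side) - r) * width + ((int)(b % side) - r);
    }
}
```

For example, `random_tests(D, M * S, 15, 640, 42u)` would fill `D` for M ferns of S tests on a 31x31 patch in a 640-pixel-wide image; no optimization step is involved.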

Another Graphical Interpretation


We Can Use Random Tests: Why It Is Interesting

• Building the ferns takes no time (apart from estimating the posterior probabilities);
• It simplifies the classifier structure;
• It allows incremental learning.

Comparison with SIFT: Recognition Rate

[Plot: number of inliers vs. frame index, for FERNS and SIFT.]

Comparison with SIFT: Computation Time

• SIFT: 1 ms to compute the descriptor of a keypoint (not including the convolution);
• FERNS: 13.5 µs to classify one keypoint into 200 classes.

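Keypoint Recognition in Ten Lines of Code

void classify(const unsigned char *K, const int *D, const float *PF,
              int M, int S, int H, float *P)
{
  /* K: keypoint pixel; D: 2*S pixel offsets per fern; M ferns, H classes.
     PF: log-probabilities laid out as [fern][leaf][class]. */
  const int shift1 = H, shift2 = (1 << S) * H;
  for (int i = 0; i < H; i++) P[i] = 0.f;
  for (int k = 0; k < M; k++) {                 /* for each fern */
    int index = 0;
    const int *d = D + k * 2 * S;               /* its S offset pairs */
    for (int j = 0; j < S; j++) {               /* one bit per test */
      index <<= 1;
      if (K[d[0]] < K[d[1]]) index++;
      d += 2;
    }
    const float *p = PF + k * shift2 + index * shift1;
    for (int i = 0; i < H; i++) P[i] += p[i];   /* sum of logs = product */
  }
}

• Very simple to implement;
• No need for orientation or perspective correction;
• (Almost) no parameters to tune;
• Very fast.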

Ferns Tuning

• The number of ferns, and

• The number of tests per fern

can be tuned to adapt to the hardware in terms of CPU power and memory size.
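The trade-off can be estimated directly: memory grows as M * 2^S * H, while the work per keypoint grows as M * S comparisons plus M * H additions. The sketch below assumes one `float` per table entry, as in the ten-line classifier; the function names and the example values are hypothetical.

```c
#include <stddef.h>

/* Size in bytes of the posterior table for M ferns of S tests and H classes,
   assuming one float per (fern, leaf, class) entry. */
size_t ferns_memory_bytes(int M, int S, int H)
{
    return (size_t)M * ((size_t)1 << S) * (size_t)H * sizeof(float);
}

/* Work per keypoint: one pixel comparison per test ... */
long ferns_comparisons(int M, int S) { return (long)M * S; }

/* ... and H additions per fern to accumulate the class scores. */
long ferns_additions(int M, int H) { return (long)M * H; }
```

For M = 30, S = 10 and H = 200, the table takes 30 x 1024 x 200 x 4 = 24,576,000 bytes (about 23.4 MiB with 4-byte floats), and a keypoint costs 300 comparisons and 6,000 additions. One more test per fern doubles the memory, while one more fern increases both memory and time only linearly.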

Feature Harvesting

Estimate the posterior probabilities from a training video sequence:

[Diagram: a loop alternating "Detect Object in Current Frame" and "Update Classifier".]

Incremental Learning

With the ferns, we can easily:

- add a class;
- remove a class;
- add samples of a class to refine the classifier.

No need to store image patches; we can select the keypoints the classifier can recognize.
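One way to support these three operations, sketched below under our own assumptions, is to keep raw leaf counters per class, from which the posteriors can be re-estimated when needed. Storing each class in its own block makes adding or removing a class cheap, and a new sample only increments M counters, so no image patch has to be kept. The `IncrementalFerns` type and its functions are hypothetical.

```c
#include <stdlib.h>

/* Raw leaf counters, one block per class so that classes can be added
   or removed independently: counts[c][k * 2^S + leaf]. */
typedef struct {
    int M, S;             /* ferns, tests per fern */
    int n_classes;
    unsigned **counts;
} IncrementalFerns;

/* Adds an empty class and returns its id, or -1 if allocation fails. */
int add_class(IncrementalFerns *f)
{
    unsigned **grown = realloc(f->counts, (size_t)(f->n_classes + 1) * sizeof *grown);
    if (!grown) return -1;
    f->counts = grown;
    f->counts[f->n_classes] = calloc((size_t)f->M << f->S, sizeof **grown);
    if (!f->counts[f->n_classes]) return -1;
    return f->n_classes++;
}

/* Removes class c; the last class is moved into its slot (and changes id). */
void remove_class(IncrementalFerns *f, int c)
{
    free(f->counts[c]);
    f->counts[c] = f->counts[--f->n_classes];
}

/* Refines class c with one new sample, given the leaf it reaches in each fern.
   Only counters change: the image patch itself is not stored. */
void add_sample(IncrementalFerns *f, int c, const unsigned *leaf)
{
    for (int k = 0; k < f->M; k++)
        f->counts[c][((size_t)k << f->S) + leaf[k]]++;
}
```

A structure starts empty, e.g. `IncrementalFerns f = {M, S, 0, NULL};`, since `realloc(NULL, ...)` behaves like `malloc`.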

Test Sequence

[Figure: training examples and matches.]



Handling Light Changes














EPnP: An Accurate Non-Iterative O(n) Solution to the PnP Problem

Joint Work with Francesc Moreno-Noguer


The Perspective-n-Point (PnP) Problem

Known: the internal parameters and n 2D/3D correspondences. Unknown: the rotation and translation.

How can we take advantage of the internal parameters?

Solutions exist for the specific cases n = 3 [...], n = 4 [...], n = 5 [...], and the general case [...].

A Stable Algorithm

[Plots (MEAN and MEDIAN): rotation error (%), $\|\mathbf{q}_{\text{true}} - \mathbf{q}\| / \|\mathbf{q}\|$, vs. number of points used to estimate the pose.]

LHM: Lu-Hager-Mjolsness, Fast and Globally Convergent Pose Estimation from Video Images. PAMI'00. (Alternately optimizes over rotation and translation);

EPnP: our method.

A Fast Algorithm

[Plot (MEDIAN): rotation error (%), $\|\mathbf{q}_{\text{true}} - \mathbf{q}\| / \|\mathbf{q}\|$, vs. computation time (sec, logarithmic scale).]

General Approach

Estimate the coordinates of the 3D points in the camera coordinate system: starting from the known $\mathbf{p}_i^{\text{world}}$, estimate the unknown $\mathbf{p}_i^{\text{camera}}$. The rotation and translation then follow from the known $\mathbf{p}_i^{\text{world}}$ and the estimated $\mathbf{p}_i^{\text{camera}}$ [Lu et al. PAMI00].

Introducing Control Points

The 3D points are expressed as a weighted sum of four control points $\mathbf{c}_1, \ldots, \mathbf{c}_4$:
$$\mathbf{p}_i = \sum_{j=1}^{4} \alpha_{ij} \mathbf{c}_j$$

12 unknowns: the coordinates of the control points in the camera coordinate system,
$$\mathbf{x} = \begin{bmatrix} \mathbf{c}_1^{\text{camera}} \\ \mathbf{c}_2^{\text{camera}} \\ \mathbf{c}_3^{\text{camera}} \\ \mathbf{c}_4^{\text{camera}} \end{bmatrix}.$$
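In the EPnP formulation the weights $\alpha_{ij}$ are barycentric coordinates: they also satisfy $\sum_j \alpha_{ij} = 1$, so they can be computed once from the world coordinates and remain valid in the camera frame. The sketch below computes them for one point with Cramer's rule; the function names and the choice of Cramer's rule are ours.

```c
/* Determinant of a 3x3 matrix. */
static double det3(double m[3][3])
{
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

/* Computes alpha[0..3] with p = sum_j alpha[j] * c[j] and sum_j alpha[j] = 1,
   for four non-coplanar control points c[0..3].
   Returns -1 if the control points are coplanar. */
int control_point_weights(double c[4][3], const double p[3], double alpha[4])
{
    double A[3][3], b[3];
    /* p - c0 = alpha1 (c1 - c0) + alpha2 (c2 - c0) + alpha3 (c3 - c0) */
    for (int r = 0; r < 3; r++) {
        for (int j = 0; j < 3; j++) A[r][j] = c[j + 1][r] - c[0][r];
        b[r] = p[r] - c[0][r];
    }
    double det = det3(A);
    if (det == 0.0) return -1;
    for (int j = 0; j < 3; j++) {          /* Cramer's rule: column j -> b */
        double Aj[3][3];
        for (int r = 0; r < 3; r++)
            for (int k = 0; k < 3; k++) Aj[r][k] = (k == j) ? b[r] : A[r][k];
        alpha[j + 1] = det3(Aj) / det;
    }
    alpha[0] = 1.0 - alpha[1] - alpha[2] - alpha[3];
    return 0;
}
```

With the unit control points (origin and the three axis unit vectors), the point (0.2, 0.3, 0.4) gets the weights (0.1, 0.2, 0.3, 0.4).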

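The Point Reprojections Give a Linear System

For each correspondence $i$:
$$w_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = \mathbf{A}\,\mathbf{p}_i^{\text{camera}} = \mathbf{A} \sum_{j=1}^{4} \alpha_{ij} \mathbf{c}_j^{\text{camera}}$$

Rewriting and concatenating the equations from all the correspondences:
$$\mathbf{M}\mathbf{x} = \mathbf{0}, \quad \text{with } \mathbf{x} = \begin{bmatrix} \mathbf{c}_1^{\text{camera}} \\ \mathbf{c}_2^{\text{camera}} \\ \mathbf{c}_3^{\text{camera}} \\ \mathbf{c}_4^{\text{camera}} \end{bmatrix}.$$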
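With $\mathbf{A}$ holding the focal lengths $f_u, f_v$ and the principal point $(u_c, v_c)$, the third row gives $w_i = \sum_j \alpha_{ij} z_j^{\text{camera}}$, and substituting it into the first two rows yields two rows of $\mathbf{M}$ per correspondence: $\sum_j \alpha_{ij} f_u x_j^{\text{camera}} + \alpha_{ij}(u_c - u_i) z_j^{\text{camera}} = 0$ and $\sum_j \alpha_{ij} f_v y_j^{\text{camera}} + \alpha_{ij}(v_c - v_i) z_j^{\text{camera}} = 0$. A sketch of that row construction, assuming zero skew and a hypothetical `Intrinsics` struct:

```c
/* Intrinsics of A (zero skew assumed): focal lengths and principal point. */
typedef struct { double fu, fv, uc, vc; } Intrinsics;

/* Writes the two rows of M contributed by one correspondence with weights
   alpha[0..3] and image location (u, v). The unknown vector is
   x = (c1, c2, c3, c4) in camera coordinates, c_j at entries 3j..3j+2. */
void epnp_rows(const Intrinsics *A, const double alpha[4], double u, double v,
               double row_u[12], double row_v[12])
{
    for (int j = 0; j < 4; j++) {
        row_u[3 * j]     = alpha[j] * A->fu;          /* x_j */
        row_u[3 * j + 1] = 0.0;                       /* y_j */
        row_u[3 * j + 2] = alpha[j] * (A->uc - u);    /* z_j */
        row_v[3 * j]     = 0.0;
        row_v[3 * j + 1] = alpha[j] * A->fv;
        row_v[3 * j + 2] = alpha[j] * (A->vc - v);
    }
}
```

Stacking these rows for the n correspondences gives the 2n x 12 matrix $\mathbf{M}$; any control points that project correctly satisfy both rows exactly.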

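The Solution as Weighted Sum of Eigenvectors

$\mathbf{M}\mathbf{x} = \mathbf{0} \Rightarrow \mathbf{M}^\top\mathbf{M}\mathbf{x} = \mathbf{0}$, so $\mathbf{x}$ belongs to the null space of $\mathbf{M}^\top\mathbf{M}$:
$$\exists N, \{\beta_i\} \text{ such that } \mathbf{x} = \sum_{i=1}^{N} \beta_i \mathbf{v}_i,$$
with $\mathbf{v}_i$ the eigenvectors of the matrix $\mathbf{M}^\top\mathbf{M}$ associated with null eigenvalues.

Computing $\mathbf{M}^\top\mathbf{M}$ is the most costly operation, and it is linear in $n$, the number of correspondences.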
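Because $\mathbf{M}^\top\mathbf{M} = \sum_r \mathbf{m}_r \mathbf{m}_r^\top$ over the rows $\mathbf{m}_r$ of $\mathbf{M}$, it can be accumulated row by row without storing $\mathbf{M}$, at a constant 12x12 cost per row; this is where the linear dependence on $n$ comes from. A sketch (the null eigenvectors would then come from a symmetric eigensolver, for example LAPACK's `dsyev`, not shown):

```c
/* Adds row * row^T to MtM (12 x 12). MtM must be zeroed beforehand;
   calling this for the 2n rows of M gives M^T M in O(n) time. */
void accumulate_mtm(double MtM[12][12], const double row[12])
{
    for (int a = 0; a < 12; a++)
        for (int b = 0; b < 12; b++)
            MtM[a][b] += row[a] * row[b];
}
```

Only the 12x12 result is kept, so memory does not grow with the number of correspondences either.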

From 12 Unknowns to 1, 2, 3, or 4

• The $\beta_i$ are our $N$ new unknowns;
• $N$ is the dimension of the null space of $\mathbf{M}^\top\mathbf{M}$;
• Without noise: $N = 1$ (scale ambiguity);
• In practice: no zero eigenvalues, but several very small ones, so $N \geq 1$ (depending on the noise on the 2D locations).

We found that only the cases N = 1, 2, 3, and 4 must be considered.

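How the Control Points Vary with the $\beta_i$

$$\mathbf{x} = \sum_{i=1}^{N} \beta_i \mathbf{v}_i = \begin{bmatrix} \mathbf{c}_1^{\text{camera}} \\ \mathbf{c}_2^{\text{camera}} \\ \mathbf{c}_3^{\text{camera}} \\ \mathbf{c}_4^{\text{camera}} \end{bmatrix}$$

When varying the $\beta_i$:

[Figure panels: reprojections in the image; corresponding 3D points.]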

Imposing the Rigidity Constraint

The distances between the control points must be preserved:
$$\|\mathbf{c}_k^{\text{camera}} - \mathbf{c}_l^{\text{camera}}\|^2 = \underbrace{\|\mathbf{c}_k^{\text{world}} - \mathbf{c}_l^{\text{world}}\|^2}_{\text{known}}$$

One equation per pair of control points gives $\binom{4}{2} = 6$ quadratic equations in the $\beta_i$.

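The Case N = 1

$$\begin{bmatrix} \mathbf{c}_1^{\text{camera}} \\ \mathbf{c}_2^{\text{camera}} \\ \mathbf{c}_3^{\text{camera}} \\ \mathbf{c}_4^{\text{camera}} \end{bmatrix} = \beta_1 \mathbf{v}_1,$$
and 6 quadratic equations:
$$\|\mathbf{c}_k^{\text{camera}} - \mathbf{c}_l^{\text{camera}}\|^2 = \underbrace{\|\mathbf{c}_k^{\text{world}} - \mathbf{c}_l^{\text{world}}\|^2}_{\text{known}}$$

$\beta_1$ can easily be computed:

• Its absolute value is the solution of a linear system:
$$|\beta_1| \times \|\mathbf{v}_1^{[k]} - \mathbf{v}_1^{[l]}\| = \|\mathbf{c}_k^{\text{world}} - \mathbf{c}_l^{\text{world}}\|,$$
where $\mathbf{v}_1^{[k]}$ is the 3-vector of $\mathbf{v}_1$ corresponding to control point $k$;
• Its sign is chosen so that the handedness of the control points is preserved.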

The Case N = 2

$$\begin{bmatrix} \mathbf{c}_1^{\text{camera}} \\ \mathbf{c}_2^{\text{camera}} \\ \mathbf{c}_3^{\text{camera}} \\ \mathbf{c}_4^{\text{camera}} \end{bmatrix} = \beta_1 \mathbf{v}_1 + \beta_2 \mathbf{v}_2, \qquad \|\mathbf{c}_k^{\text{camera}} - \mathbf{c}_l^{\text{camera}}\|^2 = \underbrace{\|\mathbf{c}_k^{\text{world}} - \mathbf{c}_l^{\text{world}}\|^2}_{\text{known}}$$

We use the linearization technique. It gives 6 linear equations in $\beta_{11} = \beta_1^2$, $\beta_{12} = \beta_1 \beta_2$, and $\beta_{22} = \beta_2^2$:
$$\underbrace{\begin{bmatrix} l_{11} & l_{12} & l_{13} \\ l_{21} & l_{22} & l_{23} \\ l_{31} & l_{32} & l_{33} \\ l_{41} & l_{42} & l_{43} \\ l_{51} & l_{52} & l_{53} \\ l_{61} & l_{62} & l_{63} \end{bmatrix}}_{6 \text{ equations}} \begin{bmatrix} \beta_{11} \\ \beta_{12} \\ \beta_{22} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \rho_3 \\ \rho_4 \\ \rho_5 \\ \rho_6 \end{bmatrix}$$

The Case N = 3

$$\begin{bmatrix} \mathbf{c}_1^{\text{camera}} \\ \mathbf{c}_2^{\text{camera}} \\ \mathbf{c}_3^{\text{camera}} \\ \mathbf{c}_4^{\text{camera}} \end{bmatrix} = \sum_{i=1}^{3} \beta_i \mathbf{v}_i, \qquad \|\mathbf{c}_k^{\text{camera}} - \mathbf{c}_l^{\text{camera}}\|^2 = \underbrace{\|\mathbf{c}_k^{\text{world}} - \mathbf{c}_l^{\text{world}}\|^2}_{\text{known}}$$

Same linearization technique. It gives 6 linear equations for 6 unknowns:
$$\underbrace{\begin{bmatrix} l_{11} & l_{12} & \cdots & l_{16} \\ \vdots & \vdots & & \vdots \\ l_{61} & l_{62} & \cdots & l_{66} \end{bmatrix}}_{6 \text{ equations}} \begin{bmatrix} \beta_{11} \\ \beta_{12} \\ \beta_{13} \\ \beta_{22} \\ \beta_{23} \\ \beta_{33} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \vdots \\ \rho_6 \end{bmatrix}$$

The Case N = 4

Six quadratic equations in $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$. The linearization introduces 10 products $\beta_{ab} = \beta_a \beta_b$:
$$\underbrace{\begin{bmatrix} l_{11} & l_{12} & \cdots & l_{1,10} \\ \vdots & \vdots & & \vdots \\ l_{61} & l_{62} & \cdots & l_{6,10} \end{bmatrix}}_{6 \text{ equations}} \begin{bmatrix} \beta_{11} & \beta_{12} & \beta_{13} & \beta_{14} & \beta_{22} & \beta_{23} & \beta_{24} & \beta_{33} & \beta_{34} & \beta_{44} \end{bmatrix}^\top = \begin{bmatrix} \rho_1 \\ \vdots \\ \rho_6 \end{bmatrix}$$

Not enough equations anymore (6 equations, 10 unknowns)! Relinearization: the $\beta_{ab}$ are expressed as a linear combination of eigenvectors; the weights of that combination are then constrained by identities such as $\beta_{ab}\beta_{cd} = \beta_{ac}\beta_{bd}$, which hold because $\beta_{ab} = \beta_a \beta_b$.

Algorithm Summary

1. The coordinates of the control points are the (12) unknowns;

2. The 3D points should project onto the given corresponding 2D locations: this gives a linear system in the coordinates of the control points.

3. The coordinates of the control points can be expressed as a linear combination of the null eigenvectors of this linear system: the weights (the $\beta_i$) are the new unknowns (no more than 4).

4. Adding the rigidity constraints gives quadratic equations in the $\beta_i$.

5. How the $\beta_i$ are solved for depends on their number (linearization or relinearization).

Results






Thank you.

Questions?

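The Point Reprojections Give a Linear System

From point reprojection, for each correspondence $i$:
$$w_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = \mathbf{A}\,\mathbf{p}_i^{\text{camera}} = \mathbf{A} \sum_{j=1}^{4} \alpha_{ij} \mathbf{c}_j^{\text{camera}}
\;\Leftrightarrow\; \forall i,\; w_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = \begin{bmatrix} f_u & 0 & u_c \\ 0 & f_v & v_c \\ 0 & 0 & 1 \end{bmatrix} \sum_{j=1}^{4} \alpha_{ij} \begin{bmatrix} x_j^{\text{camera}} \\ y_j^{\text{camera}} \\ z_j^{\text{camera}} \end{bmatrix}$$

Let's expand (the third row gives $w_i = \sum_{j=1}^{4} \alpha_{ij} z_j^{\text{camera}}$, which is substituted into the first two rows):
$$\Leftrightarrow \begin{cases} \sum_{j=1}^{4} \alpha_{ij} f_u x_j^{\text{camera}} + \alpha_{ij} (u_c - u_i) z_j^{\text{camera}} = 0 \\ \sum_{j=1}^{4} \alpha_{ij} f_v y_j^{\text{camera}} + \alpha_{ij} (v_c - v_i) z_j^{\text{camera}} = 0 \end{cases}$$

Concatenating the equations from all the correspondences:
$$\Leftrightarrow \mathbf{M}\mathbf{x} = \mathbf{0}, \quad \text{with } \mathbf{x} = \begin{bmatrix} \mathbf{c}_1^{\text{camera}} \\ \mathbf{c}_2^{\text{camera}} \\ \mathbf{c}_3^{\text{camera}} \\ \mathbf{c}_4^{\text{camera}} \end{bmatrix}.$$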
