Low Complexity Keypoint Recognition and Pose Estimation
Vincent Lepetit
Real-Time 3D Object Detection
Runs at 15 Hz
Keypoint Recognition
The general approach [Lowe, Matas, Mikolajczyk] is a particular case of classification:
- Pre-processing: make the actual classification easier;
- Search in the database: nearest-neighbor classification.
One class per keypoint: the set of the keypoint’s possible appearances under various perspectives, lighting conditions, noise...
Training phase → Classifier, used at run time to recognize the keypoints.
A New Classifier: Ferns
Joint Work with Mustafa Özuysal
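We are looking for
$$\arg\max_i P(C = c_i \mid \text{patch})$$
If the patch can be represented by a set of image features $\{f_i\}$:
$$P(C = c_i \mid \text{patch}) = P(C = c_i \mid f_1, f_2, \ldots, f_n, f_{n+1}, \ldots, f_N)$$
which, assuming a uniform prior over the classes, is proportional to
$$P(f_1, f_2, \ldots, f_N \mid C = c_i)$$
but a complete representation of the joint distribution is infeasible.
Naive Bayesian ignores the correlation:
$$P(f_1, f_2, \ldots, f_N \mid C = c_i) \approx \prod_{j=1}^{N} P(f_j \mid C = c_i)$$
Compromise:
$$P(f_1, f_2, \ldots, f_N \mid C = c_i) \approx \prod_{k=1}^{M} P(F_k \mid C = c_i)$$
where the N features are split into M groups (the ferns) $F_k$ of $S = N/M$ features each.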
Presentation on an Example
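Ferns: Training
The tests compare the intensities of two pixels around the keypoint:
$$f_i = \begin{cases} 1 & \text{if } I(m_{i,1}) < I(m_{i,2}) \\ 0 & \text{otherwise} \end{cases}$$
where $I$ is the image intensity and $m_{i,1}$, $m_{i,2}$ are the two pixel locations.
Invariant to any light change that transforms the intensities by a monotonically increasing function.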
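A minimal sketch of one binary test, with a hypothetical lighting change used to check the invariance. It assumes intensities stored as `int`, so that the change can be strictly increasing; with 8-bit storage, a saturating change could map two different intensities to the same value and turn a true strict comparison into 0.

```c
#include <stddef.h>

/* One fern test: returns 1 if I(m1) < I(m2), 0 otherwise.
   img is a row-major intensity image of the given width; (x1,y1) and (x2,y2)
   are the two compared pixel locations (assumed inside the image). */
int binary_test(const int *img, int width, int x1, int y1, int x2, int y2)
{
    return img[(size_t)y1 * width + x1] < img[(size_t)y2 * width + x2];
}

/* Hypothetical lighting change: gain 3, offset 7. Any strictly increasing
   function preserves the order of two intensities, so it cannot change the
   outcome of binary_test. */
int light_change(int intensity) { return 3 * intensity + 7; }

/* Writes light_change(img[i]) into out[i] for the n pixels of the image. */
void apply_light_change(const int *img, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) out[i] = light_change(img[i]);
}
```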
Posterior probabilities:
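[Figure: the outcomes of the binary tests of a fern are read as a binary number that indexes one bin of the class histogram; each training sample increments (++) the bin it falls into.]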
Ferns: Training Results
Ferns: Recognition
It Really Works
Ferns Outperform Trees
500 classes. No orientation or perspective correction.
[Figure: recognition rate versus number of structures, for ferns and trees.]
Ferns responses are combined multiplicatively (naive Bayesian rule);
trees responses are combined additively (average).
Optimized Locations versus Random Locations: We Can Use Random Tests
Comparison of the recognition rates for 200 keypoints:
[Figure: recognition rate versus number of trees, with information gain optimization and with randomness.]
We Can Use Random Tests
For a small number of classes, we can try several tests and retain the best one according to some criterion.
When the number of classes is large, any test does a decent job:
Another Graphical Interpretation
We Can Use Random Tests: Why It Is Interesting
Building the ferns takes no time (except for the posterior probabilities estimation);
Simplifies the classifier structure;
Allows incremental learning.
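Since random tests do a decent job, building the ferns reduces to drawing pixel pairs. A minimal sketch, assuming a square patch of radius r around the keypoint and C's `rand()` as the random source; `random_tests` is a hypothetical name.

```c
#include <stdlib.h>

/* Draws the pixel pairs of M ferns with S tests each, uniformly at random in a
   (2r+1) x (2r+1) patch centred on the keypoint (requires r >= 1). Each
   location is stored as a linear offset dy * width + dx, the form read by the
   ten-line recognition code; width must exceed 2r so that distinct locations
   get distinct offsets. */
void random_tests(int *D, int M, int S, int r, int width, unsigned seed)
{
    int side = 2 * r + 1;
    srand(seed);
    for (int t = 0; t < M * S; t++) {
        int dx1 = rand() % side - r, dy1 = rand() % side - r;
        int dx2, dy2;
        do { /* never compare a pixel with itself: that test is always 0 */
            dx2 = rand() % side - r;
            dy2 = rand() % side - r;
        } while (dx2 == dx1 && dy2 == dy1);
        D[2 * t]     = dy1 * width + dx1;
        D[2 * t + 1] = dy2 * width + dx2;
    }
}
```

No criterion is evaluated, so this takes negligible time compared with estimating the posterior probabilities.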
Comparison with SIFT: Recognition Rate
[Figure: number of inliers versus frame index, for ferns and SIFT.]
Comparison with SIFT: Computation Time
• SIFT: 1 ms to compute the descriptor of a keypoint (not including the convolution);
• FERNS: 13.5 µs to classify one keypoint into 200 classes.
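Keypoint Recognition in Ten Lines of Code
/* The ten lines wrapped in a function. K: keypoint pixel in an 8-bit image; D: 2*S pixel
   offsets per fern; PF: log-posteriors, assumed laid out as [fern][leaf][class] (hence
   shift1, shift2); P: summed class scores for the H classes, the best class has the max. */
void ferns_recognize(const unsigned char *K, const int *D, const float *PF,
                     int M, int S, int H, float *P)
{
  const int shift1 = H, shift2 = (1 << S) * H;
  for (int i = 0; i < H; i++) P[i] = 0.f;
  for (int k = 0; k < M; k++) {
    int index = 0; const int *d = D + k * 2 * S;
    for (int j = 0; j < S; j++) {
      index <<= 1;
      if (*(K + d[0]) < *(K + d[1])) index++;
      d += 2;
    }
    const float *p = PF + k * shift2 + index * shift1;
    for (int i = 0; i < H; i++) P[i] += p[i];
  }
}
• Very simple to implement;
• Neither orientation nor perspective correction needed;
• (Almost) no parameters to tune;
• Very fast.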
Ferns Tuning
• The number of ferns, and
• The number of tests per fern
can be tuned to adapt to the hardware in terms of CPU power and memory size.
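The effect of the two parameters can be estimated with simple arithmetic. A sketch, assuming one `float` per (fern, leaf, class) entry as in the recognition loop; the function names are hypothetical. Memory doubles with each extra test per fern and grows linearly with the number of ferns, while the per-keypoint work is M·S comparisons plus M·H additions.

```c
#include <stddef.h>

/* Memory of the posterior tables: M ferns, 2^S leaves each, H classes,
   one float per entry. */
size_t ferns_memory_bytes(size_t M, size_t S, size_t H)
{
    return M * ((size_t)1 << S) * H * sizeof(float);
}

/* Per-keypoint work of the recognition loop. */
size_t ferns_comparisons(size_t M, size_t S) { return M * S; }
size_t ferns_additions(size_t M, size_t H) { return M * H; }
```

For example, 30 ferns of 10 tests over 200 classes take 30 × 1024 × 200 × 4 = 24,576,000 bytes (about 23.4 MiB) with 4-byte floats.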
Feature Harvesting
Estimate the posterior probabilities from a training video sequence:
[Diagram: Detect Object in Current Frame → Update Classifier]
Incremental learning. With the ferns, we can easily:
- add a class;
- remove a class;
- add samples of a class to refine the classifier.
No need to store image patches; we can select the keypoints the classifier can recognize.
[Video: training examples, matches, and test sequence.]
Handling Light Changes
EPnP: An Accurate Non-Iterative O(n) Solution to the PnP Problem
Joint Work with Francesc Moreno-Noguer
The Perspective-n-Point (PnP) Problem
Known: the internal parameters and n 2D/3D correspondences. Unknown: the rotation and translation.
How can we take advantage of the internal parameters?
Solutions exist for the specific cases n = 3 [...], n = 4 [...], n = 5 [...], and the general case [...].
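For reference, the forward model that PnP inverts: a world point is moved into the camera frame by the rotation R and translation t, then projected with the internal parameters. A minimal sketch, assuming a pinhole matrix with focal lengths $f_u, f_v$, principal point $(u_c, v_c)$ and no skew, and R stored row-major; `project_point` is a hypothetical name.

```c
/* p_cam = R p_world + t, then w [u v 1]^T = A p_cam with
   A = [fu 0 uc; 0 fv vc; 0 0 1]. Returns 0 (and leaves u, v unchanged)
   if the point is not in front of the camera. */
int project_point(const double R[9], const double t[3],
                  double fu, double fv, double uc, double vc,
                  const double pw[3], double *u, double *v)
{
    double pc[3];
    for (int r = 0; r < 3; r++)
        pc[r] = R[3 * r] * pw[0] + R[3 * r + 1] * pw[1] + R[3 * r + 2] * pw[2] + t[r];
    if (pc[2] <= 0.0) return 0;
    *u = fu * pc[0] / pc[2] + uc;
    *v = fv * pc[1] / pc[2] + vc;
    return 1;
}
```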
A Stable Algorithm
[Figure: mean and median rotation error (%), $\|q_{true} - q\| / \|q\|$, versus the number of points used to estimate the pose.]
LHM: Lu-Hager-Mjolsness, Fast and Globally Convergent Pose Estimation from Video Images. PAMI'00. (Alternately optimizes over rotation and translation);
EPnP: our method.
A Fast Algorithm
[Figure: median rotation error (%), $\|q_{true} - q\| / \|q\|$, versus computation time (s, logarithmic scale).]
General Approach
Estimate the coordinates of the 3D points in the camera coordinate system.
[Figure: from the known $p_i^{world}$, estimate the unknown $p_i^{camera}$; the rotation and translation are then computed from the estimated $p_i^{camera}$ [Lu et al. PAMI00].]
Introducing Control Points
The 3D points are expressed as a weighted sum of four control points:
$$p_i = \sum_{j=1}^{4} \alpha_{ij} c_j$$
[Figure: a 3D point $p_i$ and the four control points $c_1$, $c_2$, $c_3$, $c_4$.]
$$x = \begin{bmatrix} c_1^{camera} \\ c_2^{camera} \\ c_3^{camera} \\ c_4^{camera} \end{bmatrix}$$
12 unknowns: the coordinates of the control points in the camera coordinate system.
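The weights $\alpha_{ij}$ can be computed once per point from the world coordinates, since the same weights then hold in the camera frame. A minimal sketch, assuming, as is standard for barycentric coordinates, that the four weights sum to one, which makes them unique when the control points are not coplanar; `barycentric` is a hypothetical name.

```c
#include <math.h>

/* Triple product a . (b x c): determinant of the 3x3 matrix [a b c]. */
static double det3(const double a[3], const double b[3], const double c[3])
{
    return a[0] * (b[1] * c[2] - b[2] * c[1])
         + a[1] * (b[2] * c[0] - b[0] * c[2])
         + a[2] * (b[0] * c[1] - b[1] * c[0]);
}

/* Computes alpha[0..3] with p = sum_j alpha[j] c[j] and sum_j alpha[j] = 1,
   by Cramer's rule on p - c1 = a2 (c2 - c1) + a3 (c3 - c1) + a4 (c4 - c1).
   Returns 0 if the control points are (nearly) coplanar. */
int barycentric(double c[4][3], const double p[3], double alpha[4])
{
    double e1[3], e2[3], e3[3], r[3];
    for (int k = 0; k < 3; k++) {
        e1[k] = c[1][k] - c[0][k];
        e2[k] = c[2][k] - c[0][k];
        e3[k] = c[3][k] - c[0][k];
        r[k]  = p[k] - c[0][k];
    }
    double d = det3(e1, e2, e3);
    if (fabs(d) < 1e-12) return 0;
    alpha[1] = det3(r, e2, e3) / d;
    alpha[2] = det3(e1, r, e3) / d;
    alpha[3] = det3(e1, e2, r) / d;
    alpha[0] = 1.0 - alpha[1] - alpha[2] - alpha[3];
    return 1;
}
```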
The Point Reprojections Give a Linear System
For each correspondence i, with A the matrix of internal parameters:
$$w_i \begin{bmatrix} \mathbf{u}_i \\ 1 \end{bmatrix} = A p_i^{camera} = A \sum_{j=1}^{4} \alpha_{ij} c_j^{camera}$$
Rewriting and concatenating the equations from all the correspondences:
$$M x = 0, \quad \text{with } x = \begin{bmatrix} c_1^{camera} \\ c_2^{camera} \\ c_3^{camera} \\ c_4^{camera} \end{bmatrix}$$
The Solution as Weighted Sum of Eigenvectors
$Mx = 0 \Leftrightarrow M^\top M x = 0$: x belongs to the null space of $M^\top M$:
$$\exists N, \{\beta_i\} \text{ such that } x = \sum_{i=1}^{N} \beta_i v_i$$
with $v_i$ the eigenvectors of the matrix $M^\top M$ associated with null eigenvalues.
Computing $M^\top M$ is the most costly operation, and it is linear in n, the number of correspondences.
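The linear cost comes from accumulating $M^\top M$ directly: each correspondence adds two rows r of M, each contributing $r r^\top$, so only a 12 × 12 matrix is stored and the work per correspondence is constant. A sketch with a hypothetical `mtm_add_row`:

```c
/* Adds r r^T to the 12 x 12 accumulator MtM. Calling it for the two rows of
   every correspondence yields M^T M in O(n) time and O(1) memory, without
   ever forming the 2n x 12 matrix M. */
void mtm_add_row(double MtM[12][12], const double r[12])
{
    for (int a = 0; a < 12; a++)
        for (int b = 0; b < 12; b++)
            MtM[a][b] += r[a] * r[b];
}
```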
From 12 Unknowns to 1, 2, 3, or 4
$$\exists N, \{\beta_i\} \text{ such that } x = \sum_{i=1}^{N} \beta_i v_i$$
• The $\beta_i$ are our N new unknowns;
• N is the dimension of the null space of $M^\top M$;
• Without noise: N = 1 (scale ambiguity).
• In practice: no zero eigenvalues, but several very small ones, and N ≥ 1 (depending on the noise on the 2D locations).
We found that only the cases N = 1, 2, 3, and 4 must be considered.
How the Control Points Vary with the $\beta_i$
When varying the $\beta_i$:
$$x = \sum_{i=1}^{N} \beta_i v_i = \begin{bmatrix} c_1^{camera} \\ c_2^{camera} \\ c_3^{camera} \\ c_4^{camera} \end{bmatrix}$$
[Video: reprojections in the image and the corresponding 3D points.]
Imposing the Rigidity Constraint
The distances between the control points must be preserved:
$$\|c_k^{camera} - c_l^{camera}\|^2 = \|c_k^{world} - c_l^{world}\|^2 \quad \text{(known)}$$
6 quadratic equations in the $\beta_i$, one per pair (k, l) of control points.
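The Case N = 1
$$\begin{bmatrix} c_1^{camera} \\ c_2^{camera} \\ c_3^{camera} \\ c_4^{camera} \end{bmatrix} = \beta_1 v_1$$
and 6 quadratic equations:
$$\|c_k^{camera} - c_l^{camera}\|^2 = \|c_k^{world} - c_l^{world}\|^2 \quad \text{(known)}$$
$\beta_1$ can easily be computed:
• Its absolute value is the solution of a linear system:
$$|\beta_1| \times \|v_1^{[k]} - v_1^{[l]}\| = \|c_k^{world} - c_l^{world}\|$$
where $v_1^{[k]}$ is the sub-vector of $v_1$ corresponding to control point k;
• Its sign is chosen so that the handedness of the control points is preserved.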
The Case N = 2
$$\begin{bmatrix} c_1^{camera} \\ c_2^{camera} \\ c_3^{camera} \\ c_4^{camera} \end{bmatrix} = \beta_1 v_1 + \beta_2 v_2$$
and 6 quadratic equations:
$$\|c_k^{camera} - c_l^{camera}\|^2 = \|c_k^{world} - c_l^{world}\|^2 \quad \text{(known)}$$
We use the linearization technique. It gives 6 linear equations in $\beta_{11} = \beta_1^2$, $\beta_{12} = \beta_1 \beta_2$, and $\beta_{22} = \beta_2^2$:
$$\begin{bmatrix} l_{11} & l_{12} & l_{13} \\ l_{21} & l_{22} & l_{23} \\ l_{31} & l_{32} & l_{33} \\ l_{41} & l_{42} & l_{43} \\ l_{51} & l_{52} & l_{53} \\ l_{61} & l_{62} & l_{63} \end{bmatrix} \begin{bmatrix} \beta_{11} \\ \beta_{12} \\ \beta_{22} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \rho_3 \\ \rho_4 \\ \rho_5 \\ \rho_6 \end{bmatrix} \quad \text{(6 equations)}$$
The Case N = 3
$$\begin{bmatrix} c_1^{camera} \\ c_2^{camera} \\ c_3^{camera} \\ c_4^{camera} \end{bmatrix} = \sum_{i=1}^{3} \beta_i v_i$$
and 6 quadratic equations:
$$\|c_k^{camera} - c_l^{camera}\|^2 = \|c_k^{world} - c_l^{world}\|^2 \quad \text{(known)}$$
Same linearization technique. It gives 6 linear equations for 6 unknowns:
$$\begin{bmatrix} l_{11} & l_{12} & l_{13} & l_{14} & l_{15} & l_{16} \\ l_{21} & l_{22} & l_{23} & l_{24} & l_{25} & l_{26} \\ l_{31} & l_{32} & l_{33} & l_{34} & l_{35} & l_{36} \\ l_{41} & l_{42} & l_{43} & l_{44} & l_{45} & l_{46} \\ l_{51} & l_{52} & l_{53} & l_{54} & l_{55} & l_{56} \\ l_{61} & l_{62} & l_{63} & l_{64} & l_{65} & l_{66} \end{bmatrix} \begin{bmatrix} \beta_{11} \\ \beta_{12} \\ \beta_{13} \\ \beta_{22} \\ \beta_{23} \\ \beta_{33} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \rho_3 \\ \rho_4 \\ \rho_5 \\ \rho_6 \end{bmatrix} \quad \text{(6 equations)}$$
The Case N = 4
Six quadratic equations in $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$. The linearization introduces 10 products $\beta_{ab} = \beta_a \beta_b$.
Not enough equations anymore! Relinearization: the $\beta_{ab}$ are expressed as a linear combination of eigenvectors.
$$\begin{bmatrix} l_{11} & l_{12} & l_{13} & l_{14} & l_{15} & l_{16} & l_{17} & l_{18} & l_{19} & l_{1,10} \\ l_{21} & l_{22} & l_{23} & l_{24} & l_{25} & l_{26} & l_{27} & l_{28} & l_{29} & l_{2,10} \\ l_{31} & l_{32} & l_{33} & l_{34} & l_{35} & l_{36} & l_{37} & l_{38} & l_{39} & l_{3,10} \\ l_{41} & l_{42} & l_{43} & l_{44} & l_{45} & l_{46} & l_{47} & l_{48} & l_{49} & l_{4,10} \\ l_{51} & l_{52} & l_{53} & l_{54} & l_{55} & l_{56} & l_{57} & l_{58} & l_{59} & l_{5,10} \\ l_{61} & l_{62} & l_{63} & l_{64} & l_{65} & l_{66} & l_{67} & l_{68} & l_{69} & l_{6,10} \end{bmatrix} \begin{bmatrix} \beta_{11} \\ \beta_{12} \\ \beta_{13} \\ \beta_{14} \\ \beta_{22} \\ \beta_{23} \\ \beta_{24} \\ \beta_{33} \\ \beta_{34} \\ \beta_{44} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \rho_3 \\ \rho_4 \\ \rho_5 \\ \rho_6 \end{bmatrix} \quad \text{(6 equations)}$$
Algorithm Summary
1. The control points' coordinates are the (12) unknowns;
2. The 3D points should project on the given corresponding 2D locations: linear system in the control points' coordinates.
3. The control points' coordinates can be expressed as a linear combination of the null eigenvectors of this linear system: the weights (the $\beta_i$) are the new unknowns (not more than 4).
4. Adding the rigidity constraints gives quadratic equations in the $\beta_i$.
5. Solving for the $\beta_i$ depends on their number (linearization or relinearization).
Results
Thank you.
Questions?
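The Point Reprojections Give a Linear System
From point reprojection, for each correspondence i:
$$w_i \begin{bmatrix} \mathbf{u}_i \\ 1 \end{bmatrix} = A p_i^{camera} = A \sum_{j=1}^{4} \alpha_{ij} c_j^{camera}$$
$$\Leftrightarrow \forall i, \; w_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = \begin{bmatrix} f_u & 0 & u_c \\ 0 & f_v & v_c \\ 0 & 0 & 1 \end{bmatrix} \sum_{j=1}^{4} \alpha_{ij} \begin{bmatrix} x_j^{camera} \\ y_j^{camera} \\ z_j^{camera} \end{bmatrix}$$
Let's expand, using the third row to eliminate $w_i = \sum_{j=1}^{4} \alpha_{ij} z_j^{camera}$:
$$\Leftrightarrow \begin{cases} \sum_{j=1}^{4} \alpha_{ij} f_u x_j^{camera} + \alpha_{ij} (u_c - u_i) z_j^{camera} = 0 \\ \sum_{j=1}^{4} \alpha_{ij} f_v y_j^{camera} + \alpha_{ij} (v_c - v_i) z_j^{camera} = 0 \end{cases}$$
Concatenating the equations from all the correspondences:
$$\Leftrightarrow M x = 0, \quad \text{with } x = \begin{bmatrix} c_1^{camera} \\ c_2^{camera} \\ c_3^{camera} \\ c_4^{camera} \end{bmatrix}$$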
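The expanded equations translate directly into the two rows of M contributed by one correspondence. A sketch, assuming x is ordered as $[x_1, y_1, z_1, \ldots, x_4, y_4, z_4]$; `m_rows` is a hypothetical name.

```c
/* Fills the two rows of M for the correspondence (u, v) with weights
   alpha[0..3]:
     sum_j alpha_j fu x_j + alpha_j (uc - u) z_j = 0
     sum_j alpha_j fv y_j + alpha_j (vc - v) z_j = 0 */
void m_rows(const double alpha[4], double u, double v,
            double fu, double fv, double uc, double vc,
            double row1[12], double row2[12])
{
    for (int j = 0; j < 4; j++) {
        row1[3 * j]     = alpha[j] * fu;
        row1[3 * j + 1] = 0.0;
        row1[3 * j + 2] = alpha[j] * (uc - u);
        row2[3 * j]     = 0.0;
        row2[3 * j + 1] = alpha[j] * fv;
        row2[3 * j + 2] = alpha[j] * (vc - v);
    }
}
```

For noise-free data, both rows are orthogonal to the true x, which is what makes x lie in the null space of M.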