View
9
Download
0
Category
Preview:
Citation preview
Vision Algorithms for Mobile Robotics
Lecture 11Tracking
Davide Scaramuzzahttp://rpg.ifi.uzh.ch
1
Lab Exercise โ This afternoon
Implement the Kanade-Lucas-Tomasi (KLT) tracker
2
Outline
โข Point trackingโข Template trackingโข Tracking by detection of local image features
3
Point Tracking
โข Problem: given two images, estimate the motion of a pixel point from image ๐ผ๐ผ0 to image ๐ผ๐ผ1
4
๐ผ๐ผ0(๐ฅ๐ฅ, ๐ฆ๐ฆ)
Point Tracking
โข Problem: given two images, estimate the motion of a pixel point from image ๐ผ๐ผ0 to image ๐ผ๐ผ1
5
๐ผ๐ผ1(๐ฅ๐ฅ,๐ฆ๐ฆ)
Point Tracking
โข Problem: given two images, estimate the motion of a pixel point from image ๐ผ๐ผ0 to image ๐ผ๐ผ1
โข Two approaches exist, depending on the amount of motion between the framesโข Block-based methodsโข Differential methods
6
๐ผ๐ผ0(๐ฅ๐ฅ, ๐ฆ๐ฆ)
(๐ข๐ข, ๐ฃ๐ฃ): optical flow vector
Point Tracking
โข Consider the motion of the following corner
7
Point Tracking
โข Consider the motion of the following corner
8
Point Tracking with Block Matching
โข Search for the corresponding patch in a ๐ท๐ท ร ๐ท๐ท region around the point to track.โข Use SSD, SAD, or NCC
9
Search region
Patch to track
Pros and Cons of Block Matching
โข Pros:โข Works well if the motion is large
โข Consโข Can become computationally demanding if the motion is large
โข Can the โsearchโ be implemented in a smart way if the motion is โsmallโ?โข Yes, use Differential methods
10
Point Tracking with Differential Methods
Looks at the local brightness changes at the same location. No patch shift is performed!
11
๐ผ๐ผ0(๐ฅ๐ฅ, ๐ฆ๐ฆ)
Point Tracking with Differential Methods
Looks at the local brightness changes at the same location. No patch shift is performed!
12
๐ผ๐ผ0(๐ฅ๐ฅ, ๐ฆ๐ฆ)
Point Tracking with Differential Methods
Looks at the local brightness changes at the same location. No patch shift is performed!
13
๐ผ๐ผ1(๐ฅ๐ฅ,๐ฆ๐ฆ)
Point Tracking with Differential Methods
Assumptions:โข Brightness constancy
โข The intensity of the pixels around the point to track does not change much between the two frames
โข Temporal consistencyโข The motion displacement is small (1-2 pixels); however, this
can be addressed using multi-scale implementations (see later)
โข Spatial coherencyโข Neighboring pixels undergo similar motion (i.e., they all lay
on the same 3D surface, i.e., no depth discontinuity)
14
The Kanade-Lucas-Tomasi (KLT) tracker
Consider the reference patch centered at (๐ฅ๐ฅ, ๐ฆ๐ฆ) in image ๐ผ๐ผ0 and the shifted patch centered at (๐ฅ๐ฅ + ๐ข๐ข, ๐ฆ๐ฆ + ๐ฃ๐ฃ)in image ๐ผ๐ผ1. The patch has size ฮฉ. We want to find the motion vector (๐ข๐ข, ๐ฃ๐ฃ) that minimizes the Sum of Squared Differences (SSD):
This is a simple quadratic function in two variables (๐ข๐ข, ๐ฃ๐ฃ)
15Lucas, Kanade, An iterative image registration technique with an application to stereo vision. Proceedings of Imaging Understanding Workshop, 1981. PDF.
Tomasi, Kanade, Detection and Tracking of Point Features, Carnegie Mellon University Technical Report CMU-CS-91-132, 1991. PDF.
๐๐๐๐๐๐ ๐ข๐ข, ๐ฃ๐ฃ = ๏ฟฝ๐ฅ๐ฅ,๐ฆ๐ฆโฮฉ
(๐ผ๐ผ0 ๐ฅ๐ฅ,๐ฆ๐ฆ โ ๐ผ๐ผ1 ๐ฅ๐ฅ + ๐ข๐ข,๐ฆ๐ฆ + ๐ฃ๐ฃ )2
โ ๏ฟฝ(๐ผ๐ผ0 ๐ฅ๐ฅ,๐ฆ๐ฆ โ ๐ผ๐ผ1 ๐ฅ๐ฅ,๐ฆ๐ฆ โ ๐ผ๐ผ๐ฅ๐ฅ๐ข๐ข โ ๐ผ๐ผ๐ฆ๐ฆ๐ฃ๐ฃ)2
โ ๐๐๐๐๐๐(๐ข๐ข, ๐ฃ๐ฃ) = ๏ฟฝ(โ๐ผ๐ผ โ ๐ผ๐ผ๐ฅ๐ฅ๐ข๐ข โ ๐ผ๐ผ๐ฆ๐ฆ๐ฃ๐ฃ)2
๐ผ๐ผ0
๐ผ๐ผ1
(๐ฅ๐ฅ, ๐ฆ๐ฆ)
(๐ฅ๐ฅ, ๐ฆ๐ฆ)(๐ฅ๐ฅ + ๐ข๐ข, ๐ฆ๐ฆ + ๐ฃ๐ฃ)
The Kanade-Lucas-Tomasi (KLT) tracker
16
To minimize it, we differentiate it with respect to (๐ข๐ข, ๐ฃ๐ฃ) and equate it to zero:
๐๐๐๐๐๐(๐ข๐ข, ๐ฃ๐ฃ) = ๏ฟฝ(โ๐ผ๐ผ โ ๐ผ๐ผ๐ฅ๐ฅ๐ข๐ข โ ๐ผ๐ผ๐ฆ๐ฆ๐ฃ๐ฃ)2
๐๐๐๐๐๐๐๐๐๐๐๐
= 0 , ๐๐๐๐๐๐๐๐๐๐๐๐
= 0
โ2๏ฟฝ๐ผ๐ผ๐ฅ๐ฅ(โ๐ผ๐ผ โ ๐ผ๐ผ๐ฅ๐ฅ๐ข๐ข โ ๐ผ๐ผ๐ฆ๐ฆ๐ฃ๐ฃ) = 0
โ2๏ฟฝ๐ผ๐ผ๐ฆ๐ฆ โ๐ผ๐ผ โ ๐ผ๐ผ๐ฅ๐ฅ๐ข๐ข โ ๐ผ๐ผ๐ฆ๐ฆ๐ฃ๐ฃ = 0
๐๐๐๐๐๐๐๐๐๐๐ข๐ข
= 0 โ
๐๐๐๐๐๐๐๐๐๐๐ฃ๐ฃ
= 0 โ
The Kanade-Lucas-Tomasi (KLT) tracker
โข Linear system of two equations in two unknowns
โข We can write them in matrix form:
17
๏ฟฝ๐ผ๐ผ๐ฅ๐ฅ(โ๐ผ๐ผ โ ๐ผ๐ผ๐ฅ๐ฅ๐ข๐ข โ ๐ผ๐ผ๐ฆ๐ฆ๐ฃ๐ฃ) = 0
๏ฟฝ๐ผ๐ผ๐ฆ๐ฆ โ๐ผ๐ผ โ ๐ผ๐ผ๐ฅ๐ฅ๐ข๐ข โ ๐ผ๐ผ๐ฆ๐ฆ๐ฃ๐ฃ = 0
๏ฟฝ๐ผ๐ผ๐ฅ๐ฅ๐ผ๐ผ๐ฅ๐ฅ ๏ฟฝ๐ผ๐ผ๐ฅ๐ฅ๐ผ๐ผ๐ฆ๐ฆ
๏ฟฝ๐ผ๐ผ๐ฅ๐ฅ๐ผ๐ผ๐ฆ๐ฆ ๏ฟฝ๐ผ๐ผ๐ฆ๐ฆ๐ผ๐ผ๐ฆ๐ฆ
๐ข๐ข๐ฃ๐ฃ =
๏ฟฝ๐ผ๐ผ๐ฅ๐ฅโ๐ผ๐ผ
๏ฟฝ๐ผ๐ผ๐ฆ๐ฆโ๐ผ๐ผโ ๐ข๐ข
๐ฃ๐ฃ =
๏ฟฝ๐ผ๐ผ๐ฅ๐ฅ๐ผ๐ผ๐ฅ๐ฅ ๏ฟฝ๐ผ๐ผ๐ฅ๐ฅ๐ผ๐ผ๐ฆ๐ฆ
๏ฟฝ๐ผ๐ผ๐ฅ๐ฅ๐ผ๐ผ๐ฆ๐ฆ ๏ฟฝ๐ผ๐ผ๐ฆ๐ฆ๐ผ๐ผ๐ฆ๐ฆ
โ1
๏ฟฝ๐ผ๐ผ๐ฅ๐ฅโ๐ผ๐ผ
๏ฟฝ๐ผ๐ผ๐ฆ๐ฆโ๐ผ๐ผ
Havenโt we seen this matrix already? Recall Harris detector!
Notice that these are NOT matrix products but pixel-wise products!
The Kanade-Lucas-Tomasi (KLT) tracker
In practice, det(๐๐) should be non zero, which means that its eigenvalues should be large (i.e., not a flat region, not an edge) โ in practice, it should be a corner or more generally contain any textured region!
18
Edge โ det(M) is low
Flat โ det(M) is low
Texture โ det(M) is high
๐๐ =
๏ฟฝ๐ผ๐ผ๐ฅ๐ฅ๐ผ๐ผ๐ฅ๐ฅ ๏ฟฝ๐ผ๐ผ๐ฅ๐ฅ๐ผ๐ผ๐ฆ๐ฆ
๏ฟฝ๐ผ๐ผ๐ฅ๐ฅ๐ผ๐ผ๐ฆ๐ฆ ๏ฟฝ๐ผ๐ผ๐ฆ๐ฆ๐ผ๐ผ๐ฆ๐ฆRR
= โ
2
11
00ฮป
ฮป
Application to Corner Tracking
Color encodes motion direction
19
Application to Optical Flow
What if you track every single pixel in the image?
20
Application to Optical Flow
21
Application to Optical Flow
22
Optical Flow example
23
Aperture Problem
โข Consider the motion of the following corner
24
Aperture Problem
โข Consider the motion of the following corner
25
Aperture Problem
โข Now, look at the local brightness changes through a small aperture
26
Aperture Problem
โข Now, look at the local brightness changes through a small aperture
27
Aperture Problem
โข Now, look at the local brightness changes through a small aperture
โข We cannot always determine the motion direction โ Infinite motion solutions may exist!โข Solution?
28
Aperture Problem
โข Now, look at the local brightness changes through a small aperture
โข We cannot always determine the motion direction โ Infinite motion solutions may exist!โข Solution?
โข Increase aperture size!
29
Block-based vs Differential methods
โข Block-based methods:โข Robust to large motionsโข Can be computationally expensive (๐ท๐ทร๐ท๐ท validations need to be made for a single point to track)
โข Differential methods: โข Works only for small motions (e.g., high frame rate). For larger motion, multi-scale implementations
are used but are more expensive (see later)โข Much more efficient than block-based methods. Thus, can be used to track the motion of every pixel
in the image (i.e., optical flow). It avoids searching in the neighborhood of the point by analyzing the local intensity changes (i.e., differences) of an image patch at a specific location (i.e., no search is performed)
30
Outline
โข Point trackingโข Template trackingโข Tracking by detection of local image features
31
Template tracking
Goal: follow a template image in a video sequence by estimating the warp
32
Template tracking
Goal: follow a template image in a video sequence by estimating the warp
33
Template image
Template Warping
โข Given the template image ๐๐(๐ฑ๐ฑ)โข Take all pixels from the template image ๐๐(๐ฑ๐ฑ) and warp them using the function ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
parameterized in terms of parameters ๐ฉ๐ฉ
34
Template image
๐๐(๐ฑ๐ฑ)
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
warp
๐ผ๐ผ(๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ )
Current image
Common 2D Transformations
โข Translation
โข Euclidean
โข Affine
โข Projective(homography)
35
๐ฅ๐ฅโฒ = ๐ฅ๐ฅ + ๐๐1๐ฆ๐ฆโฒ = ๐ฆ๐ฆ + ๐๐2
๐ฅ๐ฅโฒ = ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ(๐๐3) โ ๐ฆ๐ฆ๐ฅ๐ฅ๐ฆ๐ฆ๐ฆ๐ฆ(๐๐3) + ๐๐1๐ฆ๐ฆโฒ = ๐ฅ๐ฅ๐ฅ๐ฅ๐ฆ๐ฆ๐ฆ๐ฆ(๐๐3) + ๐ฆ๐ฆ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ(๐๐3) + ๐๐2
๐ฅ๐ฅโฒ = ๐๐1๐ฅ๐ฅ + ๐๐3๐ฆ๐ฆ + ๐๐5๐ฆ๐ฆโฒ = ๐๐2๐ฅ๐ฅ + ๐๐4๐ฆ๐ฆ + ๐๐6
๐ฅ๐ฅโฒ =๐๐1๐ฅ๐ฅ + ๐๐2๐ฆ๐ฆ + ๐๐3๐๐7๐ฅ๐ฅ + ๐๐8๐ฆ๐ฆ + 1
๐ฆ๐ฆโฒ =๐๐4๐ฅ๐ฅ + ๐๐5๐ฆ๐ฆ + ๐๐6๐๐7๐ฅ๐ฅ + ๐๐8๐ฆ๐ฆ + 1
Common 2D Transformations in Matrix form
We denote the transformation W ๐ฑ๐ฑ,๐ฉ๐ฉ and p the set of parameters ๐๐ = (๐๐1,๐๐2, โฆ ,๐๐๐๐)
โข Translation
โข Euclidean
โข Affine
โข Projective(homography)
36
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ =๐ฅ๐ฅ + ๐๐1๐ฆ๐ฆ + ๐๐2
= 1 0 ๐๐10 1 ๐๐2
๐ฅ๐ฅ๐ฆ๐ฆ1
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ = ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ(๐๐3) โ ๐ฆ๐ฆ๐ฅ๐ฅ๐ฆ๐ฆ๐ฆ๐ฆ(๐๐3) + ๐๐1๐ฅ๐ฅ๐ฅ๐ฅ๐ฆ๐ฆ๐ฆ๐ฆ(๐๐3) + ๐ฆ๐ฆ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ(๐๐3) + ๐๐2
= cos(๐๐3) โsin(๐๐3) ๐๐1sin(๐๐3) cos(๐๐3) ๐๐2
๐ฅ๐ฅ๐ฆ๐ฆ1
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ =๐๐1๐ฅ๐ฅ + ๐๐3๐ฆ๐ฆ + ๐๐5๐๐2๐ฅ๐ฅ + ๐๐4๐ฆ๐ฆ + ๐๐6
=๐๐1 ๐๐3 ๐๐5๐๐2 ๐๐4 ๐๐6
๐ฅ๐ฅ๐ฆ๐ฆ1
Homogeneous coordinates
๐๐ ๏ฟฝ๐๐,๐ฉ๐ฉ =๐๐1 ๐๐2 ๐๐3๐๐4 ๐๐5 ๐๐6๐๐7 ๐๐8 1
๐ฅ๐ฅ๐ฆ๐ฆ1
Common 2D Transformations in Matrix form
37
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ = 1 0 ๐๐10 1 ๐๐2
๐ฅ๐ฅ๐ฆ๐ฆ1
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ = cos(๐๐3) โsin(๐๐3) ๐๐1sin(๐๐3) cos(๐๐3) ๐๐2
๐ฅ๐ฅ๐ฆ๐ฆ1
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ =๐๐1 ๐๐3 ๐๐5๐๐2 ๐๐4 ๐๐6
๐ฅ๐ฅ๐ฆ๐ฆ1
๐๐ ๏ฟฝ๐๐,๐ฉ๐ฉ =๐๐1 ๐๐2 ๐๐3๐๐4 ๐๐5 ๐๐6๐๐7 ๐๐8 1
๐ฅ๐ฅ๐ฆ๐ฆ1
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ = ๐๐4cos(๐๐3) โsin(๐๐3) ๐๐1๐ฅ๐ฅ๐ฆ๐ฆ๐ฆ๐ฆ(๐๐3) cos(๐๐3) ๐๐2
๐ฅ๐ฅ๐ฆ๐ฆ1
Derivative and gradient
โข Function: ๐๐ ๐ฅ๐ฅ
โข Derivative: ๐๐โฒ ๐ฅ๐ฅ = ๐๐๐๐๐๐๐๐
, where ๐ฅ๐ฅ is a scalar
โข Function: ๐๐(๐ฅ๐ฅ1, ๐ฅ๐ฅ2, โฆ , ๐ฅ๐ฅ๐๐ )
โข Gradient: โ๐๐(๐ฅ๐ฅ1, ๐ฅ๐ฅ2, โฆ , ๐ฅ๐ฅ๐๐ )= ๐๐๐๐๐๐๐๐1
, ๐๐๐๐๐๐๐๐2
, โฆ , ๐๐๐๐๐๐๐๐๐๐
38
Jacobian
โข ๐น๐น(๐ฅ๐ฅ1, ๐ฅ๐ฅ2, โฆ , ๐ฅ๐ฅ๐๐ ) =๐๐1(๐ฅ๐ฅ1, ๐ฅ๐ฅ2, โฆ , ๐ฅ๐ฅ๐๐ )
โฎ๐๐๐๐(๐ฅ๐ฅ1, ๐ฅ๐ฅ2, โฆ , ๐ฅ๐ฅ๐๐ )
is a vector-valued function
โข The derivative in this case is called Jacobian ๐๐๐น๐น๐๐๐ฑ๐ฑ
:
39
Carl Gustav Jacob (1804-1851)
๐๐๐น๐น๐๐๐ฑ๐ฑ
=
๐๐๐๐1๐๐๐ฅ๐ฅ1
, โฆ ,๐๐๐๐1๐๐๐ฅ๐ฅ๐๐
โฎ๐๐๐๐๐๐๐๐๐ฅ๐ฅ1
, โฆ ,๐๐๐๐๐๐๐๐๐ฅ๐ฅ๐๐
Displacement-model Jacobians โ๐๐๐๐
๐๐ = (๐๐1,๐๐2, โฆ ,๐๐๐๐)
โข Translation:
โข Euclidean:
โข Affine:
40
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ =๐ฅ๐ฅ + ๐๐1๐ฆ๐ฆ + ๐๐2
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ = ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ(๐๐3) โ ๐ฆ๐ฆ๐ฅ๐ฅ๐ฆ๐ฆ๐ฆ๐ฆ(๐๐3) + ๐๐1๐ฅ๐ฅ๐ฅ๐ฅ๐ฆ๐ฆ๐ฆ๐ฆ(๐๐3) + ๐ฆ๐ฆ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ(๐๐3) + ๐๐2
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ =๐๐1๐ฅ๐ฅ + ๐๐3๐ฆ๐ฆ + ๐๐5๐๐2๐ฅ๐ฅ + ๐๐4๐ฆ๐ฆ + ๐๐6
๐๐๐๐๐๐๐ฉ๐ฉ
=
๐๐๐๐1
๐๐๐๐1๐๐๐๐1
๐๐๐๐2๐๐๐๐2
๐๐๐๐1๐๐๐๐2
๐๐๐๐2
= 1 00 1
๐๐๐๐๐๐๐ฉ๐ฉ
= 1 0 โ๐ฅ๐ฅ๐ฅ๐ฅ๐ฆ๐ฆ๐ฆ๐ฆ(๐๐3) โ ๐ฆ๐ฆ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ(๐๐3)0 1 ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ(๐๐3) โ ๐ฆ๐ฆ๐ฅ๐ฅ๐ฆ๐ฆ๐ฆ๐ฆ(๐๐3)
๐๐๐๐๐๐๐ฉ๐ฉ
= ๐ฅ๐ฅ 0 ๐ฆ๐ฆ0 ๐ฅ๐ฅ 0
0 1 0๐ฆ๐ฆ 0 1
Template Warping
โข Given the template image ๐๐(๐ฑ๐ฑ)โข Take all pixels from the template image ๐๐(๐ฑ๐ฑ) and warp them using the function ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
parameterized in terms of parameters ๐ฉ๐ฉ
41
Template image
๐๐(๐ฑ๐ฑ)
๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
warp
๐ผ๐ผ(๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ )
Current image
Template Tracking: Problem Formulation
โข The goal of template-based tracking is to find the set of warp parameters p such that:
โข This is solved by determining p that minimizes the Sum of Squared Differences:
42
๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ = ๐๐(๐ฑ๐ฑ)
๐๐๐๐๐ท๐ท = ๏ฟฝ๐ฑ๐ฑโ๐๐
๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ โ ๐๐(๐ฑ๐ฑ) ๐๐
Assumptions
โข No errors in the template image boundaries: only the object to track appears in the template image
โข No occlusion: the entire template is visible in the input image
โข Brightness constancy, โข Temporal consistency, โข Spatial coherency
43
KLT tracker applied to template tracking
โข Uses the Gauss-Newton method for minimization, that is:โข Applies a first-order approximation of the warpโข Attempts to minimize the SSD iteratively
44Lucas, Kanade, An iterative image registration technique with an application to stereo vision. Proceedings of Imaging Understanding Workshop, 1981. PDF.
Tomasi, Kanade, Detection and Tracking of Point Features, Carnegie Mellon University Technical Report CMU-CS-91-132, 1991. PDF.
Derivation of the KLT algorithm
โข Assume that an initial estimate of p is known. Then, we want to find the increment โ๐ฉ๐ฉthat minimizes
โข First-order Taylor approximation of ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ + โ๐ฉ๐ฉ yelds to:
45
๐๐๐๐๐๐ = ๏ฟฝ๐ฑ๐ฑโ๐๐
๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ โ ๐๐(๐ฑ๐ฑ)๐๐
๐๐๐๐๐๐ = ๏ฟฝ๐ฑ๐ฑโ๐๐
๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ + โ๐ฉ๐ฉ โ ๐๐(๐ฑ๐ฑ)๐๐
๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ + โ๐ฉ๐ฉ โ ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ +๐ป๐ป๐ผ๐ผ ๐๐๐๐๐๐๐ฉ๐ฉโ๐ฉ๐ฉ
๐ป๐ป๐ผ๐ผ = ๐ผ๐ผ๐ฅ๐ฅ , ๐ผ๐ผ๐ฆ๐ฆ = Image gradient evaluated at ๐๐(๐ฑ๐ฑ,๐ฉ๐ฉ) Jacobian of the warp ๐๐(๐ฑ๐ฑ,๐ฉ๐ฉ)
Derivation of the KLT algorithm
โข By replacing ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ + โ๐ฉ๐ฉ with its 1st order approximation, we get
โข How do we minimize it?โข We differentiate SSD with respect to โ๐ฉ๐ฉ and we equate it to zero, i.e.,
46
๐๐๐๐๐๐ = ๏ฟฝ๐ฑ๐ฑโ๐๐
๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ + โ๐ฉ๐ฉ โ ๐๐(๐ฑ๐ฑ)๐๐
๐๐๐๐๐๐ = ๏ฟฝ๐ฑ๐ฑโ๐๐
๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ +๐ป๐ป๐ผ๐ผ๐๐๐๐๐๐๐ฉ๐ฉ
โ๐ฉ๐ฉ โ ๐๐(๐ฑ๐ฑ)๐๐
๐๐๐๐๐๐๐๐๐๐โ๐ฉ๐ฉ
= 0
Derivation of the KLT algorithm
47
๐๐๐๐๐๐๐๐๐๐โ๐ฉ๐ฉ
= 2๏ฟฝ๐ฑ๐ฑโ๐๐
๐ป๐ป๐ผ๐ผ๐๐๐๐๐๐๐ฉ๐ฉ
T
๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ +๐ป๐ป๐ผ๐ผ๐๐๐๐๐๐๐ฉ๐ฉ
โ๐ฉ๐ฉ โ ๐๐(๐ฑ๐ฑ)
๐๐๐๐๐๐๐๐๐๐โ๐ฉ๐ฉ
= 0
2๏ฟฝ๐ฑ๐ฑโ๐๐
๐ป๐ป๐ผ๐ผ๐๐๐๐๐๐๐ฉ๐ฉ
T
๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ +๐ป๐ป๐ผ๐ผ๐๐๐๐๐๐๐ฉ๐ฉ
โ๐ฉ๐ฉ โ ๐๐(๐ฑ๐ฑ) = 0 โ
๐๐๐๐๐๐ = ๏ฟฝ๐ฑ๐ฑโ๐๐
๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ +๐ป๐ป๐ผ๐ผ๐๐๐๐๐๐๐ฉ๐ฉ
โ๐ฉ๐ฉ โ ๐๐(๐ฑ๐ฑ)๐๐
Derivation of the KLT algorithm
48
โ โ๐ฉ๐ฉ = ๐ป๐ปโ1๏ฟฝ๐ฑ๐ฑโ๐๐
๐ป๐ป๐ผ๐ผ๐๐๐๐๐๐๐ฉ๐ฉ
T
๐๐ ๐ฑ๐ฑ โ ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ =
๐ป๐ป = ๏ฟฝ๐ฑ๐ฑโ๐๐
๐ป๐ป๐ผ๐ผ๐๐๐๐๐๐๐ฉ๐ฉ
T
๐ป๐ป๐ผ๐ผ๐๐๐๐๐๐๐ฉ๐ฉ
Second moment matrix (Hessian) of the warped image
What does H look like when the warp is a pure translation?
Notice that these are NOT matrix products but pixel-wise products!
KLT algorithm
1. Warp ๐ผ๐ผ(๐ฑ๐ฑ) with ๐๐(๐ฑ๐ฑ,๐ฉ๐ฉ) โ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
2. Compute the error: subtract ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ from ๐๐(๐ฑ๐ฑ)
3. Compute warped gradients: ๐ป๐ป๐ผ๐ผ = ๐ผ๐ผ๐ฅ๐ฅ , ๐ผ๐ผ๐ฆ๐ฆ , evaluated at ๐๐(๐ฑ๐ฑ,๐ฉ๐ฉ)
4. Evaluate the Jacobian of the warping: ๐๐๐๐๐๐๐ฉ๐ฉ
5. Compute steepest descent: ๐ป๐ป๐ผ๐ผ ๐๐๐๐๐๐๐ฉ๐ฉ
6. Compute Inverse Hessian: ๐ป๐ปโ1 = โ๐ฑ๐ฑโ๐๐ ๐ป๐ป๐ผ๐ผ ๐๐๐๐๐๐๐ฉ๐ฉ
T๐ป๐ป๐ผ๐ผ ๐๐๐๐
๐๐๐ฉ๐ฉ
โ1
7. Multiply steepest descend with error: โ๐ฑ๐ฑโ๐๐ ๐ป๐ป๐ผ๐ผ ๐๐๐๐๐๐๐ฉ๐ฉ
T๐๐ ๐ฑ๐ฑ โ ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
8. Compute โ๐ฉ๐ฉ
9. Update parameters: ๐ฉ๐ฉโ๐ฉ๐ฉ + โ๐ฉ๐ฉ10. Repeat until โ๐ฉ๐ฉ < ๐บ๐บ
49
โ โ๐ฉ๐ฉ = ๐ป๐ปโ1๏ฟฝ๐ฑ๐ฑโ๐๐
๐ป๐ป๐ผ๐ผ๐๐๐๐๐๐๐ฉ๐ฉ
T
๐๐ ๐ฑ๐ฑ โ ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
KLT algorithm: computing โ๐ฉ๐ฉ = ๐ป๐ปโ1 โ๐ฑ๐ฑโ๐๐ ๐ป๐ป๐ผ๐ผ ๐๐๐๐๐๐๐ฉ๐ฉ
T๐๐ ๐ฑ๐ฑ โ ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
6x1
6x6
50
KLT algorithm: computing โ๐ฉ๐ฉ = ๐ป๐ปโ1 โ๐ฑ๐ฑโ๐๐ ๐ป๐ป๐ผ๐ผ ๐๐๐๐๐๐๐ฉ๐ฉ
T๐๐ ๐ฑ๐ฑ โ ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
6x1
6x6
51
KLT algorithm: computing โ๐ฉ๐ฉ = ๐ป๐ปโ1 โ๐ฑ๐ฑโ๐๐ ๐ป๐ป๐ผ๐ผ ๐๐๐๐๐๐๐ฉ๐ฉ
T๐๐ ๐ฑ๐ฑ โ ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
6x1
6x6
What is the size?๐๐๐๐ ร ๐๐๐๐
๐๐ ร ๐๐๐๐
Why does it look like that?๐๐๐๐๐๐๐ฉ๐ฉ
=๐ฅ๐ฅ 0 ๐ฆ๐ฆ
0 ๐ฅ๐ฅ 0
0 1 0
๐ฆ๐ฆ 0 1
52
KLT algorithm: computing โ๐ฉ๐ฉ = ๐ป๐ปโ1 โ๐ฑ๐ฑโ๐๐ ๐ป๐ป๐ผ๐ผ ๐๐๐๐๐๐๐ฉ๐ฉ
T๐๐ ๐ฑ๐ฑ โ ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
6x1
6x6
Why does it look like that?(slide 48)
What is the size?๐๐๐๐ ร ๐๐๐๐
๐๐ ร ๐๐๐๐
Whatโs its size?
๐๐ ร ๐๐๐๐
๐๐๐๐๐๐๐ฉ๐ฉ
=๐ฅ๐ฅ 0 ๐ฆ๐ฆ
0 ๐ฅ๐ฅ 0
0 1 0
๐ฆ๐ฆ 0 1
KLT algorithm: computing โ๐ฉ๐ฉ = ๐ป๐ปโ1 โ๐ฑ๐ฑโ๐๐ ๐ป๐ป๐ผ๐ผ ๐๐๐๐๐๐๐ฉ๐ฉ
T๐๐ ๐ฑ๐ฑ โ ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
6x1
6x6
What is the size?๐๐๐๐ ร ๐๐๐๐
๐๐ ร ๐๐๐๐
Whatโs its size?
๐๐ ร ๐๐๐๐
๐๐๐๐๐๐๐ฉ๐ฉ
=๐ฅ๐ฅ 0 ๐ฆ๐ฆ
0 ๐ฅ๐ฅ 0
0 1 0
๐ฆ๐ฆ 0 1
54
Why does it look like that?(slide 48)
KLT algorithm: computing โ๐ฉ๐ฉ = ๐ป๐ปโ1 โ๐ฑ๐ฑโ๐๐ ๐ป๐ป๐ผ๐ผ ๐๐๐๐๐๐๐ฉ๐ฉ
T๐๐ ๐ฑ๐ฑ โ ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ
๐๐๐๐๐๐๐ฉ๐ฉ
=๐ฅ๐ฅ 0 ๐ฆ๐ฆ
0 ๐ฅ๐ฅ 0
0 1 0
๐ฆ๐ฆ 0 1
What is the size?๐๐๐๐ ร ๐๐๐๐
Whatโs its size?
๐๐ ร ๐๐๐๐
๐๐ ร ๐๐๐๐
6x1
6x66x1
55
Why does it look like that?(slide 48)
KLT algorithm: Discussion
Lucas-Kanade follows a predict-correct cycle
โข A prediction ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉ of the warped image is computed from an initial estimate
โข The correction parameter โ๐ฉ๐ฉ is computed as a function of the error ๐๐ ๐ฑ๐ฑ โ ๐ผ๐ผ ๐๐ ๐ฑ๐ฑ,๐ฉ๐ฉbetween the prediction and the template
โข The larger this error, the larger the correction applied
predict correct
56
KLT algorithm: Discussion
โข How to get the initial estimate p?โข When does the Lucas-Kanade fail?
โข If the initial estimate is too far, then the linear approximation does not longer hold -> solution?
โข Pyramidal implementations (see next slide)
โข Other problems:โข Deviations from the mathematical model: object deformations, illumination changes,
etc.โข Occlusionsโข Due to these reasons, tracking may drift -> solution?
โข Update the template with the last image
57
Coarse-to-fine estimation
image I
pI(W)warp refine
p ฮp+
Pyramid of image I Pyramid of image T
image Tu=10 pixels
u=5 pixels
u=1.25 pixels
u=2.5 pixels
image T
58
Coarse-to-fine estimation
I I(W) Twarp refine
inp
pโ+
I I(W) Twarp refine
p
pโ+
I
pyramid construction
I I(W) Twarp refine
pโ+
T
pyramid construction
outp 59
Generalization of KLT
โข The same concept (predict/correct) can be applied to tracking of 3D object (in this case, what is the transformation to etimate? What is the template?)
60
Generalization of KLT
โข The same concept (predict/correct) can be applied to tracking of 3D object (in this case, what is the transformation to etimate? What is the template?)
โข In order to deal with wrong prediction, it can be implemented in a Particle-Filter fashion (using multiple hipotheses that need to be validated)
61
Outline
โข Point trackingโข Template trackingโข Tracking by detection of local image features
62
Tracking by detection of local image features
โข Step 1: Keypoint detection and matchingโข invariant to scale, rotation, or perspective
63Template image with the object to detect Current test image
Tracking by detection of local image features
โข Step 1: Keypoint detection and matchingโข invariant to scale, rotation, or perspective
64Template image with the object to detect Current test image
Tracking by detection of local image features
โข Step 1: Keypoint detection and matchingโข invariant to scale, rotation, or perspective
โข Step 2: Geometric verification (RANSAC) (e.g., 4-point RANSAC for planar objects, or 5 or 8-point RANSAC for 3D objects)
65Template image with the object to detect Current test image
Tracking by detection of local image features
66
Tracking issues
โข How to segment the object to track from background?โข How to initialize the warping?โข How to handle occlusionsโข How to handle illumination changes and non modeled effects?
67
Readings
โข Chapter 8 of Szeliskiโs book, 1st edition
68
Understanding Check
Are you able to answer the following questions?โข Are you able to illustrate tracking with block matching?โข Are you able to explain the underlying assumptions behind differential methods, derive their mathematical expression
and the meaning of the M matrix?โข When is this matrix invertible and when not?โข What is the aperture problem and how can we overcome it?โข What is optical flow?โข Can you list pros and cons of block-based vs. differential methods for tracking?โข Are you able to describe the working principle of KLT?โข What functional does KLT minimize?โข What is the Hessian matrix and for which warping function does it coincide to that used for point tracking?โข Can you list Lukas-Kanade failure cases and how to overcome them?โข How do we get the initial guess?โข Can you illustrate the coarse-to-fine Lucas-Kanade implementation?โข Can you illustrate alternative tracking procedures using point features?
69
Recommended