SIFT (scale invariant feature transform)
2011.4.14
Reporter: Fei-Fei Chen
What is Computer Vision?
Local Invariant Feature
• Wide-baseline matching
• Object recognition
• Texture recognition
• Scene classification
• Robot wandering
• Motion tracking
• Change in illumination
• 3D camera viewpoint
• etc.
Applications
Object recognition
3D object recognition
Image retrieval (1/3)
… > 5000 images
change in viewing angle
Image retrieval (2/3)
22 correct matches
Image retrieval (3/3)
… > 5000 images
change in viewing angle + scale change
Automatic image stitching (1/2)
Automatic image stitching (2/2)
Motivation: Matching Problem
Find corresponding features across two or more views.
Motivation: Patch Matching
Elements to be matched are image patches of fixed size.
Task: Find the best (most similar) patch in a second image.
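The patch-matching task can be illustrated with a minimal sketch. It assumes grayscale images stored as nested lists and uses the sum of squared differences (SSD) as the similarity measure; the slides do not name a measure, so SSD and the helper names `ssd` and `best_match` are assumptions made for illustration.

```python
def ssd(image, top, left, patch):
    """Sum of squared differences between `patch` and the image window at (top, left)."""
    return sum(
        (image[top + i][left + j] - patch[i][j]) ** 2
        for i in range(len(patch))
        for j in range(len(patch[0]))
    )


def best_match(image, patch):
    """Exhaustively search `image` and return (score, top, left) of the most similar window."""
    patch_h, patch_w = len(patch), len(patch[0])
    best = None
    for top in range(len(image) - patch_h + 1):
        for left in range(len(image[0]) - patch_w + 1):
            score = ssd(image, top, left, patch)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best


# Toy second image and a 3x3 patch cut from rows 2..4, columns 3..5.
image = [[x * x + 3 * y for x in range(6)] for y in range(6)]
patch = [row[3:6] for row in image[2:5]]
print(best_match(image, patch))
```

A fixed-size window compared this way only works when scale and rotation do not change between the two views, which is the limitation SIFT is designed to remove.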
Not all patches are created equal
Intuition: This would be a good patch for matching, since it is very distinctive.
Intuition: This would be a BAD patch for matching, since it is not very distinctive.
What are corners?
• Intuitively, junctions of contours.
• Generally more stable features over change of viewpoint.
• Intuitively, large variations in the neighborhood of the point in all directions.
• They are good features to match!
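The "large variations in all directions" intuition can be sketched with a Moravec-style measure: shift a small window in eight directions and take the smallest sum of squared differences. This is a swapped-in illustration, not a method stated in the slides; the function names and the 3x3 window are assumptions.

```python
# The eight unit shifts (rows, columns) around a pixel.
SHIFTS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def shifted_ssd(image, row, col, d_row, d_col, half=1):
    """SSD between the window centered at (row, col) and the same window shifted by (d_row, d_col)."""
    total = 0
    for y in range(row - half, row + half + 1):
        for x in range(col - half, col + half + 1):
            diff = image[y + d_row][x + d_col] - image[y][x]
            total += diff * diff
    return total


def corner_score(image, row, col):
    """Smallest change over all shifts: high only if the patch changes in every direction."""
    return min(shifted_ssd(image, row, col, dr, dc) for dr, dc in SHIFTS)


flat = [[0] * 7 for _ in range(7)]
edge = [[1 if x >= 3 else 0 for x in range(7)] for y in range(7)]
corner = [[1 if x >= 3 and y >= 3 else 0 for x in range(7)] for y in range(7)]
for name, img in (("flat", flat), ("edge", edge), ("corner", corner)):
    print(name, corner_score(img, 3, 3))
```

A flat region and a straight edge both have at least one shift direction that leaves the window unchanged, so only the corner gets a nonzero score.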
SIFT
1. Detection of scale-space extrema
2. Accurate keypoint localization
3. Orientation assignment
4. Keypoint descriptor
(Diagram labels: detector, descriptor)
1. Detection of scale-space extrema
For scale invariance, search for stable features across all possible scales using a continuous function of scale, known as scale space.
SIFT uses the DoG filter for scale space because it is efficient and as stable as the scale-normalized Laplacian of Gaussian.
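DoG filtering
Convolution with a variable-scale Gaussian:
$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$, where $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}$
Difference-of-Gaussian (DoG) filter:
$G(x, y, k\sigma) - G(x, y, \sigma)$
Convolution with the DoG filter:
$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$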
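DoG filtering can be sketched in plain Python as the difference of two Gaussian-blurred copies of the image. The truncation of the kernel at 3σ, the clamped borders, and all function names are assumptions made for this illustration; a real implementation would use an optimized image library.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at about 3 sigma (an assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping coordinates at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for i, w in enumerate(kernel):
                xx = min(max(x + i - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def difference_of_gaussians(image, sigma, k):
    """D = L(k * sigma) - L(sigma), computed pixel by pixel."""
    low = gaussian_blur(image, sigma)
    high = gaussian_blur(image, k * sigma)
    return [[h - l for h, l in zip(h_row, l_row)] for h_row, l_row in zip(high, low)]


# A single bright pixel gives a negative DoG response at its center.
impulse = [[0.0] * 15 for _ in range(15)]
impulse[7][7] = 1.0
dog = difference_of_gaussians(impulse, sigma=1.0, k=2 ** (1.0 / 3))
print(dog[7][7] < 0)
```

Because each blurred image is needed for the next scale anyway, the DoG costs only one subtraction per pixel, which is the efficiency argument made above.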
Scale space doubles for the next octave
$k = 2^{1/s}$, where s is the number of intervals per octave
Dividing into octaves is for efficiency only.
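As a small worked example of the scale factor, the sketch below lists the blur levels of one octave. The base scale of 1.6 and the count of s + 3 blurred images per octave follow Lowe's paper rather than these slides, so treat both as assumptions; the function name is hypothetical.

```python
def octave_sigmas(sigma0=1.6, s=3):
    """Blur levels sigma0 * k**i for one octave, with k = 2**(1/s).

    s + 3 levels are listed so that DoG extrema can be searched over s scales
    (assumption taken from Lowe's paper, not from the slides).
    """
    k = 2 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]


sigmas = octave_sigmas()
print([round(v, 3) for v in sigmas])
```

After s steps the scale has doubled, so the next octave can start from that image downsampled by a factor of 2.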
Detection of scale-space extrema
Keypoint localization
X is selected if it is larger or smaller than all 26 neighbors (8 in its own scale and 9 in each of the two adjacent scales).
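The 26-neighbor test can be sketched as follows, assuming the DoG stack is stored as a nested list indexed as `dog[scale][row][col]`; the layout and the function name are illustrative assumptions.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly larger or strictly smaller than all 26 neighbors."""
    center = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return center > max(neighbors) or center < min(neighbors)


# 3 scales of 3x3 zeros with a single bump in the middle.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 1.0
print(is_extremum(stack, 1, 1, 1))
```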
2. Accurate keypoint localization
• Reject (1) points with low contrast (flat) and (2) points poorly localized along an edge (edge).
• Fit a 3D quadratic function for sub-pixel maxima.
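1D example: samples f(-1) = 1, f(0) = 6, f(+1) = 5
$f(x) = 6 + 2x + \frac{-6}{2} x^2 = 6 + 2x - 3x^2$
$f'(x) = 2 - 6x = 0 \;\Rightarrow\; \hat{x} = \frac{1}{3}$
$f(\hat{x}) = 6 + 2 \cdot \frac{1}{3} - 3 \cdot \left(\frac{1}{3}\right)^2 = 6\frac{1}{3}$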
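The 1D example (samples 1, 6, 5 at x = -1, 0, +1) can be reproduced with a short sketch that estimates the first and second derivatives by finite differences and solves for the peak of the fitted parabola. The function name is hypothetical.

```python
def refine_peak_1d(f_minus, f_zero, f_plus):
    """Fit a parabola through samples at x = -1, 0, +1 and return (offset, peak value)."""
    first = (f_plus - f_minus) / 2.0          # central difference for f'(0)
    second = f_plus - 2.0 * f_zero + f_minus  # second difference for f''(0)
    offset = -first / second                  # where the derivative of the fit is zero
    value = f_zero + 0.5 * first * offset     # fitted value at the offset
    return offset, value


offset, value = refine_peak_1d(1.0, 6.0, 5.0)
print(offset, value)  # about 1/3 and 6 1/3
```

Here the differences give f'(0) = 2 and f''(0) = -6, which are the coefficients of the quadratic f(x) = 6 + 2x - 3x^2.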
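2. Accurate keypoint localization
Taylor series of several variables
Two variables:
$f(x, y) \approx f(0, 0) + \left( \frac{\partial f}{\partial x} x + \frac{\partial f}{\partial y} y \right) + \frac{1}{2} \left( \frac{\partial^2 f}{\partial x \partial x} x^2 + 2 \frac{\partial^2 f}{\partial x \partial y} xy + \frac{\partial^2 f}{\partial y \partial y} y^2 \right)$
$= f(0, 0) + \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \frac{1}{2} \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f}{\partial x \partial x} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y \partial y} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$
$f(\mathbf{x}) = f(\mathbf{0}) + \frac{\partial f}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 f}{\partial \mathbf{x}^2} \mathbf{x}$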
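2. Accurate keypoint localization
Taylor expansion in matrix form; x is a vector and f maps x to a scalar.
gradient:
$\frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$
Hessian matrix (often symmetric):
$\frac{\partial^2 f}{\partial \mathbf{x}^2} = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$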
2D illustration
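Derivation of matrix form
$f(\mathbf{x}) = f + \frac{\partial f}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 f}{\partial \mathbf{x}^2} \mathbf{x}$
Setting the derivative with respect to x to zero gives the offset of the extremum:
$\frac{\partial f}{\partial \mathbf{x}} + \frac{\partial^2 f}{\partial \mathbf{x}^2} \hat{\mathbf{x}} = 0 \;\Rightarrow\; \hat{\mathbf{x}} = -\left( \frac{\partial^2 f}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial f}{\partial \mathbf{x}}$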
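2. Accurate keypoint localization
• x is a 3-vector (x, y, σ).
• Remove sample point if offset is larger than 0.5 in any dimension.
• Throw out low contrast (value at the refined extremum below 0.03 in magnitude, for pixel values in [0, 1]).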
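The 3D quadratic fit and the two rejection rules can be sketched on a 3x3x3 block of DoG values around a candidate. The sketch assumes the block is indexed as `cube[scale][row][col]` with the candidate at the center, estimates the gradient and Hessian by finite differences, and solves the 3x3 system with Cramer's rule; all names are hypothetical, and the thresholds 0.5 and 0.03 are the ones quoted above.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a 3x3 linear system a * x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("singular Hessian")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def refine_keypoint(cube, max_offset=0.5, min_contrast=0.03):
    """Return (offset, value, keep) for the center of cube[scale][row][col]."""
    c = cube[1][1][1]
    # Gradient by central differences, ordered (x, y, scale).
    g = [(cube[1][1][2] - cube[1][1][0]) / 2.0,
         (cube[1][2][1] - cube[1][0][1]) / 2.0,
         (cube[2][1][1] - cube[0][1][1]) / 2.0]
    # Hessian by second differences.
    dxx = cube[1][1][2] - 2 * c + cube[1][1][0]
    dyy = cube[1][2][1] - 2 * c + cube[1][0][1]
    dss = cube[2][1][1] - 2 * c + cube[0][1][1]
    dxy = (cube[1][2][2] - cube[1][2][0] - cube[1][0][2] + cube[1][0][0]) / 4.0
    dxs = (cube[2][1][2] - cube[2][1][0] - cube[0][1][2] + cube[0][1][0]) / 4.0
    dys = (cube[2][2][1] - cube[2][0][1] - cube[0][2][1] + cube[0][0][1]) / 4.0
    hessian = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    offset = solve3(hessian, [-v for v in g])                      # x_hat = -H^-1 g
    value = c + 0.5 * sum(gi * oi for gi, oi in zip(g, offset))    # D(x_hat)
    keep = all(abs(o) <= max_offset for o in offset) and abs(value) >= min_contrast
    return offset, value, keep


# Synthetic quadratic with its true peak at (x, y, scale) = (0.2, -0.1, 0.3).
def sample(x, y, s):
    return 1.0 - (x - 0.2) ** 2 - (y + 0.1) ** 2 - (s - 0.3) ** 2

cube = [[[sample(x, y, s) for x in (-1, 0, 1)] for y in (-1, 0, 1)] for s in (-1, 0, 1)]
print(refine_keypoint(cube))
```

For an exactly quadratic function the finite differences are exact, so the recovered offset matches the true peak position.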
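Eliminating edge responses
Hessian matrix at keypoint location:
$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$
Let $\alpha = r\beta$, where $\alpha \ge \beta$ are the eigenvalues of H:
$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r + 1)^2}{r}$
Keep the points with
$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} < \frac{(r + 1)^2}{r}$, r = 10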
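The edge test needs only the trace and determinant of the 2x2 Hessian, so the eigenvalues never have to be computed. A minimal sketch, assuming the second derivatives have already been estimated (for example by finite differences on the DoG image); the function name is hypothetical.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a degenerate Hessian): not a stable extremum.
        return False
    return trace * trace / det < (r + 1) ** 2 / r


print(passes_edge_test(1.0, 1.0, 0.0))   # blob-like: equal curvatures
print(passes_edge_test(10.0, 0.1, 0.0))  # edge-like: one dominant curvature
```

With r = 10 the threshold is 12.1; the blob-like case gives a ratio of 4 and is kept, while the edge-like case gives a ratio of about 102 and is rejected.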
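3. Orientation assignment
By assigning a consistent orientation, the keypoint descriptor can be orientation invariant.
For a keypoint, L is the Gaussian-smoothed image with the closest scale. With $L_x = L(x+1, y) - L(x-1, y)$ and $L_y = L(x, y+1) - L(x, y-1)$:
$m(x, y) = \sqrt{L_x^2 + L_y^2}$
$\theta(x, y) = \tan^{-1}(L_y / L_x)$
The orientations θ are accumulated in an orientation histogram (36 bins).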
Orientation assignment
σ = 1.5 * scale of the keypoint
The accurate peak position is determined by fitting a parabola to the three histogram values closest to the peak.
[Figure: orientation histogram, horizontal axis from 0 to 2π]
36-bin orientation histogram over 360°, weighted by the gradient magnitude m and a Gaussian falloff with σ = 1.5 * scale
Peak is the orientation
Any local peak within 80% of the highest peak creates an additional orientation
About 15% of keypoints have multiple orientations, and they contribute a lot to stability
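The orientation histogram can be sketched as follows. The sketch assumes a small square patch of the smoothed image L centered on the keypoint, stored as nested lists; the patch size, the helper names, and the omission of parabolic peak interpolation are simplifications made for illustration.

```python
import math


def orientation_histogram(patch, keypoint_scale, bins=36):
    """Histogram of gradient orientations, weighted by magnitude and a Gaussian window."""
    sigma = 1.5 * keypoint_scale
    height, width = len(patch), len(patch[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lx = patch[y][x + 1] - patch[y][x - 1]
            ly = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(lx, ly)
            theta = math.degrees(math.atan2(ly, lx)) % 360.0
            weight = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma * sigma))
            hist[int(theta // (360.0 / bins)) % bins] += magnitude * weight
    return hist


def dominant_orientations(hist, ratio=0.8):
    """Indices of local peaks that reach at least `ratio` of the highest bin."""
    top = max(hist)
    n = len(hist)
    return [
        i for i, v in enumerate(hist)
        if v >= ratio * top and v > hist[i - 1] and v > hist[(i + 1) % n]
    ]


# Intensity ramp increasing to the right: every gradient points along +x (0 degrees).
ramp = [[float(x) for x in range(9)] for _ in range(9)]
hist = orientation_histogram(ramp, keypoint_scale=2.0)
print(dominant_orientations(hist))
```

Each returned bin index corresponds to a 10° range; a keypoint with two such peaks would be duplicated, once per orientation.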
4. Local image descriptor
σ = 0.5 * width
• Thresholded image gradients are sampled over a 16x16 array of locations in scale space.
• Create an array of orientation histograms (w.r.t. key orientation).
• 8 orientations x 4x4 histogram array = 128 dimensions.
• Normalized for intensity variance; clip values larger than 0.2, then renormalize.
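The normalize, clip, renormalize step in the last bullet can be sketched on a raw 128-dimensional vector of non-negative histogram values. The function name is hypothetical; the threshold 0.2 is the one given above.

```python
import math


def normalize_descriptor(vec, clip=0.2):
    """Normalize to unit length, clip large entries, then normalize again."""
    def unit(values):
        norm = math.sqrt(sum(v * v for v in values))
        return [v / norm for v in values] if norm > 0 else list(values)

    clipped = [min(v, clip) for v in unit(vec)]
    return unit(clipped)


# One dominant gradient bin among 127 small ones.
raw = [10.0] + [1.0] * 127
descriptor = normalize_descriptor(raw)
print(round(descriptor[0], 3), round(descriptor[1], 3))
```

The first normalization cancels a uniform change in image contrast, and clipping reduces the influence of a few very large gradient magnitudes, so the dominant bin ends up with a smaller share of the final vector than it had before clipping.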
Conclusions for SIFT
1. Detection of scale-space extrema: for scale invariance
2. Accurate keypoint localization: remove unstable feature points
3. Orientation assignment: for rotation invariance
4. Keypoint descriptor: for illumination invariance
Conclusions for SIFT
• Image scale invariance.
• Image rotation invariance.
• Robust matching across a substantial range of (1) affine distortion, (2) change in 3D viewpoint, (3) addition of noise, (4) change in illumination.
Feature matching
For a feature x, find the closest feature x1 and the second-closest feature x2. If the distance ratio d(x, x1) / d(x, x2) is smaller than 0.8, x1 is accepted as a match.
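The distance-ratio test can be sketched with a brute-force nearest-neighbor search. Euclidean distance and the function names are assumptions made for illustration; the toy descriptors are 2D only to keep the example short.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_features(descriptors_a, descriptors_b, ratio=0.8):
    """Return (index_a, index_b) pairs that pass the closest / second-closest ratio test."""
    matches = []
    for i, a in enumerate(descriptors_a):
        ranked = sorted((euclidean(a, b), j) for j, b in enumerate(descriptors_b))
        if len(ranked) < 2:
            continue  # the ratio is undefined without a second candidate
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches


first = [[0.0, 0.0], [5.0, 5.0], [7.5, 7.5]]
second = [[0.1, 0.0], [10.0, 10.0], [5.0, 5.1]]
print(match_features(first, second))
```

The third feature lies almost midway between two candidates, so its ratio is close to 1 and it is rejected as ambiguous, while the first two have a clearly closest neighbor.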
Maxima in DoG
Remove low contrast
Remove edges
SIFT descriptor
Image Matching
Thanks for your attention!
Q&A