
Page 1: CVIU Lecture 3

ENGN8530: Computer Vision and Image Understanding:

Theories and Research

Topic 3: Image Matching and Registration

Dr Chunhua Shen and Dr Roland Goecke, VISTA / NICTA & RSISE, ANU

Acknowledgement: Some slides from Dr Antonio Robles-Kelly, Dr Tiberio Caetano, Dr Rob Mahony, Dr Cordelia Schmid, and Dr David Lowe

Page 2: CVIU Lecture 3

ENGN8530: CVIU 2

Some Terms

Image Matching:
- Align images of the same modality so as to provide a continuous 'larger' image
- Images often taken from different viewpoints

Image Registration:
- Align images of the same or different modalities so that the object of interest shows up in the same way in all the images
- Usually taken from more or less the same viewpoint
- Very important in medical imaging!

However, these terms are often used interchangeably!

Page 3: CVIU Lecture 3

ENGN8530: CVIU 3

How to Build a Panorama?

Reference: M. Brown and D.G. Lowe, “Recognising Panoramas”, ICCV 2003.

Page 4: CVIU Lecture 3

ENGN8530: CVIU 4

Template Matching

Useful for locating objects with known shape and appearance in an image.
An n×m template is compared with n×m regions of the image, with the aim of finding the point in the image to which it is most similar.

Disadvantages:
- Cannot handle pose changes well
- Cannot handle scale changes well

Page 5: CVIU Lecture 3

ENGN8530: CVIU 5

Template Matching (2)

Given two images I1 and I2 (of the same size), measure the correlation between them.
E.g. how similar are these two 3×3 images? We need a similarity measure!

I1 = [-1 -1 0; -1 0 1; 0 1 1]        I2 = [-10 -10 0; -10 0 10; 0 10 10]

Page 6: CVIU Lecture 3

ENGN8530: CVIU 6

Similarity Measures

Sum of Absolute Differences (SAD):

Let I1 and I2 be images of the same size, with I1(p_i) = a_i and I2(p_i) = b_i. Then

SAD(I1, I2) = Σ_i |a_i − b_i|

Sum of Squared Differences (SSD) is very similar, except that the squared differences are summed.
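As a minimal sketch (not part of the original slides; the function names sad/ssd are ours), both measures can be written directly in NumPy:

```python
import numpy as np

def sad(img1, img2):
    """Sum of Absolute Differences between two equally sized images."""
    return float(np.sum(np.abs(img1.astype(float) - img2.astype(float))))

def ssd(img1, img2):
    """Sum of Squared Differences between two equally sized images."""
    d = img1.astype(float) - img2.astype(float)
    return float(np.sum(d * d))
```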

Page 7: CVIU Lecture 3

ENGN8530: CVIU 7

SAD

Example: Find the eyes in a face

Original image showing template and location of optimum

Sum of absolute differences, note the peak over the left eye

Template

Page 8: CVIU Lecture 3

ENGN8530: CVIU 8

SAD (2)

Advantages:
- Intuitive
- Degrades gracefully

Disadvantage:
- Not robust to changes in illumination

Page 9: CVIU Lecture 3

ENGN8530: CVIU 9

SAD (3)

Original image I vs. darkened image Idark = I/2 + 0.2

SAD FAILS: it is not robust to a uniform change in illumination.

Page 10: CVIU Lecture 3

ENGN8530: CVIU 10

Correlation

Correlation measures how changes in one variable correlate with changes in another variable.
We will use the normalised cross-correlation (NCC) to determine the correlation between I1 and I2. Without normalisation, correlation is also sensitive to illumination changes.
NCC measures the similarity in the way in which I1 and I2 deviate from their mean values, i.e. the similarity of the 'patterns' of the two images.

⇒ Robust to changes in illumination.

Page 11: CVIU Lecture 3

ENGN8530: CVIU 11

Normalised Cross Correlation

Let I1 and I2 be images of the same size, with I1(p_i) = a_i and I2(p_i) = b_i, and let ā and b̄ be the mean values of the two images. Then

NCC(I1, I2) = Σ_i (a_i − ā)(b_i − b̄) / sqrt( Σ_i (a_i − ā)² · Σ_i (b_i − b̄)² )
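A direct NumPy transcription of this formula (a sketch; the helper name ncc is ours):

```python
import numpy as np

def ncc(img1, img2):
    """Normalised cross-correlation of two equally sized images.

    Returns a value in [-1, 1]; it is undefined (division by zero)
    for a perfectly flat, featureless image."""
    a = img1.astype(float).ravel()
    b = img2.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))
```

For example, ncc(I1, 10 * I1 + 5) evaluates to 1.0 for any non-flat I1, which is the affine intensity invariance illustrated on the next slide.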

Page 12: CVIU Lecture 3

ENGN8530: CVIU 12

NCC (2)

I1                              I2                      NCC(I1, I2)
[-1 -1 0; -1 0 1; 0 1 1]        k·I1 + l  (k > 0)        1
[-1 -1 0; -1 0 1; 0 1 1]       −k·I1 + l  (k > 0)       −1

Here k and l are constants; '+ l' means adding l to all matrix elements.

Page 13: CVIU Lecture 3

ENGN8530: CVIU 13

NCC (3)

NCC(I1, I2) ∈ [-1, 1], i.e. the closed interval on the real line bounded by −1 and 1, containing both its boundary points.
NCC measures the similarity of the 'patterns' of two images.
NCC is undefined for a flat, featureless image (all pixels equal to some constant k), since the denominator is then zero. This virtually never occurs in practice.

Page 14: CVIU Lecture 3

ENGN8530: CVIU 14

NCC (4)

(Figure: a 3×3 template, [-1 -1 0; -1 0 1; 0 1 1], is slid over a 4×4 image.)

Apply NCC in a similar manner to convolution, but only calculate it in the 'valid' region, i.e. where the template lies entirely inside the image.
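A sliding-window sketch of this procedure (ours, not the lecture's code); the score map is computed only at the 'valid' positions, and it assumes non-flat patches (NCC is undefined for a flat patch):

```python
import numpy as np

def match_template_ncc(image, template):
    """Slide a template over an image, computing NCC at every 'valid'
    position (template fully inside the image). The maximum of the
    returned score map marks the best match."""
    I = image.astype(float)
    t = template.astype(float)
    t = t - t.mean()
    t_norm = np.sqrt(np.sum(t * t))
    H, W = I.shape
    h, w = t.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            patch = I[r:r + h, c:c + w]
            p = patch - patch.mean()
            scores[r, c] = np.sum(p * t) / (np.sqrt(np.sum(p * p)) * t_norm)
    return scores
```

The best-match location is then np.unravel_index(np.argmax(scores), scores.shape).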

Page 15: CVIU Lecture 3

ENGN8530: CVIU 15

NCC (5)

(Figure: the template at the first valid position of the 4×4 image; the NCC value there is 1.)

Apply NCC in a similar manner to convolution, but only calculate it in the 'valid' region.

Page 16: CVIU Lecture 3

ENGN8530: CVIU 16

NCC (6)

(Figure: the template at the second valid position; NCC values so far: 1, 0.846.)

Apply NCC in a similar manner to convolution, but only calculate it in the 'valid' region.

Page 17: CVIU Lecture 3

ENGN8530: CVIU 17

NCC (7)

(Figure: the template at the third valid position; NCC values so far: 1, 0.846, 0.833.)

Apply NCC in a similar manner to convolution, but only calculate it in the 'valid' region.

Page 18: CVIU Lecture 3

ENGN8530: CVIU 18

NCC (8)

(Figure: the completed 2×2 map of NCC values over the valid region: [1 0.846; 0.833 0.258]. The maximum, 1, marks the best match.)

Apply NCC in a similar manner to convolution, but only calculate it in the 'valid' region.

Page 19: CVIU Lecture 3

ENGN8530: CVIU 19

Template Matching using NCC

The template of the right eye is flipped and used to locate the left eye.

Original image showing template, and location of maximum in normalised cross-correlation

Normalised cross-correlation, note the peak over the left eye

Page 20: CVIU Lecture 3

ENGN8530: CVIU 20

NCC v. SAD

Unaltered image I vs. darkened image Idark = I/2 + 0.2:
- NCC: robust to a uniform change in illumination
- SAD: not robust to a uniform change in illumination (FAILS!)

Page 21: CVIU Lecture 3

ENGN8530: CVIU 21

Issues

These similarity measures are not very good at handling:
- Scale changes
- Pose changes
- Arbitrary rotations of the object or camera
- Illumination changes, in particular non-global ones

How can we improve image matching / registration?
Solution:
- Use local invariant image features
- Then use these features to do the matching / registration

Page 22: CVIU Lecture 3

ENGN8530: CVIU 22

How to Build a Panorama? (2)

Need to align (match) images

Page 23: CVIU Lecture 3

ENGN8530: CVIU 23

How to Build a Panorama? (3)

Detect feature points in both images

Page 24: CVIU Lecture 3

ENGN8530: CVIU 24

How to Build a Panorama? (4)

Detect feature points in both images
Find corresponding pairs

Page 25: CVIU Lecture 3

ENGN8530: CVIU 25

How to Build a Panorama? (5)

Detect feature points in both images
Find corresponding pairs
Use these pairs to align images

Page 26: CVIU Lecture 3

ENGN8530: CVIU 26

Local Image Features

Local invariant photometric descriptors:
- Local: robust to occlusion/clutter, no segmentation required
- Photometric: distinctive
- Invariant: to image transformations and illumination changes

Page 27: CVIU Lecture 3

ENGN8530: CVIU 27

Invariant Features

Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters.

SIFT Features

Page 28: CVIU Lecture 3

ENGN8530: CVIU 28

Invariant Features (2)

Advantages:
- Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
- Distinctiveness: individual features can be matched to a large database of objects
- Quantity: many features can be generated for even small objects
- Efficiency: close to real-time performance
- Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness

Page 29: CVIU Lecture 3

ENGN8530: CVIU 29

Matching with Features

Problem 1:

Detect the same point independently in both images

no chance to match!

We need a repeatable detector

Page 30: CVIU Lecture 3

ENGN8530: CVIU 30

Matching with Features (2)

Problem 2:

For each point, correctly recognize the corresponding one.

We need a reliable and distinctive descriptor

Page 31: CVIU Lecture 3

ENGN8530: CVIU 31

Matching with Features (3)

Determining correspondences:

Vector comparison using the Mahalanobis distance

dist_M(p, q) = (p − q)^T Λ^(−1) (p − q)
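A small sketch of this comparison (ours, not from the slides); Λ is the descriptor covariance matrix, estimated from data:

```python
import numpy as np

def mahalanobis_dist(p, q, cov):
    """Mahalanobis distance between descriptor vectors p and q,
    given the descriptor covariance matrix cov (Lambda)."""
    d = np.asarray(p, float) - np.asarray(q, float)
    return float(d @ np.linalg.inv(cov) @ d)
```

In practice the inverse (or a Cholesky factor) of Λ would be precomputed once and reused for all comparisons.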

Page 32: CVIU Lecture 3

ENGN8530: CVIU 32

A Little Bit of History…

Zhang, Deriche, Faugeras, Luong (Artificial Intelligence, 1995):
- Apply Harris corner detector
- Match points by correlating only at corner points
- Derive epipolar alignment using robust least-squares

Page 33: CVIU Lecture 3

ENGN8530: CVIU 33

A Little Bit of History… (2)

Schmid & Mohr (1997):
- Apply Harris corner detector
- Use rotational invariants at corner points

However, not scale invariant. Sensitive to viewpoint and illumination change.

Page 34: CVIU Lecture 3

ENGN8530: CVIU 34

Interest Point Detectors

Contour-based methods:
- Junctions, ends, etc.

Intensity-based methods:
- Auto-correlation matrix

Parametric-model based methods:
- L-corner

Page 35: CVIU Lecture 3

ENGN8530: CVIU 35

Harris Corner Detector

Basic idea:
- We should easily recognize the point by looking through a small window
- Shifting the window in any direction should give a large change in intensity

Reference: C. Harris and M. Stephens, “A combined corner and edge detector”, Proceedings of the 4th Alvey Vision Conference, 1988, pp. 147--151.

Page 36: CVIU Lecture 3

ENGN8530: CVIU 36

Harris Corner Detector (2)

Based on the idea of auto-correlation:

“flat” region:no change in all directions

“edge”:no change along the edge direction

“corner”:significant change in all directions

Page 37: CVIU Lecture 3

ENGN8530: CVIU 37

Harris Corner Detector (3)

Change of intensity for the shift [u, v]:

E(u, v) = Σ_{x,y} w(x, y) [ I(x + u, y + v) − I(x, y) ]²

where I(x, y) is the intensity, I(x + u, y + v) is the shifted intensity, and w(x, y) is the window function: either 1 inside the window and 0 outside, or a Gaussian.
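A literal (and deliberately naive) sketch of this sum for a single integer shift, with the window supplied as a weight mask; the names and the border handling are ours:

```python
import numpy as np

def intensity_change(image, u, v, window):
    """E(u, v): windowed sum of squared differences between the image
    and its copy shifted by (u, v). 'window' is a weight mask of the
    same shape as the image (1 inside / 0 outside, or a Gaussian)."""
    I = image.astype(float)
    # I(x + u, y + v); np.roll wraps around at the borders, which is
    # acceptable for this illustration only.
    shifted = np.roll(I, shift=(-v, -u), axis=(0, 1))
    return float(np.sum(window * (shifted - I) ** 2))
```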

Page 38: CVIU Lecture 3

ENGN8530: CVIU 38

Harris Corner Detector (4)

For small shifts [u, v] we have a bilinear approximation:

E(u, v) ≅ [u v] M [u v]^T

where M is a 2×2 matrix computed from image derivatives (the auto-correlation matrix):

M = Σ_{x,y} w(x, y) [ Ix²   IxIy ]
                    [ IxIy  Iy²  ]

Page 39: CVIU Lecture 3

ENGN8530: CVIU 39

Harris Corner Detector (5)

Intensity change in the shifting window: eigenvalue analysis

E(u, v) ≅ [u v] M [u v]^T,   with λ1, λ2 the eigenvalues of M

(Figure: the ellipse E(u, v) = const has axis lengths (λmax)^(-1/2) and (λmin)^(-1/2), aligned with the directions of the fastest and slowest intensity change.)

Page 40: CVIU Lecture 3

ENGN8530: CVIU 40

Harris Corner Detector (6)

Auto-correlation matrix:
- captures the structure of the local neighborhood
- measure based on the eigenvalues of this matrix:
  - 2 strong eigenvalues => interest point
  - 1 strong eigenvalue => contour
  - 0 strong eigenvalues => uniform region

Interest point detection:
- threshold on the eigenvalues
- local maximum for localization

Page 41: CVIU Lecture 3

ENGN8530: CVIU 41

Harris Corner Detector (7)

Classification of image points using the eigenvalues of M (λ1, λ2):
- "Corner": λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
- "Edge": λ1 >> λ2 (or λ2 >> λ1)
- "Flat" region: λ1 and λ2 are small; E is almost constant in all directions

Page 42: CVIU Lecture 3

ENGN8530: CVIU 42

Harris Corner Detector (8)

Measure of corner response:

R = det M − k (trace M)²

det M = λ1 λ2
trace M = λ1 + λ2

(k – empirical constant, k = 0.04–0.06)

Page 43: CVIU Lecture 3

ENGN8530: CVIU 43

Harris Corner Detector (9)

(Figure: the λ1–λ2 plane divided into "Corner" (R > 0), "Edge" (R < 0) and "Flat" (|R| small) regions.)

- R depends only on the eigenvalues of M
- R is large for a corner
- R is negative with large magnitude for an edge
- |R| is small for a flat region

Page 44: CVIU Lecture 3

ENGN8530: CVIU 44

Harris Corner Detector (10)

The Algorithm:
- Compute the corner response R (see the workflow on the following slides)
- Find points with a large corner response function R (R > threshold)
- Take the points that are local maxima of R

A complete sketch of this algorithm follows.
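Putting the previous slides together, a compact NumPy/SciPy sketch of the whole detector might look as follows (our code; the parameter defaults are illustrative, not the lecture's):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_corners(image, sigma=1.0, k=0.05, rel_threshold=0.01):
    """Harris corner detector sketch.

    sigma         : std. dev. of the Gaussian window w(x, y)
    k             : empirical constant (typically 0.04-0.06)
    rel_threshold : threshold on R, relative to its maximum
    Returns the (row, col) coordinates of detected corners."""
    I = image.astype(float)

    # Image derivatives Ix, Iy (np.gradient returns d/drow, d/dcol)
    Iy, Ix = np.gradient(I)

    # Entries of the auto-correlation matrix M, smoothed by the window
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)

    # Corner response R = det(M) - k * trace(M)^2 at every pixel
    R = (Sxx * Syy - Sxy * Sxy) - k * (Sxx + Syy) ** 2

    # Keep points with a large response that are also local maxima of R
    corners = (R > rel_threshold * R.max()) & (R == maximum_filter(R, size=3))
    return np.argwhere(corners)
```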

Page 45: CVIU Lecture 3

ENGN8530: CVIU 45

Harris: Workflow

Page 46: CVIU Lecture 3

ENGN8530: CVIU 46

Harris: Workflow (2)

Compute corner response R

Page 47: CVIU Lecture 3

ENGN8530: CVIU 47

Harris: Workflow (3)

Find points with large corner response: R > threshold

Page 48: CVIU Lecture 3

ENGN8530: CVIU 48

Harris: Workflow (4)

Take only the points of local maxima of R

Page 49: CVIU Lecture 3

ENGN8530: CVIU 49

Harris: Workflow (5)

Page 50: CVIU Lecture 3

ENGN8530: CVIU 50

Harris: Properties

Rotation invariance:

Ellipse rotates but its shape (i.e. eigenvalues) remains the same

Corner response R is invariant to image rotation

Page 51: CVIU Lecture 3

ENGN8530: CVIU 51

Harris: Properties (2)

Partial invariance to affine intensity change:
- Only derivatives are used => invariance to an intensity shift I → I + b
- Intensity scale I → a·I scales the response R, so a fixed threshold may select different points

(Figure: corner response R vs. image coordinate x, with a fixed threshold, before and after intensity scaling.)

Page 52: CVIU Lecture 3

ENGN8530: CVIU 52

Harris: Properties (3)

But: not invariant to image scale!

(Figure: the same structure at two scales; at the fine scale all window positions are classified as edges, while at the coarse scale it is detected as a corner.)

Page 53: CVIU Lecture 3

ENGN8530: CVIU 53

Scale Invariant Features

Consider regions (e.g. circles) of different sizes around a point.
Regions of corresponding sizes will look the same in both images.

Page 54: CVIU Lecture 3

ENGN8530: CVIU 54

Scale Invariant Features (2)

The problem: how do we choose corresponding circles independently in each image?

Page 55: CVIU Lecture 3

ENGN8530: CVIU 55

Scale Invariant Features (3)

Solution:
- Design a function on the region (circle) which is "scale invariant", i.e. the same for corresponding regions, even if they are at different scales
- Example: average intensity. For corresponding regions (even of different sizes) it will be the same.
- For a point in one image, we can consider this function as a function of region size (circle radius)

(Figure: f vs. region size for the same point in Image 1 and in Image 2, where Image 2 is at scale 1/2.)

Page 56: CVIU Lecture 3

ENGN8530: CVIU 56

Scale Invariant Features (4)

Common approach:
- Take a local maximum of this function
- Observation: the region size at which the maximum is achieved should be invariant to image scale

(Figure: f vs. region size for Image 1 and Image 2 (scale 1/2); the maxima occur at region sizes s1 and s2 respectively.)

Important: this scale invariant region size is found in each image independently!

Page 57: CVIU Lecture 3

ENGN8530: CVIU 57

Scale Invariant Features (5)

A "good" function for scale detection has one stable, sharp peak.

(Figure: f vs. region size for three cases; functions with multiple or flat peaks are bad, a single sharp peak is good.)

For usual images, a good function is one that responds to contrast (sharp local intensity change).

Page 58: CVIU Lecture 3

ENGN8530: CVIU 58

Scale Invariant Features (6)

Functions for determining scale: f = Kernel ∗ Image

Kernels:

L = σ² (Gxx(x, y, σ) + Gyy(x, y, σ))          (Laplacian of Gaussian)

DoG = G(x, y, kσ) − G(x, y, σ)                (Difference of Gaussians)

where G(x, y, σ) = 1/(2πσ²) · exp(−(x² + y²) / (2σ²)) is the Gaussian.

Note: both kernels are invariant to scale and rotation.
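Using SciPy's Gaussian filters, the two responses can be sketched as below (our helper names; k = 1.6 is just a commonly quoted choice, not the lecture's):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

def log_response(image, sigma):
    """Scale-normalised Laplacian-of-Gaussian response:
    sigma^2 times the LoG kernel convolved with the image."""
    return (sigma ** 2) * gaussian_laplace(image.astype(float), sigma)

def dog_response(image, sigma, k=1.6):
    """Difference-of-Gaussians response: G(k*sigma) - G(sigma) applied
    to the image, a cheap approximation of the scale-normalised LoG."""
    I = image.astype(float)
    return gaussian_filter(I, k * sigma) - gaussian_filter(I, sigma)
```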

Page 59: CVIU Lecture 3

ENGN8530: CVIU 59

Scale Invariant Features (7)

(Figure: the scale-space cube with x, y and scale axes; Harris is applied in the spatial directions, the Laplacian in the scale direction.)

Harris-Laplacian: find the local maximum of
- the Harris corner detector in space (image coordinates)
- the Laplacian in scale

Reference: K. Mikolajczyk and C. Schmid, “Indexing Based on Scale Invariant Interest Points”, ICCV 2001.

Page 60: CVIU Lecture 3

ENGN8530: CVIU 60

Scale invariant Harris points

- Multi-scale extraction of Harris interest points
- Selection of points at the characteristic scale in scale space

Characteristic scale:
- Maximum of the Laplacian in scale space
- Scale invariant

Page 61: CVIU Lecture 3

ENGN8530: CVIU 61

Scale invariant Harris points (2)

Multi-scale Harris points

Selection of points at the characteristic scale with the Laplacian

invariant points + associated regions

Page 62: CVIU Lecture 3

ENGN8530: CVIU 62

Viewpoint Changes

Viewpoint changes are locally approximated by an affine transformation A.

(Figure: a detected scale invariant region in one view and the corresponding projected region in the other view, related by A.)

Affine transformation: Linear transformation followed by a translation

Page 63: CVIU Lecture 3

ENGN8530: CVIU 63

Affine Invariant Features

So far we considered: similarity transforms (rotation + uniform scale)
Now we go on to: affine transforms (rotation + non-uniform scale)

Page 64: CVIU Lecture 3

ENGN8530: CVIU 64

Affine Invariant Features (2)

- Take a local intensity extremum as the initial point
- Go along every ray starting from this point and stop when an extremum of the function f is reached:

f(t) = |I(t) − I_0| / ( (1/t) ∫_0^t |I(τ) − I_0| dτ )

where I_0 is the intensity at the initial point and t is the position along the ray.

Reference: T. Tuytelaars and L. Van Gool, “Wide Baseline Stereo Matching Based on Local, Affinely Invariant Regions”, BMVC 2000.
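A discrete sketch of evaluating f along one ray (ours; intensities are assumed to be sampled at unit steps from the extremum outwards):

```python
import numpy as np

def ray_function(intensities):
    """f(t) along one ray: |I(t) - I0| divided by the running mean of
    |I - I0| from the starting extremum (I0 = intensities[0]) up to t."""
    I = np.asarray(intensities, float)
    dev = np.abs(I - I[0])                # |I(t) - I_0|
    t = np.arange(1, len(I))
    mean_dev = np.cumsum(dev)[1:] / t     # (1/t) * integral of |I - I_0| dt
    return dev[1:] / np.maximum(mean_dev, 1e-9)   # guard against division by zero
```

The stopping point along the ray is then the position of an extremum of this array.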

Page 65: CVIU Lecture 3

ENGN8530: CVIU 65

Affine Invariant Features (3)

We obtain approximately corresponding regions:
- The regions found may not exactly correspond, so we approximate them with ellipses
- Geometric moments of orders up to 2 allow the region to be approximated by an ellipse

Remark: Search for scale in every direction

Page 66: CVIU Lecture 3

ENGN8530: CVIU 66

Affine Invariant Features (4)

Covariance matrix of region points defines an ellipse:

Σ1 = ⟨p p^T⟩ over region 1,   ellipse: p^T Σ1^(−1) p = 1
Σ2 = ⟨q q^T⟩ over region 2,   ellipse: q^T Σ2^(−1) q = 1

(p = [x, y]^T is relative to the centre of mass)

If the regions are related by q = A p, then Σ2 = A Σ1 A^T.

Ellipses computed for corresponding regions also correspond!
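A short sketch of fitting such an ellipse from a region's pixel coordinates (ours):

```python
import numpy as np

def region_ellipse(points):
    """Second-order moments of a region's points (relative to the
    centre of mass) give the covariance matrix Sigma; the fitted
    ellipse is { x : x^T Sigma^{-1} x = 1 }."""
    P = np.asarray(points, float)          # N x 2 array of (x, y) points
    centred = P - P.mean(axis=0)
    sigma = centred.T @ centred / len(P)   # 2x2 covariance matrix
    return sigma
```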

Page 67: CVIU Lecture 3

ENGN8530: CVIU 67

Affine Invariant Harris

Initialisation with multi-scale interest points

Iterative modification of location, scale and neighbourhood

Page 68: CVIU Lecture 3

ENGN8530: CVIU 68

MSER

Maximally Stable Extremal Regions:
- Threshold image intensities: I > I0
- Extract connected components ("Extremal Regions")
- Find the threshold at which an extremal region is "Maximally Stable", i.e. there is a local minimum of the relative growth of its area
- Approximate each such region with an ellipse

Reference: J. Matas, O. Chum, M. Urban, T. Pajdla, “Robust Wide Baseline Stereo from Maximally Stable Extremal Regions”, BMVC 2002, pp. 384-393.
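In practice one would typically use an existing implementation; a sketch with OpenCV's MSER (the API names are given as we recall them, so treat them as an assumption):

```python
import numpy as np
import cv2  # OpenCV, assumed to be installed

def detect_mser(gray):
    """Detect MSER regions on a uint8 grayscale image and approximate
    each region with an ellipse (centre, axes, angle)."""
    mser = cv2.MSER_create()
    regions, _bboxes = mser.detectRegions(gray)
    ellipses = [cv2.fitEllipse(r.reshape(-1, 1, 2).astype(np.float32))
                for r in regions if len(r) >= 5]  # fitEllipse needs >= 5 points
    return regions, ellipses
```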

Page 69: CVIU Lecture 3

ENGN8530: CVIU 69

SIFT

Scale-Invariant Feature Transform. Basically, SIFT is a 4-step process:
1. Scale-space extrema detection
2. Keypoint localization
3. Orientation assignment
4. Keypoint descriptor

Reference: D. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
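For completeness, a sketch of detecting and matching SIFT features with OpenCV (our code; cv2.SIFT_create requires a reasonably recent OpenCV build, and the 0.75 ratio is just a commonly used value for Lowe's ratio test):

```python
import cv2  # OpenCV, assumed to be installed

def sift_match(img1, img2, ratio=0.75):
    """Detect SIFT keypoints/descriptors in two grayscale images and
    keep the matches that pass Lowe's ratio test."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    # Keep a match only if it is clearly better than the second-best candidate
    good = [m for m, n in knn if m.distance < ratio * n.distance]
    return kp1, kp2, good
```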

Page 70: CVIU Lecture 3

ENGN8530: CVIU 70

SIFT (2)

Build Scale-Space Pyramid:
- All scales must be examined to identify scale-invariant features
- An efficient approach is to compute the Difference of Gaussian (DoG) pyramid (Burt & Adelson, 1983)

(Figure: at each octave, the image is repeatedly blurred, adjacent blurred images are subtracted to form the DoG levels, and the image is then resampled for the next octave.)
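A minimal blur / subtract / resample sketch of such a pyramid (our parameter choices; σ0 = 1.6 and the scale step 2^(1/(levels−1)) are illustrative defaults in the spirit of Lowe's paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, n_octaves=4, levels=5, sigma0=1.6):
    """Build a Difference-of-Gaussian pyramid.

    Per octave: blur the image at 'levels' increasing scales, subtract
    adjacent blurred images (DoG), then downsample by 2 for the next octave."""
    I = image.astype(float)
    k = 2.0 ** (1.0 / (levels - 1))   # scale step within an octave
    pyramid = []
    for _ in range(n_octaves):
        blurred = [gaussian_filter(I, sigma0 * k ** i) for i in range(levels)]
        pyramid.append([b2 - b1 for b1, b2 in zip(blurred[:-1], blurred[1:])])
        I = blurred[-1][::2, ::2]     # resample for the next octave
    return pyramid
```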

Page 71: CVIU Lecture 3

ENGN8530: CVIU 71

SIFT (3)

Scale space is processed one octave at a time.

Page 72: CVIU Lecture 3

ENGN8530: CVIU 72

SIFT (4)

Keypoint localisation:
- Detect maxima and minima of the Difference-of-Gaussian (DoG) in scale space
- Could also use the Laplacian of Gaussian (LoG)

Page 73: CVIU Lecture 3

ENGN8530: CVIU 73

SIFT (5)

Select canonical orientation:
- Create a histogram of local gradient directions (0 to 2π) computed at the selected scale
- Assign the canonical orientation at the peak of the smoothed histogram
- Each keypoint then specifies stable 2D coordinates (x, y, scale, orientation)

Page 74: CVIU Lecture 3

ENGN8530: CVIU 74

SIFT (6)

SIFT vector formation (Keypoint Descriptor):
- Thresholded image gradients are sampled over a 16×16 array of locations in scale space
- Create an array of orientation histograms
- 8 orientations × 4×4 histogram array = 128 dimensions

Page 75: CVIU Lecture 3

ENGN8530: CVIU 75

SIFT Example

Laplacian of Gaussian

Page 76: CVIU Lecture 3

ENGN8530: CVIU 76

SIFT Example (2)

SIFT keypoints

Page 77: CVIU Lecture 3

ENGN8530: CVIU 77

SIFT Example (3)

Query

Result

Task:

Find query image parts in the image

Page 78: CVIU Lecture 3

ENGN8530: CVIU 78

Summary

- SIFT is arguably the best affine invariant local image feature, but…
- SIFT is relatively expensive (computationally)
- MSER doesn't work well on images with any motion blur, e.g. from a moving camera

Interesting alternatives:
- GLOH (Gradient Location and Orientation Histogram)
- SURF (Speeded Up Robust Features)
- Histogram of Oriented Gradients
- Kadir-Brady Saliency Detector