Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad


Page 1: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Approximate Nearest Neighbor - Applications to Vision & Matching

Lior Shoval, Rafi Haddad

Page 2: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Approximate Nearest NeighborApplications to Vision & Matching

1. Object matching in 3D – Recognizing cars in cluttered scanned images
   A. Frome, D. Huber, R. Kolluri, T. Bulow, and J. Malik

2. Video Google – A Text Retrieval Approach to Object Matching in Videos
   Sivic, J. and Zisserman, A.

Page 3: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Object Matching

Input: An object and a dataset of models
Output: The most “similar” model

Two methods will be presented:
1. Voting based method
2. Cost based method

Object Sq Model S1 Model S2 Model Sn …

Page 4: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

A descriptor based Object matching - Voting

Every descriptor votes for the model that gave the closest descriptor
Choose the model with the most votes

Problem: the hard vote discards the relative distances between descriptors

Object Sq Model S1 Model S2 Model Sn …

Page 5: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

A descriptor based Object matching - Cost

Compare all object descriptors to all target model descriptors

Object Sq Model S1 Model S2 Model Sn …

cost(S_q, S_i) = Σ_q  min_{m∈{1,..,M}, k∈{1,..,K}} dist(q, p_mk)
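A minimal sketch of this cost computation (hypothetical function names and toy 2-D descriptors; the actual descriptors are high-dimensional):

```python
import math

def dist(a, b):
    # Euclidean distance between two descriptor vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cost(object_desc, model_desc):
    # cost(Sq, Si): sum, over the object's descriptors, of the distance
    # to the closest descriptor in the model (brute-force nearest neighbor)
    return sum(min(dist(q, p) for p in model_desc) for q in object_desc)

def best_model(object_desc, models):
    # choose the model with the lowest total cost
    return min(models, key=lambda name: cost(object_desc, models[name]))

# toy example: the object matches model "S1" exactly
obj = [(0.0, 0.0), (1.0, 1.0)]
models = {"S1": [(0.0, 0.0), (1.0, 1.0)], "S2": [(5.0, 5.0), (6.0, 6.0)]}
print(best_model(obj, models))  # S1
```

Unlike the hard vote, the cost keeps the relative distances: a model whose descriptors are uniformly close beats one with a few exact hits and many far misses.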

Page 6: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Application to cars matching

Page 7: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Matching - Nearest Neighbor

In order to match the object to the right model, a nearest neighbor (NN) algorithm is implemented

Every descriptor in the object is compared to all descriptors in the model

The operational cost is very high.

Page 8: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Experiment 1 – Model matching

Page 9: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Experiment 2 – Clutter scenes

Page 10: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Matching - Nearest Neighbor

E.g.: Q – 160 descriptors in the object; N – 83,640 [ref. desc.] × 12 [rotations] ≈ 1E6 descriptors in the models

Exact NN takes 7.4 sec on a 2.2 GHz processor per object descriptor

Page 11: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Speeding search with LSH

Fast search techniques such as LSH (locality-sensitive hashing) can reduce the search space by an order of magnitude

Tradeoff between speed and accuracy

LSH – dividing the high-dimensional feature space into hypercubes by a set of k randomly-chosen axis-parallel hyperplanes, with l different sets of hypercubes
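A rough sketch of this bucketing idea (hypothetical helper names; production LSH implementations choose hash families more carefully). Each of the l tables hashes a point to a k-bit key recording on which side of each axis-parallel hyperplane it falls; a query retrieves the union of its buckets as the candidate set:

```python
import random
from collections import defaultdict

def make_table(points, k, dims, lo=0.0, hi=1.0, rng=random):
    # one hash table: k axis-parallel hyperplanes, each an (axis, threshold) pair
    planes = [(rng.randrange(dims), rng.uniform(lo, hi)) for _ in range(k)]
    def key(p):
        # k-bit bucket id: which side of each hyperplane the point lies on
        return tuple(p[axis] > t for axis, t in planes)
    table = defaultdict(list)
    for p in points:
        table[key(p)].append(p)
    return table, key

def candidates(tables, q):
    # union of q's buckets over the l independent tables
    out = set()
    for table, key in tables:
        out.update(table[key(q)])
    return out

random.seed(0)
points = [(0.10, 0.10), (0.11, 0.12), (0.90, 0.90)]
tables = [make_table(points, k=4, dims=2) for _ in range(3)]  # l = 3
cands = candidates(tables, (0.10, 0.10))
```

The exact search then runs only inside the candidate set, which is the source of the speed/accuracy tradeoff: nearby points split by one table's hyperplanes may still share a bucket in another.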

Page 12: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

LSH – k=4; l=1

Page 13: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

LSH – k=4; l=2

Page 14: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

LSH – k=4; l=3

Page 15: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

LSH - Results

Taking the best 80/160 descriptors
Achieving close results with fewer descriptors

Page 16: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Descriptor based Object matching – Reducing Complexity

Approximate nearest neighbor
Dividing the problem into two stages:
1. Preprocessing
2. Querying

Locality-Sensitive Hashing (LSH)

Or...

Page 17: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Video Google

A Text Retrieval Approach to object Matching in Videos

Page 18: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Query

Results

Page 19: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Interesting facts on Google

The most used search engine on the web

Page 20: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Who wants to be a Millionaire?

Page 21: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

How many pages does Google search?

a. Around half a billion
b. Around 4 billion
c. Around 10 billion
d. Around 50 billion

Page 22: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

How many machines does Google use?

a. 10
b. A few hundred
c. A few thousand
d. Around a million

Page 23: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Video Google: On-line Demo

Samples
Run Lola Run:
  Supermarket logo (Bolle) – Frame/shot 72325 / 824
  Red cube logo – Entry frame/shot 15626 / 174
  Roulette #20 – Frame/shot 94951 / 988
Groundhog Day:
  Bill Murray's ties – Frame/shot 53001/294, Frame/shot 40576/208
  Phil's home – Entry frame/shot 34726/172

Page 24: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Query

Page 25: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad
Page 26: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad
Page 27: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Occluded !!!

Page 28: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad
Page 29: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad
Page 30: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad
Page 31: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad
Page 32: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Video Google

Text Google
Analogy from text to video
Video Google processes
Experimental results
Summary and analysis

Page 33: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Text retrieval overview

Word & Document
Vocabulary
Weighting
Inverted file
Ranking

Page 34: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Words & Documents

Documents are parsed into words
Common words are ignored (the, an, etc.) – this is called a ‘stop list’
Words are represented by their stems: ‘walk’, ‘walking’, ‘walks’ → ‘walk’
Each word is assigned a unique identifier
A document is represented by a vector with components given by the frequency of occurrence of the words it contains

Page 35: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Vocabulary

The vocabulary contains K words
Each document is represented by a K-component vector of word frequencies

(0,0, … 3,… 4,…. 5, 0,0)

Page 36: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Example:

“…… Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object categories …….”

Page 37: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Parse and clean

Original:
  Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object categories.

After parsing and cleaning:
  represent detect learn main issue tackle design visual system recognize category

Page 38: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Creating document vector

Assign a unique ID to each word
Create a document vector of size K with word frequencies: (3,7,2,………)/789
Or, compactly, with the original order and position:

Word      | ID | Position
represent | 1  | 1, 12, 55
detect    | 2  | 2, 32, 44, …
learn     | 3  | 3, 11
…         | …  | …
Total: 789 words

Page 39: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Weighting

The vector components are weighted in various ways:
  Naive – frequency of each word
  Binary – 1 if the word appears, 0 if not
  tf-idf – ‘Term Frequency – Inverse Document Frequency’:

  t_i = (n_id / n_d) · log(N / n_i)

Page 40: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

tf-idf Weighting

t_i = (n_id / n_d) · log(N / n_i)

n_id – number of occurrences of word i in document d
n_d – total number of words in the document
N – the number of documents in the whole database
n_i – the number of occurrences of term i in the whole database

=> “Word frequency” × “Inverse document frequency”
=> All documents are equal!

V_d = (t_1, …, t_i, …, t_K)^T
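A small sketch of this weighting (hypothetical function names; n_i here counts occurrences of word i across the whole database, following the definition above):

```python
import math
from collections import Counter

def tfidf_vector(doc, docs, vocabulary):
    # doc: list of word tokens; docs: all documents in the database
    counts = Counter(doc)
    n_d = len(doc)   # total words in this document
    N = len(docs)    # number of documents in the whole database
    vec = []
    for word in vocabulary:
        n_id = counts[word]                      # occurrences of word i in d
        n_i = sum(d.count(word) for d in docs)   # occurrences of i overall
        t_i = (n_id / n_d) * math.log(N / n_i) if n_i else 0.0
        vec.append(t_i)
    return vec

docs = [["represent", "detect"], ["detect"]]
v = tfidf_vector(docs[0], docs, ["represent", "detect"])
```

A word that appears everywhere ("detect") gets weight 0; a word concentrated in one document ("represent") gets a positive weight.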

Page 41: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Inverted File – Index

Crawling stage
  Parsing all documents to create document-representing vectors

Creating word indices
  An entry for each word in the corpus, followed by a list of all documents (and positions in it) that contain it

[Diagram: word IDs 1..K, each pointing to a list of document IDs 1..N]
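The index structure above can be sketched as follows (hypothetical names; positions are kept so word order can be checked later):

```python
from collections import defaultdict

def build_index(documents):
    # inverted file: word -> list of (doc_id, position) pairs
    index = defaultdict(list)
    for doc_id, words in enumerate(documents):
        for pos, word in enumerate(words):
            index[word].append((doc_id, pos))
    return index

def docs_containing(index, query_words):
    # candidate documents: contain at least one of the query words
    return {doc_id for w in query_words for doc_id, _ in index.get(w, [])}

documents = [["represent", "detect", "learn"], ["learn", "design"]]
index = build_index(documents)
```

Querying then touches only the postings lists of the query words instead of scanning every document vector.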

Page 42: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Querying

1. Parsing the query to create a query vector
   Query: “Representation learning” → query vector = (1,0,1,0,0,…)
2. Retrieve all document IDs containing one of the query word IDs (using the inverted file index)
3. Calculate the distance between the query and document vectors (angle between vectors)
4. Rank the results

Page 43: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Ranking the query results

1. PageRank (PR)
   Assume page A has pages T1, T2, …, Tn linking to it
   Define C(X) as the number of links in page X
   d is a weighting factor (0 ≤ d ≤ 1)

   PR(A) = (1 − d) + d · Σ_{i=1}^{n} PR(T_i) / C(T_i)

2. Word order
3. Font size, font type and more
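The PR formula can be iterated to a fixed point; a toy sketch (hypothetical link graph, damping factor d = 0.85):

```python
def pagerank(links, d=0.85, iterations=50):
    # links: page -> list of pages it links to, so len(links[t]) is C(t)
    # iterate PR(A) = (1 - d) + d * sum over pages T linking to A of PR(T)/C(T)
    pages = list(links)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        pr = {
            a: (1 - d) + d * sum(pr[t] / len(links[t])
                                 for t in pages if a in links[t])
            for a in pages
        }
    return pr

links = {"A": ["B"], "B": ["A"], "C": ["A"]}  # C links to A; nothing links to C
pr = pagerank(links)
```

Page A, which receives links from both B and C, ends up with the highest score; C, with no incoming links, bottoms out at 1 − d = 0.15.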

Page 44: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

The Visual Analogy

Text     | Visual
Corpus   | Film
Document | Frame
Word     | ???
Stem     | ???

Page 45: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Detecting “Visual Words”

“Visual word” = descriptor
What is a good descriptor?
  Invariant to different viewpoints, scale, illumination, shift and transformation
  Local versus global
How to build such a descriptor?
1. Finding invariant regions in the frame
2. Representing them by a descriptor

Page 46: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Finding invariant regions

Two types of ‘viewpoint covariant regions’ are computed for each frame:

1. SA – Shape Adapted
2. MS – Maximally Stable

Page 47: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

1. SA – Shape Adapted

• Finding interest points using the Harris corner detector

• Iteratively determining the ellipse center, scale and shape around the interest point

• Reference - Baumberg

Page 48: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

2. MS - Maximally Stable

Intensity watershed image segmentation
Iteratively determining the ellipse center, scale and shape
Reference - Matas

Page 49: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Why two types of detectors? They are complementary representations of a frame

SA regions tend to be centered at corner-like features
MS regions correspond to blobs of high contrast (such as a dark window on a gray wall)

Each detector describes a different “vocabulary” (e.g. the building design and the building specification)

Page 50: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

MS - SA example

MS – yellow; SA – cyan; Zoom

Page 51: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Building the Descriptors

SIFT – Scale Invariant Feature Transform
Each elliptical region is represented by a 128-dimensional vector [Lowe]
SIFT is invariant to a shift of a few pixels (which often occurs)

Page 52: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Building the Descriptors

Removing noise – tracking & averaging

Regions are tracked across sequences of frames using a “Constant Velocity Dynamical model”
Any region which does not survive for more than three frames is rejected
Descriptors throughout the tracks are averaged to improve SNR
Descriptors with large covariance are rejected

Page 53: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

The Visual Analogy

Text     | Visual
Corpus   | Film
Document | Frame
Word     | Descriptor
Stem     | ???

Page 54: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Building the “Visual Stems”

Cluster descriptors into K groups using the K-means clustering algorithm
Each cluster represents a “visual word” in the “visual vocabulary”

Result: 10K SA clusters, 16K MS clusters

Page 55: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

K-Means Clustering

Input
  A set of n unlabeled examples D = {x1, x2, …, xn} in d-dimensional feature space
  Number of clusters – K

Objective
  Find the partition of D into K non-empty disjoint subsets
  D = D_1 ∪ D_2 ∪ … ∪ D_K ;  D_i ∩ D_j = ∅ for i ≠ j
  so that the points in each subset are coherent according to a certain criterion,
  e.g. minimize the squared distance of vectors to their centroids:

  min Σ_{j=1}^{K} Σ_{x∈D_j} ||x − m_j||²  ;  m_j = (1/|D_j|) Σ_{x∈D_j} x

Page 56: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

K-means clustering - algorithm

Step 1: Initialize a partition of D
a. Randomly choose K equal-size sets and calculate their centers m_j

Example: D = {a,b,…,k,l} ; n = 12 ; K = 4 ; d = 2

Page 57: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

K-means clustering - algorithm

Step 1: Initialize a partition of D
b. Every other point y is put into the subset Dj whose center is closest to y among the K centers

D1={a,c,l} ; D2={e,g} ; D3={d,h,i} ; D4={b,f,k}

Page 58: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

K-means clustering - algorithm

Step 2: Repeat till no update
a. Compute the mean (mass center) for each cluster Dj
b. For each xi: assign xi to the cluster with the closest center

D1={a,c,l} ; D2={e,g} ; D3={d,h,i} ; D4={b,f,k}

Page 59: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

K-means algorithm

Final result
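The two steps above, iterated, are the standard Lloyd's algorithm; a compact sketch (hypothetical names, toy 2-D points):

```python
import random

def kmeans(points, k, iterations=20, rng=random):
    # Step 1: initialize centers from k random points
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Step 2b: assign each point to the cluster with the closest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        # Step 2a: recompute each center as the mean (mass center) of its cluster
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters

random.seed(1)
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, clusters = kmeans(pts, k=2)
```

On this toy set the two well-separated pairs end up in separate clusters regardless of which points seed the centers; on real data the result depends on the initialization, which is one of the cons listed next.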

Page 60: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

K-means clustering - Cons

Sensitive to the selection of the initial grouping and metric
Sensitive to the order of input vectors
The number of clusters, K, must be determined beforehand
Each attribute has the same weight

Page 61: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

K-means clustering - Resolution

Run with different groupings and orderings
Run for different K values

Problem? Complexity!

Page 62: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

MS and SA “Visual Words”

SA

MS

Page 63: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

The Visual Analogy

Text     | Visual
Corpus   | Film
Document | Frame
Word     | Descriptor
Stem     | Centroid

Page 64: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Visual “Stop List”

The most frequent visual words, which occur in almost all images, are suppressed

[Figures: before stop list / after stop list]

Page 65: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Ranking Frames

1. Distance between vectors (as with words/documents)

2. Spatial consistency (= word order in the text)

Page 66: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Visual Google process

Preprocessing:
  Vocabulary building
  Crawling frames
  Creating stop list
Querying:
  Building query vector
  Ranking results

Page 67: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Vocabulary building

A subset of 48 shots is selected (10k frames = 10% of the movie)
Regions construction (SA + MS): 10k frames × 1600 = 1.6E6 regions
Frames tracking
Rejecting unstable regions: 1.6E6 → ~200k regions
SIFT descriptor representation
Clustering descriptors using the K-means algorithm
Parameter tuning is done with the ground truth set

Page 68: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Crawling Implementation

To reduce complexity, one keyframe per second is selected (100–150k frames → 5k frames)
Descriptors are computed for stable regions in each key frame
Mean values are computed using two frames on each side of the key frame
Vocabulary: vector quantization – using the nearest neighbor algorithm (found from the ground truth set)

• The expressiveness of the visual vocabulary
  Frames outside the ground truth set contain new objects and scenes, and their detected regions have not been included in forming the clusters

Page 69: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Crawling movies summary

Key frame selection → 5k frames
Regions construction (SA + MS)
Frames tracking
Rejecting unstable regions
SIFT descriptor representation
Nearest neighbor for vector quantization
Stop list, tf-idf weighting, indexing

Page 70: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

“Google like” Query

Object → generate query descriptors
Use the nearest neighbor algorithm to build the query vector
Use the inverse index to find relevant frames (doc vectors are sparse → small set)
Calculate the distance to relevant frames
Rank the results

0.1 seconds with a Matlab implementation

Page 71: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Experimental results

The experiment was conducted in two stages:
  Scene location matching
  Object retrieval

Page 72: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Scene Location matching

Goal
  Evaluate the method by matching scene locations within a closed world of shots (= ‘ground truth set’)
  Tuning the system parameters

Page 73: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Ground truth set

164 frames, from 48 shots, were taken at 19 3D locations in the movie ‘Run Lola Run’ (4–9 frames from each location)

There are significant viewpoint changes in the frames for the same location

Page 74: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Ground Truth Set

Page 75: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Location matching

The entire frame is used as a query region

The performance is measured over all 164 frames

The correct results were determined by hand

Rank calculation

Page 76: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Location matching

Rank = (1 / (N · N_rel)) · ( Σ_{i=1}^{N_rel} R_i − N_rel(N_rel + 1)/2 )

Rank – ordering quality (0 ≤ Rank ≤ 1); 0 is best
N_rel – number of relevant images
N – the size of the image set (164)
R_i – the position of the i-th relevant image (1 ≤ R_i ≤ N) in the result

Rank = 0 if all the relevant images are returned first (then Σ R_i = N_rel(N_rel + 1)/2)

Page 77: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Location matching - Example

– Frame 6 is the current query frame
– Frames 13, 17, 29, 135 contain the same scene location → N_rel = 5
– The result was: {17, 29, 6, 142, 19, 135, 13, …}

Frame number | 6 | 13 | 17 | 29 | 135 | Total
R_i          | 3 |  7 |  1 |  2 |   6 | 19
Best possible rank total: 1 + 2 + 3 + 4 + 5 = 15

Page 78: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Location matching

Rank = (1 / (N · N_rel)) · ( Σ_{i=1}^{N_rel} R_i − N_rel(N_rel + 1)/2 )

Best rank total:  N_rel(N_rel + 1)/2 = 5 · 6 / 2 = 15
Query rank total: Σ R_i = 3 + 7 + 1 + 2 + 6 = 19

Rank = (19 − 15) / (164 · 5) ≈ 0.00487
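The rank measure is a one-liner; a sketch (hypothetical name) checked against the worked example:

```python
def normalized_rank(positions, n_rel, n_images):
    # Rank = (1/(N*Nrel)) * (sum R_i - Nrel*(Nrel+1)/2); 0 = best ordering
    return (sum(positions) - n_rel * (n_rel + 1) / 2) / (n_images * n_rel)

# example from the slide: R_i = {3, 7, 1, 2, 6}, N_rel = 5, N = 164
r = normalized_rank([3, 7, 1, 2, 6], n_rel=5, n_images=164)
```

With all relevant images returned first the positions are 1..N_rel, the sum cancels the subtracted term, and the rank is exactly 0.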

Page 79: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Rank of relevant frames

Page 80: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Frames 61 - 64

Page 81: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Object retrieval

Goal
  Searching for objects throughout the entire movie
  The object of interest is specified by the user as a sub-part of any frame

Page 82: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Object query results (1)

Run Lola Run results

Page 83: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Groundhog Day results

Object query results (2)

• The expressive power of the visual vocabulary
  The visual words learnt for ‘Lola’ are used unchanged for the ‘Groundhog Day’ retrieval!

Page 84: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Object query results (2)

Analysis:
  Both the actual frames returned and the ranking are excellent
  No frames containing the object are missed (no false negatives)
  The highly ranked frames all do contain the object (good precision)

Page 85: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Google Performance Analysis vs. Object matching

Q – number of queried descriptors (~10^2)
M – number of descriptors per frame (~10^3)
N – number of key frames per movie (~10^4)
D – descriptor dimension (128 ~ 10^2)
K – number of “words” in the vocabulary (16×10^3)
α – ratio of documents that do not contain any of the Q “words” (~0.1)

Brute force NN: Cost = Q·M·N·D ~ 10^11
Google: query vector quantization + distance = Q·K·D + K·N → Q·K·D + Q·(αN) ~ 10^7 + 10^5 (doc vectors are sparse)

Improvement factor ~ 10^4 – 10^6

Page 86: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Video Google Summary

Immediate run-time object retrieval

Visual word and vocabulary analogy

Modular framework

Demonstration of the expressive power of the visual vocabulary

Page 87: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Open issues

Automatic ways of building the vocabulary are needed

A method for ranking retrieval results, as Google does

Extension to non-rigid objects, like faces

Page 88: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Future thoughts

Using this method for higher-level analysis of movies:
  Finding the content of a movie by the “words” it contains
  Finding the important object (e.g. a star) in a movie
  Finding the location of unrecognized video frames
  More?

Page 89: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

What is the meaning of the word Google?

a. The number 1E10
b. Very big data
c. The number 1E100
d. A simple clean search

$1 Million!!!

Page 90: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

References
1. Sivic, J. and Zisserman, A. Video Google: A Text Retrieval Approach to Object Matching in Videos. Proceedings of the International Conference on Computer Vision (2003).
2. S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In 7th Int. WWW Conference, 1998.
3. K. Mikolajczyk and C. Schmid. An affine invariant interest point detector. In Proc. ECCV. Springer-Verlag, 2002.
4. A. Frome, D. Huber, R. Kolluri, T. Bulow, and J. Malik. Recognizing Objects in Range Data Using Regional Point Descriptors. In European Conference on Computer Vision, Prague, Czech Republic, 2004.
5. D. Lowe. Object recognition from local scale-invariant features. In Proc. ICCV, pages 1150–1157, 1999.
6. F. Schaffalitzky and A. Zisserman. Automated Location Matching in Movies.
7. J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereo from maximally stable extremal regions. In Proceedings of the British Machine Vision Conference, pages 384–393, 2002.

Page 91: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad
Page 92: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Parameter tuning

K – number of clusters for each region type
The initial cluster center values
Minimum tracking length for stable features
The proportion of unstable descriptors to reject, based on their covariance

Page 93: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Locality-Sensitive Hashing (LSH)

Divide the high-dimensional feature space into hypercubes, by k randomly chosen axis-parallel hyperplanes
Each hypercube is a hash bucket
The probability that 2 nearby points are separated is reduced by independently choosing l different sets of hyperplanes

[Figure: 2 hyperplanes]

Page 94: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

ε-nearest-neighbor

Page 95: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

ε-Nearest Neighbor Search

• d(q, p) ≤ (1 + ε) · d(q, P)
• d(q, p) is the distance between p and q in Euclidean space (normalized distance): d(q, p) = (Σ_i (x_i − y_i)²)^(1/2)
• ε is the maximum allowed ‘error’
• d(q, P) is the distance from q to the closest point in P
• Point p is the member of P that is retrieved (or not)
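The definition can be checked directly with a brute-force sketch (hypothetical name; `math.dist` requires Python 3.8+):

```python
import math

def is_eps_nn(q, p, P, eps):
    # p is an epsilon-approximate nearest neighbor of q in P
    # iff d(q, p) <= (1 + eps) * d(q, P), the true nearest distance
    d_qP = min(math.dist(q, x) for x in P)
    return math.dist(q, p) <= (1 + eps) * d_qP

P = [(0.0, 0.0), (3.0, 0.0)]
q = (1.0, 0.0)  # true nearest neighbor is (0, 0), at distance 1
```

For q above, the far point (3, 0) at distance 2 is an acceptable answer only once ε ≥ 1, which is exactly the speed/accuracy knob approximate schemes turn.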

Page 96: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

ε-Nearest Neighbor Search

Also called approximate nearest neighbor searching
Reports nearest neighbors to the query point (q) with distances possibly greater than the true nearest neighbor distance: d(q, p) ≤ (1 + ε) · d(q, P)
Don't worry, the math is on the next slide

Page 97: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

ε-Nearest Neighbor Search Goal

• The goal is not to get the exact answer, but a good approximate answer

• Many applications of nearest neighbor search where an approximate answer is good enough

Page 98: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

ε-Nearest Neighbor Search

• What is currently out there?
• Arya and Mount presented an algorithm
  • Query time: O(exp(d) · ε^(−d) · log n)
  • Pre-processing: O(n log n)
• Clarkson improved the dependence on ε: exp(d) · ε^(−(d−1)/2)
• Grows exponentially with d

Page 99: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

ε-Nearest Neighbor Search

• Striking observation
  • The “brute force” algorithm provides a faster query time
  • It simply computes the distance from the query to every point in P
  • Analysis: O(dn)
• Arya and Mount:
  “… if the dimension is significantly larger than log n (as it is for a number of practical instances), there are no approaches we know of that are significantly faster than brute-force search”

Page 100: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

High Dimensions

• What is the problem?
  • Many applications of nearest neighbor (NN) search have a high number of dimensions
  • Current algorithms do not perform much better than brute-force linear searches
• Much work has been done on dimension reduction

Page 101: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Dimension Reduction

• Principal Component Analysis
  • Transforms a number of correlated variables into a smaller number of uncorrelated variables
  • Can anyone explain this further?
• Latent Semantic Indexing
  • Used with the document indexing process
  • Looks at the entire document to see which other documents contain some of the same words

Page 102: Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

Descriptor based Object matching - Complexity

Finding, for each object descriptor, the nearest descriptor in the model can be a costly operation:

  min_{m∈{1,..,M}, k∈{1,..,K}} dist(q, p_mk)

Descriptor dimension ~ 1E2
1000 object descriptors
1E6 descriptors per model
56 models

Brute force nearest neighbor ~ 1E12