59
Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

Embed Size (px)

Citation preview

Page 1: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

Fast Similarity Search in Image Databases

CSE 6367 – Computer VisionVassilis Athitsos

University of Texas at Arlington

Page 2: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

2

4128 images are generated for each hand shape.

Total: 107,328 images.

A Database of Hand Images

Page 3: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

3

Efficiency of the Chamfer Distance

• Computing chamfer distances is slow.– For images with d edge pixels, O(d log d) time.– Comparing input to entire database takes over 4 minutes.

• Must measure 107,328 distances.

input model

Page 4: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

4

The Nearest Neighbor Problem

database

Page 5: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

5

The Nearest Neighbor Problem

query

• Goal: – find the k nearest

neighbors of query q.database

Page 6: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

6

The Nearest Neighbor Problem

• Goal: – find the k nearest

neighbors of query q.

• Brute force time is linear to:– n (size of database).– time it takes to measure a

single distance.

query

database

Page 7: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

7

• Goal: – find the k nearest

neighbors of query q.

• Brute force time is linear to:– n (size of database).– time it takes to measure a

single distance.

The Nearest Neighbor Problem

query

database

Page 8: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

8

Examples of Expensive Measures

DNA and protein sequences: Smith-Waterman.

Dynamic gestures and time series: Dynamic Time Warping.

Edge images: Chamfer distance, shape context distance.

These measures are non-Euclidean, sometimes non-metric.

Page 9: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

9

Embeddings

database

x1

x2

x3

xn

Page 10: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

10

database

x1

x2

x3

xn

embedding F

x1x2

x3

x4

xn

Rd

Embeddings

Page 11: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

11

database

x1

x2

x3

xn

embedding F

x1x2

x3

x4

xn

q

query

Rd

Embeddings

Page 12: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

12

x1x2

x3

x4

xn

q

database

x1

x2

x3

xn

embedding F

q

query

Rd

Embeddings

Page 13: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

13

x1x2

x3

x4

xn

Rd

q

Measure distances between vectors (typically much faster).

Caveat: the embedding must preserve similarity structure.

Embeddingsdatabase

x1

x2

x3

xn

embedding F

q

query

Page 14: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

14

Reference Object Embeddings

original space X

Page 15: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

15

Reference Object Embeddings

original space X

r

r: reference object

Page 16: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

16

Reference Object Embeddings

original space X

r: reference object Embedding: F(x) = D(x,r)

D: distance measure in X.

r

Page 17: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

17

Reference Object Embeddings

original space X Real lineF

r: reference object Embedding: F(x) = D(x,r)

D: distance measure in X.

r

Page 18: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

18

original space X Real lineF

r: reference object Embedding: F(x) = D(x,r)

D: distance measure in X.

F(r) = D(r,r) = 0

r

Reference Object Embeddings

Page 19: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

19

original space X Real lineF

r: reference object Embedding: F(x) = D(x,r)

D: distance measure in X.

a

F(r) = D(r,r) = 0 If a and b are similar,

their distances to r are also similar (usually).

br

Reference Object Embeddings

Page 20: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

20

original space X Real lineF

r: reference object Embedding: F(x) = D(x,r)

D: distance measure in X.

a

F(r) = D(r,r) = 0 If a and b are similar,

their distances to r are also similar (usually).

br

Reference Object Embeddings

Page 21: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

21

F(x) = D(x, Lincoln)

F(Sacramento)....= 1543F(Las Vegas).....= 1232F(Oklahoma City).= 437F(Washington DC).= 1207F(Jacksonville)..= 1344

Page 22: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

22

F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))

F(Sacramento)....= ( 386, 1543, 2920)F(Las Vegas).....= ( 262, 1232, 2405)F(Oklahoma City).= (1345, 437, 1291)F(Washington DC).= (2657, 1207, 853)F(Jacksonville)..= (2422, 1344, 141)

Page 23: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

23

F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))

F(Sacramento)....= ( 386, 1543, 2920)F(Las Vegas).....= ( 262, 1232, 2405)F(Oklahoma City).= (1345, 437, 1291)F(Washington DC).= (2657, 1207, 853)F(Jacksonville)..= (2422, 1344, 141)

Page 24: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

24

Embedding Hand Images

F(x) = (C(x, R1), C(A, R2), C(A, R3))

R1

R2

R3

image x

x: hand image. C: chamfer distance.

Page 25: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

25

Basic Questions

R1

R2

R3

x: hand image. C: chamfer distance.

How many prototypes? Which prototypes? What distance should we

use to compare vectors?

image x

F(x) = (C(x, R1), C(A, R2), C(A, R3))

Page 26: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

26

Some Easy Answers.

R1

R2

R3

x: hand image. C: chamfer distance.

How many prototypes? Pick number manually.

Which prototypes? Randomly chosen.

What distance should we use to compare vectors? L1, or Euclidean.

image x

F(x) = (C(x, R1), C(A, R2), C(A, R3))

Page 27: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

27

Filter-and-refine Retrieval

Embedding step: Compute distances from query to reference

objects F(q). Filter step:

Find top p matches of F(q) in vector space. Refine step:

Measure exact distance from q to top p matches.

Page 28: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

28

Evaluating Embedding Quality

Embedding step: Compute distances from query to reference

objects F(q). Filter step:

Find top p matches of F(q) in vector space. Refine step:

Measure exact distance from q to top p matches.

How often do we find the true nearest neighbor?

Page 29: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

29

Evaluating Embedding Quality

Embedding step: Compute distances from query to reference

objects F(q). Filter step:

Find top p matches of F(q) in vector space. Refine step:

Measure exact distance from q to top p matches.

How often do we find the true nearest neighbor?

Page 30: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

30

Evaluating Embedding Quality

Embedding step: Compute distances from query to reference

objects F(q). Filter step:

Find top p matches of F(q) in vector space. Refine step:

Measure exact distance from q to top p matches.

How often do we find the true nearest neighbor?

How many exact distance computations do we need?

Page 31: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

31

Evaluating Embedding Quality

Embedding step: Compute distances from query to reference

objects F(q). Filter step:

Find top p matches of F(q) in vector space. Refine step:

Measure exact distance from q to top p matches.

How often do we find the true nearest neighbor?

How many exact distance computations do we need?

Page 32: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

32

Evaluating Embedding Quality

Embedding step: Compute distances from query to reference

objects F(q). Filter step:

Find top p matches of F(q) in vector space. Refine step:

Measure exact distance from q to top p matches.

How often do we find the true nearest neighbor?

How many exact distance computations do we need?

Page 33: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

33

query

Database (107,328 images)

nearestneighborBrute force retrieval time: 260 seconds.

Results: Chamfer Distance on Hand Images

Page 34: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

34

Brute Force

Embeddings Embeddings

Accuracy 100% 95% 100%

# of distances 80640 1866 24650

Sec. per query 112 2.6 34

Speed-up factor 1 43 3.27

Query set: 710 real images of hands.

Database: 80,640 synthetic images of hands.

Results: Chamfer Distance on Hand Images

Page 35: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

35

Ideal Embedding Behavior

original space X F Rd

Notation: NN(q) is the nearest neighbor of q.

For any q: if a = NN(q), we want F(a) = NN(F(q)).

aq

Page 36: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

36

A Quantitative Measure

qa

original space X F Rd

b

If b is not the nearest neighbor of q,F(q) should be closer to F(NN(q)) than to F(b).

For how many triples (q, NN(q), b) does F fail?

Page 37: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

37

A Quantitative Measure

original space X F Rd

qa

F fails on five triples.

Page 38: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

38

Embeddings Seen As Classifiers

qa

b

Classification task: is qcloser to a or to b?

Page 39: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

39

Any embedding F defines a classifier F’(q, a, b). F’ checks if F(q) is closer to F(a) or to F(b).

qa

b

Embeddings Seen As Classifiers

Classification task: is qcloser to a or to b?

Page 40: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

40

Given embedding F: X Rd: F’(q, a, b) = ||F(q) – F(b)|| - ||F(q) – F(a)||.

F’(q, a, b) > 0 means “q is closer to a.” F’(q, a, b) < 0 means “q is closer to b.”

qa

b

Classifier Definition

Classification task: is qcloser to a or to b?

Page 41: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

41

Given embedding F: X Rd: F’(q, a, b) = ||F(q) – F(b)|| - ||F(q) – F(a)||.

F’(q, a, b) > 0 means “q is closer to a.” F’(q, a, b) < 0 means “q is closer to b.”

Classifier Definition

Goal: build an F such that F’ has low

error rate on triples of type (q, NN(q), b).

Page 42: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

42

1D Embeddings as Weak Classifiers

1D embeddings define weak classifiers. Better than a random classifier (50% error rate).

Page 43: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

43

1D Embeddings as Weak Classifiers

1D embeddings define weak classifiers. Better than a random classifier (50% error rate).

We can define lots of different classifiers. Every object in the database can be a reference object.

Question: how do we combine many such

classifiers into a single strong classifier?

Page 44: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

44

1D Embeddings as Weak Classifiers

1D embeddings define weak classifiers. Better than a random classifier (50% error rate).

We can define lots of different classifiers. Every object in the database can be a reference object.

Question: how do we combine many such

classifiers into a single strong classifier?

Answer: use AdaBoost. AdaBoost is a machine learning method designed for

exactly this problem.

Page 45: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

45

Using AdaBoostoriginal space X

Fn

F2

F1

Real line

Output: H = w1F’1 + w2F’2 + … + wdF’d . AdaBoost chooses 1D embeddings and weighs them. Goal: achieve low classification error. AdaBoost trains on triples chosen from the database.

Page 46: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

46

From Classifier to Embedding

AdaBoost output H = w1F’1 + w2F’2 + … + wdF’d

What embedding should we use?What distance measure should we use?

Page 47: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

47

From Classifier to Embedding

AdaBoost output

BoostMap embedding

H = w1F’1 + w2F’2 + … + wdF’d

F(x) = (F1(x), …, Fd(x)).

Page 48: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

48

From Classifier to Embedding

AdaBoost output

BoostMap embedding

Distance measure

D((u1, …, ud), (v1, …, vd)) = i=1 wi|ui – vi|

d

F(x) = (F1(x), …, Fd(x)).

H = w1F’1 + w2F’2 + … + wdF’d

Page 49: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

49

From Classifier to Embedding

AdaBoost output

D((u1, …, ud), (v1, …, vd)) = i=1 wi|ui – vi|

d

F(x) = (F1(x), …, Fd(x)).BoostMap embedding

Distance measure

Claim: Let q be closer to a than to b. H misclassifiestriple (q, a, b) if and only if, under distance measure D, F maps q closer to b than to a.

H = w1F’1 + w2F’2 + … + wdF’d

Page 50: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

50

Proof

H(q, a, b) =

= wiF’i(q, a, b)

= wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)

= (wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)

= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)

i=1

d

i=1

d

i=1

d

Page 51: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

51

Proof

H(q, a, b) =

= wiF’i(q, a, b)

= wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)

= (wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)

= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)

i=1

d

i=1

d

i=1

d

Page 52: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

52

Proof

H(q, a, b) =

= wiF’i(q, a, b)

= wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)

= (wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)

= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)

i=1

d

i=1

d

i=1

d

Page 53: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

53

Proof

H(q, a, b) =

= wiF’i(q, a, b)

= wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)

= (wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)

= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)

i=1

d

i=1

d

i=1

d

Page 54: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

54

Proof

H(q, a, b) =

= wiF’i(q, a, b)

= wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)

= (wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)

= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)

i=1

d

i=1

d

i=1

d

Page 55: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

55

Proof

H(q, a, b) =

= wiF’i(q, a, b)

= wi(|Fi(q) - Fi(b)| - |Fi(q) - Fi(a)|)

= (wi|Fi(q) - Fi(b)| - wi|Fi(q) - Fi(a)|)

= D(F(q), F(b)) – D(F(q), F(a)) = F’(q, a, b)

i=1

d

i=1

d

i=1

d

Page 56: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

56

Significance of Proof• AdaBoost optimizes a direct measure of

embedding quality.

• We have converted a database indexing problem into a machine learning problem.

Page 57: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

57

query

Database (80,640 images)

nearestneighborBrute force retrieval time: 112 seconds.

Results: Chamfer Distance on Hand Images

Page 58: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

58

Query set: 710 real images of hands.

Database: 80,640 synthetic images of hands.

Results: Chamfer Distance on Hand Images

Brute Force

Random Reference

ObjectsBoostMap

Accuracy 100% 95% 95%

# of distances 80640 1866 450

Sec. per query 112 2.6 0.63

Speed-up factor 1 43 179

Page 59: Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

59

Query set: 710 real images of hands.

Database: 80,640 synthetic images of hands.

Results: Chamfer Distance on Hand Images

Brute Force

Random Reference

ObjectsBoostMap

Accuracy 100% 100% 100%

# of distances 80640 24950 5995

Sec. per query 112 34 13.5

Speed-up factor 1 3.23 8.3