37
Min-Hashing and Geometric min-Hashing Ondřej Chum, Michal Perdoch, and Jiří Matas Center for Machine Perception Czech Technical University Prague

Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

Min-Hashing andGeometric min-Hashing

Ondřej Chum, Michal Perdoch, and Jiří Matas

Center for Machine Perception

Czech Technical University

Prague

Page 2: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

2 /34

Outline

1. Looking for representation of images that:• is compact (useful for very large datasets) • is fast (linear in the size of the output)• supports local (image and object) recognition/retrieval• is accurate (very low false positive rate)

2. min-Hash is a powerful representation, but we show thatGeometric min-Hash is significantly more powerful

3. Applications: • Large database clustering• Discovery of (even small) objects

Page 3: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

3 /26

Introduction to min-Hash for Images

min-Hash originates from the text retrieval community, originally used for detection of near duplicate documents

Page 4: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

4 /26

Image Representation: a Set of Words

Features Local Appearance

SIFT Description [Lowe’04]

Vector quantization

Visual vocabulary

0

4

0

2

... …0

1

0

1

...

Bag of words

Set of words

Visualwords

Image representation

Page 5: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

5 /26

min-Hash

This probability will be called “image similarity ” and denoted

min-Hash is a locality sensitive hashing (LSH) function m that selects an element (visual word) m(Ii) from each set Ii of visual words detected in image i so that

Page 6: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

6 /26

145263

min-Hash

A C D EB F

Vocabulary

A CB C DB

F F453621 A B546123 C C

Set I1 Set I2

Random orderings min-Hash

sim (I1, I2) = 1/2

E F F

m1 :

m2 :

m3:

vocabulary of 1 M visual words

~ 2000 wordsper image

fixed sizeimage

representation~100

Estimated similarity of I1 and I2 from 3 min-Hashes = 2/3

Page 7: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

7 /26

Q

A

E

V

Y

J

}}}

CA

EV

QZ

}}}

k hash tables

a sketch = s-tuple of min-Hashes

Sketch collisionI1 I2

... P{collision} = sim(I1,I2)s

P{retrieval} = 1 – (1 - sim(I1, I2)s)k

collision:all s min-Hashes must agree

min-Hash Retrieval

retrieval: 1. generate k sketches2. at least one of k

sketches must collide

QA

EV

YJ

Page 8: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

8 /34

Probability of Retrieving an Image Pair

similarity (set overlap)

Near duplicate imagesImages of the same object

prob

abili

ty o

f ret

rieva

l

s = 3, k = 512

Unrelated images

Page 9: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

9 /34

Near Duplicate Images

Page 10: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

10 /34

Near Duplicate Images

Page 11: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

11 /34

Near Duplicate Images

Page 12: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

12 /34

Scalable Near Duplicate Image Detection

• Images perceptually (almost) identical but not identical (noise, compression level, small motion, small occlusion)

• Similar images of the same object / scene

• Large databases

• Fast – linear in the number of duplicates

• Store small constant amount of data per image

Page 13: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

13 /34

Representation, Distance and Search

• Naïve method (space and time infeasible):– Representation: the whole image

– Similarity: sophisticated visual similarity measure

– Search: O(N^2) on N images

• Goal:– Compact representation – small constant number of bytes (that captures

visual content addressed by near duplicate definition)

– Similarity that allows for fast retrieval of similar images (that measures similarity according to near duplicate definition)

– Search: linear in the number of near duplicates – needs to be indexing

Page 14: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

14 /34

Word Weighting for min-Hash

all words Xw have the same chance to be a min-Hash

For hash function (set overlap similarity)

For hash function

the probability of Xw being a min-Hash is proportional to dw

A Q VE RJC ZA U B: YdA dC dE dVdJ dQ dY dZdR

Page 15: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

15 /34

Histogram Intersection Using min-HashIdea: represent a histogram as a set, use min-Hash set machinery

A1 C1B1

A2 C2

C3

C1 D1B1

B2 C2

C3

A1 C1 D1B1A2 B2 C2 C3min-Hash vocabulary:

Bag of words A / set A’ Bag of words B / set B’

A1 C1 D1B1A2 B2 C2 C3A’ U B’:

Set overlap of A’ of B’ is a histogram intersection of A and B

A C DBVisual words:

tA = (2,1,3,0) tB = (0,2,3,1)

Page 16: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

16 /34

Similarity Measures for min-Hash

Set representation Bag of words representation

Equa

l wei

ghts

Wei

ghte

d

Page 17: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

17 /34

Min-Hash on TrecVid

Query Retrieved images with decreasing similarity

• DoG features• vocabulary of 64,635 visual words• 192 min-Hashes, 3 min-Hashes per a sketch, 64 sketches• similarity threshold 35%

Set overlap similarity measure

Page 18: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

18 /34

Query Examples

Query image:

ResultsSet overlap, weighted set overlap, weighted histogram intersection

Page 19: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

19 /26

Geometric min-Hash (GmH)

Can geometry help us in finding sketches of min-Hashes with much higher repeatability than

random s-tuples?

Page 20: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

20 /26

The Idea

• For a sketch collision all s min-Hashes in the sketch must agree

• In the construction of a sketch, we can assume that the first min-Hash is matching

• If the assumption is violated, no harm is done, the sketch would not collide

We show how to exploit the assumption of matching m1(I) to design the rest of the sketch (not independent min-Hashes anymore). The new procedure – Geometric min-Hash - is superior to the standard min-Hash.

Page 21: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

21 /26

Vocabulary Size and Set Representation

Small vocabulary (1k visual words) Large vocabulary (1M visual words)

On a 100k dataset of images, 95% of features have a unique visual word in an image

For large vocabularies selecting a visual word from an image is

(almost) equivalent to selecting a feature (with location and scale)

Page 22: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

22 /26

Repeatability of Feature Sets:Translation and Occlusion

Do not group features that are far apart into sketches. Such sketches have lower repeatability (as groups, not individually)

under translation or occlusion

Page 23: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

23 /26

Repeatability of Feature Sets: Scale Change

Do not group features of very different scale into sketches. Such sketches have lower repeatability

under scale change.

Page 24: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

24 /26

Geometric min-Hash algorithm

1. Keep features with unique visual word in the image2. Obtain the “central feature” by min-Hash3. Select scale and spatial neighbourhood of the central feature4. Select secondary min-Hash(es) from the neighbourhood5. Relative pose of the sketch features is a geometric invariant (as in

geometric hashing)

EBF

Sketch of GmH: s-tuple of visual words + geometric invariant

Page 25: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

25 /26

Geometric vs. Standard min-Hash

• Lower false positives– Additional geometric invariant (part of

the hash key or verification)

– Lower probability random sketch collisions (next slide)

GmH: 37 collisions mH: 4 collisions

Sketches s=2, k=5000

GmH: 0 collisions mH: 4 collisions

Sketches s=2, k=5000

• Higher true positives– View point change

– Severe occlusion

– Scale change

– Object on a different background

• Faster spatial verification– Sketch collision defines geometric

transformation

Page 26: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

26 /26

Overlap of Random Sets

False positive = sketch collision of two random images

The probability of two random sets I1and I2having a common min-Hash (i.e. the average overlap of two random sets)

where w is the size of the vocabulary

The smaller the sets, the smaller probability of random collision

• Min-Hash: features in the whole image

• Geometric min-Hash: only small subset of the image

Page 27: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

27 /26

Experiments

Page 28: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

28 /26

Experiment 1: Clustering on Oxford 100k DB

Hertford

Keble

Magdalen

Pitt Rivers

Radcliffe Camera

All Soul's

Ashmolean

Balliol

Bodleian

Christ Church

Cornmarket

100 000 Images downloaded from FLICKRIncludes 11 Oxford Landmarks with manually labeled ground truth

Randomized clusteringCluster hypotheses by hashing, completion by retrieval

Page 29: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

29 /26

Clustering Results on Oxford 100k

16 min* 33 min*

<s = 2, k = 64~550 bytes per image

* The time does not include feature detection, SIFT computation, vector quantization

s = 3, k = 256~1600 bytes per image

Page 30: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

30 /26

Experiment 2: Object Discovery

Verification by co-segmentationcritical for small objects

[Cech, Matas, Perdoch CVPR 08], code available on WWW [Ferrari, Tuytelaars,Van Gool, ECCV 2004]

Geometric min-Hashsketch collisions = 2, k = 256

Page 31: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

31 /26

Small Object Discovery

Other instances of the discovered object by (sub)image retrieval

Page 32: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

32 /26

Faces are Small Objects

Page 33: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

33 /26

Discovery of the Face Category:the Largest Cluster in Oxford 100k

Page 34: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

34 /26

The Importance of the Co-segmentationin Object Discovery

Seed image pair

Result by query expansion using features inside the segmentation

Using the whole bounding box

Page 35: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

35 /26

Visual Content Hyperlinks

Page 36: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

36 /26

Conclusions

• Novel representation for hashing was introduced– Significantly improves the recall (reduces false negatives)

– Reduces the number of false positives

– Reduces memory footprint of image representation

– Connects min-Hash and geometric Hashing

– The first efficient combination of appearance and geometry in large scale indexing

• Applications– Clustering of spatially related images

– Discovery of small objects

Page 37: Min-Hashing and Geometric min-Hashing€¦ · Introduction to min -Hash for Images min-Hash originates from the text retrieval community, originally used for detection of near duplicate

37 /26

Thank you!

Example of discovered object

(cut outs)(cut outs)