
Page 1: Efficient Retrieval of Deformable Shape Classes using Local Self-Similarities

Ken Chatfield, James Philbin, Andrew Zisserman

University of Oxford
NORDIA ‘09, 27 September 2009

Page 2: Objective

Goal: fast and accurate retrieval based on abstract shape
Example: extract the shape from the images below efficiently

- Use the self-similarity descriptor of Shechtman and Irani
- Extend the descriptor to provide fast shape matching
- Incorporate it into a scalable shape-based retrieval framework

Theme: efficiency

[Figure: example abstract shapes, from Shechtman and Irani, CVPR ’07]

Page 3: The Self-Similarity Descriptor – Review

Descriptor of Shechtman and Irani [CVPR ’07], designed for matching abstract shapes.

1. Generate a correlation surface: SSD of a small central patch against the patches in its surrounding region, converted to similarities
2. Bin the maximum similarities over a log-polar grid

[Figure: (a)–(d) example image regions of high and low self-similarity]
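A minimal sketch of this two-step computation on a single-channel image; the patch size, region radius, bin counts and similarity normalisation below are illustrative assumptions rather than the parameters of Shechtman and Irani:

```python
import numpy as np

def self_similarity_descriptor(img, cy, cx, patch=5, radius=20,
                               n_rad=3, n_ang=12, var_noise=25.0):
    """Local self-similarity descriptor at (cy, cx): SSD correlation surface of
    the central patch against its surrounding region, then max-binned on a
    log-polar grid.  Assumes a greyscale float image and that (cy, cx) is far
    enough from the border; all parameter values here are illustrative."""
    half = patch // 2
    centre = img[cy - half:cy + half + 1, cx - half:cx + half + 1].astype(float)

    # 1. Correlation surface: SSD of the central patch with every patch in the region
    size = 2 * radius + 1
    corr = np.zeros((size, size))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            other = img[y - half:y + half + 1, x - half:x + half + 1].astype(float)
            ssd = np.sum((centre - other) ** 2)
            corr[dy + radius, dx + radius] = np.exp(-ssd / var_noise)  # similarity in (0, 1]

    # 2. Max-bin the similarities over a log-polar grid centred on the point
    desc = np.zeros((n_rad, n_ang))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r = np.hypot(dy, dx)
            if r == 0 or r > radius:
                continue
            r_bin = min(int(np.log1p(r) / np.log1p(radius) * n_rad), n_rad - 1)
            a_bin = int(((np.arctan2(dy, dx) + np.pi) / (2 * np.pi)) * n_ang) % n_ang
            desc[r_bin, a_bin] = max(desc[r_bin, a_bin], corr[dy + radius, dx + radius])

    desc = desc.flatten()
    return (desc - desc.min()) / (desc.max() - desc.min() + 1e-8)  # normalise to [0, 1]
```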

Page 4: Implicit Shape Model – Review

Objective: use the descriptor data to find the location of the query shape in the target image, while accounting for non-rigid deformation.

Leibe, Leonardis and Schiele [ECCV ’04]

[Figure: corresponding points a–j marked on the query and target images]

Approach: incorporate our set of descriptors into the ISM (descriptors manually selected for now).

Apply the Generalised Hough Transform:
1. Store offsets to an arbitrary object centre for the descriptors in the query
2. Find putative matches in the target
3. Apply the same offsets: (x, y) → (x_c, y_c)
4. Identify modes in the Hough voting space, using a Parzen window method with a Gaussian basis

[Figure: voting from query to target]
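A compact sketch of steps 1–4 above, with a Gaussian Parzen window over the vote space; the descriptor-matching threshold and kernel bandwidth are illustrative assumptions, not values from the paper:

```python
import numpy as np

def hough_vote_centre(query_pts, query_descs, target_pts, target_descs,
                      query_centre, grid_shape, sigma=10.0, match_thresh=0.7):
    """Generalised Hough transform as on this slide:
    1. offsets from each query point to an arbitrary object centre,
    2. putative descriptor matches in the target,
    3. the same offsets applied at each match cast votes for the centre,
    4. modes found with a Gaussian Parzen window over the vote space.
    Matching criterion and bandwidth are illustrative, not the paper's."""
    query_descs = np.asarray(query_descs, dtype=float)
    target_descs = np.asarray(target_descs, dtype=float)

    votes = []
    for (qy, qx), qd in zip(query_pts, query_descs):
        off_y, off_x = query_centre[0] - qy, query_centre[1] - qx   # offset to centre
        dists = np.linalg.norm(target_descs - qd, axis=1)
        for j in np.where(dists < match_thresh)[0]:                  # putative matches
            ty, tx = target_pts[j]
            votes.append((ty + off_y, tx + off_x))                   # vote for the centre

    # Parzen window density estimate with a Gaussian basis over the image grid
    density = np.zeros(grid_shape)
    ys, xs = np.mgrid[0:grid_shape[0], 0:grid_shape[1]]
    for vy, vx in votes:
        density += np.exp(-((ys - vy) ** 2 + (xs - vx) ** 2) / (2 * sigma ** 2))

    mode = np.unravel_index(np.argmax(density), density.shape)       # strongest mode
    return mode, density
```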

Page 5: Datasets

ETHZ Deformable Shape Classes used as our primary test dataset:
- Four main deformable object classes used
- 254 images in total

Illustrates some of the variation we want to account for:
- Abstract shape representation → accounted for by the descriptor
- Changes in scale → multiple Hough voting passes
- Non-rigid deformation → accounted for in the ISM (not explicitly accounted for in S&I)

Page 6: Searching over Scale

1. Select query points
2. Accumulate centre votes
3. Select the match scale
4. Establish point correspondences (within a support radius)

[Figure: query and target shown at relative scales 0.6, 0.8 and 1.3]
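A rough sketch of how such a multi-pass scale search could be wired up, reusing the hough_vote_centre sketch from the ISM slide above; the candidate scale set and the mode-strength scoring are assumptions:

```python
def search_over_scale(query_pts, query_descs, target_pts, target_descs,
                      query_centre, grid_shape,
                      scales=(0.6, 0.8, 1.0, 1.3)):
    """Multi-pass Hough voting over candidate relative scales.
    For each scale the query geometry (and hence the offsets to the object
    centre) is rescaled, votes are accumulated, and the scale whose strongest
    mode has the highest density wins.  hough_vote_centre is the sketch from
    the ISM slide above; the scale set here is illustrative."""
    best = None
    for s in scales:
        # rescale the query geometry; the descriptors themselves are reused across scales
        scaled_pts = [(qy * s, qx * s) for qy, qx in query_pts]
        scaled_centre = (query_centre[0] * s, query_centre[1] * s)
        mode, density = hough_vote_centre(scaled_pts, query_descs,
                                          target_pts, target_descs,
                                          scaled_centre, grid_shape)
        score = density[mode]
        if best is None or score > best[0]:
            best = (score, s, mode)
    score, scale, centre = best
    return scale, centre, score
```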

How does this all fit together?

Page 7: Non-rigid Deformation

The ‘support radius’ can be used to implicitly account for non-rigid deformation.

Example: a larger radius is used.

[Figure: query and target]

Page 8: Improving Efficiency

OUTLINE:
1) Basic Shape Matching Review
2) Improving Efficiency
3) Scalable and Efficient Retrieval
4) Evaluation of Results

Page 9: Improving Efficiency

Improve efficiency in two main ways:
1. Cut down the number of descriptors used for matching
2. Incorporate into an efficient retrieval system using visual words and ‘Video Google’ ideas (we will return to this shortly)

1. Shape Matching – objective:
- Instead of manually defining points, the user selects a ROI in the query
- The system should then return regions containing the same shape in the target
- Naïve approach → dense sampling
- ISM is well suited to this if the descriptors are all sufficiently informative
- BUT computationally expensive

[Figure: user-selected query ROI and target image]

Page 10: Efficiency-oriented Descriptor Selection

Instead, cut down the number of descriptors by:
- Eliminating homogeneous descriptors, as in S&I
- Applying 2NN thresholding

Result: an 85% reduction in descriptor count and a 97.75% reduction in runtime, without negatively impacting matching performance.

[Figure: query, target and result images]
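The slide does not spell out the 2NN criterion, so the sketch below is one plausible reading: homogeneous descriptors are dropped via a variance check, and a first-to-second nearest-neighbour distance ratio within the same image discards near-duplicate descriptors. Both thresholds and criteria are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def prune_descriptors(descs, points, homog_thresh=0.05, ratio_thresh=0.9):
    """Cut down the dense descriptor set before matching, per the slide:
    (i) drop homogeneous descriptors, as in Shechtman and Irani;
    (ii) apply a 2NN threshold.  The concrete criteria below are assumptions."""
    descs = np.asarray(descs, dtype=float)
    keep = []
    for i, d in enumerate(descs):
        # (i) homogeneity: a flat correlation surface gives a near-constant descriptor
        if d.var() < homog_thresh:
            continue
        # (ii) assumed 2NN thresholding: discard descriptors whose nearest neighbour
        # within the same image is much closer than the second nearest (near-duplicates)
        dists = np.linalg.norm(descs - d, axis=1)
        dists[i] = np.inf                        # exclude self
        d1, d2 = np.sort(dists)[:2]
        if d2 > 0 and d1 / d2 < ratio_thresh:
            continue
        keep.append(i)
    return [points[i] for i in keep], descs[keep]
```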

Page 11: What about the descriptor?

Technically, descriptors must be recomputed for each match scale. However, they exhibit a degree of inbuilt scale invariance due to:
(i) the use of correlation patches as the basic unit
(ii) log-polar binning

Therefore, the same descriptors are used for all scales → a further efficiency gain.
Log-polar binning also helps tolerance to non-rigid deformation.

Page 12: Shape-based Retrieval

OUTLINE:
1) Basic Shape Matching Review
2) Improving Efficiency
3) Scalable and Efficient Retrieval
4) Evaluation of Results

Sivic and Zisserman [ICCV ’03]
Nister and Stewenius [CVPR ’06]
Chum et al. [ICCV ’07]
Philbin et al. [CVPR ’07]
Jegou et al. [ECCV ’08]

Page 13: Text Retrieval Approach – Review

Based on Sivic and Zisserman [ICCV ’03].

1. Train vocabulary – develop a visual analogue to the textual word: descriptor quantisation by k-means clustering gives a set of cluster centres, the visual words
2. Use vocabulary – assign each descriptor to a visual word and generate a bag-of-words vector per image (offline)

The bag-of-words formulation allows the application of standard IR techniques.
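A minimal bag-of-words sketch of steps 1 and 2, using scikit-learn's KMeans as a stand-in clusterer; at vocabularies of around 10,000 words one would use approximate k-means as in Philbin et al. [CVPR ’07], and the parameters below are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_vocabulary(all_descs, n_words=10000):
    """1. Train vocabulary: cluster the training descriptors with k-means;
    the cluster centres are the 'visual words'."""
    km = KMeans(n_clusters=n_words, n_init=1, random_state=0)
    km.fit(np.asarray(all_descs, dtype=float))
    return km

def bag_of_words(descs, vocab):
    """2. Use vocabulary: assign each descriptor to its nearest cluster centre
    (visual word) and accumulate a term-frequency histogram, i.e. the
    bag-of-words vector generated offline for each image."""
    words = vocab.predict(np.asarray(descs, dtype=float))
    return np.bincount(words, minlength=vocab.n_clusters).astype(float)
```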

Page 14: Visual Words – Examples

Vocabulary size: 1,000–10,000 words
Training set: 254 images of the ETHZ shape classes dataset

[Figure: points highlighted in green in each image indicate occurrences of a given visual word]

Page 15: Using Visual Words

Ranking:
- Use the standard tf-idf architecture to rank
- Given weighted vectors, we need only perform a single scalar product for each of our N images to rank
- If images contain an average of W unique visual word ‘term frequencies’: → O(NW)
- Compare to the complexity of the matching stage: O(ND²L)

Matching:
- Retain the spatial location of visual word occurrences
- Descriptors are then effectively pre-matched offline (descriptors assigned to the same visual word in both images)
- Complexity now reduces to O(W²) instead of O(D²L) per image

N – number of images
W – average number of visual words per image
D – average number of descriptors per image (D > W)
L – number of dimensions of the descriptor itself
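A rough sketch of the tf-idf ranking just described, using dense numpy vectors for brevity; a real system would use sparse vectors and an inverted index, which is an assumption beyond the slide:

```python
import numpy as np

def tfidf_rank(query_tf, image_tfs):
    """Standard tf-idf ranking over N images.
    image_tfs: (N, V) term-frequency matrix, query_tf: (V,) vector.
    With sparse vectors this is O(NW) for W unique words per image,
    versus O(ND^2 L) for direct descriptor matching."""
    image_tfs = np.asarray(image_tfs, dtype=float)
    n_images = image_tfs.shape[0]

    # inverse document frequency: down-weight words that occur in many images
    df = np.count_nonzero(image_tfs, axis=0)
    idf = np.log(n_images / np.maximum(df, 1))

    def weight(v):
        w = v * idf
        norm = np.linalg.norm(w)
        return w / norm if norm > 0 else w

    q = weight(np.asarray(query_tf, dtype=float))
    scores = np.array([weight(v) @ q for v in image_tfs])   # one scalar product per image
    return np.argsort(-scores), scores                      # ranked image indices, scores
```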

Page 16: Evaluation

OUTLINE:
1) Basic Shape Matching Review
2) Improving Efficiency
3) Scalable and Efficient Retrieval
4) Evaluation of Results

Page 17: Matching Results

Objective: establish localisation performance within images where the query exists.
Experiment: four queries within each of the four shape classes of the ETHZ dataset.

Class         Identified   Misidentified   Success
swans         69           87              44.2 %
bottles       95           69              57.9 %
apple-logos   84           92              46.5 %
mugs          59           105             36.0 %
Total         307          353             46.5 %

Tested for the proportion of images where the overlap of the actual and estimated bounding box in the target satisfies:

intersection / union > 0.8

[Figure: estimated and actual bounding boxes]

these results despite…
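The overlap test above is the standard intersection-over-union of bounding boxes; a small self-contained sketch, with boxes given as (x0, y0, x1, y1) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1).
    A localisation counts as correct on this slide when iou(estimated, actual) > 0.8."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0
```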

Page 18: Matching Results

[Figure: example matching results]

Page 19: Retrieval Results

mAP:

Class         SSdesc   SIFT
swans         0.34     0.22
bottles       0.38     0.21
apple-logos   0.36     0.36
mugs          0.31     0.31

[Figure: per-class retrieval results for (a) swans, (b) bottles, (c) apple-logos, (d) mugs]

Retrieval algorithm tested over the whole of the ETHZ dataset:
- A vocabulary of 10,000 words was trained
- Good performance at low recall
- Compared against Philbin et al.’s ‘Visual Google’ with Hessian-Affine detections and SIFT descriptors
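For reference, the mAP figures above follow the standard average-precision definition; a short sketch of how such values are computed (standard IR formulation, not code from the paper):

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a boolean array over the ranked list,
    True where the retrieved image belongs to the query's class.  Assumes the
    ranked list contains all relevant images."""
    rel = np.asarray(ranked_relevance, dtype=bool)
    if rel.sum() == 0:
        return 0.0
    cum_hits = np.cumsum(rel)
    precision_at_hits = cum_hits[rel] / (np.flatnonzero(rel) + 1)  # precision at each relevant rank
    return float(precision_at_hits.mean())

def mean_average_precision(per_query_relevance):
    """mAP: mean of per-query average precision, as reported in the table above."""
    return float(np.mean([average_precision(r) for r in per_query_relevance]))
```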

Page 20: Video Retrieval

- An episode of Lost used to test scalability, chosen due to the lack of large-scale shape-based datasets
- A new vocabulary of 10,000 words retrained
- Particularly challenging: the query is present in only 84/2,721 frames and is subject to a variety of affine deformations
- Again, at low recall the algorithm performs well
- Completes in < 3 seconds (unoptimised MATLAB)

[Figure: query]

Page 21: Conclusion

Presented a self-similarity-based approach to matching deformable shape classes.

Demonstrated a fast and efficient visual-word-based retrieval scheme that outperforms SIFT across shape-based datasets.

Future work:
- Further develop the retrieval stage: more advanced text retrieval techniques, spatial verification
- Incorporate into other representations of deformation: the deformable object recognition scheme of Ferrari et al., or the registration schemes of Gay-Bellile et al. and Pilet et al.