Ken Chatfield, James Philbin,
Andrew Zisserman
Efficient Retrieval of Deformable Shape Classes using Local Self-Similarities
University of Oxford NORDIA ‘09, 27 September 2009
Objective
Goal: Fast and accurate retrieval based on abstract shape
Example: extract shape from images below efficiently
- Use the descriptor of Shechtman and Irani
- Extend the descriptor to provide fast shape matching
- Incorporate into a scalable shape-based retrieval framework
Theme: efficiency
Shechtman and Irani [CVPR '07] – Abstract Shapes
The Self-Similarity Descriptor – Review
Shechtman and Irani [CVPR '07] – Abstract Shapes
1. Generate Correlation Surface (SSD)
2. Bin Maximum Similarities
[Figure: descriptor computation stages (a)-(d), with example regions of high and low self-similarity]
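The two steps above can be sketched in Python. The patch/region sizes, the noise-variance normalisation, and the 20x4 log-polar binning below are illustrative stand-ins, not the exact parameters of Shechtman and Irani's implementation:

```python
# Hedged sketch of a local self-similarity descriptor: correlate a small
# central patch against a larger surrounding region (SSD), convert distances
# to similarities, then keep the maximum similarity in each log-polar bin.
import numpy as np

def self_similarity_descriptor(img, cx, cy, patch=5, region=40,
                               n_angle=20, n_radius=4, var_noise=25.0):
    half_p, half_r = patch // 2, region // 2
    centre = img[cy - half_p:cy + half_p + 1, cx - half_p:cx + half_p + 1]

    # 1. Correlation surface: SSD of the central patch against every patch
    #    in the surrounding region, mapped to a similarity in (0, 1].
    size = region - patch + 1
    sim = np.zeros((size, size))
    for dy in range(size):
        for dx in range(size):
            y0, x0 = cy - half_r + dy, cx - half_r + dx
            cand = img[y0:y0 + patch, x0:x0 + patch]
            sim[dy, dx] = np.exp(-np.sum((cand - centre) ** 2) / var_noise)

    # 2. Log-polar binning: the descriptor stores the maximum similarity
    #    falling into each (angle, log-radius) bin.
    ys, xs = np.mgrid[0:size, 0:size]
    ys, xs = ys - size // 2, xs - size // 2
    radius = np.sqrt(ys ** 2 + xs ** 2)
    angle = np.mod(np.arctan2(ys, xs), 2 * np.pi)
    r_bin = np.minimum((np.log1p(radius) / np.log1p(radius.max() + 1e-9)
                        * n_radius).astype(int), n_radius - 1)
    a_bin = np.minimum((angle / (2 * np.pi) * n_angle).astype(int),
                       n_angle - 1)
    desc = np.zeros(n_angle * n_radius)
    for i in range(n_angle * n_radius):
        mask = (a_bin * n_radius + r_bin) == i
        if mask.any():
            desc[i] = sim[mask].max()
    return desc
```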
Implicit Shape Model – Review
Objective: Use descriptor data to find the location of the query shape in a target image
- Account for non-rigid deformation
Leibe, Leonardis and Schiele [ECCV ’04]
[Figure: corresponding descriptor points a-j marked in the query and target images]
Approach: Incorporate our set of descriptors into the ISM (descriptors manually selected for now)
Apply the Generalised Hough Transform:
1. Store offsets to an arbitrary object centre for descriptors in the query
2. Find putative matches in the target
3. Apply the same offsets: (x, y) → (xc, yc)
4. Identify modes in the Hough voting space
   - Apply the Parzen window method with a Gaussian kernel
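In code, the voting steps above might look like the following; the function names, coordinate convention (row, col), and the smoothing bandwidth are illustrative, not taken from the actual system:

```python
# Minimal sketch of Generalised Hough Transform centre voting: offsets from
# each query descriptor to an arbitrary object centre are stored offline;
# each putative match votes for a centre location in the target, and the
# vote map is smoothed with a Gaussian (Parzen window) before taking the mode.
import numpy as np

def parzen_smooth(votes, sigma=2.0):
    # separable Gaussian convolution = Parzen window with a Gaussian kernel
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(np.convolve, 0, votes, k, mode='same')
    return np.apply_along_axis(np.convolve, 1, out, k, mode='same')

def hough_vote(query_pts, query_centre, target_pts, matches, target_shape):
    """matches: list of (query_idx, target_idx) putative correspondences."""
    votes = np.zeros(target_shape)
    qc = np.asarray(query_centre, dtype=float)
    for qi, ti in matches:
        offset = qc - np.asarray(query_pts[qi], dtype=float)  # stored offline
        vy, vx = np.asarray(target_pts[ti], dtype=float) + offset
        iy, ix = int(round(vy)), int(round(vx))
        if 0 <= iy < target_shape[0] and 0 <= ix < target_shape[1]:
            votes[iy, ix] += 1.0
    density = parzen_smooth(votes)
    return np.unravel_index(np.argmax(density), density.shape)
```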
Datasets
- ETHZ Deformable Shape Classes used as our primary test dataset
- Four main deformable object classes used
- 254 images in total
- Illustrates some of the variation we want to account for:
  - Abstract shape representation – accounted for by the descriptor
  - Changes in scale – multiple Hough voting passes
  - Non-rigid deformation – accounted for in the ISM; not explicitly accounted for in S&I
Searching over Scale
[Figure: query matched against the target over a range of scales, 0.6-1.3 (e.g. 0.8)]
1. Select Query Points
2. Accumulate Centre Votes
3. Select Match Scale
4. Establish Point Correspondences (within the support radius)
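The scale search can be sketched as repeating the centre-voting pass at each candidate scale with rescaled offsets, then keeping the scale whose vote map has the strongest mode. The function name and return format below are illustrative; only the 0.6-1.3 range comes from the slide:

```python
# Hedged sketch of searching over scale: stored query-centre offsets are
# multiplied by each candidate scale before voting, and the best-supported
# (scale, centre) pair is returned.
import numpy as np

def search_over_scale(query_pts, query_centre, target_pts, matches,
                      target_shape, scales=None):
    if scales is None:
        scales = np.arange(0.6, 1.35, 0.1)       # range shown on the slide
    qc = np.asarray(query_centre, dtype=float)
    best = None
    for s in scales:
        votes = np.zeros(target_shape)
        for qi, ti in matches:
            # rescale the stored centre offset before voting
            offset = s * (qc - np.asarray(query_pts[qi], dtype=float))
            iy, ix = np.round(np.asarray(target_pts[ti], dtype=float)
                              + offset).astype(int)
            if 0 <= iy < target_shape[0] and 0 <= ix < target_shape[1]:
                votes[iy, ix] += 1.0
        peak = votes.max()
        if best is None or peak > best[0]:
            best = (peak, float(s),
                    np.unravel_index(np.argmax(votes), votes.shape))
    return best  # (vote count at mode, match scale, object centre)
```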
How does this all fit together?
Non-rigid Deformation
‘Support Radius’ can be used to implicitly account for non-rigid deformation
Example: larger radius used
Improving Efficiency
OUTLINE:
1) Basic Shape Matching Review
2) Improving Efficiency
3) Scalable and Efficient Retrieval
4) Evaluation of Results
Improving Efficiency
Improve efficiency in two main ways:
1. Cut down the number of descriptors used for matching
2. Incorporate into an efficient retrieval system using visual words and 'Video Google' ideas (we will return to this shortly)

1. Shape Matching – Objective:
- Instead of manually defining points, the user selects an ROI in the query
- The system should then return regions containing the same shape in the target
- Naïve approach → dense sampling
- The ISM is well suited to this if the descriptors are all sufficiently informative
- BUT computationally expensive
Efficiency-oriented Descriptor Selection
Instead, cut down the number of descriptors by:
- Eliminating homogeneous descriptors, as in S&I
- Applying 2NN thresholding
Result: 85% reduction in descriptor count, 97.75% reduction in runtime,
without negatively impacting matching performance
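The two pruning steps can be sketched as below. The variance test for homogeneity and the Lowe-style squared-distance ratio are one plausible reading of the slide; both thresholds are illustrative, not the values used in the system:

```python
# Hedged sketch of descriptor pruning: drop near-homogeneous descriptors
# (low variance), then accept a putative match only under a 2NN ratio test.
import numpy as np

def drop_homogeneous(descs, var_thresh=1e-3):
    # 1. A descriptor whose bins barely vary carries little shape
    #    information, so it is eliminated before matching.
    descs = np.asarray(descs, dtype=float)
    return descs[descs.var(axis=1) > var_thresh]

def two_nn_matches(query_descs, target_descs, ratio=0.8):
    # 2. 2NN thresholding: keep a match only when the first nearest
    #    neighbour is clearly closer than the second.
    q = np.asarray(query_descs, dtype=float)
    t = np.asarray(target_descs, dtype=float)
    d2 = ((q[:, None, :] - t[None, :, :]) ** 2).sum(-1)
    order = np.argsort(d2, axis=1)
    first = d2[np.arange(len(q)), order[:, 0]]
    second = d2[np.arange(len(q)), order[:, 1]]
    good = first < (ratio ** 2) * second         # squared-distance ratio
    return [(int(qi), int(order[qi, 0])) for qi in np.where(good)[0]]
```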
What about the descriptor?
Technically, descriptors must be recomputed for each match scale.
However, they exhibit a degree of inbuilt scale invariance due to:
(i) the use of correlation patches as the basic unit
(ii) log-polar binning
Therefore, the same descriptors are used for all scales → further efficiency gain
Log-polar binning also helps tolerance to non-rigid deformation
Shape-based Retrieval
OUTLINE:
1) Basic Shape Matching Review
2) Improving Efficiency
3) Scalable and Efficient Retrieval
4) Evaluation of Results

Sivic and Zisserman [ICCV '03]
Nister and Stewenius [CVPR '06]
Chum et al. [ICCV '07]
Philbin et al. [CVPR '07]
Jegou et al. [ECCV '08]
Text Retrieval Approach – Review
1. Train Vocabulary – develop a visual analogue to the textual word (K-means clustering)
2. Use Vocabulary – based on Sivic and Zisserman [ICCV '03]
[Figure: pipeline – 1. descriptor quantization, 2. cluster centres, 3. visual word assignment, 4. bag-of-words vector (generated offline)]
The bag-of-words formulation allows application of standard IR techniques
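The two vocabulary stages can be sketched with exact k-means on toy data. The deterministic initialisation below is only to keep the example reproducible; the actual system trains a 10,000-word vocabulary with large-scale clustering over many descriptors:

```python
# Hedged sketch of the visual-word pipeline: train cluster centres with
# k-means, then quantise descriptors to words and histogram the counts.
import numpy as np

def train_vocabulary(descs, k, iters=20):
    descs = np.asarray(descs, dtype=float)
    # simple deterministic initialisation (evenly spaced samples);
    # a real system would use (approximate) k-means with random restarts
    centres = descs[:: max(1, len(descs) // k)][:k].copy()
    for _ in range(iters):
        d2 = ((descs[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)               # nearest cluster centre
        for j in range(k):
            if np.any(assign == j):
                centres[j] = descs[assign == j].mean(axis=0)
    return centres

def bag_of_words(descs, centres):
    # quantise each descriptor to its nearest visual word, then histogram
    descs = np.asarray(descs, dtype=float)
    d2 = ((descs[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.bincount(d2.argmin(axis=1), minlength=len(centres))
```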
Visual Words – Examples
Vocabulary size: 1,000-10,000 words
Training set: 254 images of the ETHZ shape classes dataset
* points highlighted in green in each image indicate occurrences of each given visual word
Using Visual Words
Ranking:
- Use a standard tf-idf architecture to rank
- Given weighted vectors, we need only perform a single scalar product for each of our N images to rank
- If images contain an average of W unique visual word 'term frequencies': → O(NW)
- Compare to the complexity of the matching stage: O(ND²L)

Matching:
- Retain the spatial locations of visual word occurrences
- Descriptors are then effectively pre-matched offline (descriptors assigned to the same visual word in both images)
- Complexity now reduces to O(W²) instead of O(D²L) per image

N – number of images
W – average number of visual words per image
D – average number of descriptors per image (D > W)
L – number of dimensions of the descriptor itself
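The O(NW) ranking step can be sketched as follows; the smoothed idf formula and L2 normalisation are common choices, not necessarily the exact weighting used in the system:

```python
# Hedged sketch of tf-idf ranking over bag-of-words vectors: weight each
# vector by inverse document frequency, L2-normalise, then rank the N
# database images by a single scalar product per image.
import numpy as np

def tfidf_rank(query_bow, db_bows):
    db = np.asarray(db_bows, dtype=float)        # N x vocabulary_size
    n_docs = len(db)
    df = (db > 0).sum(axis=0)                    # document frequency
    idf = np.log((n_docs + 1) / (df + 1))        # smoothed idf

    def weight(v):
        w = v * idf
        n = np.linalg.norm(w)
        return w / n if n > 0 else w

    q = weight(np.asarray(query_bow, dtype=float))
    scores = np.array([weight(d) @ q for d in db])
    return np.argsort(-scores), scores           # best-first ranking
```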
Evaluation
OUTLINE:
1) Basic Shape Matching Review
2) Improving Efficiency
3) Scalable and Efficient Retrieval
4) Evaluation of Results
Matching Results
Objective: Establish localisation performance within images where the query exists
Experiment: Four queries within each of the four shape classes of the ETHZ dataset
Class        Identified   Misidentified   Success
swans        69           87              44.2 %
bottles      95           69              57.9 %
apple-logos  84           92              46.5 %
mugs         59           105             36.0 %
Total        307          353             46.5 %
Tested for the proportion of images where:
intersection / union > 0.8
(of the actual and estimated bounding boxes in the target)
these results despite…
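The overlap criterion above is straightforward to compute; boxes are taken here as (x0, y0, x1, y1) tuples, an illustrative convention:

```python
# Hedged sketch of the intersection-over-union test used to score a match:
# a localisation counts as identified when IoU of the actual and estimated
# bounding boxes exceeds 0.8.
def iou(box_a, box_b):
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    ix = max(0.0, min(ax1, bx1) - max(ax0, bx0))   # intersection width
    iy = max(0.0, min(ay1, by1) - max(ay0, by0))   # intersection height
    inter = ix * iy
    union = ((ax1 - ax0) * (ay1 - ay0) +
             (bx1 - bx0) * (by1 - by0) - inter)
    return inter / union if union > 0 else 0.0

def is_identified(actual, estimated, thresh=0.8):
    return iou(actual, estimated) > thresh
```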
Retrieval Results
mAP
Class        SSdesc   SIFT
swans        0.34     0.22
bottles      0.38     0.21
apple-logos  0.36     0.36
mugs         0.31     0.31
[Figure: per-class results for (a) swans, (b) bottles, (c) apple-logos, (d) mugs]
- Retrieval algorithm tested over the whole of the ETHZ dataset
- Vocabulary of 10,000 words trained
- At low recall → good performance
- Compared to Philbin et al.'s 'Visual Google' with Hessian-Affine detections and SIFT descriptors
Video Retrieval
- An episode of Lost used to test scalability, chosen due to the lack of large-scale shape-based datasets
- Retrained a new vocabulary of 10,000 words
- Particularly challenging – the query is present in only 84/2,721 frames, subject to a variety of affine deformations
- Again, at low recall the algorithm performs well
- Completes in < 3 seconds (unoptimised MATLAB)
Conclusion
- Presented a self-similarity based approach to matching deformable shape classes
- Demonstrated a fast and efficient visual word-based retrieval scheme → outperforms SIFT across shape-based datasets
Future work:
- Further develop the retrieval stage: more advanced text retrieval techniques, spatial verification
- Incorporate into other representations of deformation: the deformable object recognition scheme of Ferrari et al.; the registration schemes of Gay-Bellile et al. or Pilet et al.