Fast Contour Matching Using Approximate Earth Mover's Distance
Kristen Grauman and Trevor Darrell
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Cambridge, MA, 02139
Weighted graph matching is a good way to align a pair of shapes represented by a set of descriptive local features; the set of correspondences produced by the minimum cost matching between two shapes' features often reveals how similar the shapes are. However, due to the complexity of computing the exact minimum cost matching, previous algorithms could only run efficiently when using a limited number of features per shape, and could not scale to perform retrievals from large databases. We present a contour matching algorithm that quickly computes the minimum weight matching between sets of descriptive local features using a recently introduced low-distortion embedding of the Earth Mover's Distance (EMD) into a normed space. Given a novel embedded contour, the nearest neighbors in a database of embedded contours are retrieved in sublinear time via approximate nearest neighbors search with Locality-Sensitive Hashing (LSH). We demonstrate our shape matching method on a database of 136,500 images of human figures. Our method achieves a speedup of four orders of magnitude over the exact method, at the cost of only a 4% reduction in accuracy.
1. Introduction
The minimum cost of matching features from one shape to the features of another often reveals how similar the two shapes are. The cost of matching two features may be defined as how dissimilar they are in spatial location, appearance, curvature, or orientation; the minimal weight matching is the correspondence field between the two sets of features that requires the least summed cost. A number of successful shape matching algorithms and distance measures require the computation of minimal cost correspondences between sets of features on two shapes, e.g., [2, 17, 8, 6, 3].
Unfortunately, computing the optimal matching for a single shape comparison has a complexity that is superpolynomial in the number of features. The complexity is of course magnified when one wishes to search for similar shapes (neighbors) in a large database: a linear scan of the database would require computing a comparison of superpolynomial complexity for each database member against the query shape. Hierarchical search methods, pruning, or the triangle inequality may be employed, yet query times are still linear in the size of the database in the worst case, and individual comparisons maintain their high complexity regardless.
To address the computational complexity of current correspondence-based shape matching algorithms, we propose a contour matching algorithm that incorporates recently developed approximation techniques and enables fast shape-based similarity retrieval from large databases. We treat contour matching as a graph matching problem, and use the Earth Mover's Distance (EMD), the minimum cost that is necessary to transform one weighted point set into another, as a metric of similarity. We embed the minimum weight matching of contour features into L1 via the EMD embedding of , and then employ approximate nearest neighbor (NN) search to retrieve the shapes that are most similar to a novel query. The embedding step alone reduces the complexity of computing a low-cost correspondence field between two shapes from superpolynomial in the number of features to $O(nd \log \Delta)$, where $n$ is the number of features, $d$ is their dimension, and $\Delta$ is the diameter of the feature space (i.e., the greatest inter-feature distance).
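The embedding step described above can be illustrated with a minimal sketch. It assumes the grid-based construction commonly used for low-distortion EMD embeddings: points are counted in randomly shifted grids of side 1/2, 1, 2, ..., up to roughly the diameter, and each count is weighted by its grid's side length. The function names, the sparse `Counter` representation, and the exact range of grid levels are illustrative assumptions, not details taken from this paper.

```python
import math
import random
from collections import Counter


def embed_point_set(points, diameter, shift):
    """Sparse L1 embedding of a point set (illustrative sketch).

    For each grid level, every point adds the grid's side length to the
    entry of the cell it falls in. One pass per level gives a cost of
    roughly n * d * log(diameter) operations.
    """
    vec = Counter()
    top = max(0, math.ceil(math.log2(diameter)))  # assumed coarsest level
    for level in range(-1, top + 1):
        side = 2.0 ** level
        for p in points:
            cell = tuple(math.floor((x + s) / side) for x, s in zip(p, shift))
            vec[(level, cell)] += side
    return vec


def l1_distance(u, v):
    """L1 distance between two sparse vectors stored as dict-like objects."""
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in u.keys() | v.keys())


# The same random shift must be shared by every embedded set.
rng = random.Random(0)
diameter = 16.0
shift = [rng.uniform(0, diameter) for _ in range(2)]
A = [(0, 0), (1, 0), (0, 1)]
B = [(0, 2), (0, 0), (3, 0)]
approx = l1_distance(embed_point_set(A, diameter, shift),
                     embed_point_set(B, diameter, shift))
print(approx)
```

The L1 distance between two such vectors only approximates the matching cost, and its value depends on the random shift; the point of the sketch is that each set is embedded independently, so no pairwise matching is computed at query time.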
In this work we also introduce the idea of a low-dimensional shape descriptor manifold. Using many examples of high-dimensional local features taken from shapes in an image database, we construct a subspace that captures much of the descriptive power of the rich features, yet allows us to represent them compactly. We build such a subspace over the shape context feature of , which consists of local histograms of edge points, and successfully use it within the proposed approximate EMD shape matching method.
We demonstrate our fast contour matching method on a database of 136,500 human figure images (real and synthetic examples). We report on the relative complexities (query time and space requirements) of approximate versus exact EMD for shape matching. In addition, we study empirically how much the retrieval quality of our approximate method differs from that of its exact-solution counterpart (optimal graph matching); matching quality is quantified based on its performance as a k-NN classifier for 3-D pose. With our method it is feasible to quickly retrieve similar shapes from large databases, an ability which has applications in various example-based vision systems, and our technique eliminates the constraint on input feature set size from which other contour matching techniques suffer.
2. Related Work
In this section we review relevant related work on current shape matching techniques requiring optimal correspondences between features, the use of EMD as a similarity measure, and the embedding of EMD into a normed space and fast approximate similarity search. For additional information about various distance metrics for shape matching and their computational complexities, please refer to .
A number of shape matching techniques require optimal correspondences between feature sets at some stage. The authors of  obtain least cost correspondences with an augmenting path algorithm in order to estimate an aligning transform between two shapes. They achieve impressive shape matching results with their method, but they note that the run-time does not scale well with the representation size due to the cubic complexity of solving correspondences. The authors of  characterize local shape topologies with points and tangent lines and use a combinatorial geometric hashing method to compute correspondence between these order structures of two shapes. In , a polynomial time method is given where the shock graphs of 2-D contours are compared by performing a series of edit operations, and the optimal alignment of shock edges is found using dynamic programming. In , a graduated assignment graph matching method is developed for matching image boundary features that operates in time polynomial in the size of the feature sets.
The concept of using the Earth Mover's Distance to measure perceptual similarity between images was first explored in  for the purpose of measuring distance between gray-scale images. More recently EMD has been utilized for color- or texture-based similarity in [16, 9], and extended to allow unpenalized distribution transformations in . In  exact EMD is applied to a database of 1,620 silhouettes whose shock graphs are embedded into a normed space; the method does not use an embedding to approximate the EMD computation itself, and thus may not scale well with input or database size. In , a pseudo-metric derived from EMD that respects the triangle inequality and positivity property is given and applied to measure shape similarity on edges.
In recent work by , AdaBoost is used to learn an embedding that maps the Chamfer distance into Euclidean space, and it is applied to edge images of hands in order to retrieve 3-D hand poses from large databases. However, as the authors note, their training algorithm, which requires a large number of exact distance computations, has a running time that thus far prevents their method from embedding more complex distances (such as graph matching or EMD), and retrievals are based on a linear scan of the database.
Our goal is to achieve robust, perceptually meaningful shape matching results as the above methods can, but in a way that scales more reasonably with an arbitrary representation size and allows real-time retrieval from larger databases.
In this work we show how EMD and Locality-Sensitive Hashing (LSH) can be used for contour-based shape retrievals. An embedding of EMD into L1 and the use of LSH for approximate NN was shown for the purpose of color histogram-based image retrieval in . We utilize the shape context feature (log-polar histograms of edge points) of  as a basis for our shape representation in this work. While the authors of  mention that using approximate NN search algorithms for shape context-based retrieval is a possibility, their system actually utilizes pruning techniques to speed searches. To our knowledge our work is the first to use an EMD embedding for fast contour matching, to employ LSH for contour matching, and to develop a compact shape context subspace feature.
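To make the "log-polar histograms of edge points" concrete, the following is a minimal sketch of one such descriptor computed at a single reference point. The bin counts (5 radial, 12 angular), the radius range, and the normalization by mean distance are assumptions chosen for illustration; the function name `shape_context` is hypothetical and the text above does not specify these parameters.

```python
import math


def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of all other points as seen from points[index].

    Radii are divided by the mean distance to the reference point
    (an assumed normalization) and binned on a log scale; angles are
    binned uniformly over the full circle.
    """
    ref = points[index]
    others = [p for i, p in enumerate(points) if i != index]
    hist = [[0] * n_theta for _ in range(n_r)]
    dists = [math.dist(ref, p) for p in others]
    mean_d = sum(dists) / len(dists) if dists else 0.0
    if mean_d == 0.0:
        return hist  # degenerate input: no other points, or all coincide
    log_lo, log_hi = math.log(r_min), math.log(r_max)
    for p, dist_p in zip(others, dists):
        r = max(dist_p / mean_d, r_min)  # very close points go to the first bin
        r_bin = int((math.log(r) - log_lo) / (log_hi - log_lo) * n_r)
        if r_bin >= n_r:
            continue  # beyond the outermost ring: ignored
        theta = math.atan2(p[1] - ref[1], p[0] - ref[0]) % (2 * math.pi)
        t_bin = int(theta / (2 * math.pi) * n_theta) % n_theta
        hist[r_bin][t_bin] += 1
    return hist


contour = [(0, 0), (2, 1), (-1, 2), (-2, -1), (1, -2)]
descriptor = shape_context(contour, 0)
print(sum(map(sum, descriptor)))  # number of points that landed in a bin
```

Flattening `descriptor` row by row gives a 60-dimensional feature under these assumed bin counts; a contour would be represented by one such feature per sampled edge point.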
3. Fast Similarity Search with EMD
In this section, for the reader's convenience, we briefly summarize the EMD metric and the randomized algorithms we use in our shape matching method: the approximate similarity search algorithm LSH , and the embedding of EMD into a normed space given in .
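As a rough illustration of approximate similarity search by hashing, the sketch below indexes dense non-negative vectors under the L1 norm. It assumes a simple threshold-bit hash family (a random coordinate compared against a random threshold), for which two vectors disagree on a bit with probability proportional to their L1 distance. This is a generic construction chosen for brevity, not necessarily the LSH scheme cited above; the class name `L1LSHIndex` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


class L1LSHIndex:
    """Toy LSH index for vectors with entries in [0, max_value]."""

    def __init__(self, dim, max_value, n_tables=8, n_bits=6, seed=0):
        rng = random.Random(seed)
        self.items = []
        self.tables = []
        for _ in range(n_tables):
            # Each bit tests one random coordinate against one random threshold.
            bits = [(rng.randrange(dim), rng.uniform(0, max_value))
                    for _ in range(n_bits)]
            self.tables.append((bits, defaultdict(list)))

    @staticmethod
    def _key(bits, v):
        return tuple(v[i] >= t for i, t in bits)

    def add(self, v):
        idx = len(self.items)
        self.items.append(v)
        for bits, buckets in self.tables:
            buckets[self._key(bits, v)].append(idx)
        return idx

    def query(self, q):
        """Return colliding item indices, nearest first by exact L1 distance."""
        candidates = set()
        for bits, buckets in self.tables:
            candidates.update(buckets.get(self._key(bits, q), ()))
        return sorted(
            candidates,
            key=lambda i: sum(abs(a - b) for a, b in zip(self.items[i], q)),
        )


index = L1LSHIndex(dim=4, max_value=10.0)
for vec in ([1, 2, 3, 4], [9, 9, 0, 0], [1, 2, 3, 5], [0, 10, 10, 0]):
    index.add(vec)
print(index.query([1, 2, 3, 4]))
```

Only items that share a bucket with the query in at least one table are ranked, which is what allows query time to grow sublinearly with database size; more tables raise recall and more bits per key raise selectivity.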
EMD is named for a physical analogy that may be drawn between the process of transforming one weighted point set into another and the process of moving piles of dirt spread around one set of locations to another set of holes in the same space. The points are locations, their weights are the sizes of the dirt piles and holes, and the ground metric between a pile and a hole is the amount of work needed to move a unit of dirt. To use this transformation as a distance measure, i.e., a measure of dissimilarity, one seeks the least cost transformation (the movement of dirt that requires the least amount of work). When the total weight in the two point sets is equal, the solution is a complete one-to-one correspondence, and it is equivalent to the problem of bipartite graph matching. That is, for a metric space $(X, D)$ and two $n$-element sets $A, B \subseteq X$, the distance is the minimum cost of a perfect matching between $A$ and $B$:
$$\mathrm{EMD}(A, B) = \min_{\pi : A \to B} \sum_{a \in A} D(a, \pi(a))$$
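For very small point sets, the minimum cost perfect matching can be checked by brute force. The sketch below enumerates every one-to-one assignment and assumes a Euclidean ground distance, which is an illustrative choice; the function name is hypothetical. The factorial running time is the reason this is only useful as a reference for validating faster or approximate methods.

```python
from itertools import permutations
from math import dist


def exact_emd_equal_weights(A, B):
    """Minimum total ground distance over all one-to-one matchings of A to B.

    Assumes |A| == |B| and unit weight per point; runs in O(n!) time.
    """
    if len(A) != len(B):
        raise ValueError("equal-size point sets are assumed")
    best = float("inf")
    for perm in permutations(range(len(B))):
        cost = sum(dist(a, B[j]) for a, j in zip(A, perm))
        best = min(best, cost)
    return best


A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
B = [(0.0, 2.0), (0.0, 0.0), (3.0, 0.0)]
print(exact_emd_equal_weights(A, B))  # 3.0: costs 0 + 2 + 1 for the best matching
```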