Robust Estimation of Nonrigid Transformation for Point Set Registration

Jiayi Ma1,2, Ji Zhao3, Jinwen Tian1, Zhuowen Tu4, and Alan L. Yuille2
1Huazhong University of Science and Technology, 2Department of Statistics, UCLA

3The Robotics Institute, CMU, 4Lab of Neuro Imaging, UCLA
{jyma2010, zhaoji84, zhuowen.tu}@gmail.com, [email protected], [email protected]

Abstract

We present a new point matching algorithm for robust nonrigid registration. The method iteratively recovers the point correspondence and estimates the transformation between two point sets. In the first step of the iteration, feature descriptors such as shape context are used to establish rough correspondence. In the second step, we estimate the transformation using a robust estimator called L2E. This is the main novelty of our approach and it enables us to deal with the noise and outliers which arise in the correspondence step. The transformation is specified in a functional space, more specifically a reproducing kernel Hilbert space. We apply our method to nonrigid sparse image feature correspondence on 2D images and 3D surfaces. Our results quantitatively show that our approach outperforms state-of-the-art methods, particularly when there are a large number of outliers. Moreover, our method of robustly estimating transformations from correspondences is general and has many other applications.

1. Introduction

Point set registration is a fundamental problem which frequently arises in computer vision, medical image analysis, and pattern recognition [5, 4, 6]. Many tasks in these fields – such as stereo matching, shape matching, image registration and content-based image retrieval – can be formulated as a point matching problem because point representations are general and easy to extract [5]. The points in these tasks are typically the locations of interest points extracted from an image, or the edge points sampled from a shape contour. The registration problem then reduces to determining the correct correspondence and finding the underlying spatial transformation between two point sets extracted from the input data.

The registration problem can be categorized into rigid or nonrigid registration depending on the application and the form of the data. Rigid registration, which only involves a small number of parameters, is relatively easy and has been widely studied [5, 4, 7, 19, 14]. By contrast, nonrigid registration is more difficult because the underlying nonrigid transformations are often unknown, complex, and hard to model [6]. But nonrigid registration is very important because it is required for many real world tasks including hand-written character recognition, shape recognition, deformable motion tracking and medical image registration.

In this paper, we focus on the nonrigid case and present a robust algorithm for nonrigid point set registration. There are two unknown variables we have to solve for in this problem: the correspondence and the transformation. Although solving for either variable without information regarding the other is difficult, an iterated estimation framework can be used [4, 3, 6]. In this iterative process, the estimate of the correspondence is used to refine the estimate of the transformation, and vice versa. But a problem arises if there are errors in the correspondences, which occur in many applications, particularly if the transformation is large and/or there are outliers in the data (e.g., data points that are not undergoing the nonrigid transformation). In this situation, the estimate of the transformation will degrade badly unless the estimation is performed robustly. The main contribution of our approach is to robustly estimate the transformation from the correspondences using a robust estimator named the L2-Minimizing Estimate (L2E) [20, 2].

More precisely, our approach iteratively recovers the point correspondences and estimates the transformation between two point sets. In the first step of the iteration, feature descriptors such as shape context are used to establish correspondence. In the second step, we estimate the transformation using the robust estimator L2E. This estimator enables us to deal with the noise and outliers in the correspondences. The nonrigid transformation is modeled in a functional space, called the reproducing kernel Hilbert space (RKHS) [1], in which the transformation function has an explicit kernel representation.

1.1. Related Work

The iterated closest point (ICP) algorithm [4] is one of the best known point registration approaches. It uses nearest-neighbor relationships to assign a binary correspondence, and then uses the estimated correspondence to refine the transformation. Belongie et al. [3] introduced a method for registration based on the shape context descriptor, which incorporates the neighborhood structure of the point set and thus helps establish correspondence between the point sets. But these methods ignore robustness when they recover the transformation from the correspondence. In related work, Chui and Rangarajan [6] established a general framework for estimating correspondence and transformations for nonrigid point matching. They modeled the transformation as a thin-plate spline and performed robust point matching by an algorithm (TPS-RPM) which involves deterministic annealing and soft-assignment. Alternatively, the coherent point drift (CPD) algorithm [17] uses Gaussian radial basis functions instead of thin-plate splines. Another interesting point matching approach is the kernel correlation (KC) based method [22], which was later improved in [9]. Zheng and Doermann [27] introduced the notion of a neighborhood structure for the general point matching problem, and proposed a matching method, the robust point matching-preserving local neighborhood structures (RPM-LNS) algorithm. Other related work includes the relaxation labeling method, generalized in [11], and the graph matching approach for establishing feature correspondences [21].

The main contributions of our work include: (i) we propose a new robust algorithm to estimate a spatial transformation/mapping from correspondences with noise and outliers; (ii) we apply the robust algorithm to nonrigid point set registration and also to sparse image feature correspondence.

2. Estimating Transformation from Correspondences by L2E

Given a set of point correspondences $S = \{(x_i, y_i)\}_{i=1}^{n}$, which are typically perturbed by noise and by outlier points which undergo different transformations, the goal is to estimate a transformation $f: y_i = f(x_i)$ that fits the inliers.

In this paper we make the assumption that the noise on the inliers is Gaussian on each component, with zero mean and uniform standard deviation σ (our approach can be directly applied to other noise models). More precisely, an inlier point correspondence $(x_i, y_i)$ satisfies $y_i - f(x_i) \sim N(0, \sigma^2 I)$, where $I$ is an identity matrix of size $d \times d$, with $d$ being the dimension of the points. The data $\{y_i - f(x_i)\}_{i=1}^{n}$ can then be thought of as a sample set from a multivariate normal density $N(0, \sigma^2 I)$ which is contaminated by outliers. The main idea of our approach is then to find the largest portion of the sample set (i.e., the underlying inlier set) that "matches" the normal density model, and hence estimate the transformation f for the inlier set. Next, we introduce a robust estimator named the L2-minimizing estimate (L2E), which we use to estimate the transformation f.
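To make this noise model concrete, the following sketch (Python/NumPy; the function name and the particular smooth warp are our own illustration, not part of the paper) generates a correspondence set in which inliers satisfy $y_i = f(x_i) + N(0, \sigma^2 I)$ and outliers are drawn uniformly:

```python
import numpy as np

def make_correspondences(n_inliers=100, n_outliers=50, sigma=0.02, seed=0):
    """Synthetic 2D correspondences: inliers follow a smooth nonrigid warp
    plus Gaussian noise N(0, sigma^2 I); outliers are uniform random pairs."""
    rng = np.random.default_rng(seed)
    x_in = rng.uniform(0.0, 1.0, size=(n_inliers, 2))
    # An arbitrary smooth warp standing in for the unknown transformation f.
    warp = lambda x: x + 0.05 * np.sin(2 * np.pi * x[:, ::-1])
    y_in = warp(x_in) + sigma * rng.standard_normal((n_inliers, 2))
    x_out = rng.uniform(0.0, 1.0, size=(n_outliers, 2))
    y_out = rng.uniform(0.0, 1.0, size=(n_outliers, 2))
    X = np.vstack([x_in, x_out])
    Y = np.vstack([y_in, y_out])
    return X, Y

X, Y = make_correspondences()
print(X.shape, Y.shape)  # (150, 2) (150, 2)
```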

2.1. Problem Formulation Using L2E: Robust Estimation

Parametric estimation is typically done using maximum likelihood estimation (MLE). It can be shown that MLE is the optimal estimator if it is applied to the correct probability model for the data (or to a good approximation). But MLE can be badly biased if the model is not a sufficiently good approximation or, in particular, if there is a significant fraction of outliers. In many point matching problems, it is desirable to have a robust estimator of the transformation f because the point correspondence set S usually contains outliers. There are two choices: (i) to build a more complex model that includes the outliers, which is complex since it involves modeling the outlier process using extra (hidden) variables which enable us to identify and reject outliers, or (ii) to use an estimator which is different from MLE but less sensitive to outliers, as described in Huber's robust statistics [8]. In this paper, we use the second method and adopt the L2E estimator [20, 2], a robust estimator which minimizes the L2 distance between densities, and is particularly appropriate for analyzing massive data sets where data cleaning (to remove outliers) is impractical. More formally, the L2E estimator for a model f(x|θ) estimates the parameter θ by minimizing the criterion:

$$L_2E(\theta) = \int f(x \mid \theta)^{2}\,dx - \frac{2}{n}\sum_{i=1}^{n} f(x_i \mid \theta). \qquad (1)$$

To get some intuition for why L2E is robust, observe that its penalty for a low probability point $x_i$ is $-f(x_i \mid \theta)$, which is much less than the penalty of $-\log f(x_i \mid \theta)$ given by MLE (which becomes infinite as $f(x_i \mid \theta)$ tends to 0). Hence MLE is reluctant to assign low probability to any points, including outliers, and hence tends to be biased by outliers. By contrast, L2E can assign low probabilities to many points, hopefully to the outliers, without paying too high a penalty. To demonstrate the robustness of L2E, we present a line-fitting example which contrasts the behavior of MLE and L2E, see Fig. 1. The goal is to fit a linear regression model, $y = \alpha x + \varepsilon$, with residual $\varepsilon \sim N(0, 1)$, by estimating $\alpha$ using MLE and L2E. This gives, respectively,
$$\hat{\alpha}_{\mathrm{MLE}} = \arg\max_{\alpha}\sum_{i=1}^{n}\log\phi(y_i - \alpha x_i \mid 0, 1) \quad \text{and} \quad \hat{\alpha}_{L_2E} = \arg\min_{\alpha}\left[\frac{1}{2\sqrt{\pi}} - \frac{2}{n}\sum_{i=1}^{n}\phi(y_i - \alpha x_i \mid 0, 1)\right],$$
where $\phi(x \mid \mu, \Sigma)$ denotes the normal density. As shown in Fig. 1, L2E is very resistant when we contaminate the data by outliers, but MLE does not show this desirable property. L2E always has a global minimum at approximately 0.5 (the correct value for α), but MLE's estimates become steadily worse as the amount of outliers increases. Observe, in the bottom right figure, that L2E also has a local minimum near α = 2, which becomes deeper as the number of outliers increases, so that the two minima become approximately equal when n = 200.

Figure 1. Comparison between L2E and MLE for linear regression as the number of outliers varies. Top row: data samples, where the inliers are shown by cyan pluses, and the outliers by magenta circles. The goal is to estimate the slope α of the line model y = αx. We vary the number of data samples n = 100, 120, ..., 200, by always adding 20 new outliers (retaining the previous samples). In the second column the outliers are generated from another line model y = 2x. Middle and bottom rows: the curves of MLE and L2E respectively. The MLE estimates are correct for n = 100 but rapidly degrade as we add outliers; see how the peak of the log-likelihood changes in the second row. By contrast (see third row), L2E estimates α correctly even when half the data is outliers and also develops a local minimum to fit the outliers when appropriate (third row, right column). Best viewed in color.

This is appropriate because, in this case, the contaminated data also come from the same linear parametric model with slope α = 2, i.e., y = 2x.
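As an illustration of Fig. 1 and criterion (1), the short sketch below (Python/NumPy; our own illustration, not the authors' code, and with an outlier fraction chosen smaller than in the figure so that the L2E global minimum clearly stays at the true slope) evaluates the negative log-likelihood and the L2E criterion for the line model y = αx over a grid of α values, with outliers drawn from y = 2x:

```python
import numpy as np

rng = np.random.default_rng(0)

# Inliers from y = 0.5*x + N(0,1); outliers from y = 2*x + N(0,1).
x_in = rng.uniform(0, 10, 100); y_in = 0.5 * x_in + rng.standard_normal(100)
x_out = rng.uniform(0, 10, 60); y_out = 2.0 * x_out + rng.standard_normal(60)
x = np.concatenate([x_in, x_out]); y = np.concatenate([y_in, y_out])

def normal_pdf(r):                      # phi(r | 0, 1)
    return np.exp(-0.5 * r**2) / np.sqrt(2 * np.pi)

alphas = np.linspace(0, 2.5, 501)
neg_loglik = [-np.sum(np.log(normal_pdf(y - a * x) + 1e-300)) for a in alphas]
l2e = [1 / (2 * np.sqrt(np.pi)) - 2 / len(x) * np.sum(normal_pdf(y - a * x))
       for a in alphas]

print("MLE estimate :", alphas[np.argmin(neg_loglik)])  # pulled toward the outliers
print("L2E estimate :", alphas[np.argmin(l2e)])         # stays near 0.5
```

The constant $1/(2\sqrt{\pi})$ is the value of $\int \phi(r \mid 0,1)^2\,dr$ appearing in criterion (1) for this one-dimensional unit-variance model.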

We now apply the L2E formulation in (1) to the point matching problem, assuming that the noise of the inliers is given by a normal distribution, and obtain the following functional criterion:

$$L_2E(f, \sigma^2) = \frac{1}{2^d(\pi\sigma^2)^{d/2}} - \frac{2}{n}\sum_{i=1}^{n} \phi\left(y_i - f(x_i) \mid 0,\, \sigma^2 I\right). \qquad (2)$$

We model the nonrigid transformation f by requiring it to lie within a specific functional space, namely a reproducing kernel Hilbert space (RKHS) [1, 24, 16]. Note that other parameterized transformation models, for example thin-plate splines (TPS) [23, 15], can also be easily incorporated into our formulation.

We define an RKHS $\mathcal{H}$ by a positive definite matrix-valued kernel $\Gamma: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}^{d \times d}$. The optimal transformation f which minimizes the L2E functional (2) then takes the form $f(x) = \sum_{i=1}^{n} \Gamma(x, x_i) c_i$ [16, 26], where the coefficient $c_i$ is a $d \times 1$ dimensional vector (to be determined). Hence, the minimization over the infinite dimensional Hilbert space reduces to finding a finite set of n coefficients $c_i$. But in the point correspondence problem the point set typically contains hundreds or thousands of points, which causes significant complexity problems (in time and space). Consequently, we adopt a sparse approximation, and randomly pick only a subset of size m input points $\{\tilde{x}_i\}_{i=1}^{m}$ to have nonzero coefficients in the expansion of the solution. This follows [18], who found that this approximation works well and that simply selecting a random subset of the input points in this manner performs no worse than more sophisticated and time-consuming methods. Therefore, we seek a solution of the form

$$f(x) = \sum_{i=1}^{m} \Gamma(x, \tilde{x}_i)\, c_i. \qquad (3)$$

The chosen point set $\{\tilde{x}_i\}_{i=1}^{m}$ is somewhat analogous to "control points" [5]. By including a regularization term imposing a smoothness constraint on the transformation, the L2E functional (2) becomes:

$$L_2E(f, \sigma^2) = \frac{1}{2^d(\pi\sigma^2)^{d/2}} - \frac{2}{n}\sum_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{d/2}}\, e^{-\frac{\left\|y_i - \sum_{j=1}^{m}\Gamma(x_i, \tilde{x}_j)c_j\right\|^2}{2\sigma^2}} + \lambda \|f\|_{\Gamma}^{2}, \qquad (4)$$

where $\lambda > 0$ controls the strength of regularization, and the stabilizer $\|f\|_{\Gamma}^2$ is defined by an inner product, i.e., $\|f\|_{\Gamma}^2 = \langle f, f \rangle_{\Gamma}$. By choosing a diagonal decomposable kernel [26], $\Gamma(x_i, x_j) = e^{-\beta\|x_i - x_j\|^2} I$, with β determining the width of the range of interaction between samples (i.e., the neighborhood size), the L2E functional (4) may be conveniently expressed in the following matrix form:

$$L_2E(C, \sigma^2) = \frac{1}{2^d(\pi\sigma^2)^{d/2}} - \frac{2}{n}\sum_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{d/2}}\, e^{-\frac{\|y_i^{\mathrm{T}} - U_{i,\cdot}C\|^2}{2\sigma^2}} + \lambda\,\mathrm{tr}(C^{\mathrm{T}}\Gamma C), \qquad (5)$$

where the kernel matrix $\Gamma \in \mathbb{R}^{m \times m}$ is the Gram matrix with $\Gamma_{ij} = \Gamma(\tilde{x}_i, \tilde{x}_j) = e^{-\beta\|\tilde{x}_i - \tilde{x}_j\|^2}$, $U \in \mathbb{R}^{n \times m}$ with $U_{ij} = \Gamma(x_i, \tilde{x}_j) = e^{-\beta\|x_i - \tilde{x}_j\|^2}$, $U_{i,\cdot}$ denotes the i-th row of the matrix $U$, $C = (c_1, \cdots, c_m)^{\mathrm{T}}$ is the coefficient matrix of size $m \times d$, and $\mathrm{tr}(\cdot)$ denotes the trace.
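A minimal sketch of these quantities in Python/NumPy (our own illustration; the function and variable names are not from the paper) builds the Gram matrix Γ and the matrix U, and evaluates the transformation of Eq. (3) for a given coefficient matrix C:

```python
import numpy as np

def gaussian_kernel(A, B, beta):
    """Gamma(a, b) = exp(-beta * ||a - b||^2) for all pairs of rows in A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-beta * np.maximum(d2, 0.0))

def transform(x, ctrl, C, beta):
    """Eq. (3): f(x) = sum_i Gamma(x, x_tilde_i) c_i, applied row-wise to x."""
    return gaussian_kernel(x, ctrl, beta) @ C

# Example: n = 150 correspondences, m = 15 control points in d = 2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(150, 2))
ctrl = X[rng.choice(len(X), size=15, replace=False)]   # random subset {x_tilde_i}
beta = 0.8
Gram = gaussian_kernel(ctrl, ctrl, beta)               # m x m
U = gaussian_kernel(X, ctrl, beta)                     # n x m
C = np.zeros((15, 2))                                  # coefficients to be estimated
print(Gram.shape, U.shape, transform(X, ctrl, C, beta).shape)
```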

2.2. Estimation of the Transformation

Estimating the transformation requires taking the derivative of the L2E cost function (5) with respect to the coefficient matrix C.


Algorithm 1: Estimation of Transformation from Correspondences

Input: Correspondence set $S = \{(x_i, y_i)\}_{i=1}^{n}$, parameters γ, β, λ
Output: Optimal transformation f
1 Construct the Gram matrix Γ and the matrix U;
2 Initialize the parameters σ² and C;
3 Deterministic annealing:
4   Using the gradient (6), optimize the objective function (5) by a numerical technique (e.g., the quasi-Newton algorithm with the current C as the initial value);
5   Update the parameter C ← arg min_C L2E(C, σ²);
6   Anneal σ² = γσ²;
7 The transformation f is determined by equation (3).

This derivative is given by:

$$\frac{\partial L_2E}{\partial C} = \frac{2\,U^{\mathrm{T}}\left[V \circ (W \otimes \mathbf{1}_{1\times d})\right]}{n\sigma^{2}(2\pi\sigma^{2})^{d/2}} + 2\lambda\Gamma C, \qquad (6)$$

where $V = UC - Y$ and $Y = (y_1, \cdots, y_n)^{\mathrm{T}}$ are matrices of size $n \times d$, $W = \exp\{-\mathrm{diag}(VV^{\mathrm{T}})/2\sigma^{2}\}$ is an $n \times 1$ dimensional vector, $\mathrm{diag}(\cdot)$ is the diagonal of a matrix, $\mathbf{1}_{1\times d}$ is a $1 \times d$ dimensional row vector of all ones, $\circ$ denotes the Hadamard product, and $\otimes$ denotes the Kronecker product.
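For concreteness, a NumPy sketch of the objective (5) and its gradient (6) might look as follows (our own illustration, not the authors' Matlab code; the helper name is made up, and the matrices U and Γ are assumed to be built from the Gaussian kernel as described above):

```python
import numpy as np

def l2e_objective_and_grad(C_flat, U, Gram, Y, sigma2, lam):
    """L2E cost (5) and gradient (6) for the coefficient matrix C.

    U    : n x m matrix with U_ij = exp(-beta * ||x_i - x_tilde_j||^2)
    Gram : m x m Gram matrix of the control points
    Y    : n x d matrix of target points (or displacements)
    """
    n, d = Y.shape
    m = Gram.shape[0]
    C = C_flat.reshape(m, d)

    V = U @ C - Y                                      # n x d residual matrix
    W = np.exp(-np.sum(V**2, axis=1) / (2 * sigma2))   # exp{-diag(V V^T) / 2 sigma^2}
    norm = (2 * np.pi * sigma2) ** (d / 2)

    cost = (1.0 / (2**d * (np.pi * sigma2) ** (d / 2))
            - (2.0 / n) * np.sum(W) / norm
            + lam * np.trace(C.T @ Gram @ C))
    grad = (2.0 * U.T @ (V * W[:, None]) / (n * sigma2 * norm)
            + 2.0 * lam * Gram @ C)
    return cost, grad.ravel()
```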

By using the derivative in equation (6), we can employ efficient gradient-based numerical optimization techniques such as the quasi-Newton method and the nonlinear conjugate gradient method to solve the optimization problem. But the cost function (5) is convex only in the neighborhood of the optimal solution. Hence, to improve convergence, we use a coarse-to-fine strategy by applying deterministic annealing to the inlier noise parameter σ². This starts with a large initial value for σ², which is gradually reduced by σ² ↦ γσ², where γ is the annealing rate. Our algorithm is outlined in Algorithm 1.

Computational complexity. By examining equations (5) and (6), we see that the costs of updating the objective function and the gradient are both O(dm² + dmn). For the numerical optimization method, we choose the Matlab Optimization toolbox, which implicitly uses the BFGS quasi-Newton method with a mixed quadratic and cubic line search procedure. Thus the total complexity is approximately O(dm² + dm³ + dmn). In our implementation, the number m of control points required to construct the transformation f in equation (3) is in general not large, and so we use m = 15 for all the results in this paper (increasing m only gave small changes to the results). The dimension d of the data in feature point matching for vision applications is typically 2 or 3. Therefore, the complexity of our method can simply be expressed as O(n), which is linear in the number of correspondences. This is important since it enables our method to be applied to large scale data.

2.3. Implementation Details

The performance of point matching algorithms depends, typically, on the coordinate system in which the points are expressed. We use data normalization to control for this. More specifically, we perform a linear re-scaling of the correspondences so that the points in the two sets both have zero mean and unit variance.
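A minimal sketch of this normalization step (Python/NumPy, our own illustration; we standardize each axis, which is one plausible reading of "unit variance"):

```python
import numpy as np

def normalize(points):
    """Linearly re-scale a point set to zero mean and unit variance."""
    mean = points.mean(axis=0)
    std = points.std(axis=0)
    return (points - mean) / std, mean, std
```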

We define the transformation f as the initial position plus a displacement function v: f(x) = x + v(x) [17], and solve for v instead of f. This can be achieved simply by setting the output $y_i$ to be $y_i - x_i$.

Parameter settings. There are three main parameters in this algorithm: γ, β and λ. The parameter γ controls the annealing rate. The parameters β and λ control the influence of the smoothness constraint on the transformation f. In general, we found our method to be very robust to parameter changes. We set γ = 0.5, β = 0.8 and λ = 0.1 throughout this paper. Finally, the parameters σ² and C in line 2 of Algorithm 1 were initialized to 0.05 and 0 respectively.
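Putting the pieces together, the deterministic annealing loop of Algorithm 1 could be sketched as below (Python with SciPy's L-BFGS-B standing in for the paper's Matlab quasi-Newton routine; `gaussian_kernel` and `l2e_objective_and_grad` refer to the sketches above, the parameter values follow the settings just given, and the fixed number of annealing steps is our own choice):

```python
import numpy as np
from scipy.optimize import minimize

def estimate_transformation(X, Y, m=15, beta=0.8, lam=0.1, gamma=0.5,
                            sigma2_init=0.05, n_anneal=10, seed=0):
    """Algorithm 1 (sketch): estimate C for f(x) = x + sum_j Gamma(x, ctrl_j) c_j.
    X, Y are assumed to be normalized correspondence sets."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ctrl = X[rng.choice(n, size=min(m, n), replace=False)]  # random control points
    Gram = gaussian_kernel(ctrl, ctrl, beta)
    U = gaussian_kernel(X, ctrl, beta)
    D = Y - X                                               # solve for the displacement v
    C = np.zeros((len(ctrl), d))
    sigma2 = sigma2_init
    for _ in range(n_anneal):                               # deterministic annealing
        res = minimize(l2e_objective_and_grad, C.ravel(), jac=True,
                       args=(U, Gram, D, sigma2, lam), method="L-BFGS-B")
        C = res.x.reshape(len(ctrl), d)
        sigma2 *= gamma                                     # anneal sigma^2
    return ctrl, C

def apply_transformation(x, ctrl, C, beta=0.8):
    """f(x) = x + v(x), with v given by Eq. (3)."""
    return x + gaussian_kernel(x, ctrl, beta) @ C
```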

3. Nonrigid Point Set Registration

Point set registration aims to align two point sets $\{x_i\}_{i=1}^{n}$ (the model point set) and $\{y_j\}_{j=1}^{l}$ (the target point set). Typically, in the nonrigid case, it requires estimating a nonrigid transformation f which warps the model point set to the target point set. We have shown above that once we have established the correspondence between the two point sets, even with noise and outliers, we are able to estimate the underlying transformation between them. Next, we discuss how to find correspondences between two point sets.

3.1. Establishment of Point Correspondence

Recall that our method described above does not jointly solve for the transformation and the point correspondence. In order to use Algorithm 1 to solve for the transformation between two point sets, we need initial correspondences.

In general, if the two point sets have similar shapes, the corresponding points have similar neighborhood structures which can be incorporated into a feature descriptor. Thus finding correspondences between two point sets is equivalent to finding, for each point in one point set (e.g., the model), the point in the other point set (e.g., the target) that has the most similar feature descriptor. Fortunately, the initial correspondences need not be very accurate, since our method is robust to noise and outliers. Inspired by these facts, we use shape context [3] as the feature descriptor in the 2D case, using the Hungarian method for matching with the χ² test statistic as the cost measure. In the 3D case, the spin image [10] can be used as a feature descriptor, where the local similarity is measured by an improved correlation coefficient; the matching is then performed by a method which encourages geometrically consistent groups.


Algorithm 2: Nonrigid Point Set Registration

Input: Two point sets $\{x_i\}_{i=1}^{n}$, $\{y_j\}_{j=1}^{l}$
Output: Aligned model point set $\{\hat{x}_i\}_{i=1}^{n}$
1 Compute feature descriptors for the target point set $\{y_j\}_{j=1}^{l}$;
2 repeat
3   Compute feature descriptors for the model point set $\{x_i\}_{i=1}^{n}$;
4   Estimate the initial correspondences based on the feature descriptors of the two point sets;
5   Solve for the transformation f warping the model point set to the target point set using Algorithm 1;
6   Update the model point set $\{x_i\}_{i=1}^{n} \leftarrow \{f(x_i)\}_{i=1}^{n}$;
7 until the maximum iteration number is reached;
8 The aligned model point set $\{\hat{x}_i\}_{i=1}^{n}$ is given by $\{f(x_i)\}_{i=1}^{n}$ from the last iteration.

The two steps of estimating correspondences and estimating transformations are iterated to obtain a reliable result. In this paper, we use a fixed number of iterations, typically 10, but more when the noise is large or when a large percentage of outliers is contained in the original point sets. We summarize our point set registration method in Algorithm 2, and sketch it in code below.
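As a rough Python sketch of Algorithm 2 (our own illustration: `compute_descriptor` stands in for shape context or spin images, the χ² matching cost and the Hungarian assignment use SciPy's linear_sum_assignment, and `estimate_transformation`/`apply_transformation` refer to the Algorithm 1 sketch above):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def chi2_cost(D1, D2, eps=1e-10):
    """Pairwise chi-square test statistic between two sets of histogram descriptors."""
    D1 = D1[:, None, :]          # n x 1 x k
    D2 = D2[None, :, :]          # 1 x l x k
    return 0.5 * np.sum((D1 - D2) ** 2 / (D1 + D2 + eps), axis=2)

def register(X, Y, compute_descriptor, n_iter=10, **kwargs):
    """Alternate between descriptor-based matching and L2E transformation estimation."""
    desc_Y = compute_descriptor(Y)                    # target descriptors, computed once
    for _ in range(n_iter):
        desc_X = compute_descriptor(X)                # recomputed after each warp
        rows, cols = linear_sum_assignment(chi2_cost(desc_X, desc_Y))
        ctrl, C = estimate_transformation(X[rows], Y[cols], **kwargs)
        X = apply_transformation(X, ctrl, C)          # warp the model point set
    return X
```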

3.2. Application to Image Feature Correspondence

The image feature correspondence task aims to find visual correspondences between two sets of sparse feature points $\{x_i\}_{i=1}^{n}$ and $\{y_j\}_{j=1}^{l}$ with corresponding feature descriptors extracted from two input images. In this paper, we assume that the underlying relationship between the input images is nonrigid. Our method for this task is to estimate correspondences by matching feature descriptors using a smooth spatial mapping f. More specifically, we first estimate the initial correspondences based on the feature descriptors, and then use the correspondences to learn a spatial mapping f fitting the inliers by Algorithm 1.

Once we have obtained the spatial mapping f, we then have to establish accurate correspondences. We predefine a threshold τ and judge a correspondence $(x_i, y_j)$ to be an inlier provided it satisfies the following condition: $e^{-\|y_j - f(x_i)\|^2 / 2\sigma^2} > \tau$. We set τ = 0.5 in this paper.
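In code, this decision rule could read as follows (a sketch under the assumption that `apply_transformation` and the final annealed `sigma2` come from the Algorithm 1 sketch above):

```python
import numpy as np

def select_inliers(X, Y, ctrl, C, sigma2, tau=0.5, beta=0.8):
    """Keep correspondence (x_i, y_i) if exp(-||y_i - f(x_i)||^2 / (2 sigma^2)) > tau."""
    residual = Y - apply_transformation(X, ctrl, C, beta)
    weight = np.exp(-np.sum(residual**2, axis=1) / (2 * sigma2))
    return weight > tau          # boolean mask over the putative correspondences
```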

Note that the feature descriptors in the point set registration problem are calculated from the point sets themselves, and are recalculated in each iteration. However, the descriptors of the feature points here are fixed and calculated from the images in advance. Hence the iterative technique of recovering correspondences, estimating the spatial mapping, and re-estimating correspondences cannot be used here. In practice, we find that our method works well without iteration, since we focus on determining the right correspondences, which does not require precise recovery of the underlying transformation; our approach then plays the role of rejecting outliers.

4. Experimental Results

In order to evaluate the performance of our algorithm, we conducted two types of experiments: (i) nonrigid point set registration for 2D shapes; (ii) sparse image feature correspondence on 2D images and 3D surfaces.

4.1. Results on Nonrigid Point Set Registration

We tested our method on the same synthesized data as in [6] and [27]. The data consist of two different shape models: a fish and a Chinese character. For each model, there are five sets of data designed to measure the robustness of registration algorithms to deformation, occlusion, rotation, noise and outliers. In each test, one of the above distortions is applied to the model set to create a target set, and 100 samples are generated for each degradation level. We use shape context as the feature descriptor to establish initial correspondences. It is easy to make shape context translation and scale invariant, and in some applications rotation invariance is also required. We use the rotation-invariant shape context, as in [27].

Fig. 2 shows the registration results of our method for different degrees of deformation and occlusion. As shown in the figure, for both datasets with moderate degradation, our method is able to produce an almost perfect alignment. Moreover, the matching performance degrades gradually and gracefully as the degree of degradation in the data increases. Considering the results of the occlusion test in the fifth column, it is interesting that even when the occlusion ratio is 50 percent our method can still achieve a satisfactory registration result. Therefore our method can be used to provide a good initial alignment for more complicated problem-specific registration algorithms.

To provide a quantitative comparison, we report the results of four state-of-the-art algorithms, namely shape context (SC) [3], TPS-RPM [6], RPM-LNS [27] and CPD [17], which are implemented using publicly available code. The registration error for a pair of shapes is quantified as the average Euclidean distance between a point in the warped model and the corresponding point in the target. The registration performance of each algorithm is then compared via the mean and standard deviation of the registration error over all 100 samples at each distortion level. The statistical results, error means and standard deviations for each setting, are summarized in the last column of Fig. 2. In the deformation test results (e.g., 1st and 3rd rows), the five algorithms achieve similar registration performance on both the fish and the Chinese character at low deformation levels, and our method generally gives better performance as the degree of deformation increases.

Figure 2. Point set registration results of our method on the fish (top) and Chinese character (bottom) shapes [6, 27], with deformation and occlusion presented in every two rows. The goal is to align the model point set (blue pluses) onto the target point set (red circles). For each group of experiments, the upper figure is the model and target point sets, and the lower figure is the registration result. From left to right, increasing degree of degradation. The rightmost figures are comparisons of the registration performance of our method with shape context (SC) [3], TPS-RPM [6], RPM-LNS [27] and CPD [17] on the corresponding datasets. The error bars indicate the registration error means and standard deviations over 100 trials.

In the occlusion test results (e.g., 2nd and 4th rows), we observe that our method is much more robust than the other four algorithms.

More experiments on rotation, noise and outliers were also performed on the two shape models, as shown in Fig. 3. From the results, we again see that our method is able to generate a good alignment when the degradation is moderate, and the registration performance degrades gradually and remains acceptable as the amount of degradation increases. Note that our method is not affected by rotation, which is not surprising because we use the rotation-invariant shape context as the feature descriptor. We also performed experiments on 3D data and obtained similar results.

In conclusion, our method is efficient for most nonrigid point set registration problems with moderate, and in some cases severe, distortions. It can also be used to provide a good initial alignment for more complicated problem-specific registration algorithms.

Figure 4. Results of image feature correspondence on 2D image pairs of deformable objects. From left to right, increasing degree of deformation. The inlier percentages in the initial correspondences are 79.61%, 56.57%, 51.84% and 45.71% respectively, and the corresponding precision-recall pairs are (100.00%, 99.73%), (99.06%, 99.53%), (99.09%, 99.35%) and (100.00%, 98.96%) respectively. The lines indicate matching results (blue = true positive, green = false negative, red = false positive). Best viewed in color.

Figure 3. From top to bottom, results on rotation, noise and outliers presented in every two rows. For each group of experiments, the upper figure is the data, and the lower figure is the registration result. From left to right, increasing degree of degradation.

4.2. Results on Image Feature Correspondence

In this section, we perform experiments on real images and test the performance of our method for sparse image feature correspondence. These images contain deformable objects, and consequently the underlying relationships between the images are nonrigid.

Fig. 4 shows image pairs of a newspaper with different amounts of spatial warp. We aim to establish correspondences between sparse image features in each image pair. In our evaluation, we first extract SIFT [13] feature points in each input image and estimate the initial correspondences based on the corresponding SIFT descriptors. Our goal is then to reject the outliers contained in the initial correspondences and, at the same time, to keep as many inliers as possible. Performance is characterized by precision and recall.
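One way to obtain such initial putative correspondences is sketched below (OpenCV's SIFT and brute-force matcher with Lowe's ratio test; the paper does not specify its matching strategy, so the ratio threshold here is our own choice):

```python
import cv2
import numpy as np

def putative_matches(img1, img2, ratio=0.8):
    """SIFT keypoints + descriptors, matched by nearest neighbor with a ratio test.
    Returns two n x 2 arrays of matched point coordinates (x_i, y_i)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in pairs if m.distance < ratio * n.distance]
    X = np.float32([kp1[m.queryIdx].pt for m in good])
    Y = np.float32([kp2[m.trainIdx].pt for m in good])
    return X, Y
```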

The results of our method are presented in Fig. 4. For the leftmost pair, the deformation of the newspaper is relatively slight. There are 466 initial correspondences with 95 outliers, so the inlier percentage is about 79.61%. After using our method to establish accurate correspondences, 370 out of the 371 inliers are preserved, and simultaneously all 95 outliers are rejected. The precision-recall pair is about (100.00%, 99.73%).

Table 1. Performance comparison on the image pairs in Fig. 4. The values in the first row are the inlier percentages (%), and the entries below are the (precision, recall) pairs (%).

Inlier (%)   79.61             56.57             51.84             45.71
ICF [12]     (96.05, 98.38)    (83.95, 86.73)    (80.43, 95.48)    (75.42, 92.71)
VFC [26]     (100.00, 97.04)   (98.59, 99.53)    (98.09, 99.35)    (98.94, 97.91)
Ours         (100.00, 99.73)   (99.06, 99.53)    (98.09, 99.35)    (100.00, 98.96)

On the rightmost pair, the deformation is relatively large and the inlier percentage in the initial correspondences is only about 45.71%. In this case, our method still obtains a good precision-recall pair of (100.00%, 98.96%). Note that there are still a few false positives and false negatives in the results, since we cannot precisely estimate the true warp functions between the image pairs in this framework. The average run time of our method on these image pairs is about 0.5 seconds on an Intel Pentium 2.0 GHz PC with Matlab code.

In addition, we also compared our method to two state-of-the-art methods: identifying point correspondences by correspondence function (ICF) [12] and vector field consensus (VFC) [26]. ICF uses support vector regression to learn a pair of correspondence functions which map points in one image to their corresponding points in the other, and then rejects outliers by the estimated correspondence functions. VFC converts the outlier rejection problem into a robust vector field learning problem, and learns a smooth field to fit the potential inliers as well as estimating a consensus inlier set. The results are shown in Table 1. We see that all three algorithms work well when the deformation contained in the image pair is relatively slight. As the amount of deformation increases, the performance of ICF degenerates rapidly, but VFC and our method seem to be relatively unaffected even when the number of outliers exceeds the number of inliers. Still, our method gives slightly better results than VFC.

Our next experiment involves feature point matching on 3D surfaces. We adopt MeshDOG and MeshHOG [25] as the feature point detector and descriptor to determine the initial correspondences.

Figure 5. Results on 3D surfaces of deformable objects (INRIA Dance-1 sequence). Top: results on frames 525 and 527; bottom: frames 530 and 550. For each group, the left pair denotes the identified suspect inliers, and the right pair denotes the removed suspect outliers.

For the dataset, we use the INRIA Dance-1 sequence [25], in which each surface comes from the same moving person. Note that it is hard to give a quantitative performance comparison, since the correctness of a correspondence is hard to decide; we therefore just show our results schematically in Fig. 5. In the upper group, the two frames are nearby and the level of deformation is relatively slight. There are 191 initial correspondences, of which 156 are preserved after using our method to establish accurate correspondences. In the lower group, the two frames are far apart and the level of deformation is relatively large, leading to worse initial correspondences. There are 23 initial correspondences, of which 18 are preserved by our method.

5. Conclusion

In this paper, we have presented a new approach for nonrigid point set registration. A key characteristic of our approach is the estimation of the transformation from correspondences based on a robust estimator named L2E. The computational complexity of the transformation estimation is linear in the number of correspondences. We also applied our method to sparse image feature correspondence, where the underlying relationship between images is nonrigid. Experiments on a public dataset for nonrigid point registration, and on 2D and 3D real images for sparse image feature correspondence, demonstrate that our approach yields results superior to those of state-of-the-art methods when there is significant noise and/or outliers in the data.

Acknowledgment

The authors gratefully acknowledge financial support from the National Science Foundation (No. 0917141) and the China Scholarship Council (No. 201206160008). Z. Tu is supported by NSF CAREER award IIS-0844566, NSF award IIS-1216528, and NIH R01 MH094343.

References

[1] N. Aronszajn. Theory of Reproducing Kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.
[2] A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones. Robust and Efficient Estimation by Minimising a Density Power Divergence. Biometrika, 85(3):549–559, 1998.
[3] S. Belongie, J. Malik, and J. Puzicha. Shape Matching and Object Recognition Using Shape Contexts. TPAMI, 24(4):509–522, 2002.
[4] P. J. Besl and N. D. McKay. A Method for Registration of 3-D Shapes. TPAMI, 14(2):239–256, 1992.
[5] L. G. Brown. A Survey of Image Registration Techniques. ACM Computing Surveys, 24(4):325–376, 1992.
[6] H. Chui and A. Rangarajan. A New Point Matching Algorithm for Non-rigid Registration. CVIU, 89:114–141, 2003.
[7] A. W. Fitzgibbon. Robust Registration of 2D and 3D Point Sets. Image and Vision Computing, 21:1145–1153, 2003.
[8] P. J. Huber. Robust Statistics. John Wiley & Sons, New York, 1981.
[9] B. Jian and B. C. Vemuri. Robust Point Set Registration Using Gaussian Mixture Models. TPAMI, 33(8):1633–1645, 2011.
[10] A. E. Johnson and M. Hebert. Using Spin-Images for Efficient Object Recognition in Cluttered 3-D Scenes. TPAMI, 21(5):433–449, 1999.
[11] J.-H. Lee and C.-H. Won. Topology Preserving Relaxation Labeling for Nonrigid Point Matching. TPAMI, 33(2):427–432, 2011.
[12] X. Li and Z. Hu. Rejecting Mismatches by Correspondence Function. IJCV, 89(1):1–17, 2010.
[13] D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 60(2):91–110, 2004.
[14] B. Luo and E. R. Hancock. Structural Graph Matching Using the EM Algorithm and Singular Value Decomposition. TPAMI, 23(10):1120–1136, 2001.
[15] J. Ma, J. Zhao, Y. Zhou, and J. Tian. Mismatch Removal via Coherent Spatial Mapping. In ICIP, 2012.
[16] C. A. Micchelli and M. Pontil. On Learning Vector-Valued Functions. Neural Computation, 17(1):177–204, 2005.
[17] A. Myronenko and X. Song. Point Set Registration: Coherent Point Drift. TPAMI, 32(12):2262–2275, 2010.
[18] R. Rifkin, G. Yeo, and T. Poggio. Regularized Least-Squares Classification. In Advances in Learning Theory: Methods, Model and Applications, 2003.
[19] S. Rusinkiewicz and M. Levoy. Efficient Variants of the ICP Algorithm. In 3DIM, pages 145–152, 2001.
[20] D. W. Scott. Parametric Statistical Modeling by Minimum Integrated Square Error. Technometrics, 43(3):274–285, 2001.
[21] L. Torresani, V. Kolmogorov, and C. Rother. Feature Correspondence via Graph Matching: Models and Global Optimization. In ECCV, pages 596–609, 2008.
[22] Y. Tsin and T. Kanade. A Correlation-Based Approach to Robust Point Set Registration. In ECCV, pages 558–569, 2004.
[23] G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, PA, 1990.
[24] A. L. Yuille and N. M. Grzywacz. A Mathematical Analysis of the Motion Coherence Theory. IJCV, 3(2):155–175, 1989.
[25] A. Zaharescu, E. Boyer, K. Varanasi, and R. Horaud. Surface Feature Detection and Description with Applications to Mesh Matching. In CVPR, pages 373–380, 2009.
[26] J. Zhao, J. Ma, J. Tian, J. Ma, and D. Zhang. A Robust Method for Vector Field Learning with Application to Mismatch Removing. In CVPR, pages 2977–2984, 2011.
[27] Y. Zheng and D. Doermann. Robust Point Matching for Nonrigid Shapes by Preserving Local Neighborhood Structures. TPAMI, 28(4):643–649, 2006.
