Registration of 3D Points Using Geometric Algebra and Tensor Voting

International Journal of Computer Vision 75(3), 351–369, 2007

© 2007 Springer Science + Business Media, LLC. Manufactured in the United States.

DOI: 10.1007/s11263-007-0038-z


LEO REYES
CINVESTAV Unidad Guadalajara, Electrical Engineering and Computer Science Department, Av. Científica 1145, Colonia del Bajío, CP 45010, Zapopan, Jalisco, Mexico

GERARD MEDIONI
University of Southern California, Institute for Robotics and Intelligence Systems, Los Angeles, CA 90089-0273

EDUARDO BAYRO∗

CINVESTAV Unidad Guadalajara, Electrical Engineering and Computer Science Department, Av. Científica 1145, Colonia del Bajío, CP 45010, Zapopan, Jalisco, Mexico


Received July 9, 2006; Accepted January 10, 2007

First online version published in February, 2007

Abstract. We address the problem of finding the correspondences of two point sets in 3D undergoing a rigid transformation. Using these correspondences, the motion between the two sets can be computed to perform registration. Our approach is based on the analysis of the rigid motion equations as expressed in the Geometric Algebra framework. Through this analysis it became apparent that this problem could be cast into a problem of finding a certain 3D plane in a different space that satisfies certain geometric constraints. In order to find this plane in a robust way, the Tensor Voting methodology was used. Unlike other common algorithms for point registration (like the Iterated Closest Points algorithm), ours does not require an initialization, works equally well with small and large transformations, cannot be trapped in "local minima" and works even in the presence of large numbers of outliers. We also show that this algorithm is easily extended to account for multiple motions and certain non-rigid or elastic transformations.

Keywords: computer vision, 3D motion estimation, tensor voting, geometric algebra

1. Introduction

The problem of registering data sets is common in the computer vision literature. Applications range from the alignment of range measurements for the automatic reconstruction of maps for robotic navigation (Ionescu et al., 1993; Lu and Milios, 1994); registration of CT and MR images for medical purposes (Grimson et al., 1994; Simon et al., 1995; Feldmar et al., 1996; Kybic and Unser, 2003; Fookes et al., 2000); computer graphics and CAD modeling (Turk and Levoy, 1994; Eggert et al., 1998) and recognition of objects (Borgefors, 1988; Wells, 1997).

1.1. State of the Art

The classical solution to this problem was given by Besl and McKay with the Iterative Closest Points algorithm (ICP) (Besl and McKay, 1992). Several improvements have been made to the basic scheme, starting with a more efficient way to compute the distances and to threshold the points (Zhang, 1992), by the use of K-D trees and statistical analysis, respectively; the use of extra cues such as texture to improve the matching of points (Johnson and Kang, 1997; Guest et al., 2001); the implementation of a soft-assign scheme to allow for varying degrees of certainty in the matches (Wells, 1997); and the use of more robust iteration techniques like Levenberg-Marquardt (Champleboux et al., 1992; Fitzgibbon, 2003; Chen and Medioni, 1991).

∗Corresponding author.

The advantages of ICP are that it is simple and fast, and hence it can be used for real-time applications. However, in general, all ICP-based approaches refine an initial guess of the registration by iteratively updating the parameters of the transformation. If the initialization is poor, the method will not converge to the desired solution. Because of this, ICP is generally used when the difference between the model and the data is small, which constrains the range of possible applications. One way to overcome this problem is to initialize the transformation with other methods, such as the Procrustes algorithm (Luo and Hancock, 1999). Another drawback of this method is that it requires a model and a data set such that the data set is a subset of the model. That is, at least one of the data sets must not contain outliers and must be relatively noise-free. Also, ICP in general is not robust against the presence of large numbers of outliers, and it is limited to registering points via a rigid transformation. It can be argued that the presence of outliers may be circumvented by a preprocessing stage, but the core of the ICP algorithm cannot deal with this problem by itself.

Methods that do not require initialization and are robust to outliers are generally based on voting schemes, like the Hough Transform. In this respect, Hu (1995) solved the registration problem in 3D by employing a Hough-based voting scheme. The advantages of this method are that it does not require an initialization and works equally well regardless of the "size" of the transformation. However, the translation must lie within the working range of the voting space. As will be proven in this paper, limiting and quantizing the range of possible transformations affects the accuracy and the range of motions that can be reliably detected by this type of algorithm. Another disadvantage is that this method is model-based, meaning that the model must be relatively free of noise and must not contain outliers.

It must be noted that all the algorithms previously mentioned work with a single global motion. However, Kang et al. (2002) demonstrated how Tensor Voting can be used to detect multiple affine motions in 2D. Their algorithm is largely independent of initialization (it allows for multiple candidate matches and has a robust way of rejecting the false matches), and is robust to outliers. This paper has been inspired by the work of Kang, but we will be working with 3D motions instead.

Another class of methods (Stein and Medioni, 1992; Johnson and Hebert, 1999; Chua and Jarvis, 1996, 1997) that do not require initialization and work with multiple simultaneous motions is based on the computation of local point or surface signatures, which are invariant to rotations and translations, for the model and the target. These signatures are then matched and the resulting correspondences are used to register the objects. The 3D registration problem is thus exchanged for a problem of matching enough pairs of signatures. The main difficulty with this approach is the selection of a good set of signatures so that different objects can be differentiated while still handling clutter, occlusion and noise.

There are also a few algorithms that handle non-rigid transformations, such as that of Chui and Rangarajan (2000), which uses soft-assign and an iterative minimization method (deterministic annealing) to produce a non-rigid registration of points. The advantage of this method is that it guarantees a one-to-one mapping when the algorithm finishes, but the disadvantage is that it requires a good initialization and only withstands a small number of outliers in the input data. Another ICP-based algorithm that performs non-rigid (affine and spline) registration is given in Feldmar et al. (1996); the same disadvantages of initialization and outliers apply. Finally, in Kybic and Unser (2003) B-splines are used to perform the registration and a gradient descent technique is used to find the solution. This approach depends on a good initialization. Generally, these methods are mainly applied in medical imaging.

Another type of problem is the registration of multiple data sets. Solutions to this problem have been reviewed in Cunnington and Stoddart (1999). One solution consists of expressing the problem as an optimization problem whose parameters are all the transformations needed to register the multiple data sets (Fookes et al., 2000), which is then solved with a standard minimization algorithm. Another approach models the problem as a dynamic spring system that registers the multiple sets together (Eggert et al., 1998; Stoddart and Hilton, 1996). These algorithms suffer from the problems already mentioned: they require an initialization and, depending on its quality, may or may not converge to the desired solution, and they are not robust against the presence of outliers. In this paper, however, we restrict ourselves to the two-frame case.

In general, ICP-based algorithms suffer from the following disadvantages:

• A good initialization must be provided. The quality of the initial correspondences impacts the performance of the algorithms.

• The range of possible motions is restricted (they work better for small motions).

• No outliers are permitted in the model and few to none in the data.

• They require preprocessing of the data to reject outliers (if present).

• They only work when a single global rigid motion is present.

• For the gradient-descent and Levenberg-Marquardt implementations, a computation of the derivatives is needed.

• The resulting mapping is not guaranteed to be one-to-one.

For the Hough-based algorithms, the disadvantages are:

• The range of the possible transformations is limited due to the size of the voting space.

• The voting space introduces a quantization of the parameters, which compromises the accuracy of the solution.


In this paper we propose a novel algorithm that provides the following advantages:

• Robust to large numbers of outliers.

• No initialization required.

• No thresholding needed to reject incorrect pairings (as is the case for most ICP implementations).

• Works equally well for large and small motions.

• No quantization is introduced for the parameters of the transformation; the accuracy of the solution is therefore not compromised.

• Does not require any preprocessing of the data.

• Does not require estimation of derivatives.

• Guarantees a one-to-one correspondence.

• Multiple overlapping motions can be detected.

1.2. Affine 2D Motion Estimation Using Tensor Voting

As mentioned in previous paragraphs, this work was largely inspired by Kang's paper (Kang et al., 2002). We would like to discuss her approach in some detail here.

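Kang begins with the six-parameter 2D affine transformation equation

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}, \tag{1}$$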

which transforms a 2D point (x, y) into (x′, y′). This equation can be rewritten as

$$\begin{bmatrix} a & b & -1 & 0 & t_x \\ c & d & 0 & -1 & t_y \end{bmatrix} \begin{bmatrix} x & y & x' & y' & 1 \end{bmatrix}^T = 0. \tag{2}$$

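From this equation it can be clearly seen that we actually have the following separate joint spaces:

$$\begin{bmatrix} a & b & -1 & t_x \end{bmatrix} \begin{bmatrix} x & y & x' & 1 \end{bmatrix}^T = 0, \tag{3}$$

and

$$\begin{bmatrix} c & d & -1 & t_y \end{bmatrix} \begin{bmatrix} x & y & y' & 1 \end{bmatrix}^T = 0, \tag{4}$$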

which correspond to two independent 3D spaces in the homogeneous coordinates (x, y, x′, 1) and (x, y, y′, 1). It is also easy to see that any correspondence pair (x, y) ↔ (x′, y′) obeying such an affine transformation will produce two points in these joint spaces which lie on the 3D planes $(a, b, -1, t_x)$ and $(c, d, -1, t_y)$.
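To make the joint-space idea concrete, here is a small NumPy sketch. The affine parameters, point ranges and random seed are arbitrary values chosen for illustration only. It checks that every true correspondence produces joint-space points (x, y, x′, 1) and (x, y, y′, 1) lying on the planes of Eqs. (3) and (4), whereas a wrong match generally falls off the plane.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical affine motion of Eq. (1)
a, b, c, d, tx, ty = 1.1, -0.3, 0.2, 0.9, 4.0, -2.0
M = np.array([[a, b], [c, d]])
t = np.array([tx, ty])

pts = rng.uniform(-10, 10, size=(100, 2))   # (x, y)
matches = pts @ M.T + t                      # (x', y')

# Joint-space points of Eqs. (3) and (4): (x, y, x', 1) and (x, y, y', 1)
ones = np.ones((len(pts), 1))
joint_x = np.hstack([pts, matches[:, :1], ones])
joint_y = np.hstack([pts, matches[:, 1:], ones])

plane_x = np.array([a, b, -1.0, tx])
plane_y = np.array([c, d, -1.0, ty])

# True correspondences lie on both planes (residuals at rounding level) ...
print(np.abs(joint_x @ plane_x).max(), np.abs(joint_y @ plane_y).max())

# ... while a wrong match (a random x' for the first point) generally does not.
outlier = np.array([pts[0, 0], pts[0, 1], rng.uniform(-10, 10), 1.0])
print(abs(outlier @ plane_x))
```

In Kang's setting the plane parameters are of course unknown; the point of the sketch is only that inliers are coplanar in each joint space, which is what Tensor Voting then exploits.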

From this observation, Kang proceeds to populate both joint spaces with a set of tentative correspondences. These correspondences might contain a large number of outliers (wrong correspondences). The outliers were dealt with by applying Tensor Voting to find the actual planes which represented the affine motion.

1.3. Our Approach

Based on these results, we tried to find out what would happen in the 3D rigid motion case. Hence, we followed Kang's procedure and began by writing down the equation of rigid motion in 3D:

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} tA_x^2 + c & tA_xA_y + sA_z & tA_xA_z - sA_y \\ tA_xA_y - sA_z & tA_y^2 + c & tA_yA_z + sA_x \\ tA_xA_z + sA_y & tA_yA_z - sA_x & tA_z^2 + c \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}, \tag{5}$$

where θ is the angle of rotation, $[A_x, A_y, A_z]$ is the unit axis of rotation, and $c = \cos\theta$, $s = \sin\theta$, $t = 1 - \cos\theta$. This equation can be rewritten as

$$\begin{bmatrix} tA_x^2 + c & tA_xA_y + sA_z & tA_xA_z - sA_y & -1 & 0 & 0 & t_x \\ tA_xA_y - sA_z & tA_y^2 + c & tA_yA_z + sA_x & 0 & -1 & 0 & t_y \\ tA_xA_z + sA_y & tA_yA_z - sA_x & tA_z^2 + c & 0 & 0 & -1 & t_z \end{bmatrix} \begin{bmatrix} x & y & z & x' & y' & z' & 1 \end{bmatrix}^T = 0. \tag{6}$$

These equations yield three independent 4D homogeneous spaces in the coordinates (x, y, z, x′, 1), (x, y, z, y′, 1) and (x, y, z, z′, 1). Following Kang's idea, we would then populate these 4D spaces with a set of tentative correspondences and use Tensor Voting to detect these planes. If a simpler search space were desired, the equations would have to be reformulated so as to disentangle the complex relationships between the sines and cosines in the previous equations. Therefore, we turned our attention to Geometric Algebra to accomplish this task.

In order not to interrupt the flow of ideas, we shall postpone the explanation of Geometric Algebra to Section 2. In this framework, the rigid transformation of a point $\mathbf{x} = x\sigma_1 + y\sigma_2 + z\sigma_3$ in 3D space can be written as

$$\mathbf{x}' = R\mathbf{x}\tilde{R} + \mathbf{t}, \tag{7}$$

where $R = e^{\frac{\theta}{2}A} = \cos(\frac{\theta}{2}) + \sin(\frac{\theta}{2})A = q_0 + q_x\sigma_{23} + q_y\sigma_{31} + q_z\sigma_{12}$ is an entity called a rotor that represents the rotation about the axis A. The inverse of the rotor R is denoted by $\tilde{R}$, so that $R\tilde{R} = 1$. Finally, $\mathbf{t}$ is the 3D translation vector. By left-multiplication with $\tilde{R}$, the equation of rigid motion becomes

$$\tilde{R}\mathbf{x}' = \mathbf{x}\tilde{R} + \tilde{R}\mathbf{t}, \tag{8}$$

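$$\begin{aligned} (q_0 + q_x\sigma_{23} + q_y\sigma_{31} + q_z\sigma_{12})(x'\sigma_1 + y'\sigma_2 + z'\sigma_3) &= (x\sigma_1 + y\sigma_2 + z\sigma_3)(q_0 + q_x\sigma_{23} + q_y\sigma_{31} + q_z\sigma_{12}) \\ &\quad + (q_0 + q_x\sigma_{23} + q_y\sigma_{31} + q_z\sigma_{12})(t_x\sigma_1 + t_y\sigma_2 + t_z\sigma_3). \end{aligned} \tag{9}$$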

Developing the products, rearranging terms according to their multivector parts and factoring, we get

$$\sigma_1:\; q_0(x'-x) - q_y(z+z') + q_z(y+y') + (q_yt_z - q_0t_x - q_zt_y) = 0, \tag{10}$$

$$\sigma_2:\; q_0(y'-y) + q_x(z+z') - q_z(x+x') + (q_zt_x - q_0t_y - q_xt_z) = 0, \tag{11}$$

$$\sigma_3:\; q_0(z'-z) - q_x(y+y') + q_y(x+x') + (q_xt_y - q_0t_z - q_yt_x) = 0, \tag{12}$$

$$\sigma_{123}:\; q_x(x'-x) + q_y(y'-y) + q_z(z'-z) - (q_xt_x + q_yt_y + q_zt_z) = 0. \tag{13}$$

These equations represent four 3D planes whose coefficients are the entries of the rotor (q0, qx, qy and qz) and of the translator (tx, ty and tz). Thus, in order to estimate the correspondences due to a rigid transformation, we can produce a set of putative correspondences and use them to populate the joint spaces {(x′ − x), (z + z′), (y + y′)}, {(y′ − y), (z + z′), (x + x′)}, {(z′ − z), (y + y′), (x + x′)} and {(x′ − x), (y′ − y), (z′ − z)}. If any rigid transformation exists in the set of correspondences, then four planes must appear in these spaces. In other words, if we can detect four significant planes in these joint spaces, then the points which lie on these planes are related by a rigid transformation.
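The following NumPy sketch checks Eqs. (10)–(13) numerically. It relies on a sign convention that the text leaves implicit, which is our assumption: the motion is a counterclockwise (right-handed) rotation by θ about a unit axis a, built with Rodrigues' formula, and the rotor coefficients are taken as q0 = cos(θ/2) and (qx, qy, qz) = sin(θ/2)a. With the opposite convention the signs of qx, qy and qz flip. The axis, angle, translation and points are hypothetical.

```python
import numpy as np

def rotation_matrix(axis, theta):
    """Counterclockwise rotation by theta about a unit axis (Rodrigues' formula)."""
    a = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def joint_space_residuals(x, xp, q0, q, t):
    """Left-hand sides of Eqs. (10)-(13); one row per correspondence x[i] -> xp[i]."""
    qx, qy, qz = q
    tx, ty, tz = t
    X, Y, Z = x.T
    Xp, Yp, Zp = xp.T
    r1 = q0 * (Xp - X) - qy * (Z + Zp) + qz * (Y + Yp) + (qy * tz - q0 * tx - qz * ty)
    r2 = q0 * (Yp - Y) + qx * (Z + Zp) - qz * (X + Xp) + (qz * tx - q0 * ty - qx * tz)
    r3 = q0 * (Zp - Z) - qx * (Y + Yp) + qy * (X + Xp) + (qx * ty - q0 * tz - qy * tx)
    r4 = qx * (Xp - X) + qy * (Yp - Y) + qz * (Zp - Z) - (qx * tx + qy * ty + qz * tz)
    return np.column_stack([r1, r2, r3, r4])

rng = np.random.default_rng(1)
axis = np.array([1.0, 2.0, 2.0]) / 3.0          # hypothetical unit axis
theta = 0.7
t = np.array([0.5, -1.0, 2.0])

x = rng.uniform(-5, 5, size=(200, 3))
xp = x @ rotation_matrix(axis, theta).T + t     # x'_i = M x_i + t

# Assumed convention for the coefficients of Eq. (19)
q0, q = np.cos(theta / 2), np.sin(theta / 2) * axis

inliers = joint_space_residuals(x, xp, q0, q, t)
outliers = joint_space_residuals(x, rng.permutation(xp), q0, q, t)
print(np.abs(inliers).max())    # rounding level: true pairs lie on all four planes
print(np.abs(outliers).max())   # large: shuffled pairs do not
```

Shuffling the targets is a crude way to generate wrong correspondences; it is enough to show that only consistent pairs satisfy all four plane equations at once.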

However, the problem can be further simplified. It can be seen that the terms representing the planes are not independent. We will show later that these planes are related by a powerful geometric constraint, which enables us to limit our search to a single plane that satisfies this constraint. It will be shown that this constraint is also useful to reject false or multiple matches.

We believe that this simplification is not possible using the traditional matrix representation of the rigid transformation, since the key step was a left-multiplication by the inverse of the rotor. This was necessary because the transformation equation requires a left and right multiplication in order to rotate a 3D point in Geometric Algebra.

Finally, in order to find the plane in the joint space, a variety of techniques can be applied. However, we have decided to use Tensor Voting (Medioni et al., 2000) because this framework has proven to be quite robust and because it can detect general surfaces. In our case, certain non-rigid motions which can be approximated piecewise with rigid transformations produce curved surfaces in the joint space. By using Tensor Voting, we can detect those transformations too.

1.4. Structure of the Paper

The structure of the paper is as follows. Section 2 is a short introduction to Geometric Algebra, in which the formulation of our solution is also presented. Section 3 gives a brief description of the Tensor Voting methodology and describes how it applies to the current problem; the core of the algorithm is presented in this section. Section 4 describes results produced with both synthetic and real data. Section 5 discusses the extension of our algorithm to multiple motion detection and non-rigid motion registration. Finally, the conclusions follow in Section 6.

2. Geometric Algebra

The main alternative to the classical approach to computer vision is Clifford Geometric Algebra. This algebra system was invented by the English mathematician William Kingdon Clifford (1845–1879), who combined the ideas introduced by the German mathematician Hermann Günther Grassmann (1809–1877) and Sir William Rowan Hamilton (1805–1865). Since the 1960s, David Hestenes has been working on developing his version of Clifford Algebra (Hestenes, 1966). We will now present a brief introduction to Geometric Algebra.

Geometric Algebra is equipped with a new product (the Clifford product) that is, in general, invertible and that combines the properties of the interior and exterior products. For two vectors a and b, their Clifford product is expressed as

$$ab = a \cdot b + a \wedge b, \tag{14}$$

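where the wedge product ∧ is similar to the cross product, but instead of producing a vector it renders a new entity called a bivector. The bivector a ∧ b can be visualized as the oriented plane spanned by a and b. The Clifford product is linear and associative, but not commutative: its symmetric part is the inner product, while the wedge product is anticommutative,

$$a \wedge b = -\,b \wedge a, \tag{15}$$

so that orthogonal vectors (for which a · b = 0) anticommute, ab = −ba.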


Figure 1. A rotor $R = e^{\frac{1}{2}\theta B}$ visualized in the bivector bases of $G_{3,0,0}$ (axes $q_x$, $q_y$, $q_z$). B is the unit axis of rotation about the origin.

By definition, in a geometric algebra $G_{p,q,r}$, the first p basis vectors $\sigma_1, \ldots, \sigma_p$ square to 1, the next q basis vectors $\sigma_{p+1}, \ldots, \sigma_{p+q}$ square to −1 and the last r basis vectors square to 0. Thus, the common 3D space is usually represented in the space $G_{3,0,0}$. Also, the wedge product of basis vectors is usually abbreviated by grouping together the subindices; in other words, $\sigma_{ij} = \sigma_i \wedge \sigma_j$.
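A minimal, dependency-free sketch of the Clifford product in $G_{3,0,0}$ can help check the statements above. The bitmask encoding of basis blades and the helper names (`reorder_sign`, `gp`, `vec`) are our own illustrative choices, not part of any library. Because blades are stored with increasing indices, $\sigma_{31}$ appears as $-\sigma_{13}$.

```python
from itertools import product

# Basis blades of G(3,0,0) as bitmasks: 0b001 = s1, 0b010 = s2, 0b100 = s3,
# 0b011 = s12, 0b110 = s23, 0b101 = s13 (= -s31), 0b111 = s123.
# A multivector is a dict {bitmask: coefficient}.

def reorder_sign(a: int, b: int) -> int:
    """Sign picked up when the basis vectors of blade a are moved past those of blade b."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps % 2 else 1


def gp(x: dict, y: dict) -> dict:
    """Clifford (geometric) product; every basis vector squares to +1 in G(3,0,0)."""
    out: dict = {}
    for (a, ca), (b, cb) in product(x.items(), y.items()):
        blade = a ^ b                       # repeated basis vectors cancel (s_i^2 = 1)
        out[blade] = out.get(blade, 0.0) + reorder_sign(a, b) * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def vec(x: float, y: float, z: float) -> dict:
    return {0b001: x, 0b010: y, 0b100: z}


s1, s2, s3 = vec(1, 0, 0), vec(0, 1, 0), vec(0, 0, 1)
s12, s23, s31 = gp(s1, s2), gp(s2, s3), gp(s3, s1)

# s_i^2 = 1, s_ij^2 = -1, and orthogonal basis vectors anticommute
print(gp(s1, s1), gp(s12, s12), gp(s23, s23), gp(s31, s31))
print(gp(s1, s2), gp(s2, s1))

# Eq. (14): the grade-0 part of ab is a . b, the grade-2 part is a ^ b
a, b = vec(1.0, 2.0, 0.0), vec(3.0, -1.0, 2.0)
ab, ba = gp(a, b), gp(b, a)
print(ab)   # same scalar part as ba ...
print(ba)   # ... but opposite bivector part, so ab != -ba for non-orthogonal vectors
```

A dictionary-based representation is slow but keeps every sign explicit, which is convenient for checking hand derivations such as Eq. (22).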

In this framework, a general 3D rotation about the origin is represented by an entity called a rotor. The rotor can be expressed in exponential form as

$$R = e^{\frac{1}{2}\theta B}, \tag{16}$$

where B is a unit bivector that represents the axis of rotation and θ is the angle of rotation. The inverse of a rotor R is denoted by $\tilde{R}$, so that $R\tilde{R} = 1$; it can be obtained by reversing the sign of the bivector part of the rotor. If we expand the previous formula we get

$$R = q_0 + q_x\sigma_{23} + q_y\sigma_{31} + q_z\sigma_{12}, \tag{17}$$

where $\sin(\frac{\theta}{2})B = q_x\sigma_{23} + q_y\sigma_{31} + q_z\sigma_{12}$ and $q_0 = \cos(\frac{\theta}{2})$ (see Fig. 1). Since $\sigma_{23}^2 = \sigma_{31}^2 = \sigma_{12}^2 = -1$, the previous formula can be regarded as a generalized formulation of the quaternion, with the added advantage that this entity is now part of a greater mathematical framework with a very concise geometric interpretation.
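Using the same bitmask encoding (repeated so that the block is self-contained), the sketch below builds the rotor of Eqs. (16)–(17) for a hypothetical axis and angle, checks that $R\tilde{R} = 1$, and compares the sandwich product $R\mathbf{x}\tilde{R}$ with the rotation matrix printed in Eq. (5). The helper names are illustrative only; in our check the two agree, so the matrix of Eq. (5) and the rotor describe the same rotation.

```python
import numpy as np
from itertools import product

def reorder_sign(a, b):
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps % 2 else 1

def gp(x, y):
    """Clifford product in G(3,0,0) on {bitmask: coefficient} dictionaries."""
    out = {}
    for (a, ca), (b, cb) in product(x.items(), y.items()):
        out[a ^ b] = out.get(a ^ b, 0.0) + reorder_sign(a, b) * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}

def reverse(m):
    """Reversion flips the sign of the grade-2 and grade-3 parts."""
    return {k: (-v if bin(k).count("1") >= 2 else v) for k, v in m.items()}

def rotor(axis, theta):
    """R = exp(theta/2 B) = cos(theta/2) + sin(theta/2) B with B = ax s23 + ay s31 + az s12."""
    ax, ay, az = np.asarray(axis, float) / np.linalg.norm(axis)
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    # s23 -> 0b110, s12 -> 0b011, and s31 = -s13 -> 0b101 with a minus sign
    return {0b000: c, 0b110: s * ax, 0b101: -s * ay, 0b011: s * az}

def rotate(R, p):
    """Sandwich product R x R~ applied to the vector p = (x, y, z)."""
    y = gp(gp(R, {0b001: p[0], 0b010: p[1], 0b100: p[2]}), reverse(R))
    return np.array([y.get(0b001, 0.0), y.get(0b010, 0.0), y.get(0b100, 0.0)])

def matrix_eq5(axis, theta):
    """Rotation matrix exactly as written in Eq. (5)."""
    Ax, Ay, Az = np.asarray(axis, float) / np.linalg.norm(axis)
    c, s, t = np.cos(theta), np.sin(theta), 1 - np.cos(theta)
    return np.array([
        [t * Ax * Ax + c, t * Ax * Ay + s * Az, t * Ax * Az - s * Ay],
        [t * Ax * Ay - s * Az, t * Ay * Ay + c, t * Ay * Az + s * Ax],
        [t * Ax * Az + s * Ay, t * Ay * Az - s * Ax, t * Az * Az + c],
    ])

axis, theta = np.array([1.0, -2.0, 2.0]), 1.1      # hypothetical values
R = rotor(axis, theta)
p = np.array([0.3, 1.5, -0.7])
print(gp(R, reverse(R)))                            # scalar 1 up to rounding
print(rotate(R, p), matrix_eq5(axis, theta) @ p)    # same rotated point
```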

We shall conclude our introduction to Clifford Algebra here. We refer the interested reader to Hestenes (1966), Hestenes and Sobczyk (1984), Lounesto (1997) and Bayro-Corrochano (2001) for more details. We also recommend the use of CLICAL (Lounesto, 1987; Lounesto et al., 1987) to help in becoming familiar with Clifford algebras.

2.1. Problem Formulation

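Using the Geometric Algebra of 3D space $G_{3,0,0}$, the rigid motion of a 3D point $\mathbf{x} = x\sigma_1 + y\sigma_2 + z\sigma_3$ can be formulated as

$$\mathbf{x}' = R\mathbf{x}\tilde{R} + \mathbf{t}, \tag{18}$$

where R is a rotor as described in the previous section, and $\mathbf{t} = t_x\sigma_1 + t_y\sigma_2 + t_z\sigma_3$. For simplicity, we will represent $\tilde{R}$ as a rotor of the form

$$\tilde{R} = q_0 + q_x\sigma_{23} + q_y\sigma_{31} + q_z\sigma_{12}. \tag{19}$$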

By left-multiplication with $\tilde{R}$, the equation of rigid motion becomes

$$\tilde{R}\mathbf{x}' = \mathbf{x}\tilde{R} + \tilde{R}\mathbf{t}, \tag{20}$$

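$$\begin{aligned} (q_0 + q_x\sigma_{23} + q_y\sigma_{31} + q_z\sigma_{12})(x'\sigma_1 + y'\sigma_2 + z'\sigma_3) &= (x\sigma_1 + y\sigma_2 + z\sigma_3)(q_0 + q_x\sigma_{23} + q_y\sigma_{31} + q_z\sigma_{12}) \\ &\quad + (q_0 + q_x\sigma_{23} + q_y\sigma_{31} + q_z\sigma_{12})(t_x\sigma_1 + t_y\sigma_2 + t_z\sigma_3). \end{aligned} \tag{21}$$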

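Developing the products, we get

$$\begin{aligned} & q_0x'\sigma_1 + q_xx'\sigma_{231} + q_yx'\sigma_3 - q_zx'\sigma_2 + q_0y'\sigma_2 - q_xy'\sigma_3 + q_yy'\sigma_{312} + q_zy'\sigma_1 + q_0z'\sigma_3 + q_xz'\sigma_2 - q_yz'\sigma_1 + q_zz'\sigma_{123} \\ &= xq_0\sigma_1 + yq_0\sigma_2 + zq_0\sigma_3 + xq_x\sigma_{123} + yq_x\sigma_3 - zq_x\sigma_2 - xq_y\sigma_3 + yq_y\sigma_{231} + zq_y\sigma_1 + xq_z\sigma_2 - yq_z\sigma_1 + zq_z\sigma_{312} \\ &\quad + q_0t_x\sigma_1 + q_xt_x\sigma_{231} + q_yt_x\sigma_3 - q_zt_x\sigma_2 + q_0t_y\sigma_2 - q_xt_y\sigma_3 + q_yt_y\sigma_{312} + q_zt_y\sigma_1 + q_0t_z\sigma_3 + q_xt_z\sigma_2 - q_yt_z\sigma_1 + q_zt_z\sigma_{123}. \end{aligned} \tag{22}$$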

Rearranging terms according to their multivector basis, we obtain the following four equations:

$$\sigma_1:\; q_0x' + q_zy' - q_yz' = q_0x + q_yz - q_zy + q_0t_x + q_zt_y - q_yt_z, \tag{23}$$

$$\sigma_2:\; q_0y' + q_xz' - q_zx' = q_0y + q_zx - q_xz + q_0t_y + q_xt_z - q_zt_x, \tag{24}$$

$$\sigma_3:\; q_0z' + q_yx' - q_xy' = q_0z + q_xy - q_yx + q_0t_z + q_yt_x - q_xt_y, \tag{25}$$

$$\sigma_{123}:\; q_xx' + q_yy' + q_zz' = q_xx + q_yy + q_zz + q_xt_x + q_yt_y + q_zt_z. \tag{26}$$

These equations can be rearranged to express linear relationships in the joint difference and sum spaces:

$$\sigma_1:\; q_0(x'-x) - q_y(z+z') + q_z(y+y') + (q_yt_z - q_0t_x - q_zt_y) = 0, \tag{27}$$

$$\sigma_2:\; q_0(y'-y) + q_x(z+z') - q_z(x+x') + (q_zt_x - q_0t_y - q_xt_z) = 0, \tag{28}$$

$$\sigma_3:\; q_0(z'-z) - q_x(y+y') + q_y(x+x') + (q_xt_y - q_0t_z - q_yt_x) = 0, \tag{29}$$

$$\sigma_{123}:\; q_x(x'-x) + q_y(y'-y) + q_z(z'-z) - (q_xt_x + q_yt_y + q_zt_z) = 0. \tag{30}$$


These equations represent four 3D planes whose coefficients are the entries of the rotor and the translator (the unknowns). Thus, in order to estimate the correspondences due to a rigid transformation, we can use a set of tentative correspondences to populate the joint spaces {(x′ − x), (z + z′), (y + y′)}, {(y′ − y), (z + z′), (x + x′)}, {(z′ − z), (y + y′), (x + x′)} and {(x′ − x), (y′ − y), (z′ − z)}. If four planes appear in these spaces, then the points lying on them are related by a rigid transformation. However, we will show in the following section that the first three planes, Eqs. (27)–(29), are related by a powerful geometric constraint. It is therefore enough to find the plane described by Eq. (30) and verify that it satisfies this constraint. We will now show how this is done and how this geometric constraint helps in eliminating multiple matches too.
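The plane of Eq. (30) lives in the difference space (x′ − x, y′ − y, z′ − z): its normal is (qx, qy, qz), i.e., the rotation axis, and its offset is the component of the translation along that axis. The NumPy sketch below illustrates this with a plain least-squares (SVD) plane fit on outlier-free synthetic data. This fit is a stand-in used only for illustration; it is not the Tensor Voting detector the paper relies on when outliers are present. The axis, angle, translation and points are hypothetical.

```python
import numpy as np

def rotation_matrix(axis, theta):
    """Counterclockwise rotation by theta about a unit axis (Rodrigues' formula)."""
    a = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def fit_plane(points):
    """Least-squares plane n . p = c through 3D points (n is a unit normal)."""
    centroid = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - centroid)
    n = Vt[-1]                                   # direction of least variance
    return n, n @ centroid

rng = np.random.default_rng(2)
axis = np.array([0.0, 0.6, 0.8])                 # hypothetical unit axis
theta, t = 1.2, np.array([3.0, -1.0, 0.5])
x = rng.uniform(-5, 5, size=(300, 3))
xp = x @ rotation_matrix(axis, theta).T + t

d = xp - x                                       # points (x'-x, y'-y, z'-z) of Eq. (30)
n, c = fit_plane(d)
print(abs(n @ axis))                             # ~1: the normal is the axis, up to sign
print(c, np.sign(n @ axis) * (axis @ t))         # offset = translation along the axis
```

The reason this works is that $x'_i - x_i = (M - I)x_i + t$, and the columns of $M - I$ are orthogonal to the rotation axis, so all difference vectors differ only within the plane perpendicular to it.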

2.2. The Geometric Constraint

Let $(\mathbf{x}_i, \mathbf{x}'_i)$ and $(\mathbf{x}_j, \mathbf{x}'_j)$ be two pairs of points in correspondence through the same rigid transformation. Then, these points satisfy Eq. (27):

$$q_0 d_x - q_y s_z + q_z s_y + q_k = 0, \tag{31}$$

$$q_0 d'_x - q_y s'_z + q_z s'_y + q_k = 0, \tag{32}$$

where $d_x = x'_i - x_i$, $s_z = z'_i + z_i$, $s_y = y'_i + y_i$; $d'_x = x'_j - x_j$, $s'_z = z'_j + z_j$, $s'_y = y'_j + y_j$; and $q_k = q_y t_z - q_0 t_x - q_z t_y$. If we subtract these equations we get

$$q_0 v_x - q_y v_{sz} + q_z v_{sy} = 0, \tag{33}$$

where $v_x = d_x - d'_x$, $v_{sz} = s_z - s'_z$ and $v_{sy} = s_y - s'_y$.

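Using the definition of the rotor in Eq. (17), i.e., writing $q_0 = \cos(\frac{\theta}{2})$ and $[q_x, q_y, q_z] = \sin(\frac{\theta}{2})[a_x, a_y, a_z]$ with $[a_x, a_y, a_z]$ the unit axis of rotation, and dividing by $\sin(\frac{\theta}{2})$, this equation can be rewritten as

$$k v_x - a_y v_{sz} + a_z v_{sy} = 0, \tag{34}$$

where $k = \cos(\frac{\theta}{2})/\sin(\frac{\theta}{2})$. Using a similar procedure for Eqs. (28) and (29), we end up with the following system of equations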

$$k v_x + a_z v_{sy} - a_y v_{sz} = 0, \tag{35}$$

$$k v_y + a_x v_{sz} - a_z v_{sx} = 0, \tag{36}$$

$$k v_z + a_y v_{sx} - a_x v_{sy} = 0, \tag{37}$$

where $v_y$, $v_z$ and $v_{sx}$ are defined accordingly. Note that we now have a system of equations depending on the unit axis of rotation $[a_x, a_y, a_z]$. Since we can obtain the axis of rotation as the normal of the plane described by Eq. (30), we only have one unknown: k. These equations can be combined to eliminate k, yielding the following three constraints:

$$v_y(a_y v_{sz} - a_z v_{sy}) - v_x(a_z v_{sx} - a_x v_{sz}) = 0, \tag{38}$$

$$v_z(a_y v_{sz} - a_z v_{sy}) - v_x(a_x v_{sy} - a_y v_{sx}) = 0, \tag{39}$$

$$v_z(a_z v_{sx} - a_x v_{sz}) - v_y(a_x v_{sy} - a_y v_{sx}) = 0. \tag{40}$$

These equations only depend on the points themselves and on the plane spanned by them. Thus, if we populate the joint space described by Eq. (30) with a set of tentative correspondences and detect a plane in this space, we can verify whether this plane corresponds to an actual rigid transformation by checking that the points which lie on this plane satisfy Eqs. (38)–(40). Note that these constraints never become undefined, because the factor k (which is undefined for θ = 0) was removed. So this test can always be applied to confirm or reject a plane that seems to represent a rigid transformation. Furthermore, since these constraints have been derived from the original plane Eqs. (27)–(29), they are, in a sense, expressing the requirement that these points lie simultaneously on all three planes.

On the other hand, Eqs. (38)–(40) have an interesting geometric interpretation. They are in fact expressing a double cross product that is only satisfied for true correspondences. To see this, note that if $A = [a_x, a_y, a_z]^T$, $V = [v_x, v_y, v_z]^T$ and $V_s = [v_{sx}, v_{sy}, v_{sz}]^T$, then Eqs. (38)–(40) can be rewritten as a vector equation of the form

$$V \times (A \times V_s) = 0. \tag{41}$$

This equation holds only because of an inherent symmetry that is present only for true correspondences; in other words, these equations can be used to reject false matches too. To prove this, first recall the well-known fact that any rigid motion in 3D is equivalent to a screw motion (a rotation about the screw axis followed by a translation along it). Hence, without loss of generality, we can consider the case where the screw axis is aligned with the z axis. In this case, the screw motion consists of a rotation about z followed by a translation $t_z$ along it. Therefore,

$$\begin{aligned} v_z &= d_z - d'_z = z'_i - z_i - z'_j + z_j \\ &= (z_i + t_z) - z_i - (z_j + t_z) + z_j \\ &= 0. \end{aligned} \tag{42}$$

Also, note that since $A = [0\ 0\ 1]^T$, the first cross product of Eq. (41) is $A \times V_s = [-v_{sy}, v_{sx}, 0]^T$; hence the $v_{sz}$ component of $V_s$ is irrelevant in this case and can be safely disregarded. Thus, we can analyze this problem in 2D by looking only at the x and y components of V and $V_s$. Accordingly, the difference and sum vectors will only have two components: $d_i = [d_x, d_y]^T$, $d_j = [d'_x, d'_y]^T$, $s_i = [s_x, s_y]^T$ and $s_j = [s'_x, s'_y]^T$. The situation is illustrated in Figs. 2(a) and (d).


Figure 2. (a) and (d) Two correspondences belonging to the same transformation and the geometry of the plane of rotation. (b) and (e) Two correspondences belonging to different transformations (different angle) and their corresponding geometry. (c) and (f) Multiple correspondence case and its geometry. (Panels show the points $x_i$, $x'_i$, $x_j$, $x'_j$ in the xy-plane and, in (d)–(f), the vectors $d_i$, $d_j$, $s_i$, $s_j$, $V$ and $V_s$.)

Since the angle between $x_i$ and $x'_i$ is the same as the angle between $x_j$ and $x'_j$, the parallelograms spanned by these points, whose diagonals are the sum and difference vectors (the dashed lines in Fig. 2(d)), are equal up to a scale factor. That is, there is a scale factor k such that $k\|s_i\| = \|s_j\|$ and $k\|d_i\| = \|d_j\|$, whence $\|s_i\|/\|d_i\| = \|s_j\|/\|d_j\|$. In turn, this means that the triangle formed by the vectors $d_i$, $d_j$ and V is proportional to the triangle formed by $s_i$, $s_j$ and $V_s$. And since $s_i \perp d_i$ and $s_j \perp d_j$ (the rotation preserves norms, so each parallelogram is a rhombus, whose diagonals are perpendicular), it follows that $V \perp V_s$.

Now, let us return to Eq. (41). The cross product $A \times V_s$ has the effect of rotating $V_s$ by 90 degrees, since $A = [0, 0, 1]^T$ in this case. But since $V_s \perp V$, the vector $A \times V_s$ is parallel to V, and hence their cross product is always 0, which is consistent with the analytic derivation.
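The planar argument can be checked numerically. The sketch below uses a hypothetical angle and random points, with the rotation about the z axis through the origin as in the screw-motion argument above. It verifies that $d_i \perp s_i$, that $\|s_i\|/\|d_i\| = \|s_j\|/\|d_j\|$ (both equal $\cot(\theta/2)$, the k of Eq. (34)), that $V \perp V_s$, and hence that $V \times (A \times V_s)$ vanishes for $A = [0, 0, 1]^T$.

```python
import numpy as np

theta = 0.8                                     # hypothetical rotation angle about z
R2 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])

rng = np.random.default_rng(4)
xi, xj = rng.uniform(-5, 5, size=(2, 2))        # two points in the plane of rotation
di, si = R2 @ xi - xi, R2 @ xi + xi             # difference and sum vectors of x_i
dj, sj = R2 @ xj - xj, R2 @ xj + xj
V, Vs = di - dj, si - sj

# Diagonals of the rhombus spanned by x_i and x'_i are perpendicular
print(di @ si, dj @ sj)
# Equal sum/difference ratios, both equal to cot(theta / 2)
print(np.linalg.norm(si) / np.linalg.norm(di),
      np.linalg.norm(sj) / np.linalg.norm(dj), 1 / np.tan(theta / 2))
# Hence V is perpendicular to Vs, and V x (A x Vs) = 0 for A = [0, 0, 1]^T
A = np.array([0.0, 0.0, 1.0])
V3, Vs3 = np.append(V, 0.0), np.append(Vs, 0.0)
print(V @ Vs, np.cross(V3, np.cross(A, Vs3)))
```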

This symmetry is broken if we have points that belong to different transformations, as shown in Figs. 2(b) and (e). (We assume the worst case, in which points $x_i$ and $x_j$ undergo the same translation but different rotation angles φ. If a different translation is present, the planes of motion will be different for both points, breaking the symmetry.) Note how V and $V_s$ are no longer orthogonal (Fig. 2(e)).

In a similar way, when we have multiple correspondences, i.e., $x_i$ matches both $x'_i$ and $x'_j$, $i \neq j$ (Fig. 2(c)), the symmetry is also broken and V is not orthogonal to $V_s$ (see Fig. 2(f)). Hence, the constraint expressed by Eq. (41) can be used to reject multiple matches too.
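Putting Eqs. (38)–(41) together, a candidate pair of correspondences can be screened with two cross products. The NumPy sketch below uses hypothetical data and assumes the axis A is already known (e.g. from the plane of Eq. (30)). It evaluates $V \times (A \times V_s)$ for a true pair, for a pair whose second point underwent a different rotation angle (as in Fig. 2(b)), and for a multiple match in which $x_i$ is paired with both $x'_i$ and $x'_j$ (as in Fig. 2(c)), and it also checks that Eqs. (38)–(40) are components of the same cross product. With noisy data a tolerance would be needed; its value is not specified here.

```python
import numpy as np

def rotation_matrix(axis, theta):
    """Counterclockwise rotation by theta about a unit axis (Rodrigues' formula)."""
    a = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def constraint(axis, pair_i, pair_j):
    """V x (A x Vs) of Eq. (41) for two tentative correspondences (x, x')."""
    (xi, xpi), (xj, xpj) = pair_i, pair_j
    V = (xpi - xi) - (xpj - xj)        # difference of the difference vectors
    Vs = (xpi + xi) - (xpj + xj)       # difference of the sum vectors
    return np.cross(V, np.cross(axis, Vs))

def eqs_38_40(axis, pair_i, pair_j):
    """Left-hand sides of Eqs. (38)-(40), written out component by component."""
    ax, ay, az = axis
    (xi, xpi), (xj, xpj) = pair_i, pair_j
    vx, vy, vz = (xpi - xi) - (xpj - xj)
    vsx, vsy, vsz = (xpi + xi) - (xpj + xj)
    return np.array([
        vy * (ay * vsz - az * vsy) - vx * (az * vsx - ax * vsz),
        vz * (ay * vsz - az * vsy) - vx * (ax * vsy - ay * vsx),
        vz * (az * vsx - ax * vsz) - vy * (ax * vsy - ay * vsx),
    ])

rng = np.random.default_rng(3)
axis, theta, t = np.array([2.0, -1.0, 2.0]) / 3.0, 0.9, np.array([1.0, 2.0, -0.5])
M = rotation_matrix(axis, theta)
xi, xj = rng.uniform(-5, 5, size=(2, 3))
true_i, true_j = (xi, M @ xi + t), (xj, M @ xj + t)
other_j = (xj, rotation_matrix(axis, theta + 0.5) @ xj + t)   # different angle
multi_j = (xi, M @ xj + t)                                     # x_i matched twice

for name, pair in [("true", true_j), ("other angle", other_j), ("multiple", multi_j)]:
    print(name, np.linalg.norm(constraint(axis, true_i, pair)))
```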

Following this procedure, we were able to cast the problem of finding the correspondences between two sets of 3D points due to a rigid transformation into a problem of finding a 3D plane in a joint space which satisfies three geometric constraints. In order to find a 3D plane from a set of points which may contain a large proportion of outliers, several methods can be used. We have decided to use Tensor Voting because it has proven to be quite robust and because it can also be used to detect general surfaces, which in turn enables us to easily extend our method to non-rigid motion estimation. We will now explain Tensor Voting and the details of the detection algorithm.

3. Tensor Voting

Tensor voting is a methodology for the extraction of dense or sparse features from n-D data. Some of the features that can be detected with this methodology include lines, curves, points of junction and surfaces.

The Tensor Voting methodology is grounded in two elements: tensor calculus for data representation and tensor voting for data communication. Each input site propagates its information in a neighborhood (the information itself is encoded as a tensor and is defined by a predefined voting field). Each site collects the information cast there by its neighbors and analyzes it, building a saliency map for each feature type. Salient features are located at local extrema of these saliency maps, which can be extracted by non-maximal suppression.

For the present work, we found that sparse tensor voting was enough to solve the problem. Since we are only concerned with finding 3D planes, we will limit our discussion to the detection of this type of feature. We refer the interested reader to Medioni et al. (2000) for a complete description of the methodology.

3.1. Tensor Representation in 3D

In tensor voting, all points are represented as second order symmetric tensors. To express a tensor S, we take the associated quadratic form and diagonalize it, leading to a representation based on the eigenvalues λ1, λ2 and λ3 and the eigenvectors e1, e2 and e3. Therefore, we can write the tensor S as

$$S = \begin{bmatrix} e_1 & e_2 & e_3 \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} e_1^T \\ e_2^T \\ e_3^T \end{bmatrix}. \tag{43}$$

Thus, a symmetric tensor can be visualized as an ellipsoid where the eigenvectors correspond to the principal orthonormal directions of the ellipsoid and the eigenvalues encode the magnitude of each of the eigenvectors (see Fig. 3).

For the rest of this paper, we will use the convention that the eigenvectors have been arranged so that λ1 ≥ λ2 ≥ λ3. In this scheme, points are encoded as ball tensors (i.e., tensors with eigenvalues λ1 = λ2 = λ3 ≥ 1); curvels as plate tensors (i.e., tensors with λ1 = λ2 = 1 and λ3 = 0, tangent direction given by e3); and surfels as stick tensors (i.e., λ1 = 1, λ2 = λ3 = 0, normal direction given by e1).

Figure 3. Graphic representation of a second order 3D symmetric tensor (principal axes e1, e2, e3).

A ball tensor encodes complete uncertainty of direction, a plate tensor encodes uncertainty of direction in two axes but complete certainty in the other one, and a stick tensor encodes absolute certainty of direction. Tensors that lie between these three extremes encode differing degrees of direction certainty. The point-ness of any given tensor is represented by λ3, the curve-ness by λ2 − λ3 and the surface-ness by λ1 − λ2. Also, note that a second order tensor only encodes direction, but not orientation, i.e., two vectors v and −v will be encoded as the same second order tensor.
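The encoding of Eq. (43) and the three saliencies can be sketched with NumPy's symmetric eigensolver. The helper names `encode` and `saliencies` are hypothetical, and the unit eigenvalues follow the conventions stated above.

```python
import numpy as np

def encode(kind, direction=None):
    """Second-order symmetric tensor for an input token, following Eq. (43)."""
    if kind == "ball":                      # no direction: lambda1 = lambda2 = lambda3 = 1
        return np.eye(3)
    u = np.asarray(direction, float)
    u = u / np.linalg.norm(u)
    if kind == "stick":                     # surfel: lambda1 = 1 along the normal e1
        return np.outer(u, u)
    if kind == "plate":                     # curvel: lambda3 = 0 along the tangent e3
        return np.eye(3) - np.outer(u, u)
    raise ValueError(f"unknown token kind: {kind}")

def saliencies(S):
    """Return (surface-ness, curve-ness, point-ness) = (l1 - l2, l2 - l3, l3) and [e1 e2 e3]."""
    w, E = np.linalg.eigh(S)                # eigenvalues in ascending order
    l3, l2, l1 = w
    return (l1 - l2, l2 - l3, l3), E[:, ::-1]

for kind, direction in [("stick", [0, 0, 2]), ("plate", [1, 0, 0]), ("ball", None)]:
    sal, _ = saliencies(encode(kind, direction))
    print(kind, np.round(sal, 6))

# Orientation is not encoded: v and -v give the same tensor
print(np.allclose(encode("stick", [1, 2, 3]), encode("stick", [-1, -2, -3])))
```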

3.2. Voting Fields in 3D

We have just seen how the various types of input data are encoded in tensor voting; we will now describe how these tensors communicate with each other. The input usually consists of a set of sparse points. These points are encoded as ball tensors if no information is available about their direction (i.e., identity matrices with λ1 = λ2 = λ3 = 1). If only a tangent is available, the points are encoded as plate tensors (i.e., tensors with λ1 = λ2 = 1 and λ3 = 0). Finally, if information about the normal of the point is given, it is encoded as a stick tensor (i.e., a tensor with only one nonzero eigenvalue: λ1 = 1, λ2 = λ3 = 0). Then, each encoded input point, or token, communicates with its neighbors using either a ball voting field (if no orientation is present), a plate voting field (if local tangents are available) or a stick voting field (when the normal is available). The voting fields themselves consist of various types of tensors ranging from stick to ball tensors.

These voting fields have been derived from a 2D fundamental voting field that encodes the constraints of surface continuity and smoothness, among others. To see how the fundamental voting field was derived, suppose that we have a voter p with an associated normal $n_p$. At each votee site x surrounding the voter p, the direction of the fundamental field $n_x$ is determined by the normal of the osculating circle at x that passes through p and x and has normal $n_p$ at p (see Fig. 4).

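The saliency decay function of the fundamental field, DF(s, κ, σ), at each point depends on the arc length s = lθ / sin θ and the curvature κ = 2 sin θ / l between p and x. Here l is the distance between p and x, and θ is the angle between the tangent at p and the line joining p and x (see Fig. 4). The decay is given by the following Gaussian function:

$$DF(s, \kappa, \sigma) = e^{-\left(\frac{s^2 + c\,\kappa^2}{\sigma^2}\right)}, \tag{44}$$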

where σ is a scale factor that determines the overall rate of attenuation and c is a constant that controls the decay with high curvature. Note that the strength of the field becomes negligible beyond a certain distance. In this way, each voting field has an effective neighborhood associated with it, given by σ. The shape of the fundamental field can be seen in Fig. 5, where the direction and strength fields are displayed separately. The direction field shows, for each tensor surrounding the voter (center), the eigenvector with the largest associated eigenvalue. The strength field shows the value of the largest eigenvalue around the voter: white denotes a strong vote, and black denotes no intensity at all (zero eigenvalue).

Figure 4. The osculating circle and the corresponding normals of the voter and votee.

Figure 5. The fundamental voting field. (a) Shape of the direction field when the voter p is located at the origin and its normal is parallel to the y-axis (i.e. np = y). (b) Strength at each location for the previous case. White denotes high strength, black denotes no strength. (c) 3D display of the strength field, where the strength has been mapped to the z coordinate.
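The following sketch evaluates a single vote of the 2D fundamental field under our reading of Eq. (44) and Fig. 4. We take θ to be the signed angle between the voter's tangent and the segment from p to x. It is folded into [−π/2, π/2], because a tensor does not distinguish t from −t. The votee normal is the voter normal rotated by 2θ, which is the normal of the osculating circle at x. The function name and parameter values are ours. Some Tensor Voting implementations also restrict votes to a limited cone of θ; that restriction is omitted here.

```python
import numpy as np

def fundamental_vote(p, n_p, x, sigma, c):
    """2D stick vote (2x2 tensor) cast by a voter at p with normal n_p on the votee x."""
    p, x = np.asarray(p, float), np.asarray(x, float)
    n = np.asarray(n_p, float) / np.linalg.norm(n_p)
    t = np.array([n[1], -n[0]])                    # tangent at the voter
    v = x - p
    l = np.linalg.norm(v)
    if l == 0.0:
        return np.zeros((2, 2))                    # a voter casts no vote on itself
    # Signed angle between tangent and segment p->x, folded into [-pi/2, pi/2]
    theta = np.arctan2(v @ n, v @ t)
    if theta > np.pi / 2:
        theta -= np.pi
    elif theta < -np.pi / 2:
        theta += np.pi
    s = l if theta == 0.0 else l * theta / np.sin(theta)   # arc length
    kappa = 2.0 * np.sin(theta) / l                          # curvature
    strength = np.exp(-(s**2 + c * kappa**2) / sigma**2)     # Eq. (44)
    # Normal of the osculating circle at x: the voter normal rotated by 2*theta
    n_x = np.cos(2 * theta) * n - np.sin(2 * theta) * t
    return strength * np.outer(n_x, n_x)
```

As a check, take a voter at the origin with normal (0, 1). Its vote at (1, 1) points along ±(1, 0). This is expected: the circle that is tangent to the x-axis at the origin and passes through (1, 1) has its center at (0, 1).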

Finally, both orientation and strength are encoded as a stick tensor. In other words, each site around this voting field, or votee, is represented as a stick tensor with varying strength and direction.

Communication is performed by adding the tensor present at the votee and the tensor produced by the field at that site. To illustrate the voting process, imagine we are given an input point x and a voter located at the point p with associated normal np. The steps are:

1. The input point (votee) is first encoded as a ball tensor (λ1 = λ2 = λ3 = 1).
2. The vote generated by p on x is computed. This vote is, in turn, a stick tensor:
   - Eq. (44) gives the strength of the vote (λ1).
   - The direction of the vote (e1) is computed through the osculating circle between the voter and the votee, using the voter's associated normal np.
   - The other eigenvalues (λ2 and λ3) are set to zero, and the stick tensor is then computed using Eq. (43).
3. The resulting stick tensor vote is added, with ordinary matrix addition, to the encoded ball tensor at x.

Since the stick vote in general has only one nonzero eigenvalue, the addition produces a non-identity matrix with one eigenvalue larger than the others (λ1 > λ2 and λ1 > λ3). In other words, the ball tensor at x becomes an ellipsoid elongated in the direction given by the stick tensor. The larger the first eigenvalue of the voter is, the more pronounced this ellipsoid becomes.
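The voting step itself reduces to matrix addition. This sketch builds each stick vote as λ1e1e1ᵀ with λ2 = λ3 = 0. It adds the vote to a unit ball tensor at the votee and then inspects the eigensystem. The helper names are ours. The vote strengths and directions are arbitrary illustrative values, not outputs of Eq. (44).

```python
import numpy as np

def stick_from_eigen(strength, direction):
    """Stick tensor with lambda1 = strength along direction and lambda2 = lambda3 = 0."""
    e1 = np.asarray(direction, float)
    e1 = e1 / np.linalg.norm(e1)
    return strength * np.outer(e1, e1)

def receive_votes(votee_tensor, votes):
    """Add every (strength, direction) stick vote to the votee's tensor by ordinary matrix addition."""
    T = np.array(votee_tensor, dtype=float)
    for strength, direction in votes:
        T += stick_from_eigen(strength, direction)
    return T

# A ball tensor receives two roughly consistent votes and becomes an ellipsoid elongated near the z-axis.
T = receive_votes(np.eye(3), [(0.8, [0, 0, 1]), (0.5, [0, 0.1, 1])])
lam, E = np.linalg.eigh(T)            # ascending: lam[2] is lambda1 and E[:, 2] is e1
```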

To speed things up, however, these calculations are not done in practice. Instead, the set of votes surrounding the stick voting field is precomputed and stored using a discrete sampling of the space. When the voting is performed, these precomputed votes are simply aligned with the voter's associated normal np, and the actual vote is computed by linear interpolation.
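Here is a 2D sketch of this precompute-and-interpolate scheme, under our own assumptions. The fundamental field is sampled once on a regular grid in the voter's frame (normal along +y, as in Fig. 5). Each vote is then obtained in three steps:

1. Express the votee in the voter's frame.
2. Interpolate the four surrounding samples bilinearly.
3. Rotate the resulting tensor back to world coordinates.

The grid extent, step size and all names are illustrative choices, not values from the paper.

```python
import numpy as np

def local_vote(u, w, sigma, c):
    """Fundamental 2D vote at offset (u, w) from a voter at the origin whose normal is +y."""
    l = np.hypot(u, w)
    if l == 0.0:
        return np.zeros((2, 2))
    theta = np.arctan2(w, u)
    if theta > np.pi / 2:
        theta -= np.pi
    elif theta < -np.pi / 2:
        theta += np.pi
    s = l if theta == 0.0 else l * theta / np.sin(theta)
    kappa = 2.0 * np.sin(theta) / l
    n_x = np.array([-np.sin(2 * theta), np.cos(2 * theta)])
    return np.exp(-(s**2 + c * kappa**2) / sigma**2) * np.outer(n_x, n_x)

def precompute_field(radius, step, sigma, c):
    """Sample the field once on a regular grid covering [-radius, radius]^2 (table[i, j] is at (u_j, w_i))."""
    coords = np.arange(-radius, radius + step / 2, step)
    table = np.zeros((coords.size, coords.size, 2, 2))
    for i, w in enumerate(coords):
        for j, u in enumerate(coords):
            table[i, j] = local_vote(u, w, sigma, c)
    return coords, table

def lookup_vote(coords, table, p, n_p, x):
    """Align the precomputed field with the voter normal and bilinearly interpolate the vote at x."""
    n = np.asarray(n_p, float) / np.linalg.norm(n_p)
    R = np.column_stack([[n[1], -n[0]], n])        # columns: voter tangent, voter normal
    u, w = R.T @ (np.asarray(x, float) - np.asarray(p, float))
    step = coords[1] - coords[0]
    fu, fw = (u - coords[0]) / step, (w - coords[0]) / step
    j, i = int(np.floor(fu)), int(np.floor(fw))
    if not (0 <= i < coords.size - 1 and 0 <= j < coords.size - 1):
        return np.zeros((2, 2))                    # outside the precomputed neighbourhood
    a, b = fu - j, fw - i
    T_local = ((1 - a) * (1 - b) * table[i, j] + a * (1 - b) * table[i, j + 1]
               + (1 - a) * b * table[i + 1, j] + a * b * table[i + 1, j + 1])
    return R @ T_local @ R.T                       # rotate the vote back to world coordinates
```

A finer step trades memory for accuracy. The interpolated vote matches the directly computed one to within the bilinear interpolation error.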

Note that the stick voting field (or fundamental voting field) can be used to detect surfaces. In order to detect junctions, curves and other features, different voting fields must be employed. These other 3D voting fields can be generated by rotating the fundamental voting field about the x, y and z axes, depending on the type of field we wish to generate. For example, suppose the voter is located at the origin and its normal is parallel to the y-axis, as in Fig. 5. Rotating this field about the y-axis then generates the 3D stick voting field, as shown in Fig. 6.

More formally, let Rψφθ be the general rotation matrix, where ψ, φ and θ are the angles of rotation about the x, y and z axes, respectively. Let VS(x) be the tensor vote cast by a stick voting field in 3D at site x. Then VS(x) can be defined as

$$V_S(\mathbf{x}) = \int_0^{\pi} R_{\psi\phi\theta}\, V_f\!\left(R_{\psi\phi\theta}^{-1}\,\mathbf{x}\right) R_{\psi\phi\theta}^{T}\, d\phi \qquad (\psi = \theta = 0), \tag{45}$$

where Vf(x) stands for the vote cast by the fundamental voting field in 2D at site x.

Figure 6. The stick voting field in 3D. (a) The direction of the 3D stick voting field when the voter is located at the origin and its normal is parallel to the y-axis. Only the e1 eigenvectors are shown at several positions. (b) The strength of this field. White denotes high strength, black denotes no strength.


Figure 7. The ball voting field. (a) Direction field when the voter is located at the origin. (b) Strength of this field. White denotes high strength, black denotes no strength. (c) 3D display of the strength field with the strength mapped to the z-axis.

The stick voting field can be used when the normals of the points are available. However, when no orientation is provided, we must use the ball voting field. This field is produced by rotating the stick field about all the axes and integrating the contributions at each site surrounding the voter. For example, for the field depicted in Fig. 5, the 2D ball voting field is generated by rotating this field about the z-axis (as shown in Fig. 7). This 2D ball voting field is further rotated about the y-axis to generate the 3D ball voting field. In other words, the 2D ball voting field Vb(x) at site x can be defined as

$$V_b(\mathbf{x}) = \int_0^{\pi} R_{\psi\phi\theta}\, V_f\!\left(R_{\psi\phi\theta}^{-1}\,\mathbf{x}\right) R_{\psi\phi\theta}^{T}\, d\theta \qquad (\psi = \phi = 0), \tag{46}$$

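and the 3D ball voting field VB(x) can thus be further defined as

$$V_B(\mathbf{x}) = \int_0^{\pi} R_{\psi\phi\theta}\, V_b\!\left(R_{\psi\phi\theta}^{-1}\,\mathbf{x}\right) R_{\psi\phi\theta}^{T}\, d\phi \qquad (\psi = \theta = 0); \tag{47}$$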

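or, alternatively, as

$$V_B(\mathbf{x}) = \int_0^{\pi}\!\!\int_0^{\pi} R_{\psi\phi\theta}\, V_f\!\left(R_{\psi\phi\theta}^{-1}\,\mathbf{x}\right) R_{\psi\phi\theta}^{T}\, d\theta\, d\phi \qquad (\psi = 0). \tag{48}$$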

Note that rotating the stick voting field about all axes and adding up all vote contributions makes the shape of the votes in the ball voting field vary smoothly. They range from nearly stick tensors at the edge (λ1 = 1, λ2 = λ3 = 0) to ball tensors near the center of the voter (λ1 = λ2 = λ3 = 1). Thus, this field consists of ellipsoid-type tensors of varying shape. This is why the field is not simply a “radial stick tensor field”, with stick tensors pointing radially away from the center. However, the added complexity of rotating the stick tensor voting field to generate this field does not affect the implementation. As discussed previously, in practice this field is precomputed at discrete intervals and linearly interpolated when necessary.
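To make Eq. (46) concrete, the sketch below approximates the 2D ball field numerically. It rotates the fundamental field about the z-axis over [0, π) in discrete steps, uses Rᵀ for R⁻¹, and sums the rotated votes. The fundamental vote follows Eq. (44). The values of σ and c, the number of samples and the function names are our own choices. A practical implementation would precompute this field on a grid instead, as described above. By the reflection symmetry of the integral, the eigenvectors of the result are aligned with and perpendicular to the segment from the voter to x. The perpendicular one dominates, because straight continuations (θ = 0) receive the largest weight.

```python
import numpy as np

def fundamental_local(v, sigma, c):
    """Fundamental 2D vote at offset v from a voter at the origin with normal +y (cf. Eq. (44))."""
    l = np.hypot(v[0], v[1])
    if l == 0.0:
        return np.zeros((2, 2))
    theta = np.arctan2(v[1], v[0])
    if theta > np.pi / 2:
        theta -= np.pi
    elif theta < -np.pi / 2:
        theta += np.pi
    s = l if theta == 0.0 else l * theta / np.sin(theta)
    kappa = 2.0 * np.sin(theta) / l
    n_x = np.array([-np.sin(2 * theta), np.cos(2 * theta)])
    return np.exp(-(s**2 + c * kappa**2) / sigma**2) * np.outer(n_x, n_x)

def rot2(angle):
    """Rotation about the z-axis (the angle theta of Eq. (46)) restricted to the plane."""
    ca, sa = np.cos(angle), np.sin(angle)
    return np.array([[ca, -sa], [sa, ca]])

def ball_vote_2d(x, sigma, c, samples=360):
    """Riemann-sum approximation of Eq. (46): sum of R V_f(R^T x) R^T over rotations in [0, pi)."""
    x = np.asarray(x, float)
    d_angle = np.pi / samples
    V = np.zeros((2, 2))
    for k in range(samples):
        R = rot2(k * d_angle)
        V += R @ fundamental_local(R.T @ x, sigma, c) @ R.T * d_angle
    return V
```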

Finally, the ball voting field can be used to infer the preferred orientation (normal) at each input point if no such information is present to begin with. After voting with the ball voting field, the eigensystem is computed at each input point. The eigenvector with the greatest eigenvalue is taken as the preferred normal direction at that point. With the normals thus computed, a further stick voting step can reinforce the points that seem to lie on a surface. Surface detection is precisely what we need in order to solve the original 3D point registration problem. We now describe the process of surface detection used in our approach.

3.3. Detection of 3D Surfaces

Now that we have defined the tensors used to encode the information and the voting fields, we can describe the process used to detect the presence of a 3D surface in a set of points. We will limit our discussion of this process to the stages of Tensor Voting which are relevant to our work. We refer the interested reader to Medioni et al. (2000) for a full account of feature inference through Tensor Voting.

As described previously, the input to our algorithm is a 3D space populated by a set of putative correspondences between two point sets. To avoid confusion, we will refer to the points in the joint space simply as “tokens”. Each of these tokens is encoded as a unit ball tensor (i.e. with the identity matrix I3×3). Then we place a ball voting field at each input token and cast votes to all its neighbors. The strength of the ball voting field becomes negligible beyond a certain distance, given by the free parameter σ in Eq. (44). Hence we only need to cast votes to the tokens that lie within a small neighborhood of each input token. To cast a vote, we simply add the tensor present at the votee and the tensor produced by the ball voting field at that position. This process constitutes a sparse ball voting stage. Once this stage is finished, we can examine the eigensystem left at each token and extract the preferred normal at each site. The preferred normal direction is given by the eigenvector e1, and the saliency of this orientation is given by λ1 − λ2.
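Below is a compact sketch of the sparse ball voting stage. It deliberately swaps in a simpler ball vote than the precomputed field of Eqs. (47)–(48). We assume that a token casts a vote exp(−l²/σ²)(I − v̂v̂ᵀ) on a neighbor at offset v = l v̂. This vote favors normals perpendicular to the line joining the two tokens. The 3σ reach, the brute-force neighbor search and all names are our assumptions; at scale, an octree or k-d tree would replace the search.

```python
import numpy as np

def sparse_ball_voting(tokens, sigma):
    """Sparse ball voting over an (N, 3) array of tokens; returns preferred normals and lambda1 - lambda2.

    Stand-in for the precomputed 3D ball field (an assumption): a vote from p to x is
    exp(-l^2 / sigma^2) * (I - v v^T / l^2), with v = x - p and l = |v|.
    """
    tokens = np.asarray(tokens, float)
    N = len(tokens)
    T = np.tile(np.eye(3), (N, 1, 1))             # every token starts as a unit ball tensor
    reach = 3.0 * sigma                           # votes are negligible beyond a few sigma
    for i in range(N):                            # token i acts as the voter
        v = tokens - tokens[i]
        l2 = np.einsum('ij,ij->i', v, v)
        near = (l2 > 0) & (l2 < reach**2)         # only tokens in a small neighbourhood
        for j in np.nonzero(near)[0]:
            vhat = v[j] / np.sqrt(l2[j])
            T[j] += np.exp(-l2[j] / sigma**2) * (np.eye(3) - np.outer(vhat, vhat))
    lam, E = np.linalg.eigh(T)                    # batched, ascending eigenvalues
    return E[:, :, 2], lam[:, 2] - lam[:, 1]      # e1 and its saliency lambda1 - lambda2
```

For tokens sampled on the plane z = 0, the extracted normals come out along ±z. This is the input the subsequent stick voting stage needs.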


After this step, each token has an associated normal. The next step consists of using the 3D stick voting field to cast votes to the neighbors so that the normals are reinforced. In order to cast a stick vote, the 3D stick voting field is first placed on the voter and oriented to match its normal. Once again, the extent of this field is limited by the parameter σ, so we only need to cast votes to the tokens that lie within a small neighborhood of the voter. After the votes have been cast, the eigensystem at each token is computed to obtain the new normal orientation and strength at each site. This process constitutes a sparse stick voting stage.
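The sparse stick voting stage can be sketched in the same style. We assume that a 3D stick vote lies in the plane spanned by the voter normal n and the offset v, so it reduces to the 2D rule of Eq. (44). In that plane, θ is the angle between v and the voter's tangent plane. The votee normal is n rotated by 2θ towards −t̂, where t̂ is the projection of v onto the tangent plane. Votes are scaled by a per-token weight, for example the saliency from the previous stage. The names, the 3σ cutoff and the brute-force loops are illustrative only.

```python
import numpy as np

def stick_vote_3d(p, n_p, x, sigma, c):
    """3D stick vote cast by a voter at p with normal n_p on the votee x (decay as in Eq. (44))."""
    n = np.asarray(n_p, float) / np.linalg.norm(n_p)
    v = np.asarray(x, float) - np.asarray(p, float)
    l = np.linalg.norm(v)
    if l == 0.0:
        return np.zeros((3, 3))
    vhat = v / l
    sin_t = np.clip(vhat @ n, -1.0, 1.0)
    theta = np.arcsin(sin_t)                       # angle between p->x and the tangent plane
    s = l if theta == 0.0 else l * theta / sin_t   # arc length
    kappa = 2.0 * sin_t / l                        # curvature
    strength = np.exp(-(s**2 + c * kappa**2) / sigma**2)
    t = vhat - sin_t * n                           # in-plane direction towards the votee
    t_norm = np.linalg.norm(t)
    t = t / t_norm if t_norm > 1e-12 else np.zeros(3)
    n_x = np.cos(2 * theta) * n - np.sin(2 * theta) * t
    return strength * np.outer(n_x, n_x)

def sparse_stick_voting(tokens, normals, weights, sigma, c):
    """Every token votes with its normal (scaled by its weight) on neighbours within 3*sigma."""
    N = len(tokens)
    T = np.zeros((N, 3, 3))
    for i in range(N):
        for j in range(N):
            if i != j and np.linalg.norm(tokens[j] - tokens[i]) < 3.0 * sigma:
                T[j] += weights[i] * stick_vote_3d(tokens[i], normals[i], tokens[j], sigma, c)
    lam, E = np.linalg.eigh(T)
    return E[:, :, 2], lam[:, 2] - lam[:, 1]       # new normals e1 and saliency lambda1 - lambda2
```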

In ordinary Tensor Voting, the eigensystem at each token is used to compute different saliency maps: point-ness λ3, curve-ness λ2 − λ3 and surface-ness λ1 − λ2. Then, the derivative of these saliency maps is computed and non-maximal suppression is used to locate the most salient features. After this step, the surfaces are polygonized using a marching cubes algorithm (or similar). However, our objective in this case was not the extraction of polygonized surfaces, but simply the location of the most salient surface. Hence, a simple thresholding technique was used instead. The token with the greatest saliency is located, and the threshold is set to a small percentage of this saliency. Thus, for example, tokens with a small λ1 relative to this token are discarded. In a similar fashion, tokens with a small surface-ness (λ1 − λ2) with respect to this token are also deleted.

After the sparse stick voting is performed, only the tokens that seem to belong to surfaces (i.e. λ1 is not small and λ1 − λ2 is high) cast votes to their neighbors, to further reinforce the surface-ness of the tokens. Input tokens that do not belong to surfaces are discarded (set to 03×3). This process is repeated a fixed number of times with increasing values of σ in order to make the surface(s) grow. In this way, there is high confidence that the tokens which have not been discarded after the repeated application of the sparse stick voting stage belong to a surface.
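The thresholding and growth loop of the two paragraphs above might be organized as follows. This sketch rests on three stated assumptions. First, the most salient token is taken to be the one with the largest λ1 − λ2. Second, the threshold `fraction` is a free parameter. Third, `vote_pass` is a hypothetical callback that stands in for one sparse stick voting pass over the active tokens.

```python
import numpy as np

def salient_mask(lam1, surfaceness, fraction=0.1):
    """Keep tokens whose lambda1 and lambda1 - lambda2 are not small relative to the most salient token."""
    best = np.argmax(surfaceness)                  # token with the greatest saliency
    return (lam1 >= fraction * lam1[best]) & (surfaceness >= fraction * surfaceness[best])

def grow_surface(tokens, vote_pass, sigmas, fraction=0.1):
    """Repeat sparse stick voting among the surviving tokens with an increasing sigma.

    vote_pass(tokens, active, sigma) -> (lam1, surfaceness) is a hypothetical callback;
    discarded tokens contribute nothing (they are treated as zero tensors).
    """
    active = np.ones(len(tokens), dtype=bool)
    for sigma in sigmas:                           # e.g. an increasing schedule
        lam1, surf = vote_pass(tokens, active, sigma)
        lam1 = np.where(active, lam1, 0.0)
        surf = np.where(active, surf, 0.0)
        active &= salient_mask(lam1, surf, fraction)
    return active
```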

3.4. Estimation of 3D Correspondences

Given two sets of 3D points X1 and X2, we are expected to find the correspondences between these two sets. We assume that a rigid transformation has taken place and that each set contains an unspecified number of outliers. No other information is given.

In the absence of better information, we populate the joint space (x′ − x), (y′ − y), (z′ − z) by matching all points from the first set with all the points from the second set. Note that this allows us to detect any motion regardless of its magnitude, but the amount of outliers present in the joint space is multiplied 100-fold by this matching scheme.
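Populating the joint space amounts to an all-pairs difference. In this sketch (the names are ours), each token k also records the pair (i, j) that produced it. Detected tokens can then be turned back into correspondences. With n points in each set and no outliers, only n of the n² tokens are correct matches.

```python
import numpy as np

def populate_joint_space(X1, X2):
    """Match every point of X1 with every point of X2; each pair becomes a token (x'-x, y'-y, z'-z)."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    tokens = (X2[None, :, :] - X1[:, None, :]).reshape(-1, 3)   # token k = X2[j] - X1[i], k = i*n2 + j
    i, j = np.meshgrid(np.arange(len(X1)), np.arange(len(X2)), indexing='ij')
    pairs = np.column_stack([i.ravel(), j.ravel()])             # pairs[k] = (i, j)
    return tokens, pairs
```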

The tokens in the joint space thus populated are then processed with Tensor Voting in order to detect the most salient plane, as described in the previous section. The plane thus detected is further tested against the constraint given by Eq. (41). This constraint requires the specification of two different tokens. In practice, we use the token on the plane with the highest saliency and test it against the rest of the points on the plane. If any pair of tokens does not satisfy the constraint, we remove it from the plane. If not enough points remain after this pruning is completed, we reject the plane. Remember that the enforcement of this constraint also avoids the presence of false or multiple matches, so the output is a one-to-one set of correspondences.

As can be easily noted, Eq. (30) collapses for the case of pure translation. However, in this case, all points that belong to a rigid transformation will tend to cluster together at a single location in the joint space. This cluster is easy to detect after the sparse ball voting stage, because these tokens will have a large absolute saliency value at this stage. If such a cluster is found, we stop the algorithm and produce the correspondences based on the tokens that were found clustered together.
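The pure-translation shortcut can be illustrated with a simple stand-in. The paper reads off a large absolute saliency after sparse ball voting. The sketch below instead counts how many tokens lie within a small radius of each token. `radius` and `min_support` are hypothetical parameters, and the dense distance matrix is only suitable for small inputs.

```python
import numpy as np

def detect_translation_cluster(tokens, radius, min_support):
    """Return (translation, member indices) if enough tokens coincide, else None.

    Under a pure translation every correct pair yields (nearly) the same token,
    so a dense cluster in the joint space reveals the motion.
    """
    tokens = np.asarray(tokens, float)
    d2 = ((tokens[:, None, :] - tokens[None, :, :]) ** 2).sum(-1)   # O(n^2) memory
    support = (d2 <= radius**2).sum(1)
    best = int(np.argmax(support))
    if support[best] < min_support:
        return None
    members = np.nonzero(d2[best] <= radius**2)[0]
    return tokens[members].mean(0), members
```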

Following this simple procedure, we can detect any rigid transformation. Note, however, that with the simple matching scheme mentioned earlier, the amount of outliers in the joint space is multiplied 100-fold. When the amount of outliers in the real space is large (for example, on the order of 90%), this can complicate the detection of surfaces in the joint space. In such situations we adopt a scheme in which several rotation angles and axes are tested systematically, to make the detection process simpler. The procedure runs as follows:

- The set X1 is first rotated according to the current axis and angle to be tested, and the joint space specified by Eq. (30) is populated again.
- We then run the detection process using Tensor Voting.
- If a plane with enough support is found, we stop the algorithm and output the result.
- Otherwise, the next angle and rotation axis are tested, until a solution is found or all the possibilities have been tested.

The whole algorithm is sketched in Algorithm 1. Finally, note that this variation of the algorithm is only needed when large amounts of outliers are present in the input, as stated previously.
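The systematic search over rotation hypotheses needs a way to enumerate axes and angles and to rotate X1. This sketch uses Rodrigues' formula and spreads candidate axes over the upper hemisphere with a Fibonacci spiral. The spiral, the 30° angle step and the number of axes are our choices, not the authors' sampling. An axis and its opposite are never both enumerated, so the angles cover the full [0°, 360°) range.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    a = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def candidate_rotations(angle_step_deg=30, n_axes=20):
    """Yield (axis, angle) hypotheses, starting from alpha = 0 and A = [0, 0, 1]."""
    yield np.array([0.0, 0.0, 1.0]), 0.0
    golden = np.pi * (3 - np.sqrt(5))              # Fibonacci-spiral increment
    for k in range(n_axes):
        z = 1 - (k + 0.5) / n_axes                 # z in (0, 1): upper hemisphere only
        r = np.sqrt(1 - z * z)
        axis = np.array([r * np.cos(golden * k), r * np.sin(golden * k), z])
        for deg in np.arange(angle_step_deg, 360, angle_step_deg):
            yield axis, np.deg2rad(deg)
```

Each hypothesis would be applied as `X1 @ rotation_matrix(axis, angle).T` before the joint space is populated again.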

Finally, we recognize that the scheme proposed here is far from perfect. The exhaustive search of all angles and rotation axes can be quite time-consuming, and appears to be a little simplistic. Unfortunately, the density of the plane we are seeking varies with the angle of the rotation applied to the set of points. This density is minimum when the rotation is 180°, where the plane spans the full voting space. It becomes infinite under pure translation, where all the points of the plane cluster at a single location in space. Hence, there does not seem to be any heuristic or constraint we can apply to prune the search. An alternative is to use the other three search spaces described in Eqs. (27)–(29) and perform Tensor Voting to detect these planes as well, to help improve the search method. This is a matter for future research.

1. Initialize the rotation angle α = 0° and axis A = [0, 0, 1]ᵀ.
2. Rotate the set X1 according to α and A. Populate the voting space with tokens generated from the candidate correspondences.
3. Initialize all tokens to ball tensors.
4. Perform sparse ball voting and extract the preferred normals.
5. Check for the presence of a set of tokens clustered about a single point in space. If this cluster is found, finish and output the corresponding translation detected.
6. Perform sparse stick voting using the preferred normals. Optionally, repeat this step a fixed number of times to eliminate outliers. After each iteration, increase the reach of the votes slightly, so as to make the plane grow.
7. Obtain the equation of the plane described by the tokens with highest saliency. Enforce the constraint of Eq. (41) and delete the tokens which do not satisfy it.
8. If a satisfactory plane is found, output the correspondences. Otherwise, increment α and A, and repeat steps 2–7 until all angles and axes of rotation have been tested.

Algorithm 1. General algorithm for the detection of correspondences between two 3D point sets under a single rigid transformation.

However, this disadvantage only matters if the magnitude of the transformation is unbounded. Algorithms like ICP require the transformation to be relatively small. If we impose the same limitation on our method, we do not need this exhaustive search, and our method works without an iterative scheme. Finally, as shown in the next section, our algorithm has a worst-case complexity of O(n²), where n is the number of tokens in the voting space. This worst case never occurs in practice, because it would require each token to cast a vote to every other token.

3.5. Analysis of the Algorithm

The complexity analysis is rather straightforward. Consider the first voting step, where sparse ball voting is performed. Let n be the number of tokens in the voting space, and suppose that, on average, each of these tokens reaches m neighbors with its voting field. The average number of votes cast is then nm, so the complexity of this step is O(nm). All subsequent voting is of the same order. Hence, the overall complexity of our algorithm is O(nm) on average. In the worst case, where every token casts a vote to every other token, the complexity increases to O(n²).

Now consider the simple matching scheme mentioned earlier, in which each 3D point in the first set X1 is matched against each point in the second set X2. The number of tokens in the joint space then becomes n = n1n2, where n1 = |X1| and n2 = |X2|. The average complexity thus becomes O(mn1n2), and the worst-case complexity becomes O((n1n2)²).
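To give a sense of scale, take n1 = n2 = 100 points and an average of m = 50 tokens within reach of each voter. These numbers are illustrative and not taken from the experiments:

$$n = n_1 n_2 = 10^4, \qquad n\,m = 5 \times 10^5 \ \text{votes per pass}, \qquad n^2 = (n_1 n_2)^2 = 10^8 \ \text{votes in the worst case},$$

a gap of more than two orders of magnitude between the average and the worst case.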

In its most general implementation (hypothesize-and-test runs, iterations to grow the plane slowly), the algorithm is rather slow. It may take from several minutes to hours to compute a 3D motion with many outliers. However, bear in mind that this is an extreme scenario. Most applications assume that the motion between both data sets is small and, more importantly, that there are few outliers. If we apply these same constraints to our algorithm, we can produce an optimized version that yields the desired results quickly enough for use in real-time systems. This is possible mainly because no iterations are needed at all in these cases, and the problem is solved in two or three tensor voting passes.

4. Testing our Method

Our method for 3D correspondences was first validated with several experiments on synthetic data, followed by some experiments with real data. Since the core of the problem is the detection of a 3D plane among a set of points, we considered using the Hough Transform for this. However, the Hough Transform did not prove useful in a large portion of the cases. False matches give the voting space 100 times more points than actual inliers. This is, of course, the best case, assuming no outliers in the original data sets. Moreover, the actual plane only covers a small region of the space. Hence, the global maximum in the Hough space almost always corresponds to a random plane generated by the vast majority of the tokens present. Tensor Voting, in contrast, is based on the density of these tokens, which depends not only on the number of points present but also on their relative positions. An illustration of this problem appears in Fig. 8. In this figure, the left image shows the joint space. The plane detected by Hough is highlighted in Fig. 8(b), whereas the actual plane is shown in Fig. 8(c).

Figure 8. (a) The joint space (x′ − x), (y′ − y), (z′ − z). (b) The incorrect plane detected by the 3D Hough Transform. (c) The actual plane we are seeking (correctly detected with our algorithm).

Of course, it can be argued that increasing the resolution of the Hough space would help in the detection of the correct plane. However, due to the large number of points that would have to be tested, the computation time becomes prohibitive. On the other hand, Tensor Voting can be optimized with octrees so that only a small number of tokens receive votes at each stage. Hence, we only compared our method against the Iterated Closest Points algorithm (ICP). The classic ICP can be found in Besl and McKay (1992), though we actually used Fitzgibbon's implementation (F-ICP) as described in Fitzgibbon (2003).

For the case of F-ICP, the tests were conducted in the following fashion. A set of points was generated inside a 10 × 10 × 10 cube (in arbitrary units) centered on the origin. A known rigid transformation was then applied to this set. The bounds of these points were computed, and the same number of random outliers was added to both sets. The outliers were distributed uniformly within the bounds set by the original sets.
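The sketch below shows how such a synthetic trial could be generated. It also implements the 50% success criterion stated below for these experiments. Several details are our assumptions:

- the outlier percentage is taken as outliers over the total size of each set;
- the outlier bounds are the joint bounding box of both clean sets;
- the rotation axis is drawn uniformly from the sphere;
- inliers keep the same index in both sets (the registration algorithm itself never uses this ordering).

```python
import numpy as np

def make_trial(n_points, outlier_ratio, max_angle_deg, max_shift, rng):
    """Inliers in a 10x10x10 cube centred on the origin, a random rigid motion, and uniform outliers."""
    X1 = rng.uniform(-5.0, 5.0, (n_points, 3))
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)                   # uniformly distributed direction
    angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    K = np.array([[0, -axis[2], axis[1]], [axis[2], 0, -axis[0]], [-axis[1], axis[0], 0]])
    R = np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)   # Rodrigues' formula
    t = rng.uniform(-max_shift, max_shift, 3)
    X2 = X1 @ R.T + t
    # Same number of outliers in both sets, uniform within the bounds of the clean sets
    n_out = int(round(n_points * outlier_ratio / (1.0 - outlier_ratio)))
    lo = np.minimum(X1.min(0), X2.min(0))
    hi = np.maximum(X1.max(0), X2.max(0))
    X1 = np.vstack([X1, rng.uniform(lo, hi, (n_out, 3))])
    X2 = np.vstack([X2, rng.uniform(lo, hi, (n_out, 3))])
    truth = {i: i for i in range(n_points)}        # inlier i of X1 corresponds to inlier i of X2
    return X1, X2, truth

def is_success(found, truth):
    """A trial counts as a success if at least 50% of the true correspondences are recovered."""
    hits = sum(1 for i, j in found.items() if truth.get(i) == j)
    return hits >= 0.5 * len(truth)
```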

It is well known that the reliability of F-ICP depends largely on the similarity between the two input sets: the closer the transformation is to the identity, the better it performs. Since our algorithm effectively rotates one set of points until it finds a solution, it can be argued that it, too, depends on the quality of the initialization. Therefore, to keep things fair, we restricted the tests of the F-ICP algorithm to rotations between −5° and 5° and translations [Tx, Ty, Tz] within ±1.5 on all axes. The angles, axes of rotation and translations were generated randomly following a uniform distribution. The axis of rotation was largely immaterial and therefore covered the whole hemisphere of possible orientations. An experiment is deemed a success if at least 50% of the correspondences are correctly identified. The results of these tests are presented in Table 1.

Table 1. Success rates for the F-ICP algorithm. The range of each component of the translation vector is shown in the column labelled Tx, Ty, Tz.

Rotation angle   Tx, Ty, Tz    Outliers   Success rate   Trials
−5..5°           −1.5..1.5     50%        90%            20
−5..5°           −1.5..1.5     66%        80%            20
−5..5°           −1.5..1.5     75%        60%            20
−5..5°           −1.5..1.5     80%        67%            20
−5..5°           −1.5..1.5     83%        50%            20

Our algorithm was tested in a similar fashion. However, in this case, the full range of possible rotation angles was employed, and the translation range was increased to lie between −10 and 10 units. The results of these experiments can be seen in Table 2. Note that, unlike the 2D case, we did not include a density ratio column in this table. This is because the actual density of the plane varies with the rotation between both sets of points. The density of the plane reaches a minimum of 0.05 points per cubic volume unit at 180° and a maximum of ∞ at 0° (all the tokens become clustered at the same point).

As can be seen from these tables, our algorithm still performs reliably at 75% outliers, whereas F-ICP only succeeds about 60% of the time at this mark. Also, the performance of our algorithm generally deteriorates at a much slower pace than that of F-ICP. This shows that, just as in the 2D case, F-ICP does not withstand as many outliers as our algorithm.

Table 2. Success rates for the estimation of correspondences for 3D rigid motion using Tensor Voting. The range of each component of the translation vector is shown in the column labelled Tx, Ty, Tz.

Rotation angle   Tx, Ty, Tz   Outliers   TV success   Trials
0..360°          −10..10      75%        100%         20
0..360°          −10..10      83%        90%          20
0..360°          −10..10      85%        85%          20
0..360°          −10..10      88%        80%          20
0..360°          −10..10      90%        80%          20
0..360°          −10..10      93%        70%          20

4.1. Experiments with Real Data

When dealing with real data, the input points usually have some amount of noise. This noise in turn affects the shape and thickness of the plane that has to be detected in the joint space. Thus, instead of producing an ideal plane in the joint space, points with noise yield a “fuzzy” plane that has small variations over its surface. Even so, Tensor Voting can be used successfully to detect these “fuzzy” surfaces. Also, the constraint given by Eq. (41) has to be relaxed, so that true correspondences perturbed by noise are not rejected. This relaxation is accomplished by checking that the equation yields a small absolute value, instead of zero.

We performed the following experiments. First, we followed the position of a robotic arm in 3D in a sequence of stereo pairs (Fig. 9 shows only the left images of the sequence). In this case, the problem of 3D reconstruction is considered already solved, and the input to our algorithm is the set of 3D points of this reconstruction. In practice, ordinary camera calibration and stereo matching (through cross-correlation) were performed to obtain the 3D reconstruction.

The model of the object was picked by hand from the first reconstruction. We then computed the position of the arm in the subsequent reconstructions using an optimized version of our algorithm. This version only consisted of two stages, sparse ball voting and sparse stick voting, with no iterations. Note that the sequence of images does not form a video. Hence, the features cannot be tracked between successive frames, due to the relatively large differences between the snapshots. After the motion was computed, the position of the arm in 3D was reprojected on the images (drawn in white), as shown in Fig. 9.

In a second experiment, we made a reconstruction of a styrofoam model of a head using a stereo camera. The two reconstructions are shown in Figs. 10(a)–(c). The aligned sets can be seen in Figs. 10(d)–(f). In this case, however, another optimization was used. Since the sets are close to each other and the points provide enough structure, we used Tensor Voting to compute the preferred normals at each site.

The computation of the normals proceeds as in standard sparse Tensor Voting. First, we initialized each point to a ball tensor. Then, we placed a ball voting field on each point and cast votes to all the neighbors. The preferred normal at each site is then obtained by computing the eigensystem at each point and selecting the eigenvector with the greatest eigenvalue. A closeup of the surface of the model and some of the normals found by this method is shown in Fig. 10(g). We used this information to restrict the candidate matches to those that shared a relatively similar orientation.
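The pruning by orientation can be sketched as a filter on candidate pairs. Tensor Voting normals carry direction but not orientation, so the sketch compares |ni · nj| against a threshold. The 30° default is our assumption. The filter only makes sense when, as here, the two sets are already close, so that corresponding normals are nearly parallel.

```python
import numpy as np

def prune_by_normals(N1, N2, max_angle_deg=30.0):
    """Keep candidate pairs (i, j) whose normals differ by at most max_angle_deg, ignoring sign."""
    N1 = np.asarray(N1, float) / np.linalg.norm(N1, axis=1, keepdims=True)
    N2 = np.asarray(N2, float) / np.linalg.norm(N2, axis=1, keepdims=True)
    cos_sim = np.abs(N1 @ N2.T)                    # (n1, n2) matrix of |n_i . n_j|
    i, j = np.nonzero(cos_sim >= np.cos(np.deg2rad(max_angle_deg)))
    return np.column_stack([i, j])
```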


Figure 9. Sequence of (left) images from a stereo camera showing the position of the reprojected arm (in white lines). This is not a video.

Figure 10. (a)–(c) Sets to be aligned. (d)–(f) Sets after alignment.

Note also that there are non-rigid differences between the two sets in this case. This is visible in places like the chin, which could not be aligned because the size of this part of the reconstruction differs slightly between the two sets. The overall height of the head is slightly larger in the second set. Hence, it is impossible to match all the points at the same time. Even so, our algorithm yields a reasonable solution. In practice, two different surfaces are formed in the joint space. One corresponds to a rigid transformation that matches the forehead and the nose. The other corresponds to the transformation that aligns the chins of both models. We have chosen to display the first solution, where the upper part of the head is correctly aligned; this solution also corresponds to the largest surface in the joint space. The main point of this experiment is that our algorithm still works even when the input cannot be perfectly aligned with a single rigid transformation.

Finally, in our last experiment, we aligned a model of a Toyota car, acquired with a laser range scanner, with a noisy reconstruction obtained with a stereo camera. The noisy target is shown in Fig. 11(a), the model and the target are shown in Fig. 11(b), and the final alignment is shown in Fig. 11(c). The procedure is the same as in the previous case. Again, since the data sets provided structure, we used it to our advantage. We computed the preferred normals using Tensor Voting and pruned the candidate matches as described previously (Fig. 11(d)).


Figure 11. (a) Target for alignment; note the noisy surface. (b) Model displayed over the target. (c) Model and data after alignment. (d) Closeup of the surface of the model showing some of the normals computed with Tensor Voting.

5. Extensions to Multiple Overlapping Motions and Non-Rigid Motion

Another advantage of our method over ICP and similar methods is its ability to detect multiple overlapping motions simultaneously. This also holds in the 3D case, where each different motion simply produces another plane in the voting space. There are limitations to the motions that can be differentiated, though. A quick analysis of Eq. (30) reveals that two motions sharing the same axis of rotation and the same overall translation will span the same 3D plane in the voting space. In these circumstances, however, it suffices to analyze the other three voting spaces (Eqs. (27)–(29)) to disambiguate this case.

To illustrate this, we present a synthetic example with three overlapping motions, each with a different axis of rotation, angle and translation. They were generated in a 10 × 10 × 10 cube centered at the origin (see Fig. 12). Our algorithm is applied as described in Algorithm 1. However, after the first plane was detected, we removed its tokens from the voting space, and the process was repeated until no more planes were found. This is, of course, the naive implementation of the solution. The algorithm can instead be modified to account for the presence of multiple planes. In that case, only the final stage, where the constraint from Eq. (41) is enforced, would be executed separately for each set of points.

5.1. Extension to Non-Rigid Motion

While it can still be argued that, with some work, the Hough Transform might also be used to detect the same plane we obtain through Tensor Voting, there is another advantage to using the latter over the former: Tensor Voting enables us to find general surfaces. This means that we can also detect certain non-rigid motions that produce non-planar surfaces in the voting spaces.

To illustrate this, we generated a synthetic plane and then applied a twist transformation to it (see Fig. 13(a)). This transformation produces a curved surface in the voting space, clearly visible in the center of Figs. 13(b) and (c); a closeup of the surface is also presented in Fig. 14. The surface is easily detected using Tensor Voting. The resulting correspondences can be seen from two different viewpoints in Figs. 13(d) and (e).

Figure 12. (a) Three overlapping rigid motions in 3D. (b)–(d) The different motions as detected by our algorithm.

Figure 13. (a) Non-rigid motion applied to a 3D plane. (b) and (c) The curved surface that was generated in the voting space, seen from two different viewpoints. (d) and (e) The resulting correspondences found with our algorithm, seen from two different viewpoints.

Figure 14. A closeup of the surface corresponding to an elastic motion. The constraints of Eq. (41) are only verified locally between the closest point pairs. In this figure, token x_i is verified with its closest neighbor, x_j. Other pairs to be verified are also highlighted in the figure.

Figure 15. Solving the correspondence problem in catadioptric images via a spherical mapping. White circles were successfully registered into the black and white circles.

In order to detect the curved surface produced by the twist, we had to modify our algorithm as follows. The first two stages (sparse ball voting and sparse stick voting) are performed as usual. However, in the last stage, Eq. (41) is not enforced globally, but only locally around each active token. In other words, we enforce the presence of rigid transformations only at a local level. Recall that Eq. (41) depends on two points; therefore, each token being verified is paired with its closest active neighbor. We illustrate this in Fig. 14, where the token x_i is being verified using its closest neighbor, x_j. The normals of the tokens are also shown.
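The local verification step can be sketched as follows. Eq. (41) is not reproduced here; as a stand-in, the sketch uses the distance-preservation property of rigid motions (two source points and their two targets must be equally far apart) as the two-point test. It assumes that each active token carries its position in the voting space together with the source and target points of the correspondence it represents, and that the input arrays contain active tokens only. The function name and the tolerance tol are hypothetical.

```python
import numpy as np

def local_rigidity_mask(tokens, src, dst, tol=1e-3):
    """Check each token against its closest neighboring token only.

    tokens   : (n, 3) token positions in the voting space (n >= 2, active tokens only)
    src, dst : (n, 3) source and target points of the correspondence behind each token
    Returns a boolean mask of tokens passing the local test and the neighbor indices.
    """
    tokens, src, dst = (np.asarray(a, dtype=float) for a in (tokens, src, dst))
    # Pairwise token distances in the voting space; exclude each token itself.
    d = np.linalg.norm(tokens[:, None, :] - tokens[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nearest = np.argmin(d, axis=1)
    # Stand-in two-point test (not Eq. (41)): a rigid motion preserves distances.
    d_src = np.linalg.norm(src - src[nearest], axis=1)
    d_dst = np.linalg.norm(dst - dst[nearest], axis=1)
    return np.abs(d_src - d_dst) <= tol, nearest
```

Because each token is only compared with its nearest neighbor, a globally non-rigid but locally near-rigid deformation such as a twist can pass this test, whereas a global check over all token pairs would reject it.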

A simpler version of this algorithm was also used to solve the correspondence problem in catadioptric images. In this case, the 2D images are mapped to a 3D sphere. In this space, the corresponding corners in the 2D images form a curved 3D surface, which is easily detected using Tensor Voting. The resulting correspondences can be seen in Fig. 15.
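The spherical mapping itself is not detailed here. A common choice, assuming a calibrated parabolic catadioptric camera and normalized image coordinates, is the inverse stereographic projection, which lifts each image point onto the unit sphere. The sketch below uses the north pole (0, 0, 1) as the projection center; the function name and this convention are assumptions.

```python
import numpy as np

def lift_to_sphere(uv):
    """Inverse stereographic projection of normalized image points onto the unit sphere.

    The projection center is assumed to be the north pole (0, 0, 1), so the image
    point (u, v) is the intersection of the line through the pole and the sphere
    point with the plane z = 0.
    """
    uv = np.asarray(uv, dtype=float)
    u, v = uv[:, 0], uv[:, 1]
    r2 = u ** 2 + v ** 2
    denom = 1.0 + r2
    return np.column_stack([2.0 * u / denom, 2.0 * v / denom, (r2 - 1.0) / denom])
```

The image origin maps to the south pole and the unit circle maps to the equator; the lifted corner points are then the 3D tokens on which Tensor Voting operates.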

6. Conclusions

We have presented a novel non-iterative algorithm that combines the expressive power of Geometric Algebra with the robustness of Tensor Voting to find the correspondences between two sets of 3D points with an underlying rigid transformation. This algorithm was also shown to work with large amounts of outliers in both sets.

We have also used Geometric Algebra to derive a set of constraints (Eq. (41)) that serves a double purpose: on the one hand, it lets us decide whether or not the current plane corresponds to a rigid motion; on the other hand, it allows us to reject multiple matches and enforce the uniqueness constraint.
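In our algorithm it is Eq. (41) itself that rejects multiple matches. The sketch below does not reproduce that constraint; it only illustrates the uniqueness requirement being enforced, using a generic greedy filter that ranks candidate matches by a score (for instance a saliency value) and keeps each source and target point at most once. The scoring and the function name are assumptions.

```python
def enforce_uniqueness(candidates):
    """Greedily keep a one-to-one subset of candidate matches.

    candidates : iterable of (score, i, j), meaning source point i matches target j.
    Higher scores are considered first; a later candidate reusing i or j is dropped.
    """
    used_src, used_dst, kept = set(), set(), []
    for score, i, j in sorted(candidates, key=lambda c: c[0], reverse=True):
        if i in used_src or j in used_dst:
            continue  # accepting it would create a multiple match
        used_src.add(i)
        used_dst.add(j)
        kept.append((i, j))
    return kept

# Usage: source point 0 has two candidate targets; only the higher-scoring one survives.
matches = enforce_uniqueness([(0.9, 0, 0), (0.8, 0, 1), (0.7, 1, 1), (0.6, 1, 0)])
```

A greedy filter is not guaranteed to find the best overall assignment; it is shown only because it makes the one-to-one requirement explicit in a few lines.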

Our algorithm does not require an initialization (though it can benefit from one), works equally well for large and small motions, and can be easily extended to account for multiple overlapping motions and even certain non-rigid transformations.

It must be noted that our algorithm can detect multiple overlapping motions, whereas the current solutions only work for one global motion. We have also shown that our algorithm can work with data sets that present small non-rigid deformations. During our synthetic experiments we also noticed that some types of non-rigid motion produce 3D curves instead of surfaces in the voting space. The next logical step is to explore what other types of motion can be detected, and to extend these results to the detection of multiple non-rigid motions.

In the unconstrained case, with a large fraction of outliers (83–90%), our algorithm can take several minutes to finish. However, in most real-life applications such extreme circumstances are not found, and a good initialization can be computed. When these conditions are met, our algorithm can be rather fast.

It is worth noting the effect of the “sampling rate” of the input points on our algorithm. Our method is based on an equation for point motion and assumes that a direct point-to-point correspondence can be established. If the input data sets have been sampled at different rates, this condition will not be met and the performance of our algorithm will degrade, possibly to the point of failing altogether. We are currently working on ways to overcome this problem.

Another aspect of our 3D algorithm is that we must rotate one of the sets of points in order to densify the plane we are looking for in the voting space. We are currently exploring other ways to make this more efficient. One possible solution might come from considering all four voting spaces simultaneously. So far, only the space produced by the trivector part has been considered, and the other spaces were used only implicitly, to provide extra constraints for identifying valid rigid motions. These other spaces might also be used to direct the search for the transformation we are seeking, but this is a subject for future research.

However, in spite of this apparent disadvantage, our algorithm works even without initialization, unlike other algorithms such as ICP. Furthermore, our algorithm can be used to initialize subsequent refinement stages with ICP, thus solving the problem of obtaining a good initialization for that algorithm.
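Once correspondences are available, the rigid motion can be computed in closed form and, if desired, refined with ICP (Besl and McKay, 1992). The following minimal sketch assumes the correspondences are given as two aligned (n, 3) arrays and uses the row-vector convention dst ≈ src @ R.T + t. It relies on the standard SVD-based least-squares (Kabsch) solution rather than the Geometric Algebra formulation of this paper, and on brute-force nearest-neighbor search; the function names are hypothetical.

```python
import numpy as np

def rigid_from_correspondences(src, dst):
    """Least-squares R, t with dst ~ src @ R.T + t (SVD / Kabsch solution)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)            # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def icp_refine(src, dst, R, t, iterations=20):
    """Point-to-point ICP started from (R, t), e.g. the estimate obtained above."""
    for _ in range(iterations):
        moved = src @ R.T + t
        # Brute-force nearest neighbor in dst for every transformed source point.
        d = np.linalg.norm(moved[:, None, :] - dst[None, :, :], axis=-1)
        R, t = rigid_from_correspondences(src, dst[np.argmin(d, axis=1)])
    return R, t
```

For large point sets, the brute-force distance matrix in icp_refine would normally be replaced by a k-d tree query.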


References

Bayro-Corrochano, E. 2001. Geometric Computing for Perception Action Systems. Springer-Verlag.

Besl, P.J. and McKay, N. 1992. A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256.

Borgefors, G. 1988. Hierarchical chamfer matching: A parametric edge matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):849–865.

Champleboux, G., Lavallee, S., Szeliski, R., and Brunnie, L. 1992. From accurate range imaging sensor calibration to accurate model-based 3d subject localization. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 83–89.

Chen, Y. and Medioni, G. 1991. Object modeling by registration of multiple range images. In IEEE International Conference on Robotics and Automation, vol. 3, pp. 2724–2729.

Chua, C. and Jarvis, R. 1996. 3d free-form surface registration and object recognition. International Journal of Computer Vision, 17(1):77–99.

Chua, C. and Jarvis, R. 1997. Point signatures: A new representation for 3d object recognition. International Journal of Computer Vision, 25(1):63–85.

Chui, H. and Rangarajan, A. 2000. A new point matching algorithm for non-rigid registration. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 44–51.

Cunnington, S.J. and Stoddart, A.J. 1999. N-view point set registration: A comparison. In British Machine Vision Conference, pp. 234–244.

Eggert, D., Fitzgibbon, A.W., and Fisher, R.B. 1998. Simultaneous registration of multiple range views satisfying global consistency constraints for use in reverse engineering. In Computer Vision and Image Understanding, vol. 69, pp. 253–272.

Feldmar, J., Malandain, G., Declerck, J., and Ayache, N. 1996. Extension of the icp algorithm to non-rigid intensity-based registration of 3d volumes. In Workshop on Mathematical Methods in Biomedical Image Analysis, pp. 84–93.

Fitzgibbon, A.W. 2003. Robust registration of 2d and 3d point sets. Image Vision Computing, 21:1145–1153.

Fookes, C., Williams, J., and Bennamoun, M. 2000. Global 3d rigid registration of medical images. In International Conference on Image Processing, vol. 2, pp. 447–450.

Grimson, W., Lozano-Perez, T., Wells, W., Ettinger, G., White, S., and Kikinis, R. 1994. An automatic registration method for frameless stereotaxy, image-guided surgery, and enhanced reality visualization. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 430–436.

Guest, E., Berry, E., Baldock, R., Fidrich, M., and Smith, M. 2001. Robust point correspondence applied to two- and three-dimensional image registration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2):165–179.

Hestenes, D. 1966. Space-Time Algebra. Gordon and Breach.

Hestenes, D. and Sobczyk, G. 1984. Clifford Algebra to Geometric Calculus: A Unified Language for Mathematics and Physics. Dordrecht.

Hu, G. 1995. 3-d object matching in the hough space. In IEEE International Conference on Systems, Man and Cybernetics ‘Intelligent Systems for the 21st Century’, vol. 3, pp. 2718–2723.

Ionescu, D., Abdelsayed, S., and Goodenough, D. 1993. A registration and matching method for remote sensing images. In Canadian Conference on Electrical and Computer Engineering, vol. 2, pp. 710–712.

Johnson, A. and Hebert, M. 1999. Using spin images for efficient object recognition in cluttered 3d scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5):433–449.

Johnson, A. and Kange, S. 1997. Registration and integration of textured 3d data. In 3DIM’97, pp. 234–241.

Kang, E.Y., Cohen, I., and Medioni, G. 2002. Robust affine motion estimation in joint image space using tensor voting. In 16th International Conference on Pattern Recognition, vol. 4, pp. 256–259.

Kybic, J. and Unser, M. 2003. Fast parametric elastic image registration. IEEE Transactions on Image Processing, 12(11):1427–1442.

Lounesto, P. 1987. Clical, a calculator type computer program for vectors, complex numbers, quaternions, bivectors, spinors, and multivectors in clifford algebras.

Lounesto, P. 1997. Clifford Algebras and Spinors. Cambridge University Press.

Lounesto, P., Mikkola, R., and Vierros, V. 1987. Clical user manual. Tech. Rep. A248, Institute of Mathematics.

Lu, F. and Milios, E. 1994. Robot pose estimation in unknown environments by matching 2d range scans. In Conference on Computer Vision and Pattern Recognition, pp. 935–938.

Luo, B. and Hancock, E.R. 1999. Matching point-sets using procrustes alignment and the em algorithm. In 10th British Machine Vision Conference, pp. 43–52.

Medioni, G., Lee, M., and Tang, C. 2000. A Computational Framework for Segmentation and Grouping. Elsevier Science.

Simon, D.A., Herbert, M., and Kanade, T. 1995. Techniques for fast and accurate intra-surgical registration. Journal of Image Guided Surgery, 1(1):17–29.

Stein, F. and Medioni, G. 1992. Structural indexing: efficient 3-d object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):125–145.

Stoddart, A.J. and Hilton, A. 1996. Registration of multiple point sets. In Proceedings of the International Conference on Pattern Recognition, pp. 40–44.

Turk, G. and Levoy, M. 1994. Zippered polygon meshes from range images. In ACM SIGGRAPH Conference on Computer Graphics, pp. 311–318.

Wells, W. 1997. Statistical approaches to feature-based object recognition. International Journal of Computer Vision, 21:63–98.

Zhang, Z. 1992. Iterative point matching for registration of free-form curves. Tech. Rep. 1658, INRIA.
