
Fast and Robust Symmetric Image Registration Based on Distances Combining Intensity and Spatial Information

Johan Öfverstedt, Joakim Lindblad, Member, IEEE, and Nataša Sladoje, Member, IEEE

Abstract—Intensity-based image registration approaches rely on similarity measures to guide the search for geometric correspondences with high affinity between images. The properties of the used measure are vital for the robustness and accuracy of the registration. In this study a symmetric, intensity interpolation-free, affine registration framework based on a combination of intensity and spatial information is proposed. The excellent performance of the framework is demonstrated on a combination of synthetic tests, recovering known transformations in the presence of noise, and real applications in biomedical and medical image registration, for both 2D and 3D images. The method exhibits greater robustness and higher accuracy than similarity measures in common use, when inserted into a standard gradient-based registration framework available as part of the open source Insight Segmentation and Registration Toolkit (ITK). The method is also empirically shown to have a low computational cost, making it practical for real applications. Source code is available.

Index Terms—Image registration, set distance, gradient methods, optimization, cost function, iterative algorithms, fuzzy sets, magnetic resonance imaging, transmission electron microscopy.

I. INTRODUCTION

Image registration [1]–[4] is the process of establishing correspondences between images and a reference space, such that the contents of the images have a high degree of affinity in the reference space. An example of such correspondence is a mapping of an image (often referred to as floating image) of a brain to a reference space of another image (often referred to as reference image) of a brain where their important structures are well co-localized. There are two main categories of approaches for image registration: feature-based methods extract a set of feature points between which a correspondence is found, whereas intensity-based methods use the voxel-values directly, and evaluate candidate mappings based on a similarity measure (affinity). There are also two main categories of transformation models: linear (which include, as special cases, rigid, similarity, and affine transformations), and non-linear (deformable). The combination of differentiable transformation models and differentiable similarity measures enables the use of gradient-based local optimization methods.

The authors are with the Centre for Image Analysis, Department of Information Technology, Uppsala University, Uppsala, Sweden. Lindblad and Sladoje are also with the Mathematical Institute of the Serbian Academy of Sciences and Arts, Belgrade, Serbia. E-mail: {johan.ofverstedt, joakim.lindblad, natasa.sladoje}@it.uu.se

Medical and biomedical image registration, [4]–[6], is an important branch of general image registration and a lot of effort has been invested over the last decades to refine the tools and techniques, [2]. Although a majority of the recent research has been devoted to non-linear registration techniques, the most prevalent registration method used in the clinic is still linear registration. In a number of situations, the deformations allowed by non-linear registration can be difficult to evaluate and may affect reliability of diagnosis, [2]; hence, physicians may prefer a more constrained rigid or affine alignment. Considering their wide usage as fundamental tools, improvement of rigid and affine registration in terms of performance and reliability is highly relevant in practice.

Feature-based image registration is dependent on the ability of the feature extraction method to locate distinct points of interest appearing in both (all) images. Feature-extractors (e.g. SIFT [7]) typically presuppose the existence and relevance of specific local characteristics such as edges, corners and other salient features; if no, or too few, such distinct points are found, the registration will fail. This is often the case in medical and biomedical applications, [8], [9], where intensity-based registration, therefore, tends to be the method of choice. Figure 1 shows an illustrative example of a biomedical application where a feature-based method fails, whereas an intensity-based method can be successful.

Intensity-based registration is, in general, formulated as a non-convex optimization problem. The similarity measures commonly used as optimization criteria typically exhibit a high number of local optima [10], [11]; a count which tends to rapidly increase under noisy conditions. A small region of attraction of a global optimum imposes that the starting position has to be set very close to the optimal solution for it to be found by an optimizer. This leads to reliability challenges for automated solutions.

In this study we develop a registration framework based on a family of symmetric distance measures, proposed in [11], which combine intensity and spatial information in a single measure. These measures have been shown to be characterized by smooth distance surfaces with significantly fewer local minima than the commonly used intensity-based measures, when studied in the context of template matching and object recognition. In this work we demonstrate that slightly modified versions of these distance measures can be successfully used for fast and robust affine image registration. By differentiating the distance measure we are able to use efficient gradient-based optimization. The proposed method outperforms the commonly used similarity measures in both synthetic and real scenarios of medical and biomedical registration tasks, which we confirm by (i) landmark-based evaluation on transmission electron microscopy (TEM) images of cilia [12], with the aim of improving multi-image super-resolution reconstruction, as well as (ii) evaluation on the task of atlas-based segmentation of magnetic resonance (MR) images of brain, on the LPBA40 dataset [13].


(a) Reference Image (b) Floating Image (c) FB Rigid (Fail) (d) FB Affine (Fail) (e) IB, Rigid+Affine

Fig. 1. Illustrative example of a biomedical registration task where a widely used feature-based (FB) method (SIFT, as implemented in the FIJI plugin Linear Stack Alignment²) fails, while the proposed intensity-based (IB) method (Sec. V-F1) performs well. Green points in (a) and (b) are incorrectly detected as having a match, and red points do not have a match. The feature-extractor fails to detect points corresponding to the relevant structures (one approximately correct match, indicated with arrows, can be found manually), and both the central rings and the outer rings are misaligned.

Intensity interpolation is typically a required tool in the context of intensity-based registration performed with commonly used similarity measures, since the sought transformation (and intermediate candidates) is likely to map points to regions outside of the regular grid. Treating the reference and floating images differently in terms of the interpolation introduces a significant source of asymmetry [14] and may lead to success or failure of a registration task depending on which image is selected as reference and which is floating. Our proposed approach requires no off-grid intensity values, and is interpolation-free in terms of intensities; empirical tests confirm that it is highly symmetric in practice.

Noting that intensity-based image registration can be computationally demanding, we also include a study of the execution time of (i) isolated distance and gradient computations through micro-benchmarks, and (ii) entire image registration tasks. We observe that the proposed measure is fast to compute in comparison with the implementations of the measures existing in the ITK framework [14]. The proposed registration framework is implemented in C++/ITK, as well as in Python/NumPy/SciPy, and its source code is available³.

II. PRELIMINARIES AND PREVIOUS WORK

A. Images as Fuzzy Sets

First we recall a few basic concepts related to fuzzy sets [15], a theoretical framework where gray-scale images are conveniently represented.

A fuzzy set S on a reference set X_S is a set of ordered pairs, S = {(x, µ_S(x)) : x ∈ X_S}, where µ_S : X_S → [0, 1] is the membership function of S. Where there is no risk for confusion, we equate the set and its membership function and let S(x) be equivalent to µ_S(x).

A gray-scale image can directly be interpreted as a spatial fuzzy set by rescaling the valid intensity range to [0, 1]. We assume, w.l.o.g., that the images to be registered have an intensity range [0, 1] and we directly interpret them as fuzzy sets defined on a reference set which is the image domain, and is in most cases a subset of Z^n. We use the terms image and fuzzy set interchangeably in this text.

² imagej.net/Linear Stack Alignment with SIFT
³ Source code available from www.github.com/MIDA-group

A crisp set C ⊆ X_C (a binary image) is a special case of a fuzzy set, with its characteristic function as membership function

$$\mu_C(x) = \begin{cases} 1, & \text{for } x \in C \\ 0, & \text{for } x \notin C. \end{cases} \quad (1)$$

The height of a fuzzy set S ⊆ X_S is h(S) = max_{x∈X_S} µ_S(x). The complement S̄ of a fuzzy set S is S̄ = {(x, 1 − µ_S(x)) : x ∈ X_S}. An α-cut of a fuzzy set S is a crisp set defined as αS = {x ∈ X_S : µ_S(x) ≥ α}, i.e., a thresholded image.

Let p be an element of the reference set X_S. A fuzzy point p (also called a fuzzy singleton) defined at p ∈ X_S, with height h(p), is defined by the membership function

$$\mu_p(x) = \begin{cases} h(p), & \text{for } x = p \\ 0, & \text{for } x \neq p. \end{cases} \quad (2)$$
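As an illustration of these concepts, the following is a minimal NumPy sketch of interpreting a gray-scale image as a fuzzy set and computing its height, complement, and α-cuts. It is a sketch only: the array `image`, the level count `ell`, and the assumed level spacing are illustrative choices, not values prescribed by the text.

```python
import numpy as np

def as_fuzzy_set(image):
    """Interpret a gray-scale image with intensities in [0, 1] as a fuzzy
    membership function defined on the pixel grid (the reference set)."""
    return np.clip(np.asarray(image, dtype=float), 0.0, 1.0)

def height(S):
    """Height of the fuzzy set: the maximal membership value."""
    return S.max()

def complement(S):
    """Complement fuzzy set, 1 - mu_S(x)."""
    return 1.0 - S

def alpha_cut(S, alpha):
    """Crisp (binary) set {x : mu_S(x) >= alpha}, i.e. a thresholded image."""
    return S >= alpha

# Example: a small 2D image and ell = 7 non-zero alpha-levels (assumed spacing).
S = as_fuzzy_set(np.random.rand(8, 8))
ell = 7
alphas = [(i - 0.5) / ell for i in range(1, ell + 1)]
cuts = [alpha_cut(S, a) for a in alphas]
print(height(S), [int(c.sum()) for c in cuts])
```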

B. Intensity-Based Registration and Point-Wise Distances

Intensity-based registration is a general approach to image registration defined as a minimization process, where a distance measure between the intensities of overlapping points (or regions) is used as optimization criterion. Given a distance measure d and a set of valid transformations Ω, intensity-based registration of two images A (floating) and B (reference) can be formulated as the optimization problem

$$\hat{T} = \arg\min_{T \in \Omega} d(T(A), B), \quad (3)$$

where T(A) denotes a valid transform of image A into the reference space of image B.

Intensity-based similarity/distance measures which are most commonly used for image registration are Sum of Squared Differences (SSD) [16], Pearson Correlation Coefficient (PCC) and Mutual Information (MI) [17]. These measures are point-based, i.e. they are functions of the intensities of points belonging to the overlapping regions of the two compared sets. Their evaluation, therefore, typically requires interpolation of image intensities.

For two images P and Q defined on a common reference set X_{P,Q} of overlapping points, these measures are defined as

$$\mathrm{SSD}(P, Q) = \sum_{v \in X_{P,Q}} (P(v) - Q(v))^2, \quad (4)$$


$$\mathrm{PCC}(P, Q) = \frac{\sum_{v \in X_{P,Q}} (P(v) - \mathrm{avg}(P))\,(Q(v) - \mathrm{avg}(Q))}{\sqrt{\sum_{v \in X_{P,Q}} (P(v) - \mathrm{avg}(P))^2}\;\sqrt{\sum_{v \in X_{P,Q}} (Q(v) - \mathrm{avg}(Q))^2}} \quad (5)$$

and

$$\mathrm{MI}(P, Q) = H_P + H_Q - H_{P,Q}. \quad (6)$$

In (5), avg(P) and avg(Q) denote the means of the respective intensity distributions over the evaluated region. In (6), the (joint and marginal) entropies H_P, H_Q and H_{P,Q} of the image intensity distributions P and Q are defined in terms of the estimated probability p of a randomly selected point v having intensities P(v), Q(v), as

$$H_P = -\sum_{v \in X_{P,Q}} p(P(v)) \log\bigl(p(P(v))\bigr), \quad (7)$$

and

$$H_{P,Q} = -\sum_{v \in X_{P,Q}} p(P(v), Q(v)) \log\bigl(p(P(v), Q(v))\bigr). \quad (8)$$

Intensity-based registration, as formulated in (3), is, in general, a non-convex optimization problem with a large number of local optima, especially for the commonly used point-based measures (SSD, PCC, and MI). To try to overcome this optimization challenge, a resolution-pyramid scheme is normally used [18], [19], where smoothed low-resolution images are first registered, followed by registration of images with increasing resolution and decreasing degree of smoothing, using the transform obtained from the previous stage as starting position (a so-called coarse-to-fine approach).

C. Distances Combining Intensity and Spatial Information

While the distances of Sec. II-B rely only on intensities of overlapping points, the distances presented in this section also incorporate spatial information of non-overlapping points. For such spatial relations, we consider distances between two points, between a point and a set, and between two sets. The most commonly used point-to-point distance is the Euclidean distance, denoted d_E.

Given a point-to-point distance d(a, b), the common crisp point-to-set distance between a point a and a set B is

$$d(a, B) = \inf_{b \in B} d(a, b). \quad (9)$$

Closely related to the crisp point-to-set distance is the (external) distance transform of a crisp set B ⊆ X_B (with point-to-point distance d), which is defined as

$$\mathrm{DT}[B](x) = \min_{y \in B} d(x, y). \quad (10)$$

Taking into consideration the intensity, or equivalently, the height of a fuzzy point, the fuzzy point-to-set inwards distance d_α, based on integration over α-cuts [11], between a fuzzy point p and a fuzzy set S, is defined as

$$d_\alpha(p, S) = \int_0^{h(p)} d(p, {}^{\alpha}S)\, d\alpha, \quad (11)$$

where d is a point-to-set distance defined on crisp sets. The complement distance [20] of a fuzzy point-to-set distance d is

$$\bar{d}(p, S) = d(\bar{p}, \bar{S}). \quad (12)$$

The fuzzy point-to-set bidirectional distance $d_{\bar\alpha}$ is

$$d_{\bar\alpha}(p, S) = d_\alpha(p, S) + \bar{d}_\alpha(p, S). \quad (13)$$

For an arbitrary point-to-set distance d, the Sum of Minimal Distances (SMD) [21] defines a set-to-set distance as

$$d_{\mathrm{SMD}}(A, B) = \frac{1}{2}\Bigl(\sum_{a \in A} d(a, B) + \sum_{b \in B} d(b, A)\Bigr). \quad (14)$$

A weighted version can be defined [11], which may be useful if a priori information about the relative importance of contributions of different points to the overall distance is available:

$$d_{\mathrm{wSMD}}(A, B; w_A, w_B) = \frac{1}{2}\Bigl(\sum_{a \in A} w_A(a)\, d(a, B) + \sum_{b \in B} w_B(b)\, d(b, A)\Bigr). \quad (15)$$

Inserting distances (11) or (13) in (14) or (15) provides extensions of the SMD family of distances to fuzzy sets [11]. We refer to them as $d_{\alpha\mathrm{SMD}}$, $d_{\bar\alpha\mathrm{SMD}}$, $d_{\alpha\mathrm{wSMD}}$ and $d_{\bar\alpha\mathrm{wSMD}}$.

It has been observed for fuzzy set distances [22] in general, and for distances based on (11) and (13) in particular, that distances between sets with empty α-cuts may give infinite or ill-defined distances. We follow a previous study and introduce a parameter d_MAX ∈ R≥0, [23], to limit the underlying crisp point-to-set distance. This has the double benefit of (i) reducing the effect of outliers and (ii) making the distances well defined also for images with empty α-cuts.
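To make (11)–(13) and the saturation parameter concrete, the following is a small NumPy/SciPy sketch of the inwards and bidirectional point-to-set distances of a single fuzzy point, with the integral over α discretized at a fixed set of levels and the crisp distance computed with a Euclidean distance transform saturated at d_MAX. The equally spaced α-levels and the form of the complement part follow the reconstruction above and are assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def point_to_set_inwards(p_index, p_height, S, alphas, d_max):
    """d_alpha(p, S), Eq. (11): integrate the saturated distance from p to
    the alpha-cuts of S over alpha in (0, h(p)], discretized at `alphas`."""
    total, prev_alpha = 0.0, 0.0
    for alpha in alphas:
        if alpha > p_height:
            break
        cut = S >= alpha
        # EDT of the background gives the distance to the nearest cut point;
        # an empty cut is handled by the saturation value d_max.
        dt = (distance_transform_edt(~cut) if cut.any()
              else np.full(S.shape, d_max, dtype=float))
        total += (alpha - prev_alpha) * min(dt[p_index], d_max)
        prev_alpha = alpha
    return total

def point_to_set_bidirectional(p_index, p_height, S, alphas, d_max):
    """Eq. (13): inwards part plus the complement part (12), here taken as the
    inwards distance of the complemented point to the complemented set."""
    inwards = point_to_set_inwards(p_index, p_height, S, alphas, d_max)
    comp = point_to_set_inwards(p_index, 1.0 - p_height, 1.0 - S, alphas, d_max)
    return inwards + comp

ell = 7
alphas = [(i - 0.5) / ell for i in range(1, ell + 1)]   # assumed level spacing
S = np.clip(np.random.rand(16, 16), 0.0, 1.0)
print(point_to_set_bidirectional((4, 5), 0.8, S, alphas, d_max=10.0))
```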

Distances based on Optimal Mass Transport (OMT), such as the Wasserstein distance, also combine intensity and spatial information, and are widely studied and used in image processing [24]. The OMT can be framed as a linear programming optimization problem, which is solvable in O(N³) [25]. This is intractable for most realistic image processing scenarios, and approximations are typically considered [25], [26]. It is possible to incorporate these distances in image registration frameworks, but to the best of our knowledge, this has only been done for non-linear (deformable) registration, and has been shown to be very computationally demanding [27], [28]. We performed a preliminary study of OMT-based methods using the formulation in [26], and observed both very high computational demands and noisy distance landscapes. In the absence of a complete registration framework for linear registration based on OMT, this family of measures is excluded from the empirical part of this study.

D. Transformations, Interpolation, and Symmetry

Linear transformations relate points in one space to another via application of a linear function. A transformation is rigid if only rotations and translations are permitted, and affine if


shearing and reflections are also permitted. An affine transformation T : R^n → R^n can be expressed as a matrix multiplication,

$$T\mathbf{x} = \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} & t_1 \\ a_{21} & a_{22} & \ldots & a_{2n} & t_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} & t_n \\ 0 & 0 & \ldots & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \\ 1 \end{bmatrix}. \quad (16)$$
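A small sketch of (16) for the 2D case follows, applying a homogeneous affine matrix to grid points; the particular matrix entries (a 10° rotation and a small translation) are arbitrary example values, not parameters from the paper.

```python
import numpy as np

def affine_matrix_2d(a11, a12, a21, a22, t1, t2):
    """Homogeneous 2D affine transformation matrix as in Eq. (16)."""
    return np.array([[a11, a12, t1],
                     [a21, a22, t2],
                     [0.0, 0.0, 1.0]])

def transform_points(T, points):
    """Apply T to an (N, 2) array of points using homogeneous coordinates."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    return (T @ homo.T).T[:, :2]

# Example: a rotation by 10 degrees composed with a small translation.
theta = np.deg2rad(10.0)
T = affine_matrix_2d(np.cos(theta), -np.sin(theta),
                     np.sin(theta),  np.cos(theta), 2.0, -1.0)
grid = np.argwhere(np.ones((4, 4), dtype=bool)).astype(float)  # all grid points
print(transform_points(T, grid))  # transformed points generally leave the grid
```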

Linear transformations can, in general, transform points sampled on an image grid to positions outside of the grid, hence an interpolator is required for obtaining the image intensity at the transformed point's location. Interpolation is a large source of error and bias, and a significant contributing factor of asymmetry in intensity-based registration, [14]. Commonly, interpolation is only required for one of the two images, where sampling (for optimization) is done from the grid of the other image space; hence, the two images are treated asymmetrically, yielding distinct similarity surfaces (over the transformation parameters) depending on which image is taken as reference. This can cause a registration task to succeed or fail, depending on the registration direction.

E. Optimization

Registration with a differentiable distance measure as objective function enables the use of gradient-based optimization algorithms, which typically are significantly more efficient than derivative-free algorithms for local iterative optimization. An effective and commonly used subset of gradient-based algorithms are the stochastic gradient descent methods [29], which consider a random subset of the points in each optimization iteration, incurring a two-fold benefit: utilizing randomness to escape shallow local optima in the implicit distance surface, while also decreasing the computational work required per iteration. The size of the random subset is usually given as a fraction of the total number of points, denoted the sampling fraction. Approximation of the cost function by random subset sampling (where a new random subset of points is chosen in every iteration) has, in previous studies [17], [30], been shown to perform well for intensity-based registration.
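The per-iteration subset sampling can be sketched in a few lines; the function name and the call to the (omitted) cost evaluation are hypothetical placeholders, illustrating only that a fresh random subset of a given sampling fraction is drawn in every iteration.

```python
import numpy as np

def sample_points(num_points, sampling_fraction, rng):
    """Draw a new random subset of point indices for one iteration."""
    k = max(1, int(round(sampling_fraction * num_points)))
    return rng.choice(num_points, size=k, replace=False)

rng = np.random.default_rng(0)
for iteration in range(3):
    subset = sample_points(num_points=10000, sampling_fraction=0.1, rng=rng)
    # cost, grad = evaluate_distance_on_subset(subset, ...)  # hypothetical call
    print(iteration, subset.shape)
```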

III. PROPOSED IMAGE REGISTRATION FRAMEWORK

A. Distances

To extend the family of distance measures given by (15), to be suitable for registration, optionally with random subsampling optimization methods [30], we here define a new, related family of distance measures.

Definition 1 (Asymmetric average minimal distance). Given fuzzy set A on a reference set X_A ⊂ R^n, fuzzy set B on reference set X_B ⊂ R^n, and a weight function w_A : X_A → R≥0, the Asymmetric average minimal distance from A to B is

$$\vec{d}_{\mathrm{AMD}}(A, B; w_A) = \frac{1}{\sum_{x \in X_A} w_A(x)} \sum_{x \in X_A} w_A(x)\, d(A(x), B). \quad (17)$$

We consider point-to-set distances defined by (11) or (13). Building on the asymmetric distance, we formulate a symmetric distance as follows:

Definition 2 (Average minimal distance). Given fuzzy set A on reference set X_A ⊂ R^n, fuzzy set B on reference set X_B ⊂ R^n, and weight functions w_A : X_A → R≥0 and w_B : X_B → R≥0, the Average minimal distance between A and B is

$$d_{\mathrm{AMD}}(A, B; w_A, w_B) = \frac{1}{2}\bigl(\vec{d}_{\mathrm{AMD}}(A, B; w_A) + \vec{d}_{\mathrm{AMD}}(B, A; w_B)\bigr). \quad (18)$$

In the context of image registration, we utilize $\vec{d}_{\mathrm{AMD}}$ to express a (weighted) distance between transformed fuzzy points T(A(x)) and the image B, where the transformation of a fuzzy point A(x) = (x, µ_A(x)) is given by the transformation of the reference point x:

$$T(A(x)) = (T(x), \mu_A(x)). \quad (19)$$

To reflect the bounded image domain, only the transformed points falling on a predefined region M_B ⊂ R^n are considered. Note that, when A and B are digital images, X_A and X_B are typically subsets of Z^n and the transformed points T(x)|_{x∈X_A} do not necessarily coincide with the points of the reference set X_B; an illustrative example is given in Fig. 2.

We, therefore, provide the following definitions suited for the task of image registration:

Definition 3 (Asymmetric average minimal distance for image registration). Given fuzzy set A on reference set X_A ⊂ R^n, fuzzy set B on X_B ⊂ R^n, a weight function w_A : X_A → R≥0, and a crisp subset (mask) M_B ⊂ R^n, the Asymmetric average minimal distance for image registration from A to B, parameterized by a transformation T : X_A → R^n, is

$$\vec{d}^{\,R}_{\mathrm{AMD}}(A, B; T, w_A, M_B) = \frac{1}{\sum_{x \in X} w_A(x)} \sum_{x \in X} w_A(x)\, d(T(A(x)), B), \quad \text{where } X = \{x : x \in X_A \wedge T(x) \in M_B\}. \quad (20)$$

Definition 4 (Average minimal distance for image registration). Given fuzzy set A on reference set X_A, fuzzy set B on X_B, weight functions w_A : X_A → R≥0 and w_B : X_B → R≥0, and crisp subsets (masks) M_A, M_B ⊂ R^n, the Average minimal distance for image registration between A and B, parameterized by an invertible transformation T : R^n → R^n, with inverse T⁻¹, is defined as

$$d^{R}_{\mathrm{AMD}}(A, B; T, w_A, w_B, M_A, M_B) = \frac{1}{2}\Bigl(\vec{d}^{\,R}_{\mathrm{AMD}}(A, B; T, w_A, M_B) + \vec{d}^{\,R}_{\mathrm{AMD}}(B, A; T^{-1}, w_B, M_A)\Bigr). \quad (21)$$

The distance $d^{R}_{\mathrm{AMD}}$ is based on full sampling, taking into account all points in the two sets which have non-zero weights, as long as they are transformed to points inside the mask associated with the other set. To reduce the computational cost of the distance and, in addition, to enable random iterative sampling, we propose an approximate version of $d^{R}_{\mathrm{AMD}}$:

Definition 5 (Subsampled average minimal distance for image registration). Given fuzzy set A on reference set X_A, fuzzy set B on X_B, weight functions w_A : X_A → R≥0 and w_B : X_B → R≥0, and crisp subsets (masks) M_A, M_B ⊂ R^n, the Subsampled average minimal distance for image registration between A and B, parameterized by an invertible transformation T : R^n → R^n, with inverse T⁻¹, and crisp sets P_A ⊆ X_A and P_B ⊆ X_B, is defined as

$$d^{R}_{\mathrm{AMD}}(A, B; P_A, P_B, T, w_A, w_B, M_A, M_B) = \frac{1}{2}\Bigl(\vec{d}^{\,R}_{\mathrm{AMD}}(A \cap P_A, B; T, w_A, M_B) + \vec{d}^{\,R}_{\mathrm{AMD}}(B \cap P_B, A; T^{-1}, w_B, M_A)\Bigr). \quad (22)$$

[Figure 2 graphics not reproduced.]

Fig. 2. Illustration of the Asymmetric average minimal distance for image registration. (a) Source set A (radius represents associated weight w_A; gray-level represents membership). (b) Target set B with associated mask M_B (surrounding rectangle). (c) The transformed (by rotation and translation) A on top of B; triangles mark points mapped outside M_B, and thus discarded. (d) Illustration of the contributions to the point-to-set distance d_α by the central point of A; thickness of lines shows the α-integrated height contributed by each point in B. (e-f) The inwards and complement parts of d_α visualized as 1D graphs, where the x-axis is the Euclidean distance (in 2D) from the mid-point, and the points at the left and right side (of the origin) respectively are the points on the left and right side of the mid-point T(A(x)) in (d).

Inserting (11) or (13) as point-to-set distance in (20), and hence indirectly in (21) and (22), provides extensions of this family of distances to the α-cut-based distances, which we denote $\vec{d}^{\,R}_{\alpha\mathrm{AMD}}$ and $\vec{d}^{\,R}_{\bar\alpha\mathrm{AMD}}$ (asymmetric), and $d^{R}_{\alpha\mathrm{AMD}}$ and $d^{R}_{\bar\alpha\mathrm{AMD}}$ (symmetric, with full or subsampled point sets).

Normalization of the weights of the sampled points, introduced through Def. 1, renders the magnitude of the distance (and subsequently its derivatives) invariant to the size and aggregated weight of the sets or of the sampled subsets. Since the normalization is done separately from A to B and from B to A, both directions are weighted equally even if the total weights of the point subsets from the two sets are different. This normalization can simplify the process of choosing e.g. the step-length of various optimization methods, and makes it more likely that default hyper-parameter values can be found and reused across different applications.
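The structure of Definitions 3 and 4 (masking, weight normalization, and symmetrization) can be sketched compactly if the fuzzy point-to-set distance is treated as a black box, e.g. a lookup into pre-computed distance maps as in Sec. III-D. In the sketch below the callables `point_to_set_dist`, the nearest-neighbour rounding, and the dummy inputs are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def asym_amd(points, memberships, weights, T, mask, point_to_set_dist):
    """Eq. (20): weighted average of d(T(A(x)), B) over points of A whose
    transformed location falls inside the mask M_B; others are discarded."""
    total_d, total_w = 0.0, 0.0
    for x, mu, w in zip(points, memberships, weights):
        y = T(x)
        yi = tuple(np.round(y).astype(int))           # nearest grid position
        if 0 <= yi[0] < mask.shape[0] and 0 <= yi[1] < mask.shape[1] and mask[yi]:
            total_d += w * point_to_set_dist(yi, mu)  # d(T(A(x)), B)
            total_w += w
    return total_d / total_w if total_w > 0 else 0.0

def sym_amd(A, B, T, T_inv, mask_A, mask_B, dist_to_B, dist_to_A):
    """Eq. (21): average of the two asymmetric directions, each normalized
    by its own accumulated weight."""
    d_ab = asym_amd(*A, T, mask_B, dist_to_B)
    d_ba = asym_amd(*B, T_inv, mask_A, dist_to_A)
    return 0.5 * (d_ab + d_ba)

# Minimal usage with dummy 2D inputs and an identity transform.
pts = [np.array([2.0, 3.0]), np.array([5.0, 1.0])]
A = (pts, [0.8, 0.4], [1.0, 1.0])
B = (pts, [0.6, 0.9], [1.0, 1.0])
mask = np.ones((8, 8), dtype=bool)
dist = lambda idx, mu: 0.0            # placeholder point-to-set distance
ident = lambda x: x
print(sym_amd(A, B, ident, ident, mask, mask, dist, dist))
```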

B. Registration

We propose to utilize the symmetric distances $d_{\alpha\mathrm{AMD}}$ and $d_{\bar\alpha\mathrm{AMD}}$ as cost functions in (3) to define concrete image registration methods. Inserting $d^{R}_{\alpha\mathrm{AMD}}$ into (3) we obtain

$$\hat{T} = \arg\min_{T \in \Omega} d^{R}_{\alpha\mathrm{AMD}}(A, B; T, w_A, w_B, M_A, M_B). \quad (23)$$

For the case of subset sampling with sets P_A and P_B, registration is defined as

$$\hat{T} = \arg\min_{T \in \Omega} d^{R}_{\alpha\mathrm{AMD}}(A, B; P_A, P_B, T, w_A, w_B, M_A, M_B). \quad (24)$$

By selection of new random subsets P_A and P_B in each iteration, various stochastic gradient descent optimization methods can be realized.

To solve the optimization problems stated in (23) and (24) with efficient gradient-based optimization methods, the partial derivatives of the distance measures with respect to the transformation parameters of T are required.

C. Gradients

The derivative of (9), the crisp point-to-set distance measure d(T(x), S) (in n-dimensional space), with respect to the parameters T_i of the transformation T applied to a point x ∈ X, yielding y = T(x), can be written as

$$\frac{\partial d}{\partial T_i} = \sum_{j=1}^{n} \frac{\partial d}{\partial y_j} \frac{\partial y_j}{\partial T_i}. \quad (25)$$

The values ∂d/∂y_j are the components (partial derivatives) of the gradient ∇d(y, S) of the point-to-set distance in the point y ∈ Y ⊂ R^n, and are not dependent on the transformation model.

The gradient of the fuzzy point-to-set distance measure (11) is given by the integral, over α-cuts, of the gradients of the (crisp) point-to-set distances:

$$\nabla d_\alpha(x, S) = \int_0^{h(x)} \nabla d(x, {}^{\alpha}S)\, d\alpha. \quad (26)$$

D. Algorithms for Digital Images on Rectangular Grids

The distances and gradients can be computed efficiently for the special case of digital images on rectangular grids. For images quantized to ℓ ∈ N>0 non-zero discrete α-levels, the integrals in (11) and (26) are suitably replaced by sums. The number of quantization levels is typically taken to be a small constant; a choice of ℓ = 7 non-zero, equally spaced α-levels has previously been shown to provide a good trade-off between performance, speed and noise-sensitivity [11], and we keep it for all experiments.

We need a discrete operator to approximate the gradient of d(x, S) for a set S defined on a rectangular grid with spacing s ∈ R^n_{>0}. We propose to use the following difference operator providing a discrete approximation of ∇d(x, S):

$$\Delta d(x) = \gamma_x\,\bigl(\delta_1[d](x), \ldots, \delta_n[d](x)\bigr), \quad (27)$$

where

$$\delta_i[d](x) = \frac{1}{2 s_i}\bigl(d(x + s_i u_i, S) - d(x - s_i u_i, S)\bigr), \quad (28)$$

γ_x is an indicator function,

$$\gamma_x = \begin{cases} 1, & \text{for } d(x, S) \neq 0 \\ 0, & \text{for } d(x, S) = 0, \end{cases} \quad (29)$$

and u_i is the unit-vector along the i-th dimension.

The indicator function γ_x causes the operator ∆[S](x) to be zero-valued for points included in S (i.e., where the distance transform is zero-valued). This prevents discretization issues along set boundaries, where the standard central difference operator yields non-zero gradients, which can cause the measure to overshoot a potential voxel-perfect overlap.

Algorithm 1 Distance and Gradient Maps
Input: Image A. Mask M_A. Set of α-levels (α_1, ..., α_ℓ).
Output: Stack of pre-computed distance sums D[0...ℓ], and corresponding discrete gradient approximations G[0...ℓ].
 1: procedure ∆αDT(A, M_A, (α_1, ..., α_ℓ), d_MAX)
 2:   α_0 ← 0, D[0] ← 0, G[0] ← (0, ..., 0)
 3:   for i = 1 to ℓ do
 4:     DT_i ← DT[α_i-cut of A ∩ M_A]          ▷ Compute DT of α-cut.
 5:     DT_i ← min(DT_i, d_MAX)
 6:     for all v ∈ X_A do
 7:       D[i](v) ← (α_i − α_{i−1}) DT_i(v) + D[i−1](v)
 8:       G[i](v) ← (α_i − α_{i−1}) ∆DT_i(v) + G[i−1](v)
 9:     end for
10:   end for
11:   return D, G
12: end procedure
13: procedure ∆αDT_BD(A, M_A, (α_1, ..., α_ℓ), d_MAX)
14:   D, G ← ∆αDT(A, M_A, (α_1, ..., α_ℓ), d_MAX)
15:   D̄, Ḡ ← ∆αDT(Ā, M_A, (1 − α_ℓ, ..., 1 − α_1), d_MAX)
16:   for i = 0 to ℓ do
17:     for all v ∈ X_A do
18:       D[i](v) ← D[i](v) + D̄[ℓ − i](v)
19:       G[i](v) ← G[i](v) + Ḡ[ℓ − i](v)
20:     end for
21:   end for
22:   return D, G
23: end procedure
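The inwards part of this pre-computation, together with the masked central-difference operator (27)–(29), can be sketched in NumPy/SciPy as follows; `distance_transform_edt` stands in for the linear-time distance transform of line 4, equally spaced α-levels are assumed, and border handling of the gradient follows NumPy's convention rather than (28) exactly.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def masked_gradient(dt, spacing):
    """Discrete approximation of the DT gradient, Eqs. (27)-(29): central
    differences (one-sided at borders), zeroed where the DT is zero."""
    grads = np.stack(np.gradient(dt, *spacing), axis=0)
    grads[:, dt == 0] = 0.0               # indicator gamma_x
    return grads

def delta_alpha_dt(image, mask, alphas, d_max, spacing=(1.0, 1.0)):
    """Alg. 1 (inwards part): stacks of alpha-integrated distance sums D
    and gradient sums G, indexed by alpha-level 0..ell."""
    D = [np.zeros(image.shape)]
    G = [np.zeros((image.ndim,) + image.shape)]
    prev_alpha = 0.0
    for alpha in alphas:
        cut = (image >= alpha) & mask
        dt = (distance_transform_edt(~cut, sampling=spacing)
              if cut.any() else np.full(image.shape, d_max, dtype=float))
        dt = np.minimum(dt, d_max)        # saturation at d_MAX
        D.append((alpha - prev_alpha) * dt + D[-1])
        G.append((alpha - prev_alpha) * masked_gradient(dt, spacing) + G[-1])
        prev_alpha = alpha
    return D, G

ell = 7
alphas = [(i - 0.5) / ell for i in range(1, ell + 1)]   # assumed level spacing
img = np.clip(np.random.rand(32, 32), 0.0, 1.0)
D, G = delta_alpha_dt(img, np.ones_like(img, dtype=bool), alphas, d_max=10.0)
print(len(D), D[-1].shape, G[-1].shape)
```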

By creating tables for the distance and gradient sums for each image as a pre-processing step, using either of the procedures in Alg. 1 (∆αDT for inwards distances and ∆αDT_BD for bidirectional distances), the distance and gradient may then be readily computed with Alg. 2. |T| denotes the number of parameters of the transformation, which is 6 for two-dimensional (2D) affine transformations, and 12 for three-dimensional (3D) affine transformations.

Algorithm 2 Point-to-Set Distance and its Gradient w.r.t. T
Input: Fuzzy point A(v) and associated weight w_A(v), fuzzy set B and associated (binary) mask M_B, distances D[0...ℓ] and gradients G[0...ℓ]; transformation T.
Output: Point-to-set distance d, derivatives ∆d/∆T, weight w.
 1: procedure D_PTS(A(v), B; T, w_A(v), M_B, D_B, G_B)
 2:   v ← T(v)
 3:   if v ∈ M_B then
 4:     h ← QUANTIZE(µ_A(v))
 5:     d, g ← INTERPOLATE(D_B[h], G_B[h], v)
 6:     for i = 1 to |T| do
 7:       ∆d/∆T_i ← Σ_{j=1}^{n} (∂v_j/∂T_i) g_j
 8:     end for
 9:     return w_A(v) d, w_A(v) ∆d/∆T, w_A(v)
10:   else
11:     return 0, 0, 0                     ▷ Zero dist., grad., and weight.
12:   end if
13: end procedure

The procedures in Alg. 1 have linear run-time complexity O((ℓ+1)|X_A|), achieved by using a linear-time algorithm for computing the distance transform (e.g. [31]) in line 4 of Alg. 1. The space complexity of the algorithm is O((ℓ+1)|X_A|), and the D, G structures must remain in memory to enable fast lookup in Alg. 2. Figure 3 shows an example of the distance and gradient of a sample α-level. Alg. 2 computes the point-to-set distance and gradient w.r.t. the transformation using the pre-computed tables and has run-time complexity O(|T|n), thus being independent of ℓ and the size of A and B.

Algorithm 3 Symmetric Registration
Input: Fuzzy sets A, B, with binary masks M_A, M_B, and weight functions w_A, w_B; initial transformation guess T. No. of α-levels ℓ, distance saturation d_MAX, step-lengths λ[1...N], and iteration count N are hyper-parameters.
Output: Symmetric set distance d (d^R_αAMD), estimated transformation T̂.
 1: D_A, G_A ← ∆αDT_BD(A, M_A, (α_1, ..., α_ℓ), d_MAX)
 2: D_B, G_B ← ∆αDT_BD(B, M_B, (α_1, ..., α_ℓ), d_MAX)
 3: for i = 1 to N do
 4:   d_1, ∆d_1/∆T, w_1 ← Σ_{v∈X_A} D_PTS(A(v), B; T, w_A(v), M_B, D_B, G_B)
 5:   d_2, ∆d_2/∆T⁻¹, w_2 ← Σ_{v∈X_B} D_PTS(B(v), A; T⁻¹, w_B(v), M_A, D_A, G_A)
 6:   T ← T − (λ[i]/2) ( (1/w_1) ∆d_1/∆T + (1/w_2) (∆T⁻¹/∆T) ∆d_2/∆T⁻¹ )
 7: end for
 8: d ← (1/2) ( (1/w_1) d_1 + (1/w_2) d_2 )      ▷ Output final distance value.
 9: T̂ ← T                                       ▷ Output final transformation estimate.

Algorithm 3 performs a complete registration given two images, their binary masks, weight functions, and an initial transformation. Algorithm 3 completes N full iterations; however, other termination criteria may be beneficial (see Sec. IV). The registration consists of pre-processing, followed by a loop where the symmetric set distance and derivatives are computed and T is updated. ∆T⁻¹/∆T, in line 6 of Alg. 3, denotes a matrix [∂T⁻¹_j/∂T_i] of partial derivatives of the parameters of the inverse transform T⁻¹ w.r.t. the parameters of the forward transform T. This matrix relates the computed partial derivatives ∆d_2/∆T⁻¹ with the parametrization of the forward transform.

The overall run-time complexity is O(N|T|n(|X_A| + |X_B|) + (ℓ+1)(|X_A| + |X_B|)). Practical choices for N tend to be in the range [1000, 3000], depending on hyper-parameters (e.g. λ), and the distance in parameter-space between the starting position and the global optimum. The evaluation in Sec. V confirms empirically that convergence, according to (31) or (32), tends to be reached after 1000 to 3000 iterations, using an optimizer with a decaying λ.

The QUANTIZE procedure in Alg. 2 takes the membership of point v, µ_A(v), and gives the largest index i of the α-levels (α_1, ..., α_ℓ) for which µ_A(v) ≥ α_i. If the membership is below all α-levels, the index is 0. For ℓ equally spaced α-levels, the quantization can be expressed as

$$\mathrm{QUANTIZE}(\mu_A(v)) = \lfloor \ell\, \mu_A(v) + 0.5 \rfloor. \quad (30)$$

The INTERPOLATE procedure in Alg. 2 computes the value of the discrete functions D and G in a point v which may not be on the grid due to the application of T. There are many interpolation schemes proposed in the literature, but we suggest that either nearest neighbor (for maximal speed) or linear interpolation (for higher accuracy) is used here, since the distance and gradient fields are smooth. By linearity of integration and summation, nearest neighbor and linear interpolation may be performed on the pre-processed D and G and yield the same result as if each level was interpolated before integration, allowing efficient computation. The (discretized) measure does not require intensity interpolation; the interpolation operates on distances and gradients only.
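The lookup step of Alg. 2 can be sketched as follows for one 2D fuzzy point: quantize the membership as in (30), read the distance and gradient with a nearest-neighbour lookup, and apply the chain rule (25) to obtain the derivatives w.r.t. the affine parameters. The parameter ordering (a11, a12, a21, a22, t1, t2), the nearest-neighbour choice, and the dummy D/G stacks are assumptions of this sketch.

```python
import numpy as np

def quantize(mu, ell):
    """Eq. (30): index of the alpha-level associated with membership mu."""
    return int(np.floor(ell * mu + 0.5))

def point_distance_and_gradient(x, mu, w, T_affine, mask, D, G, ell):
    """Alg. 2 for one fuzzy point (2D, nearest-neighbour lookup).
    T_affine is a 3x3 homogeneous matrix; returns (w*d, w*dd/dparams, w)."""
    y = T_affine @ np.array([x[0], x[1], 1.0])
    yi = tuple(np.round(y[:2]).astype(int))
    inside = (0 <= yi[0] < mask.shape[0]) and (0 <= yi[1] < mask.shape[1]) and mask[yi]
    if not inside:
        return 0.0, np.zeros(6), 0.0       # zero distance, gradient, and weight
    h = quantize(mu, ell)
    d, g = D[h][yi], G[h][:, yi[0], yi[1]]  # distance and its spatial gradient
    # Chain rule, Eq. (25): y1 = a11*x1 + a12*x2 + t1, y2 = a21*x1 + a22*x2 + t2,
    # so dd/d(a11, a12, a21, a22, t1, t2) follows directly from g and x.
    dd = np.array([g[0] * x[0], g[0] * x[1], g[1] * x[0], g[1] * x[1], g[0], g[1]])
    return w * d, w * dd, w

# Usage with dummy pre-computed stacks (ell = 7 levels on a 32x32 grid):
ell, shape = 7, (32, 32)
D = [np.full(shape, float(i)) for i in range(ell + 1)]
G = [np.zeros((2,) + shape) for _ in range(ell + 1)]
T = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, -0.25], [0.0, 0.0, 1.0]])
print(point_distance_and_gradient((10, 12), 0.6, 1.0, T,
                                  np.ones(shape, dtype=bool), D, G, ell))
```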

IV. IMPLEMENTATION

We implemented the proposed distance measure and registration method in the open-source Insight Segmentation and Registration Toolkit (ITK) [14]. We chose this particular software framework because it
• enables the use of an existing optimization framework,
• allows for a fair comparison against well written, tested, and widely used implementations of reference similarity measures, with support for resolution-pyramids,
• supports anisotropic/scaled voxels in relevant algorithms,
• facilitates reproducible evaluation,
• makes the proposed measure easily accessible for others.

The built-in ITK optimizer we have used for the registration tools and all the evaluation is RegularStepGradientDescentOptimizerv4. This is an optimizer based on gradient descent, with an initial step-length λ, and a relaxation factor which reduces the used step-length gradually as the direction of the gradient changes, in order to enable convergence with high accuracy. In addition to a maximum number of iterations N, two termination criteria are used: (i) a gradient magnitude threshold (GMT),

$$\sqrt{\left(\frac{\partial d}{\partial T_1}\right)^2 + \ldots + \left(\frac{\partial d}{\partial T_{|T|}}\right)^2} < \mathrm{GMT}, \quad (31)$$

and (ii) a minimum step-length (MSL),

$$\lambda r < \mathrm{MSL}, \quad (32)$$

where r is the current relaxation coefficient. We use default values of 0.0001 for both of these criteria. A relaxation factor of 0.99 is used for all experiments, since it performed well in preliminary tests; in this study we are willing to trade some (potential) additional iterations for better robustness. To maximize utilization of the limited number of α-levels, images are normalized before registration to make sure that they are within the valid [0, 1] interval. We use the following robust normalization technique: Let P_ρ(X) denote the ρ-th percentile of image X with respect to image intensities,

$$\mathrm{NORM}_\rho(X) = \max\Bigl[0, \min\Bigl[1, \frac{X - P_\rho(X)}{P_{1-\rho}(X) - P_\rho(X)}\Bigr]\Bigr]. \quad (33)$$

[Figure 3 graphics not reproduced.]

Fig. 3. (a) Example α-cut in a small 2D image. (b) Binary mask. (c) α-cut after masking. (d) (Euclidean) Distance transform (for (c)). (e-f) Gradient approximation of the distance transform ∆DT.
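A direct NumPy rendering of the robust normalization (33) follows, assuming ρ is given as a fraction in [0, 0.5); the percentile call uses NumPy's 0–100 convention, and the test array is illustrative.

```python
import numpy as np

def norm_rho(image, rho=0.05):
    """Robust intensity normalization, Eq. (33)."""
    lo = np.percentile(image, 100.0 * rho)
    hi = np.percentile(image, 100.0 * (1.0 - rho))
    return np.clip((image - lo) / (hi - lo), 0.0, 1.0)

img = np.random.randn(64, 64)
out = norm_rho(img)
print(out.min(), out.max())   # values lie within [0, 1]
```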

V. PERFORMANCE ANALYSIS

We evaluate the performance of the proposed method, both for 2D and 3D images, in two different scenarios: (i) we perform a statistical study on synthetically generated images, where we seek to recover known transformations and measure the registration error by comparing the ground truth locations of known landmarks with the corresponding registered ones; (ii)


we apply the proposed framework to real image analysis tasks: landmark-based evaluation of registration of TEM images in 2D, and atlas-based segmentation evaluation of 3D MR images of brain.

To compare the proposed measure and registration method against the most commonly used alternative methods and similarity measures, we select the widely used ITK implementations of the optimization framework and similarity measures (SSD, PCC and MI) as the baseline of intensity-based registration accuracy. Note that the PCC measure is denoted Normalized Cross Correlation (NCC) in the ITK framework.

All experiments are performed on a workstation with a 6-core Intel i7-6800K, 3.4 GHz processor with 48 GB of RAM and 15 MB cache. The operating system used is Ubuntu 16.04 LTS. The compiler used to build the framework is g++ version 5.4.0 (20160609). Version 4.9 of the ITK framework is used for testing and evaluation.

A. Datasets

One biomedical 2D dataset and one medical 3D dataset are used for the evaluation.

1) TEM Images of Cilia (2D): The dataset of 20 images of cilia [12] is acquired with the MiniTEM⁴ imaging system. Each image is isotropic, of size 129×129 pixels, with a pixel-size of a few nm. An example is shown in Fig. 1. A particular challenge is the near-rotational symmetry of the object: 9 pairs of rings are located around a central pair of rings, which gives 9 plausible solutions for a registration problem. The alignment of the central pair can be taken into special consideration to reduce the number of solutions to two. The dataset comes with a set of 20 landmarks per image, indicating the position of each of the relevant structures to be detected and analysed – 20 rings (2 in the center and 18 in a circle around the center). The landmarks are produced by a domain expert and are only used for evaluation of the registrations.

2) LPBA40 (3D): LPBA40 [13] is a publicly available dataset of 40 3D images of brains of a diverse set of healthy individuals, acquired with MRI. The images are anisotropic, of size 256×124×256 voxels with voxel-size 0.86×1.5×0.86 mm³. The dataset comes with segmentations of the brains into 56 distinct regions marked by a medical expert, which are used in this study as ground-truth for evaluation. LPBA40 includes two atlases: the first 20 out of 40 MR images of brain in the dataset are used to generate one brain atlas by Symmetric Groupwise Normalization (SyGN) [32]; another atlas is created analogously, from the last 20 brains in the dataset. The atlases contain both a synthesized MR image and the fused label category in all the voxels, as well as a whole brain mask which may be used for brain extraction.

B. Evaluation Criteria

We evaluate accuracy and robustness of the registration methods in presence of noise, their robustness w.r.t. change of roles of reference and floating image (symmetry), and their speed. We quantify the performance of the observed frameworks in terms of the following quality measures:

⁴ The MiniTEM imaging system is developed by Vironova AB.

1) Average Error measure (AE): The registration result is quantified as the mean Euclidean distance between the sets of corresponding image corner landmarks L_R and T(L_F) in the reference image space, after transformation of the floating image corner landmarks L_F, where |L_R| is the number of landmarks (4 in 2D; 8 in 3D). The quality measure is defined as

$$\mathrm{AE}(T; L_R, L_F) = \frac{1}{|L_R|}\sum_{i=1}^{|L_R|} d_E\bigl(L_R(i), T(L_F(i))\bigr). \quad (34)$$

A slight variation of this measure, the Average Minimal Error (AME), is used in the real task of cilia registration:

$$\mathrm{AME}(T; L_R, L_F) = \frac{1}{|L_R|}\sum_{i=1}^{|L_R|} \min_{x \in L_F} d_E\bigl(L_R(i), T(x)\bigr). \quad (35)$$

For the central pair, the error is simply AME_CP = AME, whereas for the outer rings we utilize the knowledge that an odd (even) landmark should be matched with an odd (even) landmark of the other image. The error function for the outer rings, [12], is therefore defined as:

$$\mathrm{AME}_{\mathrm{Outer}}(T; L^{\mathrm{Odd}}_R, L^{\mathrm{Odd}}_F, L^{\mathrm{Even}}_R, L^{\mathrm{Even}}_F) = \frac{1}{2}\bigl(\mathrm{AME}(T; L^{\mathrm{Odd}}_R, L^{\mathrm{Odd}}_F) + \mathrm{AME}(T; L^{\mathrm{Even}}_R, L^{\mathrm{Even}}_F)\bigr). \quad (36)$$

2) Success Rate (SR): A registration is considered successful if its AE is below one voxel (pixel). The success rate (SR) at a given AE value corresponds to the ratio of successful registrations (w.r.t. the set of performed ones).

3) Symmetric Success Rate (SymSR): is defined as the ratio of performed registrations which are successful (i.e., AE ≤ 1) in both directions, i.e., when the roles of reference and floating image are exchanged.

4) Inverse Consistency Error (ICE), [33]: Given a set of interest X_A ⊆ A, the transformations T_AB : A → B, and T_BA : B → A, the ICE of this pair of transformations is

$$\mathrm{ICE}(T_{AB}, T_{BA}; X_A) = \frac{1}{|X_A|}\sum_{x \in X_A} d_E\bigl(T_{BA}(T_{AB}(x)), x\bigr). \quad (37)$$

We compute ICE considering all the points of the reference image for each of the cases where Symmetric Success is observed (AE ≤ 1 in both directions).
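For reference, a minimal sketch of the AE (34) and ICE (37) computations for 2D affine transforms in homogeneous form follows; the corner-landmark values and the example transform are illustrative, not taken from the experiments.

```python
import numpy as np

def apply(T, pts):
    """Apply a homogeneous affine matrix T to an (N, 2) array of points."""
    return (T @ np.hstack([pts, np.ones((len(pts), 1))]).T).T[:, :2]

def average_error(T, L_ref, L_flo):
    """Eq. (34): mean Euclidean distance between reference landmarks and
    transformed floating landmarks."""
    return np.linalg.norm(L_ref - apply(T, L_flo), axis=1).mean()

def inverse_consistency_error(T_ab, T_ba, X):
    """Eq. (37): mean distance between x and T_BA(T_AB(x))."""
    return np.linalg.norm(apply(T_ba, apply(T_ab, X)) - X, axis=1).mean()

corners = np.array([[0.0, 0.0], [0.0, 128.0], [128.0, 0.0], [128.0, 128.0]])
T = np.array([[1.0, 0.0, 0.3], [0.0, 1.0, -0.2], [0.0, 0.0, 1.0]])
T_inv = np.linalg.inv(T)
print(average_error(T, corners, corners),
      inverse_consistency_error(T, T_inv, corners))
```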

5) Jaccard Index for segmentation evaluation: For two binary sets, R1 and R2, the Jaccard Index is defined as

$$J(R_1, R_2) = \frac{|R_1 \cap R_2|}{|R_1 \cup R_2|}. \quad (38)$$
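A few-line sketch of (38) for two boolean label masks; the example masks are arbitrary.

```python
import numpy as np

def jaccard(R1, R2):
    """Eq. (38): |R1 ∩ R2| / |R1 ∪ R2| for boolean arrays."""
    return np.logical_and(R1, R2).sum() / np.logical_or(R1, R2).sum()

R1 = np.zeros((16, 16), dtype=bool); R1[2:10, 2:10] = True
R2 = np.zeros((16, 16), dtype=bool); R2[5:13, 5:13] = True
print(jaccard(R1, R2))
```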

6) Execution Time: We evaluate (i) the execution times required for one iteration in the registration procedure, i.e., the times needed to compute the distance (similarity) measure and its derivatives, with full sampling, and in full image resolution, between two distinct images from the same set, as well as (ii) the execution time for complete registrations.

C. Parameter Tuning

The distance measure and optimization method have a number of parameters which must be properly chosen. Synthetic tests indicated that the following values lead to good optimization performance: three pyramid levels with downsampling factors (4, 2, 1) and Gaussian smoothing σ = (5.0, 3.0, 0.0), max 3000 iterations per level, and an initial step-length λ = 0.5. The number of α-levels used is ℓ = 7, which has been shown to provide a reasonable trade-off between computational costs, sensitivity to significant variations in intensity, and robustness to noise [11]. The optimal value for ℓ is application-dependent; in essentially all observed cases, ℓ > 1 (non-crisp) outperforms a crisp (binarized) representation. The normalization percentile is normally 5%. This same parameter setting, if not stated differently, is used in all the tests, on both synthetic and real data.

D. Synthetic Tests

A synthetic evaluation framework is used to evaluate the performance of the proposed method, and to compare it with standard tools based on SSD, PCC, and MI, in a controlled environment. For this evaluation, we construct sets of transformed versions of a reference image and add (a new instance of) Gaussian noise to each generated image. The transformations are selected at random from a multivariate uniform distribution of rotations measured in degrees (1 angle for 2D images and 3 Euler angles for 3D images) and translations measured in fractions of the original image size.

1) 2D TEM Images of Cilia: Three sets of transformed images are built based on image Nr. 1 in the observed dataset, by applying on it the following three groups of transformations: Small, containing compositions of translations of up to 10% of image size (in any direction) and rotations by up to 10°; Medium, containing compositions of translations and rotations such that at least one of the parameters exceeds the range of Small, and falls within 10–20% of image size of translation (in at least one direction), or 10–20° of rotation; and Large, containing compositions of translations and rotations such that at least one of the parameters exceeds the range of Medium, and falls within 20–30% of image size of translation (in at least one direction), or 20–30° of rotation. The transformed images are also corrupted by additive Gaussian noise from N(0, 0.1²) (σ = 0.1, corresponding to a PSNR ≈ 20 dB). Each group of transformations is applied 1000 times, and the resulting images are registered to image Nr. 1, each time corrupted by a new instance of Gaussian noise.

To evaluate symmetry, we performed 1000 registrations of images transformed by randomly selected translations of up to 30% of image size, and rotations by up to 30°, and corrupted by additive noise from N(0, 0.1²). Each of the registrations was performed twice, with exchanged roles of reference image and floating image.

Intensity-based registration with gradient-descent optimization can be computationally demanding, requiring the distance function and its derivative for each iteration of the optimization procedure. The time to compute the distance and derivatives is directly proportional to the number of sampled points. We, therefore, evaluate the influence of the sampling fraction on registration success, observing registrations after Small transformations and added noise (with σ = 0.1), over a range of sampling fractions. For each evaluated sampling fraction, 1000 registrations are performed and SR and AE are computed for successful registrations (AE ≤ 1). No resolution pyramids are used for these tests.

TABLE I
REGISTRATION OF SYNTHETIC 2D IMAGES OF CILIA. THE TABLES SHOW SUCCESS RATE (SR), AVERAGE ERROR (AE) OF SUCCESSFUL REGISTRATIONS, SYMMETRIC SUCCESS RATE (SYMSR), AVERAGE INVERSE CONSISTENCY ERROR (ICE) AND AVERAGE RUNTIME FOR THE REGISTRATION WITH COMPLETE SAMPLING (A) AND WITH RANDOM SUBSAMPLING (B). SUCCESSFUL REGISTRATIONS (AE ≤ 1) OF TRANSFORMATIONS UP TO (AND INCLUDING) LARGE ARE CONSIDERED.

(a) Full sampling
Measure     SR     AE      SymSR  ICE     Time (s)
SSD         0.536  0.3086  0.313  0.2424  17.3
PCC         0.363  0.3413  0.249  0.4227  20.8
MI          0.440  0.3495  0.251  0.4518  18.3
d^R_αAMD    1.000  0.1295  1.000  0.0023  3.49

(b) 0.1 sampling fraction
Measure     SR     AE      SymSR  ICE     Time (s)
SSD         0.367  0.6270  0.186  0.5260  1.761
PCC         0.299  0.6364  0.152  0.5676  2.171
MI          0.283  0.6219  0.068  0.5974  2.083
d^R_αAMD    1.000  0.1311  1.000  0.0193  0.834

2) 3D MR Images of Brain: Three sets of transformed images are built based on image Nr. 1 in the observed dataset, by applying to it the following three groups of transformations: Small, containing compositions of translations of up to 10% of image size (in any direction) and rotations by up to 10° (around each of the rotation axes); Medium, containing compositions of translations and rotations such that at least one of the parameters exceeds the range of Small, and falls within 10–15% of image size of translation (in at least one direction), or 10–15° of rotation (around at least one rotation axis); and Large, containing compositions of translations and rotations such that at least one of the parameters exceeds the range of Medium, and falls within 15–20% of image size of translation (in at least one direction), or 15–20° of rotation (around at least one rotation axis). The transformed images are also corrupted by additive Gaussian noise from N(0, 0.1²). Each group of transformations is applied 200 times, and the resulting images are registered to image Nr. 1, each time corrupted by a new instance of Gaussian noise.

E. Results of Synthetic Tests

1) 2D TEM Images of Cilia: Figure 4 shows the distributions of registration errors (AE) for the three transformation classes. Superiority of the proposed measure, and the corresponding registration framework, is particularly clear for Medium and Large transformations; it reaches a 100% success rate, with subpixel accuracy, whereas the competitors not only exhibit considerably lower accuracy, but also a much lower success rate, i.e., they completely fail in a large number of cases.

Overall registration performance is summarized in Table I, for complete sampling (a), and for random sampling of 10% of the points (b). The proposed method has a 100% success rate and also a 100% symmetric success rate. The other observed measures exhibit much lower success rate and poor symmetry scores; the second best, SSD, succeeds in 54% of the cases, and succeeds symmetrically in only 31% of the cases. The registration error for successful registrations is considerably smaller for the proposed method, while the execution time is considerably lower. The reduced sampling fraction in (b) has a small impact on the proposed method while substantially degrading the performance of the other measures.


[Figure 4 graphics not reproduced. Panels: (a) Reference image; (b) Reference mask; (c) Floating image; (d) Floating mask; (e) Small transformations; (f) Medium transformations; (g) Large transformations. Plotted measures: SSD, PCC, MI, d^R_αAMD.]

Fig. 4. Registration error for 2D TEM images of cilia with Gaussian noise of σ = 0.1 added, for three observed transformation classes. (a-d) Examples of reference-floating image pair with corresponding masks. (e-g) Cumulative histograms of the fraction of registrations with registration error AE below a given value (left and up is better). The red vertical line shows the chosen threshold for success, AE ≤ 1.

Figure 5 shows registration performance for varying sampling fractions; Small transformations, in the presence of noise (σ = 0.1), are considered. We observe that the registration performance flattens and stabilizes at approximately 0.01 sampling fraction (1% of the points). We conclude that previous findings of [17], [30], suggesting that random subsampling provides good performance even with very small sampling fractions, apply well to the proposed measure.

2) 3D MR Images of Brain: Figure 6 shows the observed distributions of registration errors (AE) for the three transformation classes, and clearly confirms that the proposed method is robust, with high performance even for larger transformations, while the magnitude of the transformation has a substantial negative effect on the performance of the other observed measures.

Figure 7 presents bar plots corresponding to the performed synthetic tests on the LPBA40 dataset, consisting of 200 registrations of images after up to (and including) Large transformations (with additive Gaussian noise, N(0, 0.1²)). Successful registrations (AE ≤ 1) are observed. Here as well, the proposed method delivers a 100% success rate, whereas the second best, SSD, succeeds in only 33% of the cases. The registration error for successful registrations is the smallest for the proposed method. We observe a relative increase in execution time of the proposed registration framework in the 3D case, where it is slightly slower than the other measures.

TABLE II
TIME ANALYSIS OF DISTANCE (SIMILARITY) VALUE AND DERIVATIVE COMPUTATIONS FOR A FULL RESOLUTION IMAGE, REPEATED TO GENERATE STATISTICS. BOLD MARKS THE FASTEST MEASURE IN EACH CATEGORY (2D AND 3D). THE 2D IMAGES ARE OF SIZE 1600 × 1278, AND THE 3D IMAGES ARE OF SIZE 256 × 124 × 256.

              Cilia (2D) [s]          Brain (3D) [s]
Measure       Mean      Std.dev.      Mean      Std.dev.
dRαAMD        0.270     0.010         3.116     0.098
SSD           0.718     0.036         4.782     0.066
PCC           1.191     0.026         8.562     0.002
MI            0.890     0.025         5.699     0.002

3) Execution Time Analysis: The number of iterations required for convergence of the optimization (registration) typically ranges from 1000 to 3000. The measures SSD, PCC, and MI use cubic spline interpolation. Lookups from the distance maps for dRαAMD are done using linear interpolation. Table II shows the mean (and standard deviation) execution time of one iteration, which includes the computation of the measures and their derivatives, repeated 1000 times for 2D, and 50 times for 3D, affine image registrations. We observe that the proposed measure is the fastest per iteration, both in 2D and 3D. Note that these execution time measurements exclude pre-processing.
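A timing protocol of this kind can be sketched as follows; `ssd_value_and_derivative` is a hypothetical stand-in for a measure's value-plus-derivative evaluation, used only to make the harness runnable.

```python
import time
import numpy as np

def time_measure(value_and_derivative, image_pair, repeats=1000):
    """Mean and standard deviation of the wall-clock time of one
    value-plus-derivative evaluation, repeated `repeats` times."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        value_and_derivative(*image_pair)
        samples.append(time.perf_counter() - t0)
    return float(np.mean(samples)), float(np.std(samples))

# Hypothetical stand-in: SSD value and its gradient w.r.t. the floating image.
def ssd_value_and_derivative(ref, flo):
    diff = flo - ref
    return float(np.sum(diff * diff)), 2.0 * diff

ref = np.random.rand(1600, 1278)
flo = np.random.rand(1600, 1278)
print(time_measure(ssd_value_and_derivative, (ref, flo), repeats=10))
```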

F. Evaluation on Real Applications

1) Registration of Cilia: Registration of multiple cilia instances detected in a single TEM sample, for enhancement of diagnostically relevant sub-structures, requires a pixel-accurate and robust method which is able to overcome the challenges posed by the near-rotational symmetry of a cilium. At most two of the possible solutions properly align the central pair, which is vital for a successful reconstruction.


We compare the performance of the proposed method with the reported results of a previous study [12], which uses intensity-based registration with PCC as the similarity measure. We follow the general protocol described in [12] and perform, as a first step, a multi-start rigid registration (parameterized by angle θ in radians, and translation t = (tx, ty)), followed, in a second step, by an affine registration initialized with the best (lowest final distance) of the 9 rigid registrations.
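The two-step protocol can be sketched as follows; `register_rigid` and `register_affine` are hypothetical placeholders (not the released framework's API), each assumed to return a parameter set and the final distance value, and the assumption that the nine starts differ only in the initial angle is ours for illustration.

```python
import numpy as np

def multistart_rigid_then_affine(ref, flo, register_rigid, register_affine, n_starts=9):
    """Two-step protocol: run rigid registration from several initial angles,
    keep the result with the lowest final distance, then refine with affine."""
    best_params, best_dist = None, np.inf
    for theta0 in np.linspace(0.0, 2.0 * np.pi, n_starts, endpoint=False):
        params, dist = register_rigid(ref, flo, theta0=theta0, t0=(0.0, 0.0))
        if dist < best_dist:
            best_params, best_dist = params, dist
    return register_affine(ref, flo, init=best_params)

# Hypothetical stubs so the sketch runs end-to-end.
def register_rigid(ref, flo, theta0, t0):
    return {"theta": theta0, "t": t0}, abs(theta0 - 0.7)   # pretend 0.7 rad is the best start
def register_affine(ref, flo, init):
    return {"A": np.eye(2), "t": np.asarray(init["t"]), "init": init}

print(multistart_rigid_then_affine(None, None, register_rigid, register_affine))
```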

No resolution pyramids are used, since they were observed to interfere with the multi-start approach (by facilitating large movements). The registrations are performed in full resolution, without stochastic subsampling. For the rigid registration we use a small circular binary mask with a radius of 24 pixels, positioned in the center, combined with a squared circular Hann window function. The affine registration is performed using a circular binary mask with a radius of 52 pixels; the mask removes the outside background and the outer plasma, which is not helpful in guiding the registration. No additional weight-mask is used for the affine registration. A step length of 0.1 was used for the rigid and 0.5 for the affine registration. We use ℓ = 7. The normalization percentile is set to 0% for the rigid stage and 1% for the affine stage.
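A minimal sketch of the masks and the weight window described above follows; the interpretation of the "squared circular Hann window" as a radial Hann profile, squared and truncated at the mask radius, is an assumption made for illustration.

```python
import numpy as np

def circular_mask(shape, radius, center=None):
    """Binary mask that is True inside a centered circle of the given radius (in pixels)."""
    h, w = shape
    cy, cx = center if center is not None else ((h - 1) / 2.0, (w - 1) / 2.0)
    yy, xx = np.ogrid[:h, :w]
    r = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    return r <= radius

def squared_circular_hann(shape, radius, center=None):
    """Radial Hann window, squared, reaching zero at the circle boundary
    (one plausible reading of the 'squared circular Hann window' weight)."""
    h, w = shape
    cy, cx = center if center is not None else ((h - 1) / 2.0, (w - 1) / 2.0)
    yy, xx = np.ogrid[:h, :w]
    r = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    hann = 0.5 * (1.0 + np.cos(np.pi * np.clip(r / radius, 0.0, 1.0)))
    return hann ** 2

mask = circular_mask((128, 128), radius=24)
weight = mask * squared_circular_hann((128, 128), radius=24)
print(int(mask.sum()), float(weight.max()))
```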

A feature-based approach is also included in this performance evaluation. The SIFT feature-detector [7], with RANSAC [34] as the model fitting and correspondence-point filtering method, as implemented in FIJI, is evaluated with both rigid and affine transformation models. The tests are performed with, and without, circular masks (as described above), and with systematically varied parameter settings (using grid search): the initial Gaussian blur is tested with values in the range [0.4, 2.4], with steps of 0.4; the feature descriptor size is tested with 1, 2, 4, 6, 8; the steps per scale octave are tested with 1, 2, 3, 4, 5. The other available parameters are set to their default values, since we observed insensitivity to those parameters in our preliminary tests.
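The parameter grid for this baseline can be written out explicitly; the sketch below only enumerates the combinations, and `run_fiji_sift_registration` is a hypothetical placeholder for the actual FIJI runs.

```python
import itertools

# Parameter grid for the SIFT baseline (values taken from the text above).
initial_blur = [0.4, 0.8, 1.2, 1.6, 2.0, 2.4]
descriptor_size = [1, 2, 4, 6, 8]
steps_per_octave = [1, 2, 3, 4, 5]

grid = list(itertools.product(initial_blur, descriptor_size, steps_per_octave))
print(len(grid))  # 150 parameter combinations per transformation model and mask setting

# for blur, size, steps in grid:
#     run_fiji_sift_registration(blur, size, steps)   # hypothetical call into FIJI
```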

2) Atlas-based Segmentation (LPBA40): In [35], a protocol for the evaluation of distance/similarity measures in the context of image registration was proposed. The protocol starts with affine registration, for which results are reported, and then proceeds to deformable registration. Since this study focuses on the development of an affine (linear) registration framework based on the proposed distance measure, we compare with the reported affine-only performance; an improved affine registration is of great significance, since a very high correlation between the performance of the affine registration and that of the subsequent deformable registration has been established.

[Figure 5: success rate (left axis) and mean pixel error of successful registrations (right axis) as functions of the sampling fraction (0.001-1.0).]

Fig. 5. (Left/Blue) SR for registrations of cilia images, and (Right/Red) AE of the successful registrations, as functions of the sampling fraction for the proposed method. Both measures improve (almost) monotonically with the sampling fraction and flatten out after approximately 0.01.

TABLE III
REGISTRATION OF CILIA: PERFORMANCE OF THE PROPOSED METHOD COMPARED TO REFERENCE RESULTS, SHOWN AS THE 'MEAN (STD-DEV)' OF THE REGISTRATION ERROR (IN PIXELS) W.R.T. THE CONSIDERED SETS OF LANDMARKS FOR THE 19 REGISTRATIONS. 'R' DENOTES RIGID, 'A' DENOTES AFFINE, AND 'D' DENOTES DEFORMABLE REGISTRATION. BOLD MARKS THE SMALLEST ERROR FOR EACH SET OF LANDMARKS.

Method                       Registration Error
Measure      Transform       Central Pair     Outer           All
-            Identity        4.32 (0.78)      5.75 (3.49)     5.61 (3.09)
PCC [12]     R               2.6 (1.5)        -               -
             R+A             -                3.27 (1.75)     -
             R+A+D           2.79 (1.84)      2.30 (1.80)     2.35 (1.82)
dRαAMD       R               2.65 (0.83)      6.49 (2.64)     6.10 (2.41)
             R+A             2.03 (1.04)      1.64 (0.36)     1.68 (0.29)


We start from the two atlases created using the Advanced Normalization Tools (ANTs) registration software suite and the open-source evaluation script provided in the reference study [35]. We utilize the atlas created using Mutual Information, since that is the one found in [35] to perform best and it is used as the basis for the whole deformable registration study. Two-fold cross-validation is utilized; the first atlas is registered to the last 20 brain images and the second atlas is registered to the first 20 brains, hence all registrations are done with brains that did not contribute to the creation of the atlas.

The multi-label segmentations defined by the atlas are transformed using the transformation parameters found during the registration and compared to the ground-truth segmentations for each brain. The Jaccard Index [36] is calculated per region, as well as for the entire brain mask.
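The per-region overlap computation can be sketched as follows, assuming integer-labeled segmentation volumes; the toy volumes and label values are illustrative only.

```python
import numpy as np

def jaccard_per_label(seg_a, seg_b, labels):
    """Jaccard index |A ∩ B| / |A ∪ B| for each region label, given two label volumes."""
    scores = {}
    for lab in labels:
        a = (seg_a == lab)
        b = (seg_b == lab)
        union = np.logical_or(a, b).sum()
        scores[lab] = np.logical_and(a, b).sum() / union if union > 0 else np.nan
    return scores

# Hypothetical toy volumes with background (0) and one region label (21).
gt = np.zeros((8, 8, 8), dtype=int); gt[:4] = 21
pred = np.zeros((8, 8, 8), dtype=int); pred[:3] = 21
print(jaccard_per_label(pred, gt, labels=[0, 21]))   # {0: 0.8, 21: 0.75}
```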

For the proposed method based on dRαAMD we use ℓ = 7, normalization percentiles of 5%, N = 3000, a sampling fraction of 0.05, and circular Hann windows as weight-masks.

G. Results of Real Applications

1) Results of Registration of Cilia: The performance of the proposed method, together with the best previously published results, is shown in Tab. III. The table shows the mean and standard deviation of the registration error (AME, in pixels) for the 19 registrations, for the three considered sets of landmarks: the Central pair, the Outer rings, and All (1+9) ring pairs. 'R' denotes rigid; 'A' denotes affine; and 'D' denotes deformable registration.
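One straightforward way to compute such a landmark-based error is sketched below, assuming it is the mean Euclidean distance between reference landmarks and transformed floating landmarks; the exact AME definition follows the evaluation protocol of the cited study, so this is an illustration rather than its specification.

```python
import numpy as np

def landmark_error(landmarks_ref, landmarks_flo, A, t):
    """Mean Euclidean distance (in pixels) between reference landmarks and
    floating landmarks mapped by the recovered affine transform x -> A @ x + t."""
    mapped = landmarks_flo @ A.T + t
    return float(np.mean(np.linalg.norm(mapped - landmarks_ref, axis=1)))

# Hypothetical example: the identity transform leaves the original misalignment.
ref = np.array([[10.0, 12.0], [40.0, 43.0]])
flo = np.array([[12.0, 14.0], [44.0, 46.0]])
print(landmark_error(ref, flo, np.eye(2), np.zeros(2)))  # ≈ 3.91 px
```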

The original study includes deformable registration as a final stage, after the rigid and affine steps. The framework presented here, based on dRαAMD, includes linear (rigid and affine), but not deformable, registration. However, as the results included in Tab. III confirm, the proposed method outperforms the previous state of the art, even when using only rigid and affine registrations.

We note that with only rigid registration we improve the alignment of the central pair while degrading the alignment of the outer rings. After the affine registration, the alignment of the central pair is improved further, plausibly due to the less constrained transformation model of affine compared to rigid, and we observe that the alignment of the outer rings and the total alignment are improved substantially.


[Figure 6: (a) Reference (XY). (b) Reference (XZ). (c) Reference (YZ). (d) Floating (XY). (e) Floating (XZ). (f) Floating (YZ). (g) Small transformations. (h) Medium transformations. (i) Large transformations. Panels (g-i) plot success rate against registration error (px) for SSD, PCC, MI, and dRαAMD.]

Fig. 6. Registration error for 3D MR images of the brain with Gaussian noise, σ = 0.1, added, for the three observed transformation magnitude classes. (a-f) Example of a reference-floating image pair in slices along each major axis. (g-i) Cumulative histograms of the fraction of registrations with registration error AE below a given value (left and up is better). The red vertical line shows the chosen threshold of success, AE ≤ 1.

[Figure 7: bar plots. (a) Success rate (%): SSD 33.0, PCC 25.5, MI 17.5, dRαAMD 100.0. (b) Mean error (px): SSD 0.4104, PCC 0.4927, MI 0.4511, dRαAMD 0.3391. (c) Execution time (s): SSD 123.3, PCC 143.7, MI 141.8, dRαAMD 179.1.]

Fig. 7. Results of synthetic registration of 3D brain images from the LPBA40 dataset. The plots show (a) the success rate (SR), (b) the mean error (ME) for successful registrations, and (c) the average runtime in seconds for registration with random subsampling at a 0.01 sampling fraction. (a) Higher is better. (b-c) Lower is better. Bold marks the best result w.r.t. each statistic.


The feature-based method is omitted from Tab. III due to complete failure on all 19 image registration tasks, with both rigid and affine transformations; either too few matching points were detected, or the ones found resulted in large erroneous transformations. One such failed registration example is illustrated in Fig. 1.

2) Results of Atlas-Based Segmentation of Brains: Table IV shows the results of atlas-based brain segmentation. The mean Jaccard Index is computed for each of the brain regions, for dRαAMD and MI, with affine registration as reported in [35]. For dRαAMD, the mean and standard deviation are displayed; for the comparative results (MIAff, [35]), only the mean was reported.

We observe that for the whole brain mask, for the aggregated overlap, and for 43 out of the 56 distinct regions, the proposed measure outperforms the reported performance obtained with the MI metric; MI was the best performing measure out of the three evaluated in [35].

VI. DISCUSSION

Compared to the traditional similarity measures (SSD, PCC, MI), the proposed measure and the associated registration method require substantial amounts of memory to store the auxiliary data structures. A single 3D registration of two MR images of brains may require approximately 4 GB of working memory with a reasonable set of parameters; contemporary machines for high-end data processing typically have much more memory than 4 GB, but this requirement can affect how many registrations can be performed in parallel on a single machine.

VII. CONCLUSION

In this study we have adapted a family of distance measures [11] to gradient-descent-based image registration, for 2D and 3D images. We have shown that such an extension is feasible and that the very good performance of the measures observed previously for object recognition and template matching, and their property of a large catchment basin for local optimization, also hold in the context of registration. This has been shown by evaluating the method in four main ways: (i) on synthetic tests, (ii) by execution time measurements, (iii) on registration of TEM images of cilia for multi-image super-resolution reconstruction, and (iv) on atlas-based segmentation with annotated MR brain images.


TABLE IV
RESULTS OF ATLAS-BASED BRAIN SEGMENTATION. THE TABLE SHOWS THE MEAN JACCARD INDEX FOR EACH OF THE BRAIN REGIONS FOR dRαAMD AND MUTUAL INFORMATION WITH AFFINE REGISTRATION AS REPORTED IN [35]. FOR dRαAMD, MEAN AND STD. DEV. ARE DISPLAYED; FOR THE COMPARATIVE RESULTS (MIAFF), ONLY MEAN WAS REPORTED.

LPBA40 Label                       dRαAMD              MIAff [35]
All LPBA Data                      0.595 ± 0.0187      0.554
Brain                              0.922 ± 0.0082      0.905
21 L superior frontal gyrus        0.690 ± 0.0351      0.708
22 R superior frontal gyrus        0.683 ± 0.0322      0.748
23 L middle frontal gyrus          0.672 ± 0.0345      0.536
24 R middle frontal gyrus          0.663 ± 0.0451      0.513
25 L inferior frontal gyrus        0.590 ± 0.0467      0.569
26 R inferior frontal gyrus        0.591 ± 0.0562      0.550
27 L precentral gyrus              0.560 ± 0.0579      0.503
28 R precentral gyrus              0.550 ± 0.0584      0.508
29 L middle orbitofrontal gyrus    0.529 ± 0.0723      0.505
30 R middle orbitofrontal gyrus    0.522 ± 0.0569      0.484
31 L lateral orbitofrontal gyrus   0.434 ± 0.0828      0.551
32 R lateral orbitofrontal gyrus   0.421 ± 0.0846      0.564
33 L gyrus rectus                  0.507 ± 0.0571      0.503
34 R gyrus rectus                  0.536 ± 0.0712      0.485
41 L postcentral gyrus             0.479 ± 0.0739      0.490
42 R postcentral gyrus             0.474 ± 0.0653      0.463
43 L superior parietal gyrus       0.563 ± 0.0509      0.470
44 R superior parietal gyrus       0.569 ± 0.0532      0.470
45 L supramarginal gyrus           0.502 ± 0.0719      0.504
46 R supramarginal gyrus           0.510 ± 0.0702      0.463
47 L angular gyrus                 0.509 ± 0.0782      0.506
48 R angular gyrus                 0.520 ± 0.0527      0.472
49 L precuneus                     0.525 ± 0.0613      0.546
50 R precuneus                     0.541 ± 0.0610      0.540
61 L superior occipital gyrus      0.424 ± 0.0825      0.413
62 R superior occipital gyrus      0.409 ± 0.0665      0.399
63 L middle occipital gyrus        0.516 ± 0.0686      0.421
64 R middle occipital gyrus        0.512 ± 0.0573      0.397
65 L inferior occipital gyrus      0.448 ± 0.0967      0.484
66 R inferior occipital gyrus      0.451 ± 0.0914      0.492
67 L cuneus                        0.442 ± 0.1087      0.372
68 R cuneus                        0.445 ± 0.0889      0.388
81 L superior temporal gyrus       0.574 ± 0.0478      0.514
82 R superior temporal gyrus       0.586 ± 0.0446      0.498
83 L middle temporal gyrus         0.513 ± 0.0580      0.481
84 R middle temporal gyrus         0.540 ± 0.0495      0.473
85 L inferior temporal gyrus       0.509 ± 0.0601      0.462
86 R inferior temporal gyrus       0.534 ± 0.0572      0.460
87 L parahippocampal gyrus         0.546 ± 0.0743      0.556
88 R parahippocampal gyrus         0.535 ± 0.0630      0.544
89 L lingual gyrus                 0.519 ± 0.0943      0.421
90 R lingual gyrus                 0.541 ± 0.0678      0.420
91 L fusiform gyrus                0.542 ± 0.0801      0.488
92 R fusiform gyrus                0.548 ± 0.0646      0.453
101 L insular gyrus                0.625 ± 0.0504      0.378
102 R insular gyrus                0.611 ± 0.0639      0.420
121 L cingulate gyrus              0.553 ± 0.0504      0.491
122 R cingulate gyrus              0.545 ± 0.0588      0.508
161 L caudate                      0.583 ± 0.1002      0.494
162 R caudate                      0.574 ± 0.0952      0.502
163 L putamen                      0.626 ± 0.0581      0.559
164 R putamen                      0.636 ± 0.0668      0.561
165 L hippocampus                  0.603 ± 0.0836      0.633
166 R hippocampus                  0.604 ± 0.0810      0.643
181 cerebellum                     0.813 ± 0.0416      0.659
182 brainstem                      0.778 ± 0.0447      0.660

We observe that the proposed method provides outstanding performance for intensity-based affine registration in terms of robustness, accuracy, and symmetry. It is also faster than, or similar in speed to, the commonly used measures, which allows for its practical application. The framework developed in this study operates on single-layer (e.g., gray-scale) images, but can be extended to multi-layer images, such as color images, either by considering a linear sum of distances, or by more sophisticated methods based on simultaneous presence or absence of membership in the multiple layers [23], [37]. Future work includes extending the measures to non-linear (deformable), as well as multi-modal, registration.

ACKNOWLEDGMENTS

The authors would like to thank Dr. Ida-Maria Sintorn for providing the cilia dataset and the landmarks used for the evaluation in Sec. V-F1. This work is supported by VINNOVA, MedTech4Health grants 2016-02329 and 2017-02447, Swedish Research Council grants 2015-05878 and 2017-04385, and the Ministry of Education, Science, and Technological Development of the Republic of Serbia (projects ON174008 and III44006).

REFERENCES

[1] J. Maintz and M. A. Viergever, "A survey of medical image registration," Medical Image Analysis, vol. 2, no. 1, pp. 1–36, 1998.

[2] M. A. Viergever, J. A. Maintz, S. Klein, K. Murphy, M. Staring, and J. P. Pluim, "A survey of medical image registration – under review," Medical Image Analysis, vol. 33, pp. 140–144, 2016.

[3] B. Zitova and J. Flusser, "Image registration methods: a survey," Image and Vision Computing, vol. 21, no. 11, pp. 977–1000, 2003.

[4] F. P. Oliveira and J. M. R. Tavares, "Medical image registration: a review," Computer Methods in Biomechanics and Biomedical Engineering, vol. 17, no. 2, pp. 73–93, 2014.

[5] A. Sotiras, C. Davatzikos, and N. Paragios, "Deformable medical image registration: A survey," IEEE Transactions on Medical Imaging, vol. 32, no. 7, pp. 1153–1190, 2013.

[6] S. Matl, R. Brosig, M. Baust, N. Navab, and S. Demirci, "Vascular image registration techniques: a living review," Medical Image Analysis, vol. 35, pp. 1–17, 2017.

[7] D. G. Lowe, "Object recognition from local scale-invariant features," in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2. IEEE, 1999, pp. 1150–1157.

[8] B. Berkels, P. Binev, D. A. Blom, W. Dahmen, R. C. Sharpley, and T. Vogt, "Optimized imaging using non-rigid registration," Ultramicroscopy, vol. 138, pp. 46–56, 2014.

[9] B. Fischer and J. Modersitzki, "Ill-posed medicine – an introduction to image registration," Inverse Problems, vol. 24, no. 3, p. 034008, 2008.

[10] D. Skerl, B. Likar, and F. Pernus, "A protocol for evaluation of similarity measures for rigid registration," IEEE Transactions on Medical Imaging, vol. 25, no. 6, pp. 779–791, June 2006.

[11] J. Lindblad and N. Sladoje, "Linear time distances between fuzzy sets with applications to pattern matching and classification," IEEE Transactions on Image Processing, vol. 23, no. 1, pp. 126–136, 2014.

[12] A. Suveer, N. Sladoje, J. Lindblad, A. Dragomir, and I.-M. Sintorn, "Enhancement of cilia sub-structures by multiple instance registration and super-resolution reconstruction," in Scandinavian Conference on Image Analysis. Springer, 2017, pp. 362–374.

[13] D. W. Shattuck, M. Mirza, V. Adisetiyo, C. Hojatkashani, G. Salamon, K. L. Narr, R. A. Poldrack, R. M. Bilder, and A. W. Toga, "Construction of a 3D probabilistic atlas of human cortical structures," NeuroImage, vol. 39, no. 3, pp. 1064–1080, 2008.

[14] B. B. Avants, N. J. Tustison, M. Stauffer, G. Song, B. Wu, and J. C. Gee, "The Insight ToolKit image registration framework," Frontiers in Neuroinformatics, vol. 8, p. 44, 2014.

[15] L. A. Zadeh, "Fuzzy sets," Information and Control, vol. 8, no. 3, pp. 338–353, 1965.


[16] J. V. Hajnal, N. Saeed, E. J. Soar, A. Oatridge, I. R. Young, and G. M. Bydder, "A registration and interpolation procedure for subvoxel matching of serially acquired MR images," Journal of Computer Assisted Tomography, vol. 19, no. 2, pp. 289–296, 1995.

[17] P. Viola and W. M. Wells III, "Alignment by maximization of mutual information," International Journal of Computer Vision, vol. 24, no. 2, pp. 137–154, 1997.

[18] M. Irani and S. Peleg, "Improving resolution by image registration," CVGIP: Graphical Models and Image Processing, vol. 53, no. 3, pp. 231–239, 1991.

[19] A. Rosenfeld, Multiresolution Image Processing and Analysis. Springer Science & Business Media, 2013, vol. 12.

[20] J. Lindblad, V. Curic, and N. Sladoje, "On set distances and their application to image registration," in Proceedings of the International Symposium on Image and Signal Processing and Analysis, Sept 2009, pp. 449–454.

[21] T. Eiter and H. Mannila, "Distance measures for point sets and their computation," Acta Informatica, vol. 34, no. 2, pp. 109–133, 1997.

[22] P. Brass, "On the nonexistence of Hausdorff-like metrics for fuzzy sets," Pattern Recognition Letters, vol. 23, no. 1-3, pp. 39–43, 2002.

[23] J. Ofverstedt, N. Sladoje, and J. Lindblad, "Distance between vector-valued fuzzy sets based on intersection decomposition with applications in object detection," in Int. Symp. on Mathematical Morphology and Its Applications to Signal and Image Processing. Springer, 2017, pp. 395–407.

[24] Y. Rubner, C. Tomasi, and L. J. Guibas, "The earth mover's distance as a metric for image retrieval," International Journal of Computer Vision, vol. 40, no. 2, pp. 99–121, 2000.

[25] W. Wang, D. Slepcev, S. Basu, J. A. Ozolek, and G. K. Rohde, "A linear optimal transportation framework for quantifying and visualizing variations in sets of images," International Journal of Computer Vision, vol. 101, no. 2, pp. 254–269, 2013.

[26] O. Pele and M. Werman, "Fast and robust earth mover's distances," in ICCV, vol. 9, 2009, pp. 460–467.

[27] S. Haker, L. Zhu, A. Tannenbaum, and S. Angenent, "Optimal mass transport for registration and warping," International Journal of Computer Vision, vol. 60, no. 3, pp. 225–240, 2004.

[28] T. ur Rehman, E. Haber, G. Pryor, J. Melonakos, and A. Tannenbaum, "3D nonrigid registration via optimal mass transport on the GPU," Medical Image Analysis, vol. 13, no. 6, pp. 931–940, 2009.

[29] S. Ruder, "An overview of gradient descent optimization algorithms," arXiv preprint arXiv:1609.04747, 2016.

[30] S. Klein, M. Staring, and J. P. Pluim, "Evaluation of optimization methods for nonrigid medical image registration using mutual information and B-splines," IEEE Transactions on Image Processing, vol. 16, no. 12, pp. 2879–2890, 2007.

[31] C. R. Maurer, R. Qi, and V. Raghavan, "A linear time algorithm for computing exact Euclidean distance transforms of binary images in arbitrary dimensions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 265–270, 2003.

[32] B. B. Avants, P. Yushkevich, J. Pluta, D. Minkoff, M. Korczykowski, J. Detre, and J. C. Gee, "The optimal template effect in hippocampus studies of diseased populations," NeuroImage, vol. 49, no. 3, pp. 2457–2466, 2010.

[33] G. E. Christensen and H. J. Johnson, "Consistent image registration," IEEE Transactions on Medical Imaging, vol. 20, no. 7, pp. 568–582, 2001.

[34] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.

[35] B. B. Avants, N. J. Tustison, G. Song, P. A. Cook, A. Klein, and J. C. Gee, "A reproducible evaluation of ANTs similarity metric performance in brain image registration," NeuroImage, vol. 54, no. 3, pp. 2033–2044, 2011.

[36] A. A. Taha and A. Hanbury, "Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool," BMC Medical Imaging, vol. 15, no. 1, p. 29, 2015.

[37] N. Sladoje and J. Lindblad, "Distance between vector-valued representations of objects in images with application in object detection and classification," in International Workshop on Combinatorial Image Analysis. Springer, 2017, pp. 243–255.

Johan Öfverstedt received the B.Sc. and M.Sc. degrees in computer science in 2012 and 2017, respectively, and has been pursuing the Ph.D. degree in computerized image processing at the Centre for Image Analysis, Uppsala University, Sweden, since 2017. His research interests include distance measures, image registration, object recognition, and optimization.

Joakim Lindblad received the M.Sc. degree in engineering physics and the Ph.D. degree in computerized image analysis from Uppsala University, Sweden, in 1997 and 2003, respectively. He is currently a Researcher at the Centre for Image Analysis, Uppsala University, Sweden; a Senior Research Associate at the Mathematical Institute of the Serbian Academy of Sciences and Arts, Serbia; and Head of Research at Topgolf Sweden AB, Stockholm, Sweden. His research interests include the development of general and robust methods for image processing and analysis.

Nataša Sladoje received the B.Sc. and M.Sc. degrees in mathematics from the Faculty of Science, University of Novi Sad, Serbia, in 1992 and 1998, respectively, and the Ph.D. degree in computerized image analysis from the Centre for Image Analysis, Swedish University of Agricultural Sciences, Uppsala, Sweden, in 2005. She is a Senior Lecturer at the Centre for Image Analysis, Uppsala University, Sweden. Her research interests include the theoretical development of image analysis methods with applications in biomedicine and medicine.