

ASPRS 2006 Annual Conference Reno, Nevada May 1-5, 2006

LOCATING TARGETS UNDER PERSPECTIVE PROJECTION WITH GENETIC ALGORITHMS AND TABU SEARCH

Yong Hu 1, Huayi Wu 2, Ruisheng Wang 3, Benoit St-Onge 1

1 Department of Geography, University of Quebec at Montreal 2 LIESMARS, Wuhan University

3 Department of Earth and Space Science, York University E-mails: {Hu.Yong; [email protected]; [email protected]; [email protected]

ABSTRACT

The fundamental aspects of a computational theory of automatic target recognition include the feature space, imaging geometry, transformation space, matching measure and search strategy. Under this framework, we propose a hierarchical hybrid genetic matching algorithm (HHGMA) for locating targets under planar perspective projection. Assuming planar perspective imaging geometry, the six relative orientation elements are converted to the more intuitive transform parameters (i.e., translation, scale and three rotations). The discretization steps of these parameters are derived rigorously, in contrast to the empirical values adopted by other researchers. An adaptive and robust Hausdorff distance is used to measure the similarity between the real-time model and the reference image. The HHGMA searches the six-dimensional transformation space using genetic algorithms and also incorporates knowledge specific to the target location problem into the local improvement procedure. Based on experimental results for two and six transformation parameters, the performance of our matching algorithm, including its stability, reliability, convergence and location errors, is analyzed. Results on aerial images show that our algorithm is competent for urban target recognition and localization.

INTRODUCTION

Target recognition and localization use the pictorial information of an object to determine its geometric and physical properties. The core idea is matching, that is, comparison. Effective matching algorithms for this problem can considerably improve similar application tasks, such as target tracking and image registration. Brown (1992) analyzed the factors involved in image matching methods and concluded that an image matching algorithm is composed of the following four components: (1) the feature space, which represents the information used for matching; (2) the transformation space, also called the search space, which is the set of all possible transformations used to align the template and the reference image; its dimension is determined by the degrees of freedom of the orientation parameters of the imaging geometry; (3) the matching measure, the index that represents the degree of similarity or dissimilarity (for example, the Euclidean distance is a measure that increases with dissimilarity); and (4) the search strategy, the way the transformation space is searched, which decides how to choose a transformation that obtains the best result under a given matching measure.

In this paper, we take missile navigation as the study example. Usually a cruise missile is loaded with digital topographic maps (called the reference image, I) along the planned flying route. During flight, its missile-borne imaging sensor (having certain roll, pitch and rotation) takes a series of real-time nadir images (called the real-time model, M). The digital scene matching area correlator (DSMAC) then compares those models with the reference data to determine whether the real-time model, or a part of it, appears in the reference data. This determines the missile's attitude, which is then used to correct the attack direction. In this way, the task of self-location and ground object recognition is completed. Our task is to find the orientation parameters of the model relative to the reference data by developing a point-set based template matching algorithm.

In general, the reference data covers the object to be located and its surroundings, while the real-time model contains only a specific object and some background (see Figure x). Due to differences in the imaging and photographic processing conditions (e.g., sensor, exposure station, time and so on), the real-time model likely differs in the grey values of pixels and their distributions. These factors mainly result in five types of differences (Brown, 1992; Li, 1997): (1) compared with the reference imaging system, the real-time imaging system has a different orientation, including translation, roll, pitch and rotation, which causes changes in attitude and scale, geometric distortion and occlusion; (2) different illumination and atmospheric conditions (e.g., sunlight intensity, elevation angle, atmospheric refraction, clouds and so on) cause radiometric distortion; (3) the scene changes due to


object migration and background change; (4) the camera lenses have different optical distortions, and thus different distortions in the radial and tangential directions; and (5) there are different random noises. Therefore, automatic comparison between the real-time model and the reference data is quite a difficult task. An advanced image matching system should have sufficient resistance to the above differences (Li, 1997).

Our objective is to develop advanced and effective template matching algorithms in the presence of the above differences between the reference and real-time images. This is a key technique for developing a high-end terrain matching system. To register the real-time model to the reference image, we have to find the best transformation to eliminate the 1st type of difference, and the transformation space to be searched is also determined by knowledge related to the 1st type of difference. The 2nd type of difference usually influences the grey values. It is not easy to determine the grey modulation, since it is difficult to model the terrain reflection characteristics, shape and sensor distance in real time (Brown, 1992). Moreover, the latter four types of difference cannot be directly overcome by the matching algorithm. Therefore, in this paper we mainly tackle the target location/image matching problem under the 1st type of difference.

In view of the fact that the 1st type of difference is the most important factor to be tackled in this paper, we address the imaging geometry separately from the transformation space. Thus a matching algorithm is composed of five components: the feature space, imaging geometry, transformation space, similarity measure and search strategy. In a matching algorithm, these components influence and constrain each other. Basically, the feature space limits the type of available similarity measure; the similarity measure specifies the abstraction level of the feature space; the imaging geometry determines the dimension of the transformation space; and the search strategy is influenced by, and also influences, the other four parts. In this paper, we implement a hierarchical hybrid genetic matching algorithm for locating targets under planar perspective projection. The rest of this paper is organized as follows. In Section 2, the computational theory of the HHGMA is discussed. The HHGMA algorithm is then described in Section 3. In Section 4, experimental results using aerial images are given and the performance of our algorithm, including its stability, reliability, convergence and location errors, is analyzed.

COMPUTATIONAL THEORY OF TEMPLATE MATCHING

This section briefly discusses the five individual components of our algorithm, focusing on our innovations.

Feature Space: Synthesized Multi-scale Edge

Edges are perhaps the most important low-level feature. Edge operators usually detect local maxima or zero crossings after applying 1st- or 2nd-order derivative filters or phase congruency (Jian et al., 1999; 2000). Researchers have developed optimal edge operators focusing on the step edge model and the additive white noise model. Canny (1986) proposed three indices for evaluating the performance of edge operators. His detector consists of the following basic steps:

1. Compute the magnitude of edge strength by convolving the image with the first derivatives of a Gaussian operator in the x and y directions.
2. Estimate the local edge normal direction for each pixel in the image.
3. Find the location of edges by non-maximal suppression.
4. Eliminate spurious responses by adaptive thresholding with hysteresis.
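Assuming NumPy and SciPy are available, the four steps above can be sketched as follows; the function name, the value of σ and the two hysteresis thresholds are illustrative choices, not the paper's values.

```python
import numpy as np
from scipy import ndimage

def canny_sketch(image, sigma=1.0, lo=0.1, hi=0.3):
    # 1. Edge strength: convolve with first derivatives of a Gaussian.
    gx = ndimage.gaussian_filter(image, sigma, order=[0, 1])  # d/dx
    gy = ndimage.gaussian_filter(image, sigma, order=[1, 0])  # d/dy
    mag = np.hypot(gx, gy)
    # 2. Local edge normal (gradient) direction at each pixel.
    theta = np.arctan2(gy, gx)
    # 3. Non-maximal suppression along the gradient direction,
    #    quantized to the four neighbour axes.
    q = (np.round(theta / (np.pi / 4)) % 4).astype(int)
    nms = np.zeros_like(mag)
    for k, (dy, dx) in enumerate([(0, 1), (1, 1), (1, 0), (1, -1)]):
        n1 = np.roll(mag, (dy, dx), axis=(0, 1))
        n2 = np.roll(mag, (-dy, -dx), axis=(0, 1))
        keep = (q == k) & (mag >= n1) & (mag >= n2)
        nms[keep] = mag[keep]
    # 4. Hysteresis: weak responses survive only when connected
    #    to a strong one.
    strong = nms > hi * nms.max()
    weak = nms > lo * nms.max()
    return ndimage.binary_propagation(strong, mask=weak)
```

On a synthetic vertical step image, the returned boolean map is nonzero only near the step.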

In general, we have to use multiple Gaussian operators with different standard deviations, because the signal-to-noise ratio is likely to differ among edges in the image. Canny proposed aggregating the final information about edges at multiple scales using the feature synthesis approach: the response of a larger operator is predicted from the responses of smaller operators, and if the response of the larger operator is significantly different from the predicted value, new edge points are marked. In most cases, most edges are detected by small operators, while large operators detect edges at shadows or between texture regions. The feature synthesis approach usually produces a map of dense edge points, and the edge points detected by smaller operators also exhibit shifts at large scales. Bergholm (1987) proposed an edge focusing method to merge edges in a coarse-to-fine manner, but distortions occur for some blurred edges at small scales. A common weakness of both methods is that they cannot determine which scale is best in different situations.

We modify Canny's feature synthesis approach to detect edges by directly synthesizing multi-scale information. This synthesized edge detection uses the following two steps to compute the magnitude of edge strength and to estimate local edge normal directions. Let the 2-D Gaussian function with standard deviation σ be


G(x, y) = (1/(2πσ²))·exp(−(x² + y²)/(2σ²))

• Compute edge strengths in the x and y directions using 1-D Gaussian operators at n different scales σi (i = 1…n). The size of the Gaussian masks is 2·2σi + 1.

Exi = Gxi * I(x, y),  Eyi = Gyi * I(x, y)    (1)

with

Gxi = −k·x·exp(−x²/(2σi²))·exp(−y²/(2σi²)) = h1(x)·h2(y)
Gyi = −k·y·exp(−y²/(2σi²))·exp(−x²/(2σi²)) = h1(y)·h2(x)
h1(t) = −k·t·exp(−t²/(2σi²)),  h2(t) = exp(−t²/(2σi²)),  k = 1/(2πσi⁴)

• Accumulate the edge strengths of the multiple scales to obtain the synthesized edge strength E and the gradient (normal) direction β at point (x, y):

E(x, y) = √(Ex(x, y)² + Ey(x, y)²)    (2a)
β(x, y) = arctan(Ey(x, y)/Ex(x, y))    (2b)

with Ex = Σ(i=1…n) wi·Exi,  Ey = Σ(i=1…n) wi·Eyi,  wi = 1/n
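A minimal sketch of Equations 1 and 2, substituting SciPy's Gaussian derivative filters for explicit masks, with equal weights wi = 1/n; the function name and default σ values are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def synthesized_edges(image, sigmas=(3.5, 2.1), weights=None):
    n = len(sigmas)
    w = weights if weights is not None else [1.0 / n] * n
    # Equation 1: per-scale edge strengths in x and y, here weighted
    # and accumulated directly.
    ex = sum(wi * ndimage.gaussian_filter(image, s, order=[0, 1])
             for wi, s in zip(w, sigmas))
    ey = sum(wi * ndimage.gaussian_filter(image, s, order=[1, 0])
             for wi, s in zip(w, sigmas))
    strength = np.hypot(ex, ey)      # Equation 2a
    beta = np.arctan2(ey, ex)        # Equation 2b (full-quadrant form)
    return strength, beta
```

Assigning larger weights to the small-σ entries reproduces the "more details" behaviour noted below.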

Figures 9 and 10 give the synthesized edges for the aerial images shown in Figures 8 and 9, respectively, combining edges detected with σ1 = 3.5 and σ2 = 2.1. The experimental results show that the large and small operators give similar responses where strong discontinuities occur, while the large operator plays the dominant role in relatively smooth areas. In our method, most edges are detected by the large operators; the small operators mainly make the edge locations more accurate and find some details, and the number of edge points is also reduced. This is contrary to Canny's feature synthesis approach. The synthesized edges are composed of both contours at the large scale and details at the small scale. Moreover, the responses from small operators can rectify the location shift of large operators, and the large operators can suppress redundant details. Therefore, the edge distortion and location shift that occur when applying coarse-to-fine merging (Bergholm, 1987) or fine-to-coarse prediction (Canny, 1986; Lacroix, 1990) are both overcome. In addition, we can obtain more details by assigning larger weights to the small operators when synthesizing the edge strengths. In this paper, the edge points detected in the reference image I and the model M are denoted as A and B, respectively.

Imaging Geometry: Coplanar Perspective Projection

Assuming that the reference image I and the real-time template M are taken at different stations toward the same ground object, the real-time template is often a part of the reference image. The matching needs to recover the relative attitudes between them, that is, to find the orientation parameters. Because the reference image may be processed offline, it may be assumed to be a horizontal photograph. Thus the matching becomes a problem of solving for the relative orientation parameters. We use the coordinate system of the reference image as the reference system of the relative orientation parameters (BX, BY, BZ, ω, φ, κ). As shown in Figure 1, (BX, BY, BZ) stands for the

[Figure 1. Imaging geometry for front projection]

[Figure 2. Coplanar perspective projection]

baseline vector and (ω, φ, κ) are the rotation angles around the X, Y, Z axes. We also suppose that the camera's exposure center coincides with the origin of the coordinate system Oi−xyz (i = 1, 2), that the photograph lies exactly in the plane perpendicular to Z at a distance from Oi equal to the focal length, and that the projection is toward the positive direction of Z. The object point is denoted as (X, Y, Z) in the system O1−XYZ of I. Its projections in I and M are denoted as (xi, yi) in the coordinate system oi−xy taking the principal points (x0i, y0i) as the origins, as (xti, yti) in oti−xy, as (xi, yi, fi) in Oi−xyz, and as (Xi, Yi, Zi) in Oi−XYZ.

According to the geometry of planar perspective projection shown in Figure 2, we get Equation 3, which transforms a point (x2, y2) in M to the reference image, and Equation 4, which inversely transforms a point (x1, y1) in I to the model.

xt21(x2, y2, dx, dy, s, ω, φ, κ) = dx + s·f·(r11·x2 + r12·y2 + r13·f) / (r31·x2 + r32·y2 + r33·f)    (3a)

yt21(x2, y2, dx, dy, s, ω, φ, κ) = dy + s·f·(r21·x2 + r22·y2 + r23·f) / (r31·x2 + r32·y2 + r33·f)    (3b)

xt12(x1, y1, dx, dy, s, ω, φ, κ) = f·[(x1 − dx)·r11 + (y1 − dy)·r21 + s·f·r31] / [(x1 − dx)·r13 + (y1 − dy)·r23 + s·f·r33]    (4a)

yt12(x1, y1, dx, dy, s, ω, φ, κ) = f·[(x1 − dx)·r12 + (y1 − dy)·r22 + s·f·r32] / [(x1 − dx)·r13 + (y1 − dy)·r23 + s·f·r33]    (4b)

with

s = (f1/f2)·(Z − BZ)/Z,  dx = BX·f/Z,  dy = BY·f/Z

R = Rω·Rφ·Rκ = | r11 r12 r13 |
               | r21 r22 r23 |
               | r31 r32 r33 |

In geometry, s, dx and dy stand for, respectively, the relative scale of the reference image against the model, and the two components of the baseline vector projected onto the X and Y axes and reduced to the scale of the reference image. Thus the six relative orientation elements (BX, BY, BZ, ω, φ, κ) of the independent rig are converted to the more intuitive transform parameters (dx, dy, s, ω, φ, κ). For brevity, we abbreviate xt21(x2, y2, dx, dy, s, ω, φ, κ) as xt21(x2, y2); then Equations 3 and 4 can be written as

t(x2, y2) = (xt21(x2, y2), yt21(x2, y2))    (5)

t⁻¹(x1, y1) = (xt12(x1, y1), yt12(x1, y1))    (6)

1 yxytyxxtyxt =− (6)

Given two images M and I and the transformation space T that transforms one image to the other, an allowable transformation t ∈ T maps each point of M (see Figure 3a) to a point in the coordinate system of I. The mapped image block of M is denoted as t(M). The region covered by t(M) in I is called the search map and is denoted as It (see Figure 3b). The inverse transformation t⁻¹ maps every pixel of It to a point in the coordinate system associated with M, and the image block produced is denoted as t⁻¹(It).

Transformation Space: Six Dimensional

We know from the above section that the transformation space is six-dimensional. The ranges of the six transformation parameters are also known: dx, dy ∈ [−127, 127] pixels, s ∈ [0.8, 1.2], ω, φ ∈ [−0.14, 0.14] rad, and κ ∈ [−π, π] rad. So the transformation space T is a convex subset of the Euclidean space ℝ⁶, and is defined as:

T = {t(dx, dy, s, ω, φ, κ) | −127 ≤ dx, dy ≤ 127, 0.8 ≤ s ≤ 1.2, −0.14 ≤ ω, φ ≤ 0.14, −π ≤ κ ≤ π}    (7)

However, the transformation space must be discretized for digital matching. Huttenlocher et al. (1993) and You (1994) did not determine the smallest unit size of the transformation space. We will discuss the steps of the six

[Figure 3a. Real-time model]

[Figure 3b. Search map]

parameters under the planar perspective projection model following the basic idea given in Borgefors (1988). At the same time, we also take account of keeping the digital topology when transforming B (Box, 1994). Genetic algorithms usually require a high independency among the parameters, but we find that under the planar perspective projection the transform parameters have a high intrinsic dependency. That is, a small change of the roll angle ω leads to only a small translation in the Y direction; similarly, it is hard to distinguish a small change of the pitch angle φ from a small translation in the X direction. A change in shape can only be observed when the steps Δω and Δφ are significantly larger than those leading to translations. Therefore, the change of any non-translation parameter may be considered as having the dual effect of a shape change as well as a translation. If the translation effect is kept, the shape change cannot be observed, and the change is equivalent to an additional translation. Therefore, when computing the steps of the parameters ω and φ, the translation components relative to the origin of the template should be eliminated from their distortion effects, to ensure that each parameter produces a single effect. This decreases the extent of dependence among them and their correlation with the translation parameters. We will show later that this further satisfies the requirement of parameter independency in GAs. Let pi (i = 1, …, 6) denote one of the six parameters (dx, dy, s, ω, φ, κ) and Δpi its step length. When pi changes to pi + Δpi, for each point q(x, y) ∈ M, the change (Δxt, Δyt) from t(x, y, pi) to t(x, y, pi + Δpi) reads

(xt(x, y, pi + Δpi) − xt(x, y, pi),  yt(x, y, pi + Δpi) − yt(x, y, pi))

Then eliminating the translation components, the changes in x and y may be approximated by

Δxt ≈ (∂x̃t/∂pi)·Δpi,  Δyt ≈ (∂ỹt/∂pi)·Δpi    (8)

with

x̃t = xt(x, y, p1, …, p6) − xt(0, 0, p1, …, p6)
ỹt = yt(x, y, p1, …, p6) − yt(0, 0, p1, …, p6)

To preserve the digital topology of an image before and after the transformation, and the direct adjacency between t(x, y, pi) and t(x, y, pi + Δpi), the shape change at any position (x, y) ∈ M caused by a single-step change of any parameter should be no more than one pixel in the image. Thus we get

(Δxt)² + (Δyt)² ≈ [(∂x̃t/∂pi)² + (∂ỹt/∂pi)²]·(Δpi)² ≤ 1    (9)

Δpi = min(t∈T) 1/√((∂x̃t/∂pi)² + (∂ỹt/∂pi)²)    (10a)

Δpi′ = (√2/2)·min(t∈T) min(1/|∂x̃t/∂pi|, 1/|∂ỹt/∂pi|)    (10b)

Table 1. Lower and upper limits of transform parameter steps

                             ω, φ ∈ [−.14, .14] rad   ω, φ ∈ [−.28, .28] rad
Lower limits   Δx (pixel)          .4055                    .0861
               Δy (pixel)          .4055                    .0861
               Δs                  .0127                    .0052
               Δω (rad)            .0042                    .0008
               Δφ (rad)            .0042                    .0008
               Δκ (rad)            .0108                    .0042
Upper limits   Δx (pixel)          .5735                    .1218
               Δy (pixel)          .5735                    .1218
               Δs                  .0180                    .0074
               Δω (rad)            .0060                    .0012
               Δφ (rad)            .0060                    .0012
               Δκ (rad)            .0153                    .0060

Table 1 gives the lower and upper limits of the steps of the transformation parameters. In this paper, we take half of the lower limits as the actual step lengths, that is, (Δx, Δy, Δs, Δω, Δφ, Δκ) = (.2232, .2232, .0076, .0029, .0029, .0064). Thus the total number of points in the transformation space is NT = 1137² × 53 × 97² × 982 ≈ 6 × 10¹⁴.
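As a back-of-the-envelope check (treating the printed ranges and steps as approximate), the count of discrete points in T can be reproduced as:

```python
import math

# Parameter ranges from Equation 7 and the halved step lengths quoted
# above (values as printed; treated here as approximate).
ranges = {"dx": 254.0, "dy": 254.0, "s": 0.4,
          "omega": 0.28, "phi": 0.28, "kappa": 2 * math.pi}
steps = {"dx": 0.2232, "dy": 0.2232, "s": 0.0076,
         "omega": 0.0029, "phi": 0.0029, "kappa": 0.0064}

counts = {k: ranges[k] / steps[k] for k in ranges}
n_total = math.prod(counts.values())   # on the order of 10**14
```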


Similarity Measure: Adaptive and Robust Hausdorff Distance

The conventional partial Hausdorff distance uses a fixed fraction to eliminate the effects of noise and outliers. However, the ratio of outliers often differs between matching instances, and it is hard to determine an appropriate value of the fraction in advance so that all the outliers are perfectly excluded from the estimate. The partial Hausdorff distance therefore has serious limitations. In Hu et al. (2005), we proposed an adaptive and robust Hausdorff distance (ARHD) that computes, from the distance map of each template, an empirical distribution function obeyed by the distance variable. This captures the distribution structure of the edge points in the templates and makes the adaptive Hausdorff distance ĥfR (Equation 11) insensitive to the fraction values. The fractions are determined robustly by analyzing the distance curves and adjusting the search directions. This method adaptively calculates the best fraction values and thus obtains distances closest to the actual situation.

ĥfR = (1/(s − r + 1))·Σ{i: r ≤ i ≤ s} d(i) = Ht(A, t(B))    (11)

where r = fl·|At|, s = fh·|At|, [fl, fh] ⊆ [0.5, 1], and fR ∈ [fl, fh]. The fractions fl and fh are the low and high fractions, respectively, e.g., fl = 0.8 and fh = 0.9.

Reliability of ARHD

To evaluate the reliability of the ARHD, we start from an initially biased transformation and partition the steps iteratively until the ARHD value between the template and the reference image no longer decreases. Table 2 shows a running instance using M5 in Figure 7 with 16 iterative partitions. The testing results show that the final ARHD value can be expected to fall between 0 and 0.9 pixel for many real-time models. We observed in most running instances that the transformation parameters converge to the known accurate ones when using iteratively partitioned step lengths with the local improvement technique (described in the next section). This phenomenon indicates that the ARHD is a reliable matching measure, since it correctly and sensitively expresses the small changes between two point sets caused by very small changes of the transformation parameters.
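The fixed-fraction core of Equation 11 can be sketched as follows; the adaptive selection of fR from the distance curves is not reproduced, the ranks are taken over the model points for simplicity, and the function name is hypothetical.

```python
import numpy as np
from scipy import ndimage

def partial_hausdorff(ref_edges, model_points, f_lo=0.8, f_hi=0.9):
    # Distance map: distance from each pixel to the nearest reference
    # edge pixel (ref_edges is a boolean edge image).
    dmap = ndimage.distance_transform_edt(~ref_edges)
    d = np.sort([dmap[y, x] for (y, x) in model_points])
    r = int(f_lo * len(d)) - 1      # rank r (0-based)
    s = int(f_hi * len(d)) - 1      # rank s (0-based)
    # Average of the ranked distances d_(r)..d_(s), as in Equation 11.
    return d[r:s + 1].mean()
```

Precomputing the distance map once per reference image is what makes evaluating many candidate transformations cheap.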

Table 2. Procedure of refining transformation parameters

Partition times    (dx, dy, s, ω, φ, κ)                                            ARHD distance (pixels)
Initial position   ( 1.157244, -.852781,  .959261, -.015855, -.037884, -.049929)   1.448205
0                  (  .710744, -.629531, 1.004861, -.015855, -.029184, -.004779)   1.135396
1                  (  .264244, -.294656, 1.001061, -.007155, -.007434, -.001554)    .690218
2                  (  .208432, -.238844, 1.001061, -.005705, -.005259,  .000059)    .476205
3                  ( -.014819, -.071406, 1.000111, -.001718, -.000179, …)           .262880
…                  ……                                                               …
13                 ( -.000048, -.000114, 1.000000, -.000002,  .000001,  .000000)    .000687
14                 ( -.000048,  .000022, 1.000000, -.000002,  .000001,  .000000)    .000491
15                 ( -.000048,  .000010, 1.000000, -.000002,  .000001,  .000000)    .000327
16                 ( -.000022,  .000010, 1.000000, -.000002,  .000001,  .000000)    .000327
Known position     (  0,  0,  1,  0,  0,  0)                                        .000000

Search Strategy: Hybrid Genetic Algorithm and Local Improvement

In the previous sections, we determined the feature space, used a group of parametric variables to represent the transformation space, and established the evaluation index for assessing a specific transformation. The task now is to find the best t̂, the optimum matching position, satisfying

t̂ = argmin(t∈T) Ht(t⁻¹(At), B)   or   t̂ = argmin(t∈T) Ht(A, t(B))    (12)

Equation 12 poses an optimization problem for a non-linear function with multiple peaks and multiple variables. Moreover, the objective function may be discontinuous and may differ in the presence of noise. The entire transformation space would have to be searched to find the best solution, but this is not practical within a reasonable time limit. Huttenlocher et al. (1993) proposed to search the transformation space point by point, and this


leads to a huge amount of computation even when three kinds of acceleration techniques are employed. Because it is hard to predict the complexity of the objective function, the HCMA matching method proposed by Borgefors (1988) cannot guarantee a good solution. Houck et al. (1996) showed that genetic algorithms (GAs) can steadily find good solutions with only a few experiments when searching a large transformation space (although the optimal solution cannot be guaranteed). In view of the fact that the number of initial search points Ninit is extremely large, we take GAs as the main method for organizing the search process, with the local improvement technique as an assistant, to seek a feasible solution to the above optimization problem.

GAs are a class of stochastic, adaptive search algorithms inspired by the natural selection principle and the natural heredity mechanism (Whitley, 1994; Forrest, 1996; Michalewicz, 1997). Implicit parallelism and the capability of effectively utilizing all the available information are their two major characteristics. The former means that only a few structures need to be examined to reflect large regions of the search space; the latter makes them robust and steady. GAs apply the selection, crossover and mutation operators to individuals to probe the whole transformation space and to find promising regions. Thus they may obtain the globally optimal solution for a fitness function in the given parametric space.

Although typical GAs are usually robust, they may not be the most successful optimizer for specific domains. The literature shows that GAs perform well in global search but seem incompetent at local search, while local improvement algorithms (e.g., hill climbing) can quickly find a locally optimal solution in a small region but are weak at global search. Houck et al. (1995) and Joines et al. (2000) found that local improvement helps GAs handle a wide range of optimization problems and almost always makes GAs find better solutions faster, although the schema rule may be corrupted by Lamarckian learning.

So we modified the typical GA by embedding the local improvement technique; the resulting algorithm has six steps:

(1) Create an initial population P^(0) with N_ga chromosomes, and calculate the fitness values: g ← 1, gaInitialize();

(2) Produce the intermediate population P^(g') = gaSelect(P^(g−1)) using the selection operator;

(3) Produce a new population P^(g'') = gaReproduce(P^(g')) by applying the crossover and mutation operators to the intermediate population P^(g');

(4) Obtain the local minimum point t^(g') = LIP(t^(g), 1) (the parameter 1 indicates that the parameter steps are partitioned once) by applying the local improvement to the present best transformation t^(g) in P^(g''), and replace t^(g) with the local optimum t^(g') with probability s_p (usually equal to 1) to produce a new population P^(g);

(5) Calculate the fitness values of all the individuals using the fitness function: gaEvaluate(P^(g));

(6) Test the termination condition. If it is satisfied, stop; otherwise, jump to Step 2 with g ← g + 1.
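The six steps above can be sketched in code. The following is a minimal illustrative implementation, not the paper's actual program: the toy fitness function, the bounds, the binary tournament selection, and the coordinate-wise hill climbing used as a stand-in for the local improvement procedure (LIP) are all simplifying assumptions.

```python
import random

def hybrid_ga(fitness, bounds, n_ga=40, pc=0.6, pm=0.2, sp=1.0,
              max_gen=60, seed=1):
    """Hybrid GA: selection, crossover/mutation, then Lamarckian
    replacement of the generation's best individual (steps 1-6)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clamp(v, i):
        lo, hi = bounds[i]
        return min(max(v, lo), hi)

    def local_improve(ind):
        # Coordinate-wise hill climbing: a simple stand-in for LIP.
        best, best_f = ind[:], fitness(ind)
        for i in range(dim):
            step = 0.01 * (bounds[i][1] - bounds[i][0])
            for d in (-step, step):
                cand = best[:]
                cand[i] = clamp(cand[i] + d, i)
                f = fitness(cand)
                if f > best_f:
                    best, best_f = cand, f
        return best

    # Step 1: initial population, sampled uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_ga)]
    for g in range(max_gen):
        # Step 2: binary tournament selection into an intermediate population.
        def pick():
            a, b = rng.choice(pop), rng.choice(pop)
            return (a if fitness(a) >= fitness(b) else b)[:]
        inter = [pick() for _ in range(n_ga)]
        # Step 3: arithmetic crossover and uniform mutation.
        for j in range(0, n_ga - 1, 2):
            if rng.random() < pc:
                w = rng.random()
                a, b = inter[j], inter[j + 1]
                inter[j] = [w * x + (1 - w) * y for x, y in zip(a, b)]
                inter[j + 1] = [(1 - w) * x + w * y for x, y in zip(a, b)]
        for ind in inter:
            for i in range(dim):
                if rng.random() < pm:
                    ind[i] = rng.uniform(*bounds[i])
        # Step 4: locally improve the current best, replace with probability sp.
        k = max(range(n_ga), key=lambda j: fitness(inter[j]))
        if rng.random() < sp:
            inter[k] = local_improve(inter[k])
        # Step 5 is the fitness evaluation above; step 6: loop or stop.
        pop = inter
    return max(pop, key=fitness)
```

For example, maximizing v(t) = −(t_0 − 3)² − (t_1 + 2)² over [−10, 10]² drives the population toward (3, −2).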

The variable g stands for the evolution generation. This algorithm requires settling six basic questions: the creation of the initial population, the representation of an individual, the establishment of the fitness function, the selection function, the genetic operators and the termination condition. Among them, the fitness function is given below:

v(d_x, d_y, s, ω, φ, κ) = v(t̂) = τ · (H_τ(t̂(A), B) + τ)^(−1)    (13)

where τ is a distance threshold used to reject a bad transformation (Hu et al., 2005).
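Equation 13 maps a Hausdorff-type distance into a fitness value in (0, 1]: a perfect match (zero distance) scores 1, and the fitness decays as the distance grows relative to the threshold τ. As a hedged sketch, the code below substitutes a plain directed Hausdorff distance for the ARHD (whose adaptive, robust form is defined in Hu et al., 2005):

```python
import math

def directed_hausdorff(A, B):
    """Plain directed Hausdorff distance h(A, B) = max over a in A of
    min over b in B of ||a - b||; a simple stand-in for the ARHD."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def fitness(transformed_model, reference, tau=1.8):
    """Eq. 13: v = tau / (H + tau); equals 1 for a perfect match and
    decays toward 0 as the distance grows past the threshold tau."""
    h = directed_hausdorff(transformed_model, reference)
    return tau / (h + tau)
```

With τ = 1.8 (the value used in the experiments below), a model displaced so that its distance to the reference edges is 3 pixels scores 1.8 / 4.8 = 0.375.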

HIERARCHICAL HYBRID GENETIC MATCHING ALGORITHM

By combining the components discussed in the above sections, we can develop an effective technical framework for the target localization problem as below:

Figure 4. Preprocessing (adaptive smoothing → edge detection → synthesized edges → distance transform → ARHD distribution → synthesized and small-scale layers)

Figure 5. Matching procedure (input I and M → initial population → genetic operators → local improvement → improve the best transforms, looping until the termination condition is satisfied → candidates {t̂_i})

(1) Generating the data used for matching (Figure 4). First, the synthesized multi-scale and the small-scale edge maps are detected for both the reference and model images. Second, the distance transformation is applied to the model, and the empirical distribution function of the distance variable is estimated from the distance map (Hu et al., 2005). This process creates a dual-layer structure: a) a synthesized layer composed of the edge map of the reference image, the distance map of the model and the empirical distribution function, all synthesized over two scales; b) a small-scale layer composed of the edge map of the reference image, the distance map of the model and the empirical distribution function at a small scale.

(2) The matching process (Figure 5). First, the search is performed on the synthesized layer using the hybrid genetic algorithm and the ARHD, and a total of N_ga possible transformations {t_i | i ∈ I_Nga} is obtained. Second, sort those transformations by their fitness values in descending order, and obtain a set {t_i | i ∈ I_m, m < N_ga} by choosing the first m transformations with the largest fitness. Third, the local improvement is performed at the small-scale layer to refine {t_i | i ∈ I_m, m < N_ga}, and thus the candidate transformations {t̂_i | i ∈ I_m, m < N_ga} are found.

(3) Hypothesis testing and conflict resolving (Figure 6). Each transformation in the candidate set {t̂_i | i ∈ I_m, m < N_ga} is tested to retain the correct ones. Then the transformations {t̂_i | i ∈ I_n, n ≤ m} that passed the test are compared mutually. Finally, the transformation with the smallest ARHD value is considered to be the best one, t̄.

We call the above algorithm the hierarchical hybrid genetic matching algorithm based on the adaptive robust Hausdorff distance (HHGMA-ARHD). The HHGMA algorithm requires a group of parameters used to control the evolution procedure, such as the population size (Nga), selection strategy (S: S1 – pure selection; S2 – optimal selection), crossover probability (pc), mutation probability (pm), distance threshold (τ) and the maximum number of iterations. Different values of these parameters will influence the matching performance. We designed experiments to determine the optimal values of the five most important parameters: 1000 (Nga), S2 (S), 0.6 (pc), 0.2 (pm) and 1.8 (τ).
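The three stages above can be summarized in a small driver routine. This is an illustrative skeleton under stated assumptions: the callables `search_coarse`, `refine_fine`, `passes_test` and `arhd`, and the function name itself, are hypothetical placeholders standing in for the paper's coarse genetic search, small-scale local improvement, hypothesis test and ARHD evaluation.

```python
def hhgma_pipeline(search_coarse, refine_fine, passes_test, arhd, m=10):
    """Coarse genetic search, local refinement of the m best candidates,
    then hypothesis testing and conflict resolution by smallest ARHD."""
    candidates = search_coarse()           # stage 2a: N_ga transforms from the GA
    candidates.sort(key=arhd)              # smallest ARHD = largest fitness first
    refined = [refine_fine(t) for t in candidates[:m]]    # stage 2b: small scale
    accepted = [t for t in refined if passes_test(t)]     # stage 3a: hypothesis test
    return min(accepted, key=arhd) if accepted else None  # stage 3b: conflict resolve
```

Sorting by ascending ARHD is equivalent to sorting by descending fitness, since the fitness of Equation 13 decreases monotonically with the distance.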

EXPERIMENTAL RESULTS AND EVALUATION

Results

The two reference images I1 (see Figure 7) and I2 (see Figure 8) compose a stereo pair. It should be noted that the origins of the coordinate systems of both the reference and template images are located at their image

Figure 6. Post-processing (candidates {t̂_i} → hypothesis testing → conflict resolving → best transform t̄)

Figure 7. Reference image I1 and real-time models M1 ~ M9

Figure 8. Reference image I2 and real-time models M10 ~ M18


centers, respectively. Among the templates of I1, M1 ~ M3 are taken from I1 itself, and M4 ~ M9 are from I2; among the templates of I2, M10 ~ M15 are taken from I1, and M16 ~ M18 are from I2 itself. Table 3 gives a part of their known transformations together with the results obtained by matching using the ARHD. Part of the testing results are also illustrated in Figures 9 and 10, where the raw images are placed at the left side and the synthesized edges at the right side.

Table 3. Correct and algorithm resulting transformations of real-time models

No.  Known transformations             Transformations found by ARHD                        Results
M3   (-100, 100, 1.1, .1, -.1, π/2)    (-99.994, 99.991, 1.1007, .09978, -.0998, 1.5708)    -
M5   (0, 0, 1, 0, 0, 0)                (-.0008, -.00512, .9999, -.00013, .000043, .00006)   -
M7   (100, -100, .9, .1, -.1, π/4)     (99.773, -100.168, .8995, .1003, -.100139, .785)     -
M9   (100, 100, 1.1, .1, -.1, 3π/4)    (100.141, 99.900, 1.10005, .0984, -.09965, 2.356)    Figure 9
M10  (-100, -100, .9, -.1, -.1, π/4)   (-100.046, -100.186, .894, -.1035, -.0981, .7970)    -
M12  (-100, 100, 1, .1, -.1, 3π/4)     (-100.1449, 100.203, .992, .105, -.0997, 2.370)      -
M14  (0, 0, 1.1, -.1, -.1, -3π/4)      (.1787, -.2109, 1.100008, -.0988, -.1002, -2.356)    -
M16  (100, -100, 1.1, -.1, -.1, -π/4)  (99.979, -99.972, 1.0998, -.0992, -.1005, -.785)     Figure 10

Figure 9. Registration result of M9 on I1

Figure 10. Registration result of M16 on I2


Performance Analysis

Using the system parameters (1000, S2, 0.6, 0.2, 1.8), each model M1 ~ M18 is matched against the reference images 20 times at the small-scale layer. The numbers of successful and failed matchings are 147 and 213, respectively. A successful matching needs about 2 to 250 iterations, and the average iteration number is 108. The success matching rate pr is between 25% and 65%, and the average rate is 40.8%. Each matching instance costs about 3 to 236 minutes of CPU time, and the average time needed is 154 minutes; for successful matchings only, the average CPU time is about 86 minutes.

Stability and Reliability

The object recognition can identify the corresponding area of the real-time model in the reference image, but the automatic matching results are not guaranteed to be true in every situation. Here, we try to explain and estimate the possibility of false alarms. In Hu et al. (2005), we pointed out that the ARHD is effective on the condition that the difference between the transformation t̂ to be estimated and the known transformation t̄ is not larger than 4 to 8 times the lower parameter steps. We also found in Section 2 that t̂ can usually converge to t̄ by the local improvement when the above condition is satisfied. So we can consider that t̄ could be found with a probability close to 1 once the searching point t̂ falls within the region that is centered at t̄ and contains all the transformations within plus/minus 4 to 8 times the step length of each dimension. We then say that t̂ and t̄ are accommodating, and denote by C_t̄ the accommodating region of t̄ that contains all the transformations accommodating with t̄ in the transformation space T. C_t̄ is a six-dimensional convex subset of T, defined as

C_t̄ = { t̂ | t̂ ∈ T, −k · (Δd_x, Δd_y, Δs, Δω, Δφ, Δκ) ≤ t̂ − t̄ ≤ k · (Δd_x, Δd_y, Δs, Δω, Δφ, Δκ) },  k ∈ {4, 5, 6, 7, 8}    (14)
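Equation 14 is a simple component-wise box test. A minimal sketch, assuming the six-tuple layout (d_x, d_y, s, ω, φ, κ) for a transformation and illustrative step values:

```python
def in_accommodating_region(t_hat, t_bar, steps, k=8):
    """Eq. 14: t_hat lies in C_{t_bar} iff every component of
    t_hat - t_bar is within +/- k parameter steps, k in 4..8.
    `steps` holds the step lengths (dx, dy, ds, dw, dphi, dkappa)."""
    assert 4 <= k <= 8
    return all(abs(a - b) <= k * s
               for a, b, s in zip(t_hat, t_bar, steps))
```

For example, with a translation step of 1 pixel, a candidate offset by 4 pixels in x is accommodating for k = 4, while one offset by 9 pixels fails even for k = 8.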

Therefore, the probability of finding the accommodating region of the best transformation when creating the initial population in HHGMA by uniform random sampling should be less than

p_r0 = 1 − (1 − 1/N_init)^N_ga    (15)

In Equation 15, if we have N_init = 5.181 × 10^7 and N_ga = 1000, then p_r0 ≈ 1.93 × 10^−5 << p_r. This shows that statistically the success matching rate of 25 ~ 65% is the result of effective searching, and that HHGMA is inherently stable and reliable.

Search Ratio and Time

The search rate is defined as the ratio between the number of ARHD computations (N_dist) in a run of the algorithm and the number of accommodating regions in T (N_init). For example, when the ARHD is computed the maximum 400,000 times, we get a very low search rate of 0.77%; in such a case, 99.23% of the transformation space is actually not examined. Among the 360 matching instances, the average value of N_dist is 217,550 and the average CPU time is 154 minutes. We found that the CPU time is basically linearly related to N_dist; thus each calculation of the ARHD costs about 42 milliseconds.

Convergence Speed

Figure 11 shows the frequency distribution of the iterations among the 147 successful matchings, and Figure 12 further depicts the uncertainty ranges of the average iterations obtained by re-sampling 1000 times with the Bootstrap technique. It is observed from Figure 12 that 90% of the successful matchings are expected to be completed within about 101 to 115 iterations. A typical evolution procedure for matching M5 is illustrated in Figure 13, where the dotted and solid lines represent the fitness curve and the distance curve, respectively.

Figure 11. Frequency of required iterations

We find that the searching converges gradually to the correct transformation with the iterations. Moreover, there are often two or three jumps during the convergence procedure for most successful matchings.

Errors of Transformation Parameters and Location

Even if the model is correctly matched to the reference image, the searched transformation t̂ is unlikely to equal the known transformation t̄ rigorously, because noise, interpolation errors introduced when performing the image transformation, the inaccuracy of the ARHD and other factors influence the comparison. That is to say, there are discrepancies between the estimated and the known transformation parameters, which certainly result in positioning errors. In general, the positioning accuracy is higher if the variances of the errors of the transformation parameters are smaller, and vice versa. Therefore, the errors of the transformation parameters and of the target localization are important indices for evaluating the results obtained by HHGMA.

Figure 14 illustrates the frequency distribution of the localization errors for the 147 successful matchings. The average errors of the transformation parameters are (d̄_x, d̄_y, s̄, ω̄, φ̄, κ̄) = (−.103758, −.088885, −.002209, −.001247, .000875, .003108), and their variances are (d̃_x, d̃_y, s̃, ω̃, φ̃, κ̃) = (.368139, .269822, .007624, .038925, .021718, .009614). The average localization error is d̄ = 0.196783 pixel, and its variance is d̃ = 0.231915 pixel. Compared with τ and σ_0, the errors of both the transformation parameters and the positioning are very small. This should be attributed to the effectiveness and reliability of the ARHD and to the introduction of the local improvement technique, because together they ensure that we can normally expect a high-accuracy transformation and very small positioning errors provided that t̂ approaches the accommodating region of t̄. Unfortunately, for failed matchings, the errors of the found transformation parameters could be as large as their whole ranges, and the localization errors could be about the size of the reference image.

Figure 12. Frequency of average required iterations by Bootstrap

Figure 13. Typical evolution procedure of successful matchings

Figure 14. Frequency of localization errors of successful matchings

CONCLUSIONS

Target localization for navigation requires determining the relative orientation parameters between the real-time model and the reference image. The differences between sensed objects, such as pose change, scale change, geometric distortion and occlusion, stem from their different imaging conditions. In this paper, we proposed and implemented a hierarchical hybrid genetic matching algorithm for locating targets under planar perspective projection, taking missile navigation as the study platform.

The HHGMA algorithm uses the adaptive and robust Hausdorff distance, effectively combines genetic algorithms with the local improvement technique, implements template-image matching under the planar perspective projection, and incorporates knowledge specific to the target recognition problem. Its basic ideas and methods form a new framework for template matching technology. The performance of our matching algorithm, including its stability, reliability, convergence and location errors, is analyzed in detail based on comprehensive experimental results using a stereo pair of aerial images. The results show that our algorithm is competent for urban target recognition and localization.

REFERENCES

Bergholm, F., 1987. Edge focusing, IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(6): 726-741.
Borgefors, G., 1988. Hierarchical chamfer matching: a parametric edge matching algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6): 849-865.
Brown, L.G., 1992. A survey of image registration techniques, ACM Computing Surveys, 24(4): 325-376.
Box, L., 1994. Digitally continuous functions, Pattern Recognition Letters, 15(8): 115-118.
Canny, J., 1986. A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6): 679-698.
Davis, L., 1991. Handbook of Genetic Algorithms, New York: Van Nostrand Reinhold, 385 p.
Forrest, S., 1996. Genetic algorithms, ACM Computing Surveys, 28(1): 71-80.
Houck, C., Joines, J., Kay, M., 1995. The effective use of local improvement procedures in conjunction with genetic algorithms, NCSU-IE Technical Report, North Carolina State University.
Houck, C., Joines, J., Kay, M., 1996. A genetic algorithm for function optimization: a Matlab implementation, ACM Transactions on Mathematical Software.
Hu, Y., Xia, W., Hu, X., Wang, R., 2005. Object recognition through template matching using an adaptive and robust Hausdorff distance, ASPRS Annual Conference, 7-11 March, Baltimore, 13 p.
Huttenlocher, D.P., Klanderman, G.A., Rucklidge, W.J., 1993. Comparing images using the Hausdorff distance, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9): 850-863.
Jian, Y., Hu, Y., Li, J., Sun, Z., 1999. Phase congruency model for edge detection (in Chinese), Infrared and Laser Engineering, 28(5): 30-34.
Jian, Y., Hu, Y., Li, J., Sun, Z., 2000. Invariant feature detection based on phase congruency (in Chinese), Infrared and Laser Engineering, 29(1): 17-21.
Joines, J., Kay, M., Houck, C., 2000. Characterizing search spaces for tabu search and including adaptive memory into a genetic algorithm, Journal of the Chinese Institute of Industrial Engineers, 17(5): 527-536.
Lacroix, V., 1990. The primary raster: a multi-resolution image description, ICPR, 16-21 June, Atlantic City, pp. 903-907.
Li, J., 1997. A Request for Proposal on Image Matching Techniques, Technical Report, Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, 6 p.
Michalewicz, Z., 1997. Evolutionary computation techniques and their applications, International Conference on Intelligent Processing Systems, October 28-31, Beijing, pp. 14-25.
Whitley, D., 1994. A genetic algorithm tutorial, Statistics and Computing, 4(2): 65-85.
You, J., 1994. A guided image matching approach using Hausdorff distance with interesting points detection, IEEE International Conference on Image Processing, pp. 968-972.