

A New Line Symmetry Distance Based Automatic Clustering Technique: Application to Image Segmentation

Sriparna Saha,1 Ujjwal Maulik 2

1 Image Processing and Modeling, Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Heidelberg, Germany

2 Department of Theoretical Bioinformatics, DKFZ (Deutsches Krebsforschungszentrum, German Cancer Research Center), Heidelberg, Germany

Received 3 March 2009; accepted 16 May 2010

ABSTRACT: In this article, an automatic clustering technique based on the line symmetry property is first developed. The proposed real-coded variable string length genetic clustering technique (VGALS clustering) is able to evolve the number of clusters present in the data set automatically. Here, points are assigned to clusters based on the line symmetry based distance rather than the Euclidean distance. The cluster centers are encoded in the chromosomes, whose values may vary. A newly developed line symmetry based cluster validity index, LineSym-index, is used as a measure of "goodness" of the corresponding partitioning. This validity index is able to correctly indicate the presence of clusters of different sizes as long as they are line symmetrical. A Kd-tree based data structure is used to reduce the complexity of computing the line symmetry distance. The proposed technique is then applied to automatically segment different images. First, the superiority of the proposed method in automatically segmenting image data sets over the Fuzzy C-means clustering technique, the well-known mean-shift based method, and GAPS clustering with the Sym-index based method is demonstrated for three remote sensing satellite images. Thereafter it is applied to several simulated T1-weighted, T2-weighted, and proton density normal and MS lesion magnetic resonance brain images. The proposed method is able to detect most of the regions well. Its superiority over the Fuzzy C-means and Expectation Maximization clustering algorithms is demonstrated quantitatively. The automatic segmentation obtained by the VGALS clustering technique is also compared with the available ground truth information. © 2011 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 21, 86–100, 2011; Published online in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/ima.20243

Key words: unsupervised classification; cluster validity index;

symmetry; line symmetry based distance; Principal component

analysis; Kd tree; magnetic resonance image

I. INTRODUCTION

Remote sensing satellite images have significant applications in different areas like climate studies, assessment of forest resources, examining marine environments, etc. For remote sensing applications, classification is an important task where the pixels in the images are classified into homogeneous regions, each of which corresponds to some particular landcover type. The problem of pixel classification is often posed as clustering (Saha and Bandyopadhyay, 2010) in the intensity space (Maulik and Bandyopadhyay, 2003). In the unsupervised pixel classification framework, various clustering algorithms like Fuzzy C-means (Cannon et al., 1986) and statistical methods have been used for satellite image segmentation. Recently, the application of genetic algorithms (GAs) to pixel classification has attracted significant attention from researchers (Maulik and Bandyopadhyay, 2003; Saha and Bandyopadhyay, 2008). A context-sensitive clustering technique for unsupervised image segmentation based on graph-cut initialization and the expectation-maximization algorithm is developed by Tyagi et al. (2008). An unsupervised hyperspectral image classification technique based on fuzzy-clustering algorithms that spatially exploit membership relations is proposed by Bilgin et al. (2008). A multiresolution remote sensing image clustering technique which uses information contained in both spatial resolutions is proposed by Wemmert et al. (2009). An agglomerative hierarchical clustering method for large multispectral images, which uses both spectral and spatial information for the aggregation decision, is proposed by Marcal and Castro (2005). A new spectral-spatial classification scheme for hyperspectral images, which combines the results of a pixel-wise support vector machine classification and the segmentation map obtained by partitional clustering using majority voting, is proposed by Tarabalka et al. (2009). Recently, a multiobjective fuzzy genetic clustering technique for pixel classification has been proposed by Bandyopadhyay et al. (2007). An unsupervised classification algorithm using maximum a posteriori (MAP) segmentation for SAR images is presented by Chamundeeswari et al. (2007). Some more clustering techniques for satellite image segmentation can be found elsewhere (Hilbert, 1977; Kauth et al., 1977; Ghassemian and Landgrebe, 1988; Chen and Landgrebe, 1989; Sayood, 1992; Friedl et al., 2002; Pugh and Waxman, 2006; Chamundeeswari et al., 2007; Guo et al., 2009; Gandhi et al., accepted; Jaffar et al., accepted).

Correspondence to: Sriparna Saha; e-mail: [email protected]

Fully automatic brain tissue classification from magnetic resonance images (MRI) is also an important research issue. The accurate segmentation of MR images into different tissue classes, like gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF), is an important task. Additionally, regional volume calculations can bring further useful diagnostic information in diseases like Alzheimer's disease, in movement disorders such as Parkinson's disease or Parkinson-related syndromes, in white matter metabolic or inflammatory disease, in congenital brain malformations or perinatal brain damage, or in post-traumatic syndrome. The automatic segmentation of brain MR images, however, remains a difficult problem. Clustering approaches have been widely used for segmentation of MR brain images. The use of neural networks, evolutionary computation, and/or fuzzy clustering techniques for MR image segmentation has been investigated by Suckling et al. (1999), Bhandarkar and Zhang (1999), and Saha and Bandyopadhyay (2007, 2009, accepted).

To cluster a data set, some similarity or dissimilarity criterion has to be defined. A new type of nonmetric distance based on point symmetry, $d_{ps}$, is proposed in (Bandyopadhyay and Saha, 2007). To reduce the complexity of the point symmetry distance computation, a Kd-tree based data structure is used. From the geometrical symmetry viewpoint, point symmetry and line symmetry are two widely discussed issues. Inspired by this, a line symmetry based distance was proposed in (Saha and Bandyopadhyay, 2007). But that distance had several drawbacks; its major shortcoming was that its application is limited to two-dimensional data sets. A new line symmetry based technique using principal component analysis is developed in (Saha and Bandyopadhyay, accepted) that removes the limitations of (Saha and Bandyopadhyay, 2007). This new line symmetry based distance is more general and is applicable to data sets of any dimension. In (Saha and Bandyopadhyay, accepted), a genetic clustering technique is also developed based on this line symmetry distance, but it assumes a priori the number of clusters present in a data set. The motivation of this article is to develop a line symmetry based automatic genetic clustering technique for image segmentation.

In this article, a variable string length genetic line symmetry based clustering technique (VGALS-clustering) is proposed, which is then used to automatically segment remote sensing satellite images. Centers of the clusters are encoded in the chromosome, whose values vary over a certain range. Here, points are assigned to clusters based on a newly proposed line symmetry based distance rather than the Euclidean distance. A new cluster validity index based on this distance, LineSym-index, is proposed and is used to compute the fitness of the chromosomes. The effectiveness of VGALS-clustering is shown for automatically partitioning Indian remote sensing (IRS) satellite images of parts of the city of Kolkata and a SPOT image of a part of the city of Kolkata. Comparisons are made with the results obtained by the Fuzzy C-means clustering technique, the popular mean-shift based segmentation technique (Comaniciu and Meer, 2002), and GAPS clustering with the Sym-index based method (Bandyopadhyay and Saha, 2007). The segmentation results are compared qualitatively and quantitatively.

The effectiveness of the proposed algorithm is thereafter shown in segmenting MR images of the normal brain and MR brain images with multiple sclerosis lesions. The segmentation results are then compared with the available ground truth information. For the purpose of comparison, the well-known Fuzzy C-means algorithm (Bezdek, 1973) and the Expectation Maximization (EM) clustering algorithm (Jain et al., 1999) are also executed, first with the number of clusters automatically determined by VGALS clustering and then with the actual number of clusters present in the images. Their segmentation results are compared quantitatively with those provided by the VGALS clustering algorithm.

In a part of the experiment, the fuzzy variable string length genetic algorithm (Fuzzy-VGA) (Maulik and Bandyopadhyay, 2003), which uses the Euclidean distance for computing the membership values of points to different clusters, is also executed on the MR brain images to segment them automatically. Its results are compared qualitatively and quantitatively with those obtained by the VGALS clustering technique.

II. SYMMETRY BASED DISTANCES

In this section, at first the existing point symmetry based distance is

described in brief. Then a new definition of the newly developed

line symmetry based distance is proposed.

A. Existing Point Symmetry Based Distance. In this section, the PS distance (Bandyopadhyay and Saha, 2007), $d_{ps}(x, c)$, associated with a point $x$ with respect to a center $c$ is described. The symmetrical (reflected) point of $x$ with respect to a particular center $c$ is $2 \times c - x$; let us denote it by $x^*$. Let the knear unique nearest neighbors of $x^*$ be at Euclidean distances $d_i$, $i = 1, 2, \ldots, \mathit{knear}$. Then

$$d_{ps}(x, c) = d_{sym}(x, c) \times d_e(x, c) \qquad (1)$$

$$= \frac{\sum_{i=1}^{\mathit{knear}} d_i}{\mathit{knear}} \times d_e(x, c), \qquad (2)$$

where $d_e(x, c)$ is the Euclidean distance between the point $x$ and $c$.

The complexity of computing $d_{ps}(x, c)$ is of order $n$, where $n$ is the total number of data points. For all $n$ points and $K$ clusters, the complexity becomes of order $n^2 K$. To reduce this, the Kd-tree based nearest neighbor search of ANNlib (Approximate Nearest Neighbor) (Mount and Arya, 2005), a library written in C++ (obtained from http://www.cs.umd.edu/~mount/ANN), is used in (Bandyopadhyay and Saha, 2007). ANNlib is used to find $d_i$, $i = 1$ to knear, in Eq. (2) efficiently.
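As an illustration, Eqs. (1) and (2) can be sketched in Python as follows. This is a brute-force version that scans all points instead of querying a Kd-tree; the function name `d_ps` and its argument layout are assumptions made for this sketch and are not part of ANNlib or of the cited work.

```python
import math

def d_ps(x, c, data, knear=2):
    """Point symmetry distance of x with respect to centre c (Eqs. 1 and 2)."""
    # Reflected point x* = 2c - x
    x_star = tuple(2 * ci - xi for xi, ci in zip(x, c))
    # Brute-force search for the knear smallest distances from x* to the data
    d = sorted(math.dist(x_star, p) for p in data)[:knear]
    d_sym = sum(d) / knear
    return d_sym * math.dist(tuple(x), tuple(c))

# Four points placed symmetrically around the origin (illustrative data)
data = [(1, 0), (-1, 0), (0, 1), (0, -1)]
print(d_ps((1, 0), (0, 0), data))
```

Because the reflection of (1, 0) through the origin coincides with a data point, the first neighbor distance is zero and only the second neighbor contributes to the average.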

B. Newly Developed Line Symmetry Based Distance (Saha and Bandyopadhyay, accepted). Given a particular data set, we first find its first principal axis using Principal Component Analysis (Jolliffe, 1986). Let the eigenvector of the covariance matrix of the data set with the highest eigenvalue be $[eg_1\ eg_2\ eg_3\ eg_4\ \ldots\ eg_d]$, where $d$ is the dimension of the original data. Then the first principal axis of the data set is given by:

$$\frac{x_1 - c_1}{eg_1} = \frac{x_2 - c_2}{eg_2} = \ldots = \frac{x_d - c_d}{eg_d}$$

where the center of the data set is $c = \{c_1, c_2, \ldots, c_d\}$.

The obtained principal axis is treated as the symmetrical line of the relevant cluster, i.e., if the data set is indeed symmetrical then it should also be symmetric with respect to the first principal axis of the data set identified by principal component analysis (PCA). This symmetrical line is used to measure the amount of line symmetry of a particular point in that cluster. To measure the amount of line symmetry of a point $x$ with respect to a particular line $i$, $d_{ls}(x, i)$, the following steps are followed.

1. For a particular data point $x$, calculate the projected point $p_i$ on the relevant symmetrical line $i$.

2. Find $d_{sym}(x, p_i)$ as:

$$d_{sym}(x, p_i) = \frac{\sum_{i=1}^{\mathit{knear}} d_i}{\mathit{knear}} \qquad (3)$$

where the knear unique nearest neighbors of $x^* = 2 \times p_i - x$ are at Euclidean distances $d_i$, $i = 1, 2, \ldots, \mathit{knear}$. The ANN library (Mount and Arya, 2005), which uses a Kd-tree based nearest neighbor search, is used to reduce the complexity of computing these $d_i$ (as described in Section II-C). Then the amount of line symmetry of a particular point $x$ with respect to the symmetrical line of cluster $i$ is calculated as:

$$d_{ls}(x, i) = d_{sym}(x, p_i) \times d_e(x, c) \qquad (4)$$

where $c$ is the centroid of the particular cluster $i$ and $d_e(x, c)$ is the Euclidean distance between the point $x$ and $c$.

It can be seen from Eq. (3) that knear cannot be chosen equal to 1, since if $x^*$ exists in the data set then $d_{sym}(x, p_i) = 0$ and hence the Euclidean distance will have no impact on $d_{ls}(x, i)$. On the other hand, large values of knear may not be suitable because they may underestimate the amount of symmetry of a point with respect to the first principal axis. Here knear is chosen equal to 2. Note that the proper value of knear largely depends on the distribution of the data set, and a fixed value may have drawbacks. For instance, for very large clusters (with too many points), 2 neighbors may not be enough, as it is very likely that a few neighbors would have a distance close to zero. On the other hand, clusters with too few points are more likely to be scattered, and the distances of the two neighbors may be too large. Thus, a proper choice of knear is an important issue that needs to be addressed in the future.

Note that every point symmetric cluster is also line symmetric with respect to its central axis. Thus, the proposed distance is able to detect line symmetric as well as point symmetric clusters. The previous point symmetry based distance was only able to measure the amount of point symmetry of a point with respect to a cluster center. But the proposed line symmetry based distance measures total line symmetry with respect to the symmetrical line of a cluster.

The above line symmetry based distance is illustrated in Figure 1. Here $x$ is a particular data point and $c$ is the center of a particular cluster. The first principal axis of the given cluster is denoted by Line $l$; this line is treated as the symmetrical line of the cluster. The projected point of $x$ on this symmetrical line is denoted by $p$. Let the reflected point of $x$ with respect to $p$ be denoted by $x'$. The first two nearest neighbors of $x'$ (here knear is chosen equal to 2) are at Euclidean distances $d_1$ and $d_2$, respectively. Then the total amount of symmetry of $x$ with respect to the projected point $p$ on Line $l$ is calculated as $d_{sym}(x, p) = \frac{d_1 + d_2}{2}$. Therefore, the total line symmetry based distance of the point $x$ with respect to the symmetrical line of cluster $l$ is calculated as $d_{ls}(x, l) = d_{sym}(x, p) \times d_e(x, c)$, where $d_e(x, c)$ is the Euclidean distance between the point $x$ and the cluster center $c$.
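The steps leading to Eq. (4) can be sketched in Python as below. This is a minimal illustration under stated assumptions: the first principal axis is estimated by power iteration on the covariance matrix (one of several ways to carry out PCA, chosen here only to avoid external libraries), nearest neighbors are found by brute force, and the names `first_principal_axis` and `d_ls` are hypothetical.

```python
import math

def first_principal_axis(points, iters=200):
    """Return (centroid, eigenvector of the covariance matrix with the
    largest eigenvalue), the latter estimated by power iteration."""
    n, d = len(points), len(points[0])
    c = tuple(sum(p[k] for p in points) / n for k in range(d))
    cov = [[sum((p[a] - c[a]) * (p[b] - c[b]) for p in points) / n
            for b in range(d)] for a in range(d)]
    # Arbitrary, slightly asymmetric start vector (an assumption of this
    # sketch; a start exactly orthogonal to the axis would not converge)
    v = [1.0 + 0.1 * k for k in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:  # degenerate cluster: all points identical
            break
        v = [t / norm for t in w]
    return c, tuple(v)

def d_ls(x, cluster, data, knear=2):
    """Line symmetry distance of x with respect to the first principal
    axis of `cluster`, following Eqs. (3) and (4)."""
    c, v = first_principal_axis(cluster)
    # Step 1: project x onto the line through c with direction v
    t = sum((xi - ci) * vi for xi, ci, vi in zip(x, c, v))
    p = tuple(ci + t * vi for ci, vi in zip(c, v))
    # Step 2: reflect x through p and average the knear smallest distances
    x_star = tuple(2 * pi - xi for pi, xi in zip(p, x))
    d = sorted(math.dist(x_star, q) for q in data)[:knear]
    return (sum(d) / knear) * math.dist(tuple(x), c)

# A 3 x 2 grid that is symmetric about the horizontal axis
grid = [(-2, 1), (-2, -1), (0, 1), (0, -1), (2, 1), (2, -1)]
print(d_ls((2, 1), grid, grid))
```

For the grid above, the principal axis is the horizontal axis, the reflection of (2, 1) is the data point (2, -1), and the two nearest distances are 0 and 2, so $d_{sym} = 1$.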

It is evident that the symmetrical distance computation is very time consuming because it involves the computation of the nearest neighbors. Computation of $d_{ls}(x, i)$ is of complexity $O(N)$. Hence for $N$ points and $K$ clusters, the complexity of computing the line symmetry based distance between all points and all clusters is $O(N^2 K)$. To reduce the computational complexity, an approximate nearest neighbor search using the Kd-tree approach is adopted in this article.

C. Kd-tree Based Nearest Neighbor Computation. A K-dimensional tree, or Kd-tree, is a space-partitioning data structure for organizing points in a K-dimensional space. A Kd-tree uses only splitting planes that are perpendicular to one of the coordinate axes. Approximate Nearest Neighbor (ANN) is a library written in C++ (Mount and Arya, 2005), which supports data structures and algorithms for both exact and approximate nearest neighbor searching in arbitrarily high dimensions. In this article, the ANN library, which uses a Kd-tree for nearest neighbor search, is used to find $d_i$, where $i = 1, \ldots, \mathit{knear}$, in Eq. (3) efficiently. This requires the construction of a Kd-tree over the $N$ points of the data set. The construction of the Kd-tree requires $O(N \log N)$ time and $O(N)$ space (Anderberg, 2000). Friedman et al. (1977) reported $O(\log N)$ expected time for finding the nearest neighbor using a Kd-tree.
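To make the idea concrete, a small exact Kd-tree search can be sketched in Python as follows. It is not the ANN library and does not reproduce its API; `build_kdtree` and `knn_distances` are hypothetical names, and the simple build shown here re-sorts at every level, so it costs somewhat more than the $O(N \log N)$ bound quoted above.

```python
import heapq
import math

def build_kdtree(points, depth=0):
    """Recursively split on the median, cycling through the coordinate axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"point": pts[mid], "axis": axis,
            "left": build_kdtree(pts[:mid], depth + 1),
            "right": build_kdtree(pts[mid + 1:], depth + 1)}

def _search(node, target, k, heap):
    if node is None:
        return
    dist = math.dist(node["point"], target)
    # heap stores negated distances, so -heap[0] is the current k-th best
    if len(heap) < k:
        heapq.heappush(heap, -dist)
    elif dist < -heap[0]:
        heapq.heapreplace(heap, -dist)
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    _search(near, target, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th best distance
    if len(heap) < k or abs(diff) < -heap[0]:
        _search(far, target, k, heap)

def knn_distances(tree, target, k):
    """Distances from target to its k nearest tree points, ascending."""
    heap = []
    _search(tree, target, k, heap)
    return sorted(-d for d in heap)

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
print(knn_distances(tree, (9, 2), 2))
```

The returned distances play the role of the $d_i$ in Eq. (3) when the query is the reflected point $x^*$ and k is knear.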

Figure 1. Example of line symmetry based distance. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

III. VGALS-CLUSTERING: VARIABLE STRING LENGTH GENETIC LINE SYMMETRY DISTANCE BASED CLUSTERING TECHNIQUE

In this section, a variable string length genetic algorithm using the newly developed line symmetry based distance (VGALS-clustering) is proposed for automatically evolving the number of clusters present in a data set. Here we consider the best partition to be the one that corresponds to the maximum value of the proposed LineSym-index, which is defined later. Both the number of clusters and the appropriate partitioning of the data are evolved simultaneously using the search capability of genetic algorithms. Since the number of clusters is considered to be variable, the string lengths of different chromosomes in the same population are allowed to vary. As a consequence, the crossover and mutation operators are suitably modified to handle variable length chromosomes. The technique is described below in detail.

A. String Representation and Population Initialization. In VGALS-clustering, the chromosomes are made up of real numbers which represent the coordinates of the centers of the partitions. If chromosome $i$ encodes the centers of $M_i$ clusters in $N$-dimensional space, then its length $l_i$ is taken to be $N \times M_i$. Each center is considered to be indivisible. Each string $i$ in the population initially encodes the centers of a number, $M_i$, of clusters, such that $M_i = (rand() \bmod M^*) + 2$. Here, $rand()$ is a function returning an integer, and $M^*$ is a soft estimate of the upper bound of the number of clusters. The number of clusters will therefore range from two to $M^* + 1$. The $M_i$ centers encoded in a chromosome are randomly selected distinct points from the data set. Thereafter, five iterations of the K-means algorithm are executed with the set of centers encoded in each chromosome, and the resulting centers replace the centers in the corresponding chromosome. This makes the centers well separated initially.

Example. Let $M^* = 10$. Let the random number $M_i$ be equal to 4 for chromosome $i$. Then this chromosome will encode the centers of 4 clusters. Let the four cluster centers (four randomly chosen points from the data set) be (10.0, 5.0), (20.4, 13.2), (15.8, 2.9), and (22.7, 17.7). Then the chromosome may look like (20.4, 13.2) (15.8, 2.9) (10.0, 5.0) (22.7, 17.7).
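The initialization described above might be sketched in Python as follows. The helper names, the use of a `random.Random` instance, and the choice to keep an old center when a cluster becomes empty during the K-means steps are assumptions of this sketch; the text does not specify how empty clusters are handled.

```python
import math
import random

def kmeans_steps(data, centers, iters=5):
    """Run a fixed number of plain K-means iterations from the given centers."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in data:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[j].append(p)
        # Assumption: an empty cluster keeps its previous center
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers

def init_chromosome(data, m_star, rng):
    """Encode M_i = (rand() mod M*) + 2 centers, then refine with K-means."""
    m_i = rng.randrange(m_star) + 2          # ranges from 2 to M* + 1
    centers = rng.sample(data, m_i)          # distinct positions in the data
    return kmeans_steps(data, centers)

rng = random.Random(0)
data = [(float(i), float(j)) for i in range(5) for j in range(4)]
chromosome = init_chromosome(data, 4, rng)
print(len(chromosome), chromosome)
```

The data set must contain at least $M^* + 1$ points for the sampling step to succeed.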

B. Fitness Computation. This is composed of two steps. First, the $n$ points are assigned to different clusters using the newly developed line symmetry based distance, $d_{ls}$. Next, a new cluster validity index based on this distance, LineSym-index, is computed and used as a measure of the fitness of the chromosome.

(1) Computing the membership values. Let a particular chromosome encode the centers of $K$ clusters. First, the first principal axis of each cluster is determined using principal component analysis (Jolliffe, 1986). Then each point $x_j$, $j = 1, 2, \ldots, N$, is assigned to one of the $K$ clusters in the following way. Find the cluster nearest to $x_j$ in the line symmetrical sense, that is, the cluster $k$ given by the minimum-value criterion $k = \arg\min_{i = 1, \ldots, K} d_{ls}(x_j, i)$, where the line symmetry based distance $d_{ls}(x_j, i)$ is computed by Eq. (4). If the corresponding $d_{sym}(x_j, k)$ [as defined in Eq. (3)] is smaller than a prespecified parameter $\theta$, then assign $x_j$ to the $k$th cluster. Otherwise, the assignment is based on the minimum Euclidean distance criterion, as normally used in (Bandyopadhyay and Maulik, 2002) or the K-means algorithm, i.e., assign $x_j$ to the $k$th cluster where $k = \arg\min_{i = 1, \ldots, K} d_e(x_j, c_i)$. Here, $c_i$ denotes the center of the $i$th cluster and $d_e(x_j, c_i)$ denotes the Euclidean distance between the point $x_j$ and the cluster center $c_i$. The reason for such an assignment is as follows: in the intermediate stages of the algorithm, when the centers are not yet properly evolved, the minimum $d_{ls}$ value for a point is expected to be quite large, since the point might not be symmetric with respect to any cluster. In such cases, using the Euclidean distance for cluster assignment appears to be intuitively more appropriate.

The value of $\theta$ is kept equal to the maximum nearest neighbor distance among all the points in the data set, as described in (Bandyopadhyay and Saha, 2007, 2008). Note that if a point is indeed symmetric with respect to the principal axis of some cluster, then the symmetrical distance computed in the above way will be small and can be bounded as follows. Let $d_{NN}^{max}$ be the maximum nearest neighbor distance in the data set, that is,

$$d_{NN}^{max} = \max_{i = 1, \ldots, N} d_{NN}(x_i), \qquad (5)$$

where $d_{NN}(x_i)$ is the nearest neighbor distance of $x_i$. Assuming that $x^*$ lies within the data space, it may be noted that

$$d_1 \le \frac{d_{NN}^{max}}{2} \quad \text{and} \quad d_2 \le \frac{3 d_{NN}^{max}}{2}, \qquad (6)$$

resulting in $\frac{d_1 + d_2}{2} \le d_{NN}^{max}$. Ideally, a point $x$ is exactly symmetrical with respect to the principal axis of some cluster if $d_1 = 0$. However, considering the uncertainty of the location of a point as the sphere of radius $d_{NN}^{max}$ around $x$, we have kept the threshold $\theta$ equal to $d_{NN}^{max}$. Thus the computation of $\theta$ is automatic and does not require user intervention.

After the assignments are done, the cluster centers encoded in the chromosome are replaced by the mean points of the respective clusters. This is referred to as the K-means like update center operation.
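The threshold of Eq. (5) and the two-stage assignment rule can be sketched in Python as below. The sketch assumes the per-cluster distances for one point have already been computed and are passed in as lists; the function names and this calling convention are hypothetical.

```python
import math

def max_nn_distance(data):
    """theta = d_NN^max of Eq. (5), computed by brute force over all pairs."""
    return max(min(math.dist(p, q) for j, q in enumerate(data) if j != i)
               for i, p in enumerate(data))

def assign(d_ls_row, d_sym_row, d_e_row, theta):
    """Return the cluster index for one point.

    d_ls_row[i], d_sym_row[i], d_e_row[i] are the line symmetry distance,
    the symmetry term of Eq. (3), and the Euclidean distance to center i.
    """
    k = min(range(len(d_ls_row)), key=lambda i: d_ls_row[i])
    if d_sym_row[k] < theta:
        return k                                  # symmetric enough
    # Fall back to the nearest center in the Euclidean sense
    return min(range(len(d_e_row)), key=lambda i: d_e_row[i])

theta = max_nn_distance([(0, 0), (1, 0), (5, 0)])
print(theta)
print(assign([1.0, 5.0], [0.2, 0.9], [3.0, 1.0], theta=0.5))
print(assign([1.0, 5.0], [0.8, 0.9], [3.0, 1.0], theta=0.5))
```

In the second call the best line symmetry candidate fails the threshold test, so the point goes to the cluster with the smaller Euclidean distance instead.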

(2) Fitness calculation. The fitness of a chromosome is computed using a newly defined cluster validity index, LineSym-index. This index is inspired by the point symmetry based cluster validity index Sym-index (Saha and Bandyopadhyay, 2008; Bandyopadhyay and Saha, 2008). Let the $K$ cluster centers be denoted by $c_i$, where $1 \le i \le K$, and let $n_i$ denote the number of points present in the $i$th cluster. Then LineSym-index is defined as follows:

$$LineSym(K) = \left( \frac{1}{K} \times \frac{1}{E_K} \times D_K \right), \qquad (7)$$

where $K$ is the number of clusters present in that chromosome. Here,

$$E_K = \sum_{i=1}^{K} E_i, \quad \text{such that} \quad E_i = \sum_{j=1}^{n_i} d_{ls}(x_j^i, i), \quad \text{and} \quad D_K = \max_{i, j = 1}^{K} \lVert c_i - c_j \rVert.$$

$D_K$ is the maximum Euclidean distance between any two cluster centers among all centers. Here $x_j^i$ denotes the $j$th point of the $i$th cluster, and $d_{ls}(x_j^i, i)$ is computed by Eq. (4), where $i$ denotes the symmetrical line (first principal axis) of the $i$th cluster.

The objective is to maximize the LineSym-index in order to obtain the actual number of clusters and to achieve proper clustering. As formulated in Eq. (7), LineSym is a composition of three factors: $1/K$, $1/E_K$, and $D_K$. The first factor increases as $K$ decreases; as LineSym needs to be maximized for optimal clustering, this factor favors smaller values of $K$. The second factor is the inverse of the total within-cluster line symmetrical distance. For clusters which have good symmetrical structure, the $E_i$ value is small. This, in turn, indicates that the formation of more clusters that are symmetrical in shape is encouraged. Finally, the third factor, $D_K$, measuring the maximum separation between a pair of clusters, increases with the value of $K$. As these three factors are complementary in nature, they are expected to compete with and balance each other critically in determining the proper partitioning.
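Eq. (7) can be evaluated with a few lines of Python once the per-cluster sums $E_i$ are available. The sketch below assumes those sums are precomputed and passed in; the function name `linesym_index` is hypothetical, and the case $E_K = 0$ (which would divide by zero) is not handled.

```python
import math
from itertools import combinations

def linesym_index(cluster_sums, centers):
    """LineSym(K) = (1/K) * (1/E_K) * D_K, following Eq. (7).

    cluster_sums[i] is E_i, the sum of d_ls over the points of cluster i;
    centers[i] is the center c_i given as a tuple of coordinates.
    """
    K = len(centers)
    E_K = sum(cluster_sums)
    D_K = max(math.dist(a, b) for a, b in combinations(centers, 2))
    return (1.0 / K) * (1.0 / E_K) * D_K

# Two clusters with E_1 = E_2 = 2 and centers 5 units apart
print(linesym_index([2.0, 2.0], [(0.0, 0.0), (3.0, 4.0)]))
```

With these illustrative numbers the three factors are 1/2, 1/4, and 5.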


The fitness function for chromosome $j$ is defined as $LineSym_j$, i.e., the LineSym-index computed for the partitioning encoded in that chromosome. The objective of the GA is to maximize this fitness function.

C. Selection. Conventional proportional selection is applied on

the population of chromosomes. Here, a chromosome receives a

number of copies that is proportional to its fitness in the population.

We have used roulette wheel strategy for implementing the propor-

tional selection scheme.
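A roulette wheel draw of this kind can be sketched in Python as follows. The function name and the use of a `random.Random` instance are assumptions of the sketch; fitness values are assumed to be non-negative with a positive sum, which holds for the LineSym-index.

```python
import random

def roulette_select(fitness, rng):
    """Return an index drawn with probability proportional to its fitness."""
    total = sum(fitness)
    r = rng.random() * total        # a point on the wheel, in [0, total)
    acc = 0.0
    for i, f in enumerate(fitness):
        acc += f
        if r < acc:
            return i
    return len(fitness) - 1         # guard against floating-point round-off

rng = random.Random(42)
fitness = [0.1, 0.6, 0.3]
counts = [0, 0, 0]
for _ in range(10000):
    counts[roulette_select(fitness, rng)] += 1
print(counts)
```

Over many draws the counts approach the proportions of the fitness values, so fitter chromosomes receive more copies in the mating pool.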

D. Crossover. For the purpose of crossover, the cluster centers are considered to be indivisible, i.e., the crossover points can only lie between two cluster centers. The crossover operation, applied stochastically with crossover probability $\mu_c$, must ensure that information exchange takes place in such a way that both offspring encode the centers of at least two clusters. For this, the operator is defined as follows (Maulik and Bandyopadhyay, 2003): Let parent chromosomes $P_1$ and $P_2$ encode $M_1$ and $M_2$ cluster centers, respectively. The crossover point in $P_1$, $s_1$, is generated as $s_1 = \mathrm{rand}() \bmod M_1$. Let $s_2$ be the crossover point in $P_2$; it may vary within $[LB(s_2), UB(s_2)]$, where $LB(s_2)$ and $UB(s_2)$ indicate the lower and upper bounds of the range of $s_2$, respectively. They are given by

$$LB(s_2) = \min[2, \max[0, 2 - (M_1 - s_1)]], \qquad UB(s_2) = M_2 - \max[0, 2 - s_1].$$

Therefore $s_2$ is given by

$$s_2 = \begin{cases} LB(s_2) + \mathrm{rand}() \bmod (UB(s_2) - LB(s_2)) & \text{if } UB(s_2) \ge LB(s_2), \\ 0 & \text{otherwise.} \end{cases}$$

It can be verified by some simple calculations that if the crossover points $s_1$ and $s_2$ are chosen according to the above rules, then none of the offspring generated will have fewer than two clusters.
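The rules for choosing the crossover points can be sketched as below. Two details are assumptions of this sketch rather than statements from the text: offspring are formed by swapping the tails of the two parents at the chosen points, and when $UB(s_2) = LB(s_2)$ the empty range is treated as an offset of 0 (so $s_2 = LB(s_2)$), since a modulus of zero is undefined. Chromosomes are represented as Python lists of centers, each with at least two entries.

```python
import random

def crossover_points(m1, m2, rng=random):
    """Pick (s1, s2) following the LB/UB rules; m1, m2 >= 2 centers."""
    s1 = rng.randrange(m1)                       # s1 = rand() mod M1
    lb = min(2, max(0, 2 - (m1 - s1)))
    ub = m2 - max(0, 2 - s1)
    if ub > lb:
        s2 = lb + rng.randrange(ub - lb)         # LB + rand() mod (UB - LB)
    elif ub == lb:
        s2 = lb                                  # assumption: empty range -> offset 0
    else:
        s2 = 0
    return s1, s2

def crossover(p1, p2, rng=random):
    """Swap tails at center boundaries (assumed offspring construction)."""
    s1, s2 = crossover_points(len(p1), len(p2), rng)
    child1 = p1[:s1] + p2[s2:]
    child2 = p2[:s2] + p1[s1:]
    return child1, child2
```

Under these assumptions the first child holds $s_1 + M_2 - s_2$ centers and the second holds $s_2 + M_1 - s_1$; the lower bound on $s_2$ protects the second child and the upper bound protects the first.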

E. Mutation. Mutation is applied on each chromosome with probability $\mu_m$. Mutation is of three types.

1. Each cluster center encoded in a chromosome is mutated with probability $\mu_m$ in the following way. The value at the position to be perturbed is replaced with a random variable drawn from a Laplacian distribution, $p(\epsilon) \propto e^{-|\epsilon - \mu| / \delta}$, where the scaling factor $\delta$ sets the magnitude of perturbation and $\mu$ is the current value at that position. The scaling factor $\delta$ is chosen equal to 1.0. The old value at the position is replaced with the newly generated value.

2. One randomly selected cluster center is removed from the chromosome, i.e., the total number of clusters encoded in the chromosome is decreased by 1.

3. The total number of clusters encoded in the chromosome is increased by 1. One randomly chosen point from the data set is encoded as the new cluster center.

Any one of the above-mentioned types of mutation is applied randomly on a particular chromosome if it is selected for mutation.
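The three mutation types can be sketched as follows. Several choices here are assumptions made for illustration: the mutation type is picked uniformly at random, a center is never removed if that would leave fewer than two (`min_clusters`), and the Laplacian sample is produced as the difference of two independent unit exponential variables, which has the required density. The caller is expected to decide, with probability $\mu_m$, whether a chromosome is mutated at all.

```python
import random

def laplace_sample(mu, delta, rng=random):
    """Draw from p(eps) proportional to exp(-|eps - mu| / delta).

    The difference of two independent Exp(1) variables is Laplace(0, 1).
    """
    return mu + delta * (rng.expovariate(1.0) - rng.expovariate(1.0))

def mutate(chromosome, data, mu_m, delta=1.0, min_clusters=2, rng=random):
    """Apply one of the three mutation types, chosen uniformly at random.

    chromosome   -- list of cluster centers (each a list of floats)
    data         -- list of data points used when a center is added
    min_clusters -- assumption: never shrink below two centers
    """
    kind = rng.choice(("perturb", "remove", "add"))
    child = [list(c) for c in chromosome]        # leave the parent untouched
    if kind == "perturb":
        for center in child:
            if mu_m > rng.random():              # each center mutates with prob. mu_m
                for i, value in enumerate(center):
                    center[i] = laplace_sample(value, delta, rng)
    elif kind == "remove":
        if len(child) > min_clusters:
            del child[rng.randrange(len(child))]
    else:
        child.append(list(rng.choice(data)))     # new center taken from the data set
    return child
```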

F. Termination. In this article, we have executed the algorithm

for a fixed number of generations. Moreover, the elitist model of 

GAs has been used, where the best string seen so far is stored in a

location within the population. The best string of the last generation

provides the solution to the clustering problem.
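A fixed-generation loop with elitism might look like the sketch below. The text only states that the best string is stored in a location within the population; overwriting the currently worst member is an assumption of this sketch, and `make_next_generation` is a hypothetical callable standing in for selection, crossover, and mutation.

```python
def run_ga(population, fitness, make_next_generation, generations):
    """Fixed-generation GA loop with elitism.

    fitness              -- callable scoring a chromosome (higher is better)
    make_next_generation -- callable applying selection, crossover and
                            mutation to a population (supplied by caller);
                            it must return a list
    """
    best = max(population, key=fitness)
    for _ in range(generations):
        population = make_next_generation(population)
        worst_idx = min(range(len(population)),
                        key=lambda i: fitness(population[i]))
        population[worst_idx] = best          # elite kept inside the population
        best = max(population, key=fitness)
    return best
```

Because the elite is reinserted in every generation, the fitness of the returned string can never be lower than that of the best initial chromosome.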

G. Complexity Analysis of VGALS Clustering Technique. The time complexity of the proposed VGALS clustering technique is analyzed below. Here $N$ is the total number of points in the data set, $M^*$ is the maximum possible number of clusters, and $d$ is the dimension of the data.

As discussed above, a Kd-tree data structure is used to find the nearest neighbor of a particular point. The construction of the Kd-tree requires $O(N \log N)$ time and $O(N)$ space (Anderberg, 2000).

Initialization of the GA needs $O(\text{Popsize} \times \text{stringlength})$ time, where Popsize and stringlength indicate the population size and the length of each chromosome in the GA, respectively. Note that stringlength is $O(M^* \times d)$, where $d$ is the dimension of the data set and $M^*$ is the soft estimate of the upper bound of the number of clusters.

Fitness computation is composed of three steps.

1. To find the membership values of each point to all cluster centers, the minimum line symmetrical distance of that point with respect to all clusters has to be calculated. First, Principal Component Analysis (PCA) is applied to detect the first principal axis. This takes $O(N \times d^2)$ time. This first principal axis is then used as the symmetrical line of the respective cluster. To determine the line symmetry based distance, a Kd-tree based nearest neighbor search is used. If the points are roughly uniformly distributed, then the expected case complexity is $O(c^d + \log N)$, where $c$ is a constant depending on the dimension and the point distribution. This is $O(\log N)$ if the dimension $d$ is a constant (Bentley et al., 1980). Friedman et al. (1977) also reported $O(\log N)$ expected time for finding the nearest neighbor. So, to find the minimal symmetrical distance of a particular point, $O(M^* \log N)$ time is needed. Thus the total complexity of computing the membership values of $N$ points to $M^*$ clusters is $O(M^* N \log N)$.

2. For updating the centers, the total complexity is $O(M^*)$.

3. The total complexity for computing the fitness values is $O(N \times M^*)$.

So the fitness evaluation has a total complexity of $O(\text{Popsize} \times M^* N \log N)$.

The selection step of the GA requires $O(\text{Popsize} \times \text{stringlength})$ time.

Mutation and crossover require $O(\text{Popsize} \times \text{stringlength})$ time each.
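The PCA step in the fitness computation can be illustrated with a short sketch. It finds the first principal axis of one cluster by power iteration on the covariance matrix, which is a swapped-in technique chosen to keep the example dependency-free; the article does not say how the principal axis is computed. The function name, the iteration count, and the seeded random start vector are all assumptions.

```python
import math
import random

def first_principal_axis(points, iterations=200, seed=0):
    """Return (mean, unit_axis) of a point set via power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # d x d covariance matrix: this is the O(N * d^2) step
    cov = [[sum(row[a] * row[b] for row in centered) / n for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    axis = [rng.gauss(0.0, 1.0) for _ in range(d)]   # random start vector
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * axis[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:          # degenerate cluster: all points identical
            break
        axis = [v / norm for v in nxt]
    return mean, axis
```

The line through `mean` with direction `axis` would then play the role of the symmetrical line of the cluster. The sign of `axis` is arbitrary, and convergence slows when the two largest eigenvalues of the covariance matrix are close.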

Thus, summing up the above complexities and considering stringlength $\ll N$, the total time complexity becomes $O(M^* N \log N \times \text{Popsize})$ per generation. For a maximum of Maxgen generations, the total complexity becomes $O(M^* N \log N \times \text{Popsize} \times \text{Maxgen})$.
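The summation can be written out term by term. The grouping below is a sketch of the argument, and the numeric instance uses $N = 262{,}144$, Popsize $= 20$, and Maxgen $= 20$ from the satellite experiments together with a hypothetical $M^* = 10$, which is not a value taken from the article. Because big-O notation hides constant factors, the numbers are order-of-magnitude counts only.

```latex
\begin{align*}
T_{\text{generation}}
  &= \underbrace{O(\text{Popsize} \times M^{*} d)}_{\text{selection, crossover, mutation}}
   + \underbrace{O(\text{Popsize} \times M^{*} N \log N)}_{\text{fitness evaluation}} \\
  &= O(\text{Popsize} \times M^{*} N \log N)
     \qquad \text{since } M^{*} d \ll N, \\
T_{\text{total}}
  &= \underbrace{O(N \log N)}_{\text{Kd-tree, built once}}
   + \text{Maxgen} \times T_{\text{generation}}
   = O(\text{Popsize} \times \text{Maxgen} \times M^{*} N \log N). \\[4pt]
\text{Example: }
  & 20 \times 10 \times 262144 \times \log_2 262144
    = 20 \times 10 \times 262144 \times 18 \approx 9.4 \times 10^{8}
    \text{ per generation}, \\
  & 20 \text{ generations} \;\Rightarrow\; \approx 1.9 \times 10^{10}.
\end{align*}
```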

H. Space Complexity Analysis. The major space requirement of VGALS clustering is due to its population. The space needed for the population is $O(\text{Popsize} \times \text{stringlength})$, i.e., $O(\text{Popsize} \times d \times M^*)$. In addition, a membership matrix of size $N \times M^*$ has to be kept for each member of the population. Thus the total space complexity is $O(\text{Popsize} \times N \times M^*)$, as $N \gg d$.

IV. EXPERIMENTAL RESULTS

A. Results on Satellite Images. In this section, the experimental results obtained by applying the above-mentioned VGALS-clustering technique to segment two remote sensing satellite images of parts of the city of Kolkata are provided first. Both satellite images are of size 512 × 512, i.e., the size of the data set to be clustered is 262,144 for each image. For these multispectral satellite images, the feature vector is composed of the intensity values at the different bands of the image. The parameters of the proposed algorithm are as follows: population size = 20, number of generations = 20, probability of crossover = 0.8, and probability of mutation = 0.2. For the purpose of comparison, the popular Fuzzy C-means (FCM) clustering algorithm (Bezdek, 1981), a recent method of satellite image segmentation proposed in (Bandyopadhyay and Saha, 2007) (GAPS clustering with Sym-index based method), and the mean-shift based segmentation technique (Comaniciu and Meer, 2002) are also executed on the real-life images. The results are compared both qualitatively and quantitatively.

B. IRS Image of Kolkata. The data used here were acquired from the Indian Remote Sensing Satellite (IRS-1A) using the LISS-II sensor, which has a resolution of 36.25 m × 36.25 m. The image is contained in four spectral bands, namely, a blue band of wavelength 0.45–0.52 μm, a green band of wavelength 0.52–0.59 μm, a red band of wavelength 0.62–0.68 μm, and a near infra red band of wavelength 0.77–0.86 μm. Thus, the feature vector of each image pixel is composed of four intensity values at different bands. The distribution of the pixels in the feature space of the first three bands of this image is shown in Figure 2. It can be seen from Figure 2 that the entire data can be partitioned into several differently shaped clusters in which symmetry exists.
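The construction of per-pixel feature vectors from the spectral bands can be sketched as below. The representation of each band as a list of rows and the function name are assumptions made for illustration only.

```python
def pixels_to_feature_vectors(bands):
    """Flatten co-registered band images into one feature vector per pixel.

    bands -- list of d images, each a list of rows of intensity values,
             all with the same height and width
    Returns a list of height*width tuples of length d, in row-major order.
    """
    height, width = len(bands[0]), len(bands[0][0])
    return [tuple(band[r][c] for band in bands)
            for r in range(height) for c in range(width)]
```

For a 512 × 512 image with four bands this yields 262,144 four-dimensional points, which is the data set that the clustering algorithm operates on.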

Figure 3 shows the Kolkata image in the near infra red band. Some characteristic regions in the image are the river Hooghly cutting across the middle of the image, several fisheries observed toward the lower-right portion, and a township, SaltLake, to the upper-left hand side of the fisheries. This township is bounded on the top by a canal. Two parallel lines observed toward the upper right-hand side of the image correspond to the airstrips in the Dumdum airport. Other than these, there are several water bodies, roads, etc. in the image. From our ground knowledge, we know that the image has four clusters (Maulik and Bandyopadhyay, 2003), and these four clusters correspond to the classes turbid water, pond water, concrete, and open space.

Figure 2. Data distribution of IRS image of Kolkata in the first three feature space.

Figure 3. IRS image of Kolkata in the near infra red band with histogram equalization.

Figure 4. Clustered IRS image of Kolkata using VGALS-clustering technique.

Figure 5. Clustered IRS image of Kolkata using FCM clustering.

The VGALS-clustering technique automatically provides four clusters for this image data (Fig. 4). It may be noted that the water class has been differentiated into turbid water (the Hooghly) and pond water (fisheries, etc.) because of a difference in their spectral properties. The canal bounding SaltLake from the upper portion has also been correctly classified as pond water. Figure 5 shows the Kolkata image partitioned into four clusters using the FCM algorithm (Bezdek, 1973). This segmentation result is unsatisfactory on visual inspection. As can be seen, the river Hooghly and the city region have been incorrectly classified as belonging to the same class. Therefore, although some regions, viz., fisheries, the canal bounding SaltLake, parts of the airstrip, etc., have been correctly identified, a significant amount of confusion is evident in the FCM clustering result. The GAPS clustering technique with Sym-index based method automatically provides K = 4 clusters for this data set. The corresponding partitioning is shown in Figure 7. The automatically segmented image obtained by the mean-shift based segmentation technique (Comaniciu and Meer, 2002) is shown in Figure 6.

To validate the segmentation result obtained by VGALS-clustering quantitatively, two well-known Euclidean distance based cluster validity indices, namely the I index (Maulik and Bandyopadhyay, 2002) and the XB-index (Xie and Beni, 1991), are also computed. The values are provided in Table I. Smaller values of the XB-index and larger values of the I index correspond to good clustering. On both indices, the values again show that the segmentation provided by VGALS-clustering is better than those of the other existing methods.
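As an illustration of how such an index is evaluated, the sketch below computes the Xie-Beni index for a crisp partition, following the standard definition (compactness divided by the minimum squared separation between centers). Treating every membership as 0 or 1 is an assumption of this sketch; the article does not state which form was used for Table I.

```python
import math

def xb_index(points, labels, centers):
    """Xie-Beni index for a crisp partition (lower is better).

    Compactness: mean squared distance of each point to its own center.
    Separation:  smallest squared distance between two distinct centers.
    """
    compactness = sum(math.dist(p, centers[k]) ** 2
                      for p, k in zip(points, labels)) / len(points)
    separation = min(math.dist(centers[i], centers[j]) ** 2
                     for i in range(len(centers))
                     for j in range(i + 1, len(centers)))
    return compactness / separation
```

For four points forming two tight pairs that are 10 units apart, the correct labeling gives an index of 0.01, whereas a labeling that mixes the pairs gives 0.51, which shows why smaller values indicate a better partitioning.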

C. SPOT Image of Kolkata. The French satellites SPOT (Système Probatoire d'Observation de la Terre) (Richards, 1993), launched in 1986 and 1990, carry two imaging devices that consist of a linear array of charge coupled device (CCD) detectors. Two imaging modes are possible, the multispectral and panchromatic modes. The 512 × 512 SPOT image of a part of the city of Kolkata is available in three bands in the multispectral mode. These bands are:

Band 1: green band of wavelength 0.50–0.59 μm
Band 2: red band of wavelength 0.61–0.68 μm
Band 3: near infra red band of wavelength 0.79–0.89 μm.

Thus, here the feature vector of each image pixel is composed of three intensity values at different bands. The distribution of the pixels in the feature space of this image is shown in Figure 9. It can be seen from Figure 9 that the entire data can be partitioned into several hyperspherical clusters.

Figure 6. Clustered IRS image of Kolkata using mean-shift based clustering technique.

Figure 7. Clustered IRS image of Kolkata using GAPS clustering technique with Sym-index based method.

Table I. I index and XB-index values of the segmented Mumbai and Kolkata satellite images provided by VGALS-clustering, FCM-clustering, and the method proposed in (Bandyopadhyay and Saha, 2007) (GAPS with Sym)

Image          Method          I index    XB index
Kolkata IRS    VGALS           24.24      1.75
Kolkata IRS    FCM             5.71       23.67
Kolkata IRS    GAPS with Sym   18.27      2.23
Mumbai IRS     VGALS           214.78     2.12
Mumbai IRS     FCM             23.06      4.67
Mumbai IRS     GAPS with Sym   180.45     2.91
SPOT Kolkata   VGALS           26.58      2.28
SPOT Kolkata   FCM             5.71       12.23
SPOT Kolkata   GAPS with Sym   20.67      3.05


Some important landcovers of Kolkata are present in the image. With knowledge of the area, most of these can be identified more easily in the near infra red band of the input image (Fig. 8). They are the following. The prominent black stretch across the figure is the river Hooghly. Portions of a bridge (referred to as the second bridge), which was under construction when the picture was taken, protrude into the Hooghly near its bend around the center of the image. There are two distinct black, elongated patches below the river, on the left side of the image. These are water bodies, the one to the left being Garden Reach lake and the one to the right being Khidirpore dockyard. Just to the right of these water bodies, there is a very thin line, starting from the right bank of the river and going to the bottom edge of the picture. This is a canal called the Talis nala. Above the Talis nala, on the right side of the picture, there is a triangular patch, the race course. On the top right-hand side of the image, there is a thin line, stretching from the top edge and ending on the middle, left edge. This is the Beleghata canal with a road by its side. There are several roads on the right side of the image, near the middle and top portions. These are not very obvious from the images. A bridge cuts the river near the top of the image. This is referred to as the first bridge.

The proposed VGALS clustering method automatically provides K = 7 as the optimal number of clusters for this image data set (the corresponding partitioning is shown in Figure 10). As identified in (Pal et al., 2001), the above satellite image has seven classes, namely, turbid water, concrete, pure water, vegetation, habitation, open space, and roads (including bridges). The partitioning provided by the VGALS clustering technique separates almost all the regions well. The Talis nala has been identified properly by the proposed method (Fig. 10). The bridge is also correctly identified by the proposed algorithm. This again shows that the proposed VGALS is able to detect clusters of widely varying sizes. The segmentation result obtained by the Fuzzy C-means algorithm on this image for K = 7 (the actual number of clusters present in this image data set) is shown in Figure 11. It can be seen from Figure 11 that the FCM algorithm is not able to detect the bridge. The automatically segmented image obtained by the mean-shift based segmentation technique (Comaniciu and Meer, 2002) is shown in Figure 12. Here also the bridge has not been detected. The method proposed in (Saha and Bandyopadhyay, 2008) (GAPS along with Sym-index) provides K = 6 clusters for this data set. The corresponding partitioning is shown in Figure 13. This technique is also able to detect the bridge correctly.

Figure 8. SPOT image of Kolkata in the near infra red band with histogram equalization.

Figure 9. Data distribution of SPOT image of Kolkata in the feature space.

Figure 10. Clustered SPOT image of Kolkata using VGALS clustering technique.

Figure 11. Clustered SPOT image of Kolkata using FCM clustering technique.

To validate the segmentation result obtained by VGALS-clustering quantitatively, the two Euclidean distance based cluster validity indices, namely the I index (Maulik and Bandyopadhyay, 2002) and the XB-index (Xie and Beni, 1991), are again computed. The values are provided in Table I. Smaller values of the XB-index and larger values of the I index correspond to good clustering. On both indices, the values again show that the segmentation provided by VGALS-clustering is better than those of the existing clustering techniques.

To validate the results further, 932 pixel positions were manually selected from seven different land cover types and labelled accordingly. The confusion matrix of the partitioning obtained by the VGALS clustering technique for this data set is shown in Table II. The class-by-class classification accuracies are 87%, 83%, 82%, 83%, 80%, 81%, and 89%, respectively. The overall accuracy is 87%.
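The class-by-class accuracies can be read off the diagonal of the confusion matrix, as the sketch below shows using the values of Table II. It assumes that rows are the assigned classes and columns are the ground-truth classes, which is consistent with every column summing to 100. The numbers of labelled pixels per class among the 932 samples are not listed, so `class_counts` is a hypothetical input and the reported overall accuracy is not recomputed here.

```python
# Table II: rows are assigned classes, columns are ground-truth classes (percent).
CLASSES = ["TW", "C", "PW", "V", "H", "OS", "R"]
CONFUSION = [
    [87, 0, 13, 0, 0, 0, 0],
    [0, 83, 0, 0, 0, 0, 1],
    [13, 0, 82, 0, 0, 0, 5],
    [0, 0, 0, 83, 15, 3, 0],
    [0, 12, 0, 10, 80, 10, 1],
    [0, 0, 0, 7, 4, 81, 4],
    [0, 5, 5, 0, 1, 6, 89],
]

def per_class_accuracy(matrix):
    """Diagonal entry of each ground-truth column, as a percentage of that column."""
    accuracies = []
    for k in range(len(matrix)):
        column_total = sum(row[k] for row in matrix)
        accuracies.append(100.0 * matrix[k][k] / column_total)
    return accuracies

def overall_accuracy(matrix, class_counts):
    """Count-weighted mean of the per-class accuracies."""
    acc = per_class_accuracy(matrix)
    return sum(a * n for a, n in zip(acc, class_counts)) / sum(class_counts)
```

The overall accuracy is a weighted mean of the per-class values, so it depends on how many validation pixels were drawn from each land cover type.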

D. Results on MR Brain Images. The MR images of the brain chosen for the experiments are available in three bands: T1-weighted, proton density (pd)-weighted, and T2-weighted. The normal brain images are obtained from the BrainWeb database (BrainWeb). The images correspond to 1 mm slice thickness, 3% noise (calculated relative to the brightest tissue), and 20% intensity nonuniformity. The image of size 217 × 181 is available in 181 different z planes. The proposed clustering algorithm is executed on seven of these z planes. The parameters of the VGALS algorithm are as follows: population size = 20, total number of generations = 15, probability of crossover ($\mu_c$) = 0.9, and probability of mutation ($\mu_m$) = 0.02. The number of clusters, K, is varied from 2 to 20.

Figure 12. Clustered SPOT image of Kolkata using mean-shift based clustering technique.

Figure 13. Clustered SPOT image of Kolkata using GAPS clustering technique with Sym-index based method.

Table II. Confusion matrix of the partitioning obtained by VGALS clustering technique for numeric SPOT Kolkata image. The following notations are used: TW: turbid water, C: concrete, PW: pure water, V: vegetation, H: habitation, OS: open space, and R: roads

         Ground Truth (Percent)
Class    TW    C     PW    V     H     OS    R
TW       87    0     13    0     0     0     0
C        0     83    0     0     0     0     1
PW       13    0     82    0     0     0     5
V        0     0     0     83    15    3     0
H        0     12    0     10    80    10    1
OS       0     0     0     7     4     81    4
R        0     5     5     0     1     6     89

Table III. Minkowski Scores (MS) obtained by FCM, EM, and VGALS clustering algorithms on simulated MR volumes for normal brain projected on different z planes. Here #AC and #OC denote, respectively, the actual number of clusters and the automatically obtained number of clusters (after application of VGALS)

                     MS for AC            MS for OC
z Plane No.   #AC    FCM      EM     #OC  FCM     EM      VGALS
1             6      1.077    1.052  11   0.80    1.17    0.77
2             6      0.76     0.78   9    0.65    0.83    0.62
3             6      0.57     0.76   8    0.62    0.64    0.59
36            9      0.90     0.98   9    0.90    0.98    0.84
72            10     0.75     0.74   10   0.75    0.74    0.70
108           9      0.79     0.589  10   0.81    0.68    0.701
144           9      0.82     0.72   11   0.49    0.60    0.32

For the normal MR brain image, the ground truth information is available to us. There are a total of 10 classes present in the images, but the number of classes varies along the different z planes. The ten classes are Background, CSF, Grey Matter, White Matter, Fat, Muscle/Skin, Skin, Skull, Glial Matter, and Connective. Table III shows the actual number of clusters and the number of clusters automatically determined by the proposed VGALS clustering technique (after application on the above-mentioned brain images projected on different z planes). To measure the quality of the segmentation solution quantitatively, we have also calculated the Minkowski Score (MS) (Ben-Hur and Guyon, 2003). This is a measure of the quality of a solution given the true clustering. Let T be the "true" solution and S the solution we wish to measure. Denote by $n_{11}$ the number of pairs of elements that are in the same cluster in both S and T. Denote by $n_{01}$ the number of pairs that are in the same cluster only in S and not in T, and by $n_{10}$ the number of pairs that are in the same cluster in T and not in S. The Minkowski Score (MS) is then defined as

$$MS(T, S) = \sqrt{\frac{n_{01} + n_{10}}{n_{11} + n_{10}}}.$$

For MS, the optimum score is 0, with lower scores being "better." The MS scores obtained by VGALS clustering for the seven brain images are also reported in Table III. For the purpose of comparison, we have executed the Fuzzy C-means (FCM) (Bezdek, 1973) and Expectation Maximization (EM) (Jain et al., 1999) algorithms on the above-mentioned brain data sets with two different values of K. In the first case, K is kept equal to the actual number of clusters present in that particular plane. Next, it is set equal to the number automatically determined by the VGALS algorithm. The MS scores obtained by both of these algorithms are also reported in Table III for all seven images. The results show that the MS score of the partitioning provided by VGALS clustering is, in general, the minimum among all the partitionings; the exceptions in Table III are plane 3, where FCM with the actual number of clusters scores 0.57 against 0.59, and plane 108, where EM attains lower scores. This indicates the superior performance of VGALS in automatically detecting the proper partitioning from MR normal brain images.
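The Minkowski Score defined above can be computed by direct pair counting, as in the following sketch. The function name and the label-list representation are assumptions; the loop is quadratic in the number of elements, so for full images a contingency-table formulation would be preferable. The score is undefined when T contains no pair of elements in the same cluster.

```python
import math
from itertools import combinations

def minkowski_score(true_labels, pred_labels):
    """MS(T, S) = sqrt((n01 + n10) / (n11 + n10)); 0 is a perfect match."""
    n11 = n01 = n10 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_t = true_labels[i] == true_labels[j]
        same_s = pred_labels[i] == pred_labels[j]
        if same_t and same_s:
            n11 += 1          # together in both T and S
        elif same_s:
            n01 += 1          # together only in S
        elif same_t:
            n10 += 1          # together only in T
    return math.sqrt((n01 + n10) / (n11 + n10))
```

For example, with T = [0, 0, 1, 1] and a solution S that puts all four elements in one cluster, $n_{11} = 2$, $n_{01} = 4$, and $n_{10} = 0$, so the score is $\sqrt{4/2} \approx 1.41$; an S identical to T scores 0.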

Figures 14a, 15a, 16a, 17a, and 18a show the original MR normal brain images in the T1 band projected on the z1, z36, z72, z108, and z144 planes, respectively. Figures 14b, 15b, 16b, 17b, and 18b show, respectively, the corresponding automatically segmented images obtained by the VGALS clustering algorithm.

Figure 14. (a) Original T1-weighted MR image of the normal brain in z1 plane. (b) Segmentation obtained by VGALS clustering technique.

Figure 15. (a) Original T1-weighted MR image of the normal brain in z36 plane. (b) Segmentation obtained by VGALS clustering technique.


Figure 16. (a) Original T1-weighted MR image of the normal brain in z72 plane. (b) Segmentation obtained by VGALS clustering technique.

Figure 17. (a) Original T1-weighted MR image of the normal brain in z108 plane. (b) Segmentation obtained by VGALS clustering technique.

Figure 18. (a) Original T1-weighted MR image of the normal brain in z144 plane. (b) Segmentation obtained by VGALS clustering technique.


Figure 20. (a) Original T1-weighted MRI image of the brain with multiple sclerosis lesions in z36 plane. (b) Segmentation obtained by VGALS

clustering technique.

Figure 21. (a) Original T1-weighted MRI image of the brain with multiple sclerosis lesions in z72 plane. (b) Segmentation obtained by VGALS

clustering technique.

Figure 22. (a) Original T1-weighted MRI image of the brain with multiple sclerosis lesions in z108 plane. (b) Segmentation obtained by VGALS

clustering technique.


lesions are also provided in Table V. The results show that the proposed VGALS clustering technique is, in general, more effective than the Fuzzy-VGA. This indicates that the line symmetry based distance is more effective in segmenting the MR brain images than the existing Euclidean distance.

V. DISCUSSION AND CONCLUSION

In this article, a new variable string length genetic algorithm based clustering technique is developed that utilizes a recently developed line symmetry based distance for the assignment of points to different clusters. A new cluster validity index based on the line symmetry based distance is also developed and is used to compute the fitness of the chromosomes in the proposed genetic clustering technique. The proposed clustering technique can automatically determine the appropriate partitioning and the appropriate number of partitions from a given data set having line symmetrical clusters. The effectiveness of the proposed technique is shown in detecting the proper partitioning from two remote sensing satellite images of parts of the city of Kolkata. The results are compared with those obtained by the Fuzzy C-means clustering technique, the mean-shift based method, and GAPS-clustering with Sym-index based method (Bandyopadhyay and Saha, 2007). Thereafter, the effectiveness of the proposed algorithm is shown in segmenting several MR brain images. The segmentation results are then compared with the available ground truth information. For the purpose of comparison, the well-known Fuzzy C-means and EM algorithms are also executed on these images. The experimental results show that VGALS clustering is not only able to automatically segment the MR brain images into different tissue classes, but that the corresponding segmentation results are also, in general, the best among the compared methods.

Note that the present work does not use any spatial information while segmenting the images. Incorporating spatial information is expected to improve the quality of the results, so new methods of incorporating it need to be developed in the future. Future work also includes the development of a fuzzy genetic clustering technique based on the line symmetry based distance. Developing new forms of symmetry, such as plane symmetry, is another important direction for future research. The authors are currently working in this direction.

Figure 23. (a) Original T1-weighted MRI image of the brain with multiple sclerosis lesions in z144 plane. (b) Segmentation obtained by VGALS clustering technique.

Table V. The automatically obtained cluster (OC) number and the corresponding Minkowski Scores (MS) after application of VGA and VGALS clustering algorithms on simulated MR volumes for brain with multiple sclerosis lesions projected on the first 10 z planes

                     VGA             VGALS
z Plane No.   AC     OC    MS        OC    MS
1             6      2     1.21      11    0.77
2             6      2     1.20      8     0.77
3             6      2     1.19      11    0.78
4             6      5     0.69      9     0.77
5             6      2     1.184     9     0.75
6             6      2     1.18      8     0.71
7             6      2     1.17      9     0.78
8             6      2     1.16      10    0.73
9             6      2     1.16      11    0.84
10            9      2     1.17      11    0.83

REFERENCES

M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf, Computational geometry: Algorithms and applications, 2nd edition, Springer-Verlag, Berlin, Germany, 2000.

S. Bandyopadhyay and U. Maulik, Genetic clustering for automatic evolution of clusters and application to image classification, Pattern Recogn (2002), 1197–1208.

S. Bandyopadhyay and S. Saha, GAPS: A clustering method using a new point symmetry based distance measure, Pattern Recogn 40 (2007), 3451.

S. Bandyopadhyay and S. Saha, A point symmetry based clustering technique for automatic evolution of clusters, IEEE Trans Knowl Data Eng 20 (2008), 1–17.

S. Bandyopadhyay, U. Maulik, and A. Mukhopadhyay, Multiobjective genetic clustering for pixel classification in remote sensing imagery, IEEE Trans Geosci Remote Sens 45 (2007), 1506–1511.

A. Ben-Hur and I. Guyon, Detecting stable clusters using principal component analysis, In Methods in Molecular Biology, M. Brownstein and A. Kohodursky (Editors), Humana Press, 2003.

J.L. Bentley, B.W. Weide, and A.C. Yao, Optimal expected-time algorithms for closest point problems, ACM Trans Math Software 6 (1980), 563–580.

J.C. Bezdek, Fuzzy mathematics in pattern classification, Ph.D. dissertation, Cornell University, Ithaca, NY, 1973.

J.C. Bezdek, Pattern recognition with fuzzy objective function algorithms, Plenum, New York, 1981.

S.M. Bhandarkar and H. Zhang, Image segmentation using evolutionary

computation, IEEE Trans Evol Comp 3 (1999), 1–21.

G. Bilgin, S. Erturk, and T. Yildirim, Unsupervised classification of hyper-

spectral-image data using fuzzy approaches that spatially exploit member-

ship relations, IEEE Geosci Remote Sens Lett 5 (2008), 673–677.

BrainWeb: Simulated brain database. Available at: http://www.bic.mni.

mcgill.ca/brainweb; http://www.bic.mni.mcgill.ca/brainweb/.

C.A. Cocosco, V. Kollokian, R.K.-S. Kwan, A.C. Evans: ‘‘BrainWeb:Online Interface to a 3D MRI Simulated Brain Database’’ NeuroImage,

vol. 5, no. 4, part 2/4, S425, 1997 – Proceedings of 3-rd International Confer-

ence on Functional Mapping of the Human Brain, Copenhagen, May, 1997.

R.L. Cannon, R. Dave, J.C. Bezdek, and M. Trivedi, Segmentation of a the-

matic mapper image using fuzzy c-means clustering algorithm, IEEE Trans

Geosci Remote Sens 24 (1986), 400–408.

V.V. Chamundeeswari, D. Singh, and K. Singh, Unsupervised land cover 

classification of SAR images by contour tracing, Proc. of IEEE International

Geoscience and Remote Sensing Symposium (IGARSS, 2007), Barcelona,

Spain, July, 23–28, 2007, pp. 547–550.

C.-C.T. Chen and D.A. Landgrebe, A spectral design system for the HIRIS/ 

MODIS era, IEEE Trans Geosci Remote Sens 27 (1989), 681–686.

D. Comaniciu and P. Meer, Mean shift: A robust approach toward feature

space analysis, IEEE Trans Pattern Anal Machine Intell 24 (2002), 603–619.

M.A. Friedl, D.K. McIver, J.C.F. Hodges, X.Y. Zhang, D. Muchoney, A.H.

Strahler, C.E. Woodcock, S. Gopal, A. Schneider, A. Cooper, A. Baccini, F.

Gao, C. Schaaf, Global land cover mapping from MODIS: Algorithms and

early results, Remote Sens Environ 83 (2002), 287–302.

J.H. Friedman, J.L. Bently, and R.A. Finkel, An algorithm for finding best

matches in logarithmic expected time, ACM Trans Math Software 3 (1977),

209–226.

V. Gandhi, J.M. Kang, S. Shekhar, J. Ju, E.D. Kolaczyk, and S. Gopal, Con-

text inclusive function evaluation: a case study with EM-based multi-scale

multi-granular image classification. Knowl. Inf. Syst. 21, 2 (Oct. 2009),

231–247. DOI5http://dx.doi.org/10.1007/s10115-009-0208-0

H. Ghassemian and P.A. Landgrebe, Object oriented feature extraction

method for image data compaction, IEEE Control Syst Mag 8 (1988), 42–48.

D. Guo, H. Xiong, V. Atluri, and N.R. Adam, Object discovery in high-resolution remote sensing images: a semantic perspective, Knowl Inf Syst

19 (2009), 211–233. DOI=http://dx.doi.org/10.1007/s10115-008-0160-4.

E.E. Hilbert, Cluster compression algorithm-a joint clustering data compres-

sion concept, JPL Publication, NASA, Technical Report, 1977

M.A. Jaffar, A. Hussain, and A.M. Mirza, Fuzzy entropy based optimization

of clusters for the segmentation of lungs in CT scanned images. Knowl. Inf.

Syst. 24, 1 (Jul. 2010), 91-111. DOI=http://dx.doi.org/10.1007/s10115-009-

0225-z.

A.K. Jain, M.N. Murthy, and P.J. Flynn, Data clustering: A review, ACM

Comput Rev 31, 3 (Sep. 1999), 264–323. DOI=http://doi.acm.org/10.1145/ 

331499.331504.

I. Jolliffe, Principal component analysis, Springer Series in Statistics, Eng-

land, 1986.

R. Kauth, A. Pentland, and G. Thomas, BLOB: An unsupervised clustering

approach to spatial preprocessing of MSS imagery, Proceedings of 11th Int.

Symp. Remote Sensing of the Environment, Ann Arbor, Mich, 1977, 1309– 

1317.

A. Marcal and L. Castro, Hierarchical clustering of multispectral images

using combined spectral and spatial criteria, IEEE Geosci Remote Sens Lett

2 (2005), 59–63.

U. Maulik and S. Bandyopadhyay, Performance evaluation of some cluster-

ing algorithms and validity indices, IEEE Trans Pattern Anal Machine Intell

24 (2002), 1650–1654.

U. Maulik and S. Bandyopadhyay, Fuzzy partitioning using a real-coded

variable-length genetic algorithm for pixel classification, IEEE Trans Geosci

Remote Sens 41 (2003), 1075–1081.

D.M. Mount and S. Arya, ANN: A library for approximate nearest neighbor 

searching, 2005, Available at:http://www.cs.umd.edu/ $mount/ANN.

S.K. Pal, S. Bandyopadhyay, and C.A. Murthy, Genetic classifiers for 

remotely sensed images: Comparison with standard methods, Int J RemoteSens 22 (2001), 2545–2569.

M.L. Pugh and A.M. Waxman, Classification of spectrally-similar land cover using multi-spectral neural image fusion and the fuzzy ARTMAP neural classifier, Proc. of IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2006), Denver, Colorado, July 31, 2006–Aug. 4, 2006, pp. 1808–1811.

J.A. Richards, Remote sensing digital image analysis: An introduction, Springer-Verlag, New York, 1993.

S. Saha and S. Bandyopadhyay, MRI brain image segmentation by fuzzy symmetry based genetic clustering technique, Proceedings of the 2007 IEEE Congress on Evolutionary Computation (CEC’07), Singapore, 2007a, pp. 4417–4424.

S. Saha and S. Bandyopadhyay, A genetic clustering technique using a new line symmetry based distance measure, Proceedings of Fifth International Conference on Advanced Computing and Communications (ADCOM’07), IEEE Computer Society, Guwahati, India, 2007b, pp. 365–370.

S. Saha and S. Bandyopadhyay, Application of a new symmetry based cluster validity index for satellite image segmentation, IEEE Geosci Remote Sens Lett 5 (2008), 166–170.

S. Saha and S. Bandyopadhyay, MR brain image segmentation using a multi-seed based automatic clustering technique, Fundam Inform 97 (2009), 199–214.

S. Saha and S. Bandyopadhyay, A new multiobjective clustering technique based on the concepts of stability and symmetry, Knowl Inf Syst 23 (2010), 1–27.

S. Saha and S. Bandyopadhyay, Automatic MR brain image segmentation using a multiseed based multiobjective clustering approach, Applied Intelligence. DOI: 10.1007/s10489-010-0231-6.

S. Saha and S. Bandyopadhyay, On principle axis based line symmetry clustering techniques, Memetic Computing. DOI: 10.1007/s12293-010-0049-0.

K. Sayood, Data compression in remote sensing applications, IEEE Geosci Remote Sens Newslett 84 (1992), 7–15.

J. Suckling, T. Sigmundsson, K. Greenwood, and E. Bullmore, A modified fuzzy clustering algorithm for operator independent brain tissue classification of dual echo MR images, Magn Reson Imaging 17 (1999), 1065–1076.

Y. Tarabalka, J.A. Benediktsson, and J. Chanussot, Spectral-spatial classification of hyperspectral imagery based on partitional clustering techniques, IEEE Trans Geosci Remote Sens 47 (2009), 2973–2987.

M. Tyagi, F. Bovolo, A. Mehra, S. Chaudhuri, and L. Bruzzone, A context-sensitive clustering technique based on graph-cut initialization and expectation-maximization algorithm, IEEE Geosci Remote Sens Lett 5 (2008), 21–25.

C. Wemmert, A. Puissant, G. Forestier, and P. Gancarski, Multiresolution remote sensing image clustering, IEEE Geosci Remote Sens Lett 6 (2009), 533–537.

X.L. Xie and G. Beni, A validity measure for fuzzy clustering, IEEE Trans Pattern Anal Machine Intell 13 (1991), 841–847.


Copyright of International Journal of Imaging Systems & Technology is the property of John Wiley & Sons, Inc. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.