Computers and Electrical Engineering 39 (2013) 907–917

Contents lists available at SciVerse ScienceDirect

Computers and Electrical Engineering

journal homepage: www.elsevier.com/locate/compeleceng

Automated two-dimensional K-means clustering algorithm for unsupervised image segmentation ☆

Intan Aidha Yusoff, Nor Ashidi Mat Isa ⇑, Khairunnisa Hasikin

Imaging and Intelligent System Research Team (ISRT), School of Electrical & Electronic Engineering, Engineering Campus, Universiti Sains Malaysia, 14300 Nibong Tebal, Pulau Pinang, Malaysia

Article info

Article history: Available online 11 December 2012

0045-7906/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.compeleceng.2012.11.013

☆ Reviews processed and recommended for publication to Editor-in-Chief by Deputy Editor Dr. Ferat Sahin.
⇑ Corresponding author. Tel.: +60 4 5996051; fax: +60 4 5941023.
E-mail addresses: [email protected] (I.A. Yusoff), [email protected] (N.A. Mat Isa), [email protected] (K. Hasikin).

Abstract

This paper introduces the Automated Two-Dimensional K-Means (A2DKM) algorithm, a novel unsupervised clustering technique. The proposed technique differs from the conventional clustering techniques because it eliminates the need for users to determine the number of clusters. In addition, A2DKM incorporates local and spatial information of the data into the clustering analysis. A2DKM is qualitatively and quantitatively compared with the conventional clustering algorithms, namely, the K-Means (KM), Fuzzy C-Means (FCM), Moving K-Means (MKM), and Adaptive Fuzzy K-Means (AFKM) algorithms. The A2DKM outperforms these algorithms by producing more homogeneous segmentation results.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

The vast array of computer vision products in the market has caused a rise in the number of computer vision and digital image processing applications. Segmentation is one of the stages in image processing, wherein an image is simplified by segmenting it into several meaningful regions. These segmented regions are essential for assisting tasks in image processing and analysis.

Image segmentation in computer vision has applications in areas such as agricultural and food inspection [1–3], document processing [5], photography [5–7], and display or acquisition devices [8].

The most common segmentation approaches, which have a wide variety of applications, include template matching [9], thresholding [10], region growing [11], and clustering [12,13]. The aforementioned approaches, especially the clustering algorithms, show the significance of image segmentation in the development of consumer electronic products. Thus, this paper aims to further contribute towards the application of clustering techniques in consumer electronics by introducing a new clustering algorithm. Clustering is the process wherein image pixels are grouped into clusters, such that cluster members have certain notable similarities. Several conventional clustering algorithms are available for implementation. Among these, the K-Means (KM), Fuzzy C-Means (FCM), and Moving K-Means (MKM) algorithms are the most widely used, having been continuously refined over time.

The KM clustering algorithm clusters data into a fixed number of clusters using the Euclidean distance based on the pixel's intensity value [14], whereas the FCM clustering algorithm allows data to become a member of more than one cluster but with different membership degrees [15]. The MKM clustering algorithm improves KM clustering by combining the fitness concept with the basic concept of the original algorithm [16].

One of the more recently developed clustering techniques is the Adaptive Fuzzy K-Means (AFKM) clustering algorithm [17]. The AFKM algorithm incorporates the conventional approaches of the KM, FCM, and MKM algorithms into an improved clustering technique. The KM and MKM concepts are used to assign the data to its nearest center, whereas the FCM concept is included in this approach to allow the data to belong to two or more clusters [17].

The conventional clustering algorithms (i.e., the KM, FCM, and MKM clustering algorithms) and the AFKM clustering algorithm have the following common limitations.

• The number of clusters for segmentation needs to be estimated and defined by the user at the initial stage of the segmentation process.
• Clustering is performed based on a single local parameter, namely, the intensity value of a pixel, thereby discarding the correlative effects of spatial parameters on a pixel, which leads to information loss during segmentation.

Therefore, new clustering algorithms that do not depend on a user-defined number of clusters are needed to overcome these limitations. Furthermore, the new clustering algorithms should incorporate the spatial information of the data into their decision-making process. The prior determination of the number of clusters requires an extra step in preparing the images for clustering. For example, different users may define a different number of clusters for the same image, and no specific method can guarantee the optimum number of clusters for that particular image. Thus, this manual procedure may be time-consuming and subjective.

Numerous studies have been conducted over the years on developing unsupervised segmentation algorithms to ease the implementation of clustering applications [18–20]. In addition to the development and enhancement of clustering algorithm performance, several researchers have integrated spatial characteristics during image clustering. This approach is intended to reduce information loss and noisy pixel interference during segmentation [21–23]. The spatial parameter commonly used by these researchers, namely the intensity or local average intensity of a pixel, has been adapted into the FCM clustering algorithm.

Therefore, the Automated Two-Dimensional K-Means (A2DKM) clustering algorithm is developed in this study to overcome the two aforementioned limitations. The main motivation of the new clustering technique is to build an unsupervised clustering algorithm that automatically determines the optimum number of clusters for a noiseless image by using the local and spatial characteristics of the input image. This technique eliminates the need to declare the number of clusters prior to clustering, especially for images where the optimum number of clusters is unknown. Conventionally, the concept of the intensity median value in a specific N × N window of an image is adopted in noise filtering techniques to reduce unwanted noise. The proposed A2DKM algorithm is originally designed to segment noiseless images, so its purpose differs from the removal of unwanted noise performed by noise filtering techniques. Instead, the concept is employed during segmentation specifically to remove insignificant details that occur in the regions to be segmented. The insignificant details refer to any edges, small regions, and similar features that ideally should not be segmented as independent regions. Such details include the wave patterns on sea or lake water surfaces, brick edges on building walls, and the different colors of tree leaves. These details exist in the form of a sudden change in the gradient of the intensity level. Thus, the concept of the intensity median value in a specific N × N window (i.e., not solely dependent on the individual intensity level) that is employed in this study can possibly reduce and/or remove the effects of such details during segmentation. The removal of these insignificant details is believed to provide better segmentation results by producing more homogeneous characteristics in the respective regions.

The rest of this paper is organized as follows: The proposed clustering algorithm is explained in Section 2. Section 3 illustrates the proposed algorithm as it is implemented on a standard image. Section 4 explains the methods used for data analysis. The results obtained from the proposed A2DKM algorithm are analyzed in Section 5, and its performance is evaluated and compared with that of several selected conventional and relatively new clustering algorithms. Finally, Section 6 concludes the work presented in this paper.

2. Proposed approach

In segmenting an 8-bit gray scale digital image with 256 gray levels within the interval [0, 255] using the proposed A2DKM algorithm, let p(x,y) be the pixel at location (x,y) with an intensity p (where x = 1, 2, 3, ..., R and y = 1, 2, 3, ..., S, with R and S as the number of columns and rows of the image, respectively). The initial values for all clusters are predetermined, and the number of clusters is initially set to two. All the data are then assigned to the nearest center based on the Euclidean distance.

Generally, the conventional KM clustering algorithm minimizes the following objective function for partitioning a dataset $\{v_t\}_{t=1}^{N}$ into the kth center ($c_k$) with a predefined number of clusters ($n_c$) [24]:

$$J = \sum_{k=1}^{n_c} \sum_{t=1}^{N} \lVert v_t - c_k \rVert^2 \tag{1}$$

where $\lVert \cdot \rVert$ stands for a distance measurement that is normally taken to be the Euclidean norm. The proposed A2DKM clustering algorithm differs from the KM clustering algorithm by including the contribution of the local spatial information of the data. Thus, $v_t$ and $c_k$ in (1) are represented by (2) and (3), respectively:

$$v_t = (v_{tINT}, v_{tMED}) \tag{2}$$

$$c_k = (c_{kINT}, c_{kMED}) \tag{3}$$

where $c_{kINT}$ and $c_{kMED}$ are defined by (4) and (5), respectively:

$$c_{kINT} = \frac{1}{n_k} \left( \sum_{t \in c_k} v_{tINT} \right) \tag{4}$$

$$c_{kMED} = \frac{1}{n_k} \left( \sum_{t \in c_k} v_{tMED} \right) \tag{5}$$

In (2)–(5), $v_{tINT}$ is the intensity (INT) dimension of the tth data, $v_{tMED}$ is the intensity median (MED) dimension of the tth data, and $n_k$ is the number of pixels assigned to the kth center. The graphical two-dimensional annotation of data with two clearly separated regions, along with the aforementioned parameters, is illustrated in Fig. 1. This dataset is segmented into two clusters ($n_c = 2$).

Fig. 1. Graphical representations of the data $v_t = (v_{tINT}, v_{tMED})$ centered at $c_k = (c_{kINT}, c_{kMED})$ with the number of clusters = 2.

Fig. 2. Visualization of two quadrant axes built for two initial clusters that are centered at ($c_{1INT}$, $c_{1MED}$) and ($c_{2INT}$, $c_{2MED}$).
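The two-dimensional data representation in (2)–(5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`median_window`, `build_features`, `nearest_center`, `update_centers`) are hypothetical, the image is assumed to be a list of rows of gray levels, and repeating the edge pixels at the image border is an assumption, because the text does not state how borders are handled.

```python
from statistics import median


def median_window(img, x, y, n=3):
    """Median gray level of the n x n window centred on column x, row y."""
    rows, cols = len(img), len(img[0])
    half = n // 2
    values = []
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            # Assumption: pixels outside the image repeat the nearest edge pixel.
            yy = min(max(y + dy, 0), rows - 1)
            xx = min(max(x + dx, 0), cols - 1)
            values.append(img[yy][xx])
    return median(values)


def build_features(img, n=3):
    """Return v_t = (v_tINT, v_tMED) for every pixel, in row-major order."""
    return [
        (float(img[y][x]), float(median_window(img, x, y, n)))
        for y in range(len(img))
        for x in range(len(img[0]))
    ]


def nearest_center(v, centers):
    """Index of the center closest to v in 2D Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda k: (v[0] - centers[k][0]) ** 2 + (v[1] - centers[k][1]) ** 2,
    )


def update_centers(features, labels, centers):
    """Recompute each center as the mean of its members, as in (4) and (5)."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [v for v, lab in zip(features, labels) if lab == k]
        if not members:
            # Assumption: a cluster without members keeps its previous center.
            new_centers.append(old)
            continue
        n_k = len(members)
        new_centers.append(
            (sum(v[0] for v in members) / n_k, sum(v[1] for v in members) / n_k)
        )
    return new_centers
```

A typical call sequence would build the features once, assign every feature to its nearest center with `nearest_center`, and then call `update_centers` with the resulting labels.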

In the proposed A2DKM algorithm, each region of the kth cluster is split into four quadrants (q = 1, 2, 3, 4) by setting the origin at ($c_{kINT}$, $c_{kMED}$), as shown in Fig. 2. The total Euclidean distance of all members in each quadrant of $c_k$ is calculated using:

$$d_{kq} = \sum_{i \in c_{kq}} \lVert v_i - c_{kq} \rVert \quad \text{for } q = 1, 2, 3, 4 \tag{6}$$

where $c_{kq}$ is the qth quadrant of the kth cluster, and $v_i$ is the ith member of $c_{kq}$. The quadrant with the largest total Euclidean distance is then determined, and its total is denoted as $d_{kqMAX}$. This value is compared with the total Euclidean distances of all four quadrants to check the following condition:

$$d_{kqMAX} < 0.5 \left( \sum_{q=1}^{4} d_{kq} \right) \tag{7}$$

If (7) is not fulfilled, then the kth cluster is split into two different clusters. The first cluster consists of all the members of the quadrant with the largest total Euclidean distance, while the other three quadrants form the second cluster. The center of each cluster is then recalculated based on (3)–(5) because the membership of the newly formed clusters has changed. The processes of splitting the clusters into quadrants and forming new clusters are repeated until (7) is fulfilled. After (7) is fulfilled, no further formation of new clusters is necessary, and the last number of clusters obtained is set as the final (or optimum) number of clusters. In this way, the number of clusters is determined automatically. Clustering starts with two clusters as the initial value because a data set must be partitioned into at least two groups with different cluster centers for the clustering analysis to have meaning [25].
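The quadrant test in (6) and (7) and the resulting split can be sketched as follows. This is an illustrative sketch, not the authors' code. Several choices are assumptions: the names `quadrant`, `quadrant_distances`, and `split_cluster` are hypothetical; quadrants are indexed 0 to 3 in an arbitrary order (the numbering used in Fig. 2 is not recoverable from the text); points lying on an axis are assigned by the `>=` comparisons shown; distances are measured from the cluster center ($c_{kINT}$, $c_{kMED}$), which is one reading of (6); and a cluster whose members all coincide with its center is left unsplit so that the procedure cannot loop forever.

```python
import math


def quadrant(v, c):
    """Quadrant index (0..3) of point v relative to center c (hypothetical numbering)."""
    dx, dy = v[0] - c[0], v[1] - c[1]
    if dx >= 0 and dy >= 0:
        return 0
    if dx < 0 and dy >= 0:
        return 1
    if dx < 0 and dy < 0:
        return 2
    return 3


def quadrant_distances(points, c):
    """Total Euclidean distance to c per quadrant, plus the members of each quadrant."""
    totals = [0.0, 0.0, 0.0, 0.0]
    groups = [[], [], [], []]
    for v in points:
        q = quadrant(v, c)
        totals[q] += math.hypot(v[0] - c[0], v[1] - c[1])
        groups[q].append(v)
    return totals, groups


def split_cluster(points):
    """Return [points] if condition (7) holds, otherwise the two clusters after the split."""
    n = len(points)
    center = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    totals, groups = quadrant_distances(points, center)
    total = sum(totals)
    q_max = max(range(4), key=lambda q: totals[q])
    # Assumption: a zero total (all points at the center) never triggers a split.
    if total == 0 or totals[q_max] < 0.5 * total:
        return [points]
    rest = [v for q in range(4) if q != q_max for v in groups[q]]
    return [groups[q_max], rest]
```

Applying `split_cluster` to every cluster and repeating until no cluster is split gives the automatically determined number of clusters described above; whether pixels are also reassigned to their nearest centers between rounds is not specified in this section, so that step is left out of the sketch.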

3. Illustration of implementation

To illustrate the implementation of the A2DKM clustering algorithm, this study applies the proposed algorithm to an image called Capsicum with a size of 100 × 100, as shown in Fig. 3a. The intensity median (for an N × N window, where N is an odd number that is set to 3 in this study) is plotted against the intensity in the scatter graphs shown in Fig. 3. Initially, the Capsicum image is clustered into two regions with the center values c1 and c2 set to (77, 82) and (155, 151), respectively. Using the proposed algorithm, the distance from each data point to all centers is calculated using a two-dimensional (2D) Euclidean distance equation, and each data point is then assigned to the closest center. The output of the assigning process is visualized using the pixel distribution graph in Fig. 3b.

Four quadrants are subsequently built in each region, centered at c1 and c2. The data in each region belong to one of the quadrants, as shown in Fig. 3c. The total 2D Euclidean distance of each quadrant to its center is then calculated. This calculation is used to decide whether new clusters should be formed. In this example, Eq. (7) is not satisfied by centers c1 and c2. The red¹ arrows in Fig. 3d show that $d_{14MAX} > 0.5 \left( \sum_{q=1}^{4} d_{1q} \right)$ for c1 (because quadrant 4 of c1 produces the maximum total Euclidean distance), whereas $d_{22MAX} > 0.5 \left( \sum_{q=1}^{4} d_{2q} \right)$ for c2 (because quadrant 2 of c2 produces the maximum total Euclidean distance).

For c1, quadrant 4 forms an individual cluster, whereas quadrants 1, 2, and 3 form the second cluster. For c2, quadrant 2 forms one cluster, and the other three quadrants form the second cluster. Thus, the number of clusters increases from two to four, as depicted in Fig. 3d. The process is repeated until (7) is fulfilled. As shown in Fig. 3e, the final number of clusters obtained for Capsicum is 8. The resultant image obtained for the image labeled Capsicum (with nc = 8) is shown in Fig. 3f. In each graph in Fig. 3b–e, different symbols represent different clusters. For the quadrant formation process, symbols colored gray, red, blue, and green represent data belonging to quadrants 1, 2, 3, and 4, respectively. The red circles represent the initial centers, whereas the yellow circles represent the newly formed centers.
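As a worked check of condition (7), consider two sets of quadrant totals. The numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the Capsicum example.

```latex
% Case 1 (hypothetical totals): d_{k1}=12,\ d_{k2}=30,\ d_{k3}=9,\ d_{k4}=5
\sum_{q=1}^{4} d_{kq} = 56, \qquad 0.5 \times 56 = 28, \qquad d_{kqMAX} = d_{k2} = 30 > 28
% Condition (7) is not fulfilled: quadrant 2 becomes one cluster,
% and quadrants 1, 3, and 4 together become the other.

% Case 2 (hypothetical totals): d_{k1}=12,\ d_{k2}=20,\ d_{k3}=9,\ d_{k4}=15
\sum_{q=1}^{4} d_{kq} = 56, \qquad 0.5 \times 56 = 28, \qquad d_{kqMAX} = d_{k2} = 20 < 28
% Condition (7) is fulfilled: the cluster is kept as it is.
```

In other words, a cluster is split only when a single quadrant accounts for at least half of the total distance accumulated over all four quadrants.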

4. Data analysis

The objective of this study is to develop a clustering algorithm that is able to automatically segment any given image into its optimum segmentation regions without the initial determination of the number of clusters. The proposed algorithm is tested against synthetic images as the ground truth, wherein the number of clusters to be segmented is known. This verification is important to prove that the A2DKM algorithm is able to define the correct optimum number of clusters for an image. By contrast, the number of clusters for standard images is very subjective and has proven difficult to determine. The ten synthetic images used in this study are 100 × 100 images that were manually generated using Microsoft® Paint (version 5.1) in real colors. These synthetic images are then converted into 8-bit gray scale images (with intensities from 0 to 255) using MATLAB® (version 7.5; R2007b).

To analyze the performance of the A2DKM clustering algorithm in consumer-related applications (i.e., digital cameras, CCD cameras, and photo editing software), 100 gray-scale standard consumer images have been tested using the proposed algorithm. The same images are likewise processed using the conventional KM, FCM, and MKM clustering algorithms with the same number of clusters that was calculated using the A2DKM clustering algorithm. The performance of one of the latest clustering algorithms, namely, the AFKM, is included in the comparison by using the same approach.

The qualitative results (i.e., the results which are visible to the users) of these test images are crucial for any image processing tool. Thus, we have chosen to first qualitatively evaluate the images and then subsequently strengthen the findings by conducting a quantitative analysis. Qualitatively, high-quality segmented images are homogeneous and have uniform regions, with the minimum appearance of small isolated regions. In addition, adjacent regions of the segmented images should have significantly different values for uniform characteristics to produce a desirable clustering result [26]. In addition to the qualitative findings, four parameters are employed as the quantitative analysis benchmarks, namely, F(I), F′(I), Q(I), and the execution time t.

¹ For interpretation of color in Fig. 3, the reader is referred to the web version of this article.

Fig. 3. Implementation of the A2DKM algorithm. Panels: (a); (b) initialization of centers, c1 and c2; (c) quadrant formation of initial clusters; (d) clusters updated from nc = 2 to nc = 4; (e) quadrant formation of nc = 8; (f) A2DKM segmentation result of image Capsicum.

F(I) was designed by Liu et al. in 1994 [27] as:

$$F(I) = \frac{1}{1000 (N \times M)} \sqrt{R} \sum_{i=1}^{R} \frac{e_i}{\sqrt{A_i}} \tag{8}$$

where I is the segmented image, N × M is the image size, R is the number of regions in the clustered image, $A_i$ is the area of the ith region, and $e_i$ is the Euclidean distance between the gray-level color vectors of the pixels of the ith region and the color vector attributed to region i in the segmented image.

Borsotti et al. [28] revised the F(I) function in 1998 and proposed two enhanced versions of F(I), namely, F′(I) and Q(I):

$$F'(I) = \frac{1}{10{,}000 (N \times M)} \sqrt{\sum_{A=1}^{Max} [R(A)]^{1+1/A}} \times \sum_{i=1}^{R} \frac{e_i^2}{\sqrt{A_i}} \tag{9}$$

$$Q(I) = \frac{1}{10{,}000 (N \times M)} \sqrt{R} \times \sum_{i=1}^{R} \left[ \frac{e_i^2}{1 + \log A_i} + \left( \frac{R(A_i)}{A_i} \right)^2 \right] \tag{10}$$

where R(A) is the number of regions with the area A in (9), and Max is the area of the largest region in the segmented image. $R(A_i)$ is the number of regions with an area equal to $A_i$ in (10).
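The two measures in (9) and (10) can be evaluated from per-region statistics, as in the sketch below. It is only an illustration under stated assumptions: the function name `borsotti_measures` is hypothetical, each region is assumed to be supplied as a pair (area $A_i$, color error $e_i$) computed elsewhere, and the base of the logarithm in (10) is not stated in the text, so base 10 is used here as an assumption that can be changed through the `log` argument.

```python
import math
from collections import Counter


def borsotti_measures(regions, n_pixels, log=math.log10):
    """Return (F'(I), Q(I)) for regions given as (area A_i, color error e_i) pairs.

    n_pixels is the image size N x M. The log base is an assumption (base 10).
    """
    num_regions = len(regions)                       # R
    count_by_area = Counter(a for a, _ in regions)   # R(A) for every area that occurs
    scale = 1.0 / (10000.0 * n_pixels)

    # Areas with no region have R(A) = 0 and add nothing to the sum over A in (9).
    area_term = math.sqrt(
        sum(r ** (1.0 + 1.0 / a) for a, r in count_by_area.items())
    )
    f_prime = scale * area_term * sum(e * e / math.sqrt(a) for a, e in regions)

    q = scale * math.sqrt(num_regions) * sum(
        e * e / (1.0 + log(a)) + (count_by_area[a] / a) ** 2 for a, e in regions
    )
    return f_prime, q
```

For example, `borsotti_measures([(4, 2.0), (4, 2.0)], n_pixels=8)` evaluates both measures for a hypothetical 8-pixel image that contains two regions of area 4 with a color error of 2.0 each.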

The three quantitative analysis functions are used to evaluate the segmentation results of an image, with as much correspondence as possible to the visual judgment. As observed from the functions, less homogeneous and less uniform regions of segmented images with too many isolated small regions or noise yield higher values of F(I), F′(I), and Q(I). F′(I) penalizes segmentations with too many small regions, whereas Q(I) penalizes segmentations with small regions and regions with large color errors. Thus, the three functions differ in how closely their resultant values correspond to the visual judgment of segmented images. Although [28] proved that F′(I) and Q(I) closely correspond to the visual judgment, F(I) is still included in the quantitative analysis for a broader perspective on segmentation. In this analysis, lower values of the three functions indicate that the clustering algorithm produced more homogeneous, uniform, and smooth segmentation results.

Table 1. Automated segmentation results using the proposed A2DKM for synthetic images (number of clusters in the original image and number of clusters obtained by A2DKM).

Image   Original   A2DKM   Image   Original   A2DKM
SYN1    5          5       SYN6    3          3
SYN2    4          4       SYN7    3          3
SYN3    3          3       SYN8    4          4
SYN4    4          4       SYN9    3          3
SYN5    3          3       SYN10   3          3

Execution time is used in this study as the fourth parameter in the quantitative analysis. The execution time analysis is useful for evaluating the complexity of the proposed algorithm. A simpler algorithm requires a shorter processing time, whereas a more complex algorithm requires a longer processing time. This analysis penalizes algorithms that require repetitive procedures.

Moreover, memory consumption is included as one of the evaluation criteria to further assess the clustering algorithms. An algorithm that minimizes computational cost by optimizing its memory usage is desirable.

5. Results and discussion

5.1. Qualitative analysis

The first part of the qualitative analysis is to prove that the proposed A2DKM clustering algorithm is able to give the optimum number of clusters for the clustering process. The proposed A2DKM is tested against ten synthetic images and is able to accurately estimate the number of clusters that is suitable for each image. The results are summarized in Table 1, which lists the expected number of clusters and the resultant number of clusters from A2DKM for each image.

Through this testing method, the proposed A2DKM is shown to automatically and successfully determine the exact number of clusters of an image that is to be segmented for optimal segmentation quality.

The objective of the qualitative analysis on standard consumer images is to visually evaluate the appearance of the clustering results. The performance of the proposed A2DKM clustering algorithm is compared with that of the KM, FCM, MKM, and AFKM clustering algorithms. Out of the 100 gray-scale standard consumer images that were tested, three images (Lake, House, and Hut) are chosen for detailed qualitative evaluation. The numbers of clusters for the Lake, House, and Hut images are 5, 7, and 4, respectively, as obtained by the proposed algorithm. These numbers are then used to set the number of clusters for the KM, FCM, MKM, and AFKM clustering algorithms.

Fig. 4. The Lake image. (a) Original image as well as the segmentation results using the (b) KM, (c) FCM, (d) MKM, (e) AFKM, and (f) A2DKM algorithms. The best results are in boldface.

Panel   Algorithm   No. of regions   F(I)     F′(I)   Q(I)      t (s)
(b)     KM          4012             384.90   39.13   1043.70   0.96
(c)     FCM         3249             496.20   50.56   982.90    2.10
(d)     MKM         3986             420.10   42.72   1086.60   0.91
(e)     AFKM        4032             379.00   38.49   1043.90   4.27
(f)     A2DKM       2337             390.50   40.07   591.60    5.96


The original images and the segmented results that were obtained using the KM, FCM, MKM, AFKM, and A2DKM clustering algorithms for the images labeled Lake, House, and Hut are displayed in Figs. 4–6, respectively. The significant differences among the clustering algorithms in terms of their performance are highlighted by arrows.

The Lake image in Fig. 4 is homogeneously segmented with smoother areas of the lake surface (without the wave pattern on its water surface) in the resultant image of the A2DKM clustering algorithm. The tree bark on the right corner and the tree leaves on the left corner of the image are likewise more homogeneous in the image that was segmented using the A2DKM clustering algorithm than in those of the other clustering algorithms. All the other clustering algorithms segment these areas into several small, insignificant isolated regions.

The proposed A2DKM clustering algorithm produces the most uniform and homogeneous wall area for the House image in Fig. 5 (i.e., the edges of bricks on the wall area are successfully removed). The number of small isolated regions in this area is much smaller than in the images segmented using the KM, FCM, MKM, and AFKM clustering algorithms. Furthermore, the roof and its shadowy areas are more uniform in the resultant House image that was segmented using the proposed clustering algorithm than in the images segmented by the rest of the clustering algorithms, such that the insignificant details (i.e., edges of each piece of roof) and regions are successfully removed.

Segmenting the Hut image using the proposed A2DKM produces a similarly high level of performance, as qualitatively seen in Fig. 6. A large area of bushes and the roof of the hut in the image are successfully segmented into solid and less heterogeneous regions by the A2DKM clustering algorithm. By contrast, the KM, FCM, MKM, and AFKM clustering algorithms have segmented the region into many small regions, which are insignificant for image segmentation. The A2DKM clustering algorithm successfully segments the more homogeneous bush regions in the background by ignoring the different intensity levels of the leaves. The FCM clustering algorithm is able to segment the roof area into a single homogeneous region. However, the algorithm is unable to segment the bush regions into a more uniform region, thereby indicating a degree of inaccurate segmentation performance.

Therefore, from the detailed analysis of the three standard consumer images, the proposed A2DKM clustering algorithm is able to automatically determine the optimal number of clusters for segmentation. In addition, the proposed A2DKM clustering algorithm qualitatively performs better than the KM, FCM, MKM, and AFKM clustering algorithms, even though these algorithms use the number of clusters that was automatically obtained by A2DKM at the earlier stage of segmentation. Furthermore, this qualitative analysis proves that the local and spatial characteristics that were employed in the proposed A2DKM clustering algorithm are highly capable of reducing the segmentation of insignificant regions, thereby ensuring more homogeneous segmented regions.

Fig. 5. The House image. (a) Original image as well as the segmentation results using the (b) KM, (c) FCM, (d) MKM, (e) AFKM, and (f) A2DKM algorithms. The best results are in boldface.

Panel   Algorithm   No. of regions   F(I)     F′(I)   Q(I)     t (s)
(b)     KM          1651             143.40   14.68   304.30   0.98
(c)     FCM         1888             190.70   19.50   351.60   3.89
(d)     MKM         1426             155.10   15.96   314.80   0.93
(e)     AFKM        1730             143.10   14.65   306.40   3.15
(f)     A2DKM       925              134.10   13.90   266.90   3.04

Page 9: Automated two-dimensional K-means clustering algorithm for ... · Automated two-dimensional K-means clustering algorithm for unsupervised image segmentationq Intan Aidha Yusoff, Nor

(a) Original image
(b) No. of regions = 2712, F(I) = 1323.60, F'(I) = 134.48, Q(I) = 2989.60, t = 0.93 s
(c) No. of regions = 3648, F(I) = 1307.30, F'(I) = 132.55, Q(I) = 3420.00, t = 2.17 s
(d) No. of regions = 2703, F(I) = 1691.90, F'(I) = 171.82, Q(I) = 4437.60, t = 0.93 s
(e) No. of regions = 2712, F(I) = 1323.60, F'(I) = 63.65, Q(I) = 2989.60, t = 3.15 s
(f) No. of regions = 1858, F(I) = 603.90, F'(I) = 61.69, Q(I) = 1022.70, t = 3.04 s

Fig. 6. The Hut image. (a) Original image as well as the segmentation results using the (b) KM, (c) FCM, (d) MKM, (e) AFKM, and (f) A2DKM algorithms. The best results are in boldface.


5.2. Quantitative analysis

The corresponding quantitative results for the Lake, House, and Hut images are presented below each image in Figs. 4–6, respectively. The best results are highlighted in bold. The quantitative analysis for these three images shows that the proposed A2DKM clustering algorithm gives the lowest readings of F(I) and F'(I) for two out of three images and the best readings of Q(I) for all three images. The AFKM clustering algorithm gives better F(I) and F'(I) values than the proposed A2DKM clustering algorithm for the Lake image, with a difference of less than 2%. Given such a small margin, the two results are comparable, and the qualitative result of the proposed A2DKM clustering algorithm for the Lake image is much more homogeneous than that of the AFKM clustering algorithm.
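The F(I) readings reported in the figures can be illustrated with a short sketch. It assumes the evaluation function of Liu and Yang [27] in the form F(I) = sqrt(R) · Σ e_i² / sqrt(A_i), where R is the number of regions, A_i is the area of region i, and e_i is taken here as the summed absolute difference between each pixel of region i and the mean intensity of that region. Any normalization constant in the definition used earlier in the paper is omitted, and the function name liu_yang_f and the list-of-lists input layout are illustrative assumptions rather than the authors' implementation.

```python
from math import sqrt


def liu_yang_f(image, regions):
    """Liu-Yang style F(I) for a gray-scale image (illustrative sketch).

    image   : 2D list of pixel intensities
    regions : 2D list of region ids with the same shape as image
              (assumed to come from a separate region-labelling step)
    """
    # Collect the intensities that belong to each region id.
    pixels_by_region = {}
    for image_row, region_row in zip(image, regions):
        for value, region_id in zip(image_row, region_row):
            pixels_by_region.setdefault(region_id, []).append(value)

    total = 0.0
    for values in pixels_by_region.values():
        area = len(values)
        mean = sum(values) / area
        # Assumption: e_i is the summed absolute deviation from the region mean.
        error = sum(abs(v - mean) for v in values)
        total += error ** 2 / sqrt(area)

    # The sqrt(R) factor penalizes segmentations with many regions.
    return sqrt(len(pixels_by_region)) * total
```

Under this form, a lower value is obtained when regions are internally uniform and few in number, which matches the reading of F(I) used in the comparison above.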

In addition, the smallest number of regions is obtained by the proposed A2DKM algorithm for all three images, thereby demonstrating its ability to produce more homogeneous segmented regions with fewer unwanted and scattered small regions.
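The "No. of regions" reading can be understood as the number of connected groups of pixels that share the same cluster label. The following is a minimal sketch of such a count, assuming 4-connectivity; the connectivity rule used by the authors is not stated in this section, and the function name count_regions is hypothetical.

```python
from collections import deque


def count_regions(labels):
    """Count 4-connected regions of equal cluster label in a 2D list."""
    rows = len(labels)
    cols = len(labels[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = 0

    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # An unvisited pixel starts a new region; flood-fill it.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and labels[ny][nx] == labels[y][x]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions
```

A segmentation in which each cluster forms a few large patches therefore yields a small count, whereas scattered single-pixel assignments inflate it.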

The execution time analysis shows that the proposed A2DKM clustering algorithm takes a longer computational time than the KM, FCM, MKM, and AFKM clustering algorithms, with a maximum time interval of 5 s for the House image. This is because the A2DKM clustering algorithm is necessarily more complex than the other clustering algorithms in order to determine the optimal number of clusters for an image automatically. Although the KM, FCM, MKM, and AFKM clustering algorithms have shorter execution times, the number of clusters still needs to be predetermined by the user in these algorithms.

To further verify the overall quantitative performance of the proposed A2DKM clustering algorithm as compared with the KM, FCM, MKM, and AFKM clustering algorithms, the quantitative readings of 100 gray-scale standard consumer images are collected. The average values for all quantitative tests are tabulated in Table 2. In two out of three quantitative tests (i.e., F(I) and Q(I)), the proposed A2DKM clustering algorithm produces the best results as compared with its KM, FCM, MKM, and AFKM clustering counterparts.

In terms of execution time, the proposed A2DKM algorithm is not the best performer. However, the measurement of execution time t does not include the time it takes a user to define the number of clusters. In the implementation of the A2DKM clustering algorithm, finding the number of clusters and clustering the pixels form a continuous and repetitive process that runs until the optimum number of clusters is obtained. Thus, a longer processing time is inevitable because it includes the time taken for the algorithm to obtain the number of clusters, as well as the time required for the actual clustering. The ability of the A2DKM clustering algorithm to determine the number of clusters automatically can compensate for this time loss.
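The cost of such a repetitive search can be illustrated with a generic stand-in. The sketch below is not the A2DKM procedure: it runs plain one-dimensional K-means on pixel intensities for several candidate cluster counts and keeps the count with the lowest Davies-Bouldin index [25], which is used here purely as an assumed selection criterion. The function names, the candidate range, and the deterministic initialization are all illustrative choices.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain 1-D K-means; returns (centroids, assignments)."""
    ordered = sorted(values)
    # Deterministic start: centroids spread evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda j: abs(v - centroids[j]))

    for _ in range(iterations):
        assign = [nearest(v) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, a in zip(values, assign) if a == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(v) for v in values]


def davies_bouldin(values, centroids, assign):
    """Davies-Bouldin index for a 1-D clustering (lower is better)."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [v for v, a in zip(values, assign) if a == j]
        spread = sum(abs(v - centroids[j]) for v in members) / len(members) if members else 0.0
        scatter.append(spread)
    worst = []
    for i in range(k):
        ratios = [(scatter[i] + scatter[j]) / abs(centroids[i] - centroids[j])
                  for j in range(k) if j != i and centroids[i] != centroids[j]]
        # Coincident centroids give no usable ratio; treat that as worst case.
        worst.append(max(ratios) if ratios else float("inf"))
    return sum(worst) / k


def select_k(values, k_min=2, k_max=6):
    """Run K-means once per candidate k and keep the best-scoring result."""
    best = None
    for k in range(k_min, k_max + 1):
        centroids, assign = kmeans_1d(values, k)
        score = davies_bouldin(values, centroids, assign)
        if best is None or score < best[0]:
            best = (score, k, centroids)
    return best[1], best[2]
```

Because a full clustering pass is run for every candidate count, the total time grows with the size of the candidate range, which is consistent with the longer execution time reported for an algorithm that searches for the number of clusters itself.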


Table 2. Average quantitative results for 100 standard consumer images. The best results are made bold.

Parameter                  KM       FCM      MKM      AFKM     A2DKM
F(I) (×1.0e+3)             0.4729   0.7893   0.6526   0.5614   0.4726
F'(I) (×1.0e+2)            0.4816   0.8030   0.6647   0.5657   0.5241
Q(I) (×1.0e+4)             0.3932   0.5035   0.3321   0.3191   0.1490
t (s)                      1.02     3.71     0.93     4.78     6.70
Memory requirement (MB)    43.468   46.664   45.354   67.100   66.748
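As a quick check on how Table 2 is read, the values can be compared row by row. The snippet below copies the averages from Table 2 and reports the algorithm with the lowest value in each row, on the assumption that lower is better for every row (evaluation scores, time, and memory alike). The names TABLE_2, ALGORITHMS, and best_per_row are illustrative.

```python
ALGORITHMS = ["KM", "FCM", "MKM", "AFKM", "A2DKM"]

# Averages copied from Table 2 (scale factors omitted; they are constant per row).
TABLE_2 = {
    "F(I)": [0.4729, 0.7893, 0.6526, 0.5614, 0.4726],
    "F'(I)": [0.4816, 0.8030, 0.6647, 0.5657, 0.5241],
    "Q(I)": [0.3932, 0.5035, 0.3321, 0.3191, 0.1490],
    "t (s)": [1.02, 3.71, 0.93, 4.78, 6.70],
    "Memory (MB)": [43.468, 46.664, 45.354, 67.100, 66.748],
}


def best_per_row(table, names):
    """Return the algorithm with the lowest value for each row."""
    return {row: names[values.index(min(values))] for row, values in table.items()}


if __name__ == "__main__":
    for row, winner in best_per_row(TABLE_2, ALGORITHMS).items():
        print(f"{row}: {winner}")
```

On these averages, the F(I) margin of A2DKM over KM is under 0.1%, whereas its Q(I) value is less than half that of the next-best algorithm (AFKM).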


As highlighted in Table 2, the proposed A2DKM requires more memory than the conventional KM, FCM, and MKM clustering algorithms because of its continuous and repetitive process. However, the proposed A2DKM requires less memory than the AFKM algorithm.

Thus, both the qualitative and quantitative analyses show that the proposed A2DKM clustering algorithm has relatively better performance than the KM, FCM, MKM, and AFKM clustering algorithms. Despite its complexity and higher memory requirement, the results show that the A2DKM algorithm produces better segmented images than the other clustering algorithms. The proposed A2DKM requires extra processing time, which is still within the acceptable range for its reliable application.

6. Conclusion

A novel unsupervised clustering algorithm, namely, the Automated Two-Dimensional K-Means (A2DKM), has been proposed in this study. To evaluate its performance for consumer digital imaging applications, the A2DKM clustering algorithm is qualitatively and quantitatively tested using synthetic and standard consumer images. Based on the results, the proposed A2DKM clustering algorithm is able to automatically determine the optimal number of clusters for image segmentation. The A2DKM algorithm demonstrates better qualitative and quantitative performance than the conventional KM, FCM, and MKM clustering algorithms. Compared with one of the latest clustering algorithms (the AFKM clustering algorithm), the A2DKM algorithm shows similar results. Qualitatively, the images produced by the proposed A2DKM algorithm are more homogeneous and uniform, with smoother segmented regions. Quantitatively, the A2DKM clustering algorithm produces lower F(I) and Q(I) values and comparable readings of F'(I); lower values are desired in image segmentation. The execution times of the proposed algorithm are still comparable to those of the KM, FCM, MKM, and AFKM clustering algorithms because the A2DKM clustering algorithm completely omits the need to declare the number of clusters during the initial stage of processing. Finally, the memory requirement of A2DKM is smaller than that of the AFKM algorithm. Therefore, the proposed A2DKM clustering algorithm is able to automatically determine the optimal number of clusters for segmentation, and it performs better than the KM, FCM, MKM, and the latest AFKM clustering algorithms in terms of image quality, which is further supported by the quantitative analysis.
Given these capabilities, the concept employed in the proposed A2DKM, applied here to a hard clustering algorithm (i.e., the KM algorithm), could be further applied to soft clustering algorithms such as the FCM. Furthermore, the proposed A2DKM algorithm could be applied to color images, where the robustness of the algorithm can be increased, thereby further supporting its high efficiency as an unsupervised segmentation technique.

Acknowledgement

This work was partially supported by the Universiti Sains Malaysia Short Term Grant entitled "Fuzzy Logic Based Segmentation Technique for Determination of Breast Tumor on Mammogram Image".

References

[1] Brosnan T, Sun D-W. Inspection and grading of agricultural and food products by computer vision systems – a review. Comput Electron Agric 2002;36(2–3):193–213.
[2] Leemans V, Magein H, Destain M-F. Defects segmentation on 'Golden Delicious' apples by using colour machine vision. Comput Electron Agric 1998;20(2):117–130.
[3] Liming X, Yanchao Z. Automated strawberry grading system based on image processing. Comput Electron Agric 2010;71(1):S32–9.
[5] Xu B, Zhuang Y, Tang H, Zhang L. Object-based multilevel contrast stretching method for image enhancement. IEEE Trans Consum Electron 2010;56(3):1746–54.
[6] Lee E, Kang W, Kim S, Paik J. Color shift model-based image enhancement for digital multifocusing based on a multiple color-filter aperture camera. IEEE Trans Consum Electron 2010;56(2):317–23.
[7] Bhukhanwala SA, Ramabadran TV. Automated global enhancement of digitized photographs. IEEE Trans Consum Electron 1994;40(1):1–10.
[8] Lee JW, Park R-H, Chang S. Local tone mapping using the K-means algorithm and automatic gamma setting. IEEE Trans Consum Electron 2011;57(1):209–17.
[9] Warfield SK, Michael K, Jolesz FA, Ron K. Adaptive, template moderated, spatially varying statistical classification. Med Image Anal 2000;4(1):43–55.
[10] Cheriet M, Said JN, Suen CY. A recursive thresholding technique for image segmentation. IEEE Trans Image Process 1998;7(6):918–21.
[11] Seunghwan Y, Rae-Hong P. Red-eye detection and correction using inpainting in digital photographs. IEEE Trans Consum Electron 2009;55(3):1006–14.


[12] Xiaopeng Y, Deyi Z, Yan Z. A new clustering algorithm based on distance and density. In: Proceedings of international conference on services systems and services management, ICSSSM '05, vol. 2; 2005. p. 1016–21.
[13] Dave RN, Krishnapuram R. Robust clustering methods: a unified view. IEEE Trans Fuzzy Syst 1997;5(2):270–93.
[14] Verma NK, Roy A, Vasikarla S. Medical image segmentation using improved mountain clustering technique version-2. In: 7th International conference on information technology: new generations (ITNG); 2010. p. 156–61.
[15] Wang Z. Comparison of four kinds of fuzzy C-means clustering methods. In: Third international symposium on information processing (ISIP); 2010. p. 563–6.
[16] Mashor MY. Hybrid training algorithm for RBF network. Int J Comput, Internet Manage 2000;8(2):50–65.
[17] Sulaiman SN, Isa NAM. Adaptive fuzzy-K-means clustering algorithm for image segmentation. IEEE Trans Consum Electron 2010;56(4):2661–8.
[18] Elsayad AM. Completely unsupervised image segmentation using wavelet analysis and Gustafson-Kessel clustering. In: 5th International multi-conference on systems, signals and devices, IEEE SSD; 2008. p. 1–6.
[19] Sahaphong S, Hiransakolwong N. Unsupervised image segmentation using automated fuzzy c-means. In: 7th IEEE International conference on computer and information technology, CIT; 2007. p. 690–4.
[20] Feng J, Mingjing L, Hong-Jiang Z, Bo Z. Unsupervised image segmentation using local homogeneity analysis. In: Proceedings of the 2003 international symposium on circuits and systems, ISCAS '03, vol. 2; 2003. p. II-456–9.
[21] Hu LP, Liu HW, Yin KY, Wu SJ. Two-dimensional clustering-based discriminant analysis for SAR ATR. In: 8th International symposium on antennas, propagation and EM theory, ISAPE; 2–5 November 2008. p. 509–13.
[22] Chen S, Zhang D. Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure. IEEE Trans Syst Man Cybern B: Cybern 2004;34(4):1907–16.
[23] Chuang KS, Tzeng HL, Chen S, Wu J, Chen TJ. Fuzzy c-means clustering with spatial information for image segmentation. Comput Med Imag Graph 2006:9–15.
[24] Li L, Gong R, Chen W. Gray level image thresholding based on fisher linear projection of two-dimensional histogram. Pattern Recogn 1997;30:743–9.
[25] Davies DL, Bouldin DW. A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1979;PAMI-1(2):224–7.
[26] Haralick RH, Shapiro LG. Image segmentations techniques. Comput Vis Graph Image Process 1985;29:100–32.
[27] Liu J, Yang YH. Multiresolution color image segmentation. IEEE Trans Pattern Anal Mach Intell 1994;16(7):689–700.
[28] Borsotti M, Campadelli P, Schettini R. Quantitative evaluation of color image segmentation results. Pattern Recogn Lett 1998;19(8):741–7.

Intan Aidha Yusoff obtained her BEng (Hons) in Electronics Engineering from the Universiti Sains Malaysia (USM) in 2007. She is currently pursuing her MSc degree in Electrical and Electronics Engineering at USM while attached to the Imaging and Intelligent System Research Team (ISRT), School of Electrical and Electronic Engineering (SEEE), USM, Engineering Campus in Nibong Tebal, Penang, Malaysia.

Nor Ashidi Mat Isa received his BEng (in 1999) and PhD (in 2003) in Electrical and Electronics Engineering from USM. He is currently an Associate Professor at the SEEE, USM. His research interests include intelligent systems, image processing, biomedical engineering, and algorithms. He has published numerous journal articles, book chapters, and proceedings at both the national and international levels.

Khairunnisa Hasikin obtained her BEng in Electrical Engineering and MEngSc from the University of Malaya (UM) in 2007 and 2010, respectively. She is currently pursuing her PhD in video/image processing while attached to the ISRT, SEEE, USM, Engineering Campus in Nibong Tebal, Penang, Malaysia.