A novel soft cluster neural network for the classification of suspicious areas in digital mammograms

Pattern Recognition 42 (2009) 1845 -- 1852

Contents lists available at ScienceDirect

Pattern Recognition

journal homepage: www.e lsev ier .com/ locate /pr

Anovel soft cluster neural network for the classification of suspicious areas indigitalmammograms

Brijesh Vermaa,∗, Peter McLeoda, Alan Klevanskyb

aSchool of Computing Sciences, Central Queensland University, Rockhampton, QLD 4701, AustraliabRadiology Department, Gold Coast Hospital, Gold Coast, QLD 4215, Australia

A R T I C L E I N F O A B S T R A C T

Article history:Received 30 April 2008Received in revised form 21 November 2008Accepted 20 February 2009

Keywords:Pattern classificationNeural networksClustering algorithms

This paper presents a novel soft cluster neural network technique for the classification of suspicious areasin digital mammograms. The technique introduces the concept of soft clusters within a neural networklayer and combines them with least squares for optimising neural network weights. The idea of softclusters is proposed in order to increase the generalisation ability of the neural network by providing amechanism to more aptly depict the relationship between the input features and the subsequent classi-fication as either a benign or malignant class. Soft clusters with least squares make the training processfaster and avoid iterative processes which have many problems. The proposed neural network techniquehas been tested on the DDSM benchmark database. The results are analysed and discussed in this paper.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

The analysis of medical images represents an important, yet chal-lenging part of pattern recognition and artificial intelligent systems[1]. Computer aided diagnostic (CAD) systems can help a physicianin diagnosing a patient. CAD systems are used in the classificationtask where certain features (clinical findings) are used to assign acase to a particular pattern which represents a diagnosis. In particu-lar neural networks have demonstrated their efficacy in the clinicaldomain with diseases such as cancer where there is a weak relation-ship between the class patterns forming a benign or malignant di-agnosis [2,3]. Breast cancer diagnosis is particularly challenging dueto the underlying cause of the disease not being known and the sim-ilarities that exist between benign and malignant masses. The seri-ousness of the disease means that breast cancer is the leading causeof cancerous deaths in women in the 20–59 age group [4]. This sit-uation is then further compounded by mammograms being a fairlylow contrast representation of the breast. On top of this anatomicalanomalies, distortion of the breast during screening and backgroundnoise add further complications to the process. The early stages ofbreast cancer may only have subtle indications which can be variedin appearance, making physical examination ineffective and makingdiagnosis difficult even for experienced radiologists [5,6].

∗ Corresponding author.E-mail addresses: [email protected] (B. Verma), [email protected]

(P. McLeod).

0031-3203/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.doi:10.1016/j.patcog.2009.02.009

Various studies [7,8] have shown that the size and appearance ofa lesion can affect the chance that a Radiologist may miss or incor-rectly diagnose a cancer. Subtlemasses and small lesions aremore of-ten missed [9] than larger ones. With an estimated 11–25% of breastcancers being missed [10] during routine screening mammographythere is still room for improvement. Although different studies havefound that certain lifestyle factors can increase the chance of con-tracting breast cancer [11], the underlying cause of the disease hasnot been discovered. The overall survival rate in the United States forearly stage breast cancer is 98%. If cancer has spread to the regionallymph nodes the survival rate decreases to 84%, and if metastaseshave spread to distant organs the chance of survival declines to 28%[11]. This means that in the short term the likelihood of develop-ing a cure is low indicating that early and accurate diagnosis is thesurest mechanism of reducing the mortality rate.

The use of CAD can improve the performance of the radiologistsin terms of reducing the number of missed diagnosis [9,10,12,13]and reducing the time taken to reach a diagnosis. Not only can CADlead to improvements in accuracy but CAD systems can lead to fasterturnaround of results as well as improved storage and retrieval ofradiographic images [14]. This can also facilitate the transmission ofthese images to other medical specialists. CAD systems can providea valuable tool for the training of radiologists.

The remainder of this paper is organised into seven sections.Section 2 reviews the existing techniques. Section 3 details the the-oretical underpinnings, architecture and training of the soft clusterneural network (SCNN). Section 4 presents the research methodol-ogy. Section 5 details the results obtained with Section 6 providinga discussion of the results together with a comparative analysis of

http://www.sciencedirect.com/science/journal/pr

http://www.elsevier.com/locate/pr

mailto:[email protected]

mailto:[email protected]

1846 B. Verma et al. / Pattern Recognition 42 (2009) 1845 -- 1852

the findings. In Section 7, conclusions are drawn and future researchdirections are addressed.

2. Related work

Medical diagnosis is a pattern classification dilemma where a setof input features are used to determine if a patient has a particulardisorder. Breast cancer diagnosis is a classification problem in termsof distinguishing between a malignant and benign mass from a sus-picious lesion or region. Such problems have leant themselves tothe application of CAD systems over the last few decades. A myriadof approaches such as artificial neural networks, genetic algorithms,wavelet filters and statistical transforms have been employed in or-der to solve this problem.

Optical Fourier transforms [15] and wavelet transforms are effec-tive on micro calcification where the high spatial frequency in theFourier spectrum allows for their identification, but may not be asaccurate on masses where a lower spatial frequency exists similarto the surrounding tissues. Fuzzy logic [16], Bayesian networks [17],case based systems [18] and artificial neural networks [2,3,19,20]have been utilised to varying degrees of success. Artificial neuralnetworks have demonstrated their capabilities in this area and havebeen widely adopted due to their generalisation capabilities [21].Georgiou [21] utilised morphological features with a support vectormachine (SVM) to obtain 91.54% classification accuracy. Halkiotiset al. [22] used theMIAS database and amulti layer perceptron (MLP)type neural network to obtain a classification rate of 94.7% with anaverage of 0.27 false positives per image. Brem et al. [13] used thesecond look CAD system (version 3.4) to determine the performanceof CAD systems on different sized lesions and micro calcifications toachieve an overall sensitivity of 89%. Their investigation was to tryand determine if lesion size would adversely impact upon the per-formance of a CAD system. Abdalla et al. [23] used textual featureswith a SVM classifier to achieve a classification accuracy of 82.5% onmammograms from the Digital Database of Screening Mammogra-phy (DDSM) [24]. Rangayyan et al. [25] noted that several methodshave good sensitivity (> 85%) for the identification of masses butalso have a high false positive rate.

As can be seen from the above, neural network based techniqueshave been used for the classification of suspicious areas into benignand malignant classes, however there are several disadvantages withexisting neural network techniques. Neural networks must undergoa training phase which can take a while to complete. The trainingphase needs to be repeated if new knowledge is to be added to thesystem making it hard to add new knowledge. The optimisation ofa neural network relies on the setting of various parameters whichneed to be adjusted during the training phase which can greatly af-fect the performance of the system if not correctly tuned. The tuningof the network parameters often requires multiple iterations to de-termine the optimal configuration. In other situations certain neu-ral networks may suffer from network paralysis, or excessive useof computational resources [26] making them infeasible for use ina clinical situation. Neural networks are often tasked with handlinga large number of features, many of which are only weakly corre-lated or are of limited diagnostic value with the class pattern thatthey are attempting to classify. This creates a problem for trainingthe network and produces low accuracy on test data. In this paper,we introduce a new concept of soft clusters with least squares andpropose a novel neural network which can avoid some of the abovementioned problems with improved classification accuracy.

3. Soft cluster neural network

SCNN is based on an idea that in a classification problem,each class can have more than one cluster called soft clusters

and that incorporating soft clusters output values into the learningof neural network weights will improve the learning process andthe overall classification accuracy. The use of soft clustered neuralnetwork for the classification of benign and malignant patternscan lead to an improvement in classification accuracy over otherapproaches. To validate the proposed claim, we define the null andalternative hypotheses as follows.

H0:l The use of soft clustered neural networks makes no differ-ence to the classification of benign and malignant classeswhen compared to traditional networks such as k-meansor SVM.

H1:l The use of soft clustered neural networks results in animprovement in the classification of benign and malignantclasses when compared to traditional approaches such ask-means or SVM.

The theoretical underpinnings, network architecture and trainingmethods for the SCNN are presented in the following sections.

3.1. Theoretical underpinnings

Let P = [p1(j),p2, . . . ,pn(j)] be the set of j input patterns with n at-tributes and O = [o1(j),o2(j)] be the output classes.

Let J=∑kj=1

∑ni=1||p(j)i − cj||be the objective function for clustering

where k is the number of clusters, ||p(j)i − cj||2is a chosen distancemeasure between a data point pi(j) and the cluster centre cj, which isan indicator of the distance of the n data points from their respectivecluster centres (centroids).

The clustering based on objective function mentioned abovecan cluster P = [p1(j),p2(j), . . . ,pn(j)] inputs into two clusters/classesO = [o1(j),o2(j)]. However hard clustering with one single hardboundary creates a lot of problems. Therefore, many soft clustersare created and the output of only those clusters which stronglyassociate with one of the classes, are used for weight optimisation.The weak and zero clusters are removed from influencing the net-work parameters in the wrong directions which will reduce thenumber of false classifications. It will also reduce the input data forweight optimisation and provide faster training of the network.

Let J = [j1,j2, . . . ,jk] be the output of k clusters, where j1,j3, . . . ,jn−1may belong to o1 and j2,j8, . . . ,jn may belong to o2. The output ofthe neural network can be optimised using the following outputfunctions.

output = f(J'∗W), where J′ ∈ (j1,j3, . . . ,jn) and (j1,j3, . . . ,jn) ∈ outputof all clusters.

output = f(J∗i), where J ∈ (j2, j3, j4) and (j2, j3, j4) ∈ output ofnon-zero and strong clusters where the first output function utilisesall the weights from the clustering process for the network train-ing process while the second output prunes the underperformingweights from inclusion in the training process of the neural network.The actual details of these functions are expanded in Sections 3.2and 3.3.

3.2. SCNN architecture

An overview of the proposed SCNN architecture is shown inFig. 1. It has n inputs, m outputs and k soft clusters.

The error minimisation function for SCNN is defined as follows.SCNN is trained to minimise the following error:

E = 12

p∑

i=1

m∑

j=1

(yij − tij)2

where p is the number of training samples, m is the number ofoutputs, y is the network output and t is the target (desired) output.

B. Verma et al. / Pattern Recognition 42 (2009) 1845 -- 1852 1847

Cluster k

Input Feature Data

Class 1 Class m Output Classes

Cluster 1

Weak and Zero Clusters/Neurons

are Removed from Output

WeightCalculation

MGSWeights

Fig. 1. SCNN architecture.

Table 1Cluster evaluation process.

Cluster number 1 2 3 4 5 6 . . . . k

Class 1 0 15 0 30 5 0 0Class 2 0 3 0 2 20 0 25

The output of the network is defined as follows:

yj =1

1 + e−netj,

netj =Wo(1/1+ e−(JWh)), where J-output of the cluster layer and Wh-weights for hidden layer. XWo = log(tj/1 − tj), here X-output of thehidden layer and Wo-weights for output layer.

3.3. SCNN training methods

3.3.1. Determining weak/zero clusters/neuronsThe weak and zero clusters are determined by creating a confu-

sion matrix as shown in Table 1 through clustering of the input data.The clustering analysis is an effective mechanism in computationalproblems as it groups a set of data in d-dimensional feature space tomaximise the similarity within a cluster and minimise the similaritybetween different clusters.

The use of clustering relies on an underlying assumption; and thatis any given population will consist of n elements described by mattributes and that this population can be partitioned into k clusters.Accordingly Pi = (pi1,pi2, . . . ,pim) which will represent the vector ofthe m number of attributes of element i.

The process of clustering that searches for an optimal partitionwithin a fixed number of clusters can be described as follows:

1. The initial seeds with the number of clusters k are selected and aninitial partition is established by using the seeds as the centroidsof the initial clusters.

2. Each input pattern is assigned to the centroid that is nearest, thusforming a cluster.

3. Keeping the same number of clusters, the new centroid is calcu-lated for each cluster.

4. Steps 2 and 3 are repeated until the clusters stop changing.

J =k∑

j=1

n∑

i=1

||p(j)i − cj||2

This process can be explained in the above equation where J is theobjective function we are trying to minimise. k is the number of

clusters, ‖p(j)i −cj‖2 is a chosen distancemeasure between a data point

p(j)i and the cluster centre cj, which is an indicator of the distance ofthe n data points from their respective cluster centres (centroids).

A confusion matrix as shown in Table 1 can be produced usingthe above clustering on training data. The matrix shows cluster num-bers together with the number of input patterns belonging to eachcluster.

The removal of zero and weak clusters is an attempt to prunethe weights that have the lowest salience for the clustering of theinput features, and hence provide low quality discrimination for therelationship between the input features and subsequent benign ormalignant class. Using the values in Table 1 only the output of neu-rons from cluster numbers 2, 4, 5 and k which are strong clusterswould be fed into the MGS layer for weight calculation.

3.3.2. Determining modified Gram-Schmidt weightsThe use of MGS in conjunction with linear system of equations

to calculate/determine the output weights has been chosen since ithas the potential to overcome some of the problems that could occurwith traditional iterative algorithms such as backpropagation [27]where the network could suffer from local minima, paralysis andslow training process [27,28]. By using a non-iterative process suchas MGS, the weights can be determined directly and the major prob-lems as mentioned above can be avoided. Modified Gram Schmidtcan also be faster at determining the output weights compared withbackpropagation. The output of the neural network can be writtenas follows.

output = f(J∗W) = T, where T is a target vector output for theneural network, W is a weight matrix for the output layer.

The linear system of equations JW = T is solved using the MGS.The MGS's factorisation process would occur as follows:

J =n∑

i=1

cQirRTi

where J represents a m×n matrix with (m> n), and J may be de-composed into the product of J = QR where Q is (m×n) orthogonal(QTQ = In) and R is (n×n) upper triangle. cQi represents the ith col-umn of our vector Q, and rRi refers to the ith row of R.

3.4. Advantages of SCNN over other existing techniques

The major advantages of the proposed SCNN over other exist-ing techniques such as SVMs are as follows: (1) SCNN can have anynumber of inputs and outputs so it can easily deal with multi-classproblems. SVM needs multiple binary SVMs to deal with multi-classproblems. (2) The complexity of SCNN is lower than the other tech-niques such as SVMs as they need a large number of support vectorswhich increases complexity. (3) SCNN can be used for approxima-tion as well as for classification. (4) SCNN can be used with binaryor real values without making any changes in architecture or train-ing scheme. (5) SCNN uses sigmoidal functions in output and hiddenlayers so it maintains non-linearity same as traditional back propa-gation type neural networks. (6) SCNN is faster and achieves bettergeneralisation than other techniques including SVM. (7) SCNN avoidslocal minima.

4. Methodology for experiments using SCNN

An overview of the research methodology for using SCNN to clas-sify benign and malignant patterns is given in Fig. 2 and it involvesdata collection, suspicious area extraction, feature extraction, neuralnetwork training and testing.


Data

Collection

Processing And

Area Extraction

Feature

Extraction

Soft Cluster

Neural Network

Benign Malignant

Fig. 2. An overview of research methodology.

Table 2BI-RADS breast density ratings.

Density ratings

Code Description

1 The breast is almost entirely fat.2 There are scattered fibroglandular densities that could obscure a lesion on mammography.3 The breast is heterogeneously dense. This may lower the sensitivity of mammography.4 The breast tissue is extremely dense, which lowers the sensitivity of mammography.

4.1. Data collection

The mammogram images were obtained from the University ofSouth Florida's DDSM in order to evaluate the results on a benchmarkdatabase [24]. The database contains approximately 2500 studies ofmalignant, benign, benign-without-callback and normal cases. TheDDSM has already been interpreted by expert radiologists and theappropriate information has been provided in “ics” and “overlay”files. In this research 100 malignant and 100 benign cases are usedfor training and testing.

4.2. Processing and area extraction

The obtained digital mammograms are processed for the extrac-tion of suspicious areas and features. The process of dividing a mam-mogram into distinct and discrete regions is called Area Extractionor Image Segmentation. This process is useful for subsequent phasessince the mammogram is divided into regions of interest (ROI) whichreduces the usage of system resources by discarding inappropriateregions. ROI represent regions of suspicion; however these regionscontain anatomical anomalies as well as malignant and benign struc-tures. The area extraction was based on chain code provided with theDDSM and the programs developed in our previous research [29,30]were used to extract these areas.

4.3. Feature extraction

A set of six features has been extracted and utilised in this re-search representing four BI-RADS descriptor features (density, massmargin, mass shape, abnormality assessment rank) together with pa-tient age and a subtlety value feature [24]. The features are describedbelow.

4.3.1. Density featureBreast density is measured on a scale from one to four. The values

are according to the BI-RADS definitions and are classified by anexpert radiologist. The classifications are reproduced in Table 2.

Table 3BI-RADS mass shape types.

Mass shape

Code Description Code Description

1 Round 6 Tubular2 Oval 7 Lymph node3 Lobulated 8 Asymmetric_breast_tissue4 Irregular 9 Focal_asymmetric_density5 Arcitectural_distortion 10 Any

Table 4BI-RADS mass margin types.

Mass margin

Code Description Code Description

1 Circumscribed 4 Ill_defined2 Microlobulated 5 Spiculated3 Obscured 6 Any

4.3.2. Morphological description of breast abnormalitiesThe morphological features and arrangements of abnormalities

are used to determine if an abnormality is of a cancerous or be-nign class. Like other features morphological features are encodedinto numeric values to represent the real feature types. Two sets ofdescriptors were required for the morphological description repre-senting the mass margin and the mass shape. The mass types are asfollows:

4.3.2.1. Mass shape feature. Table 3 presents the mass shape feature.

4.3.2.2. Mass margin feature. Table 4 presents the mass margin fea-ture.

4.3.3. Abnormality assessment rank featureThe abnormality assessment rank represents a determination of

the seriousness of the abnormality detected and a suggested course


Table 5BI-RADS abnormality assessment ranks.

BI-RADS abnormality assessment rank

Category Description and course of action

1 Negative.2 Benign finding.3 Probably benign finding.4 Suspicious abnormality—biopsy should be considered.5 Highly suggestive of malignancy—appropriate action should be taken.

of action that should be taken. It is determined by an expert radiol-ogist and is rated on a scale from 1 to 5 [24] (Table 5).

4.3.4. Patient age featureVarious studies and cancer statistics have demonstrated a corre-

lation between patient age and the risk of contracting breast cancer[6]. This makes patient age a useful feature for consideration in thediagnosis of breast cancer; however, cancer statistics also show anumber of exceptions to the general age based correlation.

4.3.5. Subtlety value featureSubtlety value is a subjective measure by a radiologist of how

obvious a lesion is to detect. Cancerous abnormalities aremore subtlewith ill definedmargins compared to non-cancerous abnormalities. Ithas a value between 1 and 5 with 1 being “subtle” and 5 representing“obvious”.

4.4. SCNN training

Training a neural network is an optimisation problem of findingthe set of network parameters (weights) that provide the best classi-fication performance of the system. An inappropriate set of weightswill lead to degraded performance of the system resulting in the in-appropriate assignment of cases to incorrect output classes. In orderto train the network the initial parameters for the number of fea-tures, number of clusters and number of hidden units are set. Thetraining steps for SCNN are described below.

4.4.1. Stepwise algorithm

Step 1: Process mammograms and extract features from suspi-cious areas.

Step 2: Use clustering algorithm to cluster data into many softclusters.

Step 3: Remove weak and zero clusters and keep strong clustersonly.

Step 4: Extract output of the clusters (J) for each training and testpattern.

Step 5: Initialise hidden weights (Wh) at small random values.Step 6: Calculate the output (X) of hidden layer (f(JWh)).Step 7: Use MGS to calculate weights (Wo) of the output layer for

training patterns.Step 8: Use test patterns (features) and calculate the accuracy.

5. Results

The SCNN has been implemented in C++ on the Windows plat-form and is composed of two programs. The features for the experi-ments were extracted from suspicious areas of digital mammogramstaken from the DDSM database. The features were clustered into softclusters. The zero and weak clusters were removed and the SCNNwas trained by calculating the weights of the network. The networkwas then tested with the test data. A number of experiments wereperformed by changing the number of clusters and the results arereported in Table 6.

Table 6Classification accuracy of SCNN.

Methods/algorithms # of clusters # of hidden units Classification accuracy (%)

Training set Test set

K-Means 16 NA 82 7917 NA 82 7919 NA 84 8020 NA 77 8121 NA 81 8229 NA 83 8041 NA 82 7950 NA 85 79

SCNN 12 24 91 91[All clusters] 29 96 91

35 94 9250 45 96 91

24 95 9327 93 9129 94 9145 96 91

SCNN 12 22 91 92[Strong clusters] 24 93 91

29 91 8935 91 89

50→9 24 91 9327 93 9429 95 9445 95 92

Table 7Summary of network sample population for Anova test.

Summary of sample populations (network topologies)

Classifier network Count Sum Average Variance

K-means 15 1179 78.6 2.9714SCNN—all clusters 15 1374 91.6 0.5429SCNN—strong clusters 15 1403 93.533 0.2667

Table 8Anova analysis of the different classifier networks.

Anova—single factor analysis of network topologies

Source of variation SS Df MS F P-value F crit

Between groups 1978.711 2 989.36 785.01 5.41E−34 3.22Within groups 52.93 42 1.26

Total 2031.64 44

6. Analysis of results

Once the results had been obtained a single factor ANOVA test ofvariance was performed on the top 15 results from each network todetermine if our alternate hypothesis was correct. The alpha valuewas set at a 5% confidence level. Table 7 contains the summary ofthe sample population and table eight contains the results from theAnova analysis.

The Anova analysis shown in Table 8 indicates that the P-value isless than the 5% threshold indicating that the null hypothesis (Sec-tion 3) has been disproved. However, in breast cancer research dueto the nature of the condition where a benign finding for a malignanttumour can have a larger impact on the patient the sensitivity andspecificity values need to be calculated. A confusion matrix was cal-culated for each network. Essentially a confusion matrix contains


Table 9Representation of a confusion matrix.

Representation of a confusion matrix

Actual result Benign Malignant

Benign TN FPMalignant FN TP

Table 10Confusion matrix for the performance of each network.

Confusion matrix for each classifier network

Classifier network Actual result Benign Malignant

K-means Benign 42 10Malignant 8 40

SCNN—all clusters Benign 47 4Malignant 3 46

SCNN—strong clusters Benign 49 5Malignant 1 45

Table 11Sensitivity and specificity values for each network.

Sensitivity and specificity values for each classifier network

Classifier network Sensitivity (%) Specificity (%)

K-means 83.3 80.77SCNN—all clusters 93.88 92.16SCNN—strong clusters 97.83 90.74

information about the actual and predicted classification results. Aconfusion matrix is represented as presented in Table 9.

Where TN represents the true negative findings (Benign), FPrepresents false positive findings (an incorrect diagnosis about amalignant mass), FN represents false negative findings (an incorrectdiagnosis about a benign mass) and TP represents true positivefindings (or correct diagnosis of a tumour). Table 10 represents theconfusion matrix for the neural networks.

A confusion matrix allows for the calculation of two ratios oftenused in Breast cancer research called sensitivity and specificity. Sen-sitivity is calculated as the TP classification divided by the TP andFN classifications. Specificity is calculated as the TN classificationsdivided by the FPs plus the TNs. An analysis is shown in Table 11.

The soft clustered neural networks performed better than thek-means network in terms of both their sensitivity and specificityvalues. Although the strong cluster neural network had a highersensitivity value than the all clusters neural network this was at theexpense of a declining specificity value. The higher sensitivity andspecificity values support the efficacy of the soft clustered neuralnetwork in comparison to the k-means network.

6.1. Tenfold cross validation

The analysis and comparison using tenfold cross validation hasalso been conducted. The classification accuracy using tenfold crossvalidation is shown in Table 12.

6.2. Clustering analysis

The clusters have been analysed and the clustering results arepresented in Table 13. The bold numbers represent the number ofpatterns in correct clusters. The underlined numbers represent thenumber of patterns in incorrect clusters.

Table 12Classification accuracy using tenfold cross validation.

Fold# Classification accuracy (%)

SCNN SVM K means SOM

Fold 1 95 75 70 85Fold 2 100 100 90 80Fold 3 95 90 60 80Fold 4 90 85 75 65Fold 5 95 70 80 70Fold 6 85 85 75 75Fold 7 90 90 85 80Fold 8 95 90 95 75Fold 9 100 95 90 70Fold 10 95 85 75 80

Average 94 86.50 84.5 76

Table 13Clustering results.

Fold # Class Clusters

#1 . . . #k

Fold 1 Train Benign 8 : 14 20 19 1 25 3Malignant 1 : 16 7 4 28 1 33

Test Benign : 3 5 1 1 0Malignant : 1 0 1 2 6

Fold 2 Train Benign : 15 22 20 5 24 4Malignant : 17 8 4 24 4 33

Test Benign : 1 3 2 0 4 0Malignant : 1 1 0 2 0 6

Fold 6 Train Benign : 18 28 14 1 28 1Malignant : 10 8 33 16 4 19

Test Benign : 1 5 2 0 2 0Malignant : 0 2 4 1 1 2

Fold 9 Train Benign 0 0 7 : 27 25 5 1 23 2Malignant 2 2 0 : 6 9 33 16 2 20

Test Benign 1 : 3 2 1 0 3 0Malignant 0 : 0 1 3 1 0 5

Table 14Classification accuracy of SCNN using small sample size.

Sample size # of hidden units Classification accuracy (%)

30 16 9030 24 8330 29 8060 16 9160 24 8960 29 8980 16 9380 24 9480 29 94

6.3. Problem of small sample size

The proposed SCNN has been analysed for small sample size.The experiments varying training sample size were conducted andthe performance was measured. The classification accuracy is pre-sented in Table 14. The increase in sample size has slightly increasedthe accuracy. The results also showed that the better accuracy wasachieved with small number of hidden units when small sample sizewas used.

6.4. Comparative analysis of results

The results obtained on a DDSM benchmark database usingthe proposed SCNN are detailed in Tables 6 and 12. The highest


0102030405060708090

100

Cla

ssifi

catio

nA

ccur

acy

AANN SCNN-Strong

ClustersClassification Techniques

Comparison of Accuracies on DDSM's Mass Dataset

SVM CART See5 GANN SOM SCNN-All

Clusters

Fig. 3. Comparison of accuracies on DDSM's mass dataset.

classification accuracy obtained by SCNN with strong clusters was94%. The results obtained by SCNN with strong clusters comparefavourably with other techniques and show the superiority of SCNNover other techniques.

The proposed SCNN has been compared with six recently pub-lished techniques in the literature. The statistical and neural networkbased techniques such as auto-associator neural network (AANN)[30], CART [29], See5 [29], genetic algorithm neural network (GANN)[29], self-organising map (SOM) [32,33] and SVM [31] were used forcomparison. All the experiments for comparison purposes were con-ducted using same training and testing database. TheWEKA softwarepackage was used for SVM experiments. A comparison of accuraciesis presented in Fig. 3. The comparison showed that the proposedSCNN is very promising.

7. Conclusions

The research presented in this paper has demonstrated that theproposed approach (SCNN) with new ideas of soft clusters, remov-ing weak clusters/neurons (keeping strong clusters) and combiningclustering with MGS has been able to produce an improvement inthe classification accuracy and training time. The highest classifica-tion accuracy obtained by SCNN with all clusters and without weakclusters was 93% and 94%, respectively. It was found that the in-creased accuracy was received with a larger number of clusters. Thereason for this is that the training data naturally falls into a num-ber of groups encasing a larger variation than just benign or malig-nant classes (such as a round benign mass, etc.). If the number ofclusters are kept low the resultant weights from the clustering pro-cess reflect this variation and effectively result in an over training ofthe network based on the smaller cluster size which may producelower classification accuracy on test set. However, the accuracy withsmaller cluster size will still be higher than most existing techniques.

In our future work we would like to focus on expanding thisresearch to find the appropriate number of clusters and test it on alarger benchmark dataset. We would also like to expand it by usingthe value of non-conflicting clusters from the confusion matrix. Thismight further improve the classification accuracy and reduce thecomplexity.

References

[1] R. Mousa, Q. Munib, A. Moussa, Breast cancer diagnosis system based on waveletanalysis and fuzzy neural, Expert Systems with Applications 28 (1) (2005)713–723.

[2] P. Lisboa, A. Taktak, The use of artificial neural networks in decision supportin cancer: a systematic review, Neural Networks 19 (2006) 408–415.

[3] P. Lisboa, A review of evidence of health benefit from artificial neural networksin medical intervention, Neural Networks 15 (2002) 11–39.

[4] A. Jemal, R. Siegel, E. Ward, Y. Hao, J. Xu, T. Murray, M. Thun, Cancer statistics,2008, CA: A Cancer Journal for Clinicians 58 (2008) 71–96.

[5] I. Christoyianni, A. Kotras, E. Dermatas, G. Kokkinakis, Computer aided diagnosisof breast cancer in digitized mammograms, Computerised Medical Imaging andGraphics 26 (2002) 309–319.

[6] M. Wallis, M. Walsh, J. Lee, A review of false negative mammography in asymptomatic population, Clinical Radiology 44 (1991) 13–15.

[7] R. Birdwell, D. Ideda, K. O'Shaughnessy, E. Sickles, Mammographiccharacteristics of 115 missed cancers later detected with screeningmammography and the potential utility of computer aided detection, Radiology219 (2001) 192–202.

[8] M. Kallergi, G. Carney, J. Gaviria, Evaluating the performance of detectionalgorithms in digital mammography, Medical Physics 26 (1999) 267–275.

[9] R. Bird, T. Wallance, B. Yankaskas, Analysis of cancers missed at screeningmammography, Radiology 184 (1992) 613–617.

[10] S. Goergen, J. Evans, G. Cohen, J. Macmillan, Characteristics of breast carcinomasmissed by screening radiologists, Radiology 204 (11) (1997) 131–135.

[11] American Cancer Society, Global Cancer Facts and Figures 2007, Atlanta:American Cancer Society, 2007.

[12] C. Balleyguier, K. Kinkel, J. Fermanian, S. Malan, G. Djen, P. Taourel, O. Helenon,Computer-aided detection (CAD) in mammography: does it help the junior orthe senior radiologist?, European Journal of Radiology 54 (2005) 90–96.

[13] R. Brem, J. Hoffmeister, G. Zisman, M. DeSimio, S. Rogers, A computer-aided detection system for the evaluation of breast cancer by mammographicappearance and lesion size, American Journal of Roentology 184 (2004)893–896.

[14] E.D. Pisano, C. Gatsonis, E. Hendrick, Diagnostic performance of digital versusfilm mammography for breast cancer screening, New England Journal ofMedicine 353 (17) (2005) 1773–1783.

[15] C. Yelleswarapu, S. Kothapalli, D. Rao, Optical Fourier techniques for medicalimage processing and phase contrast imaging, Optical Communications 28(2008) 1876–1888.

[16] Y. Lee, D. Tsai, Computerised classification of microcalcifications onmammograms using fuzzy logic and genetic algorithm, in: J. Fitzpatrick,M. Sonka (Eds.), Proceedings of the SPIE on Medical Imaging 2004: ImageProcessing, vol. 5370, 2004, pp. 952–959.

[17] N. Ramirez, H. Acosta-Mesa, H. Carillo-Calvert, L. Nava-Fernandez, R. Barrientos-Martinez, Diagnosis of breast cancer using Bayesian networks: a case study,Computers in Biology and Medicine 37 (2007) 1553–1564.

[18] G. Tourassi, B. Haarawood, S. Singh, J. Lo, C. Floyd, Evaluation of information-theoretic similarity measures for content based retrieval and detection ofmasses in mammograms, Medical Physics 34 (2007) 140–150.

[19] H.D. Cheng, X.J. Shi, R. Min, L.M. Ju, X.P. Cai, H.N. Du, Approaches for automateddetection and classification of masses in mammograms, Pattern Recognition 39(4) (2006) 646–668.

[20] K. Bovis, S. Singh, J. Fieldsend, C. Pinder, Identification of masses in digitalmammograms with MLP and RBF Nets, Proceedings of IEEE-IJCNN 1 (2000)342–347.

[21] H. Georgiou, M. Mavrofarakis, N. Dimitropoulos, D. Cavouras, S. Theodoridis,Multi-scaled morphological features for the characterization of mammographicmasses using statistical classification schemes, Artificial Intelligence in Medicine41 (2007) 39–55.

[22] S. Halkiotis, T. Botsis, M. Rangoussi, Automatic detection of clusteredmicrocalcifications in digital mammograms using mathematical morphologyand neural networks, Signal Processing 87 (2007) 1559–1568.

[23] A. Abdalla, S. Deris, N. Zaki, Breast cancer detection based on statistical featuresand support vector machine, in: Fourth International Conference on Innovationsin Information Technology, 2007, pp. 728–730.

[24] M. Heath, K. Bowyer, D. Kopans, R. Moore, P. Kegelmeyer, The Digital Databasefor Screening Mammography, IWDM-2000, Medical Physics Publishing, 2001.

[25] R. Rangayyan, F. Ayres, L. Desautels, A review of computer-aided diagnosis ofbreast cancer: toward the detection of subtle signs, Journal of the Franklin


Institute, Medical Applications of Signal Processing, Part I 344 (2007) 312–348(special issue).

[26] M. Mazurowski, P. Habas, J. Zurada, G. Tourassi, Impact of low classprevalence on the performance evaluation of neural network based classifiers:experimental study in the context of computer-assisted medical diagnosis,in: Proceedings of International Joint Conference on Neural Networks, 2007,pp. 2005–2009.

[27] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representationsby back-propagating errors, in: D.E. Rumelhart, J.L. McClelland (Eds.), The PDPResearch Group, Parallel Distributed Processing, vol. 1, MIT Press, Cambridge,MA, 1986, pp. 318–362.

[28] M. Gori, A. Tesi, On the problem of local minima in backpropagation, IEEETransactions on Pattern Analysis and Machine Intelligence 14 (1) (1992)76–86.

[29] K. Kumar, P. Zhang, B. Verma, Application of decision trees for massclassification in mammography, in: International Conference on Fuzzy Systemsand Knowledge Discovery, FSKD'06, China, 2006, pp. 366–376.

[30] R. Panchal, B. Verma, Neural classification of mass abnormalities withdifferent types of features in digital mammography, International Journal ofComputational Intelligence and Applications 6 (1) (2006) 61–76.

[31] M. Masotti, A ranklet-based image representation for mass classification indigital mammograms, Medical Physics 33 (10) (2006) 3951–3961.

[32] S. Sawarkar, A. Ghatol, Breast cancer malignancy identification using self-organizing map, in: Proceedings of the Fifth WSEAS International Conferenceon Circuits, Systems, Electronics, Control & Signal Processing, 2006, pp. 20–25.

[33] P. Chang, W. Teng, Exploiting the self-organizing map for medical imagesegmentation, in: 20th IEEE International Symposium on Computer-basedMedical Systems (CBMS'07), 2007, pp. 281–288.

About the Author—BRIJESH VERMA is a Professor in the School of Computing Sciences at Central Queensland University, Queensland, Australia. His research interests includepattern recognition and computational intelligence. He has published twelve books, five book chapters and over hundred papers and supervised 29 research students in theabove mentioned areas. He has served as a chief/co-chief investigator on 12 competitive research grants. He has served on the organising/program committees and editorialboards of many conferences and journals.

About the Author—PETER MCLEOD is currently doing his Masters in Informatics at Central Queensland University, Queensland, Australia. His research areas include patternrecognition, neural networks and digital mammography.

About the Author—ALAN KLEVANSKY is a radiologist and Head of Radiology Department in the Gold Coast Hospital, Queensland, Australia. His research interests includedigital mammography and intelligent systems.

Documents

A novel soft cluster neural network for the classification of suspicious areas in digital mammograms