Computers in Biology and Medicine 54 (2014) 180–187
Journal homepage: www.elsevier.com/locate/cbm
http://dx.doi.org/10.1016/j.compbiomed.2014.09.008
0010-4825/© 2014 Elsevier Ltd. All rights reserved.
⁎ Corresponding author. E-mail address: [email protected] (E. Lotfi).
Gene expression microarray classification using PCA–BEL

Ehsan Lotfi a, Azita Keshavarz b,⁎

a Department of Computer Engineering, Torbat-e-Jam Branch, Islamic Azad University, Torbat-e-Jam, Iran
b Department of Psychology, Torbat-e-Jam Branch, Islamic Azad University, Torbat-e-Jam, Iran

Article info

Article history:
Received 21 February 2014
Accepted 16 September 2014

Keywords: Amygdala; BEL; Emotional neural network; Cancer; BELBIC; Diagnosis; Diagnostic method

Abstract

In this paper, a novel hybrid method is proposed based on Principal Component Analysis (PCA) and Brain Emotional Learning (BEL) network for the classification tasks of gene-expression microarray data. The BEL network is a computational neural model of the emotional brain which simulates its neuropsychological features. The distinctive feature of BEL is its low computational complexity, which makes it suitable for high-dimensional feature vector classification. Thus BEL can be adopted in pattern recognition to overcome the curse of dimensionality. In the experimental studies, the proposed model is applied to the classification problems of the small round blue cell tumors (SRBCTs), high-grade gliomas (HGG), lung, colon and breast cancer datasets. Based on 5-fold cross-validation, PCA–BEL provides average accuracies of 100%, 96%, 98.32%, 87.40% and 88% on these datasets, respectively. It can therefore be used effectively in gene-expression microarray classification tasks.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Every cell in our body contains a number of genes that specify the unique features of different types of cells. The gene expression of cells can be obtained by DNA microarray technology, which is capable of showing the simultaneous expression of tens of thousands of genes. This technology is widely used to distinguish between normal and cancerous tissue samples and to support clinical cancer diagnosis [27]. There are certain challenges facing the classification of gene expression in cancer diagnosis. The main challenge is the huge number of genes compared to the small number of available training samples [47]: microarray training data are typically gathered from fewer than one hundred patients, while the number of genes in each sample usually runs into the thousands. Furthermore, microarray data contain an abundance of redundancy, missing values [7] and noise due to biological and technical factors [25,75]. In the literature, there are two general approaches to these issues: feature selection and feature extraction. A feature selection method selects a feature subset from the original feature space and provides the marker and causal genes [9,4,1], which can identify cancers quickly and easily. Feature extraction methods, in contrast, transform the original data to other spaces to generate a new set of features with high information-packing properties. In either approach, the reduced features are then fed to a suitable classifier for diagnosis. A proper classifier increases the detection accuracy and can influence the feature reduction step.

This paper aims to review these approaches, investigate the recently developed methodology and propose a proper feature reduction-classification method for cancer detection. The organization of the paper is as follows: feature selection methods are reviewed in Section 1.1. Section 1.2 explains the feature extraction methods and Section 2 offers the proposed method. Experimental results on cancer classification are evaluated in Section 3. Finally, conclusions are drawn in Section 4.

1.1. Feature selection methods

Researchers have developed various feature selection methods for classification. Feature selection methods fall into three categories: the filter model [62], the wrapper model and the embedded model [19]. The filter model considers feature selection and classifier learning as two separate steps and uses the general characteristics of the training data to select features. The filter model includes both traditional methods, which often evaluate genes separately, and newer methods, which consider gene-to-gene correlation. These methods rank the genes and select the top-ranked genes as input features for the learning step. Gene ranking methods need a threshold for the number of genes to be selected; for example, Golub et al. [20] proposed selecting the top 50 genes. Additionally, the filter model needs a criterion to rank the genes. Liu et al. [35] and Golub et al. [20] have investigated filter methods based on statistical tests and information gain. Examples of filter criteria include the Pearson correlation coefficient method [84], the t-statistics method [2] and the signal-to-noise ratio method [20]. The time complexity of these methods is O(N), where N is the dimensionality. They are efficient, but they cannot remove redundant genes, an issue studied in the recent literature [83,78,14,26,37].

In the wrapper model, a subset is selected and then the accuracy of a predetermined learning algorithm is estimated to determine how good the selected subset is. In the wrapper model of Xiong et al. [83], the selected subsets are learned by three learning algorithms: linear discriminant analysis, logistic regression and support vector machines. These classifiers must be run for every subset of genes selected from the search space, so this procedure has a high computational complexity. Like the wrapper methods, the embedded models select genes as part of a specific learning method, but with lower computational complexity [19]. The subset selection methods of the wrapper model can be categorized into population-based methods [71,34,53] and backward selection methods. Recently, Lee and Leu [34] and Tong and Schierz [69] shed light on the effectiveness of hybrid models in feature selection. The elements of a hybrid method include Neural Networks (NN), Fuzzy Systems, Genetic Algorithms (GA; [76,23]) and Ant Colony optimization [79]. Lee and Leu [34] examined the GA's ability in feature selection. Furthermore, fuzzy theories have been successfully applied by many researchers [12,72,10]. Tong and Schierz [69] used a genetic algorithm-neural network approach (GANN) as a wrapper model: the feature subset is extracted by the GA and then used to train the NN, and these steps are repeated until the best subset is found. Because of the high dimensionality of the data, the GA appears to be a proper strategy for feature selection.
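The core loop of a wrapper model can be sketched as follows. This is a generic illustration, not code from any of the cited works: a trivial nearest-centroid learner stands in for the LDA/logistic-regression/SVM learners of [83], and all names are ours.

```python
import numpy as np

def subset_accuracy(X, y, genes):
    """Score a candidate gene subset by the training accuracy of a simple
    nearest-centroid learner (a stand-in for the learners used in wrappers)."""
    Xs = X[:, list(genes)]
    classes = np.unique(y)
    centroids = {c: Xs[y == c].mean(axis=0) for c in classes}
    pred = np.array([min(classes, key=lambda c: np.linalg.norm(row - centroids[c]))
                     for row in Xs])
    return float(np.mean(pred == y))

def wrapper_select(X, y, candidates):
    """Wrapper search: evaluate every candidate subset and keep the best-scoring one."""
    return max(candidates, key=lambda g: subset_accuracy(X, y, g))

# toy data: gene 0 separates the two classes, gene 1 is noise
X = np.array([[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 0.0]])
y = np.array([0, 0, 1, 1])
best = wrapper_select(X, y, [(1,), (0,), (0, 1)])   # selects (0,)
```

The high cost noted above is visible here: the learner is retrained once per candidate subset, which is why GA-driven candidate generation is attractive when the search space is large.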

1.2. Feature extraction methods

In the literature, there are two well-known methods for feature extraction: principal component analysis (PCA; [78]) and linear discriminant analysis (LDA; [48]). They transform the original feature space into a lower-dimensional one. PCA transforms the original data into a reduced set of features that best approximates the original data. In the first step, PCA calculates the data covariance matrix and then finds the eigenvalues and eigenvectors of this matrix. Finally, it performs a dimensionality reduction step in which only the components corresponding to the K largest eigenvalues are kept.
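The three PCA steps just described (covariance matrix, eigendecomposition, truncation to the leading components) can be sketched as below. This is a generic illustration with our own names, not the authors' code.

```python
import numpy as np

def pca_reduce(X, k):
    """Covariance matrix, eigendecomposition, then keep only the k
    components with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)               # center each feature (gene)
    C = np.cov(Xc, rowvar=False)          # s x s covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]     # eigenvalues in descending order
    W = eigvecs[:, order[:k]]             # the k leading eigenvectors
    return Xc @ W                         # m x k reduced representation

rng = np.random.default_rng(0)
X = rng.normal(size=(83, 20))             # toy stand-in: 83 samples, 20 features
Z = pca_reduce(X, k=5)                    # 83 x 5 reduced data
```

For microarray data, where s is in the tens of thousands, one would in practice obtain the same components from the SVD of the centered m × s matrix rather than forming the s × s covariance matrix.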

In contrast to PCA, LDA first calculates the scatter matrices: a within-class scatter matrix and a between-class scatter matrix. The within-class scatter matrix measures the scatter of the samples around their respective class means, and the between-class scatter matrix measures the scatter of the class means around the mixture mean. LDA then transforms the data so that the between-class scatter is maximized and the within-class scatter is minimized; the dimension is thus reduced while class separability is maximized.
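The two scatter matrices described above can be written down directly; a useful sanity check is that they sum exactly to the total scatter of the data. A minimal sketch, with our own names:

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class scatter Sw (samples around their class means) and
    between-class scatter Sb (class means around the mixture mean)."""
    mu = X.mean(axis=0)                       # mixture (overall) mean
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)         # scatter around the class mean
        diff = (mc - mu).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)       # weighted class-mean scatter
    return Sw, Sb

# toy labeled data
Xd = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 2.0], [5.0, 3.0]])
yd = np.array([0, 0, 1, 1])
Sw, Sb = scatter_matrices(Xd, yd)
St = (Xd - Xd.mean(axis=0)).T @ (Xd - Xd.mean(axis=0))   # total scatter = Sw + Sb
```

The LDA projection directions are then the leading eigenvectors of inv(Sw) @ Sb, which is the eigenproblem behind the maximize-between/minimize-within criterion.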

The feature extraction/selection method is the first step in gene expression microarray classification and cancer detection. The second step consists of a classifier learning from the reduced features. In the literature, various classifiers have been investigated in order to find the best one. Neural networks of various types [29,36,57,6,56,68,74,81,69,16], k-nearest neighbors [61,13], k-means algorithms [32], the fuzzy c-means algorithm [11], Bayesian networks [4], vector quantization based classifiers [59], manifold methods [18,80], fuzzy approaches [54,58,30,60], complementary learning fuzzy neural networks [64–67], ensemble learning [55,8,27,50], logistic regression, support vector machines [22,5,82,73,63,46,70], LSVM [44], wavelet transforms [28] as well as radial basis support vector machines [51] have all been applied successfully to classification and cancer detection. But recently developed classifiers such as brain emotional learning (BEL) networks [42] have not been examined in this field.

BEL networks are recently developed methodologies that use simulated emotions to aid their learning process. BEL is motivated by neurophysiological knowledge of the human emotional brain. In contrast to previously published models, the distinctive features of BEL are its low computational complexity and fast training, which make it suitable for high-dimensional feature vector classification. In this paper, BEL is developed and examined for gene expression microarray classification tasks. It is expected that a model with low computational complexity can be more successful in meeting the challenges of high-dimensional microarray classification.

2. Proposed PCA–BEL for microarray data classification

Fig. 1 shows the general view of the proposed method and the final proposed algorithm is presented in Fig. 2. What differentiates the proposed framework from published diagnostic methods is the application of the BEL model to cancer classification. There are various versions of BEL, including basic BEL [3], BELBIC (BEL based intelligent controller; [45]), BELPR (BEL based pattern recognizer; [39]), BELPIC (BEL based picture recognizer; [43]) and supervised BEL [38,40–42]. They are learning algorithms of emotional neural networks [42], inspired by the emotional brain. The description of the relationship between the main components of the emotional brain is common to all these models; what differs from one model to another is how the reward signal is formulated in the learning process. For example, in the model presented by Balkenius and Morén [3], it is not clarified how the reward is assigned. In BELBIC, the reward signal is defined explicitly and the other equations are formulated accordingly. The supervised BEL, however, employs the target value of the input pattern instead of the reward signal in the learning phase. Supervised BEL is therefore model-free and can be utilized in different applications; here, this version is developed for the gene expression microarray classification task. Generally, the computational complexity of BEL is very low [39–42]: it is O(n), which makes it suitable for high-dimensional feature vector classification.

Fig. 1. General view of the proposed method.


BEL [42] is inspired by the interactions of the thalamus, amygdala (AMYG) [15,17,21,24,31,33,77], orbitofrontal cortex (OFC) and sensory cortex in the emotional brain [42].

The first step is the PCA dimension reduction (Fig. 1). Consider the first k principal components p1, p2, …, pk; they are the outputs of the first step and the inputs of the second step. In the second step, this pattern is normalized to [0, 1]. The normalized k principal components p1, p2, …, pk are the outputs of the second step and the inputs of the third step. Fig. 2 illustrates the details of the proposed method. The input pattern of BEL is the vector p1, p2, …, pk and E is the final output. The model consists of two main subsystems, the AMYG and the OFC. The AMYG receives the input pattern p1, p2, …, pk from the sensory cortex, plus pk+1 from the thalamus. The OFC receives the input pattern p1, p2, …, pk from the sensory cortex only. The pk+1 is calculated by the following formula:

p_{k+1} = \max_{j=1\ldots k}(p_j) \qquad (1)

Fig. 2. The flowchart of the proposed method in the learning step.

The vk+1 is the corresponding AMYG weight and wk+1 the corresponding OFC weight. Ea is the internal output of the AMYG, used to adjust the plastic connection weights v1, v2, …, vk+1 (Eq. (6)). Eo is the output of the OFC, used to inhibit the AMYG output; this inhibition is implemented by subtracting Eo from Ea (Eq. (5)). As the corrected AMYG response, E is the final output node. It is evaluated by the monotonically increasing activation function tansig and is used to adjust the OFC connection weights w1, w2, …, wk+1 (Eq. (7)). The activation function is as follows:

\mathrm{tansig}(x) = \frac{2}{1 + e^{-2x}} - 1 \qquad (2)

The AMYG output, the OFC output and the final output are calculated by the following formulas, respectively:

E_a = \sum_{j=1}^{k+1} v_j \, p_j + b_a \qquad (3)

E_o = \sum_{j=1}^{k} w_j \, p_j + b_o \qquad (4)

E = \mathrm{tansig}(E_a - E_o) \qquad (5)

Let t be the target value associated with the nth pattern p. The t should be binary encoded. The supervised learning rules are then as follows:

v_j = v_j + lr \cdot \max(t - E_a, 0) \cdot p_j \quad \text{for } j = 1 \ldots k+1 \qquad (6)

w_j = w_j + lr \cdot (E_a - E_o - t) \cdot p_j \quad \text{for } j = 1 \ldots k+1 \qquad (7)

b_a = b_a + lr \cdot \max(t - E_a, 0) \qquad (8)

b_o = b_o + lr \cdot (E_a - E_o - t) \qquad (9)

where lr is the learning rate, t is the binary target, t − E_a is the calculated error, b_a is the bias of the AMYG neuron and b_o is the bias of the OFC neuron. The v1, v2, …, vk+1 are the AMYG learning weights and w1, w2, …, wk+1 are the OFC learning weights. Eqs. (3)–(9) describe the multiple-input single-output model; in Figs. 2 and 3 the equations are extended to multiple-input multiple-output use. The input training microarray data in Fig. 2 consist of the two matrices P and T. The size of matrix P is m × s, where m is the number of patterns and s is the number of features in each pattern (s ≫ k). The size of matrix T is m × c, where c is the number of classes. The targets are binary encoded, so each row of matrix T contains a single "1" with "0" in the other columns. In the flowcharts, pi denotes the ith pattern and ti is the related target.
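A single BEL update (Eqs. (1)-(9)) for the single-output case can be sketched as below. This is our own illustrative implementation, not the authors' released code; note that Eq. (4) sums only k OFC terms while Eq. (7) nominally updates k+1 OFC weights, so the sketch keeps k OFC weights to stay consistent with Eq. (4).

```python
import numpy as np

def tansig(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0            # Eq. (2)

def bel_step(p, t, v, w, ba, bo, lr):
    """One supervised BEL update for a normalized pattern p in [0, 1].
    v has k+1 entries (the last is the thalamic weight); w has k entries."""
    pe = np.append(p, p.max())                             # Eq. (1): thalamic input p_{k+1}
    Ea = v @ pe + ba                                       # Eq. (3): AMYG output
    Eo = w @ p + bo                                        # Eq. (4): OFC output
    E = tansig(Ea - Eo)                                    # Eq. (5): inhibited, squashed output
    v = v + lr * max(t - Ea, 0.0) * pe                     # Eq. (6)
    w = w + lr * (Ea - Eo - t) * p                         # Eq. (7)
    ba = ba + lr * max(t - Ea, 0.0)                        # Eq. (8)
    bo = bo + lr * (Ea - Eo - t)                           # Eq. (9)
    return E, v, w, ba, bo

# tiny demonstration: with repeated presentations of one pattern,
# Ea - Eo is driven toward the target t, so E approaches tansig(t)
p, t = np.array([0.5, 0.2]), 1.0
v, w, ba, bo = np.zeros(3), np.zeros(2), 0.0, 0.0
for _ in range(200):
    E, v, w, ba, bo = bel_step(p, t, v, w, ba, bo, lr=0.1)
```

The max(t − Ea, 0) term makes the amygdala weights monotonically non-decreasing toward the target, while the OFC update corrects the residual; this asymmetry is what the equations above encode.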

The learning rate lr can be adjusted adaptively to increase performance. The final flowchart, Fig. 2, shows this adaptation and the related parameters: the ratio by which the learning rate is increased (lr_inc), initialized to 1.05; the ratio by which it is decreased (lr_dec), with initial value 0.7; the maximum performance increase (minc), with initial value 1.04; the first performance (perf_f; in step 4 of the flowchart) and the last performance (perf_l), which can be computed as the MSE. The initial lr = 0.001 and the learning weights are initialized randomly (step 3 of the flowchart). According to the algorithm, if (perf_l/perf_f) > minc then lr = lr × lr_dec, else if perf_l < perf_f then lr = lr × lr_inc. In Fig. 2, the stop criterion is reaching a predetermined number of learning epochs, e.g. a maximum of 10,000 epochs. Fig. 2 presents the learning step and Fig. 3 shows the flowchart of the testing step. The inputs of the algorithm presented in Fig. 3 are a testing pattern, the number of classes and the weights adjusted in the learning step. The last step of the algorithm performs the diagnosis: the index of the maximum E gives the class number of the pattern.
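The adaptive learning-rate rule just described, with the parameter names and defaults from the text, can be written as a small helper (a sketch of the flowchart step, not the authors' code):

```python
def adapt_learning_rate(lr, perf_f, perf_l,
                        lr_inc=1.05, lr_dec=0.7, minc=1.04):
    """Adaptive rule from the Fig. 2 flowchart: shrink lr when the error
    (MSE) grew by more than a factor minc, grow it when the error fell."""
    if perf_l / perf_f > minc:      # performance got noticeably worse
        lr *= lr_dec
    elif perf_l < perf_f:           # performance improved
        lr *= lr_inc
    return lr
```

When the error ratio lies in the dead band between improvement and a minc-sized degradation, the learning rate is left unchanged.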

3. Experimental studies

The source code of the proposed method is accessible at http://www.bitools.ir/tprojects.html, and the method is evaluated on the gene expression microarray data of the small round blue cell tumors (SRBCTs), high-grade gliomas (HGG), lung, colon and breast cancer datasets. The SRBCTs dataset is a 4-class complementary DNA (cDNA) microarray dataset and contains 2308 genes and 83 samples: 29 samples of Ewing's sarcoma (EWS), 25 of rhabdomyosarcoma (RMS), 18 of neuroblastoma (NB) and 11 of Burkitt lymphoma (BL). This dataset can be obtained from http://research.nghri.nih.gov/microarray/Supplement/. In the proposed algorithm, the maximum learning epoch is 10,000, k = 100, and the initial lr is set to 0.001, 0.000001 and 0.001 for the SRBCT, HGG and lung cancer datasets respectively. These parameters were picked empirically; k = 100 with lr = 0.001 or 0.000001 gave better results on these datasets. In other applications, however, these parameters should be optimized.

The HGG dataset applied here consists of 50 samples with 12,625 genes: 14 classic glioblastomas, 14 non-classic glioblastomas, 7 classic anaplastic oligodendrogliomas and 15 non-classic anaplastic oligodendrogliomas. The HGG dataset is accessible from http://www.broadinstitute.org. In this dataset the number of patterns is much smaller than the number of features in each sample, which can make the data difficult for classification methods.

In the lung cancer dataset, there are 181 tissue samples in two classes: 31 are malignant pleural mesothelioma and 150 are adenocarcinoma. Each sample is described by 12,533 genes. This dataset is also accessible from http://datam.i2r.a-star.edu.sg/datasets/. The other datasets applied here are the colon and breast cancer datasets, accessible from http://genomics-pubs.princeton.edu/oncology/affydata/index.html and http://datam.i2r.a-star.edu.sg/datasets/krbd/BreastCancer/BreastCancer.html, respectively. The colon dataset includes 62 tissue samples with 2000 genes, and the breast cancer dataset consists of 97 samples and 24,481 genes.

Before the comparative numerical studies, let us analyze the computational complexity of the proposed BEL. In the learning step, the algorithm adjusts O(2n) weights for each pattern-target sample, where n is the number of input attributes (for the HGG database, for example, n = 12,625). Let us compare this with traditional neural networks and with the supervised orthogonal discriminant projection classifier (SODP; [80]) applied in cancer detection. As mentioned above, the computational complexity of the proposed classifier is O(n). In contrast, the computational time is O(cn) for a neural network, where c is the number of hidden neurons (generally c = 10), and O(n²) for SODP, which uses a Lagrangian multiplier that imposes the O(n²) complexity. The proposed method thus has a lower computational complexity. This improved computing efficiency matters for high-dimensional feature vector classification, and the fast processing that results from it is the key property that makes the method suitable for cancer detection.

Another important point observed in the experimental implementations is that the results of the proposed model can change with the initial lr and k values, where lr is the learning rate and k is the number of initial principal components used by the algorithm. In other words, lr and k should be optimized for each problem. Here, the optimal values 0.001, 0.000001, 0.001, 0.00001 and 0.000001 are assigned to lr, and 100 to k, for the SRBCT, HGG, lung, colon and breast cancer datasets respectively. The values of lr were chosen from 0.1, 0.001, 0.0001, 0.00001, …, 0.0000000001 and k from the values 10, 50 and 100, through implementation and observation.

The proposed method is compared with the results of the methods reported by Zhang and Zhang [80], who report results based on 5-fold cross-validation. This implementation allows the assessment of accuracy and repeatability and can be used to validate the proposed method [46]. The compared methods include supervised locally linear embedding (SLLE), probability-based locally linear embedding (PLLE), locally linear discriminant embedding (LLDE), constrained maximum variance mapping (CMVU), orthogonal discriminant projection (ODP) and supervised orthogonal discriminant projection (SODP).

These methods are extended manifold approaches that have been successfully used in tumor classification. SLLE, PLLE and LLDE are extended versions of locally linear embedding (LLE), a classical manifold method. SODP is an extended version of ODP, and CMVU is a linear approximation of a multi-manifold learning method.

Figs. 4–8 show the comparative results based on the average accuracy of 5-fold cross-validation. As illustrated in the figures, the proposed model shows consistent results and provides higher performance on SRBCT, HGG and lung cancer (Figs. 4–6). Table 1 presents the percentage improvement of PCA–BEL with respect to the best compared method reported in [80]. The best method for SRBCT and HGG detection is the supervised orthogonal discriminant projection (SODP) algorithm, with 96.56% and 73.74% average accuracy, while for lung cancer classification the best method is locally linear discriminant embedding (LLDE), with an average accuracy of 93.18%. The proposed method improves on these results.

Fig. 3. The testing step of the proposed method for class diagnosis of an input tissue.


Fig. 4. The accuracy comparison between various methods and the proposed PCA–BEL in the SRBCTs classification problem.

Fig. 5. The accuracy comparison between various methods and the proposed PCA–BEL in the HGG classification problem.

Fig. 6. The accuracy comparison between various methods and the proposed PCA–BEL in the lung cancer classification problem.

Fig. 7. The accuracy comparison between various methods reported by Zhang and Zhang [80] and the proposed PCA–BEL in the colon classification problem.


It seems that SRBCT and lung cancer are rather simple challenges for the classifiers in terms of complexity, since the best compared classifiers, i.e. SODP and LLDE (Table 1 and Figs. 4 and 6), achieve detection accuracies of 96.56% and 93.18%. The proposed model improves these numbers by 3.56% and 5.52%, raising the accuracy to 100% and 98.32% for SRBCT and lung cancer, respectively (Table 1).

In any case, the detection accuracy of the proposed model is very significant for HGG. This dataset appears to be too complex for the other classifiers, since the best detection accuracy previously achieved for HGG is 73.74%, by the SODP method (Table 1 and Fig. 5). The proposed PCA–BEL achieves a 30.18% improvement, resulting in a 96% accuracy rate. The results for colon and breast cancer obtained from PCA–BEL, however, are 87.40% and 88% accuracy, which is no significant improvement over the existing methods (Figs. 7 and 8). The percentage improvement of the proposed PCA–BEL is summarized in Table 1 and calculated by the following formula:

\text{Percentage improvement} = 100 \times \frac{\text{proposed method result} - \text{compared result}}{\text{compared result}} \qquad (10)

As illustrated in Table 1, the average accuracies of SRBCT, HGG and lung cancer classification obtained from the proposed PCA–BEL are 100%, 96% and 98.32% respectively. Table 2 shows the statistical details of the improved results. The confidence level (ConfiLevel) in Table 2 is based on the Student's t-test with 95% confidence.
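Applying Eq. (10) to the accuracies in Table 1 reproduces the reported improvements (a quick check in our own code; the exact HGG value is 30.19, which Table 1 rounds down to 30.18):

```python
def pct_improvement(proposed, compared):
    """Eq. (10): relative improvement of the proposed result over a compared one."""
    return 100.0 * (proposed - compared) / compared

# Table 1: (PCA-BEL accuracy, best compared accuracy) per problem
table1 = {"SRBCT": (100.0, 96.56), "HGG": (96.0, 73.74), "Lung": (98.32, 93.18)}
gains = {name: pct_improvement(p, c) for name, (p, c) in table1.items()}
# gains come out at about 3.56, 30.19 and 5.52 percent
```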

Finally, Fig. 9 shows the averaged confusion matrices, including accuracy, precision and recall, of the improved results obtained from the proposed PCA–BEL over the 5 folds. In Fig. 9a, class numbers 1, 2, 3 and 4 correspond to EWS, RMS, BL and NB respectively. In the experimental results, 10,000 cycles is the maximum number of learning cycles in every run, although this parameter can change for different problems. The model needs at most 220 cycles to reach convergence and 100% accuracy on SRBCT, while more than 8000, or even the full 10,000 cycles, are needed to reach convergence in some folds of the HGG and lung cancer datasets. This parameter should preferably be set to a large value; given the low computational complexity of the method, increasing the number of learning cycles even to 100,000 still yields an acceptable computation time on modern computers.

4. Conclusions

In this paper, a novel gene-expression microarray classification method is proposed based on PCA and the BEL network. In contrast to many other classifiers, the proposed method has lower computational complexity, so BEL can be considered an alternative approach for overcoming the curse of dimensionality problem.

Table 1. Percentage improvement in classification of the small round blue cell tumors (SRBCT), high-grade gliomas (HGG) and lung cancer, obtained from the proposed method. The compared methods are the supervised orthogonal discriminant projection classifier (SODP) and locally linear discriminant embedding (LLDE), which are the best compared methods (Figs. 4–6).

Problem                                    SRBCT     HGG       Lung cancer
Compared method                            SODP      SODP      LLDE
Detection accuracy of compared method      96.56%    73.74%    93.18%
Detection accuracy of our PCA–BEL method   100%      96%       98.32%
Percentage improvement                     3.56%     30.18%    5.52%

Table 2. The statistical results of the proposed PCA–BEL on the three improved problems: the small round blue cell tumors (SRBCT), high-grade gliomas (HGG) and lung cancer datasets. The first five rows show the detection accuracy of the folds, and the remaining rows present statistical information: the maximum, mean and standard deviation (STD) of the results, and the confidence level (ConfiLevel) based on the Student's t-test with 95% confidence.

Fold number   SRBCT (%)   HGG (%)   Lung cancer (%)
F#1           100.00      100.00    100.00
F#2           100.00       80.00     94.40
F#3           100.00      100.00     97.22
F#4           100.00      100.00    100.00
F#5           100.00      100.00    100.00
Max           100.00      100.00    100.00
Average       100.00       96.00     98.32
STD             0.00        8.94      2.50
ConfiLevel      0.00       11.10      3.10
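Table 2's summary rows can be reproduced from the per-fold accuracies. In the sketch below (our own code), 2.776 is the standard two-sided 95% Student's t critical value for n − 1 = 4 degrees of freedom, and ConfiLevel is the half-width of the resulting confidence interval:

```python
import math

def fold_stats(accs, t_crit=2.776):
    """Mean, sample STD and 95% confidence half-width over the folds."""
    n = len(accs)
    mean = sum(accs) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in accs) / (n - 1))
    confi = t_crit * std / math.sqrt(n)     # half-width of the 95% CI
    return mean, std, confi

hgg = [100.0, 80.0, 100.0, 100.0, 100.0]
lung = [100.0, 94.40, 97.22, 100.0, 100.0]
# fold_stats(hgg) and fold_stats(lung) match the Average, STD and
# ConfiLevel rows of Table 2 to two decimal places
```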

Fig. 8. The accuracy comparison between various methods reported by Zhang and Zhang [80] and the proposed PCA–BEL in the breast cancer classification problem.

Fig. 9. The averaged confusion matrices of the improved problems: (a) SRBCT, (b) HGG and (c) lung cancer datasets.

E. Lotfi, A. Keshavarz / Computers in Biology and Medicine 54 (2014) 180–187 185


problem. The proposed model is accessible from http://www.bitools.ir/projects.html and is utilized for the classification tasks of the SRBCT, HGG, lung, colon and breast cancer datasets. According to the experimental results, the proposed method is more accurate than traditional methods on the SRBCT, HGG and lung datasets; PCA–BEL improves the detection accuracy by about 3.56%, 30.18% and 5.52% on SRBCT, HGG and lung cancer respectively. The results indicate the superiority of the approach in terms of higher accuracy and lower computational complexity. Hence, it is expected that the proposed approach can be generally applicable to high dimensional feature vector classification problems.

However, the proposed approach has a drawback. Like many other methods that use PCA, it does not extract the informative genes. As mentioned in Section 1, PCA is a feature extraction method and cannot select features. For future improvements, the informative genes should be determined; to do so, the proposed method should apply a feature selection step. This can be considered the next step of this research effort, i.e. a proper feature selection method should be found to replace the PCA step of the proposed method. Furthermore, in order for the proposed method to provide a proper response in other cancer classification problems, the lr and k parameters should be optimized specifically for each problem. This can also be considered in future work, on other datasets such as prostate cancer.

Conflict of interest

There is no conflict of interest.

References

[1] M. Alshalalfa, G. Naji, A. Qabaja, R. Alhajj, Combining multiple perspective as intelligent agents into robust approach for biomarker detection in gene expression data, Int. J. Data Min. Bioinform. 5 (3) (2011) 332–350.

[2] P. Baldi, A.D. Long, A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes, Bioinformatics 17 (6) (2001) 509–519.

[3] C. Balkenius, J. Morén, Emotional learning: a computational model of amygdala, Cybern. Syst. 32 (6) (2001) 611–636.

[4] R. Cai, Z. Zhang, Z. Hao, Causal gene identification using combinatorial V-structure search, Neural Netw. 43 (2013) 63–71.

[5] A.H. Chen, C.H. Lin, A novel support vector sampling technique to improve classification accuracy and to identify key genes of leukaemia and prostate cancers, Expert Syst. Appl. 38 (4) (2011) 3209–3219.

[6] J.H. Chiang, S.H. Ho, A combination of rough-based feature selection and RBF neural network for classification using gene expression data, NanoBiosci. IEEE Trans. 7 (1) (2008) 91–99.

[7] W.K. Ching, L. Li, N.K. Tsing, C.W. Tai, T.W. Ng, A. Wong, K.W. Cheng, A weighted local least squares imputation method for missing value estimation in microarray gene expression data, Int. J. Data Min. Bioinform. 4 (3) (2010) 331–347.

[8] D. Chung, H. Kim, Robust classification ensemble method for microarray data, Int. J. Data Min. Bioinform. 5 (5) (2011) 504–518.

[9] Y.R. Cho, A. Zhang, X. Xu, Semantic similarity based feature extraction from microarray expression data, Int. J. Data Min. Bioinform. 3 (3) (2009) 333–345.

[10] J. Dai, Q. Xu, Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification, Appl. Soft Comput. 13 (1) (2013) 211–221.

[11] D. Dembele, P. Kastner, Fuzzy C-means method for clustering microarray data, Bioinformatics 19 (8) (2003) 973–980.

[12] Z. Deng, K.S. Choi, F.L. Chung, S. Wang, EEW-SC: Enhanced Entropy-Weighting Subspace Clustering for high dimensional gene expression data clustering analysis, Appl. Soft Comput. 11 (8) (2011) 4798–4806.

[13] M. Dhawan, S. Selvaraja, Z.H. Duan, Application of committee kNN classifiers for gene expression profile classification, Int. J. Bioinform. Res. Appl. 6 (4) (2010) 344–352.

[14] C. Ding, H. Peng, Minimum redundancy feature selection from microarray gene expression data, J. Bioinform. Comput. Biol. 3 (02) (2005) 185–205.

[15] J.P. Fadok, M. Darvas, T.M. Dickerson, R.D. Palmiter, Long-term memory for Pavlovian fear conditioning requires dopamine in the nucleus accumbens and basolateral amygdala, PLoS One 5 (9) (2010) e12751.

[16] F. Fernández-Navarro, C. Hervás-Martínez, R. Ruiz, J.C. Riquelme, Evolutionary generalized radial basis function neural networks for improving prediction accuracy in gene classification using feature selection, Appl. Soft Comput. 12 (6) (2012) 1787–1800.

[17] R. Gallassi, L. Sambati, R. Poda, M.S. Maserati, F. Oppi, M. Giulioni, P. Tinuper, Accelerated long-term forgetting in temporal lobe epilepsy: evidence of improvement after left temporal pole lobectomy, Epilepsy Behav. 22 (4) (2011) 793–795.

[18] J.M. García-Gómez, J. Gómez-Sanchis, P. Escandell-Montero, E. Fuster-Garcia, E. Soria-Olivas, Sparse Manifold Clustering and Embedding to discriminate gene expression profiles of glioblastoma and meningioma tumors, Comput. Biol. Med. 43 (11) (2013) 1863–1869.

[19] S. Ghorai, A. Mukherjee, P.K. Dutta, Gene expression data classification by VVRKFA, Procedia Technol. 4 (2012) 330–335.

[20] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, E.S. Lander, Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science 286 (5439) (1999) 531–537.

[21] E.M. Griggs, E.J. Young, G. Rumbaugh, C.A. Miller, MicroRNA-182 regulates amygdala-dependent memory formation, J. Neurosci. 33 (4) (2013) 1734–1740.

[22] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification using support vector machines, Mach. Learn. 46 (1) (2002) 389–422.

[23] C. Gillies, N. Patel, J. Akervall, G. Wilson, Gene expression classification using binary rule majority voting genetic programming classifier, Int. J. Adv. Intell. Paradig. 4 (3) (2012) 241–255.

[24] O. Hardt, K. Nader, L. Nadel, Decay happens: the role of active forgetting in memory, Trends Cogn. Sci. 17 (3) (2013) 111–120.

[25] H. Hong, Q. Hong, J. Liu, W. Tong, L. Shi, Estimating relative noise to signal in DNA microarray data, Int. J. Bioinform. Res. Appl. 9 (5) (2013) 433–448.

[26] D.S. Huang, C.H. Zheng, Independent component analysis-based penalized discriminant method for tumor classification using gene expression data, Bioinformatics 22 (15) (2006) 1855–1862.

[27] N. Iam-On, T. Boongoen, S. Garrett, C. Price, New cluster ensemble approach to integrative biological data analysis, Int. J. Data Min. Bioinform. 8 (2) (2013) 150–168.

[28] A. Jose, D. Mugler, Z.H. Duan, A gene selection method for classifying cancer samples using 1D discrete wavelet transform, Int. J. Comput. Biol. Drug Des. 2 (4) (2009) 398–411.

[29] J. Khan, J.S. Wei, M. Ringner, L.H. Saal, M. Ladanyi, F. Westermann, P.S. Meltzer, Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nat. Med. 7 (6) (2001) 673–679.

[30] M. Khashei, A. Zeinal Hamadani, M. Bijari, A fuzzy intelligent approach to the classification problem in gene expression data analysis, Knowl.-Based Syst. 27 (2012) 465–474.

[31] J.H. Kim, S. Li, A.S. Hamlin, G.P. McNally, R. Richardson, Phosphorylation of mitogen-activated protein kinase in the medial prefrontal cortex and the amygdala following memory retrieval or forgetting in developing rats, Neurobiol. Learn. Mem. 97 (1) (2011) 59–68.

[32] Y.K. Lam, P.W. Tsang, eXploratory K-Means: a new simple and efficient algorithm for gene clustering, Appl. Soft Comput. 12 (3) (2012) 1149–1157.

[33] R. Lamprecht, S. Hazvi, Y. Dudai, cAMP response element-binding protein in the amygdala is required for long- but not short-term conditioned taste aversion memory, J. Neurosci. 17 (21) (1997) 8443–8450.

[34] C.P. Lee, Y. Leu, A novel hybrid feature selection method for microarray data analysis, Appl. Soft Comput. 11 (1) (2011) 208–213.

[35] H. Liu, J. Li, L. Wong, A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns, Genome Inform. Ser. 13 (2002) 51–60.

[36] B. Liu, Q. Cui, T. Jiang, S. Ma, A combinational feature selection and ensemble neural network method for classification of gene expression data, BMC Bioinform. 5 (1) (2004) 136.

[37] Y. Liu, Wavelet feature extraction for high-dimensional microarray data, Neurocomputing 72 (4) (2009) 985–990.

[38] E. Lotfi, M.R. Akbarzadeh-T, Supervised brain emotional learning, in: IEEE International Joint Conference on Neural Networks (IJCNN), 2012, pp. 1–6. http://dx.doi.org/10.1109/IJCNN.2012.6252391.

[39] E. Lotfi, M.R. Akbarzadeh-T, Brain Emotional Learning-Based Pattern Recognizer, Cybern. Syst. 44 (5) (2013) 402–421.

[40] E. Lotfi, M.R. Akbarzadeh-T, Emotional brain-inspired adaptive fuzzy decayed learning for online prediction problems, in: 2013 IEEE International Conference on Fuzzy Systems (FUZZ), IEEE, July 2013, pp. 1–7.

[41] E. Lotfi, M.R. Akbarzadeh-T, Adaptive brain emotional decayed learning for online prediction of geomagnetic activity indices, Neurocomputing 126 (2014) 188–196.

[42] E. Lotfi, M.R. Akbarzadeh-T, Practical emotional neural networks, Neural Networks 59 (2014) 61–72. http://dx.doi.org/10.1016/j.neunet.2014.06.012.

[43] E. Lotfi, S. Setayeshi, S. Taimory, A neural basis computational model of emotional brain for online visual object recognition, Appl. Artif. Intell. 28 (2014) 1–21. http://dx.doi.org/10.1080/08839514.2014.952924.

[44] Z. Liu, D. Chen, Y. Xu, J. Liu, Logistic support vector machines and their application to gene expression data, Int. J. Bioinform. Res. Appl. 1 (2) (2005) 169–182.

[45] C. Lucas, D. Shahmirzadi, N. Sheikholeslami, Introducing BELBIC: brain emotional learning based intelligent controller, Int. J. Intell. Autom. Soft Comput. 10 (2004) 11–21.

[46] M. Meselhy Eltoukhy, I. Faye, B. Belhaouari Samir, A statistical based feature extraction method for breast cancer diagnosis in digital mammogram using multiresolution representation, Comput. Biol. Med. 42 (1) (2012) 123–128.



[47] V.S. Tseng, H.H. Yu, Microarray data classification by multi-information based gene scoring integrated with Gene Ontology, Int. J. Data Min. Bioinform. 5 (4) (2011) 402–416.

[48] M. Xiong, L. Jin, W. Li, E. Boerwinkle, Computational methods for gene expression-based tumor classification, Biotechniques 29 (6) (2000) 1264–1271.

[50] M. Reboiro-Jato, D. Glez-Peña, F. Díaz, F. Fdez-Riverola, A novel ensemble approach for multicategory classification of DNA microarray data using biological relevant gene sets, Int. J. Data Min. Bioinform. 6 (6) (2012) 602–616.

[51] L. Nanni, A. Lumini, Ensemblator: an ensemble of classifiers for reliable classification of biological data, Pattern Recognit. Lett. 28 (5) (2007) 622–630.

[53] T. Prasartvit, A. Banharnsakun, B. Kaewkamnerdpong, T. Achalakul, Reducing bioinformatics data dimension with ABC-kNN, Neurocomputing 116 (2013) 367–381. http://dx.doi.org/10.1016/j.neucom.2012.01.045.

[54] M. Perez, D.M. Rubin, L.E. Scott, T. Marwala, W. Stevens, A hybrid fuzzy-SVM classifier, applied to gene expression profiling for automated leukaemia diagnosis, in: IEEE 25th Convention of Electrical and Electronics Engineers in Israel (IEEEI 2008), IEEE, December 2008, pp. 041–045.

[55] Y. Peng, A novel ensemble machine learning for robust microarray data classification, Comput. Biol. Med. 36 (6) (2006) 553–573.

[56] L.P. Petalidis, A. Oulas, M. Backlund, M.T. Wayland, L. Liu, K. Plant, V.P. Collins, Improved grading and survival prediction of human astrocytic brain tumors by artificial neural network analysis of gene expression microarray data, Mol. Cancer Ther. 7 (5) (2008) 1013–1024.

[57] L.E. Peterson, M. Ozen, H. Erdem, A. Amini, L. Gomez, C.C. Nelson, M. Ittmann, Artificial neural network analysis of DNA microarray-based prostate cancer recurrence, in: Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB'05), IEEE, November 2005, pp. 1–8.

[58] L.E. Peterson, M.A. Coleman, Machine learning-based receiver operating characteristic (ROC) curves for crisp and fuzzy classification of DNA microarrays in cancer research, Int. J. Approx. Reason. 47 (1) (2008) 17–36.

[59] I. Porto-Díaz, V. Bolón-Canedo, A. Alonso-Betanzos, O. Fontenla-Romero, A study of performance on microarray data sets for a classifier based on information theoretic learning, Neural Netw. 24 (8) (2011) 888–896.

[60] S. Saha, A. Ekbal, K. Gupta, S. Bandyopadhyay, Gene expression data clustering using a multiobjective symmetry based clustering technique, Comput. Biol. Med. 43 (11) (2013) 1965–1977.

[61] A. Statnikov, C.F. Aliferis, I. Tsamardinos, D. Hardin, S. Levy, A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis, Bioinformatics 21 (5) (2005) 631–643.

[62] X. Sun, Y. Liu, M. Xu, H. Chen, J. Han, K. Wang, Feature selection using dynamic weights for classification, Knowl.-Based Syst. 37 (2013) 541–549. http://dx.doi.org/10.1016/j.knosys.2012.10.001.

[63] M. Song, S. Rajasekaran, A greedy algorithm for gene selection based on SVM and correlation, Int. J. Bioinform. Res. Appl. 6 (3) (2010) 296–307.

[64] T.Z. Tan, C. Quek, G.S. Ng, Ovarian cancer diagnosis by hippocampus and neocortex-inspired learning memory structures, Neural Netw. 18 (5) (2005) 818–825.

[65] T.Z. Tan, C. Quek, G.S. Ng, E.Y.K. Ng, A novel cognitive interpretation of breast cancer thermography with complementary learning fuzzy neural memory structure, Expert Syst. Appl. 33 (3) (2007) 652–666.

[66] T.Z. Tan, G.S. Ng, C. Quek, Complementary learning fuzzy neural network: an approach to imbalanced dataset, in: International Joint Conference on Neural Networks (IJCNN 2007), IEEE, 2007, pp. 2306–2311.

[67] T.Z. Tan, C. Quek, G.S. Ng, K. Razvi, Ovarian cancer diagnosis with complementary learning fuzzy neural network, Artif. Intell. Med. 43 (3) (2008) 207–222.

[68] M. Takahashi, H. Hayashi, Y. Watanabe, K. Sawamura, N. Fukui, J. Watanabe, T. Someya, Diagnostic classification of schizophrenia by neural network analysis of blood-based gene expression signatures, Schizophr. Res. 119 (1) (2010) 210–218.

[69] D.L. Tong, A.C. Schierz, Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data, Artif. Intell. Med. 53 (1) (2011) 47–56.

[70] M. Tong, K.H. Liu, C. Xu, W. Ju, An ensemble of SVM classifiers based on gene pairs, Comput. Biol. Med. 43 (6) (2013) 729–737.

[71] M.H. Tseng, H.C. Liao, The genetic algorithm for breast tumor diagnosis – the case of DNA viruses, Appl. Soft Comput. 9 (2) (2009) 703–710.

[72] P. Vadakkepat, L.A. Poh, Fuzzy-rough discriminative feature selection and classification algorithm, with application to microarray and image datasets, Appl. Soft Comput. 11 (4) (2011) 3429–3440.

[73] V. Vinaya, N. Bulsara, C.J. Gadgil, M. Gadgil, Comparison of feature selection and classification combinations for cancer classification using microarray data, Int. J. Bioinform. Res. Appl. 5 (4) (2009) 417–431.

[74] S.L. Wang, X. Li, S. Zhang, J. Gui, D.S. Huang, Tumor classification by combining PNN classifier ensemble with neighborhood rough set based gene reduction, Comput. Biol. Med. 40 (2) (2010) 179–189.

[75] Y.F. Wang, Z.G. Yu, V. Anh, Fuzzy C-means method with empirical mode decomposition for clustering microarray data, Int. J. Data Min. Bioinform. 7 (2) (2013) 103–117.

[76] A. Yardimci, Soft computing in medicine, Appl. Soft Comput. 9 (3) (2009) 1029–1043.

[77] S.H. Yeh, C.H. Lin, P.W. Gean, Acetylation of nuclear factor-κB in rat amygdala improves long-term but not short-term retention of fear memory, Mol. Pharmacol. 65 (5) (2004) 1286–1292.

[78] K.Y. Yeung, W.L. Ruzzo, Principal component analysis for clustering gene expression data, Bioinformatics 17 (9) (2001) 763–774.

[79] Y. Zhang, J. Xuan, R. Clarke, H.W. Ressom, Module-based breast cancer classification, Int. J. Data Min. Bioinform. 7 (3) (2013) 284–302.

[80] C. Zhang, S. Zhang, A supervised orthogonal discriminant projection for tumor classification using gene expression data, Comput. Biol. Med. 43 (5) (2013) 568–575. http://dx.doi.org/10.1016/j.compbiomed.2013.01.019.

[81] Z. Zainuddin, P. Ong, Reliable multiclass cancer classification of microarray gene expression profiles using an improved wavelet neural network, Expert Syst. Appl. 38 (11) (2011) 13711–13722.

[82] X.L. Xia, K. Li, G.W. Irwin, Two-stage gene selection for support vector machine classification of microarray data, Int. J. Model. Identif. Control 8 (2) (2009) 164–171.

[83] M. Xiong, X. Fang, J. Zhao, Biomarker identification by feature wrappers, Genome Res. 11 (11) (2001) 1878–1887.

[84] H. Xiong, S. Shekhar, P.N. Tan, V. Kumar, Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, August 2004, pp. 334–343.
