
Neural Networks 75 (2016) 150–161


Two fast and accurate heuristic RBF learning rules for data classification

Modjtaba Rouhani, Dawood S. Javan
Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran

Article info

    Article history:

Received 8 March 2015
Received in revised form 24 October 2015

    Accepted 22 December 2015

    Available online 4 January 2016

    Keywords:

    Radial Basis Function

Neural network learning
Heuristic learning method

    Classification problem

    Linear separability in feature space

Abstract

    This paper presents new Radial Basis Function (RBF) learning methods for classification problems. The

proposed methods use heuristics to determine the spreads, the centers, and the number of hidden neurons of the network in such a way that high accuracy is achieved with few neurons, while the learning algorithm remains fast and simple. To keep the network size limited, neurons are added to the network recursively until a termination condition is met. Each neuron covers some of the training data; the termination condition is that all training data are covered or that the maximum number of neurons is reached. In each step, the center and spread of the new neuron are selected to maximize its coverage. Maximizing the coverage of the neurons leads to a network with fewer neurons and hence a lower VC dimension and better generalization. Using the power exponential distribution function as the activation function of the hidden neurons, and in the light of the new learning approaches, it is proved that all data become linearly separable in the space of hidden-layer outputs, which implies that there exist linear output-layer weights with zero training error. The proposed methods are applied to some well-known datasets, and the simulation results, compared with SVM and some other leading RBF learning methods, show their satisfactory and comparable performance.

    2015 Elsevier Ltd. All rights reserved.

    1. Introduction

Radial Basis Functions (RBFs) are among those neural networks that originated from biological concepts and have been applied to regression as well as classification problems (Moody & Darken, 1989; Powell, 1985). Broomhead and Lowe (1988) presented a method for fast learning of RBF neural networks. Their work was followed by Park and Sandberg (1991), who proved that RBFs can approximate any arbitrary nonlinear function, under some mild conditions, to any desired accuracy. Soon after, RBFs were applied to a wide range of engineering problems, such as fault recognition (Huang & Tan, 2009; Meng, Dong, Wang, & Wong, 2010), image processing (Ferrari, Bellocchio, Piuri, & Borghese, 2010; Lee & Yoon, 2010), and adaptive control (Cai, Rad, & Chan, 2010; Tsai, Huang, & Lin, 2010; Yu, Xie, Paszczynski, & Wilamowski, 2011). The generalization property of RBFs and other neural networks has been widely investigated in the literature. Some authors estimated bounds on the number of hidden neurons required in terms of the input dimension, the widths of the neurons, the minimal separation among the centers, and the


possibility of fixing the centers (Girosi & Anzellotti, 1993; Kainen, Kůrková, & Sanguineti, 2009, 2012; Mhaskar, 2004a, 2004b). Although many studies place emphasis on the number of hidden neurons, Bartlett (1998) found that the size of the weights may be more important than the size of the network.

The main advantage of traditional RBFs lies in their simple and fast learning rule which, in contrast to back-propagation networks, is not iterative and has no convergence problem. However, early RBF learning rules required a large number of hidden neurons and normally had low generalization performance (Yu et al., 2011). In the last two decades, different RBF learning methods have been proposed to reduce the number of hidden neurons by means of smarter selection of neuron centers and spreads.

Determination of the required number of hidden units, their centers, and their spreads constitutes the main part of an RBF learning rule. Orr (1995) introduced regularization and subset selection methods for proper selection of unit centers. Chen, Cowan, and Grant (1991) used orthogonal least squares for optimal addition of new RB units. Huang, Saratchandran, and Sundararajan (2004) proposed a sequential growing and pruning algorithm to design an RBF. Wu and Chow (2004) applied an advanced self-organizing map (SOM) to cluster the training dataset and to select the centers of the clusters as unit centers. Xie, Yu, Hewlett, Rozycki, and Wilamowski (2012) implemented a second-order gradient method to learn an RBF.


In their method, all parameters of the network (hidden-layer spreads and centers together with output weights) are learned iteratively by a second-order gradient approach. Hien, Huan, and Huynh (2009) and Huan, Hien, and Huynh (2007) proposed a two-phase algorithm for training an interpolation RBF with many hidden neurons. The Gaussian function is one of the most popular activation functions in RBF networks (Blanzieri, 2003; Hartman, Keeler, & Kowalski, 1990). Huan, Hien, and Tue (2011) replaced the Euclidean norm in the Gaussian function with the Mahalanobis norm and calculated the spread parameters directly. In fact, one of the challenges in RBF modeling is the optimization of the spread parameters. Yao, Chen, Zhao, and Tooren (2012) proposed an optimization method for spread parameters based on concurrent subspace width optimization. A more complete review of RBF, its learning rules, and its applications can be found in Yu et al. (2011).

Some authors (Constantinopoulos & Likas, 2006; Jaiyen, Lursinsap, & Phimoltares, 2010; Mao & Huang, 2005; Oyang, Hwang, Ou, Chen, & Chen, 2005; Titsias & Likas, 2001) concentrated on RBF learning rules for classification problems. Constantinopoulos and Likas (2006) considered the idea of adding the new neurons to the most crucial regions. When all neurons are added, every component is split into subcomponents, each corresponding to a different class. Mao and Huang (2005) used a data-structure-preserving criterion to add new units. Oyang et al. (2005) constructed an RBF subnetwork for each class of training data to approximate the probability density function of that class. Jaiyen et al. (2010) presented a very fast learning method for classification problems that uses every datum just once and can then discard it. They used a versatile elliptic function instead of the traditional Gaussian activation function. With their proposed learning rules, the RBF network has few neurons but its performance is limited.

In all those learning rules, there are tradeoffs between the number of hidden neurons, the generalization performance, and the speed and complexity of the learning rule. Networks with more hidden neurons perform better on learning data but may have less generalization ability. Learning rules aiming at better generalization performance have to limit the number of hidden neurons and need to optimize the centers and spreads of those limited nodes. Although there is a wide range of learning rules for RBF, it seems that the tradeoff between the generalization performance and the complexity of the learning rule needs even more attention. Clearly, the learning rules presented in this paper do not solve this dilemma. However, the key point of this paper is to keep the learning rule simple and fast, and at the same time to keep the number of hidden neurons limited by clever selection of RBF centers and spreads.

This paper presents new RBF learning rules for classification problems. The learning rules are one-pass heuristic rules with a guaranteed linear separability property in feature space. That is, the training data belonging to a class are linearly separable, in the space of the radial basis functions, from the training data belonging to its complement set. The training data can be learned with 100% accuracy without an extreme increase in the number of neurons. The proposed learning methods are fast and simple, using efficient heuristics in their design. The learning rules set all hidden-layer parameters, i.e., the centers and spreads of the neurons and the required number of neurons, simultaneously. They achieve low error with a limited number of hidden neurons by selecting the neuron centers and spreads such that every newly added unit has the largest possible coverage region within its own class, leading to a network with less complexity. Limiting the number of hidden neurons together with their near-optimal selection results in increased generalization performance. The idea of adding new neurons based on the maximum achievable coverage was first introduced in Javan, Rajabi Mashhadi, Ashkezari Toussi, and Rouhani (2012) and Javan, Rajabi Mashhadi, and Rouhani (2013) and successfully applied to problems in power systems. Here we extend the idea with two interpretations of the maximum coverage of a neuron: in terms of its spatial coverage and in terms of the number of data points it covers.

The rest of this paper is organized as follows. Section 2 presents the main ideas for determination of the spreads and centers of the hidden units together with the learning of the output-layer parameters and structures. In Section 3, four theorems about the convergence of the learning algorithms, their computational complexity, and their guaranteed linear separability property are proved. Section 4 is dedicated to simulation results of applying the proposed methods to some well-known datasets. In Section 5, the simulation results are compared with leading RBF learning rules and some SVM models. Section 6 summarizes the paper.

    2. The proposed RBF learning rules

This section presents the main ideas of the proposed RBF learning rules. Like the traditional RBF, the proposed RBF consists of a hidden layer of radial basis function neurons and an output layer. The parameters of the hidden-layer units, i.e., their spreads and centers, are set in the first phase, and the parameters of the output layer (if there is any adjustable parameter in the output layer) can be set by the traditional Least Mean Squares (LMS) method in the next phase. Two different methods are presented to learn the hidden layer. In both methods, the network has no hidden units at the beginning and the required neurons are added one by one. Every added unit is associated with one of the output classes and covers a number of learning data points of its own class. The center and the spread of the unit are set according to the highest coverage possible. There are two heuristic approaches. First, we present the approach based on the selection of the neuron with maximum spread, and then the approach based on the selection of the neuron with maximum coverage of learning data. To guarantee the linear separability of the learning patterns in feature space (i.e., the hidden units' outputs), the traditional Gaussian function is replaced by another symmetric bell-shaped function. The output layer may have two different structures. In the first structure, the layer has a weight matrix learned by the well-known LMS method. The second structure assumes a winner-take-all (WTA) structure for the output layer with no adjustable parameter, capable of accurate classification of all training data.

In the proposed RBF structure, each hidden neuron belongs to one of the output classes and represents some of the training data of that class by its activation domain. Among the main ideas of this paper is to choose the hidden neurons with the largest coverage domains, i.e., the ones with the largest spread, in the hope that the required number of hidden neurons decreases. In other words, in each iteration of the proposed algorithms, the center and the spread of the new unit are selected based on the largest possible coverage. Here, the largest coverage may have two interpretations. First, in terms of the domain of coverage, the largest coverage leads to selection of the neuron with the greatest spread. Second, in terms of training data, the largest coverage leads to selection of the neuron with the largest number of data points in its domain. Those two interpretations result in two different learning algorithms, as discussed in this section. Selection of the neuron with the largest coverage leads to a smaller network, which in turn prevents overtraining. Another idea presented in this paper is to slightly modify the activation function of the radial basis units to guarantee the separability of the training data in the RB space.

Boundary data play an important role in the performance and accuracy of a classifier. Therefore, some approaches, like support vector machines (SVMs), place special emphasis on boundary data. On the other hand, a boundary datum can easily be affected by noise and may lead to misclassification. In other words, higher


emphasis on the boundary data increases the sensitivity to noise. The proposed methods try to make a trade-off between those two conflicting concerns by using the boundary data in an indirect way. That is, in the classification problem, the spread of each neuron of a class is determined by the nearest boundary datum of the other classes, while its center is determined by a datum of its own class.

    2.1. Rule 1: The learning rule based on the maximum spread

The learning of an RBF can be carried out in two phases. In the first phase, the centers and spreads of all hidden units are specified, and then the weight matrix of the linear output layer can be calculated. In this paper, some heuristics are used in the first stage to improve the performance of each neuron added to the network. As the hidden units of an RBF act locally (that is, each neuron is active in a hypersphere around its center and inactive elsewhere), the proposed learning rule tries to allocate to each new neuron the largest possible activation region. This decreases the number of required hidden neurons and consequently increases the generalization performance. Each newly added unit belongs to one of the output classes and is active (as will be defined) for a number of training patterns of that class and inactive for all patterns of the other classes. In the first learning rule, presented in this subsection, the center of the new unit is placed at one of the training points and its spread is set equal to the distance from its center to the nearest pattern of the opposite class (or classes), in order to limit the activation of the new unit to patterns of its own class. The center point is selected based on the maximum achievable spread. A unit with the largest spread covers more learning patterns with its activation domain, so all input patterns are covered by fewer units. Furthermore, larger spreads make the input-output mapping smoother and the generalization performance increases. As we expect the unit to cover just the patterns of its own class, the maximum achievable spread is limited by the nearest pattern (i.e., a boundary datum) of the other classes. In fact, by selecting the maximum spread for the neurons, the boundary data have an indirect effect on the decision surface of the classifier. The data points near the boundary of two classes carry the most information in classification problems; however, they can easily be affected by noise. This makes reliance on near-boundary data (as in SVM models) more risky when the data may be noisy. In the proposed learning rules, however, the effect of the boundary data is limited to the determination of the spread of the neurons, and as the spread is selected to have the largest value, a small displacement of a boundary datum has no significant influence on the overall performance of the network.

Below are the details of the first learning rule for specification of the hidden-layer parameters based on the maximum spread for newly added neurons:

Learning rule based on the maximum spread:

Suppose there are P training patterns x_p, p = 1, ..., P, each belonging to one of C classes.

(1) Calculate the data distance matrix as follows:

D = [\|x_i - x_j\|]_{P \times P}, \quad i, j = 1, \dots, P.   (1)

Consider the following sub-matrices of D:

R_c = [\|x_i - x_j\|]_{P_c \times (P - P_c)}, \quad x_i \in \text{Class } c,\; x_j \notin \text{Class } c

Q_c = [\|x_i - x_j\|]_{P_c \times P_c}, \quad x_i, x_j \in \text{Class } c   (2)

where R_c is the matrix of distances of class-c data points from the data of the other classes, obtained by keeping the rows of matrix D corresponding to class c and deleting its corresponding columns; Q_c is the matrix of distances of class-c data points from each other, obtained by keeping the rows and columns of matrix D corresponding to data in class c; and P_c is the number of data in class c.

Let the number of neurons in the network be zero, n = 0.

(2) Consider the first class, c = 1.

(3) Calculate the minimum distance of each datum x_i \in Class c from all data of the other classes, using the corresponding row of the matrix R_c:

r_c = \left[\min_j \|x_i - x_j\|\right]_{P_c \times 1}, \quad x_i \in \text{Class } c,\; x_j \notin \text{Class } c   (3)

where the vector r_c measures the distance of every datum in class c from the other classes.

(4) Add a new neuron to the hidden layer, set n = n + 1. The new neuron belongs to class c. Select the largest uncovered value of the vector r_c as the spread \sigma_n of the new neuron n and the corresponding datum as its center C_n:

\sigma_n = \max_i r_{c,i} = \max_i \min_j \|x_i - x_j\|, \quad x_i \in \text{Class } c,\; x_j \notin \text{Class } c

C_n = x_i, \quad i = \arg\max_i \min_j \|x_i - x_j\|, \quad x_i \in \text{Class } c,\; x_j \notin \text{Class } c.   (4)

(5) Consider the row of Q_c corresponding to the datum x_i, mark all data points of class c whose distances to the center are smaller than the spread of the new neuron as covered, and delete the corresponding rows and columns from R_c and Q_c.

(6) If there are uncovered data in class c, go back to step 4. Otherwise, if all classes are processed, that is, if c = C, stop; if c < C, set c = c + 1 and go back to step 3.

Fig. 1(a) and (b) illustrate the rule on an artificial two-dimensional dataset: the data of class 1 are covered by three hidden neurons, added in decreasing order of their spreads, Ra1 > Ra2 > Ra3.

After all the data belonging to class 1 are processed, the learning procedure is repeated for the class 2 data; in its first iteration, the datum of class 2 with the largest distance from the class 1 data is selected as the center of a new neuron belonging to class 2. The covered data from class 2 are discarded (Fig. 1(c)) and the procedure goes forward for the rest of the data from class 2 (Fig. 1(d)). Again, the data from class 2 are covered completely by three hidden neurons. Each datum is covered by a neuron of its own class and the spreads of the neurons are the maximum achievable.


    Fig. 1. Classification of an artificial two-dimensional dataset by learning rule based on the maximum spread. See text for description.
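The following is a minimal sketch of the maximum-spread rule of Section 2.1 in Python/NumPy, written from the steps above; the function and variable names (max_spread_rule, centers, spreads) are ours and not from the paper, and the deletion of covered rows and columns is emulated with a boolean mask.

```python
import numpy as np

def max_spread_rule(X, y):
    """Sketch of learning rule 1 (maximum spread).
    X: (P, d) training patterns, y: (P,) class labels.
    Returns the center, spread and class of every hidden neuron."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)      # Eq. (1)
    centers, spreads, classes = [], [], []
    for c in np.unique(y):
        idx_c = np.where(y == c)[0]                                # data of class c
        idx_other = np.where(y != c)[0]                            # data of the other classes
        r_c = D[np.ix_(idx_c, idx_other)].min(axis=1)              # Eq. (3)
        covered = np.zeros(len(idx_c), dtype=bool)
        while not covered.all():
            # step 4: uncovered class-c datum with the largest achievable spread
            i = int(np.argmax(np.where(covered, -np.inf, r_c)))
            centers.append(X[idx_c[i]])
            spreads.append(r_c[i])
            classes.append(c)
            # step 5: class-c data closer to the new center than its spread are covered
            covered |= D[idx_c[i], idx_c] < r_c[i]
    return np.array(centers), np.array(spreads), np.array(classes)
```

Calling, for example, `centers, spreads, cls = max_spread_rule(X_train, y_train)` yields the hidden layer; the power parameters of Eq. (8) and the output layer are set afterwards.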

    2.2. Rule 2: The learning rule based on the maximum data coverage

The second algorithm proposed in this paper is based on another simple heuristic idea. Here, the center of the newly added neuron is selected based on the maximization of the number of data points covered by that neuron. Again, more data covered by each neuron means fewer neurons are needed to cover the entire dataset and thus, hopefully, higher generalization performance.

Learning rule based on the maximum coverage:

Suppose there are P training patterns x_p, p = 1, ..., P, each belonging to one of C classes.

(1) Calculate the data distance matrix as follows:

D = [\|x_i - x_j\|]_{P \times P}, \quad i, j = 1, \dots, P.   (1)

Consider the following sub-matrices of D:

R_c = [\|x_i - x_j\|]_{P_c \times (P - P_c)}, \quad x_i \in \text{Class } c,\; x_j \notin \text{Class } c

Q_c = [\|x_i - x_j\|]_{P_c \times P_c}, \quad x_i, x_j \in \text{Class } c   (2)

where R_c is the matrix of distances of class-c data points from the data of the other classes, obtained by keeping the rows of matrix D corresponding to class c and deleting its corresponding columns; Q_c is the matrix of distances of class-c data points from each other, obtained by keeping the rows and columns of matrix D corresponding to data in class c; and P_c is the number of data in class c.

Let the number of neurons in the network be zero, n = 0.

(2) Consider the first class, c = 1.

(3) Calculate the minimum distance of each datum x_i \in Class c from all data of the other classes, using the corresponding row of the matrix R_c:

r_c = \left[\min_j \|x_i - x_j\|\right]_{P_c \times 1}, \quad x_i \in \text{Class } c,\; x_j \notin \text{Class } c   (3)

where the vector r_c measures the distance of every datum in class c from the other classes.

(4) Count the number of class-c data that can be covered by each datum x_i:

n_c = \left[\operatorname{count}_j\left(Q_{c,ij} = \|x_i - x_j\| < r_{c,i}\right)\right]_{P_c \times 1}, \quad x_i, x_j \in \text{Class } c.   (5)

(5) Add a new neuron to the hidden layer, set n = n + 1. The new neuron belongs to class c. Select the datum x_i corresponding to the largest value of the vector n_c as its center C_n and set the spread \sigma_n of the new neuron to the corresponding value of the vector r_c:

C_n = x_i, \quad i = \arg\max_i (n_{c,i}), \quad x_i \in \text{Class } c

\sigma_n = r_{c,i}.

(6) Consider the row of Q_c corresponding to the datum x_i, mark all data points of class c whose distances to the center are smaller than the spread of the new neuron as covered, and remove the corresponding rows and columns from R_c and Q_c.

(7) If there are uncovered data in class c, go back to step 4. Otherwise, if all classes are processed (c = C), stop; if c < C, set c = c + 1 and go back to step 3.
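A matching sketch of the maximum-coverage rule; only the selection step differs from the previous sketch (the count of Eq. (5) replaces the maximum-spread criterion). Again the names are ours, and the removal of covered rows and columns of R_c and Q_c is emulated with a mask.

```python
import numpy as np

def max_coverage_rule(X, y):
    """Sketch of learning rule 2 (maximum data coverage)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)      # Eq. (1)
    centers, spreads, classes = [], [], []
    for c in np.unique(y):
        idx_c = np.where(y == c)[0]
        idx_other = np.where(y != c)[0]
        r_c = D[np.ix_(idx_c, idx_other)].min(axis=1)              # Eq. (3)
        Q_c = D[np.ix_(idx_c, idx_c)]                              # Eq. (2)
        covered = np.zeros(len(idx_c), dtype=bool)
        while not covered.all():
            # Eq. (5): how many still-uncovered class-c data each candidate would cover
            n_c = ((Q_c < r_c[:, None]) & ~covered[None, :]).sum(axis=1)
            n_c[covered] = -1                      # covered data are no longer candidates
            i = int(np.argmax(n_c))                # step 5: center with maximum coverage
            centers.append(X[idx_c[i]])
            spreads.append(r_c[i])
            classes.append(c)
            covered |= Q_c[i] < r_c[i]             # step 6: mark the covered data
    return np.array(centers), np.array(spreads), np.array(classes)
```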

    Both algorithms based on the maximum spread and the maximumcoverage try to cover all data in a class by the minimum number of


hidden neurons. As will be discussed in the following subsection, the selection of the spread of each neuron is based on keeping a pre-specified level of activation of that neuron for the data of its own class, in such a way that each neuron can be regarded as active within its own spread range and inactive outside it.

    2.3. Activation function

As mentioned before, the two proposed methods are based on the idea of local activation of each neuron for the data points of its own class. That means we expect a hidden neuron to be active (i.e., its output must be greater than a specified level u, 0 < u < 1) for some of the training patterns of its own class and to be inactive (i.e., its output must be less than a specified level l, 0 < l < u) for all patterns of the other classes. The well-known Gaussian activation function of RB neurons is as follows:

o(r) = e^{-\frac{r^2}{2\sigma_n^2}}   (6a)

in which \sigma_n is the spread of the nth neuron and r = \|x - c_n\| is the distance of the point x from the center c_n of the neuron. As the Gaussian function has just one parameter \sigma, it can satisfy only one of the two mentioned requirements. That is, one may set \sigma in order to let the output of the neuron be less than l for the nearest datum of the other classes, or it may be set in order to let the output of the neuron be greater than or equal to u for the farthest datum of its own covered class. To achieve both conditions, we need a function with at least two parameters. Liu and Bozdogan (2003) proposed the power exponential distribution function

o(r) = e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_n^2}\right)^{\beta_n}}   (6b)

as the activation function of an RBF neuron, which has the additional parameter \beta. We use the same activation function as in (6b), but reformulate it slightly:

o(r) = e^{\log(l)\left(\frac{r^2}{\sigma_n^2}\right)^{\beta_n}}   (7)

which has the two parameters \sigma_n and \beta_n. The output of this function is 1 at the center of the neuron (r = 0), and at distances equal to or greater than its spread (r \geq \sigma_n) the output is equal to or less than l, regardless of the value of \beta_n. If we set l = e^{-1/2} \approx 0.6065 and \beta_n = 1, the above function (7) is equivalent to the Gaussian function in (6a). Now, \beta_n can be set according to the requirement that the activation level of the neuron must not be less than u for the farthest covered datum. So we have:

\beta_n = \max\left(1,\; \frac{\log\!\left(\log(u)/\log(l)\right)}{2\,\log\!\left(r_{\max}/\sigma_n\right)}\right)   (8)

where r_{\max} is the distance of the farthest datum covered by the neuron. Hence, each hidden neuron has three parameters: the vector of its center c_n and the two scalars of its spread \sigma_n and power \beta_n. Eq. (8) implies that when, with \beta_n = 1, the output of the neuron at r_{\max} is already greater than u, there is no need to modify the activation function. Fig. 2 shows the proposed modified activation function.

Fig. 2. The modified Gaussian-like activation function. Data belonging to the specified class and to the other classes are marked with x and o, respectively; l = 0.4, u = 0.6, \sigma = 1, and r_{\max} = 0.9.
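A small sketch of the modified activation (7) and the power-selection rule (8) in Python; the helper names are ours, and the numbers in the example are just the values used in Fig. 2.

```python
import numpy as np

def power_beta(u, l, sigma, r_max):
    """Eq. (8): smallest power >= 1 that keeps o(r_max) >= u for the farthest covered datum."""
    return max(1.0, np.log(np.log(u) / np.log(l)) / (2.0 * np.log(r_max / sigma)))

def activation(r, sigma, beta, l):
    """Eq. (7): equals 1 at r = 0 and drops to exactly l at r = sigma, for any beta."""
    return np.exp(np.log(l) * (r ** 2 / sigma ** 2) ** beta)

# Illustrative check with the Fig. 2 values: l = 0.4, u = 0.6, sigma = 1, r_max = 0.9.
sigma, l, u, r_max = 1.0, 0.4, 0.6, 0.9
beta = power_beta(u, l, sigma, r_max)
print(activation(np.array([0.0, r_max, sigma]), sigma, beta, l))
# -> approximately [1.0, 0.6, 0.4]; with l = exp(-0.5) and beta = 1 the Gaussian (6a) is recovered.
```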

    2.4. Weights of output layer

Normally, the output layer of an RBF network may have linear neurons for regression (function approximation) applications and hard-limit ones for classification applications. As our concern in this

paper is classification problems, we consider the output layer to consist of C hard-limit neurons:

\hat{y}_c = \sum_{n=1}^{N} w_{cn} O_n + w_{c(N+1)}, \quad c = 1, \dots, C   (9)

y_c = \begin{cases} 1 & \text{if } \hat{y}_c > 0 \\ -1 & \text{if } \hat{y}_c \leq 0 \end{cases} \quad c = 1, \dots, C   (10)

where O_n is the output of the nth neuron in the hidden layer and W_{C \times (N+1)} is the matrix of weights and biases. The desired values for each of the P learning patterns are defined as follows:

d_p = [d_{p1}\; d_{p2}\; \dots\; d_{pC}]   (11)

d_{pc} = \begin{cases} 1 & x_p \in \text{class } c \\ -1 & \text{otherwise} \end{cases} \quad c = 1, \dots, C \text{ and } p = 1, \dots, P.   (12)

Training of the output-layer weights and biases is carried out in the traditional way by minimization of a sum-of-squares error criterion. After learning the hidden layer by means of either of the two proposed learning rules, all data points x_p, p = 1, \dots, P, are presented to the hidden layer and the corresponding hidden-layer outputs O_p are calculated. The matrices D and O are defined as:

D = [d_1\; d_2\; \dots\; d_P]   (13)

O_{(N+1) \times P} = \begin{bmatrix} O_1 & \dots & O_P \\ 1 & \dots & 1 \end{bmatrix}.   (14)

The ones added in the last row of the matrix O represent the multipliers of the biases. The weight matrix can be calculated directly as:

W_{C \times (N+1)} = D\, O^{T} \left(O\, O^{T}\right)^{-1}.   (15)
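A minimal sketch of Eqs. (9)-(15), assuming the hidden-layer centers, spreads and powers have already been learned; the function names are ours, and np.linalg.lstsq is used in place of the explicit pseudo-inverse of Eq. (15) for numerical robustness.

```python
import numpy as np

def hidden_matrix(X, centers, spreads, betas, l):
    """O of Eq. (14): one row per hidden neuron plus a final row of ones for the biases."""
    r = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)               # (P, N) distances
    O = np.exp(np.log(l) * (r ** 2 / spreads[None, :] ** 2) ** betas[None, :]).T  # (N, P)
    return np.vstack([O, np.ones((1, X.shape[0]))])                               # (N+1, P)

def train_output_layer(O, y, n_classes):
    """W of Eq. (15): least-squares solution of D = W O, equal to D O^T (O O^T)^-1
    when O O^T is invertible. y holds integer class indices 0..C-1."""
    P = y.shape[0]
    D = -np.ones((n_classes, P))
    D[y, np.arange(P)] = 1.0           # Eqs. (11)-(12): +1 for the own class, -1 otherwise
    W = np.linalg.lstsq(O.T, D.T, rcond=None)[0].T
    return W

def classify(O, W):
    """Eq. (9) gives the linear discriminants; here the class with the largest one is
    returned (Eq. (10) would instead threshold each discriminant at zero)."""
    return np.argmax(W @ O, axis=0)
```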

    2.5. Winner-take-all (WTA) output layer

Here, a novel structure for the output layer of an RBF for classification problems is presented: a winner-take-all (WTA) layer without any adjustable parameter. Here again, there are C output neurons, each associated with one of the C classes of data. The output associated with class c is one if and only if one of the hidden neurons associated with class c has the highest output among all hidden neurons; all other output neurons are set to minus one (or zero). As will be proved later, it is guaranteed that all training patterns are classified correctly, i.e., the learning accuracy is 100%. As there is no adjustable parameter, there is no need


    Fig. 3. The RBF structure with winner-take-all output layer.

for learning. Fig. 3 demonstrates the structure of the proposed RBF network with the WTA output layer. The output calculation can be summarized as follows:

\hat{y}_c = \max_{n:\, O_n \in \text{class } c} (O_n), \quad c = 1, \dots, C   (16)

y_c = \begin{cases} 1 & \text{if } \hat{y}_c = \max_n \hat{y}_n \\ -1 & \text{otherwise.} \end{cases}   (17)
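The WTA layer of Eqs. (16)-(17) needs only the hidden outputs and the class associated with each hidden neuron; a possible sketch (names ours):

```python
import numpy as np

def wta_classify(O, neuron_class):
    """Eqs. (16)-(17): each pattern is assigned the class of the hidden neuron with the
    globally largest output. O is the (N, P) hidden-output matrix (no bias row);
    neuron_class[n] is the class label of hidden neuron n. No training is needed."""
    winners = np.argmax(O, axis=0)          # most active hidden neuron for each pattern
    return np.asarray(neuron_class)[winners]
```

Taking the per-class maximum first, as in Eq. (16), and then comparing across classes gives the same result.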

    2.6. Learning rule based on the maximum spread and limited neurons

The proposed learning rules attempt to select fewer neurons by the simple heuristic rules, and the simulation results show that the number of neurons is indeed limited in comparison with the traditional RBF and with the number of support vectors in SVM models. However, one can slightly modify each of the two proposed methods to limit the maximum number of neurons of the network. When a maximum number of neurons is specified, the only required modification to the previous rules is to add neurons in decreasing order of their spread and regardless of the order of the classes. In other words, in the algorithm of Section 2.1, the neurons belonging to the different classes are added together, in order of their spreads.

Learning rule based on the maximum spread and limited neurons:

Suppose there are P training patterns x_p, p = 1, ..., P, each belonging to one of C classes.

(1) Calculate the data distance matrix as follows:

D = [\|x_i - x_j\|]_{P \times P}, \quad i, j = 1, \dots, P.

Consider the P-valued vector R with components r_i given by

r_i = \min_j \|x_i - x_j\|, \quad x_i \in \text{Class } c,\; x_j \notin \text{Class } c.   (18)

Therefore the elements of the vector R are the distances of each datum to the other classes. Let the number of neurons of the network be zero, n = 0.

(2) Find the largest element r_i = \max_j r_j of R and the corresponding datum x_i \in Class c. Add a neuron to the network, n = n + 1, and let the center of the new neuron be C_n = x_i and its spread \sigma_n = r_i.

(3) Find the data belonging to class c that are closer to the neuron's center than its spread and mark them as covered.

(4) Stop if all data are covered or the maximum number of neurons is reached; otherwise go back to step 2.

A similar modification can be made to the algorithm of Section 2.2.
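A sketch of this limited-neuron variant: candidate neurons from all classes are ranked together by achievable spread and added until either all data are covered or a budget n_max is exhausted (names ours, written from the steps above).

```python
import numpy as np

def max_spread_limited(X, y, n_max):
    """Learning rule of Section 2.6 (sketch): neurons are added in decreasing order of
    spread, regardless of class, until full coverage or the n_max budget is reached."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Eq. (18): achievable spread of a datum = distance to the nearest datum of another class
    r = np.array([D[i, y != y[i]].min() for i in range(len(y))])
    covered = np.zeros(len(y), dtype=bool)
    centers, spreads, classes = [], [], []
    while not covered.all() and len(centers) < n_max:
        i = int(np.argmax(np.where(covered, -np.inf, r)))   # step 2: largest remaining spread
        centers.append(X[i]); spreads.append(r[i]); classes.append(y[i])
        covered |= (y == y[i]) & (D[i] < r[i])               # step 3: cover same-class data
    return np.array(centers), np.array(spreads), np.array(classes)
```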

    3. Analysis of proposed algorithms

The proposed algorithms of Sections 2.1, 2.2 and 2.6 clearly converge after a limited number of iterations. In fact, in each iteration a new neuron is added, the data covered by it are discarded, and the algorithm is repeated for the remaining data. At least one datum, the one selected as the center of the neuron, is discarded. This can be summarized as follows:

Theorem 1 (Convergence). The algorithms based on the maximum spread and on the maximum coverage converge in a limited number of iterations, not more than the size of the training data, P. The learned RBF networks have at most P hidden neurons. The algorithm based on the maximum spread and limited neurons (Section 2.6) converges in a limited number of iterations, not more than N_max \leq P, and the resulting network has at most N_max neurons.

Now, we prove that for the algorithms presented in Sections 2.1 and 2.2 the values of l and u can always be specified so as to have zero classification error on the training patterns for the RBF network with hard-limit output layer. For the proposed RBF network of Section 2.5 with the WTA output layer (see Fig. 3), this is true for all values of l and u with 0 < l < u.


To determine the bias, recall that for each training datum x_p of class c at least one of the hidden neurons O_n belonging to class c has output greater than u. Then, by choosing a bias value that satisfies

b < u   (21)

we get

f(x_p) = \sum_{O_n \in \text{class } c} O_n(x_p) - b > 0, \quad x_p \in \text{class } c.   (22)

In addition, for all data points not belonging to class c, it is required to have

f(x_p) < 0, \quad x_p \notin \text{class } c.   (23)

As the learning rule ensures that for all data points x_p \notin class c the neurons with weights w_n = 1 (that is, the hidden neurons belonging to class c) have maximum outputs of l, Eq. (23) is satisfied if

l \sum_{n=1}^{N} w_n < b.


    Table 1

    The properties of datasets for examples.

    Dataset Data size Features Classes

    1 Thyroid 215 5 3

2 Heart 270 13 2
3 Ionosphere 351 34 2

    4 Breast cancer 699 9 2

    5 PID 768 8 2

6 Segmentation 2310 19 7
7 Spambase 4601 57 2

    8 Wine quality-white 4898 11 7

by 0.05 steps. Fig. 4 shows the resulting training and test errors for both networks. Fig. 4(a) shows the training error of the RBF-R network. The minimum training error of 0.84% is obtained for u = 0.9 and l = 0.4. Generally, the training error decreases as the difference between u and l increases. This fact is consistent with Theorem 2, which states that the training error can approach zero if u is high enough and l low enough, even though the output layer is trained by the LS method and not by the method suggested in the proof of Theorem 2. However, for u > 0.7 and l < 0.2, there is a slight increase in training error, which can be explained by the more local functions of the hidden neurons as a consequence of the high values of the \beta_n parameters (see Eq. (7)). The testing error of the RBF-R network is plotted versus u and l in Fig. 4(b). The minimum testing error of 4.7% is obtained with l = 0.4 and u = 0.6; the training error is 1.2% for the same parameters. In fact, those values give the best performance in terms of generalization accuracy. It is clear from this figure that the test error increases for lower values of l, which can again be explained by the more local functions of the hidden neurons due to high values of the \beta_n parameters. For values 0.4 < l < 0.65 and a slightly greater u, the generalization accuracy is at its maximum and is not sensitive to the exact values of those parameters. Fig. 4(c) demonstrates the generalization performance of the RBF-WTA network. From this figure, one can see that the generalization performance of RBF-WTA is not sensitive to the exact values of u and l. As proved by Theorem 3, the RBF-WTA network has zero training error for all sets of parameters with l < u, and therefore that result is not shown.

As Example 1 indicates, the exact values of u and l have no significant effect on the results. For all simulations hereafter, those values are set to u = 0.7 and l = 0.6065.

Example 2. In this example, the performance of the RBF-N and RBF-WTA networks with a limited, predefined number of hidden neurons, as proposed by the algorithm of Section 2.6, is studied. The learning algorithms based on the maximum spread and the maximum coverage add as many neurons to the hidden layer as needed to cover all learning data. However, the best generalization performance may be achieved with even fewer hidden neurons. For this example a large dataset, namely the Spambase dataset, is chosen; 80% of the data is used for training and 20% for testing in each run. For this dataset, the number of neurons required to cover all training data is 616. Fig. 5 shows training and testing errors as the number of hidden neurons increases. Figs. 5(a) and (b) show the results for the RBF-N and RBF-WTA networks, respectively. As expected, the training error of RBF-WTA reaches zero once the number of neurons reaches the maximum of 616. Both networks have approximately the same levels of training and testing errors for low and medium network sizes, which means there is no overtraining. The training and test errors of the RBF-N network are slightly lower than those of RBF-WTA, although this does not hold for all datasets, as we will see later. Beyond about half of the maximum required number of neurons, the training errors decrease without a decrease in the test errors, which indicates overtraining of the networks.

Fig. 4. Test and train errors (in percentages) of the proposed algorithms versus the parameters of the activation function: (a) train error of the RBF-R network, (b) test error of the RBF-R network, and (c) test error of the RBF-WTA network.

Example 3. Test errors of the RBF-WTA network for several datasets are presented in Fig. 6. The datasets are PID, Ionosphere, Heart, and Spambase. Since the required numbers of neurons for these datasets differ, the horizontal axis is normalized as the ratio of neurons to the maximum number of required neurons for each dataset. The vertical axis indicates the test error for each of the four datasets. The results for the RBF-N and RBF-R networks are similar to those of RBF-WTA.


    Fig. 5. Test and train errors of RBF-N (left) and RBF-WTA (right) networks for Spambase data versus the number of hidden neurons.

Fig. 6. Test error of the RBF-WTA structure as the number of hidden neurons increases.

For the Ionosphere dataset, the test error is always below 5% and reaches its minimum at about one-tenth of the maximum number of neurons, whereas for the PID dataset the test error begins at 33% and decreases continuously. For the Heart and Spambase datasets, the test error shows no significant reduction as the number of neurons increases beyond 30% of its maximum number. Summing up, for three of those four datasets the optimal number of neurons is much smaller than the number of neurons required to cover all data.

Example 4. Fig. 7 shows the data coverage by hidden neurons for all datasets of Table 1. The horizontal axis of this figure is the ratio of hidden neurons to data size. The vertical axis shows the ratio of data points covered by the learning rule based on the maximum spread as new neurons are added. For five datasets, the number of neurons required to cover all data points is less than a quarter of the data size, indicating that zero training error can be achieved (in the RBF-WTA network) with far fewer neurons than data points. For the other three datasets (Heart, PID, and Wine Quality) the required numbers of hidden neurons are somewhat larger, up to 49% of the data size for Wine Quality.

Fig. 7. The ratio of data covered by hidden neurons. The horizontal axis is the ratio of hidden neurons to data size. When all data are covered, the learning algorithm stops.

    5. Comparison to other methods

In this section, the results of the proposed methods are compared with some other leading RBF learning methods for classification problems and with some support vector machines (SVM). All implementations are carried out in MATLAB. MATLAB implements RBF training by the newrb.m command, which

is slightly different from the traditional RBF in that it selects the most erroneous datum as the center of the new neuron (Neural Network Toolbox User's Guide, 2012). The traditional support vector machine is implemented by the svmtrain.m command. Simulation results for two other SVM methods are presented, too. Sequential Minimal Optimization (SMO) is an efficient and fast solver for SVM (Platt, 1998). LIB-SVM (Chang & Lin, 2011) is a library toolbox that consists of different efficient SVM methods capable of solving two-class as well as multi-class problems. The results are also compared with three other RBF learning methods for classification problems (Constantinopoulos & Likas, 2006; Jaiyen et al., 2010; Titsias & Likas, 2001). Those three models will be referred to hereafter as C2006, T2001, and J2010, respectively. 5-fold cross-validation is used for all simulations and the results are averaged over 10–20 runs for each dataset. No effort is made to optimize the values of the u and l parameters, but the number of neurons is optimized.

Table 2 presents the test errors of the proposed networks RBF-R, RBF-N, and RBF-WTA together with the results of the other methods. For five of the eight datasets, one of the proposed learning rules has the best performance. The RBF-WTA network generalizes better than the other two proposed methods of this paper. For the two-class Heart problem, the SVM and LIBSVM methods perform better than the other methods, and reference T2001 has the lowest error on the PID dataset. For the Spambase dataset, SMO-SVM performs slightly better than RBF-R; for this two-class dataset, the standard SVM fails due to huge memory usage (out-of-memory error). As seen from this table, the proposed algorithms give comparable and satisfactory results relative to the other RBF learning rules and the SVM methods.


    Table 2

    Averaged 5-fold-validation test error (%).

    RBF-R RBF-N RBF-WTA SVM LIBSVM SMOSVM RBF J2010 T2001 C2006

    Thyroid 4.4 5.3 3.7 4.18 9.8 4.6

Heart 18.1 19.5 19.4 16.4 16.67 18.1 21.1 26
Ionosphere 4.5 4.8 5.7 10.5 6.25 6.3 12.9 8.27

    Breast cancer 3.7 3.6 3.0 3.3 3.3 3.1 6.7

    PID 24.7 27.9 26.2 26.2 23.82 25.9 29.8 23.22 24.1

Segmentation 6.4 5.7 5.4 6.2 35.1 21.4
Spambase 7 6.7 9.5 Out of memory 6.6 6.33 7.26 19

    Wine quality 41.6 43.9 35.9 42.7 45.6

    Table 3

Model complexity (average number of hidden neurons for RBFs, and number of SVs for SVM models).

    RBF-R RBF-N RBF-WTA SVM LIBSVM SMOSVM RBF J2010 T2001 C2006

Thyroid 15.1 18.7 14.6 58.8 43 3.8
Heart 24 27 46 216 124 88.8 54 3.8

    Ionosphere 65 48 66.6 280.8 105.2 58 70 12

    Breast cancer 40 35 40 527.9 90.2 46.1 140

    PID 120 264 160 614.4 356.2 347.6 154 14 4.3

    Segmentation 200 241 200 632.8 462 11.8

    Spambase 619.8 699 619.8 1070.8 684.3 460 3.8

    Wine quality 2038 2000 2040 3498.4 2490

In addition to generalization performance, the proposed algorithms are compared with the other methods in terms of model complexity. For SVM methods, model complexity is normally measured by the number of SVs, and for RBF models by the number of hidden neurons. The numbers of hidden neurons or SVs are given in Table 3. Considering the number of hidden units, the three methods C2006, T2001, and J2010 have lower model complexity than the proposed methods for all datasets. However, their accuracies are lower than those of our methods, except for the PID dataset. Regarding model complexity, the proposed RBF networks normally lie between the other RBF networks and the SVM models. For the Breast Cancer dataset, the proposed methods have the best performance regarding both accuracy and complexity.

For further investigation of the proposed learning rules, they are compared with five neural network models as reported in Fernández-Delgado, Cernadas, Barro, Ribeiro, and Neves (2014). Table 4 presents the specification of the 15 small and medium size datasets used in their study, which is adopted here for comparison. Fernández-Delgado et al. (2014) also report the results of their own DKP model. Table 5 summarizes the results. The first three columns are associated with the three RBF models proposed in this paper, namely RBF-R, RBF-N, and RBF-WTA; for each dataset, 20 runs are averaged. The remaining columns are adopted from Fernández-Delgado et al. (2014). Among those 8 algorithms, RBF-R wins most often (has the best performance). RBF-R has the best average accuracy (84.8%) on those 15 datasets, followed by RBF-N (84.2%). The average rank of each classifier over all 15 datasets is reported, too. Again, RBF-R has the best average rank and is followed by RBF-N, while RBF-WTA and DKP are next. Based on both average accuracy and average rank, RBF-R is the best and RBF-N is the second best algorithm.

Statistical tests suggested in Demšar (2006) are used to further investigate the results. For the comparison of multiple classifiers, Demšar suggests the computation of the critical difference (CD) with the Bonferroni-Dunn test. For N datasets and k classification algorithms (here k = 8 and N = 15), the Bonferroni-Dunn test indicates a significantly different performance of two classifiers if the corresponding average ranks differ by at least the critical difference

CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}}   (27)

where q_\alpha is the critical value for a confidence level of \alpha (here, for \alpha = 0.10 the critical value is q_\alpha = 2.450 and CD = 2.1).

    Table 4

    The properties of datasets used in Table 5.

    Dataset Data size Features Classes

    Hepatitis 80 19 2

    Zoo 101 16 7

Wine 178 13 3
Sonar 208 60 2

    Glass 214 9 7

    Heart 297 13 2

    Ecoli 336 7 8

    Liver 345 6 2

    Ionosphere 351 34 2

    Monks-3 432 6 2

    Dermatol. 366 34 5

    Breast 683 9 2

Pima 768 8 2
Tictactoe 958 9 2
German 1000 24 2

The average rank of RBF-R is 3.13, and the Bonferroni-Dunn test indicates that its performance is significantly better than those of RBF, MLP and KNN. A less conservative result can be obtained with the Wilcoxon signed-ranks test for comparisons of two classifiers. The Wilcoxon signed-ranks test indicates a significant performance difference between the RBF-R algorithm and five other algorithms, RBF, MLP, KNN, PNN and RBF-N, at a confidence level of at least 0.05.
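As a quick check of Eq. (27), the critical difference for the setting reported above can be reproduced in a few lines (a sketch; the value q_alpha = 2.450 is the one quoted in the text for alpha = 0.10 and k = 8).

```python
import math

def critical_difference(q_alpha, k, n_datasets):
    """Bonferroni-Dunn critical difference of Eq. (27)."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))

print(round(critical_difference(2.450, 8, 15), 2))   # prints 2.19 for k = 8, N = 15
```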

    6. Conclusion

This paper presents two fast and accurate RBF learning rules for classification problems. In addition, a slight modification of the Gaussian activation function is made which guarantees the linear separability property of the RBF. For the proposed winner-take-all output layer, there is no adjustable parameter in the output layer, and hence no learning method is needed. In the learning rules for the hidden neurons, the center and spread of each added neuron are determined based on the maximum achievable coverage (in terms of the covered spatial domain or of the number of covered training data). Each hidden unit is associated with one of the output classes. Based on these heuristics, three RBF networks were proposed and tested on well-known datasets. The comparison of the results with some other leading RBF learning rules for classification problems and with SVM methods indicates that the proposed RBF networks perform satisfactorily and are comparable with them. Analytic results show that the learning rules guarantee the linear separability of the training


    Table 5

    Averaged 5-fold-validation test accuracies (%).

    Dataset RBF-R RBF-N RBF-WTA DKP MLP KNN PNN RBF Average

    Hepatitis 81.9 81.1 82.1 72.0 61.0 56.0 62.0 65.0 71.2

Zoo 95.2 94.3 96.2 89.6 99.4 93.5 92.0 83.8 93.8
Wine 95.5 94.4 93.5 93.3 82.7 67.3 68.0 94.0 86.0

    Sonar 83.8 83.2 83.9 84.4 76.9 82.3 83.1 77.9 82.0

    Glass 66.1 66.3 69.1 70.4 52.8 62.4 70.2 38.7 63.1

Heart 81.9 80.5 80.6 79.9 64.1 61.9 59.9 73.5 73.0
Ecoli 78.5 79.3 81.0 92.9 88.9 92.7 94.4 69.5 85.6

    Liver 62.2 62.8 61.0 65.5 64.2 66.6 65.3 53.8 63.7

    Ionosphere 95.5 95.2 94.3 90.3 87.8 85.8 85.8 81.5 90.1

    Monks-3 99.0 95.8 68.6 89.6 94.9 97.1 96.8 97.6 91.9

    Dermatol. 92.8 92.1 92.7 90.3 95.9 94.7 94.9 70.2 91.2

    Breast 96.3 96.4 97.0 96.1 94.5 95.7 95.9 94.1 95.5

    Pima 75.3 72.1 73.8 74.7 67.5 73.2 70.5 71.0 72.1

Tictactoe 98.8 98.8 96.4 79.2 75.0 80.0 81.7 98.0 86.7
German 69.1 69.9 66.3 74.1 70.9 69.4 70.4 71.3 70.8

    # Wins 5 1 2 3 2 1 1 0

    Ave. 84.8 84.1 82.4 82.8 78.4 78.6 79.4 76.0 81.1

    Ave. Rank 3.13 3.80 3.87 3.87 5.40 5.27 4.73 5.93

    Total Rank 1 2 3.5 3.5 7 6 5 8

data in the hidden-layer space, and therefore a training accuracy of 100% is achievable while the number of neurons remains limited. The computation time of the learning rules was shown to be of order two with respect to the data size. The proposed learning algorithms are not stochastic and do not depend on the order of presentation of the data. There are just two parameters for each learning rule, namely l and u, and the simulation results show that the exact values of those parameters have no significant effect on the accuracy of the network as long as l < u, which relaxes the design effort.

    References

Bartlett, P. L. (1998). The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2), 525–536.
Blanzieri, E. (2003). Theoretical interpretations and applications of radial basis function networks, Tech. Rep. DIT-03-023. Trento, Italy: Inf. Telecomun., Univ. Trento.
Broomhead, D. S., & Lowe, D. (1988). Multivariable functional interpolation and adaptive networks. Complex Systems, 2(3), 321–355.
Cai, L., Rad, A. B., & Chan, W. L. (2010). An intelligent longitudinal controller for application in semiautonomous vehicles. IEEE Transactions on Industrial Electronics, 57(4), 1487–1497.
Chang, C. C., & Lin, C. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3), 1–27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chen, S., Cowan, C. F. N., & Grant, P. M. (1991). Orthogonal least squares learning algorithm for radial basis function networks. IEEE Transactions on Neural Networks, 2(2), 302–309.
Constantinopoulos, C., & Likas, A. (2006). An incremental training method for the probabilistic RBF network. IEEE Transactions on Neural Networks, 17(4), 966–974.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
Fernández-Delgado, M., Cernadas, E., Barro, S., Ribeiro, J., & Neves, J. (2014). Direct Kernel Perceptron (DKP): Ultra-fast kernel ELM-based classification with non-iterative closed-form weight calculation. Neural Networks, 50, 60–71.
Ferrari, S., Bellocchio, F., Piuri, V., & Borghese, N. A. (2010). A hierarchical RBF online learning algorithm for real-time 3-D scanner. IEEE Transactions on Neural Networks, 21(2), 275–285.
Frank, A., & Asuncion, A. (2010). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. URL: http://archive.ics.uci.edu/ml.
Girosi, F., & Anzellotti, G. (1993). Rates of convergence for radial basis functions and neural networks. In R. J. Mammone (Ed.), Artif. neural netw. speech vis. (pp. 97–113). London, UK: Chapman & Hall.
Hartman, E., Keeler, J. D., & Kowalski, J. M. (1990). Layered neural networks with Gaussian hidden units as universal approximations. Neural Computation, 2(2), 210–215.
Hien, D. T. T., Huan, H. X., & Huynh, H. T. (2009). Multivariate interpolation using radial basis function networks. International Journal of Data Mining, Modelling and Management, 1(3), 291–309.
Huan, H. X., Hien, D. T. T., & Huynh, H. T. (2007). A novel efficient two-phase algorithm for training interpolation radial basis function networks. Signal Processing, 87(11), 2708–2717.
Huan, H. X., Hien, D. T. T., & Tue, H. H. (2011). Efficient algorithm for training interpolation RBF networks with equally spaced nodes. IEEE Transactions on Neural Networks, 22(6), 982–998.
Huang, G. B., Saratchandran, P., & Sundararajan, N. (2004). An efficient sequential learning algorithm for growing and pruning RBF (GAP-RBF) networks. IEEE Transactions on Systems, Man, and Cybernetics, 34(6), 2284–2292.
Huang, S., & Tan, K. K. (2009). Fault detection and diagnosis based on modeling and estimation methods. IEEE Transactions on Neural Networks, 20(5), 872–881.
Jaiyen, S., Lursinsap, C., & Phimoltares, S. (2010). A very fast neural learning for classification using only new incoming datum. IEEE Transactions on Neural Networks, 21(3), 381–392.
Javan, D. S., Rajabi Mashhadi, H., Ashkezari Toussi, S., & Rouhani, M. (2012). On-line voltage and power flow contingencies rankings using enhanced radial basis function neural network and kernel principal component analysis. Electric Power Components and Systems, 40, 534–555.
Javan, D. S., Rajabi Mashhadi, H., & Rouhani, M. (2013). A fast static security assessment method based on radial basis function neural networks using enhanced clustering. Electrical Power and Energy Systems, 44(1), 988–996.
Kainen, P. C., Kůrková, V., & Sanguineti, M. (2009). Complexity of Gaussian-radial-basis networks approximating smooth functions. Journal of Complexity, 25, 63–74.
Kainen, P. C., Kůrková, V., & Sanguineti, M. (2012). Dependence of computational models on input dimension: Tractability of approximation and optimization tasks. IEEE Transactions on Information Theory, 58(2), 1203–1214.
Lee, Y. J., & Yoon, J. (2010). Nonlinear image upsampling method based on radial basis function interpolation. IEEE Transactions on Image Processing, 19(10), 2682–2692.
Liu, Z., & Bozdogan, H. (2003). RBF neural networks for classification using new kernel functions. Neural Parallel & Scientific Computations, 41–52.
Mao, K. Z., & Huang, G. (2005). Neuron selection for RBF neural network classifier based on data structure preserving criterion. IEEE Transactions on Neural Networks, 16(6), 1531–1540.
Meng, K., Dong, Z. Y., Wang, D. H., & Wong, K. P. (2010). A self-adaptive RBF neural network classifier for transformer fault analysis. IEEE Transactions on Power Systems, 25(3), 1350–1360.
Mhaskar, H. N. (2004a). On the tractability of multivariate integration and approximation by neural networks. Journal of Complexity, 20, 561–590.
Mhaskar, H. N. (2004b). When is approximation by Gaussian networks necessarily a linear process? Neural Networks, 17, 989–1001.
Moody, J., & Darken, C. J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2), 281–294.
Neural Network Toolbox User's Guide (2012). The MathWorks, Inc., Natick, MA, USA. URL: http://www.mathworks.com/help/nnet/index.html.
Orr, M. J. L. (1995). Regularization in the selection of radial basis function centers. Neural Computation, 7(3), 606–623.
Oyang, Y., Hwang, S., Ou, Y., Chen, C., & Chen, Z. (2005). Data classification with radial basis function networks based on a novel kernel density estimation algorithm. IEEE Transactions on Neural Networks, 16(1), 225–236.
Park, J., & Sandberg, I. W. (1991). Universal approximation using radial-basis-function networks. Neural Computation, 3(2), 246–257.
Platt, J. (1998). Sequential minimal optimization: a fast algorithm for training support vector machines, Technical Report MSR-TR-98-14. Microsoft Research.
Powell, M. J. D. (1985). Radial basis functions for multivariable interpolation: A review. In Proc. IMA conf. algorithms applicat. funct. data (pp. 143–167).
Titsias, M. K., & Likas, A. C. (2001). Shared kernel models for class conditional density estimation. IEEE Transactions on Neural Networks, 12(5), 987–997.
Tsai, C. C., Huang, H. C., & Lin, S. C. (2010). Adaptive neural network control of a self-balancing two-wheeled scooter. IEEE Transactions on Industrial Electronics, 57(4), 1420–1428.

S0893-6080(15)00280-4/sbref19http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref19http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref20http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref20http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref20http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref20http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref20http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref20http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref20http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref20http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref20http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref21http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref21http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref21http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref21http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref21http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref21http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref21http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref21http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref22http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref22http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref22http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref22http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref22http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref22http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref22http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref22http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref23http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref23http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref23http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref23http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref23http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref23http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref23http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref23http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref24http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref24http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref24http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref24http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref24http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref24http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref24http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref25http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref25http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref25http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref25http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref25http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref25http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref26http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref26http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref26http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref26http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref26http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref26http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref26http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref27http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref27http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref27http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref27http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref27http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref27http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref27http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref27http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref
27http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref28http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref28http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref28http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref28http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref28http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref28http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref28http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref29http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref29http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref29http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref29http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref29http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref29http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref29http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref30http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref30http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref30http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref30http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref30http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref30http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref30http://www.mathworks.com/help/nnet/index.htmlhttp://www.mathworks.com/help/nnet/index.htmlhttp://refhub.elsevier.com/S0893-6080(15)00280-4/sbref32http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref32http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref32http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref32http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref32http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref32http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref33http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref33http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref33http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref33http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref33http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref33http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref33http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref33http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref34http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref34http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref34http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref34http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref34http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref34http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref34http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref35http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref35http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref35http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref35http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref37http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref37http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref37http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref37http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref37http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref37http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref37http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref38http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref38http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref38http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref38http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref38http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref38http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref38http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref37http://refhub.elsevier.com/S0893-6080(15)00280-
4/sbref35http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref34http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref33http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref32http://www.mathworks.com/help/nnet/index.htmlhttp://refhub.elsevier.com/S0893-6080(15)00280-4/sbref30http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref29http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref28http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref27http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref26http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref25http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref24http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref23http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref22http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref21http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref20http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref19http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref18http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref17http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref16http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref15http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref14http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref13http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref12http://archive.ics.uci.edu/mlhttp://refhub.elsevier.com/S0893-6080(15)00280-4/sbref10http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref9http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref8http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref7http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref6http://www.csie.ntu.edu.tw/~cjlin/libsvmhttp://refhub.elsevier.com/S0893-6080(15)00280-4/sbref4http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref3http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref2http://refhub.elsevier.com/S0893-6080(15)00280-4/sbref1
  • 7/25/2019 1-s2.0-S0893608015002804-main

    12/12

    M. Rouhani, D.S. Javan / Neural Networks 75 (2016) 150161 161

Wu, S., & Chow, T. W. S. (2004). Induction machine fault detection using SOM-based RBF neural networks. IEEE Transactions on Industrial Electronics, 51(1), 183–194.

Xie, T., Yu, H., Hewlett, J., Różycki, P., & Wilamowski, B. (2012). Fast and efficient second-order method for training radial basis function networks. IEEE Transactions on Neural Networks and Learning Systems, 23(4), 609–619.

Yao, W., Chen, X., Zhao, Y., & Tooren, M. (2012). Concurrent subspace width optimization method for RBF neural network modeling. IEEE Transactions on Neural Networks and Learning Systems, 23(2), 247–259.

Yu, H., Xie, T. T., Paszczynski, S., & Wilamowski, B. M. (2011). Advantages of radial basis function networks for dynamic system design. IEEE Transactions on Industrial Electronics, 58(12), 5438–5450.
