
Determination of the Complexity Fitted Model Structure of Radial Basis Function Neural Networks

Annamária R. Várkonyi-Kóczy*, Balázs Tusor**, and Adrienn Dineva**
* Institute of Mechatronics and Vehicle Engineering, Óbuda University, Budapest, Hungary
** Integrated Intelligent Systems Japanese-Hungarian Laboratory, Óbuda University, Budapest, Hungary
[email protected], [email protected], [email protected]

Abstract—One of the disadvantages of using Artificial Neural Networks (ANNs) is their significant training time, which scales with the complexity of the network and with the complexity of the problem to be solved. Radial Basis Function Neural Networks (RBFNNs) are neural networks that form their output as a linear combination of radial basis functions and use hybrid learning procedures, which address the training time problem efficiently. However, it is not trivial to determine their structural parameters, i.e., the number of neurons and the parameters of each neuron. To solve this problem we have developed a new training method: we apply a clustering step to the training data, which yields both the quasi-optimum number of necessary neurons in the model and the approximate parameters of the neurons.

I. INTRODUCTION

When solving complex problems with neural networks (NNs), one of the biggest problems is the approximation of the necessary complexity of the NN model. It is well known that the model complexity has to majorate (i.e., exceed) the complexity of the system/problem to be modeled, which usually leads to a situation where the complexity of the model significantly exceeds that of the problem. This may result in a complexity explosion and serious problems in the fitting and/or evaluation of the model. The main problem is that there is no truly systematic method for the estimation of the needed (minimum) complexity. Engineers usually apply one of two strategies: the first one starts from a small model and increases its complexity iteratively if it turns out to be insufficient. The other method follows the opposite direction: it starts from a model with overestimated complexity, which is reduced step by step by deleting unnecessary parts.

The problem leads back to Hilbert’s 13th (and also to his 23rd) conjectures (“The existence of a continuous function of three variables which cannot be decomposed as the finite superposition of continuous functions of two variables” and “Further development of the calculus of variations”, respectively) [1] and to the works of Kolmogorov [2] and his student Arnold [3], [4]. Arnold disproved Hilbert’s 13th conjecture in 1957 by giving a constructive solution to the problem, while Kolmogorov proved a general representation theorem in the same year, stating that any continuous real-valued N-variable function defined over the compact interval [0,1]^N can be represented with the help of appropriately chosen single-variable functions and the sum operation.

The universal approximation theorem has come into focus again with the spread of soft computing, fuzzy, and neural network techniques. It has been proven that if an arbitrary function is approximated by a set of functions of a certain type, then the universal approximation property holds only if the number of building functions (fuzzy sets or hidden neurons) is unlimited (see e.g. [5], [6], [7]).

The method presented in this paper contributes to the field by giving an approximation of the necessary model complexity of Radial Basis Function Neural Networks (RBFNNs). (Note the analogy with the original problem: an RBFNN with one hidden layer is a universal approximator for real-valued maps defined on convex, compact sets of R^N if certain requirements are met, i.e., it can approximate any function which is continuous on the N-dimensional unit hypercube.) Further, the proposed technique also speeds up the teaching (learning process) of the NN model.

The paper is organized as follows: In Section II the basics of Radial Basis Function Neural Networks are summarized. Section III presents the new method. In Section IV experimental results are shown, while Section V is devoted to the conclusions.

II. RADIAL BASIS FUNCTION NEURAL NETWORKS

Radial Basis Function Neural Networks (RBFNNs) are feed-forward artificial neural networks with two active layers, which use radial basis functions as activation functions and form their output as a linear combination of the hidden-layer responses. Their application areas mainly involve function approximation, time series prediction, control, etc.

The general architecture of RBFNNs can be seen in Fig. 1. The network first calculates the activation values (g_i(X)) from the input data (X) using radial basis functions, each function having a center (c_i) and a width (σ_i) parameter (which usually differ for each neuron). In the output layer, the weighted sum is generated from the values resulting from the previous step and their weight parameters (w_i) (with an optional additional bias term (g_0), which is usually 1), resulting in the response of the network (y).
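As an illustration, the minimal sketch below computes this forward pass for Gaussian basis functions; the Gaussian form and all variable names (centers, widths, weights, bias_weight) are assumptions for illustration only, since the general architecture does not fix a particular basis function.

import numpy as np

def rbf_forward(X, centers, widths, weights, bias_weight=0.0):
    """Forward pass of a Gaussian RBFNN (illustrative sketch).

    X           : (n_samples, n_dims) input data
    centers     : (n_neurons, n_dims) center c_i of each hidden neuron
    widths      : (n_neurons,) width sigma_i of each hidden neuron
    weights     : (n_neurons,) output weights w_i
    bias_weight : weight of the optional bias unit g_0 = 1
    """
    # Squared Euclidean distance of every sample to every center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Hidden-layer activations g_i(X) = exp(-||X - c_i||^2 / (2 sigma_i^2))
    g = np.exp(-d2 / (2.0 * widths[None, :] ** 2))
    # Output: weighted sum of the activations plus the bias term
    return g @ weights + bias_weight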

In the training phase of RBFNNs, the hidden layer realizes an unsupervised training through nonlinear mapping using radial basis functions, to tune the center and width parameters of each function.


Figure 1. The architecture of Radial Basis Function Neural Networks

A widespread solution for this is to use clustering methods, e.g., the c-means method [8]. The output layer realizes a supervised training phase, e.g., with the back-propagation algorithm.
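To make the unsupervised half of this hybrid scheme concrete, the following sketch obtains the centers with c-means (k-means) clustering; the width heuristic (distance to the nearest other center) and the function name are illustrative assumptions, not the paper's prescription.

import numpy as np
from sklearn.cluster import KMeans

def init_hidden_layer(X, n_neurons):
    """Unsupervised step: tune the centers with c-means (k-means) clustering.

    The width heuristic below is an illustrative assumption; other
    choices are equally valid.
    """
    km = KMeans(n_clusters=n_neurons, n_init=10).fit(X)
    centers = km.cluster_centers_
    # Pairwise distances between centers; ignore the zero self-distance
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    widths = d.min(axis=1)
    return centers, widths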

To cite a few of the newest studies on the utilization of RBFNNs in various fields: in [9] the authors combine RBFNNs with bidirectional Kohonen maps in order to help the estimation of damages caused by winter storms and related gusts. This topic has proved to be of increasing importance because of climate change, which results in more unpredictable weather phenomena. In [10] RBFNNs are used to modify the inverse kinematic approximation of a robot-vision system. In [11] RBFNNs are used in food processing, successfully predicting the mass transfer in plant materials. In [12] the authors use RBFNNs to determine electromagnetic fields near material interfaces. In [13] RBFs are applied for video traffic prediction and in [14] for facial feature extraction in face recognition.

As shown above, the utilization of RBFNNs is widespread nowadays. However, besides the advantages, the determination of the optimum structure of the networks is not trivial, particularly the number of neurons required to solve the given problem. Even though the c-means method is effective for determining the center parameters of each neuron, the number of clusters still has to be set by the user. The clustering algorithm proposed in this paper is not only able to calculate the center parameters, but it also gives the number of neurons and thus the very structure of the network.

III. THE CLUSTERING ALGORITHM

The proposed training procedure to determine the structural parameters of RBFNNs used for classification problems adds an additional clustering step to the system, in which the input training samples of the neural networks are clustered and the neural networks are trained using the reduced sample set resulting from the clustering. Fig. 2 shows the block diagram of the general supervised learning problem extended with the additional clustering step. The goal of the training is to tune the model (in our case the RBFNN) in order to make the output of the model (y) approximate the desired output (d) of the examined unknown system, using the value (c) determined by the criteria function (typically a function of the approximation error). The model input is the cluster center (u’) of the cluster which the actual input (u) belongs to. This means that the input data set has to be clustered before the training.

The role of the inserted clustering step is to reduce the quantity of the input data (u) used during the training and, in parallel, to preserve the information contained in the original data set as much as possible, thus shortening the time required for learning the mapping of the unknown system to be modeled while preserving the accuracy of the system’s performance as if it had been trained with the original data set. The results of the clustering step are the centers of the appointed clusters (u’).
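The minimal sketch below, assuming a classification setting, shows how such a clustering step is inserted before training; the helpers reduce_by_clustering (a sketch of which appears later in this section) and train_rbfnn are hypothetical names, not the paper's API.

import numpy as np

def train_with_preclustering(X, labels, distance_factor, train_rbfnn):
    """Pre-cluster the training data per class, then train on the centers.

    reduce_by_clustering stands for the time-reduced clustering of
    Section III; train_rbfnn stands for any RBFNN training routine.
    """
    centers, center_labels = [], []
    for cls in np.unique(labels):
        # Clustering is applied to one class at a time
        cls_centers = reduce_by_clustering(X[labels == cls], distance_factor)
        centers.append(cls_centers)
        center_labels.append(np.full(len(cls_centers), cls))
    u_reduced = np.vstack(centers)            # reduced input set (u')
    d_reduced = np.concatenate(center_labels) # desired outputs for the centers
    return train_rbfnn(u_reduced, d_reduced)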

For the training procedure, we have also developed a novel clustering method fundamentally based on the well-known c-means clustering method [8]. The principle of the new technique is similar to that of the original algorithm, except that the assignment procedure has been modified (time reduced). While in the original c-means algorithm each sample is compared to all of the existing clusters and assigned to the best fitting (nearest) cluster, in our method the given sample is compared to the existing clusters one by one and the procedure stops as soon as a “near enough” cluster is found.

Figure 2. The new supervised learning scheme extended with clustering


Figure 3. The algorithm of the time reduced clustering step

That is, the sample gets assigned to the first cluster where the distance between the cluster’s center and the given sample is less than a predefined, arbitrary value (the distance factor). If there is no such cluster, then a new cluster is appointed with the given sample assigned to it. Fig. 3 shows the flowchart of the algorithm. The clustering is applied to only one class at a time, so incidental similarity between patterns of different classes will not cause any problem during the training.
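A minimal sketch of this time-reduced assignment follows, assuming Euclidean distance and cluster centers updated as the mean of their assigned samples (the update rule is an assumption borrowed from c-means; the flowchart in Fig. 3 is authoritative).

import numpy as np

def reduce_by_clustering(samples, distance_factor):
    """Time-reduced clustering: assign each sample to the first cluster
    whose center is closer than distance_factor; otherwise open a new
    cluster. Returns the cluster centers (the reduced data set).
    """
    centers, members = [], []
    for x in samples:
        placed = False
        for i, c in enumerate(centers):
            if np.linalg.norm(x - c) < distance_factor:
                members[i].append(x)
                # Re-center on the mean of the assigned samples (assumption)
                centers[i] = np.mean(members[i], axis=0)
                placed = True
                break  # stop at the first "near enough" cluster
        if not placed:
            centers.append(np.array(x, dtype=float))
            members.append([x])
    return np.array(centers)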

The most important parameter of the new pre-clustering method is the clustering distance. Its value has a direct effect on the complexity of the training and on the accuracy of the classification. Choosing it too low results in too many clusters (approaching the complexity of the original training data set), while large clustering distances result in fewer clusters but less accurate classification.

IV. EXPERIMENTAL RESULTS

The efficiency of the new training method has already been tested on basic Artificial Neural Network [15] and Circular Fuzzy Neural Network [16] architectures with promising results. The main role of the clustering step was to reduce the training data with little or no loss regarding its ability to represent the original, unclustered data, thus creating a compressed data set. According to the efficiency analyses, in the case of CFNNs the introduction of the pre-clustering step reduced the training time on average by more than 32%, together with a reduction of the number of input samples by approximately 54%. (The difference between CFNNs and regular feed-forward ANNs is that the weights, biases, and outputs of CFNNs are fuzzy numbers (based on the Fuzzy Neural Networks proposed in [17]); further, their topology is realigned to a circular topology, and the connections between the input and hidden layers are trimmed.) In the case of ANNs, apart from the time reduction, another interesting phenomenon was observable: by reducing the complexity of the input data, networks could be trained for problems that originally could not be learned by networks with the same parameters due to the high complexity of the given problems.

In the experiments of this paper, the RBFNN implementation described in [18] has been used, which utilizes a single-step analytic learning method. A simplified description of the operation of the algorithm is the following. First, for every center (in each neuron), the algorithm calculates the distance from each input data point and then calculates the Gaussian values from these distances, thus obtaining the N-by-C matrix φ, where N is the number of data samples and C is the number of neurons/centers. Finally, the method produces the weight matrix by solving the linear system φW = Y for the matrix of the target data (Y), e.g., in the least-squares sense. (For more information, see [18].)
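A minimal sketch of such single-step analytic learning is given below, assuming Gaussian basis functions, a common width σ, and a least-squares solve; the actual implementation in [18] may differ in details.

import numpy as np

def train_rbf_analytic(X, Y, centers, sigma):
    """Single-step analytic training of the RBF output weights.

    X       : (N, n_dims) training inputs
    Y       : (N, n_outputs) target data (e.g., one-hot class labels)
    centers : (C, n_dims) RBF centers, e.g., taken from the clustering step
    sigma   : common width of the Gaussian basis functions (an assumption)
    """
    # Distances of every sample to every center -> Gaussian design matrix phi (N x C)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))
    # Solve phi @ W = Y in the least-squares sense for the weight matrix W (C x n_outputs)
    W, *_ = np.linalg.lstsq(phi, Y, rcond=None)
    return W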

In the following, three characteristic experiments are presented. In all three experiments, the RBF implementation described above is used. The training data set consists of 500 randomly chosen data points, while the testing data set consists of another 1000.


Figure 4. The domain of the problem in the first experiment.

In the experiments, various numbers of RBF centers (and thus neurons) are applied, according to the number of centers that the clustering step yields.

In the first experiment, the data points of 5 different classes are laid out according to Fig. 4. There is a big set of data in the middle of the domain. Another, crescent moon shaped data patch can be found close to class 1. Two further small classes are in two of the corners of the domain, while the rest of the domain is filled with the elements of the last, fifth class. In this simple case, there is no overlap among the classes.

Table I shows the accuracy of the trained RBF networks in case of different clustering distances. As can be seen, the accuracy of the model on the training and on the (separate) testing data sets correlates well.

Figure 5. The relationship between the complexity (number of neurons) and the accuracy on the testing set for the first experiment; the marked point is at 41 neurons, 94% accuracy.

It can also be followed how the number of neurons (resulting from the clustering step) changes with the clustering distance. The optimum clustering distance mostly depends on the layout of the original data points, so (at least in theory) it can be determined by analyzing the layout of the data points. By applying this optimum clustering distance, a good approximation of the optimum complexity of the RBFNN can be achieved. In this example, it is around 0.14, resulting in 41 clusters and more than 94% accuracy, which nears the best obtainable accuracy of the model (slightly below 95%).

For a better analysis, the relationship between the complexity and the accuracy of the network (measured on the testing data set) has been investigated. In Fig. 5, the obtained accuracy figures are depicted for RBFNNs built of different numbers of neurons. In these cases, the initial parameters of the functions are chosen randomly. Corresponding to the result of the data analysis, around 40 neurons the accuracy more or less reaches its peak. The marked point shows the point indicated by the clustering results (d=0.14), which seems to be a good choice concerning complexity and accuracy. (It can also be seen that this particular RBF implementation is not sensitive to over-learning, which usually occurs with too many neurons in other RBF architectures.)

The second example represents a much more difficult problem. The data points of 5 different classes are laid out according to Fig. 6. Two ring shaped, bigger classes can be found in the middle of the domain, having significant overlapping. (Overlapping can occur for many reasons, e.g., there can be other dimensions in the data that are either unknown or too expensive to deal with.)


Figure 6. The domain of the problem in the second experiment.

TABLE I. THE ACCURACY OF THE TRAINED RBF NETWORKS FOR THE FIRST EXPERIMENT (CLUSTERED INITIAL CENTERS)

Clustering distance | Number of RBF centers | Accuracy on the training set (%) | Accuracy on the testing set (%)
0.4   |  11 | 88.2 | 86.6
0.3   |  17 | 91.4 | 91.9
0.2   |  24 | 91.4 | 91.8
0.15  |  38 | 93.4 | 92.9
0.14  |  41 | 95.2 | 94.0
0.13  |  45 | 94.8 | 93.8
0.12  |  57 | 96.4 | 94.0
0.11  |  65 | 96.8 | 93.4
0.1   |  74 | 96.4 | 93.5
0.05  | 191 | 96.8 | 94.6
0.02  | 399 | 97.0 | 94.0


Figure 7. The relationship between the complexity (number of neurons) and the accuracy on the testing set for the second experiment; the marked point is at 61 neurons, 77.3% accuracy.

The interiors of the rings form two non-overlapping classes with smaller quantities of data. The rest of the domain is filled with the data points of the fifth class.

Table II shows the accuracy of the trained RBF networks, as well as the number of neurons resulting from the clustering step. The results are similar to those of the first experiment, though in this case the accuracy is not as high, because of the significant overlap between two of the classes.

The relationship between the complexity and the accuracy of the network (measured on the testing data set) has been investigated in this case as well (see Fig. 7). The accuracy starts to stabilize around 60 neurons. The marked point shows the point indicated by the clustering results (d=0.12), which can be viewed as quasi-optimum in terms of the complexity-accuracy trade-off of the network. Although, looking at the results, 40 neurons seems to be a good choice as well, Fig. 7 clearly shows that despite the acceptable accuracy in that case, the network is not yet stable from the accuracy point of view.


Figure 8. The domain of the problem in the third experiment.

In the third example, the data points of 5 different classes are laid out according to Fig. 8. Similarly to the second experiment, there is significant overlapping among the different classes. Table III shows the accuracy of the trained RBF networks, as well as the number of neurons resulting from the clustering step.

The relationship between the complexity and the accuracy of the network (measured on the testing data set) has been investigated in this case as well. Fig. 9 shows that the accuracy rises sharply at first and then slows down, finally stabilizing around 80 neurons. The marked point shows the point indicated by the clustering results (d=0.1), which can be viewed as quasi-optimum in terms of the complexity-accuracy trade-off of the network.

TABLE III. THE ACCURACY OF THE TRAINED RBF NETWORKS FOR THE THIRD EXPERIMENT (CLUSTERED INITIAL CENTERS)

Clustering distance | Number of RBF centers | Accuracy on the training set (%) | Accuracy on the testing set (%)
0.4   |  12 | 77.4 | 73.2
0.3   |  16 | 75.0 | 69.1
0.2   |  29 | 84.2 | 79.0
0.1   |  81 | 88.6 | 81.6
0.075 | 115 | 89.2 | 82.2
0.05  | 193 | 88.8 | 81.2

TABLE II. THE ACCURACY OF THE TRAINED RBF NETWORKS FOR THE SECOND EXPERIMENT (CLUSTERED INITIAL CENTERS)

Clustering distance | Number of RBF centers | Accuracy on the training set (%) | Accuracy on the testing set (%)
0.4   |  11 | 70.0 | 72.4
0.3   |  18 | 68.2 | 70.9
0.2   |  28 | 73.0 | 72.9
0.15  |  41 | 80.0 | 77.2
0.14  |  45 | 79.2 | 76.2
0.13  |  56 | 79.6 | 76.9
0.12  |  61 | 80.4 | 76.9
0.11  |  74 | 80.4 | 76.7
0.1   |  87 | 81.2 | 77.4
0.05  | 201 | 81.0 | 77.2


Figure 9. The relationship between the complexity (number of neurons) and the accuracy on the testing set for the third experiment; the marked point is at 81 neurons, 82.5% accuracy.

This example also illustrates that the obtainable accuracy can be slightly better when starting from the clustered initial centers than when the initial parameters are chosen randomly.

V. CONCLUSIONS

In this paper, a data clustering method is presented for the determination of the quasi-optimum structure (number of hidden neurons and their parameters) of Radial Basis Function Neural Networks. The technique is an extension of the algorithm previously used successfully to reduce the necessary training time of feed-forward Artificial Neural Networks and Circular Fuzzy Neural Networks. In the case of RBFNNs, the main role of the presented clustering is not to reduce the required training time, since fast training is one of the advantages of RBFNNs by default. Rather, from the number and centers of the resulting clusters, the quasi-optimum number of hidden layer neurons can be determined and their center parameters can be set, respectively.

ACKNOWLEDGMENT

This work was sponsored by the Hungarian National Scientific Fund (OTKA 79576 and OTKA 105846).

REFERENCES

[1] Hilbert, D., “Mathematical Problems,” Bulletin of the American Mathematical Society, vol. 8, no. 10 (1902), pp. 437-479. Earlier publications (in the original German) appeared in Göttinger Nachrichten, 1900, pp. 253-297, and Archiv der Mathematik und Physik, 3rd ser., vol. 1 (1901), pp. 44-63, 213-237.

[2] Kolmogorov, A.N., “On the representation of continuous functions of many variables by superpositions of continuous functions of one variable and addition,” Doklady Akademii Nauk SSSR, Vol. 114. pp. 953-956, 1957 (In Russian).

[3] Arnold, V.I., “On the functions of three variables,” Doklady Akademii Nauk SSSR, Vol. 114, No. 4, pp. 679-681, 1957 (In Russian).

[4] Arnold, V.I., “On the representation of continuous functions of three variables by the superpositions of continuous functions of two variables,” Matem. Sbornik, Vol 48, No. 1, pp. 3-74, 1959 and Vol. 56, No. 3, p. 392, 1962 (In Russian).

[5] Hornik, K., Stinchcombe, M., White, H., “Multilayer feedforward networks are universal approximators,” Neural Networks, Vol. 2, pp. 359-366, 1989.

[6] Klement, E.P., Kóczy, L.T., Moser, B., “Are fuzzy systems universal approximators?,” Int. Journal of General Systems, Vol. 28, No. 2-3, pp. 259-282, 1999.

[7] Park. J., Sandberg, I.W., “Universal approximation using Radial-Basis Function Networks,” Neural Computation, Vol. 3, pp. 246-257, 1991.

[8] Aloise, D., A. Deshpande, P. Hansen, P. Popat, ”NP-hardness of Euclidean sum-of-squares clustering,” Machine Learning, Vol. 75, pp. 245–249., 2009.

[9] Voigt, M., P. Lorenz, T. Kruschke, R. Osinski, U. Ulbrich, G. C. Leckebusch, “Statistical Downscaling of Gusts During Extreme European Winter Storms Using Radial-Basis-Function Networks,” EGU General Assembly, 22-27 April, 2012, Vienna, Austria, p. 11806.

[10] Dinh, B.H., D. V. Hoang, D. C. Huynh, “A novel on-line training solution using a Radial Basis Function Network to modify the inverse kinematic approximation of a robot-vision system,” 2011 IEEE Power Engineering and Automation Conference (PEAM), Vol. 3, pp. 262-265, 2011.

[11] Tortoe, C., J. Orchard, A. Beezer, and J. Tetteh, “Application of radial basis function network with a Gaussian function of artificial neural networks in osmo-dehydration of plant materials,” Journal of Artificial Intelligence, Vol. 4, No. 4, pp. 233-244, ISSN 1994-5450, 2011.

[12] Xiaoming Chen, Gang Lei, Guangyuan Yang, K.R. Shao, Youguang Guo, Jianguo Zhu, J.D. Lavers, “Using Improved Domain Decomposition Method and Radial Basis Functions to Determine Electromagnetic Fields Near Material Interfaces,” IEEE Transactions on Magnetics, Vol. 48, Issue 2, pp. 199-202, 2012.

[13] Oravec, M., M. Petráš, F. Pilka, “Video Traffic Prediction Using Neural Networks”, Acta Polytechnica Hungarica, Vol. 5, No. 4, pp. 59-78., 2008.

[14] Ban, J., M. Féder, M. Oravec, J. Pavlovičová, “Non-Conventional Approaches to Feature Extraction for Face Recognition”, Acta Polytechnica Hungarica, Vol. 8, No. 4, pp. 75-90., 2011.

[15] Tusor, B., A.R. Várkonyi-Kóczy, I.J. Rudas, G. Klie, G. Kocsis, “An Input Data Set Compression Method for Improving the Training Ability of Neural Networks”, In CD-ROM Proc. of the 2012 IEEE Int. Instrumentation and Measurement Technology Conference, I2MTC’2012, Graz, Austria, May 13-16, 2012, pp. 1775-1783.

[16] A.R. Várkonyi-Kóczy, B. Tusor, “Improving the Supervised Learning of Neural Network Based Classification”, In proc of IEEE International Symposium on Intelligent Signal Processing, WISP2011, Floriana, Malta, September 19-21, 2011.

[17] H. Ishibuchi, H. Tanaka, “Fuzzy neural networks with fuzzy weights and fuzzy biases,” in Proc. IEEE Neural Network Conf., vol. 3, San Francisco, USA, pp. 1650-1655, 1993.

[18] Travis Wiens, http://www.mathworks.com/matlabcentral/fileexchange/22173-radial-basis-function-network
