
A Semi-Evolutionary Approach for Modeling Pedotransfer Functions with Small Size of Data Using Artificial Neural Networks

S. M. TAHERI, K. MAHDAVIANI, J. MOHAMMADI

School of Mathematical Sciences, Isfahan University of Technology, Isfahan 84156, IRAN (corresponding author)

Electrical and Computer Engineering Faculty, Isfahan University of Technology, and Isfahan Mathematics House, Isfahan, IRAN

Soil Science Department, College of Agriculture, Shahrekord University, Shahrekord, IRAN

Abstract: - Pedotransfer functions are predictive functions of certain soil properties based on their easily or cheaply measured features. An approach based on artificial neural networks is taken to model such functions. Due to the small size of the relevant data set, a special procedure is used to design the architecture of a suitable network and avoid the problem of overfitting.

Key-Words: - Pedotransfer function, Soil characteristics, Artificial neural networks, Modeling

1 Introduction
Artificial neural networks (ANNs), inspired by the biological structure of the brain, represent a new class of computing systems. Very complex, and usually analytically unknown or indescribable, dependencies between input and output patterns can be successfully approximated thanks to the ability of ANNs to learn from examples [3]. Because of their capacity to generalize, ANNs can also provide a good approximation of the relation between input and output data on the basis of a subset of all possible data. Moreover, due to their versatility, ANNs have been widely used in many fields of science and technology, e.g. [4,11,13]. Pedomodels, meanwhile, have become a popular topic in environmental research, especially in soil science. They are predictive functions of certain soil properties based on other easily or cheaply measured properties [1]. Different types of functions, mainly based on statistical regression, have been developed to predict physical and chemical properties of soil [17]. In recent years, several attempts have been made to develop alternative methods for fitting pedotransfer functions. For example, one of the main approaches is to use fuzzy regression models [2,10,18]. Another suitable approach is to apply ANNs to investigate the relationships between soil variables and to predict certain properties [9,11].

2 Artificial Neural Networks as Function Approximators
In this study, ANNs are used as function approximators by which an unknown function (the pedotransfer function) is approximated. Suitable ANNs are designed and their parameters adjusted so as to produce the same response as the unknown function when the same input is applied to both systems. For this purpose, a multilayer feedforward neural network is considered. Hornik et al. [5] proved that multilayer feedforward networks are universal approximators. They also maintained that: "...standard multilayer feedforward networks with as few as one hidden layer using arbitrary


squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available. In this sense, multilayer feedforward networks are a class of universal approximators." Kreinovich [7] proved that neural networks remain universal approximators even when the requirement of a squashing function is replaced with the more realistic requirement of a smooth nonlinear activation function. It is established that if an ANN is too small it cannot learn the problem pattern (especially in nonlinear models), and if it is too large it can memorize the training set and fail to generalize to data outside it (the overfitting problem) [6]. When the size of the data set is small, the latter becomes a serious problem. Since the data set collected in our study is small, this issue is especially important in the present research, and we therefore use a special procedure to design the network architecture and obtain a suitable model.

3 The Problem of Small Size of Data
As mentioned above, when the data set is small, overfitting becomes an important problem, and different techniques have been proposed to avoid it. A widespread method is early stopping of training through cross validation [15]. However, according to Reed [16], when the data set is small this method may not be useful; in such cases, he states that techniques which decrease the number of weights and biases by pruning the trained network can give better results. MacKay [8] proposed the use of Bayesian analysis for model comparison to infer the number of network parameters, helping the network cope with a small data set. His approach is in fact a kind of pruning performed while training is still running [14].
In the present study, we use some new techniques to reduce the number of network parameters and to find the simplest reliable network. We also apply MacKay's approach to counter the problem of overfitting.
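For context, the early-stopping-through-cross-validation scheme mentioned above can be sketched as follows. This is a toy illustration with synthetic data and a deliberately overparameterized polynomial model, not the authors' setup; the split sizes, learning rate, and patience value are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a smooth function, deliberately few points.
x = rng.uniform(-1.0, 1.0, size=30)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=x.size)

# Overparameterized polynomial features invite overfitting.
def features(x, degree=12):
    return np.vander(x, degree + 1, increasing=True)

# Hold out part of the data for validation.
X_train, y_train = features(x[:22]), y[:22]
X_val, y_val = features(x[22:]), y[22:]

w = np.zeros(X_train.shape[1])
lr = 0.01
best_w, best_val, patience, bad = w.copy(), np.inf, 50, 0

for epoch in range(20000):
    # Full-batch gradient descent on the squared error.
    grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad
    val_sse = float(np.sum((X_val @ w - y_val) ** 2))
    if val_sse < best_val:
        best_val, best_w, bad = val_sse, w.copy(), 0
    else:
        bad += 1
        if bad >= patience:  # validation error stopped improving: stop early
            break

print(best_val)
```

With only a handful of validation points, as here, the validation error is itself a noisy estimate, which illustrates Reed's objection to relying on early stopping for very small data sets.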

4 Case Study
The study area is a part of the Silakhor plain in the province of Lorestan, between the cities of Broujerd and Durood, in Iran. A 100-hectare field located in the middle of the plain was sampled on a grid with intersections at 200 m intervals. A total of 25 core samples were obtained from depths of 0 to 25 cm. Different physical and chemical properties of the soil were measured using standard procedures [12]. The data set is given in Table 1 (for a fuzzy regression approach to analyzing such data, see [18]). The objective of the study is to fit a model between CEC (cation exchange capacity) as the dependent variable and the independent variables OM (organic matter content), SAND (percentage of sand content), SILT (percentage of silt content), and LIME (percentage of lime content). To reduce the number of parameters of the ANN, we first performed a statistical analysis to find the more effective variables and discarded the less effective ones, so that a simpler ANN with fewer parameters could be designed at the cost of at most a negligible loss of performance. The statistical analysis (stepwise regression) showed that SILT and LIME had no significant effect on CEC. Therefore, in applying the ANN method to fit a model to the data set, we considered OM and SAND as the independent variables. Given the small size of the data set (just 25 points), reducing the number of independent variables also reduces the number of parameters in the corresponding ANNs, which helps to overcome the overfitting problem as well. All calculations in this study were carried out in MATLAB.
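The variable-selection step can be illustrated with a minimal forward-selection sketch. The data below are synthetic stand-ins, not the Table 1 measurements, and a full stepwise regression would use F-tests or p-values rather than the crude relative-SSE stopping rule used here; the coefficients in the toy response are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25

# Synthetic stand-ins for the four candidate predictors.
OM   = rng.uniform(0.3, 3.6, n)
SAND = rng.uniform(10, 80, n)
SILT = rng.uniform(14, 53, n)
LIME = rng.uniform(4, 26, n)
# In this toy setup CEC depends only on OM and SAND, plus noise.
CEC = 20 + 2.0 * OM - 0.15 * SAND + 0.5 * rng.normal(size=n)

candidates = {"OM": OM, "SAND": SAND, "SILT": SILT, "LIME": LIME}

def sse(cols):
    """Residual sum of squares of an OLS fit with intercept."""
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, CEC, rcond=None)
    r = CEC - X @ beta
    return float(r @ r)

# Greedy forward selection: add the variable giving the largest SSE drop,
# stop when the relative improvement becomes negligible.
selected, cols = [], []
current = sse(cols)
while len(selected) < len(candidates):
    name, best = min(
        ((k, sse(cols + [v])) for k, v in candidates.items() if k not in selected),
        key=lambda t: t[1],
    )
    if current - best < 0.10 * current:  # crude stopping rule
        break
    selected.append(name)
    cols.append(candidates[name])
    current = best

print(selected)
```

On data generated this way, the selection keeps OM and SAND and discards the uninformative SILT and LIME, mirroring the outcome reported for the real data set.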

Proceedings of the 6th WSEAS Int. Conf. on Systems Theory & Scientific Computation, Elounda, Greece, August 21-23, 2006 (pp145-150)

Table 1 Measured Soil Characteristics at Sampling Locations

5 Network Architecture
Arriving at the best ANN architecture for a given purpose is still an open problem, although there are some helpful theorems (such as the two mentioned in Section 2) and some guiding patterns. In this article, we propose a semi-evolutionary procedure for finding the best architecture for the ANN. We first define the main class of architecture to be used: in the present case study, a three-layer feedforward ANN with n sigmoid neurons in the hidden layer and a single linear neuron in the output layer, producing the single scalar output of the network. According to Hornik et al. [5] and Kreinovich [7], every bounded continuous function can be approximated with arbitrarily small error by a feedforward ANN with one hidden layer. The CEC parameter of soil cannot be infinite, so we may consider it bounded, and by its nature it can also be regarded as continuous. The results indicate that, for the sigmoid transfer function of the hidden layer, the log-sigmoid converges better than the tan-sigmoid during training. To determine the remaining architectural parameters of the selected class, we produce sufficiently large populations of ANNs with different values of those parameters and use a fitness function to find the best ANN in each series. In the present case study, the only undetermined architectural parameter is the number of neurons in the hidden layer. Finally, we form a population of the best ANNs from each series and use a more exacting fitness function to find the best one among them. To determine the parameter n (the number of hidden neurons), we begin with the smallest possible value, n = 1, so as to satisfy Occam's razor [8] as far as possible, and produce a sufficiently large population (here, 20) of ANNs with n sigmoid neurons in the hidden layer. We then train this population on the training set and compute the SSE over the training set to judge how suitable the architecture is. Although the data set is small, in order to have a numerical test we reserve about 16% of the data points, so that the selected ANN can later be tested on the extended 25-point data set (of which 16% are then new points). In this way, our training set is reduced to 21 points. We then increase n by one and repeat the above procedure. Continuing in this way, we eventually reach a value of n for which the ANN architecture would have too many parameters; considering our small training set, and with regard to Occam's razor, we stop there to avoid creating an overly complex ANN.
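The search procedure just described can be sketched as follows. This is an illustrative re-implementation in Python/NumPy (the paper's own computations were done in MATLAB): for each candidate hidden-layer size n, a population of 20 randomly initialized 2-n-1 networks (log-sigmoid hidden units, linear output) is trained by plain gradient descent, and the best training SSE per series is recorded. The toy training data, epoch count, and learning rate are stand-ins, not the study's values.

```python
import numpy as np

rng = np.random.default_rng(2)

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy training set: 21 points with 2 inputs, standing in for (OM, SAND) -> CEC.
X = rng.uniform(-1, 1, size=(21, 2))
t = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * np.tanh(2 * X[:, 0])

def train(n_hidden, epochs=2000, lr=0.05):
    """Train one 2-n-1 net (log-sigmoid hidden, linear output); return its SSE."""
    W1 = rng.normal(0, 1, size=(n_hidden, 2))
    b1 = rng.normal(0, 1, size=n_hidden)
    w2 = rng.normal(0, 1, size=n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = logsig(X @ W1.T + b1)          # hidden activations, shape (21, n)
        y = h @ w2 + b2                    # linear output
        e = y - t                          # error, shape (21,)
        # Backpropagation of the squared-error loss.
        g2 = h.T @ e
        gb2 = e.sum()
        d = np.outer(e, w2) * h * (1 - h)  # delta at the hidden layer
        g1 = d.T @ X
        gb1 = d.sum(axis=0)
        w2 -= lr * g2 / len(t); b2 -= lr * gb2 / len(t)
        W1 -= lr * g1 / len(t); b1 -= lr * gb1 / len(t)
    y = logsig(X @ W1.T + b1) @ w2 + b2
    return float(np.sum((y - t) ** 2))

# Semi-evolutionary loop: a population of 20 nets per hidden-layer size n,
# keeping the best (lowest-SSE) individual of each series.
best_per_n = {}
for n in range(1, 6):
    best_per_n[n] = min(train(n) for _ in range(20))

print(best_per_n)
```

Because each individual starts from different random weights, the same architecture can settle into different minima, which is exactly the spread of SSE values the paper reports in Table 2.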
Here, for n = 6 we decided not to continue, since a three-layer feedforward ANN with a 2-6-1 architecture would already have as many parameters to fit (25 weights and biases) as there are data points. The ANNs of each architecture in this process appear to settle into two or three minima: one with an SSE of about 360, another with an SSE of about 42, and the best with an SSE of about 21 (Table 2). Note that the different results for the same architecture are caused by different initial weight values.

No. of Obs.   CEC    OM (%)   SAND (%)   SILT (%)   LIME (%)
 1   16.5   0.88   35.0   45.0   12.0
 2   18.6   1.13   37.0   42.0   13.0
 3   19.3   1.31   27.0   43.0   13.5
 4   20.3   1.98   29.0   41.0   13.5
 5   17.3   1.02   38.0   39.0    8.0
 6   20.4   1.29   32.0   39.0   11.0
 7   19.3   1.52   29.0   37.0   13.6
 8   21.9   1.33   18.0   45.0   15.6
 9   15.9   1.71   40.0   38.0   17.7
10   18.3   2.00   28.0   46.0   15.0
11   22.6   1.68   13.0   40.0   11.3
12   23.7   2.15   19.0   41.0   19.7
13   24.4   3.52   31.0   41.0   13.0
14   21.8   2.33   31.0   42.0   25.2
15   23.8   1.71   17.0   50.0   13.0
16   20.8   1.14   14.0   53.0   15.0
17   17.5   0.99   19.0   44.0   14.3
18   17.8   1.14   28.0   43.0   11.0
19   20.2   1.46   26.0   44.0    7.0
20   20.0   1.81   32.0   42.0   11.2
21   22.8   1.38   10.0   49.0   14.0
22   19.1   0.84   38.0   43.0   10.0
23   12.1   1.48   49.0   35.0   12.0
24   12.8   1.08   42.0   44.0   10.0
25    5.3   0.36   79.0   14.0    4.6

Having found an ANN with a 2-2-1 architecture and an SSE of 20.5762, and given the relatively good linear model for this pedotransfer function, we carry out the numerical test to decide which of the best trained ANNs, with different numbers of hidden-layer neurons, provides the best architecture. Statistical analysis shows a fairly good, nearly linear relation between the inputs SAND and OM and the output CEC, which indicates that a strongly nonlinear model is not needed. We select the best network (according to the SSE) from each series of populations and then simulate the selected networks on the extended data set containing the new points, which leads to SSEs between 39.5490 and 39.8310 (Table 3). We can therefore use the best 2-2-1 ANN as our final ANN model, a choice which is consistent with Occam's razor.

Table 3 The Amounts of SSE for Selected ANNs

Architecture   2-1-1     2-2-1     2-3-1     2-4-1     2-5-1
SSE            39.7371   39.5490   39.7934   39.8310   39.8310

6 Results
The ultimately selected ANN model has an MSE of 1.58 on the 25-point data set, which seems acceptable; for comparison, the MSE of the statistical regression model (fitted by least squares) was 1.54 on the same data set. Table 4 shows the ANN outputs for the extended 25-point data set. Note that data points 7, 15, 17, and 24 are not included in the training set, so the model receives them for the first time, yet its output is almost acceptable for them. Although data point No. 24 is something of an outlier, for the other three new points the network's response is acceptably close to the real value. Fig. 1 compares the ANN's computed outputs (dots) with the real CEC values (plus signs) in a three-dimensional space. Although the training set is very small, the model output is acceptable. Compared with a conventional linear regression model, this model provides a useful amount of nonlinearity, so that its output is, in general, closer in shape to the real values.

Table 4 The Real Values of CEC and Corresponding Estimated Values by the Designed ANN

Data No.   CEC    ANN Output
 1   16.5   18.171
 2   18.6   17.747
 3   19.3   19.430
 4   20.3   20.331
 5   17.3   17.495
 6   20.4   18.746
 7   19.3   19.405
 8   21.9   20.806
 9   15.9   16.853
10   18.3   20.642
11   22.6   23.120
12   23.7   23.407
13   24.4   24.066
14   21.8   20.826
15   23.8   22.317
16   20.8   21.000
17   17.5   19.900
18   17.8   19.162
19   20.2   19.778
20   20.0   19.250
21   22.8   22.716
22   19.1   17.516
23   12.1   12.261
24   12.8   16.117
25    5.3    5.579
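The reported error figures can be checked directly from Table 4: the sum of squared differences between the CEC column and the ANN output column reproduces the SSE of about 39.549 reported for the selected 2-2-1 network on the extended data set (Table 3), and dividing by 25 gives the MSE of about 1.58 quoted above.

```python
import numpy as np

# CEC and ANN output columns of Table 4.
cec = np.array([16.5, 18.6, 19.3, 20.3, 17.3, 20.4, 19.3, 21.9, 15.9, 18.3,
                22.6, 23.7, 24.4, 21.8, 23.8, 20.8, 17.5, 17.8, 20.2, 20.0,
                22.8, 19.1, 12.1, 12.8, 5.3])
ann = np.array([18.171, 17.747, 19.430, 20.331, 17.495, 18.746, 19.405,
                20.806, 16.853, 20.642, 23.120, 23.407, 24.066, 20.826,
                22.317, 21.000, 19.900, 19.162, 19.778, 19.250, 22.716,
                17.516, 12.261, 16.117, 5.579])

sse = float(np.sum((cec - ann) ** 2))  # close to the reported 39.549
mse = sse / len(cec)                   # close to the reported 1.58
print(sse, mse)
```

The small residual discrepancy comes from the ANN outputs being rounded to three decimals in the table.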

Fig. 1 Three-Dimensional Comparison between Model Outputs and Real Values


7 Conclusion
An artificial neural network approach has been considered for modeling the relation among some soil properties, i.e. for approximating a predictive function called a pedotransfer function. When the data set is small and the underlying model is more or less linear, the proposed method may be useful, since it allows some possible nonlinearity in the approximated model as an alternative to a completely linear one. A comparison between fuzzy regression methods (both the fuzzy least squares and the possibilistic approaches) and artificial neural network methods is recommended. A sensitivity analysis with respect to possible outliers is also suggested for further research. Preliminary observations confirm that it is possible to change the amount of nonlinearity by changing the number of neurons in the hidden layer; precise results on this point are left for future work. Of course, limitations such as those mentioned in [6] should also be taken into consideration.

Acknowledgement
This work was supported in part by the CEAMA, Isfahan University of Technology, Isfahan 84156, Iran.

References:
[1] J. Bouma, Using soil survey data for quantitative land evaluation, Adv. Soil Sci., 9, 1989, pp. 177-213.

[2] P.A. Burrough, Fuzzy mathematical methods for soil survey and land evaluation, J. Soil Sci., 40, 1989, pp. 477-492.

[3] M.T. Hagan, H.B. Demuth and M. H. Beale, Neural Networks Design, Boston: PWS Publishing, 1996.

[4] M.T. Hagan, H.B. Demuth, O. De Jesus, An introduction to the use of neural networks in control systems, Int. J. of Robust and Nonlinear Control, 12, 2002, pp. 959-985.

[5] K.M. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks, 2, 1989, pp. 359-366.

[6] G.B. Huang, H.A. Babri, Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions, IEEE Trans. on Neural Networks, 9, 1998, pp. 224-229.

[7] V.Y. Kreinovich, Arbitrary nonlinearity is sufficient to represent all functions by neural networks: A theorem, Neural Networks, 4, 1991, pp. 381-383.

[8] D.J.C. MacKay, Bayesian interpolation, Neural Computation, 4, 1992, pp. 415-447.

[9] B. Minasny, A.B. McBratney, The neuro-m method for fitting neural network parametric pedotransfer functions, Soil Sci. Soc. Am. J., 66, 2002, pp. 352-361.

[10] J. Mohammadi, S.M. Taheri, Pedomodels fitting with fuzzy least squares regression, Iranian J. Fuzzy Systems, 1, 2004, pp. 45-61.

[11] Y.A. Pachepsky, D. Timlin, G. Varallyay, Artificial neural networks to estimate soil water retention from easily measurable data, Soil Sci. Soc. Am. J., 60, 1996, pp. 727-733.

[12] A.L. Page, R.H. Miller, D.R. Keeney, Methods of Soil Analysis, Wisconsin: Soil Science Society of America, 1982.

[13] D.T. Pham, X. Liu, Neural Networks for Identification, Prediction, and Control, New York: Springer-Verlag, 1995.

[14] F.A.C. Pinto, J.F. Reid, Q. Zhang, N. Noguchi, Guidance parameter determination using artificial neural network classifier, ASAE Meeting Presentation, Paper No. 993004, Toronto, ON, Canada, 1999.

[15] L. Prechelt, Automatic early stopping using cross validation: qualifying the criteria, Neural Networks, 11, 1998, pp. 761-767.

[16] R. Reed, Pruning algorithms - a survey, IEEE Trans. Neural Networks, 4, 1993, pp. 740-747.

[17] E. Salchow, R. Lal, N.R. Fausey, A. Ward, Pedotransfer functions for variable alluvial soils in southern Ohio, Geoderma, 73, 1996, pp. 165-181.

[18] S.M. Taheri, J. Mohammadi, Application of fuzzy regression in soil science, Proc. of the 8th World Multi-Conf. on Systemics, Cybernetics and Informatics, Orlando, U.S.A., 2004, pp. 311-316.


Table 2 The Amounts of SSE and the Number of Epochs Needed for Each ANN to Converge During the Training Process

ANN   2-1-1 ANN        2-2-1 ANN        2-3-1 ANN        2-4-1 ANN        2-5-1 ANN
No.   Epochs SSE       Epochs SSE       Epochs SSE       Epochs SSE       Epochs SSE
 1     187  42.2319     288  42.5178      46 360.5930      43 360.6390     395  42.7728
 2      83 360.5280      54 360.5560      42 360.5930     380  42.7220     142  42.7728
 3     385  42.2319      81  20.5762     335  42.6463      45 360.6390     407  42.7728
 4     192  42.2319     296  42.5178      47 360.5930      31 360.6390     570  21.0105
 5      71 360.5280     287  42.5178      44 360.5930      37 360.6390     394  42.7728
 6      59 360.5280     293  42.5178     548  42.6463      41 360.6390     143  42.7728
 7     192  42.2319      60 360.5560      39 360.5930      42 360.6390     375  42.7728
 8     193  42.2319      57 360.5560      72  20.9369     344  42.7220     786  21.0105
 9      78 360.5280      61 360.5560      53 360.5930      37 360.6390     396  42.7728
10      66 360.5280     284  42.5178     588  42.6463    1696  20.9861     377  42.7728
11      76 360.5280     287  42.5178      47 360.5930      43 360.6390     255  42.7728
12     195  42.2319     282  42.5178     590  42.6463     380  42.7220     383  42.7728
13      76 360.5280    2192  20.5762      45 360.5930     605  42.7220      44 360.6930
14     188  42.2319      56 360.5560      51 360.5930     366  42.7220     538  21.0105
15     192  42.2319      52 360.5560      50 360.5930     370  42.7220     378  42.7728
16      80 360.5280      59 360.5560      49 360.5930     604  42.7220     136  42.7728
17     186  42.2319     510  42.5178      54 360.5930      39 360.6390    3542  21.0105
18      84 360.5280      64 360.5560     356  42.6463     588  42.7220     375  42.7728
19      76 360.5280     236  42.5178     350  42.6463      34 360.6390     295  42.7728
20      78 360.5280      53 360.5560    3766  20.9369     615  42.7220     367  42.7728
