


Science of the Total Environment 505 (2015) 680–693

Contents lists available at ScienceDirect

Science of the Total Environment

journal homepage: www.elsevier.com/locate/scitotenv

A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods

Antonio J. Torija a,⁎, Diego P. Ruiz b

a Department of Electronic Technology, University of Malaga, Higher Technical School of Telecommunications Engineering, Campus de Teatinos, Malaga 29071, Spain
b Department of Applied Physics, University of Granada, Avda. Fuentenueva s/n, 18071 Granada, Spain

HIGHLIGHTS

• Machine-learning regression methods are implemented for LAeq prediction.
• Non-linear solvers outperform a linear solver in estimating urban environmental noise.
• SMO and GPR algorithms achieve the best estimation of LAeq.
• The CFS technique allows the greatest reduction in data-collection cost.
• Input variables chosen by the WFS technique offer the best results in estimating LAeq.

⁎ Corresponding author. Tel.: +34 95 21 32844.
E-mail addresses: [email protected], [email protected] (A.J. Torija).

http://dx.doi.org/10.1016/j.scitotenv.2014.08.060
0048-9697/© 2014 Elsevier B.V. All rights reserved.

ABSTRACT

ARTICLE INFO

Article history:
Received 23 May 2014
Received in revised form 4 August 2014
Accepted 19 August 2014
Available online xxxx

Editor: P. Kassomenos

Keywords:
Feature selection
Multiple linear regression
Multilayer perceptron
Sequential minimal optimisation
Gaussian processes for regression
Environmental-noise prediction

The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which makes LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS) and (ii) wrapper for feature-subset selection (WFS), and the data-reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as the regression algorithm provides the best LAeq estimation (R² = 0.94 and mean absolute error (MAE) = 1.14–1.16 dB(A)).

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Exposure to environmental noise in urban areas is associated with adverse effects on human health and quality of life (Nega et al., 2012; Lucas de Souza and Giunta, 2011; Vlachokostas et al., 2012). Several studies have indicated that urban noise pollution can lead not only to psychological damage, such as annoyance, sleep disturbance, anxiety, etc., but also to physiological problems, such as cardiovascular risks


(Belojevic et al., 1997, 2008; Hofman et al., 1995; Kurra et al., 1999; Shaw, 1996).

Environmental-noise-pollution models are important tools in planning more environmentally friendly urban sound spaces and assessing the impact of environmental noise on the exposed population (Pamanikabud and Tansatcha, 2003; Zhao et al., 2012). In addition, at design stages, environmental-noise-pollution models are needed for planning and designing new neighbourhoods as well as streets in urban environments so that impacted areas have comfortable sound conditions (Steele, 2001; Gündogdu et al., 2005). Consequently, noise models are extensively used in the monitoring of environmental-noise



impact and in the management of practical solutions to existing noise problems (Givargis and Karimi, 2010).

Many scientific models have been developed focussing on road-traffic-noise prediction based on source emission and on empirical formulations for sound propagation (Garg and Maji, 2014). These models allow accurate road-traffic modelling. Although road traffic has been identified as the main environmental noise source in cities, it is widely known that one of the most salient characteristics of urban environments is their complexity (Tang and Wang, 2007; Torija et al., 2010, 2012), which is reflected in an accumulation, saturation, and diversity of sound sources, i.e. road traffic, industry, construction, commerce, and social as well as leisure activities. Therefore, the use of such emission–propagation empirical road-traffic-noise models can lead to a serious underestimation of sound levels in urban agglomerations, since the contribution of sound sources other than road traffic is not considered.

Also, built-up environments entail large spatial heterogeneity, with different types of locations (traffic street, square, urban park, etc.), varied geometry of locations, coexistence of diverse sound sources, as well as great temporal heterogeneity, depending on the time of day (day, evening or night) and the type of day (work day, weekend) (Torija et al., 2012). In environmental-noise modelling, this heterogeneity constitutes an essential aspect, since this situational diversity is a key factor to consider in order to develop a well-designed environmental model (Lucas de Souza and Giunta, 2011). For this, it is necessary to undertake a suitable selection of variables related to acoustic emission and propagation by which to characterize built-up environments appropriately (Torija et al., 2010).

These built-up environments, together with the large spatial, temporal, and spectral variability of environmental noise in urban spaces, make its modelling and prediction an extremely complex and non-linear problem (Torija et al., 2012). Because machine-learning techniques have a great ability to model non-linear relations, algorithms such as artificial neural networks (ANNs) (Givargis and Karimi, 2010; Lucas de Souza and Giunta, 2011; Torija and Ruiz, 2012; Torija et al., 2012; Verrelst et al., 2012; Yilmaz and Kaynar, 2011), support vector machines (SVM) (Chuang and Lee, 2011; Thissen et al., 2004; Vapnik, 1995; Verrelst et al., 2012; Xinjun, 2010; Torija et al., 2014), and Gaussian processes for regression (GPR) (Pasolli et al., 2010; Rasmussen and Williams, 2006; Verrelst et al., 2012; Wu et al., 2012) have been successfully employed in solving regression problems. Therefore, these machine-learning techniques for regression could be used to develop urban environmental-noise-pollution models.

The machine-learning-based models learn the relationship between a set of input variables and a dependent output, in this case energy-equivalent sound-pressure levels. Consequently, the performance of this approach depends heavily on an appropriate selection of input variables, but also on the collection of a representative sample of the diverse typology of scenarios where such models would be used. Thus, under conditions unknown to the models, or with new noise sources, a process for updating the database or the set of input variables is needed if reliable predictions are requested. On the other hand, the machine-learning models are flexible, adaptable and able to learn complex relationships, so that a consequent improvement of the provided database might enable their application in a great range of scenarios. Also, other advantages of machine-learning-based models for estimating urban environmental noise include: (i) easy implementation once trained and validated; (ii) the ability to model non-conventional road-traffic sources and noise sources other than road traffic; (iii) the possibility of implementation to deal with long-term but also short-term energy-equivalent sound levels. Thus, they can be used to identify impacts (or deviations from limit noise values) on a short-term scale, and hence offer faster action for noise management in noise action plans.

The objective of the present work is to develop a procedure for accurate environmental-noise-pollution modelling with a reduced computational and data-collection cost in urban environments. Firstly, with the inclusion of a previously selected set of input variables (Torija et al., 2010), and with the use of machine-learning algorithms, a series of environmental-noise-pollution models for built-up environments are developed and tested. To develop the machine-learning regression models, three approaches were used: multilayer perceptron neural networks (MLP), sequential minimal optimisation (SMO) for regression, and Gaussian processes for regression (GPR). The Pearson VII function-based universal kernel (PUK) has been implemented to build the SMO and GPR models. Furthermore, it was hypothesised that environmental-noise modelling in urban environments is a non-linear problem, so the performance of the machine-learning regression models (non-linear solvers) is compared with that of a classical linear solver (a multiple linear-regression model, MLR). Secondly, a feature selection was undertaken to reduce the complexity of the models and thereby decrease both the computational and data-collection costs, offering a more easily implemented model. Two different approaches were followed: correlation-based feature-subset selection and wrapper for feature-subset selection. The performance of these feature-selection methods is compared with that of a classical data-reduction technique (principal-components analysis). Thus, once the feasibility of machine-learning regression algorithms for developing urban environmental-noise-pollution models was confirmed, different schemes for selecting the set of input variables were proposed, depending on the needs regarding data-collection cost, computational cost and accuracy.

2. Methods

2.1. Methods used for urban environmental-noise-pollution modelling

As stated above, this paper proposes a procedure for developing environmental-noise-pollution models for built-up environments. For this task, the performance of machine-learning algorithms such as MLP, SMO, and GPR was assessed and compared with that of an MLR. It should be noted that the training and test conditions of the different models built are exactly the same for all the estimation techniques implemented.

Fig. 1 shows a diagram with information on the process used for model development.

2.1.1. Multiple linear regression

MLR is employed to predict the variance in a dependent interval, based on linear combinations of interval, dichotomous, or dummy independent variables. The general purpose of MLR is to learn more about the relationship involving several independent or predictor variables and a dependent or criterion variable (Yilmaz and Kaynar, 2011). MLR is usually based on the least-squares method, so that the model is fit in such a way that the sum of squares of differences between observed and predicted values is minimized. Therefore, the predicted value is estimated as follows:

$$y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_k x_{ik} \qquad (1)$$

where b_i are the estimates of the β_i parameters or standardized coefficients (the estimates resulting from the analysis made on the independent variables that have been standardized) and y_i is the predicted value (Agirre-Basurko et al., 2006).

In this study, an MLR is used to compare the performance of non-linear techniques against a classical linear-estimation method for estimating environmental noise in urban environments.
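As a concrete illustration of Eq. (1), the sketch below fits an MLR by ordinary least squares on synthetic data; the variable names, data, and coefficient values are ours, not the paper's noise-measurement dataset:

```python
import numpy as np

# Sketch of Eq. (1): y_i = b0 + b1*x_i1 + ... + bk*x_ik, fitted by
# ordinary least squares (minimising the sum of squared residuals).
# Synthetic stand-ins for the urban-noise input variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                       # 200 samples, k = 3 predictors
true_b = np.array([2.0, -1.0, 0.5])
y = 60.0 + X @ true_b + rng.normal(scale=0.1, size=200)  # LAeq-like target

A = np.hstack([np.ones((200, 1)), X])               # column of ones -> intercept b0
b, *_ = np.linalg.lstsq(A, y, rcond=None)           # least-squares estimates b_i

y_hat = A @ b                                       # predicted values of Eq. (1)
print(np.round(b, 2))
```

Because the model is linear in the coefficients, the fit has a closed-form solution, which is what makes MLR the fast linear baseline against which the non-linear solvers are compared.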

2.1.2. Multilayer perceptron neural networks

The most common approach to develop nonparametric and nonlinear regression is based on ANNs (Haykin, 1999). An ANN is a structure composed of artificial neurons (nodes) set in layers and connected


Fig. 1. Process followed for model development.


with each other. In this work, an MLP is the typology of ANN selected for developing the environmental-noise-pollution models. As can be seen in Fig. 2, in this ANN typology the weighted sum of the inputs and the bias term are passed to the activation level through a transfer function to produce the output, and the units are arranged in a layered feed-forward topology called a feed-forward neural network (Venkatesan and Anitha, 2006). Morphologically, an MLP is constituted by at least three layers, i.e. the input layer, the hidden layer(s), and the output layer (Haykin, 1999).

From a set of observations, an estimator $g_\lambda(x; w)$ of the unknown function $h(x)$ is built during the learning phase:

$$g_\lambda(x; w) = \gamma_2\left( \sum_{j=1}^{\lambda} w_j^{[2]}\, \gamma_1\left( \sum_{i=1}^{m} w_{ij}^{[1]} x_i + w_{m+1,j}^{[1]} \right) + w_{\lambda+1}^{[2]} \right) \qquad (2)$$

Fig. 2. Illustration of the structure of the multilayer perceptron used in this work.

where $w = (w_1, \dots, w_d)^T$ is the parametric vector to be estimated, which is equivalent to the weights of the connections between the network neurons; γ1 is a bounded and differentiable non-linear function, which is usually a sigmoid or radial basis function; γ2 is a function that can be linear or non-linear; and λ is the control parameter which indicates the number of neurons in the hidden layer (Haykin, 1999).

Thus, the MLP provides m outputs from n inputs through some non-linear functions. The output of the NN is determined by the activation of the units in the output layer:

$$x_O = f\left( \sum_h x_h w_{hO} \right) \qquad (3)$$




where f(·) is the activation function, x_h is the activation of the h-th hidden-layer node, and w_{hO} is the interconnection between the h-th hidden-layer node and the O-th output-layer node (Yilmaz and Kaynar, 2011). The activation function usually implemented in the hidden layer is a sigmoid function (Yilmaz and Kaynar, 2011):

$$x_O = \frac{1}{1 + \exp\left( -\sum_h x_h w_{hO} \right)} \qquad (4)$$

However, in the output layer, the most commonly used activation function is a linear function, although a monotone, bounded, and differentiable non-linear function could be implemented. These requirements are satisfied by a sigmoid function (Chandra, 2003).

The MLP is trained according to a supervised learning scheme. Supervised training is accomplished by the gradient-descent backpropagation algorithm (Rumelhart et al., 1986). During the training phase, the MLP starts with a random set of initial weights, and then the algorithm minimizes a function, usually quadratic, of the errors between the expected outputs (T) and the outputs calculated by the MLP (Y) in each of the training examples, modifying the system parameters (the connection weights of the MLP). Therefore, the error E between the values calculated by the MLP (Y) for each input (x) and the expected output (T) is calculated. Thus, during this process, for a set of data pairs (x, T), the error E between the MLP output and the target is minimized:

$$\min E = \frac{1}{2} \sum_{k=1}^{M} \sum_{j=1}^{N} \left( T_j - Y_j \right)^2 \qquad (5)$$

where M is the number of training examples and N is the number of neurons in the output layer.

Training an MLP demands the selection of a suitable structure (number of hidden layers and neurons per layer), proper initialization of the weights, the shape of the nonlinearity, and the selection of the training algorithm and regularization parameters to prevent overfitting (Verrelst et al., 2012). In this work, the gradient descent with momentum backpropagation algorithm was used to train the MLP. The values of the learning-rate (LR) and momentum (mu) parameters were selected by means of an iterative process, which sought the minimum ANN error. Moreover, the ANN weights were randomly initialized and the maximum number of neurons in the hidden layer was limited to half the number of ANN inputs.
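The training setup described above can be sketched with scikit-learn's MLPRegressor: sigmoid hidden units, stochastic gradient descent with momentum, and a hidden-layer size capped at half the number of inputs. The data are synthetic, and the LR/momentum values are illustrative rather than the paper's tuned values:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 8))           # 8 input variables
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2        # smooth non-linear target

mlp = MLPRegressor(
    hidden_layer_sizes=(4,),     # at most half the 8 inputs, as in the text
    activation="logistic",       # sigmoid hidden units (Eq. 4)
    solver="sgd",                # gradient-descent backpropagation
    momentum=0.9,                # 'mu' (illustrative value)
    learning_rate_init=0.01,     # 'LR' (illustrative value)
    max_iter=3000,
    random_state=1,
)
mlp.fit(X, y)                    # minimises the squared error of Eq. (5)
print(round(mlp.score(X, y), 3)) # R^2 on the training data
```

Note that scikit-learn's output layer is linear (identity), matching the usual choice mentioned above; the paper's iterative LR/mu search would wrap a fit like this in a loop over candidate values.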

2.1.3. Sequential minimal optimisation for regression

A successful technique for regression problems is the SVR method (Schölkopf and Smola, 2002; Smola and Schölkopf, 2004). Given a set of training input–output pairs, $\{x_i, y_i\}_{i=1}^{n}$, the SVR defines a higher-dimensional space $\Phi : x_i \to \Phi(x_i) \in R^H$, where H ≥ B. The SVR estimation model can be defined as:

$$y_i = w^T \Phi(x_i) + b \qquad (6)$$

Thus, the SVR estimates the weights w by minimizing the following function:

$$E(w) = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - f(x_i, w) \right|_\epsilon + \left\| w \right\|^2 \qquad (7)$$

The term $|\cdot|_\epsilon$ is an ϵ-insensitive error function, which is defined as:

$$|x|_\epsilon = \begin{cases} 0 & \text{if } |x| < \epsilon \\ |x - \epsilon| & \text{otherwise} \end{cases} \qquad (8)$$

The output of the SVR model is calculated as follows:

$$y_i = \sum_{j=1}^{n} \left( \alpha_j - \alpha_j^{*} \right) K(x_i, x_j) + b \qquad (9)$$

where $\alpha_j^{*}$ and $\alpha_j$ are Lagrange multipliers and $K(x_i, x_j)$ is the underlying kernel function. The objective function should be minimized with respect to α* and α, subject to the constraints:

$$\sum_{i=1}^{N} \left( \alpha_i - \alpha_i^{*} \right) = 0, \qquad 0 \le \alpha_i, \alpha_i^{*} \le C, \quad \forall i \qquad (10)$$

C is a user-defined constant that represents a balance between the model complexity and the approximation error (Flake and Lawrence, 2001).

The training of SVR requires the solution of a quadratic-programming (QP) problem, which involves high computational complexity and restricts its applicability (Vapnik, 1998). For that reason, in this work, the SMO algorithm was used for training the SVR. The SMO algorithm can quickly solve the SVR QP problem without using any extra matrix storage and without implementing numerical QP optimisation processes (Platt, 1999), which improves its applicability for use in environmental-modelling problems. Thus, SMO breaks this problem into a series of smaller sub-problems, which are then solved analytically, using Osuna's theorem to ensure convergence. For the standard SVR QP problem, the SMO algorithm repeatedly finds two Lagrange multipliers that can be optimised with respect to each other and analytically computes the optimal step for the two Lagrange multipliers (Flake and Lawrence, 2001). The SMO algorithm actually has two components: an analytic method for solving for the two Lagrange multipliers and a heuristic to choose which multipliers to optimise (Platt, 1999).

The SMO models were implemented by adapting the stop criterion of Shevade et al. (2000). For this implementation, the optimal values of (i) the epsilon parameter for round-off error, (ii) the epsilon parameter of the epsilon-insensitive loss function (ϵ), and (iii) the tolerance parameter used for checking the stop criterion were selected by means of an iterative process, which sought the minimum SMO error.
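For orientation, the ϵ-SVR of Eqs. (6)–(10) can be run with scikit-learn's SVR, whose libsvm backend uses an SMO-type solver. The RBF kernel stands in for the PUK kernel (which scikit-learn does not ship), and the data, C, ϵ and gamma values are illustrative:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X[:, 0]) + rng.normal(scale=0.05, size=200)

# C trades model complexity against approximation error (Eqs. 7/10);
# epsilon sets the width of the insensitive tube of Eq. (8).
svr = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=1.0)
svr.fit(X, y)

# dual_coef_ holds (alpha_j - alpha_j*) for the support vectors (Eq. 9),
# bounded by C as required by the constraints of Eq. (10).
print(svr.dual_coef_.shape, round(svr.score(X, y), 3))
```

The number of sub-problems SMO solves internally is hidden behind `fit`; what surfaces is the support-vector expansion of Eq. (9).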

On the other hand, SMO models are calculated as a weighted sum of kernel-function outputs, so that these kernel functions can be an inner product, Gaussian basis function, polynomial, or any other function that obeys Mercer's condition (Flake and Lawrence, 2001). In the present study, for implementing the SMO model for regression and, as can be seen below, for implementing the GPR model, the PUK kernel (Üstün et al., 2006) was used. The PUK kernel has been established as a very robust and less time-consuming kernel, which in combination with SVR improves the generalization performance of a given method, and which can properly deal with a large variety of regression problems (Üstün et al., 2006). These are the reasons why this kind of kernel was selected. Moreover, this kernel function fulfils Mercer's condition, since its corresponding kernel matrix is symmetric and positive semi-definite. This latter point can be seen in the following expression of the PUK kernel (Üstün et al., 2006):

$$K(x_i, x_j) = \frac{1}{\left[ 1 + \left( \frac{2 \sqrt{\left\| x_i - x_j \right\|^2}\, \sqrt{2^{(1/\omega)} - 1}}{\sigma} \right)^{2} \right]^{\omega}} \qquad (11)$$

where the parameters σ and ω define the Pearson VII peak shape. The parameter σ controls the half-width and the parameter ω the tailing factor of the peak. Based on the results provided by Üstün et al. (2006), the optimal values of the parameters σ and ω were selected for the implementation of the PUK kernel in the SMO and GPR models.
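Eq. (11) is straightforward to implement directly. The sketch below is our own reading of the Üstün et al. (2006) kernel, with illustrative σ and ω defaults (the paper tunes these iteratively):

```python
import numpy as np

def puk_kernel(xi, xj, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel of Eq. (11).

    sigma controls the half-width and omega the tailing factor of the
    Pearson VII peak; the defaults are illustrative, not tuned values.
    """
    dist = np.sqrt(np.sum((np.asarray(xi) - np.asarray(xj)) ** 2))
    term = 2.0 * dist * np.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + term ** 2) ** omega

# The kernel equals 1 for identical points and decays towards 0 with
# distance, like a Gaussian but with heavier tails for small omega.
x = np.array([0.2, 0.4])
print(puk_kernel(x, x))            # 1.0
print(puk_kernel(x, x + 10.0))
```

Increasing σ widens the peak (points further apart still count as similar), which is the knob the iterative search described above is turning.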

2.1.4. Gaussian process regression

Recently, GPR was introduced by Rasmussen and Williams (2006). This powerful regression tool provides a probabilistic approach for learning generic regression problems with kernels (Verrelst et al., 2012). With this technique, the learning of a regressor machine is formulated in terms of a Bayesian estimation problem, and thus the



parameters of the machine are assumed to be random variables which are a priori jointly drawn from a Gaussian distribution (Pasolli et al., 2010).

Several reasons to use this technique can be found in the literature (Pasolli et al., 2010; Rasmussen and Williams, 2006; Verrelst et al., 2012), notably that: (i) it leads to an analytical solution of the regression problem (Pasolli et al., 2010); (ii) the use of a completely automatic Bayesian model-selection strategy saves the time needed by the common empirical model-selection procedure and allows for a further increase in the estimation accuracy (Pasolli et al., 2010); (iii) both a predictive mean and a predictive variance are possible (Verrelst et al., 2012); (iv) since all hyperparameters can be learned efficiently by maximizing the marginal likelihood in the training set, very sophisticated kernel functions can be used in the GPR model (Verrelst et al., 2012). Regarding this latter aspect, as mentioned above, a PUK kernel has been implemented.

Let us consider $x_i \in \Re^m$ as a vector of m features (i.e. input variables) and $y_i \in \Re$ as the associated target value (i.e. LAeq), with i = 1, 2, …, M. Also, all the $x_i$s are aggregated into an m × M feature matrix X, and all the $y_i$s are aggregated into an M × 1 target vector y, so that the set of samples is $L = \{X, y\}$. Under the Gaussian process (GP) learning scheme, the observed values y of the function to model are assumed to be the sum of a function f and a noise component ε, where

$$f \sim GP\left( 0, K(X, X) \right) \qquad (12)$$

$$\varepsilon \sim N\left( 0, \sigma_n^2 I \right) \qquad (13)$$

where $K(X, X)$ is the covariance matrix, $\sigma_n^2$ is the noise variance, and I represents the identity matrix.

According to the Pasolli et al. (2010) formulation, given the set of samples L, the best estimation of the output value $f_*$ associated with an unknown sample $x_*$ is represented by the expectation of the desired output conditioned to L and $x_*$, as follows:

$$\hat{f}_*\left( X, y, x_* \right) = E\left\{ f_* \mid X, y, x_* \right\} = \int f_*\, p\left( f_* \mid X, y, x_* \right) df_* \qquad (14)$$

The predictive distribution $p\left( f_* \mid X, y, x_* \right)$ can be derived using the following expression:

$$p\left( f_* \mid X, y, x_* \right) \sim N\left( \mu_*, \sigma_*^2 \right) \qquad (15)$$

where

$$\mu_* = k_*^T \left[ K(X, X) + \sigma_n^2 I \right]^{-1} y \qquad (16)$$

$$\sigma_*^2 = k\left( x_*, x_* \right) - k_*^T \left[ K(X, X) + \sigma_n^2 I \right]^{-1} k_* \qquad (17)$$

The vector $k_*$ represents the covariance values between the training samples X and the sample $x_*$, the prediction of which is sought. Moreover, the mean $\mu_*$ is the best output value (LAeq) and $\sigma_*^2$ expresses a confidence measure associated by the model to the output (Pasolli et al., 2010).

In the implementation and configuration of the GPR models, the level of Gaussian noise, which is added to the diagonal of the covariance matrix, was selected by using an iterative process that sought the minimum GPR error.
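Eqs. (16)–(17) translate directly into a few lines of linear algebra. The sketch below uses a simple RBF covariance instead of the paper's PUK kernel (any positive semi-definite kernel fits the same formulas), with illustrative data and noise level:

```python
import numpy as np

def rbf(a, b, length=1.0):
    # Simple RBF covariance; stands in for the paper's PUK kernel.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length ** 2)

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(30, 1))                  # training inputs
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=30)
sigma_n2 = 0.1 ** 2                                   # Gaussian noise variance

x_star = np.array([[0.5]])                            # query point
k_star = rbf(X, x_star)[:, 0]                         # covariances with x*
A_inv = np.linalg.inv(rbf(X, X) + sigma_n2 * np.eye(30))

mu_star = k_star @ A_inv @ y                                    # Eq. (16)
var_star = rbf(x_star, x_star)[0, 0] - k_star @ A_inv @ k_star  # Eq. (17)
print(round(float(mu_star), 2), round(float(var_star), 4))
```

The predictive variance of Eq. (17) is what distinguishes GPR from the other regressors used here: each LAeq estimate comes with its own confidence measure.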

2.2. Methods used for feature selection

As mentioned above, in the process of building an urban environmental-noise-pollution model for estimating the LAeq descriptor, the second step after analysing the performance of the MLP-, SMO-, GPR- and MLR-based models is to approach a feature-selection process in order to reduce the complexity of the models and diminish the computational and data-collection costs as well. In this section, the three feature-selection methods used are introduced.

2.2.1. Correlation-based feature-subset selection

The correlation-based feature-subset selection (CFS) method was introduced by Hall and Smith (1997). According to this method, the merit of a given set of features is evaluated by a metric, which is then used to guide a search for the best possible subset of variables. To approach this process, a merit function based on the correlation between each feature and the output (relevancy) and on the correlation among the features in the subset (redundancy) is implemented as follows (Calvo et al., 2009):

$$G_S = \frac{k\, \overline{r_{ci}}}{\sqrt{k + k(k-1)\, \overline{r_{ii'}}}} \qquad (18)$$

where k is the number of variables in the subset S, $\overline{r_{ci}}$ is the average correlation between the features in S and the class, and $\overline{r_{ii'}}$ is the average correlation among the features in S (Calvo et al., 2009).
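The merit of Eq. (18) is easy to compute for a candidate subset. The sketch below uses absolute Pearson correlations and synthetic data to show the relevancy/redundancy trade-off; the function name and data are ours:

```python
import numpy as np

def cfs_merit(X, y):
    """CFS merit G_S of Eq. (18) for the features in the columns of X."""
    k = X.shape[1]
    r_ci = np.mean([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(k)])
    if k == 1:
        r_ii = 0.0
    else:
        r_ii = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                        for i in range(k) for j in range(i + 1, k)])
    return k * r_ci / np.sqrt(k + k * (k - 1) * r_ii)

rng = np.random.default_rng(4)
y = rng.normal(size=500)
relevant = y[:, None] + 0.5 * rng.normal(size=(500, 2))   # correlated with y
with_noise = np.hstack([relevant, rng.normal(size=(500, 1))])

# An irrelevant extra variable dilutes the average feature-class
# correlation and lowers the merit of the subset.
print(cfs_merit(relevant, y) > cfs_merit(with_noise, y))
```

The search over subsets (a GA in this paper) then simply maximizes this merit.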

In combination with this feature evaluator, a genetic algorithm (GA) is used as the selection method. This GA is capable of effectively exploring large search spaces, which is usually required in attribute selection (Tiwari and Singh, 2010). In the present work, the total number of competing candidate subsets to be generated was 2^32. The implemented GA performed a global search in that space. This genetic-search-based method was selected because it has been demonstrated to be a very robust and efficient selection method, with substantial improvement over a variety of random and local search methods (De Jong, 1988; Vafaie and Imam, 1994). This genetic search for feature selection, using CFS as the feature evaluator, was implemented in Weka software (version 3.6), which uses the simple genetic algorithm described in Goldberg (1989). A GA is ruled by three operations: (i) selection, where a good string is chosen to breed a new generation; (ii) crossover, which combines good strings to generate better offspring; and (iii) mutation, which alters a string locally to ensure genetic diversity among generations (Tiwari and Singh, 2010). For implementing this search method, the GA was configured using as reference the parameters recommended by Hall et al. (2009) in Weka software: (i) population size equal to 20 individuals, (ii) probability of crossover equal to 60%, and (iii) probability of mutation equal to 3.3%. A maximum number of generations to evaluate equal to 100 was set as a termination criterion.

2.2.2. Wrapper for feature-subset selection

In supervised machine-learning problems, an induction algorithm is presented with a set of training records. According to the wrapper approach, the feature subset is selected using the induction algorithm as a black box. It should be noted that the induction algorithm refers to the regression algorithms presented in Section 2.1. Under this feature-selection scheme, four subsets of input variables were selected, i.e. one for each of the four regression algorithms (or induction algorithms). Thus, the four regression algorithms, each with the configuration that yielded the lowest estimation error, were used as induction algorithms for the WFS method. The feature-selection algorithm searches for an optimal subset by using the induction algorithm itself as part of the evaluation function, so that the feature-subset-selection algorithm exists as a wrapper around the induction algorithm (Kohavi and John, 1997). WFS can therefore be considered a method that evaluates attribute sets by using a learning scheme. The WFS method proceeds as follows: the induction algorithm is run on the set of data, which is


A.J. Torija, D.P. Ruiz / Science of the Total Environment 505 (2015) 680–693

usually split into an internal training set and a holdout set, with a different set of features removed from the data. The feature subset with the highest evaluation is chosen as the final set on which to run the induction algorithm (Kohavi and John, 1997). A cross-validation process was used to estimate the accuracy of the learning scheme for a set of attributes.

Finally, the same GA as the one used for CFS feature selection was implemented as the search method.
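The wrapper evaluation can be sketched as follows, under stated assumptions: scikit-learn is used, a linear model stands in for the paper's four induction algorithms, and `wrapper_score` is a hypothetical helper that a search method (such as the genetic search described in the text) would call as its evaluation function.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def wrapper_score(mask, X, y, estimator=None, cv=5):
    """Sketch of wrapper subset evaluation (Kohavi and John, 1997).

    The induction algorithm itself scores the candidate subset via
    cross-validation; higher is better (mean R^2 over the folds).
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return -np.inf  # an empty subset is an invalid candidate
    est = estimator if estimator is not None else LinearRegression()
    return cross_val_score(est, X[:, mask], y, cv=cv).mean()
```

A subset containing the informative feature scores higher than one containing only noise, which is exactly the signal the wrapper search exploits.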

2.3. Method for data reduction: principal-components analysis

With principal-components analysis (PCA), a statistical technique used to extract information from a multivariate dataset, a lower-dimensional version of the original dataset is established, so that PCA can be considered a data-reduction technique.

The process by which PCA identifies the principal components of the original variables, through linear combinations, is as follows: (i) the first principal component represents the maximum variability in the original dataset; (ii) the second principal component represents the maximum variability in the remaining information; (iii) the process continues consecutively in this way, with each successive principal component representing the maximum variability in the remaining information (Uguz, 2011). A detailed explanation of the PCA method can be found in Jolliffe (1986).

In the present study, PCA is used in conjunction with a Ranker search. The dimensionality reduction is approached by selecting enough eigenvectors to account for a given percentage of the variance in the original data; that is, principal components are chosen when their cumulative percentage of variance exceeds an established threshold value (95% of the variance in this work) (Uguz, 2011). This criterion was applied because of its simplicity and its adaptable performance (Valle et al., 1999).

Table 1
Set of input variables previously selected to be included in an LAeq prediction model.

Key | Input variable | Type of variable | Values | Range | Units
TD | Type of day | Discrete | Working day (1), Saturday (2), Sunday (3) | – | –
DP | Day period | Discrete | Day (1), evening (2), night (3) | – | –
CLE | Commercial/leisure environment | Binary | No (0)/Yes (1) | – | –
CW | Appearance of construction works | Binary | No (0)/Yes (1) | – | –
TL | Type of location | Discrete | Road traffic street (1), pedestrian walk (2), square (3), park (4) | – | –
WF | Presence of water fountains | Binary | No (0)/Yes (1) | – | –
V | Presence of vegetation | Binary | No (0)/Yes (1) | – | –
ST | Stabilization time of the sound pressure level | Continuous | – | 0.4–58.8 | min
TFD | Type of traffic flow dynamics | Discrete | No flow (0), intermittent (1), constant fluid (2), decelerated in pulses (3), constant pulsed (4), accelerated in pulses (5), congestion (6) | – | –
ALVF | Ascendant light vehicles flow | Continuous | – | 0–162 | veh/5-min
DLVF | Descendant light vehicles flow | Continuous | – | 0–156 | veh/5-min
AHVF | Ascendant heavy vehicles flow | Continuous | – | 0–4 | veh/5-min
DHVF | Descendant heavy vehicles flow | Continuous | – | 0–4 | veh/5-min
ABF | Ascendant buses flow | Continuous | – | 0–13 | veh/5-min
DBF | Descendant buses flow | Continuous | – | 0–9 | veh/5-min
AMMF | Ascendant motorcycles/mopeds flow | Continuous | – | 0–51 | veh/5-min
DMMF | Descendant motorcycles/mopeds flow | Continuous | – | 0–56 | veh/5-min
UCV | Appearance of urban cleaning vehicles | Binary | No (0)/Yes (1) | – | –
VS | Appearance of vehicles with siren | Binary | No (0)/Yes (1) | – | –
NSET | Appearance of noticed sound events related to traffic | Binary | No (0)/Yes (1) | – | –
NSEnT | Appearance of noticed sound events not related to traffic | Binary | No (0)/Yes (1) | – | –
AS | Average speed | Continuous | – | 7.5–65 | km/h
GR | Gradient of the road | Discrete | Non-relevant gradient (0), ascendant gradient > 2% (1) | – | –
UL | Number of upward lanes | Discrete | Number of lanes | 1–3 | –
DL | Number of downward lanes | Discrete | Number of lanes | 1–4 | –
TP | Type of pavement | Discrete | No type (0), porous asphalt (1), smooth asphalt (2), paved (3) | – | –
CS | Condition of surface | Discrete | No type (0), good (1), neither good nor bad (2), bad (3) | – | –
SG | Street geometry | Discrete | "U" type (1), "L" type (2), free field type (3) | – | –
SW | Street width | Continuous | – | 3.5–100 | m
BH | Buildings height | Continuous | – | 0–34 | m
RW | Roadway width | Continuous | – | 2.65–32.9 | m
SRD | Source–receptor distance | Continuous | – | 1.75–50 | m

It should be noted that, according to the nomenclature used in this paper, (i) in MLR each input variable is represented by xi; (ii) x represents each vector of input variables in MLP; and (iii) xi denotes each vector of input variables in SMO and GPR.

2.4. Evaluation of model performance

For a suitable assessment of model performance, it is advisable to examine both overall model performance and overall model error descriptors. Several suggestions have been made for more meaningful statistical performance indicators (Ibarra-Berastegi et al., 2008). In the present study, two statistical indicators were selected and used. As an indicator intended to describe the overall model performance, Pearson's squared correlation coefficient (R2) was chosen. This indicator represents the proportion of the observed variance explained by the model. To characterize the overall model error, the mean absolute error (MAE) was selected. The MAE is an average of the absolute errors ei = |fi − yi|, where fi is the predicted value and yi is the observed value. Moreover, the training time for the implemented models was also evaluated. Finally, the performance of the different methods used for estimation and variable selection was compared using Student's t-test.
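The two indicators just described can be computed directly; the sketch below is illustrative and the helper names are hypothetical.

```python
import numpy as np

def r_squared(y_obs, y_pred):
    """Pearson's squared correlation coefficient between observed and
    predicted values (overall model performance indicator)."""
    r = np.corrcoef(y_obs, y_pred)[0, 1]
    return r ** 2

def mae(y_obs, y_pred):
    """Mean absolute error, MAE = mean(|f_i - y_i|), here in dB(A)."""
    return float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_obs))))
```

Note that R2 as a squared correlation rewards any linear association, so a model with a constant offset can still score R2 = 1 while showing a non-zero MAE; reporting both indicators, as the paper does, guards against that.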

On the other hand, the machine-learning algorithms learnt the relationship between the input variables (see Table 1) and the output (the LAeq descriptor) by fitting a flexible model directly from the data. For the development of the different models presented in this work, a training–validation process was executed to minimize the estimation error. Thus, based on the knowledge acquired from the relevant literature, the values of the different parameters needed to configure the implemented algorithms were iteratively changed until the best values of the R2 and MAE descriptors in the validation subset were found. This process for evaluating the models' performance was undertaken by using a standard 10-fold cross-validation scheme. In this 10-fold cross-validation, 10 training and validation subsets were built; in each subset, 90% of the samples were used for training and 10% for validation. The value of each statistical descriptor mentioned above was calculated as the arithmetic mean over the 10 validation subsets. Also, it should be noted that the same training and validation periods were used for each method. On the other hand, one of the main issues in the development of machine-learning-based models is overfitting. Overfitting occurs when a model achieves an outstanding performance on the training data but is unable to generalize, i.e. to produce accurate predictions on unseen data. The cross-validation method has been found to be an outstanding technique for avoiding overfitting (Verrelst et al., 2012), and thus for achieving good generalization capability. With regard to the SMO model, overfitting is avoided because, as the value of ε increases, the number of support vectors decreases (Verrelst et al., 2012).

Fig. 3. MAE (a) and R2 (b) values for the models developed with the whole set of input variables.
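The 10-fold cross-validation scheme described above can be sketched as follows. This is a scikit-learn-based illustration under assumptions: `LinearRegression` stands in for the paper's four algorithms, and `cross_validate_mae` is a hypothetical helper name.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def cross_validate_mae(X, y, make_model=LinearRegression,
                       n_splits=10, seed=0):
    """10-fold CV: in each fold 90% of the samples train the model and
    10% validate it; the descriptor is averaged over the 10 folds."""
    maes = []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in kf.split(X):
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        maes.append(mean_absolute_error(y[val_idx],
                                        model.predict(X[val_idx])))
    return float(np.mean(maes))
```

Because every candidate configuration is judged on held-out folds only, this loop is also the mechanism by which the paper's training–validation process guards against overfitting.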


2.5. Data sampling

To develop and evaluate the performance of the different implemented models, a database of 533 records was used. Each record was composed of the values of the 32 input variables (Torija et al., 2010) and the corresponding value of the LAeq descriptor. It should be noted that some input variables, such as "type of day", "type of location", "presence of water fountains", "presence of vegetation" and "appearance of urban cleaning vehicles", were added to the original set of input variables found in Torija et al. (2010). Moreover, in this database, the heavy-vehicle flow was subdivided into "heavy vehicle flow" and "bus flow". For this database, sample measurements of environmental noise were taken throughout the city of Granada (southern Spain). The 80 measurement locations were selected to be as generically representative as possible of the wide range of urban scenarios and spatial configurations. It should be mentioned that the original measurement campaign was conducted during February and March of 2007, and this database was then updated with new measurements in November 2007, May–June 2008 and January 2009. These measurement campaigns were thus aimed at achieving a database which included both short-term (day period and type of day) and long-term (different seasons) temporal fluctuations. To ensure the representativeness of these short-term measurements, the data were compared with long-term measurements (2006–2008) from the strategic noise map of the city of Granada, to make sure that adequate trends and urban configurations were selected. Appendix A provides a Google map of the 80 measurement locations as well as their description (value of the 32 input variables considered and the LAeq descriptor).

Measurements were made for between 15 and 70 min, depending on the stabilization time of the sound-pressure level at the location (Torija et al., 2011). Each measurement was fragmented into 5-min sub-fragments. It should be noted that this fragmentation was carried out because this research was focused on predicting short-term urban environmental noise. Thus, the overall number of available records was 533.

These measurements were made with a type-1 sound-level meter (2260 Observer model with sound basic analysis programme BZ7219), using a tripod and wind shield. Before each measurement, the sound-level meter was calibrated using a 4231 Brüel & Kjaer calibrator. Measurements were taken following international reference procedures, with all microphones mounted away from reflecting facades, at a height of 4 m above the local ground level (EU, 2002).

Moreover, each of the measurement locations chosen was characterized by collecting data for all the input variables used (Table 1). It should be mentioned that, before their introduction to the models, the qualitative nominal variables were converted into discrete variables. For more details on the input-variable descriptions, see Torija et al. (2010). Each recorded 5-min sub-fragment of sound-pressure level has its corresponding values for the 32 input variables considered.
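As an illustration of this nominal-to-discrete conversion, the sketch below uses the integer codes listed in Table 1 (e.g. type of day, TD); the record field names and the `encode` helper are hypothetical, only the code mappings come from the table.

```python
# Integer codes taken from Table 1.
TD_CODES = {"working day": 1, "saturday": 2, "sunday": 3}
DP_CODES = {"day": 1, "evening": 2, "night": 3}

def encode(record):
    """Convert one raw record's qualitative fields to discrete codes."""
    return {
        "TD": TD_CODES[record["type_of_day"].lower()],
        "DP": DP_CODES[record["day_period"].lower()],
        # Binary variables map directly to 0/1 (No/Yes).
        "CW": 1 if record["construction_works"] else 0,
    }
```

A record observed on a Sunday night near construction works would thus be encoded as TD = 3, DP = 3, CW = 1.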

3. Results

3.1. Performance of machine-learning regression models in predicting LAeq

Fig. 3 shows the values of MAE and R2 compiled with the use of the models built. The SMO- and GPR-based models give quite similar values for the MAE and R2 indicators, and these models achieve the highest value of explained variance and the lowest estimation error. Based on the results of the two statistical descriptors, the MLP-based model renders a worse LAeq estimation than the other two machine-learning methods. In any case, these three machine-learning algorithms outperformed the MLR model.

To check the ability of the different models developed in estimating LAeq, Fig. 4 presents the observed and predicted values of the model output for one of the test subsets formed by the cross-validation technique.

The results in Fig. 4 identify both SMO and GPR as the models with the greatest performance in estimating the LAeq descriptor. In addition, these models achieve better results in predicting extreme values of the LAeq descriptor. Moreover, from Fig. 4 (c and d) it can be deduced that the SMO and GPR models achieve accurate environmental-noise estimation regardless of both the range of values of the LAeq descriptor and the urban location.

On the other hand, as indicated in Section 2.4, Student's t-test was used to compare the different methods and thus to analyse the possibility of finding statistically significant differences in model performance (Table 2). For this test, a one-to-one comparison was made among the different methods implemented. For example, the method of column 2 (MLP) is compared against the method of row 1 (MLR), then the method of column 3 (SMO) is compared against the method of row 1, and so on. After all the methods were compared against the method of row 1, a comparison was made against the method of row 2, and so on until all methods were compared against all others. Thus, according to the results displayed in Table 2, it can be stated that (with a significance level of 0.05) all the machine-learning models provided statistically significantly better results than the MLR model. Moreover, as shown in Table 2, the SMO and GPR models were found to be statistically significantly better than MLP. However, no statistically significant differences were found between the SMO and GPR results, so the performance of both can be considered equal.
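The pairwise comparison logic can be sketched as follows. The per-fold MAE values below are invented for illustration, and a standard paired t-test from SciPy is used; tools such as Weka additionally apply a correction for resampled data, so this is only an approximation of the reported procedure.

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold MAE values for two models over the same 10 folds.
mae_model_a = np.array([1.9, 1.8, 2.0, 1.9, 1.7, 1.8, 1.9, 2.0, 1.8, 1.9])
mae_model_b = np.array([1.2, 1.1, 1.3, 1.1, 1.2, 1.0, 1.2, 1.3, 1.1, 1.2])

# Paired test, since the same folds are used for every method.
t_stat, p_value = stats.ttest_rel(mae_model_a, mae_model_b)
if p_value < 0.05:
    verdict = "b" if t_stat > 0 else "w"   # b: model b better (lower MAE)
else:
    verdict = "e"                          # e: statistically equal
```

The verdict codes mirror the b/e/w notation used in Tables 2 and 4.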

Finally, as for the time taken to train the models, it was found that the MLR (0.03 s), SMO (0.8 s) and GPR (0.18 s) models required a very short time to complete the training phase, whereas the MLP (220.96 s) method consumed substantially more time.

3.2. Feature selection and data reduction

Once the models were built, using the total set of input variables considered (Table 1) and with the implementation of the different estimation methods, as shown in Fig. 1, the next step was to implement a feature selection, which would allow a reduction in model complexity, a decrease in the cost of data collection, and easier applicability in a real situation as well. The outcomes are shown in Table 3.

According to these results, different input variables are chosen by each feature-selection method. The CFS method selects the lowest number of input variables (11 input variables). In CFS, features which are highly correlated with the output while having low intercorrelation are chosen. Thus, this method considers the degree of redundancy between the input variables, which may explain the lower number of input variables selected.

With the use of the WFS technique, each estimation method is implemented to perform the process of feature selection, so that four different subsets of input variables are chosen, depending on the estimation method used. This feature-selection technique assesses the input-variable set by using a learning scheme, in such a way that the subset of input variables is chosen to ensure the best results in output prediction. The WFS technique selects 25, 23, 24, and 24 input variables with the use of MLR, MLP, SMO, and GPR, respectively, as induction algorithms.

As for the PCA method, a dimensionality reduction is accomplished by selecting enough eigenvectors to account for 95% of the variance of the original data. Thus, this method differs from the previous ones, since it does not select input variables but rather processes the data, so that from the original set of variables the method builds a number of new variables that explain 95% of the variance of the original data. After the dimensionality reduction and the elimination of the worst eigenvectors, PCA generates 22 new variables (95% of the variance) that act as input variables for the different methods for prediction-model development.
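The 95%-cumulative-variance criterion can be sketched with scikit-learn's PCA, which accepts a float `n_components` as a variance threshold; this is an illustration, not the Weka Ranker-based implementation used in the paper, and `reduce_to_variance` is a hypothetical helper name.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_to_variance(X, threshold=0.95):
    """Keep the leading principal components whose cumulative explained
    variance reaches the threshold (95% here, as in the text)."""
    pca = PCA(n_components=threshold, svd_solver="full")
    X_reduced = pca.fit_transform(X)
    return X_reduced, pca.explained_variance_ratio_
```

On strongly correlated inputs, the number of retained components is much smaller than the original dimensionality, which is the complexity reduction the text describes.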

Page 9: A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods

Fig. 4. Example of the deviation between the observed and the predicted LAeq by the different estimation methods used, MLR (a), MLP (b), SMO (c), and GPR (d), with the whole set of input variables, for the test subset of cross-validation fold number 9.


Table 2
Results of the Student's t-test conducted to investigate statistically significant differences among the models built with the whole set of input variables.

Comparison method | MLR | MLP | SMO | GPR
MLR | – | b (t = 5.337; sig. = 0.000) | b (t = 6.868; sig. = 0.000) | b (t = 6.802; sig. = 0.000)
MLP | – | – | b (t = 3.124; sig. = 0.006) | b (t = 3.169; sig. = 0.005)
SMO | – | – | – | e (t = 0.123; sig. = 0.903)
GPR | – | – | – | –

b = statistically significant better; e = statistically significant equal; w = statistically significant worse.


3.3. Feature selection and data reduction for developing machine-learning regression models

By using the different feature-selection and data-reduction techniques and estimation methods, 12 hybrid models are developed. Each hybrid model is built from the input variables chosen by the feature-selection technique considered and with the implementation of the given estimation method. Fig. 5 shows the results for each model built.

As reflected in Fig. 5, the machine-learning regression methods improve on the performance shown by MLR, and the SMO- and GPR-based models achieve the best estimation results. In addition, when the feature-selection technique is focused upon, it is found that the use of CFS gives the worst results in both the MAE and R2 values, while the use of PCA provides better MAE and R2 values, especially when the estimation method implemented is SMO or GPR. However, the use of WFS as the feature-selection technique offers the best results for estimating the LAeq descriptor. Thus, it should be highlighted that the SMO-WFS and GPR-WFS models give MAE values of 1.16 and 1.14 dB(A), respectively, and R2 values of around 0.94.

Table 3
Recommended subsets of input variables as selected by the CFS and WFS feature-selection techniques.

Input variable CFS MLR-WFS MLP-WFS SMO-WFS GPR-WFS
TD X X X X
DP X
CLE X X X X
CW X X X
TL X X X
WF X X X
V X X X X
ST X X X
TFD X X X
ALVF X X X X
DLVF X X X X
AHVF X X
DHVF X
ABF X X X X X
DBF X X X X
AMMF X X X X X
DMMF X X X X
UCV X X X X
VS X X X X X
NSET X
NSEnT X X X
AS X X X
GR X X X
UL X X X X
DL X X
TP X X X
CS X X X X
SG X X X X
SW X X X
BH X X X X
RW X X X X
SRD X X X

In addition, it should be noted that the use of the WFS feature selection not only achieves very promising results but also slightly improves on the results found when using the whole set of input variables. Thus, this feature-selection technique selects the most relevant input variables for estimating the output, while disregarding input variables which have no impact on the estimation and which can also lead the estimation method to higher errors.

On the other hand, as done with the whole set of input variables in the implementation of the estimation methods (Section 3.1), Student's t-test was used to find possible statistically significant differences among the hybrid models developed. The results appear in Table 4.

Table 4 shows the statistical significance by which MLP-WFS and all the SMO and GPR models outperform the MLR, MLP-CFS and MLP-PCA models. Moreover, with a significance level of 0.05, it can be established that the SMO-PCA and GPR-PCA models are equal to the MLP-WFS model, but also that SMO-CFS and GPR-CFS are statistically worse than the MLP-WFS model. This result indicates that an adequate feature-selection technique is a crucial factor in reducing model complexity without significantly compromising model performance. Finally, it was found that the GPR-WFS model performs statistically better than the other models developed, with the exception of the GPR-PCA and SMO-WFS models, which are statistically equal.

4. Discussion

The first conclusion drawn from Fig. 3 is that, by using the variables selected by Torija et al. (2010), and regardless of the method implemented, the results are promising in terms of LAeq estimation. In all cases, values of explained variance above 80% and MAE values below 2 dB(A) were found. However, it is undeniable that the results found by the different methods implemented differ widely in the value of the indicators used to assess the level of performance. Thus, as indicated by Fig. 3 and Table 2, the three machine-learning regression methods used (MLP, SMO, and GPR) greatly improve on the results found by classical MLR regression. The reason why the machine-learning regression methods outperform MLR could be that the prediction of urban environmental noise is a complex and non-linear problem, and MLR cannot identify the complex relationships between the great number of input variables involved and the urban environmental noise, whereas machine-learning regression methods are quite robust in non-linear estimation problems. These machine-learning methods categorize the different relationships between the multitude of input variables and the environmental noise in urban environments, thus allowing an accurate and robust estimation of the LAeq descriptor. Regarding the computational cost of implementing the different machine-learning models, both the SMO and GPR models are very fast in completing the training phase, whereas the increase in the computational time of MLP compared with these other two machine-learning algorithms could be an important issue for implementing real-life environmental-noise-pollution modelling.

On the other hand, the results offered by the implemented models should be contextualized in order to facilitate their further use. The models presented were developed using a database which was intended to be a representative sample of a typical medium-sized Mediterranean city. In this sense, the models presented are able to estimate the LAeq descriptor under the typical conditions found in the sampled type of urban environment, i.e. meteorological conditions, driving behaviour, traffic flows, type of vehicles in circulation, activity, etc. Moreover, these models estimate the LAeq descriptor under situations where the potential receptor is directly affected by a nearby and acoustically dominant noise source. To display the ranges of application of the presented models, Table 1 shows the ranges of values of the input variables for the database used. Based on the results shown, the models developed offer


Fig. 5. MAE (a) and R2 (b) values for the 12 hybrid models developed.


a precise estimation of the LAeq descriptor; however, before their application in urban scenarios different from the ones included in the database used, i.e. different urban configurations, types of cities, or different noise sources, a previous process of training/learning such new scenarios is required. An iterative update of the database will increase the range of applicability. Although this training is required, the advantages of these methods are their flexibility and ability to learn from the provided data, so updating and improving the database could lead to precise sound-level estimation in any kind of urban scenario. Moreover, the list of input variables included in the database ensures a good LAeq estimation, but it is not exhaustive. Thus, new features, characterizing different noise sources or the effect of noise-corrective measures (e.g. noise barriers), could be incorporated as input variables as required for a specific scenario.

Regarding the different techniques implemented for approaching a data reduction or a feature selection, and based on the results achieved by the 12 hybrid models, it can be established that: (i) PCA enables a dimensionality reduction which in turn allows a reduction in model complexity, and consequently in computation time; however, by using this method the operational cost (or data-collection cost) will remain the same, since data for the 32 input variables must still be collected; (ii) the CFS method achieves the greatest reduction in the operational cost, and in spite of this important reduction in the number of input variables, a good estimation of the LAeq descriptor is performed (MAE = 1.6 dB(A)); from this dataset of input variables, a simplified model might be implemented, to be used in preliminary studies or when a trade-off between estimation error and operational cost is possible; (iii) the use of WFS as the feature-selection technique provides the


Table 4
Results of Student's t-test conducted to investigate statistically significant differences among the 12 hybrid models built. Each row lists the comparison of every later model against the row model: b/e/w indicate that the listed model is statistically significantly better than, equal to, or worse than the row model.

MLR-PCA: MLR-CFS e (t = −0.709; sig. = 0.487); MLR-WFS e (t = 0.660; sig. = 0.518); MLP-PCA e (t = 0.591; sig. = 0.562); MLP-CFS e (t = 0.563; sig. = 0.580); MLP-WFS b (t = 5.502; sig. = 0.000); SMO-PCA b (t = 6.217; sig. = 0.000); SMO-CFS b (t = 2.181; sig. = 0.043); SMO-WFS b (t = 7.847; sig. = 0.000); GPR-PCA b (t = 6.443; sig. = 0.000); GPR-CFS b (t = 2.680; sig. = 0.015); GPR-WFS b (t = 8.186; sig. = 0.000)

MLR-CFS: MLR-WFS e (t = 1.431; sig. = 0.170); MLP-PCA e (t = 1.303; sig. = 0.209); MLP-CFS e (t = 1.352; sig. = 0.193); MLP-WFS b (t = 6.268; sig. = 0.000); SMO-PCA b (t = 6.973; sig. = 0.000); SMO-CFS b (t = 2.869; sig. = 0.010); SMO-WFS b (t = 8.611; sig. = 0.000); GPR-PCA b (t = 7.192; sig. = 0.000); GPR-CFS b (t = 3.401; sig. = 0.003); GPR-WFS b (t = 8.954; sig. = 0.000)

MLR-WFS: MLP-PCA e (t = −0.018; sig. = 0.986); MLP-CFS e (t = −0.127; sig. = 0.901); MLP-WFS b (t = 5.553; sig. = 0.000); SMO-PCA b (t = 6.390; sig. = 0.000); SMO-CFS e (t = 1.742; sig. = 0.099); SMO-WFS b (t = 8.431; sig. = 0.000); GPR-PCA b (t = 6.646; sig. = 0.000); GPR-CFS b (t = 2.282; sig. = 0.035); GPR-WFS b (t = 8.889; sig. = 0.000)

MLP-PCA: MLP-CFS e (t = −0.096; sig. = 0.925); MLP-WFS b (t = 4.890; sig. = 0.000); SMO-PCA b (t = 5.618; sig. = 0.000); SMO-CFS e (t = 1.617; sig. = 0.123); SMO-WFS b (t = 7.256; sig. = 0.000); GPR-PCA b (t = 5.851; sig. = 0.000); GPR-CFS e (t = 2.089; sig. = 0.051); GPR-WFS b (t = 7.595; sig. = 0.000)

MLP-CFS: MLP-WFS b (t = 5.908; sig. = 0.000); SMO-PCA b (t = 6.780; sig. = 0.000); SMO-CFS e (t = 1.894; sig. = 0.074); SMO-WFS b (t = 8.975; sig. = 0.000); GPR-PCA b (t = 7.041; sig. = 0.000); GPR-CFS b (t = 2.464; sig. = 0.024); GPR-WFS b (t = 9.482; sig. = 0.000)

MLP-WFS: SMO-PCA e (t = 0.969; sig. = 0.345); SMO-CFS w (t = −2.882; sig. = 0.010); SMO-WFS b (t = 2.944; sig. = 0.009); GPR-PCA e (t = 1.314; sig. = 0.205); GPR-CFS w (t = −2.709; sig. = 0.014); GPR-WFS b (t = 3.361; sig. = 0.003)

SMO-PCA: SMO-CFS w (t = −3.593; sig. = 0.002); SMO-WFS e (t = 1.862; sig. = 0.079); GPR-PCA e (t = 0.356; sig. = 0.726); GPR-CFS w (t = −3.492; sig. = 0.003); GPR-WFS b (t = 2.238; sig. = 0.038)

SMO-CFS: SMO-WFS b (t = 5.075; sig. = 0.000); GPR-PCA b (t = 3.833; sig. = 0.001); GPR-CFS e (t = 0.363; sig. = 0.721); GPR-WFS b (t = 5.365; sig. = 0.000)

SMO-WFS: GPR-PCA e (t = −1.437; sig. = 0.168); GPR-CFS w (t = −5.161; sig. = 0.000); GPR-WFS e (t = 0.355; sig. = 0.727)

GPR-PCA: GPR-CFS w (t = −3.754; sig. = 0.001); GPR-WFS e (t = 1.789; sig. = 0.090)

GPR-CFS: GPR-WFS b (t = 5.501; sig. = 0.000)

b = statistically significant better; e = statistically significant equal; w = statistically significant worse.


lowest estimation errors (MAE = 1.14–1.16 dB(A)). By comparing this technique with CFS, it is found that a smaller reduction in the operational cost is traded for greater estimation accuracy. From the resulting dataset of input variables, and using either the SMO or GPR algorithms, an accurate estimation of the LAeq descriptor is ensured. It should be noted that the reason why both regression algorithms perform the best sound-level estimation might be related to the high performance of the Puk kernel. Only the shorter computational time could justify choosing the GPR algorithm over the SMO algorithm.

5. Conclusions

In this study, a procedure for approaching environmental-noise-pollution modelling in built-up environments is proposed, with a detailed study of several alternatives. The methods implemented in this work were adapted and analysed for application to this specific modelling problem. With the aim of achieving good modelling results, a number of machine-learning regression methods were implemented to approach the estimation of the LAeq descriptor in urban environments. The performance, in terms of the MAE and R2 indicators, of the MLP-, SMO- and GPR-based models was compared with the performance of a classical linear-estimation method, MLR. The results indicate that machine-learning regression methods widely outperform MLR, especially the SMO and GPR models, which reach the highest R2 values and the lowest MAE values. These results seem to confirm that the modelling of environmental noise in urban environments is a non-linear problem and thus that, because of their ability to solve non-linear problems, these machine-learning methods achieve an accurate estimation of the LAeq descriptor.

On the other hand, because of the great number of input variables involved in LAeq estimation in urban environments, two different feature-selection techniques (CFS and WFS) and a data-reduction technique (PCA) were proposed and used to reduce the complexity of the estimation models and the computational and data-collection costs. Thus, with these feature-selection techniques and the estimation methods used, 12 different models were built. In light of the results, it can be established that WFS is the feature-selection technique that achieves the best results in estimating LAeq, although the operational cost is not substantially reduced. On the other hand, the CFS technique offers the greatest reduction in data-collection cost, which is traded for an increase in the estimation error of the LAeq descriptor. Therefore, depending on the needs as to data-collection cost or accuracy, a simplified (CFS-based) or high-accuracy (WFS-based) model could be selected for estimating LAeq in urban environments.

Acknowledgments

This work was funded by the University of Malaga and the European Commission under Agreement Grant no. 246550 of the Seventh Framework Programme for R & D of the EU, granted within the People Programme, "Co-funding of Regional, National and International Programmes" (COFUND), and the Ministerio de Economia y Competitividad (COFUND2013-40259). Moreover, this work is also supported by the "Ministerio de Economía y Competitividad" of Spain under project TEC2012-38883-C02-02.

Appendix A. Supplementary data

A Google map of the 80 measurement locations as well as their description (value of the 32 input variables considered and the LAeq descriptor) is provided. Supplementary data associated with this article can be found in the online version, at doi:10.1016/j.scitotenv.2014.08.060. These data include Google maps of the most important areas described in this article.

References

Agirre-Basurko E, Ibarra-Berastegi G, Madariaga I. Regression and multilayer perceptron-based models to forecast hourly O3 and NO2 levels in the Bilbao area. Environ Model Software 2006;21:430–46.

Belojevic G, Jakovljevic B, Aleksic O. Subjective reactions to traffic noise with regard to some personality traits. Environ Int 1997;23:221–6.

Belojevic G, Jakovljevic B, Stojanov V, Paunovic K, Ilic J. Urban road-traffic noise and blood pressure and heart rate in preschool children. Environ Int 2008;34:226–31.

Calvo B, Larrañaga P, Lozano JA. Feature subset selection from positive and unlabelled examples. Pattern Recognit Lett 2009;30:1027–36.

Chandra P. Sigmoidal function classes for feedforward artificial neural networks. Neural Process Lett 2003;18:205–15.

Chuang C-C, Lee Z-J. Hybrid robust support vector machines for regression with outliers. Appl Soft Comput 2011;11:64–72.

De Jong K. Learning with genetic algorithms: an overview. Mach Learn 1988;3:121–38.

Lucas de Souza LC, Giunta MB. Urban indices as environmental noise indicators. Comput Environ Urban 2011;35:421–30.

Directive 2002/49/EC. European parliament and of the council of 25 June 2002 relating to the assessment and management of environmental noise; 2002.

Flake GW, Lawrence S. Efficient SVM regression training with SMO. Mach Learn 2001;46:271–90.

Garg N, Maji S. A critical review of principal traffic noise models: strategies and implications. Environ Impact Assess 2014;46:68–81.

Givargis Sh, Karimi H. A basic neural traffic noise prediction model for Tehran's roads. J Environ Manage 2010;91:2529–34.

Goldberg DE. Genetic algorithms in search, optimization and machine learning. Boston: Kluwer Academic Publishers; 1989.

Gündogdu Ö, Gökdag M, Yüksel F. A traffic noise prediction method based on vehicle composition using genetic algorithms. Appl Acoust 2005;66:799–809.

Hall MA, Smith LA. Feature subset selection: a correlation based filter approach. Proceedings of the International Conference on Neural Information Processing and Intelligent Information Systems. Springer; 1997. p. 855–8.

Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA Data Mining Software: an update. SIGKDD Explorations; 2009. p. 11.

Haykin S. Neural networks. A comprehensive foundation. 2nd ed. New York: Prentice Hall; 1999.

Hofman WF, Kumar A, Tulen JHM. Cardiac reactivity to traffic noise during sleep on man. J Sound Vib 1995;179:577–89.

Ibarra-Berastegi G, Elias A, Barona A, Saenz J, Ezcurra A, Diaz de Argandoña J. From diagnosis to prognosis for forecasting air pollution using neural networks: air pollution monitoring in Bilbao. Environ Model Software 2008;23:622–37.

Jolliffe T. Principal component analysis. ACM computing surveys. New York: Springer-Verlag; 1986. p. 1–47.

Kohavi R, John GH. Wrappers for feature subset selection. Artif Intell 1997;97:273–324.

Kurra S, Morinoto M, Maehoura ZI. Transportation noise annoyance—a simulated environmental study for road, railway and aircraft noises, part 1: overall annoyance. J Sound Vib 1999;220:251–78.

Nega T, Smith C, Bethune J, Fu W-H. An analysis of landscape penetration by road infrastructure and traffic noise. Comput Environ Urban 2012;36:245–56.

Pamanikabud P, Tansatcha M. Geographical information system for traffic noise analysis and forecasting with the appearance of barriers. Environ Model Software 2003;18:959–73.

Pasolli L, Melgani F, Blanzieri E. Gaussian process regression for estimating chlorophyll concentration in subsurface waters from remote sensing data. IEEE Geosci Remote Sens Lett 2010;7:464–8.

Platt J. Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Smola AJ, editors. Advances in kernel methods: support vector learning. Cambridge: MIT Press; 1999. p. 185–208.

Rasmussen CE, Williams CKI. Gaussian processes for machine learning. New York: The MIT Press; 2006.

Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986;323:533–6.

Schölkopf B, Smola A. Learning with kernels—support vector machines, regularization, optimization and beyond. New York: MIT Press Series; 2002.

Shaw EAG. Noise environments outdoors and the effects of community noise exposure. Noise Control Eng 1996;44:109–19.

Shevade SK, Keerthi SS, Bhattacharyya C, Murthy KRK. Improvements to the SMO algorithm for SVM regression. IEEE Trans Neural Netw Learn Syst 2000;11:1188–93.

Smola AJ, Schölkopf B. A tutorial on support vector regression. Stat Comput 2004;14:199–222.

Steele C. A critical review of some traffic noise prediction models. Appl Acoust 2001;62:271–87.

Tang UW, Wang ZS. Influences of urban forms on traffic-induced noise and air pollution: results from a modeling system. Environ Model Software 2007;22:1750–64.

Thissen U, Pepers M, Üstün B, Melssen WJ, Buydens LMC. Comparing support vector machines to PLS for spectral regression applications. Chemometr Intell Lab 2004;73:169–79.

Tiwari R, Singh MP. Correlation-based attribute selection using genetic algorithm. Int J Comput Appl 2010;4:28–34.

Torija AJ, Ruiz DP. Using recorded sound spectra profile as input data for real-time short-term urban road-traffic-flow estimation. Sci Total Environ 2012;435–436:270–9.

Torija AJ, Genaro N, Ruiz DP, Ramos-Ridao A, Zamorano M, Requena I. Priorization of acoustic variables: environmental decision support for the physical characterization of urban sound environments. Build Environ 2010;45:1477–89.

A.J. Torija, D.P. Ruiz / Science of the Total Environment 505 (2015) 680–693

Torija AJ, Ruiz DP, Ramos-Ridao A. Required stabilization time, short-term variability and impulsiveness of the sound pressure level to characterize the temporal composition of urban soundscapes. Appl Acoust 2011;72:89–99.

Torija AJ, Ruiz DP, Ramos-Ridao AF. Use of back-propagation neural networks to predict both level and temporal-spectral composition of sound pressure in urban sound environments. Build Environ 2012;52:45–56.

Torija AJ, Ruiz DP, Ramos-Ridao AF. A tool for urban soundscape evaluation applying support vector machines for developing a soundscape classification model. Sci Total Environ 2014;482–483:440–51.

Uguz H. A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowl-Based Syst 2011;24:1024–32.

Üstün B, Melssen WJ, Buydens LMC. Facilitating the application of support vector regression by using a universal Pearson VII function based kernel. Chemometr Intell Lab 2006;81:29–40.

Vafaie H, Imam IF. Feature selection methods: genetic algorithms vs. greedy-like search. Proceeding of the 3rd International Fuzzy Systems and Intelligent Control Conference; 1994. [Louisville, KY].

Valle S, Li W, Qin SJ. Selection of the number of principal components: the variance of the reconstruction error criterion with a comparison to other methods. Ind Eng Chem Res 1999;38:4389–401.

Vapnik VN. The nature of statistical learning theory. New York: Springer; 1995.

Vapnik VN. Statistical learning theory. New York: Wiley; 1998.

Venkatesan P, Anitha S. Application of a radial basis function neural network for diagnosis of diabetes mellitus. Curr Sci India 2006;91:1195–9.

Verrelst J, Muñoz J, Alonso L, Delegido J, Rivera JP, Camps-Valls G, et al. Machine learning regression algorithms for biophysical parameter retrieval: opportunities for Sentinel-2 and -3. Remote Sens Environ 2012;118:127–39.

Vlachokostas Ch, Achillas Ch, Michailidou AV, Moussiopoulos N. Measuring combined exposure to environmental pressures in urban areas: an air quality and noise pollution assessment approach. Environ Int 2012;39:8–18.

Wu Q, Law R, Xu X. A sparse Gaussian process regression model for tourism demand forecasting in Hong Kong. Expert Syst Appl 2012;39:4769–74.

Xinjun P. TSVR: an efficient twin support vector machine for regression. Neural Netw 2010;23:365–72.

Yilmaz I, Kaynar O. Multiple regression, ANN (RBF, MLP) and ANFIS models for prediction of swell potential of clayey soils. Expert Syst Appl 2011;38:5958–66.

Zhao J, Zhang X, Chen Y. A novel traffic-noise prediction method for non-straight roads. Appl Acoust 2012;73:276–80.