

  • Engineering Geology 218 (2017) 213–222

    Contents lists available at ScienceDirect

    Engineering Geology

    journal homepage: www.elsevier.com/locate/enggeo

    Comparison of two optimized machine learning models for predicting displacement of rainfall-induced landslide: A case study in Sichuan Province, China

    Xing Zhu a,b, Qiang Xu a,⁎, Minggao Tang a, Wen Nie b, Shuqi Ma b, Zhipeng Xu c

    a State Key Laboratory of Geohazard Prevention and Geoenvironment Protection, Chengdu University of Technology, Chengdu 610059, PR China
    b School of Civil and Environmental Engineering, Nanyang Technological University, Singapore 639789, Singapore
    c China Railway Academy Co., Ltd., Chengdu 611731, PR China

    ⁎ Corresponding author. E-mail address: [email protected] (Q. Xu).

    http://dx.doi.org/10.1016/j.enggeo.2017.01.022
    0013-7952/© 2017 Elsevier B.V. All rights reserved.

    Article info

    Article history: Received 20 August 2016; Received in revised form 15 January 2017; Accepted 18 January 2017; Available online 19 January 2017

    Abstract

    Evaluation and prediction of displacement by specific models help in forecasting geo-hazards. Among the various available predictive tools, the Least Squares Support Vector Machines (LSSVM) model optimized with a Genetic Algorithm, namely GA-LSSVM, is commonly used to empirically forecast landslide displacement owing to its capability of processing non-linear complex systems. Another improved hybrid model, composed of Double Exponential Smoothing (DES) and LSSVM, considers measured displacement and precipitation time series to estimate the one-step-ahead displacement evolution of a rain-induced landslide. Here, the modelling process and accuracy of these two models are presented, and their predictive performances are evaluated by the root mean squared error (RMSE), mean absolute percentage error (MAPE), accuracy factor (AF), and correlation coefficient (R). A slowly moving landslide on a gently dipping rocky slope in Sichuan Province, China, was chosen as the case study because its deformation is triggered by intense seasonal rainfall. The application results indicated that both the GA-LSSVM and DES-LSSVM models were suitable for accurately predicting the landslide displacement on the basis of precipitation and displacement observations. Furthermore, comparison results show that the DES-LSSVM model provides the better predictive accuracy, with RMSE and MAPE values of 0.059 mm and 0.004%, respectively.

    © 2017 Elsevier B.V. All rights reserved.

    Keywords: Genetic Algorithm; Least Squares Support Vector Machines; Double Exponential Smoothing; Landslide; High-accuracy prediction

    1. Introduction

    Landslides are a major worldwide geo-hazard, causing massive casualties and property damage every year. Predicting the evolution of a landslide is an important part of safety assessment and of investigating its dynamical behaviour under the influence of external factors such as precipitation, water, earthquakes, and human activities. It is well known that landslide evolution is a complex non-linear process caused by the interaction of different factors (Huang et al., 2005; Zhang et al., 1994), e.g. complicated geological settings and varying hydrological conditions. Displacement time series are generally regarded as the direct representation of the complex, non-linear dynamical behaviour of a landslide. Monitoring, prediction, and early warning of landslide displacement are effective and reliable means of reducing the risk that landslide failure poses to human lives and infrastructure.

    Recently, numerous models have been proposed and widely used for landslide displacement prediction, such as functional regression (Samui and Kurup, 2012; Yin et al., 2007), Artificial Neural Networks (ANN) (Chen et al., 2015b; Jiang and Chen, 2016; Lian et al., 2015; Lv and Liu, 2012), and Support Vector Machines (SVMs) (Cai et al., 2015; Feng et al., 2004; Li and Kong, 2014; Samui and Kurup, 2012; Zhou et al., 2016). All of these models try to find the complex non-linear relationship between a training set of input vectors and the corresponding output. ANN-based methods provide powerful tools for predicting landslide displacement because of their capability of processing non-linear problems. However, ANNs have drawbacks, such as convergence to local minima, overfitting, and slow convergence, that limit their predictive performance (Lian et al., 2012; Samui and Kurup, 2012). The SVM is a machine learning model based on statistical learning theory for small samples and on structural risk minimization. It is therefore a more advanced method for dealing with the non-linear problems in predicting landslide displacement. Least Squares Support Vector Machines (LSSVM) were later proposed to overcome the high computational complexity of the SVM, which stems from quadratic programming (Suykens et al., 2002; Vapnik et al., 1997).

    The predictive performance of those models is crucial for the early warning of landslides (Sassa et al., 2009). To improve predictive performance, the Genetic Algorithm (GA) has been introduced in recent work to optimize model parameters. Chen and Zeng (2013) improved the predictive capability of ANN by combining it with GA. Li and Kong (2014) presented an application of the Genetic Algorithm and Support Vector Machines (GA-SVM) method with parameter optimization to landslide displacement rate prediction. Cai et al. (2015) presented a new model combining LSSVM and GA for predicting the displacement of a landslide based on multiple triggering factors. In addition, some authors have proposed a new scheme for predicting landslide displacement under external influencing factors. Du et al. (2012) divided the accumulated displacement into a trend and a periodic component by the moving average method; a back-propagation neural network (BPNN) was adopted to forecast the periodic component, while non-linear regression was used to predict the trend component. Zhou et al. (2016) used Particle Swarm Optimization and Support Vector Machines (PSO-SVM) to predict the periodic component and so improve the predictive accuracy.

    In this paper, we propose two improved LSSVM models for high-precision prediction of the displacement of rainfall-induced landslides. The first is a hybrid model composed of LSSVM and Double Exponential Smoothing (DES), referred to as DES-LSSVM in this study. The second is an LSSVM model optimized by a Genetic Algorithm for displacement rate prediction, referred to as GA-LSSVM. The Kualiangzi landslide, a typical rainfall-induced, deep-seated rocky landslide with a gently dipping slide surface in Sichuan Province, China, was taken as the case study to construct and validate the two models.

    2. Methodology

    2.1. A hybrid model composed of Least Squares Support Vector Machines and Double Exponential Smoothing (DES-LSSVM)

    Fig. 1 shows the flowchart of the hybrid model (DES-LSSVM), which includes three main parts: Hodrick-Prescott filter/decomposition, Least Squares Support Vector Machines, and the Double Exponential Smoothing (DES) method. First, the originally observed displacement time series is de-noised by a wavelet de-noising method in order to reduce the random noise in the GPS observations. Then, the Hodrick-Prescott filter is used to divide the cumulative displacement time series into a periodic component related to an external influencing factor (i.e. seasonal intense rainfall) and a trend component representing the long-term dynamic evolution of the landslide. On the one hand, the values of the periodic component in the prior two days and the average rainfall intensity over a set number of prior days are chosen as the input vector of the LSSVM model, and the one-step-ahead periodic displacement is regarded as its output. The optimized LSSVM model is trained using a grid search algorithm with cross-validation on the basis of the previously observed periodic displacement and precipitation time series. On the other hand, the one-step-ahead trend displacement is estimated by the Double Exponential Smoothing method on the basis of the trend component values in the prior two days. Finally, the sum of the two components is taken as the predicted cumulative displacement.

    Fig. 1. Flowchart of DES-LSSVM model.

    In this paper, the automatic 1-D de-noising MATLAB function wden() was chosen as the pre-processing approach to remove random noise. It is applied to the originally observed displacement time series using soft heuristic SURE thresholding and the scaled-noise option on the detail coefficients obtained from decomposing the original data set at level 5 with the 'sym8' wavelet.

    The following sections introduce the three main methods: Hodrick-Prescott decomposition/filter, LSSVM, and DES.

    2.1.1. Decomposition of cumulative displacement utilizing the Hodrick-Prescott filter

    The Hodrick-Prescott filter is a mathematical tool used in macroeconomics, especially in real business cycle theory, to remove the cyclical component of a time series from raw data. The filter was popularized in the field of economics in the 1990s by economists Robert J. Hodrick and Nobel Memorial Prize winner Edward C. Prescott, and the details of the method can be found in the paper by Hodrick and Prescott (1997). It is used to obtain a smoothed-curve representation of a time series, one that is more sensitive to long-term than to short-term fluctuations. It can therefore be used to divide the cumulative displacement time series into a fluctuation term caused by the intense rainfall of each year and a long-term trend component, as follows:

    $$S_i = \tau_i + \alpha_i \tag{1}$$

    where $S_i$ is the total cumulative displacement value at time i; $\alpha_i$ represents the fluctuation term, also called the periodic component because it is related to the seasonal rainfall in this study; and $\tau_i$ is the trend term at time i. Before the Hodrick-Prescott filter is applied, its smoothing parameter λ should be determined according to the period of the external trigger. In this study, λ was set to 100, following the user guide of this filter function in MATLAB and the sharp increases of cumulative displacement in the rainfall season of every year.
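The decomposition in Eq. (1) can be illustrated with a minimal pure-Python sketch. It assumes the standard Hodrick-Prescott objective, in which the trend minimizes the squared residuals plus λ times the squared second differences of the trend, so that the trend solves the linear system (I + λDᵀD)τ = S with D the second-difference operator. The names `hp_filter` and `solve_linear` are hypothetical helpers written for this illustration, not the MATLAB function used in the study, and the dense solver is only practical for short series.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (dense, small systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hp_filter(series, lam=100.0):
    """Return (trend, periodic) so that series[i] = trend[i] + periodic[i], as in Eq. (1)."""
    n = len(series)
    if n < 3:
        return [float(v) for v in series], [0.0] * n
    # Build A = I + lam * D^T D, where each row of D is the stencil (1, -2, 1).
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    stencil = (1.0, -2.0, 1.0)
    for row in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[row + p][row + q] += lam * stencil[p] * stencil[q]
    trend = solve_linear(a, [float(v) for v in series])
    periodic = [s - t for s, t in zip(series, trend)]
    return trend, periodic
```

A purely linear series has zero second differences, so the sketch returns it unchanged as the trend and leaves a periodic component of zero; a larger λ gives a smoother trend.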

    2.1.2. Construction of LSSVM model for predicting the periodic displacement component

    Least Squares Support Vector Machines (LSSVM), proposed by Suykens and Vandewalle (1999), is an improved formulation of the original SVM algorithm (Vapnik et al., 1997). LSSVM is given a training data set of N samples $\{x_i, y_i\}_{i=1}^{N}$ with input data $x_i \in R^{n}$ and output $y_i \in R$, where $R^{n}$ is the n-dimensional vector space and $R$ is the one-dimensional vector space. In this model, the three input variables of the LSSVM are $\alpha_i$ and $\alpha_{i-1}$, obtained by Eq. (1), and $r_i$, the average rainfall intensity over the prior K days. In this study, K is set to 20 to account for the lagged effect of rainfall on the physical characteristics and mechanical behaviour of the geo-materials within the landslide. The output of the LSSVM model is the one-step-ahead periodic displacement $Y_{i+1}$. So, $x = [\alpha_i, \alpha_{i-1}, r_i]$ and $y = [Y_{i+1}]$.
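As an illustration of how such input and output pairs might be assembled, the sketch below builds x = [α_i, α_{i−1}, r_i] and y = α_{i+1} from two daily series. The function name `build_samples` is hypothetical, and the choice that the K-day rainfall window ends on day i (inclusive) is an assumption, since the text does not state exactly which days the window covers.

```python
def build_samples(periodic, rainfall, k=20):
    """Return (inputs, targets) with x = [a_i, a_(i-1), r_i] and y = a_(i+1).

    periodic: daily periodic displacement component (alpha in Eq. (1))
    rainfall: daily precipitation, same length as `periodic`
    k:        number of days averaged for the rainfall input (assumed to end on day i)
    """
    inputs, targets = [], []
    # Day i needs a previous day, a full k-day rainfall window and a next day.
    start = max(1, k - 1)
    for i in range(start, len(periodic) - 1):
        window = rainfall[i - k + 1 : i + 1]
        r_i = sum(window) / k
        inputs.append([periodic[i], periodic[i - 1], r_i])
        targets.append(periodic[i + 1])
    return inputs, targets
```

With K = 20, the first usable sample is therefore the 20th day of the record, which shortens the training set by the length of the rainfall window.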

  • Fig. 2. Flowchart of GA-LSSVM model.


    The LSSVM maps the data samples into a high-dimensional feature space as follows:

    $$y(x) = w^{T}\varphi(x) + b \tag{2}$$

    where the non-linear mapping function $\varphi(x)$ maps the input data into a higher-dimensional feature space, $w \in R^{n}$ is an adjustable weight vector, and $b \in R$ is the scalar threshold. Based on the structural risk minimization principle, the optimization problem of the LSSVM for function estimation can be given as follows (Suykens and Vandewalle, 1999):

    $$\min_{w,b,e} J(w,e) = \frac{1}{2}w^{T}w + \frac{\gamma}{2}\sum_{i=1}^{N} e_i^{2} \tag{3}$$

    subject to:

    $$y_i = w^{T}\varphi(x_i) + b + e_i, \quad i = 1, 2, \ldots, N \tag{4}$$

    where γ is a regularization parameter, which determines the trade-off between the training error and the model complexity, and $e_i$ is the error variable.

    The following model for periodic displacement prediction can be constructed by solving the above-mentioned optimization problem (Chai et al., 2015; Chen et al., 2015a; Suykens et al., 2002):

    $$y(x) = \sum_{i=1}^{N}\beta_i K(x, x_i) + b \tag{5}$$

    where $\beta_i$ are the Lagrange multipliers of the constraints in Eq. (4) and K(·) is a kernel function, with $K_{ij} = \varphi(x_i)^{T}\varphi(x_j) = K(x_i, x_j)$. Three common types of kernel function are the polynomial function, the radial basis function (RBF), and the sigmoid function. In this study, the RBF kernel is chosen because it has few parameters and excellent non-linear mapping performance. It is given by:

    $$K(x, x_i) = \exp\left(-\frac{1}{2\sigma^{2}}\lVert x - x_i\rVert^{2}\right) \tag{6}$$

    where σ² is the bandwidth parameter of the kernel, which is important for the generalization behaviour of a kernel method.

    Therefore, the regularization parameter γ and the kernel parameter σ² strongly influence the efficiency and generalization performance of the LSSVM model. In this study, the two parameters are searched as a pair that gives the lowest validation mean square error (MSE), using the grid search and cross-validation methods, which can be easily implemented with the LSSVM toolbox in MATLAB (Suykens et al., 2002).
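To make Eqs. (5) and (6) concrete, the following pure-Python sketch trains an LSSVM regressor by solving the standard dual linear system of Suykens and Vandewalle, in which the first row enforces Σβ_i = 0 and the remaining rows read Σ_j K_ij β_j + β_i/γ + b = y_i. It is an illustration under that assumption, not the MATLAB toolbox used in the study; `rbf`, `solve_linear`, `lssvm_fit`, and `lssvm_predict` are hypothetical names, and the dense solver only suits small training sets.

```python
import math


def rbf(x, z, sigma2):
    """RBF kernel of Eq. (6): exp(-||x - z||^2 / (2 * sigma2))."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-dist2 / (2.0 * sigma2))


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def lssvm_fit(xs, ys, gamma, sigma2):
    """Return (b, beta) from the (N+1) x (N+1) LSSVM dual system."""
    n = len(xs)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] + [float(y) for y in ys]
    for i in range(n):
        a[0][i + 1] = 1.0          # first row: sum of beta equals zero
        a[i + 1][0] = 1.0          # bias column
        for j in range(n):
            a[i + 1][j + 1] = rbf(xs[i], xs[j], sigma2)
        a[i + 1][i + 1] += 1.0 / gamma   # regularization on the diagonal
    solution = solve_linear(a, rhs)
    return solution[0], solution[1:]


def lssvm_predict(x, xs, beta, b, sigma2):
    """Evaluate Eq. (5) at a new input x."""
    return sum(bi * rbf(x, xi, sigma2) for bi, xi in zip(beta, xs)) + b
```

A grid search would call `lssvm_fit` for each candidate pair (γ, σ²) on the training folds and keep the pair with the lowest validation MSE; a large γ makes the model follow the training data closely, while a small γ favours a smoother fit.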

    2.1.3. Double Exponential Smoothing (DES) for long-term trend prediction

    Exponential smoothing is commonly applied to smooth data, as many window functions are in signal processing, acting as a low-pass filter to remove high-frequency noise. Furthermore, Double Exponential Smoothing is an appropriate method for time series that include a trend without seasonal fluctuations (Tsividis, 2010). The basic idea of Double Exponential Smoothing is to introduce a term that accounts for a possible trend in the time series. Given the trend component of the displacement time series, {τi}, beginning at time i = 0, {Li} denotes the smoothed value at time i and {εi} is the best estimate of the trend at time i. Double Exponential Smoothing is given by the following formulae:

    $$L_i = m\,\tau_i + (1-m)(L_{i-1} + \varepsilon_{i-1}) \tag{7}$$

    $$\varepsilon_i = n\,(L_i - L_{i-1}) + (1-n)\,\varepsilon_{i-1} \tag{8}$$

    $$F_{i+1} = L_i + \varepsilon_i \tag{9}$$

    where m and n are two parameters associated with the level ($L_i$) and with the trend ($\varepsilon_i$), respectively, and $F_{i+1}$ is the one-step-ahead forecast value. Generally, the initialization conditions for the above formulae are set as follows:

    $$\begin{cases} L_1 = \tau_1 \\ \varepsilon_1 = \tau_2 - \tau_1 \\ 0 < m < 1 \\ 0 < n < 1 \end{cases} \tag{10}$$

    The values of m and n can be determined by minimizing the root mean square error (RMSE) between the predicted values and the values actually observed during the training phase.
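Eqs. (7) to (10) translate almost line for line into code. The sketch below is a minimal illustration: `des_forecast` applies the recursions, and `fit_des` picks m and n by a coarse grid search on the training RMSE. The grid search and its step size are assumptions made for the example, since the text only states that the RMSE is minimized, and both function names are hypothetical.

```python
import math


def des_forecast(trend, m, n):
    """One-step-ahead forecasts from Eqs. (7)-(9); forecasts[j] predicts trend[j + 1]."""
    level = trend[0]                  # L_1 = tau_1        (Eq. 10)
    slope = trend[1] - trend[0]       # eps_1 = tau_2 - tau_1
    forecasts = [level + slope]       # F_2 = L_1 + eps_1  (Eq. 9)
    for tau in trend[1:]:
        prev_level = level
        level = m * tau + (1.0 - m) * (prev_level + slope)      # Eq. (7)
        slope = n * (level - prev_level) + (1.0 - n) * slope    # Eq. (8)
        forecasts.append(level + slope)                         # Eq. (9)
    return forecasts


def fit_des(trend, step=0.05):
    """Grid-search m and n in (0, 1) for the lowest training RMSE (assumed procedure)."""
    grid = [step * k for k in range(1, int(round(1.0 / step)))]
    best = None
    for m in grid:
        for n in grid:
            forecasts = des_forecast(trend, m, n)
            # zip stops at the last observed value; the final forecast has no target yet.
            squared = [(f - o) ** 2 for f, o in zip(forecasts, trend[1:])]
            error = math.sqrt(sum(squared) / len(squared))
            if best is None or error < best[0]:
                best = (error, m, n)
    return best
```

For a perfectly linear trend the level and slope are reproduced exactly, so every forecast equals the next observation regardless of m and n; the last element of `des_forecast` is the forecast for the first unobserved day.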

    2.1.4. Prediction of cumulative displacement

    According to Eq. (1), the total forecast displacement is obtained by adding the periodic displacement value predicted by the LSSVM model and the trend value predicted by the DES method. The one-step-ahead total displacement is therefore:

    $$P_{i+1} = F_{i+1} + Y_{i+1} \tag{11}$$

    where $P_{i+1}$ represents the forecast value of the total displacement.

    2.2. Optimized Least Squares Support Vector Machine with Genetic Algorithm (GA-LSSVM)

    Fig. 2 shows the flowchart of the GA-LSSVM model applied in this paper. As with the DES-LSSVM model, the originally observed displacement is first de-noised to remove random observation noise. The daily displacement change rate is then calculated from the de-noised displacement.

    Table 1. Evaluation criteria for prediction performance of machine learning models.

    | Item | Formula | Notes |
    |---|---|---|
    | RMSE | $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^{2}}$ | Root mean square error (RMSE) is a frequently used measure of the differences between the values predicted by a model ($\hat{y}_i$) and the values actually observed ($y_i$); n denotes the total number of data samples. |
    | MAPE | $100 \times \frac{1}{n}\sum_{i=1}^{n}\left\lvert\frac{y_i - \hat{y}_i}{y_i}\right\rvert$ | Mean absolute percentage error (MAPE) is a measure of the prediction accuracy of a forecasting method in statistics, and usually expresses accuracy as a percentage. |
    | R | $\frac{\sum_{i=1}^{n}(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^{2}}\,\sqrt{\sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})^{2}}}$ | Correlation coefficient (R) is a measure of the strength and direction of the linear relationship between the values predicted by a model ($\hat{y}_i$) and the values actually observed ($y_i$); $\bar{y}$ and $\bar{\hat{y}}$ are the mean values of the observations and predictions, respectively. |
    | AF | $10^{\frac{1}{n}\sum_{i=1}^{n}\left\lvert\log(\hat{y}_i / y_i)\right\rvert}$ | Prediction accuracy factor (AF) is a simple multiplicative indicator denoting the spread of results about the prediction. AF = 1 indicates perfect agreement between all the values predicted ($\hat{y}_i$) and the values actually observed ($y_i$). |
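The four criteria of Table 1 can be written directly from their formulas. The sketch below is a minimal illustration with hypothetical function names; it assumes that the logarithm in AF is taken to base 10, to match the power of 10 in the formula, and that all observed values are non-zero and have the same sign as the predictions.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observations must be non-zero)."""
    n = len(observed)
    return 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted))


def correlation(observed, predicted):
    """Pearson correlation coefficient R."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    num = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    den = math.sqrt(sum((o - mean_o) ** 2 for o in observed)) * math.sqrt(
        sum((p - mean_p) ** 2 for p in predicted)
    )
    return num / den


def accuracy_factor(observed, predicted):
    """AF = 10 ** mean(|log10(predicted / observed)|); base-10 log is an assumption."""
    n = len(observed)
    total = sum(abs(math.log10(p / o)) for o, p in zip(observed, predicted))
    return 10.0 ** (total / n)
```

For example, observations of 100, 200, and 400 mm with predictions of 110, 190, and 400 mm give relative errors of 10%, 5%, and 0%, hence a MAPE of 5%.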

    The basic theory of LSSVM is not repeated here (see Section 2.1.2). The GA-LSSVM model differs from the DES-LSSVM model proposed above as follows:

    (1) As shown in Fig. 2, the displacement rate values in the prior 3 days (i.e. $y_t, y_{t-1}, y_{t-2}$), combined with the rainfall influencing factor ($r_t$), are directly chosen as the input vector of the LSSVM model. The recently observed displacement rate indicates the inherent state of the landslide, while the average rainfall intensity over the prior K days denotes the external influencing factor. It is therefore reasonable to use these previously observed factors as inputs of the LSSVM model, whose output is the one-step-ahead displacement change of the landslide. As mentioned in Section 2.1.2, K is set to 20 to account for the lagged effect of rainfall on the physical characteristics and mechanical behaviour of the geo-materials within the landslide.

    (2) The Genetic Algorithm, a widely used optimization method (Cai et al., 2015; Li and Kong, 2014), was chosen to search for the optimal parameters (γ and σ²) of the LSSVM in this model. The algorithm repeatedly modifies a population of individual solutions. At each step, it randomly selects individuals from the current population and uses them as parents to produce the next generation through three genetic operators: selection, crossover, and mutation. Eventually, the best individual (the optimal values of the LSSVM parameters) is found through repeated evolution from generation to generation. This algorithm can be implemented in MATLAB with a combination of the GAOT toolbox (Houck et al., 1995) and the LSSVM toolbox (Suykens et al., 2002).

    Fig. 3. Engineering geological profile of Kualiangzi landslide.

    Fig. 4. Data sets observed: (a) displacement and daily precipitation from 2013 to 2015; (b) daily precipitation, cumulative displacement, and daily displacement rate in 2014.

    Table 2. The statistical analysis of the field observed data sets of 2013 and 2014.

    | Year | Displacement μ (mm) | Displacement σ | Displacement δ (mm) | Displacement Min (mm) | Displacement Max (mm) | Precipitation μ (mm) | Precipitation σ | Precipitation δ (mm) | Precipitation Min (mm) | Precipitation Max (mm) |
    |---|---|---|---|---|---|---|---|---|---|---|
    | 2013 | 529 | 25565 | 160 | 216 | 667 | 3.9 | 183.6 | 13.6 | 0 | 109 |
    | 2014 | 730.5 | 2526.5 | 50.3 | 667.4 | 825.9 | 2.4 | 82.9 | 9.1 | 0 | 130 |

    Note: μ, mean value; σ, variance; δ, standard deviation; Min, minimum value; Max, maximum value.

    2.3. Quantitative criteria for evaluating models' performance

    We quantitatively evaluate the performance of these models using four performance indexes, namely root mean square error (RMSE), mean absolute percentage error (MAPE), correlation coefficient (R), and accuracy factor (AF) (Table 1). RMSE and MAPE are frequently used to estimate the deviation between the predicted values and the values actually observed. Lower values of RMSE and MAPE denote better performance of the model. R and AF are coefficients that quantify the statistical relationship between the values predicted by the two models and the values actually observed.
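Returning to the parameter search of Section 2.2, an error measure such as the RMSE or a validation MSE can serve as the fitness that the Genetic Algorithm minimizes. The sketch below is a generic real-coded GA with tournament selection, arithmetic crossover, Gaussian mutation, and elitism; it is an illustration of the selection, crossover, and mutation loop, not the GAOT toolbox used in the study. The name `ga_minimize`, its default settings, and the operator choices are all assumptions made for the example.

```python
import random


def ga_minimize(fitness, bounds, pop_size=20, generations=30,
                mutation_rate=0.2, seed=0):
    """Minimize `fitness` over box `bounds` = [(low, high), ...] with a simple GA."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        best = min(best, ranked[0], key=fitness)
        next_population = [best[:]]          # elitism: the best individual survives
        while len(next_population) < pop_size:
            # Selection: two binary tournaments pick the parents.
            parent_a = min(rng.sample(ranked, 2), key=fitness)
            parent_b = min(rng.sample(ranked, 2), key=fitness)
            # Crossover: random convex combination of the parents.
            w = rng.random()
            child = [w * a + (1.0 - w) * b for a, b in zip(parent_a, parent_b)]
            # Mutation: Gaussian perturbation, clipped back into the bounds.
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[k] += rng.gauss(0.0, 0.1 * (hi - lo))
                    child[k] = min(max(child[k], lo), hi)
            next_population.append(child)
        population = next_population
    return min(population + [best], key=fitness)
```

In the GA-LSSVM setting, each individual would be a candidate pair (γ, σ²), and `fitness` would train an LSSVM with those values and return its validation error; the bounds on the two parameters are then a modelling choice that the source does not specify.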

    3. Application and comparison

    3.1. Case study: Kualiangzi (KLZ) landslide in Sichuan Province, China

    The Kualiangzi landslide is a large, typical rain-induced, slowly moving rock slide located in Zhongjiang County, Sichuan Province, China. As shown in Fig. 3, the landslide developed in a nearly horizontally bedded rocky slope, with a maximum width and length of 1100 m and 360–390 m, respectively. The height between the toe and the rear edge of the landslide is about 110 m. The average thickness of the landslide body is 50 m, with a maximum of 80 m. It covers an area of approximately 0.51 km² and has an estimated volume of 2.55 × 10⁷ m³. Although the dip of the slide surface is only 2°–5°, the landslide is still creeping, which is difficult to explain with traditional limit equilibrium theory.

    A special sedimentary stratum known as "red beds" is widespread in the Sichuan Basin in the southwest of China. As shown in Fig. 3, the red beds are typically composed of alternating thick sandstone and thin mudstone layers, and were formed in the Jurassic and Cretaceous (Huang et al., 2005). A number of vertical tension troughs/cracks have formed because of the differential deformation between these two layers. Consequently, the troughs/cracks provide channels through which rainwater infiltrates the landslide. Unlike the stiff sandstone in the slope, the mudstone is more likely to be softened and disintegrated under the action of water because it has greater water-absorbing capacity and expansibility (Xu et al., 2016). The landslide is characterized by a deep and sub-horizontal slip surface, with creep movement in the dry season but a transient response to intensive rainfall, and is commonly known as a translational landslide (Xu et al., 2016; Zhang et al., 1994).

    To study the dynamic behaviour of the landslide under the influence of seasonal rainfall events, two GPS displacement monitoring instruments and one rainfall gauge were deployed in the field in 2013. Fig. 4a shows the monitoring data sets of cumulative displacement and daily precipitation from June 1st, 2013 to December 10th, 2015. The intense rainfall from June to September of each year is clearly the crucial influencing factor for the significant increase of cumulative displacement (Fig. 4b). Meanwhile, the lagged effect of rainfall on the deformation of the landslide can be seen in Fig. 4b, which shows that the peak of the daily displacement rate appears later than the peak of the daily precipitation. In addition, the effect of rainfall on the deformation varies with the state of the landslide. This trigger-response relationship between rainfall and landslide deformation may therefore be characterized as non-linear and complex.

    Fig. 5. (a) Constructing and training LSSVM model using periodic displacement component observed in 2013; (b) Fitting regression between periodic components predicted and observed; (c) Training results of trend displacement component using DES approach; (d) Fitting regression between the trend displacement predicted and observed.

    In this study, the time series observed from June 1st to December 31st, 2013, a complete cycle from the rainfall season to the non-rainfall season, are selected as data samples to construct and train the two proposed models, because they reasonably reflect the trigger-response relationship between the rainfall factor and the displacement of the landslide. The data sets observed in 2014 are chosen to test the two trained models. Table 2 shows a statistical description of the data sets observed in 2013 and 2014, which indicates that the monitored data sets of 2013 differ in trend from those of 2014. However, it should be noted that the 2014 data sets are more uniform (i.e. have a lower standard deviation) for both displacement and precipitation.
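As a quick consistency check on Table 2, whose note defines σ as the variance and δ as the standard deviation, each tabulated δ should equal the square root of the corresponding σ. The worked arithmetic below uses only the tabulated values and agrees with them to within rounding:

```latex
\begin{align*}
\text{Displacement, 2013:}\quad & \sqrt{25565} \approx 159.89 \approx 160 \\
\text{Displacement, 2014:}\quad & \sqrt{2526.5} \approx 50.26 \approx 50.3 \\
\text{Precipitation, 2013:}\quad & \sqrt{183.6} \approx 13.55 \\
\text{Precipitation, 2014:}\quad & \sqrt{82.9} \approx 9.10 \approx 9.1
\end{align*}
```

The 2013 precipitation value is tabulated as 13.6; the small difference from 13.55 is consistent with the variance itself having been rounded to one decimal place.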

    3.2. Forecasting displacement using DES-LSSVM model

    As stated in Section 2.1, in the DES-LSSVM model the observed cumulative displacement time series is de-noised using the wavelet 1-D filter and then decomposed into the trend component and the periodic component with the Hodrick-Prescott filter. In terms of deformation characteristics, the periodic component is the short-term dynamic response of the landslide to external rainfall, while the trend component represents the inherent long-term dynamic behaviour of the landslide. The monitoring data from June 2013 to December 2013 were selected to train the DES-LSSVM model, while the data observed from January 2014 to December 2014 were chosen as the test sample to validate the model. Seasonal rainfall is the major external factor inducing the displacement of the landslide. Considering the lagged effect of rainfall on the mechanical behaviour of the soil and rocks in the landslide, the average rainfall intensity over the prior 20 days was taken as one input of the LSSVM model, and the values of the periodic component in the prior 2 days were taken as the other inputs, to capture the current dynamic state of the landslide. Meanwhile, the values of the trend component in the prior 2 days were fed to the DES algorithm to estimate the one-step-ahead trend displacement. We first initialized a DES-LSSVM model with the RBF kernel using the data samples observed in 2013. The hyper-parameters of the LSSVM model (regularization parameter γ and kernel parameter σ²) were then tuned automatically for the lowest validation mean square error (MSE), using the cross-validation method and the grid-search optimization function. Tuning against a validation error in this way is intended to yield a well-tuned model and to limit overfitting.

    Fig. 5a and c show the periodic component and the trend component, respectively, obtained by decomposing the cumulative displacement. Fig. 5a shows that the periodic displacement occurs mainly from June to September, which is the rainfall season each year. This demonstrates that changes in the periodic displacement are mainly driven by intense rainfall. Fig. 5b shows the regression between the observed values and the values estimated by the trained LSSVM model for the periodic component. Fig. 5d indicates the strong predictive performance of the Double Exponential Smoothing method for the trend component.

    Fig. 6. (a) Training results of DES-LSSVM model for cumulative displacement as a summation of the trend and periodic components predicted; (b) Percentage error between the cumulative displacements predicted and observed.

    Fig. 6a compares the observed total displacement with the cumulative displacement calculated as the sum of the LSSVM output and the DES output. As indicated in Fig. 6b, the percentage error of the estimated displacement relative to the observed values is <1%. This demonstrates that the hybrid machine learning model can accurately predict the cumulative displacement of the rainfall-induced landslide.

    After training the hybrid model, the data observed in 2014 were pre-processed and used to test it. Fig. 7 shows the final outcome of the trained hybrid model and the linear regression between the predicted and observed values of cumulative displacement. Fig. 7a and b show the results of the DES-LSSVM model for the periodic component, while Fig. 7c and d show its results for the trend component. Comparing the evolution process in 2014 with that in 2013 shows that a high average rainfall intensity over the preceding days is responsible for the sharp increases in cumulative displacement of the landslide. As shown in Fig. 7f, the correlation coefficient between the predicted and observed cumulative displacement is almost 1, which means that the trained DES-LSSVM model has good predictive accuracy, generality and reliability.

    3.3. Forecasting displacement using GA-LSSVM model

    In GA-LSSVM, the daily displacement rate is calculated from the cumulative displacement values in 2013, which were first filtered by the wavelet de-noising method to remove the measurement errors induced by the GPS system. The daily displacement rates in the prior three days and the average precipitation in the prior 20 days are selected as the input vector of the GA-LSSVM model, and the daily displacement rate on the current day is chosen as its output. The optimized parameters γ and σ² are obtained through the GA optimization procedure and the training process. The total cumulative displacement can then be predicted by summing all the displacement rates.

    Fig. 8a shows the daily displacement rate observed and estimated using the GA-LSSVM model. Fig. 8b shows the fitted regression between the estimated and observed values, which also indicates the training performance of the GA-LSSVM model. Fig. 8c and d show the estimated and observed cumulative displacement time series and the difference between them, respectively. The results demonstrate that the selection of the input vector for constructing and training the GA-LSSVM model is reasonable and reliable. The change of displacement can be accurately estimated using the constructed GA-LSSVM model, as shown in Fig. 8a.

  • Fig. 7. (a) Prediction results of the periodic component using the DES-LSSVM model; (b) Linear fitting regression between the periodic values predicted and observed; (c) Prediction results of the trend component based on the DES-LSSVM model; (d) Linear fitting regression between the trend values predicted and observed; (e) Cumulative displacement predicted (blue line) utilizing trained model and values actually observed (red circle) in 2014; (f) Linear regression between the cumulative displacement predicted and observed.

    X. Zhu et al. / Engineering Geology 218 (2017) 213–222

    After training the constructed GA-LSSVM model, the data sets observed in 2014 are used to test it, and the predicted values are compared quantitatively with the observed values. Fig. 9 shows the prediction results and their comparison with the observations. The results indicate that the trained GA-LSSVM model can predict the one-step ahead displacement change effectively and accurately based on the influencing factor (rainfall) and the state factor (displacement changes in recent days).
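    The input vector and the reconstruction of cumulative displacement described in this section can be sketched as follows. The sketch assumes the daily rate is the first difference of the de-noised cumulative series; the helper names (`build_rate_inputs`, `cumulative_from_rates`) and the synthetic data are hypothetical, and the GA search over γ and σ² is not shown.

```python
import numpy as np

def build_rate_inputs(rate, rain, rain_window=20, lags=3):
    """Each row: [mean rainfall over the prior 20 days, rate at t-1, t-2, t-3]; target: rate at t."""
    start = max(rain_window, lags)
    X = np.array([[rain[t - rain_window:t].mean(), *rate[t - lags:t][::-1]]
                  for t in range(start, len(rate))])
    return X, rate[start:]

def cumulative_from_rates(initial, rates):
    """Rebuild cumulative displacement by summing daily rates onto a starting value."""
    return initial + np.cumsum(rates)

# Synthetic placeholder data, NOT the monitored series of the case study.
rng = np.random.default_rng(1)
rain = rng.gamma(shape=2.0, scale=5.0, size=120)
cumulative = np.cumsum(0.1 + 0.01 * rain)

rate = np.diff(cumulative)   # rate[i] is the change from day i to day i + 1
rain_aligned = rain[1:]      # rainfall on the day each change is recorded
X, y = build_rate_inputs(rate, rain_aligned)

# Summing the rates recovers the cumulative series from its first value onward.
rebuilt = cumulative_from_rates(cumulative[0], rate)
print(X.shape, y.shape, np.allclose(rebuilt, cumulative[1:]))
```

    Since the cumulative prediction is a running sum of predicted rates, any error in a single daily rate carries forward into every later cumulative value.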

    3.4. Performance evaluation

    The proposed models are trained and cross-validated using the 2013 data sets. To evaluate their predictive performance with the performance indexes defined in Table 1, the 2014 data sets were employed as testing data sets.

  • Fig. 8. Constructing and training of GA-LSSVM model: (a) Daily displacement rate in 2013 for training the model; (b) Linear regression between the daily displacement rate predicted and that observed; (c) Cumulative displacements predicted and observed; (d) Linear regression between the cumulative displacements predicted and observed.


    Fig. 10 compares the values predicted by the two models with the measured displacement in the rainfall season of 2014. The results obtained by both models are close to the actual values, but the DES-LSSVM model is more accurate than the GA-LSSVM model.

    Table 3 also shows the quantitative performance of the models in terms of RMSE, MAPE, R and AF. For the same testing data sets, the RMSE and MAPE of the DES-LSSVM model are 0.059 mm and 0.004%, respectively, well below the 0.746 mm and 1.054% of the GA-LSSVM model. As indicated in Table 3, R = 1 for testing suggests that the displacement values calculated by the proposed models have an extremely good linear relationship with the observed displacement values. The comparison indicates that using different models for different components of the total displacement provides better predictive performance, because the inputs of the models, as the factors of landslide deformation, are considered separately according to the understanding of the response relationship between those factors and the displacement of the landslide. Overall, the DES-LSSVM model outperforms the GA-LSSVM model in predicting the landslide displacement. However, the GA-LSSVM model can also provide an accurate one-step ahead displacement rate, and its predictive performance meets the requirements of most engineering applications.
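    Three of the criteria reported in Table 3 can be computed as sketched below, using their standard definitions. The accuracy factor (AF) is left out because its definition is given in Table 1, which is not reproduced in this section; the function names and the sample arrays are hypothetical.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean squared error, in the units of the data."""
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def mape(observed, predicted):
    """Mean absolute percentage error, in percent; observed values must be non-zero."""
    return float(100.0 * np.mean(np.abs((predicted - observed) / observed)))

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    return float(np.corrcoef(observed, predicted)[0, 1])

# Hypothetical values for illustration only (not data from Table 3).
observed = np.array([100.0, 102.0, 105.0, 110.0])
predicted = np.array([101.0, 102.0, 104.0, 111.0])

print(rmse(observed, predicted), mape(observed, predicted), pearson_r(observed, predicted))
```

    Because MAPE divides each error by the observed value, the same absolute error gives a smaller percentage on a large cumulative displacement than on a small daily rate, so it matters which quantity is scored.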

    4. Discussion and conclusion

    One typical rainfall-induced landslide was chosen as a case study. Analysis of the displacement characteristics of the landslide shows that intense annual rainfall is the crucial trigger for its evolution. Two machine learning models were implemented to predict the one-step ahead displacement of the landslide. The first model was based on Hodrick-Prescott decomposition, Least Square Support Vector Machines (LSSVM) and the Double Exponential Smoothing (DES) method. In this model, the total displacement was divided into a trend component and a periodic component according to the response of the landslide displacement to seasonal influencing factors. The one-step ahead periodic displacement and trend displacement were forecast using the LSSVM and DES models, respectively, and the total predicted displacement was then taken as the sum of the two components. The average precipitation intensity in the prior 20 days and the values of the periodic displacement in the prior 2 days were selected as inputs of the LSSVM model for the periodic component. The values of the trend displacement in the prior 2 days were selected as the inputs of DES to predict the trend component. The second model was an LSSVM model optimized with a Genetic Algorithm. The daily changes of displacement were calculated first. The displacement changes in the prior 3 days and the average rainfall intensity in the prior 20 days were then chosen as the inputs of the GA-LSSVM model to predict the one-step ahead displacement change.
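    The decomposition and trend forecast summarized above can be sketched as follows. The sketch assumes the standard penalized least-squares form of the Hodrick-Prescott filter and Brown's one-parameter double exponential smoothing; the DES formulation of Section 2.1.3 may differ, and the smoothing values `lam` and `alpha`, the function names, the synthetic series, and the persistence stand-in for the LSSVM output are all hypothetical.

```python
import numpy as np

def hp_filter(y, lam):
    """Hodrick-Prescott split: the trend minimizes sum (y - tau)^2 + lam * sum (second diff of tau)^2."""
    n = len(y)
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]  # second-difference operator
    trend = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
    return trend, y - trend  # trend component, periodic (residual) component

def des_forecast(series, alpha, horizon=1):
    """Brown's double exponential smoothing forecast `horizon` steps ahead (0 < alpha < 1)."""
    s1 = s2 = series[0]
    for value in series[1:]:
        s1 = alpha * value + (1.0 - alpha) * s1  # first smoothing pass
        s2 = alpha * s1 + (1.0 - alpha) * s2     # second smoothing pass
    level = 2.0 * s1 - s2
    slope = alpha / (1.0 - alpha) * (s1 - s2)
    return level + slope * horizon

# Synthetic placeholder data, NOT the monitored series of the case study.
days = np.arange(120)
cumulative = 0.5 * days + 3.0 * np.sin(2 * np.pi * days / 30.0)

trend, periodic = hp_filter(cumulative, lam=1600.0)  # lam is an assumed value
next_trend = des_forecast(trend, alpha=0.5)          # alpha is an assumed value
next_periodic = periodic[-1]                         # placeholder for the LSSVM prediction
next_total = next_trend + next_periodic              # total = trend + periodic
print(next_trend, next_total)
```

    Note that this sketch smooths the whole trend history, whereas the text feeds the DES algorithm with the trend values of the prior 2 days.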

    The monitoring data sets of 2013 were used to construct and train the proposed models using cross-validation, while the data sets of 2014 were used to test the models. The simulation results indicate that both the GA-LSSVM and DES-LSSVM models accurately fit the observed slope displacement and successfully predict the landslide displacement, which was primarily controlled by a single external factor, rainfall. In particular, both models can successfully predict small changes in landslide displacement, which makes them able to support early warning. In addition, the comparison of prediction performance shows that the DES-LSSVM model is more accurate than the GA-LSSVM model. It also indicates that the predictive accuracy for landslide deformation can be improved by using different models for different components separately, because the evolution of a landslide is controlled by the combined influence of internal and external factors.

  • Fig. 9. Testing results of GA-LSSVM model: (a) Daily displacement rate observed and predicted in 2014; (b) Linear regression between the daily displacement rate predicted and that observed; (c) Cumulative displacements observed and predicted; (d) Linear regression between the cumulative displacements predicted and observed.

    Fig. 10. Comparison of predictive performance by GA-LSSVM and DES-LSSVM model: (a) Cumulative displacement predicted and observed; (b) Percentage errors of GA-LSSVM model and DES-LSSVM model.

  • Table 3. Comparison of performance between the DES-LSSVM and GA-LSSVM models.

    Model       Training performance (2013)            Testing performance (2014)
                RMSE     MAPE/%   R        AF          RMSE     MAPE/%   R      AF
    DES-LSSVM   0.3748   0.0434   0.9999   0.99984     0.0591   0.0042   1.00   1.00
    GA-LSSVM    0.7739   0.6690   0.9999   1.0114      0.7464   1.0539   1.00   1.01

    The high accuracy of the models (R close to 1) for both training and testing is unusually excellent. It is therefore important to clarify the context in which the models were implemented. Firstly, the accuracy of the training results was high because only one input variable was used for a 2-dimensional data space. In addition, the testing data sets had a lower standard deviation than the training data sets and were therefore more uniform, with a data range that falls almost entirely within that of the training data sets. This might also contribute to the high accuracy. A different result could be obtained with higher-dimensional data sets. In fact, since several factors influence landslide displacement, it is suggested that landslide displacement prediction be treated as a multi-factor problem in future work.
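    The two properties of the testing data noted above, lower spread and a range that lies mostly inside the training range, can be checked with a few lines. This is a minimal sketch: `range_coverage` is a hypothetical helper and the arrays are synthetic placeholders, not the 2013 and 2014 monitoring data.

```python
import numpy as np

def range_coverage(train, test):
    """Fraction of test values that fall inside the [min, max] range of the training values."""
    low, high = np.min(train), np.max(train)
    return float(np.mean((test >= low) & (test <= high)))

# Synthetic placeholder data: a wide "training" sample and a narrower "testing" sample.
rng = np.random.default_rng(2)
train_rain = rng.gamma(shape=2.0, scale=6.0, size=200)
test_rain = rng.gamma(shape=2.0, scale=3.0, size=200)

print("std train / test:", train_rain.std(), test_rain.std())
print("share of test values inside training range:", range_coverage(train_rain, test_rain))
```

    A coverage close to 1 means the model is mostly interpolating within conditions it has already seen, which tends to flatter test accuracy.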

    Acknowledgements

    Funding: this work was supported by the National Basic Research Program (973 Program) [grant numbers 2013CB733200, 2014CB744703] and by the Funds for Creative Research Groups of China [grant number 41521002].

    References

    Cai, Z., Xu,W., Meng, Y., Shi, C., Wang, R., 2015. Prediction of landslide displacement basedon GA-LSSVM with multiple factors. Bull. Eng. Geol. Environ. http://dx.doi.org/10.1007/s10064-015-0804-z.

    Chai, J., Du, J., Lai, K.K., Lee, Y.P., 2015. A hybrid least square support vectormachinemodelwith parameters optimization for stock forecasting. Math. Probl. Eng. 2015:1–7.http://dx.doi.org/10.1155/2015/231394.

    Chen, H.Q., Zeng, Z.G., 2013. Deformation prediction of landslide based on improved back-propagation neural network. Cogn. Comput. 5:56–62. http://dx.doi.org/10.1007/s12559-012-9148-1.

    Chen, C., Yan, C., Li, Y., 2015a. A robust weighted least squares support vector regressionbased on least trimmed squares. Neurocomputing 168:941–946. http://dx.doi.org/10.1016/j.neucom.2015.05.031.

    Chen, J., Zeng, Z., Jiang, P., Tang, H., 2015b. Deformation prediction of landslide based onfunctional network. Neurocomputing 149:151–157. http://dx.doi.org/10.1016/j.neucom.2013.10.044.

    Du, J., Yin, K., Lacasse, S., 2012. Displacement prediction in colluvial landslides, ThreeGorges Reservoir, China. Landslides 10:203–218. http://dx.doi.org/10.1007/s10346-012-0326-8.

    Feng, X.-T., Zhao, H., Li, S., 2004. Modeling non-linear displacement time series of geo-ma-terials using evolutionary support vector machines. Int. J. Rock Mech. Min. 41:1087–1107. http://dx.doi.org/10.1016/j.ijrmms.2004.04.003.

    Hodrick, R., Prescott, E.C., 1997. Postwar U.S. business cycles: an empirical investigation.J. Money Credit Bank. 29, 1–16.

    Houck, C., Joines, J., Kay, M., 1995. A Genetic Algorithm for Function Optimization: AMatlab Implementation (NCSU-IE TR).

    Huang, S.B., Cheng, Q., Hu, H.T., 2005. A study on distribution of Sichuan red beds and en-gineering environment characteristics. Highway 81–85 (in Chinese).

    Jiang, P., Chen, J., 2016. Displacement prediction of landslide based on generalized regres-sion neural networks with k-fold cross-validation. Neurocomputing 198:40–47.http://dx.doi.org/10.1016/j.neucom.2015.08.118.

    Li, X.Z., Kong, J.M., 2014. Application of GA–SVMmethodwith parameter optimization forlandslide development prediction. Nat. Hazard. Earth Syst. Sci. 14:525–533. http://dx.doi.org/10.5194/nhess-14-525-2014.

    Lian, C., Zeng, Z., Yao, W., Tang, H., 2012. Displacement prediction model of landslidebased on a modified ensemble empirical mode decomposition and extreme learningmachine. Nat. Hazards 66:759–771. http://dx.doi.org/10.1007/s11069-012-0517-6.

    Lian, C., Zeng, Z., Yao, W., Tang, H., 2015. Multiple neural networks switched prediction forlandslide displacement. Eng. Geol. 186:91–99. http://dx.doi.org/10.1016/j.enggeo.2014.11.014.

    Lv, Y., Liu, H., 2012. Prediction of landslide displacement using grey and artificial neuralnetwork theories. Adv. Sci. Lett. 11, 511–514.

    Samui, P., Kurup, P., 2012. Multivariate adaptive regression spline (MARS) and leastsquares support vector machine (LSSVM) for OCR prediction. Soft. Comput. 16:1347–1351. http://dx.doi.org/10.1007/s00500-012-0815-7.

    Sassa, K., Picarelli, L., Yin, Y.P., 2009. Monitoring, prediction and early warning. Landslides- Disaster Risk Reduction:pp. 351–375 http://dx.doi.org/10.1007/978-3-540-69970-5_20.

    Suykens, J., Vandewalle, J., 1999. Least squares support vector machines classifiers. Neural.Process. Lett. 9, 293–300.

    Suykens, J.A.K., Gestel, T.V., Brabanter, J.D., Moor, B.D., Vandewalle, J., 2002. Least SquaresSupport Vector Machines. World Scientific, Singapore.

    Tsividis, Y., 2010. Event-driven data acquisition and digital signal processing - a tutorial.IEEE Transactions on Circuit and Systems-II: Express Briefs. 57, pp. 577–581.

    Vapnik, V., Golowich, S.E., Smola, A., 1997. Support vector method for function approxi-mation, regression estimation, and signal processing. Adv. Neural Inf. Proces. Syst.9, 281–287.

    Xu, Q., Liu, H., Ran, J., Li, W., Sun, X., 2016. Field monitoring of groundwater responses toheavy rainfalls and the early warning of the Kualiangzi landslide in Sichuan Basin,southwestern China. Landslides:1–16 http://dx.doi.org/10.1007/978-3-540-69970-5_2010.1007/s10346-016-0717-3.

    Yin, G., Zhang, W., Zhang, D., Kang, Q., 2007. Forecasting of landslide displacement basedon exponential smoothing and nonlinear regression analysis. Rock Soil Mech. 28,1725–1728 (in Chinese).

    Zhang, Z.Y., Wang, S.T., Wang, L.S., 1994. The Analytical Principle in Engineering Geology.Beijing Geological Publishing House, Beijing, China.

    Zhou, C., Yin, K., Cao, Y., Ahmed, B., 2016. Application of time series analysis and PSO–SVMmodel in predicting the Bazimen landslide in the Three Gorges Reservoir. China. Eng.Geol. 204:108–120. http://dx.doi.org/10.1007/978-3-540-69970-5_2010.1016/j.enggeo.2016.02.009.
