
IN DEGREE PROJECT ARCHITECTURE, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2019

Artificial intelligence to model bedrock depth uncertainty

BEATRIZ MACHADO

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ARCHITECTURE AND THE BUILT ENVIRONMENT

TRITA-ABE-MBT-19205
© Beatriz Machado, 2019

Abstract

The estimation of bedrock level for soil and rock engineering is a challenge associated with many uncertainties. Nowadays, this estimation is performed by geotechnical or geophysical investigations. These methods are expensive and, because of limited budgets, are normally not used to their full extent. Hence, the bedrock levels between investigation points are roughly estimated and the uncertainty is largely unknown.

Machine learning (ML) is an artificial intelligence technique that uses algorithms and statistical models to make predictions for specific tasks. These mathematical models are built by dividing the data into training, testing and validation samples, so that the algorithm improves automatically based on past experience.

This thesis explores the possibility of applying ML to estimate bedrock levels and tries to find a suitable algorithm for the prediction and for the estimation of its uncertainty. Many different algorithms were tested during the process, and the accuracy level was analysed by comparison with the input data and also with interpolation methods, like Kriging.

The results show that the Kriging method is capable of predicting the bedrock surface with considerably good accuracy. However, when it is necessary to estimate the prediction interval (PI), Kriging presents a high standard deviation. The machine learning models produce a bedrock surface almost as smooth as Kriging, with better results for the PI. The Bagging regressor with decision tree was the algorithm most capable of predicting an accurate bedrock surface with a narrow PI.

    Keywords: Machine learning, Kriging, prediction, algorithm.


Preface

This research has been done as the final part of the Master of Science in Civil and Architectural Engineering programme at the Royal Institute of Technology. The thesis has been carried out for the Division of Soil and Rock Mechanics in collaboration with Tyréns Konsult AB, as part of the BIG and BeFo project "Rock and ground water including artificial intelligence". The research has been conducted under the supervision of Professor Stefan Larsson, to whom I would like to express my gratitude for all the support and guidance.

I would like to thank Mats Svensson and Chunling Shan at Tyréns Konsult AB for their guidance and constant support during the whole process. I would also like to thank Jennifer Wänseth at Tyréns Konsult AB for the opportunity to develop this project at the Stockholm office.

Finally, I would like to thank my family and friends for their unconditional love and support for my decisions.

    Stockholm, May 2019

    Beatriz Machado.


Nomenclature

    Abbreviations

    CI Confidence interval

    GBR Gradient boosting regressor

    KNN K nearest neighbour

    ML Machine learning

    PI Prediction interval

    PILj Lower prediction interval

    PIUj Upper prediction interval

    PLLj Lower prediction limit

    PLUj Upper prediction limit

    RF Random forest

    x test Data used for testing the algorithm

    x train Data used for training the algorithm

    XGBoost Extreme gradient boosting regressor

    y pred Output performed by the ML algorithm


    y test True response of the testing data

    y train True response of the training data

    Greek letters

α Parameter for level of accuracy in the prediction interval

ε Variance for Kriging interpolation (error)

    γi Weight of each sample point

    µ Membership grade

    τ Membership gamma function

    Other Symbols

    A Membership triangular function

    a Minimum value for the data points

    b Maximum value for the data points

    Ci Cluster division

    ej Error value associated with the instance j

    K Rate of growth in Gamma Function

    L Membership L function

    m Mean values of the data points

    n Number of clusters

    S Membership S function

    yj Output model

    z(xi) Measured value at the location

    z(xo) Prediction value for Kriging interpolation


List of Figures

2.1 Variogram example (Azavea, 2016)
2.2 Types of variograms used for Kriging on Python
2.3 Code for Python to perform Kriging
2.4 ArcMap software performing Kriging
2.5 MicroMine software performing Kriging
2.6 Training, validation and testing (Pouyan, 2018)
2.7 Bagging and boosting techniques (QuantDare, 2016)
2.8 Random forest for classification problems (Medium, 2018)
2.9 Decision Tree algorithm (Dataaspirant, 2017)
2.10 KNN prediction (Medium, 2018)
2.11 Comparison between random forest and GBR (Medium, 2017)
2.12 Comparison between random forest and GBR (Kaggle, 2018)
2.13 SVM non-linear and linear classification (Wikipedia, 2019)
2.14 Terminology for prediction interval (Durga and Dimitri, 2006)
2.15 Prediction interval computation (Durga and Dimitri, 2006)
2.16 S function diagram (Membership functions, 2012)
2.17 L function diagram (Membership functions, 2012)
2.18 Gamma function diagram (Membership functions, 2012)
2.19 Triangular function diagram (Membership functions, 2012)
3.1 Investigation points
4.1 Ordinary Kriging results
4.2 Universal Kriging results
4.3 Ordinary kriging performed in a grid of 1000 × 1000 m², where the small points represent the input data
4.4 Ordinary kriging performed in a grid of 500 × 500 m², where the small points represent the input data
4.5 Comparison of Kriging on Python, ArcMap and MicroMine
4.6 Comparison between machine learning algorithms, where the black lines represent the bedrock prediction and the red dots the true values
4.7 Prediction for Bagging regressor with Decision Tree
4.8 PI for Bagging regressor with Decision Tree
4.9 Prediction for Bagging regressor with KNN
4.10 PI for Bagging regressor with KNN
4.11 Prediction for Random forest
4.12 PI for Random forest
4.13 Prediction for Random forest
4.14 PI for Random forest
4.15 Prediction for KNN
4.16 Prediction interval for KNN
4.17 Prediction for GBR
4.18 Prediction interval for GBR
4.19 Prediction for GBR
4.20 Prediction interval for GBR
4.21 Prediction for XGBOOST
4.22 Prediction interval for XGBOOST
4.23 Comparison of Kriging with ML
A.1 Python code for Bagging regressor with Decision Tree
A.2 Python code for Bagging regressor with Decision Tree

Contents

1 Introduction
  1.1 Background
  1.2 Objectives
  1.3 Limitations
  1.4 Methodology
  1.5 Structure

2 Method
  2.1 Kriging
    2.1.1 Variogram models
    2.1.2 The kriging weights
    2.1.3 Types of kriging
    2.1.4 Kriging with commercial software and Python
      2.1.4.1 Python
      2.1.4.2 ArcMap
      2.1.4.3 MicroMine
  2.2 Machine learning
    2.2.1 Algorithms
      2.2.1.1 Random forest
      2.2.1.2 Decision Tree
      2.2.1.3 K Nearest Neighbour
      2.2.1.4 Gradient Boosting regressor
      2.2.1.5 XGBoost
      2.2.1.6 Support vector machine
    2.2.2 Uncertainty prediction
      2.2.2.1 Prediction interval for Random forest algorithm
      2.2.2.2 Prediction interval for GBR algorithm
      2.2.2.3 General uncertainty using the membership functions

3 Data

4 Results
  4.1 Kriging results
  4.2 Machine learning results
    4.2.1 Bagging regressor with Decision Tree - general approach to estimate the PI
    4.2.2 Bagging regressor with KNN - general approach to estimate the PI
    4.2.3 Random forest
    4.2.4 Random forest - general approach to estimate the PI
    4.2.5 K nearest neighbour - general approach to estimate the PI
    4.2.6 Gradient Boosting Regressor
    4.2.7 Gradient Boosting Regressor - general approach to estimate the PI
    4.2.8 XGBoost - general approach to estimate the PI
    4.2.9 Comparison of ML results
  4.3 Kriging and Machine Learning comparison

5 Discussion and Conclusions
  5.1 Kriging
  5.2 Machine learning
  5.3 Conclusions
  5.4 Suggestions for Further Research

References

Appendix A

1. Introduction

    1.1 Background

Artificial intelligence techniques have recently been applied to solve problems in geotechnical engineering, such as soil classification, settlement of structures, pile capacity, etc. One specific reason is the difficulty in defining soil material properties, besides the many assumptions and simplifications included in the traditional methods. Another reason is the considerable success in modelling complex problems with AI [18]. Geotechnical engineering deals with many uncertainties, such as the properties of the soils and the definition of the bedrock depth, which is a contrast with other civil engineering materials [19]. A good alternative for dealing with these complexities is to use machine learning and other AI techniques.

The level of the bedrock is valuable information both for the design and the construction phase of many projects. Unexpected bedrock-level conditions can increase risks, cause delays and affect cost estimations [4]. Knowing the depth of the bedrock accurately is therefore a delicate matter that can avoid inconvenience. To investigate the bedrock level, many soundings are normally performed around the area of interest, but the bedrock levels in the areas between investigations are only roughly assumed. Moreover, the techniques used to investigate the bedrock level are extremely expensive, so the goal is normally to perform an optimum amount of soundings.


Sometimes it is economically impossible to perform the necessary investigations, and at many points the bedrock level is only roughly estimated or totally unknown. The idea of obtaining an accurate prediction of the rock surface with the same number of boreholes is therefore valuable and attractive.

Such investigations are basically performed by geotechnical or geophysical methods [22]. When it comes to the estimation of a spatial surface, interpolation methods like Kriging, inverse distance weighting and Spline are normally used, but it is still difficult to estimate the interval of the predicted surface [25]. It is common practice in Sweden to perform soil-rock soundings to estimate the bedrock surface. This is an accurate method to determine the exact level of the bedrock, but the disadvantages are the high cost of drilling, the time-consuming process and the fact that it only gives the bedrock level at certain points [20]. If the required accuracy level can be achieved by an ML procedure, it is even possible to reduce the number of investigation boreholes and the associated costs. Nowadays ML techniques have been developed and applied to geotechnical problems. In [8], the promising results of AI applications to many different problems are discussed, such as pile bearing capacity, foundation settlements, subsurface characterization, prediction of geotechnical properties, etc.

Using software to simulate a model prediction is a simple and cheap way to quickly test numerous combinations of sample data, methods and techniques. In some software, like Python, it is even possible to perform both Kriging and machine learning without any license cost; the only requirement is a sufficiently large input data set for the simulations. Besides the fact that ML and Kriging are easily performed, another advantage is that with these methods it is possible to quantify the prediction intervals at the desired confidence level. Basically, the bedrock level uncertainty can be quantified and manipulated for the area of interest. The quality of the prediction and the accuracy of these methods is the aim of this master thesis.

Supervised learning aims to estimate the outputs of a given problem based on available data. By optimizing the algorithm parameters, more accurate results are expected. Many algorithms can be applied to predict the bedrock depth, but when it comes to the prediction interval, mostly tree-based methods are capable of defining it directly in the code.


A research article by Shrestha and Solomatine (2006) [16] suggests a general approach, based on a fuzzy theory method, to estimate the PI. This approach requires knowledge of the membership functions of the data subject to the uncertainty, but it promises good results.

This study has focused on finding an algorithm capable of predicting the bedrock surface with high accuracy while also estimating a PI that is as narrow as possible. The outcomes show the performance of different algorithms with different approaches to estimate the PI, including the general approach suggested by [16].

    1.2 Objectives

The main focus of this thesis is to apply machine learning techniques and test different algorithms to predict the bedrock surface accurately. The predictions aim at a narrow prediction interval that could be helpful in practice. After the simulations, the ML results have been compared with other methods, like Kriging, and an analysis has been made to decide which method is the most suitable for predicting the bedrock surface and the implied uncertainty.

This project has the following main objectives:

    - Compare Kriging performance on Python, ArcMap and MicroMine.

- Model uncertainties, implementing machine learning techniques to improve the prediction of the bedrock surface.

- Compare the machine learning algorithms to determine which one predicts the bedrock surface most accurately.

    - Compare machine learning results with Kriging.

Furthermore, an overall objective of this study is to establish ML as a common tool for rock and soil engineering.


    1.3 Limitations

Modelling a geological surface is a complex task that normally involves many simplifications. The model is a simplified representation of reality, so in general it does not capture all details of the bedrock surface. Any ML technique or interpolation method follows a certain mathematical logic that in practice can fail to represent the real rock mass distribution. The algorithm itself can have mathematical limitations that prevent it from representing the real behaviour of the bedrock.

Moreover, it is important to note that when using site investigation results there is always a chance of error. It can be human error producing wrong values for specific points, or wrongly calibrated equipment causing bias, or a data sample that does not represent the real site conditions. The data set used in this study does not include geological information and is not evenly distributed over the entire area, which can impact the ML predictions.

    1.4 Methodology

The tests for Kriging interpolation were carried out using the Python, ArcMap and MicroMine software. For comparison, Kriging was performed varying the size of the data set between the entire area of 400 km², 1000 × 1000 m² and 500 × 500 m².

The ML was performed in Python, and different algorithms were tested, including random forest (RF), decision tree, K nearest neighbour (KNN), gradient boosting regressor (GBR), XGBoost and support vector machine (SVM). Furthermore, a new approach to estimate the PI was tested.

Finally, the results for Kriging interpolation and machine learning were compared.


    1.5 Structure

The first chapter contains background information about the topic and explains the objectives and limitations of the study.

In chapter 2, the methods used in this research are explained. First, the Kriging interpolation method is explained, to give knowledge of how the process is calculated, for example the different Kriging types and variograms. Secondly, the machine learning algorithms are explained. Different algorithms are tested to find the ones that predict the bedrock levels with high accuracy and also estimate narrow prediction intervals.

Chapter 3 covers the data used in this research, describing the nature of the project.

In chapter 4, the results of the ML algorithms are compared with each other and also with the Kriging results.

Finally, chapter 5 discusses the results and summarizes the conclusions for the research question.


2. Method

This chapter describes the methods used in this research. The first section presents the Kriging method, including the variogram models, the Kriging weights, the types of Kriging and some software that can be used to perform Kriging. The second section covers machine learning, explaining the algorithms tested in this study and the approaches used to estimate the prediction interval.

    2.1 Kriging

Kriging is an interpolation method developed in the 1960s to estimate the distribution of certain characteristics from previous data. It is a geostatistical interpolation method based on the spatial correlation of the observed points. Based on this, it is possible to predict a surface and estimate the uncertainty of each interpolated value [12]. In soil science and geological engineering, Kriging is often used because of the spatial correlation between the points, which allows Kriging to produce good results [24].

The basis of interpolation by Kriging is the distance weighting technique. The process assumes that the distance between points implies a correlation, which is used to estimate an interpolated value as a weighted sum of contributions from n neighbours [12]. Nearby points should have a higher weight in the prediction. Thus, Kriging relies on a spatial structure that is modelled via a variogram [9].


The performance of the method is strongly dependent on the autocorrelation among the observed data. Basically, the data needs to be normally distributed, present no trends and be stationary.

The entire Kriging process includes three steps. First, the spatial variation in the sample data is identified. Secondly, the spatial variation is summarized by a mathematical function, a variogram. Finally, the interpolation weights are determined and the output values are estimated. The method is based on two main assumptions, stationarity and isotropy, to provide an unbiased prediction, though the strength of these assumptions varies with the Kriging type. According to [7], "things are the same everywhere"; this implies that for Kriging interpolation the distribution parameters of the data, such as the variance, are the same for the entire area of interest. This stationarity property means that all the calculations performed during the process use the same spatial covariance function and the same variogram.

The Kriging method contains some assumptions that can affect and limit the accuracy of the results. The weights of the Kriging interpolator are defined by the modelled variogram, so Kriging is very sensitive to a variogram model choice that does not fit the data. Besides, the assumptions of the Kriging model being stationary and isotropic are difficult to satisfy in a real environment. Furthermore, the accuracy of Kriging interpolation is limited if the sampled data is small or not correlated.

    2.1.1 Variogram models

The variogram is a geostatistical tool used to analyze the relation between spatial data and forms the basis for Kriging [24]. A variogram is a representation of the covariance between a mathematical function and the observed data; it is a particular function, with appropriate mathematical properties, that tries to estimate the variation of the data with distance [12]. The data points are plotted against distance, and the chosen variogram model is the function that best fits the data distribution. At close distances the correlation between data points is high, and it gradually decreases until it disappears completely at an infinite distance, as seen in figure 2.1.


    Figure 2.1: Variogram example (Azavea, 2016)

In order to make a prediction with the Kriging interpolation method, it is necessary to uncover the dependency rules of the data to create the variogram. In this way, the relation between two points does not depend on the absolute geographical location but on the relative location [23]. With these relations, the statistical dependence can be estimated, i.e. values that depend on the model autocorrelation. So, the data is used twice: first to estimate the spatial autocorrelation and second to make the predictions.

The prediction of the unknown values is influenced by the variogram model, especially when the shape of the function differs significantly from the points. The model choice depends on the spatial nature of the data and on the phenomenon being studied, so the prediction can have good or bad accuracy depending on the model used. This choice should yield the "best" model of uncertainty; normally different functions are applied to the data set and their performance is compared [6]. Based on the variogram model, Kriging can estimate the weights for the calculation process, so that the autocorrelation between point data and prediction can be quantified.


The choice of the variogram is defined by the user, but some techniques, such as maximum likelihood or Bayesian methods, can support this decision. In Python, five different types of variograms are offered: circular, spherical, exponential, linear and Gaussian functions, as seen in figure 2.2.

    Figure 2.2: Types of variograms used for Kriging on Python


    2.1.2 The kriging weights

The weights for Kriging interpolation are calculated based on the variogram. In general, points close to the area of interest are given a higher weight than points located farther away. So, up to a certain range the points have a higher influence on each other, and the level of this influence varies with distance until, beyond a certain range, no influence is exerted at all. Clusters of points in the data set are also considered: basically, a cluster of points has less weight than single points, since it contains less information. These considerations reduce bias in the predictions, making Kriging an "exact" interpolator, which means that each interpolation is calculated so as to minimize the prediction error.

The prediction for each interpolated point is calculated according to the spatial characteristics of that specific area in reference to all data. The same variogram is applied to the entire area and the weight variation is assumed to depend on the distance [12]. In this way, the weight for each point is determined by the variogram and applied to the sample points, so the prediction can be calculated according to equation (2.1).

z(x_o) = \sum_{i=1}^{n} \gamma_i z(x_i) \qquad (2.1)

Where the prediction value z(x_o) is equal to the sum of the weights γ_i of each sampled point times the measured value z(x_i) at that location.

The sum of all the weights used for a prediction has to be equal to one, as shown in equation (2.2).

\sum_{i=1}^{n} \gamma_i = 1 \qquad (2.2)


    2.1.3 Types of kriging

There are several sub-types of Kriging; in this section the most common ones are discussed briefly. Basically, all Kriging types are based on the general equation (2.3); the difference lies in how the mean µ and the variance ε are considered.

z(x_o) = \gamma_1 z(x_1) + \gamma_2 z(x_2) + \dots + \gamma_n z(x_n) + \epsilon \qquad (2.3)

\mu = \gamma_1 z(x_1) + \gamma_2 z(x_2) + \dots + \gamma_n z(x_n) \qquad (2.4)

z(x_o) = \mu + \epsilon(x_i) \qquad (2.5)

Where the prediction value z(x_o) is equal to the sum of the mean µ and the variance ε.

    • Ordinary Kriging

Ordinary Kriging assumes a model where both the mean µ and the variance ε are unknown constant values across the spatial distribution, as seen in equation (2.5).

This is one of the simplest forms of Kriging, but one of the main issues concerning ordinary Kriging is whether the assumption of a constant mean is reasonable.

    • Simple Kriging

In simple Kriging the mean µ is assumed to be a known constant that does not vary spatially, so the field is assumed to be isotropic. This assumption is often unrealistic, but depending on the case it can make sense that a model gives a known trend. Hence, if the value of the variance is known, it is easier to estimate the autocorrelation.


    • Universal Kriging

For universal Kriging the mean value µ is assumed to be a deterministic function. Here the stationarity assumption is relaxed, allowing the mean to differ in a deterministic way depending on the location of the prediction, while the variance ε is held constant across the entire field. Conceptually, the autocorrelation is now modelled from the random errors ε(s). In fact, this is what universal Kriging is: a regression with the spatial coordinates where, instead of assuming that the errors ε(s) are independent, they are modelled as autocorrelated.

Some disadvantages of universal Kriging are that the process is more complex than other Kriging methods and consequently requires more computational effort.

    • Cokriging

Ordinary cokriging extends the analysis to more variables at the same time. It is a useful method when the variables are known to be spatially associated. Here, both means µ_1 and µ_2 are unknown constants, as seen in equation (2.6).

z_1(x_o) = \mu_1 + \epsilon_1(x_i), \qquad z_2(x_o) = \mu_2 + \epsilon_2(x_i) \qquad (2.6)

With this assumption, both variances are also estimated. The prediction of z_1 is exactly like ordinary Kriging, but the covariance information from z_2 is used in order to obtain a better prediction.


2.1.4 Kriging with commercial software and Python

For comparison, the Kriging interpolation process was performed in three different software packages, Python, ArcMap and MicroMine, always using the same data set. The comparison was made on input data distributed over an area of 500 × 500 m², with the same coordinates in all software. The area of interest for plotting was between 147000 and 147500 meters on the easting axis and 6567000 and 6567500 meters on the northing axis. The grid spacing for each interpolation is a user-defined parameter, and in this study a 5 meter spacing was used.

    2.1.4.1 Python

Python is a programming language that can be used in many different applications. Python is free of cost and supports many different modules and packages. Kriging can be easily performed in Python by installing the library "pykrige". This library offers many different types of Kriging, including ordinary and universal Kriging. Both ordinary and universal Kriging were performed with four different variogram models for comparison. The variograms used were spherical, exponential, Gaussian and linear. After analyzing the results, the exponential variogram was chosen for the further simulations because of its better prediction accuracy. Figure 2.3 shows the code for performing Kriging in Python.

    Figure 2.3: Code for Python to perform Kriging
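The original listing in figure 2.3 is not reproduced here; the following is a minimal sketch of how ordinary Kriging can be run with pykrige under the set-up described above (exponential variogram, 5 m grid). The synthetic borehole arrays and the grid limits are illustrative assumptions, not the exact values or script used in the thesis.

    import numpy as np
    from pykrige.ok import OrdinaryKriging

    # synthetic placeholder data standing in for easting, northing and bedrock depth
    rng = np.random.default_rng(0)
    x = rng.uniform(147000.0, 147500.0, 200)   # easting [m]
    y = rng.uniform(6567000.0, 6567500.0, 200) # northing [m]
    z = rng.uniform(1.0, 17.0, 200)            # soil depth to bedrock [m]

    ok = OrdinaryKriging(
        x, y, z,
        variogram_model="exponential",  # variogram chosen in the thesis
        verbose=False,
        enable_plotting=False,
    )

    # 5 m grid over an assumed 500 x 500 m area of interest
    grid_x = np.arange(147000.0, 147500.0, 5.0)
    grid_y = np.arange(6567000.0, 6567500.0, 5.0)

    # z_pred is the interpolated surface, sigma_sq the kriging variance at each grid node
    z_pred, sigma_sq = ok.execute("grid", grid_x, grid_y)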


The interpolation process was performed with varying data set extent and grid size. The aim was to compare the standard deviation of Kriging and to analyze the predicted surface for different data distributions:
• 1st interpolation – entire data set and grid of 400 km²
• 2nd interpolation – data and grid from 1000 × 1000 m²
• 3rd interpolation – data and grid from 500 × 500 m²

    2.1.4.2 ArcMap

ArcMap is part of a geospatial software suite called ArcGIS, normally used to visualize, process, edit and create geospatial data. Performing Kriging in ArcMap is a simple process: after importing the input data points, Kriging interpolation can be selected from the spatial analyst tools, as seen in figure 2.4.

    Figure 2.4: ArcMap software performing Kriging


    2.1.4.3 MicroMine

MicroMine is a commercial software package used in mining engineering, which offers options for modelling, estimation and project design. Kriging can be performed in MicroMine simply by choosing the "Grid" option. After importing the data points, a grid can be created using the Kriging interpolation method, as shown in figure 2.5.

    Figure 2.5: MicroMine software performing Kriging


    2.2 Machine learning

Artificial intelligence traditionally refers to an artificial creation of human intelligence that can learn, reason, plan, perceive, or process natural language [26]. Machine learning is a type of AI where algorithms are used to solve a given problem. Many different algorithms have been developed to solve tasks both for classification and regression problems. So, instead of producing sequential code, ML allows the computer to solve problems that could not be manually programmed, like photo recognition or translating pictures into sounds.

According to [8], ML is an empirical approach where a computer program learns from a data set without the need to code the problem and a procedure to solve it. The number of ML applications is growing rapidly in all fields of engineering. The overall idea of ML is to make the computer learn by generating a new set of rules, based on inferences from previous data. The process includes training, testing and validation of the data. Training data is used for the learning process of the algorithm, and the test data is used for checking whether the predictions are accurate or not. After that, the validation data is used to verify the outcomes. The split of the data set depends on the problem studied and is user defined. The optimum split can be found by testing different percentages for the training, testing and validation data sets. In fact, the more data available, the more the algorithm can learn [14]. Figure 2.6 illustrates the most common percentages for splitting the data.

    Figure 2.6: Training, validation and testing (Pouyan, 2018)
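As a minimal sketch of the split described above, the data can be divided with scikit-learn. The 80/20 proportion matches the training/test split used for the borehole data in chapter 3; the variable names X and y are assumptions for the coordinates and the bedrock depth.

    import numpy as np
    from sklearn.model_selection import train_test_split

    # placeholder arrays: X holds (easting, northing), y the bedrock depth
    X = np.random.rand(100, 2)
    y = np.random.rand(100)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )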


One perspective on ML is that it involves searching possible hypotheses to determine the one that best fits the observed data. Each algorithm (e.g. linear regression, decision trees, ANN) represents a different learning process with a different underlying structure to organize the hypotheses [11]. Although machine learning includes many different techniques, it is typically categorized into three types [21]:
• Supervised learning: the outputs of the algorithm are already in the data set. This means that some of the outputs of the machine learning can be tested and validated.
• Unsupervised learning: in this case the machine has no information about the outcome values and the results for the problems are unknown. This makes unsupervised learning a very challenging type of ML.
• Reinforced learning: the algorithm interacts with the environment, which provides feedback in terms of rewards and punishments until the output is accurate.

In this study, supervised machine learning was used, and ensemble techniques like bagging and boosting were tested. Using techniques like bagging and boosting helps to decrease the variance and increase the robustness of the model. The term ensemble is used when many algorithms are combined to give a better final prediction; combinations of multiple classifiers decrease variance and produce a more reliable classification than a single classifier [10].

To understand these techniques, two concepts need to be defined:
• Bagging consists of independent predictors running in parallel, which are combined without preference for any model. The final predictions are an average output obtained by combining the predictions from all the models, giving a more generalized result [10].
• Boosting is a sequential process in which the model predictors are not independent but formed sequentially. It works as a group of algorithms with weighted averages, turning individual "weak" learners into "stronger" learners. Each subsequent model attempts to correct the errors of the previous model, dictating the features the next model will focus on [10].

Figure 2.7 shows the difference between bagging and boosting; a short code sketch of the two approaches is given after the figure.


    Figure 2.7: Bagging and boosting techniques (QuantDare, 2016)
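The distinction above can be made concrete with scikit-learn, where bagging and boosting are available as ready-made ensemble regressors. This is a sketch with assumed hyper-parameter values, not the exact configuration used in the thesis; it also assumes X_train and y_train from the split shown in section 2.2, and in older scikit-learn versions the BaggingRegressor argument is called base_estimator instead of estimator.

    from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
    from sklearn.tree import DecisionTreeRegressor

    # bagging: many independent trees fitted in parallel, predictions averaged
    bagging = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=100)

    # boosting: shallow trees fitted one after another on the residuals of the previous ones
    boosting = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)

    bagging.fit(X_train, y_train)
    boosting.fit(X_train, y_train)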

    2.2.1 Algorithms

This section briefly explains all the algorithms tested in this study.

    2.2.1.1 Random forest

Random forest is an ensemble learning method used for supervised learning; it combines different models in order to create a more accurate and stable one. The name "random forest" comes from the fact that this algorithm combines decision trees with a bagging classifier, resulting in a "forest", and it is used both for classification and regression. The randomness of the method comes from the great number of trees: instead of selecting the most important feature when splitting a node, it tests random combinations of features [1]. The results are therefore more diverse, which generally yields a better model.

The performance of random forest is very hard to beat; it is fast and can handle different feature types such as binary, numerical and categorical [1]. One of the problems of this algorithm is over-fitting, which is when the model fits the training data but does not fit the test data. To avoid over-fitting, the main requirement is to optimize the parameter that governs the number of features randomly chosen to grow each tree from the data.


On the other hand, if the model contains too many trees it can become ineffective for real-life predictions. It is necessary to find a balance between accuracy requirements and time performance that satisfies the prediction.

Figure 2.8 shows the steps performed by the RF algorithm. The first step is to select random records from the data set; subsequently, a specified number of decision trees are built based on these records. For regression problems, each tree in the forest predicts a value for the output and the final value is calculated as the average of all outputs in the forest. For classification problems, each tree predicts a category as output and the new prediction is assigned to the category with the most votes.

    Figure 2.8: Random forest for classification problems (Medium, 2018)


    2.2.1.2 Decision Tree

Decision tree is a type of supervised learning method that uses discrete target-valued functions represented by a tree [11]. It is a very common algorithm used both for classification and regression problems. The algorithm has a mathematical logic that tries to imitate the human brain making decisions. Decision tree classifications are made with the majority of votes over the total number of trees [5]. Every instance is classified or valued starting at the root of the tree and ending at the leaves. At the beginning of the process the whole training data is considered as the root, and with every decision the data is further classified. Each node in a tree represents an attribute, the links represent decisions and the leaves are the outcomes, as shown in figure 2.9.

Decision tree learning is a robust method; good results are obtained even when the training data contains errors or part of the data samples is unknown. Another advantage of this algorithm is that, in addition to the outputs, it can also measure the classification margin (prediction intervals). On the other hand, it can often create very complex trees that do not generalize the data.

    Figure 2.9: Decision Tree algorithm (Dataaspirant, 2017)


    2.2.1.3 K Nearest Neighbour

KNN is a type of supervised machine learning algorithm capable of performing complex classification and regression tasks. The classification is defined by the training data without any additional information. KNN is a typical example of a lazy learner, not because of its simplicity but because it does not learn a discriminative function from the training data. The prediction for the testing data is determined according to the neighbours, and the category with the highest probability wins [28]. In the KNN process there is no training phase; rather, all data is used in the prediction. This means that KNN does not assume any previous behaviour of the data set, making it a non-parametric algorithm. This is useful, since many problems do not present a uniform or linear distribution.

In this method the choice of an optimum K value is very important and significantly influences the algorithm results. This choice can be made by inspecting the data or with cross-validation. Generally, a large K value gives more precise results, and an optimum value for K normally lies between 3 and 10. Figure 2.10 shows the logic behind KNN. A specified distance, which can be Euclidean or Manhattan, from a new data point to all training points is computed, and the predicted output is the value with the highest probability among the nearest neighbours.

    Figure 2.10: KNN prediction (Medium, 2018)
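A common way to pick K in the 3-10 range mentioned above is a cross-validated grid search; the snippet below is a sketch with assumed data names (X_train, y_train from the split in section 2.2), not the selection procedure actually reported in the thesis.

    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsRegressor

    # try K = 3..10 with 5-fold cross-validation and keep the best value
    search = GridSearchCV(
        KNeighborsRegressor(),
        param_grid={"n_neighbors": list(range(3, 11))},
        cv=5,
    )
    search.fit(X_train, y_train)
    best_k = search.best_params_["n_neighbors"]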


    2.2.1.4 Gradient Boosting regressor

Gradient boosting regressor is an ensemble machine learning technique, normally combining a boosting algorithm with decision trees, that can be used both for regression and classification problems [10]. As a boosting method, GBR combines several "weak learners", trained sequentially, into a single "strong learner". Following this logic, GBR is an iterative process where every predictor learns from the mistakes of the previous one, strengthening the model based on the error level. The predictions are updated until the residuals no longer present patterns (the errors are close to zero) and consequently the outputs are accurate. The final predictions are defined by the majority of votes of the "weak learners", weighted by their accuracy level. The boosting procedure normally increases the speed of the predictions, since fewer iterations are needed to achieve the actual predictions.

Figure 2.11 shows the differences between bagging and boosting techniques, using the random forest and GBR algorithms as examples.

    Figure 2.11: Comparison between random forest and GBR (Medium, 2017)


    2.2.1.5 XGBoost

Extreme Gradient Boosting (XGBoost) is a variation of gradient boosting with alterations that increase the performance. This method is normally faster than other boosting techniques and presents higher performance. Several features have been added, such as extra randomization parameters, proportional shrinking of leaf nodes, etc. The extra randomization reduces the correlation between trees, and the lower the correlation among classifiers, the better the ensemble algorithm turns out. The proportional shrinking of leaf nodes allows XGBoost to have a greater number of terminal nodes, with weights calculated for the trees. Compared to gradient boosting, XGBoost is faster but its application range is more limited. As a boosting technique, every new prediction improves on the residuals of the previous prediction, minimizing the error [27]. Figure 2.12 demonstrates the steps performed by XGBoost.

    Figure 2.12: Comparison between random forest and GBR (Kaggle, 2018)
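XGBoost exposes a scikit-learn style interface; the snippet below is a minimal, illustrative configuration in which the hyper-parameter values are assumptions, not those used in the thesis, and X_train, y_train, X_test come from the split in section 2.2.

    from xgboost import XGBRegressor

    # subsample and colsample_bytree add the extra randomization described above
    xgb = XGBRegressor(
        n_estimators=500,
        learning_rate=0.05,
        max_depth=4,
        subsample=0.8,
        colsample_bytree=0.8,
    )
    xgb.fit(X_train, y_train)
    y_pred = xgb.predict(X_test)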


    2.2.1.6 Support vector machine

Support vector machine is an ML technique that analyses data and is used for classification and regression analysis. From the training data, the SVM algorithm builds a model that divides new points into categories, which makes SVM a non-parametric classifier. The model divides points in space in such a way that the categories are separated with the maximum margin. Basically, a hyperplane or set of hyperplanes is constructed that separates the different data sets. The separation should create the largest possible distance to the nearest training points of any class, because this minimizes the number of misclassification errors. The trade-off between margin and error is controlled by the constant "C", which is user defined [13].

One of the problems with SVM is the choice of the kernel function, the loss function and appropriate parameters that allow the performance to generalize. However, from a practical point of view the most serious problem with SVM is the algorithmic complexity and the memory requirements for large tasks. In addition to linear classification, SVM can also perform non-linear classification depending on the kernel choice, as seen in figure 2.13.

    Figure 2.13: SVM non-linear and linear classification (Wikipedia, 2019)
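For the regression case used in this study, the scikit-learn SVR estimator takes the kernel and the constant C directly; the parameter values below are illustrative assumptions only, and X_train, y_train, X_test are assumed from the split in section 2.2.

    from sklearn.svm import SVR

    # RBF kernel for the non-linear case; C controls the margin/error trade-off
    svr = SVR(kernel="rbf", C=10.0, epsilon=0.5)
    svr.fit(X_train, y_train)
    y_pred = svr.predict(X_test)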


    2.2.2 Uncertainty prediction

This section explains the different approaches used to estimate the prediction interval. First, it is necessary to define the terminology that is used. Figure 2.14 shows the definitions used, according to Shrestha and Solomatine (2006) [16].

    Figure 2.14: Terminology for prediction interval (Durga and Dimitri, 2006)


    2.2.2.1 Prediction interval for Random forest algorithm

With the random forest algorithm it is possible to estimate the prediction interval for every prediction using a function. This function assigns an upper and a lower prediction limit for the entire length of the test data (x test) and subsequently for all points of the grid. The RF algorithm is then used to predict the intervals, using the desired percentage for the confidence interval. After the estimation of the upper and lower prediction limits, the accuracy of the intervals is tested. The accuracy is calculated by checking whether the true values (y test) are inside the PI [2], as seen in equation (2.7).

Lower limit ≤ y test ≤ Upper limit    (2.7)
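One way to obtain such limits from a fitted random forest, shown here as a sketch and not necessarily the exact function used in the thesis, is to take percentiles over the predictions of the individual trees and then check the coverage of equation (2.7). It assumes rf is a RandomForestRegressor already fitted on the training data and X_test, y_test are the test data.

    import numpy as np

    def rf_prediction_interval(forest, X, alpha=0.05):
        # predictions of every individual tree, shape (n_trees, n_samples)
        per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
        lower = np.percentile(per_tree, 100 * alpha / 2, axis=0)
        upper = np.percentile(per_tree, 100 * (1 - alpha / 2), axis=0)
        return lower, upper

    # coverage: share of true values that fall inside the interval, cf. equation (2.7)
    lower, upper = rf_prediction_interval(rf, X_test)
    coverage = np.mean((y_test >= lower) & (y_test <= upper))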

    2.2.2.2 Prediction interval for GBR algorithm

The gradient boosting regressor algorithm has a parameter that allows the estimation of the prediction interval. When this parameter is set to "quantile", the algorithm can be applied to predict the upper and lower prediction limits. The prediction is made for the length of the test data (x test) and also for the grid. After predicting the intervals, the parameter is reset to its default value and the prediction of the outputs can be performed. The same accuracy analysis is performed using equation (2.7).
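In scikit-learn this corresponds to the loss="quantile" option of GradientBoostingRegressor, with the alpha argument selecting the quantile. The sketch below trains one model per limit plus one for the central prediction; data names and the 95% level are assumptions.

    from sklearn.ensemble import GradientBoostingRegressor

    alpha = 0.05  # 95% prediction interval

    gbr_lower = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2)
    gbr_upper = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2)
    gbr_mean = GradientBoostingRegressor()  # default loss for the central prediction

    lower = gbr_lower.fit(X_train, y_train).predict(X_test)
    upper = gbr_upper.fit(X_train, y_train).predict(X_test)
    y_pred = gbr_mean.fit(X_train, y_train).predict(X_test)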

2.2.2.3 General uncertainty using the membership functions

A new technique for estimating the prediction interval for non-linear regression was proposed by Shrestha and Solomatine (2006) [16]. This method is based on the idea that the residuals (errors) between the model outputs and the true values are the best indicator of the discrepancy of the model. The residuals give valuable information that can be used to estimate the model uncertainty, since the errors normally follow functions that can be modelled. The approach is based on a fuzzy clustering method and requires knowledge of the membership functions that quantify the uncertainty [16].


The process of clustering involves the separation of the data into homogeneous groups according to a similarity measure. In traditional "k-means" clustering, each data point is assumed to belong to exactly one cluster. In the cases where one point belongs to several clusters, this is interpreted as "fuzzy" membership in a cluster, or "c-means". So, a point can belong to many clusters with some degree (membership grade) in the range 0 to 1 [17].

The first step of the process is clustering the data, which involves partitioning the data according to the different distributions of historical residuals. In each cluster it is assumed that the residuals have the same distribution. Membership functions are then used to indicate the degree to which the data points belong to the different clusters; this is determined based on a distance function. The accuracy of different membership functions has to be tested, because some points that obviously belong to one cluster (C1) are sometimes attributed to another cluster (C2). These points are the result of noisy data or wrong measurements and are normally called "aliens" [3].

In order to construct a prediction interval with (1 − α)100% confidence, the percentile values for the lower and upper limits are (α/2)100% and (1 − α/2)100%, as shown in figure 2.15. A typical value for α is 0.05, which corresponds to a PI of 95%. In the simulations performed, a crisp clustering was assumed, so each instance belongs to exactly one cluster, which makes the computation straightforward.

    Figure 2.15: Prediction interval computation (Durga and Dimitri, 2006)


In this study, four membership functions were tested: the S, L, gamma and triangular functions, according to [15]. These functions indicate the degree to which each data point belongs to a cluster.

    - S function

Figure 2.16: S function diagram (Membership functions, 2012)

S(x) =
\begin{cases}
0 & \text{if } x \le a \\
2\left(\dfrac{x-a}{b-a}\right)^{2} & \text{if } x \in (a, m) \\
1 - 2\left(\dfrac{x-b}{b-a}\right)^{2} & \text{if } x \in (m, b) \\
1 & \text{if } x \ge b
\end{cases}
\qquad (2.8)

    - L function

    Figure 2.17: L function diagram (Membership functions, 2012)

L(x) =
\begin{cases}
1 & \text{if } x \le a \\
\dfrac{b-x}{b-a} & \text{if } a < x \le b \\
0 & \text{if } x > b
\end{cases}
\qquad (2.9)


    - Gamma function

    Figure 2.18: Gamma function diagram (Membership functions, 2012)

\tau(x) =
\begin{cases}
0 & \text{if } x \le a \\
\dfrac{k(x-a)^{2}}{1 + k(x-a)^{2}} & \text{if } x > a
\end{cases}
\qquad (2.10)

    - Triangular function

    Figure 2.19: Triangular function diagram (Membership functions, 2012)

A(x) =
\begin{cases}
0 & \text{if } x \le a \\
\dfrac{x-a}{m-a} & \text{if } x \in (a, m) \\
\dfrac{b-x}{b-m} & \text{if } x \in (m, b) \\
0 & \text{if } x \ge b
\end{cases}
\qquad (2.11)
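The membership functions are straightforward to implement; the sketch below codes two of them, the S function of equation (2.8) and the triangular function of equation (2.11), with NumPy. The parameters a, m and b are user-chosen, as in the equations above.

    import numpy as np

    def s_function(x, a, m, b):
        # S-shaped membership, equation (2.8): rises from 0 at a to 1 at b
        x = np.asarray(x, dtype=float)
        return np.select(
            [x <= a, x <= m, x <= b],
            [0.0,
             2 * ((x - a) / (b - a)) ** 2,
             1 - 2 * ((x - b) / (b - a)) ** 2],
            default=1.0,
        )

    def triangular_function(x, a, m, b):
        # triangular membership, equation (2.11): peaks at m
        x = np.asarray(x, dtype=float)
        return np.select(
            [x <= a, x < m, x <= b],
            [0.0, (x - a) / (m - a), (b - x) / (b - m)],
            default=0.0,
        )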


After calculating the membership functions for each instance, the following expressions can be used to find the lower and upper prediction intervals for each cluster.

PIC^{L}_{i} = e_j \qquad (2.12)

j: \; \sum_{k=1}^{j} \mu_{i,k} < \frac{\alpha}{2} \sum_{j=1}^{n} \mu_{i,j} \qquad (2.13)

PIC^{U}_{i} = e_j \qquad (2.14)

j: \; \sum_{k=1}^{j} \mu_{i,k} < \left(1 - \frac{\alpha}{2}\right) \sum_{j=1}^{n} \mu_{i,j} \qquad (2.15)

Where j is the largest index for which the inequalities above are still satisfied, e_j is the error associated with instance j, and µ_{i,j} is the membership grade of instance j in cluster i.

So, the prediction interval for each instance is calculated from the cluster intervals using equations (2.16) and (2.17):

PI^{L}_{j} = \sum_{i=1}^{c} \mu_{i,j} \, PIC^{L}_{i} \qquad (2.16)

PI^{U}_{j} = \sum_{i=1}^{c} \mu_{i,j} \, PIC^{U}_{i} \qquad (2.17)

Where PI^L_j and PI^U_j are the lower and upper prediction intervals for instance j. Once the lower and upper prediction intervals for each input instance are obtained, the prediction limits can be calculated by simply adding the model output, as seen in equations (2.18) and (2.19).

PL^{L}_{j} = y_j + PI^{L}_{j} \qquad (2.18)

PL^{U}_{j} = y_j + PI^{U}_{j} \qquad (2.19)

Where PL^L_j and PL^U_j are the lower and upper prediction limits for instance j.
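Under the crisp-cluster simplification mentioned above (each instance belongs to exactly one cluster, so µ_{i,j} is either 0 or 1), equations (2.12)-(2.17) reduce to taking error percentiles per cluster. The sketch below illustrates this simplified case; the function and variable names are assumptions, not the thesis code (the code actually used for the bagging regressor is reproduced in Appendix A).

    import numpy as np

    def crisp_cluster_pi(residuals, labels, alpha=0.05):
        # per-cluster error percentiles: cluster id -> (PIC_L, PIC_U)
        limits = {}
        for c in np.unique(labels):
            e = residuals[labels == c]
            limits[c] = (
                np.percentile(e, 100 * alpha / 2),
                np.percentile(e, 100 * (1 - alpha / 2)),
            )
        return limits

    # residuals of the trained model on the training data, e.g. y_train - model.predict(X_train),
    # and labels from a clustering of the inputs (e.g. KMeans); then for a new prediction y_pred
    # in cluster c the limits follow equations (2.18)-(2.19):
    #   PL_L = y_pred + limits[c][0]
    #   PL_U = y_pred + limits[c][1]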


3. Data

The data used in this study comes from the project Tvärförbindelse Södertörn. In total, 2788 boreholes were used, with data interpreted from different geotechnical investigations and samplings. Many methods were performed, including soil/rock probing, ram sounding, stick sounding, weight sounding, percussion sounding with registration and cone penetration testing. Figure 3.1 shows the location of all the investigated points in the study area. The blue points represent the training data set (80%) and the red crosses show the locations of the test data set (20%). All the points were used for the machine learning prediction, and the plotted area was focused on an area of 500 × 500 m².

    Figure 3.1: Investigation points


Table 3.1 shows part of the data set used in this study. All 2788 boreholes have defined "X" and "Y" coordinates, being easting and northing respectively. The "Z" column represents the bedrock depths in meters, the "GL" column is the visible ground level and the "Method" column lists the geotechnical investigations performed.

Index     Northing [Y]   Easting [X]   Depth [Z]   GL [m]   Method

14W214    6567125.313    146984.352    36.744      5.00     Jb2
14W23     6567210.691    147002.234    27.72       11.28    Jb2 Cpt Tolk
13G353    6572663.525    144929.250    20.063      9.00     Jb2 HfA Tolk
14W300    6567527.766    147314.885    28.066      2.90     Jb2 Prov
11W200    6569457.940    144146.838    25.844      11.24    Jb2 Tolk
17T678    6567858.466    147550.560    6.782       18.52    Jb2 Vim Cpt Prov Tolk
17T610    6560015.246    157000.717    9.551       29.60    Jb2 Vim Cpt Tolk
17T1024   6571089.206    142530.397    -47.534     49.92    Jb2 Vim HfA Tolk
14W74     6567063.022    146900.738    29.91       7.37     Jb2 Vim Tolk
09W069    6565894.627    149377.578    16.225      10.26    Cpt
14W25     6567200.477    146973.462    34.613      1.70     Cpt Prov Tolk
73SJ48    6567167.430    147054.191    23.95       8.05     HfA
09T043    6571113.977    143359.358    11.504      12.95    HfA Prov
85SC112   6566472.151    146489.340    32.4        4.00     Jb
85SC73    6566560.708    146553.202    28.3        7.60     Jb Slb
17T1049   6571050.399    142753.078    -49.859     52.13    Jb3
16T091    6567612.797    147670.792    16.877      8.32     Jb3 Tolk
12G832    6572776.100    144989.880    19.488      9.15     Slb
09W046    6566496.366    149108.121    32.87       4.90     Slb Cpt
70SJ10    6567247.890    147113.706    28.7        4.00     Sti
12W67     6566409.104    146190.435    34.402      4.28     T
09V043    6567548.534    147982.740    15.265      8.80     Vim
B9        6567090.975    146979.891    26.425      5.50     Vim HfA
72SGI224  6567661.627    147336.497    19.9        6.10     Vim Slb


4. Results

The results for Kriging interpolation and machine learning are presented in this chapter. The first section contains the results for Kriging and all the comparisons performed for that part. The second part includes the results for machine learning and all the algorithms tested in this study. The last section consists of the comparison between the Kriging and ML results.

    4.1 Kriging results

In this thesis both ordinary and universal Kriging are studied, comparing different variograms for each method.

Figures 4.1 and 4.2 show the results for ordinary and universal Kriging performed with the entire data set. The variograms used in the simulations were spherical, exponential, Gaussian and linear. From the figures, it is possible to see that ordinary and universal Kriging present similar results, and that the exponential variogram was the mathematical function that best fitted the data. The predicted surface for the exponential variogram varies according to the input data (colored points), as shown in the upper left corner of each figure. The linear and Gaussian variograms failed to represent the data, presenting almost no variation in the bedrock depth. For this reason all the other interpolations were done using the exponential variogram.


    Figure 4.1: Ordinary Kriging results

    Figure 4.2: Universal Kriging results


Comparisons of the predicted surfaces for different areas were also carried out. The results are shown in figures 4.3 and 4.4 for areas of 1000 × 1000 m² and 500 × 500 m² respectively. As shown in the figures, Kriging performed on the small area of 500 × 500 m² gives a prediction surface that is closer to the input data (colored dots). This is because the input data points were more evenly distributed, so the interpolated prediction surface was very detailed and took almost all input data into account. The bottom-right graphs show the histograms of predicted intervals for all the grid points; for example, in figure 4.3 the histogram shows that the PI varies between 15.20 and 32.7 meters, while in figure 4.4 the PI lies between 5.0 and 25.0 meters, which means that Kriging in smaller areas estimates a narrower PI than in bigger areas.

Figure 4.3: Ordinary kriging performed in a grid of 1000 × 1000 m², where the small points represent the input data.


Figure 4.4: Ordinary kriging performed in a grid of 500 × 500 m², where the small points represent the input data.


Table 4.1 compares the Kriging performance when the data set is changed between the entire area, 1000 × 1000 m² and 500 × 500 m². The purpose of the comparison was to analyze whether the results would differ when the data distribution changes. In the table, it is possible to see that both the error and the standard deviation increase considerably when the interpolated area increases. For example, the standard deviation increases from 4.18 meters (500 × 500 m²) to 14.51 meters (entire data).

Table 4.1: Performance of Kriging changing the data set (Kriging on Python)

Data of 500 × 500 m²
                     Soil layer    Soil layer    Soil layer    SD [m]
                     max depth     min depth     mean depth
Interpolation [m]    1.57          16.64         6.57          4.18
Observed data [m]    1.10          17.00         6.35          -

Data from 1000 × 1000 m²
                     Soil layer    Soil layer    Soil layer    SD [m]
                     max depth     min depth     mean depth
Interpolation [m]    2.93          21.28         8.11          5.49
Observed data [m]    1.10          28.14         7.50          -

Entire data, 400 km²
                     Soil layer    Soil layer    Soil layer    SD [m]
                     max depth     min depth     mean depth
Interpolation [m]    20.66         39.50         11.82         14.51
Observed data [m]    0.77          63.95         7.44          -


Kriging was also performed in MicroMine and ArcMap, and the results were compared to Kriging in Python. Note that, for comparison purposes, all the Kriging interpolations were carried out on the same 500 × 500 m² area with the same input data. All three interpolations present some similarities, especially Python and ArcMap, although the MicroMine interpolation differs at the top and bottom parts, presenting values that do not correspond to the input data, as shown in figure 4.5.

    Figure 4.5: Comparison of Kriging on Python, ArcMap and MicroMine


Table 4.2 compares the results of Kriging performed in different software packages with the same input data. Three different points were tested: the maximum and minimum depths of the 500 ∗ 500 m2 area and the mean depth of the surface. The soil depths estimated by the different software simulations were compared with the input data (observed values from geotechnical investigations) to evaluate the level of accuracy. Kriging in ArcMap gave the estimate closest to the input data; however, the standard deviation from Kriging in Python was the smallest, with a value of 4.18 meters.

Table 4.2: Comparison of the Kriging process between software packages

Data from 500 * 500 m2            ArcMap    Python    MicroMine    Input data
Soil layer at max depth [m]       1.24      1.57      -5.88        1.10
Soil layer at min depth [m]       16.77     16.64     15.97        17.00
Soil layer at mean depth [m]      6.48      6.57      5.88         6.35
SD [m]                            5.46      4.18      6.68         -


    4.2 Machine learning results

This section presents all the algorithms tested for the machine learning techniques. Figure 4.6 shows that the first four algorithms give better predictions than the last two, since their predicted bedrock levels (black lines) lie closer to the input data (red dots).

Figure 4.6: Comparison between machine learning algorithms, where the black lines represent the bedrock prediction and the red dots the true values.


4.2.1 Bagging regressor with Decision Tree - general approach to estimate the PI

Figure 4.7 shows the prediction surface for the Bagging regressor with Decision Tree, with upper and lower prediction intervals. The bottom right graph shows the histogram of prediction intervals for all grid points. The PI values are distributed between 8.8 and 9.6 meters, with an average of 9.13 meters.

    Figure 4.7: Prediction for Bagging regressor with Decision Tree.
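A minimal sketch of how such a model can be set up with scikit-learn is given below; the train/test split, ensemble size and feature layout are assumptions for illustration, not the exact settings of the thesis (the code used in the study is reproduced in Appendix A).

    # Sketch: bagging regressor with decision trees (scikit-learn).
    # X holds the borehole coordinates (features), y the observed soil depth;
    # the split and hyperparameters are assumed values ('data' as in the Kriging sketch).
    from sklearn.ensemble import BaggingRegressor
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.model_selection import train_test_split

    X = data[["x", "y"]].values
    y = data["soil_depth"].values

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    model = BaggingRegressor(DecisionTreeRegressor(),   # base learner
                             n_estimators=100,          # assumed ensemble size
                             random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)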


Figure 4.8 compares the true values (y test) with the predicted intervals. At some points the ground level (GL) lies below the upper limit, which is not physically possible since the bedrock cannot lie above the ground surface, meaning that some upper limit values are not realistic. The confidence interval for this algorithm is 81%, which means that 81% of the true values are contained in the prediction interval. The histogram at the bottom shows the prediction intervals for all y test data, i.e. the estimated intervals representing the level of uncertainty for future observations.

    Figure 4.8: PI for Bagging regressor with Decision Tree.
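The reported confidence level (the share of true values that fall inside the PI) can be computed directly on the test set. A minimal sketch is given below, assuming arrays lower and upper that hold the interval bounds for each test point.

    # Sketch: empirical coverage of the prediction interval on the test set.
    # lower and upper are assumed arrays with the PI bounds for each y_test point.
    import numpy as np

    inside = (y_test >= lower) & (y_test <= upper)
    coverage = 100.0 * np.mean(inside)         # percentage of true values inside the PI
    mean_pi_width = np.mean(upper - lower)     # average PI width [m]

    print("CI: %.0f %%, mean PI: %.2f m" % (coverage, mean_pi_width))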


4.2.2 Bagging regressor with KNN - general approach to estimate the PI

Figure 4.9 shows the prediction surface for the Bagging regressor with KNN, with upper and lower prediction intervals. The bottom right graph shows the histogram of prediction intervals for all grid points. The PI values are distributed between 9.1 and 9.7 meters, with an average of 9.49 meters.

    Figure 4.9: Prediction for Bagging regressor with KNN.
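The same bagging setup can be reused with KNN as the base learner; a short sketch under the same assumptions as the previous one is given below (the number of neighbours is an assumed value).

    # Sketch: bagging regressor with KNN as base learner (assumed hyperparameters,
    # reusing X_train/y_train from the previous sketch).
    from sklearn.ensemble import BaggingRegressor
    from sklearn.neighbors import KNeighborsRegressor

    model_knn = BaggingRegressor(KNeighborsRegressor(n_neighbors=5),
                                 n_estimators=100, random_state=0)
    model_knn.fit(X_train, y_train)
    y_pred_knn = model_knn.predict(X_test)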


Figure 4.10 compares the true values (y test) with the predicted intervals. At some points the ground level (GL) lies below the upper limit, which is not physically possible, meaning that some upper limit values are not realistic. The confidence interval for this algorithm is 82%, which means that 82% of the true values are contained in the prediction interval. The histogram at the bottom shows the prediction intervals for all y test data, i.e. the estimated intervals representing the level of uncertainty for future observations.

    Figure 4.10: PI for Bagging regressor with KNN.


    4.2.3 Random forest

Figure 4.11 shows the prediction surface for random forest, with upper and lower prediction intervals. The bottom right graph shows the histogram of prediction intervals for all grid points. The PI values are distributed between 5.5 and 21.8 meters, with an average of 15.75 meters.

    Figure 4.11: Prediction for random forest.
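One common way to obtain a PI directly from a random forest is to take percentiles of the individual tree predictions; the sketch below illustrates that idea under assumed hyperparameters and should not be read as the exact implementation used in the thesis.

    # Sketch: prediction interval from the spread of individual tree predictions
    # (one common way to get a PI 'inside' a random forest; settings assumed).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rf = RandomForestRegressor(n_estimators=200, random_state=0)
    rf.fit(X_train, y_train)

    # Predictions of every individual tree for the test points
    tree_preds = np.stack([tree.predict(X_test) for tree in rf.estimators_])

    lower = np.percentile(tree_preds, 2.5, axis=0)    # lower PI bound
    upper = np.percentile(tree_preds, 97.5, axis=0)   # upper PI bound
    y_pred = rf.predict(X_test)                       # central prediction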


Figure 4.12 compares the true values (y test) with the predicted intervals. At some points the ground level (GL) lies below the upper limit, which is not physically possible, meaning that some upper limit values are not realistic. The confidence interval for this algorithm is 87%, which means that 87% of the true values are contained in the prediction interval. The histogram at the bottom shows the prediction intervals for all y test data, i.e. the estimated intervals representing the level of uncertainty for future observations.

Figure 4.12: PI for Random Forest.


4.2.4 Random forest - general approach to estimate the PI

Figure 4.13 shows the prediction surface for random forest, with upper and lower prediction intervals. The bottom right graph shows the histogram of prediction intervals for all grid points. The PI values are distributed between 9.2 and 11.8 meters, with an average of 11.03 meters.

    Figure 4.13: Prediction for Random forest.


Figure 4.14 compares the true values (y test) with the predicted intervals. At some points the ground level (GL) lies below the upper limit, which is not physically possible, meaning that some upper limit values are not realistic. The confidence interval for this algorithm is 85%, which means that 85% of the true values are contained in the prediction interval. The histogram at the bottom shows the prediction intervals for all y test data, i.e. the estimated intervals representing the level of uncertainty for future observations.

    Figure 4.14: PI for Random Forest.


4.2.5 K nearest neighbour - general approach to estimate the PI

Figure 4.15 shows the prediction surface for KNN, with upper and lower prediction intervals. The bottom right graph shows the histogram of prediction intervals for all grid points. The PI values have an average of 10.09 meters.

    Figure 4.15: Prediction for KNN.


Figure 4.16 compares the true values (y test) with the predicted intervals. At some points the ground level (GL) lies below the upper limit, which is not physically possible, meaning that some upper limit values are not realistic. The confidence interval for this algorithm is 81%, which means that 81% of the true values are contained in the prediction interval. The histogram at the bottom shows the prediction intervals for all y test data, i.e. the estimated intervals representing the level of uncertainty for future observations. The PI for the KNN algorithm takes almost the same value, around 10.0 meters, for all points. This may be a limitation of this specific algorithm, which was not capable of estimating a proper interval.

    Figure 4.16: Prediction interval for KNN.


    4.2.6 Gradient Boosting Regressor

Figure 4.17 shows the prediction surface for GBR, with upper and lower prediction intervals. The bottom right graph shows the histogram of prediction intervals for all grid points. The PI values are distributed between 10.2 and 19.9 meters, with an average value of 21.09 meters.

    Figure 4.17: Prediction for GBR.
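GBR can produce interval bounds directly by fitting separate models with a quantile loss; the sketch below illustrates this approach with assumed quantile levels and hyperparameters, reusing the training data from the earlier sketches.

    # Sketch: prediction interval from gradient boosting with quantile loss
    # (quantile levels and hyperparameters assumed).
    from sklearn.ensemble import GradientBoostingRegressor

    common = dict(n_estimators=200, max_depth=3, random_state=0)

    gbr_lower = GradientBoostingRegressor(loss="quantile", alpha=0.025, **common)
    gbr_upper = GradientBoostingRegressor(loss="quantile", alpha=0.975, **common)
    gbr_mean = GradientBoostingRegressor(**common)    # default squared-error loss

    for m in (gbr_lower, gbr_upper, gbr_mean):
        m.fit(X_train, y_train)

    lower = gbr_lower.predict(X_test)
    upper = gbr_upper.predict(X_test)
    y_pred = gbr_mean.predict(X_test)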


Figure 4.18 compares the true values (y test) with the predicted intervals. At some points the ground level (GL) lies below the upper limit, which is not physically possible, meaning that some upper limit values are not realistic. The confidence interval for this algorithm is 85%, which means that 85% of the true values are contained in the prediction interval. The histogram at the bottom shows the prediction intervals for all y test data, i.e. the estimated intervals representing the level of uncertainty for future observations.

    Figure 4.18: Prediction interval for GBR.


4.2.7 Gradient Boosting Regressor - general approach to estimate the PI

Figure 4.19 shows the prediction surface for GBR, with upper and lower prediction intervals. The bottom right graph shows the histogram of prediction intervals for all grid points. The PI values are distributed between 12.0 and 14.3 meters, with an average of 12.61 meters.

    Figure 4.19: Prediction for GBR.


Figure 4.20 compares the true values (y test) with the predicted intervals. At some points the ground level (GL) lies below the upper limit, which is not physically possible, meaning that some upper limit values are not realistic. The confidence interval for this algorithm is 88%, which means that 88% of the true values are contained in the prediction interval. The histogram at the bottom shows the prediction intervals for all y test data, i.e. the estimated intervals representing the level of uncertainty for future observations.

    Figure 4.20: Prediction interval for GBR.


4.2.8 XGBoost - general approach to estimate the PI

Figure 4.21 shows the prediction surface for XGBoost, with upper and lower prediction intervals. The bottom right graph shows the histogram of prediction intervals for all grid points. The PI values are distributed between 20.0 and 27.0 meters, with an average of 24.18 meters.

    Figure 4.21: Prediction for XGBOOST.
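For reference, a minimal sketch of the XGBoost point prediction, to which the general PI approach is then applied, is given below; the hyperparameters are assumptions.

    # Sketch: XGBoost point prediction; the PI is then estimated with the
    # general approach described above (hyperparameters assumed).
    from xgboost import XGBRegressor

    xgb = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.1,
                       objective="reg:squarederror", random_state=0)
    xgb.fit(X_train, y_train)
    y_pred_xgb = xgb.predict(X_test)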


Figure 4.22 compares the true values (y test) with the predicted intervals. At some points the ground level (GL) lies below the upper limit, which is not physically possible, meaning that some upper limit values are not realistic. The confidence interval for this algorithm is 93%, which means that 93% of the true values are contained in the prediction interval. The histogram at the bottom shows the prediction intervals for all y test data, i.e. the estimated intervals representing the level of uncertainty for future observations.

    Figure 4.22: Prediction interval for XGBOOST.


    4.2.9 Comparison of ML results

The results for the ML algorithms are compared in tables 4.3 and 4.4, where values for the error, prediction interval and confidence interval (CI) are presented.

Table 4.3 compares the algorithm predictions using the general approach to estimate the PI proposed by Shrestha and Solomatine (2006) [16]. The best prediction was obtained with the Bagging regressor with Decision Tree, with an error of 17.24%, a PI of 9.13 meters and a CI of 81%. Here, the confidence interval (CI) represents the percentage of the input data that are actually contained in the PI; a sketch of how these metrics can be computed is given after the table.

Table 4.3: Performance of ML using the general approach to estimate the PI

ML Algorithm                Error [%]    PI [m]    CI [%]
Bagging + Decision Tree     17.24        9.13      81
Bagging + KNN               19.74        9.49      82
KNN                         30.54        10.09     81
Random forest               30.80        11.03     85
GBR                         38.35        12.61     88
XGBOOST                     61.85        24.18     93
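A sketch of how the three metrics in the tables can be computed is given below; the error definition (mean absolute percentage error) is an assumption, since the formula is not spelled out here, while the PI width and CI coverage follow the earlier snippets.

    # Sketch of the metrics compared in tables 4.3 and 4.4. The error definition
    # (mean absolute percentage error) is an assumption; PI and CI follow the
    # earlier snippets.
    import numpy as np

    def evaluate(y_true, y_pred, lower, upper):
        error = 100.0 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true))  # assumed MAPE
        pi = np.mean(upper - lower)                                        # mean PI width [m]
        ci = 100.0 * np.mean((y_true >= lower) & (y_true <= upper))        # coverage [%]
        return error, pi, ci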

Table 4.4 compares the algorithms that are capable of estimating the PI internally, without any external approach. For these predictions the errors were comparable to those obtained with the general approach proposed by Shrestha and Solomatine (2006) [16], but the estimated PIs were considerably larger, with values of 15.75 meters for random forest and 21.09 meters for GBR.

Table 4.4: Performance of ML with other methods to estimate the PI

ML Algorithm       Error [%]    PI [m]    CI [%]
Random forest      30.35        15.75     87
GBR                38.14        21.09     85


4.3 Kriging and Machine Learning comparison

Figure 4.23 compares the results from Kriging in Python with ML using the Bagging regressor with Decision Tree. The predicted surfaces are similar for both methods; Kriging gives a very smooth prediction that honours almost all input data. However, when it comes to the PI, ML gives a narrower value. The histograms in the right corner show the distribution of prediction intervals for all grid points: the ML intervals have a mean of 9.13 meters, while Kriging has an average of 16.75 meters.

    Figure 4.23: Comparison of Kriging with ML.
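A minimal sketch of this comparison, reusing the Kriging variance ss and the ML interval bounds lower and upper from the earlier snippets, is given below.

    # Sketch: compare the mean PI width of Kriging and ML
    # (ss, lower and upper come from the earlier sketches).
    import numpy as np

    kriging_pi = 2 * 1.96 * np.sqrt(ss)      # PI width from the Kriging variance
    ml_pi = upper - lower                    # PI width from the ML bounds

    print("Mean PI  Kriging: %.2f m   ML: %.2f m"
          % (kriging_pi.mean(), ml_pi.mean()))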


5. Discussion and Conclusions

In this chapter, the results are compared and discussed, and the conclusions of the study are presented. Finally, suggestions for further research are given.

    5.1 Kriging

Many Kriging simulations were performed in Python by varying the area of interest between the entire data set and smaller areas, such as 1000 ∗ 1000 m2 and 500 ∗ 500 m2. As expected, both the predicted surface and the standard deviation depend on the distribution of the selected data. Smaller areas with evenly distributed points give more accurate results and a smaller standard deviation than larger areas.

From the analyses, it was concluded that Kriging gives accurate results if the data are well distributed over the area of interest. As can be seen in figure 4.4, the predicted surface is very close to the real values and is highly detailed. This is because the data points were more evenly distributed in the small 500 ∗ 500 m2 area. When the area is expanded to 1000 ∗ 1000 m2, the number of data points per square meter decreases, giving a less detailed prediction with a higher standard deviation.

The performance of Kriging in the different software packages is compared in figure 4.5. Python and ArcMap give similar interpolation surfaces and are more accurate with respect to the real values. MicroMine, on the other hand, gives odd values in some areas, such as the bottom and top left side, that do not correspond to the true data. Based on this comparison, Kriging in Python is considered to give the most reliable results, with even better bedrock surface predictions than Kriging in ArcMap and MicroMine.

    5.2 Machine learning

The machine learning results show that nonlinear regression algorithms like SVM and boosting techniques like XGBoost and GBR are not capable of predicting the bedrock surface with good accuracy. On the other hand, Random forest, KNN, Bagging regressor with KNN and Bagging regressor with Decision Tree give better results. The biggest challenge with the machine learning technique was to find a way to estimate a narrow prediction interval. Only random forest and GBR offer the option of calculating the PI directly; for the other algorithms it was necessary to apply a general method. The general approach to estimate the PI proposed by [16] was the method tested in this study. During the calculations, the membership function that gave the best results was the gamma function. With this approach it was possible to predict a smaller PI while keeping good prediction accuracy, with more than 81% of the input data inside the interval.

The simulations showed better predictions for the Bagging regressor with both Decision Tree and KNN. The smallest PIs were estimated using the general approach, as seen in table 4.3. The Bagging regressor with Decision Tree was the best-performing ML algorithm, with an error of 17.24% and a PI of 9.13 meters, where 81% of the input data lie inside the PI.


    5.3 Conclusions

The study concludes that Kriging is a reliable interpolation method if the sample data points are evenly distributed over the area of interest. Kriging gives a better interpolation surface in small areas than in larger ones, resulting in a more detailed predicted surface and consequently a smaller standard deviation. It is also concluded that Kriging in Python and ArcMap performs better than Kriging in MicroMine.

The machine learning simulations demonstrate that the best algorithm tested on this data set was the Bagging regressor with Decision Tree. These results can be useful in early design phases, but further research on how to predict an even narrower PI is still needed.

Comparing Kriging with ML, it is noticed that Kriging predicts a smoother bedrock surface, where the influence of individual input points is barely visible, but it fails to determine a small PI. The ML bedrock prediction is slightly less smooth than the Kriging surface but still very accurate, and its PI is better estimated, being almost half of the value estimated by Kriging. Based on this, it is concluded that ML is the better approach for bedrock level prediction. In addition, ML is a robust method that is not influenced by the size of the studied area: even when the area of interest varies, the predicted results remain essentially the same.

    5.4 Suggestions for Further Research

Further research can be performed by testing different algorithms, using artificial neural network techniques or even applying deep learning or reinforcement learning. Another suggestion is to test different approaches to estimate the prediction interval.

The ground level is only visualized in the results section; it was not considered in the algorithm prediction phase. Future research should take the ground levels into account and use them as constraints in the training process of the algorithm. In this way, the prediction interval calculations will always give values equal to or smaller than the ground level.

This study considered only the uncertainty of the bedrock depth. For a more complete study, other uncertainties, such as the quality of the rock, could also be included in the ML models.

Finally, this study used only data from geotechnical investigations. Future studies could integrate geophysical, geotechnical and outcrop data for mapping the bedrock surface.


References

[1] Barandiaran, I., Cottez, C., Paloc, C. & Graña, M. Comparative Evaluation of Random Forest and Fern Classifiers for Real-Time Feature Matching. p.159-166, Spain, 2008.

[2] Bayley, S. & Falessi, D. Optimizing Prediction Intervals by Tuning Random Forest via Meta-Validation. 2018.

[3] Bhattacharya, B. & Solomatine, D. P. Machine learning in soil classification. Hydroinformatics and Knowledge Management Department, Neural Networks 19, p.186-195, Netherlands, 2006.

[4] Danielsen, B. E. The applicability of geoelectrical methods in pre-investigation for construction in rock. Doctoral Thesis, Engineering Geology, Lund University, Sweden, 2010.

[5] Freund, Y. & Mason, L. The alternating decision tree learning algorithm. AT&T Labs, USA, 1999.

[6] Goovaerts, P. Geostatistical modelling of uncertainty in soil science. Geoderma 103, p.3-26, USA, 2001.

[7] Henley, S. The importance of being stationary. Earth Science Computer Applications, v.16, no.12, p.1-3, England, 2001.

[8] Juwaied, N. S. Applications of artificial intelligence in geotechnical engineering. College of Engineering, Department of Civil Engineering, v.13, no.8, Iraq, 2018.

[9] Lichtenstern, A. Kriging methods in spatial statistics. Technical University of Munich, Department of Mathematics, 2013.


[10] Medium: Gradient boosting. https://medium.com/mlreview/gradient-boosting-from-scratch-1e317ae4587d

[11] Mitchell, T. M. Machine learning. McGraw-Hill Science, Book News, Inc., Portland, 1997.

[12] O'Sullivan, D. & Unwin, D. Geographic Information Analysis. Knowing the unknowable: The statistics of fields, p.292-311, New Jersey, 2010.

[13] Osuna, E., Freund, R. & Girosi, F. Training support vector machines: an application to face detection. Center for Biological and Computational Learning and Operations Research Center, MIT, USA, 1997.

[14] Pirnia, P., Duhaime, F. & Manashti, J. Machine learning algorithms for applications in geotechnical engineering. Department of Construction Engineering, École de technologie supérieure, Canada, 2018.

[15] ResearchGate: Membership functions. https://www.researchgate.net

[16] Shrestha, D. L. & Solomatine, D. P. Machine learning approaches for estimation of prediction interval for the model output. Hydroinformatics and Knowledge Management Department, Neural Networks 19, p.225-235, Netherlands, 2006.

[17] Shrestha, D. L. & Solomatine, D. P. Predicting hydrological models uncertainty: use of machine learning. UNESCO-IHE Institute for Water, Conference Paper, 2007.

[18] Shahin, M. A., Jaksa, M. B. & Maier, H. R. Artificial neural network applications in geotechnical engineering. Australian Geomechanics Journal 36, p.49-62, 2001.

[19] Shahin, M. A., Jaksa, M. B. & Maier, H. R. Applications of artificial neural network in foundation engineering. Geotechnical Challenges and Solutions No. 5.3, 2004.

[20] Svenska Geotekniska Föreningen. SGF Rapport 2:99 Metodbeskrivning för jord-bergsondering. p.1-30, 1999.


[21] Towards data science: Types of machine learning. https://towardsdatascience.com/types-of-machine-learning-algorithms-you-should-know-953a08248861

[22] Transportation Research Board. Geophysical Methods Commonly Employed for Geotechnical Site Characterization. Transportation research circular E-C130, 2008.

[23] Wackernagel, H. Multivariate Geostatistics: An Introduction with Applications. Springer, Berlin, Heidelberg, 2003.

[24] Webster, R. & Oliver, M. A. Geostatistics for Environmental Scientists. Statistics in Practice, John Wiley & Sons, Ltd., 2007.

[25] Wijemannage, A. L. K., Ranagalage, M. & Perera, E. N. C. Comparison of spatial interpolation methods for rainfall data in Sri Lanka. Asian Association on Remote Sensing, Sri Lanka, 2018.

[26] Wikipedia: Artificial intelligence. https://en.wikipedia.org/wiki/Artificial_intelligence

[27] XGBoost: Introduction to boosted trees. https://xgboost.readthedocs.io/en/latest/tutorials/model.html

[28] Yong, Z., Youwen, L. & Shixiong, X. An Improved KNN Text Classification Algorithm Based on Clustering. Journal of Computers, v.4, no.3, China, 2009.


Appendix A

    Figure A.1: Python code for Bagging regressor with Decision Tree


    Figure A.2: Python code for Bagging regressor with Decision Tree

