

    Research Article
    Geological Type Recognition by Machine Learning on In-Situ Data of EPB Tunnel Boring Machines

    Qian Zhang,1 Kaihong Yang,1 Lihui Wang,2 and Siyang Zhou 1

    1 Key Laboratory of Modern Engineering Mechanics, School of Mechanical Engineering, Tianjin University, Tianjin 300072, China
    2 Department of Military Vehicle, Academy of Military Transportation, Tianjin 300161, China

    Correspondence should be addressed to Siyang Zhou; [email protected]

    Received 6 November 2019; Revised 25 March 2020; Accepted 30 March 2020; Published 27 April 2020

    Academic Editor: Akhil Garg

    Copyright © 2020 Qian Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

    At present, many types of large-scale engineering equipment can obtain massive in-situ data at runtime. In-depth data mining is conducive to the real-time understanding of equipment operation status or recognition of the service environment. This paper proposes a geological type recognition system based on the analysis of in-situ data recorded during TBM tunneling to address geological information acquisition during TBM construction. Owing to the high dimensionality of TBM in-situ data and the nonlinear coupling between parameters, dimensionality-reducing feature engineering and machine learning methods are introduced into TBM in-situ data analysis. The chi-square test is used to screen for sensitive features because TBM parameters do not follow common distributions. Considering the complex relationships involved, the ANN, SVM, KNN, and CART algorithms are used to construct a geology recognition classifier. A case study of a subway tunnel project in China constructed using an earth pressure balance tunnel boring machine (EPB-TBM) is used to verify the effectiveness of the proposed geological recognition method. The results show that the recognition accuracy gradually increases to a stable level as input features are added, and the accuracy of all algorithms is higher than 97%. Seven features are considered the best selection strategy for SVM, KNN, and ANN, while feature selection is an inherent part of the CART method, which also shows good recognition performance. This work provides an intelligent path for obtaining geological information for underground excavation TBM projects and a possibility for solving the problem of engineering recognition of more complex geological conditions.

    1. Introduction

    With the rapid development of sensor detection technology, a growing range of large-scale engineering equipment can provide rich monitoring data in real time during construction. These data contain a large number of control rules related to equipment operation. An intelligent analysis of engineering monitoring data can provide a new path for research on complex engineering problems and offer a decision-making basis for intelligent control of the engineering equipment.

    A tunnel boring machine (TBM) is a type of large-scale engineering equipment that is widely used in tunneling construction. This equipment combines the functions of soil cutting, soil debris conveying, and tunnel supporting to achieve fully mechanized construction of tunnel engineering [1] with a high degree of safety and construction efficiency [2]. A schematic diagram of the construction of a TBM is shown in Figure 1. A TBM consists of many parts, including a cutter head, TBM head, gripper, guniting, jumbolter, and belt conveyor [3]. Functions such as excavation, support, and guidance need to be carried out by different mechanical components and the synergy of the overall mechanical system. During the process of TBM excavation, it is necessary to continuously adjust the construction strategy based on the operational state and the information of the surrounding geological environment, which is an important basis for the safe and efficient operation of the machine [4]. However, due to the special characteristics of underground excavation by a TBM, the poor service conditions and complex muddy environment make it very inconvenient to observe the construction state of a TBM. Therefore, the monitoring data acquired and recorded by various sensors mounted on the main parts of the machine form an important information basis for understanding the working state of the equipment [5–8]. Current TBMs can simultaneously monitor hundreds of operation parameters, such as the tunneling rate, cutter head rotational speed, cylinder thrust, cutter head torque, and sealed chamber pressure. These in-situ data contain rich information on interactions between the machine and the surrounding environment. The rapid development of technologies such as big data and artificial intelligence in recent years provides a more effective method and path for in-depth and sufficient exploration of the information provided by in-situ data to realize the informatization and intelligence of TBM construction [9].

    Hindawi, Mathematical Problems in Engineering, Volume 2020, Article ID 3057893, 10 pages, https://doi.org/10.1155/2020/3057893

    In the early stage, TBM engineering data were used to establish empirical models for conveniently solving practical problems. For example, Krause [10] used data from hundreds of TBM construction projects in Germany and Japan to analyze and give an empirical prediction range for the tunneling load. In addition, classic empirical models include the NTNU model [11] developed by the Norwegian University of Science and Technology and the improved model of Bruland [12], which are often used in engineering to predict the rate of penetration. On the basis of empirical models, many scholars have further given parameter prediction models based on a statistical analysis of engineering data. For example, Zhang et al. [13] established a tunneling load prediction model for the earth pressure balance tunnel boring machine (EPB-TBM) by combining regression analysis with dimensional analysis of engineering data. Avunduk and Copur [14] established a nonlinear regression model of the rate of penetration from several soil property parameters such as particle size distribution and natural water content. Macias et al. [15] analyzed how the prediction curve of the rate of penetration of a hard rock TBM changes under different fracturing conditions, and the fracturing coefficient was determined to be an effective index of the influence of rock fracturing on tunneling performance. Armetti et al. [16] regressed the single indices affecting the rate of penetration to analyze the degree of influence of different parameters in the empirical model on tunneling performance. Vergara and Saroglou [17] established the regression relationship between the weighted rock mass rating and the mixed-face penetration index under mixed geology, considering the proportion of rock and soil in the tunneling face. Yagiz et al. [18] set up a regression equation to predict tunneling performance in jointed, faulted rock based on rock properties such as the distance between planes of weakness and the orientation of discontinuities in the rock mass. These works based on TBM in-situ data mainly focus on basic statistical regression of key tunneling performance parameters, with some limitations on the applicable problems and the number of features that can be considered. Statistical regression can extract and describe the rules in the data, but a TBM is a complex engineering system with hundreds of parameters collected in the process. Moreover, TBM in-situ data are often characterized by many influencing factors and nonlinear coupling between parameters, which makes in-depth data mining difficult [19]. The valuable information hidden within the massive monitoring data remains to be explored.

    In recent years, machine learning algorithms have developed rapidly. Because of their excellent nonlinear expression ability and adaptability to massive data, they provide powerful tools for TBM in-situ data analysis. Some typical works are as follows. Bouayad and Emeriault [20] established a prediction model of the ground settlement caused by an earth pressure balance shield machine through principal component analysis (PCA) and an adaptive neuro-fuzzy inference system (ANFIS). Mahdevri et al. [21] used a support vector machine and an artificial neural network to predict the tunnel convergence caused by ground compression and verified the model outputs against measured data through engineering examples. Hyun et al. [22] combined fault tree analysis (FTA) and the analytic hierarchy process (AHP) to analyze the risk and probability of shield construction and constructed a risk management system according to good consistency. Salimi et al. [23] used nonlinear regression and artificial intelligence algorithms to predict the performance of a hard rock TBM. Sun et al. [24] established a random forest model to predict the dynamic load of shield tunneling. Gholamnejad and Tayarani [25] used an artificial neural network to predict the rate of penetration from three rock mass parameters, namely, uniaxial tensile strength, rock quality index, and weak face spacing, and evaluated the results with different hidden-layer settings. Adoko et al. [26] proposed a Bayesian method to select the performance of different tunneling machines. Seker and Ocak [27] compared the application effect of random forest and other ensemble learning algorithms in predicting the rate of penetration. Gao et al. [28] used several kinds of recurrent neural networks to analyze the sequential behavior of TBM performance parameters, so as to predict the important performance parameters in advance.

    Figure 1: Schematic diagram of TBM underground construction.


    Previous studies have shown that machine learning methods can be used in the multiparameter analysis of TBM data. In addition, these results indicate that changes in the geological type during TBM driving are reflected in the in-situ data through the interaction between the machine and the geology. Because a TBM excavates underground, it may face various geological conditions during tunneling. The geology varies greatly between projects, for example, soft soil, hard rock, and composite ground. Geological conditions are therefore important factors affecting a project, and geological type recognition is one of the major tasks in TBM engineering. Identifying the geological category by analyzing the relationship between TBM in-situ data and geological conditions is thus a feasible approach.

    In this paper, feature selection and machine learning methods are introduced into engineering data analysis to propose a geological recognition system based on in-situ data analysis during tunneling. The proposed method provides an effective way to acquire geological information for construction decision-making. The parameters sensitive to changes in geological type are selected as input features by a feature engineering algorithm for dimension reduction. Four machine learning classification algorithms, KNN (k-nearest neighbor), SVM (support vector machine), ANN (artificial neural network), and CART (classification and regression tree), are then trained on data labeled with different geological types, and the recognition performance is evaluated on an independent test set. Through these steps, the sensitive features are extracted from the in-situ data and the geological recognition system is established. A subway tunnel project constructed by an earth pressure balance tunnel boring machine (EPB-TBM) is taken as a case to discuss the effectiveness of the above methods. The procedure of the proposed TBM geological recognition system is shown in Figure 2.

    2. Methods

    The geological recognition system proposed in this paper mainly includes the following three steps. First, normalization preprocessing is performed to reduce the dominant effects generated by differences in dimensions and orders of magnitude between parameters in the TBM in-situ data. Second, the chi-square test, a nonparametric test method for feature selection, is used to select the key parameters that are highly sensitive to geological variation as input features. Third, several typical machine learning classification algorithms are trained on the data sets with geological labels to obtain the geological recognition classifier, which is used to perform geological type recognition. The test set data are used to validate the accuracy of the geological recognition system and evaluate the effectiveness of the method.

    2.1. Data Preprocessing. During the TBM excavation process, hundreds of engineering parameters related to machine operation, including the cylinder thrust, motor torque, cutter head rotational speed, advance rate, guiding attitude, and sealed chamber pressure, can be recorded in real time by the data acquisition system. These engineering parameters have various dimensions, and the corresponding numerical magnitudes are very different. For example, the cylinder thrust can reach tens of thousands of kN, while the advance rate is usually only tens of millimeters per minute; both are important factors reflecting the operating state of the machine in different geological conditions.

    Figure 2: Flow chart of the geological identification system. (Stages shown: preprocessing, in which TBM in-situ data are min-max normalized and combined with geological labels from exploration information into a labeled data set; feature engineering and model training, comprising the chi-square test, addition of features to the input in descending order of chi-square value, the machine learning classification algorithm, and 10-fold cross validation, repeated until the accuracy requirement is reached; output, the geological recognition classifier.)

    Considering that most feature selection and machine learning algorithms are not invariant to scale, all the parameters in this work are min-max normalized before machine learning classification to prevent certain parameters from playing a dominant role in data mining due to differences in the order of magnitude. The calculation method is

    $$x_{\mathrm{pre}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{1}$$

    where $x_{\mathrm{pre}}$ is the dimensionless form after normalization pretreatment, $x_{\min}$ is the minimum value in the recorded data of this parameter, and $x_{\max}$ is the maximum value in the recorded data of this parameter.

    Min-max normalization converts parameters from dimensional to dimensionless form and maps them to the interval from 0 to 1, so that parameters with different dimensions and orders of magnitude are treated as equally as possible in the subsequent analysis. In addition, normalization helps improve the convergence speed and the results of the numerical solution.
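
    The normalization of equation (1) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation; the function name, the sample readings, and the handling of a constant parameter are assumptions made for the example.

```python
def min_max_normalize(values):
    """Map the recorded values of one parameter onto [0, 1], as in Eq. (1)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant parameter carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


# Hypothetical readings with very different orders of magnitude.
thrust_kn = [12000.0, 15000.0, 18000.0]      # cylinder thrust, kN
advance_mm_per_min = [20.0, 35.0, 50.0]      # advance rate, mm/min

print(min_max_normalize(thrust_kn))           # [0.0, 0.5, 1.0]
print(min_max_normalize(advance_mm_per_min))  # [0.0, 0.5, 1.0]
```

    After normalization both hypothetical parameters span the same interval, so neither dominates a distance-based or gradient-based algorithm purely because of its units.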

    2.2. Feature Engineering. As mentioned above, the in-situ data include the records of hundreds of engineering parameters, whereas in existing engineering practice, only a few parameters such as cylinder thrust and motor torque are used to analyze the geological conditions [29, 30]. To achieve effective geological recognition, however, the variation of the recorded parameters with the geological conditions must be investigated fully. To this end, it is necessary to consider the parameters more comprehensively and select those that are highly sensitive to geological changes as the input features for the subsequent machine learning, namely, to conduct feature engineering. Through this step, redundant parameters with low correlation with geological changes are removed while the informative parameters are retained, which helps improve the recognition accuracy, reduce the empirical risk, and avoid overfitting caused by incorrect generalization from accidental behavior of certain parameters.

    Engineering data often do not follow common data distributions, and the relationships among many parameters cannot be explained by independent statistical analysis. Instead, the target variable is influenced by a combination of parameters [31]. Therefore, this work uses the chi-square test for feature engineering. The chi-square test is a nonparametric test method that represents the degree of deviation between the observed value and the theoretical value under the independence assumption, and it makes no assumptions about the data distribution. Hence, this method is suitable for the analysis of the engineering data in this research. Its basic principle is to evaluate parameter independence by calculating the deviation between the actual (observed) value and the expected one. The specific calculation formula is

    $$\chi^{2} = \sum_{i=1}^{k} \frac{(N_i - E_i)^{2}}{E_i}, \tag{2}$$

    where $\chi^{2}$ is the chi-square value of the parameter, $k$ is the number of recorded values, $N_i$ is the actual value, and $E_i$ is the expected value. $\chi^{2}$ measures the degree to which the expected value and the actual value deviate from each other. A high value of $\chi^{2}$ indicates that the independence hypothesis is incorrect, that is, the parameter is helpful as an input feature for judging whether a certain kind of event occurs.
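
    As an illustration of the chi-square statistic, the sketch below computes it for one feature against the class labels using a contingency table of counts. It is a simplified example under stated assumptions: the paper does not say how continuous parameters are turned into counts, so the feature is assumed here to be already discretized into bins, and the function name and toy data are hypothetical.

```python
from collections import Counter


def chi_square(feature_bins, labels):
    """Chi-square statistic for one binned feature against class labels."""
    n = len(labels)
    observed = Counter(zip(feature_bins, labels))  # N_i: observed count per cell
    bin_totals = Counter(feature_bins)
    label_totals = Counter(labels)
    stat = 0.0
    for b, b_count in bin_totals.items():
        for c, c_count in label_totals.items():
            expected = b_count * c_count / n       # E_i under independence
            stat += (observed[(b, c)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 8 samples, two geological labels.
labels = ["clay"] * 4 + ["silt"] * 4
informative = ["low"] * 4 + ["high"] * 4   # bin follows the label exactly
uninformative = ["low", "high"] * 4        # bin is unrelated to the label

print(chi_square(informative, labels))     # 8.0
print(chi_square(uninformative, labels))   # 0.0
```

    The feature whose bins track the geological label receives the larger statistic and would be ranked first, while the unrelated feature scores zero.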

    The geological type recognition problem to be solved in this work is essentially a supervised classification problem. For the training set data, the geological types are marked with supervised learning labels in construction areas where the geological information is known. Using the geological label as the target, the chi-square test is performed on the training set data to yield the chi-square value of each parameter under the given geological label. The values are sorted from largest to smallest, and the first few parameters, that is, those with the highest sensitivity, are selected as the input features of the subsequent recognition algorithm. After testing and validation, the input features with the best recognition performance are selected as the optimal input features. Due to the long distance of TBM construction and irregular geological changes, the in-situ data of a TBM are massive, and their information is nonuniformly distributed. To more effectively evaluate the impact of different feature selection strategies on the performance of the geological recognition system, this paper uses the 10-fold cross-validation method [32], whose basic idea is given in Figure 3. The dataset is divided into ten subsets with similar amounts of data. Each subset is selected in turn, without repetition, as the test set, and the remaining nine subsets are used as the training set, until every subset has been used as the test set once. Finally, the evaluation values from the ten test sets are averaged and taken as the final evaluation value of the 10-fold cross-validation method. Thus, the contingency and randomness problems caused by the use of a single test set are avoided as much as possible, giving insight into how the model will generalize to an independent dataset.
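
    The 10-fold procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: shuffling before splitting and the fixed seed are assumptions, and `evaluate` is a hypothetical callable that trains on the given training indices and returns a score on the test indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # assumption: shuffle before splitting
    folds = [indices[i::k] for i in range(k)]  # k subsets of similar size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(evaluate, n_samples, k=10):
    """Average the scores returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Usage with a placeholder evaluation that reports the test-fold share of the data.
share = cross_validate(lambda train, test: len(test) / (len(train) + len(test)), 100)
print(round(share, 6))  # 0.1
```

    In practice `evaluate` would fit one of the classifiers on the training indices and return its accuracy on the held-out fold.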

    2.3. Applied Algorithms and Classification Metrics. Considering the characteristics of TBM in-situ data, including high dimensionality, nonlinear coupling of parameters, and high noise, four commonly used supervised classification algorithms, namely, KNN (k-nearest neighbor), SVM (support vector machine), ANN (artificial neural network), and CART (classification and regression tree), are selected in this study to express the relationships between the input features and geological labels and to establish the corresponding geological type recognition systems.

    As shown in Figure 4(a), the k-nearest-neighbor (KNN) [33] algorithm is an instance-based method that predicts the class of a sample from the properties of the K sample points closest to it in the feature space. The principle is simple, and it adapts to multiclass classification tasks.
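
    A KNN prediction can be sketched in plain Python as a majority vote among the K nearest training samples under the Euclidean distance. The function name and the two-feature toy samples are hypothetical and only illustrate the principle.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Pair each training sample's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical normalized samples with two features each.
train_x = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.10),
           (0.80, 0.90), (0.85, 0.80), (0.90, 0.95)]
train_y = ["silty clay"] * 3 + ["muddy clay"] * 3

print(knn_predict(train_x, train_y, (0.20, 0.20)))  # silty clay
print(knn_predict(train_x, train_y, (0.90, 0.90)))  # muddy clay
```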

    The support vector machine (SVM) [34], illustrated in Figure 4(b), is a geometric method that finds the optimal separating hyperplane through support vectors. In the nonlinear case, SVM maps the nonlinear problem in the original space to a high-dimensional space through the kernel function. It needs only a few support vectors to make decisions and adapts well to high-dimensional problems, making it one of the most widely used machine learning methods.

    The basic principle of the artificial neural network (ANN), a nonlinear fitting model inspired by the biological neural system [35], is shown in Figure 4(c). It is mainly composed of an input layer, hidden layers, and an output layer. The hidden layers give it nonlinear properties through a complex network structure and activation functions. Because of its strong nonlinear expression ability, it has become one of the most popular machine learning methods in recent years.

    As shown in Figure 4(d), the classification and regression tree (CART) [36] method is a decision tree method. Using the Gini index, it repeatedly searches for the best feature and the best split point and grows a binary tree, so as to complete the classification of the whole data set. The biggest characteristic of the CART algorithm is that it provides a clear and even visual decision-making process, thus providing useful guidance in practical engineering.
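
    The split search at the heart of CART can be illustrated for a single feature as below. This is a simplified sketch: it uses the observed feature values themselves as candidate thresholds (implementations often use midpoints instead), and the function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split x <= threshold."""
    n = len(labels)
    best = (None, float("inf"))
    # Simplification: candidate thresholds are the observed values (largest excluded).
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical normalized feature values and geological labels.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = ["silt", "silt", "silt", "clay", "clay", "clay"]

print(gini(labels))                # 0.5
print(best_split(values, labels))  # (0.3, 0.0)
```

    A full tree repeats this search over all features at every node and recurses on the two resulting subsets.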

    Figure 3: Diagram of 10-fold cross validation. (The dataset D is divided into subsets D1 to D10; in each of the ten rounds, one subset is the test set and the other nine form the training set, giving Evaluation 1 to Evaluation 10, which are averaged to obtain the evaluation of the 10-fold cross validation.)

    Figure 4: Basic principles of four classification algorithms. (a) KNN. (b) SVM. (c) ANN. (d) CART.


    Several metrics are adopted to quantify the quality of predictions. For supervised classification problems in machine learning, the accuracy (AR), precision (PR), recall (RE), and F1-score (F1) are the most commonly used indices for evaluating classifier performance. In addition, the confusion matrix is a format used to show classification results. For example, the confusion matrix for a binary classification problem is shown in Table 1, where a true positive (TP) is a positive sample predicted as positive, a true negative (TN) is a negative sample predicted as negative, a false positive (FP) is a negative sample predicted as positive, which is a type I error, and a false negative (FN) is a positive sample predicted as negative, which is a type II error.

    Based on a given confusion matrix, the accuracy, precision, and recall can be calculated. The accuracy is the most common classification metric and is the number of correctly classified samples divided by the total number of samples. The precision is the fraction of the samples assigned to a certain class that are correctly classified. The recall is a measure of coverage and is the fraction of the samples that truly belong to a certain class that are correctly classified. Precision and recall sometimes conflict with each other: high precision is often accompanied by low recall, and vice versa. The F1-score is a comprehensive evaluation of these two metrics. In this paper, these four indices are used to evaluate the geological recognition results. The calculation method of each evaluation index is as follows:

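    $$\begin{aligned}
    \mathrm{AR} &= \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}, \\
    \mathrm{PR} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \\
    \mathrm{RE} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \\
    \mathrm{F1} &= \frac{2 \times \mathrm{PR} \times \mathrm{RE}}{\mathrm{PR} + \mathrm{RE}}.
    \end{aligned} \tag{3}$$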
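
    The four indices of equation (3) follow directly from the confusion-matrix counts, as the short sketch below shows. The function name and the counts are hypothetical, and the sketch assumes all denominators are nonzero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (AR, PR, RE, F1) from binary confusion-matrix counts, per Eq. (3)."""
    # Assumption: every denominator below is nonzero.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts for one geological class treated as "positive".
ar, pr, re_, f1 = classification_metrics(tp=90, tn=95, fp=5, fn=10)
print(round(ar, 4), round(pr, 4), round(re_, 4), round(f1, 4))
# 0.925 0.9474 0.9 0.9231
```

    For a multiclass problem such as several geological types, one common convention is to compute these values per class (one class versus the rest) and then average them; the source does not state which convention is used.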

    3. Results and Discussion

    The method proposed in this paper is applied to geological recognition in actual tunnel engineering, and the applicability and effectiveness of the method and the recognition performance of the different classification algorithms are discussed in this section. As a preliminary study on using machine learning to recognize geological types, and to test the feasibility of the method, Tianjin Metro Line 9 and Tianjin Metro Line 3, which are mainly composed of soft soil, are discussed in this paper. A few geological types are involved in the studied sections, such as muddy clay, silt, and silty clay. The section of Tianjin Metro Line 9 is approximately 1104 m long and was constructed using an EPB-TBM. The construction area of this project mainly passed through soft soil such as silty clay, muddy clay, and silty soil. The engineering data used in this paper comprise 357 parameters recorded by the data acquisition system during construction. The sampling interval was approximately 30 s (an advance of approximately 17 mm). In the application, the dataset is divided into a training set and a test set in a certain proportion. The training set data are used to establish the geological recognition system, while the test set data are not involved in the training process but are used for independent testing of the recognition results. To obtain the geological recognition labels for the supervised classification algorithms, the geological survey report obtained from the geological exploration is used as prior information. Table 2 lists the basic statistical characteristics of some representative TBM tunneling parameters in Tianjin Metro Line 9.

    3.1. Implementation of Feature Selection and Performance of the Geological Recognition System. In this section, the recognition accuracy and computational time of the four geological classifiers are discussed for different numbers of selected features, using the aforementioned engineering example.

    For feature engineering on high-dimensional problems, the training precision, computational cost, and possible overfitting must all be considered when selecting the appropriate number of features as the effective input for classifier training. Therefore, the effect of the number of features on the recognition accuracy of the four types of geological classifiers is discussed first.

    The hyperparameters of the algorithms are set as follows. In the ANN, the number of hidden layers is 4 and the number of nodes in each layer is 10. The distance metric for KNN is the Euclidean distance. The kernel function of SVM is the radial basis function, and the splitting criterion in CART is the Gini coefficient.
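
    For reference, the radial basis function kernel named above has the standard form $k(x, y) = \exp(-\gamma \lVert x - y \rVert^{2})$. The sketch below evaluates it in plain Python; the value of `gamma` is an assumption for illustration, since the source does not report the kernel width used.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    # Assumption: gamma = 1.0 is illustrative only; the paper gives no value.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


print(rbf_kernel((0.2, 0.4), (0.2, 0.4)))  # 1.0 for identical samples
print(rbf_kernel((0.0, 0.0), (1.0, 1.0)))  # decays toward 0 as samples move apart
```

    Because the kernel depends on squared distances between samples, the min-max normalization of Section 2.1 keeps any single large-magnitude parameter from dominating it.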

    In the Tianjin Metro Line 9, the variation results of thegeological recognition by KNN, SVM, ANN, and CART areshown in Figure 5. All the points are given by the accuracyfrom 10-fold cross validation. In this figure, it is shown thatas the features are added to the feature input according to thechi-square values, the accuracy of the geological recognitionmodels gradually increases. Eventually, the algorithms havegood recognition performance, with the accuracy exceeding96% after K> 4. Performance of algorithms is discussedbased on the results in Figure 5. Among the four algorithms,the recognition performance of KNN is significantly goodand the classification result reaches an accuracy of 99.9%whenK� 3 in KNN, probably because the TBM constructionis a continuous process so that KNN can find similar samplesfor decision in the high-density TBM data collection moreeffectively. While the performance of SVM is inferior to

    Table 1: Confusion matrix for a binary classification problem.

              Positive               Negative
    True      True positive (TP)     True negative (TN)
    False     False positive (FP)    False negative (FN)

    6 Mathematical Problems in Engineering

    other algorithms, which may be caused by the difficulty of selecting hyperparameters given the complex distribution characteristics and noise of TBM in-situ data.
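    The cross-validated KNN accuracy reported for Figure 5 can be illustrated with a small sketch. It is an assumption-laden example rather than the authors' code: the two-feature samples are synthetic, the folds are formed by taking every tenth sample (the paper does not say how the folds were drawn), and K = 3 is read here as the number of neighbors.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(X, y, n_folds=10, k=3):
    """Accuracy pooled over n_folds folds; fold f holds samples with i % n_folds == f."""
    correct = 0
    for fold in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return correct / len(X)


# Synthetic stand-in for densely sampled tunneling records: two geology
# classes separated in the second feature (all values are made up).
X = [(0.01 * i, 1.0) for i in range(20)] + [(0.01 * i, 5.0) for i in range(20)]
y = ["muddy clay"] * 20 + ["silty clay"] * 20

accuracy = cross_validated_accuracy(X, y)
```

    Because consecutive samples are nearly identical, each held-out sample has close neighbors of the same class in the training folds, which is consistent with the explanation offered above for the strong KNN result.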

    Figure 5 also provides a reference for the number of inputs in this multiple-input problem. For most algorithms, the recognition results are generally good when K = 7; beyond that, the accuracy improves only slightly as more features are added. Considering the computation and feature acquisition costs together with the complexity of the recognition system, the top K = 7 features are adopted as the optimal feature selection strategy for geological recognition in this work. In addition, since feature selection is an inherent part of the CART algorithm, CART does not take part in the chi-square test discussion, and its number of input features can be controlled by adjusting the depth of the tree. The top 7 features selected by the chi-square test and their chi-square values and P-values are shown in Table 3.
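    The ranking-and-truncation step described above can be sketched as follows. This is a hedged illustration that may differ from the procedure actually used in the paper: continuous parameters are assumed to be discretized into equal-width bins before a Pearson chi-square statistic is computed against the geology labels, and the feature names and values are invented for the example.

```python
from collections import Counter


def equal_width_bins(values, n_bins=4):
    """Discretize a continuous parameter into n_bins equal-width bins (assumed step)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def chi_square(bins, labels):
    """Pearson chi-square statistic of the bin-by-class contingency table."""
    n = len(labels)
    bin_totals = Counter(bins)
    class_totals = Counter(labels)
    observed = Counter(zip(bins, labels))
    statistic = 0.0
    for b, bin_n in bin_totals.items():
        for c, class_n in class_totals.items():
            expected = bin_n * class_n / n
            statistic += (observed[(b, c)] - expected) ** 2 / expected
    return statistic


def select_top_k(features, labels, k=7):
    """Rank features by chi-square value and keep the k highest."""
    scores = {name: chi_square(equal_width_bins(v), labels) for name, v in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical data: one parameter that tracks the geology and one that does not.
labels = ["muddy clay"] * 4 + ["silty clay"] * 4
features = {
    "cutter_head_torque": [1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 2.2, 2.3],
    "unrelated_channel": [5.0, 6.0, 5.0, 6.0, 5.0, 6.0, 5.0, 6.0],
}
selected, scores = select_top_k(features, labels, k=1)
```

    With `k=7`, the same call would return the seven highest-scoring parameters, mirroring the cut-off chosen above.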

    To examine how the proposed geological recognition system depends on the amount of training data, 10% of the samples are randomly selected from the engineering dataset as the training set, and the remaining 90% are used as an independent test set. The above feature selection results are used as the classifier input, and the recognition accuracy is validated again on the independent test set. The computational costs of training and prediction for the four types of classifiers are also compared. The results are shown in Table 4. For the SVM, ANN, and KNN classifiers, the computational time of the chi-square test is excluded, and the duration of each algorithm from fitting on the training set to predicting the test set is measured. It should be noted that the feature selection of CART is included in its training process. Table 4 demonstrates that even when only 10% of the samples are used for training, the optimal feature combination selected by the feature selection algorithm still retains excellent recognition performance when the remaining 90% of the samples are used for prediction and validation.

    For Tianjin Metro Line 9, the computational cost of the CART-based geological classifier is significantly smaller than those of the other three. The prediction time of the KNN classifier is the longest, significantly longer than those of the other three classifiers, because KNN is an instance-based algorithm whose training process merely stores the samples. Each prediction then requires calculating the distances between the point to be predicted and all the sample points in the training set, which results in a long prediction time for large amounts of data. For the other classifiers, prediction on the test set is relatively fast once training is completed.
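    The reason KNN prediction dominates the runtime can be made concrete by counting distance evaluations under the 10%/90% split. The sketch below uses only the Python standard library; the class name, the seed, and the synthetic samples are assumptions for illustration, and no timings from Table 4 are reproduced.

```python
import math
import random
from collections import Counter


def random_split(n_samples, train_fraction=0.1, seed=0):
    """Shuffle sample indices and return (train indices, test indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


class BruteForceKNN:
    """Instance-based classifier: fit() only stores data, predict_one() does the work."""

    def __init__(self, k=3):
        self.k = k
        self.distance_evaluations = 0

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)  # "training" is just storage
        return self

    def predict_one(self, x):
        distances = []
        for sample, label in zip(self.X, self.y):
            distances.append((math.dist(sample, x), label))
            self.distance_evaluations += 1
        distances.sort(key=lambda pair: pair[0])
        votes = Counter(label for _, label in distances[: self.k])
        return votes.most_common(1)[0][0]


# Synthetic one-feature samples: class 0 near 0.0, class 1 near 10.0.
samples = [(float(i % 2) * 10.0 + 0.001 * i,) for i in range(1000)]
targets = [i % 2 for i in range(1000)]

train_idx, test_idx = random_split(len(samples))
model = BruteForceKNN().fit([samples[i] for i in train_idx], [targets[i] for i in train_idx])
predictions = [model.predict_one(samples[i]) for i in test_idx]
```

    Here `model.distance_evaluations` ends at 100 × 900 = 90,000, one evaluation per training sample for every test sample, whereas `fit` performs none.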

    3.2. Generalization Ability of Geological Recognition Systems. The generalization ability is an important indicator of whether a learner overfits, and good generalization is a prerequisite for applying the proposed method to practical engineering problems. It generally refers to the adaptability of a machine learning method to new data, that is, whether a reasonable output can still be obtained when a dataset outside the training set is given. In this work, instead of the data from Tianjin Metro Line 9, Tianjin Metro Line 3 is used as the engineering example for generalization validation. The statistical characteristics of its dataset involved in the calculation are shown in Table 5.

    This project and Tianjin Metro Line 9 were both constructed using the same EPB shield, and they are located in the same city with similar geological conditions. In this section, the generalization of the proposed geological recognition system is investigated by applying the system established on the in-situ data of the Tianjin Metro Line 9 project to the geological recognition of Tianjin Metro Line 3, which is a project independent of Tianjin Metro Line 9. Considering the differences between the parameters in the

    Table 2: Statistical properties of main parameters from the selected section of Tianjin Metro Line 9.

               Advance rate    Cylinder thrust    Cutter head torque    Cutter head rotational speed
               (mm/min)        (kN)               (kNm)                 (r/min)
    Max        55.68           24701.68           1706.28               1.1
    Min        16.84           10436.07           803.08                0.4
    Average    31.53           15840.43           1264.49               0.94
    SD         10.04           2905.33            148.71                0.14

    [Figure 5 plot: accuracy (vertical axis, 0.65 to 1.05) against number of features (horizontal axis, 0 to 18) for SVM, KNN, CART, and ANN.]

    Figure 5: Identification accuracy of Tianjin Metro Line 9 with the increase of input features.


    different engineering datasets of Line 9 and Line 3, the feature selection method introduced in Section 2.2 is used to select the important features in both engineering datasets as the input for training the geological recognition system. Based on the training set data from Tianjin Metro Line 9, three classification algorithms (ANN, SVM, and KNN) are used to establish the geological recognition system. The recognition performance is verified using the test set data from Line 3. The evaluation indicators for recognition performance are shown in Figure 6. The geological recognition system trained on the engineering data of Line 9 can effectively recognize similar geology in the Line 3 project, such as muddy clay and silty clay. The recognition accuracies of the three types of classifiers are all above 90%. Among the classifiers, the ANN outperforms the other two, while the KNN-based and SVM-based classifiers give similar prediction results. The results show that a geological recognition system built on existing engineering training data can give a reasonable output when applied to new datasets from different projects with similar geological conditions, demonstrating a good

    Table 3: Chi-square value and P-value of top 7 features in Tianjin Metro Line 9.

    Name                                            Chi-square value    P-value
    Accumulated rotation number of screw machine    4913.95

    generalization ability of the proposed recognition systems.
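    The evaluation of a cross-project prediction can be sketched from the confusion-matrix counts of Table 1. The example assumes the indicators are the usual accuracy, precision, and recall (the exact set plotted in Figure 6 is not restated in this section), and the label sequences are invented rather than taken from Line 3.

```python
def binary_indicators(y_true, y_pred, positive):
    """Accuracy, precision, and recall for one geology type treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical survey labels for a new project and the labels predicted by
# a classifier that was trained on a different project.
surveyed = ["muddy clay", "muddy clay", "muddy clay", "silty clay", "silty clay"]
predicted = ["muddy clay", "muddy clay", "silty clay", "silty clay", "muddy clay"]

indicators = binary_indicators(surveyed, predicted, positive="muddy clay")
```

    In this toy case three of the five samples are recognized correctly, so the accuracy is 0.6, while precision and recall for muddy clay are both 2/3.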

    4. Conclusions

    This paper proposes a method for geological type recognition based on the in-situ data recorded during TBM construction. The main conclusions of this research can be summarized as follows:

    (1) The proposed method consists of feature engineering and machine learning classification methods. The recognition method based on the analysis of TBM in-situ data can effectively mine the internal influence law between variables and provide an effective way to obtain geological information for construction decision-making.

    (2) In feature engineering, considering that TBM in-situ data do not follow common distributions, the chi-square test is chosen for feature selection. Four machine learning classification algorithms, ANN, SVM, KNN, and CART, are used to handle the nonlinear coupling between features.

    (3) The proposed method is applied to the geological recognition of urban metro projects constructed with EPB-TBM in China. The comparison between the recognition results and the measured geology types shows that the proposed method is effective. The recognition accuracy gradually increases with the number of input features and eventually levels off, at which point the accuracy of all algorithms is higher than 97%. Based on this trend, a selection strategy for the optimal input features is also given: the optimal number of input variables for this validation case is seven.

    (4) Studies of more advanced applications would be worthwhile: a database covering more comprehensive geological types (such as hard rock and composite ground) should be established and analyzed through the presented learning procedure. Moreover, an intelligent visual interface could be developed on the basis of the proposed system for more convenient application.

    Data Availability

    The data used in this paper are available from the relevant engineering enterprises; they have not been released publicly for commercial reasons.

    Conflicts of Interest

    All authors declare that there are no conflicts of interest.

    Acknowledgments

    This research was supported by the National Key R&D Program of China (grant no. 2018YFB1702505), the National Natural Science Foundation of China (grant no. 11872269), and the Natural Science Foundation of Tianjin (grant no. 18JCYBJC19600). Support from Prof. Wei Guo of the School of Computer Science and Technology of Tianjin University for the research in this paper is greatly appreciated.

    References

    [1] G. Barla and S. Pelizza, TBM Tunnelling in Difficult GroundConditions, Proceedings of the International Conference onGeotechnical & Geological Engineering, Melbourne,Australia, pp. 329–354, Technomic Publishing Company,2000.

    [2] D. J. Armaghani, M. Koopialipoor, A. Marto, and S. Yagiz,“Application of several optimization techniques for esti-mating TBM advance rate in granitic rocks,” Journal of RockMechanics and Geotechnical Engineering, vol. 11, no. 4,pp. 779–789, 2019.

    [3] A. K. Agrawal, V. M. S. R. Murthy, and S. Chattopadhyaya,“Investigations into reliability, maintainability and availabilityof tunnel boring machine operating in mixed ground con-dition using Markov chains,” Engineering Failure Analysis,vol. 105, pp. 477–489, 2019.

    [4] X. Huang, Q. Liu, K. Shi, Y. Pan, and J. Liu, “Application andprospect of hard rock TBM for deep roadway construction incoal mines,” Tunnelling and Underground Space Technology,vol. 73, pp. 105–126, 2018.

    [5] M. Entacher, G. Winter, and R. Galler, “Cutter force mea-surement on tunnel boring machines-implementation atKoralm tunnel,” Tunnelling and Underground Space Tech-nology, vol. 38, no. 3, pp. 487–496, 2013.

    [6] B. Galler, Q. Guo, Z. Liu et al., “Comprehensive aheadprospecting for hard rock TBM tunneling in complex lime-stone geology: a case study in Jilin, China,” Tunnelling andUnderground Space Technology, vol. 93, Article ID 103045,2019.

    [7] H. Lan, Y. Xia, Z. Zhu, Z. Ji, and J. Mao, “Development of on-line rotational speed monitor system of TBM disc cutter,”Tunnelling and Underground Space Technology, vol. 57,pp. 66–75, 2016.

    [8] B. Tang, H. Cheng, Y. Tang et al., “Experiences of gripperTBM application in shaft coal mine: a case study in Zhangjicoal mine, China,” Tunnelling and Underground SpaceTechnology, vol. 81, pp. 660–668, 2018.

    [9] J. Li, L. Jing, X. Zheng, P. Li, and C. Yang, “Application andoutlook of information and intelligence technology for safeand efficient TBM construction,” Tunnelling and Under-ground Space Technology, vol. 93, p. 103097, 2019.

    [10] T. Krause, “Schildvortrieb mit flüssigkeits-und erdgestützterOrtsbrust,” Technical University of Braunschweig,Braunschweig, Germany, Dissertation, 1987.

    [11] O. T. Blindheim, Boreability Predictions for Tunneling, -eNorwegian Institute of Technology, Trondheim, Norway,1979.

    [12] A. Bruland, Hard Rock Tunnel Boring, Faculty of EngineeringScience & Technology, vol. 3, no. 1, Malé, Maldives, 2000.

    [13] Z. Qian, Y. Kang, Z. Zheng, and L. Wang, “Inverse analysisand modeling for tunneling thrust on shield machine,”Mathematical Problems in Engineering, vol. 2013, Article ID975703, 9 pages, 2013.

    [14] E. Avunduk and H. Copur, “Empirical modeling for pre-dicting excavation performance of EPB TBM based on soilproperties,” Tunnelling and Underground Space Technology,vol. 71, pp. 340–353, 2018.

    Mathematical Problems in Engineering 9

  • [15] F. J. Macias, P. D. Jakobsen, Y. Seo, and A. Bruland, “Influenceof rock mass fracturing on the net penetration rates of hardrock TBMs,” Tunnelling and Underground Space Technology,vol. 44, pp. 108–120, 2014.

    [16] G. Armetti, M. R. Migliazza, F. Ferrari, A. Berti, andP. Padovese, “Geological and mechanical rock mass condi-tions for TBM performance prediction. -e case of “LaMaddalena” exploratory tunnel, Chiomonte (Italy),” Tun-nelling and Underground Space Technology, vol. 77, pp. 115–126, 2018.

    [17] I. M. Vergara, C. Saroglou, and C. Saroglou, “Prediction ofTBM performance in mixed-face ground conditions,” Tun-nelling and Underground Space Technology, vol. 69, pp. 116–124, 2017.

    [18] S. Yagiz, “Utilizing rock mass properties for predicting TBMperformance in hard rock condition,” Tunnelling and Un-derground Space Technology, vol. 23, no. 3, pp. 326–339, 2008.

    [19] C. Zhou, T. Kong, Y. Zhou, H. Zhang, and L. Ding, “Un-supervised spectral clustering for shield tunneling machinemonitoring data with complex network theory,” Automationin Construction, vol. 107, Article ID 102924, 2019.

    [20] D. Bouayad and F. Emeriault, “Modeling the relationshipbetween ground surface settlements induced by shield tun-neling and the operational and geological parameters basedon the hybrid PCA/ANFIS method,” Tunnelling and Un-derground Space Technology, vol. 68, pp. 142–152, 2017.

    [21] S. Mahdevari, S. R. Torabi, and M. Monjezi, “Application ofartificial intelligence algorithms in predicting tunnel con-vergence to avoid TBM jamming phenomenon,” Interna-tional Journal of Rock Mechanics and Mining Sciences, vol. 55,pp. 33–44, 2012.

    [22] K.-C. Hyun, S. Min, H. Choi, J. Park, and I.-M. Lee, “Riskanalysis using fault-tree analysis (FTA) and analytic hierarchyprocess (AHP) applicable to shield TBM tunnels,” Tunnellingand Underground Space Technology, vol. 49, pp. 121–129, 2015.

    [23] A. Salimi, J. Rostami, C. Moormann, and A. Delisio, “Ap-plication of non-linear regression analysis and artificial in-telligence algorithms for performance prediction of hard rockTBMs,” Tunnelling and Underground Space Technology,vol. 58, pp. 236–246, 2016.

    [24] W. Sun, M. Shi, C. Zhang, J. Zhao, and X. Song, “Dynamicload prediction of tunnel boring machine (TBM) based onheterogeneous in-situ data,” Automation in Construction,vol. 92, pp. 23–34, 2018.

    [25] G. Javad and T. Narges, “Application of artificial neuralnetworks to the prediction of tunnel boring machine pene-tration rate,” Mining Science and Technology (China), vol. 20,no. 5, pp. 727–733, 2010.

    [26] A. C. Adoko, C. Gokceoglu, and S. Yagiz, “Bayesian predictionof TBM penetration rate in rock mass,” Engineering Geology,vol. 226, pp. 245–256, 2017.

    [27] S. E. Seker and I. Ocak, “Performance prediction of road-headers using ensemble machine learning techniques,”NeuralComputing and Applications, vol. 31, no. 4, pp. 1103–1116,2019.

    [28] X. Gao, M. Shi, X. Song, C. Zhang, and H. Zhang, “Recurrentneural networks for real-time prediction of TBM operatingparameters,” Automation in Construction, vol. 98, pp. 225–235, 2019.

    [29] S. Li, B. Liu, X. Xu et al., “An overview of ahead geologicalprospecting in tunneling,” Tunnelling and Underground SpaceTechnology, vol. 63, pp. 69–94, 2017.

    [30] T. Yamamoto, S. Shirasagi, S. Yamamoto, Y. Mito, andK. Aoki, “Evaluation of the geological condition ahead of the

    tunnel face by geostatistical techniques using TBM drivingdata,” Tunnelling and Underground Space Technology, vol. 18,no. 2-3, pp. 213–221, 2003.

    [31] C. Zhou, L. Y. Ding, M. J. Skibniewski, H. Luo, andH. T. Zhang, “Data based complex network modeling andanalysis of shield tunneling performance in metro con-struction,” Advanced Engineering Informatics, vol. 38,pp. 168–186, 2018.

    [32] R. Kohavi, “A study of cross-validation and bootstrap foraccuracy estimation and model selection,” in Proceedings ofthe International Joint Conference on Artificial Intelligence,Quebec, Canada, August 1995.

    [33] T. Cover and P. Hart, “Nearest neighbor pattern classifica-tion,” IEEE Transactions on Information:eory, vol. 13, no. 1,pp. 21–27, 1967.

    [34] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A trainingalgorithm for optimal margin classifier,” in Proceedings of theWorkshop on Computational Learning:eory, Pittsburgh, PA,USA, July 1992.

    [35] M. T. Hagan, M. Beale, and M. Beale, Neural Network Design,MIT Press, Cambridge, MA, USA, 2002.

    [36] L. Breiman, Classification and Regression Trees, Routledge,Abingdon, UK, 2017.

    10 Mathematical Problems in Engineering