Journal of Petroleum Science Research (JPSR) Volume 3 Issue 2, April 2014 www.jpsr.org doi: 10.14355/jpsr.2014.0302.04

Automatic Identification of Formation Lithology from Well Log Data: A Machine Learning Approach

Seyyed Mohsen Salehi *1, Bizhan Honarvar 2

*1 Department of Petroleum Engineering, Omidiyeh Branch, Islamic Azad University, Omidiyeh, Iran
2 Islamic Azad University, Fars Science and Research Branch, Shiraz, Iran
Emails: *1 [email protected]; 2 [email protected]

Received 22 December 2013; Accepted 10 February 2014; Published 14 April 2014
© 2014 Science and Engineering Publishing Company

Abstract

Determination of the hydrocarbon content and the successful drilling of petroleum wells are highly contingent upon the lithology of the underground formation. Conventional lithology identification methods are either uneconomical or highly uncertain. The main aim of this study is to develop an intelligent model based on Least Squares Support Vector Machine (LSSVM) and the Coupled Simulated Annealing (CSA) algorithm, called CSA-LSSVM, for predicting the lithology in one of the Iranian oilfields. To this end, photoelectric index (PEF) values were simulated by the CSA-LSSVM algorithm based on valid well logging data generally known as lithology indicators. Model predictions were compared to the real data obtained from the well logging operation, and an overall Correlation Coefficient (R²) of 0.993 and Average Absolute Relative Deviation (AARD) of 1.6% were obtained for the total dataset (3243 data points), which shows the robustness of the CSA-LSSVM algorithm in predicting accurate PEF values. In order to check the validity of the employed well log data, the leverage value statistical method was implemented for detecting possible outliers. Only a single data point was diagnosed as suspected data or a probable outlier, which supports the validity of the recorded data points and indicates a wide applicability domain for the proposed model.
Keywords

Lithology; Least Squares Support Vector Machine (LSSVM); Coupled Simulated Annealing (CSA); Outlier

Introduction

Efficient drilling of hydrocarbon wells in an oilfield entails identification of the lithologies crossed by the well. Knowledge of the lithology in a hydrocarbon well can be employed in determining a variety of other parameters, the most important of which is its fluid content. One way of determining the lithologies and lithofacies is to infer them from the cuttings obtained during drilling operations. However, the depth of the retrieved cuttings is always uncertain, and the samples are not usually large enough for accurate and reliable determination of petrophysical parameters (Serra and Abbott, 1982). Another way to obtain such parameters is through observation and analysis of core samples taken from the underground formation. Nevertheless, this approach is highly expensive and may require a huge amount of time and effort to obtain reliable information about the underground lithofacies. Moreover, different geophysicists and geologists may obtain non-unique results based on their own observations and analysis (Akinyokun et al., 2009; Serra and Abbott, 1982). Considering these constraints, there has been a growing interest in identifying lithologies through interpretation of well log data, which is cheaper and more reliable than core analysis. Wireline logging covers the entire geological formation of interest and provides extensive detail about the underground formation (Serra and Abbott, 1982). Unfortunately, ambiguities in measurements, mineralogical complexities of geological formations, and many other factors may, in some cases, bring unexpected difficulties to lithology identification from well log interpretations.

In this perspective, a number of studies have been undertaken for accurate and reliable determination of lithologies by employing the data obtained from well logging operations (Akinyokun et al., 2009; Hsieh et al., 2005; Serra and Abbott, 1982). In recent years, engineers and geoscientists have applied computational algorithms and statistical approaches to define the lithologies and petrophysical parameters and to reduce the errors and difficulties associated with conventional well logging interpretations (Akinyokun et al., 2009). Conventional computational algorithms or statistical methods may be defective in providing adequate information for lithology identification, especially in carbonate oil reservoirs. Broad families of algorithmic approaches are subsumed under the category of machine learning techniques. These algorithms are based on a coherent statistical foundation and aim to find reliable predictions through inferring from a set of measurements. Some researchers have recently employed Artificial Neural Networks (ANNs) to improve on past performance in solving problems concerned with lithology determination (Chang et al., 2002). However, ANN-based models possess some deficiencies in reproducing the obtained results, partly due to random initialization of the network parameters and variations of stopping criteria during optimization processes (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). Recently, the support vector machine (SVM) has proved to be an established and powerful tool for solving several complex problems encountered in many disciplines (Baylar et al., 2009; Byvatov et al., 2003; Scholkopf and Smola, 2002; Vapnik, 1995).

This research employs a least-squares modification of the SVM approach, called Least Squares Support Vector Machine (LSSVM), in an effort to alleviate the shortcomings of conventional well log interpretation methods and previously applied algorithmic approaches. Our main focus is the determination of lithology from the data recorded in a wireline logging operation in one of the Iranian oil wells in the Ahwaz oilfield. In this study, caliper log (CALI), sonic log (DT), deep induction resistivity log (ILD), neutron log (NPHI), density log (RHOB), and gamma ray log (CGR) were identified as lithology indicators. All raw data obtained from wireline logging are initially corrected for environmental effects owing to borehole size, mud salinity, etc. These corrections are indispensable prior to any interpretation of well log data.

The caliper log is a tool for measuring the diameter and shape of the wellbore. Caliper logs can be used as crude lithological indicators. Shale, bentonite, and coals tend to cave into the wellbore, producing an increased wellbore diameter. On the other hand, no borehole deviations are observed in sandstones and carbonates since they do not tend to cave into the wellbore (Evenick, 2008).

In sonic logs, the transit time of sound transmitted through the formation is recorded in microseconds per foot (μs/ft). These logs are good indicators of lithology and density since the transmission rate depends strongly on the medium that the sound is passing through.

Deep induction resistivity logs record the resistance of a formation to the flow of electricity far away from the invaded zone produced by drilling mud, in ohm-meters (Ω·m). Most rocks are insulators and most formation fluids are electrical conductors. High resistivity is recorded when the formation contains hydrocarbon (Akinyokun et al., 2009; Evenick, 2008).

A neutron log normally measures a formation's porosity based upon the quantity of hydrogen present in the formation. It is mainly used in lithology identification, porosity evaluation, and differentiation between liquids and gases due to their dissimilar hydrogen contents (Akinyokun et al., 2009; Evenick, 2008).

The density log measures the porosity of a formation based on the assumed density of the formation and drilling fluid in grams per cubic centimeter (g/cm³). It can also be employed in differentiation between gases and liquids through cross-plotting the overestimated porosity values (from density logs) and underestimated porosity values (from neutron logs) (Akinyokun et al., 2009).

Gamma ray logs are indicators of the radioactivity of the formation, as shale-free sandstones and carbonates yield low gamma ray values. Shales, on the other hand, usually exhibit high gamma ray readings if they contain adequate amounts of accessory minerals containing isotopes like potassium, uranium, and/or thorium (Hsieh et al., 2005).

This article is organized in the following sections. In Section 2, acquisition of data and the assembled database are explained in detail. In Section 3, details and equations behind the intelligent model are provided along with some discussion of the advantages and disadvantages of some methods based on machine learning theory. In Section 4, results obtained from the LSSVM model are compared with real well log data and the accuracy of the model is fully described. Finally, a statistical method is applied for determination of the possible outliers and also for investigating the validity of the employed dataset.

Data Acquisition

Borehole geophysical data were obtained from an oil well in the Ahwaz Iranian oilfield. Some of the well log data were selected as indicators of lithology. For each data point, these are caliper log (CALI), sonic log (DT), deep induction resistivity log (ILD), neutron log (NPHI), density log (RHOB), and gamma ray log (CGR). These readings were then connected to the photoelectric index (PEF), which is a supplementary measurement that records the absorption of low-energy gamma rays by the formation in units of barns per electron. The logged values are directly proportional to the aggregate atomic number of the elements in the formation; thus PEF is a sensitive indicator of mineralogy and has to be predicted with high accuracy. Figure 1 indicates different values of PEF in different formation lithologies. A total of 3243 log readings were assembled into a dataset including 7 inputs (depth and the lithology indicator logs) and 1 output (PEF values). The overall ranges of the recorded data along with their averages and standard deviations are summarized in Table I.

FIGURE 1 MEASUREMENTS OF PHOTOELECTRIC INDEX (PEF) FOR DIFFERENT UNDERGROUND LITHOLOGIES

TABLE I RANGES OF INPUT/OUTPUT VARIABLES USED FOR DEVELOPING AND TESTING THE MODEL

Parameter             Minimum     Maximum     Average     Standard deviation
Depth (m)             2575.712    3075.889    2827.878    124.5312
CALI (in)             8.1504      22.2763     9.345049    0.659798
DT (μs/ft)            53.1954     113.1356    77.09043    9.722123
ILD (Ω·m)             0.1975      1705.562    12.79944    15.99413
NPHI (p.u.)           0.041645    0.494965    0.199554    0.047319
RHOB (g/cm³)          1.4736      2.8639      2.420654    0.158964
CGR (API)             0.0139      111.2971    30.33772    19.87745
PEF (barn/electron)   1.8121      6.635       3.096314    0.845851

Details Of The Intelligent Model

Support Vector Machine (SVM)

The concept of SVM was initially introduced by Vapnik (1995) as a supervised learning algorithm for solving several classification and function approximation problems (Moser and Serpico, 2009; Suykens, 2001). SVM has a number of distinct advantages as compared to traditional learning methods based on ANN (Byvatov et al., 2003; Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

1) In contrast to ANN, the need for determining the topology of the network is eliminated in SVM; it is automatically established during the learning process.

2) The possibility of overfitting or underfitting is minimized in the SVM paradigm by incorporating a structural risk minimization (SRM) strategy.

3) In SVM, a limited number of parameters need to be adjusted during the learning process, compared to the large number of adjustable weight factors in ANN models.

Assume $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i$ represents the input patterns (CALI, DT, ILD, NPHI, RHOB, RT, and CGR), $y_i$ denotes the output data (PEF in this study), and $n$ is the total number of recorded data. SVM employs a nonlinear mapping procedure in order to map the input parameters into a higher dimensional or even infinite dimensional feature space (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). Thus, the main aim of SVM is to locate an optimum hyperplane, from which all experimental data have a minimum distance. Assuming that the data samples are linearly separable, the form of the decision function employed by SVM is represented as follows (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

$$f(x) = w^{t} g(x) + b \qquad (1)$$

where $g(x)$ is the mapping function, $w$ and $b$ are the weight vector and bias term, respectively, and superscript $t$ denotes the transpose of the weight matrix. The decision function is subjected to the following condition under the assumption that the data from two classes are separable:

$$f(x_i) \geq +1 \ \text{if} \ y_i = +1, \qquad f(x_i) \leq -1 \ \text{if} \ y_i = -1 \qquad (2)$$

Support vectors (SVs) are selected from a pool of training data which satisfy the constraints (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). If the problem is linearly separable in the feature space, there will be an unlimited number of decision functions which satisfy Equation (2). Hence, the optimal separating plane can be determined through maximizing the margin and minimizing the noise by a slack variable introduced below (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

$$\min \left( \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{n} \xi_i \right) \qquad (3)$$

where $C$ is a positive constant which is the tradeoff between maximum margin and minimum classification error, and $\xi$ is the slack variable representing the distance between data points in the false class and the margin of their virtual class.

Taking into consideration the equations presented earlier, we have a typical convex optimization problem that can be solved using the Lagrange multipliers method given below (Baylar et al., 2009; Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2} w^{t} w + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left( y_i (w^{t} x_i + b) - 1 + \xi_i \right) - \sum_{i=1}^{n} \mu_i \xi_i \qquad (4)$$

where $\alpha_i$ and $\mu_i$ are the Lagrange multipliers. The solution is defined through the saddle point of the Lagrangian when the value of $\alpha_i$ is greater than zero (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). Owing to the specific formalism of the SVM algorithm, sparse solutions can be found for both linear and nonlinear regression problems (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999).

Least Squares Support Vector Machine (LSSVM)

Regardless of the outstanding performance of SVM in solving static function approximation problems, it has a high computational burden owing to the required constrained optimization programming (Haifeng and Dejin, 2005). Thus, application of SVM in large scale function approximation problems with a wide range of experimental data is limited by the time and memory consumed during optimization (Haifeng and Dejin, 2005). In an effort to minimize the complexity of SVM and also to enhance its speed of convergence, Suykens and Vandewalle (1999) proposed a modified version of SVM, called Least Squares Support Vector Machine (LSSVM). In LSSVM, equality constraints are used instead of the inequality ones employed in traditional SVM (Haifeng and Dejin, 2005; Suykens and Vandewalle, 1999). LSSVM benefits from the same advantages as SVM, but its optimum solution can be obtained by solving a set of linear equations rather than a quadratic programming problem (Gharagheizi et al., 2011; Suykens and Vandewalle, 1999). In general, the following equation is implemented as an objective function in order to train the LSSVM algorithm (Suykens and Vandewalle, 1999):

$$Q = \frac{1}{2} w^{t} w + \gamma \sum_{i=1}^{n} e_i^{2} \qquad (5)$$

which is subjected to the following linear constraints:

$$y_i = w^{t} \varphi(x_i) + b + e_i, \quad i = 1, 2, \ldots, n \qquad (6)$$

In Equations (5) and (6), $e_i$ represents the regression error of each of the $n$ data points, and $\gamma$ denotes the relative weight of the summation of regression errors compared to the regression weight. The regression weight coefficient ($w$) can be written in terms of the Lagrangian multipliers ($\alpha_i$) and input vectors ($x_i$) as represented below (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$w = \sum_{i=1}^{n} \alpha_i x_i, \quad \text{where} \quad \alpha_i = 2 \gamma e_i \qquad (7)$$

Considering the assumption that a linear regression exists between the dependent and independent parameters of the LSSVM algorithm, Equation (6) can be reformulated as (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$y = \sum_{i=1}^{n} \alpha_i x_i^{t} x + b \qquad (8)$$

Thus, after some mathematical manipulations, the Lagrange multipliers in Equation (8) can be determined from the following relationship (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$\alpha_i = \frac{y_i - b}{x_i^{t} x + (2\gamma)^{-1}} \qquad (9)$$
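As a rough illustration of how Equations (5) to (9) lead to a set of linear equations, the sketch below solves the linear-kernel case with NumPy. Setting the derivatives of the Lagrangian of Equations (5) and (6) to zero gives $\sum_i \alpha_i = 0$ and $y_i = \sum_j \alpha_j x_j^{t} x_i + b + \alpha_i / (2\gamma)$, which is the system assembled here. The function names, the toy data, and the value of gamma are assumptions made for illustration only; this is not the toolbox used in the study.

```python
import numpy as np

def lssvm_fit_linear(X, y, gamma):
    """Solve the LSSVM linear system for (alpha, b) with a linear kernel.

    Hypothetical helper: the system follows from Equations (5)-(7),
    [[0, 1^t], [1, K + I/(2*gamma)]] [b, alpha]^t = [0, y]^t.
    """
    n = X.shape[0]
    K = X @ X.T                                  # inner products x_i^t x_j
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                               # enforces sum(alpha) = 0
    A[1:, 0] = 1.0                               # bias term b
    A[1:, 1:] = K + np.eye(n) / (2.0 * gamma)    # alpha_i = 2*gamma*e_i
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[1:], solution[0]             # alpha, b

def lssvm_predict_linear(X_train, alpha, b, X_new):
    """Evaluate Equation (8): y = sum_i alpha_i x_i^t x + b."""
    return X_new @ X_train.T @ alpha + b

# Toy data (invented for this sketch): y = 2*x1 - x2 + 0.5
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
y_toy = 2.0 * X_toy[:, 0] - X_toy[:, 1] + 0.5
alpha_toy, b_toy = lssvm_fit_linear(X_toy, y_toy, gamma=1e6)
y_new = lssvm_predict_linear(X_toy, alpha_toy, b_toy, np.array([[3.0, 2.0]]))
```

With a large gamma the regression errors are penalized heavily, so the toy fit should reproduce the linear target closely; smaller values trade accuracy for a smaller weight norm.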

The linear regression equation developed earlier can be converted to nonlinear form by employing the kernel function as follows (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$f(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i) + b \qquad (10)$$

where $K(x, x_i)$ is the kernel function obtained from the inner product of vectors $\varphi(x)$ and $\varphi(x_i)$ in the feature space, as represented below (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$K(x, x_i) = \varphi(x)^{t} \cdot \varphi(x_i) \qquad (11)$$

The kernel function implemented in this study is the radial basis function (RBF), which is one of the most powerful kernel functions commonly employed in this field (Farasat et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$K(x, x_i) = \exp\left( - \|x_i - x\|^{2} / \sigma^{2} \right) \qquad (12)$$

where $\sigma^{2}$ is the squared bandwidth, which is optimized through an external optimization technique during the training process. The mean squared error (MSE) between the real PEF values and those predicted by the LSSVM algorithm was defined as (Farasat et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$MSE = \frac{\sum_{i=1}^{n} \left( PEF_{pred_i} - PEF_{real_i} \right)^{2}}{N} \qquad (13)$$

where PEF represents the PEF values, $N$ is the number of training objects, and subscripts pred and real denote the predicted and real PEF values, respectively. The LSSVM algorithm employed in this study to train the well log data has been developed by Pelckmans et al. (2002) and Suykens and Vandewalle (1999). In order to enhance model performance during the learning process, the Coupled Simulated Annealing (CSA) algorithm was employed to optimize the two model parameters controlling its accuracy and convergence, namely $\gamma$ and $\sigma^{2}$.

Coupled Simulated Annealing

Simulated Annealing (SA) is a population based search method which is usually used for combinatorial optimization problems. The method was initially proposed by Metropolis et al. (1953), and was popularized by Kirkpatrick et al. (1983) afterwards. The motivation behind this method lies in the physical process of annealing, during which a metal is heated to a liquid state and then cooled slowly enough that all crystal grains eventually reach the lowest inner energy. Like the metal cooling process, SA gradually converges to the optimum solution, which favors reaching the global optimum and evades local optimality (Fabian, 1997).

This study employs the Coupled Simulated Annealing (CSA) proposed by Xavier-de-Souza et al. (2010) in an effort to enhance the quality of the optimization process. The concept of CSA was inspired by the Coupled Local Minimizers (CLM), in which multiple gradient descent optimizers are used instead of multistart gradient descent for optimization problems. CSA describes a set of individual SA processes coupled by a term in the acceptance probability function. The aim of CSA is to obtain a faster and more robust convergence. The coupling is a function of the current costs of all the individual SA processes (Xavier-de-Souza et al., 2010). The information between individual SA processes is shared through both the coupling term and the acceptance probability function, allowing for controlling general optimization indicators using optimization control parameters (Xavier-de-Souza et al., 2010). While the acceptance probability of an uphill move in traditional SA is often given by the Metropolis rule (Metropolis et al., 1953), which depends merely on the current and probing solution, CSA considers other current solutions as well. This probability also depends on the costs of solutions through a coupling term $\gamma$ (distinct from the LSSVM parameter $\gamma$) in the state set $\Theta \subset S$, where $S$ is the set of all possible solutions. $\gamma$ is generally believed to be a function of all costs of solutions in $\Theta$. The acceptance probability function in CSA, $A_{\Theta}$, is represented as follows:

$$A_{\Theta}(\gamma, x_i \to y_i) = \frac{\exp\left( \left( E(x_i) - \max_{x_j \in \Theta} E(x_j) \right) / T_k^{a} \right)}{\gamma} \qquad (14)$$

where $T_k^{a}$ is the acceptance temperature, and $x_i$ and $y_i$ represent individual solutions in $\Theta$ and their corresponding probing solutions, respectively. The coupling term, $\gamma$, is given as:

$$\gamma = \sum_{x_j \in \Theta} \exp\left( \frac{E(x_j) - \max_{x_l \in \Theta} E(x_l)}{T_k^{a}} \right) \qquad (15)$$

This study proposes a CSA-based approach for parameter optimization and feature selection in LSSVM, termed CSA-LSSVM. A typical flowchart of the CSA-LSSVM algorithm is shown in Figure 2. The objective of CSA-LSSVM when searching for optimum model parameters is to minimize the Mean Squared Error (MSE) given in Equation (13).

FIGURE 2 A TYPICAL FLOWCHART REPRESENTING THE CSA-LSSVM ALGORITHM
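To make the coupling in Equations (14) and (15) concrete, the short sketch below evaluates the acceptance probabilities for a set of current solution costs at one acceptance temperature. It covers only this one ingredient of CSA, not the full annealing schedule or the LSSVM tuning loop, and the function name and example costs are assumptions chosen for illustration.

```python
import math

def csa_acceptance_probabilities(energies, temperature):
    """Return (probabilities, coupling_term) for the current solutions.

    energies:    costs E(x_i) of the current solutions in the state set
    temperature: acceptance temperature T_k^a (must be positive)
    """
    e_max = max(energies)
    # exp((E(x_i) - max E) / T) for every current solution
    weights = [math.exp((e - e_max) / temperature) for e in energies]
    coupling = sum(weights)                      # Equation (15)
    probabilities = [w / coupling for w in weights]  # Equation (14)
    return probabilities, coupling

# Hypothetical costs of three coupled SA processes
probs, coupling = csa_acceptance_probabilities([1.0, 2.0, 3.0], temperature=1.0)
```

Because every probability shares the same coupling term, the values sum to one, and the process holding the highest current cost receives the largest probability of accepting its probing solution.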

Results And Discussion

Model Accuracy And Validation

In this research, the CSA-LSSVM algorithm was implemented in order to obtain PEF as a function of several other measurements recorded during the well logging operation. PEF can be used as a general indicator of lithologies and mineralogical complexities of different layers of the formation. In this study, PEF was linked to some other parameters generally known as lithology indicators:

PEF = f(Depth, CALI, DT, ILD, NPHI, RHOB, CGR) (16)

In the next step, the assembled well log data were divided into three subsets, namely train, validation, and test. The Train set is employed to build the model structure, the Validation set is applied for adjusting the model parameters and also to check the validity of the patterns learned by CSA-LSSVM over the whole range of the dataset, and finally, the Test set is used to investigate the final performance and validity of the proposed model for unseen data. To increase the model applicability and robustness, the whole database was divided randomly into 70%, 15%, and 15% fractions of the main dataset for the Train set (2270 data points), the Validation set (486 data points), and the Test set (487 data points), respectively.

The RBF kernel function was implemented in this study due to its superior performance compared to other kernel types like linear or polynomial kernels. The CSA algorithm was then implemented for tuning the LSSVM parameters during the learning process. The optimum values found for these parameters at the end of the optimization process were $\gamma = 284.8173$ and $\sigma^{2} = 0.9916$.

TABLE II STATISTICAL PARAMETERS OF THE PROPOSED CSA-LSSVM MODEL

Statistical parameter                     Train set   Validation set   Test set   Total
R²                                        0.995       0.987            0.985      0.993
Average absolute relative deviation (%)   1.3         2.2              2.2        1.6
Standard deviation error                  0.84        0.82             0.86       0.84
Root mean square error                    0.07        0.11             0.12       0.08
N                                         2270        486              487        3243
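The random 70%/15%/15% division described above can be sketched in a few lines. The source does not state how fractional counts were rounded or which random seed was used, so the truncation rule, the seed, and the function name below are assumptions; with truncation, the subset sizes match the 2270, 486, and 487 points reported for the 3243 readings.

```python
import random

def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly split row indices into train, validation, and test subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)     # reproducible shuffle (seed is assumed)
    n_train = int(train_frac * n_samples)    # truncate fractional counts
    n_val = int(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]         # remainder goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(3243)
print(len(train_idx), len(val_idx), len(test_idx))  # 2270 486 487
```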

[Figure 2 flowchart boxes: Read well log dataset (Trn. set, Vld. set, Tst. set); Select model features (γ and σ²); Employ feature subset (γ and σ²); Construct PEF prediction model; Evaluate model accuracy; Meet stopping criteria? (No / Yes); Implement Coupled Simulated Annealing (CSA); Optimum model features (γ and σ²) obtained; Re-train LSSVM using the optimum features; Final CSA-LSSVM model]

FIGURE 3 GRAPHICAL REPRESENTATION OF PEF VALUES PREDICTED BY CSA-LSSVM ALGORITHM VERSUS REAL PEF VALUES (axes: real PEF versus LSSVM prediction of PEF; series: 45° line, Train, Validation, Test; R² = 0.993)

FIGURE 4 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR TRAIN DATASET (axes: total number of train data versus PEF values; series: real PEF, LSSVM prediction)

FIGURE 5 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR VALIDATION DATASET (axes: total number of validation data versus PEF values; series: real PEF, LSSVM prediction)

FIGURE 6 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR TEST DATASET (axes: total number of test data versus PEF values; series: real PEF, LSSVM prediction)

FIGURE 7 HISTOGRAM OF ERROR FREQUENCY SKETCHED FOR ALL DATA INCLUDING TRAIN, VALIDATION, AND TEST SETS (axes: relative deviation versus data frequency)

Some statistical parameters indicating the accuracy and validity of the proposed model are outlined in Table II. A total Correlation Coefficient (R²) of 0.993, Average Absolute Relative Deviation (AARD) of 1.6%, Standard Deviation Error (STD) of 0.84, and Root Mean Squared Error (RMSE) of 0.08 confirm the accuracy and validity of the CSA-LSSVM model in prediction of PEF values from well log data. The regression plot of real PEF values and those predicted by the CSA-LSSVM model is also shown in Figure 3 for the Train, Validation, and Test data sets. The high concentration of data around the 45° line indicates a good agreement between model predictions and real PEF values. Deviations of the real PEF values from those predicted by the CSA-LSSVM model are also shown in Figures 4-6 for the Train, Validation, and Test sets, respectively. Model predictions and the real values approximately overlap, suggesting small deviations and high accordance. The frequency of errors between model predictions and real PEF data has also been plotted in Figure 7. This figure indicates a normal error distribution, which is a measure of robustness and accuracy in the developed LSSVM model.

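a $$R^{2} = 1 - \frac{\sum_{i}^{N} \left( pred(i) - exp.(i) \right)^{2}}{\sum_{i}^{N} \left( pred(i) - average(exp.(i)) \right)^{2}}$$

b $$\%AARD = \frac{100}{N} \sum_{i}^{N} \frac{\left| pred(i) - exp.(i) \right|}{exp.(i)}$$

c $$STD = \sqrt{ \frac{\sum_{i}^{N} \left( error(i) - average(error(i)) \right)^{2}}{N} }$$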
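The statistical parameters reported in Table II can be computed along the following lines. This is a sketch under stated assumptions: R², AARD, and STD follow the definitions given with the table, RMSE is taken as the square root of the MSE of Equation (13), the real values are assumed to be positive (as PEF is), and the function name and sample numbers are invented for illustration.

```python
import math

def regression_metrics(pred, real):
    """Compute R2, AARD (%), STD of errors, and RMSE for paired values."""
    n = len(real)
    mean_real = sum(real) / n
    errors = [p - r for p, r in zip(pred, real)]
    ss_res = sum(e * e for e in errors)
    # Denominator uses pred(i) - average(exp.(i)), as in the definition above
    ss_ref = sum((p - mean_real) ** 2 for p in pred)
    mean_err = sum(errors) / n
    return {
        "R2": 1.0 - ss_res / ss_ref,
        "AARD": 100.0 / n * sum(abs(e) / r for e, r in zip(errors, real)),
        "STD": math.sqrt(sum((e - mean_err) ** 2 for e in errors) / n),
        "RMSE": math.sqrt(ss_res / n),   # assumed: square root of Equation (13)
    }

# Hypothetical two-point example
metrics = regression_metrics(pred=[2.2, 3.8], real=[2.0, 4.0])
```

For the two-point example, the errors are +0.2 and -0.2, so AARD is 7.5%, RMSE and STD are both 0.2, and R² is 0.9375.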

Outlier Detection In PEF Measurements

To develop a valid and highly applicable model for predicting PEF values from well log measurements, the recorded data must be reliable and accurate. However, fully accurate measurement of well log data is almost never feasible, and environmental interferences may in some cases introduce flawed measurements into the recorded database. These observations usually differ from the bulk of the data and are considered a menace to successful lithology prediction. Thus, constructing an accurate and reliable model is highly dependent upon detecting these values in the well logging data.

In order to diagnose the suspected measurements, the leverage value statistical approach was implemented in this study. The calculation procedure according to this method includes determination of the residual values for all data points (i.e. deviations between CSA-LSSVM model predictions and real PEF values) and a matrix referred to as the Hat matrix, composed of real data and values predicted by the model. In general, the Hat matrix is constructed as follows (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007):

$$H = X (X^{t} X)^{-1} X^{t} \qquad (17)$$

where $X$ is a two-dimensional matrix containing $m$ rows (representing the total number of employed data) and $n$ columns (representing the total number of model parameters) and $t$ denotes the transpose operator. The diagonal elements of the Hat matrix indicate the feasible region of the problem. Graphical detection of outliers is usually carried out through sketching the Williams plot according to the H values calculated from Equation (17) (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007; Mohammadi et al., 2012). This plot represents the correlation existing between Hat indices and standardized cross-validated residuals. A warning leverage ($H^{*}$) is typically defined as $3(n+1)/m$, where $m$ denotes the total number of data points and $n$ represents the number of input parameters. A leverage value of 3 is generally considered as the cutoff value to accept the measurements within ±3 standard deviations from the mean (represented as two green lines) (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007; Mohammadi et al., 2012). Existence of the majority of data points in the range $0 \leq H \leq H^{*}$ and $-3 \leq R \leq 3$ reveals the high applicability and reliability of the developed model. Based on these values, suspected data may be categorized into two types, namely leverage points and regression outliers. Leverage points are also subdivided into two groups, namely good leverage points and bad leverage points. Good leverage points are those data points located in the range $H^{*} \leq H$ and $-3 \leq R \leq 3$. Although these measurements possess high leverage values, they do not necessarily affect the correlation coefficient and they are close to the line around which most data are centered. Bad leverage points are those measurements in the range of $R > 3$ or $R < -3$

    [Figure: standardized residual (vertical axis) versus Hat value (horizontal axis)]

    FIGURE 8 DETECTION OF PROBABLE OUTLIERS OR SUSPECTED DATA FROM THE WHOLE RECORDED DATASET
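The leverage screening described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`hat_diagonal`, `williams_flags`), the use of NumPy, and the plain standardization of residuals (instead of cross-validated residuals, which would require refitting the model) are all choices made for the example.

```python
import numpy as np


def hat_diagonal(X):
    """Return the diagonal of H = X (X^t X)^-1 X^t for an m-by-n matrix X."""
    X = np.asarray(X, dtype=float)
    # pinv equals the ordinary inverse when X^t X is non-singular
    # and avoids a hard failure when it is not.
    xtx_inv = np.linalg.pinv(X.T @ X)
    # h_i = x_i (X^t X)^-1 x_i^t, computed row by row so the full
    # m-by-m Hat matrix is never stored.
    return np.einsum("ij,jk,ik->i", X, xtx_inv, X)


def williams_flags(X, y_true, y_pred, cutoff=3.0):
    """Flag points by leverage (H > H*) and by standardized residual (|R| > cutoff)."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    h = hat_diagonal(X)
    h_star = 3.0 * (n + 1) / m  # warning leverage H* = 3(n+1)/m

    residuals = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    # Assumption: residuals are standardized with their own mean and sample
    # standard deviation; this is a stand-in for cross-validated residuals.
    r = (residuals - residuals.mean()) / residuals.std(ddof=1)

    high_leverage = h > h_star
    outside_band = np.abs(r) > cutoff
    return {
        "h": h,
        "h_star": h_star,
        "r": r,
        "high_leverage": high_leverage,
        "outside_band": outside_band,
        "good_leverage": high_leverage & ~outside_band,
    }
```

Calling `williams_flags(X, pef_measured, pef_predicted)` with an m-by-n input matrix (the array names here are hypothetical) returns the Hat values, the warning leverage, and boolean masks. Points for which both `high_leverage` and `outside_band` are false lie in the region 0 ≤ H ≤ H* and -3 ≤ R ≤ 3 of the Williams plot.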

    Conclusions

    In this study, Least Squares Support Vector Machine (LSSVM) was implemented to obtain formation lithology from well log data recorded in an oil well in the Ahvaz oilfield, Iran. To optimize the LSSVM parameters, the Coupled Simulated Annealing (CSA) algorithm was implemented, yielding a hybrid approach called CSALSSVM. Using the CSALSSVM algorithm, the photoelectric index (PEF) was simulated from the well logging data obtained from the underground formation. Model predictions were compared with real PEF values, and an overall Correlation Coefficient (R²) of 0.993 and an Average Absolute Relative Deviation (AARD) of 1.6% were obtained, showing the high accuracy of CSALSSVM in predicting PEF values. The excellent agreement observed between simulated and real PEF values corroborates the validity of the developed model. In addition, a statistical approach was implemented to determine the suspected data and possible outliers among the PEF recordings. The employed database was found to be highly accurate, and only one data point was diagnosed as following a different pattern from the rest of the dataset. This suggests a wide applicability domain of the developed CSALSSVM model in predicting PEF values from well log data. The developed model can further be implemented in adjacent wells with acceptable accuracy for lithology prediction during drilling operations.
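For readers who wish to reproduce the two accuracy measures quoted above on their own data, the sketch below shows one common way to compute them. It assumes the usual definitions, which are not restated in this section: AARD as the mean of |predicted - measured| / |measured| expressed in percent, and R² as the coefficient of determination. The function names and the sample values are hypothetical.

```python
import numpy as np


def aard_percent(y_true, y_pred):
    """Average Absolute Relative Deviation in percent (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Relative deviation is taken with respect to the measured value,
    # so measured values must be non-zero.
    return 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted PEF values, for illustration only.
pef_measured = [2.0, 4.0, 5.0]
pef_predicted = [2.2, 3.8, 5.0]
print(aard_percent(pef_measured, pef_predicted))
print(r_squared(pef_measured, pef_predicted))
```

Both functions operate on paired arrays of measured and predicted values; applying them separately to training and test subsets is a straightforward extension.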

    NOMENCLATURE

    A           Acceptance probability function
    Tak         Acceptance temperature
    e_i         Regression error
    K(x, x_i)   Kernel function
    R²          Coefficient of determination
    AARD        Average Absolute Relative Deviations, %
    B           Bias term
    C           Positive constant
    CALI        Caliper log
    CGR         Corrected gamma ray
    CLM         Coupled Local Minimizers
    CSA         Coupled Simulated Annealing
    DT          Sonic log
    H           Hat matrix
    ILD         Deep induction resistivity log
    g(x)        Mapping function
    LSSVM       Least Squares Support Vector Machine
    m           Number of employed data
    MSE         Mean Squared Error
    n           Total number of model parameters
    N           Number of training objects
    NPHI        Neutron log
    Q           Injection rate, cc/min
    R           Residual
    RMSE        Root Mean Squared Errors
    RHOB        Density log
    S           Set of all possible solutions
    SA          Simulated Annealing
    STD         Standard Deviation Error
    t           Transpose
    W           A nonlinear function
    x           Inputs
    X           A two-dimensional matrix (m × n)
    Y           Outputs

    GREEK LETTERS

    σ²          Squared bandwidth
                Coupling term
                A subset of all possible solutions
                Lagrange multipliers
                Relative weight of the summation of the regression errors
                Slack variable

    REFERENCES:

    Akinyokun, O.C., Enikanselu, P.A., Adeyemo, A.B., Adesida, A., 2009. Well Log Interpretation Model for the Determination of Lithology and Fluid Contents. The Pacific Journal of Science and Technology 10, 507-517.

    Baylar, A., Hanbay, D., Batan, M., 2009. Application of least square support vector machines in the prediction of aeration performance of plunging overfall jets from weirs. Expert Syst. Appl. 36, 8368-8374.

    Byvatov, E., Fechner, U., Sadowski, J., Schneider, G., 2003. Comparison of support vector machine and artificial neural network systems for drug/nondrug classification. Journal of Chemical Information and Computer Sciences 43, 1882-1889.

    Chang, H.C., Kopaska-Merkel, D.C., Chen, H.C., 2002. Identification of lithofacies using Kohonen self-organizing maps. Computers & Geosciences 28, 223-229.

    Cristianini, N., Shawe-Taylor, J., 2000. An introduction to support vector machines: and other kernel-based learning methods. Cambridge University Press.

    Eslamimanesh, A., Gharagheizi, F., Mohammadi, A.H., Richon, D., 2013. Assessment test of sulfur content of gases. Fuel Processing Technology 110, 133-140.

    Evenick, J., 2008. Introduction to Well Logs and Subsurface Maps. PennWell.

    Fabian, V., 1997. Simulated annealing simulated. Computers & Mathematics with Applications 33, 81-94.

    Farasat, A., Shokrollahi, A., Arabloo, M., Gharagheizi, F., Mohammadi, A.H., 2013. Toward an intelligent approach for determination of saturation pressure of crude oil. Fuel Process. Technol.

    Fazavi, M., Hosseini, S.M., Arabloo, M., Shokrollahi, A., Amani, M., 2013. Applying a Smart Technique for Accurate Determination of Flowing Oil/Water Pressure Gradient in Horizontal Pipelines. J. Dispersion Sci. Technol.

    Gharagheizi, F., Eslamimanesh, A., Farjood, F., Mohammadi, A.H., Richon, D., 2011. Solubility Parameters of Nonelectrolyte Organic Compounds: Determination Using Quantitative Structure-Property Relationship Strategy. Ind. Eng. Chem. 50, 11382-11395.

    Goodall, C.R., 1993. Computation using the QR decomposition, Handbook of Statistics. Elsevier, pp. 467-508.

    Gramatica, P., 2007. Principles of QSAR models validation: internal and external. QSAR & Combinatorial Science 26, 694-701.

    Haifeng, W., Dejin, H., 2005. Comparison of SVM and LSSVM for Regression, Neural Networks and Brain, pp. 279-283.

    Hsieh, B.Z., Lewis, C., Lin, Z.S., 2005. Lithology identification of aquifers from geophysical well logs and fuzzy logic analysis: Shui-Lin Area, Taiwan. Computers & Geosciences 31, 263-275.

    Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., 1983. Optimization by Simulated Annealing. Science 220, 671-680.

    Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics 21, 1087-1092.

    Mohammadi, A.H., Eslamimanesh, A., Gharagheizi, F., Richon, D., 2012. A novel method for evaluation of asphaltene precipitation titration data. Chemical Engineering Science 78, 181-185.

    Moser, G., Serpico, S.B., 2009. Modeling the Error Statistics in Support Vector Regression of Surface Temperature From Infrared Data. IEEE Geosci. Remote Sens. Lett. 6, 448-452.

    Pelckmans, K., Suykens, J.A.K., Gestel, T.V., Brabanter, J.D., Lukas, L., Hamers, B., Moor, B.D., Vandewalle, J., 2002. LS-SVMlab: a MATLAB/C toolbox for Least Squares Support Vector Machines, Leuven, Belgium.

    Rafiee-Taghanaki, S., Arabloo, M., Chamkalani, A., Amani, M., Zargari, M.H., Adelzadeh, M.R., 2013. Implementation of SVM framework to estimate PVT properties of reservoir oil. Fluid Phase Equilib. 346, 25-32.

    Scholkopf, B.S., Smola, A.J., 2002. Learning With Kernels: Support Vector Machines, Regularization, Optimization and Beyond. University Press Group Limited.

    Serra, O., Abbott, H.T., 1982. The Contribution of Logging Data to Sedimentology and Stratigraphy. Society of Petroleum Engineers Journal 22, 117-131.

    Shokrollahi, A., Arabloo, M., Gharagheizi, F., Mohammadi, A.H., 2013. Intelligent model for prediction of CO2-Reservoir oil minimum miscibility pressure. J. Fuel 112, 375-384.

    Suykens, J., Vandewalle, J., 1999. Least squares support vector machine classifiers. Neural Processing Letters 9, 293-300.

    Suykens, J.A.K., 2001. Support Vector Machines: A Nonlinear Modelling and Control Perspective. Eur. J. Control 7, 311-327.

    Vapnik, V., 1995. The nature of statistical learning theory. Springer-Verlag, New York.

    Xavier-de-Souza, S., Suykens, J.A.K., Vandewalle, J., Bolle, D., 2010. Coupled Simulated Annealing. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 40, 320-335.