Automatic Identification of Formation Lithology from Well Log Data: A Machine Learning Approach

Journal of Petroleum Science Research (JPSR) Volume 3 Issue 2, April 2014 www.jpsr.org doi:10.14355/jpsr.2014.0302.04


Seyyed Mohsen Salehi*1, Bizhan Honarvar2
*1 Department of Petroleum Engineering, Omidiyeh Branch, Islamic Azad University, Omidiyeh, Iran
2 Islamic Azad University, Fars Science and Research Branch, Shiraz, Iran
Accepted 10 February 2014; Published 14 April 2014
© 2014 Science and Engineering Publishing Company

Abstract

Determination of the hydrocarbon content and the successful drilling of petroleum wells are highly contingent upon the lithology of the underground formation. Conventional lithology identification methods are either uneconomical or subject to high uncertainty. The main aim of this study is to develop an intelligent model based on the Least Squares Support Vector Machine (LSSVM) and the Coupled Simulated Annealing (CSA) algorithm, called CSA-LSSVM, for predicting the lithology in one of the Iranian oilfields. To this end, photoelectric index (PEF) values were predicted by the CSA-LSSVM algorithm from valid well logging data generally known as lithology indicators. Model predictions were compared with the real data obtained from the well logging operation, and an overall coefficient of determination (R²) of 0.993 and an Average Absolute Relative Deviation (AARD) of 1.6% were obtained for the total dataset (3243 data points), which shows the robustness of the CSA-LSSVM algorithm in predicting accurate PEF values. To check the validity of the employed well log data, the leverage value statistical method was implemented to detect possible outliers. Only one data point was diagnosed as suspected data or a probable outlier, which supports the validity of the recorded data points and indicates a wide applicability domain for the proposed model.

Keywords

Lithology; Least Squares Support Vector Machine (LSSVM); Coupled Simulated Annealing (CSA); Outlier

Introduction

Efficient drilling of hydrocarbon wells in an oilfield requires identification of the lithologies crossed by the well. Knowledge of the lithology in a hydrocarbon well can be used to determine a variety of other parameters, the most important of which is its fluid content. One way of determining lithologies and lithofacies is to infer them from the cuttings obtained during drilling operations. However, the depth from which the cuttings were retrieved is always uncertain, and the samples are usually not large enough for accurate and reliable determination of petrophysical parameters (Serra and Abbott, 1982). Another way to obtain such parameters is through observation and analysis of core samples taken from the underground formation. Nevertheless, this approach is highly expensive and may require a large amount of time and effort to obtain reliable information about the underground lithofacies. Moreover, different geophysicists and geologists may obtain non-unique results based on their own observations and analysis (Akinyokun et al., 2009; Serra and Abbott, 1982). Considering these constraints, there has been growing interest in identifying lithologies by interpreting well log data, which is cheaper and more reliable than core analysis. Wireline logging has the advantage of covering the entire geological formation of interest while providing extensive and detailed information about the underground formation (Serra and Abbott, 1982). Unfortunately, ambiguities in measurements, the mineralogical complexity of geological formations, and many other factors may, in some cases, bring unexpected difficulties to lithology identification from well log interpretation.

In this perspective, a number of studies have been undertaken for accurate and reliable determination of lithologies using the data obtained from well logging operations (Akinyokun et al., 2009; Hsieh et al., 2005; Serra and Abbott, 1982). In recent years, engineers and geoscientists have applied computational algorithms and statistical approaches to define lithologies and petrophysical parameters and to reduce the errors and difficulties associated with conventional well log interpretation (Akinyokun et al., 2009). Conventional computational algorithms or statistical methods may fail to provide adequate information for lithology identification, especially in carbonate oil reservoirs. Broad families of algorithmic approaches are subsumed under the category of machine learning techniques. These algorithms rest on a coherent statistical foundation and aim to make reliable predictions by inference from a set of measurements. Some researchers have recently employed Artificial Neural Networks (ANNs) to improve past performance in solving problems of lithology determination (Chang et al., 2002). However, ANN-based models have some deficiencies in reproducing their results, partly owing to random initialization of the network parameters and variations in stopping criteria during optimization (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). Recently, the support vector machine (SVM) has proven to be an established and powerful tool for solving complex problems in many disciplines (Baylar et al., 2009; Byvatov et al., 2003; Scholkopf and Smola, 2002; Vapnik, 1995). This research employs a least squares modification of the SVM approach, called the Least Squares Support Vector Machine (LSSVM), in an effort to alleviate the shortcomings of conventional well log interpretation methods and previously applied algorithmic approaches. Our main focus is the determination of lithology from data recorded in a wireline logging operation in one of the Iranian oil wells in the Ahwaz oilfield.
In this study, the caliper log (CALI), sonic log (DT), deep induction resistivity log (ILD), neutron log (NPHI), density log (RHOB), and gamma ray log (CGR) were identified as lithology indicators. All raw data obtained from wireline logging were first corrected for environmental effects such as borehole size and mud salinity. These corrections are indispensable before any interpretation of well log data.

The caliper log is a tool for measuring the diameter and shape of the wellbore. Caliper logs can be used as crude lithological indicators: shale, bentonite, and coal tend to cave into the wellbore, producing an enlarged borehole diameter, whereas sandstones and carbonates do not tend to cave in, so no borehole enlargement is usually observed in them (Evenick, 2008).

The sonic log records the interval transit time of sound through the formation in microseconds per foot (μs/ft). Sonic logs are good indicators of lithology and density, because the transmission rate depends strongly on the medium through which the sound passes.

The deep induction resistivity log records the resistance of a formation to the flow of electricity far beyond the zone invaded by the drilling mud, in ohm-meters (Ω·m). Most rocks are insulators and most formation fluids are electrical conductors; high resistivity is recorded when the formation contains hydrocarbons (Akinyokun et al., 2009; Evenick, 2008).

A neutron log normally measures a formation's porosity based on the quantity of hydrogen present in the formation. It is mainly used for lithology identification, porosity evaluation, and differentiation between liquids and gases owing to their dissimilar hydrogen contents (Akinyokun et al., 2009; Evenick, 2008).

The density log measures the bulk density of a formation in grams per cubic centimeter (g/cm³), from which porosity is derived using the assumed densities of the formation matrix and the drilling fluid. It can also be used to differentiate between gases and liquids by crossplotting the porosity values from density logs, which are overestimated in gas zones, against those from neutron logs, which are underestimated in gas zones (Akinyokun et al., 2009).

Gamma ray logs indicate the radioactivity of the formation: shale-free sandstones and carbonates yield low gamma ray values, whereas shales usually exhibit high gamma ray readings if they contain adequate amounts of accessory minerals with isotopes of potassium, uranium, and/or thorium (Hsieh et al., 2005).

This article is organized as follows. In Section 2, the acquisition of the data and the assembled database are explained in detail.
In Section 3, the details and equations behind the intelligent model are provided, along with a discussion of the advantages and disadvantages of some methods based on machine learning theory. In Section 4, the results obtained from the LSSVM model are compared with real well log data, and the accuracy of the model is described. Finally, a statistical method is applied to determine possible outliers and to investigate the validity of the employed dataset.

Data Acquisition

Borehole geophysical data were obtained from an oil well in the Ahwaz oilfield, Iran. Some of the well log data were selected as indicators of lithology; for each data point, these are the caliper log (CALI), sonic log (DT), deep induction resistivity log (ILD), neutron log (NPHI), density log (RHOB), and gamma ray log (CGR). These readings were then related to the photoelectric index (PEF), a supplementary measurement that records the absorption of low-energy gamma rays by the formation in units of barns per electron. The logged values increase steeply with the average atomic number of the elements in the formation, so PEF is a sensitive indicator of mineralogy and has to be predicted with high accuracy. Figure 1 shows the different values of PEF in different formation lithologies. A total of 3243 log readings were assembled into a dataset with 7 inputs (depth and the six lithology indicator logs) and 1 output (PEF values). The overall ranges of the recorded data, along with their averages and standard deviations, are summarized in Table I.

FIGURE 1 MEASUREMENTS OF PHOTOELECTRIC INDEX (PEF) FOR DIFFERENT UNDERGROUND LITHOLOGIES

TABLE I RANGES OF INPUT/OUTPUT VARIABLES USED FOR DEVELOPING AND TESTING THE MODEL

| Parameter | Minimum | Maximum | Average | Standard deviation |
|---|---|---|---|---|
| Depth (m) | 2575.712 | 3075.889 | 2827.878 | 124.5312 |
| CALI (in) | 8.1504 | 22.2763 | 9.345049 | 0.659798 |
| DT (μs/ft) | 53.1954 | 113.1356 | 77.09043 | 9.722123 |
| ILD (Ω·m) | 0.1975 | 1705.562 | 12.79944 | 15.99413 |
| NPHI (p.u.) | 0.041645 | 0.494965 | 0.199554 | 0.047319 |
| RHOB (g/cm³) | 1.4736 | 2.8639 | 2.420654 | 0.158964 |
| CGR (API) | 0.0139 | 111.2971 | 30.33772 | 19.87745 |
| PEF (barn/electron) | 1.8121 | 6.635 | 3.096314 | 0.845851 |

Details of the Intelligent Model

Support Vector Machine (SVM)

The concept of SVM was initially introduced by Vapnik (1995) as a supervised learning algorithm for solving classification and function approximation problems (Moser and Serpico, 2009; Suykens, 2001). SVM has a number of distinct advantages over traditional learning methods based on ANNs (Byvatov et al., 2003; Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

1) In contrast to ANNs, SVM does not require the network topology to be determined in advance; it is established automatically during the learning process.

2) The possibility of overfitting or underfitting is minimized in the SVM paradigm by incorporating a structural risk minimization (SRM) strategy.

3) In SVM, only a limited number of parameters need to be adjusted during the learning process, compared with the large number of weight factors adjusted in ANN models.

Assume a training set $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i$ represents the input patterns (Depth, CALI, DT, ILD, NPHI, RHOB, and CGR), $y_i$ denotes the output data (PEF in this study), and $n$ is the total number of recorded data. SVM employs a nonlinear mapping to transform the input parameters into a higher-dimensional, or even infinite-dimensional, feature space (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). The main aim of SVM is then to locate an optimum hyperplane in this feature space, from which all experimental data have a minimum distance. Assuming that the data samples are linearly separable, the decision function employed by SVM takes the following form (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

$$f(x) = w^{t} g(x) + b \quad (1)$$

where $g(x)$ is the mapping function, $w$ and $b$ are the weight vector and the bias term, respectively, and the superscript $t$ denotes the transpose of the weight vector. Under the assumption that the data from the two classes are separable, the decision function is subject to the following condition:

$$\begin{cases} f(x_i) \ge 1 & \text{if } y_i = 1 \\ f(x_i) \le -1 & \text{if } y_i = -1 \end{cases} \quad (2)$$

Support vectors (SVs) are the training points that satisfy these constraints with equality, i.e., that lie on the margin (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). If the problem is linearly separable in the feature space, infinitely many decision functions satisfy Equation (2). Hence, the optimal separating hyperplane is determined by maximizing the margin while accommodating noise through slack variables, as introduced below (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

$$\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad y_i\left(w^{t} g(x_i) + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0 \quad (3)$$

where $C$ is a positive constant that sets the trade-off between maximum margin and minimum classification error, and $\xi_i$ is the slack variable measuring how far a data point lies on the wrong side of the margin of its own class. Taking the preceding equations into consideration, we have a typical convex optimization problem that can be solved using the method of Lagrange multipliers, as given below (Baylar et al., 2009; Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

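$$L(w, b, \xi, \alpha, \beta) = \frac{1}{2} w^{t} w + C\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\alpha_i\left[y_i\left(w^{t} g(x_i) + b\right) - 1 + \xi_i\right] - \sum_{i=1}^{n}\beta_i\xi_i \quad (4)$$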

where $\alpha_i \ge 0$ and $\beta_i \ge 0$ are the Lagrange multipliers. The solution is given by the saddle point of the Lagrangian, and the training points with $\alpha_i > 0$ are the support vectors (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). Owing to this formalism, SVM yields sparse solutions for both linear and nonlinear regression problems (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999).
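As a worked illustration of Equations (1) to (3), the sketch below evaluates a linear decision function, computes each slack variable as $\xi_i = \max(0, 1 - y_i f(x_i))$ (the amount by which a point violates the constraint of Equation (2)), and returns the soft-margin objective. It is a minimal sketch under stated assumptions: the mapping $g(x)$ is taken as the identity, and the weights, bias, points, and value of $C$ are hypothetical numbers chosen only to show the arithmetic.

```python
# Minimal sketch: soft-margin SVM primal objective (Eq. 3) for a linear
# decision function f(x) = w^t x + b (Eq. 1 with g(x) = x, an assumption).
# All numbers in the example are hypothetical.

def decision(w, b, x):
    """f(x) = w^t x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def slacks(w, b, X, y):
    """xi_i = max(0, 1 - y_i f(x_i)); zero when Eq. (2) holds with margin."""
    return [max(0.0, 1.0 - yi * decision(w, b, xi)) for xi, yi in zip(X, y)]

def primal_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum(xi_i)."""
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slacks(w, b, X, y))

if __name__ == "__main__":
    w, b = [1.0, -1.0], 0.0
    X = [[2.0, 0.0], [0.5, 0.0], [0.0, 2.0], [0.0, 0.2]]
    y = [1, 1, -1, -1]
    print(slacks(w, b, X, y))               # [0.0, 0.5, 0.0, 0.8]
    print(primal_objective(w, b, X, y, C=10.0))
```

Only the second and fourth points sit inside the margin, so only they contribute to the penalty term; increasing `C` makes such violations more expensive relative to a wide margin.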

Least Squares Support Vector Machine (LSSVM)

Despite the outstanding performance of SVM in static function approximation problems, it carries a high computational burden owing to the required constrained optimization programming (Haifeng and Dejin, 2005). Thus, the application of SVM to large-scale function approximation problems with a wide range of experimental data is limited by the time and memory consumed during optimization (Haifeng and Dejin, 2005). To reduce the complexity of SVM and to enhance its speed of convergence, Suykens and Vandewalle (1999) proposed a modified version of SVM called the Least Squares Support Vector Machine (LSSVM). In LSSVM, equality constraints are used instead of the inequality constraints employed in traditional SVM (Haifeng and Dejin, 2005; Suykens and Vandewalle, 1999). LSSVM retains the advantages of SVM, but its optimum solution is obtained by solving a set of linear equations rather than a quadratic programming problem (Gharagheizi et al., 2011; Suykens and Vandewalle, 1999). In general, the following objective function is used to train the LSSVM algorithm (Suykens and Vandewalle, 1999):

$$Q = \frac{1}{2} w^{t} w + \gamma \sum_{i=1}^{n} e_i^{2} \quad (5)$$

subject to the following linear equality constraints:

$$y_i = w^{t} g(x_i) + b + e_i, \quad i = 1, 2, \ldots, n \quad (6)$$

In Equations (5) and (6), $e_i$ represents the regression error for the $n$ data points, and $\gamma$ denotes the relative weight of the sum of the regression errors compared with the regression weight. The regression weight vector $w$ can be written in terms of the Lagrange multipliers $\alpha_i$ and the input vectors $x_i$ as follows (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$w = \sum_{i=1}^{n} \alpha_i x_i, \qquad \text{where} \quad \alpha_i = 2\gamma e_i \quad (7)$$

Assuming a linear relationship between the dependent and independent variables of the LSSVM algorithm (i.e., $g(x) = x$), Equation (6) can be reformulated as (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):



$$y = \sum_{i=1}^{n} \alpha_i x_i^{t} x + b \quad (8)$$

After some mathematical manipulation, the Lagrange multipliers in Equation (8) can be determined from the following relationship (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$\alpha_i = \frac{y_i - b}{x_i^{t} x_i + (2\gamma)^{-1}} \quad (9)$$
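Equation (9) expresses each multiplier on its own. In the standard LSSVM formulation (Suykens and Vandewalle, 1999), the optimality conditions behind Equations (6) and (7), together with $\sum_i \alpha_i = 0$ from the derivative with respect to the bias, couple all multipliers and the bias into one linear system; this is the set of linear equations that replaces quadratic programming. The sketch below solves that system in pure Python for a linear kernel. With the errors weighted as $\gamma \sum e_i^2$ in Equation (5), the diagonal shift is $1/(2\gamma)$; formulations that weight the errors by $\gamma/2$ use $1/\gamma$ instead. The function names and toy data are hypothetical, and this is an illustration rather than the toolbox used in the study.

```python
# Minimal LSSVM regression sketch in pure Python (illustrative only).
# Solves the linear system implied by Eqs. (5)-(7):
#   [ 0   1^t               ] [ b     ]   [ 0 ]
#   [ 1   K + I / (2*gamma) ] [ alpha ] = [ y ]
# where K[i][j] = kernel(x_i, x_j) and alpha_i = 2 * gamma * e_i (Eq. 7).

def linear_kernel(a, c):
    """x^t x_i, the linear case used in Eq. (8)."""
    return sum(ai * ci for ai, ci in zip(a, c))

def gauss_solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def lssvm_fit(X, y, gamma, kernel=linear_kernel):
    """Return (alpha, b) for training inputs X and targets y."""
    n = len(X)
    A = [[0.0] + [1.0] * n]                    # row from sum(alpha_i) = 0
    for i in range(n):
        row = [1.0] + [kernel(X[i], X[j]) for j in range(n)]
        row[i + 1] += 1.0 / (2.0 * gamma)      # e_i = alpha_i / (2 gamma)
        A.append(row)
    z = gauss_solve(A, [0.0] + list(y))
    return z[1:], z[0]

def lssvm_predict(X_train, alpha, b, x, kernel=linear_kernel):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return sum(a * kernel(x, xi) for a, xi in zip(alpha, X_train)) + b

if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]           # hypothetical 1-D inputs
    y = [1.0, 3.1, 4.9, 7.2]                   # hypothetical targets
    gamma = 50.0
    alpha, b = lssvm_fit(X, y, gamma)
    for xi, yi, ai in zip(X, y, alpha):
        e = yi - lssvm_predict(X, alpha, b, xi)
        # the training residual equals alpha_i / (2 gamma), as Eq. (7) requires
        print(round(e, 6), round(ai / (2 * gamma), 6))
```

Passing a different `kernel` function turns the same solver into the nonlinear form of the model; the bordered matrix stays nonsingular for any positive $\gamma$ because the kernel matrix is positive semidefinite and the diagonal shift is positive.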

The linear regression equation developed earlier can be converted into a nonlinear form by employing the kernel function as follows (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$f(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i) + b \quad (10)$$

where $K(x, x_i)$ is the kernel function, obtained from the inner product of the vectors $g(x)$ and $g(x_i)$ in the feature space, as represented below (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$K(x, x_i) = g(x)^{t} \cdot g(x_i) \quad (11)$$

The kernel function implemented in this study is the radial basis function (RBF), which is one of the most powerful kernel functions commonly employed in this field (Farasat et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$K(x, x_i) = \exp\left(-\lVert x - x_i \rVert^{2} / \sigma^{2}\right) \quad (12)$$

where $\sigma^2$ is the squared bandwidth, which is optimized through an external optimization technique during the training process. The mean squared error (MSE) between the real PEF values and those predicted by the LSSVM algorithm was defined as (Farasat et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{PEF}_{pred,i} - \mathrm{PEF}_{real,i}\right)^{2} \quad (13)$$

where PEF represents the PEF values, $N$ is the number of training objects, and the subscripts pred and real denote the predicted and real PEF values, respectively. The LSSVM algorithm employed in this study to train on the well log data was developed by Pelckmans et al. (2002) and Suykens and Vandewalle (1999). To enhance model performance during the learning process, the Coupled Simulated Annealing (CSA) algorithm was employed to optimize the two model parameters that control its accuracy and convergence, namely $\gamma$ and $\sigma^2$.

Coupled Simulated Annealing

Simulated Annealing (SA) is a stochastic search method usually used for combinatorial optimization problems. Its acceptance rule goes back to Metropolis et al. (1953), and the method was popularized for optimization by Kirkpatrick et al. (1983). The motivation behind the method lies in the physical process of annealing, during which a metal is heated to a liquid state and then cooled slowly enough that the crystal grains eventually reach a state of minimum internal energy. Like the metal cooling process, SA gradually converges toward the optimum solution; because it occasionally accepts uphill moves, it can escape local optima, and with a sufficiently slow cooling schedule it approaches the global optimum (Fabian, 1997).

This study employs the Coupled Simulated Annealing (CSA) method proposed by Xavier-de-Souza et al. (2010) to improve the quality of the optimization process. CSA was inspired by Coupled Local Minimizers (CLM), in which multiple coupled gradient descent optimizers are used instead of multistart gradient descent. CSA describes a set of individual SA processes coupled by a term in the acceptance probability function, with the aim of obtaining faster and more robust convergence. The coupling is a function of the current costs of all the individual SA processes (Xavier-de-Souza et al., 2010). Information is shared between the individual SA processes through both the coupling term and the acceptance probability function, which allows a general optimization indicator to be controlled through the optimization control parameters (Xavier-de-Souza et al., 2010). Whereas the acceptance probability of an uphill move in traditional SA is usually given by the Metropolis rule (Metropolis et al., 1953), which depends only on the current and probing solutions, CSA also considers the other current solutions. This probability depends on the costs of the solutions in a set $\Theta \subset S$ through a coupling term $\gamma_c$, where $S$ is the set of all possible solutions; $\gamma_c$ is a function of the costs of all solutions in $\Theta$.
Theacceptanceprobability function inCSA, A ,isrepresentedasfollows:

$$A_{\Theta}(\gamma_c, x_i \rightarrow y_i) = \frac{\exp\left(\left(E(x_i) - \max_{x_j \in \Theta} E(x_j)\right) / T_k^{ac}\right)}{\gamma_c} \quad (14)$$




FIGURE 2 A TYPICAL FLOWCHART REPRESENTING THE CSA-LSSVM ALGORITHM

where $T_k^{ac}$ is the acceptance temperature, $E(\cdot)$ is the cost of a solution, and $x_i$ and $y_i$ represent an individual solution in $\Theta$ and its corresponding probing solution, respectively. The coupling term $\gamma_c$ is given as:

$$\gamma_c = \sum_{x_i \in \Theta} \exp\left(\frac{E(x_i) - \max_{x_j \in \Theta} E(x_j)}{T_k^{ac}}\right) \quad (15)$$

This study proposes a CSA-based approach for parameter optimization in LSSVM (selection of the feature subset $\gamma$ and $\sigma^2$), termed CSA-LSSVM. A typical flowchart of the CSA-LSSVM algorithm is shown in Figure 2. The objective function of CSA-LSSVM when searching for the optimum model parameters is to minimize the Mean Squared Error (MSE) given in Equation (13).
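Equations (14) and (15) and the surrounding CSA loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact procedure of Xavier-de-Souza et al. (2010) or of the LS-SVMlab toolbox: it uses Cauchy-distributed probing steps with a generation temperature that decays as $T_0/(k+1)$, and a plain exponential decay of the acceptance temperature in place of the variance-control schedule of the original CSA paper. In CSA-LSSVM the cost function would be the MSE of Equation (13) for a candidate pair $(\gamma, \sigma^2)$; here a hypothetical two-dimensional bowl stands in for it so the sketch stays self-contained. All function names are illustrative.

```python
import math
import random

def coupling_term(costs, t_ac):
    """Eq. (15): gamma_c = sum_i exp((E(x_i) - max_j E(x_j)) / T_ac)."""
    e_max = max(costs)
    return sum(math.exp((e - e_max) / t_ac) for e in costs)

def acceptance_probs(costs, t_ac):
    """Eq. (14) for every chain; the values sum to 1 and the chain with the
    worst (highest) current cost receives the largest acceptance probability."""
    e_max = max(costs)
    g = coupling_term(costs, t_ac)
    return [math.exp((e - e_max) / t_ac) / g for e in costs]

def csa_minimize(cost, starts, iters=500, t_gen0=1.0, t_ac0=1.0, decay=0.99, seed=0):
    """Simplified CSA: one SA chain per starting point, coupled through Eq. (14)."""
    rng = random.Random(seed)
    xs = [list(s) for s in starts]
    es = [cost(x) for x in xs]
    i_best = min(range(len(xs)), key=lambda i: es[i])
    best_x, best_e = list(xs[i_best]), es[i_best]
    t_ac = t_ac0
    for k in range(iters):
        t_gen = t_gen0 / (k + 1)                   # generation temperature
        probs = acceptance_probs(es, t_ac)         # from the current costs
        for i in range(len(xs)):
            # Cauchy-distributed probing step scaled by the generation temperature
            y = [v + t_gen * math.tan(math.pi * (rng.random() - 0.5)) for v in xs[i]]
            e_y = cost(y)
            if e_y <= es[i] or probs[i] > rng.random():   # downhill, or coupled uphill
                xs[i], es[i] = y, e_y
            if e_y < best_e:
                best_x, best_e = list(y), e_y
        t_ac *= decay                              # simple schedule (assumption)
    return best_x, best_e

if __name__ == "__main__":
    # Hypothetical stand-in for the validation MSE over (log10 gamma, log10 sigma^2)
    bowl = lambda p: (p[0] - 2.0) ** 2 + p[1] ** 2
    starts = [[-2.0, 2.0], [3.0, -3.0], [0.0, 0.0], [-3.0, -1.0]]
    x, e = csa_minimize(bowl, starts, seed=1)
    print([round(v, 3) for v in x], round(e, 6))
```

Because the worst chain always receives the largest acceptance probability, it keeps exploring while the best chains behave almost greedily; in a CSA-LSSVM setting, `cost` would wrap an LSSVM fit on the training set and return the error for candidate values such as $(\gamma, \sigma^2) = (10^{p_0}, 10^{p_1})$, a log-scale parameterization that is itself an assumption here.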

Results and Discussion

Model Accuracy and Validation

In this research, the CSA-LSSVM algorithm was implemented to obtain PEF as a function of several other measurements recorded during the well logging operation. PEF can be used as a general indicator of the lithologies and mineralogical complexities of different formation layers. In this study, PEF was linked to other parameters generally known as lithology indicators:

$$\mathrm{PEF} = f(\mathrm{Depth}, \mathrm{CALI}, \mathrm{DT}, \mathrm{ILD}, \mathrm{NPHI}, \mathrm{RHOB}, \mathrm{CGR}) \quad (16)$$

In the next step, the assembled well log data were divided into three subsets, namely Train, Validation, and Test. The Train set is used to generate the model structure, the Validation set is used to adjust the model parameters and to check the validity of the patterns learned by CSA-LSSVM over the whole range of the dataset, and the Test set is used to investigate the final performance and validity of the proposed model on unseen data. To increase the applicability and robustness of the model, the whole database was divided randomly into 70%, 15%, and 15% fractions for the Train set (2270 data points), the Validation set (486 data points), and the Test set (487 data points), respectively.

The RBF kernel function was implemented in this study owing to its superior performance compared with other kernel types such as linear or polynomial kernels. The CSA algorithm was then used to tune the LSSVM parameters during the learning process. The optimum values found for these parameters at the end of the optimization process were $\gamma = 284.8173$ and $\sigma^2 = 0.9916$.

TABLE II STATISTICAL PARAMETERS OF THE PROPOSED CSA-LSSVM MODEL

| Statistical parameter | Train set | Validation set | Test set | Total |
|---|---|---|---|---|
| R² (a) | 0.995 | 0.987 | 0.985 | 0.993 |
| Average absolute relative deviation, % (b) | 1.3 | 2.2 | 2.2 | 1.6 |
| Standard deviation error (c) | 0.84 | 0.82 | 0.86 | 0.84 |
| Root mean square error | 0.07 | 0.11 | 0.12 | 0.08 |
| N | 2270 | 486 | 487 | 3243 |
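The 70/15/15 random split described above can be reproduced in a few lines. Flooring the Train and Validation sizes and assigning the remainder to the Test set gives exactly the reported 2270/486/487 partition of 3243 points; the rounding rule, the random seed, and the function name are assumptions rather than details taken from the study.

```python
import random

def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly split indices 0..n-1 into train/validation/test subsets.
    Train and validation sizes are floored; the test set takes the rest."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

if __name__ == "__main__":
    train, val, test = split_indices(3243)
    print(len(train), len(val), len(test))   # 2270 486 487
```

Shuffling before cutting keeps each subset spread over the whole depth interval instead of assigning contiguous depth ranges to different subsets.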

[Figure 2 flowchart boxes: Read well log dataset; Trn. set, Vld. set, Tst. set; Implement Coupled Simulated Annealing (CSA); Select model features (γ and σ²); Employ feature subset (γ and σ²); Construct PEF prediction model; Evaluate model accuracy; Meet stopping criteria? (No / Yes); Optimum model features (γ and σ²) obtained; Re-train LSSVM using the optimum features; Final CSA-LSSVM model]



[Figure 3 plot: LSSVM prediction of PEF (y-axis) versus real PEF (x-axis), both from 2 to 6.5; series: 45° line, Train, Validation, Test; R² = 0.993]

FIGURE 3 GRAPHICAL REPRESENTATION OF PEF VALUES PREDICTED BY CSA-LSSVM ALGORITHM VERSUS REAL PEF VALUES

[Figure 4 plot: PEF values (y-axis) versus total number of train data (x-axis, 0 to 2500); series: Real PEF, LSSVM prediction]

FIGURE 4 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR TRAIN DATASET

[Figure 5 plot: PEF values (y-axis) versus total number of validation data (x-axis, 0 to 500)]

FIGURE 5 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR VALIDATION DATASET

[Figure 6 plot: PEF values (y-axis) versus total number of test data (x-axis, 0 to 500)]

FIGURE 6 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR TEST DATASET

[Figure 7 histogram: data frequency (y-axis) versus relative deviation (x-axis); series: Train, Validation, Test]

FIGURE 7 HISTOGRAM OF ERROR FREQUENCY SKETCHED FOR ALL DATA INCLUDING TRAIN, VALIDATION, AND TEST SETS

Statistical parameters indicating the accuracy and validity of the proposed model are outlined in Table II. An overall coefficient of determination (R²) of 0.993, an Average Absolute Relative Deviation (AARD) of 1.6%, a Standard Deviation Error (STD) of 0.84, and a Root Mean Squared Error (RMSE) of 0.08 confirm the accuracy and validity of the CSA-LSSVM model in predicting PEF values from well log data. A regression plot of the real PEF values against those predicted by the CSA-LSSVM model is shown in Figure 3 for the Train, Validation, and Test sets. The high concentration of data around the 45° line indicates good agreement between the model predictions and the real PEF values. The deviations of the real PEF values from those predicted by the CSA-LSSVM model are shown in Figures 4 to 6 for the Train, Validation, and Test sets, respectively; the model predictions and the real values approximately overlap, indicating small deviations and close agreement. The frequency of errors between the model predictions and the real PEF data is plotted in Figure 7. This figure shows an approximately normal error distribution, which supports the robustness and accuracy of the developed LSSVM model.

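a $$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(\mathrm{pred}(i) - \mathrm{exp.}(i)\right)^2}{\sum_{i=1}^{N}\left(\mathrm{pred}(i) - \mathrm{average}(\mathrm{exp.}(i))\right)^2}$$

b $$\mathrm{AARD}\,\% = \frac{100}{N}\sum_{i=1}^{N}\frac{\left|\mathrm{pred}(i) - \mathrm{exp.}(i)\right|}{\mathrm{exp.}(i)}$$

c $$\mathrm{STD} = \sqrt{\frac{\sum_{i=1}^{N}\left(\mathrm{error}(i) - \mathrm{average}(\mathrm{error}(i))\right)^2}{N}}$$

where pred(i) and exp.(i) denote the predicted and recorded (real) PEF values of data point i.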
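The footnote definitions of Table II translate directly into code. This sketch follows footnotes a to c; because the study does not define error(i), it is taken here as pred(i) − exp.(i), which is an assumption, and RMSE uses the standard definition (the square root of Equation (13)). Note that footnote a places pred(i) rather than exp.(i) in the denominator sum, unlike the textbook coefficient of determination $1 - \sum(\mathrm{exp.} - \mathrm{pred})^2 / \sum(\mathrm{exp.} - \overline{\mathrm{exp.}})^2$. The function names and sample values are hypothetical.

```python
import math

def r2_footnote(pred, exp):
    """Footnote a: 1 - sum((pred-exp)^2) / sum((pred - mean(exp))^2)."""
    mean_exp = sum(exp) / len(exp)
    num = sum((p - e) ** 2 for p, e in zip(pred, exp))
    den = sum((p - mean_exp) ** 2 for p in pred)
    return 1.0 - num / den

def aard_percent(pred, exp):
    """Footnote b: mean absolute relative deviation in percent (exp must be nonzero)."""
    return 100.0 / len(exp) * sum(abs(p - e) / e for p, e in zip(pred, exp))

def std_error(pred, exp):
    """Footnote c, assuming error(i) = pred(i) - exp(i) (divide by N)."""
    err = [p - e for p, e in zip(pred, exp)]
    mean_err = sum(err) / len(err)
    return math.sqrt(sum((x - mean_err) ** 2 for x in err) / len(err))

def rmse(pred, exp):
    """Root mean squared error: square root of Eq. (13)."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(exp))

if __name__ == "__main__":
    real = [1.0, 2.0, 3.0, 4.0]          # hypothetical recorded values
    pred = [1.1, 1.9, 3.2, 3.8]          # hypothetical model outputs
    print(round(r2_footnote(pred, real), 4), round(aard_percent(pred, real), 4),
          round(std_error(pred, real), 4), round(rmse(pred, real), 4))
    # 0.9778 6.6667 0.1581 0.1581
```

Since PEF values in Table I are always positive, the relative deviation in AARD is well defined for this dataset.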

Outlier Detection in PEF Measurements

To develop a valid and widely applicable model for predicting PEF values from well log measurements, the recorded data must be reliable and accurate. However, perfectly accurate well log measurements are rarely achievable, and environmental interference may in some cases introduce flawed measurements into the recorded database. Such observations usually differ from the bulk of the data and threaten successful lithology prediction. Thus, constructing an accurate and reliable model depends heavily on detecting these values in the well logging data. To diagnose the suspected measurements, the leverage value statistical approach was implemented in this study. The calculation procedure involves determining the residual values for all data points (i.e., the deviations between the CSA-LSSVM model predictions and the real PEF values) and a matrix referred to as the Hat matrix, which is built from the data matrix X defined below. In general, the Hat matrix is constructed as follows (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007):

$$H = X\left(X^{t}X\right)^{-1}X^{t} \quad (17)$$

where X is a two-dimensional matrix containing m rows (the total number of employed data) and n columns (the total number of model parameters), and t denotes the transpose operator. The diagonal elements of the Hat matrix (the leverages) define the feasible region, or applicability domain, of the model. Graphical detection of outliers is usually carried out by sketching the Williams plot from the H values calculated with Equation (17) (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007; Mohammadi et al., 2012). This plot shows the relationship between the Hat indices and the standardized cross-validated residuals. A warning leverage ($H^*$) is typically defined as $3(n+1)/m$, where m denotes the total number of data and n represents the number of input parameters. A standardized residual of ±3 is generally considered the cut-off for accepting measurements within ±3 standard deviations of the mean (represented as two green lines) (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007; Mohammadi et al., 2012). The presence of the majority of data points in the ranges $0 \le H \le H^*$ and $-3 \le R \le 3$ reveals the high applicability and reliability of the developed model. Based on these values, suspected data may be categorized into two types, namely leverage points and regression outliers. Leverage points are further subdivided into good leverage points and bad leverage points. Good leverage points are data points located in the ranges $H^* \le H$ and $-3 \le R \le 3$. Although these measurements have high leverage values, they do not necessarily affect the correlation coefficient, and they lie close to the line around which most data are centered. Bad leverage points are those measurements in the range of R > 3 or R



[Figure 8 Williams plot: standardized residual (y-axis) versus Hat value (x-axis)]

FIGURE 8 DETECTION OF PROBABLE OUTLIERS OR SUSPECTED DATA FROM THE WHOLE RECORDED DATASET
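The leverage procedure described in this section can be sketched as follows: compute the diagonal of the Hat matrix in Equation (17), the warning leverage $H^* = 3(n+1)/m$, and standardized residuals, then label each point against the Williams-plot limits. The standardization used here (residual minus its mean, divided by the population standard deviation) is an assumption, since the exact computation of the standardized cross-validated residuals is not given; X is used as supplied, without adding an intercept column, and all function names and toy values are hypothetical. For the dataset in this study, $m = 3243$ and $n = 7$, so $H^* = 24/3243 \approx 0.0074$.

```python
import math

def invert(A):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

def hat_diagonal(X):
    """Leverages h_ii, the diagonal of H = X (X^t X)^-1 X^t (Eq. 17)."""
    n = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(n)] for a in range(n)]
    inv = invert(xtx)
    return [sum(x[a] * inv[a][b] * x[b] for a in range(n) for b in range(n))
            for x in X]

def williams_flags(X, residuals, n_inputs):
    """Return H*, leverages, standardized residuals and a label per point."""
    m = len(X)
    h_star = 3.0 * (n_inputs + 1) / m               # warning leverage H*
    mu = sum(residuals) / m
    sd = math.sqrt(sum((r - mu) ** 2 for r in residuals) / m) or 1.0
    R = [(r - mu) / sd for r in residuals]          # assumed standardization
    h = hat_diagonal(X)
    labels = []
    for hi, ri in zip(h, R):
        if abs(ri) > 3.0:
            labels.append("suspected")              # outside the +/-3 band
        elif hi > h_star:
            labels.append("good leverage")          # H > H*, -3 <= R <= 3
        else:
            labels.append("valid")                  # 0 <= H <= H*, -3 <= R <= 3
    return h_star, h, R, labels

if __name__ == "__main__":
    print(round(3 * (7 + 1) / 3243, 4))             # warning leverage here: 0.0074
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 10.0]]
    res = [0.1, -0.2, 0.05, 0.0, 0.1]
    h_star, h, R, labels = williams_flags(X, res, n_inputs=0)
    print(h_star, [round(v, 3) for v in h], labels)
```

A useful sanity check is that the leverages sum to the number of columns of X (the trace of H); in the toy example the isolated point at x = 10 carries most of the leverage and is labeled a good leverage point.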

Conclusions

In this study, the Least Squares Support Vector Machine (LSSVM) was implemented to identify formation lithology from well log data obtained from an oil well in the Ahwaz oilfield, Iran. To optimize the LSSVM parameters, the Coupled Simulated Annealing (CSA) algorithm was combined with it into a hybrid approach called CSA-LSSVM. Using the CSA-LSSVM algorithm, the photoelectric index (PEF) was predicted from the well logging data obtained from the underground formation. Model predictions were compared with real PEF values, and an overall coefficient of determination (R²) of 0.993 and an Average Absolute Relative Deviation (AARD) of 1.6% were obtained, showing the high accuracy of CSA-LSSVM in predicting PEF values. The close agreement observed between predicted and real PEF values corroborates the validity of the developed model. In addition, a statistical approach was implemented to identify suspected data and possible outliers in the overall PEF recordings. The employed database was found to be highly accurate: only one data point was diagnosed as following a different pattern from the rest of the dataset. This suggests a wide applicability domain for the developed CSA-LSSVM model in predicting PEF values from well log data. The developed model could further be applied to adjacent wells for lithology prediction during drilling operations with acceptable accuracy.

NOMENCLATURE

$A_{\Theta}$  Acceptance probability function
$T_k^{ac}$  Acceptance temperature
$e_i$  Regression error
$K(x, x_i)$  Kernel function
$R^2$  Coefficient of determination
AARD  Average Absolute Relative Deviation, %
b  Bias term
C  Positive constant
CALI  Caliper log
CGR  Corrected gamma ray
CLM  Coupled Local Minimizers
CSA  Coupled Simulated Annealing
DT  Sonic log
H  Hat matrix
ILD  Deep induction resistivity log
g(x)  Mapping function
LSSVM  Least Squares Support Vector Machine
m  Number of employed data
MSE  Mean Squared Error
n  Total number of model parameters
N  Number of training objects
NPHI  Neutron log
Q  Objective function of LSSVM
R  Residual
RMSE  Root Mean Squared Error
RHOB  Density log
S  Set of all possible solutions
SA  Simulated Annealing
STD  Standard Deviation Error
t  Transpose
w  Weight vector
x  Inputs
X  A two-dimensional matrix (m × n)
y  Outputs

GREEK LETTERS

$\sigma^2$  Squared bandwidth
$\gamma_c$  Coupling term
$\Theta$  A subset of all possible solutions
$\alpha, \beta$  Lagrange multipliers
$\gamma$  Relative weight of the summation of the regression errors
$\xi$  Slack variable

REFERENCES:

Akinyokun, O.C., Enikanselu, P.A., Adeyemo, A.B., Adesida, A., 2009. Well Log Interpretation Model for the Determination of Lithology and Fluid Contents. The Pacific Journal of Science and Technology 10, 507-517.

Baylar, A., Hanbay, D., Batan, M., 2009. Application of least square support vector machines in the prediction of aeration performance of plunging overfall jets from weirs. Expert Syst. Appl. 36, 8368-8374.

Byvatov, E., Fechner, U., Sadowski, J., Schneider, G., 2003. Comparison of support vector machine and artificial neural network systems for drug/nondrug classification. Journal of Chemical Information and Computer Sciences 43, 1882-1889.

Chang, H.C., Kopaska-Merkel, D.C., Chen, H.C., 2002. Identification of lithofacies using Kohonen self-organizing maps. Computers & Geosciences 28, 223-229.

Cristianini, N., Shawe-Taylor, J., 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press.

Eslamimanesh, A., Gharagheizi, F., Mohammadi, A.H., Richon, D., 2013. Assessment test of sulfur content of gases. Fuel Processing Technology 110, 133-140.

Evenick, J., 2008. Introduction to Well Logs and Subsurface Maps. PennWell.

Fabian, V., 1997. Simulated annealing simulated. Computers & Mathematics with Applications 33, 81-94.

Farasat, A., Shokrollahi, A., Arabloo, M., Gharagheizi, F., Mohammadi, A.H., 2013. Toward an intelligent approach for determination of saturation pressure of crude oil. Fuel Process. Technol.

Fazavi, M., Hosseini, S.M., Arabloo, M., Shokrollahi, A., Amani, M., 2013. Applying a Smart Technique for Accurate Determination of Flowing Oil/Water Pressure Gradient in Horizontal Pipelines. J. Dispersion Sci. Technol.

Gharagheizi, F., Eslamimanesh, A., Farjood, F., Mohammadi, A.H., Richon, D., 2011. Solubility Parameters of Nonelectrolyte Organic Compounds: Determination Using Quantitative Structure-Property Relationship Strategy. Ind. Eng. Chem. 50, 11382-11395.

Goodall, C.R., 1993. Computation using the QR decomposition, Handbook of Statistics. Elsevier, pp. 467-508.

Gramatica, P., 2007. Principles of QSAR models validation: internal and external. QSAR & Combinatorial Science 26, 694-701.

Haifeng, W., Dejin, H., 2005. Comparison of SVM and LS-SVM for Regression, Neural Networks and Brain, pp. 279-283.

Hsieh, B.Z., Lewis, C., Lin, Z.S., 2005. Lithology identification of aquifers from geophysical well logs and fuzzy logic analysis: Shui-Lin Area, Taiwan. Computers & Geosciences 31, 263-275.

Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., 1983. Optimization by Simulated Annealing. Science 220, 671-680.

Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics 21, 1087-1092.

Mohammadi, A.H., Eslamimanesh, A., Gharagheizi, F., Richon, D., 2012. A novel method for evaluation of asphaltene precipitation titration data. Chemical Engineering Science 78, 181-185.

Moser, G., Serpico, S.B., 2009. Modeling the Error Statistics in Support Vector Regression of Surface Temperature From Infrared Data. IEEE Geosci. Remote Sens. Lett. 6, 448-452.

Pelckmans, K., Suykens, J.A.K., Gestel, T.V., Brabanter, J.D., Lukas, L., Hamers, B., Moor, B.D., Vandewalle, J., 2002. LS-SVMlab: a MATLAB/C toolbox for Least Squares Support Vector Machines, Leuven, Belgium.

Rafiee-Taghanaki, S., Arabloo, M., Chamkalani, A., Amani, M., Zargari, M.H., Adelzadeh, M.R., 2013. Implementation of SVM framework to estimate PVT properties of reservoir oil. Fluid Phase Equilib. 346, 25-32.

Scholkopf, B.S., Smola, A.J., 2002. Learning With Kernels: Support Vector Machines, Regularization, Optimization and Beyond. University Press Group Limited.

Serra, O., Abbott, H.T., 1982. The Contribution of Logging Data to Sedimentology and Stratigraphy. Society of Petroleum Engineers Journal 22, 117-131.

Shokrollahi, A., Arabloo, M., Gharagheizi, F., Mohammadi, A.H., 2013. Intelligent model for prediction of CO2-Reservoir oil minimum miscibility pressure. J. Fuel 112, 375-384.

Suykens, J., Vandewalle, J., 1999. Least squares support vector machine classifiers. Neural Processing Letters 9, 293-300.

Suykens, J.A.K., 2001. Support Vector Machines: A Nonlinear Modelling and Control Perspective. Eur. J. Control 7, 311-327.

Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York.

Xavier-de-Souza, S., Suykens, J.A.K., Vandewalle, J., Bolle, D., 2010. Coupled Simulated Annealing. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 40, 320-335.
