

Research Article
Kernel Fisher Discriminant Analysis Based on a Regularized Method for Multiclassification and Application in Lithological Identification

Dejiang Luo¹ and Aijiang Liu²

¹College of Management Science, Chengdu University of Technology, Chengdu 610059, China
²College of Geophysics, Chengdu University of Technology, Chengdu 610059, China

Correspondence should be addressed to Aijiang Liu; 75747141@qq.com

Received 19 July 2014; Revised 8 October 2014; Accepted 10 October 2014

Academic Editor: Jun Cheng

Copyright © 2015 D. Luo and A. Liu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This study aimed to construct a kernel Fisher discriminant analysis (KFDA) method from well logs for lithology identification purposes. KFDA, via the use of a kernel trick, greatly improves the multiclassification accuracy compared with Fisher discriminant analysis (FDA). The optimal kernel Fisher projection of KFDA can be expressed as a generalized characteristic equation. However, the characteristic equation is difficult to solve directly; therefore, a regularized method is used. In the absence of a method to determine the value of the regularized parameter, it is often determined based on expert experience or is specified by tests. This paper proposes an improved KFDA (IKFDA) that obtains the optimal regularized parameter by means of a numerical method. The approach exploits the optimal regularized parameter selection ability of KFDA to obtain improved classification results. The method is simple and not computationally complex. The IKFDA was applied to the Iris data sets for training and testing purposes and subsequently to lithology data sets. The experimental results illustrate that it is possible to successfully separate data that are nonlinearly separable, thereby confirming that the method is effective.

Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2015, Article ID 384183, 8 pages, http://dx.doi.org/10.1155/2015/384183

1. Introduction

China's tight clastic rock reservoirs are considerably widespread, containing sediments that were deposited during the Carboniferous, Permian, Triassic, and Jurassic periods. The reservoirs in Western Sichuan and Erdos are the most representative. The West Sichuan depression is located in the Western Sichuan Basin, which belongs to the western depression belt located in the Yangtze Platform near the Longmenshan Fault Zone. A tight gas reservoir was discovered in the Xujiahe and Shaximiao formations that occur in this area. Tight clastic reservoirs are noted for their low porosity, because of their dense, multilayered stacking and the strong heterogeneous characteristics that are caused by their complexity and particularity. These characteristics complicate the identification of the lithology, which would enable the prediction of the properties of a reservoir. Previous research [1, 2] identified the reservoir lithology, consisting of mudstone, sandstone, and siltstone, in the AC area of western Sichuan. The cross plot of sandstone and siltstone shows that these rock types overlap and are mixed together, which is a linearly nonseparable case.

Cross plots and mathematical models have been applied extensively to lithology identification in previous studies. For example, Hsieh et al. constructed a fuzzy lithology system from well logs to identify the formation lithology [3], while Shao et al. applied an improved BP neural network algorithm, based on a momentum factor, to lithology recognition [4]. Zhang et al. used Fisher discrimination to identify volcanic lithology using regular logging data [5]. However, it is very difficult to identify the lithology of tight clastic rock reservoirs with the above methods. Thus, this paper proposes kernel Fisher discriminant analysis (KFDA) for tight clastic rock lithology identification.

KFDA has its roots in Fisher discriminant analysis (FDA) and is a nonlinear scheme for two-class and multiclass problems [6]. KFDA functions by mapping the low-dimensional sample space into a high-dimensional feature space, in which FDA is subsequently conducted. KFDA research focuses on both applied and theoretical questions. Billings et al. replaced the kernel matrix with its submatrix in order to simplify the computation [7, 8]. Liu et al. proposed a new criterion for KFDA to maximize the uniformity of class-pair separabilities, evaluated by the entropy of the normalized class-pair separabilities [9]. Wang et al. considered discriminant vectors to be linear combinations of "nodes" that are part of the training samples and therefore proposed a fast kernel Fisher discriminant analysis technique [10, 11]. Wang et al. proposed the nodes to be the most representative training samples [12]. Optimal kernel selection is one of the areas of theoretical research that has been attracting considerable attention. Fung et al. developed an iterative method based on a quadratic programming formulation of FDA [13]. Khemchandani et al. considered the problem of finding the data-dependent "optimal" kernel function via second-order cone programming [14]. The use of KFDA, with its strong nonlinear feature extraction ability, is becoming a powerful tool for solving identification or classification problems. Hence, it has been applied widely and successfully in many areas; examples are face recognition, fault diagnosis, classification, and the prediction of the existence of hydrocarbon reservoirs [15-20].

The principle that underlies KFDA is that input data are mapped into a high-dimensional feature space by using a nonlinear function, after which FDA is used for recognition or classification in the feature space. KFDA requires factorization of the Gram matrix into the kernel within-class scatter matrix $K_w$ and the between-class scatter matrix $K_b$. KFDA can finally be attributed to the solution of a generalized eigenvalue problem $K_b\alpha = \lambda K_w\alpha$. As the matrix $K_w$ is often singular, a regularized method is often used to solve the problem, which is transformed into a general eigenvalue problem by choosing a small positive number $\mu$, in which case $K_w$ is replaced with $K_w + \mu I$. Previous studies have shown that the classification ability of KFDA depends on the value of $\mu$; therefore, appropriate values are very important. In many practical applications, the parameter $\mu$ is specified according to experience or experimental results.

This paper proposes a new approach for the selection of the regularized parameter $\mu$ to gain the best classification results, and the improved KFDA is applied to both Iris data sets and lithology data sets. The paper is organized as follows. Section 2 summarizes kernel Fisher discriminant analysis. In Section 3, a numerical method for finding an optimal parameter $\mu$ is proposed by introducing a regularized method for KFDA. The experimental results are given in Sections 4 and 5, while Section 6 presents the concluding remarks.

2. Kernel Fisher Discriminant Analysis

Let $S = \{x_1, x_2, \ldots, x_N\}$ be the data set that contains $K$ classes in the $d$-dimensional real space $R^d$, and let $N_j$ samples belong to the $j$th class ($N = N_1 + N_2 + \cdots + N_K$). FDA is used for lithology identification by searching for the optimal projection vectors $w$, such that the projected lithology classes have minimum within-class scatter. FDA is given by the vector $w \in R^d$ that maximizes the Fisher discriminant function

\[ J(w) = \frac{w^T S_b w}{w^T S_w w}, \tag{1} \]

where $S_w$ is the within-class scatter matrix and $S_b$ is the between-class scatter matrix. FDA is essentially a linear method, which makes it very difficult to separate nonlinearly separable samples.
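As a concrete illustration of (1) (a minimal NumPy sketch, not the authors' code), the Fisher directions can be computed by solving the generalized eigenproblem $S_b w = \lambda S_w w$ via $S_w^{-1} S_b$:

```python
import numpy as np

def fda_directions(X, y, n_dirs=1):
    """Directions w maximizing J(w) = (w^T S_b w) / (w^T S_w w), found by
    solving the generalized eigenproblem S_b w = lambda S_w w."""
    d = X.shape[1]
    m = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)      # within-class scatter
        diff = (mc - m)[:, None]
        S_b += len(Xc) * (diff @ diff.T)    # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(vals.real)[::-1]     # largest eigenvalues first
    return vecs[:, order[:n_dirs]].real
```

This assumes $S_w$ is nonsingular; the singular case is exactly what motivates the regularization discussed in Section 3.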

KFDA significantly improves the classification ability of FDA for nonlinearly separable samples via the use of a kernel trick. To adapt to nonlinear cases, a nonlinear mapping $\phi(x)$ maps the lower-dimensional sample space into a high-dimensional feature space. Note that $\phi(x_i^j)$ ($i = 1, 2, \ldots, N_j$; $j = 1, 2, \ldots, K$) represents the $i$th sample in the class $\omega_j$. Let $m^\phi$ be the mean vector of the population and let $m_j^\phi$ be the mean vector of class $\omega_j$. In the feature space $F$, the total scatter matrix $S_t^\phi$, the within-class scatter matrix $S_w^\phi$, and the between-class scatter matrix $S_b^\phi$ can be defined as

\[ S_t^\phi = \frac{1}{N}\sum_{i=1}^{N}\left(\phi(x_i) - m^\phi\right)\left(\phi(x_i) - m^\phi\right)^T, \]

\[ S_w^\phi = \frac{1}{N}\sum_{j=1}^{K}\sum_{i=1}^{N_j}\left(\phi(x_i^j) - m_j^\phi\right)\left(\phi(x_i^j) - m_j^\phi\right)^T, \]

\[ S_b^\phi = \sum_{j=1}^{K}\frac{N_j}{N}\left(m_j^\phi - m^\phi\right)\left(m_j^\phi - m^\phi\right)^T. \tag{2} \]

Lithology identification by KFDA can be attributed to the optimization of the kernel Fisher criterion function

\[ J(v) = \frac{v^T S_b^\phi v}{v^T S_w^\phi v}, \tag{3} \]

where $v$ represents the optimal projection vector. The high, possibly infinite, dimension of the feature space $F$ makes it impossible to directly calculate the optimal discriminant vector $v$. A solution for this problem is to use the kernel trick

\[ K(x_i, x_j) = \left(\phi(x_i) \cdot \phi(x_j)\right). \tag{4} \]

According to the theory of reproducing kernels [6], any solution $v$ must lie in the subspace of $F$ spanned by $\phi(x_1), \phi(x_2), \ldots, \phi(x_N)$:

\[ v = \sum_{i=1}^{N}\alpha_i\,\phi(x_i). \tag{5} \]

Figure 1: Scattergram plots of Iris data: red (C1, setosa), green (C2, virginica), blue (C3, versicolor). Panels show pairwise cross plots of petal length, petal width, sepal length, and sepal width.

In $F$, any test sample $x$ can be projected onto $v$ to give the following equation:

\[ v^T\phi(x) = \sum_{i=1}^{N}\alpha_i\left(\phi(x) \cdot \phi(x_i)\right) = \alpha^T\left(K(x, x_1), K(x, x_2), \ldots, K(x, x_N)\right)^T. \tag{6} \]
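In matrix form, (6) says that projecting test samples requires only kernel evaluations against the training set. A one-line sketch with illustrative names: `K_test` holds the values $K(x, x_i)$ for each test sample, and the columns of `A` are coefficient vectors $\alpha$:

```python
import numpy as np

def kfda_project(K_test, A):
    """Project test samples into the kernel Fisher space, following (6).
    K_test: (M, N) kernel values between M test and N training samples.
    A:      (N, d) matrix whose columns are coefficient vectors alpha."""
    return np.asarray(K_test) @ np.asarray(A)
```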

In $F$, the kernel within-class scatter matrix $K_w$ and the between-class scatter matrix $K_b$ can be defined as

\[ \mu_0 = \left(\frac{1}{N}\sum_{i=1}^{N}K(x_1, x_i),\; \frac{1}{N}\sum_{i=1}^{N}K(x_2, x_i),\; \ldots,\; \frac{1}{N}\sum_{i=1}^{N}K(x_N, x_i)\right)^T, \tag{7} \]

\[ \mu_i = \left(\frac{1}{N_i}\sum_{j=1}^{N_i}K(x_1, x_j^i),\; \ldots,\; \frac{1}{N_i}\sum_{j=1}^{N_i}K(x_N, x_j^i)\right)^T, \tag{8} \]

\[ K_b = \sum_{i=1}^{K}\frac{N_i}{N}\left(\mu_i - \mu_0\right)\left(\mu_i - \mu_0\right)^T, \tag{9} \]

\[ K_w = \frac{1}{N}\sum_{i=1}^{K}\sum_{j=1}^{N_i}\left(\xi_{x_j} - \mu_i\right)\left(\xi_{x_j} - \mu_i\right)^T, \tag{10} \]

where $\xi_{x_j} = \left(K(x_1, x_j^i), \ldots, K(x_N, x_j^i)\right)^T$ is the kernel representation of sample $x_j^i$. Substituting (5) into (3) gives

\[ J(v) = \frac{v^T S_b^\phi v}{v^T S_w^\phi v} = \frac{\alpha^T K_b \alpha}{\alpha^T K_w \alpha} = J(\alpha). \tag{11} \]

According to the properties of the generalized Rayleigh quotient, the optimal solution vector $\alpha$ is obtained by maximizing the criterion function in (11), which is equivalent to solving the generalized characteristic equation

\[ K_b\alpha = \lambda K_w\alpha. \tag{12} \]
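Equations (7)-(12) can be sketched as follows (an illustrative NumPy implementation assuming a precomputed $N \times N$ Gram matrix `K` and labels `y`; not the authors' MATLAB code). The solver uses the regularized form $K_w + \mu I$ introduced in Section 3:

```python
import numpy as np

def kernel_scatter_matrices(K, y):
    """Kernel between-class (K_b) and within-class (K_w) scatter matrices,
    following (7)-(10)."""
    N = K.shape[0]
    mu0 = K.mean(axis=1)                    # (7): global kernel mean vector
    K_b = np.zeros((N, N))
    K_w = np.zeros((N, N))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        mui = K[:, idx].mean(axis=1)        # (8): class kernel mean vector
        d = (mui - mu0)[:, None]
        K_b += (len(idx) / N) * (d @ d.T)   # (9)
        for j in idx:
            xi = K[:, j] - mui              # (10): xi_{x_j} - mu_i
            K_w += np.outer(xi, xi) / N
    return K_b, K_w

def kfda_alphas(K, y, mu=0.09, n_dirs=2):
    """Leading coefficient vectors alpha of the regularized problem
    (K_w + mu*I)^{-1} K_b alpha = lambda alpha."""
    K_b, K_w = kernel_scatter_matrices(K, y)
    N = K.shape[0]
    vals, vecs = np.linalg.eig(np.linalg.solve(K_w + mu * np.eye(N), K_b))
    order = np.argsort(vals.real)[::-1]     # largest eigenvalues first
    return vecs[:, order[:n_dirs]].real
```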

3. Choosing the Regularized Parameter

If $K_w$ is a nonsingular matrix, then the optimal vectors $\alpha$ obtained by maximizing (11) are the eigenvectors corresponding to the top $m$ largest eigenvalues [12, 21]. Equation (12) can then be described as

\[ (K_w)^{-1}K_b\alpha = \lambda\alpha. \tag{13} \]

The solution of practical problems requires the use of $n$ training samples to estimate the variance of an $n$-dimensional structure; therefore, $K_w$ is a singular matrix. This means that it is often not possible to use (13). However, it is possible to promote the stability of the numerical method by using a regularized method:

\[ (K_w)_\mu = K_w + \mu I, \tag{14} \]

where $\mu$ is a small positive number and $I$ is the identity matrix. Then (12) can be expressed as

\[ (K_w + \mu I)^{-1}K_b\alpha = \lambda\alpha. \tag{15} \]

When KFDA is used to solve problems of an applied nature, the parameter $\mu$ is usually determined according to experience or experimental results. This paper instead uses a numerical analysis method to select the parameter; hence, the determinant $|K_w + \mu I|$ can be regarded as a function of $\mu$:

\[ f(\mu) = \left|K_w + \mu I\right|. \tag{16} \]

When the function $f$ is stable and its value tends to zero (17), the parameter $\mu$ gives the best classification:

\[ \lim_{\mu\to 0} f(\mu) \to 0. \tag{17} \]
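The search just described can be carried out numerically: evaluate $f(\mu) = |K_w + \mu I|$ over a grid of candidate values and take the knee where $f$ is still essentially zero before it starts to grow (cf. Figure 2). A minimal sketch; the grid `mus` (assumed sorted ascending) and the tolerance `tol` are illustrative choices, not values from the paper:

```python
import numpy as np

def det_curve(K_w, mus):
    """f(mu) = det(K_w + mu*I) over a grid of candidate values, eq. (16).
    slogdet avoids floating-point under/overflow for large N."""
    N = K_w.shape[0]
    f = []
    for mu in mus:
        sign, logdet = np.linalg.slogdet(K_w + mu * np.eye(N))
        f.append(sign * np.exp(logdet))
    return np.array(f)

def pick_mu(K_w, mus, tol=1e-12):
    """Largest mu at which f(mu) is still numerically zero: the inflexion
    point before the determinant starts to grow."""
    f = det_curve(K_w, mus)
    below = np.where(np.abs(f) < tol)[0]
    return mus[below[-1]] if below.size else mus[0]
```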

4. Experiments

4.1. Experiment Settings. The Iris data set is often used to test discriminant analysis algorithms [22-24]. This data set is divided into three classes, which represent three different varieties of the Iris flower: C1, Iris setosa; C2, Iris versicolor; and C3, Iris virginica. It takes petal length, petal width, sepal length, and sepal width as four-dimensional variables. There are 150 samples in this data set, with 50 samples in each class. The results are plotted as scattergrams (Figure 1) and show that classes C1 and C2 and classes C1 and C3 are linearly separable, while classes C2 and C3 are nonlinearly separable. The aim was to address this problem by using KFDA with different values of $\mu$ for classification purposes [25].

The following experiments involve a comparative study of the different selection schemes of $\mu$. In this study, the Iris data set is used to conduct algorithm training. Thus, three sets of samples (based on petal length, petal width, sepal length, and sepal width) were chosen, with each of these sets comprising 30 samples. The kernel function employed in KFDA is the Gauss kernel function (18), for which the kernel parameter $\sigma$ is set to be the norm of the covariance matrix of the training samples:

\[ K(x, y) = \exp\left(-\frac{\left\|x - y\right\|^2}{2\sigma}\right). \tag{18} \]

Figure 2: Value of the determinant $|K_w + \mu I|$ versus the value of the regularized parameter $\mu$.
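The kernel settings above can be sketched as follows (illustrative Python; the original experiments were run in MATLAB, and the Frobenius norm is an assumption for "the norm of the covariance matrix"):

```python
import numpy as np

def gauss_kernel_matrix(X, Y=None, sigma=None):
    """Gauss kernel K(x, y) = exp(-||x - y||^2 / (2*sigma)), eq. (18).
    When sigma is None it is set to the norm of the covariance matrix
    of X, as in the experiments (Frobenius norm assumed)."""
    if Y is None:
        Y = X
    if sigma is None:
        sigma = np.linalg.norm(np.cov(X, rowvar=False))
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma))
```

Note that (18) divides by $2\sigma$ rather than the more common $2\sigma^2$; the sketch follows the paper's form.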

4.2. Experimental Results and Discussion. The experiment was processed within a MATLAB 7.0 environment running on a PC powered by a Pentium 3.3 GHz CPU. The experimental results are shown in Figures 2 and 3. The optimal value of the regularized parameter $\mu$ is identified in Figure 2, which shows that the function $f(\mu) = |K_w + \mu I|$ has an obvious inflexion point and that the value of the function $f(\mu)$ approaches zero when the parameter is equal to 0.09.

This is further illustrated in Figure 3, which shows the classification performance when the parameter $\mu$ has different values. A simulation was conducted by selecting 30 data points from each of the three classes, and the results were constructed in the form of scatter plots that show the distribution of the data under KFDA. The optimal regularized parameter ($\mu = 0.09$) guaranteed separation of classes C2 and C3. However, when the value of the regularized parameter increased ($\mu = 0.18$), it impacted the classification effect negatively.

5. Application to Lithology Data Sets

Figure 3: Classification results with different values of the regularized parameter $\mu$: (a) $\mu = 0.09$, (b) $\mu = 0.18$, (c) $\mu = 0.045$. In each panel, the samples of Iris setosa, Iris versicolor, and Iris virginica are plotted along the first and second kernel Fisher directions.

5.1. The Geological Setting of the AC Region. The AC region is located in the central segment of the West Sichuan depression and was formed during the Upper Triassic and Jurassic periods. The Triassic-Jurassic stratum is part of the thick entity of the Western Sichuan foreland basin, with a total thickness of 6500 m. The AC region is located in a large uplift belt of the West Sichuan depression, which shows an NEE trend [1, 2]. This paper uses KFDA for logging-parameter identification of lithology, and the study horizon is the Xujiahe formation. The Xujiahe formation in Chengdu, Deyang, and JiangYou is a typical tight rock formation characterized by low porosity, because of its dense, multilayered stacking and strong heterogeneity. The formation has a typical thickness of 400-700 m, and in Anxian county, in front of Longmenshan, the thickness is up to 1000 m.

The Xujiahe formation consists of alternating layers of sandstone, siltstone, mudstone, shale, and coal series, in which the rock types are rather complex. According to logging data obtained from wells and the extraction of the physical parameters, the rock in the Xujiahe formation can be divided into mudstone, siltstone, and sandstone, based on the physical intersection diagram and histogram analysis of rocks.

5.2. Logging Parameters Lithological Identification. Logging parameters provide a comprehensive reflection of lithology, and their sensitivity differs for lithology identification. The sensitivity of these parameters for lithology identification was studied using the method of correlation analysis, which finally determined acoustic (AC), natural gamma ray (GR), density (DEN), and compensated neutron logging (CNL) as the characteristic variables. Training and testing sets in the standard layer, each of which contained 50 samples, were obtained. The cross-sectional plot is displayed in Figure 4. A comparison of Figures 1 and 4 reveals characteristics that are similar in both, namely, that the sandstone and siltstone sample data cannot be separated.

Figure 4: Cross-sectional plots of the logging parameters AC (μs/ft), GR (API), DEN (g/cm³), and CNL (v/v): green ○ (sandstone), red (siltstone), blue (mudstone).

In the following experiments, FDA and KFDA are compared on the logging attribute data sets. The FDA method is used to extract the optimal and suboptimal discriminant vectors of the training sets, and the testing samples are then projected onto these vectors (Figure 5). As can be seen in Figure 5, the mudstone can be separated from the sandstone and siltstone. The latter two rock types are still mixed together, although the separation degree is higher than that indicated by the cross plot.

Experiments were then conducted on the logging attribute sets using IKFDA. The kernel function employed in IKFDA is the Gaussian kernel function, and the numerical method was used to obtain the optimal value of the parameter, $\mu = 0.06$. IKFDA was used to obtain the first and second kernel feature vectors, following which the cross plot of the test sets was obtained (Figure 6). As can be seen in Figure 6, it is possible to obtain a good separation between the three types of samples, with each sample forming its own cluster center. The experimental results show that the performance of IKFDA is superior to that of FDA for lithology identification purposes.

Figure 5: FDA identification result. Test samples of sandstone, mudstone, and siltstone are projected onto the first and second Fisher directions.

Figure 6: IKFDA identification result. Test samples of sandstone, mudstone, and siltstone are projected onto the first and second kernel Fisher directions.

6. Conclusion

The optimal kernel Fisher projection of KFDA can be expressed as a generalized characteristic equation by using a regularized method. For multiclass problems, the value of the regularized parameter is a key factor in the application effect, and it is largely influenced by human experience. This paper proposes a novel method to optimize the regularized parameter using a numerical method. Thus, by selecting an optimal value for the regularized parameter, it becomes possible to solve the generalized characteristic equation, thereby eliminating the human factor. The effectiveness of the IKFDA was demonstrated by applying it to the nonlinearly separable Iris and lithology data sets. The experimental results indicated that successful separation was achieved.

In this paper, the selection of the regularized parameter depended on the numerical method. We analyzed the applied validity of the improved kernel Fisher discriminant analysis without performing deep theoretical analysis. This needs to be addressed by conducting further research in the future.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors are most grateful for the valuable advice on the revision of the paper from the reviewers. The work was supported by the Foundation of the Geomathematics Key Laboratory of Sichuan Province.

References

[1] A. J. Liu, The Rock Physics Research of Typical Tight Clastic Reservoir in Western Sichuan, Chengdu University of Technology, Chengdu, China, 2010.

[2] S. Q. Wang, Comprehensive Reservoir Evaluation of Xujiahe Formation of the West Sichuan by Log Welling in XC Structure, Chengdu University of Technology, Chengdu, China, 2009.

[3] B.-Z. Hsieh, C. Lewis, and Z.-S. Lin, "Lithology identification of aquifers from geophysical well logs and fuzzy logic analysis: Shui-Lin Area, Taiwan," Computers & Geosciences, vol. 31, no. 3, pp. 263-275, 2005.

[4] Y. X. Shao, Q. Chen, and D. M. Zhang, "The application of improved BP neural network algorithm in lithology recognition," in Advances in Computation and Intelligence: Proceedings of the 3rd International Symposium, ISICA 2008, Wuhan, China, December 19-21, 2008, L. Kang, Z. Cai, X. Yan, and Y. Liu, Eds., vol. 5370 of Lecture Notes in Computer Science, pp. 342-349, Springer, Berlin, Germany, 2008.

[5] J. Z. Zhang, Q. S. Guan, J. Q. Tan et al., "Application of Fisher discrimination to volcanic lithology identification," Xinjiang Petroleum Geology, vol. 29, no. 6, pp. 761-764, 2008.

[6] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, "Fisher discriminant analysis with kernels," in Proceedings of the 9th IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing (NNSP '99), pp. 41-48, Madison, Wis, USA, August 1999.

[7] S. A. Billings and K. L. Lee, "Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm," Neural Networks, vol. 15, no. 1, pp. 263-270, 2002.

[8] A. J. Smola and B. Schölkopf, "Sparse greedy matrix approximation for machine learning," in Proceedings of the 17th International Conference on Machine Learning, pp. 911-918, San Francisco, Calif, USA, 2000.

[9] J. Liu, F. Zhao, and Y. Liu, "Learning kernel parameters for kernel Fisher discriminant analysis," Pattern Recognition Letters, vol. 34, no. 9, pp. 1026-1031, 2013.

[10] Y. Xu, J. Y. Yang, J. F. Lu, and D.-J. Yu, "An efficient renovation on kernel Fisher discriminant analysis and face recognition experiments," Pattern Recognition, vol. 37, no. 10, pp. 2091-2094, 2004.

[11] Q. Zhu, "Reformative nonlinear feature extraction using kernel MSE," Neurocomputing, vol. 73, no. 16-18, pp. 3334-3337, 2010.

[12] J. Wang, Q. Li, J. You, and Q. Zhao, "Fast kernel Fisher discriminant analysis via approximating the kernel principal component analysis," Neurocomputing, vol. 74, no. 17, pp. 3313-3322, 2011.

[13] G. Fung, M. Dundar, J. Bi, and B. Rao, "A fast iterative algorithm for Fisher discriminant using heterogeneous kernels," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 264-272, ACM, July 2004.

[14] R. Khemchandani, Jayadeva, and S. Chandra, "Learning the optimal kernel for Fisher discriminant analysis via second order cone programming," European Journal of Operational Research, vol. 203, no. 3, pp. 692-697, 2010.

[15] G. Q. Wang and G. Q. Ding, Face Recognition Using KFDA-LLE, Springer, Berlin, Germany, 2011.

[16] J. H. Li and P. L. Cui, "Improved kernel Fisher discriminant analysis for fault diagnosis," Expert Systems with Applications, vol. 36, no. 2, pp. 1423-1432, 2009.

[17] H.-W. Cho, "An orthogonally filtered tree classifier based on nonlinear kernel-based optimal representation of data," Expert Systems with Applications, vol. 34, no. 2, pp. 1028-1037, 2008.

[18] Q. Zhang, J. W. Li, and Z. P. Zhang, Efficient Semantic Kernel-Based Text Classification Using Matching Pursuit KFDA, Springer, Berlin, Germany, 2011.

[19] J. H. Xu, X. G. Zhang, and Y. D. Li, "Application of kernel Fisher discriminating technique to prediction of hydrocarbon reservoir," Oil Geophysical Prospecting, vol. 37, no. 2, pp. 170-174, 2002.

[20] N. Louw and S. J. Steel, "Variable selection in kernel Fisher discriminant analysis by means of recursive feature elimination," Computational Statistics & Data Analysis, vol. 51, no. 3, pp. 2043-2055, 2006.

[21] M. Volpi, G. P. Petropoulos, and M. Kanevski, "Flooding extent cartography with Landsat TM imagery and regularized kernel Fisher's discriminant analysis," Computers & Geosciences, vol. 57, pp. 24-31, 2013.

[22] K.-L. Wu and M.-S. Yang, "A cluster validity index for fuzzy clustering," Pattern Recognition Letters, vol. 26, no. 9, pp. 1275-1291, 2005.

[23] Z. Volkovich, Z. Barzily, and L. Morozensky, "A statistical model of cluster stability," Pattern Recognition, vol. 41, no. 7, pp. 2174-2188, 2008.

[24] S. S. Khan and A. Ahmad, "Cluster center initialization algorithm for K-means clustering," Pattern Recognition Letters, vol. 25, no. 11, pp. 1293-1302, 2004.

[25] X. Zhang, Performance Monitoring, Fault Diagnosis and Quality Prediction Based on Statistical Theory, Shanghai Jiao Tong University, Shanghai, China, 2008.


2 Mathematical Problems in Engineering

space in which the FDA is subsequently conducted. The KFDA literature covers both applied and theoretical research. Billings et al. replaced the kernel matrix with its submatrix in order to simplify the computation [7, 8]. Liu et al. proposed a new criterion for KFDA that maximizes the uniformity of class-pair separabilities, evaluated by the entropy of the normalized class-pair separabilities [9]. Wang et al. considered discriminant vectors to be linear combinations of "nodes" that are part of the training samples and therefore proposed a fast kernel Fisher discriminant analysis technique [10, 11]; Wang et al. further proposed taking the nodes to be the most representative training samples [12]. Optimal kernel selection is one of the areas of theoretical research attracting considerable attention. Fung et al. developed an iterative method based on a quadratic programming formulation of FDA [13]. Khemchandani et al. considered the problem of finding the data-dependent "optimal" kernel function via second-order cone programming [14]. With its strong nonlinear feature extraction ability, KFDA is becoming a powerful tool for solving identification and classification problems; hence it has been applied widely and successfully in many areas, for example, face recognition, fault diagnosis, classification, and the prediction of hydrocarbon reservoirs [15-20].

The principle that underlies KFDA is that the input data are mapped into a high-dimensional feature space by a nonlinear function, after which FDA is used for recognition or classification in that feature space. KFDA requires factorization of the Gram matrix into the kernel within-class scatter matrix $K_w$ and the kernel between-class scatter matrix $K_b$, and it can finally be attributed to the solution of a generalized eigenvalue problem $K_b \alpha = \lambda K_w \alpha$. As the matrix $K_w$ is often singular, a regularized method is often used: the problem is transformed into an ordinary eigenvalue problem by choosing a small positive number $\mu$ and replacing $K_w$ with $K_w + \mu I$. Previous studies have shown that the classification ability of KFDA depends on the value of $\mu$; therefore, choosing an appropriate value is very important. In many practical applications, the parameter $\mu$ is specified according to experience or experimental results.

This paper proposes a new approach for selecting the regularized parameter $\mu$ so as to obtain the best classification results, and the improved KFDA is applied to both the Iris data sets and lithology data sets. The paper is organized as follows. Section 2 summarizes kernel Fisher discriminant analysis. In Section 3, a numerical method for finding an optimal parameter $\mu$ is proposed by introducing a regularized method for KFDA. The experimental results are given in Sections 4 and 5, while Section 6 presents the concluding remarks.

2. Kernel Fisher Discriminant Analysis

Let $S = \{x_1, x_2, \ldots, x_N\}$ be the data set, containing $K$ classes in the $d$-dimensional real space $R^d$, and let $N_j$ samples belong to the $j$th class ($N = N_1 + N_2 + \cdots + N_K$). FDA is used for lithology identification by searching for the optimal projection vector $w$, so that the different classes of lithology samples have minimum within-class scatter. FDA is given by the vector $w \in R^d$ that maximizes the Fisher discriminant function

$$J(w) = \frac{w^T S_b w}{w^T S_w w}, \quad (1)$$

where $S_w$ is the within-class scatter matrix and $S_b$ is the between-class scatter matrix. FDA is essentially a linear method, which makes it very difficult to separate nonlinearly separable samples.
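As a concrete illustration (a minimal NumPy sketch, not the authors' code), the maximizer of (1) is the leading eigenvector of $S_w^{-1} S_b$:

```python
import numpy as np

def fda_direction(X, y):
    """Return the projection vector w maximizing J(w) in eq. (1),
    i.e. the leading eigenvector of S_w^{-1} S_b."""
    m = X.mean(axis=0)                        # grand mean
    d = X.shape[1]
    S_w = np.zeros((d, d))                    # within-class scatter
    S_b = np.zeros((d, d))                    # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                  # class mean
        S_w += (Xc - mc).T @ (Xc - mc)
        S_b += len(Xc) * np.outer(mc - m, mc - m)
    vals, vecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    return vecs.real[:, np.argmax(vals.real)]
```

Classes that overlap under every linear projection, such as the siltstone and sandstone samples discussed later, remain mixed under any such $w$, which is what motivates the kernel version.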

KFDA significantly improves the classification ability of FDA for nonlinearly separable samples via the use of a kernel trick. To adapt to nonlinear cases, the samples are mapped by a nonlinear function $\phi(\cdot)$ from the lower-dimensional sample space into a high-dimensional feature space. Note that $\phi(x_i^j)$ ($i = 1, 2, \ldots, N_j$; $j = 1, 2, \ldots, K$) represents the image of the $i$th sample of class $\omega_j$. Let $m^\phi$ be the mean vector of the population and let $m_j^\phi$ be the mean vector of class $\omega_j$. In the feature space $F$, the total scatter matrix $S_t^\phi$, the within-class scatter matrix $S_w^\phi$, and the between-class scatter matrix $S_b^\phi$ can be defined as

$$S_t^\phi = \frac{1}{N} \sum_{i=1}^{N} \left(\phi(x_i) - m^\phi\right)\left(\phi(x_i) - m^\phi\right)^T,$$

$$S_w^\phi = \frac{1}{N} \sum_{j=1}^{K} \sum_{i=1}^{N_j} \left(\phi(x_i^j) - m_j^\phi\right)\left(\phi(x_i^j) - m_j^\phi\right)^T,$$

$$S_b^\phi = \sum_{j=1}^{K} \frac{N_j}{N} \left(m_j^\phi - m^\phi\right)\left(m_j^\phi - m^\phi\right)^T. \quad (2)$$

Lithology identification by KFDA can be attributed to the optimization of the kernel Fisher criterion function

$$J(v) = \frac{v^T S_b^\phi v}{v^T S_w^\phi v}, \quad (3)$$

where $v$ represents the optimal projection vector. The high, possibly infinite, dimension of the feature space $F$ makes it impossible to calculate the optimal discriminant vector $v$ directly. A solution for this problem is to use the kernel trick:

$$K(x_i, x_j) = \left(\phi(x_i) \cdot \phi(x_j)\right). \quad (4)$$

According to the theory of reproducing kernels [6], any solution $v$ must lie in the span of $\phi(x_1), \phi(x_2), \ldots, \phi(x_N)$, so that

$$v = \sum_{i=1}^{N} \alpha_i \phi(x_i). \quad (5)$$


[Figure 1: Scatter plots of the Iris data over pairs of the variables petal length, petal width, sepal length, and sepal width: red (C1, setosa), green (C2, virginica), blue (C3, versicolor).]

In $F$, any test sample $x$ can be projected onto $v$ to give

$$v^T \phi(x) = \sum_{i=1}^{N} \alpha_i \left(\phi(x) \cdot \phi(x_i)\right) = \alpha^T \left(K(x, x_1), K(x, x_2), \ldots, K(x, x_N)\right)^T. \quad (6)$$
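In code, (6) says that a test point is scored by the inner product of $\alpha$ with its vector of kernel evaluations against the training set (a small illustrative sketch; `k` is any kernel function):

```python
import numpy as np

def project(alpha, X_train, x, k):
    """Eq. (6): v^T phi(x) = alpha^T (K(x, x_1), ..., K(x, x_N))^T."""
    k_x = np.array([k(x, xi) for xi in X_train])   # kernel evaluations
    return float(alpha @ k_x)
```

With a linear kernel this reduces to an ordinary dot product with $v = \sum_i \alpha_i x_i$.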

In $F$, the kernel within-class scatter matrix $K_w$ and the kernel between-class scatter matrix $K_b$ can be defined as

$$\mu_0 = \left( \frac{1}{N} \sum_{i=1}^{N} K(x_1, x_i), \; \frac{1}{N} \sum_{i=1}^{N} K(x_2, x_i), \; \ldots, \; \frac{1}{N} \sum_{i=1}^{N} K(x_N, x_i) \right)^T, \quad (7)$$

$$\mu_i = \left( \frac{1}{N_i} \sum_{j=1}^{N_i} K(x_1, x_j^i), \; \ldots, \; \frac{1}{N_i} \sum_{j=1}^{N_i} K(x_N, x_j^i) \right)^T, \quad (8)$$

$$K_b = \sum_{i=1}^{K} \frac{N_i}{N} \left(\mu_i - \mu_0\right)\left(\mu_i - \mu_0\right)^T, \quad (9)$$

$$K_w = \frac{1}{N} \sum_{i=1}^{K} \sum_{j=1}^{N_i} \left(\xi_{x_j} - \mu_i\right)\left(\xi_{x_j} - \mu_i\right)^T, \quad (10)$$

where $\xi_{x_j} = \left(K(x_1, x_j), \ldots, K(x_N, x_j)\right)^T$ is the kernel representation of sample $x_j$. With these definitions,

$$J(v) = \frac{v^T S_b^\phi v}{v^T S_w^\phi v} = \frac{\alpha^T K_b \alpha}{\alpha^T K_w \alpha} = J(\alpha). \quad (11)$$

According to the properties of the generalized Rayleigh quotient, the optimal solution vector $\alpha$ is obtained by maximizing the criterion function in (11), which is equivalent to solving the generalized characteristic equation

$$K_b \alpha = \lambda K_w \alpha. \quad (12)$$
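The quantities in (7)-(10) are computable directly from the Gram matrix; the construction can be sketched as follows (a minimal NumPy sketch assuming a generic kernel function `k`, not the authors' implementation):

```python
import numpy as np

def kernel_scatter_matrices(X, y, k):
    """Build the kernel between-class (K_b) and within-class (K_w)
    scatter matrices of eqs. (7)-(10) for samples X, labels y, and a
    kernel function k(a, b)."""
    N = len(X)
    # Gram matrix: G[i, j] = K(x_i, x_j)
    G = np.array([[k(xi, xj) for xj in X] for xi in X])
    mu0 = G.mean(axis=1)                      # eq. (7): grand kernel mean
    K_b = np.zeros((N, N))
    K_w = np.zeros((N, N))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        mui = G[:, idx].mean(axis=1)          # eq. (8): class kernel mean
        d = mui - mu0
        K_b += (len(idx) / N) * np.outer(d, d)        # eq. (9)
        for j in idx:
            xi = G[:, j] - mui                # xi_{x_j} - mu_i
            K_w += np.outer(xi, xi) / N       # eq. (10)
    return K_b, K_w
```

Both matrices are $N \times N$ and symmetric, and $K_b$ is positive semidefinite by construction.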

3. Choosing the Regularized Parameter

If $K_w$ is a nonsingular matrix, then the optimal vectors $\alpha$ obtained by maximizing (11) are the eigenvectors corresponding to the top $m$ largest eigenvalues [12, 21], and (12) can be written as

$$(K_w)^{-1} K_b \alpha = \lambda \alpha. \quad (13)$$

In practical problems, however, only $N$ training samples are available to estimate the $N \times N$ matrix $K_w$, which is therefore typically singular. This means that it is often not possible to use (13) directly. However, the numerical stability of the method can be promoted by using a regularized matrix

$$(K_w)_\mu = K_w + \mu I, \quad (14)$$

where $\mu$ is a small positive number and $I$ is the identity matrix. Then (12) can be expressed as

$$(K_w + \mu I)^{-1} K_b \alpha = \lambda \alpha. \quad (15)$$

When KFDA is used to solve applied problems, the parameter $\mu$ is usually determined from experience or from experimental results. This paper instead uses a numerical analysis method: the determinant of $K_w + \mu I$ is regarded as a function of $\mu$,

$$f(\mu) = \left| K_w + \mu I \right|. \quad (16)$$

When the function $f$ becomes stable and its value tends to zero,

$$\lim_{\mu \to 0} f(\mu) \to 0, \quad (17)$$

the corresponding $\mu$ gives the best classification parameter.
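The selection rule (16)-(17) and the regularized solve (15) can be sketched as below. This is one illustrative reading of the rule, assuming $K_b$ and $K_w$ have already been built; in practice one scans $f(\mu)$ over a grid and picks the value where the curve has flattened near zero, as in Figure 2:

```python
import numpy as np

def f(K_w, mu):
    """Eq. (16): determinant of the regularized within-class matrix."""
    return np.linalg.det(K_w + mu * np.eye(len(K_w)))

def kfda_directions(K_b, K_w, mu, m=2):
    """Eq. (15): top-m eigenvectors of (K_w + mu*I)^{-1} K_b."""
    M = np.linalg.solve(K_w + mu * np.eye(len(K_w)), K_b)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[::-1]       # sort by decreasing eigenvalue
    return vecs.real[:, order[:m]]            # columns alpha_1, ..., alpha_m
```

Note that for a large $N \times N$ kernel matrix the raw determinant can underflow (the curve in Figure 2 is on a $10^{-41}$ scale), so a log-determinant may be preferable numerically.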

4. Experiments

4.1. Experimental Settings. The Iris data set is often used to test discriminant analysis algorithms [22-24]. This data set is divided into three classes, representing three different varieties of the Iris flower (C1: Iris setosa; C2: Iris versicolor; C3: Iris virginica), and takes petal length, petal width, sepal length, and sepal width as its four-dimensional variables. There are 150 samples in the data set, 50 in each class. The results are plotted as scatter grams (Figure 1) and show that classes C1 and C2 and classes C1 and C3 are linearly separable, whereas classes C2 and C3 are nonlinearly separable. The aim was to address this problem by using KFDA with different values of $\mu$ for classification purposes [25].

[Figure 2: Value of the determinant $|K_w + \mu I|$ (on a $10^{-41}$ scale) versus the regularized parameter $\mu$.]

The following experiments involve a comparative study of different selection schemes for $\mu$. In this study, the Iris data set is used to train the algorithm: three sets of samples (described by petal length, petal width, sepal length, and sepal width) were chosen, each comprising 30 samples. The kernel function employed in KFDA is the Gaussian kernel function (18), for which the kernel parameter $\sigma$ is set to the norm of the covariance matrix of the training samples:

$$K(x, y) = \exp\left( -\frac{\left\| x - y \right\|^2}{2\sigma} \right). \quad (18)$$
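Under the setting described above, (18) can be sketched as follows. The paper does not specify which matrix norm is meant, so the spectral 2-norm is an assumption here; note also that (18) divides by $2\sigma$ rather than $2\sigma^2$:

```python
import numpy as np

def gaussian_kernel_from_training(X_train):
    """Return K(x, y) = exp(-||x - y||^2 / (2 * sigma)), eq. (18), with
    sigma set to the norm of the training-set covariance matrix."""
    sigma = np.linalg.norm(np.cov(X_train, rowvar=False), 2)
    return lambda x, y: float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma)))
```

The returned closure can be passed wherever a kernel function $K(x, y)$ is needed, for example when building the Gram matrix for (7)-(10).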

4.2. Experimental Results and Discussion. The experiment was conducted in a MATLAB 7.0 environment running on a PC powered by a Pentium 3.3 GHz CPU. The experimental results are shown in Figures 2 and 3. The optimal value of the regularized parameter $\mu$ is identified in Figure 2, which shows that the function $f(\mu) = |K_w + \mu I|$ has an obvious inflexion point and that its value approaches zero when the parameter equals 0.09.

This is further illustrated in Figure 3, which shows the classification performance for different values of the parameter $\mu$. A simulation was conducted by selecting 30 data points from each of the three classes, and the results were constructed as scatter plots showing the distribution of the data under KFDA. The optimal regularized parameter ($\mu = 0.09$) guaranteed separation of classes C2 and C3. However, increasing the value of the regularized parameter ($\mu = 0.18$) negatively impacted the classification effect.

5. Application to Lithology Data Sets

[Figure 3: Classification results of the Iris data for different values of the regularized parameter: (a) $\mu = 0.09$; (b) $\mu = 0.18$; (c) $\mu = 0.045$.]

5.1. The Geological Settings of the AC Region. The AC region is located in the central segment of the West Sichuan depression and was formed during the Upper Triassic and Jurassic periods. The Triassic-Jurassic stratum is part of the thick sedimentary body of the Western Sichuan foreland basin, with a total thickness of 6500 m. The AC region lies within a large uplift belt of the West Sichuan depression that shows an NEE trend [1, 2]. This paper uses KFDA on logging parameters to identify lithology, and the study horizon is the Xujiahe formation. The Xujiahe formation in Chengdu, Deyang, and Jiangyou is a typical tight rock formation characterized by low porosity because of its dense, multilayered stacking and strong heterogeneity. The formation has a typical thickness of 400-700 m, and in Anxian county, in front of Longmenshan, its thickness reaches up to 1000 m.

The Xujiahe formation consists of alternating layers of sandstone, siltstone, mudstone, shale, and coal series, in which the rock types are complex. According to logging data obtained from wells and the extracted physical parameters, the rock in the Xujiahe formation can be divided into mudstone, siltstone, and sandstone based on the physical intersection diagram and histogram analysis.

5.2. Logging Parameters Lithological Identification. Logging parameters provide a comprehensive reflection of lithology, and their sensitivity differs for lithology identification. The sensitivity of these parameters was studied using the method of correlation analysis, which finally determined acoustic (AC), natural gamma ray (GR), density (DEN), and compensated neutron logging (CNL) as the characteristic variables. Training and testing sets in the standard layer, each containing 50 samples, were obtained. The cross-sectional plot is displayed in Figure 4. A comparison of Figures 1 and 4 reveals similar characteristics in both, namely that the sandstone and siltstone sample data cannot be separated.

[Figure 4: Cross-sectional plots of the logging parameters AC, GR, DEN, and CNL: green O (sandstone), red (siltstone), blue (mudstone).]

In the following experiments, FDA and KFDA are compared on the logging attribute data sets. The FDA method is used to extract the optimal and suboptimal discriminant vectors of the training sets, and the testing samples are then projected onto these vectors (Figure 5). As can be seen in Figure 5, the mudstone can be separated from the sandstone and siltstone, but the latter two rock types remain mixed together, although the degree of separation is higher than that indicated by the cross plot.

Experiments were conducted on the logging attribute sets using IKFDA. The kernel function employed in IKFDA is the Gaussian kernel function, and the numerical method was used to obtain the optimal value of the parameter, $\mu = 0.06$. IKFDA was used to obtain the first and second kernel feature vectors, after which the cross plot of the test sets was obtained (Figure 6). As can be seen in Figure 6, a good separation between the three types of samples is achieved, with each sample type forming its own cluster center. The experimental results show that the performance of IKFDA is superior to that of FDA for lithology identification purposes.

6. Conclusion

[Figure 5: FDA identification result: test samples projected onto the first and second Fisher directions (sandstone, mudstone, siltstone).]

[Figure 6: IKFDA identification result: test samples projected onto the first and second kernel Fisher directions (sandstone, mudstone, siltstone).]

The optimal kernel Fisher projection of KFDA can be expressed as a generalized characteristic equation by using a regularized method. For multiclass problems, the value of the regularized parameter is a key factor in the application effect and is largely influenced by human experience. This paper proposes a novel method to optimize the regularized parameter by means of a numerical method. Thus, by selecting an optimal value for the regularized parameter, it becomes possible to solve the generalized characteristic equation while eliminating the human factor. The effectiveness of the IKFDA was demonstrated by applying it to the nonlinearly separable Iris and lithology data sets, and the experimental results indicated that successful separation was achieved.

In this paper, the selection of the regularized parameter depended on a numerical method. We analyzed the applied validity of the improved kernel Fisher discriminant analysis without performing a deep theoretical analysis; this needs to be addressed by further research in the future.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors are most grateful for the valuable advice on the revision of the paper from the reviewers. The work was supported by the Foundation of the Geomathematics Key Laboratory of Sichuan Province.

References

[1] A. J. Liu, The Rock Physics Research of Typical Tight Clastic Reservoir in Western Sichuan, Chengdu University of Technology, Chengdu, China, 2010.

[2] S. Q. Wang, Comprehensive Reservoir Evaluation of Xujiahe Formation of the West Sichuan by Log Welling in XC Structure, Chengdu University of Technology, Chengdu, China, 2009.

[3] B.-Z. Hsieh, C. Lewis, and Z.-S. Lin, "Lithology identification of aquifers from geophysical well logs and fuzzy logic analysis: Shui-Lin Area, Taiwan," Computers & Geosciences, vol. 31, no. 3, pp. 263-275, 2005.

[4] Y. X. Shao, Q. Chen, and D. M. Zhang, "The application of improved BP neural network algorithm in lithology recognition," in Advances in Computation and Intelligence: Proceedings of the 3rd International Symposium, ISICA 2008, Wuhan, China, December 19-21, 2008, L. Kang, Z. Cai, X. Yan, and Y. Liu, Eds., vol. 5370 of Lecture Notes in Computer Science, pp. 342-349, Springer, Berlin, Germany, 2008.

[5] J. Z. Zhang, Q. S. Guan, J. Q. Tan et al., "Application of Fisher discrimination to volcanic lithology identification," Xinjiang Petroleum Geology, vol. 29, no. 6, pp. 761-764, 2008.

[6] S. Mika, G. Ratsch, J. Weston, B. Scholkopf, and K.-R. Muller, "Fisher discriminant analysis with kernels," in Proceedings of the 9th IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing (NNSP '99), pp. 41-48, Madison, Wis, USA, August 1999.

[7] S. A. Billings and K. L. Lee, "Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm," Neural Networks, vol. 15, no. 1, pp. 263-270, 2002.

[8] A. J. Smola and B. Scholkopf, "Sparse greedy matrix approximation for machine learning," in Proceedings of the 17th International Conference on Machine Learning, pp. 911-918, San Francisco, Calif, USA, 2000.

[9] J. Liu, F. Zhao, and Y. Liu, "Learning kernel parameters for kernel Fisher discriminant analysis," Pattern Recognition Letters, vol. 34, no. 9, pp. 1026-1031, 2013.

[10] Y. Xu, J. Y. Yang, J. F. Lu, and D.-J. Yu, "An efficient renovation on kernel Fisher discriminant analysis and face recognition experiments," Pattern Recognition, vol. 37, no. 10, pp. 2091-2094, 2004.

[11] Q. Zhu, "Reformative nonlinear feature extraction using kernel MSE," Neurocomputing, vol. 73, no. 16-18, pp. 3334-3337, 2010.

[12] J. Wang, Q. Li, J. You, and Q. Zhao, "Fast kernel Fisher discriminant analysis via approximating the kernel principal component analysis," Neurocomputing, vol. 74, no. 17, pp. 3313-3322, 2011.

[13] G. Fung, M. Dundar, J. Bi, and B. Rao, "A fast iterative algorithm for Fisher discriminant using heterogeneous kernels," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 264-272, ACM, July 2004.

[14] R. Khemchandani, Jayadeva, and S. Chandra, "Learning the optimal kernel for Fisher discriminant analysis via second order cone programming," European Journal of Operational Research, vol. 203, no. 3, pp. 692-697, 2010.

[15] G. Q. Wang and G. Q. Ding, Face Recognition Using KFDA-LLE, Springer, Berlin, Germany, 2011.

[16] J. H. Li and P. L. Cui, "Improved kernel Fisher discriminant analysis for fault diagnosis," Expert Systems with Applications, vol. 36, no. 2, pp. 1423-1432, 2009.

[17] H.-W. Cho, "An orthogonally filtered tree classifier based on nonlinear kernel-based optimal representation of data," Expert Systems with Applications, vol. 34, no. 2, pp. 1028-1037, 2008.

[18] Q. Zhang, J. W. Li, and Z. P. Zhang, Efficient Semantic Kernel-Based Text Classification Using Matching Pursuit KFDA, Springer, Berlin, Germany, 2011.

[19] J. H. Xu, X. G. Zhang, and Y. D. Li, "Application of kernel Fisher discriminating technique to prediction of hydrocarbon reservoir," Oil Geophysical Prospecting, vol. 37, no. 2, pp. 170-174, 2002.

[20] N. Louw and S. J. Steel, "Variable selection in kernel Fisher discriminant analysis by means of recursive feature elimination," Computational Statistics & Data Analysis, vol. 51, no. 3, pp. 2043-2055, 2006.

[21] M. Volpi, G. P. Petropoulos, and M. Kanevski, "Flooding extent cartography with Landsat TM imagery and regularized kernel Fisher's discriminant analysis," Computers & Geosciences, vol. 57, pp. 24-31, 2013.

[22] K.-L. Wu and M.-S. Yang, "A cluster validity index for fuzzy clustering," Pattern Recognition Letters, vol. 26, no. 9, pp. 1275-1291, 2005.

[23] Z. Volkovich, Z. Barzily, and L. Morozensky, "A statistical model of cluster stability," Pattern Recognition, vol. 41, no. 7, pp. 2174-2188, 2008.

[24] S. S. Khan and A. Ahmad, "Cluster center initialization algorithm for K-means clustering," Pattern Recognition Letters, vol. 25, no. 11, pp. 1293-1302, 2004.

[25] X. Zhang, Performance Monitoring, Fault Diagnosis and Quality Prediction Based on Statistical Theory, Shanghai Jiao Tong University, Shanghai, China, 2008.


Page 3: Research Article Kernel Fisher Discriminant Analysis Based ...Research Article Kernel Fisher Discriminant Analysis Based on a Regularized Method for Multiclassification and Application

Mathematical Problems in Engineering 3

40 60 8020

30

40

50

Petal length

Peta

l wid

th

(a)

Petal length40 60 80

0

20

40

60

80

x3

(b)

Petal length40 60 80

0

10

20

30

Sepa

l wid

th

(c)

20 40 600

20

40

60

80

Petal width

Sepa

l len

gth

(d)

20 40 600

10

20

30

Petal width

Sepa

l wid

th

(e)

0 50 1000

10

20

30

Sepal length

Sepa

l wid

th

(f)

Figure 1 Scatter gram plots of Iris data red (C1 setosa) green (C2 virginica) blue (C3 versicolor)

In 119865 any test samples can be projected into 119908 to give thefollowing equation

V119879120601 (119909) =119873

sum

119894=1

120572119894(120601 (119909) 120601 (119909

119894))

= 120572119879

((120601 (119909) 120601 (1199091)) (120601 (119909) 120601 (119909

2))

(120601 (119909) 120601 (119909119873)))

= 120572119879

(119870 (119909 1199091) 119870 (119909 119909

2) 119870 (119909 119909

119873))

(6)

In 119865 the kernel within-class scatter matrix 119870119908

and thebetween-class scatter matrix119870

119887can be defined as

1205830= (

1

119873

119873

sum

119894=1

119870(1199091 119909119894) 1

119873

119873

sum

119894=1

119870(1199092 119909119894)

1

119873

119873

sum

119894=1

119870(119909119873 119909119894))

119879

(7)

120583119894= (

1

119873119894

119873119894

sum

119895=1

119870(1199091 119909119894

119895))

1

119873119894

119873119894

sum

119895=1

119870(119909119873 119909119894

119895) (8)

119870119887=

119870

sum

119894=1

119873119894

119873(120583119894minus 1205830) (120583119894minus 1205830)119879

(9)

119870119908=1

119873

119870

sum

119894=1

119873119894

sum

119895=1

(120585119909119895minus 120583119894) (120585119909119895minus 120583119894)

119879

(10)

119869 (V) =V119879119878120601119887V

V119879119878120601119908V=120572119879

119870119887120572

120572119879119870119908120572= 119869 (120572) (11)

According to the properties of the generalized Rayleighquotient the optimal solution vector 119908 is obtained by max-imizing the criterion function in (11) by setting it equivalent

4 Mathematical Problems in Engineering

to the solution of the generalized characteristic equation asfollows

119870119887120572 = 120582119870

119908120572 (12)

3 Choosing the Regularized Parameter

If 119870119908

is a nonsingular matrix then optimal vectors 120572obtained by maximizing (11) are equivalent to the featurevectors corresponding to the top 119898 largest eigenvalues [1221] Equation (12) can be described as

(119870119908)minus1

119870119887120572 = 120582120572 (13)

The solution of practical problems requires the use of 119899training samples to estimate the variance of 119899-dimensionalstructure therefore 119870

119908is a singular matrix This means that

it is often not possible to use (13) However it is possibleto promote the stability of the numerical method by usingregularized method as follows

(119870119908)120583= 119870119908+ 120583119868 (14)

where 120583 is a small positive number and 119868 is the identitymatrix Then (12) can be expressed as

(119870119908+ 120583119868)minus1

119870119887120572 = 120582120572 (15)

When KFDA is used to solve problems of an applied natureparameter 120583 is determined according to the experience or theresult of the experimentThis paper uses a numerical analysismethod to solve the parameter hence the determinant of thevalue of 1003816100381610038161003816119870119908 + 120583119868

1003816100381610038161003816 can be regarded as a function of 120583

119891 (120583) =1003816100381610038161003816119870119908 + 120583119868

1003816100381610038161003816 (16)

When function 119891 is stable and the value of the functiontends to zero (17) the parameter 120583 is the best classificationparameters

lim120583rarr0

119891 (120583) 997888rarr 0 (17)

4. Experiments

4.1. Experimental Settings. The Iris data set is often used to test discriminant analysis algorithms [22–24]. This data set is divided into three classes, which represent three different varieties of the Iris flower: C1 (Iris setosa), C2 (Iris versicolor), and C3 (Iris virginica); petal length, petal width, sepal length, and sepal width serve as the four-dimensional variables. There are 150 samples in this data set, with 50 samples in each class. The results are plotted as scattergrams (Figure 1), which show that classes C1 and C2 and classes C1 and C3 are linearly separable, whereas classes C2 and C3 are nonlinearly separable. The aim was to address this problem by using KFDA with different values of $\mu$ for classification purposes [25].

The following experiments involve a comparative study of the different selection schemes of $\mu$. In this study,

Figure 2: Value of the determinant $|K_{w} + \mu I|$ versus the value of the regularized parameter $\mu$.

the Iris data set is used to conduct algorithm training. Thus, three sets of samples (based on petal length, petal width, sepal length, and sepal width) were chosen, with each set comprising 30 samples. The kernel function employed in KFDA is the Gaussian kernel function (18), for which the kernel parameter $\sigma$ is set to the norm of the covariance matrix of the training samples:

$$K(x, y) = \exp\left(-\frac{\|x - y\|^{2}}{2\sigma}\right) \quad (18)$$
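Under these settings, the Gram matrix can be sketched as follows. This is a minimal illustration with random data standing in for the Iris training samples; using the Frobenius norm for $\sigma$ is an assumption, since the text does not say which matrix norm it takes.

```python
import numpy as np

def gaussian_kernel_matrix(X, Y=None):
    """Gram matrix for the Gaussian kernel of (18),
    K(x, y) = exp(-||x - y||^2 / (2*sigma)), with sigma set to the norm
    of the covariance matrix of the training samples X.  The Frobenius
    norm used here is an assumption; the paper does not name the norm."""
    if Y is None:
        Y = X
    sigma = np.linalg.norm(np.cov(X, rowvar=False))
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma))

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4))   # e.g. 30 four-dimensional training samples
K = gaussian_kernel_matrix(X)
```

Note that (18) divides by $2\sigma$ rather than the more common $2\sigma^2$; the sketch follows the paper's form.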

4.2. Experimental Results and Discussion. The experiment was run in a MATLAB 7.0 environment on a PC powered by a Pentium 3.3 GHz CPU. The experimental results are shown in Figures 2 and 3. The optimal value of the regularized parameter $\mu$ is identified in Figure 2, which shows that the function $f(\mu) = |K_{w} + \mu I|$ has an obvious inflexion point and that the value of $f(\mu)$ approaches zero when the parameter is equal to 0.09.

This is further illustrated in Figure 3, which shows the classification performance for different values of the parameter $\mu$. A simulation was conducted by selecting 30 data points from each of the three classes, and the results were constructed as scatter plots that show the distribution of the data under KFDA. The optimal regularized parameter ($\mu = 0.09$) guaranteed separation of classes C2 and C3. However, increasing the value of the regularized parameter ($\mu = 0.18$) negatively impacted the classification effect.

5. Application to Lithology Data Sets

5.1. The Geological Settings of the AC Region. The AC region is located in the central segment of the West Sichuan depression and was formed during the Upper Triassic and Jurassic periods. The Triassic-Jurassic stratum is part of the thick entity of the Western Sichuan foreland basin, with a total thickness


Figure 3: Classification results for different values of the regularized parameter $\mu$: (a) $\mu = 0.09$, (b) $\mu = 0.18$, (c) $\mu = 0.045$. Each panel plots the Iris setosa, Iris versicolor, and Iris virginica samples against the first and second kernel Fisher directions.

of 6500 m. The AC region lies in a large uplift belt of the West Sichuan depression, which shows a NEE trend [1, 2]. This paper uses KFDA for the identification of lithology from logging parameters, and the study horizon is the Xujiahe formation. The Xujiahe formation in Chengdu, Deyang, and Jiangyou is a typical tight rock formation characterized by low porosity because of its dense, multilayered stacking and strong heterogeneity. The formation has a typical thickness of 400–700 m, and in Anxian county, in front of Longmenshan, its thickness reaches up to 1000 m.

The Xujiahe formation consists of alternating layers of sandstone, siltstone, mudstone, shale, and coal series, in which the rock types are rather complex. According to logging data obtained from wells and the extraction of the physical parameters, the rock in the Xujiahe formation can be divided into mudstone, siltstone, and sandstone based on the physical intersection diagram and histogram analysis of the rocks.

5.2. Lithological Identification from Logging Parameters. Logging parameters provide a comprehensive reflection of lithology, and their sensitivity for lithology identification differs. The sensitivity of these parameters was studied using the method of correlation analysis, which finally determined acoustic (AC), natural gamma ray (GR),


Figure 4: Cross plots of the logging parameters AC (μs/ft), GR (API), DEN (g/cm³), and CNL (v/v): green O (sandstone), red (siltstone), blue (mudstone).

density (DEN), and compensated neutron logging (CNL) as the characteristic variables. Training and testing sets in the standard layer, each of which contained 50 samples, were obtained. The cross plots are displayed in Figure 4. A comparison of Figures 1 and 4 reveals characteristics that are common to both, namely, that the sandstone and siltstone sample data cannot be separated.

In the following experiments, FDA and KFDA are compared on the logging attribute data sets. The FDA method is used to extract the optimal and suboptimal discriminant vectors of the training sets, and the testing samples are then projected onto these vectors (Figure 5). As can be seen in Figure 5, the mudstone can be separated from the sandstone and siltstone. The latter two rock types were still mixed together, although the degree of separation is higher than that indicated by the cross plot.

Experiments were conducted on the logging attribute sets using IKFDA. The kernel function employed in IKFDA is the Gaussian kernel function, and a numerical method was used to obtain the optimal value of the parameter, $\mu = 0.06$. IKFDA was used to obtain the first and second kernel feature vectors, following which the cross plot of the test sets was obtained (Figure 6). As can be seen in Figure 6, it is possible to obtain a good separation between the three types of samples, with each sample type forming its own cluster center. The experimental results show that the performance of IKFDA is superior to that of FDA for lithology identification purposes.

6. Conclusion

The optimal kernel Fisher projection of KFDA can be expressed as a generalized characteristic equation by using a regularized method. For multiclass problems, the value of the regularized parameter is a key factor in the application effect, and it has largely been determined by human experience. This paper proposes a novel method to optimize the regularized parameter by means of a numerical method. Thus, by selecting an optimal value for the regularized parameter, it becomes possible to solve the generalized characteristic equation, thereby eliminating the human factor. The effectiveness of


Figure 5: FDA identification result: sandstone, mudstone, and siltstone samples plotted against the first and second Fisher directions.

Figure 6: IKFDA identification result: sandstone, mudstone, and siltstone samples plotted against the first and second kernel Fisher directions.

the IKFDA was demonstrated by applying it to the nonlinearly separable Iris and lithology data sets. The experimental results indicated that successful separation was achieved.

In this paper, the selection of the regularized parameter depended on the numerical method. We analyzed the applied validity of the improved kernel Fisher discriminant analysis without performing a deep theoretical analysis; this needs to be addressed by further research in the future.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors are most grateful for the reviewers' valuable advice on the revision of the paper. The work was supported by the Foundation of the Geomathematics Key Laboratory of Sichuan Province.

References

[1] A. J. Liu, The Rock Physics Research of Typical Tight Clastic Reservoir in Western Sichuan, Chengdu University of Technology, Chengdu, China, 2010.

[2] S. Q. Wang, Comprehensive Reservoir Evaluation of Xujiahe Formation of the West Sichuan by Log Welling in XC Structure, Chengdu University of Technology, Chengdu, China, 2009.

[3] B.-Z. Hsieh, C. Lewis, and Z.-S. Lin, "Lithology identification of aquifers from geophysical well logs and fuzzy logic analysis: Shui-Lin Area, Taiwan," Computers & Geosciences, vol. 31, no. 3, pp. 263–275, 2005.

[4] Y. X. Shao, Q. Chen, and D. M. Zhang, "The application of improved BP neural network algorithm in lithology recognition," in Advances in Computation and Intelligence: Proceedings of the 3rd International Symposium, ISICA 2008, Wuhan, China, December 19–21, 2008, L. Kang, Z. Cai, X. Yan, and Y. Liu, Eds., vol. 5370 of Lecture Notes in Computer Science, pp. 342–349, Springer, Berlin, Germany, 2008.

[5] J. Z. Zhang, Q. S. Guan, J. Q. Tan et al., "Application of Fisher discrimination to volcanic lithology identification," Xinjiang Petroleum Geology, vol. 29, no. 6, pp. 761–764, 2008.

[6] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, "Fisher discriminant analysis with kernels," in Proceedings of the 9th IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing (NNSP '99), pp. 41–48, Madison, Wis, USA, August 1999.

[7] S. A. Billings and K. L. Lee, "Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm," Neural Networks, vol. 15, no. 1, pp. 263–270, 2002.

[8] A. J. Smola and B. Schölkopf, "Sparse greedy matrix approximation for machine learning," in Proceedings of the 17th International Conference on Machine Learning, pp. 911–918, San Francisco, Calif, USA, 2000.

[9] J. Liu, F. Zhao, and Y. Liu, "Learning kernel parameters for kernel Fisher discriminant analysis," Pattern Recognition Letters, vol. 34, no. 9, pp. 1026–1031, 2013.

[10] Y. Xu, J. Y. Yang, J. F. Lu, and D.-J. Yu, "An efficient renovation on kernel Fisher discriminant analysis and face recognition experiments," Pattern Recognition, vol. 37, no. 10, pp. 2091–2094, 2004.

[11] Q. Zhu, "Reformative nonlinear feature extraction using kernel MSE," Neurocomputing, vol. 73, no. 16–18, pp. 3334–3337, 2010.

[12] J. Wang, Q. Li, J. You, and Q. Zhao, "Fast kernel Fisher discriminant analysis via approximating the kernel principal component analysis," Neurocomputing, vol. 74, no. 17, pp. 3313–3322, 2011.

[13] G. Fung, M. Dundar, J. Bi, and B. Rao, "A fast iterative algorithm for Fisher discriminant using heterogeneous kernels," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 264–272, ACM, July 2004.

[14] R. Khemchandani, Jayadeva, and S. Chandra, "Learning the optimal kernel for Fisher discriminant analysis via second order cone programming," European Journal of Operational Research, vol. 203, no. 3, pp. 692–697, 2010.

[15] G. Q. Wang and G. Q. Ding, Face Recognition Using KFDA-LLE, Springer, Berlin, Germany, 2011.

[16] J. H. Li and P. L. Cui, "Improved kernel Fisher discriminant analysis for fault diagnosis," Expert Systems with Applications, vol. 36, no. 2, pp. 1423–1432, 2009.

[17] H.-W. Cho, "An orthogonally filtered tree classifier based on nonlinear kernel-based optimal representation of data," Expert Systems with Applications, vol. 34, no. 2, pp. 1028–1037, 2008.

[18] Q. Zhang, J. W. Li, and Z. P. Zhang, Efficient Semantic Kernel-Based Text Classification Using Matching Pursuit KFDA, Springer, Berlin, Germany, 2011.

[19] J. H. Xu, X. G. Zhang, and Y. D. Li, "Application of kernel Fisher discriminating technique to prediction of hydrocarbon reservoir," Oil Geophysical Prospecting, vol. 37, no. 2, pp. 170–174, 2002.

[20] N. Louw and S. J. Steel, "Variable selection in kernel Fisher discriminant analysis by means of recursive feature elimination," Computational Statistics & Data Analysis, vol. 51, no. 3, pp. 2043–2055, 2006.

[21] M. Volpi, G. P. Petropoulos, and M. Kanevski, "Flooding extent cartography with Landsat TM imagery and regularized kernel Fisher's discriminant analysis," Computers & Geosciences, vol. 57, pp. 24–31, 2013.

[22] K.-L. Wu and M.-S. Yang, "A cluster validity index for fuzzy clustering," Pattern Recognition Letters, vol. 26, no. 9, pp. 1275–1291, 2005.

[23] Z. Volkovich, Z. Barzily, and L. Morozensky, "A statistical model of cluster stability," Pattern Recognition, vol. 41, no. 7, pp. 2174–2188, 2008.

[24] S. S. Khan and A. Ahmad, "Cluster center initialization algorithm for K-means clustering," Pattern Recognition Letters, vol. 25, no. 11, pp. 1293–1302, 2004.

[25] X. Zhang, Performance Monitoring, Fault Diagnosis and Quality Prediction Based on Statistical Theory, Shanghai Jiao Tong University, Shanghai, China, 2008.


Page 4: Research Article Kernel Fisher Discriminant Analysis Based ...Research Article Kernel Fisher Discriminant Analysis Based on a Regularized Method for Multiclassification and Application

4 Mathematical Problems in Engineering

to the solution of the generalized characteristic equation asfollows

119870119887120572 = 120582119870

119908120572 (12)

3 Choosing the Regularized Parameter

If 119870119908

is a nonsingular matrix then optimal vectors 120572obtained by maximizing (11) are equivalent to the featurevectors corresponding to the top 119898 largest eigenvalues [1221] Equation (12) can be described as

(119870119908)minus1

119870119887120572 = 120582120572 (13)

The solution of practical problems requires the use of 119899training samples to estimate the variance of 119899-dimensionalstructure therefore 119870

119908is a singular matrix This means that

it is often not possible to use (13) However it is possibleto promote the stability of the numerical method by usingregularized method as follows

(119870119908)120583= 119870119908+ 120583119868 (14)

where 120583 is a small positive number and 119868 is the identitymatrix Then (12) can be expressed as

(119870119908+ 120583119868)minus1

119870119887120572 = 120582120572 (15)

When KFDA is used to solve problems of an applied natureparameter 120583 is determined according to the experience or theresult of the experimentThis paper uses a numerical analysismethod to solve the parameter hence the determinant of thevalue of 1003816100381610038161003816119870119908 + 120583119868

1003816100381610038161003816 can be regarded as a function of 120583

119891 (120583) =1003816100381610038161003816119870119908 + 120583119868

1003816100381610038161003816 (16)

When function 119891 is stable and the value of the functiontends to zero (17) the parameter 120583 is the best classificationparameters

lim120583rarr0

119891 (120583) 997888rarr 0 (17)

4 Experiments

41 Experiments Settings The Iris data set is often used totest the discriminant analysis algorithm [22ndash24]This data setis divided into three classes which represent three differentvarieties of the Iris flower C1 Iris setosa C2 Iris versicolorand C3 Iris virginica which takes petal length petal widthsepal length and sepal width as four-dimensional variablesThe three classes represent three different varieties of Irisflowers There are 150 samples in this data set and there are50 samples in each class The results are plotted as scattergrams (Figure 1) and show that classes C1 and C2 and classesC1 and C3 are linearly separable and classes C2 and C3 arenonlinearly separable The aim was to address this problemby using the KFDAwith different values of 120583 for classificationpurposes [25]

The following experiments involve a comparative studyof the different selection schemes of 120583 In this study

0 002 004 006 008 010

02

04

06

08

1

12

120583

times10minus41

Valu

e of t

he d

eter

min

antK

w+120583lowastI

Figure 2 Value of the determinant versus the value of regularized120583

the Iris data set is used to conduct algorithm training Thusthe three sets of samples (based on petal length petal widthsepal length and sepal width) were chosen with each of thesesets comprising 30 samples The kernel function employs inKFDA the Gauss kernel function (18) for which the kernelparameter 120590 is set to be the norm of the covariance matrix ofthe training samples

119870(119909 119910) = exp(minus1003817100381710038171003817119909 minus 119910

1003817100381710038171003817

2

2120590) (18)

42 Experimental Results and Discussions The experimentwas processed within a MATLAB 70 environment runningon a PC powered by a Pentium 33GHzCPUThe experimen-tal results are shown in Figures 2 and 3 The optimal value ofregularized 120583 is identified in Figure 1 which shows that thefunction119891(120583) = 1003816100381610038161003816119870119908 + 120583119868

1003816100381610038161003816 has an obvious inflexion point andthat the value of the function 119891(120583) approaches zero when theparameter is equal to 009

This is further illustrated in Figure 3 which shows theclassification performance when the parameter 120583 has dif-ferent values A simulation was conducted by selecting 30data points from each of the three classes and the resultswere constructed in the form of scatter plots that showthe distribution of the data by using KFDA The optimalregularized parameter (120583 = 009) guaranteed separation ofclasses C2 andC3However when the value of the regularizedparameter (120583 = 018) increased it impacted negatively theclassification effect

5 Application to Lithology Data Sets

51TheGeological Settings of the ACRegion TheAC region islocated in the central segment of theWest Sichuan depressionand was formed during Upper Triassic and Jurassic periodsThe Triassic-Jurassic stratum is the part of the thick entityof the Western Sichuan foreland basin with a total thickness

Mathematical Problems in Engineering 5

0105 011 0115 012 0125 013 0135 014 0145First kernel Fisher direction

Seco

nd k

erne

l Fish

er d

irect

ion

minus05

minus1

minus15

minus2

minus25

minus3

minus35

0

Iris versicolorIris virginica

Iris setosa

(a) 120583 = 009

012 013 014 015 016 017 018 019First kernel Fisher direction

Seco

nd k

erne

l Fish

er d

irect

ion

minus05

minus1

minus15

minus2

minus25

minus3

minus35

0

Iris versicolorIris virginica

Iris setosa

(b) 120583 = 018

009 0095 01 0105 011 0115 012First kernel Fisher direction

Seco

nd k

erne

l Fish

er d

irect

ion

Iris versicolorIris virginica

minus05

minus1

minus15

minus2

minus25

minus3

0

Iris setosa

(c) 120583 = 0045

Figure 3 Classification results with different value of regularized 120583

of 6500m The AC region is located in a large uplift belt ofthe West Sichuan depression which shows NEE trend [1 2]This paper uses KFDA for logging parameters identificationof lithology and the purpose of the study horizon is Xujiaheformation The Xujiahe formation in Chengdu Deyang andJiangYou is a typical tight rock formation characterized bylow porosity because of its dense multilayered stacking andstrong heterogeneity The formation has a typical thicknessof 400ndash700m and the thickness of the formation in Anxiancounty in front of Longmenshan thickness is up to 1000m

The Xujiahe formation consists of alternating layers ofsandstone siltstone mudstone shale and coal series in

which the rock types aremore complex According to loggingdata obtained from wells and the extraction of the physicalparameters the rock in the Xujiahe formation can be dividedintomudstone siltstone and sandstone based on the physicalintersection diagram and histogram analysis of rocks

52 Logging Parameters Lithological Identification Loggingparameters provide a comprehensive reflection of lithologyand their sensitivity is different for lithology identificationThe sensitivity of these parameters for lithology identificationwas studied using the method of correlation analysis andfinally determined acoustic (AC) natural gamma ray (GR)

6 Mathematical Problems in Engineering

50 100 1500

02

04

06

08

CNL

(vv

)

AC (brvbar Isft) AC (brvbar Isft) AC (brvbar Isft)50 100 150

2

22

24

26

28

DEN

(gc

m3 )

50 100 15050

100

150

200

250

GR

(API

)

0 05 12

22

24

26

28

DEN

(gc

m3 )

CNL (vv)0 05 1

50

100

150

200

250

CNL (vv)

GR

(API

)

2 25 350

100

150

200

250

DEN (gcm3)

GR

(API

)

Figure 4 Cross-sectional plots of the logging parameters green O (sandstone) red (siltstone) blue (mudstone)

density (DEN) and compensated neutron logging (CNL) asthe characteristic variables Training and testing sets in thestandard layer each of which contained 50 samples wereobtained The cross-sectional plot is displayed in Figure 4A comparison of Figures 1 and 4 reveals the characteristicsthat are similar to both namely in which the sandstone andsiltstone sample data cannot be separated

In the following experiments FDA and KFDA are com-pared on logging attribute data setsThe FDAmethod is usedto extract the optimal and suboptimal discriminant vectorof the training sets and then the testing sets of sample areprojected onto vectors (Figure 5) As can be seen in Figure 5the mudstone can be separated from the sandstone andsiltstone The latter two rock types were still mixed togetherwith respect to the siltstone and sandstone data although theseparation degree is higher than indicated by cross plot

Experiments were conducted on logging attribute setsusing IKFDAThe kernel function employed in IKFDA is theGaussian kernel function and a numerical method was usedto obtain the optimal value of parameter 120583 = 006 IKFDA

was used to obtain the first and second kernel feature vectorsfollowing which the cross plot of the test sets was obtained(Figure 6) As can be seen in Figure 6 it is possible to obtaina good separation between the three types of samples witheach sample forming its own cluster centerThe experimentalresults show that the performance of IKFDA is superior tothat of FDA for lithology identification purposes

6 Conclusion

The optimal kernel Fisher projection of KFDA can beexpressed as a generalized characteristic equation by usinga regularized method For multiclass problems the value ofthe regularized parameter is a key factor in the applicationeffect which is largely influenced by human experience Thispaper proposes a novel method to optimize the regularizedparameter using a numerical method Thus by selectingan optimal value for the regularized parameter it becomespossible to solve the generalized characteristic equationthereby eliminating the human factor The effectiveness of

Mathematical Problems in Engineering 7

0 05First Fisher direction

Seco

nd F

isher

dire

ctio

n

SandstoneMudstoneSiltstone

minus21

minus205

minus2

minus195

minus19

minus185

minus18

minus175

minus17

minus165

minus16

minus15 minus1 minus05

Figure 5 FDA identification result

Mudstone

0 01 0204

06

08

1

12

14

16

18

Kernel first Fisher direction

Kern

el se

cond

Fish

er d

irect

ion

SandstoneSiltstone

minus05 minus04 minus03 minus02 minus01

Figure 6 IKFDA identification result

the IKFDA was demonstrated by applying it to the nonlin-early separable Iris and lithology data sets The experimentalresults indicated that successful separation was achieved

In this paper the selection of the regularized parameterdepended on the numerical methodWe analyzed the appliedvalidity of the improved Kernel Fisher discriminant analysiswithout performing deep theoretical analysis This needs tobe addressed by conducting further research in the future

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are most grateful for the valuable advice onthe revision of the paper from the reviewers The workwas supported by the Foundation of Geomathematics KeyLaboratory of Sichuan Province

References

[1] A J LiuTheRock Physics Research of Typical Tight Clastic Reser-voir in Western Sichuan Chengdu University of TechnologyChengDu China 2010

[2] S Q Wang Comprehensive Reservoir Evaluation of XujiaheFormation of the West Sichuan by Log Welling in XC StructureChengdu University of Technology Chengdu China 2009

[3] B-Z Hsieh C Lewis and Z-S Lin ldquoLithology identificationof aquifers from geophysical well logs and fuzzy logic analysisShui-Lin Area Taiwanrdquo Computers amp Geosciences vol 31 no 3pp 263ndash275 2005

[4] Y X Shao Q Chen and D M Zhang ldquoThe application ofimproved BP neural network algorithm in lithology recogni-tionrdquo in Advances in Computation and Intelligence Proceedingsof the 3rd International Symposium ISICA 2008Wuhan ChinaDecember 19ndash21 2008 L Kang Z Cai X Yan and Y Liu Edsvol 5370 of Lecture Notes in Computer Science pp 342ndash349Springer Berlin Germany 2008

[5] J Z Zhang Q S Guan J Q Tan et al ldquoApplication of Fisherdiscrimination to volcanic lithology identificationrdquo XinjiangPetroleum Geology vol 29 no 6 pp 761ndash764 2008

[6] S Mika G Ratsch J Weston B Scholkopf and K-R MullerldquoFisher discriminant analysis with kernelsrdquo in Proceedings of the9th IEEE Signal Processing SocietyWorkshop onNeural Networksfor Signal Processing (NNSP rsquo99) pp 41ndash48MadisonWis USAAugust 1999

[7] S A Billings and K L Lee ldquoNonlinear Fisher discriminantanalysis using a minimum squared error cost function and theorthogonal least squares algorithmrdquo Neural Networks vol 15no 1 pp 263ndash270 2002

[8] A J Smola and B Schokopf ldquoSparse greedy matrix approx-imation for machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning pp 911ndash918 SanFrancisco Calif USA 2000

[9] J Liu F Zhao and Y Liu ldquoLearning kernel parameters for ker-nel Fisher discriminant analysisrdquo Pattern Recognition Lettersvol 34 no 9 pp 1026ndash1031 2013

[10] Y Xu J Y Yang J F Lu and D-J Yu ldquoAn efficient renovationon kernel Fisher discriminant analysis and face recognitionexperimentsrdquo Pattern Recognition vol 37 no 10 pp 2091ndash20942004

[11] Q Zhu ldquoReformative nonlinear feature extraction using kernelMSErdquo Neurocomputing vol 73 no 16ndash18 pp 3334ndash3337 2010

[12] J Wang Q Li J You and Q Zhao ldquoFast kernel Fisherdiscriminant analysis via approximating the kernel principalcomponent analysisrdquo Neurocomputing vol 74 no 17 pp 3313ndash3322 2011

[13] G Fung M Dundar J Bi and B Rao ldquoA fast iterative algo-rithm for Fisher Discriminant using heterogeneous kernelsrdquoin Proceedings of the 21st International Conference on MachineLearning (ICML rsquo04) pp 264ndash272 ACM July 2004

[14] R Khemchandani Jayadeva and S Chandra ldquoLearning theoptimal kernel for Fisher discriminant analysis via second order

8 Mathematical Problems in Engineering

cone programmingrdquo European Journal of Operational Researchvol 203 no 3 pp 692ndash697 2010

[15] G QWang and G Q Ding Face Recognition Using KFDA-LLESpringer Berlin Germany 2011

[16] J H Li and P L Cui ldquoImproved kernel fisher discriminantanalysis for fault diagnosisrdquo Expert Systems with Applicationsvol 36 no 2 pp 1423ndash1432 2009


Mathematical Problems in Engineering

Figure 3: Classification results with different values of the regularized parameter μ: (a) μ = 0.09, (b) μ = 0.18, (c) μ = 0.045. Each panel plots the Iris setosa, Iris versicolor, and Iris virginica samples against the first and second kernel Fisher directions.

of 6500 m. The AC region is located in a large uplift belt of the West Sichuan depression, which shows a NEE trend [1, 2]. This paper uses KFDA to identify lithology from logging parameters, and the study horizon is the Xujiahe formation. The Xujiahe formation in Chengdu, Deyang, and Jiangyou is a typical tight rock formation characterized by low porosity because of its dense, multilayered stacking and strong heterogeneity. The formation has a typical thickness of 400–700 m, and in Anxian county, in front of Longmenshan, its thickness reaches up to 1000 m.

The Xujiahe formation consists of alternating layers of sandstone, siltstone, mudstone, shale, and coal series, in which the rock types are more complex. According to logging data obtained from wells and the extraction of the physical parameters, the rock in the Xujiahe formation can be divided into mudstone, siltstone, and sandstone based on the physical cross plot and histogram analysis of the rocks.

5.2. Logging Parameters for Lithological Identification. Logging parameters provide a comprehensive reflection of lithology, but their sensitivity for lithology identification differs. The sensitivity of these parameters was studied using correlation analysis, which finally determined acoustic (AC), natural gamma ray (GR),

Figure 4: Cross plots of the logging parameters AC (μs/ft), GR (API), DEN (g/cm³), and CNL (v/v); green O: sandstone, red: siltstone, blue: mudstone.

density (DEN), and compensated neutron logging (CNL) as the characteristic variables. Training and testing sets in the standard layer, each containing 50 samples, were obtained. The cross plot is displayed in Figure 4. A comparison of Figures 1 and 4 reveals a characteristic similar to both, namely, that the sandstone and siltstone sample data cannot be separated.
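The paper does not detail its correlation analysis; the sketch below, on synthetic stand-in curves (the extra "SP" curve is hypothetical), illustrates how logging curves might be ranked by the magnitude of their correlation with a numeric lithology code so that the most sensitive ones are kept as characteristic variables.

```python
import numpy as np

# Hypothetical well-log samples: each column is a candidate curve; "lith"
# is a numeric lithology code (0 = mudstone, 1 = siltstone, 2 = sandstone).
rng = np.random.default_rng(0)
names = ["AC", "GR", "DEN", "CNL", "SP"]
curves = rng.normal(size=(300, 5))
lith = rng.integers(0, 3, 300)

# Rank each curve by the absolute correlation with the lithology code;
# the most strongly correlated curves are retained.
corr = [abs(np.corrcoef(curves[:, j], lith)[0, 1]) for j in range(len(names))]
ranking = sorted(zip(names, corr), key=lambda t: -t[1])
for name, r in ranking:
    print(f"{name}: {r:.3f}")
```

On real data, the top-ranked curves would be the ones carried forward (AC, GR, DEN, and CNL in the paper's case).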

In the following experiments, FDA and KFDA are compared on the logging attribute data sets. The FDA method is used to extract the optimal and suboptimal discriminant vectors from the training sets, and the testing samples are then projected onto these vectors (Figure 5). As can be seen in Figure 5, the mudstone can be separated from the sandstone and siltstone. The latter two rock types remain mixed together, although their degree of separation is higher than that indicated by the cross plot.
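As a rough illustration of this projection step (using scikit-learn's LinearDiscriminantAnalysis as a stand-in for the paper's FDA implementation, and synthetic data in place of the real logging samples):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-ins for the 4-attribute logging samples of the three
# rock types (mudstone, siltstone, sandstone), 50 samples per class.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(m, 1.0, (50, 4)) for m in (0.0, 1.0, 5.0)])
y_train = np.repeat([0, 1, 2], 50)
X_test = np.vstack([rng.normal(m, 1.0, (50, 4)) for m in (0.0, 1.0, 5.0)])

# Fisher discriminant analysis with two discriminant directions: project
# the test samples onto the optimal and suboptimal vectors, as in Figure 5.
fda = LinearDiscriminantAnalysis(n_components=2).fit(X_train, y_train)
Z = fda.transform(X_test)  # (150, 2): coordinates for the cross plot
print(Z.shape)
```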

Experiments were conducted on the logging attribute sets using IKFDA. The kernel function employed in IKFDA is the Gaussian kernel function, and a numerical method was used to obtain the optimal value of the parameter, μ = 0.06. IKFDA was used to obtain the first and second kernel feature vectors, following which the cross plot of the test sets was obtained (Figure 6). As can be seen in Figure 6, it is possible to obtain a good separation between the three types of samples, with each sample forming its own cluster center. The experimental results show that the performance of IKFDA is superior to that of FDA for lithology identification purposes.
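The paper does not spell out its numerical method for choosing μ. The following sketch uses the standard regularized kernel Fisher formulation of Mika et al. [6] together with a simple grid search that maximizes the Fisher criterion on the projected training data — a hedged stand-in for the paper's procedure, run here on toy three-class data rather than the real logging set.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kfda_directions(K, y, mu, n_dirs=2):
    """Regularized kernel Fisher directions.

    Solves M a = lambda (N + mu I) a in the space of kernel expansion
    coefficients and returns the leading n_dirs coefficient vectors.
    """
    n = K.shape[0]
    m_all = K.mean(axis=1)
    M = np.zeros((n, n))
    N = np.zeros((n, n))
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        Kc = K[:, idx]                     # kernel columns of class c
        m_c = Kc.mean(axis=1)
        d = (m_c - m_all)[:, None]
        M += len(idx) * (d @ d.T)          # between-class scatter
        H = np.eye(len(idx)) - 1.0 / len(idx)  # centering matrix
        N += Kc @ H @ Kc.T                 # within-class scatter
    # Regularization makes N + mu I invertible; solve and sort eigenvectors.
    w, V = np.linalg.eig(np.linalg.solve(N + mu * np.eye(n), M))
    order = np.argsort(-w.real)
    return V[:, order[:n_dirs]].real

def fisher_score(Z, y):
    # Ratio of between-class to within-class variance of the projections.
    mu_all = Z.mean(axis=0)
    sb = sum((y == c).sum() * ((Z[y == c].mean(0) - mu_all) ** 2).sum()
             for c in np.unique(y))
    sw = sum(((Z[y == c] - Z[y == c].mean(0)) ** 2).sum() for c in np.unique(y))
    return sb / sw

# Toy three-class data standing in for the logging training set.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(m, 0.5, (30, 4)) for m in (0.0, 1.5, 3.0)])
y = np.repeat([0, 1, 2], 30)
K = gaussian_kernel(X, X, sigma=1.5)

# Simple numerical search for the regularized parameter mu: keep the value
# with the largest Fisher criterion on the projected training set.
best_mu, best_score = None, -np.inf
for mu in (0.01, 0.03, 0.06, 0.09, 0.18):
    A = kfda_directions(K, y, mu)
    score = fisher_score(K @ A, y)
    if score > best_score:
        best_mu, best_score = mu, score
print(best_mu, best_score)
```

New samples would then be projected via `gaussian_kernel(X, X_new).T @ A` to produce the two kernel Fisher coordinates plotted in Figure 6.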

6. Conclusion

The optimal kernel Fisher projection of KFDA can be expressed as a generalized characteristic equation by using a regularized method. For multiclass problems, the value of the regularized parameter is a key factor in the application effect, and it is largely influenced by human experience. This paper proposes a novel method to optimize the regularized parameter using a numerical method. Thus, by selecting an optimal value for the regularized parameter, it becomes possible to solve the generalized characteristic equation, thereby eliminating the human factor. The effectiveness of

Figure 5: FDA identification result. The sandstone, mudstone, and siltstone test samples are projected onto the first and second Fisher directions.

Figure 6: IKFDA identification result. The sandstone, mudstone, and siltstone test samples are projected onto the first and second kernel Fisher directions.

the IKFDA was demonstrated by applying it to the nonlinearly separable Iris and lithology data sets. The experimental results indicated that successful separation was achieved.

In this paper, the selection of the regularized parameter depended on the numerical method. We analyzed the applied validity of the improved kernel Fisher discriminant analysis without performing a deep theoretical analysis. This needs to be addressed by further research in the future.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors are most grateful for the valuable advice on the revision of the paper from the reviewers. The work was supported by the Foundation of the Geomathematics Key Laboratory of Sichuan Province.

References

[1] A. J. Liu, The Rock Physics Research of Typical Tight Clastic Reservoir in Western Sichuan, Chengdu University of Technology, Chengdu, China, 2010.

[2] S. Q. Wang, Comprehensive Reservoir Evaluation of Xujiahe Formation of the West Sichuan by Log Welling in XC Structure, Chengdu University of Technology, Chengdu, China, 2009.

[3] B.-Z. Hsieh, C. Lewis, and Z.-S. Lin, "Lithology identification of aquifers from geophysical well logs and fuzzy logic analysis, Shui-Lin Area, Taiwan," Computers & Geosciences, vol. 31, no. 3, pp. 263–275, 2005.

[4] Y. X. Shao, Q. Chen, and D. M. Zhang, "The application of improved BP neural network algorithm in lithology recognition," in Advances in Computation and Intelligence: Proceedings of the 3rd International Symposium, ISICA 2008, Wuhan, China, December 19–21, 2008, L. Kang, Z. Cai, X. Yan, and Y. Liu, Eds., vol. 5370 of Lecture Notes in Computer Science, pp. 342–349, Springer, Berlin, Germany, 2008.

[5] J. Z. Zhang, Q. S. Guan, J. Q. Tan et al., "Application of Fisher discrimination to volcanic lithology identification," Xinjiang Petroleum Geology, vol. 29, no. 6, pp. 761–764, 2008.

[6] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, "Fisher discriminant analysis with kernels," in Proceedings of the 9th IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing (NNSP '99), pp. 41–48, Madison, Wis, USA, August 1999.

[7] S. A. Billings and K. L. Lee, "Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm," Neural Networks, vol. 15, no. 1, pp. 263–270, 2002.

[8] A. J. Smola and B. Schölkopf, "Sparse greedy matrix approximation for machine learning," in Proceedings of the 17th International Conference on Machine Learning, pp. 911–918, San Francisco, Calif, USA, 2000.

[9] J. Liu, F. Zhao, and Y. Liu, "Learning kernel parameters for kernel Fisher discriminant analysis," Pattern Recognition Letters, vol. 34, no. 9, pp. 1026–1031, 2013.

[10] Y. Xu, J. Y. Yang, J. F. Lu, and D.-J. Yu, "An efficient renovation on kernel Fisher discriminant analysis and face recognition experiments," Pattern Recognition, vol. 37, no. 10, pp. 2091–2094, 2004.

[11] Q. Zhu, "Reformative nonlinear feature extraction using kernel MSE," Neurocomputing, vol. 73, no. 16–18, pp. 3334–3337, 2010.

[12] J. Wang, Q. Li, J. You, and Q. Zhao, "Fast kernel Fisher discriminant analysis via approximating the kernel principal component analysis," Neurocomputing, vol. 74, no. 17, pp. 3313–3322, 2011.

[13] G. Fung, M. Dundar, J. Bi, and B. Rao, "A fast iterative algorithm for Fisher discriminant using heterogeneous kernels," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 264–272, ACM, July 2004.

[14] R. Khemchandani, Jayadeva, and S. Chandra, "Learning the optimal kernel for Fisher discriminant analysis via second order cone programming," European Journal of Operational Research, vol. 203, no. 3, pp. 692–697, 2010.

[15] G. Q. Wang and G. Q. Ding, Face Recognition Using KFDA-LLE, Springer, Berlin, Germany, 2011.

[16] J. H. Li and P. L. Cui, "Improved kernel Fisher discriminant analysis for fault diagnosis," Expert Systems with Applications, vol. 36, no. 2, pp. 1423–1432, 2009.

[17] H.-W. Cho, "An orthogonally filtered tree classifier based on nonlinear kernel-based optimal representation of data," Expert Systems with Applications, vol. 34, no. 2, pp. 1028–1037, 2008.

[18] Q. Zhang, J. W. Li, and Z. P. Zhang, Efficient Semantic Kernel-Based Text Classification Using Matching Pursuit KFDA, Springer, Berlin, Germany, 2011.

[19] J. H. Xu, X. G. Zhang, and Y. D. Li, "Application of kernel Fisher discriminating technique to prediction of hydrocarbon reservoir," Oil Geophysical Prospecting, vol. 37, no. 2, pp. 170–174, 2002.

[20] N. Louw and S. J. Steel, "Variable selection in kernel Fisher discriminant analysis by means of recursive feature elimination," Computational Statistics & Data Analysis, vol. 51, no. 3, pp. 2043–2055, 2006.

[21] M. Volpi, G. P. Petropoulos, and M. Kanevski, "Flooding extent cartography with Landsat TM imagery and regularized kernel Fisher's discriminant analysis," Computers & Geosciences, vol. 57, pp. 24–31, 2013.

[22] K.-L. Wu and M.-S. Yang, "A cluster validity index for fuzzy clustering," Pattern Recognition Letters, vol. 26, no. 9, pp. 1275–1291, 2005.

[23] Z. Volkovich, Z. Barzily, and L. Morozensky, "A statistical model of cluster stability," Pattern Recognition, vol. 41, no. 7, pp. 2174–2188, 2008.

[24] S. S. Khan and A. Ahmad, "Cluster center initialization algorithm for K-means clustering," Pattern Recognition Letters, vol. 25, no. 11, pp. 1293–1302, 2004.

[25] X. Zhang, Performance Monitoring, Fault Diagnosis and Quality Prediction Based on Statistical Theory, Shanghai Jiao Tong University, Shanghai, China, 2008.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article Kernel Fisher Discriminant Analysis Based ...Research Article Kernel Fisher Discriminant Analysis Based on a Regularized Method for Multiclassification and Application

6 Mathematical Problems in Engineering

50 100 1500

02

04

06

08

CNL

(vv

)

AC (brvbar Isft) AC (brvbar Isft) AC (brvbar Isft)50 100 150

2

22

24

26

28

DEN

(gc

m3 )

50 100 15050

100

150

200

250

GR

(API

)

0 05 12

22

24

26

28

DEN

(gc

m3 )

CNL (vv)0 05 1

50

100

150

200

250

CNL (vv)

GR

(API

)

2 25 350

100

150

200

250

DEN (gcm3)

GR

(API

)

Figure 4 Cross-sectional plots of the logging parameters green O (sandstone) red (siltstone) blue (mudstone)

density (DEN) and compensated neutron logging (CNL) asthe characteristic variables Training and testing sets in thestandard layer each of which contained 50 samples wereobtained The cross-sectional plot is displayed in Figure 4A comparison of Figures 1 and 4 reveals the characteristicsthat are similar to both namely in which the sandstone andsiltstone sample data cannot be separated

In the following experiments FDA and KFDA are com-pared on logging attribute data setsThe FDAmethod is usedto extract the optimal and suboptimal discriminant vectorof the training sets and then the testing sets of sample areprojected onto vectors (Figure 5) As can be seen in Figure 5the mudstone can be separated from the sandstone andsiltstone The latter two rock types were still mixed togetherwith respect to the siltstone and sandstone data although theseparation degree is higher than indicated by cross plot

Experiments were conducted on logging attribute setsusing IKFDAThe kernel function employed in IKFDA is theGaussian kernel function and a numerical method was usedto obtain the optimal value of parameter 120583 = 006 IKFDA

was used to obtain the first and second kernel feature vectorsfollowing which the cross plot of the test sets was obtained(Figure 6) As can be seen in Figure 6 it is possible to obtaina good separation between the three types of samples witheach sample forming its own cluster centerThe experimentalresults show that the performance of IKFDA is superior tothat of FDA for lithology identification purposes

6 Conclusion

The optimal kernel Fisher projection of KFDA can beexpressed as a generalized characteristic equation by usinga regularized method For multiclass problems the value ofthe regularized parameter is a key factor in the applicationeffect which is largely influenced by human experience Thispaper proposes a novel method to optimize the regularizedparameter using a numerical method Thus by selectingan optimal value for the regularized parameter it becomespossible to solve the generalized characteristic equationthereby eliminating the human factor The effectiveness of

Mathematical Problems in Engineering 7

0 05First Fisher direction

Seco

nd F

isher

dire

ctio

n

SandstoneMudstoneSiltstone

minus21

minus205

minus2

minus195

minus19

minus185

minus18

minus175

minus17

minus165

minus16

minus15 minus1 minus05

Figure 5 FDA identification result

Mudstone

0 01 0204

06

08

1

12

14

16

18

Kernel first Fisher direction

Kern

el se

cond

Fish

er d

irect

ion

SandstoneSiltstone

minus05 minus04 minus03 minus02 minus01

Figure 6 IKFDA identification result

the IKFDA was demonstrated by applying it to the nonlin-early separable Iris and lithology data sets The experimentalresults indicated that successful separation was achieved

In this paper the selection of the regularized parameterdepended on the numerical methodWe analyzed the appliedvalidity of the improved Kernel Fisher discriminant analysiswithout performing deep theoretical analysis This needs tobe addressed by conducting further research in the future

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are most grateful for the valuable advice onthe revision of the paper from the reviewers The workwas supported by the Foundation of Geomathematics KeyLaboratory of Sichuan Province

References

[1] A J LiuTheRock Physics Research of Typical Tight Clastic Reser-voir in Western Sichuan Chengdu University of TechnologyChengDu China 2010

[2] S Q Wang Comprehensive Reservoir Evaluation of XujiaheFormation of the West Sichuan by Log Welling in XC StructureChengdu University of Technology Chengdu China 2009

[3] B-Z Hsieh C Lewis and Z-S Lin ldquoLithology identificationof aquifers from geophysical well logs and fuzzy logic analysisShui-Lin Area Taiwanrdquo Computers amp Geosciences vol 31 no 3pp 263ndash275 2005

[4] Y X Shao Q Chen and D M Zhang ldquoThe application ofimproved BP neural network algorithm in lithology recogni-tionrdquo in Advances in Computation and Intelligence Proceedingsof the 3rd International Symposium ISICA 2008Wuhan ChinaDecember 19ndash21 2008 L Kang Z Cai X Yan and Y Liu Edsvol 5370 of Lecture Notes in Computer Science pp 342ndash349Springer Berlin Germany 2008

[5] J Z Zhang Q S Guan J Q Tan et al ldquoApplication of Fisherdiscrimination to volcanic lithology identificationrdquo XinjiangPetroleum Geology vol 29 no 6 pp 761ndash764 2008

[6] S Mika G Ratsch J Weston B Scholkopf and K-R MullerldquoFisher discriminant analysis with kernelsrdquo in Proceedings of the9th IEEE Signal Processing SocietyWorkshop onNeural Networksfor Signal Processing (NNSP rsquo99) pp 41ndash48MadisonWis USAAugust 1999

[7] S A Billings and K L Lee ldquoNonlinear Fisher discriminantanalysis using a minimum squared error cost function and theorthogonal least squares algorithmrdquo Neural Networks vol 15no 1 pp 263ndash270 2002

[8] A J Smola and B Schokopf ldquoSparse greedy matrix approx-imation for machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning pp 911ndash918 SanFrancisco Calif USA 2000

[9] J Liu F Zhao and Y Liu ldquoLearning kernel parameters for ker-nel Fisher discriminant analysisrdquo Pattern Recognition Lettersvol 34 no 9 pp 1026ndash1031 2013

[10] Y Xu J Y Yang J F Lu and D-J Yu ldquoAn efficient renovationon kernel Fisher discriminant analysis and face recognitionexperimentsrdquo Pattern Recognition vol 37 no 10 pp 2091ndash20942004

[11] Q Zhu ldquoReformative nonlinear feature extraction using kernelMSErdquo Neurocomputing vol 73 no 16ndash18 pp 3334ndash3337 2010

[12] J Wang Q Li J You and Q Zhao ldquoFast kernel Fisherdiscriminant analysis via approximating the kernel principalcomponent analysisrdquo Neurocomputing vol 74 no 17 pp 3313ndash3322 2011

[13] G Fung M Dundar J Bi and B Rao ldquoA fast iterative algo-rithm for Fisher Discriminant using heterogeneous kernelsrdquoin Proceedings of the 21st International Conference on MachineLearning (ICML rsquo04) pp 264ndash272 ACM July 2004

[14] R Khemchandani Jayadeva and S Chandra ldquoLearning theoptimal kernel for Fisher discriminant analysis via second order

8 Mathematical Problems in Engineering

cone programmingrdquo European Journal of Operational Researchvol 203 no 3 pp 692ndash697 2010

[15] G QWang and G Q Ding Face Recognition Using KFDA-LLESpringer Berlin Germany 2011

[16] J H Li and P L Cui ldquoImproved kernel fisher discriminantanalysis for fault diagnosisrdquo Expert Systems with Applicationsvol 36 no 2 pp 1423ndash1432 2009

[17] H-W Cho ldquoAn orthogonally filtered tree classifier based onnonlinear kernel-based optimal representation of datardquo ExpertSystems with Applications vol 34 no 2 pp 1028ndash1037 2008

[18] Q Zhang J W Li and Z P Zhang Efficient SemanticKernel-Based Text Classification Using Matching Pursuit KFDASpringer Berlin Germany 2011

[19] J H Xu X G Zhang and Y D Li ldquoApplication of kernelFisher discriminanting technique to prediction of hydrocarbonreservoirrdquoOil Geophysical Prospecting vol 37 no 2 pp 170ndash1742002

[20] N Louw and S J Steel ldquoVariable selection in kernel Fisherdiscriminant analysis by means of recursive feature elimina-tionrdquo Computational Statistics amp Data Analysis vol 51 no 3pp 2043ndash2055 2006

[21] M Volpi G P Petropoulos and M Kanevski ldquoFlooding extentcartography with Landsat TM imagery and regularized kernelFisherrsquos discriminant analysisrdquo Computers amp Geosciences vol57 pp 24ndash31 2013

[22] K-L Wu and M-S Yang ldquoA cluster validity index for fuzzyclusteringrdquo Pattern Recognition Letters vol 26 no 9 pp 1275ndash1291 2005

[23] Z Volkovich Z Barzily and LMorozensky ldquoA statistical modelof cluster stabilityrdquo Pattern Recognition vol 41 no 7 pp 2174ndash2188 2008

[24] S S Khan and A Ahmad ldquoCluster center initialization algo-rithm for K-means clusteringrdquo Pattern Recognition Letters vol25 no 11 pp 1293ndash1302 2004

[25] X ZhangPerformanceMonitoring Fault Diagnosis andQualityPrediction Based on Statistical Theory Shanghai Jiao TongUniversity Shanghai China 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article Kernel Fisher Discriminant Analysis Based ...Research Article Kernel Fisher Discriminant Analysis Based on a Regularized Method for Multiclassification and Application

Mathematical Problems in Engineering 7

0 05First Fisher direction

Seco

nd F

isher

dire

ctio

n

SandstoneMudstoneSiltstone

minus21

minus205

minus2

minus195

minus19

minus185

minus18

minus175

minus17

minus165

minus16

minus15 minus1 minus05

Figure 5 FDA identification result

Mudstone

0 01 0204

06

08

1

12

14

16

18

Kernel first Fisher direction

Kern

el se

cond

Fish

er d

irect

ion

SandstoneSiltstone

minus05 minus04 minus03 minus02 minus01

Figure 6 IKFDA identification result

the IKFDA was demonstrated by applying it to the nonlin-early separable Iris and lithology data sets The experimentalresults indicated that successful separation was achieved

In this paper the selection of the regularized parameterdepended on the numerical methodWe analyzed the appliedvalidity of the improved Kernel Fisher discriminant analysiswithout performing deep theoretical analysis This needs tobe addressed by conducting further research in the future

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors are most grateful for the valuable advice onthe revision of the paper from the reviewers The workwas supported by the Foundation of Geomathematics KeyLaboratory of Sichuan Province

References

[1] A J LiuTheRock Physics Research of Typical Tight Clastic Reser-voir in Western Sichuan Chengdu University of TechnologyChengDu China 2010

[2] S Q Wang Comprehensive Reservoir Evaluation of XujiaheFormation of the West Sichuan by Log Welling in XC StructureChengdu University of Technology Chengdu China 2009

[3] B-Z Hsieh C Lewis and Z-S Lin ldquoLithology identificationof aquifers from geophysical well logs and fuzzy logic analysisShui-Lin Area Taiwanrdquo Computers amp Geosciences vol 31 no 3pp 263ndash275 2005

[4] Y X Shao Q Chen and D M Zhang ldquoThe application ofimproved BP neural network algorithm in lithology recogni-tionrdquo in Advances in Computation and Intelligence Proceedingsof the 3rd International Symposium ISICA 2008Wuhan ChinaDecember 19ndash21 2008 L Kang Z Cai X Yan and Y Liu Edsvol 5370 of Lecture Notes in Computer Science pp 342ndash349Springer Berlin Germany 2008

[5] J Z Zhang Q S Guan J Q Tan et al ldquoApplication of Fisherdiscrimination to volcanic lithology identificationrdquo XinjiangPetroleum Geology vol 29 no 6 pp 761ndash764 2008

[6] S Mika G Ratsch J Weston B Scholkopf and K-R MullerldquoFisher discriminant analysis with kernelsrdquo in Proceedings of the9th IEEE Signal Processing SocietyWorkshop onNeural Networksfor Signal Processing (NNSP rsquo99) pp 41ndash48MadisonWis USAAugust 1999

[7] S A Billings and K L Lee ldquoNonlinear Fisher discriminantanalysis using a minimum squared error cost function and theorthogonal least squares algorithmrdquo Neural Networks vol 15no 1 pp 263ndash270 2002

[8] A J Smola and B Schokopf ldquoSparse greedy matrix approx-imation for machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning pp 911ndash918 SanFrancisco Calif USA 2000

[9] J Liu F Zhao and Y Liu ldquoLearning kernel parameters for ker-nel Fisher discriminant analysisrdquo Pattern Recognition Lettersvol 34 no 9 pp 1026ndash1031 2013

[10] Y Xu J Y Yang J F Lu and D-J Yu ldquoAn efficient renovationon kernel Fisher discriminant analysis and face recognitionexperimentsrdquo Pattern Recognition vol 37 no 10 pp 2091ndash20942004

[11] Q Zhu ldquoReformative nonlinear feature extraction using kernelMSErdquo Neurocomputing vol 73 no 16ndash18 pp 3334ndash3337 2010

[12] J Wang Q Li J You and Q Zhao ldquoFast kernel Fisherdiscriminant analysis via approximating the kernel principalcomponent analysisrdquo Neurocomputing vol 74 no 17 pp 3313ndash3322 2011

[13] G Fung M Dundar J Bi and B Rao ldquoA fast iterative algo-rithm for Fisher Discriminant using heterogeneous kernelsrdquoin Proceedings of the 21st International Conference on MachineLearning (ICML rsquo04) pp 264ndash272 ACM July 2004

[14] R Khemchandani Jayadeva and S Chandra ldquoLearning theoptimal kernel for Fisher discriminant analysis via second order

8 Mathematical Problems in Engineering

cone programmingrdquo European Journal of Operational Researchvol 203 no 3 pp 692ndash697 2010

[15] G QWang and G Q Ding Face Recognition Using KFDA-LLESpringer Berlin Germany 2011

[16] J H Li and P L Cui ldquoImproved kernel fisher discriminantanalysis for fault diagnosisrdquo Expert Systems with Applicationsvol 36 no 2 pp 1423ndash1432 2009

[17] H-W Cho ldquoAn orthogonally filtered tree classifier based onnonlinear kernel-based optimal representation of datardquo ExpertSystems with Applications vol 34 no 2 pp 1028ndash1037 2008

[18] Q Zhang J W Li and Z P Zhang Efficient SemanticKernel-Based Text Classification Using Matching Pursuit KFDASpringer Berlin Germany 2011

[19] J H Xu X G Zhang and Y D Li ldquoApplication of kernelFisher discriminanting technique to prediction of hydrocarbonreservoirrdquoOil Geophysical Prospecting vol 37 no 2 pp 170ndash1742002

[20] N Louw and S J Steel ldquoVariable selection in kernel Fisherdiscriminant analysis by means of recursive feature elimina-tionrdquo Computational Statistics amp Data Analysis vol 51 no 3pp 2043ndash2055 2006

[21] M Volpi G P Petropoulos and M Kanevski ldquoFlooding extentcartography with Landsat TM imagery and regularized kernelFisherrsquos discriminant analysisrdquo Computers amp Geosciences vol57 pp 24ndash31 2013

[22] K-L Wu and M-S Yang ldquoA cluster validity index for fuzzyclusteringrdquo Pattern Recognition Letters vol 26 no 9 pp 1275ndash1291 2005

[23] Z Volkovich Z Barzily and LMorozensky ldquoA statistical modelof cluster stabilityrdquo Pattern Recognition vol 41 no 7 pp 2174ndash2188 2008

[24] S S Khan and A Ahmad ldquoCluster center initialization algo-rithm for K-means clusteringrdquo Pattern Recognition Letters vol25 no 11 pp 1293ndash1302 2004

[25] X ZhangPerformanceMonitoring Fault Diagnosis andQualityPrediction Based on Statistical Theory Shanghai Jiao TongUniversity Shanghai China 2008


