9
Received May 1, 2021, accepted May 28, 2021, date of publication June 4, 2021, date of current version June 14, 2021. Digital Object Identifier 10.1109/ACCESS.2021.3086135 Accuracy Improvement of Power Transformer Faults Diagnostic Using KNN Classifier With Decision Tree Principle OMAR KHERIF 1 , (Member, IEEE), YOUCEF BENMAHAMED 2 , (Member, IEEE), MADJID TEGUAR 2 , AHMED BOUBAKEUR 2 , (Senior Member, IEEE), AND SHERIF S. M. GHONEIM 3 , (Senior Member, IEEE) 1 Advanced High Voltage Engineering Research Centre, School of Engineering, Cardiff University, Cardiff CF24 3AA, U.K. 2 Laboratoire de Recherche en Electrotechnique, École Nationale Polytechnique (ENP) of Algiers, El Harrach 16200, Algeria 3 Electrical Engineering Department, College of Engineering, Taif University, Ta’if 21944, Saudi Arabia Corresponding author: Youcef Benmahamed ([email protected]) This work was supported by the Taif University, Taif, Saudi Arabia, through the Taif University Researchers Supporting Project, under Grant TURSP-2020/34. ABSTRACT Dissolved gas analysis (DGA) is the standard technique to diagnose the fault types of oil-immersed power transformers. Various traditional DGA methods have been employed to detect the transformer faults, but their accuracies were mostly poor. In this light, the current work aims to improve the diagnostic accuracy of power transformer faults using artificial intelligence. A KNN algorithm is combined with the decision tree principle as an improved DGA diagnostic tool. A total of 501 dataset samples are used to train and test the proposed model. Based on the number of correct detections, the neighbor’s number and distance type of the KNN algorithm are optimized in order to improve the classifier’s accuracy rate. For each fault, indeed, several input vectors are assessed to select the most appropriate one for the classifier’s corresponding layer, increasing the overall diagnostic accuracy. On the basis of the accuracy rate obtained by knots and type of defect, two models are proposed where their results are compared and discussed. It is found that the global accuracy rate exceeds 93% for the power transformer diagnosis, demonstrating the effectiveness of the proposed technique. An independent database is employed as a complimentary validation phase of the proposed research. INDEX TERMS DGA, fault diagnosis, power transformer oil, KNN algorithm, decision tree principle. I. INTRODUCTION High voltage power transformers are mainly required for the associated heavy and powerful applications in an industrial environment. These transformers use particular insulation systems that depend primarily on the voltage levels. Indeed, the higher the voltage, the greater the impact on the trans- former’s lifetime and reliability [1]. Otherwise, the trans- former insulation systems may deteriorate when exposed to numerous defects arising from overheating, paper carboniza- tion, arcing, and discharges of low or high energy [2]. There- fore, early-stage detection of faults should be conducted to ensure an efficient service of these transformer [3], [4]. For this purpose, several methods were proposed in the litera- ture. Among them, Dissolved Gas Analysis (DGA) represents The associate editor coordinating the review of this manuscript and approving it for publication was Baoping Cai . one of the fastest, economical, and widely used techniques referring to those related to the dielectric insulation sys- tems [5], [6]. Hydrogen (H 2 ), Methane (CH 4 ), Acetylene (C 2 H 2 ), Ethylene (C 2 H 4 ), and Ethane (C 2 H 6 ) might be gener- ated within the oil during a faulty mode [7]. Hence, the power transformer’s abnormal state can be identified by the DGA method according to the dissolved gases composition and content. The concentrations of these gases are associated with six basic electrical and thermal faults, which might occur separately or in a combination [1], [8]. Based on DGA, different approaches have been developed to diagnose the multiple transformer faults and quantita- tively indicate each fault’s likelihood. These approaches are mainly based on (i) graphical (e.g., [2], [9]), (ii) artifi- cial intelligence techniques (e.g., [10], [11]), and (iii) other improved coupled techniques (e.g., [12]). Overall, the analy- sis is generally based on the concentration of the five principal VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ 81693

Accuracy Improvement of Power Transformer Faults

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Accuracy Improvement of Power Transformer Faults

Received May 1, 2021, accepted May 28, 2021, date of publication June 4, 2021, date of current version June 14, 2021.

Digital Object Identifier 10.1109/ACCESS.2021.3086135

Accuracy Improvement of Power TransformerFaults Diagnostic Using KNN ClassifierWith Decision Tree PrincipleOMAR KHERIF 1, (Member, IEEE), YOUCEF BENMAHAMED 2, (Member, IEEE),MADJID TEGUAR2, AHMED BOUBAKEUR 2, (Senior Member, IEEE),AND SHERIF S. M. GHONEIM 3, (Senior Member, IEEE)1Advanced High Voltage Engineering Research Centre, School of Engineering, Cardiff University, Cardiff CF24 3AA, U.K.2Laboratoire de Recherche en Electrotechnique, École Nationale Polytechnique (ENP) of Algiers, El Harrach 16200, Algeria3Electrical Engineering Department, College of Engineering, Taif University, Ta’if 21944, Saudi Arabia

Corresponding author: Youcef Benmahamed ([email protected])

This work was supported by the Taif University, Taif, Saudi Arabia, through the Taif University Researchers Supporting Project, underGrant TURSP-2020/34.

ABSTRACT Dissolved gas analysis (DGA) is the standard technique to diagnose the fault types ofoil-immersed power transformers. Various traditional DGA methods have been employed to detect thetransformer faults, but their accuracies were mostly poor. In this light, the current work aims to improve thediagnostic accuracy of power transformer faults using artificial intelligence. A KNN algorithm is combinedwith the decision tree principle as an improved DGA diagnostic tool. A total of 501 dataset samples are usedto train and test the proposed model. Based on the number of correct detections, the neighbor’s number anddistance type of the KNN algorithm are optimized in order to improve the classifier’s accuracy rate. Foreach fault, indeed, several input vectors are assessed to select the most appropriate one for the classifier’scorresponding layer, increasing the overall diagnostic accuracy. On the basis of the accuracy rate obtainedby knots and type of defect, two models are proposed where their results are compared and discussed. It isfound that the global accuracy rate exceeds 93% for the power transformer diagnosis, demonstrating theeffectiveness of the proposed technique. An independent database is employed as a complimentary validationphase of the proposed research.

INDEX TERMS DGA, fault diagnosis, power transformer oil, KNN algorithm, decision tree principle.

I. INTRODUCTIONHigh voltage power transformers are mainly required for theassociated heavy and powerful applications in an industrialenvironment. These transformers use particular insulationsystems that depend primarily on the voltage levels. Indeed,the higher the voltage, the greater the impact on the trans-former’s lifetime and reliability [1]. Otherwise, the trans-former insulation systems may deteriorate when exposed tonumerous defects arising from overheating, paper carboniza-tion, arcing, and discharges of low or high energy [2]. There-fore, early-stage detection of faults should be conducted toensure an efficient service of these transformer [3], [4]. Forthis purpose, several methods were proposed in the litera-ture. Among them, DissolvedGasAnalysis (DGA) represents

The associate editor coordinating the review of this manuscript and

approving it for publication was Baoping Cai .

one of the fastest, economical, and widely used techniquesreferring to those related to the dielectric insulation sys-tems [5], [6]. Hydrogen (H2), Methane (CH4), Acetylene(C2H2), Ethylene (C2H4), and Ethane (C2H6) might be gener-ated within the oil during a faulty mode [7]. Hence, the powertransformer’s abnormal state can be identified by the DGAmethod according to the dissolved gases composition andcontent. The concentrations of these gases are associated withsix basic electrical and thermal faults, which might occurseparately or in a combination [1], [8].

Based on DGA, different approaches have been developedto diagnose the multiple transformer faults and quantita-tively indicate each fault’s likelihood. These approaches aremainly based on (i) graphical (e.g., [2], [9]), (ii) artifi-cial intelligence techniques (e.g., [10], [11]), and (iii) otherimproved coupled techniques (e.g., [12]). Overall, the analy-sis is generally based on the concentration of the five principal

VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ 81693

Page 2: Accuracy Improvement of Power Transformer Faults

O. Kherif et al.: Accuracy Improvement of Power Transformer Faults Diagnostic Using KNN Classifier

hydrocarbon gases where the said concentration is useddirectly in ppm or percentages to the total sum. Likewise,various methods are based on a combination of ratios of somespecific gases. For instance, three-gas ratios (C2H2/C2H4,CH4/H2, and C2H4/C2H6) are used in Roger’s method [6]and IEC 60599 [8], three relative percentages in Duval’striangle (%CH4, %C2H2 and %C2H4) [2], and four-gas ratios(CH4/H2, C2H2/CH4, C2H4/C2H6, and C2H2/CH4) in Dor-nenburg method [1]. Such techniques are mostly requiredfor the associated heavy and powerful applications. Thesetechniques and other ones (e.g., [13], [14]) are presented inthe literature to identify the different kinds of faults occurringin operating transformers. However, their diagnostic accuracyrequires further improvement. In this light, Duval’s pentagonhas been developed as a complementary tool for interpretingthe DGA in power transformers [9].For this technique, five relative percentages of the (five)

leading hydrocarbon gases (%H2, %CH4, %C2H2, %C2H4,and %C2H6) are used. In the same context, probabilisticclassifiers based on Parzen Windows, Bayesian and Mexicanhat functions (e.g., [15], [16]) have been employed for trans-formers fault classification using actual DGA data. More-over, various artificial intelligence techniques have been alsoapplied for the transformer fault diagnosis, such as a fuzzylogic technique [17]. On the other hand, bootstrap and geneticprogramming have been developed to extract classificationfeatures for each fault class. These extracted features havebeen employed as the inputs to an artificial neural network(ANN), a support vector machine (SVM), or a K-NearestNeighbor (KNN) classifier to perform multicategory faultclassification [18]. Also, combined Duval pentagon withSVM and KNN algorithms have been proposed to improvethe fault diagnostic accuracy [7]. It is worth noting thatthe KNN algorithm was first suggested by Cover and Hartin 1967 [19]. This algorithm has encountered several recentimprovements (e.g., [20]–[23]).Overall, the originality of this work consists in introducing

several input vectors into the KNN classification algorithmbased on a decision tree principle (DT) in order to selectthe best one that achieves high accuracy for the transformerfaults diagnostic. Various types and combinations of inputvectors have been employed, namely, the concentration ofgas in ppm, relative concentration of gas in percentage, IECratios, Rogers four-ratios, Dornenburg ratios, Duval trianglecoordinates, Duval pentagon coordinates, a combination ofRogers and Dornenburg ratios, and the combination of Duvaltriangle-pentagon coordinates. The accuracy rate has beenanalyzed to select the most appropriate input vector for theproposed method.The current paper is organized as follows: In Section 2,

we formulate the faults classification problem in power trans-formers and describe the database set used in this inves-tigation. A general description of the KNN technique’stheory is introduced in Section 3, including the basic theoryof the KNN technique. In Section 4, the interpretativemethods based on DGA are reported. The selection of an

appropriate input feature for each classification layer and theproposed combined KNN classifier’s performance with thedecision tree rule are accomplished in Section 5 to demon-strate the proposedmethod effectiveness. Finally, conclusionsare summarized, and potential future works are discussed inSection 6.

II. PROBLEM FORMULATIONHighly reliable transformers are mainly made of iron coreand windings. Both components are placed in a tank filledwith insulating oil. Figure 1 shows a cross-section of a typicaloil-immersed power transformer.

FIGURE 1. Representation of oil-immersed power transformercross-section.

Dissolved gases, that are liberated under particularelectrical or thermal constraints, represent a powerful featureindicating the affection of oil properties. In general, the mostimportant gases are H2, CH4, C2H2, C2H4, and C2H6. Theconcentration of these gases is strongly related to the type oftransformer fault, and the rate of gas generation can be usedto identify the fault type [24]. For instance, acetylene is asso-ciated with arcs where temperatures reach several thousanddegrees. Ethylene is related to hotspots between 150 ◦C and1000 ◦C, and hydrogen with cold gas plasma from coronadischarges. Although mixtures of all gases, including othersaturated hydrocarbons, are generally obtained in most casesof faults where their relative proportions have been correlatedwith the different fault types [25].

Oil samples in small volumes are periodically taken fromthe drain valve at the bottom of the transformer tank for theDGA test using chromatography as described in [25], [26].The most common defects are partial discharge (PD), lowenergy discharges (D1), high energy discharges (D2), thermalfaults< 300 ◦C (T1), thermal faults of 300 ◦C to 700 ◦C (T2),and thermal faults > 700 ◦C (T3) [7].A database set of 501 samples is used to train, test, and

validate the proposed classifier in the present investigation.This database is collected from the literature [4], [28]–[31].

81694 VOLUME 9, 2021

Page 3: Accuracy Improvement of Power Transformer Faults

O. Kherif et al.: Accuracy Improvement of Power Transformer Faults Diagnostic Using KNN Classifier

TABLE 1. Database distribution.

The fault type samples distribution (number of samples asso-ciated with each defect) is explained in Table 1.

III. K NEAREST NEIGHBORS CLASSIFIER (KNN)KNN algorithms are ranked among the simplest intelligentalgorithms that do not require any learning phase. It is basedon calculating the distances between the sampling points tothe nearest neighbors of the set of assigned points [19]. Thedecision is based on the majority vote of the KNN. Manytypes of distances can be used to decide the nearest neighbors,such as Gaussian, triangular, cosine . . . etc. [10].Figure 2 illustrates the principle as well as the influence

of the choice of neighbors number. The three closest starneighbors selection allows the star classification as a square(objects inside the small circle in a continuous line). However,the star is classified as a triangle if we consider five closestneighbors (items inside the large ring in discontinuous line).Indeed, the choice of the neighbor’s number k is a leadingfactor during the classification process.

FIGURE 2. Influence of the number of neighbors on the classificationresults (reproduced from [7]).

A. 1-NN ALGORITHM (KNN WITH K = 1)We consider L the data set of awarded points given by:

L : {(yi, xi), i = 1, . . . , n}

where yi ∈ {1, . . . , c} denotes the class of sample i andthe vector xi = (xi1, . . . , xip) represents the variables thatcharacterize the sample i.

The distance function d(.,.) determines the nearest neigh-bor. P variables characterize the Euclidean distance between

a sample xi and an attributed point xj and is defined by:

d((xi1, xi2, . . . , xip), (xj1, xj2, . . . , xjp))

=

√(xi1 − xj1)2 + (xi2 − xj2)2 + . . .+ (xip − xjp)2 (1)

The observation of the sample (y, x) by the nearest neigh-bor (y(1), x(1)) in the learning sample is determined by:

d(xi, xj) = minj(d(xi, xj)) (2)

We designate by y= y(i) the estimated class of the nearestneighbor. This class is selected for the prediction of y.

Considering the Minkowski formula of equation (3),the Euclidian distance is obtained by replacing p by 2 andis given by equation (4)

d(xi, xj) =

( p∑s=1

∣∣xis − xjs∣∣p)1p

(3)

d(xi, xj) =

( p∑s=1

(xis − xjs)2) 1

2

(4)

B. KNN ALGORITHMIn applications, several closer neighbors are usuallyemployed. The decision favors the classes majority repre-sented by the k neighbors obtained from equation 5, kr beingthe number of observations from the group of closest neigh-bors belonging to class r and ‘‘c’’ the number of classes.

c∑r=1

kr = k (5)

The observation is predicted in class l, only if l=max (kr)with k ∈ N.It is recommended to choose the odd k, o avoid equal votes

in the binary classification. However, in the case of multi-classes, the best choice of k depends on the nature of thedata. The noise effect on classification (risk of overlearning)is reduced when k takes large values. However, this choicemakes the boundaries between classes less distinct. The bestselection of k is the one that minimizes the classificationerror [27].

C. KERNEL FUNCTIONThe classification is influenced by parameter k and the typeof the kernel function K(x). The K function must check thefollowing properties:

VOLUME 9, 2021 81695

Page 4: Accuracy Improvement of Power Transformer Faults

O. Kherif et al.: Accuracy Improvement of Power Transformer Faults Diagnostic Using KNN Classifier

- K(d) ≥ 0 for all d ∈ <;- K(d) reaches its maximum for d = 0;- K(d) decreasing for d→±∞.Several kernel functions [27] are rectangular

(uniform law), triangular, Epanechnikov, cosine, Gaussian,and reverse.

D. KNN ALGORITHM ADVANTAGES AND DISADVANTAGESThis technique is easy to implement and apply to any data,including complex ones such as geographic information, text,images, and sound. Also, it is robust to noise. The intro-duction of new data does not require the reconstruction of amodel. The class is assigned to an object with ease and clarityonce the closest neighbors are displayed.

As mentioned, the method performance depends on thedistance type, and the number of neighbors, and how theneighbors’ responses are combined. The results could beof poor quality if the number of relevant attributes is lowrelative to the total number of characteristics. The distanceson the irrelevant attributes will drown out the proximityon the appropriate attributes. The calculations made in theclassification phase can be very time-consuming.

IV. DGA–BASED INPUT VECTORMany interpretative methods based on DGA were reportedin the literature to detect the incipient fault nature within anoil-immersed power transformer [7], [10]. These techniquesmainly include the following input vectors:• Vector 1: Since the database contains the concentrationsof the five gases in parts per million or ppm, eachsample X is represented as follows:

X = [H2, CH4, C2H2, C2H4, C2H6] (6)

• Vector 2: Since the weight percent of the gases wouldresult in an inopportunely small number, percent con-centration to the total sum is also used where eachsample X = [x1, x2, · · ·, x5] is scaled as follows:

X ←X5∑i=1

xi

× 100%

X = [%H2, %CH4, %C2H2, %C2H4, %C2H6] (7)

• Vector 3: The IEC Ratios method is used to producethe following input vector containing three ratios of thedissolved gases given by:

X = [CH4

H2,C2H2

C2H4,C2H4

C2H6] (8)

• Vector 4: Roger’s four-ratio method has been selectedin this case to transform each sample to the followingone:

X = [CH4

H2,C2H2

C2H4,C2H4

C2H6,C2H6

CH4] (9)

• Vector 5: Dornenburg’s method is also investigated inthis study. In this method, the input consists of four ratios

computed as a function of the dissolved gases in ppm asfollows:

X = [CH4

H2,C2H2

C2H4,C2H4

C2H6,C2H2

CH4] (10)

• Vector 6: Duval triangle is a graphical method that useonly the concentration of three gases (CH4, C2H2, andC2H4) to produce the input vector as follows:

X = [Cx ,Cy] (11)

where the components Cx and Cy are computed by:

Cx =13

n−1∑i=0

(xi + xi+1)(xiyi+1 − xi+1yi)

n−1∑i=0

(xiyi+1 − xi+1yi)

(12)

And

Cy =13

n−1∑i=0

(yi + yi+1)(yixi+1 − yi+1xi)

n−1∑i=0

(yixi+1 − yi+1xi)

(13)

The parameters xi are defined as follows:

x0 = %CH4 cos(π

2)

x1 = %C2H4 cos(π

2+ α)

x2 = %C2H2 cos(π

2+ 2α) (14)

The parameters yi can be found by replacing the cosinewith the sine in the previous expressions with α = 2π3.• Vector 7: The input vector, in this case, has two compo-nents given as follows:

X = [Cx ,Cy] (15)

X is computed according to the Duval pentagon that usesthe concentration of five gases in percentages. The compo-nents Cx and Cy are calculated by:

Cx =16

n−1∑i=0

(xi + xi+1)(xiyi+1 − xi+1yi)

n−1∑i=0

(xiyi+1 − xi+1yi)

(16)

And

Cy =16

n−1∑i=0

(yi + yi+1)(yixi+1 − yi+1xi)

n−1∑i=0

(yixi+1 − yi+1xi)

(17)

where the parameters xi are defined as follows:

x0 = %H2 cos(π

2)

x1 = %C2H6 cos(π

2+ α)

81696 VOLUME 9, 2021

Page 5: Accuracy Improvement of Power Transformer Faults

O. Kherif et al.: Accuracy Improvement of Power Transformer Faults Diagnostic Using KNN Classifier

x2 = %CH4 cos(π

2+ 2α)

x3 = %C2H4 cos(π

2+ 3α)

x4 = C2H4 cos(π

2+ 4α) (18)

The parameters yi can be found by replacing the cosinewith the sine in the previous expressions with α = 2π5.Other possible combinations of the above technique have

also been proposed to give strong credibility to the obtainedresults. Two combinations are given below.• Vector 8: The first combination consists of an inputvector of five ratios given by:

X = [CH4

H2,C2H2

C2H4,C2H4

C2H6,C2H2

CH4,C2H6

CH4] (19)

Equation (14) refers to the mixture of Roger’s and Dornen-burg’s methods.• Vector 9: The input vector, in this case, has four com-ponents given as follows:

X = [Cx1,Cy1,Cx2,Cy2] (20)

According to Duval’s triangle and pentagon, this vectorhas computed that use the concentration of five gases inpercentages. Cx1 and Cy1 are calculated according to thetriangle method, while Cx2 and Cy2 employ the pentagontechnique.

V. SIMULATIONS AND RESULTSThe database of 501 samples, already used to evaluate theKNN classifier’s accuracy rate, has been also employed inthis section. Randomly selected, 321 samples have beenutilized for the training phase,160 samples for the testingphase, and 20 samples to examine the validity of the proposedclassifier. Each previous vector has been used independentlyas an input of a KNN classifier. Several types of distance havebeen used, namely ‘‘Euclidean’’, ‘‘Cityblock’’, ‘‘Cosine’’,and ‘‘Correlation’’. For the training phase, the number ofneighbors k has been varied from 1 to 321 where the valuecorresponding to the better accuracy rate is maintained. Thisprocedure has been repeated for the nine types of the previ-ously defined input vectors. Figure 3 shows an example ofthe classification results obtained when using the first inputvector (gases in ppm).

From the obtained results in Figure 3, an accuracy rateof 88.75% has been developed for both distances ‘‘Correla-tion’’ and ‘‘Cosine’’ with the same neighbors number k = 6.Whereas, an accuracy rate of 77.50% (respectively 76.25%) isobtained for ‘‘Cityblock’’ (respectively Euclidean) achievedfor a distance k = 8 (respectively k = 4).For several neighbors and distance types, all input vectors

have been separately tested. Based on the obtained results,the best classification accuracy rate is selected for each inputvector. Figure 4 regroups the best nine classification results.

From this figure, it is clear that the accuracy rate is affectedby both the type and the number of the neighbors. Higher

FIGURE 3. Impact of input vector ‘‘type 1’’ and number of neighbors onthe classification performance (accuracy rate).

FIGURE 4. Performance of KNN classifier as function of the number ofneighbors for different input vectors.

accuracies are obtained for a relatively low number of neigh-bors, tending toward a plateau of about 28%. In order toquantify the results of Figure 4, the best accuracy rate over thenumber of the neighbors is subtracted for each input vector.The obtained results are given by Table 2.

As shown in Table 2, it is found that the highest accu-racy rate, of 91.88%, is obtained when employing vector 2(gas in percentage) as an input of the KNN with Cityblockdistance and k= 10. Furthermore, the combined triangle andpentagon coordinates (vector 9, using Cityblock distance andk = 6) came in second place with an accuracy rateof 90.63%. Finally, the combined Roger’s and Dornen-burg’ gave the poorest results (29.88%) compared to theothers.

As stated in the introduction, the novelty of thisinvestigation is to examine several types of input vector, in the

VOLUME 9, 2021 81697

Page 6: Accuracy Improvement of Power Transformer Faults

O. Kherif et al.: Accuracy Improvement of Power Transformer Faults Diagnostic Using KNN Classifier

TABLE 2. KNN classifier for all input vectors.

TABLE 3. KNN analysis of accuracy rate.

proposed classifier, and to compare the obtained diagnosticaccuracies. Select the best input vector that effectively sep-arates the electrical and thermal faults, which represents thefirst node of the classification process. For the electrical fault,there are two stages (nodes); (i) the first electrical node sepa-rates the fault PD from D1&D2, and (ii) the second electricalnode distinguish between the faults D1 andD2. It is importantto note that the same scenario is repeated for the thermalfault. The first thermal node aims to separate T3 and T1&T2,and the last thermal node intends to isolate T1 and T2. Thedecision tree’s strategy is represented in Figure 5. Note that

FIGURE 5. General representation of the decision tree strategy.

TABLE 4. Comparison of accuracy rate before and after improvementwith models 1.

TABLE 5. The accuracy rate of the validation phase.

the best number of neighbors k and the selected distance typeswere employed in the following study.

Elaborating the whole of the previous input vectors,an analysis has been made to compute the accuracy rateat each node. Table 3 illustrates a detailed analysis of theaccuracy rate leading to selecting the more appropriate inputvector and neighbor number for each node.

81698 VOLUME 9, 2021

Page 7: Accuracy Improvement of Power Transformer Faults

O. Kherif et al.: Accuracy Improvement of Power Transformer Faults Diagnostic Using KNN Classifier

TABLE 6. Comparison between the proposed algorithm and other traditional methods.

From these results, one can clearly see that the inputvectors 1, 2, 3, 7, and 9 separate the electrical and thermalfaults with the same accuracy rate of 98.75%. Indeed, bothnodes of electrical faults are suitable with the input vectorin ppm. Regarding the thermal faults node, the first node(T3 vs. T1&T2) is more compatible with both input vectors,in percentage, and the combined Duval triangle and pen-tagon. The accuracy rate in this stage reaches 93.24%. Thesecond thermal node is consistent with the combined Duvaltriangle and pentagon input vector, with an accuracy rateof 91.67%.

Based on the results of Table 3 as well as the accuracy rateby nodes and type of fault, it can be concluded that the use ofthe input vectors 2 and 9 in the same algorithm can improvethe overall accuracy of the diagnostic. This reasoning can berecapitulated in the flowchart given by Figure 6 (denoted bymodel 1).

Table 4 shows the diagnostic accuracy rate of the KNNalgorithm when using the input vector of DGA in ppm andthe combined Duval triangle and pentagon input vectors and

the proposed enhancement of diagnostic accuracy of theKNNalgorithm by the strategy of a decision tree.

One can see that the diagnostic accuracy rate achieved avalue of 92.5% with the proposed method. It was 91.88%with the input vector of DGA in ppm and 90.63% with thecombined Duval triangle and pentagon input vector whenused alone (Table 3). The first thermal fault node detectedmany cases by T3, but the actual fault was T2. Therefore,another model has been proposed, as in Figure 7, to overcomethis situation.

The second proposed model shows a high accuracy rateof 93.75% compared with the previous application given byTable 4. This imply that the second model is more accurate.For further verification, all proposed models are tested by anindependent new dataset of 20 samples. Table 5 shows theresults of both proposed models along with those related tothe KNN classifier with input vectors 2 and 9.

For the validation phase, the results of Table 5 confirm thatthe second proposed model (model 2) gave the best accuracyrate compared to the other ones. An accuracy rate of 90%

VOLUME 9, 2021 81699

Page 8: Accuracy Improvement of Power Transformer Faults

O. Kherif et al.: Accuracy Improvement of Power Transformer Faults Diagnostic Using KNN Classifier

FIGURE 6. Proposed flowchart of improved diagnostic accuracy (Model 1).

FIGURE 7. Proposed flowchart of improved diagnostic accuracy (Model 2).

was obtained against 80%when employing the first proposed(model 1). Both models are more accurate than the traditionalmethods of power transformer diagnosis as shown in Table 5.

Table 6 illustrates the comparison between the KNN-DT(with model 2) diagnosis results and the Duval triangle,Rogers ratios, and IEC 60 599 methods, the common DGAmethods in the literature. The results in Table 6 explained thehigh performance of the KNN- DT for correctly diagnose oftransformer faults where the KNN-DT accuracy succeededin detecting 18 samples of 20 with 90% accuracy. On theother hand, the other three methods developed poor diag-nostic accuracies, which are 30% (6/20), 25% (4/20), and30% (6/20) for Duval triangle, Rogers’ ratio, and IEC code,

respectively. Therefore, the KNN-DT has high reliability todiagnose transformer faults.

VI. CONCLUSIONOil-immersed power transformer fault diagnosis was inves-tigated using different DGA methods, in which a databaseset of 501 samples was exploited. A decision tree algorithmwas improved the KNN classifier to enhance the accuracyof the transformer faults diagnostic. The neighbor’s numberand distance type of the KNN algorithm were optimized toimprove the classifier’s accuracy rate. Several input vectorswere assessed for each fault to select the most appropriatevector associated with this type of fault for the combina-tion of the KNN classifier with the decision tree principle.The obtained results were discussed, and two models wereproposed in order to improve the global accuracy rate. Bothproposed models confirmed their accurateness where theglobal accuracy rate exceeded 93% for the power transformerdiagnosis. A complementary validation phase of the proposedresearch was also considered using an independent database.

ACKNOWLEDGMENTThe authors would like to acknowledge the financial sup-port received from Taif University Researchers SupportingProject Number (TURSP-2020/34), Taif University, Taif,Saudi Arabia.

REFERENCES[1] IEEE Guide for the Interpretation of Gases Generated in Mineral Oil-

Immersed Transformers, IEEE Standard C57.104-2019 (Revision IEEEStd C57.104-2008), Nov. 2019, pp. 1–98.

[2] M. Duval, ‘‘Dissolved gas analysis: It can save your transformer,’’ IEEEElect. Insul. Mag., vol. 5, no. 6, pp. 22–27, Nov./Dec. 1989.

[3] S. S. M. Ghoneim, ‘‘Intelligent prediction of transformer faults and sever-ities based on dissolved gas analysis integrated with thermodynamicstheory,’’ IET Sci., Meas. Technol., vol. 12, no. 3, pp. 388–394, May 2018.

[4] I. B. M. Taha, A. Hoballah, and S. S. M. Ghoneim, ‘‘Optimal ratio limitsof rogers’ four-ratios and IEC 60599 code methods using particle swarmoptimization fuzzy-logic approach,’’ IEEE Trans. Dielectr. Electr. Insul.,vol. 27, no. 1, pp. 222–230, Feb. 2020.

[5] Recent Developments in DGA Interpretation, CIGRE, Paris, France,Jun. 2006.

[6] R. R. Rogers, ‘‘IEEE and IEC codes to interpret incipient faults in trans-formers, using gas in oil analysis,’’ IEEE Trans. Electr. Insul., vol. EI-13,no. 5, pp. 348–354, Oct. 1978.

[7] Y. Benmahamed, M. Teguar, and A. Boubakeur, ‘‘Application of SVMand KNN to duval pentagon 1 for transformer oil diagnosis,’’ IEEE Trans.Dielectr. Electr. Insul., vol. 24, no. 6, pp. 3443–3451, Dec. 2017.

[8] Mineral Oil-Impregnated Electrical Equipment in Service-Guide to theInterpretation of Dissolved and Free Gases Analysis, IEC, Geneva,Switzerland, 2007.

[9] M. Duval and L. Lamarre, ‘‘The duval pentagon—A new complementarytool for the interpretation of dissolved gas analysis in transformers,’’ IEEEElect. Insul. Mag., vol. 30, no. 6, pp. 9–12, Nov./Dec. 2014.

[10] Y. Benmahamed, Y. Kemari, M. Teguar, and A. Boubakeur, ‘‘Diagnosisof power transformer oil using KNN and nave bayes classifiers,’’ inProc. IEEE 2nd Int. Conf. Dielectr. (ICD), Budapest, Hungary, Jul. 2018,pp. 1–4.

[11] S. S. M. Ghoneim, K. Mahmoud, M. Lehtonen, and M. M. F. Darwish,‘‘Enhancing diagnostic accuracy of transformer faults using teaching-learning-based optimization,’’ IEEE Access, vol. 9, pp. 30817–30832,Feb. 2021.

[12] Y. Benmahamed, M. Teguar, and A. Boubakeur, ‘‘Diagnosis of powertransformer oil using PSO-SVM and KNN classifiers,’’ in Proc. Int. Conf.Electr. Sci. Technol. Maghreb (CISTEM), Algiers, Algeria, Oct. 2018,pp. 28–31.

81700 VOLUME 9, 2021

Page 9: Accuracy Improvement of Power Transformer Faults

O. Kherif et al.: Accuracy Improvement of Power Transformer Faults Diagnostic Using KNN Classifier

[13] W. Chen, C. Pan, Y. Yun, and Y. Liu, ‘‘Wavelet networks in power trans-formers diagnosis using dissolved gas analysis,’’ IEEE Trans. Power Del.,vol. 24, no. 1, pp. 187–194, Jan. 2009.

[14] H. Malik and S. Mishra, ‘‘Application of gene expression programming(GEP) in power transformers fault diagnosis using DGA,’’ IEEE Trans.Ind. Appl., vol. 52, no. 6, pp. 4556–4565, Nov./Dec. 2016.

[15] Z. J. Richardson, J. Fitch, W. H. Tang, J. Y. Goulermas, and Q. H. Wu,‘‘A probabilistic classifier for transformer dissolved gas analysis witha particle swarm optimizer,’’ IEEE Trans. Power Del., vol. 23, no. 2,pp. 751–759, Apr. 2008.

[16] Z. Yongli, H. Limin, and L. Jinling, ‘‘Bayesian networks-based approachfor power systems fault diagnosis,’’ IEEE Trans. Power Del., vol. 21, no. 2,pp. 634–639, Apr. 2006.

[17] Q. Su, C. Mi, L. L. Lai, and P. Austin, ‘‘A fuzzy dissolved gas analysismethod for the diagnosis of multiple incipient faults in a transformer,’’IEEE Trans. Power Del., vol. 15, no. 2, pp. 593–798, May 2000.

[18] A. Shintemirov, W. H. Tang, and Q. H. Wu, ‘‘Power transformer fault clas-sification based on dissolved gas analysis by implementing bootstrap andgenetic programming,’’ IEEE Trans. Power Del., vol. 39, no. 1, pp. 69–79,Jan. 2009.

[19] T. Cover and P. Hart, ‘‘Nearest neighbor pattern classification,’’ IEEETrans. Inf. Theory, vol. IT-13, no. 1, pp. 21–27, Jan. 1967.

[20] Y.-H. Shih and C.-K. Ting, ‘‘Evolutionary optimization on k-nearestneighbors classifier for imbalanced datasets,’’ in Proc. IEEECongr. Evol. Comput. (CEC), Wellington, New Zealand, Jun. 2019,pp. 3348–3355.

[21] S. Liu, P. Zhu, and S. Qin, ‘‘An improved weighted KNN algorithm forimbalanced data classification,’’ in Proc. IEEE 4th Int. Conf. Comput.Commun. (ICCC), Chengdu, China, Dec. 2018, pp. 1814–1819.

[22] Y. Cui, H. Ma, and T. Saha, ‘‘Improvement of power transformer insu-lation diagnosis using oil characteristics data preprocessed by SMOTE-Boost technique,’’ IEEE Trans. Dielectr. Electr. Insul., vol. 21, no. 5,pp. 2363–2373, Oct. 2014.

[23] K. Li, P. Xie, J. Zhai, and W. Liu, ‘‘An improved AdaBoost algorithm forimbalanced data based on weighted KNN,’’ in Proc. IEEE 2nd Int. Conf.Big Data Anal. (ICBDA), Beijing, China, Mar. 2017, pp. 30–34.

[24] P. Mirowski and Y. LeCun, ‘‘Statistical machine learning and dissolvedgas analysis: A review,’’ IEEE Trans. Power Del., vol. 27, no. 4,pp. 1791–1799, Oct. 2012.

[25] N. Bakar, A. Abu-Siada, and S. Islam, ‘‘A review of dissolved gas analysismeasurement and interpretation techniques,’’ IEEE Elect. Insul. Mag.,vol. 30, no. 3, pp. 39–49, May/Jun. 2014.

[26] K. S. Nagapriya, S. Sinha, R. Prashanth, S. Poonacha, G. Chaudhry,A. Bhattacharya, N. Choudhury, S. Mahalik, and S. Maity, ‘‘Lasercalorimetry spectroscopy for ppm-level dissolved gas detection and anal-ysis,’’ Sci. Rep., vol. 7, no. 1, Feb. 2017, Art. no. 42917.

[27] E. Mathieu-Dupas, ‘‘Algorithme des k plus proches voisins pondérés etapplication en diagnostic,’’ 42èmes Journées de Statistique, Marseille,France, Tech. Rep. inria-00494814, 2010, pp. 1–24.

[28] J. Dukarm and F. Jakob, ‘‘Thermodynamic estimation of transformer faultseverity,’’ in Proc. IEEE/PES Transmiss. Distrib. Conf. Expo. (T&D),Dallas, TX, USA, May 2016, p. 1.

[29] J. Soni and D. Suthar, ‘‘An experimental analysis to check accuracy ofDGA using duval pentagonal method in power transformer,’’ in Proc.Int. Conf. Res. Innov. Sci., Eng. Technol. (ICRISET), vol. 1, Aug. 2017,pp. 394–401.

OMAR KHERIF (Member, IEEE) received the M.Eng. and Ph.D. degreesin electrical engineering from the École Nationale Polytechnique (ENP),in 2015 and 2019, respectively. He was a Mathematical-Analysis AssistantTeacher with ENP, from 2016 to 2018. He was also a Volunteer Researcherwith the Laboratoire de Recherche en Electrotechnique, ENP, where heco-supervised several master’s theses. He is currently a KTP Associate withCardiff University. His research interests include high voltage techniques,electromagnetic transients, and earthing and power system protection.

YOUCEF BENMAHAMED (Member, IEEE) received the engineer’s andmaster’s degrees in power electronics engineering, in 2014, and the Ph.D.degree in high voltage techniques from the École Nationale Polytech-nique (ENP) of Algiers, in 2019. His research interests include diagnosis,artificial intelligence, and optimization techniques.

MADJID TEGUAR received the degree in electrical engineering, the mas-ter’s degree, and the Ph.D. degree in high voltage engineering from theÉcole Nationale Polytechnique (ENP) of Algiers, in 1990, 1993, and 2003,respectively. He is currently a Professor of Electrical Engineering with ENP.His research interests include insulation systems, insulation coordination,earthing of electrical energy systems, and polymeric cables insulation.

AHMED BOUBAKEUR (Senior Member, IEEE) was born in Biskra,Algeria, in 1952. He received the engineer’s degree in electrical engineeringfrom the École Nationale Polytechnique (ENP) of Algiers, in 1975, andthe Ph.D. degree in technical sciences from the Institute of High VoltageEngineering, Warsaw University of Technology, Poland, in 1979. He iscurrently a Professor with the ENP of Algiers, where he has been givinglecturers and supervising research in the field of high voltage engineering,since 1982. His research interests include discharge phenomena, insulatorspollution, lightning, polymeric cables insulation, transformer oil ageing,neural networks and fuzzy logic application in HV insulation diagnosis, andelectric field calculation and measurement. He is also a member of IEEEDEIS and the Algerian HV Power Systems Association ARELEC (NationalAlgerian Comity of CIGRE and ENP Elders Association ADEP). He hasbeen amember of the Editorial Board and anAssociate Editor of IET Science,Measurement & Technology (SMT).

SHERIF S. M. GHONEIM (Senior Member, IEEE) received the B.Sc.and M.Sc. degrees from the Faculty of Engineering at Shoubra, ZagazigUniversity, Egypt, in 1994 and 2000, respectively, and the Ph.D. degreein electrical power and machines from the Faculty of Engineering, CairoUniversity, in 2008. Since 1996, he has been teaching with the Faculty ofIndustrial Education, Suez Canal University, Egypt. From 2005 to 2007,he was a Guest Researcher with the Institute of Energy Transport and Storage(ETS), University of Duisburg–Essen, Germany. He joined Taif Universityas an Associate Professor at the Electrical Engineering Department, Facultyof Engineering. His research interests include grounding systems, dissolvedgas analysis, breakdown in SF6 gas, and AI technique applications.

VOLUME 9, 2021 81701