Function analysis based rule extraction from artificial neural networks for transformer incipient fault diagnosis

Electrical Power and Energy Systems 43 (2012) 1196–1203


Deepika Bhalla a,⇑, Raj Kumar Bansal b, Hari Om Gupta c

a Department of Electrical & Electronics Engineering, Institute of Engineering & Technology, Bhaddal, Punjab, India
b H.H. Gardens, Sriganganagar 335 001, Rajasthan, India
c J.P. Institute of Information Technology, Noida, India

Article info

Article history: Received 14 May 2011; Received in revised form 7 June 2012; Accepted 9 June 2012; Available online 16 July 2012

Keywords: Artificial neural networks; Knowledge rule extraction; Transformer fault diagnosis

0142-0615/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.ijepes.2012.06.042

⇑ Corresponding author. Mobile: +91 7837336470; fax: +91 1881 244749.
E-mail addresses: [email protected] (D. Bhalla), bansalrajk2009@gmail.com (R.K. Bansal), [email protected] (H.O. Gupta).

Abstract

Dissolved gas analysis (DGA) has been widely used for fault diagnosis in transformers. Artificial neural networks (ANNs) have high accuracy but are regarded as black boxes that are difficult to interpret. For many problems it is desirable to extract knowledge from a trained ANN so that the user can gain a better understanding of the solution arrived at by the NN. This paper applies a pedagogical approach to rule extraction from function approximating ANNs, with application to incipient fault diagnosis using the concentrations of the gases dissolved in the transformer oil as the inputs. The proposed method derives linear equations by approximating the hidden unit activation function and splitting the input space into subregions, with one linear equation for each subregion. Experiments on real data indicate that the approach can extract simple and useful rules. The resulting transformer incipient fault diagnosis matches the actual fault present, and at times its predictions are better than those of the IEC/IEEE method. The accuracy of the predictions of the generated rule sets has been successfully checked by applying them to case studies.


1. Introduction

The static transformer is undoubtedly one of the items in the power system least subject to breakdown. However, faults do occur from time to time. They can be classified into two types: internal incipient faults and short-circuit faults. Incipient faults are caused by deterioration and aging of the insulation, which result from thermal, electrical and mechanical stresses and from moisture. The methods used to evaluate the deteriorating material and the aging processes are the degree of polymerization, furan analysis of the paper, and DGA of the transformer oil.

DGA is widely used by utilities for monitoring the health of oil-filled transformers. Incipient faults in transformers give evidence very early in their development through transformer oil gas analysis. The gas concentrations (in ppm) are obtained by extracting the gases dissolved in the oil and then separating them by chromatography. The gases typically found in transformer insulating oil are nitrogen (N2), oxygen (O2), hydrogen (H2), carbon dioxide (CO2), carbon monoxide (CO), methane (CH4), ethane (C2H6), ethylene (C2H4), and acetylene (C2H2). The prominent source of N2 and O2 is the atmosphere, and the prominent source of H2 is partial discharge. CO2 is present in the oil because of overheated cellulose, besides being a constituent of the atmosphere. CO is attributed to overheated cellulose and air pollution, overheated oil is responsible for CH4 and C2H6, C2H4 is due to very overheated oil, and C2H2 is due to arcing in the oil. The faults are classified into three main types: partial discharge (PD) or corona, thermal heating, and discharge or arcing. The intensity of energy dissipation is most severe with arcing, less with heating, and least with PD.

Depending on its translucency, rule extraction from neural networks can be done either by a function analysis based method or by an architectural analysis based method. In 2005, Adrian Rosa et al. extracted knowledge from ANNs for transformer failure diagnosis. They mapped an ANN into a rule-based Takagi–Sugeno fuzzy inference system and used an architectural analysis based approach to make explicit the knowledge implicitly captured by the ANN during the learning stage. The percentage concentrations of H2, C2H4 and C2H2 were used as the three inputs to the ANN, and the transformer faults were classified as discharge, partial discharge, and thermal fault [1]. Amora et al. in 2009 also extracted decompositional rules from ANNs for the analysis of transformers [2]. Both of these methods derived fuzzy rules and are architectural analysis based methods.

This work uses rule extraction from function approximating neural networks (REFANN), the method proposed by Setiono et al. for nonlinear function approximation or regression [3]. REFANN is a function analysis based approach that works on a network with a single hidden layer and one linear output unit. The continuous activation function of the hidden units is approximated by a piecewise linear function. This approximation divides the input space into subregions such that the function values of all inputs in the same subregion can be computed as a linear function of the inputs.

After the ANN is trained, the knowledge embedded in it is extracted using REFANN, and the resulting rule sets are compared on the basis of the mean absolute error. The approach is based on the existing classification criteria of the IEC/IEEE standards. These standards are extensively used to compare the results of the various AI techniques applied to handle the uncertainties in fault diagnosis.

This paper is organized as follows. Section 2 introduces DGA and the IEC/IEEE method and proposes a diagnostic system. Section 3 discusses rule extraction from trained neural networks and describes the piecewise linear approximation of the activation function along with the rule extraction algorithm. Section 4 details the two ANNs used, discusses their results, extracts rule sets, and finally checks them by application to case studies. The conclusion is given in Section 5.

Table 1
IEC/IEEE method range & ratio codes.

Range | Ratio C2H2/C2H4 code | Ratio CH4/H2 code | Ratio C2H4/C2H6 code
<0.1 | 0 | 1 | 0
0.1–1.0 | 1 | 0 | 0
1.0–3.0 | 1 | 2 | 1
>3.0 | 2 | 2 | 2

Table 2
Diagnostic table for IEC/IEEE method.

Ratio C2H2/C2H4 code (R1) | Ratio CH4/H2 code (R2) | Ratio C2H4/C2H6 code (R3) | Characteristic fault type
0 | 0 | 0 | No fault
0 | 0 | 1 | Low temperature thermal fault < 150 °C
0 | 2 | 0 | Low temperature thermal fault 150–300 °C
0 | 2 | 1 | Medium temperature thermal fault 300–700 °C
0 | 2 | 2 | High temperature thermal fault > 700 °C
0* | 1 | 0 | Low energy partial discharge
1 | 1 | 0 | High energy partial discharge
1–2 | 0 | 1–2 | Low energy discharge
1 | 0 | 2 | High energy discharge

* Not significant.


2. Dissolved gas analysis

2.1. Introduction

The methods used for DGA by industry and utilities include IEEE Std. C57.104:1991 [4], IEC Std. 60599:1999 [5], Duval's triangle method [6], the CIGRE method [7] and the Nomograph method [8]. The first two are key-gas ratio methods. The CIGRE method combines the key-gas ratio method with the gas concentration method, while Duval's triangle and the Nomograph are graphical methods. The IEEE standard gives the Key Gas method, Doernenburg's ratio method and Rogers' ratio method to evaluate the possible faults. The ratio methods consider all four, or three, of the ratios CH4/H2, C2H2/C2H4, C2H4/C2H6, and C2H6/C2H2. In IEEE Std. C57.104:1991 the ratio C2H6/C2H2 is used only to subclassify the temperature ranges of thermal faults, and IEC Std. 60599:1999 ignores it. The ratio methods have an advantage over the graphical methods because they are independent of the volume of the gas dissolved. However, they do not cover all ranges of the data and produce a significant number of unknown or missing interpretations. This is due to the incompleteness of the ratio combinations and to uncertainty about the validity of the defined key-gas ratio ranges. There is a high degree of inconsistency and ambiguity. Moreover, multiple faults may occur concurrently within a transformer, and none of these methods can detect them. Because they lack embedded expert knowledge, these schemes are also unable to detect new or unknown faults.

Fuzzy systems built from the DGA methods and expert knowledge have been developed for transformer fault diagnosis [9–15], and ANNs have also been applied to transformer fault diagnosis using DGA [16–23]. The IEC/IEEE method has been widely used as the reference against which the results of AI techniques for transformer fault diagnosis are compared [1,7,10–12,14,17,18,21,24,25]. DGA is also used for online assessment of transformer condition [26,27], and new offline methods based on sweep frequency response analysis are now extensively used [28].

Fig. 1. Structure of the diagnostic system.

2.2. IEC/IEEE method & DGA based diagnostic system

The IEC/IEEE method diagnoses faults through a simple coding scheme based on ranges of gas ratios. Table 1 shows the gas-ratio ranges and the code assigned to each range.
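The coding step of Table 1 can be sketched in Python as follows, assuming the concentrations are given in consistent units. The names `RANGE_CODES`, `ratio_code` and `iec_codes` are illustrative and do not come from the paper. Two further choices are our own assumptions: how a ratio lying exactly on a range boundary is coded, and returning None when a denominator gas is zero.

```python
# Sketch of the IEC/IEEE ratio coding of Table 1 (names are hypothetical).
# Assumed boundary handling: lower bounds inclusive, 3.0 stays in "1.0-3.0".
RANGE_CODES = {
    # ratio: codes for the ranges (<0.1, 0.1-1.0, 1.0-3.0, >3.0)
    "C2H2/C2H4": (0, 1, 1, 2),
    "CH4/H2": (1, 0, 2, 2),
    "C2H4/C2H6": (0, 0, 1, 2),
}


def ratio_code(ratio, codes):
    """Return the Table 1 code of a single gas ratio."""
    if ratio < 0.1:
        return codes[0]
    if ratio < 1.0:
        return codes[1]
    if ratio <= 3.0:
        return codes[2]
    return codes[3]


def iec_codes(h2, ch4, c2h4, c2h6, c2h2):
    """Return the codes (R1, R2, R3), or None if a denominator gas is zero."""
    if c2h4 == 0 or h2 == 0 or c2h6 == 0:
        return None  # ratio undefined (e.g. H2 absent), so no code can be given
    return (
        ratio_code(c2h2 / c2h4, RANGE_CODES["C2H2/C2H4"]),
        ratio_code(ch4 / h2, RANGE_CODES["CH4/H2"]),
        ratio_code(c2h4 / c2h6, RANGE_CODES["C2H4/C2H6"]),
    )


print(iec_codes(1071, 439, 581, 48, 227))  # case II/5 of Table 4 -> (1, 0, 2)
```

Take case II/5 of Table 4 (H2 = 1071, CH4 = 439, C2H4 = 581, C2H6 = 48, C2H2 = 227). Its ratios are about 0.39, 0.41 and 12.1, which give the codes (1, 0, 2).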


Table 2 shows the diagnostic table of the IEC/IEEE method. It classifies faults into four types of thermal fault and two types each of PD and discharge, in addition to the no-fault condition. Fig. 1 shows the proposed structure of the diagnostic system used in this work. The diagnostic system does not use the rule set if the IEC/IEEE method identifies "no fault".
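The Table 2 lookup and the gating rule of the diagnostic system can be sketched the same way. Several details are assumptions made for illustration: the row encoding, the "No diagnosis" fallback, and the tie-break for the code (1, 0, 2), which matches both discharge rows of Table 2. The function names are hypothetical.

```python
# Sketch: Table 2 lookup plus the "no fault => skip rule set" gating rule.
# Each row lists the allowed codes for (R1, R2, R3); "1-2" becomes {1, 2}.
# Assumptions: the R1 entry "0*" (not significant) is read literally as 0,
# and the more specific "1 0 2" row is checked before the "1-2 0 1-2" row.
DIAGNOSTIC_TABLE = [
    (({0}, {0}, {0}), "No fault"),
    (({0}, {0}, {1}), "Low temperature thermal fault < 150 °C"),
    (({0}, {2}, {0}), "Low temperature thermal fault 150-300 °C"),
    (({0}, {2}, {1}), "Medium temperature thermal fault 300-700 °C"),
    (({0}, {2}, {2}), "High temperature thermal fault > 700 °C"),
    (({0}, {1}, {0}), "Low energy partial discharge"),
    (({1}, {1}, {0}), "High energy partial discharge"),
    (({1}, {0}, {2}), "High energy discharge"),
    (({1, 2}, {0}, {1, 2}), "Low energy discharge"),
]


def iec_diagnosis(codes):
    """Return the characteristic fault for codes (R1, R2, R3)."""
    if codes is None:
        return "No diagnosis"
    for (r1, r2, r3), fault in DIAGNOSTIC_TABLE:
        if codes[0] in r1 and codes[1] in r2 and codes[2] in r3:
            return fault
    return "No diagnosis"  # combination not covered by Table 2


def rule_set_needed(iec_result):
    """The rule set is not consulted when the IEC/IEEE method reports no fault."""
    return iec_result != "No fault"


print(iec_diagnosis((1, 0, 2)))  # High energy discharge
```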

3. Rule extraction from neural networks

ANNs have been successfully applied to a variety of classification and function approximation problems. However, a major impediment to wider acceptance of neural networks is that they cannot explain to the user how a particular decision is made. One remedy is to make the knowledge hidden in a trained neural network explicit by extracting rules from it. Rule extraction from trained neural networks originates in Gallant's work, which exploited the order of the available attributes [29]. Rule extraction algorithms are classified according to network structure, expressive power of the rules, translucency, complexity and scalability, portability, and quality of the rules [3,30–33].

Network Structure: The type of network structure to which the extraction algorithm can be applied. In the available research, most algorithms are applied to three-layer feedforward networks, and only a few can be applied to recurrent neural networks [33].

Expressive power of the rules: Whether the rules are expressed in propositional (Boolean) logic, i.e. non-fuzzy or crisp, or in non-conventional logic, i.e. probabilistic or fuzzy.

Translucency: The granularity of the explanation features, i.e. the level of detail of the hypotheses and evidence the system can provide with each of its output decisions. There are two main types of translucency: the function analysis based approach and the architecture analysis based approach.

The function-analysis-based approach: This approach does not disassemble the architecture of the trained ANN. It regards the network as a whole and tries to extract rules that explain its function. It is also called the pedagogical approach: the ANN is treated as a black box, and the extracted rules describe the global relationship between its input and output variables [3,34–37]. In practice, samples are generated from the trained neural network and rules are induced from these samples. Because such an algorithm presents only the result of the whole network, what individual units have learned cannot be understood. Pedagogical approaches such as the artificial neural network-decision tree (ANN-DT) directly extract the global relationship between the input and output of the network without investigating the hidden unit activations.

The architectural-analysis-based approach: The search process maps the architecture of a trained neural network to a set of rules [34,38–45]. Some researchers also call this the decompositional approach. The rules are extracted directly from the numerical values of the network, such as the activation values of the hidden and output neurons and the connection weights between them. That is, rules are extracted from each unit of the ANN and then aggregated. Because every unit of the NN is considered, the main result of each unit can be understood. In the literature these are also called link rule extraction (LRE) techniques. The algorithm first searches for weighted links that cause a node (hidden or output) to be "active", and these combinations of weighted links are then used to generate symbolic rules [45].

When an approach includes elements of both the architectural analysis and the function analysis based approaches, it is called eclectic. It is also called the black box based rule approach, because the rules are extracted from feedforward networks by examining their input-output mapping behavior, regardless of the type and structure of the neural network [45].

Complexity and Scalability: Most algorithms have exponential complexity. In pedagogical algorithms, the total number of samples generated from a trained neural network is 2^n, where n is the number of inputs to the network, and most decompositional algorithms also have exponential computational complexity. This criterion concerns the computational issues that arise with large data sets and rule bases, where a curse of dimensionality leads to an explosion in the number of rules.

Portability: The capability of the rule extraction algorithm to extract rules from different network architectures, i.e. the extent to which it depends on the underlying ANN incorporating specialized training regimes.

Quality of the rules: These are judged by:

Generality: i.e. the generalization to test samples, and also the accuracy of the rules.
Fidelity: i.e. whether the rules can mimic the behavior of the ANN from which they were generated, that is, their ability to faithfully represent the embedded knowledge.
Comprehensiveness: i.e. the amount of embedded knowledge captured by the rules, measured in terms of the size of the rule set and the number of antecedents per rule.
Consistency: i.e. whether the rules produce the same classification of test instances over different training instances.
Modifiability: i.e. the ability of the rules to be updated when the corresponding trained network architecture is updated or retrained with different datasets.
Stability or robustness: i.e. how insensitive the method is to corruptions in the training data or in the initial domain knowledge.
Theory refinement capability: i.e. whether the method can alleviate the knowledge acquisition bottleneck caused by incompleteness, inconsistency, and/or inaccuracy of the initially acquired domain knowledge.

Most published reports have focused on extracting symbolic rules for classification problems [41,42,45–50]. Rule extraction methods for regression include ANN-DT [35] and REFANN [3]. ANN-DT can extract rules from function approximating networks. It produces decision trees from the network inputs and the corresponding outputs, without analyzing the activation values of hidden units or the connection weights of the network. The REFANN method derives rules from minimum-size networks. It prunes units from the original network and extracts linear rules by approximating the hidden unit activation functions with piecewise linear functions. Yan et al. and Miradi et al. have applied this algorithm to real-life data with limited success [51,52].

Every rule extraction method identified has some drawback, and none of them is exact. The drawbacks include an explosion in the number of rules, the degree of approximation, limited applicability, low fidelity, poor generalization, and poor transparency. The more hidden units the network has, the more rules are extracted from it, so an appropriate number of hidden units must be chosen to balance rule accuracy against rule simplicity. The literature proposes two general approaches to choosing this number: constructive and destructive algorithms. A constructive algorithm starts with a few hidden units and keeps adding units to improve network accuracy [53–56]. A destructive algorithm starts with a large number of hidden units and removes those found to be redundant. The number of useful input units corresponds to the number of relevant input attributes of the data. Most algorithms start by assigning one input unit to each attribute and training the network with all input attributes. They then remove the input units that correspond to irrelevant attributes [57,58].

Fig. 2. The tanh(x) function (solid curve) for x ∈ [0, xm] is approximated by a two-piece linear function (dashed lines).

A neural network based data mining approach has three major phases: network construction and training, network pruning, and rule extraction. The first phase constructs and trains a three-layer neural network based on the number of attributes, the number of classes and the chosen input coding method. The second phase prunes redundant links and units without increasing the classification error rate of the network. Because few units and links remain after pruning, concise and comprehensible rules can be extracted, and the third phase extracts the classification rules from the pruned network [58].

3.1. Network training and approximating the hidden unit activation function

The data samples are first randomly divided into three subsets: training, cross-validation and testing. Let $(I_p, y_p)$, $p = 1, 2, \ldots, K$, be the K available patterns, where $I_p \in \mathbb{R}^N$ is the input and $y_p \in \mathbb{R}$ is the target. A network with H hidden units is trained on the training data set. Let $w_{ij}$ be the weight of the connection from input unit j to hidden unit i, and $v_i$ the weight of the connection from hidden unit i to the output unit. The activation value $A_{ip}$ of hidden unit i for input $I_p$ is then:

$$A_{ip} = h\left(\sum_{j=1}^{N} w_{ij} I_{jp}\right) = h(s_{ip}) \qquad (1)$$

and the predicted function value $\tilde{y}_p$ is computed by:

$$\tilde{y}_p = \sum_{i=1}^{H} v_i A_{ip} \qquad (2)$$

Here $I_{jp}$ is the value of input j for pattern p, and h(x) is the hidden unit activation function, which may be a sigmoid or a hyperbolic tangent function. The proposed method uses the hyperbolic tangent function $\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$.
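The following pure-Python forward pass makes Eqs. (1) and (2) concrete. Like Eq. (1), it assumes there are no separate bias terms. The example weights are hypothetical.

```python
import math


def forward(W, v, I):
    """Forward pass of Eqs. (1)-(2) for one input pattern I.

    W[i][j] is w_ij (input j -> hidden unit i) and v[i] is v_i
    (hidden unit i -> output). No bias terms, matching Eq. (1) as written.
    Returns the weighted sums s_ip, the activations A_ip and the output y~_p.
    """
    s = [sum(w_ij * I_j for w_ij, I_j in zip(row, I)) for row in W]  # s_ip
    A = [math.tanh(s_i) for s_i in s]                                  # A_ip = h(s_ip)
    y_tilde = sum(v_i * A_i for v_i, A_i in zip(v, A))                 # Eq. (2)
    return s, A, y_tilde


# Hypothetical 2-input, 2-hidden-unit network, for illustration only.
s, A, y = forward([[0.5, -1.0], [2.0, 0.0]], [1.0, -0.5], [1.0, 0.5])
print(s)  # [0.0, 2.0]
```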

The network weights are used to extract rules that explain the network outputs as a collection of linear equations. The first step approximates the activation function h(x) = tanh(x) by a piecewise linear function. Rules are then generated from the weighted sums of the inputs, and finally the oblique hyperplanes are replaced by hyperplanes parallel to the axes.

3.1.1. Three-piece linear approximation

The hidden unit activation function h(x) is antisymmetric. For inputs x from zero to $x_m$, h(x) can be simply and conveniently approximated by the piecewise linear function L(x) shown in Fig. 2. The left line passes through the origin (0, 0) with gradient $h'(0) = 1$. The right line passes through the point $(x_m, h(x_m))$ with gradient $h'(x_m) = 1 - h^2(x_m)$. Both lines are tangents to h, which is concave on $[0, x_m]$, so they lie above it, and L(x) ≥ h(x) everywhere between zero and $x_m$.

Thus, L(x) can be written as:

$$L(x) = \begin{cases} x & \text{if } 0 \le x \le x_0 \\ h'(x_m)(x - x_m) + h(x_m) & \text{if } x > x_0 \end{cases} \qquad (3)$$

The two line segments intersect at the point $x_0$. Setting $x_0 = h'(x_m)(x_0 - x_m) + h(x_m)$ and using $1 - h'(x_m) = h^2(x_m)$ gives:

$$x_0 = \frac{h(x_m) - x_m h'(x_m)}{h^2(x_m)} \qquad (4)$$
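Eqs. (3) and (4) can be checked numerically. The sketch below uses helper names of our own and reproduces the $x_0$ values reported in Section 4.2 for $x_m = 1.82009$ and $x_m = 0.55614$.

```python
import math


def h(x):
    """Hidden unit activation h(x) = tanh(x)."""
    return math.tanh(x)


def h_prime(x):
    """Derivative of tanh: h'(x) = 1 - tanh(x)**2."""
    return 1.0 - math.tanh(x) ** 2


def breakpoint_x0(xm):
    """Eq. (4): where L(x) = x meets the tangent to h at xm."""
    return (h(xm) - xm * h_prime(xm)) / h(xm) ** 2


def two_piece_L(x, xm):
    """Eq. (3): two-piece linear approximation of h on [0, xm]."""
    if x <= breakpoint_x0(xm):
        return x
    return h_prime(xm) * (x - xm) + h(xm)


print(round(breakpoint_x0(1.82009), 5))  # 0.85238 (Experiment 1)
print(round(breakpoint_x0(0.55614), 5))  # 0.35612 (Experiment 2)
```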

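Approximating h(x) by L(x) introduces an error. The total error $E_A$ is given by:

$$E_A = \int_0^{x_m} \big(L(x) - h(x)\big)\,dx = \frac{1}{2}\left[x_0^2 + (x_m - x_0)\big(x_0 + h(x_m)\big)\right] - \ln\cosh x_m \;\to\; -\frac{1}{2} - \ln 0.5 \quad \text{as } x_m \to \infty \qquad (5)$$

i.e. the total error is bounded by a constant value, $-\tfrac{1}{2} - \ln 0.5 = \ln 2 - \tfrac{1}{2} \approx 0.193$.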
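As a sanity check on Eq. (5), the closed-form error can be compared with a direct numerical integral of L(x) − tanh(x). The sketch uses composite Simpson's rule, which is our choice of quadrature and not the paper's. It splits the interval at the kink $x_0$ so that each piece is smooth. The helper names are hypothetical.

```python
import math


def x0_of(xm):
    """Eq. (4)."""
    h, hp = math.tanh(xm), 1.0 - math.tanh(xm) ** 2
    return (h - xm * hp) / h ** 2


def error_closed_form(xm):
    """Eq. (5): E_A = 1/2 [x0^2 + (xm - x0)(x0 + h(xm))] - ln cosh(xm)."""
    x0, h = x0_of(xm), math.tanh(xm)
    return 0.5 * (x0 ** 2 + (xm - x0) * (x0 + h)) - math.log(math.cosh(xm))


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3


def error_numeric(xm):
    """Integrate L(x) - tanh(x) on [0, x0] and [x0, xm] separately."""
    x0, h = x0_of(xm), math.tanh(xm)
    hp = 1.0 - h ** 2
    left = simpson(lambda x: x - math.tanh(x), 0.0, x0)
    right = simpson(lambda x: hp * (x - xm) + h - math.tanh(x), x0, xm)
    return left + right


for xm in (0.55614, 1.82009, 20.0):
    print(xm, error_closed_form(xm), error_numeric(xm))
```

For large $x_m$ both values approach $\ln 2 - 1/2$, which matches the bound stated after Eq. (5).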

3.2. Rule extraction algorithm

Given the data set $(I_p, y_p)$, $p = 1, 2, \ldots, K$, and a pruned network with H hidden units, the objective is to generate linear regression rules from the network using the three-piece linear approximation.

Step (1): Obtain a network with one hidden layer and one output unit by training, followed by pruning if required.

Step (2): For each hidden unit i = 1, 2, ..., H:
(1) determine $x_{im}$ from the training samples (the largest magnitude of the weighted input $s_{ip}$);
(2) compute $x_{i0}$ using Eq. (4);
(3) define the three-piece approximating linear function $L_i(x)$ as:

$$L_i(x) = \begin{cases} (x + x_{im})\,h'(x_{im}) - h(x_{im}) & \text{if } x < -x_{i0} \\ x & \text{if } -x_{i0} \le x \le x_{i0} \\ (x - x_{im})\,h'(x_{im}) + h(x_{im}) & \text{if } x > x_{i0} \end{cases} \qquad (6)$$

Divide the input space into $3^H$ subregions using the pair of points $-x_{i0}$ and $x_{i0}$ of each function $L_i(x)$.

Step (3): Generate a rule for each non-empty subregion:
(i) Using the logic: if $s_{ip} = \sum_{j=1}^{N} w_{ij} I_{jp}$, then

$$y_p = \sum_{i=1}^{H} v_i L_i(s_{ip}) \qquad (7)$$

(ii) Define a linear equation that approximates the network's output $y_p$ for input sample p in this subregion as the consequent of the extracted rule.
(iii) Generate the rule conditions ($C_1$ and $C_2$ and ... and $C_H$), where $C_i$ is $s_{ip} < -x_{i0}$, $-x_{i0} \le s_{ip} \le x_{i0}$, or $s_{ip} > x_{i0}$ for the three-piece approximation.
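Steps (2) and (3)(iii) rely on the three-piece function of Eq. (6) and on deciding which of the three conditions holds for a given $s_{ip}$. Here is a minimal sketch under those definitions, using the $x_m$ of Experiment 1. The function names are hypothetical.

```python
import math


def three_piece(xm):
    """Return (x0, L) where L is the three-piece approximation of Eq. (6)."""
    h_xm = math.tanh(xm)
    hp_xm = 1.0 - h_xm ** 2                     # h'(xm)
    x0 = (h_xm - xm * hp_xm) / h_xm ** 2        # Eq. (4)

    def L(x):
        if x < -x0:
            return (x + xm) * hp_xm - h_xm      # left piece
        if x <= x0:
            return x                            # middle piece, slope h'(0) = 1
        return (x - xm) * hp_xm + h_xm          # right piece

    return x0, L


def region(s, x0):
    """Which of the three conditions of step (iii) holds for s = s_ip."""
    if s < -x0:
        return 1
    if s <= x0:
        return 2
    return 3


x0, L1 = three_piece(1.82009)   # x_m of Experiment 1
print(round(x0, 5), region(-1.0, x0), region(0.3, x0), region(1.5, x0))  # 0.85238 1 2 3
```

With these inputs the outer pieces reproduce the slope and intercept given later in Eq. (12) to about four decimal places.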

A rule condition $C_i$ is defined in terms of the weighted sum of the inputs $s_{ip}$, which corresponds to an oblique hyperplane in the input space. Rule conditions of this type can be difficult for the user to interpret. Sometimes these hyperplanes can be replaced by axis-parallel hyperplanes without affecting the prediction accuracy of the rules on the data set. Defining the hyperplanes in terms of isolated inputs improves interpretability, possibly at the cost of reduced accuracy. A positive or negative coefficient indicates a positive or negative correlation between the input and output parameters. The magnitudes and signs of the rule expression coefficients therefore carry information about the relationship between the input attributes and the output variable. When the data are normalized, an input with a larger coefficient has a stronger influence on the output, and one with a smaller coefficient a weaker influence. This rule extraction method derives rules from minimum-size networks.
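Each piece of $L_i$ is linear, so the consequent of a rule follows from expanding Eq. (7) in the inputs. For a fixed subregion, $y = \sum_i v_i (a_i s_{ip} + b_i)$ with $s_{ip} = \sum_j w_{ij} I_{jp}$. The coefficient of input j is therefore $\sum_i v_i a_i w_{ij}$, and the constant is $\sum_i v_i b_i$. The sketch below applies this to a hypothetical two-unit network. It follows the steps as described here rather than reproducing the REFANN authors' code, and it omits pruning and the axis-parallel replacement of hyperplanes.

```python
import math


def pieces(xm):
    """x0 and (slope, intercept) of each piece of Eq. (6), keyed 1, 2, 3."""
    h_xm = math.tanh(xm)
    hp_xm = 1.0 - h_xm ** 2
    c = h_xm - xm * hp_xm                 # intercept magnitude of the outer pieces
    x0 = c / h_xm ** 2                    # Eq. (4)
    return x0, {1: (hp_xm, -c), 2: (1.0, 0.0), 3: (hp_xm, c)}


def extract_rules(W, v, samples):
    """One linear rule per non-empty subregion.

    W[i][j] = w_ij, v[i] = v_i, samples = training inputs I_p.
    Returns ({region tuple: (input coefficients, intercept)}, per-unit pieces).
    Assumes each hidden unit has a nonzero weighted input for some sample.
    """
    H, N = len(W), len(W[0])
    S = [[sum(W[i][j] * I[j] for j in range(N)) for i in range(H)] for I in samples]
    unit = [pieces(max(abs(s[i]) for s in S)) for i in range(H)]  # x_im = max |s_ip|
    occupied = {
        tuple(1 if s[i] < -unit[i][0] else 2 if s[i] <= unit[i][0] else 3
              for i in range(H))
        for s in S
    }
    rules = {}
    for key in occupied:
        # Expand y = sum_i v_i (a_i s_i + b_i), with s_i = sum_j w_ij I_j, in the I_j.
        slope = [unit[i][1][key[i]][0] for i in range(H)]
        icept = [unit[i][1][key[i]][1] for i in range(H)]
        coef = [sum(v[i] * slope[i] * W[i][j] for i in range(H)) for j in range(N)]
        rules[key] = (coef, sum(v[i] * icept[i] for i in range(H)))
    return rules, unit


# Hypothetical 2-input, 2-hidden-unit network and input grid, for illustration.
W = [[1.0, -0.5], [0.3, 0.8]]
v = [0.7, -1.2]
grid = [[a / 4, b / 4] for a in range(-8, 9) for b in range(-8, 9)]
rules, unit = extract_rules(W, v, grid)
for key, (coef, b0) in sorted(rules.items()):
    print(key, [round(c, 4) for c in coef], round(b0, 4))
```

The number of rules is bounded by $3^H$, but only occupied subregions produce rules.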

To assess how faithfully the embedded knowledge is expressed, the fidelity of the extracted rules is computed by:

$$\text{Fidelity} = \sum_{P} (\tilde{y}_p - y_p)^2 \Big/ \sum_{P} (\bar{y} - y_p)^2 \qquad (8)$$

The mean absolute error (MAE), relative root mean squared error (RRMSE), and relative mean absolute error (RMAE) of the network predictions are computed by summing over the samples of the test set [21]:

$$\text{MAE} = \frac{1}{|S|} \sum_{(I_p, y_p) \in S} |\tilde{y}_p - y_p| \qquad (9)$$

$$\text{RRMSE} = \sqrt{\frac{\sum_p (\tilde{y}_p - y_p)^2}{\sum_p (\bar{y} - y_p)^2}} \qquad (10)$$

$$\text{RMAE} = \sum_{P} |\tilde{y}_p - y_p| \Big/ \sum_{P} |\bar{y} - y_p| \qquad (11)$$

where |S| is the cardinality of the set S over which the MAE is calculated and $\bar{y}$ is the average value of $y_p$ in the test set. The relative errors are preferred over the mean absolute error because they normalize the differences in output ranges between data sets. An RRMSE or RMAE greater than 1 indicates that the method performs worse than simply predicting the average output value of the samples.
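Eqs. (9)–(11) translate directly into code. This sketch assumes plain Python lists of predictions and targets. The example shows that a predictor that always outputs $\bar{y}$ scores exactly 1 on both relative measures, which is the reference point described above.

```python
import math


def mae(pred, target):
    """Eq. (9): mean absolute error over the set S."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def rrmse(pred, target):
    """Eq. (10): root of squared errors relative to predicting the mean."""
    y_bar = sum(target) / len(target)
    num = sum((p - t) ** 2 for p, t in zip(pred, target))
    den = sum((y_bar - t) ** 2 for t in target)
    return math.sqrt(num / den)


def rmae(pred, target):
    """Eq. (11): absolute errors relative to predicting the mean."""
    y_bar = sum(target) / len(target)
    num = sum(abs(p - t) for p, t in zip(pred, target))
    den = sum(abs(y_bar - t) for t in target)
    return num / den


target = [0.0, 0.5, 1.0]          # hypothetical test targets
mean_only = [0.5, 0.5, 0.5]       # a "predictor" that always outputs y_bar
print(rrmse(mean_only, target), rmae(mean_only, target))  # 1.0 1.0
```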

4. Rule extraction for transformer fault diagnosis

Decompositional rules have been extracted from trained neural networks for transformer fault diagnosis [1,25,59]. The available architectural analysis based rules for transformer fault diagnosis use only three gases [1,25]. The function analysis approach presented in this work uses the DGA results of five gases.

4.1. Proposed ANN and results

Networks with too many hidden units are not suitable for rule extraction, because their output would have to be expressed by a large number of rules. Two multilayer feedforward networks were trained, each with one hidden layer; a network with one hidden layer is a universal function approximator. The two ANNs differ only in how the available DGA data are fed to them as inputs. ANN-I has three inputs: the normalized values, d = (x − xmin)/(xmax − xmin), of the gas ratios C2H2/C2H4, CH4/H2, and C2H4/C2H6. ANN-II uses the percentage concentrations of the five key gases H2, CH4, C2H4, C2H6, and C2H2. Pruning is not required because the number of inputs is small. The targets are linearly scaled to the interval [0, 1]. All possible combinations of target values for the fault classes were tried. The best target values for both ANN-I and ANN-II are 0 for thermal fault, 0.5 for discharge, and 1.0 for partial discharge. A crisp band of ±0.25 is defined around each value, and an output falling in a band is labeled according to the band's central value. The data comprise 211 samples from research publications [1,8,9,17,20,21,25,60–64] and from a state utility. They are split into ten subsets: eight for training, one for cross-validation and one for testing. This gives 169 training samples and 21 samples each in the cross-validation and testing sets. Comparing the results of the IEC/IEEE method with the actual faults showed that it gave wrong predictions for three patterns.
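The input normalization and the crisp output bands described above can be sketched as follows. The labels mirror the target values given in the text. The text does not say how an output lying exactly on a shared band edge (0.25 or 0.75) is labeled, so resolving such ties in favour of the lower target is our assumption. The function names are hypothetical.

```python
def min_max(values):
    """d = (x - x_min) / (x_max - x_min) for one input column (non-constant)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


# Target values reported for ANN-I and ANN-II, with a crisp band of +/-0.25.
TARGETS = [(0.0, "thermal fault"), (0.5, "discharge"), (1.0, "partial discharge")]
BAND = 0.25


def label(y):
    """Label an output by the band it falls in, or None if it is in no band.

    The bands touch at 0.25 and 0.75; ties go to the lower target (assumption).
    """
    for centre, fault in TARGETS:
        if abs(y - centre) <= BAND:
            return fault
    return None


print([label(y) for y in (0.1, 0.62, 0.9, 1.4)])
# ['thermal fault', 'discharge', 'partial discharge', None]
```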

The training function is gradient descent with momentum backpropagation (TRAINGDM), and the adaptation learning function is the gradient descent weight and bias learning function (LEARNGD). The transfer function between the input and hidden layers, and between the hidden and output layers, is tanh(x). The work was carried out in MATLAB version 7.5.
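The training scheme named above can be illustrated with a generic NumPy sketch of full-batch gradient descent with momentum on a one-hidden-layer tanh network. This is not a reimplementation of TRAINGDM/LEARNGD, and it does not reproduce their exact update scaling or defaults. It also uses a linear output unit, as REFANN assumes, rather than the tanh output unit mentioned above. The learning rate, momentum constant and toy data are hypothetical.

```python
import numpy as np


def train_gdm(X, y, hidden=3, lr=0.05, mc=0.9, epochs=1000, seed=0):
    """Full-batch gradient descent with momentum on the MSE of a network with
    tanh hidden units, one linear output unit and no biases (Eqs. (1)-(2))."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(hidden, X.shape[1]))  # w_ij
    v = rng.normal(scale=0.5, size=hidden)                # v_i
    dW = np.zeros_like(W)                                 # previous weight steps
    dv = np.zeros_like(v)
    losses = []
    for _ in range(epochs):
        A = np.tanh(X @ W.T)            # A_ip, shape (K, H)
        err = A @ v - y                 # y~_p - y_p
        losses.append(float(np.mean(err ** 2)))
        grad_v = 2.0 * A.T @ err / len(y)
        grad_s = 2.0 * np.outer(err, v) * (1.0 - A ** 2) / len(y)
        grad_W = grad_s.T @ X
        dv = mc * dv - lr * grad_v      # momentum: keep part of the previous step
        dW = mc * dW - lr * grad_W
        v = v + dv
        W = W + dW
    return W, v, losses


# Hypothetical toy regression data, for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1]
W, v, losses = train_gdm(X, y)
print(losses[0], losses[-1])
```

The momentum term reuses a fraction `mc` of the previous weight change, which smooths the descent direction compared with plain gradient descent.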

4.2. Rule extraction from trained neural network

Experiment 1: The data set has three input attributes: the normalized values of the ratios C2H2/C2H4 (R1), CH4/H2 (R2), and C2H4/C2H6 (R3). The targets were scaled to the values 0, 0.5 and 1.0 given in Section 4.1. The largest magnitude of the weighted input ($x_m$) is 1.82009, and Eq. (4) gives $x_0$ = 0.85238. The approximation error $E_A$ computed using (5) is 0.0820. The three-piece linear function replaces the activation function for the three non-empty subregions of the input space. The hyperbolic tangent function is approximated using the following values:

h(x_m) = 0.94849325
h'(x_m) = 0.09968816

Rule Set

The hyperbolic tangent function was approximated using (6):

$$L_1(s_{1p}) = \begin{cases} 0.09968816\,s_{1p} - 0.76740595 & \text{if } s_{1p} < -0.85238 \\ s_{1p} & \text{if } -0.85238 \le s_{1p} \le 0.85238 \\ 0.09968816\,s_{1p} + 0.76740595 & \text{if } s_{1p} > 0.85238 \end{cases} \qquad (12)$$

The three subsets of the input space are defined by the following inequalities:

Rule Set A

Region 1: if s1p < −0.85238, i.e. 2.0124R1 − 9.1803R2 − 1.8102R3 − 7.3209 < −0.85238
Region 2: if −0.85238 ≤ s1p ≤ 0.85238, i.e. −0.85238 ≤ 2.0124R1 − 9.1803R2 − 1.8102R3 − 7.3209 ≤ 0.85238
Region 3: if s1p > 0.85238, i.e. 2.0124R1 − 9.1803R2 − 1.8102R3 − 7.3209 > 0.85238

The coefficients of the two parallel hyperplanes that divide the input space into three regions are equal to the weights w1j from the jth input unit to the hidden unit. Multiplying the coefficients of L1(s1p) by the weight of the connection from the hidden unit to the output unit gives the following rules. These are also the linear output expressions applicable to the original data.

Rule Set I

Rule 1: if Region 1, then y = 6.412603818R1 − 161.6009245R2 − 161.95412R3 − 4.708186734
Rule 2: if Region 2, then y = 64.32544706R1 − 1621.034452R2 − 1624.5774R3 − 3.940776734
Rule 3: if Region 3, then y = 6.412603818R1 − 161.6009245R2 − 161.95412R3 − 3.173366734

Table 3
Comparison of predictions & errors of ANN-I & II.

Measure | ANN-I | ANN-II
No. of incorrect predictions | 5 | 1
EA | 0.0820 | 0.0023
MAE | 1.696464288 | 0.186484509
RRMSE | 0.790393416 | 0.790691686
RMAE | 5.99845464 | 0.60166035


Experiment 2: With the percentages of the five gases H2, CH4, C2H4, C2H6, and C2H2 as the input attributes, $x_m$ is found to be 0.55614. Eq. (4) gives $x_0$ = 0.35612, and Eq. (5) gives an error $E_A$ of 0.0023. The three-piece linear approximation of the hidden unit activation function yields a rule set of just two rules. The hyperbolic tangent function is approximated by:

$$L_1(s_{1p}) = \begin{cases} s_{1p} & \text{if } s_{1p} \le 0.35612 \\ 0.74487\,s_{1p} + 0.09086 & \text{if } s_{1p} > 0.35612 \end{cases} \qquad (13)$$

The two subsets of the input space are defined by the following inequalities:

Rule Set A

Region 1: if s1p ≤ 0.35612, i.e. 0.57321H2 − 0.28823CH4 + 0.013805C2H4 − 0.13987C2H6 + 0.33333C2H2 + 0.31517 ≤ 0.35612
Region 2: if s1p > 0.35612, i.e. 0.57321H2 − 0.28823CH4 + 0.013805C2H4 − 0.13987C2H6 + 0.33333C2H2 + 0.31517 > 0.35612

The coefficients of the hyperplane that divides the input space into these two regions are equal to the weights w1j from the jth input unit to the hidden unit. Multiplying the coefficients of L1(s1p) by the weight of the connection from the hidden unit to the output unit gives the following rules. These are also the linear output expressions applicable to the original data.

Fig. 3. Predicted values of faults for samples by Experiment II.

Rule Set I

Rule 1: if Region 1, then y = 1.2718383H2 − 0.63952472CH4 + 0.030631C2H4 − 0.31034C2H6 + 0.73952604C2H2 − 0.49193
Rule 2: if Region 2, then y = 0.94735423H2 − 0.47636278CH4 + 0.022815766C2H4 − 0.231165605C2H6 + 0.550850761C2H2 − 0.40107

For a given sample, one of the two equations above gives the predicted value of y, depending on the region the sample falls in. Both ANNs are trained on the same sample data. Table 3 compares the numbers of incorrect predictions and the errors computed using (9)–(11) for both ANNs. The value of ȳ is found to be 0.26190476. ANN-I cannot clearly distinguish between PD and discharge. ANN-II clearly distinguishes the three faults and performs much better on both incorrect predictions and errors. The fidelity of ANN-II is found to be 296.8953. Fig. 3a plots the predicted fault types for all the samples, while Fig. 3b and c give the values of y for Regions 1 and 2, respectively.

4.3. Case study

Historical data from research publications [18,60] have been used to check the performance of the diagnostic system and the results of the Rule Set. Table 4 gives the test records and, for each case, the diagnostic results of the IEC/IEEE method and the Rule Set.

In Case I, the internal examination of the transformer carried out for sample 2 found that it had developed a high temperature fault. The Rule Set also correctly gives the IEC/IEEE result, a high temperature fault. For the previous sample, the IEC/IEEE method could not determine the fault type, whereas the Rule Set predicted a developing thermal fault. The IEC/IEEE method identified sample no. 3 as a no-fault condition, so the diagnostic system does not propose further investigation.

For Case II, the internal examination carried out at sample no. 5 found that the safety valve of the OLTC had a problem; both the IEC/IEEE method and the Rule Set classify this as a discharge-type fault. For samples 2 and 4, the results of the IEC/IEEE method and the Rule Set match. For sample no. 3, the IEC/IEEE method could not identify the fault type because H2 was absent.
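Why sample II/3 stays undefined under the IEC/IEEE method can be made concrete. The IEC 60599 ratio method uses the three ratios C2H2/C2H4, CH4/H2 and C2H4/C2H6, so a zero concentration of any denominator gas (C2H4, H2 or C2H6) leaves a ratio undefined. The minimal Python sketch below computes only these three ratios and treats a zero reading as "not detected" (an assumption); it does not implement the IEC fault-code table, and the function name `iec_ratios` is hypothetical.

```python
from typing import Dict, Optional


def iec_ratios(h2: float, ch4: float, c2h4: float, c2h6: float,
               c2h2: float) -> Dict[str, Optional[float]]:
    """Return the three IEC 60599 basic gas ratios; None where the denominator gas is absent."""
    def ratio(num: float, den: float) -> Optional[float]:
        # A zero denominator is treated as "gas not detected", so the ratio is undefined.
        return num / den if den > 0 else None

    return {
        "C2H2/C2H4": ratio(c2h2, c2h4),
        "CH4/H2": ratio(ch4, h2),
        "C2H4/C2H6": ratio(c2h4, c2h6),
    }


if __name__ == "__main__":
    # Sample II/3 from Table 4: H2 = 0, so CH4/H2 cannot be formed.
    print(iec_ratios(h2=0, ch4=37, c2h4=5, c2h6=42, c2h2=0))
```

The Rule Set, being a linear function of the concentrations themselves, has no such denominators and therefore still returns a value for this sample.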

For Case III, wherever the IEC/IEEE method made a prediction, the Rule Set result agrees with it. For the two samples preceding the fault, the IEC/IEEE method identified a thermal fault, whereas the Rule Set predicted a thermal fault for all three preceding samples.

The internal examination report for Case IV states that the failure was caused by arcing with power flow. The predictions of the IEC/IEEE method and the Rule Set agree wherever the IEC/IEEE method gives a diagnosis.

In three of the four cases discussed, the Rule Set interpretation of the earlier DGA reports predicted the same fault that caused the transformer failure at a later stage. In the one case where the earlier predictions were incorrect, the cause was oil from the on-load tap changer mixing with the main tank oil because of a problem with its valve, which is not an incipient fault.

Table 4
Results of case study: figures in bold indicate that the predictions of the IEC/IEEE method and the Rule Set are the same; underlined figures indicate samples from which the fault type could be correctly predicted.

| Case/sample | Date | H2 | CH4 | C2H4 | C2H6 | C2H2 | IEC/IEEE method | Internal inspection report | Rule Set |
|---|---|---|---|---|---|---|---|---|---|
| I/1 | 12/11/1984 | 97 | 95 | 164 | 44 | 0 | ND | – | F1 |
| I/2 | 6/17/1986 | 9 | 65 | 7 | 7 | 0 | F1 | High temperature internal fault | F1 |
| I/3 | 2/17/1992 | 28 | 26 | 3 | 23 | 0 | NF | – | – |
| I/4 | 10/23/1993 | 64 | 19 | 82 | 11 | 0 | ND | – | F3 |
| II/1 | 2/19/1988 | 4 | 37 | 5 | 38 | 4 | ND | – | F1 |
| II/2 | 2/22/1989 | 7 | 56 | 4 | 53 | 0 | F1 | – | F1 |
| II/3 | 12/4/1989 | 0 | 37 | 5 | 42 | 0 | ND | – | F1 |
| II/4 | 8/12/1990 | 7 | 68 | 9 | 77 | 0 | F1 | – | F1 |
| II/5 | 10/23/1990 | 1071 | 439 | 581 | 48 | 227 | F3 | The safety valve of OLTC has problem | F3 |
| III/1 | 5/12/1984 | 10 | 24 | 21 | 37 | 10 | ND | – | F1 |
| III/2 | 10/16/1985 | 44 | 87 | 54 | 77 | 4 | F1 | – | F1 |
| III/3 | 2/11/1987 | 32 | 81 | 53 | 69 | 3 | F1 | – | F1 |
| III/4 | 5/22/1989 | 528 | 3179 | 3020 | 320 | 2314 | ND | Secondary winding has burnt out | F1 |
| III/5 | 10/30/1989 | 0 | 8 | 0 | 0 | 0 | ND | – | F1 |
| III/6 | 5/15/1990 | 0 | 6 | 4 | 3 | 0 | ND | – | F1 |
| IV/1 | 8/19/1989 | 85 | 126 | 224 | 46 | 96 | ND | – | F1 |
| IV/2 | 9/27/1989 | 142 | 118 | 193 | 31 | 92 | F3 | – | F3 |
| IV/3 | 1/15/1990 | 300 | 45 | 101 | 17 | 225 | F3 | – | F3 |
| IV/4 | 2/23/1990 | 206 | 42 | 82 | 16 | 221 | F3 | – | F3 |
| IV/5 | 4/10/1990 | 3091 | 46 | 101 | 17 | 239 | ND | Arc with power flow through | F3 |

ND: not defined; NF: no fault; F1: thermal fault; F2: partial discharge; F3: discharge.
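The agreement statements of this case study can be checked against Table 4 by encoding its two diagnosis columns and counting. This minimal Python sketch transcribes only the IEC/IEEE and Rule Set columns (gas values omitted); the function `tally` and its category names are illustrative, not part of the paper's method.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# (case/sample, IEC/IEEE diagnosis, Rule Set diagnosis) transcribed from Table 4;
# None marks a "–" entry in the Rule Set column.
TABLE_4: List[Tuple[str, str, Optional[str]]] = [
    ("I/1", "ND", "F1"), ("I/2", "F1", "F1"), ("I/3", "NF", None), ("I/4", "ND", "F3"),
    ("II/1", "ND", "F1"), ("II/2", "F1", "F1"), ("II/3", "ND", "F1"), ("II/4", "F1", "F1"),
    ("II/5", "F3", "F3"),
    ("III/1", "ND", "F1"), ("III/2", "F1", "F1"), ("III/3", "F1", "F1"),
    ("III/4", "ND", "F1"), ("III/5", "ND", "F1"), ("III/6", "ND", "F1"),
    ("IV/1", "ND", "F1"), ("IV/2", "F3", "F3"), ("IV/3", "F3", "F3"),
    ("IV/4", "F3", "F3"), ("IV/5", "ND", "F3"),
]


def tally(rows: Iterable[Tuple[str, str, Optional[str]]]) -> Counter:
    """Count IEC/IEEE-defined samples, their agreement with the Rule Set,
    and samples left undefined (ND) by IEC/IEEE but classified by the Rule Set."""
    counts: Counter = Counter()
    for _, iec, rule in rows:
        if iec in ("F1", "F2", "F3"):
            counts["iec_defined"] += 1
            counts["agree" if iec == rule else "disagree"] += 1
        elif iec == "ND" and rule is not None:
            counts["rule_only"] += 1
    return counts


if __name__ == "__main__":
    print(dict(tally(TABLE_4)))
```

For the 20 samples, the IEC/IEEE method gives a fault type for 9, all of which match the Rule Set, and it leaves 10 samples undefined (ND) for which the Rule Set still gives a fault type.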

5. Conclusion

In this paper, a technique for transformer fault diagnosis that uses the knowledge within an ANN has been applied to predict incipient faults. The ANN that used the percentage concentrations of the five gases as inputs gave better results than the one that used gas ratios as inputs. The generated Rule Sets compute linear functions of the inputs. These linear equations make the knowledge embedded in the ANN explicit, and with them thermal, partial discharge and discharge-type faults developing within a transformer can be successfully predicted. The equations can easily be used by site engineers for fault classification where neural network software is not available, and the coefficients can be refined as more knowledge is acquired.

When the generated rules were applied to the case studies, the Rule Set fault predictions agreed with the internal examination reports and, where no internal examination report was available, with the IEC/IEEE method predictions. In all cases discussed, advance warning of an incipient fault developing within the transformer was correctly given. The Rule Set can also make predictions where the IEC/IEEE method leaves the fault undefined because one of the gases C2H4, H2 or C2H6 is absent or not detected. The proposed diagnostic method of transformer fault prediction gives promising results.

References

[1] Rosa A, Castro G, Miranda V. Knowledge discovery in neural networks with application to transformer failure diagnosis. IEEE Trans Power Syst 2005;20(2):717–24.

[2] Amora MAB, Almeida OM, Braga APS, Barbosa FR, Lima SS, Lisboa AC. Decompositional rule extraction from artificial neural networks and application in analysis of transformer. In: 15th International conference on intelligent system applications to power systems, Curitiba, ISBN: 978-1-4244-5097-8, 8–12 November, 2009. p. 1–6.

[3] Setiono R, Leow WK, Zurada JM. Extraction of rules from artificial neural networks for nonlinear regression. IEEE Trans Neural Netw 2002;13(3):564–77.

[4] IEEE guide for the interpretation of gases generated in oil-immersed transformers. ANSI/IEEE Standard C57.104-1991.

[5] Mineral oil-impregnated electrical equipment in service: guide to the interpretation of dissolved and free gases analysis. 2nd ed. CEI/IEC 60599; 1999-03.

[6] Duval M. A review of faults detectable by gas-in-oil analysis in transformers. IEEE Electr Insul Mag 2002;10(3):8–17.

[7] CIGRE Working Group 09 of Study Committee 12. Lifetime evaluation of transformers. Electra No. 150; 1993. p. 39–51.

[8] Yadaiah N, Ravi N. Fault detection techniques for power transformers. In: Industrial & commercial power systems technical conference (ICPS 2007), IEEE/IAS, vol. 6, no. 11, May 2007. p. 1–9.

[9] Tomsovic K, Tapper M, Ingvarsson T. A fuzzy information approach to integrating different transformer diagnostic methods. IEEE Trans Power Syst 1993;8(3):1638–46.

[10] Huang Yann-Chang, Yang Hong-Tzer, Huang Ching-Lien. Design of robust transformer fault diagnosis system using evolutionary fuzzy logic. In: IEEE international symposium on circuits and systems, vol. 1. Atlanta, USA; 12–15 May, 1996. p. 613–6.

[11] Huang Yann-Chang, Yang Hong-Tzer, Huang Ching-Lien. Developing a new transformer fault diagnosis system through evolutionary fuzzy logic. IEEE Trans Power Del 1997;12(2):761–7.

[12] Su Q, Mi C, Lai LL, Austin P. A fuzzy dissolved gas analysis method for the diagnosis of multiple incipient faults in a transformer. IEEE Trans Power Syst 2000;15(2):593–8.

[13] Muhamad NA, Phung BT, Blackburn TR. Comparative study and analysis of DGA methods for mineral oil using fuzzy logic. In: Proceedings of 8th international power engineering conference (IPEC 2007); 2007. p. 1301–6.

[14] Yang Hong-Tzer, Liou Chiung-Chou, Chow Jeng-Hong. Fuzzy learning vector quantization networks for power transformer condition assessment. IEEE Trans Dielectr Electr Insul 2001;8(1):143–9.

[15] Bhalla D, Bansal RK, Gupta HO. Transformer incipient fault diagnosis based on DGA using fuzzy logic. In: IICPE 2010, New Delhi, ISBN: 978-1-4244-7883-5; 28–30 January, 2010. p. 1–5.

[16] Moaris DR, Rolim JG. An artificial neural network approach to transformer fault diagnosis. IEEE Trans Power Del 1996;11(4):1836–41.

[17] Vanegas O, Mizuno Y, Naito K, Kamiya T. Diagnosis of oil-insulated power apparatus using neural network simulation. IEEE Trans Dielectr Electr Insul 1997;4(3):290–9.

[18] Yang Hong-Tzer, Huang Yann-Chang. Intelligent decision support for diagnosis of incipient transformer faults using self-organizing polynomial networks. IEEE Trans Power Syst 1998;13(3):946–52.

[19] Thang KF, Aggarwal RK, MacGrail AJ, Esp DG. Application of self-organizing map algorithms for analysis and interpretation of dissolved gases in power transformers. In: Power engineering society summer meeting, vol. 3. Vancouver, BC, Canada: IEEE; 15 July 2001. p. 1881–6.

[20] Guardado JL, Naredo JL, Moreno P, Fuerte CR. A comparative study of neural network efficiency in power transformer diagnosis using dissolved gas analysis. IEEE Trans Power Del 2001;16(4):643–7.

[21] Wang MH. Extension neural network for power transformer incipient fault diagnosis. IEE Proc Gener Transm Distrib 2003;150(6):656–79.

[22] Wang Zhan, Guo JiWei, Xie Jing Dong, Tang Guoqing. An introduction of a condition monitoring system of electrical equipment. In: Proceedings of 2001 international symposium on electrical insulating materials (ISEIM 2001). Himeji, Japan; 19–22 November, 2001. p. 221–4.

[23] Zhang Zhe, Zhu Yong-Li, Li Zhong, Kaitlam Wu. Diagnosis of transformer dissolved gas analysis based on multi-class SVM. In: International conference on artificial intelligence and computational intelligence, Shanghai, China; 7–8 November, 2009.

[24] Huang Yann-Chang. A new data mining approach to dissolved gas analysis of oil-insulated power apparatus. IEEE Trans Power Del 2003;18(4):1257–61.

[25] Castro AR, Miranda V. An interpretation of neural networks as inference engines with application to transformer failure diagnosis. Int J Electr Power Energy Syst 2005;27:620–6.

[26] Georgilakis PS. Condition monitoring and assessment of power transformers using computational intelligence. Int J Electr Power Energy Syst 2011;33(10):1784–5.

[27] Ning L, Wu W, Zhang B, Zhang P. A time-varying transformer outage model for on-line operational risk assessment. Int J Electr Power Energy Syst 2011;33(3):600–7.

[28] Behjat V, Vahedi A, Setayeshmehr A, Borsi H, Gockenbach E. Sweep frequency response analysis for diagnosis of low level short circuit faults on the windings of power transformers: an experimental study. Int J Electr Power Energy Syst 2012;42(1):78–90.

[29] Gallant SI. Connectionist expert systems. Commun ACM 1988;31:152–69.

[30] Mitra S, Hayashi Y. Neuro-fuzzy rule generation: survey in soft computing framework. IEEE Trans Neural Netw 2000;11(3):748–68.

[31] Setiono R, Pan SL, Hsieh Ming-Huei, Azcarraga AP. Separating core and non-core knowledge: application of neural network rule extraction to brand image perception. IEEE Trans Syst Man Cybern Part C Appl Rev 2005;35(4):465–75.

[32] Kamruzzaman SM, Islam Md Monirul. An algorithm to extract rules from artificial neural networks for medical diagnosis problems. Int J Inform Technol 2006;12(8):41–59.

[33] Omlin CW, Giles CL. Extraction of rules from discrete-time recurrent neural networks. Neural Netw 1996;9(1):41–52.

[34] Thrun S, Tesauro G, Touretzky DS, Leen T. Extracting rules from artificial neural networks with distributed representations. In: Advances in neural information processing systems 7. Cambridge, MA: MIT Press; 1995.

[35] Schmitz GPJ, Aldrich C, Gouws FW. ANN-DT: an algorithm for extraction of decision trees from artificial neural networks. IEEE Trans Neural Netw 1999;10:1392–402.

[36] Zhou Zhi-Hua, Jiang Yuan, Chen Shi-Fu. Extracting symbolic rules from trained neural network ensembles. AI Commun 2003;16(1):3–5.

[37] Quteishat A, Lim CP. A modified fuzzy min–max neural network with rule extraction and its application to fault detection and classification. Appl Soft Comput 2008;8(2):985–95.

[38] Fu L. Rule learning by searching on adapted nets. In: Proceedings of the 9th national conference on artificial intelligence, Anaheim, CA; 1991. p. 590–5.

[39] Towell G, Shavlik JW. The extraction of refined rules from knowledge-based neural networks. Mach Learn 1993;13:207–15.

[40] Krishan R. A systematic method for decompositional rule extraction from neural networks. In: Proceedings of the NIPS'96 workshop on rule extraction from trained artificial neural networks, Queensland, Australia; 1996. p. 38–45.

[41] Setiono R. Extracting rules from neural networks by pruning and hidden unit splitting. Neural Comput 1997;9(1):205–25.

[42] Setiono R, Leow WK. FERNN: an algorithm for fast extraction of rules from neural networks. Appl Intell 2000;12:15–25.

[43] Castro JL, Mantas CJ, Benitez JM. Interpretation of artificial neural networks by means of fuzzy rules. IEEE Trans Neural Netw 2002;13(1).

[44] Andrews R, Diederich J, Tickle A. A survey and critique of techniques for extracting rules from trained neural networks. Knowl-Based Syst 1995;8(6):373–89.

[45] Taha Ismail A, Ghosh Joydeep. Symbolic interpretation of artificial neural networks. IEEE Trans Knowl Data Eng 1999;11(3):448–63.

[46] Towell GG, Shavlik JW. Extracting refined rules from knowledge-based neural networks. Mach Learn 1993;13(1):77–101.

[47] Blassig R. GDS: gradient descent generation of symbolic rules. In: Advances in neural information processing systems 6. San Mateo, CA: Morgan Kaufmann; 1994. p. 1093–100.

[48] Sestito S, Dillon T. Automated knowledge acquisition. Englewood Cliffs, NJ: Prentice-Hall; 1994.

[49] Setiono R, Liu H. Symbolic representation of neural networks. IEEE Comput 1996;29(3):71–7.

[50] Gupta A, Park S, Lam SM. Generalized analytic rule extraction for feedforward networks. IEEE Trans Knowl Data Eng 1998;11:985–91.

[51] Yan N, Min LI, Jianhong Y, Kunyin M. Production quality modeling based on regression rules extracted from trained artificial neural networks. In: 2009 International conference on systems. IEEE Computer Society; 2009. p. 197–203.

[52] Miradi Maryam. Extraction of rules from artificial neural networks for Dutch porous asphalt concrete pavement. In: Proceedings of international joint conference on neural networks, Orlando, Florida, USA, August 12–17.

[53] Ash T. Dynamic node creation in backpropagation networks. Connect Sci 1989;1(4):365–75.

[54] Setiono R, Hui LCK. Use of quasi-Newton approach in feedforward neural network construction algorithm. IEEE Trans Neural Netw 1995;6:273–7.

[55] Kwok TY, Yeung DY. Constructive algorithms for structure learning in feedforward neural networks for regression problems. IEEE Trans Neural Netw 1997;8:630–45.

[56] Setiono R, Liu H. Neural network feature selector. IEEE Trans Neural Netw 1997;8:654–62.

[57] Zurada JM, Malinowski A, Usui S. Perturbation method for deleting redundant inputs from perceptron networks. Neurocomputing 1997;14(2):177–93.

[58] Lu H, Setiono R, Liu H. Effective data mining using neural networks. IEEE Trans Knowl Data Eng 1996;8(6):957–61.

[59] Duraisamy V, Devarajan N, Somasundareswari D, Vasanth AAM, Sivanandam SN. Neuro-fuzzy schemes for fault detection in power transformers. Appl Soft Comput 2007;7(2):534–9.

[60] Lin CF, Ling JM, Huang CL. An expert system for transformer fault diagnosis using dissolved gas analysis. IEEE Trans Power Del 1993;8(1):231–8.

[61] Lin Wei-Song, Hung Chin-Pao, Wang Mang-Hui. CMAC-based fault diagnosis of power transformers. Proc 2002 Int Joint Conf Neural Netw 2002;3:986–91.

[62] Zhang Y, Ding X, Liu Y, Griffin PJ. An artificial neural network approach to transformer fault diagnosis. IEEE Trans Power Del 1996;11(4):1836–41.

[63] Patel NK, Khubchandani RK. ANN based power transformer fault diagnosis. Inst Eng (I) J EL 2004;85:60–3.

[64] Duval M, Pablo A. Interpretation of gas-in-oil analysis using new IEC publication 60599 and IEC TC 10 databases. IEEE Electr Insul Mag 2001;17(2):31–41.
