978-1-4673-1813-6/13/$31.00 ©2013 IEEE
Abstract: Identifying the most suitable classifier for diagnostics
is a challenging task. In addition to using domain expertise, a
trial and error method has been widely used to identify the
most suitable classifier. Classifier fusion can be used to
overcome this challenge, and it is widely known to
perform better than a single classifier. Classifier fusion helps in
overcoming the error due to inductive bias of various
classifiers. The combination rule also plays a vital role in
classifier fusion, and it has not been well studied which combination rules provide the best performance during
classifier fusion. Good combination rules will achieve good
generalizability while taking advantage of the diversity of the
classifiers. In this work, we develop an approach for ensemble
learning consisting of an optimized combination rule. The
generalizability has been acknowledged to be a challenge when
training a diverse set of classifiers, but it can be achieved through an
optimal balance between bias and variance errors using the
combination rule proposed in this paper. Generalizability implies the
ability of a classifier to learn the underlying model from the
training data and to predict the unseen observations. In this
paper, cross validation has been employed during performance
evaluation of each classifier to get an unbiased performance
estimate. An objective function is constructed and optimized
based on the performance evaluation to achieve the optimal bias-variance balance. This function can be solved as a
constrained nonlinear optimization problem. Sequential
Quadratic Programming (SQP) based optimization, which has good
convergence properties, has been employed.
We have demonstrated the applicability of the algorithm by
using support vector machine and neural networks as
classifiers, but the methodology is broadly applicable for
combining other classifier algorithms as well. The method has
been applied to the fault diagnosis of analog circuits. The
performance of the proposed algorithm has been compared to
other combination rules in the literature. It is observed that the
proposed combination rule performs better in reducing the
number of false positives and false negatives.
TABLE OF CONTENTS
1. INTRODUCTION
2. OPTIMIZED FUSION METHODOLOGY
3. RESULTS AND DISCUSSION
4. CONCLUSIONS
REFERENCES
BIOGRAPHIES
1. INTRODUCTION

The field of prognostics and health management (PHM)
involves the development of technologies and
methodologies to increase the availability and reliability of
engineering systems [2] [3] [4]. As part of the PHM
regimen, diagnostic algorithms have been developed and
employed to assist in fault diagnosis. As the number of
available PHM algorithms has increased, the algorithm selection process has become more complex and difficult.
Dilemmas for users when choosing a diagnostic algorithm in
PHM include the following: whether the data utilized for
training is a suitable representation of the global population;
how well the algorithm will perform in a noisy environment;
and how well the algorithm will perform on data that it has
not encountered (i.e. generalizability). A method is needed
to quickly identify appropriate algorithms that meet the
performance requirements for specific applications.
Ensemble learning is one technique that has been employed
to improve generalizability [1] as well as in situations where
the training data is not a suitable representation of the global
population. In ensemble learning, a collection of classifiers
is trained simultaneously, and the results are combined in a
suitable manner (also referred to as the combination rule) to
improve performance. Generally, the approach of ensemble
learning is to train a diverse set of classifiers and then devise
a method to combine these trained classifiers. Diversity in
the ensemble learning context refers to different
classification outputs from each of the trained classifiers for
a given input sample set. Studies have reported on the
importance of and methodologies for generating diverse
classifiers. Brown et al. [5] provided a mathematical account
of the role of diversity in ensemble learning and how it helps to improve classification accuracy. Theoretically, the more
diverse the classifiers are, the less correlated the classifier
outputs are with each other. As a result, the prediction based
on each classifier of the ensemble could be complementary
to each other, which implies that when one classifier makes a
prediction error, other classifiers could be correct. The
complementary nature of these classifiers helps in offsetting
potential errors, thereby providing greater generalizability
for these algorithms.
Surya Kunche
Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742 USA

Chaochao Chen
Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742 USA

Michael G. Pecht
Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742 USA
Optimized Diagnostic Model Combination for
Improving Diagnostic Accuracy
Studies have been conducted on the diversity-generating
stage of the classifiers. The most widely used methodology
for diversity-generation focuses on manipulating the training
data; i.e., supplying each classifier with a different set of
manipulated training data (for example, training a classifier
with only part of the training data). When a classifier is
trained with a manipulated training data set, it typically
generates diverse predictions. Commonly used methods for
manipulating training data are bootstrapping, bagging [17], and boosting [18]. Diversity is also achieved by changing
the adjustable parameters of a classifier being trained; for
example, in neural networks diversity can be achieved by
changing the initial weights, the number of hidden neurons,
the activation function and the training algorithm [5]. Some
researchers have proposed the use of an evolutionary
algorithm to achieve the optimal amount of diversity during
the training phase [19, 20, 21].
Once diversity is achieved for the trained classifiers, a
combination rule is used to combine the classification results. The most common means of classifier fusion include
averaging [7, 8], majority voting [9, 10, 11], weighted
majority voting [12], and a localized fusion [13,14] based
approach that improves the weighted majority voting
algorithm by evaluating the performance of classifiers in the
neighborhood of the test points. Bonissone et al. [15]
proposed a fusion methodology based on classification and
regression trees that reduces the computation time in the
localized fusion.
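As a concrete illustration of these rules, the averaging, majority voting, and weighted majority voting schemes can be sketched as follows (a minimal sketch in Python; the function names and array layout are ours, not taken from the cited implementations):

```python
import numpy as np

def average_fusion(probs):
    """Averaging rule: mean of per-classifier class probabilities.

    probs: array of shape (B, n_classes), one probability row per classifier.
    Returns the index of the class with the highest averaged probability.
    """
    return int(np.argmax(probs.mean(axis=0)))

def majority_vote(labels):
    """Majority voting rule over hard class labels (one label per classifier)."""
    values, counts = np.unique(labels, return_counts=True)
    return int(values[np.argmax(counts)])

def weighted_majority_vote(labels, weights):
    """Weighted majority voting: each classifier's vote counts with its weight."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    scores = [np.sum(np.asarray(weights)[labels == c]) for c in classes]
    return int(classes[int(np.argmax(scores))])
```

Localized fusion extends the weighted scheme by recomputing the weights from each classifier's accuracy in the neighborhood of the test point, rather than using fixed global weights.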
The error of any machine learning algorithm can be
classified into two parts: the bias component and the
variance component. Consider a training set $T_b = \{(x_i, y_i),\ i = 1, 2, \dots, N\}$ for $b = 1, 2, \dots, B$, where $x_i$ is the feature vector and $y_i$ is the corresponding class label. Let $f_b$ be the classifier trained on training set $T_b$, and let $f_b(x)$ be the prediction of classifier $f_b$ for input feature $x$. The bias component of the error is defined as shown in (1). As seen from the equation, the
bias component describes the correctness of the model [36].
The variance of the classifier is as shown in (2). The
variance component describes the precision of the model and
how prediction varies with training set [36]. A machine
learning algorithm achieves minimal prediction errors on
unseen data when there is an optimal balance of bias and
variance [35]. Figure 1 shows the changes in errors
including bias, variance, and total error as the value of
model complexity changes. As seen in Figure 1, good
generalizability is achieved by an optimal balance of bias
and variance.
$$\mathrm{Bias}_b = \left|\mathbb{E}[f_b(x)] - y\right| \qquad (1)$$

$$\mathrm{Var}_b = \mathbb{E}\left[\left(f_b(x) - \mathbb{E}[f_b(x)]\right)^2\right] \qquad (2)$$

$$\mathrm{Err}_b = \mathrm{Bias}_b + \mathrm{Var}_b \qquad (3)$$

The optimized fusion methodology discussed in this paper combines multiple classifiers based on their performance so as to achieve the least total error in fault detection, i.e., the least number of false and missed alarms. To achieve this optimized fusion, we compute bias and variance to evaluate the performance of each classifier. The performance of the classifiers is also validated with a cost function using cross-validation. We then develop a framework for this methodology that combines all the classifiers. A comparative analysis using experimental data has been conducted to demonstrate the accuracy of this algorithm over other methodologies.
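The bias and variance definitions in Eqs. (1) and (2) can be estimated empirically from the predictions that the B bootstrap-trained classifiers make at a test point; averaging over the bootstrap replicates stands in for the expectation. The sketch below is illustrative only (the function name is ours, not the paper's code):

```python
import numpy as np

def bias_variance(predictions, y_true):
    """Empirical bias and variance of B classifier outputs at one test point.

    predictions: length-B sequence of f_b(x), one prediction per bootstrap
                 model; the sample mean stands in for E[f_b(x)].
    y_true: the true target at that point.
    Bias follows Eq. (1): |E[f_b(x)] - y|; variance follows Eq. (2).
    """
    predictions = np.asarray(predictions, dtype=float)
    expected = predictions.mean()
    bias = abs(expected - y_true)
    variance = np.mean((predictions - expected) ** 2)
    return bias, variance
```

For instance, three bootstrap models predicting [1, 1, -1] for a true label of 1 give a bias of 2/3 and a variance of 8/9.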
We discuss our optimized fusion methodology in Section 2. In Section 3 we present two case studies wherein we use this methodology to perform diagnostics for different analog circuits, including a Sallen-Key band-pass filter and a biquad low-pass filter. We also conduct a comparative analysis to evaluate the performance of this methodology. In Section 4 we give concluding remarks.
2. OPTIMIZED FUSION METHODOLOGY
Inductive bias [22] is defined as the set of assumptions that a classifier makes in order to classify a given set of features. For example, support vector machine classification (Figure 2) assumes that different classes can be separated by a hyperplane. In the case of k-nearest neighbor classification, the
Figure 1 Error change as a function of model complexity [33] [36]. The point of least total error marks the optimal bias-variance balance.

Figure 2 SVM classification.
assumption is that the distance of a test point from its k nearest neighbors determines the class of the test point, i.e., the test point belongs to the class to which this distance is least. These assumptions may not necessarily
be true in all instances; for example, as seen in Figure 2, the hyperplane of the support vector machine is not able to correctly determine the decision boundary in this two-class (blue and green) classification problem. Hence, the assumptions made by the classifiers lead to an error known as inductive bias in classification, which is a classification error.
In classifier fusion the complementary features of the classifiers are employed to overcome their individual errors. To achieve this, classifier algorithms are initially trained on the training data set, and then the classification outputs of the classifier algorithms are combined. The method for combining the results of all these classifiers is known as the fusion or combination rule. Figure 3 shows the proposed framework for the fusion of an ensemble of classifiers. The framework can be divided into three parts: algorithm training, fusion parameter computation, and classifier fusion. These three parts are discussed in the following subsections.
Algorithm Training
When training the classifiers, the training data must be
representative of the global data population. In most
situations, the training data constitute only a small part of the
whole data population and may have a lot of noise, which
can lead to misclassification of unseen data sets.
Bootstrapping is a potential solution to this problem.
Bootstrapping: The bootstrapping method was originally proposed by Efron [23]. When bootstrapping is used, the training data are resampled such that sample observations are picked randomly with replacement from the original training data to form new training data sets of the same size as the original training data set [24]. Considering an original training data set with N observations, each bootstrapped training set also consists of N observations, picked by randomly resampling from the original training data set. Individual classifiers are then trained on these data sets. Once the classifiers are trained, their classification performance can
be evaluated by cross-validation, wherein they are evaluated on observations that they have not been trained on. This procedure gives an unbiased estimate of the classification error and is discussed in detail in the Performance Evaluation subsection.
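A minimal sketch of this resampling step, assuming NumPy; the function also returns the out-of-bag indices used later for cross-validated evaluation (the function name and return layout are ours):

```python
import numpy as np

def bootstrap_sets(X, y, B, seed=0):
    """Draw B bootstrap training sets (sampling with replacement, size N).

    Returns a list of (X_b, y_b, oob_idx) tuples, where oob_idx are the
    out-of-bag indices (observations never picked), left for
    cross-validated performance evaluation.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    sets = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # sample N indices with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # observations never picked
        sets.append((X[idx], y[idx], oob))
    return sets
```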
The aim of bootstrapping is to increase the diversity of the algorithm. As suggested previously, increasing the diversity during training improves the generalizability when a suitable combination rule is applied. Diversity has been achieved by using bootstrapped training data with different initial weights in the neural networks, as well as by using various classifiers (neural networks and support vector machines).

Figure 3 Optimized fusion framework.
Classifiers: Once the bootstrapped training data have been generated, the classifiers need to be trained on these samples. Different classifiers are employed to overcome the problem of inductive bias. In this paper, a support vector machine and neural networks are employed. Each member/classifier of the ensemble is trained to be a classification expert on its individual bootstrapped sample.
Support vector machine (SVM) classification is based on
VapnikChervonenkis theory for structural risk
minimization [30]. The objective of SVM is to find the
optimal hyperplane $w \cdot x + b = 0$, where $w$ is the normal to the hyperplane and $|b|/\|w\|$ is the distance of the hyperplane from the origin of the coordinate system [31]. Given input feature vectors $x_1, x_2, \dots, x_N$, where $x_i \in \mathbb{R}^n$, and corresponding classes $y_1, y_2, \dots, y_N$, where $y_i \in \{-1, 1\}$, the aim of the support vector machine is to find an optimal hyperplane such that the objective function shown in Equation (4) is minimized subject to the constraints in Equation (5), which results in a hyperplane that optimally separates the two classes:

$$\min_{w,\xi}\ J(w,\xi) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i \qquad (4)$$

$$\text{s.t.}\quad y_i\left(w \cdot \phi(x_i) + b\right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad y_i \in \{-1, 1\} \qquad (5)$$

where $\xi_i$ is the slack variable, or the distance margin introduced to allow misclassifications, $C$ is the penalty or cost for the misclassifications, and $\phi$ is a mapping of $x$ to a higher-dimensional space.

A neural network is also used for performing classification.
Given an input feature vector $x \in \mathbb{R}^N$ and its corresponding class $y \in \{-1, 1\}$, the neural network has $N$ neurons in the input layer (where $N$ is the size of the input feature vector), a hidden layer, and an output node [32]. A sigmoid activation
function is used in each neuron in the hidden layer. The
output of the neural network is the class label of the input
features. We used a gradient-based approach to train the
neural network.
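The training stage described above might be sketched as follows, with scikit-learn's SVC and MLPClassifier standing in for the paper's SVM and neural network implementations (which are not specified in this excerpt); the alternation between classifier families and all parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

def train_ensemble(X, y, B=4, seed=0):
    """Train a diverse ensemble: SVMs and neural networks on bootstrap samples.

    Diversity comes from (a) resampled training data and (b) alternating
    classifier families. scikit-learn stands in for the implementations
    used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # bootstrap sample, size n
        if b % 2 == 0:
            clf = SVC(C=1.0, kernel="rbf")
        else:
            # sigmoid (logistic) hidden units, gradient-based training
            clf = MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                                max_iter=2000, random_state=b)
        models.append(clf.fit(X[idx], y[idx]))
    return models
```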
Fusion Parameter Computation
The diverse classifiers generated in the previous step need to be suitably combined. Here, a cost function has been formulated and minimized to obtain the most suitable combination of these classifiers. The fusion parameter computation includes two steps: performance evaluation and fusion optimization.
Performance Evaluation: To evaluate the classification performance, the bias and variance errors of each classifier need to be computed for the unseen observations by cross-validation. Bias and variance errors cannot be minimized simultaneously, because a reduction in either one of the two components could lead to an increase in the other, as shown in Figure 1. Therefore, the total error of a classifier, which is a combination of both of these factors, is used to evaluate classification performance, as shown below [33]:

$$\mathrm{Err} = \mathrm{Bias} + \mathrm{Var} \qquad (6)$$

To estimate the bias of the classifiers, conventional validation methods segment the training data into disjoint sets, e.g., training and validation data sets. For example, a data set can be segmented into two parts wherein 70% of the data are used to train the classifier and the remaining 30% are used for optimizing the classifier parameters [34]. But such a method is biased by the validation data. Instead of choosing how much data to use for training and validation, cross-validation has been thought to provide an unbiased estimate. In cross-validation, each of the trained classifiers is evaluated for accuracy on unseen observations.
If the training data are resampled into B bootstrapped sample sets, each classifier is trained on a different sample set. Once all the classifiers are trained, the performance of each classifier is evaluated on the unseen (out-of-bag) observations in the training data. The errors computed are the false positive error ($e_b^{fp}$) and the false negative error ($e_b^{fn}$), as shown in (7):

$$e_b^{fp} = \frac{1}{|M_b|} \sum_{i \in M_b} \mathbb{1}\left\{f_b(x_i) = 1,\ y_i = -1\right\}, \quad b = 1, \dots, B \qquad (7)$$

$$e_b^{fn} = \frac{1}{|M_b|} \sum_{i \in M_b} \mathbb{1}\left\{f_b(x_i) = -1,\ y_i = 1\right\}, \quad b = 1, \dots, B$$

where $f_b$ is the classifier trained on bootstrap sample set $b$; $y_i$ is the actual class output for input feature $x_i$; $N$ is the number of samples in the training data; and $M_b$ is the set of out-of-bag observations for bootstrap sample set $b$ (the training observations not picked for that set), with $|M_b|$ the number of such observations.
In ensemble learning, classifiers with high variance are susceptible to small changes in the input features. For example, neural networks that have too many hidden layers and nodes can have an over-fitting problem that results in high variance and low bias, and therefore poor generalizability [37]. The variance of the classifier is given by Equation (8):
$$V_b = \mathbb{E}\left[\left(f_b(x) - \bar{f}(x)\right)^2\right], \quad b = 1, \dots, B \qquad (8)$$

where $\bar{f}(x)$ is the expected fusion outcome of the classifiers. Since a weighted fusion methodology will be employed, the expected value is given by the following equation:

$$\bar{f}(x) = \sum_{b=1}^{B} w_b f_b(x) \qquad (9)$$

where $w_b$ is the weight of the $b$th classifier.

Fusion Optimization: The primary objective of this fusion methodology is to equip users with a tool that is capable of optimally combining the results of various diagnostic algorithms. The fusion optimization method helps achieve a good bias-variance balance, thereby providing good generalizability.
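The weighted fusion of Eq. (9) reduces to a dot product; taking the sign of the fused score as the final class decision is our illustrative choice, not stated in this excerpt:

```python
import numpy as np

def fused_prediction(outputs, weights):
    """Weighted fusion of classifier outputs, per Eq. (9).

    outputs: length-B array of individual predictions in {-1, +1}.
    weights: length-B weight vector (nonnegative, summing to 1).
    Returns the fused score and its sign as the final class decision.
    """
    score = float(np.dot(weights, outputs))
    return score, (1 if score >= 0 else -1)
```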
The false positive and false negative errors are calculated via cross-validation, as discussed in the previous section. For a given classifier, let the false positive error be $e_b^{fp}$ and the false negative error be $e_b^{fn}$, $b = 1, \dots, B$ (i.e., there are B classifiers trained on the bootstrapped data).
Let us assume that the cost of having a false positive is $c_{fp}$ and that the cost of a false negative is $c_{fn}$. The cost factors $c_{fp}$ and $c_{fn}$ serve as prioritizing parameters; they are relative terms used to prioritize the significance of false positives and false negatives in the cost function. For users who depend on system diagnostics for scheduling maintenance, the cost of a false positive is the cost incurred when a healthy system is erroneously classified as faulty, and the cost of a false negative is incurred due to the erroneous classification of a faulty system as healthy. These costs may not necessarily be tangible; some intangible types of cost, such as safety, customer satisfaction, availability, etc., could be incorporated. Quantifying these costs is out of the scope of this paper, as they typically vary across organizations and applications. The total cost of misclassification is as shown in Equation (10). The cost factors $c_{fp}$ and $c_{fn}$ can be changed subject to the condition shown in Equation (11):
$$C_b = c_{fp}\, e_b^{fp} + c_{fn}\, e_b^{fn} \qquad (10)$$

$$c_{fp} + c_{fn} = 1 \qquad (11)$$

Now, we use a weighted sum of all the classifiers to obtain the final result. Let $w_1, w_2, \dots, w_B$ be the weights assigned to the classifiers. The objective function associated with the fusion of all these classifiers is as shown in (12). This objective function consists of two main components: the bias component and the variance component.
$$J(w) = \sum_{b=1}^{B} w_b C_b + \sum_{b=1}^{B} w_b V_b \qquad (12)$$

where

$$\sum_{b=1}^{B} w_b = 1, \qquad w_b \geq 0 \qquad (13)$$
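Eqs. (12)-(13) form a constrained nonlinear program that SciPy's SLSQP routine (a Sequential Quadratic Programming method) can solve. The sketch below is one plausible instantiation: modeling the variance component through an output covariance matrix is our assumption, since the exact form of Eq. (12) is not fully recoverable from this excerpt:

```python
import numpy as np
from scipy.optimize import minimize

def optimize_weights(cost, cov):
    """Minimize J(w) = w.cost + w' cov w  s.t.  sum(w) = 1, w >= 0 (Eq. 13).

    cost: length-B vector of per-classifier misclassification costs (Eq. 10).
    cov:  B x B covariance of classifier outputs, standing in for the
          variance component of Eq. (12).
    Uses SLSQP, SciPy's Sequential Quadratic Programming implementation.
    """
    B = len(cost)
    w0 = np.full(B, 1.0 / B)                     # start from uniform weights
    res = minimize(
        lambda w: w @ cost + w @ cov @ w,        # bias (cost) + variance terms
        w0,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * B,                 # w_b >= 0
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return res.x
```

With zero costs and an identity covariance, the optimum spreads the weight evenly, reflecting the variance-reduction benefit of fusing uncorrelated classifiers.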