978-1-4673-1813-6/13/$31.00 ©2013 IEEE
Abstract: Identifying the most suitable classifier for diagnostics
is a challenging task. In addition to using domain expertise, a
trial and error method has been widely used to identify the
most suitable classifier. Classifier fusion can be used to
overcome this challenge, and it is widely known to
perform better than a single classifier. Classifier fusion helps in
overcoming the error due to inductive bias of various
classifiers. The combination rule also plays a vital role in
classifier fusion, and it has not been well studied which combination rules provide the best performance during
classifier fusion. Good combination rules will achieve good
generalizability while taking advantage of the diversity of the
classifiers. In this work, we develop an approach for ensemble
learning consisting of an optimized combination rule. The
generalizability has been acknowledged to be a challenge when
training a diverse set of classifiers, but it can be achieved through an
optimal balance between bias and variance errors using the
combination rule proposed in this paper. Generalizability implies the
ability of a classifier to learn the underlying model from the
training data and to predict the unseen observations. In this
paper, cross validation has been employed during performance
evaluation of each classifier to get an unbiased performance
estimate. An objective function is constructed and optimized
based on the performance evaluation to achieve the optimal bias-variance balance. This function can be solved as a
constrained nonlinear optimization problem. Sequential
Quadratic Programming (SQP) based optimization, which has good
convergence properties, has been employed.
We have demonstrated the applicability of the algorithm by
using support vector machine and neural networks as
classifiers, but the methodology is broadly applicable for
combining other classifier algorithms as well. The method has
been applied to the fault diagnosis of analog circuits. The
performance of the proposed algorithm has been compared to
other combination rules in the literature. It is observed that the
proposed combination rule performs better in reducing the
number of false positives and false negatives.
TABLE OF CONTENTS
1. INTRODUCTION
2. OPTIMIZED FUSION METHODOLOGY
3. RESULTS AND DISCUSSION
4. CONCLUSIONS
REFERENCES
BIOGRAPHIES
1. INTRODUCTION

The field of prognostics and health management (PHM)
involves the development of technologies and
methodologies to increase the availability and reliability of
engineering systems [2] [3] [4]. As part of the PHM
regimen, diagnostic algorithms have been developed and
employed to assist in fault diagnosis. As the number of
available PHM algorithms has increased, the algorithm selection process has become more complex and difficult.
Dilemmas for users when choosing a diagnostic algorithm in
PHM include the following: whether the data utilized for
training is a suitable representation of the global population;
how well the algorithm will perform in a noisy environment;
and how well the algorithm will perform on data that it has
not encountered (i.e. generalizability). A method is needed
to quickly identify appropriate algorithms that meet the
performance requirements for specific applications.
Ensemble learning is one technique that has been employed
to improve generalizability [1] as well as in situations where
the training data is not a suitable representation of the global
population. In ensemble learning, a collection of classifiers
is trained simultaneously, and the results are combined in a
suitable manner (also referred to as the combination rule) to
improve performance. Generally, the approach of ensemble
learning is to train a diverse set of classifiers and then devise
a method to combine these trained classifiers. Diversity in
the ensemble learning context refers to different
classification outputs from each of the trained classifiers for
a given input sample set. Studies have reported on the
importance of and methodologies for generating diverse
classifiers. Brown et al. [5] provided a mathematical account
of the role of diversity in ensemble learning and how it helps to improve classification accuracy. Theoretically, the more
diverse the classifiers are, the less correlated the classifier
outputs are with each other. As a result, the prediction based
on each classifier of the ensemble could be complementary
to each other, which implies that when one classifier makes a
prediction error, other classifiers could be correct. The
complementary nature of these classifiers helps in offsetting
potential errors, thereby providing greater generalizability
for these algorithms.
Surya Kunche
Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742 USA

Chaochao Chen
Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742 USA

Michael G. Pecht
Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742 USA
Optimized Diagnostic Model Combination for
Improving Diagnostic Accuracy
Studies have been conducted on the diversity-generating
stage of the classifiers. The most widely used methodology
for diversity-generation focuses on manipulating the training
data; i.e., supplying each classifier with a different set of
manipulated training data (for example, training a classifier
with only part of the training data). When a classifier is
trained with a manipulated training data set, it typically
generates diverse predictions. Commonly used methods for
manipulating training data are bootstrapping, bagging [17], and boosting [18]. Diversity is also achieved by changing
the adjustable parameters of a classifier being trained; for
example, in neural networks diversity can be achieved by
changing the initial weights, the number of hidden neurons,
the activation function and the training algorithm [5]. Some
researchers have proposed the use of an evolutionary
algorithm to achieve the optimal amount of diversity during
the training phase [19, 20, 21].
Once diversity is achieved for the trained classifiers, a
combination rule is used to combine the classification results. The most common means of classifier fusion include
averaging [7, 8], majority voting [9, 10, 11], weighted
majority voting [12], and a localized fusion [13,14] based
approach that improves the weighted majority voting
algorithm by evaluating the performance of classifiers in the
neighborhood of the test points. Bonissone et al. [15]
proposed a fusion methodology based on classification and
regression trees that reduces the computation time in the
localized fusion.
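As a concrete illustration of these rules, the averaging, majority voting, and weighted majority voting schemes can be sketched as follows (a minimal sketch in Python; the function names and array layout are ours, not taken from the cited implementations):

```python
import numpy as np

def average_fusion(probs):
    """Averaging rule: mean of per-classifier class probabilities.

    probs: array of shape (B, n_classes), one probability row per classifier.
    Returns the index of the class with the highest averaged probability.
    """
    return int(np.argmax(probs.mean(axis=0)))

def majority_vote(labels):
    """Majority voting rule over hard class labels (one label per classifier)."""
    values, counts = np.unique(labels, return_counts=True)
    return int(values[np.argmax(counts)])

def weighted_majority_vote(labels, weights):
    """Weighted majority voting: each classifier's vote counts with its weight."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    scores = [np.sum(np.asarray(weights)[labels == c]) for c in classes]
    return int(classes[int(np.argmax(scores))])
```

Localized fusion extends the weighted scheme by recomputing the weights from each classifier's accuracy in the neighborhood of the test point, rather than using fixed global weights.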
The error of any machine learning algorithm can be
classified into two parts: the bias component and the
variance component. Consider a training set $T_b = \{(x_i, y_i),\ i = 1, 2, \dots, N\}$ for $b = 1, 2, \dots, B$, where $x_i$ is the feature vector and $y_i$ is the corresponding class label. Let $f_b$ be the classifier trained on training set $T_b$, and let $f_b(x)$ be the prediction of classifier $f_b$ for input feature $x$. The bias component of the error is defined as shown in (1). As seen from the equation, the
bias component describes the correctness of the model [36].
The variance of the classifier is as shown in (2). The
variance component describes the precision of the model and
how prediction varies with training set [36]. A machine
learning algorithm achieves minimal prediction errors on
unseen data when there is an optimal balance of bias and
variance [35]. Figure 1 shows the changes in errors
including bias, variance, and total error as the value of
model complexity changes. As seen in Figure 1, good
generalizability is achieved by an optimal balance of bias
and variance.
$$\mathrm{Bias}_b = \left|\mathbb{E}[f_b(x)] - y\right| \qquad (1)$$

$$\mathrm{Var}_b = \mathbb{E}\left[\left(f_b(x) - \mathbb{E}[f_b(x)]\right)^2\right] \qquad (2)$$

$$\mathrm{Err}_b = \mathrm{Bias}_b + \mathrm{Var}_b \qquad (3)$$

The optimized fusion methodology discussed in this paper combines multiple classifiers based on their performance so as to achieve the least total error in fault detection, i.e., the least number of false and missed alarms. To achieve this optimized fusion, we compute bias and variance to evaluate the performance of each classifier. The performance of the classifiers is also validated with a cost function using cross-validation. We then develop a framework for this methodology that combines all the classifiers. A comparative analysis using experimental data has been conducted to demonstrate the accuracy of this algorithm over other methodologies.
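The bias and variance definitions in Eqs. (1) and (2) can be estimated empirically from the predictions that the B bootstrap-trained classifiers make at a test point; averaging over the bootstrap replicates stands in for the expectation. The sketch below is illustrative only (the function name is ours, not the paper's code):

```python
import numpy as np

def bias_variance(predictions, y_true):
    """Empirical bias and variance of B classifier outputs at one test point.

    predictions: length-B sequence of f_b(x), one prediction per bootstrap
                 model; the sample mean stands in for E[f_b(x)].
    y_true: the true target at that point.
    Bias follows Eq. (1): |E[f_b(x)] - y|; variance follows Eq. (2).
    """
    predictions = np.asarray(predictions, dtype=float)
    expected = predictions.mean()
    bias = abs(expected - y_true)
    variance = np.mean((predictions - expected) ** 2)
    return bias, variance
```

For instance, three bootstrap models predicting [1, 1, -1] for a true label of 1 give a bias of 2/3 and a variance of 8/9.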
We discuss our optimized fusion methodology in Section 2. In Section 3 we present two case studies wherein we use this methodology to perform diagnostics for different analog circuits, including a Sallen-Key band-pass filter and a biquad low-pass filter. We also conduct a comparative analysis to evaluate the performance of this methodology. In Section 4 we give concluding remarks.
2. OPTIMIZED FUSION METHODOLOGY
Inductive bias [22] is defined as the set of assumptions that a classifier makes in order to classify a given set of features. For example, support vector machine classification (Figure 2) assumes that different classes can be separated by a hyperplane. In the case of k-nearest neighbor classification, the
Figure 1 Error change as a function of model complexity [33] [36]. The point of least total error marks the optimal bias-variance balance.

Figure 2 SVM classification.
assumption is that the distance of a test point from its k nearest neighbors determines the class of the test point, i.e., the test point belongs to the class to which this distance is least. These assumptions may not necessarily
be true in all instances; for example, as seen in Figure 2, the hyperplane of the support vector machine is not able to correctly determine the decision boundary in this two-class (blue and green) classification problem. Hence, the assumptions made by the classifiers lead to an error known as inductive bias in classification, which is a classification error.
In classifier fusion the complementary features of the classifiers are employed to overcome their individual errors. To achieve this, classifier algorithms are initially trained on the training data set, and then the classification outputs of the classifier algorithms are combined. The method for combining the results of all these classifiers is known as the fusion or combination rule. Figure 3 shows the proposed framework for the fusion of an ensemble of classifiers. The framework can be divided into three parts: algorithm training, fusion parameter computation, and classifier fusion. These three parts are discussed in the following subsections.
Algorithm Training
When training the classifiers, the training data must be
representative of the global data population. In most
situations, the training data constitute only a small part of the
whole data population and may have a lot of noise, which
can lead to misclassification of unseen data sets.
Bootstrapping is a potential solution to this problem.
Bootstrapping: The bootstrapping method was originally proposed by Efron [23]. When bootstrapping is used, the training data are resampled such that sample observations are picked randomly with replacement from the original training data to form new training data sets of the same size as the original training data set [24]. Considering an original training data set with N observations, each bootstrapped training set also consists of N observations, picked by randomly resampling from the original training data set. Individual classifiers are then trained on these data sets. Once the classifiers are trained, their classification performance can
be evaluated by cross-validation, wherein they are evaluated on observations that they have not been trained on. This procedure gives an unbiased estimate of the classification error and is discussed in detail in the Performance Evaluation subsection.
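A minimal sketch of this resampling step, assuming NumPy; the function also returns the out-of-bag indices used later for cross-validated evaluation (the function name and return layout are ours):

```python
import numpy as np

def bootstrap_sets(X, y, B, seed=0):
    """Draw B bootstrap training sets (sampling with replacement, size N).

    Returns a list of (X_b, y_b, oob_idx) tuples, where oob_idx are the
    out-of-bag indices (observations never picked), left for
    cross-validated performance evaluation.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    sets = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # sample N indices with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # observations never picked
        sets.append((X[idx], y[idx], oob))
    return sets
```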
The aim of bootstrapping is to increase the diversity of the algorithm. As suggested previously, increasing the diversity during training improves the generalizability when a suitable combination rule is applied. Diversity has been achieved by using bootstrapped training data with different initial weights in the neural networks, as well as by using various classifiers (neural networks and support vector machines).

Figure 3 Optimized fusion framework.
Classifiers: Once the bootstrapped training data have been generated, the classifiers need to be trained on these samples. Different classifiers are employed to overcome the problem of inductive bias. In this paper, a support vector machine and neural networks are employed. Each member/classifier of the ensemble is trained to be a classification expert on its individual bootstrapped sample.
Support vector machine (SVM) classification is based on
VapnikChervonenkis theory for structural risk
minimization [30]. The objective of SVM is to find the
optimal hyperplane $w \cdot x + b = 0$, where $w$ is the normal to the hyperplane and $|b|/\|w\|$ is the distance of the hyperplane from the origin of the coordinate system [31]. Given input feature vectors $x_1, x_2, \dots, x_N$, where $x_i \in \mathbb{R}^n$, and corresponding classes $y_1, y_2, \dots, y_N$, where $y_i \in \{-1, 1\}$, the aim of the support vector machine is to find an optimal hyperplane such that the objective function shown in Equation (4) is minimized subject to the constraints in Equation (5), which results in a hyperplane that optimally separates the two classes:

$$\min_{w,\xi}\ J(w,\xi) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i \qquad (4)$$

$$\text{s.t.}\quad y_i\left(w \cdot \phi(x_i) + b\right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad y_i \in \{-1, 1\} \qquad (5)$$

where $\xi_i$ is the slack variable, or the distance margin introduced to allow misclassifications, $C$ is the penalty or cost for the misclassifications, and $\phi$ is a mapping of $x$ to a higher-dimensional space.

A neural network is also used for performing classification.
Given an input feature vector $x \in \mathbb{R}^N$ and its corresponding class $y \in \{-1, 1\}$, the neural network has $N$ neurons in the input layer (where $N$ is the size of the input feature vector), a hidden layer, and an output node [32]. A sigmoid activation
function is used in each neuron in the hidden layer. The
output of the neural network is the class label of the input
features. We used a gradient-based approach to train the
neural network.
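The training stage described above might be sketched as follows, with scikit-learn's SVC and MLPClassifier standing in for the paper's SVM and neural network implementations (which are not specified in this excerpt); the alternation between classifier families and all parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

def train_ensemble(X, y, B=4, seed=0):
    """Train a diverse ensemble: SVMs and neural networks on bootstrap samples.

    Diversity comes from (a) resampled training data and (b) alternating
    classifier families. scikit-learn stands in for the implementations
    used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # bootstrap sample, size n
        if b % 2 == 0:
            clf = SVC(C=1.0, kernel="rbf")
        else:
            # sigmoid (logistic) hidden units, gradient-based training
            clf = MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                                max_iter=2000, random_state=b)
        models.append(clf.fit(X[idx], y[idx]))
    return models
```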
Fusion Parameter Computation
The diverse classifiers generated in the previous step need to be suitably combined. Here, a cost function has been formulated and minimized to obtain the most suitable combination of these classifiers. The fusion parameter computation includes two steps: performance evaluation and fusion optimization.
Performance Evaluation: To evaluate the classification performance, the bias and variance errors of each classifier need to be computed for the unseen observations by cross-validation. Bias and variance errors cannot be minimized simultaneously, because a reduction in either one of the two components could lead to an increase in the other, as shown in Figure 1. Therefore, the total error of a classifier, which is a combination of both of these factors, is used to evaluate classification performance, as shown below [33]:

$$\mathrm{Err} = \mathrm{Bias} + \mathrm{Var} \qquad (6)$$

To estimate the bias of the classifiers, conventional validation methods segment the training data into disjoint sets, e.g., training and validation data sets. For example, a data set can be segmented into two parts wherein 70% of the data are used to train the classifier and the remaining 30% are used for optimizing the classifier parameters [34]. But such a method is biased by the validation data. Instead of choosing how much data to use for training and validation, cross-validation has been thought to provide an unbiased estimate. In cross-validation, each of the trained classifiers is evaluated for accuracy on unseen observations.
If the training data are resampled into B bootstrapped sample sets, each classifier is trained on a different sample set. Once all the classifiers are trained, the performance of each classifier is evaluated on the unseen (out-of-bag) observations in the training data. The errors computed are the false positive error ($e_b^{fp}$) and the false negative error ($e_b^{fn}$), as shown in (7):

$$e_b^{fp} = \frac{1}{|M_b|} \sum_{i \in M_b} \mathbb{1}\left\{f_b(x_i) = 1,\ y_i = -1\right\}, \quad b = 1, \dots, B \qquad (7)$$

$$e_b^{fn} = \frac{1}{|M_b|} \sum_{i \in M_b} \mathbb{1}\left\{f_b(x_i) = -1,\ y_i = 1\right\}, \quad b = 1, \dots, B$$

where $f_b$ is the classifier trained on bootstrap sample set $b$; $y_i$ is the actual class output for input feature $x_i$; $N$ is the number of samples in the training data; and $M_b$ is the set of out-of-bag observations for bootstrap sample set $b$ (the training observations not picked for that set), with $|M_b|$ the number of such observations.
In ensemble learning, classifiers with high variance are susceptible to small changes in the input features. For example, neural networks that have too many hidden layers and nodes can have an over-fitting problem that results in high variance and low bias, and therefore poor generalizability [37]. The variance of the classifier is given by Equation (8):
$$V_b = \mathbb{E}\left[\left(f_b(x) - \bar{f}(x)\right)^2\right], \quad b = 1, \dots, B \qquad (8)$$

where $\bar{f}(x)$ is the expected fusion outcome of the classifiers. Since a weighted fusion methodology will be employed, the expected value is given by the following equation:

$$\bar{f}(x) = \sum_{b=1}^{B} w_b f_b(x) \qquad (9)$$

where $w_b$ is the weight of the $b$th classifier.

Fusion Optimization: The primary objective of this fusion methodology is to equip users with a tool that is capable of optimally combining the results of various diagnostic algorithms. The fusion optimization method helps achieve a good bias-variance balance, thereby providing good generalizability.
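The weighted fusion of Eq. (9) reduces to a dot product; taking the sign of the fused score as the final class decision is our illustrative choice, not stated in this excerpt:

```python
import numpy as np

def fused_prediction(outputs, weights):
    """Weighted fusion of classifier outputs, per Eq. (9).

    outputs: length-B array of individual predictions in {-1, +1}.
    weights: length-B weight vector (nonnegative, summing to 1).
    Returns the fused score and its sign as the final class decision.
    """
    score = float(np.dot(weights, outputs))
    return score, (1 if score >= 0 else -1)
```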
The false positive and false negative errors are calculated via cross-validation, as discussed in the previous section. For a given classifier, let the false positive error be $e_b^{fp}$ and the false negative error be $e_b^{fn}$, $b = 1, \dots, B$ (i.e., there are B classifiers trained on the bootstrapped data).
Let us assume that the cost of having a false positive is $c_{fp}$ and that the cost of a false negative is $c_{fn}$. The cost factors $c_{fp}$ and $c_{fn}$ serve as prioritizing parameters; they are relative terms used to prioritize the significance of false positives and false negatives in the cost function. For users who depend on system diagnostics for scheduling maintenance, the cost of a false positive is the cost incurred when a healthy system is erroneously classified as faulty, and the cost of a false negative is incurred due to the erroneous classification of a faulty system as healthy. These costs may not necessarily be tangible; some intangible types of cost, such as safety, customer satisfaction, availability, etc., could be incorporated. Quantifying these costs is out of the scope of this paper, as they typically vary across organizations and applications. The total cost of misclassification is as shown in Equation (10). The cost factors $c_{fp}$ and $c_{fn}$ can be changed subject to the condition shown in Equation (11):
$$C_b = c_{fp}\, e_b^{fp} + c_{fn}\, e_b^{fn} \qquad (10)$$

$$c_{fp} + c_{fn} = 1 \qquad (11)$$

Now, we use a weighted sum of all the classifiers to obtain the final result. Let $w_1, w_2, \dots, w_B$ be the weights assigned to the classifiers. The objective function associated with the fusion of all these classifiers is as shown in (12). This objective function consists of two main components: the bias component and the variance component.
$$J(w) = \sum_{b=1}^{B} w_b C_b + \sum_{b=1}^{B} w_b V_b \qquad (12)$$

where

$$\sum_{b=1}^{B} w_b = 1, \qquad w_b \geq 0 \qquad (13)$$
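Eqs. (12)-(13) form a constrained nonlinear program that SciPy's SLSQP routine (a Sequential Quadratic Programming method) can solve. The sketch below is one plausible instantiation: modeling the variance component through an output covariance matrix is our assumption, since the exact form of Eq. (12) is not fully recoverable from this excerpt:

```python
import numpy as np
from scipy.optimize import minimize

def optimize_weights(cost, cov):
    """Minimize J(w) = w.cost + w' cov w  s.t.  sum(w) = 1, w >= 0 (Eq. 13).

    cost: length-B vector of per-classifier misclassification costs (Eq. 10).
    cov:  B x B covariance of classifier outputs, standing in for the
          variance component of Eq. (12).
    Uses SLSQP, SciPy's Sequential Quadratic Programming implementation.
    """
    B = len(cost)
    w0 = np.full(B, 1.0 / B)                     # start from uniform weights
    res = minimize(
        lambda w: w @ cost + w @ cov @ w,        # bias (cost) + variance terms
        w0,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * B,                 # w_b >= 0
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return res.x
```

With zero costs and an identity covariance, the optimum spreads the weight evenly, reflecting the variance-reduction benefit of fusing uncorrelated classifiers.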