Optimized Diagnostic Model Combination for Improving Diagnostic Accuracy

Surya Kunche, Chaochao Chen, and Michael G. Pecht
Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD 20742 USA
[email protected] | [email protected] | [email protected]

978-1-4673-1813-6/13/$31.00 ©2013 IEEE

Abstract: Identifying the most suitable classifier for diagnostics is a challenging task. In addition to using domain expertise, a trial-and-error method has been widely used to identify the most suitable classifier. Classifier fusion can be used to overcome this challenge, and it is widely known to perform better than a single classifier. Classifier fusion helps in overcoming the error due to the inductive bias of the various classifiers. The combination rule also plays a vital role in classifier fusion, and it has not been well studied which combination rules provide the best performance during classifier fusion. A good combination rule achieves good generalizability while taking advantage of the diversity of the classifiers. In this work, we develop an approach for ensemble learning consisting of an optimized combination rule. Generalizability, which implies the ability of a classifier to learn the underlying model from the training data and to predict unseen observations, has been acknowledged to be a challenge when training a diverse set of classifiers, but it can be achieved by an optimal balance between bias and variance errors using the combination rule in this paper. Cross-validation is employed during the performance evaluation of each classifier to obtain an unbiased performance estimate. An objective function is constructed and optimized based on this performance evaluation to achieve the optimal bias-variance balance; it can be solved as a constrained nonlinear optimization problem. Sequential Quadratic Programming, which has good convergence properties, is employed for the optimization. We demonstrate the applicability of the algorithm using support vector machines and neural networks as classifiers, but the methodology is broadly applicable to combining other classifier algorithms as well. The method has been applied to the fault diagnosis of analog circuits, and the performance of the proposed algorithm has been compared to other combination rules in the literature. It is observed that the proposed combination rule performs better in reducing the number of false positives and false negatives.

TABLE OF CONTENTS

1. INTRODUCTION
2. OPTIMIZED FUSION METHODOLOGY
3. RESULTS AND DISCUSSION
4. CONCLUSIONS
REFERENCES
BIOGRAPHIES

1. INTRODUCTION

The field of prognostics and health management (PHM) involves the development of technologies and methodologies to increase the availability and reliability of engineering systems [2][3][4]. As part of the PHM regimen, diagnostic algorithms have been developed and employed to assist in fault diagnosis. As the number of available PHM algorithms has increased, the algorithm selection process has become more complex and difficult. Dilemmas for users when choosing a diagnostic algorithm in PHM include the following: whether the data utilized for training are a suitable representation of the global population; how well the algorithm will perform in a noisy environment; and how well the algorithm will perform on data that it has not encountered (i.e., generalizability). A method is needed to quickly identify appropriate algorithms that meet the performance requirements for specific applications.

Ensemble learning is one technique that has been employed to improve generalizability [1], as well as in situations where the training data are not a suitable representation of the global population. In ensemble learning, a collection of classifiers is trained simultaneously, and the results are combined in a suitable manner (also referred to as the combination rule) to improve performance. Generally, the approach of ensemble learning is to train a diverse set of classifiers and then devise a method to combine these trained classifiers. Diversity in the ensemble learning context refers to different classification outputs from each of the trained classifiers for a given input sample set. Studies have reported on the importance of, and methodologies for, generating diverse classifiers. Brown et al. [5] provided a mathematical account of the role of diversity in ensemble learning and how it helps to improve classification accuracy. Theoretically, the more diverse the classifiers are, the less correlated their outputs are with each other. As a result, the predictions of the classifiers in the ensemble can be complementary to each other: when one classifier makes a prediction error, the other classifiers can be correct. The complementary nature of these classifiers helps in offsetting potential errors, thereby providing greater generalizability.


Studies have been conducted on the diversity-generating stage of classifier training. The most widely used methodology for diversity generation focuses on manipulating the training data, i.e., supplying each classifier with a different set of manipulated training data (for example, training a classifier with only part of the training data). When a classifier is trained with a manipulated training data set, it typically generates diverse predictions. Commonly used methods for manipulating training data are bootstrapping, bagging [17], and boosting [18]. Diversity can also be achieved by changing the adjustable parameters of a classifier being trained; for example, in neural networks, diversity can be achieved by changing the initial weights, the number of hidden neurons, the activation function, and the training algorithm [5]. Some researchers have proposed the use of an evolutionary algorithm to achieve the optimal amount of diversity during the training phase [19][20][21].
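As an illustration of parameter-based diversity generation, the following minimal sketch (Python with scikit-learn; the library choice, function name, and layer sizes are our own assumptions, not from the paper) varies the initial weights and hidden-layer size across ensemble members:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def diverse_networks(X, y, n_members=5):
    """Train neural networks whose initial weights and hidden-layer
    sizes differ, so that their prediction errors decorrelate."""
    members = []
    for b in range(n_members):
        clf = MLPClassifier(hidden_layer_sizes=(5 + 2 * b,),  # vary width
                            activation="logistic",            # sigmoid units
                            random_state=b,                   # vary init weights
                            max_iter=2000)
        clf.fit(X, y)
        members.append(clf)
    return members
```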

Once diversity is achieved for the trained classifiers, a combination rule is used to combine the classification results. The most common means of classifier fusion include averaging [7][8], majority voting [9][10][11], weighted majority voting [12], and localized fusion [13][14], an approach that improves on the weighted majority voting algorithm by evaluating the performance of the classifiers in the neighborhood of the test points. Bonissone et al. [15] proposed a fusion methodology based on classification and regression trees that reduces the computation time of localized fusion.
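To make the combination step concrete, the sketch below (Python; the function names are our own and illustrative, not the paper's code) implements plain and weighted majority voting for classifiers whose outputs are in {-1, +1}:

```python
import numpy as np

def majority_vote(predictions):
    """predictions: (B, N) array of class labels in {-1, +1}, one row
    per classifier. Ties are left at 0 for simplicity."""
    return np.sign(np.sum(predictions, axis=0))

def weighted_majority_vote(predictions, weights):
    """Each classifier's vote is scaled by its weight before the sign
    is taken; weights typically reflect validation performance."""
    weights = np.asarray(weights).reshape(-1, 1)
    return np.sign(np.sum(weights * predictions, axis=0))
```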

The error of any machine learning algorithm can be divided into two parts: the bias component and the variance component. Consider a training set D = {(x_i, y_i), i = 1, 2, ..., N}, where x_i is the feature vector and y_i is the corresponding class label. Let f_D be the classifier trained on training set D, and let f_D(x) be the prediction of that classifier for input feature x. The bias component of the error is defined as shown in Equation (1); as seen from the equation, the bias component describes the correctness of the model [36]. The variance of the classifier is as shown in Equation (2); the variance component describes the precision of the model and how the prediction varies with the training set [36]. A machine learning algorithm achieves minimal prediction error on unseen data when there is an optimal balance of bias and variance [35]. Figure 1 shows how the bias, variance, and total error change as the model complexity changes. As seen in Figure 1, good generalizability is achieved by an optimal balance of bias and variance.

Bias(x) = | E_D[ f_D(x) ] - y |    (1)

Var(x) = E_D[ ( f_D(x) - E_D[ f_D(x) ] )^2 ]    (2)

Err(x) = Bias(x) + Var(x)    (3)
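A minimal numeric sketch of these definitions (Python; an approximation we introduce here, with the expectation over training sets D replaced by an average over B trained replicates):

```python
import numpy as np

def bias_variance(predictions, y_true):
    """predictions: (B, N) array, predictions[b, i] is classifier b's
    output for observation i; y_true: (N,) true labels.
    Approximates Equations (1)-(3) with a sample mean over the B
    replicates in place of the expectation E_D."""
    mean_pred = predictions.mean(axis=0)                  # E_D[f_D(x)]
    bias = np.abs(mean_pred - y_true)                     # Eq. (1)
    var = ((predictions - mean_pred) ** 2).mean(axis=0)   # Eq. (2)
    return bias, var, bias + var                          # Eq. (3)
```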

The optimized fusion methodology discussed in this paper combines multiple classifiers based on their performance so as to achieve the least total error in fault detection, i.e., the least number of false and missed alarms. To achieve this optimized fusion, we compute the bias and variance to evaluate the performance of each classifier. The performance of the classifiers is also validated with a cost function using cross-validation. We then develop a framework for this methodology that combines all the classifiers. A comparative analysis using experimental data has been conducted to demonstrate the accuracy of this algorithm relative to other methodologies.

We discuss our optimized fusion methodology in Section 2. In Section 3 we present two case studies in which we used this methodology to perform diagnostics for analog circuits, including a Sallen-Key band-pass filter and a biquad low-pass filter; we also conduct a comparative analysis to evaluate the performance of the methodology. In Section 4 we give concluding remarks.

2. OPTIMIZED FUSION METHODOLOGY

Figure 1. Error change as a function of model complexity [33][36]; the point of least total error marks the optimal bias-variance balance.

Figure 2. SVM classification.

Inductive bias [22] is defined as the set of assumptions that a classifier makes in order to classify a given set of features. For example, support vector machine classification (Figure 2) assumes that different classes can be separated by a hyperplane. In the case of k-nearest-neighbor classification, the


assumption is that the distance of a test point from its k nearest neighbors determines the class of the test point, i.e., the test point belongs to the class to which this distance is least. These assumptions may not necessarily hold in all instances; for example, as seen in Figure 2, the hyperplane of the support vector machine is not able to correctly determine the decision boundary in this two-class (blue and green) classification problem. Hence, the assumptions made by the classifiers lead to a classification error known as inductive bias.

In classifier fusion, the complementary features of the classifiers are employed to overcome their individual errors. To achieve this, the classifier algorithms are initially trained on the training data set, and then their classification outputs are combined. The method for combining the results of all these classifiers is known as the fusion or combination rule. Figure 3 shows the proposed framework for the fusion of an ensemble of classifiers. The framework can be divided into three parts: algorithm training, fusion parameter computation, and classifier fusion. These three parts are discussed in the following subsections.

    Algorithm Training

When training the classifiers, the training data must be representative of the global data population. In most situations, the training data constitute only a small part of the whole data population and may contain substantial noise, which can lead to misclassification of unseen data sets. Bootstrapping is a potential solution to this problem.

Bootstrapping: The bootstrapping method was originally proposed by Efron [23]. When bootstrapping is used, the training data are resampled such that sample observations are picked randomly with replacement from the original training data to form new training data sets of the same size as the original training data set [24]. Given an original training set with N observations, each bootstrapped training set also consists of N observations, picked by randomly resampling from the original training data set. Individual classifiers are then trained on these data sets. Once the classifiers are trained, their classification performance can be evaluated by cross-validation, wherein they are evaluated on observations that they have not been trained on. This procedure gives an unbiased estimate of the classification error and is discussed in detail in the Performance Evaluation section.
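As a concrete illustration of this resampling step, the sketch below (Python; the helper name and use of NumPy are our own assumptions) draws B bootstrap sample sets and records the observations each classifier was not trained on, on which it can later be cross-validated:

```python
import numpy as np

def bootstrap_sets(n_obs, n_sets, seed=0):
    """Draw `n_sets` bootstrap index sets, each of size `n_obs`, by
    sampling with replacement; also return each set's out-of-bag
    (OOB) indices, i.e., the observations not picked for that set."""
    rng = np.random.RandomState(seed)
    sets = []
    for _ in range(n_sets):
        in_bag = rng.randint(0, n_obs, size=n_obs)  # with replacement
        oob = np.setdiff1d(np.arange(n_obs), in_bag)
        sets.append((in_bag, oob))
    return sets
```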

The aim of bootstrapping is to increase the diversity of the ensemble. As discussed previously, increasing the diversity during training improves the generalizability when a suitable combination rule is applied. Diversity has been achieved by using bootstrapped training data with different initial weights in the neural networks, as well as by using different types of classifiers (neural networks and support vector machines).

Figure 3. Optimized fusion framework.

Classifiers: Once the bootstrapped training data have been generated, the classifiers need to be trained on these samples. Different classifiers are employed to overcome the problem of inductive bias. In this paper, a support vector machine and neural networks are employed. Each member of the ensemble is trained to be a classification expert on its individual bootstrapped sample set.

Support vector machine (SVM) classification is based on Vapnik-Chervonenkis theory for structural risk minimization [30]. The objective of SVM is to find the optimal hyperplane w.x + b = 0, where w is the normal to the hyperplane and |b| / ||w|| is the distance of the hyperplane from the origin of the coordinate system [31]. Given input feature vectors x_1, x_2, ..., x_N, where x_i ∈ R^n, and the corresponding classes y_1, y_2, ..., y_N, where y_i ∈ {-1, 1}, the aim of the support vector machine is to find an optimal hyperplane such that the objective function shown in Equation (4) is minimized subject to the constraints in Equation (5), which results in a hyperplane that optimally separates the two classes:

min_{w, ξ}  (1/2) ||w||^2 + C Σ_i ξ_i    (4)

s.t.  y_i ( w . φ(x_i) + b ) ≥ 1 - ξ_i,  ξ_i ≥ 0,  y_i ∈ {-1, 1}    (5)

where ξ_i is the slack variable, or distance margin, introduced to allow for misclassifications; C is the penalty, or cost, for the misclassifications; and φ is a mapping of x to a higher-dimensional space.

A neural network is also used for performing classification. Given an input feature vector x = (x_1, x_2, ..., x_N), with corresponding class y ∈ {-1, 1}, the neural network has N neurons in the input layer (where N is the size of the input feature vector), a hidden layer, and an output node [32]. A sigmoid activation function is used in each neuron of the hidden layer. The output of the neural network is the class label of the input features. We used a gradient-based approach to train the neural network.
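A minimal sketch of this training stage, assuming scikit-learn implementations in place of the paper's own SVM and neural network code (the penalty C, kernel, and hidden-layer size shown are placeholder values):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

def train_ensemble(X, y, sample_sets):
    """Train one classifier per bootstrap sample set, alternating
    between an SVM and a sigmoid-activation neural network so that
    the members carry different inductive biases."""
    members = []
    for b, (in_bag, _) in enumerate(sample_sets):
        if b % 2 == 0:
            clf = SVC(C=1.0, kernel="rbf")  # soft-margin SVM, Eq. (4)-(5)
        else:
            clf = MLPClassifier(hidden_layer_sizes=(10,),
                                activation="logistic",  # sigmoid
                                max_iter=2000, random_state=b)
        clf.fit(X[in_bag], y[in_bag])
        members.append(clf)
    return members
```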

    Fusion Parameter Computation

The diverse classifiers generated in the previous step need to be suitably combined. Here, a cost function is formulated and minimized to obtain the most suitable combination of these classifiers. The fusion parameter computation includes two steps: performance evaluation and fusion optimization.

Performance Evaluation: To evaluate the classification performance, the bias and variance errors of each classifier need to be computed for unseen observations by cross-validation. The bias and variance errors cannot be minimized simultaneously, because a reduction in either one of the two components can lead to an increase in the other, as shown in Figure 1. Therefore, the total error of a classifier, which combines both of these factors, is used to evaluate the classification performance, as shown below [33]:

Err = Bias + Var    (6)

To estimate the bias of the classifiers, conventional validation methods segment the training data into disjoint sets, e.g., training and validation data sets. For example, a data set can be segmented into two parts wherein 70% of the data are used to train the classifier and the remaining 30% are used for optimizing the classifier parameters [34]. But such a method is biased by the validation data. To choose how much data to use for training and validation, cross-validation has been thought to provide an unbiased estimate. In cross-validation, each of the trained classifiers is evaluated for accuracy on unseen observations.

If the training data are resampled into B bootstrapped sample sets, each classifier is trained on a different sample set. Once all the classifiers are trained, the performance of each classifier is evaluated on the unseen observations in the training data. The errors computed are the false positive error (e_fp) and the false negative error (e_fn), as shown in Equation (7).

e_fp^b = (1/|M_b|) Σ_{i ∈ M_b} I( f_b(x_i) = 1, y_i = -1 ),   b: 1, ..., B

e_fn^b = (1/|M_b|) Σ_{i ∈ M_b} I( f_b(x_i) = -1, y_i = 1 ),   b: 1, ..., B    (7)

where f_b is the output of the classifier trained on bootstrap sample set b, b: 1, ..., B; y_i is the actual class output for the input feature x_i; N is the number of samples in the training data; I(.) is the indicator function, equal to 1 when its condition holds and 0 otherwise; M_b is the set of indices of the training observations not contained in bootstrap sample set b (the unseen observations for classifier b); and |M_b| is the number of such observations.

In ensemble learning, classifiers with high variance are susceptible to small changes in the input features. For example, neural networks that have too many hidden layers and nodes can have an over-fitting problem that results in high variance and low bias, and therefore poor generalizability [37]. The variance of the classifier is given by Equation (8):


Var_b = E[ ( f_b(x) - f̄(x) )^2 ],   b: 1, ..., B    (8)

where f̄(x) is the expected fusion outcome of the classifiers. Since a weighted fusion methodology will be employed, the expected value is given by the following equation:

f̄(x) = Σ_{i=1}^{B} w_i f_i(x)    (9)

where w_i is the weight of the ith classifier.

Fusion Optimization: The primary objective of this fusion methodology is to equip users with a tool that is capable of optimally combining the results of various diagnostic algorithms. The fusion optimization method helps achieve a good bias-variance balance, thereby providing good generalizability.

The false positive and false negative errors are calculated via cross-validation, as discussed in the previous section. For a given classifier b, let the false positive error be e_fp^b and the false negative error be e_fn^b, b: 1, ..., B (i.e., there are B classifiers trained on the bootstrapped data).

Let us assume that the cost of a false positive is c_fp and that the cost of a false negative is c_fn. The cost factors c_fp and c_fn serve as prioritizing parameters; they are relative terms used to prioritize the significance of false positives and false negatives in the cost function. For users who depend on system diagnostics for scheduling maintenance, the cost of a false positive is the cost incurred when a healthy system is erroneously classified as faulty, and the cost of a false negative is that incurred due to the erroneous classification of a faulty system as healthy. These costs may not necessarily be tangible; some intangible types of cost, such as safety, customer satisfaction, availability, etc., could be incorporated. Quantifying these costs is out of the scope of this paper, as they typically vary across organizations and applications. The total cost of misclassification for classifier b is as shown in Equation (10), and the cost factors c_fp and c_fn can be changed subject to the condition shown in Equation (11):

C_b = c_fp e_fp^b + c_fn e_fn^b    (10)

c_fp + c_fn = 1    (11)

Now, we use a weighted sum of all the classifiers to obtain the final result. Let w_1, w_2, ..., w_B be the weights assigned to each classifier. The objective function associated with the fusion of all these classifiers is as shown in Equation (12); it consists of two main components: the bias component and the variance component.

J(w) = Σ_{b=1}^{B} w_b C_b + Σ_{b=1}^{B} w_b Var_b    (12)

where

Σ_{b=1}^{B} w_b = 1    (13)

w_b ≥ 0
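The abstract states that the optimization is carried out with Sequential Quadratic Programming. Below is a minimal sketch of solving this constrained problem, assuming SciPy's SLSQP solver as a stand-in for the paper's implementation and using the reconstructed objective above (the exact objective form and starting point are our assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def optimize_weights(cost, preds):
    """Solve for fusion weights w minimizing a bias-variance objective
    (Eq. 12) subject to sum(w) = 1 and w >= 0 (Eq. 13), using SLSQP,
    an SQP-based method.
    cost:  (B,) per-classifier misclassification costs C_b (Eq. 10).
    preds: (B, N) cross-validated predictions for the variance term."""
    B = len(cost)

    def objective(w):
        fused = w @ preds                            # Eq. (9): weighted fusion
        var_b = ((preds - fused) ** 2).mean(axis=1)  # Eq. (8): member variance
        return w @ cost + w @ var_b                  # Eq. (12): bias + variance

    res = minimize(objective,
                   x0=np.full(B, 1.0 / B),           # start at equal weights
                   method="SLSQP",
                   bounds=[(0.0, 1.0)] * B,
                   constraints=[{"type": "eq",
                                 "fun": lambda w: w.sum() - 1.0}])
    return res.x
```

The returned weights can then be plugged into the weighted majority vote sketched earlier to fuse the ensemble's predictions.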