12
IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 1, MARCH 2015 51 Assessing Risk of a Serious Failure Mode Based on Limited Field Data Zhibing Xu, Yili Hong, and William Q. Meeker Abstract—Many consumer products are designed and manufac- tured so that the probability of failure during the technological life of the product is small. Most product units in the field retire be- fore they fail. Even though the number of failures of such prod- ucts is small, there is still a need to model and predict field fail- ures for purposes of risk assessment in applications that involve safety. Challenges in the modeling and prediction of failures arise because the retirement times are often unknown, few failures have been reported, and there are delays in field failure reporting. Mo- tivated by an application to assess the risk of failure for a par- ticular product, we develop a statistical prediction procedure that considers the impact of product retirements and reporting delays. Based on the developed method, we provide the point predictions for the cumulative number of reported failures over a future time period, and corresponding prediction intervals to quantify uncer- tainty. We also conduct sensitivity analysis to assess the effects of different assumptions on failure-time and retirement distributions. Index Terms—Calibration, discrete Fourier transform, failure reporting delay, Poisson-binomial distribution, prediction interval, Weibull. Acronyms and Abbreviations: cdf cumulative distribution function DFD data freeze date ML maximum likelihood pdf probability density function PI prediction interval Notation: cumulative number of reported failures at time retirement time failure time reporting delay time pdf of the failure-time distribution cdf of the failure-time distribution Manuscript received September 08, 2013; revised February 14, 2014; ac- cepted April 22, 2014. Date of publication September 11, 2014; date of current version February 27, 2015. The work of Z. Xu and Y. Hong was supported in part by the National Science Foundation under Grant CMMI-1068933 to Vir- ginia Tech and the 2011 DuPont Young Professor Grant. Associate Editor: R. H. Yeh. Z. Xu and Y. Hong are with the Department of Statistics, Virginia Poly- technic Institute and State University, Blacksburg, VA 24061 USA (e-mail: [email protected]; [email protected]). W. Q. Meeker is with the Department of Statistics, Iowa State University, Ames, IA 50011 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TR.2014.2354893 cdf of the retirement-time distribution probability of a reported failure in batch before the DFD probability of a unit in batch being reported in after the DFD standard smallest extreme value cdf standard smallest extreme value pdf standard -normal cdf standard -normal pdf expected value (mean) of the retirement-time distribution variance of the retirement-time distribution standard deviation of the retirement-time distribution Weibull shape parameter of the failure-time distribution Weibull scale parameter of the failure-time distribution location parameter of the distribution of scale parameter of the distribution of probability of a unit in batch not being reported as failing before the DFD age that failed unit would be if it had survived to the DFD age of a unit in batch at the DFD number of units not being reported as failures before the DFD in batch ML estimator of likelihood function for the unknown parameter vector I. INTRODUCTION A. Motivation Product reliability is important to manufacturers and con- sumers both. Many products have high reliability with only a small fraction failing (e.g., 1% or less). There are, however, some failure modes that can lead to a risk of loss of property or life. Examples include material anomalies in rotating com- ponents in aircraft engines that lead to premature cracking and 0018-9529 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Assessing Risk of a Serious Failure Mode Based on Limited Field Data

Embed Size (px)

DESCRIPTION

failure mode

Citation preview

  • IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 1, MARCH 2015 51

    Assessing Risk of a Serious FailureMode Based on Limited Field Data

    Zhibing Xu, Yili Hong, and William Q. Meeker

    AbstractMany consumer products are designed and manufac-tured so that the probability of failure during the technological lifeof the product is small. Most product units in the eld retire be-fore they fail. Even though the number of failures of such prod-ucts is small, there is still a need to model and predict eld fail-ures for purposes of risk assessment in applications that involvesafety. Challenges in the modeling and prediction of failures arisebecause the retirement times are often unknown, few failures havebeen reported, and there are delays in eld failure reporting. Mo-tivated by an application to assess the risk of failure for a par-ticular product, we develop a statistical prediction procedure thatconsiders the impact of product retirements and reporting delays.Based on the developed method, we provide the point predictionsfor the cumulative number of reported failures over a future timeperiod, and corresponding prediction intervals to quantify uncer-tainty. We also conduct sensitivity analysis to assess the effects ofdifferent assumptions on failure-time and retirement distributions.Index TermsCalibration, discrete Fourier transform, failure

    reporting delay, Poisson-binomial distribution, prediction interval,Weibull.

    Acronyms and Abbreviations:

    cdf cumulative distribution functionDFD data freeze dateML maximum likelihoodpdf probability density functionPI prediction intervalNotation:

    cumulative number of reported failures at time

    retirement timefailure timereporting delay timepdf of the failure-time distributioncdf of the failure-time distribution

    Manuscript received September 08, 2013; revised February 14, 2014; ac-cepted April 22, 2014. Date of publication September 11, 2014; date of currentversion February 27, 2015. The work of Z. Xu and Y. Hong was supported inpart by the National Science Foundation under Grant CMMI-1068933 to Vir-ginia Tech and the 2011 DuPont Young Professor Grant. Associate Editor: R.H. Yeh.Z. Xu and Y. Hong are with the Department of Statistics, Virginia Poly-

    technic Institute and State University, Blacksburg, VA 24061 USA (e-mail:[email protected]; [email protected]).W. Q. Meeker is with the Department of Statistics, Iowa State University,

    Ames, IA 50011 USA (e-mail: [email protected]).Digital Object Identier 10.1109/TR.2014.2354893

    cdf of the retirement-time distributionprobability of a reported failure in batchbefore the DFDprobability of a unit in batch being reportedin after the DFDstandard smallest extreme value cdfstandard smallest extreme value pdfstandard -normal cdfstandard -normal pdfexpected value (mean) of the retirement-timedistributionvariance of the retirement-time distributionstandard deviation of the retirement-timedistributionWeibull shape parameter of the failure-timedistributionWeibull scale parameter of the failure-timedistributionlocation parameter of the distribution of

    scale parameter of the distribution ofprobability of a unit in batch not beingreported as failing before the DFDage that failed unit would be if it had survivedto the DFDage of a unit in batch at the DFDnumber of units not being reported as failuresbefore the DFD in batchML estimator oflikelihood function for the unknown parametervector

    I. INTRODUCTIONA. MotivationProduct reliability is important to manufacturers and con-

    sumers both. Many products have high reliability with only asmall fraction failing (e.g., 1% or less). There are, however,some failure modes that can lead to a risk of loss of propertyor life. Examples include material anomalies in rotating com-ponents in aircraft engines that lead to premature cracking and

    0018-9529 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

  • 52 IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 1, MARCH 2015

    fracture, failure of electrical insulation in home appliancesgiving rise to the risk of re or electrical shock, failure ofelectrical connections in debrillators, and the explosion of alaptop battery.Although the particular technical details and nature of the

    available data and other information will differ from applica-tion to application, there is a common scenario that we haveseen in numerous different applications. At some point in time(which may range from months to years) after product intro-duction, a few failures have been reported. Often the particularfailure mode is one that had not been anticipated. Sometimesthe problem was caused by just a single batch of raw material,or an unreported and untested change in a component or mate-rial made by a vendor. Generally, management (or in some casesgovernment agencies) will want engineers to determine whetherthere is a serious problem, and will often ask for a formal risk as-sessment. This action then leads to some or all of the followingquestions.1) Were the reported failures anomalies (e.g., cause by ex-

    treme product abuse or a few defective units that gotshipped) or is the problem more widespread? Often, it isthe latter, but wishful thinking will cause some to believethe former.

    2) Is there a small proportion of defective units failing rapidly,or will all units (that remain in service) eventually fail pre-maturely?

    3) What is the risk (e.g., potential cost, both tangible and in-tangible) of future failures from this product?

    4) Should there be a product recall?5) How can we x the problem so that future production will

    not have the failure mode of concern?This paper focuses on statistical methods for answering question3.In some applications, the risk of failure is lessened by product

    retirement, before product failure occurs. Retirements areoften a result of product performance degradation or technicalobsolescence. For example, cell phones and laptop computersare typically retired after two or three years of use. Ironically,for some products, the risk of a serious failure is sometimeslessened by the occurrence of an innocuous failure mode. Forexample, an implanted debrillator that has a broken electricalconnection would be removed from service if its rechargeablebattery fails before the unit is called upon to be used. In suchapplications, possible retirement or innocuous failure eventsshould be part of the risk assessment. Another complicatingfeature of some eld data is the delayed reporting of failures.Motivated by several different but similar applications, we

    develop a statistical procedure to predict the eld failures ofproducts, considering the impact of product retirement and re-porting delays. Based on the developed method, we providepoint predictions for the cumulative number of reported fail-ures at a future point in time, and the corresponding predictioninterval (PI) to quantify uncertainty.B. Related WorkThere is a large amount of literature describing statistical pre-

    diction, and some of this previous work has focused on the pre-diction of the number of failures in a future time period and the

    construction of a corresponding PI. Nelson [1], and Meeker andEscobar [2] introduced general methods to obtain PIs for reli-ability applications. Engehardt and Bain [3] provided an exactPI for the number of failures in a repairable system based onmaximum likelihood (ML) estimation. Mee and Kushary [4]gave simulation-based methods for computing PIs for selectedorder statistics from future samples from a Weibull distribu-tion. Nelson [5], and Nordman and Meeker [6] proposed PI pro-cedures based on a Weibull distribution with a known shapeparameter. Geisser [7], and Tian, Tang, and Yu [8] describedBayesian approaches to obtain a PI. De Menezes, Vivanco, andSampaio [9] used subsampling to obtain the PI for the numberof failures in a future time interval. For censored failure-timedata, Escobar and Meeker [10], and Hong, Meeker, and Mc-Calley [11] described methods to obtain PIs for a future numberof failures. Lawless and Fredette [12] proposed an effective,easy-to-use procedure to construct frequentist PIs. Yang [13]proposed a prediction method for warranty cost based on ac-celerated life test plans. Park and Kulasekera [14] described aparametric method to deal with the competing risks problem.Few published works, however, have considered prediction inthe presence of the unknown retirement times and reporting de-lays. In one exception, Zhao, Steffey, and Loud [15] comparedthe difference of predictions between the models that accountfor retirement and that do not account for retirement, based ona specic retirement rate assumption. In this paper, we proposea general statistical procedure to predict the future number ofeld failures in the presence of retirement and reporting delays,and we develop a PI procedure to quantify the uncertainties inprediction.

    C. Overview

    Our approach to eld-failure prediction uses the followingsteps. Failure-time modelingWe rst construct a failure-time

    model based on assumptions for the retirement-time distri-bution and reporting delays. Then, we estimate the param-eters of the failure-time distribution using ML.

    Derivation of the probability of future failuresBased onthe failure-time and retirement-time distributions, as wellas the ML estimates from Step 1, the probability of a re-ported failure in a future time interval can be estimated,providing the basis of prediction of the cumulative numberof reported failures in a specied future time period.

    PredictionBased on the probability of a reported failure,and the number of units that are at risk, one can obtain apoint prediction for the cumulative number of reported fail-ures by a specied future point in time and a correspondingPI.

    Prediction IntervalWe use a method based on the con-cept of a predictive distribution and bootstrap calibrationto construct PIs for the future number of failures.

    Sensitivity analysisThe predictions are based on uncer-tain assumptions about the failure-time and the retirement-time distributions. Thus, it is prudent to assess the effect ofdeviations from these assumptions.

  • XU et al.: ASSESSING RISK OF A SERIOUS FAILURE MODE BASED ON LIMITED FIELD DATA 53

    The rest of the paper is organized as follows. Section II in-troduces the failure-time distribution, the retirement-time dis-tribution, and the reporting delay distribution. Section III de-velops an ML procedure to estimate the unknown failure-timedistribution parameters. Section IV shows in detail how to usethe failure-time distribution and estimated parameters to pre-dict the number of reported future failures, and how to computea corresponding PI. In Section V, sensitivity analysis is usedto compare the prediction results with different parameters anddistributions. Section VI gives some concluding remarks, anddescribes possible areas for future research.

    II. DATA AND FAILURE-TIME MODELA. The DataThis paper uses a dataset from a product that is used at home

    that we call product B. To protect proprietary and sensitiveinformation, we have disguised the data by changing the timescale, and using a randomly chosen subset of the originaldataset. Although our methods were motivated by this specicapplication, the developed method is general, and can be ap-plied to other situations with unknown retirement times andreporting delays.The company manufactured 14 batches of product B over

    time, and there were 120,921 units in total. The units were putinto service at different times inclusive between January 1996and June 1997 (staggered entry). We dene the rst installa-tion time (i.e., January 1996) to be time 0. Then the installa-tion times for the 14 different batches were 0, 2, 3, 4, 5, 6, 7,10, 12, 13, 14, 15, 16, and 17 months, respectively. After 55months (a little less than ve years) later, a potentially dan-gerous failure mode was reported. Subsequently, 32 additionalfailures were reported by the data freeze date (DFD), which wasat 118 months (about 9.8 years) after the rst units were intro-duced into service. Fig. 1 illustrates the staggered entry pattern.The gure shows, for each batch, the number of units that hadbeen installed, the time where failures were reported, and thenumber of units that had not being reported as failing by theDFD.Table I shows the 32 reported failure times (months in service

    before failure). The failure times are denoted by ,where is the total number of reported failures ( here).The failure times were recorded to the nearest month. For ex-ample, failure time 91 indicates that a unit failed between 90.5and 91.5 months after its installation. Table I also lists the agethat the failed unit would have been at the DFD if it had notfailed; these times are denoted by .Table II shows the number of units installed, the number of

    failures reported, the number of units not being reported by theDFD, and the ages of units at the DFD (denoted by ) for the14 batches of product B. The number of units installed, denotedby , where is the total number of batches (

    here), ranges between 5,795 and 12,233. The numberof units that were not reported is denoted by . Although thenumber of reported failures from each batch and the overall frac-tion failing is small (0.026%), it is worthy to investigate suchdata, and predict the number of failures in the future due to thepotential serious consequences of the failure.

    Fig. 1. Event plot for the dataset with indicating the failure times for thereported failures.

    TABLE IFAILURE TIME , AND CORRESPONDING AGE

    OF THE UNIT'S BATCH AT THE DFD

    B. Model for Time to Failure

    Let denote the product failure time. We use the log-lo-cation-scale family of distributions to model the distributionof . Among those members in the log-location-scale family,the Weibull and lognormal distributions are the two most com-monly-used distributions for describing failure times. In partic-ular, the cumulative distribution function (cdf) and probabilitydensity function (pdf) of the Weibull distribution can be ex-pressed as

    where , andare the standard smallest extreme value cdf, and pdf,

  • 54 IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 1, MARCH 2015

    TABLE IIINSTALLED QUANTITIES, NUMBER OF FAILURES REPORTED, NUMBER

    NOT REPORTED, AND THE AGE OF THE BATCH AT THE DFD

    respectively. Here, is the location parameter, and is thescale parameter of the distribution of .We use the Weibull distribution to describe the distribution

    of . In our sensitivity analysis, the lognormal distribution isconsidered as an alternative to describe the failure-time distri-bution. By replacing , and with , and , thecdf, and pdf of the lognormal distribution are

    respectively.The cdf, and pdf of theWeibull distribution can also be re-ex-

    pressed as

    where , is the Weibull scale parameter(also the approximate 0.63 quantile), and is theWeibull shape parameter. The value of indicates the shapeof the hazard function, which is given by

    In particular, indicates an increasing hazard function;indicates a constant hazard function; and in-

    dicates a decreasing hazard function. More information aboutthe log-location-scale family of distributions can be found inChapter 4 of Meeker and Escobar [2].

    C. Product Retirement DistributionProduct retirement occurs when a unit is removed from

    service before it fails. Let be the time of retirement. To avoidprediction bias, it is important to incorporate the retirementinformation into the failure-time model. However, there is notracking of the retirement time at the individual product level.Thus, the retirement information at the population level has tobe used.A marketing survey had been conducted, providing informa-

    tion about the length of time that people tended to use certainappliances before replacing them with new models. This surveyprovided information about the mean and standard deviation ofappliances like product B. By assuming that the retirement timess-independently and identically follow a Weibull distribution,we obtain that the mean retirement time is(approximately 8.2 years), and the shape parameter is be-tween 1.5 and 2. The cdf of the retirement time can be expressedas

    (1)

    The value of shape parameter indicates that the re-tirement hazard function is an increasing function of productage. The mean of the Weibull distribution is

    , implying that the Weibull characteristic life parameteris between 108.6 and 110.6 months. Due to the nature of the

    serious failure mechanism (because it has no symptoms beforeit occurs), it can reasonably be assumed to be s-independent ofthe time of retirement.D. Failure Reporting DelayIn this application, there were known delays in the reporting

    of failures. Because these delays were potentially important tothe estimation of the failure-time distribution, there was need toconsider them in modeling and prediction. We denote the lengthof the delay by , which is assumed to be s-independent of thefailure time. The reporting time is equal to , where isthe product's failure time.Based on available records, no reporting delays had been

    longer than 15months. Thus, the probability of a reporting delaygreater than or equal to 16 months is equal to zero, and all delaytimes are between 0 month and 15 months. Based on histor-ical information, the distribution of delays is approximated by adiscrete distribution given in Table III. A particular delay timeis denoted by , and the corresponding probability is denotedby . Table III indicates that around 62% of failureswould be reported to the company without any delay. Note that

    . Because the failure process and reportingprocess are not related, it is reasonable to assume that the re-porting delay is s-independent of failure time .

    III. MAXIMUM LIKELIHOOD ESTIMATIONA. Construction of the Likelihood FunctionLet denote the realized failure time of unit , which is the

    amount of time between when the unit was installed and whenit failed. If a unit failed, and the failure was reported before theDFD, the failure that occurred at time was recorded. Because

  • XU et al.: ASSESSING RISK OF A SERIOUS FAILURE MODE BASED ON LIMITED FIELD DATA 55

    TABLE IIIPROBABILITY MASS FUNCTION FOR THE DISTRIBUTION OF REPORTING DELAYS

    the failure times were recorded to the nearest month, the actualfailure time for observation is in the interval , where

    , and . The prob-ability of a failure before retirement with failure time between

    and is

    (2)where the factor represents the probability that theunit retires after time . One can consider the failure time andretirement time as in a competing-risks model (e.g., Chapter2 of Crowder [16]). Fig. 2 illustrates the computing of the like-lihood contribution in (2) in which the shaded area shows thelikelihood contribution.To account for reporting delay, (2) needs to be adjusted. Here,

    the reporting delay is incorporated into the model by condi-tioning on the observed value of . In particular, the probabilityof actually failing in the interval and having the failurereported before the DFD is shown in (3) at the bottom of thepage, where

    whenotherwise.

    The indicator function accounts for the cen-soring that arises because we only know about failures that arereported before the DFD. For purposes of numerical computa-tion, (3) can be re-expressed as

    Fig. 2. Illustration of the likelihood contribution in (2) relative to the joint dis-tribution of and . The probability under the shaded region is the likelihoodcontribution.

    For those units that were not reported as failures before DFD,the probability that a unit in installation batch has not beenreported as a failure before DFD is shown in (4) at the bottomof the page, where

    whenotherwise.

    (3)

    (4)

  • 56 IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 1, MARCH 2015

    Fig. 3. Comparison of shape of the log relative likelihood function of the two parametrizations with , and . (a) Original Parametrization.(b) Alternative Parametrization.

    Equation (4) can be re-expressed as

    The log-likelihood function based on the data in Table I andTable II is

    (5)

    where . Here, the rst summation is over thereported failures, the second summation is over the installationbatches in Table II, and is the number of units from batchthat have not been reported as failures.

    B. Parameter Estimates

    The ML estimator of is denoted by . To makethe numerical optimization more stable, we optimized the log-likelihood function using an alternative parametrizationand , instead of the original parametrization and . Here,

    is the 0.001 quantile of the product failure-time distribu-tion. The effect of the reparametrization is shown in the con-tour plots of the log relative likelihood as in Fig. 3. Under theoriginal parametrization, the shape of the log relative likeli-hood is elongated, indicating a strong correlation betweenand . Such strong correlation will make the numerical opti-mization less stable. Under the alternative parametrization, thelog-likelihood is better behaved. In addition, it is more mean-ingful to parameterize it using instead of using , be-cause is the time by which 63.2% of the population willfail, which is hard to reach with the consideration of retire-ment time. Due to the invariance property of ML estimators,theML estimates obtained under the alternative parametrization

    TABLE IVML ESTIMATES AND STANDARD ERRORS FOR PARAMETERS

    UNDER

    can be transformed into the ML estimates for the original pa-rameters for subsequent computations. Here,

    , where is the0.001 quantile of the standard smallest extreme value distribu-tion. The standard error of can be obtained by using the deltamethod. More details about the delta method can be found inpage 626 of Meeker and Escobar [2]. The CI can be computedby the transformation as in Hong, Meeker, and Escobar [17].Table IV shows the estimates of the Weibull shape and

    scale parameters based on the Weibull retirement distributionassumption with , and . Under this as-sumption, the Weibull shape parameter estimate is ,which is larger than 1, indicating that the failure-time distribu-tion hazard function is increasing, which is in agreement withthe known physical degradation cause of failure.It is important to consider retirement when one needs to de-

    termine the fraction reported. As an illustration, Fig. 4 showsthe failure-time distributions with and without adjustment of re-tirement. Here, the distribution with adjustment of retirementis plotted based on , and the distribu-tion without adjustment of retirement time is plotted based on

    , where is the estimated pdf based on ML es-timates in Table IV given , and . The gureillustrates the large effect that the retirement distribution playsin determining the fraction reported as a function of time. Thatis, the failure probability of the distribution without considering

  • XU et al.: ASSESSING RISK OF A SERIOUS FAILURE MODE BASED ON LIMITED FIELD DATA 57

    Fig. 4. Failure-time distributions with and without adjustment of retirementwhen , and .

    retirement is much larger than the one with retirement as timegoes.

    IV. PREDICTION OF FUTURE NUMBER OF REPORTSPrediction is important in reliability analysis, even for highly

    reliable products, because failure events have the potential tocause serious damage, or have other serious consequences suchas an automobile accident, the explosion of power transformers,and the failure of power systems in satellites. The lifetime pre-diction for highly reliable products is quite challenging due to

    the limited number of failure events. In this section, we proposea method of prediction for the number of reported failures in thefuture. We consider both point prediction and the constructionof corresponding prediction intervals.A. Probability of Being Reported Before RetirementBased on the model for the failure-time distribution, the ML

    estimates and , and the assumed values of and ,one can predict the number of reported failures that will occurbefore a specied future point in time. Let denote the age ofthe units in batch at the DFD, .For a product unit in batch that has not been reported as a

    failure before the DFD, it may fail and be reported in the futuretime interval . Here, ,

    , and is the number of months after the DFD,. The corresponding probability of being reported

    is shown in (6) at the bottom of the page. Here, denotes thefailure time of a unit that has not been reported as a failed unitby the DFD, and denotes the random reporting delayed time.Let

    and let in (6). In particular,see the unnumbered equation at the bottom of the page, where

    whenotherwise.

    Here, the indicator function constrains thereporting time between and . The factor rep-resents the probability that the unit has not retired before it fails.Numerical integration is needed to compute . The quantity

    is computed as shown in (7) at the bottom of the next page,

    (6)

  • 58 IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 1, MARCH 2015

    which is similar to (4). The only difference is that we use (4) toestimate parameters, but use (7) evaluated at the ML estimatesto provide an estimate of the probability that a unit in batchis not reported as a failure before . Based on the probabilityfunction , the probability that a unit in batchis reported as a failure within months after the DFD is

    (8)

    Because the ages of units at the DFD from different batches arenot the same, the are different for different batches.B. Point PredictionFor unit in batch that has not been reported as a failure by, we use as an indicator for being reported in the future

    time interval , where . The distributionof is . Thus, the cumulative number ofreported failures at time is

    (9)

    which is the sum of s-independent and non-identical indicators.The point prediction (estimate of the expected number failing)for the number of reports up to time is

    Here, is the number of units not reported as having failed bythe DFD in batch , and is obtained by evaluating (8) atthe ML estimates and , and the assumed values of and

    .C. Prediction IntervalThis section introduces a method of computing PIs for the

    cumulative number of reported failures at a future time point.The cumulative number of reported failures , as given in

    (9), is the sum of s-independent and non-identically distributedBernoulli random variables which follows a Poisson-binomialdistribution. Hong [18] gives an exact expression for thecdf of the Poisson-binomial distribution based on a discreteFourier transform. In particular, the cdf of , denoted by

    , is shown in the last equation at thebottom of the page, where , , and

    .Using the predictive distribution given in Lawless and Fre-

    dette [12], a PI for , denoted by, is obtained by solving

    (10)

    Here, is the quantile of the distribution of the random quan-tity , where both and are treated as random vari-ables. We use a bootstrap simulation procedure to approximatethe quantile . In particular, is approximated from boot-strap samples by the sample quantile of

    . Here, is simulated from given the MLestimate , and is the ML estimates obtained from bootstrapsamples. We use the random weighted bootstrap proposed byNewton and Raftery [19] instead of the ordinary bootstrap be-cause of the heavy censoring, and the complicated data struc-ture. The specic procedure for such a bootstrap is described asfollows.1) Simulate random weights and from a positive dis-

    tribution with the property of , whereis the random weight for reported failure unit , andis the random weight of the not-reported unit in

    batch by the DFD. We sample and from the ex-ponential distribution with a mean of one.

    2) Based on the random weights and , we calculatethe random weighted likelihood

    (11)

    (7)

  • XU et al.: ASSESSING RISK OF A SERIOUS FAILURE MODE BASED ON LIMITED FIELD DATA 59

    TABLE VML ESTIMATES AND STANDARD ERRORS FOR MODEL PARAMETERS

    3) Obtain the estimated by maximizing (11).4) Repeat the above steps times to obtain the bootstrap

    samples .Following Lawless and Fredette [12], we construct a cali-

    brated PI for by using the following steps.1) Simulate , where is the

    vector of ML estimates from the original data.2) Compute3) Calculate the lower, and upper quantiles of

    , denoted by , and , respectively.4) Solve from , and from

    to obtain the endpoints of the PI.D. Prediction ResultsIn this section, we present the results of point predictions and

    PIs for the cumulative number of reported failures, based on theassumptions that both the failure-time distribution and the re-tirement-time distribution are Weibull. Fig. 5 shows the pointpredictions for the cumulative number of reported failures, andthe corresponding PIs based on and .The cumulative number of reported failures is increasing rapidlyuntil 100 months, and then tends to be stable after 150 months.The leveling-off is caused by the fact that more sample unitsare retired as time passes. Compared to the initial number ofunits not reported as failures , the predicted cu-mulative number of reported failures (estimate of the expectednumber) is around 70, which is a small number of the units, rel-ative to the number that had been put into service (i.e., less than0.058%).

    V. SENSITIVITY ANALYSISThe prediction of the cumulative number of reported failures

    is based on uncertain assumptions including the failure-time dis-tribution, as well as the parameters and the distribution for re-tirement times. Changes in these assumptions will affect the pre-diction results. Thus, it is necessary to do the sensitivity analysisto assess the effect of departures from the assumptions, and tounderstand which assumptions are conservative.A. Parameter AssumptionsThe prediction results shown in Fig. 5 are based on the

    Weibull distribution for the retirement model with ,

    Fig. 5. Plot of the 90% PI for cumulative number of reported failures based ona Weibull distribution assumption for retirement and failure distributions, with

    , and .

    and . Historical information suggests that the valuesof parameters in the retirement-time distribution are withina certain range. Thus, it is desirable to consider other retire-ment-time model parameters to assess the effect that deviationsfrom the assumptions have on the prediction results. Table V,and Fig. 6 summarize the results of this sensitivity analysis.The standard errors of , and are denoted as ,and , respectively. From Table V, we note that, asexpected, the ML estimates and change under differentassumptions for the retirement distribution parameters. Thereis more change in the estimates of because these estimatesinvolve a substantial amount of extrapolation. The maximumlog-likelihood values are close to each other, indicating thereis little or no information about the retirement distributionparameters in the data. Fig. 6 shows corresponding predictionsfor the cumulative number of reported failures. The graphindicates that, as the expected retirement time increases, orthe Weibull shape parameter decreases (implying more spreadin the retirement times), the predicted cumulative number ofreported failures increases. The extension of retirement time

  • 60 IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 1, MARCH 2015

    TABLE VIML ESTIMATES FOR PARAMETERS WHEN ,

    TABLE VIIML ESTIMATES FOR PARAMETERS WHEN ,

    Fig. 6. Weibull distribution point predictions for the cumulative number ofreported failures with different values for the parameters in the retirementdistributions.

    leads to a higher risk of having failures, but the differenceis not large given the same shape parameter. Compared tothe change of expected retirement time, predictions are moresensitive to the assumption about the Weibull shape parameter.This difference is the result of the large change of the standarddeviation in the retirement time. For example, the standarddeviations for , and given are57.57, and 44.4, respectively.B. Distributional AssumptionsIn the previous analysis, the retirement and the failure-time

    distributions were assumed to beWeibull. The data, however, do

    not provide much information to distinguish among competingdistributions. Thus, it is useful to compute predictions with dif-ferent retirement-time distribution and failure-time distributionassumptions. In this sensitivity analysis, we use the lognormaldistribution as an alternative for the failure-time distribution andthe retirement-time distribution. When the lognormal distribu-tion is chosen as the retirement-time distribution, the assumedmean and standard deviation are specied to be the same as thatassumed for the Weibull retirement-time distribution.Table VI shows the ML estimates for the failure-time distri-

    bution parameters when , and , forthe four combinations of the failure-time and retirement distri-butions. Instead of comparing values of , we compare thevalues of because, with the adjustment of retirementdistribution, the expected lifetime is not meaningful in practice(most of the products have been retired by ). From theresults in Table VI, themaximum log-likelihood values are quiteclose for all four combinations. When the failure-time distribu-tion is the same, there is no practical difference in the valuesof , and . Table VII shows similar results when

    , and .Fig. 7 shows the predicted cumulative number of reported

    failures with different retirement-time and failure-time distri-butions. The legend shows the failure-time and retirement-timedistribution combinations. For example, Lognormal-Weibullindicates that retirement times follow a lognormal distribution,and failure times follow a Weibull distribution. Fig. 8 showssimilar comparisons under a different set of values for ,and . Given the data follow the same failure-time dis-tribution, the predictions before 120 months do not dependstrongly on assumptions about the retirement-time distribution.In other words, the effect of the failure-time distribution is muchstronger than the effect of the retirement-time distribution be-fore 120 months. Compared to other distribution combinations,a lognormal retirement distribution with a Weibull failure-time

  • XU et al.: ASSESSING RISK OF A SERIOUS FAILURE MODE BASED ON LIMITED FIELD DATA 61

    Fig. 7. The predicted number of the reported failures based on different failure-time and retirement-time distributions when , and .The legend indicates the distribution of retirement time, and the distribution ofthe failure time, respectively.

    Fig. 8. The predicted number of the reported failures based on different failure-time and retirement-time distributions when , and .The legend indicates the distribution of retirement time, and the distribution ofthe failure time, respectively.

    distribution provides the most conservative predictions (i.e.,predicts more reported failures).

    VI. CONCLUSIONS, AND AREAS FOR FUTURE RESEARCHThis paper provides general statistical methods to predict

    eld failures, and conduct a risk assessment for products. Togenerate accurate predictions, the proposed method considersthe effect of retirement times and reporting delays whenestimating the failure-time distribution, and when makingpredictions. Based on the failure-time model, we predict

    the cumulative number of reported failures, and constructa corresponding PI. We also conduct sensitivity analysis toassess the effect of different failure-time and retirement-timedistributions. The proposed methods can also apply to otherhighly reliable products such as aircraft engine components,and medical devices, if the data have similar structures (i.e.,with retirement or reporting delay).Here are some possible areas for future research. In the product B application, the degradation process that

    causes the serious failures was related to product age (notthe amount of use), and was unrelated to customer-per-ceivable performance degradation. Thus it was reasonableto assume that the retirement-time and failure-time randomvariables were s-independent. In other applications, amodel that allows dependency could be used.

    For most products, it is not possible to track the retire-ment time for all units. For some applications, it wouldbe possible and useful to track a representative subset ofthe product populations through a carefully designed eldtracking study.

    Today, some products, even home appliances, can be con-nected to the internet, potentially providing detailed infor-mation about how each such unit is being used. See Hongand Meeker [20], [21] for applications involving the pre-diction of future failures when a proportion of units in theproduct population are connected to the Internet. Havingthe additional information about which units are still in ac-tive use would reduce much of the uncertainty in predic-tions associated with a risk analysis.

    For the product B application, there was only limited infor-mation about the retirement-time distribution, based on acompletely separate marketing study for a similar product.An alterative analysis could have taken that information,perhaps supplemented by expert opinion, to develop a jointprior distribution to describe unknown characteristics (in-cluding the form and the parameters) of the retirement-timeand failure-time distributions. This alternative would allowa fully Bayesian analysis to be performed.

    ACKNOWLEDGMENTThe authors would like to thank the associate editor and two

    referees for their valuable comments that helped in improvingthis paper. The authors acknowledge Advanced Research Com-puting at Virginia Tech for providing computational resources.

    REFERENCES[1] W. Nelson, Applied Life Data Analysis. New York, NY, USA: Wiley,

    1982.[2] W. Q. Meeker and L. A. Escobar, Statistical Methods for Reliability

    Data. New York, NY, USA: Wiley, 1998.[3] M. Engehardt and L. J. Bain, Prediction intervals for the Weibull

    process, Technometrics, vol. 20, pp. 167169, 1978.[4] R. W. Mee and D. Kushary, Prediction limits for the Weibull distri-

    bution utilizing simulation, Computat. Statist. Data Anal., vol. 17, pp.327336, 1994.

    [5] W. Nelson, Weibull prediction of a future number of failures, Qual.Rel. Eng. Int., vol. 16, pp. 2326, 2000.

    [6] D. J. Nordman and W. Q. Meeker, Weibull prediction intervals for afuture number of failures, Technometrics, vol. 44, pp. 1523, 2002.

    [7] S. Geisser, Predictive Inference: An Introduction. New York, NY,USA: Chapman & Hall, 1993.

  • 62 IEEE TRANSACTIONS ON RELIABILITY, VOL. 64, NO. 1, MARCH 2015

    [8] G. Tian, M. Tang, and J. Yu, Bayesian estimation and prediction forthe power law process with left-truncated data, J. Data Sci., vol. 9,pp. 445470, 2011.

    [9] F. S. D. Menezes, M. J. F. Vivanco, and L. C. Sampaio, Determinationof prediction intervals for a future number of failures: A statistical andMonte Carlo approach, Brazilian J. of Phys., vol. 36, pp. 690699,2006.

    [10] L. A. Escobar and W. Q. Meeker, Statistical prediction based on cen-sored life data, Technometrics, vol. 41, pp. 113124, 1999.

    [11] Y. Hong, W. Q. Meeker, and J. D. McCalley, Prediction of remaininglife of power transformers based on left truncated and right censoredlifetime data, Ann. Appl. Statist., vol. 3, pp. 857879, 2009.

    [12] J. F. Lawless and M. Fredette, Frequentist prediction intervals andpredictive distributions, Biometrika, vol. 92, pp. 529542, 2005.

    [13] G. Yang, Accelerated life test plans for predicting warranty cost,IEEE Trans. Rel., vol. 59, pp. 628634, 2010.

    [14] C. Park and K. B. Kulasekera, Parametric inference of incomplete datawith competing risks among several groups, IEEE Trans. Rel., vol. 53,pp. 1121, 2004.

    [15] K. Zhao, D. Steffey, and J. Loud, Incorporating product retirementin eld performance reliability analysis, in Proc. 2010 Ann. Rel. andMaintainability Symp. (RAMS), 2010, pp. 15.

    [16] M. J. Crowder, Classical Competing Risks. Boca Raton, FL, USA:Chapman & Hall/CRC, 2001.

    [17] Y. Hong, W. Q. Meeker, and L. A. Escobar, The relationship betweencondence intervals for failure probabilities and life time quantile,IEEE Trans. Rel., vol. 57, pp. 260266, 2008.

    [18] Y. Hong, On computing the distribution function for the sum of inde-pendent and non-identically distributed random indicators,Computat.Statist. Data Anal., vol. 59, pp. 4151, 2013.

    [19] M. A. Newton and A. E. Raftery, Approximate Bayesian inferencewith the weighted likelihood bootstrap, J. Roy. Statist. Soc., ser. B,vol. 56, pp. 348, 1994.

    [20] Y. Hong and W. Q. Meeker, Field-failure and warranty predictionbased on auxiliary use-rate information, Technometrics, vol. 52, pp.148159, 2010.

    [21] Y. Hong andW. Q.Meeker, Field-failure predictions based on failure-time data with dynamic covariate information, Technometrics, vol. 55,pp. 135149, 2013.

    Zhibing Xu received the M.S. degree in statistics from Virginia Tech in 2011.He is currently a Ph.D. candidate in the Department of Statistics at VirginiaTech.His research interests include reliability data analysis, and engineering

    statistics.

    Yili Hong received the B.S. degree in statistics from the University of Scienceand Technology of China in 2004 and the M.S. and Ph.D. degrees in statisticsfrom Iowa State University in 2005 and 2009, respectively.He is currently an Associate Professor in the Department of Statistics at

    Virginia Tech. His research interests include reliability data analysis, andengineering statistics.Prof. Hong is an elected member of the International Statistical Institute.

    William Q. Meeker is a Professor of Statistics and Distinguished Professorof Liberal Arts and Sciences at Iowa State University. He is co-author of thebooks Statistical Methods for Reliability Data with Luis Escobar (1998), Statis-tical Intervals: A Guide for Practitioners with Gerald Hahn (1991), nine bookchapters, and of numerous publications in the engineering and statistical lit-erature. He has consulted extensively on problems in reliability data analysis,reliability test planning, accelerated testing, nondestructive evaluation, and sta-tistical computing.Prof. Meeker is a Fellow of the American Statistical Association, and of the

    American Society for Quality. He is an elected member of the International Sta-tistical Institute, and a past Editor of Technometrics. He has won the AmericanSociety for Quality (ASQ) Youden prize ve times, the ASQ Wilcoxon Prizethree times, and the American Statistical Association Best Practical Applica-tion Award. He is also an ASQ Shewhart Medalist.