
Contents lists available at SciVerse ScienceDirect

Mechanical Systems and Signal Processing

Mechanical Systems and Signal Processing 35 (2013) 219–237


journal homepage: www.elsevier.com/locate/ymssp

A Wiener-process-based degradation model with a recursive filter algorithm for remaining useful life estimation

Xiao-Sheng Si a,b, Wenbin Wang c,*, Chang-Hua Hu a, Mao-Yin Chen b, Dong-Hua Zhou b,*

a Department of Automation, Xi’an Institute of High-Tech, Xi’an, Shaanxi 710025, China
b Department of Automation, TNList, Tsinghua University, Beijing 100084, China
c Dongling School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China

Article info

Article history:

Received 2 February 2012

Received in revised form 27 July 2012

Accepted 10 August 2012

Available online 1 September 2012

Keywords:

Reliability

Remaining useful life

Wiener process

Recursive filter

Expectation maximization

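0888-3270/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.

http://dx.doi.org/10.1016/j.ymssp.2012.08.016

* Corresponding authors. Tel.: +86 010 62794461; fax: +86 010 62786911.

E-mail addresses: [email protected] (X.-S. Si), [email protected] (W. Wang), [email protected] (D.-H. Zhou).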

Abstract

Remaining useful life (RUL) estimation is an essential part of prognostics and health management. This paper addresses the problem of estimating the RUL from observed degradation data. To this end, a Wiener-process-based degradation model with a recursive filter algorithm is developed. A novel contribution of this paper is the use of both a recursive filter to update the drift coefficient of the Wiener process and the expectation maximization (EM) algorithm to update all other parameters. Both updates are performed each time a new piece of degradation data becomes available. This makes the model depend on the observed degradation data history, which conventional Wiener-process-based models did not consider. Another contribution is to take into account the distribution of the drift coefficient when updating, rather than using a point estimate as an approximation. An exact RUL distribution that accounts for the distribution of the drift coefficient is obtained based on the concept of the first hitting time. A practical case study for gyros in an inertial navigation system is provided to substantiate the superiority of the proposed model over competing models reported in the literature. The results show that the developed model can provide better RUL estimation accuracy.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Enhancing the safety, efficiency, availability, and effectiveness of industrial and military systems through the prognostics and health management (PHM) paradigm has gained momentum over the last decade [1,2]. PHM is a systematic approach used to evaluate the reliability of a system in its actual life-cycle conditions, predict failure progression, and mitigate operating risks via management actions. PHM has two parts, namely “prognostics” and “health management”. Prognostics is often characterized by estimating the remaining useful life (RUL) of a system using available condition monitoring (CM) information [3–7]. Once such a prognosis is available, appropriate health management actions such as repair, replacement, and logistic support can be performed to achieve the system’s required operational objectives [8–10]. In PHM, the term “RUL estimation” usually means finding either the probability density function (PDF) of the RUL or the mean RUL [11], but the emphasis is often placed on estimating the PDF rather than the mean, since a PDF characterizes the uncertainty associated with the RUL and is hence more informative for management decision making.

Current RUL estimation approaches can be broadly classified as physics-of-failure, data-driven, and fusion methods. Physics-of-failure approaches rely on the physics of the underlying failure mechanisms. Data-driven approaches estimate the RUL by fitting data, mainly using machine learning and statistics-based techniques. Fusion approaches combine the physics-of-failure and data-driven approaches. However, for complex or large-scale engineering systems, it is typically difficult to obtain the physical failure mechanisms in advance, or costly and time-consuming to capture the physics of failure by experiments. In contrast, data-driven approaches attempt to derive models directly from collected degradation or life data; they are thus more appealing and have gained much attention in recent years.

Statistics-based data-driven methods for RUL estimation can be classified into models based on indirectly observed state processes and models based on directly observed state processes [2]. The former treat the data as only partially indicating the underlying state of the system, and assume that the available CM data are stochastically related to the underlying health state. In this case, lifetime data must be available to establish the relationship between the CM data and failure. The latter use the observed degradation data directly to describe the underlying state of the system. The RUL is then defined as the time at which the monitored degradation first reaches the failure threshold, namely the first hitting time (FHT) [12]. Note that, when lifetime data are scarce, observed degradation data with a threshold are easier to manipulate and implement in practice [13,14].

It is well recognized that degradation processes are uncertain over time, and stochastic models are therefore frequently used to characterize their evolution. In the literature, random effect regression (RER) models and stochastic process (SP) models are the two most commonly used kinds of stochastic models. Lu and Meeker [15] first presented the RER model to characterize the degradation of a population of units, and many extensions appeared later [2,16]. However, the RER modeling paradigm assumes that a population of “identical” systems (or devices) shares a common degradation form, whereas individual systems may exhibit different degradation rates and hence different failure times. Moreover, Pandey et al. [17] noted that RER models do not account for the temporal uncertainty of the degradation process and argued that SP models remedy this; they also showed the advantages of SP models over RER models in condition-based maintenance. SP models such as Markov chains, Gamma processes, and Wiener processes have been widely used to model degradation processes [5,18–21]. A Wiener-process-based degradation model is a statistics-based data-driven model that can characterize a non-monotonic degradation process and describe a system’s behavior under increased or reduced intensity of use [22]. This type of model has been applied to model the degradation process and to estimate the RUL of a variety of industrial assets, such as rotating element bearings [19], LED lamps [23], self-regulating heating cables [24], laser generators [25], bridge beams [26], and fatigue crack dynamics [27]. Therefore, in this paper we focus on Wiener-process-based RUL estimation where the degradation process can be observed directly and a failure threshold on the degradation is available. The selection of the failure threshold is an important practical problem. However, such a threshold is usually set based on either engineering domain knowledge or accepted industrial standards; for example, ISO 2372 and ISO 10816 are frequently adopted for defining acceptable vibration threshold levels. Threshold selection is therefore beyond the scope of this paper, and we assume that the failure threshold is known a priori.

A Wiener-process-based model has a drift term, characterized by its drift coefficient, and a noise term given by Brownian motion. It has been widely used to model directly observable degradation processes and to conduct lifetime analysis. Tseng et al. [23] used a Wiener process to determine the lifetime of the light intensity of LED lamps in contact image scanners. As an extension, Tseng and Peng proposed an integrated Wiener process to model the cumulative degradation path of a product’s quality characteristic [28]. Joseph and Yu used a Wiener process for degradation modeling and reliability improvement [14]. Other recent extensions in lifetime estimation can be found in [29–31]. However, lifetime analysis in these conventional Wiener-process-based models assumes that the estimated PDF of the RUL depends only on the currently observed degradation data, which is a strong Markovian assumption.

To relax this assumption, Gebraeel et al. [19] presented an exponential degradation model for rotating element bearings based on a Wiener process, with some new and important improvements for RUL estimation. Their model linked the past and current degradation data of the same system through a Bayesian mechanism. However, the Brownian motion in their model served only as an error term, and the explicit FHT distribution of the Wiener process was not used. Instead, they estimated the RUL distribution directly under an implicit monotonicity assumption. Since Wiener processes are well known to be non-monotonic, the resulting RUL estimates in [19] are approximations. In addition, the stochastic coefficients reflecting individual-to-individual variability in [19] followed prior distributions, but no method was elaborated for selecting the parameters of these priors. Typically, historical degradation data from several systems of the same type are required to determine the prior parameters, yet such data are not always available in practice, particularly for newly commissioned systems. Section 4 shows that an inappropriate selection of the prior parameters can result in inaccurate estimates of the degradation and the RUL. Wang et al. [32] recently proposed a Wiener-process-based model that uses all of the system’s degradation data to date for RUL estimation and explicitly uses the FHT of the Wiener process.


However, the model in [32] also requires data from many systems of the same type for parameter estimation, and does not consider the distribution of the updated drift coefficient. Our results reveal that accounting for this distribution can reduce the uncertainty in the estimated RUL. A final observation on the existing literature on Wiener-process-based RUL estimation is that none of the estimated parameters are updated as new data are observed.

From this review, three issues remain to be solved when applying the Wiener process to RUL estimation. The first is how to estimate the model parameters from an individual system’s data without requiring past data from many systems. The second is how to account for the distribution of the estimated drift coefficient, a critical parameter that affects both the mean and the variance of the RUL. The third is how to update the model parameters based on newly observed degradation data. A system’s life is heavily influenced by how it is operated and maintained and by the environment in which it operates. Addressing these three issues tailors our model to an individual system through its actual monitoring data, which reflect its operational and environmental characteristics.

In this paper, we address these issues with a Wiener-process-based model and a recursive filter algorithm for RUL estimation. Two techniques are used to update the RUL estimate: a state-space model recursively updates the drift coefficient, and an expectation maximization (EM) algorithm re-estimates all unknown parameters each time new data become available. The contributions of this paper are as follows: (1) unlike previous work, our model estimates an individual system’s RUL from its entire monitoring history to date through a recursive filter and an EM algorithm, and does not require historical degradation data from other systems in a population; (2) unlike the work of Wang et al. [32], where a recursive filter is also used, the drift coefficient is treated as a random variable so that its distribution is incorporated in the estimation; (3) unlike the approximate results in [19,33], our PDF of the RUL is exact in the sense of the FHT, and we show that it guarantees the existence of the moments of the RUL, which is not the case for the approximate results in [19,33]; and (4) we apply the proposed model to estimate the RUL of gyros in an inertial navigation system used in weapon systems.

The remainder of this paper is organized as follows. In Section 2, we develop a Wiener-process-based degradation model and obtain the distribution of the RUL. In Section 3, we discuss the parameter estimation algorithm in detail. Section 4 provides a case study to illustrate the application and usefulness of the developed model. Section 5 draws the main conclusions.

2. Wiener-process-based degradation modeling and RUL estimation

Notation used in this paper is summarized as follows.

$t_i$: Time of the $i$th CM point (can be irregularly spaced)
$X(t)$: Random variable representing the degradation at time $t$
$x_i$: Degradation observation at $t_i$
$X_{0:i}=\{x_0,x_1,x_2,\ldots,x_i\}$: History of degradation observations for the system up to $t_i$
$\Lambda_i=\{\lambda_0,\lambda_1,\ldots,\lambda_{i-1}\}$: History of drift coefficients up to $t_i$
$E(\cdot)$: Expectation operator
$B(t)$: Standard Brownian motion
$\lambda$: Drift coefficient
$\lambda_i$: Drift coefficient at $t_i$
$\lambda_0$: Initial state, which follows $N(a_0,P_0)$
$\sigma$: Diffusion coefficient
$\eta$: Noise of $\lambda_i$, which follows $N(0,Q)$
$\omega$: Failure threshold
$\hat{\lambda}_i$: Estimate of $\lambda_i$ conditional on $X_{0:i}$, i.e. $E(\lambda_i|X_{0:i})$
$\hat{\lambda}_{j|i}$: Smoothed $\lambda_j$ conditional on $X_{0:i}$, i.e. $E(\lambda_j|X_{0:i})$
$P_{i|i}$: Variance of the estimated $\lambda_i$, i.e. $\mathrm{Var}(\lambda_i|X_{0:i})$
$P_{i|i-1}$: Variance of $\lambda_i$ predicted at $t_{i-1}$, i.e. $\mathrm{Var}(\lambda_i|X_{0:i-1})$
$K_i$: Filter gain
$R_i$: RUL at $t_i$, with realization $r_i$
$T$: Lifetime
$\theta$: Parameter column vector, $\theta=[a_0,P_0,Q,\sigma^2]^T$
$\hat{\theta}_i$: Maximum likelihood estimate (MLE) of $\theta$ based on $X_{0:i}$
$\hat{\theta}_i^{(k)}$: Estimate of $\theta$ at the $k$th step of the EM algorithm, based on $X_{0:i}$
$L_i(\theta)$: Log-likelihood function for the observed events $x_0,x_1,x_2,\ldots,x_i$
$\ell_i(\theta)$: Joint log-likelihood function for $X_{0:i}$ and $\Lambda_i$
$\ell(\theta|\hat{\theta}_i^{(k)})$: Expectation of $\ell_i(\theta)$ conditional on $\hat{\theta}_i^{(k)}$ and $X_{0:i}$
$\Delta_n$: $n$th-order principal minor determinant of the second derivative of $\ell(\theta|\hat{\theta}_i^{(k)})$, with $n=1,2,3,4$
$\Phi(\cdot)$: Standard normal CDF


2.1. An outline of Wiener-process-based degradation model for lifetime analysis

In this section we briefly present the conventional Wiener-process-based degradation model for lifetime analysis. A Wiener process is typically used to model degradation processes in which the degradation increases linearly in time with random noise, the rate of degradation being characterized by the drift coefficient. The wear of brake pads on automotive wheels is a practical example: Christer and Wang [34] modeled the wear of brake pads as a linear function of time, where the thickness of the brake pad decreases linearly in time with Gaussian noise.

In general, a Wiener-process-based degradation model can be represented as,

$$X(t)=\lambda t+\sigma B(t), \tag{1}$$

where $\lambda$ is the drift coefficient, $\sigma>0$ is the diffusion coefficient, and $B(t)$ is the standard Brownian motion representing the stochastic dynamics of the degradation process.

In physics, the Wiener process models the movement of small particles in fluids and air with tiny fluctuations. A characteristic feature of this process in the context of reliability is that the plant’s degradation can increase or decrease gradually and accumulatively over time: the small increase or decrease in degradation over a small time interval behaves similarly to the random walk of small particles in fluids and air. This type of stochastic process has therefore been widely used to characterize degradation paths in which successive fluctuations can be observed, such as the degradation observations of rotating element bearings [19], LED lamps [23], self-regulating heating cables [24], laser generators [25], bridge beams [26], other examples in [14] and [30–35], and, in our case, gyro drift. Modeling a stochastic degradation process as a Wiener process implies that the mean degradation path is a linear function of time, i.e. $E[X(t)]=\lambda t$; the drift parameter $\lambda$ is therefore closely related to the progression of the degradation. In addition, the variance of the degradation process is $\mathrm{var}[X(t)]=\sigma^2 t$, which represents the uncertainty of the degradation at time $t$.
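As a quick numerical illustration of these two moment formulas, the sketch below simulates discretely sampled paths of Eq. (1) and compares the sample mean and variance at the final time with $\lambda t$ and $\sigma^2 t$. It assumes NumPy is available; the function name `simulate_wiener_paths` and the values $\lambda=0.5$, $\sigma=0.3$ are hypothetical choices for illustration, not values from the case study.

```python
import numpy as np

def simulate_wiener_paths(lam, sigma, t_grid, n_paths, seed=0):
    """Simulate X(t) = lam * t + sigma * B(t) on t_grid, assuming t_grid[0] == 0."""
    rng = np.random.default_rng(seed)
    dt = np.diff(t_grid)
    # Independent Gaussian increments: X(t_j) - X(t_{j-1}) ~ N(lam * dt_j, sigma^2 * dt_j)
    incr = lam * dt + sigma * np.sqrt(dt) * rng.standard_normal((n_paths, dt.size))
    paths = np.zeros((n_paths, t_grid.size))
    paths[:, 1:] = np.cumsum(incr, axis=1)   # X(0) = 0 for every path
    return paths

lam, sigma = 0.5, 0.3                 # hypothetical drift and diffusion coefficients
t_grid = np.linspace(0.0, 10.0, 101)
paths = simulate_wiener_paths(lam, sigma, t_grid, n_paths=20000)

t_end = t_grid[-1]
print("mean:", paths[:, -1].mean(), "theory:", lam * t_end)
print("variance:", paths[:, -1].var(), "theory:", sigma**2 * t_end)
```

Sampling the increments directly, rather than accumulating many tiny steps, is exact for a Wiener process at the grid points, so the only discrepancy from theory is Monte Carlo noise.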

For in-service lifetime estimation at time $t_i$, given the degradation observation $x_i$, we can use

$$X(t)=x_i+\lambda(t-t_i)+\sigma\left(B(t)-B(t_i)\right)=x_i+\lambda(t-t_i)+\sigma B(t-t_i),\quad t>t_i. \tag{2}$$

At time $t_i$ we assume $x_i<\omega$; otherwise the degradation has already crossed $\omega$ and the system has failed by definition. Although the model setting above is the same, the literature relates the degradation $X(t)$ to the lifetime $T$ at $t_i$ in two different ways. The first directly defines the lifetime as $T=\{t: X(t)\ge\omega\,|\,x_i\}$, so that the lifetime distribution can be represented by $F_{T|x_i}(t|x_i)=\Pr(X(t)\ge\omega\,|\,x_i)$, as in [19,31,33]. Using this definition, the PDF and the cumulative distribution function (CDF) of the lifetime $T$ at time $t_i$ can be directly obtained as

$$f_{T|x_i}(t|x_i)=\frac{1}{\sigma\sqrt{2\pi(t-t_i)}}\exp\left(-\frac{\left(\omega-x_i-\lambda(t-t_i)\right)^2}{2\sigma^2(t-t_i)}\right), \tag{3}$$

$$F_{T|x_i}(t|x_i)=1-\Phi\left(\frac{\omega-x_i-\lambda(t-t_i)}{\sigma\sqrt{t-t_i}}\right), \tag{4}$$

where $\Phi(\cdot)$ denotes the standard normal CDF. The second definition is based on the concept of the FHT, $T=\inf\{t: X(t)\ge\omega\,|\,x_i\}$, as in [21,29,30,32,35]. Using the FHT, the following results can be obtained [36]:

$$f_{T|x_i}(t|x_i)=\frac{\omega-x_i}{\sqrt{2\pi(t-t_i)^3\sigma^2}}\exp\left(-\frac{\left(\omega-x_i-\lambda(t-t_i)\right)^2}{2\sigma^2(t-t_i)}\right), \tag{5}$$

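$$F_{T|x_i}(t|x_i)=1-\Phi\left(\frac{\omega-x_i-\lambda(t-t_i)}{\sigma\sqrt{t-t_i}}\right)+\exp\left(\frac{2\lambda(\omega-x_i)}{\sigma^2}\right)\Phi\left(\frac{-(\omega-x_i)-\lambda(t-t_i)}{\sigma\sqrt{t-t_i}}\right), \tag{6}$$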

with mean $E(T|x_i)=t_i+(\omega-x_i)/\lambda$ and variance $\mathrm{var}(T|x_i)=(\omega-x_i)\sigma^2/\lambda^3$ (for $\lambda>0$). These expressions reflect the relationship among the model parameters, the current degradation measurement, and the estimated future life of the modeled plant. In particular, $\lambda$ is critical for both the mean and the variance of the estimated lifetime.
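A minimal Python sketch of Eqs. (5) and (6) follows, assuming NumPy and SciPy are available. It checks numerically that integrating the PDF reproduces the CDF, and that the mean and variance of $T-t_i$ equal $(\omega-x_i)/\lambda$ and $(\omega-x_i)\sigma^2/\lambda^3$. The helper names `fht_pdf` and `fht_cdf` and the values $\omega-x_i=2$, $\lambda=0.5$, $\sigma=0.3$ are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def fht_pdf(s, a, lam, sigma):
    """Eq. (5) written in terms of s = t - t_i > 0 and a = w - x_i > 0."""
    return a / np.sqrt(2 * np.pi * s**3 * sigma**2) * np.exp(-(a - lam * s) ** 2 / (2 * sigma**2 * s))

def fht_cdf(s, a, lam, sigma):
    """Eq. (6) written in terms of s = t - t_i and a = w - x_i."""
    root = sigma * np.sqrt(s)
    return (1 - norm.cdf((a - lam * s) / root)
            + np.exp(2 * lam * a / sigma**2) * norm.cdf((-a - lam * s) / root))

a, lam, sigma = 2.0, 0.5, 0.3   # hypothetical: w - x_i = 2, drift 0.5, diffusion 0.3

s = 3.0
pdf_integral, _ = quad(fht_pdf, 0, s, args=(a, lam, sigma))
print(pdf_integral, fht_cdf(s, a, lam, sigma))        # should agree

mean_rul, _ = quad(lambda u: u * fht_pdf(u, a, lam, sigma), 0, np.inf)
var_rul, _ = quad(lambda u: (u - a / lam) ** 2 * fht_pdf(u, a, lam, sigma), 0, np.inf)
print(mean_rul, a / lam)                  # E(T | x_i) - t_i
print(var_rul, a * sigma**2 / lam**3)     # var(T | x_i)
```

Note that `fht_cdf` multiplies a large exponential by a very small normal tail probability; for much larger $\lambda(\omega-x_i)/\sigma^2$ this product can overflow, and a log-space evaluation would be safer.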

Clearly, the two definitions differ and lead to different lifetime estimates. As noted by Park and Bae [16], $T=\{t: X(t)\ge\omega\,|\,x_i\}$ completely ignores possible hitting events within the interval $(t_i,t)$ and is thus only a crude approximation to $T=\inf\{t: X(t)\ge\omega\,|\,x_i\}$ when the degradation fluctuations are large. In addition, the FHT-based CDF in Eq. (6) is greater than the approximation in Eq. (4), since the two differ by the non-negative term $\exp\left(2\lambda(\omega-x_i)/\sigma^2\right)\Phi(\cdot)$. For safety-critical systems, using the approximation in Eq. (4) for maintenance scheduling may therefore lead to under-maintenance, because Eq. (4) underestimates the risk of failure. It is thus necessary to use the FHT as the lifetime, which is exact when failure is defined by the FHT.
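The gap between Eqs. (4) and (6) can be seen in a small Monte Carlo experiment. The sketch below, with hypothetical parameter values, simulates paths after $t_i$ on a fine grid and records both the fraction of paths above the threshold at the final time (the event Eq. (4) measures) and the fraction that have reached it at any grid point (the FHT event). Because the paths are monitored only at grid points, the simulated first-passage fraction should slightly underestimate Eq. (6).

```python
import numpy as np
from scipy.stats import norm

def cdf_eq4(s, a, lam, sigma):
    """Eq. (4): Pr(X(t_i + s) >= w | x_i); ignores crossings before t_i + s."""
    return 1 - norm.cdf((a - lam * s) / (sigma * np.sqrt(s)))

def cdf_eq6(s, a, lam, sigma):
    """Eq. (6): FHT-based CDF = Eq. (4) plus a non-negative correction term."""
    correction = np.exp(2 * lam * a / sigma**2) * norm.cdf((-a - lam * s) / (sigma * np.sqrt(s)))
    return cdf_eq4(s, a, lam, sigma) + correction

a, lam, sigma, s = 2.0, 0.5, 0.3, 3.0   # hypothetical: w - x_i = 2, horizon s = t - t_i = 3
n_paths, n_steps = 40000, 300
dt = s / n_steps
rng = np.random.default_rng(1)

y = np.zeros(n_paths)                    # X(t) - x_i along each simulated path
crossed = np.zeros(n_paths, dtype=bool)  # has the path reached a = w - x_i yet?
for _ in range(n_steps):
    y += lam * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    crossed |= y >= a

print("Eq. (4):", cdf_eq4(s, a, lam, sigma), "MC terminal:", np.mean(y >= a))
print("Eq. (6):", cdf_eq6(s, a, lam, sigma), "MC first passage:", crossed.mean())
```

With these values the terminal fraction tracks Eq. (4), while the first-passage fraction sits clearly above it, which is the under-estimation of failure risk described above.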

Note that Eq. (6) uses only the current degradation observation, not its history before $t_i$. As discussed above, this is a strong Markovian assumption; ideally, the future FHT should depend on the path the degradation has followed to date. For example, in Fig. 1, at the same level $x_i$, case (a) would be expected to fail sooner than case (b), but Eq. (6) gives the same prediction for both.

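[Figure: two panels, (a) and (b), each plotting Degradation against Time, with the threshold $\omega$ and the current level $x_i$ marked.]

Fig. 1. Two example sample paths with different trajectories but the same $x_i$.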


Consequently, it is desirable to use the degradation data to date when evaluating the RUL of the degraded system. Using the degradation data to date is expected to make the RUL estimate sharper and more tailored to an individual system than using only the current observation. This is the main focus of the remainder of this paper.

2.2. Wiener-process-based degradation modeling

We now address the issues discussed in the introduction. Since some variants introduced in [19] can easily be transformed into Eq. (1) by a logarithmic transformation, we focus only on Eq. (1), which is used in this paper to describe the evolution of the monitored degradation variable over time.

To incorporate the observation history while maintaining the attractive properties of the Wiener process, we update the coefficient $\lambda$ over time by a random walk model, $\lambda_i=\lambda_{i-1}+\eta$, where $\eta\sim N(0,Q)$. The drift coefficient $\lambda$ thus evolves as a time-dependent random variable whose distribution is conditional on $\lambda_{i-1}$. In principle, the diffusion coefficient $\sigma$ could also be made time-dependent. However, the structure of Eq. (1) would then not allow us to use the state-space model shown below, and a general filter, which is computationally demanding, would have to be used. There is also a practical reason for making only $\lambda$ time-varying. It is known from Eq. (1) and the discussion in Section 2.1 that the mean degradation and the progression of the degradation are governed by the drift coefficient $\lambda$ and time, while the diffusion coefficient $\sigma$ partly controls the uncertainty in the degradation process. The trend in degradation is determined by the drift coefficient, whereas the diffusion coefficient only influences the noise, which can be considered constant. A similar idea appears in the statistical process control literature, where it is frequently assumed that the process mean changes but the variance stays constant when the process shifts from “in control” to “out of control” [37]. Motivated by the state-space model [38], the degradation equation can be reconstructed as a linear state-space model:

$$\lambda_i=\lambda_{i-1}+\eta, \tag{7}$$

$$x_i=x_{i-1}+\lambda_{i-1}(t_i-t_{i-1})+\sigma\varepsilon_i, \tag{8}$$

where $t_0=0$, $x_0=0$, and $\varepsilon_i\sim N(0,t_i-t_{i-1})$. Using $t_i-t_{i-1}$ as the variance of $\varepsilon_i$ is required by the properties of Brownian motion. Eq. (7) is called the system equation, and Eq. (8) the observation equation [38]. A linear system equation is used not only for its simplicity. A nonlinear system equation for $\lambda_i$ could be used, but we expect the gain to be minor and to come at the expense of substantially longer computation time. There is also the practical problem of which form of nonlinearity to use, because $\lambda_i$ is not observable. Since the system equation models the change of the drift coefficient over a sampling interval that is not long, we expect $\lambda_i$ to be close to $\lambda_{i-1}$, adjusted by the noise term.
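To make Eqs. (7) and (8) concrete, the following sketch simulates degradation data from the state-space model on possibly irregular observation times, drawing the noise terms exactly as stated above ($\eta\sim N(0,Q)$, $\varepsilon_i\sim N(0,t_i-t_{i-1})$). The function `simulate_state_space` and the parameter values in the usage lines are hypothetical.

```python
import numpy as np

def simulate_state_space(t, a0, P0, Q, sigma, seed=0):
    """Simulate Eqs. (7)-(8) at observation times t, assuming t[0] == 0 and x_0 = 0."""
    rng = np.random.default_rng(seed)
    n = len(t)
    lam = np.empty(n)
    x = np.zeros(n)
    lam[0] = rng.normal(a0, np.sqrt(P0))                     # lambda_0 ~ N(a0, P0)
    for i in range(1, n):
        dt = t[i] - t[i - 1]
        eps = rng.normal(0.0, np.sqrt(dt))                   # eps_i ~ N(0, t_i - t_{i-1})
        x[i] = x[i - 1] + lam[i - 1] * dt + sigma * eps      # observation equation, Eq. (8)
        lam[i] = lam[i - 1] + rng.normal(0.0, np.sqrt(Q))    # system equation, Eq. (7)
    return x, lam

# Hypothetical parameters and irregular monitoring times
t_obs = np.cumsum(np.r_[0.0, np.random.default_rng(7).uniform(0.5, 1.5, size=50)])
x_obs, lam_true = simulate_state_space(t_obs, a0=0.05, P0=1e-4, Q=1e-5, sigma=0.1, seed=42)
```

Setting `P0 = Q = sigma = 0` collapses the model to the deterministic line $x_i=a_0t_i$, which is a convenient sanity check.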

We assume that the initial drift coefficient $\lambda_0$ follows a normal distribution with mean $a_0$ and variance $P_0$, as required by the state-space model. The drift coefficient is treated as a hidden “state” and can be estimated from the observations up to $t_i$, denoted by $X_{0:i}=\{x_0,x_1,x_2,\ldots,x_i\}$. This model thus links the drift coefficient to the observation history up to $t_i$. In Eq. (7), $\lambda_i$ follows a distribution that can be estimated by a recursive filter once the new observation $x_i$ becomes available at $t_i$. We denote its mean by $\hat{\lambda}_i=E(\lambda_i|X_{0:i})$ and its variance by $P_{i|i}=\mathrm{Var}(\lambda_i|X_{0:i})$.

To compute $\hat{\lambda}_i$ and $P_{i|i}$, we need the PDF of $\lambda_i$ given $X_{0:i}$, denoted by $p(\lambda_i|X_{0:i})$. By Bayes’ rule, $p(\lambda_i|X_{0:i})$ can be computed recursively from $p(\lambda_{i-1}|X_{0:i-1})$ as follows:

$$p(\lambda_i|X_{0:i})=\int p(\lambda_i|\lambda_{i-1})\,p(\lambda_{i-1}|X_{0:i})\,d\lambda_{i-1}=\frac{\int p(\lambda_i|\lambda_{i-1})\,p(x_i|\lambda_{i-1},X_{0:i-1})\,p(\lambda_{i-1}|X_{0:i-1})\,d\lambda_{i-1}}{p(x_i|X_{0:i-1})}. \tag{9}$$

It is well established that, under Eqs. (7) and (8), the distribution in Eq. (9) is Gaussian with mean $\hat{\lambda}_i$ and variance $P_{i|i}$, which can be computed by the Kalman filter [38]. As a result, the entire history is captured by recursively updating the estimate $\hat{\lambda}_i$, which is the advantage of the state-space model. The recursive Kalman filtering equations for $\hat{\lambda}_i$ and $P_{i|i}$ are summarized as Algorithm 1 in Appendix A.
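Algorithm 1 itself is given in Appendix A and is not reproduced here. As a hedged illustration only, the sketch below implements a textbook scalar Kalman filter for the drift, under our assumption that each step predicts $\lambda_i$ from $\hat{\lambda}_{i-1}$ with variance $P_{i|i-1}=P_{i-1|i-1}+Q$ and then treats the increment $x_i-x_{i-1}$ as a noisy observation of $(t_i-t_{i-1})\lambda_i$ with noise variance $\sigma^2(t_i-t_{i-1})$. The function name `kalman_drift_filter` is hypothetical, and the exact indexing in Algorithm 1 may differ.

```python
import numpy as np

def kalman_drift_filter(t, x, a0, P0, Q, sigma2):
    """Scalar Kalman filter returning (lambda_hat_i, P_{i|i}) for i = 0..n-1.

    Assumed recursion: predict with lambda_hat_{i-1} and P_{i-1|i-1} + Q, then update
    with y_i = x_i - x_{i-1} = dt_i * lambda_i + noise of variance sigma2 * dt_i.
    """
    n = len(t)
    lam_hat, P = np.empty(n), np.empty(n)
    lam_hat[0], P[0] = a0, P0                       # lambda_0 ~ N(a0, P0)
    for i in range(1, n):
        dt = t[i] - t[i - 1]
        P_pred = P[i - 1] + Q                       # P_{i|i-1}
        innov = x[i] - x[i - 1] - lam_hat[i - 1] * dt   # prediction error
        S = dt**2 * P_pred + sigma2 * dt            # innovation variance
        K = dt * P_pred / S                         # filter gain K_i
        lam_hat[i] = lam_hat[i - 1] + K * innov
        P[i] = (1 - K * dt) * P_pred                # P_{i|i}
    return lam_hat, P

t_obs = np.array([0.0, 1.0, 2.5, 3.0, 5.0])   # hypothetical monitoring times
x_obs = np.array([0.0, 0.3, 0.8, 1.0, 1.6])   # hypothetical degradation readings
lam_hat, P = kalman_drift_filter(t_obs, x_obs, a0=0.2, P0=0.5, Q=1e-3, sigma2=0.04)
```

With $Q=0$ this recursion reduces to conjugate Bayesian updating of a constant drift, so $1/P_{i|i}=1/P_0+t_i/\sigma^2$, which is a useful check on any implementation. The pair $(\hat{\lambda}_i,P_{i|i})$ is what enters the Gaussian posterior used in the rest of this section.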

In the literature, the Kalman filter has been successfully applied when system states (here, the drift coefficient is treated as the state) and observations evolve smoothly and gradually. However, degradation may sometimes exhibit jumps or sudden changes [19]. To handle this, we introduce a strong tracking filter (STF) [39]. The STF is also based on the Kalman filter, but it adjusts the prediction variance $P_{i|i-1}$ so that it is sensitive to the prediction error $x_i-x_{i-1}-\hat{\lambda}_{i-1}(t_i-t_{i-1})$; the filter gain $K_i$ then becomes sensitive to changes in the system state. The details of the STF algorithm are summarized in Appendix B as Algorithm 2.
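Algorithm 2 is in Appendix B and is not shown in this section. The following sketch illustrates the general strong-tracking idea under stated assumptions: a scalar fading factor $\mu_i\ge 1$ inflates the prediction variance, $P_{i|i-1}=\mu_iP_{i-1|i-1}+Q$, with $\mu_i$ computed by the common suboptimal rule $\mu_i=\max(1,N_i/M_i)$ from an exponentially smoothed squared prediction error. The forgetting factor `rho`, the softening factor `beta`, the function name `strong_tracking_filter`, and the observation model (increment observed as $(t_i-t_{i-1})\lambda_i$ plus noise) are our assumptions, not details taken from Algorithm 2.

```python
import numpy as np

def strong_tracking_filter(t, x, a0, P0, Q, sigma2, rho=0.95, beta=1.0):
    """STF-style sketch: a Kalman filter for the drift with a scalar fading factor mu_i >= 1.

    Assumptions (not taken from Algorithm 2): x_i - x_{i-1} is observed as dt_i * lambda_i
    plus noise of variance sigma2 * dt_i; mu_i = max(1, N_i / M_i) with
    N_i = V_i - beta * sigma2 * dt_i - dt_i**2 * Q and M_i = dt_i**2 * P_{i-1|i-1},
    where V_i is an exponentially smoothed squared prediction error.
    """
    n = len(t)
    lam_hat, P, mu = np.empty(n), np.empty(n), np.ones(n)
    lam_hat[0], P[0] = a0, P0
    V = None
    for i in range(1, n):
        dt = t[i] - t[i - 1]
        innov = x[i] - x[i - 1] - lam_hat[i - 1] * dt          # prediction error
        V = innov**2 if V is None else (rho * V + innov**2) / (1 + rho)
        N = V - beta * sigma2 * dt - dt**2 * Q
        M = dt**2 * P[i - 1]
        mu[i] = max(1.0, N / M) if M > 0 else 1.0              # fading factor
        P_pred = mu[i] * P[i - 1] + Q                          # inflated P_{i|i-1}
        S = dt**2 * P_pred + sigma2 * dt
        K = dt * P_pred / S                                    # filter gain K_i
        lam_hat[i] = lam_hat[i - 1] + K * innov
        P[i] = (1 - K * dt) * P_pred
    return lam_hat, P, mu

# Hypothetical data whose drift jumps from 0.1 to 1.0 at t = 10
t_obs = np.arange(21.0)
x_obs = np.where(t_obs <= 10, 0.1 * t_obs, 1.0 + (t_obs - 10))
lam_hat, P, mu = strong_tracking_filter(t_obs, x_obs, a0=0.1, P0=0.01, Q=0.0, sigma2=0.01)
print(mu[11], lam_hat[11])   # fading factor and drift estimate right after the jump
```

When the prediction errors are small, $N_i\le M_i$ and $\mu_i=1$, so the recursion reduces to a plain Kalman filter; right after the jump in this example, $\mu_i$ becomes large and the gain moves the drift estimate most of the way to the new value in a single step.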

Based on Eqs. (7), (8), and (9), the PDF of $\lambda_i$ conditional on $X_{0:i}$ is

$$p(\lambda_i|X_{0:i})=\frac{1}{\sqrt{2\pi P_{i|i}}}\exp\left(-\frac{(\lambda_i-\hat{\lambda}_i)^2}{2P_{i|i}}\right), \tag{10}$$

where the dependence of $\lambda_i$ on $X_{0:i}$ is contained in $\hat{\lambda}_i$ and $P_{i|i}$. Based on this result, we derive the associated RUL distribution below.

2.3. Real-time updating of the RUL distribution

Given a predefined threshold $\omega$, the RUL modeling principle is that when the degradation $X(t)$ first reaches $\omega$, the system is declared non-operable and its lifetime terminates. It is therefore natural to view lifetime termination as the first time the degradation $X(t)$ exceeds the threshold $\omega$. Following the concept of the FHT, we define the RUL $R_i$ at time $t_i$ as

$$R_i=\inf\{r_i: X(r_i+t_i)\ge\omega\,|\,X_{0:i}\}, \tag{11}$$

with CDF $F_{R_i|X_{0:i}}(r_i|X_{0:i})$ and PDF $f_{R_i|X_{0:i}}(r_i|X_{0:i})$. From Eq. (2), the PDF and CDF of the RUL at time $t_i$ defined in Eq. (11), conditional on $\lambda_i$, follow directly [36]:

$$f_{R_i|\lambda_i,X_{0:i}}(r_i|\lambda_i,X_{0:i})=\frac{\omega-x_i}{\sqrt{2\pi r_i^3\sigma^2}}\exp\left(-\frac{(\omega-x_i-\lambda_i r_i)^2}{2\sigma^2 r_i}\right),\quad r_i>0, \tag{12}$$

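$$F_{R_i|\lambda_i,X_{0:i}}(r_i|\lambda_i,X_{0:i})=1-\Phi\left(\frac{\omega-x_i-\lambda_i r_i}{\sigma\sqrt{r_i}}\right)+\exp\left(\frac{2\lambda_i(\omega-x_i)}{\sigma^2}\right)\Phi\left(\frac{-(\omega-x_i)-\lambda_i r_i}{\sigma\sqrt{r_i}}\right),\quad r_i>0. \tag{13}$$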

If $\lambda_i$ in Eq. (12) is replaced by its point estimate $\hat{\lambda}_i$, we obtain the RUL model used in [32], which we call Wang’s model in the comparisons of Section 4. However, as mentioned above, the drift coefficient in Eq. (7) evolves as a random variable whose distribution $p(\lambda_i|X_{0:i})$, conditional on the data observed up to time $t_i$, is given by Eq. (10). We now use $p(\lambda_i|X_{0:i})$ to derive the estimated RUL distribution. To this end, we first give two lemmas that significantly simplify the derivation of the RUL distribution.

Lemma 1. If $Y\sim N(\mu_1,\sigma_1^2)$, then $E_Y[\Phi(Y)]=\Phi\left(\mu_1/\sqrt{\sigma_1^2+1}\right)$, where $\Phi(\cdot)$ denotes the standard normal CDF.

Proof: From the properties of the normal distribution, we have

$$E_Y[\Phi(Y)]=E_Y\left[E\left(I_{\{Z\le Y\}}\,|\,Y\right)\right]=\Pr(Z\le Y)=\Pr(Z-Y\le 0)=\Phi\left(\mu_1/\sqrt{\sigma_1^2+1}\right). \tag{14}$$

In this derivation, $I_{\{Z\le Y\}}$ is the indicator function, $Z$ is a standard normal variable independent of $Y$, and $Z-Y\sim N(-\mu_1,\sigma_1^2+1)$.
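Lemma 1 is easy to check numerically. The sketch below, with arbitrary illustrative values $\mu_1=0.7$ and $\sigma_1=1.5$ and assuming NumPy and SciPy, compares the closed form with direct quadrature of $\int\Phi(y)\,\phi(y;\mu_1,\sigma_1^2)\,dy$ and with a Monte Carlo average.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu1, s1 = 0.7, 1.5   # hypothetical Y ~ N(mu1, s1^2)

closed_form = norm.cdf(mu1 / np.sqrt(s1**2 + 1))                       # Lemma 1
quadrature, _ = quad(lambda y: norm.cdf(y) * norm.pdf(y, mu1, s1), -np.inf, np.inf)
rng = np.random.default_rng(0)
monte_carlo = norm.cdf(rng.normal(mu1, s1, size=200_000)).mean()       # average of Phi(Y)

print(closed_form, quadrature, monte_carlo)
```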

Lemma 2. If $\lambda\sim N(\mu_\lambda,\sigma_\lambda^2)$, the PDF and CDF of the FHT of the process $X(t)=\lambda t+\sigma B(t)$ to the threshold $\omega$ are

$$f_T(t)=\frac{\omega}{\sqrt{2\pi t^3\left(\sigma_\lambda^2 t+\sigma^2\right)}}\exp\left[-\frac{(\omega-\mu_\lambda t)^2}{2t\left(\sigma_\lambda^2 t+\sigma^2\right)}\right], \tag{15}$$

$$F_T(t)=\Phi\left(\frac{\mu_\lambda t-\omega}{\sqrt{\sigma_\lambda^2 t^2+\sigma^2 t}}\right)+\exp\left(\frac{2\mu_\lambda\omega}{\sigma^2}+\frac{2\sigma_\lambda^2\omega^2}{\sigma^4}\right)\Phi\left(-\frac{2\sigma_\lambda^2\omega t+\sigma^2(\mu_\lambda t+\omega)}{\sigma^2\sqrt{\sigma_\lambda^2 t^2+\sigma^2 t}}\right). \tag{16}$$

Lemma 2 is similar to results given in [40] and [41], and can be obtained by direct manipulation using Lemma 1 and the law of total probability [25]. From Lemma 2, we obtain the following theorem.

Theorem 1. For the Wiener process defined by Eq. (2) and the state-space model in Eqs. (7) and (8), the PDF and CDF of the updated RUL at $t_i$, based on the updated PDF of $\lambda_i$, are

$$f_{R_i|X_{0:i}}(r_i|X_{0:i})=\frac{\omega-x_i}{\sqrt{2\pi r_i^3\left(P_{i|i}r_i+\sigma^2\right)}}\exp\left(-\frac{(\omega-x_i-\hat{\lambda}_i r_i)^2}{2r_i\left(P_{i|i}r_i+\sigma^2\right)}\right),\quad r_i>0, \tag{17}$$


$$F_{R_i|X_{0:i}}(r_i|X_{0:i})=1-\Phi\left(\frac{\omega-x_i-\hat{\lambda}_i r_i}{\sqrt{P_{i|i}r_i^2+\sigma^2 r_i}}\right)+\exp\left(\frac{2\hat{\lambda}_i(\omega-x_i)}{\sigma^2}+\frac{2P_{i|i}(\omega-x_i)^2}{\sigma^4}\right)\Phi\left(-\frac{2P_{i|i}(\omega-x_i)r_i+\sigma^2\left(\hat{\lambda}_i r_i+\omega-x_i\right)}{\sigma^2\sqrt{P_{i|i}r_i^2+\sigma^2 r_i}}\right). \tag{18}$$

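Proof: Using Eqs. (10), (12), (13), (15), (16) and the law of total probability, we have

$$f_{R_i|X_{0:i}}(r_i|X_{0:i})=\int_{-\infty}^{+\infty}f_{R_i|\lambda_i,X_{0:i}}(r_i|\lambda_i,X_{0:i})\,p(\lambda_i|X_{0:i})\,d\lambda_i, \tag{19}$$

$$F_{R_i|X_{0:i}}(r_i|X_{0:i})=\int_{-\infty}^{+\infty}F_{R_i|\lambda_i,X_{0:i}}(r_i|\lambda_i,X_{0:i})\,p(\lambda_i|X_{0:i})\,d\lambda_i. \tag{20}$$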

Following Lemma 2, it is straightforward to obtain Eqs. (17) and (18). Comparing Eq. (17) with Eq. (12), we observe that the observation history and the variance of $\lambda_i$, both of which are recursively updated, enter Eqs. (17) and (18). We refer to Eq. (17) as our model to distinguish it from the other models in Section 4.
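A sketch of Eqs. (17) and (18) in Python follows, assuming NumPy and SciPy. It also evaluates Eq. (19) by numerical quadrature over $\lambda_i$ (truncated at $\pm 12$ standard deviations, an implementation assumption) so that the closed form can be compared with the mixture it is derived from, and it integrates Eq. (17) to compare with Eq. (18). The helper names and the values $\omega-x_i=2$, $\hat{\lambda}_i=0.5$, $P_{i|i}=0.01$, $\sigma^2=0.09$ are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def rul_pdf(r, a, lam_hat, P, sigma2):
    """Eq. (17); a = w - x_i, lam_hat = filtered mean of lambda_i, P = P_{i|i}."""
    v = r * (P * r + sigma2)
    return a / np.sqrt(2 * np.pi * r**2 * v) * np.exp(-(a - lam_hat * r) ** 2 / (2 * v))

def rul_cdf(r, a, lam_hat, P, sigma2):
    """Eq. (18)."""
    root = np.sqrt(P * r**2 + sigma2 * r)
    expo = 2 * lam_hat * a / sigma2 + 2 * P * a**2 / sigma2**2
    arg = -(2 * P * a * r + sigma2 * (lam_hat * r + a)) / (sigma2 * root)
    return 1 - norm.cdf((a - lam_hat * r) / root) + np.exp(expo) * norm.cdf(arg)

def pdf_given_lambda(r, a, lam, sigma2):
    """Eq. (12): RUL density for a fixed drift lam."""
    return a / np.sqrt(2 * np.pi * r**3 * sigma2) * np.exp(-(a - lam * r) ** 2 / (2 * sigma2 * r))

def rul_pdf_by_quadrature(r, a, lam_hat, P, sigma2):
    """Eq. (19) evaluated numerically over lambda_i ~ N(lam_hat, P), truncated at +/- 12 sd."""
    sd = np.sqrt(P)
    integrand = lambda lam: pdf_given_lambda(r, a, lam, sigma2) * norm.pdf(lam, lam_hat, sd)
    return quad(integrand, lam_hat - 12 * sd, lam_hat + 12 * sd)[0]

a, lam_hat, P, sigma2 = 2.0, 0.5, 0.01, 0.09   # hypothetical values
for r in (1.0, 3.0, 6.0):
    cdf_numeric = quad(rul_pdf, 0, r, args=(a, lam_hat, P, sigma2))[0]
    print(r, rul_pdf(r, a, lam_hat, P, sigma2), rul_pdf_by_quadrature(r, a, lam_hat, P, sigma2),
          rul_cdf(r, a, lam_hat, P, sigma2), cdf_numeric)
```

Setting `P = 0` in `rul_pdf` recovers Eq. (12) evaluated at $\hat{\lambda}_i$, i.e. the point-estimate model attributed to [32] above, which makes side-by-side comparisons straightforward.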

Remark 1. In [19], the RUL distribution was calculated directly as $\Pr(R_i\le r_i|X_{0:i})=\Pr(X(r_i+t_i)\ge\omega\,|\,X_{0:i})$, yielding an RUL distribution similar in form to Eqs. (3) and (4). However, this definition ignores possible hitting events within $(t_i,t_i+r_i)$, as discussed in Section 2.1, and the results are thus approximations. In contrast, Eqs. (17) and (18) are exact in the sense of the FHT.

Remark 2. The moments of the RUL distributions obtained in [19] and [33] do not exist, since those distributions belong to the family of Bernstein distributions, which are known to have no moments; this is not the case for our result. For example, the mean RUL can be formulated as

$$E(R_i|X_{0:i})=E\left[E(R_i|\lambda_i,X_{0:i})\,|\,X_{0:i}\right]=E\left(\frac{\omega-x_i}{\lambda_i}\,\Big|\,X_{0:i}\right)=\frac{\omega-x_i}{P_{i|i}}\exp\left(-\frac{\hat{\lambda}_i^2}{2P_{i|i}}\right)\int_0^{\hat{\lambda}_i}\exp\left(\frac{u^2}{2P_{i|i}}\right)du=\frac{\sqrt{2}\,(\omega-x_i)}{\sqrt{P_{i|i}}}\,D\left(\frac{\hat{\lambda}_i}{\sqrt{2P_{i|i}}}\right),$$

where $D(z)=\exp(-z^2)\int_0^z\exp(u^2)\,du$ is the Dawson integral for real $z$, which is known to exist. This property is desirable in maintenance practice, since the expected life is sometimes required [42,43].
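The last equality in Remark 2 follows from the change of variables $u=\sqrt{2P_{i|i}}\,v$. The sketch below checks numerically that the integral form and the Dawson-function form agree, using `scipy.special.dawsn`, which computes $D(z)=\exp(-z^2)\int_0^z\exp(u^2)\,du$. The values $\omega-x_i=2$, $\hat{\lambda}_i=0.5$, $P_{i|i}=0.01$ are illustrative, and the check covers only this algebraic step.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import dawsn

a, lam_hat, P = 2.0, 0.5, 0.01   # hypothetical: w - x_i, filtered mean and variance of lambda_i

# Integral form: (a / P) * exp(-lam_hat^2 / (2P)) * int_0^lam_hat exp(u^2 / (2P)) du
integral, _ = quad(lambda u: np.exp(u**2 / (2 * P)), 0, lam_hat)
integral_form = a / P * np.exp(-lam_hat**2 / (2 * P)) * integral

# Dawson form: sqrt(2) * a / sqrt(P) * D(lam_hat / sqrt(2P))
dawson_form = np.sqrt(2) * a / np.sqrt(P) * dawsn(lam_hat / np.sqrt(2 * P))

print(integral_form, dawson_form, a / lam_hat)   # compare with the plug-in mean (w - x_i) / lam_hat
```

With these values both forms give a mean RUL somewhat above the plug-in value $(\omega-x_i)/\hat{\lambda}_i=4$, reflecting the extra spread contributed by $P_{i|i}$.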

In Eqs. (17) and (18), the parameters $a_0$, $P_0$, $Q$, and $\sigma^2$ must be estimated. In contrast to [19,32], we develop a parameter estimation algorithm for this task in the following section.

3. Parameter estimation

We now estimate and update $a_0$, $P_0$, $Q$, and $\sigma^2$ in Eqs. (7) and (8). Unlike the method adopted in [32], we use only the data from one system since its installation and recursively update the estimate as observations arrive. Denote the parameter vector by $\theta=[a_0,P_0,Q,\sigma^2]^T$. We use maximum likelihood estimation (MLE) to estimate $\theta$ each time a new degradation observation $x_i$ becomes available. The log-likelihood function for $X_{0:i}$ can then be written as

LiðhÞ ¼ log½pðX0:i9hÞ�, ð21Þ

where pðX0:i9hÞ is the joint PDF of the degradation data X0:i. Then the MLE estimate of h, denoted by hi, conditional on X0:i

can be obtained by

hi ¼ argmaxh

LiðhÞ: ð22Þ

If the drift coefficient were constant, maximizing Eq. (21) with respect to $\theta$ would be straightforward. However, since we treat $\lambda_i$ as a hidden variable given by Eq. (7), directly maximizing Eq. (21) is not feasible. Instead, the EM algorithm provides a framework for estimating parameters in models with hidden variables [44]. A fundamental assumption of the EM algorithm is that the hidden variables can be estimated from the observed data. This holds for our problem, since $p(\lambda_i\mid X_{0:i})$ can be obtained by Eq. (10).

The fundamental principle of the EM algorithm is to replace the hidden variables with their expectations conditional on the observed data. The parameter estimation can then be formulated as maximizing the joint likelihood function $p(X_{0:i},U_i\mid\theta)$. Specifically, using the relationship between $p(X_{0:i}\mid\theta)$ and $p(X_{0:i},U_i\mid\theta)$, $L_i(\theta)$ can be divided into two parts as

$$L_i(\theta) = \ell_i(\theta) - \log p(U_i\mid X_{0:i},\theta), \tag{23}$$

where

$$\ell_i(\theta) = \log p(X_{0:i},U_i\mid\theta). \tag{24}$$

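Then, taking the expectation of both sides of Eq. (23) with respect to $U_i\mid X_{0:i},\theta'$, we have

$$L_i(\theta) = \ell(\theta\mid\theta') - K(\theta\mid\theta'), \tag{25}$$

where

$$\ell(\theta\mid\theta') = E_{U_i\mid X_{0:i},\theta'}\left[\ell_i(\theta)\right], \tag{26}$$

$$K(\theta\mid\theta') = E_{U_i\mid X_{0:i},\theta'}\left[\log p(U_i\mid X_{0:i},\theta)\right]. \tag{27}$$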

Finally, via Eq. (25), the following holds:

$$L_i(\theta) - L_i(\theta') = \ell(\theta\mid\theta') - \ell(\theta'\mid\theta') + \underbrace{K(\theta'\mid\theta') - K(\theta\mid\theta')}_{\ge 0}, \tag{28}$$

where the last term is nonnegative because it equals the Kullback–Leibler divergence between $p(U_i\mid X_{0:i},\theta')$ and $p(U_i\mid X_{0:i},\theta)$ [45].

Obviously, if $\ell(\theta\mid\theta') > \ell(\theta'\mid\theta')$, then $L_i(\theta) - L_i(\theta') > 0$. The EM algorithm exploits this: it takes an approximation $\theta_i^{(k)}$ of the MLE $\theta_i$ given in Eq. (22) and updates it to a better $\theta_i^{(k+1)}$ according to the following two steps:

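E-step

Calculate

$$\ell(\theta\mid\theta_i^{(k)}) = E_{U_i\mid X_{0:i},\theta_i^{(k)}}\left[\ell_i(\theta)\right], \tag{29}$$

where $\theta_i^{(k)} = [a_{0,i}^{(k)}, P_{0,i}^{(k)}, Q_i^{(k)}, (\sigma^2)_i^{(k)}]^T$ denotes the parameters estimated at the $k$th step conditional on $X_{0:i}$.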

M-step

Calculate

$$\theta_i^{(k+1)} = \arg\max_{\theta} E_{U_i\mid X_{0:i},\theta_i^{(k)}}\left[\ell_i(\theta)\right]. \tag{30}$$

We then iterate the E-step and M-step until a convergence criterion is satisfied. In our case, the E-step and M-step can be calculated separately; here we only outline the properties of the estimation algorithm, and its details are given in Appendix C. Interestingly, Theorem 2 in Appendix C shows that the M-step in our approach can be solved analytically and has a unique maximum point. This implies that each iteration of the EM algorithm requires only a single closed-form computation, which leads to a very fast and simple estimation procedure. This computational advantage, together with the exact RUL distribution, is particularly attractive for practical applications. The convergence of the proposed algorithm can be established in a similar way to [46–50].

4. A practical case study

In this section, we provide a practical case study of gyros in an inertial navigation system (INS) to illustrate the application of our model and to compare its performance with the models presented in [19,32]. In our study, we found that the STF produces somewhat better results than the Kalman filter. Therefore, in the following we use only the STF in the filtering step for illustration. In practice, which filter to use should be tested against the model fit; due to limited space, we do not discuss this issue in this paper. Detailed comparisons between the STF and the Kalman filter under sudden state changes can be found in [51,52].

4.1. Problem description

As a key device of the INS in weapon systems and space equipment, an inertial platform plays an important and irreplaceable role in the INS. Its operating state directly influences navigation precision. The sensors fixed on an inertial platform include three gyros and three accelerometers, which measure angular velocity and linear acceleration, respectively. The gyro fixed on an inertial platform is a mechanical structure with two degrees of freedom, from the driver axis and the sense axis (see [53] for a general description of inertial navigation platforms and gyros). When the inertial platform is operating, the wheels of the gyros rotate at very high speeds, which can wear the rotation axis. As the wear accumulates, the bearing of the gyro's electric motor becomes deformed, and such deformation causes the gyros to drift. The increasing drift eventually results in the failure of the gyros and then of the inertial platform. Past data show that almost 70% of the failures of inertial platforms result from gyroscopic drift, and such drift results largely from bearing wear, as in the case of rolling element bearings, which have been extensively investigated in the literature. However, our case differs in that we estimate the RUL from the drift data of the gyros rather than from vibration data, as is done for rolling element bearings, because vibration sensors may not be fixed on the inertial platform and vibration data are therefore unavailable. As such, the drift of the gyros is often used as a performance indicator to evaluate the health condition of an inertial platform and to schedule maintenance activities.

For illustration, Fig. 2 shows the inertial platform and Fig. 3 shows a deformed bearing of the gyro's electric motor, imaged with an S-3700N scanning electron microscope. The maximum length of the metal flake is 155 μm. Such deformation is reflected in the drift data of the gyros, which can be monitored directly.

In this study, we assume that the CM values of the drift coefficients reflect the performance of the inertial platform: the larger the monitored drift coefficients, the worse the performance. Therefore, according to the CM data and the technical

Fig. 2. Illustration of the inertial platform.

Fig. 3. Illustration of a deformed bearing of the gyro’s electric motor.


index of the inertial platform, failure prediction can be implemented by modeling the drift coefficients. The drift coefficients of an inertial platform mainly include $K_{0X}$, $K_{0Y}$, $K_{0Z}$, $K_{SX}$, $K_{SY}$ and $K_{IZ}$, where $K_{0X}$, $K_{0Y}$, $K_{0Z}$ are constant drift coefficients and $K_{SX}$, $K_{SY}$, $K_{IZ}$ are stochastic drift coefficients; $K_{SX}$ and $K_{SY}$ are the coefficients related to the first moment of specific force along the sense axis, and $K_{IZ}$ is the coefficient related to the first moment of specific force along the input axis. Generally, the drift degradation measurement along the sense axis, $K_{SX}$, plays a dominant role in the assessment of gyro degradation. In our study, we take the CM data of $K_{SX}$ as the degradation signals and use them for RUL estimation of the INS. For the monitored INS in a certain weapon system, whose life terminated at 180.5 h, 73 drift coefficient data points were collected under field conditions at regular CM intervals of 2.5 h. The collected data are illustrated in Fig. 4.

In INS health monitoring practice, the drift measurement along the sense axis is usually required not to exceed 0.37 (°/h). This threshold is predetermined at the design stage and is strictly enforced in practice, since an INS is a critical device in a navigated weapon system.

4.2. The implementation of our model for RUL estimation of the INS

Using our model, the predictions of the gyro's drift and the distribution of the RUL can be obtained at each CM point. Specifically, with our approach initialized by the parameter vector $\theta_0 = [0.002, 0.001, 0.01, 0.01]^T$, the one-step predicted drift path $x_{i+1} = x_i + \lambda_i(t_{i+1}-t_i)$ is illustrated in Fig. 4 to show how well our model fits the gyro's drift degradation

[Fig. 4 plot: the degradation path of gyro drift (°/hour) versus the CM point (hours); legend: the actual observed path, the predicted path.]

Fig. 4. The actual gyro’s drift data and the predictions of our model.


data. Clearly, the predicted results match the actual data well, and the mean squared error (MSE) of the predictions is small (1.1962E-4). This demonstrates that our model describes the gyro's drift degradation data effectively. Correspondingly, the evolution of the estimated parameter vector $\theta$, consisting of $a_0$, $P_0$, $Q$ and $\sigma^2$, is illustrated in Fig. 5; these parameters are estimated by Algorithm 4, summarized in Appendix E.
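The one-step prediction behind Fig. 4 and its MSE reduce to a few lines. In this sketch (hypothetical function names), `lam[i]` is the filtered drift at $t_i$ from Algorithm 1 or 2, and the MSE is averaged over the predicted points $x_1,\ldots,x_n$; whether the reported figure averages over exactly these points is an assumption.

```python
import numpy as np


def one_step_predictions(t, x, lam):
    """x_hat_{i+1} = x_i + lam_i * (t_{i+1} - t_i), for i = 0, ..., n-1."""
    t, x, lam = (np.asarray(a, float) for a in (t, x, lam))
    return x[:-1] + lam[:-1] * np.diff(t)


def prediction_mse(t, x, lam):
    """Mean squared error between x_1..x_n and their one-step predictions."""
    x = np.asarray(x, float)
    return float(np.mean((x[1:] - one_step_predictions(t, x, lam)) ** 2))


if __name__ == "__main__":
    t = [0.0, 2.5, 5.0, 7.5]              # hypothetical CM times (h)
    x = [0.0, 0.004, 0.011, 0.015]        # hypothetical drift readings
    lam = [0.002, 0.0019, 0.0024, 0.0021] # hypothetical filtered drifts
    print(prediction_mse(t, x, lam))
```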

Fig. 5 shows that the updated parameters converge quickly as the observed degradation data accumulate. In this case, once the parameters converge, further updating may be unnecessary. However, such updating is needed when the degradation process may be subject to unusual changes in its progression. Once the model parameters are updated, the PDF of the estimated RUL can be calculated at each CM point. Note that the RUL at a CM point is not estimated by extrapolating the long-term future degradation state, which would accumulate estimation errors; instead, it is obtained by directly calculating the FHT of the degradation process. Fig. 6 illustrates the RUL distributions at six different CM points.

Note that the first drift reading is zero according to the model setting. In the practical data used, the last drift reading is 0.3566 (°/h) at the monitoring time $t_{73} = 180$ h, which is very close to the failure threshold ($w = 0.37$ (°/h)). Therefore, we can consider that the data capture the full life-cycle history (useful life) of the machine component. In other words, the actual lifetime of the gyros in the INS is approximately 180.5 h, and thus the actual RUL at each CM point is known from the full life-cycle data. As shown in Fig. 6, the actual RUL (denoted by a square) falls within the range of the estimated RUL PDF at each CM point, and the estimated PDF becomes sharper as the degradation data accumulate. This implies that the uncertainty of the estimated RUL decreases as more data are used to estimate the model parameters. When we use our predictive model for RUL estimation at a given CM point, we use only the CM data up to that point. In other words, to estimate the RUL at $t_i$, the data $X_{0:i}$ are used to update the model and to estimate the RUL. Therefore, at the last CM time $t_{73}$, all the available CM data are used. Fig. 7 compares the performance of our predictive model against the full life-cycle data.

It can be observed that the estimated mean RUL and the actual RUL match well. For example, the relative error between the actual life and the estimated mean life at $t_{73}$ is 0.16%. This shows that the estimates of our model closely match the actual results from the full life-cycle data.

4.3. Comparative studies

In this part, we compare our model with the models presented in [19,32]. We first compare our model with Wang's model in terms of RUL estimation performance for the INS. The detailed implementation of Wang's model can be found in [32].

To compare the RUL distributions of our model and Wang's model directly, we employ a loss function: the MSE about the actual RUL at each observation point [4], defined as

$$\mathrm{MSE}_i = \int_0^{\infty}(r_i - \tilde r_i)^2 f_{R_i|X_{0:i}}(r_i\mid X_{0:i})\,dr_i, \tag{31}$$

where $\tilde r_i$ is the actual RUL at $t_i$ and $f_{R_i|X_{0:i}}(r_i\mid X_{0:i})$ is the estimated PDF of the RUL.
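Eq. (31) can be evaluated on a grid once the RUL PDF is tabulated. The sketch below (hypothetical function names) uses the trapezoidal rule and, as an assumption not stated in the text, renormalizes the PDF over the truncated grid. Because $\mathrm{MSE}_i$ equals the variance of the RUL plus the squared bias of its mean relative to $\tilde r_i$, it penalizes both dispersion and offset; a normal PDF with mean 20 h and standard deviation 4 h, scored against an actual RUL of 18 h, gives about $16 + 4 = 20$.

```python
import numpy as np


def trapezoid(y, r):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    y, r = np.asarray(y, float), np.asarray(r, float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(r)))


def mse_about_actual(r_grid, pdf, r_actual):
    """Eq. (31) on a truncated grid [0, r_max].

    The PDF is renormalised over the grid (an assumption that absorbs
    any tail mass beyond r_max).
    """
    r_grid, pdf = np.asarray(r_grid, float), np.asarray(pdf, float)
    pdf = pdf / trapezoid(pdf, r_grid)
    return trapezoid((r_grid - r_actual) ** 2 * pdf, r_grid)


if __name__ == "__main__":
    r = np.linspace(0.0, 100.0, 20001)                          # RUL grid (h)
    pdf = np.exp(-(r - 20.0) ** 2 / 32.0) / (4.0 * np.sqrt(2.0 * np.pi))
    print(mse_about_actual(r, pdf, 18.0))                       # about 20
```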

Fig. 8(a) compares the estimated RUL distributions from our model with those from Wang's model. In both cases, the unknown parameters $a_0$, $P_0$, $Q$ and $\sigma^2$ are obtained by the EM algorithm; the difference is that our model recursively updates all parameter estimates whenever a new piece of information is available, whereas the model in [32] estimates the model parameters from the data of past systems and keeps them fixed once estimated. Only $\lambda_i$ is updated with the new data

[Fig. 5 panels: estimated $a_0$; estimated $P_0$ (×10⁻³); estimated $Q$ (×10⁻³); estimated $\sigma^2$ (×10⁻³); each versus the CM point (hours).]

Fig. 5. The updated parameters at monitoring points $t_0, t_1, \ldots, t_{73}$.

[Fig. 6 plot: probability density function (PDF) versus the RUL (hours); curves at the 21st, 41st, 51st, 61st, 71st and 73rd CM points, with the actual RUL marked.]

Fig. 6. Illustration of the PDF of the RUL at six different CM points.


[Fig. 7 plot: RUL (hours) versus the CM point (hours); legend: the actual RUL, the estimated RUL.]

Fig. 7. The estimated mean RUL and the actual RUL at monitoring points $t_0, t_1, \ldots, t_{73}$.

[Fig. 8(a) plot: PDF versus the RUL (hours) and the CM time (hours); legend: our model, Wang's model. Fig. 8(b) plot: $\mathrm{MSE}_i$ at each CM point (×10⁴) versus the CM point (hours); legend: Wang's model, our model.]

Fig. 8. Comparative results of our model with Wang's model: (a) the estimated PDFs of the RULs at the last six CM points, and (b) the MSEs of the RULs at all CM points (□: actual RUL).


in Wang's model. Clearly, the RUL PDFs from both models cover the actual RULs well as the data accumulate. However, Fig. 8(a) shows that the estimated RUL PDFs are typically more dispersed when using Wang's model. Fig. 8(b) shows the MSE between the actual RUL and the estimated RUL at each CM point. The MSEs from Wang's model change irregularly across sampling points, with relatively large fluctuations, particularly in the early stage of estimation. This implies that the RUL PDFs estimated by Wang's model are sensitive to small changes in the observations shown in Fig. 4. This may also lead to completely different health management decisions at two consecutive CM points, even though the actual degradation observations may have changed only a little, as seen from Fig. 4. The behavior seen in Fig. 8(b) stems from the fact that in [32] the distribution $p(\lambda_i\mid X_{0:i})$ of the updated drift coefficient is not considered and the estimated parameters are not recursively updated. From Fig. 8(b), we observe that our model makes the MSEs about the actual RUL at each CM point less sensitive to small changes. Most importantly, our model improves the accuracy of the estimated RUL PDFs and reduces their variance (see Fig. 8). This variance reduction is due to the use of the complete distribution of $\lambda_i$ via the Bayesian rule shown earlier. These comparisons reflect the superiority of our model over Wang's model for RUL estimation of the INS.

[Fig. 9(a) plot: the drift degradation data of the INS (°/hour) versus the CM time (hours); legend: the actual degradation path, Gebraeel's model with random prior parameters, Gebraeel's model with appropriate prior parameters, our model with random prior parameters. Fig. 9(b) and (c) plots: PDF versus the RUL (hours) and the CM time (hours); legends: our model and Gebraeel's model with appropriate (b) or random (c) prior parameters.]

Fig. 9. Comparative results of our model with Gebraeel's model: (a) the predicted degradation path, (b) RUL estimation using appropriate prior parameters, and (c) RUL estimation using random prior parameters (□: actual RUL).

[Fig. 10 plot: conditional reliability at the last CM point versus the RUL (hours); legend: our model, Gebraeel's model.]

Fig. 10. Conditional reliability at the last CM point.


Now, we further compare our model with Gebraeel's model in terms of RUL estimation performance. The detailed implementation of Gebraeel's model can be found in [19]. Note that the model in [19] needs prior parameters, as does our model. To compare the goodness of fit, the prior parameters of our model are selected at random at time zero, whereas we consider two cases for the prior parameters of Gebraeel's model: one uses appropriately selected prior parameters and the other uses random prior parameters. Fig. 9 shows the comparative results.

Fig. 9(a) shows the predicted degradation paths. The MSEs between the actual data and the predicted degradation path for our model with random prior parameters, Gebraeel's model with random prior parameters, and Gebraeel's model with appropriate prior parameters are 1.179E-4, 0.0026, and 7.9864E-4, respectively. Thus, our model provides the best fit to the degradation data. We also find that our model is robust to the choice of prior parameters, whereas Gebraeel's model is not. Fig. 9(b) illustrates several estimated RUL PDFs over time using Gebraeel's model with appropriate prior parameters. The ranges of these PDFs clearly cover the actual RULs, but the uncertainty of the RUL estimated by our model is smaller than that from Gebraeel's model, particularly at the last few sampling points, since our estimated RUL PDFs are sharper. However, if the prior parameters in Gebraeel's model are selected at random, the estimated RUL may be incorrect, as illustrated in Fig. 9(c), where the observed RULs lie outside the ranges of the predicted RUL PDFs. In comparison, our model with randomly selected prior parameters produces reasonable estimates. This demonstrates the merit of our model for RUL estimation of a particular system. Therefore, if sufficient historical degradation data are not available at the beginning for selecting the prior parameters, our model may be more appropriate for RUL estimation.

As noted in Remark 2, the moments of the RUL distribution obtained by Gebraeel's model do not exist. Therefore, we cannot calculate the MSE about the actual RUL defined in Eq. (31). Instead, to examine the difference between our exact RUL distribution and Gebraeel's approximate one, we compare the estimated reliability from both models. Here, we use $\bar F_{R_i|X_{0:i}}(r_i\mid X_{0:i}) = 1 - F_{R_i|X_{0:i}}(r_i\mid X_{0:i})$ as the conditional reliability obtained from our exact approach at $t_i$, and $\bar F_{R'_i|X_{0:i}}(r_i\mid X_{0:i})$ as the conditional reliability obtained from Gebraeel's model via $\Pr(R'_i \le r_i\mid X_{0:i}) = \Pr(X(r_i+t_i)\ge w\mid X_{0:i})$ at $t_i$. The difference between them at the last CM point is illustrated in Fig. 10.

The difference between the two results is significant. This is due in part to the many fluctuations in the degradation path of the gyro's drift, so that using $\Pr(R'_i \le r_i\mid X_{0:i}) = \Pr(X(r_i+t_i)\ge w\mid X_{0:i})$ to approximate the RUL distribution overestimates the reliability, as discussed in Remark 1. Consequently, when the approximate result is applied to reliability-centered maintenance, the resulting decisions may be far from reality.

In sum, this practical case study demonstrates that our model works well and efficiently. It also verifies that incorporating the observation history to date can indeed improve the accuracy of the RUL estimation.

5. Concluding remarks

In this paper, we present a Wiener-process-based degradation model with a recursive filter algorithm and Bayesian updating to estimate the PDF of the RUL. Based on the Wiener process, we construct an RUL estimation model that depends on the system's CM data to date via recursive updating. The updating is done twice: the drift coefficient is recursively updated via the state-space model, and all parameters are updated via the EM algorithm, both at the time a new piece of CM data becomes available. In the RUL estimation, we also account for the distribution of the estimated drift coefficient, and accordingly an explicit RUL distribution is obtained based on the concept of the FHT. The usefulness of the proposed model is demonstrated via a real-world example of gyros in an inertial navigation system. By comparing the


proposed model with existing models, we show that the proposed model generates better results than the compared models, in terms of both the mean and the variance of the estimated RUL, and that it is robust to the choice of the prior parameters.

Although the practical case study shows that our model is a simple but efficient method for RUL estimation, several issues still need further study. First, in this paper we only consider a Wiener process with a linear drift, but for complex systems in practice a nonlinear model may be more appropriate. Second, we mainly consider the case in which failure results from gradual and continuous degradation. However, considering sudden failure in degradation modeling and RUL estimation is interesting and of practical significance, because the failure of complex engineering systems often results from gradual degradation together with stresses generated either by the systems themselves or by external shocks, and such shocks frequently lead to sudden failure. Therefore, considering both gradual and sudden failures is necessary in future research. Third, we only consider the case in which the degradation process is observable. However, hidden or partially observable degradation processes are frequently encountered in engineering practice, which may motivate research on RUL estimation subject to hidden degradation processes. The key to tackling this issue is to incorporate into the RUL estimation the estimation error or uncertainty arising from the unobservability of the degradation state. In addition, we primarily discuss issues associated with estimating the RUL. Much more decision-focused research using prognostic information is needed to generate new insights into the effect of the estimated RUL on decision making.

Acknowledgments

The authors sincerely thank the Editor and the anonymous reviewers for their support and constructive comments. This work was partially supported by the National 973 Project under grants 2010CB731800 and 2009CB32602, the NSFC under grants 61028010, 61021063, 60931160440, 61174030, 71071097, 71231001, 61210012, and 61104223, the National Science Fund for Distinguished Young Scholars of China under grant 61025014, and the Fundamental Research Funds for the Central Universities of China.

Appendix A. Algorithm 1: Kalman filtering algorithm

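Step 1: Initialize $\lambda_0 = a_0$, $P_0$.

Step 2: State estimation at time $t_i$

$$P_{i|i-1} = P_{i-1|i-1} + Q$$

$$K_i = (t_i-t_{i-1})^2P_{i|i-1} + \sigma^2(t_i-t_{i-1})$$

$$\lambda_i = \lambda_{i-1} + P_{i|i-1}(t_i-t_{i-1})K_i^{-1}\left(x_i - x_{i-1} - \lambda_{i-1}(t_i-t_{i-1})\right).$$

Step 3: Updating variance

$$P_{i|i} = P_{i|i-1} - P_{i|i-1}(t_i-t_{i-1})^2K_i^{-1}P_{i|i-1}.$$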
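A minimal NumPy sketch of Algorithm 1 as printed above, for readers who want to reproduce the filtering step. The function name `kalman_drift_filter` and the synthetic data in the usage are hypothetical; `t` and `x` hold $t_0,\ldots,t_n$ and $x_0,\ldots,x_n$, and index 0 carries the prior $\lambda_0 = a_0$, $P_{0|0} = P_0$.

```python
import numpy as np


def kalman_drift_filter(t, x, a0, P0, Q, sigma2):
    """Algorithm 1 as printed: filtered drift lam[i] and variance P[i] = P_{i|i}.

    Index 0 holds the prior lam_0 = a0, P_{0|0} = P0; x[0] is the first reading.
    """
    t, x = np.asarray(t, float), np.asarray(x, float)
    n = len(x)
    lam, P = np.empty(n), np.empty(n)
    lam[0], P[0] = a0, P0
    for i in range(1, n):
        dt = t[i] - t[i - 1]
        P_pred = P[i - 1] + Q                        # P_{i|i-1}
        K = dt ** 2 * P_pred + sigma2 * dt           # K_i (innovation variance)
        innov = x[i] - x[i - 1] - lam[i - 1] * dt    # x_i - x_{i-1} - lam_{i-1} dt
        lam[i] = lam[i - 1] + P_pred * dt / K * innov
        P[i] = P_pred - P_pred * dt ** 2 / K * P_pred
    return lam, P


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(0.0, 180.5, 2.5)                   # 73 CM times, 2.5 h apart
    x = np.concatenate([[0.0], np.cumsum(0.002 * 2.5 + rng.normal(0.0, 1e-3, t.size - 1))])
    lam, P = kalman_drift_filter(t, x, a0=0.002, P0=1e-6, Q=1e-9, sigma2=1e-6)
    print(lam[-1], P[-1])
```

Note that $K_i$ in Algorithm 1 is the innovation variance; the Kalman gain is $P_{i|i-1}(t_i-t_{i-1})K_i^{-1}$.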

Appendix B. Algorithm 2: Strong tracking filtering algorithm

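Step 1: Initialize $\lambda_0 = a_0$, $P_0$, $\alpha$, $\rho$.

Step 2: Calculate the fading factor $u(t_i)$ from the orthogonality principle

$$V_0(t_i) = \begin{cases} \gamma^2(t_1), & i = 1\\[4pt] \dfrac{\rho V_0(t_{i-1}) + \gamma^2(t_i)}{1+\rho}, & i > 1 \end{cases}\qquad\text{with } \gamma(t_i) = x_i - x_{i-1} - \lambda_{i-1}(t_i-t_{i-1}),$$

$$B(t_i) = V_0(t_i) - Q(t_i-t_{i-1})^2 - \alpha\sigma^2(t_i-t_{i-1});\quad C(t_i) = P_{i-1|i-1}(t_i-t_{i-1})^2;\quad u_0 = B(t_i)/C(t_i),$$

$$u(t_i) = \begin{cases} u_0, & u_0 \ge 1\\ 1, & u_0 < 1 \end{cases}$$

Step 3: State estimation

$$P_{i|i-1} = u(t_i)P_{i-1|i-1} + Q$$

$$K_i = (t_i-t_{i-1})^2P_{i|i-1} + \sigma^2(t_i-t_{i-1})$$

$$\lambda_i = \lambda_{i-1} + P_{i|i-1}(t_i-t_{i-1})K_i^{-1}\left(x_i - x_{i-1} - \lambda_{i-1}(t_i-t_{i-1})\right).$$

Step 4: Updating variance

$$P_{i|i} = P_{i|i-1} - P_{i|i-1}(t_i-t_{i-1})^2K_i^{-1}P_{i|i-1}.$$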

In Algorithm 2, $\alpha \ge 1$ and $\rho$ denote the softening factor and the forgetting factor, respectively, which can be selected heuristically; $\rho = 0.95$ is commonly used [39,51].
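A sketch of Algorithm 2 in the same style as Algorithm 1. It follows the printed recursions; the function name and the default $\alpha = 1$ are assumptions, and $\rho = 0.95$ follows the value quoted above.

```python
import numpy as np


def strong_tracking_filter(t, x, a0, P0, Q, sigma2, alpha=1.0, rho=0.95):
    """Algorithm 2 as printed.

    Returns the filtered drift lam, variance P = P_{i|i} and fading factors u
    (u[0] unused). alpha >= 1 is the softening factor, rho the forgetting factor.
    """
    t, x = np.asarray(t, float), np.asarray(x, float)
    n = len(x)
    lam, P, u = np.empty(n), np.empty(n), np.ones(n)
    lam[0], P[0] = a0, P0
    V0 = 0.0
    for i in range(1, n):
        dt = t[i] - t[i - 1]
        gamma = x[i] - x[i - 1] - lam[i - 1] * dt            # residual gamma(t_i)
        # Step 2: residual covariance estimate and fading factor.
        V0 = gamma ** 2 if i == 1 else (rho * V0 + gamma ** 2) / (1.0 + rho)
        B = V0 - Q * dt ** 2 - alpha * sigma2 * dt
        C = P[i - 1] * dt ** 2
        u0 = B / C
        u[i] = u0 if u0 >= 1.0 else 1.0
        # Step 3: state estimation with the inflated prediction variance.
        P_pred = u[i] * P[i - 1] + Q
        K = dt ** 2 * P_pred + sigma2 * dt
        lam[i] = lam[i - 1] + P_pred * dt / K * gamma
        # Step 4: variance update.
        P[i] = P_pred - P_pred * dt ** 2 / K * P_pred
    return lam, P, u


if __name__ == "__main__":
    t = np.arange(0.0, 50.0, 2.5)
    x = np.where(t <= 25.0, 0.002 * t, 0.05 + 0.02 * (t - 25.0))   # hypothetical drift jump
    lam, P, u = strong_tracking_filter(t, x, 0.002, 1e-8, 1e-10, 1e-8)
    print(u.max(), lam[-1])
```

When the residuals are consistent with the model, `u` stays at 1 and the recursion coincides with Algorithm 1; a sudden change in drift inflates `u[i]`, which lets $\lambda_i$ follow the change quickly.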

Appendix C. The implementation of EM algorithm for our model

Following the EM procedure described above, the joint log-likelihood function for our problem can be expressed as

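$$\ell_i(\theta) = \log p(X_{0:i}\mid U_i,\theta) + \log p(U_i\mid\theta) = \log p(\lambda_0\mid\theta) + \log\prod_{j=1}^{i}p(\lambda_j\mid\lambda_{j-1},\theta) + \log\prod_{j=1}^{i}p(x_j\mid\lambda_{j-1},\theta). \tag{C-1}$$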

From Eqs. (7) and (8), we directly have $\lambda_j\mid\lambda_{j-1} \sim N(\lambda_{j-1}, Q)$, $x_j\mid\lambda_{j-1} \sim N\left(x_{j-1} + \lambda_{j-1}(t_j-t_{j-1}),\ \sigma^2(t_j-t_{j-1})\right)$ and $\lambda_0 \sim N(a_0, P_0)$. Using Eq. (C-1) and ignoring the constant terms, the joint log-likelihood function can be formulated as

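$$2\ell_i(\theta) = -\log P_0 - \frac{(\lambda_0-a_0)^2}{P_0} - \sum_{j=1}^{i}\left[\log Q + \frac{(\lambda_j-\lambda_{j-1})^2}{Q}\right] - \sum_{j=1}^{i}\left[\log\sigma^2 + \frac{\left(x_j-x_{j-1}-\lambda_{j-1}(t_j-t_{j-1})\right)^2}{\sigma^2(t_j-t_{j-1})}\right]. \tag{C-2}$$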

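To calculate the conditional expectation $\ell(\theta\mid\theta_i^{(k)})$ defined in Eq. (29), we have

$$2\ell(\theta\mid\theta_i^{(k)}) = E_{U_i\mid X_{0:i},\theta_i^{(k)}}\left[2\ell_i(\theta)\right] = E_{U_i\mid X_{0:i},\theta_i^{(k)}}\left[-\log P_0 - \frac{(\lambda_0-a_0)^2}{P_0} - \sum_{j=1}^{i}\left(\log Q + \frac{(\lambda_j-\lambda_{j-1})^2}{Q}\right) - \sum_{j=1}^{i}\left(\log\sigma^2 + \frac{\left(x_j-x_{j-1}-\lambda_{j-1}(t_j-t_{j-1})\right)^2}{\sigma^2(t_j-t_{j-1})}\right)\right]. \tag{C-3}$$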

Clearly, calculating the expectation of this expression requires $E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j)$, $E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j^2)$ and $E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j\lambda_{j-1})$, which are the conditional expectations with respect to $U_i$ given the observed history $X_{0:i}$. In this paper, we use the Rauch–Tung–Striebel (RTS) smoother to provide optimal estimates of these quantities, summarized as Algorithm 3 [47,54]. In Algorithm 3, we define $M_{j|i} = \mathrm{Cov}(\lambda_j,\lambda_{j-1}\mid X_{0:i})$.

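Algorithm 3. RTS smoothing algorithm

Step 1: Forward iteration by Algorithm 1 or Algorithm 2.

Step 2: Backward iteration

$$S_j = P_{j|j}P_{j+1|j}^{-1}$$

$$\lambda_{j|i} = \lambda_j + S_j(\lambda_{j+1|i} - \lambda_{j+1|j}) = \lambda_j + S_j(\lambda_{j+1|i} - \lambda_j)$$

$$P_{j|i} = P_{j|j} + S_j^2(P_{j+1|i} - P_{j+1|j})$$

Step 3: Initialize, with $K_i$ as in Algorithm 1 (so that $P_{i|i-1}(t_i-t_{i-1})K_i^{-1}$ is the Kalman gain),

$$M_{i|i} = \left(1 - P_{i|i-1}(t_i-t_{i-1})^2K_i^{-1}\right)P_{i-1|i-1}$$

Step 4: Backward iteration for the smoothing covariance

$$M_{j|i} = P_{j|j}S_{j-1} + S_j(M_{j+1|i} - P_{j|j})S_{j-1}$$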

From the RTS smoothing algorithm, we obtain the conditional expectations $E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j)$, $E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j^2)$ and $E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j\lambda_{j-1})$ given in the following lemma.

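Lemma 3. Conditional on the current estimated parameter $\theta_i^{(k)}$ and the observation history $X_{0:i}$, the values of $E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j)$, $E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j^2)$ and $E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j\lambda_{j-1})$ are given by

$$\begin{aligned}
E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j) &= \lambda_{j|i},\\
E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j^2) &= \lambda_{j|i}^2 + P_{j|i},\\
E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j\lambda_{j-1}) &= \lambda_{j|i}\lambda_{j-1|i} + M_{j|i} = P_{j|j}S_{j-1} + S_j(M_{j+1|i} - P_{j|j})S_{j-1} + \lambda_{j|i}\lambda_{j-1|i}.
\end{aligned} \tag{C-4}$$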


Proof: These equations follow directly from the properties of variances and covariances and from the RTS smoothing algorithm; the proof is hence omitted.
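The sketch below strings Algorithm 1 (forward pass) and Algorithm 3 (backward pass) together and returns the three moments of Lemma 3. It is an illustrative implementation of the recursions as printed, not the authors' code; the name `rts_moments` is hypothetical, and `gain` is the Kalman gain $P_{i|i-1}(t_i-t_{i-1})K_i^{-1}$ used to initialize $M_{i|i}$.

```python
import numpy as np


def rts_moments(t, x, a0, P0, Q, sigma2):
    """Algorithm 3 with Algorithm 1 as the forward pass.

    Returns the Lemma 3 moments E(lam_j), E(lam_j^2) and E(lam_j lam_{j-1})
    given X_{0:i} (the last array has an unused entry at index 0).
    Requires at least two observations.
    """
    t, x = np.asarray(t, float), np.asarray(x, float)
    n, dt = len(x), np.diff(t)
    lam, P, P_pred = np.empty(n), np.empty(n), np.empty(n)
    lam[0], P[0], P_pred[0] = a0, P0, P0           # P_pred[0] is unused
    # Step 1: forward pass (Algorithm 1).
    for i in range(1, n):
        P_pred[i] = P[i - 1] + Q
        gain = P_pred[i] * dt[i - 1] / (dt[i - 1] ** 2 * P_pred[i] + sigma2 * dt[i - 1])
        lam[i] = lam[i - 1] + gain * (x[i] - x[i - 1] - lam[i - 1] * dt[i - 1])
        P[i] = P_pred[i] - gain * dt[i - 1] * P_pred[i]
    # Step 2: backward pass for lam_{j|i} and P_{j|i} (random walk: lam_{j+1|j} = lam_j).
    lam_s, P_s, S = lam.copy(), P.copy(), np.zeros(n)
    for j in range(n - 2, -1, -1):
        S[j] = P[j] / P_pred[j + 1]
        lam_s[j] = lam[j] + S[j] * (lam_s[j + 1] - lam[j])
        P_s[j] = P[j] + S[j] ** 2 * (P_s[j + 1] - P_pred[j + 1])
    # Steps 3-4: lag-one covariances M_{j|i} = Cov(lam_j, lam_{j-1} | X_{0:i}).
    M = np.zeros(n)
    M[n - 1] = (1.0 - gain * dt[-1]) * P[n - 2]
    for j in range(n - 2, 0, -1):
        M[j] = P[j] * S[j - 1] + S[j] * (M[j + 1] - P[j]) * S[j - 1]
    E12 = np.zeros(n)
    E12[1:] = lam_s[1:] * lam_s[:-1] + M[1:]
    return lam_s, lam_s ** 2 + P_s, E12


if __name__ == "__main__":
    t = np.array([0.0, 1.0, 2.5, 3.0, 4.5, 6.0])    # made-up times and readings
    x = np.array([0.0, 0.3, 0.5, 0.9, 1.0, 1.6])
    print(rts_moments(t, x, a0=0.1, P0=0.5, Q=0.2, sigma2=0.3))
```

On a short series, such a sketch can be checked against the exact Gaussian posterior of $(\lambda_0,\ldots,\lambda_n)$ computed by dense linear algebra under the state-space model that the printed filter recursions correspond to.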

From Eq. (C-3) and Lemma 3, $\ell(\theta\mid\theta_i^{(k)})$ can be written as

$$\begin{aligned}
2\ell(\theta\mid\theta_i^{(k)}) &= E_{U_i\mid X_{0:i},\theta_i^{(k)}}\left[2\ell_i(\theta)\right]\\
&= -\log P_0 - \frac{C_{0|i} - 2a_{0|i}a_0 + a_0^2}{P_0} - \sum_{j=1}^{i}\left[\log Q + \frac{C_{j|i} - 2C_{j,j-1|i} + C_{j-1|i}}{Q}\right]\\
&\quad - \sum_{j=1}^{i}\left[\log\sigma^2 + \frac{(x_j-x_{j-1})^2 - 2\lambda_{j-1|i}(x_j-x_{j-1})(t_j-t_{j-1}) + (t_j-t_{j-1})^2C_{j-1|i}}{\sigma^2(t_j-t_{j-1})}\right].
\end{aligned} \tag{C-5}$$

This completes the E-step; we now handle the M-step. After obtaining $\ell(\theta\mid\theta_i^{(k)})$, the estimated parameter $\theta_i^{(k+1)}$ in the $(k+1)$th step is given by the following theorem.

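Theorem 2. The estimate $\theta_i^{(k+1)}$ obtained by maximizing $\ell(\theta\mid\theta_i^{(k)})$ is given by

$$\begin{aligned}
a_{0,i}^{(k+1)} &= a_{0|i},\\
P_{0,i}^{(k+1)} &= C_{0|i} - a_{0|i}^2 = P_{0|i},\\
Q_i^{(k+1)} &= \frac{1}{i}\sum_{j=1}^{i}\left(C_{j|i} - 2C_{j,j-1|i} + C_{j-1|i}\right),\\
(\sigma^2)_i^{(k+1)} &= \frac{1}{i}\sum_{j=1}^{i}\frac{(x_j-x_{j-1})^2 - 2\lambda_{j-1|i}(x_j-x_{j-1})(t_j-t_{j-1}) + (t_j-t_{j-1})^2C_{j-1|i}}{t_j-t_{j-1}},
\end{aligned} \tag{C-6}$$

with $C_{j|i} = E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j^2)$, $a_{0|i} = \lambda_{0|i}$ and $C_{j,j-1|i} = E_{U_i\mid X_{0:i},\theta_i^{(k)}}(\lambda_j\lambda_{j-1})$; moreover, $\theta_i^{(k+1)}$ is uniquely determined and located at the maximum.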

Proof: See Appendix D.

The preceding derivations are summarized as Algorithm 4 in Appendix E, which gives a complete specification of the EM-based algorithm for estimating the parameters in $\theta$ when a new observation $x_i$ becomes available.
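Given the smoothed moments from Lemma 3, the M-step of Theorem 2 reduces to a handful of array reductions. A sketch with the hypothetical function name `m_step`; arrays are indexed $j = 0,\ldots,i$, and `E12[0]` is unused.

```python
import numpy as np


def m_step(t, x, E1, E2, E12):
    """Closed-form M-step of Theorem 2, Eq. (C-6).

    E1[j] = lam_{j|i}, E2[j] = C_{j|i}, E12[j] = C_{j,j-1|i} (E12[0] unused),
    for j = 0..i; t and x hold t_0..t_i and x_0..x_i.
    Returns [a0, P0, Q, sigma^2] for step k + 1.
    """
    t, x, E1, E2, E12 = (np.asarray(a, float) for a in (t, x, E1, E2, E12))
    i = len(x) - 1
    dt, dx = np.diff(t), np.diff(x)
    a0 = E1[0]                                     # a_{0|i} = lam_{0|i}
    P0 = E2[0] - E1[0] ** 2                        # C_{0|i} - a_{0|i}^2 = P_{0|i}
    Q = np.sum(E2[1:] - 2.0 * E12[1:] + E2[:-1]) / i
    sigma2 = np.sum((dx ** 2 - 2.0 * E1[:-1] * dx * dt + dt ** 2 * E2[:-1]) / dt) / i
    return np.array([a0, P0, Q, sigma2])


if __name__ == "__main__":
    # Degenerate moments (zero posterior variance), chosen only to make the
    # arithmetic easy to follow by hand.
    print(m_step([0.0, 1.0, 2.0], [0.0, 1.0, 3.0], [1.0, 1.0, 2.0],
                 [1.0, 1.0, 4.0], [0.0, 1.0, 2.0]))
```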

Appendix D. The proof of Theorem 2

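The unknown parameters $\theta_i^{(k+1)}$ can be obtained by maximizing $\ell(\theta\mid\theta_i^{(k)})$ with respect to $\theta$. Therefore, the central goal in this step is to find the maximum of the function $\ell(\theta\mid\theta_i^{(k)})$ over $\theta$, i.e.

$$\theta_i^{(k+1)} = \arg\max_{\theta} E_{U_i\mid X_{0:i},\theta_i^{(k)}}\left[\ell_i(\theta)\right] = \arg\max_{\theta}\ \ell(\theta\mid\theta_i^{(k)}). \tag{D-1}$$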

From Eq. (C-5), setting $\partial\ell(\theta\mid\theta_i^{(k)})/\partial\theta = 0$ leads to Eq. (C-6), and taking $\partial^2\ell(\theta\mid\theta_i^{(k)})/\partial\theta\,\partial\theta^T$, we obtain

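$$\frac{\partial^2\ell(\theta\mid\theta_i^{(k)})}{\partial\theta\,\partial\theta^T} = \frac{1}{2}\begin{bmatrix}
-2/P_0 & 2(a_0-a_{0|i})/P_0^2 & 0 & 0\\
2(a_0-a_{0|i})/P_0^2 & 1/P_0^2 - 2(C_{0|i}-2a_{0|i}a_0+a_0^2)/P_0^3 & 0 & 0\\
0 & 0 & i/Q^2 - 2c/Q^3 & 0\\
0 & 0 & 0 & i/\sigma^4 - 2f/\sigma^6
\end{bmatrix}, \tag{D-2}$$

with

$$c = \sum_{j=1}^{i}\left(C_{j|i} - 2C_{j,j-1|i} + C_{j-1|i}\right),\qquad f = \sum_{j=1}^{i}\frac{(x_j-x_{j-1})^2 - 2\lambda_{j-1|i}(x_j-x_{j-1})(t_j-t_{j-1}) + (t_j-t_{j-1})^2C_{j-1|i}}{t_j-t_{j-1}}. \tag{D-3}$$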

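We show that the matrix in (D-2) is negative definite at $\theta = \theta_i^{(k+1)}$ by calculating its leading principal minors as follows:

$$\Delta_1 = -\frac{1}{P_0},\qquad \Delta_2 = -\frac{1}{2P_0}\left(\frac{1}{P_0^2} - \frac{2(C_{0|i}-2a_{0|i}a_0+a_0^2)}{P_0^3}\right) - \frac{(a_0-a_{0|i})^2}{P_0^4},$$

$$\Delta_3 = \frac{1}{2}\left(\frac{i}{Q^2} - \frac{2c}{Q^3}\right)\Delta_2,\qquad \Delta_4 = \frac{1}{2}\left(\frac{i}{\sigma^4} - \frac{2f}{\sigma^6}\right)\Delta_3.$$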


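Then, at $\theta = \theta_i^{(k+1)}$, where $a_0 = a_{0|i}$, $P_0 = P_{0|i} = C_{0|i} - a_{0|i}^2$, $Q = c/i$ and $\sigma^2 = f/i$, the following results are obtained:

$$\begin{aligned}
\Delta_1\big|_{\theta=\theta_i^{(k+1)}} &= -\frac{1}{P_{0|i}} < 0,\\
\Delta_2\big|_{\theta=\theta_i^{(k+1)}} &= -\frac{1}{2P_{0|i}}\left(\frac{1}{P_{0|i}^2} - \frac{2(C_{0|i}-2a_{0|i}a_{0|i}+a_{0|i}^2)}{P_{0|i}^3}\right) - \frac{(a_{0|i}-a_{0|i})^2}{P_{0|i}^4} = -\frac{1}{2P_{0|i}}\left(\frac{1}{P_{0|i}^2} - \frac{2}{P_{0|i}^2}\right) = \frac{1}{2P_{0|i}^3} > 0,\\
\Delta_3\big|_{\theta=\theta_i^{(k+1)}} &= \frac{1}{2}\left(\frac{i^3}{c^2} - \frac{2i^3c}{c^3}\right)\Delta_2\big|_{\theta=\theta_i^{(k+1)}} = -\frac{i^3}{2c^2}\,\Delta_2\big|_{\theta=\theta_i^{(k+1)}} < 0,\\
\Delta_4\big|_{\theta=\theta_i^{(k+1)}} &= \frac{1}{2}\left(\frac{i}{\sigma^4} - \frac{2f}{\sigma^6}\right)\Delta_3\big|_{\theta=\theta_i^{(k+1)}} = \frac{1}{2}\left(\frac{i^3}{f^2} - \frac{2i^3f}{f^3}\right)\Delta_3\big|_{\theta=\theta_i^{(k+1)}} = -\frac{i^3}{2f^2}\,\Delta_3\big|_{\theta=\theta_i^{(k+1)}} > 0.
\end{aligned}$$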

This completes the proof that the matrix in (D-2) is negative definite at $\theta = \theta_i^{(k+1)}$, verifying that $\theta_i^{(k+1)}$ is located at a maximum. In addition, $\theta_i^{(k+1)}$ is unique, since it is the only solution satisfying $\partial\ell(\theta\mid\theta_i^{(k)})/\partial\theta = 0$.
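The negative-definiteness argument can also be checked numerically. The sketch below, using made-up smoothed moments, evaluates $\ell(\theta\mid\theta_i^{(k)})$ from Eq. (C-5), the analytic Hessian (D-2) and its leading principal minors at the closed-form maximizer of Eq. (C-6). All function names and input values are hypothetical.

```python
import numpy as np


def _c_f(t, x, E1, E2, E12):
    """The sums c and f of Eq. (D-3)."""
    dt, dx = np.diff(t), np.diff(x)
    c = np.sum(E2[1:] - 2.0 * E12[1:] + E2[:-1])
    f = np.sum((dx ** 2 - 2.0 * E1[:-1] * dx * dt + dt ** 2 * E2[:-1]) / dt)
    return c, f


def q_function(theta, t, x, E1, E2, E12):
    """ell(theta | theta^(k)): one half of the right-hand side of Eq. (C-5)."""
    a0, P0, Q, s2 = theta
    i = len(x) - 1
    c, f = _c_f(t, x, E1, E2, E12)
    S0 = E2[0] - 2.0 * E1[0] * a0 + a0 ** 2
    return 0.5 * (-np.log(P0) - S0 / P0 - i * np.log(Q) - c / Q - i * np.log(s2) - f / s2)


def m_step(t, x, E1, E2, E12):
    """Closed-form maximiser of Eq. (C-6)."""
    i = len(x) - 1
    c, f = _c_f(t, x, E1, E2, E12)
    return np.array([E1[0], E2[0] - E1[0] ** 2, c / i, f / i])


def hessian_D2(theta, t, x, E1, E2, E12):
    """Analytic Hessian of Eq. (D-2)."""
    a0, P0, Q, s2 = theta
    i = len(x) - 1
    c, f = _c_f(t, x, E1, E2, E12)
    S0 = E2[0] - 2.0 * E1[0] * a0 + a0 ** 2
    H = np.zeros((4, 4))
    H[0, 0] = -2.0 / P0
    H[0, 1] = H[1, 0] = 2.0 * (a0 - E1[0]) / P0 ** 2
    H[1, 1] = 1.0 / P0 ** 2 - 2.0 * S0 / P0 ** 3
    H[2, 2] = i / Q ** 2 - 2.0 * c / Q ** 3
    H[3, 3] = i / s2 ** 2 - 2.0 * f / s2 ** 3
    return 0.5 * H


def leading_minors(H):
    """Determinants of the upper-left k x k blocks, k = 1..n."""
    return [float(np.linalg.det(H[:k, :k])) for k in range(1, H.shape[0] + 1)]


if __name__ == "__main__":
    t, x = np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 3.0])   # made-up data
    E1 = np.array([1.0, 1.2, 1.5])                                 # made-up moments
    E2 = E1 ** 2 + np.array([0.1, 0.2, 0.3])
    E12 = np.concatenate([[0.0], E1[1:] * E1[:-1] + np.array([0.05, 0.1])])
    theta = m_step(t, x, E1, E2, E12)
    print(theta, leading_minors(hessian_D2(theta, t, x, E1, E2, E12)))
```

The minors alternate in sign, starting negative, as in the proof, and a finite-difference gradient of `q_function` vanishes at `theta`.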

Appendix E. Algorithm 4: EM algorithm for parameter estimation

1) Initialization: Initialize the parameters $\theta_i^{(0)}$.
2) E-step: Calculate the expectation quantities defined in Eq. (C-3) using Algorithm 3 and Eq. (C-4), with the state-space model of Eqs. (7) and (8) parameterized by $\theta_i^{(k)}$.
3) M-step: Maximize $\ell(\theta\mid\theta_i^{(k)})$ using Eq. (C-6) to obtain the updated parameter estimates by Theorem 2.
4) Test convergence: Test the convergence of the algorithm. If it has converged, stop; otherwise, set $k = k+1$, go to Step 2 and repeat.
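Putting the pieces together, Algorithm 4 can be sketched as below. The convergence test (largest relative change of $\theta$ below `tol`) and the iteration cap are assumptions, since the text does not fix a criterion. The forward pass uses Algorithm 1, although Algorithm 2 could be substituted. Function names and the synthetic data are hypothetical, and the initial vector mirrors $\theta_0$ from Section 4.2.

```python
import numpy as np


def kf_rts_moments(t, x, a0, P0, Q, s2):
    """E-step: Algorithm 1 forward pass, then Algorithm 3 backward pass.

    Returns E(lam_j), E(lam_j^2) and E(lam_j lam_{j-1}) given X_{0:i} (Lemma 3).
    """
    n, dt = len(x), np.diff(t)
    lam, P, Pp = np.empty(n), np.empty(n), np.empty(n)
    lam[0], P[0], Pp[0] = a0, P0, P0
    for i in range(1, n):
        Pp[i] = P[i - 1] + Q                                        # P_{i|i-1}
        gain = Pp[i] * dt[i - 1] / (dt[i - 1] ** 2 * Pp[i] + s2 * dt[i - 1])
        lam[i] = lam[i - 1] + gain * (x[i] - x[i - 1] - lam[i - 1] * dt[i - 1])
        P[i] = Pp[i] - gain * dt[i - 1] * Pp[i]                     # P_{i|i}
    ls, Ps, S, M = lam.copy(), P.copy(), np.zeros(n), np.zeros(n)
    for j in range(n - 2, -1, -1):                                  # smoothed mean/variance
        S[j] = P[j] / Pp[j + 1]
        ls[j] = lam[j] + S[j] * (ls[j + 1] - lam[j])
        Ps[j] = P[j] + S[j] ** 2 * (Ps[j + 1] - Pp[j + 1])
    M[n - 1] = (1.0 - gain * dt[-1]) * P[n - 2]                     # M_{i|i}
    for j in range(n - 2, 0, -1):                                   # lag-one covariances
        M[j] = P[j] * S[j - 1] + S[j] * (M[j + 1] - P[j]) * S[j - 1]
    E12 = np.zeros(n)
    E12[1:] = ls[1:] * ls[:-1] + M[1:]
    return ls, ls ** 2 + Ps, E12


def m_step(t, x, E1, E2, E12):
    """M-step: closed-form maximiser of Theorem 2, Eq. (C-6)."""
    i, dt, dx = len(x) - 1, np.diff(t), np.diff(x)
    Q = np.sum(E2[1:] - 2.0 * E12[1:] + E2[:-1]) / i
    s2 = np.sum((dx ** 2 - 2.0 * E1[:-1] * dx * dt + dt ** 2 * E2[:-1]) / dt) / i
    return np.array([E1[0], E2[0] - E1[0] ** 2, Q, s2])


def em_estimate(t, x, theta0, tol=1e-6, max_iter=200):
    """Algorithm 4 for theta = [a0, P0, Q, sigma^2]; returns (theta, iterations)."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    theta = np.asarray(theta0, float)
    for k in range(1, max_iter + 1):
        new = m_step(t, x, *kf_rts_moments(t, x, *theta))
        # Assumed stopping rule: largest relative change in theta below tol.
        rel = np.max(np.abs(new - theta) / np.maximum(np.abs(theta), 1e-300))
        theta = new
        if rel < tol:
            break
    return theta, k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(0.0, 180.5, 2.5)                                  # hypothetical CM times
    lam = 0.002 + np.cumsum(rng.normal(0.0, 1e-4, t.size))          # hidden drift path
    x = np.concatenate([[0.0], np.cumsum(lam[:-1] * 2.5 + rng.normal(0.0, 5e-3, t.size - 1))])
    theta, iters = em_estimate(t, x, [0.002, 0.001, 0.01, 0.01])
    print(theta, iters)
```

In the recursive setting of Section 3, `em_estimate` would be rerun on $X_{0:i}$ each time a new reading $x_i$ arrives, typically warm-started from the previous estimate.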

References

[1] M. Pecht, Prognostics and Health Management of Electronics, John Wiley, New Jersey, 2008.
[2] X.S. Si, W. Wang, C.H. Hu, D.H. Zhou, Remaining useful life estimation – a review on the statistical data driven approaches, Eur. J. Oper. Res. 213 (2011) 1–14.
[3] F. Camci, R.B. Chinnam, Health state estimation and prognostics in machining processes, IEEE Trans. Autom. Sci. Eng. 7 (2010) 581–597.
[4] M.J. Carr, W. Wang, Modeling failure modes for residual life prediction using stochastic filtering theory, IEEE Trans. Rel. 59 (2010) 346–355.
[5] M. Dong, D. He, Hidden semi-Markov model-based methodology for multi-sensor equipment health diagnosis and prognosis, Eur. J. Oper. Res. 178 (2007) 858–878.
[6] Y. Peng, M. Dong, A prognosis method using age-dependent hidden semi-Markov model for equipment health prediction, Mech. Syst. Signal Process. 25 (2011) 237–252.
[7] J. Sikorska, M. Hodkiewicz, L. Ma, Prognostic modelling options for remaining useful life estimation by industry, Mech. Syst. Signal Process. 25 (2011) 1803–1836.
[8] M.I. Mazhar, S. Kara, H. Kaebernick, Remaining life estimation of used components in consumer products: life cycle data analysis by Weibull and artificial neural networks, J. Oper. Manag. 25 (2007) 1184–1193.
[9] W. Wang, A two-stage prognosis model in condition based maintenance, Eur. J. Oper. Res. 182 (2007) 1177–1187.
[10] M.Y. You, L. Li, G. Meng, J. Ni, Statistically planned and individual improved predictive maintenance management for continuously monitored degrading systems, IEEE Trans. Rel. 59 (2010) 744–753.
[11] A.K.S. Jardine, D. Lin, D. Banjevic, A review on machinery diagnostics and prognostics implementing condition-based maintenance, Mech. Syst. Signal Process. 20 (2006) 1483–1510.
[12] M.-L.T. Lee, G.A. Whitmore, Threshold regression for survival analysis: modeling event times by a stochastic process reaching a boundary, Stat. Sci. 21 (2006) 501–513.

[13] L.A. Escobar, W.Q. Meeker, A review of accelerated test models, Stat. Sci. 21 (2006) 552–577.
[14] V.R. Joseph, I.T. Yu, Reliability improvement experiments with degradation data, IEEE Trans. Rel. 55 (2006) 149–157.
[15] C.J. Lu, W.Q. Meeker, Using degradation measures to estimate a time-to-failure distribution, Technometrics 35 (1993) 161–174.
[16] J.I. Park, S.J. Bae, Direct prediction methods on lifetime distribution of organic light-emitting diodes from accelerated degradation tests, IEEE Trans. Rel. 59 (2010) 74–90.
[17] M.D. Pandey, X.-X. Yuan, J.M. van Noortwijk, The influence of temporal uncertainty of deterioration on life-cycle management of structures, Struct. Infrastruct. Eng. 5 (2009) 145–156.
[18] S. Bloch-Mercier, A preventive maintenance policy with sequential checking procedure for a Markov deteriorating system, Eur. J. Oper. Res. 147 (2002) 548–576.
[19] N. Gebraeel, M.A. Lawley, R. Li, J.K. Ryan, Residual-life distributions from component degradation signals: a Bayesian approach, IIE Trans. 37 (2005) 543–557.
[20] M.C. Delia, P.O. Rafael, A maintenance model with failures and inspection following Markovian arrival processes and two repair modes, Eur. J. Oper. Res. 186 (2008) 694–707.
[21] H.T. Liao, E.A. Elsayed, Reliability inference for field conditions from accelerated degradation testing, Nav. Res. Log. 53 (2006) 576–587.
[22] C.T. Barker, M.J. Newby, Optimal non-periodic inspection for a multivariate degradation model, Reliab. Eng. Sys. Safe. 94 (2009) 33–43.
[23] S.T. Tseng, J. Tang, L.H. Ku, Determination of optimal burn-in parameters and residual life for highly reliable products, Nav. Res. Log. 50 (2003) 1–14.
[24] G.A. Whitmore, F. Schenkelberg, Modelling accelerated degradation data using Wiener diffusion with a time scale transformation, Lifetime Data Anal. 3 (1997) 27–45.
[25] C.Y. Peng, S.T. Tseng, Mis-specification analysis of linear degradation models, IEEE Trans. Rel. 58 (2009) 444–455.


[26] X. Wang, Wiener processes with random effects for degradation data, J. Multivariate Anal. 101 (2010) 340–351.
[27] A. Ray, S. Tangirala, Stochastic modeling of fatigue crack dynamics for on-line failure prognostics, IEEE Trans. Control Syst. Techn. 4 (1996) 443–450.
[28] S.T. Tseng, C.Y. Peng, Optimal burn-in policy by using an integrated Wiener process, IIE Trans. 36 (2004) 1161–1170.
[29] M. Crowder, J. Lawless, On a scheme for predictive maintenance, Eur. J. Oper. Res. 176 (2007) 1713–1722.
[30] J. Tang, T.S. Su, Estimating failure time distribution and its parameters based on intermediate data from a Wiener degradation model, Nav. Res. Log. 55 (2008) 265–276.
[31] Z.G. Xu, Y.D. Ji, D.H. Zhou, Real-time reliability prediction for a dynamic system based on the hidden degradation process identification, IEEE Trans. Rel. 57 (2008) 230–242.
[32] W. Wang, M. Carr, W. Xu, A.K.H. Kobbacy, A model for residual life prediction based on Brownian motion with an adaptive drift, Microelectron. Rel. 51 (2011) 285–293.
[33] A.H. Elwany, N.Z. Gebraeel, Real-time estimation of mean remaining life using sensor-based degradation models, J. Manuf. Sci. Eng. 131 (2009) 051005-1-9.
[34] A.H. Christer, W. Wang, A model of condition monitoring of a production plant, Int. J. Prod. Res. 30 (1992) 2199–2211.
[35] R. Li, J.K. Ryan, A Bayesian inventory model using real-time condition monitoring information, Prod. Oper. Manag. 20 (2011) 754–771.
[36] D.R. Cox, H.D. Miller, The Theory of Stochastic Processes, Methuen and Company, London, 1965.
[37] D.C. Montgomery, Introduction to Statistical Quality Control, 6th Edition, John Wiley, New York, 2008.
[38] A.C. Harvey, Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge, U.K., 1989.
[39] D.H. Zhou, P.M. Frank, Strong tracking filtering of nonlinear time-varying stochastic systems with colored noise: application to parameter estimation and empirical robustness analysis, Int. J. Control. 65 (1996) 295–307.
[40] G.A. Whitmore, Normal-Gamma mixture of inverse Gaussian distributions, Scand. J. Statist. 13 (1986) 211–220.
[41] O.O. Aalen, Effects of frailty in survival analysis, Stat. Methods Med. Res. 3 (1994) 227–243.
[42] C. Derman, G.J. Lieberman, S.M. Ross, On the use of replacements to extend system life, Oper. Res. 32 (1984) 616–627.
[43] S.M. Shechter, M.D. Bailey, A.J. Schaefer, Replacing nonidentical vital components to extend system life, Nav. Res. Log. 55 (2008) 700–703.
[44] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. B 39 (1977) 1–38.
[45] T. Kailath, A. Sayed, B. Hassibi, Linear Estimation, Prentice-Hall, Upper Saddle River, NJ, 2000.
[46] C.F.J. Wu, On the convergence properties of the EM algorithm, Ann. Statist. 11 (1983) 95–103.
[47] M. Dewar, K. Scerri, V. Kadirkamanathan, Data-driven spatio-temporal modeling using the integro-difference equation, IEEE Trans. Signal Process. 57 (2009) 83–91.
[48] S. Gibson, A. Wills, B. Ninness, Maximum-likelihood parameter estimation of bilinear systems, IEEE Trans. Automat. Contr. 50 (2005) 1581–1596.
[49] A. Wills, B. Ninness, S. Gibson, Maximum likelihood estimation of state space models from frequency domain data, IEEE Trans. Automat. Contr. 54 (2009) 19–33.
[50] T.B. Schon, A. Wills, B. Ninness, System identification of nonlinear state-space models, Automatica 47 (2011) 39–49.
[51] D.H. Zhou, P.M. Frank, Fault diagnostics and fault tolerant control, IEEE Trans. Aero. Electr. Sys. 34 (1998) 420–427.
[52] D.J. Jwo, S.H. Wang, Adaptive fuzzy strong tracking extended Kalman filtering for GPS navigation, IEEE Sens. J. 7 (2007) 778–789.
[53] O.J. Woodman, An introduction to inertial navigation, Technical report, University of Cambridge Computer Laboratory, 2007, <http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-696.html>.
[54] H.E. Rauch, F. Tung, C.T. Striebel, Maximum likelihood estimates of linear dynamic systems, AIAA J. 3 (1965) 1445–1450.
