Optimal Control and Estimation of Stochastic Systems with Costly Partial Information
by
Michael Jong Kim
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Industrial Engineering
University of Toronto
Copyright © 2012 by Michael Jong Kim
Abstract
Optimal Control and Estimation of Stochastic Systems with Costly Partial Information
Michael Jong Kim
Doctor of Philosophy
Graduate Department of Industrial Engineering
University of Toronto
2012
Stochastic control problems that arise in sequential decision making applications typically
assume that information used for decision-making is obtained according to a predeter-
mined sampling schedule. In many real applications however, there is a high sampling
cost associated with collecting such data. It is therefore of equal importance to determine
when information should be collected as it is to decide how this information should be
utilized for optimal decision-making. This type of joint optimization has been a long-
standing problem in the operations research literature, and very few results regarding
the structure of the optimal sampling and control policy have been published. In this
thesis, the joint optimization of sampling and control is studied in the context of mainte-
nance optimization. New theoretical results characterizing the structure of the optimal
policy are established, which have practical interpretation and give new insight into the
value of condition-based maintenance programs in life-cycle asset management. Applica-
tions in other areas such as healthcare decision-making and statistical process control are
discussed. Statistical parameter estimation results are also developed with illustrative
real-world numerical examples.
To Li
Acknowledgements
First and foremost, I thank my dearest family - Leo, little Sammy, Shaddy, Hobo, Fed
(the peck), Sammy, Li, Janice, Jona, Ma and Dad. This thesis was possible because of
you.
Of course, I thank my supervisor Viliam Makis, who taught me (among many impor-
tant lessons) the importance of research excellence, dedication and hard work. Pretty
much everything I know about the beautiful world of stochastic modeling and optimiza-
tion, I learned from you. Working with you has been a true privilege.
I would also like to give a special thanks to Roy Kwon and Daniel Frances for their
friendship and continued support throughout my years at U of T.
I thank my PhD committee Roy Kwon, Jeremy Quastel, Baris Balcioglu, Daniel
Frances, Haitao Liao and Kagan Kerman, for their guidance and advice, and Chi-Guhn
Lee and Timothy Chan for their great help in my final year of PhD. I must also thank
the amazing MIE graduate staff Brenda Fung, Donna Liu and Lorna Wong, who always
patiently answered my (many) questions. I also can’t thank NSERC enough for support-
ing my passion for research since my undergraduate studies. Your support has made all
the difference.
Finally, I would like to thank all the awesome friends I’ve made during my stay at U
of T. My QRM lab: Zhijian Yin, Bing Liu, Ming Yang, Rui (Eric) Jiang, Jing Yu, Zillur
Rahman, Jue Wang, Lawrence Yip, Jian Liu, Cathy Hancharek, Konstantin Shestopaloff,
Chen Lin, Akram Khaleghei GB and Farnoosh Naderkhani. And my UTORG crew:
Jonathan Li, Vahid Sarhangian, Kimia Ghobadi, Jenya Doudareva, Velibor Misic and
Hamid Ghaffari. You are the reason my stay here was always fun and full of laughs.
I thank you all.
Contents

1 Introduction
2 Optimal Control of Stochastic Systems
2.1 Model Formulation
2.2 Derivation of the Optimality Equation and Structural Properties
2.3 Optimality of Bayesian Control Chart
2.4 Computation of the Optimal Policy
2.5 Conclusions and Future Research
3 Optimal Sampling and Control of Stochastic Systems
3.1 Model Formulation
3.2 Structural Form of the Optimal Policy
3.3 Computation of the Optimal Policy
3.3.1 Constructing the Optimal Control Chart
3.3.2 Comparison with Other Policies
3.4 Conclusions and Future Research
4 Parameter Estimation for Stochastic Systems
4.1 Model Formulation
4.2 Parameter Estimation Using the EM Algorithm
4.2.1 Form of the Likelihood Function
4.2.2 Form of the Pseudo Likelihood
4.2.3 Maximization of the Pseudo Likelihood Function
4.3 A Practical Application
4.4 Conclusions and Future Research
Bibliography
Chapter 1
Introduction
The motivation behind this thesis comes from the following scenario. Consider a system
that begins in a brand new state, deteriorates over time due to use, and ultimately
fails. Over the system’s useful life, data is collected at discrete time points to get partial
information about its condition, since the true level of system deterioration is generally
unknown. When the system fails it is replaced by a new independent system of the
same type. Data is once again collected until the new system fails and is replaced by
yet another system, and the cycle continues. Suppose we have the ability to dynamically
control the replacement cycles in two ways. First, we can decide at what time points
data will be collected, and second, we can opt to replace a functional system before it
fails. If we impose a cost structure, (e.g. replacement costs, data collection costs, etc.)
it is natural to ask whether there exists an optimal control policy that minimizes some
useful cost objective such as the expected long-term cost rate over an infinite horizon.
More importantly, one may also want to know if such an optimal control policy possesses
any insightful structural properties that can be utilized at a managerial level, or to aid
further algorithmic developments.
In this thesis, we formulate and analyze the above problem statement under different
model assumptions. New theoretical results characterizing the structure of the optimal
control policies are established, which have practical interpretation and give new insight
into the value of condition-based maintenance programs in life-cycle asset management.
Statistical parameter estimation results are also developed with illustrative real-world
numerical examples. In particular, we consider a mining industry application where con-
dition monitoring data comes from the transmission oil samples of 240-ton heavy hauler
trucks used in the Athabasca oil sands of Alberta, Canada. During the operational life of
each transmission unit, oil samples are collected at discrete time points (approximately
every 600 hours) and spectrometric oil analysis is carried out, which provides the concen-
trations (in ppm) of iron and copper that come from the direct wear of the transmission
unit. This data gives partial information about the transmission’s condition, since the
true condition of the unit is unobservable. Using this data set, we illustrate the practical
benefits of both the control and estimation results of this thesis.
In Chapter 2, we consider the optimal control problem with periodic inspections. The
state process follows an unobservable continuous time homogeneous Markov process.
At equidistant sampling times vector-valued observations having multivariate normal
distribution with state-dependent mean and covariance matrix are obtained at a positive
cost. At each sampling epoch a decision is made either to run the system until the next
sampling epoch or to carry out full preventive maintenance, which is assumed to be less
costly than corrective maintenance carried out upon system failure. The objective is to
determine the optimal control policy that minimizes the long-run expected average cost
per unit time. We formulate the problem as an optimal stopping problem with partial
information. We show that the optimal preventive maintenance region is a convex subset
of Euclidean space. We also analyze the practical three-state version of this problem in
detail and show that in this case the optimal policy is a control limit policy. Based on
this structural result, an efficient computational algorithm is developed for the three-state
problem, illustrated by a real-world numerical example.
In Chapter 3, we consider the situation in which the decision maker can decide when
condition monitoring information should be collected, as well as when to initiate preven-
tive maintenance. The objective is to characterize the structural form of the optimal sam-
pling and maintenance policy that minimizes the long-run expected cost per unit time.
The problem is formulated as a partially observable Markov decision process (POMDP).
It is shown that monitoring the posterior probability that the system is in a so-called
warning state is sufficient for decision-making. We prove that the optimal control policy
can be represented as a control chart with three critical thresholds. Such a control chart
has direct practical value as it can be readily implemented for online decision-making.
Implications of the structural results, such as planning maintenance activities into the
future, are discussed, and cost comparisons with other suboptimal policies are developed
which illustrate the benefits of the joint optimization of sampling and control.
In Chapter 4, we present a parameter estimation procedure for a condition-based
maintenance model with partial information. Two types of data histories are available:
data histories that end with observable failure, and censored data histories that end
when the system has been suspended from operation but has not failed. The approach
taken in this chapter is to first pre-process the data histories and remove as much of the
autocorrelation as possible before proceeding to hidden Markov modeling. The idea is
to first decide on an initial approximation for the healthy portions of the data histories
and fit a time series model to the healthy data portions. The residuals using the fitted
model are then computed for both healthy and unhealthy portions of data histories, and
formal statistical tests for conditional independence and multivariate normality are per-
formed. The residuals are then chosen as the “observation” process in the hidden Markov
framework. The main advantage of this approach is that the conditional independence
and multivariate normality of the residuals are essential for tractable maintenance opti-
mization modeling, and, as a result, computational times for parameter estimation are
extremely fast. The model parameters are estimated using the EM algorithm. We show
that both the pseudo likelihood function and the parameter updates in each iteration
of the EM algorithm have explicit formulas. The estimation procedure is illustrated on
real-world data from the mining industry.
Bibliographical note. Chapter 2 contains results from Kim and Makis [39]. Chap-
ter 3 contains results from Kim and Makis [40]. Chapter 4 contains results from Kim et
al. [38], Kim et al. [41] and Kim et al. [42].
Chapter 2

Optimal Control of Stochastically Failing Systems with Periodic Inspections
Consider a deteriorating system that can be in one of N unobservable operational states
1, . . . , N, or in an observable failure state N + 1. The state process (Xt : t ∈ R+) follows
a continuous time homogeneous Markov chain with state space {1, . . . , N} ∪ {N + 1}.
At equidistant sampling times ∆, 2∆, . . ., vector data Y1, Y2, . . . ∈ Rd, are sampled at
a positive cost. We assume that (Yn) have multivariate normal distribution with state-
dependent mean and covariance matrix. The observations represent information obtained
through condition monitoring, such as engine oil data obtained from spectrometric analy-
sis or vibration data collected from rotating machinery. When the system fails, corrective
maintenance is performed, which is either a replacement or a maintenance action that
returns the system to a “good-as-new” condition, i.e. returns Xt to state 1. At each
sampling epoch, a decision is made either to run the system until the next sampling
epoch or to carry out full preventive maintenance. Preventive maintenance also returns
the system to a “good-as-new” condition. The objective is to determine the optimal
control policy that minimizes the long-run expected average cost per unit time.
A lot of recent theoretical research has been done on the analysis and control of
maintenance models. Neuts et al. [56] considered a failing system governed by phase
type distributions. The authors analyzed the stationary distribution of the state process
and considered two types of performance measures: availability and rate of occurrence
of failures (ROCOF). Makis et al. [52] considered a repair/replacement model for a
single unit system with random repair costs. Jiang et al. [33] studied a maintenance
model with general repair and two types of replacement actions: failure and preventive
replacement. The authors proved that a generalized repair-cost-limit policy is optimal for
the minimization of the long-run expected average cost per unit time. Li and Shaked [46]
analyzed an imperfect repair model also subject to preventive maintenance. The authors
compared a variety of different maintenance policies using a point-process approach.
Some recent and classical survey papers on maintenance optimization are [12], [72] and
[74]. In addition to the theoretical work done in this area, maintenance models have been
successfully applied in many real world applications including furnace erosion prediction
using the state-space model [14], transmission fault detection using the proportional
hazards model [50], and helicopter gearbox state assessment using the hidden Markov
model [8].
We show in this chapter that the optimal control policy for the three-state version
of our model is a control limit policy. This provides a formal justification for the recent
papers by Yin and Makis [81] and Kim et al. [38] who proposed Bayesian control charts
for maintenance decision making, but did not prove that such a control policy is optimal.
This also shows that the χ2 control chart recently proposed by Wu and Makis [78] is in
fact a suboptimal control policy. The model considered in this chapter can also be viewed
as a generalization of a recent model considered by Makis [49], who analyzed a two-state
version of our model in the context of quality control, but did not consider observable
failure information, a property that is present in maintenance applications.
The remainder of this chapter is organized as follows. In §2.1, we describe the model
and formulate the control problem as an optimal stopping problem with partial infor-
mation. In §2.2, we use the λ−minimization technique to transform the problem into
a stopping problem with an additive objective function, which is easier to analyze. We
derive the optimality equation and characterize the structural properties of the optimal
control policy. It is shown that the optimal preventive maintenance region is a convex
subset of Euclidean space. In §2.3, we treat the practical three-state version of our model
in detail and show that the optimal control policy is a control limit policy. Based on
this structural property, in §2.4, an efficient computational algorithm is developed for the
three-state problem, illustrated by a numerical example. Concluding remarks and future
research directions are provided in §2.5.
2.1 Model Formulation
Let (Ω,F , P ) be a complete probability space on which the following stochastic processes
are defined. The state process (Xt : t ∈ R+) is a continuous time homogeneous Markov
chain with N ∈ N unobservable operational states X = 1, . . . , N and an observable
failure state N + 1, so that the state space of the Markov chain is X = X ∪ N + 1.
The instantaneous transition rates
qij = limh→0+
P (Xh = j|X0 = i)
h< +∞, i 6= j ∈ X
qii = −∑j 6=i
qij,
and the state transition rate matrix Q = (qij)N+1×N+1. We assume that if i < j, state
i is not worse than state j, and state 1 denotes the state of a new system. To model
such monotonic system deterioration, we assume that the state process is non-decreasing
with probability 1, i.e. qij = 0 for all j < i. This implies that the failure state is
absorbing. We also assume that if i < j, then failure rates qi,N+1 ≤ qj,N+1. Upon system
Chapter 2. Optimal Control of Stochastic Systems 8
failure, corrective maintenance is carried out, which brings the system to a new state.
The observable time to system failure is denoted ξ := inf t ∈ R+ : Xt = N + 1.
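The generator assumptions above (zero rates below the diagonal, absorbing failure state, non-decreasing failure rates) can be checked mechanically. The following sketch uses a hypothetical instance with N = 2 operational states; all numerical rates are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Illustrative generator Q for N = 2 operational states plus failure state 3.
Q = np.array([
    [-0.3, 0.2, 0.1],   # state 1 (new): may degrade to state 2 or fail
    [0.0, -0.4, 0.4],   # state 2 (warning): may only fail
    [0.0, 0.0, 0.0],    # state 3: absorbing failure state
])

assert np.allclose(Q.sum(axis=1), 0.0)   # generator rows sum to zero
assert np.all(np.tril(Q, -1) == 0.0)     # q_ij = 0 for j < i (monotone deterioration)
assert Q[0, 2] <= Q[1, 2]                # non-decreasing failure rates q_{i,N+1}
```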
The system is monitored at equidistant sampling times ∆, 2∆, . . ., ∆ ∈ (0, +∞),
and the information obtained at time n∆ is denoted Yn ∈ R^d. While the system is in
operational state i ∈ X, we assume Yn | Xn∆ = i ∼ Nd(µi, Σi), and that the observations
(Yn : n ∈ N) are conditionally independent given the system state. The conditional
density of Yn given Xn∆ = i is

f(y|i) = (1 / √((2π)^d det(Σi))) exp( −(1/2)(y − µi)′ Σi^{−1} (y − µi) ), y ∈ R^d, i ∈ X. (2.1.1)

Let F = (Fn : n ∈ Z+) be the complete natural filtration generated by the observable
information at each sampling epoch,

Fn = σ(Y1, . . . , Yn, I{ξ > n∆}).
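The density (2.1.1) can be written out directly; the sketch below is a minimal NumPy implementation not tied to any particular state parameters, with the 1-D standard normal used only as a sanity check.

```python
import numpy as np

def f(y, mu, Sigma):
    """Multivariate normal density N_d(mu, Sigma) at y, as in (2.1.1)."""
    y, mu, Sigma = np.atleast_1d(y), np.atleast_1d(mu), np.atleast_2d(Sigma)
    d = len(mu)
    diff = y - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (y - mu)' Sigma^{-1} (y - mu)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(Sigma))

# Sanity check: the 1-D standard normal at its mean equals 1 / sqrt(2 pi).
val = f(0.0, 0.0, 1.0)
assert abs(val - 1.0 / np.sqrt(2.0 * np.pi)) < 1e-12
```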
After collecting a sample and processing the new information, a decision is made either
to run the system until the next sampling epoch or carry out full preventive maintenance,
which brings the system to a new state. We consider the following cost structure.
Ci = operational cost rate in state i ∈ X .
Cfi = corrective maintenance cost if failure occurs from operational state i ∈ X .
Cpi = preventive maintenance cost in operational state i ∈ X .
Cs = sampling cost incurred when obtaining each observation Yn.
We assume that preventive maintenance becomes more costly as the system deteriorates,
i.e. Cpi ≤ Cpj for i ≤ j. Furthermore, preventive maintenance is assumed to be less
costly than corrective maintenance, i.e. maxi∈X Cpi < mini∈X Cfi. This assumption
is a requirement to make the problem non-trivial. Indeed, if the cost due to system failure
is lower than the cost of preventive maintenance, then the optimal action is always to let
the system run until failure.
The objective is to determine the optimal control policy minimizing the long-run
expected average cost per unit time. The problem can be formulated as an optimal
stopping problem with partial information. From renewal theory, the long-run expected
average cost per unit time is calculated for any control policy as the expected cost incurred
in one cycle divided by the expected cycle length, where a cycle is completed when either
preventive or corrective maintenance is carried out, which brings the system to a new
state. For the average cost criterion, the control problem is formulated as follows. Find
an F−stopping time τ ∗, if it exists, minimizing the long-run expected average cost per
unit time given by
EΠ0(TCτ)
EΠ0(τ∆ ∧ ξ)
, (2.1.2)
where τ is an F−stopping time, TCτ is the total cost incurred over one complete cycle
of length τ∆∧ ξ, and EΠ0is the conditional expectation given Π0, the initial distribution
of X0. We assume that a new system is installed at the beginning of each cycle, i.e.
Π0 = [1, 0, . . . , 0]1×N+1
. Based on the cost structure given above,
TCn =∑i∈X
Ci
∫ n∆
0
IXs=ids+∑i∈X
CfiIξ≤n∆,Xξ−=i
+∑i∈X
CpiIXn∆=i + (n ∧ bξ/∆c)Cs, (2.1.3)
where TCn represents the total cost incurred if preventive maintenance is scheduled at
time n∆. The summands on the right-hand side of (2.1.3) represent the total operational
cost, corrective maintenance cost, preventive maintenance cost, and sampling cost, re-
spectively.
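To make the objective (2.1.2) and the cycle cost (2.1.3) concrete, one can estimate the long-run average cost of the simple policy "always perform preventive maintenance at the n-th sampling epoch" by simulating renewal cycles and applying renewal-reward. Every number below (rates, costs, n = 5) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

Q = np.array([[-0.3, 0.2, 0.1],
              [0.0, -0.4, 0.4],
              [0.0, 0.0, 0.0]])
C_op = [1.0, 3.0]      # operational cost rates C_i
C_f = [40.0, 60.0]     # corrective maintenance costs C_fi
C_p = [15.0, 25.0]     # preventive maintenance costs C_pi
C_s, delta = 0.5, 1.0  # sampling cost per observation, sampling interval

def simulate_cycle(n):
    """One cycle of TC_n from (2.1.3) with PM scheduled at time n * delta."""
    state, t, cost = 0, 0.0, 0.0
    horizon = n * delta
    while True:
        sojourn = rng.exponential(1.0 / -Q[state, state])
        if t + sojourn >= horizon:                 # PM epoch reached first
            cost += C_op[state] * (horizon - t) + C_p[state] + C_s * n
            return cost, horizon
        cost += C_op[state] * sojourn
        t += sojourn
        p = np.clip(Q[state], 0.0, None)
        nxt = rng.choice(3, p=p / p.sum())
        if nxt == 2:                               # failure from current state
            cost += C_f[state] + C_s * int(t / delta)
            return cost, t                          # cycle length tau*Delta ^ xi
        state = nxt

cycles = [simulate_cycle(5) for _ in range(5000)]
costs, lengths = map(np.array, zip(*cycles))
rate = costs.mean() / lengths.mean()   # renewal-reward: E[TC] / E[cycle length]
assert 0.0 < rate < 100.0
```

The ratio of means, rather than the mean of ratios, is what renewal theory prescribes for the long-run average cost per unit time.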
The optimal F−stopping time τ ∗ represents the first sampling epoch at which full
preventive maintenance should take place. It is important to realize that since we are
also considering mandatory corrective maintenance upon system failure, the optimal
control policy is identified with random variable τ ∗∆ ∧ ξ, which represents the optimal
time at which preventive or corrective maintenance should be carried out. Thus, without
loss of generality we may restrict the stopping problem (2.1.2) to the class of F−stopping
times τ ≤ ⌈ξ/∆⌉.
In the next section we derive the dynamic optimality equation, which will be analyzed
to characterize the structure of the optimal control policy.
2.2 Derivation of the Optimality Equation and Structural Properties of the Optimal Policy
In this section, we use the λ−minimization technique to transform the problem into a
stopping problem with an additive objective function, which is easier to analyze. We
derive the optimality equation and characterize the structural properties of the optimal
control policy. It is shown that the optimal preventive maintenance region is a convex
subset of Euclidean space.
We first apply the λ−minimization technique (see Aven and Bergman [3]) and transform
the stopping problem (2.1.2) to a parameterized stopping problem (with parameter λ)
with an additive objective function. Define for λ > 0 the value function

V^λ(Π0) = inf_τ EΠ0(Z^λ_τ), (2.2.1)

where the infimum is taken over all F−stopping times τ and

Z^λ_n = TCn − λ(n∆ ∧ ξ). (2.2.2)

Aven and Bergman [3] showed that λ∗ determined by the equation

λ∗ = inf{λ > 0 : V^λ(Π0) ≤ 0} (2.2.3)
is the optimal expected average cost for the stopping problem (2.1.2), and the F−stopping
time τ ∗ that minimizes the right-hand side of (2.2.1) for λ = λ∗ determines the optimal
stopping time. To simplify notation, we suppress the dependence on λ for the remainder
of the chapter. Since the process (Zn : n ∈ Z+) defined by (2.2.2) is not F−adapted, we
consider the following stopping problem

E Z̄τ∗ = inf_τ E Z̄τ, (2.2.4)

where Z̄n = E(Zn | Fn). For any F−stopping time τ, E Z̄τ = E(E(Zτ | Fτ)) = E Zτ,
so that (2.2.4) is equivalent to (2.2.1). Then, the observable F−adapted process
(Z̄n : n ∈ Z+) admits the following discrete-time smooth F−semimartingale
representation (see e.g. Jensen [31]),

Z̄n = Z̄0 + ∑_{k=1}^{n} Tk + Mn, (2.2.5)

where Tk = E(Z̄k − Z̄k−1 | Fk−1), and (Mn : n ∈ Z+) is an F−martingale with M0 = 0.
To evaluate Tk, we first note that the indicator random variables I{ξ ≤ n∆, Xξ− = i}
and I{Xn∆ = i} have the following representation [10],

I{ξ ≤ n∆, Xξ− = i} = ∫_0^{n∆} I{Xs = i} q_{i,N+1} ds + L^i_n,

I{Xn∆ = i} = I{X0 = i} + ∫_0^{n∆} ∑_{j∈X} I{Xs = j} q_{ji} ds + K^i_n, (2.2.6)

where the processes (L^i_n : n ∈ Z+) and (K^i_n : n ∈ Z+) are both (Gn)−martingales,
with Gn = σ(Y1, . . . , Yn, Xt : t ≤ n∆) ⊃ Fn. Then, using equations (2.1.3), (2.2.2) and
(2.2.6),
Tk = E( E(Zk | Fk) − E(Zk−1 | Fk−1) | Fk−1 )

= E(Zk − Zk−1 | Fk−1)

= ∑_{i∈X} ( Ci + Cfi q_{i,N+1} + ∑_{j∈X} Cpj qij − λ ) ∫_{(k−1)∆}^{k∆} E(I{Xs = i} | Fk−1) ds
  + ∑_{i∈X} Cfi E(L^i_k − L^i_{k−1} | Fk−1) + ∑_{i∈X} Cpi E(K^i_k − K^i_{k−1} | Fk−1)
  + E( Cs ∑_{m=1}^{k} I{ξ > m∆} − Cs ∑_{m=1}^{k−1} I{ξ > m∆} | Fk−1 )

=: ∫_{(k−1)∆}^{k∆} ∑_{i∈X} ri Πs(i) ds
  + ∑_{i∈X} Cfi [ E(E(L^i_k | Gk−1) | Fk−1) − E(L^i_{k−1} | Fk−1) ]
  + ∑_{i∈X} Cpi [ E(E(K^i_k | Gk−1) | Fk−1) − E(K^i_{k−1} | Fk−1) ]
  + Cs E(I{ξ > k∆} | Fk−1)

= ∫_{(k−1)∆}^{k∆} 〈r, Πs〉 ds + Cs (1 − Π−_{k∆}(N + 1)),
where

r = [r1, . . . , rN, 0], ri = Ci + Cfi q_{i,N+1} + ∑_{j∈X} Cpj qij − λ,
Πs = [Πs(1), . . . , Πs(N + 1)], Πs(i) = P(Xs = i | F⌊s/∆⌋), (2.2.7)

the inner product 〈v, w〉 = v w^T, and the left hand limit Π−_{k∆} = lim_{t↑k∆} Πt. Thus,
(2.2.5) simplifies to

Z̄n = Z̄0 + ∫_0^{n∆} 〈r, Πs〉 ds + ∑_{k=1}^{n} Cs (1 − Π−_{k∆}(N + 1)) + Mn, (2.2.8)

where Z̄0 = ∑_{i∈X} Cpi Π0(i). The vector Πt defined in (2.2.7) is the conditional
distribution of the system state Xt given F⌊t/∆⌋, the information at the previous sampling
epoch ⌊t/∆⌋. The evolution of the vector process (Πt : t ∈ R+) is described by the
following lemma.
Lemma 2.2.1. For t > 0, and given initial state distribution Π0, Πt can be obtained
iteratively as follows:

Πt = Πn∆ exp((t − n∆)Q), n∆ < t < (n + 1)∆,

Πn∆ = ( Π−_{n∆} diag(f_{Yn}) / 〈f_{Yn}, Π−_{n∆}〉 ) I{ξ > n∆} + e_{N+1} I{ξ ≤ n∆}, n ∈ N, (2.2.9)

where e_{N+1} = [0, . . . , 0, 1]_{1×(N+1)}, f_y = [f(y|1), . . . , f(y|N), 0], y ∈ R^d, and
diag(f_y) is the (N + 1) × (N + 1) matrix with f_y along its main diagonal and zeros
elsewhere.
Proof. The first equation in (2.2.9) follows since

dΠt/dt = lim_{h→0+} (Πt+h − Πt)/h = lim_{h→0+} E(It+h − It | F⌊t/∆⌋)/h = Πt Q,

where It = [I{Xt = 1}, . . . , I{Xt = N + 1}], so that Πt = Π⌊t/∆⌋∆ exp((t − ⌊t/∆⌋∆)Q).
The second equality in the above equation follows since n∆ < t < (n + 1)∆, which implies
that for h > 0 sufficiently small, F⌊(t+h)/∆⌋ = F⌊t/∆⌋. The second equation in (2.2.9)
follows since for any n ∈ N, given ξ > n∆ and Yn = y ∈ R^d, Bayes' Theorem implies

Πn∆(i) = f(y|i) Π−_{n∆}(i) / ∑_{j∈X} f(y|j) Π−_{n∆}(j) for i ∈ X, and Πn∆(N + 1) = 0,

and given ξ ≤ n∆,

Πn∆(i) = 0 for i ∈ X, and Πn∆(N + 1) = 1.

Combining the above two equations in vector form gives

Πn∆ = ( Π−_{n∆} diag(f_{Yn}) / 〈f_{Yn}, Π−_{n∆}〉 ) I{ξ > n∆} + e_{N+1} I{ξ ≤ n∆},

which completes the proof.
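A minimal sketch of the recursion (2.2.9): predict the belief with the matrix exponential, then condition on a new observation by Bayes' rule. The generator and the likelihood values f(y|i) below are illustrative assumptions standing in for an actual Gaussian observation.

```python
import numpy as np

def expm(A, terms=30):
    """Matrix exponential by truncated Taylor series (adequate for small ||A||)."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def predict(pi, Q, dt):
    """Pi_t = Pi_{n Delta} exp((t - n Delta) Q), first equation of (2.2.9)."""
    return pi @ expm(Q * dt)

def bayes_update(pi_minus, f_y):
    """Second equation of (2.2.9) on {xi > n Delta}; f_y = [f(y|1), ..., f(y|N), 0]."""
    post = pi_minus * f_y
    return post / post.sum()

Q = np.array([[-0.3, 0.2, 0.1],
              [0.0, -0.4, 0.4],
              [0.0, 0.0, 0.0]])
pi0 = np.array([1.0, 0.0, 0.0])          # new system: Pi_0 = [1, 0, ..., 0]

pi_minus = predict(pi0, Q, 1.0)          # belief just before the sampling epoch
assert abs(pi_minus.sum() - 1.0) < 1e-12

f_y = np.array([0.05, 0.25, 0.0])        # hypothetical density values at observed y
pi1 = bayes_update(pi_minus, f_y)
assert abs(pi1.sum() - 1.0) < 1e-12 and pi1[2] == 0.0   # conditioned on survival
```

Because the failure entry of f_y is zero, the update automatically places no mass on the failure state, exactly as the indicator I{ξ > n∆} in (2.2.9) requires.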
Lemma 2.2.1 implies that the Markov process (Πt) defined above has piecewise-deterministic
trajectories. Such a process is known as a piecewise-deterministic Markov process [16].
By representation (2.2.8), the stopping problem (2.2.4) can now be explicitly formulated as

V̄(Π0) = EΠ0 Z̄τ∗ = Z̄0 + inf_τ EΠ0 ( ∑_{n=1}^{τ} [ ∫_{(n−1)∆}^{n∆} 〈r, Πs〉 ds + Cs (1 − Π−_{n∆}(N + 1)) ] ) =: Z̄0 + V(Π0), (2.2.10)
where the second equality follows by the optional sampling theorem, since EMτ = EM0 = 0
for any F−stopping time τ. Then, for any probability measure Π defined on the state
space, the function V(Π) satisfies the following dynamic optimality equation

V(Π) = min{ 0, inf_{τ≥1} EΠ ( ∑_{n=1}^{τ} [ ∫_{(n−1)∆}^{n∆} 〈r, Πs〉 ds + Cs (1 − Π−_{n∆}(N + 1)) ] ) }

     = min{ 0, ∫_0^∆ 〈r, Πs〉 ds + Cs (1 − Π−_∆(N + 1)) + ∫_{R^d} V( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy }, (2.2.11)
where the first equality in (2.2.11) follows by partitioning the class of F−stopping times
into two classes: the class of stopping times τ = 0 and the class of stopping times τ ≥ 1.
The second equality in (2.2.11) follows by Lemma 2.2.1 and the strong Markov property
of (Πt : t ∈ R+). We now analyze the structural properties of (2.2.11).
Since failure is observable, and upon system failure (when Π = [0, . . . , 0, 1]_{1×(N+1)})
corrective maintenance is mandatory, we need only analyze the function V(Π) over the
space of probability measures

P = { Π ∈ [0, 1]^{N+1} : ∑_{i∈X} Π(i) = 1, Π(N + 1) = 0 } (2.2.12)

in which the system is known to be operational. For any g : P → R, define the operator

T(g)(Π) = min{ 0, ∫_0^∆ 〈r, Πs〉 ds + Cs (1 − Π−_∆(N + 1)) + ∫_{R^d} g( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy }. (2.2.13)
Then for g1, g2 : P → R and Π ∈ P,

|T(g1)(Π) − T(g2)(Π)|
≤ | ∫_{R^d} g1( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy − ∫_{R^d} g2( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy |
≤ ∫_{R^d} | g1( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) − g2( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) | 〈f(y), Π−_∆〉 dy
≤ ‖g1 − g2‖ ∫_{R^d} 〈f(y), Π−_∆〉 dy
≤ ‖g1 − g2‖ max_{i∈X} P(ξ > ∆ | X0 = i) =: ‖g1 − g2‖ β,

so that ‖T(g1) − T(g2)‖ ≤ β ‖g1 − g2‖ for some 0 < β < 1. Thus, the operator T defined
in (2.2.13) is a contraction operator.
Bertsekas and Shreve [7] (p. 55, Proposition 4.2) showed that the contraction property
of T defined in (2.2.13) implies that the function V(Π) is the unique solution of the
optimality equation (2.2.11), and can be obtained as the limit

V(Π) = lim_{n→+∞} T^n(0)(Π) = lim_{n→+∞} V^{n+1}(Π), (2.2.14)

where V^{n+1}(Π) is the value function for the (n + 1)−stage stopping problem, satisfying
the dynamic equation

V^{n+1}(Π) = min{ 0, ∫_0^∆ 〈r, Πs〉 ds + Cs (1 − Π−_∆(N + 1)) + ∫_{R^d} V^n( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy }, V^0(Π) = 0. (2.2.15)
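The value iteration (2.2.14)–(2.2.15) can be carried out numerically once the belief space is discretized. The sketch below assumes a hypothetical three-state instance (N = 2), so every Π ∈ P has the form [1 − p, p, 0] and a grid over p suffices; the observation is one-dimensional Gaussian, the integrals are approximated on grids, and all rates, costs, and the trial value of λ are illustrative assumptions.

```python
import numpy as np

Q = np.array([[-0.3, 0.2, 0.1],
              [0.0, -0.4, 0.4],
              [0.0, 0.0, 0.0]])
C_op, C_f, C_p = np.array([1.0, 3.0]), np.array([40.0, 60.0]), np.array([15.0, 25.0])
C_s, delta, lam = 0.5, 1.0, 12.0
mu, sd = np.array([0.0, 2.0]), 1.0          # y | state i ~ N(mu_i, sd^2)

def expm(A, terms=30):
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def trap(vals, grid):
    """Trapezoid rule (avoids the NumPy trapz/trapezoid naming split)."""
    vals = np.asarray(vals, dtype=float)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid)))

# r_i = C_i + C_fi q_{i,N+1} + sum_j C_pj q_ij - lambda, last entry 0 (cf. (2.2.7)).
r = np.array([C_op[i] + C_f[i] * Q[i, 2] + C_p @ Q[i, :2] - lam for i in range(2)] + [0.0])

s_grid = np.linspace(0.0, delta, 21)
Ps_r = np.array([expm(Q * s) @ r for s in s_grid])   # exp(sQ) r on the time grid
P_d = expm(Q * delta)

y_grid = np.linspace(-6.0, 8.0, 281)
dens = lambda y, m: np.exp(-0.5 * ((y - m) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
f1, f2 = dens(y_grid, mu[0]), dens(y_grid, mu[1])

p_grid = np.linspace(0.0, 1.0, 101)

def T(V):
    """One application of (2.2.13): value 0 for stopping vs. running one period."""
    TV = np.empty_like(V)
    for k, p in enumerate(p_grid):
        pi = np.array([1.0 - p, p, 0.0])
        cont = trap(Ps_r @ pi, s_grid)          # integral of <r, Pi_s> over [0, Delta]
        pim = pi @ P_d                           # Pi^-_Delta
        cont += C_s * (1.0 - pim[2])             # expected sampling cost
        a, b = pim[0] * f1, pim[1] * f2          # joint density of y and survival
        tot = a + b                              # <f(y), Pi^-_Delta>
        p_next = b / np.maximum(tot, 1e-300)     # posterior warning probability
        cont += trap(np.interp(p_next, p_grid, V) * tot, y_grid)
        TV[k] = min(0.0, cont)
    return TV

V = np.zeros_like(p_grid)
for _ in range(500):
    V_new = T(V)
    done = np.max(np.abs(V_new - V)) < 1e-9
    V = V_new
    if done:
        break

assert np.all(V <= 0.0)   # V = min(0, .) by construction
assert V[0] < 0.0         # from a new system, running on is strictly better
assert V[-1] == 0.0       # in the pure warning state it is optimal to stop
```

In this run the region where V vanishes is an interval of high warning probabilities, consistent with the control-limit structure established for the three-state model in §2.3.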
Using (2.2.14) and (2.2.15), we have the following result.
Lemma 2.2.2. The function V : P → R is concave.
Proof. We use mathematical induction. By equation (2.2.9), the terms ∫_0^∆ 〈r, Πs〉 ds
and Cs (1 − Π−_∆(N + 1)) in (2.2.15) are linear in Π, and hence concave. Since the
operator 'min' preserves concavity, for the base case n = 1, V^1(Π) is concave. Assume
now that for some n ∈ N, V^n(Π) is concave. We need only show that the last term on
the right-hand side of equation (2.2.15) is concave. For any constant α ∈ [0, 1] and
probability measures Π, Γ ∈ P, put

θ = 〈f(y), α Π−_∆〉 / 〈f(y), α Π−_∆ + (1 − α) Γ−_∆〉 ∈ [0, 1].

Then,

∫_{R^d} V^n( (α Π−_∆ + (1 − α) Γ−_∆) diag(f(y)) / 〈f(y), α Π−_∆ + (1 − α) Γ−_∆〉 ) 〈f(y), α Π−_∆ + (1 − α) Γ−_∆〉 dy

= ∫_{R^d} V^n( θ · Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 + (1 − θ) · Γ−_∆ diag(f(y)) / 〈f(y), Γ−_∆〉 ) 〈f(y), α Π−_∆ + (1 − α) Γ−_∆〉 dy

≥ ∫_{R^d} [ θ V^n( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) + (1 − θ) V^n( Γ−_∆ diag(f(y)) / 〈f(y), Γ−_∆〉 ) ] 〈f(y), α Π−_∆ + (1 − α) Γ−_∆〉 dy

= α ∫_{R^d} V^n( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy + (1 − α) ∫_{R^d} V^n( Γ−_∆ diag(f(y)) / 〈f(y), Γ−_∆〉 ) 〈f(y), Γ−_∆〉 dy,

where the inequality follows since V^n is concave by the induction hypothesis. Thus,
V^{n+1}(Π) is concave, and by (2.2.14) it follows that the limit V(Π) = lim_{n→+∞} V^{n+1}(Π)
is also concave, which completes the proof.
Lemma 2.2.2 implies that the optimal preventive replacement region defined by

R = { Π ∈ P : V(Π) ≥ 0 } (2.2.16)

is a convex subset of P, and the optimal control policy is determined by the following
procedure:
Theorem 2.2.3. For λ = λ∗, at sampling epoch n∆,
1. If Πn∆ ∈ R, full preventive maintenance is carried out. Otherwise, run the system
until the next sampling epoch (n+ 1)∆.
2. Corrective maintenance is carried out immediately upon system failure.
We now present an iterative algorithm, based on the λ−minimization technique and the
contraction property, for the computation of the optimal expected average cost and the
optimal control policy. Recall that, by equation (2.2.10), since Π0 = [1, 0, . . . , 0]_{1×(N+1)},
the original value function V^λ(Π0) defined in (2.2.1) is related to the function V of
(2.2.10) (written W^λ below to make the dependence on λ explicit) via the equation
V^λ(Π0) = W^λ(Π0) + Cp1.

The Algorithm

Step 1. Choose ε > 0 and initial lower and upper bounds λℓ ≤ λ ≤ λu.

Step 2. Put λ = (λℓ + λu)/2, W^λ_0 ≡ 0, and n = 1.

Step 3. Calculate W^λ_n = T(W^λ_{n−1}) using equations (2.2.13) and (2.2.15). Stop the
iteration when ‖W^λ_n − W^λ_{n−1}‖ ≤ ε; put W^λ = W^λ_n and
V^λ(Π0) = W^λ(Π0) + Cp1.

Step 4. If V^λ(Π0) < −ε, put λu = λ and go to Step 2.
If V^λ(Π0) > ε, put λℓ = λ and go to Step 2.
If |V^λ(Π0)| ≤ ε, put λ∗ε = λ and stop; λ∗ε approximates the optimal average cost.
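Steps 1–4 amount to bisection on λ, using the fact (Proposition 2.2.4 below) that the map from λ to the value function at Π0 is continuous and non-increasing, with root λ∗. A generic sketch follows, with the inner value iteration abstracted as a callable; the closed-form stand-in with root at 10 is purely an illustrative assumption.

```python
def bisect_lambda(V_of_lambda, lam_lo, lam_hi, eps=1e-6, max_iter=200):
    """Steps 1-4: bisection on lambda driven by the sign of V^lambda(Pi_0),
    which is non-increasing in lambda and vanishes at the optimal cost rate."""
    lam = 0.5 * (lam_lo + lam_hi)
    for _ in range(max_iter):
        lam = 0.5 * (lam_lo + lam_hi)
        v = V_of_lambda(lam)
        if v < -eps:
            lam_hi = lam          # lambda is above the optimal rate
        elif v > eps:
            lam_lo = lam          # lambda is below the optimal rate
        else:
            break                 # |V^lambda(Pi_0)| <= eps: accept lambda
    return lam

# Closed-form stand-in for the inner value iteration: root at lambda* = 10.
lam_star = bisect_lambda(lambda lam: 10.0 - lam, 0.0, 100.0)
assert abs(lam_star - 10.0) < 1e-3
```

In practice V_of_lambda would run the contraction iteration of Step 3 to convergence for the given λ, which is why keeping the inner tolerance ε consistent across both loops matters.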
Proposition 2.2.4. For any δ > 0, we can always choose ε > 0 sufficiently small such
that λ∗ε obtained from the algorithm above approximates the optimal average cost rate λ∗,
i.e., |λ∗ − λ∗ε | ≤ δ.
Proof. In the general theory of the λ−minimization technique (Proposition A.2), Aven
and Bergman [3] proved that for any λ > 0, if λ > λ∗ then the value function satisfies
V^λ(Π0) < 0, where λ∗ is the optimal average cost rate; similarly, if λ < λ∗ then
V^λ(Π0) > 0, and if λ = λ∗ then V^λ(Π0) = 0. Furthermore, the authors proved that the
mapping λ ↦ V^λ(Π0) is non-increasing and concave. By Proposition 2.9, p. 29, of
Avriel et al. [4], it follows that the mapping λ ↦ V^λ(Π0) is continuous. Therefore, for
any δ > 0, we can always choose ε > 0 sufficiently small such that λ∗ε obtained from the
algorithm above satisfies |λ∗ − λ∗ε | ≤ δ.
In the algorithm above, since λ > 0, a natural choice for the initial lower bound λℓ is 0.
However, it is not clear how one should choose the initial upper bound λu. The following
result provides a feasible choice of the initial upper bound.
Proposition 2.2.5. The optimal average cost is bounded by
$$0 < \lambda^* \le \frac{q_{N,N+1}\left(\Delta \max C_i + \max C_{fi} + \max C_{pi} + C_s\right)}{1 - e^{-q_{N,N+1}\Delta}}.$$
Thus, in the algorithm given above,
$$\underline{\lambda} = 0 \quad\text{and}\quad \overline{\lambda} = \frac{q_{N,N+1}\left(\Delta \max C_i + \max C_{fi} + \max C_{pi} + C_s\right)}{1 - e^{-q_{N,N+1}\Delta}}$$
are feasible initial values for the lower and upper bounds, respectively.
Proof. Consider the policy that initiates preventive maintenance at time ∆, which we identify with the stopping time $\tau_1 \equiv 1$. From the renewal-reward theorem (see e.g. Grimmett and Stirzaker [26], p. 431) and equation (2.1.3), the long-run expected average cost per unit time for this policy, which we denote $\lambda_1$, has the upper bound
$$\lambda_1 = \frac{E_{\Pi_0}(TC_{\tau_1})}{E_{\Pi_0}(\tau_1\Delta \wedge \xi)} \le \frac{\Delta \max C_i + \max C_{fi} + \max C_{pi} + C_s}{\int_0^\Delta e^{-q_{N,N+1}s}\,ds} = \frac{q_{N,N+1}\left(\Delta \max C_i + \max C_{fi} + \max C_{pi} + C_s\right)}{1 - e^{-q_{N,N+1}\Delta}},$$
where the inequality follows from the non-decreasing failure rate assumption $q_{i,N+1} \le q_{j,N+1}$, $i < j$. Thus, it follows that
$$0 < \lambda^* = \inf_\tau \frac{E_{\Pi_0}(TC_\tau)}{E_{\Pi_0}(\tau\Delta\wedge\xi)} \le \frac{E_{\Pi_0}(TC_{\tau_1})}{E_{\Pi_0}(\tau_1\Delta\wedge\xi)} =: \lambda_1 \le \frac{q_{N,N+1}\left(\Delta \max C_i + \max C_{fi} + \max C_{pi} + C_s\right)}{1 - e^{-q_{N,N+1}\Delta}},$$
which completes the proof.
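Since the bound in Proposition 2.2.5 is available in closed form, the initial upper bound $\overline{\lambda}$ is cheap to evaluate. A minimal sketch; the parameter values below are hypothetical and serve only to illustrate the computation:

```python
import math

def initial_upper_bound(q, delta, C_i, C_fi, C_pi, C_s):
    """Feasible initial upper bound from Proposition 2.2.5:
    q_{N,N+1}(Delta*max C_i + max C_fi + max C_pi + C_s) / (1 - e^{-q_{N,N+1} Delta})."""
    num = q * (delta * max(C_i) + max(C_fi) + max(C_pi) + C_s)
    return num / (1.0 - math.exp(-q * delta))

# hypothetical parameter values, for illustration only
lam_bar = initial_upper_bound(q=0.3548, delta=1.0,
                              C_i=[0.0, 0.0], C_fi=[6780.0, 6780.0],
                              C_pi=[450.0, 1560.0], C_s=10.0)
```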
In the next section, we analyze the three-state version of our model in detail.
2.3 Optimality of Bayesian Control Chart
In this section we analyze the three-state version of the problem in detail and show that
the optimal control policy is a control limit policy. For practical purposes, it is usually
sufficient to consider two working states: a good state and a warning state. The state
process $(X_t : t \in \mathbb{R}_+)$ has state space $\mathcal{X} = \{1, 2\} \cup \{3\}$, where state 1 represents an unobservable good state, state 2 represents an unobservable warning state, and state 3 is the observable failure state. In this case, the generator of the Markov chain takes the form
$$Q = \begin{bmatrix} -(q_{12}+q_{13}) & q_{12} & q_{13} \\ 0 & -q_{23} & q_{23} \\ 0 & 0 & 0 \end{bmatrix}, \qquad (2.3.1)$$
where $q_{12}, q_{13}, q_{23} \in (0, +\infty)$. Using the Kolmogorov backward differential equations we explicitly solve for the transition probability matrix
$$P(t) = [p_{ij}(t)] = \begin{bmatrix} e^{-\upsilon_1 t} & \dfrac{q_{12}\left(e^{-\upsilon_2 t} - e^{-\upsilon_1 t}\right)}{\upsilon_1 - \upsilon_2} & 1 - e^{-\upsilon_1 t} - \dfrac{q_{12}\left(e^{-\upsilon_2 t} - e^{-\upsilon_1 t}\right)}{\upsilon_1 - \upsilon_2} \\ 0 & e^{-\upsilon_2 t} & 1 - e^{-\upsilon_2 t} \\ 0 & 0 & 1 \end{bmatrix}, \qquad (2.3.2)$$
where the transition probabilities are $p_{ij}(t) = P(X_t = j \mid X_0 = i)$, $i, j \in \mathcal{X}$, and the constants are $\upsilon_1 = q_{12} + q_{13}$, $\upsilon_2 = q_{23}$.
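The closed form (2.3.2) can be checked numerically: each row of $P(t)$ must sum to one, and the Chapman-Kolmogorov (semigroup) property $P(s)P(t) = P(s+t)$ must hold. A small sketch with hypothetical rate values:

```python
import math

def transition_matrix(t, q12, q13, q23):
    """Closed-form P(t) of (2.3.2) for the three-state generator (2.3.1);
    assumes v1 = q12 + q13 differs from v2 = q23."""
    v1, v2 = q12 + q13, q23
    p11 = math.exp(-v1 * t)
    p12 = q12 * (math.exp(-v2 * t) - math.exp(-v1 * t)) / (v1 - v2)
    p22 = math.exp(-v2 * t)
    return [[p11, p12, 1.0 - p11 - p12],
            [0.0, p22, 1.0 - p22],
            [0.0, 0.0, 1.0]]

def matmul3(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# hypothetical rates; check rows sum to 1 and P(0.4)P(0.7) = P(1.1)
q12, q13, q23 = 0.0303, 0.0001, 0.3548
Ps, Pt, Pst = (transition_matrix(u, q12, q13, q23) for u in (0.4, 0.7, 1.1))
PsPt = matmul3(Ps, Pt)
```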
We now prove an intuitive result, which will be important in showing that a control
limit policy is optimal for the three-state model. The proof makes use of a classical
age-based policy result of Barlow and Hunter [6]. Any control policy determined by a
stopping time τ that is equal to a deterministic constant n is known as an age-based
policy.
Theorem 2.3.1. Under the model assumptions made in Section 2.1, the control policy
that never carries out preventive maintenance, i.e. $\tau = \infty$, is not optimal.
Proof. Consider the age-based policy that carries out preventive maintenance after $n$ periods. From the renewal-reward theorem (see e.g. Grimmett and Stirzaker [26], p. 431), the long-run expected average cost per unit time for this policy is given by
$$g(n) = \frac{\sum_{i\in\mathcal{X}} \left(C_{fi} q_{i3} + C_i\right) \int_0^{n\Delta} p_{1i}(s)\,ds + C_s E\left[n \wedge \lfloor \xi/\Delta\rfloor\right] + \sum_{i\in\mathcal{X}} C_{pi}\, p_{1i}(n\Delta)}{E\left[n\Delta \wedge \xi\right]}. \qquad (2.3.3)$$
Thus, to prove the claim, it suffices to show that
$$\arg\min_n g(n) < +\infty. \qquad (2.3.4)$$
To show (2.3.4), we derive an upper bound on $\arg\min_n g(n)$ by considering a special case of cost parameters $C_{fi}, C_{pi}, C_i, C_s$ for which preventive maintenance must be carried out at a later time. In particular, we choose corrective maintenance costs $C_{fi} = \min C_{fi} =: C_f$ all equal to the cheapest corrective maintenance cost, and preventive maintenance costs $C_{pi} = \max C_{pi} =: C_p$ all equal to the most expensive preventive maintenance cost. We also impose no penalty for operating the system longer, i.e. $C_i = 0$ and $C_s = 0$. Then, if preventive maintenance is scheduled after $n$ periods, the expected average cost under these cost parameters is given by
$$h(n) = \frac{C_f F(n\Delta) + C_p \bar{F}(n\Delta)}{\int_0^{n\Delta} \bar{F}(s)\,ds}, \qquad (2.3.5)$$
where $F(t) = p_{13}(t)$ is the distribution function of $\xi$ and $\bar{F}(t) = 1 - F(t)$. Since the terms $\int_0^{n\Delta} p_{1i}(s)\,ds$ and $E\left[n \wedge \lfloor\xi/\Delta\rfloor\right]$ in the numerator of (2.3.3) are increasing in $n$, the term $p_{11}(n\Delta) = e^{-\upsilon_1 n\Delta}$ is decreasing in $n$, and we have assumed in Section 2.1 that $C_{p1} \le C_{p2}$, this choice of cost parameters together with equations (2.3.3) and (2.3.5) implies
$$\arg\min_n g(n) \le \arg\min_n h(n).$$
We now appeal to a classical age-based policy result of Barlow and Hunter [6] to show $\arg\min_n h(n) < +\infty$. Since we have assumed $q_{13} < q_{23}$, the failure rate of $\xi$ is increasing. We have also assumed that $C_p < C_f$. Barlow and Hunter [6] showed that under these hypotheses there exists a positive real value $t^* < +\infty$ such that $t^*$ is the unique minimizer of $h(t)$. For our problem, $\arg\min_n h(n)$ is required to be integer-valued; however, since $t^*$ is a unique minimizer, the function $h(t)$ is increasing for $t > t^*$. Thus, it follows that $\arg\min_n h(n) \le \lceil t^*\rceil < +\infty$, which completes the proof.
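The finiteness argument above can be seen numerically: with $C_p < C_f$ and an increasing failure rate, the age-based cost $h(n)$ of (2.3.5) attains its minimum at a finite $n$. The sketch below uses hypothetical cost and rate parameters chosen so that the Barlow-Hunter condition for a finite minimizer holds; it is an illustration, not the thesis computation.

```python
import math

def survival(t, q12, q13, q23):
    """Fbar(t) = 1 - p13(t) = p11(t) + p12(t), from the closed form (2.3.2)."""
    v1, v2 = q12 + q13, q23
    return math.exp(-v1 * t) + q12 * (math.exp(-v2 * t) - math.exp(-v1 * t)) / (v1 - v2)

def h(n, delta, Cf, Cp, q12, q13, q23, steps=1000):
    """Age-based average cost rate (2.3.5): replace preventively after n periods.
    The denominator integral is evaluated by the composite trapezoid rule."""
    T = n * delta
    step = T / steps
    vals = [survival(i * step, q12, q13, q23) for i in range(steps + 1)]
    denom = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    Fbar = vals[-1]
    return (Cf * (1.0 - Fbar) + Cp * Fbar) / denom

# hypothetical parameters with Cp < Cf and increasing failure rate (q13 < q23)
q12, q13, q23, delta, Cf, Cp = 0.1, 0.001, 0.5, 1.0, 10000.0, 500.0
costs = {n: h(n, delta, Cf, Cp, q12, q13, q23) for n in range(1, 101)}
n_star = min(costs, key=costs.get)
```

For these values $h(1) \approx 710$, $h(2) \approx 600$, and $h(3) \approx 617$, so the minimizer is the finite interior point $n^* = 2$, consistent with (2.3.4).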
We are now ready to state and prove the main result of this section.
Theorem 2.3.2. The optimal control policy for the three-state model is a control limit policy. In particular, there exists a control limit $\Pi \in (0, 1]$ such that the optimal control policy is determined by the following procedure. At sampling epoch $n\Delta$:

1. If $\Pi_{n\Delta}(2) \ge \Pi$, full preventive maintenance is carried out. Otherwise, the system is run until the next sampling epoch $(n+1)\Delta$.
2. Corrective maintenance is carried out immediately upon system failure.
Proof. For the three-state version of the problem, the space $\mathcal{P}$ defined in (2.2.12) takes the form
$$\mathcal{P} = \left\{[1-\pi,\ \pi,\ 0] : \pi \in [0,1]\right\}, \qquad (2.3.6)$$
which is the line segment in $\mathbb{R}^3$ connecting the points $e_1 = [1, 0, 0]$ and $e_2 = [0, 1, 0]$. By Lemma 2.2.2, the optimal control region $R$ defined in (2.2.16) is a convex subset of $\mathcal{P}$. Thus, to prove that a control limit policy is optimal it suffices to show that $V(e_2) = 0$. We note that the control limit must be strictly greater than 0, i.e. $\Pi > 0$: if $\Pi = 0$, then $R = \mathcal{P}$, which implies that the policy immediately initiates preventive maintenance at the beginning of each cycle; since preventive maintenance times are assumed to be zero, the long-run average cost rate of this policy would be infinite. Therefore, if we can show that $V(e_2) = 0$, necessarily the control limit $\Pi \in (0, 1]$. To prove that $V(e_2) = 0$, we use mathematical induction. For $n = 1$, using equation (2.2.15),
$$V_1(e_2) = \min\left\{0,\ r_2\int_0^\Delta p_{22}(s)\,ds + \left(C_s + V_0(e_2)\right)p_{22}(\Delta)\right\} = \min\left\{0,\ r_2\int_0^\Delta p_{22}(s)\,ds + C_s\,p_{22}(\Delta)\right\}. \qquad (2.3.7)$$
We now assume that $V_1(e_2) < 0$ and derive a contradiction. Since it is not optimal to carry out preventive maintenance when the system is in the good state, $V_1(e_1) < 0$. If $V_1(e_2) < 0$, then equation (2.2.15) and the linearity of $\int_0^\Delta \langle r, \Pi_s\rangle\,ds$ and $C_s\left(1 - \Pi_\Delta(N+1)\right)$ imply that $V_1(\Pi) < 0$ for all $\Pi \in \mathcal{P}$. Since $V_n(\Pi) \ge V_{n+1}(\Pi)$ for all $n \in \mathbb{N}$, it follows that the limit $V(e_2) = \lim_{n\to\infty} V_n(e_2) < 0$, and the policy that never carries out preventive maintenance, i.e. $\tau = \infty$, is optimal. This directly contradicts Theorem 2.3.1. Thus, it follows that $V_1(e_2) \ge 0$, and by equation (2.3.7) we have the inequality
$$r_2\int_0^\Delta p_{22}(s)\,ds + C_s\,p_{22}(\Delta) \ge 0. \qquad (2.3.8)$$
Suppose now that $V_n(e_2) = 0$ for some $n \in \mathbb{N}$. Using inequality (2.3.8),
$$V_{n+1}(e_2) = \min\left\{0,\ r_2\int_0^\Delta p_{22}(s)\,ds + C_s\,p_{22}(\Delta)\right\} = 0,$$
which completes the inductive step. Therefore $V(e_2) = \lim_{n\to\infty} V_n(e_2) = 0$, which completes the proof.
Theorem 2.3.2 shows that the optimal control policy for the three-state model can be
represented as a control chart, which monitors the posterior probability $\Pi_{n\Delta}(2)$ that the system is in warning state 2. Once $\Pi_{n\Delta}(2)$ exceeds a fixed control limit $\Pi \in (0, 1]$, full preventive maintenance is carried out. Unlike the general N-state model, the control limit policy for the three-state model has the advantage that it is no longer parameterized by λ, which is an extremely useful property from a computational point of view. In the
next section, we develop an efficient computational algorithm to determine the optimal
control limit Π∗ ∈ (0, 1], as well as the optimal long-run average cost λ∗, for the three-
state model.
2.4 Computation of the Optimal Policy
In this section, we develop an efficient computational algorithm for the three-state model based on the control limit policy described in Theorem 2.3.2. The objective is to determine the optimal value of the control limit $\Pi^* \in (0, 1]$ that minimizes the long-run expected average cost per unit time. Using the policy of Theorem 2.3.2, we analyze the dynamics of the posterior probability $\Pi_{n\Delta}(2)$ in the semi-Markov decision process (SMDP) framework. In particular, for a fixed control limit $\Pi \in (0, 1]$, we partition the interval $[0, \Pi)$ into $M \in \mathbb{N}$ disjoint subintervals $I_m = [l_m, u_m)$, where $l_m = \frac{m-1}{M}\Pi$ and $u_m = \frac{m}{M}\Pi$, $m = 1, \ldots, M$. The set $\mathcal{I} = \{I_1, \ldots, I_M\}$ is taken as the state space of the following SMDP.
Let $t_n$ be the time of the $n$th decision epoch. Then, the SMDP is defined to be in state $I_m \in \mathcal{I}$ provided the current value of the posterior probability satisfies $\Pi_{t_n}(2) \in [l_m, u_m)$. The time of the next decision epoch is taken as $t_{n+1} = (t_n + \Delta) \wedge \xi$. To follow the policy of Theorem 2.3.2, we impose the following actions. If $t_{n+1} = \xi$, mandatory corrective maintenance is carried out, so that at the $(n+1)$th decision epoch the SMDP returns to state $I_1 = [0, \Pi/M)$. Similarly, if $t_{n+1} = t_n + \Delta$ and $\Pi_{t_{n+1}}(2) \ge \Pi$, full preventive maintenance is carried out, so that at the $(n+1)$th decision epoch the SMDP again returns to state $I_1 = [0, \Pi/M)$.
With this definition of the states and decision epoch times of the SMDP, under the long-run average cost criterion the SMDP is determined by the following quantities [71]:

$p_{mk}$ = the probability that the SMDP will be in state $k \in \mathcal{I}$ at the next decision epoch, given that the current state is $m \in \mathcal{I}$;

$\tau_m$ = the expected sojourn time until the next decision epoch, given that the current state is $m \in \mathcal{I}$;

$c_m$ = the expected cost incurred until the next decision epoch, given that the current state is $m \in \mathcal{I}$.
Using the quantities defined above, for a fixed control limit $\Pi \in (0, 1]$, the long-run expected average cost $\lambda(\Pi)$ can be obtained by solving the following system of linear equations:
$$\upsilon_m = c_m - \lambda(\Pi)\tau_m + \sum_{k\in\mathcal{I}} p_{mk}\upsilon_k, \quad \text{for each } m \in \mathcal{I}, \qquad (2.4.1)$$
$$\upsilon_l = 0, \quad \text{for some } l \in \mathcal{I},$$
and the optimal control limit $\Pi^* \in (0, 1]$ and corresponding optimal average cost $\lambda^* = \inf_{\Pi\in(0,1]}\lambda(\Pi)$ can then be computed using the equations (2.4.1).
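For a fixed control limit, (2.4.1) is a linear system in $\lambda(\Pi)$ and the relative values $\upsilon_m$. A minimal pure-Python sketch with a two-state toy check (the thesis computation would supply the $p_{mk}$, $\tau_m$, $c_m$ derived below):

```python
def solve_linear(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def smdp_average_cost(p, tau, cost):
    """Solve (2.4.1): v_m = c_m - lam*tau_m + sum_k p_mk v_k with v_0 := 0;
    the unknown vector is (lam, v_1, ..., v_{M-1})."""
    M = len(tau)
    A = [[0.0] * M for _ in range(M)]
    for m in range(M):
        A[m][0] = tau[m]                          # coefficient of lam(Pi)
        for k in range(1, M):
            A[m][k] = (1.0 if k == m else 0.0) - p[m][k]
    return solve_linear(A, list(cost))[0]

# toy 2-state check: uniform transitions, unit sojourn times, costs 2 and 4,
# so the long-run average cost rate is (2 + 4)/2 = 3
lam = smdp_average_cost([[0.5, 0.5], [0.5, 0.5]], [1.0, 1.0], [2.0, 4.0])
```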
The remainder of this section is devoted to explicitly computing the quantities $p_{mk}$, $\tau_m$, $c_m$, $m, k \in \mathcal{I}$, defined above. To simplify notation in the derivations that follow, we write $W_n = \Pi_{n\Delta}(2)$ for the posterior probability that the system is in warning state 2 at sampling epoch $n\Delta$. Bayes' Theorem implies
$$W_{n+1} = \frac{f(Y_{n+1}\mid 2)\left(p_{12}(\Delta)(1-W_n) + p_{22}(\Delta)W_n\right)}{f(Y_{n+1}\mid 1)\,p_{11}(\Delta)(1-W_n) + f(Y_{n+1}\mid 2)\left(p_{12}(\Delta)(1-W_n) + p_{22}(\Delta)W_n\right)}\; I_{\{\xi>(n+1)\Delta\}}. \qquad (2.4.2)$$
Straightforward algebra shows that the ratio of normal densities has the following representation:
$$\frac{f(y\mid 1)}{f(y\mid 2)} = \frac{\det^{-1/2}(\Sigma_1)}{\det^{-1/2}(\Sigma_2)} \cdot \frac{\exp\left(-\tfrac{1}{2}(y-\mu_1)^T\Sigma_1^{-1}(y-\mu_1)\right)}{\exp\left(-\tfrac{1}{2}(y-\mu_2)^T\Sigma_2^{-1}(y-\mu_2)\right)} =: h\exp\left((y-b)^T A (y-b) + c\right),$$
where
$$h = \frac{\det^{-1/2}(\Sigma_1)}{\det^{-1/2}(\Sigma_2)}, \quad A = \frac{1}{2}\left(\Sigma_2^{-1} - \Sigma_1^{-1}\right), \quad b = \frac{1}{2}A^{-1}\left(\Sigma_2^{-1}\mu_2 - \Sigma_1^{-1}\mu_1\right), \quad c = \frac{\mu_2^T\Sigma_2^{-1}\mu_2 - \mu_1^T\Sigma_1^{-1}\mu_1}{2} - b^T A b,$$
so equation (2.4.2) simplifies to
$$W_{n+1} = \frac{p_{12}(\Delta)(1-W_n) + p_{22}(\Delta)W_n}{p_{11}(\Delta)(1-W_n)\,h\exp\left(G_{n+1}^T A G_{n+1} + c\right) + p_{12}(\Delta)(1-W_n) + p_{22}(\Delta)W_n}\; I_{\{\xi>(n+1)\Delta\}}, \qquad (2.4.3)$$
where $G_{n+1} := Y_{n+1} - b$. From equation (2.4.3) we have the following result.
Theorem 2.4.1. At sampling epoch $n\Delta$, for any $t \in \mathbb{R}_+$, the conditional reliability function of $\xi$ is
$$P\left(\xi > n\Delta + t \mid \mathcal{F}_{n\Delta}\right) = \left((1-W_n)(1-p_{13}(t)) + W_n(1-p_{23}(t))\right) \cdot I_{\{\xi>n\Delta\}} =: R_{W_n}(t)\, I_{\{\xi>n\Delta\}}, \qquad (2.4.4)$$
and for any $w \in [0, 1]$, the conditional distribution function of $W_{n+1}$ is
$$P\left(W_{n+1} \le w \mid \mathcal{F}_{n\Delta}\right) = R_{W_n}(\Delta)\sum_{i\in\mathcal{X}} P\left(G_{n+1}^T A G_{n+1} \ge g_{W_n}(w) \mid X_{(n+1)\Delta} = i\right)\gamma_{W_n}(i) \cdot I_{\{\xi>n\Delta\}} + I_{\{\xi\le n\Delta\}} \qquad (2.4.5)$$
$$=: F_{W_n}(w)\, I_{\{\xi>n\Delta\}} + I_{\{\xi\le n\Delta\}},$$
where
$$g_{W_n}(w) = \ln\left(\frac{p_{12}(\Delta)(1-W_n) + p_{22}(\Delta)W_n}{p_{11}(\Delta)(1-W_n)} \cdot \frac{1-w}{wh}\right) - c, \qquad \gamma_{W_n}(i) = \frac{p_{1i}(\Delta)(1-W_n) + p_{2i}(\Delta)W_n}{R_{W_n}(\Delta)}.$$
Proof. We first prove the formula for the conditional reliability function of $\xi$. Conditional on $\xi > n\Delta$, for any $t \in \mathbb{R}_+$, Bayes' Theorem implies
$$P\left(\xi > n\Delta + t \mid \mathcal{F}_{n\Delta}\right) = P\left(X_{n\Delta+t} \ne 3 \mid \mathcal{F}_{n\Delta}\right) = P\left(X_{n\Delta+t} \ne 3 \mid X_{n\Delta} = 1\right)(1-W_n) + P\left(X_{n\Delta+t} \ne 3 \mid X_{n\Delta} = 2\right)W_n = (1-W_n)(1-p_{13}(t)) + W_n(1-p_{23}(t)),$$
and conditional on $\xi \le n\Delta$, $P(\xi > n\Delta + t \mid \mathcal{F}_{n\Delta}) = 0$. Combining the two cases gives (2.4.4). We next prove the formula for the conditional distribution function of $W_{n+1}$. Conditional on $\xi > n\Delta$, for any $w \in [0, 1]$, equation (2.4.3) and Bayes' Theorem imply
$$P\left(W_{n+1} \le w \mid \mathcal{F}_{n\Delta}\right) = R_{W_n}(\Delta)\, P\left(W_{n+1} \le w \mid \xi > (n+1)\Delta, Y_1, \ldots, Y_n\right) = R_{W_n}(\Delta)\sum_{i\in\mathcal{X}} P\left(G_{n+1}^T A G_{n+1} \ge g_{W_n}(w) \mid X_{(n+1)\Delta} = i\right) \cdot P\left(X_{(n+1)\Delta} = i \mid \xi > (n+1)\Delta, Y_1, \ldots, Y_n\right),$$
where
$$P\left(X_{(n+1)\Delta} = i \mid \xi > (n+1)\Delta, Y_1, \ldots, Y_n\right) = \frac{1}{R_{W_n}(\Delta)}\, P\left(X_{(n+1)\Delta} = i \mid \xi > n\Delta, Y_1, \ldots, Y_n\right) = \frac{p_{1i}(\Delta)(1-W_n) + p_{2i}(\Delta)W_n}{R_{W_n}(\Delta)} = \gamma_{W_n}(i),$$
and conditional on $\xi \le n\Delta$, $P(W_{n+1} \le w \mid \mathcal{F}_{n\Delta}) = 1$. Combining the two cases gives (2.4.5), which completes the proof.
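The constants $h, A, b, c$ and the update (2.4.3) can be sketched for $d = 2$ with elementary linear algebra. The $\mu_i, \Sigma_i$ below are taken from the mining example of Section 2.4, while the entries of `P`, standing in for $p_{11}(\Delta), p_{12}(\Delta), p_{22}(\Delta)$, are hypothetical; this is an illustration of the algebra, not thesis code.

```python
import math

def inv2(S):
    """Inverse and determinant of a 2x2 matrix."""
    d = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / d, -S[0][1] / d], [-S[1][0] / d, S[0][0] / d]], d

def quad(A, v):
    """Quadratic form v^T A v for 2-vectors."""
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

def density_ratio_params(mu1, mu2, S1, S2):
    """Constants h, A, b, c with f(y|1)/f(y|2) = h * exp((y-b)^T A (y-b) + c)."""
    S1i, d1 = inv2(S1)
    S2i, d2 = inv2(S2)
    h = math.sqrt(d2 / d1)                        # det^{-1/2}(S1) / det^{-1/2}(S2)
    A = [[0.5 * (S2i[i][j] - S1i[i][j]) for j in range(2)] for i in range(2)]
    v = [sum(S2i[i][j] * mu2[j] - S1i[i][j] * mu1[j] for j in range(2)) for i in range(2)]
    Ai, _ = inv2(A)
    b = [0.5 * sum(Ai[i][j] * v[j] for j in range(2)) for i in range(2)]
    c = 0.5 * (quad(S2i, mu2) - quad(S1i, mu1)) - quad(A, b)
    return h, A, b, c

def posterior_update(w, y, P, mu1, mu2, S1, S2):
    """One step of recursion (2.4.3), given survival to the next epoch."""
    h, A, b, c = density_ratio_params(mu1, mu2, S1, S2)
    g = [y[0] - b[0], y[1] - b[1]]
    num = P[0][1] * (1 - w) + P[1][1] * w
    return num / (P[0][0] * (1 - w) * h * math.exp(quad(A, g) + c) + num)

# mu, Sigma from the Section 2.4 example; P = [[p11, p12], [0, p22]] hypothetical
mu1, mu2 = [1.1, 1.9], [4.1, 5.5]
S1, S2 = [[7.2, 2.0], [2.0, 3.6]], [[7.6, 1.0], [1.0, 3.2]]
P = [[0.90, 0.08], [0.0, 0.85]]
w_next = posterior_update(0.2, [2.0, 3.0], P, mu1, mu2, S1, S2)
```

Completing the square in the exponent is what makes the representation $h\exp((y-b)^T A(y-b) + c)$ exact, which can be verified pointwise against the explicit density ratio.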
Provost and Rudiuk [59] derived explicit formulas for both the density (Theorem 2.1, p. 386) and the distribution function (Theorem 3.1, p. 391) of indefinite quadratic forms $G^T A G$ in normal vectors, where $G$ is any multivariate normal $N_d(\mu, \Sigma)$ vector and $A$ is any $d \times d$ symmetric matrix. By definition $G_n := Y_n - b$, so that $G_n \mid X_{n\Delta} = i \sim N_d(\mu_i - b, \Sigma_i)$, and the probability $P\left(G_{n+1}^T A G_{n+1} \ge g_{W_n}(w) \mid X_{(n+1)\Delta} = i\right)$ in equation (2.4.5) can be computed explicitly using Theorem 3.1 of Provost and Rudiuk [59].
Using equations (2.4.4) and (2.4.5), we can now easily evaluate the quantities $p_{mk}$, $\tau_m$, $c_m$, $m, k \in \mathcal{I}$. Suppose that at time $n\Delta$ the process is in state $I_m \in \mathcal{I}$. Then, for $M$ large, $W_n \approx l_m$ and we can approximate the transition probabilities by
$$p_{I_m I_1} = \left(1 - R_{l_m}(\Delta)\right) + \left(1 - F_{l_m}(\Pi)\right) + \left(F_{l_m}(u_1) - F_{l_m}(l_1)\right).$$
The first term on the right-hand side is the probability that the system fails before the next sampling epoch; the second and third terms are the probabilities that the system does not fail before the next sampling epoch $(n+1)\Delta$ and the posterior probability $W_{n+1}$ enters either the preventive maintenance region $[\Pi, 1]$ or the interval $I_1 = [l_1, u_1)$, respectively. In all three cases, the state of the SMDP at the next decision epoch is $I_1$. The remaining transition probabilities of the SMDP have a simpler structure and are given by
$$p_{I_m I_k} = F_{l_m}(u_k) - F_{l_m}(l_k), \quad k = 2, \ldots, M.$$
Using equation (2.4.4), the mean sojourn time is
$$\tau_{I_m} = \int_0^\Delta R_{l_m}(s)\,ds = (1-l_m)\left(\Delta - \int_0^\Delta p_{13}(s)\,ds\right) + l_m\left(\Delta - \int_0^\Delta p_{23}(s)\,ds\right),$$
and the mean cost is
$$c_{I_m} = \sum_{i\in\mathcal{X}} \left(C_i + C_{fi}q_{i3}\right)\left((1-l_m)\int_0^\Delta p_{1i}(s)\,ds + l_m\int_0^\Delta p_{2i}(s)\,ds\right) + R_{l_m}(\Delta)\sum_{i\in\mathcal{X}} C_{pi}\, P\left(G_1^T A G_1 \le g_{l_m}(\Pi) \mid X_\Delta = i\right)\gamma_{l_m}(i) + C_s F_{l_m}(\Pi),$$
so that the quantities $p_{mk}$, $\tau_m$, $c_m$, $m, k \in \mathcal{I}$, can now be computed explicitly, and the optimal control limit $\Pi^* \in (0, 1]$ and corresponding optimal average cost $\lambda^* = \inf_{\Pi\in(0,1]}\lambda(\Pi)$ can be computed using the equations (2.4.1).
Example. We now illustrate the computational procedure with a numerical example using model parameters from a mining industry application. In Chapter 4, we will show how the model parameters can be estimated using historical data. We consider a failing transmission unit with state generator
$$Q = \begin{bmatrix} -0.0304 & 0.0303 & 0.0001 \\ 0 & -0.3548 & 0.3548 \\ 0 & 0 & 0 \end{bmatrix}.$$
Every $\Delta = 600$ hours, oil samples are collected and spectrometric analysis is carried out, which provides the concentrations in parts per million (ppm) of $d = 2$ wear elements. At each sampling epoch $n\Delta$, the bivariate vector $Y_n$ follows $N(\mu_1, \Sigma_1)$ when the system is in healthy state 1 and $N(\mu_2, \Sigma_2)$ when the system is in warning state 2, where
$$\mu_1 = \begin{bmatrix}1.1\\1.9\end{bmatrix}, \quad \mu_2 = \begin{bmatrix}4.1\\5.5\end{bmatrix}, \quad \Sigma_1 = \begin{bmatrix}7.2 & 2.0\\2.0 & 3.6\end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix}7.6 & 1.0\\1.0 & 3.2\end{bmatrix}.$$
The known cost parameters are $C_1 = C_2 = 0$, $C_{f1} = C_{f2} = 6780$, $C_{p1} = 450$, and $C_{p2} = 1560$; the sampling cost $C_s$, however, is unknown. We analyze the effect $C_s$ has on the optimal control limit $\Pi^* \in (0, 1]$ and the optimal average cost rate $\lambda^*$. We chose partition parameter $M = 25$ and, using MATLAB, computed the optimal average cost from the system of linear equations (2.4.1), obtaining the results in Table 2.4.1.

Figure 2.4.1: Optimal control limit Π∗ vs sampling cost Cs.
Table 2.4.1: Effect of varying sampling cost Cs on Π∗ and λ∗

Cs    0        5        10       50       100      500
Π∗    0.1894   0.1857   0.1856   0.1847   0.1837   0.1742
λ∗    104.59   109.40   114.21   152.69   200.78   585.52
Table 2.4.1 shows that as the sampling cost $C_s$ increases, the optimal control limit $\Pi^*$ decreases and the optimal average cost $\lambda^*$ increases. We graph the results in Figures 2.4.1 and 2.4.2. For example, if $C_s = 10$, the optimal control limit is $\Pi^* = 0.1857$ and the optimal average cost is $\lambda^* = 109.40$. We illustrate the use of this optimal control limit policy on a sample data history in Figure 2.4.3. The control chart shows that the posterior probability $\Pi_{n\Delta}(2)$ that the system is in warning state 2 exceeds the control limit at the 14th sampling epoch. At this point, full preventive maintenance is carried out.
Figure 2.4.2: Optimal average cost rate λ∗ vs sampling cost Cs.
Figure 2.4.3: Optimal control limit policy.
2.5 Conclusions and Future Research
We have considered an optimal control problem with costly multivariate observations
carrying partial information about the system state. The state process follows an unob-
servable continuous time homogeneous Markov process. The objective was to determine
the optimal replacement policy that minimizes the long-run expected average cost per
unit time. We have characterized the structure of the optimal replacement policy and
have shown that the optimal preventive maintenance region is a convex subset of Eu-
clidean space. We have also analyzed the three-state version of this problem in detail
and have shown that the optimal policy is a control limit policy. An efficient computa-
tional algorithm was developed in the semi-Markov decision process framework for the
three-state problem with an illustrative numerical example.
We suggest a few possible directions for future research. In some applications, it may be appropriate to allow preventive maintenance at any real-valued time, not just at the sampling epochs $n\Delta$. In this case, the optimal stopping problem must be formulated with respect to the continuous-time filtration $\mathcal{F}_t = \sigma\left(Y_1, \ldots, Y_{\lfloor (t\wedge\xi)/\Delta\rfloor},\ \xi I_{\{\xi\le t\}},\ I_{\{\xi>t\}}\right)$, $t \in \mathbb{R}_+$. An interesting comparison could then be made to determine how much additional cost savings can be obtained by allowing preventive maintenance to be taken in continuous time. Another interesting extension would be to allow the state sojourn times, which are exponentially distributed here, to have more general distributions such as the Erlang, Weibull or Gamma distributions. In the literature, such models are referred to as hidden semi-Markov models (HSMMs). Typically, HSMMs are more difficult to analyze due to the loss of the Markov (i.e., memoryless) property. A final possible future research topic would be to test the effectiveness of our methodology on other real-world data sets, such as vibration, performance, or quality monitoring data, using the availability maximization criterion, which is sometimes preferable in practice to the cost minimization criterion. This should also lead to a further refinement of both the model and the control algorithm.
Chapter 3
Optimal Sampling and Control of Stochastically Failing Systems
Modern manufacturing and production industries rely heavily on complex technical sys-
tems for their everyday operations. These systems typically deteriorate and are subject
to breakdowns due to usage and age. The high cost associated with unplanned break-
downs has stimulated a lot of research activity in the maintenance optimization literature,
where the main focus has been on determining the optimal time to preventively repair
or replace a system before it fails. One of the earliest and most significant contributions
to this class of problems is the celebrated paper of Barlow and Hunter [6]. More recent
contributions are given by Dogramaci and Fraiman [21], Heidergott and Farenhorst-Yuan
[28], Kurt and Kharoufeh [44], and Kim et al. [38], among others.
The most advanced maintenance program applied in practice is known as condition-based maintenance (CBM), which recommends maintenance actions based on information collected through online condition monitoring. CBM initiates maintenance actions only when there is strong evidence of severe system deterioration, which significantly reduces maintenance costs by decreasing the number of unnecessary maintenance operations. For a recent overview of the mathematical models and technologies
Chapter 3. Optimal Sampling and Control of Stochastic Systems 33
used in CBM, readers are referred to Jardine et al. [30] and the references therein.
The common assumption made in CBM optimization models is that information
used for decision-making is obtained at periodic equidistant sampling epochs. Under this
assumption, the goal is to determine the optimal maintenance policy that optimizes an
objective function over a finite or infinite time horizon. Recent contributions are given by
Dayanik and Gurler [18], Makis and Jiang [51], Wang et al. [73], and Juang and Anderson
[34]. The problem with the equidistant sampling assumption is that in many applications
there is a high sampling cost associated with collecting observable data. It is therefore
of equal importance to determine when information should be collected as it is to decide
how this information should be utilized for maintenance decision-making. This type of
joint optimization has been a long-standing open problem in the operations research and
maintenance optimization literature, but very few results regarding the structure of the
optimal sampling and maintenance policy have been published.
An excellent early contribution to the joint optimization problem was given by Ohnishi
et al. [58], who considered a deteriorating system with N fully observable states. Under
reasonable monotonicity assumptions, the authors were able to partially characterize the
form of the optimal policy and showed that the times between successive samples are
monotonically decreasing. Ross [63] considered a similar problem in the area of quality
control in which the system state is only partially observable. Under the expected total
discounted reward criterion, the author showed that for a two state model, the optimal
policy is characterized by four control regions. Other early noteworthy contributions are
the models of Anderson and Friedman [1], Kander [35] and Rosenfield [62]. More recently,
Yeh [82] modelled a general N state sampling and maintenance problem in the Markov
decision process framework. The author proposed a number of different algorithms to
derive the optimal sampling and maintenance policy, but was not able to characterize its
form. The models of by Dieulle et al. [20], Lam and Yeh [45], and Jiang [32] are other
recent contributions. It should be noted that no optimality results have been published
for partially observable failing systems with the long-run average cost criterion.
In this chapter, we consider a system whose state information is unobservable and can
only be inferred by taking a sample through condition monitoring. System failure on the
other hand is fully observable. The decision maker can decide when condition monitoring
information should be collected, as well as when to initiate full system inspection, followed
possibly by preventive maintenance. The objective is to characterize the structural form
of the optimal sampling and maintenance policy that minimizes the long-run expected
cost per unit time. The problem is formulated as a partially observable Markov decision
process (POMDP). It is shown that monitoring the posterior probability that the system
is in a so-called warning state is sufficient for decision-making. The primary contribution
of this chapter is that we prove the optimality of a sampling and maintenance policy that
is characterized by three critical thresholds, which have practical interpretation and give
new insight into the value of condition monitoring information.
The remainder of the chapter is organized as follows. In §3.1, we formulate and analyze
the joint optimization problem in the POMDP framework. In §3.2, we determine the
structural properties of the optimal policy. The dynamic optimality equation is derived
and we establish the form of the optimal sampling and maintenance policy. In §3.3, we
develop an iterative algorithm to compute the optimal policy and the long-run expected
average cost per unit time. We also provide numerical comparisons with other suboptimal
policies that illustrate the benefits of the joint optimization of sampling and maintenance.
Concluding remarks and future research directions are provided in §3.4.
3.1 Model Formulation
Consider a system that can be characterized by one of three distinguishable states: a
healthy state (state 1), a warning state (state 2), and a failure state (state 3). Recent
studies have found through experiments with real diagnostic data such as spectrometric
oil data (e.g. [38], [53]) and vibration data (e.g. [80]), that it is usually preferable and
sufficient to consider only two operational states - a healthy state and a warning state.
Such a characterization has the desirable property that maintenance actions are only
initiated when the system experiences severe deterioration that can actually cause failure.
In many cases, the system moves through two distinct phases of operation. In the first
and longer phase, the system operates under normal conditions, and the observations
behave in a stationary manner. Although system degradation can be gradual, it is
usually not until degradation has exceeded a certain level that the behaviour of the
condition monitoring observations changes substantially. At this point, the system enters
the second and shorter phase, which we define to be the warning state.
Let $(\Omega, \mathcal{F}, P)$ be a complete probability space on which the following stochastic processes are defined. The state process $(X_t : t \in \mathbb{R}_+)$ is modeled as a continuous-time homogeneous Markov chain with state space $\mathcal{X} = \{1, 2\} \cup \{3\}$ and transition rate matrix $Q = (q_{ij})$. To model monotonic system deterioration, the state process is non-decreasing with probability 1, i.e. $q_{ij} = 0$ for all $j < i$. In particular, this implies that without corrective maintenance the failure state is absorbing. The system is more likely to fail in warning state 2 than in healthy state 1, i.e. $q_{23} > q_{13}$. Let $\xi = \inf\{t \in \mathbb{R}_+ : X_t = 3\}$ be the observable time of system failure. Upon system failure, mandatory corrective maintenance that takes $T_F$ time units is performed at a cost $C_F$, which brings the system back to healthy state 1.
To avoid costly failures, the decision maker can take a sample at a cost CS. In
real applications, taking and processing a sample through condition monitoring, such
as an oil sample, cannot be done instantaneously due to the time it takes to collect
the sample and process it at a laboratory. Therefore, we assume that the processing
time of the sample is ∆ ∈ (0,+∞) time units, so that if a sample is taken at time t,
information from the sample is first available to be used for decision making at time
t + ∆. We therefore naturally assume that the decision maker has the opportunity to
take (or not take) samples only at the time points $0, \Delta, 2\Delta, 3\Delta, \ldots$. Condition monitoring information at time $n\Delta$ is denoted $Y_n$ and takes values in $\mathcal{E} = \{1, \ldots, L\}$. The samples $Y_n$ are stochastically related to the operational system state $X_{n\Delta}$. In particular, while the system is in operational state $X_{n\Delta} = i \in \{1, 2\}$, the sample $Y_n$ has state-dependent distribution
$$d_{iy} = P(Y_n = y \mid X_{n\Delta} = i), \quad y \in \mathcal{E}. \qquad (3.1.1)$$
The state-observation matrix is denoted $D = (d_{iy})$.
Upon receiving information from a condition monitoring sample, the decision maker
can initiate full system inspection to reveal (with probability 1) the current state of
the system at a cost CI . If the system is found to be in warning state 2, preventive
maintenance is performed which brings the system to a healthy state 1 at a cost CP . If
the system is found to be in healthy state 1, no preventive maintenance is performed and
the process continues. Full system inspection and preventive maintenance takes TI and
TP time units, respectively. We make the standard assumption CF ≥ CI +CP . For every
time unit the system remains in warning state 2, an operating cost CW is incurred. The
objective is to characterize the structural form of the optimal sampling and maintenance
policy that minimizes the long-run expected average cost per unit time. The problem
can be formulated in the POMDP framework as follows.
While the system is operational, one of the following three actions $a_n \in \{1, 2, 3\}$ must be taken at each decision epoch time $n\Delta$:
1. Do nothing, and take an action at the next decision epoch time (n+ 1)∆.
2. Take a sample. Information from the sample Yn+1 is first made available for decision-
making at the beginning of the next decision epoch time (n+ 1)∆.
3. Initiate full system inspection, followed possibly by preventive maintenance.
If $n\Delta$ time units have elapsed since the last maintenance action (full inspection, preventive maintenance, or corrective maintenance) and $k$ samples $Y_{n_1}, \ldots, Y_{n_k}$ have been collected at time points $0 < n_1\Delta < \cdots < n_k\Delta \le n\Delta$, then it is well known from the theory of POMDPs (e.g. [7]) that
$$\Pi_n = P\left(X_{n\Delta} = 2 \mid \xi > n\Delta, Y_{n_1}, \ldots, Y_{n_k}\right), \qquad (3.1.2)$$
the probability that the system is in warning state 2 given all available information up to time $n\Delta$, represents sufficient information for decision-making at the $n$th decision epoch. Then, if an optimal stationary policy exists, it has the functional form $\phi(\pi) \in \{1, 2, 3\}$, $0 \le \pi \le 1$, where $\phi(\pi)$ indicates the action $a_n$ to be chosen when $\Pi_n = \pi$. Let $\Phi$ be the class of all stationary policies. From renewal theory, the long-run expected average cost per unit time is calculated for any stationary policy $\phi \in \Phi$ as the expected total cost $TC(\phi)$ incurred in one cycle divided by the expected cycle length $CL(\phi)$, where a cycle is completed when either full system inspection, preventive maintenance, or corrective maintenance is carried out.
For any stationary policy $\phi \in \Phi$, let
$$M(\phi) = \inf\{n\Delta \in \mathbb{R}_+ : \phi(\Pi_n) = 3\} \qquad (3.1.3)$$
represent the first time at which full system inspection is initiated, and let
$$N(\phi) = \left|\{n : \phi(\Pi_n) = 2,\ n\Delta < M(\phi) \wedge \xi\}\right| \qquad (3.1.4)$$
represent the total number of samples collected in a cycle. Then, from the model description given above,
$$TC(\phi) = C_S N(\phi) + \int_0^{M(\phi)\wedge\xi} C_W I_{\{X_t=2\}}\,dt + C_I I_{\{X_{M(\phi)\wedge\xi}=1\}} + (C_I + C_P)\, I_{\{X_{M(\phi)\wedge\xi}=2\}} + C_F I_{\{X_{M(\phi)\wedge\xi}=3\}}, \qquad (3.1.5)$$
and
$$CL(\phi) = \left(M(\phi)\wedge\xi\right) + T_I I_{\{X_{M(\phi)\wedge\xi}=1\}} + (T_I + T_P)\, I_{\{X_{M(\phi)\wedge\xi}=2\}} + T_F I_{\{X_{M(\phi)\wedge\xi}=3\}}. \qquad (3.1.6)$$
For the average cost criterion, the problem is to find a stationary policy $\phi^* \in \Phi$, if it exists, minimizing the long-run expected average cost per unit time given by
$$\frac{E_{\Pi_0}\left[TC(\phi)\right]}{E_{\Pi_0}\left[CL(\phi)\right]}, \qquad (3.1.7)$$
where $E_{\Pi_0}$ is the conditional expectation given $\Pi_0 = P(X_0 = 2)$. We assume that a new system is installed at the beginning of the first cycle, i.e. $\Pi_0 = 0$.
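To make the cycle quantities (3.1.5)-(3.1.6) concrete, the sketch below estimates the ratio (3.1.7) for one fixed stationary policy by simulating renewal cycles. All parameter values and the threshold policy itself are hypothetical, failures are resolved on the $\Delta$-grid (a deliberate simplification), and the belief recursion is the standard POMDP update conditioned on survival; this is an illustration, not the optimal policy of this chapter.

```python
import random

# hypothetical parameters and a hypothetical threshold policy, for illustration only
P = [[0.90, 0.08, 0.02],    # p_ij(Delta), i = healthy; rows sum to 1
     [0.00, 0.85, 0.15],    # i = warning
     [0.00, 0.00, 1.00]]    # i = failed (absorbing without repair)
D = [[0.7, 0.2, 0.1],       # d_iy = P(Y = y | X = healthy), y in {0, 1, 2}
     [0.1, 0.3, 0.6]]       # d_iy for X = warning
DELTA, C_S, C_W, C_I, C_P, C_F = 1.0, 10.0, 50.0, 200.0, 1000.0, 3000.0
T_I, T_P, T_F = 0.5, 1.0, 2.0

def predict(pi):
    """One-step belief prediction, conditional on surviving the interval."""
    surv = (1 - pi) * (1 - P[0][2]) + pi * (1 - P[1][2])
    return ((1 - pi) * P[0][1] + pi * P[1][1]) / surv

def correct(pi, y):
    """Bayes correction of the belief after a sample result y arrives."""
    num = D[1][y] * pi
    return num / (D[0][y] * (1 - pi) + num)

def policy(pi, low=0.05, high=0.4):
    """Hypothetical stationary policy: 1 do nothing, 2 sample, 3 inspect."""
    return 1 if pi < low else (2 if pi < high else 3)

def simulate_cycle(rng):
    """One renewal cycle; returns (cycle cost, cycle length) as in (3.1.5)-(3.1.6)."""
    x, pi, t, cost = 0, 0.0, 0.0, 0.0
    while True:
        a = policy(pi)
        if a == 3:                                  # inspection ends the cycle
            return (cost + C_I + (C_P if x == 1 else 0.0),
                    t + T_I + (T_P if x == 1 else 0.0))
        if a == 2:
            cost += C_S                             # pay the sampling cost now
        if x == 1:
            cost += C_W * DELTA                     # warning-state operating cost
        u, row = rng.random(), P[x]
        x_next = 0 if u < row[0] else (1 if u < row[0] + row[1] else 2)
        if x_next == 2:                             # failure: corrective repair
            return cost + C_F, t + DELTA + T_F
        x, t = x_next, t + DELTA
        pi = predict(pi)
        if a == 2:                                  # the sample result arrives now
            y = rng.choices(range(3), weights=D[x])[0]
            pi = correct(pi, y)

rng = random.Random(42)
cycles = [simulate_cycle(rng) for _ in range(20000)]
avg_cost_rate = sum(c for c, _ in cycles) / sum(l for _, l in cycles)
```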
We first transform the stochastic control problem (3.1.7) to an equivalent parameter-
ized stochastic control problem (with parameter λ) with an additive objective function.
This transformation is known as the λ−minimization technique, and its theory is devel-
oped in the excellent paper of [3]. Define for λ > 0 the function
V^λ(Π0) = inf_{φ∈Φ} EΠ0[TC(φ) − λ CL(φ)].   (3.1.8)

Then, [3] showed that λ∗ determined by the equation

λ∗ = inf{λ > 0 : V^λ(Π0) ≤ 0}   (3.1.9)
is the optimal expected average cost for the stochastic control problem (3.1.7), and the
stationary policy φ∗ ∈ Φ that minimizes the right-hand side of (3.1.8) for λ = λ∗ deter-
mines the optimal stationary policy. We refer to the function V λ(·) defined in (3.1.8) as
the value function.
Although the model developed in this chapter is presented in the reliability and main-
tenance context, the methods and results developed in this chapter can be applied to a
number of different fields. For example, there is a very close connection between the
problem described above and the joint optimization of cancer screening and treatment
scheduling. In such healthcare applications, a patient can be in one of three states: a
healthy state (no disease), an asymptomatic state (has the disease, but the state is not
fully observable) or symptomatic (has the disease, and it is observable). The three states
correspond exactly to our healthy, warning and failure states. As in our model, since the
asymptomatic state is not fully observable, the state of the patient can only be inferred
through ‘costly’ checkups. Although the checkups provide information about the state
of the patient, the information is imperfect due to false positive and false negative test results.
Furthermore, based on the information collected, the physician can recommend a more
costly test which can reveal the true state of the patient with certainty (this corresponds
to initiating full system inspection, action an = 3). If the patient is found to have the
disease, treatment begins (analogous to preventive maintenance), otherwise the screening
process continues. Recent contributions to healthcare screening and treatment planning
are given by [13], [54] and [66], and the references therein. Other interesting applications
of our model also include quality and statistical process control (e.g. [11], [49], [70]) and
change point detection applications (e.g. [17], [57]).
In the next section, we analyze the value function defined in (3.1.8) and determine
the structure of the optimal sampling and maintenance policy. For the remainder of the
chapter, to simplify notation we suppress the dependence on λ when there is no confusion
and write, for example, V (Π0) instead of V λ (Π0).
3.2 Structural Form of the Optimal Policy
The goal of this section is to characterize the form of the optimal sampling and mainte-
nance policy. The strategy we take is to first analyze the control problem over a restricted
subclass of stationary policies Φk ⊂ Φ in which full system inspection must be initiated
no later than at time k∆. The value function Vk for the restricted control problem is de-
rived and its properties are determined. The restriction is then lifted, and the properties
of the restricted value functions Vk are carried over to the infinite horizon value function
V , which can be obtained as the limit Vk → V . The dynamic optimality equation is then
derived and further properties of the infinite horizon value function V are determined. It
is then shown that the optimal policy is characterized by three critical thresholds, which
have practical value and intuitive interpretation.
We begin by providing a closed-form expression for the transition probability matrix
for the uncontrolled state process (Xt). By the model assumptions given in Section 3.1,
it can be shown by solving the Kolmogorov backward differential equations (e.g. [26]),
that the transition probability matrix for the uncontrolled state process is given by
P(t) = [pij(t)] = [ e^{−υ1t}   q12(e^{−υ2t} − e^{−υ1t})/(υ1 − υ2)   1 − e^{−υ1t} − q12(e^{−υ2t} − e^{−υ1t})/(υ1 − υ2)
                    0          e^{−υ2t}                             1 − e^{−υ2t}
                    0          0                                    1 ],   (3.2.1)

where the transition probabilities pij(t) = P(Xt = j | X0 = i), i, j ∈ X, and the constants
υ1 = q12 + q13, υ2 = q23.
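The closed form (3.2.1) is straightforward to implement. The thesis's own computations were done in MATLAB; the Python sketch below is illustrative and assumes υ1 ≠ υ2, as (3.2.1) does:

```python
import numpy as np

def transition_matrix(t, q12, q13, q23):
    """Closed-form P(t) of (3.2.1) for the healthy/warning/failure chain.

    Assumes v1 = q12 + q13 differs from v2 = q23."""
    v1, v2 = q12 + q13, q23
    p12 = q12 * (np.exp(-v2 * t) - np.exp(-v1 * t)) / (v1 - v2)
    return np.array([
        [np.exp(-v1 * t), p12,             1.0 - np.exp(-v1 * t) - p12],
        [0.0,             np.exp(-v2 * t), 1.0 - np.exp(-v2 * t)],
        [0.0,             0.0,             1.0],
    ])
```

Each row sums to one and P(0) is the identity, which is a quick sanity check on any transcription of (3.2.1).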
Suppose at decision epoch n the system has not failed, i.e. ξ > n∆, and Πn = π.
Then for any t ∈ [0,∆], the probability that the system will not fail by n∆ + t is given
by
R(t|π) := P(ξ > n∆ + t | ξ > n∆, Πn = π) = (1 − p13(t))(1 − π) + (1 − p23(t))π.   (3.2.2)
The function R(·|π) defined in (3.2.2) is known as the conditional reliability function. If
the decision maker chooses action an = 2 (take a sample), then at the beginning of the
next decision epoch n+ 1, if ξ > (n+ 1)∆, a sample Yn+1 is made available and the state
probability is updated using Bayes’ Rule (e.g. [65]):

Πn+1(Yn+1, π) := P(X(n+1)∆ = 2 | ξ > (n+1)∆, Yn+1, Πn = π)
             = d2,Yn+1 (p12(∆)(1 − π) + p22(∆)π) / [d1,Yn+1 p11(∆)(1 − π) + d2,Yn+1 (p12(∆)(1 − π) + p22(∆)π)].   (3.2.3)
On the other hand, if the decision maker chooses action an = 1 (do nothing), then at the
beginning of the next decision epoch n + 1, if ξ > (n+1)∆, no new sample is available,
so the state probability is given by

Πn+1(∅, π) := P(X(n+1)∆ = 2 | ξ > (n+1)∆, Πn = π)
           = (p12(∆)(1 − π) + p22(∆)π) / R(∆|π)
           = (p12(∆)(1 − π) + p22(∆)π) / [(1 − p13(∆))(1 − π) + (1 − p23(∆))π].   (3.2.4)

The empty set symbol ∅ in (3.2.4) is used to indicate that no new sample Yn+1 was
obtained at the beginning of decision epoch n + 1.
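Equations (3.2.2)-(3.2.4), together with the observation probability g(y, π) of (3.2.8) below, form the filtering recursion. A minimal Python sketch (the rates and the 2×4 observation matrix used in the test are the illustrative values of Section 3.3.1):

```python
import numpy as np

Q12, Q13, Q23 = 0.12, 0.05, 0.50   # illustrative transition rates
DELTA = 1.0

def _p(t):
    # Entries of (3.2.1): returns (p11, p12, p22) at time t.
    v1, v2 = Q12 + Q13, Q23
    p12 = Q12 * (np.exp(-v2 * t) - np.exp(-v1 * t)) / (v1 - v2)
    return np.exp(-v1 * t), p12, np.exp(-v2 * t)

def reliability(t, pi):
    """Conditional reliability R(t|pi) of (3.2.2)."""
    p11, p12, p22 = _p(t)
    return (p11 + p12) * (1 - pi) + p22 * pi   # 1 - p13 = p11 + p12, 1 - p23 = p22

def update_no_sample(pi):
    """Survival-only update (3.2.4), action 'do nothing'."""
    p11, p12, p22 = _p(DELTA)
    return (p12 * (1 - pi) + p22 * pi) / reliability(DELTA, pi)

def obs_prob(y, pi, D):
    """g(y, pi) of (3.2.8): probability of observing y given survival over one period."""
    p11, p12, p22 = _p(DELTA)
    return (D[0, y] * p11 * (1 - pi)
            + D[1, y] * (p12 * (1 - pi) + p22 * pi)) / reliability(DELTA, pi)

def update_sample(y, pi, D):
    """Bayes' rule (3.2.3), action 'take a sample', having observed y."""
    p11, p12, p22 = _p(DELTA)
    num = D[1, y] * (p12 * (1 - pi) + p22 * pi)
    return num / (D[0, y] * p11 * (1 - pi) + num)
```

The observation probabilities g(·, π) sum to one, and averaging the sampled updates (3.2.3) against them recovers the no-sample update (3.2.4); this identity is what drives the Jensen's-inequality argument of Corollary 3.2.8.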
We next analyze the control problem over a restricted subclass of stationary policies.
For k ≥ 0, let Φk ⊂ Φ represent the class of stationary policies φ such that the time of
the first decision epoch at which full system inspection is initiated is less than or equal to
k∆ with probability 1, i.e. M(φ) ≤ k∆. Then, by the dynamic programming algorithm
(e.g. [7]), the value function for the restricted control problem
Vk(π) = inf_{φ∈Φk} Eπ[TC(φ) − λ CL(φ)]   (3.2.5)

satisfies the dynamic equations

V0(π) = CI + CP π − λ(TI + TP π),
Vk(π) = min{V^1_k(π), V^2_k(π), V^3_k(π)},   (3.2.6)
where

V^1_k(π) = CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt
           + (CF − λTF)(1 − R(∆|π)) + R(∆|π) Vk−1(Π1(∅, π)),

V^2_k(π) = CS + CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt   (3.2.7)
           + (CF − λTF)(1 − R(∆|π)) + R(∆|π) ∑_{y∈E} Vk−1(Π1(y, π)) g(y, π),

V^3_k(π) = CI + CP π − λ(TI + TP π),

and

g(y, π) = [d1y p11(∆)(1 − π) + d2y (p12(∆)(1 − π) + p22(∆)π)] / R(∆|π).   (3.2.8)
The first term V^1_k(π) in (3.2.7) is the expected ‘cost’ if action 1 (do nothing) is chosen,
and the decision maker runs the system for one period, updates the state probability
Π1(∅, π) using equation (3.2.4), and then continues optimally with k − 1 periods left.
The second term V^2_k(π) is the expected ‘cost’ if action 2 (take a sample) is chosen, and
the decision maker runs the system for one period, collects a sample Y1 = y, updates
the state probability Π1(y, π) using equation (3.2.3), and then continues optimally with
k − 1 periods left. The third term V^3_k(π) is the expected ‘cost’ if action 3 (full inspection)
is chosen, and the decision maker stops the process for full system inspection, followed
possibly by preventive maintenance.
It then follows from equations (3.2.6) - (3.2.8) that the restricted value functions Vk
have the following property.
Lemma 3.2.1. For each k ≥ 0, Vk(π) is a concave function of π.
Proof. We prove this lemma using mathematical induction. For k = 1, substituting
equations (3.2.2) - (3.2.4) into (3.2.7) shows that V^1_1(π), V^2_1(π), V^3_1(π) are linear, and
hence concave, functions of π. Assume that for some k > 0, Vk(π) is a concave function
of π. We want to show that Vk+1(π) is also a concave function of π. Since the min
operator preserves concavity and R(∆|π) is a linear function of π, it suffices to show
that R(∆|π)Vk(Π1(∅, π)) and R(∆|π) ∑_{y∈E} Vk(Π1(y, π)) g(y, π) are concave functions of
π. Fix arbitrary π1, π2, α ∈ [0, 1]. Then by equation (3.2.4),

Π1(∅, απ1 + (1 − α)π2) = [αR(∆|π1) / (αR(∆|π1) + (1 − α)R(∆|π2))] Π1(∅, π1)
                         + [(1 − α)R(∆|π2) / (αR(∆|π1) + (1 − α)R(∆|π2))] Π1(∅, π2).
Then by concavity of Vk,

R(∆|απ1 + (1 − α)π2) Vk(Π1(∅, απ1 + (1 − α)π2))
    ≥ αR(∆|π1) Vk(Π1(∅, π1)) + (1 − α)R(∆|π2) Vk(Π1(∅, π2)),

which shows that R(∆|π)Vk(Π1(∅, π)) is a concave function of π. Similarly, by equation
(3.2.3), for each y ∈ E,

Π1(y, απ1 + (1 − α)π2) = [αg(y, π1)R(∆|π1) / (αg(y, π1)R(∆|π1) + (1 − α)g(y, π2)R(∆|π2))] Π1(y, π1)
                        + [(1 − α)g(y, π2)R(∆|π2) / (αg(y, π1)R(∆|π1) + (1 − α)g(y, π2)R(∆|π2))] Π1(y, π2).

Then by concavity of Vk,

R(∆|απ1 + (1 − α)π2) ∑_{y∈E} Vk(Π1(y, απ1 + (1 − α)π2)) g(y, απ1 + (1 − α)π2)
    ≥ ∑_{y∈E} (αVk(Π1(y, π1)) g(y, π1)R(∆|π1) + (1 − α)Vk(Π1(y, π2)) g(y, π2)R(∆|π2))
    = αR(∆|π1) ∑_{y∈E} Vk(Π1(y, π1)) g(y, π1) + (1 − α)R(∆|π2) ∑_{y∈E} Vk(Π1(y, π2)) g(y, π2),

which shows that R(∆|π) ∑_{y∈E} Vk(Π1(y, π)) g(y, π) is a concave function of π. Thus, by
mathematical induction, for each k > 0, Vk(π) is a concave function of π.
We also have the following lower bound on the family of restricted value functions
(Vk(π)).
Lemma 3.2.2. The restricted value functions Vk(π) are uniformly bounded from below:

Vk(π) ≥ −λ(∆ + TF) / (1 − R(∆|0)).   (3.2.9)
Proof. We prove inequality (3.2.9) using mathematical induction. For k = 0, it is clear
that V0(π) ≥ −λ(∆ + TF)/(1 − R(∆|0)). Assume that for some k ≥ 0,
Vk(π) ≥ −λ(∆ + TF)/(1 − R(∆|0)). Then, it follows that

V^1_{k+1}(π) = CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt
              + (CF − λTF)(1 − R(∆|π)) + R(∆|π) Vk(Π1(∅, π))
           ≥ −λ(∆ + TF) − [λ(∆ + TF)/(1 − R(∆|0))] R(∆|π)
           ≥ −λ(∆ + TF) − [λ(∆ + TF)/(1 − R(∆|0))] R(∆|0)
           = −λ(∆ + TF)/(1 − R(∆|0)),

and

V^2_{k+1}(π) = CS + CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt
              + (CF − λTF)(1 − R(∆|π)) + R(∆|π) ∑_{y∈E} Vk(Π1(y, π)) g(y, π)
           ≥ −λ(∆ + TF) − [λ(∆ + TF)/(1 − R(∆|0))] R(∆|π) ∑_{y∈E} g(y, π)
           = −λ(∆ + TF) − [λ(∆ + TF)/(1 − R(∆|0))] R(∆|π)
           ≥ −λ(∆ + TF) − [λ(∆ + TF)/(1 − R(∆|0))] R(∆|0)
           = −λ(∆ + TF)/(1 − R(∆|0)).

Since V^3_{k+1}(π) = V0(π) ≥ −λ(∆ + TF)/(1 − R(∆|0)), it follows that
Vk+1(π) ≥ −λ(∆ + TF)/(1 − R(∆|0)). Thus, by mathematical induction,
Vk(π) ≥ −λ(∆ + TF)/(1 − R(∆|0)) for all k, π.
Lemmas 3.2.1 and 3.2.2 allow us to characterize the infinite horizon value function
V defined in (3.1.8). For each k, since Φk ⊂ Φk+1, by definition of Vk given in equation
(3.2.5) it follows that Vk(π) ≥ Vk+1(π). Then, by Lemma 3.2.2, since the restricted value
functions (Vk) are uniformly bounded from below, limk→∞ Vk = V exists, and by Lemma
3.2.1, the value function V is concave and bounded, and it satisfies the following dynamic
optimality equation (e.g. [7]), which gives us our first important structural result:
Theorem 3.2.3. The infinite horizon value function defined in equation (3.1.8) is ob-
tained as the limit V (π) = limk→∞ Vk(π). Furthermore, V (π) is a concave, bounded
function of π, satisfying the dynamic optimality equation
V(π) = min{V^1(π), V^2(π), V^3(π)},   (3.2.10)
where

V^1(π) = CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt
         + (CF − λTF)(1 − R(∆|π)) + R(∆|π) V(Π1(∅, π)),

V^2(π) = CS + CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt   (3.2.11)
         + (CF − λTF)(1 − R(∆|π)) + R(∆|π) ∑_{y∈E} V(Π1(y, π)) g(y, π),

V^3(π) = CI + CP π − λ(TI + TP π).
It then follows that the value function V is also non-decreasing.
Corollary 3.2.4. The infinite horizon value function V (π) is a non-decreasing function
of π.
Proof. By Theorem 3.2.3, the value function V(π) = min{V^1(π), V^2(π), V^3(π)} is a concave
function of π. Furthermore, V^3(π) = CI + CP π − λ(TI + TP π) is non-decreasing
in π if and only if λ ≤ CP/TP. However, this inequality must hold; for if λ > CP/TP,
there would be no need to monitor and control the system, since it would always be
optimal to initiate preventive maintenance indefinitely, which gives a long-run average cost
of CP/TP. Thus, V^3(π) = CI + CP π − λ(TI + TP π) is linear and non-decreasing in π. If
there existed π1 < π2 such that V(π1) > V(π2), then necessarily V(1) < V^3(1), which is
a contradiction, since we show in the proof of Theorem 3.2.6 that V(1) = V^3(1). Thus,
for all π1 < π2, V(π1) ≤ V(π2), i.e. the value function V(π) is a non-decreasing function
of π, which completes the proof.
We next prove a theorem, which makes use of the result in the classical paper of [6].
Theorem 3.2.5. Any policy φ∞ ∈ Φ that never stops the process to initiate full system
inspection, i.e. M(φ∞) = +∞, is not optimal.
Proof. Consider an age-based policy φn ∈ Φ that initiates full system inspection at time
n∆. From renewal theory, the long-run expected average cost per unit time for this policy
is given by

g(n) = [CF p13(n∆) + CI p11(n∆) + (CI + CP) p12(n∆) + CW ∫_0^{n∆} p12(t) dt + CS E[N(φn)]]
       / [E[n∆ ∧ ξ] + TF p13(n∆) + TI p11(n∆) + (TI + TP) p12(n∆)].   (3.2.12)

Thus, to prove the claim, it suffices to show that

arg min_n g(n) < +∞.   (3.2.13)
To show (3.2.13), we derive an upper bound for arg minn g(n) by considering a related
process in which we remove all incentive to stop the process early, so that full system
inspection must be done at a later time. In particular, consider a related process in which
full system inspection costs CI + CP , whether the system is found to be in healthy or
warning state, and all maintenance actions (corrective, inspection and preventive) take
0 time units. We furthermore assume that there is no penalty to run the system longer,
so that CW = CS = 0. Then, if preventive maintenance is scheduled at time n∆, the
expected average cost for this process is given by

b(n) = [CF p13(n∆) + (CI + CP)(1 − p13(n∆))] / E[n∆ ∧ ξ],   (3.2.14)

and clearly

arg min_n g(n) ≤ arg min_n b(n).
To complete the proof, we show that arg min_n b(n) < +∞, which implies equation
(3.2.13). Since we have assumed q23 > q13, the failure rate of ξ is increasing. We have
also assumed that CF > CI + CP. [6] showed that under these assumptions, there exists a
positive real value t∗ < +∞ such that t∗ is the unique minimizer of b(t). For our problem,
arg min_n b(n) is required to be integer-valued. However, since t∗ is a unique minimizer,
the function b(t) is increasing for t > t∗. Thus, it follows that arg min_n b(n) ≤ ⌈t∗⌉ < +∞,
which completes the proof.
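For the no-sampling case (so E[N(φn)] = 0), g(n) of (3.2.12) can be evaluated numerically from the closed-form transition probabilities. The sketch below does this under the illustrative rates of Section 3.3.1, with the costs chosen so that CF > CI + CP holds strictly, as the proof assumes, and locates the finite minimizer whose existence the proof establishes:

```python
import numpy as np

Q12, Q13, Q23 = 0.12, 0.05, 0.50            # illustrative rates
CF, CI, CP, CW = 300.0, 65.0, 20.0, 30.0    # illustrative; CF > CI + CP strictly
TF, TI, TP, DT = 1.0, 1.0, 1.0, 1.0

V1, V2 = Q12 + Q13, Q23

def p11(t): return np.exp(-V1 * t)
def p12(t): return Q12 * (np.exp(-V2 * t) - np.exp(-V1 * t)) / (V1 - V2)
def p13(t): return 1.0 - p11(t) - p12(t)

def age_cost(n, m=4000):
    """g(n) of (3.2.12) for the inspect-at-n*Delta policy with no sampling."""
    T = n * DT
    dt = T / m
    ts = dt * (np.arange(m) + 0.5)           # midpoint-rule quadrature nodes
    warn_time = np.sum(p12(ts)) * dt         # E[time spent in warning state]
    e_cycle = np.sum(1.0 - p13(ts)) * dt     # E[min(n*Delta, xi)]
    num = CF * p13(T) + CI * p11(T) + (CI + CP) * p12(T) + CW * warn_time
    den = e_cycle + TF * p13(T) + TI * p11(T) + (TI + TP) * p12(T)
    return num / den

best_n = min(range(1, 100), key=age_cost)    # finite by Theorem 3.2.5
```

Since b(t), and hence g(n), eventually increases toward its limiting value, scanning a finite horizon suffices.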
The optimal sampling and maintenance policy is described by the following Theorem.
Theorem 3.2.6. The optimal sampling and maintenance policy φ∗ ∈ Φ is characterized
by three critical thresholds 0 ≤ θL ≤ θU ≤ η ≤ 1. In particular, at decision epoch n:
1. If Πn < θL, do nothing and run the system until the next decision epoch n+ 1.
2. If θL ≤ Πn < θU , take a sample.
3. If θU ≤ Πn < η, do nothing and run the system until the next decision epoch n+ 1.
4. If Πn ≥ η, initiate full system inspection, followed possibly by preventive mainte-
nance.
5. Corrective maintenance is carried out immediately upon system failure.
Proof. We first show that for π = 1, V^3(1) < V^1(1) < V^2(1). We start with the second
inequality V^1(1) < V^2(1). By equation (3.2.11),

V^1(1) − V^2(1) = R(∆|1)V(Π1(∅, 1)) − R(∆|1) ∑_{y∈E} V(Π1(y, 1)) g(y, 1) − CS
               = R(∆|1)V(1) − R(∆|1)V(1) ∑_{y∈E} g(y, 1) − CS
               = −CS
               < 0,

which implies V^1(1) < V^2(1). We next show the first inequality V^3(1) < V^1(1) using
mathematical induction. The inequality is equivalent to V^3(1) = V(1). For k = 1, we
assume V^3_1(1) > V1(1) and draw a contradiction. Since it is not optimal to stop the
process to initiate full system inspection when π = 0, linearity of V^1_1(π), V^2_1(π), V^3_1(π)
implies that V^3_1(π) > V1(π) for all 0 ≤ π ≤ 1. Since for each k, Vk(π) ≥ Vk+1(π), it
follows that the limit V(π) = limk→∞ Vk(π) < V^3_1(π), and the policy that never stops the
process is optimal, which contradicts Theorem 3.2.5. Whence, V^3_1(1) = V1(1), and by
equation (3.2.6),

CI + CP − λ(TI + TP) ≤ CW ∫_0^∆ p22(t) dt − λ ∫_0^∆ R(t|1) dt
                       + (CF − λTF)(1 − R(∆|1)) + R(∆|1)(CI + CP − λ(TI + TP)).

Suppose now that for some k > 0, V^3_k(1) = Vk(1). Using the above inequality, it follows
that

Vk+1(1) = min{V^1_{k+1}(1), V^2_{k+1}(1), V^3_{k+1}(1)}
        = min{V^1_{k+1}(1), V^3_{k+1}(1)}
        = CI + CP − λ(TI + TP)
        = V^3_{k+1}(1),

which completes the inductive step. Therefore the limit V(1) = limk→∞ Vk(1) = V^3(1).
Thus, for π = 1, V^3(1) < V^1(1) < V^2(1). Since for π = 0, V^1(0) < V^2(0) < V^3(0),
the above inequalities and equation (3.2.10) imply that the region {π : V(π) = V^3(π)}
is a convex subset of [0, 1] of the form [η, 1], for some 0 ≤ η ≤ 1, and the region
{π : V(π) = V^2(π)} is a convex subset of [0, 1] of the form [θL, θU], for some 0 ≤ θL ≤
θU ≤ η, which completes the proof.
Theorem 3.2.6 shows that the optimal control policy can be represented as a type of
control chart, which monitors the probability Πn that the system is in a warning state.
The intuitive interpretation of the three critical thresholds (θL, θU , η) is as follows. When
the probability that the system is in a warning state is below the lower sampling limit
Πn ≤ θL, the decision maker has high confidence that the system is in healthy state 1,
and therefore has little reason to take an expensive sample through condition monitoring
to confirm this belief. Similarly, when the state probability is above the upper sampling
limit Πn ≥ θU , the decision maker has high confidence that the system is in warning
state 2, and therefore also has little reason to take the sample. It is only when the state
probability satisfies θL ≤ Πn < θU that the decision maker is unsure about the system’s
condition and is willing to pay for a sample to get a better idea about its health. However,
once the state probability exceeds η, the risk of system failure and of incurring an expensive
corrective maintenance cost is too high, so the decision maker should stop the process
and initiate full system inspection, followed possibly by preventive maintenance.
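A sketch of the resulting decision rule (the default threshold values below are the illustrative optima computed in Section 3.3.1; in practice they come out of the algorithm of Section 3.3):

```python
def policy(pi, theta_L=0.05, theta_U=0.60, eta=0.75):
    """Four-region rule of Theorem 3.2.6 on the posterior warning probability.

    Returns 1 (do nothing), 2 (take a sample), or 3 (full system inspection);
    corrective maintenance upon failure is handled outside this rule."""
    if pi >= eta:
        return 3          # risk of failure too high: stop and inspect
    if theta_L <= pi < theta_U:
        return 2          # uncertain about the state: pay for a sample
    return 1              # confident either way: run one more period
```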
Remark 3.2.7. It is important to note that practitioners can also use the control policy
described in Theorem 3.2.6 as a tool for planning maintenance activities in advance.
For example, if θU ≤ Πn < η, the optimal action is to do nothing and run the system
until the next decision epoch n + 1. However, since no sample is taken, the state
probability at the next decision epoch,

Πn+1 = Πn+1(∅, π) = (p12(∆)(1 − π) + p22(∆)π) / [(1 − p13(∆))(1 − π) + (1 − p23(∆))π],

is the deterministic function given by equation (3.2.4). Therefore, the next maintenance
action (full system inspection) can be scheduled to take place

T = inf{m∆ : (p12(m∆)(1 − π) + p22(m∆)π) / [(1 − p13(m∆))(1 − π) + (1 − p23(m∆))π] ≥ η}

time units from now. Planning maintenance activities in advance is particularly useful
in practice since suspending a system from operation for full inspection and maintenance
may require significant preparation.
Intuitively, one would expect that if the sampling cost CS = 0, we should always take
a sample. On the other hand, if the sampling cost is greater than the cost of full system
inspection and preventive maintenance, i.e. CS > CI + CP , one would expect that we
should never take a sample. To conclude this section, we show using Jensen’s inequality
(e.g. [9]) that this intuition is mathematically correct.
Corollary 3.2.8. If the sampling cost CS = 0, then θL = 0 and θU = η. In other words,
before full system inspection is initiated, i.e. for all π < η, it is always optimal to take
a sample, i.e. φ∗(π) = 2. On the other hand, if the sampling cost CS > CI + CP , then
θL = θU = η. In other words, before full system inspection is initiated, i.e. for all π < η,
it is never optimal to take a sample, i.e. φ∗(π) = 1.
Proof. By equation (3.2.11),

V^1(π) − V^2(π) = R(∆|π)V(Π1(∅, π)) − R(∆|π) ∑_{y∈E} V(Π1(y, π)) g(y, π) − CS.

Also, equations (3.2.3), (3.2.4) and (3.2.8) imply

Π1(∅, π) = ∑_{y∈E} Π1(y, π) g(y, π).

Thus, by concavity of V, it follows by Jensen’s inequality that for all 0 ≤ π ≤ 1,

R(∆|π)V(Π1(∅, π)) ≥ R(∆|π) ∑_{y∈E} V(Π1(y, π)) g(y, π).

Thus, if CS = 0, then V^1(π) ≥ V^2(π) for all 0 ≤ π ≤ 1, and it is always optimal to sample
when π < η.

For the case in which CS > CI + CP, since we know by Corollary 3.2.4 that the value
function V(π) is a non-decreasing function of π, it follows that for all 0 ≤ π ≤ 1,
R(∆|π)V(Π1(∅, π)) − R(∆|π) ∑_{y∈E} V(Π1(y, π)) g(y, π) ≤ CI + CP. Thus, if CS > CI + CP,
then V^1(π) < V^2(π), i.e. it is never optimal to take a sample.
In the next section, we develop an iterative computational algorithm that determines
the optimal values of the critical thresholds (θ∗L, θ∗U, η∗) and the minimum long-run
expected average cost per unit time λ∗. We also provide numerical comparisons with other
suboptimal policies that illustrate the benefits of the joint optimization of sampling and
maintenance.
3.3 Computation of the Optimal Policy
In this section, we develop a computational algorithm that determines the optimal values
of the critical thresholds (θ∗L, θ∗U, η∗) and the long-run expected average cost per unit time
λ∗. We also provide numerical comparisons with other suboptimal policies that illustrate
the benefits of the joint optimization of sampling and maintenance.

The computational algorithm is based on the λ−minimization technique ([3]) and the
(monotone) convergence of the restricted value functions Vk → V.
The Algorithm
Step 1. Choose ε > 0 and lower and upper bounds λ_lo ≤ λ ≤ λ_hi.

Step 2. Put λ = (λ_lo + λ_hi)/2, and V^λ_0(π) = CI + CP π − λ(TI + TP π), k = 1.

Step 3. Calculate V^λ_k using the dynamic equations (3.2.6) and (3.2.7). Stop the
iteration of V^λ_k when ||V^λ_k − V^λ_{k−1}|| ≤ ε.

Step 4. If V^λ_k(0) < −ε, put λ_hi = λ and go to Step 2.
If V^λ_k(0) > ε, put λ_lo = λ and go to Step 2.
If |V^λ_k(0)| ≤ ε, put λ∗ = λ and stop.
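A compact Python rendering of these steps (the grid discretization, midpoint quadrature, and linear interpolation of the value function are implementation choices, the model data are the illustrative values of Section 3.3.1, and the thesis's own implementation was in MATLAB):

```python
import numpy as np

# Illustrative data of Section 3.3.1.
Q12, Q13, Q23 = 0.12, 0.05, 0.50
D = np.array([[0.90, 0.05, 0.05, 0.00],
              [0.00, 0.05, 0.10, 0.85]])
CW, CF, CS, CI, CP = 30.0, 85.0, 2.0, 65.0, 20.0
TF, TI, TP, DT = 1.0, 1.0, 1.0, 1.0

v1, v2 = Q12 + Q13, Q23
_ts = DT * (np.arange(400) + 0.5) / 400               # midpoint quadrature nodes
_e1, _e2 = np.exp(-v1 * _ts), np.exp(-v2 * _ts)
_q12 = Q12 * (_e2 - _e1) / (v1 - v2)
I11, I12, I22 = _e1.mean() * DT, _q12.mean() * DT, _e2.mean() * DT
P11 = np.exp(-v1 * DT)
P12 = Q12 * (np.exp(-v2 * DT) - np.exp(-v1 * DT)) / (v1 - v2)
P22 = np.exp(-v2 * DT)
GRID = np.linspace(0.0, 1.0, 101)

def backup(V, lam):
    """One application of (3.2.6)-(3.2.7) to V, the values of V_{k-1} on GRID."""
    pi = GRID
    R = (P11 + P12) * (1 - pi) + P22 * pi             # R(DT|pi)
    w = P12 * (1 - pi) + P22 * pi                     # P(warning and alive)
    common = (CW * (I12 * (1 - pi) + I22 * pi)
              - lam * ((I11 + I12) * (1 - pi) + I22 * pi)
              + (CF - lam * TF) * (1.0 - R))
    val1 = common + R * np.interp(w / R, GRID, V)     # do nothing
    val2 = common + CS                                # take a sample
    for y in range(D.shape[1]):
        den = D[0, y] * P11 * (1 - pi) + D[1, y] * w  # g(y,pi) * R(DT|pi)
        post = np.divide(D[1, y] * w, den, out=np.zeros_like(den), where=den > 0)
        val2 += den * np.interp(post, GRID, V)
    val3 = CI + CP * pi - lam * (TI + TP * pi)        # full inspection
    return np.minimum(np.minimum(val1, val2), val3), val1, val2, val3

def solve(eps=0.01, max_bisect=60):
    """Bisection on lambda (Steps 1-4) with the bounds of Lemma 3.3.1."""
    lo, hi = 0.0, CI / TI
    for _ in range(max_bisect):
        lam = 0.5 * (lo + hi)
        V = CI + CP * GRID - lam * (TI + TP * GRID)   # V_0
        while True:
            Vn, val1, val2, val3 = backup(V, lam)
            done = np.max(np.abs(Vn - V)) <= eps
            V = Vn
            if done:
                break
        if V[0] < -eps:
            hi = lam
        elif V[0] > eps:
            lo = lam
        else:
            break
    sample = GRID[V == val2]
    stop = GRID[V == val3]
    theta_L = sample.min() if sample.size else None
    theta_U = sample.max() if sample.size else None
    eta = stop.min() if stop.size else 1.0
    return lam, theta_L, theta_U, eta
```

On this instance the routine should land near the thesis's reported values (θ∗L = 0.05, θ∗U = 0.60, η∗ = 0.75, λ∗ = 17.82), with small differences attributable to the interpolation and quadrature choices.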
In the algorithm above, Step 3 and Theorem 3.2.3 imply that the restricted value
function V^λ_k approximates the value function V^λ for λ = λ∗. Step 4 and the λ−minimization
technique ([3]) imply that λ∗ is the optimal expected average cost. Furthermore, by
Theorem 3.2.3, the optimal value of the lower (resp. upper) sampling limit θ∗L (resp. θ∗U)
is the smallest (resp. largest) value of π such that Vk(π) = V^2_k(π), and η∗ is the smallest
value of π such that Vk(π) = V^3_k(π).
In the algorithm above, since λ > 0, a natural choice for the initial value of the lower
bound λ_lo is 0. However, it is not clear how one should choose the value of the initial
upper bound λ_hi. Fortunately, we have the following result for a feasible choice of the
initial upper bound.

Lemma 3.3.1. The optimal average cost is bounded by 0 < λ∗ ≤ CI/TI. Thus, in the
algorithm given above, λ_lo = 0 and λ_hi = CI/TI are feasible initial values for the lower
and upper bounds, respectively.
Proof. Consider an age-based policy φ0 ∈ Φ that initiates full system inspection
immediately at time 0. From renewal theory, it is clear that the long-run expected average
cost per unit time for this policy is given by

λ0 = (CI + CP Π0) / (TI + TP Π0) = CI/TI,

where the second equality follows since we have assumed that a new system is installed
at time 0, i.e. Π0 = P(X0 = 2) = 0. Thus, it follows that

λ∗ = inf_{φ∈Φ} EΠ0[TC(φ)] / EΠ0[CL(φ)] ≤ EΠ0[TC(φ0)] / EΠ0[CL(φ0)] =: λ0 = CI/TI.

Therefore, the optimal average cost is bounded by 0 < λ∗ ≤ CI/TI, which completes the
proof.
We next illustrate the use of the computational algorithm in the following subsection
and determine the optimal values of the critical thresholds (θ∗L, θ∗U, η∗) and the long-run
expected average cost per unit time λ∗ in a numerical example.
3.3.1 Constructing the Optimal Control Chart
In this subsection, we construct the cost-optimal control chart described in Theorem
3.2.6. Using the computational algorithm described above, the optimal values of the
critical thresholds (θ∗L, θ∗U, η∗) and the long-run expected average cost per unit time λ∗
are determined.
Consider the following transition rate matrix and state-observation matrix:

Q = [ −0.17   0.12   0.05          D = [ 0.90  0.05  0.05  0.00
         0   −0.50   0.50 ],             0.00  0.05  0.10  0.85 ].
         0      0      0  ]

Maintenance cost parameters are given by CW = 30, CF = 85, CS = 2, CI = 65, CP = 20,
and maintenance time parameters TF = TI = TP = ∆ = 1. We coded the computational
algorithm given above in MATLAB and obtained the following optimal values θ∗L = 0.05,
θ∗U = 0.60, η∗ = 0.75, with a minimum expected average cost λ∗ = 17.82. The algorithm
took 369.39 seconds to complete on an Intel Core 2 6420, 2.13 GHz machine with 2 GB of RAM.
To run the algorithm, we chose ε = 0.01 and discretized the interval 0 ≤ π ≤ 1,
considering values of π = 0, 0.01, 0.02, . . . , 1, so that ||V^λ_k − V^λ_{k−1}|| in Step 3, for example,
is calculated as max_{π=0,0.01,...,1} |V^λ_k(π) − V^λ_{k−1}(π)|. The value function is graphed in Figure 3.3.1.
Figure 3.3.1: The Graph of the Value Function V (π)
Theorem 3.2.6 implies that the optimal sampling and maintenance policy can be
represented as a control chart, which monitors the probability Πn that the system is in
a warning state. To illustrate the use of such a control chart we plot a sample path
realization of (Πn) in Figure 3.3.2 below.
Figure 3.3.2: The Optimal Sampling and Maintenance Policy Represented as a Control Chart
Figure 3.3.2 shows that no samples should be taken from decision epoch 0 to decision
epoch 2. From decision epochs 3 to 6, the posterior probability θ∗L ≤ Πn < θ∗U so
the optimal action is to take a sample. At decision epoch 7, the posterior probability
θ∗U ≤ Πn < η∗, so again it is optimal to do nothing. At decision epoch 8, Πn ≥ η∗ so the
optimal action is full system inspection, followed possibly by preventive maintenance.
Such a control chart has direct practical value as it can be readily implemented for
online decision-making. Furthermore, since the monitored statistic is univariate and three
critical thresholds have straightforward and intuitive interpretation, decisions that are
made can be easily justified and explained at a managerial level.
In the next subsection, we provide numerical comparisons with other policies that
illustrate the benefits of the joint optimization of sampling and maintenance.
3.3.2 Comparison with Other Policies
In this subsection, we compare the performance of our jointly optimal sampling and
maintenance policy with the two most widely considered sampling policies: the policy
that never takes a sample at any decision epoch, and the policy that always takes a
sample at every decision epoch. Under each of these suboptimal sampling policies, the
decision maker still has the freedom to initiate full system inspection at any time. On
one hand, the policy that never takes a condition monitoring sample incurs no sampling
costs but also has the least amount of information. On the other hand, the sampling
policy that always takes a sample at every decision epoch carries the most information,
but also incurs the highest sampling cost. Our joint sampling and maintenance policy is
the optimal balance between having the largest amount of information at the least cost.
It is well known that the policy that never takes a sample at any decision epoch is
nothing more than the classical age-based policy (e.g. [6]). Within our framework, this
policy corresponds to the special case where θL = θU = η. Similarly, the policy that
always takes a sample at every decision epoch corresponds to the special case where
θL = 0 and θU = η. This type of control policy is known as a Bayesian control chart,
which was the focus of Chapter 2.
To facilitate our discussion, we refer to the policy that never takes a sample as an
N−Policy, the policy that always takes a sample as an A−Policy, and our jointly optimal
policy of Theorem 3.2.6 as a J−Policy. For this comparison, we consider the following
transition rate matrix and state-observation matrix:

Q = [ −0.23   0.12   0.11          D = [ 0.80  0.10  0.05  0.05
         0   −0.11   0.11 ],             0.10  0.05  0.00  0.85 ],
         0      0      0  ]

and model parameters CW = 70, CF = 110, CS = 8, CI = 55, CP = 55, and TF = TI =
TP = ∆ = 1. We obtain the following results in Table 3.3.1.
Table 3.3.1: Comparison with Suboptimal Policies

            N−Policy   A−Policy   J−Policy
θ∗L           0.27        0          0.09
θ∗U           0.27        0.34       0.29
η∗            0.27        0.34       0.29
λ∗           25.65       25.15      23.01
Run Time      5.97       78.64     364.95
Table 3.3.1 shows that the jointly optimal J−Policy performs substantially better
than both the optimal N−Policy and the optimal A−Policy. In particular, Table 3.3.1
shows that using the optimal J−Policy gives an expected 10.29% cost saving over the
optimal N−Policy, and an expected 8.51% cost saving over the optimal A−Policy.
Naturally, determining the optimal threshold values (θ∗L, θ∗U, η∗) for the J−Policy takes
longer than determining the optimal threshold values for the optimal N−Policy and
the optimal A−Policy. However, in practice, since these computations are typically done
off-line, a total run time of a few minutes is surely worth the large cost savings obtained
by using the optimal J−Policy. It is also interesting to note that in this example, the
optimal threshold η∗ for full system inspection is quite low for all three policies. This is
due to the fact that the cost of corrective maintenance CF = 110 is much higher than
the cost of system inspection CI = 55 and preventive maintenance CP = 55. Therefore,
it is more beneficial to perform full system inspection more frequently than to run
the system longer and risk costly corrective maintenance due to failure.
We next analyze the sensitivity of the optimal policy to different values of the sampling
cost CS. In light of Corollary 3.2.8, we already know that the optimal J−Policy coincides
with the optimal A−Policy when CS = 0, and with the optimal N−Policy when CS >
CI + CP = 110. We obtain the following results in Table 3.3.2.
Table 3.3.2: Optimal Expected Average Cost λ∗ for Varying Sampling Costs CS
CS N−Policy A−Policy J−Policy
0 25.65 19.63 19.63
2 25.65 21.01 20.61
4 25.65 22.39 21.70
6 25.65 23.77 22.38
8 25.65 25.15 23.01
10 25.65 26.53 23.72
12 25.65 27.91 24.23
14 25.65 29.29 24.71
16 25.65 30.67 25.08
18 25.65 32.05 25.37
20 25.65 33.43 25.58
22 25.65 34.81 25.62
24 25.65 36.19 25.65
Table 3.3.2 provides important managerial insight into the operational value of con-
dition monitoring information and technologies. This insight is best understood visually,
so we plot the optimal expected average costs of Table 3.3.2 below in Figure 3.3.3.
Figure 3.3.3: Graphical Illustration of the Optimal Expected Average Cost λ∗ for Varying Sampling Costs CS

The dashed horizontal line in Figure 3.3.3 is the expected average cost for the optimal
N−Policy for different values of the sampling cost CS. The dotted increasing line is the
expected average cost for the optimal A−Policy, and the solid increasing curve is the
expected average cost for the optimal J−Policy. Figure 3.3.3 shows that the optimal
J−Policy coincides with the optimal A−Policy when CS = 0, and with the optimal
N−Policy when CS ≥ 24.
The optimal A−Policy is better than the optimal N−Policy from CS = 0 to
around CS = 9, after which the optimal N−Policy is better than the optimal
A−Policy. This implies that once the sampling cost CS exceeds 9, it is better to never
take a sample and remain ignorant of the state of the system than it is to incur regular
condition monitoring sampling costs to get a better idea of the system state.
Although the optimal J−Policy is better than both the optimal N−Policy and the
optimal A−Policy for all values of CS, its benefits are greatest around CS = 9, i.e.,
near the point at which the optimal N−Policy overtakes the optimal A−Policy. On the
other hand, the benefits of using the optimal J−Policy become quite marginal when CS
is close to 0 or 24. This suggests that a manager is unlikely to be willing to invest in
condition monitoring technologies if the sampling cost CS is close to
24. Similarly, a manager should choose to sample the system at every decision epoch to
simplify the scheduling of sampling and maintenance activities if the sampling cost CS
is close to 0.
3.4 Conclusions and Future Research
In this chapter, a joint sampling and control problem under partial observations has been
considered. The problem has been formulated as a partially observable Markov decision
process. The objective was to characterize the form of the optimal sampling and main-
tenance policy that minimizes the long-run expected average cost per unit time. It was
shown that the optimal control policy can be represented as a control chart with three
critical thresholds, which monitors the posterior probability that the system is in a so-
called warning state. Such a control chart has direct practical value as it can be readily
implemented for online decision-making. Furthermore, since the monitored statistic and
three critical thresholds have straightforward and intuitive interpretation, decisions can
be easily justified and explained at a managerial level. It was also shown that the struc-
ture of the optimal policy allows practitioners to plan and schedule maintenance activities
into the future. A cost comparison with suboptimal policies was also carried out,
which illustrates the benefits of the joint optimization of sampling and control. It was
found that the jointly optimal sampling and maintenance policy performed substantially
better than existing suboptimal policies. Numerical results indicate that the advantage
of using the jointly optimal sampling and maintenance policy becomes less substantial
for both very small and large values of the sampling cost CS.
There are a number of exciting extensions and topics for future research. We have
considered a system that can be characterized by three distinguishable states: a healthy
state (state 1), a warning state (state 2), and a failure state (state 3). In practice, this
assumption is reasonable for many applications. As considered in Chapter 2, it may be
worth investigating how much additional value would be gained by considering the
general N > 3 state model. Such an extension would lead to both interesting theoretical
and practical challenges. The main challenge stems from the fact that the sufficient
statistic for decision-making is no longer a univariate statistic, as is the case for the
three state model. In fact, the sufficient statistic for decision-making would now be an
(N − 1)−dimensional vector representing the posterior probability distribution of the
system state. Thus, the optimal sampling and maintenance policy can no longer be
visualized and represented as a control chart.
The numerical results of Section 3.3 showed that our algorithm took over 6 minutes
to run. Although this is not unreasonably long, there is still much
room for improvement. In particular, a closer look at Theorem 3.2.6 reveals that the
result has further computational value. Recall that the original stochastic control problem
defined in (3.1.7) was transformed to an equivalent parameterized stochastic control prob-
lem (with parameter λ) with an additive objective function using the λ−minimization
technique. However, the characterization given in Theorem 3.2.6 implies that the op-
timal control policy is no longer parameterized by λ, and is completely determined by
the ordered triple (θL, θU , η). This is potentially a useful property from a computational
point of view, since it is possible to develop an algorithm that directly finds the optimal
values of (θ∗L, θ∗U, η∗) that minimize the original objective function defined in (3.1.7).
Such an algorithm would likely be faster than the algorithm presented in Section 3.3, as
one would now be solving a single optimization problem, as opposed to solving multiple
stochastic control problems for different values of λ.
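Such a direct algorithm could, for instance, search over the feasible ordered triples directly. A minimal sketch with a hypothetical stand-in for the average-cost evaluation (the `avg_cost` function below is a made-up toy, not the thesis's objective; in practice it would be computed by solving or simulating the controlled system):

```python
import itertools

def avg_cost(theta_L, theta_U, eta):
    """Hypothetical stand-in for the long-run expected average cost of the
    policy (theta_L, theta_U, eta); this toy formula is only a placeholder
    used to exercise the search."""
    return 19.6 + (theta_L - 0.2) ** 2 + (theta_U - 0.6) ** 2 + (eta - 0.8) ** 2

def direct_search(step=0.05):
    """Coarse grid search over ordered triples theta_L <= theta_U <= eta,
    optimizing the original objective directly instead of solving a family
    of lambda-parameterized problems."""
    grid = [round(i * step, 10) for i in range(int(1 / step) + 1)]
    best, best_cost = None, float("inf")
    for tL, tU, e in itertools.product(grid, repeat=3):
        if not tL <= tU <= e:      # respect the ordering of the thresholds
            continue
        cost = avg_cost(tL, tU, e)
        if cost < best_cost:
            best, best_cost = (tL, tU, e), cost
    return best, best_cost

best, cost = direct_search()
print(best)  # the grid point closest to the toy minimizer (0.2, 0.6, 0.8)
```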
Chapter 4
Parameter Estimation for Stochastically Failing Systems
In this chapter, we consider a parameter estimation problem for a partially observable
system subject to random failure. We assume that two types of data histories are avail-
able: histories that end with observable system failure, and censored data histories that
end when the system has been suspended from operation but has not failed. Given any
number of failure and suspension histories, our objective is to determine the maximum
likelihood estimates (MLEs) of the model parameters.
In recent years, considerable research has been done on the analysis and control of
maintenance models. Surprisingly, little research has been done on parameter estimation for
partially observable systems subject to random failure. Although some research has con-
sidered estimation for partially observable systems in the hidden Markov model (HMM)
framework, few researchers have considered the inclusion of failure information, which is
present in almost every maintenance application. For example, Ryden [64], Douc et al.
[22], Genon-Catalot and Laredo [25], and Hamilton [27] considered maximum likelihood
estimation for hidden Markov models in discrete time; however, the results of their papers
are not applicable to maintenance systems for which system failure is observable.
Chapter 4. Parameter Estimation for Stochastic Systems 61
Asmussen et al. [2] considered an estimation method using the EM algorithm for
phase-type distributions. Their paper has some similarities with the model considered
in this chapter. In particular, in both papers, the system state follows a continuous-time
homogeneous Markov chain, and the time to system failure is observable and follows
a phase-type distribution. However, since our model is for maintenance applications, we
also consider a stochastically related observation process that is sampled at equidistant
time points, which gives partial information about the system state. This additional level
of complexity was not considered by Asmussen et al. [2]. Roberts and Ephraim [61]
considered a parameter estimation problem for continuous-time Markov chains that is
partially observed through a discrete-time observation process. There are two distinct
differences between their model and ours. The first is that the observation process they
consider is a univariate process, whereas we consider multivariate observations. The sec-
ond difference is that Roberts and Ephraim’s [61] model of ion-channel currents does
not have the notion of observable failure information, a feature that is found in mainte-
nance applications. Recently, Ghasemi et al. [24] considered parameter estimation for
a maintenance model with partial observations. In their paper, the failure rate of the
system follows Cox’s proportional hazards model, whereas our failure time is governed
by a phase-type distribution. The authors assumed a discrete time hidden Markov model
with a univariate, finite-valued observation process, whereas our hidden state process is
a continuous-time process with a multivariate, Rd-valued observation process.
We have found through our work with diagnostic data such as spectrometric oil data
and vibration data that it is usually sufficient to consider only two operational states: a
healthy state and an unhealthy state. This is because in many cases, the system moves
through two distinct phases of operation. In the first and longer phase, the system
operates under normal conditions, and the observations behave in a stationary manner.
Although system degradation can be gradual, it is usually not until degradation has
exceeded a certain level that the behaviour of the condition monitoring observations
changes substantially. Furthermore, in many applications it may not be desirable to
define multiple intermediate degradation states if the objective is to run the system as
long as possible. This is because if the system is considered healthy or normal while
degradation is below a critical warning level, the decision maker will initiate a maintenance
action only once the system experiences severe degradation that can cause failure. At this
point, the system enters the second and shorter phase, which we define
to be the warning or unhealthy state. It will be shown that the estimation problem of
the three-state model considered in this chapter can be solved by directly analyzing the
structure of the pseudo likelihood function. We will show that both the pseudo likelihood
function and the parameter updates in each iteration of the EM algorithm have explicit
formulas. This implies that each iteration of the EM algorithm can be performed with a
single computation, which leads to an extremely fast and simple estimation procedure.
This computational advantage is particularly attractive for practical applications.
We should note that in certain applications, gradual system degradation can be mod-
eled more realistically using a general N-state extension of our model. However, as shown
in Lin and Makis [47], explicit update formulas in the EM algorithm are not readily avail-
able, which is one of the major advantages of using the three-state model considered in
this chapter. In particular, Lin and Makis [47] considered an interesting maintenance
model with finite-valued observations and failure information, similar to the model con-
sidered in this chapter. Their objective was to derive a general recursive filter, which
is important mainly for on-line re-estimation. The authors were able to express the pa-
rameter updates in each iteration of the EM algorithm in terms of the recursive filter.
However, such an approach has been found to be quite computationally intensive and
difficult to implement when working with real data sets.
The remainder of the chapter is organized as follows. In §4.1, we present the models
of the state and observation processes. In §4.2, we discuss maximum likelihood estima-
tion using the EM algorithm and the pseudo likelihood function. We derive an explicit
expression for the pseudo likelihood function and provide update formulas for both the
state and observation parameters. In §4.3, we develop a numerical example using real
multivariate spectrometric oil data coming from the failing transmission units of heavy
hauler trucks, which illustrates the entire estimation procedure. §4.4 provides concluding
remarks and future research directions.
4.1 Model Formulation
We assume that a technical system’s condition can be categorized into one of three states:
a healthy or “good as new” state (state 1), an unhealthy or warning state (state 2), and a
failure state (state 3). In many real world applications the state of an operational system
is unobservable, and only the failure state is observable. For example, the state of an
operational transmission unit in a heavy hauler truck cannot be observed without full
system inspection, which is typically quite costly. However, failure of the mechanical unit
is immediately observable. We model the state process (Xt : t ∈ R+) as a continuous-time
homogeneous Markov chain with state space X = {1, 2} ∪ {3}. The system is assumed
to start in a healthy state, i.e. P(X0 = 1) = 1, and the transition rate matrix is given by

Q = \begin{pmatrix} -(q_{12} + q_{13}) & q_{12} & q_{13} \\ 0 & -q_{23} & q_{23} \\ 0 & 0 & 0 \end{pmatrix}, (4.1.1)
where q12, q13, q23 ∈ (0,+∞) are the unknown state parameters. As in the previous
chapters, let ξ = inf{t ∈ R+ : Xt = 3} be the observable failure time of the system.
Suppose that at equidistant sampling times ∆, 2∆, . . ., with ∆ ∈ (0,+∞), vector data
Y1, Y2, . . . ∈ Rd are collected through condition monitoring, giving partial information
about the system state. The observations are assumed to be conditionally independent
given the state of the system, and for each n ∈ N, we assume that Yn conditional on Xn∆ = i,
i = 1, 2, has d-dimensional normal distribution Nd(µi, Σi) with density

f(y|i) = \frac{1}{\sqrt{(2\pi)^d \det(\Sigma_i)}} \exp\left( -\frac{1}{2} (y - \mu_i)' \Sigma_i^{-1} (y - \mu_i) \right), (4.1.2)
where µ1, µ2 ∈ Rd and Σ1,Σ2 ∈ Rd×d are the unknown observation process parameters.
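For concreteness, the density (4.1.2) can be evaluated as follows; the sketch below restricts to a diagonal covariance matrix to avoid a linear-algebra dependency (an assumption of the sketch, not of the model, which allows full covariance matrices):

```python
import math

def normal_logdensity_diag(y, mu, var):
    """Log of the density (4.1.2), restricted to a diagonal covariance
    matrix Sigma = diag(var)."""
    d = len(y)
    log_det = sum(math.log(v) for v in var)
    quad = sum((yi - mi) ** 2 / v for yi, mi, v in zip(y, mu, var))
    return -0.5 * (d * math.log(2 * math.pi) + log_det + quad)

# A d = 2 observation evaluated under illustrative state-1 parameters.
lp = normal_logdensity_diag([0.1, -0.2], mu=[0.0, 0.0], var=[1.0, 0.5])
```

With a diagonal covariance the joint log-density is simply the sum of the univariate normal log-densities of the coordinates.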
It is important to point out that the assumption of conditional independence is not
always reasonable in practice when the observations are highly autocorrelated. There are
essentially two main approaches that exist in the literature to deal with autocorrelation
in the data histories.
The first approach is to directly model autocorrelation in the observation process in
the hidden Markov framework. Such models are referred to in the literature as models
with a Markov regime (see e.g. Krishnamurthy and Yin [43], Hamilton [27]), Markov
switching (see e.g. Kim [37]), or Markov sources (see e.g. Liporace [48]). This approach
mathematically integrates the hidden Markov state process and the autocorrelation in
the data histories into a single model.
Kim et al. [41] analyzed a parameter estimation problem for this type of autoregressive
Markov switching model, where observations (Yn) are stochastically related to the state
process (Xt) via the equation
Y_n = \mu_{X_{n\Delta}} + \sum_{r=1}^{p} \Phi^{(r)}_{X_{n\Delta}} \left( Y_{n-r} - \mu_{X_{(n-r)\Delta}} \right) + A_{X_{n\Delta}} \varepsilon_{n\Delta}, (4.1.3)
where (εn∆) is a sequence of i.i.d. d-dimensional standard multivariate normal random
vectors, and µ1, µ2 ∈ Rd, A1, A2 ∈ Rd×d, and Φ(r)1 ,Φ(r)
2 ∈ Rd×d, r = 1, . . . , p, are unknown
model parameters that need to be estimated.
It was found that while such models are mathematically elegant and compact, they
have two severe limitations. First and foremost, the Markov property required for sub-
sequent optimization problems no longer holds, making these models algorithmically in-
tractable for optimal maintenance decision making. Thus, although such Markov switch-
ing models are able to incorporate the state-observation relationship, they are not very
useful for optimal decision making, which is the most important aspect of mathemati-
cal modeling in operations research and industrial engineering. For typical examples of
maintenance decision models that require the assumption of conditional independence in
the hidden Markov model see e.g. Makis and Jiang [51], Wu and Makis [78], and Yin
and Makis [81]. The second drawback of these models is that parameter estimation is
extremely computationally intensive. This is because no
closed-form analytical procedures are available for estimation. In particular, Kim et al.
[41] showed that explicit closed-form update formulae for the parameter estimates in each
iteration of the EM algorithm do not exist. As a result, numerical methods are required
to estimate the model parameters and computational time increases exponentially as the
number of data histories increases.
The second approach, which we adopt in this chapter, is to first pre-process the data
histories and remove as much of the autocorrelation as possible before proceeding to
hidden Markov modeling. The idea is to first decide on an initial approximation for the
healthy portions of the data histories and fit a time series model to the healthy data
portions. The residuals of the fitted model are then computed and formal statistical
tests for conditional independence can be performed. The residuals are then chosen as
the “observation” process in the hidden Markov framework under the assumption of
conditional independence.
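As a concrete illustration of this pre-processing step, the following sketch fits an AR(1) model to a hypothetical healthy portion of a univariate history by least squares and returns the residuals; real applications would use a multivariate time series model and formal diagnostic tests, so the data and model order here are illustrative assumptions only:

```python
def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t] on a portion of a
    data history presumed healthy; returns (c, phi)."""
    n = len(x) - 1
    xs, ys = x[:-1], x[1:]
    mx, my = sum(xs) / n, sum(ys) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
           / sum((a - mx) ** 2 for a in xs))
    return my - phi * mx, phi

def residuals(x, c, phi):
    """Residuals of the fitted AR(1) model; these play the role of the
    'observation' process in the hidden Markov framework."""
    return [b - (c + phi * a) for a, b in zip(x[:-1], x[1:])]

# Hypothetical healthy history generated from a known AR(1) model, with a
# small deterministic perturbation standing in for noise.
healthy = [0.0]
for t in range(1, 200):
    healthy.append(0.5 + 0.7 * healthy[-1] + ((-1) ** t) * 0.01)

c, phi = fit_ar1(healthy)      # fitted (c, phi) should be close to (0.5, 0.7)
res = residuals(healthy, c, phi)
```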
In contrast with the Markov switching approach, this approach does not have either of
the two aforementioned drawbacks. In particular, as we will demonstrate in Section 4.3,
computational time for parameter estimation using this method does not grow exponen-
tially with the number of data histories, and, the model constructed using this approach
can be readily used for subsequent maintenance decision making since the memoryless
property is preserved.
The approach of using the residuals of the fitted model has theoretical justification,
and has been successfully applied in a variety of statistical, scientific, and engineering
applications. For a theoretical justification for this approach, see for example Yang and
Makis [79]. In their paper, the authors proposed a general method for studying the resid-
ual behaviour of autocorrelated processes subject to a change from a healthy to unhealthy
system state. They proved that the residuals of the fitted model are conditionally inde-
pendent and normally distributed. For successful application of the residual approach
see e.g. Sohn and Farrar [69], Wang and Wong [75], Schneider and Frank [67] in fault
diagnosis, Baddeley et al. [5] in spatial point processes, Schoenberg [68] in earthquake
occurrences, Wang et al. [76] in vibration data analysis, among others. In Section 4.3,
we apply this approach on real diagnostic data coming from the spectrometric analysis
of failing transmission units.
4.2 Parameter Estimation Using the EM Algorithm
We begin this section by briefly reviewing the EM algorithm in the context of our model.
The EM algorithm, first introduced into the literature by Dempster et al. [19], has been
found to be well-suited for solving parameter estimation problems in the hidden Markov
framework. A comprehensive overview of the EM algorithm and its many applications
is given in McLachlan and Krishnan [55].
Suppose we have collected H ∈ N failure histories, which we denote as H1, . . . ,HH.
Failure history Hi is assumed to be of the form ~Yi = (y_{i1}, . . . , y_{iT_i}) and ξi = ti, where
Ti∆ < ti ≤ (Ti + 1)∆. The sampling history ~Yi represents the collection of all vector
data y_{ij} ∈ Rd, j ≤ Ti, obtained through condition monitoring until system
failure at time ti. Suppose further that we have collected K ∈ N suspension histories,
which we denote as S1, . . . ,SK. Suspension history Sj is assumed to be of the form
~Yj = (y_{j1}, . . . , y_{jT_j}) and ξj > Tj∆. Let O = {H1, . . . ,HH, S1, . . . ,SK} represent all
observable data and L(γ, θ|O) be the associated likelihood function, where γ = (q12, q13, q23) and
θ = (µ1, µ2, Σ1, Σ2) are the sets of unknown state and observation parameters. Because
the sample paths of the state process (Xt) are not observable, maximizing L (γ, θ|O)
analytically is not possible. The EM algorithm resolves this difficulty by iteratively
maximizing the so-called pseudo likelihood function. More specifically, the EM algorithm
works as follows. Let γ0, θ0 be some initial values of the unknown parameters.
E-STEP. For n ≥ 0, compute the pseudo likelihood function defined by

Q(\gamma, \theta \,|\, \gamma_n, \theta_n) := E_{\gamma_n, \theta_n} \left( \ln L(\gamma, \theta \,|\, C) \,\middle|\, O \right), (4.2.1)

where C = {H̃1, . . . , H̃H, S̃1, . . . , S̃K} represents the complete data set, in which each
failure history H̃i and suspension history S̃j of the observable data set O has been
augmented with the unobservable sample path information of the state process (Xt : t ∈ R+).

M-STEP. Choose γn+1, θn+1 such that

(\gamma_{n+1}, \theta_{n+1}) \in \arg\max_{\gamma, \theta} Q(\gamma, \theta \,|\, \gamma_n, \theta_n). (4.2.2)

The E and M steps are repeated until the Euclidean norm |(γn+1, θn+1) − (γn, θn)| < ε,
for ε > 0 small.
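The iteration above can be sketched generically. A minimal Python sketch, in which the E- and M-steps are folded into a single user-supplied update map; the contraction `toy_m_step` below is a made-up stand-in used only to exercise the loop and stopping rule, not the model's actual M-step (which is derived later in this section):

```python
import math

def em(params0, m_step, eps=1e-8, max_iter=1000):
    """Generic EM loop: iterate the (E- and) M-step until the Euclidean
    norm of the parameter change falls below eps."""
    params = params0
    for _ in range(max_iter):
        new_params = m_step(params)  # E-step and M-step combined in m_step
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_params, params)))
        params = new_params
        if delta < eps:
            break
    return params

# Toy stand-in M-step: a contraction toward (0.2, 0.05, 0.5), mimicking
# convergence of the state-parameter estimates (q12, q13, q23).
target = (0.2, 0.05, 0.5)
def toy_m_step(p):
    return tuple(pi + 0.5 * (ti - pi) for pi, ti in zip(p, target))

est = em((1.0, 1.0, 1.0), toy_m_step)
```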
We will show in Theorems 4.2.2 and 4.2.3 that (4.2.1) admits the decomposition
Q(γ, θ|γn, θn) = Q^{state}(γ|γn, θn) + Q^{obs}(θ|γn, θn), where Q^{state} depends only on the
state parameters γ = (q12, q13, q23) and Q^{obs} depends only on the observation parameters
θ = (µ1, µ2, Σ1, Σ2). This implies in particular that the M-step (4.2.2) can be carried out
separately for the state and observation parameters, which considerably simplifies the
algorithm and increases the speed of computation. It is important to note that, even
under very general conditions, the EM algorithm need not converge to the maximum
likelihood estimates; see e.g. Wu [77]. However, we have not encountered such problems
with our model, and, as illustrated in Section 4.3, our parameter estimates converge quite
rapidly.
4.2.1 Form of the Likelihood Function
In this subsection, we are interested in deriving an explicit formula for the likelihood
function L(γ, θ|C) in (4.2.1). Let τ1 = inf{t ∈ R+ : Xt > 1} be the unobservable sojourn
time of the state process in healthy state 1. From (4.1.1), it is clear that there is a
one-to-one correspondence between the entire sample path (Xt) of the system state and
the two random variables τ1 and ξ. The distributional properties of the sojourn time and
failure time are given by the following lemma.
Lemma 4.2.1. For each t ∈ R+, the density of ξ is given by

f_\xi(t) = p_{12} \frac{\upsilon_1 \upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} \upsilon_1 e^{-\upsilon_1 t}. (4.2.3)

For all non-negative s < t, the conditional density of τ1 given ξ is given by

f_{\tau_1|\xi}(s|t) = \frac{p_{12} \upsilon_2 e^{-\upsilon_2 t} e^{-(\upsilon_1 - \upsilon_2)s}}{\frac{p_{12}\upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} e^{-\upsilon_1 t}}, (4.2.4)

and for each t ∈ R+, the conditional probability P(τ1 = t|ξ = t) is given by

m_{\tau_1|\xi}(t|t) = \frac{p_{13} e^{-\upsilon_1 t}}{\frac{p_{12}\upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} e^{-\upsilon_1 t}}, (4.2.5)

where \upsilon_1 = q_{12} + q_{13}, \upsilon_2 = q_{23}, p_{12} = \frac{q_{12}}{q_{12}+q_{13}}, and p_{13} = \frac{q_{13}}{q_{12}+q_{13}}.
Proof. Let S1 = X_{τ1} be the state of the system at time τ1. Then for each t ∈ R+,

P(\xi \le t) = p_{12} P(\xi \le t \,|\, S_1 = 2) + p_{13} P(\xi \le t \,|\, S_1 = 3)
= p_{12} \int_{u=0}^{t} P(X_{t-u} = 3 \,|\, X_0 = 2)\, \upsilon_1 e^{-\upsilon_1 u}\, du + p_{13} \int_{u=0}^{t} 1 \cdot \upsilon_1 e^{-\upsilon_1 u}\, du
= p_{12} \left( 1 - e^{-\upsilon_1 t} \right) - p_{12} \frac{\upsilon_1}{\upsilon_1 - \upsilon_2} e^{-\upsilon_2 t} + p_{12} \frac{\upsilon_1}{\upsilon_1 - \upsilon_2} e^{-\upsilon_1 t} + p_{13} \left( 1 - e^{-\upsilon_1 t} \right),

which is differentiable in t so that the density of ξ is given by

f_\xi(t) := \frac{dP(\xi \le t)}{dt} = p_{12} \frac{\upsilon_1 \upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} \upsilon_1 e^{-\upsilon_1 t},

for all t ∈ R+, and zero otherwise. For all non-negative s < t,

P(\tau_1 \le s, \xi \le t) = p_{12} \int_{u=0}^{s} P(X_t = 3 \,|\, X_u = 2)\, \upsilon_1 e^{-\upsilon_1 u}\, du + p_{13} \int_{u=0}^{s} 1 \cdot \upsilon_1 e^{-\upsilon_1 u}\, du
= p_{12} \left( 1 - e^{-\upsilon_1 s} \right) - p_{12} \frac{\upsilon_1}{\upsilon_1 - \upsilon_2} e^{-\upsilon_2 t} + p_{12} \frac{\upsilon_1}{\upsilon_1 - \upsilon_2} e^{-\upsilon_2 t} e^{-(\upsilon_1 - \upsilon_2)s} + p_{13} \left( 1 - e^{-\upsilon_1 s} \right),

which is differentiable in both variables so that the joint density of (τ1, ξ) for all non-negative s < t is given by

f_{\tau_1,\xi}(s,t) := \frac{\partial^2 P(\tau_1 \le s, \xi \le t)}{\partial s\, \partial t} = p_{12} \upsilon_1 \upsilon_2 e^{-\upsilon_2 t} e^{-(\upsilon_1 - \upsilon_2)s},

and for all non-negative s < t, we define the density function

f_{\tau_1|\xi}(s|t) := \frac{f_{\tau_1,\xi}(s,t)}{f_\xi(t)} = \frac{p_{12} \upsilon_2 e^{-\upsilon_2 t} e^{-(\upsilon_1 - \upsilon_2)s}}{\frac{p_{12}\upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} e^{-\upsilon_1 t}}.

For s = t, we define the probability mass function

m_{\tau_1|\xi}(t|t) := P(\tau_1 = t \,|\, \xi = t) = 1 - \int_{s<t} f_{\tau_1|\xi}(s|t)\, ds = \frac{p_{13} e^{-\upsilon_1 t}}{\frac{p_{12}\upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} e^{-\upsilon_1 t}},

which completes the proof.
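The identities of Lemma 4.2.1 can be sanity-checked numerically: the density (4.2.3) should integrate to one, and the conditional density (4.2.4) together with the point mass (4.2.5) should account for all conditional probability. A quick sketch with illustrative rates (the values of q12, q13, q23 are made up for the check):

```python
import math

def f_xi(t, q12, q13, q23):
    """Failure-time density (4.2.3)."""
    u1, u2 = q12 + q13, q23
    p12, p13 = q12 / u1, q13 / u1
    return (p12 * u1 * u2 / (u1 - u2) * (math.exp(-u2 * t) - math.exp(-u1 * t))
            + p13 * u1 * math.exp(-u1 * t))

def f_tau_given_xi(s, t, q12, q13, q23):
    """Conditional density (4.2.4) of tau1 given xi = t."""
    u1, u2 = q12 + q13, q23
    p12, p13 = q12 / u1, q13 / u1
    den = (p12 * u2 / (u1 - u2) * (math.exp(-u2 * t) - math.exp(-u1 * t))
           + p13 * math.exp(-u1 * t))
    return p12 * u2 * math.exp(-u2 * t) * math.exp(-(u1 - u2) * s) / den

def m_tau_given_xi(t, q12, q13, q23):
    """Point mass (4.2.5) at tau1 = t."""
    u1, u2 = q12 + q13, q23
    p12, p13 = q12 / u1, q13 / u1
    den = (p12 * u2 / (u1 - u2) * (math.exp(-u2 * t) - math.exp(-u1 * t))
           + p13 * math.exp(-u1 * t))
    return p13 * math.exp(-u1 * t) / den

# Illustrative (made-up) rates.
q12, q13, q23 = 0.2, 0.05, 0.5
h = 0.001

# (4.2.3) integrates to one over [0, infinity).
total = h * sum(f_xi(k * h, q12, q13, q23) for k in range(1, 100000))

# (4.2.4) and (4.2.5) together carry all conditional probability mass.
t = 3.0
mass = (h * sum(f_tau_given_xi(k * h, t, q12, q13, q23)
                for k in range(1, int(t / h)))
        + m_tau_given_xi(t, q12, q13, q23))
```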
Before we derive the formula for the likelihood function L(γ, θ|C) in the general case
of H observed failure histories and K suspension histories, we first consider the case with
a single failure history H, i.e. we have collected data ~Y = (y1, . . . , yT) and the system
is known to have failed at time ξ = t, where T∆ < t ≤ (T + 1)∆. Since the observable
data set O = {H} and the complete data set C = {H̃}, we denote the likelihood function
L(γ, θ|C) as L_H(γ, θ).

Since τ1 and ξ are sufficient for characterizing the sample paths of the state process,
equations (4.2.3) - (4.2.5) imply that the likelihood function L_H(γ, θ) is given by

L_H(\gamma, \theta) =
\begin{cases}
g_{\vec Y|\xi,\tau_1}(\vec y|t, \tau_1)\, f_{\tau_1|\xi}(\tau_1|t)\, f_\xi(t), & \tau_1 < t, \\
g_{\vec Y|\xi,\tau_1}(\vec y|t, t)\, m_{\tau_1|\xi}(t|t)\, f_\xi(t), & \tau_1 = t,
\end{cases} (4.2.6)
where g~Y|ξ,τ1(~y|t, s) is the conditional density of the observation process ~Y = (Y1, . . . , YT )
given ξ = t and τ1 = s ≤ t, which can be expressed in an explicit form. For any
s ∈ ((k − 1)∆, k∆], k = 1, . . . , T , equation (4.1.2) implies that g~Y|ξ,τ1(~y|t, s) is given by
g_{\vec Y|\xi,\tau_1}(\vec y|t, s) = g_{\vec Y|\xi,\tau_1}(\vec y|t, k\Delta)
= \frac{\exp\left( -\frac{1}{2} \sum_{n=1}^{k-1} (y_n - \mu_1)' \Sigma_1^{-1} (y_n - \mu_1) - \frac{1}{2} \sum_{n=k}^{T} (y_n - \mu_2)' \Sigma_2^{-1} (y_n - \mu_2) \right)}{\sqrt{(2\pi)^{Td} \det^{k-1}(\Sigma_1) \det^{T-k+1}(\Sigma_2)}}, (4.2.7)

and for any s > T∆, g_{\vec Y|\xi,\tau_1}(\vec y|t, s) is given by

g_{\vec Y|\xi,\tau_1}(\vec y|t, s) = g_{\vec Y|\xi,\tau_1}(\vec y|t, t)
= \frac{\exp\left( -\frac{1}{2} \sum_{n=1}^{T} (y_n - \mu_1)' \Sigma_1^{-1} (y_n - \mu_1) \right)}{\sqrt{(2\pi)^{Td} \det^{T}(\Sigma_1)}}. (4.2.8)
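For scalar observations (d = 1), equations (4.2.7) and (4.2.8) reduce to sums of univariate normal log-densities indexed by the change point k. A sketch (the data values and parameters below are illustrative, not from the thesis):

```python
import math

def log_g(y, k, mu1, var1, mu2, var2):
    """Log of (4.2.7) for scalar observations (d = 1): observations with
    index < k come from state 1, the rest from state 2. Taking
    k = len(y) + 1 gives the all-healthy case (4.2.8)."""
    def lognorm(x, mu, var):
        return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return (sum(lognorm(x, mu1, var1) for x in y[:k - 1])
            + sum(lognorm(x, mu2, var2) for x in y[k - 1:]))

# Illustrative history: three points near mu1 = 0, then two near mu2 = 2.
y = [0.1, -0.2, 0.05, 2.1, 1.9]
scores = [log_g(y, k, 0.0, 1.0, 2.0, 1.0) for k in range(1, len(y) + 2)]
best_k = max(range(len(scores)), key=scores.__getitem__) + 1  # change at k = 4
```

The score is largest when the split matches where the data actually shift from the healthy mean to the unhealthy mean.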
We next consider the case where we have observed only a single suspension history S,
i.e. we have collected data ~Y = (y1, . . . , yT) and stopped observing the operating system
at time T∆. Since the observable data set O = {S} and the complete data set C = {S̃},
in this case we denote the likelihood function L(γ, θ|C) as L_S(γ, θ). For each s, t ∈ R+,
it is not difficult to see that the conditional reliability function of ξ given τ1 is given by

h(t|s) := P(\xi > t \,|\, \tau_1 = s) =
\begin{cases}
p_{12} e^{-\upsilon_2 (t - s)}, & t \ge s, \\
1, & t < s.
\end{cases} (4.2.9)

Furthermore, it is well known that the density function of the unobservable sojourn time
τ1 is given by

f_{\tau_1}(s) =
\begin{cases}
\upsilon_1 e^{-\upsilon_1 s}, & s \ge 0, \\
0, & s < 0.
\end{cases} (4.2.10)

Then equations (4.2.7) - (4.2.10) imply that the likelihood function L_S(γ, θ) is given by

L_S(\gamma, \theta) = g_{\vec Y|\xi,\tau_1}(\vec y|t, \tau_1)\, h(t|\tau_1)\, f_{\tau_1}(\tau_1). (4.2.11)
Thus, for the general case in which we have observed H independent failure histories
H1, . . . ,HH and K independent suspension histories S1, . . . ,SK, the likelihood function
is given by

L(\gamma, \theta \,|\, C) = \prod_{i=1}^{H} L_{H_i}(\gamma, \theta) \prod_{j=1}^{K} L_{S_j}(\gamma, \theta), (4.2.12)

where the likelihood functions for the individual failure and suspension histories are given
by equations (4.2.6) and (4.2.11), respectively.
4.2.2 Form of the Pseudo Likelihood
In this subsection, we are interested in carrying out the E-step of the EM algorithm, i.e.
deriving the pseudo likelihood by taking the conditional expectation of the logarithm of the likelihood function given
by (4.2.12). As in the previous subsection, we first analyze the case in which we have
observed only a single failure history H of the form ~Y = (y1, . . . , yT ) and ξ = t, where
T∆ < t ≤ (T + 1)∆. Thus, for any fixed estimates \bar\gamma, \bar\theta of the state and observation
parameters, we are interested in deriving a formula for the pseudo likelihood function
Q_H(\gamma, \theta \,|\, \bar\gamma, \bar\theta) = E_{\bar\gamma,\bar\theta}(\ln L_H(\gamma, \theta) \,|\, H), where the likelihood function L_H(\gamma, \theta) is given in
(4.2.6).

To simplify notation, for the remainder of the chapter we denote \vec\gamma = (q_{12}, q_{13}, q_{23})'
and \mathbf{g} = (g_{\vec Y|\xi,\tau_1}(\vec y|t, \Delta), \ldots, g_{\vec Y|\xi,\tau_1}(\vec y|t, T\Delta), g_{\vec Y|\xi,\tau_1}(\vec y|t, t))'. Also, for any vector
v = (v_1, \ldots, v_n)', we denote \ln v := (\ln v_1, \ldots, \ln v_n)', and the inner product \langle v, w \rangle := v'w.
Theorem 4.2.2. Given a single failure history H, the pseudo likelihood function has the
following decomposition

Q_H(\gamma, \theta \,|\, \bar\gamma, \bar\theta) = Q^{state}_H(\gamma \,|\, \bar\gamma, \bar\theta) + Q^{obs}_H(\theta \,|\, \bar\gamma, \bar\theta), (4.2.13)

where

Q^{state}_H(\gamma \,|\, \bar\gamma, \bar\theta) = \langle a, \vec\gamma \rangle + \langle b, \ln \vec\gamma \rangle, \qquad Q^{obs}_H(\theta \,|\, \bar\gamma, \bar\theta) = \langle c, \ln \mathbf{g} \rangle, (4.2.14)

for vectors a, b, and c that depend only on the fixed estimates \bar\gamma, \bar\theta.
Proof. Using equations (4.2.3) - (4.2.5) of Lemma 4.2.1 and the formula for the likelihood
function L_H(\gamma, \theta) given by (4.2.6),

Q_H(\gamma, \theta \,|\, \bar\gamma, \bar\theta) = E_{\bar\gamma,\bar\theta}(\ln L_H(\gamma, \theta) \,|\, H) = E_{\bar\gamma,\bar\theta}(\ln L_H(\gamma, \theta) \,|\, \vec Y = \vec y, \xi = t)

= \frac{\int_{s<t} \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,s) f_{\tau_1|\xi}(s|t) f_\xi(t) \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,s) \bar f_{\tau_1|\xi}(s|t)\, ds + \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,t) m_{\tau_1|\xi}(t|t) f_\xi(t) \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)}{\int_{u<t} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,u) \bar f_{\tau_1|\xi}(u|t)\, du + \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)},

where the bar notation \bar g_{\vec Y|\xi,\tau_1}, \bar f_{\tau_1|\xi}, \bar m_{\tau_1|\xi} is used to signify that the functions
g_{\vec Y|\xi,\tau_1}, f_{\tau_1|\xi}, m_{\tau_1|\xi} are parameterized by the fixed estimates \bar\gamma, \bar\theta. Since g_{\vec Y|\xi,\tau_1}
defined in (4.2.7) and (4.2.8) depends only on the observation parameters \theta = (\mu_1, \mu_2, \Sigma_1, \Sigma_2),
and f_\xi, f_{\tau_1|\xi}, and m_{\tau_1|\xi} depend only on the state parameters \gamma = (q_{12}, q_{13}, q_{23}),
the equation above can be decomposed into two terms Q_H(\gamma, \theta \,|\, \bar\gamma, \bar\theta) =
Q^{state}_H(\gamma \,|\, \bar\gamma, \bar\theta) + Q^{obs}_H(\theta \,|\, \bar\gamma, \bar\theta). Substituting equations (4.2.3) - (4.2.5) of
Lemma 4.2.1, the first term Q^{state}_H simplifies to

Q^{state}_H(\gamma \,|\, \bar\gamma, \bar\theta) = \frac{\int_{s<t} \ln\left( q_{12} q_{23} e^{-q_{23}t} e^{-(q_{12}+q_{13}-q_{23})s} \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,s) \bar f_{\tau_1|\xi}(s|t)\, ds + \ln\left( q_{13} e^{-(q_{12}+q_{13})t} \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)}{\int_{u<t} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,u) \bar f_{\tau_1|\xi}(u|t)\, du + \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)}

=: a_{12} q_{12} + a_{13} q_{13} + a_{23} q_{23} + b_{12} \ln q_{12} + b_{13} \ln q_{13} + b_{23} \ln q_{23},

where the constants, which depend only on the fixed parameter estimates \bar\gamma, \bar\theta, are given by

a_{12} = a_{13} = -\frac{\bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t}}{\bar d} \langle e_2, \bar{\mathbf g} \rangle - \frac{t \bar p_{13} e^{-\bar\upsilon_1 t}}{\bar d} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t),
a_{23} = \frac{\bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t}}{\bar d} \left( \langle e_2, \bar{\mathbf g} \rangle - t \langle e_1, \bar{\mathbf g} \rangle \right),
b_{12} = b_{23} = \frac{\bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t}}{\bar d} \langle e_1, \bar{\mathbf g} \rangle, (4.2.15)
b_{13} = \frac{\bar p_{13} e^{-\bar\upsilon_1 t}}{\bar d} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t),
\bar d = \bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t} \langle e_1, \bar{\mathbf g} \rangle + \bar p_{13} e^{-\bar\upsilon_1 t} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t),

and vectors e_1 = (e_1^1, \ldots, e_1^T, e_1^t)' and e_2 = (e_2^1, \ldots, e_2^T, e_2^t)' are defined by

e_1^k = \frac{e^{-(\bar\upsilon_1-\bar\upsilon_2)(k-1)\Delta} - e^{-(\bar\upsilon_1-\bar\upsilon_2)k\Delta}}{\bar\upsilon_1 - \bar\upsilon_2}, \quad k = 1, \ldots, T,
e_1^t = \frac{e^{-(\bar\upsilon_1-\bar\upsilon_2)T\Delta} - e^{-(\bar\upsilon_1-\bar\upsilon_2)t}}{\bar\upsilon_1 - \bar\upsilon_2},
e_2^k = \frac{e_1^k - k\Delta e^{-(\bar\upsilon_1-\bar\upsilon_2)k\Delta} + (k-1)\Delta e^{-(\bar\upsilon_1-\bar\upsilon_2)(k-1)\Delta}}{\bar\upsilon_1 - \bar\upsilon_2}, \quad k = 1, \ldots, T, (4.2.16)
e_2^t = \frac{e_1^t - t e^{-(\bar\upsilon_1-\bar\upsilon_2)t} + T\Delta e^{-(\bar\upsilon_1-\bar\upsilon_2)T\Delta}}{\bar\upsilon_1 - \bar\upsilon_2}.

Similarly, the second term Q^{obs}_H, which is a function only of the observation parameters \theta, simplifies to

Q^{obs}_H(\theta \,|\, \bar\gamma, \bar\theta) = \frac{\int_{s<t} \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,s) \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,s) \bar f_{\tau_1|\xi}(s|t)\, ds + \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)}{\int_{u<t} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,u) \bar f_{\tau_1|\xi}(u|t)\, du + \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)}

=: \sum_{k=1}^{T} c_k \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,k\Delta) \right) + c_t \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \right),

where the constants, which depend only on \bar\gamma, \bar\theta, are given by

c_k = \frac{\bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t} e_1^k}{\bar d} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,k\Delta), \quad k = 1, \ldots, T,
c_t = \frac{\bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t} e_1^t + \bar p_{13} e^{-\bar\upsilon_1 t}}{\bar d} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t). (4.2.17)

To finish the proof, put a = (a_{12}, a_{13}, a_{23})', b = (b_{12}, b_{13}, b_{23})', and c = (c_1, \ldots, c_T, c_t)'.
It is interesting to note that the quantities appearing in Theorem 4.2.2 can be given
a probabilistic interpretation. In particular, by inspecting the proof of Theorem 4.2.2, it
follows that the quantities a = (a12, a13, a23)', b = (b12, b13, b23)', and c = (c1, . . . , cT, ct)' have
the following probabilistic interpretations:

• −a12 and −a13 equal the conditional expectation of the sojourn time of the system
in the healthy state 1 given ~Y = ~y and ξ = t.

• −a23 equals the conditional expectation of the sojourn time of the system in the
unhealthy state 2 given ~Y = ~y and ξ = t.

• b12 and b23 equal the conditional probability that the sojourn time τ1 is less than t
given ~Y = ~y and ξ = t.

• b13 and ct equal the conditional probability that the sojourn time τ1 is equal to t
given ~Y = ~y and ξ = t.

• For each k = 1, . . . , T, ck equals the conditional probability that the sojourn time τ1
is in the interval [(k − 1)∆, k∆) given ~Y = ~y and ξ = t.
We next analyze the case in which we have observed only a single suspension history
S of the form ~Y = (y1, . . . , yT) and ξ > T∆. That is, for any fixed estimates \bar\gamma, \bar\theta
of the state and observation parameters, we are interested in deriving a formula for
the pseudo likelihood function Q_S(\gamma, \theta \,|\, \bar\gamma, \bar\theta) = E_{\bar\gamma,\bar\theta}(\ln L_S(\gamma, \theta) \,|\, S), where the
likelihood function L_S(\gamma, \theta) is given in (4.2.11).
Theorem 4.2.3. Given a single suspension history S, the pseudo likelihood function has
the following decomposition

Q_S(\gamma, \theta \,|\, \bar\gamma, \bar\theta) = Q^{state}_S(\gamma \,|\, \bar\gamma, \bar\theta) + Q^{obs}_S(\theta \,|\, \bar\gamma, \bar\theta), (4.2.18)

where

Q^{state}_S(\gamma \,|\, \bar\gamma, \bar\theta) = \langle \vec\alpha, \vec\gamma \rangle + \varphi_1 \ln(q_{12}) + \varphi_2 \ln(q_{12} + q_{13}), \qquad Q^{obs}_S(\theta \,|\, \bar\gamma, \bar\theta) = \langle \vec\beta, \ln \mathbf{g} \rangle, (4.2.19)

for constants \vec\alpha, \vec\beta, \varphi_1 and \varphi_2 that depend only on the fixed estimates \bar\gamma, \bar\theta.
Proof. To simplify notation, we put $t := T\Delta$ in the proof; hats denote quantities evaluated at the fixed estimates $\hat{\gamma}, \hat{\theta}$. Using equations (4.2.9) and (4.2.10) and the formula for the likelihood function $L_S(\gamma, \theta)$ given by (4.2.11),
$$Q_S(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}) = E_{\hat{\gamma},\hat{\theta}}\big(\ln L_S(\gamma, \theta) \mid S\big) = E_{\hat{\gamma},\hat{\theta}}\big(\ln L_S(\gamma, \theta) \mid \vec{Y} = \vec{y},\, \xi > t\big) = \frac{\int \ln\big(g_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, h(t \mid s)\, f_{\tau_1}(s)\big)\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, \hat{h}(t \mid s)\, \hat{f}_{\tau_1}(s)\, ds}{\int \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, u)\, \hat{h}(t \mid u)\, \hat{f}_{\tau_1}(u)\, du}.$$
Since $h$ and $f_{\tau_1}$ defined in (4.2.9) and (4.2.10) depend only on the state parameters $\gamma$, the equation above can be decomposed into two terms $Q_S(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}) = Q^{state}_S(\gamma \mid \hat{\gamma}, \hat{\theta}) + Q^{obs}_S(\theta \mid \hat{\gamma}, \hat{\theta})$, where the first term $Q^{state}_S$ depends only on $\gamma$ and the second term $Q^{obs}_S$ depends only on $\theta$. Substituting equations (4.2.9) and (4.2.10), the first term $Q^{state}_S$ simplifies to
$$Q^{state}_S(\gamma \mid \hat{\gamma}, \hat{\theta}) = \frac{\int \ln\big(h(t \mid s)\, f_{\tau_1}(s)\big)\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, \hat{h}(t \mid s)\, \hat{f}_{\tau_1}(s)\, ds}{\int \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, u)\, \hat{h}(t \mid u)\, \hat{f}_{\tau_1}(u)\, du}$$
$$= \frac{\int_{s \leq t} \ln\big(q_{12}\, e^{-q_{23} t}\, e^{-(q_{12}+q_{13}-q_{23})s}\big)\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, \hat{h}(t \mid s)\, \hat{f}_{\tau_1}(s)\, ds + \int_{s > t} \ln\big((q_{12}+q_{13})\, e^{-(q_{12}+q_{13})s}\big)\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, \hat{h}(t \mid s)\, \hat{f}_{\tau_1}(s)\, ds}{\int \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, u)\, \hat{h}(t \mid u)\, \hat{f}_{\tau_1}(u)\, du}$$
$$=: \alpha_{12} q_{12} + \alpha_{13} q_{13} + \alpha_{23} q_{23} + \varphi_1 \ln(q_{12}) + \varphi_2 \ln(q_{12} + q_{13}),$$
where the constants, which depend only on the fixed parameter estimates $\hat{\gamma}, \hat{\theta}$, are given by
$$\alpha_{12} = \alpha_{13} = -\frac{\hat{q}_{12} e^{-\hat{\upsilon}_2 t}}{\delta} \langle e_2, \hat{g} \rangle - \frac{(t + \hat{\upsilon}_1^{-1})\, e^{-\hat{\upsilon}_1 t}}{\delta}\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, t), \qquad \alpha_{23} = \frac{\hat{q}_{12} e^{-\hat{\upsilon}_2 t}}{\delta} \big(\langle e_2, \hat{g} \rangle - t \langle e_1, \hat{g} \rangle\big),$$
$$\varphi_1 = \frac{\hat{q}_{12} e^{-\hat{\upsilon}_2 t}}{\delta} \langle e_1, \hat{g} \rangle, \qquad \varphi_2 = \frac{e^{-\hat{\upsilon}_1 t}}{\delta}\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, t), \qquad \delta = \hat{q}_{12} e^{-\hat{\upsilon}_2 t} \langle e_1, \hat{g} \rangle + e^{-\hat{\upsilon}_1 t}\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, t), \qquad (4.2.20)$$
and the vectors $e_1$ and $e_2$ are defined in (4.2.16). Similarly, the second term $Q^{obs}_S$, which is a function only of the observation parameters $\theta$, simplifies to
$$Q^{obs}_S(\theta \mid \hat{\gamma}, \hat{\theta}) = \frac{\int \ln\big(g_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\big)\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, \hat{h}(t \mid s)\, \hat{f}_{\tau_1}(s)\, ds}{\int \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, u)\, \hat{h}(t \mid u)\, \hat{f}_{\tau_1}(u)\, du} =: \sum_{k=1}^{T} \beta_k \ln\big(g_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, k\Delta)\big) + \beta_t \ln\big(g_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, t)\big),$$
where the constants, which depend only on $\hat{\gamma}, \hat{\theta}$, are given by
$$\beta_k = \frac{\hat{q}_{12} e^{-\hat{\upsilon}_2 t} e_1^k}{\delta}\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, k\Delta), \quad k = 1, \dots, T, \qquad \beta_t = \frac{\hat{q}_{12} e^{-\hat{\upsilon}_2 t} e_1^T + e^{-\hat{\upsilon}_1 t}}{\delta}\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, t). \qquad (4.2.21)$$
To finish the proof, put $\vec{\alpha} = (\alpha_{12}, \alpha_{13}, \alpha_{23})'$ and $\vec{\beta} = (\beta_1, \dots, \beta_T, \beta_t)'$.
As in the case of Theorem 4.2.2, the quantities appearing in Theorem 4.2.3 can be
given similar probabilistic interpretations.
Finally, for the general case in which we have observed $H$ independent failure histories $H_1, \dots, H_H$ and $K$ independent suspension histories $S_1, \dots, S_K$, Theorems 4.2.2 and 4.2.3 and equation (4.2.12) imply that the pseudo likelihood function is given by
$$Q(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}) = E_{\hat{\gamma},\hat{\theta}}\big(\ln L(\gamma, \theta \mid C) \mid O\big) = E_{\hat{\gamma},\hat{\theta}}\Big(\ln \Big(\prod_{i=1}^{H} L_{H_i}(\gamma, \theta) \prod_{j=1}^{K} L_{S_j}(\gamma, \theta)\Big) \,\Big|\, O\Big)$$
$$= \sum_{i=1}^{H} E_{\hat{\gamma},\hat{\theta}}\big(\ln L_{H_i}(\gamma, \theta) \mid H_i\big) + \sum_{j=1}^{K} E_{\hat{\gamma},\hat{\theta}}\big(\ln L_{S_j}(\gamma, \theta) \mid S_j\big) = \sum_{i=1}^{H} Q_{H_i}(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}) + \sum_{j=1}^{K} Q_{S_j}(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}). \qquad (4.2.22)$$
Thus, to evaluate the pseudo likelihood function for all available histories, it suffices to evaluate the pseudo likelihood function for individual failure and suspension histories separately. Equation (4.2.22) completes the E-step of the EM algorithm. In the next subsection, we solve the M-step of the EM algorithm and derive explicit parameter update formulas for the maximizers of the pseudo likelihood function defined in (4.2.22).
4.2.3 Maximization of the Pseudo Likelihood Function
In this subsection we are interested in finding maximizers of the pseudo likelihood function defined in (4.2.22). By Theorems 4.2.2 and 4.2.3, the pseudo likelihood function can be decomposed as $Q(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}) = Q^{state}(\gamma \mid \hat{\gamma}, \hat{\theta}) + Q^{obs}(\theta \mid \hat{\gamma}, \hat{\theta})$, where $Q^{state}$ is a function only of the state parameters $\gamma = (q_{12}, q_{13}, q_{23})$ and $Q^{obs}$ is a function only of the observation parameters $\theta = (\mu_1, \mu_2, \Sigma_1, \Sigma_2)$. This means that the M-step can be carried out separately for the state and observation parameters. Using equation (4.2.22) and Theorems 4.2.2 and 4.2.3, we solve for the stationary points of the state parameters $\gamma = (q_{12}, q_{13}, q_{23})$. After some algebra, it is not difficult to check that there is a unique stationary point $\gamma^* = (q^*_{12}, q^*_{13}, q^*_{23})$ of the pseudo likelihood function, given explicitly by
$$q^*_{12} = -\frac{\displaystyle \sum_{i=1}^{H} b^i_{12} + \sum_{j=1}^{K} \varphi^j_1 + \Big(\sum_{j=1}^{K} \varphi^j_2\Big)\, \frac{\sum_{i=1}^{H} b^i_{12} + \sum_{j=1}^{K} \varphi^j_1}{\sum_{i=1}^{H} b^i_{12} + \sum_{j=1}^{K} \varphi^j_1 + \sum_{i=1}^{H} b^i_{13}}}{\displaystyle \sum_{i=1}^{H} a^i_{12} + \sum_{j=1}^{K} \alpha^j_{12}},$$
$$q^*_{13} = q^*_{12}\, \frac{\sum_{i=1}^{H} b^i_{13}}{\sum_{i=1}^{H} b^i_{12} + \sum_{j=1}^{K} \varphi^j_1}, \qquad q^*_{23} = -\frac{\sum_{i=1}^{H} b^i_{23}}{\sum_{i=1}^{H} a^i_{23} + \sum_{j=1}^{K} \alpha^j_{23}}, \qquad (4.2.23)$$
where the constants $a^i = (a^i_{12}, a^i_{13}, a^i_{23})$, $b^i = (b^i_{12}, b^i_{13}, b^i_{23})$, $\vec{\alpha}^j = (\alpha^j_{12}, \alpha^j_{13}, \alpha^j_{23})$, $\varphi^j_1$, and $\varphi^j_2$ are given in equations (4.2.15) and (4.2.20). Similarly, using equations (4.2.7) and (4.2.8), it follows that there is a unique stationary point of the observation parameters $\theta^* = (\mu^*_1, \mu^*_2, \Sigma^*_1, \Sigma^*_2)$, given explicitly by
$$\mu^*_1 = \frac{\sum_{i=1}^{H} n^i_1 \cdot c^i + \sum_{j=1}^{K} n^j_1 \cdot \vec{\beta}^j}{\sum_{i=1}^{H} \langle c^i, d^i_1 \rangle + \sum_{j=1}^{K} \langle \vec{\beta}^j, d^j_1 \rangle}, \qquad \Sigma^*_1 = \frac{\sum_{i=1}^{H} n^i_3 \cdot c^i + \sum_{j=1}^{K} n^j_3 \cdot \vec{\beta}^j}{\sum_{i=1}^{H} \langle c^i, d^i_1 \rangle + \sum_{j=1}^{K} \langle \vec{\beta}^j, d^j_1 \rangle},$$
$$\mu^*_2 = \frac{\sum_{i=1}^{H} n^i_2 \cdot c^i + \sum_{j=1}^{K} n^j_2 \cdot \vec{\beta}^j}{\sum_{i=1}^{H} \langle c^i, d^i_2 \rangle + \sum_{j=1}^{K} \langle \vec{\beta}^j, d^j_2 \rangle}, \qquad \Sigma^*_2 = \frac{\sum_{i=1}^{H} n^i_4 \cdot c^i + \sum_{j=1}^{K} n^j_4 \cdot \vec{\beta}^j}{\sum_{i=1}^{H} \langle c^i, d^i_2 \rangle + \sum_{j=1}^{K} \langle \vec{\beta}^j, d^j_2 \rangle}, \qquad (4.2.24)$$
where the vectors are
$$n_1 = \Big(0, \sum_{n \leq 1} y_n, \dots, \sum_{n \leq T} y_n\Big), \qquad n_2 = \Big(\sum_{n \geq 1} y_n, \sum_{n \geq 2} y_n, \dots, y_T, 0\Big),$$
$$n_3 = \Big(0, \sum_{n \leq 1} (y_n - \mu^*_1)(y_n - \mu^*_1)', \dots, \sum_{n \leq T} (y_n - \mu^*_1)(y_n - \mu^*_1)'\Big),$$
$$n_4 = \Big(\sum_{n \geq 1} (y_n - \mu^*_2)(y_n - \mu^*_2)', \dots, (y_T - \mu^*_2)(y_T - \mu^*_2)', 0\Big),$$
$$d_1 = (0, 1, \dots, T)', \qquad d_2 = (T, T-1, \dots, 1, 0)',$$
and the constants $c^i = (c^i_1, \dots, c^i_{T_i}, c^i_{t_i})$ and $\vec{\beta}^j = (\beta^j_1, \dots, \beta^j_{T_j}, \beta^j_{t_j})$ are given in equations (4.2.17) and (4.2.21). This completes the M-step of the EM algorithm.
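As a minimal sketch, the state-parameter update (4.2.23) amounts to a handful of sums. The per-history constants below are hypothetical stand-ins for the quantities defined in (4.2.15) and (4.2.20); in practice they are computed from the observed histories as in Theorems 4.2.2 and 4.2.3.

```python
# Sketch of the state-parameter M-step (4.2.23).
# b, a: constants from failure histories; phi, alpha: from suspension histories.
def update_state_params(b12, b13, b23, a12, a23, phi1, phi2, al12, al23):
    B12, B13, B23 = sum(b12), sum(b13), sum(b23)
    P1, P2 = sum(phi1), sum(phi2)
    num = B12 + P1 + P2 * (B12 + P1) / (B12 + P1 + B13)
    q12 = -num / (sum(a12) + sum(al12))       # first formula in (4.2.23)
    q13 = q12 * B13 / (B12 + P1)              # second formula
    q23 = -B23 / (sum(a23) + sum(al23))       # third formula
    return q12, q13, q23

# hypothetical constants for H = 1 failure and K = 1 suspension history
q12, q13, q23 = update_state_params(
    b12=[2.0], b13=[0.5], b23=[1.5], a12=[-100.0], a23=[-40.0],
    phi1=[0.8], phi2=[0.2], al12=[-50.0], al23=[-10.0])
```

In the hypothetical numbers above, the $a$ and $\alpha$ constants are taken negative and the $b$ and $\varphi$ constants positive, so the resulting rate estimates come out positive.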
The results obtained in equations (4.2.23) and (4.2.24) can be viewed as a general-
ization of the parameter estimation result for multivariate normal mixture models (see
e.g. McLachlan and Krishnan [55], Section 2.7.2, equations (2.56) and (2.58)). In such
mixture models, multivariate normal data is drawn from a finite number of unobservable
groups, where the mean and covariance matrix can depend on the underlying group.
In our model, the mean and covariance matrix depend on the unobservable state (i.e.
healthy or warning state) of the Markov process (Xt), which makes the analysis more
difficult.
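For intuition, the normal-mixture special case can be sketched with a standard EM iteration on synthetic bivariate data. This is a minimal sketch: the data, initialization, and two-component setup are hypothetical, and the closed-form M-step updates play the role that (4.2.23) and (4.2.24) play in our model, with group membership (rather than the path of $(X_t)$) as the latent variable.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data from a two-component bivariate normal mixture
n = 400
z = rng.random(n) < 0.4                          # latent component labels
x = np.where(z[:, None],
             rng.normal([0.0, 0.0], 1.0, (n, 2)),
             rng.normal([5.0, 5.0], 1.0, (n, 2)))

def em_mixture(x, iters=50):
    n, d = x.shape
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(0), x.max(0)], dtype=float)   # crude initialization
    cov = np.array([np.cov(x.T)] * 2)
    for _ in range(iters):
        # E-step: posterior responsibilities (the pseudo-likelihood weights)
        dens = []
        for k in range(2):
            diff = x - mu[k]
            quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(cov[k]), diff)
            dens.append(w[k] * np.exp(-0.5 * quad)
                        / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov[k])))
        r = np.array(dens)
        r /= r.sum(0)
        # M-step: closed-form weighted updates of weights, means, covariances
        for k in range(2):
            nk = r[k].sum()
            w[k] = nk / n
            mu[k] = r[k] @ x / nk
            diff = x - mu[k]
            cov[k] = (r[k][:, None] * diff).T @ diff / nk
    return w, mu, cov

w, mu, cov = em_mixture(x)
```

With well-separated components a handful of iterations suffices; in less favorable configurations EM can converge to poor local maxima, which is why multiple starting points are common in practice.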
Figure 4.3.1: Spectrometric Measurements of Copper and Iron
4.3 A Practical Application
In this section, we develop a numerical example using real-world data from the mining industry, which illustrates the entire estimation procedure. In particular, we analyze condition monitoring data from the transmission oil samples of 240-ton heavy hauler trucks used in the Athabasca oil sands of Alberta, Canada. During the operational life of each transmission unit, oil samples are collected every $\Delta = 600$ hours and spectrometric oil analysis is carried out, which provides the concentrations (in ppm) of iron and copper that come from the direct wear of the transmission unit. The total number of recorded data histories is 36, consisting of $H = 13$ failure histories and $K = 23$ suspension histories. A typical data history is given in Figure 4.3.1. This particular transmission unit failed after the 13th sampling epoch, at 8123 operational hours.
As detailed in Section 4.1, to satisfy the assumptions of independence and normality, we first need to fit a model that accounts for autocorrelation in the data histories, and choose as the observation process in our hidden Markov model the residuals of the
fitted model. Before fitting a model to the data histories, we have to approximate the healthy portions of the data histories. Partitioning a non-stationary time series into a finite number of stationary portions is known as time series segmentation. The purpose of segmentation in our application is to achieve stationarity in the healthy portions of the data histories so that the residuals of the fitted model can be computed. Generally, there is no agreed-upon criterion for selecting the 'optimal' segmentation. Thus, a variety of segmentation methods exist, ranging from very sophisticated algorithms to simple heuristic graphical methods (see e.g. Keogh [36] and Fukuda et al. [23]). For our application, for simplicity we have chosen to segment the data histories via graphical examination. For each of the $H + K = 36$ data histories, the healthy portions are denoted $z^l_1, \dots, z^l_{t_l}$, $l = 1, \dots, H+K$. The healthy data histories are assumed to follow a common stationary VAR process (see e.g. Reinsel [60]) given by
$$Z_n - \delta_0 = \sum_{r=1}^{p} \Phi_r (Z_{n-r} - \delta_0) + \varepsilon_n, \quad n \in \mathbb{Z}, \qquad (4.3.1)$$
where $\varepsilon_n$ are i.i.d. $N_2(0, C)$, the model order $p \in \mathbb{N}$, the autoregressive coefficient matrices $\Phi_r \in \mathbb{R}^{2 \times 2}$, and the mean and covariance model parameters $\delta_0 \in \mathbb{R}^2$ and $C \in \mathbb{R}^{2 \times 2}$. All model parameters are unknown and need to be estimated. We set $\delta = \delta_0 - \sum_{r=1}^{p} \Phi_r \delta_0$, and write equation (4.3.1) in standard form
$$Z_n = \delta + \sum_{r=1}^{p} \Phi_r Z_{n-r} + \varepsilon_n, \quad n \in \mathbb{Z},$$
so that the observed healthy data histories $z^l_1, \dots, z^l_{t_l}$, $l = 1, \dots, H+K$, have the regression representation $W = VA + E$, where
$$W' = \begin{bmatrix} z^{H+K}_{t_{H+K}} & \cdots & z^{H+K}_{p+1} & \cdots & z^{1}_{t_1} & \cdots & z^{1}_{p+1} \end{bmatrix}, \qquad A' = \begin{bmatrix} \delta & \Phi_1 & \cdots & \Phi_p \end{bmatrix},$$
$$E' = \begin{bmatrix} \varepsilon^{H+K}_{t_{H+K}} & \cdots & \varepsilon^{H+K}_{p+1} & \cdots & \varepsilon^{1}_{t_1} & \cdots & \varepsilon^{1}_{p+1} \end{bmatrix},$$
$$V' = \begin{bmatrix} 1 & \cdots & 1 & \cdots & 1 & \cdots & 1 \\ z^{H+K}_{t_{H+K}-1} & \cdots & z^{H+K}_{p} & \cdots & z^{1}_{t_1-1} & \cdots & z^{1}_{p} \\ \vdots & & \vdots & & \vdots & & \vdots \\ z^{H+K}_{t_{H+K}-p} & \cdots & z^{H+K}_{1} & \cdots & z^{1}_{t_1-p} & \cdots & z^{1}_{1} \end{bmatrix}.$$
Reinsel [60] showed that the least squares estimates of $A$ and $C$ are given by
$$\hat{A} = (V'V)^{-1}V'W, \qquad \hat{C} = \big(T - (2p+1)\big)^{-1}(W - V\hat{A})'(W - V\hat{A}), \qquad (4.3.2)$$
where $T = \sum_{l=1}^{H+K}(t_l - p)$ is the total number of available data points. The estimate of the model order $p \in \mathbb{N}$ is obtained by testing $H_0 : \Phi_p = 0$ against $H_a : \Phi_p \neq 0$ using the likelihood ratio statistic
$$M_p = -(T - 2p - 1 - 1/2) \ln \frac{\det(S_p)}{\det(S_{p-1})},$$
where $S_p = (W - V\hat{A})'(W - V\hat{A})$ is the residual sum of squares matrix obtained from (4.3.2) when fitting a VAR model of order $p$. For $T$ large, if $H_0$ is true, $M_p$ converges in distribution to $\chi^2_4$. Thus, for significance level $\alpha \in (0, 1)$, we reject $H_0$ if $M_p > \chi^2_{4,\alpha}$.
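The least squares fit (4.3.2) and the order-selection statistic $M_p$ can be sketched as follows. The simulated series and its VAR(2) coefficients are illustrative assumptions, not the thesis data or the estimates reported below in (4.3.3).

```python
import numpy as np

rng = np.random.default_rng(1)
# simulate one long "healthy" history from a stationary bivariate VAR(2)
Phi = [np.array([[0.4, -0.1], [0.0, 0.2]]),
       np.array([[0.3,  0.0], [0.0, 0.3]])]
delta = np.array([1.0, 2.0])
L = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 1.0]]))
N, q = 3000, 2
Z = np.zeros((N, q))
for n in range(2, N):
    Z[n] = delta + Phi[0] @ Z[n-1] + Phi[1] @ Z[n-2] + L @ rng.standard_normal(q)

def fit_var(Z, p, p_max=None):
    """Least squares fit of (4.3.2); p_max aligns samples for nested order tests."""
    p_max = p if p_max is None else p_max
    W = Z[p_max:]
    V = np.column_stack([np.ones(len(W))]
                        + [Z[p_max - r: len(Z) - r] for r in range(1, p + 1)])
    A = np.linalg.solve(V.T @ V, V.T @ W)      # A' = [delta, Phi_1, ..., Phi_p]
    S = (W - V @ A).T @ (W - V @ A)            # residual SS matrix S_p
    C_hat = S / (len(W) - (2 * p + 1))
    return A, C_hat, S, len(W)

A2, C2, _, _ = fit_var(Z, 2)
# likelihood ratio statistic M_3 for H0: Phi_3 = 0, on a common sample
_, _, S2, _ = fit_var(Z, 2, p_max=3)
_, _, S3, T = fit_var(Z, 3, p_max=3)
M3 = -(T - 2 * 3 - 1 - 0.5) * np.log(np.linalg.det(S3) / np.linalg.det(S2))
```

Because the two regressions are nested and fitted on a common sample, $S_3 \preceq S_2$ and therefore $M_3 \geq 0$; comparing $M_3$ with $\chi^2_{4,\alpha}$ mirrors the order-selection step above.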
For our 2-dimensional spectrometric oil data, we find that $M_2 = 221.84$ and $M_3 = 10.28$. From the chi-square distribution with 4 degrees of freedom and $\alpha = 0.01$, $\chi^2_{4,\alpha} = 13.28$. Since $M_2 > \chi^2_{4,0.01}$ and $M_3 < \chi^2_{4,0.01}$, we reject $H_0 : \Phi_2 = 0$ and fail to reject $H_0 : \Phi_3 = 0$. Thus we conclude that $p = 2$ is an adequate model order, and using (4.3.2) the VAR model parameter estimates are given by
$$\hat{\Phi}_1 = \begin{bmatrix} 0.3825 & -0.0758 \\ -0.0672 & 0.1775 \end{bmatrix}, \quad \hat{\Phi}_2 = \begin{bmatrix} 0.3356 & 0.0063 \\ -0.0169 & 0.3532 \end{bmatrix}, \quad \hat{\delta} = \begin{bmatrix} 7.6819 \\ 4.0570 \end{bmatrix}, \quad \hat{C} = \begin{bmatrix} 7.1789 & 2.0260 \\ 2.0260 & 3.5725 \end{bmatrix}. \qquad (4.3.3)$$
From the parameter estimates given in (4.3.3), the eigenvalues of the companion matrix
$$\Phi = \begin{bmatrix} \hat{\Phi}_1 & \hat{\Phi}_2 \\ I & 0 \end{bmatrix}$$
are 0.8200, 0.6729, $-0.4156$, and $-0.5173$, which are all smaller than one in absolute value, implying that the fitted model is stationary.
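The stationarity check can be reproduced directly from the estimates in (4.3.3): build the companion matrix $\Phi$ and inspect its eigenvalues.

```python
import numpy as np

# companion-matrix stationarity check for the fitted VAR(2), using the
# estimates reported in (4.3.3)
Phi1 = np.array([[0.3825, -0.0758],
                 [-0.0672, 0.1775]])
Phi2 = np.array([[0.3356, 0.0063],
                 [-0.0169, 0.3532]])
companion = np.vstack([np.hstack([Phi1, Phi2]),
                       np.hstack([np.eye(2), np.zeros((2, 2))])])
eigs = np.linalg.eigvals(companion)
# the fitted model is stationary iff all eigenvalues lie inside the unit circle
stationary = bool(np.all(np.abs(eigs) < 1.0))
```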
Using the estimates $\hat{\psi} = (\hat{\delta}, \hat{p}, \hat{\Phi}_1, \hat{\Phi}_2, \hat{C})$, we define the residual process $(Y_n : n \in \mathbb{N})$ by
$$Y_n := Z_n - E_{\hat{\psi}}(Z_n \mid \vec{Z}_{n-1}), \qquad (4.3.4)$$
where $\vec{Z}_{n-1} = (Z_1, \dots, Z_{n-1})$. The residuals are then computed for both the healthy and unhealthy portions of each data history. We now present a method for explicitly computing (4.3.4).
We first note that for $n > p$,
$$Y_n = Z_n - \Big[\hat{\delta} + \sum_{r=1}^{p} \hat{\Phi}_r Z_{n-r}\Big].$$
For $n \leq p$, we recursively compute $Y_n$ using the Kalman filter by writing (4.3.1) as a state-space model (see e.g. Reinsel [60]). We choose as the state and observation equations
$$\alpha_n = D + T\alpha_{n-1} + E_n, \qquad Z_n = H\alpha_n,$$
where
$$\alpha_n = \begin{bmatrix} Z_n \\ \vdots \\ Z_{n-p+1} \end{bmatrix}, \quad D = \begin{bmatrix} \hat{\delta} \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad T = \begin{bmatrix} \hat{\Phi}_1 & \hat{\Phi}_2 & \cdots & \hat{\Phi}_p \\ I_q & 0 & \cdots & 0 \\ & \ddots & & \vdots \\ 0 & \cdots & I_q & 0 \end{bmatrix}, \quad E_n = \begin{bmatrix} \varepsilon_n \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad H = (I_q \;\; 0 \;\; \cdots \;\; 0),$$
$\varepsilon_n$ are i.i.d. $N_q(0, \hat{C})$, and $q = 2$. For each $m \geq 0$, define
$$\alpha_{n+m|n} = E(\alpha_{n+m} \mid \vec{Z}_n), \qquad P_{n+m|n} = E\big((\alpha_{n+m} - \alpha_{n+m|n})(\alpha_{n+m} - \alpha_{n+m|n})' \mid \vec{Z}_n\big),$$
$$\eta_{n+m|n} = Z_{n+m} - E(Z_{n+m} \mid \vec{Z}_n), \qquad f_{n+m|n} = E(\eta_{n+m|n}\, \eta'_{n+m|n} \mid \vec{Z}_n).$$
Then the Kalman filter is given by the following six recursive equations:
$$\alpha_{n+1|n} = D + T\alpha_{n|n}, \qquad P_{n+1|n} = TP_{n|n}T' + \mathrm{Var}(E_n),$$
$$\eta_{n+1|n} = Z_{n+1} - H\alpha_{n+1|n}, \qquad f_{n+1|n} = HP_{n+1|n}H',$$
$$\alpha_{n+1|n+1} = \alpha_{n+1|n} + P_{n+1|n}H' f^{-1}_{n+1|n}\, \eta_{n+1|n}, \qquad P_{n+1|n+1} = P_{n+1|n} - P_{n+1|n}H' f^{-1}_{n+1|n}\, HP_{n+1|n},$$
which are initialized by setting
$$\alpha_{0|0} = (I_{qp} - T)^{-1}D, \qquad P_{0|0} = \mathrm{vec}^{-1}\big[(I_{(qp)^2} - T \otimes T)^{-1} \mathrm{vec}(\mathrm{Var}(E_n))\big],$$
where $\mathrm{Var}(E_n)$ is the $qp \times qp$ matrix with $\hat{C}$ in its upper-left $q \times q$ block and zeros elsewhere. Thus, for each $n \leq p$, using the recursive equations above we obtain
$$Y_n := Z_n - E_{\hat{\psi}}(Z_n \mid \vec{Z}_{n-1}) =: \eta_{n|n-1}.$$
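The residual computation can be sketched as follows. This is a minimal sketch for a fitted bivariate VAR($p$): the function mirrors the six recursions and the stationary initialization above, and the parameters used in the quick check are illustrative, not the thesis estimates.

```python
import numpy as np

def var_residuals(Z, delta, Phis, C):
    """One-step prediction errors (residuals) of a fitted VAR(p) via the Kalman
    filter; for n > p they coincide with the direct formula in the text."""
    p, q = len(Phis), Z.shape[1]
    m = q * p
    # companion-form state-space matrices
    T = np.zeros((m, m)); T[:q, :] = np.hstack(Phis); T[q:, :-q] = np.eye(m - q)
    D = np.zeros(m); D[:q] = delta
    H = np.zeros((q, m)); H[:, :q] = np.eye(q)
    QE = np.zeros((m, m)); QE[:q, :q] = C          # Var(E_n)
    # stationary initialization, as in the text
    a = np.linalg.solve(np.eye(m) - T, D)
    P = np.linalg.solve(np.eye(m * m) - np.kron(T, T), QE.ravel()).reshape(m, m)
    Y = np.zeros_like(Z)
    for n in range(len(Z)):
        a_pred, P_pred = D + T @ a, T @ P @ T.T + QE   # prediction step
        f = H @ P_pred @ H.T
        Y[n] = Z[n] - H @ a_pred                       # residual eta_{n|n-1}
        K = P_pred @ H.T @ np.linalg.inv(f)
        a = a_pred + K @ Y[n]                          # update step
        P = P_pred - K @ H @ P_pred
    return Y

# quick check on simulated data with illustrative parameters
rng = np.random.default_rng(2)
Phis = [np.array([[0.4, -0.1], [0.0, 0.2]]), np.array([[0.3, 0.0], [0.0, 0.3]])]
delta, C = np.array([1.0, 2.0]), np.eye(2)
Z = np.zeros((12, 2))
for n in range(2, 12):
    Z[n] = delta + Phis[0] @ Z[n-1] + Phis[1] @ Z[n-2] + rng.standard_normal(2)
Y = var_residuals(Z, delta, Phis, C)
```

Once $p$ observations have been processed the state is known exactly, so the filter's residuals for $n > p$ reproduce the direct formula.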
The residuals for both the healthy and warning data sets are provided graphically in
a 2-dimensional scatter plot in Figure 4.3.2.
We statistically test the independence and normality assumptions using the Port-
manteau Independence Test [15] and the Henze-Zirkler Multivariate Normality Test [29],
respectively, and obtain the following results.
Table 4.3.1 shows that there is no statistical evidence to reject the hypotheses that the residuals of the fitted model are independent and follow a multivariate normal distribution, as proved theoretically by Yang and Makis [79].
Chapter 4. Parameter Estimation for Stochastic Systems 84
Figure 4.3.2: Scatter plot for the residuals. The crosses are residuals computed from the healthy data
and the circles are residuals computed from the unhealthy data.
Table 4.3.1: p-Values of the Residual Independence and Normality Tests.

                              Healthy Data Set    Unhealthy Data Set
Independence (Portmanteau)    0.0675              0.4284
Normality (Henze-Zirkler)     0.6911              0.5270
The residuals now constitute the observation process $(Y_n)$ in our hidden Markov model. Using equations (4.2.23) and (4.2.24) of Section 4.2.3, and the Euclidean norm stopping criterion $|(\gamma_{n+1}, \theta_{n+1}) - (\gamma_n, \theta_n)| < 10^{-4}$, we have obtained the results summarized in Table 4.3.2.
Table 4.3.2 shows that iterations of the EM algorithm take on average 8.27 seconds, which is extremely fast for offline computations. Furthermore, the estimates converge rapidly, in 3 iterations, which is an attractive feature for real applications. All computations were coded in Matlab on an Intel Core 2 6420, 2.13 GHz with 2 GB RAM.
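The iteration scheme with its Euclidean-norm stopping rule can be sketched generically; `update` below is a toy contraction standing in for one full EM step (E-step followed by M-step), so the example is self-contained.

```python
import numpy as np

def iterate(update, x0, tol=1e-4, max_iter=1000):
    """Repeat x <- update(x) until successive iterates differ by less than tol
    in Euclidean norm, mirroring the stopping criterion used for the EM runs."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x_new = update(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

# toy contraction with fixed point (2, 4)
x_star, iters = iterate(lambda x: 0.5 * x + np.array([1.0, 2.0]), [0.0, 0.0])
```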
Table 4.3.2: Iterations of the EM Algorithm. (Matrices are written row-wise as [a b; c d], vectors as (a, b)′.)

            Initial Values       Update 1             Update 2             Optimal Estimates
q12         0.0030               0.0410               0.0302               0.0303
q13         0.0000               0.0001               0.0001               0.0001
q23         0.1500               0.3510               0.3545               0.3548
mu1         (1.5, 0.8)′          (1.2, 0.8)′          (1.1, 1.9)′          (1.1, 1.9)′
mu2         (11, 5.5)′           (4.2, 5.3)′          (4.2, 5.5)′          (4.1, 5.5)′
Sigma1      [11.2 6.8; 6.8 8.9]  [7.2 1.8; 1.8 3.2]   [7.2 1.9; 1.9 3.7]   [7.2 2.0; 2.0 3.6]
Sigma2      [11.2 6.8; 6.8 8.9]  [7.4 1.3; 1.3 3.1]   [7.5 1.1; 1.1 3.2]   [7.6 1.0; 1.0 3.2]
Q           -1.78×10^-3          -1.41×10^-3          -1.39×10^-3          -1.39×10^-3
Time (sec)  4.12                 7.19                 9.83                 11.95

Thus, for this application, the condition of the transmission unit is modeled as a
continuous time homogeneous Markov chain $(X_t : t \in \mathbb{R}_+)$ with state space $X = \{1, 2\} \cup \{3\}$ and transition rate matrix
$$Q = \begin{bmatrix} -0.0304 & 0.0303 & 0.0001 \\ 0 & -0.3548 & 0.3548 \\ 0 & 0 & 0 \end{bmatrix},$$
and the bivariate residual vector $Y_n$ follows $N_2(\mu_1, \Sigma_1)$ when the system is in healthy state 1 and $N_2(\mu_2, \Sigma_2)$ when the system is in warning state 2, where
$$\mu_1 = \begin{bmatrix} 1.1 \\ 1.9 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 4.1 \\ 5.5 \end{bmatrix}, \quad \Sigma_1 = \begin{bmatrix} 7.2 & 2.0 \\ 2.0 & 3.6 \end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix} 7.6 & 1.0 \\ 1.0 & 3.2 \end{bmatrix}.$$
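As a quick consequence of the fitted rate matrix, the expected time to failure can be read off the transient block of $Q$. This is a sketch; time is measured in whatever units the rates were estimated in.

```python
import numpy as np

# fitted transition rate matrix from the model above
Q = np.array([[-0.0304, 0.0303, 0.0001],
              [ 0.0,   -0.3548, 0.3548],
              [ 0.0,    0.0,    0.0   ]])
Q_transient = Q[:2, :2]                     # transient states {1, 2}
# expected time to absorption in failure state 3, from each transient state:
# solve (-Q_transient) m = 1
mean_times = np.linalg.solve(-Q_transient, np.ones(2))
mttf_healthy, mttf_warning = mean_times
```

The warning-state value is simply the mean sojourn $1/0.3548$, while the healthy-state value adds the expected time spent in state 1 plus the chance-weighted warning sojourn.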
4.4 Conclusions and Future Research
In this chapter, a parameter estimation problem for partially observable failing systems
has been considered. System deterioration is driven by a continuous time homogeneous
Markov chain and the system state is unobservable, except for the failure state. Vector
autoregressive information is obtained through condition monitoring at equidistant sam-
pling times. Two types of data histories were considered: data histories that end with
observable failure and data histories that end when the system has been suspended from
operation. The state and observation process have been modeled in the hidden Markov
framework and the maximum likelihood estimates of the model parameters have been
obtained using the EM algorithm. It was shown that both the pseudo likelihood function
and the parameter updates in each iteration of the EM algorithm have explicit formulas.
A numerical example has been developed to illustrate the estimation procedure using real
oil data coming from failing transmission units. It has been found that the procedure is
both computationally efficient and converges rapidly to reasonable parameter estimates.
There are a variety of interesting extensions and topics for future research. Recall
that at the beginning of Section 4.3, the observation histories were pre-processed and
residuals obtained. One direction of future research would be to systematically investigate
the effect that different pre-processing methods have on the parameter estimates in the
hidden Markov framework. In Section 4.3, it was shown empirically that the parameter
estimates converged quite rapidly using the EM algorithm. Another interesting topic of
future research would be to analytically investigate the rate of this convergence using
methods of mathematical statistics. Finally, recall that it was assumed that only a single
vector measurement is taken at each sampling epoch, which is the usual case in condition-
based maintenance applications. A final interesting topic for future research would be to
see if the analysis given in this chapter can be extended to the case where more than one
sampling unit is collected at each sampling epoch.
Bibliography
[1] Anderson, R.F.; Friedman, A. Optimal Inspection in a Stochastic Control Problem
with Costly Observations II. Math Oper Res, 1978, 3, 67-81.
[2] Asmussen, S.; Nerman, O.; Olsson M. Fitting Phase-Type Distribution via the EM
Algorithm. Scan J Stat, 1996, 23, 419-441.
[3] Aven, T.; Bergman, B. Optimal Replacement Times - A General Set-up. J Appl
Probab, 1986, 23, 432-442.
[4] Avriel, M.; Diewert, W.E.; Schaible, S.; Zang, I. Generalized Concavity. Springer,
1988.
[5] Baddeley, A.; Turner, R.; Moller, J.; Hazelton, M. Residual Analysis for Spatial
Point Processes. J Roy Stat Soc, 2005, 67, 617-666.
[6] Barlow, R.; Hunter, L. Optimum Preventive Maintenance Policies. Oper Res, 1960,
8, 90-100.
[7] Bertsekas, D.P.; Shreve, S.E. Stochastic Optimal Control: The Discrete Time Case.
Academic Press, New York, 1978.
[8] Bunks, C.; McCarthy, D.; Al-Ani, T. Condition-Based Maintenance of Machines Using
Hidden Markov Models. Mech Syst Signal Pr, 2000, 14, 597-612.
[9] Billingsley, P. Probability and Measure. Wiley-Interscience, 1995.
[10] Bremaud, P. Point Processes and Queues: Martingale Dynamics. Springer-Verlag,
1981.
[11] Calabrese, J.M. Bayesian Process Control for Attributes. Manage Sci, 1995, 41,
637-645.
[12] Cekyay, B.; Ozekici, S. Condition-Based Maintenance under Markovian Deteriora-
tion. In Wiley Encyclopedia of Operations Research and Management Science, John
Wiley & Sons, NJ, 2011.
[13] Chhatwal, J.; Alagoz, O.; Burnside, E.S. Optimal Breast Biopsy Decision-Making
Based on Mammographic Features and Demographic Factors. Oper Res, 2010, 58,
1577-1591.
[14] Christer, A.H.; Wang, W.; Sharp, J.M. A State Space Condition Monitoring Model
for Furnace Erosion Prediction and Replacement. Eur J Oper Res, 1997, 101, 1-14.
[15] Cromwell, J.B.; Hannan, M.J.; Labys, W.C.; Terraza, M. Multivariate Tests for
Time Series Models. Sage Publications, 1994.
[16] Davis, M.H.A. Markov Models and Optimization. Chapman and Hall, 1993.
[17] Dayanik, S.; Goulding, C.; Poor, H.V. Bayesian Sequential Change Diagnosis. Math
Oper Res, 2008, 33, 475-496.
[18] Dayanik, S.; Gurler, U. An Adaptive Bayesian Replacement Policy with Minimal
Repair. Oper Res, 2002, 50, 552-558.
[19] Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum Likelihood from Incomplete
Data via the EM Algorithm. J Roy Stat Soc, 1977, 39, 1-38.
[20] Dieulle, L.; Berenguer, C.; Grall, A.; Roussignol, M. Sequential Condition-Based
Maintenance Scheduling for a Deteriorating System. Eur J Oper Res, 2003, 150,
451-461.
[21] Dogramaci, A.; Fraiman, N.M. Replacement Decisions with Maintenance Under
Uncertainty: An Imbedded Optimal Control Model. Oper Res, 2004, 52, 785-794.
[22] Douc, R.; Moulines, E.; Ryden, T. Asymptotic Properties of the Maximum Likeli-
hood Estimator in Autoregressive Models with Markov Regime. Ann Stat, 2004, 32,
2254-2304.
[23] Fukuda, K.; Stanley, H.E.; Amaral, L.A.N. Heuristic Segmentation of Non-
Stationary Time Series. Phys Rev E, 2004, 69, 1-12.
[24] Ghasemi, A.; Yacout, S.; Ouali, M.S. Parameter Estimation Methods for Condition-
Based Maintenance with Indirect Observations. IEEE T Reliab, 2010, 59, 426-439.
[25] Genon-Catalot, V.; Laredo, C. Leroux’s Method for General Hidden Markov Models.
Stoch Proc Appl, 2006, 116, 222-243.
[26] Grimmett, G.; Stirzaker, D. Probability and Random Processes. Oxford University
Press, 2001.
[27] Hamilton, J.D. Analysis of Time Series Subject to Changes in Regime. J Economet-
rics, 1990, 45, 39-70.
[28] Heidergott, B.; Farenhorst-Yuan, T. Gradient Estimation for Multicomponent Main-
tenance Systems with Age-Replacement Policy. Oper Res, 2010, 58, 706-718.
[29] Henze, N.; Zirkler, B. A Class of Invariant Consistent Tests for Multivariate Nor-
mality. Commun Stat A-Theor, 1990, 19, 3595-3617.
[30] Jardine, A.K.S.; Lin, D.; Banjevic, D. A Review on Machinery Diagnostics and Prog-
nostics Implementing Condition-Based Maintenance. Mech Syst Signal Pr, 2006, 20,
1483-1510.
[31] Jensen, U. Monotone Stopping Rules for Stochastic Processes in a Semimartingale
Representation with Applications. Optim, 1989, 6, 837-852.
[32] Jiang, R. Optimization of Alarm Threshold and Sequential Inspection Scheme. Re-
liab Eng Syst Safe, 2010, 95, 208-215.
[33] Jiang, X.; Makis, V.; Jardine, A.K.S. Optimal Repair-Replacement Policy for a
General Repair Model. Adv Appl Probab, 2001, 33, 206-222.
[34] Juang, M.; Anderson, G. A Bayesian Method on Adaptive Preventive Maintenance
Problem. Eur J Oper Res, 2004, 155, 455-473.
[35] Kander, Z. Inspection Policies for Deteriorating Equipment Characterized by N
Quality Levels. Nav Res Log, 1978, 25, 243-255.
[36] Keogh, E.; Chu, S.; Hart, D.; Pazzani, M. Segmenting Time Series: A Survey and
Novel Approach. World Scientific, 1993.
[37] Kim, C.G. Dynamic Linear Models with Markov-Switching. J Econometrics, 1994,
60, 1-22.
[38] Kim, M.J.; Jiang, R.; Makis, V.; Lee, C.G. Optimal Bayesian Fault Prediction
Scheme for a Partially Observable System Subject to Random Failure. Eur J Oper
Res, 2011, 214, 331-339.
[39] Kim, M.J.; Makis, V. Optimal Control of Partially Observable Failing Systems with
Costly Multivariate Observations. Stoch Model, 2012, Under Review.
[40] Kim, M.J.; Makis, V. Joint Optimization of Sampling and Control of Partially Ob-
servable Failing Systems. Oper Res, 2012, Under Review.
[41] Kim, M.J.; Makis, V.; Jiang, R. Parameter Estimation in a Condition Based Main-
tenance Model. Stat Probab Lett, 2010, 80, 1633-1639.
[42] Kim, M.J.; Makis, V.; Jiang, R. Parameter Estimation for Partially Observable
Systems Subject to Random Failure. Appl Stoch Model Bus, 2012, forthcoming.
[43] Krishnamurthy, V.; Yin, G.G. Recursive Algorithms for Estimation of Hidden Markov
Models and Autoregressive Models with Markov Regime. IEEE T Inform Theory, 2002,
48, 458-476.
[44] Kurt, M.; Kharoufeh, J.P. Optimally Maintaining a Markovian Deteriorating System
with Limited Imperfect Repairs. Eur J Oper Res, 2010, 205, 368-380.
[45] Lam, C.T.; Yeh, R.H. Comparison of Sequential and Continuous Inspection Strate-
gies for Deteriorating Systems. Adv Appl Probab, 1994, 26, 423-435.
[46] Li, H.; Shaked, M. Imperfect Repair Models with Preventive Maintenance. J Appl
Probab, 2003, 40, 1043-1059.
[47] Lin, D.; Makis, V. Recursive Filters for a Partially Observable System Subject to
Random Failure. Adv Appl Probab, 2003, 35, 207-227.
[48] Liporace, L.A. Maximum Likelihood Estimation for Multivariate Observations of
Markov Sources. IEEE T Inform Theory, 1982, 28, 729-734.
[49] Makis, V. Multivariate Bayesian Control Chart. Oper Res, 2008, 56, 487-496.
[50] Makis, V.; Jardine, A.K.S. Optimal Replacement in the Proportional Hazards
Model. INFOR, 1992, 30, 172-183.
[51] Makis, V.; Jiang, X. Optimal Replacement Under Partial Observations. Math Oper
Res, 2003, 28, 382-394.
[52] Makis, V.; Jiang, X.; Cheng, K. Optimal Preventive Replacement Under Minimal
Repair and Random Repair Costs. Math Oper Res, 2000, 25, 141-156.
[53] Makis, V.; Wu, J.; Gao, Y. An Application of DPCA to Oil Data for CBM Modeling.
Eur J Oper Res, 2006, 174, 112-123.
[54] Maillart, L.M.; Ivy, J.S.; Ransom, S.; Diehl, K. Assessing Dynamic Breast Cancer
Screening Policies. Oper Res, 2008, 56, 1411-1427.
[55] McLachlan, G.J.; Krishnan, T. The EM Algorithm and Extensions. John Wiley &
Sons, 2008.
[56] Neuts, M.F.; Perez-Ocon, R.; Torres-Castro, I. Repairable Models with Operating
and Repair Times Governed by Phase Type Distributions. Adv Appl Probab, 2000,
32, 468-479.
[57] Nikiforov, I.V. A Generalized Change Detection Problem. IEEE T Inform Theory,
1995, 41, 171-187.
[58] Ohnishi, M.; Kawai, H.; Mine, H. An Optimal Inspection and Replacement Policy
for a Deteriorating System. J Appl Probab, 1986, 23, 973-988.
[59] Provost, S.B.; Rudiuk, E.M. The Exact Distribution of Indefinite Quadratic Forms
in Noncentral Normal Vectors. Ann I Stat Math, 1996, 48, 381-394.
[60] Reinsel, G.C. Elements of Multivariate Time Series Analysis. Springer, New York,
1997.
[61] Roberts, W.J.J.; Ephraim, Y. An EM Algorithm for Ion-Channel Current Estima-
tion. IEEE T Signal Proces, 2008, 56, 26-33.
[62] Rosenfield, D. Markovian Deterioration with Uncertain Information. Oper Res, 1976,
24, 141-155.
[63] Ross, S.M. Quality Control Under Markovian Deterioration. Manage Sci, 1971, 17,
587-596.
[64] Ryden, T. On Recursive Estimation for Hidden Markov Models. Stoch Proc Appl,
1997, 66, 79-96.
[65] Schervish, M.J. Theory of Statistics. Springer, 1995.
[66] Shechter, S.M.; Bailey, M.D.; Schaefer, A.J.; Roberts, M.S. The Optimal Time to
Initiate HIV Therapy Under Ordered Health States. Oper Res, 2008, 56, 20-33.
[67] Schneider, H.; Frank, P.M. Observer-Based Supervision and Fault Detection in
Robots Using Nonlinear and Fuzzy Logic Residual Evaluation. IEEE T Contr Syst
T, 1996, 4, 274-282.
[68] Schoenberg, F.P. Multidimensional Residual Analysis of Point Process Models for
Earthquake Occurrences. J Am Stat Assoc, 2003, 98, 789-795.
[69] Sohn, H.; Farrar, C.R. Damage Diagnosis Using Time Series Analysis of Vibration
Signals. Smart Mater Struct, 2001, 10, 446-451.
[70] Tagaras, G.; Nikolaidis, Y. Comparing the Effectiveness of Various Bayesian X̄ Con-
trol Charts. Oper Res, 2002, 50, 878-888.
[71] Tijms, H.C. Stochastic Models: An Algorithmic Approach. John Wiley, 1994.
[72] Valdez-Flores, C.; Feldman, R. A Survey of Preventive Maintenance Models for
Stochastically Deteriorating Single-Unit Systems. Nav Res Log, 1989, 36, 419-446.
[73] Wang, L.; Chu, J.; Mao, W. A Condition-based Replacement and Spare Provisioning
Policy for Deteriorating Systems with Uncertain Deterioration to Failure. Eur J Oper
Res, 2009, 194, 184-205.
[74] Wang, H. A Survey of Maintenance Policies of Deteriorating Systems. Eur J Oper
Res, 2002, 139, 469-489.
[75] Wang, W.; Wong, A.K. Autoregressive Model-Based Gear Fault Diagnosis. T ASME,
2002, 124, 172-179.
[76] Wang, X.; Makis, V.; Yang, M. A Wavelet Approach to Fault Diagnosis of a Gearbox
Under Varying Load Conditions. J Sound Vib, 2010, 329, 1570-1585.
[77] Wu, C.F.J. On the Convergence Properties of the EM Algorithm. Ann Stat, 1983,
11, 95-103.
[78] Wu, J.; Makis, V. Economic and Economic-Statistical Design of a Chi-Square Chart
for CBM. Eur J Oper Res, 2008, 188, 516-529.
[79] Yang, J.; Makis, V. Dynamic Response of Residual to External Deviations in a
Controlled Production Process. Technometrics, 2000, 42, 290-299.
[80] Yang M.; Makis V. ARX Model-Based Gearbox Fault Detection and Localization
Under Varying Load Conditions. J Sound Vib, 2010, 329, 5209-5221.
[81] Yin, Z.; Makis, V. Economic and Economic-Statistical Design of a Multivariate
Bayesian Control Chart for Condition-Based Maintenance. IMA J Manage Math,
2011, 22, 47-63.
[82] Yeh, R.H. Optimal Inspection and Replacement Policies for Multi-State Deteriorat-
ing Systems. Eur J Oper Res, 1996, 96, 248-259.