Optimal Control and Estimation of Stochastic Systems with Costly Partial Information
by
Michael Jong Kim
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Industrial Engineering
University of Toronto
Copyright © 2012 by Michael Jong Kim
Abstract
Optimal Control and Estimation of Stochastic Systems with Costly Partial Information
Michael Jong Kim
Doctor of Philosophy
Graduate Department of Industrial Engineering
University of Toronto
2012
Stochastic control problems that arise in sequential decision making applications typically
assume that information used for decision-making is obtained according to a predeter-
mined sampling schedule. In many real applications however, there is a high sampling
cost associated with collecting such data. It is therefore of equal importance to determine
when information should be collected as it is to decide how this information should be
utilized for optimal decision-making. This type of joint optimization has been a long-
standing problem in the operations research literature, and very few results regarding
the structure of the optimal sampling and control policy have been published. In this
thesis, the joint optimization of sampling and control is studied in the context of mainte-
nance optimization. New theoretical results characterizing the structure of the optimal
policy are established, which have practical interpretation and give new insight into the
value of condition-based maintenance programs in life-cycle asset management. Applica-
tions in other areas such as healthcare decision-making and statistical process control are
discussed. Statistical parameter estimation results are also developed with illustrative
real-world numerical examples.
To Li
Acknowledgements
First and foremost, I thank my dearest family - Leo, little Sammy, Shaddy, Hobo, Fed
(the peck), Sammy, Li, Janice, Jona, Ma and Dad. This thesis was possible because of
you.
Of course, I thank my supervisor Viliam Makis, who taught me (among many impor-
tant lessons) the importance of research excellence, dedication and hard work. Pretty
much everything I know about the beautiful world of stochastic modeling and optimiza-
tion, I learned from you. Working with you has been a true privilege.
I would also like to give a special thanks to Roy Kwon and Daniel Frances for their
friendship and continued support throughout my years at U of T.
I thank my PhD committee Roy Kwon, Jeremy Quastel, Baris Balcioglu, Daniel
Frances, Haitao Liao and Kagan Kerman, for their guidance and advice, and Chi-Guhn
Lee and Timothy Chan for their great help in my final year of PhD. I must also thank
the amazing MIE graduate staff Brenda Fung, Donna Liu and Lorna Wong, who always
patiently answered my (many) questions. I also can’t thank NSERC enough for support-
ing my passion for research since my undergraduate studies. Your support has made all
the difference.
Finally, I would like to thank all the awesome friends I’ve made during my stay at U
of T. My QRM lab: Zhijian Yin, Bing Liu, Ming Yang, Rui (Eric) Jiang, Jing Yu, Zillur
Rahman, Jue Wang, Lawrence Yip, Jian Liu, Cathy Hancharek, Konstantin Shestopaloff,
Chen Lin, Akram Khaleghei GB and Farnoosh Naderkhani. And my UTORG crew:
Jonathan Li, Vahid Sarhangian, Kimia Ghobadi, Jenya Doudareva, Velibor Misic and
Hamid Ghaffari. You are the reason my stay here was always fun and full of laughs.
I thank you all.
Contents

1 Introduction
2 Optimal Control of Stochastic Systems
2.1 Model Formulation
2.2 Derivation of the Optimality Equation and Structural Properties
2.3 Optimality of Bayesian Control Chart
2.4 Computation of the Optimal Policy
2.5 Conclusions and Future Research
3 Optimal Sampling and Control of Stochastic Systems
3.1 Model Formulation
3.2 Structural Form of the Optimal Policy
3.3 Computation of the Optimal Policy
3.3.1 Constructing the Optimal Control Chart
3.3.2 Comparison with Other Policies
3.4 Conclusions and Future Research
4 Parameter Estimation for Stochastic Systems
4.1 Model Formulation
4.2 Parameter Estimation Using the EM Algorithm
4.2.1 Form of the Likelihood Function
4.2.2 Form of the Pseudo Likelihood
4.2.3 Maximization of the Pseudo Likelihood Function
4.3 A Practical Application
4.4 Conclusions and Future Research
Bibliography
Chapter 1
Introduction
The motivation behind this thesis comes from the following scenario. Consider a system
that begins in a brand new state, deteriorates over time due to use, and ultimately
fails. Over the system’s useful life, data is collected at discrete time points to get partial
information about its condition, since the true level of system deterioration is generally
unknown. When the system fails it is replaced by a new independent system of the
same type. Data is once again collected until the new system fails and is replaced by
yet another system, and the cycle continues. Suppose we have the ability to dynamically
control the replacement cycles in two ways. First, we can decide at what time points
data will be collected, and second, we can opt to replace a functional system before it
fails. If we impose a cost structure, (e.g. replacement costs, data collection costs, etc.)
it is natural to ask whether there exists an optimal control policy that minimizes some
useful cost objective such as the expected long-term cost rate over an infinite horizon.
More importantly, one may also want to know if such an optimal control policy possesses
any insightful structural properties that can be utilized at a managerial level, or to aid
further algorithmic developments.
In this thesis, we formulate and analyze the above problem statement under different
model assumptions. New theoretical results characterizing the structure of the optimal
control policies are established, which have practical interpretation and give new insight
into the value of condition-based maintenance programs in life-cycle asset management.
Statistical parameter estimation results are also developed with illustrative real-world
numerical examples. In particular, we consider a mining industry application where con-
dition monitoring data comes from the transmission oil samples of 240-ton heavy hauler
trucks used in the Athabasca oil sands of Alberta, Canada. During the operational life of
each transmission unit, oil samples are collected at discrete time points (approximately
every 600 hours) and spectrometric oil analysis is carried out, which provides the concen-
trations (in ppm) of iron and copper that come from the direct wear of the transmission
unit. This data gives partial information about the transmission’s condition, since the
true condition of the unit is unobservable. Using this data set, we illustrate the practical
benefits of both the control and estimation results of this thesis.
In Chapter 2, we consider the optimal control problem with periodic inspections. The
state process follows an unobservable continuous time homogeneous Markov process.
At equidistant sampling times vector-valued observations having multivariate normal
distribution with state-dependent mean and covariance matrix are obtained at a positive
cost. At each sampling epoch a decision is made either to run the system until the next
sampling epoch or to carry out full preventive maintenance, which is assumed to be less
costly than corrective maintenance carried out upon system failure. The objective is to
determine the optimal control policy that minimizes the long-run expected average cost
per unit time. We formulate the problem as an optimal stopping problem with partial
information. We show that the optimal preventive maintenance region is a convex subset
of Euclidean space. We also analyze the practical three-state version of this problem in
detail and show that in this case the optimal policy is a control limit policy. Based on
this structural result, an efficient computational algorithm is developed for the three-state
problem, illustrated by a real-world numerical example.
In Chapter 3, we consider the situation in which the decision maker can decide when
condition monitoring information should be collected, as well as when to initiate preven-
tive maintenance. The objective is to characterize the structural form of the optimal sam-
pling and maintenance policy that minimizes the long-run expected cost per unit time.
The problem is formulated as a partially observable Markov decision process (POMDP).
It is shown that monitoring the posterior probability that the system is in a so-called
warning state is sufficient for decision-making. We prove that the optimal control policy
can be represented as a control chart with three critical thresholds. Such a control chart
has direct practical value as it can be readily implemented for online decision-making.
Implications of the structural results, such as planning maintenance activities into the
future, are discussed, and cost comparisons with other suboptimal policies are developed
which illustrate the benefits of the joint optimization of sampling and control.
In Chapter 4, we present a parameter estimation procedure for a condition-based
maintenance model with partial information. Two types of data histories are available:
data histories that end with observable failure, and censored data histories that end
when the system has been suspended from operation but has not failed. The approach
taken in this chapter is to first pre-process the data histories and remove as much of the
autocorrelation as possible before proceeding to hidden Markov modeling. The idea is
to first decide on an initial approximation for the healthy portions of the data histories
and fit a time series model to the healthy data portions. The residuals using the fitted
model are then computed for both healthy and unhealthy portions of data histories, and
formal statistical tests for conditional independence and multivariate normality are per-
formed. The residuals are then chosen as the “observation” process in the hidden Markov
framework. The main advantage of this approach is that the conditional independence
and multivariate normality of the residuals are essential for tractable maintenance opti-
mization modeling, and, as a result, computational times for parameter estimation are
extremely fast. The model parameters are estimated using the EM algorithm. We show
that both the pseudo likelihood function and the parameter updates in each iteration
of the EM algorithm have explicit formulas. The estimation procedure is illustrated on
real-world data from the mining industry.
Bibliographical note. Chapter 2 contains results from Kim and Makis [39]. Chap-
ter 3 contains results from Kim and Makis [40]. Chapter 4 contains results from Kim et
al. [38], Kim et al. [41] and Kim et al. [42].
Chapter 2

Optimal Control of Stochastically Failing Systems with Periodic Inspections
Consider a deteriorating system that can be in one of N unobservable operational states
1, . . . , N, or in an observable failure state N + 1. The state process (Xt : t ∈ R+) follows
a continuous time homogeneous Markov chain with state space {1, . . . , N} ∪ {N + 1}.
At equidistant sampling times ∆, 2∆, . . ., vector data Y1, Y2, . . . ∈ Rd, are sampled at
a positive cost. We assume that (Yn) have multivariate normal distribution with state-
dependent mean and covariance matrix. The observations represent information obtained
through condition monitoring, such as engine oil data obtained from spectrometric analy-
sis or vibration data collected from rotating machinery. When the system fails, corrective
maintenance is performed, which is either a replacement or a maintenance action that
returns the system to a “good-as-new” condition, i.e. returns Xt to state 1. At each
sampling epoch, a decision is made either to run the system until the next sampling
epoch or to carry out full preventive maintenance. Preventive maintenance also returns
the system to a “good-as-new” condition. The objective is to determine the optimal
control policy that minimizes the long-run expected average cost per unit time.
A lot of recent theoretical research has been done on the analysis and control of
maintenance models. Neuts et al. [56] considered a failing system governed by phase
type distributions. The authors analyzed the stationary distribution of the state process
and considered two types of performance measures: availability and rate of occurrence
of failures (ROCOF). Makis et al. [52] considered a repair/replacement model for a
single unit system with random repair costs. Jiang et al. [33] studied a maintenance
model with general repair and two types of replacement actions: failure and preventive
replacement. The authors proved that a generalized repair-cost-limit policy is optimal for
the minimization of the long-run expected average cost per unit time. Li and Shaked [46]
analyzed an imperfect repair model also subject to preventive maintenance. The authors
compared a variety of different maintenance policies using a point-process approach.
Some recent and classical survey papers on maintenance optimization are [12], [72] and
[74]. In addition to the theoretical work done in this area, maintenance models have been
successfully applied in many real world applications including furnace erosion prediction
using the state-space model [14], transmission fault detection using the proportional
hazards model [50], and helicopter gearbox state assessment using the hidden Markov
model [8].
We show in this chapter that the optimal control policy for the three-state version
of our model is a control limit policy. This provides a formal justification for the recent
papers by Yin and Makis [81] and Kim et al. [38] who proposed Bayesian control charts
for maintenance decision making, but did not prove that such a control policy is optimal.
This also shows that the χ2 control chart recently proposed by Wu and Makis [78] is in
fact a suboptimal control policy. The model considered in this chapter can also be viewed
as a generalization of a recent model considered by Makis [49], who analyzed a two-state
version of our model in the context of quality control, but did not consider observable
failure information, a property that is present in maintenance applications.
The remainder of this chapter is organized as follows. In §2.1, we describe the model
and formulate the control problem as an optimal stopping problem with partial infor-
mation. In §2.2, we use the λ−minimization technique to transform the problem into
a stopping problem with an additive objective function, which is easier to analyze. We
derive the optimality equation and characterize the structural properties of the optimal
control policy. It is shown that the optimal preventive maintenance region is a convex
subset of Euclidean space. In §2.3, we treat the practical three-state version of our model
in detail and show that the optimal control policy is a control limit policy. Based on
this structural property, in §2.4, an efficient computational algorithm is developed for the
three-state problem, illustrated by a numerical example. Concluding remarks and future
research directions are provided in §2.5.
2.1 Model Formulation
Let (Ω,F , P ) be a complete probability space on which the following stochastic processes
are defined. The state process (Xt : t ∈ R+) is a continuous time homogeneous Markov
chain with N ∈ N unobservable operational states X = 1, . . . , N and an observable
failure state N + 1, so that the state space of the Markov chain is X = X ∪ N + 1.
The instantaneous transition rates
qij = limh→0+
P (Xh = j|X0 = i)
h< +∞, i 6= j ∈ X
qii = −∑j 6=i
qij,
and the state transition rate matrix Q = (qij)N+1×N+1. We assume that if i < j, state
i is not worse than state j, and state 1 denotes the state of a new system. To model
such monotonic system deterioration, we assume that the state process is non-decreasing
with probability 1, i.e. qij = 0 for all j < i. This implies that the failure state is
absorbing. We also assume that if i < j, then failure rates qi,N+1 ≤ qj,N+1. Upon system
Chapter 2. Optimal Control of Stochastic Systems 8
failure, corrective maintenance is carried out, which brings the system to a new state.
The observable time to system failure is denoted ξ := inf t ∈ R+ : Xt = N + 1.
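The generator assumptions above (zero rates below the diagonal, absorbing failure state, non-decreasing failure rates) can be checked mechanically. The following sketch uses a hypothetical instance with N = 2 operational states; all numerical rates are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Illustrative generator Q for N = 2 operational states plus failure state 3.
Q = np.array([
    [-0.3, 0.2, 0.1],   # state 1 (new): may degrade to state 2 or fail
    [0.0, -0.4, 0.4],   # state 2 (warning): may only fail
    [0.0, 0.0, 0.0],    # state 3: absorbing failure state
])

assert np.allclose(Q.sum(axis=1), 0.0)   # generator rows sum to zero
assert np.all(np.tril(Q, -1) == 0.0)     # q_ij = 0 for j < i (monotone deterioration)
assert Q[0, 2] <= Q[1, 2]                # non-decreasing failure rates q_{i,N+1}
```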
The system is monitored at equidistant sampling times ∆, 2∆, . . ., ∆ ∈ (0, +∞),
and the information obtained at time n∆ is denoted Yn ∈ R^d. While the system is in
operational state i ∈ X, we assume Yn | Xn∆ = i ∼ Nd(µi, Σi), and that the observations
(Yn : n ∈ N) are conditionally independent given the system state. The conditional
density of Yn given Xn∆ = i is

f(y|i) = (1 / √((2π)^d det(Σi))) exp( −(1/2)(y − µi)′ Σi^{−1} (y − µi) ), y ∈ R^d, i ∈ X. (2.1.1)

Let F = (Fn : n ∈ Z+) be the complete natural filtration generated by the observable
information at each sampling epoch,

Fn = σ(Y1, . . . , Yn, I{ξ > n∆}).
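The density (2.1.1) can be written out directly; the sketch below is a minimal NumPy implementation not tied to any particular state parameters, with the 1-D standard normal used only as a sanity check.

```python
import numpy as np

def f(y, mu, Sigma):
    """Multivariate normal density N_d(mu, Sigma) at y, as in (2.1.1)."""
    y, mu, Sigma = np.atleast_1d(y), np.atleast_1d(mu), np.atleast_2d(Sigma)
    d = len(mu)
    diff = y - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (y - mu)' Sigma^{-1} (y - mu)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(Sigma))

# Sanity check: the 1-D standard normal at its mean equals 1 / sqrt(2 pi).
val = f(0.0, 0.0, 1.0)
assert abs(val - 1.0 / np.sqrt(2.0 * np.pi)) < 1e-12
```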
After collecting a sample and processing the new information, a decision is made either
to run the system until the next sampling epoch or carry out full preventive maintenance,
which brings the system to a new state. We consider the following cost structure.
Ci = operational cost rate in state i ∈ X .
Cfi = corrective maintenance cost if failure occurs from operational state i ∈ X .
Cpi = preventive maintenance cost in operational state i ∈ X .
Cs = sampling cost incurred when obtaining each observation Yn.
We assume that preventive maintenance becomes more costly as the system deteriorates,
i.e. Cpi ≤ Cpj for i ≤ j. Furthermore, preventive maintenance is assumed to be less
costly than corrective maintenance, i.e. maxi∈X Cpi < mini∈X Cfi. This assumption
is a requirement to make the problem non-trivial. Indeed, if the cost due to system failure
is lower than the cost of preventive maintenance, then the optimal action is always to let
the system run until failure.
The objective is to determine the optimal control policy minimizing the long-run
expected average cost per unit time. The problem can be formulated as an optimal
stopping problem with partial information. From renewal theory, the long-run expected
average cost per unit time is calculated for any control policy as the expected cost incurred
in one cycle divided by the expected cycle length, where a cycle is completed when either
preventive or corrective maintenance is carried out, which brings the system to a new
state. For the average cost criterion, the control problem is formulated as follows. Find
an F−stopping time τ ∗, if it exists, minimizing the long-run expected average cost per
unit time given by
EΠ0(TCτ)
EΠ0(τ∆ ∧ ξ)
, (2.1.2)
where τ is an F−stopping time, TCτ is the total cost incurred over one complete cycle
of length τ∆∧ ξ, and EΠ0is the conditional expectation given Π0, the initial distribution
of X0. We assume that a new system is installed at the beginning of each cycle, i.e.
Π0 = [1, 0, . . . , 0]1×N+1
. Based on the cost structure given above,
TCn =∑i∈X
Ci
∫ n∆
0
IXs=ids+∑i∈X
CfiIξ≤n∆,Xξ−=i
+∑i∈X
CpiIXn∆=i + (n ∧ bξ/∆c)Cs, (2.1.3)
where TCn represents the total cost incurred if preventive maintenance is scheduled at
time n∆. The summands on the right-hand side of (2.1.3) represent the total operational
cost, corrective maintenance cost, preventive maintenance cost, and sampling cost, re-
spectively.
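To make the objective (2.1.2) and the cycle cost (2.1.3) concrete, one can estimate the long-run average cost of the simple policy "always perform preventive maintenance at the n-th sampling epoch" by simulating renewal cycles and applying renewal-reward. Every number below (rates, costs, n = 5) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

Q = np.array([[-0.3, 0.2, 0.1],
              [0.0, -0.4, 0.4],
              [0.0, 0.0, 0.0]])
C_op = [1.0, 3.0]      # operational cost rates C_i
C_f = [40.0, 60.0]     # corrective maintenance costs C_fi
C_p = [15.0, 25.0]     # preventive maintenance costs C_pi
C_s, delta = 0.5, 1.0  # sampling cost per observation, sampling interval

def simulate_cycle(n):
    """One cycle of TC_n from (2.1.3) with PM scheduled at time n * delta."""
    state, t, cost = 0, 0.0, 0.0
    horizon = n * delta
    while True:
        sojourn = rng.exponential(1.0 / -Q[state, state])
        if t + sojourn >= horizon:                 # PM epoch reached first
            cost += C_op[state] * (horizon - t) + C_p[state] + C_s * n
            return cost, horizon
        cost += C_op[state] * sojourn
        t += sojourn
        p = np.clip(Q[state], 0.0, None)
        nxt = rng.choice(3, p=p / p.sum())
        if nxt == 2:                               # failure from current state
            cost += C_f[state] + C_s * int(t / delta)
            return cost, t                          # cycle length tau*Delta ^ xi
        state = nxt

cycles = [simulate_cycle(5) for _ in range(5000)]
costs, lengths = map(np.array, zip(*cycles))
rate = costs.mean() / lengths.mean()   # renewal-reward: E[TC] / E[cycle length]
assert 0.0 < rate < 100.0
```

The ratio of means, rather than the mean of ratios, is what renewal theory prescribes for the long-run average cost per unit time.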
The optimal F−stopping time τ ∗ represents the first sampling epoch at which full
preventive maintenance should take place. It is important to realize that since we are
also considering mandatory corrective maintenance upon system failure, the optimal
control policy is identified with random variable τ ∗∆ ∧ ξ, which represents the optimal
time at which preventive or corrective maintenance should be carried out. Thus, without
loss of generality we may restrict the stopping problem (2.1.2) to the class of F−stopping
times τ ≤ ⌈ξ/∆⌉.
In the next section we derive the dynamic optimality equation, which will be analyzed
to characterize the structure of the optimal control policy.
2.2 Derivation of the Optimality Equation and Structural Properties of the Optimal Policy
In this section, we use the λ−minimization technique to transform the problem into a
stopping problem with an additive objective function, which is easier to analyze. We
derive the optimality equation and characterize the structural properties of the optimal
control policy. It is shown that the optimal preventive maintenance region is a convex
subset of Euclidean space.
We first apply the λ−minimization technique (see Aven and Bergman [3]) and transform
the stopping problem (2.1.2) to a parameterized stopping problem (with parameter λ)
with an additive objective function. Define for λ > 0 the value function

V^λ(Π0) = inf_τ EΠ0(Z^λ_τ), (2.2.1)

where the infimum is taken over all F−stopping times τ and

Z^λ_n = TCn − λ(n∆ ∧ ξ). (2.2.2)

Aven and Bergman [3] showed that λ∗ determined by the equation

λ∗ = inf{λ > 0 : V^λ(Π0) ≤ 0} (2.2.3)
is the optimal expected average cost for the stopping problem (2.1.2), and the F−stopping
time τ ∗ that minimizes the right-hand side of (2.2.1) for λ = λ∗ determines the optimal
stopping time. To simplify notation, we suppress the dependence on λ for the remainder
of the chapter. Since the process (Zn : n ∈ Z+) defined by (2.2.2) is not F−adapted, we
consider the following stopping problem

E Z̄τ∗ = inf_τ E Z̄τ, (2.2.4)

where Z̄n = E(Zn | Fn). For any F−stopping time τ, E Z̄τ = E(E(Zτ | Fτ)) = E Zτ,
so that (2.2.4) is equivalent to (2.2.1). Then, the observable F−adapted process
(Z̄n : n ∈ Z+) admits the following discrete-time smooth F−semimartingale
representation (see e.g. Jensen [31]),

Z̄n = Z̄0 + ∑_{k=1}^{n} Tk + Mn, (2.2.5)

where Tk = E(Z̄k − Z̄k−1 | Fk−1), and (Mn : n ∈ Z+) is an F−martingale with M0 = 0.
To evaluate Tk, we first note that the indicator random variables I{ξ ≤ n∆, Xξ− = i}
and I{Xn∆ = i} have the following representation [10],

I{ξ ≤ n∆, Xξ− = i} = ∫_0^{n∆} I{Xs = i} q_{i,N+1} ds + L^i_n,

I{Xn∆ = i} = I{X0 = i} + ∫_0^{n∆} ∑_{j∈X} I{Xs = j} q_{ji} ds + K^i_n, (2.2.6)

where the processes (L^i_n : n ∈ Z+) and (K^i_n : n ∈ Z+) are both (Gn)−martingales,
with Gn = σ(Y1, . . . , Yn, Xt : t ≤ n∆) ⊃ Fn. Then, using equations (2.1.3), (2.2.2) and
(2.2.6),
Tk = E( E(Zk | Fk) − E(Zk−1 | Fk−1) | Fk−1 )

= E(Zk − Zk−1 | Fk−1)

= ∑_{i∈X} ( Ci + Cfi q_{i,N+1} + ∑_{j∈X} Cpj qij − λ ) ∫_{(k−1)∆}^{k∆} E(I{Xs = i} | Fk−1) ds
  + ∑_{i∈X} Cfi E(L^i_k − L^i_{k−1} | Fk−1) + ∑_{i∈X} Cpi E(K^i_k − K^i_{k−1} | Fk−1)
  + E( Cs ∑_{m=1}^{k} I{ξ > m∆} − Cs ∑_{m=1}^{k−1} I{ξ > m∆} | Fk−1 )

=: ∫_{(k−1)∆}^{k∆} ∑_{i∈X} ri Πs(i) ds
  + ∑_{i∈X} Cfi [ E(E(L^i_k | Gk−1) | Fk−1) − E(L^i_{k−1} | Fk−1) ]
  + ∑_{i∈X} Cpi [ E(E(K^i_k | Gk−1) | Fk−1) − E(K^i_{k−1} | Fk−1) ]
  + Cs E(I{ξ > k∆} | Fk−1)

= ∫_{(k−1)∆}^{k∆} 〈r, Πs〉 ds + Cs (1 − Π−_{k∆}(N + 1)),
where

r = [r1, . . . , rN, 0], ri = Ci + Cfi q_{i,N+1} + ∑_{j∈X} Cpj qij − λ,
Πs = [Πs(1), . . . , Πs(N + 1)], Πs(i) = P(Xs = i | F⌊s/∆⌋), (2.2.7)

the inner product 〈v, w〉 = v w^T, and the left hand limit Π−_{k∆} = lim_{t↑k∆} Πt. Thus,
(2.2.5) simplifies to

Z̄n = Z̄0 + ∫_0^{n∆} 〈r, Πs〉 ds + ∑_{k=1}^{n} Cs (1 − Π−_{k∆}(N + 1)) + Mn, (2.2.8)

where Z̄0 = ∑_{i∈X} Cpi Π0(i). The vector Πt defined in (2.2.7) is the conditional
distribution of the system state Xt given F⌊t/∆⌋, the information at the previous sampling
epoch ⌊t/∆⌋. The evolution of the vector process (Πt : t ∈ R+) is described by the
following lemma.
Lemma 2.2.1. For t > 0, and given initial state distribution Π0, Πt can be obtained
iteratively as follows:

Πt = Πn∆ exp((t − n∆)Q), n∆ < t < (n + 1)∆,

Πn∆ = ( Π−_{n∆} diag(f_{Yn}) / 〈f_{Yn}, Π−_{n∆}〉 ) I{ξ > n∆} + e_{N+1} I{ξ ≤ n∆}, n ∈ N, (2.2.9)

where e_{N+1} = [0, . . . , 0, 1]_{1×(N+1)}, f_y = [f(y|1), . . . , f(y|N), 0], y ∈ R^d, and
diag(f_y) is the (N + 1) × (N + 1) matrix with f_y along its main diagonal and zeros
elsewhere.
Proof. The first equation in (2.2.9) follows since

dΠt/dt = lim_{h→0+} (Πt+h − Πt)/h = lim_{h→0+} E(It+h − It | F⌊t/∆⌋)/h = Πt Q,

where It = [I{Xt = 1}, . . . , I{Xt = N + 1}], so that Πt = Π⌊t/∆⌋∆ exp((t − ⌊t/∆⌋∆)Q).
The second equality in the above equation follows since n∆ < t < (n + 1)∆, which implies
that for h > 0 sufficiently small, F⌊(t+h)/∆⌋ = F⌊t/∆⌋. The second equation in (2.2.9)
follows since for any n ∈ N, given ξ > n∆ and Yn = y ∈ R^d, Bayes' Theorem implies

Πn∆(i) = f(y|i) Π−_{n∆}(i) / ∑_{j∈X} f(y|j) Π−_{n∆}(j) for i ∈ X, and Πn∆(N + 1) = 0,

and given ξ ≤ n∆,

Πn∆(i) = 0 for i ∈ X, and Πn∆(N + 1) = 1.

Combining the above two equations in vector form gives

Πn∆ = ( Π−_{n∆} diag(f_{Yn}) / 〈f_{Yn}, Π−_{n∆}〉 ) I{ξ > n∆} + e_{N+1} I{ξ ≤ n∆},

which completes the proof.
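A minimal sketch of the recursion (2.2.9): predict the belief with the matrix exponential, then condition on a new observation by Bayes' rule. The generator and the likelihood values f(y|i) below are illustrative assumptions standing in for an actual Gaussian observation.

```python
import numpy as np

def expm(A, terms=30):
    """Matrix exponential by truncated Taylor series (adequate for small ||A||)."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def predict(pi, Q, dt):
    """Pi_t = Pi_{n Delta} exp((t - n Delta) Q), first equation of (2.2.9)."""
    return pi @ expm(Q * dt)

def bayes_update(pi_minus, f_y):
    """Second equation of (2.2.9) on {xi > n Delta}; f_y = [f(y|1), ..., f(y|N), 0]."""
    post = pi_minus * f_y
    return post / post.sum()

Q = np.array([[-0.3, 0.2, 0.1],
              [0.0, -0.4, 0.4],
              [0.0, 0.0, 0.0]])
pi0 = np.array([1.0, 0.0, 0.0])          # new system: Pi_0 = [1, 0, ..., 0]

pi_minus = predict(pi0, Q, 1.0)          # belief just before the sampling epoch
assert abs(pi_minus.sum() - 1.0) < 1e-12

f_y = np.array([0.05, 0.25, 0.0])        # hypothetical density values at observed y
pi1 = bayes_update(pi_minus, f_y)
assert abs(pi1.sum() - 1.0) < 1e-12 and pi1[2] == 0.0   # conditioned on survival
```

Because the failure entry of f_y is zero, the update automatically places no mass on the failure state, exactly as the indicator I{ξ > n∆} in (2.2.9) requires.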
Lemma 2.2.1 implies that the Markov process (Πt) defined above has piecewise-deterministic
trajectories. Such a process is known as a piecewise-deterministic Markov process [16].
By representation (2.2.8), the stopping problem (2.2.4) can now be explicitly formulated as

V̄(Π0) = EΠ0 Z̄τ∗ = Z̄0 + inf_τ EΠ0 ( ∑_{n=1}^{τ} [ ∫_{(n−1)∆}^{n∆} 〈r, Πs〉 ds + Cs (1 − Π−_{n∆}(N + 1)) ] ) =: Z̄0 + V(Π0), (2.2.10)
where the second equality follows by the optional sampling theorem, since EMτ = EM0 = 0
for any F−stopping time τ. Then, for any probability measure Π defined on the state
space, the function V(Π) satisfies the following dynamic optimality equation

V(Π) = min{ 0, inf_{τ≥1} EΠ ( ∑_{n=1}^{τ} [ ∫_{(n−1)∆}^{n∆} 〈r, Πs〉 ds + Cs (1 − Π−_{n∆}(N + 1)) ] ) }

     = min{ 0, ∫_0^∆ 〈r, Πs〉 ds + Cs (1 − Π−_∆(N + 1)) + ∫_{R^d} V( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy }, (2.2.11)
where the first equality in (2.2.11) follows by partitioning the class of F−stopping times
into two classes: the class of stopping times τ = 0 and the class of stopping times τ ≥ 1.
The second equality in (2.2.11) follows by Lemma 2.2.1 and the strong Markov property
of (Πt : t ∈ R+). We now analyze the structural properties of (2.2.11).
Since failure is observable, and upon system failure (when Π = [0, . . . , 0, 1]_{1×(N+1)})
corrective maintenance is mandatory, we need only analyze the function V(Π) over the
space of probability measures

P = { Π ∈ [0, 1]^{N+1} : ∑_{i∈X} Π(i) = 1, Π(N + 1) = 0 } (2.2.12)

in which the system is known to be operational. For any g : P → R, define the operator

T(g)(Π) = min{ 0, ∫_0^∆ 〈r, Πs〉 ds + Cs (1 − Π−_∆(N + 1)) + ∫_{R^d} g( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy }. (2.2.13)
Then for g1, g2 : P → R and Π ∈ P,

|T(g1)(Π) − T(g2)(Π)|
≤ | ∫_{R^d} g1( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy − ∫_{R^d} g2( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy |
≤ ∫_{R^d} | g1( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) − g2( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) | 〈f(y), Π−_∆〉 dy
≤ ‖g1 − g2‖ ∫_{R^d} 〈f(y), Π−_∆〉 dy
≤ ‖g1 − g2‖ max_{i∈X} P(ξ > ∆ | X0 = i) =: ‖g1 − g2‖ β,

so that ‖T(g1) − T(g2)‖ ≤ β ‖g1 − g2‖ for some 0 < β < 1. Thus, the operator T defined
in (2.2.13) is a contraction operator.
Bertsekas and Shreve [7] (p. 55, Proposition 4.2) showed that the contraction property
of T defined in (2.2.13) implies that the function V(Π) is the unique solution of the
optimality equation (2.2.11), and can be obtained as the limit

V(Π) = lim_{n→+∞} T^n(0)(Π) = lim_{n→+∞} V^{n+1}(Π), (2.2.14)

where V^{n+1}(Π) is the value function for the (n + 1)−stage stopping problem, satisfying
the dynamic equation

V^{n+1}(Π) = min{ 0, ∫_0^∆ 〈r, Πs〉 ds + Cs (1 − Π−_∆(N + 1)) + ∫_{R^d} V^n( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy }, V^0(Π) = 0. (2.2.15)
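The value iteration (2.2.14)–(2.2.15) can be carried out numerically once the belief space is discretized. The sketch below assumes a hypothetical three-state instance (N = 2), so every Π ∈ P has the form [1 − p, p, 0] and a grid over p suffices; the observation is one-dimensional Gaussian, the integrals are approximated on grids, and all rates, costs, and the trial value of λ are illustrative assumptions.

```python
import numpy as np

Q = np.array([[-0.3, 0.2, 0.1],
              [0.0, -0.4, 0.4],
              [0.0, 0.0, 0.0]])
C_op, C_f, C_p = np.array([1.0, 3.0]), np.array([40.0, 60.0]), np.array([15.0, 25.0])
C_s, delta, lam = 0.5, 1.0, 12.0
mu, sd = np.array([0.0, 2.0]), 1.0          # y | state i ~ N(mu_i, sd^2)

def expm(A, terms=30):
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def trap(vals, grid):
    """Trapezoid rule (avoids the NumPy trapz/trapezoid naming split)."""
    vals = np.asarray(vals, dtype=float)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid)))

# r_i = C_i + C_fi q_{i,N+1} + sum_j C_pj q_ij - lambda, last entry 0 (cf. (2.2.7)).
r = np.array([C_op[i] + C_f[i] * Q[i, 2] + C_p @ Q[i, :2] - lam for i in range(2)] + [0.0])

s_grid = np.linspace(0.0, delta, 21)
Ps_r = np.array([expm(Q * s) @ r for s in s_grid])   # exp(sQ) r on the time grid
P_d = expm(Q * delta)

y_grid = np.linspace(-6.0, 8.0, 281)
dens = lambda y, m: np.exp(-0.5 * ((y - m) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
f1, f2 = dens(y_grid, mu[0]), dens(y_grid, mu[1])

p_grid = np.linspace(0.0, 1.0, 101)

def T(V):
    """One application of (2.2.13): value 0 for stopping vs. running one period."""
    TV = np.empty_like(V)
    for k, p in enumerate(p_grid):
        pi = np.array([1.0 - p, p, 0.0])
        cont = trap(Ps_r @ pi, s_grid)          # integral of <r, Pi_s> over [0, Delta]
        pim = pi @ P_d                           # Pi^-_Delta
        cont += C_s * (1.0 - pim[2])             # expected sampling cost
        a, b = pim[0] * f1, pim[1] * f2          # joint density of y and survival
        tot = a + b                              # <f(y), Pi^-_Delta>
        p_next = b / np.maximum(tot, 1e-300)     # posterior warning probability
        cont += trap(np.interp(p_next, p_grid, V) * tot, y_grid)
        TV[k] = min(0.0, cont)
    return TV

V = np.zeros_like(p_grid)
for _ in range(500):
    V_new = T(V)
    done = np.max(np.abs(V_new - V)) < 1e-9
    V = V_new
    if done:
        break

assert np.all(V <= 0.0)   # V = min(0, .) by construction
assert V[0] < 0.0         # from a new system, running on is strictly better
assert V[-1] == 0.0       # in the pure warning state it is optimal to stop
```

In this run the region where V vanishes is an interval of high warning probabilities, consistent with the control-limit structure established for the three-state model in §2.3.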
Using (2.2.14) and (2.2.15), we have the following result.
Lemma 2.2.2. The function V : P → R is concave.
Proof. We use mathematical induction. By equation (2.2.9), the terms ∫_0^∆ 〈r, Πs〉 ds
and Cs (1 − Π−_∆(N + 1)) in (2.2.15) are linear in Π, and hence concave. Since the
operator 'min' preserves concavity, for the base case n = 1, V^1(Π) is concave. Assume
now that for some n ∈ N, V^n(Π) is concave. We need only show that the last term on
the right-hand side of equation (2.2.15) is concave. For any constant α ∈ [0, 1] and
probability measures Π, Γ ∈ P, put

θ = 〈f(y), α Π−_∆〉 / 〈f(y), α Π−_∆ + (1 − α) Γ−_∆〉 ∈ [0, 1].

Then,

∫_{R^d} V^n( (α Π−_∆ + (1 − α) Γ−_∆) diag(f(y)) / 〈f(y), α Π−_∆ + (1 − α) Γ−_∆〉 ) 〈f(y), α Π−_∆ + (1 − α) Γ−_∆〉 dy

= ∫_{R^d} V^n( θ · Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 + (1 − θ) · Γ−_∆ diag(f(y)) / 〈f(y), Γ−_∆〉 ) 〈f(y), α Π−_∆ + (1 − α) Γ−_∆〉 dy

≥ ∫_{R^d} [ θ V^n( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) + (1 − θ) V^n( Γ−_∆ diag(f(y)) / 〈f(y), Γ−_∆〉 ) ] 〈f(y), α Π−_∆ + (1 − α) Γ−_∆〉 dy

= α ∫_{R^d} V^n( Π−_∆ diag(f(y)) / 〈f(y), Π−_∆〉 ) 〈f(y), Π−_∆〉 dy + (1 − α) ∫_{R^d} V^n( Γ−_∆ diag(f(y)) / 〈f(y), Γ−_∆〉 ) 〈f(y), Γ−_∆〉 dy,

where the inequality follows since V^n is concave by the induction hypothesis. Thus,
V^{n+1}(Π) is concave, and by (2.2.14) it follows that the limit V(Π) = lim_{n→+∞} V^{n+1}(Π)
is also concave, which completes the proof.
Lemma 2.2.2 implies that the optimal preventive replacement region defined by

R = { Π ∈ P : V(Π) ≥ 0 } (2.2.16)

is a convex subset of P, and the optimal control policy is determined by the following
procedure:
Theorem 2.2.3. For λ = λ∗, at sampling epoch n∆,
1. If Πn∆ ∈ R, full preventive maintenance is carried out. Otherwise, run the system
until the next sampling epoch (n+ 1)∆.
2. Corrective maintenance is carried out immediately upon system failure.
We now present an iterative algorithm, based on the λ−minimization technique and the
contraction property, for the computation of the optimal expected average cost and the
optimal control policy. Recall that, by equation (2.2.10), since Π0 = [1, 0, . . . , 0]_{1×(N+1)},
the original value function V^λ(Π0) defined in (2.2.1) is related to the function V of
(2.2.10) (written W^λ below to make the dependence on λ explicit) via the equation
V^λ(Π0) = W^λ(Π0) + Cp1.

The Algorithm

Step 1. Choose ε > 0 and initial lower and upper bounds λℓ ≤ λ ≤ λu.

Step 2. Put λ = (λℓ + λu)/2, W^λ_0 ≡ 0, and n = 1.

Step 3. Calculate W^λ_n = T(W^λ_{n−1}) using equations (2.2.13) and (2.2.15). Stop the
iteration when ‖W^λ_n − W^λ_{n−1}‖ ≤ ε; put W^λ = W^λ_n and
V^λ(Π0) = W^λ(Π0) + Cp1.

Step 4. If V^λ(Π0) < −ε, put λu = λ and go to Step 2.
If V^λ(Π0) > ε, put λℓ = λ and go to Step 2.
If |V^λ(Π0)| ≤ ε, put λ∗ε = λ and stop; λ∗ε approximates the optimal average cost.
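Steps 1–4 amount to bisection on λ, using the fact (Proposition 2.2.4 below) that the map from λ to the value function at Π0 is continuous and non-increasing, with root λ∗. A generic sketch follows, with the inner value iteration abstracted as a callable; the closed-form stand-in with root at 10 is purely an illustrative assumption.

```python
def bisect_lambda(V_of_lambda, lam_lo, lam_hi, eps=1e-6, max_iter=200):
    """Steps 1-4: bisection on lambda driven by the sign of V^lambda(Pi_0),
    which is non-increasing in lambda and vanishes at the optimal cost rate."""
    lam = 0.5 * (lam_lo + lam_hi)
    for _ in range(max_iter):
        lam = 0.5 * (lam_lo + lam_hi)
        v = V_of_lambda(lam)
        if v < -eps:
            lam_hi = lam          # lambda is above the optimal rate
        elif v > eps:
            lam_lo = lam          # lambda is below the optimal rate
        else:
            break                 # |V^lambda(Pi_0)| <= eps: accept lambda
    return lam

# Closed-form stand-in for the inner value iteration: root at lambda* = 10.
lam_star = bisect_lambda(lambda lam: 10.0 - lam, 0.0, 100.0)
assert abs(lam_star - 10.0) < 1e-3
```

In practice V_of_lambda would run the contraction iteration of Step 3 to convergence for the given λ, which is why keeping the inner tolerance ε consistent across both loops matters.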
Proposition 2.2.4. For any δ > 0, we can always choose ε > 0 sufficiently small such
that λ∗ε obtained from the algorithm above approximates the optimal average cost rate λ∗,
i.e., |λ∗ − λ∗ε | ≤ δ.
Proof. In the general theory of the λ−minimization technique (Proposition A.2), Aven
and Bergman [3] proved that for any λ > 0, if λ > λ∗ then the value function satisfies
V^λ(Π0) < 0, where λ∗ is the optimal average cost rate; similarly, if λ < λ∗ then
V^λ(Π0) > 0, and if λ = λ∗ then V^λ(Π0) = 0. Furthermore, the authors proved that the
mapping λ ↦ V^λ(Π0) is non-increasing and concave. By Proposition 2.9, p. 29, of
Avriel et al. [4], it follows that the mapping λ ↦ V^λ(Π0) is continuous. Therefore, for
any δ > 0, we can always choose ε > 0 sufficiently small such that λ∗ε obtained from the
algorithm above satisfies |λ∗ − λ∗ε | ≤ δ.
In the algorithm above, since λ > 0, a natural choice for the initial lower bound λℓ is 0.
However, it is not clear how one should choose the initial upper bound λu. The following
result provides a feasible choice of the initial upper bound.
Proposition 2.2.5. The optimal average cost is bounded by
$$0 < \lambda^* \le \frac{q_{N,N+1}\left(\Delta \max C_i + \max C_{fi} + \max C_{pi} + C_s\right)}{1 - e^{-q_{N,N+1}\Delta}}.$$
Thus, in the algorithm given above,
$$\underline{\lambda} = 0 \quad\text{and}\quad \overline{\lambda} = \frac{q_{N,N+1}\left(\Delta \max C_i + \max C_{fi} + \max C_{pi} + C_s\right)}{1 - e^{-q_{N,N+1}\Delta}}$$
are feasible initial values for the lower and upper bounds, respectively.
Proof. Consider the policy that initiates preventive maintenance at time ∆, which we identify with the stopping time $\tau_1 \equiv 1$. From the renewal-reward theorem (see e.g. Grimmett and Stirzaker [26], p. 431) and equation (2.1.3), the long-run expected average cost per unit time for this policy, which we denote $\lambda_1$, has the upper bound
$$\lambda_1 = \frac{E_{\Pi_0}(TC_{\tau_1})}{E_{\Pi_0}(\tau_1\Delta \wedge \xi)} \le \frac{\Delta \max C_i + \max C_{fi} + \max C_{pi} + C_s}{\int_0^\Delta e^{-q_{N,N+1}s}\,ds} = \frac{q_{N,N+1}\left(\Delta \max C_i + \max C_{fi} + \max C_{pi} + C_s\right)}{1 - e^{-q_{N,N+1}\Delta}},$$
where the inequality follows from the non-decreasing failure rate assumption $q_{i,N+1} \le q_{j,N+1}$, $i < j$. Thus, it follows that
$$0 < \lambda^* = \inf_\tau \frac{E_{\Pi_0}(TC_\tau)}{E_{\Pi_0}(\tau\Delta\wedge\xi)} \le \frac{E_{\Pi_0}(TC_{\tau_1})}{E_{\Pi_0}(\tau_1\Delta\wedge\xi)} =: \lambda_1 \le \frac{q_{N,N+1}\left(\Delta \max C_i + \max C_{fi} + \max C_{pi} + C_s\right)}{1 - e^{-q_{N,N+1}\Delta}},$$
which completes the proof.
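Since the bound in Proposition 2.2.5 is available in closed form, the initial upper bound $\overline{\lambda}$ is cheap to evaluate. A minimal sketch; the parameter values below are hypothetical and serve only to illustrate the computation:

```python
import math

def initial_upper_bound(q, delta, C_i, C_fi, C_pi, C_s):
    """Feasible initial upper bound from Proposition 2.2.5:
    q_{N,N+1}(Delta*max C_i + max C_fi + max C_pi + C_s) / (1 - e^{-q_{N,N+1} Delta})."""
    num = q * (delta * max(C_i) + max(C_fi) + max(C_pi) + C_s)
    return num / (1.0 - math.exp(-q * delta))

# hypothetical parameter values, for illustration only
lam_bar = initial_upper_bound(q=0.3548, delta=1.0,
                              C_i=[0.0, 0.0], C_fi=[6780.0, 6780.0],
                              C_pi=[450.0, 1560.0], C_s=10.0)
```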
In the next section, we analyze the three-state version of our model in detail.
2.3 Optimality of Bayesian Control Chart
In this section we analyze the three-state version of the problem in detail and show that
the optimal control policy is a control limit policy. For practical purposes, it is usually
sufficient to consider two working states: a good state and a warning state. The state
process $(X_t : t \in \mathbb{R}_+)$ has state space $\mathcal{X} = \{1, 2\} \cup \{3\}$, where state 1 represents an unobservable good state, state 2 represents an unobservable warning state, and state 3 is the observable failure state. In this case, the generator of the Markov chain takes the form
$$Q = \begin{bmatrix} -(q_{12}+q_{13}) & q_{12} & q_{13} \\ 0 & -q_{23} & q_{23} \\ 0 & 0 & 0 \end{bmatrix}, \qquad (2.3.1)$$
where $q_{12}, q_{13}, q_{23} \in (0, +\infty)$. Using the Kolmogorov backward differential equations we explicitly solve for the transition probability matrix
$$P(t) = [p_{ij}(t)] = \begin{bmatrix} e^{-\upsilon_1 t} & \dfrac{q_{12}\left(e^{-\upsilon_2 t} - e^{-\upsilon_1 t}\right)}{\upsilon_1 - \upsilon_2} & 1 - e^{-\upsilon_1 t} - \dfrac{q_{12}\left(e^{-\upsilon_2 t} - e^{-\upsilon_1 t}\right)}{\upsilon_1 - \upsilon_2} \\ 0 & e^{-\upsilon_2 t} & 1 - e^{-\upsilon_2 t} \\ 0 & 0 & 1 \end{bmatrix}, \qquad (2.3.2)$$
where the transition probabilities are $p_{ij}(t) = P(X_t = j \mid X_0 = i)$, $i, j \in \mathcal{X}$, and the constants are $\upsilon_1 = q_{12} + q_{13}$, $\upsilon_2 = q_{23}$.
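The closed form (2.3.2) can be checked numerically: each row of $P(t)$ must sum to one, and the Chapman-Kolmogorov (semigroup) property $P(s)P(t) = P(s+t)$ must hold. A small sketch with hypothetical rate values:

```python
import math

def transition_matrix(t, q12, q13, q23):
    """Closed-form P(t) of (2.3.2) for the three-state generator (2.3.1);
    assumes v1 = q12 + q13 differs from v2 = q23."""
    v1, v2 = q12 + q13, q23
    p11 = math.exp(-v1 * t)
    p12 = q12 * (math.exp(-v2 * t) - math.exp(-v1 * t)) / (v1 - v2)
    p22 = math.exp(-v2 * t)
    return [[p11, p12, 1.0 - p11 - p12],
            [0.0, p22, 1.0 - p22],
            [0.0, 0.0, 1.0]]

def matmul3(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# hypothetical rates; check rows sum to 1 and P(0.4)P(0.7) = P(1.1)
q12, q13, q23 = 0.0303, 0.0001, 0.3548
Ps, Pt, Pst = (transition_matrix(u, q12, q13, q23) for u in (0.4, 0.7, 1.1))
PsPt = matmul3(Ps, Pt)
```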
We now prove an intuitive result, which will be important in showing that a control
limit policy is optimal for the three-state model. The proof makes use of a classical
age-based policy result of Barlow and Hunter [6]. Any control policy determined by a
stopping time τ that is equal to a deterministic constant n is known as an age-based
policy.
Theorem 2.3.1. Under the model assumptions made in Section 2.1, the control policy
that never carries out preventive maintenance, i.e. $\tau = \infty$, is not optimal.
Proof. Consider the age-based policy that carries out preventive maintenance after $n$ periods. From the renewal-reward theorem (see e.g. Grimmett and Stirzaker [26], p. 431), the long-run expected average cost per unit time for this policy is given by
$$g(n) = \frac{\sum_{i\in\mathcal{X}} \left(C_{fi} q_{i3} + C_i\right) \int_0^{n\Delta} p_{1i}(s)\,ds + C_s E\left[n \wedge \lfloor \xi/\Delta\rfloor\right] + \sum_{i\in\mathcal{X}} C_{pi}\, p_{1i}(n\Delta)}{E\left[n\Delta \wedge \xi\right]}. \qquad (2.3.3)$$
Thus, to prove the claim, it suffices to show that
$$\arg\min_n g(n) < +\infty. \qquad (2.3.4)$$
To show (2.3.4), we derive an upper bound on $\arg\min_n g(n)$ by considering a special case of cost parameters $C_{fi}, C_{pi}, C_i, C_s$ for which preventive maintenance must be carried out at a later time. In particular, we choose corrective maintenance costs $C_{fi} = \min C_{fi} =: C_f$ all equal to the cheapest corrective maintenance cost, and preventive maintenance costs $C_{pi} = \max C_{pi} =: C_p$ all equal to the most expensive preventive maintenance cost. We also impose no penalty for operating the system longer, i.e. $C_i = 0$ and $C_s = 0$. Then, if preventive maintenance is scheduled after $n$ periods, the expected average cost under these cost parameters is given by
$$h(n) = \frac{C_f F(n\Delta) + C_p \bar{F}(n\Delta)}{\int_0^{n\Delta} \bar{F}(s)\,ds}, \qquad (2.3.5)$$
where $F(t) = p_{13}(t)$ is the distribution function of $\xi$ and $\bar{F}(t) = 1 - F(t)$. Since the terms $\int_0^{n\Delta} p_{1i}(s)\,ds$ and $E\left[n \wedge \lfloor\xi/\Delta\rfloor\right]$ in the numerator of (2.3.3) are increasing in $n$, the term $p_{11}(n\Delta) = e^{-\upsilon_1 n\Delta}$ is decreasing in $n$, and we have assumed in Section 2.1 that $C_{p1} \le C_{p2}$, this choice of cost parameters together with equations (2.3.3) and (2.3.5) implies
$$\arg\min_n g(n) \le \arg\min_n h(n).$$
We now appeal to a classical age-based policy result of Barlow and Hunter [6] to show $\arg\min_n h(n) < +\infty$. Since we have assumed $q_{13} < q_{23}$, the failure rate of $\xi$ is increasing. We have also assumed that $C_p < C_f$. Barlow and Hunter [6] showed that under these hypotheses there exists a positive real value $t^* < +\infty$ such that $t^*$ is the unique minimizer of $h(t)$. For our problem, $\arg\min_n h(n)$ is required to be integer-valued; however, since $t^*$ is a unique minimizer, the function $h(t)$ is increasing for $t > t^*$. Thus, it follows that $\arg\min_n h(n) \le \lceil t^*\rceil < +\infty$, which completes the proof.
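The finiteness argument above can be seen numerically: with $C_p < C_f$ and an increasing failure rate, the age-based cost $h(n)$ of (2.3.5) attains its minimum at a finite $n$. The sketch below uses hypothetical cost and rate parameters chosen so that the Barlow-Hunter condition for a finite minimizer holds; it is an illustration, not the thesis computation.

```python
import math

def survival(t, q12, q13, q23):
    """Fbar(t) = 1 - p13(t) = p11(t) + p12(t), from the closed form (2.3.2)."""
    v1, v2 = q12 + q13, q23
    return math.exp(-v1 * t) + q12 * (math.exp(-v2 * t) - math.exp(-v1 * t)) / (v1 - v2)

def h(n, delta, Cf, Cp, q12, q13, q23, steps=1000):
    """Age-based average cost rate (2.3.5): replace preventively after n periods.
    The denominator integral is evaluated by the composite trapezoid rule."""
    T = n * delta
    step = T / steps
    vals = [survival(i * step, q12, q13, q23) for i in range(steps + 1)]
    denom = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    Fbar = vals[-1]
    return (Cf * (1.0 - Fbar) + Cp * Fbar) / denom

# hypothetical parameters with Cp < Cf and increasing failure rate (q13 < q23)
q12, q13, q23, delta, Cf, Cp = 0.1, 0.001, 0.5, 1.0, 10000.0, 500.0
costs = {n: h(n, delta, Cf, Cp, q12, q13, q23) for n in range(1, 101)}
n_star = min(costs, key=costs.get)
```

For these values $h(1) \approx 710$, $h(2) \approx 600$, and $h(3) \approx 617$, so the minimizer is the finite interior point $n^* = 2$, consistent with (2.3.4).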
We are now ready to state and prove the main result of this section.
Theorem 2.3.2. The optimal control policy for the three-state model is a control limit policy. In particular, there exists a control limit $\Pi \in (0, 1]$ such that the optimal control policy is determined by the following procedure. At sampling epoch $n\Delta$:

1. If $\Pi_{n\Delta}(2) \ge \Pi$, full preventive maintenance is carried out. Otherwise, the system is run until the next sampling epoch $(n+1)\Delta$.
2. Corrective maintenance is carried out immediately upon system failure.
Proof. For the three-state version of the problem, the space $\mathcal{P}$ defined in (2.2.12) takes the form
$$\mathcal{P} = \left\{[1-\pi,\ \pi,\ 0] : \pi \in [0,1]\right\}, \qquad (2.3.6)$$
which is the line segment in $\mathbb{R}^3$ connecting the points $e_1 = [1, 0, 0]$ and $e_2 = [0, 1, 0]$. By Lemma 2.2.2, the optimal control region $R$ defined in (2.2.16) is a convex subset of $\mathcal{P}$. Thus, to prove that a control limit policy is optimal it suffices to show that $V(e_2) = 0$. We note that the control limit must be strictly greater than 0, i.e. $\Pi > 0$: if $\Pi = 0$, then $R = \mathcal{P}$, which implies that the policy immediately initiates preventive maintenance at the beginning of each cycle; since preventive maintenance times are assumed to be zero, the long-run average cost rate of this policy would be infinite. Therefore, if we can show that $V(e_2) = 0$, necessarily the control limit $\Pi \in (0, 1]$. To prove that $V(e_2) = 0$, we use mathematical induction. For $n = 1$, using equation (2.2.15),
$$V_1(e_2) = \min\left\{0,\ r_2\int_0^\Delta p_{22}(s)\,ds + \left(C_s + V_0(e_2)\right)p_{22}(\Delta)\right\} = \min\left\{0,\ r_2\int_0^\Delta p_{22}(s)\,ds + C_s\,p_{22}(\Delta)\right\}. \qquad (2.3.7)$$
We now assume that $V_1(e_2) < 0$ and derive a contradiction. Since it is not optimal to carry out preventive maintenance when the system is in the good state, $V_1(e_1) < 0$. If $V_1(e_2) < 0$, then equation (2.2.15) and the linearity of $\int_0^\Delta \langle r, \Pi_s\rangle\,ds$ and $C_s\left(1 - \Pi_\Delta(N+1)\right)$ imply that $V_1(\Pi) < 0$ for all $\Pi \in \mathcal{P}$. Since $V_n(\Pi) \ge V_{n+1}(\Pi)$ for all $n \in \mathbb{N}$, it follows that the limit $V(e_2) = \lim_{n\to\infty} V_n(e_2) < 0$, and the policy that never carries out preventive maintenance, i.e. $\tau = \infty$, is optimal. This directly contradicts Theorem 2.3.1. Thus, it follows that $V_1(e_2) \ge 0$, and by equation (2.3.7) we have the inequality
$$r_2\int_0^\Delta p_{22}(s)\,ds + C_s\,p_{22}(\Delta) \ge 0. \qquad (2.3.8)$$
Suppose now that $V_n(e_2) = 0$ for some $n \in \mathbb{N}$. Using inequality (2.3.8),
$$V_{n+1}(e_2) = \min\left\{0,\ r_2\int_0^\Delta p_{22}(s)\,ds + C_s\,p_{22}(\Delta)\right\} = 0,$$
which completes the inductive step. Therefore $V(e_2) = \lim_{n\to\infty} V_n(e_2) = 0$, which completes the proof.
Theorem 2.3.2 shows that the optimal control policy for the three-state model can be
represented as a control chart, which monitors the posterior probability $\Pi_{n\Delta}(2)$ that the system is in warning state 2. Once $\Pi_{n\Delta}(2)$ exceeds a fixed control limit $\Pi \in (0, 1]$, full preventive maintenance is carried out. Unlike the general N-state model, the control limit policy for the three-state model has the advantage that it is no longer parameterized by λ, which is an extremely useful property from a computational point of view. In the
next section, we develop an efficient computational algorithm to determine the optimal
control limit Π∗ ∈ (0, 1], as well as the optimal long-run average cost λ∗, for the three-
state model.
2.4 Computation of the Optimal Policy
In this section, we develop an efficient computational algorithm for the three-state model based on the control limit policy described in Theorem 2.3.2. The objective is to determine the optimal value of the control limit $\Pi^* \in (0, 1]$ that minimizes the long-run expected average cost per unit time. Using the policy of Theorem 2.3.2, we analyze the dynamics of the posterior probability $\Pi_{n\Delta}(2)$ in the semi-Markov decision process (SMDP) framework. In particular, for a fixed control limit $\Pi \in (0, 1]$, we partition the interval $[0, \Pi)$ into $M \in \mathbb{N}$ disjoint subintervals $I_m = [l_m, u_m)$, where $l_m = \frac{m-1}{M}\Pi$ and $u_m = \frac{m}{M}\Pi$, $m = 1, \ldots, M$. The set $\mathcal{I} = \{I_1, \ldots, I_M\}$ is taken as the state space of the following SMDP.
Let $t_n$ be the time of the $n$th decision epoch. Then, the SMDP is defined to be in state $I_m \in \mathcal{I}$ provided the current value of the posterior probability satisfies $\Pi_{t_n}(2) \in [l_m, u_m)$. The time of the next decision epoch is taken as $t_{n+1} = (t_n + \Delta) \wedge \xi$. To follow the policy of Theorem 2.3.2, we impose the following actions. If $t_{n+1} = \xi$, mandatory corrective maintenance is carried out, so that at the $(n+1)$th decision epoch the SMDP returns to state $I_1 = [0, \Pi/M)$. Similarly, if $t_{n+1} = t_n + \Delta$ and $\Pi_{t_{n+1}}(2) \ge \Pi$, full preventive maintenance is carried out, so that at the $(n+1)$th decision epoch the SMDP again returns to state $I_1 = [0, \Pi/M)$.
With this definition of the states and decision epoch times of the SMDP, under the long-run average cost criterion the SMDP is determined by the following quantities [71]:

$p_{mk}$ = the probability that the SMDP will be in state $k \in \mathcal{I}$ at the next decision epoch, given that the current state is $m \in \mathcal{I}$;

$\tau_m$ = the expected sojourn time until the next decision epoch, given that the current state is $m \in \mathcal{I}$;

$c_m$ = the expected cost incurred until the next decision epoch, given that the current state is $m \in \mathcal{I}$.
Using the quantities defined above, for a fixed control limit $\Pi \in (0, 1]$, the long-run expected average cost $\lambda(\Pi)$ can be obtained by solving the following system of linear equations:
$$\upsilon_m = c_m - \lambda(\Pi)\tau_m + \sum_{k\in\mathcal{I}} p_{mk}\upsilon_k, \quad \text{for each } m \in \mathcal{I}, \qquad (2.4.1)$$
$$\upsilon_l = 0, \quad \text{for some } l \in \mathcal{I},$$
and the optimal control limit $\Pi^* \in (0, 1]$ and corresponding optimal average cost $\lambda^* = \inf_{\Pi\in(0,1]}\lambda(\Pi)$ can then be computed using the equations (2.4.1).
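For a fixed control limit, (2.4.1) is a linear system in $\lambda(\Pi)$ and the relative values $\upsilon_m$. A minimal pure-Python sketch with a two-state toy check (the thesis computation would supply the $p_{mk}$, $\tau_m$, $c_m$ derived below):

```python
def solve_linear(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def smdp_average_cost(p, tau, cost):
    """Solve (2.4.1): v_m = c_m - lam*tau_m + sum_k p_mk v_k with v_0 := 0;
    the unknown vector is (lam, v_1, ..., v_{M-1})."""
    M = len(tau)
    A = [[0.0] * M for _ in range(M)]
    for m in range(M):
        A[m][0] = tau[m]                          # coefficient of lam(Pi)
        for k in range(1, M):
            A[m][k] = (1.0 if k == m else 0.0) - p[m][k]
    return solve_linear(A, list(cost))[0]

# toy 2-state check: uniform transitions, unit sojourn times, costs 2 and 4,
# so the long-run average cost rate is (2 + 4)/2 = 3
lam = smdp_average_cost([[0.5, 0.5], [0.5, 0.5]], [1.0, 1.0], [2.0, 4.0])
```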
The remainder of this section is devoted to explicitly computing the quantities $p_{mk}$, $\tau_m$, $c_m$, $m, k \in \mathcal{I}$, defined above. To simplify notation in the derivations that follow, we write $W_n = \Pi_{n\Delta}(2)$ for the posterior probability that the system is in warning state 2 at sampling epoch $n\Delta$. Bayes' Theorem implies
$$W_{n+1} = \frac{f(Y_{n+1}\mid 2)\left(p_{12}(\Delta)(1-W_n) + p_{22}(\Delta)W_n\right)}{f(Y_{n+1}\mid 1)\,p_{11}(\Delta)(1-W_n) + f(Y_{n+1}\mid 2)\left(p_{12}(\Delta)(1-W_n) + p_{22}(\Delta)W_n\right)}\; I_{\{\xi>(n+1)\Delta\}}. \qquad (2.4.2)$$
Straightforward algebra shows that the ratio of normal densities has the following representation:
$$\frac{f(y\mid 1)}{f(y\mid 2)} = \frac{\det^{-1/2}(\Sigma_1)}{\det^{-1/2}(\Sigma_2)} \cdot \frac{\exp\left(-\tfrac{1}{2}(y-\mu_1)^T\Sigma_1^{-1}(y-\mu_1)\right)}{\exp\left(-\tfrac{1}{2}(y-\mu_2)^T\Sigma_2^{-1}(y-\mu_2)\right)} =: h\exp\left((y-b)^T A (y-b) + c\right),$$
where
$$h = \frac{\det^{-1/2}(\Sigma_1)}{\det^{-1/2}(\Sigma_2)}, \quad A = \frac{1}{2}\left(\Sigma_2^{-1} - \Sigma_1^{-1}\right), \quad b = \frac{1}{2}A^{-1}\left(\Sigma_2^{-1}\mu_2 - \Sigma_1^{-1}\mu_1\right), \quad c = \frac{\mu_2^T\Sigma_2^{-1}\mu_2 - \mu_1^T\Sigma_1^{-1}\mu_1}{2} - b^T A b,$$
so equation (2.4.2) simplifies to
$$W_{n+1} = \frac{p_{12}(\Delta)(1-W_n) + p_{22}(\Delta)W_n}{p_{11}(\Delta)(1-W_n)\,h\exp\left(G_{n+1}^T A G_{n+1} + c\right) + p_{12}(\Delta)(1-W_n) + p_{22}(\Delta)W_n}\; I_{\{\xi>(n+1)\Delta\}}, \qquad (2.4.3)$$
where $G_{n+1} := Y_{n+1} - b$. From equation (2.4.3) we have the following result.
Theorem 2.4.1. At sampling epoch $n\Delta$, for any $t \in \mathbb{R}_+$, the conditional reliability function of $\xi$ is
$$P\left(\xi > n\Delta + t \mid \mathcal{F}_{n\Delta}\right) = \left((1-W_n)(1-p_{13}(t)) + W_n(1-p_{23}(t))\right) \cdot I_{\{\xi>n\Delta\}} =: R_{W_n}(t)\, I_{\{\xi>n\Delta\}}, \qquad (2.4.4)$$
and for any $w \in [0, 1]$, the conditional distribution function of $W_{n+1}$ is
$$P\left(W_{n+1} \le w \mid \mathcal{F}_{n\Delta}\right) = R_{W_n}(\Delta)\sum_{i\in\mathcal{X}} P\left(G_{n+1}^T A G_{n+1} \ge g_{W_n}(w) \mid X_{(n+1)\Delta} = i\right)\gamma_{W_n}(i) \cdot I_{\{\xi>n\Delta\}} + I_{\{\xi\le n\Delta\}} \qquad (2.4.5)$$
$$=: F_{W_n}(w)\, I_{\{\xi>n\Delta\}} + I_{\{\xi\le n\Delta\}},$$
where
$$g_{W_n}(w) = \ln\left(\frac{p_{12}(\Delta)(1-W_n) + p_{22}(\Delta)W_n}{p_{11}(\Delta)(1-W_n)} \cdot \frac{1-w}{wh}\right) - c, \qquad \gamma_{W_n}(i) = \frac{p_{1i}(\Delta)(1-W_n) + p_{2i}(\Delta)W_n}{R_{W_n}(\Delta)}.$$
Proof. We first prove the formula for the conditional reliability function of $\xi$. Conditional on $\xi > n\Delta$, for any $t \in \mathbb{R}_+$, Bayes' Theorem implies
$$P\left(\xi > n\Delta + t \mid \mathcal{F}_{n\Delta}\right) = P\left(X_{n\Delta+t} \ne 3 \mid \mathcal{F}_{n\Delta}\right) = P\left(X_{n\Delta+t} \ne 3 \mid X_{n\Delta} = 1\right)(1-W_n) + P\left(X_{n\Delta+t} \ne 3 \mid X_{n\Delta} = 2\right)W_n = (1-W_n)(1-p_{13}(t)) + W_n(1-p_{23}(t)),$$
and conditional on $\xi \le n\Delta$, $P(\xi > n\Delta + t \mid \mathcal{F}_{n\Delta}) = 0$. Combining the two cases gives (2.4.4). We next prove the formula for the conditional distribution function of $W_{n+1}$. Conditional on $\xi > n\Delta$, for any $w \in [0, 1]$, equation (2.4.3) and Bayes' Theorem imply
$$P\left(W_{n+1} \le w \mid \mathcal{F}_{n\Delta}\right) = R_{W_n}(\Delta)\, P\left(W_{n+1} \le w \mid \xi > (n+1)\Delta, Y_1, \ldots, Y_n\right) = R_{W_n}(\Delta)\sum_{i\in\mathcal{X}} P\left(G_{n+1}^T A G_{n+1} \ge g_{W_n}(w) \mid X_{(n+1)\Delta} = i\right) \cdot P\left(X_{(n+1)\Delta} = i \mid \xi > (n+1)\Delta, Y_1, \ldots, Y_n\right),$$
where
$$P\left(X_{(n+1)\Delta} = i \mid \xi > (n+1)\Delta, Y_1, \ldots, Y_n\right) = \frac{1}{R_{W_n}(\Delta)}\, P\left(X_{(n+1)\Delta} = i \mid \xi > n\Delta, Y_1, \ldots, Y_n\right) = \frac{p_{1i}(\Delta)(1-W_n) + p_{2i}(\Delta)W_n}{R_{W_n}(\Delta)} = \gamma_{W_n}(i),$$
and conditional on $\xi \le n\Delta$, $P(W_{n+1} \le w \mid \mathcal{F}_{n\Delta}) = 1$. Combining the two cases gives (2.4.5), which completes the proof.
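The constants $h, A, b, c$ and the update (2.4.3) can be sketched for $d = 2$ with elementary linear algebra. The $\mu_i, \Sigma_i$ below are taken from the mining example of Section 2.4, while the entries of `P`, standing in for $p_{11}(\Delta), p_{12}(\Delta), p_{22}(\Delta)$, are hypothetical; this is an illustration of the algebra, not thesis code.

```python
import math

def inv2(S):
    """Inverse and determinant of a 2x2 matrix."""
    d = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / d, -S[0][1] / d], [-S[1][0] / d, S[0][0] / d]], d

def quad(A, v):
    """Quadratic form v^T A v for 2-vectors."""
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

def density_ratio_params(mu1, mu2, S1, S2):
    """Constants h, A, b, c with f(y|1)/f(y|2) = h * exp((y-b)^T A (y-b) + c)."""
    S1i, d1 = inv2(S1)
    S2i, d2 = inv2(S2)
    h = math.sqrt(d2 / d1)                        # det^{-1/2}(S1) / det^{-1/2}(S2)
    A = [[0.5 * (S2i[i][j] - S1i[i][j]) for j in range(2)] for i in range(2)]
    v = [sum(S2i[i][j] * mu2[j] - S1i[i][j] * mu1[j] for j in range(2)) for i in range(2)]
    Ai, _ = inv2(A)
    b = [0.5 * sum(Ai[i][j] * v[j] for j in range(2)) for i in range(2)]
    c = 0.5 * (quad(S2i, mu2) - quad(S1i, mu1)) - quad(A, b)
    return h, A, b, c

def posterior_update(w, y, P, mu1, mu2, S1, S2):
    """One step of recursion (2.4.3), given survival to the next epoch."""
    h, A, b, c = density_ratio_params(mu1, mu2, S1, S2)
    g = [y[0] - b[0], y[1] - b[1]]
    num = P[0][1] * (1 - w) + P[1][1] * w
    return num / (P[0][0] * (1 - w) * h * math.exp(quad(A, g) + c) + num)

# mu, Sigma from the Section 2.4 example; P = [[p11, p12], [0, p22]] hypothetical
mu1, mu2 = [1.1, 1.9], [4.1, 5.5]
S1, S2 = [[7.2, 2.0], [2.0, 3.6]], [[7.6, 1.0], [1.0, 3.2]]
P = [[0.90, 0.08], [0.0, 0.85]]
w_next = posterior_update(0.2, [2.0, 3.0], P, mu1, mu2, S1, S2)
```

Completing the square in the exponent is what makes the representation $h\exp((y-b)^T A(y-b) + c)$ exact, which can be verified pointwise against the explicit density ratio.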
Provost and Rudiuk [59] derived explicit formulas for both the density (Theorem 2.1, p. 386) and the distribution function (Theorem 3.1, p. 391) of indefinite quadratic forms $G^T A G$ in normal vectors, where $G$ is any multivariate normal $N_d(\mu, \Sigma)$ vector and $A$ is any $d \times d$ symmetric matrix. By definition $G_n := Y_n - b$, so that $G_n \mid X_{n\Delta} = i \sim N_d(\mu_i - b, \Sigma_i)$, and the probability $P\left(G_{n+1}^T A G_{n+1} \ge g_{W_n}(w) \mid X_{(n+1)\Delta} = i\right)$ in equation (2.4.5) can be computed explicitly using Theorem 3.1 of Provost and Rudiuk [59].
Using equations (2.4.4) and (2.4.5), we can now easily evaluate the quantities $p_{mk}$, $\tau_m$, $c_m$, $m, k \in \mathcal{I}$. Suppose that at time $n\Delta$ the process is in state $I_m \in \mathcal{I}$. Then, for $M$ large, $W_n \approx l_m$ and we can approximate the transition probabilities by
$$p_{I_m I_1} = \left(1 - R_{l_m}(\Delta)\right) + \left(1 - F_{l_m}(\Pi)\right) + \left(F_{l_m}(u_1) - F_{l_m}(l_1)\right).$$
The first term on the right-hand side is the probability that the system fails before the next sampling epoch; the second and third terms are the probabilities that the system does not fail before the next sampling epoch $(n+1)\Delta$ and the posterior probability $W_{n+1}$ enters either the preventive maintenance region $[\Pi, 1]$ or the interval $I_1 = [l_1, u_1)$, respectively. In all three cases, the state of the SMDP at the next decision epoch is $I_1$. The remaining transition probabilities of the SMDP have a simpler structure and are given by
$$p_{I_m I_k} = F_{l_m}(u_k) - F_{l_m}(l_k), \quad k = 2, \ldots, M.$$
Using equation (2.4.4), the mean sojourn time is
$$\tau_{I_m} = \int_0^\Delta R_{l_m}(s)\,ds = (1-l_m)\left(\Delta - \int_0^\Delta p_{13}(s)\,ds\right) + l_m\left(\Delta - \int_0^\Delta p_{23}(s)\,ds\right),$$
and the mean cost is
$$c_{I_m} = \sum_{i\in\mathcal{X}} \left(C_i + C_{fi}q_{i3}\right)\left((1-l_m)\int_0^\Delta p_{1i}(s)\,ds + l_m\int_0^\Delta p_{2i}(s)\,ds\right) + R_{l_m}(\Delta)\sum_{i\in\mathcal{X}} C_{pi}\, P\left(G_1^T A G_1 \le g_{l_m}(\Pi) \mid X_\Delta = i\right)\gamma_{l_m}(i) + C_s F_{l_m}(\Pi),$$
so that the quantities $p_{mk}$, $\tau_m$, $c_m$, $m, k \in \mathcal{I}$, can now be computed explicitly, and the optimal control limit $\Pi^* \in (0, 1]$ and corresponding optimal average cost $\lambda^* = \inf_{\Pi\in(0,1]}\lambda(\Pi)$ can be computed using the equations (2.4.1).
Example. We now illustrate the computational procedure with a numerical example using model parameters from a mining industry application. In Chapter 4, we will show how the model parameters can be estimated using historical data. We consider a failing transmission unit with state generator
$$Q = \begin{bmatrix} -0.0304 & 0.0303 & 0.0001 \\ 0 & -0.3548 & 0.3548 \\ 0 & 0 & 0 \end{bmatrix}.$$
Every $\Delta = 600$ hours, oil samples are collected and spectrometric analysis is carried out, which provides the concentrations in parts per million (ppm) of $d = 2$ wear elements. At each sampling epoch $n\Delta$, the bivariate vector $Y_n$ follows $N(\mu_1, \Sigma_1)$ when the system is in healthy state 1 and $N(\mu_2, \Sigma_2)$ when the system is in warning state 2, where
$$\mu_1 = \begin{bmatrix}1.1\\1.9\end{bmatrix}, \quad \mu_2 = \begin{bmatrix}4.1\\5.5\end{bmatrix}, \quad \Sigma_1 = \begin{bmatrix}7.2 & 2.0\\2.0 & 3.6\end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix}7.6 & 1.0\\1.0 & 3.2\end{bmatrix}.$$
The known cost parameters are $C_1 = C_2 = 0$, $C_{f1} = C_{f2} = 6780$, $C_{p1} = 450$, and $C_{p2} = 1560$; the sampling cost $C_s$, however, is unknown. We analyze the effect $C_s$ has on the optimal control limit $\Pi^* \in (0, 1]$ and the optimal average cost rate $\lambda^*$. We chose partition parameter $M = 25$ and, using MATLAB, computed the optimal average cost from the system of linear equations (2.4.1), obtaining the results in Table 2.4.1.

Figure 2.4.1: Optimal control limit Π∗ vs sampling cost Cs.
Table 2.4.1: Effect of varying sampling cost Cs on Π∗ and λ∗

Cs    0        5        10       50       100      500
Π∗    0.1894   0.1857   0.1856   0.1847   0.1837   0.1742
λ∗    104.59   109.40   114.21   152.69   200.78   585.52
Table 2.4.1 shows that as the sampling cost $C_s$ increases, the optimal control limit $\Pi^*$ decreases and the optimal average cost $\lambda^*$ increases. We graph the results in Figures 2.4.1 and 2.4.2. For example, if $C_s = 10$, the optimal control limit is $\Pi^* = 0.1857$ and the optimal average cost is $\lambda^* = 109.40$. We illustrate the use of this optimal control limit policy on a sample data history in Figure 2.4.3. The control chart shows that the posterior probability $\Pi_{n\Delta}(2)$ that the system is in warning state 2 exceeds the control limit at the 14th sampling epoch. At this point, full preventive maintenance is carried out.
Figure 2.4.2: Optimal average cost rate λ∗ vs sampling cost Cs.
Figure 2.4.3: Optimal control limit policy.
2.5 Conclusions and Future Research
We have considered an optimal control problem with costly multivariate observations
carrying partial information about the system state. The state process follows an unob-
servable continuous time homogeneous Markov process. The objective was to determine
the optimal replacement policy that minimizes the long-run expected average cost per
unit time. We have characterized the structure of the optimal replacement policy and
have shown that the optimal preventive maintenance region is a convex subset of Eu-
clidean space. We have also analyzed the three-state version of this problem in detail
and have shown that the optimal policy is a control limit policy. An efficient computa-
tional algorithm was developed in the semi-Markov decision process framework for the
three-state problem with an illustrative numerical example.
We suggest a few possible directions for future research. In some applications, it may be appropriate to allow preventive maintenance at any real-valued time, not just at the sampling epochs $n\Delta$. In this case, the optimal stopping problem must be formulated with respect to the continuous-time filtration $\mathcal{F}_t = \sigma\left(Y_1, \ldots, Y_{\lfloor (t\wedge\xi)/\Delta\rfloor},\ \xi I_{\{\xi\le t\}},\ I_{\{\xi>t\}}\right)$, $t \in \mathbb{R}_+$. An interesting comparison could then be made to determine how much additional cost savings can be obtained by allowing preventive maintenance to be taken in continuous time. Another interesting extension would be to allow the state sojourn times, which are exponentially distributed here, to have more general distributions such as the Erlang, Weibull or Gamma distributions. In the literature, such models are referred to as hidden semi-Markov models (HSMMs). Typically, HSMMs are more difficult to analyze due to the loss of the Markov (i.e., memoryless) property. A final possible future research topic would be to test the effectiveness of our methodology on other real-world data sets, such as vibration, performance, or quality monitoring data, using the availability maximization criterion, which is sometimes preferable in practice to the cost minimization criterion. This should also lead to a further refinement of both the model and the control algorithm.
Chapter 3
Optimal Sampling and Control of Stochastically Failing Systems
Modern manufacturing and production industries rely heavily on complex technical sys-
tems for their everyday operations. These systems typically deteriorate and are subject
to breakdowns due to usage and age. The high cost associated with unplanned break-
downs has stimulated a lot of research activity in the maintenance optimization literature,
where the main focus has been on determining the optimal time to preventively repair
or replace a system before it fails. One of the earliest and most significant contributions
to this class of problems is the celebrated paper of Barlow and Hunter [6]. More recent
contributions are given by Dogramaci and Fraiman [21], Heidergott and Farenhorst-Yuan
[28], Kurt and Kharoufeh [44], and Kim et al. [38], among others.
The most advanced maintenance program applied in practice is known as condition-based maintenance (CBM), which recommends maintenance actions based on information collected through online condition monitoring. CBM initiates maintenance actions only when there is strong evidence of severe system deterioration, which significantly reduces maintenance costs by decreasing the number of unnecessary maintenance operations. For a recent overview of the mathematical models and technologies
Chapter 3. Optimal Sampling and Control of Stochastic Systems 33
used in CBM, readers are referred to Jardine et al. [30] and the references therein.
The common assumption made in CBM optimization models is that information
used for decision-making is obtained at periodic equidistant sampling epochs. Under this
assumption, the goal is to determine the optimal maintenance policy that optimizes an
objective function over a finite or infinite time horizon. Recent contributions are given by
Dayanik and Gurler [18], Makis and Jiang [51], Wang et al. [73], and Juang and Anderson
[34]. The problem with the equidistant sampling assumption is that in many applications
there is a high sampling cost associated with collecting observable data. It is therefore
of equal importance to determine when information should be collected as it is to decide
how this information should be utilized for maintenance decision-making. This type of
joint optimization has been a long-standing open problem in the operations research and
maintenance optimization literature, but very few results regarding the structure of the
optimal sampling and maintenance policy have been published.
An excellent early contribution to the joint optimization problem was given by Ohnishi
et al. [58], who considered a deteriorating system with N fully observable states. Under
reasonable monotonicity assumptions, the authors were able to partially characterize the
form of the optimal policy and showed that the times between successive samples are
monotonically decreasing. Ross [63] considered a similar problem in the area of quality
control in which the system state is only partially observable. Under the expected total
discounted reward criterion, the author showed that for a two state model, the optimal
policy is characterized by four control regions. Other early noteworthy contributions are
the models of Anderson and Friedman [1], Kander [35] and Rosenfield [62]. More recently,
Yeh [82] modelled a general N state sampling and maintenance problem in the Markov
decision process framework. The author proposed a number of different algorithms to
derive the optimal sampling and maintenance policy, but was not able to characterize its
form. The models of by Dieulle et al. [20], Lam and Yeh [45], and Jiang [32] are other
recent contributions. It should be noted that no optimality results have been published
for partially observable failing systems with the long-run average cost criterion.
In this chapter, we consider a system whose state information is unobservable and can
only be inferred by taking a sample through condition monitoring. System failure on the
other hand is fully observable. The decision maker can decide when condition monitoring
information should be collected, as well as when to initiate full system inspection, followed
possibly by preventive maintenance. The objective is to characterize the structural form
of the optimal sampling and maintenance policy that minimizes the long-run expected
cost per unit time. The problem is formulated as a partially observable Markov decision
process (POMDP). It is shown that monitoring the posterior probability that the system
is in a so-called warning state is sufficient for decision-making. The primary contribution
of this chapter is that we prove the optimality of a sampling and maintenance policy that
is characterized by three critical thresholds, which have practical interpretation and give
new insight into the value of condition monitoring information.
The remainder of the chapter is organized as follows. In §3.1, we formulate and analyze
the joint optimization problem in the POMDP framework. In §3.2, we determine the
structural properties of the optimal policy. The dynamic optimality equation is derived
and we establish the form of the optimal sampling and maintenance policy. In §3.3, we
develop an iterative algorithm to compute the optimal policy and the long-run expected
average cost per unit time. We also provide numerical comparisons with other suboptimal
policies that illustrate the benefits of the joint optimization of sampling and maintenance.
Concluding remarks and future research directions are provided in §3.4.
3.1 Model Formulation
Consider a system that can be characterized by one of three distinguishable states: a
healthy state (state 1), a warning state (state 2), and a failure state (state 3). Recent
studies have found through experiments with real diagnostic data such as spectrometric
oil data (e.g. [38], [53]) and vibration data (e.g. [80]), that it is usually preferable and
sufficient to consider only two operational states - a healthy state and a warning state.
Such a characterization has the desirable property that maintenance actions are only
initiated when the system experiences severe deterioration that can actually cause failure.
In many cases, the system moves through two distinct phases of operation. In the first
and longer phase, the system operates under normal conditions, and the observations
behave in a stationary manner. Although system degradation can be gradual, it is
usually not until degradation has exceeded a certain level that the behaviour of the
condition monitoring observations changes substantially. At this point, the system enters
the second and shorter phase, which we define to be the warning state.
Let $(\Omega, \mathcal{F}, P)$ be a complete probability space on which the following stochastic processes are defined. The state process $(X_t : t \in \mathbb{R}_+)$ is modeled as a continuous-time homogeneous Markov chain with state space $\mathcal{X} = \{1, 2\} \cup \{3\}$ and transition rate matrix $Q = (q_{ij})$. To model monotonic system deterioration, the state process is non-decreasing with probability 1, i.e. $q_{ij} = 0$ for all $j < i$. In particular, this implies that without corrective maintenance the failure state is absorbing. The system is more likely to fail in warning state 2 than in healthy state 1, i.e. $q_{23} > q_{13}$. Let $\xi = \inf\{t \in \mathbb{R}_+ : X_t = 3\}$ be the observable time of system failure. Upon system failure, mandatory corrective maintenance that takes $T_F$ time units is performed at a cost $C_F$, which brings the system back to healthy state 1.
To avoid costly failures, the decision maker can take a sample at a cost CS. In
real applications, taking and processing a sample through condition monitoring, such
as an oil sample, cannot be done instantaneously due to the time it takes to collect
the sample and process it at a laboratory. Therefore, we assume that the processing
time of the sample is ∆ ∈ (0,+∞) time units, so that if a sample is taken at time t,
information from the sample is first available to be used for decision making at time
t + ∆. We therefore naturally assume that the decision maker has the opportunity to
take (or not take) samples only at the time points $0, \Delta, 2\Delta, 3\Delta, \ldots$. Condition monitoring information at time $n\Delta$ is denoted $Y_n$ and takes values in $\mathcal{E} = \{1, \ldots, L\}$. The samples $Y_n$ are stochastically related to the operational system state $X_{n\Delta}$. In particular, while the system is in operational state $X_{n\Delta} = i \in \{1, 2\}$, the sample $Y_n$ has state-dependent distribution
$$d_{iy} = P(Y_n = y \mid X_{n\Delta} = i), \quad y \in \mathcal{E}. \qquad (3.1.1)$$
The state-observation matrix is denoted $D = (d_{iy})$.
Upon receiving information from a condition monitoring sample, the decision maker
can initiate full system inspection to reveal (with probability 1) the current state of
the system at a cost CI . If the system is found to be in warning state 2, preventive
maintenance is performed which brings the system to a healthy state 1 at a cost CP . If
the system is found to be in healthy state 1, no preventive maintenance is performed and
the process continues. Full system inspection and preventive maintenance takes TI and
TP time units, respectively. We make the standard assumption CF ≥ CI +CP . For every
time unit the system remains in warning state 2, an operating cost CW is incurred. The
objective is to characterize the structural form of the optimal sampling and maintenance
policy that minimizes the long-run expected average cost per unit time. The problem
can be formulated in the POMDP framework as follows.
While the system is operational, one of the following three actions $a_n \in \{1, 2, 3\}$ must be taken at each decision epoch time $n\Delta$:
1. Do nothing, and take an action at the next decision epoch time (n+ 1)∆.
2. Take a sample. Information from the sample Yn+1 is first made available for decision-
making at the beginning of the next decision epoch time (n+ 1)∆.
3. Initiate full system inspection, followed possibly by preventive maintenance.
If $n\Delta$ time units have elapsed since the last maintenance action (full inspection, preventive maintenance, or corrective maintenance) and $k$ samples $Y_{n_1}, \ldots, Y_{n_k}$ have been collected at time points $0 < n_1\Delta < \cdots < n_k\Delta \le n\Delta$, then it is well known from the theory of POMDPs (e.g. [7]) that
$$\Pi_n = P\left(X_{n\Delta} = 2 \mid \xi > n\Delta, Y_{n_1}, \ldots, Y_{n_k}\right), \qquad (3.1.2)$$
the probability that the system is in warning state 2 given all available information up to time $n\Delta$, represents sufficient information for decision-making at the $n$th decision epoch. Then, if an optimal stationary policy exists, it has the functional form $\phi(\pi) \in \{1, 2, 3\}$, $0 \le \pi \le 1$, where $\phi(\pi)$ indicates the action $a_n$ to be chosen when $\Pi_n = \pi$. Let $\Phi$ be the class of all stationary policies. From renewal theory, the long-run expected average cost per unit time is calculated for any stationary policy $\phi \in \Phi$ as the expected total cost $TC(\phi)$ incurred in one cycle divided by the expected cycle length $CL(\phi)$, where a cycle is completed when either full system inspection, preventive maintenance, or corrective maintenance is carried out.
For any stationary policy $\phi \in \Phi$, let
$$M(\phi) = \inf\{n\Delta \in \mathbb{R}_+ : \phi(\Pi_n) = 3\} \qquad (3.1.3)$$
represent the first time at which full system inspection is initiated, and let
$$N(\phi) = \left|\{n : \phi(\Pi_n) = 2,\ n\Delta < M(\phi) \wedge \xi\}\right| \qquad (3.1.4)$$
represent the total number of samples collected in a cycle. Then, from the model description given above,
$$TC(\phi) = C_S N(\phi) + \int_0^{M(\phi)\wedge\xi} C_W I_{\{X_t=2\}}\,dt + C_I I_{\{X_{M(\phi)\wedge\xi}=1\}} + (C_I + C_P)\, I_{\{X_{M(\phi)\wedge\xi}=2\}} + C_F I_{\{X_{M(\phi)\wedge\xi}=3\}}, \qquad (3.1.5)$$
and
$$CL(\phi) = \left(M(\phi)\wedge\xi\right) + T_I I_{\{X_{M(\phi)\wedge\xi}=1\}} + (T_I + T_P)\, I_{\{X_{M(\phi)\wedge\xi}=2\}} + T_F I_{\{X_{M(\phi)\wedge\xi}=3\}}. \qquad (3.1.6)$$
For the average cost criterion, the problem is to find a stationary policy $\phi^* \in \Phi$, if it exists, minimizing the long-run expected average cost per unit time given by
$$\frac{E_{\Pi_0}\left[TC(\phi)\right]}{E_{\Pi_0}\left[CL(\phi)\right]}, \qquad (3.1.7)$$
where $E_{\Pi_0}$ is the conditional expectation given $\Pi_0 = P(X_0 = 2)$. We assume that a new system is installed at the beginning of the first cycle, i.e. $\Pi_0 = 0$.
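To make the cycle quantities (3.1.5)-(3.1.6) concrete, the sketch below estimates the ratio (3.1.7) for one fixed stationary policy by simulating renewal cycles. All parameter values and the threshold policy itself are hypothetical, failures are resolved on the $\Delta$-grid (a deliberate simplification), and the belief recursion is the standard POMDP update conditioned on survival; this is an illustration, not the optimal policy of this chapter.

```python
import random

# hypothetical parameters and a hypothetical threshold policy, for illustration only
P = [[0.90, 0.08, 0.02],    # p_ij(Delta), i = healthy; rows sum to 1
     [0.00, 0.85, 0.15],    # i = warning
     [0.00, 0.00, 1.00]]    # i = failed (absorbing without repair)
D = [[0.7, 0.2, 0.1],       # d_iy = P(Y = y | X = healthy), y in {0, 1, 2}
     [0.1, 0.3, 0.6]]       # d_iy for X = warning
DELTA, C_S, C_W, C_I, C_P, C_F = 1.0, 10.0, 50.0, 200.0, 1000.0, 3000.0
T_I, T_P, T_F = 0.5, 1.0, 2.0

def predict(pi):
    """One-step belief prediction, conditional on surviving the interval."""
    surv = (1 - pi) * (1 - P[0][2]) + pi * (1 - P[1][2])
    return ((1 - pi) * P[0][1] + pi * P[1][1]) / surv

def correct(pi, y):
    """Bayes correction of the belief after a sample result y arrives."""
    num = D[1][y] * pi
    return num / (D[0][y] * (1 - pi) + num)

def policy(pi, low=0.05, high=0.4):
    """Hypothetical stationary policy: 1 do nothing, 2 sample, 3 inspect."""
    return 1 if pi < low else (2 if pi < high else 3)

def simulate_cycle(rng):
    """One renewal cycle; returns (cycle cost, cycle length) as in (3.1.5)-(3.1.6)."""
    x, pi, t, cost = 0, 0.0, 0.0, 0.0
    while True:
        a = policy(pi)
        if a == 3:                                  # inspection ends the cycle
            return (cost + C_I + (C_P if x == 1 else 0.0),
                    t + T_I + (T_P if x == 1 else 0.0))
        if a == 2:
            cost += C_S                             # pay the sampling cost now
        if x == 1:
            cost += C_W * DELTA                     # warning-state operating cost
        u, row = rng.random(), P[x]
        x_next = 0 if u < row[0] else (1 if u < row[0] + row[1] else 2)
        if x_next == 2:                             # failure: corrective repair
            return cost + C_F, t + DELTA + T_F
        x, t = x_next, t + DELTA
        pi = predict(pi)
        if a == 2:                                  # the sample result arrives now
            y = rng.choices(range(3), weights=D[x])[0]
            pi = correct(pi, y)

rng = random.Random(42)
cycles = [simulate_cycle(rng) for _ in range(20000)]
avg_cost_rate = sum(c for c, _ in cycles) / sum(l for _, l in cycles)
```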
We first transform the stochastic control problem (3.1.7) to an equivalent parameter-
ized stochastic control problem (with parameter λ) with an additive objective function.
This transformation is known as the λ−minimization technique, and its theory is devel-
oped in the excellent paper of [3]. Define for λ > 0 the function
V^λ(Π0) = inf_{φ∈Φ} EΠ0[TC(φ) − λ CL(φ)].   (3.1.8)

Then, [3] showed that λ∗ determined by the equation

λ∗ = inf{λ > 0 : V^λ(Π0) ≤ 0}   (3.1.9)
is the optimal expected average cost for the stochastic control problem (3.1.7), and the
stationary policy φ∗ ∈ Φ that minimizes the right-hand side of (3.1.8) for λ = λ∗ deter-
mines the optimal stationary policy. We refer to the function V λ(·) defined in (3.1.8) as
the value function.
Although the model developed in this chapter is presented in the reliability and main-
tenance context, the methods and results developed in this chapter can be applied to a
number of different fields. For example, there is a very close connection between the
problem described above and the joint optimization of cancer screening and treatment
scheduling. In such healthcare applications, a patient can be in one of three states: a
healthy state (no disease), an asymptomatic state (has the disease, but the state is not
fully observable) or symptomatic (has the disease, and it is observable). The three states
correspond exactly to our healthy, warning and failure states. As in our model, since the
asymptomatic state is not fully observable, the state of the patient can only be inferred
through ‘costly’ checkups. Although the checkups provide information about the state
of the patient, the information is imperfect due to false positive and false negative test results.
Furthermore, based on the information collected, the physician can recommend a more
costly test which can reveal the true state of the patient with certainty (this corresponds
to initiating full system inspection, action an = 3). If the patient is found to have the
disease, treatment begins (analogous to preventive maintenance), otherwise the screening
process continues. Recent contributions to healthcare screening and treatment planning
are given by [13], [54] and [66], and the references therein. Other interesting applications
of our model also include quality and statistical process control (e.g. [11], [49], [70]) and
change point detection applications (e.g. [17], [57]).
In the next section, we analyze the value function defined in (3.1.8) and determine
the structure of the optimal sampling and maintenance policy. For the remainder of the
chapter, to simplify notation we suppress the dependence on λ when there is no confusion
and write, for example, V (Π0) instead of V λ (Π0).
3.2 Structural Form of the Optimal Policy
The goal of this section is to characterize the form of the optimal sampling and mainte-
nance policy. The strategy we take is to first analyze the control problem over a restricted
subclass of stationary policies Φk ⊂ Φ in which full system inspection must be initiated
no later than at time k∆. The value function Vk for the restricted control problem is de-
rived and its properties are determined. The restriction is then lifted, and the properties
of the restricted value functions Vk are carried over to the infinite horizon value function
V , which can be obtained as the limit Vk → V . The dynamic optimality equation is then
derived and further properties of the infinite horizon value function V are determined. It
is then shown that the optimal policy is characterized by three critical thresholds, which
have practical value and intuitive interpretation.
We begin by providing a closed-form expression for the transition probability matrix
for the uncontrolled state process (Xt). By the model assumptions given in Section 3.1,
it can be shown by solving the Kolmogorov backward differential equations (e.g. [26]),
that the transition probability matrix for the uncontrolled state process is given by
P(t) = [pij(t)] = [ e^{−υ1t}   q12(e^{−υ2t} − e^{−υ1t})/(υ1 − υ2)   1 − e^{−υ1t} − q12(e^{−υ2t} − e^{−υ1t})/(υ1 − υ2)
                    0          e^{−υ2t}                             1 − e^{−υ2t}
                    0          0                                    1 ],   (3.2.1)

where the transition probabilities pij(t) = P(Xt = j | X0 = i), i, j ∈ X, and the constants
υ1 = q12 + q13, υ2 = q23.
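The closed form (3.2.1) is straightforward to implement. The thesis's own computations were done in MATLAB; the Python sketch below is illustrative and assumes υ1 ≠ υ2, as (3.2.1) does:

```python
import numpy as np

def transition_matrix(t, q12, q13, q23):
    """Closed-form P(t) of (3.2.1) for the healthy/warning/failure chain.

    Assumes v1 = q12 + q13 differs from v2 = q23."""
    v1, v2 = q12 + q13, q23
    p12 = q12 * (np.exp(-v2 * t) - np.exp(-v1 * t)) / (v1 - v2)
    return np.array([
        [np.exp(-v1 * t), p12,             1.0 - np.exp(-v1 * t) - p12],
        [0.0,             np.exp(-v2 * t), 1.0 - np.exp(-v2 * t)],
        [0.0,             0.0,             1.0],
    ])
```

Each row sums to one and P(0) is the identity, which is a quick sanity check on any transcription of (3.2.1).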
Suppose at decision epoch n the system has not failed, i.e. ξ > n∆, and Πn = π.
Then for any t ∈ [0,∆], the probability that the system will not fail by n∆ + t is given
by
R(t|π) := P(ξ > n∆ + t | ξ > n∆, Πn = π) = (1 − p13(t))(1 − π) + (1 − p23(t))π.   (3.2.2)
The function R(·|π) defined in (3.2.2) is known as the conditional reliability function. If
the decision maker chooses action an = 2 (take a sample), then at the beginning of the
next decision epoch n+ 1, if ξ > (n+ 1)∆, a sample Yn+1 is made available and the state
probability is updated using Bayes’ Rule (e.g. [65]):

Πn+1(Yn+1, π) := P(X(n+1)∆ = 2 | ξ > (n+1)∆, Yn+1, Πn = π)
             = d2,Yn+1 (p12(∆)(1 − π) + p22(∆)π) / [d1,Yn+1 p11(∆)(1 − π) + d2,Yn+1 (p12(∆)(1 − π) + p22(∆)π)].   (3.2.3)
On the other hand, if the decision maker chooses action an = 1 (do nothing), then at the
beginning of the next decision epoch n + 1, if ξ > (n+1)∆, no new sample is available,
so the state probability is given by

Πn+1(∅, π) := P(X(n+1)∆ = 2 | ξ > (n+1)∆, Πn = π)
           = (p12(∆)(1 − π) + p22(∆)π) / R(∆|π)
           = (p12(∆)(1 − π) + p22(∆)π) / [(1 − p13(∆))(1 − π) + (1 − p23(∆))π].   (3.2.4)

The empty set symbol ∅ in (3.2.4) is used to indicate that no new sample Yn+1 was
obtained at the beginning of decision epoch n + 1.
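Equations (3.2.2)-(3.2.4), together with the observation probability g(y, π) of (3.2.8) below, form the filtering recursion. A minimal Python sketch (the rates and the 2×4 observation matrix used in the test are the illustrative values of Section 3.3.1):

```python
import numpy as np

Q12, Q13, Q23 = 0.12, 0.05, 0.50   # illustrative transition rates
DELTA = 1.0

def _p(t):
    # Entries of (3.2.1): returns (p11, p12, p22) at time t.
    v1, v2 = Q12 + Q13, Q23
    p12 = Q12 * (np.exp(-v2 * t) - np.exp(-v1 * t)) / (v1 - v2)
    return np.exp(-v1 * t), p12, np.exp(-v2 * t)

def reliability(t, pi):
    """Conditional reliability R(t|pi) of (3.2.2)."""
    p11, p12, p22 = _p(t)
    return (p11 + p12) * (1 - pi) + p22 * pi   # 1 - p13 = p11 + p12, 1 - p23 = p22

def update_no_sample(pi):
    """Survival-only update (3.2.4), action 'do nothing'."""
    p11, p12, p22 = _p(DELTA)
    return (p12 * (1 - pi) + p22 * pi) / reliability(DELTA, pi)

def obs_prob(y, pi, D):
    """g(y, pi) of (3.2.8): probability of observing y given survival over one period."""
    p11, p12, p22 = _p(DELTA)
    return (D[0, y] * p11 * (1 - pi)
            + D[1, y] * (p12 * (1 - pi) + p22 * pi)) / reliability(DELTA, pi)

def update_sample(y, pi, D):
    """Bayes' rule (3.2.3), action 'take a sample', having observed y."""
    p11, p12, p22 = _p(DELTA)
    num = D[1, y] * (p12 * (1 - pi) + p22 * pi)
    return num / (D[0, y] * p11 * (1 - pi) + num)
```

The observation probabilities g(·, π) sum to one, and averaging the sampled updates (3.2.3) against them recovers the no-sample update (3.2.4); this identity is what drives the Jensen's-inequality argument of Corollary 3.2.8.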
We next analyze the control problem over a restricted subclass of stationary policies.
For k ≥ 0, let Φk ⊂ Φ represent the class of stationary policies φ such that the time of
the first decision epoch at which full system inspection is initiated is less than or equal to
k∆ with probability 1, i.e. M(φ) ≤ k∆. Then, by the dynamic programming algorithm
(e.g. [7]), the value function for the restricted control problem
Vk(π) = inf_{φ∈Φk} Eπ[TC(φ) − λ CL(φ)]   (3.2.5)

satisfies the dynamic equations

V0(π) = CI + CP π − λ(TI + TP π),
Vk(π) = min{V^1_k(π), V^2_k(π), V^3_k(π)},   (3.2.6)
where

V^1_k(π) = CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt
           + (CF − λTF)(1 − R(∆|π)) + R(∆|π) Vk−1(Π1(∅, π)),

V^2_k(π) = CS + CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt   (3.2.7)
           + (CF − λTF)(1 − R(∆|π)) + R(∆|π) ∑_{y∈E} Vk−1(Π1(y, π)) g(y, π),

V^3_k(π) = CI + CP π − λ(TI + TP π),

and

g(y, π) = [d1y p11(∆)(1 − π) + d2y (p12(∆)(1 − π) + p22(∆)π)] / R(∆|π).   (3.2.8)
The first term V^1_k(π) in (3.2.7) is the expected ‘cost’ if action 1 (do nothing) is chosen,
and the decision maker runs the system for one period, updates the state probability
Π1(∅, π) using equation (3.2.4), and then continues optimally with k − 1 periods left.
The second term V^2_k(π) is the expected ‘cost’ if action 2 (take a sample) is chosen, and
the decision maker runs the system for one period, collects a sample Y1 = y, updates
the state probability Π1(y, π) using equation (3.2.3), and then continues optimally with
k − 1 periods left. The third term V^3_k(π) is the expected ‘cost’ if action 3 (full inspection)
is chosen, and the decision maker stops the process for full system inspection, followed
possibly by preventive maintenance.
It then follows from equations (3.2.6) - (3.2.8) that the restricted value functions Vk
have the following property.
Lemma 3.2.1. For each k ≥ 0, Vk(π) is a concave function of π.
Proof. We prove this lemma using mathematical induction. For k = 1, substituting
equations (3.2.2) - (3.2.4) into (3.2.7) shows that V^1_1(π), V^2_1(π), V^3_1(π) are linear, and
hence concave, functions of π. Assume that for some k > 0, Vk(π) is a concave function
of π. We want to show that Vk+1(π) is also a concave function of π. Since the min
operator preserves concavity and R(∆|π) is a linear function of π, it suffices to show
that R(∆|π)Vk(Π1(∅, π)) and R(∆|π) ∑_{y∈E} Vk(Π1(y, π)) g(y, π) are concave functions of
π. Fix arbitrary π1, π2, α ∈ [0, 1]. Then by equation (3.2.4),

Π1(∅, απ1 + (1 − α)π2) = [αR(∆|π1) / (αR(∆|π1) + (1 − α)R(∆|π2))] Π1(∅, π1)
                         + [(1 − α)R(∆|π2) / (αR(∆|π1) + (1 − α)R(∆|π2))] Π1(∅, π2).
Then by concavity of Vk,

R(∆|απ1 + (1 − α)π2) Vk(Π1(∅, απ1 + (1 − α)π2))
    ≥ αR(∆|π1) Vk(Π1(∅, π1)) + (1 − α)R(∆|π2) Vk(Π1(∅, π2)),

which shows that R(∆|π)Vk(Π1(∅, π)) is a concave function of π. Similarly, by equation
(3.2.3), for each y ∈ E,

Π1(y, απ1 + (1 − α)π2) = [αg(y, π1)R(∆|π1) / (αg(y, π1)R(∆|π1) + (1 − α)g(y, π2)R(∆|π2))] Π1(y, π1)
                        + [(1 − α)g(y, π2)R(∆|π2) / (αg(y, π1)R(∆|π1) + (1 − α)g(y, π2)R(∆|π2))] Π1(y, π2).

Then by concavity of Vk,

R(∆|απ1 + (1 − α)π2) ∑_{y∈E} Vk(Π1(y, απ1 + (1 − α)π2)) g(y, απ1 + (1 − α)π2)
    ≥ ∑_{y∈E} (αVk(Π1(y, π1)) g(y, π1)R(∆|π1) + (1 − α)Vk(Π1(y, π2)) g(y, π2)R(∆|π2))
    = αR(∆|π1) ∑_{y∈E} Vk(Π1(y, π1)) g(y, π1) + (1 − α)R(∆|π2) ∑_{y∈E} Vk(Π1(y, π2)) g(y, π2),

which shows that R(∆|π) ∑_{y∈E} Vk(Π1(y, π)) g(y, π) is a concave function of π. Thus, by
mathematical induction, for each k > 0, Vk(π) is a concave function of π.
We also have the following lower bound on the family of restricted value functions
(Vk(π)).
Lemma 3.2.2. The restricted value functions Vk(π) are uniformly bounded from below:

Vk(π) ≥ −λ(∆ + TF) / (1 − R(∆|0)).   (3.2.9)
Proof. We prove inequality (3.2.9) using mathematical induction. For k = 0, it is clear
that V0(π) ≥ −λ(∆ + TF)/(1 − R(∆|0)). Assume that for some k ≥ 0,
Vk(π) ≥ −λ(∆ + TF)/(1 − R(∆|0)). Then, it follows that

V^1_{k+1}(π) = CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt
              + (CF − λTF)(1 − R(∆|π)) + R(∆|π) Vk(Π1(∅, π))
           ≥ −λ(∆ + TF) − [λ(∆ + TF)/(1 − R(∆|0))] R(∆|π)
           ≥ −λ(∆ + TF) − [λ(∆ + TF)/(1 − R(∆|0))] R(∆|0)
           = −λ(∆ + TF)/(1 − R(∆|0)),

and

V^2_{k+1}(π) = CS + CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt
              + (CF − λTF)(1 − R(∆|π)) + R(∆|π) ∑_{y∈E} Vk(Π1(y, π)) g(y, π)
           ≥ −λ(∆ + TF) − [λ(∆ + TF)/(1 − R(∆|0))] R(∆|π) ∑_{y∈E} g(y, π)
           = −λ(∆ + TF) − [λ(∆ + TF)/(1 − R(∆|0))] R(∆|π)
           ≥ −λ(∆ + TF) − [λ(∆ + TF)/(1 − R(∆|0))] R(∆|0)
           = −λ(∆ + TF)/(1 − R(∆|0)).

Since V^3_{k+1}(π) = V0(π) ≥ −λ(∆ + TF)/(1 − R(∆|0)), it follows that
Vk+1(π) ≥ −λ(∆ + TF)/(1 − R(∆|0)). Thus, by mathematical induction,
Vk(π) ≥ −λ(∆ + TF)/(1 − R(∆|0)) for all k, π.
Lemmas 3.2.1 and 3.2.2 allow us to characterize the infinite horizon value function
V defined in (3.1.8). For each k, since Φk ⊂ Φk+1, by definition of Vk given in equation
(3.2.5) it follows that Vk(π) ≥ Vk+1(π). Then, by Lemma 3.2.2, since the restricted value
functions (Vk) are uniformly bounded from below, limk→∞ Vk = V exists, and by Lemma
3.2.1, the value function V is concave and bounded, and it satisfies the following dynamic
optimality equation (e.g. [7]), which gives us our first important structural result:
Theorem 3.2.3. The infinite horizon value function defined in equation (3.1.8) is ob-
tained as the limit V (π) = limk→∞ Vk(π). Furthermore, V (π) is a concave, bounded
function of π, satisfying the dynamic optimality equation
V(π) = min{V^1(π), V^2(π), V^3(π)},   (3.2.10)
where

V^1(π) = CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt
         + (CF − λTF)(1 − R(∆|π)) + R(∆|π) V(Π1(∅, π)),

V^2(π) = CS + CW ∫_0^∆ (p12(t)(1 − π) + p22(t)π) dt − λ ∫_0^∆ R(t|π) dt   (3.2.11)
         + (CF − λTF)(1 − R(∆|π)) + R(∆|π) ∑_{y∈E} V(Π1(y, π)) g(y, π),

V^3(π) = CI + CP π − λ(TI + TP π).
It then follows that the value function V is also non-decreasing.
Corollary 3.2.4. The infinite horizon value function V (π) is a non-decreasing function
of π.
Proof. By Theorem 3.2.3, the value function V(π) = min{V^1(π), V^2(π), V^3(π)} is a concave
function of π. Furthermore, V^3(π) = CI + CP π − λ(TI + TP π) is non-decreasing
in π if and only if λ ≤ CP/TP. However, this inequality must hold; for if λ > CP/TP,
there would be no need to monitor and control the system, since it would always be
optimal to initiate preventive maintenance indefinitely, which gives a long-run average cost
of CP/TP. Thus, V^3(π) = CI + CP π − λ(TI + TP π) is linear and non-decreasing in π. If
there existed π1 < π2 such that V(π1) > V(π2), then necessarily V(1) < V^3(1), which is
a contradiction, since we show in the proof of Theorem 3.2.6 that V(1) = V^3(1). Thus,
for all π1 < π2, V(π1) ≤ V(π2), i.e. the value function V(π) is a non-decreasing function
of π, which completes the proof.
We next prove a theorem, which makes use of the result in the classical paper of [6].
Theorem 3.2.5. Any policy φ∞ ∈ Φ that never stops the process to initiate full system
inspection, i.e. M(φ∞) = +∞, is not optimal.
Proof. Consider an age-based policy φn ∈ Φ that initiates full system inspection at time
n∆. From renewal theory, the long-run expected average cost per unit time for this policy
is given by

g(n) = [CF p13(n∆) + CI p11(n∆) + (CI + CP) p12(n∆) + CW ∫_0^{n∆} p12(t) dt + CS E[N(φn)]]
       / [E[n∆ ∧ ξ] + TF p13(n∆) + TI p11(n∆) + (TI + TP) p12(n∆)].   (3.2.12)

Thus, to prove the claim, it suffices to show that

arg min_n g(n) < +∞.   (3.2.13)
To show (3.2.13), we derive an upper bound for arg minn g(n) by considering a related
process in which we remove all incentive to stop the process early, so that full system
inspection must be done at a later time. In particular, consider a related process in which
full system inspection costs CI + CP , whether the system is found to be in healthy or
warning state, and all maintenance actions (corrective, inspection and preventive) take
0 time units. We furthermore assume that there is no penalty to run the system longer,
so that CW = CS = 0. Then, if preventive maintenance is scheduled at time n∆, the
expected average cost for this process is given by

b(n) = [CF p13(n∆) + (CI + CP)(1 − p13(n∆))] / E[n∆ ∧ ξ],   (3.2.14)

and clearly

arg min_n g(n) ≤ arg min_n b(n).
To complete the proof, we show that arg min_n b(n) < +∞, which implies equation
(3.2.13). Since we have assumed q23 > q13, the failure rate of ξ is increasing. We have
also assumed that CF > CI + CP. [6] showed that under these assumptions, there exists a
positive real value t∗ < +∞ such that t∗ is the unique minimizer of b(t). For our problem,
arg min_n b(n) is required to be integer-valued. However, since t∗ is a unique minimizer,
the function b(t) is increasing for t > t∗. Thus, it follows that arg min_n b(n) ≤ ⌈t∗⌉ < +∞,
which completes the proof.
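For the no-sampling case (so E[N(φn)] = 0), g(n) of (3.2.12) can be evaluated numerically from the closed-form transition probabilities. The sketch below does this under the illustrative rates of Section 3.3.1, with the costs chosen so that CF > CI + CP holds strictly, as the proof assumes, and locates the finite minimizer whose existence the proof establishes:

```python
import numpy as np

Q12, Q13, Q23 = 0.12, 0.05, 0.50            # illustrative rates
CF, CI, CP, CW = 300.0, 65.0, 20.0, 30.0    # illustrative; CF > CI + CP strictly
TF, TI, TP, DT = 1.0, 1.0, 1.0, 1.0

V1, V2 = Q12 + Q13, Q23

def p11(t): return np.exp(-V1 * t)
def p12(t): return Q12 * (np.exp(-V2 * t) - np.exp(-V1 * t)) / (V1 - V2)
def p13(t): return 1.0 - p11(t) - p12(t)

def age_cost(n, m=4000):
    """g(n) of (3.2.12) for the inspect-at-n*Delta policy with no sampling."""
    T = n * DT
    dt = T / m
    ts = dt * (np.arange(m) + 0.5)           # midpoint-rule quadrature nodes
    warn_time = np.sum(p12(ts)) * dt         # E[time spent in warning state]
    e_cycle = np.sum(1.0 - p13(ts)) * dt     # E[min(n*Delta, xi)]
    num = CF * p13(T) + CI * p11(T) + (CI + CP) * p12(T) + CW * warn_time
    den = e_cycle + TF * p13(T) + TI * p11(T) + (TI + TP) * p12(T)
    return num / den

best_n = min(range(1, 100), key=age_cost)    # finite by Theorem 3.2.5
```

Since b(t), and hence g(n), eventually increases toward its limiting value, scanning a finite horizon suffices.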
The optimal sampling and maintenance policy is described by the following Theorem.
Theorem 3.2.6. The optimal sampling and maintenance policy φ∗ ∈ Φ is characterized
by three critical thresholds 0 ≤ θL ≤ θU ≤ η ≤ 1. In particular, at decision epoch n:
1. If Πn < θL, do nothing and run the system until the next decision epoch n+ 1.
2. If θL ≤ Πn < θU , take a sample.
3. If θU ≤ Πn < η, do nothing and run the system until the next decision epoch n+ 1.
4. If Πn ≥ η, initiate full system inspection, followed possibly by preventive mainte-
nance.
5. Corrective maintenance is carried out immediately upon system failure.
Proof. We first show that for π = 1, V^3(1) < V^1(1) < V^2(1). We start with the second
inequality V^1(1) < V^2(1). By equation (3.2.11),

V^1(1) − V^2(1) = R(∆|1)V(Π1(∅, 1)) − R(∆|1) ∑_{y∈E} V(Π1(y, 1)) g(y, 1) − CS
               = R(∆|1)V(1) − R(∆|1)V(1) ∑_{y∈E} g(y, 1) − CS
               = −CS
               < 0,

which implies V^1(1) < V^2(1). We next show the first inequality V^3(1) < V^1(1) using
mathematical induction. The inequality is equivalent to V^3(1) = V(1). For k = 1, we
assume V^3_1(1) > V1(1) and draw a contradiction. Since it is not optimal to stop the
process to initiate full system inspection when π = 0, linearity of V^1_1(π), V^2_1(π), V^3_1(π)
implies that V^3_1(π) > V1(π) for all 0 ≤ π ≤ 1. Since for each k, Vk(π) ≥ Vk+1(π), it
follows that the limit V(π) = limk→∞ Vk(π) < V^3_1(π), and the policy that never stops the
process is optimal, which contradicts Theorem 3.2.5. Whence, V^3_1(1) = V1(1), and by
equation (3.2.6),

CI + CP − λ(TI + TP) ≤ CW ∫_0^∆ p22(t) dt − λ ∫_0^∆ R(t|1) dt
                       + (CF − λTF)(1 − R(∆|1)) + R(∆|1)(CI + CP − λ(TI + TP)).

Suppose now that for some k > 0, V^3_k(1) = Vk(1). Using the above inequality, it follows
that

Vk+1(1) = min{V^1_{k+1}(1), V^2_{k+1}(1), V^3_{k+1}(1)}
        = min{V^1_{k+1}(1), V^3_{k+1}(1)}
        = CI + CP − λ(TI + TP)
        = V^3_{k+1}(1),

which completes the inductive step. Therefore the limit V(1) = limk→∞ Vk(1) = V^3(1).
Thus, for π = 1, V^3(1) < V^1(1) < V^2(1). Since for π = 0, V^1(0) < V^2(0) < V^3(0),
the above inequalities and equation (3.2.10) imply that the region {π : V(π) = V^3(π)}
is a convex subset of [0, 1] of the form [η, 1], for some 0 ≤ η ≤ 1, and the region
{π : V(π) = V^2(π)} is a convex subset of [0, 1] of the form [θL, θU], for some 0 ≤ θL ≤
θU ≤ η, which completes the proof.
Theorem 3.2.6 shows that the optimal control policy can be represented as a type of
control chart, which monitors the probability Πn that the system is in a warning state.
The intuitive interpretation of the three critical thresholds (θL, θU , η) is as follows. When
the probability that the system is in a warning state is below the lower sampling limit
Πn ≤ θL, the decision maker has high confidence that the system is in healthy state 1,
and therefore has little reason to take an expensive sample through condition monitoring
to confirm this belief. Similarly, when the state probability is above the upper sampling
limit Πn ≥ θU , the decision maker has high confidence that the system is in warning
state 2, and therefore also has little reason to take the sample. It is only when the state
probability satisfies θL ≤ Πn < θU that the decision maker is unsure about the system’s
condition and is willing to pay for a sample to get a better idea about its health. However,
once the state probability exceeds η, the risk of system failure and of incurring an expensive
corrective maintenance cost is too high, so the decision maker should stop the process
and initiate full system inspection, followed possibly by preventive maintenance.
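A sketch of the resulting decision rule (the default threshold values below are the illustrative optima computed in Section 3.3.1; in practice they come out of the algorithm of Section 3.3):

```python
def policy(pi, theta_L=0.05, theta_U=0.60, eta=0.75):
    """Four-region rule of Theorem 3.2.6 on the posterior warning probability.

    Returns 1 (do nothing), 2 (take a sample), or 3 (full system inspection);
    corrective maintenance upon failure is handled outside this rule."""
    if pi >= eta:
        return 3          # risk of failure too high: stop and inspect
    if theta_L <= pi < theta_U:
        return 2          # uncertain about the state: pay for a sample
    return 1              # confident either way: run one more period
```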
Remark 3.2.7. It is important to note that practitioners can also use the control policy
described in Theorem 3.2.6 as a tool for planning maintenance activities in advance.
For example, if θU ≤ Πn < η, the optimal action is to do nothing and run the system
until the next decision epoch n + 1. However, since no sample is taken, the state
probability at the next decision epoch,

Πn+1 = Πn+1(∅, π) = (p12(∆)(1 − π) + p22(∆)π) / [(1 − p13(∆))(1 − π) + (1 − p23(∆))π],

is the deterministic function given by equation (3.2.4). Therefore, the next maintenance
action (full system inspection) can be scheduled to take place

T = inf{m∆ : (p12(m∆)(1 − π) + p22(m∆)π) / [(1 − p13(m∆))(1 − π) + (1 − p23(m∆))π] ≥ η}

time units from now. Planning maintenance activities in advance is particularly useful
in practice since suspending a system from operation for full inspection and maintenance
may require significant preparation.
Intuitively, one would expect that if the sampling cost CS = 0, we should always take
a sample. On the other hand, if the sampling cost is greater than the cost of full system
inspection and preventive maintenance, i.e. CS > CI + CP , one would expect that we
should never take a sample. To conclude this section, we show using Jensen’s inequality
(e.g. [9]) that this intuition is mathematically correct.
Corollary 3.2.8. If the sampling cost CS = 0, then θL = 0 and θU = η. In other words,
before full system inspection is initiated, i.e. for all π < η, it is always optimal to take
a sample, i.e. φ∗(π) = 2. On the other hand, if the sampling cost CS > CI + CP , then
θL = θU = η. In other words, before full system inspection is initiated, i.e. for all π < η,
it is never optimal to take a sample, i.e. φ∗(π) = 1.
Proof. By equation (3.2.11),

V^1(π) − V^2(π) = R(∆|π)V(Π1(∅, π)) − R(∆|π) ∑_{y∈E} V(Π1(y, π)) g(y, π) − CS.

Also, equations (3.2.3), (3.2.4) and (3.2.8) imply

Π1(∅, π) = ∑_{y∈E} Π1(y, π) g(y, π).

Thus, by concavity of V, it follows by Jensen’s inequality that for all 0 ≤ π ≤ 1,

R(∆|π)V(Π1(∅, π)) ≥ R(∆|π) ∑_{y∈E} V(Π1(y, π)) g(y, π).

Thus, if CS = 0, then V^1(π) ≥ V^2(π) for all 0 ≤ π ≤ 1, and it is always optimal to sample
when π < η.

For the case in which CS > CI + CP, since we know by Corollary 3.2.4 that the value
function V(π) is a non-decreasing function of π, it follows that for all 0 ≤ π ≤ 1,
R(∆|π)V(Π1(∅, π)) − R(∆|π) ∑_{y∈E} V(Π1(y, π)) g(y, π) ≤ CI + CP. Thus, if CS > CI + CP,
then V^1(π) < V^2(π), i.e. it is never optimal to take a sample.
In the next section, we develop an iterative computational algorithm that determines
the optimal values of the critical thresholds (θ∗L, θ∗U, η∗) and the minimum long-run
expected average cost per unit time λ∗. We also provide numerical comparisons with other
suboptimal policies that illustrate the benefits of the joint optimization of sampling and
maintenance.
3.3 Computation of the Optimal Policy
In this section, we develop a computational algorithm that determines the optimal values
of the critical thresholds (θ∗L, θ∗U, η∗) and the long-run expected average cost per unit time
λ∗. We also provide numerical comparisons with other suboptimal policies that illustrate
the benefits of the joint optimization of sampling and maintenance.

The computational algorithm is based on the λ−minimization technique ([3]) and the
(monotone) convergence of the restricted value functions Vk → V.
The Algorithm
Step 1. Choose ε > 0 and lower and upper bounds λ_lo ≤ λ ≤ λ_hi.

Step 2. Put λ = (λ_lo + λ_hi)/2, and V^λ_0(π) = CI + CP π − λ(TI + TP π), k = 1.

Step 3. Calculate V^λ_k using the dynamic equations (3.2.6) and (3.2.7). Stop the
iteration of V^λ_k when ||V^λ_k − V^λ_{k−1}|| ≤ ε.

Step 4. If V^λ_k(0) < −ε, put λ_hi = λ and go to Step 2.
If V^λ_k(0) > ε, put λ_lo = λ and go to Step 2.
If |V^λ_k(0)| ≤ ε, put λ∗ = λ and stop.
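A compact Python rendering of these steps (the grid discretization, midpoint quadrature, and linear interpolation of the value function are implementation choices, the model data are the illustrative values of Section 3.3.1, and the thesis's own implementation was in MATLAB):

```python
import numpy as np

# Illustrative data of Section 3.3.1.
Q12, Q13, Q23 = 0.12, 0.05, 0.50
D = np.array([[0.90, 0.05, 0.05, 0.00],
              [0.00, 0.05, 0.10, 0.85]])
CW, CF, CS, CI, CP = 30.0, 85.0, 2.0, 65.0, 20.0
TF, TI, TP, DT = 1.0, 1.0, 1.0, 1.0

v1, v2 = Q12 + Q13, Q23
_ts = DT * (np.arange(400) + 0.5) / 400               # midpoint quadrature nodes
_e1, _e2 = np.exp(-v1 * _ts), np.exp(-v2 * _ts)
_q12 = Q12 * (_e2 - _e1) / (v1 - v2)
I11, I12, I22 = _e1.mean() * DT, _q12.mean() * DT, _e2.mean() * DT
P11 = np.exp(-v1 * DT)
P12 = Q12 * (np.exp(-v2 * DT) - np.exp(-v1 * DT)) / (v1 - v2)
P22 = np.exp(-v2 * DT)
GRID = np.linspace(0.0, 1.0, 101)

def backup(V, lam):
    """One application of (3.2.6)-(3.2.7) to V, the values of V_{k-1} on GRID."""
    pi = GRID
    R = (P11 + P12) * (1 - pi) + P22 * pi             # R(DT|pi)
    w = P12 * (1 - pi) + P22 * pi                     # P(warning and alive)
    common = (CW * (I12 * (1 - pi) + I22 * pi)
              - lam * ((I11 + I12) * (1 - pi) + I22 * pi)
              + (CF - lam * TF) * (1.0 - R))
    val1 = common + R * np.interp(w / R, GRID, V)     # do nothing
    val2 = common + CS                                # take a sample
    for y in range(D.shape[1]):
        den = D[0, y] * P11 * (1 - pi) + D[1, y] * w  # g(y,pi) * R(DT|pi)
        post = np.divide(D[1, y] * w, den, out=np.zeros_like(den), where=den > 0)
        val2 += den * np.interp(post, GRID, V)
    val3 = CI + CP * pi - lam * (TI + TP * pi)        # full inspection
    return np.minimum(np.minimum(val1, val2), val3), val1, val2, val3

def solve(eps=0.01, max_bisect=60):
    """Bisection on lambda (Steps 1-4) with the bounds of Lemma 3.3.1."""
    lo, hi = 0.0, CI / TI
    for _ in range(max_bisect):
        lam = 0.5 * (lo + hi)
        V = CI + CP * GRID - lam * (TI + TP * GRID)   # V_0
        while True:
            Vn, val1, val2, val3 = backup(V, lam)
            done = np.max(np.abs(Vn - V)) <= eps
            V = Vn
            if done:
                break
        if V[0] < -eps:
            hi = lam
        elif V[0] > eps:
            lo = lam
        else:
            break
    sample = GRID[V == val2]
    stop = GRID[V == val3]
    theta_L = sample.min() if sample.size else None
    theta_U = sample.max() if sample.size else None
    eta = stop.min() if stop.size else 1.0
    return lam, theta_L, theta_U, eta
```

On this instance the routine should land near the thesis's reported values (θ∗L = 0.05, θ∗U = 0.60, η∗ = 0.75, λ∗ = 17.82), with small differences attributable to the interpolation and quadrature choices.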
In the algorithm above, Step 3 and Theorem 3.2.3 imply that the restricted value
function V^λ_k approximates the value function V^λ for λ = λ∗. Step 4 and the λ−minimization
technique ([3]) imply that λ∗ is the optimal expected average cost. Furthermore, by
Theorem 3.2.3, the optimal value of the lower (resp. upper) sampling limit θ∗L (resp. θ∗U)
is the smallest (resp. largest) value of π such that Vk(π) = V^2_k(π), and η∗ is the smallest
value of π such that Vk(π) = V^3_k(π).
In the algorithm above, since λ > 0, a natural choice for the initial value of the lower
bound λ_lo is 0. However, it is not clear how one should choose the value of the initial
upper bound λ_hi. Fortunately, we have the following result for a feasible choice of the
initial upper bound.

Lemma 3.3.1. The optimal average cost is bounded by 0 < λ∗ ≤ CI/TI. Thus, in the
algorithm given above, λ_lo = 0 and λ_hi = CI/TI are feasible initial values for the lower
and upper bounds, respectively.
Proof. Consider an age-based policy φ0 ∈ Φ that initiates full system inspection
immediately at time 0. From renewal theory, it is clear that the long-run expected average
cost per unit time for this policy is given by

λ0 = (CI + CP Π0) / (TI + TP Π0) = CI/TI,

where the second equality follows since we have assumed that a new system is installed
at time 0, i.e. Π0 = P(X0 = 2) = 0. Thus, it follows that

λ∗ = inf_{φ∈Φ} EΠ0[TC(φ)] / EΠ0[CL(φ)] ≤ EΠ0[TC(φ0)] / EΠ0[CL(φ0)] =: λ0 = CI/TI.

Therefore, the optimal average cost is bounded by 0 < λ∗ ≤ CI/TI, which completes the
proof.
We next illustrate the use of the computational algorithm in the following subsection
and determine the optimal values of the critical thresholds (θ∗L, θ∗U, η∗) and the long-run
expected average cost per unit time λ∗ in a numerical example.
3.3.1 Constructing the Optimal Control Chart
In this subsection, we construct the cost-optimal control chart described in Theorem
3.2.6. Using the computational algorithm described above, the optimal values of the
critical thresholds (θ∗L, θ∗U, η∗) and the long-run expected average cost per unit time λ∗
are determined.
Consider the following transition rate matrix and state-observation matrix:

Q = [ −0.17   0.12   0.05          D = [ 0.90  0.05  0.05  0.00
         0   −0.50   0.50 ],             0.00  0.05  0.10  0.85 ].
         0      0      0  ]

Maintenance cost parameters are given by CW = 30, CF = 85, CS = 2, CI = 65, CP = 20,
and maintenance time parameters TF = TI = TP = ∆ = 1. We coded the computational
algorithm given above in MATLAB and obtained the following optimal values θ∗L = 0.05,
θ∗U = 0.60, η∗ = 0.75, with a minimum expected average cost λ∗ = 17.82. The algorithm
took 369.39 seconds to complete on an Intel Core 2 6420, 2.13 GHz machine with 2 GB of RAM.
To run the algorithm, we chose ε = 0.01 and discretized the interval 0 ≤ π ≤ 1,
considering values of π = 0, 0.01, 0.02, . . . , 1, so that ||V^λ_k − V^λ_{k−1}|| in Step 3, for example,
is calculated as max_{π=0,0.01,...,1} |V^λ_k(π) − V^λ_{k−1}(π)|. The value function is graphed in Figure 3.3.1.
Figure 3.3.1: The Graph of the Value Function V (π)
Theorem 3.2.6 implies that the optimal sampling and maintenance policy can be
represented as a control chart, which monitors the probability Πn that the system is in
a warning state. To illustrate the use of such a control chart we plot a sample path
realization of (Πn) in Figure 3.3.2 below.
Figure 3.3.2: The Optimal Sampling and Maintenance Policy Represented as a Control Chart
Figure 3.3.2 shows that no samples should be taken from decision epoch 0 to decision
epoch 2. From decision epochs 3 to 6, the posterior probability θ∗L ≤ Πn < θ∗U so
the optimal action is to take a sample. At decision epoch 7, the posterior probability
θ∗U ≤ Πn < η∗, so again it is optimal to do nothing. At decision epoch 8, Πn ≥ η∗ so the
optimal action is full system inspection, followed possibly by preventive maintenance.
Such a control chart has direct practical value as it can be readily implemented for
online decision-making. Furthermore, since the monitored statistic is univariate and three
critical thresholds have straightforward and intuitive interpretation, decisions that are
made can be easily justified and explained at a managerial level.
In the next subsection, we provide numerical comparisons with other policies that
illustrate the benefits of the joint optimization of sampling and maintenance.
3.3.2 Comparison with Other Policies
In this subsection, we compare the performance of our jointly optimal sampling and
maintenance policy with the two most widely considered sampling policies: the policy
that never takes a sample at any decision epoch, and the policy that always takes a
sample at every decision epoch. Under each of these suboptimal sampling policies, the
decision maker still has the freedom to initiate full system inspection at any time. On
one hand, the policy that never takes a condition monitoring sample incurs no sampling
costs but also has the least amount of information. On the other hand, the sampling
policy that always takes a sample at every decision epoch carries the most information,
but also incurs the highest sampling cost. Our joint sampling and maintenance policy is
the optimal balance between having the largest amount of information at the least cost.
It is well known that the policy that never takes a sample at any decision epoch is
nothing more than the classical age-based policy (e.g. [6]). Within our framework, this
policy corresponds to the special case where θL = θU = η. Similarly, the policy that
always takes a sample at every decision epoch corresponds to the special case where
θL = 0 and θU = η. This type of control policy is known as a Bayesian control chart,
which was the focus of Chapter 2.
To facilitate our discussion, we refer to the policy that never takes a sample as an
N−Policy, the policy that always takes a sample as an A−Policy, and our jointly optimal
policy of Theorem 3.2.6 as a J−Policy. For this comparison, we consider the following
transition rate matrix and state-observation matrix:

Q = [ −0.23   0.12   0.11          D = [ 0.80  0.10  0.05  0.05
         0   −0.11   0.11 ],             0.10  0.05  0.00  0.85 ],
         0      0      0  ]

and model parameters CW = 70, CF = 110, CS = 8, CI = 55, CP = 55, and TF = TI =
TP = ∆ = 1. We obtain the following results in Table 3.3.1.
Table 3.3.1: Comparison with Suboptimal Policies

            N−Policy   A−Policy   J−Policy
θ∗L           0.27        0          0.09
θ∗U           0.27        0.34       0.29
η∗            0.27        0.34       0.29
λ∗           25.65       25.15      23.01
Run Time      5.97       78.64     364.95
Table 3.3.1 shows that the jointly optimal J−Policy performs substantially better
than both the optimal N−Policy and the optimal A−Policy. In particular, Table 3.3.1
shows that using the optimal J−Policy gives an expected 10.29% cost saving over the
optimal N−Policy, and an expected 8.51% cost saving over the optimal A−Policy.
Naturally, determining the optimal threshold values (θ∗L, θ∗U, η∗) for the J−Policy takes
longer than determining the optimal threshold values for the optimal N−Policy and
the optimal A−Policy. However, in practice, since these computations are typically done
off-line, a total run time of a few minutes is surely worth the large cost savings obtained
by using the optimal J−Policy. It is also interesting to note that in this example, the
optimal threshold η∗ for full system inspection is quite low for all three policies. This is
due to the fact that the cost of corrective maintenance CF = 110 is much higher than
the cost of system inspection CI = 55 and preventive maintenance CP = 55. Therefore,
it is more beneficial to perform full system inspection more frequently than to run
the system longer and risk costly corrective maintenance due to failure.
We next analyze the sensitivity of the optimal policy to different values of the sampling
cost CS. In light of Corollary 3.2.8, we already know that the optimal J−Policy coincides
with the optimal A−Policy when CS = 0, and with the optimal N−Policy when CS >
CI + CP = 110. We obtain the following results in Table 3.3.2.
Table 3.3.2: Optimal Expected Average Cost λ∗ for Varying Sampling Costs CS
CS N−Policy A−Policy J−Policy
0 25.65 19.63 19.63
2 25.65 21.01 20.61
4 25.65 22.39 21.70
6 25.65 23.77 22.38
8 25.65 25.15 23.01
10 25.65 26.53 23.72
12 25.65 27.91 24.23
14 25.65 29.29 24.71
16 25.65 30.67 25.08
18 25.65 32.05 25.37
20 25.65 33.43 25.58
22 25.65 34.81 25.62
24 25.65 36.19 25.65
Table 3.3.2 provides important managerial insight into the operational value of con-
dition monitoring information and technologies. This insight is best understood visually,
so we plot the optimal expected average costs of Table 3.3.2 below in Figure 3.3.3.
Figure 3.3.3: Graphical Illustration of the Optimal Expected Average Cost λ∗ for Varying Sampling Costs CS

The dashed horizontal line in Figure 3.3.3 is the expected average cost for the optimal
N−Policy for different values of the sampling cost CS. The dotted increasing line is the
expected average cost for the optimal A−Policy, and the solid increasing curve is the
expected average cost for the optimal J−Policy. Figure 3.3.3 shows that the optimal
J−Policy coincides with the optimal A−Policy when CS = 0, and with the optimal
N−Policy when CS ≥ 24.
The optimal A−Policy is better than the optimal N−Policy from CS = 0 to
around CS = 9, after which the optimal N−Policy is better than the optimal
A−Policy. This implies that once the sampling cost CS exceeds 9, it is better to never
take a sample and remain ignorant of the state of the system than it is to incur regular
condition monitoring sampling costs to get a better idea of the system state.
Although the optimal J−Policy is better than both the optimal N−Policy and the
optimal A−Policy for all values of CS, its benefits are greatest around CS = 9, i.e.,
near the point at which the optimal N−Policy overtakes the optimal A−Policy. On the
other hand, the benefits of using the optimal J−Policy become quite marginal when CS
is close to 0 or 24. This suggests that a manager is unlikely to be willing to invest in
condition monitoring technologies if the sampling cost CS is close to
24. Similarly, a manager should choose to sample the system at every decision epoch to
simplify the scheduling of sampling and maintenance activities if the sampling cost CS
is close to 0.
3.4 Conclusions and Future Research
In this chapter, a joint sampling and control problem under partial observations has been
considered. The problem has been formulated as a partially observable Markov decision
process. The objective was to characterize the form of the optimal sampling and main-
tenance policy that minimizes the long-run expected average cost per unit time. It was
shown that the optimal control policy can be represented as a control chart with three
critical thresholds, which monitors the posterior probability that the system is in a so-
called warning state. Such a control chart has direct practical value as it can be readily
implemented for online decision-making. Furthermore, since the monitored statistic and
three critical thresholds have straightforward and intuitive interpretation, decisions can
be easily justified and explained at a managerial level. It was also shown that the struc-
ture of the optimal policy allows practitioners to plan and schedule maintenance activities
into the future. A cost comparison with suboptimal policies was also carried out,
which illustrates the benefits of the joint optimization of sampling and control. It was
found that the jointly optimal sampling and maintenance policy performed substantially
better than existing suboptimal policies. Numerical results indicate that the advantage
of using the jointly optimal sampling and maintenance policy becomes less substantial
for both very small and large values of the sampling cost CS.
There are a number of exciting extensions and topics for future research. We have
considered a system that can be characterized by three distinguishable states: a healthy
state (state 1), a warning state (state 2), and a failure state (state 3). In practice, this
assumption is reasonable for many applications. As considered in Chapter 2, it may be
worth investigating how much additional value would be gained by considering the
general N > 3 state model. Such an extension would lead to both interesting theoretical
and practical challenges. The main challenge stems from the fact that the sufficient
statistic for decision-making is no longer a univariate statistic, as is the case for the
three state model. In fact, the sufficient statistic for decision-making would now be an
(N − 1)−dimensional vector representing the posterior probability distribution of the
system state. Thus, the optimal sampling and maintenance policy can no longer be
visualized and represented as a control chart.
The numerical results of Section 3.3 showed that our algorithm took over 6 minutes
to run. Although this is not unreasonably long, there is still much
room for improvement. In particular, a closer look at Theorem 3.2.6 reveals that the
result has further computational value. Recall that the original stochastic control problem
defined in (3.1.7) was transformed to an equivalent parameterized stochastic control prob-
lem (with parameter λ) with an additive objective function using the λ−minimization
technique. However, the characterization given in Theorem 3.2.6 implies that the op-
timal control policy is no longer parameterized by λ, and is completely determined by
the ordered triple (θL, θU , η). This is potentially a useful property from a computational
point of view, since it is possible to develop an algorithm that directly finds the optimal
values of (θ∗L, θ∗U, η∗) that minimize the original objective function defined in (3.1.7).
Such an algorithm would likely be faster than the algorithm presented in Section 3.3, as
one would now be solving a single optimization problem, as opposed to solving multiple
stochastic control problems for different values of λ.
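Such a direct algorithm could, for instance, search over the feasible ordered triples directly. A minimal sketch with a hypothetical stand-in for the average-cost evaluation (the `avg_cost` function below is a made-up toy, not the thesis's objective; in practice it would be computed by solving or simulating the controlled system):

```python
import itertools

def avg_cost(theta_L, theta_U, eta):
    """Hypothetical stand-in for the long-run expected average cost of the
    policy (theta_L, theta_U, eta); this toy formula is only a placeholder
    used to exercise the search."""
    return 19.6 + (theta_L - 0.2) ** 2 + (theta_U - 0.6) ** 2 + (eta - 0.8) ** 2

def direct_search(step=0.05):
    """Coarse grid search over ordered triples theta_L <= theta_U <= eta,
    optimizing the original objective directly instead of solving a family
    of lambda-parameterized problems."""
    grid = [round(i * step, 10) for i in range(int(1 / step) + 1)]
    best, best_cost = None, float("inf")
    for tL, tU, e in itertools.product(grid, repeat=3):
        if not tL <= tU <= e:      # respect the ordering of the thresholds
            continue
        cost = avg_cost(tL, tU, e)
        if cost < best_cost:
            best, best_cost = (tL, tU, e), cost
    return best, best_cost

best, cost = direct_search()
print(best)  # the grid point closest to the toy minimizer (0.2, 0.6, 0.8)
```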
Chapter 4
Parameter Estimation for Stochastically Failing Systems
In this chapter, we consider a parameter estimation problem for a partially observable
system subject to random failure. We assume that two types of data histories are avail-
able: histories that end with observable system failure, and censored data histories that
end when the system has been suspended from operation but has not failed. Given any
number of failure and suspension histories, our objective is to determine the maximum
likelihood estimates (MLEs) of the model parameters.
In recent years, considerable research has been done on the analysis and control of
maintenance models. Surprisingly, little research has been done on parameter estimation for
partially observable systems subject to random failure. Although some research has con-
sidered estimation for partially observable systems in the hidden Markov model (HMM)
framework, few researchers have considered the inclusion of failure information, which is
present in almost every maintenance application. For example, Ryden [64], Douc et al.
[22], Genon-Catalot and Laredo [25], and Hamilton [27] considered maximum likelihood
estimation for hidden Markov models in discrete time; however, the results of their papers
are not applicable to maintenance systems for which system failure is observable.
Chapter 4. Parameter Estimation for Stochastic Systems 61
Asmussen et al. [2] considered an estimation method using the EM algorithm for
phase-type distributions. Their paper has some similarities with the model considered
in this chapter. In particular, in both papers, the system state follows a continuous-time
homogeneous Markov chain, and the time to system failure is observable and follows
a phase-type distribution. However, since our model is for maintenance applications, we
also consider a stochastically related observation process that is sampled at equidistant
time points, which gives partial information about the system state. This additional level
of complexity was not considered by Asmussen et al. [2]. Roberts and Ephraim [61]
considered a parameter estimation problem for continuous-time Markov chains that is
partially observed through a discrete-time observation process. There are two distinct
differences between their model and ours. The first is that the observation process they
consider is a univariate process, whereas we consider multivariate observations. The sec-
ond difference is that Roberts and Ephraim’s [61] model of ion-channel currents does
not have the notion of observable failure information, a feature that is found in mainte-
nance applications. Recently, Ghasemi et al. [24] considered parameter estimation for
a maintenance model with partial observations. In their paper, the failure rate of the
system follows Cox’s proportional hazards model, whereas our failure time is governed
by a phase-type distribution. The authors assumed a discrete time hidden Markov model
with a univariate, finite-valued observation process, whereas our hidden state process is
a continuous-time process with a multivariate, Rd-valued observation process.
We have found through our work with diagnostic data such as spectrometric oil data
and vibration data that it is usually sufficient to consider only two operational states: a
healthy state and an unhealthy state. This is because in many cases, the system moves
through two distinct phases of operation. In the first and longer phase, the system
operates under normal conditions, and the observations behave in a stationary manner.
Although system degradation can be gradual, it is usually not until degradation has
exceeded a certain level that the behaviour of the condition monitoring observations
changes substantially. Furthermore, in many applications it may not be desirable to
define multiple intermediate degradation states if the objective is to run the system as
long as possible. This is because if the system is considered healthy or normal while
degradation is below a critical warning level, the decision maker will initiate a maintenance
action only once the system experiences severe degradation that can cause failure. At this
point, the system enters the second and shorter phase, which we define
to be the warning or unhealthy state. It will be shown that the estimation problem of
the three-state model considered in this chapter can be solved by directly analyzing the
structure of the pseudo likelihood function. We will show that both the pseudo likelihood
function and the parameter updates in each iteration of the EM algorithm have explicit
formulas. This implies that each iteration of the EM algorithm can be performed with a
single computation, which leads to an extremely fast and simple estimation procedure.
This computational advantage is particularly attractive for practical applications.
We should note that in certain applications, gradual system degradation can be mod-
eled more realistically using a general N-state extension of our model. However, as shown
in Lin and Makis [47], explicit update formulas in the EM algorithm are not readily avail-
able, which is one of the major advantages of using the three-state model considered in
this chapter. In particular, Lin and Makis [47] considered an interesting maintenance
model with finite-valued observations and failure information, similar to the model con-
sidered in this chapter. Their objective was to derive a general recursive filter, which
is important mainly for on-line re-estimation. The authors were able to express the pa-
rameter updates in each iteration of the EM algorithm in terms of the recursive filter.
However, such an approach has been found to be quite computationally intensive and
difficult to implement when working with real data sets.
The remainder of the chapter is organized as follows. In §4.1, we present the models
of the state and observation processes. In §4.2, we discuss maximum likelihood estima-
tion using the EM algorithm and the pseudo likelihood function. We derive an explicit
expression for the pseudo likelihood function and provide update formulas for both the
state and observation parameters. In §4.3, we develop a numerical example using real
multivariate spectrometric oil data coming from the failing transmission units of heavy
hauler trucks, which illustrates the entire estimation procedure. §4.4 provides concluding
remarks and future research directions.
4.1 Model Formulation
We assume that a technical system’s condition can be categorized into one of three states:
a healthy or “good as new” state (state 1), an unhealthy or warning state (state 2), and a
failure state (state 3). In many real world applications the state of an operational system
is unobservable, and only the failure state is observable. For example, the state of an
operational transmission unit in a heavy hauler truck cannot be observed without full
system inspection, which is typically quite costly. However, failure of the mechanical unit
is immediately observable. We model the state process (Xt : t ∈ R+) as a continuous-time
homogeneous Markov chain with state space X = {1, 2} ∪ {3}. The system is assumed
to start in a healthy state, i.e. P(X0 = 1) = 1, and the transition rate matrix is given by

Q = \begin{pmatrix} -(q_{12} + q_{13}) & q_{12} & q_{13} \\ 0 & -q_{23} & q_{23} \\ 0 & 0 & 0 \end{pmatrix}, (4.1.1)
where q12, q13, q23 ∈ (0,+∞) are the unknown state parameters. As in the previous
chapters, let ξ = inf{t ∈ R+ : Xt = 3} be the observable failure time of the system.
Suppose that at equidistant sampling times ∆, 2∆, . . ., with ∆ ∈ (0,+∞), vector data
Y1, Y2, . . . ∈ Rd are collected through condition monitoring, giving partial information
about the system state. The observations are assumed to be conditionally independent
given the state of the system, and for each n ∈ N, we assume that Yn conditional on Xn∆ = i,
i = 1, 2, has d-dimensional normal distribution Nd(µi, Σi) with density

f(y|i) = \frac{1}{\sqrt{(2\pi)^d \det(\Sigma_i)}} \exp\left( -\frac{1}{2} (y - \mu_i)' \Sigma_i^{-1} (y - \mu_i) \right), (4.1.2)
where µ1, µ2 ∈ Rd and Σ1,Σ2 ∈ Rd×d are the unknown observation process parameters.
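For concreteness, the density (4.1.2) can be evaluated as follows; the sketch below restricts to a diagonal covariance matrix to avoid a linear-algebra dependency (an assumption of the sketch, not of the model, which allows full covariance matrices):

```python
import math

def normal_logdensity_diag(y, mu, var):
    """Log of the density (4.1.2), restricted to a diagonal covariance
    matrix Sigma = diag(var)."""
    d = len(y)
    log_det = sum(math.log(v) for v in var)
    quad = sum((yi - mi) ** 2 / v for yi, mi, v in zip(y, mu, var))
    return -0.5 * (d * math.log(2 * math.pi) + log_det + quad)

# A d = 2 observation evaluated under illustrative state-1 parameters.
lp = normal_logdensity_diag([0.1, -0.2], mu=[0.0, 0.0], var=[1.0, 0.5])
```

With a diagonal covariance the joint log-density is simply the sum of the univariate normal log-densities of the coordinates.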
It is important to point out that the assumption of conditional independence is not
always reasonable in practice when the observations are highly autocorrelated. There are
essentially two main approaches that exist in the literature to deal with autocorrelation
in the data histories.
The first approach is to directly model autocorrelation in the observation process in
the hidden Markov framework. Such models are referred to in the literature as models
with a Markov regime (see e.g. Krishnamurthy and Yin [43], Hamilton [27]), Markov
switching (see e.g. Kim [37]), or Markov sources (see e.g. Liporace [48]). This approach
mathematically integrates the hidden Markov state process and the autocorrelation in
the data histories into a single model.
Kim et al. [41] analyzed a parameter estimation problem for this type of autoregressive
Markov switching model, where observations (Yn) are stochastically related to the state
process (Xt) via the equation
Y_n = \mu_{X_{n\Delta}} + \sum_{r=1}^{p} \Phi^{(r)}_{X_{n\Delta}} \left( Y_{n-r} - \mu_{X_{(n-r)\Delta}} \right) + A_{X_{n\Delta}} \varepsilon_{n\Delta}, (4.1.3)
where (εn∆) is a sequence of i.i.d. d-dimensional standard multivariate normal random
vectors, and µ1, µ2 ∈ Rd, A1, A2 ∈ Rd×d, and Φ(r)1 ,Φ(r)
2 ∈ Rd×d, r = 1, . . . , p, are unknown
model parameters that need to be estimated.
It was found that while such models are mathematically elegant and compact, they
have two severe limitations. First and foremost, the Markov property required for sub-
sequent optimization problems no longer holds, making these models algorithmically in-
tractable for optimal maintenance decision making. Thus, although such Markov switch-
ing models are able to incorporate the state-observation relationship, they are not very
useful for optimal decision making, which is the most important aspect of mathemati-
cal modeling in operations research and industrial engineering. For typical examples of
maintenance decision models that require the assumption of conditional independence in
the hidden Markov model see e.g. Makis and Jiang [51], Wu and Makis [78], and Yin
and Makis [81]. The second drawback of these models is that parameter estimation is
extremely computationally intensive. This is because no
closed-form analytical procedures are available for estimation. In particular, Kim et al.
[41] showed that explicit closed-form update formulae for the parameter estimates in each
iteration of the EM algorithm do not exist. As a result, numerical methods are required
to estimate the model parameters and computational time increases exponentially as the
number of data histories increases.
The second approach, which we adopt in this chapter, is to first pre-process the data
histories and remove as much of the autocorrelation as possible before proceeding to
hidden Markov modeling. The idea is to first decide on an initial approximation for the
healthy portions of the data histories and fit a time series model to the healthy data
portions. The residuals of the fitted model are then computed and formal statistical
tests for conditional independence can be performed. The residuals are then chosen as
the “observation” process in the hidden Markov framework under the assumption of
conditional independence.
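As a concrete illustration of this pre-processing step, the following sketch fits an AR(1) model to a hypothetical healthy portion of a univariate history by least squares and returns the residuals; real applications would use a multivariate time series model and formal diagnostic tests, so the data and model order here are illustrative assumptions only:

```python
def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t] on a portion of a
    data history presumed healthy; returns (c, phi)."""
    n = len(x) - 1
    xs, ys = x[:-1], x[1:]
    mx, my = sum(xs) / n, sum(ys) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
           / sum((a - mx) ** 2 for a in xs))
    return my - phi * mx, phi

def residuals(x, c, phi):
    """Residuals of the fitted AR(1) model; these play the role of the
    'observation' process in the hidden Markov framework."""
    return [b - (c + phi * a) for a, b in zip(x[:-1], x[1:])]

# Hypothetical healthy history generated from a known AR(1) model, with a
# small deterministic perturbation standing in for noise.
healthy = [0.0]
for t in range(1, 200):
    healthy.append(0.5 + 0.7 * healthy[-1] + ((-1) ** t) * 0.01)

c, phi = fit_ar1(healthy)      # fitted (c, phi) should be close to (0.5, 0.7)
res = residuals(healthy, c, phi)
```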
In contrast with the Markov switching approach, this approach does not have either of
the two aforementioned drawbacks. In particular, as we will demonstrate in Section 4.3,
computational time for parameter estimation using this method does not grow exponen-
tially with the number of data histories, and, the model constructed using this approach
can be readily used for subsequent maintenance decision making since the memoryless
property is preserved.
The approach of using the residuals of the fitted model has theoretical justification,
and has been successfully applied in a variety of statistical, scientific, and engineering
applications. For a theoretical justification for this approach, see for example Yang and
Makis [79]. In their paper, the authors proposed a general method for studying the resid-
ual behaviour of autocorrelated processes subject to a change from a healthy to unhealthy
system state. They proved that the residuals of the fitted model are conditionally inde-
pendent and normally distributed. For successful application of the residual approach
see e.g. Sohn and Farrar [69], Wang and Wong [75], Schneider and Frank [67] in fault
diagnosis, Baddeley et al. [5] in spatial point processes, Schoenberg [68] in earthquake
occurrences, Wang et al. [76] in vibration data analysis, among others. In Section 4.3,
we apply this approach on real diagnostic data coming from the spectrometric analysis
of failing transmission units.
4.2 Parameter Estimation Using the EM Algorithm
We begin this section by briefly reviewing the EM algorithm in the context of our model.
The EM algorithm, first introduced into the literature by Dempster et al. [19], has been
found to be well-suited for solving parameter estimation problems in the hidden Markov
framework. A comprehensive overview of the EM algorithm and its many applications
is given in McLachlan and Krishnan [55].
Suppose we have collected H ∈ N failure histories, which we denote as H1, . . . ,HH.
Failure history Hi is assumed to be of the form ~Yi = (y_{i1}, . . . , y_{iT_i}) and ξi = ti, where
Ti∆ < ti ≤ (Ti + 1)∆. The sampling history ~Yi represents the collection of all vector
data y_{ij} ∈ Rd, j ≤ Ti, obtained through condition monitoring until system
failure at time ti. Suppose further that we have collected K ∈ N suspension histories,
which we denote as S1, . . . ,SK. Suspension history Sj is assumed to be of the form
~Yj = (y_{j1}, . . . , y_{jT_j}) and ξj > Tj∆. Let O = {H1, . . . ,HH, S1, . . . ,SK} represent all
observable data and L(γ, θ|O) be the associated likelihood function, where γ = (q12, q13, q23) and
θ = (µ1, µ2, Σ1, Σ2) are the sets of unknown state and observation parameters. Because
the sample paths of the state process (Xt) are not observable, maximizing L (γ, θ|O)
analytically is not possible. The EM algorithm resolves this difficulty by iteratively
maximizing the so-called pseudo likelihood function. More specifically, the EM algorithm
works as follows. Let γ0, θ0 be some initial values of the unknown parameters.
E-STEP. For n ≥ 0, compute the pseudo likelihood function defined by

Q(\gamma, \theta \,|\, \gamma_n, \theta_n) := E_{\gamma_n, \theta_n} \left( \ln L(\gamma, \theta \,|\, C) \,\middle|\, O \right), (4.2.1)

where C = {H̃1, . . . , H̃H, S̃1, . . . , S̃K} represents the complete data set, in which each
failure history H̃i and suspension history S̃j of the observable data set O has been
augmented with the unobservable sample path information of the state process (Xt : t ∈ R+).

M-STEP. Choose γn+1, θn+1 such that

(\gamma_{n+1}, \theta_{n+1}) \in \arg\max_{\gamma, \theta} Q(\gamma, \theta \,|\, \gamma_n, \theta_n). (4.2.2)

The E and M steps are repeated until the Euclidean norm |(γn+1, θn+1) − (γn, θn)| < ε,
for ε > 0 small.
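The iteration above can be sketched generically. A minimal Python sketch, in which the E- and M-steps are folded into a single user-supplied update map; the contraction `toy_m_step` below is a made-up stand-in used only to exercise the loop and stopping rule, not the model's actual M-step (which is derived later in this section):

```python
import math

def em(params0, m_step, eps=1e-8, max_iter=1000):
    """Generic EM loop: iterate the (E- and) M-step until the Euclidean
    norm of the parameter change falls below eps."""
    params = params0
    for _ in range(max_iter):
        new_params = m_step(params)  # E-step and M-step combined in m_step
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_params, params)))
        params = new_params
        if delta < eps:
            break
    return params

# Toy stand-in M-step: a contraction toward (0.2, 0.05, 0.5), mimicking
# convergence of the state-parameter estimates (q12, q13, q23).
target = (0.2, 0.05, 0.5)
def toy_m_step(p):
    return tuple(pi + 0.5 * (ti - pi) for pi, ti in zip(p, target))

est = em((1.0, 1.0, 1.0), toy_m_step)
```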
We will show in Theorems 4.2.2 and 4.2.3 that (4.2.1) admits the decomposition
Q(γ, θ|γn, θn) = Q^{state}(γ|γn, θn) + Q^{obs}(θ|γn, θn), where Q^{state} depends only on the
state parameters γ = (q12, q13, q23) and Q^{obs} depends only on the observation parameters
θ = (µ1, µ2, Σ1, Σ2). This implies in particular that the M-step (4.2.2) can be carried out
separately for the state and observation parameters, which considerably simplifies the
algorithm and increases the speed of computation. It is important to note that, even
under very general conditions, the EM algorithm need not converge to the maximum
likelihood estimates; see e.g. Wu [77]. However, we have not encountered such problems
with our model, and, as illustrated in Section 4.3, our parameter estimates converge quite
rapidly.
4.2.1 Form of the Likelihood Function
In this subsection, we are interested in deriving an explicit formula for the likelihood
function L(γ, θ|C) in (4.2.1). Let τ1 = inf{t ∈ R+ : Xt > 1} be the unobservable sojourn
time of the state process in healthy state 1. From (4.1.1), it is clear that there is a
one-to-one correspondence between the entire sample path (Xt) of the system state and
the two random variables τ1 and ξ. The distributional properties of the sojourn time and
failure time are given by the following lemma.
Lemma 4.2.1. For each t ∈ R+, the density of ξ is given by

f_\xi(t) = p_{12} \frac{\upsilon_1 \upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} \upsilon_1 e^{-\upsilon_1 t}. (4.2.3)

For all non-negative s < t, the conditional density of τ1 given ξ is given by

f_{\tau_1|\xi}(s|t) = \frac{p_{12} \upsilon_2 e^{-\upsilon_2 t} e^{-(\upsilon_1 - \upsilon_2)s}}{\frac{p_{12}\upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} e^{-\upsilon_1 t}}, (4.2.4)

and for each t ∈ R+, the conditional probability P(τ1 = t|ξ = t) is given by

m_{\tau_1|\xi}(t|t) = \frac{p_{13} e^{-\upsilon_1 t}}{\frac{p_{12}\upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} e^{-\upsilon_1 t}}, (4.2.5)

where \upsilon_1 = q_{12} + q_{13}, \upsilon_2 = q_{23}, p_{12} = \frac{q_{12}}{q_{12}+q_{13}}, and p_{13} = \frac{q_{13}}{q_{12}+q_{13}}.
Proof. Let S1 = X_{τ1} be the state of the system at time τ1. Then for each t ∈ R+,

P(\xi \le t) = p_{12} P(\xi \le t \,|\, S_1 = 2) + p_{13} P(\xi \le t \,|\, S_1 = 3)
= p_{12} \int_{u=0}^{t} P(X_{t-u} = 3 \,|\, X_0 = 2)\, \upsilon_1 e^{-\upsilon_1 u}\, du + p_{13} \int_{u=0}^{t} 1 \cdot \upsilon_1 e^{-\upsilon_1 u}\, du
= p_{12} \left( 1 - e^{-\upsilon_1 t} \right) - p_{12} \frac{\upsilon_1}{\upsilon_1 - \upsilon_2} e^{-\upsilon_2 t} + p_{12} \frac{\upsilon_1}{\upsilon_1 - \upsilon_2} e^{-\upsilon_1 t} + p_{13} \left( 1 - e^{-\upsilon_1 t} \right),

which is differentiable in t so that the density of ξ is given by

f_\xi(t) := \frac{dP(\xi \le t)}{dt} = p_{12} \frac{\upsilon_1 \upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} \upsilon_1 e^{-\upsilon_1 t},

for all t ∈ R+, and zero otherwise. For all non-negative s < t,

P(\tau_1 \le s, \xi \le t) = p_{12} \int_{u=0}^{s} P(X_t = 3 \,|\, X_u = 2)\, \upsilon_1 e^{-\upsilon_1 u}\, du + p_{13} \int_{u=0}^{s} 1 \cdot \upsilon_1 e^{-\upsilon_1 u}\, du
= p_{12} \left( 1 - e^{-\upsilon_1 s} \right) - p_{12} \frac{\upsilon_1}{\upsilon_1 - \upsilon_2} e^{-\upsilon_2 t} + p_{12} \frac{\upsilon_1}{\upsilon_1 - \upsilon_2} e^{-\upsilon_2 t} e^{-(\upsilon_1 - \upsilon_2)s} + p_{13} \left( 1 - e^{-\upsilon_1 s} \right),

which is differentiable in both variables so that the joint density of (τ1, ξ) for all non-negative s < t is given by

f_{\tau_1,\xi}(s,t) := \frac{\partial^2 P(\tau_1 \le s, \xi \le t)}{\partial s\, \partial t} = p_{12} \upsilon_1 \upsilon_2 e^{-\upsilon_2 t} e^{-(\upsilon_1 - \upsilon_2)s},

and for all non-negative s < t, we define the density function

f_{\tau_1|\xi}(s|t) := \frac{f_{\tau_1,\xi}(s,t)}{f_\xi(t)} = \frac{p_{12} \upsilon_2 e^{-\upsilon_2 t} e^{-(\upsilon_1 - \upsilon_2)s}}{\frac{p_{12}\upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} e^{-\upsilon_1 t}}.

For s = t, we define the probability mass function

m_{\tau_1|\xi}(t|t) := P(\tau_1 = t \,|\, \xi = t) = 1 - \int_{s<t} f_{\tau_1|\xi}(s|t)\, ds = \frac{p_{13} e^{-\upsilon_1 t}}{\frac{p_{12}\upsilon_2}{\upsilon_1 - \upsilon_2} \left( e^{-\upsilon_2 t} - e^{-\upsilon_1 t} \right) + p_{13} e^{-\upsilon_1 t}},

which completes the proof.
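The identities of Lemma 4.2.1 can be sanity-checked numerically: the density (4.2.3) should integrate to one, and the conditional density (4.2.4) together with the point mass (4.2.5) should account for all conditional probability. A quick sketch with illustrative rates (the values of q12, q13, q23 are made up for the check):

```python
import math

def f_xi(t, q12, q13, q23):
    """Failure-time density (4.2.3)."""
    u1, u2 = q12 + q13, q23
    p12, p13 = q12 / u1, q13 / u1
    return (p12 * u1 * u2 / (u1 - u2) * (math.exp(-u2 * t) - math.exp(-u1 * t))
            + p13 * u1 * math.exp(-u1 * t))

def f_tau_given_xi(s, t, q12, q13, q23):
    """Conditional density (4.2.4) of tau1 given xi = t."""
    u1, u2 = q12 + q13, q23
    p12, p13 = q12 / u1, q13 / u1
    den = (p12 * u2 / (u1 - u2) * (math.exp(-u2 * t) - math.exp(-u1 * t))
           + p13 * math.exp(-u1 * t))
    return p12 * u2 * math.exp(-u2 * t) * math.exp(-(u1 - u2) * s) / den

def m_tau_given_xi(t, q12, q13, q23):
    """Point mass (4.2.5) at tau1 = t."""
    u1, u2 = q12 + q13, q23
    p12, p13 = q12 / u1, q13 / u1
    den = (p12 * u2 / (u1 - u2) * (math.exp(-u2 * t) - math.exp(-u1 * t))
           + p13 * math.exp(-u1 * t))
    return p13 * math.exp(-u1 * t) / den

# Illustrative (made-up) rates.
q12, q13, q23 = 0.2, 0.05, 0.5
h = 0.001

# (4.2.3) integrates to one over [0, infinity).
total = h * sum(f_xi(k * h, q12, q13, q23) for k in range(1, 100000))

# (4.2.4) and (4.2.5) together carry all conditional probability mass.
t = 3.0
mass = (h * sum(f_tau_given_xi(k * h, t, q12, q13, q23)
                for k in range(1, int(t / h)))
        + m_tau_given_xi(t, q12, q13, q23))
```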
Before we derive the formula for the likelihood function L(γ, θ|C) in the general case
of H observed failure histories and K suspension histories, we first consider the case with
a single failure history H, i.e. we have collected data ~Y = (y1, . . . , yT) and the system
is known to have failed at time ξ = t, where T∆ < t ≤ (T + 1)∆. Since the observable
data set O = {H} and the complete data set C = {H̃}, we denote the likelihood function
L(γ, θ|C) as L_H(γ, θ).

Since τ1 and ξ are sufficient for characterizing the sample paths of the state process,
equations (4.2.3) - (4.2.5) imply that the likelihood function L_H(γ, θ) is given by

L_H(\gamma, \theta) =
\begin{cases}
g_{\vec Y|\xi,\tau_1}(\vec y|t, \tau_1)\, f_{\tau_1|\xi}(\tau_1|t)\, f_\xi(t), & \tau_1 < t, \\
g_{\vec Y|\xi,\tau_1}(\vec y|t, t)\, m_{\tau_1|\xi}(t|t)\, f_\xi(t), & \tau_1 = t,
\end{cases} (4.2.6)
where g~Y|ξ,τ1(~y|t, s) is the conditional density of the observation process ~Y = (Y1, . . . , YT )
given ξ = t and τ1 = s ≤ t, which can be expressed in an explicit form. For any
s ∈ ((k − 1)∆, k∆], k = 1, . . . , T , equation (4.1.2) implies that g~Y|ξ,τ1(~y|t, s) is given by
g_{\vec Y|\xi,\tau_1}(\vec y|t, s) = g_{\vec Y|\xi,\tau_1}(\vec y|t, k\Delta)
= \frac{\exp\left( -\frac{1}{2} \sum_{n=1}^{k-1} (y_n - \mu_1)' \Sigma_1^{-1} (y_n - \mu_1) - \frac{1}{2} \sum_{n=k}^{T} (y_n - \mu_2)' \Sigma_2^{-1} (y_n - \mu_2) \right)}{\sqrt{(2\pi)^{Td} \det^{k-1}(\Sigma_1) \det^{T-k+1}(\Sigma_2)}}, (4.2.7)

and for any s > T∆, g_{\vec Y|\xi,\tau_1}(\vec y|t, s) is given by

g_{\vec Y|\xi,\tau_1}(\vec y|t, s) = g_{\vec Y|\xi,\tau_1}(\vec y|t, t)
= \frac{\exp\left( -\frac{1}{2} \sum_{n=1}^{T} (y_n - \mu_1)' \Sigma_1^{-1} (y_n - \mu_1) \right)}{\sqrt{(2\pi)^{Td} \det^{T}(\Sigma_1)}}. (4.2.8)
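For scalar observations (d = 1), equations (4.2.7) and (4.2.8) reduce to sums of univariate normal log-densities indexed by the change point k. A sketch (the data values and parameters below are illustrative, not from the thesis):

```python
import math

def log_g(y, k, mu1, var1, mu2, var2):
    """Log of (4.2.7) for scalar observations (d = 1): observations with
    index < k come from state 1, the rest from state 2. Taking
    k = len(y) + 1 gives the all-healthy case (4.2.8)."""
    def lognorm(x, mu, var):
        return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return (sum(lognorm(x, mu1, var1) for x in y[:k - 1])
            + sum(lognorm(x, mu2, var2) for x in y[k - 1:]))

# Illustrative history: three points near mu1 = 0, then two near mu2 = 2.
y = [0.1, -0.2, 0.05, 2.1, 1.9]
scores = [log_g(y, k, 0.0, 1.0, 2.0, 1.0) for k in range(1, len(y) + 2)]
best_k = max(range(len(scores)), key=scores.__getitem__) + 1  # change at k = 4
```

The score is largest when the split matches where the data actually shift from the healthy mean to the unhealthy mean.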
We next consider the case where we have observed only a single suspension history S,
i.e. we have collected data ~Y = (y1, . . . , yT) and stopped observing the operating system
at time T∆. Since the observable data set O = {S} and the complete data set C = {S̃},
in this case we denote the likelihood function L(γ, θ|C) as L_S(γ, θ). For each s, t ∈ R+,
it is not difficult to see that the conditional reliability function of ξ given τ1 is given by

h(t|s) := P(\xi > t \,|\, \tau_1 = s) =
\begin{cases}
p_{12} e^{-\upsilon_2 (t - s)}, & t \ge s, \\
1, & t < s.
\end{cases} (4.2.9)

Furthermore, it is well known that the density function of the unobservable sojourn time
τ1 is given by

f_{\tau_1}(s) =
\begin{cases}
\upsilon_1 e^{-\upsilon_1 s}, & s \ge 0, \\
0, & s < 0.
\end{cases} (4.2.10)

Then equations (4.2.7) - (4.2.10) imply that the likelihood function L_S(γ, θ) is given by

L_S(\gamma, \theta) = g_{\vec Y|\xi,\tau_1}(\vec y|t, \tau_1)\, h(t|\tau_1)\, f_{\tau_1}(\tau_1). (4.2.11)
Thus, for the general case in which we have observed H independent failure histories
H1, . . . ,HH and K independent suspension histories S1, . . . ,SK, the likelihood function
is given by

L(\gamma, \theta \,|\, C) = \prod_{i=1}^{H} L_{H_i}(\gamma, \theta) \prod_{j=1}^{K} L_{S_j}(\gamma, \theta), (4.2.12)

where the likelihood functions for the individual failure and suspension histories are given
by equations (4.2.6) and (4.2.11), respectively.
4.2.2 Form of the Pseudo Likelihood
In this subsection, we are interested in carrying out the E-step of the EM algorithm, i.e.
deriving the pseudo likelihood by taking the conditional expectation of the logarithm of the likelihood function given
by (4.2.12). As in the previous subsection, we first analyze the case in which we have
observed only a single failure history H of the form ~Y = (y1, . . . , yT ) and ξ = t, where
T∆ < t ≤ (T + 1)∆. Thus, for any fixed estimates \bar\gamma, \bar\theta of the state and observation
parameters, we are interested in deriving a formula for the pseudo likelihood function
Q_H(\gamma, \theta \,|\, \bar\gamma, \bar\theta) = E_{\bar\gamma,\bar\theta}(\ln L_H(\gamma, \theta) \,|\, H), where the likelihood function L_H(\gamma, \theta) is given in
(4.2.6).

To simplify notation, for the remainder of the chapter we denote \vec\gamma = (q_{12}, q_{13}, q_{23})'
and \mathbf{g} = (g_{\vec Y|\xi,\tau_1}(\vec y|t, \Delta), \ldots, g_{\vec Y|\xi,\tau_1}(\vec y|t, T\Delta), g_{\vec Y|\xi,\tau_1}(\vec y|t, t))'. Also, for any vector
v = (v_1, \ldots, v_n)', we denote \ln v := (\ln v_1, \ldots, \ln v_n)', and the inner product \langle v, w \rangle := v'w.
Theorem 4.2.2. Given a single failure history H, the pseudo likelihood function has the
following decomposition

Q_H(\gamma, \theta \,|\, \bar\gamma, \bar\theta) = Q^{state}_H(\gamma \,|\, \bar\gamma, \bar\theta) + Q^{obs}_H(\theta \,|\, \bar\gamma, \bar\theta), (4.2.13)

where

Q^{state}_H(\gamma \,|\, \bar\gamma, \bar\theta) = \langle a, \vec\gamma \rangle + \langle b, \ln \vec\gamma \rangle, \qquad Q^{obs}_H(\theta \,|\, \bar\gamma, \bar\theta) = \langle c, \ln \mathbf{g} \rangle, (4.2.14)

for vectors a, b, and c that depend only on the fixed estimates \bar\gamma, \bar\theta.
Proof. Using equations (4.2.3) - (4.2.5) of Lemma 4.2.1 and the formula for the likelihood
function L_H(\gamma, \theta) given by (4.2.6),

Q_H(\gamma, \theta \,|\, \bar\gamma, \bar\theta) = E_{\bar\gamma,\bar\theta}(\ln L_H(\gamma, \theta) \,|\, H) = E_{\bar\gamma,\bar\theta}(\ln L_H(\gamma, \theta) \,|\, \vec Y = \vec y, \xi = t)

= \frac{\int_{s<t} \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,s) f_{\tau_1|\xi}(s|t) f_\xi(t) \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,s) \bar f_{\tau_1|\xi}(s|t)\, ds + \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,t) m_{\tau_1|\xi}(t|t) f_\xi(t) \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)}{\int_{u<t} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,u) \bar f_{\tau_1|\xi}(u|t)\, du + \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)},

where the bar notation \bar g_{\vec Y|\xi,\tau_1}, \bar f_{\tau_1|\xi}, \bar m_{\tau_1|\xi} is used to signify that the functions
g_{\vec Y|\xi,\tau_1}, f_{\tau_1|\xi}, m_{\tau_1|\xi} are parameterized by the fixed estimates \bar\gamma, \bar\theta. Since g_{\vec Y|\xi,\tau_1}
defined in (4.2.7) and (4.2.8) depends only on the observation parameters \theta = (\mu_1, \mu_2, \Sigma_1, \Sigma_2),
and f_\xi, f_{\tau_1|\xi}, and m_{\tau_1|\xi} depend only on the state parameters \gamma = (q_{12}, q_{13}, q_{23}),
the equation above can be decomposed into two terms Q_H(\gamma, \theta \,|\, \bar\gamma, \bar\theta) =
Q^{state}_H(\gamma \,|\, \bar\gamma, \bar\theta) + Q^{obs}_H(\theta \,|\, \bar\gamma, \bar\theta). Substituting equations (4.2.3) - (4.2.5) of
Lemma 4.2.1, the first term Q^{state}_H simplifies to

Q^{state}_H(\gamma \,|\, \bar\gamma, \bar\theta) = \frac{\int_{s<t} \ln\left( q_{12} q_{23} e^{-q_{23}t} e^{-(q_{12}+q_{13}-q_{23})s} \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,s) \bar f_{\tau_1|\xi}(s|t)\, ds + \ln\left( q_{13} e^{-(q_{12}+q_{13})t} \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)}{\int_{u<t} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,u) \bar f_{\tau_1|\xi}(u|t)\, du + \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)}

=: a_{12} q_{12} + a_{13} q_{13} + a_{23} q_{23} + b_{12} \ln q_{12} + b_{13} \ln q_{13} + b_{23} \ln q_{23},

where the constants, which depend only on the fixed parameter estimates \bar\gamma, \bar\theta, are given by

a_{12} = a_{13} = -\frac{\bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t}}{\bar d} \langle e_2, \bar{\mathbf g} \rangle - \frac{t \bar p_{13} e^{-\bar\upsilon_1 t}}{\bar d} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t),
a_{23} = \frac{\bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t}}{\bar d} \left( \langle e_2, \bar{\mathbf g} \rangle - t \langle e_1, \bar{\mathbf g} \rangle \right),
b_{12} = b_{23} = \frac{\bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t}}{\bar d} \langle e_1, \bar{\mathbf g} \rangle, (4.2.15)
b_{13} = \frac{\bar p_{13} e^{-\bar\upsilon_1 t}}{\bar d} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t),
\bar d = \bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t} \langle e_1, \bar{\mathbf g} \rangle + \bar p_{13} e^{-\bar\upsilon_1 t} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t),

and vectors e_1 = (e_1^1, \ldots, e_1^T, e_1^t)' and e_2 = (e_2^1, \ldots, e_2^T, e_2^t)' are defined by

e_1^k = \frac{e^{-(\bar\upsilon_1-\bar\upsilon_2)(k-1)\Delta} - e^{-(\bar\upsilon_1-\bar\upsilon_2)k\Delta}}{\bar\upsilon_1 - \bar\upsilon_2}, \quad k = 1, \ldots, T,
e_1^t = \frac{e^{-(\bar\upsilon_1-\bar\upsilon_2)T\Delta} - e^{-(\bar\upsilon_1-\bar\upsilon_2)t}}{\bar\upsilon_1 - \bar\upsilon_2},
e_2^k = \frac{e_1^k - k\Delta e^{-(\bar\upsilon_1-\bar\upsilon_2)k\Delta} + (k-1)\Delta e^{-(\bar\upsilon_1-\bar\upsilon_2)(k-1)\Delta}}{\bar\upsilon_1 - \bar\upsilon_2}, \quad k = 1, \ldots, T, (4.2.16)
e_2^t = \frac{e_1^t - t e^{-(\bar\upsilon_1-\bar\upsilon_2)t} + T\Delta e^{-(\bar\upsilon_1-\bar\upsilon_2)T\Delta}}{\bar\upsilon_1 - \bar\upsilon_2}.

Similarly, the second term Q^{obs}_H, which is a function only of the observation parameters \theta, simplifies to

Q^{obs}_H(\theta \,|\, \bar\gamma, \bar\theta) = \frac{\int_{s<t} \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,s) \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,s) \bar f_{\tau_1|\xi}(s|t)\, ds + \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \right) \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)}{\int_{u<t} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,u) \bar f_{\tau_1|\xi}(u|t)\, du + \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \bar m_{\tau_1|\xi}(t|t)}

=: \sum_{k=1}^{T} c_k \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,k\Delta) \right) + c_t \ln\left( g_{\vec Y|\xi,\tau_1}(\vec y|t,t) \right),

where the constants, which depend only on \bar\gamma, \bar\theta, are given by

c_k = \frac{\bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t} e_1^k}{\bar d} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,k\Delta), \quad k = 1, \ldots, T,
c_t = \frac{\bar p_{12} \bar\upsilon_2 e^{-\bar\upsilon_2 t} e_1^t + \bar p_{13} e^{-\bar\upsilon_1 t}}{\bar d} \bar g_{\vec Y|\xi,\tau_1}(\vec y|t,t). (4.2.17)

To finish the proof, put a = (a_{12}, a_{13}, a_{23})', b = (b_{12}, b_{13}, b_{23})', and c = (c_1, \ldots, c_T, c_t)'.
It is interesting to note that the quantities appearing in Theorem 4.2.2 can be given
a probabilistic interpretation. In particular, by inspecting the proof of Theorem 4.2.2, it
follows that the quantities a = (a12, a13, a23)', b = (b12, b13, b23)', and c = (c1, . . . , cT, ct)' have
the following probabilistic interpretations:

• −a12 and −a13 equal the conditional expectation of the sojourn time of the system
in the healthy state 1 given ~Y = ~y and ξ = t.

• −a23 equals the conditional expectation of the sojourn time of the system in the
unhealthy state 2 given ~Y = ~y and ξ = t.

• b12 and b23 equal the conditional probability that the sojourn time τ1 is less than t
given ~Y = ~y and ξ = t.

• b13 and ct equal the conditional probability that the sojourn time τ1 is equal to t
given ~Y = ~y and ξ = t.

• For each k = 1, . . . , T, ck equals the conditional probability that the sojourn time τ1
is in the interval [(k − 1)∆, k∆) given ~Y = ~y and ξ = t.
We next analyze the case in which we have observed only a single suspension history
S of the form ~Y = (y1, . . . , yT) and ξ > T∆. That is, for any fixed estimates \bar\gamma, \bar\theta
of the state and observation parameters, we are interested in deriving a formula for
the pseudo likelihood function Q_S(\gamma, \theta \,|\, \bar\gamma, \bar\theta) = E_{\bar\gamma,\bar\theta}(\ln L_S(\gamma, \theta) \,|\, S), where the
likelihood function L_S(\gamma, \theta) is given in (4.2.11).
Theorem 4.2.3. Given a single suspension history S, the pseudo likelihood function has
the following decomposition

Q_S(\gamma, \theta \,|\, \bar\gamma, \bar\theta) = Q^{state}_S(\gamma \,|\, \bar\gamma, \bar\theta) + Q^{obs}_S(\theta \,|\, \bar\gamma, \bar\theta), (4.2.18)

where

Q^{state}_S(\gamma \,|\, \bar\gamma, \bar\theta) = \langle \vec\alpha, \vec\gamma \rangle + \varphi_1 \ln(q_{12}) + \varphi_2 \ln(q_{12} + q_{13}), \qquad Q^{obs}_S(\theta \,|\, \bar\gamma, \bar\theta) = \langle \vec\beta, \ln \mathbf{g} \rangle, (4.2.19)

for constants \vec\alpha, \vec\beta, \varphi_1 and \varphi_2 that depend only on the fixed estimates \bar\gamma, \bar\theta.
Proof. To simplify notation, we put $t := T\Delta$ in the proof; hats denote quantities evaluated at the fixed estimates $\hat{\gamma}, \hat{\theta}$. Using equations (4.2.9) and (4.2.10) and the formula for the likelihood function $L_S(\gamma, \theta)$ given by (4.2.11),
$$Q_S(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}) = E_{\hat{\gamma},\hat{\theta}}\big(\ln L_S(\gamma, \theta) \mid S\big) = E_{\hat{\gamma},\hat{\theta}}\big(\ln L_S(\gamma, \theta) \mid \vec{Y} = \vec{y},\, \xi > t\big) = \frac{\int \ln\big(g_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, h(t \mid s)\, f_{\tau_1}(s)\big)\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, \hat{h}(t \mid s)\, \hat{f}_{\tau_1}(s)\, ds}{\int \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, u)\, \hat{h}(t \mid u)\, \hat{f}_{\tau_1}(u)\, du}.$$
Since $h$ and $f_{\tau_1}$ defined in (4.2.9) and (4.2.10) depend only on the state parameters $\gamma$, the equation above can be decomposed into two terms $Q_S(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}) = Q^{state}_S(\gamma \mid \hat{\gamma}, \hat{\theta}) + Q^{obs}_S(\theta \mid \hat{\gamma}, \hat{\theta})$, where the first term $Q^{state}_S$ depends only on $\gamma$ and the second term $Q^{obs}_S$ depends only on $\theta$. Substituting equations (4.2.9) and (4.2.10), the first term $Q^{state}_S$ simplifies to
$$Q^{state}_S(\gamma \mid \hat{\gamma}, \hat{\theta}) = \frac{\int \ln\big(h(t \mid s)\, f_{\tau_1}(s)\big)\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, \hat{h}(t \mid s)\, \hat{f}_{\tau_1}(s)\, ds}{\int \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, u)\, \hat{h}(t \mid u)\, \hat{f}_{\tau_1}(u)\, du}$$
$$= \frac{\int_{s \leq t} \ln\big(q_{12}\, e^{-q_{23} t}\, e^{-(q_{12}+q_{13}-q_{23})s}\big)\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, \hat{h}(t \mid s)\, \hat{f}_{\tau_1}(s)\, ds + \int_{s > t} \ln\big((q_{12}+q_{13})\, e^{-(q_{12}+q_{13})s}\big)\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, \hat{h}(t \mid s)\, \hat{f}_{\tau_1}(s)\, ds}{\int \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, u)\, \hat{h}(t \mid u)\, \hat{f}_{\tau_1}(u)\, du}$$
$$=: \alpha_{12} q_{12} + \alpha_{13} q_{13} + \alpha_{23} q_{23} + \varphi_1 \ln(q_{12}) + \varphi_2 \ln(q_{12} + q_{13}),$$
where the constants, which depend only on the fixed parameter estimates $\hat{\gamma}, \hat{\theta}$, are given by
$$\alpha_{12} = \alpha_{13} = -\frac{\hat{q}_{12} e^{-\hat{\upsilon}_2 t}}{\delta} \langle e_2, \hat{g} \rangle - \frac{(t + \hat{\upsilon}_1^{-1})\, e^{-\hat{\upsilon}_1 t}}{\delta}\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, t), \qquad \alpha_{23} = \frac{\hat{q}_{12} e^{-\hat{\upsilon}_2 t}}{\delta} \big(\langle e_2, \hat{g} \rangle - t \langle e_1, \hat{g} \rangle\big),$$
$$\varphi_1 = \frac{\hat{q}_{12} e^{-\hat{\upsilon}_2 t}}{\delta} \langle e_1, \hat{g} \rangle, \qquad \varphi_2 = \frac{e^{-\hat{\upsilon}_1 t}}{\delta}\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, t), \qquad \delta = \hat{q}_{12} e^{-\hat{\upsilon}_2 t} \langle e_1, \hat{g} \rangle + e^{-\hat{\upsilon}_1 t}\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, t), \qquad (4.2.20)$$
and the vectors $e_1$ and $e_2$ are defined in (4.2.16). Similarly, the second term $Q^{obs}_S$, which is a function only of the observation parameters $\theta$, simplifies to
$$Q^{obs}_S(\theta \mid \hat{\gamma}, \hat{\theta}) = \frac{\int \ln\big(g_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\big)\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, s)\, \hat{h}(t \mid s)\, \hat{f}_{\tau_1}(s)\, ds}{\int \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, u)\, \hat{h}(t \mid u)\, \hat{f}_{\tau_1}(u)\, du} =: \sum_{k=1}^{T} \beta_k \ln\big(g_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, k\Delta)\big) + \beta_t \ln\big(g_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, t)\big),$$
where the constants, which depend only on $\hat{\gamma}, \hat{\theta}$, are given by
$$\beta_k = \frac{\hat{q}_{12} e^{-\hat{\upsilon}_2 t} e_1^k}{\delta}\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, k\Delta), \quad k = 1, \dots, T, \qquad \beta_t = \frac{\hat{q}_{12} e^{-\hat{\upsilon}_2 t} e_1^T + e^{-\hat{\upsilon}_1 t}}{\delta}\, \hat{g}_{\vec{Y}|\xi,\tau_1}(\vec{y} \mid t, t). \qquad (4.2.21)$$
To finish the proof, put $\vec{\alpha} = (\alpha_{12}, \alpha_{13}, \alpha_{23})'$ and $\vec{\beta} = (\beta_1, \dots, \beta_T, \beta_t)'$.
As in the case of Theorem 4.2.2, the quantities appearing in Theorem 4.2.3 can be
given similar probabilistic interpretations.
Finally, for the general case in which we have observed $H$ independent failure histories $H_1, \dots, H_H$ and $K$ independent suspension histories $S_1, \dots, S_K$, Theorems 4.2.2 and 4.2.3 and equation (4.2.12) imply that the pseudo likelihood function is given by
$$Q(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}) = E_{\hat{\gamma},\hat{\theta}}\big(\ln L(\gamma, \theta \mid C) \mid O\big) = E_{\hat{\gamma},\hat{\theta}}\Big(\ln \Big(\prod_{i=1}^{H} L_{H_i}(\gamma, \theta) \prod_{j=1}^{K} L_{S_j}(\gamma, \theta)\Big) \,\Big|\, O\Big)$$
$$= \sum_{i=1}^{H} E_{\hat{\gamma},\hat{\theta}}\big(\ln L_{H_i}(\gamma, \theta) \mid H_i\big) + \sum_{j=1}^{K} E_{\hat{\gamma},\hat{\theta}}\big(\ln L_{S_j}(\gamma, \theta) \mid S_j\big) = \sum_{i=1}^{H} Q_{H_i}(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}) + \sum_{j=1}^{K} Q_{S_j}(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}). \qquad (4.2.22)$$
Thus, to evaluate the pseudo likelihood function for all available histories, it suffices to evaluate the pseudo likelihood function for individual failure and suspension histories separately. Equation (4.2.22) completes the E-step of the EM algorithm. In the next subsection, we solve the M-step of the EM algorithm and derive explicit parameter update formulas for the maximizers of the pseudo likelihood function defined in (4.2.22).
4.2.3 Maximization of the Pseudo Likelihood Function
In this subsection we are interested in finding maximizers of the pseudo likelihood function defined in (4.2.22). By Theorems 4.2.2 and 4.2.3, the pseudo likelihood function can be decomposed as $Q(\gamma, \theta \mid \hat{\gamma}, \hat{\theta}) = Q^{state}(\gamma \mid \hat{\gamma}, \hat{\theta}) + Q^{obs}(\theta \mid \hat{\gamma}, \hat{\theta})$, where $Q^{state}$ is a function only of the state parameters $\gamma = (q_{12}, q_{13}, q_{23})$ and $Q^{obs}$ is a function only of the observation parameters $\theta = (\mu_1, \mu_2, \Sigma_1, \Sigma_2)$. This means that the M-step can be carried out separately for the state and observation parameters. Using equation (4.2.22) and Theorems 4.2.2 and 4.2.3, we solve for the stationary points of the state parameters $\gamma = (q_{12}, q_{13}, q_{23})$. After some algebra, it is not difficult to check that there is a unique stationary point $\gamma^* = (q^*_{12}, q^*_{13}, q^*_{23})$ of the pseudo likelihood function, given explicitly by
$$q^*_{12} = -\frac{\displaystyle \sum_{i=1}^{H} b^i_{12} + \sum_{j=1}^{K} \varphi^j_1 + \Big(\sum_{j=1}^{K} \varphi^j_2\Big)\, \frac{\sum_{i=1}^{H} b^i_{12} + \sum_{j=1}^{K} \varphi^j_1}{\sum_{i=1}^{H} b^i_{12} + \sum_{j=1}^{K} \varphi^j_1 + \sum_{i=1}^{H} b^i_{13}}}{\displaystyle \sum_{i=1}^{H} a^i_{12} + \sum_{j=1}^{K} \alpha^j_{12}},$$
$$q^*_{13} = q^*_{12}\, \frac{\sum_{i=1}^{H} b^i_{13}}{\sum_{i=1}^{H} b^i_{12} + \sum_{j=1}^{K} \varphi^j_1}, \qquad q^*_{23} = -\frac{\sum_{i=1}^{H} b^i_{23}}{\sum_{i=1}^{H} a^i_{23} + \sum_{j=1}^{K} \alpha^j_{23}}, \qquad (4.2.23)$$
where the constants $a^i = (a^i_{12}, a^i_{13}, a^i_{23})$, $b^i = (b^i_{12}, b^i_{13}, b^i_{23})$, $\vec{\alpha}^j = (\alpha^j_{12}, \alpha^j_{13}, \alpha^j_{23})$, $\varphi^j_1$, and $\varphi^j_2$ are given in equations (4.2.15) and (4.2.20). Similarly, using equations (4.2.7) and (4.2.8), it follows that there is a unique stationary point of the observation parameters $\theta^* = (\mu^*_1, \mu^*_2, \Sigma^*_1, \Sigma^*_2)$, given explicitly by
$$\mu^*_1 = \frac{\sum_{i=1}^{H} n^i_1 \cdot c^i + \sum_{j=1}^{K} n^j_1 \cdot \vec{\beta}^j}{\sum_{i=1}^{H} \langle c^i, d^i_1 \rangle + \sum_{j=1}^{K} \langle \vec{\beta}^j, d^j_1 \rangle}, \qquad \Sigma^*_1 = \frac{\sum_{i=1}^{H} n^i_3 \cdot c^i + \sum_{j=1}^{K} n^j_3 \cdot \vec{\beta}^j}{\sum_{i=1}^{H} \langle c^i, d^i_1 \rangle + \sum_{j=1}^{K} \langle \vec{\beta}^j, d^j_1 \rangle},$$
$$\mu^*_2 = \frac{\sum_{i=1}^{H} n^i_2 \cdot c^i + \sum_{j=1}^{K} n^j_2 \cdot \vec{\beta}^j}{\sum_{i=1}^{H} \langle c^i, d^i_2 \rangle + \sum_{j=1}^{K} \langle \vec{\beta}^j, d^j_2 \rangle}, \qquad \Sigma^*_2 = \frac{\sum_{i=1}^{H} n^i_4 \cdot c^i + \sum_{j=1}^{K} n^j_4 \cdot \vec{\beta}^j}{\sum_{i=1}^{H} \langle c^i, d^i_2 \rangle + \sum_{j=1}^{K} \langle \vec{\beta}^j, d^j_2 \rangle}, \qquad (4.2.24)$$
where the vectors are
$$n_1 = \Big(0, \sum_{n \leq 1} y_n, \dots, \sum_{n \leq T} y_n\Big), \qquad n_2 = \Big(\sum_{n \geq 1} y_n, \sum_{n \geq 2} y_n, \dots, y_T, 0\Big),$$
$$n_3 = \Big(0, \sum_{n \leq 1} (y_n - \mu^*_1)(y_n - \mu^*_1)', \dots, \sum_{n \leq T} (y_n - \mu^*_1)(y_n - \mu^*_1)'\Big),$$
$$n_4 = \Big(\sum_{n \geq 1} (y_n - \mu^*_2)(y_n - \mu^*_2)', \dots, (y_T - \mu^*_2)(y_T - \mu^*_2)', 0\Big),$$
$$d_1 = (0, 1, \dots, T)', \qquad d_2 = (T, T-1, \dots, 1, 0)',$$
and the constants $c^i = (c^i_1, \dots, c^i_{T_i}, c^i_{t_i})$ and $\vec{\beta}^j = (\beta^j_1, \dots, \beta^j_{T_j}, \beta^j_{t_j})$ are given in equations (4.2.17) and (4.2.21). This completes the M-step of the EM algorithm.
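As a minimal sketch, the state-parameter update (4.2.23) amounts to a handful of sums. The per-history constants below are hypothetical stand-ins for the quantities defined in (4.2.15) and (4.2.20); in practice they are computed from the observed histories as in Theorems 4.2.2 and 4.2.3.

```python
# Sketch of the state-parameter M-step (4.2.23).
# b, a: constants from failure histories; phi, alpha: from suspension histories.
def update_state_params(b12, b13, b23, a12, a23, phi1, phi2, al12, al23):
    B12, B13, B23 = sum(b12), sum(b13), sum(b23)
    P1, P2 = sum(phi1), sum(phi2)
    num = B12 + P1 + P2 * (B12 + P1) / (B12 + P1 + B13)
    q12 = -num / (sum(a12) + sum(al12))       # first formula in (4.2.23)
    q13 = q12 * B13 / (B12 + P1)              # second formula
    q23 = -B23 / (sum(a23) + sum(al23))       # third formula
    return q12, q13, q23

# hypothetical constants for H = 1 failure and K = 1 suspension history
q12, q13, q23 = update_state_params(
    b12=[2.0], b13=[0.5], b23=[1.5], a12=[-100.0], a23=[-40.0],
    phi1=[0.8], phi2=[0.2], al12=[-50.0], al23=[-10.0])
```

In the hypothetical numbers above, the $a$ and $\alpha$ constants are taken negative and the $b$ and $\varphi$ constants positive, so the resulting rate estimates come out positive.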
The results obtained in equations (4.2.23) and (4.2.24) can be viewed as a general-
ization of the parameter estimation result for multivariate normal mixture models (see
e.g. McLachlan and Krishnan [55], Section 2.7.2, equations (2.56) and (2.58)). In such
mixture models, multivariate normal data is drawn from a finite number of unobservable
groups, where the mean and covariance matrix can depend on the underlying group.
In our model, the mean and covariance matrix depend on the unobservable state (i.e.
healthy or warning state) of the Markov process (Xt), which makes the analysis more
difficult.
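For intuition, the normal-mixture special case can be sketched with a standard EM iteration on synthetic bivariate data. This is a minimal sketch: the data, initialization, and two-component setup are hypothetical, and the closed-form M-step updates play the role that (4.2.23) and (4.2.24) play in our model, with group membership (rather than the path of $(X_t)$) as the latent variable.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data from a two-component bivariate normal mixture
n = 400
z = rng.random(n) < 0.4                          # latent component labels
x = np.where(z[:, None],
             rng.normal([0.0, 0.0], 1.0, (n, 2)),
             rng.normal([5.0, 5.0], 1.0, (n, 2)))

def em_mixture(x, iters=50):
    n, d = x.shape
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(0), x.max(0)], dtype=float)   # crude initialization
    cov = np.array([np.cov(x.T)] * 2)
    for _ in range(iters):
        # E-step: posterior responsibilities (the pseudo-likelihood weights)
        dens = []
        for k in range(2):
            diff = x - mu[k]
            quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(cov[k]), diff)
            dens.append(w[k] * np.exp(-0.5 * quad)
                        / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov[k])))
        r = np.array(dens)
        r /= r.sum(0)
        # M-step: closed-form weighted updates of weights, means, covariances
        for k in range(2):
            nk = r[k].sum()
            w[k] = nk / n
            mu[k] = r[k] @ x / nk
            diff = x - mu[k]
            cov[k] = (r[k][:, None] * diff).T @ diff / nk
    return w, mu, cov

w, mu, cov = em_mixture(x)
```

With well-separated components a handful of iterations suffices; in less favorable configurations EM can converge to poor local maxima, which is why multiple starting points are common in practice.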
Figure 4.3.1: Spectrometric Measurements of Copper and Iron
4.3 A Practical Application
In this section, we develop a numerical example using real-world data from the mining industry, which illustrates the entire estimation procedure. In particular, we analyze condition monitoring data from the transmission oil samples of 240-ton heavy hauler trucks used in the Athabasca oil sands of Alberta, Canada. During the operational life of each transmission unit, oil samples are collected every $\Delta = 600$ hours and spectrometric oil analysis is carried out, which provides the concentrations (in ppm) of iron and copper that come from the direct wear of the transmission unit. The total number of recorded data histories is 36, consisting of $H = 13$ failure histories and $K = 23$ suspension histories. A typical data history is given in Figure 4.3.1. This particular transmission unit failed after the 13th sampling epoch, at 8123 operational hours.
As detailed in Section 4.1, to satisfy the assumptions of independence and normality, we first need to fit a model that accounts for autocorrelation in the data histories, and choose as the observation process in our hidden Markov model the residuals of the
fitted model. Before fitting a model to the data histories, we have to approximate the healthy portions of the data histories. Partitioning a non-stationary time series into a finite number of stationary portions is known as time series segmentation. The purpose of segmentation in our application is to achieve stationarity in the healthy portions of the data histories so that the residuals of the fitted model can be computed. Generally, there is no agreed-upon criterion for selecting the 'optimal' segmentation. Thus, a variety of segmentation methods exist, ranging from very sophisticated algorithms to simple heuristic graphical methods (see e.g. Keogh [36] and Fukuda et al. [23]). For our application, for simplicity we have chosen to segment the data histories via graphical examination. For each of the $H + K = 36$ data histories, the healthy portions are denoted $z^l_1, \dots, z^l_{t_l}$, $l = 1, \dots, H+K$. The healthy data histories are assumed to follow a common stationary VAR process (see e.g. Reinsel [60]) given by
$$Z_n - \delta_0 = \sum_{r=1}^{p} \Phi_r (Z_{n-r} - \delta_0) + \varepsilon_n, \quad n \in \mathbb{Z}, \qquad (4.3.1)$$
where $\varepsilon_n$ are i.i.d. $N_2(0, C)$, the model order $p \in \mathbb{N}$, the autoregressive coefficient matrices $\Phi_r \in \mathbb{R}^{2 \times 2}$, and the mean and covariance model parameters $\delta_0 \in \mathbb{R}^2$ and $C \in \mathbb{R}^{2 \times 2}$. All model parameters are unknown and need to be estimated. We set $\delta = \delta_0 - \sum_{r=1}^{p} \Phi_r \delta_0$, and write equation (4.3.1) in standard form
$$Z_n = \delta + \sum_{r=1}^{p} \Phi_r Z_{n-r} + \varepsilon_n, \quad n \in \mathbb{Z},$$
so that the observed healthy data histories $z^l_1, \dots, z^l_{t_l}$, $l = 1, \dots, H+K$, have the regression representation $W = VA + E$, where
$$W' = \begin{bmatrix} z^{H+K}_{t_{H+K}} & \cdots & z^{H+K}_{p+1} & \cdots & z^{1}_{t_1} & \cdots & z^{1}_{p+1} \end{bmatrix}, \qquad A' = \begin{bmatrix} \delta & \Phi_1 & \cdots & \Phi_p \end{bmatrix},$$
$$E' = \begin{bmatrix} \varepsilon^{H+K}_{t_{H+K}} & \cdots & \varepsilon^{H+K}_{p+1} & \cdots & \varepsilon^{1}_{t_1} & \cdots & \varepsilon^{1}_{p+1} \end{bmatrix},$$
$$V' = \begin{bmatrix} 1 & \cdots & 1 & \cdots & 1 & \cdots & 1 \\ z^{H+K}_{t_{H+K}-1} & \cdots & z^{H+K}_{p} & \cdots & z^{1}_{t_1-1} & \cdots & z^{1}_{p} \\ \vdots & & \vdots & & \vdots & & \vdots \\ z^{H+K}_{t_{H+K}-p} & \cdots & z^{H+K}_{1} & \cdots & z^{1}_{t_1-p} & \cdots & z^{1}_{1} \end{bmatrix}.$$
Reinsel [60] showed that the least squares estimates of $A$ and $C$ are given by
$$\hat{A} = (V'V)^{-1}V'W, \qquad \hat{C} = \big(T - (2p+1)\big)^{-1}(W - V\hat{A})'(W - V\hat{A}), \qquad (4.3.2)$$
where $T = \sum_{l=1}^{H+K}(t_l - p)$ is the total number of available data points. The estimate of the model order $p \in \mathbb{N}$ is obtained by testing $H_0 : \Phi_p = 0$ against $H_a : \Phi_p \neq 0$ using the likelihood ratio statistic
$$M_p = -(T - 2p - 1 - 1/2) \ln \frac{\det(S_p)}{\det(S_{p-1})},$$
where $S_p = (W - V\hat{A})'(W - V\hat{A})$ is the residual sum of squares matrix obtained from (4.3.2) when fitting a VAR model of order $p$. For $T$ large, if $H_0$ is true, $M_p$ converges in distribution to $\chi^2_4$. Thus, for significance level $\alpha \in (0, 1)$, we reject $H_0$ if $M_p > \chi^2_{4,\alpha}$.
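The least squares fit (4.3.2) and the order-selection statistic $M_p$ can be sketched as follows. The simulated series and its VAR(2) coefficients are illustrative assumptions, not the thesis data or the estimates reported below in (4.3.3).

```python
import numpy as np

rng = np.random.default_rng(1)
# simulate one long "healthy" history from a stationary bivariate VAR(2)
Phi = [np.array([[0.4, -0.1], [0.0, 0.2]]),
       np.array([[0.3,  0.0], [0.0, 0.3]])]
delta = np.array([1.0, 2.0])
L = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 1.0]]))
N, q = 3000, 2
Z = np.zeros((N, q))
for n in range(2, N):
    Z[n] = delta + Phi[0] @ Z[n-1] + Phi[1] @ Z[n-2] + L @ rng.standard_normal(q)

def fit_var(Z, p, p_max=None):
    """Least squares fit of (4.3.2); p_max aligns samples for nested order tests."""
    p_max = p if p_max is None else p_max
    W = Z[p_max:]
    V = np.column_stack([np.ones(len(W))]
                        + [Z[p_max - r: len(Z) - r] for r in range(1, p + 1)])
    A = np.linalg.solve(V.T @ V, V.T @ W)      # A' = [delta, Phi_1, ..., Phi_p]
    S = (W - V @ A).T @ (W - V @ A)            # residual SS matrix S_p
    C_hat = S / (len(W) - (2 * p + 1))
    return A, C_hat, S, len(W)

A2, C2, _, _ = fit_var(Z, 2)
# likelihood ratio statistic M_3 for H0: Phi_3 = 0, on a common sample
_, _, S2, _ = fit_var(Z, 2, p_max=3)
_, _, S3, T = fit_var(Z, 3, p_max=3)
M3 = -(T - 2 * 3 - 1 - 0.5) * np.log(np.linalg.det(S3) / np.linalg.det(S2))
```

Because the two regressions are nested and fitted on a common sample, $S_3 \preceq S_2$ and therefore $M_3 \geq 0$; comparing $M_3$ with $\chi^2_{4,\alpha}$ mirrors the order-selection step above.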
For our 2-dimensional spectrometric oil data, we find that $M_2 = 221.84$ and $M_3 = 10.28$. From the chi-square distribution with 4 degrees of freedom and $\alpha = 0.01$, $\chi^2_{4,\alpha} = 13.28$. Since $M_2 > \chi^2_{4,0.01}$ and $M_3 < \chi^2_{4,0.01}$, we reject $H_0 : \Phi_2 = 0$ and fail to reject $H_0 : \Phi_3 = 0$. Thus we conclude that $p = 2$ is an adequate model order, and using (4.3.2) the VAR model parameter estimates are given by
$$\hat{\Phi}_1 = \begin{bmatrix} 0.3825 & -0.0758 \\ -0.0672 & 0.1775 \end{bmatrix}, \quad \hat{\Phi}_2 = \begin{bmatrix} 0.3356 & 0.0063 \\ -0.0169 & 0.3532 \end{bmatrix}, \quad \hat{\delta} = \begin{bmatrix} 7.6819 \\ 4.0570 \end{bmatrix}, \quad \hat{C} = \begin{bmatrix} 7.1789 & 2.0260 \\ 2.0260 & 3.5725 \end{bmatrix}. \qquad (4.3.3)$$
From the parameter estimates given in (4.3.3), the eigenvalues of the companion matrix
$$\Phi = \begin{bmatrix} \hat{\Phi}_1 & \hat{\Phi}_2 \\ I & 0 \end{bmatrix}$$
are 0.8200, 0.6729, $-0.4156$, and $-0.5173$, which are all smaller than one in absolute value, implying that the fitted model is stationary.
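The stationarity check can be reproduced directly from the estimates in (4.3.3): build the companion matrix $\Phi$ and inspect its eigenvalues.

```python
import numpy as np

# companion-matrix stationarity check for the fitted VAR(2), using the
# estimates reported in (4.3.3)
Phi1 = np.array([[0.3825, -0.0758],
                 [-0.0672, 0.1775]])
Phi2 = np.array([[0.3356, 0.0063],
                 [-0.0169, 0.3532]])
companion = np.vstack([np.hstack([Phi1, Phi2]),
                       np.hstack([np.eye(2), np.zeros((2, 2))])])
eigs = np.linalg.eigvals(companion)
# the fitted model is stationary iff all eigenvalues lie inside the unit circle
stationary = bool(np.all(np.abs(eigs) < 1.0))
```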
Using the estimates $\hat{\psi} = (\hat{\delta}, \hat{p}, \hat{\Phi}_1, \hat{\Phi}_2, \hat{C})$, we define the residual process $(Y_n : n \in \mathbb{N})$ by
$$Y_n := Z_n - E_{\hat{\psi}}(Z_n \mid \vec{Z}_{n-1}), \qquad (4.3.4)$$
where $\vec{Z}_{n-1} = (Z_1, \dots, Z_{n-1})$. The residuals are then computed for both the healthy and unhealthy portions of each data history. We now present a method for explicitly computing (4.3.4).
We first note that for $n > p$,
$$Y_n = Z_n - \Big[\hat{\delta} + \sum_{r=1}^{p} \hat{\Phi}_r Z_{n-r}\Big].$$
For $n \leq p$, we recursively compute $Y_n$ using the Kalman filter by writing (4.3.1) as a state-space model (see e.g. Reinsel [60]). We choose as the state and observation equations
$$\alpha_n = D + T\alpha_{n-1} + E_n, \qquad Z_n = H\alpha_n,$$
where
$$\alpha_n = \begin{bmatrix} Z_n \\ \vdots \\ Z_{n-p+1} \end{bmatrix}, \quad D = \begin{bmatrix} \hat{\delta} \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad T = \begin{bmatrix} \hat{\Phi}_1 & \hat{\Phi}_2 & \cdots & \hat{\Phi}_p \\ I_q & 0 & \cdots & 0 \\ & \ddots & & \vdots \\ 0 & \cdots & I_q & 0 \end{bmatrix}, \quad E_n = \begin{bmatrix} \varepsilon_n \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad H = (I_q \;\; 0 \;\; \cdots \;\; 0),$$
$\varepsilon_n$ are i.i.d. $N_q(0, \hat{C})$, and $q = 2$. For each $m \geq 0$, define
$$\alpha_{n+m|n} = E(\alpha_{n+m} \mid \vec{Z}_n), \qquad P_{n+m|n} = E\big((\alpha_{n+m} - \alpha_{n+m|n})(\alpha_{n+m} - \alpha_{n+m|n})' \mid \vec{Z}_n\big),$$
$$\eta_{n+m|n} = Z_{n+m} - E(Z_{n+m} \mid \vec{Z}_n), \qquad f_{n+m|n} = E(\eta_{n+m|n}\, \eta'_{n+m|n} \mid \vec{Z}_n).$$
Then the Kalman filter is given by the following six recursive equations:
$$\alpha_{n+1|n} = D + T\alpha_{n|n}, \qquad P_{n+1|n} = TP_{n|n}T' + \mathrm{Var}(E_n),$$
$$\eta_{n+1|n} = Z_{n+1} - H\alpha_{n+1|n}, \qquad f_{n+1|n} = HP_{n+1|n}H',$$
$$\alpha_{n+1|n+1} = \alpha_{n+1|n} + P_{n+1|n}H' f^{-1}_{n+1|n}\, \eta_{n+1|n}, \qquad P_{n+1|n+1} = P_{n+1|n} - P_{n+1|n}H' f^{-1}_{n+1|n}\, HP_{n+1|n},$$
which are initialized by setting
$$\alpha_{0|0} = (I_{qp} - T)^{-1}D, \qquad P_{0|0} = \mathrm{vec}^{-1}\big[(I_{(qp)^2} - T \otimes T)^{-1} \mathrm{vec}(\mathrm{Var}(E_n))\big],$$
where $\mathrm{Var}(E_n)$ is the $qp \times qp$ matrix with $\hat{C}$ in its upper-left $q \times q$ block and zeros elsewhere. Thus, for each $n \leq p$, using the recursive equations above we obtain
$$Y_n := Z_n - E_{\hat{\psi}}(Z_n \mid \vec{Z}_{n-1}) =: \eta_{n|n-1}.$$
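The residual computation can be sketched as follows. This is a minimal sketch for a fitted bivariate VAR($p$): the function mirrors the six recursions and the stationary initialization above, and the parameters used in the quick check are illustrative, not the thesis estimates.

```python
import numpy as np

def var_residuals(Z, delta, Phis, C):
    """One-step prediction errors (residuals) of a fitted VAR(p) via the Kalman
    filter; for n > p they coincide with the direct formula in the text."""
    p, q = len(Phis), Z.shape[1]
    m = q * p
    # companion-form state-space matrices
    T = np.zeros((m, m)); T[:q, :] = np.hstack(Phis); T[q:, :-q] = np.eye(m - q)
    D = np.zeros(m); D[:q] = delta
    H = np.zeros((q, m)); H[:, :q] = np.eye(q)
    QE = np.zeros((m, m)); QE[:q, :q] = C          # Var(E_n)
    # stationary initialization, as in the text
    a = np.linalg.solve(np.eye(m) - T, D)
    P = np.linalg.solve(np.eye(m * m) - np.kron(T, T), QE.ravel()).reshape(m, m)
    Y = np.zeros_like(Z)
    for n in range(len(Z)):
        a_pred, P_pred = D + T @ a, T @ P @ T.T + QE   # prediction step
        f = H @ P_pred @ H.T
        Y[n] = Z[n] - H @ a_pred                       # residual eta_{n|n-1}
        K = P_pred @ H.T @ np.linalg.inv(f)
        a = a_pred + K @ Y[n]                          # update step
        P = P_pred - K @ H @ P_pred
    return Y

# quick check on simulated data with illustrative parameters
rng = np.random.default_rng(2)
Phis = [np.array([[0.4, -0.1], [0.0, 0.2]]), np.array([[0.3, 0.0], [0.0, 0.3]])]
delta, C = np.array([1.0, 2.0]), np.eye(2)
Z = np.zeros((12, 2))
for n in range(2, 12):
    Z[n] = delta + Phis[0] @ Z[n-1] + Phis[1] @ Z[n-2] + rng.standard_normal(2)
Y = var_residuals(Z, delta, Phis, C)
```

Once $p$ observations have been processed the state is known exactly, so the filter's residuals for $n > p$ reproduce the direct formula.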
The residuals for both the healthy and warning data sets are provided graphically in
a 2-dimensional scatter plot in Figure 4.3.2.
We statistically test the independence and normality assumptions using the Port-
manteau Independence Test [15] and the Henze-Zirkler Multivariate Normality Test [29],
respectively, and obtain the following results.
Table 4.3.1 shows that there is no statistical evidence to reject the hypotheses that the residuals of the fitted model are independent and follow a multivariate normal distribution, as proved theoretically by Yang and Makis [79].
Chapter 4. Parameter Estimation for Stochastic Systems 84
Figure 4.3.2: Scatter plot for the residuals. The crosses are residuals computed from the healthy data
and the circles are residuals computed from the unhealthy data.
Table 4.3.1: p-Values of the Residual Independence and Normality Tests.

                              Healthy Data Set    Unhealthy Data Set
Independence (Portmanteau)    0.0675              0.4284
Normality (Henze-Zirkler)     0.6911              0.5270
The residuals now constitute the observation process $(Y_n)$ in our hidden Markov model. Using equations (4.2.23) and (4.2.24) of Section 4.2.3, and the Euclidean norm stopping criterion $|(\gamma_{n+1}, \theta_{n+1}) - (\gamma_n, \theta_n)| < 10^{-4}$, we have obtained the results summarized in Table 4.3.2.
Table 4.3.2 shows that iterations of the EM algorithm take on average 8.27 seconds, which is extremely fast for offline computations. Furthermore, the estimates converge rapidly, in 3 iterations, which is an attractive feature for real applications. All computations were coded in Matlab on an Intel Core 2 6420, 2.13 GHz with 2 GB RAM.
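The iteration scheme with its Euclidean-norm stopping rule can be sketched generically; `update` below is a toy contraction standing in for one full EM step (E-step followed by M-step), so the example is self-contained.

```python
import numpy as np

def iterate(update, x0, tol=1e-4, max_iter=1000):
    """Repeat x <- update(x) until successive iterates differ by less than tol
    in Euclidean norm, mirroring the stopping criterion used for the EM runs."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x_new = update(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

# toy contraction with fixed point (2, 4)
x_star, iters = iterate(lambda x: 0.5 * x + np.array([1.0, 2.0]), [0.0, 0.0])
```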
Table 4.3.2: Iterations of the EM Algorithm. (Matrices are written row-wise as [a b; c d], vectors as (a, b)′.)

            Initial Values       Update 1             Update 2             Optimal Estimates
q12         0.0030               0.0410               0.0302               0.0303
q13         0.0000               0.0001               0.0001               0.0001
q23         0.1500               0.3510               0.3545               0.3548
mu1         (1.5, 0.8)′          (1.2, 0.8)′          (1.1, 1.9)′          (1.1, 1.9)′
mu2         (11, 5.5)′           (4.2, 5.3)′          (4.2, 5.5)′          (4.1, 5.5)′
Sigma1      [11.2 6.8; 6.8 8.9]  [7.2 1.8; 1.8 3.2]   [7.2 1.9; 1.9 3.7]   [7.2 2.0; 2.0 3.6]
Sigma2      [11.2 6.8; 6.8 8.9]  [7.4 1.3; 1.3 3.1]   [7.5 1.1; 1.1 3.2]   [7.6 1.0; 1.0 3.2]
Q           -1.78×10^-3          -1.41×10^-3          -1.39×10^-3          -1.39×10^-3
Time (sec)  4.12                 7.19                 9.83                 11.95

Thus, for this application, the condition of the transmission unit is modeled as a
continuous time homogeneous Markov chain $(X_t : t \in \mathbb{R}_+)$ with state space $X = \{1, 2\} \cup \{3\}$ and transition rate matrix
$$Q = \begin{bmatrix} -0.0304 & 0.0303 & 0.0001 \\ 0 & -0.3548 & 0.3548 \\ 0 & 0 & 0 \end{bmatrix},$$
and the bivariate residual vector $Y_n$ follows $N_2(\mu_1, \Sigma_1)$ when the system is in healthy state 1 and $N_2(\mu_2, \Sigma_2)$ when the system is in warning state 2, where
$$\mu_1 = \begin{bmatrix} 1.1 \\ 1.9 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 4.1 \\ 5.5 \end{bmatrix}, \quad \Sigma_1 = \begin{bmatrix} 7.2 & 2.0 \\ 2.0 & 3.6 \end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix} 7.6 & 1.0 \\ 1.0 & 3.2 \end{bmatrix}.$$
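As a quick consequence of the fitted rate matrix, the expected time to failure can be read off the transient block of $Q$. This is a sketch; time is measured in whatever units the rates were estimated in.

```python
import numpy as np

# fitted transition rate matrix from the model above
Q = np.array([[-0.0304, 0.0303, 0.0001],
              [ 0.0,   -0.3548, 0.3548],
              [ 0.0,    0.0,    0.0   ]])
Q_transient = Q[:2, :2]                     # transient states {1, 2}
# expected time to absorption in failure state 3, from each transient state:
# solve (-Q_transient) m = 1
mean_times = np.linalg.solve(-Q_transient, np.ones(2))
mttf_healthy, mttf_warning = mean_times
```

The warning-state value is simply the mean sojourn $1/0.3548$, while the healthy-state value adds the expected time spent in state 1 plus the chance-weighted warning sojourn.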
4.4 Conclusions and Future Research
In this chapter, a parameter estimation problem for partially observable failing systems
has been considered. System deterioration is driven by a continuous time homogeneous
Markov chain and the system state is unobservable, except for the failure state. Vector
autoregressive information is obtained through condition monitoring at equidistant sam-
pling times. Two types of data histories were considered: data histories that end with
observable failure and data histories that end when the system has been suspended from
operation. The state and observation process have been modeled in the hidden Markov
framework and the maximum likelihood estimates of the model parameters have been
obtained using the EM algorithm. It was shown that both the pseudo likelihood function
and the parameter updates in each iteration of the EM algorithm have explicit formulas.
A numerical example has been developed to illustrate the estimation procedure using real
oil data coming from failing transmission units. It has been found that the procedure is
both computationally efficient and converges rapidly to reasonable parameter estimates.
There are a variety of interesting extensions and topics for future research. Recall
that at the beginning of Section 4.3, the observation histories were pre-processed and
residuals obtained. One direction of future research would be to systematically investigate
the effect that different pre-processing methods have on the parameter estimates in the
hidden Markov framework. In Section 4.3, it was shown empirically that the parameter
estimates converged quite rapidly using the EM algorithm. Another interesting topic of
future research would be to analytically investigate the rate of this convergence using
methods of mathematical statistics. Finally, recall that it was assumed that only a single
vector measurement is taken at each sampling epoch, which is the usual case in condition-
based maintenance applications. A final interesting topic for future research would be to
see if the analysis given in this chapter can be extended to the case where more than one
sampling unit is collected at each sampling epoch.
Bibliography
[1] Anderson, R.F.; Friedman, A. Optimal Inspection in a Stochastic Control Problem
with Costly Observations II. Math Oper Res, 1978, 3, 67-81.
[2] Asmussen, S.; Nerman, O.; Olsson M. Fitting Phase-Type Distribution via the EM
Algorithm. Scan J Stat, 1996, 23, 419-441.
[3] Aven, T.; Bergman, B. Optimal Replacement Times - A General Set-up. J Appl
Probab, 1986, 23, 432-442.
[4] Avriel, M.; Diewert, W.E.; Schaible, S.; Zang, I. Generalized Concavity. Springer,
1988.
[5] Baddeley, A.; Turner, R.; Moller, J.; Hazelton, M. Residual Analysis for Spatial
Point Processes. J Roy Stat Soc, 2005, 67, 617-666.
[6] Barlow, R.; Hunter, L. Optimum Preventive Maintenance Policies. Oper Res, 1960,
8, 90-100.
[7] Bertsekas, D.P.; Shreve, S.E. Stochastic Optimal Control: The Discrete Time Case.
Academic Press, New York, 1978.
[8] Bunks, C.; McCarthy, D.; Al-Ani, T. Condition-Based Maintenance of Machines Using
Hidden Markov Models. Mech Syst Signal Pr, 2000, 14, 597-612.
[9] Billingsley, P. Probability and Measure. Wiley-Interscience, 1995.
[10] Bremaud, P. Point Processes and Queues: Martingale Dynamics. Springer-Verlag,
1981.
[11] Calabrese, J.M. Bayesian Process Control for Attributes. Manage Sci, 1995, 41,
637-645.
[12] Cekyay, B.; Ozekici, S. Condition-Based Maintenance under Markovian Deteriora-
tion. In Wiley Encyclopedia of Operations Research and Management Science, John
Wiley & Sons, NJ, 2011.
[13] Chhatwal, J.; Alagoz, O.; Burnside, E.S. Optimal Breast Biopsy Decision-Making
Based on Mammographic Features and Demographic Factors. Oper Res, 2010, 58,
1577-1591.
[14] Christer, A.H.; Wang, W.; Sharp, J.M. A State Space Condition Monitoring Model
for Furnace Erosion Prediction and Replacement. Eur J Oper Res, 1997, 101, 1-14.
[15] Cromwell, J.B.; Hannan, M.J.; Labys, W.C.; Terraza, M. Multivariate Tests for
Time Series Models. Sage Publications, 1994.
[16] Davis, M.H.A. Markov Models and Optimization. Chapman and Hall, 1993.
[17] Dayanik, S.; Goulding, C.; Poor, H.V. Bayesian Sequential Change Diagnosis. Math
Oper Res, 2008, 33, 475-496.
[18] Dayanik, S.; Gurler, U. An Adaptive Bayesian Replacement Policy with Minimal
Repair. Oper Res, 2002, 50, 552-558.
[19] Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum Likelihood from Incomplete
Data via the EM Algorithm. J Roy Stat Soc, 1977, 39, 1-38.
[20] Dieulle, L.; Berenguer, C.; Grall, A.; Roussignol, M. Sequential Condition-Based
Maintenance Scheduling for a Deteriorating System. Eur J Oper Res, 2003, 150,
451-461.
[21] Dogramaci, A.; Fraiman, N.M. Replacement Decisions with Maintenance Under
Uncertainty: An Imbedded Optimal Control Model. Oper Res, 2004, 52, 785-794.
[22] Douc, R.; Moulines, E.; Ryden, T. Asymptotic Properties of the Maximum Likeli-
hood Estimator in Autoregressive Models with Markov Regime. Ann Stat, 2004, 32,
2254-2304.
[23] Fukuda, K.; Stanley, H.E.; Amaral, L.A.N. Heuristic Segmentation of Non-
Stationary Time Series. Phys Rev E, 2004, 69, 1-12.
[24] Ghasemi, A.; Yacout, S.; Ouali, M.S. Parameter Estimation Methods for Condition-
Based Maintenance with Indirect Observations. IEEE T Reliab, 2010, 59, 426-439.
[25] Genon-Catalot, V.; Laredo, C. Leroux’s Method for General Hidden Markov Models.
Stoch Proc Appl, 2006, 116, 222-243.
[26] Grimmett, G.; Stirzaker, D. Probability and Random Processes. Oxford University
Press, 2001.
[27] Hamilton, J.D. Analysis of Time Series Subject to Changes in Regime. J Economet-
rics, 1990, 45, 39-70.
[28] Heidergott, B.; Farenhorst-Yuan, T. Gradient Estimation for Multicomponent Main-
tenance Systems with Age-Replacement Policy. Oper Res, 2010, 58, 706-718.
[29] Henze, N.; Zirkler, B. A Class of Invariant Consistent Tests for Multivariate Nor-
mality. Commun Stat A-Theor, 1990, 19, 3595-3617.
[30] Jardine, A.K.S.; Lin, D.; Banjevic, D. A Review on Machinery Diagnostics and Prog-
nostics Implementing Condition-Based Maintenance. Mech Syst Signal Pr, 2006, 20,
1483-1510.
[31] Jensen, U. Monotone Stopping Rules for Stochastic Processes in a Semimartingale
Representation with Applications. Optim, 1989, 6, 837-852.
[32] Jiang, R. Optimization of Alarm Threshold and Sequential Inspection Scheme. Re-
liab Eng Syst Safe, 2010, 95, 208-215.
[33] Jiang, X.; Makis, V.; Jardine, A.K.S. Optimal Repair-Replacement Policy for a
General Repair Model. Adv Appl Probab, 2001, 33, 206-222.
[34] Juang, M.; Anderson, G. A Bayesian Method on Adaptive Preventive Maintenance
Problem. Eur J Oper Res, 2004, 155, 455-473.
[35] Kander, Z. Inspection Policies for Deteriorating Equipment Characterized by N
Quality Levels. Nav Res Log, 1978, 25, 243-255.
[36] Keogh, E.; Chu, S.; Hart, D.; Pazzani, M. Segmenting Time Series: A Survey and
Novel Approach. World Scientific, 1993.
[37] Kim, C.G. Dynamic Linear Models with Markov-Switching. J Econometrics, 1994,
60, 1-22.
[38] Kim, M.J.; Jiang, R.; Makis, V.; Lee, C.G. Optimal Bayesian Fault Prediction
Scheme for a Partially Observable System Subject to Random Failure. Eur J Oper
Res, 2011, 214, 331-339.
[39] Kim, M.J.; Makis, V. Optimal Control of Partially Observable Failing Systems with
Costly Multivariate Observations. Stoch Model, 2012, Under Review.
[40] Kim, M.J.; Makis, V. Joint Optimization of Sampling and Control of Partially Ob-
servable Failing Systems. Oper Res, 2012, Under Review.
[41] Kim, M.J.; Makis, V.; Jiang, R. Parameter Estimation in a Condition Based Main-
tenance Model. Stat Probab Lett, 2010, 80, 1633-1639.
[42] Kim, M.J.; Makis, V.; Jiang, R. Parameter Estimation for Partially Observable
Systems Subject to Random Failure. Appl Stoch Model Bus, 2012, forthcoming.
[43] Krishnamurthy, V.; Yin, G.G. Recursive Algorithms for Estimation of Hidden Markov
Models and Autoregressive Models with Markov Regime. IEEE T Inform Theory, 2002,
48, 458-476.
[44] Kurt, M.; Kharoufeh, J.P. Optimally Maintaining a Markovian Deteriorating System
with Limited Imperfect Repairs. Eur J Oper Res, 2010, 205, 368-380.
[45] Lam, C.T.; Yeh, R.H. Comparison of Sequential and Continuous Inspection Strate-
gies for Deteriorating Systems. Adv Appl Probab, 1994, 26, 423-435.
[46] Li, H.; Shaked, M. Imperfect Repair Models with Preventive Maintenance. J Appl
Probab, 2003, 40, 1043-1059.
[47] Lin, D.; Makis, V. Recursive Filters for a Partially Observable System Subject to
Random Failure. Adv Appl Probab, 2003, 35, 207-227.
[48] Liporace, L.A. Maximum Likelihood Estimation for Multivariate Observations of
Markov Sources. IEEE T Inform Theory, 1982, 28, 729-734.
[49] Makis, V. Multivariate Bayesian Control Chart. Oper Res, 2008, 56, 487-496.
[50] Makis, V.; Jardine, A.K.S. Optimal Replacement in the Proportional Hazards
Model. INFOR, 1992, 30, 172-183.
[51] Makis, V.; Jiang, X. Optimal Replacement Under Partial Observations. Math Oper
Res, 2003, 28, 382-394.
[52] Makis, V.; Jiang, X.; Cheng, K. Optimal Preventive Replacement Under Minimal
Repair and Random Repair Costs. Math Oper Res, 2000, 25, 141-156.
[53] Makis, V.; Wu, J.; Gao, Y. An Application of DPCA to Oil Data for CBM Modeling.
Eur J Oper Res, 2006, 174, 112-123.
[54] Maillart, L.M.; Ivy, J.S.; Ransom, S.; Diehl, K. Assessing Dynamic Breast Cancer
Screening Policies. Oper Res, 2008, 56, 1411-1427.
[55] McLachlan, G.J.; Krishnan, T. The EM Algorithm and Extensions. John Wiley &
Sons, 2008.
[56] Neuts, M.F.; Perez-Ocon, R.; Torres-Castro, I. Repairable Models with Operating
and Repair Times Governed by Phase Type Distributions. Adv Appl Probab, 2000,
32, 468-479.
[57] Nikiforov, I.V. A Generalized Change Detection Problem. IEEE T Inform Theory,
1995, 41, 171-187.
[58] Ohnishi, M.; Kawai, H.; Mine, H. An Optimal Inspection and Replacement Policy
for a Deteriorating System. J Appl Probab, 1986, 23, 973-988.
[59] Provost, S.B.; Rudiuk, E.M. The Exact Distribution of Indefinite Quadratic Forms
in Noncentral Normal Vectors. Ann I Stat Math, 1996, 48, 381-394.
[60] Reinsel, G.C. Elements of Multivariate Time Series Analysis. Springer, New York,
1997.
[61] Roberts, W.J.J.; Ephraim, Y. An EM Algorithm for Ion-Channel Current Estima-
tion. IEEE T Signal Proces, 2008, 56, 26-33.
[62] Rosenfield, D. Markovian Deterioration with Uncertain Information. Oper Res, 1976,
24, 141-155.
[63] Ross, S.M. Quality Control Under Markovian Deterioration. Manage Sci, 1971, 17,
587-596.
[64] Ryden, T. On Recursive Estimation for Hidden Markov Models. Stoch Proc Appl,
1997, 66, 79-96.
[65] Schervish, M.J. Theory of Statistics. Springer, 1995.
[66] Shechter, S.M.; Bailey, M.D.; Schaefer, A.J.; Roberts, M.S. The Optimal Time to
Initiate HIV Therapy Under Ordered Health States. Oper Res, 2008, 56, 20-33.
[67] Schneider, H.; Frank, P.M. Observer-Based Supervision and Fault Detection in
Robots Using Nonlinear and Fuzzy Logic Residual Evaluation. IEEE T Contr Syst
T, 1996, 4, 274-282.
[68] Schoenberg, F.P. Multidimensional Residual Analysis of Point Process Models for
Earthquake Occurrences. J Am Stat Assoc, 2003, 98, 789-795.
[69] Sohn, H.; Farrar, C.R. Damage Diagnosis Using Time Series Analysis of Vibration
Signals. Smart Mater Struct, 2001, 10, 446-451.
[70] Tagaras, G.; Nikolaidis, Y. Comparing the Effectiveness of Various Bayesian X̄ Con-
trol Charts. Oper Res, 2002, 50, 878-888.
[71] Tijms, H.C. Stochastic Models: An Algorithmic Approach. John Wiley, 1994.
[72] Valdez-Flores, C.; Feldman, R. A Survey of Preventive Maintenance Models for
Stochastically Deteriorating Single-Unit Systems. Nav Res Log, 1989, 36, 419-446.
[73] Wang, L.; Chu, J.; Mao, W. A Condition-based Replacement and Spare Provisioning
Policy for Deteriorating Systems with Uncertain Deterioration to Failure. Eur J Oper
Res, 2009, 194, 184-205.
[74] Wang, H. A Survey of Maintenance Policies of Deteriorating Systems. Eur J Oper
Res, 2002, 139, 469-489.
[75] Wang, W.; Wong, A.K. Autoregressive Model-Based Gear Fault Diagnosis. T ASME,
2002, 124, 172-179.
[76] Wang, X.; Makis, V.; Yang, M. A Wavelet Approach to Fault Diagnosis of a Gearbox
Under Varying Load Conditions. J Sound Vib, 2010, 329, 1570-1585.
[77] Wu, C.F.J. On the Convergence Properties of the EM Algorithm. Ann Stat, 1983,
11, 95-103.
[78] Wu, J.; Makis, V. Economic and Economic-Statistical Design of a Chi-Square Chart
for CBM. Eur J Oper Res, 2008, 188, 516-529.
[79] Yang, J.; Makis, V. Dynamic Response of Residual to External Deviations in a
Controlled Production Process. Technometrics, 2000, 42, 290-299.
[80] Yang M.; Makis V. ARX Model-Based Gearbox Fault Detection and Localization
Under Varying Load Conditions. J Sound Vib, 2010, 329, 5209-5221.
[81] Yin, Z.; Makis, V. Economic and Economic-Statistical Design of a Multivariate
Bayesian Control Chart for Condition-Based Maintenance. IMA J Manage Math,
2011, 22, 47-63.
[82] Yeh, R.H. Optimal Inspection and Replacement Policies for Multi-State Deteriorat-
ing Systems. Eur J Oper Res, 1996, 96, 248-259.