
Optimal Preventive Maintenance Rate for Best Availability With Hypo-Exponential Failure Distribution

  • IEEE TRANSACTIONS ON RELIABILITY, VOL. 62, NO. 2, JUNE 2013 351

    Meng-Lai Yin, Senior Member, IEEE, John E. Angus, and Kishor S. Trivedi, Fellow, IEEE

    Abstract: The optimal rate of periodic preventive maintenance to achieve the best availability is studied for Markov systems with multiple degraded operational stages, where the time-to-failure has a hypo-exponential distribution. An analytical expression is developed for the availability of such systems having $n$ operational stages, and a necessary and sufficient condition is derived for a non-trivial optimal rate of periodic maintenance to exist. Numerical procedures for finding the optimal rate of periodic maintenance are given, and examples are presented.

    Index Terms: Degraded-stage systems, hypo-exponential failure distribution, periodic preventive maintenance, system availability.

    NOTATION

    $n$ number of system operational stages

    $\lambda_i$ transition rate from stage $i$ to the next degraded stage

    $\delta$ preventive maintenance rate

    $\mu$ repair rate

    $m$ rate to complete a preventive maintenance task

    $A(\delta)$ availability as a function of $\delta$

    ACRONYMS

    PM Preventive Maintenance

    CM Corrective Maintenance

    I. INTRODUCTION

    Balancing maintenance cost and system availability for economic efficiency is the motivation of this study. In particular, we are interested in developing optimal preventive maintenance schedules to achieve the best availability.

    Manuscript received January 04, 2012; revised July 26, 2012 and October 30, 2012; accepted November 29, 2012. Date of publication April 11, 2013; date of current version May 29, 2013. Associate Editor: L. Walls.

    M.-L. Yin is with the Electrical and Computer Engineering Department, California State Polytechnic University, Pomona, CA 91768 USA (e-mail: [email protected]).

    J. E. Angus is with the School of Mathematical Sciences, Claremont Graduate University, Claremont, CA 91711 USA (e-mail: [email protected]).

    K. S. Trivedi is with the Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708 USA (e-mail: [email protected]).

    Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

    Digital Object Identifier 10.1109/TR.2013.2256672

    Usually, maintenance activities fall into two categories: corrective or preventive [1]. Corrective maintenance (CM) occurs when the system has failed, while preventive maintenance (PM) takes place periodically to reduce or eliminate accumulated degradation. This paper focuses not on CM but on the rate at which PM is performed. For many real-life systems such as aircraft [2], power systems [1], and mission-critical systems, determining the best PM rate is a critical design issue. Thus, we address the issue of finding the optimal PM rate with a hypo-exponential failure distribution for the system, while all other system parameters remain fixed.

    A. Previous Works on PM Optimization

    PM optimization has been studied extensively, particularly in the power systems literature, and mainly in the context of multi-objective optimization, where maintenance activities are managed with multiple objectives considered. Hilber et al. [3] used multi-objective optimization to study the problem of balancing PM and CM for electrical network systems, with the goal of obtaining the lowest total cost. Monte Carlo simulations and a heuristic approach were applied to reach the goal. Stopczyk et al. [4] developed a maintenance policy model using a semi-Markov process and a modified simulated annealing algorithm, also for multi-objective optimization. Yang and Chang [5] proposed a method for multi-objective maintenance scheduling where a two-level modeling scheme is applied. Their method was implemented using simulations. In the present work, we consider the single-objective optimization problem of finding the best availability for a type of system that is common in electronic systems modeling.

    With regard to restoration, minimal PM and major PM can be distinguished. Minimal PM restores the system to a previous, less degraded stage, while major PM restores the system to a "good as new" state. Sim and Endrenyi [6] studied the optimal PM problem with minimal PM considered. Due to the maintenance transitions considered, they found that the availability could be obtained using a recursive computational scheme [6]. On the other hand, Chen and Trivedi [7] studied systems with major PM, and formulated general expressions for system availability under general failure, repair, and maintenance distributions. Models considering both major and minor PM activities can be found in Endrenyi et al. [8], and Chen and Trivedi [9], where simulations and analytical approaches were respectively employed.

    Our work considers major PM, and derives the availability function analytically for the purpose of obtaining the optimal PM rate, for systems with a hypo-exponential time-to-failure distribution. Note that our work is different from [9], where condition-based PM was studied. For condition-based PM, whether or not to conduct PM depends on the state of the system, while for time-based PM [7], PM takes place at set intervals or predetermined times. Our work is also different from [10], where Shirmohammadi et al. applied an ad hoc linear cost function in the exploration of an optimal PM rate, assuming imperfect repair and time-based PM. In our model, all restorations are perfect, and the PM is age-based.

    B. The Challenge of Complexity and Fidelity

    To provide optimal maintenance rates for operational systems, accuracy and efficiency are both crucial. This point leads to the challenge of dealing with two conflicting factors, namely complexity and fidelity. In general, analytical methods are efficient. However, they are difficult to develop when the system is complex, and a high-fidelity model is required [11]. On the other hand, discrete-event simulations can represent the system with high fidelity, but do not produce accurate results efficiently, especially in regard to rare events that are critical to safety or performance.

    Numerous efforts have been devoted to addressing the issues of modeling complexity. In Chapter 6 of [11], Singh discussed methods to deal with the difficulty of non-Markovian systems in reliability modeling. One promising method mentioned there was the device-of-stages method, which converts a non-Markov model into a Markov model by redefining the state space. As stated in [11], the device-of-stages method is a method of representing a non-exponentially distributed state by a combination of stages, each of which is exponentially distributed. In particular, the stages-in-series model described in [11] is applied here to model failures with the hypo-exponential distribution [12].

    Thus, assuming the hypo-exponential failure distribution, a general availability model with major PM (explained next) is developed which serves as the basis for our study. An analytical expression for the availability for such a model is then derived, and a necessary and sufficient condition is specified for a non-trivial optimal rate of periodic maintenance to exist. Based on our findings, numerical procedures to find the optimal PM rate are given, and then demonstrated by examples and a simulation.

    Although finding the optimal periodic maintenance rate analytically for systems with non-exponentially distributed state-occupancy times is inherently difficult, we will show that the problem can be alleviated by the device-of-stages method, and the analytical and numerical efforts can go hand-in-hand to achieve efficiency.

    The work by van Casteren shows another way of attacking the issues of modeling complexity, which is related to PM. In [13], van Casteren proposed a calculation method for assessing the interruption costs, where PM can be a cause for the interruptions. His work is based on the Weibull-Markov model, which extends Markov models to include Weibull-distributed state occupancy times. The method developed in [13] takes advantage of pre-processing to achieve efficiency.

    As highlighted in [11], assessing the reliability for non-Markovian systems in an analytical manner is difficult, not to mention obtaining the optimal PM rate. To the authors' knowledge, no attempt has been made to find the optimal periodic maintenance rate analytically for best system availability with a hypo-exponential failure distribution.
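The stages-in-series idea can be illustrated with a few lines of Python. This is only a sketch under assumed values: the two stage rates are hypothetical, and `sample_hypoexp` is a name introduced here for illustration. It draws a hypo-exponential time to failure as a sum of independent exponential stage times and compares the sample mean with the sum of the stage means.

```python
import random

def sample_hypoexp(rates, rng):
    """Draw one time to failure as the sum of independent exponential stage times."""
    return sum(rng.expovariate(rate) for rate in rates)

rng = random.Random(0)            # fixed seed so the run is repeatable
stage_rates = [0.001, 0.002]      # hypothetical per-hour stage rates
samples = [sample_hypoexp(stage_rates, rng) for _ in range(50_000)]

sample_mean = sum(samples) / len(samples)
exact_mean = sum(1.0 / rate for rate in stage_rates)  # mean of a sum = sum of means
print(f"sample mean {sample_mean:.1f} h, exact mean {exact_mean:.1f} h")
```

Each stage is memoryless even though the total time is not, which is what allows the overall model to remain a Markov chain.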

    II. PRELIMINARIES

    A. Problem Description

    In this study, the evolution of the system in continuous time is described by a homogeneous continuous-time Markov chain. That is, all state-occupancy times are exponentially distributed. The maintenance policy assumed is age replacement, which means a unit is always replaced at the time of failure or $T$ hours after its installation, whichever occurs first [14]. $T$ can be random or deterministic. It is also assumed that PM as well as CM will bring the system back to its "good as new" state.

    While our main focus is on a model in which the PM policy is random, it is instructive to review here the special case when $T$ is deterministic (with all other assumptions above maintained). This case is studied extensively in Chapters 3 and 4 of [14]. Consider a system whose time to failure (for a new system) has a finite first moment, and has a continuous cumulative distribution function $F$. The system is repaired upon failure, or $T$ units of time after it was last restored to a new condition, whichever comes first. All maintenance actions bring the system to "like new" condition. Suppose the repair and PM durations are randomly distributed with means $1/\mu$ and $1/m$, respectively. Using renewal arguments (or the general equations that are derived in [15]), the availability of this system is given by

    $$A(T) = \frac{\int_0^T [1 - F(t)]\,dt}{\int_0^T [1 - F(t)]\,dt + F(T)/\mu + [1 - F(T)]/m}. \tag{1}$$

    If, in the cost function in Equation 2.2 of Chapter 4 in [14], we make the substitutions that , , , and use the distribution that places probability 1 at the value $T$ as the distribution corresponding there to , then the optimum (minimum) of that cost function would give the maximum availability. Also, from Equation 2.3 in [14], we find that the optimal $T$ is the solution (if one exists) to

    $$r(T)\int_0^T [1 - F(t)]\,dt - F(T) = \frac{\mu}{m - \mu}, \tag{2}$$

    where $r(t) = f(t)/[1 - F(t)]$ is the failure rate function for the distribution $F$, with $f$ its density. It is shown in [14] that there is a solution to this equation if $F$ corresponds to an IFR (Increasing Failure Rate) distribution, though it is possible that $T = \infty$ (i.e. do not perform any PM, and only perform repairs when the system fails). It is also shown in [14] that the optimal availability corresponding to that solution is at least as large as any optimal availability for the case in which the PM policy (i.e. $T$) is random.

    Fig. 1. Markov chain model for a PM or CM system with two operational stages.

    The foregoing discussion addresses optimum PM policies that are nonrandom, that is, age-based, but with PM triggered at a deterministic age $T$. However, in mission-critical systems where down-time cannot be scheduled in advance, nonrandom PM is rarely feasible or desirable. Thus, this paper will focus on models in which PM occurs at random, and the foregoing discussion will merely serve to provide a means of finding a general upper bound to the availability achievable from PM.
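The upper-bound remark can be checked numerically on a small case. The sketch below is not the authors' code. It assumes a two-stage hypo-exponential failure law with illustrative stage rates `LAM1` and `LAM2`, exponential PM triggering for the random policy, and the renewal-reward form of availability. It compares the best deterministic PM age, found by a coarse grid search, with the best random PM rate.

```python
import math

# Assumed illustration values (per hour); the stage rates are hypothetical.
LAM1, LAM2 = 0.001, 0.002   # stage-to-stage degradation rates
MU, M = 0.1, 6.0            # repair rate and PM completion rate

def survival(t):
    """P(time to failure > t) for the two-stage hypo-exponential law."""
    return (LAM2 * math.exp(-LAM1 * t) - LAM1 * math.exp(-LAM2 * t)) / (LAM2 - LAM1)

def mean_uptime(age):
    """Integral of the survival function over [0, age], in closed form."""
    first = LAM2 * (1.0 - math.exp(-LAM1 * age)) / LAM1
    second = LAM1 * (1.0 - math.exp(-LAM2 * age)) / LAM2
    return (first - second) / (LAM2 - LAM1)

def availability_fixed_age(age):
    """Availability when PM is triggered at a deterministic age."""
    up = mean_uptime(age)
    still_working = survival(age)
    down = (1.0 - still_working) / MU + still_working / M
    return up / (up + down)

def availability_random_pm(delta):
    """Availability when PM is triggered at exponential rate delta."""
    return 1.0 / (1.0 + delta / M + LAM1 * LAM2 / (MU * (LAM1 + LAM2 + delta)))

best_age = max(range(1, 2001), key=availability_fixed_age)   # grid of whole hours
best_delta = math.sqrt(M * LAM1 * LAM2 / MU) - (LAM1 + LAM2)
print(f"deterministic PM: age {best_age} h, A = {availability_fixed_age(best_age):.6f}")
print(f"random PM: rate {best_delta:.6f}/h, A = {availability_random_pm(best_delta):.6f}")
```

For these assumed values the deterministic policy attains the higher availability, consistent with the bound cited from [14].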

    B. A Simple Case

    A simple system considered here can go through one degraded stage before it is completely failed, at which time a repair (CM) will take place. When the system is still operating, periodic PM activities are triggered with constant rate $\delta$. The rate for (corrective) repair is $\mu$. Both repair and PM take the system back to the original (as good as new) state. A Markov chain to represent this system with two distinct operational stages (i.e. one degraded operational state) is shown in Fig. 1. The discussion below helps clarify and gives insights into solving the more general case.

    The initial state is UP, where the system is fully operational. With the degradation rate $\lambda_1$, the system transits to state D1, which is a degraded operational state. With rate $\lambda_2$, the system can move from state D1 to state D2, which is the failed state. The CM, with repair rate $\mu$, will take the system back to state UP, and all PM actions take the system back to state UP.

    The system is available when it is in states UP or D1. While the system is available, periodic maintenance will take place with rate $\delta$. The system is not available when it is in state PM, or in state D2. The rate to finish PM is denoted as $m$.

    The steady state availability can be obtained by solving the usual set of balance equations. Because we are interested in $\delta$, the availability is represented as a function $A(\delta)$ of $\delta$ with the other parameters fixed. This model is fairly simple to analyze, yielding

    $$A(\delta) = \left[1 + \frac{\delta}{m} + \frac{\lambda_1 \lambda_2}{\mu(\lambda_1 + \lambda_2 + \delta)}\right]^{-1}. \tag{3}$$
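The balance equations for the four-state chain can be solved by hand. The following sketch does so in Python and compares the result with a closed-form expression for the availability. The function name, the argument names, and the numerical values are assumptions made for illustration. The closed form used in the check is the one obtained from these balance equations, and its layout may differ from the original (3).

```python
def steady_state(lam1, lam2, delta, mu, m):
    """Solve the balance equations of the chain with states UP, D1, D2, PM."""
    up = 1.0                              # unnormalized probability of UP
    d1 = lam1 * up / (lam2 + delta)       # flow into D1 equals flow out of D1
    d2 = lam2 * d1 / mu                   # flow into D2 equals flow out of D2
    pm = delta * (up + d1) / m            # flow into PM equals flow out of PM
    total = up + d1 + d2 + pm
    return {"UP": up / total, "D1": d1 / total, "D2": d2 / total, "PM": pm / total}

# Hypothetical rates (per hour), chosen only to exercise the formulas.
lam1, lam2, delta, mu, m = 0.001, 0.002, 0.005, 0.1, 6.0
pi = steady_state(lam1, lam2, delta, mu, m)

availability = pi["UP"] + pi["D1"]        # the system is up in UP and D1
closed_form = 1.0 / (1.0 + delta / m + lam1 * lam2 / (mu * (lam1 + lam2 + delta)))
print(f"balance equations: {availability:.8f}, closed form: {closed_form:.8f}")
```

The remaining balance equation, for state UP, is redundant and serves as a consistency check: the flow out of UP, `(lam1 + delta) * pi["UP"]`, equals `mu * pi["D2"] + m * pi["PM"]`.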

    The optimal maintenance rate can be obtained by differentiating the availability expression above with respect to the PM rate $\delta$, and solving $A'(\delta) = 0$. The resulting quadratic equation for $\delta$ has a non-negative root only when

    $$\frac{m}{\mu} \ge \frac{(\lambda_1 + \lambda_2)^2}{\lambda_1 \lambda_2}. \tag{4}$$

    When this condition is met, the maximum availability occurs when

    $$\delta = \sqrt{\frac{m \lambda_1 \lambda_2}{\mu}} - (\lambda_1 + \lambda_2). \tag{5}$$

    If condition (4) is not met, then the availability is a decreasing function of $\delta$, and the maximum occurs at $\delta = 0$. Note that from (5), as $m$ increases (i.e. average duration of PM decreases), the optimal $\delta$ increases, as it should.

    If we ignored the maintenance (i.e. set $\delta = 0$), and make the failed state D2 absorbing, then the model depicted in Fig. 1 corresponds to a system that has a two-stage hypo-exponential failure time distribution, with parameters $\lambda_1$ and $\lambda_2$. Let $X$ be the random variable representing the failure times using this two-stage hypo-exponential distribution. Then $X$ has the probability density function $f(t) = \frac{\lambda_1 \lambda_2}{\lambda_2 - \lambda_1}\left(e^{-\lambda_1 t} - e^{-\lambda_2 t}\right)$, $t \ge 0$, and the cumulative distribution function $F(t) = 1 - \frac{\lambda_2}{\lambda_2 - \lambda_1} e^{-\lambda_1 t} + \frac{\lambda_1}{\lambda_2 - \lambda_1} e^{-\lambda_2 t}$. Note that these formulae also work for the case $\lambda_1 = \lambda_2$ if one takes the limit as $\lambda_2 \to \lambda_1$.

    This failure distribution clearly does not have the memoryless property. Therefore, if we were to combine the UP and D1 states into one operational state with the time-to-failure modeled directly by the hypo-exponential distribution, then the occupancy time in the operational state is no longer exponentially distributed. However, such a non-Markovian model can be converted to a Markovian one, as shown in Fig. 1, which facilitates the pursuit of the optimization problem analytically.

    Thus, by modeling the system as shown in Fig. 1, we have created a continuous-time Markov model for this system in which CM takes place upon failure (the transition labeled with $\mu$), and PM occurs at a constant rate $\delta$ while the system is operational. Note that a variation on this model could be to increase the rate of entering PM from the degraded state D1 (e.g. make it instead of $\delta$). However, this tacitly assumes that the state D1 is visible, meaning one is always aware of which degraded state the system is in. Recalling the device-of-stages method discussed in Section I, the degraded states here are hidden, and only used so as to convert the process into a Markovian one so that an analytical solution can be obtained.

    The availability as a function of the PM rate $\delta$ is displayed for two examples in Figs. 2 and 3. The system in Fig. 2 has a repair rate $\mu$ of 0.1 (per hour); the maintenance rate $m$ is set to 6 per hour (i.e. 10 minute average duration), and , (per hour). The optimal $\delta$ occurs at 0.007955 per hour. Fig. 3 has a repair rate $\mu$ equal to 1 (ten times faster in terms of repair activity) with all other parameters the same as in the previous figure. The optimal $\delta$ is 0.0004641 per hour.
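The two reported optima can be reproduced approximately from a closed-form optimum for the two-stage model. In the sketch below, the degradation rates 0.001 and 0.002 per hour are assumed values, chosen because they reproduce the reported optima to within about 1e-6 per hour. They are not quoted from the text, and the function names are introduced here for illustration.

```python
import math

def availability(delta, lam1, lam2, mu, m):
    """Two-stage availability as a function of the PM rate delta."""
    return 1.0 / (1.0 + delta / m + lam1 * lam2 / (mu * (lam1 + lam2 + delta)))

def optimal_pm_rate(lam1, lam2, mu, m):
    """Closed-form maximizer of the two-stage availability; 0.0 means no PM."""
    root = math.sqrt(m * lam1 * lam2 / mu) - (lam1 + lam2)
    return max(root, 0.0)

lam1, lam2, m = 0.001, 0.002, 6.0      # lam1 and lam2 are assumed values
for mu in (0.1, 1.0):                  # repair rates of the two examples
    delta_opt = optimal_pm_rate(lam1, lam2, mu, m)
    best = availability(delta_opt, lam1, lam2, mu, m)
    print(f"mu = {mu}: optimal PM rate {delta_opt:.7f}/h, availability {best:.6f}")
```

With the faster repair, a failure costs less downtime, so the optimal PM rate drops by more than an order of magnitude.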

    III. THE GENERAL MODEL

    We now turn to the general case. Fig. 4 illustrates a general continuous-time Markov chain model for the PM system with $n$ operational stages (i.e. $n - 1$ degraded stages). The system is available if it is in state UP, or in any of the states D1 through D$(n-1)$; the system is down if it is in state PM, where the system is under PM, or if it is in state Dn, where the system has failed.

    Fig. 2. Availability vs. PM rate example 1: ( , , ).

    Fig. 3. Availability vs. PM rate example 2: ( , , ).

    Fig. 4. A generalized Markov chain model for PM systems with $n$ operational stages.

    This system can be solved in general the same way as the $n = 2$ case. Alternatively, the general solution for availability in systems having multiple outage types has been worked out in [15] for the more general semi-Markov case, which can be directly applied here.

    Fig. 5. A state diagram model for PM systems with general distributions.

    In [14] and [9], methods were developed to obtain the availability for systems with periodic PM with general system failure distributions. This work is also related to the research presented in [7], where the system availability assuming non-exponentially distributed times-to-outages was derived. In particular, the model shown in Fig. 5 describes such a system's behavior when there are two outage types. The UP state is the initial state where the system is functional. $F(t)$ is the cumulative distribution function for the time to failure. $G(t)$ represents the cumulative distribution function for the time to trigger PM. $F_{\mathrm{PM}}(t)$ is the cumulative distribution function for the PM task to be completed, and $F_{\mathrm{CM}}(t)$ is the cumulative distribution function for the corrective repair times. For such a system, availability is defined as the long-run probability of being in state UP or, alternatively, the long-run fraction of time spent in state UP. Denoting state UP as state 0, state Down as state 1, and state PM as state 2, the availability can be calculated using the formula from [15]:

    $$A = \frac{h_0}{h_0 + p_{01} h_1 + p_{02} h_2}. \tag{6}$$

    $p_{01}$ and $p_{02}$ are the one-step transition probabilities, given by

    $$p_{01} = \int_0^\infty [1 - G(t)]\,dF(t), \tag{7}$$

    $$p_{02} = \int_0^\infty [1 - F(t)]\,dG(t). \tag{8}$$

    Moreover, $h_0$, $h_1$, and $h_2$ are the mean sojourn times in the corresponding states. Thus,

    $$h_0 = \int_0^\infty [1 - F(t)][1 - G(t)]\,dt, \tag{9}$$

    $$h_1 = \int_0^\infty [1 - F_{\mathrm{CM}}(t)]\,dt, \tag{10}$$

    $$h_2 = \int_0^\infty [1 - F_{\mathrm{PM}}(t)]\,dt. \tag{11}$$

    Comparing the state diagram model shown in Fig. 5 with the Markov chain model displayed in Fig. 4, we see that, for our model, $F(t)$ corresponds to the cumulative distribution of the sum of $n$ $s$-independent exponentially distributed random variables having respective rates $\lambda_1, \ldots, \lambda_n$, i.e. the hypo-exponential distribution [12]. This distribution describes the process of degradation of our system towards failure, whereas the cumulative distributions $G(t)$, $F_{\mathrm{PM}}(t)$, and $F_{\mathrm{CM}}(t)$ correspond to those of exponential distributions with rates $\delta$, $m$, and $\mu$ respectively. Therefore, we have $h_1 = 1/\mu$, and $h_2 = 1/m$. Also, from the diagram, it is clear that $p_{02} = 1 - p_{01}$. To get the remaining quantities, we make use of the fact that the Laplace Transform of the exponential density with rate $\lambda$ is

    $$\frac{\lambda}{s + \lambda}, \tag{12}$$

    and that the Laplace Transform of a convolution of densities (i.e. the distribution of the sum of $s$-independent random variables) is the product of the Laplace Transforms of those densities. Hence, the Laplace Transform of the aforementioned hypo-exponential density is

    $$\prod_{i=1}^{n} \frac{\lambda_i}{s + \lambda_i}. \tag{13}$$

    From (13), and integration by parts, we obtain

    $$p_{01} = \prod_{i=1}^{n} \frac{\lambda_i}{\lambda_i + \delta}, \tag{14}$$

    and

    $$h_0 = \frac{1}{\delta}\left[1 - \prod_{i=1}^{n} \frac{\lambda_i}{\lambda_i + \delta}\right]. \tag{15}$$

    Putting these together into (6) immediately gives

    $$A(\delta) = \left[1 + \frac{\delta}{m} + \frac{\delta}{\mu} \cdot \frac{\prod_{i=1}^{n} \lambda_i}{\prod_{i=1}^{n} (\lambda_i + \delta) - \prod_{i=1}^{n} \lambda_i}\right]^{-1}. \tag{16}$$
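A direct way to evaluate the $n$-stage availability is through the semi-Markov quantities themselves. The sketch below is written under the assumption that PM triggering, repair, and PM completion are exponential. The names `availability`, `lams`, `delta`, `mu`, and `m` are labels chosen here, and the numerical values are hypothetical.

```python
import math

def availability(delta, lams, mu, m):
    """Steady-state availability of the n-stage model with PM rate delta."""
    if delta == 0.0:
        mttf = sum(1.0 / lam for lam in lams)     # no PM: up until failure
        return mttf / (mttf + 1.0 / mu)
    p01 = math.prod(lam / (lam + delta) for lam in lams)  # P(failure before PM)
    h0 = (1.0 - p01) / delta                      # mean operating time per cycle
    h1, h2 = 1.0 / mu, 1.0 / m                    # mean repair and PM durations
    return h0 / (h0 + p01 * h1 + (1.0 - p01) * h2)

# Hypothetical two-stage check against the closed form of the simple case.
lam1, lam2, delta, mu, m = 0.001, 0.002, 0.005, 0.1, 6.0
general = availability(delta, [lam1, lam2], mu, m)
two_stage = 1.0 / (1.0 + delta / m + lam1 * lam2 / (mu * (lam1 + lam2 + delta)))
print(f"general formula {general:.8f}, two-stage closed form {two_stage:.8f}")
```

The product over stages is the Laplace transform of the hypo-exponential density evaluated at `delta`, which is why no explicit integration is needed.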

    From this expression, it is clear that the availability depends on the parameters , , , , and only through the values , , , and . Define the function

    (17)

    Note that is a polynomial in . Hence, the apparent singularity in at is removable (i.e. the limit as approaches 0 is finite), and we may define

    (18)

    so that is defined for all and continuous. Then the availability in (16) can be written more compactly as

    (19)

    It follows immediately that, when , the availability is given by

    (20)
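As a cross-check on the no-PM case, the limit of the availability as the PM rate goes to zero can be worked out directly. The block below is a sketch in assumed notation ($\lambda_i$ for the stage rates, $\mu$ for the repair rate, $\delta$ for the PM rate), and its layout may differ from the original (20). As $\delta \to 0$, the probability of failing before PM tends to 1, and the mean operating time per cycle tends to the mean time to failure.

```latex
\lim_{\delta \to 0} A(\delta)
  = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + 1/\mu},
\qquad
\mathrm{MTTF} = \sum_{i=1}^{n} \frac{1}{\lambda_i}.
```

For example, with hypothetical rates $\lambda_1 = 0.001$, $\lambda_2 = 0.002$, and $\mu = 0.1$ per hour, $\mathrm{MTTF} = 1500$ hours, and the availability without PM is $1500/1510 \approx 0.9934$.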

    Fig. 6 illustrates the availability vs. PM rate using (19), for a system with , , , , , , and . In the next section, we will describe how to find the optimal PM rate that maximizes the availability.

    IV. OPTIMIZING THE AVAILABILITY

    Maximizing availability is equivalent to minimizing the quantity inside the brackets in (19). Differentiating that quantity with respect to and setting it equal to 0 yields

    (21)

    As in the $n = 2$ case discussed earlier, roots of this equation are candidate values of at which the availability is maximized. Though there is no closed form for the roots, we can develop a necessary and sufficient condition for there to be a solution to (21) that occurs at some value . Moreover, we can argue that such a root is unique, and yields the maximum availability. By direct differentiation, and using the fact that

    (22)

    we have

    (23)

    Applying L'Hôpital's rule to (23), it is seen that, as approaches infinity, the limit is 0, and as approaches 0, the limit is

    (24)


    Because the function is a polynomial with positive coefficients, and so is increasing in , the function is also increasing, and thus

    (25)

    It can also be shown that the left side of (23) is decreasing in by arguing that for . Note that the left hand side of this inequality is a polynomial in of degree with the constant term, and all coefficients of powers of are negative. This result can be seen by multiplying out the polynomials involved. We omit the tedious details.

    Because the left hand side of (23) is non-negative and decreasing for , taking value

    at , and value 0 as approaches infinity, then as long as

    (26)

    which is equivalent to

    (27)

    there will be a unique positive solution to (21). This solution yields a minimum to the function inside the brackets in (19). This result follows from the negativity of the quantity for , which implies that the solution yields a maximum availability as defined by (19). When $n = 2$, (21) is easily seen to agree with (4). Also, when (27) is not true, it follows that the maximum availability occurs when the PM rate is 0.

    For the 4-operational-stage example shown in Fig. 6, the right side of (27) is 0.336, and . Thus the optimal PM rate exists, as Fig. 6 confirms. By (27), as long as the average time to complete a PM activity is less than 33.6% of the average corrective repair time, the availability can be increased by performing PM.

    When (27) is satisfied, we see from (23) that the optimal PM rate can be obtained uniquely by solving

    (28)
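The existence condition and the numerical search can be sketched together. Everything in the block below is an assumption made for illustration. The threshold is written directly in terms of the stage rates (obtained here from the slope of the downtime-per-uptime ratio at zero PM rate), the root is located with a secant iteration on that slope, and the parameter values (hot standby with four units, unit failure rate 0.001, repair rate 0.1, PM completion rate 6, all per hour) are hypothetical values that happen to reproduce the reported 0.336 and 0.99877.

```python
import math

def availability(delta, lams, mu, m):
    """n-stage availability; delta = 0 means no PM."""
    if delta == 0.0:
        mttf = sum(1.0 / lam for lam in lams)
        return mttf / (mttf + 1.0 / mu)
    p_fail = math.prod(lam / (lam + delta) for lam in lams)
    return 1.0 / (1.0 + delta / m + delta * p_fail / (mu * (1.0 - p_fail)))

def pm_threshold(lams):
    """Largest ratio mu/m for which some PM still improves availability."""
    inv = [1.0 / lam for lam in lams]
    s1 = sum(inv)
    s2 = sum(v * v for v in inv)
    return (s1 * s1 - s2) / (2.0 * s1 * s1)

def downtime_slope(delta, lams, mu, m):
    """Derivative in delta of expected downtime per unit uptime (1/A - 1)."""
    p_fail = math.prod(lam / (lam + delta) for lam in lams)
    s = sum(1.0 / (lam + delta) for lam in lams)
    q = 1.0 - p_fail
    return 1.0 / m + (p_fail / q - delta * p_fail * s / (q * q)) / mu

def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Secant iteration for a root of f, started from the guesses x0 and x1."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
        if abs(x1 - x0) < tol:
            break
    return x1

lams = [0.004, 0.003, 0.002, 0.001]   # assumed hot-standby stage rates
mu, m = 0.1, 6.0                      # assumed repair and PM completion rates
threshold = pm_threshold(lams)
if mu / m < threshold:
    delta_star = secant(lambda d: downtime_slope(d, lams, mu, m), 0.003, 0.006)
else:
    delta_star = 0.0                  # PM cannot improve availability
best = availability(delta_star, lams, mu, m)
print(f"threshold {threshold:.3f}, optimal PM rate {delta_star:.5f}/h, A = {best:.5f}")
```

The two starting guesses bracket the optimum here. In practice they could be read off a plot of availability against the PM rate.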

    Fig. 6. Availability vs. PM rate for a system with , , , , , , and .

    If we denote the function

    (29)

    then the secant method, a variation on Newton's iterative method, will determine a numerical approximation to the root of . This method starts with two initial guesses, , , and proceeds by forming a sequence iteratively as

    (30)

    The sequence can be stopped whenever is less than a pre-specified error tolerance. A plot of availability (19) will quickly identify a region where the maximum occurs, so that the initial points , can be selected.

    To illustrate the foregoing, consider the system in the example of Fig. 6 with four distinct operational stages, i.e., $n = 4$. The transition rates are , , , , , and . Solving (28) via (30) gives an optimal availability of 0.99877, which occurs when , corresponding to a PM rate of .

    To help validate these results, we simulated the example of Fig. 6 using the MATLAB code shown in the Appendix. The same set of parameter values is used. For the PM rate, we took the analytical (optimum) result of solving (28) and (30); namely, . The simulation generates 10,000 cycles (returns to the full up state UP), and accumulates all the up-time and down-time incurred during all these cycles, returning the total up-time divided by the total up-time plus total down-time as the simulation estimate of steady state availability. We ran this simulation 10 times (a total of 100,000 cycles), yielding a mean (average of the 10 simulation estimates) availability of 0.99875 with a standard error (standard deviation of the 10 simulation estimates, divided by the square root of 10) of 0.000017, in excellent agreement with the analytical optimum of 0.99877.
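The simulation procedure described above can be mirrored in Python. This is not the Appendix MATLAB code. It is a minimal sketch with hypothetical parameter values (hot standby with four units, unit failure rate 0.001, repair rate 0.1, PM completion rate 6, PM rate 0.00463, all per hour) and a fixed seed, following the same recipe of 10 runs of 10,000 cycles each.

```python
import math
import random

def simulate_availability(lams, delta, mu, m, cycles, rng):
    """Estimate availability from regenerative cycles that start in state UP."""
    up_total = down_total = 0.0
    for _ in range(cycles):
        time_to_failure = sum(rng.expovariate(lam) for lam in lams)
        time_to_pm = rng.expovariate(delta)
        if time_to_failure < time_to_pm:
            up_total += time_to_failure
            down_total += rng.expovariate(mu)   # corrective repair
        else:
            up_total += time_to_pm
            down_total += rng.expovariate(m)    # preventive maintenance
    return up_total / (up_total + down_total)

rng = random.Random(2013)                       # fixed seed for repeatability
lams = [0.004, 0.003, 0.002, 0.001]             # assumed stage rates
estimates = [simulate_availability(lams, 0.00463, 0.1, 6.0, 10_000, rng)
             for _ in range(10)]

mean_est = sum(estimates) / len(estimates)
variance = sum((e - mean_est) ** 2 for e in estimates) / (len(estimates) - 1)
std_err = math.sqrt(variance / len(estimates))
print(f"mean availability {mean_est:.5f}, standard error {std_err:.6f}")
```

Because each return to UP is a regeneration point, the ratio of accumulated up-time to total time converges to the steady-state availability.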


    V. APPLICATIONS

    The hypo-exponential distribution is a general model which can describe many widely-applied systems [16], including hot standby, cold standby, and warm standby systems. This section will demonstrate the methods derived above through these commonly used systems.

    A. Standby Systems With Identical Units

    An important special case that occurs frequently in fault tolerant system design is the parallel system with $n$ identical units (i.e., one unit providing service with $n - 1$ units as redundant, or standby units). Two cases are prevalent. One, called hot standby, has all standby units switched on at all times. The other, named cold standby, has all standby units switched off (hence they are assumed not to fail) until they are called into service.

    These are easily handled by the general model developed in Sections III and IV. For the cold standby case, we have $\lambda_i = \lambda$, $i = 1, \ldots, n$; and for the hot standby case, we have $\lambda_i = (n - i + 1)\lambda$, $i = 1, \ldots, n$. For these special cases, it is seen from (28) defining the optimal for availability that a further reduction is possible. As before, we set . For the cold standby case, (28) becomes

    (31)

    where . The condition (27) for an optimal PM rate to exist becomes

    (32)

    and the availability becomes

    (33)

    For the hot standby case, (28) becomes

    (34)

    expression (27) becomes

    (35)

    and the availability becomes

    (36)
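The reduction for the two standby cases can be sketched in assumed notation. Writing $\lambda$ for the unit failure rate, $x = \delta/\lambda$ for the normalized PM rate, and $\rho = \mu/m$ for the ratio of mean PM duration to mean repair duration (all labels chosen here, so the layout may differ from the original (31) through (36)), the downtime per unit uptime becomes:

```latex
\begin{aligned}
\text{cold standby } (\lambda_i = \lambda):\quad
  \frac{1}{A} - 1 &= \frac{\lambda}{\mu}
    \left[\rho x + \frac{x}{(1+x)^n - 1}\right],
  &\text{PM helps iff } \rho &< \frac{n-1}{2n}, \\
\text{hot standby } (\lambda_i = (n-i+1)\lambda):\quad
  \frac{1}{A} - 1 &= \frac{\lambda}{\mu}
    \left[\rho x + \frac{x\, n!}{\prod_{k=1}^{n}(k+x) - n!}\right],
  &\text{PM helps iff } \rho &< \frac{H_n^2 - H_n^{(2)}}{2 H_n^2},
\end{aligned}
\qquad
H_n = \sum_{k=1}^{n} \frac{1}{k},\quad
H_n^{(2)} = \sum_{k=1}^{n} \frac{1}{k^2}.
```

In both cases the bracketed term depends only on $n$, $\rho$, and $x$, so the optimal $x$ can be tabulated once per pair $(n, \rho)$, while $\lambda/\mu$ enters only as a scale factor on the downtime. For hot standby with $n = 4$, the bound evaluates to 0.336.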

    Thus, in both cases, a given combination of satisfying(31) and (32) (or (34) and (35)) determines availability via (33)(or (36)) for any choice of . In Tables I and II, we present so-lutions to (31) and (34). For each value of through 10, arange of satisfying (32) and (35), respectively, is presented.

    TABLE IVALUES OF DEFINING THE OPTIMAL PM RATE FOR THE COLD STANDBY CASE

    For values of outside this range, PM cannot improve availability, so would be optimal. These tables aid in designing optimal PM for these types of systems.

    TABLE II: VALUES OF DEFINING THE OPTIMAL PM RATE FOR THE HOT STANDBY CASE

    As an example of the use of these tables, let us revisit the example leading to Fig. 6, which is a hot standby case with , , , . As we stated earlier, the general equations and numerical methods of Section IV were used to derive the optimal , corresponding to an optimal PM rate of . The optimal availability is 0.99877. Not knowing these results, suppose that a systems engineer wants a quick assessment of the optimal PM for this design. Because this is a case of a parallel system with hot standby units, Table II can be applied. For this system, the values of and fall in the range covered by Table II, so PM can improve availability in an optimal fashion. Note that had exceeded the value determined by (35), which in this case is 0.336, PM could not improve the availability for this system (i.e., is optimal). Interpolating the value of from the column with between the values 0.01 and 0.02 in Table II yields , from which it follows that , and , all reasonably close to the more precise calculations using the methods of Section IV. Plugging in the value of interpolated from the table into (36) gives the optimal availability approximately as 0.99877, which matches the more precise calculation to the precision reported.

    Now suppose we wish to repeat the analysis with a different value of the transition rate, say . The previous estimate of does not change. But now we have , and . The more precise methods of Section IV yield . Using the new value of the transition rate (normalized by ) in (36) yields 0.99816 as the approximate optimal availability, which also agrees with the more precise calculation using the methods of Section IV.

    B. Warm Standby Systems

    Consider a hybrid system where one unit is required to be operating with the remaining units in a de-energized standby status, referred to as the warm standby system. The active unit has an exponentially distributed failure time with transition rate ; while the warm standby units can fail with transition rate . Thus, in the discussions in Sections III and IV can be described as: , , , and , where . The optimal is obtained by solving (28), which can be re-expressed as

    (37)

    Denoting , and , then (28) becomes

    (38)

    To check the existence condition for an optimal PM rate, (27) becomes

    (39)


    Fig. 7. Availability vs. PM rate for a system with warm-, hot- and cold-standby units , , , , and .

    and availability is derived as

    (40)

    Because (38) depends on both and , no single table (like Tables I and II) can be derived.

    As an example, suppose is 4, and let be 0.001, be 0.0005. Thus, is 0.5. According to (39), the condition for an optimal PM rate to exist is . If the previous repair rate is applied, i.e., , then the condition is satisfied, and an optimal PM rate exists. Applying the secant method described in (30), the optimal PM rate for the warm-standby case is 0.0036 per hour, which corresponds to a mean time between PM actions of about 278 hours. The corresponding availability is 0.9991.

    A plot of the availability for the systems described in this section, under the conditions of warm-, hot- and cold-standby, is shown in Fig. 7. From this figure, we see that, to achieve the same availability, the cold-standby system has the smallest PM rate due to the assumption that cold standby units will not fail until used.
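A secant iteration of the kind used in the warm-standby example can be sketched as follows. The sketch assumes that PM is triggered from every operational stage and that each surviving standby unit adds its own failure rate to that of the active unit. The failure rates 0.001 and 0.0005 and the four stages come from the example above; the repair rate (0.1 per hour) and PM completion rate (6 per hour) are illustrative assumptions, as are all function and variable names. Under these assumptions the iteration settles near 0.0036 per hour with an availability close to 0.9991.

```python
def failure_before_pm(delta, rates):
    """Probability that the system fails before PM: prod rate/(rate+delta)."""
    p = 1.0
    for rate in rates:
        p *= rate / (rate + delta)
    return p


def availability(delta, rates, mu, mu_pm):
    """Long-run availability from a renewal argument (up time / cycle time)."""
    ell = failure_before_pm(delta, rates)
    up = (1.0 - ell) / delta                 # mean up time per cycle
    down = ell / mu + (1.0 - ell) / mu_pm    # mean down time per cycle
    return up / (up + down)


def optimality_residual(delta, rates, mu, mu_pm):
    """d/d(delta) [delta L / (1 - L)] + mu / mu_pm; zero at the optimum."""
    ell = failure_before_pm(delta, rates)
    s = sum(1.0 / (rate + delta) for rate in rates)
    slope = ell / (1.0 - ell) - delta * ell * s / (1.0 - ell) ** 2
    return slope + mu / mu_pm


def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Plain secant iteration for a root of f, started from x0 and x1."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
        if abs(x1 - x0) < tol:
            break
    return x1


lam, lam_s, n = 0.001, 0.0005, 4             # from the example in the text
mu, mu_pm = 0.1, 6.0                         # assumed repair / PM completion rates
# Assumed pattern: active-unit rate plus one lam_s per surviving standby unit.
rates = [lam + k * lam_s for k in range(n - 1, -1, -1)]

delta_opt = secant(lambda d: optimality_residual(d, rates, mu, mu_pm), 0.002, 0.005)
print(f"optimal PM rate ~ {delta_opt:.4f} per hour")
print(f"availability    ~ {availability(delta_opt, rates, mu, mu_pm):.4f}")
```

The starting points 0.002 and 0.005 bracket the root for these parameter values; other parameter choices may need different starting points or a safeguarded method.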

    VI. CONCLUSIONS

    We have analyzed a system with two types of outages (system failure, and PM), and a hypo-exponential time-to-failure distribution, for the purpose of finding the optimal periodic PM rate to achieve the best system availability. The hypo-exponential distribution is represented by decomposing the operational state into a fully UP state and degraded stages of operation so that a continuous time Markov chain model can be used in the analysis. In so doing, we have determined general conditions under which PM can improve availability for this system, and shown that under these conditions, with other system parameters fixed, a unique optimal rate of PM exists. Moreover, we have outlined stable numerical methods for computing the optimal rate of PM, which are more efficient than Monte Carlo simulation methods. Finally, we have applied the general model to some special cases of a parallel system with identical units in cold or hot standby, as well as a system with warm-standby units.

    For the two systems with hot or cold standby units, a further reduction of the number of parameters determining the optimal PM has been developed, leading to the construction of two useful tables (one for the cold, and one for the hot standby case) giving the optimal ratio of PM rate to unit failure rate as a function of the product of the repair rate and the mean time to perform PM. These two tables, developed for parallel systems with units, provide the systems engineer a way to quickly determine whether PM can improve availability, and if so, quickly compute the optimal PM rate and optimal availability.

    We note that the optimum PM rate is sensitive to the values of the other system parameters, as demonstrated in the examples corresponding to Figs. 2 and 3 above. The methods developed here for computing the optimum PM rate are stable, relatively efficient, and easy to implement, making any needed sensitivity analysis straightforward to compute.

    Having determined an optimum rate of PM, we have in effect solved an optimal cost problem. We have found the rate of PM that minimizes the maintenance cost, where cost is measured solely in terms of long run average down time between system renewals. We acknowledge that there are also other economic costs to performing PM that may be worthwhile to consider. For example, one could consider finding the optimum rate of PM subject to various economic constraints on maintenance. Moreover, in practice there is often considerable uncertainty in setting values for the fixed parameters in this model (failure or degradation rates, and repair rates). For this reason, it may be useful to implement other optimization methods that would allow the development of confidence bounds on the optimum PM rate that reflect these uncertainties. We leave these as areas for future research.

    Finally, it is clear from the work in [2] that all the results of this paper remain true if the PM duration, and repair time have general distributions with means , and , respectively.
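The closing remark on general distributions can be motivated with a renewal-reward sketch. It assumes that the time until the next PM is exponential with rate \delta and independent of the time to failure, and that the system is as good as new after either a repair or a PM. The symbols T, X, m_R, m_M, \lambda_i, and L(\delta) are introduced here for illustration and are not necessarily the paper's notation. Only the means of the repair time and PM duration enter the result.

```latex
% T: hypo-exponential time to failure with stage rates \lambda_i (assumed notation)
% X ~ Exp(\delta): time until the next PM, independent of T
% m_R, m_M: mean repair time and mean PM duration (arbitrary distributions)
\begin{align*}
A(\delta)
  &= \frac{\mathrm{E}[\min(T,X)]}
          {\mathrm{E}[\min(T,X)] + m_R\,\Pr(T<X) + m_M\,\Pr(X<T)}, \\
\Pr(T<X)
  &= \mathrm{E}\!\left[e^{-\delta T}\right]
   = \prod_{i}\frac{\lambda_i}{\lambda_i+\delta} =: L(\delta),
\qquad
\mathrm{E}[\min(T,X)] = \frac{1-L(\delta)}{\delta}, \\
A(\delta)
  &= \left[\,1 + \frac{\delta\,L(\delta)}{1-L(\delta)}\,m_R + \delta\,m_M \right]^{-1}.
\end{align*}
```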

    APPENDIX

    MATLAB SIMULATION CODE FOR THE EXAMPLE OF FIG. 6

    %Simulation for the example of Fig. 6:

    ; ; ;;

    ; ; ; %optimum delta from (22) & (23)

    % states: 0 full up; 1 degraded; 2 degraded;

    % 3 degraded; 4 failed (down); 5 prev. maint.(down)

    ; %successive visits to state 0


    ; ;

    ; %initial state

    ;

    while cycles < totalcycles

    if

    ;

    ;

    if

    ;

    else

    ;

    end

    end

    if

    ;

    ;

    if

    ;

    else

    ;

    end

    end

    if

    ;

    ;

    if

    ;

    else

    ;

    end

    end

    if

    ;

    ;

    if

    ;

    else

    ;

    end

    end

    if

    ;

    ;

    ;

    end

    if

    ;

    ;

    ;

    end

    ;

    if

    ;

    end

    end

    ;
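A comparable cycle-based simulation can be sketched in Python. State numbering follows the comments in the listing above (0 fully up, 1 to 3 degraded, 4 failed, 5 in preventive maintenance), and a cycle is counted at each return to state 0. The competing-exponentials logic, the function and variable names, and every parameter value (stage rates for four hot-standby units with unit failure rate 0.001, PM rate 0.00463, repair rate 0.1, PM completion rate 6, all per hour) are assumptions made for illustration and are not recovered from the original listing.

```python
import random


def simulate_availability(stage_rates, pm_rate, repair_rate,
                          pm_completion_rate, total_cycles, seed=0):
    """Estimate long-run availability by simulating returns to state 0.

    States 0..n-1 are operational stages, state n is failed (down),
    and state n+1 is preventive maintenance (down).
    """
    rng = random.Random(seed)
    n = len(stage_rates)
    failed, in_pm = n, n + 1
    up_time = 0.0
    down_time = 0.0
    state = 0
    cycles = 0
    while cycles < total_cycles:
        if state < n:
            # Competing exponential clocks: degradation versus PM.
            t_degrade = rng.expovariate(stage_rates[state])
            t_pm = rng.expovariate(pm_rate)
            if t_degrade < t_pm:
                up_time += t_degrade
                state += 1           # leaving stage n-1 means failure
            else:
                up_time += t_pm
                state = in_pm
        elif state == failed:
            down_time += rng.expovariate(repair_rate)
            state = 0
        else:                        # preventive maintenance in progress
            down_time += rng.expovariate(pm_completion_rate)
            state = 0
        if state == 0:
            cycles += 1              # one more return to the fully up state
    return up_time / (up_time + down_time)


# Illustrative parameters (assumed, per hour).
estimate = simulate_availability(
    stage_rates=[0.004, 0.003, 0.002, 0.001],
    pm_rate=0.00463,
    repair_rate=0.1,
    pm_completion_rate=6.0,
    total_cycles=20000,
    seed=1,
)
print(f"estimated availability: {estimate:.4f}")
```

With these assumed values the estimate should fall close to the optimal availability of 0.99877 quoted for the example of Fig. 6, up to Monte Carlo noise.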

    REFERENCES

    [1] W. Li, Risk Assessment of Power Systems: Models, Methods, and Applications. New York, NY, USA: Wiley-IEEE, 2005, 0-471-63168.

    [2] J. Moubray, Reliability-Centered Maintenance. New York, NY, USA: Industrial Press, 1997, 0-8311-3146-2.

    [3] P. Hilber, V. Miranda, M. A. Matos, and L. Bertling, "Multiobjective optimization applied to maintenance policy for electrical networks," IEEE Trans. Power Syst., vol. 22, no. 4, pp. 1675–1682, Nov. 2007.

    [4] M. Stopczyk, B. Sakowicz, and G. J. Anders, "Application of a semi-Markov model and a simulated annealing algorithm for the selection of an optimal maintenance policy for power equipment," Int. J. Rel. Safety, vol. 2, no. 1/2, pp. 129–145, 2008.

    [5] F. Yang and C. S. Chang, "Multiobjective evolutionary optimization of maintenance schedules and extents for composite power systems," IEEE Trans. Power Syst., vol. 24, no. 4, pp. 1694–1702, Nov. 2009.

    [6] S. H. Sim and J. Endrenyi, "Optimal preventive maintenance with repair," IEEE Trans. Rel., vol. 37, no. 1, pp. 92–96, Apr. 1988.

    [7] D. Chen and K. S. Trivedi, "Analysis of periodic preventive maintenance with general component failure distribution," presented at the Pacific Rim Int. Symp. Dependable Comput. (PRDC 2001), Seoul, Korea, Dec. 2001.

    [8] J. Endrenyi, G. J. Anders, and A. M. Leite da Silva, "Probabilistic evaluation of the effect of maintenance on reliability - An application," IEEE Trans. Power Syst., vol. 13, no. 2, pp. 576–583, May 1998.

    [9] B. D. Chen and K. S. Trivedi, "Closed-form analytical results for condition-based maintenance," Rel. Eng. Syst. Safety, vol. 76, pp. 43–51, 2002.

    [10] A. H. Shirmohammadi, Z. G. Zhang, and E. Love, "A computational model for determining the optimal preventive maintenance policy with random breakdowns and imperfect repairs," IEEE Trans. Rel., vol. 56, no. 2, pp. 332–339, Jun. 2007.

    [11] C. Singh and R. Billinton, System Reliability Modeling and Evaluation. London, U.K.: Hutchinson Educational, 1977 [Online]. Available: http://www.ece.tamu.edu/People/bios/singh/sysreliability

    [12] S. M. Ross, Introduction to Probability Models, 9th ed. Hoboken, NJ, USA: Wiley, 2007, pp. 298–301.

    [13] J. Van Casteren, "Assessment of Interruption Costs in Electric Power Systems using the Weibull-Markov Model," Dissertation [Online]. Available: http://webfiles.portal.chalmers.se/et/PhD/VanCasteren-JasperPhD.pdf

    [14] R. E. Barlow and R. Proschan, Mathematical Theory of Reliability. Philadelphia, PA, USA: SIAM, 1996, 978-0-898713-69-5.

    [15] Y. Cao, H. Sun, J. Han, and K. Trivedi, "System availability with non-exponentially distributed outages," IEEE Trans. Rel., vol. 51, no. 2, pp. 193–198, 2002.

    [16] K. S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, 2nd ed. Wiley, 2002, 0-471-33341-7.

    Meng-Lai Yin is a Professor of Electrical and Computer Engineering at the California State Polytechnic University, Pomona. She received her MS, and Ph.D. degrees in Information and Computer Science from the University of California, Irvine, in 1989, and 1995, respectively. She also holds a Master's degree in Electrical and Computer Engineering from National Cheng-Kung University, Taiwan. She has years of industrial experience at Hughes Aircraft and Raytheon. Her research interests include performance and reliability analysis, embedded systems, and parallel processing.


    John E. Angus has been a Professor of Mathematics in the School of Mathematical Sciences at Claremont Graduate University since 1990. He received his MA in Mathematics from UCLA in 1977, and his MS and Ph.D. in Statistics from the University of California at Riverside in 1981. After receiving his BA in Mathematics from the University of San Diego in 1975, he worked for Hughes Aircraft Company as a Systems Engineer until 1990, and has been an active consultant to Raytheon since 1996 in system engineering and algorithm development. His research interests include survival analysis, applied probability, and statistics.

    Kishor S. Trivedi holds the Hudson Chair in the Department of Electrical and Computer Engineering at Duke University, Durham, NC. He has been on the Duke faculty since 1975. He is the author of a well-known text entitled Probability and Statistics with Reliability, Queuing and Computer Science Applications, published by Prentice-Hall; a thoroughly revised second edition (including its Indian edition) of this book has been published by John Wiley. He has also published two other books entitled Performance and Reliability Analysis of Computer Systems, published by Kluwer Academic Publishers, and Queueing Networks and Markov Chains, John Wiley. He is a Fellow of the Institute of Electrical and Electronics Engineers. He is a Golden Core Member of IEEE Computer Society. He has published over 490 articles, and has supervised 44 Ph.D. dissertations. He is the recipient of the IEEE Computer Society Technical Achievement Award for his research on Software Aging and Rejuvenation. His research interests are in reliability, availability, performance, and survivability of computer and communication systems, and in software dependability. He works closely with industry in carrying out reliability and availability analysis, providing short courses on reliability, availability, and in the development and dissemination of software packages such as SHARPE, SREPT, and SPNP.