Tutorial: Petri nets as a graphical description medium for many reliability scenarios

IEEE TRANSACTIONS ON RELIABILITY, VOL. 50, NO. 2, JUNE 2001 159

Tutorial: Petri Nets as a Graphical Description Medium for Many Reliability Scenarios

Winfrid G. Schneeweiss, Senior Member, IEEE

Abstract (Summary & Conclusions): One of the basic problems of dependability modeling is the adequate abstraction of real-world technological problems to the principal terms of reliability/safety scenarios. This concerns primarily the definition of components and of faults on different system levels. Given these terms, the rest of any modeling needs basic logical operations, mostly AND and OR (see fault tree analysis [14]), and delays, and often some counting (of time or events).

All of these basic operations are offered by a fairly simple kind of Petri nets (PN) [1]–[3], i.e., timed stochastic Petri nets, allowing also for the modeling of immediate activities and of such with a deterministic delay.

In this half-tutorial paper it is shown how such Petri net modeling, i.e., the construction of the relevant nets, works in practice. No math will be needed for that. Still, several typical engineering virtues are needed; primarily imagination as to how to i) find simple solutions, since often nonelegant solutions can be correct too, ii) compose larger PN from elementary building blocks, and iii) model the real world by interpreting the so-called tokens of a PN intelligently (and differently!) in different places of one and the same PN.

In the Appendix it will be shown how PN can also help in the analytical analysis of nonrepairable systems. In that context they are superior to state graphs since they show state durations explicitly.

Index Terms: Cost/benefit, dependability, maintenance, modularization, Petri net.

ACRONYMS¹

FCFS first-come-first-served priority
PN Petri net

NOTATION

constants: deterministic firing delays
D random firing delay; also down-time of component
ε an arbitrarily small positive real
L life time (up-time) of component
p ‘place’ of a Petri net
repair time (part or all of the down time) of component
index for system
index for switch for activating spares
firing delay of a timed transition
‘transition’ of a Petri net

Manuscript received October 12, 1999; revised December 22, 1999. Responsible Editor: W. Kuo.

W. G. Schneeweiss is with the FernUniversität, Postfach 940, D-58084 Hagen, Fed. Rep. Germany (e-mail: [email protected]).

Publisher Item Identifier S 0018-9529(01)09549-5.

¹ The singular & plural of an acronym are always spelled the same.

NOTATION (for Appendix only)

component
pdf probability density function
Cdf cumulative distribution function
Cdf, pdf of
random event
indicator of : for good, for bad

I. INTRODUCTION

Motivation and Overview

PN of different types useful for dependability, reliability, and safety modeling are discussed amply in recent textbooks, e.g., [4]–[6], and in articles & conference papers, typically in [7]–[11]. Despite such well-meant activities it is my strong feeling that PN are not yet used to the extent to which they can be used. This observation has recently resulted in the short monograph (textbook) [3]. This paper is a digest of the more important ideas of [3] with novel PN pictures. Because readers are addressed who are known to distrust mathematical formalisms, I avoid mathematical rigor, and concentrate on basic ideas. Only in the Appendix is it demonstrated how PN can also assist analytic modeling (beyond Markov modeling).

Section II is a very short introduction to PN. See [2] for more rigor and many more details.

Section III deals with the systematic design of PN for small (few-component) systems.

Section IV treats the modeling of maintenance.

Section V is concerned with cost/benefit aspects of PN modeling.

Section VI discusses the usefulness of a special pattern, viz., a chessboard pattern, of PN places and PN transitions.

The Appendix shows how life-time distributions of systems without repairs can be determined more easily via PN than via state graphs.

II. BASIC PN DEFINITIONS

A PN is a directed graph (digraph) with two types of nodes in which abstract objects (tokens), drawn as bold-face dots, are moving or are created or are vanishing. Tokens are stored in ‘places’: one of the two types of PN nodes. Operations on tokens, including their delay, happen in ‘transitions’: the other type of PN nodes. Details follow from the ‘switching rule’; see Fig. 1, where round nodes are ‘places’ and square nodes are ‘transitions.’

0018–9529/01$10.00 © 2001 IEEE

Fig. 1. Switching rule of a PN.

Fig. 2. Modeling basic systems: (a) A repairable unit (with a cycling token, jumping from any place to the next one at random times defined by the sample values of L and D). (b) A 1-out-of-2:G system with a cold spare.

Switching rule of PN: If all input places of a transition are marked (contain at least 1 token each), then this transition is ‘enabled’ and switches (‘fires’) after a delay. On firing, the number of tokens in each

• input place of the transition is decreased by 1,
• output place of the transition is increased by 1.
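As a minimal sketch of the switching rule (ignoring the delay), the marking can be held in a dictionary that maps place names to token counts. The function names `is_enabled` and `fire`, and the place names `p1`–`p3`, are illustrative assumptions and do not come from the paper.

```python
from typing import Dict, Iterable


def is_enabled(marking: Dict[str, int], inputs: Iterable[str]) -> bool:
    """A transition is enabled if every input place holds at least 1 token."""
    return all(marking.get(place, 0) >= 1 for place in inputs)


def fire(marking: Dict[str, int], inputs: Iterable[str],
         outputs: Iterable[str]) -> Dict[str, int]:
    """Return the marking after firing: -1 token per input, +1 per output place."""
    inputs = list(inputs)
    if not is_enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)  # leave the caller's marking untouched
    for place in inputs:
        new_marking[place] -= 1
    for place in outputs:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


# A transition with two input places and one output place acts as a logical AND.
m0 = {"p1": 1, "p2": 1, "p3": 0}
m1 = fire(m0, inputs=["p1", "p2"], outputs=["p3"])
```

In this example two tokens are consumed and one is produced, so the total number of tokens changes on firing.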

From this definition, there follows almost everything else in PN modeling. A brief review is:

1) A transition with its neighboring places is, for each output place, a logical AND with a certain delay, the transition’s ‘switching delay’.

2) The change of the marking, defining a kind of local flow of tokens, adds certain properties to the PN that are not found in digital electronics switching networks. Typically the number of tokens of a PN can change on the switching of certain of its transitions.

Fig. 2 shows the first applications of PN modeling of dependability scenarios.

Fig. 2(a) shows the typical cycle of a repairable unit. Fig. 2(b) shows a 1-out-of-2:G (duplex) system with a cold spare, where the token moves to the next place when the first unit fails and on to the last place when the spare fails. Here the PN switching rule is used only in a trivial form, because the transitions have at most 1 input. (Even though PN are almost self-explanatory, a newcomer should study the meaning of the PN in Fig. 2.)

The marking in Fig. 2 concerns a) the beginning of an up-time period (length L), and b) the beginning of system life.
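The cycle of a repairable unit can be illustrated with a short sketch that lists the instants at which the cycling token jumps, given sample values of the up-times L and down-times D. The function names and the numeric sample values are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def unit_cycle(lifetimes: Sequence[float],
               downtimes: Sequence[float]) -> List[Tuple[float, str]]:
    """Switching instants of the cycling token, starting at the beginning of an up-time."""
    events = []
    clock = 0.0
    for life, down in zip(lifetimes, downtimes):
        clock += life                      # up-time L elapses, the unit fails
        events.append((clock, "failure"))
        clock += down                      # down-time D elapses, the unit is up again
        events.append((clock, "repair completed"))
    return events


def up_fraction(lifetimes: Sequence[float], downtimes: Sequence[float]) -> float:
    """Fraction of the observed time spent in the up state (complete cycles only)."""
    n = min(len(lifetimes), len(downtimes))
    up = sum(lifetimes[:n])
    down = sum(downtimes[:n])
    return up / (up + down)


events = unit_cycle([5.0, 3.0], [1.0, 2.0])
fraction = up_fraction([5.0, 3.0], [1.0, 2.0])
```

In a stochastic simulation the two lists would be filled with random samples instead of fixed numbers.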

It is typical of PN modeling that more realistic aspects of systems can be easily added to a first, rough design. For Fig. 2(a), the addition of periodic checks (every T units of time) prior to the beginning of repairs is modeled in Fig. 3.

Fig. 3. A repairable unit with checks every T units of time.

Fig. 4. 1-out-of-2:G system with a cold spare and a fallible switch (with life L) to activate the spare.

In the upper center of Fig. 3, an auxiliary type of edge, the inhibit(or) edge, is introduced. It shows a tiny circle instead of an arrow head. If the place from which an inhibit edge originates is marked, then the transition that the edge points at is blocked: it cannot be enabled, or its delay time does not diminish after enabling.
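Both effects of an inhibit edge can be sketched in a few lines. The place names `unit_down` and `check_pending` and the function names are hypothetical and are not the labels used in Fig. 3.

```python
from typing import Dict, Iterable


def is_blocked(marking: Dict[str, int], inhibitors: Iterable[str]) -> bool:
    """A transition is blocked if any place at the origin of an inhibit edge is marked."""
    return any(marking.get(place, 0) >= 1 for place in inhibitors)


def is_enabled(marking: Dict[str, int], inputs: Iterable[str],
               inhibitors: Iterable[str] = ()) -> bool:
    """Enabled: all input places marked and no inhibiting place marked."""
    if is_blocked(marking, inhibitors):
        return False
    return all(marking.get(place, 0) >= 1 for place in inputs)


def remaining_delay(remaining: float, elapsed: float,
                    marking: Dict[str, int], inhibitors: Iterable[str]) -> float:
    """While blocked, the remaining delay of an enabled transition does not diminish."""
    if is_blocked(marking, inhibitors):
        return remaining
    return max(0.0, remaining - elapsed)


marking = {"unit_down": 1, "check_pending": 1}
blocked_now = not is_enabled(marking, ["unit_down"], ["check_pending"])
frozen = remaining_delay(2.0, 0.5, marking, ["check_pending"])
running = remaining_delay(2.0, 0.5, {"unit_down": 1}, ["check_pending"])
```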

With the inhibit edge understood, one token each cycles in the left-hand and in the right-hand parts of the PN of Fig. 3.

The fallible switch needed to activate the spare of Fig. 2(b) is modeled in Fig. 4. This way the concept of coverage [4] can be modeled explicitly.

In such simplistic modeling there is no single PN place for system up. In Fig. 4, the system is up as long as any one of three particular places is marked. When the system is down, then the switch is modeled as being neither up nor down, which might be awkward. (This problem can be easily solved.)

III. DESIGNING PN FOR SMALL SYSTEMS

Even though many systems are large [3], the design of small systems must be fully understood, because small systems are typical building blocks (modules) of large systems.

With repairable systems, one begins with the component cycles of Fig. 2(a) or Fig. 3, and adds the PN versions of both the fault-tree & success-tree. This is demonstrated with the 2-out-of-3:G system (for independent repairs); see Fig. 5.

The typical Boolean functions of the fault-tree & success-tree of the 2-out-of-3:G/F system are visible in the lower and upper thirds of Fig. 5, respectively.
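The two Boolean functions of a 2-out-of-3 system can be written down and checked exhaustively. The sketch below uses assumed names (`system_up`, `system_down`) and plain Booleans for the component states; it confirms that the success-tree and fault-tree functions are complementary, so that exactly one of the two system states holds.

```python
from itertools import product


def system_up(x1: bool, x2: bool, x3: bool) -> bool:
    """Success tree of 2-out-of-3:G: at least 2 of the 3 components are good."""
    return (x1 and x2) or (x1 and x3) or (x2 and x3)


def system_down(x1: bool, x2: bool, x3: bool) -> bool:
    """Fault tree of 2-out-of-3:F: at least 2 of the 3 components are bad."""
    return ((not x1 and not x2) or (not x1 and not x3)
            or (not x2 and not x3))


# For every one of the 8 component-state vectors exactly one function is true.
states = list(product([False, True], repeat=3))
exactly_one_state = all(system_up(*s) != system_down(*s) for s in states)
```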

It is a characteristic of PN modeling that the cycling of the tokens of single units is practically not disturbed by the system’s switching from up to down and vice versa. (The tokens for the latter switching are returned immediately thereafter, and the momentary absence of a token does not mean the selection of a new sample of the delay time of the following transition.)

Fig. 5. Complete detailed PN of a repairable 2-out-of-3 system.

Fig. 6. A 1-out-of-3:G system with cold spares and a fallible switch for activating spares.

The long vertical edges in Fig. 5 are needed to keep the system always in precisely 1 of its 2 states. Thus for higher-level modeling, Fig. 5 can be readily replaced by Fig. 2(a).

As for nonrepairable systems, there is likewise often the need to always have 1 place for system-up or 1 place for system-down marked. Fig. 6 shows a 1-out-of-3:G system with cold standby and a (single) fallible switch for activating spares. (This is a reasonable expansion of Fig. 4.) The analytic analysis of such systems is discussed in [13]; a digest of that analysis is in Section C of the Appendix.

Considerable effort is needed to make the system a true module with a single place for system up and for system down. The ε-delay transitions (ε being an arbitrarily small positive real) are used in this context to model system failure, if a further spare cannot be activated. ε > 0 is needed; ε = 0 results in switching conflicts as to which transition should fire next. For realistic modeling a very small ε is desirable. Similar to the edge in the upper right corner of Fig. 4, in Fig. 6 one could add edges from the place ‘switch down’ to the ε-delay transitions. In that case ε = 0 is allowed.

Fig. 7. Repairs of component 2 are suspended during the down-time of component 1.

Fig. 8. Repair of 3 components, one at a time.

Since the details of the design of large systems are discussed in [3], the next section discusses maintenance scenarios.

IV. MODELING MAINTENANCE

Complicated maintenance implies, in general, dependencies between the maintenance activities of various components.

A repairable 2-component system with repair priority of component 1 (with respect to component 2) is modeled in Fig. 7 by using an inhibit edge. (How to avoid the latter is shown in [3].)

Fig. 8 shows a 3-component system with a single repair facility (or team).

There is FCFS priority. The upper half of Fig. 8 shows, approximately, 3 times the cycle of Fig. 2(a).

There could be some switching conflict if 2 components fail during the repair of component #3. Since this situation does not happen often with high-quality systems, this is only a weak priority of components with the smaller index.

For a simple model-check one can use the fact that during a repair the number of tokens in this system is 3; otherwise it is 4.
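The effect of a single repair facility with FCFS priority can be illustrated outside the PN formalism by a small scheduling sketch. The function name, the component names, and the failure times and repair durations are assumptions for illustration; the sketch does not reproduce the net of Fig. 8.

```python
from typing import Dict, List, Tuple


def fcfs_repair_schedule(
        failures: List[Tuple[float, float, str]]) -> Dict[str, Tuple[float, float]]:
    """Map each component to (repair start, repair end) for one repair facility.

    Each entry of `failures` is (failure time, repair duration, component name).
    Components are served in the order of their failure times (FCFS).
    """
    schedule = {}
    facility_free_at = 0.0
    for fail_time, duration, name in sorted(failures):
        start = max(fail_time, facility_free_at)  # wait if the facility is busy
        end = start + duration
        schedule[name] = (start, end)
        facility_free_at = end
    return schedule


schedule = fcfs_repair_schedule([
    (1.0, 4.0, "c1"),
    (2.0, 1.0, "c2"),
    (3.0, 2.0, "c3"),
])
```

Here component c2 fails at time 2 but has to wait until the repair of c1 ends at time 5, which is the kind of dependency between maintenance activities that the section describes.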

V. MODELING COST/BENEFIT ASPECTS

Due to the ability of PN to count events, it is quite feasible to measure time, and, via time, to measure many costs and/or benefits of system operation. Fig. 9 shows the ‘measurement’ of the total, viz., summed up-time of a repairable unit. There is of course a small quantization error due to the discrete counting operation.
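The quantization error can be made concrete with a sketch. It assumes, as one possible reading and not as a description of Fig. 9, that a deterministic ‘clock’ transition with delay `tick` deposits one token per full tick of up-time and restarts at the beginning of each up period. All names and numbers are illustrative.

```python
import math
from typing import Sequence


def counted_uptime(up_periods: Sequence[float], tick: float) -> float:
    """Up-time as measured by counting one token per completed tick."""
    tokens = sum(math.floor(period / tick) for period in up_periods)
    return tokens * tick


up_periods = [5.3, 2.9]
tick = 0.5
measured = counted_uptime(up_periods, tick)
error = sum(up_periods) - measured  # true summed up-time minus counted up-time
```

Under this assumption the error is nonnegative and smaller than one tick per up period, so a smaller tick gives a finer measurement at the price of more firings.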


Fig. 9. Measuring the total up-time of a unit.

Fig. 10. A simple reward-model.

Fig. 11. Benefit possible only for joint operation of 2 units.

For more details on modeling time in PN, in the context of dependability modeling, see [12].

The case of benefit during operation, and cost during repair, both proportional to (calendar) time, is modeled for a repairable system in Fig. 10. Hopefully, in practice, the benefit outweighs the cost.

Of course, precautions must be observed, lest the place for net profit in the left upper corner of Fig. 10 becomes empty.
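A reward model of this kind can be sketched as simple bookkeeping: the net profit grows at a benefit rate while the system is up and shrinks at a cost rate while it is under repair. The function name, the rates, and the period lengths below are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def net_profit_trace(periods: Sequence[Tuple[str, float]],
                     benefit_rate: float, cost_rate: float,
                     initial: float = 0.0) -> List[float]:
    """Net profit after each period; `periods` holds ('up' | 'down', duration) pairs."""
    level = initial
    trace = [level]
    for kind, duration in periods:
        if kind == "up":
            level += benefit_rate * duration   # benefit proportional to up-time
        elif kind == "down":
            level -= cost_rate * duration      # cost proportional to repair time
        else:
            raise ValueError(f"unknown period kind: {kind!r}")
        trace.append(level)
    return trace


trace = net_profit_trace([("up", 10.0), ("down", 2.0), ("up", 5.0)],
                         benefit_rate=3.0, cost_rate=4.0)
never_empty = min(trace) >= 0.0  # the counterpart of the net-profit place not running empty
```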

Fig. 11 models the case of benefit originating only from the joint operation of 2 units, and of repair costs different for each of these units. Here again the model needs a positive net profit to work properly. However, would a bankrupt system be repaired?

There is an interesting use of ‘negative’ logic via 2 inhibit edges in this context. Only as long as both components are up is the ‘flow’ of tokens (from one place to the next) allowed. (Any component’s down-state blocks this flow.) The reader is invited to design a PN for positive logic in this case.

VI. USEFULNESS OF CHESSBOARD-TYPE PICTURES

From the basic switching rule, it follows that only edges between places & transitions, and vice versa, make sense. Beginners are helped to avoid violating this rule by arranging both types of PN nodes alternately (as on a chessboard), such that an edge between a place and a transition is the most natural thing to draw, if any. This covers only the shortest edges possible, but the majority of the edges of Figs. 1–11 are of this type.
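The rule that edges may only connect nodes of different types can also be checked mechanically. The following sketch, with assumed names, returns every edge that joins two places or two transitions.

```python
from typing import Iterable, List, Set, Tuple


def invalid_edges(places: Set[str], transitions: Set[str],
                  edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the edges that do not run place -> transition or transition -> place."""
    bad = []
    for source, target in edges:
        place_to_transition = source in places and target in transitions
        transition_to_place = source in transitions and target in places
        if not (place_to_transition or transition_to_place):
            bad.append((source, target))
    return bad


bad = invalid_edges(
    places={"p1", "p2"},
    transitions={"t1"},
    edges=[("p1", "t1"), ("t1", "p2"), ("p1", "p2")],
)
```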

This paper demonstrates the extreme usefulness of the chessboard pattern of PN nodes. The PN pictures readily reveal more details than are discussed explicitly in the text. This paper was intended to create appetite for studying [3]. Contact the author for ways to get a copy of [3] at a moderate price.

APPENDIX

PN AIDS FOR FINDING LIFE-TIME DISTRIBUTIONS

With nonrepairable systems the distribution of life-time is the basic information for any in-depth reliability/dependability analysis. Several practical examples show how PN can help in this kind of analysis. Always consider the event of system life ending between $t$ and $t + dt$; its probability is the pdf of system life at $t$, multiplied by $dt$. This appendix shows how PN can help in identifying mutually exclusive sub-events, such that the probability of this event is the arithmetical sum of their probabilities.

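The pdf of the sum of two independent nonnegative random times, $T_1$ & $T_2$, is the convolution of their pdfs:

$$f_{T_1+T_2}(t) = \int_0^t f_{T_1}(u)\, f_{T_2}(t-u)\, du \qquad (1)$$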

A. 1-Out-of-2:G System With Cold-Standby and a FallibleSwitch for Activating the Spare

In Fig. 4, there are 2 paths ending in the system-down place. Because the movements on both paths correspond to mutually exclusive random events, there are at least two additive terms of the pdf of system life. The one associated with the upper path of Fig. 4 concerns the case of the switch failing before the active unit fails; the lower path in Fig. 4 is viable only under the condition of the switch being usable upon the failure of the active unit, which is modeled by a token being in the corresponding place. Thus,

(2)

The second term is an obvious extension of (1).
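The two-path reading above suggests the following form for the pdf of system life, written here with assumed symbols ($L_1$ and $L_2$ for the lives of the active unit and the spare, $L_{sw}$ for the life of the switch, all independent): $f_S(t) = f_{L_1}(t)\,F_{L_{sw}}(t) + \int_0^t f_{L_1}(u)\,[1 - F_{L_{sw}}(u)]\,f_{L_2}(t-u)\,du$. This is a sketch consistent with the verbal description, not a transcription of (2). The check below additionally assumes exponential lives with arbitrarily chosen rates and confirms numerically that the two terms together integrate to 1.

```python
import math

# Assumed failure rates (illustrative only): active unit, spare, switch.
LAM1, LAM2, LAM_SW = 1.0, 1.5, 0.5


def pdf(rate: float, t: float) -> float:
    """pdf of an exponential life with the given rate."""
    return rate * math.exp(-rate * t)


def cdf(rate: float, t: float) -> float:
    """Cdf of an exponential life with the given rate."""
    return 1.0 - math.exp(-rate * t)


def term_switch_failed_first(t: float) -> float:
    """Unit fails at t while the switch has already failed: system fails at t."""
    return pdf(LAM1, t) * cdf(LAM_SW, t)


def term_spare_used(t: float, step: float = 0.05) -> float:
    """Unit fails at u with the switch still good, spare then lives t - u."""
    n = max(1, round(t / step))
    h = t / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h  # midpoint rule
        total += pdf(LAM1, u) * (1.0 - cdf(LAM_SW, u)) * pdf(LAM2, t - u)
    return total * h


def integrate(f, upper: float = 25.0, step: float = 0.05) -> float:
    """Midpoint-rule integral of f over [0, upper]."""
    n = round(upper / step)
    return sum(f((k + 0.5) * step) for k in range(n)) * step


mass_1 = integrate(term_switch_failed_first)
mass_2 = integrate(term_spare_used)
```

With these rates the first term should carry the probability that the switch fails before the active unit, LAM_SW / (LAM1 + LAM_SW), and the second term the complementary probability.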

B. A Generator Backed-Up by a Fallible Battery

There is a 2-component system where 1 component has 2 failure modes. The latter component is the battery, which has a true failure mode, but ‘fails’ also for the ‘work’ it is supposed to do by being discharged after the generator failure. B & G are indexes for Battery & Generator, respectively; the discharging (unloading) time of the battery is also needed. From the paths ending in the system-down place of Fig. 12(a),

(3)

• The first term in (3) covers one token wandering along the uppermost path, which is possible when the other token arrives too late for a transition to be enabled.

• The second term in (3) corresponds to a further path into the system-down place.

• The third term’s path is symmetrical to that of the second term. (This does not imply an identical math term.)

Fig. 12. (a) PN of a generator, G, backed-up by battery, B; (b) Little addendum to clear p, if needed.

With terms 1–3:

respectively.

In the end, there might or might not be a token left in one of the places. In Fig. 12(b), an added transition is used to clear that place of any token once another place is marked. Adding this transition and its incident edges is therefore ‘orderly’ PN design.

C. 1-out-of-3:G System With Cold Standby and a FallibleSwitch for Activating Spares

First, extend the example of Section A. Fig. 6 is a reasonably compact picture of a PN for this system. (The alternative with ε = 0 is discussed now.) From the 3 inputs of the system-down place,

(4)

(As a check is readily confirmed for all terms.)

• The first term of (4) corresponds to the upper input of the system-down place, where the switch fails prior to the failure of the first unit.

• The second term corresponds to the lower input of the system-down place, where the switch fails between the failure of the first unit and the failure of the first spare.

• The last term of (4) corresponds to the bold-face path, where the switch activates both spares.

Fig. 13. State graph up to and including the first faulty states of the PN of Fig. 6; C is the switch.

An example of a 2-fold convolution is:

(5)

The switch survives the activation of .

To check (5), construct the state graph; see Fig. 13, which contains all the needed information. The state number is: . It becomes obvious how much easier (4) could be found from the PN of Fig. 6, where state durations are explicitly given inside of the transitions’ boxes.

Elaborating:

• State 1001 corresponds to the first term of (4).
• State 1101 corresponds to the second term.
• State 1110 corresponds to .
• State 1111 corresponds to: the term in equals.

This reconfirms the compactness of (4).

For more advanced analytic & Monte Carlo simulation-type modeling consult [4]–[6].

REFERENCES

[1] T. Agerwala, “Putting Petri nets to work,” IEEE Computer, pp. 85–94, Dec. 1979.

[2] W. Reisig, Petri Nets: Springer, 1985.

[3] W. Schneeweiss, Petri Nets for Reliability Modeling (in the Field of Engineering Safety and Dependability): LiLoLe-Verlag, 1999.

[4] R. Sahner, K. Trivedi, and A. Puliafito, Performance and Reliability Analysis of Computer Systems: Kluwer, 1996.

[5] F. Bause and P. Kritzinger, Stochastic Petri Nets: Vieweg, 1996.

[6] M. A. Marsan et al., Modeling with Generalized Stochastic Petri Nets: Wiley, 1995.

[7] M. Malhotra and K. Trivedi, “Dependability modeling using Petri nets,” IEEE Trans. Reliability, vol. 44, no. 3, pp. 428–440, Sept. 1995.

[8] G. Ciardo, J. Muppala, and K. Trivedi, “SPNP stochastic Petri nets package,” in Proc. Int’l Workshop on Petri Nets & Performance Models, 1989, pp. 142–150.

[9] A. Bobbio and M. Telek, “Computational restrictions for SPN with generally distributed transition times,” in Proc. EDCC-1, Echtle and Hammer, Eds, 1994, pp. 131–148.

[10] G. Ciardo, A. Blakemore, and P. Chimento et al., “Automated generation and analysis of Markov reward models using stochastic reward nets,” in Linear Algebra, Markov Chains, and Queuing Models, Meyer and Plemmons, Eds: Springer, 1993, vol. 48 in IMA Math. & Appl.

[11] G. Chiola, “A software package for the analysis of generalized Petri nets,” presented at the Proc. Int’l Workshop on Timed Petri Nets, 1985.

[12] W. Schneeweiss, “Petri nets for modeling timeliness,” in Proc. ESREL’98, 1998, pp. 609–615.

[13] W. Schneeweiss, “Life Time Distributions via Petri Nets and State Graphs,” Fern University, Informatik-Bericht (Technical Report) 256, 1999.

[14] W. Schneeweiss, The Fault Tree Method (in the Field of Reliability and Safety Technology): LiLoLe-Verlag, 1999.

W. Schneeweiss holds a Dipl. in physics (1957) from Frankfurt University, a Dr. rer. nat. (1968) from the Technical University of Munich, and a Dr. habil. in computer science (1973) from Karlsruhe University. He worked as a computer engineer with Max-Planck Society, as a control engineer with the German Aerospace Establishment (DLR), and as a senior reliability engineer with Siemens. From 1977–1999 he was a Professor of Computer Science at the German University for Distance Studies, working mainly on dependability modeling. He has published about 150 papers & reports (10% with coauthors) and 11 books. The most recent books are The Fault Tree Method & Petri Nets for Reliability Modeling. He is a Senior Member of IEEE, and a member of ITG/VDE, GMA (VDI/VDE), and GI (the German version of ACM).
