
  • Distributed Diagnosis of Discrete-Event Systems Using Petri Nets *

    Sahika Genc and Stéphane Lafortune

    Department of Electrical Engineering and Computer Science, University of Michigan,

    1301 Beal Avenue, Ann Arbor, MI, 48109-2122 USA
    {sgenc,stephane}@eecs.umich.edu; www.eecs.umich.edu/umdes

    Abstract. The problem of detecting and isolating fault events in dynamic systems modeled as discrete-event systems is considered. The modeling formalism adopted is that of Petri nets with labeled transitions, where some of the transitions are labeled by different types of unobservable fault events. The Diagnoser Approach for discrete-event systems modeled by automata developed in earlier work is adapted and extended to on-line fault diagnosis of systems modeled by Petri nets, resulting in a centralized diagnosis algorithm based on the notion of “Petri net diagnosers”. A distributed version of this centralized algorithm is also presented. This distributed version assumes that the Petri net model of the system can be decomposed into two place-bordered Petri nets satisfying certain conditions and that the two resulting Petri net diagnosers can exchange messages upon the occurrence of observable events. It is shown that this distributed algorithm is correct in the sense that it recovers the same diagnostic information as the centralized algorithm. The distributed algorithm provides an approach for tackling fault diagnosis of large complex systems.

    1 Introduction

    The problem of detecting and isolating faults in technological systems has received considerable attention due to its importance in terms of safety and efficiency of operation. A variety of complementary approaches have been proposed, based on the level of detail chosen for the model of the system and the kinds of faults that need to be diagnosed; see, e.g., [1]. In this paper, we consider technological systems that can be modeled at some level of abstraction as discrete-event dynamic systems [2]. This includes a wide variety of technological systems such as automated manufacturing systems, communication networks, heating, ventilation, and air-conditioning units, process control systems, and power systems. Faults are modeled as unobservable events, namely, events whose occurrence is not directly detected by the sensors. Rather, the occurrence of fault events must be inferred from the system model and future observations of the evolution of the system. This is often referred to as “model-based diagnostics.” The faults of interest are those that cause a distinct change in the operation of the system but do not necessarily bring it to a halt. Examples of such faults include: equipment faults (e.g., stuck faults of valves, stalling of actuators, bias faults of sensors, controller faults, and degraded or worn-out components), as well as many types of process faults (e.g., overflow of buffers in manufacturing and communication networks, contamination in semiconductor manufacturing, and control software faults).

    * This research is supported in part by NSF grant ECS-0080406.

    The discrete-event modeling formalism adopted in this paper is that of Petri nets with labeled transitions, where some of the transitions are labeled by different types of unobservable fault events. Our objective is to adapt and extend, in the context of Petri net models, a recently proposed approach for fault diagnosis of discrete-event systems modeled by finite-state automata, termed the “Diagnoser Approach”; see [3] and the references therein, including [4, 5]. That approach has been used successfully in a variety of application areas, including heating, ventilation, and air-conditioning units [6], intelligent transportation systems [7, 8], document processing systems [9, 10], and chemical process control [11]. In the Diagnoser Approach, a diagnoser automaton, or simply diagnoser, is constructed from (i) the finite-state automaton model of the discrete-event system, (ii) the set of unobservable events, (iii) the set of fault events, and (iv) the partition of the set of fault events into fault types. The states of the diagnoser contain information about the possible occurrence of faults, according to the system model. The diagnoser is then used for on-line fault diagnosis of the system as follows. Each observable event executed by the system triggers a state transition in the diagnoser. Examination of the current diagnoser state reveals the status of the different types of faults: fault(s) of Type F1 did not occur, fault(s) of Type F1 possibly occurred (“F1-uncertain state” in the terminology of [4]), or fault(s) of Type F1 occurred for sure (“F1-certain state” in the terminology of [4]). It is this capability of diagnosers that we wish to extend to Petri net models of the system. Diagnosers can also be used to analyze the diagnosability properties of the system (“Can all fault types eventually be detected?”), but this aspect is not considered in this paper.

    There are many reasons for extending the Diagnoser Approach to Petri nets. Our primary motivation is to take advantage of the modularity of Petri net models and thereby propose a modular/distributed version of the Diagnoser Approach that can help in mitigating the state space explosion problem that often occurs in discrete-event modeling of complex systems. Consequently, the contribution of this paper is two-fold. First, a centralized diagnosis algorithm based on the novel notion of “Petri net diagnosers” is presented in Section 2 for on-line diagnosis of systems modeled by Petri nets. The Petri net diagnoser associated with a Petri net has the same graphical structure as the Petri net but has a different state transition function. In addition, Petri net diagnosers include several markings of the net at any given time, corresponding to the notion in the Diagnoser Approach that the state of the diagnoser is a form of state estimate (of the system) together with fault type information. (Due to the simultaneous presence of different markings in Petri net diagnosers, we can think of them as a special kind of colored Petri nets.)

    The second contribution of this paper is to present, in Section 3, a distributed version of the above-mentioned centralized algorithm. This distributed version assumes that the Petri net model of the system can be decomposed into two place-bordered Petri nets satisfying certain conditions. Moreover, it is assumed that the two resulting Petri net diagnosers can exchange messages upon the occurrence of observable events. A method to decompose the system into place-bordered nets is given, if such a decomposition is necessary. We refer the reader to [12, 13] for modular modeling methodologies that result in place-bordered nets. We show that our distributed algorithm is correct in the sense that it recovers the same diagnostic information as the centralized algorithm. The distributed algorithm provides an approach for tackling fault diagnosis of large complex systems, in particular networked systems where the different system modules are connected by a communication network.

    To the best of our knowledge, the present paper is the first to explore the extension of the Diagnoser Approach originally proposed in [4] to Petri net models. However, there has been prior work on the general problem of monitoring and fault diagnosis of dynamic systems using Petri net models. We mention in this regard the work done at IRISA/INRIA on alarm supervision in telecommunication networks [14, 15] and the work done on detection of loss or creation of tokens in nets using matrix algebraic techniques in [16]. Our problem formulation and objective, however, differ from those in [14–17]. They also differ from the work on observability of Petri nets in [18].

    The remainder of this paper is organized as follows. Section 2 starts by presenting our notation for labeled Petri nets and then presents the centralized algorithm for on-line diagnosis of dynamic systems using Petri net diagnosers. The distributed version of the centralized algorithm, termed Algorithm DDC for “distributed diagnosis with communication”, is presented in Section 3. Algorithm DDC consists of two communicating Petri net diagnosers, whose respective states can be “merged” (in a technical sense made precise in Section 3) to recover the state of the corresponding centralized Petri net diagnoser. An illustrative example is used throughout the paper and conclusions are presented in Section 4.

    2 Centralized Diagnosis Using Petri Net Diagnosers

    In this section, we define the notion of a centralized Petri net diagnoser, or simply diagnoser, which is used as a tool to detect and isolate faults in the system. The system to be diagnosed is modeled by a labeled Petri net. The centralized diagnoser observes the system and determines the states the system can be in upon observation of an event. Note that upon observation of an event, the state of the system is not known exactly in general due to the presence of unobservable events in the set of transition labels. The Petri net diagnoser finds all the states the system can be in, namely, all the states that are consistent with the sequence of observable events seen thus far. Fault information is attached to these state estimates in the form of fault labels. The faults are explicitly modeled as events in the system. Figure 1 gives a block diagram of the system and its diagnoser interacting with each other (the notation in the figure is introduced below in Sections 2.1 and 2.2).

    [Figure 1: block diagram. The system model N sends each observable event σo ∈ Σo to the diagnoser Nd, which outputs the failure type Fi.]

    Fig. 1. Centralized diagnosis

    This section first defines how the system and the diagnoser are modeled and gives their graphical representation. Then, we define the dynamics of the diagnoser. Although the diagnoser is modeled as a labeled Petri net graphically, its state transition function and states differ from regular Petri nets. We conclude the section with an example that builds the diagnoser and finds some of its states.

    2.1 System Model

    A Petri net graph [1] is a weighted bipartite graph

    N = 〈P, T, A, w〉 (1)

    where P is the finite set of places, T is the finite set of transitions, A ⊆ (P × T) ∪ (T × P) is the set of arcs from places to transitions and from transitions to places, and w : A → ℤ+ is the weight function on the arcs. In a Petri net graph N, given t ∈ T we denote by I(t) = {p ∈ P : (p, t) ∈ A} the set of input places to transition t, and similarly we denote by O(t) = {p ∈ P : (t, p) ∈ A} the set of output places of t.

    A marking of a Petri net graph is a mapping x : P → ℕ. A state is represented by x = [x(p1), x(p2), . . . , x(pn)], where p1, p2, . . . , pn is an arbitrary fixed enumeration of P and n is the number of elements of P. A Petri net is a pair (N, x0), where N is a Petri net graph and x0 is the initial state. The state space of (N, x0) is given by X = ℕ^n and x0 ∈ X. The state transition function f : X × T → X of a Petri net (N, x0) is defined for state x ∈ X and transition t ∈ T if x(p) ≥ w(p, t) for all p ∈ I(t). That is, a transition t can fire from x if and only if t is feasible from x, and when t fires, f(x, t) gives the resulting state. If f(x, t) is defined, then we set x′ = f(x, t) where

    x′(p) = x(p) − w(p, t) + w(t, p), for all p ∈ P, (2)

    with w(p, t) and w(t, p) read as 0 for pairs that are not arcs in A.

    Not all the states in X are reachable in (N, x0). In order to define the set of reachable states, denoted by R(N, x0), of Petri net (N, x0), we first extend the state transition function f from domain X × T to domain X × T∗:

    f(x, ε) := x (3)
    f(x, st) := f(f(x, s), t) for s ∈ T∗ and t ∈ T (4)

    where ε is to be interpreted as the absence of transition firing and T∗ denotes the Kleene-closure of T. The set of reachable states of Petri net (N, x0) is

    R(N, x0) := {x′ ∈ X : ∃s ∈ T∗ such that f(x0, s) = x′} (5)

    [1] The notation and terminology used in this paper mostly follow those in [2, 19].
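    The firing rule (2), its extension to transition strings (3)-(4), and the reachable set (5) can be sketched in a few lines of Python. This is only an illustrative sketch: the three-place net, the dictionaries `pre` and `post`, and the `max_states` cut-off (needed because R(N, x0) may be infinite) are assumptions made for the example and do not come from the paper.

```python
from collections import deque

# Hypothetical net: t1 moves a token from place 0 to place 1, t2 from 1 to 2.
# pre[t][p] = w(p, t) and post[t][p] = w(t, p); missing entries mean weight 0.
pre = {"t1": {0: 1}, "t2": {1: 1}}
post = {"t1": {1: 1}, "t2": {2: 1}}

def enabled(x, t):
    """t is feasible from x iff x(p) >= w(p, t) for every input place p."""
    return all(x[p] >= w for p, w in pre[t].items())

def fire(x, t):
    """Return x' of equation (2), or None when f(x, t) is undefined."""
    if not enabled(x, t):
        return None
    y = list(x)
    for p, w in pre[t].items():
        y[p] -= w
    for p, w in post[t].items():
        y[p] += w
    return tuple(y)

def fire_sequence(x, s):
    """Extended transition function of (3)-(4); the empty string returns x."""
    for t in s:
        x = fire(x, t)
        if x is None:
            return None
    return x

def reachable(x0, max_states=10_000):
    """Breadth-first enumeration of R(N, x0), cut off after max_states markings."""
    seen, queue = {x0}, deque([x0])
    while queue and len(seen) < max_states:
        x = queue.popleft()
        for t in pre:
            y = fire(x, t)
            if y is not None and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

x0 = (1, 0, 0)
```

    For this small net, `reachable(x0)` contains exactly three markings, one for each position of the single token.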

    The system to be diagnosed is modeled by a labeled Petri net

    (N , Σ, l, x0) (6)

    where Σ is the set of event labels for the transitions in T, l : T → Σ is the transition labeling function, and x0 is the initial state. The event labeling function l is extended to l : T∗ → Σ∗ in the following manner: given t, t′ ∈ T and a, a′ ∈ Σ,

    l(t) = a and l(t′) = a′ ⇒ l(tt′) = l(t)l(t′) = aa′. (7)

    The language generated by the labeled Petri net (N, Σ, l, x0), denoted by L(N, Σ, l, x0), is the set of all traces of events that can be generated by (N, Σ, l, x0) from its initial state x0. L(N, Σ, l, x0) is formally defined as

    L(N, Σ, l, x0) = {l(s) ∈ Σ∗ : s ∈ T∗ and f(x0, s) is defined}. (8)

    Some of the events in Σ are observable, i.e., their occurrence can be observed (detected by sensors), while the other events are unobservable. Thus Σ is partitioned into observable and unobservable event sets: Σ = Σo ∪ Σuo. The observable events in the system may be commands issued by the controller, sensor readings, and changes of sensor readings. On the other hand, unobservable events may be fault events and some events that cause changes in the system state that are not recorded by sensors.

    We model faults as events. The set of fault events Σf is a subset of Σ. Since it is trivial to diagnose fault events that are observable, we assume Σf ⊆ Σuo. Our goal is to detect the occurrence of fault events, if any, from the observed traces of events generated by the system.

    We partition the set of fault events into disjoint sets where each disjoint set corresponds to a different fault type. The motivation for doing so is that it might not be necessary to detect uniquely every fault event, but only the occurrence of one among a subset (type) of fault events. We write

    Σf = ΣF1 ∪̇ · · · ∪̇ ΣFk (9)

    where ΣFi denotes the set of fault events corresponding to a type i fault, 1 ≤ i ≤ k, where k is the number of fault types. When we write “a fault of type i has occurred”, we mean that a fault event from the set ΣFi has occurred.

  • 2.2 Petri Net Diagnoser

    We now introduce the diagnoser. The diagnoser is a labeled Petri net built from the system model (N, Σ, l, x0). This labeled Petri net performs diagnostics while observing on-line the behavior of (N, Σ, l, x0).

    The diagnoser for (N, Σ, l, x0) is

    Nd = (N, Σ, l, xd0, ∆f) (10)

    where N, Σ, l are defined as before, xd0 is the initial diagnoser state and ∆f = {F1, F2, ..., Fk} is the finite set of fault types. The diagnoser Petri net Nd keeps the graphical structure of the underlying system model. Up to this point Nd is not different from a labeled Petri net. However, its dynamics are different from those of a labeled Petri net since its state transition function is only defined for observable events.

    The diagnoser gives the estimate of the current state of the system after the occurrence of an observable event. Hereafter when we say “state”, we mean the state of the system model and when we say “diagnoser state”, we mean the state of the diagnoser. The diagnoser state is a list of the states the system model can be in after observation of an event in Σo, together with fault information. Fault information in a diagnoser state is coded by fault labels.

    Every state in a diagnoser state has a fault label. A fault label is a vector of length k (the number of fault types) which has entries of “0” or “1”. If we denote the fault label by lf, then lf ∈ ∆ = {0, 1}^k. Thus, the number of possible fault labels is |∆| = 2^k. When the fault label is the zero vector, we say the fault label is “normal”. The initial state has the “normal” fault label by definition.

    We now define the fault label propagation function LP : X × ∆ × T∗ → ∆. LP propagates the fault labels consistent with the traces of events. Let x ∈ X, lf ∈ ∆ and s ∈ T∗. Then LP(x, lf, s) is defined as

    \[ LP(x, l_f, s) = l_f + \sum_{i=1}^{k} b^s_i \tag{11} \]

    where b^s_i ∈ ∆ and

    \[ b^s_i = \begin{cases} [0, \cdots, 0, 1, 0, \cdots, 0] \ \ (\text{with the 1 in the } i\text{th column}), & \text{if } l(s) \text{ contains an event from } \Sigma_{Fi}, \\ [0, \cdots, 0, 0, 0, \cdots, 0], & \text{otherwise.} \end{cases} \tag{12} \]
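    A minimal Python sketch of the label propagation in (11)-(12) follows. Since LP must return a label in ∆ = {0, 1}^k, the sketch assumes that the sum in (11) saturates at 1 (an entry-wise OR); that reading is an assumption, as are the names `event_of` and `fault_type_of` and the three-transition example. The state argument x of LP is omitted because the resulting label depends only on lf and s.

```python
# Hypothetical labeling: t1 carries observable event "a", t2 and t3 carry faults.
event_of = {"t1": "a", "t2": "f1", "t3": "f2"}
# Index (0-based) of the fault type each fault event belongs to.
fault_type_of = {"f1": 0, "f2": 1}

def propagate_fault_label(label, s):
    """Set entry i to 1 as soon as l(s) contains an event of fault type i."""
    new_label = list(label)
    for t in s:
        i = fault_type_of.get(event_of[t])
        if i is not None:
            new_label[i] = 1  # saturating "+": entries stay in {0, 1}
    return tuple(new_label)

normal = (0, 0)
```

    Under this reading, a fault label never loses a 1 once it has been set, which matches the role the labels play in the diagnoser states.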

    Before we define the diagnoser state and the diagnoser state transition function, we need the notion of unobservable reach of a diagnoser state. To define the unobservable reach of a diagnoser state we first define the unobservable reach of a state.

    Let xd,i = xi li denote a state with fault label li in the diagnoser state xd. The unobservable reach of xd,i is denoted by UR(xd,i) and defined as follows:

    UR(xd,i) := {xi li} ∪ {y ly : ∃y ∈ R(N, xi), ∃s ∈ T∗, l(s) ∈ Σuo∗, (f(xi, s) = y) and (ly = LP(xi, li, s))}. (13)

    The unobservable reach of the diagnoser state xd is the listing of all distinct vectors in UR(xd,i) for all xd,i in xd; it is denoted by UR(xd).
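    The unobservable reach (13) amounts to a graph search that fires only unobservably labeled transitions and updates fault labels along the way. The sketch below is illustrative only: the small net, the helper names, and the saturating label update are assumptions, and the search terminates only when finitely many markings are reachable through unobservable transitions.

```python
# Hypothetical net: from place 0, an unobservable event "uo" leads to place 1
# and a fault "f1" leads to place 2; observable "a" returns from place 1 to 0.
pre = {"t_uo": {0: 1}, "t_f": {0: 1}, "t_a": {1: 1}}
post = {"t_uo": {1: 1}, "t_f": {2: 1}, "t_a": {0: 1}}
event_of = {"t_uo": "uo", "t_f": "f1", "t_a": "a"}
unobservable = {"uo", "f1"}
fault_type_of = {"f1": 0}

def fire(x, t):
    """Return the successor marking, or None if t is not feasible from x."""
    if any(x[p] < w for p, w in pre[t].items()):
        return None
    y = list(x)
    for p, w in pre[t].items():
        y[p] -= w
    for p, w in post[t].items():
        y[p] += w
    return tuple(y)

def unobservable_reach(x, label):
    """All (state, fault label) pairs reachable from (x, label) by
    firing strings of unobservable transitions, including the empty string."""
    start = (x, label)
    seen, stack = {start}, [start]
    while stack:
        y, ly = stack.pop()
        for t, e in event_of.items():
            if e not in unobservable:
                continue
            z = fire(y, t)
            if z is None:
                continue
            lz = list(ly)
            if e in fault_type_of:
                lz[fault_type_of[e]] = 1
            pair = (z, tuple(lz))
            if pair not in seen:
                seen.add(pair)
                stack.append(pair)
    return seen

ur = unobservable_reach((1, 0, 0), (0,))
```

    Here `ur` holds the initial pair plus one pair per unobservable transition, and only the pair reached through `t_f` carries the fault label `(1,)`.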

    We can now define the initial diagnoser state xd0 of Nd. xd0 is the listing of all distinct elements of UR(x0 l0), where x0 is the initial state of the underlying labeled Petri net (N, Σ, l, x0) and l0 is the fault label of x0 that is “normal” by definition.

    We find the diagnoser states reachable from the initial diagnoser state by using the diagnoser state transition function. In order to define the diagnoser state transition function, we first define the feasible transitions and then the states reached by firing the feasible transitions.

    We denote by B(xd,i, a) the feasible transitions from xd,i = xi li where xi ∈ X is a state in diagnoser state xd, li is the fault label of xi, and a is an event in Σo. Formally, B(xd,i, a) is defined as

    B(xd,i, a) = {t ∈ T : l(t) = a and for all p ∈ I(t) (xi(p) ≥ w(p, t))}. (14)

    We define B(xd, a) to be the set resulting from the union of B(xd,i, a) for all i, where 1 ≤ i ≤ r, i.e.,

    B(xd, a) = ∪1≤i≤r B(xd,i, a), (15)

    where r is the number of rows of xd.

    We denote by S(xd, a) the set of all distinct reachable states, together with their fault labels, when all transitions in B(xd,i, a) are fired from every xd,i in xd. Namely, S(xd, a) is defined as

    \[ S(x_d, a) := \bigcup_{1 \le i \le r} \ \bigcup_{t \in B(x_{d,i}, a)} \{ x'_i l'_i : x'_i = f(x_i, t),\ l'_i = l_i \} \tag{16} \]

    where xi is a state in xd, li is the fault label of xi, and r is the number of states in xd. Since the fault events are unobservable, the label propagation function does not change the fault labels of the states in the process of building S(xd, a).

    The diagnoser state transition function of Nd is fd : Xd × Σo → Xd where Xd is the state space of the diagnoser Nd. Given the diagnoser state xd and the event a ∈ Σo, fd(xd, a) is defined if B(xd,i, a) ≠ ∅ for some xd,i in xd. If fd(xd, a) is defined, then x′d = fd(xd, a) and x′d is the listing of the elements of the set

    ∪s∈S(xd,a) UR(s). (17)
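    Putting (14)-(17) together, one diagnoser step fires every feasible transition labeled by the observed event from every row, keeps the fault labels, and then takes the unobservable reach of each result. The following Python sketch illustrates this on a hypothetical four-transition net; the net, the helper names, and the representation of a diagnoser state as a set of (marking, label) pairs are assumptions made for the example.

```python
# Hypothetical net: place 0 -uo-> place 1, place 0 -f1-> place 2,
# place 1 -a-> place 0, place 2 -b-> place 1.
pre = {"t_uo": {0: 1}, "t_f": {0: 1}, "t_a": {1: 1}, "t_b": {2: 1}}
post = {"t_uo": {1: 1}, "t_f": {2: 1}, "t_a": {0: 1}, "t_b": {1: 1}}
event_of = {"t_uo": "uo", "t_f": "f1", "t_a": "a", "t_b": "b"}
unobservable = {"uo", "f1"}
fault_type_of = {"f1": 0}

def fire(x, t):
    """Return the successor marking, or None if t is not feasible from x."""
    if any(x[p] < w for p, w in pre[t].items()):
        return None
    y = list(x)
    for p, w in pre[t].items():
        y[p] -= w
    for p, w in post[t].items():
        y[p] += w
    return tuple(y)

def unobservable_reach(x, label):
    """Pairs reachable from (x, label) through unobservable transitions only."""
    seen, stack = {(x, label)}, [(x, label)]
    while stack:
        y, ly = stack.pop()
        for t, e in event_of.items():
            z = fire(y, t) if e in unobservable else None
            if z is None:
                continue
            lz = list(ly)
            if e in fault_type_of:
                lz[fault_type_of[e]] = 1
            if (z, tuple(lz)) not in seen:
                seen.add((z, tuple(lz)))
                stack.append((z, tuple(lz)))
    return seen

def diagnoser_step(xd, a):
    """fd(xd, a): build S(xd, a) as in (16), then the union of UR as in (17).
    Returns None when no row of xd has a feasible transition labeled a."""
    successors = set()
    for x, label in xd:
        for t, e in event_of.items():
            if e == a:
                y = fire(x, t)
                if y is not None:
                    successors.add((y, label))  # observable firing keeps the label
    if not successors:
        return None
    result = set()
    for y, label in successors:
        result |= unobservable_reach(y, label)
    return result

xd0 = unobservable_reach((1, 0, 0), (0,))
```

    In this example, observing `b` from `xd0` is only consistent with the fault having fired, so the resulting diagnoser state has a single row whose fault label is `(1,)`.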

    The diagnostic information provided by a diagnoser state is given by examining the last k columns of that state: (i) if a column contains only 0’s, then we know that no fault event of the corresponding type could have occurred; (ii) if a column contains only 1’s, then we are sure that at least one fault event of that type has occurred; (iii) otherwise, if a column contains 0’s and 1’s, we are uncertain about the occurrence of a fault of that type. If the diagnoser is sure that a fault of type i has occurred, then it outputs “Fi” as indicated in Figure 1. This diagnostic information is equivalent to that obtained from diagnoser automata in the Diagnoser Approach of [4].
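    The three-way reading of the fault-label columns can be sketched as follows. The function name `diagnose` and the verdict strings are illustrative choices; the fault labels are the last two columns of the diagnoser states x1, x2 and x3 listed in equations (20)-(22) of Example 1.

```python
def diagnose(fault_labels):
    """Classify each fault type from the fault-label columns of a diagnoser state."""
    k = len(fault_labels[0])
    verdicts = []
    for i in range(k):
        column = {label[i] for label in fault_labels}
        if column == {0}:
            verdicts.append("normal")      # case (i): only 0's
        elif column == {1}:
            verdicts.append("certain")     # case (ii): only 1's
        else:
            verdicts.append("uncertain")   # case (iii): 0's and 1's
    return verdicts

# Fault labels (F1, F2) of each row of x1, x2 and x3 in Example 1.
x1_labels = [(0, 0), (0, 0), (0, 0), (1, 0), (0, 0), (0, 0)]
x2_labels = x1_labels + [(0, 1), (0, 1), (0, 1), (0, 1)]
x3_labels = [(0, 0), (0, 1)]
```

    Applied to these labels, the function reports F1 as uncertain in x1 and x2 and F2 as uncertain in x2 and x3, in agreement with the discussion in Example 1.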

  • Example 1. Consider the Petri net graph N given in Fig. 2. The set of places of N is P = {p1, p2, . . . , p16}. The set of transitions of N is T = {t1, t2, . . . , t17}. All arc weights are equal to 1. The initial marking x0 is

    x0 = [ 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ] (18)

    The set of events is Σ = {a, e, g, h, σuo, f1, f2}. We do not explicitly write the event labeling function l, but the event label of every transition t ∈ T is shown in Fig. 2. The set of unobservable events is Σuo = {σuo, f1, f2} and all the remaining events in event set Σ are observable. There are two types of faults and the sets corresponding to them are ΣF1 = {f1} and ΣF2 = {f2}.

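    Let Nd = (N, Σ, l, x0, ∆f) denote the diagnoser. The initial diagnoser state x0 is the listing of the elements of set UR(x0 l0) where l0 is “normal”. Then, x0 is found as

    \[ x^0 = \left[ \begin{array}{cccccccccccccccc|cc}
    1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
    0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
    \end{array} \right] \tag{19} \]

    These four states in the initial diagnoser state are shown in Nd in Fig. 2 by using four types of tokens, one per row.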

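    We show in (20)-(22) the states of the diagnoser that are reached if the sequence of observable events is “aeh”. An examination of the last two columns of x1, x2 and x3 reveals that: (i) x1 and x2 are F1-uncertain (f1 could have happened but we do not know for sure) and (ii) x2 and x3 are F2-uncertain. The complete state space of Nd contains 28 diagnoser states. These states are not listed here due to space constraints. We note that we wrote a Matlab program to generate the state space of Petri net diagnosers.

    \[ x^1 = f_d(x^0, a) = \left[ \begin{array}{cccccccccccccccc|cc}
    1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
    0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0
    \end{array} \right] \tag{20} \]

    \[ x^2 = f_d(x^1, e) = \left[ \begin{array}{cccccccccccccccc|cc}
    1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
    0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
    0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
    1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
    0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
    0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
    0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1
    \end{array} \right] \tag{21} \]

    \[ x^3 = f_d(x^2, h) = \left[ \begin{array}{cccccccccccccccc|cc}
    0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
    0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1
    \end{array} \right] \tag{22} \]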

  • [Figure 2: the Petri net graph N (left) and the Petri net diagnoser Nd (right), each with places p1 to p16 and transitions t1 to t17 labeled by the events a, e, g, h, σuo, f1 and f2.]

    Fig. 2. The Petri net graph N with initial marking x0 is given on the left, and its Petri net diagnoser Nd with the initial diagnoser state x0 is given on the right

    3 Distributed Diagnosis with Communication

    In this section, we study the problem of distributed diagnosis with communication. In the case of centralized diagnosis, there is one diagnoser working on the entire system model and processing all the observations. We are interested in the situation depicted in Fig. 3 where there are two diagnosers, each containing only part of the model and each observing only a subset of the observed events. We allow these two diagnosers to communicate after the occurrence of an observable event.

    We begin this section by defining the distributed system model and the distributed diagnosers. In the second part, we define the communication protocol and give the algorithm of distributed diagnosis with communication at the end of the subsection. Then, we state the main result of this paper and give its proof.

    3.1 System Model

    The system to be diagnosed is given by the labeled Petri net (N, Σ, l, x0). We wish to partition (N, Σ, l, x0) into two “place-bordered” labeled Petri nets using a partition on the event set Σ. For this purpose, let ΠΣ be a partition on Σ such that Σ = Σ1 ∪ Σ2 and Σ1 ∩ Σ2 = ∅. Then, we define the place-bordered labeled Petri nets (N1, Σ1, l1, x01) and (N2, Σ2, l2, x02) where N1 = 〈P1, T1, A1, w1〉 and N2 = 〈P2, T2, A2, w2〉 by the following conditions:

    1. ∀t ∈ T, if l(t) ∈ Σ1, then t ∈ T1; ∀t ∈ T, if l(t) ∈ Σ2, then t ∈ T2.
    2. P1 = ∪t∈T1 (I(t) ∪ O(t)), P2 = ∪t∈T2 (I(t) ∪ O(t)).

    [Figure 3: block diagram. The system model consists of the nets N1 and N2, which share common places. N1 sends its observable events σo ∈ Σo,1 to the first diagnoser Nd,1 and N2 sends its observable events σo ∈ Σo,2 to the second diagnoser Nd,2. The two diagnosers communicate with each other and output the failure types Fi and Fj, respectively.]

    Fig. 3. Distributed diagnosis with communication

    The corresponding Petri net graphs N1 and N2 have disjoint sets of transitions. However, the sets of places are not disjoint, i.e., there may exist t1 ∈ T1 and t2 ∈ T2 such that O(t1) ∩ I(t2) ≠ ∅ or I(t1) ∩ O(t2) ≠ ∅. A1 and A2 are the restrictions of A to (P1 × T1) ∪ (T1 × P1) and (P2 × T2) ∪ (T2 × P2), respectively. Similarly for w1, w2 and l1, l2. x0 can be written as

    x0 = [x0,P1−Pc , x0,Pc , x0,P2−Pc ] (23)

    where Pc = P1 ∩ P2 denotes the set of common places, x0,P1−Pc denotes the columns of markings corresponding to the places in the set (P1 − Pc) (we use − to denote the set difference); x0,Pc and x0,P2−Pc are defined similarly. Then, x01 is defined as

    x01 = [x0,P1−Pc , x0,Pc ], (24)

    and x02 is defined as

    x02 = [x0,Pc , x0,P2−Pc ]. (25)
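    The decomposition described above, together with the split of the initial state in (23)-(25), can be sketched in Python as follows. The three-transition chain net, the dictionary-based marking, and the function names are assumptions made for illustration; markings are keyed by place name, so the column ordering of (23) is not modeled explicitly.

```python
# Hypothetical chain net p1 -t1-> p2 -t2-> p3 -t3-> p4 with events a, b, c.
pre = {"t1": {"p1": 1}, "t2": {"p2": 1}, "t3": {"p3": 1}}
post = {"t1": {"p2": 1}, "t2": {"p3": 1}, "t3": {"p4": 1}}
event_of = {"t1": "a", "t2": "b", "t3": "c"}

def decompose(sigma1, sigma2):
    """Conditions 1 and 2: transitions follow their event labels, and each
    net keeps the input and output places of its own transitions."""
    T1 = {t for t, e in event_of.items() if e in sigma1}
    T2 = {t for t, e in event_of.items() if e in sigma2}
    P1 = {p for t in T1 for p in (*pre[t], *post[t])}
    P2 = {p for t in T2 for p in (*pre[t], *post[t])}
    return T1, T2, P1, P2, P1 & P2  # last component is Pc

def split_marking(x0, P1, P2):
    """x01 keeps the places of N1 and x02 the places of N2, as in (24)-(25);
    the common places appear in both."""
    x01 = {p: x0[p] for p in sorted(P1)}
    x02 = {p: x0[p] for p in sorted(P2)}
    return x01, x02

T1, T2, P1, P2, Pc = decompose({"a"}, {"b", "c"})
x01, x02 = split_marking({"p1": 1, "p2": 0, "p3": 0, "p4": 0}, P1, P2)
```

    In this example the only common place is `p2`, the output place of `t1` and the input place of `t2`.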

    Every labeled Petri net (N, Σ, l, x0) can be partitioned into place-bordered nets (N1, Σ1, l1, x01) and (N2, Σ2, l2, x02) using the method described above. But not all partitions allow us to do distributed diagnosis. Suppose Nd,1 = (N1, Σ1, l1, xd,01, ∆f,1) and Nd,2 = (N2, Σ2, l2, xd,02, ∆f,2) are diagnosers for (N1, Σ1, l1, x01) and (N2, Σ2, l2, x02), respectively. The sets ∆f,1 and ∆f,2 are the finite sets of fault types of Nd,1 and Nd,2, respectively, and ∆f,1 ∪ ∆f,2 = ∆f. Then, Nd,1 and Nd,2 must satisfy the following two conditions to perform distributed diagnosis:

    1. ∀t ∈ T, if (I(t) ∪ O(t)) ∩ (P1 ∩ P2) ≠ ∅, then l(t) ∈ Σo.
    2. ∀t1 ∈ T1 and ∀t2 ∈ T2, if l(t1) ∈ ΣFi and l(t2) ∈ ΣFj, then i ≠ j.

    The first condition says that the transitions putting tokens in or removing tokens from common places are labeled with observable events. The second condition ensures that fault transitions belonging to different place-bordered nets carry different types of faults, i.e., ∆f,1 ∩ ∆f,2 = ∅. This assumption is made for the sake of simplicity; it could be relaxed at the price of extra communication between the diagnosers.
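    Both conditions are easy to check mechanically once the decomposition is known. The sketch below does so for a hypothetical five-place net with one fault event per module; all names and the example data are assumptions made for illustration.

```python
# Hypothetical net: t1 and t4 belong to N1, t2 and t3 to N2; p2 is the common place.
pre = {"t1": {"p1": 1}, "t2": {"p2": 1}, "t3": {"p3": 1}, "t4": {"p1": 1}}
post = {"t1": {"p2": 1}, "t2": {"p3": 1}, "t3": {"p4": 1}, "t4": {"p5": 1}}
event_of = {"t1": "a", "t2": "b", "t3": "f2", "t4": "f1"}
T1, T2 = {"t1", "t4"}, {"t2", "t3"}
Pc = {"p2"}

def border_is_observable(observable):
    """Condition 1: every transition touching a common place is observable."""
    return all(event_of[t] in observable
               for t in event_of
               if (set(pre[t]) | set(post[t])) & Pc)

def fault_types_are_disjoint(fault_type_of):
    """Condition 2: no fault type is shared between the two nets."""
    types1 = {fault_type_of[event_of[t]] for t in T1 if event_of[t] in fault_type_of}
    types2 = {fault_type_of[event_of[t]] for t in T2 if event_of[t] in fault_type_of}
    return not (types1 & types2)
```

    With observable events {a, b} and distinct fault types for `f1` and `f2`, both checks succeed; hiding event `b` or merging the two fault types makes the corresponding check fail.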

    The initial diagnoser states xd,01 and xd,02 are listings of the elements of the sets UR(x01 l_f^{01}) and UR(x02 l_f^{02}), respectively. The fault label l_f^{01} is in ∆1, where ∆1 is the set of possible fault labels in Nd,1, and similarly the fault label l_f^{02} is in ∆2, where ∆2 is the set of possible fault labels in Nd,2.

    3.2 Algorithm of Distributed Diagnosis with Communication

    Nd,1 and Nd,2 diagnose (N1, Σ1, l1, x01) and (N2, Σ2, l2, x02), respectively. However, the individual estimates of Nd,1 and Nd,2 do not provide enough information for diagnosis if Nd,1 and Nd,2 work in isolation. This is because N1 and N2 do not have disjoint sets of places, so both nets can change the markings on the common places and thereby affect each other. If Nd,1 and Nd,2 are not informed of each other’s changes of markings, then their state estimates are incomplete or wrong. We overcome this problem by defining a communication protocol between diagnosers. This protocol recovers the centralized diagnosis information by allowing the two diagnosers to send each other the change of markings on the common places.

    We define the weighting vector W(t) for a labeled Petri net graph as

    W(t) = [w(t, p1) − w(p1, t), w(t, p2) − w(p2, t), . . . , w(t, p|P|) − w(p|P|, t)] (26)

    where t ∈ T and pi ∈ P for all 1 ≤ i ≤ |P|. When we write WPc(t), this means the columns of W(t) corresponding to the common places of N1 and N2. We use the same notation in the states and the diagnoser states. That is, as was done for the initial state in Section 3.1, if x ∈ X, then xP1 denotes the columns of x corresponding to the places of N1 and if xd ∈ Xd, then xd,P1 denotes the columns of xd corresponding to the places of N1.

    In contrast with centralized diagnoser states, distributed diagnoser statescarry message labels. Message labels record the actions on the common places.Let lm be the message label of x ∈ X and t = t1t2 . . . t|t| ∈ T ∗ be a string oftransitions. If x′ = f(x, t), i.e., x′ = f(· · · f(f(x, t1), t2), t|t|), then the messagelabel propagation function MLP defines the message label l′m of x

    ′ as

    l′m = MLP (x, lm, t) = [lm, WPc(t1), WPc(t2), . . . , WPc(t|t|)]. (27)

The length of the message label $l'_m$ is bounded by $|l_m| + |t||P_c|$.

The message label of a diagnoser state is the listing of the message labels for every state in the diagnoser state, i.e., for every row of the diagnoser state. Given a diagnoser state $x_d$, we denote by $\mathit{MLabel}(x_d)$ the message label of $x_d$. Suppose that $x_d$ is reached from $x'_d$ by firing transition $t$ labeled by observable event $\sigma_o$. If $W_{P_c}(t)$ is equal to the zero vector for every state reached in $x_d$, then the message label of $x_d$ is equal to the message label of $x'_d$, i.e., $\mathit{MLabel}(x_d) = \mathit{MLabel}(x'_d)$.
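The propagation rule (27) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the dictionary `w_pc` (mapping a transition name to its $W_{P_c}$ vector) and the transition names are hypothetical, and the state argument of the propagation function is omitted because (27) does not use it. The rule is applied literally, so zero blocks are appended as well.

```python
def mlp(label, transitions, w_pc):
    """Message label propagation (27): append W_Pc(t_k) for each fired transition."""
    new_label = list(label)
    for t in transitions:
        new_label.extend(w_pc[t])
    return new_label


# Hypothetical data with |Pc| = 2 common places.
w_pc = {"t1": [-1, 1], "t2": [0, 0], "t3": [0, -1]}
label = mlp([], ["t1", "t2", "t3"], w_pc)
print(label)
```

In this literal reading, the resulting length equals the initial length plus the number of fired transitions times the number of common places, which is the bound stated above.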

As defined, the length of the message labels grows without bound as the number of observed events grows. However, the message labels of diagnoser states can be truncated under the following condition. Given the diagnoser state $x_d \in \mathbb{N}^{n \times k}$, let $l^i_m$ denote the message label of the state in the $i$th row of $x_d$. If, for all $i$ with $1 \le i \le n$, the message labels are of the form $l^i_m = a b_i$, i.e., they have a common prefix $a$ but different suffixes $b_i$, then these message labels are truncated to $l^i_m = b_i$.
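A minimal sketch of this truncation is given below. It assumes that labels are stored as Python lists, that the list of labels is non-empty, and that the shared prefix is removed in whole blocks of $|P_c|$ entries (one block per recorded transition); the block granularity and the function name are assumptions of this sketch and are not stated in the text.

```python
def truncate_common_prefix(labels, block):
    """Remove the longest prefix shared by every label, in whole blocks of `block` entries."""
    shortest = min(len(label) for label in labels)
    cut = 0
    while cut + block <= shortest and all(
        label[cut:cut + block] == labels[0][cut:cut + block] for label in labels
    ):
        cut += block
    return [label[cut:] for label in labels]


# Hypothetical labels with |Pc| = 2: both rows start with the block [-1, 1].
labels = [[-1, 1, 0, -1], [-1, 1, 1, 0]]
truncated = truncate_common_prefix(labels, block=2)
print(truncated)
```

Because rows of the two diagnosers are later matched by their message labels, the same prefix would have to be removed on both sides.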

Communication between the two diagnosers is triggered by the occurrence of observable events. When $N_{d,1}$ observes event $\sigma_o \in \Sigma_{o,1}$, it updates its diagnoser state and sends the message label of the resulting diagnoser state to $N_{d,2}$. We assume that the message is correctly received by $N_{d,2}$ without delay (or with a delay that is less than the minimum interarrival time). Upon reception of the message, $N_{d,2}$ uses the received message label to update its current diagnoser state. We will demonstrate that under this protocol, the centralized diagnoser state can be recovered from the diagnoser states of $N_{d,1}$ and $N_{d,2}$. We now formalize this protocol for distributed diagnosis with communication (DDC).

From now on, when we denote a diagnoser state $x_{d,1}$ of the diagnoser $N_{d,1}$, we drop the subscript $d$ and write $x_1$ instead of $x_{d,1}$. We use the same notation for the diagnoser states of diagnoser $N_{d,2}$.

Algorithm DDC. Given that the sequence $s = \sigma_{o0}\sigma_{o1}\ldots\sigma_{on}$ is observed, where $|s| = n+1$, initialize the algorithm with $i := 0$.

Upon observation of $\sigma_{oi}$ do { If $\sigma_{oi} \in \Sigma_1$, then go to 1, else go to 2 }

1 {Master is $N_{d,1}$}

  1.1 Find the next diagnoser state of $N_{d,1}$:

      \[ x_1^{i+1} = f_{d,1}(x_1^i, \sigma_{oi}), \]

      where $f_{d,1}$ is the diagnoser state transition function of $N_{d,1}$ and $x_1^i$ is the diagnoser state of $N_{d,1}$ after the completion of the DDC algorithm for the event $\sigma_{o(i-1)}$.

  1.2 If $W_{P_c}(t) = 0$ for all $t \in B(x_1^i, \sigma_{oi})$, then set $x_2^{i+1}$ equal to $x_2^i$ and go to 1.4.

  1.3 Send a "message" to $N_{d,2}$:

      \[ \mathit{message} := \mathit{MLabel}(x_1^{i+1}) \]

      Upon reception of this message, $N_{d,2}$ "updates" $x_2^i$ to $x_2^{i+1}$ as follows: Initialize diagnoser state $x_2^{i+1}$ to the empty matrix. For $k$ from 1 to $r$, where $r$ denotes the number of rows of "message", do the following:

      1.3.1 Given

            \[ \mathit{message}_k = [\mathit{message\_prefix}_k,\; \mathit{message\_present}_k], \]

            where $\mathit{message\_present}_k$ is the last $|P_c|$ columns of $\mathit{message}_k$ and $\mathit{message\_prefix}_k$ is the rest of it, extract the set $M$ of states of $x_2^i$ with message labels that are equal to $\mathit{message\_prefix}_k$.

      1.3.2 Given $M$, construct the set $\bar{M}$ as follows:

            \[ \bar{s}_{P_c} = s_{P_c} + \mathit{message\_present}_k, \qquad \bar{s}_{P_2-P_c} = s_{P_2-P_c}, \]

            \[ l_m^{\bar{s}} = [\, l_m^{s},\; \mathit{message\_present}_k \,], \]

            where $s \in M$, $\bar{s} \in \bar{M}$, and $l_m^{s}$ and $l_m^{\bar{s}}$ are the message labels of $s$ and $\bar{s}$, respectively.

      1.3.3 Append every element of $\bar{M}$ to $x_2^{i+1}$ as a new row.

  1.4 If possible, truncate the message labels of both $x_1^{i+1}$ and $x_2^{i+1}$.

  1.5 Increment $i$.

2 {Master is $N_{d,2}$} Same as 1 but exchange 1 and 2 in every expression.

End
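The dispatch at the top of Algorithm DDC and the test in step 1.2 can be sketched as follows. The event sets are those of the partition used in Example 2 (with `suo` standing for $\sigma_{uo}$); the transition names `ta` and `tb`, their $W_{P_c}$ values, and the function names are hypothetical and serve only to exercise the code.

```python
SIGMA_1 = {"a", "suo", "f1"}     # events of N1 (partition of Example 2)
SIGMA_2 = {"e", "g", "h", "f2"}  # events of N2


def master(event):
    """Return 1 or 2: the diagnoser that processes the observed event first."""
    if event in SIGMA_1:
        return 1
    if event in SIGMA_2:
        return 2
    raise ValueError(f"unknown event: {event}")


def must_send(candidates, w_pc):
    """Step 1.2: a message is needed unless W_Pc(t) = 0 for every candidate transition."""
    return any(any(w_pc[t]) for t in candidates)


# Hypothetical W_Pc values for two transitions, with |Pc| = 5.
w_pc = {"ta": (-1, 1, 0, 0, 0), "tb": (0, 0, 0, 0, 0)}
print(master("a"), master("h"), must_send(["ta", "tb"], w_pc), must_send(["tb"], w_pc))
```

When `must_send` returns `False`, the other diagnoser keeps its current state and only the truncation step remains.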

Example 2. We use the same labeled Petri net as in Example 1. We choose an arbitrary event partition $\Pi_\Sigma$ such that $\Sigma_1 = \{a, \sigma_{uo}, f_1\}$ and $\Sigma_2 = \{e, g, h, f_2\}$. Given this event partition, the sets of transitions of $N_1$ and $N_2$ are $T_1 = \{t_1, t_2, t_3, t_4, t_5, t_6, t_8\}$ and $T_2 = \{t_7, t_9, t_{10}, t_{11}, t_{12}, t_{13}, t_{14}, t_{15}, t_{16}, t_{17}\}$, respectively. The set of common places is $P_c = \{p_3, p_6, p_8, p_9, p_{11}\}$. The resulting place-bordered Petri net graphs are shown in Fig. 4, where common places are drawn with dashed lines. The place-bordered Petri nets $(N_1, \Sigma_1, l_1, x_{01})$ and $(N_2, \Sigma_2, l_2, x_{02})$ satisfy the two conditions required for distributed diagnosis: all the transitions putting tokens in or removing tokens from common places in both Petri net graphs are labeled with observable events, and the sets of fault types of these nets are disjoint.

Suppose that the event sequence "aeh" is observed. The diagnoser states of $N_{d,1}$ and $N_{d,2}$ at the end of each iteration are

    x01 =

    1 1 1 0 0 0 0 0 0 0 |00 1 1 1 0 0 0 0 0 0 |00 0 1 0 1 0 0 0 0 0 |10 1 1 0 0 0 1 0 0 0 |1

    , x02 =

    [1 0 0 0 0 0 0 0 0 0 0 |0

    ](28)

    x11 =

    1 0 0 0 0 1 0 0 0 0 |0| −1 1 0 0 00 0 0 1 0 1 0 0 0 0 |0| −1 1 0 0 00 1 1 0 0 0 0 1 0 0 |0| 0 0 1 0 00 1 1 0 0 0 0 0 1 0 |1| 0 0 0 1 00 0 0 0 0 1 1 0 0 0 |1| −1 1 0 0 00 1 1 0 0 0 0 0 0 1 |1| 0 0 0 0 1

    , x12 =

    0 1 0 0 0 0 0 0 0 0 0 |0| −1 1 0 0 01 0 1 0 0 0 0 0 0 0 0 |0| 0 0 1 0 01 0 0 1 0 0 0 0 0 0 0 |0| 0 0 0 1 01 0 0 0 0 1 0 0 0 0 0 |0| 0 0 0 0 1

    (29)

    x21 =

    1 0 0 0 0 0 0 0 0 0 |0| −1 1 0 0 0 0 −1 0 0 00 0 0 1 0 0 0 0 0 0 |0| −1 1 0 0 0 0 −1 0 0 00 0 0 0 0 0 1 0 0 0 |1| −1 1 0 0 0 0 −1 0 0 00 1 1 0 0 0 0 0 0 0 |0| 0 0 1 0 0 0 0 −1 0 00 1 1 0 0 0 0 0 0 0 |1| 0 0 0 1 0 0 0 0 −1 00 1 1 0 0 0 0 0 0 0 |1| 0 0 0 0 1 0 0 0 0 −1

    , (30)

  • x22 =

    0 0 0 0 1 0 0 0 0 0 0 |0| −1 1 0 0 0 0 −1 0 0 01 0 0 0 0 0 1 0 0 0 0 |0| 0 0 1 0 0 0 0 −1 0 01 0 0 0 0 0 0 1 0 0 0 |0| 0 0 0 1 0 0 0 0 −1 01 0 0 0 0 0 0 0 0 1 0 |0| 0 0 0 0 1 0 0 0 0 −10 0 0 0 0 0 0 0 1 0 0 |1| −1 1 0 0 0 0 −1 0 0 01 0 0 0 0 0 0 0 0 0 1 |1| 0 0 1 0 0 0 0 −1 0 0

    . (31)

    x31 = x21, x

    32 =

    [1 0 0 0 0 0 0 0 0 1 0 |0| 0 0 0 0 1 0 0 0 0 −11 0 0 0 0 0 0 0 0 0 1 |1| 0 0 1 0 0 0 0 −1 0 0

    ]. (32)

The above diagnoser states of $N_{d,1}$ and $N_{d,2}$ were found by a Matlab program that implements Algorithm DDC.
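As an illustration, step 1.3 (with the roles of 1 and 2 exchanged) can be replayed in Python for the second observed event, e, where $N_{d,2}$ is the master and $N_{d,1}$ receives the message. The rows below are copied from (29) and the message rows from the message labels in (31). Several points are assumptions of this sketch rather than statements of the paper: the tuple encoding of a row, the column positions `PC_COLS_N1` (which presume that the states of $N_1$ are ordered as p1, ..., p9, p11), and the removal of duplicate message rows before the loop over k.

```python
PC_COLS_N1 = [2, 5, 7, 8, 9]  # assumed columns of p3, p6, p8, p9, p11 in N1 states
N_PC = len(PC_COLS_N1)


def receive(rows, message, pc_cols):
    """Step 1.3 at the receiver; rows are (state, fault_label, message_label) tuples."""
    updated = []
    for msg_row in dict.fromkeys(message):      # distinct message rows, order preserved
        prefix, present = msg_row[:-N_PC], msg_row[-N_PC:]
        for state, fault, label in rows:
            if label != prefix:                 # 1.3.1: keep rows whose label matches
                continue
            new_state = list(state)
            for col, delta in zip(pc_cols, present):
                new_state[col] += delta         # 1.3.2: shift common-place markings
            updated.append((tuple(new_state), fault, label + present))  # 1.3.3
    return updated


# Diagnoser state of N_{d,1} after the first event, copied from (29).
x1_1 = [
    ((1, 0, 0, 0, 0, 1, 0, 0, 0, 0), 0, (-1, 1, 0, 0, 0)),
    ((0, 0, 0, 1, 0, 1, 0, 0, 0, 0), 0, (-1, 1, 0, 0, 0)),
    ((0, 1, 1, 0, 0, 0, 0, 1, 0, 0), 0, (0, 0, 1, 0, 0)),
    ((0, 1, 1, 0, 0, 0, 0, 0, 1, 0), 1, (0, 0, 0, 1, 0)),
    ((0, 0, 0, 0, 0, 1, 1, 0, 0, 0), 1, (-1, 1, 0, 0, 0)),
    ((0, 1, 1, 0, 0, 0, 0, 0, 0, 1), 1, (0, 0, 0, 0, 1)),
]
# Message sent by N_{d,2}: the message labels of the six rows in (31).
message = [
    (-1, 1, 0, 0, 0, 0, -1, 0, 0, 0),
    (0, 0, 1, 0, 0, 0, 0, -1, 0, 0),
    (0, 0, 0, 1, 0, 0, 0, 0, -1, 0),
    (0, 0, 0, 0, 1, 0, 0, 0, 0, -1),
    (-1, 1, 0, 0, 0, 0, -1, 0, 0, 0),
    (0, 0, 1, 0, 0, 0, 0, -1, 0, 0),
]

x2_1 = receive(x1_1, message, PC_COLS_N1)
for row in x2_1:
    print(row)
```

Under these assumptions the six rows produced coincide, in order, with the rows listed in (30).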

3.3 Recovering the Centralized Diagnoser State from the Distributed Diagnoser States

In this section we show how the centralized diagnoser state can be recovered under the communication protocol described in Algorithm DDC in the previous section. We verify the correctness of the recovery method by showing that it reconstructs the centralized diagnoser state after each observable event in the given observed sequence of events.

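An iteration of Algorithm DDC is the completion of the algorithm for one observable event in the sequence. Let $x$ and $\bar{x}$ be the diagnoser states of $N_{d,1}$ and $N_{d,2}$, respectively, at the end of an iteration. We denote by $x_i$ the $i$th row of diagnoser state $x$. Similarly, we denote by $\bar{x}_j$ the $j$th row of diagnoser state $\bar{x}$. Let $l_m^{x_i}$ and $l_m^{\bar{x}_j}$ denote the message labels of $x_i$ and $\bar{x}_j$, and let $l_f^{x_i}$ and $l_f^{\bar{x}_j}$ denote the fault labels of $x_i$ and $\bar{x}_j$, respectively. We denote by $d_i$ and $\bar{d}_j$ the states of $(N_1, \Sigma_1, l_1, x_{01})$ and $(N_2, \Sigma_2, l_2, x_{02})$ in rows $x_i$ and $\bar{x}_j$, respectively. Combining all these notations, we write $x_i = d_i\, l_f^{x_i}\, l_m^{x_i}$ and $\bar{x}_j = \bar{d}_j\, l_f^{\bar{x}_j}\, l_m^{\bar{x}_j}$. If the numbers of rows of $x$ and $\bar{x}$ are $r$ and $\bar{r}$, respectively, then we define the set $\mathit{Merge}(x, \bar{x})$ as follows:

\[
\mathit{Merge}(x, \bar{x}) = \bigcup_{1 \le i \le r}\ \bigcup_{1 \le j \le \bar{r}} \left\{ [\, d_i\, \bar{d}_{j,P_2-P_c} \mid l_f^{x_i}\, l_f^{\bar{x}_j} \,] : l_m^{x_i} = l_m^{\bar{x}_j} \right\}. \tag{33}
\]

Algorithm DDC results in $x_{i,P_c} = \bar{x}_{j,P_c}$ when $l_m^{x_i} = l_m^{\bar{x}_j}$. Therefore $\mathit{Merge}(x, \bar{x})$ can equivalently be defined as follows:

\[
\mathit{Merge}(x, \bar{x}) = \bigcup_{1 \le i \le r}\ \bigcup_{1 \le j \le \bar{r}} \left\{ [\, d_{i,P_1-P_c}\, \bar{d}_j \mid l_f^{x_i}\, l_f^{\bar{x}_j} \,] : l_m^{x_i} = l_m^{\bar{x}_j} \right\}. \tag{34}
\]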
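The merge operation in the form (34) can be sketched in Python and tried on the last iteration of Example 2, using the six rows of (30) for $N_{d,1}$ (recall that the third state of $N_{d,1}$ equals the second) and the two rows of (32) for $N_{d,2}$. The tuple encoding, the function name, and the column positions `N1_ONLY_COLS` are assumptions of this sketch; the positions presume that the states of $N_1$ are ordered as p1, ..., p9, p11. The merged state lists the private places of $N_1$ first and then all places of $N_2$, which need not be the column order used for the centralized diagnoser.

```python
N1_ONLY_COLS = [0, 1, 3, 4, 6]  # assumed columns of P1 - Pc (p1, p2, p4, p5, p7)


def merge(x, x_other, n1_only_cols):
    """Merge as in (34): join rows of the two diagnosers with equal message labels."""
    merged = []
    for d, fault, label in x:
        for d_other, fault_other, label_other in x_other:
            if label == label_other:
                state = tuple(d[c] for c in n1_only_cols) + d_other
                row = (state, (fault, fault_other))
                if row not in merged:   # Merge is a set, so duplicates are dropped
                    merged.append(row)
    return merged


# Rows of N_{d,1} from (30) and of N_{d,2} from (32): (state, fault label, message label).
x3_1 = [
    ((1, 0, 0, 0, 0, 0, 0, 0, 0, 0), 0, (-1, 1, 0, 0, 0, 0, -1, 0, 0, 0)),
    ((0, 0, 0, 1, 0, 0, 0, 0, 0, 0), 0, (-1, 1, 0, 0, 0, 0, -1, 0, 0, 0)),
    ((0, 0, 0, 0, 0, 0, 1, 0, 0, 0), 1, (-1, 1, 0, 0, 0, 0, -1, 0, 0, 0)),
    ((0, 1, 1, 0, 0, 0, 0, 0, 0, 0), 0, (0, 0, 1, 0, 0, 0, 0, -1, 0, 0)),
    ((0, 1, 1, 0, 0, 0, 0, 0, 0, 0), 1, (0, 0, 0, 1, 0, 0, 0, 0, -1, 0)),
    ((0, 1, 1, 0, 0, 0, 0, 0, 0, 0), 1, (0, 0, 0, 0, 1, 0, 0, 0, 0, -1)),
]
x3_2 = [
    ((1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0), 0, (0, 0, 0, 0, 1, 0, 0, 0, 0, -1)),
    ((1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), 1, (0, 0, 1, 0, 0, 0, 0, -1, 0, 0)),
]

merged = merge(x3_1, x3_2, N1_ONLY_COLS)
for row in merged:
    print(row)
```

Only two of the six rows of $N_{d,1}$ find a partner with an equal message label, so the merged set has two rows; the remaining four rows are discarded by the label test.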

We state the main result of this paper in the following theorem. The theorem claims that the centralized diagnoser state can be recovered from the diagnoser states of the distributed diagnosers by merging these diagnoser states as defined in (33) or (34).

Fig. 4. Petri net graphs $N$, $N_1$ and $N_2$

Theorem 1. Given the system $(N, \Sigma, l, x_0)$, its diagnoser $N_d$, and the event partition $\Pi_\Sigma$, let $(N_1, \Sigma_1, l_1, x_{01})$ and $(N_2, \Sigma_2, l_2, x_{02})$ be the resulting place-bordered Petri nets, and let $N_{d,1}$ and $N_{d,2}$ be the corresponding diagnosers, respectively. Given a sequence of observable events $\sigma_{o0}\sigma_{o1}\ldots\sigma_{on}$ such that $x^{i+1} = f_d(x^i, \sigma_{oi})$ for all $i$ where $0 \le i \le n$ and $x^i$ is the diagnoser state of $N_d$, if $x_1^i$ and $x_2^i$ are the diagnoser states of $N_{d,1}$ and $N_{d,2}$, respectively, at the end of the iteration of Algorithm DDC for $\sigma_{oi}$, then the set of rows of $x^i$ is equal to $\mathit{Merge}(x_1^i, x_2^i)$.

Proof (of Theorem 1). The proof is by induction. We first show that merging the initial diagnoser states of the distributed diagnosers results in the initial diagnoser state of the centralized diagnoser. The induction hypothesis states that merging the diagnoser states $x_1^i$ and $x_2^i$ of the distributed diagnosers results in the centralized diagnoser state $x^i$. In the induction step we prove, using the induction hypothesis, that merging the diagnoser states $x_1^{i+1}$ and $x_2^{i+1}$ of the distributed diagnosers results in the centralized diagnoser state $x^{i+1}$.

Induction Base. $\mathit{Merge}(x_1^0, x_2^0)$ is equal to the set of rows of $x^0$.

Proof (of Induction Base). The initial diagnoser state $x^0$ is the listing of the elements in the set $\mathit{UR}(x_0 l_0)$, where $l_0$ is "normal". If $(s\, l_f^s)$ is a row of $x^0$, then

\[ s = x_0 + W^*(t), \tag{35} \]
\[ l_f^s = \mathit{LP}(x_0, l_0, t), \tag{36} \]

where $t = t_1 t_2 \ldots t_{|t|} \in \Sigma_{uo}^*$ and $W^*(t) = \sum_{m=1}^{|t|} W(t_m)$. Since addition is column-wise, (35) can be separated into two equations as follows:

\[ s_{P_1} = x_{0,P_1} + W^*_{P_1}(t) \tag{37} \]
\[ s_{P_2} = x_{0,P_2} + W^*_{P_2}(t) \tag{38} \]

From (24) and (25), $x_{0,P_1} = x_{01}$ and $x_{0,P_2} = x_{02}$. Thus $(s_{P_1} l_f^{s_{P_1}})$ and $(s_{P_2} l_f^{s_{P_2}})$ are elements of $\mathit{UR}(x_{01} l_f^{x_{01}})$ and $\mathit{UR}(x_{02} l_f^{x_{02}})$, respectively. By definition, $x_1^0$ and $x_2^0$ are the listings of the elements in the sets $\mathit{UR}(x_{01} l_f^{x_{01}})$ and $\mathit{UR}(x_{02} l_f^{x_{02}})$, respectively. Thus, $(s_{P_1} l_f^{s_{P_1}})$ and $(s_{P_2} l_f^{s_{P_2}})$ are rows of $x_1^0$ and $x_2^0$, respectively. Conversely, if $(s' l_f^{s'})$ and $(s'' l_f^{s''})$ are rows of $x_1^0$ and $x_2^0$, respectively, then $(s' s''_{P_2-P_c}\, l_f^{s'} l_f^{s''})$ is a row of $x^0$.

Induction Hypothesis. $\mathit{Merge}(x_1^i, x_2^i)$ is equal to the set of rows of $x^i$.

Induction Step. $\mathit{Merge}(x_1^{i+1}, x_2^{i+1})$ is equal to the set of rows of $x^{i+1}$.

Proof (of Induction Step). We need to show that $\mathit{Merge}(x_1^{i+1}, x_2^{i+1})$ is equal to the set of rows of $x^{i+1}$. This is done by showing inclusion in both directions for these two sets of rows. Without loss of generality, assume that $\sigma_{oi} \in \Sigma_1$.

(⇐) If $(s\, l_f^s)$ is a row of $x^{i+1}$, then there exist a row $(s' l_f^{s'})$ of $x_1^{i+1}$ and a row $(s'' l_f^{s''})$ of $x_2^{i+1}$ such that $l_m^{s'} = l_m^{s''}$, $s = s' s''_{P_2-P_c} = s'_{P_1-P_c} s''$ and $l_f^s = l_f^{s'} l_f^{s''}$, where $l_f^s$, $l_f^{s'}$ and $l_f^{s''}$ are the fault labels of $s$, $s'$ and $s''$, respectively, and $l_m^{s'}$ and $l_m^{s''}$ are the message labels of $s'$ and $s''$, respectively.

If $(s\, l_f^s)$ is a row of $x^{i+1}$, then $(s\, l_f^s) \in \bigcup_{z \in S(x^i, \sigma_{oi})} \mathit{UR}(z)$ [cf. (17)]. Thus, $(s\, l_f^s) \in S(x^i, \sigma_{oi})$ or there exist $(v\, l_f^v) \in S(x^i, \sigma_{oi})$ and $t = t_1 t_2 \ldots t_{|t|} \in \Sigma_{uo}^*$ such that

\[ s = v + W^*(t) \quad \text{and} \quad l_f^s = \mathit{LP}(v, l_f^v, t), \tag{39} \]

where $W^*(t) = \sum_{m=1}^{|t|} W(t_m)$. The first case can be thought of as a special case of the second: when $s$ is equal to $v$, $W^*(t) = 0$ and $l_f^s = l_f^v$, so that $(s\, l_f^s) \in S(x^i, \sigma_{oi})$ since $(s\, l_f^s) = (v\, l_f^v)$ and $(v\, l_f^v) \in S(x^i, \sigma_{oi})$. Thus, we will write the proof for the second case only.

If $(v\, l_f^v) \in S(x^i, \sigma_{oi})$, then there exists a row $(d\, l_f^d)$ of $x^i$ such that there exists a transition $t \in B(x^i, \sigma_{oi})$ [cf. (15)] feasible from $d$, and

\[ v = d + W(t) \quad \text{and} \quad l_f^v = l_f^d. \tag{40} \]

From the induction hypothesis, $d = d_{P_1} d_{P_2-P_c}$ and $l_f^d = l_f^{d_{P_1}} l_f^{d_{P_2}}$, where $(d_{P_1} l_f^{d_{P_1}})$ is a row of $x_1^i$ and $(d_{P_2} l_f^{d_{P_2}})$ is a row of $x_2^i$, and $l_m^{d_{P_1}} = l_m^{d_{P_2}}$.

Since addition of vectors is column-wise, from (40) we get

\[ v_{P_1} = d_{P_1} + W_{P_1}(t). \tag{41} \]

Note that since $l(t) \in \Sigma_1$, $t \in T_1$. Thus, $t$ is feasible from $d_{P_1}$ in $(N_1, \Sigma_1, l_1, x_{01})$. Then, $(v_{P_1} l_f^{v_{P_1}}) \in S(x_1^i, \sigma_{oi})$.

Consider (39); since addition of vectors is column-wise, we get

\[ s_{P_1} = v_{P_1} + W^*_{P_1}(t). \tag{42} \]

$W_{P_c}(t_m) = 0$ for all $t_m$ in $t$. Thus, $W^*_{P_c}(t) = 0$. Then, $(s_{P_1} l_f^{s_{P_1}})$ is a row of $x_1^{i+1}$ since $(s_{P_1} l_f^{s_{P_1}}) \in \mathit{UR}(v_{P_1} l_f^{v_{P_1}})$ and $(v_{P_1} l_f^{v_{P_1}}) \in S(x_1^i, \sigma_{oi})$. We have $(s_{P_1} l_f^{s_{P_1}}) \in \mathit{UR}(v_{P_1} l_f^{v_{P_1}})$ because if there exists some $t_m$ in $t$, where $1 \le m \le |t|$, such that $l(t_m) \in \Sigma_{uo,2}$, then $W_{P_1}(t_m) = 0$. The message label $l_m^{s_{P_1}}$ of $(s_{P_1} l_f^{s_{P_1}})$ is given as

\[ l_m^{s_{P_1}} = l_m^{v_{P_1}} = [\, l_m^{d_{P_1}},\; W_{P_c}(t) \,] \tag{43} \]

if for some $t \in B(x_1^i, \sigma_{oi})$, $W_{P_c}(t) \neq 0$, i.e., a message is sent. Otherwise, $l_m^{s_{P_1}} = l_m^{d_{P_1}}$, i.e., no message is sent. Note that the transitions labeled with unobservable events do not change the markings of the common places.

We now show that $(s_{P_2} l_f^{s_{P_2}})$ is a row of $x_2^{i+1}$ and that its message label $l_m^{s_{P_2}}$ is equal to the message label of $s_{P_1}$. Similar to (41) and (42), we can write

\[ v_{P_2} = d_{P_2} + W_{P_2}(t) \tag{44} \]

and

\[ s_{P_2} = v_{P_2} + W^*_{P_2}(t). \tag{45} \]

If no message is sent, then $W_{P_c}(t) = 0$ for every $t \in B(x_1^i, \sigma_{oi})$. Moreover, $W_{P_2-P_c}(t) = 0$ since $t \in T_1$. Thus $W_{P_2}(t) = 0$, and from (44) we get $v_{P_2} = d_{P_2}$. If we substitute this result into (45), we see that $(s_{P_2} l_f^{s_{P_2}}) \in \mathit{UR}(d_{P_2} l_f^{d_{P_2}})$. Since $(d_{P_2} l_f^{d_{P_2}})$ is a row of $x_2^i$ and, by definition (17), each element of $\mathit{UR}(d_{P_2} l_f^{d_{P_2}})$ is a row of $x_2^i$, $(s_{P_2} l_f^{s_{P_2}})$ is a row of $x_2^i$. By Algorithm DDC, when no message is sent, $x_2^{i+1} = x_2^i$. Thus, $(s_{P_2} l_f^{s_{P_2}})$ is a row of $x_2^{i+1}$. The message label of $s_{P_2}$ is equal to $l_m^{d_{P_2}}$. Since $l_m^{d_{P_1}} = l_m^{d_{P_2}}$, the message label $l_m^{s_{P_2}} = l_m^{d_{P_1}}$. Since no message is sent, $l_m^{s_{P_1}} = l_m^{d_{P_1}}$, and therefore $l_m^{s_{P_1}} = l_m^{s_{P_2}}$.

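If a message is sent, then $W_{P_c}(t) \neq 0$ for some $t \in B(x_1^i, \sigma_{oi})$. If we substitute (44) into (45), we get

\[ s_{P_2} = d_{P_2} + W_{P_2}(t) + W^*_{P_2}(t). \tag{46} \]

Since $t \in T_1$, $W_{P_2-P_c}(t) = 0$. It was also shown that $W^*_{P_c}(t) = 0$. Then, (46) can be rewritten as

\[ s_{P_2} = \underbrace{d_{P_2} + [\,0,\; W^*_{P_2-P_c}(t)\,]}_{\bar{d}} + [\,W_{P_c}(t),\; 0\,]. \tag{47} \]

Let $l_f^{\bar{d}}$ denote the fault label of $\bar{d}$. Then, $\bar{d}\, l_f^{\bar{d}} \in \mathit{UR}(d_{P_2})$ and the message label $l_m^{\bar{d}}$ of $\bar{d}$ is equal to the message label $l_m^{d_{P_2}}$ of $d_{P_2}$. Since $l_m^{d_{P_1}} = l_m^{d_{P_2}}$, we have $l_m^{\bar{d}} = l_m^{d_{P_1}}$.

By Algorithm DDC, the message sent is $\mathit{MLabel}(x_1^{i+1})$. Since $(s_{P_1} l_f^{s_{P_1}})$ is a row of $x_1^{i+1}$, there exists a row $k$ of $\mathit{MLabel}(x_1^{i+1})$ that is equal to $l_m^{s_{P_1}}$. Let $\mathit{message}_k$ denote row $k$ of this message; then from (43) we extract that $\mathit{message\_prefix}_k = l_m^{d_{P_1}}$ and $\mathit{message\_present}_k = W_{P_c}(t)$. Thus, we get $\mathit{message\_prefix}_k = l_m^{\bar{d}}$. This results in $\bar{d} \in M$ in Algorithm DDC. If $\bar{d} \in M$, then there exists $\tilde{d} \in \bar{M}$ such that

\[ \tilde{d}_{P_c} = \bar{d}_{P_c} + \mathit{message\_present}_k = \bar{d}_{P_c} + W_{P_c}(t), \qquad \tilde{d}_{P_2-P_c} = \bar{d}_{P_2-P_c} \tag{48} \]

and

\[ l_m^{\tilde{d}} = [\, l_m^{\bar{d}},\; W_{P_c}(t) \,]. \tag{49} \]

From (47) and (48) we find that $\tilde{d} = s_{P_2}$ and $l_m^{\tilde{d}} = l_m^{s_{P_2}}$. From (43) and (49) we find that $l_m^{s_{P_1}} = l_m^{\tilde{d}}$. Thus, $l_m^{s_{P_1}} = l_m^{s_{P_2}}$.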

As a result, we showed that, given that $(s\, l_f^s)$ is a row of $x^{i+1}$, $(s_{P_1} l_f^{s_{P_1}})$ and $(s_{P_2} l_f^{s_{P_2}})$ are rows of $x_1^{i+1}$ and $x_2^{i+1}$, respectively, and $l_m^{s_{P_1}} = l_m^{s_{P_2}}$. Thus, defining $s' = s_{P_1}$, $l_f^{s'} = l_f^{s_{P_1}}$, $s'' = s_{P_2}$ and $l_f^{s''} = l_f^{s_{P_2}}$ concludes the proof of one direction of inclusion.

(⇒) If $(s' l_f^{s'})$ and $(s'' l_f^{s''})$ are rows of $x_1^{i+1}$ and $x_2^{i+1}$, respectively, and $l_m^{s'} = l_m^{s''}$, then there exists a row $(s\, l_f^s)$ of $x^{i+1}$ such that $s = s' s''_{P_2-P_c} = s'_{P_1-P_c} s''$ and $l_f^s = l_f^{s'} l_f^{s''}$, where $l_f^s$, $l_f^{s'}$ and $l_f^{s''}$ are the fault labels of $s$, $s'$ and $s''$, respectively, and $l_m^{s'}$ and $l_m^{s''}$ are the message labels of $s'$ and $s''$, respectively.

The proof of the above statement is similar to the proof of the converse statement given in detail above, with the steps followed in reverse order. First we find the rows of $x_1^i$ and $x_2^i$ from which $(s' l_f^{s'})$ and $(s'' l_f^{s''})$ are reached, and using the induction hypothesis we show that the merging of these rows forms a row of $x^i$. Then we find the row in $x^{i+1}$ reached when the transitions in $B(x_1^i, \sigma_{oi})$ or $\Sigma_{uo}$ are fired, and show that it is the merging of $(s' l_f^{s'})$ and $(s'' l_f^{s''})$. The details of this proof are omitted here.

This completes the proof of the induction step and the proof of the theorem. □

Example 3. We consider Example 1 and Example 2, since they give the centralized and distributed diagnoser states, respectively, for the event sequence "aeh".

Note that merging $x_1^0$ and $x_2^0$ in (28) results in $x^0$ given in (19). Similarly, merging $x_1^1$ and $x_2^1$ in (29) results in $x^1$ given in (20); merging $x_1^2$ and $x_2^2$ in (30) and (31), respectively, results in $x^2$ given in (21); merging $x_1^3$ and $x_2^3$ in (32) results in $x^3$ given in (22). Observe that $x_1^3$ in (32) contains state estimates for $N_1$ that are not present in the centralized diagnoser state given in (22). This overestimation is due to the use of a partial system model, namely $N_1$, by $N_{d,1}$. However, these overestimates disappear during the merge operation.

Remark: Each time the merge operation is invoked, we could send each diagnoser its part of the merged diagnoser state, and the diagnosers could use these parts as their new initial states. This resetting of the initial states allows the message labels to be reset as well, thus preventing their unbounded growth.

    4 Conclusion

The Petri net diagnosers introduced in this work are different from the diagnoser automata in [4] in the sense that they perform on-line fault diagnosis on the same transition structure as the system model, namely the Petri net graph of the system. This feature can be exploited to allow for modular/distributed implementations of diagnosis algorithms, as was done in Section 3 based on a decomposition of the Petri net graph of the system into place-bordered subnets. This kind of modular decomposition often occurs naturally in the modelling of complex systems; it would be more difficult to achieve using automata models. In future work, it might be worthwhile to investigate other types of decomposition of Petri nets.

For the sake of generality, our presentation of Algorithm DDC focused on the main steps involved and on its correctness proof. Several improvements to it are possible in order to achieve more efficient implementations from the point of view of the communications required between the diagnosers. As was mentioned in Section 3, message labels can, and should, be truncated when all the rows in a message label share a common prefix. It may be possible to determine upper bounds on the size of message labels based on the structure of the Petri net. Another possible improvement is to attempt to reduce the frequency of communications. For instance, if the connectivity between the place-bordered subnets is "one-way", in the sense that one subnet only consumes tokens from common places but never puts tokens in them, and vice versa for the other, then communication need only be one-way and in fact could possibly be delayed. In general, it may be possible to delay communications if the diagnosers use timestamps and their local clocks are synchronized. Detailed investigations of such improvements constitute interesting topics for future research.

References

1. Pouliezos, A.D., Stavrakakis, G.S.: Real time fault monitoring of industrial processes. Kluwer Academic Publishers (1994)

2. Cassandras, C.G., Lafortune, S.: Introduction to Discrete Event Systems. Kluwer Academic Publishers (1999)

3. Lafortune, S., Teneketzis, D., Sampath, M., Sengupta, R., Sinnamohideen, K.: Failure diagnosis of dynamic systems: An approach based on discrete event systems. In: Proc. 2001 American Control Conf. (2001) 2058–2071

4. Sampath, M., Sengupta, R., Lafortune, S., Sinnamohideen, K., Teneketzis, D.: Diagnosability of discrete event systems. IEEE Trans. Automatic Control 40 (1995) 1555–1575

5. Sampath, M., Sengupta, R., Lafortune, S., Sinnamohideen, K., Teneketzis, D.: Failure diagnosis using discrete event models. IEEE Trans. Control Systems Technology 4 (1996) 105–124

6. Sampath, M.: Discrete event systems based diagnostics for a variable air volume terminal box application. Technical report, Advanced Development Team, Johnson Controls, Inc. (1995)

7. Şimşek, H.T., Sengupta, R., Yovine, S., Eskafi, F.: Fault diagnosis for intra-platoon communication. In: Proc. 38th IEEE Conf. on Decision and Control. (1999)

8. Sengupta, R.: Discrete-event diagnostics of automated vehicles and highways. In: Proc. 2001 American Control Conf. (2001)

9. Sampath, M., Godambe, A., Jackson, E., Mallow, E.: Combining qualitative and quantitative reasoning - a hybrid approach to failure diagnosis of industrial systems. In: IFAC SafeProcess 2000. (2000) 494–501

10. Sampath, M.: A hybrid approach to failure diagnosis of industrial systems. In: Proc. 2001 American Control Conf. (2001)

11. García, E., Morant, F., Blasco-Giménez, R., Quiles, E.: Centralized modular diagnosis and the phenomenon of coupling. In Silva, M., Giua, A., Colom, J., eds.: Proceedings of the 6th International Workshop on Discrete Event Systems, IEEE Computer Society (2002) 161–168

12. Chehaibar, G.: Replacements of Open Interface Subnets and Stable State Transformation Equivalence, Springer-Verlag (1993) 1–25

13. Vogler, W.: Modular Construction and Partial Order Semantics of Petri Nets (Lecture Notes in Computer Science, vol. 625). Springer-Verlag (1998)

14. Aghasaryan, A., Fabre, E., Benveniste, A., Boubour, R., Jard, C.: Fault detection and diagnosis in distributed systems: An approach by partially stochastic Petri nets. Journal of Discrete Event Dynamical Systems Vol. 8(2) (1998) 203–231

15. Benveniste, A., Fabre, E., Jard, C., Haar, S.: Diagnosis of asynchronous discrete event systems, a net unfolding approach. Technical Report Research Report 1456, Irisa (2002)

16. Hadjicostis, C.N., Verghese, G.C.: Monitoring Discrete Event Systems Using Petri Net Embeddings. Application and Theory of Petri Nets 1999 (Series Lecture Notes in Computer Science, vol. 1639) (1999) 188–207

17. Sifakis, J.: Realization of fault-tolerant systems by coding Petri nets. Journal of Design Automation and Fault-Tolerant Computing Vol. 3 (1979) 93–107

18. Giua, A.: Petri net state estimators based on event observation. IEEE 36th Int. Conf. on Decision and Control (1997) 4086–4091

19. Desel, J., Esparza, J.: Free Choice Petri Nets. Cambridge University Press (1995)