
ACSI – Artifact-Centric Service Interoperation

Deliverable D3.2

Model Repair (Task 3.2)

Project Acronym: ACSI
Project Title: Artifact-Centric Service Interoperation
Project Number: 257593
Workpackage: WP3 - Observation-based techniques and tools
Lead Beneficiary: UT
Editor(s): Project Management Board
Contributor(s): Massimiliano de Leoni (TU/e), Boudewijn van Dongen (TU/e), Dirk Fahland (TU/e)
Reviewer(s): Viara Popova (UT), Nick Bezhanishvili (Imperial)
Dissemination Level: PU
Contractual Delivery Date: NA
Actual Delivery Date: 30-06-12
Version: v1.3

The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under grant agreement no. 257593

Ref. Ares(2012)669931 - 06/06/2012

Document History

Version  Date        Comments
V1.0     02-05-2012  Copy of MS6 report
V1.1     02-05-2012  Adapted submitted BPM paper
V1.2     14-05-2012  Complete Deliverable Text for Internal Review
V1.3     30-05-2012  Final Version


Abstract

This deliverable describes the results of task T3.2: “Model Repair” from the second year of WP3 of the ACSI project. It assumes the reader to be familiar with the idea of artifacts as they have been developed within the ACSI project.

In the first year of the project, we developed (1) techniques to observe the executions of an artifact-centric system from a running system by recording them in raw logs or by extracting behavioral information from the system’s database, and (2) techniques to check whether a given model of an artifact-centric process conforms to process executions observed in reality. We provided checking techniques for various notions of conformance, including behavioral conformance, which describes whether the life-cycle model of each artifact is followed in the system. If observed process executions deviate from what is described in the model, a conformance checker reports which artifact deviates from the observed executions, and also how the artifact’s life-cycle model deviates.

Task T3.2 of WP3 in the second year addresses the problem of how to repair an artifact life-cycle model that does not conform to observed executions. The goal is to automatically extend the given model with additional actions that allow it to execute activities that were not executable before, or to skip activities that do not occur according to the log. The deviating artifact can then be replaced with the repaired artifact, which in turn repairs the entire artifact-centric process. As a consequence, artifact-centric systems such as the artifact-centric interoperation hub gain self-evolvability mechanisms.

This deliverable presents (1) newly developed techniques for repairing artifact life-cycles, (2) the artifact repair tool that was developed for ACSI with the process mining toolkit ProM, and (3) an evaluation of the techniques on the Order-to-Cash example used within ACSI as well as first results on the FRIS use case provided by Collibra. The core technique of repairing life-cycle models of this deliverable has been accepted at BPM’12 [12]. The underlying technique of finding alignments was developed last year within ACSI in deliverable D3.1 and published at ACSD and EDOC [2, 4].


Table of Contents

Document History

1 Introduction
  1.1 Artifacts and Repairing Artifact Models
  1.2 Formal Problem Statement and Approach

2 Preliminaries
  2.1 Event Logs
  2.2 Petri Nets
  2.3 Conformance Checking
  2.4 Alignments

3 Model Repair: The Problem

4 Naive Solution to Model Repair
  4.1 Basic Idea: Locations of Extensions
  4.2 Formal Definitions

5 Repairing Processes by Adding Subprocesses
  5.1 Basic Idea: Identify Subprocesses
  5.2 Formal Definitions
  5.3 Improving Repair

6 Experimental Evaluation
  6.1 Implementation in ProM
  6.2 Validation on the Order-to-Cash Example
    6.2.1 Initial Situation: Simple Order-to-Cash Process
    6.2.2 Change: Complex Order-to-Cash Process
    6.2.3 Evolving from Simple to Complex by Model Repair
  6.3 Validation on the FRIS Use-Case

7 Related Work

8 Conclusion

References


List of Figures

1 A net system N
2 Result of local model repair
3 Result of sub-process model repair
4 Given life-cycle models of the Order-to-Cash example
5 Entity-relationship diagram of the extended order-to-cash example
6 Best-matching alignments of logs of the extended process to given life-cycles
7 Model repair plugin in ProM
8 Repaired life-cycle model of the customer purchase order
9 Repaired life-cycle model of the material purchase order
10 Conformance of to-be FRIS payments artifact life-cycle
11 Repaired FRIS payments artifact life-cycle model


1 Introduction

Artifact-centric systems such as the artifact interoperation hub use artifact models to support process executions. For this reason, it is crucial that the artifact models describe exactly what kind of process executions are permitted and supported by the system. If this is the case, the artifact models conform to actual process executions. However, process environments change, requirements evolve, and process executions in reality deviate because of various influences. As a consequence, the system’s artifact models no longer conform to reality, that is, they no longer describe how a process is handled, and the artifact-centric system fails to support process executions and process participants [24].

The system can regain this capability when its artifact models are repaired, that is, when we change a given model so that the repaired model can replay the log and is as similar as possible to the original one. In the following, we present the first technique for model repair for classical and artifact-centric processes with respect to a given log. In particular, the technique can help to obtain a conformant running system from an initial model of an artifact-centric system.

1.1 Artifacts and Repairing Artifact Models

In the artifact-centric paradigm, a process is driven by interacting artifacts: one artifact is an entity with attributes. The artifact can be instantiated; each instance evolves as its attributes’ values are changed during process execution. The process may be driven by several artifacts (viz. stateful entities) that interact with each other by producing and consuming events. An artifact-centric process model defines the artifacts (entities) of the process and their relations, as well as one life-cycle model per artifact. The life-cycle model defines how each artifact instance evolves from its initial state to a goal state. State transitions are triggered by external events, produced by service calls of process participants or other artifact instances. Executing a transition may produce further events to be consumed by other artifact instances or process participants. As shown in Deliverable D1.1, the complete life-cycle of an artifact, describing the artifact’s possible states and state transitions, can be expressed as a Finite State Machine (FSM) model, as a Guard Stage Milestone (GSM) model, or in Proclets, which use Petri nets to represent the life-cycle of an artifact. Standard translations from FSMs to Petri nets (and vice versa) [6] are available, and a translation from Petri nets to GSM has been developed in Deliverable D3.3 [1].

In a nutshell, an artifact model describes how each instance of the artifact evolves over time (the artifact’s life-cycle) and how it interacts with other artifact instances or process participants. An artifact-centric process model $M$ consists of several artifact models $M_1, \dots, M_n$ that interact by producing and consuming events based on their life-cycle models.

In D3.1 [10, 9] we have shown how to check whether a given artifact-centric process model $M$ conforms to observed process executions $E$, that is, whether each artifact model $M_i$ follows the behavior in $E$, and whether all artifact models of $M$ together allow the interaction observed in $E$. As described in D3.1 and D3.3, the observed process executions $E$ can either be recorded in a raw log or extracted from a database such as the database of the ACSI Interaction Hub described in Deliverable D1.2. In D3.1, we have shown that the behavior $E$ equivalently decomposes into sets $L_1, \dots, L_n$ of cases, where each set $L_i$ contains all cases of an individual artifact model $M_i$ together with information about how $M_i$ interacts with other artifacts of $M$. $L_i$ is called the log of $M_i$; each case of $M_i$ is a sequence of events (see footnote 1); each event in a case in $L_i$ denotes a state change of an instance of $M_i$, and all events of a case describe how this instance evolved throughout the process execution $E$. Additionally, each event has attributes describing the interaction between artifact instances (in terms of exchanged artifact events). The individual logs $L_1, \dots, L_n$ can be extracted automatically using the log extraction techniques of D3.3.

Footnote 1: Events of a log are not to be confused with events exchanged between artifact instances; terminology evolved in both fields independently and we make sure to clearly distinguish the kind of event discussed in the following.

Now, if $M$ does not conform to $E$ (the behavior recorded in $E$ differs from the behavior prescribed by $M$), then there exists an artifact model $M_i$ that does not conform to its log $L_i$. By gradually transforming each such $M_i$ into a model $M_i'$ that conforms to $L_i$, we turn the artifact-centric process model $M$ into a model $M'$ that conforms to the entire behavior $E$.

Thus, the problem of repairing an artifact-centric process model $M$ to conform to observed behavior $E$ is reduced to the problem of repairing an artifact model $M_i$ to conform to a log $L_i$. We solve this problem in the following.

1.2 Formal Problem Statement and Approach

Our solution technically builds on Petri nets. In Deliverable D1.1, we have shown how to concretely instantiate the ACSI Abstract Artifact Model A3M in terms of Proclets. One Proclet describes one artifact, where the artifact’s life-cycle is described by a Petri net which is extended by ports to describe artifact interactions. The Petri net notation for artifact life-cycles is interchangeable with the FSM and the GSM notation also presented in D1.1. FSMs translate to Petri nets (and vice versa) by standard translations [6]. A translation from Petri nets to GSM has been developed in Deliverable D3.3. The technical choice for Petri nets was motivated by the availability of a robust conformance checker for Petri nets, developed last year within ACSI in D3.1 [2, 4], which is able to report the least deviations between an artifact life-cycle model and a given execution.

Thus, we assume artifact life-cycle models $N_1, \dots, N_n$ to be given, expressed as Petri nets. We also assume logs $L_1, \dots, L_n$ to be given, extracted from a raw log or a database. Using the structural conformance checker of D3.1, one can ensure that $L_i$ structurally corresponds to $N_i$, for $i = 1, \dots, n$, in the sense that there is a log $L_i$ for each model $N_i$ and each log can be mapped to the corresponding life-cycle model. If a different number of logs $L_1, \dots, L_m$, $n \neq m$, is extracted from the data source, then the life-cycle models $N_i$ cannot be repaired directly with respect to the logs $L_j$ due to a structural mismatch. In this general case, the following procedure is applied:

1. a life-cycle model $N_i$ with a corresponding log $L_i$ is repaired as described in the following,

2. a life-cycle model $N_i$ without a corresponding log is discarded (i.e., there is no log justifying that $N_i$ exists), and

3. for a log $L_j$ without corresponding life-cycle model, a new life-cycle model is discovered, using the artifact life-cycle discovery techniques of Deliverable D3.3.
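The three cases above can be sketched as a small planning step. This is only an illustration: the function `plan_repair`, the artifact names, and the placeholder strings standing in for models and logs are hypothetical and not part of the ACSI tooling.

```python
def plan_repair(models, logs):
    """Decide, per artifact name, which of the three cases applies.

    models: dict mapping an artifact name to its life-cycle model
    logs:   dict mapping an artifact name to its extracted log
    """
    plan = {}
    for name in sorted(set(models) | set(logs)):
        if name in models and name in logs:
            plan[name] = "repair"      # case 1: model with corresponding log
        elif name in models:
            plan[name] = "discard"     # case 2: model without any log
        else:
            plan[name] = "discover"    # case 3: log without any model
    return plan


# Hypothetical artifact names; the values are placeholders for nets and logs.
plan = plan_repair(
    models={"order": "N_order", "payment": "N_payment"},
    logs={"order": "L_order", "shipment": "L_shipment"},
)
print(plan)
```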

The concrete problem of repairing a given life-cycle model $N$ with a corresponding log $L$, which is addressed in this deliverable, reads as follows. We assume a Petri net $N$ (a life-cycle model of an artifact) and a log $L$ (being a set of observed cases of that artifact) to be given. $N$ conforms to $L$ if $N$ can execute each case in $L$, i.e., $N$ can replay $L$. If $N$ cannot replay $L$, then we have to change $N$ to a Petri net $N'$ s.t. $N'$ can replay $L$ and $N'$ is as similar to $N$ as possible.

We solve this problem in a compositional way: we identify subprocesses that have to be added in order to repair $N$. Experimental results [12] showed that only a few and relatively small, nicely structured subprocesses have to be added to repair models, even in case of significant deviations. In more detail, using the conformance checker of D3.1 we first compute for each case $l \in L$ an alignment that describes at which parts $N$ and $l$ deviate. Based on this alignment, we identify transitions of $N$ that have to be skipped to replay $l$ and which particular events of $l$ could not be replayed on $N$. Moreover, we identify the location at which $N$ should have had a transition to replay each of these events. We group sequences of non-replayable events at the same location into a sublog $L'$ of $L$. For each sublog $L'$, we construct a small subprocess $N'$ that can replay $L'$ by using a process mining algorithm. We then insert $N'$ in $N$ at the location where each trace of $L'$ should have occurred. By doing this for every sublog of non-replayable events, we obtain a repaired model that can replay $L$. Moreover, by the way we repair $N$, we preserve the structure of $N$, giving process stakeholders useful insights into the way the artifact life-cycle has to be changed to conform to $L$.

The remainder of this deliverable is structured as follows. Section 2 recalls basic notions on logs, Petri nets and alignments. Section 3 explains the model repair problem in more detail. Section 4 presents a naive solution to model repair that is extended to the final solution in Sect. 5. We report on experimental results in Sect. 6, discuss related work in Sect. 7 and conclude in Sect. 8.

2 Preliminaries

This section recalls the basic notions on Petri nets and introduces notions such as event logs and alignments.

2.1 Event Logs

Event logs serve as the starting point for process mining. An event log is a multiset of traces. Each trace describes the life-cycle of a particular case (i.e., a process instance) in terms of the activities executed.

Definition 1 (Trace, Event Log). Let $\Sigma$ be a set of actions. A trace $l \in \Sigma^*$ is a sequence of actions. $L \in \mathbb{B}(\Sigma^*)$ is an event log, i.e., a multiset of traces.

An event log is a multiset of traces because there can be multiple cases having the same trace. If the number of occurrences of each trace is irrelevant, we refer to a log as a set of traces $L = \{l_1, \dots, l_n\}$. In this simple definition of an event log, an event is fully described by an action label. We abstract from extra information such as the resource (i.e., person or device) executing or initiating the activity and the timestamp of the event.
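As a minimal sketch of Definition 1, an event log can be represented in Python as a `collections.Counter` whose keys are traces (tuples of action labels) and whose values are the number of cases with that trace. The concrete traces and counts below are made up for illustration.

```python
from collections import Counter

# A trace is a tuple of action labels; an event log is a multiset of traces.
# The traces and multiplicities are illustrative assumptions.
log = Counter({
    ("a", "b", "c", "d"): 3,
    ("a", "c", "b", "d"): 2,
})

# Observing one more case with an already known trace raises its multiplicity.
log[("a", "b", "c", "d")] += 1

# Ignoring multiplicities yields the log as a plain set of traces.
distinct_traces = set(log)
number_of_cases = sum(log.values())
print(number_of_cases, len(distinct_traces))
```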

Logs of this kind can be obtained from the ACSI Interaction Hub or other data sources that use relational databases or raw logs using the log extraction techniques of Deliverable D3.3.

2.2 Petri Nets

We use labeled Petri nets to describe processes. We first introduce unlabeled nets and then lift these notions to their labeled variant.

Definition 2 (Petri net). A Petri net $(P, T, F)$ consists of a set $P$ of places, a set $T$ of transitions disjoint from $P$, and a set of arcs $F \subseteq (P \times T) \cup (T \times P)$. A marking $m$ of a Petri net assigns each place $p \in P$ a natural number $m(p)$ of tokens. A net system $N = (P, T, F, m_0, m_f)$ is a Petri net $(P, T, F)$ with an initial marking $m_0$ and a final marking $m_f$.

[Figure 1 – A net system N, with transitions a, b, c, d, e and places p1, ..., p6.]

We write ${}^\bullet y := \{x \mid (x, y) \in F\}$ and $y^\bullet := \{x \mid (y, x) \in F\}$ for the pre- and the post-set of $y$, respectively. Fig. 1 shows a simple net system $N$ with the initial marking $[p_1]$ and final marking $[p_6]$. $N$ will serve as our running example.

The semantics of a net system $N$ are typically given by a set of sequential runs. A transition $t$ of $N$ is enabled at a marking $m$ of $N$ iff $m(p) \geq 1$ for all $p \in {}^\bullet t$. If $t$ is enabled at $m$, then $t$ may occur in the step $m \xrightarrow{t} m_t$ of $N$ that reaches the successor marking $m_t$ with

$$m_t(p) = \begin{cases} m(p) - 1 & \text{if } p \in {}^\bullet t \setminus t^\bullet, \\ m(p) + 1 & \text{if } p \in t^\bullet \setminus {}^\bullet t, \\ m(p) & \text{otherwise,} \end{cases}$$

for each place $p$ of $N$. A sequential run of $N$ is a sequence $m_0 \xrightarrow{t_1} m_1 \xrightarrow{t_2} m_2 \dots \xrightarrow{t_k} m_f$ of steps $m_i \xrightarrow{t_{i+1}} m_{i+1}$, $i = 0, 1, 2, \dots$, of $N$ beginning in the initial marking $m_0$ and ending in the final marking $m_f$ of $N$. The sequence $t_1 t_2 \dots t_k$ is an occurrence sequence of $N$. For example, in the net $N$ of Fig. 1, transition $a$ is enabled at the initial marking; $abcd$ is a possible occurrence sequence of $N$.
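The firing rule can be made concrete with a short sketch. The arcs of the net below are an assumption: the drawing of Fig. 1 is not reproduced in this text, so the pre- and post-sets were reconstructed from the markings listed with the alignments of Figure 2. The function names are illustrative.

```python
from collections import Counter

# Transition -> (pre-set, post-set). Assumed structure of the net of Fig. 1,
# reconstructed from the markings shown with the alignments of Fig. 2.
NET = {
    "a": ({"p1"}, {"p2", "p3"}),
    "b": ({"p3"}, {"p5"}),
    "c": ({"p2"}, {"p4"}),
    "d": ({"p4", "p5"}, {"p6"}),
    "e": ({"p5"}, {"p3"}),
}
M0 = Counter({"p1": 1})   # initial marking [p1]
MF = Counter({"p6": 1})   # final marking [p6]


def enabled(marking, t):
    """t is enabled iff every place in its pre-set holds at least one token."""
    pre, _ = NET[t]
    return all(marking[p] >= 1 for p in pre)


def fire(marking, t):
    """Return the successor marking reached by firing t."""
    if not enabled(marking, t):
        raise ValueError(f"transition {t} is not enabled")
    pre, post = NET[t]
    succ = Counter(marking)
    succ.subtract(pre)     # consume one token from each pre-place
    succ.update(post)      # produce one token on each post-place
    return +succ           # unary plus drops places with zero tokens


def is_occurrence_sequence(transitions):
    """Check that the sequence leads from the initial to the final marking."""
    marking = Counter(M0)
    for t in transitions:
        if not enabled(marking, t):
            return False
        marking = fire(marking, t)
    return marking == MF


print(is_occurrence_sequence("abcd"))
```

A place that is both in the pre-set and the post-set loses and regains one token, so its count is unchanged, which matches the "otherwise" case of the firing rule.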

The places and transitions of a Petri net can be labeled with names from an alphabet $\Sigma$. In particular, we assume a label $\tau \in \Sigma$ denoting an invisible action. A labeled Petri net $(P, T, F, \ell)$ is a net $(P, T, F)$ with a labeling function $\ell : P \cup T \to \Sigma$. A labeled net system $N = (P, T, F, \ell, m_0, m_f)$ is a labeled net $(P, T, F, \ell)$ with initial marking $m_0$ and final marking $m_f$. The semantics of a labeled net is the same as for an unlabeled net. Additionally, we can consider labeled occurrence sequences of $N$. Each occurrence sequence $\sigma = t_1 t_2 \dots t_k$ of $N$ induces the labeled occurrence sequence $\ell(\sigma) = (\ell(t_1) \ell(t_2) \dots \ell(t_k))|_{\Sigma \setminus \{\tau\}}$ obtained by replacing each transition $t_i$ by its label $\ell(t_i)$ and omitting all $\tau$’s from the result by projection onto $\Sigma \setminus \{\tau\}$.
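The projection onto visible labels amounts to a one-line filter. In the following sketch the transition names `t1` to `t4` and their labels are hypothetical and only serve to show one invisible step being dropped.

```python
TAU = "tau"  # stands for the invisible action


def labeled_sequence(occurrence_sequence, label):
    """Replace each transition by its label and omit all invisible steps."""
    return [label[t] for t in occurrence_sequence if label[t] != TAU]


# Hypothetical labeling in which t2 is an invisible transition.
label = {"t1": "a", "t2": TAU, "t3": "c", "t4": "d"}
visible = labeled_sequence(["t1", "t2", "t3", "t4"], label)
print(visible)
```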

2.3 Conformance Checking

Conformance checking techniques investigate how well an event log $L \in \mathbb{B}(\Sigma^*)$ and a labeled net system $N = (P, T, F, \ell, m_0, m_f)$ fit together. Note that the process model $N$ may have been discovered through process mining or may have been made by hand. In any case, it is interesting to compare the observed example behavior in $L$ and the potential behavior of $N$.

Conformance checking can be done for various reasons. First of all, it may be used to audit processes to see whether reality conforms to some normative or descriptive model. Deviations may point to fraud, inefficiencies, and poorly designed or outdated procedures. Second, conformance checking can be used to evaluate the results of process discovery techniques. In fact, genetic process mining algorithms use conformance checking to select the candidate models used to create the next generation of models [18].

There are four quality dimensions for comparing model and log: (1) fitness, (2) simplicity, (3) precision, and (4) generalization [24]. A model with good fitness allows for most of the behavior seen in the event log. A model has a perfect fitness if all traces in the log can be replayed by the model from beginning to end. The simplest model that can explain the behavior seen in the log is the best model. This principle is known as Occam’s Razor. A model is precise if it is not “underfitting”, i.e., the model does not allow for “too much” behavior. A model is general if it is not “overfitting”, i.e., the model is likely to be able to explain unseen cases [24, 25].

In this deliverable, we primarily focus on fitness and secondarily on precision. The primary aim of model repair is to obtain a life-cycle model that describes the behaviors seen in reality, that is, model repair is only applied in case of a non-fitting model. As secondary quality criterion, we consider precision, that is, the repaired life-cycle model should not allow arbitrary behavior.

The two other criteria of generalization and simplicity may contradict these aims [24, 15]. An overly simple model is known to either have insufficient fitness (as it cannot explain the more involved behaviors), or to allow too much behavior (by allowing any behavior over a given set of actions).

A model that does not fit a given log (i.e., the observed behavior cannot be explained by the model) is repaired using the notion of alignment presented next. Our particular technique for model repair will cater for precision as a side effect. Generalization and precision can be balanced, for instance using a post-processing technique such as the one presented in [11].

2.4 Alignments

To find out how to repair a given model (a Petri net system $N$) s.t. it can replay a given log $L$, we use a technique to determine a minimal number of changes that are needed to replay a trace $l$ on $N$ [25, 5, 3]. It essentially boils down to relating $l \in L$ to an occurrence sequence $\sigma$ of $N$ s.t. $l$ and $\sigma$ are as similar as possible. When putting $l$ and $\sigma$ next to each other, i.e., aligning $\sigma$ and $l$, we will find (1) transitions in $\sigma$ that are not part of $l$ and (2) activities of $l$ that are not part of $\sigma$.

For instance, a trace $l = accd$ is similar to the occurrence sequence $\sigma = abcd$ of the net of Figure 1, where trace $l$ deviates from $\sigma$ by skipping over $b$ and having an additional $c$.

In order to repair $N$ to fit trace $l$, $N$ has to allow skipping over transitions of the first kind and has to be extended to execute activities of the second kind. In [5, 3] an approach was presented that automatically and efficiently aligns a trace $l$ to an occurrence sequence of $N$ with a minimal number of deviations. All of this is based on the notion of an alignment and a cost function.

Definition 3 (Alignment). Let $N = (P, T, F, \ell, m_0, m_f)$ be a labeled net system. Let $l = a_1 a_2 \dots a_m$ be a trace over $\Sigma$. A move is a pair $(s, b) \in (T \cup \{\tau\}) \times (\Sigma \cup \{\tau\}) \setminus \{(\tau, \tau)\}$. An alignment of $l$ to $N$ is a sequence $\alpha = (s_1, b_1)(s_2, b_2) \dots (s_k, b_k)$ of moves, s.t.

1. the restriction of the first component to transitions $T$, $(s_1 s_2 \dots s_k)|_T$, is an occurrence sequence of $N$,

2. the restriction of the second component to actions $\Sigma$ is the trace $l$, i.e., $(b_1 b_2 \dots b_k)|_\Sigma = l$, and

3. transition labels and actions coincide (whenever both are defined), i.e., for all $i = 1, \dots, k$, if $s_i \neq \tau$, $\ell(s_i) \neq \tau$, and $b_i \neq \tau$, then $\ell(s_i) = b_i$.

We call a move $(s_i, b_i)$ (1) a move on model iff $s_i \neq \tau \wedge b_i = \tau$, (2) a move on log iff $s_i = \tau \wedge b_i \neq \tau$, and (3) a synchronous move iff $s_i \neq \tau \wedge b_i \neq \tau$.


For instance, for trace $l = accd$ and the net of Figure 1, a possible alignment would be $(a, a)(c, c)(b, \tau)(\tau, c)(d, d)$.
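The three conditions of Definition 3 can be checked mechanically for this example. The sketch below makes two assumptions: the net structure is reconstructed from the markings listed with Figure 2, and every transition is labeled with its own name, so that the label check reduces to comparing both components of a move. `SKIP` is an illustrative stand-in for the symbol $\tau$ inside a move.

```python
from collections import Counter

SKIP = None  # plays the role of tau inside a move ("no step on this side")

# Assumed structure of the net of Fig. 1: transition -> (pre-set, post-set).
# Every transition is assumed to carry its own name as label.
NET = {
    "a": ({"p1"}, {"p2", "p3"}),
    "b": ({"p3"}, {"p5"}),
    "c": ({"p2"}, {"p4"}),
    "d": ({"p4", "p5"}, {"p6"}),
    "e": ({"p5"}, {"p3"}),
}


def is_occurrence_sequence(transitions):
    """Replay the transitions from [p1] and require that [p6] is reached."""
    marking = Counter({"p1": 1})
    for t in transitions:
        pre, post = NET[t]
        if any(marking[p] < 1 for p in pre):
            return False
        marking.subtract(pre)
        marking.update(post)
    return +marking == Counter({"p6": 1})


def is_alignment(moves, trace):
    """Check the conditions of Definition 3 for a sequence of moves."""
    if any(s is SKIP and b is SKIP for s, b in moves):
        return False                      # the pair (tau, tau) is not a move
    model_part = [s for s, _ in moves if s is not SKIP]
    log_part = [b for _, b in moves if b is not SKIP]
    labels_agree = all(s == b for s, b in moves
                       if s is not SKIP and b is not SKIP)
    return (is_occurrence_sequence(model_part)
            and log_part == list(trace)
            and labels_agree)


def kind(move):
    """Classify a move as move on model, move on log, or synchronous move."""
    s, b = move
    if b is SKIP:
        return "move on model"
    if s is SKIP:
        return "move on log"
    return "synchronous move"


alignment = [("a", "a"), ("c", "c"), ("b", SKIP), (SKIP, "c"), ("d", "d")]
print(is_alignment(alignment, "accd"), [kind(m) for m in alignment])
```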

Each trace usually has several (possibly infinitely many) alignments to $N$. We are typically interested in a best alignment, i.e., one that has as many synchronous moves as possible. One way to find a best alignment is to use a cost function on moves and to find an alignment with the least costs.

Definition 4 (Cost function, cost of an alignment). Let $\kappa : \Sigma \cup T \to \mathbb{N}$ define for each transition and each action a positive cost $\kappa(x) > 1$ for all $x \in \Sigma \cup T$. The cost of a move $(s, b)$ is $\kappa((s, b)) = 1$ iff $s \neq \tau \neq b$, $\kappa((s, b)) = \kappa(s)$ iff $b = \tau$, and $\kappa((s, b)) = \kappa(b)$ iff $s = \tau$. The cost of an alignment $\alpha = (s_1, b_1) \dots (s_k, b_k)$ is

$$\kappa(\alpha) = \sum_{i=1}^{k} \kappa((s_i, b_i)).$$
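As a worked instance of Definition 4, the sketch below computes the cost of the alignment $(a, a)(c, c)(b, \tau)(\tau, c)(d, d)$. The uniform cost of 2 for every transition and action is an assumption chosen only to satisfy $\kappa(x) > 1$; `SKIP` again stands for $\tau$ in a move, and transitions share their names with their labels.

```python
SKIP = None  # stands for tau inside a move


def move_cost(move, kappa):
    """Cost of a single move according to Definition 4."""
    s, b = move
    if s is not SKIP and b is not SKIP:
        return 1            # synchronous move
    if b is SKIP:
        return kappa[s]     # move on model
    return kappa[b]         # move on log


def alignment_cost(alignment, kappa):
    """Sum of the costs of all moves of the alignment."""
    return sum(move_cost(move, kappa) for move in alignment)


# Assumed cost function: every transition and every action costs 2.
kappa = {x: 2 for x in "abcdef"}
alignment = [("a", "a"), ("c", "c"), ("b", SKIP), (SKIP, "c"), ("d", "d")]
cost = alignment_cost(alignment, kappa)   # 1 + 1 + 2 + 2 + 1
print(cost)
```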

Definition 5 (Best alignment). Let $N = (P, T, F, \ell, m_0, m_f)$ be a labeled net system. Let $\kappa$ be a cost function over moves of $N$ and $\Sigma$. Let $l$ be a trace over $\Sigma$. An alignment $\alpha$ of $l$ to $N$ is a best alignment w.r.t. $\kappa$ iff for all alignments $\alpha'$ of $l$ to $N$ holds $\kappa(\alpha') \geq \kappa(\alpha)$.

Note that a trace $l$ can have several best alignments with the same cost. A best alignment $\alpha$ of a trace $l$ can be found efficiently using an A*-based search over the space of all prefixes of all alignments of $l$. The cost function $\kappa$ thereby serves as a very efficient heuristic to prune the search space and guide the search to a best alignment. See [5, 3] for details.
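To illustrate the search for a best alignment, the following sketch explores prefixes of alignments in order of increasing cost. It deliberately uses plain uniform-cost (Dijkstra) search instead of the A*-based search of [5, 3], so it is not the algorithm of the conformance checker. The net structure (reconstructed from the markings listed with Figure 2), the identification of transitions with their labels, and the uniform cost of 2 are all assumptions.

```python
import heapq
from itertools import count

SKIP = None  # stands for tau inside a move

# Assumed structure of the net of Fig. 1: transition -> (pre-set, post-set).
NET = {
    "a": ({"p1"}, {"p2", "p3"}),
    "b": ({"p3"}, {"p5"}),
    "c": ({"p2"}, {"p4"}),
    "d": ({"p4", "p5"}, {"p6"}),
    "e": ({"p5"}, {"p3"}),
}
KAPPA = {x: 2 for x in "abcde"}  # assumed costs for moves on model and on log


def fire(marking, t):
    """Markings are sorted tuples of places; return None if t is not enabled."""
    pre, post = NET[t]
    tokens = list(marking)
    for p in pre:
        if p not in tokens:
            return None
        tokens.remove(p)
    return tuple(sorted(tokens + list(post)))


def best_alignment(trace, kappa, m0=("p1",), mf=("p6",)):
    """Uniform-cost search over states (marking, position in the trace)."""
    tie = count()  # tie-breaker, so the heap never has to compare move lists
    queue = [(0, next(tie), m0, 0, [])]
    settled = set()
    while queue:
        cost, _, marking, pos, moves = heapq.heappop(queue)
        if (marking, pos) in settled:
            continue
        settled.add((marking, pos))
        if marking == mf and pos == len(trace):
            return cost, moves
        for t in NET:
            succ = fire(marking, t)
            if succ is None:
                continue
            # move on model: the net fires t, the trace does not advance
            heapq.heappush(queue, (cost + kappa[t], next(tie), succ, pos,
                                   moves + [(t, SKIP)]))
            # synchronous move: t matches the next event of the trace
            if pos < len(trace) and trace[pos] == t:
                heapq.heappush(queue, (cost + 1, next(tie), succ, pos + 1,
                                       moves + [(t, t)]))
        if pos < len(trace):
            # move on log: the trace advances, the marking stays unchanged
            a = trace[pos]
            heapq.heappush(queue, (cost + kappa[a], next(tie), marking,
                                   pos + 1, moves + [(SKIP, a)]))
    return None


cost, moves = best_alignment("accd", KAPPA)
print(cost, moves)
```

Because several alignments may share the least cost, the returned sequence of moves may differ from the alignment shown in the text while having the same cost.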

Using the notion of best alignment we can relate any trace $l \in L$ to an occurrence sequence of $N$. The fraction of moves on log or moves on model relative to all moves can be used to compute fitness. Moreover, the aligned event log can be used as a starting point to compute other conformance metrics such as precision and generalization.
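The fraction mentioned above is easy to compute from an alignment. The sketch below only computes the share of deviating moves; how exactly the conformance checker of D3.1 turns such a fraction into a fitness value is not specified here, so the function name and its use as an indicator are assumptions.

```python
SKIP = None  # stands for tau inside a move


def deviation_fraction(alignment):
    """Share of moves on log or moves on model among all moves."""
    deviating = sum(1 for s, b in alignment if s is SKIP or b is SKIP)
    return deviating / len(alignment)


alignment = [("a", "a"), ("c", "c"), ("b", SKIP), (SKIP, "c"), ("d", "d")]
fraction = deviation_fraction(alignment)   # 2 deviating moves out of 5
print(fraction)
```

A perfectly fitting trace has only synchronous moves and therefore a deviation fraction of 0.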

3 Model Repair: The Problem

Although there are many approaches to compute conformance and to diagnose deviations given a log $L$ and model $N$, we are not aware of techniques to repair model $N$ to fit the event log $L$. Let $N'$ be the repaired model. Ideally, $N'$ perfectly fits log $L$, i.e., all traces in the log can be aligned using only synchronous moves. Moreover, $N'$ should be as close to $N$ as possible.

There are two “forces” guiding such repair. First of all, there is the need to improve fitness. Secondly, there is the desire to clearly relate the repaired model to the original model, i.e., repaired model and original model should be similar. Given metrics for fitness and closeness of models, we can measure the weighted sum or harmonic mean of both metrics to judge the quality of a repaired model. If the first force is weak (i.e., minimizing the distance is more important than improving the fitness), then the repaired model may remain unchanged. If the second force is weak (i.e., improving the fitness is more important than minimizing the distance), then repair can be seen as process discovery. In the latter case, the initial model is irrelevant and it is better to use conventional discovery techniques.
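The two ways of combining the forces can be written down as follows. This is a sketch under assumptions: $\mathit{fit}(N', L) \in [0, 1]$ and $\mathit{sim}(N', N) \in [0, 1]$ denote some fitness and similarity metric, and $w$ is a weight; none of these symbols is fixed by this deliverable.

```latex
q_w(N') = w \cdot \mathit{fit}(N', L) + (1 - w) \cdot \mathit{sim}(N', N),
\qquad 0 \le w \le 1,
\qquad
q_H(N') = \frac{2 \cdot \mathit{fit}(N', L) \cdot \mathit{sim}(N', N)}
               {\mathit{fit}(N', L) + \mathit{sim}(N', N)}.
```

For $w$ close to 0, the unchanged model $N' = N$ scores best because $\mathit{sim}(N, N)$ is maximal; for $w$ close to 1, only fitness counts and repair degenerates to process discovery.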

If we assume the initial model to be relevant, we need to make sure it is indeed taken into account. While repairing, one should not be forced to extend the model to allow for all observed noisy behavior. Therefore, we propose the following approach.

1. Given a log $L$ and model $N$, determine the multiset $L_f$ of fitting traces and the multiset $L_n$ of non-fitting traces.

2. Split the multiset of non-fitting traces $L_n$ into $L_d$ and $L_u$. According to the domain expert the traces in $L_d$ should fit the model, but do not. Traces in $L_u$ could be considered as outliers/noise (according to the domain expert) and do not trigger repair actions.

3. Repair should be based on the multiset $L' = L_f \cup L_d$ of traces. $L'$ should perfectly fit the repaired model $N'$, but there may be many such repaired models.

4. Return a repaired model $N'$ that can be easily related back to the original model $N$.
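Steps 1 to 3 can be sketched on logs represented as multisets. In the sketch, the predicates `fits` and `is_outlier` are hypothetical stand-ins for the conformance checker and for the judgment of the domain expert; the example traces and their counts are made up.

```python
from collections import Counter


def select_repair_log(log, fits, is_outlier):
    """Return (L', L_u): the traces used for repair and the ignored outliers."""
    fitting = Counter({t: n for t, n in log.items() if fits(t)})          # L_f
    non_fitting = Counter({t: n for t, n in log.items() if not fits(t)})  # L_n
    outliers = Counter({t: n for t, n in non_fitting.items()
                        if is_outlier(t)})                                # L_u
    deviating = non_fitting - outliers                                    # L_d
    return fitting + deviating, outliers


# Illustrative log: traces written as strings, values are numbers of cases.
log = Counter({"abcd": 5, "acfced": 3, "xyz": 1})
repair_log, outliers = select_repair_log(
    log,
    fits=lambda trace: trace in {"abcd", "acbd"},   # stand-in for replay
    is_outlier=lambda trace: trace == "xyz",        # stand-in for the expert
)
print(repair_log, outliers)
```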

In the remainder, we assume $L'$ to be given, i.e., outliers $L_u$ of $L$ are removed. If an event log is noisy and one includes also undesired traces $L_u$, it makes no sense to repair the model while enforcing a perfect fit. The resulting model will contain very complex control-flow structures to explain all the noise, giving it a “spaghetti-like” appearance, and will thus not be similar to the original model.

4 Naive Solution to Model Repair

In the following, we present two solutions to model repair: a naive one, which introduces a number of basic notions for model repair, and an advanced one that yields better results; the latter is presented in Sect. 5.

Alignments give rise to a naive solution to the model repair problem that we sketch in the following. It basically consists of extending $N$ with a $\tau$-transition that skips over a transition $t$ whenever there is a move on model $(t, \tau)$, and extending $N$ with a self-looping transition $t$ with label $a$ whenever there is a move on log $(\tau, a)$. This extension has to be done for all traces and all moves on log/model. The crucial part is to identify the locations of these extensions.

4.1 Basic Idea: Locations of Extensions

The extension w.r.t. a move on model $(t, \tau)$ is trivial: we just have to create a new $\tau$-labeled transition $t^*$ that has the same pre- and post-places as $t$. The extension w.r.t. a move on log $(\tau, a)$ provides various options that only require that an $a$-labeled transition is enabled whenever this move on log occurs. In the following, we use the alignment to identify for each move on log $(\tau, a)$ in which marking $m$ of $N$ it should have occurred (the “enabling location” of this move). In principle, adding an $a$-labeled transition that consumes from the marked places of $m$ and puts the tokens back immediately repairs $N$ w.r.t. this move on log. However, we improve the extension by checking if two moves on log would overlap in their enabling locations. If this is the case, we only add one $a$-labeled transition that consumes from and produces on this overlap only.

Figure 2 illustrates this idea, where the log $L$ is aligned to the net $N$ of Fig. 1. The third line of each alignment describes the marking that is reached in $N$ by replaying this prefix of the alignment on $N$. The move on model $(b, \tau)$ requires to repair $N$ by adding a $\tau$-transition that mimics $b$ as shown in Fig. 2. The move on log $(\tau, c)$ occurs at two different locations $\{p_4, p_3\}$ and $\{p_4, p_5\}$ in the different traces. They overlap on $p_4$. Thus, we repair $N$ w.r.t. $(\tau, c)$ by adding a $c$-labeled transition that consumes from and produces on $p_4$. The same holds correspondingly for $(\tau, f)$. The extended model can replay the log $L$ of Fig. 2.

The formal definitions read as follows.

4.2 Formal Definitions

For the remainder of this deliverable, let $N$ be a Petri net system and let $L$ be a log. For each trace $l \in L$, assume an arbitrary but fixed best alignment $\alpha(l)$ to be given. Let $\alpha(L) = \{\alpha(l) \mid l \in L\}$ be the set of all alignments of the traces in $L$ to $N$.

[Figure 2 – Result of locally repairing the net of Fig. 1 w.r.t. the log below. The figure shows the net of Fig. 1 (transitions a, b, c, d, e; places p1, ..., p6) extended with an additional c-labeled and an additional f-labeled transition.]

Log: L = {acfced, abccfed}

Alignments (first line: transitions of the net, second line: events of the trace, third line: marking reached after each step of the net):

  a        c        τ  τ  b        e        b        d
  a        c        f  c  τ        e        τ        d
  [p2,p3]  [p4,p3]        [p4,p5]  [p4,p3]  [p4,p5]  [p6]

  a        b        c        τ  τ  e        b        d
  a        b        c        c  f  e        τ        d
  [p2,p3]  [p2,p5]  [p4,p5]        [p4,p3]  [p4,p5]  [p6]

Let $\alpha = (t_1, a_1) \dots (t_n, a_n)$ be an alignment w.r.t. $N = (P, T, F, \ell, m_0, m_f)$. For any move $(t_i, a_i)$, let $m_i$ be the marking of $N$ that is reached by firing the transition sequence $(t_1 \dots t_{i-1})|_T$ from the initial marking of $N$. For all $1 \leq i \leq n$, if $(t_i, a_i) = (\tau, a_i)$ is a log move, then the enabling location of $(\tau, a_i)$ is the set $loc((\tau, a_i)) = \{p \in P \mid m_i(p) > 0\}$ of places that are marked in $m_i$. For example in Fig. 2, $loc((\tau, c)) = \{p_4, p_3\}$ in the first alignment and $loc((\tau, c)) = \{p_4, p_5\}$ in the second alignment.
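The enabling locations of the log moves of Figure 2 can be recomputed with a few lines of code. As before, the arcs of the net are an assumption reconstructed from the markings listed in Figure 2, transitions are identified with their labels, and `SKIP` stands for $\tau$ in a move.

```python
from collections import Counter

SKIP = None  # stands for tau inside a move

# Assumed structure of the net of Fig. 1: transition -> (pre-set, post-set).
NET = {
    "a": ({"p1"}, {"p2", "p3"}),
    "b": ({"p3"}, {"p5"}),
    "c": ({"p2"}, {"p4"}),
    "d": ({"p4", "p5"}, {"p6"}),
    "e": ({"p5"}, {"p3"}),
}


def enabling_locations(alignment, m0=("p1",)):
    """Return (action, location) for every move on log of the alignment."""
    marking = Counter(m0)
    result = []
    for s, b in alignment:
        if s is SKIP:
            # move on log: record the places marked at this point
            result.append((b, frozenset(p for p, n in marking.items() if n > 0)))
        else:
            # move on model or synchronous move: fire the transition
            pre, post = NET[s]
            marking.subtract(pre)
            marking.update(post)
    return result


# The two alignments of Fig. 2 as (transition, event) pairs.
first = [("a", "a"), ("c", "c"), (SKIP, "f"), (SKIP, "c"),
         ("b", SKIP), ("e", "e"), ("b", SKIP), ("d", "d")]
second = [("a", "a"), ("b", "b"), ("c", "c"), (SKIP, "c"),
          (SKIP, "f"), ("e", "e"), ("b", SKIP), ("d", "d")]

loc = {}
for alignment in (first, second):
    for action, location in enabling_locations(alignment):
        loc.setdefault(action, set()).add(location)

overlap_c = frozenset.intersection(*loc["c"])
print(loc["c"], overlap_c)
```

The two locations of the log move on $c$ overlap on $p_4$, which is where the text places the single additional $c$-labeled transition.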

It is easy to check that extending $N$ with a new $a_i$-labeled transition $t$ with ${}^\bullet t = loc((\tau, a_i)) = t^\bullet$ turns the log move $(\tau, a_i)$ into a synchronous move $(t, a_i)$, i.e., repairs $N$ w.r.t. $(\tau, a_i)$. The same effect is achieved by letting $t$ loop on a nonempty subset $Q \subseteq loc((\tau, a_i))$ of the location of the move.

We now lift this local repair for one log move to a repair for all alignments of a log to $N$. Let $L$ be an event log. For each event $a \in \Sigma_L$, define the locations of log moves of $a$ as the set $loc(a)$ where $Q \in loc(a)$ iff $Q = loc((\tau, a_i))$ for some trace $l \in L$ with alignment $(t_1, a_1) \dots (t_n, a_n)$ of $l$ to $N$ and a log move $(t_i, a_i) = (\tau, a)$, $i \in \{1, \dots, n\}$.

If there are two different log moves of $a$ at two different locations $Q_1$ and $Q_2$, then we extend $N$ by two different $a$-labeled transitions $t_1$ and $t_2$ that loop on $Q_1$ and $Q_2$. However, if $Q_1$ and $Q_2$ overlap on some places $Q_1 \cap Q_2$, then we could, instead of adding $t_1$ and $t_2$, just add one $a$-labeled transition $t_{12}$ that loops on $Q_1 \cap Q_2$. This transition can mimic both log moves. For instance, in Fig. 2, instead of adding two $c$-labeled transitions that loop on $\{p_4, p_3\}$ and $\{p_4, p_5\}$, we just added one transition that loops on $p_4$.

Now, for any two locations Q1, Q2 ∈ loc(a) where Q1 ∩ Q2 ≠ ∅, their intersection Q1 ∩ Q2 is a “sublocation” that allows us to extend N with an a-labeled transition t that consumes from and produces on Q1 ∩ Q2. This t repairs all log moves occurring at locations Q1 and Q2.

When considering three locations Q1, Q2, Q3 ∈ loc(a), we may have Q1 ∩ Q2 ≠ ∅ and Q2 ∩ Q3 ≠ ∅, but Q1 ∩ Q3 = ∅. Thus, there is no unique set of sublocations of all locations in loc(a). In the following, we characterize feasible sets of sublocations; any model repair algorithm then has the freedom to pick a suitable set of sublocations.

A set Q′ of places of N is a sublocation of a ∈ ΣL iff there exists a location Q ∈ loc(a) with Q′ ⊆ Q. A set Qa = {Q^1_a, . . . , Q^k_a} of sublocations of a is complete iff for each Q ∈ loc(a) there exists Q^i_a ∈ Qa with Q^i_a ⊆ Q. A complete set of sublocations ensures that each log move is indeed represented by some sublocation Q^i_a (allowing to extend N at Q^i_a). We call Qa minimal iff no two sublocations overlap, i.e., Q^i_a ∩ Q^j_a = ∅ for all 1 ≤ i < j ≤ k. A non-minimal Qa can be made minimal: there exist sublocations Q^i_a, Q^j_a ∈ Qa, i ≠ j, with Q^i_a ∩ Q^j_a ≠ ∅. Then we can replace Q^i_a and Q^j_a by their overlap Q^i_a ∩ Q^j_a. Minimality is not required for achieving fitness, but it helps to add as few transitions and arcs to N as possible.

We now have all formal definitions for the naive solution to model repair: add a self-looping transition at each sublocation of a move on log of an event a ∈ Σ, and add a τ-transition to bridge any move on model.
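The step from loc(a) to a minimal complete set of sublocations can be sketched as a simple pairwise replacement. The Python code below is one possible reading under stated assumptions: it always merges the first overlapping pair it finds, so it returns one of possibly several minimal complete sets. The function names are hypothetical.

```python
def minimal_sublocations(locations):
    """Replace overlapping sublocations by their overlap until all are disjoint."""
    # remove duplicates while keeping the input order
    subs = list(dict.fromkeys(frozenset(q) for q in locations))
    changed = True
    while changed:
        changed = False
        for i in range(len(subs)):
            for j in range(i + 1, len(subs)):
                overlap = subs[i] & subs[j]
                if overlap:
                    rest = [q for k, q in enumerate(subs) if k not in (i, j)]
                    subs = list(dict.fromkeys(rest + [overlap]))
                    changed = True
                    break
            if changed:
                break
    return subs


def is_complete(subs, locations):
    """Every location contains at least one of the chosen sublocations."""
    return all(any(s <= frozenset(q) for s in subs) for q in locations)


def is_minimal(subs):
    """No two chosen sublocations share a place."""
    return all(not (subs[i] & subs[j])
               for i in range(len(subs)) for j in range(i + 1, len(subs)))
```

For loc(c) = {{p4, p3}, {p4, p5}} of Fig. 2, the sketch returns the single sublocation {p4}. For a chain of three locations in which only neighbors overlap, the result depends on which pair is merged first, which illustrates why the set of sublocations is not unique.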

Definition 6 (Local model repair). Let L be a log, let N be a Petri net. Let α(L) be the alignments of the traces of L to N. For each a ∈ ΣL, let Qa be a complete set of sublocations of a based on the alignments α(L). The locally repaired model of N w.r.t. α(L) is the net N′ that is obtained from N by adding to N

• a fresh transition tτ ∉ TN with •tτ = •t and tτ• = t•, ℓ′(tτ) = τ iff there exists an alignment α ∈ α(L) and a model move (t, τ), and

• a fresh transition ta ∉ TN with •ta = Q = ta•, ℓ′(ta) = a iff Q ∈ Qa.
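Definition 6 can be read as a small net transformation. The following Python sketch applies it to the example of Fig. 2. It assumes a net given as a dictionary from transition names to (label, preset, postset), assumes that the original net has no silent transitions (so every pair (t, τ) is a model move), and uses the arcs reconstructed from the markings of Fig. 2. The naming scheme for fresh transitions is hypothetical.

```python
TAU = "τ"

# Assumed net structure: name -> (label, preset, postset).
NET = {
    "a": ("a", {"p1"}, {"p2", "p3"}),
    "b": ("b", {"p3"}, {"p5"}),
    "c": ("c", {"p2"}, {"p4"}),
    "d": ("d", {"p4", "p5"}, {"p6"}),
    "e": ("e", {"p5"}, {"p3"}),
}

# Alignments of Fig. 2 as pairs (transition, event).
ALIGNMENTS = [
    [("a", "a"), ("c", "c"), (TAU, "f"), (TAU, "c"),
     ("b", TAU), ("e", "e"), ("b", TAU), ("d", "d")],
    [("a", "a"), ("b", "b"), ("c", "c"), (TAU, "c"),
     (TAU, "f"), ("e", "e"), ("b", TAU), ("d", "d")],
]


def local_repair(net, alignments, sublocations):
    """Add skip transitions for model moves and self-loops for log moves.

    sublocations maps each event to its chosen complete set of sublocations.
    """
    repaired = dict(net)
    for alignment in alignments:
        for transition, event in alignment:
            if transition != TAU and event == TAU:  # model move (t, τ)
                _, pre, post = net[transition]
                repaired[f"{transition}_skip"] = (TAU, set(pre), set(post))
    for event, locations in sublocations.items():
        for n, q in enumerate(locations):
            # self-looping transition: consumes from and produces on q
            repaired[f"{event}_loop{n}"] = (event, set(q), set(q))
    return repaired


# Sublocation {p4} is chosen for both c and f (overlap of their locations).
REPAIRED = local_repair(NET, ALIGNMENTS, {"c": [{"p4"}], "f": [{"p4"}]})
```

For this example the repair adds three transitions: a silent transition that skips b, and one self-loop on p4 each for c and f.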

Theorem 1. Let L be a log, let N be a Petri net. Let α(L) be the alignments of the traces of L to N. Let N′ be the locally repaired model of N w.r.t. α(L). Then each trace l ∈ L is a labeled occurrence sequence of N′, that is, N′ can replay L.

Sketch. The theorem follows from the observation that each alignment α = (t1, a1) . . . (tn, an) ∈ α(L) of L to N can be transformed into an alignment of L to N′ having synchronous moves only:

• a move on model (ti, τ) of α w.r.t. N is replaced by a synchronous move (ti,τ, τ) w.r.t. N′ where ℓ′(ti,τ) = τ (i.e., the new silent transition ti,τ allows to skip over ti), and

• a move on log (τ, a) w.r.t. N is replaced by a synchronous move (ta, a) w.r.t. N′ where •ta = ta• = Q ⊆ loc((τ, a)).

The transformed alignment documents that the corresponding trace can be replayed on N′.
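The statement of Theorem 1 can be checked on the running example with a small replay procedure. The sketch below is not the conformance checker used in the project; it is a plain breadth-first search over pairs (marking, position in the trace) that treats τ-labeled transitions as silent. It assumes a bounded net (otherwise the search may not terminate), and the repaired net is written out by hand from the arcs reconstructed from Fig. 2, with hypothetical names for the added transitions.

```python
from collections import Counter, deque

TAU = "τ"

# Assumed repaired net of the example: name -> (label, preset, postset).
REPAIRED = {
    "a": ("a", {"p1"}, {"p2", "p3"}),
    "b": ("b", {"p3"}, {"p5"}),
    "c": ("c", {"p2"}, {"p4"}),
    "d": ("d", {"p4", "p5"}, {"p6"}),
    "e": ("e", {"p5"}, {"p3"}),
    "b_skip": (TAU, {"p3"}, {"p5"}),   # skips b
    "c_loop": ("c", {"p4"}, {"p4"}),   # repairs log moves of c
    "f_loop": ("f", {"p4"}, {"p4"}),   # repairs log moves of f
}
ORIGINAL = {name: REPAIRED[name] for name in "abcde"}
M0 = Counter({"p1": 1})


def can_replay(net, m0, trace):
    """True iff trace is a labeled occurrence sequence of net from m0."""
    start = (frozenset(m0.items()), 0)
    seen, queue = {start}, deque([start])
    while queue:
        frozen, pos = queue.popleft()
        if pos == len(trace):
            return True
        marking = Counter(dict(frozen))
        for label, pre, post in net.values():
            if any(marking[p] < 1 for p in pre):
                continue  # transition not enabled
            if label == TAU:
                next_pos = pos          # silent step consumes no event
            elif label == trace[pos]:
                next_pos = pos + 1      # step matches the next event
            else:
                continue
            successor = Counter(marking)
            successor.subtract(pre)
            successor.update(post)
            state = (frozenset((+successor).items()), next_pos)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False
```

Under these assumptions both traces of the log L = {acfced, abccfed} are replayable on the repaired net, while the first trace is not replayable on the original net.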

5 Repairing Processes by Adding Sub-processes

The downside of the naive solution to model repair is that the model is fixed only locally. For a log L where a best alignment contains only few synchronous moves, i.e., N does not conform to L, many τ-transitions and self-loops are added. In fact, we observed in experiments that self-looping transitions were often added at the same location, creating a “flower sub-process” of events Σ′ ⊂ ΣL that locally permitted arbitrary sequences Σ′* to occur. Thus, the approach of Sect. 4 achieves fitness but entirely disregards the quality dimension of precision (see Sect. 2.3). In the following we turn the naive approach into a structured approach to model repair that also considers precision.

5.1 Basic Idea: Identify Subprocesses

The previous approach just records log moves of individual events a ∈ Σ and their enabling locations. In the following, we record sequences of log moves and their enabling locations. Each maximal sequence of log moves (of the same alignment) that all occur at the same location is a non-fitting subtrace. We group non-fitting subtraces at the same location Q into a non-fitting sublog LQ of that location. We then discover from LQ a subprocess N(LQ) that can replay LQ by using a mining algorithm that guarantees perfect fitness of N(LQ) to LQ. We ensure that N(LQ) has a unique start transition and a unique end transition. We then add subprocess N(LQ) to N and let the start transition of N(LQ) consume from Q and let the end transition of N(LQ) produce on Q, i.e., the subprocess models a structured loop that starts and ends at Q.

Figure 3 – Result of repairing the net of Fig. 1 by adding a subprocess w.r.t. the log of Fig. 2.

Figure 3 illustrates this idea. The shown model is the result of repairing N of Fig. 1 w.r.t. the log of Fig. 2 by adding subprocesses as described by the alignments of Fig. 2. We can identify two subtraces cf and fc that occur at the same sublocation p4. Applying process discovery on the sublog {cf, fc} yields the subprocess at the top right of Fig. 3 that puts c and f in parallel. The two grey-shaded silent transitions connected to place p4 indicate the start and end of this subprocess.

The technical details read as follows.

5.2 Formal Definitions

Let l ∈ L be a trace, let α = (t1, a1) . . . (tn, an) be an alignment of l to N. Recall from Sect. 4 that loc((τ, ai)) = Q denotes the location of log move (τ, ai), i.e., the places that were marked when ai should have occurred. We call a maximal sequence β = (τ, ai) . . . (τ, ai+k) of consecutive log moves of α a subtrace of α at location Q iff loc((τ, aj)) = loc((τ, ai)) = Q for all i ≤ j ≤ i + k, and no longer sequence of log moves has this property. We write loc(β) = loc((τ, ai)) for the location of subtrace β. Let β(L) be the set of all subtraces of all alignments α(L) of L to N. For example, in Fig. 2, fc is a subtrace of the first alignment at location {p4, p3} and cf is a subtrace of the second alignment at location {p4, p5}.

As in Sect. 4, we can group subtraces if they share the same sublocation. Formally, we say that Q is a sublocation of a subtrace β = (τ, a1) . . . (τ, ak) iff Q ⊆ loc(β). We put subtraces in the same sublog if they have a joint sublocation.
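Subtraces and their locations can be extracted in a single replay of an alignment. Since log moves do not change the marking, every maximal run of consecutive log moves shares one location. The Python sketch below illustrates this on the alignments of Fig. 2. The arcs of the net are an assumption reconstructed from the markings listed in Fig. 2, and the function name is hypothetical.

```python
from collections import Counter

TAU = "τ"

# Assumed net structure: transition -> (preset, postset).
NET = {
    "a": ({"p1"}, {"p2", "p3"}),
    "b": ({"p3"}, {"p5"}),
    "c": ({"p2"}, {"p4"}),
    "d": ({"p4", "p5"}, {"p6"}),
    "e": ({"p5"}, {"p3"}),
}
M0 = Counter({"p1": 1})

ALIGNMENT_1 = [("a", "a"), ("c", "c"), (TAU, "f"), (TAU, "c"),
               ("b", TAU), ("e", "e"), ("b", TAU), ("d", "d")]
ALIGNMENT_2 = [("a", "a"), ("b", "b"), ("c", "c"), (TAU, "c"),
               (TAU, "f"), ("e", "e"), ("b", TAU), ("d", "d")]


def subtraces(alignment):
    """Return (events, location) for each maximal run of log moves."""
    marking, run, found = Counter(M0), [], []
    for transition, event in alignment:
        if transition == TAU:
            run.append(event)  # log move: extend the current run
            continue
        if run:
            # the run ends here; its location is the marking before firing
            found.append((tuple(run), frozenset(marking)))
            run = []
        pre, post = NET[transition]
        assert all(marking[p] > 0 for p in pre), "transition not enabled"
        marking.subtract(pre)
        marking.update(post)
        marking = +marking  # drop places without tokens
    if run:
        found.append((tuple(run), frozenset(marking)))
    return found
```

The sketch yields the subtrace fc at {p4, p3} for the first alignment and cf at {p4, p5} for the second, as stated in the example above.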

All subtraces of L are then partitioned into sublogs, each having a different sublocation. A sublog (LQ, Q) of α(L) at location Q is a set of subtraces LQ ⊆ β(L) s.t. for all β ∈ LQ, ∅ ≠ Q ⊆ loc(β), that is, each subtrace in LQ can start at sublocation Q of its first event. The entire set of subtraces β(L) can be partitioned into several sublogs, though there are multiple ways of partitioning. We call a set {(LQ,1, Q1), . . . , (LQ,k, Qk)} of sublogs of α(L) complete iff LQ,1 ∪ . . . ∪ LQ,k = β(L). While completeness is enough to repair N w.r.t. L, one may want to have as few sublogs at as few locations as possible, for instance, by merging two sublogs (LQ,1, Q1) and (LQ,2, Q2) to (LQ,1 ∪ LQ,2, Q1 ∩ Q2) if Q1 ∩ Q2 ≠ ∅. We call {(LQ,1, Q1), . . . , (LQ,k, Qk)} minimal iff Qi ∩ Qj = ∅ for all 1 ≤ i < j ≤ k. Similar to Sect. 4, there may be multiple minimal complete sets of sublogs of L. This allows us to configure the repair w.r.t. the locations and the contents of the different sublogs, yielding different repair options.

For a complete set of sublogs of α(L), we can repair N w.r.t. α(L) by discovering for each sublog (LQ, Q) a process model NQ, adding NQ to N and connecting the start and end transition of NQ to Q.

Definition 7 (Subprocess of a sublog). Let L be a log, let N be a Petri net, let α(L) be the alignments of the traces of L to N, and let (LQ, Q) be a sublog of α(L).

Let L^+_Q = {start a1 . . . ak end | (τ, a1) . . . (τ, ak) ∈ LQ} be the sequences of events described in LQ extended by a start event and an end event, start, end ∉ ΣL.

Let M be a mining algorithm that returns for any log a fitting model (i.e., a Petri net that can replay the log). Let NQ = M(L^+_Q). Then (NQ, Q) is the subprocess of LQ.

The mining algorithm M will produce transitions labeled with the events in L^+_Q, including a start transition t^{NQ}_start with label start and an end transition t^{NQ}_end with label end. In the following, we assume that •t^{NQ}_start = ∅ and t^{NQ}_end• = ∅, i.e., that the start transition has no pre-places and the end transition has no post-places. In case M produced pre-places for start or post-places for end, these places can be safely removed without changing that NQ can replay L^+_Q. When repairing N, we connect t^{NQ}_start and t^{NQ}_end to the location Q of the subprocess.

Definition 8 (Subprocess model repair). Let L be a log, let N be a Petri net. Let α(L) be the alignments of the traces of L to N.

Let {(LQ,1, Q1), . . . , (LQ,k, Qk)} be a minimal and complete set of sublogs of α(L). The subprocess-repaired model of N w.r.t. α(L) is the net N′ that is obtained from N as follows.

• Add to N a fresh transition tτ ∉ TN with •tτ = •t and tτ• = t•, ℓ′(tτ) = τ iff there exists an alignment α ∈ α(L) and a model move (t, τ), and

• For each sublog (LQ,i, Qi), i = 1, . . . , k, let (NQ,i, Qi) be the subprocess of LQ,i s.t. NQ,i and N are disjoint (share no transitions or places). Extend N with NQ,i (add all places, transitions and arcs of NQ,i to N), add arcs (q, start(NQ,i)) and (end(NQ,i), q) for each q ∈ Qi, and set labels ℓ′(start(NQ,i)) = τ and ℓ′(end(NQ,i)) = τ.
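The construction of Definition 8 can be sketched in Python as follows. The mining algorithm M is replaced here by a deliberately simple stand-in, a prefix-tree net that has one path per distinct subtrace. This is an assumption made only to keep the example self-contained: it is not the ILP miner used in the implementation, and unlike Fig. 3 it does not discover that c and f are parallel. The net arcs are reconstructed from the markings of Fig. 2, and all function and transition names are hypothetical.

```python
TAU = "τ"

# Assumed net structure: name -> (label, preset, postset).
NET = {
    "a": ("a", {"p1"}, {"p2", "p3"}),
    "b": ("b", {"p3"}, {"p5"}),
    "c": ("c", {"p2"}, {"p4"}),
    "d": ("d", {"p4", "p5"}, {"p6"}),
    "e": ("e", {"p5"}, {"p3"}),
}


def mine_prefix_tree(traces, tag):
    """Stand-in for M: a fitting net with a unique start and end transition."""
    def place(prefix):
        return f"{tag}_p{tuple(prefix)!r}"

    start, end, done = f"{tag}_start", f"{tag}_end", f"{tag}_done"
    net = {
        start: ("start", frozenset(), frozenset({place(())})),  # no pre-places
        end: ("end", frozenset({done}), frozenset()),           # no post-places
    }
    for trace in traces:
        for k, event in enumerate(trace):
            name = f"{tag}_t{tuple(trace[:k + 1])!r}"
            net[name] = (event, frozenset({place(trace[:k])}),
                         frozenset({place(trace[:k + 1])}))
        # silent exit from the end of each subtrace to the common end
        net[f"{tag}_exit{tuple(trace)!r}"] = (
            TAU, frozenset({place(trace)}), frozenset({done}))
    return net, start, end


def subprocess_repair(net, model_moves, sublogs):
    """Add skip transitions and one subprocess per sublog (traces, location)."""
    repaired = dict(net)
    for t in model_moves:  # transitions t with a model move (t, τ)
        _, pre, post = net[t]
        repaired[f"{t}_skip"] = (TAU, frozenset(pre), frozenset(post))
    for i, (traces, location) in enumerate(sublogs):
        sub, start, end = mine_prefix_tree(traces, f"sub{i}")
        repaired.update(sub)
        # connect start and end to the location and make them silent
        repaired[start] = (TAU, frozenset(location), sub[start][2])
        repaired[end] = (TAU, sub[end][1], frozenset(location))
    return repaired


REPAIRED = subprocess_repair(NET, {"b"}, [({("c", "f"), ("f", "c")}, {"p4"})])
```

In the resulting net the silent start transition consumes from p4 and the silent end transition produces on p4, so the subprocess forms a loop at p4 as required by the definition.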

Theorem 2. Let L be a log, let N be a Petri net. Let α(L) be the alignments of the traces of L to N and {(LQ,1, Q1), . . . , (LQ,k, Qk)} be a minimal and complete set of sublogs of α(L). Let N′ be a subprocess-repaired model of N w.r.t. these sublogs. Then each trace l ∈ L is a labeled occurrence sequence of N′, that is, N′ can replay L.

Sketch. This theorem holds by the same arguments as Theorem 1: each move on model (t, τ) of α(L) is replaced by a synchronous move (tτ, τ), and each maximal sequence (τ, a1) . . . (τ, ak) that was part of sublog (LQ,i, Qi) is replaced by a sequence (t^{NQ,i}_start, τ)(t1, a1) . . . (tk, ak)(t^{NQ,i}_end, τ) of synchronous moves in the subprocess NQ,i. The moves (t^{NQ,i}_start, τ) and (t^{NQ,i}_end, τ) are synchronous because the transitions t^{NQ,i}_start and t^{NQ,i}_end are made silent by relabeling them with τ.

This theorem concludes the techniques for process model repair presented in this deliverable. Observe that the original model N is preserved entirely as we only add new transitions and new subprocesses. By taking a best alignment α(L) of L to N, one ensures that the number of new τ-transitions and the number of new subprocesses (or of new self-looping transitions) is minimal.


5.3 Improving Repair

The quality of the model repair step can be improved in some cases. According to Def. 8, each sublog (LQ,i, Qi) is added as a subprocess NQ,i that consumes from and produces on the same set Qi of places, i.e., the subprocess is a loop. If this loop is executed exactly once in each case of L, then NQ,i is also executed exactly once per case. Thus, N could be repaired by inserting NQ,i in sequence (rather than as a loop), by refining the places Qi = {q1, . . . , qk} to places {q^−_1, . . . , q^−_k} and {q^+_1, . . . , q^+_k} with

1. •q^−_j = •qj, q^−_j• = {t^{NQ,i}_start}, j = 1, . . . , k, and

2. q^+_j• = qj•, •q^+_j = {t^{NQ,i}_end}, j = 1, . . . , k.
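The place refinement above can be sketched as a rewrite of the arcs of a net in which the subprocess has already been attached as a loop. The Python code below is an illustration under stated assumptions: nets are dictionaries from transition names to (label, preset, postset), the refined places are named by appending "-" and "+", and the small example net is hypothetical. An initially marked place q would have to carry its token on q- after refinement, which the sketch does not handle.

```python
TAU = "τ"


def insert_in_sequence(net, location, start, end):
    """Split each place q of location into q- (before) and q+ (after).

    start and end name the subprocess transitions that loop on location.
    """
    before = {q: f"{q}-" for q in location}
    after = {q: f"{q}+" for q in location}
    refined = {}
    for name, (label, pre, post) in net.items():
        if name == start:
            # only the start transition consumes from the q- places
            new_pre = frozenset(before.get(p, p) for p in pre)
        else:
            # former consumers of q now consume from q+
            new_pre = frozenset(after.get(p, p) for p in pre)
        if name == end:
            # only the end transition produces on the q+ places
            new_post = frozenset(after.get(p, p) for p in post)
        else:
            # former producers of q now produce on q-
            new_post = frozenset(before.get(p, p) for p in post)
        refined[name] = (label, new_pre, new_post)
    return refined


# Hypothetical net: a and d in sequence, subprocess s-f-e looping on p2.
LOOPED = {
    "a": ("a", {"p1"}, {"p2"}),
    "d": ("d", {"p2"}, {"p3"}),
    "s": (TAU, {"p2"}, {"x"}),
    "f": ("f", {"x"}, {"y"}),
    "e": (TAU, {"y"}, {"p2"}),
}
REFINED = insert_in_sequence(LOOPED, {"p2"}, "s", "e")
```

In the refined net, a produces on p2-, the subprocess leads from p2- to p2+, and d consumes from p2+, so f occurs exactly once between a and d.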

Also, the repaired model N′ can be structurally simplified by removing those model elements which are no longer used. Consider for instance a transition t which is never executed because the alignment only includes moves on model (t, τ). In this case t is always skipped by transition tτ, and t can be removed from N′.

6 Experimental Evaluation

The technique for repairing artifact life-cycles presented in this deliverable is implemented in the Process Mining Toolkit ProM 6 in the package ArtifactModelling, which is available from http://www.promtools.org/prom6/ACSI. These plugins were used to validate our technique for model repair on the Order-to-Cash example (the running example used within ACSI) and on first data from the FRIS use case provided by Collibra.

6.1 Implementation in ProM

ArtifactModelling provides a plugin Repair Model that takes as input a Petri net N, a log L, and a best-fitting alignment α(L) of L to N, see Fig. 7. The alignment can be computed in ProM 6 using the Conformance Checker of [25, 5, 3]. The plugin repairs N by extending N with subprocesses as defined in Def. 8 and Sect. 5.3. For this, it first replays each alignment on N and identifies all subtraces. Then subtraces are grouped into sublogs at the same location. The resulting sublogs are merged greedily if their locations overlap (by merging sublogs with the largest overlap of places first), until the resulting set of sublogs is minimal (i.e., all locations are disjoint). Each sublog is then passed to the ILP miner [26], which guarantees to return a model that can replay the sublog and has good precision. The returned model is then added to N as a subprocess as defined in Def. 8.

6.2 Validation on the Order-to-Cash Example

We validated our implementation for repairing artifact life-cycles on the Order-to-Cash example used throughout ACSI (described in this section) and on data from the FRIS use case (described in Sect. 6.3).

The Order-to-Cash process was introduced in Deliverable D1.1 as a running example for all conceptual work packages. In this process, a customer requests to build a specific product, which is handled through a Customer Purchase Order (CPO). The process then invokes a number of Material Purchase Orders (MPOs) to obtain all materials that are needed to build the product from various suppliers. Once all materials have been received from all suppliers, the product is built and shipped to the customer. This process consists of two artifacts, CPO and MPO.

We used this process to validate our technique in a fully controlled environment that also allows us to control the quality of the logs used for model repair (see Sect. 3), as follows. We implemented a simple version of the Order-to-Cash Process in CPN Tools²; the details of this model are presented in Deliverable D3.3. We then used the simulation feature of CPN Tools to generate a raw log L1 of this process. Based on this raw log, we manually created artifact models C1 and M1 of CPO and of MPO that conform to L1. We validated conformance of C1 and M1 to L1 using the conformance checking techniques of Deliverable D3.1.

Then we refined and extended the first simple model of the Order-to-Cash Process, which corresponds to an evolution of the process in reality. Again, the process was simulated and process executions were recorded in a raw log L2. Using the conformance checking techniques of Deliverable D3.1, we discovered misconformances of C1 and M1 to L2. Then, C1 and M1 were repaired w.r.t. L2 using the Model Repair plugin of ProM, ensuring conformance of the repaired life-cycle models to L2.

We present the details of this evaluation next.

6.2.1 Initial Situation: Simple Order-to-Cash Process

Figure 4 shows the initial artifact model of the simple variant of the Order-to-Cash process. As can be seen from the entity-relationship diagram in Fig. 4(top), the process contains two artifacts (CPO and MPO) where each CPO has one MPO and vice versa. The respective life-cycle models are shown below.

The CPO’s life-cycle model C1 is shown in Fig. 4(left). A CPO is received from the customer, then a work order is created, after which an MPO is instantiated through the CPO and the CPO waits to collect parts (materials) that are requested through the MPO. The MPO’s life-cycle model M1 is shown in Fig. 4(right): once created, a request for material is sent to a supplier, which is eventually either answered positively, so that the ordered parts can be received, or answered negatively, so that the MPO is canceled. When the MPO finished by receiving parts, the CPO collects the delivered parts, and assembles and delivers the product to the customer. When the MPO finished by cancelation, receiving parts is skipped and the entire CPO is canceled.

We validated the conformance of this model to the first raw log L1. In particular, the entity-relationship diagram of Fig. 4(top) was rediscovered from L1 using the technique described in Deliverables D3.1 and D3.3. Life-cycle conformance of C1 and M1 to L1 was then verified using the conformance checker of D3.1.

6.2.2 Change: Complex Order-to-Cash Process

This very simplistic first version of the Order-to-Cash process was then extended as follows:

1. A CPO can create multiple MPOs taking care of different parts of the product to be built.

2. When a supplier has a negative response to a material request in an MPO, the MPO is not canceled, but two things can happen: the request is retried at the same supplier by the same MPO, or the material order is reassigned to a different supplier.

3. As a consequence, the CPO has to collect multiple parts before assembling the product.

4. Also the delivery step is split into two independent steps: shipment and invoice.

² http://cpntools.org/

Figure 4 – Given life-cycle models of the Order-to-Cash example: each customer purchase order C1 (left) is related to one material purchase order M1 (right), and vice versa. [Figure not reproduced; transition labels appearing in it include Receive_customer_PO_COMPLETE, Create_work_order_COMPLETE, Create_new_MPO_START, Create_new_MPO_COMPLETE, Collect_Parts, All_parts_received, Assemble_COMPLETE, cancel_CPO, Cancel_Completed, Deliver_COMPLETE, Close_CPO_COMPLETE, Create_material_PO_COMPLETE, Send_MPO_req_COMPLETE, Receive_supp_answer_COMPLETE, Cancel_MPO_COMPLETE, and Receive_parts_COMPLETE.]

Figure 5 – Entity-Relationship Diagram of the extended version of the order-to-cash example.

Figure 5 shows the entity-relationship diagram that was extracted from the raw log L2 of the extended process, using the preprocessing techniques of D3.1/D3.3. We can clearly see the changed cardinalities between CPOs and MPOs (one CPO is related to several MPOs).

When checking conformance of the simple life-cycle models of CPO and MPO of Fig. 4 to L2 using the life-cycle conformance checking technique of D3.1, we obtain the alignments shown in Fig. 6. The alignment for CPO (Fig. 6(top)) shows several moves on log (yellow) indicating events in the log not explained by the life-cycle model of Fig. 4(left). Also one move on model (purple) is shown, indicating that event delivery is no longer part of the artifact. Similar changes can be observed for the life-cycle of MPO.

6.2.3 Evolving from Simple to Complex by Model Repair

We then repaired the life-cycles C1 of CPO and M1 of MPO shown in Fig. 4 w.r.t. the alignments of Fig. 6 using the Repair Model plugin of ProM (see Fig. 7).

Figure 8 shows the repaired life-cycle model C2 of the Customer Purchase Order artifact. The two boxes indicate subprocesses added by the model repair. The subprocess at the bottom inserts shipment and invoice (which can be executed in parallel); they replace the delivery step (delivery is always skipped by transition Silent Deliver Complete). The second subprocess to the right contains steps that handle (1) the creation of additional MPOs, and (2) the arrival of additional parts from the additional MPOs. Only when all parts are received, the product is assembled. Finally, a CPO can no longer be cancelled (because none of its MPOs can be cancelled). The corresponding part of the artifact life-cycle is never used and hence has been removed from the model as described in Sect. 5.3.

Figure 9 shows the repaired life-cycle model of the material purchase order artifact. Cancelation is now always skipped (transition Silent Cancel MPO Complete). Instead, the first subprocess allows several retries to order materials from the supplier assigned to the running MPO until ordered parts are received. The second subprocess describes the situation where (despite several retries) the supplier cannot provide the ordered material and the supplier has to be reassigned (causing termination of the current MPO and the instantiation of another MPO).


Figure 6 – Best-Matching Alignments of logs of the extended order-to-cash example to the Customer Purchase Order (top) and the Material Purchase Order (bottom) life-cycle models of Fig. 4

Figure 7 – Model repair plugin in ProM.


Figure 8 – Repaired Life-Cycle model of the customer purchase order of Fig. 4(left) by alignment of Fig. 6(top). [Figure not reproduced; compared to Fig. 4 it shows the silent transition SILENT Deliver_COMPLETE and added subprocesses with the transitions Invoice_COMPLETE, Ship_COMPLETE, Create_new_MPO_START, Create_new_MPO_COMPLETE, and Collect_Parts.]


Figure 9 – Repaired Life-Cycle model of the material purchase order of Fig. 4(right) by alignment of Fig. 6(bottom). [Figure not reproduced; compared to Fig. 4 it shows the silent transition SILENT Cancel_MPO_COMPLETE and added subprocesses with the transitions Restart_supplier_assignment_COMPLETE, Retry_MPO_COMPLETE, Send_MPO_req_COMPLETE, and Receive_supp_answer_COMPLETE.]

6.3 Validation on the FRIS Use-Case

In addition to a controlled experiment on the Order-to-Cash example, we also validated model repair on real-life data provided by IWT and Collibra in the FRIS use case.

In Deliverable D5.2, Collibra provided us with several to-be FSM life-cycle models of the artifacts of the IWT research project funding process that could be related to a first set of log files of the IWT FRIS processes recorded over the course of the last two years.

At the current stage, we had to limit ourselves to an evaluation of the execution of payments artifact, as only for this to-be model a significant mapping between life-cycle transitions and log events could be found. For the other artifacts, we were only able to map at most 2 events of the log to the given life-cycle model. However, it is clear that the proposed to-be models are incapable of expressing the current situation at IWT correctly. Nevertheless, the available mapping in case of the execution of payments artifact shows the applicability of our technique in practice.

Figure 10 shows at the top the mapping of log events to steps in the to-be life-cycle model³; in the middle, the alignment of the log to the given life-cycle model is shown; the graphic at the bottom shows the projection of this alignment onto the given life-cycle model. We can clearly see strong misconformances between the to-be model and the current situation at IWT: the first step (examine payments requirements) is executed in all cases, the third step (validated payment) and the last step (receive payment) are executed in almost all cases, whereas the fourth step (validate) is executed in only about 40% of the cases, and the fifth step is never executed.

³ The original to-be life-cycle model of the payments artifact is shown in Fig. 8 of D5.2.


Figure 10 – Conformance of to-be FRIS payments artifact life-cycle to current process executions based on the given event mapping.

Figure 11 – To-be FRIS payments artifact life-cycle model repaired w.r.t. the current process executions.

Moreover, the alignment of Fig. 10(middle) shows that some of the steps are executed multiple times, which is not represented by the given life-cycle model. Repairing this life-cycle model using the technique presented in this deliverable yields the life-cycle model of Fig. 11.

The payments life-cycle model of D5.2 is extended by two subprocesses at the right. The top subprocess shows that after receiving the first payment, additional payments can be validated and received. The bottom subprocess describes a relatively unstructured life-cycle comprising all four payment events in arbitrary order. Moreover, we can clearly see that the second step of the original life-cycle model is skipped entirely and the third step can now be skipped sometimes.

While the resulting, automatically repaired, life-cycle model may not be a perfect artifact model in terms of simplicity, it reflects the artifact behavior as it happens in reality. Thus, the proposed technique allows us to evolve artifact-centric processes based on event data observed in reality. Also, the repaired artifact models may jump-start process improvement efforts leading to a better design of the artifact-centric process.

7 Related Work

The model repair technique presented in this deliverable largely relates to two bodies of work: conformance checking of models and changing models to reach a particular aim.

Various conformance checking techniques that relate a given model to an event log have been developed in the past. Their main aim is to quantify fitness, i.e., how much the model can replay the log, and to highlight deviations where possible [25, 3, 5, 7, 22]. The more recent technique of [3] uses alignments to relate log traces to model executions, which is a prerequisite for the repair approach presented in this deliverable. Besides fitness, other metrics [8, 14, 19, 20, 27] (precision, generalization, and simplicity) are used to describe how well a model represents reality. Precision and generalization are currently not taken into account in our approach. Incorporating these measures into model repair is future work. Simplicity is considered in our approach in the sense that changes should be as tractable as possible, which we could validate experimentally.

A different approach to enforcing similarity of the repaired model to the original model could be model transformation incorporating an edit distance. The work in [16] describes similarity of process model variants based on edit distance. Another approach to model repair is presented in [13] to find for a given model a most similar sound model (using local mutations). [17] considers repairing incorrect service models based on an edit distance. These approaches do not take the behavior in reality into account. Other approaches to adjust a model to reality are to adapt a model at runtime [23, 21], which creates an individual model for each process execution. The technique of this deliverable repairs a model for multiple past executions recorded in a log. The approach of [11] uses observed behavior to structurally simplify a given model obtained in process discovery.

8 Conclusion

This deliverable addressed the problem of repairing an artifact life-cycle model w.r.t. a given log. We proposed a repair technique that preserves the original model structure and introduces subprocesses into the model so that the given log can be replayed on the repaired model. Our technique builds on the conformance checker developed in D3.1 and the log extraction techniques developed in D3.3. We validated our technique on the running example of the order-to-cash process of D1.1 and on models and logs of the FRIS use case of D5.2. Here, we showed that the approach is effective and that the resulting model makes it possible to understand the changes done to the original model for repair.

Future work is mostly concerned with two directions. First, the quality of the repaired model highly depends on the alignment that was found between the given model and log by the conformance checker. Currently, the alignment is identified by optimizing a local cost function (assigning penalties for adding a move on log or move on model). Future work is to find a more global cost function that aims at finding a most similar alignment of all traces, yielding a simpler structure in the repaired model. Second, the quality of the repaired model may also depend on attribute values of events reflecting data-based decisions as well as interactions between artifacts. Attribute values are orthogonal to the model repair technique presented in this deliverable in the sense that they influence finding a best-matching alignment between log and model. Once this has been found, the model can be repaired with respect to this alignment using the technique of this deliverable.


References

[1] ACSI Project. Deliverable D3.3, Discovery of Artifact Life-Cycles. Technical report, ACSI Project, 2012.

[2] A. Adriansyah, N. Sidorova, and B. F. van Dongen. Cost-based fitness in conformance checking. In B. Caillaud, J. Carmona, and K. Hiraishi, editors, ACSD, pages 57–66. IEEE, 2011.

[3] A. Adriansyah, B. van Dongen, and W. M. P. van der Aalst. Conformance Checking using Cost-Based Fitness Analysis. In IEEE International Enterprise Computing Conference (EDOC 2011). IEEE Computer Society, 2011.

[4] A. Adriansyah, B. F. van Dongen, and W. M. P. van der Aalst. Conformance checking using cost-based fitness analysis. In EDOC, pages 55–64. IEEE Computer Society, 2011.

[5] A. Adriansyah, B. F. van Dongen, and W. M. P. van der Aalst. Towards Robust Conformance Checking. In M. Muehlen and J. Su, editors, BPM 2010 Workshops, Proceedings of the Sixth Workshop on Business Process Intelligence (BPI2010), volume 66 of Lecture Notes in Business Information Processing, pages 122–133. Springer-Verlag, Berlin, 2011.

[6] E. Badouel and P. Darondeau. Theory of regions. In Petri Nets’96, pages 529–586, 1996.

[7] T. Calders, C. Guenther, M. Pechenizkiy, and A. Rozinat. Using Minimum Description Length for Process Mining. In ACM Symposium on Applied Computing (SAC 2009), pages 1451–1455. ACM Press, 2009.

[8] J. E. Cook and A. L. Wolf. Software Process Validation: Quantitatively Measuring the Correspondence of a Process to a Model. ACM Transactions on Software Engineering and Methodology, 8(2):147–176, 1999.

[9] D. Fahland, M. de Leoni, B. F. van Dongen, and W. M. P. van der Aalst. Behavioral conformance of artifact-centric process models. In W. Abramowicz, editor, BIS, volume 87 of Lecture Notes in Business Information Processing, pages 37–49. Springer, 2011.

[10] D. Fahland, M. de Leoni, B. F. van Dongen, and W. M. P. van der Aalst. Conformance checking of interacting processes with overlapping instances. In S. Rinderle-Ma, F. Toumani, and K. Wolf, editors, BPM, volume 6896 of Lecture Notes in Computer Science, pages 345–361. Springer, 2011.

[11] D. Fahland and W. M. P. van der Aalst. Simplifying Mined Process Models: An Approach Based on Unfoldings. In S. Rinderle, F. Toumani, and K. Wolf, editors, Business Process Management (BPM 2011), volume 6896 of Lecture Notes in Computer Science, pages 362–378. Springer-Verlag, Berlin, 2011.

[12] D. Fahland and W. M. P. van der Aalst. Repairing process models to reflect reality. In 10th International Conference on Business Process Management, September 3-6, 2012, Tallinn, Estonia, 2012. To appear.

[13] M. Gambini, M. L. Rosa, S. Migliorini, and A. H. M. ter Hofstede. Automated Error Correction of Business Process Models. In S. Rinderle, F. Toumani, and K. Wolf, editors, Business Process Management (BPM 2011), volume 6896 of Lecture Notes in Computer Science, pages 148–165. Springer-Verlag, Berlin, 2011.


[14] S. Goedertier, D. Martens, J. Vanthienen, and B. Baesens. Robust Process Discovery with Artificial Negative Events. Journal of Machine Learning Research, 10:1305–1340, 2009.

[15] IEEE Task Force on Process Mining. Process Mining Manifesto. In F. Daniel, K. Barkaoui, and S. Dustdar, editors, Business Process Management Workshops, volume 99 of Lecture Notes in Business Information Processing, pages 169–194. Springer-Verlag, Berlin, 2012.

[16] C. Li, M. Reichert, and A. Wombacher. Discovering Reference Models by Mining Process Variants Using a Heuristic Approach. In U. Dayal, J. Eder, J. Koehler, and H. Reijers, editors, Business Process Management (BPM 2009), volume 5701 of Lecture Notes in Computer Science, pages 344–362. Springer-Verlag, Berlin, 2009.

[17] N. Lohmann. Correcting Deadlocking Service Choreographies Using a Simulation-Based Graph Edit Distance. In M. Dumas, M. Reichert, and M. Shan, editors, International Conference on Business Process Management (BPM 2008), volume 5240 of Lecture Notes in Computer Science, pages 132–147. Springer-Verlag, Berlin, 2008.

[18] A. Medeiros, A. Weijters, and W. Aalst. Genetic Process Mining: An Experimental Evaluation. Data Mining and Knowledge Discovery, 14(2):245–304, 2007.

[19] J. Munoz-Gama and J. Carmona. A Fresh Look at Precision in Process Conformance. In R. Hull, J. Mendling, and S. Tai, editors, Business Process Management (BPM 2010), volume 6336 of Lecture Notes in Computer Science, pages 211–226. Springer-Verlag, Berlin, 2010.

[20] J. Munoz-Gama and J. Carmona. Enhancing Precision in Process Conformance: Stability, Confidence and Severity. In N. Chawla, I. King, and A. Sperduti, editors, IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011), Paris, France, April 2011. IEEE.

[21] M. Reichert and P. Dadam. ADEPTflex-Supporting Dynamic Changes of Workflows Without Losing Control. JIIS, 10(2):93–129, March 1998.

[22] A. Rozinat and W. M. P. van der Aalst. Conformance Checking of Processes Based on Monitoring Real Behavior. Information Systems, 33(1):64–95, 2008.

[23] S. W. Sadiq, W. Sadiq, and M. E. Orlowska. Pockets of flexibility in workflow specification. In ER’2001, volume 2224 of LNCS, pages 513–526, 2001.

[24] W. M. P. van der Aalst. Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer, 2011.

[25] W. M. P. van der Aalst, A. Adriansyah, and B. van Dongen. Replaying History on Process Models for Conformance Checking and Performance Analysis. WIREs Data Mining and Knowledge Discovery, 2(2):182–192, 2012.

[26] J. van der Werf, B. van Dongen, C. Hurkens, and A. Serebrenik. Process Discovery using Integer Linear Programming. Fundamenta Informaticae, 94:387–412, 2010.

[27] J. D. Weerdt, M. D. Backer, J. Vanthienen, and B. Baesens. A Robust F-measure for Evaluating Discovered Process Models. In N. Chawla, I. King, and A. Sperduti, editors, IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011), pages 148–155, Paris, France, April 2011. IEEE.
