Detecting Data-Flow Errors in BPMN 2 - RonPub · PDF fileSilvia von Stackelberg, Susanne Putze, Jutta M¨ulle, Klemens B ¨ohm: Detecting Data-Flow Errors in BPMN 2.0 Correction done

c© 2014 by the authors; licensee RonPub, Lubeck, Germany. This article is an open access article distributed under the terms and conditions ofthe Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Open Access

Open Journal of Information Systems (OJIS)Volume 1, Issue 2, 2014

http://www.ronpub.com/journals/ojisISSN 2198-9281

Detecting Data-Flow Errors in BPMN 2.0Silvia von Stackelberg, Susanne Putze, Jutta Mulle, Klemens Bohm

Institute for Program Structures and Data Organization, Karlsruhe Institute of Technology (KIT),Am Fasanengarten 5, 76131 Karlsruhe, Germany,

{silvia.stackelberg, susanne.putze, jutta.muelle, klemens.boehm}@kit.edu

ABSTRACT

Data-flow errors in BPMN 2.0 process models, such as missing or unused data, lead to undesired process executions.In particular, since BPMN 2.0 with a standardized execution semantics allows specifying alternatives for data aswell as optional data, identifying missing or unused data systematically is difficult. In this paper, we propose anapproach for detecting data-flow errors in BPMN 2.0 process models. We formalize BPMN process models bymapping them to Petri Nets and unfolding the execution semantics regarding data. We define a set of anti-patternsrepresenting data-flow errors of BPMN 2.0 process models. By employing the anti-patterns, our tool performs modelchecking for the unfolded Petri Nets. The evaluation shows that it detects all data-flow errors identified by hand,and so improves process quality.

TYPE OF PAPER AND KEYWORDS

Regular research paper: Business Processes, BPMN 2.0, Data-Flow Error, Anti-Patterns, Petri Nets, Model Check-ing

1 INTRODUCTION

The language BPMN (Business Process Model and No-tation), now an ISO standard, is widely accepted in re-search and practice. One important trait of BPMN 2.0 [5]is the possibility to specify executable processes. Fur-ther, the standard integrates the data aspect: With BPMN2.0, data objects are first-class flow elements. This is notonly an issue at a syntactic level, as is the case with, say,BPEL [16], but also addresses the semantics, i.e., howthe data is used. Process designers can now model, forexample, the data needs and data results of a task. Thedata needs describe the data elements a task requires forits execution, data results the ones available afterwards.This does not only hold for tasks, but - with restrictions- also for other flow elements, such as events or condi-tional sequence flows. Finally, one can specify alterna-tives for data as well as optional data.

Process designers model the data flow, i.e., how data

traverses a process, by specifying the data needs and dataresults for individual flow elements. The problem stud-ied here is to decide at design time whether the data-flowspecifications in executable BPMN process models arecorrect. In line with other research [14, 18, 22], a dataflow is correct if there are no anomalies regarding pro-cessed data. Such anomalies occur if data needed byflow elements is not available, if flow elements do notuse data produced before, or if they use data inconsis-tently. Data-flow correctness is crucial. To illustrate,a missing data element (e.g., a non-initialized element)may lead to blocking due to starvation, or to incorrectgateway decisions [3, 18]. In executable BPMN models,specifications for optional data and alternatives for datacan contain errors as well.

Example 1: Think of a process withdrawing moneyfrom an ATM with two alternative authentication meth-ods. In this process, a task authentication needs eitherthe data elements cash card and PIN, or the elements

1

http://creativecommons.org/licenses/by/3.0/

http://www.ronpub.com/journals/ojis

Open Journal of Information Systems (OJIS), Volume 1, Issue 2, 2014

cash card and fingerprint. If at least one of the data el-ements of an alternative authentication method may notbe available at task execution time, it indicates a designfault, namely a missing data error in one of the alterna-tives for data input of a task. This means that we define amissing data error in a strict manner: Not only an unini-tialized data element which definitely leads to blockingis critical, but also an uninitialized one which might becompensated with alternatives at runtime. �

In consequence, an approach for detecting data-flowerrors in BPMN 2.0 process models, i.e., at design time,has to take alternatives for data and optional data intoaccount.

It is advantageous to check the correctness of a dataflow already at design time (correctness at design time).In particular, this holds for data-flow errors which onecannot detect at runtime. To illustrate, compensatingmissing data at runtime with an available alternativehides design errors, c.f. Example 1. Existing approachesfor checking correctness, e.g., [14, 18, 22], are not ableto detect such errors. Moreover, they cannot take thespecifics of BPMN 2.0 into account, namely the execu-tion semantics when using mandatory and optional dataas well as alternatives for data.

Detecting data-flow errors in executable BPMN givesrise to the following challenges: (1) The BPMN 2.0specification defines the execution semantics of flow el-ements with their data needs and data results only in-formally, in a textual representation. Hence, a formalverification requires a transformation of BPMN processmodels into a formal structure like Petri Nets. This trans-formation is complex, because it has to consider the exe-cution semantics with respect to data elements. (2) Data-flow errors in executable BPMN may have to do withthe fact that some data elements are optional, whereasothers are mandatory. This leads to additional complex-ity, compared to the case where everything is mandatory.The definitions of data-flow errors must take all possiblecombinations into account. (3) A task may read or writedata elements out of alternative sets. Detecting errorsmust take this into account as well. However, avoiding toexplicitly handle the huge number of possible constella-tions for data within the process execution path that stemfrom such flexible alternatives is not trivial. As data ele-ments may be optional at the same time, things becomeeven more complex. (4) There is a lack of publicly avail-able process models conforming to BPMN 2.0, whichcould be used in an evaluation. For instance, the repos-itory provided by the BPM Academic Initiative [4] doesnot contain models suited for this purpose, as we willexplain in Section 4.

This paper proposes an approach to detect data-flowerrors in BPMN 2.0 process models. More specifically,

we make the following contributions:Definition of Anti-Patterns. Starting with classifica-

tions of anti-patterns from the literature [14, 21, 22], wedefine a set of data-flow anti-patterns enhanced to cap-ture the specifics of data in BPMN 2.0. Our anti-patternsdescribe anomalies of the data flow. Their definitionsconsider the execution semantics for data needs and dataresults of BPMN 2.0 flow elements. In particular, theyallow for combinations of alternative data needs and dataresults and distinguish between mandatory and optionaldata.

Transformation. We specify a new transformation ofprocess models into unfolded Petri Nets. In particular,these Petri Nets formally express the execution seman-tics of the process and its flow elements using data. Theydo so by taking alternatives for data as well as optional-ity of data into account. They also avoid handling thehuge number of data alternatives explicitly by coordinat-ing data needs and data results separately.

Tool Support. We have implemented a tool to detectdata-flow errors in BPMN 2.0 process models automat-ically at design time. It realizes the transformation justmentioned and finds data-flow errors in the Petri Net us-ing a model checker. When a data-flow error is found,the tool points to the task where it occcurs.

Evaluation. We have asked a BPMN expert to developa set of process models, which we then have used in ourevaluation. Our tool systematically detects all data-flowerrors generated by him as well as errors occurring inanother user experiment.

Using this approach, process designers can now in-crease the quality of their models by analyzing the dataflow of BPMN 2.0 processes at design time. This avoidscostly process executions with errors. Our approach al-lows to detect data-flow errors such as missing and re-dundant optional data.

Section 2 analyzes and explains the data perspectiveand describes data-flow errors in BPMN 2.0 processmodels. Section 3 introduces our approach to check data-flow correctness in executable BPMN process modelsand its implementation. We describe the evaluation ofour approach in Section 4. Section 5 discusses relatedwork, and Section 6 concludes.

2 DATA IN BPMN

The BPMN 2.0 standard [5] distinguishes several repre-sentations for process models. They differ regarding ex-pressiveness: The executable subclass contains the com-plete execution information. The graphical process dia-grams in turn cover only a subset of this. Data elementsplay different roles in these representations: Data associ-ations with flow elements in graphical process diagrams

2

Silvia von Stackelberg, Susanne Putze, Jutta Mulle, Klemens Bohm: Detecting Data-Flow Errors in BPMN 2.0

Correctiondone

Answersheets

StateC

orrection not done

Answer key

incomplete

Single correction

Complementanswer key

Answer key

Check state of

correctionGet answer

sheetGet data

Answersheets

pred3(State)

pred2(State)

pred1(State)

Answer sheet

Figure 1: Exam Correction Process Diagram with Data-Flow Errors

mean potential data needs and potential data results offlow objects. The specifications for mandatory data, al-ternatives for data, and optional data in the executablesub-class give precise information on data needs anddata results. So-called InputSets and OutputSets,containing DataInputs and DataOutputs, representthis information, which is not visible in the graphicalprocess diagram.

In the following, we first describe concepts for spec-ifying process diagrams with data graphically and givean example of a BPMN 2.0 process diagram with dif-ferent data-flow errors. Then we analyze the executionsemantics of process flow elements handling data. In thefollowing, we will use the term BPMN instead of BPMN2.0 if it is unambiguous.

2.1 Data in Graphical Process Diagrams

The most important concepts regarding data flow inprocess diagrams are DataObjects and their Data-Associations to the flow elements task and event. Arepresentation of a DataObject, visualized as docu-ment, can be a single instance or a collection of dataelements of the same type. Process designers can de-fine potential data needs and data results by modelingDataAssociations to or from DataObjects, vi-sualized as dotted lines. They represent potential readsor writes on the DataObjects. DataObjects areunique within a process, but one DataObject canbe referenced (i.e., visualized) several times in a pro-cess model. All specifications for DataObjects to-gether determine the graphically visible data flow withina process. BPMN describes DataObjects local tothe process, so a DataObject does not require an ex-plicit deletion. DataStores, DataInput and Data-Output of processes allow to specify data exchange

between databases and processes, thus they do not af-fect the data flow within the process. While taskscan have DataAssociations to and from Data-Objects, events have either DataAssociations toor DataAssociations from DataObjects, depen-dent on their type (catching or throwing).

Example 2: Figure 1 displays a process diagram forcorrecting answer sheets of a written exam. At the begin-ning, the DataInput Collection Answer sheetsof the process contains a set of answer sheets (i.e., filledout solutions of the exam). Task Get answer sheet selectsone element of a local copy of this collection and writes itto the DataObject Answer sheet. A performer of taskSingle correction marks this Answer sheet using cor-rection guidelines given in DataObject Answer key.Task Check state of correction reviews whether all An-swer sheets have been corrected or not. The task writesits result in DataObject State. It records the statusof the correction process. The performer of task Singlecorrection may identify a solution in an Answer sheetwhich is not part of the correction guidelines in Answerkey up to now. In this case, Single correction writes thestatus Answer key incomplete to State, and the processruns task Complement answer key later on. Because ofthis, State is optional output of this task and is writtenon demand. As the graphical representation does not al-low modeling optionality of output, this characteristic isnot visible in Figure 1. In the other cases, a performercorrects the next answer sheet (Correction not done &Answer key complete), or the process terminates (Cor-rection done). �

2.2 Data in Executable Processes

To get executable processes, BPMN requires to refine thepotential data needs and data results of a task or event,

3


Singlecorrection

Check state of

correction

Answer sheet

[optional subset

[mandatory subset

do_2

T

InputSet

OutputSet

(task)DataInput

T

(task)DataOutput

T

Symbols:

Task

Answersheets

State

di_2

T

di_1

T

Answer key

di_2

T

di_1

T

do_1

T

do_2

T

Figure 2: Data Specifications for two Executable Tasks

represented as DataAssociations in the process di-agram, by a more concrete specification [5]. This is im-portant because there exist several cases of ambiguity ingraphical diagrams. For example, several input or out-put DataAssociations can mean that a task reads orwrites all data elements, only one of them, or some com-bination. Further, process designers can combine speci-fications for alternatives and optional data.

Example 2 (cont.): In Figure 1, it is unclear whetherthe data output State of task Single correction is optionalor mandatory. �

We now focus on the execution semantics for tasks.The data elements a task requires for its execution areits data needs. When a task is ready for execution, theprocess engine checks the availability of its data needs.The concepts InputSet and OutputSet describe thedata needs and data results. Figure 2 shows an excerptof our sample process of Figure 1. The boxes displaytwo expanded tasks Single correction and Check state ofcorrection, with their InputSets and OutputSets.Each InputSet consists of references to DataInput,which are associated with one or more DataObjects.In Figure 2, the ovals with dotted lines represent anInputSet and an OutputSet of a task (the first oneswith small, the latter ones with larger dots), containingDataInputs and DataOutputs. For the visual rep-resentation of DataInputs and DataOutputs of atask, we have harnessed the visual notation of Data-Input (empty arrow) and DataOutput (filled arrow)of processes and mark document symbols with “T”.

An InputSet summarizes data needs of a task t,with a mandatory and an optional subset. In Figure2, the parts of the ovals with grey background denote

the optional subsets, and those with white backgroundthe mandatory subsets of InputSets or OutputSets.An InputSet IS(t) is available if all DataObjectsreferred to in the mandatory subset of the InputSetISM (t) are available. DataObjects referred to in theoptional subset of the InputSet ISO(t) do not af-fect its availability. The converse concept to data needsis data results. They are the output of a task resultingfrom its execution. Analogously to the InputSets,OutputSets may have mandatory OSM (t) and op-tional subsets OSO(t). In Figure 2, do 1 of task Sin-gle correction is mandatory (white), and do 2 is optional(grey).

A task may have several InputSets, representing al-ternative data needs. If so, the process engine checksthe availability of the InputSets in the order of theirspecification in the process model and takes the first oneavailable to execute the task. In other words, at leastone InputSet has to be available for execution. Anal-ogously to the InputSets, a task may have severalOutputSets. When a task terminates, the process usesone OutputSet for further execution. InputSetscan optionally have references to the OutputSets thatshould be generated when the InputSet is used forthe execution of the task (outputSetRef). This addi-tional knowledge could be useful to limit the combina-tion of Input-/OutputSets at a task to reduce theeffort for model checking, see Subsection 3.4. However,the execution semantics in the standard does not takethese relationships into account, i.e., they only may serveas guideline for implementing the task but not for de-tecting data-flow errors. Moreover, in the standard theredoes not exist any rule defining which OutputSet willbe used during execution. The standard says: “The im-

4


plementation of the element where the OutputSet isdefined determines the OutputSet that will be pro-duced. So it is up to the Activity implementation or theEvent to define which OutputSet will be produced.”Consequently, for detecting data-flow errors we have totake the OutputSets of a task without any orderinginformation of the OutputSets or relationships to cer-tain InputSets.

2.3 Data-Flow Errors in BPMN 2.0

A common understanding is that a data flow of a pro-cess is correct if there are no anomalies regarding dataprocessed [21]. To capture these anomalies, existing ap-proaches [14, 21, 22] specify a set of data-flow errorsfor any data element of a process: missing data, redun-dant data, lost data, inconsistent data, and wrongly ornot destroyed data. A missing data error occurs if a taskneeds a DataObject, but it is not available, becauseit has not been initialized yet. A redundant data errorhappens if a task writes a DataObject which is notread by any task in the subsequent process execution. Alost data error holds if a task writes a DataObject,and no upcoming task reads it until another task over-writes it. An inconsistent data error occurs if one taskwrites a DataObject and another task reads or writesit in parallel. BPMN does not foresee the possibility todelete DataObjects explicitly, so wrongly or not de-stroyed data is not relevant in our context. [22] furtherdifferentiate between weak and strong variants of redun-dant and lost data. In other words, it may be some or allexecution paths containing the error. Regarding the dataaspects of BPMN, these general classifications of errorsare relevant as well. However, those publications do notcover the specifics of optionality of data as well as ofalternatives for data in BPMN.

Optionality of Data: In BPMN a task reads or writesa DataObject mandatorily or optionally. This af-fects the data flow and its correctness. For example, op-tional DataInputs and optional DataOutputs cancause lost data, but with partly different effects as in themandatory case. An optionally lost data error occurs ifa task writes a DataObject optionally or mandatorilybut it is not read by any task before it is optionally over-written. This may be problematic, but does not have tobe, in contrast to the mandatory case. Furthermore, op-tional outputs are not sufficient to avoid a missing dataerror.

Example 3: Our example in Figure 1 contains fourdata-flow errors: (1 & 2) two Missing Data errors, (3)Weakly Lost Data, (4) Strongly Redundant Data. (1) Inthe first run of Single correction, Answer key is unini-tialized. (2) Task Single correction initializes State only

optionally, but Check state of correction needs it manda-torily. (3) Weakly Lost Data refers to Answer sheet. Intwo execution paths (Correction not done & answer keycomplete, Answer key incomplete) there is no task read-ing Answer sheet until task Get answer sheet writes itagain. (4) There is no task using Answer sheet that hasbeen updated by Single correction. �

Alternatives for Data: The data flow of a BPMN pro-cess depends on the InputSets and OutputSetschosen during process execution. For example, to avoidcompensations for uninitialized data with alternatives atruntime, this asks for analyzing data flow with respectto these alternatives for data at design time. A miss-ing data error occurs in one alternative if at least oneDataObject of this InputSet is uninitialized. Ineach InputSet the missing data error depends on thealternatives for OutputSets chosen previously. To de-tect all potentially incorrect data alternatives, we have toconsider all possible combinations of InputSets andOutputSets for each DataObject involved. By do-ing so, we do not have to take the order of InputSetsand OutputSets into account.

Example 4: Think of a slight modification of the exam-ple in Figure 1: Task Check state of correction requiresDataObjects Answer sheets and State mandatorily.But now we assume that both DataObjects are alter-native in use. I.e., there are two InputSets specified,each with one DataObject. In this case, the error op-tionally missing data occurs at task Check state of cor-rection in one alternative for DataObject State. �

Due to the specifics of BPMN with respect to data,detecting data-flow errors in BPMN asks for new anti-patterns and their specifications by using the classifica-tion of anti-patterns known from literature. This requiresfirst to distinguish mandatory and optional data explic-itly. Second, multiple InputSets and OutputSetsmust be taken into account. As both features are speci-fied by the executable subclass of a process model, thisneed for new anti-patterns is not obvious having only agraphical process diagram at hand. Our anti-patterns forBPMN processes reflect alternatives for data and option-ality, see Table 1.

3 DETECTING DATA-FLOW ERRORS

To achieve data-flow correctness in BPMN 2.0 processmodels, we formalize the execution semantics regardingdata-dependent flow elements in BPMN by using a trans-formation algorithm, see Section 3.1. In Section 3.2 weformalize generic data-flow anti-patterns for BPMN 2.0and say how to generate process-specific anti-patterns formodel checking. The final step is proving the possible

5


Process-specific anti-patterns generation

I/O places

BPMN 2 Petri Net transformationBPMN 2.0 Model

checkingCorrectnessinformation

PetriNet

GenericBPMN 2.0

anti-patterns

Mapping & unfolding

rules

Process-specific anti-patterns

Figure 3: Overview of Our Approach on Detecting Data-Flow Errors

existence of data-flow errors in the process model withmodel checking. Figure 3 gives an overview of our ap-proach.

3.1 The BPMN2PetriNets Transformation Al-gorithm

To analyze process models, many approaches employPetri Nets to represent the models; see [24] or Ap-pendix A for the definition. For example, [10] uses PetriNets for representing BPMN 1.0 models and [11] forBPEL processes; [12] gives an overview of transforma-tions of process models to Petri Nets.

We transform the control flow of BPMN models toPetri Nets by following the approach in [10]. How-ever, [10] does not capture the data characteristics ofBPMN 2.0. Existing work [2] transforms data objects butinto BPMN 1.2. Their way to represent data with PetriNets including the limited data perspective of BPMN 1.2is not sufficient for our purposes, and their proprietaryinterpretation of the execution semantics of data objectsand related states is not compatible with BPMN 2.0. Inparticular, supporting alternatives for data in [2] wouldinflate the Petri Nets because any possible combinationof InputSets and OutputSets must be reflected.This asks for an appropriate transformation to representalternative InputSets and OutputSets.

Alternatively, one could use Colored Petri Nets torepresent data-dependent flow elements. This wouldslightly reduce the number of places of the Petri Nets,but would require complex firing rules for the represen-tation of the execution semantics regarding data.

Due to the complexity of the required transformation,our BPMN2PetriNet algorithm transforms a BPMN pro-cess model into a Petri Net representation in two steps,see Algorithm 1: Step (1) Mapping the control flow toPetri Nets; Step (2) Unfolding the data needs and dataresults of mapped flow elements according to the execu-tion semantics.Mapping: The procedure Map() in Algorithm 1 startswith Step (1a) mapping the flow elements, including data

elements, to a Petri Net using mapping rules. In Step (1b)it connects the mapped flow element with its successorand predecessor flow elements according to the sequenceflow specified in the BPMN process model.

We distinguish two cases for the mapping.Case (1): For the mapping of all data elements and

data-dependent flow elements, i.e., tasks, events, andconditional sequence flows, we have developed newmapping rules to Petri Net representations. Our mappedPetri Nets depict data-dependent flow elements with in-terfaces for embedding data characteristics in the unfold-ing step later on. Further, we distinguish explicitly be-tween a reading and a writing subnet, in the followingcalled R/W subnet, of a mapped flow element. We rep-resent multiple InputSets and OutputSets by com-bining the respective R/W subnets with an XOR split inthe Petri Net. A place in the mapped Petri Net coordi-nates them. This gives rise to combining InputSets aswell as OutputSets without the need for having to ex-plicitly handle all possible combinatorial constellationsof data uses.

Case (2): To map the sequence flow and data-independent flow elements, e.g., parallel gateways, weuse an existing BPMN 1.0 to Petri Nets transformation[10], which requires some preconditions, e.g., for in-coming and outgoing flows of split or join gateways, notaffecting the generality of the proposed mapping [10].This approach reflects the state-of-the art for transform-ing BPMN sequence flows.

We now describe our mapping rules for data elementsand tasks, cf. Figure 4.

Data Elements. The mapping rules are applied tounique data elements, not to their references. Therules are straight-forward (see Figs. 4a and 4b): Thetwo places p.d id and p.¬d id, with number of tokensm(p.d id) + m(p.¬d id) = 1, represent the initializa-tion status of a DataObject or a DataInput or -DataOutput of a process. We distinguish two cases:First, a DataObject or DataOutput of a process isuninitialized, i.e., p.¬d id has a token, see Fig. 4a. Sec-

6


Algorithm 1: The BPMN2PetriNet Transformation Algorithm

Algorithm BPMN2PetriNet()for each flow element fi do

Map(fi) // Step (1)if fi is a data-dependent flow element then

if IS(fi) 6= ∅ ∨OS(fi) 6= ∅ thenUnfold(fi) // Step (2)

Procedure Map(Flow element f)Map f to Petri Net // Step (1a)Connect f with predecessors & successors // Step (1b)

Procedure Unfold(Flow element f)if |IS(f)| ≥ 2 ∨ |OS(f)| ≥ 2 then

Unfold InputSets/OutputSets of f // Step (2a)

for each InputSet/OutputSet IOSi(f) dofor each data element dj ∈ IOSi(f) do

Unfold input & output subsets of dj in IOSi(f) // Step (2b)Record I/O places of dj // Step (2c)if dj ∈ OM (IOSi(f)) ∧ ∃ pred(dj) then

Generate predicate subnet of dj // Step (2d)

ond, a DataInput of a process is already initialized,i.e., p.d id has a token, see Fig. 4b. As a result, eachunique DataObject is represented by two places inthe Petri Net, even if the process model contains severalreferences to this DataObject.

Tasks. The main idea behind the structure of a mappedtask is to prepare the Petri Net for embedding com-plex data constellations, such as multiple InputSetsor OutputSets as well as optional and mandatorydata specifications. This asks for interfaces to expanda mapped task with input/output subnets for optional andmandatory data use as well as with combining places fordata alternatives. Figure 4c shows the mapping rule fora task. Each mapped task has a reading and a writing(R/W) subnet with particular transitions. The abbrevia-tions in the transition names mean a reading start (rs),a reading end (re), a writing start (ws), etc. The R/Wsubnets serve as interfaces for the input/output subnetsgenerated in the unfolding step and represent the de-fault setting, namely one ’empty’ InputSet and one’empty’ OutputSet. But they do not contain theirDataInput and DataOutput elements yet. The un-folding step will add them to the R/W subnets. Be-tween the R/W subnet, there is place p.t.t id to combinedata alternatives. Additionally, the Petri Net represent-ing a mapped task has two connecting places p.x.t idand p.t id.y (dashed lines) for connecting the mappedtask with its predecessor flow element x and successorflow element y. Each reading subnet contains an input

place p.I i.t id and each writing subnet an output placep.O i.t id (filled in grey). The input and output places,in short I/O places, in the R/W subnets are essential forchecking the data-flow correctness.

Definition 1 (Input and output places): An input pla-ce p.I i.t id is a place of the reading subnet of the i-thInputSet of a task t id with the following character-istics: If it has a token, all DataInputs of the i-thInputSet have been read successfully. The firing oftransition t.rs i.t id indicates the successful reading.An output place is the analogous place of a writing sub-net. �

The mapping rules for other data-dependent flow el-ements, i.e., events and conditional sequence flows, fol-low the same concept as described, see [25] for the de-tails. In contrast to the mapping rules for tasks, the onesfor events and conditional sequence flows are less com-plex. Mapped events have either one reading or one writ-ing subnet, and mapped conditional sequence flows havea reading subnet only.

Unfolding: If a data-dependent flow element containsspecifications for data, procedure Unfold() of Algo-rithm 1 adds Petri Net representations of the data needsand/or data results to an already mapped data-dependentflow element f . The added input/output subnets repre-sent the execution semantics regarding data. In partic-ular, our algorithm extends the default Petri Net repre-

7


d_id d_idd_id d_id p.d_id p.¬d_id(a) Uninitialized Data Elements

d_id d_id p.d_id p.¬d_id

(b) Initialized Data Elements

p.I_1.t_id

p.t_id.y

p.w_1.t_id

p.x.t_id

t.rs_1.t_id

t.ws_1.t_id

t.we_1.t_id

p.O_1.t_id

t.e_1.t_id

Task t_idx y

p.t.t_id

t.re_1.t_id

Reading subnet Writing subnet

(c) Task

Figure 4: Mapping Rules

p.t.t_id p.t_id.yp.w_1.t_idp.x.t_id

t.rs_1.t_id t.ws_1.t_id t.we_1.t_id

p.O_1.t_id

t.e_1.t_id

p.w_2.t_id

t.ws_2.t_id t.we_2.t_id

p.O_2.t_id

t.e_2.t_id

p.I_1.t_id

t.re_1.t_id

t.rs_2.t_id

p.I_2.t_id

t.re_2.t_id

Two reading subnets Two writing subnets

Figure 5: Task after Unfolding a Second InputSet and a Second OutputSet (Step (2a))

sentation by further InputSets and OutputSets ifneeded, and by representing the behavior for mandatoryand optional data. It comprises the following four steps:

Step (2a): The mapped flow element contains onereading subnet or one writing subnet, representing oneInputSet or one OutputSet. If f has severalInputSets or OutputSets, we unfold each furtherInputSet ISi(f) by adding an additional reading sub-net as alternative path to already existing reading sub-nets. For each further OutputSet OSi(f) we add anadditional writing subnet as alternative path to alreadyexisting writing subnets. Figure 5 shows the resultingPetri Net of a task after unfolding a second InputSetand a second OutputSet. By doing so, we repre-sent alternative InputSets and OutputSets of tasksand coordinate their paths at place p.t.t id, leaving asidetheir ordering.

Step (2b): Each InputSet/OutputSethas a mandatory and an optional subset ofDataInput/DataOutput elements. Figure 6represents the subnets for these four different cases(input/output, mandatory/optional) for unfolding oneelement of an InputSet/OutputSet of a flowelement t id as well as their interfaces. The partsvisualized with dashed lines in Fig. 6 represent thealready mapped flow elements (c.f. procedure Map())of the Petri Net. For each DataObject dj in anInputSet/OutputSet we add the appropriate

input/output subnet. Next, we connect each unfoldedinput/output subnet with the reading or writing subnet ofthe flow element mapped and with the two places p.d idand p.¬d id, which represent the data object dj mapped.

This results in a representation of the execution se-mantics regarding data for InputSets/OutputSets,as well as for the process as a whole: The subnets con-nected to the reading or writing subnet of a mappeddata-dependent flow element reflect the local executionsemantics of a flow element regarding data. Further,the connections of the unfolded input/output subnets tothe mapped places of DataObjects account for theglobal behavior of a process. The latter is becausefor each unique DataObject there exist exactly twoplaces in the resulting Petri Net representing whetherDataObject is initialized or not. Any subnet of thePetri Net representing the local execution semantics fora particular DataObject is connected to these twoprocess-global places of the particular DataObject.These connections follow the structure as given in theunfolding rules (see Fig. 6). To illustrate, there are con-nections from transitions t1 and t2 to the two places fora subnet representing a mandatory output (see Fig. 6c).For a subnet representing an optional output (see Fig. 6d)transitions t1, t2, t3 and t4 are connected. We will de-scribe the subnets in detail at the end of this subsection.

Step (2c): This step records all I/O places as well as in-formation on the kind of usage (mandatory/optional, as

8


d_id

T

InputSetIS_i

p.I_i.t_id

p.d_id

t.rs_i.t_id

Reading subnet (excerpt)

(a) Mandatory Input Subnet

d_id

T

InputSetIS_i

p.io.d_id.t_id

p.I_i.t_id

t.rs_i.t_id

Reading subnet (excerpt)

(b) Optional Input Subnet

d_id

T

p.d_id

p.¬d_id

t2

t1

p1 p2

p.w_i.t_id

p.O_i.t_id

t.ws_i.t_id

t.we_i.t_id

OutputSetOS_i

Writing subnet (excerpt)

(c) Mandatory Output Subnet

p.w_i.t_id

p.O_i.t_id

t1 t4p1

p.d_idp.¬d_id

t3t2

p2

p3

t.ws_i.t_id

t.we_i.t_id

d_id

T

OutputSetOS_i

Writing subnet (excerpt)

(d) Optional Output Subnet

Figure 6: Unfolding Rules

alternatives) for DataObject dj . We deploy this forthe following process-specific anti-pattern formula gen-eration.

Step (2d): If OSi refers the DataObject mandato-rily, and if any condition of a conditional sequence flowrefers the DataObject, we generate a so-called pred-icate subnet [22]. [22] introduced ’unfolding’ of predi-cate subnets to express that a DataObject may holdseveral values which influence the process execution bydata-dependent gateways.

Input/Output Subnets. We now describe the unfoldingrules, i.e., the input and output subnets, in more detail byfocusing on the local execution semantics for data. Fig-ure 6a displays the subnet for a mandatory input. As atask requires its mandatory data input for its execution,the subnet specifiying one mandatory DataObject isvery simple: It consists of a bidirectional arc from theplace p.d id, representing that the DataObject is ini-tialized, to the reading transition of the mapped task. Thebidirectional arc ensures that the DataObject staysinitialized. An optional data input does not affect theexecution semantics of a task. Thus, the subnet containsan optional input place p.io.d id.t id with a token and aconnection to the reading subnet of the unfolded task, seeFig. 6b. The subnet for a mandatory output, see Fig. 6c,reflects that a task t id initializes a DataObject in anycase, i.e., place p.d id finally has a marking. Before thestart of task t id, either place p.d id or place p.¬d id hasa marking. Accordingly, either transition t1 or t2 fires.This subnet is connected to the writing part of the taskmapped. The subnet in Fig. 6d covers all three executionscenarios of an optional output d id of an OutputSetOS(t id) of a flow element t id: (a) transitions t1 andt3 fire if d id has no value before start of t id, and t idinitializes d id, (b) transitions t1 and t2 fire if d id hasno value before start of t id and t id does not write tod id, and (c) t4 fires if d id is initialized before start oft id, and t id writes d id again.

3.2 Formalization of BPMN 2.0 Data-FlowAnti-Patterns

In this section we formalize the data-flow errors as anti-patterns, which we have identified as relevant for aBPMN process model, see Section 2.3. The result is a setof so-called generic anti-patterns, reflecting the specificsof BPMN for several Input- and OutputSets andfor optional data. These generic anti-patterns are inde-pendent of a concrete process model. Then we say howto generate process-specific anti-patterns.

Formalizing Generic Anti-Patterns: The ideas behindthe formalization are as follows. BPMN allows speci-fying the data needs and data results of flow elements.This results in a data flow from the perspective of an in-

9


dividual DataObject. Thus, we examine data-flow er-rors for each DataObject d. To consider alternativesof data modelled as several Input- and OutputSets,we analyze the data flow regarding the combinations ofthese alternatives. Further, supporting mandatory or op-tional use of data gives rise to several data-flow errorswe define anti-patterns for. We aim to achieve correct-ness at design time, however, the availability of data isonly known during execution. This is why we have toanalyze all possible execution variants that contain ex-ecution paths determined by the control flow as well asby the choice of alternative data needs and data results.These variants are contained in the state space of the un-folded Petri Net model which we use for error detection.

We illustrate how to formalize the data-flow errorswith anti-patterns, using the example of a Missing Dataflow error of a DataObject d. This error occurs if theprocess contains a flow element f which needs dmanda-torily, and the process has no flow element f ′ which ini-tializes dmandatorily before the execution of f . f and f ′

might have several alternative data needs in combinationwith several alternative data results. Hence, we considerall possible combinations of input and output alternativeswhere d is used to analyze its data flow. Note that thedata flow of d considers its availability in the whole pro-cess (not only local to a certain task or event).

As a basis for our formalization, for eachDataObject d we need information on where inthe process d can be processed (as input or output,optionally or mandatorily). To this end, we nowdefine these sets of BPMN flow elements for a cer-tain DataObject. We will make use of these setsto specify the anti-patterns. For the definitions thatfollow we assume a process model containing a num-ber of tasks |tasks|, a number of events |events|,and a number of conditional sequence flows |conds|.setf = tasks ∪ events denotes the set of all tasksand events of the process model. A task or event f hasa number of InputSets and OutputSets that wedenote with |IS(f)| and |OS(f)|. Note that these setscontain DataObjects as data needs and data results.Let ISO

i (fm) be the subset containing the optionalDataObjects of the i-th InputSet ISi of the m-thtask or event, and OSO

i (fm) the subset containing theoptional DataObjects of the i-th OutputSet OSi

of the m-th task or event; an event has at most oneInputSet or one OutputSet depending on its type.

In Definition 2, we summarize all InputSetsSet IO(d) resp. OutputSets Set OO(d) d is an op-tional element of. Using these sets, IO(d) of typeBoolean specifies whether d has been read successfullyin one of its InputSets; correspondingly, OO(d) oftype Boolean specifies whether d has been written suc-cessfully in one of its OutputSets.

For the mandatory use of d, we define IM (d) andOM (d) analogously to Definition 2 In addition to tasksand events, a conditional sequence flow s can use aDataObject within a condition cond(s) mandatorilyas well. For Definition 3, setf = tasks ∪ events ∪conds denotes the set of all tasks, events and condi-tional sequence flows of the process model. The defi-nitions of IM (d) and OM (d) are analogous to the op-tional case. In the following definition, we summarizeSet IM (d) is the set of all InputSets d is a manda-tory element of. Analogously, Set OM (d) is the setof all such OutputSets. Using these sets, IM (d) speci-fies whether d has been read successfully in one of theInputSets of Set IM (d). Correspondingly, OM (d)specifies whether d has been written successfully in oneof the OutputSets of Set OM (d). read(x) and writ-ten(x) as defined in Definitions 2 and 3 are formalized inthe generation step of process-specific anti-patterns, seebelow.

We now formalize the anti-patterns for BPMN usinga temporal logic formalism, namely CTL (ComputationTree Logic). See [8], Appendix A for a descriptionof CTL. [8] introduces an effective algorithm to ver-ify properties specified in CTL on Petri Net models.To achieve data-flow correctness, our approach aims toidentify errors which occur during reading or writing ofa DataObject in the context of a certain choice of al-ternatives. Next, the specification of data needs and re-sults in BPMN allows for optional or mandatory use ofdata. This gives rise to a distinction between optionaland mandatory errors, as described in Section 2.3. Fur-ther, we differentiate between weak and strong variantsof redundant and lost data, see Section 2.3. The resultis a set of generic anti-patterns for a DataObject d,in the following called Data-Flow Anti-Patterns (DAP).Table 1 lists our generic anti-patterns tailored to the exe-cution semantics of BPMN 2.0.

In the following, we explain our generic anti-patternsfor BPMN 2.0. We provide their formal specification inCTL.

DAP 1: Missing Data occurs if a DataObject d isa mandatory input IM (d), but not a mandatory outputOM (d) before, i.e., d is not initialized.

DAP 2: Missing Optional Data means that aDataObject d is optional input IO(d), and d is notinitialized before by a mandatory outputOM (d) or by anoptional output OO(d).

DAP 3: Strongly Redundant Data is given if aDataObject d is a mandatory outputOM (d), but thereis no flow element using this DataObject as manda-tory input IM (d) or as an optional input IO(d) in all fol-lowing execution paths.

DAP 4: Weakly Redundant Data happens if there ex-ists at least one execution path of the process where a

10


Definition 2 (Optional InputSets/OutputSets containing DataObject d):

Set IO(d) =⋃

m∈{1..|setf |}

{ISi(fm) | d ∈ ISOi (fm) ∧ i ∈ {1..|IS(fm)|}}

Set OO(d) =⋃

m∈{1..|setf |}

{OSi(fm) | d ∈ OSOi (fm) ∧ i ∈ {1..|OS(fm)|}}

read(x) = all DataInputs in InputSet x have been read successfully % type: boolean

written(x) = all DataOutputs in OutputSet x have been written successfully % type: boolean

IO(d) =∨

x∈Set IO(d)

(read(x)

)OO(d) =

∨x∈Set OO(d)

(written(x)

)

Definition 3 (Mandatory InputSets/OutputSets containing DataObject d):

Set IM (d) = {sl | d ∈ cond(sl) ∧ l ∈ {1..|cond|}} ∪( ⋃m∈{1..|setf |}

{ISi(fm) | d ∈ ISMi (fm) ∧ i ∈ {1..|IS(fm)|}}

)Set OM (d) =

⋃m∈{1..|setf |}

{OSi(fm) | d ∈ OSMi (fm) ∧ i ∈ {1..|OS(fm)|}}

read(x) = all DataInputs in InputSet x have been read successfully % type: boolean

written(x) = all DataOutputs in OutputSet x have been written successfully % type: boolean

IM (d) =∨

x∈Set IM (d)

(read(x)

)OM (d) =

∨x∈Set OM (d)

(written(x)

)

DataObject d is neither a mandatory input IM (d)nor an optional input IO(d), but DataObject d is amandatory output OM (d) before.

DAP 5: Redundant Optional Data holds for aDataObject d if an optional output OO(d) is not usedlater on, i.e., DataObject d is no mandatory inputIM (d) or optional input IO(d) in one of the succeedingexecution paths.

DAP 6: Strongly Lost Data occurs if a DataObjectd is mandatory output OM (d) several times, but beforethe later mandatory outputs happen, there is no manda-tory input IM (d) in between. This situation holds for allexecution paths and means that earlier data results (e.g.,initializations, updates) for d are lost.

DAP 7: Weakly Lost Data means that a DataObjectd is mandatory output OM (d), and there exists at leastone execution path where it is mandatory output OM (d),but not mandatory input IM (d) before.

DAP 8: Optional DataOutput does not leadto an error in every case. Lost optional data areDataObjects d which are optional output (OO(d)).Additionally, no upcoming flow elements exist that readd (IM (d) ∨ IO(d)) until a flow element writes d manda-torily, i.e., U OM (d).

DAP 9: Optionally Lost Data includes all

DataObjects d which are mandatory (OM (d))or optional output (OO(d)), and, additionally, there isa flow element that optionally writes d subsequentlywithout a flow element reading d in between (optionallyor mandatorily).

DAP 10: Inconsistent Data Two elements are in con-flict if one element f writes d, i.e., d belongs to its (op-tional or mandatory) data results (OS(f)), and anotherelement f ′ reads (d ∈ IS(f ′)) or writes d optionally ormandatorily (d ∈ OS(f ′)). d is inconsistent if two ele-ments in conflict regarding d can be executed in parallel.

Figure 7 shows examples of process models which il-lustrate the ten anti-patterns of Table 1. Some of thesemodels include more than one data-flow error. For in-stance, the model in Figure 7f illustrates strongly lostdata and also features a strongly redundant data-flow er-ror. This is because the Data Object is not read by anytask before the process terminates. Furthermore, eachstrong data-flow error also is a weak one.

Generating and Checking Process-Specific Anti-Patterns: Using our generic BPMN data-flow anti-patterns, the Step Process-specific anti-patterns genera-tion of our approach in Figure 3 delivers specific formu-las of the generic anti-patterns for each DataObject.To this end, we use the results of the transformation from

11


Anti-Patterns - DAP Formalization1 Missing Data E

(¬OM (d) U IM (d)

)2 Missing Optional Data E

(¬(OO(d) ∨OM (d)) U IO(d))

)3 Strongly Redundant Data EF

(OM (d) ∧ AX(A[¬(IM (d) ∨ IO(d)) U term])

)4 Weakly Redundant Data EF

(OM (d) ∧ EX(E[¬(IM (d) ∨ IO(d)) U term])

)5 Redundant Optional Data EF

(OO(d) ∧ EX(E[¬(IM (d) ∨ IO(d)) U term])

)6 Strongly Lost Data EF

(OM (d) ∧ AX(A[¬(IM (d) ∨ IO(d)) U OM (d)])

)7 Weakly Lost Data EF

(OM (d) ∧ EX

(E[¬(IM (d) ∨ IO(d)) U OM (d)]

))8 Lost Optional Data EF

(OO(d) ∧ EX

(E[¬(IM (d) ∨ IO(d)) U OM (d)]

))9 Optionally Lost Data EF

((OM (d) ∨OO(d)

)∧(EX(E[¬(IM (d) ∨ IO(d)) U OO(d)])

))10 Inconsistent Data

∨f∈{E∪T}∧d∈OS(f)} EF

[exec(f)∧

∨f ′ 6=f∧(d∈IS(f ′)∨d∈OS(f ′)) exec(f

′)]

Legend (for CTL formulas):A path operator (A or E) occurs together with a state operator (X, F, U).A/E: the formula needs to hold in all/at least one of the succeeding execution paths.X/F: the formula holds in the next/at least in one succeeding state.[φ1 U φ2]: φ1 holds until φ2 is reached.exec(f) means that flow element f is ready to be executed, i.e., the respectivetransition is activated.term denotes the termination of the process.

Table 1: BPMN 2.0 Generic Anti-Patterns for a DataObject d

BPMN to Petri Nets with its Input and Output Places,see Definition 1. The information recorded in Step (2c)of Algorithm 1 comprises the I/O places as well as infor-mation on the kind of usage (mandatory/optional, in al-ternatives) of the DataObject d. The generation stepof process-specific anti-patterns instantiates the genericBPMN 2.0 anti-patterns by means of the optional andmandatory usages of a DataObject d according toDefinition 2. In order to be able to check whether d is anoptional or mandatory data input, the model checker hasto prove that an IS ∈ Set IO(d)/∈ Set IM (d) is avail-able. To do so, it checks whether the input place of IScontains a token, see Definition 1. This means that weinstantiate the anti-patterns by replacing read(IS) (anInputSet of a flow element t id) with m(p.I i.t id)= 1 (respectively for a flow element e id)). The pictureis analogous for the OutputSets. written(t id) is re-placed using the output place, i.e., with m(p.O i.t id)= 1. The alternative use of d induces additional pathexpressions of the data flow in the process-specific anti-patterns.

For the final step, we employ a representation of thepossible states of the process model to find the states witherrors. A CTL model checker, namely LoLA [19], ana-lyzes the process-specific formulas of the anti-patternsfor each DataObject on the unfolded Petri Net modelin question. If a formula is satisfied, the data-flow erroris detected.

3.3 Tool Support

We have implemented a tool for automatically ana-lyzing data-flow errors in process models specifiedwith BPMN 2.0, see Figure 3. Input to the tool isa well-formed and sound BPMN process model, seealso Subsection 3.4. For each data object of the modelin question, the tool generates the process-specificanti-patterns. For each of these anti-patterns in turn, itruns the model checker LoLA to prove it. If a data-flowerror has been found, our tool returns the data objectand the task of the BPMN model where the error occurs.The naming of a BPMN task corresponds to the label ofthe respective place of the Petri Net. A special case is amissing data error of a data object d, because it causesa dead transition in the Petri Net. In this case the toolinserts an initialization of d into the Petri Net to repairthe error before checking the other anti-patterns. Ourevaluation makes use of the tool, see Section 4.

3.4 Discussion

Features Supported: We support all standardized datafeatures of BPMN 2.0. Our data-flow verification con-siders BPMN 2.0 process models according to the pro-cess execution conformance type of the BPMN 2.0 stan-dard. We do not consider business process choreogra-

12


(a) DAP 1 - Missing Data (b) DAP 2 - Missing Optional Data

(c) DAP 3 - Strongly Redundant Data (d) DAP 4 - Weakly Redundant Data

(e) DAP 5 - Redundant Optional Data (f) DAP 6 - Strongly Lost Data

(g) DAP 7 - Weakly Lost Data (h) DAP 8 - Lost Optional Data

(i) DAP 9 - Optionally Lost Data (j) DAP 10 - Inconsistent Data

Figure 7: Examples of Process Models Illustrating all Anti-Patterns of Table 1

13


phies. Specifying more complex data dependencies indata-aware business process models, see e.g., [15], [27],[7] is not our current focus. Such proposals extendthe standard, e.g., by annotations for newly created anddeleted data objects, by introducing semantics on statesof the data objects forming a life-cycle, or by semanticdata constraints, e.g., in preconditions of tasks.

Incorrect Process Models: Our focus is on data-flowcorrectness. Here, we assume that the input processmodel is well-formed and sound. In other words, data-flow correctness and soundness are issues that can bedealt with separately from each other. This means thatour tool can be integrated into a verification tool chainfor BPMN process models, after the soundness check.

Complexity: The state space depends on the dif-ferent cases enhancing the Petri Net: A) insertingthe input/output subnets for each DataInput/Ouputof each Input-/OutputSet adds up to a lo-cally restricted increase, i.e., with minor problemsfor verification, and B) inserting paths for severalInput-/Output- Sets can result in a polynomialincrease, accounting for state space explosion.

A) For each DataInput or DataOutput of a flowelement, a predefined number of states is added to thestate space. This is due to the additional input/outputsubnets of the unfolding step. c = 4 is the maximumnumber of states for these subnets, namely the numberof states of subnet 6d. Let n(f) be the number of dataneeds and data results of the flow element f . Then eachstate containing a marked place which represents a flowelement f with data incurs the following number of ad-ditional states at most: cn(f).

B) The occurrence of a DataObject in severalInput/OutputSets of tasks all over the processmodel can generate a growth of the state space in poly-nomial order. This is because alternatives for data gener-ate parallel paths. This so-called state-space explosionis a well-known problem for model checking [9] andcan hinder verification. Currently, alternatives for datain processes is a new concept in BPMN 2.0. We planto investigate respective optimization methods as futurework.

4 EVALUATION

The evaluation of our approach consists of several stepsfrom designing a collection of process models with in-tensive use of data needs and data results, explicitlyadding data-flow errors into some of the process modelsto a user experiment for modeling data in process mod-els given. Step 1 is auxiliary, to prepare the evaluation ofSteps 2 and 3.

To evaluate process models of realistic complexity, we

first looked for publicly available process models whichwe could use as benchmarks. In particular, this includesthe huge process repository of the BPM Academic Ini-tiative [4] for BPMN 2.0 process models with intensivedata usage. Unfortunately, only a small fraction of theprocess models contains DataObjects. Next, evenfewer models of this share conform to the standard. Forinstance, some models have non-conformant annotationsof message flows with DataObjects. In addition, themost important feature required for analyzing BPMN 2.0process models with data is missing: None of the processmodels contains specifications for DataInput, Data-Output, InputSet, OutputSet, etc., being part ofexecutable processes (see Section 2.2). We argue that inbusiness processes modeling data as first-class objects isnot ubiquitous yet and is very new in BPMN. Further-more, we hypothesize that BPMN process models withspecifications for optional data and alternatives are ab-sent because the know-how regarding these new featurescurrently is only in an early state. Without such spec-ifications, the conventional detection of data flow errors(see for instance [14, 18, 22]) is possible, but not a detec-tion of data-flow errors for optional data and alternativesof data in BPMN 2.0. Consequently, we have faced theproblem that such a set of processes that we could use asbenchmark has not yet been available publicly.

To deal with this problem, we have asked an ex-pert to design process models for 11 scenarios. Thesescenarios use data intensively. Some examples areadapted from literature, others we have developed our-selves. The scenarios comprise, say, order handling(S1) and job interview (S7) and are available online,see http://dbis.ipd.kit.edu/2134.php. These process mod-els do not have any data-flow errors according to our def-inition. We also checked these process models (namelythe Processes Sx.1 in Table 2) with our data-flow analysistool, which confirmed their error-freeness. Sx.n standsfor the n-th variant of a process model of Scenario x.

In Step 2, our expert has added data-flow errors tosome of the error-free process models. These processmodels cover all types of data-flow errors (see S3.2,S5.2, S8.2, S9.2 and S10.2 in Table 2). Our data-flowanalysis tool has correctly detected all of them.

In Step 3, we have run a user experiment to understandthe difficulties of modeling an error-free data flow, andalso to obtain process models with data-flow errors forfurther evaluation of our tool. We have organized thisexperiment as an exercise of a lecture with seven stu-dents with knowledge in BPMN modeling, including thedata aspect. These experienced individuals have startedwith two process models given, namely S1.1 and S2.1(from Step 1 of our evaluation). We had removed thedata elements from the models before. The task has beento enhance the process models with data needs and re-

14


Process

# Tasks# Events

# XOR Splits

# conditions

# DataObjec

ts

# DataAsso

c.

# Places

# Transiti

ons

# Miss. Data

# Redund. Data

# Lost Data

# Incon. Data

S1.1-8 6 4 1 2 3-8 5-12 49-72 40-50 0-4 0-4 0 0S2.1-6 8 2 1 2 4-10 12-21 63-84 47-58 0-3 0-3 0-1 0S3.1 8 2 3 6 5 17 85 74 0 0 0 0S3.2 8 2 2 4 5 15 80 70 0 2 1 0S4.1 14 2 0 0 12 24 107 82 0 0 0 0S5.1 7 2 1 2 5 15 64 52 0 0 0 0S5.2 7 2 1 2 6 16 64 52 0 0 2 0S6.1 8 2 2 4 7 17 90 70 0 0 0 0S7.1 6 2 1 2 5 10 58 33 0 0 0 0S8.1 13 3 2 5 8 22 110 94 0 0 0 0S8.2 14 3 2 5 9 16 121 104 1 5 5 2S9.1 8 2 1 2 3 21 72 58 0 0 0 0S9.2 8 2 1 2 5 24 79 62 0 2 2 0S10.1 8 2 2 4 5 12 75 60 0 0 0 0S10.2 8 2 2 5 6 13 91 81 0 1 0 0S11.1 19 3 2 5 6 13 91 81 0 0 0 0

Table 2: Evaluation of the Correctness Tool

sults. Modifying them has also been allowed if neces-sary. We textually described the use of data, so that theparticipants were able to model the data perspective ofthe processes. We have obtained several process mod-els for the two scenarios. They differ in the number ofDataObjects and control flow elements, see Scenar-ios S1 & S2 in Table 2. Because the process models arenot overly complex, we have been able to check manu-ally if they contain data-flow errors. This serves as goldstandard to evaluate our tool. Using it, we then havechecked the models. Our tool has detected all data-flowerrors contained in the models, 40 errors altogether.

Table 2 gives an overview of the results of our evalu-ation, i.e., detecting data-flow errors and confirming thecorrectness of process models without data-flow errors.Because of better readability we present the results forScenarios 1 and 2 in an aggregated form. See [25] forthe detailed results. We list the size of the BPMN pro-cess models analyzed, the size of the Petri Nets gener-ated and the number of data-flow errors identifed. Thenumber of the BPMN elements determines the size ofthe corresponding Petri Net, defined by the number oftransitions and places. The input and output subnets inparticular, added in the unfolding step of our transforma-tion, increase the size of the Petri Net.

For one, the experiment of Step 3 (user experiment)shows that our tool has detected all data-flow errors inthe process models created. Further, Missing Data is afrequent error in process modeling. The numbers of Re-dundant Data errors and of Lost Data errors reflect that

we count both strong as well as weak variants of data-flow errors. Inconsistent Data errors occur when differ-ent tasks read or write a DataObject in parallel. Thisonly happens with parallel execution paths. Only one ofour scenarios (S8.2) has this characteristic. All in all, theevaluation shows that in process modeling all types ofdata-flow errors are relevant and can occur.

5 RELATED WORK

Behavioural analysis of process models without the dataaspect has been studied extensively, see [13] for anoverview, but is not the focus of our work. In what fol-lows, we concentrate on the correctness of the data flow.[18] is one of the first approaches illustrating the impor-tance of data-flow correctness in process modeling. Theauthors have thoroughly analyzed problems which canoccur with a data flow but do not provide a solution forerror detection. [3] defines data-flow errors as patterns,but focuses on visually specifying compliance rules inorder to explain the violations. There, key requirementsare availability of data input and data output of an activ-ity, and consistent flow of data between two activities.We in turn use the patterns to express the execution se-mantics of process models with data elements and thusanalyze the correctness of the data flow not restricted tothe availability perspective supported in [3]. [23] pro-poses using patterns for the analysis of general compli-ance violations. In particular, order and occurrence pat-terns support the users when specifying constraints on a

15


process model with data. However, they do not supportBPMN but use BPEL with its specific data semantics forprocess modeling. This is different from BPMN and alsodoes not handle optional data or alternatives for data.

[22] introduces a method based on CTL* that com-bines the detection of control-flow and data-flow er-rors. They use anti-patterns for missing data, inconsis-tent data, redundant data, and lost data for Workflow-Net process models. In contrast to our approach, they donot cover the BPMN 2.0 semantics of data during pro-cess execution, including specific more elaborate waysto use data, i.e., alternatives for data and mandatory aswell as optional data. [21] regards data-flow analysisin processes based on UML activity diagrams. The au-thors provide separate correctness proofs directly imple-mented as procedures for each of the three basic types ofdata-flow errors, namely missing data, conflicting data,and redundant data. [14] extends the results of [21] forUML activity diagrams, by discussing additional data-flow errors such as inconsistent data. For error detectionthey also use separate checking procedures for each er-ror type. To do so, they have to explicitly generate andstore all possible paths of the process model. Due toXOR-nodes in particular, their number tends to be daunt-ing. Another drawback is that they do not use a state-based approach to represent the dynamic behaviour ofprocesses (e.g., with a state space of a Petri Net). Fur-ther approaches exist using UML data-flow analysis onUML activity models, e.g., [20], and [26], with anotherfocus than ours. [17] deals with data-flow correctness ofBPMN 2.0. They use the work of [21] adding optionalreading and writing access. To this end, they add behav-ioral profiles consisting of information on conflicts be-tween a pair of nodes of a process model. In other wordsthey establish behavioral relationships each between apair of tasks (nodes) of the process and use this list of re-lationships for the data-flow analysis. In contrast to ourmethod, their modeling of errors with behavioral profileshandles data separately from the process model. Further,they do not take alternatives into account. Finally, theyaddress only some of the data-flow errors, our approachcan cope with. For instance, they define inconsistent dataonly for write-write conflicts, do not distinguish betweenredundant data with optionality, and do not consider lostdata.

Summing up, our approach for detecting data-flowerrors uses a process formalization by Petri Nets. Toachieve this, we have provided new transformation rules,to represent execution semantics for data in particular.Our new data-flow anti-patterns distinguish mandatoryand optional data as well as alternatives. By doing so,they cover any anomalies known from existing work. Inparticular, we are, to our knowledge, first to support thedata-flow perspective of BPMN 2.0, as well as alterna-

tives for data and mandatory and optional data.Considering more intricate dependencies of data ob-

jects in data-aware process models is subject of furtherapproaches. For instance, some deal with data depen-dencies like inclusion, referential dependencies [15], se-mantically defined constraints [6] or with informationleaks [1]. These approaches focus on dependencieswhich are not part of the BPMN data perspective andwould require to enhance its specification concepts. Inother words, the problem is different from the one stud-ied here.

6 CONCLUSIONS

In this paper, we have proposed a new method for de-tecting data-flows errors in BPMN 2.0 process modelsat design time. This approach takes alternatives for dataas well as optional data into account. An automatic de-tection scheme requires a formal representation of theexecution semantics of BPMN 2.0 flow elements withdata associations. To achieve this, we have developedtransformation rules and a set of anti-patterns represent-ing data-flow errors in BPMN 2.0 process models. Onthis basis, we transform data-dependent flow elements ofa process model into unfolded Petri Nets to detect data-flow errors by using an existing model checker. Experi-ments with users have shown that our tool identifies thedata-flow errors present.

ACKNOWLEDGEMENT

This research was supported in part by the European So-cial Fund and by the Ministry Of Science, Research andthe Arts Baden-Wurttemberg.

We acknowledge support by Deutsche Forschungsge-meinschaft and Open Access Publishing Fund of Karl-sruhe Institute of Technology.

REFERENCES

[1] R. Accorsi and A. Lehmann, “Automatic Informa-tion Flow Analysis of Business Process Models”,in Proc. BPM, 2012.

[2] A. Awad, G. Decker, and N. Lohmann, “Diagnos-ing and Repairing Data Anomalies in Process Mod-els”, in Proc. BPM Workshops, 2010.

[3] A. Awad, M. Weidlich, and M. Weske, “Visu-ally Specifying Compliance Rules and ExplainingTheir Violations for Business Processes”, VisualLanguages and Computing, vol. 22, no. 1, 2011.

16


[4] Business Process Academic Initiative, “TheBPM Academic Initiative Model Collection”. bp-mai.org/download, accessed 14th August 2014.

[5] Object Management Group, “Business ProcessModel and Notation, V2.0”, OMG Specification,2011.

[6] D. Borrego et al, “Diagnosing Correctness of Se-mantic Workflow Models”, Data & Knowledge En-gineering, vol. 87, Sept 2013.

[7] C. Cabanillas et al, “Automatic Generation of aData-Centered View of Business Processes”, inProc. CAiSE, 2011.

[8] E. Clarke, E. Emerson, and A. Sistla, “Auto-matic Verification of Finite-State Concurrent Sys-tems Using Temporal Logic Specifications”, inProc. TOPLAS, 1986.

[9] E. Clarke, O. Grumberg, and D. Peled, ”Modelchecking”, MIT Press, 1999.

[10] R. Dijkman, M. Dumas, and C. Ouyang, “Seman-tics and Analysis of Business Process Models inBPMN”, Information and Software Technology,vol. 50, no. 12, 2008.

[11] S. Hinz, K. Schmidt, and C. Stahl, “TransformingBPEL to Petri Nets”, in Proc. BPM, 2005.

[12] N. Lohmann, E. Verbeek, and R. Dijkman, “PetriNet Transformations for Business Processes – ASurvey”, in Transactions on Petri Nets and OtherModels of Concurrency II. 2009.

[13] Z. Manna and A. Pnueli, ”The Temporal Logic ofReactive and Concurrent Systems – Specification”,Springer, 1992.

[14] H. S. Meda, A. K. Sen, and A. Bagchi, “On De-tecting Data Flow Errors in Workflows”, Data andInformation Quality, vol. 2, no. 1, 2010.

[15] A. Meyer et al, “Modeling and Enacting Com-plex Data Dependencies in Business Processes”, inProc. BPM, 2013.

[16] OASIS, Web Services Business Process ExecutionLanguage Version 2.0, April 2007.

[17] A. Rogge-Solti et al, “Business Process Configu-ration Wizard and Consistency Checker for BPMN2.0”, in Proc. BPMDS, 2011.

[18] S. Sadiq et al., “Data Flow and Validation in Work-flow Modelling”, in Proc. Australian DatabaseConference, 2004.

[19] K. Schmidt, “LoLA a Low Level Analyser”, inProc. Appl. and Theory of Petri Nets, 2000.

[20] H. Storrle, “Semantics and Verification of DataFlow in UML 2.0 Activities”, Electronic NotesTheor. Comput. Sci., vol. 4, no. 127, 2005.

[21] S. X. Sun et al., “Formulating the Data-Flow Per-spective for Business Process Management”, Infor-mation Systems Research, vol. 17, no. 4, 2006.

[22] N. Trcka, W. v. d. Aalst, and N. Sidorova, “Data-Flow Anti-Patterns: Discovering Data-Flow Errorsin Workflows”, in Proc. CAISE, 2009.

[23] W.-J. van den Heuvel, A. Elgammal, andO. Turetken, “Using Patterns for the Analy-sis and Resolution of Compliance Violations”,Coop.Inf.Systems, vol. 21, no. 1, 2012.

[24] W. M. P. van der Aalst, “The Application of PetriNets to Workflow Management”, Circuits, Systemsand Computers, vol. 8, no. 1, 1998.

[25] S. von Stackelberg, S. Putze, J. Mulle, andK. Bohm, “Detecting Data-Flow Errors in BPMN2.0”, Technical Report 2014-06, Karlsruhe Instituteof Technology, Department of Informatics, no. 06,2014.

[26] T. Waheed, M. Iqbal, and Z. Malik, “Data FlowAnalysis of UML Action Semantics for ExecutableModels”, in Proc. Europ. Conf. Model Driven Ar-chitecture, 2008.

[27] I. Weber, J. Hoffmann, and J. Mendling, “Beyondsoundness: on the verification of semantic busi-ness process models”, Distributed and ParallelDatabases, vol. 27, no. 3, 2010.

17


A BASIC CONCEPTS FOR PROCESSANALYSIS

In this appendix, we review fundamental concepts fordata-flow analysis in processes, namely Petri Nets withits state space formalism and the temporal logic CTL.

Petri Nets and State Space Formalism.

Petri Nets are a representative of formal graph-based pro-cess languages. The definition of a Petri Net used hereis the one from [24]. A Petri Net is a directed bipartitegraph with two types of nodes called places and transi-tions.

Definition 4 (Petri Net): A Petri Net is a triple(P, T, F ) with P a set of places, T a set of transitions(P ∩ T = ∅) and F ⊆ (P × T ) ∪ (T × P ) a set of arcs(flow relation).

p ∈ P is an input place of t ∈ T if (p, t) ∈ F andan output place if (t, p) ∈ F . •t denotes the set of inputplaces of t and t• the set of output places. A mappingM : P → N0 maps every p ∈ P to a positive numberof tokens, i.e., at any time a place contains zero or moretokens. The distribution of tokens over places (M ) rep-resents a state of the Petri Net, often referred to as itsmarking. A transition t ∈ T is activated in a state Mif ∀p ∈ •t : M(p) ≥ 1. A transition t ∈ T in M canfire, leading to a new stateM ′ which reduces the value ofM(p) by 1 if p ∈ •t, adds 1 to M(p) if p ∈ t• and doesnot change otherwise. The set of reachable states froma start state M0 of a Petri Net builds the state space. Tocheck properties of a BPMN process, we need this statespace, for which we use the Kripke structure [22] of thePetri Net corresponding to the original BPMN model.

CTL.

Computation Tree Logic CTL is a temporal logic formal-ism often used to specify properties for model checking.In our case, those properties are data-flow anti-patterns.E.g., [8] describes CTL and an effective algorithm to ver-ify properties specified in CTL.

The formal syntax of CTL is as follows:

Definition 5 (Computation Tree Logic): Everyatomic proposition ap is a CTL formula. If φ1 and φ2are CTL formulas then ¬φ1, φ1 ∨ φ2, φ1 ∧ φ2, AXφ1,EXφ1, AGφ1, EGφ1, AFφ1, EFφ1, A[φ1 U φ2], E[φ1U φ2] are CTL formulas.

In our context, ap is a state of a Petri Net which repre-sents the status of a data element in the BPMN process.

The logical operators always occur in pairs: A pathoperator (A or E) together with a state operator (X,G, F

or U). A means that the formula needs to hold in all suc-ceeding execution paths. E means that at least one exe-cution path exists where the formula holds. X means thatthe formula holds in the next state, G means the formulaholds in all succeeding states, F means that the formulaholds at least in one succeeding state, [φ1 U φ2] meansthat φ1 holds until φ2 is reached.

18


AUTHOR BIOGRAPHIES

Dr. Silvia von Stackelbergis currently a postdoctoralresearcher at the groupInformation Systems of theInstitute for Program Structuresand Data Organization (IPD)at Karlsruhe Institute ofTechnology (KIT), Germany.Her research and teaching atKIT focuses on business processmanagement, in particulardata and privacy, and crowd

computing. She received her PhD in Computer Sciencefrom Technical University of Darmstadt and her diplomain Information Systems from University of Bamberg,Germany. She is a Margarete von Wrangell fellow.

M.Sc. Susanne Putze receivedher Diploma (M.Sc.) inInformatics from the Karls-ruhe Institute of Technology(KIT), Germany in 2013. Inher diploma thesis, she workedon data flow modeling andcorrectness in BPMN 2.0. Now,she is a research assistant in theinformation systems group ofKlemens Bohm at IPD, KIT.Current research topics are

crowd computing and process modeling.

M.Sc. Jutta Mulle receivedher Diploma (M.Sc.) inInformatics from the Universityof Karlsruhe, Germany. Sheis a senior research assistantin the information systemsgroup of Klemens Bohm atIPD, Karlsruhe Institute ofTechnology (KIT). Currentresearch topics are workflowmanagement, process analysis,data and privacy in workflows.

She has wide experience in project work also withindustry.

Dr. Klemens Bohm is fullprofessor (chair of databasesand information systems)at Karlsruhe Institute ofTechnology (KIT), Germany,since 2004. Prior to that, hehas been professor of appliedinformatics/data and knowledgeengineering at Universityof Magdeburg, Germany,senior research assistant atETH Zurich, Switzerland,

and research assistant at GMD – ForschungszentrumInformationstechnik GmbH, Darmstadt, Germany.Current research topics at his chair are knowledgediscovery and data mining, data privacy and workflowmanagement. Klemens gives much attention tocollaborations with other scientific disciplines and withindustry.

19

Documents

Detecting Data-Flow Errors in BPMN 2 - RonPub · PDF fileSilvia von Stackelberg, Susanne Putze, Jutta M¨ulle, Klemens B ¨ohm: Detecting Data-Flow Errors in BPMN 2.0 Correction done