Paper presentation given at the International Conference on Fundamental Approaches to Software Engineering (FASE) in March 2013. The paper can be found here.
Discovering Branching Conditions from Business Process Execution Logs
Massimiliano de Leoni, Marlon Dumas, Luciano García-Bañuelos
University of Tartu, Estonia (joint work with Eindhoven University of Technology)
Business Process Management
[Figure: example process model with the tasks Start, Get Ready, Travel by Car, Travel by Train, BETA PhD Day Starts, Give a Talk, Visit Brewery, Have Dinner, Go Home, Travel by Train, Travel by Car, Pay for Parking, End. Further labels: Implementation, Execution, Event Log.]
Business Process Mining
[Figure: process model with the tasks Start, Register order, Prepare shipment, Ship goods, (Re)send bill, Receive payment, Contact customer, Archive order, End. Artifacts linked to the event log: Performance Analysis, Process Model, Organizational Model, Social Network.]
ProM: process mining workbench
Slide by Ana Karla Alves de Medeiros (TU/e)
Data perspective?
[Figure: process model with its branching points highlighted, annotated with the variables salary, age, installment, amount, length.]
ProM’s Decision Miner
[Figure: process model annotated with the variables salary, age, installment, amount, length.]

Event log:

CID    Task  Data                 Time Stamp             …
13219  ELA   Amount=8500 Len=1    2007-11-09 T 11:20:10  -
13219  RAP   Salary=2000 Age=25   2007-11-09 T 11:22:15  -
13220  ELA   Amount=25000 Len=1   2007-11-09 T 11:22:40  -
13219  CI    Installm=750         2007-11-09 T 11:22:45  -
13219  NE                         2007-11-09 T 11:23:00  -
13219  ASA                        2007-11-09 T 11:24:30  -
13220  CI    Installm=1200        2007-11-09 T 11:24:35  -
…      …     …                    …                      …

Table derived from the event log:

CID    Amount  Len  Salary  Age   Installm  Task
13219  8500    1    NULL    NULL  NULL      ELA
13219  8500    1    2000    25    NULL      RAP
13219  8500    1    2000    25    750       RAP
13219  8500    1    2000    25    750       NE
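The step from the event log to the per-event table can be sketched as follows. This is a minimal illustration, assuming that each event only carries the attributes it writes and that earlier values of the same case are carried forward; the function name build_observations and the tuple layout of an event are hypothetical and not taken from ProM.

```python
from typing import Dict, List, Optional, Tuple

# (case id, task, attributes written by the event), in log order.
# This layout is an assumption made for the example.
Event = Tuple[str, str, Dict[str, int]]

ATTRIBUTES = ["Amount", "Len", "Salary", "Age", "Installm"]


def build_observations(log: List[Event]) -> List[Dict[str, Optional[object]]]:
    """Return one row per event holding the case's attribute values known so far."""
    state: Dict[str, Dict[str, Optional[int]]] = {}
    rows = []
    for case_id, task, data in log:
        # None plays the role of NULL for attributes not written yet.
        values = state.setdefault(case_id, {a: None for a in ATTRIBUTES})
        values.update(data)  # values written by this event replace older ones
        rows.append({"CID": case_id, **values, "Task": task})
    return rows


log = [
    ("13219", "ELA", {"Amount": 8500, "Len": 1}),
    ("13219", "RAP", {"Salary": 2000, "Age": 25}),
    ("13220", "ELA", {"Amount": 25000, "Len": 1}),
    ("13219", "CI", {"Installm": 750}),
    ("13219", "NE", {}),
]

for row in build_observations(log):
    print(row)
```

Each printed row pairs the task that was executed with the data visible at that moment, which is the shape a classifier needs as training input.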
ProM’s decision miner / 2

CID    Amount  Installm  Salary  Age  Len  Task
13219  8500    750       2000    25   1    ASA
13220  12500   1200      3500    35   4    ACA
13221  9000    450       2500    27   2    ASA
…      …       …         …       …    …    …

Decision tree learning

[Figure: decision tree. The root tests amount: < 10000 leads to Approve Simple Application (ASA); ≥ 10000 leads to a test on age, where < 35 leads to Approve Simple Application (ASA) and ≥ 35 leads to Approve Complex Application (ACA).]

Resulting branching conditions:
ASA: (amount < 10000) ∨ (amount ≥ 10000 ∧ age < 35)
ACA: amount ≥ 10000 ∧ age ≥ 35
Decision miner: Not a panacea!
• Decision tree learning cannot discover expressions of the form “v op v”
Example: installment > salary
The decision miner would return:
installment ≤ 1760 ∧ salary ≤ 1750 ∨
installment ≤ 1810 ∧ salary ≤ 1800 ∨
installment ≤ 1875 ∧ salary ≤ 1850 ∨
installment ≤ 1960 ∧ salary ≤ 1950 ∨
installment ≤ 1975 ∧ salary ≤ 1970 ∨
installment ≤ 2000 ∧ salary ≤ 1990 ∨ …
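The limitation can be seen on a small synthetic example. The sketch below uses made-up data (a grid of installment and salary values, not taken from the talk) and compares the single atom installment > salary with the best single test of a variable against a constant, which is the kind of test one decision tree node performs. It only looks at one split, so it illustrates the problem rather than reproducing the decision miner.

```python
from itertools import product

# Synthetic cases (assumed data): every combination of a few values.
values = range(1000, 2001, 250)
cases = [(inst, sal) for inst, sal in product(values, values)]
labels = [inst > sal for inst, sal in cases]  # true outcome of the branch


def accuracy(predicate) -> float:
    """Fraction of cases on which the predicate agrees with the true outcome."""
    hits = sum(predicate(i, s) == y for (i, s), y in zip(cases, labels))
    return hits / len(cases)


def best_threshold_accuracy() -> float:
    """Best accuracy of any single 'v op c' test on installment or salary."""
    best = 0.0
    for c in values:
        for pred in (lambda i, s, c=c: i > c, lambda i, s, c=c: s <= c):
            acc = accuracy(pred)
            best = max(best, acc, 1 - acc)  # 1 - acc covers the negated test
    return best


print(accuracy(lambda i, s: i > s))  # the "v op v" atom
print(best_threshold_accuracy())     # best single "v op c" atom
```

A tree can stack more such tests to approximate the diagonal boundary, but the result is a long staircase of thresholds like the expression above rather than one compact atom.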
Problem statement
• Discovery of branching conditions composed of atoms of the form “v op c” and “v op v”, including linear equations or inequalities involving multiple variables
• Our solution combines
  • Tools for dynamic analysis of software (i.e., likely invariant discovery)
  • Theory of decision tree learning
Daikon
• Tool for discovering likely invariants from execution logs
• Given a set of program points, Daikon:
  • Instantiates a set of invariant templates (over certain combinations of variables)
  • Traverses the execution log
    • Falsifying some invariants
    • Gathering the statistical support for the remaining templates
  • Discards some invariants based on:
    • Subsumption
    • Statistical support
Daikon strongly relies on code instrumentation/analysis
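The instantiate, falsify and discard cycle listed above can be sketched in a few lines. This is not Daikon's implementation or API: the template set, the function likely_invariants, and the use of a plain minimum sample count as a stand-in for statistical support are all assumptions made for illustration.

```python
from itertools import combinations
import operator

# Hypothetical invariant templates over pairs of variables: (symbol, test).
TEMPLATES = [("<", operator.lt), ("<=", operator.le), ("==", operator.eq),
             (">=", operator.ge), (">", operator.gt)]


def likely_invariants(observations, min_support=3):
    """Return the templates 'x op y' that hold on every observation."""
    if len(observations) < min_support:
        return []  # too few samples to report anything (crude support check)
    variables = sorted(observations[0])
    # 1. Instantiate every template over every pair of variables.
    candidates = [(x, sym, y, test)
                  for x, y in combinations(variables, 2)
                  for sym, test in TEMPLATES]
    # 2. Traverse the observations and drop every falsified candidate.
    for obs in observations:
        candidates = [c for c in candidates if c[3](obs[c[0]], obs[c[2]])]
    # 3. Discard subsumed invariants: 'x < y' or 'x == y' implies 'x <= y'.
    found = {(x, sym, y) for x, sym, y, _ in candidates}
    stronger = {"<=": ("<", "=="), ">=": (">", "==")}
    kept = [(x, sym, y) for x, sym, y in sorted(found)
            if not any((x, s, y) in found for s in stronger.get(sym, ()))]
    return [f"{x} {sym} {y}" for x, sym, y in kept]


observations = [
    {"installment": 1200, "salary": 3500, "length": 2, "age": 35},
    {"installment": 450, "salary": 2500, "length": 2, "age": 27},
    {"installment": 750, "salary": 2000, "length": 1, "age": 25},
]
print(likely_invariants(observations))
```

The sketch compares every pair of variables, so it also reports relations with no business meaning (for example between age and installment); a real tool restricts which variables are comparable.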
Daikon: Tool for mining likely invariants

CID    Amount  Installm  Salary  Age  Len  Task
13210  20000   2000      2000    25   1    NR
13220  25000   1200      3500    35   2    NE
13221  9000    450       2500    27   2    NE
13219  8500    750       2000    25   1    ASA
13220  25000   1200      3500    35   2    ACA
13221  9000    450       2500    27   2    ASA
…      …       …         …       …    …    …

Invariant sets returned by Daikon (one set per line):
• installment > salary, amount ≥ 5000, length < age, …
• installment ≤ salary, amount ≥ 5000, length < age, …
• installment ≤ salary, amount ≤ 9500, length < age, …
• installment ≤ salary, amount ≥ 10000, length < age, …
BranchMiner (Conjunctive)
• Information Gain (IG) quantifies the discriminating power of a predicate (with respect to two different outcomes)
• Approach:
  • Use Daikon for discovering invariants
  • Combine invariants in a conjunction so as to maximize the overall IG

a1: installment > salary    IG(a1) = 0.8
a2: amount ≥ 5000           IG(a2) = 0.2
a3: length < age            IG(a3) = 0
…
IG(a1 ∧ a2) = 0.8
…
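A minimal sketch of the idea follows, assuming the standard entropy-based definition of information gain and a greedy strategy that adds one atom at a time. The greedy search, the function names and the four sample cases are assumptions made for illustration; the slides do not state how the maximizing conjunction is searched for.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of boolean outcomes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(predicate, cases, labels):
    """Entropy reduction obtained by splitting the cases on the predicate."""
    yes = [y for c, y in zip(cases, labels) if predicate(c)]
    no = [y for c, y in zip(cases, labels) if not predicate(c)]
    n = len(labels)
    remainder = len(yes) / n * entropy(yes) + len(no) / n * entropy(no)
    return entropy(labels) - remainder


def greedy_conjunction(atoms, cases, labels):
    """Add atoms while the IG of the conjunction strictly improves."""
    chosen, best = [], 0.0
    remaining = dict(atoms)

    def conj_with(name):
        preds = [atoms[n] for n in chosen + [name]]
        return lambda c: all(p(c) for p in preds)

    while remaining:
        name = max(remaining,
                   key=lambda n: information_gain(conj_with(n), cases, labels))
        gain = information_gain(conj_with(name), cases, labels)
        if gain <= best:
            break  # no remaining atom makes the conjunction more discriminating
        chosen.append(name)
        best = gain
        del remaining[name]
    return chosen, best


# Made-up cases; the label says whether the branch of interest was taken.
cases = [
    {"installment": 2500, "salary": 2000, "amount": 20000},
    {"installment": 1200, "salary": 3500, "amount": 25000},
    {"installment": 450, "salary": 2500, "amount": 9000},
    {"installment": 750, "salary": 2000, "amount": 8500},
]
labels = [True, False, False, False]
atoms = {
    "installment > salary": lambda c: c["installment"] > c["salary"],
    "amount >= 5000": lambda c: c["amount"] >= 5000,
}
print(greedy_conjunction(atoms, cases, labels))
```

On this data the second atom holds for every case, so adding it leaves the gain unchanged and the search stops with the single atom installment > salary.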
Disjunctions?
• Only as the negation of a conjunctive expression, by De Morgan's laws:
  P ∧ Q
  ¬(P ∧ Q) ≡ ¬P ∨ ¬Q
BranchMiner (Disjunctive)
[Figure: the event log is split into Partition 1, Partition 2, …, Partition n; the conjunctive BranchMiner is run on each partition, yielding CONJ1, CONJ2, …, CONJn.]
BranchMiner (Disjunctive)
[Figure: a decision tree (leaves: Notify Rejection, Notify Eligibility, Notify Rejection) splits the event log into Partition 1, Partition 2, …; the conjunctive BranchMiner is run on each partition, yielding CONJ1, CONJ2, …]

IG(CONJ1) = 0.4
IG(CONJ2) = 0.45
IG(CONJ3) = 0.5
…
IG(CONJ1 ∨ CONJ2) = 0.78
IG(CONJ1 ∨ CONJ3) = 0.6
…
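The final combination step can be sketched as follows: given the conjunctions mined from the partitions, score each pairwise disjunction by its information gain and keep the best one. Restricting the search to pairs, the function best_disjunction, and the sample cases are assumptions made for this example; partitioning and per-partition mining are taken as already done.

```python
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of boolean outcomes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(predicate, cases, labels):
    """Entropy reduction obtained by splitting the cases on the predicate."""
    yes = [y for c, y in zip(cases, labels) if predicate(c)]
    no = [y for c, y in zip(cases, labels) if not predicate(c)]
    n = len(labels)
    return entropy(labels) - (len(yes) / n * entropy(yes)
                              + len(no) / n * entropy(no))


def best_disjunction(conjunctions, cases, labels):
    """Return (gain, names) of the pair whose disjunction has the highest IG."""
    scored = []
    for (n1, p1), (n2, p2) in combinations(conjunctions.items(), 2):
        either = lambda c, p1=p1, p2=p2: p1(c) or p2(c)
        scored.append((information_gain(either, cases, labels), (n1, n2)))
    return max(scored)


# Made-up cases as (amount, age); the label is True when ASA was executed.
cases = [(8500, 25), (9000, 40), (12500, 35), (20000, 30), (25000, 50), (15000, 28)]
labels = [True, True, False, True, False, True]
conjunctions = {
    "CONJ1": lambda c: c[0] < 10000,
    "CONJ2": lambda c: c[0] >= 10000 and c[1] < 35,
    "CONJ3": lambda c: c[1] < 30,
}
gain, names = best_disjunction(conjunctions, cases, labels)
print(names, round(gain, 3))
```

Here CONJ1 ∨ CONJ2 separates the two outcomes completely, so it scores higher than either conjunction combined with CONJ3.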
Linear and polynomial expressions
• Approach
  • Select all numerical variables and generate some derived (a.k.a. latent) variables using an arithmetic operator, e.g., salary_div_installment, meaning “salary/installment”
  • Augment the event log with the values for latent variables
  • Run the discovery method for conjunctive/disjunctive conditions
Original table:

CID    Amount  Installm  Salary  Age  Len  Task
13210  20000   2000      2000    25   1    NR
13220  25000   1200      3500    35   2    NE
13221  9000    450       2500    27   2    NE
13219  8500    750       2000    25   1    ASA
13220  25000   1200      3500    35   2    ACA
13221  9000    450       2500    27   2    ASA
…      …       …         …       …    …    …

Table augmented with latent variables:

CID    Amount  Installm  Salary  Sal/Inst  Age  Len  Age+Len  Task
13210  20000   2000      2000    1.00      25   1    26       NR
13220  25000   1200      3500    2.92      35   2    37       NE
13221  9000    450       2500    5.56      27   2    29       NE
13219  8500    750       2000    2.67      25   1    26       ASA
13220  25000   1200      3500    2.92      35   2    37       ACA
13221  9000    450       2500    5.56      27   2    29       ASA
…      …       …         …       …         …    …    …        …
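The augmentation step can be sketched as below. The operator set (only addition and division), the naming pattern x_op_y, and the function add_latent_variables are assumptions chosen to mirror the salary_div_installment example; the sketch derives one variable per unordered pair, so only one direction of each ratio is produced.

```python
from itertools import combinations
import operator

# Assumed subset of arithmetic operators used to derive latent variables.
OPERATORS = {"plus": operator.add, "div": operator.truediv}


def add_latent_variables(row, numeric_vars):
    """Return a copy of row extended with one derived value per pair and operator."""
    out = dict(row)
    for x, y in combinations(numeric_vars, 2):
        for name, op in OPERATORS.items():
            try:
                out[f"{x}_{name}_{y}"] = round(op(row[x], row[y]), 2)
            except ZeroDivisionError:
                out[f"{x}_{name}_{y}"] = None  # ratio undefined for this case
    return out


row = {"CID": 13219, "Installm": 750, "Salary": 2000, "Age": 25, "Len": 1,
       "Task": "ASA"}
augmented = add_latent_variables(row, ["Salary", "Installm", "Age", "Len"])
print(augmented["Salary_div_Installm"], augmented["Age_plus_Len"])
```

Once the rows carry these extra columns, a condition such as salary/installment ≥ c becomes an ordinary “v op c” atom over a latent variable, so the conjunctive and disjunctive discovery can run unchanged.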
Assessment
Conclusions
• We developed a technique for discovering branching conditions from event logs
  • Complex expressions (e.g., “v op v”, linear inequalities, etc.)
  • More compact than those mined with conventional decision trees
• Integration into ProM
  • Implemented as a command line tool
• Validation with real-life logs
  • Assessed with synthetically generated event logs
• Areas for extensions
  • Coping with noise in the event logs
  • Handling of null values
  • Extending the coverage to more complex types of expressions