Keynote at the Brazilian Workshop on Business Process Management (WBPM) and Brazilian Symposium on Information Systems (SBSI), 23 May 2013
Beyond Process Mining: Discovering Business Rules from Event Logs
Marlon Dumas
University of Tartu, Estonia
With contributions from Luciano García-Bañuelos, Fabrizio Maggi & Massimiliano de Leoni
Brazilian BPM Workshop (WBPM’2013)
Business Process Mining
[Figure: an event log is turned into a process model (Start, Register order, Prepare shipment, Ship goods, (Re)send bill, Receive payment, Contact customer, Archive order, End), a performance analysis, an organizational model and a social network]
Slide by Ana Karla Alves de Medeiros
Process mining tool (ProM, Disco, IBM BPI)
Automated Process Discovery

CID    Task                        Time Stamp           …
13219  Enter Loan Application      2007-11-09T11:20:10  -
13219  Retrieve Applicant Data     2007-11-09T11:22:15  -
13220  Enter Loan Application      2007-11-09T11:22:40  -
13219  Compute Installments        2007-11-09T11:22:45  -
13219  Notify Eligibility          2007-11-09T11:23:00  -
13219  Approve Simple Application  2007-11-09T11:24:30  -
13220  Compute Installments        2007-11-09T11:24:35  -
…      …                           …                    …
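As a rough illustration of the first step most discovery techniques share, the sketch below groups the events of the excerpt by case ID and counts which task directly follows which. The function names `build_traces` and `directly_follows` are hypothetical, and the timestamps are written without spaces so that they sort as plain strings; this is a minimal sketch, not how ProM or any other tool implements discovery.

```python
from collections import Counter, defaultdict

# Rows taken from the event log excerpt: (case id, task, timestamp).
events = [
    ("13219", "Enter Loan Application", "2007-11-09T11:20:10"),
    ("13219", "Retrieve Applicant Data", "2007-11-09T11:22:15"),
    ("13220", "Enter Loan Application", "2007-11-09T11:22:40"),
    ("13219", "Compute Installments", "2007-11-09T11:22:45"),
    ("13219", "Notify Eligibility", "2007-11-09T11:23:00"),
    ("13219", "Approve Simple Application", "2007-11-09T11:24:30"),
    ("13220", "Compute Installments", "2007-11-09T11:24:35"),
]

def build_traces(rows):
    """Group events by case id and order each group by timestamp."""
    by_case = defaultdict(list)
    for case_id, task, stamp in rows:
        by_case[case_id].append((stamp, task))
    return {cid: [task for _, task in sorted(evts)] for cid, evts in by_case.items()}

def directly_follows(traces):
    """Count how often one task is immediately followed by another within a case."""
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts

traces = build_traces(events)
dfg = directly_follows(traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The directly-follows counts are only the raw material; turning them into a readable model is where the complexity problem discussed next arises.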
The Problem of Process Mining
Dealing with Complexity
• Question: How to cope with complexity in (information) system specifications?
• Aggregate-Decompose (“part-of”)
• Generalize-Specialize (“is a”)
• Special cases
• Summarize by aggregating and ignoring “uninteresting” parts
• Summarize by specializing and ignoring “uninteresting” specialized classes
Approach 1: Aggregation
Bose, Verbeek & van der Aalst: Discovering Hierarchical Process Models using ProM
ProM’s Fuzzy Miner
Remove Infrequent Behavior & Aggregate
Approach 2: Trace Clustering
G. Greco et al., Discovering Expressive Process Models by Clustering Log Traces, TKDE, 2006
Trace clustering in a nutshell
Slide by Dirk Fahland
Bottom-Line
Do we want models
or do we want insights?
www.interactiveinsightsgroup.com
Discovering Business Rules
Mining Decision Rules
What’s missing?
[Figure: process model with its decision points highlighted, together with the data attributes salary, age, installment, amount and length]
ProM’s Decision Miner
[Figure: process model annotated with the data attributes salary, age, installment, amount and length]

Event Log

CID    Task  Data                   Time Stamp           …
13219  ELA   Amount=8500 Len=1      2007-11-09T11:20:10  -
13219  RAP   Salary=2000 Age=25     2007-11-09T11:22:15  -
13220  ELA   Amount=25000 Len=1     2007-11-09T11:22:40  -
13219  CI    Installm=750           2007-11-09T11:22:45  -
13219  NE                           2007-11-09T11:23:00  -
13219  ASA                          2007-11-09T11:24:30  -
13220  CI    Installm=1200          2007-11-09T11:24:35  -
…      …     …                      …                    …

Observations derived from the event log (one row per event, NULL = value not yet known)

CID    Amount  Len  Salary  Age   Installm  Task
13219  8500    1    NULL    NULL  NULL      ELA
13219  8500    1    2000    25    NULL      RAP
13219  8500    1    2000    25    750       CI
13219  8500    1    2000    25    750       NE
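The step from the event log to the observation table can be sketched as a carry-forward of attribute values: each event updates the values known for its case, and one row is emitted per event. The function name `observation_rows` is hypothetical, `None` stands in for NULL, and this is an illustration of the idea rather than the Decision Miner's actual code.

```python
ATTRS = ["Amount", "Len", "Salary", "Age", "Installm"]

# (case id, task, data written by the task), in log order, from the excerpt.
log = [
    ("13219", "ELA", {"Amount": 8500, "Len": 1}),
    ("13219", "RAP", {"Salary": 2000, "Age": 25}),
    ("13220", "ELA", {"Amount": 25000, "Len": 1}),
    ("13219", "CI", {"Installm": 750}),
    ("13219", "NE", {}),
    ("13219", "ASA", {}),
    ("13220", "CI", {"Installm": 1200}),
]

def observation_rows(events):
    """Emit one row per event holding the latest known value of every attribute."""
    state = {}  # case id -> attribute values seen so far for that case
    rows = []
    for case_id, task, data in events:
        current = state.setdefault(case_id, dict.fromkeys(ATTRS))
        current.update(data)
        rows.append((case_id, *[current[a] for a in ATTRS], task))
    return rows

rows = observation_rows(log)
for row in rows:
    print(row)
```

Rows whose task is one of the branches of a decision point (for example ASA or ACA) then serve as training instances, with the task as the class label.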
ProM’s Decision Miner / 2

CID    Amount  Installm  Salary  Age  Len  Task
13219  8500    750       2000    25   1    ASA
13220  12500   1200      3500    35   4    ACA
13221  9000    450       2500    27   2    ASA
…      …       …         …       …    …    …

Decision tree learning

amount
  < 10000: Approve Simple Application (ASA)
  ≥ 10000: age
    < 35: Approve Simple Application (ASA)
    ≥ 35: Approve Complex Application (ACA)

Resulting branching conditions
ASA: (amount < 10000) ∨ (amount ≥ 10000 ∧ age < 35)
ACA: amount ≥ 10000 ∧ age ≥ 35
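How a learned tree turns into branching conditions can be sketched as follows. The code does not learn anything: it takes the tree from the slide as given (encoded as nested tuples, a representation chosen only for this example), collects every root-to-leaf path, and joins the paths that end in the same task with a disjunction. ASCII operators (`>=`, `and`, `or`) replace ≥, ∧ and ∨.

```python
# Inner node: (attribute, threshold, subtree if value < threshold, subtree otherwise).
# Leaf: the task label as a string.
tree = ("amount", 10000, "ASA", ("age", 35, "ASA", "ACA"))

def paths(node, conds=()):
    """Yield (task, list of atomic conditions) for every root-to-leaf path."""
    if isinstance(node, str):
        yield node, list(conds)
        return
    attr, cut, low, high = node
    yield from paths(low, conds + (f"{attr} < {cut}",))
    yield from paths(high, conds + (f"{attr} >= {cut}",))

def branching_conditions(root):
    """Join the paths ending in the same task into a disjunction of conjunctions."""
    by_task = {}
    for task, conds in paths(root):
        by_task.setdefault(task, []).append("(" + " and ".join(conds) + ")")
    return {task: " or ".join(terms) for task, terms in by_task.items()}

def classify(node, case):
    """Route one case (a dict of attribute values) down the tree to a task."""
    while not isinstance(node, str):
        attr, cut, low, high = node
        node = low if case[attr] < cut else high
    return node

rules = branching_conditions(tree)
for task, rule in rules.items():
    print(task, ":", rule)
```

Applied to the three rows of the table, `classify` returns ASA, ACA and ASA respectively, which matches the Task column.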
ProM’s Decision Miner – Limitations
• Decision tree learning cannot discover expressions of the form “v op v”, e.g. installment > salary
• The decision miner would return:
  (installment ≤ 1760 ∧ salary ≤ 1750) ∨
  (installment ≤ 1810 ∧ salary ≤ 1800) ∨
  (installment ≤ 1875 ∧ salary ≤ 1850) ∨
  (installment ≤ 1960 ∧ salary ≤ 1950) ∨
  (installment ≤ 1975 ∧ salary ≤ 1970) ∨
  (installment ≤ 2000 ∧ salary ≤ 1990) ∨ …
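The limitation can be made concrete with a small experiment on made-up data. Each split of a decision tree compares one variable with a constant, so the sketch below measures how well the best single split of that kind separates cases labelled by installment > salary, and compares it with a split on the difference installment − salary. The six cases and the helper `best_threshold_accuracy` are assumptions of this example only.

```python
# Hypothetical cases (installment, salary); the label is True when installment > salary.
cases = [(1800, 1750), (1700, 1750), (2000, 1990),
         (1900, 1990), (1500, 1400), (1300, 1400)]
labels = [inst > sal for inst, sal in cases]

def best_threshold_accuracy(values, labels):
    """Best accuracy of any rule 'value > c' or 'value <= c', c taken from the observed values."""
    best = 0.0
    for c in sorted(set(values)):
        hits = sum((v > c) == y for v, y in zip(values, labels))
        accuracy = hits / len(values)
        best = max(best, accuracy, 1 - accuracy)  # 1 - accuracy covers the 'value <= c' rule
    return best

acc_installment = best_threshold_accuracy([i for i, _ in cases], labels)
acc_salary = best_threshold_accuracy([s for _, s in cases], labels)
acc_difference = best_threshold_accuracy([i - s for i, s in cases], labels)

print("installment op c:", round(acc_installment, 2))
print("salary op c:", round(acc_salary, 2))
print("(installment - salary) op c:", round(acc_difference, 2))
```

A deeper tree can still fit such data by stacking many constant comparisons, which is exactly what produces the long disjunction above; a single “v op v” atom expresses the same boundary directly.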
Generalized Decision Rule Mining in Business Processes
• Discovery of decision rules composed of atoms of the form “v op c” and “v op v”, including linear equations or inequalities involving multiple variables
• Approach:
  – Likely invariant discovery (Daikon)
  – Decision tree learning
De Leoni et al. FASE’2013
Daikon: Mining Likely Invariants

CID    Amount  Installm  Salary  Age  Len  Task
13210  20000   2000      2000    25   1    NR
13220  25000   1200      3500    35   2    NE
13221  9000    450       2500    27   2    NE
13219  8500    750       2000    25   1    ASA
13220  25000   1200      3500    35   2    ACA
13221  9000    450       2500    27   2    ASA
…      …       …         …       …    …    …

[Figure: Daikon takes these observations as input and returns sets of likely invariants]
• installment > salary, amount ≥ 5000, length < age, …
• installment ≤ salary, amount ≥ 5000, length < age, …
• installment ≤ salary, amount ≤ 9500, length < age, …
• installment ≤ salary, amount ≥ 10000, length < age, …
Conjunctive Decision Rule Mining
• Information Gain (IG) quantifies the discriminating power of a predicate (with respect to two different outcomes)
• Approach:
  – Use Daikon for discovering invariants
  – Combine invariants in a conjunction so as to maximize the overall IG

a1: installment > salary    IG(a1) = 0.8
a2: amount ≥ 5000           IG(a2) = 0.2
a3: length < age            IG(a3) = 0
…
IG(a1∧a2) = 0.8
…
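The sketch below computes information gain for individual atoms and builds a conjunction greedily, keeping an atom only if it strictly raises the IG of the conjunction. The four observation rows are made up, and the greedy search is an assumption of this example: the slide only states that the conjunction should maximize the overall IG, not how the search is carried out.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n) for v in set(labels))

def information_gain(rows, outcome, predicate):
    """Entropy of the outcome minus its weighted entropy after splitting on the predicate."""
    labels = [outcome(r) for r in rows]
    yes = [outcome(r) for r in rows if predicate(r)]
    no = [outcome(r) for r in rows if not predicate(r)]
    n = len(rows)
    return entropy(labels) - (len(yes) / n) * entropy(yes) - (len(no) / n) * entropy(no)

def greedy_conjunction(rows, outcome, atoms):
    """Repeatedly add the atom that most increases the IG of the conjunction."""
    chosen, best = [], 0.0
    while True:
        candidates = []
        for name in atoms:
            if name not in chosen:
                trial = chosen + [name]
                gain = information_gain(
                    rows, outcome, lambda r: all(atoms[a](r) for a in trial))
                candidates.append((gain, trial))
        if not candidates:
            return chosen, best
        gain, trial = max(candidates, key=lambda c: c[0])
        if gain <= best + 1e-12:  # no strict improvement: stop
            return chosen, best
        chosen, best = trial, gain

# Hypothetical observations for the branches NR (reject) and NE (eligible).
rows = [
    {"installment": 2100, "salary": 2000, "amount": 20000, "task": "NR"},
    {"installment": 1900, "salary": 1800, "amount": 6000, "task": "NR"},
    {"installment": 1200, "salary": 3500, "amount": 25000, "task": "NE"},
    {"installment": 450, "salary": 2500, "amount": 4000, "task": "NE"},
]
outcome = lambda r: r["task"]
atoms = {
    "a1": lambda r: r["installment"] > r["salary"],
    "a2": lambda r: r["amount"] >= 5000,
}

chosen, best = greedy_conjunction(rows, outcome, atoms)
print(chosen, round(best, 3))
```

On this toy data a1 alone already separates the two outcomes, so adding a2 brings no gain and is left out, mirroring IG(a1∧a2) = IG(a1) in the slide's example.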
Disjunctive Decision Rule Mining
[Figure: the event log is split into partitions 1 … n; a conjunctive miner is run on each partition, yielding CONJ1, CONJ2, …, CONJn]

Disjunctive Decision Rule Mining (continued)
[Figure: a decision tree over the branches Notify Rejection and Notify Eligibility splits the event log into partitions; a conjunctive branch miner is run on each partition, yielding CONJ1, CONJ2, …]
IG(CONJ1) = 0.4
IG(CONJ2) = 0.45
IG(CONJ3) = 0.5
…
IG(CONJ1∨CONJ2) = 0.78
IG(CONJ1∨CONJ3) = 0.6
…
Mining Descriptive Temporal Rules
Problem Statement
• Given a log, discover a set of temporal rules (LTL) that describe the underlying process, e.g.
  – In a lab analysis process, every leukocyte count is eventually followed by a platelet count
    • ☐(leukocyte_count → ◇platelet_count)
  – Patients who undergo surgery X do not undergo surgery Y later
    • ☐(X → ☐ not Y)
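Both example rules can be checked on a single finished trace with a few lines of code. The sketch assumes finite-trace semantics in which "eventually" and "later" mean strictly after the triggering event; the function names and the sample traces are hypothetical.

```python
def response(trace, a, b):
    """G(a -> F b): every occurrence of a is followed later in the trace by some b."""
    return all(b in trace[i + 1:] for i, e in enumerate(trace) if e == a)

def not_succession(trace, x, y):
    """G(x -> G not y): once x has occurred, y never occurs afterwards."""
    return all(y not in trace[i + 1:] for i, e in enumerate(trace) if e == x)

def support(log, rule, *args):
    """Fraction of traces in the log that satisfy the rule."""
    return sum(rule(trace, *args) for trace in log) / len(log)

# Hypothetical lab traces.
lab_log = [
    ["leukocyte_count", "platelet_count"],
    ["leukocyte_count", "glucose", "platelet_count"],
    ["platelet_count", "leukocyte_count"],  # violates the response rule
    ["glucose"],                            # never activates the rule
]

lab_support = support(lab_log, response, "leukocyte_count", "platelet_count")
print(lab_support)
```

A trace in which the triggering event never occurs satisfies the rule trivially, which is one reason why many discovered rules turn out to be uninteresting.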
DeclareMiner(Maggi et al.)
Oh no! Not again!
What went wrong?
• Not all rules are interesting
• What is “interesting”?
  – Not necessarily what is frequent (expected)
  – But what deviates from the expected
• Example:
  – Every patient who is diagnosed with condition X undergoes surgery Y
    • But not if they have previously been diagnosed with condition Z
Interesting Rules
Discovering Refined Temporal Rules
• Discover temporal rules that are frequently “activated” but not always “fulfilled”, e.g.
  – When A occurs, eventually B occurs in 90% of cases
    • ☐(A → ◇B) has 90% fulfillment ratio
  – Discover a rule that describes the remaining 10% of cases, e.g. using data attributes
    • ☐(A [age < 70] → ◇B) has 100% fulfillment ratio
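The fulfillment ratio can be sketched as the number of fulfilled activations divided by the number of activations, optionally restricted by a data condition. For simplicity the example attaches attributes to the whole case rather than to individual events, and the ten cases are made up so that the numbers match the 90% and 100% of the example; `fulfillment_ratio` is a hypothetical helper.

```python
def fulfillment_ratio(log, a, b, condition=lambda attrs: True):
    """Share of activations of a (in cases satisfying the condition) later followed by b."""
    activations = fulfilled = 0
    for attrs, trace in log:
        if not condition(attrs):
            continue  # the rule is not activated for cases outside the condition
        for i, event in enumerate(trace):
            if event == a:
                activations += 1
                fulfilled += b in trace[i + 1:]
    return fulfilled / activations if activations else None

# Hypothetical log: nine cases with age below 70 where A is followed by B,
# and one case with age 75 where it is not.
log = [({"age": 30 + 4 * k}, ["A", "B"]) for k in range(9)]
log.append(({"age": 75}, ["A", "C"]))

overall = fulfillment_ratio(log, "A", "B")
refined = fulfillment_ratio(log, "A", "B", lambda attrs: attrs["age"] < 70)
print(overall, refined)
```

Finding the condition itself (here age < 70) is the actual mining problem; the sketch only evaluates a condition that is already given.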
Now it’s better
Bose et al. BPM’2013
And better (with data)
Maggi et al. BPM’2013
Discriminative Rules Mining
Problem Statement
• Given a log partitioned into classes
  – e.g. good vs bad cases, on-time vs late cases
• Discover a set of temporal rules that distinguish one class from the other, e.g.
  – Claims for house damage that end up in a complaint are often those for which two or more data entry errors are made by the customer when filing the claim
Mining Anomalous Software Development Issues (Sun et al. 2013)
• Extract features from traces based on which events occur in the trace
• Apply a contrast itemset mining technique to find feature sets that are frequent in one class and not in the other
• Decision tree to construct readable rules
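The first two steps can be illustrated with a brute-force sketch: each trace is reduced to the set of event types it contains, and every small itemset is scored by the difference between its support in the two classes. The issue traces, the thresholds `max_size` and `min_diff`, and the exhaustive enumeration are all assumptions of this example; it is not the mining technique used by Sun et al., and the final decision tree step is not shown.

```python
from itertools import combinations

def features(trace):
    """The set of event types occurring in the trace (order and counts are ignored)."""
    return frozenset(trace)

def support(itemset, traces):
    """Fraction of traces whose feature set contains every item of the itemset."""
    return sum(itemset <= features(t) for t in traces) / len(traces)

def contrast_itemsets(pos, neg, max_size=2, min_diff=0.5):
    """Itemsets whose support in pos exceeds their support in neg by at least min_diff."""
    items = sorted(set().union(*map(features, pos + neg)))
    found = []
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            diff = support(itemset, pos) - support(itemset, neg)
            if diff >= min_diff:
                found.append((combo, diff))
    return sorted(found, key=lambda f: -f[1])

# Hypothetical issue-tracker traces, split into anomalous and normal issues.
anomalous = [
    ["open", "reopen", "reassign", "close"],
    ["open", "reassign", "reopen", "close"],
    ["open", "reopen", "close"],
]
normal = [
    ["open", "fix", "close"],
    ["open", "fix", "close"],
    ["open", "reassign", "fix", "close"],
]

result = contrast_itemsets(anomalous, normal)
for combo, diff in result:
    print(combo, round(diff, 2))
```

Exhaustive enumeration grows exponentially with the itemset size, so it is only usable for very small feature sets.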
Discovering Signature Patterns
Bose & van der Aalst 2013
K-nearest neighbor, one-class SVM
kgrams, tandem repeats, …
Decision trees, class association rules
Cross validation
IBM Business Process Insight
1. Apply sequence mining to extract frequent patterns from event logs
2. Determine which patterns best discriminate between different outcomes
   – Uses Information Gain (IG) to rank patterns according to their discriminative power
Lakshmanan et al. BPM’2013
Conclusion
References
• Mining decision rules
  – Rozinat, van der Aalst: “Decision Mining in ProM”. BPM’2006.
  – De Leoni, Dumas, García-Bañuelos: “Discovering Branching Conditions from Business Process Execution Logs”. FASE’2013.
• Mining rule-based process models
  – Maggi, Bose, van der Aalst: “Efficient Discovery of Understandable Declarative Process Models from Event Logs”. CAiSE’2012.
  – Di Ciccio, Mecella: “A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows”. CIDM’2013.
  – Maggi, Dumas, García-Bañuelos, Montali: “Discovering Data-Aware Declarative Process Models from Event Logs”. BPM’2013.
  – Bose, Maggi, van der Aalst: “Enhancing Declare Maps Based on Event Correlations”. BPM’2013.
• Discriminative rules mining
  – Sun et al.: “Mining Explicit Rules for Software Process Evaluation”. ICSSP’2013.
  – Bose and van der Aalst: “Discovering Signature Patterns from Event Logs”. CIDM’2013.
  – Lakshmanan et al.: “Investigating Clinical Care Pathways Correlated With Outcomes”. BPM’2013.