From data to event log
Mieke Jans Hasselt University
“Yes, we have data…”
´ “… now we can start applying process mining”
How to…
´ … transform that load of available data into an event log that can be fed into a commercial process mining tool?
Step-by-step approach
But first… set your goal
Efficiency versus Compliance
Identify process cornerstones
´ Talk to process owner AND IT-related coordinator
´ Identify the cornerstones (the key activities) of the process and in which tables this information is to be found
´ Chances are that activities are related to transactions on documents
´ List 3-5 key questions/tests that you wish to run.
1
Identify process cornerstones
´ Running example: purchasing process, goal of compliance
´ Following blocks could have been identified:
´ Create a Purchase Requisition (PR)
´ Create a Purchase Order (PO)
´ Approve PO
´ Receive Goods
´ Book invoice
Extra information: if an extra line is added to the PO, the process restarts at ‘Approve PO’ for that part.
Example
Identify process cornerstones
´ Key tests include:
SOD between PO creation and Receiving Goods
SOD between 2 levels of Approving PO
Does an invoice always stem from a PO?
Example
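A minimal sketch of how the first key test (SOD between PO creation and Receiving Goods) could be run once an event log exists. The tuple layout, case ids and user names are invented for illustration and are not taken from any real system.

```python
from collections import defaultdict

# Hypothetical event log: (case_id, activity, user). All values are invented.
events = [
    ("INV-1", "Create PO", "alice"),
    ("INV-1", "Receive Goods", "bob"),
    ("INV-2", "Create PO", "carol"),
    ("INV-2", "Receive Goods", "carol"),
]

def sod_violations(events, activity_a, activity_b):
    """Return the case ids in which one user performed both activities."""
    users = defaultdict(lambda: defaultdict(set))
    for case_id, activity, user in events:
        users[case_id][activity].add(user)
    return sorted(
        case_id
        for case_id, by_activity in users.items()
        if by_activity[activity_a] & by_activity[activity_b]
    )

violations = sod_violations(events, "Create PO", "Receive Goods")
print(violations)
```

The same function could be reused for the second test by logging the two approval levels as two distinct activities.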
Cornerstone (CS) Document CS/Document Table
´ The cornerstones are typically described as activities that are related to transactions on documents.
´ Firstly, identify the underlying documents
´ Secondly, identify the tables that capture the timestamps of the transactions on the documents (the cornerstones)
2
Cornerstone (CS) Document CS/Document Table
Example
Cornerstone -> Which document is affected?
´ Create a Purchase Requisition (PR) -> Purchase Requisition (PR)
´ Create a Purchase Order (PO) -> Purchase Order (PO)
´ Approve PO -> PO
´ Receive Goods -> PO
´ Book invoice -> Invoice
´ Add PO line -> PO
Documents: PR, PO, Invoice
Cornerstone (CS) Document CS/Document Table
Example
Cornerstone -> Which document is affected? -> Which table holds cornerstone timestamp?
´ Create a Purchase Requisition (PR) -> Purchase Requisition (PR) -> PR header
´ Create a Purchase Order (PO) -> Purchase Order (PO) -> PO header
´ Approve PO -> PO -> Log header
´ Receive Goods -> PO -> PO History
´ Book invoice -> Invoice -> Invoice header
´ Add PO line -> PO -> Log line
Documents: PR, PO, Invoice
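The result of step 2 can be kept as a small lookup structure. The sketch below assumes the cornerstone, document and table columns of the example line up in the order they are listed; the shortened cornerstone names and the variable names are illustrative.

```python
# Cornerstone -> (affected document, table holding the timestamp).
# Assumption: the three columns of the example are matched by their order.
CORNERSTONES = {
    "Create PR": ("PR", "PR header"),
    "Create PO": ("PO", "PO header"),
    "Approve PO": ("PO", "Log header"),
    "Receive Goods": ("PO", "PO History"),
    "Book invoice": ("Invoice", "Invoice header"),
    "Add PO line": ("PO", "Log line"),
}

# Tables to include in the extraction, and the documents they describe.
tables = sorted({table for _, table in CORNERSTONES.values()})
documents = sorted({doc for doc, _ in CORNERSTONES.values()})
print(tables)
print(documents)
```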
Identify the table relationships of key tables
´ Create an Entity-Relationship Diagram of the tables that were listed in Step 2.
For example:
A PO line can have multiple invoice lines,
while an invoice line can belong to at most 1 PO line
3
[Figure: PO line (1) to Invoice line (n)]
Identify the table relationships of key tables
´ Investigate the ER-diagram in combination with the document flow.
´ Test whether there are:
´ Parent-child relationships between tables that represent 1 document
´ Many-to-many relationships between documents
3
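One way to test for these relationships is to classify the cardinality directly from the key pairs that link two tables. This is a sketch under stated assumptions: the function name and the sample keys are hypothetical, and real link pairs would come from the extracted tables.

```python
from collections import defaultdict

def cardinality(pairs):
    """Classify a relationship from (left_key, right_key) link pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    left_many = any(len(v) > 1 for v in rights_per_left.values())
    right_many = any(len(v) > 1 for v in lefts_per_right.values())
    if left_many and right_many:
        return "n-to-n"
    if left_many:
        return "1-to-n"
    if right_many:
        return "n-to-1"
    return "1-to-1"

# Hypothetical links: (PO line, invoice line) and (PO, invoice).
po_line_to_invoice_line = [
    ("PO1-10", "INV1-1"), ("PO1-10", "INV2-1"), ("PO1-20", "INV1-2"),
]
po_to_invoice = [("PO1", "INV1"), ("PO1", "INV2"), ("PO2", "INV1")]

print(cardinality(po_line_to_invoice_line))
print(cardinality(po_to_invoice))
```

Note that the data can only show the cardinality that actually occurs; a relationship that is n-to-n by design may look 1-to-n in a small extract.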
Identify the table relationships of key tables
Example
[ER diagram of the example tables: PR header, PO header, PO line, PO history, Invoice header, Invoice line, Log header, Log line]
Identify the table relationships of key tables
´ In the example, there are clear parent-child relationships between tables that represent 1 document.
For example:
PO header and PO line,
Invoice header and Invoice line,
Log header and Log line.
´ Between the documents, there are n-to-n relationships
For example:
between a PO and an invoice
Example
Select a process instance
´ Process instance: the object that you will follow throughout the process and which will be subjected to process activities
´ In a document-based process, the instance will probably be related to one of the documents
´ 2 dimensions:
´ Start – middle – end document?
´ Header or line level?
4
Select a process instance
´ 2 dimensions:
´ Start – middle – end document?
´ Header or line level?
For example:
4
[Figure: header (H) and item (I) levels over time for Doc 1 (H-1), Doc 2 (H-2, I-2) and Doc 3 (H-3, I-3). Which one becomes the process instance?]
Select a process instance - Start – middle – end doc?
´ Is there a single point of entry to the process? -> Start document
´ Are there multiple points of entry? -> depends on the goal:
´ Efficiency -> Start document: you will identify fall-out
´ Compliance -> End document: you will identify cases that did not follow prescribed procedures
4.1
For example:
[Figure "Reality": cases made up of varying combinations of Doc 1, Doc 2 and Doc 3]
[Figure: the same cases, with the start doc as process instance]
Start doc as process instance (goal: Efficiency)
´ Fall-out will be identified
´ Non-compliance won’t be captured
[Figure: the same cases, with the end doc as process instance]
End doc as process instance (goal: Compliance)
´ Fall-out won’t be identified anymore
´ Non-compliance will show (exceptions)
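The trade-off between start and end document can be sketched with a toy set of cases. The four cases below are invented; each is modelled simply as the set of documents that exist for it.

```python
# Hypothetical cases: the set of documents that exist for each case.
cases = [
    {"Doc 1", "Doc 2", "Doc 3"},  # complete case
    {"Doc 1", "Doc 2"},           # fall-out: never reached the end document
    {"Doc 2", "Doc 3"},           # non-compliant: no start document
    {"Doc 3"},                    # non-compliant: end document only
]

def visible_cases(cases, instance_doc):
    """Cases that appear in the event log when instance_doc is the process instance."""
    return [case for case in cases if instance_doc in case]

start_view = visible_cases(cases, "Doc 1")
end_view = visible_cases(cases, "Doc 3")

# With the start doc as instance, fall-out is visible but skipped starts are not.
fall_out = [c for c in start_view if "Doc 3" not in c]
# With the end doc as instance, skipped starts are visible but fall-out is not.
non_compliant = [c for c in end_view if "Doc 1" not in c]
print(len(fall_out), len(non_compliant))
```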
Select a process instance – Header or line level?
´ Reconsider the key tests in step 1, and check at which level the involved activities are situated
´ Take that level as process instance level
´ In case of mixed levels -> take the highest level
The activities on the lower level then have to be aggregated to the higher level
4.2
Select a process instance
´ 2 dimensions
[Figure: header and item levels over time for the example documents: PR (H-PR), PO (H-PO, I-PO), Invoice (H-Inv, I-Inv)]
Example
Select a process instance – Start – middle – end doc?
´ In our example process, it is possible to have invoices without a preceding PR or PO.
´ Therefore, there are multiple points of entry in this process
´ Goal = compliance
´ End doc as process instance -> invoice
Example
[Figure: header and item levels over time: PR (H-PR), PO (H-PO, I-PO), Invoice (H-Inv, I-Inv)]
Select a process instance – Header or line level?
´ Key tests included:
SOD between PO creation and Receiving Goods
SOD between 2 levels of Approving PO
Does an invoice always stem from a PO?
´ Activities involved are:
Create PO -> PO header
Receive Goods -> PO line
Approve PO -> PO header
Create Invoice -> Invoice header
´ Process instance level -> (Invoice) header level
However, how to aggregate Receive Goods to header level?
Decide whether to take all, the first, the last… of the line-level Goods Receipts
Example
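The aggregation choice (all, first or last Goods Receipt) can be made explicit in code. The sketch below lifts line-level timestamps to PO header level only; linking the PO to the invoice is left out. PO numbers, dates and the function name are hypothetical.

```python
from datetime import datetime

# Hypothetical line-level Goods Receipt timestamps, grouped per PO header.
goods_receipts = {
    "PO1": [datetime(2024, 1, 5), datetime(2024, 1, 9), datetime(2024, 1, 7)],
    "PO2": [datetime(2024, 2, 1)],
}

def aggregate(timestamps, how):
    """Lift line-level timestamps to header level: 'first', 'last' or 'all'."""
    ordered = sorted(timestamps)
    if how == "first":
        return ordered[:1]
    if how == "last":
        return ordered[-1:]
    if how == "all":
        return ordered
    raise ValueError(f"unknown aggregation: {how}")

# One 'Receive Goods' event per PO header, using the first receipt.
header_events = [
    (po, "Receive Goods", ts)
    for po, stamps in goods_receipts.items()
    for ts in aggregate(stamps, "first")
]
print(header_events)
```

Which option fits depends on the test: an SOD check may need all receipts, whereas a throughput-time analysis may only need the first or the last.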
Select a process instance
´ 2 dimensions
[Figure: header and item levels over time: PR (H-PR), PO (H-PO, I-PO), Invoice (H-Inv, I-Inv)]
Example
Identify activities
1. Start = set of timestamps identified in step 2
Condition = possible to link to the chosen process instance, taking into account the n-to-n relationships
2. Add all other timestamps in the identified tables -> set of candidate activities
3. Pruning: delete activities that are not of interest
4. For the attribute-dependent timestamps -> check whether other attribute values would deliver interesting activities and add accordingly
5
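Steps 1 to 3 can be sketched as a small pipeline. Everything named here is an assumption for illustration: the column names, the PO and invoice numbers, and the choice to copy each PO event to every linked invoice as one possible way of handling the n-to-n relationship.

```python
# Hypothetical extracted rows: (table, timestamp column, PO number, ISO date).
rows = [
    ("PO header", "created_at", "PO1", "2024-01-02"),
    ("PO header", "last_changed_at", "PO1", "2024-01-04"),
    ("Log header", "approved_at", "PO1", "2024-01-03"),
    ("PO header", "created_at", "PO9", "2024-01-06"),  # PO without an invoice
]
# Hypothetical n-to-n link from POs to invoices (the chosen process instance).
po_to_invoices = {"PO1": ["INV1", "INV2"]}

# Steps 1 and 2: every timestamp column becomes a candidate activity.
ACTIVITY_NAMES = {
    ("PO header", "created_at"): "Create PO",
    ("PO header", "last_changed_at"): "PO last changed",
    ("Log header", "approved_at"): "Approve PO",
}
# Step 3: pruning, drop candidate activities that are not of interest.
PRUNED = {"PO last changed"}

event_log = []
for table, column, po, timestamp in rows:
    activity = ACTIVITY_NAMES[(table, column)]
    if activity in PRUNED:
        continue
    # Condition of step 1: keep only events that link to a process instance.
    for invoice in po_to_invoices.get(po, []):
        event_log.append((invoice, activity, timestamp))

event_log.sort(key=lambda event: (event[0], event[2]))
print(event_log)
```

Copying PO events to each linked invoice duplicates them across cases, which affects frequency counts; that side effect should be kept in mind when reading the mined model.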
Identify activities
1. Cornerstones:
Create PR, Create PO, Approve PO, Receive Goods, Book Invoice, Add PO line -> possible to link.
2. Add:
PO last changed, Enter Invoice in system, Receive Payment, Change PO line
3. Pruning
4. ‘Receive Goods’ is created by taking the timestamp of table ‘PO history’ when a certain field = X -> attribute dependent
‘Receive Invoice’ is also captured, when that field holds XX -> add
Example
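An attribute-dependent timestamp can be turned into activities with a simple value-to-activity mapping. In the sketch below, "X" and "XX" are the placeholder field values from the example, not real system codes, and the rows are invented.

```python
# Hypothetical 'PO history' rows: (PO number, field value, ISO date).
po_history = [
    ("PO1", "X", "2024-01-05"),
    ("PO1", "XX", "2024-01-08"),
    ("PO1", "Y", "2024-01-09"),  # value without a mapped activity, ignored
]

# The same timestamp column yields a different activity per field value.
FIELD_TO_ACTIVITY = {"X": "Receive Goods", "XX": "Receive Invoice"}

events = [
    (po, FIELD_TO_ACTIVITY[value], timestamp)
    for po, value, timestamp in po_history
    if value in FIELD_TO_ACTIVITY
]
print(events)
```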
Attribute versus extra dimension
´ In this last step, all attributes are evaluated as possible activity dimensions
´ Differentiate between case attributes and activity attributes
´ Case attribute with n values -> create n*activities?
´ Activity attribute with n values -> multiply 1 act by n?
6
Attribute versus extra dimension
For example:
´ Case attribute concerning the type of a case
(n types)
´ The values are mutually exclusive
2 options:
´ Type of case is an attribute -> possible to use as filter
OR
´ All activities are created in n-fold -> visual outcome
6
[Figure: the two options side by side: "Attribute" versus "2-fold activities"]
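The two options for a case attribute can be compared on a toy log. Case ids, activity names and the two case types ("goods", "services") are made up for illustration.

```python
# Hypothetical events: (case_id, activity). The case type is a case attribute.
events = [("C1", "Create PO"), ("C1", "Approve PO"), ("C2", "Create PO")]
case_type = {"C1": "goods", "C2": "services"}

# Option 1: keep the type as an attribute and use it as a filter.
goods_only = [event for event in events if case_type[event[0]] == "goods"]

# Option 2: create the activities n-fold by adding the type to the activity name.
n_fold = [(case, f"{activity} ({case_type[case]})") for case, activity in events]
print(goods_only)
print(n_fold)
```

Option 2 makes the types visible in the mined model at the cost of a larger set of activities.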
Attribute versus extra dimension
For example:
´ Activity attribute concerning a ‘scan doc’ activity
(3 values: scan PR, scan PO, scan invoice)
´ The values are not mutually exclusive within a case
2 options:
´ Keep the activity at higher level (‘scan doc’) and decide on (a possible) aggregation level (only first scan…?)
OR
´ Create separate activities at the lower level (‘scan PR’, ‘scan PO’, ‘scan invoice’)
6
[Figure: the two options side by side: "Attribute (no aggregation)" versus "3 activities on lower level"]
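For an activity attribute whose values can co-occur within a case, the two options look as follows in a minimal sketch. The scan events are invented, and "first scan only" is just one possible aggregation.

```python
# Hypothetical 'scan doc' events: (case_id, scanned document type, ISO date).
scans = [
    ("C1", "PR", "2024-01-01"),
    ("C1", "PO", "2024-01-03"),
    ("C1", "invoice", "2024-01-10"),
]

# Option 1: one higher-level activity, aggregated to the first scan per case.
first_scan = {}
for case, _doc, timestamp in sorted(scans, key=lambda scan: scan[2]):
    first_scan.setdefault(case, (case, "scan doc", timestamp))
option_1 = list(first_scan.values())

# Option 2: separate lower-level activities, one per document type.
option_2 = [(case, f"scan {doc}", timestamp) for case, doc, timestamp in scans]
print(option_1)
print(option_2)
```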
Attribute versus extra dimension
In our example, ‘Receive Goods’ and ‘Receive Invoice’ are modelled as 2-fold activities rather than as 1 activity with 2 attribute values
Example
Overview
´ Set your Goal
´ Identify the Process Cornerstones
´ What are the related documents and tables
´ Draw the ER-diagram for the table relationships
´ Select your Process Instance (2 dimensions)
´ Identify activities
´ Evaluate attributes as possible activity dimensions
Event log to mine
Thank you