From data to event log Mieke Jans Hasselt University

Hasselt University · 2020. 1. 6. · Identify activities 1. Start = set of timestamps, identified in step 2 Condition = possible to link to chosen process instance, taking into account


From data to event log

Mieke Jans Hasselt University

“Yes, we have data…”

  • “… now we can start applying process mining”

How to…

  • … transform that load of available data into an event log that can be fed into a commercial process mining tool?

Step-by-step approach

But first… set your goal

Efficiency versus Compliance

Step 1: Identify process cornerstones

  • Talk to the process owner AND the IT-related coordinator
  • Identify the cornerstones (the key activities) of the process and the tables in which this information is to be found
  • Chances are that activities are related to transactions on documents
  • List 3-5 key questions/tests that you wish to run

Identify process cornerstones (example)

  • Running example: purchasing process, goal of compliance
  • The following blocks could have been identified:
    • Create a Purchase Requisition (PR)
    • Create a Purchase Order (PO)
    • Approve PO
    • Receive Goods
    • Book invoice

Extra information: if an extra line is added to the PO, the process restarts at ‘Approve PO’ for that part.

Identify process cornerstones (example)

  • Key tests include:
    • Segregation of duties (SOD) between PO creation and Receiving Goods
    • SOD between 2 levels of Approving PO
    • Does an invoice always stem from a PO?
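
A key test such as the SOD check between PO creation and Receiving Goods can be sketched once an event log exists. The snippet below is a minimal illustration only: the row layout (case id, activity, user) and the sample data are assumptions, not part of the slides.

```python
# Hypothetical event log rows: (case_id, activity, user).
events = [
    ("PO-1", "Create PO", "alice"),
    ("PO-1", "Receive Goods", "bob"),
    ("PO-2", "Create PO", "carol"),
    ("PO-2", "Receive Goods", "carol"),
]


def sod_violations(events, activity_a, activity_b):
    """Return the case ids in which one user performed both activities."""
    users = {}  # case_id -> activity -> set of users
    for case_id, activity, user in events:
        users.setdefault(case_id, {}).setdefault(activity, set()).add(user)
    return sorted(
        case_id
        for case_id, by_activity in users.items()
        if by_activity.get(activity_a, set()) & by_activity.get(activity_b, set())
    )


violations = sod_violations(events, "Create PO", "Receive Goods")
print(violations)
```

Listing the tests up front shows which activities and which user information the event log must contain.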

Step 2: Cornerstone (CS) -> Document, CS/Document -> Table

  • The cornerstones are typically described as activities that are related to transactions on documents.
  • Firstly, identify the underlying documents
  • Secondly, identify the tables that capture the timestamps of the transactions on the documents (the cornerstones)

Cornerstone (CS) -> Document, CS/Document -> Table (example)

Cornerstone                          Which document is affected?
Create a Purchase Requisition (PR)   Purchase Requisition (PR)
Create a Purchase Order (PO)         Purchase Order (PO)
Approve PO                           PO
Receive Goods                        PO
Book invoice                         Invoice
Add PO line                          PO

Documents: PR -> PO -> Invoice

Cornerstone (CS) -> Document, CS/Document -> Table (example)

Cornerstone                          Which document is affected?   Which table holds cornerstone timestamp?
Create a Purchase Requisition (PR)   Purchase Requisition (PR)     PR header
Create a Purchase Order (PO)         Purchase Order (PO)           PO header
Approve PO                           PO                            Log header
Receive Goods                        PO                            PO History
Book invoice                         Invoice                       Invoice header
Add PO line                          PO                            Log line

Documents: PR -> PO -> Invoice
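
The cornerstone mapping of the running example can be written down as a small lookup structure, which makes it easy to list the tables that need to be extracted. This is a sketch only; the dictionary name and the helper function are illustrative.

```python
# Cornerstone -> (affected document, table holding the timestamp),
# copied from the running example.
CORNERSTONES = {
    "Create PR": ("PR", "PR header"),
    "Create PO": ("PO", "PO header"),
    "Approve PO": ("PO", "Log header"),
    "Receive Goods": ("PO", "PO History"),
    "Book invoice": ("Invoice", "Invoice header"),
    "Add PO line": ("PO", "Log line"),
}


def tables_to_extract(cornerstones):
    """Return the distinct tables that hold cornerstone timestamps."""
    return sorted({table for _document, table in cornerstones.values()})


print(tables_to_extract(CORNERSTONES))
```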

Step 3: Identify the table relationships of key tables

  • Create an Entity-Relationship Diagram of the tables that were listed in Step 2.

For example: a PO line can have multiple invoice lines, while an invoice line can belong to at most 1 PO line.

[ER diagram: PO line (1) to Invoice line (n)]

Step 3: Identify the table relationships of key tables

  • Investigate the ER-diagram in combination with the document flow.
  • Test whether there are:
    • Parent-child relationships between tables that represent 1 document
    • Many-to-many relationships between documents

Identify the table relationships of key tables (example)

[ER diagram of the example tables: PR header, PO header, PO line, PO history, Invoice header, Invoice line, Log header, Log line]

Identify the table relationships of key tables (example)

  • In the example, there are clear parent-child relationships between tables that represent 1 document, for example:
    • PO header and PO line
    • Invoice header and Invoice line
    • Log header and Log line
  • Between the documents, there are n-to-n relationships, for example between a PO and an invoice
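
Whether a relationship is 1-to-n or n-to-n can also be checked against the data itself. The sketch below classifies the cardinality from observed key pairs; the function name and the sample keys are hypothetical. Data only shows what has occurred so far, not what the schema allows, so treat the result as a lower bound.

```python
from collections import defaultdict


def classify_relationship(pairs):
    """Classify cardinality from observed (left_key, right_key) pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())
    if left_has_many and right_has_many:
        return "n-to-n"
    if left_has_many:
        return "1-to-n"
    if right_has_many:
        return "n-to-1"
    return "1-to-1"


# A PO line with several invoice lines; each invoice line has one PO line.
po_line_to_invoice_line = [("POL-1", "INVL-1"), ("POL-1", "INVL-2"), ("POL-2", "INVL-3")]
# A PO with several invoices, and an invoice covering several POs.
po_to_invoice = [("PO-1", "INV-1"), ("PO-1", "INV-2"), ("PO-2", "INV-2")]

print(classify_relationship(po_line_to_invoice_line))
print(classify_relationship(po_to_invoice))
```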

Step 4: Select a process instance

  • Process instance: the object that you will follow throughout the process and which will be subjected to process activities
  • In a document-based process, the instance will probably be related to one of the documents
  • 2 dimensions:
    • Start – middle – end document?
    • Header or line level?

Step 4: Select a process instance

  • 2 dimensions:
    • Start – middle – end document?
    • Header or line level?

For example:

[Figure: Doc 1, Doc 2 and Doc 3 along a time axis, with header-level objects H-1, H-2, H-3 and item-level objects I-2, I-3; which of these becomes the process instance?]

Step 4.1: Select a process instance - Start – middle – end doc?

  • Is there a single point of entry to the process? -> Start document
  • Are there multiple points of entry? -> The choice depends on the goal:

Goal         Process instance   Result
Efficiency   Start document     You will identify fall-out
Compliance   End document       You will identify cases that did not follow prescribed procedures
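
The decision rule for the first dimension is small enough to write out. A minimal sketch, with a hypothetical function name and string labels:

```python
def choose_instance_document(single_point_of_entry, goal):
    """Pick the start or end document as process instance."""
    if single_point_of_entry:
        return "start document"
    if goal == "efficiency":
        return "start document"  # fall-out becomes visible
    if goal == "compliance":
        return "end document"  # deviations from the procedure become visible
    raise ValueError(f"unknown goal: {goal!r}")


print(choose_instance_document(False, "compliance"))
```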

For example:

[Figure "Reality": cases shown as chains of Doc 1, Doc 2 and Doc 3; not every case contains all three documents]

[Figure: the same cases, viewed with the start document as process instance]

Start doc as process instance (goal: Efficiency)

  • Fall-out will be identified
  • Non-compliance won’t be captured

[Figure: the same cases, viewed with the end document as process instance; exceptions are marked]

End doc as process instance (goal: Compliance)

  • Fall-out won’t be identified anymore
  • Non-compliance will show
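
The effect of the two choices can be reproduced on toy data. In the sketch below each case is listed with the documents that exist for it; the case ids and the assumption that Doc 1 is the start document and Doc 3 the end document are illustrative.

```python
# Hypothetical cases: case id -> documents that exist for the case.
cases = {
    "c1": ["Doc 1", "Doc 2", "Doc 3"],
    "c2": ["Doc 1", "Doc 2"],
    "c3": ["Doc 2", "Doc 3"],
    "c4": ["Doc 3"],
    "c5": ["Doc 1"],
}
FULL_FLOW = ["Doc 1", "Doc 2", "Doc 3"]


def start_doc_view(cases):
    """Follow cases that have a start document; report those never reaching the end."""
    followed = {cid: docs for cid, docs in cases.items() if "Doc 1" in docs}
    fall_out = sorted(cid for cid, docs in followed.items() if "Doc 3" not in docs)
    return sorted(followed), fall_out


def end_doc_view(cases):
    """Follow cases that have an end document; report those that skipped a document."""
    followed = {cid: docs for cid, docs in cases.items() if "Doc 3" in docs}
    exceptions = sorted(cid for cid, docs in followed.items() if docs != FULL_FLOW)
    return sorted(followed), exceptions


print(start_doc_view(cases))
print(end_doc_view(cases))
```

With the start document as instance, cases c3 and c4 never enter the log, so their missing documents go unnoticed. With the end document as instance, c2 and c5 disappear, so fall-out is no longer visible.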

Step 4.2: Select a process instance – Header or line level?

  • Reconsider the key tests in step 1, and check at which level the involved activities are situated
  • Take that level as process instance level
  • In case of mixed levels -> take the highest level

The activities on the lower level have to be aggregated on the higher level.
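
The rule for mixed levels can be sketched as follows. The activity levels are those of the running purchasing example; the rank table and function name are illustrative.

```python
# Higher rank = higher level.
LEVEL_RANK = {"line": 0, "header": 1}


def instance_level(activity_levels):
    """Pick the highest level among the activities involved in the key tests."""
    return max(activity_levels.values(), key=LEVEL_RANK.__getitem__)


levels = {
    "Create PO": "header",
    "Receive Goods": "line",
    "Approve PO": "header",
    "Create Invoice": "header",
}

chosen = instance_level(levels)
# Activities below the chosen level must be aggregated upwards.
to_aggregate = sorted(a for a, level in levels.items() if level != chosen)
print(chosen, to_aggregate)
```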

Select a process instance (example)

  • 2 dimensions

[Figure: PR, PO and Invoice along a time axis, with header-level objects H-PR, H-PO, H-Inv and item-level objects I-PO, I-Inv]

Select a process instance: Start – middle – end doc? (example)

  • In our example process, it is possible to have invoices without a preceding PR or PO.
  • Therefore, there are multiple points of entry in this process
  • Goal = compliance
  • End doc as process instance -> invoice

[Figure: PR, PO and Invoice along a time axis, with header-level objects H-PR, H-PO, H-Inv and item-level objects I-PO, I-Inv]

Select a process instance – Header or line level? (example)

  • Key tests included:
    • SOD between PO creation and Receiving Goods
    • SOD between 2 levels of Approving PO
    • Does an invoice always stem from a PO?
  • Activities involved are:

Activity         Document   Level
Create PO        PO         header
Receive Goods    PO         line
Approve PO       PO         header
Create Invoice   Inv        header

-> (Invoice) Header level

However, how to aggregate Receive Goods to header level? Decide to take all, first, last… line level Goods Receipts.
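
The aggregation choice for line-level Goods Receipts (all, first or last) can be made explicit in code. The sketch below lifts receipts to the PO header; the tuple layout and sample values are assumptions, and linking the PO to the invoice instance is left out.

```python
from datetime import datetime

# Hypothetical line-level goods receipts: (po_number, po_line, timestamp).
goods_receipts = [
    ("PO-1", 10, datetime(2020, 1, 3, 9, 0)),
    ("PO-1", 20, datetime(2020, 1, 5, 14, 30)),
    ("PO-2", 10, datetime(2020, 1, 4, 11, 15)),
]


def aggregate_to_header(receipts, how="first"):
    """Turn line-level receipts into header-level 'Receive Goods' events."""
    by_po = {}
    for po, _line, timestamp in receipts:
        by_po.setdefault(po, []).append(timestamp)
    events = []
    for po, stamps in sorted(by_po.items()):
        stamps.sort()
        if how == "first":
            chosen = stamps[:1]
        elif how == "last":
            chosen = stamps[-1:]
        elif how == "all":
            chosen = stamps
        else:
            raise ValueError(f"unknown aggregation: {how!r}")
        events.extend((po, "Receive Goods", ts) for ts in chosen)
    return events


print(aggregate_to_header(goods_receipts, how="first"))
```

Taking all receipts keeps partial deliveries visible but repeats the activity within a case; first or last gives one event per case.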


Step 5: Identify activities

1. Start = set of timestamps, identified in step 2. Condition = possible to link to chosen process instance, taking into account the n-to-n relationships
2. Add all other timestamps in identified tables -> set of candidate activities
3. Pruning: delete activities that are not of interest
4. For the attribute-dependent timestamps -> check whether other attribute values would deliver interesting activities and add accordingly
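
Steps 1 to 3 can be sketched as collecting every timestamp column into candidate events and then dropping the activities that are not of interest. In the snippet below the column names, the sample rows and the assumption that each row is already linked to a case id are all hypothetical.

```python
from datetime import datetime

# Hypothetical extract: rows are already linked to the process instance.
tables = {
    "PO header": [
        {"case_id": "INV-1", "created_at": datetime(2020, 1, 2),
         "last_changed_at": datetime(2020, 1, 6)},
    ],
    "Invoice header": [
        {"case_id": "INV-1", "booked_at": datetime(2020, 1, 10),
         "entered_at": datetime(2020, 1, 9)},
    ],
}

# (table, timestamp column) -> activity name.
# Cornerstones come from step 1; the others are added in step 2.
ACTIVITY_NAMES = {
    ("PO header", "created_at"): "Create PO",
    ("PO header", "last_changed_at"): "PO last changed",
    ("Invoice header", "booked_at"): "Book Invoice",
    ("Invoice header", "entered_at"): "Enter Invoice in system",
}


def build_event_log(tables, activity_names, pruned=()):
    """Collect (case_id, activity, timestamp) events, skipping pruned activities."""
    log = []
    for (table, column), activity in activity_names.items():
        if activity in pruned:
            continue  # step 3: pruning
        for row in tables.get(table, []):
            timestamp = row.get(column)
            if timestamp is not None:
                log.append((row["case_id"], activity, timestamp))
    return sorted(log, key=lambda event: (event[0], event[2]))


event_log = build_event_log(tables, ACTIVITY_NAMES, pruned={"PO last changed"})
print([activity for _case, activity, _ts in event_log])
```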


Identify activities (example)

1. Cornerstones: Create PR, Create PO, Approve PO, Receive Goods, Book Invoice, Add PO line -> possible to link.
2. Add: PO last changed, Enter Invoice in system, Receive Payment, Change PO line
3.
4. ‘Receive Goods’ is created by taking the timestamp of table ‘PO history’ when a certain field = X -> attribute dependent. ‘Receive Invoice’ is also captured, when that field holds XX -> add
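
An attribute-dependent timestamp means that one table yields different activities depending on a field value. A minimal sketch, using the placeholder values X and XX from the example; the column names `field` and `timestamp` and the sample rows are assumptions.

```python
from datetime import datetime

# 'X' and 'XX' are the placeholder values from the example.
FIELD_TO_ACTIVITY = {"X": "Receive Goods", "XX": "Receive Invoice"}

# Hypothetical rows of the 'PO history' table.
po_history = [
    {"po": "PO-1", "field": "X", "timestamp": datetime(2020, 1, 3)},
    {"po": "PO-1", "field": "XX", "timestamp": datetime(2020, 1, 8)},
    {"po": "PO-1", "field": "Y", "timestamp": datetime(2020, 1, 9)},
]


def events_from_po_history(rows, mapping):
    """Create one event per row whose field value maps to an activity."""
    return [
        (row["po"], mapping[row["field"]], row["timestamp"])
        for row in rows
        if row["field"] in mapping
    ]


history_events = events_from_po_history(po_history, FIELD_TO_ACTIVITY)
print([activity for _po, activity, _ts in history_events])
```

Rows with other field values are ignored until someone decides they represent an interesting activity and adds them to the mapping.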

Step 6: Attribute versus extra dimension

  • In this last step, all attributes are evaluated as possible activity dimensions
  • Differentiate between case attributes and activity attributes
  • Case attribute with n values -> create n*activities?
  • Activity attribute with n values -> multiply 1 activity by n?

Step 6: Attribute versus extra dimension

For example:

  • Case attribute concerning the type of a case (n types)
  • The values are mutually exclusive

2 options:

  • Type of case is an attribute -> possible to use as filter

OR

  • All activities are created in n-fold -> visual outcome
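
The two options for a case attribute can be compared on a toy log. In this sketch the case types and the bracketed naming scheme for n-fold activities are hypothetical.

```python
# Hypothetical case attribute and event log rows (case_id, activity).
case_type = {"c1": "goods", "c2": "services"}
events = [("c1", "Create PO"), ("c1", "Approve PO"), ("c2", "Create PO")]


def filter_by_type(events, case_type, wanted):
    """Option 1: keep the type as an attribute and filter on it."""
    return [(cid, act) for cid, act in events if case_type[cid] == wanted]


def n_fold(events, case_type):
    """Option 2: fold the type into the activity name (n-fold activities)."""
    return [(cid, f"{act} [{case_type[cid]}]") for cid, act in events]


print(filter_by_type(events, case_type, "goods"))
print(sorted({act for _cid, act in n_fold(events, case_type)}))
```

Option 1 keeps the activity set small; option 2 shows the types side by side in one process map at the cost of n times as many activities.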

[Figure: the case type kept as an attribute versus 2-fold activities]

Step 6: Attribute versus extra dimension

For example:

  • Activity attribute concerning a ‘scan doc’ activity (3 values: scan PR, scan PO, scan invoice)
  • The values are not mutually exclusive within a case

2 options:

  • Keep the activity at higher level (‘scan doc’) and decide on (a possible) aggregation level (only first scan…?)

OR

  • Create separate activities at the lower level (‘scan PR’, ‘scan PO’, ‘scan invoice’)
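
Both options for the ‘scan doc’ activity attribute can be sketched on a single case. The tuple layout and sample timestamps are assumptions; "only first scan" is used as the aggregation rule for the higher level.

```python
from datetime import datetime

# Hypothetical scan records: (case_id, scanned document, timestamp).
scans = [
    ("c1", "PR", datetime(2020, 1, 1)),
    ("c1", "PO", datetime(2020, 1, 2)),
    ("c1", "invoice", datetime(2020, 1, 5)),
]


def keep_high_level(scans, first_only=True):
    """Option 1: one 'scan doc' activity, optionally only the first scan per case."""
    events = sorted(
        ((cid, "scan doc", ts) for cid, _doc, ts in scans),
        key=lambda event: (event[0], event[2]),
    )
    if not first_only:
        return events
    seen, first_events = set(), []
    for event in events:
        if event[0] not in seen:
            seen.add(event[0])
            first_events.append(event)
    return first_events


def split_low_level(scans):
    """Option 2: a separate activity per scanned document."""
    return [(cid, f"scan {doc}", ts) for cid, doc, ts in scans]


print(keep_high_level(scans))
print([activity for _cid, activity, _ts in split_low_level(scans)])
```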

[Figure: ‘scan doc’ kept as an attribute (no aggregation) versus 3 activities on lower level]

Attribute versus extra dimension (example)

In our example, the activities ‘Receive Goods’ and ‘Receive Invoice’ are 2-fold activities rather than 1 activity with 2 attribute values.

Overview

  • Set your Goal
  • Identify the Process Cornerstones
  • What are the related documents and tables
  • Draw the ER-diagram for the table relationships
  • Select your Process Instance (2 dimensions)
  • Identify activities
  • Evaluate attributes as possible activity dimensions

-> Event log to mine
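
The resulting event log is usually handed to a tool as a flat file with one row per event. A minimal sketch of such an export follows; the column names `case_id`, `activity` and `timestamp` and the sample events are assumptions, and the exact format a given tool expects should be checked in its documentation.

```python
import csv
import io
from datetime import datetime

# Hypothetical events: (case_id, activity, timestamp).
event_log = [
    ("INV-1", "Book Invoice", datetime(2020, 1, 10)),
    ("INV-1", "Create PO", datetime(2020, 1, 2)),
]


def event_log_to_csv(event_log):
    """Serialize events as CSV text, ordered by case and timestamp."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["case_id", "activity", "timestamp"])
    for case_id, activity, ts in sorted(event_log, key=lambda e: (e[0], e[2])):
        writer.writerow([case_id, activity, ts.isoformat()])
    return buffer.getvalue()


csv_text = event_log_to_csv(event_log)
print(csv_text)
```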

Thank you