Ghent University, Faculty of Economics and Business Administration Department of Management Information and Operations Management Jan Claes for BPI@BPM Saturday 14 May FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION Merging Computer Log Files for Process Mining: An Artificial Immune System Technique Jan Claes and Geert Poels http://processmining.ugent.be

BPI@BPM2011

DESCRIPTION

Slides of my presentation at BPI workshop at BPM conference, 29 August 2011, Clermont-Ferrand, FR

Page 1: BPI@BPM2011

Ghent University, Faculty of Economics and Business Administration Department of Management Information and Operations Management

Jan Claes for BPI@BPM 2011
10 April 2023

FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION

Merging Computer Log Files for Process Mining: An Artificial Immune System Technique

Jan Claes and Geert Poels
http://processmining.ugent.be

Page 2: BPI@BPM2011

Process Mining

Processes are supported by IT systems
IT systems record actual process data
Process data can be used to
  Discover process model
  Check conformance with existing process info
  Improve or extend existing process model
Attention
  Only As-Is
  Only (correctly) recorded information

Page 3: BPI@BPM2011

Keynote BPI 2010, Michael Zur Muehlen

BPI 2010, Keynote Michael Zur Muehlen http://www.slideshare.net/mzurmuehlen/bu-5236080

[Figure from the keynote, with the elements Process Controlling, Business Activity Monitoring, Process Intelligence, Event Detection & Correlation, and Decision Making. Callouts on the figure: "Main focus point of current BPI research" and "Deserves more focus in BPI research".]

Page 4: BPI@BPM2011

Process Mining steps

Preparation
  Collect data: find event information
  Merge data: from different sources
  Structure data: group per instance
  Convert data: to tool specific format
Process mining
Make decisions, take action

Legend for the M and A markers on the slide
  M: Manual task, analysts needed in most cases
  A: Automated task, less human involvement needed
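The "Structure data: group per instance" step can be illustrated with a minimal sketch. The field names `case_id` and `timestamp` are assumptions for this example and do not come from the slides; timestamps are assumed to be ISO 8601 strings so that they sort chronologically.

```python
from collections import defaultdict


def group_per_instance(events, case_key="case_id", time_key="timestamp"):
    """Group a flat list of event dicts into one trace per process instance."""
    traces = defaultdict(list)
    for event in events:
        traces[event[case_key]].append(event)
    # Order the events of every trace chronologically.
    for trace in traces.values():
        trace.sort(key=lambda e: e[time_key])
    return dict(traces)
```

Each resulting trace holds the events of one process instance, which is the shape most process mining tools expect before conversion to a tool specific format.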

Page 5: BPI@BPM2011

Merging log files

My research: Merging log files

Page 6: BPI@BPM2011

Merging log files

1. Find links
2. Merge chronologically
3. Add unlinked traces
4. Put in new log file
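Steps 2 to 4 can be sketched as follows, assuming the links from step 1 are already available as pairs of trace ids. The function name `merge_logs`, the dictionary layout of a log, and the naming of merged traces are assumptions made for this illustration, not the implementation presented in the talk.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str]        # (ISO 8601 timestamp, activity name)
Log = Dict[str, List[Event]]   # trace id -> list of events


def merge_logs(log_a: Log, log_b: Log, links: List[Tuple[str, str]]) -> Log:
    """Merge two logs, given links (id in log A, id in log B) from step 1."""
    merged: Log = {}
    linked_a = {id_a for id_a, _ in links}
    linked_b = {id_b for _, id_b in links}

    # Step 2: merge the events of each linked pair chronologically.
    for id_a, id_b in links:
        merged[f"{id_a}+{id_b}"] = sorted(
            log_a[id_a] + log_b[id_b], key=lambda e: e[0]
        )

    # Step 3: add the traces that were not linked to anything.
    for prefix, log, linked in (("A", log_a, linked_a), ("B", log_b, linked_b)):
        for trace_id, events in log.items():
            if trace_id not in linked:
                merged[f"{prefix}:{trace_id}"] = sorted(events, key=lambda e: e[0])

    # Step 4: the returned dictionary stands in for the new log file.
    return merged
```

Sorting on the timestamp string only works when both logs use the same ISO 8601 format and time zone; real logs would need their timestamps parsed and normalised first.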

Page 7: BPI@BPM2011

Find links

Required properties of solution
  Finds traces in both log files that belong to the same process execution
  Without prior knowledge about the provided log files (as generic as possible)
  But with maximal possibilities for the (expert) user to include his knowledge about the log files

Page 8: BPI@BPM2011

Find links

Proposed solution
  Take the best possible guess based on assumptions
  Include multiple indicator factors in analysis
  Calculate factor scores for each analysed solution
  Combine factor scores into global score per solution
  'Best guess' is the solution with the highest combined score: under the assumed indicators, most indicator value points to this solution
  Provide user interaction possibilities
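One possible way to combine factor scores is a weighted sum, with the best guess being the candidate solution that scores highest. The slides do not state how the scores are combined, so the weighted sum, the function names, and the factor names below are assumptions for illustration only.

```python
from typing import Dict


def global_score(factor_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-factor scores into one score (assumed: weighted sum)."""
    return sum(weights.get(name, 1.0) * score for name, score in factor_scores.items())


def best_guess(candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> str:
    """Return the name of the candidate solution with the highest global score."""
    return max(candidates, key=lambda name: global_score(candidates[name], weights))
```

Keeping the weights in a separate dictionary leaves room for the user interaction mentioned above: an expert can change a weight and rerun the comparison without touching the factor calculations.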

Page 9: BPI@BPM2011

Decisions to make

Which indicator factors?
How to calculate a score for each factor?
How to combine factor scores to global score?
Which solutions to analyse? (analyse = calculate & compare scores)
Which user interactions to include (expert) user knowledge?

See paper for more details

Page 10: BPI@BPM2011

Indicator factors

Same trace identifier
  Assumption: If both logs contain a trace with the same id, there is a very high chance they match
  Not always though (e.g. customer id vs. order id)

[Figure: trace ids of the two example logs: 16, 17, 18, 19, 20, 21 and 10, 12, 14, 16, 18, 20]
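A minimal sketch of this factor: propose a link wherever the same trace id occurs in both logs. The function name is hypothetical, and the example ids are read from the slide figure as two-digit values, which is itself an assumption.

```python
def same_id_links(ids_a, ids_b):
    """Return candidate links (id, id) for trace ids present in both logs."""
    common = set(ids_a) & set(ids_b)
    return sorted((trace_id, trace_id) for trace_id in common)


# Example ids as they appear to read on the slide.
links = same_id_links(
    ["16", "17", "18", "19", "20", "21"],
    ["10", "12", "14", "16", "18", "20"],
)
```

As the slide warns, equal ids are only an indicator: if one log uses customer ids and the other order ids, equal values would link unrelated traces, which is why this factor is combined with the others.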

Page 11: BPI@BPM2011

Indicator factors

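Equal attribute values
  Assumption: The more attributes of a trace and its events from both logs are equal, the higher the chance they match

[Figure: example attribute values of the two logs. One log shows JAN 12:00, JAN 12:10, JAN 12:20, JAN 12:30, JAN 12:40, JAN 12:50; the other shows JC 14 14:00, JC 15 14:10, JC 16 14:20, JC 17 14:30, JC 18 14:40, JC 19 14:50. Trace ids shown: 16, 17, 18, 19, 20, 21 and 17, 18, 19, 1A, 1B, 1C.]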
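This factor could be scored by counting the attribute values that two traces have in common. The sketch below compares values regardless of attribute name, so that an id stored as a trace id in one log and as an event attribute in the other still counts; that choice, the dictionary layout, and the function names are assumptions, not details taken from the slides.

```python
def collect_values(trace):
    """Collect all attribute values of a trace and of its events."""
    values = set(trace.get("attributes", {}).values())
    for event in trace.get("events", []):
        values.update(event.values())
    return values


def equal_value_score(trace_a, trace_b):
    """Score a candidate link by the number of attribute values both traces share."""
    return len(collect_values(trace_a) & collect_values(trace_b))
```

A raw count favours traces with many attributes; dividing by the number of distinct values would be one way to normalise it, at the cost of another parameter to tune.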

Page 12: BPI@BPM2011

Indicator factors

Extra trace & Missing trace
  Assumption: A trace from one log has more chance to match with only one trace from the other log
  Extra trace: Negative if trace is linked with multiple traces in other log
  Missing trace: Negative if trace is not linked
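Both penalties can be computed from the set of links in a candidate solution. In this sketch every link beyond a trace's first counts as one "extra", and every unlinked trace counts as one "missing"; the counting rule, the weights, and the function name are assumptions for illustration.

```python
from collections import Counter


def extra_and_missing_penalty(links, ids_a, ids_b, extra_weight=1.0, missing_weight=1.0):
    """Return (extra penalty, missing penalty) as non-positive scores."""
    count_a = Counter(id_a for id_a, _ in links)
    count_b = Counter(id_b for _, id_b in links)
    # Extra trace: every link beyond the first one for the same trace.
    extra = sum(n - 1 for n in count_a.values()) + sum(n - 1 for n in count_b.values())
    # Missing trace: every trace that takes part in no link at all.
    missing = sum(1 for i in ids_a if i not in count_a) + sum(
        1 for i in ids_b if i not in count_b
    )
    return -(extra_weight * extra), -(missing_weight * missing)
```

Returning the two penalties separately keeps them usable as two distinct factors with their own weights.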

Page 13: BPI@BPM2011

Indicator factors

Time difference
  Assumption: For a certain trace t in one log the trace in the other log that starts sooner after t has a higher chance to match
  More difficult when traces overlap

[Figure: example timestamps of the two logs. One log shows JAN 12:00, JAN 12:10, JAN 12:20, JAN 12:30, JAN 12:40, JAN 12:50; the other shows JC 10 11:45, JC 11 11:55, JC 12 12:05, JC 13 12:15, JC 14 12:25, JC 15 12:35. Trace ids shown: 16, 17, 18, 19, 20, 21 and 17, 18, 19, 1A, 1B, 1C.]
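A minimal sketch of a time difference score: a trace in the other log that starts shortly after t scores close to 1, a later one scores lower, and one that starts before t scores 0. Measuring from the start of t, the formula, and the function names are all assumptions; the slides only state that sooner is better.

```python
from datetime import datetime
from typing import List


def time_score(start_a: datetime, start_b: datetime) -> float:
    """Score how soon a trace in the other log starts after trace t starts."""
    delay = (start_b - start_a).total_seconds()
    if delay < 0:
        return 0.0  # the other trace started before t
    return 1.0 / (1.0 + delay / 60.0)  # decays with the delay in minutes


def closest_follower(start_a: datetime, starts_b: List[datetime]) -> datetime:
    """Pick the start time in the other log with the highest time score."""
    return max(starts_b, key=lambda start: time_score(start_a, start))
```

With the timestamps from the slide figure, a trace starting at 12:00 would be paired with the one starting at 12:05 in the other log. When traces overlap, several candidates get similar scores, which matches the remark that overlap makes this factor less decisive.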

Page 14: BPI@BPM2011

User interaction

Step 1: let user adapt parameters & weights
Step 2: give feedback on individual scores: user can change weights and restart
? Step 3: present best solution per factor: let user choose which factor dominates based on factor score feedback
? Step 4: provide other ways for user to feed algorithm with his insights

Page 15: BPI@BPM2011

Test results

Simulated data (300-400 msec on standard laptop)
  Benefit of controllable parameters, known solution
  Correct number of linked traces in all tests
  Perfect results for same trace id and up to 50% noise, worse results for higher overlap of traces
Real data (6-10 min on standard laptop)
  Correct number of linked traces in all tests
  Almost perfect results for same trace id and up to 50% noise, worse results for higher overlap

Page 16: BPI@BPM2011

Further research plans

Refining merging technique
  Quest for optimal indicators and weights is continuous effort (based on experiences from case studies)
  Implementation optimisation (speed, memory usage, scalability) is continuous effort
Validation (case studies)

Page 17: BPI@BPM2011

Questions

Do you agree that a combined set of logical assumptions can be a strong indicator (stronger than individual assumptions)?
Any feedback on the used factors?
Any other factors that should be included?
Any concerns about performance and scalability?

Page 18: BPI@BPM2011

Contact information

Jan [email protected]

http://processmining.ugent.be
Twitter: @janclaesbelgium