Process Mining: Understanding and Improving Desire Lines in Big Data


We are pleased to announce the lecture “Process Mining: Understanding and Improving Desire Lines in Big Data” in honour of doctor honoris causa Wil van der Aalst.

Wednesday May 30th - 10.00 a.m. - 12.00 noon, Hasselt University, campus Diepenbeek (Agoralaan, building D) - auditorium H5

The Faculty of Business Economics of Hasselt University is pleased to invite you to the lecture “Process Mining: Understanding and Improving Desire Lines in Big Data”. This lecture is organised to honour prof. dr. Wil van der Aalst, on whom the degree of ‘doctor honoris causa’ will be conferred by Hasselt University, Faculty of Business Economics (promotor prof. Koen Vanhoof).

Professor van der Aalst is a full professor of Information Systems at the Technische Universiteit Eindhoven (TU/e). He is currently also an adjunct professor at Queensland University of Technology (QUT). His research interests include workflow management, process mining, Petri nets, business process management, process modeling, and process analysis. Many of his ideas have influenced researchers, software developers, and standardization committees working on process support.

Text of Process Mining: Understanding and Improving Desire Lines in Big Data

1. Process Mining: Understanding and Improving Desire Lines in Big Data. Wil van der Aalst

2. (outline)
  • Let's Play: Play-Out, Play-In, Replay
  • Big Data
  • Desire Lines
  • Process Mining
  • How Good is My Model?
  • Process Discovery
  • Conformance Checking
  • Food for Thought: Lasagna and Spaghetti
  • Google Maps and TomTom
  • How to Get Started?
  • Conclusion
3. On the different roles of (process) models
4. Play-Out: process model → event log
5. Play-Out (Classical use of models)
   [Figure: Petri net with places start, p1, p2, p3, p4, end and transitions A, B, C, D, E.]
   Traces: ABCD, AED, AED, ABCD, ACBD, ACBD, AED, ACBD
6. Play-In: event log → process model
7. Play-In
   Traces: ABCD, AED, AED, ABCD, ACBD, ACBD, AED, ACBD
   [Figure: Petri net with places start, p1, p2, p3, p4, end and transitions A, B, C, D, E.]
8. Example Process Discovery (Vestia, Dutch housing agency, 208 cases, 5987 events)
9. Example Process Discovery (ASML, test process lithography systems, 154966 events)
10. Example Process Discovery (AMC, 627 gynecological oncology patients, 24331 events)
11. Replay: event log + process model → extended model showing times, frequencies, etc.; diagnostics; predictions; recommendations
12. Replay: trace ABCD
   [Figure: the same Petri net.]
13. Replay: trace AED
   [Figure: the same Petri net.]
14. Replay can detect problems: trace ACD
   Problem! Missing token. Problem! Token left behind.
   [Figure: the same Petri net.]
15. Conformance Checking (WOZ objections Dutch municipality, 745 objections, 9583 events, f = 0.988)
16. Replay can extract timing information
   Trace with timestamps: A^5 B^8 C^9 D^13
   [Figure: the same Petri net annotated with times.]
17. Performance Analysis Using Replay (WOZ objections Dutch municipality, 745 objections, 9583 events, f = 0.988)
18. Big Data
19. Big Data: All of the world's music can be stored on a $600 disk drive. Enterprises globally stored more than 7 exabytes of new data on disk drives in 2010, while consumers stored more than 6 exabytes of new data on devices such as PCs and notebooks. Indeed, we are generating so much data today that it is physically impossible to store it all.
Health care providers, for instance, discard 90 percent of the data that they generate.
   Source: Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute, 2011.
20. Hilbert and Lopez. The World's Technological Capacity to Store, Communicate, and Compute Information. Science, 332(6025):60-65, 2011.
21-24. (no text)
25. Evidence-Based Business Process Management
26. (no text)
27. Process Mining
28. Process Mining = Event Data + Processes; Data Mining + Process Analysis; Machine Learning + Formal Methods
29. Process Mining
   [Figure: the "world" (business processes, people, machines, components, organizations) is supported/controlled by a software system, which records events (e.g., messages, transactions, etc.) in event logs. A (process) model models and analyzes the world, and specifies, configures, implements, and analyzes the software system. Discovery, conformance, and enhancement connect event logs and (process) model.]
30. Starting point: event log. XES, MXML, SA-MXML, CSV, etc.
31. Simplified event log: a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, and h = reject request
32. Process discovery
   [Figure: Petri net with places start, c1, c2, c3, c4, c5, end and transitions a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), h (reject request).]
33. Conformance checking
  • case 7: e is executed without being enabled
  • case 8: g or h is missing
  • case 10: e is missing in second round
   [Figure: the Petri net of slide 32.]
34. Extension: Adding perspectives to model based on event log. The event log can be used to discover roles in the organization (e.g., groups of people with similar work patterns).
These roles can be used to relate individuals and activities. Performance information (e.g., the average time between two subsequent activities) can be extracted from the event log and visualized on top of the model. Decision rules (e.g., a decision tree based on data known at the time a particular choice was made) can be learned from the event log and used to annotate decisions.
   Roles: Role A (Assistant): Pete, Mike, Ellen; Role E (Expert): Sue, Sean; Role M (Manager): Sara.
   [Figure: the Petri net of slide 32 with each activity annotated with a role.]
35. How good is my model?
36. Four Competing Quality Criteria (process discovery)
  • fitness: able to replay event log
  • simplicity: Occam's razor
  • generalization: not overfitting the log
  • precision: not underfitting the log
37. Example: one log, four models
   Event log (all 21 variants seen in the log):
     #     trace
     455   acdeh
     191   abdeg
     177   adceh
     144   abdeh
     111   acdeg
     82    adceg
     56    adbeh
     47    acdefdbeh
     38    adbeg
     33    acdefbdeh
     14    acdefbdeg
     11    acdefdbeg
     9     adcefcdeh
     8     adcefdbeh
     5     adcefbdeg
     3     acdefbdefdbeg
     2     adcefdbeg
     2     adcefbdefbdeg
     1     adcefdbefbdeh
     1     adbefbdefdbeg
     1     adcefdbefcdefdbeg
     1391  (total)
   N1: fitness = +, precision = +, generalization = +, simplicity = +
   N2: fitness = -, precision = +, generalization = -, simplicity = +
   N3: fitness = +, precision = -, generalization = +, simplicity = +
   N4: fitness = +, precision = +, generalization = -, simplicity = -
   [Figure: the four models N1 to N4, surrounded by the four quality criteria of slide 36.]
38. Model N1 (event log as on slide 37)
   N1: fitness = +, precision = +, generalization = +, simplicity = +
   [Figure: model N1, the Petri net of slide 32 without place names.]
39. Model N2 (event log as on slide 37)
   N2: fitness = -, precision = +, generalization = -, simplicity = +
   [Figure: model N2, a sequence start, a (register request), c (examine casually), d (check ticket), e (decide), h (reject request), end.]
40. Model N3 (event log as on slide 37)
   N3: fitness = +, precision = -, generalization = +, simplicity = +
   [Figure: model N3, a Petri net over activities a to h.]
41. Model N4 (event log as on slide 37)
   N4: fitness = +, precision = +, generalization = -, simplicity = -
   [Figure: model N4, one path from start to end per variant (all 21 variants seen in the log), e.g. adceg, acdeg, adceh, acdeh, ..., abdeg, adbeh, abdeh.]
42. Process Discovery
43. Process Discovery (small selection)
  • distributed genetic mining
  • automata-based learning
  • language-based regions
  • heuristic mining
  • genetic mining
  • state-based regions
  • LTL mining
  • stochastic task graphs
  • neural networks
  • fuzzy mining
  • hidden Markov models
  • mining block structures
  • α algorithm
  • conformal process graph
  • multi-phase mining
  • partial-order based mining
  • α# algorithm
  • ILP mining
  • α++ algorithm
44. Petri net view: Just discover the places
   [Figure: place p(A,B) connected to transitions a1, a2, ..., am and b1, b2, ..., bn, shown next to the four quality criteria of slide 36.]
   A = {a1, a2, ..., am}, B = {b1, b2, ..., bn}
   Adding a place limits behavior:
  • overfitting: adding too many places
  • underfitting: adding too few places
45. Example: Process Discovery Using State-Based Regions
   [Figure: event log → transition system with states [], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d] → Petri net with places start, p1, p2, p3, p4, end and transitions a, b, c, d, e.]
46. Example of State-Based Region
   enter: b, e; leave: d; do-not-cross: a, c
   [Figure: the transition system and Petri net of slide 45.]
47. Example: Process Discovery Using Language-Based Regions
   A place is feasible if it can be added without disabling any of the traces in the event
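The feasibility idea on the language-based regions slide can be illustrated with a small sketch. It is not taken from the slides: it assumes unit arc weights, describes a candidate place only by the set of transitions that put a token into it (`producers`) and the set that take a token out of it (`consumers`), and reuses the eight traces from the Play-In slide as the log. The function name `is_feasible` and both parameter names are hypothetical.

```python
from typing import Iterable, Set

# Traces copied from the Play-In slide (slide 7).
LOG = ["ABCD", "AED", "AED", "ABCD", "ACBD", "ACBD", "AED", "ACBD"]


def is_feasible(producers: Set[str], consumers: Set[str],
                log: Iterable[str], initial_tokens: int = 0) -> bool:
    """Return True if a place with these producers/consumers disables no trace.

    Assumption: every arc has weight 1. Each trace is replayed on the
    candidate place alone; the place is infeasible as soon as a consuming
    activity occurs while the place holds no token.
    """
    for trace in log:
        tokens = initial_tokens
        for activity in trace:
            if activity in consumers:
                if tokens == 0:
                    return False  # this activity would be disabled
                tokens -= 1
            if activity in producers:
                tokens += 1
    return True


# A place filled by A and emptied by B or E never blocks a trace.
print(is_feasible({"A"}, {"B", "E"}, LOG))
# A place forcing B before C blocks the trace ACBD.
print(is_feasible({"B"}, {"C"}, LOG))
```

The check only covers the "does not disable any trace" condition quoted on the slide; it does not test whether tokens are left behind at the end of a trace, which is a separate replay diagnostic.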