A business process mining application for internal transaction fraud mitigation

Expert Systems with Applications 38 (2011) 13351–13359


Mieke Jans a,*, Jan Martijn van der Werf b, Nadine Lybaert a, Koen Vanhoof a

a Faculty of Business Economics, Hasselt University, Agoralaan, Gebouw D, 3590 Diepenbeek, Belgium
b Department of Mathematics and Computer Science, Technische Universiteit Eindhoven, P.O. Box 513, 5600 MB Eindhoven, The Netherlands

* Corresponding author. Tel.: +32 11268652. E-mail addresses: [email protected] (M. Jans), [email protected] (J.M. van der Werf), [email protected] (N. Lybaert), koen.vanhoof@uhasselt.be (K. Vanhoof).

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.04.159

Keywords: Internal fraud; Transaction fraud; Process mining

Abstract

Corporate fraud these days represents a huge cost to our economy. In this paper we address one specific type of corporate fraud: internal transaction fraud. Given the omnipresence of stored history logs, the field of process mining emerges as an adequate answer to mitigating internal transaction fraud. Process mining diagnoses processes by mining event logs. This way we can expose opportunities to commit fraud in the followed process. In this paper we report on an application of process mining at a case company. The procurement process was selected as an example for internal transaction fraud mitigation. The results confirm the contribution process mining can provide to business practice.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, the problem of internal fraud has received more and more attention. This attention is not unfounded: the Association of Certified Fraud Examiners (ACFE), an American organization operating worldwide that studies internal fraud, estimates a US company’s losses from internal fraud to be 7% of its annual revenues (ACFE, 2008). In a previous ACFE report, in 2006, this estimate was only 5%, confirming the increasing threat internal fraud poses to companies.

Internal fraud has received a great deal of attention from interested parties such as governments and non-profit institutions. The emergence of fraud in our economic world did not go unnoticed. A US fraud standard (Statement on Auditing Standard No. 99) and an international counterpart (International Standard on Auditing No. 240) were created to point auditors to their responsibility relating to fraud in an audit of financial statements. Section 404 of the Sarbanes–Oxley Act of 2002 and the Public Company Accounting Oversight Board’s (PCAOB) Auditing Standard No. 2 also address this issue. Meanwhile, the CEOs of the International Audit Networks released a special report in November 2006. This report, issued by the six largest global audit networks, was released in the wake of corporate scandals. The authors of this report express their belief in fighting fraud, as they name it “one of the six vital elements, necessary for capital market stability, efficiency and growth”.1 All these standards and reports address the issue of internal fraud (as opposed to external fraud, which is fraud committed by someone externally related to the company).

1 The remaining five elements concern investor needs for information, the alignment and support of the roles of various stakeholders, the auditing profession, reporting and information quality.

In general, two categories of internal fraud can be distinguished: financial statement fraud and transaction fraud. Bologna and Lindquist (1995) define financial statement fraud as ‘the intentional misstatement of certain financial values to enhance the appearance of profitability and deceive shareholders or creditors’. Statement fraud concerns the abuse of a manager’s position (hence ‘management fraud’) to alter financial statements in such a way that they no longer give ‘a true and fair view’ of the company. Transaction fraud, however, can be committed by both management and non-management. The intention of transaction fraud is to steal or embezzle organizational assets. Violations range from asset misappropriation and corruption, over pilferage and petty theft, false overtime and the use of company property for personal benefit, to payroll and sick time abuses (Wells, 2005). Davia, Coggins, Wideman, and Kastantin (2000) state that the main difference between statement and transaction fraud is that no theft of assets is involved in financial statement fraud (FSF).

Turning to academic studies on this subject, some research on internal fraud can be found. Green and Choi (1997), Lin, Hwang, and Becker (2003) and Fanning and Cogger (1998) assess the risk of FSF by means of neural networks. Deshmukh and Talluru (1998) use a rule-based fuzzy reasoning system for the same goal, and Kirkos, Spathis, and Manolopoulos (2007) use several data mining techniques to identify financial factors for assessing the risk of FSF. Hoogs, Kiehl, Lacomb, and Senturk (2007) use a genetic algorithm approach to detect patterns in publicly available financial data that are characteristic of FSF. Their method uses a sliding window to evaluate patterns of financial data over quarters as potentially fraudulent or not.


As can be seen, all articles on internal fraud that use expert systems discuss financial statement fraud, which is only one type of internal fraud. Aside from this, many expert systems are investigated in the context of external fraud. (External fraud is fraud committed by someone external to the company, for example a supplier sending false invoices.) It is no coincidence that only one type of internal fraud, transaction fraud, is not yet addressed in the academic literature. Looking at the articles on internal statement fraud and external fraud, all but a few studies use supervised data sets. Supervised data sets are provided with a labeled output attribute, in this case ‘fraudulent’ versus ‘legitimate’. The availability of such data sets in the case of statement fraud can be explained by the public nature of financial statements. A company needs to file its financial statements with the government. As a result, fraud committed on these statements is gathered at one central point and is normally classified meticulously in order to prosecute these companies. Files on external fraud are also classified very precisely for the same reason. Also, there are no reputation-related incentives to keep these fraud numbers away from the public, as there are with discovered internal fraud. The faith of stakeholders in the company plummets when stories about internal fraud leak. While a company cannot control this ‘leakage’ for statement fraud uncovered by the government, it can control the information on transaction fraud. This incentive, together with the dispersed methods of committing transactional fraud and the lack of sufficient, meticulously documented fraud files in a company or business process, leads to a general absence of supervised data sets concerning transactional fraud. We believe this is the reason for the literature gap on expert systems for internal transaction fraud. This gap contrasts strikingly with the costs that accompany this type of fraud.

In two other papers (Jans, Lybaert, & Vanhoof, 2009, 2010), we suggest applying descriptive data mining techniques for internal fraud risk reduction, which also includes mitigating transaction fraud. In this paper, we wish to extend the suggested framework with the field of process mining. Yang and Hwang (2006) already use a process mining approach to detect health care fraud, a type of external fraud that is intensively investigated. We believe the added value of process mining is particularly high in the mitigation of internal transaction fraud. By mitigation, we mean both fraud detection and fraud prevention. By applying process mining to business processes, a company gains insight into the way procedures are followed or circumvented. This study reports on the application of process mining in a case company.

An organization has business processes mapped out in procedures, guidelines, user guides, etc. With process mining, we visualize the actual process that occurs in a certain business unit instead of the designed process. This way one can detect flows or sub-flows that, for example, were not meant to exist. This can give insight into potential ways of misusing or abusing the system. Process mining also makes it possible to specifically monitor internal controls, such as the four-eyes principle or the segregation of duties. As opposed to currently widely used internal control tests, the process mining approach to monitoring internal control is data oriented, not system oriented. In other words: we are able to test whether the true transactional data (the output of the internal control system) are effectively subjected to the presumed internal controls. Instead of testing whether the internal control settings function by performing a set of random tests, we mine the actual submitted data and are able to test whether all conditions are met.

Another advantage is the objectivity with which process mining techniques work, without making any presuppositions. We see the exploratory diagnostics step as a starting point to evaluate with an open mind what opportunities possible deviations can offer a perpetrator. This is opposed to interpreting results with a specific fraud in mind, which may result in blindness to other opportunities. On the other hand, when mining the organizational and the case perspective (see below), it can be beneficial to have some specific fraud(s) in mind. This is certainly the case when monitoring internal controls. At this stage, specific internal controls, motivated by specific frauds, are monitored and checked.

We start the paper with an introduction to process mining in Section 2. In Section 3 we give information on the technique used in this application. In Section 4, the application of this technique in a case company is presented. Sections 4.1 through 4.5 describe the process diagnostic steps. Process diagnostics are necessary first to confirm that the event log captures the general process and next to reveal weaknesses and problems in the business process. In Section 5 we advance to a verification step where we check whether certain assertions about the process hold. We end with a conclusion in Section 6.

2. Process mining

Many information systems that nowadays support business processes, such as ERP, WFM, CRM and B2B systems, are characterized by the omnipresence of logs. Typically, these information systems record information about the usage of the system by its users. These logs contain information about the instances processed in the system (also called cases), the activities executed for each instance, at what time the activities were executed and by whom. Some systems also contain information about the data users entered for each activity. However, this data is not actively used by the organization to analyze the underlying processes supported by the system.

Process mining aims to make a difference. “The basic idea of process mining is to diagnose processes by mining event logs for knowledge” (van der Aalst & de Medeiros, 2005). It allows one to analyze these event logs, sometimes also referred to as ‘audit trail’, ‘transaction log’ or ‘history’. Records in these logs are called events, or ‘audit trail entries’. In process mining, each event needs to refer to an activity for a specific case or process instance. Preferably, each event also refers to the performer, the originator of the event, and a time stamp. These are the constraining assumptions for each process under investigation. If the available data fulfills these assumptions, process mining can be applied to that particular process.

Event logs are the starting point of process mining. The data of the event log can be mined and different aspects of the underlying process can be analyzed. In general, three different perspectives can be distinguished: the process perspective, the organizational perspective and the case perspective. The process perspective tries to answer the “How?” question and focuses on the ordering of activities. The main focus in this perspective is process discovery and delta analysis. It tries to answer the question “Which paths are followed?” in the broadest sense. Typically, one of the results in this perspective is a process model, mostly expressed in graphical notations such as Petri nets, event-driven process chains (EPCs) or the business process modeling notation (BPMN). The organizational perspective focuses on the “Who?” question. It analyzes event logs based on the users, called the originators, that play a role within the process. In this perspective, underlying relations between performers or between performers and tasks can be exposed. Typical examples in this perspective are social networks, handover of work, and cooperation diagrams. The case perspective, or the “What?” question, focuses on a case in isolation. Typically, for this analysis the log needs to be enriched with extra data about the case. This can be data about the complete case, or data for a specific event, such as the data submitted at the event (van der Aalst et al., 2007).
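To make the event log assumptions and the first two perspectives concrete, the following minimal Python sketch may help. The `Event` class, the two helper functions and the example entries are hypothetical illustrations and are not taken from the case study or from any process mining tool.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One audit trail entry (hypothetical structure for illustration)."""
    case_id: str         # process instance the event belongs to
    activity: str        # what was done
    originator: str      # who did it
    timestamp: datetime  # when it was done


def traces(log):
    """Process perspective: the ordered activity sequence of each case."""
    by_case = defaultdict(list)
    for ev in sorted(log, key=lambda e: e.timestamp):
        by_case[ev.case_id].append(ev.activity)
    return dict(by_case)


def work_per_originator(log):
    """Organizational perspective: how often each originator performs each activity."""
    return Counter((ev.originator, ev.activity) for ev in log)


# Made-up example entries, not data from the case company.
log = [
    Event("PO-1", "Create PO", "alice", datetime(2007, 1, 2, 9, 0)),
    Event("PO-1", "Sign", "bob", datetime(2007, 1, 2, 11, 0)),
    Event("PO-1", "Release", "bob", datetime(2007, 1, 3, 8, 30)),
    Event("PO-2", "Create PO", "alice", datetime(2007, 1, 4, 10, 0)),
    Event("PO-2", "Release", "carol", datetime(2007, 1, 4, 15, 0)),
]
```

Here `traces(log)` groups the events by case and orders them by time stamp, while `work_per_originator(log)` counts originator and activity pairs. The case perspective would additionally require data attributes on cases or events, which this sketch leaves out.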


In the context of internal transactional fraud mitigation, the most important perspective to start with is the process perspective. At a later stage, we turn to the organizational and the case perspective. This order will also be followed in this study.

3. Process discovery

An important activity within the process perspective is process discovery, i.e., discovering a process model that explains the paths followed by the cases in the log. In general this is a hard problem, since the process model should not only explain the paths followed, but also abstract from certain infrequent paths in order to remain readable and understandable. In the literature one can find many different algorithms, e.g., the alpha algorithm (van der Aalst, Weijters, & Maruster, 2004), the multi-phase algorithm (Dongen, 2007) and the genetic miner (Alves de Medeiros, 2006; Alves de Medeiros, Weijters, & van der Aalst, 2007). The alpha algorithm was one of the first algorithms to generate a process model, and it was shown that if the log is complete and the process that generated the log belongs to a certain class, the process model that generated the log can be reconstructed. In general, however, these conditions do not hold. Therefore, the multi-phase miner returns an EPC in which all paths of the log can be executed. The genetic miner uses genetic algorithms and heuristics to decide whether a process model in a population fits the log, or whether the population needs to be updated.

The disadvantage of the aforementioned algorithms is that the models they return are static views of the process, without showing details about, e.g., main streams. The Fuzzy Miner (Günther, 2009; Günther & Aalst, 2007a, 2007b) uses a fundamentally different approach. It uses methods from thematic cartography to provide a dynamic view of the process. In thematic cartography, the main concept is generalization: abstracting from certain details, which may change over time. In the Fuzzy Miner, two metrics are identified: significance and correlation. The first metric calculates the frequency of event occurrences and their order: the more frequently a precedence relation is observed, the more significant it is. The second metric, correlation, measures how closely events are related. This can be measured by the data they share, or by the similarity of activity names. To generate a process model, the miner simplifies the process based on how significant and correlated events are: highly significant behavior is preserved; less significant but highly correlated behavior is aggregated into clusters; and less significant, lowly correlated behavior is left out. Based on these measures, views on the event log can be generated dynamically. This allows the analyst to zoom in and out on certain aspects of the process model.
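The idea that a precedence relation becomes more significant the more often it is observed can be illustrated with a small Python sketch. This is a deliberate simplification and not the Fuzzy Miner's actual metric set: the function name and the normalization by the most frequent relation are assumptions made here for illustration only.

```python
from collections import Counter


def precedence_significance(traces):
    """Frequency-based significance of each directly observed precedence relation.

    Assumption for illustration: significance is the relation's frequency
    divided by the frequency of the most common relation, giving values in (0, 1].
    """
    counts = Counter()
    for trace in traces:
        # Each pair of consecutive activities is one observed precedence relation.
        counts.update(zip(trace, trace[1:]))
    if not counts:
        return {}
    top = max(counts.values())
    return {pair: n / top for pair, n in counts.items()}


# Made-up traces, not taken from the case study.
example_traces = [["A", "B", "C"], ["A", "B"], ["A", "C"]]
significance = precedence_significance(example_traces)
```

In this toy log the relation from A to B is seen twice and therefore receives the highest value, while the relations seen once receive half of that. A simplification step could then keep, cluster or drop behavior by comparing such values against thresholds, combined with a separate correlation measure that this sketch does not model.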

4. Application at procurement process

For the application of process mining to internal transaction fraud mitigation, we obtained the cooperation of a case company. The case company, which chooses to stay anonymous, is ranked in the top 20 of European financial institutions. The business process selected for mitigating internal transaction fraud is procurement, so data from the case company’s procurement cycle is the input of our study. More specifically, the creation of purchasing orders (POs) was adopted as the process under investigation. This choice is inspired by the lack of fraud files (at the compliance department) on this business process within the case company, while one assumes this business process is as vulnerable to fraud as any other business process. On top of this, procurement is a very typical field for committing transaction fraud.

In the first part of the case study, we want to support the ideas of the domain experts about the process. For this purpose, we perform a process diagnostic step. A good methodology for process diagnostics by process mining can be found in Bozkaya, Gabriels, and van der Werf (2009), and it is the methodology applied in the next paragraphs. It consists of five phases: log preparation, log inspection, control flow analysis, performance analysis, and role analysis.

Process diagnostics only focuses on a global view of the business process, in order to help the analysts and domain experts reveal weaknesses and problems in the business process. In the second part of the case study, we advance to a verification step. In this step, we check whether certain aspects and conditions of the process hold. This will be elaborated in Section 5. We now start with the five phases of process diagnostics.

4.1. Log preparation

As a start, a text dump is made from the ERP system SAP. All POs that resulted in an invoice in 2007 are the subject of our investigation. We restricted the database to invoices from Belgium. This raw data is then reorganized into an event log, and a random sample of 10,000 process instances out of 402,108 was taken (for reasons of computability). Before creating the event log, the different activities or events a case passes through have to be identified in order to meet the assumptions.

It is beyond the scope of this paper to fully describe the procurement process supported by SAP. In essence, based on interviews with domain experts, a PO is created, signed and released. Then the goods are received, an invoice is received and it is paid. During this process, different aspects are logged by the ERP system, which form the input to our event log. The first question we need to ask ourselves is ‘What would be a correct process instance to allocate events to?’.

After examining its feasibility, the PO item line was selected as the process instance to allocate events to. We established the following events as activities of the process:

– Creation of the PO (parent of item line).
– (Change of the particular item line).
– Sign of parent PO.
– Release of parent PO.
– Goods Receipt on item line (GR).
– Invoice Receipt on item line (IR).
– Payment of item line.

The change of an item line is not a mandatory event and can occur at several different moments in the process. This change can trigger a new ‘Sign’ and ‘Release’, but this is not always the case. Also important to note is the double dimensionality of the events. ‘Create PO’, ‘Sign’ and ‘Release’ are activities that occur at the header (document) level of a PO. The remaining events are at the level of a PO line item. This can lead, for instance, to a ‘Sign’ and ‘Release’ in the audit trail of a particular PO line item (the process instance), while these events are not actually related to this line item, but perhaps to another line item of the same parent PO. This aspect is important to be aware of when interpreting the results.

For modeling the described process we use a Petri Net (e.g., Reisig, 1985). The Petri Net in Fig. 1 represents the procurement process at the case company. After the creation of the PO and the item line, the PO is released. Depending on the PO, an additional signature may be needed before it can be released. Often, the item line is changed between the creation and the sign and release activities. It is also possible that the item line is changed after it was released, so that a new sign and release need to be triggered. Only after a release can goods and an invoice be received, in any order. After receiving the invoice and goods, a payment can be made. Normally, both a goods receipt and an invoice receipt are prerequisites. However, in some circumstances no goods receipt is necessary. In these cases the goods receipt indicator must be turned off.

Fig. 1. Process model of procurement in Petri Net.

After turning the information from the SAP database into the suggested events and event log, this event log was converted to the MXML format, a generic XML format for storing event logs, which can be read by the tool ProM. In this format, it is possible to add extra data as attributes of the POs and their events. The attributes created in our event log are listed in Table 1. At the level of a process instance, we added the following information: the document type of the parent PO, the purchasing group that entered this parent PO, and the associated supplier. Although these three attributes are actually linked to the parent PO and not to a separate item line, this is useful information. Aside from these first three attributes, we also included the order quantity and unit of the PO item line, the resulting net value, and whether or not the goods receipt indicator was turned off. Next to this PO-related information, we also included the total quantity and total value of all goods receipts that are linked to this PO item line. We did the same for the related invoice receipts and for the total value of all payments that are associated with this process instance.

Table 1. Attributes of event log.

Level             Attribute                WFMElt
Process instance  Document type
                  Purchasing group
                  Supplier
                  Order quantity
                  Order unit
                  Net value
                  Goods receipt indicator
                  IR total quantity
                  IR total value
                  GR total quantity
                  GR total value
                  Pay total value
Audit trail entry Modification             Change Line
                  Relative Modification    Change Line
                  Reference GR             IR
                  Reference pay            IR
                  Quantity IR              IR
                  Value IR                 IR
                  Reference IR             GR
                  Quantity GR              GR
                  Value GR                 GR
                  Reference IR             Pay
                  Value                    Pay

At the level of the audit trail entry, a workflow model element also carries unique information. In particular, four events are enriched with additional information: ‘Change Line’, ‘IR’, ‘GR’, and ‘Pay’. If the event concerns a ‘Change Line’, we store information about the change: if it was a change of the net value, what was the size of this modification? If a field other than the net value was changed, for example the delivery address, this attribute contains a modification of zero. The other stored attribute gives us, in case of a change in net value, the size of the modification relative to the net value before the change (hence a percentage). If the event concerns an ‘IR’, four attributes are stored. We store the references that contain the (possible) link to the ‘GR’ and ‘Pay’, the quantity of the units invoiced, and the credited amount, the value. Notice that these quantities and values only concern this specific invoice receipt, as opposed to the invoice receipt related attributes of the process instance. Those attributes provide summarized information on all invoice receipts attached to the process instance. Also note that this information is not collected from an entire invoice, but only from the specific line that refers to the PO item line of this process instance. Similar to the ‘IR’, three attributes are stored when the event concerns a ‘GR’: the reference to possibly link this goods receipt to the associated ‘IR’ (this is only possible in a specific number of cases), the quantity of goods received and the resulting value that is assigned to this goods receipt. This value is the result of multiplying the goods receipt quantity by the price per unit agreed upon in the PO. The last event that is provided with attributes is ‘Pay’. The value of this payment is captured, as well as the key to create a link to an associated ‘IR’.
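The two derived attributes described above, the (relative) modification of a ‘Change Line’ and the value of a ‘GR’, amount to simple arithmetic. The sketch below shows one possible reading in Python; the function names and the handling of a zero net value are assumptions for illustration, since the paper does not specify them.

```python
def change_line_attributes(old_net_value, new_net_value):
    """Return (modification, relative modification in %) for a 'Change Line' event.

    A change to a field other than the net value leaves the net value
    untouched and therefore yields a modification of zero.
    Assumption: when the old net value is zero the relative size is undefined,
    so None is returned for it.
    """
    modification = new_net_value - old_net_value
    if old_net_value == 0:
        return modification, None
    return modification, 100.0 * modification / old_net_value


def goods_receipt_value(quantity_received, unit_price):
    """Value of a goods receipt: received quantity times the unit price agreed in the PO."""
    return quantity_received * unit_price
```

For example, raising a net value from 200.0 to 250.0 gives a modification of 50.0 and a relative modification of 25.0 percent, and receiving 3 units at an agreed unit price of 12.5 gives a goods receipt value of 37.5.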

After collecting all the data necessary for the event log, ProM Import is used to convert our event log into the desired MXML format. The MXML file is analyzed next using the open-source tool ProM. For more information about ProM, we refer the reader to van Dongen, de Medeiros, Verbeek, Weijters, and van der Aalst (2005) and to www.processmining.org.

4.2. Log inspection

As already stated, we start with a random sample event log of 10,000 process instances. A process instance is a PO item line. The process analyzed in this paper contains seven real activities. The log at hand contains 62,531 events in total, and 290 originators participated in the process execution. All audit trails (the flow one process instance follows) start with the event ‘Create PO’, but they do not all end with ‘Pay’. The ending log events are ‘Pay’ (93.85%), ‘Change Line’ (5.02%), ‘Release’, ‘IR’, ‘GR’ and ‘Sign’. Since not all audit trails end with ‘Pay’, we could add an artificial ‘End’ task before we start mining this process. However, we choose to clean up the event log and filter on cases that start with ‘Create PO’ and end with ‘Pay’. There are two ways to obtain this. Either we filter out all process instances that do not end with ‘Pay’, or we keep the randomly selected process instances but cut off the audit trail after the last ‘Pay’ activity of that trail. We have chosen the latter option. This choice is inspired by the fact that if we filter out all POs that do not end with ‘Pay’, we might filter out a certain group of POs that behave in a different manner. We think, for example, of POs that are used over and over again. The audit trail of such a PO may look as follows: Create PO–Sign–Release–GR–IR–Pay–Change Line–Sign–Release–GR–IR–Pay–Change Line, etc. By filtering POs on ‘end task equals ‘Pay’’ we could create a bias in the proportion of this kind of PO in the total data set. By cutting off the audit trail after the last payment, we preserve the original representation of PO behavior. This cleaning step resulted in an event log of 10,000 cases with 61,562 audit trail entries and 285 originators, which will be our process mining input.
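The chosen cleaning step, cutting each audit trail off after its last ‘Pay’, can be sketched in a few lines of Python. The function name is hypothetical, and returning `None` for a trail without any ‘Pay’ is an assumption made here; the paper does not describe that situation.

```python
def cut_after_last_pay(trace, end_activity="Pay"):
    """Keep the audit trail up to and including its last 'Pay' event.

    Assumption: a trail that contains no 'Pay' at all returns None,
    leaving the decision on such cases to the caller.
    """
    # Scan backwards so the first match is the last occurrence.
    for i in range(len(trace) - 1, -1, -1):
        if trace[i] == end_activity:
            return trace[: i + 1]
    return None


# A PO that is used over and over again, as in the example trail above.
reused_po = [
    "Create PO", "Sign", "Release", "GR", "IR", "Pay",
    "Change Line", "Sign", "Release", "GR", "IR", "Pay", "Change Line",
]
cleaned = cut_after_last_pay(reused_po)
```

For the reused PO, only the trailing ‘Change Line’ is removed, so the case stays in the log with both of its payment cycles instead of being filtered out.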

We first apply the Fuzzy Miner (a plugin in the tool ProM) to get a first glance of the real process. The result reveals Create PO–Sign–Release–IR–Pay as the most frequent path. This corresponds to the designed process model. The side paths are also easy to explain. The digression to Change Line and the use of a Goods Receipt before the Invoice Receipt are part of the designed model. The path of having a Goods Receipt after a payment is also easy to understand in the light of a split delivery.

4.3. Control flow analysis

The third step of the followed process diagnostics methodology is analyzing the control flow. In a first part, we wish to uncover the core process that is embedded in the event log, and to be able to confirm that the business process functions in a way that corresponds to the designed model. Using the performance sequence analysis plugin of ProM, we have a view on the patterns followed in this log. The analysis reveals 161 patterns. This is a very high number, certainly for such a relatively simple process model design. This already gives us an idea of the complexity of this process and the ‘noise’ in this event log. Recall that process diagnostics tries to get an overview of a log, while this noise is the main input and focus for fraud detection. Five and seven patterns suffice to cover 82% and 90% of the entire log, respectively (see Table 2). Inspection of these patterns with the domain expert already tells us that all these patterns are completely in accordance with the case company’s procedures. To discover a process model that covers the run of the mill, it is necessary to filter out the infrequent patterns. That is why we will only use the first five patterns (describing 82% of the log) to discover a process model. This way we can extract an understandable process model from the event log that describes the overall process. This model will in turn be compared with the designed process model, in order to ensure that the process in general is executed as desired.

Table 2. Top seven of most occurring sequences.

Pattern  Sequence                                      Occurrences (#)  Occurrences (%)
0        Create PO–Sign–Release–IR–Pay                 3,066            30.7
1        Create PO–Sign–Release–GR–IR–Pay              2,528            25.3
2        Create PO–Change Line–Sign–Release–GR–IR–Pay  1,393            13.9
3        Create PO–Change Line–Sign–Release–IR–Pay     633              6.3
4        Create PO–Release–Change Line–IR–Pay          599              6.0
5        Create PO–Sign–Release–Change Line–IR–Pay     546              5.5
6        Create PO–Release–IR–Pay                      232              2.3
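The coverage figures quoted above (82% for five patterns, 90% for seven) follow directly from the occurrence counts of Table 2 and the 10,000 sampled process instances. The Python sketch below reproduces that arithmetic; the function names are hypothetical and the code is not the ProM plugin used in the study.

```python
from collections import Counter


def variant_counts(traces):
    """Count how often each distinct activity sequence (pattern) occurs."""
    return Counter(tuple(trace) for trace in traces)


def cumulative_coverage(counts, total_cases):
    """Return (share %, cumulative share %) per pattern, most frequent first."""
    rows, running = [], 0
    for n in sorted(counts, reverse=True):
        running += n
        rows.append((100.0 * n / total_cases, 100.0 * running / total_cases))
    return rows


# Occurrence counts of the seven most frequent patterns in Table 2,
# out of the 10,000 sampled process instances.
table2_counts = [3066, 2528, 1393, 633, 599, 546, 232]
coverage = cumulative_coverage(table2_counts, 10_000)
```

With these counts, the cumulative share after the fifth pattern rounds to 82% and after the seventh to 90%, matching the figures in the text. On a real log, `variant_counts` would supply the counts from the traces themselves.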

Taking the selection of the log with only the five most frequent patterns and applying the finite state machine (FSM) miner results in a transition diagram, which was the input for the tool Petrify to obtain a process model. The resulting process model is depicted in Fig. 2. Running a conformance check reveals that 80% of the total log is covered by this process model. This result is used as feedback to the domain experts. It was concluded that the general outlines of the process come forward clearly in the event log. This is seen as a reassuring start.

4.3.1. Exposing less frequent flows

Another contribution of the control flow analysis is to use the complete event log with the 10,000 cases and to look at the resulting flows when lower thresholds are used. Lowering the threshold settings will result in a graph with more edges, exposing flows that are less frequently followed. This is a convenient, visual way of looking at the most important infrequent paths. Turning back to the application of the Fuzzy Miner, we change the settings in such a manner that more flows become apparent. Concretely, we change the ‘Cutoff’ edge filter to the values 0.70 and 0.85. These different settings indeed result in models with more edges. Elevating the ‘Cutoff’ to 0.70 (compared to the default setting of 0.20) revealed two extra flows: ‘Create PO → Release’ and ‘Sign → GR’. Elevating the ‘Cutoff’ further to 0.85 (depicted in Fig. 3) revealed four more extra flows (on top of the other two):

– Create PO → GR.
– Release → Pay.
– Sign → IR.
– GR → Change Line.

Before discussing the six extra flows, visible in the graph in Fig. 3, an important aspect of interpreting these results has to be highlighted. The arcs from one event to another in a resulting graph of the Fuzzy Miner need to be seen in an AND/OR relationship, which is not visible in this output graph. This means that, for instance, an arc from activity A to activity B does not by definition mean that B directly follows A. Perhaps this arc should be interpreted along with another arc, from activity A to activity C. The two flows ‘A → B’ and ‘A → C’ may represent an AND (or OR) relationship (after A, B and/or C follow) without B by definition coming directly after A; the same holds for C. Hence, looking at the Fuzzy Miner result gives us ideas of extra flows, but deducing direct flows between one activity and another needs to be explicitly checked.

In the next paragraphs the six extra flows are discussed with thedomain experts and if necessary explicitly checked. Two flows arevery normal: ‘Create PO ? Release’ and ‘GR ? Change Line’. A‘Change Line’ can occur at every stage of the process and the fact

        Total     Throughput time (days)
        %         Average    Min    Max    Standard deviation
.7      31        16.75      3      176    9.59
.3      56        34.77      2      327    26.11
.9      70        29.74      4      328    36.46
.3      76        25.28      3      241    27.15
.0      82        68.88      4      264    39.23
.5      88        21.4       9      299    16.2
.3      90        20.04      2      197    24.16

Fig. 2. FSM Miner result.

Fig. 3. Fuzzy Miner result with ‘Cutoff’ = 0.85.


that the PO is not signed before it is released is a realistic possibility. However, there are certain conditions attached to leaving out the ‘Sign’. These will be verified at a later stage.

The flows ‘Create PO → GR’, ‘Sign → GR’ and ‘Sign → IR’ each have the same problem. A release is a prerequisite for ordering goods from a supplier (hence the name). Normally, a goods receipt or an invoice receipt can only arrive at a purchasing department after an order has been placed with a supplier. Following this train of thought, all three flows are contrary to the designed process, and should not exist if the SAP settings function as they should. Therefore, before looking for explanations or moving on to investigation, we need to confirm whether these flows really occur in these specific orders, or whether they are part of an AND/OR relationship that takes care of the above mentioned

restriction. For this, we use the LTL Checker plugin in ProM and test whether ‘eventually activity A next B’ holds, with A and B being the events in question. The LTL checks reveal that out of the 10,000 process instances, none showed the direct flow ‘Create PO → GR’, three instances had a direct flow ‘Sign → GR’ and again none had a flow ‘Sign → IR’.
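To make the meaning of ‘eventually activity A next B’ concrete, the check can be sketched outside ProM in a few lines of Python. This is only an illustration under assumptions: traces are represented as plain lists of activity names, the function names are hypothetical, and the sketch is not the implementation of the LTL Checker.

```python
from typing import Dict, List


def eventually_a_next_b(trace: List[str], a: str, b: str) -> bool:
    """Return True if activity `a` is immediately followed by `b` somewhere in the trace."""
    return any(x == a and y == b for x, y in zip(trace, trace[1:]))


def cases_with_direct_flow(log: Dict[str, List[str]], a: str, b: str) -> List[str]:
    """Return the ids of all cases whose trace contains the direct flow a -> b."""
    return [case_id for case_id, trace in log.items() if eventually_a_next_b(trace, a, b)]


# Hypothetical mini log (case ids and traces are made up for illustration).
example_log = {
    "case-1": ["Create PO", "Sign", "Release", "Sign", "GR", "Release"],
    "case-2": ["Create PO", "Sign", "Release", "GR", "IR", "Pay"],
}

direct_sign_gr = cases_with_direct_flow(example_log, "Sign", "GR")
```

Only cases returned by such a check exhibit the direct flow; an arc in the Fuzzy Miner graph alone does not guarantee it.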

We take the three process instances with the flow ‘Sign → GR’ under investigation. The first one shows a pattern of Create PO–Sign–Release–Sign–GR–Release. So a release has taken place before the Goods Receipt is entered into the system, confirming the SAP control settings. Because the events ‘Sign’ and ‘Release’ are both on the header level of a PO and hence not by definition linked to the process instance (only one line item of a PO), it could be that the ‘GR’ in this case fell in between a Sign–Release flow triggered by another line item. The other two process instances we looked into showed the same situation.

The last flow of the six, ‘Release → Pay’, raises the question whether a corresponding invoice was received for these payments. Normally, each ‘Pay’ should be preceded by an ‘IR’. Again we start by checking whether there exists a direct flow ‘Release → Pay’ for our process instances. We check the same formula ‘eventually activity A next B’ with A and B being ‘Release’ and ‘Pay’. There are 55 instances (i.e., only 0.55% of the cases) that show this direct flow. There are two possible scenarios for this flow: (1) the ‘IR’ has taken place before ‘Release’. This can again be explained in the same way as the ‘Sign → GR’ flow: a Sign–Release flow, triggered by another line item, popped in between an IR–Pay flow of this process instance. Or (2), there is no ‘IR’ related to this ‘Pay’. This condition can be tested and looked into later, at the verification step.

4.4. Performance analysis

In the performance analysis phase, questions like “Are there any bottlenecks in the process?” are answered. In this phase the average and maximum throughput times of cases are looked into and analyzed. A performance analysis can also help continuous auditing and fraud detection. It could be the case that a certain group of tasks is executed within a very short period of time, which might indicate an attempt at fraud. However, performance analysis applied to fraud detection is outside the scope of this paper.

4.5. Role analysis

In the fifth phase of process diagnostics, role analysis, the roles in a process are analyzed. A role should be seen as a person (in this case study) who is involved in the process by executing activities of that process. Role analysis attempts to answer questions like “Who executes what activities?” and “Who is working with whom?”. In this phase, it is interesting to check the effectiveness of the segregation of duty.

The segregation of duty is a principle to reduce potential damage from the actions of one employee (Elsas, 2008). It therefore prevents one single employee from having control over a critical

Fig. 4. Role-activity matrix.


combination of business transactions, such as a ‘Sign’ and a ‘Release’ authority in one single purchase. By looking at the role-activity matrix, we can get a first impression of whether a person executing the activity ‘Sign’ also executes the activity ‘Release’. A screenshot of part of the matrix can be found in Fig. 4. There we find, for instance, one originator who executed a release 1,733 times and also signed 1,512 POs. This is the most extreme case in the event log, but other originators also combine these two tasks. This matrix, however, does not tell us whether this is a problem, because if these signs and releases concern different POs, there is nothing wrong with having both authorities in one person. We do, however, find confirmation of the need to investigate this further. These checks require the case perspective of process mining, which brings us to the verification step, the second part of our analysis.
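A role-activity matrix of this kind is simply a count of executions per originator and activity. The following Python fragment is a minimal sketch, assuming events are available as (case id, activity, originator) tuples; the tuple layout, the function names and the user ids are hypothetical and do not reflect ProM's internal data model.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, str]  # (case id, activity, originator)


def role_activity_matrix(events: Iterable[Event]) -> Dict[str, Counter]:
    """Count, per originator, how often each activity was executed."""
    matrix: Dict[str, Counter] = defaultdict(Counter)
    for _case_id, activity, originator in events:
        matrix[originator][activity] += 1
    return dict(matrix)


def originators_combining(matrix: Dict[str, Counter], first: str, second: str) -> List[str]:
    """List originators that executed both activities at least once (in any case)."""
    return sorted(o for o, counts in matrix.items() if counts[first] > 0 and counts[second] > 0)


# Hypothetical events: u1 and u2 both sign and release, but never on the same PO.
events = [
    ("po-1", "Sign", "u1"),
    ("po-1", "Release", "u2"),
    ("po-2", "Sign", "u2"),
    ("po-2", "Release", "u1"),
    ("po-3", "GR", "u3"),
]
matrix = role_activity_matrix(events)
combined = originators_combining(matrix, "Sign", "Release")
```

In this made-up example both u1 and u2 are reported, although neither signs and releases the same PO, which is exactly why the matrix alone cannot settle whether the segregation of duty is violated.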

5. Verifying properties

After mainly looking at the process perspective, we turn to the case perspective of process mining by verifying certain properties. In this section we want to check whether certain assertions hold, i.e., whether the associated internal controls function effectively. Several internal control settings are possible in an ERP environment. Rather than just checking whether these settings are in place at a specific moment, we can test on the output data whether the internal controls function properly. We classify the checks to execute in three categories: checks on the segregation of duty, case specific checks and other internal control checks not belonging to one of the first two categories. For all these checks, we use the LTL Checker plugin of ProM.

5.1. Checks on segregation of duty

As was already confirmed by the role-activity matrix, there is a need to further investigate whether the segregation of duty is respected in this business process. After discussing with the domain expert which controls are interesting for a company to

check whether the segregation of duty is effective, we came to the following three checks:

– Are ‘Sign’ and ‘Release’ always executed by two distinct persons?
– Are ‘GR’ and ‘IR’ always entered by two distinct persons?
– Are ‘Release’ and ‘GR’ always executed by two distinct persons?

When designing the right formula to execute the first check, it is important to take into account that this has to be checked pairwise. If a release takes place, then a ‘Change Line’ occurs, and the next sign is performed by the previous releaser, this does not have to pose a problem. As long as the release following the last sign is given by another employee, the segregation of duty is intact. It turned out that this first proposition holds in all cases.
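The pairwise nature of this first check can be illustrated with a short Python sketch. It assumes a trace is a chronological list of (activity, originator) pairs and compares each ‘Release’ only with the most recent preceding ‘Sign’; the representation and the function name are assumptions, and the actual LTL formula used in ProM is not reproduced here.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, str]  # (activity, originator), in chronological order


def sign_release_violations(trace: List[Event]) -> List[int]:
    """Return positions of 'Release' events given by the originator of the last preceding 'Sign'."""
    violations: List[int] = []
    last_signer: Optional[str] = None
    for position, (activity, originator) in enumerate(trace):
        if activity == "Sign":
            last_signer = originator
        elif activity == "Release" and last_signer is not None and originator == last_signer:
            violations.append(position)
    return violations


# Hypothetical traces. In ok_trace the first releaser (u2) performs the next sign,
# but the release that follows that sign is given by someone else (u3).
ok_trace = [("Create PO", "u1"), ("Sign", "u1"), ("Release", "u2"),
            ("Change Line", "u1"), ("Sign", "u2"), ("Release", "u3")]
bad_trace = [("Create PO", "u1"), ("Sign", "u1"), ("Release", "u1")]
```

A check that merely asked whether one person ever performs both activities in a case would flag `ok_trace` incorrectly, whereas the pairwise comparison does not.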

To examine the second proposition we checked whether there exist cases where the activities ‘GR’ and ‘IR’ are executed by the same originator. The LTL Checker revealed that in all cases, these activities were performed by different originators.

We used the same formula to check whether there exists a case in which ‘Release’ and ‘GR’ are performed by the same originator. This check revealed that in 21 cases the originators of ‘Release’ and ‘GR’ were the same. Running some extra analyses on these cases revealed that only four persons were responsible for the 21 inconsistencies (as opposed to the 17 originators who were involved in the complete audit trails of the 21 cases). One person performed both activities ‘Release’ and ‘GR’ ten times, another person nine times, and two other persons each performed the two activities once (in one case). The 21 cases also belong to only five purchasing groups.
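The second and third checks, and the follow-up count of violations per person, can be sketched as follows. The sketch assumes each case is a list of (activity, originator) pairs; the log content and the function names are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]  # (activity, originator)


def shared_originators(trace: List[Event], first: str, second: str) -> Set[str]:
    """Originators that executed both `first` and `second` within one case."""
    did_first = {originator for activity, originator in trace if activity == first}
    did_second = {originator for activity, originator in trace if activity == second}
    return did_first & did_second


def violations_per_originator(log: Dict[str, List[Event]], first: str, second: str) -> Counter:
    """Count, per originator, the number of cases in which they performed both activities."""
    counts: Counter = Counter()
    for trace in log.values():
        counts.update(shared_originators(trace, first, second))
    return counts


# Hypothetical log with made-up user ids.
log = {
    "c1": [("Release", "u1"), ("GR", "u1"), ("IR", "u2")],
    "c2": [("Release", "u1"), ("GR", "u1"), ("IR", "u3")],
    "c3": [("Release", "u4"), ("GR", "u5"), ("IR", "u5")],
}
release_gr = violations_per_originator(log, "Release", "GR")
```

Grouping the violating cases by originator in this way is what narrows a list of cases down to a handful of persons to follow up.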

Whereas in the process diagnostic step the role analysis is restricted to a role-activity matrix, it is interesting to have a closer look at the collaboration between the 17 originators involved, because this is a smaller, manageable group. Having a look at the handover of work with the social network miner reveals the picture in Fig. 5. The true user IDs of the originators are erased for reasons of confidentiality. Hence the empty rectangles represent 17 unique originators. It is clear there are four groups of collaboration with

Fig. 5. Result of social network miner – handover of work for cases where ‘Release’ and ‘GR’ were not separated.


one person belonging to two of them. These relationships can be looked into by the domain expert to test whether they match the designed procedures.
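The idea behind a handover-of-work view can be illustrated by counting how often one originator is directly followed by another within a case. The Python fragment below is a simplified stand-in under that assumption; it is not the algorithm of ProM's social network miner, and the trace layout and user ids are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (activity, originator), in chronological order


def handover_of_work(log: Dict[str, List[Event]]) -> Counter:
    """Count directed handovers (from originator, to originator) between consecutive events."""
    handovers: Counter = Counter()
    for trace in log.values():
        for (_, source), (_, target) in zip(trace, trace[1:]):
            if source != target:  # a person continuing their own work is not a handover
                handovers[(source, target)] += 1
    return handovers


# Hypothetical log with made-up user ids.
log = {
    "c1": [("Create PO", "u1"), ("Sign", "u2"), ("Release", "u3"), ("GR", "u3")],
    "c2": [("Create PO", "u1"), ("Sign", "u2"), ("Release", "u4")],
}
handovers = handover_of_work(log)
```

The resulting pairs can be read as weighted, directed edges of a collaboration graph between originators.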

5.2. Case specific checks

Some very specific checks, related to the company under investigation, can also be formulated. In this case, for example, a ‘Sign’ is always needed before a ‘Release’ can be given, except in two situations:

– The PO document is of a certain ‘Type A’ and the total PO value is less than ‘amount B’.
– The supplier is ‘X’ and the total PO value is less than ‘amount C’.

Because we already found evidence of cases where no sign occurred, we checked these properties using the LTL Checker available in ProM. At first, we got 938 incorrect cases, but this was due to an error in ProM regarding the handling of floating point numbers. After changing the amounts into cents, ProM had no problem checking these constraints. Nevertheless, 259 cases were found not to follow this rule. This is over 2.5% of the random sample. There were 34 originators involved. How Epsilon will deal with this bigger anomaly is not known yet.
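The rule and the conversion of amounts into cents can be sketched in Python as follows. Everything specific in this sketch is an assumption for illustration: the field names, the document type and supplier labels, and in particular the threshold amounts, since ‘amount B’ and ‘amount C’ are not disclosed.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class Case:
    doc_type: str
    supplier: str
    total_value_cents: int  # amounts kept as integer cents to avoid floating point comparisons
    activities: List[str]   # chronological activity names


def to_cents(amount: str) -> int:
    """Convert a decimal amount given as text (e.g. '1250.10') to integer cents."""
    return int((Decimal(amount) * 100).to_integral_value())


def sign_exempt(case: Case, type_a: str, amount_b_cents: int,
                supplier_x: str, amount_c_cents: int) -> bool:
    """True if one of the two exceptions to the 'Sign before Release' rule applies."""
    return ((case.doc_type == type_a and case.total_value_cents < amount_b_cents)
            or (case.supplier == supplier_x and case.total_value_cents < amount_c_cents))


def violates_sign_rule(case: Case, **thresholds) -> bool:
    """True if a 'Release' occurs without an earlier 'Sign' and no exception applies."""
    signed = False
    for activity in case.activities:
        if activity == "Sign":
            signed = True
        elif activity == "Release" and not signed:
            return not sign_exempt(case, **thresholds)
    return False


# Hypothetical thresholds; the real 'amount B' and 'amount C' are not known.
THRESHOLDS = dict(type_a="A", amount_b_cents=to_cents("500.00"),
                  supplier_x="X", amount_c_cents=to_cents("1000.00"))

exempt_case = Case("A", "S1", to_cents("499.99"), ["Create PO", "Release", "GR"])
violating_case = Case("B", "S1", to_cents("499.99"), ["Create PO", "Release"])
signed_case = Case("B", "S1", to_cents("5000"), ["Create PO", "Sign", "Release"])
```

With integer cents, threshold comparisons such as "less than amount B" are exact, which sidesteps the kind of floating point problem mentioned above.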

In this case study we only formulated one case specific check. This can of course be elaborated, depending on the case company. We just want to show the applicability of the case perspective of process mining.

5.3. Other internal control checks

In this case study, we selected four remaining internal controls to check. The first internal control we wish to test is very straightforward: is every case in this event log released at least once? This is the minimum authorization each process instance must have passed. The second test checks whether no payment occurred without a corresponding invoice having been entered into the system. The third test checks whether the goods receipt indicator is indeed turned off when no ‘GR’ is found in an audit trail. The fourth test checks whether the change of a PO line item appropriately triggers a new sign and/or release.

The first control, at least one ‘Release’ per process instance, is not inspired by the process diagnostics steps. However, this is a fundamental step in the process design, which makes it vital to the verification of internal controls. Running this first check reveals one case that has no ‘Sign’ and no ‘Release’ activity. The process instance was started by a batch file, but nevertheless an invoice was entered and paid afterwards, without any authorization at all. Later, a credit note was entered for the same amount. The case company needs to examine how this case got through the system.

For the second control, is there for each ‘Pay’ a corresponding ‘IR’?, we have to use the attributes ‘Reference Pay’ and ‘Reference IR’ of the events ‘IR’ and ‘Pay’ respectively. Running the appropriate formula first gave us 14 cases with in total 71 ‘Pay’s that did not have a corresponding invoice. When looking closer at these payments, many of them turned out to be created by a batch file. Eliminating these payments resulted in ten cases (encompassing 32 stand-alone payments). Studying the seven involved originators (only the originators of the ‘Pay’ activities) revealed that one person was responsible for 22 stand-alone payments. This makes it less time consuming to examine all 32 payments. After manual examination, it turned out that all payments had an accompanying reference, but on another document type than ‘invoice receipt’. This other type, ‘Subsequent Debit’, was not taken into account in our event log. In future applications, this type, and maybe other types, should be inserted in the event log at the log preparation phase.
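A reference-based check of this kind can be sketched in Python. The sketch rests on assumptions that are not stated in the text: each event is a dictionary, each ‘IR’ carries a document identifier under a hypothetical key `doc_id`, the ‘Reference IR’ attribute of a ‘Pay’ names that identifier, and batch-created payments are recognizable by a hypothetical originator name `BATCH`.

```python
from typing import Dict, List, Optional

Event = Dict[str, Optional[str]]


def standalone_payments(trace: List[Event], batch_user: str = "BATCH") -> List[Event]:
    """Return 'Pay' events whose 'Reference IR' matches no 'IR' document in the same case.

    Payments created by the (hypothetical) batch user are skipped.
    """
    invoice_ids = {e["doc_id"] for e in trace if e["activity"] == "IR"}
    return [
        e for e in trace
        if e["activity"] == "Pay"
        and e.get("originator") != batch_user
        and e.get("Reference IR") not in invoice_ids
    ]


# Hypothetical case: pay-2 refers to a document that is not an invoice receipt in the log.
trace = [
    {"activity": "IR", "doc_id": "inv-1", "originator": "u1"},
    {"activity": "Pay", "doc_id": "pay-1", "originator": "u2", "Reference IR": "inv-1"},
    {"activity": "Pay", "doc_id": "pay-2", "originator": "u2", "Reference IR": "deb-9"},
    {"activity": "Pay", "doc_id": "pay-3", "originator": "BATCH", "Reference IR": None},
]
flagged = [e["doc_id"] for e in standalone_payments(trace)]
```

A payment flagged this way is not necessarily irregular: as in the case study, the reference may point to a document type that was left out of the event log.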

For the third control, is the goods receipt indicator turned off when no ‘GR’ is found?, we write a corresponding LTL formula. Of the 10,000 cases, one turned out not to respect this rule. Again the question arises how this case got through the information system.

The fourth control, is a ‘Change Line’ appropriately followed by a ‘Sign–Release’?, refers to the following rules: if an order amounts to 12,500 euros or less, an order change of up to 5% may be made by an employee without approval. If an order has a total value between 12,500 and 125,000 euros, there is only a 2% freedom in modification. An order above 125,000 euros cannot be modified without approval. For this check we use the attribute ‘Net Value’ of the process instance and the ‘Relative Modification’ and ‘Modification’ attributes of the activity ‘Change Line’. 77 cases turned out not to respect the rules described above. This is a low number of cases relative to 10,000 (only 0.77%).
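The tolerance rules can be written down as a small Python sketch. The band limits come from the rules above; the boundary handling (an order of exactly 125,000 euros falling in the 2% band), the trace layout, and the reading that any later ‘Sign’ or ‘Release’ counts as the required approval are assumptions made for illustration.

```python
from typing import List, Tuple

ChangeEvent = Tuple[str, float]  # (activity, relative modification; 0.0 for other activities)


def allowed_relative_change(net_value_eur: float) -> float:
    """Maximum relative modification permitted without a new approval."""
    if net_value_eur <= 12_500:
        return 0.05
    if net_value_eur <= 125_000:
        return 0.02
    return 0.0


def change_needs_approval(net_value_eur: float, relative_modification: float) -> bool:
    """True if a 'Change Line' of this relative size must trigger a new approval."""
    return abs(relative_modification) > allowed_relative_change(net_value_eur)


def unapproved_changes(net_value_eur: float, trace: List[ChangeEvent]) -> List[int]:
    """Positions of 'Change Line' events that need approval but have no later 'Sign' or 'Release'."""
    flagged: List[int] = []
    for i, (activity, relative_modification) in enumerate(trace):
        if activity == "Change Line" and change_needs_approval(net_value_eur, relative_modification):
            if not any(a in ("Sign", "Release") for a, _ in trace[i + 1:]):
                flagged.append(i)
    return flagged


# Hypothetical trace: a 4% change after the release, with no renewed approval.
trace = [("Create PO", 0.0), ("Sign", 0.0), ("Release", 0.0), ("Change Line", 0.04), ("GR", 0.0)]
approved_trace = trace[:4] + [("Sign", 0.0), ("Release", 0.0)]
```

The same 4% change is acceptable for a 10,000 euro order but requires approval for a 50,000 euro order, which is why the check needs both the net value of the case and the relative modification of the event.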

6. Conclusion

In the case of data mining, it took some decades before the application of this research domain was carried from the academic world into the business environment (more precisely, as a means of fraud detection and as a market segmentation aid). In the case of process mining, we wish to accelerate this step and recognize already at this quite early stage which opportunities process mining offers to business practice. Process mining offers the ability to objectively extract a model from transactional logs, so this model is not biased towards any expectations the researcher may have. In the light of finding flaws in the process under investigation, this open-minded setting is a very important characteristic. The ability to monitor internal controls is also very promising.

In this paper we presented a case study in which we applied process mining in the context of transaction fraud. Given the procurement process of an organization using SAP as its ERP system, we applied the process diagnostics approach to discover the real process and to analyze flaws, i.e., to discover cases that are not compliant. This makes it possible to explicitly check internal controls and, more generally, business rules. This way, process mining enables auditing by not only providing theory and algorithms


to check compliance, but also by providing tooling that helps the auditor to detect fraud or other flaws at a much earlier stage. However, the case study also shows that, although tools are available, they are still quite immature. Therefore, we need to enhance tools like ProM to better automate the audit process and to visualize results for management.

References

ACFE (2008). 2008 ACFE report to the nation on occupational fraud and abuse. Technical report. Association of Certified Fraud Examiners.

Alves de Medeiros, A. (2006). Genetic process mining. Ph.D. Thesis.

Alves de Medeiros, A. K., Weijters, A. J. M. M., & van der Aalst, W. M. P. (2007). Genetic process mining: An experimental evaluation. Data Mining and Knowledge Discovery, 14(2), 245–304.

Bologna, G., & Lindquist, R. (1995). Fraud auditing and forensic accounting. John Wiley and Sons.

Bozkaya, M., Gabriels, J., & van der Werf, J. M. (2009). Process diagnostics: A method based on process mining. In Proceedings of international conference on information, process, and knowledge management (eKNOW).

Davia, H. R., Coggins, P., Wideman, J., & Kastantin, J. (2000). Accountant’s guide to fraud detection and control (2nd ed.). John Wiley and Sons.

Deshmukh, A., & Talluru, L. (1998). A rule-based fuzzy reasoning system for assessing the risk of management fraud. International Journal of Intelligent Systems in Accounting, Finance & Management, 7(4), 223–241.

Dongen, B. (2007). Process mining and verification. Ph.D. Thesis.

Elsas, P. I. (2008). X-raying segregation of duties: Support to illuminate an enterprise’s immunity to solo-fraud. International Journal of Accounting Information Systems, 9(2), 82–93.

Fanning, K., & Cogger, K. (1998). Neural network detection of management fraud using published financial data. International Journal of Intelligent Systems in Accounting, Finance & Management, 7, 21–41.

Green, B., & Choi, J. (1997). Assessing the risk of management fraud through neural network technology. Auditing, 16(1, Spring), 14–28.

Günther, C. (2009). Process mining in flexible environments. Ph.D. Thesis.

Günther, C., & Aalst, W. (2007a). Fuzzy mining: Adaptive process simplification based on multi-perspective metrics. In BPM 2007. LNCS (Vol. 4714, pp. 328–343). Springer.

Günther, C., & Aalst, W. (2007). Finding structure in unstructured processes: The case for process mining. In ACSD 2007.

Hoogs, B., Kiehl, T., Lacomb, C., & Senturk, D. (2007). A genetic algorithm approach to detecting temporal patterns indicative of financial statement fraud. Intelligent Systems in Accounting, Finance & Management, 15, 41–56.

Jans, M., Lybaert, N., & Vanhoof, K. (2009). A framework for internal fraud risk reduction at IT integrating business processes: The IFR2 framework. International Journal of Digital Accounting Research, 9, 1–29.

Jans, M., Lybaert, N., & Vanhoof, K. (2010). Internal fraud risk reduction: Results of a data mining case study. International Journal of Accounting Information Systems, 11, 17–41.

Kirkos, E., Spathis, C., & Manolopoulos, Y. (2007). Data mining techniques for the detection of fraudulent financial statements. Expert Systems with Applications, 32(4), 995–1003.

Lin, J., Hwang, M., & Becker, J. (2003). A fuzzy neural network for assessing the risk of fraudulent financial reporting. Managerial Auditing Journal, 18(8), 657–665.

Reisig, W. (1985). Petri Nets: An introduction. Monographs in theoretical computer science: An EATCS series (Vol. 4). Berlin: Springer-Verlag.

van der Aalst, W., & de Medeiros, A. (2005). Process mining and security: Detecting anomalous process executions and checking process conformance. Electronic Notes in Theoretical Computer Science, 121, 3–21.

van der Aalst, W., Reijers, H., Weijters, A., van Dongen, B., de Medeiros, A., Song, M., et al. (2007). Business process mining: An industrial application. Information Systems, 32(5), 712–732.

van der Aalst, W., Weijters, T., & Maruster, L. (2004). Workflow mining: Discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering, 16(9), 1128–1142.

van Dongen, B., de Medeiros, A., Verbeek, H., Weijters, A., & van der Aalst, W. (2005). The ProM framework: A new era in process mining tool support. Lecture Notes in Computer Science, 3536, 444–454.

Wells, J. (2005). Principles of fraud examination. John Wiley and Sons.

Yang, W.-S., & Hwang, S.-Y. (2006). A process-mining framework for the detection of healthcare fraud and abuse. Expert Systems with Applications, 31, 56–68.
