Improving Service level control with process mining
A research that shows how managers can control the service
levels of their product using the event log of the incident
management system
Subject: Research project
Student: Ing. R.H.J.C. van Wel
Date: 09 January 2013
Status: Complete
Improving Service level control with process mining| 09 January 2013
Summary
The objective of this research was to examine whether the information registered
in the event log of the incident management system can add value in controlling
the service level of a product.
By using process mining techniques and tools, we gained insight into the
distribution- and handling activities of the incident management process. During the
process discovery phase we discovered that the service level of the product types
Desktop and Laptop decreases rapidly when incidents are handled by two or more
assignment groups. In addition, we discovered that the incident management
system does not always register the correct timestamp of executed incident handler
activities. We also saw that some incident handlers execute unusual process
activities and that the incident management system does not add extra service level
time when an incident is reopened after it was closed. Finally, we discovered that the
company is able to extract data from the event log that can be used as a predictive
indicator for an increasing or decreasing workload.
The conclusion of this research is that the event log of the incident management
system contains enough information to visualize the distribution- and handling
activities of the incident management process. By using this information the
company can gain more control over the service levels of its products.
Colophon
CAI Master of Science program
University Leiden
Course element Research project
Student Ing. R.H.J.C. van Wel
Email Royvanwel@gmail.com
Version 1.4
Index
Summary
1 Introduction
1.1 Preface
1.2 Business case
1.3 Research relevance
1.4 Theoretical framework
1.5 Research question
1.6 Scope & delineation
2 Research methodology
3 Research results
3.1 Analyze event log
3.2 Process discovery
3.3 Process conformance
3.4 Workload prediction
4 Conclusion
4.1 Conclusion
4.2 Recommendations
4.3 Discussion
References
Appendix
1 Introduction
This research has been conducted within a company whose name cannot be
mentioned for security reasons.
1.1 Preface
Many IT companies use incident management processes and incident management
systems to control their incident handling. To manage this process, one can
use Key Performance Indicators (KPIs).
A KPI can, for example, show how well (or badly) the service level of a certain
product has performed, or how well a related business unit has performed in
handling incidents.
1.2 Business case
In an interview, the Senior Process Manager (SPM) of the company stated that the
company is currently not able to respond quickly enough to increasing workloads.
One of the main reasons is that most Business Unit Managers (BUMs) focus on the
monthly KPI Incidents Resolved in Time.
By using the KPI Incidents Resolved in Time, BUMs can only act reactively,
because the distribution- and handling process has already taken place. This KPI also
does not show how incidents were distributed and handled by the business units.
It is therefore difficult to find the reason why the service level of a product has
decreased.
1.3 Research relevance
The purpose of this research is to examine whether the information registered in
the event log of the incident management system can add value in controlling the
service level of a product. The first goal is therefore to gain insight into the
distribution- and handling activities of the incident management process.
The second goal is to examine whether the information from this event log can be
used to predict increasing workloads and to see what the effects of these increasing
workloads are.
1.4 Theoretical framework
According to van der Aalst (2011, p.55) the performance of a process or
organization can be defined in different ways. Typically, three dimensions of
performance are identified: time, cost and quality. For each of these performance
dimensions, different Key Performance Indicators (KPIs) can be defined. When
looking at the time dimension, the following performance indicators can be
identified:
• The lead time (also referred to as flow time) is the total time from the
creation of the case to the completion of the case;
• The service time is the time actually worked on a case;
• The waiting time is the time a case is waiting for a resource to become
available;
• The synchronization time is the time an activity is not yet fully enabled and
waiting for an external trigger or another parallel branch.
Many systems have some kind of event log often referred to as ‘‘history’’, ‘‘audit
trail’’, ‘‘transaction log’’, etc. The event log typically contains information about
events referring to an activity and a case. The case (also named process instance) is
the ‘‘thing’’ which is being handled, e.g., a customer order, a job application, an
insurance claim, a building permit, etc. The activity (also named task, operation,
action, or work item) is some operation on the case. Typically, events have a
timestamp indicating the time of occurrence. Moreover, when people are involved,
event logs will characteristically contain information on the person executing or
initiating the event, i.e., the performer. (van der Aalst, van Hee, 2002)
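The event log structure described above (case, activity, timestamp, performer) can be sketched as a small Python data structure. This is an illustration only: the class name `Event`, the helper names and the case ID `INC-1` are hypothetical, and the sample values are borrowed from Table 3.3.1 of this report.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case_id: str         # case (process instance), e.g. an incident number
    activity: str        # operation performed on the case
    timestamp: datetime  # time of occurrence
    performer: str       # person or group that executed the event


def parse_timestamp(date_text: str, time_text: str) -> datetime:
    """Parse the day.month.year and h:mm:ss notation used in this report."""
    return datetime.strptime(f"{date_text} {time_text}", "%d.%m.%Y %H:%M:%S")


def group_by_case(events):
    """Return {case_id: [events sorted by timestamp]}, i.e. one trace per case."""
    traces = defaultdict(list)
    for event in events:
        traces[event.case_id].append(event)
    return {case: sorted(trace, key=lambda e: e.timestamp)
            for case, trace in traces.items()}


# Hypothetical case ID; activities and timestamps follow Table 3.3.1.
log = [
    Event("INC-1", "Resolved", parse_timestamp("06.06.2011", "10:01:01"), "Resolved group 1"),
    Event("INC-1", "Opened", parse_timestamp("04.06.2011", "23:43:56"), "Control group 1 EUS"),
    Event("INC-1", "Closed", parse_timestamp("06.06.2011", "10:01:29"), "Closed group 1 EUS"),
]
traces = group_by_case(log)
print([event.activity for event in traces["INC-1"]])
```

Grouping events per case and ordering them by timestamp yields one trace per incident, which is the input that process mining tools generally expect.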
The idea of process mining is to discover, monitor and improve real processes (i.e.
not assumed processes) by extracting knowledge from event logs readily available
in today’s systems (van der Aalst, 2011).
According to van der Aalst (2011, p.9) event logs can be used to conduct three
types of process mining, namely:
1. Process discovery
The first type of process mining is discovery. A discovery technique takes an
event log and produces a model without using a-priori information. […] If
the event log contains information about resources, one can also discover
resource-related models, e.g., a social network showing how people work
together in an organization.
2. Process conformance
The second type of process mining is conformance. Here, an existing
process model is compared with an event log of the same process.
3. Process enhancement
The third type of process mining is enhancement. Here, the idea is to
extend or improve an existing process model using information about the
actual process recorded in some event log. Whereas conformance checking
measures the alignment between model and reality, this third type of
process mining aims at changing or extending the a-priori model.
This research addresses the process discovery phase and the process conformance
phase. The manner in which these phases were executed is described in
Section 2. The research conclusion and recommendations are defined in Section 4
and are meant to be used for the process enhancement phase in further research.
1.5 Research question
To be able to answer the main research question the following sub questions have
to be answered:
Sub questions:
Sub question 1: Which event log data must be used as information to visualize the
distribution- and handling activities of the incident management
process?
Sub question 2: Which information should business unit managers extract from the
event log to be able to predict a workload increase and see what
the effects are of these increasing workloads?
1.6 Scope & delineation
This research will only focus on incident management activities that were managed
by business units and in particular one business unit which we will call EUS (End
User Services). Therefore this research will not examine human resource activities.
The process mining tools ProM and Disco will be used to execute the process
discovery phase. ProM will be used because it is an open-source tool with
many plugins (e.g. social networks and Petri nets) that can be used for process
analysis. The commercial process mining tool Disco, however, is easier to work
with for the process conformance phase.
The results of the process discovery phase and process conformance phase will be
based on quantitative measurements. The quality of the incident management
process and related activities can be discussed when the process enhancement
phase is executed.
The main research question addresses how managers can control the service level
of their products. In this research, “service level control” means that
one is able to explain the cause and effects of a service level performance, based on
the information that is registered within the event log. If one is able to explain the
cause and effects of a service level performance, one also has the ability to share
this information and take action when necessary.
Main research question: Which information should business unit managers extract
from the incident management event log to control the service level of their products?
2 Research methodology
Analyze event log
First we need to extract the data that is registered in the event log of the
incident management system. To analyze distribution- and handling activities we
need a substantial amount of historical data. Therefore we will extract an
event log that contains information on all incidents closed between
01-09-2010 and 30-09-2012.
After analyzing this event log, we will determine the research focus and define
which data (Case ID, Activity ID, Resource ID and Time dimension) can be used for
the process discovery step.
Process discovery
To determine which information is valuable for controlling the service level of a
product, we need to gain insight into how the distribution- and handling activities
were executed. Therefore, we will visualize the incident management process by using
the process mining tool Disco. The results will show information about the
distribution activities of the incident management process. To see how the incidents
were handled by the business units, we will use a social network plugin from the
process mining tool ProM. These results will show how the business units interacted
with each other.
Process conformance
The results of the process discovery phase will be discussed with the SPM of the
incident management system and each unusual observation will be defined.
After executing the process discovery phase and the process conformance phase,
we are able to identify which information is needed to visualize the distribution- and
handling activities of the incident management process (Sub question 1).
Workload prediction
To predict an increasing or decreasing workload, we will build hypotheses and
examine them based on the information that is registered in the event log. In each
hypothesis we will explain our assumptions, explain what must be done to examine
these assumptions and analyze the results. This hypothesis cycle will be continued
until we are able to define which information business unit managers should extract
from the event log to be able to predict a workload increase and see what the
effects are of these increasing workloads (Sub question 2).
After answering sub question 1 and sub question 2, we are able to conclude which
information business unit managers should extract from the incident management
event log to control the service levels of their products (Main research question).
Process enhancement
The research conclusion and recommendations are defined in Section 4 and meant
to be used for the Process enhancement phase in further research.
3 Research results
3.1 Analyze event log
The extracted event log contains a large amount of information. To define which
information we can extract from the event log, we have aggregated the data into one
table (see footnote 1).
The event log contains the numbers under which the incidents are registered.
These incident numbers can be used as Case IDs. The executed activities
(Opened, Assignment, Resolved, Closed) can be used as Activity IDs. Each
Activity ID is linked to a Time Stamp and a Resource ID. This Resource ID shows
the name of the business unit (named assignment group in the event log) that
executed the activity.
To distinguish the different types of assignment groups, we will rename these
assignment groups in the event log before we execute the process discovery phase.
The assignment group that is linked to an Opened activity will be called the Control
group. According to the SPM, this type of resource is responsible for forwarding the
incident to the assignment group that is responsible for resolving the incident.
The assignment groups that handle the activities after the control group will be
called First Reassignment group, Second Reassignment group, Third
Reassignment group, Fourth Reassignment group, Fifth Reassignment
group and Nth Reassignment group (see footnote 2).
The assignment group, which is linked to a resolved activity, will be called a
Resolved group. The assignment group, which is linked to a closed activity, will be
called Closed group. This assignment group is always the same assignment group
as the Control group.
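The renaming scheme above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the function names are hypothetical, the activity names follow the Activity IDs listed earlier in this section, and the report does not state how the renaming was actually implemented.

```python
ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth"]


def reassignment_label(position: int) -> str:
    """Label for the position-th reassignment (1-based); sixth and later become 'Nth'."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    name = ORDINALS[position - 1] if position <= len(ORDINALS) else "Nth"
    return f"{name} Reassignment group"


def rename_resources(activities):
    """Map a case's ordered activities to the role names used in this report."""
    labels = []
    reassignments = 0
    for activity in activities:
        if activity == "Opened":
            labels.append("Control group")
        elif activity == "Assignment":
            reassignments += 1
            labels.append(reassignment_label(reassignments))
        elif activity == "Resolved":
            labels.append("Resolved group")
        elif activity == "Closed":
            labels.append("Closed group")
        else:
            labels.append(activity)  # assumption: leave unknown activities untouched
    return labels


print(rename_resources(["Opened", "Assignment", "Assignment", "Resolved", "Closed"]))
```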
The event log also shows which incidents Breached the service level time and
which incidents were Resolved in time. We can therefore use this
information to determine the service level performance of a product.
Each incident is registered on a Product type. We can therefore use this
information to filter on specific product types that were managed by the business
unit EUS. We can use the Elapsed time (see footnote 3) data to see how much time it
has taken to resolve an incident. This time type only measures the time that is
stipulated in the Service Level Agreement of a product.
Research focus
Our event log covers all incidents that were closed between 01
September 2010 and 30 September 2012. In this period the company resolved
162677 incidents. These incidents were registered on 1679 different product types.
52546 of the incidents were registered on the product types Desktop and Laptop
and managed by the business unit EUS. Therefore we will continue this research by
focussing on all incidents that were registered on the product types Desktop and
Laptop.
1 See appendix Event log information.
2 Nth Reassignment group means the sixth or a later reassignment group.
3 Elapsed time = service level time (measured time between opened time and resolved time).
Trimmed mean
Looking at the spread of the elapsed times, we see that there are several outliers.
The maximum recorded elapsed time is 3403 hours and the minimum recorded
elapsed time is 0.0 hours. We will call the outliers with a high elapsed time
top-outliers and the outliers with a low elapsed time bottom-outliers.
Moore and McCabe (2006) describe outliers as
individual values that fall outside the overall
pattern. The trimmed mean is a measure of
centre that is more resistant than the mean but
uses more of the available information than the
median. Trimming eliminates the effect of a
small number of outliers.
According to the SPM, these outliers should not be taken into account in this
research, because they represent unusual circumstances and would distort the
research results. Therefore we apply 5% trimming: we discarded the 5% of incidents
with the highest elapsed times (top-outliers) and the 5% of incidents with the lowest
elapsed times (bottom-outliers). After trimming, the event log consists of 47290
incidents. The maximum elapsed time is 309 hours and the minimum elapsed time is
1.3 hours. Table 3.1.1 shows the number of incidents that were controlled or
resolved by the business unit EUS. Table 3.1.2 shows the number of incidents that
were controlled and resolved by the business unit EUS. Table 3.1.3 shows the
number of incidents that were controlled by the business unit EUS and
resolved by other business units.
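The trimming step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in this research: the function names and the sample values are hypothetical, and the exact rounding of the number of discarded incidents is an assumption.

```python
def trim(values, fraction=0.05):
    """Drop the lowest and highest `fraction` of the values; return the rest, sorted."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    ordered = sorted(values)
    cut = int(len(ordered) * fraction)  # number of values removed at each end
    return ordered[cut:len(ordered) - cut]


def trimmed_mean(values, fraction=0.05):
    """Mean of the values that remain after trimming both ends."""
    kept = trim(values, fraction)
    if not kept:
        raise ValueError("no values left after trimming")
    return sum(kept) / len(kept)


# Hypothetical elapsed times in hours: one bottom-outlier and one top-outlier.
elapsed_hours = [0.0] + [10.0] * 18 + [3403.0]
print(len(trim(elapsed_hours)), trimmed_mean(elapsed_hours))
```

With 20 values and a fraction of 0.05, one value is removed at each end, so both outliers in the sample no longer influence the mean.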
Table 3.1.1 Incidents divided per Control group and Resolved group
Control group           Opened    Resolved
Business unit EUS       47257     20600
Other Business units    33        26690
Total                   47290     47290
Table 3.1.2 Controlled and Resolved incidents by Business unit EUS
Control group           Resolved group EUS
Business unit EUS       20598
Table 3.1.3 Control group EUS / All resolved groups except EUS
Control group           Resolved group ALL except EUS
Business unit EUS       26652
3.2 Process discovery
Figure 3.2.1 shows the process model that Disco has discovered based on the
47257 incidents that were managed by the business unit EUS. The process model
visualizes the flow of the incident distribution process. The arrows show how the
incidents were forwarded between the Control group (01 Opened), Assignment
groups (02 First Reassignment group, 03 Second Reassignment group, 04 Third
Reassignment group, 05 Fourth Reassignment group, 06 Fifth Reassignment group
and 07 Nth Reassignment group), Resolved group (08 Resolved) and Closed group
(09 Closed).
The frequency of the activities is visualized by colour (low frequency = light blue
& high frequency = dark blue), by number and by the thickness of the arrows (low
frequency = thin arrow & high frequency = thick arrow).
We will comment on the process model in Section 3.3.
“Identifying outliers is a matter for judgement. Look for points that are clearly
apart from the body of the data, not just the most extreme observations in a
distribution.” (Moore & McCabe, 2006)
Figure 3.2.1 Process model
Incident handling process
To visualize how the incidents were handled between the control groups and the
resolved groups, we used a social network plugin within the process mining tool
ProM. We divided the results into:
• Incidents that were managed (controlled) and resolved by the business unit
EUS (Figure 3.2.2);
• Incidents that were managed (controlled) by the business unit EUS and
resolved by other business units (Figure 3.2.3).
The size of a circle illustrates the number of incidents that the control group or
resolved group handled. The arrows show the relation between the control groups
and resolved groups. The colours are used to distinguish the control groups and
resolved groups from each other.
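The quantities that such a social network visualizes (arrow weights and circle sizes) can be approximated by simple counting. The sketch below is not the algorithm of the ProM plugin; it only illustrates the idea, and the function names and sample pairs are hypothetical.

```python
from collections import Counter


def handover_counts(cases):
    """Count incidents per (control group, resolved group) pair: the arrow weights."""
    return Counter(cases)


def node_sizes(edges):
    """Number of incidents handled per group: the circle sizes."""
    sizes = Counter()
    for (control_group, resolved_group), count in edges.items():
        sizes[control_group] += count
        sizes[resolved_group] += count
    return sizes


# Hypothetical data: one (control group, resolved group) pair per incident.
cases = [
    ("Control group 1 EUS", "Resolved group 1"),
    ("Control group 1 EUS", "Resolved group 1"),
    ("Control group 1 EUS", "Resolved group 2"),
    ("Control group 2 EUS", "Resolved group 2"),
]
edges = handover_counts(cases)
print(edges.most_common(1))
print(node_sizes(edges))
```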
Figure 3.2.2 Control groups EUS & resolved groups EUS
Figure 3.2.3 Control groups EUS & all resolved groups except EUS
We see that Control group 1 EUS managed most of the incidents that were resolved
within the business unit EUS, as well as most of the incidents that were resolved by
other business units. By generating these social networks (Figure 3.2.2 and
Figure 3.2.3) we see how many different resolved groups exist. By creating these
social networks, BUMs can distinguish the importance of a relationship and use this
information to control the service level of their product.
3.3 Process conformance
The process model gives a good overview of the handling of the incident
management process. However, looking at the process model and data in the event
log we also observe some unusual process activities, namely:
Observation 1
We would not have expected that incidents need to be reassigned to an assignment
group after an incident is closed.
In each case where this activity occurs, the time difference between the registered
activities Closed and Reassignment Group is 1 second (e.g. Table 3.3.1). We assume
that both activities were executed simultaneously by one resource, but that the
system registered them with a small time difference. This activity should
not occur because it gives a wrong picture of the incident distribution- and
handling process. Therefore we recommend that the SPM examine this
observation further.
Table 3.3.1 Example reassignment activity after closed activity
Activity              Resource               Date          Time
Opened                Control group 1 EUS    04.06.2011    23:43:56
Reassignment Group    Assignment group 1     06.06.2011    7:34:20
Resolved              Resolved group 1       06.06.2011    10:01:01
Closed                Closed group 1 EUS     06.06.2011    10:01:29
Reassignment Group    Assignment group 1     06.06.2011    10:01:30
Observation 2
We would not have expected that an incident needs to be reassigned to another
assignment group after an incident is resolved.
It seems that incident handlers execute additional tasks after the incident was
resolved (e.g. Table 3.3.2). These activities do not influence the service level
performance, because the incident is already resolved. However, according to the
process model this sort of activity should not be executed. Therefore we recommend
that the SPM examine this observation further.
Table 3.3.2 Example reassignment activity after resolved activity
Activity              Resource               Date          Time
Opened                Control group 1 EUS    17.05.2011    0:25:42
Reassignment Group    Assignment group 1     17.05.2011    7:38:59
Resolved              Resolved group 1       25.05.2011    9:00:11
Reassignment Group    Assignment group 2     25.05.2011    9:08:01
Closed                Closed group 1 EUS     25.05.2011    9:08:38
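Observations 1 and 2 can be detected automatically with a simple rule check on the activity sequence of each case. The sketch below is an illustration only: the function name is hypothetical, and treating a Reopen activity as making later reassignments legitimate again is our assumption, not a rule stated by the SPM.

```python
def unexpected_reassignments(activities):
    """Return (index, reason) for each reassignment that follows Resolved or Closed."""
    findings = []
    last_milestone = None
    for index, activity in enumerate(activities):
        if activity in ("Resolved", "Closed"):
            last_milestone = activity
        elif activity == "Reopen":
            last_milestone = None  # assumption: reopening allows reassignment again
        elif activity == "Reassignment Group" and last_milestone is not None:
            findings.append((index, f"reassignment after {last_milestone}"))
    return findings


# Activity sequences of Table 3.3.1 and Table 3.3.2.
table_3_3_1 = ["Opened", "Reassignment Group", "Resolved", "Closed", "Reassignment Group"]
table_3_3_2 = ["Opened", "Reassignment Group", "Resolved", "Reassignment Group", "Closed"]
print(unexpected_reassignments(table_3_3_1))
print(unexpected_reassignments(table_3_3_2))
```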
Observation 3
When an incident is closed, it is possible to reopen the incident, for example when
the end user is not satisfied with the provided solution. However, when an incident
is reopened, the incident management system does not continue measuring the
elapsed time. As a result, the actual service level time is not registered correctly.
In Table 3.3.3 we can see that the registered elapsed time of a case is 6 hours and
28 minutes. This time is based on the opened activity (16.08.2012 / 8:17:11) and the
first resolved activity (16.08.2012 / 14:45:45). Because the incident was reopened,
the elapsed time should be measured up until the second resolved activity, which was
executed on 30.11.2012 / 9:01:52.
Because the incident management system does not register the actual elapsed time,
it is likely that more incidents breached the service level time than are registered.
Therefore we recommend that the SPM examine the cause of this type of occurrence
and show how it affects the service level performance.
Table 3.3.3 Actual elapsed time vs. registered elapsed time
Activity              Resource               Date          Time
Opened                Control group 1 EUS    16.08.2012    8:17:11
Reassignment Group    Assignment group 1     16.08.2012    8:21:12
Resolved              Resolved group 1       16.08.2012    14:45:45
Closed                Closed group 1 EUS     16.08.2012    14:47:18
Reopen                Control group 1 EUS    17.08.2012    12:35:33
Reassignment Group    Assignment group 1     17.08.2012    12:56:50
Reassignment Group    Assignment group 2     23.08.2012    9:08:25
Reassignment Group    Assignment group 3     23.08.2012    9:21:25
Reassignment Group    Assignment group 4     23.08.2012    16:33:00
Resolved              Resolved group 2       30.11.2012    9:01:52
Closed                Closed group 1 EUS     30.11.2012    9:06:22
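The difference between the two measurements in Table 3.3.3 can be reproduced with a short calculation. Note the assumption: the sketch measures plain calendar time, whereas the elapsed time in the incident management system only counts service level time, so the second value is an upper bound rather than the corrected elapsed time. The function name is hypothetical.

```python
from datetime import datetime

FORMAT = "%d.%m.%Y %H:%M:%S"

# Opened, Resolved and Reopen activities of Table 3.3.3.
trace = [
    ("Opened", "16.08.2012 8:17:11"),
    ("Resolved", "16.08.2012 14:45:45"),
    ("Reopen", "17.08.2012 12:35:33"),
    ("Resolved", "30.11.2012 9:01:52"),
]


def time_to_resolution(trace, use_last_resolved):
    """Calendar time from Opened to the first or the last Resolved activity."""
    opened = next(datetime.strptime(t, FORMAT) for a, t in trace if a == "Opened")
    resolved = [datetime.strptime(t, FORMAT) for a, t in trace if a == "Resolved"]
    target = resolved[-1] if use_last_resolved else resolved[0]
    return target - opened


registered = time_to_resolution(trace, use_last_resolved=False)
until_second_resolved = time_to_resolution(trace, use_last_resolved=True)
print(registered)             # basis of the registered elapsed time
print(until_second_resolved)  # calendar time until the second Resolved activity
```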
Observation 4
Only the time dimension elapsed time is usable without modifying the original event
log data. This is because the incident management system has already calculated
the actual service level time. Therefore we cannot directly measure, for example, the
waiting time between the activity opened and the activity first reassignment. In
addition, the process mining tool Disco does not provide a filter that measures only
the service level time. To solve this problem we added a formula to the
event log that measures only the time within the service level window.
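The formula itself is not reproduced in this report. The sketch below illustrates one way such a calculation could work, under the explicit assumption of a hypothetical service level window of Monday to Friday, 08:00 to 17:00; the actual window depends on the Service Level Agreement of the product, and the function name is ours.

```python
from datetime import datetime, time, timedelta


def service_window_time(start, end, opens=time(8, 0), closes=time(17, 0)):
    """Time between start and end that falls inside the assumed weekday window."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday (0) to Friday (4)
            window_start = datetime.combine(day, opens)
            window_end = datetime.combine(day, closes)
            overlap_start = max(start, window_start)
            overlap_end = min(end, window_end)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total


# Friday afternoon to Monday morning: 1 hour on Friday plus 1.5 hours on Monday.
print(service_window_time(datetime(2011, 6, 3, 16, 0), datetime(2011, 6, 6, 9, 30)))
```

Under this assumed window, the interval between the Opened and first Reassignment Group activities of Table 3.3.1 (Saturday 23:43:56 to Monday 7:34:20) would count as zero service level time, which shows why calendar time and service level time can differ strongly.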
Observation 5
By using the formula described in Observation 4, we can measure the lead time
and two types of waiting time, namely:
• Waiting time between opened activity and first reassignment activity;
• Waiting time between resolved activity and closed activity.
It is not possible to measure the time dimensions service time and synchronization
time, because the event log does not provide data that can be used to measure these
time dimensions. As we cannot measure the service time and
synchronization time, it is also difficult to determine the capacity (human
resources) that is needed to cope with the current (or future) workload. Therefore
we recommend that the SPM examine whether the incident management system is
able to measure the time dimensions service time and synchronization time.
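The lead time and the two waiting times listed above can be derived from the timestamps of a single case, as the following sketch shows for the first four activities of Table 3.3.1. Two assumptions apply: the sketch uses calendar time instead of the service level window, and it takes the Closed activity as the completion of the case. The function names are hypothetical, and the code expects all four activities to be present in the trace.

```python
from datetime import datetime

FORMAT = "%d.%m.%Y %H:%M:%S"


def first_time(trace, activity):
    """Timestamp of the first occurrence of an activity, or None if it is absent."""
    for name, moment in trace:
        if name == activity:
            return moment
    return None


def time_dimensions(trace):
    """Lead time and the two waiting times of one case (calendar time)."""
    opened = first_time(trace, "Opened")
    first_reassignment = first_time(trace, "Reassignment Group")
    resolved = first_time(trace, "Resolved")
    closed = first_time(trace, "Closed")
    return {
        "lead_time": closed - opened,
        "waiting_opened_to_first_reassignment": first_reassignment - opened,
        "waiting_resolved_to_closed": closed - resolved,
    }


# First four activities of Table 3.3.1.
rows = [
    ("Opened", "04.06.2011 23:43:56"),
    ("Reassignment Group", "06.06.2011 7:34:20"),
    ("Resolved", "06.06.2011 10:01:01"),
    ("Closed", "06.06.2011 10:01:29"),
]
trace = [(name, datetime.strptime(text, FORMAT)) for name, text in rows]
for dimension, value in time_dimensions(trace).items():
    print(dimension, value)
```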
Observation 6
As we have distinguished the assignment groups from each other in Section 3.1, we
can also examine the effect on the service level performance when incidents are
handled by one or more assignment groups within the business unit EUS. Table
3.3.4 shows how the service level performance (% Resolved in
time Business unit EUS) decreases when incidents are handled by more assignment
groups.
By comparing the % Resolved in time Business unit EUS with the % Average norm,
we observe that incidents most likely do not meet the service level norm when
they are not resolved after the first reassignment group. This effect is illustrated in
Figure 3.3.5. In addition, we observe that more incidents were closed after they
were forwarded to three different assignment groups (Third Reassignment group)
than after two assignment groups (Second Reassignment group).
Table 3.3.4 Service level performance
                              # Resolved in time   # Breached   # Total   % Resolved in time Business unit EUS   % Average norm
No Reassignment group         108                  4            112       96.4%                                  85%
First Reassignment group      13063                2406         15469     84.4%                                  85%
Second Reassignment group     911                  308          1219      74.7%                                  85%
Third Reassignment group      984                  458          1442      68.2%                                  85%
Fourth Reassignment group     176                  86           262       67.2%                                  85%
Fifth Reassignment group      89                   80           169       52.7%                                  85%
Nth Reassignment group        55                   67           122       45.1%                                  85%
Total                         15386                3409         18795     81.9%                                  85%
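The percentages in Table 3.3.4 follow directly from the counts, as the short calculation below shows. The counts and the 85% norm are taken from the table; the variable and function names are ours.

```python
NORM_PCT = 85.0  # % Average norm from Table 3.3.4

# (group, # Resolved in time, # Breached) from Table 3.3.4
ROWS = [
    ("No Reassignment group", 108, 4),
    ("First Reassignment group", 13063, 2406),
    ("Second Reassignment group", 911, 308),
    ("Third Reassignment group", 984, 458),
    ("Fourth Reassignment group", 176, 86),
    ("Fifth Reassignment group", 89, 80),
    ("Nth Reassignment group", 55, 67),
]


def resolved_in_time_pct(resolved_in_time, breached):
    """Share of incidents resolved in time, as a percentage."""
    return 100.0 * resolved_in_time / (resolved_in_time + breached)


below_norm = []
for group, resolved_in_time, breached in ROWS:
    pct = resolved_in_time_pct(resolved_in_time, breached)
    if pct < NORM_PCT:
        below_norm.append(group)
    print(f"{group:28s} {pct:5.1f}%")

total_pct = resolved_in_time_pct(sum(r for _, r, _ in ROWS), sum(b for _, _, b in ROWS))
print(f"{'Total':28s} {total_pct:5.1f}%")
```

Only the group without reassignment stays above the norm; every group from the first reassignment onwards falls below 85%, although the First Reassignment group misses it only narrowly.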
These results show how important it is that incidents are assigned to the correct
assignment group in order to meet the service level norm. Therefore we
recommend that the SPM examine how incidents can be forwarded more efficiently
in order to meet the service level norm.
Figure 3.3.5 Service level performance
Answering sub question 1
Which event log data must be used as information to visualize the distribution- and
handling activities of the incident management process?
To visualize the distribution- and handling activities the following data must be
used:
Event log data         Information
Incident number        Case ID
Opened activity        Activity ID
Assignment activity    Activity ID
Resolved activity      Activity ID
Closed activity        Activity ID
Time stamps            Waiting time & total time
Elapsed time           Service level time
Control group          Resource ID
Assignment group       Resource ID
Resolved group         Resource ID
Closed group           Resource ID
While answering sub question 1 we also discovered that, after renaming the
assignment groups, we were able to generate social networks that show
how the assignment groups interact with each other. By using the data Resolved in
time we were able to show the effect on the service level performance
when incidents are handled by one or more assignment groups. The
recommendations are defined in Section 4.2 and discussed with the SPM in
Section 4.3.
3.4 Workload prediction
In this section we will use the answer to sub question 1 to find information that
predicts an increasing workload. Based on the information from the event log we will
build hypotheses and examine them. In each hypothesis we will explain our
assumptions, explain what must be done to examine these assumptions and analyze
the results. This hypothesis cycle will be continued until we are able to define which
information business unit managers should extract from the event log to be able to
predict a workload increase and see what the effects are of these increasing
workloads.
Hypothesis 1
We assume that the time dimensions average waiting time until first assignment and
total average elapsed time will be affected when the workload increases. We also
assume that the number of incidents that breached the service level time will
increase when the workload (the number of opened incidents) increases.
Examine hypothesis 1
To examine our hypothesis, we need to count the number of opened and closed
incidents and compare these results with the number of incidents that were resolved
in time and/or the incidents that breached the service level time. In addition we will
add the time dimensions and analyze whether a time dimension can be related to an
increasing workload.
Observation hypothesis 1
Based on Figure 3.4.1 and Figure 3.4.2 we conclude that we cannot relate a time
dimension to an increasing workload. We see that the numbers Resolved in time and
Breached are related to the numbers Closed and Resolved, but none of these
numbers shows predictive signals that the BUM can act upon.
Figure 3.4.1 Results hypothesis 1
Figure 3.4.2 Results hypothesis 1
To be able to act proactively on an increasing workload, we need to find information
within the event log that predicts this increasing workload. Based on the
information that is visualized in Figure 3.4.1 and Figure 3.4.2 we do not see any
warning signals that show that the service level of the product is increasing or
decreasing. When the number of Opened incidents increases, this affects the values
Closed, Resolved in time and Breached. Because we want to extract information to
control the service level of products, we need information that tells us what
the effects are on the incidents that breached the service level time or were
resolved in time. Therefore we created the second hypothesis.
Hypothesis 2
We assume that when a business unit is not able to handle an increasing workload,
the value Resolved in time will decrease and the value Breached will increase.
Therefore we think that the difference between the value Opened and Resolved in
time will correlate with the value breached.
Execute hypothesis 2
To examine our hypothesis we need to subtract the value Resolved in time from the
value Opened and analyze the relation between this value (Opened – Resolved in
time) and the value Breached. In addition we will show how these results affect
the service level performance.
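The value Opened – Resolved in time can be computed per period from three counts. The sketch below illustrates this; the class name and the monthly counts are hypothetical and are not taken from the event log of this research.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodCounts:
    period: str
    opened: int
    resolved_in_time: int
    breached: int

    @property
    def indicator(self) -> int:
        """The value Opened – Resolved in time for this period."""
        return self.opened - self.resolved_in_time

    @property
    def service_level_pct(self) -> float:
        """Share of handled incidents that were resolved in time, in percent."""
        handled = self.resolved_in_time + self.breached
        return 100.0 * self.resolved_in_time / handled


# Hypothetical monthly counts, for illustration only.
months = [
    PeriodCounts("2012-01", opened=1000, resolved_in_time=850, breached=150),
    PeriodCounts("2012-02", opened=1400, resolved_in_time=900, breached=200),
    PeriodCounts("2012-03", opened=1200, resolved_in_time=880, breached=320),
]
for month in months:
    print(month.period, month.indicator, month.breached, f"{month.service_level_pct:.1f}%")
```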
Figure 3.4.3 Results hypothesis 2 Relation (Open-resolved in time | Breached)
Figure 3.4.4 Results hypothesis 2 service level performance
Observation hypothesis 2
In Figure 3.4.3 we can see that the value Opened – Resolved in time is related to the value Breached. In addition we see that the value Opened –
Resolved in time has a predictive character when the workload rapidly increases or decreases. We also see that when the value Opened –
Resolved in time increases or decreases, this affects the service level performance in a later time period. To examine how well the values
Opened – Resolved in time and Breached correlate (see footnote 4) with each other, we will measure the correlation coefficient of the two values.
Correlation                  Breached
Opened – Resolved in time    r = 0.74
The correlation coefficient confirms that these two values have a relatively strong relationship.
4 The correlation measures the direction and strength of the linear relationship between two quantitative variables. The correlation (r) is always a number
between -1 and 1. Values of r close to -1 or 1 indicate a close linear relationship (Moore & McCabe, 2006).
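As a minimal sketch of how such a coefficient can be computed, the Python function
below implements the standard Pearson correlation. The function name and the two
series are hypothetical; they are not the data behind the reported value of r.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical monthly series, invented for illustration only.
opened_minus_resolved = [10, 25, 40, 30, 15]
breached = [8, 14, 30, 28, 12]

r = pearson_r(opened_minus_resolved, breached)
print(round(r, 2))
```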
Now we will examine hypothesis 2 again to see if the value Opened – Resolved in time also has a predictive character, based on weekly results.
Figure 3.4.5 Weekly results hypothesis 2 (Open-resolved in time | Breached)
Figure 3.4.5 Weekly results hypothesis 2 service level performance
Observation hypothesis 2
In Figure 3.4.5 we can see that the value Opened – Resolved in time still shows a predictive character. However, the relation between
the two values appears to be weaker. The correlation coefficient based on these weekly results confirms this: it is lower than the
coefficient based on the monthly results.
Correlation                  Breached
Opened – Resolved in time    r = 0.59
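The difference between the monthly and the weekly overview is only a matter of how
the incident dates are grouped. The sketch below shows one possible way to do this
in Python; the helper names are hypothetical and the use of ISO week numbers is an
assumption, since the report does not state how weeks were delimited.

```python
from collections import Counter
from datetime import date


def month_key(day):
    """Label a date with its calendar month, e.g. '2012-01'."""
    return f"{day.year}-{day.month:02d}"


def week_key(day):
    """Label a date with its ISO year and ISO week, e.g. '2012-W01'."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def count_by(days, key):
    """Count how many dates fall into each period produced by `key`."""
    return Counter(key(day) for day in days)


# Hypothetical 'Opened on' dates, invented for illustration only.
opened_on = [date(2012, 1, 2), date(2012, 1, 3), date(2012, 1, 9)]

print(count_by(opened_on, month_key))
print(count_by(opened_on, week_key))
```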
Answering sub question 2
Which information should business unit managers extract from the event log to be
able to predict a workload increase and see what the effects are of these increasing
workloads?
Based on our examinations, we observed that when the number of incidents that were
Resolved in time is subtracted from the number of incidents that were Opened, the
resulting value has a predictive character for the number of incidents that
breached the service level time. In addition we see that when the value Opened –
Resolved in time increases or decreases, this affects the service level performance in
a later time period. Our results also show that the value Opened – Resolved in time
has a stronger predictive character based on monthly results than based on weekly
results.
According to the SPM, most BUMs focus on the KPI incidents resolved in time. Our
research results show that the value Opened – Resolved in time can be used to
predict the service level performance. By using this value, BUMs can act
more proactively and can therefore be more in control of their service level.
Observation 7
Because the results of a monthly overview are more accurate than the results of a
weekly overview, we recommend that the BUMs use the monthly overviews for long
term decision making and use the weekly overviews to see what the effects are when
short term decisions are made.
4 Conclusion
4.1 Conclusion
Which information should business unit managers extract from the incident
management event log to control the service level of their products?
To visualize distribution- and handling activities, the BUM should extract the
following information from the event log of the incident management system and
import this information into a process mining tool.
Event log data         Information
Incident number        Case ID
Opened activity        Activity ID
Assignment activity    Activity ID
Resolved activity      Activity ID
Closed activity        Activity ID
Control group          Resource ID
Assignment group       Resource ID
Resolved group         Resource ID
Closed group           Resource ID
By renaming the assignment groups to first reassignment group, second
reassignment group, etc., the BUM can create social networks and visualize how
business units interact with each other.
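The mapping above can be illustrated with a small Python sketch that flattens one
incident record into event rows (case ID, activity, resource, timestamp), which is
the general shape that process mining tools import. The record layout, the field
names and the pairing of each activity with a group are assumptions made for this
illustration; the actual export of the incident management system may differ.

```python
def to_events(incident):
    """Flatten one incident record into (case_id, activity, resource, timestamp) rows."""
    case_id = incident["incident"]
    events = [(case_id, "Opened", incident["control_group"], incident["opened_on"])]
    # Number the assignments so that first, second, ... reassignment stay distinct.
    for position, (group, assigned_on) in enumerate(incident["assignments"], start=1):
        events.append((case_id, f"Assignment {position}", group, assigned_on))
    events.append((case_id, "Resolved", incident["resolved_group"], incident["resolved_on"]))
    events.append((case_id, "Closed", incident["closed_group"], incident["closed_on"]))
    # ISO 8601 timestamps sort chronologically as plain strings.
    return sorted(events, key=lambda event: event[3])


# Hypothetical incident, invented for illustration only.
incident = {
    "incident": "INC-0001",
    "control_group": "Service desk",
    "opened_on": "2012-03-01T09:00",
    "assignments": [("Desktop support", "2012-03-01T09:10"),
                    ("Network team", "2012-03-01T11:30")],
    "resolved_group": "Network team",
    "resolved_on": "2012-03-01T15:45",
    "closed_group": "Service desk",
    "closed_on": "2012-03-02T08:00",
}

events = to_events(incident)
for event in events:
    print(event)
```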
By comparing the number of incidents that were Resolved in time with the number
of incidents that were closed in the same time period, the BUM can calculate the
service level performance percentage (KPI incidents resolved in time). In addition,
by adding the variable assignment groups, the BUM is able to see the effect on the
service level performance when incidents are handled by one or more business
units.
The BUM can filter on the data Product type to focus on the products
that are related to his/her responsibility.
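The calculation of the KPI per number of assignment groups, with an optional filter
on product type, could look like the following Python sketch. The function name,
the field names and the sample incidents are hypothetical and serve only as an
illustration of the calculation.

```python
from collections import defaultdict


def resolved_in_time_pct(incidents, product_type=None):
    """Return {number of assignment groups: % of closed incidents resolved in time}."""
    closed = defaultdict(int)
    in_time = defaultdict(int)
    for incident in incidents:
        if product_type is not None and incident["product_type"] != product_type:
            continue  # keep only the product type the BUM is responsible for
        groups = incident["assignment_groups"]
        closed[groups] += 1
        if incident["resolved_in_time"]:
            in_time[groups] += 1
    return {groups: 100.0 * in_time[groups] / closed[groups] for groups in sorted(closed)}


# Hypothetical closed incidents, invented for illustration only.
incidents = [
    {"product_type": "Desktop", "assignment_groups": 1, "resolved_in_time": True},
    {"product_type": "Desktop", "assignment_groups": 1, "resolved_in_time": True},
    {"product_type": "Desktop", "assignment_groups": 2, "resolved_in_time": True},
    {"product_type": "Desktop", "assignment_groups": 2, "resolved_in_time": False},
    {"product_type": "Laptop", "assignment_groups": 1, "resolved_in_time": False},
]

print(resolved_in_time_pct(incidents, product_type="Desktop"))
```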
By subtracting the number of incidents that were resolved in time from the number
of incidents that were opened in that time period, the BUM obtains a predictive
indicator that shows how the service level of a product will perform in the
future if no action is taken. Our research results show that this
predictive indicator is more accurate on a monthly overview than on a
weekly overview.
4.2 Recommendations
Based on our observations, we make the following recommendations:
Recommendation 1:
During the process conformance phase, we observed that in 80 cases the incident
management system registered a wrong time stamp on the closed activity in the event
log. This gives a wrong impression of how incidents were handled. Therefore we
recommend that the settings of the incident management system be changed, so that
BUMs have an accurate view of the incident management distribution process.
Recommendation 2:
During the process conformance phase, we observed that, in 72 cases, an incident
handler executed unusual activities in the incident management system between the
moment the incident was resolved and the moment the incident was closed. According
to the SPM, this type of activity is not part of the incident management process.
Therefore, we recommend that the SPM examines the cause of this type of activity.
The solution can be found in two types of changes:
1. The incident handler must execute this type of activity to be able to close
the incident. In this case the SPM needs to change the incident
management process model;
2. The incident handler executes an unnecessary activity. In this case the
incident handler needs to be briefed on how the incident should be handled
within the incident management system.
Recommendation 3:
The incident management system does not add the extra elapsed time when an
incident is reopened after it was closed. This means that there is a relatively
high chance that many incidents breached the service level time after they were
reopened, without this being registered. This automatically affects the measured
service level performance of a product. Therefore we recommend that the SPM
examines the cause of this type of occurrence and shows how it affects the service
level performance. If the effects on the service level performance are relatively
high, we recommend that the settings of the incident management system be changed,
so that BUMs have more accurate information to control the service level of their
products.
Recommendation 4:
It is difficult to measure the amount of skills (human resources) that are needed to
stay in control of the service level, because the event log does not provide
activity data that can be used to measure the time dimensions service time and
synchronization time. We recommend that the SPM examines the possibility of
measuring the service times. If this is possible, the BUM can compare this information
with the number of skills (human resources) and the KPI incidents resolved in time
and calculate the amount of extra skills (human resources) that are needed when a
service level decreases.
Recommendation 5:
The performance of the service level rapidly decreases when incidents are handled by
two or more assignment groups. Therefore it is important that the quality of the
information with which incidents are registered improves, so that the incident
coordinator knows which business unit must resolve the incident. If this quality can
be improved, this will automatically affect the performance of the service level in a
positive way. We recommend that the SPM examines how the quality of the
information can be improved so that incident handlers can act more efficiently and
more effectively.
Recommendation 6:
By subtracting the number of incidents that were resolved in time from the number
of incidents that were opened in that time period, the BUM obtains a predictive
indicator that shows how the service level of a product will perform in the
future if no action is taken. Because the results of a monthly overview
are more accurate than the results of a weekly overview, we recommend that the BUMs
use the monthly overviews for long term decision making (approximately 4 weeks)
and use the weekly overviews to see what the effects are when short term decisions
(approximately 1 week) are made.
4.3 Discussion
Based on the conclusions and recommendations, the SPM stated the following:
The research results are very interesting, because now we know that we can use
valuable information from the event log of the incident management system to
control the performance of our service levels.
Recommendation 1:
Although these unusual time stamp registrations will not affect the performance of
the service level, it is interesting to see that process mining techniques can visualize
these kinds of problems. We always strive to improve our processes including the
systems that support the handling of these processes. Therefore I will ask an expert
to examine this problem and change the registration activities when this is possible.
Recommendation 2:
This unusual activity also does not affect the performance of the service level. Based
on your observation, I would like to know why this activity is executed. Therefore I
will ask a process manager to examine this type of activity and make changes when
this is needed.
Recommendation 3:
It is important that the incident management system registers the absolute elapsed
time. Therefore I will examine what percentage of the incidents were reopened
after they were closed. If this percentage is significant, then we will look for
ways to measure and register the absolute elapsed time within our
incident management system.
Recommendation 4:
At the moment it is not possible to measure the service time from the incident
management system. To measure the service times, we extract data from our
Enterprise Resource Planning (ERP) system and compare this information with the
incidents that were resolved per employee. These results give us a good estimation
of how many extra skills (human resources) are needed. Therefore we do not need to
examine the possibilities to measure the service time from the incident management
system.
Recommendation 5:
By visualizing the effects on the performance of a service level when incidents are
handled by two or more assignment groups, we see that the quality of the information
with which incidents are registered must be improved. If we are able to improve the
quality of information, incidents will be resolved quicker. In addition, if the
incident coordinators are more capable of assigning the incidents to the correct
assignment group, the workload of other assignment groups will decrease. These
effects will increase the performance of the service levels. Therefore I will examine
how we can improve the quality of the information with which incidents are registered.
Recommendation 6:
The research results show that we can use relatively simple data as
a predictive indicator to control our service levels in a proactive way. Unfortunately,
for now we can only use these variables based on judgment. I will investigate whether
we can use these variables for different product types. If so, I will inform the BUMs
so that they can use these variables and see what the effects are on our service levels.
References
Literature:
• Jonker, J. & Pennink, B.J.W. (2004). De kern van methodologie. De kern van
organisatieonderzoek. 2nd edition. Assen: Koninklijke Van Gorcum.
• Leeuw, A.C.J. de (2005). Bedrijfskundige methodologie. Management van
onderzoek. 6th edition. Assen: Koninklijke Van Gorcum.
• Turban, E., Sharda, R. & Delen, D. (2011). Decision Support and Business
Intelligence Systems. 9th edition. New Jersey: Pearson Education, Inc.
• Aalst, W.M.P. van der (2011). Navigeren met process mining. Automatisering
Gids.
• Aalst, W.M.P. van der, Reijers, H.A., Weijters, A.J.M.M., Dongen, B.F.
van, Alves de Medeiros, A.K., Song, M. & Verbeek, H.M.W. (2007).
Business process mining: An industrial application. Information Systems,
volume 32, issue 5, pages 713-732. Amsterdam: Elsevier.
• Aalst, W.M.P. van der (2011). Process mining. Dordrecht: Springer.
• Aalst, W.M.P. van der & Hee, K.M. van (2002). Workflow Management:
Models, Methods, and Systems. Cambridge: MIT Press.
• Moore, D.S. & McCabe, G.P. (2006). Introduction to the practice of statistics.
5th edition. W.H. Freeman and Company.
Internet sources Process mining tooling:
• Process mining tool ProM
www.processmining.org
• Process mining tool Disco
http://www.fluxicon.com
Appendix
Event log information
Nr.  Name                       Description
1    Agreement ID               Service level contract number
2    Assignee                   Incident handler who executed the activity
3    Assignment group           Business unit to which the incident was assigned when
                                the activity was handled
4    Breached                   Was the incident resolved in time (True, False)
5    Brief Description          One-line description of the incident problem
6    Calamity                   Did the incident lead to a calamity
9    Closed Group               Business unit that closed the incident
10   Closed By                  Incident handler who closed the incident
11   Closed on (date/time)      Date and time when the incident was closed
15   Company                    Company that registered the incident
17   Control group              Business unit that was responsible for managing the
                                incident
17   Impact                     Which impact is related to the incident (e.g. Users,
                                Site, Enterprise)
18   Incident                   Registered incident number
19   Incident (type)            - Incident: a (potential) disruption of an agreed
                                  service.
                                - Pro-active incident: a (system) message that does not
                                  (yet) indicate a disruption of service.
                                - Information request: a question about a service.
                                - User support: a request to provide user support for
                                  a service.
20   Linked to problem          Whether the incident is linked to a problem
21   Norm                       Service level resolve time (e.g. 4 hours, 11 hours,
                                33 hours, 110 hours)
22   Opened by                  Incident handler who opened the incident
23   Opened on (date/time)      Date and time when the incident was opened
27   Priority                   Which priority the incident received, based on the
                                Impact variable and the Urgency variable (e.g. Low,
                                Standard, High, Major, Critical)
28   Problem Type               Product type name (e.g. Desktop, Laptop)
31   Resolved group             Assignment group that resolved the incident
32   Resolved by                Incident handler who resolved the incident
33   Resolved on (date/time)    Date and time on which the incident was resolved
35   SLA title                  Name of the Service Level Agreement
36   SLO end date/time          When was the incident closed (date and time)
38   SLO expiration date/time   When must the incident be resolved (date and time)
40   SLO name                   Service Level Object name
41   SLO start date             Date and time when the incident was opened
44   Suspended                  Is the incident currently suspended? (True/False)
45   Ticket status              Closed, Open
46   Urgency                    Which urgency the incident received (e.g. Low, Normal,
                                Major)
47   Elapsed time               Measured time between the open activity and the
                                resolved activity