Using Data Mining to Aid in Rule Construction for Event

MINING EVENT DATA FOR ACTIONABLE PATTERNS

Joseph L. Hellerstein and Sheng Ma, IBM T.J. Watson Research Center, Hawthorne, New York

{hellers, shengma}@us.ibm.com

A central problem in event management is constructing correlation rules. Doing so requires characterizing patterns of events for which actions should be taken (e.g., sequences of printer status changes that foretell a printer-offline event). In most cases, rule construction requires experts to identify problem patterns, a process that is time-consuming and error-prone. Herein, we describe how data mining can be used to identify actionable patterns. In particular, we present efficient mining algorithms for three kinds of patterns found in event data: event bursts, periodicities, and mutually dependent events.

1. Introduction

Event management is central to the operations of computer and communications systems in that it provides a way to focus on exceptional situations. As installations grow in size and complexity, event management has become increasingly burdensome. Automated operations (AO), which was introduced in the mid-1980s (e.g., [Mill86]), provides a way to automatically filter and act on events, typically by using correlation rules. While this reduces the burden on the operations staff, AO creates a challenge as well: constructing the correlation rules. Visual programming techniques have simplified the mechanics of rule construction. However, determining the content and effectiveness of rules remains an impediment to increased automation. Herein, we propose an approach to simplifying rule construction by using data mining to characterize the left-hand side of correlation rules.

Figure 1 summarizes how event management is done today and our vision for how it can be improved. The area above the dotted line depicts the current state of the art. Raw events flow into the event management system, where they are parsed and stored. Then, a correlation engine uses (correlation) rules to interpret these events. Some events are filtered. Others are coalesced. And some result in alarms, emails, or other actions. Correlation rules are structured as if-then statements. The if-part (or left-hand side) describes a situation in which actions are to be taken. The then-part (or right-hand side) details what is to be done when the condition is satisfied.

Our focus is rule construction, especially the left-hand side of rules. The rationale for this focus is that we must know the situation to be addressed before an action can be identified. Today, two broad approaches are used to specify the left-hand side of rules. The first is based on universal truths, such as exploiting topology information to make inferences about connectivity (e.g., [YeKlMoYeOh96]). Unfortunately, there remains a broad range of problems that are not addressed by universal truths. For these, the second approach applies: human experts must construct correlation rules, a process that is time-consuming and error-prone.

We propose two approaches to assist experts in rule construction. The first is to visualize large volumes of event data to aid in determining relationships between events. The second employs data mining techniques to automate the search for patterns. These approaches can be used separately, but they are most effective when employed in combination.

To elaborate, consider the events displayed in Figure 2. These events were collected from a corporate intranet over a three-day period. The events consist of SNMP traps such as “threshold violated”, “connection-closed”, “port-up”, and “port-down”. The x-axis is time, and the y-axis is the host from which the event originated. Hosts are encoded as integers between 1 and 149, the number of hosts present. Note that while this plot contains a considerable amount of information, little can be discerned in terms of patterns.

Now, consider Figure 3. Here, the hosts are ordered in a way to reveal patterns (as in [MaHe99]), many of which provide the basis for constructing correlation rules. For example, pattern 1 consists of “threshold violated” and “threshold reset” events that occur every thirty seconds. Such a pattern may be indicative of hosts nearing their capacity limits. Pattern 2 has a cloud-like appearance that consists of “port-up” and “port-down” events generated as a result of mobile users connecting to and disconnecting from hubs. Such patterns are probably of little interest to the operations staff and hence should be filtered since they represent normal behavior. Pattern 3, which happens every day at 2:00pm, consists of SNMP “request” and “authentication failure” events. This is most likely due to an improperly configured monitor. Pattern 4 is a series of link-up and link-down events that resulted from a software problem on a group of hubs.

In a well-managed installation, errors are rare. Thus, months of data are needed to identify actionable abnormalities. These data volumes can be substantial. For example, several installations we work with routinely collect five million events per week. Given the large volume of data and the different time scales at which patterns may be present, it is difficult to systematically identify patterns through visualization alone. Clearly, automation is needed as well.

Our approach to this automation makes use of data mining. Data mining is a mixture of statistical and data management techniques that provide a way to cluster categorical data so as to find “interesting” combinations (e.g., a hub “cold start” trap is often preceded by “CPU threshold violated”). Considerable work has been done in mining transactional data (e.g., supermarket purchases), much of which is based on [AgImSw93] and [AgSr94]. For event data, time plays an important role. Follow-on research has pursued two directions that address this requirement. The first, sequential data mining (e.g., [AgSr95]), takes into account the sequences of events rather than just their occurrence. The second, temporal data mining (e.g., [MaToVe97]), considers the time between event occurrences. Data mining has been applied to numerous domains. [ApHo94] discusses the construction of rules for capital markets. [CoSrMo97] describes approaches to finding patterns in web accesses. [ApWeGr93] discusses prediction of defects in disk drives. [MaToVe97] addresses sequential mining in the context of telecommunications events. The latter, while closely related to our interests, uses event data to motivate temporal associations, not to identify characteristic patterns and their interpretation. In summary, while much foundational work has been done on data mining and some of it pertains to mining event data, no one has addressed the specifics of patterns that arise in enterprise event management.

The remainder of this paper is organized as follows. Section 2 provides background on data mining. Sections 3, 4, and 5 describe several patterns of particular interest in event management: event bursts, periodic patterns, and mutually dependent patterns. Our conclusions are contained in Section 6.

2. Data mining background

This section provides a brief overview of data mining as it pertains to the analysis of event data. We begin by describing market basket analysis, the context in which data mining was first proposed. Then, we discuss efficiency considerations, a topic of particular importance given the large size of event histories that must be mined. Last, we show how traditional data mining relates to event mining.

Market basket analysis originated from looking at data from supermarkets. The context is as follows. Each customer has a basket of goods. The question addressed is “Which items, when purchased, indicate that another item will be purchased as well?” The answer is commonly expressed as an association rule. For example, early studies found that when diapers are purchased, beer is frequently purchased as well. Association rules indicate a one-way dependency. For example, it turns out that purchasers of beer are, in general, not particularly inclined to buy diapers.

To proceed, some notation is introduced. Let $I_1, \ldots, I_M$ be the items that can be purchased. Thus, each market basket contains a subset of these items. We use $B_1, \ldots, B_N$ to denote the set of market baskets, where there is one basket per transaction. Thus, $B_j \subseteq \{I_1, \ldots, I_M\}$ for $1 \le j \le N$.

A key data mining problem is to find sets of items, typically referred to as itemsets, that occur in a large number of market baskets. This is captured in a metric called support. Support is computed by counting the number of baskets in which the itemset occurs and then dividing by $N$, the number of baskets. A second and closely related problem addresses prediction. Here, we are looking for itemsets $X$ that have a high probability of predicting that an itemset $Y$ will be in the same basket. The metric used here is confidence. Confidence is computed by counting the baskets in which both $X$ and $Y$ occur (which we denote by $\mathrm{Count}(X \cup Y)$) and then dividing by $\mathrm{Count}(X)$, the number of baskets in which $X$ occurs.
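To make the two metrics concrete, the following is a minimal Python sketch that computes support and confidence over a few baskets. The baskets and the helper names `count`, `support`, and `confidence` are hypothetical illustrations, not part of the original study; the data are chosen only to reproduce the one-way dependency described above.

```python
from typing import FrozenSet, List

Basket = FrozenSet[str]


def count(itemset: FrozenSet[str], baskets: List[Basket]) -> int:
    """Number of baskets that contain every item of the itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset: FrozenSet[str], baskets: List[Basket]) -> float:
    """Fraction of baskets in which the itemset occurs."""
    return count(itemset, baskets) / len(baskets)


def confidence(x: FrozenSet[str], y: FrozenSet[str], baskets: List[Basket]) -> float:
    """Count(X u Y) / Count(X): how often Y is present when X is present."""
    cx = count(x, baskets)
    return count(x | y, baskets) / cx if cx else 0.0


# Hypothetical baskets chosen to show a one-way dependency.
baskets = [
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "beer", "milk"}),
    frozenset({"beer"}),
    frozenset({"beer", "chips"}),
]

print(support(frozenset({"diapers", "beer"}), baskets))                  # 0.5
print(confidence(frozenset({"diapers"}), frozenset({"beer"}), baskets))  # 1.0
print(confidence(frozenset({"beer"}), frozenset({"diapers"}), baskets))  # 0.5
```

Note the asymmetry: diapers predict beer with confidence 1.0, but beer predicts diapers with confidence only 0.5.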

Typically, mining involves finding all patterns whose support is larger than a minimum value of support, which we denote by MinSupp. (In the algorithms below, MinSupp is expressed as a count of baskets rather than as a fraction.) A naïve approach is displayed in Algorithm 1.

QC = {}
For each possible pattern P
    Count = 0
    For each market basket B
        For each item I in P
            If I is not in B, advance to next market basket
        End
        Count++
    End
    If Count > MinSupp, add P to QC
End

Algorithm 1: Naive Approach to Finding Frequent Patterns

In Algorithm 1, QC is the set of qualified candidates, those patterns that have the minimum support level. The algorithm considers all possible patterns, scans through all market baskets for each pattern, and, for each market basket, tests whether each item of the pattern is present in that basket.
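For readers who want to experiment, here is a minimal runnable Python sketch of Algorithm 1. It assumes baskets are Python `frozenset`s of item names and that `min_supp` is a basket count (mirroring `Count > MinSupp` in the pseudocode); the function name `naive_frequent_patterns` and the example baskets are ours.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def naive_frequent_patterns(baskets: List[FrozenSet[str]], min_supp: int) -> Set[FrozenSet[str]]:
    """Algorithm 1: test every non-empty pattern against every basket.

    min_supp is a basket count, matching `Count > MinSupp` in the pseudocode.
    """
    items = sorted(set().union(*baskets))
    qc: Set[FrozenSet[str]] = set()  # qualified candidates
    # Outer loop: all 2^M - 1 non-empty patterns over the M distinct items.
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            p = frozenset(pattern)
            # A basket counts only if every item of the pattern is in it.
            cnt = sum(1 for basket in baskets if p <= basket)
            if cnt > min_supp:
                qc.add(p)
    return qc


baskets = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]
found = naive_frequent_patterns(baskets, min_supp=1)
print(sorted("".join(sorted(p)) for p in found))  # ['a', 'ab', 'ac', 'b', 'c']
```

The nested `for size` / `combinations` loops enumerate every non-empty pattern, which is where the exponential cost comes from.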

Considerable computation time is required to perform this algorithm, even on modest-sized data sets. In particular, observe that the number of iterations in the outer loop is exponential in the number of items, since there are $2^M - 1$ possible non-empty patterns (where $M$ is the number of items). Clearly, Algorithm 1 scales poorly.

Fortunately, the search for frequent patterns can be made more efficient. Doing so rests on the following observation:

If $X \subseteq X'$, then the support for $X'$ can be no greater than the support for $X$, since every basket that contains $X'$ also contains $X$.

This means that if we find a pattern with low support, there is no need to consider any pattern that contains that pattern. This is an example of the downward closure property.

With the downward closure property, we can improve the efficiency of Algorithm 1. Algorithm 2, shown below, does so by considering patterns of increasing length. Such a strategy is referred to as level-wise search [AgImSw93].

FI = {I such that Count(I) > MinSupp}   /* Frequent items */
QC(1) = FI
N = 1
While QC(N) is not empty
    QC(N+1) = {}
    For each P = Q ∪ {F} such that Q is in QC(N), F is in FI, and F is not in Q
        Count = 0
        For each market basket B
            For each item I in P
                If I is not in B, advance to next market basket
            End
            Count++
        End
        If Count > MinSupp, add P to QC(N+1)
    End
    N++
End

Algorithm 2: Using Downward Closure to Find Frequent Patterns

The algorithm first finds frequent items, since by downward closure infrequent items cannot be in frequent patterns. QC(N) contains the qualified candidates with N items. The potential patterns with N+1 items are those that have N items in combination with one of the frequent items not already in the N-item pattern.¹ Even though Algorithm 2 has four loops instead of the three in Algorithm 1, it only extends patterns already known to be frequent rather than enumerating all $2^M - 1$ possible patterns, and so is considerably more efficient in practice.
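A corresponding Python sketch of the level-wise search in Algorithm 2 follows, under the same assumptions as before (frozenset baskets, count-valued `min_supp`, hypothetical function name and data). Like the pseudocode, it extends each qualified N-item pattern by one frequent item; it omits the further candidate pruning used by more refined algorithms such as [AgSr94].

```python
from typing import FrozenSet, List, Set


def levelwise_frequent_patterns(baskets: List[FrozenSet[str]], min_supp: int) -> Set[FrozenSet[str]]:
    """Algorithm 2: grow qualified patterns one frequent item at a time."""

    def count(pattern: FrozenSet[str]) -> int:
        return sum(1 for basket in baskets if pattern <= basket)

    items = set().union(*baskets)
    # Frequent items: by downward closure, no other item can be in a frequent pattern.
    fi = {item for item in items if count(frozenset([item])) > min_supp}
    level = {frozenset([item]) for item in fi}  # QC(1)
    result = set(level)
    while level:  # while QC(N) is not empty
        # Candidates with N+1 items: a qualified N-item pattern plus one new frequent item.
        candidates = {q | {f} for q in level for f in fi if f not in q}
        level = {p for p in candidates if count(p) > min_supp}  # QC(N+1)
        result |= level
    return result


baskets = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]
found = levelwise_frequent_patterns(baskets, min_supp=1)
print(sorted("".join(sorted(p)) for p in found))  # ['a', 'ab', 'ac', 'b', 'c']
```

On these baskets, the item d is discarded up front and the pattern bc is never extended, yet the result matches exhaustive enumeration.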

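The downward closure property holds for some metrics and not for others. In particular, downward closure does not hold for the confidence of association rules. To see this, recall that the confidence with which an itemset $X$ predicts an itemset $Y$ is computed as $\mathrm{Count}(X \cup Y)/\mathrm{Count}(X)$, where $\mathrm{Count}(\cdot)$ is the number of baskets containing the given items. Now consider the confidence with which $X \cup \{Z\}$ predicts $Y$:

$$\frac{\mathrm{Count}(X \cup \{Z\} \cup Y)}{\mathrm{Count}(X \cup \{Z\})}.$$

Observe that by including a new item $Z$, we decrease (or at least do not increase) the numerator. However, including $Z$ in the pattern decreases (or does not increase) the denominator as well. Thus, it is unclear whether the resulting ratio will be smaller or larger than the original ratio. Hence, downward closure does not hold.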
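A small numeric illustration, using hypothetical baskets over items a, b, c, and d: adding c to the antecedent raises the confidence of predicting b, while adding d lowers it. A low-confidence rule therefore cannot be pruned on the grounds that its extensions must also have low confidence. The helper `confidence` is our own.

```python
from typing import FrozenSet, List


def confidence(x: FrozenSet[str], y: FrozenSet[str], baskets: List[FrozenSet[str]]) -> float:
    """Count(X u Y) / Count(X) over the given baskets."""
    cx = sum(1 for basket in baskets if x <= basket)
    cxy = sum(1 for basket in baskets if (x | y) <= basket)
    return cxy / cx if cx else 0.0


# Hypothetical baskets over items a, b, c, d.
baskets = [frozenset("abc"), frozenset("ad"), frozenset("a"), frozenset("a")]
target = frozenset("b")
print(confidence(frozenset("a"), target, baskets))   # 0.25
print(confidence(frozenset("ac"), target, baskets))  # 1.0: adding c raises confidence
print(confidence(frozenset("ad"), target, baskets))  # 0.0: adding d lowers it
```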

Now, we return to the problem of mining event data. Here, the context changes in a couple of ways. First, there is no concept of a market basket. However, events have a timestamp and so looking for patterns of events means looking at events that co-occur within a time range. These ranges may be windows (either of fixed or variable size) or they may be contiguous segments of the data that are designated in some other way. In the data mining literature, this is referred to as temporal mining or temporal association.
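The simplest of these options, fixed-size non-overlapping windows, can be sketched as follows. The event layout (timestamp in seconds, event type, host), the host names, and the function `intervalize` are assumptions for illustration. Each window becomes one “market basket” of (event type, host) items, using the two attributes discussed in the next paragraph.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical event layout: (timestamp in seconds, event type, host).
Event = Tuple[float, str, str]


def intervalize(events: List[Event], width: float) -> Dict[int, FrozenSet[Tuple[str, str]]]:
    """Assign each event to the fixed window [k*width, (k+1)*width) and return
    one basket of (event type, host) items per non-empty window, keyed by k."""
    windows: Dict[int, Set[Tuple[str, str]]] = defaultdict(set)
    for ts, etype, host in events:
        windows[int(ts // width)].add((etype, host))
    return {k: frozenset(items) for k, items in sorted(windows.items())}


events = [
    (0.5, "port-down", "hub7"),
    (1.2, "port-up", "hub7"),
    (31.0, "threshold violated", "afs2"),
    (59.9, "threshold reset", "afs2"),
]
for k, basket in intervalize(events, width=30.0).items():
    print(k, sorted(basket))
```

Because items are sets, repeated occurrences of the same (type, host) pair within a window collapse to a single item, just as an item appears at most once in a market basket.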

A second consideration needed in event mining relates to the attributes used to characterize membership in itemsets. Several attributes are common to event data. Event type describes the nature of the event. Event origin specifies the source of the event, which is a combination of the host from which the event originated and the process and/or application that generated the event. (Due to the limited granularity of the data used in our running examples, we simplify matters in the sequel by just referring to the host from which the event originated.) In addition to type and origin, there is a plethora of other attributes that depend on these two, such as the port associated with a “port down” event and the threshold value and metric in a “threshold violated” event.

The next three sections address patterns we have discovered in the course of analyzing event data: event bursts, periodicities, and mutually dependent events. Each is illustrated using the corporate intranet data. Then we discuss issues related to the efficient discovery of these patterns.

3. Event bursts

This section describes a commonly occurring pattern in problem situations—event bursts. We begin by motivating this pattern and providing an example. Next, we outline our approach to discovering these patterns.

Event bursts (or event storms) arise under several circumstances. For example, when a critical element fails in a network that lacks sufficient redundancy (e.g., the only name server fails), communications are impaired, thereby causing numerous “cannot reach destination” events to be generated in a short time period. Another situation relates to cascading problems, such as those introduced by a virus or, more subtly, by switching loads after a failure, a change that can result in additional failures due to heavier loads.

¹ The For statement in Algorithm 2 is a shorthand for this construction, although it is not technically precise.

Figure 4 provides a means for visual identification of event bursts in our corporate intranet data. The plot in the lower left contains the raw data in the same form as in Figure 3 (although the ordering of hosts on the y-axis is somewhat different). Given the coarse time scale of the plot relative to the granularity of event arrivals, there are many cases in which more than one event occupies the same pixel. As a result, it is difficult to discern event rates by inspection. We could do drill-downs in various sections of the plot to better determine event rates, but this is labor intensive.

Instead, the upper left plot summarizes the rates of events for a specific window size (as indicated in the lower left). Further, the table in the upper right of Figure 4 summarizes those situations in which large event rates are present. This provides a convenient way to select subsets of the data to study in detail.

Mining event bursts consists of the following two steps:

1. Finding periods in which event rates are higher than a specified threshold

2. Mining for patterns common to the periods identified in step (1)

For step 1, we proceed by first intervalizing the data. Then, event rates within each interval are computed. Those intervals in which rates exceed a specified threshold are then identified. In Figure 4, these intervals are indicated by the vertical lines that lie above the threshold (which is indicated by the horizontal line).

Step 2 uses the intervals identified in step 1 as the “market baskets” of events. For example, mining the three intervals with the largest event rates in Figure 4 finds the pattern “SNMP request”, “Authentication Failure”. Note that the mining employed here is essentially that done in Algorithm 2. However, our “market baskets” are just those intervals that have high event rates.
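The two steps can be sketched in Python as follows. This is a simplification under stated assumptions: events are hypothetical (timestamp, event type) pairs, windows are fixed and non-overlapping, the threshold is in events per second, and step 2 counts only pairs of event types rather than running the full level-wise search of Algorithm 2. The function names `find_bursts` and `common_patterns` are ours.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Event = Tuple[float, str]  # hypothetical layout: (timestamp in seconds, event type)


def find_bursts(events: List[Event], width: float, rate_threshold: float) -> List[int]:
    """Step 1: intervalize into fixed windows and return the indices of windows
    whose event rate (events per second) exceeds rate_threshold."""
    per_window = Counter(int(ts // width) for ts, _ in events)
    return sorted(k for k, n in per_window.items() if n / width > rate_threshold)


def common_patterns(events: List[Event], width: float, bursts: List[int],
                    min_count: int) -> Set[Tuple[str, str]]:
    """Step 2 (simplified): treat each burst window as a market basket of event
    types and return the type pairs present in more than min_count bursts."""
    burst_set = set(bursts)
    baskets: Dict[int, Set[str]] = defaultdict(set)
    for ts, etype in events:
        k = int(ts // width)
        if k in burst_set:
            baskets[k].add(etype)
    pair_counts = Counter(
        pair for basket in baskets.values() for pair in combinations(sorted(basket), 2)
    )
    return {pair for pair, n in pair_counts.items() if n > min_count}


events = [
    (1, "SNMP request"), (2, "authentication failure"), (3, "SNMP request"),
    (4, "authentication failure"), (5, "port-down"),
    (70, "port-up"),
    (121, "SNMP request"), (122, "authentication failure"),
    (123, "SNMP request"), (124, "authentication failure"),
]
bursts = find_bursts(events, width=60, rate_threshold=0.05)
print(bursts)                                  # [0, 2]
print(common_patterns(events, 60, bursts, 1))  # {('SNMP request', 'authentication failure')}
```

The quiet window containing the lone “port-up” event is excluded in step 1, so it never dilutes the support computed in step 2.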

4. Periodicities

Periodic patterns consist of repeated occurrences of the same event or event set. Our experience has been that such patterns are common in event data, often accounting for one half to two thirds of the events present.

Why are periodic behaviors common in networks? Two factors contribute to this phenomenon. The first relates to monitoring: when a managed element emits a high-severity event, the management server often initiates periodic monitoring of key resources (e.g., router CPU utilization). The second is a consequence of scheduling routine maintenance tasks, such as rebooting print servers every morning or backing up data every week.

Our experience with analyzing events in computer networks is that periodic patterns often lead to actionable insights. There are two reasons for this. First, a periodic pattern indicates something persistent and predictable. Thus, there is value in identifying and characterizing the periodicity. Second, the period itself often provides a signature of the underlying phenomenon, thereby facilitating diagnosis. In either case, patterns with very low support are often of great interest. For example, we found a one-day periodic pattern due to a periodic port scan. Although this pattern occurs only three times in a three-day log, it provides a strong indication of a security intrusion.

Unfortunately, mining such periodic patterns is complicated by several factors.

1. Periodic behavior is not necessarily persistent. For example, in complex networks, periodic monitoring is initiated when an exception occurs (e.g., CPU utilization exceeds a threshold) and stops once the exceptional situation is no longer present. During the monitoring interval, or “on” segment, the monitoring request and its response occur periodically. The “off” segment consists of a random gap in the periodicity until another exceptional situation initiates periodic monitoring. This makes it difficult to apply well-established techniques such as the fast Fourier transform.

2. There may be phase shifts and variations in the period due to network delays, lack of clock synchronization, and rounding errors.

3. Period lengths are not known in advance. This means that either an exhaustive search is required or there must be a way to infer the periods. Further, periods may span a wide range, from milliseconds to days.

4. The number of occurrences of a periodic pattern typically depends on the period. For example, in a one-week log, a pattern with a one-day period has at most seven occurrences, while a pattern with a one-minute period may have as many as 10,080 (1,440 per day). Thus, mining patterns with longer periods requires adjusting support levels. In particular, mining patterns with low support greatly increases computational requirements in existing approaches to discovering temporal associations.

In order to capture items (1) and (2) above, we employ the concept of a partially periodic temporal association, which we refer to as a p-pattern. A p-pattern generalizes the concept of partial periodicity defined in [HaDoYi99] by combining it with temporal associations (akin to episodes in [MaToVe97]) and including the concept of time tolerance to account for imperfections in the periodicities.

Figure 5 illustrates the structure of a partially periodic pattern. Such patterns consist of on-segments and off-segments. During an on-segment, events are periodic with a period of $p$. No periodic event is present during an off-segment. Spurious events (or noise) may arise during both on-segments and off-segments.
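One way to make the on-segment/off-segment structure concrete is to split the occurrence times of a single event type into maximal runs whose consecutive gaps lie within $p \pm \delta$, where $\delta$ is a time tolerance. The sketch below does exactly that. It is a hypothetical illustration rather than the authors' algorithm (the function name `on_segments` and the `min_len` cutoff are ours), and a noise event falling inside an on-segment would split the run in this simple version. The timestamps are invented to mimic the roughly 30-second periodicity, with 28 to 33 second jitter, discussed below for pattern 1.

```python
from typing import List


def on_segments(timestamps: List[float], period: float, tolerance: float,
                min_len: int = 3) -> List[List[float]]:
    """Split occurrence times into maximal runs whose consecutive gaps are within
    period +/- tolerance; runs shorter than min_len are treated as off/noise."""
    segments: List[List[float]] = []
    current: List[float] = []
    for t in sorted(timestamps):
        if current and abs((t - current[-1]) - period) <= tolerance:
            current.append(t)          # gap matches the period: still "on"
        else:
            if len(current) >= min_len:
                segments.append(current)
            current = [t]              # start a new candidate run
    if len(current) >= min_len:
        segments.append(current)
    return segments


# Hypothetical times: two on-segments separated by an off-segment with one noise event.
ts = [0, 30, 58, 91, 121, 500, 800, 830, 861, 889]
print(on_segments(ts, period=30, tolerance=3))
# [[0, 30, 58, 91, 121], [800, 830, 861, 889]]
```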

Pattern 1 in Figure 2 is an example of a partial periodicity. Figure 6 displays a zoomed version of Figure 2 for two AFS servers in pattern 1. These partial periodicities contain two types of events: “threshold violated” (circled point) and “threshold reset” (uncircled point). As in Figure 2, the x-axis is time, and the y-axis is the host on which the events originated. Here the periodicities occur approximately every 30 seconds, although some are closer to 28 seconds and others are near 33 seconds.

We view mining for p-patterns as consisting of two sub-tasks: (a) finding period lengths and (b) finding temporal associations. While a variation of level-wise search can be employed to address the second sub-task, the first sub-task has not been addressed (to the best of our knowledge).

Our approach to finding the periods of p-patterns is to compute event inter-arrival times and then test if inter-arrival counts exceed what would have been expected by chance. Note that a simple threshold test is not sufficient here since small inter-arrival times are much more common than longer ones and hence the threshold must be adjusted by the size of the inter-arrival time. We address this by using a Poisson distribution as our null hypothesis for the count of events at specified inter-arrival times. A Chi-Square test is used to assess statistical significance. Next, we mine for the patterns at each statistically significant inter-arrival time. This is done by employing a level-wise search on each interval that comprises each period.
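The following Python sketch shows one way such a test could be set up for a single event type; the binning, the rate estimate, and the exact statistic are our assumptions, not details given here. Under a Poisson null, inter-arrival times are exponential with rate $\lambda$ (estimated as the number of gaps divided by the observed time span), so the probability $\rho$ that a gap falls in a bin is a difference of exponentials. For each bin with observed count $C$ out of $n$ gaps, the sketch computes $(C - n\rho)^2 / (n\rho(1-\rho))$ and flags the bin when $C$ exceeds $n\rho$ and the statistic exceeds 3.84, the 95% point of a chi-square distribution with one degree of freedom. It makes no correction for testing many bins, and the function name `significant_periods` is hypothetical.

```python
import math
from typing import Dict, List


def significant_periods(timestamps: List[float], bin_width: float,
                        threshold: float = 3.84) -> List[float]:
    """Return candidate periods: inter-arrival bins whose counts are far above
    what a Poisson arrival process would produce by chance."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    duration = ts[-1] - ts[0] if ts else 0.0
    if not gaps or duration <= 0:
        return []
    n = len(gaps)
    rate = n / duration  # estimated arrival rate under the Poisson null
    counts: Dict[int, int] = {}
    for g in gaps:
        k = round(g / bin_width)  # index of the nearest multiple of bin_width
        counts[k] = counts.get(k, 0) + 1
    periods: List[float] = []
    for k, observed in sorted(counts.items()):
        lo = max(0.0, (k - 0.5) * bin_width)
        hi = (k + 0.5) * bin_width
        # Probability that an exponential inter-arrival time falls in [lo, hi).
        rho = math.exp(-rate * lo) - math.exp(-rate * hi)
        expected = n * rho
        variance = n * rho * (1.0 - rho)
        if variance <= 0:
            continue
        chi2 = (observed - expected) ** 2 / variance
        if observed > expected and chi2 > threshold:
            periods.append(k * bin_width)
    return periods
```

For a strictly periodic series, `significant_periods([30.0 * i for i in range(20)], bin_width=2.0)` returns `[30.0]`: all 19 gaps land in one bin whose expected count under the Poisson null is below one. Each flagged period would then feed the level-wise search described above.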

We have applied our algorithm for p-pattern discovery to the corporate intranet data. Over 30 patterns were discovered, ranging in length from 1 to 13 events, with periods ranging from 1 second to 1 day.

5. Mutual dependencies

Thus far, we have considered frequently occurring patterns. That is, given a set of intervals, we look for pairs of event type and host that commonly occur together. While this problem statement is the focus in mainstream data mining, event management requires another perspective as well.

In particular, we are interested in events that occur together when they occur. We use the term mutually dependent pattern or m-pattern to refer to a group of events that occur together when they occur. In Figure 2, pattern 4 is an m-pattern. It consists of a combination of “link down” and “link up” events. The occurrence of these events is displayed in Figure 7 (which is a zoomed version of Figure 2).

The events in an m-pattern do not necessarily occur in the same sequence. However, they do occur as a group. Thus, m-patterns are quite different from association rules in that the latter only indicate a one-way dependency. Also, the metric for quantifying the presence of an m-pattern differs from that for association rules. Association rules are typically quantified in terms of support, the fraction of intervals in which the association is evidenced. In contrast, m-patterns require a metric more akin to confidence. This observation leads to some difficulties in terms of efficiently mining m-patterns since, as we discussed in Section 2, confidence does not have the downward closure property and hence Algorithm 2 cannot be used directly.

Figure 8 compares m-patterns and frequent patterns. Here, a, b, c, and d are events, and each interval spans two time units (as indicated by the dotted lines). Suppose that: (a) the support threshold for a frequent pattern is 40% (i.e., a pattern must appear in at least 40% of the intervals to be considered frequent) and (b) the m-pattern co-occurrence threshold is 90%. (The latter is much higher because of the semantics of an m-pattern.) Observe that the pattern ab is frequent in that it occurs in 50% of the intervals. However, there are four cases in which a occurs but b does not. Thus, ab is not an m-pattern. Now consider dc. This pattern is much less frequent than ab in that dc occurs in only two of the eight intervals, which is below the support threshold. However, whenever c or d occurs, the other does as well. Thus, dc is an m-pattern.

What should be done when an m-pattern is discovered? Logically, the events in the pattern can be treated as a group. So, from an analysis perspective, it is desirable to coalesce an m-pattern into a single event. This not only reduces the number of events, it also provides the opportunity for higher level semantics (e.g., the m-pattern caused by a router failure).

We turn now to the specifics of m-pattern discovery. A set of events $E$ is an m-pattern if $P(E_1 \mid E_2) \ge \mathrm{minp}$ for all nonempty subsets $E_1, E_2 \subseteq E$, where $\mathrm{minp}$ is the co-occurrence threshold. $P(E_1 \mid E_2)$ is computed by counting all intervals in which both sets of events are present and dividing by the number of intervals in which $E_2$ is present.

First, observe that m-patterns have the downward closure property. To see this, let $S \subseteq T$ and suppose that $S$ is not an m-pattern. We show that $T$ is also not an m-pattern. By definition, there exist nonempty subsets $S_1, S_2 \subseteq S$ such that $P(S_1 \mid S_2) < \mathrm{minp}$. But $S_1$ and $S_2$ are subsets of $T$ as well, so $T$ is not an m-pattern.

Even though downward closure holds for m-patterns, other computational difficulties arise. In particular, if we use the definition of an m-pattern to test for its presence, then the number of tests we must make is exponential in the size of the pattern. Clearly, this scales poorly.

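Fortunately, there is a way to simplify matters. We claim that if $P(E \mid \{e\}) \ge \mathrm{minp}$ for every $e \in E$, then $E$ is an m-pattern. That is, we must demonstrate that under this assumption it follows that $P(E_1 \mid E_2) \ge \mathrm{minp}$ for all nonempty $E_1, E_2 \subseteq E$.

Let $e \in E_2$, and let $\mathrm{Count}(S)$ denote the number of intervals in which all events of $S$ are present. Since $E_1 \cup E_2 \subseteq E$, every interval containing $E$ also contains $E_1 \cup E_2$; and since $\{e\} \subseteq E_2$, every interval containing $E_2$ also contains $e$. Hence

$$P(E_1 \mid E_2) = \frac{\mathrm{Count}(E_1 \cup E_2)}{\mathrm{Count}(E_2)} \ge \frac{\mathrm{Count}(E)}{\mathrm{Count}(\{e\})} = P(E \mid \{e\}) \ge \mathrm{minp}.$$

(The converse also holds: taking $E_1 = E$ and $E_2 = \{e\}$ shows that the condition is necessary.) This means that only linear time, in the number of events in the pattern, is required to check for the presence of an m-pattern, rather than time exponential in the size of the pattern.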
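The equivalence can also be checked mechanically. The Python sketch below implements both the exponential test taken straight from the definition and the linear test from the claim, and compares them on hypothetical intervals constructed to match the counts quoted for Figure 8 (ab in four of eight intervals, a without b in the other four, and cd in two). Function names are ours; we assume every event in a candidate pattern occurs at least once, and conditional probabilities with a zero denominator are skipped.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, List

Interval = FrozenSet[str]


def count(events: FrozenSet[str], intervals: List[Interval]) -> int:
    """Number of intervals in which every event of `events` is present."""
    return sum(1 for iv in intervals if events <= iv)


def nonempty_subsets(e: FrozenSet[str]) -> Iterator[FrozenSet[str]]:
    """Yield all 2^|e| - 1 non-empty subsets of e."""
    items = sorted(e)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_m_pattern_definition(e: FrozenSet[str], intervals: List[Interval], minp: float) -> bool:
    """Exponential test: P(E1 | E2) >= minp for all non-empty E1, E2 within E."""
    if any(count(frozenset([x]), intervals) == 0 for x in e):
        return False  # assumption: every event of E must occur at least once
    for e2 in nonempty_subsets(e):
        c2 = count(e2, intervals)
        if c2 == 0:
            continue  # P(E1 | E2) undefined; skipped
        for e1 in nonempty_subsets(e):
            if count(e1 | e2, intervals) / c2 < minp:
                return False
    return True


def is_m_pattern_linear(e: FrozenSet[str], intervals: List[Interval], minp: float) -> bool:
    """Linear test: P(E | {x}) >= minp for every single event x in E."""
    c_all = count(e, intervals)
    for x in e:
        c_x = count(frozenset([x]), intervals)
        if c_x == 0 or c_all / c_x < minp:
            return False
    return True


# Hypothetical intervals: ab in 4 of 8, a without b in 4, cd in 2.
intervals = [frozenset(s) for s in ["ab", "ab", "ab", "abcd", "a", "a", "a", "acd"]]
print(is_m_pattern_linear(frozenset("ab"), intervals, 0.9))  # False
print(is_m_pattern_linear(frozenset("cd"), intervals, 0.9))  # True
```

As in the Figure 8 discussion, ab is frequent (support 0.5) but not an m-pattern, while cd is an m-pattern despite a support of only 0.25.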

6. Conclusions

Event management is a fundamental part of systems management. Over the last fifteen years, automated operations has increased operator productivity by using correlation rules to interpret event patterns. While productivity has improved, a substantial bottleneck remains—determining what correlation rules to write.

This paper describes how data mining can be used to identify patterns of events that indicate underlying problems. Traditionally, data mining has been applied to consumer purchases, often referred to as market basket data. A central consideration in this work is scalability. This is achieved by using a level-wise search in which patterns are discovered by building larger patterns from smaller ones.

We show how to apply data mining to event data in an efficient and effective way. Several patterns are identified that are of particular interest to event management—event bursts, periodicities, and mutual dependencies. We provide interpretations for each, and we show how pattern discovery can be structured to exploit a level-wise search, thereby improving scalability. The latter is particularly important since, based on our experience, tens of millions of events may need to be analyzed in order to discover actionable patterns.

The first pattern we describe is event bursts (or event storms). These commonly occur when a critical component fails. Of particular interest is the set of events common to event bursts (e.g., in order to classify the kind of problem present). Here, the bursts serve as the market baskets to which a level-wise search is applied.

A second pattern is periodic occurrences of a set of events. Of most interest to event management are partial periodicities since periodic behavior is often initiated by some other source (e.g., violating a threshold). A key consideration here is finding the period. We describe how to construct a statistical test to do this. Once periods have been identified, they can be used in a level-wise search.

Last, we introduce mutually dependent patterns (or m-patterns). The objective is to identify groups of events that occur together when they occur, even though the occurrence of these events is not frequent. Looking for co-occurrences introduces algorithmic challenges, since new insights are required to achieve computational efficiency. Even so, we show that m-patterns have the downward closure property, and we show how an efficient level-wise search can be applied to the discovery of m-patterns.

Acknowledgements

Our thanks to Luanne Burns and David Rabenhorst for developing the prototype visualization and mining facility used for the studies in this paper. We also thank Chang-Shing Perng, David Taylor, and Sujay Parekh who, in addition to Luanne Burns and David Rabenhorst, provided stimulating discussion and useful comments on this work.

References

[AgImSw93] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules Between Sets of Items in Large Databases. Proc. of ACM SIGMOD International Conference on Management of Data, pp. 207-216, 1993.

[AgSr94] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. Proc. of Very Large Data Bases, 1994.

[AgSr95] R. Agrawal and R. Srikant. Mining Sequential Patterns. Proc. 1995 Int. Conf. Data Engineering, pp. 3-14, 1995.

[ApHo94] C. Apte and S.J. Hong. Predicting Equity Returns from Securities Data with Minimal Rule Generation. Knowledge Discovery and Data Mining, 1994.

[ApWeGr93] C. Apte, S. Weiss, G. Grout. Predicting Defects in Disk Drive Manufacturing: A Case Study in High-Dimensional Classification. Proc. of IEEE Conference on Artificial Intelligence and its Applications, 1993.

[CoSrMo97] R. Cooley, J. Srivastava, and B. Mobasher. Web mining: Information and pattern discovery on the world wide web. 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), 1997.

[HaDoYi99] J. Han, G. Dong, and Y. Yin. Efficient mining of partially periodic patterns in time series database. International Conference on Data Engineering, 1999.

[MaHe99] S. Ma and J.L. Hellerstein. Ordering Categorical Data to Improve Visualization. IEEE Symposium on Information Visualization, 1999.

[Mill86] K.R. Milliken, A.V. Cruise, R.L. Ennis, A.J. Finkel, J.L. Hellerstein, D.J. Loeb, D.A. Klein, M.J. Masullo, H.M. Van Woerkom, N.B. Waite. YES/MVS and the Automation of Operations for Large Computer Complexes. IBM Systems Journal. Vol 25, No. 2, 1986.

[MaToVe97] H. Mannila, H. Toivonen, and A.I. Verkamo. Discovery of Frequent Episodes in Event Sequences. Data Mining and Knowledge Discovery, 1(3), 1997.

[YeKlMoYeOh96] S.A. Yemini, S. Kliger, E. Mozes, Y. Yemini, and D. Ohsie. High Speed and Robust Event Correlation. IEEE Communications Magazine, Vol. 34, No. 5, pp. 82-90, 1996.
