TEMPORAL ASSOCIATION RULE MINING Prepared by : Ajit Padukone, Komal Kapoor


Page 1: TEMPORAL ASSOCIATION RULE MINING

TEMPORAL ASSOCIATION RULE MINING

Prepared by : Ajit Padukone, Komal Kapoor

Page 2: TEMPORAL ASSOCIATION RULE MINING

Outline

• Association Rule Mining
• Applications
• Temporal Association Rule Mining
• Existing Techniques and their Limitations
• Problem Statement
• Proposed Approach
    – Finding Maximal Valid Time Intervals
    – Finding All Temporally Frequent Itemsets
• Future Work

Page 3: TEMPORAL ASSOCIATION RULE MINING

Motivation

Association Rule Mining

Transaction Data

Transaction ID   Items
1                bread, milk, butter, cheese, chips
2                onion, capsicum, potatoes, burgers
3                bread, milk, yogurt, butter
4                onion, potatoes, ketchup, burgers
5                soap, shampoo, comb, toothbrush

Frequent itemsets: {onion, potatoes, burgers}, {bread, milk, butter}

Association rules: {onion, potatoes} => {burgers}, {bread, milk} => {butter}
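As a minimal sketch of how these itemsets and rules can be scored, the snippet below computes support on the five transactions above. The `confidence` helper uses the standard textbook definition (support of the whole rule divided by support of its left-hand side); it is not defined on the slides, and the function names are illustrative.

```python
from typing import FrozenSet, Iterable, List

# The five example transactions from the slide.
transactions: List[FrozenSet[str]] = [
    frozenset({"bread", "milk", "butter", "cheese", "chips"}),
    frozenset({"onion", "capsicum", "potatoes", "burgers"}),
    frozenset({"bread", "milk", "yogurt", "butter"}),
    frozenset({"onion", "potatoes", "ketchup", "burgers"}),
    frozenset({"soap", "shampoo", "comb", "toothbrush"}),
]


def support(itemset: Iterable[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in db if items <= t)
    return hits / len(db)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               db: List[FrozenSet[str]]) -> float:
    """Standard rule confidence: support(lhs and rhs) / support(lhs).

    Assumes lhs occurs at least once in db.
    """
    both = frozenset(lhs) | frozenset(rhs)
    return support(both, db) / support(lhs, db)


print(support({"bread", "milk", "butter"}, transactions))
print(confidence({"onion", "potatoes"}, {"burgers"}, transactions))
```

Both example itemsets occur in two of the five transactions, and every transaction containing {onion, potatoes} also contains burgers.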

Page 4: TEMPORAL ASSOCIATION RULE MINING

Applications

• Retail Data Analysis
• Web Usage Mining
• Intrusion Detection
• Bioinformatics

Page 5: TEMPORAL ASSOCIATION RULE MINING

Spatial Association Rule Mining

• Extract spatial predicates
• Find all frequent patterns/predicates/sets
• Generate strong rules

E.g. {Contains(Port), crosses(WaterBody)}

Source: Vania Bogorny, Enhancing Spatial Association Rule Mining in Geographic Databases, 2006 - lume.ufrgs.br

Page 6: TEMPORAL ASSOCIATION RULE MINING

Temporal Association Rule Mining

Chapter 10 of the reference book defines two types of temporal references:
• Transaction Time
• Valid Time

Time attribute for association rules can also be defined in an analogous way.

Page 7: TEMPORAL ASSOCIATION RULE MINING

Existing Technique – Apriori Algorithm

• The Apriori algorithm finds the frequent itemsets in a set of transactions, i.e. the itemsets which satisfy the minimum support threshold.

• The support of an itemset is defined as the proportion of transactions in the data set which contain the itemset.

Algorithm:
• Find all k-itemsets whose transaction support is at least the minimum support (frequent k-itemsets)
• Generate candidate (k+1)-itemsets using the frequent k-itemsets
• Prune the candidate (k+1)-itemsets to obtain the frequent (k+1)-itemsets, whose transaction support is at least the minimum support
• If size(frequent (k+1)-itemsets) > 0, repeat

Page 8: TEMPORAL ASSOCIATION RULE MINING

Apriori Algorithm (contd.)

Table 1: Transaction Database

Transaction   Items
1             A, B, C
2             B, C, F
3             B, F, G
4             A, C, D, F
5             C, D, E, G
6             A, B, E, G
7             B, C, F, G
8             A, B, G
9             A, B, F, G
10            C, F, G

Universal Set of Items = { A, B, C, D, E, F, G }
Minimum support = 30 % (3 transactions)

Step 1: 1-itemsets. Itemsets with a count of at least 3 are frequent.

Item    Count
{ A }   5
{ B }   7
{ C }   6
{ D }   2
{ E }   2
{ F }   6
{ G }   7

Step 2: 2-itemsets. All 2-itemsets with { D } or { E } as one of the subsets are pruned. Of the rest, those with a count of at least 3 are frequent.

Item Set   Count
{ A,B }    4
{ A,C }    2
{ A,E }    1
{ A,F }    2
{ A,G }    3
{ B,C }    3
{ B,E }    1
{ B,F }    4
{ B,G }    5
{ C,E }    1
{ C,F }    4
{ C,G }    3
{ E,F }    0
{ E,G }    2
{ F,G }    4

Step 3: 3-itemsets. All 3-itemsets with non-frequent 2-itemsets as subsets have been pruned. Of the rest, those with a count of at least 3 are frequent.

Item Set    Count
{ A,B,G }   3
{ B,F,G }   3
{ C,F,G }   2
{ B,C,F }   2
{ B,C,G }   1
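The three steps can be reproduced with a short level-wise sketch. This is a plain, unoptimised Apriori over the ten transactions of Table 1, written for illustration only. The function names are hypothetical, and the threshold is taken as a count of at least 3, matching "30 % (3 transactions)".

```python
from itertools import combinations

# Table 1: each string lists the items of one transaction.
transactions = [set(t) for t in
                ["ABC", "BCF", "BFG", "ACDF", "CDEG",
                 "ABEG", "BCFG", "ABG", "ABFG", "CFG"]]
MIN_COUNT = 3  # 30 % of 10 transactions


def count(itemset, db):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def apriori(db, min_count):
    """Return {frequent itemset: count}, built one itemset size at a time."""
    items = sorted(set().union(*db))
    current = {frozenset([i]) for i in items
               if count(frozenset([i]), db) >= min_count}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = count(s, db)
        # Join step: merge frequent k-itemsets into (k+1)-itemsets.
        joined = {a | b for a in current for b in current
                  if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        current = {c for c in candidates if count(c, db) >= min_count}
        k += 1
    return frequent


result = apriori(transactions, MIN_COUNT)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(itemset)), result[itemset])
```

On this data the sketch reports five frequent 1-itemsets, eight frequent 2-itemsets and two frequent 3-itemsets, { A,B,G } and { B,F,G }, each with a count of 3. No 4-itemset survives the prune step.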

   

Page 9: TEMPORAL ASSOCIATION RULE MINING

Limitation

• The Apriori algorithm finds the frequent itemsets which satisfy the minimum support threshold over the entire transaction database.
• What about itemsets which are highly frequent over a limited period of time, but not over the entire set of transactions? E.g. Turkey -> Pumpkin Pie (Thanksgiving)
• The itemsets extracted using the Apriori algorithm might not be valid for the entire period over which association rule mining has been performed.

Page 10: TEMPORAL ASSOCIATION RULE MINING

Related Work

• X. Chen and I. Petrounias, Mining Temporal Features in Association Rules, Proc. Third European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD '99).
• Yingjiu Li, Peng Ning, X. Sean Wang, Sushil Jajodia, Discovering Calendar-based Temporal Association Rules, Data & Knowledge Engineering, Special issue: Temporal representation and reasoning, Volume 44, Issue 2, February 2003.
• Kang et al., Discovering Flow Anomalies: A SWEET Approach, Eighth IEEE International Conference on Data Mining (ICDM), 2008.


Page 12: TEMPORAL ASSOCIATION RULE MINING

Temporal Association Rule Mining

The book also defines ‘time instants’ or ‘time intervals’, ‘chronon’ and ‘duration’.

E.g. transactions with their timestamps:
12th Dec-2009, 11:20:24   {bread, milk, butter, cheese, chips}
12th Dec-2009, 11:27:04   {onion, capsicum, potatoes, burgers}
12th Dec-2009, 12:05:44   {soap, shampoo, comb, toothbrush}

The same transactions grouped by time unit (chronon), here one hour:
12th Dec-2009, 11th hr   {{bread, milk, butter, cheese, chips}, {onion, capsicum, potatoes, burgers}}
12th Dec-2009, 12th hr   {{soap, shampoo, comb, toothbrush}}
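The grouping step can be sketched as follows, assuming the chronon is one hour and that transactions arrive as (timestamp, items) pairs. The function name `group_by_hour` and the record layout are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict
from datetime import datetime

# The three timestamped transactions from the example.
raw = [
    (datetime(2009, 12, 12, 11, 20, 24),
     {"bread", "milk", "butter", "cheese", "chips"}),
    (datetime(2009, 12, 12, 11, 27, 4),
     {"onion", "capsicum", "potatoes", "burgers"}),
    (datetime(2009, 12, 12, 12, 5, 44),
     {"soap", "shampoo", "comb", "toothbrush"}),
]


def group_by_hour(records):
    """Bucket transactions by the hour they fall in (assumed chronon)."""
    buckets = defaultdict(list)
    for timestamp, items in records:
        # Truncate the timestamp to the start of its hour.
        unit = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[unit].append(items)
    # Return (time unit, transactions) pairs in chronological order.
    return sorted(buckets.items(), key=lambda pair: pair[0])


for unit, unit_transactions in group_by_hour(raw):
    print(unit.strftime("%d %b %Y, hour %H"), len(unit_transactions))
```

The result has the shape (TU, {T1, T2, …}) per time unit: two transactions in the hour starting at 11:00 and one in the hour starting at 12:00.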

Page 13: TEMPORAL ASSOCIATION RULE MINING

Problem Statement

Definitions:
• Support of an itemset I over the interval (ti, tj) = (frequency of I in the interval (ti, tj)) / (total number of transactions during the interval (ti, tj))
• Valid Time Interval for itemset I: a time interval over which the support of I is greater than or equal to a threshold (lmin_sup)
• Maximal Valid Time Interval: a valid time interval for an itemset I which is not contained in any other valid time interval for I
• Temporally Frequent Itemset: an itemset which has at least one valid time interval associated with it

lmin_sup = 0.5

[Figure: support of I per time unit (0.3 0.4 0.5 0.7 0.6 0.2 0.3 0.7 0.8 0.2) with the valid time intervals highlighted]

Page 14: TEMPORAL ASSOCIATION RULE MINING

Problem Statement


[Figure: support of I per time unit (0.3 0.4 0.5 0.7 0.6 0.2 0.3 0.7 0.8 0.2), lmin_sup = 0.5, with the maximal valid time intervals highlighted]

Page 15: TEMPORAL ASSOCIATION RULE MINING

Problem Statement (contd.)

Given: Transaction data D in the format (TU, {T1, T2, …, Tk}), where TU is a time unit and each Ti is a transaction.

Find: All temporally frequent itemsets along with their maximal valid time intervals.

Page 16: TEMPORAL ASSOCIATION RULE MINING

Problem Statement (contd.)

So now, along with finding the frequent itemsets, we also have to find the maximal valid time intervals for each frequent itemset.

Complexity of the naive approach for finding the maximal valid time intervals of each frequent itemset: O(n^2), where n = |D| is the number of time units in D.
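A minimal sketch of such a naive O(n^2) search is given below. It is an illustration under stated assumptions, not the authors' implementation. Each time unit is summarised as a pair (transactions containing I, total transactions), prefix sums give the support of any interval in constant time, and a support equal to the threshold is treated as valid. The example data assumes 10 transactions per time unit, which is not stated on the slides.

```python
from fractions import Fraction


def maximal_valid_intervals(units, min_sup):
    """Naive search for maximal valid time intervals.

    units:   list of (hits, total) per time unit, where hits is the number
             of transactions containing the itemset in that unit.
    min_sup: threshold as a Fraction; support >= min_sup counts as valid.
    Returns 0-based, inclusive (start, end) index pairs.
    """
    n = len(units)
    hits, totals = [0], [0]
    for h, t in units:  # prefix sums over the time units
        hits.append(hits[-1] + h)
        totals.append(totals[-1] + t)

    result = []
    best_end = -1  # furthest end reached by any earlier start
    for i in range(n):
        for j in range(n - 1, i - 1, -1):  # try the longest interval first
            total = totals[j + 1] - totals[i]
            if total and Fraction(hits[j + 1] - hits[i], total) >= min_sup:
                # The longest valid interval from i is maximal only if it
                # ends later than every interval that starts before i.
                if j > best_end:
                    result.append((i, j))
                    best_end = j
                break
    return result


# Per-unit supports 0.3 0.4 0.5 0.7 0.6 0.2 0.3 0.7 0.8 0.2,
# assuming 10 transactions in every time unit.
units = [(h, 10) for h in (3, 4, 5, 7, 6, 2, 3, 7, 8, 2)]
print(maximal_valid_intervals(units, Fraction(3, 5)))
```

With a threshold of 0.6 the sketch finds two maximal valid intervals, covering time units 3 to 5 and 7 to 9 (1-based). Exact fractions are used so that borderline supports are not misclassified by floating-point rounding.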

Page 17: TEMPORAL ASSOCIATION RULE MINING

Finding Maximal Valid Time Intervals

Definitions:
• Valid/Supporting Time Unit for I: a time unit during which the support of I is greater than or equal to lmin_sup.
• Non-valid/Non-Supporting Time Unit for I: a time unit during which the support of I is less than lmin_sup.

[Figure: support of I per time unit (0.3 0.4 0.5 0.7 0.6 0.2 0.3 0.7 0.8 0.2) with the supporting time units highlighted]

Page 18: TEMPORAL ASSOCIATION RULE MINING

Finding Maximal Valid Time Intervals

Lemma 1: Each valid time interval (TUi, TUj) must contain at least one valid/supporting time unit for I.

Lemma 2: If an interval (TUi, TUj) is not valid for I, then the interval (TUi, TUj+1), where TUj+1 is a non-valid time unit, cannot be valid.

Lemma 3: If an interval (TUi, TUj) is valid for I, then the interval (TUi, TUj+1), where TUj+1 is a valid time unit, is also valid.

Using Lemma 3, collapse contiguous runs of supporting time units into one unit with the average density:

Before: 0.3 0.4 0.5 0.7 0.6 0.2 0.3 0.7 0.8 0.2
After:  0.3 0.4 0.6 0.2 0.3 0.75 0.2
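The collapsing step can be sketched as below. Two assumptions are made that the slides do not state: every time unit holds the same number of transactions, so a plain average of per-unit supports equals the support of the run, and a unit whose support equals the threshold counts as supporting. The function name is illustrative. The sketch also keeps the length of each run, since later interval supports would need it as a weight.

```python
def collapse_supporting_runs(supports, min_sup):
    """Merge each contiguous run of supporting units into a single unit.

    Returns a list of (support, number_of_original_units) pairs.
    Assumes equal transaction counts per time unit.
    """
    collapsed = []
    run = []  # supports of the current run of supporting units
    for s in supports:
        if s >= min_sup:
            run.append(s)
            continue
        if run:  # a non-supporting unit closes the current run
            collapsed.append((sum(run) / len(run), len(run)))
            run = []
        collapsed.append((s, 1))
    if run:  # the sequence ended inside a run
        collapsed.append((sum(run) / len(run), len(run)))
    return collapsed


supports = [0.3, 0.4, 0.5, 0.7, 0.6, 0.2, 0.3, 0.7, 0.8, 0.2]
for value, length in collapse_supporting_runs(supports, 0.5):
    print(round(value, 2), length)
```

For the ten-unit example this yields seven units: the run 0.5, 0.7, 0.6 becomes one unit with average 0.6, and the run 0.7, 0.8 becomes one unit with average 0.75.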

Page 19: TEMPORAL ASSOCIATION RULE MINING

Finding Maximal Valid Time Intervals (contd.)

Given: Itemset I, transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup

Part 1:

Find_maximal_valid_time_intervals(I, D, lmin_sup)
  Find STU = {TUa1, TUa2, …, TUan} such that TUak is a supporting time unit for I
  For i = 1 to n
    For j = n to i+1
      IF is_valid_time_interval(TUai, TUaj, D, lmin_sup)
        break;
      End
    End
  End

Uses Lemma 1 and Lemma 3.

[Figure: support of I per time unit (0.3 0.4 0.6 0.4 0.4 0.7 0.4 0.3 0.2 0.75 0.2), illustrating the search between supporting time units]

Page 20: TEMPORAL ASSOCIATION RULE MINING

Finding Maximal Valid Time Intervals (contd.)

Given: Itemset I, transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup

Part 2:

start = TUa(i-1) + 1, finish = TUa(j+1) - 1
low = start, high = TUaj
While low <= TUai and high <= finish
  IF is_valid_time_interval(low, high)
    high = high + 1
  Else
    low = low + 1
  End
End

Uses Lemma 2.

Page 21: TEMPORAL ASSOCIATION RULE MINING

Finding Maximal Valid Time Intervals (contd.)

[Figure: step-by-step illustration on the support sequence 0.3 0.4 0.6 0.4 0.4 0.7 0.4 0.3 0.2 0.75 0.2]

Page 22: TEMPORAL ASSOCIATION RULE MINING

Finding Maximal Valid Time Intervals (contd.)

[Figure: further iterations on the support sequence 0.3 0.4 0.6 0.4 0.4 0.7 0.4 0.3 0.2 0.75 0.2]

Complexity: O(n'^2 + n)

Page 23: TEMPORAL ASSOCIATION RULE MINING

Finding All Temporally Frequent Itemsets

Given: Transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup, UI (Universal Itemset)

C = Generate_1-item_candidate_sets(UI, D)
Interval = (1, |D|)
While (|C| > 0)
  For each candidate set c in C
    max_valid_intervals = Find_maximal_valid_time_intervals(c, D, lmin_sup)
    If |max_valid_intervals| > 0
      temp_freq_sets.add(<c, max_valid_intervals>)
    End
  End
  If |temp_freq_sets| > 0
    C = generate_new_candidate_sets(temp_freq_sets, D, lmin_sup)
  Else
    C = null
  End
End
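The loop above can be sketched in runnable form as follows. This is a simplified illustration, not the authors' implementation. It uses a naive O(n^2) interval search, treats a support equal to the threshold as valid, and prunes candidates only by requiring every subset to be temporally frequent. It does not use the parents' intervals during candidate generation. The data set and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def maximal_valid_intervals(itemset, D, min_sup):
    """Naive search. D is a list of time units, each a list of sets."""
    hits, totals = [0], [0]
    for unit in D:  # prefix sums of matches and transaction counts
        hits.append(hits[-1] + sum(1 for t in unit if itemset <= t))
        totals.append(totals[-1] + len(unit))
    result, best_end = [], -1
    for i in range(len(D)):
        for j in range(len(D) - 1, i - 1, -1):  # longest interval first
            total = totals[j + 1] - totals[i]
            if total and Fraction(hits[j + 1] - hits[i], total) >= min_sup:
                if j > best_end:  # not contained in an earlier interval
                    result.append((i, j))
                    best_end = j
                break
    return result


def temporally_frequent_itemsets(D, min_sup):
    """Level-wise search; returns {itemset: maximal valid intervals}."""
    items = sorted({item for unit in D for t in unit for item in t})
    candidates = [frozenset([item]) for item in items]
    found = {}
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            intervals = maximal_valid_intervals(c, D, min_sup)
            if intervals:
                level[c] = intervals
        found.update(level)
        # Join temporally frequent k-itemsets, then keep candidates whose
        # k-subsets are all temporally frequent.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = [c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k))]
        k += 1
    return found


# Hypothetical data: three time units with three transactions each.
D = [
    [{"turkey", "pie"}, {"turkey", "pie"}, {"milk"}],
    [{"milk"}, {"milk", "bread"}, {"bread"}],
    [{"milk"}, {"bread"}, {"milk"}],
]
found = temporally_frequent_itemsets(D, Fraction(2, 3))
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), found[itemset])
```

Subset-based pruning is safe here because, over any interval, a subset of an itemset has at least the itemset's support. In the example, {turkey, pie} appears in only 2 of 9 transactions overall, yet it is temporally frequent because it reaches the threshold within the first time unit.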

Page 24: TEMPORAL ASSOCIATION RULE MINING

Pruning in Candidate Set Generation

[Table: rows are the 2-itemsets a-b and a-c (L-2) and the candidate 3-itemset a-b-c (C-3); columns are T1 to T9]

Page 25: TEMPORAL ASSOCIATION RULE MINING

Future Work

• Find cyclic valid time intervals
• Identify interesting maximal valid time intervals

Page 26: TEMPORAL ASSOCIATION RULE MINING

Questions?