TEMPORAL ASSOCIATION RULE MINING
Prepared by : Ajit Padukone, Komal Kapoor
Outline
• Association Rule Mining
• Applications
• Temporal Association Rule Mining
• Existing Techniques and their Limitations
• Problem Statement
• Proposed Approach
  – Finding Maximal Valid Time Intervals
  – Finding All Temporally Frequent Itemsets
• Future Work
Motivation
Association Rule Mining
{onion, potatoes} => {burgers}
{bread, milk} => {butter}

Transaction Data
Transaction ID   Items
1                bread, milk, butter, cheese, chips
2                onion, capsicum, potatoes, burgers
3                bread, milk, yogurt, butter
4                onion, potatoes, ketchup, burgers
5                soap, shampoo, comb, toothbrush

Frequent itemsets: {onion, potatoes, burgers}, {bread, milk, butter}
Applications
• Retail Data Analysis
• Web Usage Mining
• Intrusion Detection
• Bioinformatics
Spatial Association Rule Mining
• Extract spatial predicates
• Find all frequent patterns/predicates/sets
• Generate strong rules
E.g. {Contains(Port),crosses(WaterBody)}
Source: Vania Bogorny, Enhancing Spatial Association Rule Mining in Geographic Databases, 2006 - lume.ufrgs.br
Temporal Association Rule Mining
Chapter 10 of the reference book defines two types of temporal references:
• Transaction Time
• Valid Time
Time attribute for association rules can also be defined in an analogous way.
Existing Technique – Apriori Algorithm
• The Apriori Algorithm finds the frequent itemsets in a set of transactions, i.e. those which satisfy the minimum support threshold.
• Support of an itemset is defined as the proportion of transactions in the data set which contain the itemset.
Algorithm:
• Find all k-itemsets that have transaction support above minimum support (frequent k-itemsets)
• Generate candidate k+1-itemsets using large k-itemsets
• Prune the candidate k+1-itemsets to obtain frequent k+1-itemsets which have a transaction support above minimum support
• If size(frequent k+1-itemsets) > 0, repeat
Apriori Algorithm (contd.)

Universal Set of Items = { A, B, C, D, E, F, G }
Minimum support = 30 % (3 transactions)

Table 1: Transaction Database
Transaction   Items
1             A, B, C
2             B, C, F
3             B, F, G
4             A, C, D, F
5             C, D, E, G
6             A, B, E, G
7             B, C, F, G
8             A, B, G
9             A, B, F, G
10            C, F, G

Step 1: 1-itemsets. Itemsets with a count of at least 3 are frequent.
Item Set   Count   Frequent
{ A }      5       yes
{ B }      7       yes
{ C }      6       yes
{ D }      2       no
{ E }      2       no
{ F }      6       yes
{ G }      7       yes

Step 2: 2-itemsets. All 2-itemsets with { D } or { E } as one of the subsets are pruned. The remaining itemsets with a count of at least 3 are frequent.
Item Set   Count   Frequent
{ A,B }    4       yes
{ A,C }    2       no
{ A,E }    1       no (pruned)
{ A,F }    2       no
{ A,G }    3       yes
{ B,C }    3       yes
{ B,E }    1       no (pruned)
{ B,F }    4       yes
{ B,G }    5       yes
{ C,E }    1       no (pruned)
{ C,F }    4       yes
{ C,G }    3       yes
{ E,F }    0       no (pruned)
{ E,G }    2       no (pruned)
{ F,G }    4       yes

Step 3: 3-itemsets. All 3-itemsets with non-frequent 2-itemsets as subsets have been pruned. The remaining itemsets with a count of at least 3 are frequent.
Item Set    Count   Frequent
{ A,B,G }   3       yes
{ B,F,G }   3       yes
{ C,F,G }   2       no
{ B,C,F }   2       no
{ B,C,G }   1       no
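The three steps of the worked example can be reproduced with a short Python sketch. It is only an illustration of the level-wise join-and-prune idea on the Table 1 data, not the presenters' implementation; the names `TRANSACTIONS`, `count` and `apriori` are made up for this example, and "frequent" is taken to mean a count of at least 3, as in the example.

```python
from itertools import combinations

# Table 1 from the slides: transaction ID -> items.
TRANSACTIONS = {
    1: {"A", "B", "C"}, 2: {"B", "C", "F"}, 3: {"B", "F", "G"},
    4: {"A", "C", "D", "F"}, 5: {"C", "D", "E", "G"},
    6: {"A", "B", "E", "G"}, 7: {"B", "C", "F", "G"},
    8: {"A", "B", "G"}, 9: {"A", "B", "F", "G"}, 10: {"C", "F", "G"},
}


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset with count >= min_count."""
    all_items = sorted(set().union(*transactions.values()))
    # Level 1: frequent single items.
    level = {}
    for item in all_items:
        single = frozenset([item])
        n = count(single, transactions)
        if n >= min_count:
            level[single] = n

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        # Join: unions of two frequent k-itemsets that have exactly k+1 items.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: keep a candidate only if all of its k-subsets are frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))]
        # Count the surviving candidates against the database.
        level = {}
        for c in candidates:
            n = count(c, transactions)
            if n >= min_count:
                level[c] = n
        k += 1
    return frequent


if __name__ == "__main__":
    result = apriori(TRANSACTIONS, min_count=3)
    for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset), result[itemset])
```

The counts are computed directly from Table 1, so the printed values can be checked against the step tables by hand.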
Limitation
• The Apriori Algorithm finds the frequent itemsets which satisfy the minimum support threshold over the entire transaction database.
• What about those itemsets which are highly frequent over a limited period of time but not over the entire set of transactions? E.g. Turkey -> Pumpkin Pie (Halloween)
• The itemsets extracted using the Apriori Algorithm might not be valid for the entire period over which association rule mining has been performed.
Related Work
• X. Chen and I. Petrounias, Mining Temporal Features in Association Rules, Proc. Third European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD '99).
• Yingjiu Li, Peng Ning, X. Sean Wang, Sushil Jajodia, Discovering Calendar-based Temporal Association Rules, Data & Knowledge Engineering, Special issue: Temporal representation and reasoning, Volume 44, Issue 2, February 2003.
• Kang et al., Discovering Flow Anomalies: A SWEET Approach, Eighth IEEE International Conference on Data Mining (ICDM), 2008.
Temporal Association Rule Mining
The book also defines ‘time instants’ or ‘time intervals’, ‘chronon’ and ‘duration’.

E.g. transactions stamped with time instants:
12th Dec-2009, 11:20:24 {bread, milk, butter, cheese, chips}
12th Dec-2009, 11:27:04 {onion, capsicum, potatoes, burgers}
12th Dec-2009, 12:05:44 {soap, shampoo, comb, toothbrush}

The same transactions grouped by Time Unit (chronon) of one hour:
12th Dec-2009, 11th hr {{bread, milk, butter, cheese, chips}, {onion, capsicum, potatoes, burgers}}
12th Dec-2009, 12th hr {{soap, shampoo, comb, toothbrush}}
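Grouping timestamped transactions into time units can be sketched as follows. This is a minimal illustration that assumes a chronon of one hour; `RECORDS` and `group_by_hour` are hypothetical names, and each time unit is keyed by the start of its hour.

```python
from collections import defaultdict
from datetime import datetime

# The three timestamped transactions from the slide.
RECORDS = [
    (datetime(2009, 12, 12, 11, 20, 24),
     {"bread", "milk", "butter", "cheese", "chips"}),
    (datetime(2009, 12, 12, 11, 27, 4),
     {"onion", "capsicum", "potatoes", "burgers"}),
    (datetime(2009, 12, 12, 12, 5, 44),
     {"soap", "shampoo", "comb", "toothbrush"}),
]


def group_by_hour(records):
    """Map each one-hour time unit to the list of transactions inside it."""
    units = defaultdict(list)
    for timestamp, items in records:
        # Truncate the time instant to the start of its hour.
        unit = timestamp.replace(minute=0, second=0, microsecond=0)
        units[unit].append(items)
    # Return the time units in chronological order.
    return dict(sorted(units.items()))


if __name__ == "__main__":
    for unit, transactions in group_by_hour(RECORDS).items():
        print(unit, len(transactions), "transaction(s)")
```

A coarser or finer chronon (day, minute) only changes the truncation step.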
Problem Statement

Definitions:
• Support of an itemset I over interval (ti, tj) = (frequency of I in the interval (ti, tj)) / (total number of transactions during the interval (ti, tj))
• Valid Time Interval for itemset I: a time interval over which the support of I is at least a threshold (lmin_sup)
• Maximal Valid Time Interval: a valid time interval for an itemset I which is not contained in any other valid time interval for I
• Temporally Frequent Itemset: an itemset which has at least one valid time interval associated with it

Example (lmin_sup = 0.5), support of I in each time unit:
0.3 0.4 0.5 0.7 0.6 0.2 0.3 0.7 0.8 0.2
[Figure: the Valid Time Intervals, and then the Maximal Valid Time Intervals, marked on this sequence]
Problem Statement (contd.)
Given: Transaction data D in the format (TU, {T1, T2, …, Tk}), where TU -> Time Unit and Ti -> Transaction
Find: All temporally frequent itemsets along with their maximal valid time intervals.

So now, along with finding the frequent itemsets, we have to find the maximal valid time intervals for each frequent itemset.
Complexity of the naive approach for finding maximal valid time intervals for each frequent itemset: O(n^2), where n = |D|
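One way to picture the naive quadratic search is the Python sketch below. It is an illustration under stated assumptions rather than the presenters' code: each time unit is given as a hypothetical pair (number of transactions containing I, total number of transactions), an interval whose support equals the threshold is counted as valid (the collapsing example in the slides treats the 0.5 unit as supporting), and intervals are reported as 0-based inclusive index pairs.

```python
def maximal_valid_intervals(units, min_sup):
    """Naive O(n^2) search for the maximal valid time intervals.

    units: list of (hits, total) per time unit, in time order.
    Returns inclusive (start, end) index pairs, ordered by start.
    """
    n = len(units)
    # Prefix sums give the support of any interval in O(1).
    hits = [0] * (n + 1)
    totals = [0] * (n + 1)
    for k, (h, t) in enumerate(units):
        hits[k + 1] = hits[k] + h
        totals[k + 1] = totals[k] + t

    def is_valid(i, j):
        h = hits[j + 1] - hits[i]
        t = totals[j + 1] - totals[i]
        return t > 0 and h >= min_sup * t

    result = []
    furthest_end = -1  # largest end reached by an interval with an earlier start
    for i in range(n):
        # Scan ends from the right: the first valid one is the longest for i.
        for j in range(n - 1, i - 1, -1):
            if is_valid(i, j):
                # It is maximal only if no earlier start already covers it.
                if j > furthest_end:
                    result.append((i, j))
                    furthest_end = j
                break
    return result


if __name__ == "__main__":
    # Per-unit supports from the slides, assuming 10 transactions per unit.
    example = [(3, 10), (4, 10), (5, 10), (7, 10), (6, 10),
               (2, 10), (3, 10), (7, 10), (8, 10), (2, 10)]
    print(maximal_valid_intervals(example, 0.5))
```

Every valid interval starting at i is contained in the longest valid interval starting at i, so only those longest intervals need to be compared with each other; that keeps the whole search at O(n^2) interval checks.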
Finding Maximal Valid Time Intervals
Definitions:
• Valid/Supporting Time Unit for I: a time unit during which the support of I is at least lmin_sup
• Non-valid/Non-supporting Time Unit for I: a time unit during which the support of I is less than lmin_sup

Example (lmin_sup = 0.5), support of I in each time unit:
0.3 0.4 0.5 0.7 0.6 0.2 0.3 0.7 0.8 0.2
Finding Maximal Valid Time Intervals
Lemma 1: Each valid time interval (TUi, TUj) must contain at least one valid/supporting time unit for I.
Lemma 2: If an interval (TUi, TUj) is not valid for I and TUj+1 is a non-valid time unit, then the interval (TUi, TUj+1) cannot be valid.
Lemma 3: If an interval (TUi, TUj) is valid for I and TUj+1 is a valid time unit, then the interval (TUi, TUj+1) is also valid.

Using Lemma 3, collapse continuous runs of supporting time units into one unit with the average density:
Before: 0.3 0.4 0.5 0.7 0.6 0.2 0.3 0.7 0.8 0.2
After:  0.3 0.4 0.6 0.2 0.3 0.75 0.2
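The collapsing step can be sketched in a few lines of Python. This is an illustrative sketch, not the presenters' code: `collapse_supporting_runs` is a hypothetical name, a unit counts as supporting when its support is at least the threshold (which is what the example above implies for the 0.5 unit), and a plain average is used, which is only exact when every time unit holds the same number of transactions. The run length is returned alongside the average so that later steps can still weight the collapsed unit.

```python
def collapse_supporting_runs(supports, min_sup):
    """Merge each run of consecutive supporting units into one averaged unit.

    Returns a list of (support, run_length) pairs; non-supporting units are
    passed through unchanged with run_length 1.
    """
    collapsed = []
    run = []  # supports of the current run of supporting units
    for s in supports:
        if s >= min_sup:
            run.append(s)
            continue
        if run:
            # A non-supporting unit ends the run: emit its average.
            collapsed.append((sum(run) / len(run), len(run)))
            run = []
        collapsed.append((s, 1))
    if run:
        # The sequence ended inside a run.
        collapsed.append((sum(run) / len(run), len(run)))
    return collapsed


if __name__ == "__main__":
    before = [0.3, 0.4, 0.5, 0.7, 0.6, 0.2, 0.3, 0.7, 0.8, 0.2]
    for support, length in collapse_supporting_runs(before, 0.5):
        print(round(support, 2), length)
```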
Finding Maximal Valid Time Intervals (contd.)
Given: Itemset I, transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup

Part 1 (uses Lemmas 1 and 3):
Find_maximal_valid_time_intervals(I, D, lmin_sup)
    Find STU = {TUa1, TUa2, …, TUan} such that TUak is a supporting time unit for I
    For i = 1 to n
        For j = n to i+1
            IF is_valid_time_interval(TUai, TUaj, D, lmin_sup)
                break;
            End
        End
    End

Example sequence: 0.3 0.4 0.6 0.4 0.4 0.7 0.4 0.3 0.2 0.75 0.2
Finding Maximal Valid Time Intervals (contd.)
Given: Itemset I, transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup

Part 2 (uses Lemma 2):
    start = TUa(i-1) + 1, finish = TUa(j+1) - 1
    low = start, high = TUaj
    While low <= TUai and high <= finish
        IF is_valid_time_interval(low, high)
            high = high + 1
        Else
            low = low + 1
        End
    End
Finding Maximal Valid Time Intervals (contd.)
[Figure: successive iterations of the search over the sequence 0.3 0.4 0.6 0.4 0.4 0.7 0.4 0.3 0.2 0.75 0.2]
Further iterations…
Complexity: O(n'^2 + n)
Finding All Temporally Frequent Itemsets

Given: Transaction data D <TUi, {T1, T2, …, Tn}>, lmin_sup, UI (Universal Itemset)

C -> Generate_1-item_candidate_sets(UI, D)
Interval = (1, |D|)
While (|C| > 0)
    For each candidate set c in C
        max_valid_intervals -> find_maximal_valid_time_interval(c, D, lmin_sup)
        If |max_valid_intervals| > 0
            temp_freq_sets.add(<c, max_valid_intervals>)
        End
    End
    If |temp_freq_sets| > 0
        C -> generate_new_candidate_sets(temp_freq_sets, D, lmin_sup)
    Else
        C -> null
    End
End
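The level-wise loop can be sketched in Python as below. This is a rough illustration, not the presenters' algorithm: it plugs in a naive quadratic interval search instead of the two-part search described earlier, it generates candidates with a plain Apriori-style join and subset check (any interval-based pruning the slides may use is not reproduced), it counts support equal to the threshold as valid, and all function and variable names are hypothetical. Data is a list of time units, each a list of transactions given as sets.

```python
from itertools import combinations


def maximal_valid_intervals(itemset, data, min_sup):
    """Naive O(n^2) search; returns inclusive 0-based (start, end) pairs."""
    n = len(data)
    hits, totals = [0], [0]  # prefix sums over the time units
    for unit in data:
        hits.append(hits[-1] + sum(1 for t in unit if itemset <= t))
        totals.append(totals[-1] + len(unit))
    found, furthest_end = [], -1
    for i in range(n):
        for j in range(n - 1, i - 1, -1):
            h = hits[j + 1] - hits[i]
            t = totals[j + 1] - totals[i]
            if t > 0 and h >= min_sup * t:
                # Longest valid interval from i; keep it if not yet covered.
                if j > furthest_end:
                    found.append((i, j))
                    furthest_end = j
                break
    return found


def temporally_frequent_itemsets(data, min_sup):
    """Return {itemset: maximal valid intervals} for all temporally
    frequent itemsets, growing candidates one item at a time."""
    items = sorted({item for unit in data for t in unit for item in t})
    candidates = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while candidates:
        # Keep the candidates that have at least one valid time interval.
        level = {}
        for c in candidates:
            intervals = maximal_valid_intervals(c, data, min_sup)
            if intervals:
                level[c] = intervals
        result.update(level)
        # A superset can only be temporally frequent if all its k-subsets are.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))]
        k += 1
    return result


if __name__ == "__main__":
    data = [
        [{"bread", "milk"}, {"bread", "milk"}],
        [{"turkey", "pumpkin pie"}, {"turkey", "pumpkin pie"}, {"bread"}],
        [{"bread", "milk"}, {"soap"}],
    ]
    for itemset, intervals in temporally_frequent_itemsets(data, 0.5).items():
        print(sorted(itemset), intervals)
```

The subset check is safe because the support of a superset over any interval can never exceed the support of its subsets over the same interval.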
Pruning in Candidate Set Generation
[Figure: table of item sets against transactions T1 to T9, with rows L-2 (a-b, a-c) and C-3 (a-b-c)]
Future Work
• Find cyclic valid time intervals
• Identify interesting maximal valid time intervals
Questions?