View
255
Download
1
Embed Size (px)
Citation preview
Association Rule Mining
- MaxMiner
Mining Association Rules in Large Databases
Association rule mining
Algorithms Apriori and FP-Growth
Max and closed patterns
Mining various kinds of association/correlation
rules
Max-patterns & Close-patterns If there are frequent patterns with many
items, enumerating all of them is costly. We may be interested in finding the
‘boundary’ frequent patterns. Two types…
Max-patterns Frequent pattern {a1, …, a100} (100
1) + (100
2) + … + (11
00
00) = 2100-1 = 1.27*1030
frequent sub-patterns! Max-pattern: frequent patterns without
proper frequent super pattern BCDE, ACD are max-patterns BCD is not a max-pattern
Tid Items
10 A,B,C,D,E
20 B,C,D,E,
30 A,C,D,FMin_sup=2
Maximal Frequent Itemset
null
AB AC AD AE BC BD BE CD CE DE
A B C D E
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
Border
Infrequent Itemsets
Maximal Itemsets
An itemset is maximal frequent if none of its immediate supersets is frequent
Closed Itemset An itemset is closed if none of its immediate
supersets has the same support as the itemset
TID Items1 {A,B}2 {B,C,D}3 {A,B,C,D}4 {A,B,D}5 {A,B,C,D}
Itemset Support{A} 4{B} 5{C} 3{D} 4{A,B} 4{A,C} 2{A,D} 3{B,C} 3{B,D} 4{C,D} 3
Itemset Support{A,B,C} 2{A,B,D} 3{A,C,D} 2{B,C,D} 3{A,B,C,D} 2
Maximal vs Closed Itemsets
TID Items
1 ABC
2 ABCD
3 BCE
4 ACDE
5 DE
null
AB AC AD AE BC BD BE CD CE DE
A B C D E
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
124 123 1234 245 345
12 124 24 4 123 2 3 24 34 45
12 2 24 4 4 2 3 4
2 4
Transaction Ids
Not supported by any transactions
Maximal vs Closed Frequent Itemsets
null
AB AC AD AE BC BD BE CD CE DE
A B C D E
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
124 123 1234 245 345
12 124 24 4 123 2 3 24 34 45
12 2 24 4 4 2 3 4
2 4
Minimum support = 2
# Closed = 9
# Maximal = 4
Closed and maximal
Closed but not maximal
Maximal vs Closed Itemsets
FrequentItemsets
ClosedFrequentItemsets
MaximalFrequentItemsets
MaxMiner: Mining Max-patterns
Idea: generate the complete set-enumeration tree one level at a time, while prune if applicable.
(ABCD)
A (BCD) B (CD) C (D) D ()
AB (CD) AC (D) AD () BC (D) BD () CD ()
ABC (C)
ABCD ()
ABD () ACD () BCD ()
Local Pruning Techniques (e.g. at node A)
Check the frequency of ABCD and AB, AC, AD. If ABCD is frequent, prune the whole sub-tree. If AC is NOT frequent, remove C from the
parenthesis before expanding.
(ABCD)
A (BCD) B (CD) C (D) D ()
AB (CD) AC (D) AD () BC (D) BD () CD ()
ABC (C)
ABCD ()
ABD () ACD () BCD ()
Algorithm MaxMiner
Initially, generate one node N= , where h(N)= and t(N)={A,B,C,D}.
Consider expanding N, If h(N)t(N) is frequent, do not expand N. If for some it(N), h(N){i} is NOT frequent,
remove i from t(N) before expanding N. Apply global pruning techniques…
(ABCD)
Global Pruning Technique (across sub-trees)
When a max pattern is identified (e.g. ABCD), prune all nodes (e.g. B, C and D) where h(N)t(N) is a sub-set of it (e.g. ABCD).
(ABCD)
A (BCD) B (CD) C (D) D ()
AB (CD) AC (D) AD () BC (D) BD () CD ()
ABC (C)
ABCD ()
ABD () ACD () BCD ()
Example
Tid Items
10 A,B,C,D,E
20 B,C,D,E,
30 A,C,D,F
(ABCDEF)
Items Frequency
ABCDEF 0
A 2
B 2
C 3
D 3
E 2
F 1
Min_sup=2
Max patterns:
A (BCDE)B (CDE) C (DE) E ()D (E)
Example
Tid Items
10 A,B,C,D,E
20 B,C,D,E,
30 A,C,D,F
(ABCDEF)
Items Frequency
ABCDE 1
AB 1
AC 2
AD 2
AE 1
Min_sup=2
A (BCDE)B (CDE) C (DE) E ()D (E)
AC (D) AD ()
Max patterns:
Node A
Example
Tid Items
10 A,B,C,D,E
20 B,C,D,E,
30 A,C,D,F
(ABCDEF)
Items Frequency
BCDE 2
BC
BD
BE
Min_sup=2
A (BCDE)B (CDE) C (DE) E ()D (E)
AC (D) AD ()
Max patterns:
BCDE
Node B
Example
Tid Items
10 A,B,C,D,E
20 B,C,D,E,
30 A,C,D,F
(ABCDEF)
Items Frequency
ACD 2
Min_sup=2
A (BCDE)B (CDE) C (DE) E ()D (E)
AC (D) AD ()
Max patterns:
BCDEACD
Node AC