
The Concept of Maximal Frequent Itemsets


Page 1: The Concept of Maximal Frequent Itemsets

Kuo-Yu Huang NCU CSIE DBLab 1

The Concept of Maximal Frequent Itemsets

NCU CSIE Database Laboratory
Kuo-Yu Huang

2002-04-15


Outline
• Introduction
• Max-Miner
• MAFIA
• GenMax
• Conclusion


Introduction (1/2)
• Interesting datasets with long patterns
  – Questionnaire results
  – Transaction databases
• Such datasets contain many frequently occurring items
• They have a large average record length
• Apriori-like algorithms are inadequate
  – They enumerate every single frequent itemset
  – A frequent itemset of length k has 2^k − 1 non-empty subsets, all of which are frequent


Introduction (2/2)
• Maximal frequent itemsets
  – A frequent itemset is maximal if it has no superset that is frequent.
  – e.g.
    • Items: a, b, c, d, e
    • Frequent itemset: {a, b, c}
    • {a, b, c, d}, {a, b, c, e}, and {a, b, c, d, e} are not frequent itemsets.
    • Maximal frequent itemset: {a, b, c}
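The definition above can be checked by brute force on a tiny example. The following is a minimal Python sketch, not an algorithm from the slides: the toy database and the function names `frequent_itemsets` and `maximal_itemsets` are assumptions made for illustration, and the exhaustive enumeration is only feasible for a handful of items.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return every itemset contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                frequent.append(cset)
    return frequent

def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return [f for f in frequent if not any(f < other for other in frequent)]

# Hypothetical toy database over the items a..e (not taken from the slides).
transactions = [frozenset("abc"), frozenset("abc"), frozenset("abd"), frozenset("ce")]
frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_itemsets(frequent)
print(len(frequent), maximal)
```

With a minimum support of 2, this toy database has seven frequent itemsets but only one maximal frequent itemset, {a, b, c}, which mirrors the example on the slide.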


Max-Miner (1/4)
• Efficiently mining long patterns from databases
  – R. J. Bayardo
  – ACM SIGMOD '98
• Max-Miner
  – Abandons a bottom-up traversal
  – Attempts to "look ahead"
  – Once a long frequent itemset is identified, all of its subsets are pruned.


Max-Miner (2/4)
• Set-enumeration tree
• Breadth-first search


Max-Miner (3/4)
• Candidate group g
  – Head: h(g)
    • The itemset enumerated by the node.
  – Tail: t(g)
    • An ordered set containing the items not in h(g) that can still appear in a sub-node of g.
  – e.g. node {1}
    • h(g): {1}
    • t(g): {2, 3, 4}
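A short Python sketch of how candidate groups could be generated in a set-enumeration tree and visited breadth-first. This illustrates the head/tail idea only: the names `expand` and `breadth_first` are hypothetical, no pruning is applied, and the items are assumed to be ordered 1 to 4.

```python
from collections import deque

def expand(head, tail):
    """Yield the child candidate groups (head, tail) of one node.

    Each child adds one tail item to the head; its own tail holds only
    the items that come later in the ordering.
    """
    for pos, item in enumerate(tail):
        yield head + (item,), tail[pos + 1:]

def breadth_first(items):
    """Visit every node of the unpruned set-enumeration tree level by level."""
    queue = deque([((), tuple(items))])
    while queue:
        head, tail = queue.popleft()
        yield head, tail
        queue.extend(expand(head, tail))

children = list(expand((), (1, 2, 3, 4)))
nodes = list(breadth_first([1, 2, 3, 4]))
print(children[0])   # the candidate group for node {1}
print(len(nodes))    # one node per subset of the four items
```

The first child of the root is the group with head (1,) and tail (2, 3, 4), matching the node {1} example above.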


Max-Miner (4/4)
• Support counting
  – Count the support of h(g), h(g) ∪ t(g), and h(g) ∪ {i} for all i ∈ t(g)
  – If h(g) ∪ t(g) is frequent, then any itemset enumerated by a sub-node will also be frequent but not maximal.
  – If h(g) ∪ {i} is infrequent, then any head of a sub-node that contains item i will also be infrequent.
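The two tests above can be sketched for a single candidate group as follows. This is a simplified illustration under stated assumptions, not Bayardo's implementation: `support` and `process_group` are hypothetical names, support is counted by a naive scan, and the toy database is invented.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def process_group(head, tail, transactions, min_support):
    """Apply both pruning tests to one candidate group.

    Returns (long_itemset, new_tail). If h(g) ∪ t(g) is frequent, it is
    returned and the tail is emptied, since no sub-node can be maximal.
    Otherwise every item i with infrequent h(g) ∪ {i} is dropped from the tail.
    """
    head = frozenset(head)
    head_union_tail = head | frozenset(tail)
    if support(head_union_tail, transactions) >= min_support:
        return head_union_tail, []
    kept = [i for i in tail if support(head | {i}, transactions) >= min_support]
    return None, kept

# Hypothetical toy database (not from the slides).
db = [frozenset({1, 2, 3}), frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({3, 4})]
first = process_group({1}, [2, 3, 4], db, min_support=2)
second = process_group({1, 2}, [3], db, min_support=2)
print(first, second)
```

For the group with head {1}, item 4 is removed from the tail because {1, 4} is infrequent; for the group with head {1, 2}, the look-ahead succeeds because {1, 2, 3} is frequent.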


MAFIA (1/4)
• MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases
  – D. Burdick, M. Calimlim, and J. Gehrke
  – ICDE '01
• MAFIA
  – Integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms


MAFIA (2/4)


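MAFIA (3/4)
• HUTMFI
  – Check whether the head union tail (HUT) of the node is already covered by an itemset in the MFI
    • If so, stop searching and return
• PEP (Parent Equivalence Pruning)
  – newNode = C ∪ {i}
  – Check newNode.support == C.support
    • If equal, move i from C.tail to C.head
• FHUT (Frequent Head Union Tail)
  – newNode = C ∪ {i}
  – Track whether i is the leftmost child in the tail (the leftmost path leads to the head union tail)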
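A simplified Python sketch of a depth-first search that uses the HUTMFI and PEP checks listed above. It is an illustration under assumptions, not the MAFIA implementation: FHUT is omitted, support is counted by a naive scan over transactions, and the names `support` and `mafia_sketch` and the toy database are hypothetical.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def mafia_sketch(head, tail, transactions, min_support, mfi):
    """Depth-first search with HUTMFI and PEP pruning; appends results to mfi."""
    head = frozenset(head)
    # HUTMFI: if head ∪ tail is inside a known maximal itemset, skip the subtree.
    if any(head | frozenset(tail) <= m for m in mfi):
        return
    head_support = support(head, transactions)
    moved, remaining = set(), []
    for i in tail:
        s = support(head | {i}, transactions)
        if s == head_support:
            moved.add(i)            # PEP: i occurs wherever the head occurs
        elif s >= min_support:
            remaining.append(i)     # frequent extension, explore later
    head = head | moved
    for pos, i in enumerate(remaining):
        mafia_sketch(head | {i}, remaining[pos + 1:], transactions, min_support, mfi)
    # The head is maximal only if no superset has been recorded so far.
    if not any(head <= m for m in mfi):
        mfi.append(head)

# Hypothetical toy database (not from the slides).
db = [frozenset({1, 2, 3}), frozenset({1, 2, 3}), frozenset({1, 2, 4}),
      frozenset({3, 4}), frozenset({3, 4})]
mfi = []
mafia_sketch(frozenset(), [1, 2, 3, 4], db, 2, mfi)
print(mfi)
```

On this toy input, PEP moves item 2 into the head {1} because {1, 2} has the same support as {1}, and HUTMFI later cuts off the subtrees under {2, 3} and {4} because they are covered by itemsets already in the MFI.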


MAFIA (4/4)


GenMax (1/2)
• Efficiently Mining Maximal Frequent Itemsets
  – Karam Gouda and Mohammed J. Zaki
  – ICDM '01
• GenMax
  – A backtrack-search-based algorithm for mining maximal frequent itemsets


GenMax (2/2)
• Superset checking techniques
  – Do the superset check only for I_{l+1} ∪ P_{l+1}
  – Using the check_status flag
  – Local maximal frequent itemsets
• Reordering the combine set
• Diffsets propagation
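The diffset idea can be illustrated with a small Python sketch. It assumes the usual definition, where the diffset of an extension PX is the set of transaction ids that contain the prefix P but not PX, so that d(PXY) = d(PY) − d(PX) and support(PXY) = support(PX) − |d(PXY)|. The helper `tidset` and the toy database are hypothetical and are not taken from the slides.

```python
def tidset(item, transactions):
    """Return the ids of the transactions that contain item."""
    return {tid for tid, t in enumerate(transactions) if item in t}

# Hypothetical toy database (not from the slides).
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"c", "e"}]

t_a = tidset("a", transactions)            # transactions containing a
t_ab = t_a & tidset("b", transactions)     # transactions containing a and b
t_ac = t_a & tidset("c", transactions)     # transactions containing a and c

# Diffsets relative to the prefix {a}: ids lost when extending a.
d_ab = t_a - t_ab
d_ac = t_a - t_ac

# Propagation: the diffset of {a, b, c} relative to {a, b} needs only d_ab and d_ac.
d_abc = d_ac - d_ab
support_abc = len(t_ab) - len(d_abc)

# Cross-check against a direct count.
direct = sum(1 for t in transactions if {"a", "b", "c"} <= t)
print(support_abc, direct)
```

Diffsets tend to be much smaller than tidsets on dense data, which is the motivation for propagating them instead of full transaction-id lists.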


Conclusion (1/4)

Type      Database      # of items  Average length  # of records  Maximal pattern length (min. support)
Type I    Chess                 76              37          3196  23 (20%)
Type I    Pumsb               7117              74         49046  27 (40%)
Type II   Connect              130              43         67557  31 (2.5%)
Type II   Pumsb*              7117              50         49046  43 (2.5%)
Type III  T10I4D100K          1000              10       100,000  13 (0.01%)
Type III  T40I10D100K         1000              40       100,000  25 (0.1%)

• Type I:
  – Normal MFI distribution with maximal patterns that are not too long
• Type II:
  – Left-skewed distribution with longer patterns
• Type III:
  – Exponential-decay distribution with short maximal patterns


Conclusion (2/4)


Conclusion (3/4)


Conclusion (4/4)