Mining Frequent Patterns Without Candidate Generation
Jiawei Han, Jian Pei and Yiwen Yin, School of Computer Science, Simon Fraser University
Presented by Afsoon Yousefi, CS:332, March 24th, 2014. Inspired by Song Wang's slides.
Outline
• Problem of mining frequent patterns
• Review of Apriori
• Frequent Pattern Tree
  • An example
  • Design and construction
  • Properties
• Mining Frequent Patterns Using FP-tree
  • An example
  • Design and construction
  • Properties
• Algorithm efficiency properties
• Performance study
• Future works
• Conclusion
• Selected questions
Problem of Mining Frequent Patterns

Frequent pattern mining plays an essential role in mining associations. Most previous studies adopt an Apriori-like approach, which achieves good performance but suffers from two costs:

• It is costly to handle a huge number of candidate sets. Apriori extends frequent 1-itemsets to length-2 candidates, accumulates and tests them, and so on; finding a single length-100 frequent pattern requires generating on the order of 2^100 candidates in total.
• It is tedious to repeatedly scan the database and check a large set of candidates against it.
Review of Apriori

Given a minimum support threshold ξ:
1. Use the frequent (k-1)-itemsets to generate candidate k-itemsets.
2. Scan the database and count the occurrences of each candidate.
3. Keep the candidates that meet the threshold; these are the frequent k-itemsets.

Example transaction database:

TID | Items bought
100 | f, a, c, d, g, i, m, p
200 | a, b, c, f, l, m, o
300 | b, f, h, j, o
400 | b, c, k, s, p
500 | a, f, c, e, l, p, m, n

Apriori itemsets (ξ = 3):
• All items: f, a, c, d, g, i, m, p, l, o, h, j, k, s, b, e, n
• Frequent 1-itemsets: f, a, c, m, b, p
• Candidate 2-itemsets: fa, fc, fm, fp, ac, am, … bp
• Frequent 2-itemsets: fa, fc, fm, … …
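The generate-and-test loop above can be sketched in code. The following is a minimal, illustrative Apriori in Python (the function and variable names are mine, not from the paper), run on the example database with ξ = 3:

```python
from itertools import combinations

# Transaction database from the slides; minimum support threshold xi = 3.
DB = [
    set("facdgimp"),
    set("abcflmo"),
    set("bfhjo"),
    set("bcksp"),
    set("afcelpmn"),
]
XI = 3

def apriori(db, xi):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    # Frequent 1-itemsets.
    items = {i for t in db for i in t}
    freq = {frozenset([i]): c for i in items
            if (c := sum(i in t for t in db)) >= xi}
    result = dict(freq)
    k = 2
    while freq:
        # Join frequent (k-1)-itemsets to generate candidate k-itemsets.
        prev = list(freq)
        cands = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        cands = {c for c in cands
                 if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        # Scan the database and count each surviving candidate.
        freq = {c: n for c in cands
                if (n := sum(c <= t for t in db)) >= xi}
        result.update(freq)
        k += 1
    return result
```

Note that every iteration of the loop re-scans the whole database to count candidates; that repeated scanning is exactly the cost the FP-tree approach is designed to avoid.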
The bottleneck of the Apriori-like method lies in candidate-set generation and testing. How can we avoid generating a huge set of candidates?

• A novel compact data structure, called the FP-tree
• An FP-tree-based pattern-fragment-growth mining method
• A divide-and-conquer search method for frequent itemset combinations
Frequent Pattern Tree: An Example

Given the minimum support threshold ξ = 3:

1. One scan of the DB identifies the set of frequent items.
   • Items are ordered in frequency-descending order.
   • For convenience, the frequent items of each transaction are listed in this order.

TID | Items bought
100 | f, a, c, d, g, i, m, p
200 | a, b, c, f, l, m, o
300 | b, f, h, j, o
400 | b, c, k, s, p
500 | a, f, c, e, l, p, m, n

Frequent items: <(f:4), (c:4), (a:3), (b:3), (m:3), (p:3)>
TID | Items bought              | Ordered frequent items
100 | f, a, c, d, g, i, m, p    | f, c, a, m, p
200 | a, b, c, f, l, m, o       | f, c, a, b, m
300 | b, f, h, j, o             | f, b
400 | b, c, k, s, p             | c, b, p
500 | a, f, c, e, l, p, m, n    | f, c, a, m, p
2. Store the set of frequent items of each transaction in a tree:
   1. Create a "null" root.
   2. Scan the DB a second time.
   3. Insert each transaction's ordered frequent items as a path from the root.
   4. Share the existing path as long as the items match.
   5. When a different item comes up, branch and create a sub-path.

The resulting FP-tree:

root
 +- f:4
 |   +- c:3
 |   |   +- a:3
 |   |       +- m:2
 |   |       |   +- p:2
 |   |       +- b:1
 |   |           +- m:1
 |   +- b:1
 +- c:1
     +- b:1
         +- p:1
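The two-scan construction can be sketched as follows; a minimal Python sketch under the slides' assumptions (class and function names are mine). One caveat: ties among equally frequent items are broken alphabetically here, which roots the tree at c rather than f; the tree is otherwise the same shape and size as the slides' figure.

```python
class Node:
    """An FP-tree node: item name, count, parent, children, node-link."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}   # item -> Node
        self.link = None     # next node carrying the same item name

def build_fptree(transactions, xi):
    """Two DB scans: collect frequent items, then insert ordered paths."""
    # Scan 1: count supports and keep items meeting the threshold xi.
    counts = {}
    for t in transactions:
        for i in t:
            counts[i] = counts.get(i, 0) + 1
    freq = {i: c for i, c in counts.items() if c >= xi}
    # Frequency-descending order; ties broken alphabetically here.
    rank = {i: r for r, i in
            enumerate(sorted(freq, key=lambda i: (-freq[i], i)))}
    root = Node(None, None)
    header = {i: None for i in rank}   # item -> head of node-link chain
    # Scan 2: insert each transaction's frequent items as a (shared) path.
    for t in transactions:
        node = root
        for i in sorted((i for i in t if i in freq), key=rank.get):
            if i in node.children:
                node.children[i].count += 1
            else:
                child = Node(i, node)
                child.link, header[i] = header[i], child  # thread node-link
                node.children[i] = child
            node = node.children[i]
    return root, header

# The example database from the slides.
DB = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
      list("bcksp"), list("afcelpmn")]
```

Because each transaction shares its prefix with previously inserted ones, the five transactions collapse into eleven nodes here, versus 25 frequent-item occurrences in the raw database.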
3. To facilitate tree traversal, build an item header table:
   • Each entry holds an item name and the head of that item's node-link chain.
   • All nodes carrying the same item name are linked through these node-links.

Item | Head of node-links
f    | f:4
c    | c:3 → c:1
a    | a:3
b    | b:1 → b:1 → b:1
m    | m:2 → m:1
p    | p:2 → p:1
Frequent Pattern Tree: Design and Construction

1. The tree consists of:
   • one root,
   • a set of item-prefix subtrees as the children of the root,
   • a frequent-item header table.
2. Each node in the tree has three fields: item-name, count, and node-link.
3. Each entry in the frequent-item header table consists of: item-name and head of node-link.
Frequent Pattern Tree: Properties

1. Construction cost: building the FP-tree needs exactly two scans of the DB, the first to collect the set of frequent items and the second to construct the tree. The cost of inserting a transaction Trans is O(|freq(Trans)|), where freq(Trans) is the set of frequent items in Trans.
2. Completeness: the FP-tree contains all the information related to mining frequent patterns under the given minimum support threshold.
3. Compactness: the size of the tree is bounded by the total occurrences of frequent items, and the height of the tree is bounded by the maximal number of frequent items in any transaction.
Frequent Pattern Tree: Properties

Listing each transaction's frequent items in frequency-descending order is what makes the tree compact. As an example of what happens without this ordering, list the items in ascending order instead:

TID | Frequent items (ascending order)
100 | p, m, a, c, f
200 | m, b, a, c, f
300 | b, f
400 | p, b, c
500 | p, m, a, c, f

[Figure: the FP-tree built from this ascending ordering has more nodes and more branches than the descending-order tree built earlier, because the rarer items near the root prevent paths from being shared.]
Mining Frequent Patterns Using FP-tree

Examine the mining process starting from the bottom of the header table. For each item a_i, collect all the patterns in which a node labeled a_i participates, by starting from a_i's head in the header table and following a_i's node-links.
Mining Frequent Patterns Using FP-tree: An Example

Node p (p:3):
• FP-tree paths containing p: <f:4, c:3, a:3, m:2, p:2> and <c:1, b:1, p:1>
• p's conditional pattern base: {(f:2, c:2, a:2, m:2), (c:1, b:1)}
• Construct p's conditional FP-tree on this base, keeping only the frequent items.
Frequent items in p's conditional base: <(c:3)>
Frequent itemsets containing p: <p:3, cp:3>
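The counts in p's conditional pattern base can be checked in a few lines (a sketch; the paths and threshold are read off the example above):

```python
# p's prefix paths from the FP-tree, each with p's count on that path.
paths = [(("f", "c", "a", "m"), 2), (("c", "b"), 1)]
xi = 3

# Accumulate each item's support within the conditional pattern base.
counts = {}
for items, n in paths:
    for i in items:
        counts[i] = counts.get(i, 0) + n

# Only items meeting the threshold survive into p's conditional FP-tree:
# here only c (2 + 1 = 3), so the p-patterns are p and cp alone.
freq = {i: c for i, c in counts.items() if c >= xi}
```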
Node m (m:3):
• FP-tree paths containing m: <f:4, c:3, a:3, m:2> and <f:4, c:3, a:3, b:1, m:1>
• m's conditional pattern base: {(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)}
• Construct m's conditional FP-tree on this base, keeping only the frequent items.
Frequent items in m's conditional base: <(f:3), (c:3), (a:3)>
Frequent itemsets containing m: <m:3, am:3, cm:3, fm:3, cam:3, fam:3, fcm:3, fcam:3>
Node b (b:3):
• FP-tree paths containing b: <f:4, c:3, a:3, b:1>, <f:4, b:1>, and <c:1, b:1>
• b's conditional pattern base: {(f:1, c:1, a:1), (f:1), (c:1)}
• Construct b's conditional FP-tree: no item in the base is frequent.
Frequent items in b's conditional base: none
Frequent itemsets containing b: <b:3>
Node a (a:3):
• FP-tree paths containing a: <f:4, c:3, a:3>
• a's conditional pattern base: {(f:3, c:3)}
• Construct a's conditional FP-tree on this base, keeping only the frequent items.
Frequent items in a's conditional base: <(f:3), (c:3)>
Frequent itemsets containing a: <a:3, fa:3, ca:3, fca:3>
Node c (c:4):
• FP-tree paths containing c: <f:4, c:3> and <c:1>
• c's conditional pattern base: {(f:3)}
• Construct c's conditional FP-tree on this base, keeping only the frequent items.
Frequent items in c's conditional base: <(f:3)>
Frequent itemsets containing c: <c:4, fc:3>
Node f (f:4):
• FP-tree paths containing f: <f:4>
• f's conditional pattern base: {} (empty)
• No conditional FP-tree needs to be built.
Frequent items in f's conditional base: none
Frequent itemsets containing f: <f:4>
Mining Frequent Patterns Using FP-tree: Design and construction
• FP-tree• Minimum support threshold
Input
• The complete set of frequent patterns
Output
• If Tree contains a single path • Then for each combination of the nodes () do
• Generate pattern • Support = min support in
• Else for each in the header table • Generate pattern with support = support• Construct ’s FP-tree call it • If • Then call FP-growth(
FP-growth(, )
29
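The recursion above can be sketched end to end. This illustrative Python version (the names are mine, not the paper's) represents each conditional database as (path, count) pairs and rebuilds a small conditional tree at every level; it omits the single-path shortcut, which is an optimization rather than a correctness requirement:

```python
from collections import defaultdict

def fp_growth(transactions, xi):
    """Mine all frequent itemsets; returns {frozenset(itemset): support}."""
    out = {}
    _mine([(tuple(t), 1) for t in transactions], frozenset(), xi, out)
    return out

def _mine(cond_db, suffix, xi, out):
    # Count item supports in this (conditional) database.
    counts = defaultdict(int)
    for path, n in cond_db:
        for item in set(path):
            counts[item] += n
    freq = {i: c for i, c in counts.items() if c >= xi}
    if not freq:
        return
    # Frequency-descending order (ties broken alphabetically).
    rank = {i: r for r, i in
            enumerate(sorted(freq, key=lambda i: (-freq[i], i)))}
    # Build the conditional FP-tree as nested dicts: item -> [count, children].
    root = {}
    for path, n in cond_db:
        node = root
        for item in sorted((i for i in set(path) if i in freq), key=rank.get):
            entry = node.setdefault(item, [0, {}])
            entry[0] += n
            node = entry[1]
    # Emit a pattern per frequent item, then recurse on its conditional base.
    for item, support in freq.items():
        out[suffix | {item}] = support
        base = []
        _prefix_paths(root, item, [], base)
        _mine(base, suffix | {item}, xi, out)

def _prefix_paths(node, target, prefix, base):
    # Collect (ancestor path, count) for every node labeled `target`.
    for item, (count, kids) in node.items():
        if item == target:
            if prefix:
                base.append((tuple(prefix), count))
        else:
            _prefix_paths(kids, target, prefix + [item], base)
```

Run on the example database with ξ = 3, this recursion reproduces the per-item results of the preceding slides, including the eight itemsets containing m, without ever generating a candidate set.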
Mining Frequent Patterns Using FP-tree: Properties

1. To calculate the frequent patterns containing a_i in a path P, only the prefix sub-path of node a_i in P needs to be considered, and the frequency count of every node in that sub-path is the same as the count of node a_i.
2. Suppose an FP-tree consists of a single path P. The complete set of frequent patterns can then be generated by enumerating all combinations of the items on P; the support of each combination equals the minimum support of the items it contains.
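Property 2 can be illustrated directly. In the running example, m's conditional FP-tree is the single path <f:3, c:3, a:3>; enumerating its combinations yields the seven suffixes that, joined with m, give the eight m-patterns (a sketch, with names of my choosing):

```python
from itertools import combinations

# m's conditional FP-tree from the example: the single path <f:3, c:3, a:3>.
path = [("f", 3), ("c", 3), ("a", 3)]

def single_path_patterns(path):
    """All non-empty item combinations on a single path; each pattern's
    support is the minimum count among its items."""
    out = {}
    for k in range(1, len(path) + 1):
        for combo in combinations(path, k):
            out[frozenset(i for i, _ in combo)] = min(c for _, c in combo)
    return out
```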
Algorithm Efficiency Properties

1. The FP-tree is usually much smaller than the DB.
2. The conditional FP-trees constructed during FP-growth are never bigger than the sub-paths they are built from.
3. The mining operations consist mainly of prefix-count adjustment, counting, and pattern-fragment concatenation. This is much less costly than generating a very large number of candidate patterns and testing each of them.
Performance Study

Comparison of FP-growth with Apriori:
• Performed on a 450 MHz Pentium PC with 128 MB of main memory, running Microsoft Windows/NT.
• Both programs were written in Microsoft Visual C++ 6.0.
• Run time was measured as the interval between input and output.

Two datasets:

Dataset | Items | Avg. transaction size | Avg. maximal frequent itemset size | Transactions
D1      | 1K    | 25                    | 10                                 | 10K
D2      | 10K   | 25                    | 20                                 | 100K
[Figures: run-time comparison charts for FP-growth versus Apriori on D1 and D2.]
Future Works

Construction of FP-trees for projected databases:
• When the database is large, the FP-tree cannot be constructed in main memory.
• Partition the database into a set of projected databases.
• Construct an FP-tree for, and mine, each projected database.
Construction of a disk-resident FP-tree:
• Use a B+-tree structure to index the FP-tree.
• Split the tree based on the common prefix paths.

Materialization of an FP-tree:
• Constructing an FP-tree needs two scans of the database, so an FP-tree could be materialized once and reused for frequent pattern mining.
• How should a good minimum support threshold ξ be selected? Use a low ξ?
Conclusion

FP-growth:
• constructs a highly compact FP-tree, usually substantially smaller than the original database;
• applies a pattern-growth method that avoids costly candidate generation and testing;
• applies a partitioning-based divide-and-conquer method that dramatically reduces the size of the subsequent conditional FP-trees;
• mines both short and long patterns efficiently in large databases.
Selected Questions

Q1. What are the components of an FP-tree?
A: One root, a set of item-prefix subtrees as the children of the root, and a frequent-item header table.

Q2. How do we calculate the frequent patterns containing a_i in a path P?
A: Only consider the prefix sub-path of node a_i in P; the frequency count of every node in that sub-path is the same as the count of node a_i; then find all the combinations.

Q3. Compare the efficiency of the mining operations in FP-growth with Apriori.
A: FP-growth's mining operations consist mainly of prefix-count adjustment, counting, and pattern-fragment concatenation. This is much less costly than Apriori's generation and testing of a very large number of candidate patterns.