Introduction to Machine Learning
Lecture 13: Introduction to Association Rules

Albert Orriols i Puig
[email protected]
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lectures 5-12

Let's start with data classification.

Slide 2 – Artificial Intelligence Machine Learning

Recap of Lectures 5-12: Data Set → Classification Model. How?
We have seen four different types of approaches to classification:
• Decision trees (C4.5)
• Instance-based algorithms (kNN & CBR)
• Bayesian classifiers (Naïve Bayes)
• Neural networks (Perceptron, Adaline, Madaline, SVM)
Today's Agenda
• Introduction to Association Rules
• A Taxonomy of Association Rules
• Measures of Interest
• Apriori
Introduction to AR

Ideas come from market basket analysis (MBA).

Let's go shopping!
• Customer 1: milk, eggs, sugar, bread
• Customer 2: milk, eggs, cereal, bread
• Customer 3: eggs, sugar

What do my customers buy? Which products are bought together?

Aim: find associations and correlations between the different items that customers place in their shopping basket.
Introduction to AR

Formalizing the problem a little bit:
• Transaction database T: a set of transactions T = {t1, t2, …, tn}
• Each transaction contains a set of items I (an itemset)
• An itemset is a collection of items I = {i1, i2, …, im}

General aim: find frequent/interesting patterns, associations, correlations, or causal structures among sets of items or elements in databases or other information repositories.

Put these relationships in terms of association rules: X ⇒ Y
Example of AR

TID   Items
T1    bread, jelly, peanut-butter
T2    bread, peanut-butter
T3    bread, milk, peanut-butter
T4    beer, bread
T5    beer, milk

Examples:
• bread ⇒ peanut-butter
• beer ⇒ bread

Frequent itemsets: items that frequently appear together
• I = {bread, peanut-butter}
• I = {beer, bread}
What's an Interesting Rule?

Support count (σ): frequency of occurrence of an itemset.
• σ({bread, peanut-butter}) = 3
• σ({beer, bread}) = 1

Support (s): fraction of transactions that contain an itemset.
• s({bread, peanut-butter}) = 3/5
• s({beer, bread}) = 1/5

Frequent itemset: an itemset whose support is greater than or equal to a minimum support threshold (minsup).
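These two counts can be checked with a few lines of Python. This is a minimal sketch: the function names are mine, and the transactions are copied from the table on the previous slide.

```python
# Transactions T1..T5 from the example table.
transactions = [
    {"bread", "jelly", "peanut-butter"},   # T1
    {"bread", "peanut-butter"},            # T2
    {"bread", "milk", "peanut-butter"},    # T3
    {"beer", "bread"},                     # T4
    {"beer", "milk"},                      # T5
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions containing every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """s(X): fraction of transactions containing X."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"bread", "peanut-butter"}, transactions))  # 3
print(support({"beer", "bread"}, transactions))                 # 0.2 (= 1/5)
```

Note that `itemset <= t` is Python's subset test, which matches the definition of "transaction t contains itemset X".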
What's an Interesting Rule?

An association rule is an implication between two itemsets: X ⇒ Y

Many measures of interest exist; the two most used are:

Support (s): the occurring frequency of the rule, i.e., the number of transactions that contain both X and Y:
    s = σ(X ∪ Y) / (# of trans.)

Confidence (c): the strength of the association, i.e., it measures how often items in Y appear in transactions that contain X:
    c = σ(X ∪ Y) / σ(X)
Interestingness of Rules

Rule                       s      c
bread ⇒ peanut-butter      0.60   0.75
peanut-butter ⇒ bread      0.60   1.00
beer ⇒ bread               0.20   0.50
peanut-butter ⇒ jelly      0.20   0.33
jelly ⇒ peanut-butter      0.20   1.00
jelly ⇒ milk               0.00   0.00

Many other interestingness measures exist. The methods presented herein are based on these two.
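The table's values follow directly from the two formulas. A small sketch (function names are mine; data is the five-transaction example):

```python
# Transactions T1..T5 from the running example.
transactions = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]

def sigma(itemset):
    """Support count: transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def measures(X, Y):
    """Return (support, confidence) of the rule X => Y."""
    s = sigma(X | Y) / len(transactions)
    c = sigma(X | Y) / sigma(X)
    return s, c

print(measures({"bread"}, {"peanut-butter"}))  # (0.6, 0.75)
print(measures({"peanut-butter"}, {"jelly"}))  # (0.2, 0.333...)
```

Notice the asymmetry: bread ⇒ peanut-butter and peanut-butter ⇒ bread share the same support but differ in confidence, because the denominator σ(X) changes.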
Types of AR

Binary association rules:
    bread ⇒ peanut-butter

Quantitative association rules:
    weight in [70kg – 90kg] ⇒ height in [170cm – 190cm]

Fuzzy association rules:
    weight is HEAVY ⇒ height is TALL

Let's start from the beginning: binary association rules – Apriori
Apriori

This is the most influential AR miner. It consists of two steps:
1. Generate all frequent itemsets whose support ≥ minsup
2. Use frequent itemsets to generate association rules

So, let's pay attention to the first step.
Apriori

[Figure: the lattice of all itemsets over {A, B, C, D, E}, from the empty set (null) through single items, pairs, triples, and quadruples, up to ABCDE.]

Given d items, we have 2^d possible itemsets. Do I have to generate them all?
Apriori

Let's avoid expanding the whole graph.

Key idea – the downward closure property: any subset of a frequent itemset is also a frequent itemset.

Therefore, the algorithm iteratively:
• Creates itemsets
• Only continues the exploration of those whose support ≥ minsup
Example of Itemset Generation

[Figure: the same itemset lattice, now marking an infrequent itemset; by the downward closure property, all supersets of an infrequent itemset are pruned from the exploration.]

Given d items, we have 2^d possible itemsets. Do I have to generate them all?
Recovering the Example

TID   Items
T1    bread, jelly, peanut-butter
T2    bread, peanut-butter
T3    bread, milk, peanut-butter
T4    beer, bread
T5    beer, milk

Minimum support = 3

1-itemsets:
Item        count
bread       4
peanut-b    3
jelly       1
milk        2
beer        2

2-itemsets:
Item               count
bread, peanut-b    3
Apriori Algorithm

k = 1
Generate frequent itemsets of length 1
Repeat until no frequent itemsets are found:
    k := k + 1
    Generate candidate itemsets of size k from the frequent (k-1)-itemsets
    Compute the support of each candidate by scanning the DB
Apriori Algorithm

Algorithm Apriori(T)
    C1 ← init-pass(T);
    F1 ← {f | f ∈ C1, f.count/n ≥ minsup};    // n: no. of transactions in T
    for (k = 2; Fk-1 ≠ ∅; k++) do
        Ck ← candidate-gen(Fk-1);
        for each transaction t ∈ T do
            for each candidate c ∈ Ck do
                if c is contained in t then
                    c.count++;
            end
        end
        Fk ← {c ∈ Ck | c.count/n ≥ minsup}
    end
    return F ← ∪k Fk;
Apriori Algorithm

Function candidate-gen(Fk-1)
    Ck ← ∅;
    forall f1, f2 ∈ Fk-1
        with f1 = {i1, …, ik-2, ik-1} and f2 = {i1, …, ik-2, i'k-1} and ik-1 < i'k-1 do
        c ← {i1, …, ik-1, i'k-1};    // join f1 and f2
        Ck ← Ck ∪ {c};
        for each (k-1)-subset s of c do
            if (s ∉ Fk-1) then
                delete c from Ck;    // prune
        end
    end
    return Ck;
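The join-and-prune step above can be sketched in Python. This is a minimal sketch (function name and tuple representation are mine): itemsets are sorted tuples, joined when they share their first k-2 items, and pruned when any (k-1)-subset is infrequent.

```python
from itertools import combinations

def candidate_gen(F_prev, k):
    """Generate candidate k-itemsets from frequent (k-1)-itemsets:
    join pairs sharing their first k-2 items, then prune any
    candidate with an infrequent (k-1)-subset."""
    F_prev = sorted(tuple(sorted(f)) for f in F_prev)
    freq = set(F_prev)
    C = set()
    for f1 in F_prev:
        for f2 in F_prev:
            # join step: same prefix, distinct ordered last items
            if f1[:-1] == f2[:-1] and f1[-1] < f2[-1]:
                c = f1 + (f2[-1],)
                # prune step: every (k-1)-subset must be frequent
                if all(s in freq for s in combinations(c, k - 1)):
                    C.add(c)
    return C

# L2 taken from the example run on the next slide:
F2 = [("A", "C"), ("B", "C"), ("B", "E"), ("C", "E")]
print(candidate_gen(F2, 3))   # {('B', 'C', 'E')}
```

Only {B, C, E} survives: the join of {B, C} and {B, E} has all three of its 2-subsets ({B,C}, {B,E}, {C,E}) frequent, while no other pair even joins.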
Example of Apriori Run

Database TDB (minimum support count = 2):
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan – C1:          L1:
Itemset  sup            Itemset  sup
{A}      2              {A}      2
{B}      3              {B}      3
{C}      3              {C}      3
{D}      1              {E}      3
{E}      3

2nd scan – C2:          L2:
Itemset  sup            Itemset  sup
{A, B}   1              {A, C}   2
{A, C}   2              {B, C}   2
{A, E}   1              {B, E}   3
{B, C}   2              {C, E}   2
{B, E}   3
{C, E}   2

3rd scan – C3:          L3:
Itemset     sup         Itemset     sup
{B, C, E}   2           {B, C, E}   2
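The whole run above can be reproduced with a compact Python sketch of the level-wise algorithm (the helper names and the minsup-as-count convention are mine):

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Level-wise frequent-itemset mining, mirroring the run above."""
    def count(c):
        # support count: transactions containing every item of candidate c
        return sum(1 for t in transactions if set(c) <= t)

    items = sorted({i for t in transactions for i in t})
    Fk = {(i,) for i in items if count((i,)) >= minsup_count}
    frequent = {}
    while Fk:
        frequent.update({c: count(c) for c in Fk})
        prev = sorted(Fk)
        # join step: merge frequent k-itemsets sharing their first k-1 items
        Ck = {f1 + (f2[-1],) for f1 in prev for f2 in prev
              if f1[:-1] == f2[:-1] and f1[-1] < f2[-1]}
        # prune step + support counting against minsup
        Fk = {c for c in Ck
              if all(s in frequent for s in combinations(c, len(c) - 1))
              and count(c) >= minsup_count}
    return frequent

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(tdb, 2)
print(result[("B", "C", "E")])   # 2
```

The result contains exactly the nine frequent itemsets of the example (L1: A, B, C, E; L2: AC, BC, BE, CE; L3: BCE), and {D} is correctly dropped after the first scan.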
Apriori

Remember that Apriori consists of two steps:
1. Generate all frequent itemsets whose support ≥ minsup
2. Use frequent itemsets to generate association rules

We accomplished step 1, so we have all the frequent itemsets. Now, let's pay attention to the second step.
Rule Generation in Apriori

Given a frequent itemset L:
• Find all non-empty subsets F of L such that the association rule F ⇒ {L − F} satisfies the minimum confidence
• Create the rule F ⇒ {L − F}

If L = {A, B, C}, the candidate rules are:
    AB ⇒ C, AC ⇒ B, BC ⇒ A, A ⇒ BC, B ⇒ AC, C ⇒ AB

In general, there are 2^k − 2 candidate rules, where k is the length of the itemset L.
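Enumerating those candidates is a one-liner over the non-empty proper subsets of L (a sketch; the function name is mine):

```python
from itertools import combinations

def candidate_rules(L):
    """All candidate rules F => L-F for non-empty proper subsets F of L."""
    L = tuple(sorted(L))
    rules = []
    for r in range(1, len(L)):               # antecedent sizes 1 .. k-1
        for F in combinations(L, r):
            rules.append((set(F), set(L) - set(F)))
    return rules

rules = candidate_rules({"A", "B", "C"})
print(len(rules))   # 2**3 - 2 = 6
```

The count is 2^k − 2 because the 2^k subsets of L include the empty set and L itself, neither of which yields a valid rule.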
Can You Be More Efficient?

Can we apply the same trick used with support? Confidence does not have the anti-monotone property.

That is, is c(AB ⇒ D) > c(A ⇒ D)? We don't know!

But the confidence of rules generated from the same itemset does have the anti-monotone property. For L = {A, B, C, D}:
    c(ABC ⇒ D) ≥ c(AB ⇒ CD) ≥ c(A ⇒ BCD)

We can apply this property to prune the rule generation.
Example of Efficient Rule Generation

[Figure: the lattice of rules generated from the frequent itemset ABCD — ABC ⇒ D, ABD ⇒ C, ACD ⇒ B, BCD ⇒ A at the top; AB ⇒ CD, AC ⇒ BD, AD ⇒ BC, BC ⇒ AD, BD ⇒ AC, CD ⇒ AB in the middle; A ⇒ BCD, B ⇒ ACD, C ⇒ ABD, D ⇒ ABC at the bottom. Once a rule has low confidence, all rules below it, with smaller antecedents, are pruned.]
Challenges in AR Mining

Challenges:
• Apriori scans the database multiple times
• Most often, there is a high number of candidates
• Support counting for candidates can be time-expensive

Several methods try to improve these points by:
• Reducing the number of scans of the database
• Shrinking the number of candidates
• Counting the support of candidates more efficiently
Next Class
Advanced topics in association rule mining