COMP5331 1
COMP5331
Classification
Prepared by Raymond Wong. The examples used in Decision Tree are borrowed from LW Chan's notes.
Presented by Raymond Wong
raywong@cse
COMP5331 2
Classification
Decision tree:

root
  child=yes: 100% Yes, 0% No
  child=no:
    Income=high: 100% Yes, 0% No
    Income=low: 0% Yes, 100% No

Suppose there is a person:

Race   Income  Child  Insurance
white  high    no     ?
COMP5331 3
Classification
Decision tree:

root
  child=yes: 100% Yes, 0% No
  child=no:
    Income=high: 100% Yes, 0% No
    Income=low: 0% Yes, 100% No

Suppose there is a person (test set):

Race   Income  Child  Insurance
white  high    no     ?

Training set:

No.  Race   Income  Child  Insurance
1    black  high    no     yes
2    white  high    yes    yes
3    white  low     yes    yes
4    white  low     yes    yes
5    black  low     no     no
6    black  low     no     no
7    black  low     no     no
8    white  low     no     no
COMP5331 4
Applications
Insurance: according to the attributes of customers, determine which customers will buy an insurance policy.
Marketing: according to the attributes of customers, determine which customers will buy a product such as a computer.
Bank Loan: according to the attributes of customers, determine which customers are "risky" and which are "safe".
COMP5331 5
Applications
Network: according to traffic patterns, determine whether the patterns are related to some "security attacks".
Software: according to the experience of programmers, determine which programmers can fix certain bugs.
COMP5331 6
Same/Difference?
Classification
Clustering
COMP5331 7
Classification Methods
Decision Tree
Bayesian Classifier
Nearest Neighbor Classifier
COMP5331 8
Decision Trees
ID3 (Iterative Dichotomiser)
C4.5
CART (Classification And Regression Trees)
COMP5331 9
Entropy Example 1
Consider a random variable which has a uniform distribution over 32 outcomes.
To identify an outcome, we need a label that can take 32 different values.
Thus, 5-bit strings suffice as labels, since log2 32 = 5.
COMP5331 10
Entropy
Entropy is used to measure how informative a node is.
If we are given a probability distribution P = (p1, p2, …, pn), then the information conveyed by this distribution, also called the entropy of P, is:
I(P) = -(p1 log p1 + p2 log p2 + … + pn log pn)
All logarithms here are in base 2.
COMP5331 11
Entropy
For example:
If P is (0.5, 0.5), then I(P) is 1.
If P is (0.67, 0.33), then I(P) is 0.92.
If P is (1, 0), then I(P) is 0.
Entropy is a way to measure the amount of information. The smaller the entropy, the more informative the distribution (i.e., the purer the node).
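The example values above can be checked with a few lines of Python. This is a sketch I'm adding, not code from the notes; the helper name `entropy` is my own:

```python
import math

def entropy(probs):
    # Shannon entropy I(P) in bits; terms with p = 0 contribute 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))             # 1.0
print(round(entropy([2/3, 1/3]), 2))   # 0.92, matching the (0.67, 0.33) example
print(abs(entropy([1.0, 0.0])))        # 0.0
```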
COMP5331 12
Entropy

(using the training set above)

Info(T) = -½ log ½ - ½ log ½ = 1

For attribute Race:
Info(Tblack) = -¾ log ¾ - ¼ log ¼ = 0.8113
Info(Twhite) = -¾ log ¾ - ¼ log ¼ = 0.8113
Info(Race, T) = ½ × Info(Tblack) + ½ × Info(Twhite) = 0.8113
Gain(Race, T) = Info(T) - Info(Race, T) = 1 - 0.8113 = 0.1887

So far: Gain(Race, T) = 0.1887
COMP5331 13
Entropy

(using the training set above)

Info(T) = -½ log ½ - ½ log ½ = 1

For attribute Income:
Info(Thigh) = -1 log 1 - 0 log 0 = 0
Info(Tlow) = -1/3 log 1/3 - 2/3 log 2/3 = 0.9183
Info(Income, T) = ¼ × Info(Thigh) + ¾ × Info(Tlow) = 0.6887
Gain(Income, T) = Info(T) - Info(Income, T) = 1 - 0.6887 = 0.3113

So far: Gain(Race, T) = 0.1887; Gain(Income, T) = 0.3113
COMP5331 14
(using the training set above)

Info(T) = -½ log ½ - ½ log ½ = 1

For attribute Child:
Info(Tyes) = -1 log 1 - 0 log 0 = 0
Info(Tno) = -1/5 log 1/5 - 4/5 log 4/5 = 0.7219
Info(Child, T) = 3/8 × Info(Tyes) + 5/8 × Info(Tno) = 0.4512
Gain(Child, T) = Info(T) - Info(Child, T) = 1 - 0.4512 = 0.5488

So far: Gain(Race, T) = 0.1887; Gain(Income, T) = 0.3113; Gain(Child, T) = 0.5488

Child gives the largest gain, so we split on Child:

root
  child=yes: records {2, 3, 4}; Insurance: 3 Yes, 0 No (100% Yes, 0% No)
  child=no: records {1, 5, 6, 7, 8}; Insurance: 1 Yes, 4 No (20% Yes, 80% No)
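The ID3 gain computations above can be reproduced with a short script. This is a sketch, not code from the notes; `info` and `gain` are names I'm introducing, and the 8-row table is the training set from the slides:

```python
import math
from collections import Counter

# Training set from the slides: (Race, Income, Child, Insurance).
DATA = [
    ("black", "high", "no",  "yes"),
    ("white", "high", "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("white", "low",  "no",  "no"),
]
ATTRS = {"Race": 0, "Income": 1, "Child": 2}

def info(rows):
    # Info(T): entropy of the class label (Insurance) over the rows.
    total = len(rows)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(r[3] for r in rows).values())

def gain(attr, rows):
    # ID3 information gain: Info(T) minus the weighted Info of each partition.
    i = ATTRS[attr]
    weighted = 0.0
    for v in {r[i] for r in rows}:
        part = [r for r in rows if r[i] == v]
        weighted += len(part) / len(rows) * info(part)
    return info(rows) - weighted

for a in ATTRS:
    print(a, round(gain(a, DATA), 4))  # Race 0.1887, Income 0.3113, Child 0.5488
```

Child has the largest gain, matching the choice of the root split on the slides.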
COMP5331 15
(continuing with the records in the child=no branch, T = {1, 5, 6, 7, 8})

Info(T) = -1/5 log 1/5 - 4/5 log 4/5 = 0.7219

For attribute Race:
Info(Tblack) = -¼ log ¼ - ¾ log ¾ = 0.8113
Info(Twhite) = -0 log 0 - 1 log 1 = 0
Info(Race, T) = 4/5 × Info(Tblack) + 1/5 × Info(Twhite) = 0.6490
Gain(Race, T) = Info(T) - Info(Race, T) = 0.7219 - 0.6490 = 0.0729

So far: Gain(Race, T) = 0.0729

root
  child=yes: records {2, 3, 4}; 100% Yes, 0% No
  child=no: records {1, 5, 6, 7, 8}; 20% Yes, 80% No
COMP5331 16
(continuing with the records in the child=no branch, T = {1, 5, 6, 7, 8})

Info(T) = -1/5 log 1/5 - 4/5 log 4/5 = 0.7219

For attribute Income:
Info(Thigh) = -1 log 1 - 0 log 0 = 0
Info(Tlow) = -0 log 0 - 1 log 1 = 0
Info(Income, T) = 1/5 × Info(Thigh) + 4/5 × Info(Tlow) = 0
Gain(Income, T) = Info(T) - Info(Income, T) = 0.7219 - 0 = 0.7219

So far: Gain(Race, T) = 0.0729; Gain(Income, T) = 0.7219

root
  child=yes: records {2, 3, 4}; 100% Yes, 0% No
  child=no: records {1, 5, 6, 7, 8}; 20% Yes, 80% No
COMP5331 17
(continuing with the records in the child=no branch, T = {1, 5, 6, 7, 8})

For attribute Race, Gain(Race, T) = 0.0729; for attribute Income, Gain(Income, T) = 0.7219.
Income gives the larger gain, so we split the child=no branch on Income:

root
  child=yes: 100% Yes, 0% No
  child=no:
    Income=high: records {1}; Insurance: 1 Yes, 0 No (100% Yes, 0% No)
    Income=low: records {5, 6, 7, 8}; Insurance: 0 Yes, 4 No (0% Yes, 100% No)
COMP5331 18
(final) Decision tree:

root
  child=yes: 100% Yes, 0% No
  child=no:
    Income=high: 100% Yes, 0% No
    Income=low: 0% Yes, 100% No

Suppose there is a new person:

Race   Income  Child  Insurance
white  high    no     ?
COMP5331 19
(training set and decision tree as above)

Termination criteria?
e.g., the height of the tree
e.g., the accuracy of each node
COMP5331 20
Decision Trees
ID3
C4.5
CART
COMP5331 21
C4.5
ID3
  Impurity measurement: Gain(A, T) = Info(T) - Info(A, T)
C4.5
  Impurity measurement: Gain(A, T) = (Info(T) - Info(A, T)) / SplitInfo(A)
  where SplitInfo(A) = -Σv∈A p(v) log p(v)
COMP5331 22
Entropy

(using the training set above)

Info(T) = -½ log ½ - ½ log ½ = 1

For attribute Race:
Info(Tblack) = -¾ log ¾ - ¼ log ¼ = 0.8113
Info(Twhite) = -¾ log ¾ - ¼ log ¼ = 0.8113
Info(Race, T) = ½ × Info(Tblack) + ½ × Info(Twhite) = 0.8113
SplitInfo(Race) = -½ log ½ - ½ log ½ = 1
Gain(Race, T) = (Info(T) - Info(Race, T)) / SplitInfo(Race) = (1 - 0.8113)/1 = 0.1887

So far: Gain(Race, T) = 0.1887
COMP5331 23
Entropy

(using the training set above)

Info(T) = -½ log ½ - ½ log ½ = 1

For attribute Income:
Info(Thigh) = -1 log 1 - 0 log 0 = 0
Info(Tlow) = -1/3 log 1/3 - 2/3 log 2/3 = 0.9183
Info(Income, T) = ¼ × Info(Thigh) + ¾ × Info(Tlow) = 0.6887
SplitInfo(Income) = -2/8 log 2/8 - 6/8 log 6/8 = 0.8113
Gain(Income, T) = (Info(T) - Info(Income, T)) / SplitInfo(Income) = (1 - 0.6887)/0.8113 = 0.3837

So far: Gain(Race, T) = 0.1887; Gain(Income, T) = 0.3837
For attribute Child, Gain(Child, T) = ?
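The C4.5 gain ratio above can be reproduced (and the exercise for Child checked) with a short script. A sketch under my own naming (`gain_ratio`, `entropy_of_counts` are not from the notes), using the 8-row training set from the slides:

```python
import math
from collections import Counter

# Training set from the slides: (Race, Income, Child, Insurance).
DATA = [
    ("black", "high", "no",  "yes"),
    ("white", "high", "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("white", "low",  "no",  "no"),
]
ATTRS = {"Race": 0, "Income": 1, "Child": 2}

def entropy_of_counts(counts):
    # Entropy of a list of counts, in bits.
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def info(rows):
    # Info(T): entropy of the class label (Insurance).
    return entropy_of_counts(Counter(r[3] for r in rows).values())

def gain_ratio(attr, rows):
    # C4.5: (Info(T) - Info(A, T)) / SplitInfo(A).
    i = ATTRS[attr]
    parts = Counter(r[i] for r in rows)
    weighted = sum(n / len(rows) * info([r for r in rows if r[i] == v])
                   for v, n in parts.items())
    split_info = entropy_of_counts(parts.values())   # SplitInfo(A)
    return (info(rows) - weighted) / split_info

for a in ATTRS:
    print(a, round(gain_ratio(a, DATA), 4))  # Race 0.1887, Income 0.3837, Child 0.575
```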
COMP5331 24
Decision Trees
ID3
C4.5
CART
COMP5331 25
CART
Impurity measurement: Gini
I(P) = 1 - Σj pj²
COMP5331 26
Gini

(using the training set above)

Info(T) = 1 - (½)² - (½)² = ½

For attribute Race:
Info(Tblack) = 1 - (¾)² - (¼)² = 0.375
Info(Twhite) = 1 - (¾)² - (¼)² = 0.375
Info(Race, T) = ½ × Info(Tblack) + ½ × Info(Twhite) = 0.375
Gain(Race, T) = Info(T) - Info(Race, T) = ½ - 0.375 = 0.125

So far: Gain(Race, T) = 0.125
COMP5331 27
Gini

(using the training set above)

Info(T) = 1 - (½)² - (½)² = ½

For attribute Income:
Info(Thigh) = 1 - 1² - 0² = 0
Info(Tlow) = 1 - (1/3)² - (2/3)² = 0.444
Info(Income, T) = 1/4 × Info(Thigh) + 3/4 × Info(Tlow) = 0.333
Gain(Income, T) = Info(T) - Info(Income, T) = ½ - 0.333 = 0.167

So far: Gain(Race, T) = 0.125; Gain(Income, T) = 0.167
For attribute Child, Gain(Child, T) = ?
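The CART (Gini) gains above can be checked the same way, and the exercise for Child answered. A sketch I'm adding (`gini`, `gini_gain` are my own names), on the 8-row training set from the slides:

```python
from collections import Counter

# Training set from the slides: (Race, Income, Child, Insurance).
DATA = [
    ("black", "high", "no",  "yes"),
    ("white", "high", "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("white", "low",  "no",  "no"),
]
ATTRS = {"Race": 0, "Income": 1, "Child": 2}

def gini(rows):
    # Gini impurity I(P) = 1 - sum_j p_j^2 over the class label (Insurance).
    total = len(rows)
    return 1 - sum((c / total) ** 2 for c in Counter(r[3] for r in rows).values())

def gini_gain(attr, rows):
    # Gain: Gini(T) minus the weighted Gini of each partition.
    i = ATTRS[attr]
    weighted = 0.0
    for v in {r[i] for r in rows}:
        part = [r for r in rows if r[i] == v]
        weighted += len(part) / len(rows) * gini(part)
    return gini(rows) - weighted

for a in ATTRS:
    print(a, round(gini_gain(a, DATA), 3))  # Race 0.125, Income 0.167, Child 0.3
```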
COMP5331 28
Classification Methods
Decision Tree
Bayesian Classifier
Nearest Neighbor Classifier
COMP5331 29
Bayesian Classifier
Naïve Bayes Classifier
Bayesian Belief Networks
COMP5331 30
Naïve Bayes Classifier
Statistical classifiers, based on probabilities and conditional probabilities
COMP5331 31
Naïve Bayes Classifier
Conditional probability:
A: a random variable
B: a random variable
P(A | B) = P(A, B) / P(B)
COMP5331 32
Naïve Bayes Classifier
Bayes' rule:
A: a random variable
B: a random variable
P(A | B) = P(B | A) P(A) / P(B)
COMP5331 33
Naïve Bayes Classifier

Independence assumption: the attributes are conditionally independent given the class, e.g.,
P(X, Y, Z | A) = P(X | A) × P(Y | A) × P(Z | A)

(training set as above)
COMP5331 34
Naïve Bayes Classifier

(using the training set above)

P(Yes) = ½; P(No) = ½

For attribute Race:
P(Race = black | Yes) = ¼   P(Race = white | Yes) = ¾
P(Race = black | No) = ¾    P(Race = white | No) = ¼
For attribute Income:
P(Income = high | Yes) = ½  P(Income = low | Yes) = ½
P(Income = high | No) = 0   P(Income = low | No) = 1
For attribute Child:
P(Child = yes | Yes) = ¾    P(Child = no | Yes) = ¼
P(Child = yes | No) = 0     P(Child = no | No) = 1

Suppose there is a new person: Race = white, Income = high, Child = no, Insurance = ?

P(Race = white, Income = high, Child = no | Yes)
= P(Race = white | Yes) × P(Income = high | Yes) × P(Child = no | Yes)
= ¾ × ½ × ¼ = 0.09375

P(Race = white, Income = high, Child = no | No)
= P(Race = white | No) × P(Income = high | No) × P(Child = no | No)
= ¼ × 0 × 1 = 0
COMP5331 39
Naïve Bayes Classifier

(training set, conditional probabilities, and the new person as above)

P(Yes | Race = white, Income = high, Child = no)
= P(Race = white, Income = high, Child = no | Yes) × P(Yes) / P(Race = white, Income = high, Child = no)
= 0.09375 × 0.5 / P(Race = white, Income = high, Child = no)
= 0.046875 / P(Race = white, Income = high, Child = no)
COMP5331 40
Naïve Bayes Classifier

(training set, conditional probabilities, and the new person as above)

P(No | Race = white, Income = high, Child = no)
= P(Race = white, Income = high, Child = no | No) × P(No) / P(Race = white, Income = high, Child = no)
= 0 × 0.5 / P(Race = white, Income = high, Child = no)
= 0
COMP5331 41
Naïve Bayes Classifier

P(Yes | Race = white, Income = high, Child = no) = 0.046875 / P(Race = white, Income = high, Child = no)
P(No | Race = white, Income = high, Child = no) = 0

Since P(Yes | Race = white, Income = high, Child = no) > P(No | Race = white, Income = high, Child = no), we predict that the following new person will buy an insurance:

Race   Income  Child  Insurance
white  high    no     ?
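The whole naïve Bayes prediction above can be reproduced in a few lines. A sketch I'm adding (the names `cond_prob` and `score` are my own), on the training set from the slides:

```python
# Training set from the slides: (Race, Income, Child) with class label Insurance.
DATA = [
    (("black", "high", "no"),  "yes"),
    (("white", "high", "yes"), "yes"),
    (("white", "low",  "yes"), "yes"),
    (("white", "low",  "yes"), "yes"),
    (("black", "low",  "no"),  "no"),
    (("black", "low",  "no"),  "no"),
    (("black", "low",  "no"),  "no"),
    (("white", "low",  "no"),  "no"),
]

def cond_prob(attr_idx, value, label):
    # P(attribute = value | class = label), estimated from DATA.
    rows = [x for x, y in DATA if y == label]
    return sum(1 for x in rows if x[attr_idx] == value) / len(rows)

def score(x, label):
    # P(x | label) * P(label) under the naive independence assumption;
    # proportional to the posterior, since the evidence term is shared.
    p = sum(1 for _, y in DATA if y == label) / len(DATA)   # prior
    for i, v in enumerate(x):
        p *= cond_prob(i, v, label)
    return p

person = ("white", "high", "no")
print(score(person, "yes"))   # 0.5 * (3/4 * 1/2 * 1/4) = 0.046875
print(score(person, "no"))    # 0.0
print("yes" if score(person, "yes") > score(person, "no") else "no")  # yes
```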
COMP5331 42
Bayesian Classifier
Naïve Bayes Classifier
Bayesian Belief Networks
COMP5331 43
Bayesian Belief Network
Naïve Bayes Classifier: makes the independence assumption.
Bayesian Belief Network: does not make the independence assumption.
COMP5331 44
Bayesian Belief Network

Attributes: Exercise (Yes/No), Diet (Healthy/Unhealthy), Heartburn (Yes/No), Blood Pressure (High/Low), Chest Pain (Yes/No), Heart Disease (Yes/No)

Exercise  Diet       Heartburn  Blood Pressure  Chest Pain  Heart Disease
Yes       Healthy    No         High            Yes         No
No        Unhealthy  Yes        Low             Yes         No
No        Healthy    Yes        High            No          Yes
…         …          …          …               …           …

Some attributes are dependent on other attributes, e.g., doing exercise may reduce the probability of suffering from heart disease:

Exercise (E) → Heart Disease
COMP5331 45
Bayesian Belief Network

Network structure (parents as given by the conditional probability tables):
Exercise (E) → Heart Disease (HD) ← Diet (D)
Diet (D) → Heartburn (Hb)
Heart Disease (HD) → Blood Pressure (BP)
Heart Disease (HD), Heartburn (Hb) → Chest Pain (CP)

P(E = Yes) = 0.7
P(D = Healthy) = 0.25

P(HD = Yes | E, D):
  E=Yes, D=Healthy: 0.25
  E=Yes, D=Unhealthy: 0.45
  E=No, D=Healthy: 0.55
  E=No, D=Unhealthy: 0.75

P(CP = Yes | HD, Hb):
  HD=Yes, Hb=Yes: 0.8
  HD=Yes, Hb=No: 0.6
  HD=No, Hb=Yes: 0.4
  HD=No, Hb=No: 0.1

P(BP = High | HD):
  HD=Yes: 0.85
  HD=No: 0.2

P(Hb = Yes | D):
  D=Healthy: 0.85
  D=Unhealthy: 0.2
COMP5331 46
Bayesian Belief Network
Let X, Y, Z be three random variables. X is said to be conditionally independent of Y given Z if P(X | Y, Z) = P(X | Z).
Lemma: if X is conditionally independent of Y given Z, then P(X, Y | Z) = P(X | Z) × P(Y | Z). (Why?)
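The lemma follows in one line from the chain rule; a short derivation:

```latex
P(X, Y \mid Z) = P(X \mid Y, Z)\, P(Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)
```

The first equality is the chain rule of probability; the second substitutes the conditional-independence assumption P(X | Y, Z) = P(X | Z).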
COMP5331 47
Bayesian Belief Network

(network structure as above)

Let X, Y, Z be three random variables. X is said to be conditionally independent of Y given Z if P(X | Y, Z) = P(X | Z).

e.g., P(BP = High | HD = Yes, D = Healthy) = P(BP = High | HD = Yes):
"BP = High" is conditionally independent of "D = Healthy" given "HD = Yes".
e.g., P(BP = High | HD = Yes, CP = Yes) = P(BP = High | HD = Yes):
"BP = High" is conditionally independent of "CP = Yes" given "HD = Yes".

Property: a node is conditionally independent of its non-descendants if its parents are known.
COMP5331 48
Bayesian Belief Network

(data table as above)

Suppose there is a new person and I want to know whether he is likely to have heart disease, under three levels of evidence:

Exercise  Diet     Heartburn  Blood Pressure  Chest Pain  Heart Disease
?         ?        ?          ?               ?           ?
?         ?        ?          High            ?           ?
Yes       Healthy  ?          High            ?           ?
COMP5331 49
Bayesian Belief Network

Suppose there is a new person and I want to know whether he is likely to have heart disease (no evidence observed).

P(HD = Yes)
= Σx∈{Yes, No} Σy∈{Healthy, Unhealthy} P(HD = Yes | E = x, D = y) × P(E = x, D = y)
= Σx∈{Yes, No} Σy∈{Healthy, Unhealthy} P(HD = Yes | E = x, D = y) × P(E = x) × P(D = y)
= 0.25 × 0.7 × 0.25 + 0.45 × 0.7 × 0.75 + 0.55 × 0.3 × 0.25 + 0.75 × 0.3 × 0.75
= 0.49

P(HD = No) = 1 - P(HD = Yes) = 1 - 0.49 = 0.51
COMP5331 50
Bayesian Belief Network

Suppose there is a new person and I want to know whether he is likely to have heart disease (evidence: Blood Pressure = High).

P(BP = High) = Σx∈{Yes, No} P(BP = High | HD = x) × P(HD = x) = 0.85 × 0.49 + 0.2 × 0.51 = 0.5185

P(HD = Yes | BP = High)
= P(BP = High | HD = Yes) × P(HD = Yes) / P(BP = High)
= 0.85 × 0.49 / 0.5185
= 0.8033

P(HD = No | BP = High) = 1 - P(HD = Yes | BP = High) = 1 - 0.8033 = 0.1967
COMP5331 51
Bayesian Belief Network

Suppose there is a new person and I want to know whether he is likely to have heart disease (evidence: Exercise = Yes, Diet = Healthy, Blood Pressure = High).

P(HD = Yes | BP = High, D = Healthy, E = Yes)
= P(BP = High | HD = Yes, D = Healthy, E = Yes) × P(HD = Yes | D = Healthy, E = Yes) / P(BP = High | D = Healthy, E = Yes)
= P(BP = High | HD = Yes) × P(HD = Yes | D = Healthy, E = Yes) / Σx∈{Yes, No} P(BP = High | HD = x) × P(HD = x | D = Healthy, E = Yes)
= 0.85 × 0.25 / (0.85 × 0.25 + 0.2 × 0.75)
= 0.5862

P(HD = No | BP = High, D = Healthy, E = Yes) = 1 - P(HD = Yes | BP = High, D = Healthy, E = Yes) = 1 - 0.5862 = 0.4138
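The three belief-network queries above can be checked numerically from the conditional probability tables. A sketch I'm adding (the dictionary names are my own); it exploits the network structure: E and D are independent roots, and BP depends only on HD:

```python
# CPTs from the slides.
P_E = {"Yes": 0.7, "No": 0.3}
P_D = {"Healthy": 0.25, "Unhealthy": 0.75}
P_HD_given_ED = {("Yes", "Healthy"): 0.25, ("Yes", "Unhealthy"): 0.45,
                 ("No", "Healthy"): 0.55, ("No", "Unhealthy"): 0.75}
P_BP_high_given_HD = {"Yes": 0.85, "No": 0.2}

# Query 1: P(HD = Yes), summing out E and D (independent root nodes).
p_hd_yes = sum(P_HD_given_ED[(e, d)] * P_E[e] * P_D[d]
               for e in P_E for d in P_D)

# Query 2: P(HD = Yes | BP = High), by Bayes' rule.
p_bp_high = (P_BP_high_given_HD["Yes"] * p_hd_yes
             + P_BP_high_given_HD["No"] * (1 - p_hd_yes))
p_hd_given_bp = P_BP_high_given_HD["Yes"] * p_hd_yes / p_bp_high

# Query 3: P(HD = Yes | BP = High, D = Healthy, E = Yes);
# given HD, BP is conditionally independent of D and E.
num = P_BP_high_given_HD["Yes"] * P_HD_given_ED[("Yes", "Healthy")]
den = num + P_BP_high_given_HD["No"] * (1 - P_HD_given_ED[("Yes", "Healthy")])

print(round(p_hd_yes, 2))        # 0.49
print(round(p_hd_given_bp, 4))   # 0.8033
print(round(num / den, 4))       # 0.5862
```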
COMP5331 52
Classification Methods
Decision Tree
Bayesian Classifier
Nearest Neighbor Classifier
COMP5331 53
Nearest Neighbor Classifier

Computer  History
100       40
90        45
20        95
…         …

(figure: the records plotted in the Computer/History plane)
COMP5331 54
Nearest Neighbor Classifier

Computer  History  Buy Book?
100       40       No (-)
90        45       Yes (+)
20        95       Yes (+)
…         …        …

(figure: buyers plotted as +, non-buyers as - in the Computer/History plane)
COMP5331 55
Nearest Neighbor Classifier

Computer  History  Buy Book?
100       40       No (-)
90        45       Yes (+)
20        95       Yes (+)
…         …        …

Suppose there is a new person:

Computer  History  Buy Book?
95        35       ?

Nearest Neighbor Classifier:
Step 1: Find the nearest neighbor.
Step 2: Use the "label" of this neighbor.
COMP5331 56
Nearest Neighbor Classifier

Computer  History  Buy Book?
100       40       No (-)
90        45       Yes (+)
20        95       Yes (+)
…         …        …

Suppose there is a new person:

Computer  History  Buy Book?
95        35       ?

k-Nearest Neighbor Classifier:
Step 1: Find the k nearest neighbors.
Step 2: Use the majority of the labels of the neighbors.
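The two steps above can be sketched in Python with the three example rows. The name `knn_predict` is my own, and the distance is plain Euclidean (an assumption; the slides do not fix a metric):

```python
from collections import Counter

# Example rows from the slides: ((Computer, History), Buy Book?).
DATA = [((100, 40), "No"), ((90, 45), "Yes"), ((20, 95), "Yes")]

def knn_predict(query, k=1):
    # Step 1: rank training points by squared Euclidean distance to the query.
    ranked = sorted(DATA,
                    key=lambda t: (t[0][0] - query[0]) ** 2
                                  + (t[0][1] - query[1]) ** 2)
    # Step 2: majority vote over the labels of the k nearest neighbors.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((95, 35), k=1))  # "No": the nearest point is (100, 40)
print(knn_predict((95, 35), k=3))  # "Yes": two of the three neighbors say Yes
```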