Page 1

COMP5331

Classification

Prepared by Raymond Wong
The examples used in Decision Tree are borrowed from LW Chan's notes
Presented by Raymond Wong

raywong@cse

Page 2

Classification

Decision tree:
  root
    child=yes: 100% Yes, 0% No
    child=no
      Income=high: 100% Yes, 0% No
      Income=low : 0% Yes, 100% No

Suppose there is a person:

  Race   Income  Child  Insurance
  white  high    no     ?

Page 3

Classification

(Decision tree and query person as on Page 2.)

Training set:

  Race   Income  Child  Insurance
  black  high    no     yes
  white  high    yes    yes
  white  low     yes    yes
  white  low     yes    yes
  black  low     no     no
  black  low     no     no
  black  low     no     no
  white  low     no     no

Test set:

  Race   Income  Child  Insurance
  white  high    no     ?

Page 4

Applications
- Insurance: According to the attributes of customers, determine which customers will buy an insurance policy.
- Marketing: According to the attributes of customers, determine which customers will buy a product such as a computer.
- Bank Loan: According to the attributes of customers, determine which customers are "risky" customers or "safe" customers.

Page 5

Applications
- Network: According to the traffic patterns, determine whether the patterns are related to some "security attacks".
- Software: According to the experience of programmers, determine which programmers can fix certain bugs.

Page 6

Same/Difference
- Classification
- Clustering

Page 7

Classification Methods
- Decision Tree
- Bayesian Classifier
- Nearest Neighbor Classifier

Page 8

Decision Trees
- ID3 (Iterative Dichotomiser)
- C4.5
- CART (Classification And Regression Trees)

Page 9

Entropy Example 1
- Consider a random variable which has a uniform distribution over 32 outcomes.
- To identify an outcome, we need a label that can take 32 different values.
- Thus, 5-bit strings suffice as labels (2^5 = 32).

Page 10

Entropy
Entropy is used to measure how informative a node is. If we are given a probability distribution P = (p1, p2, ..., pn), then the information conveyed by this distribution, also called the entropy of P, is:

    I(P) = - (p1 log p1 + p2 log p2 + ... + pn log pn)

All logarithms here are in base 2.

Page 11

Entropy
For example,
- If P is (0.5, 0.5), then I(P) is 1.
- If P is (0.67, 0.33), then I(P) is 0.92.
- If P is (1, 0), then I(P) is 0.
Entropy is a way to measure the amount of information. The smaller the entropy, the purer (and hence the more informative) the node.
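A minimal sketch of this formula in Python (the function name entropy is my own choice; the notes do not prescribe any code). It reproduces the three values above and the 5-bit uniform case from Entropy Example 1:

```python
import math

def entropy(probs):
    """I(P) = -sum(p * log2 p), with the usual convention 0 * log2(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))      # 1.0
print(entropy([0.67, 0.33]))    # about 0.915
print(entropy([1.0, 0.0]))      # 0.0
print(entropy([1 / 32] * 32))   # 5.0 bits, as in Entropy Example 1
```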

Page 12

Entropy

Training set (rows 1-8):

    Race   Income  Child  Insurance
  1 black  high    no     yes
  2 white  high    yes    yes
  3 white  low     yes    yes
  4 white  low     yes    yes
  5 black  low     no     no
  6 black  low     no     no
  7 black  low     no     no
  8 white  low     no     no

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Race,
Info(T_black) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Info(T_white) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Info(Race, T) = ½ x Info(T_black) + ½ x Info(T_white) = 0.8113
Gain(Race, T) = Info(T) - Info(Race, T) = 1 - 0.8113 = 0.1887

For attribute Race, Gain(Race, T) = 0.1887

Page 13

Entropy

(Training set as on Page 12.)

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Income,
Info(T_high) = - 1 log 1 - 0 log 0 = 0
Info(T_low) = - 1/3 log 1/3 - 2/3 log 2/3 = 0.9183
Info(Income, T) = ¼ x Info(T_high) + ¾ x Info(T_low) = 0.6887
Gain(Income, T) = Info(T) - Info(Income, T) = 1 - 0.6887 = 0.3113

For attribute Race, Gain(Race, T) = 0.1887
For attribute Income, Gain(Income, T) = 0.3113

Page 14

(Training set as on Page 12, rows 1-8.)

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Child,
Info(T_yes) = - 1 log 1 - 0 log 0 = 0
Info(T_no) = - 1/5 log 1/5 - 4/5 log 4/5 = 0.7219
Info(Child, T) = 3/8 x Info(T_yes) + 5/8 x Info(T_no) = 0.4512
Gain(Child, T) = Info(T) - Info(Child, T) = 1 - 0.4512 = 0.5488

For attribute Race, Gain(Race, T) = 0.1887
For attribute Income, Gain(Income, T) = 0.3113
For attribute Child, Gain(Child, T) = 0.5488

Child has the largest gain, so the root is split on Child:

  root
    child=yes: rows {2, 3, 4}        -> Insurance: 3 Yes, 0 No (100% Yes, 0% No)
    child=no : rows {1, 5, 6, 7, 8}  -> Insurance: 1 Yes, 4 No (20% Yes, 80% No)

Page 15

(Training set as on Page 12. We now split the child=no node, whose subset is T = {1, 5, 6, 7, 8} with Insurance: 1 Yes, 4 No.)

Info(T) = - 1/5 log 1/5 - 4/5 log 4/5 = 0.7219

For attribute Race,
Info(T_black) = - ¼ log ¼ - ¾ log ¾ = 0.8113
Info(T_white) = - 0 log 0 - 1 log 1 = 0
Info(Race, T) = 4/5 x Info(T_black) + 1/5 x Info(T_white) = 0.6490
Gain(Race, T) = Info(T) - Info(Race, T) = 0.7219 - 0.6490 = 0.0729

For attribute Race, Gain(Race, T) = 0.0729

  root
    child=yes: rows {2, 3, 4}        -> 100% Yes, 0% No
    child=no : rows {1, 5, 6, 7, 8}  -> 20% Yes, 80% No

Page 16

(Still splitting the child=no node, T = {1, 5, 6, 7, 8}.)

Info(T) = - 1/5 log 1/5 - 4/5 log 4/5 = 0.7219

For attribute Income,
Info(T_high) = - 1 log 1 - 0 log 0 = 0
Info(T_low) = - 0 log 0 - 1 log 1 = 0
Info(Income, T) = 1/5 x Info(T_high) + 4/5 x Info(T_low) = 0
Gain(Income, T) = Info(T) - Info(Income, T) = 0.7219 - 0 = 0.7219

For attribute Race, Gain(Race, T) = 0.0729
For attribute Income, Gain(Income, T) = 0.7219

  root
    child=yes: rows {2, 3, 4}        -> 100% Yes, 0% No
    child=no : rows {1, 5, 6, 7, 8}  -> 20% Yes, 80% No

Page 17

(Gain computation as on Page 16. Income has the larger gain, so the child=no node is split on Income.)

  root
    child=yes: rows {2, 3, 4} -> 100% Yes, 0% No
    child=no
      Income=high: row {1}           -> Insurance: 1 Yes, 0 No (100% Yes, 0% No)
      Income=low : rows {5, 6, 7, 8} -> Insurance: 0 Yes, 4 No (0% Yes, 100% No)

Page 18

(Training set as on Page 12.)

Decision tree:
  root
    child=yes: 100% Yes, 0% No
    child=no
      Income=high: 100% Yes, 0% No
      Income=low : 0% Yes, 100% No

Suppose there is a new person:

  Race   Income  Child  Insurance
  white  high    no     ?

(Following the tree: child=no, then Income=high, so the prediction is Yes.)

Page 19

(Training set and decision tree as on Page 18.)

Termination criteria?
- e.g., height of the tree
- e.g., accuracy of each node
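A minimal sketch of how the whole recursive construction might look with the two example termination criteria above wired in as parameters (max_depth for the height of the tree, min_purity for the accuracy of a node); all names and the default thresholds are my own choices, not from the notes:

```python
import math
from collections import Counter

def info(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr):
    n = len(labels)
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        sub = [labels[i] for i, r in enumerate(rows) if r[attr] == v]
        remainder += len(sub) / n * info(sub)
    return info(labels) - remainder

def build(rows, labels, attrs, depth=0, max_depth=3, min_purity=1.0):
    majority, count = Counter(labels).most_common(1)[0]
    purity = count / len(labels)          # accuracy of this node if we stop here
    if depth >= max_depth or purity >= min_purity or not attrs:
        return majority                   # leaf: predict the majority label
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    node = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        node[(best, v)] = build([rows[i] for i in idx], [labels[i] for i in idx],
                                [a for a in attrs if a != best],
                                depth + 1, max_depth, min_purity)
    return node

rows = [("black", "high", "no"), ("white", "high", "yes"), ("white", "low", "yes"),
        ("white", "low", "yes"), ("black", "low", "no"), ("black", "low", "no"),
        ("black", "low", "no"), ("white", "low", "no")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

# Splits on Child at the root and on Income below it, matching Pages 14-18
# (attribute indices: 0 = Race, 1 = Income, 2 = Child; key order may vary).
print(build(rows, labels, attrs=[0, 1, 2]))
```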

Page 20

Decision Trees
- ID3
- C4.5
- CART

Page 21

C4.5

ID3
- Impurity measurement: Gain(A, T) = Info(T) - Info(A, T)

C4.5
- Impurity measurement: Gain(A, T) = (Info(T) - Info(A, T)) / SplitInfo(A),
  where SplitInfo(A) = - Σ_{v ∈ A} p(v) log p(v)

Page 22

Entropy

(Training set as on Page 12.)

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Race,
Info(T_black) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Info(T_white) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Info(Race, T) = ½ x Info(T_black) + ½ x Info(T_white) = 0.8113
SplitInfo(Race) = - ½ log ½ - ½ log ½ = 1
Gain(Race, T) = (Info(T) - Info(Race, T)) / SplitInfo(Race) = (1 - 0.8113) / 1 = 0.1887

For attribute Race, Gain(Race, T) = 0.1887

Page 23

Entropy

(Training set as on Page 12.)

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Income,
Info(T_high) = - 1 log 1 - 0 log 0 = 0
Info(T_low) = - 1/3 log 1/3 - 2/3 log 2/3 = 0.9183
Info(Income, T) = ¼ x Info(T_high) + ¾ x Info(T_low) = 0.6887
SplitInfo(Income) = - 2/8 log 2/8 - 6/8 log 6/8 = 0.8113
Gain(Income, T) = (Info(T) - Info(Income, T)) / SplitInfo(Income) = (1 - 0.6887) / 0.8113 = 0.3837

For attribute Race, Gain(Race, T) = 0.1887
For attribute Income, Gain(Income, T) = 0.3837
For attribute Child, Gain(Child, T) = ?
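A minimal sketch of the C4.5 gain ratio on the same training set (gain_ratio is my own name); it reproduces the two values above and also evaluates the Child attribute left as a question:

```python
import math

def info(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def gain_ratio(rows, labels, attr):
    """C4.5 impurity measurement: (Info(T) - Info(A, T)) / SplitInfo(A)."""
    n = len(labels)
    info_a = 0.0
    split_info = 0.0
    for v in set(r[attr] for r in rows):
        sub = [labels[i] for i, r in enumerate(rows) if r[attr] == v]
        p = len(sub) / n
        info_a += p * info(sub)
        split_info -= p * math.log2(p)
    return (info(labels) - info_a) / split_info

rows = [("black", "high", "no"), ("white", "high", "yes"), ("white", "low", "yes"),
        ("white", "low", "yes"), ("black", "low", "no"), ("black", "low", "no"),
        ("black", "low", "no"), ("white", "low", "no")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

for attr, name in enumerate(["Race", "Income", "Child"]):
    print(name, round(gain_ratio(rows, labels, attr), 4))
# Race 0.1887, Income 0.3837, Child about 0.575
```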

Page 24

Decision Trees
- ID3
- C4.5
- CART

Page 25

CART
- Impurity measurement (Gini): I(P) = 1 - Σ_j p_j²

Page 26

Gini

(Training set as on Page 12.)

Info(T) = 1 - (½)² - (½)² = ½

For attribute Race,
Info(T_black) = 1 - (¾)² - (¼)² = 0.375
Info(T_white) = 1 - (¾)² - (¼)² = 0.375
Info(Race, T) = ½ x Info(T_black) + ½ x Info(T_white) = 0.375
Gain(Race, T) = Info(T) - Info(Race, T) = ½ - 0.375 = 0.125

For attribute Race, Gain(Race, T) = 0.125

Page 27

Gini

(Training set as on Page 12.)

Info(T) = 1 - (½)² - (½)² = ½

For attribute Income,
Info(T_high) = 1 - 1² - 0² = 0
Info(T_low) = 1 - (1/3)² - (2/3)² = 0.444
Info(Income, T) = ¼ x Info(T_high) + ¾ x Info(T_low) = 0.333
Gain(Income, T) = Info(T) - Info(Income, T) = ½ - 0.333 = 0.167

For attribute Race, Gain(Race, T) = 0.125
For attribute Income, Gain(Income, T) = 0.167
For attribute Child, Gain(Child, T) = ?
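A minimal sketch of the CART/Gini version of the same computation (gini and gini_gain are my own names); it reproduces the two values above and also evaluates the Child attribute left as a question:

```python
def gini(labels):
    """Gini impurity I(P) = 1 - sum_j p_j^2."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_gain(rows, labels, attr):
    """Reduction in Gini impurity obtained by splitting on attr."""
    n = len(labels)
    weighted = 0.0
    for v in set(r[attr] for r in rows):
        sub = [labels[i] for i, r in enumerate(rows) if r[attr] == v]
        weighted += len(sub) / n * gini(sub)
    return gini(labels) - weighted

rows = [("black", "high", "no"), ("white", "high", "yes"), ("white", "low", "yes"),
        ("white", "low", "yes"), ("black", "low", "no"), ("black", "low", "no"),
        ("black", "low", "no"), ("white", "low", "no")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

for attr, name in enumerate(["Race", "Income", "Child"]):
    print(name, round(gini_gain(rows, labels, attr), 3))
# Race 0.125, Income 0.167, Child 0.3
```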

Page 28

Classification Methods
- Decision Tree
- Bayesian Classifier
- Nearest Neighbor Classifier

Page 29

Bayesian Classifier
- Naïve Bayes Classifier
- Bayesian Belief Networks

Page 30

Naïve Bayes Classifier
- Statistical classifiers
- Probabilities
- Conditional probabilities

Page 31

Naïve Bayes Classifier

Conditional Probability
- A: a random variable
- B: a random variable

    P(A | B) = P(A, B) / P(B)

Page 32

Naïve Bayes Classifier

Bayes Rule
- A: a random variable
- B: a random variable

    P(A | B) = P(B | A) P(A) / P(B)
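A tiny numeric check of the two formulas above on an arbitrary, made-up joint distribution (the distribution and the variable names are purely illustrative, not from the notes):

```python
# Made-up joint distribution P(A, B) over two binary variables, for illustration only.
joint = {("a1", "b1"): 0.2, ("a1", "b2"): 0.3, ("a2", "b1"): 0.1, ("a2", "b2"): 0.4}

def p_A(a):
    return sum(p for (aa, _), p in joint.items() if aa == a)

def p_B(b):
    return sum(p for (_, bb), p in joint.items() if bb == b)

# Conditional probability (Page 31): P(A | B) = P(A, B) / P(B)
direct = joint[("a1", "b1")] / p_B("b1")

# Bayes rule (Page 32): P(A | B) = P(B | A) P(A) / P(B)
p_b1_given_a1 = joint[("a1", "b1")] / p_A("a1")
via_bayes = p_b1_given_a1 * p_A("a1") / p_B("b1")

print(round(direct, 4), round(via_bayes, 4))   # both 0.6667
```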

Page 33

Naïve Bayes Classifier

Independence assumption: the attributes are assumed to be independent given the class, e.g.,

    P(X, Y, Z | A) = P(X | A) x P(Y | A) x P(Z | A)

Training set:

  Race   Income  Child  Insurance
  black  high    no     yes
  white  high    yes    yes
  white  low     yes    yes
  white  low     yes    yes
  black  low     no     no
  black  low     no     no
  black  low     no     no
  white  low     no     no

Page 34

Naïve Bayes Classifier

(Training set as on Page 33.)

For attribute Race,
  P(Race = black | Yes) = ¼      P(Race = black | No) = ¾
  P(Race = white | Yes) = ¾      P(Race = white | No) = ¼

For attribute Income,
  P(Income = high | Yes) = ½     P(Income = high | No) = 0
  P(Income = low | Yes) = ½      P(Income = low | No) = 1

For attribute Child,
  P(Child = yes | Yes) = ¾       P(Child = yes | No) = 0
  P(Child = no | Yes) = ¼        P(Child = no | No) = 1

P(Yes) = ½      P(No) = ½

Suppose there is a new person:

  Race   Income  Child  Insurance
  white  high    no     ?

P(Race = white, Income = high, Child = no | Yes)
  = P(Race = white | Yes) x P(Income = high | Yes) x P(Child = no | Yes)
  = ¾ x ½ x ¼ = 0.09375

P(Race = white, Income = high, Child = no | No)
  = P(Race = white | No) x P(Income = high | No) x P(Child = no | No)
  = ¼ x 0 x 1 = 0

Pages 35 to 38 repeat the Page 34 computation as a slide build-up, ending with
P(Race = white, Income = high, Child = no | Yes) = 0.09375 and
P(Race = white, Income = high, Child = no | No) = 0.

Page 39

Naïve Bayes Classifier

(Training set and conditional probabilities as on Pages 33-34.)

P(Race = white, Income = high, Child = no | Yes) = 0.09375
P(Race = white, Income = high, Child = no | No) = 0

P(Yes | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | Yes) x P(Yes) / P(Race = white, Income = high, Child = no)
  = 0.09375 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0.046875 / P(Race = white, Income = high, Child = no)

Page 40

Naïve Bayes Classifier

(Training set and conditional probabilities as on Pages 33-34.)

P(No | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | No) x P(No) / P(Race = white, Income = high, Child = no)
  = 0 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0 / P(Race = white, Income = high, Child = no)

P(Yes | Race = white, Income = high, Child = no) = 0.046875 / P(Race = white, Income = high, Child = no)
P(No | Race = white, Income = high, Child = no) = 0 / P(Race = white, Income = high, Child = no)

Page 41

Naïve Bayes Classifier

(Training set and conditional probabilities as on Pages 33-34.)

P(Yes | Race = white, Income = high, Child = no) = 0.046875 / P(Race = white, Income = high, Child = no)
P(No | Race = white, Income = high, Child = no) = 0 / P(Race = white, Income = high, Child = no)

Since P(Yes | Race = white, Income = high, Child = no) > P(No | Race = white, Income = high, Child = no), we predict that the following new person will buy insurance:

  Race   Income  Child  Insurance
  white  high    no     ?
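A minimal sketch of the whole Naïve Bayes procedure on this training set (train_nb and predict_nb are my own names, and no smoothing is applied, which is why the No class scores exactly 0 as on Pages 34-41):

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate the class priors and the per-attribute conditional probabilities."""
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    cond = defaultdict(dict)              # cond[(attr, value)][class] = P(value | class)
    for attr in range(len(rows[0])):
        for c in priors:
            in_class = [rows[i][attr] for i, l in enumerate(labels) if l == c]
            counts = Counter(in_class)
            for v in set(r[attr] for r in rows):
                cond[(attr, v)][c] = counts[v] / len(in_class)
    return priors, cond

def predict_nb(priors, cond, x):
    """Pick the class maximising P(class) * product of P(x_i | class)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, v in enumerate(x):
            score *= cond[(attr, v)][c]
        scores[c] = score
    return max(scores, key=scores.get), scores

rows = [("black", "high", "no"), ("white", "high", "yes"), ("white", "low", "yes"),
        ("white", "low", "yes"), ("black", "low", "no"), ("black", "low", "no"),
        ("black", "low", "no"), ("white", "low", "no")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

priors, cond = train_nb(rows, labels)
print(predict_nb(priors, cond, ("white", "high", "no")))
# ('yes', {'yes': 0.046875, 'no': 0.0}) -- the same numerators as on Pages 39-41
```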

Page 42

Bayesian Classifier
- Naïve Bayes Classifier
- Bayesian Belief Networks

Page 43

Bayesian Belief Network
- Naïve Bayes Classifier: makes the independence assumption.
- Bayesian Belief Network: does not make the independence assumption.

Page 44

Bayesian Belief Network

  Exercise  Diet       Heartburn  Blood Pressure  Chest Pain  Heart Disease
  Yes       Healthy    No         High            Yes         No
  No        Unhealthy  Yes        Low             Yes         No
  No        Healthy    Yes        High            No          Yes
  ...       ...        ...        ...             ...         ...

(Domains: Yes/No, Healthy/Unhealthy, Yes/No, High/Low, Yes/No, Yes/No.)

Some attributes are dependent on other attributes, e.g., doing exercise may reduce the probability of suffering from Heart Disease.

    Exercise (E)  ->  Heart Disease

Page 45

Bayesian Belief Network

Nodes: Exercise (E), Diet (D), Heart Disease (HD), Heartburn (Hb), Blood Pressure (BP), Chest Pain (CP).
E and D are the parents of HD; D is the parent of Hb; HD is the parent of BP; HD and Hb are the parents of CP.

P(E = Yes) = 0.7
P(D = Healthy) = 0.25

P(HD = Yes | E, D):
  E=Yes, D=Healthy     0.25
  E=Yes, D=Unhealthy   0.45
  E=No,  D=Healthy     0.55
  E=No,  D=Unhealthy   0.75

P(Hb = Yes | D):
  D=Healthy    0.85
  D=Unhealthy  0.2

P(BP = High | HD):
  HD=Yes  0.85
  HD=No   0.2

P(CP = Yes | HD, Hb):
  HD=Yes, Hb=Yes  0.8
  HD=Yes, Hb=No   0.6
  HD=No,  Hb=Yes  0.4
  HD=No,  Hb=No   0.1

Page 46

Bayesian Belief Network

Let X, Y, Z be three random variables. X is said to be conditionally independent of Y given Z if the following holds:

    P(X | Y, Z) = P(X | Z)

Lemma: If X is conditionally independent of Y given Z, then P(X, Y | Z) = P(X | Z) x P(Y | Z).
(Why? P(X, Y | Z) = P(X | Y, Z) x P(Y | Z) = P(X | Z) x P(Y | Z).)

Page 47

Bayesian Belief Network

(Network as on Page 45.)

Let X, Y, Z be three random variables. X is said to be conditionally independent of Y given Z if P(X | Y, Z) = P(X | Z).

e.g., P(BP = High | HD = Yes, D = Healthy) = P(BP = High | HD = Yes)
  "BP = High" is conditionally independent of "D = Healthy" given "HD = Yes".

e.g., P(BP = High | HD = Yes, CP = Yes) = P(BP = High | HD = Yes)
  "BP = High" is conditionally independent of "CP = Yes" given "HD = Yes".

Property: A node is conditionally independent of its non-descendants if its parents are known.

Page 48

Bayesian Belief Network

(Data as on Page 44.)

Suppose there is a new person and I want to know whether he is likely to have Heart Disease. We consider three cases, with progressively more attributes observed:

  Exercise  Diet     Heartburn  Blood Pressure  Chest Pain  Heart Disease
  ?         ?        ?          ?               ?           ?
  ?         ?        ?          High            ?           ?
  Yes       Healthy  ?          High            ?           ?

Page 49

Bayesian Belief Network

Suppose there is a new person and I want to know whether he is likely to have Heart Disease (nothing is observed).

P(HD = Yes)
  = Σ_{x ∈ {Yes, No}} Σ_{y ∈ {Healthy, Unhealthy}} P(HD = Yes | E = x, D = y) x P(E = x, D = y)
  = Σ_{x ∈ {Yes, No}} Σ_{y ∈ {Healthy, Unhealthy}} P(HD = Yes | E = x, D = y) x P(E = x) x P(D = y)
  = 0.25 x 0.7 x 0.25 + 0.45 x 0.7 x 0.75 + 0.55 x 0.3 x 0.25 + 0.75 x 0.3 x 0.75
  = 0.49

P(HD = No) = 1 - P(HD = Yes) = 1 - 0.49 = 0.51

Page 50

Bayesian Belief Network

Suppose there is a new person and I want to know whether he is likely to have Heart Disease, given that his Blood Pressure is High.

P(BP = High) = Σ_{x ∈ {Yes, No}} P(BP = High | HD = x) x P(HD = x)
  = 0.85 x 0.49 + 0.2 x 0.51 = 0.5185

P(HD = Yes | BP = High) = P(BP = High | HD = Yes) x P(HD = Yes) / P(BP = High)
  = 0.85 x 0.49 / 0.5185
  = 0.8033

P(HD = No | BP = High) = 1 - P(HD = Yes | BP = High) = 1 - 0.8033 = 0.1967

Page 51

Bayesian Belief Network

Suppose there is a new person and I want to know whether he is likely to have Heart Disease, given Exercise = Yes, Diet = Healthy and Blood Pressure = High.

P(HD = Yes | BP = High, D = Healthy, E = Yes)
  = P(BP = High | HD = Yes, D = Healthy, E = Yes) x P(HD = Yes | D = Healthy, E = Yes) / P(BP = High | D = Healthy, E = Yes)
  = P(BP = High | HD = Yes) x P(HD = Yes | D = Healthy, E = Yes)
      / Σ_{x ∈ {Yes, No}} P(BP = High | HD = x) x P(HD = x | D = Healthy, E = Yes)
  = 0.85 x 0.25 / (0.85 x 0.25 + 0.2 x 0.75)
  = 0.5862

P(HD = No | BP = High, D = Healthy, E = Yes) = 1 - P(HD = Yes | BP = High, D = Healthy, E = Yes) = 1 - 0.5862 = 0.4138
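A minimal sketch that replays the three inference queries of Pages 49-51 directly from the conditional probability tables on Page 45 (variable names are my own):

```python
# Conditional probability tables copied from Page 45.
P_E_YES = 0.7
P_D_HEALTHY = 0.25
P_HD_YES = {("Yes", "Healthy"): 0.25, ("Yes", "Unhealthy"): 0.45,
            ("No", "Healthy"): 0.55, ("No", "Unhealthy"): 0.75}
P_BP_HIGH = {"Yes": 0.85, "No": 0.2}      # conditioned on Heart Disease (HD)

def p_e(x):
    return P_E_YES if x == "Yes" else 1 - P_E_YES

def p_d(y):
    return P_D_HEALTHY if y == "Healthy" else 1 - P_D_HEALTHY

# Page 49: P(HD = Yes), marginalising over Exercise and Diet.
p_hd_yes = sum(P_HD_YES[(e, d)] * p_e(e) * p_d(d)
               for e in ("Yes", "No") for d in ("Healthy", "Unhealthy"))
print(round(p_hd_yes, 2))                                     # 0.49

# Page 50: P(HD = Yes | BP = High) via Bayes rule.
p_bp_high = P_BP_HIGH["Yes"] * p_hd_yes + P_BP_HIGH["No"] * (1 - p_hd_yes)
print(round(P_BP_HIGH["Yes"] * p_hd_yes / p_bp_high, 4))      # 0.8033

# Page 51: P(HD = Yes | BP = High, D = Healthy, E = Yes).
prior = P_HD_YES[("Yes", "Healthy")]
numerator = P_BP_HIGH["Yes"] * prior
denominator = numerator + P_BP_HIGH["No"] * (1 - prior)
print(round(numerator / denominator, 4))                      # 0.5862
```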

Page 52

Classification Methods
- Decision Tree
- Bayesian Classifier
- Nearest Neighbor Classifier

Page 53

Nearest Neighbor Classifier

  Computer  History
  100       40
  90        45
  20        95
  ...       ...

(Scatter plot of the points, with Computer and History as the two axes.)

Page 54

Nearest Neighbor Classifier

  Computer  History  Buy Book?
  100       40       No (-)
  90        45       Yes (+)
  20        95       Yes (+)
  ...       ...      ...

(Scatter plot of the points, labelled + and -, with Computer and History as the two axes.)

Page 55

Nearest Neighbor Classifier

(Data and scatter plot as on Page 54.)

Suppose there is a new person:

  Computer  History  Buy Book?
  95        35       ?

Nearest Neighbor Classifier:
  Step 1: Find the nearest neighbor.
  Step 2: Use the "label" of this neighbor.

Page 56

Nearest Neighbor Classifier

(Data and scatter plot as on Page 54.)

Suppose there is a new person:

  Computer  History  Buy Book?
  95        35       ?

k-Nearest Neighbor Classifier:
  Step 1: Find the k nearest neighbors.
  Step 2: Use the majority of the labels of these neighbors.
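A minimal sketch of the (k-)nearest neighbor classifier applied to the three example rows (knn_predict and the choice of Euclidean distance are my own assumptions; the slides do not fix a distance measure):

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k=1):
    """Classify query by the majority label among its k nearest neighbors (Euclidean distance)."""
    nearest = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# The three example rows from Page 54: (Computer score, History score) -> Buy Book?
points = [(100, 40), (90, 45), (20, 95)]
labels = ["No", "Yes", "Yes"]

print(knn_predict(points, labels, (95, 35), k=1))   # 'No'  -- (100, 40) is the single nearest point
print(knn_predict(points, labels, (95, 35), k=3))   # 'Yes' -- the majority among all three points
```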