Page 1

COMP5331

Classification

Prepared by Raymond Wong
The examples used in Decision Tree are borrowed from LW Chan's notes
Presented by Raymond Wong

raywong@cse

Page 2

Classification

Decision tree:
  root
    child=yes: 100% Yes, 0% No
    child=no
      Income=high: 100% Yes, 0% No
      Income=low : 0% Yes, 100% No

Suppose there is a person:

  Race   Income  Child  Insurance
  white  high    no     ?

Page 3

Classification

(Decision tree and query person as on Page 2.)

Training set:

  Race   Income  Child  Insurance
  black  high    no     yes
  white  high    yes    yes
  white  low     yes    yes
  white  low     yes    yes
  black  low     no     no
  black  low     no     no
  black  low     no     no
  white  low     no     no

Test set:

  Race   Income  Child  Insurance
  white  high    no     ?

Page 4

Applications
- Insurance: According to the attributes of customers, determine which customers will buy an insurance policy.
- Marketing: According to the attributes of customers, determine which customers will buy a product such as a computer.
- Bank Loan: According to the attributes of customers, determine which customers are "risky" customers or "safe" customers.

Page 5

Applications
- Network: According to the traffic patterns, determine whether the patterns are related to some "security attacks".
- Software: According to the experience of programmers, determine which programmers can fix certain bugs.

Page 6

Same/Difference
- Classification
- Clustering

Page 7

Classification Methods
- Decision Tree
- Bayesian Classifier
- Nearest Neighbor Classifier

Page 8

Decision Trees
- ID3 (Iterative Dichotomiser)
- C4.5
- CART (Classification And Regression Trees)

Page 9

Entropy Example 1
- Consider a random variable which has a uniform distribution over 32 outcomes.
- To identify an outcome, we need a label that can take 32 different values.
- Thus, 5-bit strings suffice as labels (2^5 = 32).

Page 10

Entropy
Entropy is used to measure how informative a node is. If we are given a probability distribution P = (p1, p2, ..., pn), then the information conveyed by this distribution, also called the entropy of P, is:

    I(P) = - (p1 log p1 + p2 log p2 + ... + pn log pn)

All logarithms here are in base 2.

Page 11

Entropy
For example,
- If P is (0.5, 0.5), then I(P) is 1.
- If P is (0.67, 0.33), then I(P) is 0.92.
- If P is (1, 0), then I(P) is 0.
Entropy is a way to measure the amount of information. The smaller the entropy, the purer (and hence the more informative) the node.
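A minimal sketch of this formula in Python (the function name entropy is my own choice; the notes do not prescribe any code). It reproduces the three values above and the 5-bit uniform case from Entropy Example 1:

```python
import math

def entropy(probs):
    """I(P) = -sum(p * log2 p), with the usual convention 0 * log2(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))      # 1.0
print(entropy([0.67, 0.33]))    # about 0.915
print(entropy([1.0, 0.0]))      # 0.0
print(entropy([1 / 32] * 32))   # 5.0 bits, as in Entropy Example 1
```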

Page 12

Entropy

Training set (rows 1-8):

    Race   Income  Child  Insurance
  1 black  high    no     yes
  2 white  high    yes    yes
  3 white  low     yes    yes
  4 white  low     yes    yes
  5 black  low     no     no
  6 black  low     no     no
  7 black  low     no     no
  8 white  low     no     no

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Race,
Info(T_black) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Info(T_white) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Info(Race, T) = ½ x Info(T_black) + ½ x Info(T_white) = 0.8113
Gain(Race, T) = Info(T) - Info(Race, T) = 1 - 0.8113 = 0.1887

For attribute Race, Gain(Race, T) = 0.1887

Page 13

Entropy

(Training set as on Page 12.)

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Income,
Info(T_high) = - 1 log 1 - 0 log 0 = 0
Info(T_low) = - 1/3 log 1/3 - 2/3 log 2/3 = 0.9183
Info(Income, T) = ¼ x Info(T_high) + ¾ x Info(T_low) = 0.6887
Gain(Income, T) = Info(T) - Info(Income, T) = 1 - 0.6887 = 0.3113

For attribute Race, Gain(Race, T) = 0.1887
For attribute Income, Gain(Income, T) = 0.3113

Page 14

(Training set as on Page 12, rows 1-8.)

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Child,
Info(T_yes) = - 1 log 1 - 0 log 0 = 0
Info(T_no) = - 1/5 log 1/5 - 4/5 log 4/5 = 0.7219
Info(Child, T) = 3/8 x Info(T_yes) + 5/8 x Info(T_no) = 0.4512
Gain(Child, T) = Info(T) - Info(Child, T) = 1 - 0.4512 = 0.5488

For attribute Race, Gain(Race, T) = 0.1887
For attribute Income, Gain(Income, T) = 0.3113
For attribute Child, Gain(Child, T) = 0.5488

Child has the largest gain, so the root is split on Child:

  root
    child=yes: rows {2, 3, 4}        -> Insurance: 3 Yes, 0 No (100% Yes, 0% No)
    child=no : rows {1, 5, 6, 7, 8}  -> Insurance: 1 Yes, 4 No (20% Yes, 80% No)

Page 15

(Training set as on Page 12. We now split the child=no node, whose subset is T = {1, 5, 6, 7, 8} with Insurance: 1 Yes, 4 No.)

Info(T) = - 1/5 log 1/5 - 4/5 log 4/5 = 0.7219

For attribute Race,
Info(T_black) = - ¼ log ¼ - ¾ log ¾ = 0.8113
Info(T_white) = - 0 log 0 - 1 log 1 = 0
Info(Race, T) = 4/5 x Info(T_black) + 1/5 x Info(T_white) = 0.6490
Gain(Race, T) = Info(T) - Info(Race, T) = 0.7219 - 0.6490 = 0.0729

For attribute Race, Gain(Race, T) = 0.0729

  root
    child=yes: rows {2, 3, 4}        -> 100% Yes, 0% No
    child=no : rows {1, 5, 6, 7, 8}  -> 20% Yes, 80% No

Page 16

(Still splitting the child=no node, T = {1, 5, 6, 7, 8}.)

Info(T) = - 1/5 log 1/5 - 4/5 log 4/5 = 0.7219

For attribute Income,
Info(T_high) = - 1 log 1 - 0 log 0 = 0
Info(T_low) = - 0 log 0 - 1 log 1 = 0
Info(Income, T) = 1/5 x Info(T_high) + 4/5 x Info(T_low) = 0
Gain(Income, T) = Info(T) - Info(Income, T) = 0.7219 - 0 = 0.7219

For attribute Race, Gain(Race, T) = 0.0729
For attribute Income, Gain(Income, T) = 0.7219

  root
    child=yes: rows {2, 3, 4}        -> 100% Yes, 0% No
    child=no : rows {1, 5, 6, 7, 8}  -> 20% Yes, 80% No

Page 17

(Gain computation as on Page 16. Income has the larger gain, so the child=no node is split on Income.)

  root
    child=yes: rows {2, 3, 4} -> 100% Yes, 0% No
    child=no
      Income=high: row {1}           -> Insurance: 1 Yes, 0 No (100% Yes, 0% No)
      Income=low : rows {5, 6, 7, 8} -> Insurance: 0 Yes, 4 No (0% Yes, 100% No)

Page 18

(Training set as on Page 12.)

Decision tree:
  root
    child=yes: 100% Yes, 0% No
    child=no
      Income=high: 100% Yes, 0% No
      Income=low : 0% Yes, 100% No

Suppose there is a new person:

  Race   Income  Child  Insurance
  white  high    no     ?

(Following the tree: child=no, then Income=high, so the prediction is Yes.)

Page 19

(Training set and decision tree as on Page 18.)

Termination criteria?
- e.g., height of the tree
- e.g., accuracy of each node
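A minimal sketch of how the whole recursive construction might look with the two example termination criteria above wired in as parameters (max_depth for the height of the tree, min_purity for the accuracy of a node); all names and the default thresholds are my own choices, not from the notes:

```python
import math
from collections import Counter

def info(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr):
    n = len(labels)
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        sub = [labels[i] for i, r in enumerate(rows) if r[attr] == v]
        remainder += len(sub) / n * info(sub)
    return info(labels) - remainder

def build(rows, labels, attrs, depth=0, max_depth=3, min_purity=1.0):
    majority, count = Counter(labels).most_common(1)[0]
    purity = count / len(labels)          # accuracy of this node if we stop here
    if depth >= max_depth or purity >= min_purity or not attrs:
        return majority                   # leaf: predict the majority label
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    node = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        node[(best, v)] = build([rows[i] for i in idx], [labels[i] for i in idx],
                                [a for a in attrs if a != best],
                                depth + 1, max_depth, min_purity)
    return node

rows = [("black", "high", "no"), ("white", "high", "yes"), ("white", "low", "yes"),
        ("white", "low", "yes"), ("black", "low", "no"), ("black", "low", "no"),
        ("black", "low", "no"), ("white", "low", "no")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

# Splits on Child at the root and on Income below it, matching Pages 14-18
# (attribute indices: 0 = Race, 1 = Income, 2 = Child; key order may vary).
print(build(rows, labels, attrs=[0, 1, 2]))
```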

Page 20

Decision Trees
- ID3
- C4.5
- CART

Page 21

C4.5

ID3
- Impurity measurement: Gain(A, T) = Info(T) - Info(A, T)

C4.5
- Impurity measurement: Gain(A, T) = (Info(T) - Info(A, T)) / SplitInfo(A),
  where SplitInfo(A) = - Σ_{v ∈ A} p(v) log p(v)

Page 22

Entropy

(Training set as on Page 12.)

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Race,
Info(T_black) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Info(T_white) = - ¾ log ¾ - ¼ log ¼ = 0.8113
Info(Race, T) = ½ x Info(T_black) + ½ x Info(T_white) = 0.8113
SplitInfo(Race) = - ½ log ½ - ½ log ½ = 1
Gain(Race, T) = (Info(T) - Info(Race, T)) / SplitInfo(Race) = (1 - 0.8113) / 1 = 0.1887

For attribute Race, Gain(Race, T) = 0.1887

Page 23

Entropy

(Training set as on Page 12.)

Info(T) = - ½ log ½ - ½ log ½ = 1

For attribute Income,
Info(T_high) = - 1 log 1 - 0 log 0 = 0
Info(T_low) = - 1/3 log 1/3 - 2/3 log 2/3 = 0.9183
Info(Income, T) = ¼ x Info(T_high) + ¾ x Info(T_low) = 0.6887
SplitInfo(Income) = - 2/8 log 2/8 - 6/8 log 6/8 = 0.8113
Gain(Income, T) = (Info(T) - Info(Income, T)) / SplitInfo(Income) = (1 - 0.6887) / 0.8113 = 0.3837

For attribute Race, Gain(Race, T) = 0.1887
For attribute Income, Gain(Income, T) = 0.3837
For attribute Child, Gain(Child, T) = ?
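A minimal sketch of the C4.5 gain ratio on the same training set (gain_ratio is my own name); it reproduces the two values above and also evaluates the Child attribute left as a question:

```python
import math

def info(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def gain_ratio(rows, labels, attr):
    """C4.5 impurity measurement: (Info(T) - Info(A, T)) / SplitInfo(A)."""
    n = len(labels)
    info_a = 0.0
    split_info = 0.0
    for v in set(r[attr] for r in rows):
        sub = [labels[i] for i, r in enumerate(rows) if r[attr] == v]
        p = len(sub) / n
        info_a += p * info(sub)
        split_info -= p * math.log2(p)
    return (info(labels) - info_a) / split_info

rows = [("black", "high", "no"), ("white", "high", "yes"), ("white", "low", "yes"),
        ("white", "low", "yes"), ("black", "low", "no"), ("black", "low", "no"),
        ("black", "low", "no"), ("white", "low", "no")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

for attr, name in enumerate(["Race", "Income", "Child"]):
    print(name, round(gain_ratio(rows, labels, attr), 4))
# Race 0.1887, Income 0.3837, Child about 0.575
```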

Page 24

Decision Trees
- ID3
- C4.5
- CART

Page 25

CART
- Impurity measurement (Gini): I(P) = 1 - Σ_j p_j²

Page 26

Gini

(Training set as on Page 12.)

Info(T) = 1 - (½)² - (½)² = ½

For attribute Race,
Info(T_black) = 1 - (¾)² - (¼)² = 0.375
Info(T_white) = 1 - (¾)² - (¼)² = 0.375
Info(Race, T) = ½ x Info(T_black) + ½ x Info(T_white) = 0.375
Gain(Race, T) = Info(T) - Info(Race, T) = ½ - 0.375 = 0.125

For attribute Race, Gain(Race, T) = 0.125

Page 27

Gini

(Training set as on Page 12.)

Info(T) = 1 - (½)² - (½)² = ½

For attribute Income,
Info(T_high) = 1 - 1² - 0² = 0
Info(T_low) = 1 - (1/3)² - (2/3)² = 0.444
Info(Income, T) = ¼ x Info(T_high) + ¾ x Info(T_low) = 0.333
Gain(Income, T) = Info(T) - Info(Income, T) = ½ - 0.333 = 0.167

For attribute Race, Gain(Race, T) = 0.125
For attribute Income, Gain(Income, T) = 0.167
For attribute Child, Gain(Child, T) = ?
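A minimal sketch of the CART/Gini version of the same computation (gini and gini_gain are my own names); it reproduces the two values above and also evaluates the Child attribute left as a question:

```python
def gini(labels):
    """Gini impurity I(P) = 1 - sum_j p_j^2."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_gain(rows, labels, attr):
    """Reduction in Gini impurity obtained by splitting on attr."""
    n = len(labels)
    weighted = 0.0
    for v in set(r[attr] for r in rows):
        sub = [labels[i] for i, r in enumerate(rows) if r[attr] == v]
        weighted += len(sub) / n * gini(sub)
    return gini(labels) - weighted

rows = [("black", "high", "no"), ("white", "high", "yes"), ("white", "low", "yes"),
        ("white", "low", "yes"), ("black", "low", "no"), ("black", "low", "no"),
        ("black", "low", "no"), ("white", "low", "no")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

for attr, name in enumerate(["Race", "Income", "Child"]):
    print(name, round(gini_gain(rows, labels, attr), 3))
# Race 0.125, Income 0.167, Child 0.3
```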

Page 28

Classification Methods
- Decision Tree
- Bayesian Classifier
- Nearest Neighbor Classifier

Page 29

Bayesian Classifier
- Naïve Bayes Classifier
- Bayesian Belief Networks

Page 30

Naïve Bayes Classifier
- Statistical classifiers
- Probabilities
- Conditional probabilities

Page 31

Naïve Bayes Classifier

Conditional Probability
- A: a random variable
- B: a random variable

    P(A | B) = P(A, B) / P(B)

Page 32

Naïve Bayes Classifier

Bayes Rule
- A: a random variable
- B: a random variable

    P(A | B) = P(B | A) P(A) / P(B)
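A tiny numeric check of the two formulas above on an arbitrary, made-up joint distribution (the distribution and the variable names are purely illustrative, not from the notes):

```python
# Made-up joint distribution P(A, B) over two binary variables, for illustration only.
joint = {("a1", "b1"): 0.2, ("a1", "b2"): 0.3, ("a2", "b1"): 0.1, ("a2", "b2"): 0.4}

def p_A(a):
    return sum(p for (aa, _), p in joint.items() if aa == a)

def p_B(b):
    return sum(p for (_, bb), p in joint.items() if bb == b)

# Conditional probability (Page 31): P(A | B) = P(A, B) / P(B)
direct = joint[("a1", "b1")] / p_B("b1")

# Bayes rule (Page 32): P(A | B) = P(B | A) P(A) / P(B)
p_b1_given_a1 = joint[("a1", "b1")] / p_A("a1")
via_bayes = p_b1_given_a1 * p_A("a1") / p_B("b1")

print(round(direct, 4), round(via_bayes, 4))   # both 0.6667
```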

Page 33

Naïve Bayes Classifier

Independence assumption: the attributes are assumed to be independent given the class, e.g.,

    P(X, Y, Z | A) = P(X | A) x P(Y | A) x P(Z | A)

Training set:

  Race   Income  Child  Insurance
  black  high    no     yes
  white  high    yes    yes
  white  low     yes    yes
  white  low     yes    yes
  black  low     no     no
  black  low     no     no
  black  low     no     no
  white  low     no     no

Page 34

Naïve Bayes Classifier

(Training set as on Page 33.)

For attribute Race,
  P(Race = black | Yes) = ¼      P(Race = black | No) = ¾
  P(Race = white | Yes) = ¾      P(Race = white | No) = ¼

For attribute Income,
  P(Income = high | Yes) = ½     P(Income = high | No) = 0
  P(Income = low | Yes) = ½      P(Income = low | No) = 1

For attribute Child,
  P(Child = yes | Yes) = ¾       P(Child = yes | No) = 0
  P(Child = no | Yes) = ¼        P(Child = no | No) = 1

P(Yes) = ½      P(No) = ½

Suppose there is a new person:

  Race   Income  Child  Insurance
  white  high    no     ?

P(Race = white, Income = high, Child = no | Yes)
  = P(Race = white | Yes) x P(Income = high | Yes) x P(Child = no | Yes)
  = ¾ x ½ x ¼ = 0.09375

P(Race = white, Income = high, Child = no | No)
  = P(Race = white | No) x P(Income = high | No) x P(Child = no | No)
  = ¼ x 0 x 1 = 0

Pages 35 to 38 repeat the Page 34 computation as a slide build-up, ending with
P(Race = white, Income = high, Child = no | Yes) = 0.09375 and
P(Race = white, Income = high, Child = no | No) = 0.

Page 39

Naïve Bayes Classifier

(Training set and conditional probabilities as on Pages 33-34.)

P(Race = white, Income = high, Child = no | Yes) = 0.09375
P(Race = white, Income = high, Child = no | No) = 0

P(Yes | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | Yes) x P(Yes) / P(Race = white, Income = high, Child = no)
  = 0.09375 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0.046875 / P(Race = white, Income = high, Child = no)

Page 40

Naïve Bayes Classifier

(Training set and conditional probabilities as on Pages 33-34.)

P(No | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | No) x P(No) / P(Race = white, Income = high, Child = no)
  = 0 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0 / P(Race = white, Income = high, Child = no)

P(Yes | Race = white, Income = high, Child = no) = 0.046875 / P(Race = white, Income = high, Child = no)
P(No | Race = white, Income = high, Child = no) = 0 / P(Race = white, Income = high, Child = no)

Page 41

Naïve Bayes Classifier

(Training set and conditional probabilities as on Pages 33-34.)

P(Yes | Race = white, Income = high, Child = no) = 0.046875 / P(Race = white, Income = high, Child = no)
P(No | Race = white, Income = high, Child = no) = 0 / P(Race = white, Income = high, Child = no)

Since P(Yes | Race = white, Income = high, Child = no) > P(No | Race = white, Income = high, Child = no), we predict that the following new person will buy insurance:

  Race   Income  Child  Insurance
  white  high    no     ?
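A minimal sketch of the whole Naïve Bayes procedure on this training set (train_nb and predict_nb are my own names, and no smoothing is applied, which is why the No class scores exactly 0 as on Pages 34-41):

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate the class priors and the per-attribute conditional probabilities."""
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    cond = defaultdict(dict)              # cond[(attr, value)][class] = P(value | class)
    for attr in range(len(rows[0])):
        for c in priors:
            in_class = [rows[i][attr] for i, l in enumerate(labels) if l == c]
            counts = Counter(in_class)
            for v in set(r[attr] for r in rows):
                cond[(attr, v)][c] = counts[v] / len(in_class)
    return priors, cond

def predict_nb(priors, cond, x):
    """Pick the class maximising P(class) * product of P(x_i | class)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, v in enumerate(x):
            score *= cond[(attr, v)][c]
        scores[c] = score
    return max(scores, key=scores.get), scores

rows = [("black", "high", "no"), ("white", "high", "yes"), ("white", "low", "yes"),
        ("white", "low", "yes"), ("black", "low", "no"), ("black", "low", "no"),
        ("black", "low", "no"), ("white", "low", "no")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

priors, cond = train_nb(rows, labels)
print(predict_nb(priors, cond, ("white", "high", "no")))
# ('yes', {'yes': 0.046875, 'no': 0.0}) -- the same numerators as on Pages 39-41
```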

Page 42

Bayesian Classifier
- Naïve Bayes Classifier
- Bayesian Belief Networks

Page 43

Bayesian Belief Network
- Naïve Bayes Classifier: makes the independence assumption.
- Bayesian Belief Network: does not make the independence assumption.

Page 44

Bayesian Belief Network

  Exercise  Diet       Heartburn  Blood Pressure  Chest Pain  Heart Disease
  Yes       Healthy    No         High            Yes         No
  No        Unhealthy  Yes        Low             Yes         No
  No        Healthy    Yes        High            No          Yes
  ...       ...        ...        ...             ...         ...

(Domains: Yes/No, Healthy/Unhealthy, Yes/No, High/Low, Yes/No, Yes/No.)

Some attributes are dependent on other attributes, e.g., doing exercise may reduce the probability of suffering from Heart Disease.

    Exercise (E)  ->  Heart Disease

Page 45

Bayesian Belief Network

Nodes: Exercise (E), Diet (D), Heart Disease (HD), Heartburn (Hb), Blood Pressure (BP), Chest Pain (CP).
E and D are the parents of HD; D is the parent of Hb; HD is the parent of BP; HD and Hb are the parents of CP.

P(E = Yes) = 0.7
P(D = Healthy) = 0.25

P(HD = Yes | E, D):
  E=Yes, D=Healthy     0.25
  E=Yes, D=Unhealthy   0.45
  E=No,  D=Healthy     0.55
  E=No,  D=Unhealthy   0.75

P(Hb = Yes | D):
  D=Healthy    0.85
  D=Unhealthy  0.2

P(BP = High | HD):
  HD=Yes  0.85
  HD=No   0.2

P(CP = Yes | HD, Hb):
  HD=Yes, Hb=Yes  0.8
  HD=Yes, Hb=No   0.6
  HD=No,  Hb=Yes  0.4
  HD=No,  Hb=No   0.1

Page 46

Bayesian Belief Network

Let X, Y, Z be three random variables. X is said to be conditionally independent of Y given Z if the following holds:

    P(X | Y, Z) = P(X | Z)

Lemma: If X is conditionally independent of Y given Z, then P(X, Y | Z) = P(X | Z) x P(Y | Z).
(Why? P(X, Y | Z) = P(X | Y, Z) x P(Y | Z) = P(X | Z) x P(Y | Z).)

Page 47

Bayesian Belief Network

(Network as on Page 45.)

Let X, Y, Z be three random variables. X is said to be conditionally independent of Y given Z if P(X | Y, Z) = P(X | Z).

e.g., P(BP = High | HD = Yes, D = Healthy) = P(BP = High | HD = Yes)
  "BP = High" is conditionally independent of "D = Healthy" given "HD = Yes".

e.g., P(BP = High | HD = Yes, CP = Yes) = P(BP = High | HD = Yes)
  "BP = High" is conditionally independent of "CP = Yes" given "HD = Yes".

Property: A node is conditionally independent of its non-descendants if its parents are known.

Page 48

Bayesian Belief Network

(Data as on Page 44.)

Suppose there is a new person and I want to know whether he is likely to have Heart Disease. We consider three cases, with progressively more attributes observed:

  Exercise  Diet     Heartburn  Blood Pressure  Chest Pain  Heart Disease
  ?         ?        ?          ?               ?           ?
  ?         ?        ?          High            ?           ?
  Yes       Healthy  ?          High            ?           ?

Page 49

Bayesian Belief Network

Suppose there is a new person and I want to know whether he is likely to have Heart Disease (nothing is observed).

P(HD = Yes)
  = Σ_{x ∈ {Yes, No}} Σ_{y ∈ {Healthy, Unhealthy}} P(HD = Yes | E = x, D = y) x P(E = x, D = y)
  = Σ_{x ∈ {Yes, No}} Σ_{y ∈ {Healthy, Unhealthy}} P(HD = Yes | E = x, D = y) x P(E = x) x P(D = y)
  = 0.25 x 0.7 x 0.25 + 0.45 x 0.7 x 0.75 + 0.55 x 0.3 x 0.25 + 0.75 x 0.3 x 0.75
  = 0.49

P(HD = No) = 1 - P(HD = Yes) = 1 - 0.49 = 0.51

Page 50

Bayesian Belief Network

Suppose there is a new person and I want to know whether he is likely to have Heart Disease, given that his Blood Pressure is High.

P(BP = High) = Σ_{x ∈ {Yes, No}} P(BP = High | HD = x) x P(HD = x)
  = 0.85 x 0.49 + 0.2 x 0.51 = 0.5185

P(HD = Yes | BP = High) = P(BP = High | HD = Yes) x P(HD = Yes) / P(BP = High)
  = 0.85 x 0.49 / 0.5185
  = 0.8033

P(HD = No | BP = High) = 1 - P(HD = Yes | BP = High) = 1 - 0.8033 = 0.1967

Page 51

Bayesian Belief Network

Suppose there is a new person and I want to know whether he is likely to have Heart Disease, given Exercise = Yes, Diet = Healthy and Blood Pressure = High.

P(HD = Yes | BP = High, D = Healthy, E = Yes)
  = P(BP = High | HD = Yes, D = Healthy, E = Yes) x P(HD = Yes | D = Healthy, E = Yes) / P(BP = High | D = Healthy, E = Yes)
  = P(BP = High | HD = Yes) x P(HD = Yes | D = Healthy, E = Yes)
      / Σ_{x ∈ {Yes, No}} P(BP = High | HD = x) x P(HD = x | D = Healthy, E = Yes)
  = 0.85 x 0.25 / (0.85 x 0.25 + 0.2 x 0.75)
  = 0.5862

P(HD = No | BP = High, D = Healthy, E = Yes) = 1 - P(HD = Yes | BP = High, D = Healthy, E = Yes) = 1 - 0.5862 = 0.4138
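A minimal sketch that replays the three inference queries of Pages 49-51 directly from the conditional probability tables on Page 45 (variable names are my own):

```python
# Conditional probability tables copied from Page 45.
P_E_YES = 0.7
P_D_HEALTHY = 0.25
P_HD_YES = {("Yes", "Healthy"): 0.25, ("Yes", "Unhealthy"): 0.45,
            ("No", "Healthy"): 0.55, ("No", "Unhealthy"): 0.75}
P_BP_HIGH = {"Yes": 0.85, "No": 0.2}      # conditioned on Heart Disease (HD)

def p_e(x):
    return P_E_YES if x == "Yes" else 1 - P_E_YES

def p_d(y):
    return P_D_HEALTHY if y == "Healthy" else 1 - P_D_HEALTHY

# Page 49: P(HD = Yes), marginalising over Exercise and Diet.
p_hd_yes = sum(P_HD_YES[(e, d)] * p_e(e) * p_d(d)
               for e in ("Yes", "No") for d in ("Healthy", "Unhealthy"))
print(round(p_hd_yes, 2))                                     # 0.49

# Page 50: P(HD = Yes | BP = High) via Bayes rule.
p_bp_high = P_BP_HIGH["Yes"] * p_hd_yes + P_BP_HIGH["No"] * (1 - p_hd_yes)
print(round(P_BP_HIGH["Yes"] * p_hd_yes / p_bp_high, 4))      # 0.8033

# Page 51: P(HD = Yes | BP = High, D = Healthy, E = Yes).
prior = P_HD_YES[("Yes", "Healthy")]
numerator = P_BP_HIGH["Yes"] * prior
denominator = numerator + P_BP_HIGH["No"] * (1 - prior)
print(round(numerator / denominator, 4))                      # 0.5862
```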

Page 52

Classification Methods
- Decision Tree
- Bayesian Classifier
- Nearest Neighbor Classifier

Page 53

Nearest Neighbor Classifier

  Computer  History
  100       40
  90        45
  20        95
  ...       ...

(Scatter plot of the points, with Computer and History as the two axes.)

Page 54

Nearest Neighbor Classifier

  Computer  History  Buy Book?
  100       40       No (-)
  90        45       Yes (+)
  20        95       Yes (+)
  ...       ...      ...

(Scatter plot of the points, labelled + and -, with Computer and History as the two axes.)

Page 55

Nearest Neighbor Classifier

(Data and scatter plot as on Page 54.)

Suppose there is a new person:

  Computer  History  Buy Book?
  95        35       ?

Nearest Neighbor Classifier:
  Step 1: Find the nearest neighbor.
  Step 2: Use the "label" of this neighbor.

Page 56

Nearest Neighbor Classifier

(Data and scatter plot as on Page 54.)

Suppose there is a new person:

  Computer  History  Buy Book?
  95        35       ?

k-Nearest Neighbor Classifier:
  Step 1: Find the k nearest neighbors.
  Step 2: Use the majority of the labels of these neighbors.
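A minimal sketch of the (k-)nearest neighbor classifier applied to the three example rows (knn_predict and the choice of Euclidean distance are my own assumptions; the slides do not fix a distance measure):

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k=1):
    """Classify query by the majority label among its k nearest neighbors (Euclidean distance)."""
    nearest = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# The three example rows from Page 54: (Computer score, History score) -> Buy Book?
points = [(100, 40), (90, 45), (20, 95)]
labels = ["No", "Yes", "Yes"]

print(knn_predict(points, labels, (95, 35), k=1))   # 'No'  -- (100, 40) is the single nearest point
print(knn_predict(points, labels, (95, 35), k=3))   # 'Yes' -- the majority among all three points
```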