11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11...

Preview:

Citation preview

• Exercise 1.1: Decision Tree– I(9, 6) = 0.97

• Calculate gain for allthe attributes

11 Decision Trees

Age Has job Owns house Credit rating Approve loan

Young False False Fair No

Young False False Good No

Young True False Good Yes

Young True True Fair Yes

Young False False Fair No

the attributes– Age:

• E(Age) = 5/15 * I(2, 3)+5/15 * I(3, 2)+5/15 * I(4, 1)

= 0.88

• Gain(Age) = 0.97 – 0.88= 0.09

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 1

Middle False False Fair No

Middle False False Good No

Middle True True Good Yes

Middle False True Excellent Yes

Middle False True Excellent Yes

Old False True Excellent Yes

Old False True Good Yes

Old True False Good Yes

Old True False Excellent Yes

Old False False Fair No

– Has Job:

• E(..) = 10/15 * I(4, 6)+5/15 * I(5, 0)= 0.64

• Gain(..) = 0.97 – 0.64

11 Decision Trees

Age Has job Owns house Credit rating Approve loan

Young False False Fair No

Young False False Good No

Young True False Good Yes

Young True True Fair Yes

Young False False Fair No• Gain(..) = 0.97 – 0.64= 0.33

– Owns house:

• E(..) = 9/15 * I(3, 6)+6/15 * I(6, 0)= 0.54

• Gain(..) = 0.97 – 0.54= 0.43

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2

Middle False False Fair No

Middle False False Good No

Middle True True Good Yes

Middle False True Excellent Yes

Middle False True Excellent Yes

Old False True Excellent Yes

Old False True Good Yes

Old True False Good Yes

Old True False Excellent Yes

Old False False Fair No

– Credit rating:

• E(..) = 5/15 * I(1, 4)+6/15 * I(4, 2)+4/15 * I(4, 0)= 0.60

11 Decision Trees

Age Has job Owns house Credit rating Approve loan

Young False False Fair No

Young False False Good No

Young True False Good Yes

Young True True Fair Yes

Young False False Fair No

• Gain(..) = 0.97 – 0.60= 0.37

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3

Middle False False Fair No

Middle False False Good No

Middle True True Good Yes

Middle False True Excellent Yes

Middle False True Excellent Yes

Old False True Excellent Yes

Old False True Good Yes

Old True False Good Yes

Old True False Excellent Yes

Old False False Fair No

– Split through Own_house

11 Decision Trees

Owns_house?

false truetrue

Age Has job Credit rating Approve loan

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 4

Age Has job Credit rating Approve loan

Young True Fair Yes

Middle True Good Yes

Middle False Excellent Yes

Middle False Excellent Yes

Old False Excellent Yes

Old False Good Yes

Age Has job Credit rating Approve loan

Young False Fair No

Young False Good No

Young True Good Yes

Young False Fair No

Middle False Fair No

Middle False Good No

Old True Good Yes

Old True Excellent Yes

Old False Fair No

– I(3, 6) = 0.91

– Age:

• E(Age) = 4/9 * I(1, 3)+2/9 * I(0, 2)+3/9 * I(2, 1)

11 Decision TreesOwns_house?

false

Age Has job Credit rating Approve loan

Young False Fair No

Young False Good No

Young True Good Yes

Young False Fair No

Middle False Fair No3/9 * I(2, 1)= 0.66

• Gain(Age) = 0.91 – 0.66= 0.25

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 5

Middle False Fair No

Middle False Good No

Old True Good Yes

Old True Excellent Yes

Old False Fair No

– Has Job:

• E(…) = 6/9 * I(0, 6)+3/9 * I(3, 0)= 0

• Gain(…) = 0.91

11 Decision TreesOwns_house?

false

Age Has job Credit rating Approve loan

Young False Fair No

Young False Good No

Young True Good Yes

Young False Fair No

Middle False Fair No• Gain(…) = 0.91

• We can stop and split throughHas job

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 6

Middle False Fair No

Middle False Good No

Old True Good Yes

Old True Excellent Yes

Old False Fair No

– Split through Has job

11 Decision TreesOwns_house?

false

Age Has job Credit rating Approve loan

Young False Fair No

Young False Good No

Has Job?

Age Has job Credit rating Approve loan

Young True Good Yes

false true

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 7

Young False Good No

Young False Fair No

Middle False Fair No

Middle False Good No

Old False Fair No

Old True Good Yes

Old True Excellent Yes

Owns_house?false

Has Job?

false true

yesno

– Recur into the right side

• Stop condition met

11 Decision Trees

Owns_house?

truetrue

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 8

Age Has job Credit rating Approve loan

Young True Fair Yes

Middle True Good Yes

Middle False Excellent Yes

Middle False Excellent Yes

Old False Excellent Yes

Old False Good Yes

Owns_house?

true

yes

– Resulting tree

11 Decision Trees

true

yes

Owns_house?false

Has Job?

false true

yesno

– X = Senior person with job, doesn’t own a house and has good credit rating

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 9

true

yes

Owns_house?false

Has Job?

false true

yesno

• Exercise 1.2: Naïve Bayesian Classification

– P(p) = 9/15

– P(n) = 6/15

11 Decision Trees

Age Has job Owns house Credit rating Approve loan

Young False False Fair No

Young False False Good No

Young True False Good Yes

Young True True Fair Yes

Young False False Fair No

Age attribute

P(youth|p) = 2/9 P(youth|n) = 3/6

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 10

Middle False False Fair No

Middle False False Good No

Middle True True Good Yes

Middle False True Excellent Yes

Middle False True Excellent Yes

Old False True Excellent Yes

Old False True Good Yes

Old True False Good Yes

Old True False Excellent Yes

Old False False Fair No

P(middle|p) = 3/9 P(middle|n) = 2/6

P(senior|p) = 4/9 P(senior|n) = 1/6

Has Job

P(true|p) = 5/9 P(true|n) = 0/6

P(false|p) = 4/9 P(false|n) = 6/6

Owns house

P(true|p) = 6/9 P(true|n) = 0/6

P(false|p) = 3/9 P(false|n) = 6/6

• Exercise 1.2: Naïve Bayesian Classification

11 Decision Trees

Age Has job Owns house Credit rating Approve loan

Young False False Fair No

Young False False Good No

Young True False Good Yes

Young True True Fair Yes

Young False False Fair No

Credit rating

P(fair|p) = 1/9 P(fair|n) = 4/6

P(good|p) = 4/9 P(good|n) = 2/6

P(excellent|p) = 4/9 P(excellent|n) = 0/6

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 11

Middle False False Fair No

Middle False False Good No

Middle True True Good Yes

Middle False True Excellent Yes

Middle False True Excellent Yes

Old False True Excellent Yes

Old False True Good Yes

Old True False Good Yes

Old True False Excellent Yes

Old False False Fair No

• Exercise 1.2: Naïve Bayesian Classification

– New data:

• X = Senior person with job, doesn’t own a house and hasgood credit rating

11 Decision Trees

Age attribute

P(youth|p) = 2/9 P(youth|n) = 3/6

P(middle|p) = 3/9 P(middle|n) = 2/6

P(senior|p) = 4/9 P(senior|n) = 1/6

Has Job• P(X|p)*P(p) = 4/9 * 5/9 * 3/9

* 4/9 * 9/15= 0.021

• P(X|n)*P(n) = 1/6 * 0 * … = 0

• 0.02 > 0 So X gets thecredit

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 12

Credit rating

P(fair|p) = 1/9 P(fair|n) = 4/6

P(good|p) = 4/9 P(good|n) = 2/6

P(excellent|p) = 4/9 P(excellent|n) = 0/6

Has Job

P(true|p) = 5/9 P(true|n) = 0/6

P(false|p) = 4/9 P(false|n) = 6/6

Owns house

P(true|p) = 6/9 P(true|n) = 0/6

P(false|p) = 3/9 P(false|n) = 6/6

Recommended