Decision Tree (ID3)
Xueping Peng
Outline
- What is a decision tree
- How to use a decision tree
- How to generate a decision tree
- Sum up and some drawbacks
What is a decision tree (1/3)

A decision tree is a hierarchical tree structure used to assign class labels by asking a series of questions (or applying rules) about the attributes of the data.
The attributes can be variables of any type: binary, nominal, ordinal, or quantitative.
The classes must be qualitative (binary, categorical, or ordinal).
In short, given data whose attribute vectors are labeled with classes, a decision tree produces a sequence of rules (a series of questions) that can be used to recognize the class.
What is a decision tree (2/3)

Training data. The first four columns are attributes; the last column is the class:

Gender   Car Ownership   Travel Cost ($)/km   Income Level   Transportation Mode
Male     0               Cheap                Low            Bus
Male     1               Cheap                Medium         Bus
Female   1               Cheap                Medium         Train
Female   0               Cheap                Low            Bus
Male     1               Cheap                Medium         Bus
Male     0               Standard             Medium         Train
Female   1               Standard             Medium         Train
Female   1               Expensive            High           Car
Male     2               Expensive            Medium         Car
Female   2               Expensive            High           Car
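The worked examples later in this deck are easier to follow with the table in machine-readable form. A minimal Python encoding (the tuple layout is an illustrative choice, not from the slides):

```python
# Each record: (gender, car_ownership, travel_cost, income_level, transport_mode)
# The first four fields are attributes; the last field is the class label.
DATA = [
    ("Male",   0, "Cheap",     "Low",    "Bus"),
    ("Male",   1, "Cheap",     "Medium", "Bus"),
    ("Female", 1, "Cheap",     "Medium", "Train"),
    ("Female", 0, "Cheap",     "Low",    "Bus"),
    ("Male",   1, "Cheap",     "Medium", "Bus"),
    ("Male",   0, "Standard",  "Medium", "Train"),
    ("Female", 1, "Standard",  "Medium", "Train"),
    ("Female", 1, "Expensive", "High",   "Car"),
    ("Male",   2, "Expensive", "Medium", "Car"),
    ("Female", 2, "Expensive", "High",   "Car"),
]
```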
What is a decision tree (3/3)
How to Use Decision Tree
Test data:

Person Name   Gender   Car Ownership   Travel Cost ($)/km   Income Level   Transportation Mode
Alex          Male     1               Standard             High           ?
Buddy         Male     0               Cheap                Medium         ?
Cherry        Female   1               Cheap                High           ?
What transportation mode would Alex, Buddy and Cherry use?
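A decision tree answers this kind of question by asking one attribute question per level, from the root down to a leaf. The sketch below encodes the tree that the next section derives (the nested-tuple representation and the classify helper are illustrative choices, not from the slides) and applies it to the three test records:

```python
# An internal node is (attribute, {value: subtree}); a leaf is a class label.
# This particular tree is the one built step by step in the next section.
TREE = ("Travel Cost", {
    "Standard":  "Train",
    "Expensive": "Car",
    "Cheap": ("Gender", {
        "Male": "Bus",
        "Female": ("Car Ownership", {0: "Bus", 1: "Train"}),
    }),
})

def classify(tree, record):
    """Answer one attribute question per level until a leaf names the class."""
    while isinstance(tree, tuple):          # internal node: ask another question
        attribute, branches = tree
        tree = branches[record[attribute]]  # follow the branch for this record's value
    return tree                             # leaf: the predicted class

for name, record in [
    ("Alex",   {"Gender": "Male",   "Car Ownership": 1, "Travel Cost": "Standard", "Income Level": "High"}),
    ("Buddy",  {"Gender": "Male",   "Car Ownership": 0, "Travel Cost": "Cheap",    "Income Level": "Medium"}),
    ("Cherry", {"Gender": "Female", "Car Ownership": 1, "Travel Cost": "Cheap",    "Income Level": "High"}),
]:
    print(name, "->", classify(TREE, record))
# Alex -> Train, Buddy -> Bus, Cherry -> Train
```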
How to Generate a Decision Tree (1/13)

Description of ID3: ID3 builds the tree top-down and greedily. Starting from the full training set, it selects the attribute whose information gain is highest, splits the data on that attribute's values, and repeats the process on each subset until every branch contains a single class.
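Condensed into code, that description is only a few lines. A minimal Python sketch of ID3, assuming records are tuples whose last field is the class label (all function names here are mine, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_j p_j * log2(p_j), where p_j is the share of class j in S."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(records, attr):
    """IG(S|A) = H(S) - sum_v P(A=v) * H(S restricted to A=v)."""
    labels = [r[-1] for r in records]
    gain = entropy(labels)
    for v in set(r[attr] for r in records):
        subset = [r[-1] for r in records if r[attr] == v]
        gain -= len(subset) / len(records) * entropy(subset)
    return gain

def id3(records, attrs):
    """Greedy top-down induction: returns (attribute, {value: subtree}) or a leaf label."""
    labels = [r[-1] for r in records]
    if len(set(labels)) == 1:                 # pure subset -> leaf
        return labels[0]
    if not attrs:                             # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(records, a))
    return (best, {
        v: id3([r for r in records if r[best] == v], [a for a in attrs if a != best])
        for v in set(r[best] for r in records)
    })

# e.g. id3(DATA, [0, 1, 2, 3]) with the DATA list shown earlier reproduces the
# tree built step by step in the slides that follow.
```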
How to Generate a Decision Tree (2/13)

Which attribute is the best choice? Suppose we have 29 positive examples and 35 negative ones. Should we use attribute A1 or attribute A2 to split this node?
How to Generate a Decision Tree (3/13)

Use entropy to measure the degree of impurity:

Entropy(S) = - Σj pj log2 pj

where pj is the probability (relative frequency) of class j in the set S.
How to Generate a Decision Tree (4/13)

What does entropy mean?
- Entropy is the minimum number of bits needed to encode the classification of a randomly drawn member of S.
  - If p+ = 1, the receiver already knows the class, no message need be sent, and Entropy = 0.
  - If p+ = 0.5, one full bit is needed per example.
- An optimal-length code assigns -log2 p bits to a message that occurs with probability p: shorter codes go to the more probable messages and longer codes to the less likely ones.
- Thus the expected number of bits needed to encode the + or - label of a random member of S is:

H(S) = p+ (-log2 p+) + p- (-log2 p-)
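A quick numeric check of these boundary cases (the helper name is illustrative):

```python
import math

def binary_entropy(p_pos):
    """H(S) = p+ * (-log2 p+) + p- * (-log2 p-), with 0 * log2(0) taken as 0."""
    bits = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0:
            bits += p * -math.log2(p)
    return bits

print(binary_entropy(1.0))  # 0.0   -> class is certain, no message needed
print(binary_entropy(0.5))  # 1.0   -> one full bit per example
print(binary_entropy(0.8))  # 0.722 -> skewed classes compress below one bit
```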
How to Generate a Decision Tree (5/13)

Information gain measures the expected reduction in entropy caused by partitioning the examples according to a given attribute.
- IG(S|A) is the number of bits saved when encoding the target value of an arbitrary member of S, given the value of attribute A.
- Expected reduction in entropy from knowing the value of A:

IG(S|A) = H(S) - Σj Prob(A=vj) H(S|A=vj)
How to Generate a Decision Tree (6/13)

Back to the question: with 29 positive and 35 negative examples, should we use attribute A1 or attribute A2 to split this node?

IG(A1) = 0.993 - 26/64 * 0.706 - 38/64 * 0.742 = 0.266
IG(A2) = 0.993 - 51/64 * 0.937 - 13/64 * 0.619 = 0.121

A1 yields the larger gain, so it is the better split.
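These figures can be reproduced in a few lines. The split sizes 26/64, 38/64, 51/64 and 13/64 come from the formulas above; the positive/negative counts inside each split (21+/5-, 8+/30-, 18+/33-, 11+/2-) are inferred from the quoted entropies and should be read as an assumption of this sketch:

```python
import math

def H(pos, neg):
    """Binary entropy of a (positive, negative) count pair."""
    bits = 0.0
    for c in (pos, neg):
        p = c / (pos + neg)
        if p > 0:
            bits += p * -math.log2(p)
    return bits

def gain(parent, splits):
    """IG = H(parent) - weighted average of the children's entropies."""
    n = sum(p + q for p, q in splits)
    return H(*parent) - sum((p + q) / n * H(p, q) for p, q in splits)

print(gain((29, 35), [(21, 5), (8, 30)]))   # IG(A1) ~ 0.266
print(gain((29, 35), [(18, 33), (11, 2)]))  # IG(A2) ~ 0.121
```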
How to Generate a Decision Tree (7/13)

Specific conditional entropy H(Y|X=v): Y is the class, X is an attribute, and v is a value of X. H(Y|X=v) is the entropy of Y among only those records in which X has value v.

H(Class|Travel Cost=Cheap) = -0.8 log2 0.8 - 0.2 log2 0.2 = 0.722
H(Class|Travel Cost=Expensive) = -1 log2 1 = 0
H(Class|Travel Cost=Standard) = -1 log2 1 = 0
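The same values fall straight out of the training table; a self-contained sketch using only the Travel Cost and class columns:

```python
import math
from collections import Counter

# (travel_cost, transportation_mode) pairs from the training table
ROWS = [("Cheap", "Bus"), ("Cheap", "Bus"), ("Cheap", "Train"), ("Cheap", "Bus"),
        ("Cheap", "Bus"), ("Standard", "Train"), ("Standard", "Train"),
        ("Expensive", "Car"), ("Expensive", "Car"), ("Expensive", "Car")]

def specific_cond_entropy(rows, value):
    """H(Class | Travel Cost = value): entropy among rows where the attribute equals value."""
    labels = [cls for cost, cls in rows if cost == value]
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

for v in ("Cheap", "Expensive", "Standard"):
    print(v, round(specific_cond_entropy(ROWS, v), 3))  # 0.722, 0.0, 0.0
```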
How to Generate a Decision Tree (8/13)

Conditional entropy H(Y|X) is the average specific conditional entropy of Y:

H(Y|X) = Σj Prob(X=vj) H(Y|X=vj)

e.g. H(Class|Travel Cost)
= prob(Travel Cost=Cheap) * H(Class|Travel Cost=Cheap)
+ prob(Travel Cost=Expensive) * H(Class|Travel Cost=Expensive)
+ prob(Travel Cost=Standard) * H(Class|Travel Cost=Standard)
= 0.5 * 0.722 + 0.3 * 0 + 0.2 * 0 = 0.361
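And the weighted average itself, as a small sketch (the two columns are repeated so the snippet runs on its own):

```python
import math
from collections import Counter

COSTS  = ["Cheap"] * 5 + ["Standard"] * 2 + ["Expensive"] * 3
LABELS = ["Bus", "Bus", "Train", "Bus", "Bus", "Train", "Train", "Car", "Car", "Car"]

def H(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def cond_entropy(xs, ys):
    """H(Y|X) = sum_v P(X=v) * H(Y | X=v)."""
    n = len(xs)
    total = 0.0
    for v in set(xs):
        sub = [y for x, y in zip(xs, ys) if x == v]
        total += len(sub) / n * H(sub)
    return total

print(round(cond_entropy(COSTS, LABELS), 3))  # 0.361
```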
How to Generate a Decision Tree (9/13)

Information gain IG(Y|X) = H(Y) - H(Y|X).

e.g. H(Class) = -0.4 log2 0.4 - 0.3 log2 0.3 - 0.3 log2 0.3 = 1.571
IG(Class|Travel Cost) = H(Class) - H(Class|Travel Cost) = 1.571 - 0.361 = 1.210

Results of the first iteration:

Attribute   Gender   Car Ownership   Travel Cost ($)/km   Income Level
IG          0.125    0.534           1.210                0.695
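The whole table can be verified with the information-gain function sketched earlier, repeated here in self-contained form:

```python
import math
from collections import Counter

# Training records: (Gender, Car Ownership, Travel Cost, Income Level, Mode)
DATA = [("Male", 0, "Cheap", "Low", "Bus"), ("Male", 1, "Cheap", "Medium", "Bus"),
        ("Female", 1, "Cheap", "Medium", "Train"), ("Female", 0, "Cheap", "Low", "Bus"),
        ("Male", 1, "Cheap", "Medium", "Bus"), ("Male", 0, "Standard", "Medium", "Train"),
        ("Female", 1, "Standard", "Medium", "Train"), ("Female", 1, "Expensive", "High", "Car"),
        ("Male", 2, "Expensive", "Medium", "Car"), ("Female", 2, "Expensive", "High", "Car")]

def H(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    labels = [r[-1] for r in rows]
    gain = H(labels)
    for v in set(r[attr] for r in rows):
        sub = [r[-1] for r in rows if r[attr] == v]
        gain -= len(sub) / len(rows) * H(sub)
    return gain

for i, name in enumerate(["Gender", "Car Ownership", "Travel Cost", "Income Level"]):
    print(name, round(info_gain(DATA, i), 3))
# Gender 0.125, Car Ownership 0.534, Travel Cost 1.21, Income Level 0.695
```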
How to Generate a Decision Tree (10/13)

Travel Cost ($)/km has the largest information gain, so it becomes the root node. Splitting on its values, the Standard branch contains only Train and the Expensive branch only Car, so both become leaves; the Cheap branch still mixes Bus and Train and must be split again.
How to Generate a Decision Tree (11/13)

Second iteration: the same procedure is applied to the five records with Travel Cost = Cheap, using the remaining attributes Gender, Car Ownership and Income Level.
How to Generate a Decision Tree (12/13)

Results of the second iteration (Travel Cost = Cheap subset):

Attribute   Gender   Car Ownership   Income Level
IG          0.322    0.171           0.171

Gender has the highest gain, so the Cheap branch is split on Gender and the decision tree is updated: the Male sub-branch contains only Bus and becomes a leaf, while the Female sub-branch still mixes Bus and Train.
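The second-iteration figures check out the same way, restricted to the five Cheap records:

```python
import math
from collections import Counter

# The five Travel Cost = Cheap records: (Gender, Car Ownership, Income Level, Mode)
CHEAP = [("Male", 0, "Low", "Bus"), ("Male", 1, "Medium", "Bus"),
         ("Female", 1, "Medium", "Train"), ("Female", 0, "Low", "Bus"),
         ("Male", 1, "Medium", "Bus")]

def H(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    labels = [r[-1] for r in rows]
    g = H(labels)
    for v in set(r[attr] for r in rows):
        sub = [r[-1] for r in rows if r[attr] == v]
        g -= len(sub) / len(rows) * H(sub)
    return g

for i, name in enumerate(["Gender", "Car Ownership", "Income Level"]):
    print(name, round(info_gain(CHEAP, i), 3))
# Gender 0.322, Car Ownership 0.171, Income Level 0.171
```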
How to Generate a Decision Tree (13/13)

Third iteration: only two records remain, the Female ones in the Cheap branch. Either remaining attribute separates them perfectly; splitting on Car Ownership gives 0 -> Bus and 1 -> Train, and updating the decision tree with this split completes it.
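A last check on the tie in this step: either remaining attribute separates the final two records perfectly, gaining a full bit (the two-record subset follows directly from the training table):

```python
import math
from collections import Counter

# The two records left in the (Travel Cost=Cheap, Gender=Female) branch:
# (Car Ownership, Income Level, Mode)
REMAINING = [(1, "Medium", "Train"), (0, "Low", "Bus")]

def H(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    labels = [r[-1] for r in rows]
    g = H(labels)
    for v in set(r[attr] for r in rows):
        sub = [r[-1] for r in rows if r[attr] == v]
        g -= len(sub) / len(rows) * H(sub)
    return g

for i, name in enumerate(["Car Ownership", "Income Level"]):
    print(name, info_gain(REMAINING, i))
# Both gains are 1.0 bit: either attribute splits the last two records perfectly.
```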
To Sum Up

ID3 is a powerful system that:
- Uses hill-climbing search, guided by the information gain measure, to search through the space of decision trees
- Outputs a single hypothesis
- Never backtracks: it converges to locally optimal solutions
- Uses all training examples at each step, unlike methods that make decisions incrementally
- Uses statistical properties of all examples, so the search is less sensitive to errors in individual training examples
Some Drawbacks
- It can only deal with nominal attributes
- It is not robust in the presence of noise, so it handles noisy data sets poorly
References
- Tutorial on Decision Tree, http://people.revoledu.com/kardi/tutorial/DecisionTree/index.html
- Information Gain, http://www.autonlab.org/tutorials/infogain11.pdf
- Lecture 5: C4.5, http://www.slideshare.net/aorriols/lecture5-c45