
Page 1

Decision Tree Learning

Debapriyo Majumdar Data Mining – Fall 2014

Indian Statistical Institute Kolkata

August 25, 2014

Page 2

Example: Age, Income and Owning a Flat

[Scatter plot of the training set: Age (0–70) on the x-axis vs. Monthly income in thousand rupees (0–250) on the y-axis; points marked "Owns a house" and "Does not own a house", with lines L1 and L2 drawn]

§ If the training data were as above – could we define some simple rules by observation?

§ Any point above the line L1 → Owns a house
§ Any point to the right of L2 → Owns a house
§ Any other point → Does not own a house


Page 3

Example: Age, Income and Owning a Flat

[Scatter plot as on the previous slide, with the separating lines L1 and L2 drawn]

Root node: split at Income = 101
– Income ≥ 101: Label = Yes
– Income < 101: split at Age = 54
  – Age ≥ 54: Label = Yes
  – Age < 54: Label = No
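These rules are just nested comparisons, so the toy tree can be written out directly. A minimal Python sketch of this example (the function name and test points are illustrative):

```python
def owns_house(income, age):
    """Classify one person with the toy tree above.
    income: monthly income in thousand rupees; age: in years."""
    if income >= 101:   # root split on income
        return "Yes"
    if age >= 54:       # second split, reached only when income < 101
        return "Yes"
    return "No"

# One point from each of the three regions cut out by L1 and L2:
print(owns_house(150, 30))  # above L1         -> Yes
print(owns_house(60, 60))   # right of L2      -> Yes
print(owns_house(60, 30))   # remaining region -> No
```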

In general, the data won't be as clean as the above

Page 4

Example: Age, Income and Owning a Flat

[Scatter plot of the training set, as on the previous slides]

§ Approach: recursively split the data into partitions so that each partition becomes purer, until…

How to decide the split?

How to measure purity? When to stop?

Page 5

Approach for Splitting

§ What are the possible lines for splitting?
– For each variable, the midpoints between pairs of consecutive values of that variable
– How many? If N = number of points in the training set and m = number of variables, about O(N × m)

§ How to choose which line to use for splitting?
– The line which reduces impurity (~ heterogeneity of composition) the most – see the sketch below

§ How to measure impurity?
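Concretely, the procedure this slide describes is: enumerate the O(N × m) midpoint candidates and keep the split with the lowest weighted child impurity. A minimal sketch, using the Gini index from the next slide as the impurity measure (all names are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node (defined formally on the next slide)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def candidate_splits(values):
    """Midpoints between consecutive distinct values of one variable."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

def best_split(points, labels):
    """Scan all O(N x m) candidates; return the (variable index, threshold)
    whose weighted impurity of the two child nodes is lowest."""
    n, m = len(points), len(points[0])
    best, best_score = None, float("inf")
    for var in range(m):
        for thr in candidate_splits([p[var] for p in points]):
            left = [y for p, y in zip(points, labels) if p[var] < thr]
            right = [y for p, y in zip(points, labels) if p[var] >= thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (var, thr), score
    return best
```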


Page 6

Gini Index for Measuring Impurity

§ Suppose there are C classes
§ Let p(i|t) = the fraction of observations belonging to class i in rectangle (node) t
§ Gini index:

$$\mathrm{Gini}(t) = 1 - \sum_{i=1}^{C} p(i \mid t)^2$$

§ If all observations in t belong to one single class, Gini(t) = 0
§ When is Gini(t) maximum?
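A quick numeric check, reusing the `gini` helper from the split-selection sketch above: Gini(t) is 0 for a pure node and is maximal, at 1 − 1/C, when the C classes are equally frequent:

```python
# gini() as defined in the split-selection sketch above
print(gini(["Y"] * 10))              # one class       -> 0.0
print(gini(["Y"] * 5 + ["N"] * 5))   # 2 equal classes -> 0.5   = 1 - 1/2
print(gini(["A", "B", "C"] * 4))     # 3 equal classes -> 0.667 ≈ 1 - 1/3
```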

Page 7

Entropy

§ Average amount of information contained
§ From another point of view – the average amount of information expected, hence the amount of uncertainty
– We will study this in more detail later
§ Entropy:

$$\mathrm{Entropy}(t) = -\sum_{i=1}^{C} p(i \mid t)\,\log_2 p(i \mid t)$$

where 0 log₂ 0 is defined to be 0
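A minimal sketch of the computation (names illustrative); skipping classes that do not occur in the node implements the 0 · log₂ 0 = 0 convention:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a node. Classes with zero count never appear in the
    Counter, which implements the convention 0 * log2(0) = 0."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["Y"] * 8))              # pure node          -> 0.0
print(entropy(["Y"] * 4 + ["N"] * 4))  # uniform, 2 classes -> 1.0
```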

Page 8

Classification Error

§ What if we stop the tree building at a node?
– That is, do not create any further branches for that node
– Make that node a leaf
– Classify the node with the most frequent class present in the node

[Figure: a rectangle (node) that is still impure]

§ Classification error as a measure of impurity:

$$\mathrm{ClassificationError}(t) = 1 - \max_i \, p(i \mid t)$$

§ Intuitively – the error made by always predicting the most frequent class in the rectangle (node)
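The formula in code, as a small illustrative sketch:

```python
from collections import Counter

def classification_error(labels):
    """1 - max_i p(i|t): the fraction misclassified when the node
    predicts its most frequent class."""
    n = len(labels)
    return 1.0 - max(Counter(labels).values()) / n

print(classification_error(["Y"] * 7 + ["B"] * 2))  # 1 - 7/9 ≈ 0.222
```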

Page 9

The Full-Blown Tree

§ Recursive splitting (a sketch follows below)
§ Suppose we don't stop until all nodes are pure
§ We get a large decision tree whose leaf nodes hold very few data points
– Does not represent the classes well
– Overfitting
§ Solution:
– Stop earlier, or
– Prune back the tree

[Figure: a fully grown tree – the root holds 1000 points, split into 400 and 600, then 200/200 and 240/160, down to leaves with only 2, 1 and 5 points; such tiny leaves are statistically not significant]
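Growing the full-blown tree is the recursion below, which stops only at pure nodes. A sketch under that assumption (illustrative; it reuses `gini` and `best_split` from the split-selection sketch):

```python
def build_full_tree(points, labels):
    """Split recursively until every node is pure - the extreme that overfits."""
    if len(set(labels)) == 1:            # pure node: make a leaf
        return {"label": labels[0], "freq": len(labels)}
    split = best_split(points, labels)   # from the split-selection sketch
    if split is None:                    # identical points, mixed labels
        return {"label": max(set(labels), key=labels.count), "freq": len(labels)}
    var, thr = split
    left = [i for i, p in enumerate(points) if p[var] < thr]
    right = [i for i, p in enumerate(points) if p[var] >= thr]
    return {"split": (var, thr),
            "left": build_full_tree([points[i] for i in left], [labels[i] for i in left]),
            "right": build_full_tree([points[i] for i in right], [labels[i] for i in right])}
```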

Page 10

Prune Back

§ Pruning step: collapse leaf nodes and make the immediate parent a leaf node

§ Effect of pruning
– Lose purity of nodes
– But were they really pure, or was that noise?
– Too many nodes ≈ noise

§ Trade-off between loss of purity and gain in simplicity (a smaller tree)

[Figure: pruning – a decision node (Freq = 7) with two leaf children, (label = Y, Freq = 5) and (label = B, Freq = 2), is collapsed into a single leaf (label = Y, Freq = 7)]

Page 11

Prune Back: Cost Complexity

§ Cost complexity of a (sub)tree: classification error (based on the training data) plus a penalty for the size of the tree

[Figure: the same pruning diagram as on the previous slide]

$$\mathrm{tradeoff}(T) = \mathrm{Err}(T) + \alpha \, L(T)$$

§ Err(T) is the classification error
§ L(T) = number of leaves in T
§ The penalty factor α is between 0 and 1
– If α = 0, there is no penalty for a bigger tree
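For the pruning diagram above, the formula can be evaluated directly. A hedged example, assuming the two leaves are pure (so the subtree's training error on its 7 points is 0) and an illustrative α = 0.1:

```python
def tradeoff(err, n_leaves, alpha):
    """Cost complexity of a (sub)tree: Err(T) + alpha * L(T)."""
    return err + alpha * n_leaves

alpha = 0.1
keep  = tradeoff(err=0 / 7, n_leaves=2, alpha=alpha)  # subtree with 2 leaves
prune = tradeoff(err=2 / 7, n_leaves=1, alpha=alpha)  # collapsed: 2 of 7 mislabeled
print(keep, prune)  # 0.200 vs ~0.386: keep the subtree at this alpha
```

Once α exceeds 2/7 ≈ 0.29, the collapsed node has the lower cost and pruning wins – exactly the trade-off the slide describes.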

Page 12

Different Decision Tree Algorithms

§ Chi-square Automatic Interaction Detector (CHAID)
– Gordon Kass (1980)
– Stop subtree creation if not statistically significant by a chi-square test

§ Classification and Regression Trees (CART)
– Breiman et al. (1984)
– Decision tree building by the Gini index

§ Iterative Dichotomizer 3 (ID3)
– Ross Quinlan (1986)
– Splitting by information gain (difference in entropy)

§ C4.5
– Quinlan's next algorithm, improved over ID3
– Bottom-up pruning; both categorical and continuous variables
– Handling of incomplete data points

§ C5.0
– Ross Quinlan's commercial version

Page 13

Properties of Decision Trees

§ Non-parametric approach
– Does not require any prior assumptions regarding the probability distribution of the classes and attributes

§ Finding an optimal decision tree is an NP-complete problem
– Heuristics used: greedy, recursive partitioning, top-down construction, bottom-up pruning

§ Fast to generate, fast to classify
§ Easy to interpret or visualize
§ Error propagation
– An error at the top of the tree propagates all the way down

Page 14

References

§ Introduction to Data Mining, by Tan, Steinbach, Kumar
– Chapter 4 is available online: http://www-users.cs.umn.edu/~kumar/dmbook/ch4.pdf