Decision Tree Pruning


Problem Statement

• Goal: output a small decision tree (a model-selection problem)

• The tree is built until it reaches zero training error

• Option 1: Stop early, when a split yields only a small decrease in the index function. Cons: may miss structure

• Option 2: Prune after building.

Pruning

• Input: tree T

• Sample: S

• Output: Tree T’

• Basic pruning: T’ is a sub-tree of T; inner nodes can only be replaced by leaves

• More advanced: Replace an inner node by one of its children

Reduced Error Pruning

• Split the sample into two parts, S1 and S2

• Use S1 to build a tree

• Use S2 to decide whether to prune

• Process every inner node v after all its children have been processed: compute the observed error (on the S2 examples reaching v) of T_v and of leaf(v)

If leaf(v) has fewer errors, replace T_v by leaf(v)
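The bottom-up procedure above can be sketched in Python. This is a minimal sketch under assumptions not on the slide: binary features and labels, a hypothetical `Node` class that stores the majority label of the S1 examples reaching each node (used as the label of leaf(v)), and hypothetical helper names `predict`, `errors`, and `reduced_error_prune`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical representation: binary features, binary labels.
Example = Tuple[List[int], int]  # (feature vector x, label y)


@dataclass
class Node:
    label: int                      # majority label of the S1 examples reaching this node
    feature: Optional[int] = None   # index of the tested feature; None for a leaf
    left: Optional["Node"] = None   # subtree for x[feature] == 0
    right: Optional["Node"] = None  # subtree for x[feature] == 1

    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, x: List[int]) -> int:
    while not node.is_leaf():
        node = node.right if x[node.feature] else node.left
    return node.label


def errors(node: Node, sample: List[Example]) -> int:
    return sum(predict(node, x) != y for x, y in sample)


def reduced_error_prune(node: Node, s2: List[Example]) -> Node:
    """Prune bottom-up, judging each inner node v on the S2 examples that reach v."""
    if node.is_leaf():
        return node
    # Route the S2 examples to the children and process the children first.
    s2_left = [(x, y) for x, y in s2 if not x[node.feature]]
    s2_right = [(x, y) for x, y in s2 if x[node.feature]]
    node.left = reduced_error_prune(node.left, s2_left)    # modifies the tree in place
    node.right = reduced_error_prune(node.right, s2_right)
    # leaf(v): a single leaf at v predicting the stored majority label.
    leaf = Node(label=node.label)
    if errors(leaf, s2) < errors(node, s2):
        return leaf
    return node
```

Ties keep the subtree, matching "fewer errors"; pruning on ties instead is an equally reasonable variant that favors smaller trees.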

Reduced Error Pruning: Example

Pruning: CV & SRM

• For each pruning size, compute the minimal-error pruning of that size; there are at most m different sub-trees

• Select among these prunings using cross validation, structural risk minimization (SRM), or any other index method

Finding the minimum pruning

• Procedure Compute: find the smallest pruning of T that makes at most k errors on S

• Inputs: k: number of errors; T: tree; S: sample

• Output: P: pruned tree; size: size of P

Procedure compute

• IF IsLeaf(T):
    IF Errors(T) ≤ k THEN size = 1
    ELSE size = ∞
    P = T; return

• IF Errors(root(T)) ≤ k: size = 1; P = root(T); return

Procedure compute

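• FOR i = 0 TO k DO
    Call Compute(i, T[0], S0, size_{i,0}, P_{i,0})
    Call Compute(k − i, T[1], S1, size_{i,1}, P_{i,1})
  (here S0 and S1 are the examples of S that reach the children T[0] and T[1])

• size = min_i {size_{i,0} + size_{i,1} + 1}

• I = argmin_i {size_{i,0} + size_{i,1} + 1}

• P = MakeTree(root(T), P_{I,0}, P_{I,1})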

• What is the time complexity?
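A direct Python transcription of Compute may help when thinking about the question above. It is a sketch under assumed conventions: binary features, a hypothetical `Node` whose `label` is the prediction of root(T) when the subtree is collapsed, `math.inf` standing in for ∞, and a hypothetical helper `leaf_errors` for Errors(root(T)).

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Example = Tuple[List[int], int]  # (binary feature vector x, label y)


@dataclass
class Node:
    label: int                      # label predicted if this node becomes a leaf
    feature: Optional[int] = None   # tested feature; None for a leaf
    left: Optional["Node"] = None   # T[0]: subtree for x[feature] == 0
    right: Optional["Node"] = None  # T[1]: subtree for x[feature] == 1

    def is_leaf(self) -> bool:
        return self.feature is None


def leaf_errors(node: Node, sample: List[Example]) -> int:
    """Errors(root(T)): mistakes of a single leaf at node predicting node.label."""
    return sum(y != node.label for _, y in sample)


def compute(k: int, tree: Node, sample: List[Example]) -> Tuple[float, Node]:
    """Smallest pruning P of `tree` with at most k errors on `sample`.

    Returns (size, P); size is math.inf when no pruning meets the error budget.
    """
    if tree.is_leaf():
        return (1 if leaf_errors(tree, sample) <= k else math.inf), tree
    if leaf_errors(tree, sample) <= k:
        return 1, Node(label=tree.label)  # root(T): collapse the subtree into a leaf
    s0 = [(x, y) for x, y in sample if not x[tree.feature]]
    s1 = [(x, y) for x, y in sample if x[tree.feature]]
    best_size, best = math.inf, tree
    for i in range(k + 1):  # give i errors to T[0] and k - i errors to T[1]
        size0, p0 = compute(i, tree.left, s0)
        size1, p1 = compute(k - i, tree.right, s1)
        if size0 + size1 + 1 < best_size:
            best_size = size0 + size1 + 1
            best = Node(label=tree.label, feature=tree.feature, left=p0, right=p1)
    return best_size, best
```

As written, the recursion re-solves the same (k, subtree) pairs many times; storing, for every node, the answers for all budgets 0..k turns it into a dynamic program, which is the form whose running time the question asks about.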

Cross Validation

• Split the sample into S1 and S2

• Build a tree using S1

• Compute the candidate prunings

• Select using S2

• Output the tree with smallest error on S2
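The selection step above (a single split, as on the slide) can be sketched as follows. It assumes each candidate pruning is available as a predictor function; the names `holdout_error` and `select_by_cv` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[int], int]            # (feature vector x, label y)
Classifier = Callable[[List[int]], int]    # a candidate pruning, viewed as a predictor


def holdout_error(h: Classifier, s2: Sequence[Example]) -> int:
    """Number of S2 examples that candidate h misclassifies."""
    return sum(h(x) != y for x, y in s2)


def select_by_cv(candidates: Sequence[Classifier], s2: Sequence[Example]) -> int:
    """Index of the candidate with the fewest S2 errors; ties go to the earliest one."""
    errs = [holdout_error(h, s2) for h in candidates]
    return min(range(len(candidates)), key=errs.__getitem__)
```

If the candidates are listed from the smallest pruning to the largest, ties on S2 resolve toward the smaller tree.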

SRM

• Build a tree T using S

• Compute the candidate prunings

• k_d: the size of the smallest pruning with at most d errors

• Select using the SRM formula

$$\min_d \left\{ \mathrm{obs}(T_{k_d}) + \sqrt{\frac{k_d}{m}} \right\}$$
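The SRM selection can be sketched as a scan over d. This assumes the rule min_d {obs(T_{k_d}) + sqrt(k_d/m)}, with obs measured as the fraction of the m training examples that the pruning misclassifies; the function name `srm_select` and its list-based inputs are hypothetical.

```python
import math
from typing import Sequence


def srm_select(sizes: Sequence[float], errors: Sequence[int], m: int) -> int:
    """Return the d minimising obs(T_{k_d}) + sqrt(k_d / m).

    sizes[d]  = k_d, size of the smallest pruning with at most d errors (math.inf if none)
    errors[d] = number of training errors of that pruning (at most d)
    """
    best_d, best_score = -1, math.inf
    for d, (k_d, err) in enumerate(zip(sizes, errors)):
        if math.isinf(k_d):
            continue  # no pruning achieves at most d errors
        score = err / m + math.sqrt(k_d / m)  # observed error + complexity penalty
        if score < best_score:
            best_d, best_score = d, score
    return best_d
```

For example, with m = 100, sizes [25, 9, 1] and errors [0, 1, 2] the scores are 0.5, 0.31 and 0.12, so the smallest tree wins despite its two training errors.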

Drawbacks

• Running time: since |T| = O(m), the running time is O(m²), with many passes over the data

• Significant drawback for large data sets

Linear Time Pruning

• A single bottom-up pass: linear time

• Uses an SRM-like formula: local soundness

• Competitive with any pruning

Algorithm

• Process a node after processing its children

• Local parameters:
    T_v: current sub-tree at v, of size size_v
    S_v: sample reaching v, of size m_v
    l_v: length of the path leading to v

• Local test: obs(T_v, S_v) + a(m_v, size_v, l_v, δ) > obs(root(T_v), S_v)

If the test holds, replace T_v by the leaf root(T_v)
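The single bottom-up pass can be sketched in Python with the penalty a() passed in as a function. Assumptions not on the slide: binary features, a hypothetical `Node` class whose `label` is used for root(T_v), obs measured as an error fraction on S_v, the root at path length 0, and a hypothetical function name `prune_bottom_up`. Error counts and sizes are returned upward so no subtree is re-evaluated.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Example = Tuple[List[int], int]                    # (binary feature vector x, label y)
Penalty = Callable[[int, int, int, float], float]  # a(m_v, size_v, l_v, delta)


@dataclass
class Node:
    label: int                      # label used when v is turned into a leaf
    feature: Optional[int] = None   # tested feature; None for a leaf
    left: Optional["Node"] = None   # subtree for x[feature] == 0
    right: Optional["Node"] = None  # subtree for x[feature] == 1

    def is_leaf(self) -> bool:
        return self.feature is None


def prune_bottom_up(node: Node, s_v: List[Example], l_v: int,
                    a: Penalty, delta: float) -> Tuple[Node, int, int]:
    """Return (pruned T_v, its number of errors on S_v, size_v)."""
    leaf_err = sum(y != node.label for _, y in s_v)  # errors of root(T_v) on S_v
    if node.is_leaf():
        return node, leaf_err, 1
    s0 = [(x, y) for x, y in s_v if not x[node.feature]]
    s1 = [(x, y) for x, y in s_v if x[node.feature]]
    left, err0, size0 = prune_bottom_up(node.left, s0, l_v + 1, a, delta)
    right, err1, size1 = prune_bottom_up(node.right, s1, l_v + 1, a, delta)
    m_v, size_v, err_v = len(s_v), size0 + size1 + 1, err0 + err1
    # Local test; with no examples at v we prune, since a() grows without bound as m_v -> 0.
    if m_v == 0 or err_v / m_v + a(m_v, size_v, l_v, delta) > leaf_err / m_v:
        return Node(label=node.label), leaf_err, 1
    node.left, node.right = left, right  # keep v, with its already-pruned children
    return node, err_v, size_v
```

Each example is routed once through every node on its path, so the work is proportional to the sum of the m_v over the nodes visited, and each node is tested exactly once.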

The function a()

• Parameters:
    paths(H,l): set of paths of length l over H
    trees(H,s): set of trees of size s over H

• Formula:

$$a(m,\mathrm{size},l,\delta) = c\sqrt{\frac{\log|\mathrm{paths}(H,l)| + \log|\mathrm{trees}(H,\mathrm{size})| + \log(m/\delta)}{m}}$$

The function a()

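• Finite class H: |paths(H,l)| < |H|^l and |trees(H,s)| < (4|H|)^s

• Formula (the log 4 term is absorbed into the constant c):

$$a(m,\mathrm{size},l,\delta) = c\sqrt{\frac{(l+\mathrm{size})\log|H| + \log(m/\delta)}{m}}$$

• Infinite classes: use the VC dimension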
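The finite-class penalty can be evaluated directly. In this sketch the function name `penalty_finite` is hypothetical, `h_size` stands for |H|, and c = 1.0 is an assumed default since the slide leaves the constant unspecified.

```python
import math


def penalty_finite(m: int, size: int, l: int, delta: float,
                   h_size: int, c: float = 1.0) -> float:
    """a(m, size, l, delta) = c * sqrt(((l + size) * log|H| + log(m / delta)) / m).

    h_size is |H|; c is the unspecified constant from the slide (1.0 is an assumption).
    """
    if m <= 0:
        return math.inf  # no examples reach the node: the penalty is unbounded
    return c * math.sqrt(((l + size) * math.log(h_size) + math.log(m / delta)) / m)
```

The penalty shrinks roughly like sqrt(1/m) in the number of examples reaching the node and grows with both the depth l and the subtree size, so deep, large, sparsely populated subtrees are the first to be pruned.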

Example

[Figure: a node v with l_v = 3, annotated with size_v, m_v, m and the penalty a(m_v, size_v, l_v, δ).]

Local uniform convergence

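• Sample S; S_c = { x ∈ S | c(x) = 1 }, m_c = |S_c|

• Finite classes C and H: e(h|c) = Pr[h(x) ≠ f(x) | c(x) = 1], and obs(h|c) is the corresponding observed error on S_c

• Lemma: with probability 1 − δ, for every c ∈ C and h ∈ H,

$$|e(h\mid c) - \mathrm{obs}(h\mid c)| \le \sqrt{\frac{\log|C| + \log|H| + \log(1/\delta)}{m_c}}$$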
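The lemma's bound, and the number of examples m_c it needs for a target deviation ε (obtained by solving the bound for m_c), can be computed as below. The function names `local_deviation` and `samples_needed` are hypothetical.

```python
import math


def local_deviation(m_c: int, c_size: int, h_size: int, delta: float) -> float:
    """Right-hand side of the lemma: sqrt((log|C| + log|H| + log(1/delta)) / m_c)."""
    return math.sqrt((math.log(c_size) + math.log(h_size) + math.log(1 / delta)) / m_c)


def samples_needed(eps: float, c_size: int, h_size: int, delta: float) -> int:
    """Smallest m_c for which the bound above is at most eps (solve the formula for m_c)."""
    return math.ceil((math.log(c_size) + math.log(h_size) + math.log(1 / delta)) / eps ** 2)
```

With |C| = |H| = 10, δ = 0.05 and ε = 0.1 this gives m_c = 761: the guarantee is local, so it is the number of examples satisfying c(x) = 1 that matters, not the total sample size.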

Global Analysis

• Notation:
    T: original tree (depends on S)
    T*: pruned tree
    T_opt: optimal tree

$$r_v = (l_v + \mathrm{size}_v)\log|H| + \log(m_v/\delta)$$

$$a(m_v,\mathrm{size}_v,l_v,\delta) = O\!\left(\sqrt{r_v/m_v}\right)$$

Sub-Tree Property

• Lemma: with probability 1 − δ, T* is a sub-tree of T_opt

• Proof: Assume all the local lemmas hold. Each pruning reduces the error. Assume T* has a subtree outside T_opt.

Adding that subtree to T_opt would improve it, contradicting the optimality of T_opt.

Comparing T* and Topt

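• Additional pruned nodes (pruned in T* but not in T_opt): V = {v_1, … , v_t}

• Additional error:

$$e(T^*) - e(T_{opt}) = \sum_{i=1}^{t}\bigl(e(v_i) - e(T^{opt}_{v_i})\bigr)\Pr[v_i]$$

where e(v_i) is the error of the leaf at v_i and T^{opt}_{v_i} is the subtree of T_opt rooted at v_i, both conditioned on reaching v_i

• Claim: with high probability

$$e(T^*) - e(T_{opt}) \le 4\sum_{i=1}^{t}\Pr[v_i]\sqrt{\frac{r_{v_i}}{m_{v_i}}}$$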

Analysis

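• Lemma: with probability 1 − δ, if Pr[v_i] > 12(l_opt log|H| + log(1/δ))/m = b, then Pr[v_i] ≤ 2 obs(v_i), where obs(v_i) = m_{v_i}/m is the observed fraction of S reaching v_i

• Proof: relative Chernoff bound, with a union bound over the |H|^{l_opt} paths

• V′ = {v_i ∈ V | Pr[v_i] > b}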

Analysis of

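• The sum over V − V′ is bounded by s_opt · b

$$\sum_{v_i\in V}\sqrt{\frac{r_{v_i}}{m_{v_i}}}\Pr[v_i] = \sum_{v_i\in V-V'}\sqrt{\frac{r_{v_i}}{m_{v_i}}}\Pr[v_i] + \sum_{v_i\in V'}\sqrt{\frac{r_{v_i}}{m_{v_i}}}\Pr[v_i]$$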

Analysis of

• Sum of m_v < l_opt · size_opt

• Sum of r_v < size_opt (l_opt log|H| + log(m/δ))

• Putting it all together

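For v_i ∈ V′, Pr[v_i] ≤ 2m_{v_i}/m with high probability; then apply Cauchy-Schwarz:

$$\sum_{v_i\in V'}\sqrt{\frac{r_{v_i}}{m_{v_i}}}\Pr[v_i] \le \sum_{v_i\in V'}\frac{2\sqrt{r_{v_i}\,m_{v_i}}}{m} \le \frac{2}{m}\sqrt{\Bigl(\sum_{v_i\in V'} r_{v_i}\Bigr)\Bigl(\sum_{v_i\in V'} m_{v_i}\Bigr)}$$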