Problem Statement
• We would like to output a small decision tree (model selection).
• The tree is built until it reaches zero training error.
• Option 1: stop early, when the index function shows only a small decrease. Con: may miss structure.
• Option 2: prune after building.
Pruning
• Input: tree T
• Sample: S
• Output: Tree T’
• Basic pruning: T’ is a sub-tree of T; it can only replace inner nodes by leaves.
• More advanced: replace an inner node by one of its children.
Reduced Error Pruning
• Split the sample into two parts, S1 and S2.
• Use S1 to build a tree.
• Use S2 to decide whether to prune.
• Process every inner node v after all its children have been processed: compute the observed error of Tv and of leaf(v).
• If leaf(v) has fewer errors, replace Tv by leaf(v).
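The procedure above can be sketched in code. This is a minimal illustration, not the lecture's implementation: the Node class, its field names, and the binary-attribute split convention are all assumptions.

```python
class Node:
    """A binary decision-tree node; attr is None at leaves (an assumed layout)."""
    def __init__(self, label, attr=None, left=None, right=None):
        self.label = label          # majority label at this node
        self.attr = attr            # index of the split attribute
        self.left, self.right = left, right

    def is_leaf(self):
        return self.attr is None

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.attr] == 0 else node.right
    return node.label

def subtree_errors(node, sample):
    """Observed errors of T_v on the sample reaching v."""
    return sum(1 for x, y in sample if predict(node, x) != y)

def leaf_errors(node, sample):
    """Observed errors of leaf(v): predict v's majority label everywhere."""
    return sum(1 for _, y in sample if y != node.label)

def reduced_error_prune(node, sample):
    """Process every inner node after its children; keep leaf(v) if it
    makes no more errors than T_v on the pruning sample S2."""
    if node.is_leaf():
        return node
    left_s  = [(x, y) for x, y in sample if x[node.attr] == 0]
    right_s = [(x, y) for x, y in sample if x[node.attr] == 1]
    node.left  = reduced_error_prune(node.left, left_s)
    node.right = reduced_error_prune(node.right, right_s)
    if leaf_errors(node, sample) <= subtree_errors(node, sample):
        return Node(node.label)     # replace T_v by leaf(v)
    return node
```

Note that the pass is bottom-up: the children are pruned first, so the comparison at v is against the already-pruned sub-tree Tv.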
Pruning: CV & SRM
• For each pruning size, compute the minimal-error pruning; there are at most m different sub-trees.
• Select between the prunings using cross validation, structural risk minimization, or any other index method.
Finding the minimum pruning
• Procedure: Compute
• Inputs: k: allowed number of errors; T: tree; S: sample
• Output: P: pruned tree; size: the size of P
Procedure Compute
• IF IsLeaf(T):
  IF Errors(T) ≤ k THEN size = 1 ELSE size = ∞;
  P = T; return
• IF Errors(root(T)) ≤ k: size = 1; P = root(T); return
Procedure Compute (cont.)
• FOR i = 0 TO k DO:
  Call Compute(i, T[0], S0, size_{i,0}, P_{i,0})
  Call Compute(k−i, T[1], S1, size_{i,1}, P_{i,1})
• size = min_i { size_{i,0} + size_{i,1} + 1 }
• I = argmin_i { size_{i,0} + size_{i,1} + 1 }
• P = MakeTree(root(T), P_{I,0}, P_{I,1})
• What is the time complexity?
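The recursion above can be sketched as follows, assuming the same hypothetical Node layout as before; the budget k for the allowed errors is split between the two children in every possible way.

```python
import math

class Node:
    """Hypothetical binary tree node; attr is None at leaves."""
    def __init__(self, label, attr=None, left=None, right=None):
        self.label, self.attr = label, attr
        self.left, self.right = left, right

    def is_leaf(self):
        return self.attr is None

def leaf_errors(node, sample):
    """Errors if the subtree were collapsed to a leaf with node's label."""
    return sum(1 for _, y in sample if y != node.label)

def compute(k, T, S):
    """Smallest pruning of T with at most k errors on S.
    Returns (size, pruned_tree); size is math.inf when none exists."""
    # Base cases: a leaf, or replacing the whole subtree by a single leaf.
    if T.is_leaf():
        return (1, T) if leaf_errors(T, S) <= k else (math.inf, T)
    if leaf_errors(T, S) <= k:
        return 1, Node(T.label)
    # Split the sample and the error budget between the two children.
    S0 = [(x, y) for x, y in S if x[T.attr] == 0]
    S1 = [(x, y) for x, y in S if x[T.attr] == 1]
    best_size, best_tree = math.inf, None
    for i in range(k + 1):
        size0, P0 = compute(i, T.left, S0)
        size1, P1 = compute(k - i, T.right, S1)
        if size0 + size1 + 1 < best_size:
            best_size = size0 + size1 + 1
            best_tree = Node(T.label, T.attr, P0, P1)
    return best_size, best_tree
```

Each node is visited once per value of the error budget, which is the source of the time-complexity question on the slide.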
Cross Validation
• Split the sample into S1 and S2.
• Build a tree using S1.
• Compute the candidate prunings.
• Select using S2: output the pruning with the smallest error on S2.
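The selection step is simple enough to state generically. In this sketch each candidate pruning is treated as a classifier callable, which is an assumption for illustration.

```python
def holdout_error(h, sample):
    """Observed error of hypothesis h on the held-out sample."""
    return sum(1 for x, y in sample if h(x) != y) / len(sample)

def select_by_cv(candidates, S2):
    """Return the candidate pruning with the smallest observed error on S2."""
    return min(candidates, key=lambda h: holdout_error(h, S2))
```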
SRM
• Build a tree T using S.
• Compute the candidate prunings.
• Let kd be the size of the pruning with d errors.
• Select using the SRM formula:
  min_d { obs(T_{k_d}) + sqrt( (k_d · log|H| + log(1/δ)) / m ) }
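The trade-off the SRM rule makes can be sketched numerically. The exact penalty is garbled on the slide, so the form sqrt((k_d·log|H| + log(1/δ))/m) used here is an assumption, chosen to match the r_v definition that appears later in the notes.

```python
import math

def srm_score(obs_errors, k_d, m, H_size, delta):
    """Observed error rate plus a complexity penalty growing with the size k_d.
    The penalty form is an assumed SRM-style bound, not the slide's exact one."""
    return obs_errors / m + math.sqrt(
        (k_d * math.log(H_size) + math.log(1 / delta)) / m)

def srm_select(candidates, m, H_size, delta=0.05):
    """candidates: list of (obs_errors, k_d) pairs; return the index of the
    pruning minimizing the SRM score."""
    return min(range(len(candidates)),
               key=lambda d: srm_score(*candidates[d], m, H_size, delta))
```

With a small sample the penalty dominates and a small pruning with a few errors beats a large zero-error pruning; with more errors the balance flips.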
Drawbacks
• Running time: since |T| = O(m), the running time is O(m^2), with many passes over the data.
• This is a significant drawback for large data sets.
Linear Time Pruning
• A single bottom-up pass: linear time.
• Uses an SRM-like formula: local soundness.
• Competitive with any pruning.
Algorithm
• Process a node after processing its children.
• Local parameters:
  Tv: the current sub-tree at v, of size sizev
  Sv: the sample reaching v, of size mv
  lv: the length of the path leading to v
• Local test: prune v when obs(Tv, Sv) + a(mv, sizev, lv, δ) > obs(root(Tv), Sv)
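The single bottom-up pass with the local test can be sketched as below. The Node layout, the constant c = 1 inside a(), and the per-node recomputation of observed errors are assumptions for illustration; a faithful linear-time version would maintain the error counts incrementally instead of recomputing them.

```python
import math

class Node:
    """Hypothetical binary tree node; attr is None at leaves."""
    def __init__(self, label, attr=None, left=None, right=None):
        self.label, self.attr = label, attr
        self.left, self.right = left, right

    def is_leaf(self):
        return self.attr is None

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.attr] == 0 else node.right
    return node.label

def a(m_v, size_v, l_v, H_size, delta, c=1.0):
    """a(m_v, size_v, l_v, delta) = c*sqrt(((l_v+size_v)*log|H| + log(m_v/delta))/m_v);
    c is an unspecified constant, set to 1 here as an assumption."""
    return c * math.sqrt(((l_v + size_v) * math.log(H_size)
                          + math.log(m_v / delta)) / m_v)

def bottom_up_prune(node, sample, l_v, H_size, delta):
    """Single bottom-up pass; returns (pruned subtree, its size).
    Prune v when obs(leaf(v)) <= obs(T_v, S_v) + a(m_v, size_v, l_v, delta)."""
    if node.is_leaf():
        return node, 1
    m_v = len(sample)
    if m_v == 0:                       # no data reaches v: collapse to a leaf
        return Node(node.label), 1
    S0 = [(x, y) for x, y in sample if x[node.attr] == 0]
    S1 = [(x, y) for x, y in sample if x[node.attr] == 1]
    node.left,  size0 = bottom_up_prune(node.left,  S0, l_v + 1, H_size, delta)
    node.right, size1 = bottom_up_prune(node.right, S1, l_v + 1, H_size, delta)
    size_v = size0 + size1 + 1
    obs_tree = sum(1 for x, y in sample if predict(node, x) != y) / m_v
    obs_leaf = sum(1 for _, y in sample if y != node.label) / m_v
    if obs_leaf <= obs_tree + a(m_v, size_v, l_v, H_size, delta):
        return Node(node.label), 1
    return node, size_v
```

On a tiny sample the penalty a() dominates and the pass prunes aggressively; as m_v grows, a() shrinks like sqrt(1/m_v) and informative splits survive.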
The function a()
• Parameters:
  paths(H,l): the set of paths of length l over H
  trees(H,s): the set of trees of size s over H
• Formula:
  a(m, size, l, δ) = c · sqrt( ( log|paths(H,l)| + log|trees(H,size)| + log(m/δ) ) / m )
The function a()
• Finite class H: |paths(H,l)| < |H|^l and |trees(H,s)| < (4|H|)^s.
• Formula:
  a(m, size, l, δ) = c · sqrt( ( (l + size) · log|H| + log(m/δ) ) / m )
• Infinite classes: use the VC dimension.
Local uniform convergence
• Sample S; Sc = { x ∈ S | c(x)=1 }, mc = |Sc|
• Finite classes C and H; e(h|c) = Pr[ h(x) ≠ f(x) | c(x)=1 ], with obs(h|c) its observed counterpart
• Lemma: with probability 1−δ,
  | e(h|c) − obs(h|c) | ≤ sqrt( ( log|C| + log|H| + log(1/δ) ) / mc )
Global Analysis
• Notation:
  T: the original tree (depends on S)
  T*: the pruned tree
  Topt: the optimal tree
• rv = (lv + sizev) · log|H| + log(mv/δ)
• a(mv, sizev, lv, δ) = O( sqrt{ rv/mv } )
Sub-Tree Property
• Lemma: with probability 1−δ, T* is a sub-tree of Topt.
• Proof: assume all the local lemmas hold; then each pruning step reduces the error. Suppose T* had a subtree outside Topt: adding that subtree to Topt would improve it, a contradiction.
Comparing T* and Topt
• Additional pruned nodes: V={v1, … , vt}
• Additional error: e(T*) − e(Topt) = Σ_i ( e(vi) − e(T*_{vi}) ) · Pr[vi]
• Claim: with high probability,
  e(T*) − e(Topt) ≤ Σ_{i=1}^{t} 4 · Pr[vi] · sqrt( r_{vi} / m_{vi} )
Analysis
• Lemma: with probability 1−δ, if Pr[vi] > 12( lopt · log|H| + log(1/δ) )/m = b, THEN Pr[vi] > 2 · obs(vi).
• Proof: relative Chernoff bound, with a union bound over the |H|^l paths.
• V’ = { vi ∈ V | Pr[vi] > b }