Decision Trees - Sewoong Oh, CSE/STAT 416, University of Washington


Page 1:

Decision Trees

Sewoong Oh

CSE/STAT 416, University of Washington

Page 3:

Decision trees: An interpretable predictor

Page 4:

Example: predicting potential loan defaults
• Data: discrete for now
• Goal: Given a new loan application, predict whether the applicant will default on the loan
• Learning: fitting a model to data
• Inference: making a prediction using a learned model

Credit     Term   Income  y
excellent  3 yrs  high    safe
fair       5 yrs  low     risky
fair       3 yrs  high    safe
poor       5 yrs  high    risky
excellent  3 yrs  low     risky
fair       5 yrs  low     safe
poor       3 yrs  high    risky
poor       5 yrs  low     safe
fair       3 yrs  high    safe

• Credit: Did I pay previous loans on time? Example: excellent, good, or fair
• Income: What's my income? Example: > $80K per year
• Term: How soon do I need to pay the loan? Example: 3 years, 5 years, …

predictor: f(x) ≈ y

Page 5:

Decision tree

f(poor credit, high income, 3 years) = ?

Start
Credit?
  excellent → Safe
  fair → Term?
    3 years → Risky
    5 years → Safe
  poor → Income?
    low → Risky
    high → Term?
      3 years → Risky
      5 years → Safe

• Each internal node tests a feature x[i]
• Each branch assigns a feature value x[i]=fair (or a subset of feature values {fair, poor})
• Each leaf node assigns a class y
• To predict, traverse the tree from root to a leaf
• Decision trees are naturally human interpretable!
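To make the traversal concrete, here is a minimal sketch (not from the slides) that encodes the tree above as nested Python dicts, with the assumed convention that an internal node stores the tested feature and its branches, while a leaf stores the class label.

```python
# A minimal sketch (assumed representation): the loan-default tree above as nested dicts.
TREE = {
    "feature": "credit",
    "branches": {
        "excellent": "safe",
        "fair": {"feature": "term",
                 "branches": {"3 years": "risky", "5 years": "safe"}},
        "poor": {"feature": "income",
                 "branches": {"low": "risky",
                              "high": {"feature": "term",
                                       "branches": {"3 years": "risky",
                                                    "5 years": "safe"}}}},
    },
}

def predict(node, x):
    """Traverse from the root to a leaf, following the branch that matches x."""
    while isinstance(node, dict):                  # internal node: test a feature
        node = node["branches"][x[node["feature"]]]
    return node                                    # leaf: the predicted class

# The query from this slide: poor credit, high income, 3-year term.
print(predict(TREE, {"credit": "poor", "income": "high", "term": "3 years"}))  # risky
```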

Page 6:

What functions can be represented?
• For discrete input and output data, any function of the input can be represented as a decision tree
• However, in general, it could require exponentially many nodes to represent an arbitrary function (exponential in the dimension of the input)
• For example, a function that is sensitive to a small change in the input

Parity function (the label flips whenever any single input bit flips):

x[1]  x[2]  x[3]  Y
0     0     0     0
0     0     1     1
0     1     0     1
1     0     0     1
0     1     1     0
1     0     1     0
1     1     0     0
1     1     1     1

Its decision tree must test x[1], then x[2], then x[3] on every path, giving 8 leaves with labels 0 1 1 0 1 0 0 1.

Simple function (Y depends only on x[1]):

x[1]  x[2]  x[3]  Y
0     0     0     1
0     0     1     1
0     1     0     1
1     0     0     0
0     1     1     1
1     0     1     0
1     1     0     0
1     1     1     0

Its decision tree is a single split on x[1]: x[1] = 0 → 1, x[1] = 1 → 0.

Page 7:

Which tree is better?
• Trade-off between accuracy vs. simplicity
• Accuracy is measured by the fraction of correct predictions (formula below)
• Simplicity can be measured by depth, number of leaves, etc.
• If your decision tree has just the root node, what should the decision be, and what are the accuracy and the error?

accuracy = (# of correct predictions) / (# of examples)

Training data (visualizing a decision tree and data): N = 40 examples, with 22 Safe loans and 18 Risky loans. The root node is drawn with its label counts: Root (Safe 22, Risky 18).
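As a quick check of the root-only question above, here is a tiny sketch (assumed helper names, not from the slides): with no splits, the tree predicts the majority class, and its accuracy is the majority count divided by N.

```python
# A minimal sketch: majority-rule prediction and accuracy at a single node.
counts = {"safe": 22, "risky": 18}          # label counts at the root (N = 40)

prediction = max(counts, key=counts.get)    # majority class
accuracy = counts[prediction] / sum(counts.values())

print(prediction, accuracy)                 # safe 0.55  (so the error is 18/40 = 0.45)
```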

Page 8:

Decision stump: single-level tree
• We grow the tree by adding one more level of branching, and deciding which hypothesis (or feature) to test at the branch
• At an intermediate node, the prediction is determined by the majority rule
• In a greedy approach, we choose the hypothesis that gives better accuracy at the intermediate nodes; credit: 32/40 = 0.8, term: 30/40 = 0.75

Choice 1: Split on Credit
Root (Safe 22, Risky 18)
  excellent: 9 Safe, 0 Risky → Safe
  fair: 9 Safe, 4 Risky → Safe
  poor: 4 Safe, 14 Risky → Risky

Choice 2: Split on Term
Root (Safe 22, Risky 18)
  3 years: 16 Safe, 4 Risky → Safe
  5 years: 6 Safe, 14 Risky → Risky

(The nodes at the ends of the new branches are intermediate nodes; the node being split is an internal node.)
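A small sketch of the greedy comparison above (the helper split_accuracy is an assumed name, not from the slides): each branch predicts its majority label, and a candidate split is scored by the fraction of the 40 loans it gets right.

```python
# Score a candidate split by majority-rule accuracy, given (safe, risky) counts per branch.
def split_accuracy(branch_counts, n_total):
    # Each branch predicts its majority label, so it gets max(counts) examples right.
    correct = sum(max(safe, risky) for safe, risky in branch_counts)
    return correct / n_total

# Counts from this slide (40 loans total):
credit_split = [(9, 0), (9, 4), (4, 14)]   # excellent, fair, poor
term_split   = [(16, 4), (6, 14)]          # 3 years, 5 years

print(split_accuracy(credit_split, 40))    # 0.8  -> the greedy algorithm picks Credit
print(split_accuracy(term_split, 40))      # 0.75
```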

Page 9:

Greedy algorithm for growing a decision tree
• Start with the root node as an intermediate node
• Repeat while there exists an intermediate node:
  • Choose a feature x[i] to split at the intermediate node that maximizes the accuracy
  • Change the intermediate node into an internal node branching on x[i]
  • Add intermediate nodes to each branch
  • If an intermediate node meets the stopping rule, change it to a leaf node and make a prediction
• Stopping rule:
  • 1. Do not branch if, at that intermediate node, all data have the same label (perfect prediction)
  • 2. Do not branch if there is no feature left to branch on
A recursive sketch of this procedure is given below.
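Here is that recursive sketch, under the assumption that data is a list of (feature-dict, label) pairs; the returned tree uses the same nested-dict format as the predict() sketch on page 5. This is an illustration, not the course's reference implementation.

```python
from collections import Counter

def majority(data):
    return Counter(label for _, label in data).most_common(1)[0][0]

def grow_tree(data, features):
    labels = {label for _, label in data}
    if len(labels) == 1 or not features:          # stopping rules 1 and 2
        return majority(data)                     # leaf: majority prediction

    def accuracy_of(feature):
        # Majority-rule accuracy of the stump that splits this node's data on `feature`.
        groups = {}
        for x, y in data:
            groups.setdefault(x[feature], []).append(y)
        correct = sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())
        return correct / len(data)

    best = max(features, key=accuracy_of)         # greedy choice of split
    node = {"feature": best, "branches": {}}
    remaining = [f for f in features if f != best]
    for value in {x[best] for x, _ in data}:
        subset = [(x, y) for x, y in data if x[best] == value]
        node["branches"][value] = grow_tree(subset, remaining)
    return node

# Usage (with `loans` in the same format as the table on page 4):
#   tree = grow_tree(loans, ["credit", "term", "income"])
```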

Page 10:

Greedy approach

Split on Credit at the root (Safe 22, Risky 18):
  excellent: 9 Safe, 0 Risky → Safe (all data points are Safe, so nothing else to do with this subset of data)
  fair: 9 Safe, 4 Risky
  poor: 4 Safe, 14 Risky

Page 11:

Greedy approach

Split on Credit at the root (Safe 22, Risky 18):
  excellent: 9 Safe, 0 Risky → Safe
  fair: 9 Safe, 4 Risky → build a decision stump with the subset of data where Credit = fair
  poor: 4 Safe, 14 Risky → build a decision stump with the subset of data where Credit = poor

Page 12:

End of second level

Root (Safe 22, Risky 18), split on Credit:
  excellent: 9 Safe, 0 Risky → Safe
  fair: 9 Safe, 4 Risky → split on Term:
    3 years: 0 Safe, 4 Risky → Risky
    5 years: 9 Safe, 0 Risky → Safe
  poor: 4 Safe, 14 Risky → split on Income:
    high: 4 Safe, 5 Risky → build another stump on these data points
    low: 0 Safe, 9 Risky → Risky

Page 13:

Final decision tree

• Branching only increases the accuracy (and decreases the error)
• The final accuracy is 37/40
• Why not branch further?

Root (Safe 22, Risky 18), split on Credit:
  excellent: 9 Safe, 0 Risky → Safe
  fair: 9 Safe, 4 Risky → split on Term:
    3 years: 0 Safe, 4 Risky → Risky
    5 years: 9 Safe, 0 Risky → Safe
  poor: 4 Safe, 14 Risky → split on Income:
    low: 0 Safe, 9 Risky → Risky
    high: 4 Safe, 5 Risky → split on Term:
      3 years: 0 Safe, 2 Risky → Risky
      5 years: 4 Safe, 3 Risky → Safe

Page 14:

Another potential early stopping rule
• If a branching does not increase accuracy, should we still branch?

XOR data:

x[1]   x[2]   y
FALSE  FALSE  FALSE
FALSE  TRUE   TRUE
TRUE   FALSE  TRUE
TRUE   TRUE   FALSE

Root: 2 2 (two examples of each label)

Tree     Classification error
(root)   0.5

Page 15:

Another potential early stopping rule
• If a branching does not increase accuracy, should we still branch?

Tree           Classification error
(root)         0.5
Split on x[1]  0.5

(Same XOR data as the previous page.) Splitting the root (2 2) on x[1] gives branches x[1]=False (1 1) and x[1]=True (1 1); each branch is still an even mix of labels, so the error stays 0.5.

Page 16:

Another potential early stopping rule
• If a branching does not increase accuracy, should we still branch? Yes.

(Same XOR data.) Splitting the root (2 2) on x[1] and then each branch on x[2] separates all four examples:

  x[1]=False, x[2]=False → predict False
  x[1]=False, x[2]=True  → predict True
  x[1]=True,  x[2]=False → predict True
  x[1]=True,  x[2]=True  → predict False

Tree                Classification error
(root)              0.5
Split on x[1]       0.5
Split on x[1],x[2]  0

Page 17:

Decision trees on real-valued data

Page 18:

Binary branching on real-valued data

Income  Credit     Term   y
$105K   excellent  3 yrs  Safe
$112K   good       5 yrs  Risky
$73K    fair       3 yrs  Safe
$69K    excellent  5 yrs  Safe
$217K   excellent  3 yrs  Risky
$120K   good       5 yrs  Safe
$64K    fair       3 yrs  Risky
$340K   excellent  5 yrs  Safe
$60K    good       3 yrs  Risky

Root (Safe 22, Risky 18), split on Income:
  < $60K: 8 Safe, 13 Risky
  >= $60K: 14 Safe, 5 Risky (the subset of data with Income >= $60K)

• Is there any gain in ternary or k-ary branching?

Page 19:

• Suppose we want to branch on a real-valued feature x[i]; how do we choose the threshold?
• Between two adjacent data points, it does not make any difference where you split
• So there are really only a finite number of candidate thresholds
• We choose the one with the lowest error (or maximum accuracy); a small sketch follows the figure below

(Figure: data points labeled Safe/Risky along the Income axis from $10K to $120K; two candidate thresholds vA and vB that fall between the same pair of adjacent points produce exactly the same split.)
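A minimal sketch of the threshold search (the helper best_threshold is an assumed name, not from the slides): candidate thresholds are midpoints between consecutive distinct sorted values, and each candidate stump is scored by majority-rule classification error. The toy data is the nine-loan table from page 18, with income in thousands of dollars.

```python
def best_threshold(values, labels):
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    # Candidate thresholds: midpoints between consecutive distinct sorted values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:]) if a != b]

    def error(t):
        # Each side of the split predicts its majority label.
        left  = [y for v, y in pairs if v < t]
        right = [y for v, y in pairs if v >= t]
        mistakes = sum(min(side.count("safe"), side.count("risky"))
                       for side in (left, right) if side)
        return mistakes / len(pairs)

    return min(candidates, key=error)

# The nine loans from the table on page 18 (income in $K):
income = [60, 64, 69, 73, 105, 112, 120, 217, 340]
label  = ["risky", "risky", "safe", "safe", "safe", "risky", "safe", "risky", "safe"]
print(best_threshold(income, label))   # 66.5 (2 mistakes out of 9)
```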

Page 20:

Growing a decision tree for real-valued data

(Figure: training data plotted with Age on the horizontal axis and Income on the vertical axis ($0K to $80K); the splits below appear as axis-aligned cuts of this plane. Panels: Input data, Step 1, Step 2.)

Step 1: split the root (counts 9 8) on Age at 38; the age < 38 region predicts Risky and the age >= 38 region predicts Safe.
Step 2: further split the age >= 38 node (counts 4 5) on Income at $48K: income < $48K (4 0) vs. income >= $48K (0 5); the age < 38 node (counts 5 3) is left as a leaf.

Page 21:

Class is 31 body parts: LU/RU/LW/RW head, neck, L/R shoulder, LU/RU/LW/RW arm, L/R elbow, L/R wrist, L/R hand, LU/RU/LW/RW torso, LU/RU/LW/RW leg, L/R knee, L/R ankle, L/R foot (Left, Right, Upper, loWer).

Page 22:

(Figure: the features used and the resulting decision tree for the body-part classification example.)

Page 23:

Classification with decision trees
• Data

Root (18 −, 13 +), split on x[1]:
  x[1] < -0.07: 13 −, 3 +
  x[1] >= -0.07: 4 −, 11 +

Page 24:

Root (18 −, 13 +), split on x[1]:
  x[1] < -0.07 (13 −, 3 +) → split on x[1] again:
    x[1] < -1.66: 7 −, 0 +
    x[1] >= -1.66: 6 −, 3 +
  x[1] >= -0.07 (4 −, 11 +) → split on x[2]:
    x[2] < 1.55: 1 −, 11 +
    x[2] >= 1.55: 3 −, 0 +

(y values listed as − +)

For threshold splits, the same feature can be used multiple times

Page 25:

Evolution of decision boundaries
• Decision boundaries get more complicated as we increase the model complexity (here measured by the depth of the decision tree)

(Figure: decision boundaries of logistic regression with degree 1, 2, and 6 features vs. decision trees of depth 1, 3, and 10.)

Page 26:

Probabilistic prediction with decision trees
• By taking the empirical fraction of each label among the training data in that leaf, we can make probabilistic predictions

Root (18 −, 13 +), split on x[1]:
  x[1] < -0.07: 13 −, 3 +   →  P(y = −1) = 13/16
  x[1] >= -0.07: 4 −, 11 +  →  P(y = +1) = 11/15

The same rule gives probabilities for the deeper tree from page 24 (root split on x[1], then x[1] again at -1.66 on the left branch and x[2] at 1.55 on the right branch).
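A tiny sketch (assumed representation, not from the slides) of how those numbers arise: the leaf probability is just the empirical label fraction among the training points in the leaf.

```python
# Per-leaf class probabilities from the stump's (negative, positive) label counts.
leaf_counts = {
    "x[1] <  -0.07": (13, 3),
    "x[1] >= -0.07": (4, 11),
}

for leaf, (neg, pos) in leaf_counts.items():
    total = neg + pos
    print(f"{leaf}: P(y=+1) = {pos}/{total} = {pos/total:.3f}, "
          f"P(y=-1) = {neg}/{total} = {neg/total:.3f}")
```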

Page 27:


Overfitting

Page 28:

Trade-off between training error and depth
• As in other regression and classification approaches, training error monotonically decreases (or, more precisely, never increases) with model complexity (here usually measured by depth)
• A training error that is too small is a sign of overfitting

Tree depth      depth = 1  depth = 2  depth = 3  depth = 5  depth = 10
Training error  0.22       0.13       0.10       0.03       0.00

(Figure: the corresponding decision boundaries.) Training error reduces with depth.
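The same trend is easy to reproduce; here is a hedged sketch using scikit-learn's DecisionTreeClassifier on synthetic data (the dataset, noise level, and list of depths are illustrative assumptions, not the course's data).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)      # a nonlinear, XOR-like target
y[rng.random(200) < 0.1] ^= 1                # add 10% label noise

for depth in (1, 2, 3, 5, 10):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    print(depth, round(1 - clf.score(X, y), 3))   # training error is non-increasing in depth
```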

Page 29:

Two ways to prevent overfitting
• 1. Early stopping: stop before the tree gets too complicated
• 2. Pruning: simplify after training a complex model

(Figure: classification error vs. tree depth, from simple trees to complex trees; training error keeps decreasing with depth while true error eventually increases, and max_depth marks where we stop.)

Page 30:

1. Early stopping
• Stopping rules (recap):
  • 1. All examples in the subset have the same label
  • 2. No more features left to split on
• Early stopping rules:
  • Only grow up to max_depth (which is chosen via validation); it is hard to figure out the right depth, and an imbalanced tree can be a better one
  • Do not split if it does not give a sufficient decrease in error; there are cases where it takes some depth to get the gain, e.g. XOR (table below)
  • Do not split intermediate nodes with too few data points

x[1]   x[2]   y
FALSE  FALSE  FALSE
FALSE  TRUE   TRUE
TRUE   FALSE  TRUE
TRUE   TRUE   FALSE
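For reference, these early-stopping rules correspond roughly to standard hyperparameters; a hedged sketch using scikit-learn (the specific values are placeholders, and min_impurity_decrease gates splits on impurity improvement rather than raw accuracy).

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=5,                 # only grow up to this depth (choose via validation)
    min_impurity_decrease=0.01,  # do not split unless the split improves the criterion enough
    min_samples_split=10,        # do not split nodes with too few data points
)
```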

Page 31:

2. Pruning
• Train until the tree overfits, and then simplify
• Pruning is guided by a choice of quality metric that balances fitting the data and simplicity
  • Data fit is measured by the error
  • Simplicity is measured by the number of leaves in the tree

Complex tree → simplify → simpler tree → simplify → … → simplest tree

Loss(T) = Error(T) + λ · L(T),   where L(T) = number of leaf nodes in T

• λ = 0: only the data fit matters, so the fully grown complex tree minimizes the loss
• λ = ∞: only simplicity matters, so the single-leaf (simplest) tree minimizes the loss

Page 32:

Pruning algorithm

• 1. Compute the current loss, Loss(T) = Error(T) + λ · L(T) (where L(T) = number of leaf nodes), and find a candidate split

• 2. Replace the split by a leaf node and recompute the loss

Example: the final tree T from page 13 has error 3/40 and 6 leaves. Candidate: replace the Term? split under Credit = poor, Income = high (counts 4 Safe, 5 Risky) with a single Risky leaf, giving tree T-term.

With λ = 0.03:

Tree      Error  #Leaves  Total
T         3/40   6        0.255
T-term    4/40   5        0.25
T-credit
T-income
T-term'

• 3. Repeat for all splits (there are 4 splits in this example)
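A minimal sketch (the helper total_cost is an assumed name, not from the slides) that reproduces the two filled-in rows of the table with λ = 0.03.

```python
# Total cost used for pruning: error rate plus a per-leaf penalty.
def total_cost(num_errors, num_examples, num_leaves, lam):
    return num_errors / num_examples + lam * num_leaves

print(round(total_cost(3, 40, 6, 0.03), 3))   # T:      0.255
print(round(total_cost(4, 40, 5, 0.03), 3))   # T-term: 0.25  -> pruning this split helps
```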

Page 33:

Decision trees for regression

(Figure: regression data plotted with X on the horizontal axis and Y on the vertical axis.)

Page 34:

Decision trees for regression
• Same process as classification, but:
  • error is measured by squared error
  • the prediction is the mean value of the samples in that partition

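A minimal sketch (assumed helpers, not from the slides) of a single regression split: each side predicts the mean of its y values, and the threshold is chosen to minimize the total squared error. The toy data is made up for illustration.

```python
def sse(ys):
    m = sum(ys) / len(ys)                       # leaf prediction = mean of the samples
    return sum((y - m) ** 2 for y in ys)

def best_regression_split(xs, ys):
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2          # midpoint threshold
        left  = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        err = sse(left) + sse(right)                     # squared error of the split
        if best is None or err < best[1]:
            best = (t, err)
    return best

# Hypothetical toy data:
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 4.8, 5.1, 5.0]
print(best_regression_split(x, y))              # splits between x=3 and x=4 (threshold 3.5)
```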