Classification Tree · Regression Tree · Medical Applications of CART

Classification and Regression Trees

Mihaela van der Schaar

Department of Engineering Science, University of Oxford

March 1, 2017


Page 2

Overview

Many decisions are tree structures

Medical treatment

(Figure: a decision tree for choosing a medical treatment; a root test on Fever (T > 100 vs. T < 100) leads either to Treatment #1 or to a Muscle Pain test (High/Low) that selects Treatment #2 or Treatment #3. From Ameet Talwalkar, CS260 Machine Learning Algorithms, October 1, 2015.)

Page 3

Overview

A Decision Tree is a hierarchically organized structure, with each node splitting the data space into pieces based on the value of a feature.

Equivalent to a partition of R^d into K disjoint feature subspaces {R1, ..., RK}, where each Rj ⊂ R^d.

On each feature subspace Rj, the same decision/prediction is made for all x ∈ Rj.
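As a sketch of this "constant prediction per region" view (the regions and labels below are made up, not from the lecture), prediction just finds the region containing x:

```python
# A tree's leaves partition the feature space; prediction looks up the
# region containing x and returns that region's constant value.
# Regions here are axis-aligned boxes: one (low, high) interval per dimension.

regions = [
    ((0.0, 0.5), (0.0, 1.0)),   # R1: x1 in [0, 0.5), x2 in [0, 1)
    ((0.5, 1.0), (0.0, 0.5)),   # R2
    ((0.5, 1.0), (0.5, 1.0)),   # R3
]
values = ["A", "B", "C"]        # one decision per region

def predict(x):
    for region, value in zip(regions, values):
        if all(lo <= xi < hi for xi, (lo, hi) in zip(x, region)):
            return value
    raise ValueError("x outside all regions")

print(predict((0.25, 0.9)))  # falls in R1 -> "A"
```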


Page 4

Terminology

Parent of a node c is the immediate predecessor node.

Children of a node c are the immediate successors of c; equivalently, nodes which have c as a parent.

Root node is the top node of the tree; the only node without a parent.

Leaf nodes are nodes which do not have children.

A K-ary tree is a tree where each node (except for leaf nodes) has K children. Usually we work with binary trees (K = 2).

Depth of a tree is the maximal length of a path from the root node to a leaf node.
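The terms above can be illustrated with a minimal node structure (hypothetical code, not from the lecture):

```python
# Minimal node structure illustrating the terms: parent, children,
# root (no parent), leaves (no children), depth.
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def depth(node):
    # maximal length of a path from this node down to a leaf
    if not node.children:
        return 0
    return 1 + max(depth(c) for c in node.children)

root = Node("root")                      # the only node without a parent
b, c = Node("B", root), Node("C", root)  # children of the root
d, e = Node("D", c), Node("E", c)        # leaves; their parent is C
print(depth(root))  # 2, matching the depth-2 binary tree in the figure
```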


Page 5

Terminology

(Figure: a binary tree of depth 2 illustrating the terminology; the root node, a node C together with its parent and children, and the leaf nodes are labeled.)

Page 6

Terminology

The leaves of a tree partition the feature space.

(Figure: a tree over features x1 and x2, with a root split x1 > θ1 followed by splits x2 > θ3, x1 ≤ θ4, and x2 ≤ θ2, shown alongside the corresponding partition of the (x1, x2) plane into regions A, B, C, D, E by the thresholds θ1, ..., θ4. From Ameet Talwalkar, CS260 Machine Learning Algorithms, October 1, 2015.)

Page 7

Learning a tree model

Three things to learn:

1 The structure of the tree.

2 The threshold values (θi).

3 The values for the leaves (A, B, ...).

(Figure: the same tree as on the previous slide, with thresholds θ1, ..., θ4 and leaves A, ..., E. From Ameet Talwalkar, CS260 Machine Learning Algorithms, October 1, 2015.)

Page 8

Overview - Classification Tree

Classification Tree:

Given the dataset D = (x1, y1), ..., (xn, yn) where xi ∈ R^d and yi ∈ Y = {1, ..., m}.

Minimize the misclassification error in each leaf.

The estimated probability of each class k in region Rj is simply:

    \beta_{jk} = \frac{\sum_i I(y_i = k) \cdot I(x_i \in R_j)}{\sum_i I(x_i \in R_j)}

This is the frequency with which label k occurs in the leaf Rj.

These estimates can be regularized as well.
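A small sketch of the β_jk estimate as a relative frequency (toy labels; `leaf_probs` is a hypothetical helper name):

```python
from collections import Counter

# beta_jk as a relative frequency: the fraction of training labels equal
# to k among the points that fall in leaf R_j.
def leaf_probs(labels_in_leaf):
    n = len(labels_in_leaf)
    return {k: c / n for k, c in Counter(labels_in_leaf).items()}

print(leaf_probs([1, 1, 2, 1]))  # {1: 0.75, 2: 0.25}
```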


Page 9

Example

Decide whether to wait for a table at a restaurant, based on the following attributes (example from Russell and Norvig, AIMA):

Alternate: is there an alternative restaurant nearby?
Bar: is there a comfortable bar area to wait in?
Fri/Sat: is today Friday or Saturday?
Hungry: are we hungry?
Patrons: number of people in the restaurant (None, Some, Full)
Price: price range ($, $$, $$$)
Raining: is it raining outside?
Reservation: have we made a reservation?
Type: kind of restaurant (French, Italian, Thai, Burger)
Wait Estimate: estimated waiting time (0-10, 10-30, 30-60, >60)

Page 10

A tree model for deciding where to eat

Examples described by attribute values (binary, discrete, continuous).

(Table: the 12 training examples for choosing a restaurant, from Russell & Norvig, AIMA; the classification of each example is positive (T) or negative (F). From Ameet Talwalkar, CS260 Machine Learning Algorithms, October 1, 2015.)

Page 11

First decision: at the root of the tree

Which attribute to split?

Idea: use information gain to choose which attribute to split.

Patrons? is a better choice: it gives information about the classification.

(From Ameet Talwalkar, CS260 Machine Learning Algorithms, October 1, 2015.)

Page 12

Information gain

A chosen attribute A divides the training set E into subsets E1, ..., Ev according to their values for A, where A has v distinct values.

    \mathrm{remainder}(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} \, I\!\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right)

Information Gain (IG), or reduction in uncertainty from the attribute test:

    IG(A) = I\!\left(\frac{p}{p + n}, \frac{n}{p + n}\right) - \mathrm{remainder}(A)

Choose the attribute with the largest IG(A).
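The remainder and information-gain formulas can be checked numerically; the counts below are the lecture's "Patrons" split (None: 0+/2-, Some: 4+/0-, Full: 2+/4-):

```python
import math

# Binary entropy I(p, q) of a two-class distribution, in bits (0*log 0 = 0).
def I(p, q):
    return -sum(x * math.log2(x) for x in (p, q) if x > 0)

# Information gain of an attribute whose branches have (p_i, n_i)
# positive/negative counts, following the slide's remainder(A) formula.
def info_gain(splits):
    p = sum(pi for pi, _ in splits)
    n = sum(ni for _, ni in splits)
    remainder = sum((pi + ni) / (p + n) * I(pi / (pi + ni), ni / (pi + ni))
                    for pi, ni in splits)
    return I(p / (p + n), n / (p + n)) - remainder

# The lecture's "Patrons" split: None (0+, 2-), Some (4+, 0-), Full (2+, 4-)
print(round(info_gain([(0, 2), (4, 0), (2, 4)]), 3))  # 0.541
```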


Page 13

How to measure information gain?

Idea: gaining information reduces uncertainty. Use entropy to measure uncertainty.

If a random variable X has K different values a1, ..., aK, its entropy is given by

    H[X] = -\sum_{k=1}^{K} P(X = a_k) \log P(X = a_k)

Different measures of uncertainty:

Misclassification error: 1 - \max_k P(X = a_k)

Gini index: \sum_{k=1}^{K} 2 P(X = a_k)(1 - P(X = a_k))

C4.5: classification tree that uses entropy to measure uncertainty.

CART: classification tree that uses the Gini index to measure uncertainty.
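A quick sketch comparing the three measures in the two-class case (hypothetical helper names; the Gini index is written with the slide's factor of 2):

```python
import math

# The slide's three uncertainty measures for a class distribution ps.
def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

def misclassification(ps):
    return 1 - max(ps)

def gini(ps):
    return sum(2 * p * (1 - p) for p in ps)

# Two-class case: all three peak at p = 0.5 and vanish at p = 0 or 1.
for p in (0.1, 0.5, 0.9):
    ps = (p, 1 - p)
    print(p, round(entropy(ps), 3), round(misclassification(ps), 3), round(gini(ps), 3))
```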


Page 14

Learning trees

(Figure: the uncertainty measures for two-class classification, as a function of the proportion p in class 1.)

Page 15

Partitions and CART

(Figure: example partitions of the feature space and the corresponding CART trees.)

Page 16

Algorithm for Classification Trees

Start with R_1 = \mathbb{R}^d.

For each feature j = 1, ..., d, for each value v ∈ R that we can split on:

Split the data set:

    I_< = \{i : x_{ij} < v\} \quad \text{and} \quad I_> = \{i : x_{ij} \ge v\}

Estimate the parameters:

    \beta_< = \frac{\sum_i I(y_i = 1) \cdot I(x_i \in I_<)}{\sum_i I(x_i \in I_<)} \quad \text{and} \quad \beta_> = \frac{\sum_i I(y_i = 1) \cdot I(x_i \in I_>)}{\sum_i I(x_i \in I_>)}

Quality of split is measured by the weighted sum of the uncertainty:

    \frac{|I_<|}{|I_<| + |I_>|} \, I(\beta_<, 1 - \beta_<) + \frac{|I_>|}{|I_<| + |I_>|} \, I(\beta_>, 1 - \beta_>)

Choose the split with the minimal weighted sum of the uncertainty. Recurse on both children, with (x_i, y_i)_{i ∈ I_<} and (x_i, y_i)_{i ∈ I_>}.
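A minimal sketch of one level of this greedy search, assuming binary labels in {0, 1} and entropy as the uncertainty I (the data is a toy example, not the lecture's):

```python
import math

def I(p, q):
    # binary entropy in bits; 0*log 0 = 0
    return -sum(x * math.log2(x) for x in (p, q) if x > 0)

def best_split(xs, ys):
    # xs: 1-d feature values, ys: binary labels in {0, 1}.
    # Try each observed value v as a threshold x < v and score the split by
    # the weighted uncertainty of the two children, as on the slide.
    best = None
    for v in sorted(set(xs)):
        lo = [y for x, y in zip(xs, ys) if x < v]
        hi = [y for x, y in zip(xs, ys) if x >= v]
        if not lo or not hi:
            continue
        score = sum(len(s) / len(ys) * I(sum(s) / len(s), 1 - sum(s) / len(s))
                    for s in (lo, hi))
        if best is None or score < best[0]:
            best = (score, v)
    return best

print(best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (0.0, 10)
```

In a full implementation this search runs over every feature and the winner's two children are recursed on, exactly as the algorithm above describes.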


Page 17

Which attribute to split?

Patrons vs. Type?

By choosing Patrons, we end up with a partition (3 branches) with smaller entropy, i.e. smaller uncertainty (0.45 bit).

By choosing Type, we end up with an uncertainty of 1 bit.

Thus, we choose Patrons over Type.

Page 18

Uncertainty if we go with "Patrons"

For the "None" branch:

    -\left(\frac{0}{0+2}\log\frac{0}{0+2} + \frac{2}{0+2}\log\frac{2}{0+2}\right) = 0

For the "Some" branch:

    -\left(\frac{4}{4+0}\log\frac{4}{4+0} + \frac{0}{4+0}\log\frac{0}{4+0}\right) = 0

For the "Full" branch:

    -\left(\frac{2}{2+4}\log\frac{2}{2+4} + \frac{4}{2+4}\log\frac{4}{2+4}\right) \approx 0.9

For choosing "Patrons", take the weighted average over the branches; this quantity is called the conditional entropy:

    \frac{2}{12} \times 0 + \frac{4}{12} \times 0 + \frac{6}{12} \times 0.9 = 0.45
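The 0.45-bit figure can be reproduced directly (the slide rounds the Full branch's entropy 0.918 to 0.9, so the exact value is about 0.459):

```python
import math

# Entropy (bits) of a branch from its label counts; 0*log 0 = 0.
def H(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

# (wait, not-wait) counts per "Patrons" branch, from the slide.
branches = [(0, 2), (4, 0), (2, 4)]            # None, Some, Full
total = sum(p + q for p, q in branches)        # 12 examples
cond_H = sum((p + q) / total * H((p, q)) for p, q in branches)
print(round(cond_H, 4))  # 0.4591
```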


Page 19

Conditional entropy

Definition. Given two random variables X and Y:

    H[Y \mid X] = \sum_k P(X = a_k) \times H[Y \mid X = a_k]

In our example:
X: the attribute to be split
Y: wait or not

Relation to information gain:

    \mathrm{Gain} = H[Y] - H[Y \mid X]

When H[Y] is fixed, we need only compare conditional entropies.

Page 20

Conditional entropy for "Type"

For the "French" and "Italian" branches:

    -\left(\frac{1}{1+1}\log\frac{1}{1+1} + \frac{1}{1+1}\log\frac{1}{1+1}\right) = 1

For the "Thai" and "Burger" branches:

    -\left(\frac{2}{2+2}\log\frac{2}{2+2} + \frac{2}{2+2}\log\frac{2}{2+2}\right) = 1

For choosing "Type", the weighted average over the branches (the conditional entropy) is:

    \frac{2}{12} \times 1 + \frac{2}{12} \times 1 + \frac{4}{12} \times 1 + \frac{4}{12} \times 1 = 1

Page 21

Next split?

Do we split on "None" or "Some"?

No, we do not. The decision is deterministic, as seen from the training data.

(From Ameet Talwalkar, CS260 Machine Learning Algorithms, October 1, 2015.)

We will look only at the 6 instances with Patrons == Full


Page 22

Greedily we build the tree and get this

(Figure: the greedily built tree. From Ameet Talwalkar, CS260 Machine Learning Algorithms, October 1, 2015.)

Page 23

What is the optimal tree depth?

We need to be careful to pick an appropriate tree depth.

If the tree is too deep, we can overfit.

If the tree is too shallow, we underfit.

Max depth is a hyper-parameter that should be tuned on the data.

An alternative strategy is to create a very deep tree, and then to prune it.

Page 24

Control the size of the tree

We would prune to have a smaller one.

If we stop here, not all training samples would be classified correctly.

More importantly, how do we classify a new instance?

We label the leaves of this smaller tree with the majority of training samples' labels.

(From Ameet Talwalkar, CS260 Machine Learning Algorithms, October 1, 2015.)

Page 25

Example

We stop after the root (first node).

(Figure: the tree truncated at the root split on Patrons, with its three leaves labeled Wait: yes, Wait: no, and Wait: no by majority vote. From Ameet Talwalkar, CS260 Machine Learning Algorithms, October 1, 2015.)

Page 26

Computational Considerations

Numerical features:

We could split on any feature, with any threshold.
However, for a given feature, the only split points we need to consider are the n values in the training data for this feature.
If we sort each feature by these n values, we can quickly compute our uncertainty metric of interest (cross entropy or others).
This takes O(dn log n) time.

Categorical features:

Assuming q distinct categories, there are 2^{q-1} - 1 possible binary partitions we can consider.
However, things simplify in the case of binary classification (or regression), and we can find the optimal split (for cross entropy and Gini) by only considering q - 1 possible splits.
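The q - 1 candidate splits come from sorting the categories by their positive-label rate and considering only prefix subsets; a sketch with toy counts (`candidate_splits` is a hypothetical helper, not from the lecture):

```python
# For binary classification, the optimal binary partition of a categorical
# feature (under cross entropy or Gini) can be found by ordering the q
# categories by their positive-label rate and checking only the q - 1
# prefix splits, instead of all 2^(q-1) - 1 subsets.
def candidate_splits(pos, neg):
    # pos/neg: dicts mapping category -> count of positive/negative examples
    rate = {c: pos[c] / (pos[c] + neg[c]) for c in pos}
    order = sorted(pos, key=rate.get)          # categories by positive rate
    return [set(order[:k]) for k in range(1, len(order))]

pos = {"French": 1, "Italian": 1, "Thai": 2, "Burger": 2}
neg = {"French": 1, "Italian": 1, "Thai": 2, "Burger": 2}
for left in candidate_splits(pos, neg):        # 3 = q - 1 candidate subsets
    print(sorted(left))
```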


Page 27

Summary of learning classification trees

Advantages:

Easily interpretable by humans (as long as the tree is not too big).
Computationally efficient.
Handles both numerical and categorical data.
It is parametric and thus compact: unlike nearest-neighbor classification, we do not have to carry our training instances around.
Building block for various ensemble methods (more on this later).

Disadvantages:

Heuristic training techniques.
Finding the partition of space that minimizes empirical error is NP-hard.
We resort to greedy approaches with limited theoretical underpinning.

Page 28

Overview - Regression Tree

Regression Tree:

Given the dataset D = (x1, y1), ..., (xn, yn) where xi ∈ R^d and yi ∈ Y = R.

Minimize the error (e.g. squared loss) in each leaf.

The parameterized function is

    f(x) = \sum_{j=1}^{K} \beta_j \cdot I(x \in R_j)

Using squared loss, the optimal parameters are:

    \beta_j = \frac{\sum_i y_i \cdot I(x_i \in R_j)}{\sum_i I(x_i \in R_j)}

which is the sample mean.
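A one-line check (with toy numbers) that the sample mean is the squared-loss-optimal constant for a leaf:

```python
# The squared-loss-optimal constant prediction for a leaf is the sample
# mean of the responses y_i that fall in it.
def leaf_value(ys):
    return sum(ys) / len(ys)

ys = [1.0, 2.0, 3.0, 6.0]
best = leaf_value(ys)
sse = lambda c: sum((y - c) ** 2 for y in ys)
# Any other constant has strictly larger squared loss:
print(best, sse(best) < sse(best + 0.5))  # 3.0 True
```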


Page 29

Assign the prediction for each leaf

For each leaf, we need to assign the prediction y which minimizes the loss for the regression problem.

Regression Tree:

    \hat{y} = \arg\min_{y \in \mathbb{R}} \sum_{i \in R_k} (y - y_i)^2 = \frac{1}{\sum_i I(x_i \in R_k)} \sum_i I(x_i \in R_k) \cdot y_i

Page 30

Growing the Tree: Overview

Ideally, we would like to find the partition that achieves minimal risk: the lowest mean-squared error for the regression problem.

The number of potential partitions is too large to search exhaustively.

Greedy search heuristics for a good partition:

Start at the root.
Determine the best feature and value to split.
Recurse on the children of the node.
Stop at some point (with heuristic pruning rules).

Page 31

Algorithm for Regression Trees

Start with R_1 = \mathbb{R}^d.

For each feature j = 1, ..., d, for each value v ∈ R that we can split on:

Split the data set:

    I_< = \{i : x_{ij} < v\} \quad \text{and} \quad I_> = \{i : x_{ij} \ge v\}

Estimate parameters:

    \beta_< = \frac{\sum_{i \in I_<} y_i}{|I_<|} \quad \text{and} \quad \beta_> = \frac{\sum_{i \in I_>} y_i}{|I_>|}

Quality of split is measured by the squared loss:

    \sum_{i \in I_<} (y_i - \beta_<)^2 + \sum_{i \in I_>} (y_i - \beta_>)^2

Choose the split with minimal loss.

Recurse on both children, with (x_i, y_i)_{i ∈ I_<} and (x_i, y_i)_{i ∈ I_>}.
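A minimal sketch of one level of this search under squared loss (toy data, not from the lecture):

```python
def best_split(xs, ys):
    # Score a threshold v by the total squared error of the two children
    # around their means, as in the slide's algorithm.
    def sse(s):
        m = sum(s) / len(s)
        return sum((y - m) ** 2 for y in s)
    best = None
    for v in sorted(set(xs)):
        lo = [y for x, y in zip(xs, ys) if x < v]
        hi = [y for x, y in zip(xs, ys) if x >= v]
        if lo and hi:
            score = sse(lo) + sse(hi)
            if best is None or score < best[0]:
                best = (score, v)
    return best

print(best_split([1, 2, 3, 4], [5.0, 5.0, 9.0, 9.0]))  # (0.0, 3)
```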


Page 32

Example of Regression Trees

(Figure: a one-split regression tree and the corresponding two-region partition of the feature space, with leaf values y1 and y2.)

Page 33

Example of Regression Trees

(Figure: the tree after a second split, giving three regions with leaf values y1, y2, y3.)

Page 34

Example of Regression Trees

(Figure: the tree after a third split, giving four regions with leaf values y1, y2, y3, y4.)

Page 35

Model Complexity

When should tree growing be stopped?

We will need to control complexity to prevent over-fitting and, in general, find the optimal tree size with the best predictive performance.

Consider a regularized objective:

    Training Error(Tree) + C × Size(Tree)

Two options:

Grow the tree from scratch and stop once the regularized objective starts to increase.
First grow the full tree, then prune nodes (starting at the leaves) until the objective starts to increase.

The second option is preferred, as the choice of tree is less sensitive to wrong choices of split points and variables to split on in the first stages of tree fitting.

Use cross-validation to determine the optimal C.

Page 36

Pruning Rule

Stop when there is one instance in each leaf (regression problem).

Stop when all the instances in each leaf have the same label (classification problem).

Stop when the number of leaves is less than a threshold.

Stop when the leaf's error is less than a threshold.

Stop when the number of instances in each leaf is less than a threshold.

Stop when the p-value between two divided leaves is larger than a certain threshold (e.g. 0.05 or 0.01) based on a chosen statistical test.

Page 37

Overview

Decision Trees can be applied in many applications (e.g. medical applications, recommendation systems).

We focus on medical applications.

1. Neurosurgery

Aim: to recommend a certain type of neurosurgery to patients who have a higher probability of exhibiting clinical improvement.

2. Heart Transplant

Aim: to match the best available heart to the patient whose survival probability is maximized.

Pruning rule: stop when the p-value between two divided leaves is larger than a certain threshold (0.05) based on the Student t-test.

Page 38

Student t-test for the pruning rule of the decision tree

The Student t-test is a statistical hypothesis test used to determine whether two sets of data are statistically different from each other.

Usually, it is used to test the null hypothesis that the means of two populations are equal.

Therefore, when pruning decision trees, the Student t-test is used to determine whether the two resulting partitions (the populations of different leaves) have statistically different means.

This is usually determined by the p-value (less than 0.05 or 0.01) computed by the Student t-test.

Page 39

Details of the Student t-test

For unpaired datasets and the unequal-variance setting:

Mean of group A:

    \bar{y}_A = \frac{1}{N_A} \sum y_i \cdot I(X_i \in A)

Variance of group A:

    s_A^2 = \frac{1}{N_A - 1} \sum (y_i - \bar{y}_A)^2 \cdot I(X_i \in A)

Therefore, the t-value can be computed as follows:

    t = \frac{\bar{y}_A - \bar{y}_B}{\sqrt{\frac{s_A^2}{N_A} + \frac{s_B^2}{N_B}}}

Based on the computed t-value, the p-value is computed as:

    \text{p-value} = P(Z > |t|) \quad \text{where } Z \sim \mathcal{N}(0, 1)
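The t-value formula can be sketched directly (toy samples; note the slide approximates the p-value with a standard normal rather than a t distribution):

```python
import math

# Welch-style t statistic from the slide's formulas: unpaired groups,
# unequal variances, unbiased sample variance with divisor N - 1.
def t_value(a, b):
    def mean(s):
        return sum(s) / len(s)
    def var(s):
        m = mean(s)
        return sum((y - m) ** 2 for y in s) / (len(s) - 1)
    return (mean(a) - mean(b)) / math.sqrt(var(a) / len(a) + var(b) / len(b))

a = [1.0, 2.0, 3.0]
b = [7.0, 8.0, 9.0]
print(round(t_value(a, b), 3))  # -7.348
```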


Page 40

Neurosurgery: Block Diagram

(Block diagram: patient information feeds a predictive model (decision tree) that predicts the success probability of neurosurgery; neurosurgery is recommended only for certain patient groups.)

Page 41

Neurosurgery: Achieved Decision Tree

(Figure: the learned decision tree. Internal splits include SRS Image Score ≤ 3.45, SRS Image Score ≤ 2.9, SRS Mental Score ≤ 4.1, Trunk Shift ≤ 1.55, Sacral Slope ≤ 49.5, and Weight ≤ 76.5; the leaves report success probabilities ranging from about 24.0% to 91.3%.)

Page 42

Heart Transplant: Block Diagram

(Block diagram: patients' and donors' information feeds a predictive model (decision tree) that predicts the success of a heart transplant, used to optimally match patients and donors to minimize mortality.)

Page 43

Heart Transplant: Achieved Decision Tree

(Figure: the learned decision tree. Internal splits include Donor Age ≤ 28, No Ventilator Assist, Previous Transplant ≤ 0, No Dialysis in Listing, Creatinine ≤ 1.36, No Diabetes, and No Donor HEP C Antigen; the leaves report success probabilities ranging from about 13.9% to 91.2%.)