
Page 1:

Artificial intelligence

Decision trees

PRISM - Nicolas Sutton-Charani

18/01/2021

1 / 52

Page 2:

Plan

1. Introduction

2. Use of decision trees
 2.1 Prediction
 2.2 Interpretability: descriptive data analysis

3. Learning of decision trees
 3.1 Purity criteria
 3.2 Stopping criteria
 3.3 Learning algorithm

4. Pruning of decision trees
 4.1 Cost-complexity trade-off

5. Extension: random forest

Page 3: Introduction — Plan

Page 4: Introduction

What is a decision tree ?

[Figure: a generic decision tree — internal nodes test attributes J1–J4, and each leaf carries a label prediction]

Page 5: Introduction — What is a decision tree ? (figure repeated)

Page 6: Introduction

What is a decision tree ? → supervised learning

[Figure: the same tree with branch values shown — each internal node splits on the values of its attribute; leaves give label predictions]

Page 7: Introduction

A little history

⚠ Machine-learning (or data-mining) decision trees ≠ decision-theory decision trees

Page 8: Introduction

Types of decision trees

Type of class label:
- numerical → regression tree
- nominal → classification tree

Type of algorithm (→ structure):
- CART: statistics, binary trees
- C4.5: computer science, small trees

Page 9: Use of decision trees — Plan

Page 10: Use of decision trees · Prediction — Plan

Page 11: Use of decision trees · Prediction

Classification trees — Will the badminton match take place?

Page 12: Use of decision trees · Prediction

Classification trees — What fruit is it?

Page 13: Use of decision trees · Prediction

Classification trees — Will he/she come to my party?

Page 14: Use of decision trees · Prediction

Classification trees — Will they wait?

Page 15: Use of decision trees · Prediction

Classification trees — Who will win the US presidential election?

Page 16: Use of decision trees · Prediction

Regression trees — What grade will a student get (given their homework average grade)?

Page 17: Use of decision trees · Interpretability: descriptive data analysis — Plan

Page 18: Use of decision trees · Interpretability: descriptive data analysis

Data analysis tool

Trees are highly interpretable: they partition the attribute space.

→ a tree can be summarised by its leaves, which define a mixture of distributions (one law per leaf)

→ a wonderful collaboration tool with domain experts

⚠ INSTABILITY ← overfitting

Page 19: Learning of decision trees — Plan

Page 20: Learning of decision trees

Formalism

Learning dataset (supervised learning):

(x_1, y_1), …, (x_N, y_N), i.e. a table whose i-th row is (x_i^1, …, x_i^J, y_i); samples are assumed to be i.i.d.

- Attributes X = (X^1, …, X^J) ∈ 𝒳 = 𝒳^1 × ⋯ × 𝒳^J
- The spaces 𝒳^j can be categorical or numerical
- Class label Y ∈ Ω = {ω_1, …, ω_K} (Y ∈ ℝ for regression)

Tree: a partition P_H = {t_1, …, t_H} of the attribute space, with leaf weights π_h = P(t_h) ≈ |t_h| / N, where |t_h| = #{i : x_i ∈ t_h}

Page 21: Learning of decision trees

Recursive partitioning

Pages 22–24: Learning of decision trees — Recursive partitioning (figure sequence)

Page 25: Learning of decision trees

Learning principle
- Start with all the dataset in the initial node
- Choose the best splits (on attributes) in order to get pure leaves

Classification trees: purity = homogeneity in terms of class labels
- CART → Gini impurity: i(t_h) = Σ_{k=1..K} p_k (1 − p_k)
- ID3, C4.5 → Shannon entropy: i(t_h) = −Σ_{k=1..K} p_k log2(p_k)
with p_k = P(Y = ω_k | t_h)

Regression trees: purity = low variance of class labels
→ i(t_h) = Var(Y | t_h) = (1/|t_h|) Σ_{x_i ∈ t_h} (y_i − E(Y | t_h))², with E(Y | t_h) = (1/|t_h|) Σ_{x_i ∈ t_h} y_i
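As a quick illustrative sketch (not part of the slides), the three impurity measures translate directly into Python; the function names are my own.

```python
from collections import Counter
import math

def gini(labels):
    # CART: i(t) = sum_k p_k (1 - p_k) = 1 - sum_k p_k^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # ID3 / C4.5: i(t) = -sum_k p_k log2(p_k)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def variance(values):
    # regression: i(t) = Var(Y | t), the mean squared deviation in the leaf
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / n
```

A pure leaf gives 0 for all three measures; a balanced two-class leaf gives Gini 0.5 and entropy 1 bit.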

Page 26: Learning of decision trees

Impurity measures

Page 27: Learning of decision trees · Purity criteria — Plan

Page 28: Learning of decision trees · Purity criteria

Purity criteria

[Figure: a leaf t_h to be split]

Impurity measure + tree structure → criterion

- CART, ID3: purity gain
- C4.5: information gain ratio

Regression trees — CART: variance minimisation

Page 29: Learning of decision trees · Purity criteria

Purity criteria

[Figure: leaf t_h split on an attribute into children t_L and t_R, each carrying its own prediction]

Impurity measure + tree structure → criterion

- CART, ID3: purity gain → ∆i = i(t_h) − π_L i(t_L) − π_R i(t_R)
- C4.5: information gain ratio → IGR = ∆i / H(π_L, π_R)

Regression trees — CART: variance minimisation → ∆i = i(t_h) − π_L i(t_L) − π_R i(t_R)
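A minimal sketch of the two split criteria above (helper names are my own); the entropy-based gain is reused inside the C4.5 ratio.

```python
from collections import Counter
import math

def entropy(labels):
    # i(t) = -sum_k p_k log2(p_k)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def purity_gain(parent, left, right, impurity=entropy):
    # delta_i = i(t_h) - pi_L i(t_L) - pi_R i(t_R)
    pi_l, pi_r = len(left) / len(parent), len(right) / len(parent)
    return impurity(parent) - pi_l * impurity(left) - pi_r * impurity(right)

def info_gain_ratio(parent, left, right):
    # IGR = delta_i / H(pi_L, pi_R): the gain is normalised by the split
    # entropy, which penalises very unbalanced splits (C4.5)
    pi_l, pi_r = len(left) / len(parent), len(right) / len(parent)
    h = -sum(p * math.log2(p) for p in (pi_l, pi_r) if p > 0)
    return purity_gain(parent, left, right) / h
```

A perfect balanced split of a 50/50 node gives both ∆i = 1 bit and IGR = 1.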

Page 30: Learning of decision trees · Stopping criteria — Plan

Page 31: Learning of decision trees · Stopping criteria

Stopping criteria (pre-pruning)

For all leaves {t_h}_{h=1,…,H} and their potential children:

- leaf purity: ∃ k ∈ {1, …, K} : p_k = 1
- leaf and children sizes: |t_h| ≤ minLeafSize
- leaf and children weights: π_h = |t_h| / |t_0| ≤ minLeafProba
- number of leaves: H ≥ maxNumberLeaves
- tree depth: depth(P_H) ≥ maxDepth
- purity gain: ∆i ≤ minPurityGain
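These pre-pruning checks can be gathered into a single predicate; a sketch with illustrative parameter names (the ∆i ≤ minPurityGain check happens at split time, so it is left out here).

```python
from collections import Counter

def should_stop(leaf_labels, depth, n_leaves, n_total,
                min_leaf_size=5, min_leaf_proba=0.01,
                max_leaves=32, max_depth=10):
    # leaf purity: some class has p_k = 1
    if len(Counter(leaf_labels)) == 1:
        return True
    # leaf size and leaf weight pi_h = |t_h| / |t_0|
    if len(leaf_labels) <= min_leaf_size:
        return True
    if len(leaf_labels) / n_total <= min_leaf_proba:
        return True
    # global limits on the tree
    if n_leaves >= max_leaves or depth >= max_depth:
        return True
    return False
```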

Page 32: Learning of decision trees · Learning algorithm — Plan

Page 33: Learning of decision trees · Learning algorithm

Learning algorithm

Result: learnt tree
Start with all the learning data in an initial node (single leaf);
while stopping criteria not verified for all leaves do
    for each splittable leaf do
        compute the purity gains obtained from all possible splits;
    end
    SPLIT: select the split achieving the maximum purity gain;
end
prune the obtained tree;

→ Recursive partitioning
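The loop above can be sketched as a recursive CART-style learner for numerical attributes (a toy version with my own names; a real implementation would add the other stopping criteria and the pruning step).

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    # try every binary split "x[j] <= v" and keep the maximum purity gain
    best, parent_imp = None, gini(labels)
    for j in range(len(rows[0])):
        for v in sorted({r[j] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[j] <= v]
            right = [i for i, r in enumerate(rows) if r[j] > v]
            if not left or not right:
                continue
            gain = (parent_imp
                    - len(left) / len(rows) * gini([labels[i] for i in left])
                    - len(right) / len(rows) * gini([labels[i] for i in right]))
            if best is None or gain > best[0]:
                best = (gain, j, v, left, right)
    return best

def grow(rows, labels, depth=0, max_depth=3):
    # stopping criteria: pure leaf, depth limit, or no useful split
    if len(set(labels)) == 1 or depth >= max_depth:
        return Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return Counter(labels).most_common(1)[0][0]
    _, j, v, left, right = split
    return (j, v,
            grow([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
            grow([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth))

def predict(tree, x):
    # walk from the root to a leaf following the split tests
    while isinstance(tree, tuple):
        j, v, left, right = tree
        tree = left if x[j] <= v else right
    return tree
```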

Page 34: Decision trees · 2021. 1. 18. · Use of decision trees Plan 1. Introduction 2. Use of decision trees 2.1 Prediction 2.2 Interpretability : Descriptive data analysis 3. Learning

Artificial intelligence

Learning of decision trees

Learning algorithm

ID3 - Training Examples – [9+,5-]

Pages 35–37: Learning of decision trees · Learning algorithm — ID3: Selecting the next attribute (figures)

Page 38: Learning of decision trees · Learning algorithm

ID3 - Best Attribute - Outlook

Page 39: Learning of decision trees · Learning algorithm

ID3 - S_sunny (the subset of training examples with Outlook = Sunny)

Page 40: Learning of decision trees · Learning algorithm

ID3 - Results

Page 41: Pruning of decision trees — Plan

Page 42: Pruning of decision trees

Overfitting

Page 43: Pruning of decision trees

Overfitting

Remark: decision trees do not need variable selection or dimension reduction (in terms of accuracy).

Page 44: Pruning of decision trees · Cost-complexity trade-off — Plan

Page 45: Pruning of decision trees · Cost-complexity trade-off

Cost-Complexity Pruning

The idea
- trade-off between predictive efficiency and complexity
- find a subtree that fulfils this trade-off

Metrics
- Err ← misclassification rate or MSE
- Criterion: R_α = Err + αH   (H = number of leaves)

Steps
- Find a useful sequence of nested subtrees
- Choose the right subtree
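The criterion R_α = Err + αH can be illustrated on a hypothetical sequence of nested subtrees (the error rates and leaf counts below are made up for the example):

```python
def cost_complexity(err, n_leaves, alpha):
    # R_alpha = Err + alpha * H, where H is the number of leaves
    return err + alpha * n_leaves

# hypothetical nested subtrees: name -> (misclassification rate, number of leaves)
subtrees = {"T0": (0.05, 8), "T1": (0.08, 4), "T2": (0.15, 2), "P1": (0.40, 1)}

def best_subtree(alpha):
    # the subtree minimising R_alpha; a larger alpha favours smaller trees
    return min(subtrees, key=lambda name: cost_complexity(*subtrees[name], alpha))
```

With α = 0 the full tree T0 wins; as α grows, the optimum moves through the nested sequence down to the root P1.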

Page 46: Pruning of decision trees · Cost-complexity trade-off

Cost-Complexity Pruning

Sequence of subtrees
Result: a sequence of trees that are all subtrees of T_0: T_0 ⊃ T_1 ⊃ T_2 ⊃ ⋯ ⊃ T_S = P_1 (initial node)

Learn the biggest tree T_s = T_0 := P_{Hmax}, obtained for α_0 = 0 (s = 0);
while T_s ≠ P_1 do
    T_{s+1} = argmin_{t ∈ subtrees(T_s)} [R_{α_s}(t) − R_{α_s}(T_s)];
    α_{s+1} = R_{α_s}(T_{s+1}) − R_{α_s}(T_s);
end

We get two bijective sets: {T_0, …, T_S} and {α_0, …, α_S} (with T_S = P_1).

Selection: T_{s*} = argmin_{T_s ∈ {T_0, …, T_S}} Err(T_s) ← pruning set or cross-validation

Page 47: Pruning of decision trees · Cost-complexity trade-off

Cost-Complexity Pruning

Figure – Sequence of nested subtrees

Here, α_2 < α_1 ⟹ T_1 ⊂ T_2 (the larger penalty yields the more heavily pruned subtree)

Page 48: Extension: random forest — Plan

Page 49: Extension: random forest

Random forest

Motivation
- trees' instability
- bias-variance trade-off

Averaging reduces variance: Var(X̄) = Var(X) / N (for independent predictions)

→ average models to reduce model variance

One problem:
- there is only one training set
- so where do multiple models come from?
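The variance-reduction claim is easy to check empirically; a small simulation of my own, with Gaussian draws standing in for independent predictions:

```python
import random
import statistics

rng = random.Random(0)

def var_of_mean(n_preds, n_trials=2000):
    # empirical variance of the average of n_preds independent predictions
    means = [statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_preds))
             for _ in range(n_trials)]
    return statistics.pvariance(means)
```

Var(X̄) shrinks roughly as Var(X)/N: averaging 10 predictions cuts the variance by about an order of magnitude.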

Page 50: Extension: random forest

Bagging: Bootstrap Aggregation

- Tin Kam Ho (1995) → Leo Breiman (2001)
- Take repeated bootstrap samples from the training set
- Bootstrap sampling: given a training set D containing N examples, draw N examples at random with replacement from D
- Bagging:
  - create B bootstrap samples D_1, …, D_B
  - train a distinct classifier on each D_b
  - classify a new instance by majority vote / averaging / aggregating the predictions
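These steps fit in a few lines; a sketch with illustrative names, where any base learner can be plugged in as `train`:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    # draw N examples at random with replacement from D
    return [rng.choice(data) for _ in data]

def bag(train, data, n_models, seed=0):
    # create B bootstrap samples D_1..D_B and train one model on each
    rng = random.Random(seed)
    return [train(bootstrap(data, rng)) for _ in range(n_models)]

def vote(models, x):
    # aggregate the ensemble's predictions by majority vote
    return Counter(m(x) for m in models).most_common(1)[0][0]
```

For regression, `vote` would be replaced by the average of the predictions; a random forest additionally draws a random subset of attributes at each split of each tree.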

Page 51: Extension: random forest

Random forest

Page 52: Extension: random forest

References

* L. Breiman, J. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, 1984.

* J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, pp. 81–106, 1986.

* L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5–32, 2001.

* G. Biau, L. Devroye, and G. Lugosi, "Consistency of random forests and other averaging classifiers," Journal of Machine Learning Research, vol. 9, pp. 2015–2033, 2008.