Additive Models, Trees, and Related Methods


2006. 02. 17.

Partly based on Prof. Prem Goel’s Slides

9.1 Generalized Additive Models

Mean function:

fj: unspecified smooth (nonparametric) functions

Relate conditional mean of Y to an additive function of X’s via a link function g.

$$g[\mu(X)] = \alpha + f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p), \qquad \mu(X) = E(Y \mid X)$$

Standard Link Functions
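
The classical choices (ESL Section 9.1): the identity link g(μ) = μ for Gaussian responses; the logit g(μ) = log[μ/(1−μ)] (or probit) for binomial probabilities; and the log link g(μ) = log(μ) for Poisson counts.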

Advanced Link Functions

Fitting Additive Models

Fit each fj with a scatterplot smoother and estimate all p functions simultaneously; for example, use a cubic smoothing spline as the smoother.

Criterion: penalized sum of squares (9.7)
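
Written out, the criterion (9.7) is

$$\mathrm{PRSS}(\alpha, f_1, \ldots, f_p) = \sum_{i=1}^{N} \Big( y_i - \alpha - \sum_{j=1}^{p} f_j(x_{ij}) \Big)^2 + \sum_{j=1}^{p} \lambda_j \int f_j''(t_j)^2 \, dt_j$$

where the λj ≥ 0 are tuning parameters.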

An additive cubic spline model minimizes this criterion: each fj is a cubic spline in the component Xj.

Knots at each of the unique values xij

$$Y = \alpha + \sum_{j=1}^{p} f_j(X_j) + \varepsilon$$

The backfitting algorithm

• Can accommodate other fitting methods in the same way, by specifying an appropriate smoothing operator Sj.

• For a large class of linear smoothers, backfitting is equivalent to a Gauss-Seidel algorithm
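
A minimal sketch of the backfitting loop, with smoother(x, r) a hypothetical stand-in for any one-dimensional smoother Sj (e.g., a cubic smoothing spline fit):

import numpy as np

def backfit(X, y, smoother, n_iter=20):
    """Backfitting for Y = alpha + sum_j f_j(X_j) + eps (a sketch)."""
    N, p = X.shape
    alpha = y.mean()
    f = np.zeros((N, p))               # current estimates f_j(x_ij)
    for _ in range(n_iter):
        for j in range(p):
            # partial residuals: remove every fitted component except f_j
            r = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = smoother(X[:, j], r)
            f[:, j] -= f[:, j].mean()  # center each f_j for identifiability
    return alpha, f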

Additive Logistic Regression

For the logistic regression model and other generalized additive models, the appropriate criterion is a penalized log-likelihood.

To maximize it, the backfitting procedure is used in conjunction with a likelihood maximizer.

Local Scoring Algorithm for Additive Logistic Regression

$$\log \frac{\Pr(Y = 1 \mid X)}{\Pr(Y = 0 \mid X)} = \alpha + f_1(X_1) + \cdots + f_p(X_p)$$
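
A compact sketch of local scoring, with weighted_backfit(X, z, w) a hypothetical stand-in for one weighted additive-model fit (weighted backfitting with any linear smoother):

import numpy as np

def local_scoring(X, y, weighted_backfit, n_iter=10):
    """Local scoring: IRLS wrapped around an additive fit (a sketch)."""
    N, p = X.shape
    eta = np.zeros(N)                      # additive predictor alpha + sum_j f_j
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-eta))  # current fitted probabilities
        w = prob * (1.0 - prob)            # working weights
        z = eta + (y - prob) / w           # working response
        eta = weighted_backfit(X, z, w)    # one weighted additive fit
    return eta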

9.2 Tree-Based Methods

Partition the feature space into a set of rectangles and fit a simple model in each one.

CART and C4.5

Regression Tree

Assume a recursive binary partition. In each partition, Y is modeled with a different constant.

For each split, choose the variable and split-point that minimize the sum of squares.

Repeat within each subset until a minimum node size is reached.

$$f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m)$$

$$\min_{j,\,s} \Big[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 \;+\; \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \Big]$$
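
A brute-force sketch of this split search (illustrative only; real implementations presort each feature rather than rescanning):

import numpy as np

def best_split(X, y):
    """Find the (variable j, split-point s) minimizing the two-region RSS."""
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left = X[:, j] <= s
            if left.all() or not left.any():
                continue
            # the inner minimizations: c1, c2 are just the region means
            rss = ((y[left] - y[left].mean()) ** 2).sum() + \
                  ((y[~left] - y[~left].mean()) ** 2).sum()
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss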

Regression Tree

How large should we grow the tree? Use cost-complexity pruning.

Find the subtree that minimizes the cost-complexity criterion Cα(T) below.

Find this subtree adaptively by weakest-link pruning:

Successively collapse the internal node that produces the smallest per-node increase in RSS, until the single-node tree is reached.

This sequence of subtrees contains the tree that minimizes Cα(T).
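
Concretely, weakest-link pruning collapses the internal node t with the smallest

$$g(t) = \frac{R(t) - R(T_t)}{|T_t| - 1}$$

where R denotes the training risk, the sum of Nm Qm(T) from the criterion below, and T_t is the subtree rooted at t.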

The tuning parameter α is chosen by cross-validation.

$$C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|, \qquad Q_m(T) = \frac{1}{N_m} \sum_{x_i \in R_m} (y_i - \hat{c}_m)^2$$

Classification Trees

The only changes are in the criteria for splitting nodes and for pruning the tree.

Node Impurity Measures

• Cross-entropy and the Gini index are more sensitive to changes in the node probabilities than the misclassification rate (see the sketch below).

• Either cross-entropy or the Gini index should be used when growing the tree.

• When pruning, any of the three can be used.
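
A small sketch computing the three measures from a vector of class proportions:

import numpy as np

def impurities(p):
    """Misclassification rate, Gini index, and cross-entropy for class
    proportions p (nonnegative, summing to 1)."""
    p = np.asarray(p, dtype=float)
    misclass = 1.0 - p.max()
    gini = (p * (1.0 - p)).sum()
    entropy = -(p[p > 0] * np.log(p[p > 0])).sum()
    return misclass, gini, entropy

For example, impurities([0.5, 0.5]) gives (0.5, 0.5, 0.693) and impurities([0.75, 0.25]) gives (0.25, 0.375, 0.562).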

Other Issues

• Instability: the process is hierarchical, so an error in an upper split is propagated down. Remedy: bagging.

• Lack of smoothness in the prediction surface: can degrade performance in regression. Remedy: MARS.

ROC Curves

By varying the relative sizes of the losses L01 and L10 in the loss matrix, one can increase or decrease the sensitivity and specificity.

9.3 PRIM: Bump Hunting

Patient Rule Induction Method: seeks boxes in which the response average is high, without using binary splits.

The collection of rules can be hard to interpret, but each individual rule is simple.

"Patient"

Does not fragment the data as quickly as binary partitioning, which can help the top-down greedy algorithm find a better solution.
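
A rough sketch of one peeling step, with the box stored as a (p, 2) array of [low, high] bounds (all names here are illustrative):

import numpy as np

def peel_step(X, y, box, alpha=0.1):
    """Peel a fraction alpha off the face of the box that leaves the
    highest mean response among the remaining points."""
    in_box = np.all((X >= box[:, 0]) & (X <= box[:, 1]), axis=1)
    Xb, yb = X[in_box], y[in_box]
    best_mean, best_box = -np.inf, box
    for j in range(X.shape[1]):                    # try all 2p faces
        for side, q in ((0, alpha), (1, 1.0 - alpha)):
            cand = box.copy()
            cand[j, side] = np.quantile(Xb[:, j], q)
            keep = (Xb[:, j] >= cand[j, 0]) & (Xb[:, j] <= cand[j, 1])
            if keep.any() and yb[keep].mean() > best_mean:
                best_mean, best_box = yb[keep].mean(), cand
    return best_box, best_mean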


9.4 MARS: Multivariate Adaptive Regression Splines

Basic element: a pair of piecewise-linear basis functions.

Form a reflected pair for each input Xj, with knots at each observed value of that input: 2Np basis functions in total.

$$\{ (X_j - t)_+, \; (t - X_j)_+ \;:\; t \in \{x_{1j}, x_{2j}, \ldots, x_{Nj}\}, \; j = 1, \ldots, p \}$$
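
A minimal sketch of the reflected pair and of enumerating the candidate knots:

import numpy as np

def reflected_pair(x, t):
    """The pair of piecewise-linear (hinge) functions (x - t)_+ and (t - x)_+."""
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

def candidate_knots(X):
    """All (input j, knot t) pairs: one reflected pair per observed value,
    giving 2*N*p basis functions in total (fewer if values repeat)."""
    return [(j, t) for j in range(X.shape[1]) for t in np.unique(X[:, j])]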

Model Building

Forward Selection

General Basis Selection Rule

Backward Deletion

Effective # of Parameters

Other Issues

MARS for classification:

Two classes: 0/1 coding and regression. More than two classes: optimal scoring (Section 12.5).

MARS vs. CART:

Piecewise-linear basis functions vs. step functions. Multiplication vs. splitting, and not necessarily binary splitting.

9.5 Hierarchical Mixtures of Experts

A soft gating network, with an expert at each terminal node.

Hierarchical Mixtures of Experts

Estimation of parameters: EM algorithm.

E-step: compute expectations of the gating probabilities.

M-step: estimate the parameters in the expert networks by multiple logistic regression.

HME vs. CART:

Similar to CART with linear-combination splits. Soft splits are better for modeling a gradual response transition. There is no method to find a good tree topology for an HME.
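
A small sketch of the softmax gating network at one node, computing how an input is softly routed among K branches (names are illustrative):

import numpy as np

def gating_probs(x, V):
    """Soft split: g_k(x) = exp(v_k @ x) / sum_l exp(v_l @ x), where row
    v_k of V holds the linear coefficients for branch k."""
    scores = V @ x
    scores -= scores.max()   # subtract the max for numerical stability
    g = np.exp(scores)
    return g / g.sum()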


9.6 Missing Data

The key question is whether the missing-data mechanism has distorted the observed data.

Missing at random (MAR): the missing-data mechanism depends only on the observed data.

Missing completely at random (MCAR): the missing-data mechanism is independent of the data entirely.

$$\text{MAR:} \quad \Pr(R \mid Z, \theta) = \Pr(R \mid Z_{\mathrm{obs}}, \theta)$$

$$\text{MCAR:} \quad \Pr(R \mid Z, \theta) = \Pr(R \mid \theta)$$

Missing Data

Assuming MCAR:

1. Discard observations with any missing values.

2. Rely on the learning algorithm to deal with missing values in its training phase.

3. Impute all missing values before training (a minimal example follows).
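
A minimal sketch of strategy 3, imputing each missing entry with its column mean (missing values assumed coded as NaN):

import numpy as np

def mean_impute(X):
    """Fill each NaN with the mean of its column; sensible only under MCAR."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X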

9.7 Computational Considerations

Additive model fitting: O(mpN + pN log N), where m is the number of backfitting iterations.

Trees: O(pN log N) for the initial sorting and for the split computations.

MARS: O(NM³ + pM²N) to build a model with M terms.

HME: O(Np²) for the regressions and O(Np²K²) for a K-class logistic regression in each M-step; the EM algorithm typically takes a long time to converge.
