When Efficient Model Averaging Out-Performs Bagging and Boosting

Ian Davidson, SUNY Albany
Wei Fan, IBM T.J. Watson

Ensemble Techniques

• Techniques such as boosting and bagging are methods of combining models.

• Used extensively in ML and DM; they seem to work well in a large variety of situations.

• But model averaging is the “correct” Bayesian method of using multiple models.

• Does model averaging have a place in ML and DM?

What is Model Averaging?

[Equation image not preserved. Its labeled parts: posterior weighting, class probability, integration over model space]

• Averaging of class probabilities weighted by posterior

• Removes model uncertainty by averaging

• Prohibitive for large model spaces such as decision trees
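
The labels on this slide appear to annotate the standard Bayesian model averaging identity. The slide's own equation did not survive extraction, so the following is a reconstruction under that assumption. The symbols are chosen here and are not taken from the slides: θ is a model in the model space Θ, D is the training data, x is a test instance, and y is its class.

```latex
P(y \mid x, D)
  \;=\;
  \int_{\theta \in \Theta}
    \underbrace{P(y \mid x, \theta)}_{\text{class probability}}
    \;
    \underbrace{P(\theta \mid D)}_{\text{posterior weighting}}
  \, d\theta
```

When the integral is intractable, one common approximation is to draw models θ_1, ..., θ_T from (an approximation to) the posterior and average their class probabilities, P(y | x, D) ≈ (1/T) Σ_t P(y | x, θ_t).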

Efficient Model Averaging: PBMA and Random DT

• PBMA (Davidson 04): parametric bootstrap model averaging
  – Use parametric model to generate multiple bootstraps computed from a single training set.

• Random Decision Tree (Fan et al 03)
  – Construct each tree’s structure randomly
    • Categorical feature used once in a decision path
    • Random threshold for continuous features.
  – Leaf node statistics estimated from data.
  – Average probability of multiple trees.
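
The random decision tree bullets can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of Fan et al.: the feature-spec format, the function names (`build_tree`, `fit`, `predict_proba`, `train_forest`), the fixed depth, drawing thresholds uniformly from a known feature range, and the uniform fallback for empty leaves are all choices made here for the example.

```python
import random
from collections import Counter

# Feature specs (hypothetical format): ("num", lo, hi) or ("cat", [values]).

def build_tree(specs, depth, rng, used=frozenset()):
    """Pick the tree structure at random, without looking at any labels."""
    # A categorical feature may appear only once on a decision path;
    # continuous features may be reused with a fresh random threshold.
    candidates = [i for i, s in enumerate(specs)
                  if s[0] == "num" or i not in used]
    if depth == 0 or not candidates:
        return {"counts": Counter()}  # leaf: class counts filled in by fit()
    i = rng.choice(candidates)
    if specs[i][0] == "num":
        _, lo, hi = specs[i]
        thr = rng.uniform(lo, hi)  # assumption: uniform over the known range
        kids = {True: build_tree(specs, depth - 1, rng, used),
                False: build_tree(specs, depth - 1, rng, used)}
    else:
        thr = None
        kids = {v: build_tree(specs, depth - 1, rng, used | {i})
                for v in specs[i][1]}
    return {"feat": i, "thr": thr, "kids": kids}

def leaf_for(node, x):
    """Route example x down the tree and return the leaf it lands in."""
    while "feat" in node:
        v = x[node["feat"]]
        key = (v <= node["thr"]) if node["thr"] is not None else v
        node = node["kids"][key]
    return node

def fit(tree, X, y):
    """Estimate leaf statistics (class counts) from the training data."""
    for x, label in zip(X, y):
        leaf_for(tree, x)["counts"][label] += 1

def train_forest(specs, X, y, n_trees=10, depth=4, seed=0):
    rng = random.Random(seed)
    trees = [build_tree(specs, depth, rng) for _ in range(n_trees)]
    for tree in trees:
        fit(tree, X, y)
    return trees

def predict_proba(trees, x, classes):
    """Average the per-tree class probabilities."""
    probs = {c: 0.0 for c in classes}
    for tree in trees:
        counts = leaf_for(tree, x)["counts"]
        total = sum(counts.values())
        for c in classes:
            # Assumption: an empty leaf contributes a uniform distribution.
            probs[c] += counts[c] / total if total else 1.0 / len(classes)
    return {c: p / len(trees) for c, p in probs.items()}
```

Because the structure is chosen before any label is seen, training reduces to one pass over the data per tree to fill in leaf counts; the only learning signal enters through the leaf statistics and the final averaging.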

Our Empirical Study

• Idea: When model uncertainty occurs, model averaging should perform well

• Four specific but common situations when factoring in model uncertainty is beneficial
  – Class label noise
  – Many class labels
  – Sample selection bias
  – Small data sets

Class Label Noise

• Randomly flip 10% of labels
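
One way to inject this kind of noise is sketched below. The helper name `flip_labels`, the fixed seed, and the choice to reassign each selected example to a different class picked uniformly at random are assumptions made for illustration; the slides do not say how the flipped label was chosen when there are more than two classes.

```python
import random

def flip_labels(labels, classes, rate=0.10, seed=0):
    """Return a copy of `labels` with a `rate` fraction moved to another class.

    Requires at least two distinct classes.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(rate * len(noisy))
    # Sample positions without replacement so exactly n_flip labels change.
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy
```

Only the training labels would be perturbed this way; the test labels stay clean so that accuracy still measures agreement with the true concept.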

Data Set with Many Classes

Biased Training Sets

• See ICDM 2005 for a formal analysis

• See KDD 2006 to look at estimating accuracy

• See ICDM 2006 for a case study

Universe of Examples

Two classes: red and green

• red: f2 > f1

• green: f2 <= f1
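
The labeling rule can be written down directly. In the sketch below only the rule itself (red when f2 > f1, green otherwise) comes from the slide. The feature range [0, 1), the function names, and especially the selection mechanism in `biased_sample` (keep an example with probability equal to its f1 value) are hypothetical, chosen only to show what a feature-dependent sample selection bias might look like; the slides do not describe how their biased sample was drawn.

```python
import random

def label(f1, f2):
    """Concept from the slide: red if f2 > f1, green otherwise."""
    return "red" if f2 > f1 else "green"

def make_universe(n, seed=0):
    """Draw n points uniformly from the unit square (assumed range) and label them."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return points, [label(f1, f2) for f1, f2 in points]

def biased_sample(points, labels, seed=1):
    """Hypothetical bias: keep each example with probability equal to its f1."""
    rng = random.Random(seed)
    kept = [(p, y) for p, y in zip(points, labels) if rng.random() < p[0]]
    return [p for p, _ in kept], [y for _, y in kept]
```

Under this hypothetical scheme the selection depends on a feature rather than directly on the class, yet it still shifts the class proportions in the training sample, because points with large f1 are more often green.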

Unbiased and Biased Samples

Single Decision Tree

Unbiased 97.1% Biased 92.1%

Random Decision Tree

Unbiased 96.9% Biased 95.9%

Bagging

Unbiased 97.82% Biased 93.52%

PBMA

Unbiased 99.08% Biased 94.55%

Boosting

Unbiased 96.405% Biased 92.7%

Scope of This Paper

• Identifies conditions where model averaging should outperform bagging and boosting.

• Empirically verifies these claims.

• Other questions:
  – Why do bagging and boosting perform badly in these conditions?
