
Boosting & AdaBoost

Luis De Alba (ldealbar@cc.hut.fi)

5.3.2008

Boosting

Definition: Method of producing a very accurate prediction rule by combining inaccurate rules of thumb.

Objective: Improve the accuracy of any given learning algorithm (LMS, SVM, Perceptron, etc.)

How to boost in 2 steps:

1. Select a classifier with an accuracy greater than chance (random).
2. Form an ensemble of the selected classifiers whose joint decision rule has high accuracy.

Example 1: Golf tournament, Tiger Woods vs. 18 average-or-better players.

Example 2: Classification problem with 2 dimensions, 2 classes, 3 classifiers, and a training set D = {(x_1, y_1), ..., (x_m, y_m)}:

a) Train C1 with D1, where D1 is randomly selected from D.
b) Train C2 with D2, where D2 is selected so that only 50% of it is correctly classified by C1.
c) Train C3 with D3, where D3 is selected from the patterns on which C1 and C2 disagree.

Classification task: given a new pattern x, check whether C1(x) = C2(x); if yes, output that common label (use C1), if no, output C3(x).

Duda, et al.
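A minimal sketch of this joint decision rule (C1, C2, C3 here are hypothetical stand-ins for the three trained classifiers): if C1 and C2 agree, return their common label; otherwise defer to C3.

    def joint_decision(x, C1, C2, C3):
        """Combined decision rule from the three-classifier example (Duda, et al.)."""
        if C1(x) == C2(x):      # C1 and C2 agree: use the agreed class
            return C1(x)
        return C3(x)            # otherwise let C3 break the tie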

AdaBoost

Adaptive Boosting, aka "AdaBoost". Introduced by Yoav Freund and Robert Schapire in 1995, both from AT&T Labs.

Advantages:

- As many weak learners as needed (wished)
- "Focuses in" on the informative or "difficult" patterns

Algorithm inputs:

- Training set: (x_1, y_1), ..., (x_m, y_m), with labels y_i ∈ Y = {−1, 1} (two classes)
- Weak or base algorithm C: LMS, Perceptron, etc.
- Number of rounds (calls) to the weak algorithm: k = 1, ..., K

Each training tuple receives a weight that determines its probability of being selected.

At initialization all weights across the training set are uniform:

W_1(i) = 1/m, i = 1, ..., m

(each training pair (x_i, y_i) carries a weight w_i)

For each iteration k a random set D_k is selected from the training set according to the weights W_k, and the classifier C_k is trained with the selection.

After training:

- Increase the weights of training patterns misclassified by C_k.
- Decrease the weights of training patterns correctly classified by C_k.

Repeat for all K rounds.
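Selecting D_k "according to the weights" can be done by sampling indices with probability W_k(i); a minimal NumPy sketch (hypothetical helper name):

    import numpy as np

    def sample_Dk(X, y, w, rng=None):
        """Draw D_k: pair (x_i, y_i) is picked with probability W_k(i),
        sampling with replacement, so heavily weighted patterns appear more often."""
        rng = rng or np.random.default_rng()
        idx = rng.choice(len(y), size=len(y), replace=True, p=w)
        return X[idx], y[idx]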

Algorithm:

    initialize D = {(x_1, y_1), ..., (x_m, y_m)}, K, W_1(i) = 1/m, i = 1, ..., m
    for k = 1 to K:
        train weak learner C_k using D_k sampled according to W_k(i)
        E_k = training error of C_k measured using D_k
        α_k = (1/2) ln[(1 − E_k) / E_k]
        W_{k+1}(i) = W_k(i) · exp(−α_k · y_i · C_k(x_i)) / Z_k
    return C_k and α_k for all k = 1, ..., K

Output of the final classifier for any given input:

H(x) = sign( ∑_{k=1}^{K} α_k C_k(x) )
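As a concrete illustration of the algorithm above, here is a minimal NumPy sketch (all function names are hypothetical, not from the lecture). For brevity the weak learner is a decision stump rather than the Perceptron used in the homework, and it is fit with the weights W_k directly instead of resampling D_k; E_k is therefore the weighted training error.

    import numpy as np

    def train_stump(X, y, w):
        """Weak learner for this sketch: a one-feature threshold rule
        ("decision stump") chosen to minimize the weighted training error."""
        m, d = X.shape
        best, best_err = None, np.inf
        for j in range(d):                              # candidate feature
            for thr in np.unique(X[:, j]):              # candidate threshold
                for sign in (1, -1):                    # candidate polarity
                    pred = np.where(sign * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()            # weighted error
                    if err < best_err:
                        best, best_err = (j, thr, sign), err
        return best

    def stump_predict(stump, X):
        j, thr, sign = stump
        return np.where(sign * (X[:, j] - thr) >= 0, 1, -1)

    def adaboost_train(X, y, K):
        """AdaBoost training loop; y must contain labels in {-1, +1}."""
        m = X.shape[0]
        w = np.full(m, 1.0 / m)                         # W_1(i) = 1/m
        ensemble = []                                   # pairs (alpha_k, C_k)
        for k in range(K):
            stump = train_stump(X, y, w)                # C_k trained on weighted data
            pred = stump_predict(stump, X)
            E_k = w[pred != y].sum()                    # weighted training error
            if E_k >= 0.5:                              # weak learner must beat chance
                break
            alpha = 0.5 * np.log((1 - E_k) / max(E_k, 1e-12))  # alpha_k
            w = w * np.exp(-alpha * y * pred)           # raise misclassified, lower correct
            w = w / w.sum()                             # divide by Z_k -> distribution
            ensemble.append((alpha, stump))
        return ensemble

    def adaboost_predict(ensemble, X):
        """Final classifier H(x) = sign( sum_k alpha_k * C_k(x) )."""
        score = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
        return np.where(score >= 0, 1, -1)

Usage: ensemble = adaboost_train(X, y, K=20); y_hat = adaboost_predict(ensemble, X).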

Error calculation:

The error is the normalized sum over all training samples wrongly classified by the weak learner:

E_k = (1/m) ∑_{i=1}^{m} [C_k(x_i) ≠ y_i]   (1 if misclassified, 0 otherwise)

The weak learner shall do better than random guessing, i.e., E_k < 0.5.

Analysis

α measurement:

α_k = (1/2) ln[(1 − E_k) / E_k]

measures the importance assigned to C_k. There are two things to note: α_k ≥ 0 if E_k ≤ 1/2, and α_k gets larger as E_k gets smaller.

Normalization factor:

Z_k is a normalization factor chosen so that W_{k+1} will be a distribution. A possible calculation is

Z_k = ∑_{i=1}^{m} W_k(i) · exp(−α_k · y_i · C_k(x_i))

i.e., the sum of the unnormalized updated weights W_{k+1}(i).

Hypothesis:

H is a weighted majority vote of the K weak hypotheses, where α_k is the weight associated with C_k:

H(x) = sign( ∑_{k=1}^{K} α_k C_k(x) )

Each weak hypothesis C_k outputs a prediction whose sign represents the class (−1, +1) and whose magnitude |C_k(x)| is the confidence.

Why better than random?

Suppose three classifiers:

    Classifier            E      α        CC      IC
    Worse than random     0.75   −0.549   1.731   0.577
    Random                0.50    0.000   1.000   1.000
    Better than random    0.25   +0.549   0.577   1.731

CC = Correctly Classified weight factor, IC = Incorrectly Classified weight factor.
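These factors are just e^(−α) for correctly classified weights and e^(+α) for misclassified ones (before dividing by Z_k); a quick check of the table values:

    import numpy as np

    for E in (0.75, 0.50, 0.25):
        alpha = 0.5 * np.log((1 - E) / E)    # alpha_k = 1/2 ln[(1-E)/E]
        cc = np.exp(-alpha)                  # factor on correctly classified weights
        ic = np.exp(alpha)                   # factor on misclassified weights
        print(f"E={E:.2f}  alpha={alpha:+.3f}  CC={cc:.3f}  IC={ic:.3f}")
    # Reproduces the table above (up to rounding).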

Generalization Error

Based on the sample size m, the VC-dimension d, and the number of boosting rounds K, the generalization error is bounded by the training error plus a term of order Õ(√(K·d/m)).

This suggests overfitting when K is too large. Experiments showed, however, that AdaBoost does not tend to overfit: it continues improving after the training error reaches zero. A margin-based analysis explains this with a bound of order Õ(√(d/(m·θ²))), independent of K, where θ is the margin.

Duda, et al.

Multi-class, Multi-label

In the multi-label case, each instance x ∈ X may belong to multiple labels in the label set 𝒴: a training example is a pair (x, Y) where Y ⊆ 𝒴 (for instance, x1 and x2 each carry a subset of the labels y1, ..., y5).

Decompose the problem into n orthogonal binary classification problems, one per label. For Y ⊆ 𝒴, define Y[ℓ] for ℓ ∈ 𝒴 to be

Y[ℓ] = +1 if ℓ ∈ Y, −1 if ℓ ∉ Y

and let H: X × 𝒴 → {−1, 1} be defined by H(x, ℓ) = H(x)[ℓ].

Then replace each training example by k examples, one per label:

(x_i, Y_i) → ((x_i, ℓ), Y_i[ℓ]) for ℓ ∈ 𝒴

The algorithm is called AdaBoost.MH [3].
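A minimal sketch of this reduction (hypothetical names): each multi-label example (x_i, Y_i) is expanded into one binary example per label, which ordinary two-class AdaBoost can then consume.

    def expand_multilabel(X, Y_sets, labels):
        """AdaBoost.MH-style reduction: (x_i, Y_i) becomes ((x_i, label), +1/-1)
        for every label, with target +1 iff the label belongs to Y_i."""
        expanded = []
        for x_i, Y_i in zip(X, Y_sets):
            for label in labels:
                expanded.append(((x_i, label), 1 if label in Y_i else -1))
        return expanded

    # Example: with labels ["y1", "y2", "y3", "y4", "y5"] and Y_i = {"y1", "y3"},
    # x_i yields 5 binary examples: positive for y1 and y3, negative for the rest.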

Homework hints

The weak learning function is a simple Perceptron.

Do not forget how Alpha (α) is computed.

Do not forget what the weight vector is and how it is computed.

Do not be lazy!

Conclusion

AdaBoost does not require prior knowledge of the weak learner.

The performance is completely dependent on the learner and the training data.

AdaBoost can identify outliers based on the weight vector.

It is susceptible to noise, and its performance can degrade on data with a very large number of outliers.

Bibliography

[1] A Short Introduction to Boosting. Yoav Freund, Robert Schapire. 1999.

[2] Pattern Classification. Richard Duda, et al. Wiley Interscience. 2nd Edition. 2000. [s. 9.5.2]

[3] Improved boosting algorithms using confidence-rated predictions. Robert Schapire, Yoram Singer. 1998 [s. 6,7]
