This talk presents the new Leveraging Bagging method for evolving data streams.
Leveraging Bagging for Evolving Data Streams
Albert Bifet, Geoff Holmes, and Bernhard Pfahringer
University of Waikato, Hamilton, New Zealand
ECML PKDD 2010, Barcelona, 21 September 2010
Mining Data Streams with Concept Drift

Extract information from a potentially infinite sequence of data, possibly varying over time, using few resources.

Adaptively: with no prior knowledge of the type or rate of change.

Leveraging Bagging
New improvements for adaptive bagging methods using:
input randomization
output randomization
Outline
1 Data stream constraints
2 Leveraging Bagging for Evolving Data Streams
3 Empirical evaluation
Mining Massive Data
Eric Schmidt, August 2010:
"Every two days now we create as much information as we did from the dawn of civilization up until 2003."

5 exabytes of data
Data stream classification cycle
1 Process an example at a time, and inspect it only once (at most)
2 Use a limited amount of memory
3 Work in a limited amount of time
4 Be ready to predict at any point
Mining Massive Data
Koichi Kawana:
"Simplicity means the achievement of maximum effect with minimum means."

Data Streams: the trade-off between time, accuracy, and memory.
Evaluation Example
             Accuracy  Time  Memory
Classifier A    70%     100      20
Classifier B    80%      20      40
Which classifier is performing better?
RAM-Hours
RAM-Hour: every GB of RAM deployed for 1 hour.
Cloud Computing Rental Cost Options
Evaluation Example
             Accuracy  Time  Memory  RAM-Hours
Classifier A    70%     100      20      2,000
Classifier B    80%      20      40        800
Which classifier is performing better?
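The RAM-Hours column is simply run time multiplied by memory footprint. A minimal sketch of the calculation, assuming the table's abstract units stand for hours and GB (the slide leaves the units unspecified):

```python
def ram_hours(time_hours, memory_gb):
    """Cost of keeping memory_gb GB of RAM deployed for time_hours hours."""
    return time_hours * memory_gb

# The two classifiers from the table above:
cost_a = ram_hours(100, 20)  # Classifier A: 2,000 RAM-Hours
cost_b = ram_hours(20, 40)   # Classifier B: 800 RAM-Hours
```

Under this measure, Classifier B is cheaper despite using twice the memory, because it finishes five times faster.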
Hoeffding Trees

Hoeffding Tree: VFDT
Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000

With high probability, constructs a model identical to the one a traditional (greedy) batch method would learn, with theoretical guarantees on the error rate.
Figure: Example decision tree with splits on "Contains 'Money'" (Yes/No) and "Time" (Day/Night).
Hoeffding Naive Bayes Tree
Hoeffding Tree: Majority Class learner at leaves

Hoeffding Naive Bayes Tree
G. Holmes, R. Kirkby, and B. Pfahringer. Stress-testing Hoeffding trees, 2005.
monitors accuracy of a Majority Class learner
monitors accuracy of a Naive Bayes learner
predicts using the most accurate method
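The leaf-level arbitration can be sketched in a few lines. This is an illustrative toy, not the MOA implementation: `AdaptiveLeaf` and `MajorityClass` are hypothetical names, and a real Hoeffding Naive Bayes Tree would pair the majority-class predictor with an actual Naive Bayes learner rather than a second majority-class one.

```python
from collections import Counter

class MajorityClass:
    """Trivial leaf predictor: always predicts the most frequent class."""
    def __init__(self):
        self.counts = Counter()
    def learn(self, x, y):
        self.counts[y] += 1
    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

class AdaptiveLeaf:
    """Sketch of a leaf that scores two candidate predictors prequentially
    (test on the example, then train on it) and predicts with whichever
    has been more accurate so far."""
    def __init__(self, mc, nb):
        self.mc, self.nb = mc, nb
        self.mc_correct = 0
        self.nb_correct = 0
    def learn(self, x, y):
        # Score before training, so each example is an unseen test case.
        if self.mc.predict(x) == y:
            self.mc_correct += 1
        if self.nb.predict(x) == y:
            self.nb_correct += 1
        self.mc.learn(x, y)
        self.nb.learn(x, y)
    def predict(self, x):
        best = self.mc if self.mc_correct >= self.nb_correct else self.nb
        return best.predict(x)
```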
Bagging
Figure: Poisson(1) distribution.

Bagging builds a set of M base models, with a bootstrap sample created by drawing random samples with replacement.
Each base model's training set contains each of the original training examples K times, where P(K = k) follows a binomial distribution.
Oza and Russell’s Online Bagging for M models
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples do
3:   for m = 1, 2, ..., M do
4:     Set w = Poisson(1)
5:     Update h_m with the current example with weight w
6: anytime output:
7: return hypothesis: h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)
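The pseudocode above maps directly to code. A minimal sketch, assuming base learners expose `learn(x, y)` and `predict(x)` (that interface is an assumption of this sketch, not MOA's API); Poisson sampling uses Knuth's classic method:

```python
import math
import random
from collections import Counter

def poisson(lam, rng=random):
    """Sample from a Poisson(lam) distribution (Knuth's method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def online_bagging(stream, models, rng=random):
    """Oza & Russell's online bagging: each model sees each example
    with weight k ~ Poisson(1), simulated here as k repeated updates."""
    for x, y in stream:
        for h in models:
            for _ in range(poisson(1.0, rng)):
                h.learn(x, y)

def vote(models, x):
    """Anytime output: majority vote over the ensemble."""
    return Counter(h.predict(x) for h in models).most_common(1)[0][0]
```

Applying a weight of k by repeating the update k times is the standard trick when the base learner has no native instance-weight support.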
ADWIN Bagging (KDD’09)
ADWIN
An adaptive sliding window whose size is recomputed online according to the rate of change observed.

ADWIN has rigorous guarantees (theorems):
on the ratio of false positives and false negatives
on the relation between the size of the current window and the change rates

ADWIN Bagging
When a change is detected, the worst classifier is removed and a new classifier is added.
ADWIN Bagging for M models
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples do
3:   for m = 1, 2, ..., M do
4:     Set w = Poisson(1)
5:     Update h_m with the current example with weight w
6:   if ADWIN detects change in error of one of the classifiers then
7:     Replace the classifier with the highest error with a new one
8: anytime output:
9: return hypothesis: h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)
Leveraging Bagging for EvolvingData Streams
Randomization as a powerful tool to increase accuracy and diversity

There are three ways of using randomization:
manipulating the input data
manipulating the classifier algorithms
manipulating the output targets
Input Randomization
Figure: Poisson distribution P(X = k) for λ = 1, 6, 10.
ECOC Output Randomization
Table: Example matrix of random output codes for 3 classes and 6 classifiers

             Class 1  Class 2  Class 3
Classifier 1    0        0        1
Classifier 2    0        1        1
Classifier 3    1        0        0
Classifier 4    1        1        0
Classifier 5    1        0        1
Classifier 6    0        1        0
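With random output codes, each ensemble member learns a binary relabeling of the classes, and prediction picks the class whose code column best agrees with the members' binary votes. A sketch of the encoding and decoding (function names are illustrative), using the matrix from the table above:

```python
import random

def random_code_matrix(n_classifiers, n_classes, rng=random):
    """One random binary 'coloring' of the classes per ensemble member."""
    return [[rng.randint(0, 1) for _ in range(n_classes)]
            for _ in range(n_classifiers)]

def decode(code_matrix, binary_votes):
    """Return the class whose code column agrees with the most votes."""
    n_classes = len(code_matrix[0])
    scores = [sum(1 for m, v in enumerate(binary_votes)
                  if code_matrix[m][c] == v)
              for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)

# The 6-classifier / 3-class matrix from the table above:
codes = [[0, 0, 1],
         [0, 1, 1],
         [1, 0, 0],
         [1, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
```

If all six members vote exactly the code column of Class 1, i.e. [0, 0, 1, 1, 1, 0], decoding recovers Class 1.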
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging
Using Poisson(λ)

Leveraging Bagging MC
Using Poisson(λ) and Random Output Codes

Fast Leveraging Bagging ME
If an instance is misclassified: weight = 1; if not: weight = e_T/(1 − e_T)
Input Randomization
Bagging
resampling with replacement using Poisson(1)

Other Strategies
subagging: resampling without replacement
half subagging: resampling without replacement of half of the instances
bagging without taking out any instance: using 1 + Poisson(1)
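These strategies differ only in how the per-example, per-model weight is drawn. A sketch of the three weight schemes; modeling stream subagging as a Bernoulli draw (each model sees each example at most once) is my reading of "resampling without replacement" in the streaming setting, not something the slide states:

```python
import math
import random

def poisson(lam, rng=random):
    """Sample from a Poisson(lam) distribution (Knuth's method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def bagging_weight(rng=random):
    """Classic online bagging: w ~ Poisson(1)."""
    return poisson(1.0, rng)

def wt_weight(rng=random):
    """Bagging 'without taking out any instance': w = 1 + Poisson(1),
    so every model sees every example at least once."""
    return 1 + poisson(1.0, rng)

def half_subagging_weight(rng=random):
    """Half subagging sketched as a fair coin: each example is used
    at most once, by roughly half of the models."""
    return 1 if rng.random() < 0.5 else 0
```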
Leveraging Bagging for Evolving Data Streams
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples (x, y) do
3:   for m = 1, 2, ..., M do
4:     Set w = Poisson(λ)
5:     Update h_m with the current example with weight w
6:   if ADWIN detects change in error of one of the classifiers then
7:     Replace the classifier with the highest error with a new one
8: anytime output:
9: return h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)
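The loop above can be sketched as follows. The `Detector` objects are simplified stand-ins for ADWIN (the sketch does not implement ADWIN itself: a detector just consumes 0/1 errors via `add(err)` and returns True when it signals change), the `learn`/`predict` learner interface is assumed, and replacing the model whose own detector fired, rather than the single highest-error model, is a simplification of line 7:

```python
import math
import random

def poisson(lam, rng=random):
    """Sample from a Poisson(lam) distribution (Knuth's method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def leveraging_bagging(stream, models, detectors, new_model, lam=6.0,
                       rng=random):
    """Each model trains on Poisson(lam) copies of each example; when a
    model's change detector fires, that model is replaced by a fresh one."""
    for x, y in stream:
        for m, h in enumerate(models):
            for _ in range(poisson(lam, rng)):
                h.learn(x, y)
            err = 0 if h.predict(x) == y else 1
            if detectors[m].add(err):     # stand-in for ADWIN
                models[m] = new_model()   # replace the drifting model
```

Note λ defaults to 6 here: the larger Poisson parameter is the input-randomization lever that distinguishes Leveraging Bagging from Poisson(1) online bagging.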
Leveraging Bagging for Evolving Data Streams MC
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: Compute coloring µ_m(y)
3: for all training examples (x, y) do
4:   for m = 1, 2, ..., M do
5:     Set w = Poisson(λ)
6:     Update h_m with the current example with weight w and class µ_m(y)
7:   if ADWIN detects change in error of one of the classifiers then
8:     Replace the classifier with the highest error with a new one
9: anytime output:
10: return h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = µ_t(y))
Leveraging Bagging for Evolving Data Streams ME
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples (x, y) do
3:   for m = 1, 2, ..., M do
4:     Set w = 1 if misclassified, otherwise e_T/(1 − e_T)
5:     Update h_m with the current example with weight w
6:   if ADWIN detects change in error of one of the classifiers then
7:     Replace the classifier with the highest error with a new one
8: anytime output:
9: return h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)
What is MOA?
{M}assive {O}nline {A}nalysis is a framework for mining data streams.

Based on experience with Weka and VFML
Focussed on classification trees, but lots of active development: clustering, item set and sequence mining, regression
Easy to extend
Easy to design and run experiments
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.
Leveraging Bagging: Empirical evaluation

Figure: Accuracy on dataset SEA with three concept drifts, comparing Leveraging Bagging, Leveraging Bagging MC, ADWIN Bagging, and Online Bagging.
Empirical evaluation

                      Accuracy  RAM-Hours
Hoeffding Tree         74.03%      0.01
Online Bagging         77.15%      2.98
ADWIN Bagging          79.24%      1.48
ADWIN Half Subagging   78.36%      1.04
ADWIN Subagging        78.68%      1.13
ADWIN Bagging WT       81.49%      2.74

ADWIN Bagging Strategies
half subagging: resampling without replacement of half of the instances
subagging: resampling without replacement
WT (bagging without taking out any instance): using 1 + Poisson(1)
Empirical evaluation

                        Accuracy  RAM-Hours
Hoeffding Tree           74.03%      0.01
Online Bagging           77.15%      2.98
ADWIN Bagging            79.24%      1.48
Leveraging Bagging       85.54%     20.17
Leveraging Bagging MC    85.37%     22.04
Leveraging Bagging ME    80.77%      0.87

Leveraging Bagging
Leveraging Bagging: using Poisson(λ)
Leveraging Bagging MC: using Poisson(λ) and Random Output Codes
Leveraging Bagging ME: using weight 1 if misclassified, otherwise e_T/(1 − e_T)
Empirical evaluation
                                  Accuracy  RAM-Hours
Hoeffding Tree                     74.03%      0.01
Online Bagging                     77.15%      2.98
ADWIN Bagging                      79.24%      1.48
Random Forest Leveraging Bagging   80.69%      5.51
Random Forest Online Bagging       72.91%      1.30
Random Forest ADWIN Bagging        74.24%      0.89

Random Forests
the input training set is obtained by sampling with replacement
the nodes of the tree use only √n random attributes to split
we only keep statistics of these attributes
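The per-node attribute restriction is the key memory saver: sufficient statistics are maintained only for the sampled attributes. A minimal sketch of the subset selection (the function name is illustrative):

```python
import math
import random

def random_attribute_subset(n_attributes, rng=random):
    """Pick ~sqrt(n) distinct attribute indices for a tree node to
    consider when splitting; statistics are kept only for these."""
    k = max(1, round(math.sqrt(n_attributes)))
    return rng.sample(range(n_attributes), k)
```

For a stream with 16 attributes, each node would track statistics for only 4 of them, cutting per-node memory by a factor of 4.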
Leveraging Bagging Diversity
Figure: Kappa-Error diagrams for Leveraging Bagging and Online Bagging on the SEA data with three concept drifts, plotting 576 pairs of classifiers.
Summary
http://moa.cs.waikato.ac.nz/
Conclusions
New improvements for bagging methods using input randomization:
improving accuracy: using Poisson(λ)
improving RAM-Hours: using weight 1 if misclassified, otherwise e_T/(1 − e_T)

New improvements for bagging methods using output randomization:
no need for multi-class classifiers