
Leveraging Bagging for Evolving Data Streams

Albert Bifet, Geoff Holmes, and Bernhard Pfahringer

University of Waikato, Hamilton, New Zealand

ECML PKDD 2010, Barcelona, 21 September 2010

This talk presents Leveraging Bagging, a new ensemble method for evolving data streams.

Mining Data Streams with Concept Drift

Extract information from a potentially infinite sequence of data, possibly varying over time, using few resources.

Adaptively: with no prior knowledge of the type or rate of change.

Leveraging Bagging
New improvements for adaptive bagging methods using:
input randomization
output randomization

Outline

1 Data stream constraints

2 Leveraging Bagging for Evolving Data Streams

3 Empirical evaluation


Mining Massive Data

Eric Schmidt, August 2010: "Every two days now we create as much information as we did from the dawn of civilization up until 2003."

5 exabytes of data


Data stream classification cycle

1 Process an example at a time, and inspect it only once (at most)
2 Use a limited amount of memory
3 Work in a limited amount of time
4 Be ready to predict at any point


Mining Massive Data

Koichi Kawana: "Simplicity means the achievement of maximum effect with minimum means."

[Diagram: data stream algorithms must balance time, accuracy, and memory.]


Evaluation Example

              Accuracy   Time   Memory
Classifier A    70%       100     20
Classifier B    80%        20     40

Which classifier is performing better?


RAM-Hours

RAM-Hour: every GB of RAM deployed for 1 hour.

Based on cloud computing rental cost options.


Evaluation Example

              Accuracy   Time   Memory   RAM-Hours
Classifier A    70%       100     20       2,000
Classifier B    80%        20     40         800

(RAM-Hours = Time × Memory.)

Which classifier is performing better?


Hoeffding Trees

Hoeffding Tree: VFDT

Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000.

With high probability, constructs a model identical to the one a traditional (greedy) batch method would learn, with theoretical guarantees on the error rate.
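For reference (the slide states the guarantee without the formula), the split decisions rest on the Hoeffding bound: after n observations of a random variable with range R, the true mean differs from the observed mean by at most

ε = √( R² ln(1/δ) / (2n) )

with probability at least 1 − δ; a leaf is split once the observed gain difference between the two best attributes exceeds ε.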

[Figure: example decision tree testing Contains "Money" (Yes/No) and Time (Day/Night).]

Hoeffding Naive Bayes Tree

Hoeffding Tree: Majority Class learner at the leaves

Hoeffding Naive Bayes Tree

G. Holmes, R. Kirkby, and B. Pfahringer. Stress-testing Hoeffding trees, 2005.

monitors the accuracy of a Majority Class learner
monitors the accuracy of a Naive Bayes learner
predicts using the most accurate method
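A minimal sketch of this adaptive leaf, assuming a hypothetical incremental naive Bayes object with update/predict methods (the slide does not prescribe an interface):

class AdaptiveLeaf:
    # Tracks how often the majority-class and naive Bayes predictions were
    # correct on past examples at this leaf; predicts with whichever leads.
    def __init__(self, nb):
        self.nb = nb            # assumed incremental naive Bayes (update/predict)
        self.counts = {}        # class -> frequency observed at this leaf
        self.mc_correct = 0
        self.nb_correct = 0

    def majority_class(self):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, x, y):
        # Score both predictors on the example before learning from it.
        self.mc_correct += int(self.majority_class() == y)
        self.nb_correct += int(self.nb.predict(x) == y)
        self.counts[y] = self.counts.get(y, 0) + 1
        self.nb.update(x, y)

    def predict(self, x):
        if self.nb_correct > self.mc_correct:
            return self.nb.predict(x)
        return self.majority_class()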


Bagging

Figure: Poisson(1) Distribution.

Bagging builds a set of M base models, each trained on a bootstrap sample created by drawing random samples with replacement. Each base model's training set contains each of the original training examples K times, where P(K = k) follows a Binomial(n, 1/n) distribution for n training examples, which tends to Poisson(1) as n grows.
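A quick numeric check of this convergence (illustrative, not from the slides):

from math import comb, exp, factorial

n = 1000   # number of training examples in the bootstrap
for k in range(5):
    binomial = comb(n, k) * (1 / n) ** k * (1 - 1 / n) ** (n - k)
    poisson = exp(-1) / factorial(k)
    print(f"k={k}: Binomial(n, 1/n)={binomial:.4f}  Poisson(1)={poisson:.4f}")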


Oza and Russell’s Online Bagging for M models

1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples do
3:   for m = 1, 2, ..., M do
4:     Set w = Poisson(1)
5:     Update h_m with the current example with weight w
6: anytime output:
7: return hypothesis: h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)
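A minimal runnable sketch of this loop (not the authors' implementation; the base-learner interface with partial_fit/predict on single-row 2D inputs is a scikit-learn-style assumption):

import numpy as np

class OnlineBagging:
    # Sketch of Oza and Russell's online bagging for M models.
    def __init__(self, make_base_model, M=10, seed=1):
        self.models = [make_base_model() for _ in range(M)]
        self.rng = np.random.default_rng(seed)

    def partial_fit(self, X, y):
        # Each model sees the example K ~ Poisson(1) times.
        for h in self.models:
            w = int(self.rng.poisson(1.0))
            if w > 0:
                h.partial_fit(X, y, sample_weight=[w] * len(y))

    def predict(self, X):
        # Anytime output: majority vote over the ensemble.
        votes = [h.predict(X)[0] for h in self.models]
        return max(set(votes), key=votes.count)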


ADWIN Bagging (KDD’09)

ADWIN

An adaptive sliding window whose size is recomputed online according to the rate of change observed.

ADWIN has rigorous guarantees (theorems):
on the ratio of false positives and false negatives
on the relation between the size of the current window and the change rates

ADWIN Bagging
When a change is detected, the worst classifier is removed and a new classifier is added.
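A naive sketch of the ADWIN idea for a stream of 0/1 error values, following the ADWIN0 formulation (the real ADWIN compresses the window into exponential buckets for logarithmic time and memory; this version stores the window whole and rescans every split point):

import math
from collections import deque

class NaiveADWIN:
    def __init__(self, delta=0.002):
        self.delta = delta
        self.window = deque()

    def add(self, x):
        # Returns True if a change was detected (old data dropped).
        self.window.append(x)
        changed = False
        while self._cut_found():
            self.window.popleft()   # shrink until no split looks suspicious
            changed = True
        return changed

    def _cut_found(self):
        n = len(self.window)
        if n < 2:
            return False
        total, s0 = sum(self.window), 0.0
        for n0, x in enumerate(self.window, start=1):
            if n0 == n:
                break
            s0 += x
            n1 = n - n0
            m = 1.0 / (1.0 / n0 + 1.0 / n1)   # harmonic mean of sub-window sizes
            eps_cut = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(s0 / n0 - (total - s0) / n1) > eps_cut:
                return True
        return False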


ADWIN Bagging for M models

1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples do
3:   for m = 1, 2, ..., M do
4:     Set w = Poisson(1)
5:     Update h_m with the current example with weight w
6:   if ADWIN detects a change in the error of one of the classifiers then
7:     Replace the classifier with the highest error with a new one
8: anytime output:
9: return hypothesis: h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)


Leveraging Bagging for Evolving Data Streams

Randomization as a powerful tool to increase accuracy anddiversity

There are three ways of using randomization:
manipulating the input data
manipulating the classifier algorithms
manipulating the output targets


Input Randomization

[Figure: Poisson distributions P(X = k) for λ = 1, λ = 6, and λ = 10; larger λ puts more probability mass on higher weights k.]


ECOC Output Randomization

Table: Example matrix of random output codes for 3 classes and 6 classifiers

              Class 1   Class 2   Class 3
Classifier 1     0         0         1
Classifier 2     0         1         1
Classifier 3     1         0         0
Classifier 4     1         1         0
Classifier 5     1         0         1
Classifier 6     0         1         0
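A sketch of how such a code matrix is used (make_colorings and decode are illustrative names, not from the paper): each classifier m is trained on the binary label μ_m(y), and prediction picks the class whose column best matches the binary votes:

import random

def make_colorings(n_classifiers, classes, seed=42):
    # One random binary coloring mu_m per classifier: class -> {0, 1}.
    rng = random.Random(seed)
    return [{y: rng.randint(0, 1) for y in classes} for _ in range(n_classifiers)]

def decode(binary_votes, colorings, classes):
    # Predict the class whose coloring agrees with the most binary votes.
    return max(classes,
               key=lambda y: sum(v == mu[y] for v, mu in zip(binary_votes, colorings)))

classes = ["class1", "class2", "class3"]
colorings = make_colorings(6, classes)
print(decode([0, 1, 0, 1, 1, 0], colorings, classes))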


Leveraging Bagging for Evolving Data Streams

Leveraging Bagging: using Poisson(λ)

Leveraging Bagging MC: using Poisson(λ) and Random Output Codes

Fast Leveraging Bagging ME: if an instance is misclassified, weight = 1; if not, weight = e_T/(1 − e_T)


Input Randomization

Bagging: resampling with replacement using Poisson(1)

Other strategies:
subagging: resampling without replacement
half subagging: resampling without replacement of half of the instances
bagging without taking out any instance: using 1 + Poisson(1)


Leveraging Bagging for Evolving Data Streams

1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples (x, y) do
3:   for m = 1, 2, ..., M do
4:     Set w = Poisson(λ)
5:     Update h_m with the current example with weight w
6:   if ADWIN detects a change in the error of one of the classifiers then
7:     Replace the classifier with the highest error with a new one
8: anytime output:
9: return h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)
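Putting the pieces together, a sketch of the whole method, reusing the NaiveADWIN and base-learner assumptions from the earlier snippets (illustrative, not MOA's Java implementation):

import numpy as np

class LeveragingBagging:
    # Sketch of Leveraging Bagging: Poisson(lambda) input weights plus one
    # ADWIN-style detector per model; when any detector fires, the model
    # with the highest windowed error is rebuilt.
    def __init__(self, make_base_model, M=10, lam=6.0, delta=0.002, seed=1):
        self.make_base_model = make_base_model
        self.delta = delta
        self.lam = lam
        self.models = [make_base_model() for _ in range(M)]
        self.detectors = [NaiveADWIN(delta) for _ in range(M)]
        self.rng = np.random.default_rng(seed)

    def _window_error(self, i):
        w = self.detectors[i].window
        return sum(w) / len(w) if w else 0.0

    def partial_fit(self, X, y):
        change = False
        for i, h in enumerate(self.models):
            try:
                err = int(h.predict(X)[0] != y[0])   # prequential: test first...
            except Exception:
                err = 1                              # model not trained yet
            w = int(self.rng.poisson(self.lam))
            if w > 0:
                h.partial_fit(X, y, sample_weight=[w] * len(y))   # ...then train
            change |= self.detectors[i].add(err)
        if change:
            worst = max(range(len(self.models)), key=self._window_error)
            self.models[worst] = self.make_base_model()
            self.detectors[worst] = NaiveADWIN(self.delta)

    def predict(self, X):
        votes = [h.predict(X)[0] for h in self.models]
        return max(set(votes), key=votes.count)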


Leveraging Bagging for Evolving Data Streams MC

1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: Compute coloring μ_m(y)
3: for all training examples (x, y) do
4:   for m = 1, 2, ..., M do
5:     Set w = Poisson(λ)
6:     Update h_m with the current example with weight w and class μ_m(y)
7:   if ADWIN detects a change in the error of one of the classifiers then
8:     Replace the classifier with the highest error with a new one
9: anytime output:
10: return h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = μ_t(y))


Leveraging Bagging for Evolving Data Streams ME

1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples (x, y) do
3:   for m = 1, 2, ..., M do
4:     Set w = 1 if the example is misclassified, otherwise e_T/(1 − e_T)
5:     Update h_m with the current example with weight w
6:   if ADWIN detects a change in the error of one of the classifiers then
7:     Replace the classifier with the highest error with a new one
8: anytime output:
9: return h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)


What is MOA?

{M}assive {O}nline {A}nalysis is a framework for mining data streams.

Based on experience with Weka and VFML
Focussed on classification trees, but lots of active development: clustering, itemset and sequence mining, regression
Easy to extend
Easy to design and run experiments


MOA: the bird

The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.


Leveraging Bagging: Empirical Evaluation

[Figure: accuracy (%) versus number of instances processed (10,000 to 990,000) on the SEA dataset with three concept drifts, comparing Leveraging Bagging, Leveraging Bagging MC, ADWIN Bagging, and Online Bagging.]


Empirical evaluation

                        Accuracy   RAM-Hours
Hoeffding Tree           74.03%      0.01
Online Bagging           77.15%      2.98
ADWIN Bagging            79.24%      1.48
ADWIN Half Subagging     78.36%      1.04
ADWIN Subagging          78.68%      1.13
ADWIN Bagging WT         81.49%      2.74

ADWIN Bagging strategies:
half subagging: resampling without replacement of half of the instances
subagging: resampling without replacement
WT: bagging without taking out any instance, using 1 + Poisson(1)


Empirical evaluation

                        Accuracy   RAM-Hours
Hoeffding Tree           74.03%      0.01
Online Bagging           77.15%      2.98
ADWIN Bagging            79.24%      1.48
Leveraging Bagging       85.54%     20.17
Leveraging Bagging MC    85.37%     22.04
Leveraging Bagging ME    80.77%      0.87

Leveraging Bagging: using Poisson(λ)
Leveraging Bagging MC: using Poisson(λ) and Random Output Codes
Leveraging Bagging ME: using weight 1 if misclassified, otherwise e_T/(1 − e_T)


Empirical evaluation

                                    Accuracy   RAM-Hours
Hoeffding Tree                       74.03%      0.01
Online Bagging                       77.15%      2.98
ADWIN Bagging                        79.24%      1.48
Random Forest Leveraging Bagging     80.69%      5.51
Random Forest Online Bagging         72.91%      1.30
Random Forest ADWIN Bagging          74.24%      0.89

Random Forests:
the input training set is obtained by sampling with replacement
the nodes of the tree use only √n random attributes to split
we only keep statistics of these attributes
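A one-liner sketch of the attribute restriction (illustrative):

import math
import random

def sample_split_attributes(n_attributes, rng=random):
    # Draw the sqrt(n)-sized random subset of attribute indices considered
    # at one tree node; split statistics are kept only for these attributes.
    k = max(1, round(math.sqrt(n_attributes)))
    return rng.sample(range(n_attributes), k)

print(sample_split_attributes(25))   # e.g. 5 of 25 attribute indices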


Leveraging Bagging Diversity

[Figure: Kappa-error diagrams (error versus kappa statistic) for Leveraging Bagging and Online Bagging on the SEA data with three concept drifts, plotting 576 pairs of classifiers.]


Summary

http://moa.cs.waikato.ac.nz/

Conclusions

New improvements for bagging methods using input randomization:
improving accuracy: using Poisson(λ)
improving RAM-Hours: using weight 1 if misclassified, otherwise e_T/(1 − e_T)

New improvements for bagging methods using output randomization:
no need for multi-class base classifiers
