42
Improving Adaptive Bagging Methods for Evolving Data Streams A. Bifet, G. Holmes, B. Pfahringer, and R. Gavaldà University of Waikato Hamilton, New Zealand Laboratory for Relational Algorithmics, Complexity and Learning LARCA UPC-Barcelona Tech, Catalonia Nanjing, 4 November 2009 1st Asian Conference on Machine Learning (ACML’09)

Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Improving Adaptive Bagging Methodsfor Evolving Data Streams

A. Bifet, G. Holmes, B. Pfahringer, and R. Gavaldà

University of WaikatoHamilton, New Zealand

Laboratory for Relational Algorithmics, Complexity and Learning LARCAUPC-Barcelona Tech, Catalonia

Nanjing, 4 November 20091st Asian Conference on Machine Learning (ACML’09)

Page 2: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Motivation

MOA Software for Mining Data StreamsBuild a useful software mining for massive data sets

Bagging for data streamsImprove accuracy on classification methods for datastreams

2 / 26

Page 3: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Data stream classification cycle

1 Process an example at a time,and inspect it only once (atmost)

2 Use a limited amount ofmemory

3 Work in a limited amount oftime

4 Be ready to predict at anypoint

3 / 26

Page 4: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Realtime analytics: from Databases to Dataflows

Data streamsData streams are ordered datasetsNot all datasets are data streamsAll dataset may be processed incrementally as a datastream

MOA: Massive Online AnalysisFaster Mining Software using less resources

Instant mining: more for less

4 / 26

Page 5: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

What is MOA?

{M}assive {O}nline {A}nalysis is a framework for online learningfrom data streams.

It is closely related to WEKAIt includes a collection of offline and online as well as toolsfor evaluation:

boosting and baggingHoeffding Trees

with and without Naïve Bayes classifiers at the leaves.

5 / 26

Page 6: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

WEKA

Waikato Environment for Knowledge AnalysisCollection of state-of-the-art machine learning algorithmsand data processing tools implemented in Java

Released under the GPLSupport for the whole process of experimental data mining

Preparation of input dataStatistical evaluation of learning schemesVisualization of input data and the result of learning

Used for education, research and applicationsComplements “Data Mining” by Witten & Frank

6 / 26

Page 7: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

WEKA: the bird

7 / 26

Page 8: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

MOA: the bird

The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.

8 / 26

Page 9: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

MOA: the bird

The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.

8 / 26

Page 10: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

MOA: the bird

The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.

8 / 26

Page 11: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

MOA: the bird

The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.

8 / 26

Page 12: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Concept Drift Framework

t

f (t) f (t)

α

α

t0W

0.5

1

DefinitionGiven two data streams a, b, we define c = a⊕W

t0 b as the datastream built joining the two data streams a and b

Pr[c(t) = b(t)] = 1/(1+ e−4(t−t0)/W ).Pr[c(t) = a(t)] = 1−Pr[c(t) = b(t)]

9 / 26

Page 13: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Concept Drift Framework

t

f (t) f (t)

α

α

t0W

0.5

1

Example

(((a⊕W0t0 b)⊕W1

t1 c)⊕W2t2 d) . . .

(((SEA9⊕Wt0 SEA8)⊕W

2t0SEA7)⊕W

3t0SEA9.5)

CovPokElec = (CoverType⊕5,000581,012 Poker)⊕5,000

1,000,000 ELEC2

9 / 26

Page 14: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

New Ensemble Methods For Evolving Data Streams

New Ensemble Methods For Evolving Streams (KDD’09)a new experimental data stream framework for studyingconcept drifttwo new variants of Bagging:

ADWIN BaggingAdaptive-Size Hoeffding Tree (ASHT) Bagging.

an evaluation study on synthetic and real-world datasets

10 / 26

Page 15: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Outline

1 Adaptive-Size Hoeffding Tree bagging

2 ADWIN Bagging

3 Empirical evaluation

11 / 26

Page 16: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Adaptive-Size Hoeffding Tree

T1 T2 T3 T4

Ensemble of trees of different sizeeach tree has a maximum sizeafter one node splits, it deletes some nodes to reduce itssize if the size of the tree is higher than the maximum value

12 / 26

Page 17: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Adaptive-Size Hoeffding Tree

T1 T2 T3 T4

Ensemble of trees of different sizesmaller trees adapt more quickly to changes,larger trees do better during periods with little changediversity

12 / 26

Page 18: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Adaptive-Size Hoeffding Tree

0,2

0,21

0,22

0,23

0,24

0,25

0,26

0,27

0,28

0,29

0,3

0 0,1 0,2 0,3 0,4 0,5 0,6

Kappa

Err

or

0,25

0,255

0,26

0,265

0,27

0,275

0,28

0,1 0,12 0,14 0,16 0,18 0,2 0,22 0,24 0,26 0,28 0,3

Kappa

Err

or

Figure: Kappa-Error diagrams for ASHT bagging (left) and bagging(right) on dataset RandomRBF with drift, plotting 90 pairs ofclassifiers.

13 / 26

Page 19: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Improvement for ASHT Bagging Method

Improvement for ASHT Bagging ensemble methodBagging using trees of different size

add a change detector for each tree in the ensembleDDM: Gama et al.EDDM: Baena, del Campo, Fidalgo et al.

14 / 26

Page 20: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Outline

1 Adaptive-Size Hoeffding Tree bagging

2 ADWIN Bagging

3 Empirical evaluation

15 / 26

Page 21: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

ADWIN Bagging

ADWIN

An adaptive sliding window whose size is recomputed onlineaccording to the rate of change observed.

ADWIN has rigorous guarantees (theorems)On ratio of false positives and negativesOn the relation of the size of the current window andchange rates

ADWIN BaggingWhen a change is detected, the worst classifier is removed anda new classifier is added.

16 / 26

Page 22: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Optimal Change Detector and PredictorADWIN

High accuracyFast detection of changeLow false positives and false negatives ratiosLow computational cost: minimum space and time neededTheoretical guaranteesNo parameters neededEstimator with Memory and Change Detector

17 / 26

Page 23: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 101010110111111W0= 1

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 24: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 101010110111111W0= 1 W1 = 01010110111111

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 25: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 101010110111111W0= 10 W1 = 1010110111111

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 26: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 101010110111111W0= 101 W1 = 010110111111

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 27: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 101010110111111W0= 1010 W1 = 10110111111

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 28: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 101010110111111W0= 10101 W1 = 0110111111

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 29: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 101010110111111W0= 101010 W1 = 110111111

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 30: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 101010110111111W0= 1010101 W1 = 10111111

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 31: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 101010110111111W0= 10101011 W1 = 0111111

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 32: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 101010110111111 |µ̂W0− µ̂W1 | ≥ εc : CHANGE DET.!

W0= 101010110 W1 = 111111

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 33: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 101010110111111 Drop elements from the tail of WW0= 101010110 W1 = 111111

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 34: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

Example

W= 01010110111111 Drop elements from the tail of WW0= 101010110 W1 = 111111

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W2 for each t > 03 do W ←W ∪{xt} (i.e., add xt to the head of W )4 repeat Drop elements from the tail of W5 until |µ̂W0− µ̂W1 |< εc holds6 for every split of W into W = W0 ·W17 Output µ̂W

18 / 26

Page 35: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

TheoremAt every time step we have:

1 (False positive rate bound). If µt remains constant withinW, the probability that ADWIN shrinks the window at thisstep is at most δ .

2 (False negative rate bound). Suppose that for somepartition of W in two parts W0W1 (where W1 contains themost recent items) we have |µW0−µW1 |> 2εc . Then withprobability 1−δ ADWIN shrinks W to W1, or shorter.

ADWIN tunes itself to the data stream at hand, with no need forthe user to hardwire or precompute parameters.

19 / 26

Page 36: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Algorithm ADaptive Sliding WINdowADWIN

ADWIN using a Data Stream Sliding Window Model,can provide the exact counts of 1’s in O(1) time per point.tries O(logW ) cutpointsuses O(1

εlogW ) memory words

the processing time per example is O(logW ) (amortizedand worst-case).

Sliding Window Model

1010101 101 11 1 1Content: 4 2 2 1 1Capacity: 7 3 2 1 1

20 / 26

Page 37: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

ADWIN bagging using Hoeffding Adaptive TreesDecision Trees: Hoeffding Adaptive Tree

CVFDT: Hulten, Spencer and DomingosNo theoretical guarantees on the error rate of CVFDTParameters needed : size of window, number ofexamples,...

Hoeffding Adaptive Tree:replace frequency statistics counters by estimators

don’t need a window to store examples

use a change detector with theoretical guarantees tosubstitute trees

Advantages:1 Theoretical guarantees2 No Parameters

21 / 26

Page 38: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Outline

1 Adaptive-Size Hoeffding Tree bagging

2 ADWIN Bagging

3 Empirical evaluation

22 / 26

Page 39: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Empirical evaluation

Dataset Most Accurate MethodHyperplane Drift 0.0001 Bag10 ASHT W+RHyperplane Drift 0.001 DDM Bag10 ASHT WSEA W = 50 BagADWIN 10 HATSEA W = 50000 BagADWIN 10 HATRandomRBF No Drift 50 centers Bag 10 HTRandomRBF Drift .0001 50 centers BagADWIN 10 HATRandomRBF Drift .001 50 centers DDM Bag10 ASHT WRandomRBF Drift .001 10 centers BagADWIN 10 HATCover Type DDM Bag10 ASHT WPoker BagADWIN 10 HATElectricity DDM Bag10 ASHT WCovPokElec BagADWIN 10 HAT

23 / 26

Page 40: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Empirical evaluation

SEAW= 50000

Time Acc. Mem.BagADWIN 10 HAT 154.91 88.88 ± 0.05 2.35DDM Bag10 ASHT W 44.02 88.72 ± 0.05 0.65NaiveBayes 5.52 84.60 ± 0.03 0.00NBADWIN 12.40 87.83 ± 0.07 0.02HT 7.20 85.02 ± 0.11 0.33HT DDM 7.88 88.17 ± 0.18 0.16HAT 20.96 88.40 ± 0.07 0.18BagADWIN 10 HT 53.15 88.58 ± 0.10 0.88Bag10 HT 30.88 85.38 ± 0.06 3.36Bag10 ASHT W+R 33.56 88.51 ± 0.06 0.84

24 / 26

Page 41: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Empirical evaluation

Accuracy

71,4

71,9

72,4

72,9

73,4

73,9

10.000 140.000 270.000 400.000 530.000 660.000 790.000 920.000

Instances

Ac

cu

rac

y (

%)

BagAdwin HAT

DDM BagHAST

EDDM BagHast

DDM HT

EDDM HT

BagAdwin HT

BagHAST

Figure: Accuracy on dataset LED with three concept drifts.

25 / 26

Page 42: Improving Adaptive Bagging Methods for Evolving Data Streamsgavalda/papers/acml09slides.pdf · Realtime analytics: from Databases to Dataflows Data streams Data streams are ordered

Summary

http://www.cs.waikato.ac.nz/∼abifet/MOA/

ConclusionsNew improvements for ensemble bagging methods:

Adaptive-Size Hoeffding Tree bagging using changedetection methodsADWIN bagging using Hoeffding Adaptive Trees

MOA is easy to use and extend

Future WorkExtend MOA to more data mining and learning methods.

26 / 26