This talk presents the new Leveraging Bagging method for evolving data streams.
Leveraging Bagging for Evolving Data Streams
Albert Bifet, Geoff Holmes, and Bernhard Pfahringer
University of Waikato, Hamilton, New Zealand
ECML PKDD 2010, Barcelona, 21 September 2010
Mining Data Streams with Concept Drift

Extract information from a potentially infinite sequence of data, possibly varying over time, using few resources.

Adaptively: with no prior knowledge of the type or rate of change.

Leveraging Bagging
New improvements for adaptive bagging methods using:
input randomization
output randomization
Outline
1 Data stream constraints
2 Leveraging Bagging for Evolving Data Streams
3 Empirical evaluation
Mining Massive Data
Eric Schmidt, August 2010:
"Every two days now we create as much information as we did from the dawn of civilization up until 2003."

5 exabytes of data
Data stream classification cycle
1 Process an example at a time, and inspect it only once (at most)
2 Use a limited amount of memory
3 Work in a limited amount of time
4 Be ready to predict at any point
Mining Massive Data
Koichi Kawana:
"Simplicity means the achievement of maximum effect with minimum means."

Data Streams: the trade-off between time, accuracy, and memory.
Evaluation Example
             Accuracy  Time  Memory
Classifier A    70%     100      20
Classifier B    80%      20      40
Which classifier is performing better?
RAM-Hours
RAM-Hour: every GB of RAM deployed for 1 hour.
Cloud Computing Rental Cost Options
Evaluation Example
             Accuracy  Time  Memory  RAM-Hours
Classifier A    70%     100      20      2,000
Classifier B    80%      20      40        800
Which classifier is performing better?
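The RAM-Hours column is simply run time multiplied by memory footprint. A minimal sketch of the calculation, assuming the table's abstract units stand for hours and GB (the slide leaves the units unspecified):

```python
def ram_hours(time_hours, memory_gb):
    """Cost of keeping memory_gb GB of RAM deployed for time_hours hours."""
    return time_hours * memory_gb

# The two classifiers from the table above:
cost_a = ram_hours(100, 20)  # Classifier A: 2,000 RAM-Hours
cost_b = ram_hours(20, 40)   # Classifier B: 800 RAM-Hours
```

Under this measure, Classifier B is cheaper despite using twice the memory, because it finishes five times faster.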
Hoeffding Trees

Hoeffding Tree: VFDT
Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000

With high probability, constructs a model identical to the one a traditional (greedy) batch method would learn, with theoretical guarantees on the error rate.
Figure: Example decision tree with splits on "Contains 'Money'" (Yes/No) and "Time" (Day/Night).
Hoeffding Naive Bayes Tree
Hoeffding Tree: Majority Class learner at leaves

Hoeffding Naive Bayes Tree
G. Holmes, R. Kirkby, and B. Pfahringer. Stress-testing Hoeffding trees, 2005.
monitors accuracy of a Majority Class learner
monitors accuracy of a Naive Bayes learner
predicts using the most accurate method
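The leaf-level arbitration can be sketched in a few lines. This is an illustrative toy, not the MOA implementation: `AdaptiveLeaf` and `MajorityClass` are hypothetical names, and a real Hoeffding Naive Bayes Tree would pair the majority-class predictor with an actual Naive Bayes learner rather than a second majority-class one.

```python
from collections import Counter

class MajorityClass:
    """Trivial leaf predictor: always predicts the most frequent class."""
    def __init__(self):
        self.counts = Counter()
    def learn(self, x, y):
        self.counts[y] += 1
    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

class AdaptiveLeaf:
    """Sketch of a leaf that scores two candidate predictors prequentially
    (test on the example, then train on it) and predicts with whichever
    has been more accurate so far."""
    def __init__(self, mc, nb):
        self.mc, self.nb = mc, nb
        self.mc_correct = 0
        self.nb_correct = 0
    def learn(self, x, y):
        # Score before training, so each example is an unseen test case.
        if self.mc.predict(x) == y:
            self.mc_correct += 1
        if self.nb.predict(x) == y:
            self.nb_correct += 1
        self.mc.learn(x, y)
        self.nb.learn(x, y)
    def predict(self, x):
        best = self.mc if self.mc_correct >= self.nb_correct else self.nb
        return best.predict(x)
```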
Bagging
Figure: Poisson(1) distribution.

Bagging builds a set of M base models, with a bootstrap sample created by drawing random samples with replacement.
Each base model's training set contains each of the original training examples K times, where P(K = k) follows a binomial distribution.
Oza and Russell’s Online Bagging for M models
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples do
3:   for m = 1, 2, ..., M do
4:     Set w = Poisson(1)
5:     Update h_m with the current example with weight w
6: anytime output:
7: return hypothesis: h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)
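The pseudocode above maps directly to code. A minimal sketch, assuming base learners expose `learn(x, y)` and `predict(x)` (that interface is an assumption of this sketch, not MOA's API); Poisson sampling uses Knuth's classic method:

```python
import math
import random
from collections import Counter

def poisson(lam, rng=random):
    """Sample from a Poisson(lam) distribution (Knuth's method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def online_bagging(stream, models, rng=random):
    """Oza & Russell's online bagging: each model sees each example
    with weight k ~ Poisson(1), simulated here as k repeated updates."""
    for x, y in stream:
        for h in models:
            for _ in range(poisson(1.0, rng)):
                h.learn(x, y)

def vote(models, x):
    """Anytime output: majority vote over the ensemble."""
    return Counter(h.predict(x) for h in models).most_common(1)[0][0]
```

Applying a weight of k by repeating the update k times is the standard trick when the base learner has no native instance-weight support.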
ADWIN Bagging (KDD’09)
ADWIN
An adaptive sliding window whose size is recomputed online according to the rate of change observed.

ADWIN has rigorous guarantees (theorems):
on the ratio of false positives and false negatives
on the relation between the size of the current window and the change rates

ADWIN Bagging
When a change is detected, the worst classifier is removed and a new classifier is added.
ADWIN Bagging for M models
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples do
3:   for m = 1, 2, ..., M do
4:     Set w = Poisson(1)
5:     Update h_m with the current example with weight w
6:   if ADWIN detects change in error of one of the classifiers then
7:     Replace the classifier with the highest error with a new one
8: anytime output:
9: return hypothesis: h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)
Leveraging Bagging for EvolvingData Streams
Randomization as a powerful tool to increase accuracy and diversity

There are three ways of using randomization:
manipulating the input data
manipulating the classifier algorithms
manipulating the output targets
Input Randomization
Figure: Poisson distribution P(X = k) for λ = 1, 6, 10.
ECOC Output Randomization
Table: Example matrix of random output codes for 3 classes and 6 classifiers

             Class 1  Class 2  Class 3
Classifier 1    0        0        1
Classifier 2    0        1        1
Classifier 3    1        0        0
Classifier 4    1        1        0
Classifier 5    1        0        1
Classifier 6    0        1        0
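With random output codes, each ensemble member learns a binary relabeling of the classes, and prediction picks the class whose code column best agrees with the members' binary votes. A sketch of the encoding and decoding (function names are illustrative), using the matrix from the table above:

```python
import random

def random_code_matrix(n_classifiers, n_classes, rng=random):
    """One random binary 'coloring' of the classes per ensemble member."""
    return [[rng.randint(0, 1) for _ in range(n_classes)]
            for _ in range(n_classifiers)]

def decode(code_matrix, binary_votes):
    """Return the class whose code column agrees with the most votes."""
    n_classes = len(code_matrix[0])
    scores = [sum(1 for m, v in enumerate(binary_votes)
                  if code_matrix[m][c] == v)
              for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)

# The 6-classifier / 3-class matrix from the table above:
codes = [[0, 0, 1],
         [0, 1, 1],
         [1, 0, 0],
         [1, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
```

If all six members vote exactly the code column of Class 1, i.e. [0, 0, 1, 1, 1, 0], decoding recovers Class 1.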
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging
Using Poisson(λ)

Leveraging Bagging MC
Using Poisson(λ) and Random Output Codes

Fast Leveraging Bagging ME
If an instance is misclassified: weight = 1; if not: weight = e_T/(1 − e_T)
Input Randomization
Bagging
resampling with replacement using Poisson(1)

Other Strategies
subagging: resampling without replacement
half subagging: resampling without replacement of half of the instances
bagging without taking out any instance: using 1 + Poisson(1)
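These strategies differ only in how the per-example, per-model weight is drawn. A sketch of the three weight schemes; modeling stream subagging as a Bernoulli draw (each model sees each example at most once) is my reading of "resampling without replacement" in the streaming setting, not something the slide states:

```python
import math
import random

def poisson(lam, rng=random):
    """Sample from a Poisson(lam) distribution (Knuth's method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def bagging_weight(rng=random):
    """Classic online bagging: w ~ Poisson(1)."""
    return poisson(1.0, rng)

def wt_weight(rng=random):
    """Bagging 'without taking out any instance': w = 1 + Poisson(1),
    so every model sees every example at least once."""
    return 1 + poisson(1.0, rng)

def half_subagging_weight(rng=random):
    """Half subagging sketched as a fair coin: each example is used
    at most once, by roughly half of the models."""
    return 1 if rng.random() < 0.5 else 0
```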
Leveraging Bagging for Evolving Data Streams
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples (x, y) do
3:   for m = 1, 2, ..., M do
4:     Set w = Poisson(λ)
5:     Update h_m with the current example with weight w
6:   if ADWIN detects change in error of one of the classifiers then
7:     Replace the classifier with the highest error with a new one
8: anytime output:
9: return h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)
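The loop above can be sketched as follows. The `Detector` objects are simplified stand-ins for ADWIN (the sketch does not implement ADWIN itself: a detector just consumes 0/1 errors via `add(err)` and returns True when it signals change), the `learn`/`predict` learner interface is assumed, and replacing the model whose own detector fired, rather than the single highest-error model, is a simplification of line 7:

```python
import math
import random

def poisson(lam, rng=random):
    """Sample from a Poisson(lam) distribution (Knuth's method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def leveraging_bagging(stream, models, detectors, new_model, lam=6.0,
                       rng=random):
    """Each model trains on Poisson(lam) copies of each example; when a
    model's change detector fires, that model is replaced by a fresh one."""
    for x, y in stream:
        for m, h in enumerate(models):
            for _ in range(poisson(lam, rng)):
                h.learn(x, y)
            err = 0 if h.predict(x) == y else 1
            if detectors[m].add(err):     # stand-in for ADWIN
                models[m] = new_model()   # replace the drifting model
```

Note λ defaults to 6 here: the larger Poisson parameter is the input-randomization lever that distinguishes Leveraging Bagging from Poisson(1) online bagging.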
Leveraging Bagging for Evolving Data Streams MC
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: Compute coloring µ_m(y)
3: for all training examples (x, y) do
4:   for m = 1, 2, ..., M do
5:     Set w = Poisson(λ)
6:     Update h_m with the current example with weight w and class µ_m(y)
7:   if ADWIN detects change in error of one of the classifiers then
8:     Replace the classifier with the highest error with a new one
9: anytime output:
10: return h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = µ_t(y))
Leveraging Bagging for Evolving Data Streams ME
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples (x, y) do
3:   for m = 1, 2, ..., M do
4:     Set w = 1 if misclassified, otherwise e_T/(1 − e_T)
5:     Update h_m with the current example with weight w
6:   if ADWIN detects change in error of one of the classifiers then
7:     Replace the classifier with the highest error with a new one
8: anytime output:
9: return h_fin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(h_t(x) = y)
What is MOA?
{M}assive {O}nline {A}nalysis is a framework for mining data streams.

Based on experience with Weka and VFML
Focussed on classification trees, but lots of active development: clustering, item set and sequence mining, regression
Easy to extend
Easy to design and run experiments
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.
Leveraging Bagging: Empirical evaluation

Figure: Accuracy on dataset SEA with three concept drifts, comparing Leveraging Bagging, Leveraging Bagging MC, ADWIN Bagging, and Online Bagging.
Empirical evaluation

                      Accuracy  RAM-Hours
Hoeffding Tree         74.03%      0.01
Online Bagging         77.15%      2.98
ADWIN Bagging          79.24%      1.48
ADWIN Half Subagging   78.36%      1.04
ADWIN Subagging        78.68%      1.13
ADWIN Bagging WT       81.49%      2.74

ADWIN Bagging Strategies
half subagging: resampling without replacement of half of the instances
subagging: resampling without replacement
WT (bagging without taking out any instance): using 1 + Poisson(1)
Empirical evaluation

                        Accuracy  RAM-Hours
Hoeffding Tree           74.03%      0.01
Online Bagging           77.15%      2.98
ADWIN Bagging            79.24%      1.48
Leveraging Bagging       85.54%     20.17
Leveraging Bagging MC    85.37%     22.04
Leveraging Bagging ME    80.77%      0.87

Leveraging Bagging
Leveraging Bagging: using Poisson(λ)
Leveraging Bagging MC: using Poisson(λ) and Random Output Codes
Leveraging Bagging ME: using weight 1 if misclassified, otherwise e_T/(1 − e_T)
Empirical evaluation
                                  Accuracy  RAM-Hours
Hoeffding Tree                     74.03%      0.01
Online Bagging                     77.15%      2.98
ADWIN Bagging                      79.24%      1.48
Random Forest Leveraging Bagging   80.69%      5.51
Random Forest Online Bagging       72.91%      1.30
Random Forest ADWIN Bagging        74.24%      0.89

Random Forests
the input training set is obtained by sampling with replacement
the nodes of the tree use only √n random attributes to split
we only keep statistics of these attributes
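The per-node attribute restriction is the key memory saver: sufficient statistics are maintained only for the sampled attributes. A minimal sketch of the subset selection (the function name is illustrative):

```python
import math
import random

def random_attribute_subset(n_attributes, rng=random):
    """Pick ~sqrt(n) distinct attribute indices for a tree node to
    consider when splitting; statistics are kept only for these."""
    k = max(1, round(math.sqrt(n_attributes)))
    return rng.sample(range(n_attributes), k)
```

For a stream with 16 attributes, each node would track statistics for only 4 of them, cutting per-node memory by a factor of 4.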
Leveraging Bagging Diversity
Figure: Kappa-Error diagrams for Leveraging Bagging and Online Bagging on the SEA data with three concept drifts, plotting 576 pairs of classifiers.
Summary
http://moa.cs.waikato.ac.nz/
Conclusions
New improvements for bagging methods using input randomization:
improving accuracy: using Poisson(λ)
improving RAM-Hours: using weight 1 if misclassified, otherwise e_T/(1 − e_T)

New improvements for bagging methods using output randomization:
no need for multi-class classifiers