
Scalable Bayesian Network Classifiers

Geoff Webb, Ana Martinez, Nayyar Zaidi


Introduction

• Learning from large data is not just about scaling up existing algorithms.

• Large data is best tackled by fundamentally new learning algorithms.



Learning curves

[Figure: learning curves; x-axis: training set size (0 to 1,000,000); y-axis: RMSE (0 to 0.8)]

• Error typically reduces as data quantity increases.

• Different algorithms have different curves.


Learning curves

[Figure: learning curves; x-axis: training set size (0 to 1,000,000); y-axis: RMSE (0 to 0.8); annotations: "Overfitting on small data", "Best on large data"]

• Algorithms that closely fit complex multivariate distributions will tend to overfit small data, but can better fit large data: the bias-variance tradeoff.


Capacity to fit = model space + optimization

[Figure: learning curves for Naïve Bayes and Logistic Regression; x-axis: training set size; y-axis: RMSE (0 to 1)]


Much research of questionable relevance

[Figure: learning curves; x-axis: training set size (0 to 1,000,000); y-axis: RMSE (0 to 0.8)]

• Most prior research has used relatively small data.


Requirements

[Figure: learning curves; x-axis: training set size (0 to 1,000,000); y-axis: RMSE (0 to 0.8)]

• Need algorithms that can closely fit complex multivariate distributions, while being very computationally efficient.
  • low bias
  • few passes through data
  • out of core


State-of-the-art

• Most low-bias algorithms do not scale and are inherently in-core
  • Random Forest
  • SVM
  • Deep Learning

• Selective KDB is a scalable low-bias Bayesian Network Classifier


Outline

1 Introduction

2 Bayesian Network Classifiers

3 Selective KDB

4 Experiments

5 Conclusions & Future Research


Bayesian Network Classifiers

• Defined by parent relation π and Conditional Probability Tables (CPTs)
  • π encodes conditional independence / structure
  • CPTs encode conditional probabilities

• Classifies using $P(y \mid x) \propto P(y \mid \pi_Y) \prod_i P(x_i \mid \pi_i)$
  • Usually makes Y a parent of all X_i

[Figure: network with class Y and attributes X1 to X5]

• Given π, CPTs can be learned by counting joint frequencies
  • single pass
  • incremental
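The counting step can be sketched as follows. This is a minimal illustration, assuming discrete attributes, the naive Bayes structure (Y as the only parent of every X_i) and Laplace smoothing; the class name `CountingNB` and the smoothing choice are assumptions made for the example and do not come from the slides.

```python
import math
from collections import defaultdict


class CountingNB:
    """Naive Bayes structure: Y is the only parent of each X_i (illustrative)."""

    def __init__(self, n_values, n_classes):
        self.n_values = n_values              # number of values per attribute
        self.n_classes = n_classes
        self.class_counts = [0] * n_classes
        self.joint = defaultdict(int)         # (attribute, value, class) -> count
        self.t = 0                            # examples seen so far

    def update(self, x, y):
        """Incremental update: each example is visited once."""
        self.t += 1
        self.class_counts[y] += 1
        for i, xi in enumerate(x):
            self.joint[(i, xi, y)] += 1

    def log_scores(self, x):
        """Unnormalised log P(y | x) = log P(y) + sum_i log P(x_i | y)."""
        scores = []
        for y in range(self.n_classes):
            s = math.log((self.class_counts[y] + 1) / (self.t + self.n_classes))
            for i, xi in enumerate(x):
                num = self.joint.get((i, xi, y), 0) + 1      # Laplace smoothing (assumption)
                den = self.class_counts[y] + self.n_values[i]
                s += math.log(num / den)
            scores.append(s)
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(range(self.n_classes), key=scores.__getitem__)


# Usage: two binary attributes, two classes; attribute 0 determines the class.
model = CountingNB(n_values=[2, 2], n_classes=2)
for x, y in [((0, 1), 0), ((0, 0), 0), ((1, 1), 1), ((1, 0), 1)]:
    model.update(x, y)
```

Because `update` only increments counters, the same object can be fed examples from a stream or from disk in chunks, which is what makes the single-pass, incremental property useful for out-of-core learning.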


k-Dependence Bayes (KDB)

• Two-pass learning

• 1st pass, learn structure:
  • Collect counts for pairs of attributes with the class.
  • Order attributes based on mutual information (MI) with the class.
  • Select parents based on conditional mutual information (CMI).
    • no more than k parents
    • parents must be earlier in the order

[Figure: network with class Y and attributes X1 to X5]

• 2nd pass, learn CPTs:
  • Collect statistics according to the structure learned.
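The structure-learning pass can be sketched as below. This is an illustrative in-memory version, assuming discrete attributes stored as parallel Python lists; the function names are hypothetical, and a scalable implementation would derive MI and CMI from the pairwise count tables rather than from raw columns. The class Y is an implicit parent of every attribute and is not listed in `parents`.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X; Y) estimated from two parallel lists of discrete values."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (cx[x] * cy[y]))
               for (x, y), c in cxy.items())


def conditional_mutual_information(xs, zs, ys):
    """I(X; Z | Y) estimated from three parallel lists of discrete values."""
    n = len(xs)
    cy = Counter(ys)
    cxy, czy = Counter(zip(xs, ys)), Counter(zip(zs, ys))
    cxzy = Counter(zip(xs, zs, ys))
    return sum((c / n) * math.log(c * cy[y] / (cxy[(x, y)] * czy[(z, y)]))
               for (x, z, y), c in cxzy.items())


def kdb_structure(columns, ys, k):
    """Return (order, parents): attributes ordered by MI with the class and,
    for each attribute, up to k attribute parents chosen by CMI."""
    mi = [mutual_information(col, ys) for col in columns]
    order = sorted(range(len(columns)), key=lambda i: mi[i], reverse=True)
    parents = {}
    for pos, i in enumerate(order):
        earlier = order[:pos]                 # parents must be earlier in the order
        ranked = sorted(
            earlier,
            key=lambda j: conditional_mutual_information(columns[i], columns[j], ys),
            reverse=True)
        parents[i] = ranked[:k]               # no more than k parents
    return order, parents


# Usage on a toy dataset with three binary attributes.
ys = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],   # identical to the class
    [0, 0, 0, 1, 1, 1, 1, 0],   # noisy copy of the class
    [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the class
]
order, parents = kdb_structure(columns, ys, k=1)
```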


k-Dependence Bayes (KDB)

• Training Space: $O(y a^2 v^2 + y a v^{k+1})$
• Classification Space: $O(y a v^{k+1})$
• Training Time: $O(t a^2 + y a^2 v^2 + t a k)$
• Classification Time: $O(y a k)$

t = number of training examples; a = number of attributes; v = average number of values; y = number of classes
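To get a feel for the two space terms, the sketch below evaluates them for one set of values. The numbers (y = 2, a = 41, v = 5) are chosen purely for illustration, and the function name is hypothetical; constants hidden by the O-notation are ignored.

```python
def kdb_table_sizes(y, a, v, k):
    """Dominant terms of the KDB training-space bound, as raw entry counts."""
    pairwise = y * a**2 * v**2     # counts for pairs of attributes with the class
    cpts = y * a * v**(k + 1)      # CPT entries: each attribute with k parents and the class
    return pairwise, cpts


for k in (1, 2, 5):
    pairwise, cpts = kdb_table_sizes(y=2, a=41, v=5, k=k)
    print(f"k={k}: pairwise={pairwise:,} cpts={cpts:,}")
```

With these illustrative values the pairwise term stays at 84,050 entries, while the CPT term grows from 10,250 entries at k = 2 to 1,281,250 at k = 5, which shows why the $v^{k+1}$ factor dominates as k increases.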


k-Dependence Bayes (KDB)

• Parameter k controls bias-variance tradeoff

[Figure: learning curves for Naïve Bayes and KDB with k=1 to k=5; x-axis: training set size (0 to 1,000,000); y-axis: RMSE (0 to 0.8)]

• No a priori means to anticipate best k

• Spurious attributes may increase error


KDB models are nested

• $M_{i,j}$ = KDB with k = i using attributes 1 to j.

      Attributes (ordered by MI)
k     1         2         3         4
1     M_{1,1}   M_{1,2}   M_{1,3}   M_{1,4}
2     M_{2,1}   M_{2,2}   M_{2,3}   M_{2,4}
3     M_{3,1}   M_{3,2}   M_{3,3}   M_{3,4}
4     M_{4,1}   M_{4,2}   M_{4,3}   M_{4,4}

• $M_{i,j}$ is a minor extension of $M_{i,j-1}$ and $M_{i-1,j}$

[Figure: network with class Y and attributes X1 to X5]


Selective KDB

• In one extra pass, select an attribute subset and the best k using leave-one-out CV.

• The full model subsumes all k × a submodels.

• Very efficient selection between a large class of strong models.


Leave-one-out CV

• Very low bias estimator of out-of-sample performance

• Incremental cross-validation makes it VERY efficient for BNC
  • Collect counts from all data once
  • When classifying a hold-out object, subtract it from the counts

• All k × a nested models can be evaluated with little more computation than the full KDB model
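The two ideas above (subtracting the hold-out object from the counts, and scoring nested models in a single sweep) can be sketched as follows. To keep the example short it is simplified to the naive Bayes structure, so only the attribute-prefix dimension of the nesting is shown, not the k dimension; Laplace smoothing and the function name are assumptions made for the illustration.

```python
import math


def loo_prefix_errors(X, y, n_values, n_classes):
    """Leave-one-out 0-1 error for every attribute prefix 1..a of a naive
    Bayes model, using one set of counts gathered from all the data."""
    t, a = len(X), len(n_values)
    class_counts = [0] * n_classes
    joint = [[[0] * n_classes for _ in range(n_values[i])] for i in range(a)]
    for x, c in zip(X, y):                      # collect counts from all data once
        class_counts[c] += 1
        for i, xi in enumerate(x):
            joint[i][xi][c] += 1

    errors = [0] * a                            # errors[j - 1]: model using attributes 1..j
    for x, c in zip(X, y):
        # Class prior with the hold-out object subtracted from the counts.
        scores = [
            math.log((class_counts[cls] - (1 if cls == c else 0) + 1)
                     / (t - 1 + n_classes))
            for cls in range(n_classes)
        ]
        for i, xi in enumerate(x):
            for cls in range(n_classes):
                held = 1 if cls == c else 0     # the hold-out object only affects its own class
                num = joint[i][xi][cls] - held + 1
                den = class_counts[cls] - held + n_values[i]
                scores[cls] += math.log(num / den)
            # After adding attribute i, scores belong to the (i + 1)-attribute model.
            pred = max(range(n_classes), key=scores.__getitem__)
            errors[i] += int(pred != c)
    return [e / t for e in errors]


# Usage: attribute 0 determines the class, attribute 1 is uninformative.
X = [(0, 0), (0, 1), (0, 0), (0, 1), (1, 0), (1, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
errs = loo_prefix_errors(X, y, n_values=[2, 2], n_classes=2)
```

Each hold-out prediction reuses the running sum of log-probabilities, so evaluating all a prefixes costs about the same as evaluating the full model once per example.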


Selective KDB

• Training Space: $O(y a^2 v^2 + y a v^{k+1})$
• Classification Space: $O(y a^* v^{k^*+1})$
• Training Time: $O(t a^2 + y a^2 v^2 + t a y k)$
• Classification Time: $O(y a^* k^*)$

t = number of training examples; a = number of attributes; v = average number of values; y = number of classes
a* = number of attributes selected (a* ≤ a).
k* = best value of k found (k* ≤ k_max).


Selective KDB

[Figure: learning curves for Naïve Bayes, KDB k=1 to k=5, SKDB Only K, SKDB k=5 Only Atts, and SKDB; x-axis: training set size (0 to 1,000,000); y-axis: RMSE (0 to 0.8)]


16 datasets (165K-54M examples)

Name                 No. of cases (million)   No. of Atts.   No. of Classes   Size
localization         0.165                    5              11               11MB
census-income        0.299                    41             2                136MB
USPSExtended         0.342                    676            2                603MB (s)
MITFaceSetA          0.474                    361            2                584MB
MITFaceSetB          0.489                    361            2                603MB
MSDYear-Prediction   0.515                    90             90               601MB (s)
covtype              0.581                    54             7                72MB
MITFaceSetC          0.839                    361            2                1.1GB
poker-hand           1.025                    10             10               24MB
uscensus1990         2.458                    67             4                325MB
PAMAP2               3.851                    54             19               1.7GB
kddcup               5.210                    41             40               754MB
linkage              5.749                    11             2                251MB
mnist8ms             8.100                    784            10               19GB (s)
satellite            8.705                    138            24               3.6GB
splice               54.628                   141            2                7.3GB (s)

(s) sparse format.


Numeric attributes

• 5 bin equal frequency discretisation
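A minimal sketch of equal-frequency discretisation is given below. It assumes the values of one numeric attribute fit in memory and picks cut points at evenly spaced ranks; the function names are hypothetical, and how the cut points were obtained for the out-of-core experiments is not stated in the slides.

```python
import bisect


def equal_frequency_cuts(values, n_bins=5):
    """Cut points that split the sorted values into n_bins groups of
    approximately equal size."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for b in range(1, n_bins):
        cut = ordered[(b * n) // n_bins]
        if not cuts or cut > cuts[-1]:      # skip duplicate cut points
            cuts.append(cut)
    return cuts


def discretise(value, cuts):
    """Map a numeric value to a bin index in 0..len(cuts)."""
    return bisect.bisect_right(cuts, value)


# Usage: the integers 1..100 fall into five bins of 20 values each.
values = list(range(1, 101))
cuts = equal_frequency_cuts(values, n_bins=5)
bins = [discretise(v, cuts) for v in values]
```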


Out-of-core BNCs

[Figure: RMSE comparison, two panels: "KDB5 vs SKDB" and "best KDB vs SKDB"]


Out-of-core BNCs

• Training Time Comparisons

[Figure: training time in seconds (axis ticks 0.1 to 1000) for NB, TAN, AODE, KDB k=5 and SKDB; datasets: localization, Census-income, Poker-hand, donation, covtype, uscensus, MITFaceSetA, MITFaceSetB, USPSExtended, MITFaceSetC, kddcup]


Out-of-core BNCs

• Classification Time Comparisons

[Figure: classification time in seconds (axis ticks 0.1 to 1000) for NB, TAN, AODE, KDB k=5 and SKDB, annotated with k values (from k=2 to k=5); datasets: localization, Poker-hand, Census-income, donation, covtype, uscensus, MITFaceSetA, USPSExtended, MITFaceSetB, MITFaceSetC, kddcup]


Out-of-core SGD

• SGD in Vowpal Wabbit (VW):
  • Squared and logistic loss functions.
  • Quadratic features (best results).
  • Different numbers of passes (3 or 10).
  • Discrete attributes converted into binary features.
  • One-against-all for multiclass classification.
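The feature preparation listed above can be illustrated with a small sketch. This is plain Python written for the example, not Vowpal Wabbit's implementation or input format; the function names are hypothetical.

```python
from itertools import combinations


def one_hot(x, n_values):
    """Turn discrete attribute values into binary indicator features."""
    features = []
    for xi, v in zip(x, n_values):
        features.extend(1 if xi == val else 0 for val in range(v))
    return features


def add_quadratic(features):
    """Append the product of every pair of distinct features."""
    return features + [p * q for p, q in combinations(features, 2)]


# Usage: one attribute with 2 values and one with 3 values.
binary = one_hot((1, 0), n_values=[2, 3])
expanded = add_quadratic(binary)
```

The quadratic expansion grows with the square of the number of binary features, which is one reason the choice of feature set affects both accuracy and training time for the linear learner.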


Out-of-core SGD

[Figure: two panels, "VW (logistic) - RMSE" and "VW (squared) - 0-1 Loss"; x-axes: VWLF RMSE and VWSF RMSE (log scale, 0.0001 to 1); y-axes: SKDB (max k=5) RMSE (log scale, 0.0001 to 1)]

                VW: RMSE (logistic)   VW: 0-1 Loss (squared)
Selective KDB   (7+2)-0-7             8-1-7


Out-of-core SGD

• Training Time Comparisons

[Figure: training time in seconds (axis ticks 1 to 10000) for VWSF, VWLF and SKDB; datasets: localization, Census-income, Poker-hand, donation, covtype, uscensus, MITFaceSetA, MITFaceSetB, USPSExtended, MITFaceSetC, kddcup]


Out-of-core SGD

• Classification Time Comparisons

[Figure: classification time in seconds (axis ticks 0.1 to 100) for VWSF, VWLF and SKDB; datasets: localization, Poker-hand, Census-income, donation, covtype, uscensus, MITFaceSetA, USPSExtended, MITFaceSetB, MITFaceSetC, kddcup]

Page 51

In-core BayesNet and RF

[Figure: two panels, BayesNet and RF; x = sampled dataset]

                    BayesNet    RF (Num)
k-selective KDB     6-3-3       5-0-6

Page 52

In-core BayesNet and RF

• Training Time Comparisons

[Figure: training time per dataset for BayesNet, RF and SKDB. y-axis: Time (seconds), ticks 0.1 to 100000. x-axis: Datasets (localization, Census-income, donation, covtype, MITFaceSetA, MITFaceSetB, USPSExtended).]

Page 53

In-core BayesNet and RF

• Classification Time Comparisons

[Figure: classification time per dataset for BayesNet, RF and SKDB. y-axis: Time (seconds), ticks 0.1 to 100. x-axis: Datasets (localization, Census-income, donation, covtype, MITFaceSetA, MITFaceSetB, USPSExtended).]

Page 54

Global comparisons

• Cumulative Ranking

• Selective KDB copes well with high-dimensional datasets and datasets with more than 1 million points.

• RF performs better on datasets with a small number of attributes.

• VW has an advantage for sparse numeric data.

Page 55

Global comparisons

• Ranking

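[Figure: bar chart of ranking per learner. y-axis: Ranking, ticks 1 to 8. x-axis: Learners, in axis order NB, AODE, TAN, BayesNet, VWLF, KDB (best k), RF (Num), SKDB. Bar labels in extraction order: 7.63, 6.81, 5.88, 3.47, 3.44, 3.34, 2.91, 2.53.]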
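The slide does not say how the ranks were computed. A common convention, assumed here, is to rank the learners on each dataset by error (rank 1 is best, ties share the average rank) and then average the ranks over datasets. The sketch below follows that convention. The function name `average_ranks`, the dataset keys and all error values are made up for illustration, and it assumes every learner has a score on every dataset.

```python
def average_ranks(scores):
    """scores: dict dataset -> dict learner -> error (lower is better).

    Returns dict learner -> mean rank across datasets. Tied learners share
    the average of the ranks they span.
    """
    totals = {}
    for per_learner in scores.values():
        ordered = sorted(per_learner, key=per_learner.get)
        i = 0
        while i < len(ordered):
            # Extend j to the end of the run of learners tied with ordered[i].
            j = i
            while (j + 1 < len(ordered)
                   and per_learner[ordered[j + 1]] == per_learner[ordered[i]]):
                j += 1
            shared = (i + j) / 2 + 1  # ranks are 1-based
            for learner in ordered[i:j + 1]:
                totals[learner] = totals.get(learner, 0.0) + shared
            i = j + 1
    return {learner: total / len(scores) for learner, total in totals.items()}


# Made-up RMSE values for illustration only.
toy = {
    "d1": {"NB": 0.30, "TAN": 0.20, "SKDB": 0.10},
    "d2": {"NB": 0.25, "TAN": 0.25, "SKDB": 0.15},
}
ranks = average_ranks(toy)
print(ranks)
```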

Page 56

Global comparisons (Time)

[Figure: two panels of time (seconds) per learner (NB, TAN, AODE, KDB, ksKDB, VWsq, VWlog). Panels: Training and Test. First panel y-axis ticks 0, 5000, 10000, 15000; second panel y-axis on log. scale, ticks 50, 100, 200, 500, 1000.]

Page 57

Stepping back

• The key trick is nested evaluation of a large class of count-based models

• Works also with 2-pass selective evaluation of AnDE models

• Combines the efficiency of simple generative techniques with the power of discriminative learning

• Can use any loss function that is a function of each P̂(y | x)
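To make the idea of nested evaluation concrete, here is a minimal sketch under strong simplifying assumptions. It uses naive Bayes only (no attribute parents), a fixed attribute order, Laplace smoothing, and scores the nested models on whatever rows are passed in. It is not the authors' Selective KDB procedure, and all function names are hypothetical. Model j uses the first j attributes, so one sweep over an instance yields P̂(y | x) for every nested model, and any loss defined on P̂(y | x) can be plugged in.

```python
import math
from collections import defaultdict


def train_counts(rows, labels):
    """One pass collecting class counts and (attribute, value, class) counts."""
    class_counts = defaultdict(int)
    joint_counts = defaultdict(int)  # key: (attribute index, value, class)
    for row, y in zip(rows, labels):
        class_counts[y] += 1
        for i, v in enumerate(row):
            joint_counts[(i, v, y)] += 1
    return class_counts, joint_counts


def posterior(log_joint):
    """Normalise log joint scores into an estimate of P(y | x)."""
    m = max(log_joint.values())
    exp = {c: math.exp(s - m) for c, s in log_joint.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}


def nested_losses(rows, labels, class_counts, joint_counts, n_values, loss):
    """Mean loss of every nested model, computed in a single pass.

    Model j uses only the first j attributes (j = 0 is the class prior alone).
    n_values[i] is the number of distinct values of attribute i.
    """
    classes = sorted(class_counts)
    n = sum(class_counts.values())
    totals = [0.0] * (len(n_values) + 1)
    for row, y in zip(rows, labels):
        # Start from the smoothed log prior, then add one attribute at a time.
        log_joint = {c: math.log((class_counts[c] + 1) / (n + len(classes)))
                     for c in classes}
        totals[0] += loss(posterior(log_joint), y)
        for i, v in enumerate(row):
            for c in classes:
                num = joint_counts.get((i, v, c), 0) + 1
                den = class_counts[c] + n_values[i]
                log_joint[c] += math.log(num / den)
            totals[i + 1] += loss(posterior(log_joint), y)
    return [t / len(rows) for t in totals]


def zero_one(post, y):
    """0-1 loss: 0 if the most probable class is the true class, else 1."""
    return 0.0 if max(post, key=post.get) == y else 1.0


def squared_error(post, y):
    """Squared error on the true class; the root of its mean gives an RMSE."""
    return (1.0 - post[y]) ** 2


# Toy data: the class depends only on the first attribute.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")] * 5
labels = [0, 0, 1, 1] * 5
cc, jc = train_counts(rows, labels)
losses = nested_losses(rows, labels, cc, jc, n_values=[2, 2], loss=zero_one)
best = min(range(len(losses)), key=losses.__getitem__)
print(losses, best)
```

The toy usage scores the models on the training rows purely to keep the example short. Swapping `zero_one` for `squared_error` changes the selection criterion without touching the counting or the sweep.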

Page 58

Future Research

• Numeric attributes
  • it seems that there should be something better than discretisation.
  • this remains elusive ...

• Overfitting avoidance

• Increase number of alternative models considered

• Explore other forms of nested BNCs

• Two pass Selective KDB
  • sample small test set in second pass

• Single pass generative/discriminative learning
  • initially learn a simple generative model
  • collect discriminative statistics and refine the model when there is sufficient evidence to make a choice
  • refinements might be attribute selection, structural refinement or weighting
  • repeat

Page 59

Satellite learning curves

[Figure: learning curves. y-axis: RMSE, 0 to 1. x-axis: Training set size. Curves: NB, KDB K=1, KDB K=2, KDB K=3, KDB K=4, KDB K=5, KDB K=6, KDB K=7, KDB K=8.]

Page 60

Satellite learning curves

[Figure: learning curves. y-axis: RMSE, 0 to 1. x-axis: Training set size. Curves: NB, KDB K=1, KDB K=2, KDB K=3, KDB K=4, KDB K=5, KDB K=6, KDB K=7, KDB K=8, SKDB K=10.]

Page 61

Satellite learning curves

[Figure: learning curves. y-axis: RMSE, 0 to 1. x-axis: Training set size. Curves: NB, KDB K=1, KDB K=2, KDB K=3, KDB K=4, KDB K=5, KDB K=6, KDB K=7, KDB K=8, SKDB K=10, Random Forest.]

Page 62

Conclusions

• Large data calls for fundamentally new learning algorithms

• We are pioneering a new generation of theoretically well-founded algorithms that are both scalable to very large data and capable of exploiting the fine detail inherent therein