Sep 10th, 2001. Copyright © 2001, Andrew W. Moore

Learning Gaussian Bayes Classifiers

Andrew W. Moore
Associate Professor
School of Computer Science, Carnegie Mellon University
www.cs.cmu.edu/~awm
awm@cs.cmu.edu
412-268-7599
Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
Copyright © 2001, Andrew W. Moore Gaussian Bayes Classifiers: Slide 2
Maximum Likelihood learning of Gaussians for Classification

• Why we should care
• 3 seconds to teach you a new learning algorithm
• What if there are 10,000 dimensions?
• What if there are categorical inputs?
• Examples “out the wazoo”
Why we should care

• One of the original “Data Mining” algorithms
• Very simple and effective
• Demonstrates the usefulness of our earlier groundwork
Where we were at the end of the MLE lecture…

[Diagram: learners organized by task and input type]

                                          Categorical          Mixed                Real-valued
                                          inputs only          Real/Cat okay        inputs only
Inputs → Classifier (predict category):   Joint BC, Naïve BC   Dec Tree             -
Inputs → Density Estimator (probability): Joint DE, Naïve DE   -                    Gauss DE
Inputs → Regressor (predict real no.):    -                    -                    -
This lecture…

[Diagram: the same grid, with Gauss BC added for real-valued classification]

                                          Categorical          Mixed                Real-valued
                                          inputs only          Real/Cat okay        inputs only
Inputs → Classifier (predict category):   Joint BC, Naïve BC   Dec Tree             Gauss BC
Inputs → Density Estimator (probability): Joint DE, Naïve DE   -                    Gauss DE
Inputs → Regressor (predict real no.):    -                    -                    -
Road Map

[Diagram: prerequisite topics covered so far: Probability, PDFs, Gaussians, MLE, MLE of Gaussians, Density Estimation, Bayes Classifiers, Decision Trees. This lecture adds a new node: Gaussian Bayes Classifiers.]
Gaussian Bayes Classifier Assumption

• The i’th record in the database is created using the following algorithm:
  1. Generate the output (the “class”) by drawing y_i ~ Multinomial(p_1, p_2, …, p_Ny)
  2. Generate the inputs from a Gaussian PDF that depends on the value of y_i:
     x_i ~ N(μ_{y_i}, Σ_{y_i})

Test your understanding: given Ny classes and m input attributes, how many distinct scalar parameters need to be estimated?
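The generative assumption above can be sketched in a few lines: draw the class from a multinomial, then draw the inputs from that class's Gaussian. The priors, means, and covariances below are invented illustration values, not estimates from any real dataset.

```python
# Sketch of the Gaussian Bayes generative model:
#   y_i ~ Multinomial(p_1, ..., p_Ny), then x_i ~ N(mu_{y_i}, Sigma_{y_i}).
# All parameter values here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.3, 0.7])                             # class priors p_1, p_2
mus = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]   # mu_i for each class
sigmas = [np.eye(2), 2.0 * np.eye(2)]                # Sigma_i for each class

def generate_record():
    """Draw y ~ Multinomial(p), then x ~ N(mu_y, Sigma_y)."""
    y = rng.choice(len(p), p=p)
    x = rng.multivariate_normal(mus[y], sigmas[y])
    return x, y

records = [generate_record() for _ in range(1000)]
```

With 1000 draws, roughly 70% of the generated classes should be class 1, matching the prior.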
MLE Gaussian Bayes Classifier

Let DB_i = the subset of database DB in which the output class is y = i.

The maximum-likelihood estimates of the parameters are:

p_i^mle = |DB_i| / |DB|

(μ_i^mle, Σ_i^mle) = the MLE Gaussian fitted to DB_i, i.e.

μ_i^mle = (1 / |DB_i|) Σ_{x_k ∈ DB_i} x_k

Σ_i^mle = (1 / |DB_i|) Σ_{x_k ∈ DB_i} (x_k − μ_i^mle)(x_k − μ_i^mle)^T
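The MLE recipe above (class frequencies for the priors, per-class sample mean and covariance for the Gaussians) can be sketched directly. `X` and `y` are hypothetical toy data, not the census dataset used later in the lecture.

```python
# Sketch of MLE for a Gaussian Bayes classifier:
#   p_i = |DB_i| / |DB|, mu_i = mean of DB_i, Sigma_i = MLE covariance of DB_i.
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5],   # class 0 records
              [8.0, 9.0], [9.0, 8.0], [8.5, 8.5]])  # class 1 records
y = np.array([0, 0, 0, 1, 1, 1])

def fit_gaussian_bayes(X, y):
    priors, means, covs = {}, {}, {}
    for i in np.unique(y):
        DBi = X[y == i]                        # subset of DB with class i
        priors[i] = len(DBi) / len(X)          # p_i^mle = |DB_i| / |DB|
        means[i] = DBi.mean(axis=0)            # mu_i^mle
        diff = DBi - means[i]
        covs[i] = diff.T @ diff / len(DBi)     # Sigma_i^mle (divide by |DB_i|)
    return priors, means, covs

priors, means, covs = fit_gaussian_bayes(X, y)
```

Note the MLE covariance divides by |DB_i|, not |DB_i| − 1; that is the maximum-likelihood (biased) estimator the slides use.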
Gaussian Bayes Classification

P(y = i | x) = p(x | y = i) P(y = i) / p(x)

             = [ 1 / ((2π)^{m/2} ||Σ_i||^{1/2}) ] exp[ −(1/2) (x − μ_i)^T Σ_i^{−1} (x − μ_i) ] p_i / p(x)

How do we deal with that?
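One standard way to handle the denominator p(x): it is the same for every class, so compute the numerator N(x; μ_i, Σ_i) p_i for each class and normalize so the posteriors sum to 1. A minimal sketch, with invented two-class parameters:

```python
# Posterior P(y=i|x) via normalization: p(x) equals the sum of the
# class-conditional numerators, so dividing by their sum implements it.
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    m = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (m / 2) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

def posterior(x, priors, means, covs):
    nums = np.array([gaussian_pdf(x, means[i], covs[i]) * priors[i]
                     for i in range(len(priors))])
    return nums / nums.sum()      # dividing by p(x) = sum of numerators

priors = [0.5, 0.5]
means = [np.zeros(2), np.full(2, 4.0)]
covs = [np.eye(2), np.eye(2)]
post = posterior(np.array([0.1, -0.2]), priors, means, covs)
```

In practice one would work with log densities to avoid underflow in high dimensions, but the normalization idea is the same.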
Here is a dataset
48,000 records, 16 attributes [Kohavi 1995]
age | employment | education | edunum | marital | … | job | relation | race | gender | hours_worked | country | wealth
39 | State_gov | Bachelors | 13 | Never_married | … | Adm_clerical | Not_in_family | White | Male | 40 | United_States | poor
51 | Self_emp_not_inc | Bachelors | 13 | Married | … | Exec_managerial | Husband | White | Male | 13 | United_States | poor
39 | Private | HS_grad | 9 | Divorced | … | Handlers_cleaners | Not_in_family | White | Male | 40 | United_States | poor
54 | Private | 11th | 7 | Married | … | Handlers_cleaners | Husband | Black | Male | 40 | United_States | poor
28 | Private | Bachelors | 13 | Married | … | Prof_specialty | Wife | Black | Female | 40 | Cuba | poor
38 | Private | Masters | 14 | Married | … | Exec_managerial | Wife | White | Female | 40 | United_States | poor
50 | Private | 9th | 5 | Married_spouse_absent | … | Other_service | Not_in_family | Black | Female | 16 | Jamaica | poor
52 | Self_emp_not_inc | HS_grad | 9 | Married | … | Exec_managerial | Husband | White | Male | 45 | United_States | rich
31 | Private | Masters | 14 | Never_married | … | Prof_specialty | Not_in_family | White | Female | 50 | United_States | rich
42 | Private | Bachelors | 13 | Married | … | Exec_managerial | Husband | White | Male | 40 | United_States | rich
37 | Private | Some_college | 10 | Married | … | Exec_managerial | Husband | Black | Male | 80 | United_States | rich
30 | State_gov | Bachelors | 13 | Married | … | Prof_specialty | Husband | Asian | Male | 40 | India | rich
24 | Private | Bachelors | 13 | Never_married | … | Adm_clerical | Own_child | White | Female | 30 | United_States | poor
33 | Private | Assoc_acdm | 12 | Never_married | … | Sales | Not_in_family | Black | Male | 50 | United_States | poor
41 | Private | Assoc_voc | 11 | Married | … | Craft_repair | Husband | Asian | Male | 40 | *MissingValue* | rich
34 | Private | 7th_8th | 4 | Married | … | Transport_moving | Husband | Amer_Indian | Male | 45 | Mexico | poor
26 | Self_emp_not_inc | HS_grad | 9 | Never_married | … | Farming_fishing | Own_child | White | Male | 35 | United_States | poor
33 | Private | HS_grad | 9 | Never_married | … | Machine_op_inspct | Unmarried | White | Male | 40 | United_States | poor
38 | Private | 11th | 7 | Married | … | Sales | Husband | White | Male | 50 | United_States | poor
44 | Self_emp_not_inc | Masters | 14 | Divorced | … | Exec_managerial | Unmarried | White | Female | 45 | United_States | rich
41 | Private | Doctorate | 16 | Married | … | Prof_specialty | Husband | White | Male | 60 | United_States | rich
: | : | : | : | : | : | : | : | : | : | : | : | :
Wealth from years of education
age, hours → wealth

Having 2 inputs instead of one helps in two ways:
1. Combining evidence from two 1-d Gaussians
2. Off-diagonal covariance distinguishes class “shape”
An “MPG” example

Things to note:
• Class boundaries can be weird shapes (hyperconic sections)
• Class regions can be non-simply-connected
• But it’s impossible to model arbitrarily weirdly shaped regions
• Test your understanding: with one input, must classes be simply connected?
Overfitting dangers

• Problem with the “Joint” Bayes classifier: the number of parameters is exponential in the number of dimensions. This means we just memorize the training data, and can overfit.
• Problemette with the Gaussian Bayes classifier: the number of parameters is quadratic in the number of dimensions. With 10,000 dimensions and only 1,000 datapoints we could overfit.

Question: any suggested solutions?
General: O(m^2) parameters

        [ σ_1^2  σ_12   …  σ_1m  ]
    Σ = [ σ_12   σ_2^2  …  σ_2m  ]
        [  ⋮      ⋮     ⋱   ⋮    ]
        [ σ_1m   σ_2m   …  σ_m^2 ]
Aligned: O(m) parameters

        [ σ_1^2  0      0      …  0     ]
        [ 0      σ_2^2  0      …  0     ]
    Σ = [ 0      0      σ_3^2  …  0     ]
        [ ⋮      ⋮      ⋮      ⋱  ⋮     ]
        [ 0      0      0      …  σ_m^2 ]
Spherical: O(1) cov parameters

        [ σ^2  0    0    …  0   ]
        [ 0    σ^2  0    …  0   ]
    Σ = [ 0    0    σ^2  …  0   ]
        [ ⋮    ⋮    ⋮    ⋱  ⋮   ]
        [ 0    0    0    …  σ^2 ]
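The three covariance restrictions trade flexibility for parameter count. A quick sketch of the arithmetic: a full symmetric Σ has m(m+1)/2 free entries, a diagonal (“aligned”) Σ has m, and a spherical Σ has just 1.

```python
# Free covariance parameters per class, for m input dimensions,
# under the three restrictions described above.
def cov_param_count(m, kind):
    if kind == "general":
        return m * (m + 1) // 2   # upper triangle including the diagonal
    if kind == "aligned":
        return m                  # one variance per axis
    if kind == "spherical":
        return 1                  # a single shared variance
    raise ValueError(kind)

counts = {k: cov_param_count(10_000, k)
          for k in ("general", "aligned", "spherical")}
```

For the 10,000-dimension scenario from the overfitting slide, the full covariance needs about 50 million parameters per class, which makes the aligned or spherical restriction attractive.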
BCs that have both real and categorical inputs?

                                          Categorical          Mixed                  Real-valued
                                          inputs only          Real/Cat okay          inputs only
Inputs → Classifier (predict category):   Joint BC, Naïve BC   Dec Tree, BC here???   Gauss BC
Inputs → Density Estimator (probability): Joint DE, Naïve DE   -                      Gauss DE
Inputs → Regressor (predict real no.):    -                    -                      -
Easy! Guess how?
BCs that have both real and categorical inputs?

                                          Categorical          Mixed                                      Real-valued
                                          inputs only          Real/Cat okay                              inputs only
Inputs → Classifier (predict category):   Joint BC, Naïve BC   Dec Tree, Gauss/Joint BC, Gauss Naïve BC   Gauss BC
Inputs → Density Estimator (probability): Joint DE, Naïve DE   Gauss/Joint DE, Gauss Naïve DE             Gauss DE
Inputs → Regressor (predict real no.):    -                    -                                          -
Mixed Categorical / Real Density Estimation

• Write x = (u, v) = (u_1, u_2, …, u_q, v_1, v_2, …, v_{m−q}),
  where u_1 … u_q are real-valued and v_1 … v_{m−q} are categorical.

P(x | M) = P(u, v | M)
(where M is any Density Estimation Model)
Not sure which tasty DE to enjoy? Try our…

Joint / Gauss DE Combo

P(u, v | M) = P(u | v, M) P(v | M)

Here P(u | v, M) is a Gaussian with parameters depending on v, and P(v | M) is a big (m−q)-dimensional lookup table.
MLE learning of the Joint / Gauss DE Combo

P(u, v | M) = P(u | v, M) P(v | M)

u | v, M ~ N(μ_v, Σ_v),   P(v | M) = q_v

Let R_v = # records that match v (and R = total # records). Then:

μ_v = mean of u among records matching v = (1 / R_v) Σ_{k s.t. v_k = v} u_k

Σ_v = covariance of u among records matching v = (1 / R_v) Σ_{k s.t. v_k = v} (u_k − μ_v)(u_k − μ_v)^T

q_v = fraction of records that match v = R_v / R
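The Joint/Gauss DE combo is just "group by the categorical part v, then fit a Gaussian per group". A sketch with toy records (one categorical column v, two real columns u):

```python
# Sketch of Joint/Gauss DE MLE: estimate (mu_v, Sigma_v, q_v) for each
# distinct categorical value v. Toy records, invented for illustration.
import numpy as np
from collections import defaultdict

records = [("a", np.array([1.0, 2.0])), ("a", np.array([3.0, 4.0])),
           ("b", np.array([10.0, 10.0])), ("b", np.array([12.0, 14.0]))]

def fit_joint_gauss_de(records):
    groups = defaultdict(list)
    for v, u in records:
        groups[v].append(u)
    R = len(records)
    model = {}
    for v, us in groups.items():
        U = np.array(us)
        mu = U.mean(axis=0)                     # mu_v
        diff = U - mu
        Sigma = diff.T @ diff / len(U)          # Sigma_v (MLE)
        model[v] = {"mu": mu, "Sigma": Sigma, "q": len(U) / R}  # q_v = R_v / R
    return model

model = fit_joint_gauss_de(records)
```

Note how the lookup-table blow-up appears directly here: one Gaussian is stored per distinct value of v, which is exponential in the number of categorical attributes.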
Gender and Hours Worked*
*As with all the results from the UCI “adult census” dataset, we can’t draw any real-world conclusions since it’s such a non-real-world sample.
Joint / Gauss DE Combo
What we just did
Joint / Gauss BC Combo
What we do next
Joint / Gauss BC Combo

P(Y = i | u, v) = p(u, v | Y = i, M_i) P(Y = i) / p(u, v)

                = p(u | v, M_i) P(v | M_i) P(Y = i) / p(u, v)

                = N(u; μ_{i,v}, Σ_{i,v}) q_{i,v} p_i / p(u, v)

(Rather so-so notation for “the Gaussian with mean μ_{i,v} and covariance Σ_{i,v}, evaluated at u”.)

μ_{i,v} = mean of u among records matching v and in which y = i
Σ_{i,v} = covariance of u among records matching v and in which y = i
q_{i,v} = fraction of “y = i” records that match v
p_i = fraction of records that match “y = i”
Joint / Gauss DE Combo and Joint / Gauss BC Combo: the downside

• (Yawn… we’ve done this before…) More than a few categorical attributes blah blah blah massive table blah blah lots of parameters blah blah just memorize training data blah blah blah do worse on future data blah blah need to be more conservative blah
Naïve/Gauss combo for Density Estimation

P(u, v | M) = Π_{j=1}^{q} p(u_j | M) × Π_{j=1}^{m−q} P(v_j | M)

Real:        u_j | M ~ N(μ_j, σ_j^2)
Categorical: v_j | M ~ Multinomial[q_j[1], q_j[2], …, q_j[N_j]]

How many parameters?

The MLEs are:

μ_j = (1/R) Σ_k u_kj

σ_j^2 = (1/R) Σ_k (u_kj − μ_j)^2

q_j[h] = (# records in which v_j = h) / R
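Those MLEs are per-attribute statistics, so fitting the Naïve/Gauss DE is just column-wise means, variances, and counts. A sketch over toy data with two real columns and one categorical column:

```python
# Sketch of Naive/Gauss DE MLE: each real attribute gets an independent
# (mu_j, sigma_j^2), each categorical attribute an independent multinomial.
import numpy as np
from collections import Counter

U = np.array([[1.0, 10.0], [3.0, 14.0], [2.0, 12.0]])   # real attributes u_j
V = [["a"], ["b"], ["a"]]                                # categorical attributes v_j

R = len(U)
mu = U.mean(axis=0)                    # mu_j = (1/R) sum_k u_kj
var = ((U - mu) ** 2).mean(axis=0)     # sigma_j^2 = (1/R) sum_k (u_kj - mu_j)^2
q = []
for j in range(len(V[0])):
    counts = Counter(row[j] for row in V)
    q.append({h: c / R for h, c in counts.items()})  # q_j[h] = #(v_j = h) / R
```

Parameter count: 2 per real attribute plus (N_j − 1) per categorical attribute, i.e. linear in the number of attributes, which answers the "how many parameters?" question above.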
Naïve / Gauss BC

P(Y = i | u, v) = p(u, v | Y = i) P(Y = i) / p(u, v)

                = [ Π_{j=1}^{q} p(u_j | μ_ij, σ_ij^2) Π_{j=1}^{m−q} P(v_j | q_ij) ] P(Y = i) / p(u, v)

                = [ Π_{j=1}^{q} N(u_j; μ_ij, σ_ij^2) Π_{j=1}^{m−q} q_ij[v_j] ] p_i / p(u, v)

μ_ij = mean of u_j among records in which y = i
σ_ij^2 = variance of u_j among records in which y = i
q_ij[h] = fraction of “y = i” records in which v_j = h
p_i = fraction of records that match “y = i”
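The Naïve/Gauss BC product above can be sketched directly: multiply per-attribute 1-d Gaussian densities for the real inputs, multinomial entries q_ij[v_j] for the categorical inputs, and the class prior p_i, then normalize over classes. The per-class parameters below are invented toy values.

```python
# Sketch of Naive/Gauss BC prediction. Parameters per class i:
# (mu_ij, sigma_ij^2) per real attribute, q_ij per categorical attribute,
# and the class prior p_i. All values invented for illustration.
import numpy as np

def norm_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

classes = {
    0: {"mu": [0.0], "var": [1.0], "q": [{"a": 0.9, "b": 0.1}], "p": 0.5},
    1: {"mu": [4.0], "var": [1.0], "q": [{"a": 0.2, "b": 0.8}], "p": 0.5},
}

def predict(u, v):
    scores = {}
    for i, c in classes.items():
        s = c["p"]                                   # p_i
        for j, uj in enumerate(u):
            s *= norm_pdf(uj, c["mu"][j], c["var"][j])   # N(u_j; mu_ij, sigma_ij^2)
        for j, vj in enumerate(v):
            s *= c["q"][j][vj]                       # q_ij[v_j]
        scores[i] = s
    Z = sum(scores.values())                         # normalizes away p(u, v)
    return {i: s / Z for i, s in scores.items()}

post = predict([0.5], ["a"])
```

The record (u = 0.5, v = "a") is close to class 0's Gaussian mean and its likely categorical value, so the posterior should strongly favor class 0.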
Learn Wealth from 15 attributes
Learn Wealth from 15 attributes

(Same data, except all real values discretized to 3 levels.)
Learn Race from 15 attributes
What you should know

• A lot of this should have just been a corollary of what you already knew
• Turning Gaussian DEs into Gaussian BCs
• Mixing Categorical and Real-Valued inputs
Questions to Ponder

• Suppose you wanted to create an example dataset where a BC involving Gaussians crushed decision trees like a bug. What would you do?
• Could you combine Decision Trees and Bayes Classifiers? How? (Maybe there is more than one possible way.)