Naïve Bayes Classificationweb.iitd.ac.in/~bspanda/BY.pdf · The Naive Bayes Classifier for Data...

Naïve Bayes Classification

Things We’d Like to Do

• Spam Classification

– Given an email, predict whether it is spam or not

• Medical Diagnosis

– Given a list of symptoms, predict whether a patient has disease X or not

• Weather

– Based on temperature, humidity, etc… predict if it will rain tomorrow

• The relationship between attribute set and the class variable is non-deterministic.

• Even if the attributes are same, the class label may differ in training set even and hence can not be predicted with certainty.

• Reason: noisy data, certain other attributes are not included in the data.

• Example: Task of predicting whether a person is at risk for heart disease based on the person’s diet and workout frequency.

• So need an approach to model probabilistic relationship between attribute set and the class variable.

Bayesian Classification

• Problem statement:

– Given features X1,X2,…,Xn

– Predict a label Y

Another Application

• Digit Recognition

• X1,…,Xn {0,1} (Black vs. White pixels)

• Y {5,6} (predict whether a digit is a 5 or a 6)

Classifier 5

Bayes Classifier

• A probabilistic framework for solving classification problems

• Conditional Probability:

• Bayes theorem:

)()|()|(

CPCAPACP

),()|(

CAPCAP

CAPACP

Example of Bayes Theorem • Given:

– A doctor knows that Cold causes fever 50% of the time

– Prior probability of any patient having cold is 1/50,000

– Prior probability of any patient having fever is 1/20

• If a patient has fever, what’s the probability he/she has cold?

0002.0

50000/15.0

)()|()|(

CPCFPFCP

Bayesian Classifiers • Consider each attribute and class label as

random variables

• Given a record with attributes (A1, A2,…,An)

– Goal is to predict class C

– Specifically, we want to find the value of C that maximizes P(C| A1, A2,…,An )

• Can we estimate P(C| A1, A2,…,An ) directly from data?

Bayesian Classifiers • Approach:

– compute the posterior probability P(C | A1, A2, …, An) for all values of C using the Bayes theorem

– Choose value of C that maximizes P(C | A1, A2, …, An)

– Equivalent to choosing value of C that maximizes P(A1, A2, …, An|C) P(C)

• How to estimate P(A1, A2, …, An | C )?

)()|()|(

CPCAAAPAAACP

Naïve Bayes Classifier

• Assume independence among attributes Ai when class is given:

– P(A1, A2, …, An |C) = P(A1| Cj) P(A2| Cj)… P(An| Cj)

– Can estimate P(Ai| Cj) for all Ai and Cj.

– New point is classified to Cj if P(Cj) P(Ai| Cj) is maximum.

How to Estimate Probabilities from Data? • Class: P(C) = Nc/N

– e.g., P(No) = 7/10, P(Yes) = 3/10

• For discrete attributes:

P(Ai | Ck) = |Aik|/ Nc

– where |Aik| is number of instances having attribute Ai and belongs to class Ck

– Examples:

P(Status=Married|No) = 4/7

P(Refund=Yes|Yes)=0

Tid Refund Marital Status

Taxable Income Evade

1 Yes Single 125K No

2 No Married 100K No

3 No Single 70K No

4 Yes Married 120K No

5 No Divorced 95K Yes

6 No Married 60K No

7 Yes Divorced 220K No

8 No Single 85K Yes

9 No Married 75K No

10 No Single 90K Yes 10

categoric

continuous

How to Estimate Probabilities from Data?

• For continuous attributes: – Discretize the range into bins

• one ordinal attribute per bin

• violates independence assumption

– Two-way split: (A < v) or (A > v) • choose only one of the two splits as new attribute

– Probability density estimation: • Assume attribute follows a normal distribution

• Use data to estimate parameters of distribution (e.g., mean and standard deviation)

• Once probability distribution is known, can use it to estimate the conditional probability P(Ai|c)

How to Estimate Probabilities from Data?

• Normal distribution:

– One for each (Ai,ci) pair

• For (Income, Class=No):

– If Class=No • sample mean = 110

• sample variance = 2975

Tid Refund Marital Status

Taxable Income Evade

1 Yes Single 125K No

2 No Married 100K No

3 No Single 70K No

4 Yes Married 120K No

5 No Divorced 95K Yes

6 No Married 60K No

7 Yes Divorced 220K No

8 No Single 85K Yes

9 No Married 75K No

10 No Single 90K Yes 10

categoric

continuous

1)|( ij

jiecAP

0072.0)54.54(2

1)|120( )2975(2

)110120( 2

eNoIncomeP

Example of Naïve Bayes Classifier

P(Refund=Yes|No) = 3/7

P(Refund=No|No) = 4/7

P(Refund=Yes|Yes) = 0

P(Refund=No|Yes) = 1

P(Marital Status=Single|No) = 2/7

P(Marital Status=Divorced|No)=1/7

P(Marital Status=Married|No) = 4/7

P(Marital Status=Single|Yes) = 2/7

P(Marital Status=Divorced|Yes)=1/7

P(Marital Status=Married|Yes) = 0

For taxable income:

If class=No: sample mean=110

sample variance=2975

If class=Yes: sample mean=90

sample variance=25

naive Bayes Classifier:

120K)IncomeMarried,No,Refund( X

P(X|Class=No) = P(Refund=No|Class=No)

P(Married| Class=No)

P(Income=120K| Class=No)

= 4/7 4/7 0.0072 = 0.0024

P(X|Class=Yes) = P(Refund=No| Class=Yes)

P(Married| Class=Yes)

P(Income=120K| Class=Yes)

= 1 0 1.2 10-9 = 0

Since P(X|No)P(No) > P(X|Yes)P(Yes)

Therefore P(No|X) > P(Yes|X)

=> Class = No

Given a Test Record:

Naïve Bayes Classifier

• If one of the conditional probability is zero, then the entire expression becomes zero

• Probability estimation:

mpNCAP

)|(:estimate-m

1)|(:Laplace

)|( :Original

c: number of classes

p: prior probability

m: parameter

Example of Naïve Bayes Classifier Name Give Birth Can Fly Live in Water Have Legs Class

human yes no no yes mammals

python no no no no non-mammals

salmon no no yes no non-mammals

whale yes no yes no mammals

frog no no sometimes yes non-mammals

komodo no no no yes non-mammals

bat yes yes no yes mammals

pigeon no yes no yes non-mammals

cat yes no no yes mammals

leopard shark yes no yes no non-mammals

turtle no no sometimes yes non-mammals

penguin no no sometimes yes non-mammals

porcupine yes no no yes mammals

eel no no yes no non-mammals

salamander no no sometimes yes non-mammals

gila monster no no no yes non-mammals

platypus no no no yes mammals

owl no yes no yes non-mammals

dolphin yes no yes no mammals

eagle no yes no yes non-mammals

Give Birth Can Fly Live in Water Have Legs Class

yes no yes no ?

0027.020

13004.0)()|(

021.020

706.0)()|(

0042.013

A: attributes

M: mammals

N: non-mammals

P(A|M)P(M) >

P(A|N)P(N)

=> Mammals

Naïve Bayes (Summary)

• Robust to isolated noise points

• Handle missing values by ignoring the instance during probability estimate calculations

• Robust to irrelevant attributes

• Independence assumption may not hold for some attributes – Use other techniques such as Bayesian Belief

Networks (BBN)

The Bayes Classifier

• Use Bayes Rule!

• Why did this help? Well, we think that we might be able to specify how features are “generated” by the class label

Normalization Constant

Likelihood Prior

The Bayes Classifier

• Let’s expand this for our digit recognition task:

• To classify, we’ll simply compute these two probabilities and predict based on which one is greater

Model Parameters

• For the Bayes classifier, we need to “learn” two functions, the likelihood and the prior

• How many parameters are required to specify the prior for our digit recognition example?

Model Parameters

• The problem with explicitly modeling P(X1,…,Xn|Y) is that there are usually way too many parameters:

– We’ll run out of space

– We’ll run out of time

– And we’ll need tons of training data (which is usually not available)

The Naïve Bayes Model

• The Naïve Bayes Assumption: Assume that all features are independent given the class label Y

• Equationally speaking:

• (We will discuss the validity of this assumption later)

Naïve Bayes Training

• Now that we’ve decided to use a Naïve Bayes classifier, we need to train it with some data:

MNIST Training Data

• Training in Naïve Bayes is easy:

– Estimate P(Y=v) as the fraction of records with Y=v

– Estimate P(Xi=u|Y=v) as the fraction of records with Y=v for which Xi=u

• In practice, some of these counts can be zero

• Fix this by adding “virtual” counts:

– This is called Smoothing

• For binary digits, training amounts to averaging all of the training fives together and all of the training sixes together.

Naïve Bayes Classification

Another Example of the Naïve Bayes Classifier

The weather data, with counts and probabilities

outlook temperature humidity windy play

yes no yes no yes no yes no yes no

sunny 2 3 hot 2 2 high 3 4 false 6 2 9 5

overcast 4 0 mild 4 2 normal 6 1 true 3 3

rainy 3 2 cool 3 1

sunny 2/9 3/5 hot 2/9 2/5 high 3/9 4/5 false 6/9 2/5 9/14 5/14

overcast 4/9 0/5 mild 4/9 2/5 normal 6/9 1/5 true 3/9 3/5

rainy 3/9 2/5 cool 3/9 1/5

A new day

sunny cool high true ?

• Likelihood of yes

• Likelihood of no

• Therefore, the prediction is No

0053.014

0206.014

The Naive Bayes Classifier for Data Sets with Numerical Attribute Values

• One common practice to handle numerical attribute values is to assume normal distributions for numerical attributes.

The numeric weather data with summary statistics

yes no yes no yes no yes no yes no

sunny 2 3 83 85 86 85 false 6 2 9 5

overcast 4 0 70 80 96 90 true 3 3

rainy 3 2 68 65 80 70

64 72 65 95

69 71 70 91

sunny 2/9 3/5 mean 73 74.6 mean 79.1 86.2 false 6/9 2/5 9/14 5/14

overcast 4/9 0/5 std

6.2 7.9 std

10.2 9.7 true 3/9 3/5

rainy 3/9 2/5

• Let x1, x2, …, xn be the values of a numerical attribute in the training data set.

• For examples,

• Likelihood of Yes =

• Likelihood of No =

000036.014

30221.00340.0

000136.014

3038.00291.0

0340.02.62

1Yes|66etemperatur

Outputting Probabilities

• What’s nice about Naïve Bayes (and generative models in general) is that it returns probabilities

– These probabilities can tell us how confident the algorithm is

– So… don’t throw away those probabilities!

• We defined a Bayes classifier but saw that it’s intractable to compute P(X1,…,Xn|Y)

• We then used the Naïve Bayes assumption – that everything is independent given the class label Y

• A natural question: is there some happy compromise where we only assume that some features are conditionally independent?

Conclusions

• Naïve Bayes is:

– Really easy to implement and often works well

– Often a good first thing to try

– Commonly used as a “punching bag” for smarter algorithms

Naïve Bayes Classificationweb.iitd.ac.in/~bspanda/BY.pdf · The Naive Bayes Classifier for Data...

Documents

Why Bayes for Clinical Trials? - Hbiostathbiostat.org/doc/bayes/why.pdf · 2019-09-22 · Why Bayes for Clinical Trials? Background Freq&Bayes Types of Probabilities Needed Probabilities

Bayes Factors 1 Running head: BAYES FACTORS Bayes Factor

Alleviating Naive Bayes Attribute Independence Assumption ... · Alleviating Naive Bayes Attribute Independence Assumption by Attribute Weighting Nayyar A. Zaidi NAYYAR.ZAIDI@MONASH

winprotocoldoc.blob.core.windows.net... · Web view2.623 Attribute msSFU30SearchContainer 263 2.624 Attribute msSFU30YpServers 264 2.625 Attribute msSPP-ConfigLicense 264 2.626 Attribute

Mathematical Statistics, Lecture 11 Bayes Procedures · Bayes Procedures Decision-Theoretic Framework Bayes Procedures and Reparametrization Reparametrization of Bayes Decision Problems

Alleviating Naive Bayes Attribute Independence …jmlr.org/papers/volume14/zaidi13a/zaidi13a.pdf · Alleviating Naive Bayes Attribute Independence Assumption by Attribute ... This

BAYES, ORACLE BAYES, AND EMPIRICAL BAYES By Bradley …Bayes, Oracle Bayes, and Empirical Bayes Bradley Efron Stanford University Abstract. This article concerns the Bayes and frequentist

Learning accurate and concise na¨ıve Bayes classiﬁers …honavar/Papers/KIS-Jun.pdfLearning accurate and concise na¨ıve Bayes classiﬁers from attribute value taxonomies and

Non-parametric Empirical Bayes and Compound Bayes

Spam Filtering with Naive Bayes â€“ Which Naive Bayes?

Classification: Naïve Bayes - University of Belgradeai.fon.bg.ac.rs/wp-content/uploads/2016/10/Naive-Bayes-Labs-2016.pdf · Naive Bayes classifier • Based on the Bayes rule •

NAME/ALIAS: ARCHETYP: AUGEN-/HAARFARBE: …€¦ · charakterdaten attribute zustandsmonitor kÖrperliche attribute geistige attribute besondere attribute besondere attribute konstitution:

External Interface(s) Application Guide EnergyPlus ... · External Interface(s) Application Guide ... In comparison to numerical methods of differential equations, ... the attribute

TEOREMA BAYES - rzabdulaziz.files.wordpress.com · TEOREMA BAYES . TEOREMA BAYES Definisi : •Oleh Reverend Thomas Bayes abad ke 18. •Dikembangkan secara luas dalam statistik inferensi

23: Naïve Bayes - Stanford Universityweb.stanford.edu/.../lectures/23_naive_bayes_blank.pdf21 “Brute Force Bayes” 24b_brute_force_bayes 32 Naïve Bayes Classifier 24c_naive_bayes

Spam Filtering with Naïve Bayes – Which Naïve Bayes?cobweb.cs.uga.edu/.../CSCI6900...Presentation2.pdf · Title: Spam Filtering with Naïve Bayes – Which Naïve Bayes? Author:

Naive Bayes Text Classification · Naive Bayes Text Classification Sandeep Avula asandeep@live.unc.edu. Outline Basic Probability and Notation Bayes Law and Naive Bayes Classification

Bayesian inference, Naïve Bayes model - Svetlana Lazebnikslazebni.cs.illinois.edu/fall16/lec13_bayesian_inference.pdf · Bayesian inference, Naïve Bayes model ... Bayes Rule •

Naive Bayes and Gaussian Bayes Classifier - cs.toronto.eduurtasun/courses/CSC411/tutorial4.pdf · Naive Bayes and Gaussian Bayes Classi er Mengye Ren mren@cs.toronto.edu October 18,

Naïve Bayes 𝑖 𝜶 - Kangwoncs.kangwon.ac.kr/.../2015_MachineLearning/07_naive_bayes.pdf · 2016. 6. 17. · Naïve Bayes •Bayes rule을적용하면모든데이터에대하여고려해야함