Roadmap
- Naïve Bayes
  - Multivariate Bernoulli event model (recap)
  - Multinomial event model
  - Analysis
- HW#3

Naïve Bayes Models in Detail (McCallum & Nigam, 1998)
Alternate models for Naïve Bayes text classification:
- Multivariate Bernoulli event model: a binary independence model; features are treated as binary, so counts are ignored
- Multinomial event model: a unigram language model

Multivariate Bernoulli Event Text Model
- Each document is the result of |V| independent Bernoulli trials: for each word in the vocabulary, does the word appear in the document?
- From the general Naïve Bayes perspective, each word corresponds to two values, wt and ¬wt (the word appears or it does not). In each document, either wt or ¬wt appears, so a document always has |V| elements.

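To make the |V|-trials view concrete, here is a minimal sketch of the document likelihood under this model (not from the slides; the vocabulary and probability values are illustrative placeholders): every vocabulary word contributes a factor, whether or not it appears in the document.

```python
# Sketch of the multivariate Bernoulli document likelihood.
# Vocabulary and probabilities are illustrative placeholders.

vocab = ["a", "b", "c"]
p_word_given_c = {"a": 0.5, "b": 0.3, "c": 0.2}  # P(w_t | c), one Bernoulli per word

def bernoulli_doc_likelihood(doc_words, p_word):
    """P(d | c): every vocabulary word contributes p if present, (1 - p) if absent."""
    present = set(doc_words)
    likelihood = 1.0
    for w in vocab:
        likelihood *= p_word[w] if w in present else (1.0 - p_word[w])
    return likelihood

print(bernoulli_doc_likelihood(["a", "b"], p_word_given_c))  # 0.5 * 0.3 * (1 - 0.2) = 0.12
```
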
Multinomial Distribution
- Trial: select a word according to its probability; possible outcomes: {w1, w2, …, w|V|}
- A document is viewed as the result of one trial for each position
- P(word = wi) = pi, with Σi pi = 1
- Letting Xi be the number of trials that select wi, with n trials in total:
  P(X1 = x1, X2 = x2, …, X|V| = x|V|) = n! / (x1! · … · x|V|!) · p1^x1 · … · p|V|^x|V|

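As a sanity check on the formula, a minimal sketch (the probability values are illustrative):

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X1=x1, ..., X|V|=x|V|) = n!/(x1! ... x|V|!) * p1^x1 * ... * p|V|^x|V|."""
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)
    return coeff * prod(p ** x for p, x in zip(probs, counts))

# Three outcomes with illustrative probabilities; one w1 and one w2 in n = 2 trials:
print(multinomial_pmf([1, 1, 0], [0.5, 0.3, 0.2]))  # 2 * 0.5 * 0.3 = 0.3
```
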
Example (due to F. Xia)
- Consider a vocabulary V with only three words: a, b, c
- Document di contains only 2 word instances
- For each position: P(w=a) = p1, P(w=b) = p2, P(w=c) = p3
- What is the probability that we see 'a' once and 'b' once in di?

Example (cont'd)
- How many possible sequences? 3^2 = 9: aa, ab, ac, ba, bb, bc, ca, cb, cc
- How many sequences with one 'a' and one 'b'? n!/(x1! · … · x|V|!) = 2!/(1! · 1! · 0!) = 2
- Probability of the sequence 'ab': p1 · p2
- Probability of the sequence 'ba': p1 · p2
- So the probability of seeing 'a' once and 'b' once is 2 · p1 · p2

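The counting argument can be verified by brute force; a minimal sketch that enumerates all nine sequences (the probability values are illustrative):

```python
from itertools import product

p = {"a": 0.5, "b": 0.3, "c": 0.2}  # illustrative values for p1, p2, p3

# Sum the probabilities of the length-2 sequences containing one 'a' and one 'b'
# (exactly 'ab' and 'ba' out of the 3^2 = 9 possibilities).
total = sum(
    p[w1] * p[w2]
    for w1, w2 in product("abc", repeat=2)
    if (w1, w2).count("a") == 1 and (w1, w2).count("b") == 1
)
print(total)  # 2 * p1 * p2 = 0.3
```
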
Multinomial Event Model
- A document is a sequence of word events drawn from vocabulary V
- Assume document length is independent of class
- Assume (Naïve Bayes) that words are independent of context
- Define Nit = number of occurrences of wt in document di
- Then, under the multinomial event model:
  P(di | cj) = P(|di|) · |di|! · Πt ( P(wt | cj)^Nit / Nit! )

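A minimal sketch of this likelihood (the class-conditional probabilities are illustrative placeholders; the class-independent length prior P(|di|) is dropped):

```python
from collections import Counter
from math import factorial

def multinomial_doc_likelihood(doc_words, p_word_given_c):
    """P(d_i | c_j) is proportional to |d_i|! * prod_t P(w_t | c_j)^N_it / N_it!"""
    counts = Counter(doc_words)              # N_it for each word type w_t
    likelihood = factorial(len(doc_words))   # |d_i|!
    for w, n in counts.items():
        likelihood *= p_word_given_c[w] ** n / factorial(n)
    return likelihood

# Illustrative: same three-word vocabulary as the running example.
print(multinomial_doc_likelihood(["a", "b"], {"a": 0.5, "b": 0.3, "c": 0.2}))  # 0.3
```
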
Training
- P(cj | di) = 1 if document di is of class cj, and 0 otherwise
- So the word probabilities are estimated from class-specific counts (with add-one smoothing):
  P(wt | cj) = (1 + Σi Nit · P(cj | di)) / (|V| + Σs Σi Nis · P(cj | di))
- Contrast this with the multivariate Bernoulli model, which counts documents containing wt rather than occurrences of wt

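A minimal sketch of this training step, with the add-one smoothing from the estimate above (the corpus and labels are illustrative):

```python
from collections import Counter

def train_multinomial(docs, labels, vocab):
    """P(w_t | c_j) = (1 + # occurrences of w_t in class j) / (|V| + # words in class j)."""
    p = {}
    for c in set(labels):
        counts = Counter(w for d, y in zip(docs, labels) if y == c for w in d)
        total = sum(counts.values())
        p[c] = {w: (1 + counts[w]) / (len(vocab) + total) for w in vocab}
    return p

docs = [["a", "b"], ["b", "c"], ["a", "a"]]   # illustrative toy corpus
labels = ["pos", "neg", "pos"]
print(train_multinomial(docs, labels, ["a", "b", "c"])["pos"]["a"])  # (1 + 3) / (3 + 4)
```
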
Two Naïve Bayes Models
- Multivariate Bernoulli event model: models binary presence/absence of word features
- Multinomial event model: models counts of word features (unigram model)
- In experiments on a range of text classification corpora, the multinomial model usually outperforms the multivariate Bernoulli model (McCallum & Nigam, 1998)

Thinking about Performance
- Naïve Bayes rests on a conditional independence assumption that is clearly unrealistic, yet performance is often good. Why? Classification is based on the sign of the score, not its magnitude, and the direction of classification is usually right.
- Multivariate Bernoulli vs. multinomial: why does the multinomial model perform better? It captures additional information: presence/absence plus frequency.
- What if we wanted to include other types of features? In the multivariate model, a new feature is just another Bernoulli trial; the multinomial model can't mix distributions.

Model Comparison
- Features: multivariate Bernoulli uses binary presence/absence; multinomial event uses # of occurrences
- Trial: each word in the vocabulary vs. each position in the document
- P(c): fraction of training documents in class c (same estimate for both models)
- P(w|c): fraction of class-c documents that contain w vs. fraction of word occurrences in class-c documents that are w
- Testing: both choose the class maximizing P(c) · Π P(w|c); the Bernoulli product runs over the whole vocabulary (absent words contribute 1 − P(w|c)), while the multinomial product runs over document positions

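To make the Testing row concrete, a minimal sketch of the two decision rules side by side (the priors and probability tables are illustrative placeholders, not trained values):

```python
# Sketch: the two testing rules from the comparison above, side by side.
vocab = ["a", "b", "c"]
prior = {"pos": 0.5, "neg": 0.5}
p_bern = {"pos": {"a": 0.6, "b": 0.4, "c": 0.1}, "neg": {"a": 0.2, "b": 0.3, "c": 0.7}}
p_mult = {"pos": {"a": 0.5, "b": 0.4, "c": 0.1}, "neg": {"a": 0.2, "b": 0.2, "c": 0.6}}

def classify_bernoulli(doc):
    # Product runs over ALL vocabulary words; absent words contribute (1 - p).
    def score(c):
        s = prior[c]
        for w in vocab:
            s *= p_bern[c][w] if w in doc else 1 - p_bern[c][w]
        return s
    return max(prior, key=score)

def classify_multinomial(doc):
    # Product runs over document positions only.
    def score(c):
        s = prior[c]
        for w in doc:
            s *= p_mult[c][w]
        return s
    return max(prior, key=score)

print(classify_bernoulli({"a", "b"}), classify_multinomial(["a", "b", "a"]))
```
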
Naïve Bayes: Strengths
- Conceptual simplicity
- Training efficiency
- Testing efficiency
- Scales fairly well to large data
- Performs multiclass classification
- Can provide n-best outputs

Naïve Bayes: Weaknesses
- Weak theoretical foundation: a ragingly inaccurate independence assumption
- Decent accuracy, but outperformed by more sophisticated classifiers

HW#3: Naïve Bayes Classification
- Experiment with the Mallet Naïve Bayes learner
- Implement the multivariate Bernoulli event model
- Implement the multinomial event model and compare with binary variables
- Analyze the results

Notes
- Use add-delta smoothing (vs. add-one)
- Beware numerical underflow: log probabilities are your friend, and logs also convert exponents into multipliers
- Look out for repeated computation: precompute normalization denominators, e.g. for the multinomial P(w|c), compute the denominator once for each class c (see the sketch below)

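A minimal sketch that combines these notes: add-delta smoothing, log probabilities, and a normalization denominator computed once per class (the names and the delta value are illustrative):

```python
from collections import Counter
from math import log

def train_log_probs(docs, labels, vocab, delta=0.5):
    """Precompute log P(w|c) with add-delta smoothing, one denominator per class."""
    log_p = {}
    for c in set(labels):
        counts = Counter(w for d, y in zip(docs, labels) if y == c for w in d)
        denom = sum(counts.values()) + delta * len(vocab)  # computed once per class c
        log_p[c] = {w: log((counts[w] + delta) / denom) for w in vocab}
    return log_p

def score(doc, log_p_c, log_prior):
    # Summing logs avoids underflow, and exponents (repeated words)
    # become integer multipliers on the log probabilities.
    counts = Counter(doc)
    return log_prior + sum(n * log_p_c[w] for w, n in counts.items())
```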