51
Instructor: Alan Ritter Probability Review and Naïve Bayes Some slides adapted from Dan Jurfasky and Brendan O’connor

Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Embed Size (px)

Citation preview

Page 1: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Instructor:AlanRitter

ProbabilityReviewandNaïveBayes

SomeslidesadaptedfromDanJurfasky andBrendanO’connor

Page 2: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

WhatisProbability?

• “Theprobabilitythecoinwilllandheadsis0.5”– Q:whatdoesthismean?

• 2Interpretations:– Frequentist (Repeatedtrials)• Ifweflipthecoinmanytimes…

– Bayesian• Webelievethereisequalchanceofheads/tails• Advantage:eventsthatdonothavelongtermfrequencies

Q:Whatistheprobability thepolaricecapswillmeltby2050?

Page 3: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

ProbabilityReview

X

x

P (X = x) = 1

P (A,B)

P (B)= P (A|B)

P (A|B)P (B) = P (A,B)

ConditionalProbability

ChainRule

Page 4: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

ProbabilityReview

P (A _B) = P (A) + P (B)� P (A ^B)

P (¬A) = 1� P (A)

Disjunction/Union:

Negation:

X

x

P (X = x, Y ) = P (Y )

Page 5: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

BayesRule

H D

P (D|H)GenerativeModelofHowHypothesisCausesData

P (H|D)BayesianInferece

Hypothesis(Unknown)

Data(ObservedEvidence)

P (H|D) =P (D|H)P (H)

P (D)

BayesRuletellsushowtofliptheconditionalReasonabouteffectstocausesUsefulifyouassumeagenerativemodel foryourdata

Page 6: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

BayesRule

P (H|D) =P (D|H)P (H)

P (D)

BayesRuletellsushowtofliptheconditionalReasonabouteffectstocausesUsefulifyouassumeagenerativemodel foryourdata

PriorLikelihood

NormalizerPosterior

Page 7: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

BayesRuleBayesRuletellsushowtofliptheconditionalReasonabouteffectstocausesUsefulifyouassumeagenerativemodel foryourdata

PriorLikelihood

NormalizerPosterior

P (H|D) =P (D|H)P (H)Ph P (D|H)P (H)

Page 8: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

BayesRuleBayesRuletellsushowtofliptheconditionalReasonabouteffectstocausesUsefulifyouassumeagenerativemodel foryourdata

PriorLikelihood

P (H|D) / P (D|H)P (H)

ProportionalTo(Doesn’tsumto1)Posterior

Page 9: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

BayesRuleExample

• Thereisadiseasethataffectsatinyfractionofthepopulation(0.01%)

• Symptomsincludeaheadacheandstiffneck– 99%ofpatientswiththediseasehavethesesymptoms

• 1%ofthegeneralpopulationhasthesesymptoms

• Q:assumeyouhavethesymptom,whatisyourprobabilityofhavingthedisease?

Page 10: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

TextClassification

Page 11: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

IsthisSpam?

Page 12: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

WhowrotewhichFederalistpapers?

• 1787-8:anonymousessaystrytoconvinceNewYorktoratifyU.SConstitution:Jay,Madison,Hamilton.

• Authorshipof12ofthelettersindispute• 1963:solvedbyMosteller andWallaceusingBayesianmethods

JamesMadison AlexanderHamilton

Page 13: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Whatisthesubjectofthisarticle?

• Antogonists andInhibitors

• BloodSupply• Chemistry• DrugTherapy• Embryology• Epidemiology• …

MeSH SubjectCategoryHierarchyMEDLINE Article

Page 14: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Positiveornegativemoviereview?

• unbelievablydisappointing• Fullofzanycharactersandrichlyappliedsatire,andsomegreatplottwists

• thisisthegreatestscrewballcomedyeverfilmed

• Itwaspathetic.Theworstpartaboutitwastheboxingscenes.

Page 15: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

TextClassification:definition

• Input:– adocumentd– afixedsetofclassesC= {c1,c2,…,cJ}

• Output:apredictedclassc∈ C

Page 16: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

ClassificationMethods:Hand-codedrules

• Rulesbasedoncombinationsofwordsorotherfeatures– spam:black-list-addressOR(“dollars”AND“havebeenselected”)

• Accuracycanbehigh– Ifrulescarefullyrefinedbyexpert

• Butbuildingandmaintainingtheserulesisexpensive

Page 17: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

ClassificationMethods:SupervisedMachineLearning

• Input:– adocumentd– afixedsetofclassesC= {c1,c2,…,cJ}– Atrainingsetofm hand-labeleddocuments(d1,c1),....,(dm,cm)

• Output:– alearnedclassifierγ:dà c

Page 18: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

ClassificationMethods:SupervisedMachineLearning

• Anykindofclassifier– Naïve Bayes– Logisticregression– Support-vectormachines– k-NearestNeighbors

– …

Page 19: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

NaïveBayesIntuition

• Simple(“naïve”)classificationmethodbasedonBayesrule

• Reliesonverysimplerepresentationofdocument– Bagofwords

Page 20: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There
Page 21: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Planning GUIGarbageCollection

Machine Learning NLP

parsertagtrainingtranslationlanguage...

learningtrainingalgorithmshrinkagenetwork...

garbagecollectionmemoryoptimizationregion...

Testdocument

parserlanguagelabeltranslation…

Bagofwordsfordocumentclassification

...planningtemporalreasoningplanlanguage...

?

Page 22: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Bayes’RuleAppliedtoDocumentsandClasses

•Foradocumentd andaclassc

P(c | d) = P(d | c)P(c)P(d)

Page 23: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Naïve BayesClassifier(I)

cMAP = argmaxc∈C

P(c | d)

= argmaxc∈C

P(d | c)P(c)P(d)

= argmaxc∈C

P(d | c)P(c)

MAPis“maximumaposteriori”=mostlikelyclass

BayesRule

Droppingthedenominator

Page 24: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Naïve BayesClassifier(II)

cMAP = argmaxc∈C

P(d | c)P(c)

= argmaxc∈C

P(x1, x2,…, xn | c)P(c)

Page 25: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Naïve BayesClassifier(IV)

Howoftendoesthisclassoccur?

cMAP = argmaxc∈C

P(x1, x2,…, xn | c)P(c)

O(|X|n•|C|) parameters

Wecanjustcounttherelativefrequenciesinacorpus

Couldonlybeestimatedifavery,verylargenumberof trainingexampleswasavailable.

Page 26: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

MultinomialNaïve BayesIndependenceAssumptionsP(x1, x2,…, xn | c)

• BagofWordsassumption:Assumepositiondoesn’tmatter

• ConditionalIndependence:AssumethefeatureprobabilitiesP(xi|cj)areindependentgiventheclassc.

Page 27: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

MultinomialNaïve BayesClassifier

cMAP = argmaxc∈C

P(x1, x2,…, xn | c)P(c)

cNB = argmaxc∈C

P(cj ) P(x | c)x∈X∏

Page 28: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

ApplyingMultinomialNaiveBayesClassifierstoTextClassification

cNB = argmaxc j∈C

P(cj ) P(xi | cj )i∈positions∏

positions ← allwordpositionsintestdocument

Page 29: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

LearningtheMultinomialNaïve BayesModel

• Firstattempt:maximumlikelihoodestimates– simplyusethefrequenciesinthedata

P̂(wi | cj ) =count(wi,cj )count(w,cj )

w∈V∑

P̂(cj ) =doccount(C = cj )

Ndoc

Page 30: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

• Createmega-documentfortopicj byconcatenatingalldocsinthistopic– Usefrequencyofw inmega-document

Parameterestimation

fractionoftimeswordwi appearsamongallwordsindocumentsoftopiccj

P̂(wi | cj ) =count(wi,cj )count(w,cj )

w∈V∑

Page 31: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

ProblemwithMaximumLikelihood• Whatifwehaveseennotrainingdocumentswiththewordfantastic andclassifiedinthetopicpositive (thumbs-up)?

• Zeroprobabilitiescannotbeconditionedaway,nomattertheotherevidence!

cMAP = argmaxc P̂(c) P̂(xi | c)i∏

Page 32: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Laplace(add-1)smoothingforNaïve Bayes

=count(wi,c)+1

count(w,cw∈V∑ )

#

$%%

&

'(( + V

P̂(wi | c) =count(wi,c)count(w,c)( )

w∈V∑

Page 33: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

MultinomialNaïveBayes:Learning

• CalculateP(cj) terms– Foreachcj inC dodocsj← alldocswithclass=cj

P(cj )←| docsj |

| total # documents|

Page 34: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

MultinomialNaïveBayes:Learning

P(wk | cj )←nk +α

n+α |Vocabulary |

CalculateP(wk | cj) terms• Textj ← singledoccontainingalldocsj• Foreachwordwk inVocabulary

nk← #ofoccurrencesofwk inTextj

Fromtrainingcorpus,extractVocabulary

Page 35: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Exercise

Page 36: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

NaïveBayesClassification:PracticalIssues

• Multiplyingtogetherlotsofprobabilities• Probabilitiesarenumbersbetween0and1• Q:Whatcouldgowronghere?

cMAP = argmaxcP (c|x1, . . . , xn)

= argmaxcP (x1, . . . , xn|c)P (c)

= argmaxcP (c)

nY

i=1

P (xi|c)

Page 37: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Workingwithprobabilitiesinlogspace

Page 38: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

LogIdentities(review)

log(a⇥ b) = log(a) + log(b)

log(

a

b) = log(a)� log(b)

log(an) = n log(a)

Page 39: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

NaïveBayeswithLogProbabilities

cMAP = argmaxcP (c|x1, . . . , xn)

= argmaxcP (c)

nY

i=1

P (xi|c)

= argmaxc log

P (c)

nY

i=1

P (xi|c)!

= argmaxc logP (c) +

nX

i=1

logP (xi|c)

Page 40: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

cMAP = argmaxc logP (c) +

nX

i=1

logP (xi|c)

NaïveBayeswithLogProbabilities

• Q:Whydon’twehavetoworryaboutfloatingpointunderflowanymore?

Page 41: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Whatifwewanttocalculateposteriorlog-probabilities?

P (c|x1, . . . , xn) =P (c)

Qni=1 P (xi|c)P

c0 P (c0)Qn

i=1 P (xi|c0)

logP (c|x1, . . . , xn) = log

P (c)

Qni=1 P (xi|c)P

c0 P (c

0)

Qni=1 P (xi|c0)

= logP (c) +

nX

i=1

P (xi|c)� log

"X

c0

P (c

0)

nY

i=1

P (xi|c0)#

Page 42: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

LogExp SumTrick:motivation

• Wehave:abunchoflogprobabilities.– log(p1),log(p2),log(p3),…log(pn)

• Wewant:log(p1+p2+p3+…pn)• Wecouldconvertbackfromlogspace,sumthentakethelog.– Iftheprobabilitiesareverysmall,thiswillresultinfloatingpointunderflow

Page 43: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

LogExp SumTrick:

log[

X

i

exp(x

i

)] = x

max

+ log[

X

i

exp(x

i

� x

max

)]

Page 44: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Anotherissue:Smoothing

ˆP (wi|c) =count(w, c) + 1P

w02V count(w’,c) + |V |

Page 45: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Anotherissue:Smoothing

ˆP (wi|c) =count(w, c) + ↵P

w02V count(w0, c) + ↵|V |

Alphadoesn’tnecessarilyneedtobe1

(hyperparmeter)

Page 46: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Anotherissue:Smoothing

ˆP (wi|c) =count(w, c) + ↵P

w02V count(w0, c) + ↵|V |

Canthinkofalphaasa“pseudocount”.Imaginarynumberoftimesthiswordhasbeenseen.

Page 47: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Anotherissue:Smoothing

ˆP (wi|c) =count(w, c) + ↵P

w02V count(w0, c) + ↵|V |

• Q:Whatifalpha=0?• Q:whatifalpha=0.000001?• Q:whathappensasalphagetsverylarge?

Page 48: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Overfitting

• Modelcarestoomuchaboutthetrainingdata• Howtocheckforoverfitting?– Trainingvs.testaccuracy

• Pseudocount parametercombatsoverfitting

Page 49: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

Q:howtopickAlpha?

• Splittrainvs.Test• Tryabunchofdifferentvalues

• Pickthevalueofalphathatperformsbest

• Whatvaluestotry?Gridsearch– (10^-2,10^-1,...,10^2)

accuracy

Usethisone

Page 50: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

DataSplitting

• Trainvs.Test

• Better:– Train(usedforfittingmodelparameters)– Dev(usedfortuninghyperparameters)– Test(reserveforfinalevaluation)

• Cross-validation

Page 51: Probability Review and Naïve Bayes - GitHub Pagesaritter.github.io/courses/5525_slides/probability_nb.pdf · Probability Review and Naïve Bayes ... Bayes Rule Example • There

FeatureEngineering

• Whatisyourword/featurerepresentation– Tokenizationrules:splittingonwhitespace?– Uppercaseisthesameaslowercase?– Numbers?– Punctuation?– Stemming?