Bayes Rule
• How is this rule derived?
• Using Bayes rule for probabilistic inference:
  – P(Cause | Evidence): diagnostic probability
  – P(Evidence | Cause): causal probability
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Rev. Thomas Bayes (1702-1761)

$$P(\text{Cause} \mid \text{Evidence}) = \frac{P(\text{Evidence} \mid \text{Cause})\,P(\text{Cause})}{P(\text{Evidence})}$$
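To answer the question above: the rule is a one-line consequence of the product rule. Factoring the joint probability two ways gives $P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$; dividing both sides by $P(B)$ yields Bayes rule.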
Bayesian decision theory
• Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e
  – Partially observable, stochastic, episodic environment
  – Examples: X = {spam, not spam}, e = email message; X = {zebra, giraffe, hippo}, e = image features
  – The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise
  – What is the agent's optimal estimate of the value of X?
• Maximum a posteriori (MAP) decision: the value of X that minimizes the expected loss is the one with the greatest posterior probability P(X = x | e)
MAP decision
• X = x: value of query variable
• E = e: evidence

$$x^* = \arg\max_x P(x \mid e) = \arg\max_x \frac{P(e \mid x)\,P(x)}{P(e)} = \arg\max_x P(e \mid x)\,P(x)$$

$$\underbrace{P(x \mid e)}_{\text{posterior}} \propto \underbrace{P(e \mid x)}_{\text{likelihood}}\,\underbrace{P(x)}_{\text{prior}}$$

• Maximum likelihood (ML) decision:

$$x^* = \arg\max_x P(e \mid x)$$
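As a concrete illustration, here is a minimal sketch in Python; the priors and likelihoods are made-up numbers for a hypothetical two-class problem, not values from the slides:

```python
# Hypothetical distributions for X = {spam, not spam} given one observed e.
priors = {"spam": 0.2, "not spam": 0.8}        # P(x), illustrative only
likelihoods = {"spam": 0.7, "not spam": 0.3}   # P(e | x), illustrative only

# MAP decision: maximize P(e | x) P(x); P(e) is constant so it can be dropped.
map_decision = max(priors, key=lambda x: likelihoods[x] * priors[x])

# ML decision: maximize P(e | x) alone (equivalent to MAP with a uniform prior).
ml_decision = max(priors, key=lambda x: likelihoods[x])

print(map_decision)  # "not spam": 0.7 * 0.2 = 0.14 < 0.3 * 0.8 = 0.24
print(ml_decision)   # "spam":     0.7 > 0.3
```

Note that the two rules disagree here: the strong prior toward "not spam" overrides the higher likelihood of "spam".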
Example: Spam Filter
• We have X = {spam, ¬spam}, E = email message.
• What should be our decision criterion?
  – Compute P(spam | message) and P(¬spam | message), and assign the message to the class with the higher posterior probability
$$P(\text{spam} \mid \text{message}) \propto P(\text{message} \mid \text{spam})\,P(\text{spam})$$
$$P(\neg\text{spam} \mid \text{message}) \propto P(\text{message} \mid \neg\text{spam})\,P(\neg\text{spam})$$
Example: Spam Filter
• We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam)
• How do we represent the message?
  – Bag of words model:
    • The order of the words is not important
    • Each word is conditionally independent of the others given the message class
• If the message consists of words (w1, …, wn), how do we compute P(w1, …, wn | spam)?
  – Naïve Bayes assumption: each word is conditionally independent of the others given the message class

$$P(\text{message} \mid \text{spam}) = P(w_1, \ldots, w_n \mid \text{spam}) = \prod_{i=1}^{n} P(w_i \mid \text{spam})$$
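A minimal sketch of this factorization in Python (the per-word probabilities are assumed to be given here; estimating them is covered below):

```python
def message_likelihood(words, word_probs):
    """P(w1, ..., wn | class) under the naive Bayes assumption:
    a product of per-word probabilities P(wi | class)."""
    p = 1.0
    for w in words:
        p *= word_probs[w]  # conditional independence given the class
    return p
```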
Example: Spam Filter
• Our filter will classify the message as spam if

$$P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam}) > P(\neg\text{spam}) \prod_{i=1}^{n} P(w_i \mid \neg\text{spam})$$

• In practice, likelihoods are pretty small numbers, so we need to take logs to avoid underflow:

$$\log P(\text{spam}) + \sum_{i=1}^{n} \log P(w_i \mid \text{spam}) > \log P(\neg\text{spam}) + \sum_{i=1}^{n} \log P(w_i \mid \neg\text{spam})$$

• Model parameters:
  – Priors P(spam), P(¬spam)
  – Likelihoods P(wi | spam), P(wi | ¬spam)
• These parameters need to be learned from a training set (a representative sample of email messages marked with their classes)
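A sketch of the resulting decision rule in log space; the parameter layout (a prior plus a word-probability table per class) is illustrative, not prescribed by the slides:

```python
import math

def log_score(words, prior, word_probs):
    """log P(class) + sum_i log P(wi | class); sums of logs avoid the
    underflow caused by multiplying many small likelihoods."""
    return math.log(prior) + sum(math.log(word_probs[w]) for w in words)

def classify(words, params):
    """Return the class with the higher log posterior score."""
    return max(params, key=lambda c: log_score(words, *params[c]))

# Illustrative usage with made-up parameters:
params = {
    "spam":  (0.4, {"free": 0.05,  "meeting": 0.001}),
    "¬spam": (0.6, {"free": 0.005, "meeting": 0.01}),
}
print(classify(["free", "free"], params))  # -> "spam"
```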
Parameter estimation
• Model parameters:
  – Priors P(spam), P(¬spam)
  – Likelihoods P(wi | spam), P(wi | ¬spam)
• Estimation by empirical word frequencies in the training set:

$$P(w_i \mid \text{spam}) = \frac{\text{\# of occurrences of } w_i \text{ in spam messages}}{\text{total \# of words in spam messages}}$$

  – This happens to be the parameter estimate that maximizes the likelihood of the training data:

$$\prod_{d=1}^{D} \prod_{i=1}^{n_d} P(w_{d,i} \mid \text{class}_d)$$

(d: index of training document, i: index of a word)
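A sketch of the empirical-frequency estimate; the training-set format (a list of (words, label) pairs) is an assumption for illustration:

```python
from collections import Counter

def estimate_params(training_set):
    """Maximum-likelihood estimates: priors from document counts,
    word likelihoods from word frequencies within each class."""
    doc_counts = Counter()
    word_counts = {}  # class -> Counter of word occurrences
    for words, label in training_set:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
    total_docs = sum(doc_counts.values())
    priors = {c: doc_counts[c] / total_docs for c in doc_counts}
    likelihoods = {}
    for c, counts in word_counts.items():
        total_words = sum(counts.values())  # total # of words in class c
        likelihoods[c] = {w: n / total_words for w, n in counts.items()}
    return priors, likelihoods
```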
• Parameter smoothing: dealing with words that were never seen or seen too few times
  – Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did
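A sketch of the smoothed estimate; `vocab` (the full vocabulary) is an assumed input. Adding one to every count gives unseen words a small nonzero probability instead of zero:

```python
def smoothed_likelihoods(counts, vocab):
    """Laplacian (add-one) smoothing: count every vocabulary word once
    more than observed, so P(wi | class) is never zero."""
    total = sum(counts.values()) + len(vocab)  # +1 for each vocab word
    return {w: (counts.get(w, 0) + 1) / total for w in vocab}
```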
Bayesian decision making: Summary
• Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E
• Inference problem: given some evidence E = e, what is P(X | e)?
• Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1,e1), …, (xn,en)}
Bag-of-word models for images
Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
1. Extract image features
2. Learn “visual vocabulary”
3. Map image features to visual words
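A minimal sketch of steps 2 and 3, assuming scikit-learn is available and local feature descriptors (step 1) have already been extracted; the array shapes and cluster count are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for step 1: N local descriptors (e.g. 128-d, SIFT-like)
# pooled across the training images.
descriptors = np.random.rand(5000, 128)

# Step 2: learn the "visual vocabulary" by clustering descriptors;
# each cluster center acts as one visual word.
kmeans = KMeans(n_clusters=200, n_init=10).fit(descriptors)

# Step 3: map an image's features to visual words and build a
# bag-of-words histogram, analogous to word counts in the spam filter.
def bow_histogram(image_descriptors):
    words = kmeans.predict(image_descriptors)
    return np.bincount(words, minlength=kmeans.n_clusters)
```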