Bayes Rule
• How is this rule derived?
• Using Bayes rule for probabilistic inference:
  – P(Cause | Evidence): diagnostic probability
  – P(Evidence | Cause): causal probability
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Rev. Thomas Bayes (1702-1761)

$$P(\text{Cause} \mid \text{Evidence}) = \frac{P(\text{Evidence} \mid \text{Cause})\,P(\text{Cause})}{P(\text{Evidence})}$$
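To answer the question above: the rule is a one-line consequence of the product rule. Factoring the joint probability two ways gives $P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$; dividing both sides by $P(B)$ yields Bayes rule.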
Bayesian decision theory
• Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e
  – Partially observable, stochastic, episodic environment
  – Examples: X = {spam, not spam}, e = email message; X = {zebra, giraffe, hippo}, e = image features
  – The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise
  – What is the agent's optimal estimate of the value of X?
• Maximum a posteriori (MAP) decision: the value of X that minimizes the expected loss is the one with the greatest posterior probability P(X = x | e)
MAP decision
• X = x: value of query variable
• E = e: evidence

$$x^* = \arg\max_x P(x \mid e) = \arg\max_x \frac{P(e \mid x)\,P(x)}{P(e)} = \arg\max_x P(e \mid x)\,P(x)$$

$$\underbrace{P(x \mid e)}_{\text{posterior}} \propto \underbrace{P(e \mid x)}_{\text{likelihood}}\,\underbrace{P(x)}_{\text{prior}}$$

• Maximum likelihood (ML) decision:

$$x^* = \arg\max_x P(e \mid x)$$
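As a concrete illustration, here is a minimal sketch in Python; the priors and likelihoods are made-up numbers for a hypothetical two-class problem, not values from the slides:

```python
# Hypothetical distributions for X = {spam, not spam} given one observed e.
priors = {"spam": 0.2, "not spam": 0.8}        # P(x), illustrative only
likelihoods = {"spam": 0.7, "not spam": 0.3}   # P(e | x), illustrative only

# MAP decision: maximize P(e | x) P(x); P(e) is constant so it can be dropped.
map_decision = max(priors, key=lambda x: likelihoods[x] * priors[x])

# ML decision: maximize P(e | x) alone (equivalent to MAP with a uniform prior).
ml_decision = max(priors, key=lambda x: likelihoods[x])

print(map_decision)  # "not spam": 0.7 * 0.2 = 0.14 < 0.3 * 0.8 = 0.24
print(ml_decision)   # "spam":     0.7 > 0.3
```

Note that the two rules disagree here: the strong prior toward "not spam" overrides the higher likelihood of "spam".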
Example: Spam Filter
• We have X = {spam, ¬spam}, E = email message.
• What should be our decision criterion?
  – Compute P(spam | message) and P(¬spam | message), and assign the message to the class with the higher posterior probability
$$P(\text{spam} \mid \text{message}) \propto P(\text{message} \mid \text{spam})\,P(\text{spam})$$
$$P(\neg\text{spam} \mid \text{message}) \propto P(\text{message} \mid \neg\text{spam})\,P(\neg\text{spam})$$
Example: Spam Filter
• We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam)
• How do we represent the message?
  – Bag of words model:
    • The order of the words is not important
    • Each word is conditionally independent of the others given the message class
• If the message consists of words (w1, …, wn), how do we compute P(w1, …, wn | spam)?
  – Naïve Bayes assumption: each word is conditionally independent of the others given the message class

$$P(\text{message} \mid \text{spam}) = P(w_1, \ldots, w_n \mid \text{spam}) = \prod_{i=1}^{n} P(w_i \mid \text{spam})$$
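A minimal sketch of this factorization in Python (the per-word probabilities are assumed to be given here; estimating them is covered below):

```python
def message_likelihood(words, word_probs):
    """P(w1, ..., wn | class) under the naive Bayes assumption:
    a product of per-word probabilities P(wi | class)."""
    p = 1.0
    for w in words:
        p *= word_probs[w]  # conditional independence given the class
    return p
```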
Example: Spam Filter
• Our filter will classify the message as spam if

$$P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam}) > P(\neg\text{spam}) \prod_{i=1}^{n} P(w_i \mid \neg\text{spam})$$

• In practice, likelihoods are pretty small numbers, so we need to take logs to avoid underflow:

$$\log P(\text{spam}) + \sum_{i=1}^{n} \log P(w_i \mid \text{spam}) > \log P(\neg\text{spam}) + \sum_{i=1}^{n} \log P(w_i \mid \neg\text{spam})$$

• Model parameters:
  – Priors P(spam), P(¬spam)
  – Likelihoods P(wi | spam), P(wi | ¬spam)
• These parameters need to be learned from a training set (a representative sample of email messages marked with their classes)
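A sketch of the resulting decision rule in log space; the parameter layout (a prior plus a word-probability table per class) is illustrative, not prescribed by the slides:

```python
import math

def log_score(words, prior, word_probs):
    """log P(class) + sum_i log P(wi | class); sums of logs avoid the
    underflow caused by multiplying many small likelihoods."""
    return math.log(prior) + sum(math.log(word_probs[w]) for w in words)

def classify(words, params):
    """Return the class with the higher log posterior score."""
    return max(params, key=lambda c: log_score(words, *params[c]))

# Illustrative usage with made-up parameters:
params = {
    "spam":  (0.4, {"free": 0.05,  "meeting": 0.001}),
    "¬spam": (0.6, {"free": 0.005, "meeting": 0.01}),
}
print(classify(["free", "free"], params))  # -> "spam"
```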
Parameter estimation
• Model parameters:
  – Priors P(spam), P(¬spam)
  – Likelihoods P(wi | spam), P(wi | ¬spam)
• Estimation by empirical word frequencies in the training set:

$$P(w_i \mid \text{spam}) = \frac{\text{\# of occurrences of } w_i \text{ in spam messages}}{\text{total \# of words in spam messages}}$$

  – This happens to be the parameter estimate that maximizes the likelihood of the training data:

$$\prod_{d=1}^{D} \prod_{i=1}^{n_d} P(w_{d,i} \mid \text{class}_d)$$

(d: index of training document, i: index of a word)
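A sketch of the empirical-frequency estimate; the training-set format (a list of (words, label) pairs) is an assumption for illustration:

```python
from collections import Counter

def estimate_params(training_set):
    """Maximum-likelihood estimates: priors from document counts,
    word likelihoods from word frequencies within each class."""
    doc_counts = Counter()
    word_counts = {}  # class -> Counter of word occurrences
    for words, label in training_set:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
    total_docs = sum(doc_counts.values())
    priors = {c: doc_counts[c] / total_docs for c in doc_counts}
    likelihoods = {}
    for c, counts in word_counts.items():
        total_words = sum(counts.values())  # total # of words in class c
        likelihoods[c] = {w: n / total_words for w, n in counts.items()}
    return priors, likelihoods
```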
• Parameter smoothing: dealing with words that were never seen or seen too few times
  – Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did
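A sketch of the smoothed estimate; `vocab` (the full vocabulary) is an assumed input. Adding one to every count gives unseen words a small nonzero probability instead of zero:

```python
def smoothed_likelihoods(counts, vocab):
    """Laplacian (add-one) smoothing: count every vocabulary word once
    more than observed, so P(wi | class) is never zero."""
    total = sum(counts.values()) + len(vocab)  # +1 for each vocab word
    return {w: (counts.get(w, 0) + 1) / total for w in vocab}
```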
Bayesian decision making: Summary
• Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E
• Inference problem: given some evidence E = e, what is P(X | e)?
• Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1,e1), …, (xn,en)}
Bag-of-word models for images
Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
1. Extract image features
2. Learn “visual vocabulary”
3. Map image features to visual words
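A minimal sketch of steps 2 and 3, assuming scikit-learn is available and local feature descriptors (step 1) have already been extracted; the array shapes and cluster count are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for step 1: N local descriptors (e.g. 128-d, SIFT-like)
# pooled across the training images.
descriptors = np.random.rand(5000, 128)

# Step 2: learn the "visual vocabulary" by clustering descriptors;
# each cluster center acts as one visual word.
kmeans = KMeans(n_clusters=200, n_init=10).fit(descriptors)

# Step 3: map an image's features to visual words and build a
# bag-of-words histogram, analogous to word counts in the spam filter.
def bow_histogram(image_descriptors):
    words = kmeans.predict(image_descriptors)
    return np.bincount(words, minlength=kmeans.n_clusters)
```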