How is Bayesian Statistics Different?

by Wayne Tai Lee

Goal

● Clarify the difference between “classical” and Bayesian statistics

● Lay out the pros and cons of each “attitude”

One sentence definition

Bayesian statistics is a mathematical framework to update beliefs as you observe more data.

Bayesian Update in Movies

Recall movies where a female character realizes her period is late?

Movie cliché: Am I pregnant?

● What did I do in the past month?

– Forms a prior belief of whether I am pregnant

● The missing period

– Data!

● Belief is updated as more data is observed!

Bayesian terminology

● Prior: your belief about pregnancy before seeing new data

● Data: missing period

● Posterior: your belief that is updated after seeing the data

How do we formalize this update?

● Pregnant is an uncertain event with two outcomes: Yes or No

● “Days delayed of period” is a data point

– If (Pregnant = Yes), delay ~ 30*9 days (about nine months)

– If (Pregnant = No), it might come sooner

Mathematical framework

● “Pregnant” is a random variable:

– P(Pregnant = Yes) = X

– P(Pregnant = No) = (1 - X)

● “Days delayed of period” is another random variable!

– P(days delay >= 7 days | Pregnant) = 1

– P(days delay >= 7 days | Not Pregnant) = Y

Simplify

● Start with the objective:

Am I pregnant? i.e. P(Pregnant | Data)?

● Note that all the numbers we know are of the form P( **** | Pregnant)

Conditional Probability!

P(Pregnant | Data)

= P(Data | Pregnant) P(Pregnant) / P(Data)

Immediate implication:

● If your prior says you cannot be pregnant, your belief cannot be changed!

“Bayes Rule”

P(Pregnant | Data)

= P(Data | Pregnant) P(Pregnant) / P(Data)

= P(Data | Pregnant) P(Pregnant) / [ P(Data | Pregnant) P(Pregnant) + P(Data | Not Pregnant) P(Not Pregnant) ]

Why add more numbers?

P(Data) was hard to compute, so chop it into pieces we know!
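
The “chopping” step is the law of total probability: Pregnant is either Yes or No, so P(Data) splits into exactly those two cases. Written out with the same symbols as above:

```latex
\begin{align*}
P(\text{Data})
  &= P(\text{Data}, \text{Pregnant}) + P(\text{Data}, \text{Not Pregnant}) \\
  &= P(\text{Data} \mid \text{Pregnant})\,P(\text{Pregnant})
   + P(\text{Data} \mid \text{Not Pregnant})\,P(\text{Not Pregnant})
\end{align*}
```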

P(Data): Big Issue for Bayesians

● Pregnant is binary which made this realllllly easy

● In general, a lot of “tricks” are trying to

– solve for P(Data)

  ● Belief propagation in graphical models

– get around it

  ● Sampling: MCMC

  ● Approximation: Variational Bayes
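
As a rough illustration of “getting around” P(Data) by sampling, here is a minimal random-walk Metropolis sketch. The toy problem is an assumption chosen for this example, not something from the talk: an unknown coin bias `theta` with a uniform prior and 7 heads in 10 flips, picked because the exact posterior, Beta(8, 4) with mean 8/12, is known. The function names are hypothetical.

```python
import random


def unnormalized_posterior(theta, heads=7, flips=10):
    """Likelihood * prior, with P(Data) deliberately left out."""
    if not 0.0 < theta < 1.0:
        return 0.0  # the uniform prior puts no mass outside (0, 1)
    return theta ** heads * (1.0 - theta) ** (flips - heads)


def metropolis(n_samples, step=0.2, seed=0):
    """Random-walk Metropolis sampler for the toy coin-bias posterior."""
    rng = random.Random(seed)
    theta = 0.5  # starting point inside (0, 1)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        # Only a ratio of unnormalized values is needed, so P(Data) cancels.
        ratio = unnormalized_posterior(proposal) / unnormalized_posterior(theta)
        if rng.random() < ratio:
            theta = proposal
        samples.append(theta)
    return samples


samples = metropolis(20_000)
kept = samples[2_000:]  # drop an initial burn-in stretch
posterior_mean = sum(kept) / len(kept)
print(f"estimated posterior mean: {posterior_mean:.3f} (exact: {8 / 12:.3f})")
```

The sampler never computes P(Data); it only compares how plausible two candidate values are relative to each other.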

Back to the key question:

P(Pregnant | Data)

= P(Data | Pregnant) P(Pregnant) / [ P(Data | Pregnant) P(Pregnant) + P(Data | Not Pregnant) P(Not Pregnant) ]

= 1 * X / [ 1 * X + Y * (1 - X) ]
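
A minimal sketch that evaluates this closed form for a few priors. The numbers used for X and Y are made-up illustrations, not values from the talk, and `posterior_pregnant` is a hypothetical helper name.

```python
def posterior_pregnant(x, y):
    """P(Pregnant | delay >= 7 days) = 1*X / (1*X + Y*(1 - X)).

    x: prior P(Pregnant = Yes)
    y: P(delay >= 7 days | Not Pregnant)
    Assumes P(delay >= 7 days | Pregnant) = 1, as in the slides.
    """
    evidence = 1.0 * x + y * (1.0 - x)  # P(Data), by total probability
    if evidence == 0.0:
        raise ValueError("P(Data) is 0: the data is impossible under this model")
    return 1.0 * x / evidence


# Illustrative (assumed) numbers only.
y = 0.1
for x in (0.0, 0.05, 0.5):
    print(f"prior={x:.2f} -> posterior={posterior_pregnant(x, y):.3f}")
```

The first case also shows the earlier implication: a prior of exactly 0 stays at 0 no matter what the data says.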

Back to the key question:

Can add more data... almost for free!

● Notice “Data” is quite general:

– Can add pregnancy strips data to further update beliefs!

– Treat previous outputs as priors then update similarly!
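
A small sketch of “treat previous outputs as priors”. Every number here is an assumption for illustration, including the test-strip accuracy figures, and the chaining is only valid if the late period and the strip result are independent once pregnancy status is known. `update` is a hypothetical helper.

```python
def update(prior, lik_if_pregnant, lik_if_not):
    """One Bayes-rule update for the binary event Pregnant."""
    numerator = lik_if_pregnant * prior
    evidence = numerator + lik_if_not * (1.0 - prior)  # P(Data)
    return numerator / evidence


# Assumed illustrative values, not from the talk.
prior = 0.05                                  # X: belief before any data
after_late = update(prior, 1.0, 0.1)          # late period: P(late | Pregnant) = 1, Y = 0.1
after_strip = update(after_late, 0.99, 0.02)  # hypothetical positive test strip

# Updating once with both data points together gives the same answer.
joint = update(prior, 1.0 * 0.99, 0.1 * 0.02)

print(f"prior={prior:.3f}, after late period={after_late:.3f}, "
      f"after strip={after_strip:.3f}, joint={joint:.3f}")
```

The second call reuses the first posterior as its prior, which is all “almost for free” means here.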

So.....what's the big deal?

● Your belief matters a lot!

– Your prior changes the outcome

● Your prior and my prior may be different

What “could” a bad Frequentist Do?

● Calculate the p-value for you, i.e.

P(Late period | Not Pregnant)

● Declare that you're Pregnant if this is <= 5%

● The declaration has a 5% false positive rate and a certain false negative rate

● Issue: Not as relevant to you! Rates are for all the people using this procedure... not specific to your case!

“not as relevant”?

● There's no consideration of your specific case

– There was no P(Pregnant) in the p-value calculation

– You could be really sure that you're not pregnant... doesn't change the calculation!
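
To make “not as relevant” concrete, here is a simulation sketch under assumed numbers: Y = 0.05, so a late period always triggers the “declare Pregnant” rule, applied to two hypothetical groups that differ only in how common pregnancy is. The prevalence values and the function name are illustrative assumptions.

```python
import random

rng = random.Random(0)
Y = 0.05  # assumed P(delay >= 7 days | Not Pregnant)


def simulate(prevalence, n=100_000):
    """Apply 'late period => declare Pregnant' to n simulated people.

    Returns (false positive rate among the not pregnant,
             fraction of 'Pregnant' declarations that are correct).
    """
    not_pregnant = false_pos = declared = true_pos = 0
    for _ in range(n):
        pregnant = rng.random() < prevalence
        late = True if pregnant else rng.random() < Y  # P(late | Pregnant) = 1
        if not pregnant:
            not_pregnant += 1
            false_pos += late
        if late:
            declared += 1
            true_pos += pregnant
    return false_pos / not_pregnant, true_pos / declared


for prevalence in (0.01, 0.5):
    fpr, correct = simulate(prevalence)
    print(f"prevalence={prevalence}: false positive rate={fpr:.3f}, "
          f"declarations that are correct={correct:.3f}")
```

The false positive rate is about the same in both groups, because it never looks at P(Pregnant); how often a declaration is right depends heavily on it.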

What would a Frequentist say?

● P(Pregnant) = 100% or 0%

– Fixed but unknown

– NOT uncertain

● ...Not actually interested in a single event

– Probabilities are defined for repeated events

– Will not write down P(Pregnant | Data)

– For your one case, anything could be true

● Would say “Go talk to a doctor”

Key difference

● “Attitude”

– What can be a random variable?

● Bayesian: Uncertain events

● Frequentist: Repeatable events

Implications of this attitude

● Bayesian:

– Can incorporate prior knowledge easily

– Can update beliefs easily

– Can tackle a wider class of problems since probabilities are “beliefs”

– Must specify a model

– Your belief can be different from mine

  ● Our answers will be different!

Implications of this attitude

● Frequentist:

– Probabilities are more objective

– Harder to cheat

– Has non-parametric methods

– Focused on repeatable events

– Prior knowledge is introduced in an ad hoc format

– Usually need lots of data

In the end...

● Frequentists and Bayesians use the same rules of probability

● The difference is in the set-up: “What is random?”

– Bayesians: uncertainty in knowledge

– Frequentists: intrinsic randomness

Take Home

● Different problems should use different approaches!

– Both schools are awesome!

● Be aware of what you're using and be consistent!

