How is Bayesian Statistics Different? by Wayne Tai Lee

What is Bayesian statistics and how is it different?

Goal

● Clarify the difference between “classical” and Bayesian statistics

● Lay out the pros and cons of each “attitude”

One sentence definition

Bayesian statistics is a mathematical framework to update beliefs as you observe more data.

Bayesian Update in Movies

Recall movies where a female character realizes her period is late?

Movie cliché: Am I pregnant?

● What did I do in the past month?

– Forms a prior belief of whether I am pregnant

● The missing period

– Data!

● Belief is updated as more data is observed!

Bayesian terminology

● Prior: your belief about pregnancy before seeing new data

● Data: missing period

● Posterior: your belief that is updated after seeing the data

How do we formalize this update?

● Pregnant is an uncertain event with two outcomes: Yes or No

● “Days delayed of period” is a data point

– If (Pregnant = Yes), the period is delayed by about 30*9 days (roughly nine months)

– If (Pregnant = No), it might come sooner

Mathematical framework

● “Pregnant” is a random variable:

– P(Pregnant = Yes) = X

– P(Pregnant = No) = (1 - X)

● “Days delayed of period” is another random variable!

– P(days delay >= 7 days | Pregnant) = 1

– P(days delay >= 7 days | Not Pregnant) = Y

Simplify

● Start with the objective:

Am I pregnant? i.e. P(Pregnant | Data)?

● Note that all the numbers we know are of the form P( **** | Pregnant)

Conditional Probability!

P(Pregnant | Data)

= P(Data | Pregnant) P(Pregnant) / P(Data)

Immediate implication:

● If your prior says you cannot be pregnant, i.e. P(Pregnant) = 0, the numerator is 0 and no data can change your belief!

“Bayes Rule”

P(Pregnant | Data)

= P(Data | Pregnant) P(Pregnant) / P(Data)

= P(Data | Pregnant) P(Pregnant) / [ P(Data | Pregnant) P(Pregnant) + P(Data | Not Pregnant) P(Not Pregnant) ]

Why add more numbers?

P(Data) was hard to compute, so chop it into pieces we know!

P(Data): Big Issue for Bayesians

● Pregnant is binary, which made this really easy

● In general, a lot of “tricks” are trying to either

– solve for P(Data)

  ● Belief propagation in graphical models

– or get around it

  ● Sampling: MCMC

  ● Approximation: Variational Bayes

Back to the key question:

P(Pregnant | Data)

= P(Data | Pregnant) P(Pregnant) / [ P(Data | Pregnant) P(Pregnant) + P(Data | Not Pregnant) P(Not Pregnant) ]

= 1 * X / [ 1 * X + Y * (1 - X) ]
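The closed form above is easy to check numerically. The following is a minimal sketch, not part of the original talk: the function name `posterior_pregnant` and the example values for X and Y are illustrative assumptions.

```python
def posterior_pregnant(x, y):
    """Return P(Pregnant | delay >= 7 days).

    x: prior P(Pregnant = Yes)
    y: P(delay >= 7 days | Not Pregnant)
    Uses P(delay >= 7 days | Pregnant) = 1, as stated on the slide.
    """
    evidence = 1.0 * x + y * (1.0 - x)  # P(Data), split over Yes / No
    if evidence == 0.0:
        raise ValueError("P(Data) is zero: the data is impossible under this model")
    return 1.0 * x / evidence


# Hypothetical inputs: a 10% prior and a 20% chance of a week-long delay
# when not pregnant. Neither number comes from the talk.
example_posterior = posterior_pregnant(0.10, 0.20)
print(round(example_posterior, 3))
```

With these made-up inputs the belief rises from the 10% prior to roughly 36%, because a week-long delay is five times more likely under Pregnant = Yes than under Pregnant = No.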

Can add more data... almost for free!

● Notice “Data” is quite general:

– Can add pregnancy test strip data to further update beliefs!

– Treat the previous posterior as the new prior, then update the same way!
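Chaining updates in this way can be sketched as follows. This is an illustration under stated assumptions: the helper `update`, the prior, and the strip's 99% / 5% rates are hypothetical placeholders, and the two observations are assumed to be conditionally independent given pregnancy status.

```python
def update(prior, p_data_if_pregnant, p_data_if_not):
    """One Bayes-rule update for the binary event Pregnant."""
    numerator = p_data_if_pregnant * prior
    evidence = numerator + p_data_if_not * (1.0 - prior)  # P(Data)
    return numerator / evidence


# All numbers below are made-up placeholders, not real test characteristics.
prior = 0.10

# First observation: the period is at least 7 days late.
after_late_period = update(prior, 1.0, 0.20)

# Second observation: a positive test strip. Yesterday's posterior is
# today's prior.
after_positive_strip = update(after_late_period, 0.99, 0.05)

# Feeding both observations in at once gives the same answer, provided the
# observations are conditionally independent given pregnancy status.
joint = update(prior, 1.0 * 0.99, 0.20 * 0.05)

print(after_late_period, after_positive_strip, joint)
```

The sequential and joint answers agree, which is why adding data is "almost free": only the likelihood of the new observation has to be supplied.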

So... what's the big deal?

● Your belief matters a lot!

– Your prior changes the outcome

● Your prior and my prior may be different
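To see how much the prior drives the answer, the same data can be run through several priors. This is a minimal sketch: the value Y = 0.20 and the list of priors are hypothetical, chosen only for illustration.

```python
def posterior_pregnant(x, y):
    """P(Pregnant | delay >= 7 days), with P(delay >= 7 days | Pregnant) = 1."""
    return x / (x + y * (1.0 - x))


y = 0.20  # hypothetical P(delay >= 7 days | Not Pregnant)

# Same data, different people, different priors.
posteriors = {prior: posterior_pregnant(prior, y) for prior in (0.0, 0.01, 0.10, 0.50)}

for prior, post in posteriors.items():
    print(f"prior = {prior:.2f} -> posterior = {post:.3f}")
# prior = 0.00 -> posterior = 0.000
# prior = 0.01 -> posterior = 0.048
# prior = 0.10 -> posterior = 0.357
# prior = 0.50 -> posterior = 0.833
```

Two people who see the same late period but start from a 1% and a 50% prior end up at about 5% and 83%. A prior of exactly 0 stays at 0 whatever the data says.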

What “could” a bad Frequentist Do?

● Calculate the p-value for you, i.e. P(Late period | Not Pregnant)

● Declare that you're pregnant if this is <= 5%

● Declaration has a 5% false positive rate and a certain false negative rate

● Issue: Not as relevant to you! The rates are for all the people using this procedure... not specific to your case!

“Not as relevant”?

● There's no consideration of your specific case

– There was no P(Pregnant) in the p-value calculation

– You could be really sure that you're not pregnant... doesn't change the calculation!
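The contrast can be made concrete with a small sketch. The function names and all numeric values are hypothetical; the only things taken from the slides are the 5% threshold and P(Late period | Pregnant) = 1.

```python
ALPHA = 0.05  # threshold used on the slide


def bad_frequentist_declares_pregnant(p_value, alpha=ALPHA):
    """Decision rule from the slide: it never sees a prior."""
    return p_value <= alpha


def posterior_pregnant(prior, p_late_if_not_pregnant):
    """Bayesian answer, using P(Late period | Pregnant) = 1 as in the talk."""
    return prior / (prior + p_late_if_not_pregnant * (1.0 - prior))


p_value = 0.05  # hypothetical P(Late period | Not Pregnant)

for prior in (0.001, 0.50):
    declared = bad_frequentist_declares_pregnant(p_value)
    belief = posterior_pregnant(prior, p_value)
    print(f"prior = {prior}: declared pregnant = {declared}, posterior = {belief:.3f}")
```

The declaration is identical for both people, while the posterior is about 2% for someone who was almost sure she was not pregnant and about 95% for someone who thought it was a coin flip.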

What would a Frequentist say?

● P(Pregnant) = 100% or 0%

– Fixed but unknown

– NOT uncertain

● …Not actually interested in a single event

– Probabilities are defined for repeated events

– Will not write down P(Pregnant | Data)

– For your one case, anything could be true

● Would say “Go talk to a doctor”

Key difference

● “Attitude”

– What can be a random variable?

● Bayesian: Uncertain events

● Frequentist: Repeatable events

Implications of this attitude

● Bayesian:

– Can incorporate prior knowledge easily

– Can update beliefs easily

– Can tackle a wider class of problems since probabilities are “beliefs”

– Must specify a model

– Your belief can be different from mine

  ● Our answers will be different!

Implications of this attitude

● Frequentist:

– Probabilities are more objective

– Harder to cheat

– Has non-parametric methods

– Focused on repeatable events

– Prior knowledge is introduced in an ad hoc format

– Usually need lots of data

In the end...

● Frequentists and Bayesians use the same rules of probability

● The difference is in the set-up: “What is random?”

– Bayesians: uncertainty in knowledge

– Frequentists: intrinsic randomness

Take Home

● Different problems should use different approaches!

– Both schools are awesome!

● Be aware of what you're using and be consistent!