
Chapter 1: Bayesian Basics

Conchi Ausín and Mike Wiper

Department of Statistics

Universidad Carlos III de Madrid

Bayesian Inference, ASDM 2018


Objective

In this chapter, we introduce the basic theory and properties of Bayesian statistics and some problems and advantages in the big data setting.


Probability Rules I: Partitions

Events B1, ..., Bk form a partition if Bi ∩ Bj = ∅ for all i ≠ j and B1 ∪ ... ∪ Bk = Ω. Then, for any event A,

P(A) = P(A ∩ B1) + P(A ∩ B2) + ... + P(A ∩ Bk).


Probability Rules II: Conditional Probability

For two events A and B, the conditional probability of A given B is

P(A|B) = P(A ∩ B) / P(B).

The multiplication law is P(A ∩ B) = P(A|B)P(B).

A and B are independent if P(A ∩ B) = P(A)P(B), or equivalently, P(A|B) = P(A) or P(B|A) = P(B).


Probability Rules III: Total Probability

Given a partition B1, ..., Bk, then for any event A,

P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + ...+ P(A|Bk)P(Bk).

For continuous variables, f(x) = ∫ f(x|y) f(y) dy.


Probability Rules IV: Bayes Theorem

For two events, Bayes theorem states that P(B|A) = P(A|B)P(B)/P(A). More generally, if B1, ..., Bk form a partition, we can write Bayes theorem as:

P(Bi|A) = P(A|Bi)P(Bi) / [P(A|B1)P(B1) + ... + P(A|Bk)P(Bk)].
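
The partition form is straightforward to compute numerically. Below is a minimal Python sketch; the machines example and its numbers are purely illustrative, not from the slides:

```python
# Bayes theorem over a partition: posterior P(B_i|A) from priors P(B_i)
# and likelihoods P(A|B_i).

def posterior(priors, likelihoods):
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                # P(A), by the law of total probability
    return [j / total for j in joint]

# Hypothetical example: three machines produce 50%, 30% and 20% of output,
# with defect rates 1%, 2% and 3%. Given a defective item (event A),
# which machine (B_i) most likely produced it?
print(posterior([0.5, 0.3, 0.2], [0.01, 0.02, 0.03]))
# -> [0.294..., 0.352..., 0.352...]
```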


Example: The Monty Hall problem

Should you change doors?

Implicit assumption:

the host always opens a different door from the door chosen by the player, and always reveals a goat by doing so because he knows where the car is hidden.


Solution using Bayes Theorem

Suppose, without loss of generality, that the player chooses door 1 and that the host opens door 2 to reveal a goat.

Let A (respectively B, C) be the event that the prize is behind door 1 (respectively 2, 3).

P(A) = P(B) = P(C) = 1/3.

P(opens 2|A) = 1/2, P(opens 2|B) = 0, P(opens 2|C) = 1.

P(opens 2) = P(opens 2|A)P(A) + P(opens 2|B)P(B) + P(opens 2|C)P(C)
= 1/2 × 1/3 + 0 × 1/3 + 1 × 1/3 = 1/2.

P(A|opens 2) = P(opens 2|A)P(A) / P(opens 2) = (1/2 × 1/3) / (1/2) = 1/3,

so P(C|opens 2) = 2/3 and it is better to switch.
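
The answer is easy to check by simulation; here is a Monte Carlo sketch in Python, assuming exactly the host behaviour described above:

```python
import random

def play(switch, trials=100_000):
    """Simulate Monty Hall games and return the win frequency."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        choice = 0                       # wlog the player picks door 0
        # host opens a goat door different from the player's choice
        opened = random.choice([d for d in range(3)
                                if d != choice and d != car])
        if switch:                       # move to the remaining closed door
            choice = next(d for d in range(3)
                          if d != choice and d != opened)
        wins += (choice == car)
    return wins / trials

print("stay:  ", play(switch=False))     # ~ 1/3
print("switch:", play(switch=True))      # ~ 2/3
```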


Statistical Inference

Given data, x = (x1, ..., xn), we typically wish to make inference about some model parameter, θ, or predictions of future observations. There are two common statistical approaches: classical and Bayesian inference.

Classical Inference

Frequentist interpretation of probability.

Inference is based on the likelihood function:

l(θ|x) = f(x|θ).

θ is fixed. All uncertainty about X is quantified a priori.

Inferential procedures are based on asymptotic performance.

Prediction is often carried out by substituting an estimator θ̂ for θ:

P(Y < y|x) ≈ P(Y < y|x, θ̂).


Example: a coin tossing experiment

You have a coin with P(head) = θ. Suppose you decide to toss the coin 12 times and observe 9 heads and 3 tails.

The maximum likelihood estimate for θ is θ̂ = 9/12.

An (approximate) 95% confidence interval for θ is (0.505, 0.995).

The p-value for the test H0 : θ = 0.5 vs H1 : θ > 0.5 is:

p = Σ (i = 9 to 12) P(i heads in 12 tosses | θ = 0.5) ≈ 0.073

and the null hypothesis is not rejected at a 5% significance level.

With the alternative experiment of tossing the coin until the third tail is observed, if this occurs on the 12th toss, then p ≈ 0.0325 and H0 is rejected!

The plug-in predictive distribution for the number of heads in 10 more coin tosses is Binomial(10, 0.75).
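
These numbers can be reproduced with scipy.stats; a sketch (the two stopping rules give different p-values for the same data, which is the point of the example):

```python
from scipy.stats import binom

n, heads = 12, 9
theta_hat = heads / n                        # MLE = 0.75

# Binomial experiment: p-value = P(X >= 9) for X ~ Binomial(12, 0.5)
p_binomial = binom.sf(heads - 1, n, 0.5)     # ~ 0.073

# Negative binomial experiment (toss until the 3rd tail, seen on toss 12):
# p-value = P(3rd tail needs >= 12 tosses) = P(at most 2 tails in 11 tosses)
p_negbinomial = binom.cdf(2, 11, 0.5)        # ~ 0.033

print(theta_hat, p_binomial, p_negbinomial)
```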


Bayesian Inference

Subjective interpretation of probability.

Inference is based on the likelihood function ... and

θ is treated as a random variable, with a prior distribution.

Inference is carried out via Bayes theorem and probability formulae.

Prediction is inherent to the procedure.


Bayesian inference: probability and the prior distribution

We have a prior distribution for θ which reflects personal subjective knowledge, previous data, ...

Different people can have different priors.

The only restriction is coherence.

Example

What do we know about the coin?

Coins typically have two faces with θ = P(head) ≈ 0.5.

0 ≤ θ ≤ 1.

Consider a prior distribution for θ centred at 0.5 but allowing for coin bias.


The beta prior distribution

θ has a beta distribution with parameters a, b > 0 if

f(θ) = θ^(a−1) (1 − θ)^(b−1) / B(a, b), for 0 < θ < 1.

[Figure: beta density functions f(θ) for various values of a and b, plotted against θ ∈ (0, 1).]

The mean is E[θ] = a/(a + b).
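
A short sketch of this prior using scipy.stats.beta, with the symmetric Beta(5, 5) that is used for the coin below:

```python
from scipy.stats import beta

a, b = 5, 5                    # symmetric prior, centred at 0.5
prior = beta(a, b)

print(prior.mean())            # a / (a + b) = 0.5
print(prior.std())             # prior spread
print(prior.interval(0.95))    # central 95% prior interval, ~ (0.21, 0.79)
```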


Bayesian inference: updating

When data are observed, beliefs are updated via Bayes theorem:

f(θ|x) = f(x|θ) f(θ) / f(x) = l(θ|x) f(θ) / f(x) ∝ l(θ|x) f(θ)

because the denominator is independent of θ.

We can remember this as:

posterior ∝ likelihood × prior.
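
Since only proportionality in θ matters, the posterior can always be normalized numerically when no closed form is at hand. A minimal grid sketch, using the coin data (9 heads, 3 tails) and the Beta(5, 5) prior from the example that follows:

```python
import numpy as np

# Grid approximation of posterior ∝ likelihood × prior for the coin example.
theta = np.linspace(0.001, 0.999, 999)       # grid over (0, 1)
prior = theta**4 * (1 - theta)**4            # Beta(5, 5) kernel, unnormalized
likelihood = theta**9 * (1 - theta)**3       # 9 heads, 3 tails
weights = prior * likelihood
weights = weights / weights.sum()            # normalize: f(x) drops out

print((theta * weights).sum())               # posterior mean, ~ 14/22 = 0.636
```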


Example

Suppose we use a Beta(5,5) prior distribution for θ:

f(θ) = θ^(5−1) (1 − θ)^(5−1) / B(5, 5).

Then the posterior distribution is:

f(θ|x) ∝ C(12, 9) θ^9 (1 − θ)^3 × θ^(5−1) (1 − θ)^(5−1) / B(5, 5)
∝ θ^(14−1) (1 − θ)^(8−1),

where C(12, 9) is the binomial coefficient. What distribution is this?

f(θ|x) = θ^(14−1) (1 − θ)^(8−1) / B(14, 8).

Another beta distribution: θ|x ∼ Beta(14, 8).

P(θ > 0.5|x) ≈ 0.905.
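
The conjugate update and the quoted posterior probability are one-liners with scipy.stats; a quick check:

```python
from scipy.stats import beta

a0, b0, heads, tails = 5, 5, 9, 3
post = beta(a0 + heads, b0 + tails)      # Beta(14, 8) posterior

print(post.mean())                       # 14/22 ~ 0.636
print(post.sf(0.5))                      # P(theta > 0.5 | x) ~ 0.905
```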


Bayesian inference: the posterior as an average

The posterior density combines information from both prior and likelihood.

Example

The plot shows the prior density (dotted), scaled likelihood (dashed) and posterior density (solid).

[Figure: prior (dotted), scaled likelihood (dashed) and posterior (solid) densities plotted against θ.]


Bayesian inference: point and interval estimation

For point estimates we could use the posterior mean, median or mode, for example.

The classical MLE is θ̂ = 9/12 = 0.75 and the Bayesian posterior mean is E[θ|x] = 14/22 ≈ 0.636.

We have a weighted average:

14/22 = 10/22 × 1/2 + 12/22 × 9/12

E[θ|x] = w E[θ] + (1 − w) θ̂, where w = 10/22.

For interval estimates we can use a credible interval, i.e. an interval [θₗ, θᵤ] such that P(θₗ < θ < θᵤ | x) = 0.95.

The shortest such interval is a highest posterior density (hpd) interval.

A classical 95% confidence interval is (0.505, 0.995) and the posterior credible interval is (0.430, 0.819).

How do we interpret the two intervals?
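
A sketch computing these summaries from the Beta(14, 8) posterior with scipy.stats; the interval below is the equal-tailed one, which here matches the quoted (0.430, 0.819):

```python
from scipy.stats import beta

post = beta(14, 8)                     # posterior from the coin example

# Point estimates
print(post.mean())                     # 14/22 ~ 0.636
print(post.median())                   # posterior median
print((14 - 1) / (14 + 8 - 2))         # posterior mode = 0.65

# Equal-tailed 95% credible interval, ~ (0.430, 0.819)
print(post.interval(0.95))

# Posterior mean as a weighted average of prior mean and MLE
w = 10 / 22                            # weight on the prior mean
print(w * 0.5 + (1 - w) * 9 / 12)      # = 14/22
```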


Bayesian inference: prediction

Suppose that we wish to predict future observations, say Y. Then

f(y|x) = ∫ f(y|x, θ) f(θ|x) dθ = ∫ f(y|θ) f(θ|x) dθ

in the case of conditionally i.i.d. (exchangeable) variables.

Example

Let’s try to predict the number of heads, Y, in 10 further throws of the coin.

We know that Y |θ ∼ Binomial(10, θ), independent of the previous tosses.

P(Y = y|x) = ∫₀¹ P(Y = y|θ) f(θ|x) dθ
= ...
= C(10, y) B(14 + y, 18 − y) / B(14, 8)

for y = 0, 1, ..., 10.
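
This beta-binomial predictive is available directly in scipy.stats; a sketch comparing it with the classical plug-in Binomial(10, 0.75):

```python
from scipy.stats import betabinom, binom

bayes = betabinom(10, 14, 8)     # Bayesian predictive: Beta-Binomial(10, 14, 8)
plugin = binom(10, 0.75)         # classical plug-in predictive

for y in range(11):
    print(y, round(bayes.pmf(y), 4), round(plugin.pmf(y), 4))

# The Bayesian predictive is centred lower (10 x 14/22 ~ 6.36 vs 7.5) and is
# more spread out, since it averages over the remaining uncertainty in theta.
print(bayes.mean(), plugin.mean())
print(bayes.std(), plugin.std())
```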


Predictive distributions

The plot shows the classical “plug-in” and Bayesian predictive distributions.

[Figure: the two predictive probability mass functions P(Y = y) for y = 0, ..., 10.]


Bayesian inference is sequential!

We start with a prior, f (θ).

Given data x we update by Bayes theorem to get the posterior f (θ|x).

Now this is our new prior and ...

Given more data, y, we update again to get f (θ|x, y).

In principle this is a big advantage in big data settings, allowing parallelization, etc.

Example

If we observe Y = 6, then

f(θ|x, y) ∝ C(10, 6) θ^6 (1 − θ)^4 × θ^(14−1) (1 − θ)^(8−1) / B(14, 8)
∝ θ^(20−1) (1 − θ)^(12−1)

θ|x, y ∼ Beta(20, 12).
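
A sketch of the sequential update in Python; note that processing all 22 tosses in one batch gives the same Beta(20, 12) posterior:

```python
from scipy.stats import beta

def update(a, b, heads, tails):
    # Conjugate update: yesterday's posterior is today's prior
    return a + heads, b + tails

a, b = 5, 5                   # prior: Beta(5, 5)
a, b = update(a, b, 9, 3)     # first sample -> Beta(14, 8)
a, b = update(a, b, 6, 4)     # Y = 6 heads in 10 new tosses -> Beta(20, 12)

print(a, b, beta(a, b).mean())   # 20 12 0.625
```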


Summary and next chapter

We have seen an outline of the basic ideas behind Bayesian statistics and illustrated some of the ideas with a coin tossing example.

We have seen that a Beta(5, 5) prior implied a beta posterior.

Would this be the case with another beta prior?

Are there other situations where we can use “nice” priors?

What if we used a different type of prior? Would this be a problem in a big data setting? If so, what can we do?

We’ll see the solutions to these questions in the following classes.
