
Bayesian Inference


Revision – Bayesian Inference

You are interested in the proportion θ of students in your class who smoke. A recent survey carried out by the Ministry of Health reveals that 40% of all students at the university smoke. You believe that smoking prevalence among actuarial students is lower than among other university students, but you are not very sure of this. You think that the beta(2,10) distribution represents your prior expectation and uncertainty about θ well.

You also decide to collect some data through a random sample (of actuarial students), and it turns out that 5 out of the 20 students you interviewed are active smokers.

(a)

(i) Derive the classical estimate of the proportion of actuarial students who smoke.

The classical estimate is just the observed proportion of smokers in the sample, here 5 out of 20. To see why, denote by $Y_i$ an indicator variable such that $Y_i = 1$ if the $i$th student smokes and 0 otherwise. The experiment is then a series of Bernoulli trials with probability of success θ (success = the student smokes), i.e. $P(Y_i = y_i) = \theta^{y_i}(1-\theta)^{1-y_i}$, and the joint distribution of the $Y_i$ is just $\prod_{i=1}^{n} \theta^{y_i}(1-\theta)^{1-y_i}$ for a fixed order of occurrences. Since the series of successes and failures can occur in $\binom{n}{y}$ ways, where $y = \sum_{i=1}^{n} y_i$, the distribution of the number of successes $y$ is just

$\binom{n}{y} \prod_{i=1}^{n} \theta^{y_i}(1-\theta)^{1-y_i} = \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y}$.

The classical method of estimation involves obtaining the value of θ which maximizes the likelihood of the actual sample. This is often, but not always, achieved by differentiating the likelihood or the log-likelihood and setting the derivative to zero, which gives:

$\hat{\theta} = \dfrac{\text{Number of successes}}{\text{Number of trials}} = \dfrac{y}{n} = \dfrac{5}{20} = 0.25$
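The differentiation step mentioned above can be written out as follows. This is a short sketch using the binomial likelihood from the text; the symbol $\ell(\theta)$ for the log-likelihood is notation introduced here for convenience.

```latex
\begin{align*}
\ell(\theta) &= \log\binom{n}{y} + y\log\theta + (n-y)\log(1-\theta), \\
\frac{d\ell}{d\theta} &= \frac{y}{\theta} - \frac{n-y}{1-\theta} = 0
\;\Longrightarrow\; y(1-\theta) = (n-y)\,\theta
\;\Longrightarrow\; \hat{\theta} = \frac{y}{n} = \frac{5}{20} = 0.25 .
\end{align*}
```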

(ii) Obtain the posterior estimate or the Bayesian estimate of the proportion of students that smoke in class.

The Bayesian approach involves combining the prior information about the unknown parameter with data. Put another way, the Bayesian approach is about updating your prior belief about θ with data to produce what is called a posterior estimate.

Our prior belief about θ is represented by the beta(2,10) distribution, and we write: $f(\theta) = \text{Beta}(2,10)$.

The data are described by the joint distribution of the $Y_i$, and since this distribution is conditional on θ, we write:

$f(y \mid \theta) = \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y}$

Updating our prior belief about θ means calculating the conditional distribution of θ, given that we observed the data y. In class, we saw that $\theta \mid y$ follows the beta(5+2, 20−5+10) = beta(7,25) distribution.
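The update can also be checked numerically. Multiplying the prior kernel $\theta^{1}(1-\theta)^{9}$ by the likelihood kernel $\theta^{5}(1-\theta)^{15}$ gives $\theta^{6}(1-\theta)^{24}$, which is the kernel of a beta(7,25). The sketch below is only an illustration; the variable names are chosen here and do not come from the course material.

```r
# Prior beta(2, 10) and data y = 5 smokers out of n = 20 (values from the text)
a_prior <- 2
b_prior <- 10
y <- 5
n <- 20

# Conjugate update: beta(a + y, b + n - y)
a_post <- a_prior + y
b_post <- b_prior + n - y

# Numerical check: normalise prior x likelihood and compare with dbeta()
unnorm <- function(theta) dbeta(theta, a_prior, b_prior) * dbinom(y, n, theta)
norm_const <- integrate(unnorm, lower = 0, upper = 1)$value
theta_grid <- seq(0.05, 0.95, by = 0.05)
max_diff <- max(abs(unnorm(theta_grid) / norm_const -
                    dbeta(theta_grid, a_post, b_post)))

print(c(a_post = a_post, b_post = b_post))
print(max_diff)  # should be close to zero
```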

Graphically:

[Figure: prior, likelihood and posterior densities plotted against theta (x-axis from 0.0 to 1.0, y-axis: Density). Title: beta(2,10) prior, B(20,5) data, beta(7,25) posterior.]

Someone wrote a nice piece of code to obtain the posterior distribution of a proportion with a beta prior and data that follow a binomial distribution; it has been sent by mail. The graph will be discussed in class tomorrow.
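For readers without access to that code, a comparable plot can be sketched in base R as below. This is not the code that was circulated. In particular, scaling the likelihood so that it integrates to one over θ (which makes it a beta(6,16) curve) is a choice made here so that the three curves share one axis.

```r
# Grid of theta values on [0, 1]
theta <- seq(0, 1, length.out = 501)

prior      <- dbeta(theta, 2, 10)  # beta(2, 10) prior
posterior  <- dbeta(theta, 7, 25)  # beta(7, 25) posterior
# Likelihood theta^5 (1 - theta)^15, rescaled to integrate to 1 (assumption)
likelihood <- dbeta(theta, 5 + 1, 20 - 5 + 1)

plot(theta, posterior, type = "l", lty = 1,
     xlab = "theta", ylab = "Density",
     main = "beta(2,10) prior, B(20,5) data, beta(7,25) posterior")
lines(theta, prior, lty = 2)
lines(theta, likelihood, lty = 3)
legend("topright", legend = c("Prior", "Likelihood", "Posterior"),
       lty = c(2, 3, 1))

# Grid value at which the posterior density peaks
theta_peak <- theta[which.max(posterior)]
```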

(iii) Contrast your answers in parts (i) and (ii).

In the classical approach, θ is viewed as a constant and Y is a random variable which varies from sample to sample.

In the Bayesian approach, θ is viewed as a random variable and Y is fixed at its observed value. In the Bayesian world, parameters are said to be dynamic. This does not imply that θ has no fixed value; it simply means that as long as θ is unknown, we can only describe the possible values that θ can take with an appropriate probability distribution, namely the posterior probability distribution. The Bayesian approach rejects ideas such as re-sampling, or reasoning about what would happen "if we were to take several samples", since the sampling is done only once. Bayesian statisticians believe that one way to limit sampling variability is to combine the sample information with a priori information rather than speculating about unobserved hypothetical samples.


(b) Now consider the posterior density in (ii). Suppose that you are asked to tabulate all the possible values of θ and their associated frequencies as proposed by this distribution, just as in a frequency table or a frequency density.

(i) Which value of θ has the highest probability of occurrence, i.e. is most likely to occur?
This is the posterior mode. For the beta(7,25) distribution it equals (7−1)/(7+25−2) = 6/30 = 0.2.

(ii) If you were to list all the values of θ in ascending or descending order (where each value would be listed as many times as described by its frequency density), which value would be in the middle?
This is the posterior median. The median is the value m such that $P(\theta \le m \mid y) \ge 1/2$ and $P(\theta \ge m \mid y) \ge 1/2$. It is the value m satisfying:

$\dfrac{1}{2} = \displaystyle\int_0^{m} \frac{\theta^{5+2-1}(1-\theta)^{20-5+10-1}}{B(7,25)}\,d\theta$, which gives m ≈ 0.213, a value between the mode (0.2) and the mean (0.219), as expected for this right-skewed density.

(Using R: qbeta(0.5, 7, 25, ncp = 0, lower.tail = TRUE, log.p = FALSE))

(iii) What is the mean value of θ?
This is the posterior mean = 7/(7+25) = 0.219.
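The three point summaries in (i) to (iii) can be computed together, as in the short sketch below. The variable names are illustrative, and the mode formula assumes that both beta parameters exceed 1, which holds for beta(7,25).

```r
# Posterior beta(7, 25) from the text
a_post <- 7
b_post <- 25

post_mode   <- (a_post - 1) / (a_post + b_post - 2)  # requires a_post, b_post > 1
post_median <- qbeta(0.5, a_post, b_post)
post_mean   <- a_post / (a_post + b_post)

print(round(c(mode = post_mode, median = post_median, mean = post_mean), 4))
```

For a right-skewed density such as this one, the three summaries are ordered mode < median < mean.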

(iv) You are now asked to find the shortest range such that the probability that θ lies within this range is 90%.
This is the highest posterior density (HPD) interval. It will be discussed in class.
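Ahead of the class discussion, here is one possible way to find such an interval numerically. It is a sketch that relies on the posterior being unimodal: among all intervals with 90% probability, it searches for the lower tail probability that minimises the width. The helper name hpd_beta is hypothetical and is not an R built-in.

```r
# Shortest interval with the given posterior probability for a unimodal beta(a, b)
hpd_beta <- function(a, b, level = 0.90) {
  # Width of the interval whose lower tail probability is p_low
  width <- function(p_low) qbeta(p_low + level, a, b) - qbeta(p_low, a, b)
  p_low <- optimize(width, interval = c(0, 1 - level))$minimum
  c(lower = qbeta(p_low, a, b), upper = qbeta(p_low + level, a, b))
}

hpd <- hpd_beta(7, 25)

# Equal-tailed 90% interval, for comparison
central <- qbeta(c(0.05, 0.95), 7, 25)

print(hpd)
print(central)
```

Because the beta(7,25) density is right-skewed, the shortest interval sits slightly to the left of the equal-tailed one.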

(v) You are asked to find another interval [a,b] such that there is a 90% probability that θ lies in this interval, but with the additional requirement that $P(\theta < a \mid y) \le 5\%$ and $P(\theta > b \mid y) \le 5\%$.
This is a central posterior density interval.

i.e. we are looking for a such that $0.05 = \displaystyle\int_0^{a} \frac{\theta^{6}(1-\theta)^{24}}{B(7,25)}\,d\theta$, which gives

a = 0.1111 (Using R: qbeta(0.05, 7, 25, ncp = 0, lower.tail = TRUE, log.p = FALSE))


Similarly, we are looking for b such that $0.05 = \displaystyle\int_b^{1} \frac{\theta^{6}(1-\theta)^{24}}{B(7,25)}\,d\theta$, which gives

b = 0.3466525 (Using R: qbeta(0.05, 7, 25, ncp = 0, lower.tail = FALSE, log.p = FALSE))

Check: calculate the area under the density between point a and point b, i.e. in R:

pbeta(0.1111, 7, 25, ncp = 0, lower.tail = TRUE, log.p = FALSE) = 0.05002301

and pbeta(0.3466525, 7, 25, ncp = 0, lower.tail = TRUE, log.p = FALSE) = 0.9499999

and 0.9499999 − 0.05002301 = 0.90
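The same interval and check can be done in one pass, since qbeta accepts a vector of probabilities. This is a minimal sketch with illustrative variable names.

```r
# Posterior beta(7, 25) from the text
a_post <- 7
b_post <- 25

# Equal-tailed 90% interval: 5% in each tail
ci <- qbeta(c(0.05, 0.95), a_post, b_post)

# Check: posterior probability between the two end points
coverage <- diff(pbeta(ci, a_post, b_post))

print(round(ci, 4))
print(coverage)
```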

(vi) Test the hypothesis $H_0: \theta < 0.1$ v/s $H_1: \theta > 0.3$.

$P_0 = P(\theta < 0.1 \mid y) = \displaystyle\int_0^{0.1} \frac{\theta^{6}(1-\theta)^{24}}{B(7,25)}\,d\theta = 0.03056312$

(Using R: pbeta(0.1, 7, 25, ncp = 0, lower.tail = TRUE, log.p = FALSE))

$P_1 = P(\theta > 0.3 \mid y) = \displaystyle\int_{0.3}^{1} \frac{\theta^{6}(1-\theta)^{24}}{B(7,25)}\,d\theta = 0.1346445$

(Using R: pbeta(0.3, 7, 25, ncp = 0, lower.tail = FALSE, log.p = FALSE)

or 1 - pbeta(0.3, 7, 25, ncp = 0, lower.tail = TRUE, log.p = FALSE))

A posteriori (after observing the data), $H_1$ is 0.1346445/0.03056312 = 4.4 times as likely as $H_0$.

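The corresponding prior probabilities, computed from the beta(2,10) prior before observing the data, are:

$\pi_0 = P(\theta < 0.1) = \displaystyle\int_0^{0.1} \frac{\theta^{1}(1-\theta)^{9}}{B(2,10)}\,d\theta = 0.3026431$

$\pi_1 = P(\theta > 0.3) = \displaystyle\int_{0.3}^{1} \frac{\theta^{1}(1-\theta)^{9}}{B(2,10)}\,d\theta = 0.1129901$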
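The four probabilities above can be reproduced with pbeta, as in the sketch below. The variable names are illustrative. The final ratio of posterior odds to prior odds (a Bayes factor) is an addition of this sketch and is not stated in the text; it is included only as a natural way to compare the two sets of probabilities.

```r
# Posterior probabilities under beta(7, 25)
p0 <- pbeta(0.1, 7, 25)                      # P(theta < 0.1 | y)
p1 <- pbeta(0.3, 7, 25, lower.tail = FALSE)  # P(theta > 0.3 | y)

# Prior probabilities under beta(2, 10)
pi0 <- pbeta(0.1, 2, 10)                      # P(theta < 0.1)
pi1 <- pbeta(0.3, 2, 10, lower.tail = FALSE)  # P(theta > 0.3)

post_odds  <- p1 / p0    # odds of H1 against H0 after seeing the data
prior_odds <- pi1 / pi0  # odds of H1 against H0 before seeing the data

# Assumption of this sketch: summarise the change in odds as a Bayes factor
bayes_factor <- post_odds / prior_odds

print(c(p0 = p0, p1 = p1, pi0 = pi0, pi1 = pi1))
print(c(post_odds = post_odds, prior_odds = prior_odds,
        bayes_factor = bayes_factor))
```

Under the prior, $H_0$ is the more probable of the two hypotheses; after the data, $H_1$ is.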