Psychology 290
Special Topics Study Course: Advanced Meta-analysis
April 7, 2014
The plan for today
• Review Bayes’ theorem
• Review the likelihood function
• Review maximum likelihood estimation
• Prior and posterior distributions
• The Bayesian approach to statistical inference
Bayes’ theorem
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}.$$
Example of Bayes’ Theorem
• Suppose a woman has had a single unplanned, unprotected sexual encounter.
• She takes a pregnancy test, and it is positive.
• What does she want to know?
• What is the probability that I am pregnant?
Example of Bayes’ Theorem (cont.)
• Let ‘B’ denote ‘pregnant’ and ‘A’ denote ‘positive pregnancy test.’
• Suppose P(A|B) = .90 (the probability of a positive test given pregnancy), P(A|~B) = .50 (the probability of a positive test given no pregnancy), and P(B) = .15.
• The marginal P(A) can be expressed as P(A|B)P(B) + P(A|~B)P(~B) = (.90)(.15) + (.50)(.85) = .56.
• P(B|A) = (.90)(.15) / .56 = .24107 (reproduced in the sketch below).
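To make the arithmetic concrete, here is a minimal Python sketch of this single update; the function and variable names are illustrative, not from the slides:

```python
def posterior_prob(prior, p_pos_given_preg, p_pos_given_not):
    """P(pregnant | positive test) via Bayes' theorem."""
    # Marginal probability of a positive test:
    # P(A) = P(A|B)P(B) + P(A|~B)P(~B)
    p_pos = p_pos_given_preg * prior + p_pos_given_not * (1 - prior)
    # Bayes' theorem: P(B|A) = P(A|B)P(B) / P(A)
    return p_pos_given_preg * prior / p_pos

print(posterior_prob(0.15, 0.90, 0.50))  # 0.24107...
```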
Example of Bayes’ Theorem (cont.)
• So, on the basis of this one positive result, there is about a 1 in 4 chance that she is pregnant.
• Note that the probabilities used in this example are not accurate, and the problem is oversimplified.
Example of Bayes’ Theorem (cont.)
• Not a very satisfying answer.
• Solution: retest.
• Now, our prior probability of pregnancy is P(B) = .24107.
• We repeat and get another positive.
• P(A) = (.90)(.24107) + (.50)(.75893) = .59643.
• P(B|A) = (.90)(.24107) / .59643 = .36377.
Example of Bayes’ Theorem (cont.)
• If she repeats this and continues to get positive results, her probabilities of pregnancy are: test 3 = .507, test 4 = .649, test 5 = .769, test 6 = .857, test 7 = .915, test 8 = .951, test 9 = .972, and test 10 = .984.
• Each time she adds a new test (= new data), her posterior probability of being pregnant changes; the sketch below reproduces this sequence.
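A minimal Python sketch of the sequential updating, assuming the test behaves identically on every repetition (names are illustrative):

```python
def update(prior, p_pos_given_preg=0.90, p_pos_given_not=0.50):
    """One Bayesian update after observing a positive test result."""
    p_pos = p_pos_given_preg * prior + p_pos_given_not * (1 - prior)
    return p_pos_given_preg * prior / p_pos

p = 0.15  # initial prior probability of pregnancy
for test in range(1, 11):
    p = update(p)  # yesterday's posterior becomes today's prior
    print(f"test {test}: P(pregnant) = {p:.3f}")
# test 1: 0.241, test 2: 0.364, test 3: 0.507, ..., test 10: 0.984
```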
Bayesian inference
• The basic idea of Bayesian inference is to apply Bayes’ theorem to the relationship between data and our prior beliefs about parameters.
• In the example, the parameter of interest was P(pregnant).
• We updated our prior belief on the basis of each subsequent test result (data).
Bayesian inference (cont.)
• P(A|B) is the density of the data (proportional to the likelihood).
• P(B) is our prior belief about the parameters.
• P(B|A) is our updated belief about the parameters, given the observed data.
• The updated belief is called the posterior distribution.
The likelihood function
• The starting point is the joint density of the data, given the parameters.
• Viewed instead as a function of the parameters given the data, the joint density becomes the likelihood function.
• Traditional use of the likelihood function: maximum likelihood estimation.
Properties of maximum likelihood estimates (review)
• Maximum likelihood estimators are often biased.
• They are (asymptotically) minimum variance estimators.
• Likelihood ratio testing (a numerical sketch of ML estimation follows below).
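As a refresher, here is a minimal numerical illustration of maximum likelihood estimation for a normal model, using simulated data; nothing here is from the original slides:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=85.0, scale=11.0, size=40)  # simulated sample

def neg_log_lik(params):
    """Negative log-likelihood of N(mu, sigma^2); minimizing it maximizes the likelihood."""
    mu, sigma = params
    return -norm.logpdf(data, loc=mu, scale=sigma).sum()

fit = minimize(neg_log_lik, x0=[80.0, 10.0], method="Nelder-Mead")
mu_hat, sigma_hat = fit.x
print(mu_hat, sigma_hat)  # mu_hat is the sample mean; sigma_hat**2 is the
                          # divide-by-n variance, a biased estimator
```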
Prior and posterior distributions
• We have already defined the prior as a belief about the distribution of the parameter(s).
• Non-informative (vague) priors are used when we don’t have strong beliefs about the parameters.
• The posterior distribution is a statement of our belief about the parameters, updated to account for the evidence of the data.
Prior and posterior distributions (cont.)
• A conjugate prior is one chosen so that, combined with the likelihood, it produces a posterior in the same family as the prior.
• Examples: a normal prior with a normal likelihood, a beta prior with a binomial likelihood (sketched below).
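A minimal sketch of the beta-binomial case, using the standard closed-form conjugate update; the prior parameters and data are made up for illustration:

```python
from scipy.stats import beta

a, b = 2.0, 2.0             # Beta(a, b) prior on a success probability
successes, failures = 7, 3  # illustrative binomial data

# Conjugacy: Beta prior + binomial likelihood -> Beta(a + successes, b + failures)
a_post, b_post = a + successes, b + failures

print(beta.mean(a_post, b_post))            # posterior mean = 9/14
print(beta.interval(0.95, a_post, b_post))  # central 95% credibility interval
```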
Bayesian estimation
• Bayes estimates are based on the posterior distribution.
• Often, the mean of a parameter’s posterior distribution is used as an estimate of the parameter.
• The math for that can become very difficult. Sometimes, the mode of the posterior is used instead (Bayes modal estimation).
Bayesian estimation (cont.)
• The maximum likelihood estimator may be thought of as a Bayes modal estimator with an uninformative prior.
• Modern computing power can remove the need for the nasty math traditionally needed for Bayesian estimation and inference.
• This makes the Bayesian approach more accessible than it once was.
Bayesian inference
• Bayesian inference involves probabilistic statements about parameters, based on the posterior distribution.
• Probabilistic statements are allowed because Bayesians view the parameters as random variables.
• For example, a Bayesian credibility interval allows us to make the kind of statement we wish we could make when we use confidence intervals.
Bayesian inference (cont.)
• In the Bayesian approach, one can discuss the probability that a parameter is in a particular range by calculating the area under the posterior curve for that range.
• For example, I might be able to make the statement that the probability that μ exceeds 110 is .75 (computed in the sketch below).
• That sort of statement is never possible in frequentist statistics.
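For a normal posterior this area is one line of code; the posterior mean and standard deviation below are hypothetical values chosen to give roughly .75:

```python
from scipy.stats import norm

post_mean, post_sd = 112.0, 3.0  # hypothetical normal posterior for mu

# P(mu > 110) = area under the posterior density to the right of 110
print(norm.sf(110, loc=post_mean, scale=post_sd))  # about .75 for these values
```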
Bayesian inference (cont.)
• Bayesian inference does not involve null hypotheses. (Formally, the null hypothesis is known to be false if we take the Bayesian perspective. Why?)
• Rather, we make probabilistic statements about parameters.
• We can also compare models probabilistically.
An example using the Peabody data
• Suppose we are interested in estimating the mean Peabody score for a population of 10-year-old children.
• We have strong prior reasons to believe that the mean is 85.
• We operationalize that prior belief by stating that μ ~ N(85, 4), i.e., prior mean 85 and prior variance 4.
Peabody example (cont.)
• Next, we assume that Peabody itself is normally distributed:
$$P(X \mid \mu; \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}.$$
Peabody example (cont.)
• Recall that we want a posterior distribution for μ.
• Bayes’ theorem says

$$\text{Posterior} = \frac{\text{Joint density given } \mu \;\times\; \text{Prior}}{\text{constant}}.$$

• Note that we can ignore the denominator here, as it is just a scaling constant.
Peabody example (cont.)
• Our posterior, then, is proportional to

$$P(\mu \mid \bar{X}, \sigma^2) \propto \exp\!\left(-\frac{n(\bar{X}-\mu)^2}{2\sigma^2}\right)\exp\!\left(-\frac{(\mu-85)^2}{2\cdot 4}\right).$$

• Some unpleasant algebra that involves completing the square (sketched below) shows that this is the same as a normal with mean (85σ² + 4nM) / (4n + σ²).
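For completeness, here is a sketch of that completing-the-square step; this is the standard normal-normal conjugacy algebra, and the intermediate lines are not on the original slide:

```latex
% Combine the two exponents and collect terms in \mu:
-\frac{n(\bar{X}-\mu)^2}{2\sigma^2} - \frac{(\mu-85)^2}{2\cdot 4}
  = -\frac{1}{2}\left[\mu^2\!\left(\frac{n}{\sigma^2}+\frac{1}{4}\right)
    - 2\mu\!\left(\frac{n\bar{X}}{\sigma^2}+\frac{85}{4}\right)\right] + \text{const.}
% Completing the square in \mu gives a normal kernel with
\mu_{\text{post}} = \frac{n\bar{X}/\sigma^2 + 85/4}{n/\sigma^2 + 1/4}
                  = \frac{4n\bar{X} + 85\sigma^2}{4n + \sigma^2},
\qquad
\sigma^2_{\text{post}} = \frac{1}{n/\sigma^2 + 1/4} = \frac{4\sigma^2}{4n + \sigma^2},
% which matches the mean and variance quoted on the slides (with M = \bar{X}).
```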
Peabody example (cont.)
• The variance of the posterior is 4σ² / (4n + σ²).
• In our example, M = 81.675, an estimate of the variance is σ̂² = 119.2506, and n = 40.
• The posterior mean, then, is (85 × 119.2506 + 4 × 40 × 81.675) / (4 × 40 + 119.2506) = 83.095.
Peabody example (cont.)
• The variance is 4 × 119.2506 / (4 × 40 + 119.2506) = 1.708.
• A 95% credibility interval is given by 83.095 ± 1.96 × √1.708 = (80.53, 85.66).
• As Bayesians, we may say that the probability that μ lies between those values is .95 (the sketch below reproduces these numbers).
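A minimal Python sketch of this normal-normal update, reproducing the numbers above (variable names are illustrative):

```python
import math

prior_mean, prior_var = 85.0, 4.0  # prior: mu ~ N(85, 4)
m, s2, n = 81.675, 119.2506, 40    # sample mean, variance estimate, sample size

# Conjugate normal-normal update (a precision-weighted average)
post_mean = (prior_mean * s2 + prior_var * n * m) / (prior_var * n + s2)
post_var = prior_var * s2 / (prior_var * n + s2)

half = 1.96 * math.sqrt(post_var)
print(post_mean, post_var)                 # 83.095..., 1.708...
print(post_mean - half, post_mean + half)  # about (80.53, 85.66)
```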
Peabody example (cont.)
• Now let’s suppose that we want to repeat the analysis, but with an uninformative prior for the mean.
• Instead of μ ~ N(85, 4), we’ll use N(85, 10000000).
• The posterior distribution of the mean, then, is centered at (85 × 119.2506 + 10000000 × 40 × 81.675) / (10000000 × 40 + 119.2506) = 81.675.
Peabody example (cont.)
• The variance is 10000000 × 119.2506 / (10000000 × 40 + 119.2506) = 2.98126.
• A Bayesian credibility interval for the mean, then, would be 81.675 ± 1.96 × √2.98126 = (78.29, 85.06).
• Although this is numerically identical to the confidence interval we would calculate using frequentist maximum likelihood, we are justified in giving it a Bayesian interpretation (see the sketch below).
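Rerunning the earlier sketch with the vague prior shows the posterior collapsing to the maximum likelihood answer:

```python
import math

prior_mean, prior_var = 85.0, 1e7  # effectively uninformative prior
m, s2, n = 81.675, 119.2506, 40

post_mean = (prior_mean * s2 + prior_var * n * m) / (prior_var * n + s2)
post_var = prior_var * s2 / (prior_var * n + s2)
half = 1.96 * math.sqrt(post_var)

print(post_mean)                           # 81.675, the sample mean
print(post_var)                            # 2.98126, essentially s2 / n
print(post_mean - half, post_mean + half)  # about (78.29, 85.06)
```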