Likelihood and Bayesian Inference

Joe Felsenstein

Department of Genome Sciences and Department of Biology


Page 2: Likelihood and Bayesian Inference · Likelihood and Bayesian Inference Joe Felsenstein Department of Genome Sciences and Department of Biology Likelihood and Bayesian Inference –

Bayes’ Theorem

Suppose we have related events: B, and some other mutually exclusive events A1, A2, A3, . . . , A8. The probability of A3 (for example) given B is

Prob(A3 | B) = Prob(A3 and B) / Prob(B) = Prob(A3) Prob(B | A3) / Prob(B)

(Think of B as the data, and the Ai as different hypotheses).

Since the denominator can be rewritten as

Prob(B) = Prob(A1) Prob(B | A1) + . . . + Prob(A8) Prob(B | A8)

we can substitute that in to get the final form of Bayes' Rule:

Prob(A3 | B) = Prob(A3) Prob(B | A3) / [ Prob(A1) Prob(B | A1) + . . . + Prob(A8) Prob(B | A8) ]
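As a numeric sketch of the rule (the priors and likelihoods below are made-up illustrative values, not anything from the slides):

```python
# Three mutually exclusive hypotheses A1..A3 with hypothetical priors
# and likelihoods Prob(B | Ai); the numbers are for illustration only.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.8]   # Prob(B | Ai)

# Denominator of Bayes' Rule: Prob(B) = sum_i Prob(Ai) Prob(B | Ai)
prob_b = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes' Rule applied to each hypothesis
posteriors = [p * l / prob_b for p, l in zip(priors, likelihoods)]

print(prob_b)        # 0.33
print(posteriors)    # the three posteriors sum to 1
```

Note how the hypothesis with the smallest prior ends up with the largest posterior once the data favour it strongly enough.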


Odds ratio justification for maximum likelihood

D    the data
H1   Hypothesis 1
H2   Hypothesis 2
|    the symbol for "given"

Prob(H1 | D) / Prob(H2 | D)  =  [ Prob(D | H1) / Prob(D | H2) ]  ×  [ Prob(H1) / Prob(H2) ]

(posterior odds ratio)          (likelihood ratio)                  (prior odds ratio)
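The identity can be checked with a one-line computation (the odds and ratio values are hypothetical):

```python
# Illustrative numbers only: posterior odds = likelihood ratio × prior odds.
prior_odds = 2.0        # Prob(H1) / Prob(H2)
likelihood_ratio = 5.0  # Prob(D | H1) / Prob(D | H2)

posterior_odds = likelihood_ratio * prior_odds
print(posterior_odds)   # 10.0
```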



A simple example of Bayes' Theorem

If a space probe finds no Little Green Men on Mars, when it would have a 1/3 chance of missing them if they were there:

[Figure: bar charts of the priors (no : yes = 4 : 1), the likelihoods of the observation (no: 1, yes: 1/3), and the resulting posteriors]

posterior odds (yes : no) = (1 × 1/3) / (4 × 1) = 1/12
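Using the 4 : 1 prior odds against Little Green Men and the 1/3 miss probability, the posterior odds come straight out:

```python
# Prior odds no:yes = 4:1; likelihoods of "probe sees none" are
# 1 (if absent) and 1/3 (if present but missed).
prior_no, prior_yes = 4.0, 1.0
like_no, like_yes = 1.0, 1.0 / 3.0

posterior_odds_yes = (prior_yes * like_yes) / (prior_no * like_no)
print(posterior_odds_yes)   # 1/12, about 0.0833
```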


An example of Bayesian inference with coins

[Figure: the prior on the Heads probability p, a truncated exponential distribution]


An example of Bayesian inference with coins

[Figure: the likelihood curve for 11 tosses with 5 heads appearing. (We'll calculate it in a moment.)]


An example of Bayesian inference with coins

[Figure: the resulting posterior on the Heads probability p]
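The prior-times-likelihood recipe can be sketched on a grid; the prior's rate parameter is not given on the slides, so the value below is an assumption:

```python
import math

# Grid approximation to the coin example: truncated-exponential prior on p
# (rate 2 is an assumed value), likelihood p^5 (1-p)^6 for 5 heads in 11
# tosses, posterior = prior × likelihood, restandardized to area 1.
rate = 2.0
n = 1000
ps = [(i + 0.5) / n for i in range(n)]

prior = [math.exp(-rate * p) for p in ps]
like = [p**5 * (1 - p)**6 for p in ps]
post = [a * b for a, b in zip(prior, like)]

area = sum(post) / n            # crude numerical integral over [0, 1]
post = [v / area for v in post]
print(sum(post) / n)            # 1.0: the posterior now integrates to 1
```

The posterior mode lands a little below the MLE of 5/11 because the exponential prior favours smaller p.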


The likelihood ratio term ultimately dominates

If we see one Little Green Man, the likelihood calculation does the right thing:

posterior odds (yes : no) = (1 × 2/3) / (4 × 0) = ∞

(put this way, this is OK but not mathematically kosher)

If after n missions, we keep seeing none, the likelihood ratio term is

(1/3)^n

It dominates the calculation, overwhelming the prior. Thus even if we don't have a prior we can believe in, we may be interested in knowing which hypothesis the likelihood ratio is recommending ...
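A few lines show how quickly (1/3)^n overwhelms a fixed prior (the 4 : 1 prior odds are the example's):

```python
# After n probes that all see nothing, the likelihood ratio
# (present : absent) is (1/3)**n, which shrinks toward zero and
# eventually swamps any fixed prior odds in favour of "present".
prior_odds = 4.0   # generous prior odds for "present", for illustration
for n in (1, 5, 10):
    posterior_odds = prior_odds * (1.0 / 3.0) ** n
    print(n, posterior_odds)
```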


Likelihood in simple coin-tossing

Tossing a coin n = 11 times, with probability p of heads, the probability of the outcome HHTHTTTHTTH is

p p (1 − p) p (1 − p)(1 − p)(1 − p) p (1 − p)(1 − p) p

which is

L = p^5 (1 − p)^6

Plotting L against p to find its maximum:

[Figure: the likelihood L plotted against p, with its peak at p = 0.4545]
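The maximum can also be found numerically, a sketch using a simple grid search rather than calculus:

```python
# Evaluate L(p) = p^5 (1-p)^6 on a fine grid and locate the peak;
# it should sit at p = 5/11 ≈ 0.4545, matching the plot.
def likelihood(p):
    return p**5 * (1 - p)**6

grid = [i / 100000 for i in range(100001)]
p_hat = max(grid, key=likelihood)
print(p_hat)   # close to 0.4545
```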


Differentiating to find the maximum:

Differentiating the expression for L with respect to p and equating the derivative to 0, the value of p at the peak is found (not surprisingly) to be p = 5/11:

∂L/∂p = ( 5/p − 6/(1 − p) ) p^5 (1 − p)^6 = 0

5 − 11p = 0

p̂ = 5/11


A likelihood curve

[Figure: a likelihood curve, L plotted against θ]


Its maximum likelihood estimate

[Figure: the likelihood curve with its peak marked at θ̂, the MLE]


Using the Likelihood Ratio Test

[Figure: the likelihood curve; from the value of ln L at the MLE, reduce ln L by 3.841/2]


The (approximate, asymptotic) confidence interval

[Figure: the likelihood curve; the points where ln L has dropped by 3.841/2 from its value at the MLE bound the confidence interval]
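For the one-parameter coin example, the recipe can be carried out numerically (a grid sketch, not a general-purpose routine):

```python
import math

# Profile the log-likelihood for 5 heads and 6 tails; keep every p whose
# ln L is within 3.841/2 of the maximum. The extremes of that set
# approximate the 95% confidence interval around p-hat = 5/11.
def log_lik(p):
    return 5 * math.log(p) + 6 * math.log(1 - p)

grid = [i / 10000 for i in range(1, 10000)]
max_ll = max(log_lik(p) for p in grid)
kept = [p for p in grid if log_lik(p) >= max_ll - 3.841 / 2]
print(min(kept), max(kept))   # the approximate interval
```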


Better to plot log(L) rather than L

[Figure: ln(L) plotted against θ, with the MLE marked, ln L reduced by 3.841/2, and the resulting confidence interval]


Contours of a likelihood surface in two dimensions

[Figure: contours of the likelihood surface, plotted against length of branch 1 and length of branch 2]


Where the maximum likelihood estimate is

[Figure: the same contours, with the MLE marked]


Using the LRT to define a confidence interval

[Figure: the contour whose height is less than at the peak by an amount equal to 1/2 the chi-square value, with one degree of freedom, which is significant at the 95% level, defines a confidence interval for length of branch 1]


Ditto, in the other variable

[Figure: the same construction applied to length of branch 2, using the contour lower than the peak by 1/2 the chi-square value with one degree of freedom that is significant at the 95% level]


A joint confidence region

[Figure: the contour lower than the peak by 1/2 the chi-square value, with two degrees of freedom, that is significant at the 95% level; the shaded area is the joint confidence region]


An example with phylogenies: molecular clock?

[Figure: a tree for species A, B, C, D, E with branch lengths v1, . . . , v8]

Constraints for a clock:

v1 = v2
v4 = v5
v1 + v6 = v3
v3 + v7 = v4 + v8


Testing for a molecular clock

To test for a molecular clock:

Obtain the likelihood with no constraint of a molecular clock (for primates data with Ts/Tn = 30 we get ln L1 = −2616.86)

Obtain the highest likelihood for a tree which is constrained to have a molecular clock: ln L0 = −2679.0

Look up 2 (ln L1 − ln L0) = 2 × 62.14 = 124.28 on a χ² distribution with n − 2 = 12 degrees of freedom (in this case the result is significant)
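The arithmetic of the test, with 21.026 taken from a standard chi-square table (the 95% point for 12 degrees of freedom):

```python
# Likelihood ratio test for the clock, using the slide's numbers.
ln_l1 = -2616.86   # unconstrained tree
ln_l0 = -2679.0    # clock-constrained tree

statistic = 2 * (ln_l1 - ln_l0)
print(statistic)              # 2 × 62.14 = 124.28

chi2_crit_12df_95 = 21.026    # standard table value
print(statistic > chi2_crit_12df_95)   # True: the clock is rejected
```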


An example – samples from a Poisson distribution

Suppose we have m samples from a Poisson distribution whose (unknown) mean parameter is λ. Suppose the numbers of events we see are n1, n2, . . . , nm. The likelihood is

L = [ e^(−λ) λ^(n1) / n1! ] × [ e^(−λ) λ^(n2) / n2! ] × . . . × [ e^(−λ) λ^(nm) / nm! ]

collecting powers and exponentials, this becomes

L = e^(−mλ) λ^(n1 + n2 + . . . + nm) / (lots of factorials)

Taking logarithms, which makes it easier:

ln L = −mλ + ( Σ ni ) ln λ + (stuff not involving λ)

Can you differentiate this and set the derivative to zero? The MLE is just the average number of events:

λ̂ = ( Σ ni ) / m
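A quick sketch with made-up counts confirms that the MLE is the sample mean:

```python
import math

# Illustrative counts (made up); the MLE of lambda is just their mean.
counts = [3, 0, 2, 5, 1, 4]
m = len(counts)
lam_hat = sum(counts) / m
print(lam_hat)    # 2.5

# Numerical cross-check of ln L(lambda), constants dropped:
def log_lik(lam):
    return -m * lam + sum(counts) * math.log(lam)

grid = [i / 1000 for i in range(1, 10001)]
best = max(grid, key=log_lik)
print(best)       # 2.5, agreeing with the mean
```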


Scale-invariance of ML estimates

In the case of a tree with one branch, whose length can be expressed either by the (pseudo-)time t or the probability of base change p, the value of p which achieves the highest likelihood corresponds exactly to the value of t which achieves the highest likelihood, so it doesn't matter which scale we work on, as long as one can be translated into the other.

[Figure: ln L plotted against t, peaking at t̂ = 0.383112 (p̂ = 0.3), and ln L plotted against p, peaking at p̂ = 0.3 (t̂ = 0.383112); the two peaks correspond]



Bayesian inference

Bayesian inference uses likelihoods, but has a prior distribution on the unknown parameters.

In theory it just multiplies the prior density by the likelihood curve,

... then it takes the resulting curve and restandardizes it so the area under it is 1. That is the posterior, the very thing we need.

In practice, for complex models, Markov chain Monte Carlo (MCMC) methods are used to wander in the parameter space and take a large enough sample from the posterior.

The controversy between Bayesians and non-Bayesians is really over just one thing: whether assuming you know the prior is justified.

If the prior is flat in that region, the highest point on the likelihood curve (i.e., the MLE) is also the peak of the posterior density.
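The MCMC idea can be sketched with a minimal Metropolis sampler for the coin example (the proposal width, prior rate, and run lengths are all illustrative choices, not any program's defaults):

```python
import math
import random

# Metropolis sampling from the posterior of the Heads probability p,
# given 5 heads in 11 tosses and a truncated-exponential prior
# (rate 2 is an assumed value; the slides do not specify one).
random.seed(1)

def log_post(p):
    if not 0.0 < p < 1.0:
        return -math.inf                     # outside [0, 1]: reject
    return 5 * math.log(p) + 6 * math.log(1 - p) - 2.0 * p

p = 0.5
samples = []
for _ in range(20000):
    prop = p + random.uniform(-0.1, 0.1)     # symmetric proposal
    if math.log(random.random()) < log_post(prop) - log_post(p):
        p = prop                             # accept the move
    samples.append(p)

kept = samples[5000:]                        # discard burn-in
print(sum(kept) / len(kept))                 # posterior mean of p
```

The sampler wanders the parameter space and, after burn-in, its visits are a sample from the posterior; the sample mean lands near the MLE of 5/11, pulled slightly down by the prior.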
