Week 10: Coalescents, Consensus trees, etc.

Genome 570, March 2016

evolution.gs.washington.edu/gs570/2016/week10.pdf


Cann, Stoneking, and Wilson

Becky Cann Mark Stoneking the late Allan Wilson

Cann, R. L., M. Stoneking, and A. C. Wilson. 1987. Mitochondrial DNA and human evolution. Nature 325: 31-36.


Mitochondrial Eve


Gene copies in a population of 10 individuals

Time

A random−mating population

Going back one generation

... and one more (repeated over ten slides)

Time

A random−mating population


The genealogy of gene copies is a tree

Time

Genealogy of gene copies, after reordering the copies


Ancestry of a sample of 3 copies

Time

Genealogy of a small sample of genes from the population


Here is that tree of 3 copies in the pedigree

Time


Kingman’s coalescent

Random collision of lineages as we go back in time (sans recombination)

Collision is faster the smaller the effective population size

[Figure: a coalescent genealogy with labels u9, u8, u7, u6, u5, u4, u3, u2]

In a diploid population of effective population size N,

Average time for k copies to coalesce to k−1 = $\frac{4N}{k(k-1)}$ generations

Average time for n copies to coalesce = $4N\left(1 - \frac{1}{n}\right)$ generations

Average time for two copies to coalesce = 2N generations

What’s misleading about this diagram: the lineages that coalesce are random pairs, not necessarily ones that are next to each other in a linear order.
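
As a quick check on these averages, the Python sketch below sums the expected waiting times 4N/(k(k−1)) for k = n down to 2 and compares the total with 4N(1 − 1/n). The function names and the value N = 1000 are illustrative choices, not part of the slides.

```python
def mean_interval(k, N):
    """Expected generations for k copies to coalesce to k - 1 (diploid, size N)."""
    return 4.0 * N / (k * (k - 1))


def mean_time_to_common_ancestor(n, N):
    """Sum of the expected intervals for k = n, n - 1, ..., 2."""
    return sum(mean_interval(k, N) for k in range(2, n + 1))


N = 1000  # illustrative effective population size
for n in (2, 10, 50):
    # The two printed numbers should agree: summed intervals vs. 4N(1 - 1/n).
    print(n, mean_time_to_common_ancestor(n, N), 4.0 * N * (1.0 - 1.0 / n))
```

The last interval (two copies) alone contributes 2N generations, about half of the total for any sample size.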


The Wright-Fisher model

This is the canonical model of genetic drift in populations. It was invented in 1930 and 1932 by Sewall Wright and R. A. Fisher.

In this model the next generation is produced by doing this:

1. Choose two individuals with replacement (including the possibility that they are the same individual) to be parents,

2. Each produces one gamete, these become a diploid individual,

3. Repeat these steps until N diploid individuals have been produced.

The effect of this is to have each locus in an individual in the next generation consist of two genes sampled from the parents’ generation at random, with replacement.
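
The sampling scheme above can be sketched in a few lines of Python. This is a minimal illustration under the stated model only; the function name `next_generation`, the labelling of gene copies by integers, and the choice of 10 individuals and 5 generations are assumptions made for the example.

```python
import random


def next_generation(parents, rng):
    """One Wright-Fisher generation.

    `parents` is a list of N diploid individuals, each a pair of gene copies.
    Each offspring gets one gamete from each of two parents chosen with
    replacement (the two parents may be the same individual).
    """
    offspring = []
    for _ in range(len(parents)):
        first = rng.choice(parents)
        second = rng.choice(parents)
        offspring.append((rng.choice(first), rng.choice(second)))
    return offspring


rng = random.Random(1)
# Label the 2N gene copies 0 .. 2N-1 so their descendants can be followed.
population = [(2 * i, 2 * i + 1) for i in range(10)]
for _ in range(5):
    population = next_generation(population, rng)
surviving = {copy for individual in population for copy in individual}
print(len(surviving), "of 20 original gene copies still have descendants")
```

Running it for more generations shows the number of surviving original copies shrinking, which is genetic drift seen forward in time.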


The coalescent – a derivation

The probability that k lineages become k − 1 one generation earlier turns out to be (as each lineage “chooses” its ancestor independently):

$$\frac{k(k-1)}{2} \times \mathrm{Prob}\,(\text{first two have same parent, rest are different})$$

(since there are $\binom{k}{2} = k(k-1)/2$ different pairs of copies)

We add up terms, all the same, for the k(k − 1)/2 pairs that could coalesce; the sum is:

$$\frac{k(k-1)}{2} \times 1 \times \frac{1}{2N} \times \left(1 - \frac{1}{2N}\right) \times \left(1 - \frac{2}{2N}\right) \times \cdots \times \left(1 - \frac{k-2}{2N}\right)$$

so that the total probability that a pair coalesces is

$$= \frac{k(k-1)}{4N} + O(1/N^2)$$


Can probabilities of two or more lineages coalescing

Note that the total probability that some combination of lineages coalesces is

$$1 - \mathrm{Prob}\,(\text{all genes have separate ancestors})$$

$$= 1 - \left[ 1 \times \left(1 - \frac{1}{2N}\right) \left(1 - \frac{2}{2N}\right) \cdots \left(1 - \frac{k-1}{2N}\right) \right]$$

$$= 1 - \left[ 1 - \frac{1 + 2 + 3 + \cdots + (k-1)}{2N} + O(1/N^2) \right]$$

and since $1 + 2 + 3 + \cdots + (n-1) = n(n-1)/2$, the quantity

$$= 1 - \left[ 1 - k(k-1)/4N + O(1/N^2) \right] \simeq k(k-1)/4N + O(1/N^2)$$


Can calculate how many coalescences are of pairs

This shows, since the terms of order 1/N are the same, that the events involving 3 or more lineages simultaneously coalescing are in the terms of order $1/N^2$ and thus become unimportant if N is large.

Here are the probabilities of 0, 1, or more coalescences with 10 lineages in populations of different sizes:

N        0            1            > 1
100      0.79560747   0.18744678   0.01694575
1000     0.97771632   0.02209806   0.00018562
10000    0.99775217   0.00224595   0.00000187

Note that increasing the population size by a factor of 10 reduces the coalescent rate for pairs by about 10-fold, but reduces the rate for triples (or more) by about 100-fold.
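
The table can be reproduced from the exact one-generation products in the derivation. The sketch below is one way to do that in Python; the function name is made up for the example, and "1" is taken to mean exactly one pair coalescing while all other lineages keep separate ancestors. The last printed column is the approximation k(k−1)/4N.

```python
def coalescence_probabilities(k, N):
    """Exact one-generation probabilities of 0, exactly 1, and more than 1
    coalescence among k lineages when each picks one of 2N parent copies."""
    two_n = 2.0 * N
    none = 1.0
    for i in range(1, k):          # all k lineages pick distinct ancestors
        none *= 1.0 - i / two_n
    one_pair = k * (k - 1) / 2.0 / two_n
    for i in range(1, k - 1):      # the other k - 2 lineages stay distinct
        one_pair *= 1.0 - i / two_n
    return none, one_pair, 1.0 - none - one_pair


for N in (100, 1000, 10000):
    p0, p1, p_more = coalescence_probabilities(10, N)
    print(f"{N:6d} {p0:.8f} {p1:.8f} {p_more:.8f}  approx {10 * 9 / (4.0 * N):.8f}")
```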


The coalescent

To simulate a random genealogy, do the following:

1. Start with k lineages

2. Draw an exponential time interval with mean 4N/(k(k − 1)) generations.

3. Combine two randomly chosen lineages.

4. Decrease k by 1.

5. If k = 1, then stop

6. Otherwise go back to step 2.
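
The six steps above translate almost line for line into code. The following Python sketch is one possible rendering; the event-tuple format and the integer labels for lineages are choices made for the example rather than anything prescribed by the slides.

```python
import random


def simulate_coalescent(n, N, rng):
    """Simulate a genealogy of n sampled copies (diploid population size N).

    Returns a list of (time, left, right, new) events: lineages `left` and
    `right` merge into lineage `new` at `time` generations before the present.
    """
    lineages = list(range(n))      # step 1: tips are labelled 0 .. n-1
    next_label = n
    time = 0.0
    events = []
    k = n
    while k > 1:                   # steps 5 and 6: loop until one lineage is left
        mean = 4.0 * N / (k * (k - 1))
        time += rng.expovariate(1.0 / mean)      # step 2: exponential interval
        left, right = rng.sample(lineages, 2)    # step 3: random pair
        lineages.remove(left)
        lineages.remove(right)
        lineages.append(next_label)
        events.append((time, left, right, next_label))
        next_label += 1
        k -= 1                                   # step 4
    return events


events = simulate_coalescent(5, 1000, random.Random(42))
for event in events:
    print(event)
```

Because the pair in step 3 is drawn uniformly from all current lineages, the merging lineages are random pairs, not neighbours in any fixed order.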


An accurate analogy: Bugs In A Box

There is a box ... with bugs that are ... hyperactive, ... indiscriminate, ... voracious ... (eats other bug: Gulp!) ... and insatiable.


Random coalescent trees with 16 lineages

[Figure: eight random coalescent trees; each has tips labelled A through T in a different order]


Coalescence is faster in small populations

Change of population size and coalescents

[Figure, three panels against time: Ne; coalescence events; the tree]

The changes in population size will produce waves of coalescence.

The parameters of the growth curve for Ne can be inferred by likelihood methods as they affect the prior probabilities of those trees that fit the data.


Migration can be taken into account

[Figure: a genealogy across population #1 and population #2; axis: Time]


Recombination creates loops

Recomb.

Different markers have slightly different coalescent trees


If we have a sample of 50 copies

50−gene sample in a coalescent tree


The first 10 account for most of the branch length

10 genes sampled randomly out of a 50−gene sample in a coalescent tree


... and when we add the other 40 they add less length

10 genes sampled randomly out of a 50−gene sample in a coalescent tree (purple lines are the 10−gene tree)
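
A rough calculation supports this picture. Assuming the standard coalescent with expected interval 4N/(k(k−1)) while k lineages exist, each interval contributes k branches, so the expected total branch length is a sum over k. The Python sketch below compares a sample of 10 with a sample of 50; the function name and N = 1000 are illustrative, and the comparison is in expectation only, not for any particular tree.

```python
def expected_tree_length(n, N):
    """Expected total branch length (generations) of a coalescent tree of n
    copies: each interval with k lineages lasts 4N/(k(k-1)) on average and
    contributes k branches."""
    return sum(k * 4.0 * N / (k * (k - 1)) for k in range(2, n + 1))


N = 1000  # illustrative effective population size
fraction = expected_tree_length(10, N) / expected_tree_length(50, N)
print(f"10 of 50 copies: {fraction:.2f} of the expected total length")
```

The ratio does not depend on N, since N scales both trees equally.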


We want to be able to analyze human evolution

Africa

Europe Asia

"Out of Africa" hypothesis

(vertical scale is not time or evolutionary change)


coalescent and “gene trees” versus species trees

The species tree

Consistency of gene tree with species tree

[Figure sequence over five slides; the last slide also marks the coalescence time]


If the branch is more than Ne generations long ...

[Figure, repeated over three slides: Gene tree and Species tree, with labels t1, t2 and N1, N2, N3, N4, N5]


How do we compute a likelihood for a population sample?

[Figure: a population sample of aligned sequences, each one of CAGTTTTAGCGTCC, CAGTTTCAGCGTCC, CAGTTTTGGCGTCC, or CAGTTTCAGCGTAC]

L = Prob ( CAGTTTCAGCGTCC, CAGTTTCAGCGTCC, ... ) = ??


If we have a tree for the sample sequences, we can

[Figure: sample sequences at the tips of a genealogy]

so we can compute

Prob ( CAGTTTCAGCGTCC, CAGTTTCAGCGTCC, ... | Genealogy)

but how to compute the overall likelihood from this?


The basic equation for coalescent likelihoods

In the case of a single population with parameters

Ne   effective population size
µ    mutation rate per site

and assuming G′ stands for a coalescent genealogy and D for the sequences,

$$L = \mathrm{Prob}\,(D \mid N_e, \mu) = \sum_{G'} \underbrace{\mathrm{Prob}\,(G' \mid N_e)}_{\text{Kingman's prior}} \; \underbrace{\mathrm{Prob}\,(D \mid G', \mu)}_{\text{likelihood of tree}}$$


Rescaling the branch lengths

Rescaling branch lengths of G′ so that branches are given in expected mutations per site, $G = \mu G'$, we get (if we let $\Theta = 4N_e\mu$)

$$L = \sum_{G} \mathrm{Prob}\,(G \mid \Theta) \; \mathrm{Prob}\,(D \mid G)$$

as the fundamental equation. For more complex population scenarios one simply replaces Θ with a vector of parameters.


The variability comes from two sources

(1) Randomness of mutation: affected by the mutation rate µ; can reduce variance of number of mutations per site per branch by examining more sites

(2) Randomness of coalescence of lineages: affected by effective population size Ne; coalescence times allow estimation of Ne; can reduce variability by looking at (i) more gene copies, or (ii) more loci


Computing the likelihood: averaging over coalescents

The likelihood calculation in a sample of two gene copies

[Figure, repeated over four slides: panels for the prior probability of t, the likelihood of t, and the likelihood of Θ]

The product of the prior on t, times the likelihood of that t from the data, when integrated over all possible t’s, gives the likelihood for the underlying parameter.
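
For two gene copies this integral can be evaluated numerically on a grid of t values. The Python sketch below is an illustration under assumptions that are not taken from the slides: t is measured in expected mutations per site, so the prior on t is exponential with mean Θ/2; and the data are summarised as s differences between two sequences of `n_sites` sites, with the number of differences treated as Poisson with mean 2·t·n_sites. A real analysis would use a proper substitution model for the likelihood of t.

```python
import math


def two_copy_likelihood(theta, s, n_sites, n_grid=20000):
    """Integrate prior(t | theta) * likelihood(t | data) over t (trapezoid rule)."""
    rate = 2.0 / theta                 # prior on t: exponential with mean theta / 2
    t_max = 40.0 / rate                # far enough out that the tail is negligible
    dt = t_max / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        t = i * dt
        prior = rate * math.exp(-rate * t)
        mean = 2.0 * n_sites * t       # expected differences along both branches
        lik = math.exp(-mean) * mean ** s / math.factorial(s)
        weight = 0.5 if i in (0, n_grid) else 1.0
        total += weight * prior * lik * dt
    return total


# Hypothetical data: 3 differences between two sequences of 500 sites.
for theta in (0.002, 0.005, 0.01, 0.02):
    print(theta, two_copy_likelihood(theta, s=3, n_sites=500))
```

Evaluating the function over a range of Θ values traces out a likelihood curve for Θ, which is the quantity the product of prior and likelihood is integrated to obtain.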


Labelled histories

Labelled Histories (Edwards, 1970; Harding, 1971)

Trees that differ in the time−ordering of their nodes

[Figure: two pairs of trees on tips A, B, C, D, one pair captioned "These two are the same:" and the other "These two are different:"]
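
To give a sense of how much larger the space of labelled histories is than the space of tree topologies, the Python sketch below evaluates two standard counting formulas that are not stated on the slide: n!(n−1)!/2^(n−1) labelled histories and (2n−3)!! rooted bifurcating labelled topologies for n tips. The function names are made up for the example.

```python
from math import factorial


def labelled_histories(n):
    """Number of labelled histories for n tips: n!(n-1)!/2^(n-1)."""
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


def rooted_topologies(n):
    """Number of rooted bifurcating labelled topologies: (2n-3)!!."""
    count = 1
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count


for n in (3, 4, 5, 10):
    print(n, rooted_topologies(n), labelled_histories(n))
```

For three tips the two counts agree; from four tips on, some topologies correspond to more than one time-ordering of their nodes.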


Sampling approaches to coalescent likelihood

Bob Griffiths Simon Tavaré Mary Kuhner and Jon Yamato


Monte Carlo integration

To get the area under a curve, we can either evaluate the function (f(x)) at a series of grid points and add up heights × widths:

or we can sample at random the same number of points, add up height × width:
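
Both approaches can be compared directly. In the Python sketch below the integrand x² on [0, 1] (true area 1/3), the number of points, and the function names are arbitrary choices for illustration.

```python
import random


def grid_integral(f, a, b, n):
    """Add up heights x widths at n evenly spaced midpoints."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def monte_carlo_integral(f, a, b, n, rng):
    """Add up heights x widths at n uniformly random points."""
    width = (b - a) / n
    return sum(f(rng.uniform(a, b)) for _ in range(n)) * width


def square(x):
    return x * x


print(grid_integral(square, 0.0, 1.0, 1000))
print(monte_carlo_integral(square, 0.0, 1.0, 1000, random.Random(0)))
```

In one dimension the grid is more accurate for the same number of points; random sampling becomes attractive when the space is too large to lay a grid over, as with the space of genealogies.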


Importance sampling

[Figure: the function we integrate, f(x), and the density we sample from, g(x)]


The math of importance sampling

$$\int f(x)\,dx = \int \frac{f(x)}{g(x)}\, g(x)\,dx = E_g\left[\frac{f(x)}{g(x)}\right]$$

which is the expectation for points sampled from g(x) of the ratio f(x)/g(x).

This is approximated by sampling a lot (n) of points from g(x) and then computing the average:

$$L = \frac{1}{n} \sum_{i=1}^{n} \frac{f(x_i)}{g(x_i)}$$
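
A small numerical example of this average, as a sketch only: the integrand f(x) = x² e^(−x) on [0, ∞), whose integral is 2, and the exponential sampling density g(x) = e^(−x) are choices made for illustration, as are the function and variable names.

```python
import math
import random


def importance_sampling_integral(f, g_density, g_sample, n, rng):
    """Average f(x_i) / g(x_i) over n points x_i drawn from the density g."""
    total = 0.0
    for _ in range(n):
        x = g_sample(rng)
        total += f(x) / g_density(x)
    return total / n


def f(x):
    return x * x * math.exp(-x)       # integral over [0, inf) is 2


def g_density(x):
    return math.exp(-x)               # exponential density with mean 1


def g_sample(rng):
    return rng.expovariate(1.0)       # draws from g_density


estimate = importance_sampling_integral(f, g_density, g_sample, 100000, random.Random(0))
print(estimate)
```

The closer g(x) is to being proportional to f(x), the less the ratio varies from point to point and the fewer samples are needed.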


Rearrangement to sample points in tree space

A conditional coalescent rearrangement strategy


Dissolving a branch and regrowing it backwards

First pick a random node (interior or tip) and remove its subtree


We allow it to coalesce with the other branches

Then allow this node to re−coalesce with the tree


and this gives another coalescent

The resulting tree proposed by this process


The resulting likelihood ratio is

$$\frac{L(\Theta)}{L(\Theta_0)} = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathrm{Prob}\,(G_i \mid \Theta)}{\mathrm{Prob}\,(G_i \mid \Theta_0)}$$

(“Wait a second – where in this expression is the data?”) It’s in the sampling that gives you the $G_i$: the data biases those samples in the correct way.
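
The average itself is simple to compute once genealogies have been sampled. The Python sketch below assumes, for illustration, that each sampled genealogy is summarised by its coalescent intervals in mutational units and that Prob(G | Θ) is the product over intervals of (2/Θ) exp(−k(k−1)u/Θ), where u is the interval length while k lineages exist. The interval values are made up; in practice they would come from an MCMC run at Θ0 that conditions on the data.

```python
import math


def log_prob_genealogy(intervals, theta):
    """log Prob(G | Theta) for coalescent intervals in mutational units.

    intervals[0] is the time during which all n lineages exist, intervals[1]
    the time with n - 1 lineages, and so on down to 2 lineages.
    """
    n = len(intervals) + 1
    total = 0.0
    for i, u in enumerate(intervals):
        k = n - i
        total += math.log(2.0 / theta) - k * (k - 1) * u / theta
    return total


def relative_likelihood(sampled_intervals, theta, theta0):
    """Average of Prob(G_i | Theta) / Prob(G_i | Theta0) over sampled G_i."""
    ratios = [
        math.exp(log_prob_genealogy(g, theta) - log_prob_genealogy(g, theta0))
        for g in sampled_intervals
    ]
    return sum(ratios) / len(ratios)


# Made-up interval sets standing in for genealogies an MCMC run would sample.
samples = [[0.001, 0.002, 0.004], [0.0005, 0.003, 0.006], [0.002, 0.001, 0.008]]
for theta in (0.005, 0.01, 0.02):
    print(theta, relative_likelihood(samples, theta, theta0=0.01))
```

Working with log probabilities and exponentiating only the difference keeps the ratio stable when the individual densities are very large or very small.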


An example of an MCMC likelihood curve

[Figure: ln L (vertical axis, 0 down to −80) plotted against Θ (horizontal axis, ticks at 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1), with the value 0.00650776 marked]

Results of analysing a data set with 50 sequences of 500 bases which was simulated with a true value of Θ = 0.01


Major MCMC likelihood or Bayesian programs

LAMARC by Mary Kuhner and Jon Yamato and others. Likelihood inference with multiple populations, recombination, migration, population growth. No historical branching events or serial sampling, yet.

BEAST by Andrew Rambaut, Alexei Drummond and others. Bayesian inference with multiple populations related by a tree. Support for serial sampling (no migration or recombination yet).

genetree by Bob Griffiths and Melanie Bahlo. Likelihood inference of migration rates and changes in population size. No recombination or historical branching events.

migrate by Peter Beerli. Likelihood inference with multiple populations and migration rates. No recombination or historical branching events yet.

IM and IMa by Rasmus Nielsen and Jody Hey. Two or more populations allowing both historical splitting and migration after that. No recombination yet.


Trees we will use for consensus trees

DA C B E D FG A CG F B E D A CG F B E

Week 10: Coalescents, Consensus trees, etc. – p.71/87

Page 72: Week 10: Coalescents, Consensus trees, etc.evolution.gs.washington.edu/gs570/2016/week10.pdfWeek10:Coalescents,Consensustrees,etc.–p.50/87 The basic equation for coalescent likelihoods

Trees we will use for consensus trees

A C B E D FG A CG F B E D A CG F B E D

(for unrooted trees we would use partitions induced by branches insteadof clades)


(Do we count the highlighted clade if the trees are considered rooted? If they are considered unrooted?)


Here is a clade that is found on only two of the trees, so it is not included in the Strict Consensus Tree.
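The rule for rooted trees can be sketched in a few lines of Python: a clade goes into the strict consensus only if it occurs on every tree. This is a minimal sketch, not the code of any particular program. The nested-tuple tree encoding, the function names, and the three five-tip trees are all hypothetical (the topologies of the slide's figure are not reproduced here).

```python
def clades(tree):
    """Return the clades (frozensets of tip names) of a rooted tree
    written as nested tuples, e.g. (("A", "B"), "C")."""
    found = set()

    def tips(node):
        if isinstance(node, str):          # a tip
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.add(below)                   # every interior node defines a clade
        return below

    tips(tree)
    return found


def strict_consensus_clades(trees):
    """Clades present on every one of the input trees."""
    return set.intersection(*(clades(t) for t in trees))


# Hypothetical example trees (not the trees on the slide).
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("D", "E")),
    (("A", ("B", "C")), ("D", "E")),
]

shared = strict_consensus_clades(trees)
for clade in sorted(shared, key=lambda c: (len(c), sorted(c))):
    print("".join(sorted(clade)))
```

In this example the clade AB occurs on only two of the three trees, so it is dropped, while DE, ABC and the full set of tips remain.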


Their strict consensus tree

[Figure: the strict consensus tree, on the tips A through G]


A distressing case for the strict consensus tree

[Figure: two trees, with tips in the orders A B C D E F G and B C D E F G A]

Only one species moves ...


[Figure: the strict consensus of the two trees, on the tips A B C D E F G]

... but the strict consensus tree becomes totally unresolved.
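A small check of this effect, as a sketch. It assumes the two trees are rooted "caterpillar" trees with the tip orders A B C D E F G and B C D E F G A; that shape is a guess from the tip orders, and the helper names are hypothetical.

```python
def clades(tree):
    """Clades (frozensets of tip names) of a rooted nested-tuple tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.add(below)
        return below

    tips(tree)
    return found


def caterpillar(order):
    """Build a rooted caterpillar tree: the first two tips form a pair,
    and each later tip is attached one step closer to the root."""
    tree = (order[0], order[1])
    for tip in order[2:]:
        tree = (tree, tip)
    return tree


first = caterpillar("ABCDEFG")
second = caterpillar("BCDEFGA")   # the same tree with only species A moved

all_tips = frozenset("ABCDEFG")
shared_nontrivial = (clades(first) & clades(second)) - {all_tips}
print(len(shared_nontrivial))     # number of clades left to resolve the consensus
```

Under these assumptions every clade of the first tree contains A together with B, while no clade of the second tree short of the whole tree does, so no nontrivial clade is shared.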


Majority-rule consensus tree

[Figure: the majority-rule consensus tree on the tips A through G, with clade support values 100, 100, 67, 67, and 100]
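A minimal sketch of the majority-rule idea for rooted trees: count how many trees each clade occurs on, and keep the clades found on more than half of them, reporting the percentage. The tree encoding, function names, and example trees are hypothetical, and the support values it prints are for those example trees, not for the figure.

```python
from collections import Counter


def clades(tree):
    """Clades (frozensets of tip names) of a rooted nested-tuple tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.add(below)
        return below

    tips(tree)
    return found


def majority_rule_clades(trees, threshold=0.5):
    """Map each clade found on more than `threshold` of the trees
    to its support, in percent of trees."""
    counts = Counter(c for t in trees for c in clades(t))
    n = len(trees)
    return {c: 100 * k / n for c, k in counts.items() if k / n > threshold}


# Hypothetical example trees (not the trees on the slide).
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("D", "E")),
    (("A", ("B", "C")), ("D", "E")),
]

support = majority_rule_clades(trees)
for clade, percent in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(clade)), round(percent))
```

Clades kept this way are always mutually compatible, since any two clades that each occur on more than half of the trees must occur together on at least one tree.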


The Adams consensus tree

For rooted trees, Adams (1972, 1986) suggested:

1. Take all rooted triples on each tree.

2. Retain those that are not contradicted, where lack of resolution does not count as contradiction.

3. Construct a tree of these.
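Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not Adams's own algorithm: trees are hypothetical nested tuples on the same tips, a rooted triple is recorded as the pair of tips that are closest relatives, and triples that are unresolved on every tree are skipped because they carry no information. Step 3 (building the tree) is not attempted here.

```python
from itertools import combinations


def clades(tree):
    """Clades (frozensets of tip names) of a rooted nested-tuple tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.add(below)
        return below

    tips(tree)
    return found


def triple_topology(tree_clades, triple):
    """Return the pair that is most closely related among three tips,
    or None if the tree leaves the triple unresolved."""
    triple = frozenset(triple)
    for clade in sorted(tree_clades, key=len):   # smallest clades first
        inside = clade & triple
        if len(inside) == 2:
            return inside
        if len(inside) == 3:
            return None
    return None


def uncontradicted_triples(trees):
    """Triples resolved the same way on every tree that resolves them."""
    clade_sets = [clades(t) for t in trees]
    tips = sorted(max(clade_sets[0], key=len))   # the largest clade is all tips
    kept = {}
    for triple in combinations(tips, 3):
        resolved = {triple_topology(cs, triple) for cs in clade_sets} - {None}
        if len(resolved) == 1:                   # no two trees disagree
            kept[triple] = resolved.pop()
    return kept


# Hypothetical example trees (not the trees on the slide).
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    (("A", ("B", "C")), ("D", "E")),
]
kept = uncontradicted_triples(trees)
print(len(kept), "of 10 triples retained")
```

In this example only the triple A, B, C is contradicted: one tree pairs A with B and the other pairs B with C.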


Two of the possible triples to examine

[Figure: the three trees on the tips A through G, with one triple marked in green and one in red]

The green triple shows the same rooted topology on all three trees. The red triple is contradicted and does not get used in the Adams Consensus Tree.


The Adams consensus tree

[Figure: the Adams consensus tree, on the tips A through G]


Steel, Böcker, and Dress’s shocking disproof

Steel, M., S. Böcker, and A. W. M. Dress. 2000. Simple but fundamental limits for supertree and consensus tree methods. Systematic Biology 49(2): 363-368.

They put forward three minimal requirements for an unrooted Adams-like consensus tree based on observations of quartets, rather than triples. Note that a quartet, like a triple, has three possible topologies, but unrooted ones: ((A,B),(C,D)) and ((A,C),(B,D)) and ((A,D),(B,C)).

1. The result shouldn’t be altered by relabelling all the species in a consistent way.

2. The result should not depend on the order in which the trees are input.

3. If a quartet appears in all trees, it should appear in the consensus.

Alas, they then show there is no consensus tree method for unrooted trees that can satisfy all of these!
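To make the quartet objects concrete, here is a small sketch that lists the three unrooted topologies of a quartet and collects the quartets that are resolved identically on all input trees, which is the premise of the third requirement. It does not reproduce the impossibility argument of Steel, Böcker, and Dress. The nested-tuple encoding (read as an unrooted tree by ignoring the root), the function names, and the example trees are hypothetical.

```python
from itertools import combinations


def clades(tree):
    """Tip sets below each interior node of a nested-tuple tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.add(below)
        return below

    tips(tree)
    return found


def quartet_topologies(a, b, c, d):
    """The three possible unrooted topologies for four tips."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def induced_quartet(tree, four):
    """The split of four tips shown by the tree, or None if unresolved.
    A branch separates the tips in a clade from all the others, so a clade
    holding exactly two of the four tips gives the quartet topology."""
    four = frozenset(four)
    for clade in clades(tree):
        inside = clade & four
        if len(inside) == 2:
            return frozenset([inside, four - inside])
    return None


def quartets_in_all_trees(trees):
    """Quartets resolved, and resolved the same way, on every tree."""
    tips = sorted(max(clades(trees[0]), key=len))
    shared = {}
    for four in combinations(tips, 4):
        seen = {induced_quartet(t, four) for t in trees}
        if len(seen) == 1 and None not in seen:
            shared[four] = seen.pop()
    return shared


# Hypothetical example trees (not from the paper or the slides).
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    (("A", ("B", "C")), ("D", "E")),
]
shared = quartets_in_all_trees(trees)
print(len(shared), "of 5 quartets agree on both trees")
```

A method meeting the third requirement would have to show every quartet in `shared` in its output tree.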


A consensus subtree

[Figure: three trees, on the tips F C A B G D, F C A B D E, and F A B D E]


[Figure: the consensus subtree, on the tips A, B, D, and F]


A supertree

[Figure: a supertree on the tips F C A B G D, and three smaller trees on the tips F C A B, A B G D, and C A B D]

Construct a tree with all tips, for which each of the smaller trees is a subtree. What to do if there is conflict? There are various suggestions.
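The condition "each of the smaller trees is a subtree" can be checked mechanically, as in this sketch: prune the candidate supertree down to the tips of a smaller tree and compare clades. It assumes rooted trees and an exact match after pruning, which is one possible reading. Only the tip sets are taken from the slide; the topologies, the encoding, and the function names are hypothetical, and this does not construct a supertree or resolve conflict.

```python
def clades(tree):
    """Clades (frozensets of tip names) of a rooted nested-tuple tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.add(below)
        return below

    tips(tree)
    return found


def tips_of(tree):
    """Set of tip names of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips_of(child) for child in tree))


def restrict(tree, keep):
    """Prune the tree to the tips in `keep`, removing nodes that are
    left with a single child. Returns None if no tip survives."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [r for r in (restrict(child, keep) for child in tree)
                if r is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]
    return tuple(children)


def displays(supertree, subtree):
    """True if pruning the supertree to the subtree's tips gives the subtree."""
    pruned = restrict(supertree, tips_of(subtree))
    return pruned is not None and clades(pruned) == clades(subtree)


# Hypothetical topologies on the tip sets shown on the slide.
supertree = ("F", ("C", (("A", "B"), ("G", "D"))))
small_trees = [
    ("F", ("C", ("A", "B"))),
    (("A", "B"), ("G", "D")),
    ("C", (("A", "B"), "D")),
]
conflicting = (("A", "C"), ("B", "D"))

print(all(displays(supertree, t) for t in small_trees))
print(displays(supertree, conflicting))
```

When some input tree conflicts with the others, as `conflicting` does here, no tree can pass this check for all inputs, which is the situation the slide's question is about.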
