Coalescent Models for Genetic Demography What can the Coalescent do for you? Rosalind Harding University of Oxford


Page 1

Coalescent Models for Genetic Demography

What can the Coalescent do for you?

Rosalind Harding, University of Oxford

Page 2

Who was mtEve?

The most recent common ancestor (MRCA) to whom all currently sampled mtDNA haplotype diversity can be traced.

Page 3

One possibility: First a bottleneck, then multiple lineages are established during expansion phases

MtEve

Page 4

But if there wasn’t a bottleneck? Then:

our predecessors collecting data 20,000 years ago could have identified a different mtEve, an Eve from an earlier generation;

in 20,000 years’ time, a new generation will be likely to find their mtEve to be a grand^n-daughter of our mtEve.

While our mtEve may be special to us, for archaeogeneticists of past and future generations she will have no particular significance!

Page 5

Insights from coalescent models

[Figure: genealogy drawn against a time axis running to the present, with three alternative ancestors each labelled "Eve?"]

Page 6

What is the coalescent?

A simple model which generates a probability distribution for gene genealogies sampled from a population.

Page 7

Further definitions

simple models: abstractions from complex demographic reality, which preserve key features

population: all individuals within a generation with the potential to contribute to the gene pool (including individuals who are reproductively successful as well as those who are not)

gene genealogies: lineages of transmission of copies of a gene from parents to offspring

coalescence: the point where two transmission lineages find a common ancestor, looking backwards in time

probability distribution: a set of probabilities for many possible alternative gene genealogies compatible with the model

Page 8

Models and data

Interpreting genetic polymorphism data: consider a sample of genes from a contemporary population, with their allele frequencies and sequence identities determined. These data do not reveal our genetic past directly; they must be interpreted.

Options for model choice:

evolution as phylogeny, phylo-geography

evolution as a balance of mutation and genetic drift in a population with a specified demography (population size, mating pattern, offspring distribution)

Page 9

Characteristics of polymorphism data

For a small proportion of sites in human DNA, a second allele is present in populations due to a relatively recent mutation; this is polymorphism.

Polymorphism constitutes a transient phase in evolution, intermediate between the occurrence of a mutation and the fixation of either allele at 100%.

MtDNA trees may distort frequencies of polymorphisms. They show sets of mutation events as a proxy for fixed differences; it is the new allele that is assumed to fix (attain 100%).

These potential sources of error for time scale estimates may be minor but could be substantial.

Page 10

[Figure: neighbor-joining phylogram of 101 mtDNA coding-region sequences. Ingman and Gyllensten, 2003, Genome Research 13:1600-1606.]

Is phylogenetic branching the right model?

Note variable branch lengths and endpoints; yet all individuals sampled in the present!

Page 11

A phylogenetic model with added genealogical detail and molecular clock

Page 12

Trajectories for neutral alleles

Page 13

Understanding genetic drift as genealogy

Only two of the gene copies in generation t are inherited by all of the offspring copies in generation t+x; the other copies in generation t leave no descendants. This is the process of drift that leads eventually to either loss or fixation (100% frequency in the population) of new mutations.

Ne=10, constant over time
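The drift-as-genealogy picture can be reproduced with a short forward simulation. This is a minimal sketch, assuming Wright-Fisher sampling (each offspring copy picks its parent copy uniformly at random) and 10 gene copies per generation; the function name `surviving_ancestors` is hypothetical and not taken from the slides.

```python
import random

def surviving_ancestors(num_copies, generations, rng):
    """Count the gene copies in generation t that still have descendants
    `generations` generations later."""
    # ancestor[i] = index of the generation-t copy that current copy i descends from
    ancestor = list(range(num_copies))
    for _ in range(generations):
        # Assumed Wright-Fisher step: every offspring copy draws one parent at random.
        parents = [rng.randrange(num_copies) for _ in range(num_copies)]
        ancestor = [ancestor[p] for p in parents]
    return len(set(ancestor))

rng = random.Random(1)
for x in (1, 5, 10, 20, 40):
    # The number of contributing ancestors shrinks as x grows and ends at 1.
    print(x, surviving_ancestors(10, x, rng))
```

Each call is an independent replicate, so the printed counts need not decrease monotonically, but for large x every replicate ends with a single ancestral copy.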

Page 14

Some advantages of coalescent models over phylogeny for interpreting polymorphism data

they make better use of molecular clocks and do not treat polymorphisms as fixed differences;

as models of populations they clarify the difference between ‘absence of evidence’ (e.g. for Neanderthal ancestry) and ‘evidence of absence’ (any single locus only represents such a small sample of ancestors from >50,000 years ago that with present data we don’t have the statistical power to rule out Neanderthal ancestry);

they incorporate some measure of our uncertainty about the evolution of allele frequencies (a mixed process of mutation and transmission in genealogies).

Page 15

Assumptions of Kingman’s (1982) coalescent for interpreting polymorphism data (random sample)

1. Neutrality
2. All new mutations unique and informative
3. If individuals are diploid in a population of size N, the model applies to 2N independent, haploid copies of a gene
4. Random mating within a population
5. Constant population size, Ne
6. A very specific probability distribution for transmissions of gene copies to 0, 1, 2 … offspring
7. Non-overlapping generations
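Assumption 6 can be made concrete with a small simulation. The sketch below assumes the Wright-Fisher sampling scheme, under which the number of offspring copies left by one gene copy is binomial and close to Poisson with mean 1 when 2N is large; the slides do not name the distribution here, so treat this as an illustration. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def offspring_counts(num_copies, rng):
    """Offspring number of every parental copy after one Wright-Fisher generation."""
    parents = [rng.randrange(num_copies) for _ in range(num_copies)]
    per_parent = Counter(parents)
    return [per_parent.get(i, 0) for i in range(num_copies)]

def offspring_distribution(num_copies, generations, rng):
    """Observed proportion of copies leaving 0, 1, 2, ... offspring copies."""
    tally = Counter()
    for _ in range(generations):
        tally.update(offspring_counts(num_copies, rng))
    total = sum(tally.values())
    return {k: tally[k] / total for k in sorted(tally)}

def poisson_pmf(k, mean=1.0):
    """Poisson probability of k events, used here only for comparison."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

dist = offspring_distribution(1000, 50, random.Random(3))
for k, p in dist.items():
    print(k, round(p, 4), round(poisson_pmf(k), 4))
```

The mean offspring number is exactly 1 in every generation because population size is held constant (assumption 5).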

Page 16

Aims of coalescent modelling: to make inferences from genetic data

to simulate different demographies to see what to expect in polymorphism data;

to estimate parameters under an explicit demographic model, eg Kingman’s coalescent;

to estimate in which generation (and sub-population) particular lineages coalesced or mutations occurred, given explicit demographic assumptions;

to evaluate the uncertainty in our estimates;

to introduce new parameters to improve the model, judging by its fit to data, to learn about demography.

Page 17

The ancestry of a sample composed of two copies of the gene in generation t0

Following the ancestry of a sample of two copies of a gene (gene A) backwards from time t0, i.e. the present (shown in red), we find their most recent common ancestor (MRCA) at generation t8.

MRCA
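Tracing two copies backwards can be sketched in a few lines. Assuming Wright-Fisher sampling with 2N gene copies, two lineages share a parent copy with probability 1/(2N) in each generation, so the waiting time to their MRCA is geometric with mean 2N generations. The value 2N = 20 and the function name are illustrative assumptions.

```python
import random

def pair_coalescence_time(num_copies, rng):
    """Generations back until two sampled gene copies share a parent copy."""
    generations = 0
    while True:
        generations += 1
        # Each lineage picks its parent uniformly among the 2N copies;
        # they coalesce when both pick the same one.
        if rng.randrange(num_copies) == rng.randrange(num_copies):
            return generations

rng = random.Random(8)
num_copies = 20  # 2N, a hypothetical value chosen for illustration
times = [pair_coalescence_time(num_copies, rng) for _ in range(20000)]
print("mean generations to MRCA:", sum(times) / len(times))
print("shortest and longest:", min(times), max(times))
```

The mean should sit near 2N, while individual genealogies range from one generation to many times the mean.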

Page 18

Expected coalescence times

As the sample size increases towards 2N, E(TMRCA) approaches 4N generations, which equals the expected time to fixation for a newly arisen neutral mutation (given that it does fix).

Expected time to coalescence for n lineages
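The statement that E(TMRCA) approaches 4N follows from the standard constant-size result that, while k lineages remain, the expected waiting time to the next coalescence is 4N / (k(k-1)) generations; summing from k = 2 to n gives 4N(1 - 1/n). The sketch below evaluates these expressions exactly. N = 10,000 is an assumed illustrative value and the function names are hypothetical.

```python
from fractions import Fraction

def expected_interval(k, N):
    """E(T_k) in generations: expected time during which exactly k lineages remain."""
    return Fraction(4 * N, k * (k - 1))

def expected_tmrca(n, N):
    """E(TMRCA) for a sample of n copies, summed interval by interval."""
    return sum(expected_interval(k, N) for k in range(2, n + 1))

def expected_tmrca_closed_form(n, N):
    """The same quantity from the closed form 4N(1 - 1/n)."""
    return Fraction(4 * N) * (1 - Fraction(1, n))

N = 10_000  # assumed diploid population size, for illustration only
for n in (2, 5, 50):
    print(n, float(expected_tmrca(n, N)), float(expected_tmrca_closed_form(n, N)))
# For n = 2N the closed form is just short of 4N generations.
print(float(expected_tmrca_closed_form(2 * N, N)))
```

Most of the expected depth comes from the final interval: E(T_2) = 2N is at least half of E(TMRCA) whatever the sample size.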

Page 19

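[Figure: genealogies drawn against a time axis in three panels: constant N, N expanding, N reducing (population sizes labelled N0 and N1)]

Expected times marked on the figure, for a sample of five lineages:

E(T2) = 2Ne

E(T5) = Ne/5

E(TMRCA) = 4Ne(1 - 1/5)

Thanks to Lounes for this slide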

Page 20

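Simulated genealogies with constant Ne

[Figure: four simulated genealogies, panels 1 to 4]

TMRCA (units of 2Ne generations)

1. 4.57
2. 2.93*
3. 1.48
4. 0.01

* e.g. 2.93 x 2 x 10,000 x 20 = 1.2 million years (the factors correspond to Ne = 10,000 and 20 years per generation)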
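Genealogy depths like these can be generated directly from Kingman's coalescent. In the sketch below, time is measured in units of 2Ne generations, so the waiting time while k lineages remain is exponential with rate k(k-1)/2. The sample size of 10 is an assumption (the slide does not state it); the conversion uses the Ne = 10,000 and 20 years per generation implied by the slide's arithmetic. Function names are hypothetical.

```python
import random

def simulate_tmrca(n, rng):
    """TMRCA of n sampled copies, in units of 2Ne generations (constant Ne)."""
    total = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2  # number of lineage pairs that could coalesce
        total += rng.expovariate(rate)
    return total

def to_years(t_scaled, ne, years_per_generation):
    """Convert a time in units of 2Ne generations into years."""
    return t_scaled * 2 * ne * years_per_generation

rng = random.Random(4)
for i in range(1, 5):
    t = simulate_tmrca(10, rng)  # n = 10 is an assumed sample size
    print(i, round(t, 2), round(to_years(t, 10_000, 20)), "years")

# The slide's worked example: 2.93 units is roughly 1.2 million years.
print(round(to_years(2.93, 10_000, 20)))
```

Rerunning with different seeds shows the point of the slide: under constant Ne, TMRCA varies widely from one genealogy to the next.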

Page 21

Simulating recent expansion: not much variability in TMRCA between genealogies

[Figure: four simulated genealogies, panels 1 to 4]

TMRCA (units of 2Ne generations)

1. 0.0026
2. 0.0029
3. 0.0028
4. 0.0027

~1000 years of human evolution
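The slides do not say which expansion model was simulated. As one possible illustration, the sketch below assumes exponential growth, with population size N0 * exp(-beta * t) at time t before the present (t in units of 2N0 generations). A coalescence time tau drawn under constant size then maps to log(1 + beta * tau) / beta under growth. The growth rate beta = 1000 and the sample size of 10 are arbitrary assumptions, and the function names are hypothetical.

```python
import math
import random
import statistics

def constant_size_tmrca(n, rng):
    """TMRCA under constant size, in units of 2N0 generations."""
    return sum(rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1))

def rescale_for_growth(tau, beta):
    """Map a constant-size time tau to the matching time under exponential growth."""
    return math.log(1.0 + beta * tau) / beta

def growth_tmrca(n, beta, rng):
    """TMRCA under the assumed exponential growth model."""
    return rescale_for_growth(constant_size_tmrca(n, rng), beta)

def coefficient_of_variation(values):
    return statistics.stdev(values) / statistics.mean(values)

rng = random.Random(6)
constant = [constant_size_tmrca(10, rng) for _ in range(2000)]
growing = [growth_tmrca(10, 1000.0, rng) for _ in range(2000)]
print("relative spread, constant size:", round(coefficient_of_variation(constant), 3))
print("relative spread, growth:", round(coefficient_of_variation(growing), 3))
```

Because the logarithm compresses large values of tau, genealogies under strong recent growth all have nearly the same depth, which is the pattern described on the slide.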

Page 22

1. A time scale is given by the coalescent model for the demography (drift history)

2. Add mutations

Page 23

Infinite-sites mutation in a gene tree

Page 24

The relationship between average pairwise sequence difference, π, and the parameter θ in Kingman’s Coalescent

[Figure label: 2N generations]
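A short worked version of this relationship, under the infinite-sites model: two sampled copies are expected to coalesce 2N generations ago, so they are separated by two branches of 2N generations each. With mutation rate mu per sequence per generation, the expected number of differences is 2 x 2N x mu = 4N mu, which is the usual definition of θ for 2N gene copies. The sketch below computes the observed average pairwise difference from aligned sequences; the toy sequences and function names are made up for illustration.

```python
from itertools import combinations

def pairwise_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def average_pairwise_difference(sequences):
    """Mean number of differences over all pairs of sampled sequences."""
    pairs = list(combinations(sequences, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)

def expected_pairwise_difference(N, mu):
    """Expectation under Kingman's coalescent with infinite sites: 4 * N * mu."""
    return 4 * N * mu

sample = ["AAAA", "AAAT", "AATT"]  # toy data, not from the slides
print(average_pairwise_difference(sample))
print(expected_pairwise_difference(10_000, 1e-4))
```

Comparing the observed value with 4N mu gives an estimate of θ only if the demographic assumptions of the model hold.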

Page 25

Data: Aboriginal Australian mtDNAs

Model: Kingman’s coalescent

MtDNA Coding DNA, Sites: 9000 to 16000

Note the non-uniform spacing of mutations

One colonization event? Or several founding lineages at different times?

Page 26

Another advantage of coalescent models over phylogeny

While the population bottlenecks implicitly assumed in phylogenetic and phylogeographic analyses can be explicitly assumed in a coalescent framework, alternative demographies may be assumed, or may be inferred.

(The relationship between coalescent nodes and colonization events is very ambiguous.)

Page 27

Kingman’s coalescent as H0

Kingman’s coalescent model is a starting point, available to us even before we collect any data.

Having collected data, we can test whether the data show goodness-of-fit to the expectations of our starting model.

If not, we should change or add parameters to improve the model. At present there are some options available (not many, but some!)

Page 28

Variations from Kingman’s coalescent

1. Selection
2. Recurrent and back mutation
3. Recombination
4. *Non-random mating: e.g. geographic subdivision with specified migration between subpopulations
5. *Population size fluctuation, including bottlenecks and expansions
6. Non-’Poisson’ distributions of offspring numbers
7. Unequal generation intervals between lineages

*similar model but additional parameters

Page 29

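The coalescent with structure

[Figure: two panels, "Much migration" and "Little migration"]

Each generation m alleles are exchanged between sub-populations.

Discrete time: in each generation an allele migrates with probability m/2N.

Continuous time: the waiting time for a lineage to migrate is exponential with rate m (time in units of 2N generations).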
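The effect of migration on coalescence times can be sketched for the simplest structured case. The code below assumes two sub-populations of equal size (2N gene copies each), symmetric migration, and a sample of two lineages, with time in units of 2N generations: two lineages in the same sub-population coalesce at rate 1, and each lineage migrates at rate m. These settings and the function name are illustrative assumptions, not details given on the slide.

```python
import random

def pair_time_two_demes(m, same_deme, rng):
    """Coalescence time of two lineages in an assumed symmetric two-deme model."""
    t = 0.0
    while True:
        if same_deme:
            total_rate = 1.0 + 2.0 * m  # coalescence, or migration of either lineage
            t += rng.expovariate(total_rate)
            if rng.random() < 1.0 / total_rate:
                return t  # the two lineages coalesced
            same_deme = False  # one lineage moved to the other deme
        else:
            # No coalescence is possible until one lineage migrates across.
            t += rng.expovariate(2.0 * m)
            same_deme = True

rng = random.Random(9)
for m in (5.0, 0.5, 0.05):  # much migration down to little migration
    same = [pair_time_two_demes(m, True, rng) for _ in range(20000)]
    diff = [pair_time_two_demes(m, False, rng) for _ in range(20000)]
    print(m, round(sum(same) / len(same), 2), round(sum(diff) / len(diff), 2))
```

In this two-deme sketch the mean time for lineages sampled from the same sub-population stays near 2 units whatever the migration rate, while lineages sampled from different sub-populations wait an extra 1/(2m) on average, so little migration produces much deeper between-population coalescences.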

Page 30

Summary and points for discussion

Data drawn as gene trees show the relative ordering of coalescence events.

The length of time between coalescence events is a function of the number of mutation events inferred from the data AND the assumed demographic history. (Molecular clocks should NOT be applied directly.)

Present phylo-geographic methods fudge the data to circumvent thinking about demography. Consequently we do not learn anything about demography from them. Furthermore, these methods may be generating some highly inaccurate time estimates and they don’t provide satisfactory estimates of the uncertainty surrounding these estimates.

Coalescent modelling to date draws attention to many concerns, but to improve ‘phylo-geographic’ inference we need implementations of the structured coalescent appropriate for a colonization/extinction demography.

Page 31

MtDNA Coding DNA, Sites: 500 to 9000

Page 32
Page 33

Implications of drift as genealogy

All the identical copies of a gene, e.g. all the copies of the MC1R-151 red hair allele, carried by thousands of people across Europe, have been inherited from a single common ancestor living some time in the past. Although mutation may have generated MC1R-151 alleles many times, all these mutations were quickly lost, except for one. On one occasion only, the new mutation increased in frequency, becoming a common polymorphism. Could this be true? (We think so!)
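The claim that most new mutations are quickly lost can be checked with a small neutral simulation. This is a sketch under assumed Wright-Fisher sampling with 2N = 20 gene copies, a deliberately tiny population chosen so that it runs quickly; it says nothing about the real history of MC1R-151. A single new neutral copy is expected to fix with probability 1/(2N) and otherwise to be lost. The function name is hypothetical.

```python
import random

def fate_of_new_mutation(num_copies, rng):
    """Follow one new neutral mutant copy until it is lost or fixed.

    Returns (fixed, generations_elapsed)."""
    count, generations = 1, 0
    while 0 < count < num_copies:
        freq = count / num_copies
        # Each of the 2N offspring copies inherits the mutant with probability freq.
        count = sum(rng.random() < freq for _ in range(num_copies))
        generations += 1
    return count == num_copies, generations

rng = random.Random(12)
results = [fate_of_new_mutation(20, rng) for _ in range(4000)]
lost_times = [g for fixed, g in results if not fixed]
print("fraction fixed:", sum(fixed for fixed, _ in results) / len(results))
print("mean generations until loss:", sum(lost_times) / len(lost_times))
```

In a population of realistic size the fixation probability 1/(2N) is tiny, so repeated origins of the same allele would nearly all vanish within a few generations.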