Inferring human demographic history from DNA sequence data
Apr. 28, 2009
J. Wall, Institute for Human Genetics, UCSF


Standard model of human evolution


Standard model of human evolution (Origin and spread of genus Homo)

2 – 2.5 Mya


Standard model of human evolution (Origin and spread of genus Homo)

1.6 – 1.8 Mya



Standard model of human evolution (Origin and spread of genus Homo)

0.8 – 1.0 Mya


Standard model of human evolution (Origin and spread of ‘modern’ humans)

150 – 200 Kya


Standard model of human evolution (Origin and spread of ‘modern’ humans)

~ 100 Kya


Standard model of human evolution (Origin and spread of ‘modern’ humans)

40 – 60 Kya


Standard model of human evolution (Origin and spread of ‘modern’ humans)

15 – 30 Kya


Estimating demographic parameters

• How can we turn this qualitative scenario into an explicit, quantitative model?

• How can we choose a model that is both biologically plausible and computationally tractable?

• How do we estimate parameters and quantify uncertainty in parameter estimates?


Estimating demographic parameters

• Calculating full likelihoods (under realistic models that include recombination) is computationally infeasible

• So compromises are needed if we want to estimate parameters, e.g., basing inference on summary statistics rather than on the full data


African populations

10 populations

229 individuals


African populations

San (bushmen)

Biaka (pygmies)

Mandenka (West African, Niger-Congo speakers)

61 autosomal loci, ~350 Kb of sequence data


A simple model of African population history

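[Figure: Mandenka and Biaka (or San) descend from an ancestral population that split at time T; the diagram also labels the migration rate m and the growth parameters g1 and g2.]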


Estimation method

We use a composite-likelihood method (cf. Plagnol and Wall 2006) that draws on information from the joint frequency spectrum, such as:

• Numbers of segregating sites

• Numbers of shared and fixed differences

• Tajima’s D

• FST

• Fu and Li’s D*
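To make the single-population statistics in this list concrete, here is a minimal Python sketch that computes the number of segregating sites S, the mean pairwise difference π, and Tajima’s D from an aligned sample. It assumes equal-length sequences, biallelic sites and no missing data; the function names are hypothetical, and this is not the code used in the study. The example alignment is the seven-sequence toy data shown later in the talk.

```python
from itertools import combinations
from math import nan, sqrt


def segregating_sites(seqs: list[str]) -> int:
    """Number of alignment columns carrying more than one allele."""
    return sum(len(set(col)) > 1 for col in zip(*seqs))


def mean_pairwise_differences(seqs: list[str]) -> float:
    """pi: average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D with the usual normalizing constants; nan if S == 0."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return nan  # D is undefined without segregating sites
    pi = mean_pairwise_differences(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    # Difference between two estimators of theta, scaled by its standard deviation
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


seqs = ["ATCCACAGCTG", "AGCCACGGCTG", "TGCGGTAACCT", "AGCCACAGCTG",
        "TGTGGTAACCT", "AGCCATAGATG", "AGCCATAGATG"]
print(segregating_sites(seqs), round(mean_pairwise_differences(seqs), 3))
# 11 4.762
```

Here π slightly exceeds S/a1, so Tajima’s D for this toy alignment is positive.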



Estimating likelihoods

Segregating sites in a sample from two populations (Pop1, Pop2) are classified as:

• Pop 1 private polymorphisms

• Pop 2 private polymorphisms

• Shared polymorphisms
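A minimal sketch of the bookkeeping behind these classes: each segregating site in a two-population sample is counted as private to Pop1, private to Pop2, shared, or a fixed difference (the last class appears among the statistics used by the estimation method). The function name and category labels are hypothetical; the sketch assumes aligned sequences with biallelic sites, and it only produces the counts, not the likelihood computed from them.

```python
def classify_sites(pop1: list[str], pop2: list[str]) -> dict[str, int]:
    """Classify each alignment column of a two-population sample.

    Assumes aligned, equal-length sequences with at most two alleles per site.
    """
    counts = {"private_pop1": 0, "private_pop2": 0, "shared": 0, "fixed": 0}
    for col1, col2 in zip(zip(*pop1), zip(*pop2)):
        alleles1, alleles2 = set(col1), set(col2)
        if len(alleles1) > 1 and len(alleles2) > 1:
            counts["shared"] += 1          # polymorphic in both samples
        elif len(alleles1) > 1:
            counts["private_pop1"] += 1    # polymorphic only in Pop1
        elif len(alleles2) > 1:
            counts["private_pop2"] += 1    # polymorphic only in Pop2
        elif alleles1 != alleles2:
            counts["fixed"] += 1           # monomorphic in each, but different
        # otherwise the site is monomorphic in the pooled sample
    return counts


pop1 = ["ACGT", "ACGA", "TCGT"]
pop2 = ["ACCT", "ACCA", "AGCT"]
print(classify_sites(pop1, pop2))
# {'private_pop1': 1, 'private_pop2': 1, 'shared': 1, 'fixed': 1}
```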



Estimating likelihoods

We assume that the other statistics (Tajima’s D, FST, Fu and Li’s D*) follow a multivariate normal distribution.

We then run simulations to estimate their means and covariance matrix.

This accounts (in a crude way) for dependencies across different summary statistics.
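The multivariate-normal approximation can be sketched in a few lines of NumPy: estimate the mean vector and covariance matrix of the statistics from simulations under one candidate parameter combination, then evaluate the Gaussian log-density of the observed statistics. The function name is hypothetical, and the random draws below merely stand in for simulated summary statistics; this illustrates the idea and is not the study’s implementation.

```python
import numpy as np


def mvn_log_likelihood(observed: np.ndarray, simulated: np.ndarray) -> float:
    """Gaussian log-likelihood of observed summary statistics.

    observed:  shape (k,), e.g. Tajima's D per population, FST, Fu and Li's D*
    simulated: shape (n_sims, k), the same statistics computed on data
               simulated under one candidate parameter combination
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    mean = simulated.mean(axis=0)                          # estimated means
    cov = np.atleast_2d(np.cov(simulated, rowvar=False))   # k x k covariance
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("estimated covariance matrix is singular")
    diff = observed - mean
    quad = float(diff @ np.linalg.solve(cov, diff))        # Mahalanobis term
    k = observed.size
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)


rng = np.random.default_rng(0)
sims = rng.normal(size=(5000, 3))  # stand-in for simulated statistics
print(mvn_log_likelihood(np.zeros(3), sims))  # close to -1.5 * log(2 * pi)
```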


Composite likelihood

We form a composite likelihood by assuming that these two classes of summary statistics are independent of each other, so the composite likelihood is the product of the two likelihoods.

We estimate the composite likelihood over a grid of values of g1, g2, T and M, and take the grid point with the highest value as the maximum-likelihood estimate (MLE).

We also use standard asymptotic assumptions to estimate confidence intervals.
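A minimal sketch of the grid search, together with one common way to turn standard asymptotic assumptions into confidence intervals: keep the grid values whose profile log-likelihood lies within χ²₁(0.95)/2 ≈ 1.92 units of the maximum. The interval construction is not specified here, so this likelihood-ratio rule, the function names, and the toy log-likelihood are all assumptions. Because the two classes of statistics are not truly independent, chi-square calibration of a composite likelihood is only approximate.

```python
import itertools
from typing import Callable, Mapping, Sequence

CHI2_1DF_95 = 3.841  # 95% quantile of the chi-square distribution with 1 df


def grid_search(grid: Mapping[str, Sequence[float]],
                log_lik: Callable[..., float]):
    """Evaluate log_lik at every grid point; return names, table and MLE."""
    names = list(grid)
    table = {
        values: log_lik(**dict(zip(names, values)))
        for values in itertools.product(*(grid[name] for name in names))
    }
    best = max(table, key=table.get)  # grid point with the highest value
    return names, table, dict(zip(names, best))


def profile_ci(param: str, names: list[str], table: dict,
               cutoff: float = CHI2_1DF_95) -> tuple[float, float]:
    """Range of grid values of one parameter inside the likelihood-ratio cutoff."""
    idx = names.index(param)
    profile: dict = {}
    for values, ll in table.items():
        # Profile log-likelihood: maximize over all other parameters
        profile[values[idx]] = max(profile.get(values[idx], float("-inf")), ll)
    top = max(profile.values())
    inside = [v for v, ll in profile.items() if 2.0 * (top - ll) <= cutoff]
    return min(inside), max(inside)


# Hypothetical toy log-likelihood. In the method described above, each grid
# point would instead get log L(counts) + log L(multivariate-normal statistics).
def toy_log_lik(T, M):
    return -0.5 * (T - 4) ** 2 - 0.5 * (M - 1) ** 2


names, table, mle = grid_search({"T": range(11), "M": [0, 1, 2]}, toy_log_lik)
print(mle, profile_ci("T", names, table))  # {'T': 4, 'M': 1} (3, 5)
```

Reporting the range of accepted grid values assumes the accepted set is a single interval, which holds for a unimodal profile.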


Estimates (with 95% CIs)

Parameter     Mandenka–Biaka     Mandenka–San
g1 (000’s)    0 (0 – 3.8)        0 (0 – 3.8)
g2 (000’s)    4 (0 – 7.9)        2 (0 – 11)
T (000’s)     450 (300 – 640)    100 (77 – 550)
M (= 4Nm)     10 (8.4 – 12)      3 (2.2 – 4)


Fit of the null model

How well does the demographic null model fit the patterns of genetic variation found in the actual data?

Quite well. The model accurately reproduces both the summary statistics used in the original fitting (e.g., Tajima’s D in each population) and other aspects of the data that were not used in the fitting (e.g., estimates of ρ = 4Nr).



Population growth

[Figure: population size vs. time, annotated “spread of agriculture and animal husbandry?”]



Ancestral structure in Africa

At face value, these results suggest that population structure within Africa is old and predates the migration of modern humans out of Africa.

Is there any evidence for additional (unknown) ancient population structure within Africa?


Model of ancestral structure

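[Figure: the same model of Mandenka and Biaka (or San), with split time T, migration rate m and growth parameters g1 and g2, plus an archaic human population]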


Admixture mapping (figure legend: modern human DNA, Neandertal DNA)


Orange chunks are ~10 – 100 Kb in length


Genealogy with archaic ancestry

[Figure: genealogy of modern human and archaic human lineages, with time measured back from the present]


Genealogy without archaic ancestry

[Figure: genealogy of modern human and archaic human lineages, with time measured back from the present]


Our main questions

• What pattern does archaic ancestry produce in DNA sequence polymorphism data (from extant humans)?

• How can we use the data to
  – estimate the contribution of archaic humans to the modern gene pool (c)?
  – test whether c > 0?


Genealogy with archaic ancestry (mutations added)

[Figure: genealogy of modern human and archaic human lineages with mutations marked, time measured back from the present]


Patterns in DNA sequence data

Sequence 1 A T C C A C A G C T G

Sequence 2 A G C C A C G G C T G

Sequence 3 T G C G G T A A C C T

Sequence 4 A G C C A C A G C T G

Sequence 5 T G T G G T A A C C T

Sequence 6 A G C C A T A G A T G

Sequence 7 A G C C A T A G A T G

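We call sites that are inferred to lie on the same branch of an unrooted tree congruent sites. In this example, sites 1, 4, 5, 8, 10 and 11 each separate sequences 3 and 5 from the other five sequences, so they are congruent with one another.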
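Congruent sites can be found mechanically by mapping each biallelic site to the split of the sample it induces and grouping sites with identical splits. A minimal sketch, assuming biallelic sites and no missing data (function names are hypothetical); it detects exact congruence only, not the near congruence that the S* score also takes into account. Applied to the seven sequences above, the largest group is 0-based columns 0, 3, 4, 7, 9 and 10.

```python
from collections import defaultdict


def split_key(column: str) -> frozenset:
    """Unrooted split induced by a biallelic site.

    Returns the indices of sequences whose allele differs from sequence 0,
    so the key depends only on the split, not on which allele is which.
    """
    ref = column[0]
    return frozenset(i for i, allele in enumerate(column) if allele != ref)


def congruent_groups(seqs: list[str]) -> list[list[int]]:
    """Group biallelic sites (0-based columns) that induce the same split."""
    groups = defaultdict(list)
    for j, col in enumerate(zip(*seqs)):
        column = "".join(col)
        if len(set(column)) == 2:  # skip monomorphic and multiallelic sites
            groups[split_key(column)].append(j)
    return sorted(groups.values(), key=len, reverse=True)


seqs = ["ATCCACAGCTG", "AGCCACGGCTG", "TGCGGTAACCT", "AGCCACAGCTG",
        "TGTGGTAACCT", "AGCCATAGATG", "AGCCATAGATG"]
print(congruent_groups(seqs)[0])  # [0, 3, 4, 7, 9, 10], i.e. sites 1, 4, 5, 8, 10, 11
```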


Linkage disequilibrium (LD)

LD is the nonrandom association of alleles at different sites.

Example two-site haplotypes:
Low LD (high recombination): AC, AT, AC, AT, GC, GT, GC, GT
High LD (low recombination): AC, AC, AC, AC, GT, GT, GT, GT
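A standard way to quantify LD between two biallelic sites is r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. The slides do not specify an LD measure, so r² and the function name here are assumptions; the sketch applies it to the two example haplotype sets from this slide, giving r² = 0 for the low-LD set and r² = 1 for the high-LD set.

```python
def r_squared(haplotypes: list[str]) -> float:
    """r^2 between two biallelic sites, from two-character haplotypes like "AC"."""
    n = len(haplotypes)
    a, b = haplotypes[0][0], haplotypes[0][1]  # alleles tracked at sites 1 and 2
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


low_ld = ["AC", "AT", "AC", "AT", "GC", "GT", "GC", "GT"]
high_ld = ["AC", "AC", "AC", "AC", "GT", "GT", "GT", "GT"]
print(r_squared(low_ld), r_squared(high_ld))  # 0.0 1.0
```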


Measuring ‘congruence’

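To measure the level of ‘congruence’ in SNP data from larger regions, we define a score function

$$S^* = \max_{I \subseteq \{1, 2, \ldots, n\}} S(I),$$

where, for a subset $I = \{i_1, \ldots, i_k\}$ of the $n$ SNPs with $i_1 < \cdots < i_k$ in order of position,

$$S(i_1, \ldots, i_k) = \sum_{j=1}^{k-1} S(i_j, i_{j+1}),$$

and $S(i_j, i_{j+1})$ is a function of both congruence (or near congruence) and physical distance between $i_j$ and $i_{j+1}$.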
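Because S(I) only sums scores between consecutive SNPs in I, the maximum over all subsets does not require enumerating all 2^n subsets: a dynamic program over SNPs ordered by position finds it in O(n²) time. The sketch below assumes that ordering, and its pair-scoring rule (a bonus plus physical distance for SNPs with identical sample splits, a fixed penalty otherwise) is a hypothetical placeholder, not the exact scoring function of Plagnol and Wall (2006); in particular it ignores near congruence.

```python
from typing import Callable, Sequence


def s_star(n: int, pair_score: Callable[[int, int], float]) -> float:
    """Max over subsets I of SNPs 0..n-1 (in position order) of
    sum_j pair_score(i_j, i_{j+1})."""
    best = [0.0] * n  # best[j]: top score of a subset whose last SNP is j
    for j in range(n):
        for i in range(j):
            # Extend the best subset ending at i by appending SNP j
            best[j] = max(best[j], best[i] + pair_score(i, j))
    return max(best, default=0.0)


def hypothetical_pair_score(patterns: Sequence[str], positions: Sequence[int],
                            bonus: float = 5000.0, penalty: float = -10000.0):
    """HYPOTHETICAL scoring: reward identical splits, penalize anything else.

    patterns: 0/1 strings marking which sequences carry the minor allele.
    """
    def score(i: int, j: int) -> float:
        if patterns[i] == patterns[j]:
            return bonus + (positions[j] - positions[i])
        return penalty
    return score


positions = [100, 250, 400, 900]
patterns = ["0010100", "1000000", "0010100", "0010100"]
print(s_star(len(positions), hypothetical_pair_score(patterns, positions)))
# 10800.0: SNPs 0, 2 and 3 score (5000 + 300) + (5000 + 500)
```

A single SNP (an empty sum) scores 0, so S* is never negative under this formulation.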


An example (CHRNA4)

How often is S* from simulated data greater than or equal to the S* value from the actual data? For CHRNA4, p = 0.025.
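The p-value on this slide is the proportion of simulated datasets whose S* is at least as large as the observed one. A minimal sketch, assuming the S* values for data simulated under the null model have already been computed; the function name and the toy values are hypothetical.

```python
def empirical_p_value(observed: float, simulated: list[float]) -> float:
    """Fraction of null simulations whose S* is >= the observed S*."""
    if not simulated:
        raise ValueError("need at least one simulated value")
    return sum(s >= observed for s in simulated) / len(simulated)


null_s_star = [12.0, 30.5, 8.0, 41.0, 19.5, 27.0, 15.0, 33.0]  # toy values
print(empirical_p_value(30.0, null_s_star))  # 0.375 (3 of 8)
```

Some analyses instead report (1 + count) / (1 + number of simulations), which avoids p-values of exactly zero; the slide’s wording matches the plain proportion.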


S* is sensitive to ancient admixture


General approach

We use the model parameters estimated before (growth rates, migration rate, split time) as a demographic null model.

Is our null model sufficient to explain the patterns of LD in the data?

We test this by comparing the observed S* values with the distribution of S* values calculated from data simulated under the null model.


Distribution of p-values (Mandenka and San)

[Figure: distribution of S* p-values across loci; x-axis: p-value]

Global p-value: 2.5 × 10⁻⁵
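How the per-locus p-values are combined into the global p-value is not stated here. One standard option for independent tests is Fisher’s method, sketched below purely as an illustration; it is not necessarily the procedure used in this analysis, and the function name is hypothetical. It assumes independent loci and strictly positive p-values.

```python
from math import exp, log


def fisher_combined_p(p_values: list[float]) -> float:
    """Fisher's method: X = -2 * sum(log p) is chi-square with 2k df under H0.

    Uses the closed-form chi-square survival function for even degrees of
    freedom: P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    Assumes independent tests and p-values > 0.
    """
    k = len(p_values)
    x = -2.0 * sum(log(p) for p in p_values)
    half = x / 2.0
    term, total = 1.0, 1.0  # i = 0 term of the series
    for i in range(1, k):
        term *= half / i
        total += term
    return exp(-half) * total


print(round(fisher_combined_p([0.5, 0.5]), 3))  # 0.597
```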


Estimating ancient admixture rates

The global p-values for S* are highly significant in every population that we’ve studied!

When we estimate the ancient admixture rate within our composite-likelihood framework, we can reject the hypothesis of no ancient admixture in every population studied.


A region on chromosome 4

19 mutations (from 6 Kb of sequence) separate 3 Biaka sequences from all of the other sequences in our sample.

Simulations suggest this cannot be caused by recent population structure (p < 10⁻³)

This corresponds to isolation lasting ~1.5 million years!


Possible explanations

• Isolation followed by later mixing is a recurrent feature of human population history

• Mixing between ‘archaic’ humans and modern humans happened at least once prior to the exodus of modern humans out of Africa

• Some other feature of population structure is unaccounted for in our simple models


Acknowledgments

Collaborators: Mike Hammer (U. of Arizona), Vincent Plagnol (Cambridge University)

Samples: Foundation Jean Dausset (CEPH), Y Chromosome Consortium (YCC)

Funding: National Science Foundation, National Institutes of Health