Human genetic population structure: patterns and underlying processes
Guido Barbujani
Dipartimento di Biologia ed Evoluzione, Università di Ferrara
• Our genome is very small
• Our genome is very large
• Our genomes are very similar
• Our genomes are very different
There are clear morphological differences (“types”)
But each group harbours extensive diversity
Analyses of morphological traits led to inconsistent lists of races
Linnaeus (1758) 4 (europeus, asiaticus, afer, americanus) [+2]
Blumenbach (1795) 5 (same, + australianus)
Cuvier (1828) 3 (caucasoid, negroid, mongoloid)
Huxley (1875) 4 (mongoloid, xanthochroid, australoid, negroid)
Deniker (1900) 29
Weinert (1935) 17
Von Eickstedt (1937) 38
Museum of Nat. Hist. Chicago (1933) 107
Coon (1967) 5 (negroid, capoid, caucasoid, mongoloid, australoid)
Risch (2002) 5 (different in different articles)
According to Molnar (1975) 20th century lists include from 3 to 200 items
Skin colour
Stature
Variation is continuous and discordant. It is possible to cluster people on the basis of any trait, but the resulting classification does not allow one to predict clustering for other traits
The trouble with morphological traits
1. Estimating variances from sequence comparisons
-TACGAACATCAGGC-
-TATGAACATCAGGC-
-TATGAACATCGGGC-
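Comparisons like the one above start from counts of the sites at which two aligned sequences differ. The following is a minimal sketch of that first step only, applied to the three sequences on the slide. The labels `seq1` to `seq3` and the function name are illustrative assumptions, and the slides do not say how each study went from such counts to variance components.

```python
from itertools import combinations

# The three aligned sequences shown on the slide (labels are hypothetical)
sequences = {
    "seq1": "TACGAACATCAGGC",
    "seq2": "TATGAACATCAGGC",
    "seq3": "TATGAACATCGGGC",
}


def pairwise_differences(a, b):
    """Count the aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Number of differing sites for every pair of sequences
diffs = {
    (i, j): pairwise_differences(sequences[i], sequences[j])
    for i, j in combinations(sequences, 2)
}

# Mean number of pairwise differences across all pairs
mean_diff = sum(diffs.values()) / len(diffs)

for pair, d in diffs.items():
    print(pair, d)
print("mean pairwise differences:", mean_diff)
```

Here the first and third sequences differ at two sites, and each of the other two pairs differs at one.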
Independent studies of genetic variances yield very similar results: about 85% within populations, 5% among populations, 10% among continents
Study                             Loci      Within populations   Among populations   Among continents
Lewontin (1972)                   17        85%                  8%                  6%
Latter (1973)                     18        86%                  5%                  9%
Barbujani et al. (1997)           109       85%                  5%                  10%
Jorde et al. (2000)               100       85%                  2%                  13%
Romualdi et al. (2002)            32        83%                  8%                  9%
Rosenberg et al. (2002)           377       93%                  3%                  4%
Excoffier & Hamilton (2003)       377       88%                  3%                  9%
Ramachandran et al. (2005)        17        90%                  5%                  5%
Bastos-Rodriguez et al. (2006)    40        86%                  2%                  12%
Li et al. (2008)                  650 000   89%                  2%                  9%
MEDIAN                                      85%                  5%                  10%
What does it mean, in practice?
Members of our community are only slightly less different from us than members of distant populations
[Figure: diversity in the whole species (100%) compared with diversity within a single population (85%)]
Mind the numbers
Humans and chimps share >98% of their genomes
Of the 2% that differs, 1.9% consists of differences that are fixed within each species
The remaining fraction, 0.1%, contains all human genomic variation
85% of that 0.1% represents differences among members of the same population
The differences among the main continental groups represent 10% of 0.1% of the total, that is, 0.01%
But 0.01% of <3 billion DNA sites means <300 000 variable sites
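The arithmetic on this slide can be checked with a short sketch. It uses only the percentages quoted above and takes 3 billion sites as the upper bound, so every count is an upper bound too. The variable names are illustrative.

```python
from fractions import Fraction

# Upper bound on genome size used on the slide ("<3 billion DNA sites")
GENOME_SITES = 3_000_000_000

# 0.1% of the genome contains all human genomic variation
human_variation = Fraction(1, 1000)

# Shares of that variation at each level, as quoted on the slides
shares = {
    "within populations": Fraction(85, 100),
    "among populations": Fraction(5, 100),
    "among continents": Fraction(10, 100),
}

# Upper bound on the number of variable sites at each level
variable_sites = {
    level: int(human_variation * share * GENOME_SITES)
    for level, share in shares.items()
}

for level, count in variable_sites.items():
    print(f"{level}: fewer than {count:,} sites")
```

Exact fractions are used instead of floats so that the products come out as whole numbers of sites.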
2. Clustering genotypes or haplotypes
Rosenberg et al., 2002
Clustering genotypes by algorithms identifying structure
K=3
K=4
SNPs
Haplotypes
CNV
Jakobsson et al. 2008
Structure inferred from SNPs and haplotypes differs from that inferred from Copy Number Variation
Genes, as well as morphology, suggest inconsistent clusterings of genotypes
[Figure: clusters inferred from three marker sets. Sources: Y chromosome, Romualdi et al. 2002; Alu insertions, Romualdi et al. 2002; X chromosome, Wilson et al. 2001]
Cluster labels in the figure: Africa | Asia, Europe, Australia, Americas | Americas | Africa, Asia, Americas, Oceania | Asia, Europe | Africa, Asia, Europe, Oceania | Europe, Ethiopia | S. Africa | N. Guinea | Asia
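[Figure: clusters inferred from the same 377 STR loci in two analyses. Sources: Rosenberg et al. 2005; Barbujani and Belle 2006]
Cluster labels in the figure: Melanesia | Eurasia | N Africa | N America | Maya | S. Africa | E Africa | C Africa | Piapoco | Suruì | Karitiana | Kalash | W. Eurasia | E. Asia | Africa | Americas | Oceania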
Sampling has a large effect on the apparent structuring
Serre and Pääbo 2004
Variation is continuous and discordant. It is possible to cluster people on the basis of any trait, but the resulting classification does not allow one to predict clustering for other traits
The trouble with genetic traits
MCPH D-haplogroup
NAT2 acetylator
3. Identifying genomic boundaries
Sampling points in the geographic space
The sampling points are connected by edges
Genetic distances between neighbours are associated with each edge of the reticulation
Boundaries are traced perpendicular to the edge showing the highest genetic distance and extended through the adjacent edges
A boundary is completed when it exits the reticulation or closes on a preexisting boundary
The number of boundaries one may detect is arbitrary, but there are methods to choose
Four genetic clusters are identified, each separated from the others by a boundary
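The boundary-tracing steps described on the preceding slides can be sketched as follows. The slides do not name the method. The sketch assumes it works like Monmonier's maximum-difference algorithm on a triangulated network, and it takes the triangulation and the edge distances as given inputs. The function names, the four-point network and its distances are hypothetical illustrations, not data from the study.

```python
from collections import defaultdict


def edge(a, b):
    """Canonical (sorted) representation of an undirected edge."""
    return (a, b) if a < b else (b, a)


def trace_boundary(triangles, dist, used=None):
    """Trace one boundary; return the crossed edges in order.

    triangles: list of 3-tuples of sampling-point indices
    dist:      genetic distance for each edge, keyed by edge(a, b)
    used:      edges already crossed by earlier boundaries (updated in place)
    """
    used = set() if used is None else used

    # Map every edge to the one or two triangles that contain it
    edge_tris = defaultdict(list)
    for tri in triangles:
        a, b, c = tri
        for e in (edge(a, b), edge(b, c), edge(a, c)):
            edge_tris[e].append(tri)

    candidates = [e for e in edge_tris if e not in used]
    if not candidates:
        return []

    # Start across the free edge with the highest genetic distance
    start = max(candidates, key=lambda e: dist[e])
    used.add(start)

    def walk(tri):
        """Extend the boundary from `start` through triangle `tri`."""
        path, current = [], start
        while tri is not None:
            a, b, c = tri
            others = [e for e in (edge(a, b), edge(b, c), edge(a, c))
                      if e != current]
            nxt = max(others, key=lambda e: dist[e])
            if nxt in used:      # closes on a pre-existing boundary
                break
            used.add(nxt)
            path.append(nxt)
            # Step into the triangle on the far side; None means the
            # boundary has exited the network
            tri = next((t for t in edge_tris[nxt] if t != tri), None)
            current = nxt
        return path

    tris = edge_tris[start]
    forward = walk(tris[0])
    backward = walk(tris[1]) if len(tris) > 1 else []
    return backward[::-1] + [start] + forward


# Hypothetical network: four sampling points, two triangles
triangles = [(0, 1, 2), (0, 2, 3)]
dist = {(0, 1): 0.9, (0, 2): 0.8, (2, 3): 0.7, (1, 2): 0.1, (0, 3): 0.1}

boundary = trace_boundary(triangles, dist)
print(boundary)
```

In this toy network the boundary crosses the three high-distance edges and separates point 0 and point 3 from point 1 and point 2. To trace further boundaries, pass the same `used` set to later calls so that a new boundary stops when it meets an earlier one.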
Genomic boundaries inferred from diversity at 377 STR loci
(Barbujani and Belle 2006)
Eight significant boundaries, defining 9 groups of populations
81% of SNPs cosmopolitan.
Alleles present in one continent only: 0.91% in Africa, 0.75% in Eurasia, practically 0 elsewhere.
Hunting-gathering populations distinct from farmers in Africa
Jakobsson et al. 2008 (525,910 SNPs, 396 CNVs)
12.4% of haplotypes cosmopolitan, 29% continent-specific, 18% of which in Africa. More than 50% present in 1 or 2 continents
Jakobsson et al. 2008
LD decreasing with physical distance between loci and with geographic distance from East Africa
Jakobsson et al. 2008
Models with an African population replacing previous human continental groups explain the data better than any alternative model
Fagundes et al. (2007)
Patterns of morphological and genetic variation are compatible with the effects of dispersal from Africa
Manica et al. 2007
Fitting a model of isolation by distance to human genetic diversity
Liu et al. (2006)
Average coalescence times and gene diversity decline as a function of distance from Africa
Best fit of the model for an African exit 56,000 years ago
Fagundes et al. (2007)
http://info.med.yale.edu/genetics/kkidd/point.html
The best available estimates place our species’ origin and its exit from Africa in a not-so-remote past
Linguistic and genetic differences are often correlated
Genetic variances are significant among language groups
Correlations between distance measures
               r          r²
GEN-GEO        0.746***   0.557
GEN-LAN        0.311***   0.097
GEO-LAN        0.269***   0.072
GEN-GEO.LAN    0.723***   0.523
GEN-LAN.GEO    0.172***   0.030
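The last two rows of the correlation table are partial correlations: GEN-GEO.LAN is the genetic-geographic correlation with language held constant, and GEN-LAN.GEO is the genetic-linguistic correlation with geography held constant. The slide does not say how they were computed. As a sketch, assuming the standard first-order partial correlation formula, the three simple correlations in the table give values close to the reported ones. The function and variable names are illustrative.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simple correlations taken from the table
r_gen_geo = 0.746
r_gen_lan = 0.311
r_geo_lan = 0.269

# Genetics vs geography, language held constant (table: 0.723)
gen_geo_given_lan = partial_correlation(r_gen_geo, r_gen_lan, r_geo_lan)

# Genetics vs language, geography held constant (table: 0.172)
gen_lan_given_geo = partial_correlation(r_gen_lan, r_gen_geo, r_geo_lan)

print(round(gen_geo_given_lan, 3), round(gen_lan_given_geo, 3))
```

Under this assumption, removing the effect of geography leaves only a small genetic-linguistic correlation, while removing the effect of language barely changes the genetic-geographic one.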
Percentages of the total variance
Genetic distance                        Fst     Rst
Among language phyla                    2.9     6.7
Among populations of the same phylum    2.4     2.9
Within populations                      94.7    90.4
Belle and Barbujani 2007
Origins: Attempting a synthesis
• Human genetic population structure is generally weak, with large differences among members of the same population and discordant variation across loci
• Genetic and morphological data agree in indicating an origin of human dispersal in Africa
• At the large geographic scale, patterns fit a model of repeated founder effects during dispersal from Africa
• Zones of relatively sharp genetic change correspond to reproductive barriers, geographic or cultural
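The repeated founder effects mentioned in the synthesis can be illustrated with a deliberately simplified sketch. It assumes that each colonisation step is a one-generation bottleneck of a fixed number of diploid founders, with no mutation, migration or population recovery, so that expected heterozygosity falls by a factor of 1 - 1/(2N) at every step. The starting heterozygosity, founder number and number of steps are hypothetical values chosen for illustration, not estimates from the studies cited.

```python
def serial_founder_heterozygosity(h0, founders, n_steps):
    """Expected heterozygosity after 0..n_steps successive founder events.

    Each event is modelled as one generation at size `founders` (diploid),
    which multiplies expected heterozygosity by 1 - 1/(2 * founders).
    """
    retained = 1 - 1 / (2 * founders)
    return [h0 * retained ** k for k in range(n_steps + 1)]


# Hypothetical parameters, for illustration only
H0 = 0.75        # heterozygosity in the source population
FOUNDERS = 100   # diploid founders per colonisation step
STEPS = 50       # number of successive founder events

profile = serial_founder_heterozygosity(H0, FOUNDERS, STEPS)

for step in (0, 10, 25, 50):
    print(step, round(profile[step], 3))
```

Under these assumptions diversity declines steadily with the number of founder events, which is the qualitative pattern the slides report as a function of distance from Africa.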