Population genetics

Page 1: Population genetics

Population genetics

Page 2: Population genetics

Population genetics

Population genetics concerns the study of genetic variation and change within a population. While for evolving species there is no model for the branching process (speciation), in population genetics there is. This allows a detailed modelling of the interplay between mutation, selection, and stochastic effects (genetic drift).

Simplifying assumptions that are initially made include:

- No selection
- No recombination
- No fluctuations in population size
- No population structure (subdivision; migration)
- No assortative mating (individuals mate randomly)
- No interaction between loci (no epistasis; no linkage)
- No environmental effects (e.g. climate/habitat change etc.)

RA Fisher, Sewall Wright, JBS Haldane, Motoo Kimura

Page 3: Population genetics

Kimura’s Neutral Theory

Darwin(ism):
* Something causes minute (phenotype) variations in a population (ideas: perhaps over-use during lifetime might cause variations (Lamarckism; think giraffes); perhaps traits are transmitted through blood and blend)
* Natural selection causes adaptive variants to rise in frequency, while non-adaptive ones die out.

Neo-darwinism:
* The “something” is replaced by Mendelian genetics + random mutations
* Panselectionism; adaptationism: most traits are optimal; selection is the main driving force of evolution (R.A. Fisher; Richard Dawkins; John Maynard Smith)

Population genetics / neutral theory:
* Most mutations are neutral; genetic drift underlies most of evolution (Fisher; Haldane; Wright; Kimura)

Modern evolutionary synthesis:
* Takes on board (parts of) all of the above.
* Neutral theory relevant for DNA data in populations; considered less relevant for phenotypes.

Page 4: Population genetics

Wright-Fisher Model

- Constant population size N diploid individuals = 2N alleles

- Each descendant chooses a parent randomly

- Everyone reproduces simultaneously (no overlapping generations)

http://www.stats.ox.ac.uk/~mcvean/Modelling.pdf
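The three rules on this slide can be sketched as a small simulation. This is an illustrative sketch only: the function names, the population of 2N = 20 alleles and the random seed are assumptions made for the example, not part of the original slides.

```python
import random

def next_generation(alleles, rng):
    """One Wright-Fisher generation: each of the 2N descendants copies
    the allele of a parent chosen uniformly at random (with replacement)."""
    return [rng.choice(alleles) for _ in alleles]

def generations_until_absorption(alleles, rng, max_generations=100_000):
    """Iterate non-overlapping generations until only one allele type is left.
    Returns (number of generations, surviving allele type)."""
    for t in range(max_generations):
        if len(set(alleles)) == 1:
            return t, alleles[0]
        alleles = next_generation(alleles, rng)
    raise RuntimeError("no fixation or loss within max_generations")

# Hypothetical example: N = 10 diploid individuals, 6 of the 20 alleles are A.
population = ["A"] * 6 + ["a"] * 14
t, winner = generations_until_absorption(population, random.Random(1))
print(f"allele {winner} took over after {t} generations")
```

The population size stays at 2N because every generation is rebuilt with exactly as many alleles as the previous one.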

Page 5: Population genetics

Wright-Fisher Model

Suppose i(t) of the 2N alleles carry a particular mutation A in generation t. The probability that any given allele in generation t+1 is of type A is

x = i(t) / 2N

The number of alleles of type A in generation t+1 is binomially distributed:

$$P(i(t+1) = k) = \binom{2N}{k} x^k (1-x)^{2N-k}$$

This distribution has mean and variance

E(i(t+1)) = i(t)
Var(i(t+1)) = 2N x (1-x)

The expected number of copies of mutation A does not change, but because variance accumulates from generation to generation, eventually the mutation will either be lost (i=0) or reach fixation (i=2N).
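As a check on the mean and variance quoted on this slide, the binomial distribution can be tabulated directly. This is a minimal sketch; the function names and the values 2N = 20 and i = 6 are assumptions chosen for illustration.

```python
from math import comb

def wf_transition(i, two_n):
    """P(i(t+1) = k | i(t) = i) for k = 0..2N under binomial sampling."""
    x = i / two_n  # probability that an allele in the next generation is of type A
    return [comb(two_n, k) * x**k * (1 - x) ** (two_n - k)
            for k in range(two_n + 1)]

def mean_and_variance(pmf):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var

two_n = 20  # hypothetical: N = 10 diploid individuals
i = 6       # hypothetical current number of A alleles
x = i / two_n
pmf = wf_transition(i, two_n)
mean, var = mean_and_variance(pmf)
print(mean, i)                        # mean equals i(t)
print(var, two_n * x * (1 - x))       # variance equals 2N x (1-x)
```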

Page 6: Population genetics

Wright-Fisher Model

Suppose the initial number of copies of the mutant A is i. Since

E( i(t+1) ) = i(t),

the expected number of copies (and hence the expected frequency) remains constant throughout. However, eventually the mutant will either be lost or go to fixation. If the probability of eventual fixation is p, we have

i = E( i(0) ) = E( i(∞) ) = 2N p + 0 (1-p) = 2 N p

The probability p that A will go to fixation is therefore

p = i / 2N

A simpler argument is this: without selection all alleles are equivalent; the one that gets fixed is chosen uniformly from the present-day population; the probability that this is an A mutant is i / 2N.

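This also means that for neutral sites, the rate ρ of substitution equals the rate u of mutation: each generation 2N u new mutations arise, and each goes to fixation with probability 1 / 2N, so ρ = 2N u * (1 / 2N) = u.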
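The result p = i / 2N can also be checked numerically by treating the Wright-Fisher model as a Markov chain on the number of A copies and following the probability of having reached fixation. This is a sketch under assumed values (2N = 10, 500 generations); the function name is hypothetical.

```python
from math import comb

def fixation_probabilities(two_n, generations=500):
    """For every starting count i = 0..2N, return the probability that
    allele A has reached fixation (count 2N) within `generations` steps."""
    # Binomial transition matrix: P[i][k] = P(i(t+1) = k | i(t) = i).
    P = [[comb(two_n, k) * (i / two_n) ** k * (1 - i / two_n) ** (two_n - k)
          for k in range(two_n + 1)]
         for i in range(two_n + 1)]
    # Start with the indicator of the fixed state, then step backwards in time.
    fixed = [0.0] * two_n + [1.0]
    for _ in range(generations):
        fixed = [sum(P[i][k] * fixed[k] for k in range(two_n + 1))
                 for i in range(two_n + 1)]
    return fixed

two_n = 10  # hypothetical: N = 5 diploid individuals
probs = fixation_probabilities(two_n)
for i, p in enumerate(probs):
    print(i, round(p, 6), i / two_n)
```

After enough generations every trajectory has been absorbed at 0 or 2N, so the computed values agree with i / 2N.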

Page 7: Population genetics

Wright-Fisher Model

Since x = i / 2N and Var(i(t+1)) = 2N x (1-x)

we get

Var( x ) = x (1-x) / 2N,

in other words, the sampling variance in the allele frequency x is inversely proportional to the population size. This effect is called (random) genetic drift.

The Wright-Fisher model is highly idealized; e.g. populations do vary in size, there is structure, and individuals do not mate randomly. Therefore, N does not directly relate to the actual population size. A more accurate way of putting this is to say that N is the Wright-Fisher population size that generates the same amount of genetic drift as there is in the actual population.

To emphasize this, the parameter N is often called the effective population size (and written Ne).


Page 8: Population genetics

The coalescent model

[Figure panels: whole population (Wright-Fisher); ancestry of current population; ancestry of a random sample; coalescent]

Page 9: Population genetics

Kingman’s coalescent

Probability that two given lineages coalesce in one generation:

P(coalescence) = 1/2N

Expected number of generations before coalescence, i.e. the time to the most recent common ancestor (MRCA):

E( TMRCA ) = 2N

Probability of coalescence (of 2 lineages) when k lineages are present = 1-P(no coalescence):

$$P(\text{coalescence}) = 1 - \left(1 - \frac{1}{2N}\right)\left(1 - \frac{2}{2N}\right) \cdots \left(1 - \frac{k-1}{2N}\right) \approx \frac{1 + 2 + \cdots + (k-1)}{2N} = \binom{k}{2} \frac{1}{2N}$$

Other argument: Coalescence rate per pair is 1/2N; there are k-choose-2 pairs.
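The quality of the k-choose-2 approximation can be checked by comparing it with the exact product. This is a small sketch; the function names and the example values (k = 5 lineages, 2N = 20000) are assumptions for illustration.

```python
from math import comb

def p_coalescence_exact(k, two_n):
    """1 - P(all k lineages pick distinct parents among the 2N alleles)."""
    p_all_distinct = 1.0
    for j in range(1, k):
        p_all_distinct *= 1 - j / two_n
    return 1 - p_all_distinct

def p_coalescence_approx(k, two_n):
    """First-order approximation: k-choose-2 pairs, each coalescing at rate 1/2N."""
    return comb(k, 2) / two_n

k, two_n = 5, 20_000  # hypothetical sample of 5 lineages, N = 10000
exact = p_coalescence_exact(k, two_n)
approx = p_coalescence_approx(k, two_n)
print(exact, approx)
```

The two agree closely as long as k is much smaller than 2N; for k = 2 they coincide at 1/2N.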

J.F.C. Kingman

Page 10: Population genetics

Variation in the population

Suppose the mutation rate is u (per generation, and per locus or site).

The expected number of differences between two individuals (diversity) is

π = 2 * u * E( TMRCA ) = 4 N u

(assuming all mutations are unique). The quantity 4 N u often appears in population genetics, and is usually treated as an independent parameter, θ.

Real-life populations do not, of course, follow the Wright-Fisher model. The parameter N that makes the W-F diversity equal to the observed diversity is called the effective population size, Ne. Other definitions (based on other aspects of the model) are used as well.
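The step from the per-generation coalescence probability 1/2N to a diversity of 4 N u can be spelled out numerically. This is a sketch only: the function name, the truncation of the sum, and the values N = 100 and u = 10^-8 are assumptions for the example.

```python
def expected_pairwise_coalescence_time(n, max_generations=None):
    """Mean of the geometric waiting time with success probability 1/2N,
    computed by summing t * P(T = t) up to a large cut-off."""
    p = 1 / (2 * n)
    if max_generations is None:
        max_generations = 200 * n  # the remaining tail is negligible here
    total = 0.0
    no_coalescence_yet = 1.0  # P(no coalescence in the first t-1 generations)
    for t in range(1, max_generations + 1):
        total += t * p * no_coalescence_yet
        no_coalescence_yet *= 1 - p
    return total

n = 100   # hypothetical population size
u = 1e-8  # hypothetical mutation rate per site per generation
t_mrca = expected_pairwise_coalescence_time(n)
diversity = 2 * u * t_mrca  # two lineages, each accumulating mutations at rate u
print(t_mrca, 2 * n)
print(diversity, 4 * n * u)
```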

Page 11: Population genetics

Allele frequency spectrum

By going to the continuous (diffusion) limit, the equilibrium distribution of allele frequencies can be derived. This is called the “allele frequency spectrum”.

Assuming that mutations and back-mutations occur at the same rate u, the allele frequency spectrum P(x)dx is

$$P(x)\,dx = x^{\theta-1} (1-x)^{\theta-1}\,dx$$

(apart from normalization). Here θ = 4 N u.

[Figure: plot against allele frequency x, from 0 to 1]

Suppose a mutation occurs at frequency x. The probability of sampling two individuals that are different at that locus is 2 x (1-x). Multiplying with P(x) dx gives the contribution to the heterozygosity (= probability that two random alleles differ) per unit of frequency:

$$H(x)\,dx = x^{\theta} (1-x)^{\theta}\,dx$$

(again apart from normalization). Since θ is small, every frequency contributes nearly equally to the total heterozygosity.

Under the influence of selection, the allele frequency spectrum becomes skewed towards the advantageous allele, and depleted of intermediate-frequency alleles. This is one way to test for selection.
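The claim that every frequency contributes nearly equally to the heterozygosity under neutrality can be illustrated numerically. This is a sketch with an assumed value θ = 0.01; the function name is hypothetical and both densities are left unnormalized.

```python
def heterozygosity_density(x, theta):
    """Unnormalized H(x) = 2 x (1-x) * P(x), with the neutral spectrum
    P(x) = x^(theta-1) * (1-x)^(theta-1)."""
    spectrum = x ** (theta - 1) * (1 - x) ** (theta - 1)
    return 2 * x * (1 - x) * spectrum

theta = 0.01  # hypothetical small value of 4 N u
reference = heterozygosity_density(0.5, theta)
ratios = {x: heterozygosity_density(x, theta) / reference
          for x in (0.01, 0.1, 0.3, 0.5)}
for x, ratio in ratios.items():
    print(x, round(ratio, 4))
```

Even at x = 0.01 the contribution stays within a few percent of the contribution at x = 0.5, although the spectrum P(x) itself is far larger at low frequencies.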

Page 12: Population genetics

Linkage disequilibrium (LD)

Relates to 2 polymorphic sites

$$D_{AB} = f_{AB} - f_A f_B = f_{AB} f_{ab} - f_{Ab} f_{aB} \qquad (D_{AB} = -D_{aB} = -D_{Ab} = D_{ab})$$

Correlation coefficient (Hill & Robertson 1968):

$$r^2_{AB} = \frac{D_{AB}^2}{f_a f_A f_b f_B}$$
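Both expressions for D and the correlation coefficient can be computed from the four haplotype frequencies. This is a minimal sketch; the function name and the example frequencies are assumptions for illustration.

```python
def ld_stats(f_AB, f_Ab, f_aB, f_ab):
    """Return (D, D from the alternative formula, r^2) for two biallelic sites,
    given the four haplotype frequencies (which must sum to 1)."""
    f_A = f_AB + f_Ab  # frequency of allele A at the first site
    f_B = f_AB + f_aB  # frequency of allele B at the second site
    d = f_AB - f_A * f_B
    d_alt = f_AB * f_ab - f_Ab * f_aB
    r2 = d ** 2 / (f_A * (1 - f_A) * f_B * (1 - f_B))
    return d, d_alt, r2

# Hypothetical haplotype frequencies: AB and ab are over-represented.
d, d_alt, r2 = ld_stats(0.4, 0.1, 0.1, 0.4)
print(d, d_alt, r2)
```

The two values of D agree, as the identity on the slide states; with independent sites (for example all four haplotypes at frequency 0.25) both D and r^2 are zero.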

Richard Lewontin (1929-)

Page 13: Population genetics

Dynamics of LD

Genetic drift causes reduction in diversity, so that (expected) LD → 0 at equilibrium.

Recombination decreases LD.

Effect of selective sweep (rapid increase of frequency of an advantageous allele) on LD:
- Diversity is reduced
- Polymorphisms on selected haplotype are carried along: hitchhiking
- More correlations between sites: many share ancestry
- Result: LD increases

[Figure: sweep]

Page 14: Population genetics

Prior observations

• “Extent of enzyme polymorphism is surprisingly constant between species. So constant, in fact, that the effective sizes of most species must be within 1 order of magnitude of each other.” (Lewontin 1974; Maynard Smith & Haigh 1974)

• Variation is reduced in regions with low recombination (Aguade 1989; Begun & Aquadro 1992, etc.)

Page 15: Population genetics

Assumptions:
- Rate of neutral mutations = u
- Rate of advantageous mutations = v
- Selective advantage of adv. mutations = σ

Without linkage to selected locus:
mean sum-of-site heterozygosities (ssh; diversity) = 4 N u

( = mean time to coalescence * 2 lineages * neutral mutation rate)

[Figure: neutral locus and selected locus]

Page 16: Population genetics

Assumptions:
- Rate of neutral mutations = u
- Rate of advantageous mutations = v
- Selective advantage of adv. mutations = σ
- Times of fixation at selected locus: Poisson process, rate ρ
- Fixations are fast compared to drift, can be regarded as instantaneous

With linkage to selected locus:
Rate of coalescence due to drift = 1/2N
Rate of fixation of adv. muts. at selected site = ρ
Total coalescence rate: ρ + 1/2N
Average time to coalescence: 1 / (ρ + 1 / 2N)
ssh = 2 u / (ρ + 1 / 2N) = 4 N u / ( 1 + 2 N ρ )

Limit for N → ∞: ssh = 2 u / ρ

[Figure: neutral locus and selected locus]
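The dependence of diversity on N with and without linked sweeps can be tabulated directly from the formula on this slide. This is a sketch; the function name and the parameter values (u = 10^-9, ρ = 10^-7) are assumptions for illustration.

```python
def ssh_linked(n, u, rho):
    """Sum-of-site heterozygosity when coalescence is driven both by drift
    (rate 1/2N) and by fixations at a completely linked selected locus (rate rho)."""
    return 2 * u / (rho + 1 / (2 * n))

u = 1e-9     # hypothetical neutral mutation rate
rho = 1e-7   # hypothetical rate of advantageous fixations
for n in (1e3, 1e5, 1e7, 1e9):
    no_sweeps = ssh_linked(n, u, 0.0)    # reduces to 4 N u
    with_sweeps = ssh_linked(n, u, rho)  # saturates at 2 u / rho
    print(f"N={n:.0e}  4Nu={no_sweeps:.3e}  with sweeps={with_sweeps:.3e}")
```

Without sweeps the diversity grows in proportion to N; with sweeps it levels off near 2 u / ρ once 2 N ρ is much larger than 1.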

Page 17: Population genetics

ssh = 2 u / (ρ + 1 / 2N) = 4 N u / ( 1 + 2 N ρ )

Rate of fixation ρ ≈ v * 2 N σ (provided 1/2N < σ < 1 )

Page 18: Population genetics

Fixation due to hitchhiking:
Current frequency of allele A = x
New frequency of allele = z

z = 1 with probability ρ x (hitchhiking; allele A)
z = 0 with probability ρ (1-x) (hitchhiking; allele a)
z = x with probability (1-ρ) (no hitchhiking)

Δfreq = z - x
E(Δfreq) = 0
Var(Δfreq) = ρ x (1-x) (infinite population)
Var(Δfreq) = (1/2N) x (1-x) (finite population; no hitchhiking)
Var(Δfreq) = (ρ + 1/2N) x (1-x) (finite population + hitchhiking)

Same form as standard W-F model, but with Ne = N / (1 + 2 N ρ)

Page 19: Population genetics

Now assume some recombination between neutral & selected loci (instead of total linkage). Suppose allele linked to advantageous mutation rises to frequency y (rather than frequency 1).

z = y + (1-y)x with probability ρ x (hitchhiking; allele A)
z = (1-y)x with probability ρ (1-x) (hitchhiking; allele a)
z = x with probability (1-ρ) (no hitchhiking)

Δfreq = z - x
E(Δfreq) = 0
Var(Δfreq) = ρ y^2 x (1-x) (infinite population)
Var(Δfreq) = (1/2N) x (1-x) (finite population; no hitchhiking)
Var(Δfreq) = (ρ y^2 + 1/2N) x (1-x) (finite population + hitchhiking)

Same form as standard W-F model, but with Ne = N / (1 + 2 N ρ y^2)
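The mean and variance of the frequency change can be verified by enumerating the three outcomes listed above. This is a minimal sketch; the function names and the example values x = 0.3, ρ = 0.01, y = 0.5 are assumptions for illustration, and y = 1 recovers the case of total linkage.

```python
def delta_freq_moments(x, rho, y=1.0):
    """Mean and variance of z - x in an infinite population, where a sweep
    lifts the linked allele to frequency y."""
    outcomes = [
        (rho * x, y + (1 - y) * x),    # sweep linked to allele A
        (rho * (1 - x), (1 - y) * x),  # sweep linked to allele a
        (1 - rho, x),                  # no sweep this generation
    ]
    mean = sum(p * (z - x) for p, z in outcomes)
    var = sum(p * (z - x) ** 2 for p, z in outcomes) - mean ** 2
    return mean, var

def effective_size(n, rho, y=1.0):
    """Ne such that 1/(2 Ne) = rho * y^2 + 1/(2 N)."""
    return n / (1 + 2 * n * rho * y ** 2)

x, rho, y = 0.3, 0.01, 0.5  # hypothetical values
mean, var = delta_freq_moments(x, rho, y)
print(mean, var, rho * y ** 2 * x * (1 - x))
print(effective_size(1e6, 1e-7))
```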

Page 20: Population genetics

Coalescence rate due to drift = 1/2N
Coalescence rate due to hitchhiking = ρ E( y^2 )

If 2 ρ y^2 > 1/N, “draft” (due to hitchhiking, sweeps) is more important than “drift” (population size effect).

In the “draft” regime, nucleotide diversity is independent of population size.

Numerical example: Fruitfly

Limit for N → ∞: ssh = 2 u / (ρ y^2)

Neutral mutation rate u = 10^-9 per generation, per site
Site heterozygosity = 0.006
Assume y = 1

Rate of advantageous substitutions ρ = 2 u / ssh ≈ 3 × 10^-7, i.e. ρ ~ 10^-7, “typical of rate of amino acid substitutions in coding regions”
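The arithmetic of the fruitfly example can be written out as follows. This is a sketch that uses only the numbers quoted on this slide; the function names are hypothetical, and the crossover population size is simply the condition 2 ρ y^2 > 1/N rearranged for N.

```python
def sweep_rate_from_diversity(u, ssh, y=1.0):
    """Solve ssh = 2 u / (rho * y^2) for rho (large-N, draft-dominated limit)."""
    return 2 * u / (ssh * y ** 2)

def crossover_population_size(rho, y=1.0):
    """Population size above which 2 * rho * y^2 exceeds 1/N (draft beats drift)."""
    return 1 / (2 * rho * y ** 2)

u = 1e-9     # neutral mutation rate per site per generation (from the slide)
ssh = 0.006  # observed site heterozygosity (from the slide)
rho = sweep_rate_from_diversity(u, ssh)
n_crossover = crossover_population_size(rho)
print(f"rho = {rho:.2e} per generation")
print(f"draft dominates for N above about {n_crossover:.2e}")
```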

Page 21: Population genetics

Questions in (population) genetics

• Effective population size of human population ~10000. Why the huge discrepancy with actual population size?

• The amount of genetic diversity is “surprisingly constant between species” (Lewontin 1974). Is this (i) not a problem / not true, (ii) caused by Gillespie’s “genetic draft”, or (iii) caused by something else?

• What is the cause of the variation in recombination rate (including hotspots) across the human genome? Are the latest measurements accurate?

• Roughly the same 3-5% of mammalian genome is conserved within the mammalian clade. Does this represent most/all of the functional genome, or is a large fraction functional and fast evolving? What can population genetics (rather than species comparisons) bring to this question?

• Common (high-frequency) genetic variants associated with common disease are hard to find and usually explain only a small fraction (~1%) of the variation in susceptibility. Are common diseases often caused by rare genetic variants instead? If so, how can these be found? (Not by association studies; but linkage studies are expensive and have low resolution)