
Population Genetics of Selection

Jay Taylor

School of Mathematical and Statistical Sciences, Arizona State University

Jay Taylor (Arizona State University) Population Genetics of Selection 2009 1 / 50


Historical Context

Evolution by Natural Selection

Darwin and Wallace (1859) observed that heritable traits that increase reproductive success will become more common in a population.

Variation within populations - individuals have different traits (phenotypes).

height and weight are approximately normally distributed

variation for susceptibility to HIV-1 infection and progression to AIDS

Selection - traits influence fecundity and survivorship (fitness).

larger body size may be beneficial in cold environments

height may influence mating success (sexual selection)

Heritability - offspring are similar to their parents.

variation has both environmental and heritable components

differences in height are partly heritable, but are also influenced by childhood nutrition


Example: Beak size in the Medium Ground Finch (Geospiza fortis)

Restricted to the Galapagos Islands.

Forages mainly on seeds.

Large seeds are handled more efficiently by birds with larger bills.

Large seeds predominate following drought years (e.g., 1977).


Example: The rise and fall of the peppered moth (Biston betularia)

The peppered moth has two color morphs:

white (wild-type)

black (carbonaria)

The black morph was rarely recorded at the beginning of the 19th century.

carbonaria became common first in Lancashire/Yorkshire in the 1850s, then spread in urban areas throughout the UK.

Similar increases of melanic forms occurred on the continent and in North America.

Melanic forms appear to be favored in industrialized regions due to soot deposition on trees and declines in lichen cover.


With the decline in coal usage and the enactment of clean air legislation in the 1970s, the frequency of the melanic morph decreased in the UK:

source: Cook (2003)


The most serious weakness of Darwin’s theory was his model of heredity, which was based on:

blending inheritance - offspring traits are averages of parental traits.

This is problematic because it leads to a loss of variation.

Unknown to Darwin, Mendel (1866) proposed a particulate model of inheritance:

Traits are determined by genes.

Each gene can have finitely-many different types called alleles.

Different alleles may produce different traits.

Offspring are similar to their parents because they inherit their genes.

Mendel was essentially correct, but his work was largely ignored for 40 years.


A coherent theory explaining how natural selection could operate in the context of Mendelian genetics did not develop until the 1930s, with the development of theoretical population genetics (Fisher, Wright, Haldane). This led to the Modern Synthesis:

Genes are physical entities carried on chromosomes.

Heritable variation is produced by mutation and recombination.

Continuous variation can arise from the contribution of many loci of small effect.

Selection causes changes in the frequencies of genotypes that in turn affect traits that influence fitness.

Population genetics can explain both microevolutionary and macroevolutionary changes.

Population genetics focuses on understanding evolution at the molecular level: how does natural selection affect the dynamics of gene frequencies?


Some key questions about selection and adaptation are:

What fraction of the genome is under selection?

How frequently does selection lead to changes in the genome?

How often are deleterious mutations fixed in a population?

How do demography and life history influence the rate of adaptation?

Does adaptation rely mainly on standing variation or on new mutations?

Does adaptation occur through the fixation of many mutations of small effect or through the fixation of a few mutations of large effect?


Diffusion Approximations

A Wright-Fisher model with directional selection and mutation

Assumptions:

non-overlapping generations (e.g., an annual plant)

N haploid adults in each generation

genotypes A1,A2 with frequencies p and 1− p

mutation from Ai to Aj at rate µij

relative fitness of A1 : A2 is 1 + s : 1

population regulation by binomial sampling

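Life cycle (each stage is labelled with the frequency of A1 and the population size):

\[
\underset{(p,\,N)}{\text{adults}}
\xrightarrow{\text{reproduction}}
\underset{(p,\,\infty)}{\text{zygotes}}
\xrightarrow{\text{mutation}}
\underset{(p^{*},\,\infty)}{\text{juveniles}}
\xrightarrow{\text{selection}}
\underset{(p^{**},\,\infty)}{\text{juveniles}}
\xrightarrow{\text{regulation}}
\underset{(p',\,N)}{\text{adults}}
\]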


To say that A1 : A2 have relative fitness 1 + s1 : 1 + s2 means that, on average, each individual with genotype A1 contributes 1 + s1 offspring to the next generation for every 1 + s2 offspring contributed by an individual with genotype A2.

In an infinite population subject only to selection, the frequency of A1 changes from p to p′ given by

\[
p' = \frac{p(1+s_1)}{p(1+s_1) + (1-p)(1+s_2)} = \frac{p(1+s_1)}{1 + p s_1 + (1-p)s_2}.
\]

The denominator of this expression is the mean fitness of the population, weighted by the allele frequencies.

The quantities s1 and s2 are called selection coefficients.
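The recursion above can be iterated numerically. The following is a minimal sketch, not taken from the slides; the function names and the parameter values (p0 = 0.01, s1 = 0.1, s2 = 0) are illustrative assumptions.

```python
def next_frequency(p, s1, s2):
    """One generation of selection in an infinite haploid population."""
    # Denominator of the recursion: mean fitness weighted by allele frequencies.
    mean_fitness = 1 + p * s1 + (1 - p) * s2
    return p * (1 + s1) / mean_fitness


def iterate(p0, s1, s2, generations):
    """Return the list [p(0), p(1), ..., p(generations)]."""
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p = next_frequency(p, s1, s2)
        trajectory.append(p)
    return trajectory


# Illustrative values only: a rare allele A1 with a 10% fitness advantage.
traj = iterate(p0=0.01, s1=0.1, s2=0.0, generations=200)
print(traj[0], traj[50], traj[-1])
```

Only the ratio (1 + s1)/(1 + s2) matters here: each generation multiplies the odds p/(1 − p) by that ratio, so the favored allele approaches fixation geometrically.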


In the full model, the allele frequencies at the different stages of the life cycle are given by the following equations:

\[
\begin{aligned}
p^{*} &= p(1-\mu_{12}) + (1-p)\mu_{21} && \text{(mutation)} \\
p^{**} &= \frac{p^{*}(1+s)}{p^{*}\cdot(1+s) + (1-p^{*})\cdot 1} && \text{(viability selection)} \\
p' &\sim \frac{1}{N}\cdot\mathrm{Binomial}(N, p^{**}) && \text{(regulation).}
\end{aligned}
\]

Remark: We say that selection is directional or purifying in this model because the same allele is always favored and tends to increase in frequency:

A1 is favored if s > 0

A2 is favored if s < 0
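The three life-cycle steps (mutation, viability selection, binomial regulation) can be simulated directly. This is a sketch under stated assumptions: it uses only Python's standard library, draws the binomial sample as N independent Bernoulli trials, and the parameter values are arbitrary illustrations rather than values from the slides.

```python
import random


def wright_fisher_step(p, N, s, mu12, mu21, rng):
    """Advance the frequency of A1 by one generation."""
    # Mutation: A1 -> A2 at rate mu12, A2 -> A1 at rate mu21.
    p_mut = p * (1 - mu12) + (1 - p) * mu21
    # Viability selection with relative fitnesses 1 + s : 1.
    p_sel = p_mut * (1 + s) / (p_mut * (1 + s) + (1 - p_mut))
    # Regulation: N adults sampled binomially from the juveniles.
    count = sum(1 for _ in range(N) if rng.random() < p_sel)
    return count / N


def simulate(p0, N, s, mu12, mu21, generations, seed=0):
    """Return the sampled path [p(0), ..., p(generations)]."""
    rng = random.Random(seed)
    p = p0
    path = [p]
    for _ in range(generations):
        p = wright_fisher_step(p, N, s, mu12, mu21, rng)
        path.append(p)
    return path


# Hypothetical parameter values chosen only for illustration.
path = simulate(p0=0.5, N=100, s=0.02, mu12=0.001, mu21=0.001, generations=50)
print(path[:5])
```

Without mutation, the frequencies 0 and 1 are absorbing in this sketch, which is the situation considered in the fixation-probability calculations.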


Recall that for the neutral W-F model, the variance of the change of allele frequencies over one generation is of order 1/N:

\[
E_p\Big[\big(p_N(1) - p\big)^2\Big] = \frac{1}{N}\,p(1-p) + O\!\left(\frac{1}{N^2}\right) \qquad (\text{assuming } \mu_{ij} = s = 0).
\]

This is also true under mutation and selection, provided that we choose the mutation rates and selection coefficient so that the expected change of p over one generation is also of order 1/N. This requirement motivates the following assumption:

\[
\mu_{ij} \equiv \mu_{ij}^{(N)} = \frac{\theta_{ij}}{N} \qquad \text{and} \qquad s \equiv s^{(N)} = \frac{\sigma}{N}.
\]


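With these scalings, we have

\[
\begin{aligned}
E_p\big[p_N(1) - p\big] &= \frac{1}{N}\Big((1-p)\theta_{21} - p\,\theta_{12} + \sigma p(1-p)\Big) + O\!\left(\frac{1}{N^2}\right) \\
E_p\Big[\big(p_N(1) - p\big)^2\Big] &= \frac{1}{N}\,p(1-p) + O\!\left(\frac{1}{N^2}\right) \\
E_p\Big[\big(p_N(1) - p\big)^e\Big] &= O\!\left(\frac{1}{N^2}\right) \qquad (e \ge 3).
\end{aligned}
\]

This shows that the process $(p_N(\lfloor Nt \rfloor) : t \ge 0)$ converges to a diffusion process with infinitesimal mean and variance coefficients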

m(p) = θ21(1− p)− θ12p + σp(1− p)

v(p) = p(1− p).
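The drift coefficient can be checked numerically. Because binomial sampling does not change the mean, the expected one-generation change is p** − p, and N times this quantity should approach m(p) as N grows. The sketch below assumes illustrative values θ12 = 1, θ21 = 0.5, σ = 2 and p = 0.3, none of which come from the slides.

```python
def expected_change(p, N, theta12, theta21, sigma):
    """Exact E[p' - p] for the haploid model with the 1/N scalings."""
    mu12, mu21, s = theta12 / N, theta21 / N, sigma / N
    p_mut = p * (1 - mu12) + (1 - p) * mu21      # mutation
    p_sel = p_mut * (1 + s) / (1 + s * p_mut)    # selection
    return p_sel - p                             # binomial mean equals p_sel


def m(p, theta12, theta21, sigma):
    """Infinitesimal mean of the limiting diffusion."""
    return theta21 * (1 - p) - theta12 * p + sigma * p * (1 - p)


# Hypothetical test point and parameters.
for N in (100, 1000, 10000):
    scaled = N * expected_change(0.3, N, 1.0, 0.5, 2.0)
    print(N, scaled, m(0.3, 1.0, 0.5, 2.0))
```

The gap between the two printed columns should shrink roughly in proportion to 1/N, consistent with the O(1/N²) remainder in the expansion.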


Transition Semigroups and Infinitesimal Generators

Suppose that $X = (X_t : t \ge 0)$ is a continuous-time Markov process with values in $\mathbb{R}$. The transition semigroup of X is the family of operators $(T_t ; t \ge 0)$ on the space of bounded continuous functions $f : \mathbb{R} \to \mathbb{R}$ defined by

\[
T_t f(x) = E\big[f(X_t) \,\big|\, X_0 = x\big] \equiv E_x\big[f(X_t)\big].
\]

We say that this family is a semigroup because $T_0 = \mathrm{Id}$ is the identity operator and because it satisfies the following property

\[
T_{t+s} = T_t \circ T_s
\]

for all t, s ≥ 0. This is a consequence of the Markov property of X.


It is also useful to consider the infinitesimal generator of X, which is defined by

\[
Gf(x) = \lim_{t \to 0} \frac{T_t f(x) - f(x)}{t}
\]

for any f such that the limit exists for all x.

For a one-dimensional diffusion process with infinitesimal mean and variance coefficients m(x) and v(x), the generator has the form

\[
Gf(x) = \frac{1}{2}\,v(x) f''(x) + m(x) f'(x)
\]

for any function f for which the derivatives f′, f′′ exist and are bounded.
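As a quick illustration (not part of the original slides), the generator formula can be applied to two simple test functions on [0, 1], assuming v(p) = p(1 − p) as in the Wright-Fisher diffusion and leaving m(p) general:

```latex
\begin{aligned}
f(p) = p:\quad & Gf(p) = m(p), \\
f(p) = p(1-p):\quad & Gf(p) = \tfrac{1}{2}\,v(p)\cdot(-2) + m(p)\,(1-2p)
                      = -p(1-p) + (1-2p)\,m(p).
\end{aligned}
```

In the neutral case without mutation (m ≡ 0), the second line reads Gf = −f, which suggests that the expected value of p(1 − p) decays like e^{−t} on the diffusion time scale.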


Fixation Probabilities

Suppose that both mutation rates are zero: θ12 = θ21 = 0. Then the ultimate fate of the allele A1 is to either be lost from or fixed in the population. Let

\[
\tau = \inf\{t \ge 0 : p(t) = 0 \text{ or } 1\}
\]

be the time when this event occurs and define

\[
u(p) = P_p\{p(\tau) = 1\}
\]

to be the fixation probability of A1 given that its initial frequency is p.

Question: We know that if A1 and A2 are neutral, then u(p) = p. How does selection alter this fixation probability?


Using the Markov property of the diffusion process, it can be shown that

\[
u(p) = T_t u(p) = E_p\big[u(p(t))\big].
\]

This implies that

\[
Gu(p) = \lim_{t \to 0} \frac{T_t u(p) - u(p)}{t} = \lim_{t \to 0} \frac{u(p) - u(p)}{t} = 0
\]

subject to the boundary conditions

\[
u(0) = 0 \quad \text{and} \quad u(1) = 1.
\]

Remark: It can also be shown that the process $(u(p(t)) : t \ge 0)$ is a martingale, in which case the optional sampling theorem can be used to deduce that u(p) is the fixation probability of A1.


For the W-F diffusion with selection, we need to solve the equation

\[
Gu(p) = \frac{1}{2}\,p(1-p)\,u''(p) + \sigma p(1-p)\,u'(p) = 0,
\]

i.e.,

\[
u''(p) + 2\sigma u'(p) = 0,
\]

with u(0) = 0 and u(1) = 1.

The solution can be found by integrating, leading to the following expression for the fixation probability of a selected allele:

\[
u(p) = \frac{1 - e^{-2\sigma p}}{1 - e^{-2\sigma}} = \frac{1 - e^{-2Nsp}}{1 - e^{-2Ns}}.
\]
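The integration step can be spelled out as follows. This is a sketch of one standard route (an integrating factor); the constants of integration A and B are notation introduced here and do not appear in the original.

```latex
\begin{aligned}
u''(p) + 2\sigma u'(p) = 0
  &\;\Longrightarrow\; \big(e^{2\sigma p}\,u'(p)\big)' = 0
   \;\Longrightarrow\; u'(p) = A\,e^{-2\sigma p}, \\
u(p) &= B - \frac{A}{2\sigma}\,e^{-2\sigma p}, \\
u(0) = 0 &\;\Longrightarrow\; B = \frac{A}{2\sigma}, \qquad
u(1) = 1 \;\Longrightarrow\; \frac{A}{2\sigma}\big(1 - e^{-2\sigma}\big) = 1, \\
u(p) &= \frac{1 - e^{-2\sigma p}}{1 - e^{-2\sigma}} \qquad (\sigma \neq 0).
\end{aligned}
```

Letting σ → 0 recovers the neutral result u(p) = p.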


The most important case is when a single copy of a new allele is introduced into a population, either by mutation or immigration. Then the initial frequency is p = 1/N and the fixation probability of the new allele is

\[
u\!\left(\frac{1}{N}\right) = \frac{1 - e^{-2s}}{1 - e^{-2Ns}} \approx
\begin{cases}
2s & \text{if } N^{-1} \ll s \ll 1 \\
2|s|\,e^{-2N|s|} & \text{if } -1 \ll s \ll -N^{-1}
\end{cases}
\]

In particular, this shows that

Novel beneficial mutations are likely to be lost from a population;

Deleterious mutations can be fixed, but only if N|s| is not too large;

Selection is dominated by genetic drift when |s| < 1/N.

Key result: Selection is more effective in larger populations.
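The two approximations can be compared with the exact diffusion formula for u(1/N). The sketch below is illustrative only: the function name and the chosen values of N and s are assumptions, and `math.expm1` is used to limit rounding error when s is small (it will overflow for very large N|s| with s < 0).

```python
import math


def new_mutant_fixation(N, s):
    """Diffusion approximation u(1/N) for a single new mutant."""
    if s == 0:
        return 1.0 / N  # neutral case: u(p) = p
    # expm1(x) = e^x - 1; the two minus signs cancel in the ratio.
    return math.expm1(-2 * s) / math.expm1(-2 * N * s)


# Hypothetical values satisfying 1/N << |s| << 1.
N = 10000
beneficial = new_mutant_fixation(N, 0.01)
deleterious = new_mutant_fixation(N, -0.001)

print("beneficial :", beneficial, "vs 2s =", 2 * 0.01)
print("deleterious:", deleterious,
      "vs 2|s|exp(-2N|s|) =", 2 * 0.001 * math.exp(-2 * N * 0.001))
```

Even with a 1% advantage, the new allele is lost in roughly 98% of cases, while the deleterious allele's fixation probability falls off exponentially in N|s|.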


Fixation Probabilities of New Mutants

[Figure: fixation probability of a new mutant (vertical axis "prob", logarithmic scale from 1E-10 to 1) plotted against population size N (horizontal axis, 10 to 10000) for s = 0.01, 0.001, 0, -0.001 and -0.002.]


Selective Constraints and Divergence

One prediction of this theory is that sites that are under purifying selection should diverge more slowly than neutrally evolving sites.

The degeneracy of the genetic code illustrates this effect.

Amino acids are encoded by triplets of DNA bases called codons.

There are 64 = 4^3 different codons, but only 20 amino acids.

On average, there are 3 different codons per amino acid.

It follows that there are two kinds of mutations in coding DNA:

(i) A non-synonymous mutation is one that changes an amino acid.

(ii) A synonymous mutation is one that changes only the DNA sequence.


The mutation TTT → TTC is synonymous.

The mutation TTT → TTA is non-synonymous because the amino acid changes from phenylalanine (F) to leucine (L).
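The classification can be expressed as a lookup in the genetic code. The sketch below includes only the four codons needed for these two examples; the table name and function name are assumptions, and a complete implementation would need all 64 codons.

```python
# Partial standard genetic code: only the codons used in this illustration.
CODON_TABLE = {
    "TTT": "F", "TTC": "F",  # phenylalanine
    "TTA": "L", "TTG": "L",  # leucine
}


def classify(codon_before, codon_after):
    """Label a codon change as synonymous or non-synonymous."""
    aa_before = CODON_TABLE[codon_before]
    aa_after = CODON_TABLE[codon_after]
    return "synonymous" if aa_before == aa_after else "non-synonymous"


print("TTT -> TTC:", classify("TTT", "TTC"))
print("TTT -> TTA:", classify("TTT", "TTA"))
```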


Hypothesis: In general, it is thought that non-synonymous mutations are more likely to be deleterious than synonymous mutations because they can change protein structure and function.

Prediction: If true, then synonymous substitution rates should be higher than non-synonymous substitution rates.

This is, in fact, what is observed:

                syn (yr^-1)      non-syn (yr^-1)    ratio (syn/non-syn)
influenza A     13.1 × 10^-3     3.5 × 10^-3        3.8
HIV-1           9.7 × 10^-3      1.7 × 10^-3        5.7
Hepatitis B     4.6 × 10^-5      1.5 × 10^-5        3.1
Drosophila      15.6 × 10^-9     1.9 × 10^-9        8.2
human-rodent    3.51 × 10^-9     0.74 × 10^-9       4.7


The following plot shows synonymous and non-synonymous substitution rates estimated from comparisons of human and rodent genes. In every case the non-synonymous substitution rate is less than the synonymous substitution rate.

Substitution Rates Estimated from Human-Rodent Divergence

[Figure: non-synonymous rate (× 10^9, vertical axis, 0 to 10) plotted against synonymous rate (× 10^9, horizontal axis, 0 to 10).]

Average substitution rates:

Synonymous: 3.51

Non-synonymous: 0.74

Source: Li (1997)


Stationary Distributions

Stationary Distributions, Ergodicity and Polymorphism

If θ12, θ21 > 0, then the boundaries p = 0, 1 are no longer absorbing states:

m(0) = θ21 > 0 and m(1) = −θ12 < 0.

Instead, recurrent mutation between A1 and A2 continually introduces new variation into the population.

In turn, this variation is eroded both by genetic drift and selection.

Question: What can we say about the long-term distribution of allele frequencies in a population subject to mutation, drift and selection?


One way to address this question is to study the stationary distribution of the diffusion approximation.

Definition: We say that a distribution π(dx) is a stationary distribution for a Markov process $X = (X_t ; t \ge 0)$ if whenever the initial distribution of $X_0$ is π(dx), then the marginal distribution of $X_t$ is π(dx) for every t ≥ 0.

In many cases, it can be shown that such a process is ergodic, meaning that for any initial value x the marginal distributions tend to the stationary distribution:

\[
P(t; x, dy) \xrightarrow{\;w\;} \pi(dy),
\]

as t → ∞. This means that for any bounded continuous function f,

\[
\lim_{t \to \infty} E_x\big[f(X_t)\big] = \int f(x)\,\pi(dx).
\]

Thus the stationary distribution of an ergodic process tells us something about the typical long-term behavior of that process.


Remark: It can be shown that the Wright-Fisher diffusion with reciprocal mutation is ergodic and has a unique stationary distribution with density π(p). We can identify this density using the following procedure.

We first note that because the marginal distributions of a stationary process are constant, we have

\[
\int_0^1 T_t f(p)\,\pi(p)\,dp = E_\pi\big[f(p(t))\big] = E_\pi\big[f(p(0))\big] = \int_0^1 f(p)\,\pi(p)\,dp
\]

for all t ≥ 0.


Assuming that we can interchange the integration and the limit, this implies that

\[
\begin{aligned}
\int_0^1 Gf(p)\,\pi(p)\,dp
&= \int_0^1 \lim_{t \to 0} \frac{T_t f(p) - f(p)}{t}\,\pi(p)\,dp \\
&= \lim_{t \to 0} \frac{1}{t} \int_0^1 \big(T_t f(p) - f(p)\big)\,\pi(p)\,dp \\
&= \lim_{t \to 0} \frac{1}{t} \left\{ \int_0^1 T_t f(p)\,\pi(p)\,dp - \int_0^1 f(p)\,\pi(p)\,dp \right\} = 0
\end{aligned}
\]

for any function f for which Gf is defined. In particular, this holds when f vanishes in a neighborhood of p = 0 and p = 1, in which case integration by parts gives

\[
\int_0^1 f(p) \left\{ \frac{1}{2}\big(v(p)\pi(p)\big)'' - \big(m(p)\pi(p)\big)' \right\} dp = 0.
\]


It can be shown that if this previous identity holds for all such f, then the density π(p) must be a solution to the differential equation

\[
\frac{1}{2}\big(v(p)\pi(p)\big)'' - \big(m(p)\pi(p)\big)' = 0.
\]

This equation can be integrated twice to give

\[
\pi(p) = \frac{1}{C}\,\frac{1}{v(p)} \exp\!\left( 2 \int_c^p \frac{m(q)}{v(q)}\,dq \right),
\]

where the normalizing constant C < ∞ must be chosen (if possible) so that

\[
\int_0^1 \pi(p)\,dp = 1.
\]
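The formula for π(p) can be evaluated numerically for given m and v. The sketch below is one possible approach, not the author's: it uses a midpoint grid on (0, 1), a trapezoid rule for the inner integral, and a Riemann sum for C. The lower limit c only rescales C, so the integral is started at the first grid point. The example takes θ21 = θ12 = 1 and σ = 0 (an assumed test case), for which the formula gives a density proportional to p(1 − p).

```python
import math


def stationary_density(m, v, n=2000):
    """Return midpoints of (0, 1) and the normalized density at those points."""
    h = 1.0 / n
    grid = [(i + 0.5) * h for i in range(n)]
    integrand = [2.0 * m(p) / v(p) for p in grid]
    # Cumulative trapezoid rule for 2 * integral of m/v, started at grid[0].
    log_weight = [0.0]
    for i in range(1, n):
        log_weight.append(log_weight[-1] + 0.5 * h * (integrand[i - 1] + integrand[i]))
    shift = max(log_weight)  # subtract the maximum to avoid overflow in exp
    unnormalized = [math.exp(lw - shift) / v(p) for lw, p in zip(log_weight, grid)]
    C = sum(unnormalized) * h  # normalizing constant (midpoint rule)
    return grid, [u / C for u in unnormalized]


# Assumed test case: symmetric mutation, no selection.
theta21 = theta12 = 1.0
sigma = 0.0
grid, density = stationary_density(
    lambda p: theta21 * (1 - p) - theta12 * p + sigma * p * (1 - p),
    lambda p: p * (1 - p),
)
mid = len(grid) // 2
print(grid[mid], density[mid], 6 * grid[mid] * (1 - grid[mid]))
```

This simple quadrature is only reliable when the density is integrable and not too singular at the boundaries; small mutation rates (2θ < 1) would need a more careful treatment near p = 0 and p = 1.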


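Neutral Variation: The stationary distribution of the neutral Wright-Fisher diffusion with mutation is just the Beta distribution with parameters 2θ1 and 2θ2 (writing θ1 = θ21 and θ2 = θ12, and likewise µ1 = µ21 and µ2 = µ12), which has density

\[
\begin{aligned}
\pi(p) &= \frac{1}{C}\,\frac{1}{p(1-p)} \exp\!\left\{ \int_c^p \frac{2\theta_1(1-q) - 2\theta_2 q}{q(1-q)}\,dq \right\} \\
&= \frac{1}{C}\,\frac{1}{p(1-p)} \exp\!\big\{ 2\theta_1 \ln(p) + 2\theta_2 \ln(1-p) \big\} \\
&= \frac{1}{\beta(2\theta_1, 2\theta_2)}\,p^{2\theta_1 - 1}(1-p)^{2\theta_2 - 1} \\
&= \frac{1}{\beta(2N\mu_1, 2N\mu_2)}\,p^{2N\mu_1 - 1}(1-p)^{2N\mu_2 - 1}.
\end{aligned}
\]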


The neutral stationary distribution reflects the competing effects of genetic drift, which eliminates variation, and mutation, which generates variation.

[Figure: two panels showing the neutral stationary distribution plotted against p, one for 2Nu = 0.1 and one for 2Nu = 10.]

When 2Nµ1, 2Nµ2 > 1, mutation dominates drift and the stationary distribution is peaked about its mean (both alleles are common).

When 2Nµ1, 2Nµ2 < 1, drift dominates mutation and the stationary distribution is bimodal, with peaks at the boundaries (one allele is common and one rare).
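The two regimes can be seen by evaluating the Beta density directly. In this sketch the shape parameters a and b stand for 2Nµ1 and 2Nµ2; the values 0.1 and 10 match the panel titles, but the evaluation points are arbitrary choices.

```python
import math


def beta_density(p, a, b):
    """Beta(a, b) density at p, computed on the log scale for stability."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_beta)


for shape in (0.1, 10.0):
    near_boundary = beta_density(0.01, shape, shape)
    at_centre = beta_density(0.5, shape, shape)
    print(f"2N*mu = {shape}: density at p=0.01 is {near_boundary:.4g}, "
          f"at p=0.5 is {at_centre:.4g}")
```

With both parameters equal to 0.1 the density is far larger near the boundary than at the centre, and with both equal to 10 the ordering is reversed.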


With selection and mutation, the density of the stationary distribution is

\[
\pi(p) = \frac{1}{C}\,p^{2\theta_{21} - 1}(1-p)^{2\theta_{12} - 1}\,e^{2\sigma p}.
\]

Purifying selection has two consequences:

It shifts the stationary distribution in the direction of the favored allele.

It tends to reduce the amount of variation present at the selected locus.

[Figure: two panels showing the stationary distribution plotted against p, one for 2Ns = 1 and one for 2Ns = 2.]


Genetic variation is often summarized by a statistic called the heterozygosity (H) or nucleotide diversity (π):

\[
H = P\{\text{a random sample of two individuals contains two different alleles}\} \equiv \int_0^1 2p(1-p)\,\pi(p)\,dp.
\]

The figure below shows that directional selection reduces heterozygosity.

[Figure: heterozygosity H (vertical axis, 0.00 to 0.10) plotted against Ns (horizontal axis, 0 to 5).]
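The dependence of H on the strength of selection can be explored numerically from the stationary density with selection and mutation. This is a rough sketch using a midpoint rule; the mutation parameters θ21 = θ12 = 1 are an assumption chosen so that the integrand is smooth, and they are not the values used in the original figure.

```python
import math


def heterozygosity(theta21, theta12, sigma, n=2000):
    """H = integral of 2p(1-p) pi(p) dp, with pi known up to normalization."""
    h = 1.0 / n
    norm = 0.0
    het = 0.0
    for i in range(n):
        p = (i + 0.5) * h  # midpoint of the i-th subinterval
        weight = (p ** (2 * theta21 - 1)
                  * (1 - p) ** (2 * theta12 - 1)
                  * math.exp(2 * sigma * p))
        norm += weight
        het += 2 * p * (1 - p) * weight
    return het / norm  # the step size h cancels in the ratio


# Assumed mutation parameters; sigma plays the role of Ns.
for sigma in (0.0, 1.0, 2.0, 5.0):
    print(sigma, heterozygosity(1.0, 1.0, sigma))
```

For these parameters the neutral value is 2/5 (the mean of 2p(1 − p) under a Beta(2, 2) distribution), and the computed H decreases as σ increases.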


Purifying Selection and Polymorphism in Coding Regions

Prediction: If synonymous mutations are generally under weaker purifying selection than non-synonymous mutations, then we would expect synonymous diversity to be greater than non-synonymous diversity.

This is what is seen:

                   syn (H)    non-syn (H)    ratio (syn/non-syn)
D. melanogaster    0.0054     0.00038        14.2
humans (US)        0.0005     0.0001         5


Selection in Diploid Populations

Thus far we have focused on directional or purifying selection in a haploid population. However, many organisms are diploid throughout much of their life cycle, i.e., most chromosomes are present in two copies per genome.

The Human Karyotype:

46 chromosomes

22 pairs of autosomes

females have two X chromosomes

males have one X and one Y

haploid gametes (sperm and eggs) are produced by meiosis


Genetic variation in diploids: If there are two alleles present at a locus, then there are three possible genotypes:

homozygotes: A1A1 and A2A2

heterozygotes: A1A2 (= A2A1)

To model selection in such a population, we need to know the fitness of each of the three diploid genotypes.

genotype    relative fitness
A1A1        1 + s11
A1A2        1 + s12
A2A2        1 + s22


We will assume that the organism has the following life history:

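\[
\underset{(p,\,\infty)}{\text{gametes}}
\xrightarrow{\text{mutation}}
\underset{(p^{*},\,\infty)}{\text{gametes}}
\xrightarrow{\text{mating}}
\underset{(p^{*}_{ij},\,\infty)}{\text{zygotes}}
\xrightarrow{\text{selection}}
\underset{(p^{**}_{ij},\,\infty)}{\text{juveniles}}
\xrightarrow{\text{regulation}}
\underset{(p'_{ij},\,N)}{\text{adults}}
\xrightarrow{\text{meiosis}}
\underset{(p',\,\infty)}{\text{gametes}}
\]

Provided that mating is random, it suffices to track the changes in the gametic frequencies of A1 from generation to generation. The transition probabilities for p → p′ can be calculated by determining how p changes at each stage.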


Suppose that the gametic frequency of A1 in generation t is pN(t) = p.

Mutation: Each Ai gamete mutates to Aj with probability µij. This changes the frequency of A1 from p to p∗:

\[
p^{*} = p(1-\mu_{12}) + (1-p)\mu_{21}.
\]

Random mating: Because mating is random and the number of gametes is assumed to be infinite, the frequencies of the diploid genotypes immediately following mating are in Hardy-Weinberg equilibrium:

genotype    frequency
A1A1        $p^{*}_{11} = (p^{*})^2$
A1A2        $p^{*}_{12} = 2p^{*}(1-p^{*})$
A2A2        $p^{*}_{22} = (1-p^{*})^2$


Selection: Selection causes the frequency of each genotype to change in proportion to its relative fitness. If $p^{*}_{ij}$ is the frequency of AiAj before selection, then the frequency $p^{**}_{ij}$ after selection is

\[
p^{**}_{ij} = p^{*}_{ij}\left(\frac{w_{ij}}{\bar{w}}\right),
\]

where $w_{ij}$ is the relative fitness of AiAj and $\bar{w}$ is the mean fitness of the population:

\[
\begin{aligned}
\bar{w} &= (p^{*})^2(1+s_{11}) + 2p^{*}(1-p^{*})(1+s_{12}) + (1-p^{*})^2(1+s_{22}) \\
&= 1 + (p^{*})^2 s_{11} + 2p^{*}(1-p^{*})s_{12} + (1-p^{*})^2 s_{22}.
\end{aligned}
\]

Consulting the table of relative fitnesses, we find:

genotype    frequency after selection
A1A1        $p^{**}_{11} = p^{*}_{11}(1+s_{11})/\bar{w}$
A1A2        $p^{**}_{12} = p^{*}_{12}(1+s_{12})/\bar{w}$
A2A2        $p^{**}_{22} = p^{*}_{22}(1+s_{22})/\bar{w}$
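The random-mating and selection steps can be combined in a few lines. This is a sketch with assumed function names and illustrative selection coefficients; it works with the deterministic (infinite-population) frequencies only.

```python
def hardy_weinberg(p):
    """Genotype frequencies after random mating of an infinite gamete pool."""
    return {"A1A1": p * p, "A1A2": 2 * p * (1 - p), "A2A2": (1 - p) ** 2}


def select(genotype_freqs, s11, s12, s22):
    """Reweight genotype frequencies by relative fitness and renormalize."""
    fitness = {"A1A1": 1 + s11, "A1A2": 1 + s12, "A2A2": 1 + s22}
    mean_fitness = sum(genotype_freqs[g] * fitness[g] for g in genotype_freqs)
    return {g: genotype_freqs[g] * fitness[g] / mean_fitness for g in genotype_freqs}


# Hypothetical example: heterozygotes have a 10% fitness advantage.
before = hardy_weinberg(0.3)
after = select(before, s11=0.0, s12=0.1, s22=0.0)
print(before)
print(after)
```

Dividing by the mean fitness is what keeps the three post-selection frequencies summing to one.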


Population regulation: Population regulation occurs as in the Wright-Fisher model: the N adults are randomly sampled from the juvenile cohort. However, because the species is diploid, we are sampling 2N genes in total.

Suppose that $p'_{ij}$ denotes the frequency of AiAj genotypes following population regulation. Then, the numbers of adults of each of the three genotypes have a Multinomial distribution:

\[
N\,(p'_{11}, p'_{12}, p'_{22}) \sim \mathrm{Multinomial}(N, p^{**}_{11}, p^{**}_{12}, p^{**}_{22})
\]

Meiosis: The final stage is meiosis, during which each adult produces an effectively infinite number of haploid gametes. Whereas A1A1 adults produce only A1 gametes and A2A2 adults produce only A2 gametes, A1A2 adults produce an equal mixture of A1 and A2 gametes. It follows that the gametic frequency of A1 in generation t + 1 is equal to:

\[
p_N(t+1) = p' = p'_{11} + \frac{1}{2}\,p'_{12}.
\]
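Putting the stages together gives one full generation of the diploid model. The sketch below is an illustration under assumptions: it uses the standard library only, draws the multinomial sample as N independent categorical draws, and the parameter values are hypothetical.

```python
import random


def diploid_generation(p, N, s11, s12, s22, mu12, mu21, rng):
    """Gametic frequency of A1 after one generation of the diploid life cycle."""
    # Mutation among gametes.
    p_mut = p * (1 - mu12) + (1 - p) * mu21
    # Random mating: Hardy-Weinberg genotype frequencies.
    f11, f12, f22 = p_mut ** 2, 2 * p_mut * (1 - p_mut), (1 - p_mut) ** 2
    # Selection: weight by relative fitness and divide by mean fitness.
    w_bar = f11 * (1 + s11) + f12 * (1 + s12) + f22 * (1 + s22)
    q11 = f11 * (1 + s11) / w_bar
    q12 = f12 * (1 + s12) / w_bar
    # Regulation: multinomial sample of N adults.
    n11 = n12 = 0
    for _ in range(N):
        u = rng.random()
        if u < q11:
            n11 += 1
        elif u < q11 + q12:
            n12 += 1
    # Meiosis: heterozygotes contribute A1 to half of their gametes.
    return (n11 + 0.5 * n12) / N


rng = random.Random(0)
p_next = diploid_generation(0.3, 200, 0.0, 0.05, 0.0, 0.001, 0.001, rng)
print(p_next)
```

The returned frequency is always a multiple of 1/(2N), reflecting the 2N gene copies carried by the N sampled adults.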


To derive a diffusion approximation for this model, we must assume that selection andmutation are both of order O(1/2N):

µij ≡ µ(N)ij =

θij

2N

sij ≡ s(N)ij =

σij

2N.

With these scalings, a tedious but straightforward calculation shows that

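\[
\begin{aligned}
E_p[\delta] &= \frac{1}{2N}\Big[\theta_{21}(1 - p) - \theta_{12}\,p + \Big\{\tfrac{1}{2}(\sigma_{11} - \sigma_{22}) + (1 - 2p)(\sigma_{12} - \bar{\sigma})\Big\}\,p(1 - p)\Big] + O(N^{-2}) \\
E_p[\delta^2] &= \frac{1}{2N}\,p(1 - p) + O(N^{-2}) \\
E_p[\delta^e] &= O(N^{-2}) \quad \text{if } e \ge 3,
\end{aligned}
\]

where $\delta = p' - p$ and $\bar{\sigma} = \tfrac{1}{2}(\sigma_{11} + \sigma_{22})$.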
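The first-moment formula can be checked numerically in the selection-only case. The sketch below assumes no mutation (θ12 = θ21 = 0) and Hardy-Weinberg frequencies before selection; because multinomial sampling is unbiased, the exact expected change is the deterministic change due to selection. The function names and the σ values are illustrative, not taken from the slides.

```python
def expected_change(p, sigma11, sigma12, sigma22, N):
    """Exact E[p' - p] for the selection-only life cycle (no mutation)."""
    s11, s12, s22 = (x / (2 * N) for x in (sigma11, sigma12, sigma22))
    w_bar = 1 + p**2 * s11 + 2 * p * (1 - p) * s12 + (1 - p)**2 * s22
    # Frequency of A1 after selection: p**_11 + p**_12 / 2.
    p_after = (p**2 * (1 + s11) + p * (1 - p) * (1 + s12)) / w_bar
    return p_after - p

def scaled_drift(p, sigma11, sigma12, sigma22):
    """The bracketed selection term of E_p[delta], without the 1/(2N) factor."""
    sigma_bar = 0.5 * (sigma11 + sigma22)
    sel = 0.5 * (sigma11 - sigma22) + (1 - 2 * p) * (sigma12 - sigma_bar)
    return sel * p * (1 - p)

def approximation_error(N, p=0.3, sig=(2.0, 5.0, -1.0)):
    """Absolute gap between the exact change and the leading-order formula."""
    return abs(expected_change(p, *sig, N) - scaled_drift(p, *sig) / (2 * N))

err_small = approximation_error(1_000)
err_large = approximation_error(10_000)
```

Increasing N tenfold should shrink the gap by roughly a factor of 100, consistent with an O(N^-2) remainder.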


Selection in Diploid Populations

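It follows that the processes $(p_N(\lfloor 2Nt \rfloor) : t \ge 0)$ converge to a diffusion process with infinitesimal variance and drift coefficients

\[
\begin{aligned}
v(p) &= p(1 - p) \\
m(p) &= \theta_{21}(1 - p) - \theta_{12}\,p + \Big\{\tfrac{1}{2}(\sigma_{11} - \sigma_{22}) + (1 - 2p)(\sigma_{12} - \bar{\sigma})\Big\}\,p(1 - p)
\end{aligned}
\]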

Remarks:

We have rescaled time by a factor of 2N rather than N because there are 2N genes in a diploid population with N individuals.

The infinitesimal variance of the diffusion approximation is then the same as that for a haploid Wright-Fisher model with N individuals:

v(p) = p(1− p)


Selection in Diploid Populations

The infinitesimal drift of the diffusion approximation can be written as

m(p) = θ21(1− p)− θ12p + σ(p)p(1− p)

where

\[
\sigma(p) \equiv \tfrac{1}{2}(\sigma_{11} - \sigma_{22}) + (1 - 2p)(\sigma_{12} - \bar{\sigma}).
\]

If σ12 = σ̄, then σ(p) = ½(σ11 − σ22) is constant, as in the haploid model.

(This is called genic selection.)

Otherwise, selection is frequency-dependent.

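The marginal fitness of an allele is equal to the average of the fitnesses of the genotypes containing that allele, weighted by the frequencies of those genotypes:

\[
\begin{aligned}
w_{A_1} &= p(1 + s_{11}) + (1 - p)(1 + s_{12}) = 1 + p\,s_{11} + (1 - p)\,s_{12} \\
w_{A_2} &= (1 - p)(1 + s_{22}) + p(1 + s_{12}) = 1 + p\,s_{12} + (1 - p)\,s_{22} \\
w_{A_1} - w_{A_2} &= \tfrac{1}{2}(s_{11} - s_{22}) + (1 - 2p)(s_{12} - \bar{s}),
\end{aligned}
\]

where $\bar{s} = \tfrac{1}{2}(s_{11} + s_{22})$.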
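The expression for the difference in marginal fitnesses follows from a short calculation. As a sketch, taking $\bar{s} = \tfrac{1}{2}(s_{11} + s_{22})$ (an assumption about the notation, by analogy with $\bar{\sigma}$):

\[
\begin{aligned}
w_{A_1} - w_{A_2} &= p\,s_{11} + (1 - 2p)\,s_{12} - (1 - p)\,s_{22} \\
&= \tfrac{1}{2}(s_{11} - s_{22}) + (1 - 2p)\,s_{12} - (1 - 2p)\cdot\tfrac{1}{2}(s_{11} + s_{22}) \\
&= \tfrac{1}{2}(s_{11} - s_{22}) + (1 - 2p)(s_{12} - \bar{s}).
\end{aligned}
\]

The second line uses $p\,s_{11} = \tfrac{1}{2}s_{11} - \tfrac{1}{2}(1 - 2p)s_{11}$ and $(1 - p)\,s_{22} = \tfrac{1}{2}s_{22} + \tfrac{1}{2}(1 - 2p)s_{22}$. Multiplying by 2N gives $2N(w_{A_1} - w_{A_2}) = \sigma(p)$, so the selection term in the drift is the scaled difference in marginal fitnesses.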


Selection in Diploid Populations

Inbreeding Depression and Recessive Deleterious Alleles

Suppose that A1 is deleterious compared to A2 and that the selection coefficients have the form

\[
\sigma_{11} = -\sigma, \qquad \sigma_{12} = -h\sigma, \qquad \sigma_{22} = 0,
\]

where σ > 0 and h ∈ [0, 1].

The constant h is called the dominance coefficient because it quantifies the contribution of the A1 allele to the fitness of the heterozygote. A1 is said to be

dominant if h ∈ (1/2, 1]

recessive if h ∈ [0, 1/2)

additive if h = 1/2.


Selection in Diploid Populations

In this case, the stationary distribution of the diffusion process has density

\[
\pi(p) = \frac{1}{C}\, p^{2\theta_{21} - 1} (1 - p)^{2\theta_{12} - 1}\, e^{-\sigma p + (1 - 2h)\sigma p(1 - p)}.
\]

Because the exponent is a decreasing function of h, recessive deleterious alleles tend to be more common than dominant deleterious alleles.

[Figure: Equilibrium Frequency of Deleterious Alleles. Horizontal axis: h (dominance coefficient), from 0 to 1; vertical axis: p, from 0 to 0.025; curves for 2Ns = −10, −20, and −100.]
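The effect of dominance on the stationary distribution can be explored numerically. The sketch below evaluates the stationary density on a grid and reports its mean for a recessive, an additive, and a dominant deleterious allele. The values θ21 = θ12 = 1 and σ = 10 are illustrative assumptions (chosen so the density is bounded at the endpoints), the mean frequency is just one convenient summary, and the function name `mean_frequency` is hypothetical; this is not an attempt to reproduce the figure.

```python
import numpy as np

def mean_frequency(h, sigma, theta21=1.0, theta12=1.0, n=20000):
    """Mean of the stationary density, using a midpoint rule on (0, 1)."""
    p = (np.arange(n) + 0.5) / n   # grid midpoints, avoids p = 0 and p = 1
    log_density = ((2 * theta21 - 1) * np.log(p)
                   + (2 * theta12 - 1) * np.log(1 - p)
                   - sigma * p + (1 - 2 * h) * sigma * p * (1 - p))
    # Subtract the maximum before exponentiating for numerical stability;
    # the normalizing constant C cancels in the ratio below.
    weights = np.exp(log_density - log_density.max())
    return float(np.sum(p * weights) / np.sum(weights))

# Recessive (h = 0), additive (h = 1/2) and dominant (h = 1) cases.
means = [mean_frequency(h, sigma=10.0) for h in (0.0, 0.5, 1.0)]
```

Under these assumptions the mean frequency falls as h increases, in line with the statement that recessive deleterious alleles tend to be more common.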


Selection in Diploid Populations

Many species have mechanisms that reduce the likelihood of inbreeding. These have probably evolved to avoid inbreeding depression caused by recessive deleterious alleles.

Many deleterious alleles are loss-of-function mutations that are recessive because a single functional copy of the gene produces enough of the protein.

Recessive deleterious alleles can rise to significant frequencies because they are shielded from selection in heterozygotes.

If the frequency of such an allele is p ≪ 1, then the frequency of deleterious homozygotes in an outbred population is p² ≪ 1.

In contrast, the frequency of such homozygotes in a cross between two sibs will be (approximately) 2p × 1/2 = p ≫ p².

Thus, inbred individuals are much more likely to suffer from heritable diseases caused by recessive deleterious alleles.


Selection in Diploid Populations

Overdominance and Balancing Selection

If the fitness of the heterozygote is greater than the fitness of either homozygote, i.e., if σ12 > σ11, σ22, then the heterozygote is said to be overdominant. In this case, there is an intermediate frequency

\[
\bar{p} = \frac{\sigma_{12} - \sigma_{22}}{2\sigma_{12} - \sigma_{11} - \sigma_{22}} \in (0, 1),
\]

such that

σ(p̄) = 0 (both alleles are equally fit)

σ(p) > 0 if p < p̄ (A1 is more fit)

σ(p) < 0 if p > p̄ (A2 is more fit)

Thus, A1 tends to rise in frequency when rare and tends to decrease when common. This kind of selection is called balancing selection because it maintains genetic variation in the population.
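The sign pattern of σ(p) can be read off from a factorization. As a sketch, expanding the definition of σ(p) with $\bar{\sigma} = \tfrac{1}{2}(\sigma_{11} + \sigma_{22})$ gives

\[
\begin{aligned}
\sigma(p) &= p\,\sigma_{11} + (1 - 2p)\,\sigma_{12} - (1 - p)\,\sigma_{22} \\
&= (\sigma_{12} - \sigma_{22}) - p\,(2\sigma_{12} - \sigma_{11} - \sigma_{22}) \\
&= (2\sigma_{12} - \sigma_{11} - \sigma_{22})\,(\bar{p} - p).
\end{aligned}
\]

Under overdominance the factor $2\sigma_{12} - \sigma_{11} - \sigma_{22}$ is positive, so σ(p) has the same sign as $\bar{p} - p$. The numerator of $\bar{p}$ is positive and smaller than the denominator by $\sigma_{12} - \sigma_{11} > 0$, which is why $\bar{p}$ lies in (0, 1).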


Selection in Diploid Populations

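The tendency of balancing selection to maintain variation can be seen in the density of the stationary distribution for this diffusion:

\[
\pi(p) = \frac{1}{C}\, p^{2\mu_1 - 1} q^{2\mu_2 - 1}\, e^{(2\sigma_{12} - \sigma_{11} - \sigma_{22})\,p(2\bar{p} - p)},
\]

where q = 1 − p, and µ1 and µ2 play the role of the scaled mutation rates written θ21 and θ12 above.

Symmetric Balancing Selection: In the following histograms, σ11 = σ22 = 0, 2σ12 = 4Ns, and 4Nµ = 0.1.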

[Figure: two histograms of the stationary distribution under symmetric balancing selection, one panel for 4Ns = 10 and one for 4Ns = 40. Horizontal axis: p, in bins from 0.05 to 0.95; vertical axis from 0.00 to 0.20.]


Selection in Diploid Populations

Example: The classic example of overdominance is the sickle cell mutation that is prevalent in some human populations with a high incidence of malaria infections. This is an amino-acid changing mutation which causes hemoglobin molecules to clump together.

There are two alleles: A, which is the non-sickle-cell (‘wild type’) allele, and S, which causes sickling of red blood cells. The diploid genotypes and their phenotypes are:

AA: These individuals have normal hemoglobin, but are susceptible to malaria infections (which can be fatal in children and pregnant women).

AS: These individuals have a mild form of anemia but are very resistant to malaria infection.

SS : These individuals suffer from a very severe anemia.


Selection in Diploid Populations

In regions with a high incidence of malaria, the benefits of the resistance to malaria conferred by the AS genotype outweigh the costs of the mild anemia, and AS heterozygotes have higher fitness than either homozygote.

The viabilities of the three genotypes in malarial regions have been estimated to be (Cavalli-Sforza and Bodmer, 1971):

genotype     SS     AS     AA
viability    0.2    1.1    1

Using these fitnesses, the model predicts an equilibrium frequency for S of p̄ ≈ 0.1, and the observed frequency is about 0.09 averaged across West Africa.
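The predicted equilibrium can be reproduced directly from the viabilities. The sketch below applies the formula for p̄ with S in the role of A1 (fitness differences can be used in place of the scaled coefficients because the factor 2N cancels) and then iterates the deterministic selection recursion, which ignores drift and mutation. The function names, starting frequencies, and number of generations are illustrative choices.

```python
def equilibrium_frequency(w11, w12, w22):
    """Interior equilibrium frequency of allele 1 under overdominance."""
    return (w12 - w22) / (2 * w12 - w11 - w22)

def iterate(p, w11, w12, w22, generations):
    """Deterministic change in allele frequency under viability selection."""
    for _ in range(generations):
        w_bar = p * p * w11 + 2 * p * (1 - p) * w12 + (1 - p) ** 2 * w22
        # Frequency of allele 1 among gametes after selection.
        p = p * (p * w11 + (1 - p) * w12) / w_bar
    return p

# Viabilities from the table above, with S as allele 1.
W_SS, W_AS, W_AA = 0.2, 1.1, 1.0

p_bar = equilibrium_frequency(W_SS, W_AS, W_AA)
p_from_rare = iterate(0.01, W_SS, W_AS, W_AA, 500)
p_from_common = iterate(0.5, W_SS, W_AS, W_AA, 500)
```

Starting from either a rare or a common S allele, the recursion settles at the same interior frequency, which illustrates how balancing selection maintains both alleles.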

In contrast, in regions with little or no malaria, the sickle cell mutation is deleterious and is usually very rare.
