
Mathematical Genetics - UCLA Human · PDF fileMathematical Genetics 6.1 Introduction ... the rewards in combing two fields such as applied mathematics and genetics is commensurate


Chapter 6

Mathematical Genetics

6.1 Introduction

In this chapter we study some of the basic ideas of population genetics [11, 13, 17, 31, 40, 53, 63]. Many scientists contributed to the development of this lovely theory. Its early phase was dominated by Fisher, Haldane, and Wright. With the advent of molecular genetics, the theory has become even more relevant in answering questions in medical genetics and evolutionary biology. The theory is also worth pursuing for its pedagogical value in illustrating (a) the translation of scientific observations and speculations into mathematical models, and (b) the manipulation of these models to yield specific quantitative predictions. We will limit ourselves to simple models phrased in terms of difference equations. Some of the relevant difference equations are nonlinear, so we will stress qualitative issues such as the stability of fixed points and the time scales on which the equations operate.

Our treatment begins with a brief summary of the peculiar vocabulary of population genetics. This summary is self-contained, but readers who want to better appreciate the context and implications of the theory are urged to learn molecular genetics through formal course work or informal self-study. Although the learning curve is initially steep, the rewards of combining two fields such as applied mathematics and genetics are commensurate.

6.2 Genetics Background

The classical genetic definitions of interest to us predate the modern molecular era. First, genes occur at definite sites, or loci, along a chromosome. Each locus can be occupied by one of several variant genes called alleles. Most human cells contain 46 chromosomes. Two of these are sex chromosomes: two paired X’s for a female, and an X and a Y for a male. The remaining 22 homologous pairs of chromosomes are termed autosomes. One member of each chromosome pair is maternally derived via an egg; the other member is paternally derived via a sperm. Except for the sex chromosomes, it follows that there are two genes at every locus. These constitute a person’s genotype at that locus. If the two alleles are identical, then the person is a homozygote; otherwise, he is a heterozygote. Typically, one denotes a genotype by two allele symbols separated by a slash /. Genotypes may not be observable. By definition, what is observable is a person’s phenotype.

A simple example will serve to illustrate these definitions. The ABO locus resides on the long arm of chromosome 9 at band q34. This locus determines detectable antigens on the surface of red blood cells. There are three alleles, A, B, and O, which determine an A antigen, a B antigen, and the absence of either antigen, respectively. Phenotypes are recorded by reacting antibodies for A and B against a blood sample. The four observable phenotypes are A (antigen A alone detected), B (antigen B alone detected), AB (antigens A and B both detected), and O (neither antigen A nor B detected). These correspond to the genotype sets given in Table 2.1.

Table 2.1: Phenotypes at the ABO Locus

| Phenotype | Genotypes |
|---|---|
| A | A/A, A/O |
| B | B/B, B/O |
| AB | A/B |
| O | O/O |

Note that phenotype A results from either the homozygous genotype A/A or the heterozygous genotype A/O; similarly, phenotype B results from either B/B or B/O. Alleles A and B both mask the presence of the O allele and are said to be dominant to it. Alternatively, O is recessive to A and B. Relative to one another, alleles A and B are codominant.

The six genotypes listed above at the ABO locus are unordered in the sense that maternal and paternal contributions are not distinguished. In some cases it is helpful to deal with ordered genotypes. When we do, we will adopt the convention that the maternal allele is listed to the left of the slash and the paternal allele is listed to the right. With three alleles, the ABO locus has nine distinct ordered genotypes.

The Hardy-Weinberg law of population genetics permits calculation of genotype frequencies from allele frequencies. In the ABO example above, if the frequency of the A allele is $p_A$ and the frequency of the B allele is $p_B$, then a random individual will have phenotype AB with frequency $2p_Ap_B$. Thus, if $p_A = .2$ and $p_B = .05$, then AB people constitute 2 percent of the population. The factor of 2 in the heterozygote frequency $2p_Ap_B$ reflects the two equally likely ordered genotypes A/B and B/A. In essence, Hardy-Weinberg equilibrium corresponds to the random union of two gametes, one gamete being an egg and the other being a sperm. A union of two gametes, incidentally, is called a zygote.
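As a minimal sketch (in Python, not part of the original text), the Hardy-Weinberg calculation can be carried out by enumerating the nine ordered ABO genotypes, giving each ordered genotype the product of its maternal and paternal allele frequencies, and pooling genotypes into phenotypes by the dominance relations of Table 2.1. The value $p_O = 1 - p_A - p_B = .75$ follows from the example; the function names are illustrative.

```python
from itertools import product

def abo_phenotype(allele1: str, allele2: str) -> str:
    """Map an ABO genotype to its phenotype: A and B are dominant to O, codominant to each other."""
    antigens = {allele1, allele2} - {"O"}
    if not antigens:
        return "O"
    return "".join(sorted(antigens))  # "A", "B", or "AB"

def abo_phenotype_frequencies(p: dict[str, float]) -> dict[str, float]:
    """Hardy-Weinberg phenotype frequencies computed from allele frequencies p."""
    freqs = {"A": 0.0, "B": 0.0, "AB": 0.0, "O": 0.0}
    # Ordered genotypes (maternal, paternal): 3 x 3 = 9 of them.
    for maternal, paternal in product(p, repeat=2):
        freqs[abo_phenotype(maternal, paternal)] += p[maternal] * p[paternal]
    return freqs

p = {"A": 0.2, "B": 0.05, "O": 0.75}
freqs = abo_phenotype_frequencies(p)
for phenotype, freq in freqs.items():
    print(f"{phenotype:>2}: {freq:.4f}")
```

The AB entry reproduces the 2 percent quoted above; the A, B, and O phenotypes come out at .34, .0775, and .5625, and A/B and B/A each contribute half of the AB total.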

In gene mapping studies, several genetic loci on the same chromosome are phenotyped. When these loci are simultaneously followed in a human pedigree, the phenomenon of recombination can often be observed. This reshuffling of genetic material manifests itself when a parent transmits to a child a chromosome that differs from both of the corresponding homologous parental chromosomes. Recombination takes place during the formation of gametes at meiosis. Suppose, for the sake of argument, that in the parent producing the gamete, one member of each chromosome pair is painted black and the other member is painted white. Instead of inheriting an all-black or an all-white representative of a given pair, a gamete inherits a chromosome that alternates between black and white. The points of exchange are termed crossovers. Any given gamete will have just a few randomly positioned crossovers per chromosome. The recombination fraction between two loci on the same chromosome is the probability that they end up in regions of different color in a gamete. This event occurs whenever the two loci are separated by an odd number of crossovers along the gamete. This brief description will be adequate for our purposes even though it grossly simplifies the phenomenon of recombination.
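The odd-number-of-crossovers rule can be made concrete under an extra assumption that the text does not make: suppose, purely for illustration, that the number of crossovers between two loci is Poisson with mean $d$ (Haldane's no-interference model). Summing the Poisson probabilities of odd counts then gives the recombination fraction, which has the closed form $(1 - e^{-2d})/2$ and never exceeds $\frac12$. The Python sketch below checks this numerically; the function name is hypothetical.

```python
import math

def recombination_fraction(d: float, max_terms: int = 200) -> float:
    """P(odd number of crossovers) when the crossover count is Poisson(d).

    Assumes Haldane's no-interference model, an illustrative choice only.
    """
    total = 0.0
    term = math.exp(-d)  # P(K = 0)
    for k in range(1, max_terms):
        term *= d / k  # P(K = k) obtained from P(K = k - 1)
        if k % 2 == 1:
            total += term
    return total

for d in (0.05, 0.5, 2.0, 10.0):
    theta = recombination_fraction(d)
    closed_form = (1.0 - math.exp(-2.0 * d)) / 2.0
    print(f"d = {d:5.2f}  theta = {theta:.6f}  closed form = {closed_form:.6f}")
```

As $d$ grows the loci behave as if they were on different chromosomes, and $\theta$ creeps up to, but never past, $\frac12$.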

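Figure 6.1: A Pedigree with ABO and AK1 Phenotypes
[Pedigree drawing not reproduced. Individual (ABO phenotype; AK1 genotype): 1 (A; A1/A1), 2 (O; A2/A2), 3 (A; A1/A2), 4 (O; A2/A2), 5 (O; A1/A2). Individuals 1 and 2 are the parents of 3; individuals 3 and 4 are the parents of 5.]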

As a concrete example, consider the locus AK1 (adenylate kinase 1) in the vicinity of ABO on chromosome 9. With modern biochemical techniques it is possible to identify two codominant alleles, A1 and A2, at this enzyme locus. Figure 6.1 depicts a pedigree with phenotypes listed at the ABO locus and unordered genotypes listed at the AK1 locus. In this pedigree, as in all pedigrees, circles denote females and squares denote males. Individuals 1, 2, and 4 are termed the founders of the pedigree. Parents of founders are not included in the pedigree.

Close examination of the pedigree shows that individual 3 has alleles A and A1 on his paternally derived chromosome 9 and alleles O and A2 on his maternally derived chromosome 9. However, he passes to his child 5 a chromosome with O and A1 alleles. In other words, the gamete passed is recombinant between the loci ABO and AK1. On the basis of many such observations, it is known empirically that doubly heterozygous males like 3 produce recombinant gametes about 12 percent of the time. In females the recombination fraction is about 20 percent.
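To see what these recombination fractions imply, one can list the gametes of a doubly heterozygous parent with haplotypes A-A1 and O-A2, as for individual 3. Each parental haplotype is transmitted with probability $(1-\theta)/2$ and each recombinant haplotype with probability $\theta/2$. The Python sketch below tabulates these probabilities using the male value $\theta = .12$ quoted above; the function name and the encoding of haplotypes as tuples are illustrative choices.

```python
def gamete_distribution(hap1, hap2, theta):
    """Gamete probabilities at two linked loci for a parent with haplotypes hap1 and hap2.

    Assumes the parent is heterozygous at both loci, so the four gametes are distinct.
    Nonrecombinant gametes copy a parental haplotype; recombinant gametes take the
    first locus from one haplotype and the second locus from the other.
    """
    nonrecomb = (1.0 - theta) / 2.0
    recomb = theta / 2.0
    return {
        hap1: nonrecomb,
        hap2: nonrecomb,
        (hap1[0], hap2[1]): recomb,
        (hap2[0], hap1[1]): recomb,
    }

# Individual 3: paternal haplotype (A, A1), maternal haplotype (O, A2); male theta.
dist = gamete_distribution(("A", "A1"), ("O", "A2"), theta=0.12)
for haplotype, prob in dist.items():
    print("".join(haplotype), round(prob, 3))
```

The recombinant O-A1 gamete that individual 3 actually passed to child 5 has probability .06 under these numbers.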

The pedigree in Figure 6.1 is atypical in several senses. First, it is quite simple graphically. Second, everyone is phenotyped; in larger pedigrees, some people will be dead or otherwise unavailable for typing. Third, it is constructed so that recombination can be unambiguously determined. In most matings, one cannot directly count recombinant and nonrecombinant gametes. This forces geneticists to rely on indirect statistical arguments to overcome the problem of missing information. Part of the missing information in pedigree data has to do with phase. Alleles O and A2 are in phase in individual 3 of Figure 6.1. In general, a gamete’s sequence of alleles along a chromosome constitutes a haplotype. The alleles appearing in the haplotype are said to be in phase. Two such haplotypes together determine a multilocus genotype (or simply a genotype when the context is clear).

Recombination or linkage studies are conducted with loci called traits and markers. Trait loci typically determine genetic diseases or interesting biochemical or physiological differences between individuals. Marker loci, which need not be genetic loci in the traditional sense at all, are signposts along the chromosomes. A marker locus is simply a place on a chromosome showing detectable population differences. These differences or alleles permit recombination to be measured between the trait and marker loci. In practice, recombination between two loci can be observed only when the parent contributing a gamete is heterozygous at both loci. In linkage analysis it is therefore advantageous for a locus to have several common alleles. Such loci are said to be polymorphic.

The number of haplotypes possible for a given set of loci is the product of the numbers of alleles possible at each locus. In the ABO-AK1 example, there are $k = 3 \times 2 = 6$ possible haplotypes. These can form $k^2$ genotypes based on ordered haplotypes or $k + \frac{k(k-1)}{2} = \frac{k(k+1)}{2}$ genotypes based on unordered haplotypes.

To compute the population frequencies of random haplotypes, one can invoke linkage equilibrium. This rule stipulates that a haplotype frequency is the product of the underlying allele frequencies. For instance, the frequency of an OA1 haplotype is $p_Op_{A1}$, where $p_O$ and $p_{A1}$ are the population frequencies of the alleles O and A1, respectively. To compute the frequency of a multilocus genotype, one can view it as the union of two random gametes in imitation of the Hardy-Weinberg law. For example, the genotype of person 2 in Figure 6.1 has population frequency $(p_Op_{A2})^2$, being the union of two OA2 haplotypes. Exceptions to the rule of linkage equilibrium often occur for tightly linked loci.
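The counting rules and the linkage-equilibrium calculation are easy to check by brute force. The Python sketch below enumerates the six ABO-AK1 haplotypes, counts ordered and unordered two-haplotype genotypes, and evaluates the frequency of person 2's genotype. The allele frequencies are made-up illustrative values, not estimates from the text.

```python
from itertools import combinations_with_replacement, product

abo_alleles = ["A", "B", "O"]
ak1_alleles = ["A1", "A2"]

# Haplotypes carry one allele per locus, so there are 3 x 2 = 6 of them.
haplotypes = list(product(abo_alleles, ak1_alleles))
k = len(haplotypes)

ordered_genotypes = list(product(haplotypes, repeat=2))  # (maternal, paternal)
unordered_genotypes = list(combinations_with_replacement(haplotypes, 2))
print(k, len(ordered_genotypes), len(unordered_genotypes))

# Hypothetical allele frequencies, for illustration only.
freq = {"A": 0.25, "B": 0.10, "O": 0.65, "A1": 0.95, "A2": 0.05}

def haplotype_frequency(hap):
    """Linkage equilibrium: a haplotype frequency is the product of its allele frequencies."""
    abo, ak1 = hap
    return freq[abo] * freq[ak1]

# Person 2 carries two O-A2 haplotypes, so the genotype frequency is (p_O * p_A2)^2.
person2 = haplotype_frequency(("O", "A2")) ** 2
print(person2)
```

The counts come out as $k = 6$, $k^2 = 36$, and $k(k+1)/2 = 21$, matching the formulas above.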

6.3 Hardy-Weinberg Equilibrium

Let us now consider a formal mathematical model for the establishment of Hardy-Weinberg equilibrium. This model relies on the following seven explicit assumptions: (1) infinite population size, (2) discrete generations, (3) random mating, (4) no selection, (5) no migration, (6) no mutation, and (7) equal initial genotype frequencies in the two sexes. Suppose for the sake of simplicity that there are two alleles A1 and A2 at some autosomal locus in this infinite population and that all genotypes are unordered. Consider the result of crossing the genotype A1/A1 with the genotype A1/A2. The first genotype produces only A1 gametes, and the second genotype yields gametes A1 and A2 in equal proportion. For the cross under consideration, gametes produced by the genotype A1/A1 are equally likely to combine with either gamete type issuing from the genotype A1/A2. Thus, for the cross A1/A1 × A1/A2, the frequency of offspring obviously is $\frac12$ A1/A1 and $\frac12$ A1/A2. Similarly, the cross A1/A1 × A2/A2 yields only A1/A2 offspring. The cross A1/A2 × A1/A2 produces offspring in the ratio $\frac14$ A1/A1, $\frac12$ A1/A2, and $\frac14$ A2/A2. These proportions of outcomes for the various possible crosses are known as segregation ratios.

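Table 3.2: Mating Outcomes for Hardy-Weinberg Equilibrium

| Mating Type | Nature of Offspring | Frequency |
|---|---|---|
| A1/A1 × A1/A1 | A1/A1 | $u^2$ |
| A1/A1 × A1/A2 | $\frac12$ A1/A1 + $\frac12$ A1/A2 | $2uv$ |
| A1/A1 × A2/A2 | A1/A2 | $2uw$ |
| A1/A2 × A1/A2 | $\frac14$ A1/A1 + $\frac12$ A1/A2 + $\frac14$ A2/A2 | $v^2$ |
| A1/A2 × A2/A2 | $\frac12$ A1/A2 + $\frac12$ A2/A2 | $2vw$ |
| A2/A2 × A2/A2 | A2/A2 | $w^2$ |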

Suppose the initial proportions of the genotypes are $u$ for A1/A1, $v$ for A1/A2, and $w$ for A2/A2. Under the stated assumptions, the next generation will be composed as shown in Table 3.2. The entries in Table 3.2 yield for the three genotypes A1/A1, A1/A2, and A2/A2 the new frequencies

$$u^2 + uv + \tfrac14 v^2 = \left(u + \tfrac12 v\right)^2,$$
$$uv + 2uw + \tfrac12 v^2 + vw = 2\left(u + \tfrac12 v\right)\left(\tfrac12 v + w\right),$$
$$\tfrac14 v^2 + vw + w^2 = \left(\tfrac12 v + w\right)^2,$$

respectively. If we define the frequencies of alleles A1 and A2 as $p_1 = u + \frac{v}{2}$ and $p_2 = \frac{v}{2} + w$, then A1/A1 occurs with frequency $p_1^2$, A1/A2 with frequency $2p_1p_2$, and A2/A2 with frequency $p_2^2$. After a second round of random mating, the frequencies of the genotypes A1/A1, A1/A2, and A2/A2 are

$$\left(p_1^2 + \tfrac12 \cdot 2p_1p_2\right)^2 = \left[p_1(p_1 + p_2)\right]^2 = p_1^2,$$
$$2\left(p_1^2 + \tfrac12 \cdot 2p_1p_2\right)\left(\tfrac12 \cdot 2p_1p_2 + p_2^2\right) = 2p_1(p_1 + p_2)\,p_2(p_1 + p_2) = 2p_1p_2,$$
$$\left(\tfrac12 \cdot 2p_1p_2 + p_2^2\right)^2 = \left[p_2(p_1 + p_2)\right]^2 = p_2^2.$$

Thus, after a single round of random mating, genotype frequencies stabilize at the Hardy-Weinberg proportions.
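The one-generation convergence can be checked directly from the segregation ratios in Table 3.2. The Python sketch below (the function name is illustrative) applies one round of random mating to arbitrary starting proportions $u$, $v$, $w$, using exact rational arithmetic so that the equalities hold exactly rather than up to rounding.

```python
from fractions import Fraction as F

def random_mating(u, v, w):
    """One generation of random mating, summing the rows of Table 3.2.

    u, v, w are the frequencies of A1/A1, A1/A2, A2/A2; returns the offspring frequencies.
    """
    # (mating-type frequency, offspring proportions of A1/A1, A1/A2, A2/A2)
    matings = [
        (u * u,     (1, 0, 0)),
        (2 * u * v, (F(1, 2), F(1, 2), 0)),
        (2 * u * w, (0, 1, 0)),
        (v * v,     (F(1, 4), F(1, 2), F(1, 4))),
        (2 * v * w, (0, F(1, 2), F(1, 2))),
        (w * w,     (0, 0, 1)),
    ]
    return tuple(sum(freq * props[g] for freq, props in matings) for g in range(3))

start = (F(1, 2), F(1, 10), F(2, 5))   # arbitrary u, v, w summing to 1
gen1 = random_mating(*start)
gen2 = random_mating(*gen1)
p1 = start[0] + start[1] / 2           # p1 = u + v/2
print(gen1, gen2)
```

Here $p_1 = \frac{11}{20}$, and both generations equal $(\frac{121}{400}, \frac{198}{400}, \frac{81}{400})$, the Hardy-Weinberg proportions.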

We may deduce the same result by considering the gamete population. A1 gametes have frequency $p_1$ and A2 gametes frequency $p_2$. Since random union of gametes is equivalent to random mating, A1/A1 is present in the next generation with frequency $p_1^2$, A1/A2 with frequency $2p_1p_2$, and A2/A2 with frequency $p_2^2$. In the gamete pool from this new generation, A1 again occurs with frequency $p_1^2 + p_1p_2 = p_1(p_1 + p_2) = p_1$ and A2 with frequency $p_2$. In other words, stability is attained in a single generation. This random union of gametes argument generalizes easily to more than two alleles.

Hardy-Weinberg equilibrium is a bit more subtle for X-linked loci. Consider a locus on the X chromosome and any allele at that locus. At generation $n$ let the frequency of the given allele in females be $q_n$ and in males be $r_n$. Under our stated assumptions for Hardy-Weinberg equilibrium, one can show that $q_n$ and $r_n$ converge quickly to the value $p = \frac23 q_0 + \frac13 r_0$. Twice as much weight is attached to the initial female frequency since females have two X chromosomes while males have only one.

Because a male always gets his X chromosome from his mother, and his mother precedes him by one generation,

$$r_n = q_{n-1}. \qquad (6.1)$$

Likewise, the frequency in females is the average frequency for the two sexes from the preceding generation; in symbols,

$$q_n = \tfrac12 q_{n-1} + \tfrac12 r_{n-1}. \qquad (6.2)$$

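Equations (6.1) and (6.2) together imply

$$\tfrac23 q_n + \tfrac13 r_n = \tfrac23\left(\tfrac12 q_{n-1} + \tfrac12 r_{n-1}\right) + \tfrac13 q_{n-1} = \tfrac23 q_{n-1} + \tfrac13 r_{n-1}. \qquad (6.3)$$

It follows that the weighted average $\frac23 q_n + \frac13 r_n = p$ for all $n$. From equations (6.2) and (6.3), we deduce that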

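$$\begin{aligned}
q_n - p &= q_n - \tfrac32 p + \tfrac12 p \\
&= \tfrac12 q_{n-1} + \tfrac12 r_{n-1} - \tfrac32\left(\tfrac23 q_{n-1} + \tfrac13 r_{n-1}\right) + \tfrac12 p \\
&= -\tfrac12 q_{n-1} + \tfrac12 p \\
&= -\tfrac12 (q_{n-1} - p).
\end{aligned}$$

Continuing in this manner,

$$q_n - p = \left(-\tfrac12\right)^n (q_0 - p).$$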

Thus the difference between $q_n$ and $p$ diminishes by half each generation, and $q_n$ approaches $p$ in a zigzag manner. The male frequency $r_n$ displays the same behavior but lags behind $q_n$ by one generation. In contrast to the autosomal case, it takes more than one generation to achieve equilibrium. However, equilibrium is still approached relatively fast. In the extreme case that $q_0 = .75$ and $r_0 = .12$, Figure 6.2 plots $q_n$ for a few representative generations.

Figure 6.2: Approach to Equilibrium of $q_n$ as a Function of $n$ [plot not reproduced]
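The recurrences (6.1) and (6.2) are simple enough to iterate by hand, but a few lines of Python make the zigzag of Figure 6.2 explicit. This sketch uses the starting values $q_0 = .75$ and $r_0 = .12$ from the text, for which $p = \frac23(.75) + \frac13(.12) = .54$, and prints the female and male frequencies side by side; the function name is illustrative.

```python
def x_linked_frequencies(q0: float, r0: float, generations: int):
    """Iterate r_n = q_{n-1} (6.1) and q_n = (q_{n-1} + r_{n-1}) / 2 (6.2)."""
    q, r = q0, r0
    history = [(q, r)]
    for _ in range(generations):
        q, r = 0.5 * q + 0.5 * r, q  # right side uses the previous generation's q and r
        history.append((q, r))
    return history

q0, r0 = 0.75, 0.12
p = 2 * q0 / 3 + r0 / 3
history = x_linked_frequencies(q0, r0, 10)
for n, (q, r) in enumerate(history):
    print(f"n = {n:2d}  q_n = {q:.5f}  r_n = {r:.5f}  q_n - p = {q - p:+.5f}")
```

The deviation $q_n - p$ alternates in sign and halves each generation, as the closed form $(-\frac12)^n(q_0 - p)$ predicts; after ten generations it is about $2 \times 10^{-4}$.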

At equilibrium how do we calculate the frequencies of the various genotypes? Suppose we have two alleles A1 and A2 with equilibrium frequencies $p_1$ and $p_2$. Then the female genotypes A1/A1, A1/A2, and A2/A2 have frequencies $p_1^2$, $2p_1p_2$, and $p_2^2$, respectively, just as in the autosomal case. In males the hemizygous genotypes A1 and A2 clearly have frequencies $p_1$ and $p_2$.

Example 6.3.1 Hardy-Weinberg Equilibrium for the Xg(a) Locus

The red cell antigen Xg(a) is an X-linked dominant with a frequency in Caucasians of approximately $p = .65$. Thus, about .65 of all Caucasian males and about $p^2 + 2p(1-p) = .88$ of all Caucasian females carry the antigen.

6.4 Linkage Equilibrium

Loci on nonhomologous chromosomes show independent segregation at meiosis. In contrast, genes at two physically close loci on the same chromosome tend to stick together during the formation of gametes. The recombination fraction $\theta$ between two loci is a monotone, nonlinear function of the physical distance separating them. In family studies in man or in breeding studies in other species, $\theta$ is the observable rather than physical distance. One can show that $0 \le \theta \le \frac12$. The lower bound on $\theta$ is obvious; the upper bound is attained by two loci on nonhomologous chromosomes.

The population genetics law of linkage equilibrium is of fundamental importance in theoretical calculations. Convergence to linkage equilibrium can be proved under the same assumptions used to prove Hardy-Weinberg equilibrium. Suppose that allele $A_i$ at locus A has frequency $p_i$ and allele $B_j$ at locus B has frequency $q_j$. Let $P_n(A_iB_j)$ be the frequency of chromosomes with alleles $A_i$ and $B_j$ among those gametes produced at generation $n$. Since recombination fractions almost invariably differ between the sexes, let $\theta_f$ and $\theta_m$ be the female and male recombination fractions, respectively, between the two loci. The average $\theta = (\theta_f + \theta_m)/2$ governs the rate of approach to linkage equilibrium.

We can express $P_n(A_iB_j)$ by conditioning on whether a gamete is an egg or a sperm and on whether nonrecombination or recombination occurs. If recombination occurs, then the gamete carries the two alleles $A_i$ and $B_j$ with equilibrium probability $p_iq_j$. Thus, the appropriate recurrence relation is

$$\begin{aligned}
P_n(A_iB_j) &= \tfrac12\left[(1-\theta_f)P_{n-1}(A_iB_j) + \theta_f p_iq_j\right] + \tfrac12\left[(1-\theta_m)P_{n-1}(A_iB_j) + \theta_m p_iq_j\right] \\
&= (1-\theta)P_{n-1}(A_iB_j) + \theta p_iq_j.
\end{aligned}$$

Note that this recurrence relation is valid when the two loci occur on nonhomologous chromosomes provided $\theta = \frac12$ and we interpret $P_n(A_iB_j)$ as the probability that someone at generation $n$ receives a gamete bearing the two alleles $A_i$ and $B_j$. Subtracting $p_iq_j$ from both sides of the recurrence relation gives

$$\begin{aligned}
P_n(A_iB_j) - p_iq_j &= (1-\theta)\left[P_{n-1}(A_iB_j) - p_iq_j\right] \\
&\;\;\vdots \\
&= (1-\theta)^n\left[P_0(A_iB_j) - p_iq_j\right].
\end{aligned}$$

Thus, $P_n(A_iB_j)$ converges to $p_iq_j$ at the geometric rate $1 - \theta$. For two loci on different chromosomes, the deviation from linkage equilibrium is halved each generation. Equilibrium is approached much more slowly for closely spaced loci. Similar, but more cumbersome, proofs of convergence to linkage equilibrium can be given for three or more loci [6, 24, 51, 58]. Problem 7 explores the case of three loci.
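To put numbers on "much more slowly", the Python sketch below iterates the recurrence $P_n = (1-\theta)P_{n-1} + \theta p_iq_j$ and reports how many generations it takes for the initial deviation $P_0(A_iB_j) - p_iq_j$ to fall to half its size. The starting haplotype frequency and the allele frequencies are arbitrary illustrative values; only the ratio of deviations matters.

```python
def generations_to_halve(theta: float, p_i: float = 0.3, q_j: float = 0.6,
                         P0: float = 0.4, max_generations: int = 10_000) -> int:
    """Count generations until |P_n(A_i B_j) - p_i q_j| is at most half its initial value."""
    target = p_i * q_j          # linkage-equilibrium frequency p_i q_j
    d0 = abs(P0 - target)       # initial deviation from linkage equilibrium
    P = P0
    for n in range(1, max_generations + 1):
        P = (1.0 - theta) * P + theta * target  # the linkage recurrence
        # small slack so that a ratio of exactly 1/2 (theta = 1/2) is not lost to rounding
        if abs(P - target) <= 0.5 * d0 + 1e-12:
            return n
    raise RuntimeError("deviation did not halve within max_generations")

for theta in (0.5, 0.1, 0.01, 0.001):
    print(f"theta = {theta:<6} generations to halve = {generations_to_halve(theta)}")
```

Loci on different chromosomes ($\theta = \frac12$) halve their deviation in a single generation, whereas $\theta = .01$ needs 69 generations and $\theta = .001$ needs 693.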

6.5 Selection

The simplest model of evolution involves selection at an autosomal locus with two alleles a and b. At generation $n$, let allele a have population frequency $p_n$ and allele b population frequency $q_n = 1 - p_n$. Under the usual assumptions of genetic equilibrium, we deduced the Hardy-Weinberg and linkage equilibrium laws. Now suppose that we relax the assumption of no selection by postulating different fitnesses $w_{a/a}$, $w_{a/b}$, and $w_{b/b}$ for the three genotypes. Fitness is a technical term dealing with the reproductive capacity rather than the longevity of people with a given genotype. Thus, $w_{a/a}/w_{a/b}$ is the ratio of the expected genetic contribution to the next generation of an a/a individual to the expected genetic contribution of an a/b individual. Since only fitness ratios are relevant, we can without loss of generality put $w_{a/b} = 1$, $w_{a/a} = 1 - r$, and $w_{b/b} = 1 - s$, provided of course that $r \le 1$ and $s \le 1$. Observe that $r$ and $s$ can be negative.

To explore the evolutionary dynamics of this model, we define the average fitness

$$\bar w_n = (1-r)p_n^2 + 2p_nq_n + (1-s)q_n^2 = 1 - rp_n^2 - sq_n^2$$

at generation $n$. Owing to our implicit assumption of random union of gametes, the Hardy-Weinberg proportions appear in the definition of $\bar w_n$ even though the allele frequency $p_n$ changes over time. Because a/a individuals always contribute an a allele whereas a/b individuals do so only half of the time, the new allele frequency $p_{n+1}$ can be expressed as

$$p_{n+1} = \frac{(1-r)p_n^2 + p_nq_n}{\bar w_n} = \frac{p_n(1 - rp_n)}{\bar w_n} = f(p_n). \qquad (6.4)$$

We can equally well update $q_n$ via

$$q_{n+1} = \frac{q_n(1 - sq_n)}{\bar w_n} = g(q_n). \qquad (6.5)$$

There are three possible fixed points of $f(p)$ on $[0, 1]$. Two of these, $p = 0$ and $p = 1$, are obvious from the definitions (6.4) and (6.5) of $f(p)$ and $g(q)$. The third fixed point $p_\infty$ satisfies

$$1 - rp_\infty = 1 - rp_\infty^2 - sq_\infty^2 = 1 - sq_\infty \qquad (6.6)$$

and therefore, since equating the outer members gives $rp_\infty = s(1 - p_\infty)$, equals $\frac{s}{r+s}$. This fixed point falls on $[0, 1]$ if and only if $r$ and $s$ have the same sign. To characterize the stability of the three fixed points, we calculate the derivative

$$\begin{aligned}
f'(p) &= \frac{(1 - 2rp)(1 - rp^2 - sq^2) + p(1 - rp)(2rp - 2sq)}{(1 - rp^2 - sq^2)^2} \\
&= \frac{p(1-r)(1 - sq) + q(1-s)(1 - rp)}{(1 - rp^2 - sq^2)^2} \\
&\ge 0.
\end{aligned}$$

It follows directly that $f'(0) = \frac{1}{1-s}$ and $f'(1) = \frac{1}{1-r}$. When it exists, the interior fixed point $p_\infty = \frac{s}{r+s}$ gives

$$\begin{aligned}
f'(p_\infty) &= \frac{p_\infty(1-r)(1 - sq_\infty) + q_\infty(1-s)(1 - rp_\infty)}{(1 - rp_\infty^2 - sq_\infty^2)^2} \\
&= \frac{1 - rp_\infty - sq_\infty}{1 - rp_\infty^2 - sq_\infty^2} \\
&= \frac{r + s - 2rs}{r + s - rs}
\end{aligned}$$

in view of equation (6.6).

Based on the magnitude of the derivative $f'(p)$ at a fixed point $p$, we can classify $p$ as attractive or repelling. Assume that neither $r$ nor $s$ vanishes. Then the boundary fixed point 0 is attractive for $s < 0$ and repelling for $s > 0$; the boundary fixed point 1 is attractive for $r < 0$ and repelling for $r > 0$; and the interior fixed point $p_\infty = \frac{s}{r+s}$ is attractive when $r$ and $s$ are both positive and repelling when $r$ and $s$ are both negative. Global stability can be assessed by drawing a cobweb diagram for each of the four possible combinations of signs of $r$ and $s$. In three of the four cases, one can easily show that the sole locally attractive fixed point is also globally attractive. (See Problem 9.) In the exceptional case, $r < 0$ and $s < 0$, any initial point $p_0 < \frac{s}{r+s}$ is attracted to 0, and any initial point $p_0 > \frac{s}{r+s}$ is attracted to 1. The rate of convergence in each of the four cases is geometric.

If either $r$ or $s$ vanishes, then some of the fixed points are neutral. Convergence still occurs, but the rate is no longer geometric. For example, consider the situation $r > 0$ and $s = 0$. The iterates $p_n$ are repelled by 1 and attracted to 0. For $p_n$ close to 0, the equality

$$p_{n+1} = \frac{p_n(1 - rp_n)}{1 - rp_n^2}$$

entails

$$\frac{1}{p_{n+1}} - \frac{1}{p_n} = \frac{1}{p_n}\left(\frac{1 - rp_n^2}{1 - rp_n} - 1\right) = \frac{r(1 - p_n)}{1 - rp_n} \approx r.$$

It follows that for $p_0$ close to 0

$$\frac{1}{p_n} - \frac{1}{p_0} = \sum_{i=0}^{n-1}\left(\frac{1}{p_{i+1}} - \frac{1}{p_i}\right) \approx nr.$$

This approximation implies the slow convergence

$$p_n \approx \frac{1}{nr + \frac{1}{p_0}}$$

for selection against a pure recessive.
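The harmonic decay $p_n \approx 1/(nr + 1/p_0)$ can be compared with exact iteration of the selection map. In the Python sketch below (function names illustrative), note that for a lethal recessive ($r = 1$) the increment $r(1-p_n)/(1-rp_n)$ equals 1 exactly, so the approximation becomes an identity; for $r = .5$ it is only approximate.

```python
def select_against_recessive(p: float, r: float) -> float:
    """One generation of p_{n+1} = p_n (1 - r p_n) / (1 - r p_n^2), i.e. the case s = 0."""
    return p * (1.0 - r * p) / (1.0 - r * p * p)

def compare(p0: float, r: float, generations: int):
    """Return the exact iterate p_n and the approximation 1 / (n r + 1 / p0)."""
    p = p0
    for _ in range(generations):
        p = select_against_recessive(p, r)
    approx = 1.0 / (generations * r + 1.0 / p0)
    return p, approx

for r in (1.0, 0.5):
    exact, approx = compare(p0=0.01, r=r, generations=100)
    print(f"r = {r}: p_100 = {exact:.6f}, approximation = {approx:.6f}")
```

Even after 100 generations of complete selection against a/a, the allele frequency has only fallen from .01 to .005, which is what "slow convergence" means here.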

Heterozygote advantage ($r$ and $s$ both positive) is the most interesting situation covered by this classic selection model. Geneticists have suggested that several recessive diseases are maintained at high frequencies by the mechanism of heterozygote advantage. The best evidence favoring this hypothesis exists for sickle cell anemia [11]. A single dose of the sickle cell gene appears to confer protection against malaria. The evidence is much weaker for a heterozygote advantage in Tay-Sachs disease and cystic fibrosis. Geneticists have conjectured that these genes may protect carriers from tuberculosis and cholera, respectively [64].
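Heterozygote advantage is also the case in which the interior fixed point $p_\infty = s/(r+s)$ is globally attractive. The Python sketch below iterates the map $f$ of equation (6.4) for illustrative values $r = .1$ and $s = .2$ (not taken from any real locus), so that $p_\infty = \frac23$, and checks a central-difference estimate of $f'(p_\infty)$ against the formula $(r + s - 2rs)/(r + s - rs)$.

```python
def f(p: float, r: float, s: float) -> float:
    """Selection map (6.4): p_{n+1} = p_n (1 - r p_n) / (1 - r p_n^2 - s q_n^2)."""
    q = 1.0 - p
    return p * (1.0 - r * p) / (1.0 - r * p * p - s * q * q)

r, s = 0.1, 0.2          # illustrative heterozygote advantage: w_aa = 0.9, w_ab = 1, w_bb = 0.8
p_star = s / (r + s)     # interior fixed point s / (r + s)

p = 0.01                 # start with a rare allele a
for _ in range(1000):
    p = f(p, r, s)

h = 1e-6
slope = (f(p_star + h, r, s) - f(p_star - h, r, s)) / (2 * h)  # central difference
predicted = (r + s - 2 * r * s) / (r + s - r * s)
print(p, p_star, slope, predicted)
```

The slope is about .93, so the approach to $p_\infty$ is geometric but unhurried: each generation removes only about 7 percent of the remaining deviation.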

6.6 Balance between Mutation and Selection

Mutations furnish the raw material of evolutionary change. In practice, most mutations are either neutral or deleterious. We now briefly discuss the balance between deleterious mutations and selection. Consider first the case of a dominant disease. In the notation of the last section, let b be the normal allele and a the disease allele, and define the fitnesses of the three genotypes by $r > 0$ and $s < 0$. If the mutation rate from b to a is $\mu$, then equilibrium is achieved between the opposing forces of mutation and selection when

$$q_\infty = \frac{q_\infty(1 - sq_\infty)}{1 - rp_\infty^2 - sq_\infty^2}(1 - \mu).$$

If we multiply this equation by $1 - rp_\infty^2 - sq_\infty^2$ and divide it by $q_\infty$, we get

$$1 - rp_\infty^2 - sq_\infty^2 = (1 - sq_\infty)(1 - \mu).$$

Dropping the negligible term $rp_\infty^2$, we find that this quadratic, which rearranges to $sq_\infty^2 - s(1-\mu)q_\infty - \mu = 0$, has the approximate solution

$$\begin{aligned}
q_\infty &\approx \frac{1-\mu}{2}\left[1 + \sqrt{1 + \frac{4\mu}{s(1-\mu)^2}}\right] \\
&\approx \frac{1-\mu}{2}\left[1 + 1 + \frac{2\mu}{s(1-\mu)^2}\right] \\
&\approx 1 + \frac{\mu(1-s)}{s},
\end{aligned}$$

where the root near 1 is taken, the second step uses $\sqrt{1+x} \approx 1 + x/2$, and the last step drops terms of order $\mu^2$. This yields $p_\infty = 1 - q_\infty \approx \frac{\mu(1-s)}{-s}$. The corresponding equilibrium frequency of affecteds is approximately $2p_\infty q_\infty \approx \frac{2\mu(1-s)}{-s}$. Mutation rates in the range $10^{-7}$ to $10^{-5}$ are typical.
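The approximation can be checked against the exact root of the quadratic. In the Python sketch below, $sq^2 - s(1-\mu)q - \mu = 0$ is the balance equation with $rp_\infty^2$ dropped, and the values $\mu = 10^{-5}$ and $s = -.5$ are illustrative choices rather than data for a particular disease; the function name is hypothetical.

```python
import math

def dominant_balance(mu: float, s: float):
    """Disease-allele frequency p = 1 - q at the mutation-selection balance (s < 0).

    Returns (exact root of s q^2 - s (1 - mu) q - mu = 0 near q = 1, first-order approximation).
    """
    q = (1.0 - mu) / 2.0 * (1.0 + math.sqrt(1.0 + 4.0 * mu / (s * (1.0 - mu) ** 2)))
    p_exact = 1.0 - q
    p_approx = mu * (1.0 - s) / (-s)
    return p_exact, p_approx

mu, s = 1e-5, -0.5
p_exact, p_approx = dominant_balance(mu, s)
print(p_exact, p_approx, 2 * p_approx)  # last value: approximate frequency of affecteds
```

Both values come out near $3 \times 10^{-5}$, about three times the mutation rate, so roughly $6 \times 10^{-5}$ of the population is affected under these assumed parameters.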

For a recessive disease ($r > 0$ and $s = 0$), the balance equation becomes

$$q_\infty = \frac{q_\infty(1 - sq_\infty)}{\bar w_\infty}(1 - \mu) = \frac{q_\infty}{1 - rp_\infty^2}(1 - \mu).$$

In other words, $1 - rp_\infty^2 = 1 - \mu$, which has solution $p_\infty = \sqrt{\mu/r}$. The frequency of affecteds at equilibrium is now $p_\infty^2 = \frac{\mu}{r}$. Thus, given equal mutation rates, dominant and recessive diseases will afflict comparable numbers of people. In contrast, the underlying allele frequencies and rates of approach to equilibrium vary dramatically. (See Problem 10.) Indeed, it is debatable whether any human population has existed long enough for the alleles at a recessive disease locus to achieve a balance between mutation and selection. Random sampling of gametes (genetic drift) and small initial population sizes (founder effect) play a much larger role in determining the frequency of recessive diseases in modern human populations.
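The contrast in rates of approach can be seen by iterating the full mutation-selection map, selection by equation (6.5) followed by mutation of b to a at rate $\mu$, starting from a population that lacks the disease allele. The Python sketch below uses illustrative parameters: a dominant disease with $r = .5$, $s = -.5$ and a lethal recessive with $r = 1$, $s = 0$, both with $\mu = 10^{-5}$. It counts generations until $p_n$ reaches 99 percent of its equilibrium value; the function names are hypothetical.

```python
import math

def next_q(q: float, r: float, s: float, mu: float) -> float:
    """Selection on q via (6.5), then mutation b -> a at rate mu."""
    p = 1.0 - q
    w_bar = 1.0 - r * p * p - s * q * q
    return q * (1.0 - s * q) / w_bar * (1.0 - mu)

def generations_to_equilibrium(r, s, mu, p_eq, fraction=0.99, max_generations=1_000_000):
    """Generations, starting from p_0 = 0, until p_n >= fraction * p_eq."""
    q = 1.0
    for n in range(1, max_generations + 1):
        q = next_q(q, r, s, mu)
        if 1.0 - q >= fraction * p_eq:
            return n
    raise RuntimeError("equilibrium not approached within max_generations")

mu = 1e-5
# Dominant case: p_eq from mu (1 - s) / (-s); recessive case: p_eq = sqrt(mu / r).
dominant = generations_to_equilibrium(r=0.5, s=-0.5, mu=mu, p_eq=mu * 1.5 / 0.5)
recessive = generations_to_equilibrium(r=1.0, s=0.0, mu=mu, p_eq=math.sqrt(mu / 1.0))
print(dominant, recessive)
```

With these values the dominant case settles in roughly a dozen generations, while the recessive case needs several hundred, consistent with the remark that recessive disease loci may never have reached this balance.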

6.7 Fundamental Theorem of Natural Selection

Selection and mutation can certainly act on more than just two alleles. In this section we discuss a multiallelic model for selection without mutation. Our goal is to prove Fisher’s fundamental theorem of natural selection, which in this setting asserts that the average fitness of the population never decreases from one generation to the next. In modern mathematical terminology, fitness serves as a discrete Liapunov function. Such functions give considerable insight into the evolution of a dynamical system.

Let us suppose that there are $m$ alleles with population frequencies $p_1, \ldots, p_m$. To each ordered genotype $i/j$ we associate a fitness $w_{ij} = w_{ji}$. The quadratic form

$$f(p) = \sum_{i=1}^m \sum_{j=1}^m p_i w_{ij} p_j = p^t W p$$

then represents the overall fitness of the population. Here the symmetric matrix $W$ has entries $w_{ij}$, and the vector $p$ has entries $p_i$. The importance of the fitness function $f(p)$ stems from the transformation

$$T(p)_i = \frac{p_i \sum_{j=1}^m w_{ij} p_j}{f(p)} = \frac{p_i w_i}{f(p)} \qquad (6.7)$$

relating the allele frequencies at the next generation to the allele frequencies at the current generation. Here $w_i = \sum_{j=1}^m w_{ij} p_j$ is the marginal fitness of allele $i$. Because of the absence of mutation, we impose the initial condition $p_i > 0$ for all $i$. This assumption clearly entails $T(p)_i > 0$ for all $i$, but it does not prevent one or more allele frequencies from converging to 0.

In deriving the update (6.7), we adopt six of the seven assumptions made in deriving Hardy-Weinberg equilibrium. Obviously, assumption (4) of no selection must be relaxed. We imagine individuals contributing gametes to a gamete pool in proportion to their fitnesses. The next generation is created by sampling gametes from the pool with replacement. Only half of the gametes of a heterozygous $i/j$ parent bear an $i$ allele, while all of the gametes of a homozygous $i/i$ parent bear an $i$ allele. Weighting these fractions by the respective population frequencies $2p_ip_j$ and $p_i^2$ then yields formula (6.7).
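A direct numerical check of the update (6.7) takes only a few lines of plain Python. The sketch below uses a made-up symmetric fitness matrix with positive entries for three alleles and an arbitrary positive starting vector, applies $T$ repeatedly, and records the average fitness $f(p) = p^tWp$ each generation; by the theorem proved next, the recorded sequence should never decrease. Function names are illustrative.

```python
def mean_fitness(p, W):
    """f(p) = sum_i sum_j p_i w_ij p_j."""
    m = len(p)
    return sum(p[i] * W[i][j] * p[j] for i in range(m) for j in range(m))

def selection_update(p, W):
    """Apply (6.7): T(p)_i = p_i w_i / f(p), with marginal fitness w_i = sum_j w_ij p_j."""
    m = len(p)
    marginal = [sum(W[i][j] * p[j] for j in range(m)) for i in range(m)]
    f = sum(p[i] * marginal[i] for i in range(m))
    return [p[i] * marginal[i] / f for i in range(m)]

# Hypothetical symmetric fitness matrix (w_ij = w_ji > 0) for three alleles.
W = [[1.0, 0.9, 0.6],
     [0.9, 0.8, 1.1],
     [0.6, 1.1, 0.7]]
p = [0.5, 0.3, 0.2]

fitness_history = [mean_fitness(p, W)]
for _ in range(50):
    p = selection_update(p, W)
    fitness_history.append(mean_fitness(p, W))
print([round(x, 5) for x in fitness_history[:6]], p)
```

Because the update is multiplicative, every frequency stays positive, exactly as the text notes, even though some may drift toward 0.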

To prove Fisher’s fundamental theorem, we follow Kingman [49] and apply Jensen’s inequality. The simplest version of Jensen’s inequality states that

$$g\left(\sum_{i=1}^m p_i x_i\right) \le \sum_{i=1}^m p_i g(x_i)$$

for a convex function $g(x)$ and nonnegative weights $p_i$ (such as allele frequencies) that sum to 1. Strict inequality prevails if $g(x)$ is strictly convex and some of the $x_i$ with nonzero multipliers $p_i$ differ. The strictly convex function of immediate interest is $g(x) = x^\alpha$ for $\alpha > 1$, considered on $x \ge 0$. With these preliminaries in mind, we express the fitness of the next generation as

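$$\begin{aligned}
f[T(p)] &= \sum_{i=1}^m \sum_{j=1}^m T(p)_i\, w_{ij}\, T(p)_j \\
&= \frac{1}{f(p)^2} \sum_{i=1}^m \sum_{j=1}^m \sum_{k=1}^m w_{ij} w_{ik} p_i p_j p_k w_j \\
&= \frac{1}{f(p)^2} \sum_{i=1}^m \sum_{j=1}^m \sum_{k=1}^m w_{ij} w_{ik} p_i p_j p_k \frac{w_j + w_k}{2},
\end{aligned}$$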

exploiting the obvious symmetry of the middle expression for $f[T(p)]$ in the indices $j$ and $k$. In view of the arithmetic-geometric mean inequality $(w_j + w_k)/2 \ge \sqrt{w_j w_k}$, this representation yields

$$f[T(p)] \ge \frac{\sum_{i=1}^m p_i \left(\sum_{j=1}^m w_{ij} p_j \sqrt{w_j}\right)^2}{f(p)^2}.$$

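If we now apply Jensen's inequality with $g(x) = x^2$ and substitute $w_{ji}$ for $w_{ij}$ (permissible because the fitness matrix is symmetric), then we deduce that
$$f[T(p)] \ge \frac{\Big(\sum_{i=1}^m p_i\sum_{j=1}^m w_{ij}p_j\sqrt{w_j}\Big)^2}{f(p)^2} = \frac{\Big(\sum_{j=1}^m p_jw_j^{3/2}\Big)^2}{f(p)^2}.$$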


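A final application of Jensen's inequality with $g(x) = x^{3/2}$, together with the identity $f(p) = \sum_{j=1}^m p_jw_j$, produces
$$f[T(p)] \ge \frac{\Big[\Big(\sum_{j=1}^m p_jw_j\Big)^{3/2}\Big]^2}{f(p)^2} = \frac{f(p)^3}{f(p)^2} = f(p).$$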

Careful examination of our proof of the Liapunov condition $f[T(p)] \ge f(p)$ shows that equality prevails throughout if and only if $w_i = w_j$ for all $i$ and $j$. These are precisely the circumstances under which $p$ is a fixed point of the map $T$. A more detailed mathematical analysis of the problem demonstrates that any limit point of the iterates $p$, $T(p)$, $T(T(p))$, and so forth must be a fixed point. Of course, one or more of the components of a fixed point $p$ may equal 0.
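The monotonicity $f[T(p)] \ge f(p)$ is easy to observe numerically. The Python sketch below assumes that the update (6.7) has the form $T(p)_i = p_iw_i/f(p)$ used in the proof above, draws a hypothetical symmetric $4 \times 4$ fitness matrix at random (the seed and the range of fitnesses are arbitrary), iterates the map for 200 generations, and records the mean fitness each time. It illustrates the theorem on one example rather than proving it.

```python
import random

def marginal_fitness(W, p):
    """w_i = sum_j w_ij p_j for every allele i."""
    return [sum(wij * pj for wij, pj in zip(row, p)) for row in W]

def mean_fitness(W, p):
    """f(p) = sum_i p_i w_i = sum_i sum_j p_i w_ij p_j."""
    return sum(pi * wi for pi, wi in zip(p, marginal_fitness(W, p)))

def selection_step(W, p):
    """One generation of T(p)_i = p_i w_i / f(p) (assumed form of (6.7))."""
    w = marginal_fitness(W, p)
    f = sum(pi * wi for pi, wi in zip(p, w))
    return [pi * wi / f for pi, wi in zip(p, w)]

# Hypothetical symmetric fitness matrix for m = 4 alleles (illustration only).
random.seed(1)
m = 4
W = [[0.0] * m for _ in range(m)]
for i in range(m):
    for j in range(i, m):
        W[i][j] = W[j][i] = random.uniform(0.5, 1.5)

p = [1.0 / m] * m  # all initial frequencies positive, as the text requires
history = [mean_fitness(W, p)]
for _ in range(200):
    p = selection_step(W, p)
    history.append(mean_fitness(W, p))

# Fisher's theorem: the mean fitness never decreases (up to rounding error).
nondecreasing = all(b >= a - 1e-12 for a, b in zip(history, history[1:]))
print(nondecreasing, [round(x, 4) for x in p])
```

Because the theorem covers any symmetric matrix of positive fitnesses, changing the seed or the matrix should leave the monotonicity check unchanged.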

6.8 Kimura’s DNA Substitution Model

Kimura has suggested a model for base pair substitution in molecular evolution [48, 54]. Recall that DNA is a long, double polymer constructed from the four bases (or nucleotides) adenine, guanine, cytosine, and thymine. These bases are abbreviated A, G, C, and T, respectively. Two of the bases are purines (A and G), and two are pyrimidines (C and T). The two strands of DNA form a double helix containing complementary hereditary information in the sense that A is always paired with T, and C with G, across strands. For example, if one strand contains the block –ACCGT– of bases, then the other strand contains the complementary block –TGGCA– of bases. Thus, one is justified in following the evolutionary development of one strand and ignoring the other strand.

Kimura defines a continuous-time Markov chain with the four states A, G, C, and T to capture the evolutionary history of a species at a single position (or site) along a DNA strand. Mutations occur from time to time that change the base at the site. Let $r_{ij}$ be the rate at which base $i$ mutates to base $j$. (In this section and the next, we use the more natural left-to-right convention $i \to j$ of probability theory rather than the right-to-left convention $j \to i$ of compartment theory.) Kimura radically simplifies these rates. Let us write $i \sim j$ if $i$ and $j$ are both purines or both pyrimidines, and $i \nsim j$ if one is a purine and the other is a pyrimidine. Then for $i \ne j$ Kimura assumes that
$$r_{ij} = \begin{cases} \alpha & i \sim j \\ \beta & i \nsim j. \end{cases}$$

We now exploit the symmetry of these rates in calculating, for every ordered pair $(i, j)$, the probability $p_{ij}(t)$ that the chain is in state $j$ at time $t$ given that it was in state $i$ at time 0. In calculating the $p_{ij}(t)$, we will, in effect, exponentiate a certain matrix.

Suppose the chain starts in a particular purine (A or G) at time 0. Let us define a simpler continuous-time Markov chain with three states. The chain is said to be in state $b$ when the beginning purine occurs. It is in state $o$ when the opposite purine occurs, and it is in state $y$ when a pyrimidine occurs. Our goal is to compute the probabilities $q_{bb}(t)$, $q_{bo}(t)$, and $q_{by}(t)$ of being in state $b$, $o$, or $y$, respectively, at time $t$ given that it started in state $b$. Of course, $q_{bb}(t) + q_{bo}(t) + q_{by}(t) = 1$, so it actually suffices to compute just $q_{bo}(t)$ and $q_{by}(t)$. Once we have calculated these quantities,


then we can recover probabilities such as $p_{AA}(t) = q_{bb}(t)$ by definition and probabilities such as $p_{AC}(t) = \frac12 q_{by}(t)$ by symmetry. Because the same dynamics apply starting from a pyrimidine rather than a purine, solving the three-state Markov chain completely solves the original problem of specifying the $p_{ij}(t)$.

We can derive a system of linear differential equations for $q_{bo}(t)$ and $q_{by}(t)$ by noting the Chapman-Kolmogorov relations
$$\begin{aligned}
q_{bo}(t+h) &= q_{bo}(t)q_{oo}(h) + q_{by}(t)q_{yo}(h) + [1 - q_{bo}(t) - q_{by}(t)]q_{bo}(h) \\
q_{by}(t+h) &= q_{bo}(t)q_{oy}(h) + q_{by}(t)q_{yy}(h) + [1 - q_{bo}(t) - q_{by}(t)]q_{by}(h),
\end{aligned}$$

which incorporate two natural assumptions. The first says that some intermediate base must occupy the site at time $t$, and the second says that the evolutionary process has no memory. We now call on approximations such as $q_{yo}(h) = \beta h + o(h)$ and $q_{oo}(h) = 1 - \alpha h - 2\beta h + o(h)$, which follow by considering the three possible ways, and corresponding probabilities, of leaving each state in a small time $h$. Substituting these and other similar approximations in the Chapman-Kolmogorov relations yields the revised equations

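$$\begin{aligned}
q_{bo}(t+h) &= q_{bo}(t)(1 - \alpha h - 2\beta h) + q_{by}(t)\beta h + [1 - q_{bo}(t) - q_{by}(t)]\alpha h + o(h) \\
q_{by}(t+h) &= q_{bo}(t)2\beta h + q_{by}(t)(1 - 2\beta h) + [1 - q_{bo}(t) - q_{by}(t)]2\beta h + o(h)
\end{aligned}$$
for $h > 0$ small. If we now form difference quotients and let $h$ tend to 0 in these equations, then we derive the two differential equations
$$\begin{aligned}
q_{bo}'(t) &= \alpha - 2(\alpha+\beta)q_{bo}(t) + (\beta-\alpha)q_{by}(t) \\
q_{by}'(t) &= 2\beta - 4\beta q_{by}(t).
\end{aligned} \tag{6.8}$$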

Taking into account the initial condition $q_{by}(0) = 0$, one can readily verify that the second of the differential equations in (6.8) has solution
$$q_{by}(t) = \frac12 - \frac12 e^{-4\beta t}.$$

To solve the first differential equation in (6.8), we now substitute for $q_{by}(t)$ and multiply by the integrating factor $e^{2(\alpha+\beta)t}$. These steps yield the revised differential equation
$$\Big[q_{bo}(t)e^{2(\alpha+\beta)t}\Big]' = \alpha e^{2(\alpha+\beta)t} + \frac{\beta-\alpha}{2}\Big[e^{2(\alpha+\beta)t} - e^{2(\alpha-\beta)t}\Big],$$

which can be integrated to give the solution
$$q_{bo}(t) = \frac14 - \frac12 e^{-2(\alpha+\beta)t} + \frac14 e^{-4\beta t}$$


consistent with the initial condition $q_{bo}(0) = 0$. Finally, subtraction gives
$$q_{bb}(t) = 1 - q_{bo}(t) - q_{by}(t) = \frac14 + \frac12 e^{-2(\alpha+\beta)t} + \frac14 e^{-4\beta t}.$$
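These closed forms can be cross-checked by integrating the system (6.8) numerically. The Python sketch below applies the classical fourth-order Runge-Kutta method from $q_{bo}(0) = q_{by}(0) = 0$; the rates $\alpha = 1$ and $\beta = 0.3$, the time $t = 2$, and the step count are illustrative choices, not values from the text. The printed differences should sit at the level of rounding error, and the three probabilities should sum to 1.

```python
import math

def rhs(q_bo, q_by, alpha, beta):
    """Right-hand sides of the differential equations (6.8)."""
    d_bo = alpha - 2 * (alpha + beta) * q_bo + (beta - alpha) * q_by
    d_by = 2 * beta - 4 * beta * q_by
    return d_bo, d_by

def integrate(alpha, beta, t_end, steps=10_000):
    """Classical fourth-order Runge-Kutta from q_bo(0) = q_by(0) = 0."""
    h = t_end / steps
    x, y = 0.0, 0.0
    for _ in range(steps):
        k1 = rhs(x, y, alpha, beta)
        k2 = rhs(x + h / 2 * k1[0], y + h / 2 * k1[1], alpha, beta)
        k3 = rhs(x + h / 2 * k2[0], y + h / 2 * k2[1], alpha, beta)
        k4 = rhs(x + h * k3[0], y + h * k3[1], alpha, beta)
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, y

def closed_form(alpha, beta, t):
    """q_bo(t), q_by(t), q_bb(t) as derived in the text."""
    e1 = math.exp(-2 * (alpha + beta) * t)
    e2 = math.exp(-4 * beta * t)
    return 0.25 - 0.5 * e1 + 0.25 * e2, 0.5 - 0.5 * e2, 0.25 + 0.5 * e1 + 0.25 * e2

alpha, beta, t = 1.0, 0.3, 2.0  # illustrative rates and time (assumption)
num_bo, num_by = integrate(alpha, beta, t)
q_bo, q_by, q_bb = closed_form(alpha, beta, t)
print(num_bo - q_bo, num_by - q_by, q_bo + q_by + q_bb)
```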

6.9 Problems

1. In blood transfusions, compatibility at the ABO and Rh loci is important. These autosomal loci are unlinked. At the Rh locus the + allele codes for the presence of a red cell antigen and therefore is dominant to the − allele, which codes for the absence of the antigen. Suppose that the frequencies of the two Rh alleles are $q_+$ and $q_-$. Type O− people are universal donors, and type AB+ people are universal recipients. Under genetic equilibrium, what are the population frequencies of these two types of people? (Reference [11] discusses these genetic systems and gives allele frequencies for some representative populations.)

2. Suppose in the Hardy-Weinberg model for an autosomal locus that the genotype frequencies for the two sexes differ. What is the ultimate frequency of a given allele? How long does it take genotype frequencies to stabilize at their Hardy-Weinberg values?

3. Consider an autosomal locus with $m$ alleles in Hardy-Weinberg equilibrium. If allele $A_i$ has frequency $p_i$, then show that a random non-inbred person is heterozygous with probability $1 - \sum_{i=1}^m p_i^2$. What is the maximum of this probability, and for what allele frequencies is this maximum attained?

4. In forensic applications of genetics, loci with high exclusion probabilities are typed. For a codominant locus with $n$ alleles, show that the probability of two random people having different genotypes is
$$e = \sum_{i=1}^{n-1}\sum_{j=i+1}^n 2p_ip_j(1 - 2p_ip_j) + \sum_{i=1}^n p_i^2(1 - p_i^2)$$
under Hardy-Weinberg equilibrium [50]. Simplify this expression to
$$e = 1 - 2\Big(\sum_{i=1}^n p_i^2\Big)^2 + \sum_{i=1}^n p_i^4.$$
Prove rigorously that $e$ attains its maximum $e_{\max} = 1 - \frac{2}{n^2} + \frac{1}{n^3}$ when all $p_i = \frac1n$. For two independent loci with $\sqrt n$ alleles each, verify that the maximum exclusion probability based on exclusion at either locus is $1 - \frac{4}{n^2} + \frac{4}{n^{5/2}} - \frac{1}{n^3}$. How does this compare to the maximum exclusion probability for a single locus with $n$ equally frequent alleles when $n = 16$? What do you conclude about the information content of two loci versus one locus? (Hint: To prove the claim about $e_{\max}$, note that without loss of generality one can assume $p_1 \le p_2 \le \cdots \le p_n$. If $p_i < p_{i+1}$, then $e$ can be increased by replacing $p_i$ and $p_{i+1}$ by $p_i + x$ and $p_{i+1} - x$ for $x$ positive and sufficiently small.)
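Before attempting the proofs in Problem 4, one can check the simplified expression for $e$ numerically. The Python sketch below compares the two formulas for a hypothetical set of seven random allele frequencies and evaluates $e$ at the uniform frequencies $p_i = 1/n$ for $n = 16$; the seed and the number of alleles are arbitrary choices.

```python
import random

def exclusion_direct(p):
    """e as the sum over heterozygous (i < j) and homozygous genotypes."""
    n = len(p)
    het = sum(2 * p[i] * p[j] * (1 - 2 * p[i] * p[j])
              for i in range(n - 1) for j in range(i + 1, n))
    hom = sum(pi**2 * (1 - pi**2) for pi in p)
    return het + hom

def exclusion_simplified(p):
    """e = 1 - 2 (sum_i p_i^2)^2 + sum_i p_i^4."""
    s2 = sum(pi**2 for pi in p)
    s4 = sum(pi**4 for pi in p)
    return 1 - 2 * s2**2 + s4

random.seed(0)
raw = [random.random() for _ in range(7)]  # arbitrary frequencies, 7 alleles
total = sum(raw)
p = [x / total for x in raw]
n = 16
uniform = [1 / n] * n
print(exclusion_direct(p), exclusion_simplified(p))
print(exclusion_simplified(uniform), 1 - 2 / n**2 + 1 / n**3)
```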


5. Moran [61] has posed a model for the approach of genotype frequencies to Hardy-Weinberg equilibrium that permits generations to overlap. Let $u(t)$, $v(t)$, and $w(t)$ be the relative proportions of the genotypes $A_1/A_1$, $A_1/A_2$, and $A_2/A_2$ at time $t$. Assume that in the small time interval $(t, t+dt)$ a proportion $dt$ of the population dies and is replaced by the offspring of random matings from the residue of the population. The other assumptions for Hardy-Weinberg equilibrium enumerated in Chapter 4 remain in force.

(a) Show that for small $dt$
$$u(t+dt) = u(t)(1 - dt) + \Big[u(t) + \tfrac12 v(t)\Big]^2dt + o(dt).$$
Hence,
$$u'(t) = -u(t) + \Big[u(t) + \tfrac12 v(t)\Big]^2.$$

(b) Similarly derive the differential equations
$$\begin{aligned}
v'(t) &= -v(t) + 2\Big[u(t) + \tfrac12 v(t)\Big]\Big[\tfrac12 v(t) + w(t)\Big] \\
w'(t) &= -w(t) + \Big[\tfrac12 v(t) + w(t)\Big]^2.
\end{aligned}$$

(c) Let $p(t) = u(t) + \frac12 v(t)$ be the allele frequency of $A_1$. Verify that $p'(t) = 0$ and that $p(t) = p_0$ is constant.

(d) Show that
$$[u(t) - p_0^2]' = -[u(t) - p_0^2],$$
and so
$$u(t) - p_0^2 = [u(0) - p_0^2]e^{-t}.$$

(e) Similarly prove
$$\begin{aligned}
v(t) - 2p_0(1 - p_0) &= [v(0) - 2p_0(1 - p_0)]e^{-t} \\
w(t) - (1 - p_0)^2 &= [w(0) - (1 - p_0)^2]e^{-t}.
\end{aligned}$$

(f) If time is measured in generations, then how many generations does it take for the departure from Hardy-Weinberg equilibrium to be halved?
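The solutions in parts (d) and (e) of Problem 5 can be compared with a direct numerical integration of the system from parts (a) and (b). The Python sketch below uses Euler steps of size $dt = 10^{-4}$ up to $t = 3$; the starting genotype frequencies (no heterozygotes, $u = w = \frac12$) are an illustrative assumption, and Euler's method is chosen for transparency rather than accuracy.

```python
import math

def moran_rhs(u, v, w):
    """Right-hand sides of the equations in parts (a) and (b)."""
    p = u + 0.5 * v   # frequency of A1
    q = 0.5 * v + w   # frequency of A2
    return -u + p * p, -v + 2 * p * q, -w + q * q

# Illustrative starting point (assumption): no heterozygotes at time 0.
u, v, w = 0.5, 0.0, 0.5
u0, v0 = u, v
p0 = u + 0.5 * v
dt, t_end = 1e-4, 3.0
for _ in range(round(t_end / dt)):
    du, dv, dw = moran_rhs(u, v, w)
    u, v, w = u + dt * du, v + dt * dv, w + dt * dw

# Closed forms from parts (d) and (e).
u_pred = p0**2 + (u0 - p0**2) * math.exp(-t_end)
v_pred = 2 * p0 * (1 - p0) + (v0 - 2 * p0 * (1 - p0)) * math.exp(-t_end)
print(u, u_pred, v, v_pred, u + 0.5 * v)
```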

6. Consider an X-linked version of the Moran model in the previous problem. Again let $u(t)$, $v(t)$, and $w(t)$ be the frequencies of the three female genotypes $A_1/A_1$, $A_1/A_2$, and $A_2/A_2$, respectively. Let $r(t)$ and $s(t)$ be the frequencies of the male genotypes $A_1$ and $A_2$.


(a) Verify the differential equations
$$\begin{aligned}
r'(t) &= -r(t) + u(t) + \tfrac12 v(t) \\
s'(t) &= -s(t) + \tfrac12 v(t) + w(t) \\
u'(t) &= -u(t) + r(t)\Big[u(t) + \tfrac12 v(t)\Big] \\
v'(t) &= -v(t) + r(t)\Big[\tfrac12 v(t) + w(t)\Big] + s(t)\Big[u(t) + \tfrac12 v(t)\Big] \\
w'(t) &= -w(t) + s(t)\Big[\tfrac12 v(t) + w(t)\Big].
\end{aligned}$$

(b) Show that the frequency $\frac13 r(t) + \frac23\big[u(t) + \frac12 v(t)\big]$ of the $A_1$ allele is constant.

(c) Let $p_0$ be the frequency of the $A_1$ allele. Demonstrate that
$$[r(t) - p_0]' = -\tfrac32[r(t) - p_0],$$
and hence
$$r(t) - p_0 = [r(0) - p_0]e^{-\frac32 t}.$$

(d) Use parts (a) and (c) to establish
$$\lim_{t\to\infty}\Big[u(t) + \tfrac12 v(t)\Big] = p_0.$$

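(e) Show that
$$\begin{aligned}
\big[(u(t) - p_0^2)e^t\big]' &= u'(t)e^t + u(t)e^t - p_0^2e^t \\
&= r(t)\Big[u(t) + \tfrac12 v(t)\Big]e^t - p_0^2e^t \\
&= \Big(p_0 + [r(0) - p_0]e^{-\frac32 t}\Big)\Big(p_0 - \tfrac13 p_0 - \tfrac13[r(0) - p_0]e^{-\frac32 t}\Big)\tfrac32 e^t - p_0^2e^t \\
&= \Big(p_0 + [r(0) - p_0]e^{-\frac32 t}\Big)\Big(p_0 - \tfrac12[r(0) - p_0]e^{-\frac32 t}\Big)e^t - p_0^2e^t \\
&= \frac{p_0}{2}[r(0) - p_0]e^{-\frac t2} - \frac12[r(0) - p_0]^2e^{-2t}.
\end{aligned}$$
Thus,
$$u(t) - p_0^2 = [u(0) - p_0^2]e^{-t} + p_0[r(0) - p_0]\big(e^{-t} - e^{-\frac32 t}\big) - \tfrac14[r(0) - p_0]^2\big[e^{-t} - e^{-3t}\big].$$
It follows that $\lim_{t\to\infty} u(t) = p_0^2$.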


(f) Finally, show that
$$\begin{aligned}
\lim_{t\to\infty} s(t) &= 1 - p_0 \\
\lim_{t\to\infty} v(t) &= 2p_0(1 - p_0) \\
\lim_{t\to\infty} w(t) &= (1 - p_0)^2.
\end{aligned}$$
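The formula of part (e) and the limits of parts (d) and (f) in Problem 6 can be checked numerically. The Python sketch below integrates the five equations of part (a) with Euler steps, starting from a hypothetical population in which every male is $A_1$ and every female is $A_2/A_2$, so that $p_0 = \frac13$ by part (b). It compares $u(3)$ with the part (e) formula and reports the deviations from the claimed limits at $t = 20$; the step size and both times are arbitrary choices.

```python
import math

def xlinked_rhs(r, s, u, v, w):
    """Right-hand sides of the five equations in part (a)."""
    a = u + 0.5 * v   # A1 frequency among eggs
    b = 0.5 * v + w   # A2 frequency among eggs
    return (-r + a, -s + b, -u + r * a, -v + r * b + s * a, -w + s * b)

# Illustrative start (assumption): all males A1, all females A2/A2.
state = [1.0, 0.0, 0.0, 0.0, 1.0]   # r, s, u, v, w
r0, u0 = state[0], state[2]
p0 = state[0] / 3 + 2 / 3 * (state[2] + 0.5 * state[3])   # part (b)

def u_closed(t):
    """The explicit formula for u(t) obtained in part (e)."""
    d = r0 - p0
    return (p0**2 + (u0 - p0**2) * math.exp(-t)
            + p0 * d * (math.exp(-t) - math.exp(-1.5 * t))
            - 0.25 * d**2 * (math.exp(-t) - math.exp(-3 * t)))

dt, t_check, t_end = 1e-4, 3.0, 20.0
check_step = round(t_check / dt)
for n in range(1, round(t_end / dt) + 1):
    deriv = xlinked_rhs(*state)
    state = [x + dt * dx for x, dx in zip(state, deriv)]
    if n == check_step:
        u_at_check = state[2]

r, s, u, v, w = state
print(u_at_check, u_closed(t_check))
print(r - p0, s - (1 - p0), u - p0**2, v - 2 * p0 * (1 - p0), w - (1 - p0)**2)
```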

7. Consider three loci A–B–C along a chromosome. To model convergence to linkage equilibrium at these loci, select alleles $A_i$, $B_j$, and $C_k$ and denote their population frequencies by $p_i$, $q_j$, and $r_k$. Let $\theta_{AB}$ be the probability of recombination between loci A and B but not between B and C. Define $\theta_{BC}$ similarly. Let $\theta_{AC}$ be the probability of simultaneous recombination between loci A and B and between loci B and C. Finally, adopt the usual conditions for Hardy-Weinberg and linkage equilibrium.

(a) Show that the gamete frequency $P_n(A_iB_jC_k)$ satisfies
$$\begin{aligned}
P_n(A_iB_jC_k) &= (1 - \theta_{AB} - \theta_{BC} - \theta_{AC})P_{n-1}(A_iB_jC_k) \\
&\quad + \theta_{AB}p_iP_{n-1}(B_jC_k) + \theta_{BC}r_kP_{n-1}(A_iB_j) \\
&\quad + \theta_{AC}q_jP_{n-1}(A_iC_k).
\end{aligned}$$

(b) Define the function
$$\begin{aligned}
L_n(A_iB_jC_k) &= P_n(A_iB_jC_k) - p_iq_jr_k - p_i[P_n(B_jC_k) - q_jr_k] \\
&\quad - r_k[P_n(A_iB_j) - p_iq_j] - q_j[P_n(A_iC_k) - p_ir_k].
\end{aligned}$$
Show that $L_n(A_iB_jC_k)$ satisfies
$$L_n(A_iB_jC_k) = (1 - \theta_{AB} - \theta_{BC} - \theta_{AC})L_{n-1}(A_iB_jC_k).$$
(Hint: Substitute for $P_n(B_jC_k) - q_jr_k$ and similar terms using the recurrence relation for two loci.)

(c) Argue that $\lim_{n\to\infty} L_n(A_iB_jC_k) = 0$. As a consequence, conclude that
$$\lim_{n\to\infty} P_n(A_iB_jC_k) = p_iq_jr_k.$$

8. To verify convergence to linkage equilibrium for a pair of X-linked loci A and B, define $P_{nx}(A_iB_j)$ and $P_{ny}(A_iB_j)$ to be the frequencies of the $A_iB_j$ haplotype at generation $n$ in females and males, respectively. For the sake of simplicity, assume that both loci are in Hardy-Weinberg equilibrium and that the alleles $A_i$ and $B_j$ have frequencies $p_i$ and $q_j$. If $z_n$ denotes the column vector $[P_{nx}(A_iB_j), P_{ny}(A_iB_j)]^t$ and $\theta$ the female recombination fraction between the loci, then demonstrate the recurrence relation
$$z_n = Mz_{n-1} + \theta p_iq_j\begin{pmatrix} \frac12 \\ 1 \end{pmatrix} \tag{6.9}$$


under the usual equilibrium conditions, where the matrix
$$M = \begin{pmatrix} \frac12(1 - \theta) & \frac12 \\ 1 - \theta & 0 \end{pmatrix}.$$
Show that the recurrence relation (6.9) can be recast as $w_n = Mw_{n-1}$ for $w_n = z_n - p_iq_j\mathbf{1}$, where $\mathbf{1} = (1, 1)^t$. Prove that this last recurrence implies $\lim_{n\to\infty} w_n = 0$ when $\theta > 0$. (Hint: Show that the dominant eigenvalue of $M$ is less than 1 in absolute value.)
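For the hint in Problem 8, the eigenvalues of $M$ are the roots of its characteristic polynomial $\lambda^2 - \frac12(1-\theta)\lambda - \frac12(1-\theta)$. The Python sketch below evaluates the root of largest modulus for a few values of $\theta$ and iterates the recurrence (6.9) from an arbitrary starting vector; $\theta = 0.05$ and $p_iq_j = 0.12$ are illustrative assumptions. At $\theta = 0$ the dominant eigenvalue equals 1, which is why convergence requires $\theta > 0$.

```python
import math

def dominant_eigenvalue(theta):
    """Largest-modulus root of lambda^2 - c lambda - c = 0, where c = (1 - theta)/2."""
    c = 0.5 * (1 - theta)
    disc = math.sqrt(c * c + 4 * c)
    return max(((c + disc) / 2, (c - disc) / 2), key=abs)

def iterate(z, theta, piqj, generations):
    """Apply z_n = M z_{n-1} + theta p_i q_j (1/2, 1)^t repeatedly."""
    x, y = z
    for _ in range(generations):
        x, y = (0.5 * (1 - theta) * x + 0.5 * y + 0.5 * theta * piqj,
                (1 - theta) * x + theta * piqj)
    return x, y

for theta in (0.0, 0.05, 0.2, 0.5):
    print(theta, dominant_eigenvalue(theta))

# Illustrative values (assumptions): theta = 0.05, p_i q_j = 0.12.
theta, piqj = 0.05, 0.12
zx, zy = iterate((0.3, 0.0), theta, piqj, 2000)
print(zx - piqj, zy - piqj)
```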

9. Draw four cobweb diagrams for the selection model of Section 6.5 and verify that the iterates $p_n$ converge as asserted in the text.

10. In the mutation-selection model of Section 6.6, show that the dynamics are approximated by the difference equation
$$p_{n+1} = \mu + \frac{1}{1 - s}p_n$$
for a dominant disease and by the difference equation
$$p_{n+1} = \mu + p_nq_n + (1 - r)p_n^2$$
for a recessive disease. Show that the equilibrium is approached at the fast geometric rate $\frac{1}{1-s}$ for a dominant and at the very slow geometric rate $1 - 2\sqrt{r\mu}$ for a recessive.
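The contrast between the two convergence rates in Problem 10 is easy to see numerically. The Python sketch below iterates both difference equations and estimates the geometric rate as the ratio of successive deviations from equilibrium; the values $\mu = 10^{-6}$, $s = -0.5$ (negative, following the convention of Problem 11), and $r = 1$ are illustrative assumptions, as are the starting points.

```python
import math

# Illustrative parameter values (assumptions); s < 0 as in Problem 11.
mu, s, r = 1e-6, -0.5, 1.0

def dominant_step(p):
    """p_{n+1} = mu + p_n / (1 - s) for a dominant disease."""
    return mu + p / (1 - s)

def recessive_step(p):
    """p_{n+1} = mu + p_n q_n + (1 - r) p_n^2 for a recessive disease."""
    return mu + p * (1 - p) + (1 - r) * p * p

def observed_rate(step, p_star, p0, n=20):
    """Ratio of successive deviations from p_star after n iterations from p0."""
    p = p0
    for _ in range(n):
        p = step(p)
    return (step(p) - p_star) / (p - p_star)

dom_star = mu * (1 - s) / (-s)   # fixed point of dominant_step
rec_star = math.sqrt(mu / r)     # fixed point of recessive_step
dom_rate = observed_rate(dominant_step, dom_star, 0.0)
rec_rate = observed_rate(recessive_step, rec_star, 1.01 * rec_star)
print(dom_rate, 1 / (1 - s))
print(rec_rate, 1 - 2 * math.sqrt(r * mu))
```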

11. Consider an autosomal dominant disease in a stationary population. If the fitness of normal $b/b$ people to the fitness of affected $a/b$ people is in the ratio $1 - s : 1$, then show that the average number of people ultimately affected by a new mutation is $\frac{1-s}{-s}$. (Hints: A $b/b$ person has on average 2 children, while an $a/b$ person has on average $\frac{2}{1-s}$ children, half of whom are affected. Write and solve an equation counting the new mutant and the expected numbers of affecteds originating from each of his or her affected children. Remember that $s < 0$.)

12. Consider a model for the mutation-selection balance at an X-linked locus. Let normal females and males have fitness 1, carrier females fitness $t_x$, and affected males fitness $t_y$. Also, let the mutation rate from the normal allele $b$ to the disease allele $a$ be $\mu$ in both sexes. It is possible to write and solve two equations for the equilibrium frequencies $p_{\infty x}$ and $p_{\infty y}$ of carrier females and affected males.

(a) Derive the two approximate equations
$$\begin{aligned}
p_{\infty x} &\approx 2\mu + \tfrac12 t_xp_{\infty x} + t_yp_{\infty y} \\
p_{\infty y} &\approx \mu + \tfrac12 t_xp_{\infty x}
\end{aligned}$$
assuming the disease is rare.

(b) Solve the two equations in (a).


(c) When $t_x = 1$, show that the fraction of affected males representing new mutations is $\frac13(1 - t_y)$. This fraction does not depend on the mutation rate.

(d) If $t_x = 1$ and $t_y = 0$, then prove that $p_{\infty x} \approx 4\mu$ and $p_{\infty y} \approx 3\mu$.
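Part (b) of Problem 12 amounts to solving a $2 \times 2$ linear system. The Python sketch below solves it by Cramer's rule and checks the claims of parts (c) and (d); $\mu = 10^{-6}$ and $t_y = 0.4$ are illustrative values, and the helper name `equilibrium` is hypothetical.

```python
def equilibrium(mu, t_x, t_y):
    """Solve the two linear equations of part (a) for (p_inf_x, p_inf_y) by Cramer's rule."""
    # Rearranged: (1 - t_x/2) p_x - t_y p_y = 2 mu  and  -(t_x/2) p_x + p_y = mu.
    a11, a12, b1 = 1 - 0.5 * t_x, -t_y, 2 * mu
    a21, a22, b2 = -0.5 * t_x, 1.0, mu
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

mu = 1e-6                                         # illustrative mutation rate
px_d, py_d = equilibrium(mu, t_x=1.0, t_y=0.0)    # the case of part (d)
t_y = 0.4                                         # illustrative value for part (c)
_, py_c = equilibrium(mu, t_x=1.0, t_y=t_y)
print(px_d / mu, py_d / mu)
print(mu / py_c, (1 - t_y) / 3)
```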

13. As an elaboration of Example 3.2.2, it is of some interest to determine the number of generations $n$ it takes for allele $a$ to go from frequency $p_0$ to frequency $p_n$ in the selection model of Section 6.5. This is a rather difficult problem to treat in the context of difference equations. However, for slow selection, considerable progress can be made by passing to a differential equation approximation. This entails replacing $p_n$ by a function $p(t)$ of the continuous time variable $t$. Show that equation (6.4) can be rephrased as
$$p_{n+1} - p_n = \frac{p_nq_n[s - (r+s)p_n]}{\bar w_n}.$$
If we treat one generation as our unit of time, then the continuous analog of this difference equation is
$$\frac{dp}{dt} = \frac{pq[s - (r+s)p]}{\bar w},$$
where $q = 1 - p$ and $\bar w = 1 - rp^2 - sq^2$. If we take this approximation seriously, then
$$n \approx \int_0^n dt = \int_{p_0}^{p_n} \frac{\bar w}{pq[s - (r+s)p]}\,dp.$$
Show that this leads to
$$n \approx \Big(\frac1s - 1\Big)\ln\frac{p_n}{p_0} + \Big(\frac1r - 1\Big)\ln\frac{1 - p_n}{1 - p_0} - \Big(\frac1r + \frac1s - 1\Big)\ln\frac{|s - (r+s)p_n|}{|s - (r+s)p_0|}$$
when $p_n$ and $p_0$ are both on the same side of the internal equilibrium point and neither $r$ nor $s$ is 0. Derive a similar approximation when $s = 0$ or $r = 0$. Why is it necessary to postulate that $p_n$ and $p_0$ be on the same side of the internal equilibrium point? Is it possible to calculate a negative value of $n$? If so, what does it mean?
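The logarithmic approximation in Problem 13 can be checked against direct numerical quadrature of $\int_{p_0}^{p_n} \bar w/(pq[s - (r+s)p])\,dp$ and against the number of generations the difference equation actually takes. In the Python sketch below, $r = 0.02$, $s = 0.01$, $p_0 = 0.05$, and $p_n = 0.3$ are illustrative assumptions; both frequencies lie below the internal equilibrium $s/(r+s) = \frac13$. The quadrature should match the formula closely, while the generation count should agree only approximately, since the differential equation is an approximation for slow selection.

```python
import math

# Illustrative values (assumptions): both coefficients nonzero, and p0, pn
# on the same side of the internal equilibrium s / (r + s) = 1/3.
r, s = 0.02, 0.01
p0, pn = 0.05, 0.30

def integrand(p):
    """w-bar / (p q [s - (r + s) p]) from the continuous approximation."""
    q = 1 - p
    wbar = 1 - r * p * p - s * q * q
    return wbar / (p * q * (s - (r + s) * p))

def simpson(f, a, b, n=20_000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def n_closed_form(p0, pn):
    """The logarithmic approximation for n stated in Problem 13."""
    return ((1 / s - 1) * math.log(pn / p0)
            + (1 / r - 1) * math.log((1 - pn) / (1 - p0))
            - (1 / r + 1 / s - 1)
            * math.log(abs(s - (r + s) * pn) / abs(s - (r + s) * p0)))

def n_discrete(p0, pn):
    """Generations the difference equation needs to carry the frequency from p0 to pn."""
    p, n = p0, 0
    while p < pn:
        q = 1 - p
        p += p * q * (s - (r + s) * p) / (1 - r * p * p - s * q * q)
        n += 1
    return n

n_formula = n_closed_form(p0, pn)
n_quad = simpson(integrand, p0, pn)
n_iter = n_discrete(p0, pn)
print(n_formula, n_quad, n_iter)
```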

14. To explore the impact of genetic screening for carriers, consider a lethal recessive disease with two alleles, the normal allele $A_2$ and the recessive disease allele $A_1$. Mutation from $A_2$ to $A_1$ takes place at rate $\mu$. No backmutation is permitted. An entire population is screened for carriers. If a husband and wife are both carriers, then all fetuses of the wife are checked, and those who will develop the disease are aborted. The couple compensates for such unsuccessful pregnancies, so that they have an average number of normal children. Affected children born to parents not at high risk likewise are compensated for by the parents. These particular affected children are new mutations and do not contribute to the next generation. Let $u_n$ and $v_n$ be the frequency of people with genotypes $A_1/A_2$ and $A_2/A_2$, respectively, at generation $n$.


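Table 9.3: Mating Outcomes under Genetic Screening
$$\begin{array}{lccc}
\text{Mating Type} & \text{Frequency} & A_1/A_2 \text{ Offspring} & A_2/A_2 \text{ Offspring} \\ \hline
A_1/A_2 \times A_1/A_2 & u_n^2 & \frac23 + \frac49\mu & \frac13 - \frac49\mu \\
A_1/A_2 \times A_2/A_2 & 2u_nv_n & \frac12 + \frac34\mu & \frac12 - \frac34\mu \\
A_2/A_2 \times A_2/A_2 & v_n^2 & 2\mu & 1 - 2\mu
\end{array}$$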

(a) In Table 9.3, mathematically justify the mating frequencies exactly and the offspring frequencies to order $O(\mu^2)$. (Hint: Apply the expansion $(1 - x)^{-1} = \sum_{k=0}^\infty x^k$ for $|x| < 1$.)

(b) Derive a pair of recurrence relations for $u_{n+1}$ and $v_{n+1}$ based on the results of Table 9.3. Use the recurrence relations to show that $u_n + v_n = 1$ for all $n$.

(c) Demonstrate that the recurrence relation for $u_{n+1}$ has equilibrium value $u_\infty = \sqrt{6\mu}$. This implies a frequency of approximately $\sqrt{3\mu/2}$ for allele $A_1$. (Hint: In the recurrence for $u_{n+1}$, substitute $v_n = 1 - u_n$ and take limits. Assume that $u_\infty$ is of order $\sqrt\mu$ and neglect all terms of order $\mu^{3/2}$ or smaller.)

(d) Find the function $f(u)$ giving the recurrence $u_{n+1} = f(u_n)$. Show that $f'(u_\infty) \approx 1 - 2\sqrt{2\mu/3}$.

(e) Discuss the implications of the above analysis for genetic screening. Consider the increase in the equilibrium frequency of the disease allele and, in light of Problem 13, the speed at which this increased frequency is attained.
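Parts (c) and (d) of Problem 14 can be explored by iterating $u_{n+1} = f(u_n)$ with $f$ assembled from Table 9.3 and $v_n = 1 - u_n$. In the Python sketch below, $\mu = 10^{-6}$ is an illustrative mutation rate; starting from $u_0 = 0$, the iterates creep toward $\sqrt{6\mu}$, and a central difference estimates $f'(u_\infty)$ for comparison with $1 - 2\sqrt{2\mu/3}$. Because the table entries are accurate only to first order in $\mu$ and part (c) neglects higher-order terms, the agreement is approximate, with relative error of order $\sqrt\mu$. The very large iteration count is itself a reminder of how slowly the equilibrium is approached.

```python
import math

mu = 1e-6  # illustrative mutation rate (assumption)

def f(u):
    """u_{n+1} from Table 9.3 with v_n = 1 - u_n (offspring fractions to first order in mu)."""
    v = 1 - u
    return (u * u * (2 / 3 + 4 * mu / 9)
            + 2 * u * v * (1 / 2 + 3 * mu / 4)
            + v * v * 2 * mu)

u = 0.0
for _ in range(100_000):
    u = f(u)

h = 1e-7
slope = (f(u + h) - f(u - h)) / (2 * h)   # central difference for f'(u_inf)
print(u, math.sqrt(6 * mu))
print(1 - slope, 2 * math.sqrt(2 * mu / 3))
```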

15. In the context of Fisher's fundamental theorem of natural selection, show that each component of a fixed point of the iteration map $T(p)$ satisfies either $p_i = 0$ or $w_i = f(p)$. If all $p_i > 0$, then prove that the conditions $w_i = f(p)$ are precisely the necessary conditions for a maximum of $f(p)$ subject to the constraint $\sum_{i=1}^m p_i = 1$. In either case, prove that
$$p = \frac{1}{\mathbf{1}^tW^{-1}\mathbf{1}}W^{-1}\mathbf{1}$$
provided the fitness matrix $W$ is invertible, where $\mathbf{1}$ denotes the column vector whose entries all equal 1. (Hint: Maximize $f(p)$ by introducing Lagrange multipliers.)
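For an invertible fitness matrix with an attracting interior equilibrium, the formula of Problem 15 can be compared with the limit of the iterates $p$, $T(p)$, $T(T(p))$, and so forth. The Python sketch below solves $Wx = \mathbf{1}$ by Gaussian elimination and normalizes, then iterates the selection map assumed in the proof of Fisher's theorem. The three-allele matrix (heterozygote fitness 1, reduced homozygote fitnesses) is a hypothetical example chosen so that the interior equilibrium exists and attracts; its equilibrium works out to $(2/11, 3/11, 6/11)$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            factor = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def selection_step(W, p):
    """T(p)_i = p_i w_i / f(p), with w_i = sum_j w_ij p_j."""
    w = [sum(wij * pj for wij, pj in zip(row, p)) for row in W]
    f = sum(pi * wi for pi, wi in zip(p, w))
    return [pi * wi / f for pi, wi in zip(p, w)]

# Hypothetical fitness matrix: heterozygotes have fitness 1, homozygotes less.
W = [[0.7, 1.0, 1.0],
     [1.0, 0.8, 1.0],
     [1.0, 1.0, 0.9]]
x = solve(W, [1.0, 1.0, 1.0])            # x = W^{-1} 1
p_formula = [xi / sum(x) for xi in x]    # divide by 1^t W^{-1} 1

p = [0.6, 0.3, 0.1]                      # arbitrary interior starting point
for _ in range(1000):
    p = selection_step(W, p)
print(p_formula)
print(p)
```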

16. In Kimura's model, suppose that two new species bifurcate at time 0 from an ancestral species and evolve independently thereafter. Show that the probability that the two species possess the same base at a given site at time $t$ is
$$\frac14 + \frac12 e^{-4(\alpha+\beta)t} + \frac14 e^{-8\beta t}.$$
(Hint: By symmetry this formula holds regardless of what base was present at the site in the ancestral species.)
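Problem 16 can be checked numerically with the transition probabilities of Section 6.8. The Python sketch below reads off $p_{AA}(t) = q_{bb}(t)$, $p_{AG}(t) = q_{bo}(t)$, and $p_{AC}(t) = p_{AT}(t) = \frac12 q_{by}(t)$, and, because the two lineages evolve independently from a common ancestral base (taken to be A, as the hint permits), sums $p_{Aj}(t)^2$ over the four bases $j$. The rates and times are illustrative assumptions.

```python
import math

def kimura_row(alpha, beta, t):
    """p_AA, p_AG, p_AC, p_AT from the three-state solution in Section 6.8."""
    e1 = math.exp(-2 * (alpha + beta) * t)
    e2 = math.exp(-4 * beta * t)
    p_aa = 0.25 + 0.5 * e1 + 0.25 * e2   # q_bb(t)
    p_ag = 0.25 - 0.5 * e1 + 0.25 * e2   # q_bo(t)
    p_ac = 0.25 - 0.25 * e2              # q_by(t) / 2, same for T
    return [p_aa, p_ag, p_ac, p_ac]

def same_base_probability(alpha, beta, t):
    """Two independent lineages descended from base A agree at time t."""
    return sum(pj * pj for pj in kimura_row(alpha, beta, t))

def stated_formula(alpha, beta, t):
    """The expression to be proved in Problem 16."""
    return (0.25 + 0.5 * math.exp(-4 * (alpha + beta) * t)
            + 0.25 * math.exp(-8 * beta * t))

alpha, beta = 0.8, 0.2   # illustrative rates (assumption)
times = (0.0, 0.1, 0.5, 2.0)
for t in times:
    print(t, same_base_probability(alpha, beta, t), stated_formula(alpha, beta, t))
```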


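17. In Kimura's model, it turns out that the various probabilities $p_{ij}(t)$ can be expressed as entries of the matrix exponential $P(t) = e^{t\Lambda}$, where, with rows and columns ordered A, G, C, T,
$$\Lambda = \begin{pmatrix}
-(\alpha + 2\beta) & \alpha & \beta & \beta \\
\alpha & -(\alpha + 2\beta) & \beta & \beta \\
\beta & \beta & -(\alpha + 2\beta) & \alpha \\
\beta & \beta & \alpha & -(\alpha + 2\beta)
\end{pmatrix}.$$
Show that $\Lambda$ can be written as $\Lambda = TDT^{-1}$ for $D$ a diagonal matrix and $T$ an orthogonal matrix. In particular, check that the eigenvalues are $0$, $-4\beta$, $-2(\alpha+\beta)$, and $-2(\alpha+\beta)$ with corresponding eigenvectors
$$\frac12\begin{pmatrix}1\\1\\1\\1\end{pmatrix}, \quad \frac12\begin{pmatrix}-1\\-1\\1\\1\end{pmatrix}, \quad \frac{1}{\sqrt2}\begin{pmatrix}0\\0\\-1\\1\end{pmatrix}, \quad \frac{1}{\sqrt2}\begin{pmatrix}-1\\1\\0\\0\end{pmatrix}.$$
With these ingredients for exponentiating $t\Lambda$, calculate the typical entries
$$\begin{aligned}
p_{AA}(t) &= \frac14 + \frac14 e^{-4\beta t} + \frac12 e^{-2(\alpha+\beta)t} \\
p_{AG}(t) &= \frac14 + \frac14 e^{-4\beta t} - \frac12 e^{-2(\alpha+\beta)t} \\
p_{AC}(t) &= \frac14 - \frac14 e^{-4\beta t}
\end{aligned}$$
of $P(t)$.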
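The entries of $P(t) = e^{t\Lambda}$ in Problem 17 can be checked without the eigendecomposition by summing the exponential series directly. The Python sketch below swaps in a different technique from the one the problem asks for: a truncated Taylor series combined with scaling and squaring. The rates, the time, and the number of series terms are illustrative choices; the comparison targets are the three entries stated in the problem, plus the claimed eigenvector for $-4\beta$.

```python
import math

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def expm(A, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    k = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    B = [[x / 2**k for x in row] for row in A]            # now max row sum <= 1/2
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms + 1):
        term = [[x / m for x in row] for row in matmul(term, B)]   # B^m / m!
        result = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(result, term)]
    for _ in range(k):                                     # undo the scaling
        result = matmul(result, result)
    return result

alpha, beta, t = 0.9, 0.3, 0.8   # illustrative rates and time (assumption)
d = -(alpha + 2 * beta)
Lam = [[d, alpha, beta, beta],     # rows and columns ordered A, G, C, T
       [alpha, d, beta, beta],
       [beta, beta, d, alpha],
       [beta, beta, alpha, d]]
P = expm([[t * x for x in row] for row in Lam])

e4b, e2ab = math.exp(-4 * beta * t), math.exp(-2 * (alpha + beta) * t)
p_aa = 0.25 + 0.25 * e4b + 0.5 * e2ab
p_ag = 0.25 + 0.25 * e4b - 0.5 * e2ab
p_ac = 0.25 - 0.25 * e4b
print(P[0][0] - p_aa, P[0][1] - p_ag, P[0][2] - p_ac)

v2 = [-1, -1, 1, 1]                # eigenvector for -4 beta (unnormalized)
Lv2 = [sum(Lam[i][j] * v2[j] for j in range(4)) for i in range(4)]
print(Lv2, [-4 * beta * x for x in v2])
```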