

PCB5065 Advanced Genetics
Population Genetics and Quantitative Genetics

Instructor: Rongling Wu, 409 McCarty Hall, Department of Statistics
Tel: 2-3806, Email: [email protected]

Mon Nov 14     Population genetics - population structure
Tues Nov 15    Population genetics - Hardy-Weinberg equilibrium
Wed Nov 16     Population genetics - effective population size
Thurs Nov 17   Population genetics - linkage disequilibrium
Mon Nov 21     Population genetics - evolutionary forces
Tues Nov 22    Population genetics - evolutionary forces
Wed Nov 23     Genetic Parameters: Means
Mon Nov 28     Genetic Parameters: (Co)Variances
Tues Nov 29    Mating Designs for Parameter Estimation
Wed Nov 30     Discussion paper - Epigenetics / developmental genetics
Thurs Dec 1    No Class – UFGI Genetics Symposium, Reitz Union
Mon Dec 5      Experimental Designs for Parameter Estimation
Tues Dec 6     Heritability, Genetic Correlation and Gain from Selection
Wed Dec 7      Toward Molecular Dissection of Quantitative Variation
Wed Dec 7      Take-home exam on pop. and quant. genetics given; due in electronic format ([email protected]) by 5 PM Mon. Dec. 12

Teosinte and Maize

The gene teosinte branched1 (tb1) affects the differentiation in branch architecture between teosinte and maize (John Doebley 2001)

Approaches used to support the view that modern maize cultivars are domesticated from the wild type teosinte

Population genetics

• Study the evolutionary or phylogenetic relationships between maize and its wild relative

• Study evolutionary forces that have shaped the structure of and diversity in the maize genome


Quantitative genetics

• Identify the genetic architecture of the differences in morphology between maize and teosinte

• Estimate the number of genes required for the evolution of a new morphological trait from teosinte to maize: few genes of large effect or many genes of small effect?

• Doebley pioneered the use of quantitative trait locus (QTL) mapping approaches to successfully identify genomic regions that are responsible for the separation of maize from its undomesticated relatives.

• Doebley has cloned genes identified through QTL mapping, including teosinte branched1 (tb1); these genes govern kernel structure and plant architecture.

• Ancient Mexicans took several thousand years to transform the wild grass teosinte into modern maize through rounds of selective breeding for large ears of corn.

• With genetic information, "I think in as few as 25 years I can move teosinte fairly far along the road to becoming maize," Doebley predicts (Brownlee, 2004 PNAS vol. 101: 697–699)

Toward biomedical breakthroughs?

[Figure: Single Nucleotide Polymorphisms (SNPs), with panels labeled "no cancer" and "cancer"]


• According to The International HapMap Consortium (2003), the statistical analysis and modeling of the links between DNA sequence variants and phenotypes will play a pivotal role in the characterization of specific genes for various diseases and, ultimately, the design of personalized medications that are optimal for individual patients.

• What knowledge is needed to perform such statistical analyses?

• Population genetics and quantitative genetics, and others…

• The International HapMap Consortium, 2003 The International HapMap Project. Nature 426: 789-94.

• Liu, T., J. A. Johnson, G. Casella and R. L. Wu, 2004 Sequencing complex diseases with HapMap. Genetics 168: 503-511.

Basic Genetics

(1) Mendelian genetics
How is a gene transmitted from a parent to its progeny (individual)?

(2) Population genetics
How does a gene segregate in a population (a group of individuals)?

(3) Quantitative genetics
How is gene segregation related to the phenotype of a character?

(4) Molecular genetics
What is the molecular basis of gene segregation and transmission?

(5) Developmental genetics

(6) Epigenetics


Mendelian Genetics Probability

Population Genetics Statistics

Quantitative genetics Molecular Genetics

Statistical Genetics Mathematics with biology (our view)

Cutting-edge research at the interface among genetics, evolution and development (Evo-Devo)

Wu, R. L. Functional mapping – how to map and study the genetic architecture of dynamic complex traits. Nature Reviews Genetics (accepted)

Mendel's Laws

Mendel's first law
• There is a gene with two alleles at a chromosome location (locus)
• These alleles segregate during the formation of the reproductive cells, thus passing into different gametes

Mendel's second law
• There are two or more pairs of genes on different chromosomes
• They segregate independently (partially correct)

Linkage (exception to Mendel's second law)
• There are two or more pairs of genes located on the same chromosome
• They can be linked or associated (the degree of association is described by the recombination fraction)


Population Genetics

• Different copies of a gene are called alleles; for example A and a at gene A;

• These alleles form three genotypes, AA, Aa and aa;

• The allele (or gene) frequency of an allele is defined as the proportion of this allele among a group of individuals;

• Accordingly, the genotype frequency is the proportion of a genotype among a group of individuals

Calculations of allele frequencies and genotype frequencies

Genotype   Count   Estimated genotype frequency
AA         224     PAA = 224/294 = 0.762
Aa         64      PAa = 64/294 = 0.218
aa         6       Paa = 6/294 = 0.020
Total      294     PAA + PAa + Paa = 1

Allele frequencies
pA = (2×224 + 64)/(2×294) = 0.871
pa = (2×6 + 64)/(2×294) = 0.129
pA + pa = 0.871 + 0.129 = 1

Expected genotype frequencies
AA   pA^2 = 0.871^2 = 0.759
Aa   2pApa = 2 × 0.871 × 0.129 = 0.225
aa   pa^2 = 0.129^2 = 0.017


Genotypes Counts Estimates of genotype freq.

AA nAA PAA = nAA/n

Aa nAa PAa = nAa/n

aa naa Paa = naa/n

Total n PAA + PAa + Paa = 1

Allele frequencies

pA = (2nAA + nAa)/(2n)

pa = (2naa + nAa)/(2n)

Sampling variance of the estimate of the allele frequency (the standard error is its square root)

Var(pA) = pA(1 - pA)/(2n)
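The estimators above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `allele_frequencies` is hypothetical, and the counts are those of the worked example (224, 64, 6).

```python
from math import sqrt

def allele_frequencies(n_AA, n_Aa, n_aa):
    """Estimate allele frequencies and the standard error of pA from genotype counts."""
    n = n_AA + n_Aa + n_aa                  # number of individuals
    p_A = (2 * n_AA + n_Aa) / (2 * n)       # each individual carries two alleles
    p_a = (2 * n_aa + n_Aa) / (2 * n)
    var_p_A = p_A * (1 - p_A) / (2 * n)     # Var(pA) = pA(1 - pA)/(2n)
    return p_A, p_a, sqrt(var_p_A)

p_A, p_a, se_p_A = allele_frequencies(224, 64, 6)
print(f"pA = {p_A:.3f}, pa = {p_a:.3f}, SE(pA) = {se_p_A:.4f}")
```

The standard error is reported rather than the variance because it is on the same scale as the frequency itself.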

The Hardy-Weinberg Law

• Under Hardy-Weinberg equilibrium (HWE), the relative frequencies of the genotypes remain unchanged from generation to generation;

• As long as a population mates randomly, it reaches HWE after a single generation of random mating, i.e., from the second generation onward;

• Deviation from HWE, called Hardy-Weinberg disequilibrium (HWD), results from many factors, such as selection, mutation, admixture and population structure…

Mendelian inheritance at the individual level

(1) Make a cross between two individual parents
(2) Consider one gene (A) with two alleles A and a, giving three genotypes AA, Aa, aa

Thus, we have a total of nine possible cross combinations:

Cross          Mendelian segregation ratio
1. AA × AA     AA
2. AA × Aa     ½AA + ½Aa
3. AA × aa     Aa
4. Aa × AA     ½AA + ½Aa
5. Aa × Aa     ¼AA + ½Aa + ¼aa
6. Aa × aa     ½Aa + ½aa
7. aa × AA     Aa
8. aa × Aa     ½Aa + ½aa
9. aa × aa     aa

Mendelian inheritance at the population level

• A population, a group of individuals, may contain all these nine combinations, weighted by the mating frequencies.
• Genotype frequencies: AA, PAA(t); Aa, PAa(t); aa, Paa(t)

                                      Mendelian segreg. ratio (t+1)
Cross          Mating freq. (t)       AA    Aa    aa
1. AA × AA     PAA(t)PAA(t)           1     0     0
2. AA × Aa     PAA(t)PAa(t)           ½     ½     0
3. AA × aa     PAA(t)Paa(t)           0     1     0
4. Aa × AA     PAa(t)PAA(t)           ½     ½     0
5. Aa × Aa     PAa(t)PAa(t)           ¼     ½     ¼
6. Aa × aa     PAa(t)Paa(t)           0     ½     ½
7. aa × AA     Paa(t)PAA(t)           0     1     0
8. aa × Aa     Paa(t)PAa(t)           0     ½     ½
9. aa × aa     Paa(t)Paa(t)           0     0     1

PAA(t+1) = 1×[PAA(t)]^2 + ½×2[PAA(t)PAa(t)] + ¼[PAa(t)]^2 = [PAA(t) + ½PAa(t)]^2

Similarly, we have
Paa(t+1) = [Paa(t) + ½PAa(t)]^2
PAa(t+1) = 2[PAA(t) + ½PAa(t)][Paa(t) + ½PAa(t)]

Therefore, we have
[PAa(t+1)]^2 = 4PAA(t+1)Paa(t+1)

Furthermore, if random mating continues, we have
PAA(t+2) = [PAA(t+1) + ½PAa(t+1)]^2 = PAA(t+1)
PAa(t+2) = 2[PAA(t+1) + ½PAa(t+1)][Paa(t+1) + ½PAa(t+1)] = PAa(t+1)
Paa(t+2) = [Paa(t+1) + ½PAa(t+1)]^2 = Paa(t+1)
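The recursion can be checked numerically by summing over the nine crosses, each weighted by its mating frequency. The sketch below is illustrative only: the helper `next_generation` and the starting frequencies are assumptions chosen for the demonstration, not values from the slides.

```python
from itertools import product

# Probability that a parent of each genotype transmits allele A (Mendelian segregation)
GAMETE_A = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}

def next_generation(freq):
    """Genotype frequencies after one round of random mating."""
    new = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for mother, father in product(freq, repeat=2):     # the nine crosses
        mating = freq[mother] * freq[father]            # mating frequency
        gm, gf = GAMETE_A[mother], GAMETE_A[father]
        new["AA"] += mating * gm * gf
        new["Aa"] += mating * (gm * (1 - gf) + (1 - gm) * gf)
        new["aa"] += mating * (1 - gm) * (1 - gf)
    return new

start = {"AA": 0.5, "Aa": 0.1, "aa": 0.4}   # hypothetical population, not in HWE
gen1 = next_generation(start)
gen2 = next_generation(gen1)
print(gen1)
print(gen2)
```

With these starting values the first generation already satisfies [PAa]^2 = 4 PAA Paa, and the second generation repeats the first.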

Concluding remarks

A population with [PAa(t+1)]^2 = 4PAA(t+1)Paa(t+1) is said to be in Hardy-Weinberg equilibrium (HWE). The HWE population has the following properties:

(1) Genotype (and allele) frequencies are constant from generation to generation,

(2) Genotype frequencies = the product of the allele frequencies, i.e., PAA = pA^2, PAa = 2pApa, Paa = pa^2

For a population at Hardy-Weinberg disequilibrium (HWD), we have
• PAA = pA^2 + D
• PAa = 2pApa – 2D
• Paa = pa^2 + D

The magnitude of D determines the degree of HWD.
• D = 0 means that there is no HWD.
• D has a range of max(-pA^2, -pa^2) ≤ D ≤ pApa


Chi-square test for HWE

• Whether or not the population deviates from HWE at a particular locus can be tested using a chi-square test.

• If the population deviates from HWE (i.e., Hardy-Weinberg disequilibrium, HWD), this implies that the population is not randomly mating. Many evolutionary forces, such as mutation, genetic drift and population structure, may operate.

Example 1

        AA                  Aa                   aa                 Total
Obs     224                 64                   6                  294
Exp     n(pA^2) = 222.9     n(2pApa) = 66.2      n(pa^2) = 4.9      294

Test statistic
x^2 = Σ (obs – exp)^2/exp = (224-222.9)^2/222.9 + (64-66.2)^2/66.2 + (6-4.9)^2/4.9 = 0.32

This is less than the critical value of x^2 with df = 1 at α = 0.05, which is 3.841.

Therefore, the population does not deviate significantly from HWE at this locus.

Why is the number of degrees of freedom 1? Degrees of freedom = the number of parameters contained in the alternative hypothesis – the number of parameters contained in the null hypothesis. In this case, df = 2 (pA or pa, and D) – 1 (pA or pa) = 1

Example 2

        AA                  Aa                   aa                 Total
Obs     234                 36                   6                  276
Exp     n(pA^2) = 230.1     n(2pApa) = 43.8      n(pa^2) = 2.1      276

Test statistic
x^2 = Σ (obs – exp)^2/exp = (234-230.1)^2/230.1 + (36-43.8)^2/43.8 + (6-2.1)^2/2.1 = 8.8

This is greater than the critical value of x^2 with df = 1 at α = 0.05, which is 3.841.

Therefore, the population deviates from HWE at this locus.
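Both examples follow the same recipe, which can be sketched as a small function. This is an illustrative sketch rather than code from the course; `hwe_chi_square` is a hypothetical name, and the critical value 3.841 is the one quoted in the examples.

```python
CRITICAL_DF1_ALPHA_05 = 3.841   # chi-square critical value, df = 1, alpha = 0.05

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic for deviation from HWE at one biallelic locus."""
    n = n_AA + n_Aa + n_aa
    p_A = (2 * n_AA + n_Aa) / (2 * n)
    p_a = 1 - p_A
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p_A ** 2, n * 2 * p_A * p_a, n * p_a ** 2)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for counts in [(224, 64, 6), (234, 36, 6)]:
    x2 = hwe_chi_square(*counts)
    verdict = "deviates from HWE" if x2 > CRITICAL_DF1_ALPHA_05 else "consistent with HWE"
    print(counts, round(x2, 2), verdict)
```

The expected counts are computed from unrounded allele frequencies, so the statistics may differ slightly in the last digit from hand calculations that use rounded values.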

Linkage disequilibrium

• Consider two loci, A and B, with alleles A, a and B, b, respectively, in a population
• Assume that the population is at Hardy-Weinberg equilibrium (HWE) at each locus. Then we have

Gene A                      Gene B
AA: PAA = pA^2              BB: PBB = pB^2
Aa: PAa = 2pApa             Bb: PBb = 2pBpb
aa: Paa = pa^2              bb: Pbb = pb^2
PAA + PAa + Paa = 1         PBB + PBb + Pbb = 1
pA + pa = 1                 pB + pb = 1

Suppose, however, that the population is in linkage disequilibrium (LD) for this pair of loci. Then we have

• Two-gene haplotype AB: pAB = pApB + DAB

• Two-gene haplotype Ab: pAb = pApb + DAb

• Two-gene haplotype aB: paB = papB + DaB

• Two-gene haplotype ab: pab = papb + Dab

pAB+pAb+paB+pab = 1

Dij is the coefficient of linkage disequilibrium (LD) between the two genes in the population. The magnitude of D reflects the degree of LD. The larger D, the stronger LD.

pA = pAB + pAb = pApB + DAB + pApb + DAb = pA + DAB + DAb   ⇒   DAB = -DAb

pB = pAB + paB = pB + DAB + DaB   ⇒   DAB = -DaB

pb = pAb + pab = pb + DAb + Dab   ⇒   Dab = -DAb

Finally, we have DAB = -DAb = -DaB = Dab = D.

Re-write the four two-gene haplotype frequencies
• AB: pAB = pApB + D
• Ab: pAb = pApb – D
• aB: paB = papB – D
• ab: pab = papb + D

D = pABpab - pAbpaB

D = 0 ⇒ the population is at linkage equilibrium

How does D transmit from one generation (1) to the next (2)?

D(2) = (1-r)^1 D(1)

D(t+1) = (1-r)^t D(1)

As t → ∞, D(t+1) → 0 for any r > 0

Conclusions:

- D tends to zero at a rate that depends on the recombination fraction.

- Linkage equilibrium, pAB = pApB, is approached gradually and without oscillation.

- The larger r, the faster the rate of convergence, the most rapid being (½)^t for unlinked loci (r = 0.5).

D(t) = (1-r)^t D(0)

D(t)/D(0) = (1-r)^t

The ratio D(t)/D(0) describes the degree to which LD decays over generations.
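To get a feel for the decay, the ratio can be tabulated for a few recombination fractions. The sketch below is illustrative; `ld_ratio` and `generations_to_fraction` are hypothetical helper names, and the second function simply inverts the formula, t = log(fraction)/log(1 - r).

```python
from math import log

def ld_ratio(r, t):
    """Fraction of the initial LD remaining after t generations: D(t)/D(0) = (1 - r)^t."""
    return (1 - r) ** t

def generations_to_fraction(r, fraction):
    """Generations needed for D(t)/D(0) to fall to `fraction` (requires 0 < r < 1)."""
    return log(fraction) / log(1 - r)

for r in (0.001, 0.01, 0.1, 0.2, 0.5):
    remaining = ld_ratio(r, 10)
    half_life = generations_to_fraction(r, 0.5)
    print(f"r = {r}: D(10)/D(0) = {remaining:.4f}, half-life = {half_life:.1f} generations")
```

Tightly linked loci (small r) keep most of their LD for hundreds of generations, whereas unlinked loci lose half of it every generation.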

[Figure: D(t)/D(0) (vertical axis, 0 to 1) plotted against r (horizontal axis, 0 to 0.5) for t = 2, t = 20 and t = 200]

The plot of the ratio D(t)/D(0) against r tells us the evolutionary history of a population – implications for population and evolutionary genetics.

[Figure: D(t)/D(0) (vertical axis, 0 to 1) plotted against t (horizontal axis, 0 to 12) for r = 0.001, 0.01, 0.1, 0.2 and 0.5]

The plot of the ratio D(t)/D(0) against t tells us the degree of linkage – implications for high-resolution mapping of human diseases and other complex traits

Proof of D(t+1) = (1-r) D(t)

• The four gametes randomly unite to form zygotes. A proportion 1-r of the gametes produced by a zygote are parental (or nonrecombinant) gametes, and a fraction r are nonparental (or recombinant) gametes. A particular gamete in generation t+1, say AB, is produced without recombination with probability 1-r. The frequency with which this gamete is produced in this way is (1-r)pAB(t).

• This gamete is also generated as a recombinant from the genotypes formed by the gametes containing allele A and the gametes containing allele B. The frequencies of the gametes containing alleles A or B are pA(t) and pB(t), respectively. So the frequency with which AB arises in this way is rpA(t)pB(t).

• Therefore the frequency of AB in generation t+1 is
pAB(t+1) = (1-r)pAB(t) + rpA(t)pB(t)

Subtracting pA(t)pB(t) from both sides of the above equation, and noting that the allele frequencies do not change between generations, we have

D(t+1) = (1-r) D(t)

Whence
D(t+1) = (1-r)^t D(1)
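The recursion for pAB can be iterated directly to confirm the result. In this sketch the other three haplotypes are updated by the same argument as AB (an assumption, since the slide derives only the AB case), and the starting frequencies and r = 0.1 are hypothetical.

```python
def ld(h):
    """D = pAB*pab - pAb*paB for a dict of haplotype frequencies."""
    return h["AB"] * h["ab"] - h["Ab"] * h["aB"]

def next_haplotype_freqs(h, r):
    """One generation of random mating with recombination fraction r."""
    p_A, p_B = h["AB"] + h["Ab"], h["AB"] + h["aB"]
    p_a, p_b = 1 - p_A, 1 - p_B
    return {
        "AB": (1 - r) * h["AB"] + r * p_A * p_B,   # nonrecombinant + recombinant
        "Ab": (1 - r) * h["Ab"] + r * p_A * p_b,
        "aB": (1 - r) * h["aB"] + r * p_a * p_B,
        "ab": (1 - r) * h["ab"] + r * p_a * p_b,
    }

h0 = {"AB": 0.4, "Ab": 0.1, "aB": 0.1, "ab": 0.4}   # hypothetical starting point
r = 0.1
h1 = next_haplotype_freqs(h0, r)
print(ld(h0), ld(h1), (1 - r) * ld(h0))
```

The printed values show D shrinking by the factor (1 - r) in one generation while the allele frequencies stay fixed.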

Estimate and test for LD

Assuming random mating in the population, we have the joint probabilities of the two genes (with the observed counts below each probability)

              BB (PBB)        Bb (PBb)                 bb (Pbb)
AA (PAA)      pAB^2           2pABpAb                  pAb^2
              n22             n21                      n20
Aa (PAa)      2pABpaB         2(pABpab + pAbpaB)       2pAbpab
              n12             n11                      n10
aa (Paa)      paB^2           2paBpab                  pab^2
              n02             n01                      n00

Multinomial pdf

H1: D ≠ 0
log f(pij|n) = log [n!/(n22!…n00!)] + n22 log pAB^2 + n21 log(2pABpAb) + n20 log pAb^2 + …
Estimate pAB, pAb, paB (pab = 1-pAB-pAb-paB), which give pA, pB and D.

H0: D = 0
log f(pi,pj|n) = log [n!/(n22!…n00!)] + n22 log(pApB)^2 + n21 log(2pA^2pBpb) + n20 log(pApb)^2 + …
Estimate pA and pB.

Chi-square Test of Linkage Disequilibrium (D)

Test statistic

x^2 = 2nD^2/(pApapBpb)

is compared with the critical threshold value obtained from the chi-square table for df = 1 at the 0.05 level. n is the number of individuals in the population.

If x^2 is less than the critical value, D is not significantly different from zero and the population under study is in linkage equilibrium.

If x^2 is greater than the critical value, D is significantly different from zero and the population under study is in linkage disequilibrium.

Example

(1) Two genes, A with alleles A and a and B with alleles B and b, whose population frequencies are denoted by pA, pa (= 1 - pA) and pB, pb (= 1 - pB), respectively

(2) These two genes are associated with each other, having the coefficient of linkage disequilibrium D

Four gametes are observed as follows:

Gamete              AB            Ab            aB            ab            Total
Obs                 474           611           142           773           2n = 2000
Gamete frequency    pAB           pAb           paB           pab
                    = 474/2000    = 611/2000    = 142/2000    = 773/2000
                    = 0.2370      = 0.3055      = 0.0710      = 0.3865      1

Estimates of allele frequencies

pA = pAB + pAb = 0.2370 + 0.3055 = 0.5425
pa = paB + pab = 0.0710 + 0.3865 = 0.4575
pB = pAB + paB = 0.2370 + 0.0710 = 0.3080
pb = pAb + pab = 0.3055 + 0.3865 = 0.6920

The estimate of D
D = pABpab – pAbpaB = 0.2370 × 0.3865 – 0.3055 × 0.0710 = 0.0699

Test statistic
x^2 = 2nD^2/(pApapBpb) = 2 × 1000 × 0.0699^2/(0.5425 × 0.4575 × 0.3080 × 0.6920) = 184.78 (computed with unrounded D)

This is greater than the critical value of x^2 with df = 1 at the 0.05 level, which is 3.841. Therefore, the population is in linkage disequilibrium at the two genes under consideration.

A second approach for calculating x^2:

Gamete    AB            Ab            aB            ab            Total
Obs       474           611           142           773           2n = 2000
Exp       2n(pApB)      2n(pApb)      2n(papB)      2n(papb)
          = 334.2       = 750.8       = 281.8       = 633.2       2000

x^2 = Σ (obs – exp)^2/exp
    = (474-334.2)^2/334.2 + (611-750.8)^2/750.8 + (142-281.8)^2/281.8 + (773-633.2)^2/633.2
    = 184.78
    = 2nD^2/(pApapBpb)

Measures of linkage disequilibrium

(1) D, which has the limitation that its value depends on the allele frequencies

D = 0.02 is considered to be
• large for two genes each with diverse allele frequencies, e.g., pA = pB = 0.9 vs. pa = pb = 0.1
• small for two genes each with similar allele frequencies, e.g., pA = pB = 0.5 vs. pa = pb = 0.5

(2) To make a comparison between gene pairs with different allele frequencies, we need a new normalized measure.

The range of LD is

max(-pApB, -papb) ≤ D ≤ min(pApb, papB)

The normalized LD (Lewontin 1964) is defined as

D' = D/Dmax,

where Dmax is the maximum that D can have, which is

Dmax = max(-pApB, -papb) if D < 0,
or min(pApb, papB) if D > 0.

For the above example, we have D' = 0.0699/min(pApb, papB) = 0.0699/min(0.375, 0.141) = 0.496

(3) Linkage disequilibrium measured as the correlation between the A and B alleles

R = D/√(pApapBpb), with R in [-1, 1]

Note: x^2 = 2nR^2 follows the chi-square distribution with df = 1 under the null hypothesis of D = 0.

For the above example, we have

R = 0.0699/√(pApapBpb) = 0.3040.
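The three measures and the test statistic can be computed together from the four gamete counts. The following is a sketch under the definitions given in these slides (in particular D' = D/Dmax with the sign-dependent Dmax); the function name `ld_measures` is hypothetical.

```python
from math import sqrt

def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', R, chi-square) from counts of the four gametes."""
    total = n_AB + n_Ab + n_aB + n_ab            # 2n gametes from n individuals
    p_AB, p_Ab, p_aB, p_ab = (c / total for c in (n_AB, n_Ab, n_aB, n_ab))
    p_A, p_B = p_AB + p_Ab, p_AB + p_aB
    p_a, p_b = 1 - p_A, 1 - p_B
    D = p_AB * p_ab - p_Ab * p_aB
    if D > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = max(-p_A * p_B, -p_a * p_b)
    D_prime = D / d_max if D != 0 else 0.0
    R = D / sqrt(p_A * p_a * p_B * p_b)
    chi_square = total * R ** 2                  # 2n R^2 = 2n D^2 / (pA pa pB pb)
    return D, D_prime, R, chi_square

D, D_prime, R, x2 = ld_measures(474, 611, 142, 773)
print(f"D = {D:.4f}, D' = {D_prime:.3f}, R = {R:.4f}, x2 = {x2:.2f}")
```

Working from the raw counts avoids the small discrepancies that arise when rounded frequencies are carried through the calculation by hand.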

Application of LD analysis

D(t+1) = (1-r)^t D(1)

This means that when the population undergoes random mating, the LD decays exponentially at a rate determined by the recombination fraction.

(1) Population structure and evolution

Estimating D, D' and R ⇒ the mating history of the population

The larger the D' and R estimates, the more likely it is that the population mates nonrandomly, has a small size, or has been affected by evolutionary forces.

 

Human origin studies based on LD analysis

Reich, D. E., M. Cargill, S. Bolk, J. Ireland, P. C. Sabeti, D. J. Richter, T. Lavery, R. Kouyoumjian, S. F. Farhadian, R. Ward and E. S. Lander, 2001 Linkage disequilibrium in the human genome. Nature 411: 199-204.

Dawson, E., G. R. Abecasis, S. Bumpstead, Y. Chen et al. 2002 A first-generation linkage disequilibrium map of human chromosome 22. Nature 418: 544-548.

LD curve for Swedish and Yoruban samples. To minimize ascertainment bias, data are only shown for marker comparisons involving the core SNP. Alleles are paired such that D' > 0 in the Utah population. D' > 0 in the other populations indicates the same direction of allelic association and D' < 0 indicates the opposite association. a, In Sweden, average D' is nearly identical to the average |D'| values up to 40-kb distances, and the overall curve has a similar shape to that of the Utah population (thin line in a and b). b, LD extends less far in the Yoruban sample, with most of the long-range LD coming from a single region, HCF2. Even at 5 kb, the average values of |D'| and D' diverge substantially. To make the comparisons between populations appropriate, the Utah LD curves are calculated solely on the basis of SNPs that had been successfully genotyped and met the minimum frequency criterion in both populations (Swedish and Yoruban) (Reich et al. 2001)

(2) Fine mapping of disease genes

The detection of LD may imply that the recombination fraction between two genes is small and that the genes are therefore close together (given the assumption that t is large).

Inbreeding

• Individuals that are related to each other by ancestry are called relatives;
• Mating between relatives is called inbreeding;
• The consequence of inbreeding is to increase the frequency of homozygous genotypes in a population, relative to the frequency that would be expected with random mating (Hartl 1999).

The closest degree of inbreeding:
• In most human societies: first-cousin mating
• In many plants: self-fertilization

Genotype frequencies with inbreeding

Gene A, with two alleles A and a, in a self-fertilizing population of plants, for example, rice or Arabidopsis

                    AA                               Aa                      aa
Generation 1        1/4                              1/2                     1/4
Generation 2        PAA = 1/4×1 + 1/2×1/4 = 3/8      PAa = 1/2×1/2 = 2/8     Paa = 1/2×1/4 + 1/4×1 = 3/8
Randomly mating     P0AA = 1/4                       P0Aa = 1/2              P0aa = 1/4

The effect of inbreeding is to increase the frequency of the homozygous genotypes AA and aa, but reduce the frequency of the heterozygous genotype Aa.

We define

F = (P0Aa – PAa)/P0Aa

as the inbreeding coefficient. Biologically, F measures the degree to which heterozygosity is reduced by inbreeding, expressed as a fraction of the heterozygosity expected in a random-mating population.

Consider an inbred population, in which the actual frequency of heterozygotes is written as

PAa = P0Aa – P0AaF = 2pApa – 2pApaF,

with P0Aa = 2pApa at random mating. Because pA = PAA + 1/2PAa and pa = Paa + 1/2PAa, we have

PAA = pA – 1/2PAa = pA – 1/2(2pApa – 2pApaF) = pA^2 + pApaF,

Paa = pa – 1/2PAa = pa – 1/2(2pApa – 2pApaF) = pa^2 + pApaF

Further, we have

PAA = pA^2(1-F) + pAF,

PAa = 2pApa(1-F),

Paa = pa^2(1-F) + paF.

Concluding remarks

(1) The genotype frequencies equal the HWE frequencies multiplied by the factor 1 – F, plus a correction term for the homozygous genotype frequencies multiplied by the factor F;

(2) When F = 0 (no inbreeding), the genotype frequencies are the HWE frequencies. When F = 1 (complete inbreeding), the population consists entirely of homozygotes AA and aa.
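These formulas can be checked against the self-fertilization example, where one generation of selfing halves the heterozygote frequency. The sketch below is illustrative; both function names are hypothetical, and F is computed as the proportional reduction in heterozygosity relative to 2pApa.

```python
def inbreeding_coefficient(P_Aa, p_A):
    """F = (P0Aa - PAa)/P0Aa, with P0Aa = 2 pA pa under random mating."""
    expected_het = 2 * p_A * (1 - p_A)
    return (expected_het - P_Aa) / expected_het

def genotype_freqs_with_inbreeding(p_A, F):
    """(PAA, PAa, Paa) for allele frequency pA and inbreeding coefficient F."""
    p_a = 1 - p_A
    return (
        p_A ** 2 * (1 - F) + p_A * F,
        2 * p_A * p_a * (1 - F),
        p_a ** 2 * (1 - F) + p_a * F,
    )

# Selfing example: generation 2 has PAa = 2/8 with pA = 1/2
F = inbreeding_coefficient(P_Aa=0.25, p_A=0.5)
print(F, genotype_freqs_with_inbreeding(0.5, F))
```

For the selfing example this gives F = 1/2 and reproduces the generation 2 frequencies 3/8, 2/8 and 3/8.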

Identical by descent (IBD)

• Identical by descent (IBD) means two genes that have originated from the replication of one single gene in a previous generation.
• The coefficient of inbreeding is the probability that the two alleles at any locus in an individual are identical by descent (it expresses the degree of relationship between the individual's parents).
• If the two alleles in an individual are IBD, the genotype at the locus is said to be autozygous.
• If they are not IBD, the genotype is said to be allozygous.

[Figure: example pedigrees, with genotypes AA, Aa and aa traced through the generations, illustrating allozygous and autozygous genotypes]

PAA = pA^2(1-F) + pAF

In general

         Allozygous         Autozygous
PAA  =   pA^2(1-F)      +   pAF
PAa  =   2pApa(1-F)     +   0
Paa  =   pa^2(1-F)      +   paF

Calculation of the inbreeding coefficient from pedigree

• A pedigree descends from a common ancestor A through B and D on one side, and through C and E on the other, to individual I
• How to calculate the coefficient of inbreeding for individual I (FI)?

                 A            1/2(1+FA)
               /   \
              B     C
     pB→D     |     |     pC→E
              D     E
     pD→I      \   /      pE→I
                 I

The common ancestor A generates two gametes G1 and G2 during meiosis, but transmits only one gamete to its first offspring B and one gamete to its second offspring C.

The pair of gametes contributed to offspring B and C by A may be G1G1, G1G2, G2G1 or G2G2, each with a probability of 1/4 because of Mendelian segregation.

• For G1G1 and G2G2, the alleles are clearly IBD,
• For G1G2 and G2G1, the alleles are IBD only if G1 and G2 are IBD, and G1 and G2 are IBD only if individual A is autozygous, which has probability FA (the inbreeding coefficient of A)

The probability for A to transmit IBD alleles to B and C is therefore 1/4 + 1/4 + 1/4FA + 1/4FA = 1/2(1 + FA).

The transmission probability of an allele from the other parents, B, C, D and E, to their own specified offspring is, based on Mendelian segregation,

pB→D = pC→E = pD→I = pE→I = 1/2

Finally, the probability that the two alleles at any locus in individual I are identical by descent is

FI = 1/2(1 + FA) × pB→D × pC→E × pD→I × pE→I
   = (1/2)^5 (1 + FA)
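The calculation generalizes to any single loop through one common ancestor: multiply 1/2(1 + FA) by 1/2 for each further parent-to-offspring transmission on the path. The function below is a hypothetical sketch of that rule for a single common ancestor; it is not a general pedigree algorithm.

```python
def inbreeding_from_common_ancestor(n_transmissions, F_A=0.0):
    """Inbreeding coefficient of an individual whose parents share one common ancestor A.

    n_transmissions counts the parent-to-offspring steps below A's own offspring
    (B->D, C->E, D->I and E->I in the pedigree above, so 4).
    """
    prob_ibd_from_A = 0.5 * (1 + F_A)        # A passes IBD alleles to both of its offspring
    return prob_ibd_from_A * 0.5 ** n_transmissions

print(inbreeding_from_common_ancestor(4))            # FA = 0: (1/2)^5
print(inbreeding_from_common_ancestor(4, F_A=1.0))   # fully inbred ancestor: (1/2)^4
```

When the ancestor is itself fully inbred (FA = 1), the inbreeding coefficient of I doubles, because A's two alleles are then always IBD.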


Evolutionary Forces – The Causes of Evolution

For a Hardy-Weinberg equilibrium (HWE) population, the genotype frequencies will remain unchanged from generation to generation. Two questions may arise that concern HWE.

(1) Do such HWE populations exist in nature?

(2) More importantly, if a population had unchanged genotype frequencies over time, it would be in a stationary state. Thus, wild type teosinte would always be teosinte and never change. But what has made teosinte become cultivated maize (see the figure above)?

First of all, no HWE population exists in nature because many evolutionary forces may operate in a population, which cause the genotype frequencies in the population to change.

Secondly, even if a population is at HWE, this equilibrium may be quickly violated because of some particular evolutionary forces.

These so-called evolutionary forces that cause the structure and organization of a population to change include mutation, selection, admixture, division, migration, genetic drift… Next, we will talk about the roles of some of these evolutionary forces in shaping a population.

Mutation

• Mutation is a change in genetic material, including nucleotide substitutions, insertions and deletions, and chromosome rearrangements
• Mutation has different types: forward mutation and reversible mutation

Forward mutation

• Consider a gene A with two alleles A and a, with allele frequencies pA(t) and pa(t) in generation t
• Allele A is mutating to allele a, with the mutation rate per generation denoted by u
• Forward mutation is a process in which the mutating allele is the prevalent wild type allele

With the definition of mutation rate u (a fraction u of A alleles undergo mutation and become a alleles, whereas a fraction 1-u of A alleles escape mutation and remain A), we have the allele frequency in the next generation t+1

pA(t+1) = pA(t) – pA(t)u = (1-u)pA(t).

In general, we have

pA(t+1) = (1-u)pA(t) = (1-u)^2 pA(t-1) = … = (1-u)^(t+1) pA(0).

Assuming that the initial population is nearly fixed for A, i.e., pA(0) ≈ 1, and that t+1 is not too large relative to 1/u, we can approximate the allele frequencies by

pA(t+1) ≈ pA(0) – (t+1)u,

pa(t+1) ≈ pa(0) + (t+1)u.

• The frequency of the mutant a allele increases linearly with time and the slope of the line equals u.
• Because u is small, the linear increase in pa is difficult to detect unless a very large population size is used.
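The quality of the linear approximation is easy to inspect numerically. In this sketch the mutation rate u = 1e-5 and the time spans are hypothetical values chosen for illustration; `generations` plays the role of t+1 in the formulas above.

```python
def forward_mutation_exact(p_A0, u, generations):
    """Exact frequency of A: pA = (1 - u)^generations * pA(0)."""
    return (1 - u) ** generations * p_A0

def forward_mutation_linear(p_A0, u, generations):
    """Linear approximation: pA ~ pA(0) - generations * u."""
    return p_A0 - generations * u

u = 1e-5   # hypothetical mutation rate per generation
for generations in (100, 10_000, 100_000):
    exact = forward_mutation_exact(1.0, u, generations)
    approx = forward_mutation_linear(1.0, u, generations)
    print(generations, round(exact, 6), round(approx, 6))
```

The two curves agree closely while the number of generations is small relative to 1/u and drift apart as it approaches 1/u, which is the condition stated above.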

Reversible mutation

Reversible mutation allows mutation from A to a (at the rate u per generation) and from a to A (at the rate v per generation).

Thus, allele A can have two origins in any generation:

• One being allele A in the previous generation that escaped mutation to allele a
• The second being allele a in the previous generation that mutated back to allele A

The allele frequency in generation t+1 is therefore expressed as

pA(t+1) = (1-u)pA(t) + vpa(t) = (1-u-v)pA(t) + v

pA(t+1) – v/(u+v) = (1-u-v)pA(t) + v – v/(u+v)
                  = (1-u-v)pA(t) + (uv + v^2 – v)/(u+v)
                  = [pA(t) – v/(u+v)](1-u-v)
                  = [pA(t-1) – v/(u+v)](1-u-v)^2
                  = …
                  = [pA(0) – v/(u+v)](1-u-v)^(t+1)

If pA(0) = v/(u+v), we have

pA(1) = pA(2) = … = pA(t+1) = v/(u+v)

We define

pA = v/(u+v)

as an equilibrium frequency (irrespective of the starting frequencies). For realistic values of the mutation rates, it takes a long time to reach this equilibrium.
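A short iteration shows both the closed form and the approach to equilibrium. The rates below (u = 0.02, v = 0.01) are deliberately unrealistic, hypothetical values chosen so that convergence is visible within a few hundred generations; the function names are illustrative.

```python
def iterate_reversible_mutation(p_A0, u, v, generations):
    """Apply pA(t+1) = (1 - u) pA(t) + v pa(t) repeatedly."""
    p_A = p_A0
    for _ in range(generations):
        p_A = (1 - u) * p_A + v * (1 - p_A)
    return p_A

def closed_form(p_A0, u, v, generations):
    """pA = v/(u+v) + [pA(0) - v/(u+v)] (1 - u - v)^generations."""
    equilibrium = v / (u + v)
    return equilibrium + (p_A0 - equilibrium) * (1 - u - v) ** generations

u, v = 0.02, 0.01   # hypothetical, much larger than realistic mutation rates
for generations in (10, 100, 500):
    print(generations, iterate_reversible_mutation(1.0, u, v, generations))
print("equilibrium:", v / (u + v))
```

Because the distance from equilibrium shrinks by the factor (1 - u - v) each generation, rates on the order of 1e-5 or smaller imply a very slow approach.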

Admixture

• Admixture is an evolutionary process in which two or more HWE populations with differing allele frequencies are mixed to produce a new population.

• The consequence of admixture is a deficiency of heterozygous genotypes relative to the frequency expected with HWE for the average allele frequencies

Consider gene A with two alternative alleles A and a

                              AA                     Aa                        aa
Subpopulation 1 (HWE)         pA^2                   2pApa                     pa^2
Subpopulation 2 (HWE)         p'A^2                  2p'Ap'a                   p'a^2

↓ Admixture

Admixed population, mixed population, metapopulation, aggregate population (HWD)
                              (pA^2 + p'A^2)/2       (2pApa + 2p'Ap'a)/2       (pa^2 + p'a^2)/2

↓ Random mating

Fused population, total population (HWE)
                              p̄A^2                   2p̄Ap̄a                     p̄a^2

After admixture, the allele frequencies are changed as

p̄A = (pA + p'A)/2
p̄a = (pa + p'a)/2

We find, for genotype AA,

(pA^2 + p'A^2)/2 (metapopulation) ≥ (pA^2 + p'A^2)/2 – (pA – p'A)^2/4
   = (pA^2 + p'A^2)/2 + 2pAp'A/4 – (pA^2 + p'A^2)/4
   = (pA^2 + p'A^2)/4 + 2pAp'A/4
   = (pA + p'A)^2/4
   = p̄A^2 (HWE)

For genotype aa,

(pa^2 + p'a^2)/2 (metapopulation) ≥ (pa^2 + p'a^2)/2 – (pa – p'a)^2/4
   = (pa^2 + p'a^2)/2 + 2pap'a/4 – (pa^2 + p'a^2)/4
   = (pa^2 + p'a^2)/4 + 2pap'a/4
   = (pa + p'a)^2/4
   = p̄a^2 (HWE)

For genotype Aa,

pApa + p'Ap'a (metapopulation) ≤ pApa + p'Ap'a + (pA – p'A)(p'a – pa)/2
   = pApa + p'Ap'a + (pAp'a + p'Apa – pApa – p'Ap'a)/2
   = (pApa + p'Ap'a + pAp'a + p'Apa)/2
   = (pA + p'A)(pa + p'a)/2
   = 2p̄Ap̄a (HWE)

Page 66: PCB5065 Advanced Genetics Population Genetics and

Discovery 1

The genotype frequencies in the admixed population are not equal to the products of the allele frequencies, so the mixed population is not in HWE.

Discovery 2

Relative to an HWE population, the aggregate population contains too few heterozygous genotypes and too many homozygous genotypes.

Page 67: PCB5065 Advanced Genetics Population Genetics and

Define the variance in allele frequency (in terms of the recessive allele) among the subpopulations by $\sigma^2$.

                     Value           Frequency
Subpopulation 1      $p_a$           $n$
Subpopulation 2      $p'_a$          $n' = n$
Mean                 $\bar{p}_a$

Based on the definition of variance, we have

$$
\begin{aligned}
\sigma^2 &= [(p_a - \bar{p}_a)^2 + (p'_a - \bar{p}_a)^2]/2 \\
&= (p_a^2 + p_a'^2)/2 + \bar{p}_a^2 - p_a\bar{p}_a - p'_a\bar{p}_a \\
&= (p_a^2 + p_a'^2)/2 + \bar{p}_a^2 - 2\bar{p}_a[(p_a + p'_a)/2] \\
&= (p_a^2 + p_a'^2)/2 - \bar{p}_a^2
\end{aligned}
$$

Page 68: PCB5065 Advanced Genetics Population Genetics and

$\sigma^2$ is actually the difference between the homozygous recessive genotype frequency ($R_S$) in the metapopulation (equal to the average genotype frequency among the subpopulations) and the genotype frequency ($R_T$) that would be expected in a total population in HWE, i.e.,

$$\sigma^2 = R_S - R_T \ge 0, \quad \text{so} \quad R_S = R_T + \sigma^2 \ge R_T$$

Page 69: PCB5065 Advanced Genetics Population Genetics and

Discovery 3

The average frequency of homozygous recessive genotypes among a group of subpopulations is always greater than or equal to the frequency of homozygous recessive genotypes that would be expected with random mating, and the excess is numerically equal to the variance in the recessive allele frequency.

The relationship $R_S = R_T + \sigma^2 \ge R_T$ is called Wahlund's principle.

Page 70: PCB5065 Advanced Genetics Population Genetics and

Example: Two subpopulations of gray squirrels 

For the recessive allele, the homozygous recessive genotype frequencies in the two subpopulations are $p_a^2 = 0.16$ and $p_a'^2 = 0$, so that $p_a = 0.4$ and $p'_a = 0$.

The genotype frequency in the metapopulation is $(0.16 + 0)/2 = 0.08$.

The allele frequency in the metapopulation is $(0.4 + 0)/2 = 0.2$.

The frequency of the homozygous recessive genotype in the HWE total population is $0.2^2 = 0.04 < 0.08$.

The variance in allele frequency is $[(0.4 - 0.2)^2 + (0 - 0.2)^2]/2 = 0.04$, which equals the reduction in the frequency of the homozygous recessive genotype ($0.08 - 0.04$).
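Wahlund's principle can be verified numerically for this example. This is a minimal sketch that assumes equally sized subpopulations and takes the recessive allele frequencies as 0.4 and 0 (homozygous recessive genotype frequencies 0.16 and 0); the function name `wahlund` is illustrative.

```python
def wahlund(recessive_freqs):
    """Return (R_S, R_T, variance) for equally sized subpopulations in HWE."""
    n = len(recessive_freqs)
    mean = sum(recessive_freqs) / n
    r_s = sum(q * q for q in recessive_freqs) / n   # average aa frequency among subpopulations
    r_t = mean ** 2                                 # aa frequency after fusion and random mating
    var = sum((q - mean) ** 2 for q in recessive_freqs) / n
    return r_s, r_t, var


r_s, r_t, var = wahlund([0.4, 0.0])
print(r_s, r_t, var)   # R_S - R_T should equal the variance
```

The same function accepts any number of subpopulations, as long as they contribute equally to the mixture.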

Page 71: PCB5065 Advanced Genetics Population Genetics and

Population structure 

Similar to

$$\sigma^2 = R_S - R_T = (p_a^2 + p_a'^2)/2 - \bar{p}_a^2$$

for homozygous recessive genotypes, we have

$$\sigma^2 = D_S - D_T = (p_A^2 + p_A'^2)/2 - \bar{p}_A^2$$

for homozygous dominant genotypes. For heterozygous genotypes, we have

$$H_S - H_T = -2\sigma^2$$

Page 72: PCB5065 Advanced Genetics Population Genetics and

Recall the definition of the inbreeding coefficient

$F = (P^0_{Aa} - P_{Aa})/P^0_{Aa}$, which describes the deficiency of heterozygous genotypes in an inbred population (frequency $P_{Aa}$) relative to a population in HWE (frequency $P^0_{Aa}$).

By analogy, we define

$F_{ST} = (H_T - H_S)/H_T$ as the fixation index in the metapopulation.

Metapopulation ≈ inbred population

Page 73: PCB5065 Advanced Genetics Population Genetics and

Because $H_T - H_S = 2\sigma^2$ and $H_T = 2\bar{p}_A\bar{p}_a$, we can redefine

$$F_{ST} = \frac{\sigma^2}{\bar{p}_A\bar{p}_a}.$$

This is a fundamental relation in population genetics that connects the fixation index in a metapopulation with the variance in allele frequencies among the subpopulations. The fixation index can be interpreted in terms of the inbreeding coefficient. Thus, the genotype frequencies in a metapopulation are expressed as

AA: $\bar{p}_A^2 + \bar{p}_A\bar{p}_aF_{ST} = \bar{p}_A^2(1 - F_{ST}) + \bar{p}_AF_{ST}$

Aa: $2\bar{p}_A\bar{p}_a - 2\bar{p}_A\bar{p}_aF_{ST} = 2\bar{p}_A\bar{p}_a(1 - F_{ST})$

aa: $\bar{p}_a^2 + \bar{p}_A\bar{p}_aF_{ST} = \bar{p}_a^2(1 - F_{ST}) + \bar{p}_aF_{ST}$
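As a sketch of how these expressions fit together, the code below computes the fixation index and the metapopulation genotype frequencies from the recessive allele frequencies of equally sized subpopulations. The function name and the example frequencies (0.4 and 0) are assumptions for illustration.

```python
def fst_genotypes(recessive_freqs):
    """F_ST and metapopulation genotype frequencies for equally sized subpopulations."""
    n = len(recessive_freqs)
    q = sum(recessive_freqs) / n                       # mean frequency of allele a
    p = 1 - q                                          # mean frequency of allele A
    var = sum((x - q) ** 2 for x in recessive_freqs) / n
    f_st = var / (p * q)
    genotypes = {
        "AA": p * p + p * q * f_st,
        "Aa": 2 * p * q * (1 - f_st),
        "aa": q * q + p * q * f_st,
    }
    return f_st, genotypes


f_st, genotypes = fst_genotypes([0.4, 0.0])
print(f_st, genotypes)
```

The three genotype frequencies should sum to one and match the averages of the subpopulation genotype frequencies.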

Page 74: PCB5065 Advanced Genetics Population Genetics and

Remarks

• Even though each subpopulation itself is undergoing random mating and is in HWE, there is inbreeding in the metapopulation composed of the aggregate of subpopulations.

• A metapopulation may be composed of many smaller subpopulations each of which may be in HWE (theory for population structure).

Page 75: PCB5065 Advanced Genetics Population Genetics and

Natural Selection

• Selection is the principal process that results in greater adaptation of organisms to their environment

• Through selection the genotypes that are superior in survival and reproduction increase in frequency in the population

Page 76: PCB5065 Advanced Genetics Population Genetics and

Haploid selection: selection at the gamete level

Two alleles A and a, with initial frequencies $p_A$ and $p_a$

                                   A                         a
Haploid progeny (reproduction)     10 ($p_A = 1/2$)          10 ($p_a = 1/2$)
  ↓ Maturation
Survival (adults)                  9                         6
Viability (absolute fitness)       9/10 = 0.90               6/10 = 0.60
Relative fitness                   $w_A$ = 0.90/0.90 = 1     $w_a$ = 0.60/0.90 = 0.67
Selection coefficient              0                         $s$ = 1 - 0.67 = 0.33
New frequencies                    $p'_A$ = 9/15             $p'_a$ = 6/15
Haploid progeny (reproduction)     12                        8

Page 77: PCB5065 Advanced Genetics Population Genetics and

• Viability or survivorship: the probability of survival, which is also called fitness.

• Fitness has two types: Absolute fitness separately for each genotype and relative fitness (the ability of one genotype to survive relative to another genotype taken as a standard)

• It is impossible to measure absolute fitness because it requires knowing the absolute number of each genotype, whereas relative fitness can be measured by the sampling approach

• Selection coefficient: 1 – relative fitness

Page 78: PCB5065 Advanced Genetics Population Genetics and

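In general, the new frequency for allele A is expressed as

$$p'_A = \frac{p_Aw_A}{p_Aw_A + p_aw_a} = \frac{p_A}{p_A + p_aw_a/w_A} = \frac{p_A}{p_A + p_a(1-s)} = \frac{p_A}{1 - sp_a}$$

and, likewise,

$$p'_a = \frac{p_a(1-s)}{1 - sp_a}$$

In the above example, $p_A = p_a = 1/2$, $w_A = 1$, $w_a = 2/3$ and $s = 1/3$, so that $p'_A = (1/2)/(1 - 1/2 \times 1/3) = 3/5 = 9/15$.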

Page 79: PCB5065 Advanced Genetics Population Genetics and

$$p_A(1) = \frac{p_A(0)}{p_A(0) + (1-s)\,p_a(0)}$$

$$p_A(2) = \frac{p_A(1)}{p_A(1) + (1-s)\,p_a(1)} = \frac{p_A(0)}{p_A(0) + (1-s)^2\,p_a(0)}$$

$$\vdots$$

$$p_A(t) = \frac{p_A(0)}{p_A(0) + (1-s)^t\,p_a(0)}$$

Page 80: PCB5065 Advanced Genetics Population Genetics and

By the method of successive substitutions, we have

$$\frac{p_A(t)}{p_a(t)} = \frac{1}{1-s}\,\frac{p_A(t-1)}{p_a(t-1)} = \dots = \frac{1}{(1-s)^t}\,\frac{p_A(0)}{p_a(0)}$$
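The successive-substitution result can be checked by iterating the one-generation recursion. This is a minimal sketch; the function names are illustrative, and the values $p_A(0) = 1/2$ and $s = 1/3$ are taken from the haploid example above.

```python
def haploid_step(p_A, s):
    """One generation of haploid selection against allele a (w_A = 1, w_a = 1 - s)."""
    return p_A / (1 - s * (1 - p_A))


def odds_after(p0, s, t):
    """Closed-form ratio p_A(t)/p_a(t) after t generations."""
    return (p0 / (1 - p0)) / (1 - s) ** t


s = 1 / 3
first = haploid_step(0.5, s)     # should reproduce p'_A = 9/15 from the example

p = 0.5
for _ in range(10):
    p = haploid_step(p, s)
iterated_odds = p / (1 - p)
print(first, iterated_odds, odds_after(0.5, s, 10))
```

The iterated ratio and the closed-form ratio should agree up to floating-point error.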

Page 81: PCB5065 Advanced Genetics Population Genetics and

Taking the natural logarithm of both sides of the above equation, we have

$$\ln\frac{p_A(t)}{p_a(t)} = \ln\frac{p_A(0)}{p_a(0)} - t\ln(1-s) \approx \ln\frac{p_A(0)}{p_a(0)} + st$$

(for a not-too-large $s$, since $\ln(1-s) \approx -s$)

• If $s$ is not too large, $\ln(p_A/p_a)$ should be linear in time, with a slope equal to the value of $s$.

• This is one approach by which the selection coefficient can be estimated.

Page 82: PCB5065 Advanced Genetics Population Genetics and

Example: E. coli

Generation     $\ln(p_A/p_a)$
0              0.34
5              0.53
10             1.01
20             1.47
25             1.47
30             1.10
35             1.50

Using the linear regression model $\ln[p_A(t)/p_a(t)] = \ln[p_A(0)/p_a(0)] + st$, we estimate

$\ln(p_A/p_a) = 0.52 + 0.0323t$ (Hartl and Dykhuizen 1981), i.e., $s \approx 0.0323$ per generation.
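The regression can be sketched with an ordinary least-squares fit. The data points below are the values as read from the table in this transcript, which is an assumption: the slide text was flattened during extraction, so the fit may differ slightly from the published estimate. The helper `least_squares` is illustrative.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Values as transcribed from the slide (assumed reading of the flattened table)
generations = [0, 5, 10, 20, 25, 30, 35]
log_ratio = [0.34, 0.53, 1.01, 1.47, 1.47, 1.10, 1.50]

intercept, s_hat = least_squares(generations, log_ratio)
print(intercept, s_hat)   # the slope estimates the selection coefficient s
```

The slope of the fitted line is the estimate of $s$, and the intercept estimates $\ln[p_A(0)/p_a(0)]$.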

Page 83: PCB5065 Advanced Genetics Population Genetics and

Diploid selection: selection at the zygote level

Two alleles A and a, with initial frequencies $p_A = 1/2$ and $p_a = 1/2$

                          AA                Aa                        aa
Zygotes                   5                 10                        5
  ↓ Maturation
Survival (adults)         5                 8                         3
Absolute fitness          5/5 = 1           8/10 = 0.8                3/5 = 0.6
Relative fitness          $w_{AA}$ = 1      $w_{Aa}$ = 0.8/1 = 0.80   $w_{aa}$ = 0.6/1 = 0.60
Selection coefficient     0                 $hs$ = 1 - 0.80 = 0.20    $s$ = 1 - 0.60 = 0.40

New frequencies: $p'_A = (2 \times 5 + 8)/[2(5+8+3)] = 18/32$ and $p'_a = (2 \times 3 + 8)/[2(5+8+3)] = 14/32$

Random mating with HWE leads to (for 20 zygotes, rounded to whole individuals)

AA: $P_{AA} = (18/32)^2 \times 20 \approx 6$
Aa: $P_{Aa} = 2(18/32)(14/32) \times 20 \approx 10$
aa: $P_{aa} = (14/32)^2 \times 20 \approx 4$

Page 84: PCB5065 Advanced Genetics Population Genetics and

Define h = hs/s as the degree of dominance of allele a. We have

• h = 0 means that a is recessive to A,

• h = ½ means that the heterozygous fitness is the arithmetic average of the homozygous fitnesses; in this case, the effects of the alleles are said to be additive effects

• h = 1 means that allele a is dominant to allele A.

• It is possible that h < 0 or h > 1.

Page 85: PCB5065 Advanced Genetics Population Genetics and

In general, the allele frequency in the next generation after diploid selection is expressed as

$$p'_A = \frac{p_A^2w_{AA} + p_Ap_aw_{Aa}}{p_A^2w_{AA} + 2p_Ap_aw_{Aa} + p_a^2w_{aa}}$$

where the denominator is the average fitness in the population, symbolized by

$$\bar{w} = p_A^2w_{AA} + 2p_Ap_aw_{Aa} + p_a^2w_{aa}$$

Page 86: PCB5065 Advanced Genetics Population Genetics and

This equation has no general analytical solution for $p_A(t)$, and for this reason it is more useful to calculate the difference

$$\Delta p_A = p'_A - p_A = \frac{p_Ap_a[p_A(w_{AA} - w_{Aa}) + p_a(w_{Aa} - w_{aa})]}{\bar{w}}$$

Page 87: PCB5065 Advanced Genetics Population Genetics and

Example

• In the initial population, $P_{AA} = 0$, $P_{Aa} = 2/3$ and $P_{aa} = 1/3$, so we have $p_A = 1/3$ and $p_a = 2/3$. The measured fitnesses are $w_{AA} = 0$, $w_{Aa} = 0.50$ and $w_{aa} = 1$.

• In the second generation, we expect

$$p'_A = \frac{(1/3)^2 \times 0 + (1/3)(2/3) \times 0.50}{(1/3)^2 \times 0 + 2(1/3)(2/3) \times 0.50 + (2/3)^2 \times 1} = \frac{1}{6}.$$
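The diploid recursion is easy to express as a small function. The sketch below (function name illustrative) reproduces this example and the earlier zygote-level example with fitnesses 1, 0.8 and 0.6.

```python
def diploid_step(p_A, w_AA, w_Aa, w_aa):
    """Frequency of allele A after one generation of viability selection under random mating."""
    p_a = 1 - p_A
    w_bar = p_A**2 * w_AA + 2 * p_A * p_a * w_Aa + p_a**2 * w_aa   # mean fitness
    return (p_A**2 * w_AA + p_A * p_a * w_Aa) / w_bar


lethal_example = diploid_step(1 / 3, 0.0, 0.5, 1.0)   # example on this slide
zygote_example = diploid_step(0.5, 1.0, 0.8, 0.6)     # earlier 5 AA / 10 Aa / 5 aa example
print(lethal_example, zygote_example)
```

Calling the function repeatedly on its own output traces the allele frequency over successive generations.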

Page 88: PCB5065 Advanced Genetics Population Genetics and

Time required for changes in gene frequency

With the selection coefficient ($s$), the degree of dominance ($h$) and $\bar{w} \approx 1$ (if selection is weak), the difference in allele frequency can be expressed as

$$\Delta p_A = p_Ap_as[p_Ah + p_a(1-h)].$$

Page 89: PCB5065 Advanced Genetics Population Genetics and

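The time $t$ required for the allele frequency of A to change from $p_A(0)$ to $p_A(t)$ can be determined in each of the following three special cases:

1. Allele A is a favored dominant, in which case $h = 0$ and $\Delta p_A = p_Ap_a^2s$, i.e.,

$$\frac{dp_A}{dt} = p_A(t)\,p_a^2(t)\,s,$$

whose integral is

$$\ln\frac{p_A(t)}{p_a(t)} + \frac{1}{p_a(t)} = \ln\frac{p_A(0)}{p_a(0)} + \frac{1}{p_a(0)} + st.$$

In the special case $p_a(0) \approx p_a(t) \approx 1$ (allele A remains rare), the $1/p_a$ terms cancel and we have

$$t \approx \frac{1}{s}\left[\ln\frac{p_A(t)}{p_a(t)} - \ln\frac{p_A(0)}{p_a(0)}\right].$$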

Page 90: PCB5065 Advanced Genetics Population Genetics and

2. Allele A is favored and the alleles are additive, in which case $h = 1/2$ and $\Delta p_A = p_Ap_as/2$, i.e.,

$$\frac{dp_A}{dt} = \frac{p_Ap_as}{2},$$

whose integral is

$$\ln\frac{p_A(t)}{p_a(t)} = \ln\frac{p_A(0)}{p_a(0)} + \frac{s}{2}\,t.$$

Solving for $t$, we have

$$t = \frac{2}{s}\left[\ln\frac{p_A(t)}{p_a(t)} - \ln\frac{p_A(0)}{p_a(0)}\right].$$

Page 91: PCB5065 Advanced Genetics Population Genetics and

3. Allele A is a favored recessive, in which case $h = 1$ and $\Delta p_A = p_A^2p_as$, i.e.,

$$\frac{dp_A}{dt} = p_A^2\,p_a\,s,$$

whose integral is

$$\ln\frac{p_A(t)}{p_a(t)} - \frac{1}{p_A(t)} = \ln\frac{p_A(0)}{p_a(0)} - \frac{1}{p_A(0)} + st.$$
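The three integrals can be solved for $t$ and compared directly. The following is a minimal sketch under the weak-selection approximation ($\bar{w} \approx 1$); the function names and the numerical values of the starting frequency, target frequency and $s$ are illustrative assumptions.

```python
import math


def log_odds(p):
    """ln(p_A / p_a) for allele frequency p = p_A."""
    return math.log(p / (1 - p))


def time_dominant(p0, pt, s):
    """Generations needed when A is a favored dominant (h = 0)."""
    return (log_odds(pt) + 1 / (1 - pt) - log_odds(p0) - 1 / (1 - p0)) / s


def time_additive(p0, pt, s):
    """Generations needed when the alleles are additive (h = 1/2)."""
    return 2 * (log_odds(pt) - log_odds(p0)) / s


def time_recessive(p0, pt, s):
    """Generations needed when A is a favored recessive (h = 1)."""
    return (log_odds(pt) - 1 / pt - log_odds(p0) + 1 / p0) / s


# Assumed illustrative values: s = 0.02, rare favored allele rising to 0.5
t_dom = time_dominant(0.01, 0.5, 0.02)
t_add = time_additive(0.01, 0.5, 0.02)
t_rec = time_recessive(0.01, 0.5, 0.02)
print(t_dom, t_add, t_rec)
```

For a rare favored allele, the recessive case takes far longer than the dominant or additive cases, because selection cannot act on the allele while it is hidden in heterozygotes.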

Page 92: PCB5065 Advanced Genetics Population Genetics and

Implication

If selection is operating on a rare harmful recessive allele (say a), what is the consequence?

• This is the case in which allele A is a favored dominant, $\Delta p_A = p_Ap_a^2s$ with $p_a \approx 0$ and $p_a^2 \approx 0$.

• Even if the selection coefficient $s$ is very large, $p_A$ still changes little.

• In other words, the change in allele frequency of a rare harmful recessive is slow whatever the value of the selection coefficient.

• In humans, the forced sterilization of rare homozygous recessive individuals is therefore not genetically effective, quite apart from being morally unacceptable.

Page 93: PCB5065 Advanced Genetics Population Genetics and

Other evolutionary forces

• Migration: The movement of individuals among subpopulations

• Random genetic drift: Fluctuations in allele frequency that happen by chance, particularly in small populations, as a result of random sampling among gametes

• Mutation-selection balance: Selection and mutation affect a population at the same time

Page 94: PCB5065 Advanced Genetics Population Genetics and

Overviews

• HWE (estimate and test)
• LD (test)
• Inbreeding coefficient (evolutionary significance)
• IBD
• Evolutionary forces
  - Mutation
  - Admixture
  - Population structure
  - Selection

Page 95: PCB5065 Advanced Genetics Population Genetics and

Discussion paper

Thornsberry, J.M., M.M. Goodman, J. Doebley, S. Kresovich, D. Nielsen, and E. S. Buckler, IV. 2001. Dwarf8 polymorphisms associate with variation in flowering time. Nature Genetics 28: 286-289.

Pritchard, J. K. 2001 Deconstructing maize population structure. Nature Genetics 28: 203-204.

Page 96: PCB5065 Advanced Genetics Population Genetics and

Quantitative genetics

Many traits that are important in agriculture, biology and biomedicine are continuous in their phenotypes. For example,

• Crop yield
• Stemwood volume
• Plant disease resistance
• Body weight in animals
• Fat content of meat
• Time to first flower
• IQ
• Blood pressure

Page 97: PCB5065 Advanced Genetics Population Genetics and

The image on this slide shows the variation in flower diameter, number of flower parts and flower color in Gaillardia pulchella (McClean 1997). Each trait is controlled by a number of genes that interact with one another and with an array of environmental factors.

Page 98: PCB5065 Advanced Genetics Population Genetics and

Number of Genes Number of Genotypes

1 3

2 9

5 243

10 59,049

Page 99: PCB5065 Advanced Genetics Population Genetics and

Consider two genes, A with two alleles A and a, and B with two alleles B and b.

- Each of the alleles is assigned a metric value
- We give the A allele 4 units and the a allele 2 units
- At the other locus, the B allele is given 2 units and the b allele 1 unit

Genotype    Ratio    Metric value
AABB        1        12
AABb        2        11
AAbb        1        10
AaBB        2        10
AaBb        4        9
Aabb        2        8
aaBB        1        8
aaBb        2        7
aabb        1        6
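The table can be reproduced by enumerating all gamete combinations. This sketch assumes the genotypes come from a double heterozygote cross (AaBb × AaBb) with independent assortment, which is what the 1:2:1:2:4:2:1:2:1 ratio implies; the names `VALUE` and `f2_distribution` are illustrative.

```python
from collections import Counter
from itertools import product

# Metric value of each allele, as assigned in the text
VALUE = {"A": 4, "a": 2, "B": 2, "b": 1}


def f2_distribution():
    """Count how many of the 16 gamete combinations give each total metric value."""
    gametes = [x + y for x, y in product("Aa", "Bb")]      # AB, Ab, aB, ab
    counts = Counter()
    for g1, g2 in product(gametes, repeat=2):
        counts[sum(VALUE[allele] for allele in g1 + g2)] += 1
    return dict(sorted(counts.items()))


distribution = f2_distribution()
print(distribution)
```

The resulting counts are symmetric around the central value of 9, which is the first hint of the bell-shaped distribution that emerges as more genes are added.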

Page 100: PCB5065 Advanced Genetics Population Genetics and

A graphical format is used to present the above results:

Page 101: PCB5065 Advanced Genetics Population Genetics and

The normal distribution of a quantitative trait may be due to

• Many genes
• Environmental effects

The traditional view: polygenes, each with a small effect and each sensitive to the environment

The new view: a few major genes and many polygenes (oligogenic control), interacting with the environment

Page 102: PCB5065 Advanced Genetics Population Genetics and

Traditional quantitative genetics research: Variance component partitioning

• The phenotypic variance of a quantitative trait can be partitioned into genetic and environmental variance components.

• To understand the inheritance of the trait, we need to estimate the relative contribution of these two components.

• We define the proportion of the genetic variance to the total phenotypic variance as the heritability ($H^2$).

- If $H^2 = 1.0$, then the trait is 100% controlled by genetics
- If $H^2 = 0$, then the trait is purely affected by environmental factors.

Page 103: PCB5065 Advanced Genetics Population Genetics and

• Fisher (1918) proposed a theory for partitioning genetic variance into additive, dominant and epistatic components;

• Cockerham (1954) explained these genetic variance components in terms of experimental variances (from ANOVA), which makes it possible to estimate additive and dominant components (but not the epistatic component);

• I proposed a clonal design to estimate additive, dominant and part of the epistatic variance components: Wu, R. 1996. Detecting epistatic genetic variance with a clonally replicated design: Models for low- vs. high-order nonallelic interaction. Theoretical and Applied Genetics 93: 102-109.

Page 104: PCB5065 Advanced Genetics Population Genetics and

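Genetic Parameters: Means and (Co)variances

One-gene model

Genotype                             aa                         Aa                          AA
Genotypic value                      $G_0$                      $G_1$                       $G_2$
Net genotypic value                  $-a$                       $d$                         $a$
Environmental deviation              $E_0$                      $E_1$                       $E_2$
Phenotype or phenotypic value        $Y_0 = G_0 + E_0$          $Y_1 = G_1 + E_1$           $Y_2 = G_2 + E_2$
Genotype frequency                   $P_0$                      $P_1$                       $P_2$
  at HWE                             $q^2$                      $2pq$                       $p^2$
Deviation from population mean       $-a - \mu$                 $d - \mu$                   $a - \mu$
                                     $= -2p[a+(q-p)d] - 2p^2d$  $= (q-p)[a+(q-p)d] + 2pqd$  $= 2q[a+(q-p)d] - 2q^2d$
  letting $\alpha = a + (q-p)d$      $= -2p\alpha - 2p^2d$      $= (q-p)\alpha + 2pqd$      $= 2q\alpha - 2q^2d$
Breeding value                       $-2p\alpha$                $(q-p)\alpha$               $2q\alpha$
Dominant deviation                   $-2p^2d$                   $2pqd$                      $-2q^2d$

The net genotypic values are measured from the origin $(G_0 + G_2)/2$, the midpoint between the two homozygotes (value 0 on the scale); $a$ = additive genotypic value, $d$ = dominant genotypic value, $p$ = frequency of allele A, $q$ = frequency of allele a, and $\mu$ = population mean.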

Page 105: PCB5065 Advanced Genetics Population Genetics and

Population mean: $\mu = q^2(-a) + 2pqd + p^2a = (p-q)a + 2pqd$

Genetic variance:

$$
\begin{aligned}
\sigma_g^2 &= q^2(-2p\alpha - 2p^2d)^2 + 2pq[(q-p)\alpha + 2pqd]^2 + p^2(2q\alpha - 2q^2d)^2 \\
&= 2pq\alpha^2 + (2pqd)^2 \\
&= \sigma_a^2 \ (\text{or } V_A) + \sigma_d^2 \ (\text{or } V_D)
\end{aligned}
$$

$\sigma_a^2$ is the additive genetic variance, which depends on both $a$ and $d$; $\sigma_d^2$ is the dominant genetic variance, which depends only on $d$.

Phenotypic variance: $\sigma_P^2 = q^2Y_0^2 + 2pqY_1^2 + p^2Y_2^2 - (q^2Y_0 + 2pqY_1 + p^2Y_2)^2$

Define

$H^2 = \sigma_g^2/\sigma_P^2$ as the broad-sense heritability

$h^2 = \sigma_a^2/\sigma_P^2$ as the narrow-sense heritability

These two heritabilities are important in understanding the relative contribution of genetic and environmental factors to the overall phenotypic variance.
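The partition of the genetic variance into additive and dominant parts can be checked numerically against the variance computed directly from the genotypic values. This is a minimal sketch; the function name and the parameter values ($p = 0.3$, $a = 2$, $d = 1$) are assumptions chosen only for illustration.

```python
def one_gene_parameters(p, a, d):
    """Mean, average effect and variance components for one gene in HWE."""
    q = 1 - p
    alpha = a + (q - p) * d                    # average effect of allele substitution
    mean = (p - q) * a + 2 * p * q * d         # population mean
    v_a = 2 * p * q * alpha ** 2               # additive genetic variance
    v_d = (2 * p * q * d) ** 2                 # dominant genetic variance
    # Direct calculation of the total genetic variance as a cross-check
    freqs = (q * q, 2 * p * q, p * p)          # aa, Aa, AA
    values = (-a, d, a)
    v_g = sum(f * (g - mean) ** 2 for f, g in zip(freqs, values))
    return {"alpha": alpha, "mean": mean, "VA": v_a, "VD": v_d, "VG": v_g}


params = one_gene_parameters(p=0.3, a=2.0, d=1.0)
print(params)   # VA + VD should equal VG
```

Dividing `VA` or `VG` by a phenotypic variance would give the narrow-sense or broad-sense heritability, respectively.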

Page 106: PCB5065 Advanced Genetics Population Genetics and

What is $\alpha = a + (q-p)d$?

It is the average effect of substituting one allele (say A) with the other (a).

The event A → a contains two possibilities:

                 From Aa to aa      From AA to Aa
Frequency        $q$                $p$
Value change     $d - (-a)$         $a - d$

$\alpha = q[d - (-a)] + p(a - d) = a + (q-p)d$

Page 107: PCB5065 Advanced Genetics Population Genetics and

Midparent-offspring correlation

____________________________________________________________________________________

Genotype       Freq. of        Midparent        Progeny                     Mean value
of parents     matings         value            AA       Aa       aa        of progeny
                                                ($a$)    ($d$)    ($-a$)
____________________________________________________________________________________

AA × AA        $p^4$           $a$              1        -        -         $a$
AA × Aa        $4p^3q$         ½($a+d$)         ½        ½        -         ½($a+d$)
AA × aa        $2p^2q^2$       0                -        1        -         $d$
Aa × Aa        $4p^2q^2$       $d$              ¼        ½        ¼         ½$d$
Aa × aa        $4pq^3$         ½($-a+d$)        -        ½        ½         ½($-a+d$)
aa × aa        $q^4$           $-a$             -        -        1         $-a$
____________________________________________________________________________________

Page 108: PCB5065 Advanced Genetics Population Genetics and

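Covariance between midparent and offspring:

$$
\begin{aligned}
\mathrm{Cov}(O\bar{P}) &= E(O\bar{P}) - E(O)E(\bar{P}) \\
&= p^4 \cdot a \cdot a + 4p^3q \cdot \tfrac{1}{2}(a+d) \cdot \tfrac{1}{2}(a+d) + \dots + q^4(-a)(-a) - [(p-q)a + 2pqd]^2 \\
&= pq\alpha^2 \\
&= \tfrac{1}{2}\sigma_a^2
\end{aligned}
$$

The regression of offspring on midparent values is

$$b = \frac{\mathrm{Cov}(O\bar{P})}{\sigma^2(\bar{P})} = \frac{\tfrac{1}{2}\sigma_a^2}{\tfrac{1}{2}\sigma_P^2} = \frac{\sigma_a^2}{\sigma_P^2} = h^2$$

where $\sigma^2(\bar{P}) = \tfrac{1}{2}\sigma_P^2$ is the variance of the midparent value.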
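The covariance can be verified by summing over the six mating types in the table. The following is a sketch under the table's assumptions (random mating, HWE, genotypic values only); the function name and the numerical values of $p$, $a$ and $d$ are illustrative.

```python
def midparent_offspring_cov(p, a, d):
    """Cov(offspring, midparent) computed directly from the mating table."""
    q = 1 - p
    # (frequency of mating, midparent value, mean value of progeny)
    table = [
        (p**4,             a,            a),             # AA x AA
        (4 * p**3 * q,     (a + d) / 2,  (a + d) / 2),   # AA x Aa
        (2 * p**2 * q**2,  0.0,          d),             # AA x aa
        (4 * p**2 * q**2,  d,            d / 2),         # Aa x Aa
        (4 * p * q**3,     (-a + d) / 2, (-a + d) / 2),  # Aa x aa
        (q**4,             -a,           -a),            # aa x aa
    ]
    e_op = sum(f * mid * off for f, mid, off in table)
    e_p = sum(f * mid for f, mid, off in table)
    e_o = sum(f * off for f, mid, off in table)
    return e_op - e_o * e_p


p, a, d = 0.3, 2.0, 1.0                       # assumed illustrative values
q = 1 - p
alpha = a + (q - p) * d
cov = midparent_offspring_cov(p, a, d)
print(cov, p * q * alpha**2)                  # both should equal half the additive variance
```

Because the dominance terms cancel in the sum, the result depends on $d$ only through the average effect $\alpha$.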

Page 109: PCB5065 Advanced Genetics Population Genetics and

IMPORTANT

The regression of offspring on midparent values can be used to measure the heritability!

This is a fundamental contribution by R. A. Fisher.

Page 110: PCB5065 Advanced Genetics Population Genetics and

You can derive other relationships

Degree of relationship          Covariance
____________________________________________________

Offspring and one parent        $\mathrm{Cov}(OP) = \sigma_a^2/2$
Half siblings                   $\mathrm{Cov}(HS) = \sigma_a^2/4$
Full siblings                   $\mathrm{Cov}(FS) = \sigma_a^2/2 + \sigma_d^2/4$
Monozygotic twins               $\mathrm{Cov}(MT) = \sigma_a^2 + \sigma_d^2$
Nephew and uncle                $\mathrm{Cov}(NU) = \sigma_a^2/4$
First cousins                   $\mathrm{Cov}(FC) = \sigma_a^2/8$
Double first cousins            $\mathrm{Cov}(DFC) = \sigma_a^2/4 + \sigma_d^2/16$
Offspring and midparent         $\mathrm{Cov}(O\bar{P}) = \sigma_a^2/2$
____________________________________________________

Page 111: PCB5065 Advanced Genetics Population Genetics and

Cockerham’s experimental and mating designs

• By estimating the covariances between relatives, we can estimate the additive (or mixed additive and dominant) variance and, therefore, the heritability.

• Next, I will introduce mating and experimental designs used to estimate the covariances between relatives.

Page 112: PCB5065 Advanced Genetics Population Genetics and

Mating design

• Mating design is used to generate genetic pedigrees, genetic information and materials that can be used in a breeding program

• Mating design provides genetic materials, whereas experimental design is utilized to obtain and analyze the data from these materials

Page 113: PCB5065 Advanced Genetics Population Genetics and

Objectives of mating designs

1) Provide information for evaluating parents

2) Provide estimates of genetic parameters

3) Provide estimates of genetic gains

4) Provide a base population for selection

Page 114: PCB5065 Advanced Genetics Population Genetics and

Commonly used mating designs

1) Open-pollinated
2) Polycross
3) Single-pair mating
4) Nested mating
5) Factorial mating & tester design
6) Diallel mating (full, half, partial & disconnected)

Page 115: PCB5065 Advanced Genetics Population Genetics and

Nested mating (NC Design I)

Each male parent is mated to a different subset of female parents

Page 116: PCB5065 Advanced Genetics Population Genetics and

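$\mathrm{Cov}(HS_M) = \tfrac{1}{4}V_A$

$$
\begin{aligned}
V(\text{female/male}) &= \mathrm{Cov}(FS) - \mathrm{Cov}(HS_M) \\
&= \tfrac{1}{2}V_A + \tfrac{1}{4}V_D - \tfrac{1}{4}V_A \\
&= \tfrac{1}{4}V_A + \tfrac{1}{4}V_D
\end{aligned}
$$

- Provides information for parents and full-sib families
- Provides estimates of both additive and dominance effects
- Provides estimates of genetic gains from both $V_A$ and $V_D$
- Not efficient for selection
- Low cost for controlled mating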

Page 117: PCB5065 Advanced Genetics Population Genetics and

Example: Data structure for NC Design I

Sample    Male    Female    Full-sib family    Individual    Phenotype
1         1       A         1                  1             y1A1
2         1       A         1                  2             y1A2
3         1       B         2                  1             y1B1
4         1       B         2                  2             y1B2
5         1       C         3                  1             y1C1
6         1       C         3                  2             y1C2
7         2       D         4                  1             y2D1
8         2       D         4                  2             y2D2
9         2       E         5                  1             y2E1
10        2       E         5                  2             y2E2
11        2       F         6                  1             y2F1
12        2       F         6                  2             y2F2
13        3       G         7                  1             y3G1
14        3       G         7                  2             y3G2
15        3       H         8                  1             y3H1
16        3       H         8                  2             y3H2
17        3       I         9                  1             y3I1
18        3       I         9                  2             y3I2

Page 118: PCB5065 Advanced Genetics Population Genetics and

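Estimates by statistical software

$V_{Total} = 40$

$V_{FS} = \mathrm{Cov}(FS) = 10$

$V_M = \mathrm{Cov}(HS_M) = 4$

$V_E = V_{Total} - V_{FS} = 40 - 10 = 30$

$V(\text{female/male}) = \mathrm{Cov}(FS) - \mathrm{Cov}(HS_M) = 10 - 4 = 6$

$V_A = 4\,\mathrm{Cov}(HS_M) = 4 \times 4 = 16$, so $h^2 = 16/40 = 0.40$

$V(\text{female/male}) = \tfrac{1}{4}V_A + \tfrac{1}{4}V_D = 4 + \tfrac{1}{4}V_D = 6$

$V_D = 8$, $V_G = V_A + V_D = 16 + 8 = 24$, so $H^2 = 24/40 = 0.60$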
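The arithmetic of this example can be written as a short function. This is a sketch of the NC Design I relationships stated above ($\mathrm{Cov}(HS_M) = \tfrac{1}{4}V_A$ and $V(\text{female/male}) = \tfrac{1}{4}V_A + \tfrac{1}{4}V_D$); the function name and argument names are illustrative, and the inputs are the estimates quoted in the example.

```python
def nc_design_one(v_total, cov_fs, cov_hs_male):
    """Variance components and heritabilities from NC Design I covariance estimates."""
    v_a = 4 * cov_hs_male                         # Cov(HS_M) = VA / 4
    v_female_in_male = cov_fs - cov_hs_male       # = VA / 4 + VD / 4
    v_d = 4 * v_female_in_male - v_a
    return {
        "VA": v_a,
        "VD": v_d,
        "h2": v_a / v_total,                      # narrow-sense heritability
        "H2": (v_a + v_d) / v_total,              # broad-sense heritability
    }


estimates = nc_design_one(v_total=40, cov_fs=10, cov_hs_male=4)
print(estimates)
```

With the example inputs, the additive variance is twice the dominant variance, and the two heritabilities differ by exactly the dominant share of the total variance.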

Page 119: PCB5065 Advanced Genetics Population Genetics and

Factorial mating (NC Design II)

Each member of a group of males is mated to each member of a group of females

Page 120: PCB5065 Advanced Genetics Population Genetics and

$\mathrm{Cov}(HS_M) = \tfrac{1}{4}V_A$

$\mathrm{Cov}(HS_F) = \tfrac{1}{4}V_A$

$V(\text{female} \times \text{male}) = \mathrm{Cov}(FS) - \mathrm{Cov}(HS_M) - \mathrm{Cov}(HS_F) = \tfrac{1}{4}V_D$

- Provides good information for parents and full-sib families
- Provides estimates of both additive and dominance effects
- Provides estimates of genetic gains from both $V_A$ and $V_D$
- Limited selection intensity
- High cost

Page 121: PCB5065 Advanced Genetics Population Genetics and

Tester mating design (Factorial)

Each parent in a population is mated to each member of the testers that are chosen for a particular reason

Page 122: PCB5065 Advanced Genetics Population Genetics and

$\mathrm{Cov}(HS_M) = \tfrac{1}{4}V_A$

$\mathrm{Cov}(HS_F) = \tfrac{1}{4}V_A$

$V(\text{female} \times \text{male}) = \mathrm{Cov}(FS) - \mathrm{Cov}(HS_M) - \mathrm{Cov}(HS_F) = \tfrac{1}{4}V_D$

- Provides good information for parents and full-sib families
- Provides estimates of both additive and dominance effects
- Provides estimates of genetic gains from both $V_A$ and $V_D$
- Limited selection intensity
- High cost

Page 123: PCB5065 Advanced Genetics Population Genetics and

Diallel mating design

Full diallel – each parent is mated with every other parent in the population, including selfs and reciprocals:

 

Page 124: PCB5065 Advanced Genetics Population Genetics and

Half diallel – each parent is mated with every other parent in the population, excluding selfs and reciprocals:

Page 125: PCB5065 Advanced Genetics Population Genetics and

Partial Diallel – selected subsets of full diallels:

 

Page 126: PCB5065 Advanced Genetics Population Genetics and

Disconnected half diallel – selected subsets of full diallels:

Page 127: PCB5065 Advanced Genetics Population Genetics and

Diallel analysis 

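$\mathrm{Cov}(HS) = \tfrac{1}{4}V_A$

$\mathrm{Cov}(FS) = \tfrac{1}{2}V_A + \tfrac{1}{4}V_D$

$V(\text{SCA}) = \mathrm{Cov}(FS) - 2\,\mathrm{Cov}(HS) = \tfrac{1}{4}V_D$, where SCA denotes the specific combining ability of a cross

- Provides good evaluation of parents and full-sib families
- Provides estimates of both additive and dominance effects
- Provides estimates of genetic gains from both $V_A$ and $V_D$
- High cost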

Page 128: PCB5065 Advanced Genetics Population Genetics and

Genomic Imprinting or parent-of-origin effect

The same allele is expressed differently, depending on its parental origin

Consider a gene A with two alleles A (with frequency $p$) and a (with frequency $q$)

Genotype    Frequency    Value
AA          $p^2$        $a$
Aa          $pq$         $d + i$
aA          $qp$         $d - i$
aa          $q^2$        $-a$

Mean: $a(p-q) + 2pqd$

Average effect of the substitution A → a:

No imprinting: $\alpha = a + d(q-p)$

Imprinting: $\alpha_M = a - i + d(q-p)$ and $\alpha_P = a + i + d(q-p)$

Genetic variance:

No imprinting: $\sigma_g^2 = 2pq\alpha^2 + (2pqd)^2$

Imprinting: $\sigma_{gi}^2 = 2pq\alpha^2 + (2pqd)^2 + 2pqi^2$

Imprinting leads to increased genetic variance for a quantitative trait and, therefore, is evolutionarily favorable.
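The extra term $2pqi^2$ can be confirmed by computing the variance directly from the four ordered genotypes. This is a minimal sketch; the function name and the parameter values are assumptions for illustration only.

```python
def imprinting_variance(p, a, d, i):
    """Return (variance computed directly, variance from the formula) with imprinting effect i."""
    q = 1 - p
    freqs = (p * p, p * q, q * p, q * q)          # AA, Aa, aA, aa
    values = (a, d + i, d - i, -a)
    mean = sum(f * g for f, g in zip(freqs, values))
    direct = sum(f * (g - mean) ** 2 for f, g in zip(freqs, values))
    alpha = a + d * (q - p)
    formula = 2 * p * q * alpha**2 + (2 * p * q * d) ** 2 + 2 * p * q * i**2
    return direct, formula


direct, formula = imprinting_variance(p=0.3, a=2.0, d=1.0, i=0.5)
no_imprint, _ = imprinting_variance(p=0.3, a=2.0, d=1.0, i=0.0)
print(direct, formula, direct - no_imprint)       # the difference should equal 2pq * i**2
```

Setting `i = 0` recovers the variance without imprinting, so the difference between the two calls isolates the imprinting contribution.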

Page 129: PCB5065 Advanced Genetics Population Genetics and

Genomic Imprinting

The callipygous animals 1 and 3 compared to normal animals 2 and 4 (Cockett et al. Science 273: 236-238, 1996)

Page 130: PCB5065 Advanced Genetics Population Genetics and

We have presented a statistical framework for genome-wide scans for imprinted loci

Cui, Y. H., W. Zhao, J. M. Cheverud and R. L. Wu, Genetics

Page 131: PCB5065 Advanced Genetics Population Genetics and
Page 132: PCB5065 Advanced Genetics Population Genetics and
Page 133: PCB5065 Advanced Genetics Population Genetics and
Page 134: PCB5065 Advanced Genetics Population Genetics and

Predicting Response to Selection

Page 135: PCB5065 Advanced Genetics Population Genetics and
Page 136: PCB5065 Advanced Genetics Population Genetics and

Population Mean, Xp - phenotypic mean of the animals or plants of interest and expressed in measurable units.

Selection Mean, Xs - phenotypic mean of those animals or plants chosen to be parents for the next generation and expressed in measurable units.

Selection Differential, SD - difference between the phenotypic mean of the selected parents and the phenotypic mean of the entire population (SD = Xs - Xp).

Page 137: PCB5065 Advanced Genetics Population Genetics and

Genetic Gain =

the amount by which the phenotypic mean in the next generation is changed by selection.

- that change can be + or -

Page 138: PCB5065 Advanced Genetics Population Genetics and

Selection Differential

G = h2 SD

Page 139: PCB5065 Advanced Genetics Population Genetics and

How to Calculate Genetic Gain

$M_2 = M + h^2(M_1 - M)$

$M_2$ = resulting mean phenotype
$M$ = mean of parental population
$M_1$ = mean of selected population
$h^2$ = heritability of the trait

$M_2 - M = h^2(M_1 - M)$, i.e., $G = h^2\,SD = (SD/\sigma_p)\,h^2\sigma_p = i\,h^2\sigma_p$

$i$ = selection intensity
$h^2$ = narrow-sense heritability
$\sigma_p$ = phenotypic standard deviation

Page 140: PCB5065 Advanced Genetics Population Genetics and

Factors that influence

the Genetic Gain

•Magnitude of selection differential

•Selection intensity

•Heritability (the narrow-sense $h^2$ in the formula for G)

•Phenotypic variation

Page 141: PCB5065 Advanced Genetics Population Genetics and

Knowing the Selection Differential, and the response to selection, an estimate of the trait’s heritability can be calculated

G / SD = Realized Heritability

Page 142: PCB5065 Advanced Genetics Population Genetics and

Realized heritability can also be calculated from

$M_2 = M + h^2(M_1 - M)$

which, rearranged, gives

$$h^2 = \frac{M_2 - M}{M_1 - M}$$

Page 143: PCB5065 Advanced Genetics Population Genetics and

• Maximizing Genetic Gain

• Examples

Page 144: PCB5065 Advanced Genetics Population Genetics and

108.4   104     114.6   100.1   116.8   113.1   118.1   110.1
126.7   111     123.7   112.3   107.6   114.4   107.6   108.9
93.2    116.5   103.4   113.9   115.1   104.6   110.1   103
105.7   110.4   111     108.7   103.3   110.3   105     108.4
107     111.1   107.9   109.7   110.3   98.2    115.4   96.8
117.3   118.6   109.2   111.2   105.3   111.6   105.5   112

N = 48,

Population Mean = 109.7

Page 145: PCB5065 Advanced Genetics Population Genetics and


Goal: Improve the Mean

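Select the individuals with the highest values (marked in red on the original slide),

N = 6,

Mean of Selected = 119.5

SD = 119.5 - 109.7 = 9.8

G = $h^2$ × SD = 0.7 × 9.8 = 6.86 (taking $h^2$ = 0.7)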
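The calculation above, and the realized heritability obtained by rearranging it, can be sketched as two small functions. The function names are illustrative; the numbers are those of the example (population mean 109.7, selected mean 119.5, heritability 0.7).

```python
def genetic_gain(population_mean, selected_mean, h2):
    """Expected gain G = h2 * SD, with SD = selected mean - population mean."""
    return h2 * (selected_mean - population_mean)


def realized_heritability(population_mean, selected_mean, offspring_mean):
    """Heritability implied by an observed response: (M2 - M) / (M1 - M)."""
    return (offspring_mean - population_mean) / (selected_mean - population_mean)


gain = genetic_gain(109.7, 119.5, 0.7)
expected_offspring_mean = 109.7 + gain
h2_back = realized_heritability(109.7, 119.5, expected_offspring_mean)
print(gain, expected_offspring_mean, h2_back)
```

Selecting the lowest-scoring individuals instead gives a negative selection differential and therefore a negative gain, which is the situation in the "reduce the mean" case.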

Page 146: PCB5065 Advanced Genetics Population Genetics and


Goal: Reduce the Mean

Select the individuals with the lowest values (marked in blue on the original slide), N = 8,

Mean of Selected = 100.4

Page 147: PCB5065 Advanced Genetics Population Genetics and

Nature 432, 630 - 635 (02 December 2004)

The role of barren stalk1 in the architecture of maize

ANDREA GALLAVOTTI1,2, QIONG ZHAO3, JUNKO KYOZUKA4, ROBERT B. MEELEY5, MATTHEW K. RITTER1,*, JOHN F. DOEBLEY3, M. ENRICO PÈ2 & ROBERT J. SCHMIDT1

1 Section of Cell and Developmental Biology, University of California, San Diego, La Jolla, California 92093-0116, USA
2 Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano, 20133 Milan, Italy
3 Laboratory of Genetics, University of Wisconsin, Madison, Wisconsin 53706, USA
4 Graduate School of Agriculture and Life Science, The University of Tokyo, Tokyo 113-8657, Japan
5 Crop Genetics Research, Pioneer-A DuPont Company, Johnston, Iowa 50131, USA
* Present address: Biological Sciences Department, California Polytechnic State University, San Luis Obispo, California 93407, USA

Page 148: PCB5065 Advanced Genetics Population Genetics and

Mapping Quantitative Trait Loci (QTL) in the F2 hybrids between maize and teosinte

Page 149: PCB5065 Advanced Genetics Population Genetics and

Maize

Teosinte

tb-1/tb-1 mutant maize

Page 150: PCB5065 Advanced Genetics Population Genetics and

Effects of ba1 mutations on maize development

Mutant       Wild type
No tassel    Tassel

Page 151: PCB5065 Advanced Genetics Population Genetics and

Data format for a backcross

Sample    Height (cm, y)    Marker 1    Marker 2    QTL
1         184               Mm (1)      Nn (1)      ?
2         185               Mm (1)      Nn (1)      ?
3         180               Mm (1)      Nn (1)      ?
4         182               Mm (1)      nn (0)      ?
5         167               mm (0)      nn (0)      ?
6         169               mm (0)      nn (0)      ?
7         165               mm (0)      nn (0)      ?
8         166               mm (0)      Nn (1)      ?

Page 152: PCB5065 Advanced Genetics Population Genetics and

Heights classified by markers (say marker 1)

Marker group    Sample size    Sample mean      Sample variance
Mm              $n_1 = 4$      $m_1 = 182.75$   $s_1^2 =$
mm              $n_0 = 4$      $m_0 = 166.75$   $s_0^2 =$

Page 153: PCB5065 Advanced Genetics Population Genetics and

The hypothesis for the association between the marker and QTL

$H_0$: $m_1 = m_0$

$H_1$: $m_1 \ne m_0$

Calculate the test statistic:

$$t = \frac{m_1 - m_0}{\sqrt{s^2(1/n_1 + 1/n_0)}}, \quad \text{where} \quad s^2 = \frac{(n_1 - 1)s_1^2 + (n_0 - 1)s_0^2}{n_1 + n_0 - 2}$$

Compare $|t|$ with the critical value $t_{df}(0.05)$ from the t-table, where $df = n_1 + n_0 - 2$.

If $|t| > t_{df}(0.05)$, we reject $H_0$ at the significance level 0.05: there is a QTL

If $|t| < t_{df}(0.05)$, we do not reject $H_0$ at the significance level 0.05: no QTL is detected
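As an illustration, the pooled two-sample t statistic can be computed for the eight backcross heights listed earlier, grouped by marker 1. This is a minimal sketch using only the Python standard library; the function name `pooled_t` is illustrative, and the sketch assumes the usual pooled-variance form of the test with $n_1 + n_0 - 2$ degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group0):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n0 = len(group1), len(group0)
    s2 = ((n1 - 1) * variance(group1) + (n0 - 1) * variance(group0)) / (n1 + n0 - 2)
    t = (mean(group1) - mean(group0)) / sqrt(s2 * (1 / n1 + 1 / n0))
    return t, n1 + n0 - 2


heights_Mm = [184, 185, 180, 182]   # samples 1-4 (marker 1 genotype Mm)
heights_mm = [167, 169, 165, 166]   # samples 5-8 (marker 1 genotype mm)

t_stat, df = pooled_t(heights_Mm, heights_mm)
print(t_stat, df)
```

The statistic is then compared with the tabulated two-sided critical value for the returned degrees of freedom (about 2.45 for 6 degrees of freedom at the 0.05 level).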

Page 154: PCB5065 Advanced Genetics Population Genetics and

Why can the t-test probe a QTL?

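• Assume a backcross with two genes, one marker (alleles M and m) and one QTL (alleles Q and q).

• These two genes are linked with a recombination fraction of $r$.

               MmQq         Mmqq      mmQq      mmqq
Frequency      $(1-r)/2$    $r/2$     $r/2$     $(1-r)/2$
Mean effect    $m + a$      $m$       $m + a$   $m$

Here $m$ in the mean effects is the mean of QTL genotype qq, not the marker allele. Each marker genotype has a total frequency of 1/2, so the conditional means are obtained by dividing by 1/2.

Mean of marker genotype Mm:

$m_1 = [\tfrac{1-r}{2}(m + a) + \tfrac{r}{2}m]/\tfrac{1}{2} = m + (1-r)a$

Mean of marker genotype mm:

$m_0 = [\tfrac{r}{2}(m + a) + \tfrac{1-r}{2}m]/\tfrac{1}{2} = m + ra$

The difference

$m_1 - m_0 = m + (1-r)a - m - ra = (1-2r)a$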

Page 155: PCB5065 Advanced Genetics Population Genetics and

• The difference of marker genotypes can reflect the size of the QTL,

• This reflection is confounded by the recombination fraction

Based on the t-test, we cannot distinguish between the two cases,

- Large QTL genetic effect but loose linkage with the marker

- Small QTL effect but tight linkage with the marker
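The confounding can be made concrete with the expression $(1-2r)a$ for the expected marker contrast. In this sketch the function name and the two parameter combinations are hypothetical, chosen so that a large, loosely linked QTL and a small, completely linked QTL give the same expected difference.

```python
def marker_contrast(a, r):
    """Expected difference between marker genotype means in a backcross: (1 - 2r) * a."""
    return (1 - 2 * r) * a


loose_large = marker_contrast(a=10.0, r=0.25)   # large QTL effect, loose linkage
tight_small = marker_contrast(a=5.0, r=0.0)     # small QTL effect, complete linkage
print(loose_large, tight_small)
```

Because only the product $(1-2r)a$ enters the marker means, a single-marker t-test cannot separate the two parameters.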

Page 156: PCB5065 Advanced Genetics Population Genetics and

Example: marker analysis for body weight in a backcross of mice

_____________________________________________________________________

                      Marker class 1                  Marker class 0
     Marker           $n_1$   $m_1$    $s_1^2$        $n_0$   $m_0$    $s_0^2$      t        P value
_____________________________________________________________________

1    Hmg1-rs13        41      54.20    111.81         62      47.32    63.67        3.754    <0.01
2    DXMit57          42      55.21    104.12         61      46.51    56.12        4.99     <0.01
3    Rps17-rs11       43      55.30    101.98         60      46.30    54.38        5.231    <0.000001

_____________________________________________________________________

Page 157: PCB5065 Advanced Genetics Population Genetics and

Marker analysis for the F2

In the F2 there are three marker genotypes, MM, Mm and mm, which allow for the test of additive and dominant genetic effects.

Genotype    Mean     Variance
MM          $m_2$    $s_2^2$
Mm          $m_1$    $s_1^2$
mm          $m_0$    $s_0^2$

Page 158: PCB5065 Advanced Genetics Population Genetics and

Testing for the additive effect

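$H_0$: $m_2 = m_0$

$H_1$: $m_2 \ne m_0$

$$t_1 = \frac{m_2 - m_0}{\sqrt{s^2(1/n_2 + 1/n_0)}}, \quad \text{where} \quad s^2 = \frac{(n_2 - 1)s_2^2 + (n_0 - 1)s_0^2}{n_2 + n_0 - 2}$$

Compare $|t_1|$ with $t_{df}(0.05)$, where $df = n_2 + n_0 - 2$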

Page 159: PCB5065 Advanced Genetics Population Genetics and

Testing for the dominant effect

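$H_0$: $m_1 = (m_2 + m_0)/2$

$H_1$: $m_1 \ne (m_2 + m_0)/2$

$$t_2 = \frac{m_1 - (m_2 + m_0)/2}{\sqrt{s^2[1/n_1 + 1/(4n_2) + 1/(4n_0)]}}, \quad \text{where} \quad s^2 = \frac{(n_2 - 1)s_2^2 + (n_1 - 1)s_1^2 + (n_0 - 1)s_0^2}{n_2 + n_1 + n_0 - 3}$$

Compare $|t_2|$ with $t_{df}(0.05)$, where $df = n_2 + n_1 + n_0 - 3$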

Page 160: PCB5065 Advanced Genetics Population Genetics and

Example: Marker analysis in an F2 of maize

_______________________________________________________________________________________________

     Marker class 2             Marker class 1             Marker class 0
M    $n_2$  $m_2$  $s_2^2$      $n_1$  $m_1$  $s_1^2$      $n_0$  $m_0$  $s_0^2$      $t_1$   P         $t_2$    P
_______________________________________________________________________________________________

1    43     5.24   2.44         86     4.27   2.93         42     3.11   2.76         6.10    <0.001    0.38     0.70
2    48     4.82   3.15         89     4.17   3.26         34     3.54   2.84         3.28    0.001     -0.05    0.96
3    42     5.01   3.23         92     4.14   3.18         37     3.57   2.68         3.71    0.0002    -0.57    0.57
_______________________________________________________________________________________________
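The additive and dominant test statistics can be recomputed from the summary statistics in the first row of the table. This is a sketch that assumes pooled variances and square-root standard errors, as in a standard t-test; the function name is illustrative, and small differences from the tabulated values are expected because the table entries are rounded.

```python
from math import sqrt


def f2_marker_tests(n2, m2, v2, n1, m1, v1, n0, m0, v0):
    """Return (t1, t2): additive and dominant t statistics for one F2 marker."""
    # Additive effect: compare the two homozygous marker classes
    s2_add = ((n2 - 1) * v2 + (n0 - 1) * v0) / (n2 + n0 - 2)
    t1 = (m2 - m0) / sqrt(s2_add * (1 / n2 + 1 / n0))
    # Dominant effect: compare the heterozygote with the homozygote midpoint
    s2_all = ((n2 - 1) * v2 + (n1 - 1) * v1 + (n0 - 1) * v0) / (n2 + n1 + n0 - 3)
    t2 = (m1 - (m2 + m0) / 2) / sqrt(s2_all * (1 / n1 + 1 / (4 * n2) + 1 / (4 * n0)))
    return t1, t2


# Marker 1 of the maize example: class 2 (MM), class 1 (Mm), class 0 (mm)
t1, t2 = f2_marker_tests(43, 5.24, 2.44, 86, 4.27, 2.93, 42, 3.11, 2.76)
print(t1, t2)
```

For this marker the additive statistic is large while the dominant statistic is close to zero, in line with the P values reported in the table.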

Page 161: PCB5065 Advanced Genetics Population Genetics and

Population Genetics

• Estimate of allele frequencies
• Hardy-Weinberg equilibrium
• Linkage disequilibrium
• Inbreeding and IBD
• Evolutionary forces
  - Mutation
  - Population admixture
  - Population structure
  - Natural selection

Page 162: PCB5065 Advanced Genetics Population Genetics and

Quantitative Genetics

• Genetic parameters: (Co)variances

• Mating designs for parameter estimation

• Experimental designs for parameter estimation

• Heritability and genetic gain

• Molecular dissection of quantitative variation

Page 163: PCB5065 Advanced Genetics Population Genetics and

Population & Quantitative Genetics

PCB 5065: Advanced Genetics

This is a take-home exam for the Population & Quantitative Genetics section of PCB 5065. Please read the following instructions carefully:

• You are allowed to use any books, lecture notes and journal articles;

• You should complete the exam independently. Do not discuss it with, or ask for help from, anyone else;

• You may use calculators or computers for computations;

• Please return your completed exam to me electronically or in person by 5:00 pm Monday, December 13, 2004.