The Effect of Population Structure on Linkage
Allen Van Deynze
Tomato Breeders’ Roundtable
June 2009
Some Definitions
Linkage
• Association of two or more loci on a chromosome with limited recombination
Linkage Disequilibrium or Gametic Phase Disequilibrium
• Non-random association of alleles at two or more loci, not necessarily on the same chromosome
• Measures co-segregation of alleles in a population
• Mendel’s pea traits showed complete linkage equilibrium and hence independent assortment
• Can arise from intermixture of populations with different gene frequencies
• Can also be produced or maintained by selection favoring one combination of alleles over the other, e.g. selection for yield in a breeding population
Falconer and Mackay (1996)
Linkage disequilibrium
Molecular Markers
Molecular/DNA markers are great tools for breeding; however:
• Confounding factors (e.g. linkage among markers, genotype × environment interaction, p-value threshold) can drive selection in a different direction than the breeder intended
• Understanding the population structure becomes critical
• Phenotypic data still drive the genomic regions identified as quantitative trait loci (QTLs)
• Need to be able to understand and interpret the results
  – implications and validation of QTLs are based on F-statistics
Trait Distribution in the Progeny
If we could observe the QTL directly, we could see the three underlying trait distributions.
[Figure: two panels, "Trait distribution in the F2 progeny" and "Distribution within genotypic classes" (aa, Aa, AA)]
Marker assisted selection
[Figure: marker-assisted selection example; labels: fruit ripening, DNA marker, R]
What About Population Structure?
• F2; F3; F4
• Recombinant inbred (RI)
• F1-derived, intermated recombinant inbred (IRI)
• Doubled haploids (DH)
• Backcross (BC1)
• Association study in the base germplasm
• Near-isogenic lines (NIL)
Populations
Expansion of genetic map of 100 cM in an F2
Winkler et al. (2003)
The level of resolution needed depends on intended application
• Combining QTLs between lines within a segregating population
• Moving chromosome segments within heterotic pools
• Making germplasm-wide inferences/claims about a particular chromosome segment (assuming IBD)
• Moving chromosome segments across heterotic pools of elite germplasm
• Introgression of a special trait (e.g. disease resistance) from exotic germplasm
• QTL cloning
[Arrow alongside the list: increased need for resolution]
Type of population depends on resolution needed
Comparison of resolution and research time for various approaches to dissect quantitative variation. The research times assume the target species has only two generations per year. NIL, near-isogenic line; RIL, recombinant inbred line.
Buckler and Thornsberry (2002)
Linkage Disequilibrium Decay
• r = recombination rate between two loci
• r = 0.5: the two loci are unlinked
• r = 0: the two loci are completely linked and do not assort independently
Falconer and Mackay (1996)
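A minimal sketch of how the recombination rate r drives the decay of linkage disequilibrium. It assumes the standard random-mating result that the disequilibrium coefficient shrinks by a factor (1 - r) each generation; the function name and the starting value D0 = 0.25 are illustrative choices, not taken from the slides.

```python
def ld_after_generations(d0: float, r: float, generations: int) -> float:
    """Disequilibrium remaining after random mating: D_t = D_0 * (1 - r)**t.

    Assumes a large random-mating population with no selection or drift.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie between 0 (complete linkage) and 0.5 (unlinked)")
    return d0 * (1.0 - r) ** generations


if __name__ == "__main__":
    # Illustrative starting disequilibrium; 0.25 is the maximum for allele frequencies of 0.5.
    for r in (0.5, 0.1, 0.01, 0.0):
        remaining = [round(ld_after_generations(0.25, r, t), 4) for t in (0, 1, 5, 10)]
        print(f"r = {r}: D at generations 0, 1, 5, 10 = {remaining}")
```

With r = 0.5 the disequilibrium halves every generation, while with r = 0 it never decays, which matches the two limiting cases listed above.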
Modeling a Marker Locus and a Linked QTL
Single Marker Analysis
Modeling, assume:
• One QTL locus, A
• One marker locus, M
• A measure of association between A and M
• Additive effect only, as dominance effects need not be considered in most commercial breeding applications:
  – Self-pollinated crops: compare AA to aa (no dominance effect)
  – Cross-pollinated crops: compare AAT to aAT (dominance and additive effects are confounded)
What happens when both A and M are segregating?
Assume a genetic distance of r cM between A and M.
Generate a segregating progeny.
[Figure: cross diagram showing the parental alleles at A and M]
Good phenotypes are important
[Figure: tomato fruit susceptibility to Botrytis cinerea at 5 dpi, MG versus Red Ripe fruit; axes: ripening, infection. Susceptibility to pathogens increases. Ann Powell, UC Davis; UCD Postharvest, 18 June 2009]
Assigned values of the genotypes at A and M
Genotyping             Phenotyping
Individual genotyped   Cloned progeny   Selfed progeny        Testcrossed progeny
AA                     AA => a          AA => a               AT => a
Aa                     Aa => d          1AA:2Aa:1aa => d/2    ½ AT + ½ aT => 0
aa                     aa => -a         aa => -a              aT => -a

Let δ = d, d/2, and 0 for cloned, selfed, and testcrossed progenies, respectively.
Expected progeny mean of the three marker genotypic classes:
• MM => a.P(AA/MM) + δ.P(Aa/MM) - a.P(aa/MM)
• Mm => a.P(AA/Mm) + δ.P(Aa/Mm) - a.P(aa/Mm)
• mm => a.P(AA/mm) + δ.P(Aa/mm) - a.P(aa/mm)
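The expected marker class means can be sketched numerically. The example below assumes an F2 from a coupling-phase cross (AM/AM x am/am) and treats r as a recombination fraction between 0 and 0.5 rather than a distance in cM; the conditional probabilities follow from those two assumptions and are not given on the slides. Function names are illustrative.

```python
def qtl_given_marker_f2(r: float) -> dict:
    """P(QTL genotype | marker genotype) in an F2 from AM/AM x am/am (assumed cross)."""
    nr = 1.0 - r  # probability that a gamete is non-recombinant
    return {
        "MM": {"AA": nr * nr, "Aa": 2 * r * nr, "aa": r * r},
        "Mm": {"AA": r * nr, "Aa": nr * nr + r * r, "aa": r * nr},
        "mm": {"AA": r * r, "Aa": 2 * r * nr, "aa": nr * nr},
    }


def marker_class_means(r: float, a: float, delta: float) -> dict:
    """Expected mean of each marker class: a.P(AA/.) + delta.P(Aa/.) - a.P(aa/.)."""
    probs = qtl_given_marker_f2(r)
    return {
        marker: a * p["AA"] + delta * p["Aa"] - a * p["aa"]
        for marker, p in probs.items()
    }


if __name__ == "__main__":
    # a = 1 and delta = 0 are illustrative values (additive QTL, no dominance).
    for r in (0.0, 0.1, 0.3, 0.5):
        print(r, marker_class_means(r, a=1.0, delta=0.0))
```

Under these assumptions the MM class mean with δ = 0 reduces to a(1 - 2r), so the contrast between MM and mm shrinks to zero as the marker becomes unlinked (r = 0.5).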
Variance within the genotypic classes
Assume:
• AA ~ N(a, σ²g residual + σ²e) ~ N(a, σ²)
• Aa ~ N(d, σ²g residual + σ²e) ~ N(d, σ²)
• aa ~ N(-a, σ²g residual + σ²e) ~ N(-a, σ²)
Then, let δ = d, d/2, and 0 for cloned, selfed, and testcrossed progenies, respectively.
Expected progeny variance of the three marker genotypic classes (inside the parentheses, MM, Mm, and mm denote the expected mean of that marker class):
• Var(MM) = σ² + (a-MM)².P(AA/MM) + (δ-MM)².P(Aa/MM) + (-a-MM)².P(aa/MM)
• Var(Mm) = σ² + (a-Mm)².P(AA/Mm) + (δ-Mm)².P(Aa/Mm) + (-a-Mm)².P(aa/Mm)
• Var(mm) = σ² + (a-mm)².P(AA/mm) + (δ-mm)².P(Aa/mm) + (-a-mm)².P(aa/mm)
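The class variance formula is the variance of a mixture: the common within-genotype variance σ² plus the spread of the QTL genotype values around the class mean. A minimal sketch follows; the conditional probabilities are passed in by the caller, and the numbers in the example are illustrative assumptions (an MM class with 10% recombination in an F2, a = 1, δ = 0, σ² = 1).

```python
def marker_class_variance(probs: dict, a: float, delta: float, sigma2: float) -> float:
    """Variance within one marker class.

    probs maps the QTL genotypes "AA", "Aa", "aa" to their conditional
    probabilities given the marker genotype; they must sum to 1.
    """
    values = {"AA": a, "Aa": delta, "aa": -a}
    class_mean = sum(values[g] * probs[g] for g in values)
    spread = sum((values[g] - class_mean) ** 2 * probs[g] for g in values)
    return sigma2 + spread


if __name__ == "__main__":
    # Hypothetical MM class: P(AA/MM) = 0.81, P(Aa/MM) = 0.18, P(aa/MM) = 0.01.
    mm_probs = {"AA": 0.81, "Aa": 0.18, "aa": 0.01}
    print(marker_class_variance(mm_probs, a=1.0, delta=0.0, sigma2=1.0))
```

When the marker is completely linked to the QTL, each marker class contains a single QTL genotype and the class variance falls back to σ².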
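Test Statistics
- MM versus mm -
• Assume our test statistic will be the Satterthwaite t-test (so we don’t have to assume the variance is the same in the two genotypic classes):

$$t' = \frac{\bar{X}_{MM} - \bar{X}_{mm}}{\sqrt{\dfrac{s^2_{MM}}{n_{MM}} + \dfrac{s^2_{mm}}{n_{mm}}}}$$

- Mm versus MM or Mm versus mm -
• This type of comparison is used in backcross populations:

$$t' = \frac{\bar{X}_{Mm} - \bar{X}_{MM\ \text{or}\ mm}}{\sqrt{\dfrac{s^2_{Mm}}{n_{Mm}} + \dfrac{s^2_{MM\ \text{or}\ mm}}{n_{MM\ \text{or}\ mm}}}}$$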
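The unequal-variance t statistic can be computed with the standard library alone. This is a minimal sketch: the function name is illustrative, the phenotype values are made up for the example, and the degrees of freedom use the usual Welch-Satterthwaite approximation, which the slides do not spell out.

```python
from math import sqrt
from statistics import mean, variance


def satterthwaite_t(x: list, y: list) -> tuple:
    """Return (t', approximate degrees of freedom) for two marker classes."""
    nx, ny = len(x), len(y)
    vx = variance(x) / nx  # s^2 / n for the first class
    vy = variance(y) / ny  # s^2 / n for the second class
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical phenotypes for the MM and mm classes.
    mm_upper = [5.0, 6.0, 7.0, 8.0, 9.0]
    mm_lower = [1.0, 2.0, 3.0, 4.0, 5.0]
    t, df = satterthwaite_t(mm_upper, mm_lower)
    print(f"t' = {t:.3f}, df = {df:.1f}")
```

The same function covers the backcross comparison by passing the Mm class and whichever homozygous class is present.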
What information do we need?
Information pertinent to the association between marker genotypes and phenotypic values:
• Total number of individuals genotyped, N (and number in each class, nMM, nMm, and nmm)
• Values of a and δ
• Values of residual genetic variance and error variance
Populations derived from the cross between two inbred parents (DH, BC1, F2, F3, F4, RI, NIL)
Detection of linkage between A and M will occur only if both are segregating in the population and if they are physically linked.
Assuming both parents are sampled at random from the germplasm, the probability that:
• M is segregating: 2.fM.(1-fM)
• A is segregating: 2.fA.(1-fA)
• Both A and M are segregating: 2.(fAM.fam + fAm.faM)
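A small sketch of the last probability, showing how gametic disequilibrium in the germplasm changes the chance that both loci segregate in a biparental cross. The disequilibrium coefficient D and the way haplotype frequencies are built from it (fAM = fA.fM + D, and so on) are standard population-genetics conventions assumed here, not notation from the slides.

```python
def haplotype_freqs(f_a: float, f_m: float, d: float = 0.0) -> dict:
    """Haplotype frequencies from allele frequencies and disequilibrium D (assumed symbol)."""
    freqs = {
        "AM": f_a * f_m + d,
        "Am": f_a * (1 - f_m) - d,
        "aM": (1 - f_a) * f_m - d,
        "am": (1 - f_a) * (1 - f_m) + d,
    }
    if min(freqs.values()) < 0:
        raise ValueError("D is outside the range allowed by the allele frequencies")
    return freqs


def prob_both_segregating(f_a: float, f_m: float, d: float = 0.0) -> float:
    """2.(fAM.fam + fAm.faM) for two inbred parents drawn at random."""
    h = haplotype_freqs(f_a, f_m, d)
    return 2 * (h["AM"] * h["am"] + h["Am"] * h["aM"])


if __name__ == "__main__":
    # Illustrative allele frequencies of 0.5 at both loci.
    for d in (0.0, 0.1, 0.2):
        print(f"D = {d}: P(both segregating) = {prob_both_segregating(0.5, 0.5, d):.3f}")
```

At D = 0 the result equals the product of the two single-locus probabilities; with allele frequencies of 0.5, any nonzero D raises it.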
Gametic phase disequilibrium between linked markers is valuable
• We cannot extrapolate results from one mapping population to the rest of our germplasm in the absence of disequilibrium.
• Disequilibrium tends to increase the chance of M1 and M2 segregating simultaneously.
How do we know if we have disequilibrium between a linked marker and QTL in the germplasm?
We cannot assess the extent of gametic phase disequilibrium between M1 and M2 in the population very well without very extensive mapping studies.
But we can look at marker-marker associations at a much lower cost: we just need to genotype our germplasm.
[Figure: Tomato SNP and Indel Map, showing marker names and positions (cM) on chromosomes 1 to 12. Matthew Robbins]
The marker density needed depends on the population
[Figure: power of detecting a QTL, plotted as (MM-mm)/a against distance of marker from QTL (in cM, 0 to 100). Curves: DH or F2 or 2xBC1; F3; F4; IRI (n# gen RM = 0, 2, 5, 10, 25). Annotated distances: 34.7 cM and 2.4 cM]
Thus, QTL mapping in an F2…
[Figure: chromosome segments of progeny 1 to 10, marked as coming from Parent 1 or from Parent 2]
…is an association study in a population where the confounding effect of pedigree has been removed.
Sample size
σ² = error σ² + residual genetic σ²
• “Error σ²” and “residual genetic σ²” are based on the overall mean for each entry.
• Increasing the number of field replications will reduce error σ² but not the between-line genetic σ².
• To further reduce the denominator in the t-test, we need to add more genetic entries.
Typical split plot:
– Sub-plot error: plot-level error σ²
– Plot error: genetic σ² + (plot-level error σ²)/(rep number)
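The trade-off between replications and genetic entries can be sketched from the plot-error expression above. The variance components and the equal-class-size standard error used here are illustrative assumptions for the example, not values from the slides.

```python
from math import sqrt


def entry_mean_variance(genetic_var: float, error_var: float, reps: int) -> float:
    """Variance of an entry mean: genetic sigma^2 + (plot-level error sigma^2) / reps."""
    return genetic_var + error_var / reps


def se_class_difference(genetic_var: float, error_var: float, reps: int, n_per_class: int) -> float:
    """Standard error of the difference between two marker class means of equal size."""
    return sqrt(2 * entry_mean_variance(genetic_var, error_var, reps) / n_per_class)


if __name__ == "__main__":
    # Hypothetical components: residual genetic variance 1.0, plot error variance 4.0.
    base = se_class_difference(1.0, 4.0, reps=2, n_per_class=50)
    more_reps = se_class_difference(1.0, 4.0, reps=4, n_per_class=50)
    more_entries = se_class_difference(1.0, 4.0, reps=2, n_per_class=100)
    print(f"baseline {base:.3f}, double reps {more_reps:.3f}, double entries {more_entries:.3f}")
```

With these numbers, doubling the entries shrinks the t-test denominator more than doubling the replications, because extra replications leave the residual genetic variance untouched.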
Sample size - Using power calculations (Lynch and Walsh, 1998) -
• Assume additive effect only (thus, we can also ignore the difference in variance between marker classes)
• Let the total phenotypic variance be σ²P
• Assume we want the power to detect a QTL for which variation at linked marker loci accounts for r² of the total phenotypic variance
• Ignore the slight increase of intra-class variance due to imperfect linkage between A and M. Thus:
- MM ~ N(μMM = γ, σ²)
- Mm ~ N(0, σ²)
- mm ~ N(μmm = -γ, σ²)
The proportion of total phenotypic variance explained by the segregation at the marker locus is:

$$r^2 = \frac{(n_{MM} + n_{mm})\,\gamma^2}{N\,\sigma_P^2} \quad (n_{MM} = n_{mm}) \qquad \text{for Fi, IRI, DH's}$$

$$r^2 = \frac{\gamma^2}{4\,\sigma_P^2} \quad (n_{MM} = n_{Mm} = N/2) \qquad \text{for BC1}$$

THUS:

$$\sigma^2_{(\bar{z}_{MM} - \bar{z}_{mm})} = \left(\frac{1}{n_{MM}} + \frac{1}{n_{mm}}\right)\sigma^2$$
Sample size - Using power calculations (Lynch and Walsh, 1998) -
The number of progeny needed to detect an effect at a marker locus that corresponds to r² of the total phenotypic variance, with a probability of false positive of α and a probability β of missing a true association, can be extracted from the relationship:

$$\frac{4\gamma^2}{(1 - r^2)\,\sigma_P^2\left(\dfrac{1}{n_{MM}} + \dfrac{1}{n_{mm}}\right)} = \left(\frac{z_{(1-\alpha/2)}}{\sqrt{1 - r^2}} + z_{(1-\beta)}\right)^2$$

(Replace nmm by nMm and 4γ² by γ² in the case of BC1)
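Sample size (3) - Power calculations, examples -
Whenever we have nMM = nmm = n, we have:

$$r^2 = \frac{2n\gamma^2}{N\,\sigma_P^2}$$

and

$$\frac{2n\gamma^2}{(1 - r^2)\,\sigma_P^2} = \left(\frac{z_{(1-\alpha/2)}}{\sqrt{1 - r^2}} + z_{(1-\beta)}\right)^2$$

THUS:

$$N = \frac{1 - r^2}{r^2}\left(\frac{z_{(1-\alpha/2)}}{\sqrt{1 - r^2}} + z_{(1-\beta)}\right)^2$$

This applies to Fi’s, BC1, DH, IRI.

                    α = 0.05, β = 0.10    α = 0.01, β = 0.05
N for r² = 0.10     101                   171
N for r² = 0.05     206                   349
N for r² = 0.01     1047                  1774

α: probability of a false positive (False +); β: probability of a false negative (False -); r²: proportion of σ²P explained.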
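The sample sizes in the table can be reproduced with a few lines of standard-library Python. This sketch assumes the equal-class-size form N = ((1 - r²)/r²) (z(1-α/2)/√(1 - r²) + z(1-β))² and a two-sided test; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def progeny_needed(r2: float, alpha: float, beta: float) -> float:
    """Progeny number N needed to detect a marker effect explaining r2 of the phenotypic variance."""
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha / 2)  # two-sided false-positive threshold
    z_beta = z(1 - beta)        # quantile for the false-negative rate
    return (1 - r2) / r2 * (z_alpha / sqrt(1 - r2) + z_beta) ** 2


if __name__ == "__main__":
    for r2 in (0.10, 0.05, 0.01):
        row = [round(progeny_needed(r2, a, b)) for a, b in ((0.05, 0.10), (0.01, 0.05))]
        print(f"r2 = {r2}: N = {row}")
```

Rounding to the nearest integer matches the tabulated values; rounding up instead would be the more conservative choice when planning a population.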
Statistical power
[Figure: statistical power for h2 = 0.10 and h2 = 0.05. Hu and Shu 08]
Environments vs #genotypes
Schön et al. (2004)
• Very large population (975 F5 testcrosses in 19 environments) and simulated populations
• QTL analyses (PLABQTL)
• Obtained the proportion of phenotypic variance explained by QTLs ($R^2_{adj}$)
• Derived the proportion of genotypic variance explained by all detected QTLs:

$$\hat{p} = \hat{R}^2_{adj} / \hat{h}^2$$

• Data subdivided to verify the impact of the number of progenies and the number of environments
• Used resampling techniques to estimate the amount of bias in detecting QTL (comparing R² and p in estimation and test data sets, ES vs. TS)
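A short sketch of the two quantities involved: the proportion of genotypic variance explained, taken as the adjusted R² divided by the heritability, and the bias, taken as the estimation-set value minus the test-set value (ES - TS). All numbers below are hypothetical and only illustrate the arithmetic; they are not results from Schön et al.

```python
def proportion_genotypic_variance(r2_adj: float, h2: float) -> float:
    """p = R2_adj / h2: share of the genotypic variance explained by detected QTLs."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must lie in (0, 1]")
    return r2_adj / h2


def cross_validation_bias(p_es: float, p_ts: float) -> float:
    """Bias of the estimation-set value relative to the test-set value (ES - TS)."""
    return p_es - p_ts


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    p_es = proportion_genotypic_variance(r2_adj=0.30, h2=0.60)
    p_ts = proportion_genotypic_variance(r2_adj=0.18, h2=0.60)
    print(f"p(ES) = {p_es:.2f}, p(TS) = {p_ts:.2f}, bias = {cross_validation_bias(p_es, p_ts):.2f}")
```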
Figure 1. (Schön et al., 2004) Proportion of the genotypic variance explained by detected QTL in estimation sets averaged over all data sets ( ES) for 12 combinations of experimental data PED (N, E), using fivefold standard cross-validation and two significance levels for grain yield, grain moisture, and plant height. Individual columns are partitioned into the genotypic variance explained in test sets (TS, solid bottom) and the bias calculated as the difference ES – TS (shaded top).
Figure 2. (Schön et al., 2004) Mean (–), median (o), and 12.5 and 87.5% quantiles of the proportion of the genotypic variance explained in test sets calculated for individual data sets for 12 combinations of experimental data PED (N, E) using fivefold standard cross-validation and LOD 2.5 for grain yield, grain moisture, and plant height.
[Slide labels: # individuals, σ²G; environments, σ²]
Main conclusions of Schön et al.
• Adding more genotypes is more efficient than replicating the same genotypes (provided that a minimum number of environments are sampled)
• h² and size of effect are important
• Results are trait specific
References
Dudley, J.W. and R.J. Lambert. 1992. Ninety generations of selection for oil and protein in maize. Maydica 37:81-87.
Falconer, D.S. and T.F.C. Mackay. 1996. Introduction to quantitative genetics. 4th ed. Longman Group Ltd., Essex, UK. 464 p.
Laurie, C.C. et al. 2004. The Genetic Architecture of Response to Long-Term Artificial Selection for Oil Concentration in the Maize Kernel. Genetics 168:2141-2155.
Liu, B.H. 1998. Statistical Genomics. Linkage, mapping and QTL analysis. CRC Press LLC, FL, USA. 611 p.
Lynch, M. and B. Walsh. 1998. Genetics and Analysis of Quantitative Traits. Sinauer Associates Inc., MA, USA. 980 p.
Schön, C.C. et al. 2004. Quantitative Trait Locus Mapping Based on Resampling in a Vast Maize Testcross Experiment and Its Relevance to Quantitative Genetics for Complex Traits. Genetics 167:485-498.
Tanksley, S.D. et al. 1996. Advanced backcross QTL analysis in a cross between an elite processing line of tomato and its wild relative L. pimpinellifolium. Theor. Appl. Genet. 92:213-224.
Winkler, C.R. et al. 2003. On the determination of recombination rates in intermated recombinant inbred populations. Genetics 164:741-745.
Xiao, J. et al. 1998. Identification of trait-improving quantitative trait loci alleles from a wild rice relative, Oryza rufipogon. Genetics 150:899-909.
Carotenoids in NILs: S. pennellii x S. lycopersicum
Liu et al. (2003)