Mark E. Sorrells and Flavio Breseghello
Department of Plant Breeding & Genetics
Cornell University
Association Mapping as a Breeding Strategy
Presentation Overview
• A Genetic Model for Association Mapping in Plant Breeding Populations
• Comparison of Different Plant Breeding Materials for Association Mapping
• Association Mapping of Kernel Size and Milling Quality in Soft Winter Wheat Cultivars
A Genetic Model for AM in Plant Breeding Populations: Association as Conditional Probabilities
[Diagram: a gene and a marker separated by recombination (c). The breeding pool carries Gene={a} and Marker={m,M}; a new parent (A,M) is introduced. Recombination (c) and selection on A or M (w) then act over t generations.]
Pr(A,M)=φ
Pr(a,M)=θ
Pr(a,m)=1-φ-θ
Pr(A,m)=0
(Hedrick 2005)
Pr(A|M,c,t,φ,θ,w): “Probability of a plant with marker allele M to have gene allele A, t generations after the introduction of A”
Population genetics theory
Recombination x initial frequency of M in the breeding pool
• A novel marker allele at 10 cM distance can be more predictive of the QTL allele than an allele 1 cM away if the latter was present in the original population at a frequency of 0.05
Freq. new parent: φ=0.05
Relative fitness: w=1
Freq. M in original population: θ = 0, 0.05, 0.25
Freq. Recombination: c = 0.01, 0.05, 0.10
[Figure: Pr(A|M) (0.0 to 1.0) versus t generations (0 to 20) for the θ and c values above, with annotations at ~8 and ~18 generations.]
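The comparison on this slide can be checked numerically. The sketch below is not taken from the slides: it assumes random mating, no selection (w = 1), and the textbook decay of linkage disequilibrium by a factor (1 - c) per generation. The function name `pr_a_given_m` is hypothetical.

```python
def pr_a_given_m(t, c, phi, theta):
    """Pr(A|M) after t generations, assuming random mating and no selection (w = 1)."""
    p_a = phi                   # frequency of gene allele A (only the new parent carries it)
    p_m = phi + theta           # frequency of marker allele M
    d0 = phi - p_a * p_m        # initial disequilibrium: Pr(A,M) - Pr(A) * Pr(M)
    d_t = d0 * (1.0 - c) ** t   # assumed decay: D shrinks by (1 - c) each generation
    return (p_a * p_m + d_t) / p_m


if __name__ == "__main__":
    phi = 0.05
    for t in (0, 5, 10, 20):
        novel_10cm = pr_a_given_m(t, c=0.10, phi=phi, theta=0.0)   # novel M allele, 10 cM away
        old_1cm = pr_a_given_m(t, c=0.01, phi=phi, theta=0.05)     # pre-existing M allele, 1 cM away
        print(t, round(novel_10cm, 3), round(old_1cm, 3))
```

Under these assumptions the novel allele at c = 0.10 is the better predictor for roughly the first eight generations, after which the pre-existing allele at c = 0.01 overtakes it.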
Recombination x selection for M
Freq. new parent: φ=0.05
Relative fitness: w = 4 (red), 2 (green), 1.25 (blue)
Freq. M in original pop: 0
Freq. Recombination: c = 0.01, 0.05, 0.10
[Figure: Pr(A|M) and Pr(A) versus generations.]
• The generation at which the marker is depleted [Pr(A|M)=Pr(A)] depends on the selection intensity applied;
• The final frequency of A depends on selection and tightness of linkage between marker and gene.
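Both statements can be explored with a small recursion. The slides do not give the exact model, so this is only a sketch under stated assumptions: haploid (gametic) selection with relative fitness w for every haplotype carrying M and 1 otherwise, followed by recombination at rate c. The names `step` and `trajectory` are hypothetical.

```python
def step(freqs, c, w):
    """One generation: selection favouring M (fitness w versus 1), then recombination."""
    am_up, a_m_up, am_lo, a_m_lo = freqs  # haplotypes AM, aM, Am, am
    mean_w = w * (am_up + a_m_up) + (am_lo + a_m_lo)
    am_up, a_m_up = w * am_up / mean_w, w * a_m_up / mean_w
    am_lo, a_m_lo = am_lo / mean_w, a_m_lo / mean_w
    d = am_up * a_m_lo - am_lo * a_m_up   # disequilibrium after selection
    return (am_up - c * d, a_m_up + c * d, am_lo + c * d, a_m_lo - c * d)


def trajectory(phi, theta, c, w, generations):
    """Return [(Pr(A|M), Pr(A)), ...] for generations 0..generations."""
    freqs = (phi, theta, 0.0, 1.0 - phi - theta)  # AM, aM, Am, am at introduction
    out = []
    for _ in range(generations + 1):
        am_up, a_m_up, am_lo, _ = freqs
        out.append((am_up / (am_up + a_m_up), am_up + am_lo))
        freqs = step(freqs, c, w)
    return out


if __name__ == "__main__":
    for w in (4.0, 2.0, 1.25):
        pr_a_m, pr_a = trajectory(0.05, 0.0, 0.05, w, 100)[-1]
        print(w, round(pr_a_m, 3), round(pr_a, 3))
```

In this sketch M eventually approaches fixation, at which point Pr(A|M) and Pr(A) coincide; the level at which they meet is higher when linkage is tighter.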
Summary
• In plant breeding populations, the locus most associated with the trait is not necessarily the closest locus;
• Loosely linked markers can still be useful for MAS if high intensity of selection is applied.
MAS for Complex Traits: Issues
• Accurate detection and estimation of QTL effects
• Pre-existing marker alleles in a breeding population can be linked to non-target QTL alleles
• Multiple QTL alleles can have different relative values
• Gene x gene and gene x environment interactions
Association Analysis as a Breeding Strategy
• Most association studies have focused on estimating linkage disequilibrium and fine mapping.
• Breeding programs are dynamic, complex genetic entities that require frequent evaluation of marker / phenotype relationships.
Breseghello, F., and M.E. Sorrells. 2006. Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics 172:1165-1177.
Breseghello, F., and M.E. Sorrells. 2006. Association analysis as a strategy for improvement of quantitative traits in plants. Crop Sci. In press.
Association Mapping versus QTL Mapping
• Association Mapping can be conducted directly on the breeding material, therefore:
  • Direct inference from data analysis to breeding is possible
  • Phenotypic variation is observed for most traits of interest
  • Marker polymorphism is higher than in biparental populations
  • Routine variety trial evaluations provide phenotypic data
• Association Mapping provides other useful information about:
  • Organization of genetic variation in relevant breeding populations
  • Novel alleles can be identified and their relative value can be assessed as often as necessary
Association Mapping versus QTL Mapping
• Type I error (false positives) can be higher because of:
  • Unaccounted population structure
  • Simultaneous selection of combinations of alleles at different loci
  • High sampling variance of rare alleles
• Type II error can be higher (low power) because of:
  • Lower LD than in biparental mapping populations
  • Unbalanced design due to differences in allele frequencies
  • A larger multiple-testing problem because of lower LD
Integration of Association Analysis in a Breeding Program
[Diagram of the breeding cycle. Labeled elements: Germplasm; Parental Selection; Hybridization; New Populations; Selection (Intermating); New Synthetics, Lines, Varieties; Evaluation Trials; Elite Synthetics, Lines, Varieties; Genotypic & Phenotypic data; Novel & Validated QTL/Marker Associations; Marker Assisted Selection.]
Elite germplasm feeds back into hybridization nursery
Types of Populations
• Germplasm Bank Collection
  • A collection of genetic resources including landraces, exotic material and wild relatives.
• Synthetic Populations
  • Outcrossing populations (either male-sterile or manually crossed) synthesized from inbred lines. May be used for recurrent selection.
• Elite Lines
  • Inbred lines (and checks) manipulated with the objective of releasing new varieties in the short term.
Characteristics Related to Association Mapping: Practical aspects
Aspects of AM | Germplasm bank | Synthetic Populations | Elite Germplasm
Sample | Core-collection | Segregating progenies | Elite lines and checks
Sample turnover | Static | Ephemeral | Gradually substituted
Source of phenotypic data | Screenings | Progeny tests | Yield trials
Type of traits | High heritability traits; Domestication traits | Depends on the evaluation scheme | Low heritability traits: yield, resistance to abiotic stresses
Characteristics Related to Association Mapping: Genetic Expectations
Aspects of AM | Germplasm bank | Synthetic Populations | Elite Germplasm
Linkage Disequilibrium | Low | Intermediate and fast-decaying | High
Population structure | Medium | Low | High
Allele diversity among samples | High | Intermediate | Low
Allele diversity within samples | Variable | 1 or 2 alleles (diploid species) | 1 allele (inbred lines)
Characteristics Related to Association Mapping: Potential Applications
Aspects | Germplasm bank | Synthetic Populations | Elite Germplasm
Power | Low | Intermediate and decreasing | High; could allow genome scan
Resolution | High; could allow fine mapping | Intermediate and increasing | Low
Use of significant markers | Transfer of new alleles by marker-assisted backcross | Incorporation in selection index | Forward Breeding: MAS in progenies (requires validation)
Previous QTL information
• Doubled-Haploid Population AC Reed x Grandin
  • QTL for kernel size (width) near Xwmc18-2D
• Recombinant Inbred Population Synthetic W7984 x Opata (ITMI population)
  • QTL for kernel size (length) on 5A and 5B
[Figure panels: Length (5B); Width (2D)]
Association Analysis
Materials
• 95 of 149 soft winter wheat cultivars from the Northeastern US: mostly recent releases, representing 35 seed companies / institutions
• 93 SSR loci: 33 on 2D, 20 on 5A, 9 on 5B, 31 on 16 other chromosomes
• Rare alleles (freq < 5%): considered as missing for LD and population structure analysis; considered as alleles for AM analysis
Methods
• Population Structure: 36 “unlinked” SSR markers; Structure program without admixture; SPAGeDi program (Hardy & Vekemans) for kinship; visualization by Factorial (Multiple) Correspondence Analysis (Benzécri, 1973, L'Analyse des correspondances, Dunod)
• Linkage Disequilibrium: Tassel (maizegenetics.net) used to compute r², with p-values from 1000 permutations
• Association Analysis: linear mixed-effects models fitted with the lme function in R (nlme package), with markers (selected from previously identified QTL regions) as fixed effects and subpopulations or kinship as random effects (no obvious differentiating characteristics); two-marker models tested by likelihood ratio test
• Yu, J., G. Pressoir, et al. 2006. A unified mixed-model method for association mapping accounting for multiple levels of relatedness. Nature Genetics 38:203-208.
Estimating Relatedness: The K Matrix
Relatedness (K): a matrix with one row i and one column j per line, holding the pairwise coefficients F11 ... Fnj ... Fnn, where Θij ≅ Fij
In cattle studies the analogous matrix is estimated from pedigrees, and it controls for the polygene effect
(Yu, Pressoir, et al. 2006)
Fij = (Qij-Qm)/(1-Qm) (Ritland, Loiselle)
If Fij is negative, then it is set to zero.
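The formula and the truncation rule above can be sketched in a few lines. This is an illustration, not the SPAGeDi computation: it assumes inbred lines with one allele per locus, takes Qij as the plain proportion of loci at which lines i and j carry the same allele, and takes Qm as the mean of Qij over all distinct pairs. The function name `kinship_matrix` and the toy genotypes are hypothetical.

```python
def kinship_matrix(genotypes):
    """genotypes: one list per line, one allele call per locus (None = missing)."""
    n = len(genotypes)

    def shared(i, j):
        # Proportion of jointly scored loci where lines i and j carry the same allele.
        pairs = [(a, b) for a, b in zip(genotypes[i], genotypes[j])
                 if a is not None and b is not None]
        return sum(a == b for a, b in pairs) / len(pairs)

    q = [[shared(i, j) for j in range(n)] for i in range(n)]
    off_diag = [q[i][j] for i in range(n) for j in range(i + 1, n)]
    q_m = sum(off_diag) / len(off_diag)   # assumed definition of Qm: mean over all pairs
    # Fij = (Qij - Qm) / (1 - Qm), with negative values set to zero.
    return [[max(0.0, (q[i][j] - q_m) / (1.0 - q_m)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    toy = [["a", "b", "c", "d"],
           ["a", "b", "c", "e"],
           ["x", "y", "z", "d"]]
    for row in kinship_matrix(toy):
        print([round(f, 3) for f in row])
```

The sketch does not guard against a pair with no jointly scored loci or against Qm = 1; real data would need both checks.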
Population Structure: Sample Subdivisions
Subpopulation | No. of Varieties | Fst
1 | 19 | 0.337
2 | 32 | 0.111
3 | 13 | 0.295
4 | 31 | 0.064
Total | 95 | 0.188
[Figure: subpopulations S1, S2, S3 and S4.]
Moderate Population Subdivision
Population Structure: Factorial Correspondence Analysis
[Figure: orthogonal views of 4 soft winter wheat subpopulations (S1, S2, S3, S4).]
Linkage Disequilibrium: Germplasm Sample Selection
• 149 lines genotyped with 18 unlinked SSR markers
• Most similar lines were excluded
• "Normalizing" the sample drastically reduced LD among unlinked markers
[Figure: r² probability for unlinked SSR markers at p<.0001, p<.001 and p<.01, comparing the 149 lines with the 95 lines.]
Definition of a baseline-LD specific for our sample
Defined as the 95th percentile of the distribution of r² among unlinked loci
r² estimates above this value are probably due to genetic linkage
Baseline LD for this sample: r² = 0.0654
[Figures: density of the correlation coefficient r (0.0 to 0.4) with a normal curve and the 95th percentile of the normal distribution marked as the LD baseline; r² (0.00 to 0.12) across 0 to 600 on the horizontal axis, with the LD baseline marked.]
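The baseline definition can be sketched as an empirical percentile. This is a simplified illustration, not the Tassel computation: it assumes biallelic loci scored 0/1 in inbred lines, whereas the SSRs used here are multi-allelic, and it takes the percentile directly from the observed values. The function names `r_squared`, `percentile` and `baseline_ld` are hypothetical.

```python
def r_squared(locus_a, locus_b):
    """r² between two biallelic loci scored 0/1, one allele per inbred line."""
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n
    d = p_ab - p_a * p_b                  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def percentile(values, q):
    """Empirical percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def baseline_ld(unlinked_r2):
    """95th percentile of r² among pairs of unlinked loci."""
    return percentile(unlinked_r2, 95)
```

Any r² between linked loci that exceeds `baseline_ld(...)` for the same sample would then be read as probably due to linkage, as stated on the slide.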
Linkage Disequilibrium: Chromosome 2D
Consistent LD was below 1 cM, localized LD 1-5 cM
[Figure: r² (0.0 to 0.6) versus cM (0 to 100), with the baseline LD marked.]
Linkage Disequilibrium: Chromosome 5A
Significant LD extended for 5 cM in pericentromeric region
[Figure: r² (0.0 to 1.0) versus cM (0 to 50), with the baseline LD marked and an annotation at ~5 cM.]
Loci Associated with Kernel Size (p-values): Chromosome 2D
Kernel Size
cM | Locus | Weight NY | Weight OH | Area NY | Area OH | Length NY | Length OH | Width NY | Width OH
7 | Xcfd56 | 0.069 | 0.160 | 0.012 | 0.119 | 0.076 | 0.031 | 0.000* | 0.252
11 | Xwmc111 | 0.005 | 0.020 | 0.005 | 0.108 | 0.003 | 0.107 | 0.000* | 0.000**
23 | Xgwm261 | 0.145 | 0.016 | 0.019 | 0.009 | 0.027 | 0.009 | 0.058 | 0.001*
28 | Xwmc112 | 0.012 | 0.057 | 0.047 | 0.120 | 0.480 | 0.367 | 0.001* | 0.024
64 | Xgwm30 | 0.081 | 0.862 | 0.053 | 0.848 | 0.312 | 0.820 | 0.000** | 0.212
91 | Xgwm539 | 0.042 | 0.038 | 0.030 | 0.039 | 0.001* | 0.005 | 0.290 | 0.334
[Figure annotations: **; Likelihood Ratio Test]
Agreed with QTL in Reed x Grandin
Milling Quality
None of the loci on 2D were significant after multiple testing correction
Loci Associated with Kernel Size (p-values): Chromosome 5A
Kernel Size
cM | Locus | Weight NY | Weight OH | Area NY | Area OH | Length NY | Length OH | Width NY | Width OH
55 | Xcfa2250 | 0.021 | 0.007 | 0.044 | 0.014 | 0.014 | 0.002* | 0.637 | 0.649
55 | Xwmc150b | 0.002* | 0.003 | 0.003 | 0.005 | 0.009 | 0.002* | 0.093 | 0.429
56 | Xbarc117 | 0.009 | 0.002* | 0.021 | 0.005 | 0.118 | 0.022 | 0.044 | 0.039
60 | Xbarc141 | 0.631 | 0.037 | 0.232 | 0.024 | 0.038 | 0.002* | 0.852 | 0.863
Milling Quality
cM | Locus | Milling Score | Flour Yield | ESI | Friability | Break-Flour Yield
55 | Xcfa2250 | 0.010 | 0.029 | 0.047 | 0.002* | 0.081
[Figure annotations: n.s.; **; Likelihood Ratio Test]
Agreed with QTL in M6 x Opata
B.L.U.E. of allele effects: Kernel Length
N. of Cultivars: 9 5 18 37 9 9 41 45 43 49
B.L.U.E. of allele effects: Kernel Width
N. of Cultivars: 41 14 8 15 18 24 5 10 19
B.L.U.E. of allele effects: Kernel Weight
N. of Cultivars: 41 45 43 49
Conclusions
• Linkage Disequilibrium
  • Variation in LD across the genome can be characterized in relevant germplasm
  • Markers closely linked to QTL of interest can be identified and allelic effects quantified
• Association Mapping as a Breeding Strategy
  • For recurrent selection, markers could be used to carry information from a “good year” to a “bad year”
  • In pedigree breeding, markers could carry information about traits of interest from replicated field trials to single row or single plant selection
  • Allelic values of previously identified alleles can be updated annually based on advanced trial data combined with genotypic data
  • New alleles can be identified and characterized to determine their relative value
  • A selection index can be used to incorporate both phenotypic and molecular data
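One simple way to picture the last point is a linear index that adds a weighted marker score to a weighted phenotypic value. The slides do not specify an index, so everything below is an assumption for illustration: the linear form, the weights `b_pheno` and `b_marker`, the function names, and the placeholder locus and allele labels.

```python
def marker_score(genotype, allele_effects):
    """Sum the estimated effects of the alleles a line carries (unlisted alleles count as 0)."""
    return sum(allele_effects.get((locus, allele), 0.0)
               for locus, allele in genotype.items())


def selection_index(phenotype, genotype, allele_effects, b_pheno, b_marker):
    """Hypothetical linear index combining phenotypic and molecular information."""
    return b_pheno * phenotype + b_marker * marker_score(genotype, allele_effects)


if __name__ == "__main__":
    # Placeholder effects, e.g. allele-effect estimates updated from the latest trial data.
    effects = {("locus1", "x"): 0.5, ("locus2", "y"): -0.25}
    line = {"locus1": "x", "locus2": "y", "locus3": "z"}
    print(selection_index(2.0, line, effects, b_pheno=0.5, b_marker=2.0))
```

Because the allele effects are passed in as a plain dictionary, they can be replaced each season as new estimates become available without changing the index itself.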
Acknowledgements
• USDA Soft Wheat Quality Lab, Wooster, OH
• Embrapa
Technical Support:
• David Benscher
• James Tanaka
• Gretchen Salm
Kangaroo Island
Wayne Powell