Contact: wyan@ggebiplot.com
Biplot Analysis of Multi-Environment Trial Data
Weikai Yan, May 2006
Weikai Yan2006
Multi-Environment Trials (MET)
• MET are essential
• MET are expensive
• MET data are valuable
• MET data are not fully used

Why biplot analysis?
• Biplot analysis can help understand MET data
– Graphically
– Effectively
– Conveniently

Outline
• Multi-environment trial (MET) data
• Basics of biplot analysis
• Biplot analysis of G-by-E data
• Biplot analysis of G-by-T data
• Better understanding of MET data
• Conclusions

Multi-environment trial data

MET data are a genotype-environment-trait (G-E-T) 3-way table
• Multiple Genotypes
• Multiple Environments
• Multiple Traits

A G-E-T 3-way table contains many 2-way tables
• G by E: for each trait
• G by T (trait): in each environment; across environments
• E by T: for each genotype; across genotypes
G-E-T data >> G-E data

A G-E-T 3-way table is an extended 2-way table
• G by V:
– each E-T combination as a variable (V)
• P by T:
– each G-E combination as a phenotype (P)

A G-E-T 3-way table implies informative 2-way tables
• Association by environment 2-way tables
– Associations:
  • among traits
  • between traits and genetic markers

Goals of MET data analysis
• Short-term goals:
– Variety evaluation
  • Response to the environment (G x E)
  • Trait profiles (G x T)
• Long-term goals:
– To understand
  • the target environment (G x E)
  • the test environments (G x E)
  • the crop (G x T)
  • the genotype x environment interaction (A x T)

Basics of biplot analysis
Most two-way tables can be visually studied using biplots

Origin of biplot
• Gabriel (1971)
– One of the most important advances in data analysis in recent decades
• Currently…
– > 50,000 web pages
– Numerous academic publications
– Included in most statistical analysis packages
• Still a very new technique to most scientists
[Photo: Prof. Ruben Gabriel, “the founder of biplot”. Courtesy of Prof. Purificación Galindo, University of Salamanca, Spain]

What is a biplot?
• “Biplot” = “bi” + “plot”
– “plot”
  • scatter plot of two rows OR of two columns, or
  • scatter plot summarizing the rows OR the columns
– “bi”
  • BOTH rows AND columns
• 1 biplot >> 2 plots

Mathematical definition of a biplot
Graphical display of matrix multiplication
• “Inner product property”
– P_ij = |OA_i| × |OB_j| × cos α_ij
– Implies the product matrix
Matrix multiplication, A(4, 2) × B(2, 3) = P(4, 3), where rows a1 to a4 of A hold the (x, y) coordinates of points A1 to A4 and columns b1 to b3 of B hold the (x, y) coordinates of points B1 to B3:
\[
\begin{bmatrix} 4 & 3 \\ -3 & 3 \\ 1 & -3 \\ 4 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 3 & -3 \\ 4 & -1 & 2 \end{bmatrix}
=
\begin{bmatrix} 20 & 9 & -6 \\ 6 & -12 & 15 \\ -10 & 6 & -9 \\ 8 & 12 & -12 \end{bmatrix}
\]
[Figure: points A1 to A4 and B1 to B3 plotted on X and Y axes from the origin O; |OA1| = 5.0, |OB1| = 4.472, cos α11 = 0.8944]
P11 = 5 × 4.472 × 0.8944 = 20
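
The inner-product property can be checked numerically. The sketch below is illustrative only: the coordinates of A1 to A4 and B1 to B3, including their signs, are assumed values chosen to agree with the worked figure P11 = 5 × 4.472 × 0.8944 = 20, and `geometric_product` is a hypothetical helper name.

```python
import numpy as np

# Assumed (x, y) coordinates: one row per point A1..A4, one column per point B1..B3.
A = np.array([[4, 3], [-3, 3], [1, -3], [4, 0]], dtype=float)
B = np.array([[2, 3, -3], [4, -1, 2]], dtype=float)

# Algebraic route: ordinary matrix multiplication.
P = A @ B

def geometric_product(a, b):
    """Vector length of a, times vector length of b, times cos of the angle between them."""
    angle = np.arctan2(a[1], a[0]) - np.arctan2(b[1], b[0])
    return np.hypot(a[0], a[1]) * np.hypot(b[0], b[1]) * np.cos(angle)

# Geometric route: what a reader does by eye on the biplot.
P_geo = np.array([[geometric_product(a, b) for b in B.T] for a in A])

print(np.allclose(P, P_geo))
print(P)
```

Both routes give the same table, which is why every element of the product matrix can be read off the plot from two vector lengths and one angle.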
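
Practical definition of a biplot
“Any two-way table can be analyzed using a 2D-biplot as soon as it can be sufficiently approximated by a rank-2 matrix.” (Gabriel, 1971)
Matrix decomposition of a G-by-E table, P(4, 3) = G(4, 2) × E(2, 3), where rows g1 to g4 of G hold the (x, y) coordinates of genotypes G1 to G4 and columns e1 to e3 of E hold the (x, y) coordinates of environments E1 to E3:
\[
\begin{bmatrix} 20 & 9 & -6 \\ 6 & -12 & 15 \\ -10 & 6 & -9 \\ 8 & 12 & -12 \end{bmatrix}
=
\begin{bmatrix} 4 & 3 \\ -3 & 3 \\ 1 & -3 \\ 4 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 3 & -3 \\ 4 & -1 & 2 \end{bmatrix}
\]
[Figure: genotypes G1 to G4 and environments E1 to E3 plotted on X and Y axes from the origin O]
(Now 3D-biplots are also possible…)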
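
Whether a table is "sufficiently approximated by a rank-2 matrix" can be judged from its singular values. A minimal sketch, assuming NumPy and an assumed 4 × 3 table built as the product of a 4 × 2 and a 2 × 3 matrix (so it has rank 2 by construction); `goodness_of_fit` is a hypothetical name for the share of the total sum of squares captured by the first two axes.

```python
import numpy as np

# Assumed example: an exact product of a 4x2 and a 2x3 matrix, hence rank 2.
G = np.array([[4, 3], [-3, 3], [1, -3], [4, 0]], dtype=float)
E = np.array([[2, 3, -3], [4, -1, 2]], dtype=float)
P = G @ E

# Singular value decomposition of the two-way table.
U, s, Vt = np.linalg.svd(P, full_matrices=False)

# Share of the total sum of squares explained by the first two axes.
goodness_of_fit = (s[:2] ** 2).sum() / (s ** 2).sum()

# Rank-2 reconstruction: what a 2D biplot can display.
P_rank2 = (U[:, :2] * s[:2]) @ Vt[:2]

print(np.linalg.matrix_rank(P), goodness_of_fit)
print(np.allclose(P_rank2, P))
```

For real trial data the third and later singular values are not zero, so the same ratio tells how much of the table a 2D biplot leaves out.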

Singular Value Decomposition (SVD) & Singular Value Partitioning (SVP)
SVD:
\[ Y_{ij} = \sum_{k=1}^{r} a_{ik} \lambda_k b_{kj} \]
SVP:
\[ Y_{ij} = \sum_{k=1}^{r} \left(a_{ik} \lambda_k^{f}\right) \left(\lambda_k^{1-f} b_{kj}\right), \qquad 0 \le f \le 1 \]
• a_ik: matrix characterising the rows
• λ_k: “singular values”
• b_kj: matrix characterising the columns
• r: the ‘rank’ of Y, i.e., the minimum number of PCs required to fully represent Y
• Row scores: a_ik λ_k^f
• Column scores: λ_k^(1−f) b_kj
• Plot of row scores + plot of column scores = biplot
SVD = PCA?
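
Singular value partitioning can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the GGEbiplot implementation: `biplot_scores` is a hypothetical function name, and the example table is an assumed rank-2 matrix so that two axes reproduce it exactly.

```python
import numpy as np

def biplot_scores(Y, f, n_axes=2):
    """Return (row_scores, col_scores) for a partitioning factor f in [0, 1]."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    U, s, Vt = np.linalg.svd(np.asarray(Y, dtype=float), full_matrices=False)
    U, s, Vt = U[:, :n_axes], s[:n_axes], Vt[:n_axes]
    row_scores = U * s ** f                        # a_ik * lambda_k^f
    col_scores = (s ** (1.0 - f))[:, None] * Vt    # lambda_k^(1-f) * b_kj
    return row_scores, col_scores

# Assumed rank-2 example table (4 rows, 3 columns).
Y = np.array([[20, 9, -6], [6, -12, 15], [-10, 6, -9], [8, 12, -12]], dtype=float)

for f in (0.0, 0.5, 1.0):
    rows, cols = biplot_scores(Y, f)
    # Whatever f is, the row and column scores multiply back to the same table.
    print(f, np.allclose(rows @ cols, Y))
```

The choice of f moves the singular values between the row and the column scores; it changes the picture but not the product the picture represents.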

Biplot interpretations
• Inner-product property
• Interpretations based on biplots with f = 1
– approximates YYᵀ, the distance matrix
– similarity/dissimilarity among row (genotype) factors
• Interpretations based on biplots with f = 0
– approximates YᵀY, the variance matrix
– similarity/dissimilarity among column (environment) factors
• Combined use of f = 0 and f = 1
(Gabriel, 2002, Biometrika; Yan, 2002, Agron J; built into the GGEbiplot software)
\[ Y_{ij} = \sum_{k=1}^{r} \left(a_{ik} \lambda_k^{f}\right) \left(\lambda_k^{1-f} b_{kj}\right) \]
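
Why f = 1 and f = 0 lead to YYᵀ and YᵀY follows from the partitioned decomposition. A short derivation, assuming that all r axes are retained, that the columns of A = (a_ik) and the rows of B = (b_kj) are orthonormal, and writing Λ = diag(λ_1, …, λ_r); the symbols R_f and C_f for the row-score and column-score matrices are introduced here only for illustration.

```latex
\begin{align*}
Y &= A \Lambda B, \qquad R_f = A \Lambda^{f}, \qquad C_f = \Lambda^{1-f} B, \qquad R_f C_f = Y \\
f = 1:\quad R_1 R_1^{T} &= A \Lambda^{2} A^{T} = A \Lambda B B^{T} \Lambda A^{T} = Y Y^{T} \\
f = 0:\quad C_0^{T} C_0 &= B^{T} \Lambda^{2} B = B^{T} \Lambda A^{T} A \Lambda B = Y^{T} Y
\end{align*}
```

When only the first two axes are plotted, these equalities become approximations whose quality depends on the goodness of fit.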

Biplot analysis is…
to use biplots to display
– a two-way data set per se (Y),
– its distance matrix (YYᵀ), and
– its variance matrix (YᵀY)
so that
– relationships among rows,
– relationships among columns, and
– interactions between rows and columns
can be graphically visualized.

Data centering prior to biplot analysis
• The general linear model for a G-by-E data set (P)
– P = M + G + E + GE
• Possible two-way “tables” (Y):
– Y = P = M + G + E + GE: original data (QQE biplot)
– Y = P – M = G + E + GE: global-centered (PCA)
– Y = P – M – E = G + GE: column-centered (GGE biplot)
– Y = P – M – G = E + GE: row-centered
– Y = P – M – G – E = GE: double-centered (GE biplot)
All models are useful, depending on the research objectives (built into GGEbiplot)
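
The five centering options can be written directly from the model P = M + G + E + GE. A minimal sketch, assuming genotypes are rows and environments are columns of a complete table; the function name `center` and its string labels are hypothetical, and the example values are assumed.

```python
import numpy as np

def center(P, model):
    """Return the two-way table Y for one of the five centering models."""
    P = np.asarray(P, dtype=float)
    M = P.mean()                                  # grand mean
    G = P.mean(axis=1, keepdims=True) - M         # genotype (row) main effects
    E = P.mean(axis=0, keepdims=True) - M         # environment (column) main effects
    if model == "original":
        return P                                  # M + G + E + GE
    if model == "global":
        return P - M                              # G + E + GE
    if model == "column":
        return P - M - E                          # G + GE
    if model == "row":
        return P - M - G                          # E + GE
    if model == "double":
        return P - M - G - E                      # GE
    raise ValueError(f"unknown centering model: {model}")

# Assumed example table: 4 genotypes (rows) by 3 environments (columns).
P = np.array([[20, 9, -6], [6, -12, 15], [-10, 6, -9], [8, 12, -12]], dtype=float)

Y_gge = center(P, "column")
print(Y_gge.mean(axis=0))   # column means of a column-centered table
```

Column-centering removes M and E and leaves G + GE, which is the table a GGE biplot displays.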

Data scaling prior to biplot analysis
• Different GGE biplots
• Y_ij = (α_i + φ_ij) / s_j, where α_i is the genotype main effect (G), φ_ij is the genotype-by-environment interaction (GE), and s_j is the scaling factor for environment j
• s_j = 1: no scaling
• s_j = (s.d.)_j: all environments are equally important
• s_j = (s.e.)_j: heterogeneity among environments is removed
(built into GGEbiplot)
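
The three scaling options differ only in the divisor s_j. A hedged sketch, not the GGEbiplot code: `gge_table` is a hypothetical name, the standard deviation is taken over genotype means with `ddof=1` as an assumption, and the per-environment standard errors must be supplied by the caller because they come from replicated data that the two-way table does not contain.

```python
import numpy as np

def gge_table(P, scaling="none", se=None):
    """Column-center P (leaving G + GE) and divide each environment by s_j."""
    P = np.asarray(P, dtype=float)
    Y = P - P.mean(axis=0)                   # alpha_i + phi_ij
    if scaling == "none":
        s = np.ones(P.shape[1])              # s_j = 1
    elif scaling == "sd":
        s = Y.std(axis=0, ddof=1)            # s_j = (s.d.)_j
    elif scaling == "se":
        if se is None:
            raise ValueError("per-environment standard errors are required")
        s = np.asarray(se, dtype=float)      # s_j = (s.e.)_j, supplied by the caller
    else:
        raise ValueError(f"unknown scaling: {scaling}")
    return Y / s

# Assumed example table: 4 genotypes (rows) by 3 environments (columns).
P = np.array([[20, 9, -6], [6, -12, 15], [-10, 6, -9], [8, 12, -12]], dtype=float)
print(gge_table(P, "sd").std(axis=0, ddof=1))
```

With s.d. scaling every environment ends up with unit spread among genotypes, which is the sense in which all environments become equally important.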

Four questions must be asked before trying to interpret a biplot
1. What is the model?
– How were the data centered and scaled?
– What are we looking at?
2. What is the goodness of fit?
– How confident are we about what we see?
– What if the data are fitted poorly?
3. How are the singular values partitioned?
– What questions can be asked?
4. Are the axes drawn to scale?
– Are the patterns artifacts?
(All are addressed explicitly in GGEbiplot)

Biplot analysis of G-by-E data
• Mega-environment analysis
• Test environment evaluation
• Genotype evaluation

Sample G-by-E data
(Yield data of 18 genotypes in 9 environments, 1993, Ontario, Canada)

Before trying to interpret a biplot…
1. Model selection? Centering = 2 (“G+GE”); Scaling = 0
2. Goodness of fit? 78%.
3. Singular value partitioning? SVP = 2 (environment-metric)
4. Draw to scale? Yes.

G by E data analysis: mega-environment analysis
• A mega-environment is a group of geographical locations that share the same (set of) best genotypes consistently across years.

Relationships among environments
The “environment-vector” view
• Angle vs. correlation
• The angles among test environments
• Environment grouping
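
The link between angle and correlation can be made concrete. In the sketch below the data are hypothetical random numbers (not the Ontario yield data), the table is column-centered, and environment scores are taken in environment-metric form (f = 0). With all axes retained, the cosine of the angle between two environment vectors equals their correlation exactly; with two axes it is the approximation a 2D biplot shows. `cosine_matrix` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(18, 9))       # hypothetical table: 18 genotypes x 9 environments
Y = P - P.mean(axis=0)             # column-centered (G + GE)

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
env = (s[:, None] * Vt).T          # environment-metric scores, one row per environment

def cosine_matrix(vectors):
    """Cosines of the angles between all pairs of row vectors."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T

cos_full = cosine_matrix(env)            # using every axis
cos_2d = cosine_matrix(env[:, :2])       # using only the two plotted axes
corr = np.corrcoef(Y, rowvar=False)      # correlations among environments

print(np.allclose(cos_full, corr))
print(np.abs(cos_2d - corr).max())       # size of the 2D approximation error
```

An acute angle therefore suggests a positive correlation, a right angle no correlation, and an obtuse angle a negative correlation, subject to the goodness of fit of the biplot.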

“Which-won-where”
(Crossover GE is GE that causes genotype rank changes and different “winners” in different test environments)
[Figure: “which-won-where” view of the GGE biplot; labelled genotypes include G12, G7, G18, G8, and G13]

Is there meaningful crossover GE?
The “which-won-where” view

Are the crossover patterns* repeatable?
• If YES…
– The target environment can be divided into multiple mega-environments
– GE can be exploited by selecting for each mega-environment
– GE → G
• If NO…
– The target environment CANNOT be divided into multiple mega-environments
– GE CANNOT be exploited
– GE must be avoided by testing across locations and years
*Not the environment-grouping patterns
• A mega-environment is a group of geographical locations that share the same (set of) best genotypes consistently across years.
• Multi-year data are needed

Classify your target environment into one of three categories
(1) No crossover GE: single simple ME. A single test location in a single year suffices to select a single best variety.
(2) Crossover GE, repeatable: multiple MEs. Select for specifically adapted genotypes for each ME.
(3) Crossover GE, not repeatable: single complex ME. Select for generally adapted genotypes across the whole region and across multiple years.
ME: mega-environment

G by E data analysis: test environment evaluation

Discriminating ability and representativeness
• Vector length: discriminating ability
• Angle to the AE: representativeness
[Figure labels: average-environment axis; average environment (AE)]
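
Both quantities can be computed from the biplot coordinates of the test environments. A minimal sketch, assuming the average environment is the mean of the environments' two plotted scores and that this mean is not at the origin; the function name `summarize_test_environments` and the example coordinates are hypothetical.

```python
import numpy as np

def summarize_test_environments(env_xy):
    """Return (vector_length, angle_to_AE_in_degrees) for each environment.

    env_xy holds one row per environment with its two biplot coordinates.
    """
    env_xy = np.asarray(env_xy, dtype=float)
    ae = env_xy.mean(axis=0)                          # average environment (assumed definition)
    length = np.linalg.norm(env_xy, axis=1)           # discriminating ability
    cos_to_ae = env_xy @ ae / (length * np.linalg.norm(ae))
    angle = np.degrees(np.arccos(np.clip(cos_to_ae, -1.0, 1.0)))  # representativeness
    return length, angle

# Hypothetical coordinates for three test environments.
env_xy = np.array([[2.0, 4.0], [3.0, -1.0], [-3.0, 2.0]])
length, angle = summarize_test_environments(env_xy)
for name, l, a in zip(("E1", "E2", "E3"), length, angle):
    print(name, round(l, 3), round(a, 1))
```

A long vector with a small angle to the average environment marks a test environment that is both discriminating and representative.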

Ideal test environments: discriminating and representative
[Figure label: ideal test environment]

Classify each test environment into one of three categories
(1) Not discriminative: useless
(2) Discriminative and representative: good for selecting (more important)
(3) Discriminative but not representative: useful for culling (less important)
• For each “good” or “useful” test environment: is it essential?

Vector length = discrimination = GE = GE1 + GE2
• GE1: contribution to proportionate GE
• GE2: contribution to non-proportionate GE

G by E data analysis: genotype evaluation

Vector length = GGE = G + GE
• Contribution to GE (instability)
• Contribution to G (mean performance)

Mean vs. Stability
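
One way to read mean and stability off a GGE biplot is to project each genotype onto the average-environment axis and onto the direction perpendicular to it. The sketch below assumes that reading; `mean_and_stability` is a hypothetical function name and the coordinates are made-up illustration values.

```python
import numpy as np

def mean_and_stability(geno_xy, env_xy):
    """Return (mean_score, instability) for each genotype.

    mean_score: projection onto the average-environment axis (contribution to G).
    instability: absolute projection onto the perpendicular (contribution to GE).
    """
    geno_xy = np.asarray(geno_xy, dtype=float)
    ae = np.asarray(env_xy, dtype=float).mean(axis=0)   # average environment
    u = ae / np.linalg.norm(ae)                         # unit vector along the axis
    perp = np.array([-u[1], u[0]])                      # unit vector perpendicular to it
    return geno_xy @ u, np.abs(geno_xy @ perp)

# Hypothetical coordinates: two environments and three genotypes.
env_xy = np.array([[2.0, 1.0], [2.0, -1.0]])
geno_xy = np.array([[3.0, 0.0], [1.0, 2.0], [-1.0, -1.0]])

mean_score, instability = mean_and_stability(geno_xy, env_xy)
order = np.argsort(-mean_score)     # genotypes ranked by mean performance, best first
print(mean_score, instability, order)
```

Ranking is done on the mean score first; the instability value then serves only as a modifier of that ranking.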

Genotype ranking on both MEAN and STABILITY
[Figure label: “the ideal genotype”]

Genotype classification (by mean and stability)
• High mean performance, high stability: generally adapted (VERY GOOD)
• High mean performance, low stability: specifically adapted (GOOD)
• Low mean performance, high stability: bad everywhere (VERY BAD)
• Low mean performance, low stability: bad somewhere (BAD)
Are there stability genes?!

G x E data analysis summary
• 1) Mega-environment analysis
• 2) Test environment evaluation
• 3) Genotype evaluation
Important comments:
– (2) and (3) are meaningful only for a single mega-environment
– Any stability analysis is meaningful only for a single mega-environment
– Any stability index can be used only as a modifier to the ranking based on mean performance

Other ways to view a GGE biplot
• Inner-product property
• Ranking on a single environment
• Ranking on two environments
• Relative adaptation of a genotype
• Compare any two genotypes

Biplot analysis of genotype by trait data

Objectives of G by T data analysis
• Genotype evaluation based on trait profiles
• Relationship among breeding objectives

Data of 4 traits for 19 covered oat varieties (Ontario 2004)
(Background info: High yield, high groat, high protein, and low oil are desirable for milling oats)

Views of the genotype by trait biplot:
• Relationships among traits
• Trait profile of each genotype
• Trait profile of a genotype
• Trait profile comparison between two genotypes
• Genotype ranking based on a trait
• Parent selection based on trait profiles
• Independent culling

Fuller understanding of MET data
MET data are more informative than you thought

A G-E-T 3-way dataset contains various 2-way tables
• G by E data
• G by T data
• E by T data:
– for each genotype; all genotypes
• G by V data:
– each E-T as a variable (V)
• P by T data:
– each G-E as a phenotype (P)
• Genetic association by environment data
• Trait association by environment data

Genetic-covariate by environment biplot (QTL by environment biplot)
[Figure: barley genomics data]

Trait-association by environment biplot
[Figure: oat MET data]

Four-way data analysis
• Year…

Conclusions

Conclusion (1)
• “GGE biplot analysis” is an effective tool for G by E data analysis. It helps in understanding
1. the target environment,
2. the test environments, and
3. the genotypes.
4. Stability analysis is useful only within a single mega-environment.

Conclusion (2)
• “GGE biplot analysis” is an effective tool for G by T data analysis. It helps in understanding
1. the interconnected plant system,
2. positively correlated traits,
3. negatively correlated traits, and
4. the strengths and weaknesses of the genotypes.

Conclusion (3)
• “Biplot analysis” is an effective tool for other two-way table analysis
– Marker by environment
– QTL by environment
– Gene by treatment
– Diallel cross
– …

Conclusion (4)
• Biplot analysis can be VERY EASY…
– From reading data to displaying the biplot: 2 seconds
– Displaying any of the perspectives of a biplot and changing from one to another: 1 second
– Displaying the biplot for any subset: 1 second
– Learning how to use the software and interpret biplots: 30 minutes
– Everything can be just one mouse-click away

Thank you
Contact: Weikai Yan, wyan@ggebiplot.com
web: www.ggebiplot.com