View
2
Download
0
Category
Preview:
Citation preview
Statistical Analysis of Genetic and Phenotypic Datafor Breeders:
Hands on Practical Sessions (GBLUP-RR)
Paulino Pérez 1
José Crossa 2
1ColPos-México 2CIMMyT-México
June, 2015.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)1/34
Contents
1 General comments
2 GBLUP-Ridge Regression
3 Application examples
4 Biplot from marker effects
5 Extension of BRR to include infinitesimal effect
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)2/34
General comments
General comments
Remember,
1 A simple model used frequently in plant breeding stands that thephenotypic value of an individual (P) is expressed as the summation ofthe genetic value (G) and the residual environmental effect (E):
P = G + E , (1)
where G includes additive, dominance and epistatic effects.2 A model that includes solely additive effects (A) can be easily derived
from (1), and can be expressed as follows,
P = A + E ′ (2)
where E ′ includes effects that are non additive.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)3/34
General comments
Continue...
The breeding value (BV ) for an individual can be computed based on narrowsense heritability (h2),
BVi = µ+ h2(yi − µ),
where µ is mean phenotypic value of a population and yi is the phenotypicvalue for individual i .Obviously it is necessary to have information of parents and offsprings tocompute this.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)4/34
General comments
Continue...In Genomic Selection (GS), genetic values are approximated using linearregression (Meuwissen et al., 2001), that is:
yi = gi + ei = µ+
p∑j=1
xijβj + ei (3)
Relationships between markergenotypes (x1i : 0 and 1) andphenotypes (yi ) of the individuals(open circles) in a training population.If the marker genotype is correlatedwith the phenotype, segregation ismodelled using the bold line (takenfrom Nakaya and Isobe, 2012).
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)5/34
General comments
Continue...
In GS it is possible to obtain Genomic Estimated Breeding Values (GEBVsfor short). This can be done simply by adding up marker effects (according toits marker genotypes) obtained from a training population, that is:
GEBVi =
p∑j=1
xij βj (4)
Next we show how to obtain the predictions yi (and in some cases βj ) usingseveral models.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)6/34
General comments
Continue...
Figure 1: Graphical representation of parametric and non-parametric methods usedcommonly in whole-genomic prediction.
In this presentation we will focus in Ridge Regression.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)7/34
General comments
Continue...
−6 −4 −2 0 2 4 6
0.0
0.2
0.4
0.6
0.8
βj
p(β j
)
GaussianDouble Exponential
Scaled−t (5df)BayesC (π=0.25)
Figure 2: Prior densities of regression coefficients with Markers.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)8/34
GBLUP-Ridge Regression
GBLUP-RR
This is the most basic model used in GS. Let
yi = gi + ei = µ+
p∑j=1
xijβj + ei
marker effects are obtained by solving the following optimization problem,
minβ, λ{(y −
∑Xjβj)
′(y −∑
Xjβj) + λ∑
β2j
}, (5)
where λ > 0 is a regularization parameter.Notes:
1 λ is unknown and can be selected by using cross-validation2 we need to minimize a “penalized sum of squares”.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)9/34
GBLUP-Ridge Regression
Continue...
The optimization problem has a closed solution,
β = (X ′X + λI)−1X ′y ,
where y = y − µ1. Unfortunately, we need to know the value of λ to use thissolution.
The problem can be solved easily using the Bayesian framework.
Let β ∼ N(0, σ2β I) and e ∼ N(0, σ2
e I), and u = Xβ, then model (3) can bewritten as:
y = µ1 + u + e (6)
Model (6) is know as GBLUP. Note that u ∼ N(0, σ2βXX ′)
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)10/34
GBLUP-Ridge Regression
Training and testing sets
Note also that the covariance matrix for u involves the product XX ′, which isproportional to the Genomic Relationship Matrix proposed by VanRaden(2008).
We will assume that u ∼ N(0, σ2uG) with G = XX ′/k . The mix-model
equations for (6) are as follows:(1′1σ−2
e 1′σ−2e
1′σ−2e Iσ−2
e σ−2u + Gσ−2
u
)(µu
)=
(1′yy
)(7)
u and µ are obtained solving the mix-model equations, assuming that thevariance components σ2
e and σ2u are known.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)11/34
GBLUP-Ridge Regression
Continue...
If we have individuals for training and testing, we can partition G and u asfollows,
G =
(G11 G12G21 G22
),u =
(u1u2
),y =
(y1y2
),1 =
(1112
)1=individuals in the training set, 2=individuals in the testing set. µ and u1 areobtained as the solution of the mix-model equations,(
1′111σ−2e 1′1σ
−2e
1′1σ−2e I11σ
−2e σ−2
u + G11σ−2u
)(µu1
)=
(1′1y1y1
)The predictions for individuals in the testing set are given by
y2 = µ12 + G21G−111 u1
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)12/34
Application examples
Wheat dataset
Data for n = 599 wheat lines evaluated in 4 environments, wheatimprovement program, CIMMyT. The dataset includes p = 1279 molecularmarkers (xij , i = 1, ...,n, j = 1, ...,p) (coded as 0,1). The pedigree informationis also available.
Lets load the dataset in R,1 Load R2 Install BGLR package (if not yet installed)3 Load the package4 Load the data
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)13/34
Application examples
Continue...
Figure 3: Loading the BGLR package and the wheat dataset.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)14/34
Application examples
Continue...You can explore the MM matrix, pedigree matrix within R,
fix(wheat.X)fix(wheat.A)
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)15/34
Application examples
Continue...
Lets assume that we want to predict the grain yield for environment 1 usingridge regression or equivalently the GBLUP. We do not know the value for σ2
eand λ, so we can obtain estimates using the data.
We will use the function BGLR. R code below fit the RR model using Bayesianapproach with non informative priors for σ2
e , σ2β ,
rm(list=ls())library(BGLR)data(wheat)
X=wheat.XY=wheat.Y
#Linear predictorETA=list(list(X=X,model="BRR"))
#Or#ETA=list(Markers=list(X=X,model="BRR"))
fmR<-BGLR(y=Y[,1],ETA=ETA,nIter=10000,burnIn=5000,thin=10)plot(fmR$yHat,Y[,1])
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)16/34
Application examples
Continue...
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
● ●
●
●●
●
●
●●
●
●
●
● ●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●●
●
● ●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●●
●
●
●●
●
●
●
●
●
●●
● ●
●●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
● ●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
−2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0
−2
−1
01
23
y
y
Figure shows observed vs predictedgrain yield.
Predictions y = µ+X β, and estimatesfor σ2
e , σ2β can be obtained easily in R
> fmR$yHat> fmR$varE[1] 0.5481523> fmR$ETA[[1]]$varB[1,] 0.002721897>
From the output above,λ = σ2
e/σ2β = 0.5482/0.0027 = 201.38
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)17/34
Application examples
A complete list of the objects attached to fmR can be obtained by typing
> names(fmR)[1] "y" "whichNa" "saveAt" "nIter" "burnIn" "thin" "weights"[8] "verbose" "response_type" "df0" "S0" "yHat" "SD.yHat" "mu"[15] "SD.mu" "varE" "SD.varE" "fit" "ETA"
A complete description of the output can be found in the BGLR referencemanual.
The GEBVs can be obtained easily in R,
#GEVBs
#option 1X%*%fmR$ETA[[1]]$b
#option 2fmR$yHat-fmR$mu
#Or fmR$ETA$Markers$b
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)18/34
Application examples
Training and testing set
Lets assume that we want to predict the grain yield for some wheat lines.Assume that we have only the genotypic information for those lines,
#Training and testing setsets<-wheat.setsy<-Y[,1]yNa=ywhichNa=(sets==2)yNa[whichNa]=NAfmR<-BGLR(y=yNa,ETA=ETA,nIter=10000,
burnIn=5000,thin=10)
plot(fmR$yHat,y,xlab="Phenotype",ylab="Pred. Gen. Value" ,cex=.8,bty="L")
points(x=y[whichNa],y=fmR$yHat[whichNa],col=2,cex=.8,pch=19)legend("topleft", legend=c("training","testing"),bty="n",
pch=c(1,19),col=c("black","red"))
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)19/34
Application examples
Continue...
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●●
●
● ●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
● ●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
−2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5
−2
−1
01
23
Phenotype
Pre
d. G
en. V
alue
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
trainingtesting
> MSE.tst<-mean((fmR$yHat[whichNa]-y[whichNa])^2)> MSE.tst[1] 0.8110028> MSE.trn<-mean((fmR$yHat[-whichNa]-y[-whichNa])^2)> MSE.trn[1] 0.4364856> COR.tst<-cor(fmR$yHat[whichNa],y[whichNa])> COR.tst[1] 0.4338218> COR.trn<-cor(fmR$yHat[-whichNa],y[-whichNa])> COR.trn[1] 0.7839615
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)20/34
Application examples
Questions?
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)21/34
Application examples
Exercise
Suppose that we want to predict the grain Yields for individuals in set 2 andenvironment 4.
Write an R program to solve the problem described aboveObtain the correlations in the training setObtain the correlations in the testing setWrite the predictions to a csv (comma separated values) so that you canread the file in Excel.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)22/34
Application examples
Continue...
Suppose that we are interested in studying the predictive power of GBLUP.We can perform a simulation study to that end. In this exercise you willperform 10 fold cross validation.
Write an R program to perform a 10 fold cross-validation, use the useobject sets to allocate observations to folds.Report your results.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)23/34
Biplot from marker effects
Biplots (The theory...)
A biplot is a two-dimensional representation of a data matrix C showinga point for each of the n observation vectors (rows of C) along with apoint for each of the p variables (columns of C), see Gabriel, 1971.Perform the SVD of C, that is C = UDV ′
Let Up×q = [α1, ...,αq] and V q×q = [γ1, ...,γq].plot α1 vs α2
Draw arrows, the coordinates of the end of the arrow are given in γ1 andγ2.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)24/34
Biplot from marker effects
Continue...
●
● ●
●●
●●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●●
●●
●
●
●
●
●
●
●●
●
● ●●●
●
●
●
● ●●●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●●
● ●●
●
●
●
● ●●
●
●●
●
●
●
●● ●
●●●
●
●
● ●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●●
●●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●●●
●
●
● ●●
●●
●
●●●
●●●
●
●●
●
●
●
●
●●
●●
●
●●
●●●
●●
●
●
●
●
●●
●
●
●●●
●●
●
●
● ●
●
●
●●
●
●●
●
●●
●
●
●
●
●●
●
●● ●
●●
●
●●
● ●●
●
●
●
●●
●
●●
●
●
●
●●
●●●●
●●
●
●●
●
●
●
●●
●
●●
●
●
●●●
●
●
●
● ●
●●
●
●
● ●
●
●
●
● ●
●
●
●
●●
●
●
●
●●
●
●●
●●●●
●
● ● ●
●
●
●
●●●
●
●●●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
● ●
●
●
●
●
●
●●●
●● ●
●
●● ●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●●
●●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●
● ●
●
●●
●●●
●
●
●
●
●●
●●
●
●●●
●
●
●
●
●
●●●●
●
●●
●
● ● ●●
●
●
●
●
●
●
●●● ● ●●
●●
●
●●
●●●
●
●
●
●
●
●
●
●●
●●
●
●●●
●
●
●●
●
● ●●
●
●
●
●
●
●●
●●
●
●● ●
●
●●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
● ●
●●
●
●
●
●
●
●●
●● ●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●●●
●●
● ●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●● ●
●
●● ●
●●●
●
●●
●
●
● ●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●●●
●
●
●
●●
●
●
●●
●
●●
●
●●●
● ●
●
●●
●
●
●
●
●
●● ●
● ●
●
●
●●
● ●●
●
●●
●
●●
●
●
●●
●●
●●
●●
●● ●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
● ●
●
●
● ●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
● ●
● ●
●●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
● ●●
●
●●
●●
●●●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●●
● ●
●
●●●
●●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●●
●
●
●
●●
●
●
●●
●●
●●●●
●●
●●●●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●●●●
●
●
●
●●
● ●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●●
●
●
● ●
●
●
●●
●● ●
●
●
●●● ●●
●
●●
●
●
●
●
●
●
●
●
●
● ●●●
●
●
●●
●
●
●
● ●
●
●●
●
●
●
●●●
●
●
●●
●
●●
●
●●●●
●●
●
●
●
●●●
●●
●
●
●
●●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●●●
●
●●●
●
●
●● ●
●
●
●●
● ●
●
●●● ●
●
●
●
●●
●
●●
●
●
●●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●●
● ●
●
●
●●
●
●
●
●●
●
●● ●
●●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●● ●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
● ●
●
●
●
●
●
●
● ●
●
●●
●
●
−0.1 0.0 0.1 0.2
−0.
10.
00.
10.
2
Comp.1 (50.17%)
Com
p.2
(24.
27%
)
wPt.4600
wPt.3533
wPt.2644
wPt.3462
wPt.3697
wPt.4988
wPt.6047
wPt.3922
wPt.9256
wPt.3393
wPt.3904
wPt.4706c.345897
c.373879
c.380591c.381717
c.408424
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
E1
E2E3
E4
−20 −10 0 10 20 30 40
−20
−10
010
2030
40
Figure 4: An example of a Biplot derived from markers.CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)25/34
Biplot from marker effects
Continue...
Why is the biplot useful?
1 Points in the biplot are the marker effects projected in the first twocomponents.
2 The “environmental effects” are displayed as vectors whose coordinatesare given by γ1 and γ2.
3 The length of the vectors approximates the variance accounted for by thespecific molecular marker and “environmental effect”.
4 The cosine of the angle between two environments, approximates thecorrelation of the two environments with an angle of zero indicating acorrelation of +1, an angle of 90◦ (or −90◦) a correlation of 0, and anangle of 180◦ a correlation of −1.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)26/34
Biplot from marker effects
R Code
rm(list=ls())
#Set the working directorysetwd("C:/Users/P.P.RODRIGUEZ/Desktop/Slides Paulino/3. GBLUP-RR/examples")
#Function for biplotssource("biplot.R")
#Import the datadata=read.csv("mean_betas_PMBL.csv",header=TRUE)data=data[,2:5]
#Principal component analysispca.betas= princomp(data,cor=TRUE)
#Default biplot in Rbiplot(pca.betas)
#Modified functionmy.biplot.princomp(pca.betas)
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)27/34
Biplot from marker effects
Exercise
1 Use the wheat dataset described in the previous slides and fit aGBLUP-BRR model and save the marker effects for the 4 environmentsto an Excel file (csv).
2 Use the marker effects obtained in Step 1, and create a biplot.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)28/34
Extension of BRR to include infinitesimal effect
Extension of basic model to include infinitesimal effect
de los Campos et al. (2009) extended the basic BRR model to include aninfinitesimal effect, that is:
yi = µ+
p∑j=1
xijβj + ui + ei , (8)
where u ∼ N(0, σ2uA) and A is the pedigree matrix.
The model can be fitted using the standard linear mixed model theory or usingBayesian methods.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)29/34
Extension of BRR to include infinitesimal effect
Example
rm(list=ls())library(BGLR)data(wheat)
X=wheat.XY=wheat.YA=wheat.A
#Linear predictorETA=list(list(X=X,model="BRR"),
list(K=A,model="RKHS"))
fmR<-BGLR(y=Y[,1],ETA=ETA,nIter=10000,burnIn=5000,thin=10)plot(fmR$yHat,Y[,1])
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)30/34
Extension of BRR to include infinitesimal effect
Exercise
Suppose that we want to predict the grain Yields for individuals in set 2 andenvironment 1 using the marker and pedigree information jointly.
Write an R program to solve the problem described aboveObtain the correlations in the training setObtain the correlations in the testing setWrite the predictions to a csv (comma separated values) so that you canread the file in Excel.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)31/34
Extension of BRR to include infinitesimal effect
Marker based vs Maker + Pedigree based model
How can we design a simulation to study prediction ability of a model withmarkers only and a model that includes Markers + Pedigree?
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)32/34
Extension of BRR to include infinitesimal effect
Concluding remarks
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)33/34
Extension of BRR to include infinitesimal effect
References
de los Campos G., J. Hickey, R. Pong-Wong, H. D. Daetwyler and M.P.L.Calus,. 2012.Whole Genome Regression and Prediction Methods Applied to Plant andAnimal Breeding,Genetics.
Gabriel, R. K. 1971.The Biplot Graphical Display of Matrices with Application to PrincipalComponent Analysis.Biometrika, 58, 453-467.
Endelman, J. B., 2011.Ridge Regression and Other Kernels for Genomic Selection with Rpackage rrBLUP,The Plant Genome 4(3): 250-255.
CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)34/34
Recommended