Dr. Zulfi Jahufer & Dr. Dongwen Luo
This quick start manual provides users with basic operating instructions. The example
data sets, especially the 3 case studies, can be used to practice different analyses. Results from analysis of
the case study data are published in:
Jahufer, M.Z.Z., Luo, D. (2018). “DeltaGen” - a comprehensive decision support tool for plant breeders.
Crop Science 58: 1-14. doi:10.2135/cropsci2017.07.0456
DeltaGen: Quick start manual
CONTENTS
➢ Main operations tab commands
➢ Uploading a data file
▪ Matching variable identifiers
➢ Data check
➢ Univariate analysis
➢ Pattern analysis (within univariate model option)
➢ Univariate analysis – Two trait combination
➢ Guide to calculating the cost of generating a GEBV – “Sample cost”
➢ Estimation of genetic gain (G) and its simulation
➢ Multivariate analysis
▪ Using Plot
o Biplots (on raw data)
o Matrix plots (phenotypic correlation - on raw data)
▪ MANOVA (additive variance/co-variance & correlation)
▪ Smith-Hazel selection index
➢ Pattern analysis for multiple traits
➢ Trial designs
o Completely randomized
o Randomized complete block
o Factorial design
o Row & column design
o Row & column design with repeated checks
➢ Generating a row-column design Data Entry Spreadsheet
▪ Data entry spreadsheet generated within DeltaGen
▪ Importing a data entry sheet not generated within DeltaGen
➢ Save session and Quit
Please note that clicking on “Help” in the analysis screens will provide information on underlying theory
& associated references.
Uploading data and downloading results in DeltaGen is more efficient if the program
is run using Google Chrome or Firefox.
Main operations tab commands
Clicking on any of the commands above will open dropdown menus.
Introduction
This window is displayed when DeltaGen is opened.
Trial Design
This command will open a screen that will provide a range of experimental designs that DeltaGen can generate. A full description of this command will be provided under experimental design, the last section in this manual.
Data Input: uploading a data file
Clicking on Data Input will open the screen shown on the next page.
Uploading a data file
Shows data files opened
Upload enables data files to be uploaded using Browse; Examples enables practice data sets
within DeltaGen to be uploaded; Clipboard enables copied data (CSV format) to be uploaded.
Click to upload external data files
Data file types accepted by DeltaGen
! CSV files are preferred – save Excel data files in CSV format for uploading into DeltaGen.
! Missing values in the data matrix must be identified before uploading any data set.
The dropdown menu provides 3 options for the missing-value marker: Empty, * or •
If missing values are not defined before uploading, the data analysis will abort.
! Data file names cannot contain spaces between words.
Follow STEPS 1 & 2 to upload a data set from an external file.
Files can be saved in RData or CSV formats
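The upload requirements above (CSV format, an explicit missing-value marker, no spaces in the file name) can be checked before opening DeltaGen. The following is a minimal sketch in Python, not part of DeltaGen; the function names and the sample rows are hypothetical and only illustrate the preparation steps.

```python
import csv
import io

MISSING_MARKER = "*"  # one of the three markers the manual lists: Empty, * or •


def safe_file_name(name: str) -> str:
    """Remove gaps between words, e.g. 'Case Study 1.csv' -> 'CaseStudy1.csv'."""
    return "".join(name.split())


def mark_missing(rows, marker=MISSING_MARKER):
    """Replace blank cells with one explicit missing-value marker."""
    return [[cell.strip() or marker for cell in row] for row in rows]


def to_csv_text(rows) -> str:
    """Serialise the rows as CSV text, one record per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical example data: one dry matter (DM) value is missing.
rows = [
    ["Site", "Rep", "Line", "DM"],
    ["Rua", "1", "L01", "5.2"],
    ["Rua", "2", "L01", ""],
]
cleaned = mark_missing(rows)
csv_text = to_csv_text(cleaned)
file_name = safe_file_name("Case Study 1.csv")
```

Whichever marker is used in the file, the same marker then has to be chosen in the missing-value dropdown menu when uploading.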
Matching variable identifiers (this step can be omitted when using the example data sets in DeltaGen)
After uploading a data set following steps 1 & 2, the column identifiers of the variables in the data
matrix have to be matched with those already defined in DeltaGen (Year, Season, Location, Replicates,
Row, Column, Sample and Line), as shown in STEP 3.
STEP 3 - To match a DeltaGen variable with an associated column identifier in the data matrix,
click on the relevant dropdown menu and choose the matching column name; e.g. for Location, the
column “Site” was selected and, similarly, “Rep” for Replicates. If a
variable is not in the uploaded data, leave it as “Null”.
STEP 4 - Clicking on Run will submit the data for analysis. You are now ready to check or analyse your data.
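The matching step amounts to a lookup from each DeltaGen variable to a column name in the uploaded file, with “Null” for variables that are absent. A minimal sketch of that idea follows; it is an illustration only, and the function name and example columns are assumptions rather than anything DeltaGen exposes.

```python
# The eight variables DeltaGen defines, as listed in the manual.
DELTAGEN_VARIABLES = [
    "Year", "Season", "Location", "Replicates", "Row", "Column", "Sample", "Line",
]


def match_identifiers(columns, choices):
    """Return DeltaGen variable -> column name, using 'Null' when unmatched."""
    mapping = {}
    for variable in DELTAGEN_VARIABLES:
        column = choices.get(variable, "Null")
        if column != "Null" and column not in columns:
            raise ValueError(f"{column!r} is not a column in the uploaded data")
        mapping[variable] = column
    return mapping


# Hypothetical uploaded file with columns Site, Rep, Line and DM.
columns = ["Site", "Rep", "Line", "DM"]
mapping = match_identifiers(
    columns, {"Location": "Site", "Replicates": "Rep", "Line": "Line"}
)
```

Columns that are not matched to any DeltaGen variable (here DM) are the candidate traits.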
Data check: A graphical or tabular summary of the raw data is an optional data quality check before univariate or multivariate analysis.
The plot-type dropdown menu provides a range of plots (Histogram, Density, Scatter, Line, Box-plot) to illustrate the data.
• First click on the X-variable, in this case DM (dry matter).
• You can arrange the plots by defining the row and column layout; in the example presented using Histograms, the Locations are presented down the rows and the dry matter in each of the 3 replicates within each location across the columns.
Data check continued. The heat-map option in the dropdown menu under Pivot Table illustrates the actual values and
spatial distribution of summer dry matter raw data across a field experiment based on a row-column experimental design
consisting of 3 replicates. This can also be used to identify data entry errors.
Heat-map key: High value, Missing data, Low value
Clicking on these headings will show the associated data below.
All factors, e.g. Replicates, Column and Row, can be moved across: point at a factor, hold down the left mouse button and drag. This changes the configuration of the table.
Univariate analysis: Case study 1
After uploading the data, click on the “Models” command and select Univariate. This screen will open.
The Data Information panel provides a summary of the uploaded data.
The demonstration/practice data set consists of 107 entries of perennial ryegrass (Lolium perenne L.) evaluated at 3 locations over 3 years for seasonal growth. Data file name: CaseStudy 1 under Examples.
• The default settings for the linear mixed effects model are modelling and half sib family.
• If simulation of genetic gain is to be conducted, the choice of half sib (HS) or full sib (FS) family is
important. Alternatively, if the analysis is not for estimation of genetic components of variance, or is
based on a fixed effects model, you can continue using the HS default option.
• Simulation must be selected only after conducting the variance component analysis for HS or FS.
On opening the univariate analysis screen, the Primary Trait box will be at “Null”. Clicking on this
box will open the dropdown menu with all the traits in the uploaded data set; in the example, NZGro
(seasonal growth at 3 locations in New Zealand).
• Fixed terms: clicking in the fixed terms box will open a dropdown menu that enables you to
select the appropriate factors in the data set (years, locations and seasons in the example). Select
“Null” if no fixed terms are to be included in the model.
• Random terms: select the appropriate factors from the dropdown menu which opens in this box.
Traits for BLUP or BLUE estimates can be selected from the associated dropdown menus. Click Run to begin
the analysis.
Univariate analysis (Case study 1 continued): The linear model - One trait
Replicates nested within locations within seasons within years,
Lines,
Line-by-year interaction,
Line-by-season interaction,
Line-by-location interaction
• Select the Primary trait to be analysed (in our example data set, “NZGro” has been selected),
• Select Fixed terms and their interactions as required (if an additional term that does not
appear in the dropdown menu needs to be added, double click in the fixed terms box, enter
the term and click on the “Add” button that appears),
• Select Random terms and interactions as required (if an additional term that does not
appear in the dropdown menu needs to be added, double click in the random terms box,
enter the term and click on the “Add” button that appears),
• Select Heritability if appropriate (in Case study 1, this was considered as Repeatability),
• Select BLUP if lines are random or BLUE if fixed,
• Click Run to begin analysis.
Univariate analysis (Case study 1 continued): linear mixed model analysis - output
As the lines were considered as random effects in the linear model and the BLUP estimate option was selected, clicking on the BLUP button provides this analysis output: BLUP values for mean growth based on line performance across years, seasons and locations.
Results for Fixed effects
Genotypic variance (σ²g) among the 107 entries
Associated ± standard error
Specifically for case study 1, this estimate was considered as Repeatability.
Error CV of trial
Univariate analysis: Case study 2
The demonstration/practice data set consists of 90 half sib (HS) families of perennial ryegrass (Lolium perenne L.)
evaluated at 1 location over 3 years for seasonal growth.
Data file name: CaseStudy 2 under Examples.
Results for Fixed effects
Genetic variance (¼σ²A) among the 90 HS families
Associated ± standard error
Narrow sense heritability (h²n)
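The ¼σ²A label reflects the standard quantitative-genetics result that the variance among half-sib family means estimates one quarter of the additive genetic variance. As a reminder, with the symbols σ²HS (variance among HS families) and σ²P (phenotypic variance) introduced here purely for illustration:

```latex
\sigma^2_{HS} = \tfrac{1}{4}\,\sigma^2_A
\quad\Longrightarrow\quad
\sigma^2_A = 4\,\sigma^2_{HS},
\qquad
h^2_n = \frac{\sigma^2_A}{\sigma^2_P}
```

How DeltaGen assembles the phenotypic variance for a particular model is not shown in this manual; the Help option in the analysis screen covers the underlying theory.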
Pattern analysis (within Univariate model option) – Multi-location (more than 2 locations)
The demonstration/practice data set consists of 107 entries of perennial ryegrass (Lolium perenne L.) evaluated at 3
locations over 3 years for seasonal growth.
Data file name: CaseStudy 1 under Examples.
Locations:
Rua, Ruakura
PN, Palmerston North
KERI, Kerikeri
The first step in Pattern analysis is to select Line-by-location interaction.
This generates a two-way line-by-location BLUP matrix.
BLUP estimates for each individual line within each location (in the example Rua, PN and KERI) will be generated.
Click on Pattern analysis.
Click Run.
Clicking on Pattern Analysis – Cluster will result in generating
Line groups based on performance across locations and also
associated dendrograms of locations and lines.
Location groups: 1, 2 & 3.
Locations: KERI, Rua, PN
Dendrograms of Location and Line grouping
Clicking on Pattern Analysis – PCA (Principal Component Analysis) will generate a
biplot based on PC1 and PC2, showing the line groups
and individual line labels. The directional vectors
are the locations.
Line clusters
Directional vectors
Univariate analysis (continued): The linear model - Two trait combination
The demonstration/practice data set consists of 147 lines (half-sib families) of switchgrass (Panicum virgatum L.) evaluated across 2
locations over 2 years using randomized complete block designs with 3
replicates. Data on 3 traits, dry matter yield (DMY), cell wall ethanol
(CWE) and Klason lignin (KL), are included.
Data file name: CaseStudy3 under Examples.
Analysis with the secondary trait included provides an opportunity to
simultaneously estimate narrow sense heritability for each trait, and their
genetic correlation.
These outputs are then automatically integrated into the breeding strategy
simulation models for estimation of Correlated Response to Selection of
the primary trait based on secondary trait selection.
All the initial steps for the fixed and random term models of the primary trait are
similar to those of the single trait analysis.
For analysis of the Secondary trait:
• Tick the box for secondary trait and select the trait from the dropdown menu, (From
our example data set, trait KL was selected)
• Click the MANOVA box and select the terms in the dropdown menu to conduct a
variance/covariance analysis,
• Click Run
Univariate analysis Output – two trait (CWE/KL) combination - continued
Results from this analysis are similar to those from
the single trait analysis, but also provide information
on narrow sense heritability of the secondary trait as
well as the genetic correlation between the two
traits, CWE/KL.
Variance components for Random effects
from primary trait analysis.
Narrow sense heritability of primary trait.
Listed below is a guide for calculating the cost of generating a single Genomic Estimated Breeding Value
(GEBV), referred to as Sample Cost in GS simulation.
Step                                       Cost/sample
Genotyping                                 $53
   DNA isolation                           $7
   Library generation                      $9
   †DNA sequencing                         $37
SNP genotypes (bioinformatics)             $10
Prediction of GEBVs (statistical model)    $5
Total                                      $68
Other notes:
Assumes GBS (genotyping-by-sequencing) as the genotyping method.
Sequencing uses an Illumina HiSeq 2500 with version 4 chemistry.
†Cost is for the 96-plex level and will change with the level of multiplexing. The suggested multipliers are: 48-plex ×2, 192-plex ×0.5, 384-plex ×0.25.
‡Dodds, K.G.; McEwan, J.C.; Brauning, R.; Anderson, R.M.; van Stijn, T.C.; Kristjánsson, T.; Clarke, S.M. (2015). Construction of relatedness matrices using
genotyping-by-sequencing data. BMC Genomics 16: 1047.
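The arithmetic behind the Sample Cost can be sketched as follows. The figures are those in the table above; the function name is hypothetical, and applying the multiplexing multiplier to the sequencing step only is an assumption based on the † marker sitting on that row.

```python
# Per-sample costs ($) at the 96-plex level, taken from the cost table.
DNA_ISOLATION = 7.0
LIBRARY_GENERATION = 9.0
SEQUENCING_96_PLEX = 37.0
SNP_GENOTYPES = 10.0
GEBV_PREDICTION = 5.0

# Suggested sequencing-cost multipliers relative to the 96-plex level.
SEQUENCING_MULTIPLIER = {48: 2.0, 96: 1.0, 192: 0.5, 384: 0.25}


def sample_cost(plex_level: int = 96) -> float:
    """Cost of one GEBV; assumes only the sequencing step scales with plexing."""
    sequencing = SEQUENCING_96_PLEX * SEQUENCING_MULTIPLIER[plex_level]
    genotyping = DNA_ISOLATION + LIBRARY_GENERATION + sequencing
    return genotyping + SNP_GENOTYPES + GEBV_PREDICTION


cost_96 = sample_cost(96)    # reproduces the table total
cost_384 = sample_cost(384)  # cheaper sequencing at a higher plex level
```

At the 96-plex level the genotyping subtotal is 7 + 9 + 37 = 53, and adding 10 and 5 gives the $68 total shown in the table.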
Estimation of Genetic Gain (G) and its simulation
Clicking on the Strategy dropdown window will enable selection of any of the breeding strategies below:
The data set in Case study 2 will be used to demonstrate application of three breeding strategies. The data
set consists of 90 half sib (HS) families of perennial ryegrass (Lolium perenne L.) evaluated at 1 location
over 3 years for seasonal growth. Data file name: CaseStudy 2 under Examples.
Clicking on Simulation will open the breeding strategy window.
The “Industry standard” can be the trait value of a commercial check, or the mean of checks, in the genetic
family evaluation trial. Some may wish to use the long term average, across the target population of
environments, of the best commercial cultivars. This value provides a relative comparison (%) of the
genetic gain estimated from family selection to an industry standard. The default value is 100. Any value can
be entered, in the units of the trait (mm, kg ha⁻¹, ...).
Estimation of Genetic Gain (G) and its simulation continued
HS family based breeding models, including Genomic selection (GS).
All the simulation variables have dropdown menus which provide a range of values to select from.
All estimated costs ($) should be entered manually.
Click Update every time a breeding strategy, simulation variable or cost ($) is changed. This will update the G estimates and associated costs.
These constants cannot be changed.
Enter the accuracy value manually.
Inputs automatically transferred from linear model analysis.
If full sib families are analysed, HS will change to FS. If two traits, such as CWE (primary) and KL (secondary), are analysed, models for Correlated response to
selection will be available.
Gc, gain estimate per cycle
Ga, gain estimate per annum
% (relative to parental mean)
% (relative to Industry standard)
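The relative (%) outputs and the per-annum gain are simple ratios. The sketch below shows the arithmetic under the usual definitions (gain as a percentage of a reference value, and gain per cycle divided by cycle length); the function names and numbers are hypothetical, and DeltaGen's internal calculation is not reproduced here.

```python
def relative_gain_percent(gain: float, reference: float) -> float:
    """Express a gain as a percentage of a reference value, e.g. the
    parental mean or the Industry standard entered by the user."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return 100.0 * gain / reference


def gain_per_annum(gain_per_cycle: float, years_per_cycle: float) -> float:
    """Spread the gain of one selection cycle over the years it takes."""
    if years_per_cycle <= 0:
        raise ValueError("years_per_cycle must be positive")
    return gain_per_cycle / years_per_cycle


# Hypothetical numbers: a gain of 50 units per cycle, a 3-year cycle,
# and an industry standard of 2000 units.
percent_of_standard = relative_gain_percent(50.0, 2000.0)
annual_gain = gain_per_annum(6.0, 3.0)
```

With the default Industry standard of 100, the percentage equals the gain itself, which is why entering a realistic reference value in the trait's own units matters.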
Multivariate analysis
To begin: click on the “Model” command and select Multivariate.
Plot gives you options to generate a Biplot or a Matrix Plot of phenotypic correlation, based on raw data.
MANOVA (Multivariate analysis of variance) generates a variance and
covariance matrix and genotypic or genetic correlation coefficients for
the traits chosen from the dropdown menu in the Multiple traits box.
Clicking on Selection index activates a window that enables use of the
Smith-Hazel index.
Clicking in this box will show you the list of traits, in the uploaded data matrix, to be selected
for multivariate analysis based on the three options: Plot, MANOVA and Selection Index.
Used for highlighting groups in the Plot option.
The demonstration/practice data (file name: CaseStudy3 in Examples; you need to first upload this file using Data Input)
consist of 147 lines (half-sib families) of switchgrass (Panicum virgatum L.) evaluated across 2 locations over 2 years using
randomized complete block designs at each location containing 3 replicates. Data on 3 traits, biomass dry matter yield (DMY),
cell wall ethanol (CWE) and Klason lignin (KL), are included in the matrix.
Using Plot
! This Biplot is based on raw data.
Pearson phenotypic correlation based on raw data.
MANOVA
1) Click on MANOVA,
2) Click on the Multiple traits box and choose traits,
3) Click on the MANOVA terms box and choose the effects for the completely random
linear model (keep this model simple by choosing main effects and only their two-way
interactions),
4) Click Run.
Multivariate analysis Output – estimates are genetic if the data are generated from HS or FS families
Smith-Hazel selection index
1) Click on Selection Index,
2) Click on the Multiple traits box and choose traits,
3) Manually enter the Index weightings,
4) Click on the LME fixed terms box and select the fixed effects or leave as Null,
5) Click on the LME random terms box and select the random effects,
6) Click on the MANOVA terms box and choose the effects for the completely random linear
model (keep this model simple by choosing main effects and only their two-way
interactions),
7) Click on the Selection pressure box and choose the intensity of selection,
8) To estimate the genetic gain for each trait under selection (DMY, CWE, KL) tick G,
9) Click Run.
The Help option contains theory & references.
Smith-Hazel index - Output window
The Smith-Hazel index equation:
$[b] = [P]^{-1}[A][w]$
Smith-Hazel index (I): the genetic worth (breeding values) of the HS families.
Individual trait BLUPs
Gc (%), gain estimate per selection cycle in units of measurement of each trait,
at a 20% selection pressure.
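For readers who want to see the index equation worked through numerically, here is a minimal sketch in plain Python. It assumes the conventional reading of the symbols (b: index coefficients, P: phenotypic variance-covariance matrix, A: additive genetic variance-covariance matrix, w: the index weightings entered by the user). The matrices below are invented for illustration and are not DeltaGen output.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(v)] for row, v in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def smith_hazel_coefficients(P, A, w):
    """b = P^-1 A w, computed by solving P b = A w instead of inverting P."""
    Aw = [sum(a * wi for a, wi in zip(row, w)) for row in A]
    return solve(P, Aw)


def index_value(b, trait_values):
    """Index I for one family: weighted sum of its trait values."""
    return sum(bi * xi for bi, xi in zip(b, trait_values))


# Invented two-trait example.
P = [[4.0, 1.0], [1.0, 3.0]]
A = [[2.0, 0.5], [0.5, 1.0]]
w = [1.0, 1.0]
b = smith_hazel_coefficients(P, A, w)
```

Solving the linear system rather than forming the inverse explicitly gives the same coefficients and is the usual numerically safer choice.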
Pattern analysis for Multiple Traits
Step 1, upload the Line-by-Trait mean data matrix into DeltaGen using Data Input. This example is based on the data file MultiTraitMatrix.csv found in Examples.
Step 2, click on Pattern Analysis.
Step 3, select the variables by clicking on them, and keep the standardized data option on.
Step 4, click on Run.
Cluster analysis will produce Line groups and a heat map
with Line and Trait dendrograms.
The PCA BiPlot option will provide a graphical summary
of the Line clusters and trait association (shown by the
directional vectors).
Pattern analysis – output.
1, 2 & 3.
Line numbers
Traits
Lines
Dendrograms for Trait and Line grouping
Line cluster groups
Magnification and quality of the contents of the biplot
can be adjusted by moving the scale controllers.
The entries (lines, genotypes, ...) in the biplot can
be shown as dots or labels by selecting the option in
the dropdown menu.
Directional vectors
Trial design instructions
Click to open design menu
Clicking on “Design Type” will display the range of trial designs available: completely randomized, randomized complete block, factorial and row-column (repeated spatial checks can also be included).
These values can be entered manually.
To generate a design with entry names, copy and paste the entry list from an
Excel or CSV document. If you have repeated checks, include them at the
beginning of the list of entries.
• A Random Seed (RS) number of “0” results in a new randomization of
entries for every run.
• Changing the RS to any number higher than “0” will
generate the same randomization on every run, provided the same design
structure (row, column, replicate) is maintained.
Applicable only to row-column designs
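The Random Seed behaviour described above (0 for a fresh randomization, any higher number for a repeatable one) can be mimicked with a seeded random number generator. This is a sketch of the concept only, using Python's standard library; it is not DeltaGen's randomization routine and will not reproduce DeltaGen's layouts.

```python
import random


def randomize_entries(entries, seed=0):
    """Seed 0 -> a fresh randomization every call; seed > 0 -> repeatable."""
    rng = random.Random() if seed == 0 else random.Random(seed)
    shuffled = list(entries)
    rng.shuffle(shuffled)
    return shuffled


entries = list(range(1, 51))                  # 50 hypothetical entries
first_run = randomize_entries(entries, seed=7)
second_run = randomize_entries(entries, seed=7)
fresh_run = randomize_entries(entries, seed=0)
```

Recording the non-zero seed alongside the design structure is what makes a layout reproducible later.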
Generating a Completely Randomized trial design: Example: generate a design for 6 treatments, each
replicated 3 times. The total number of entries will therefore be 6×3 = 18. The row & column combinations could be: 2×9, 9×2,
6×3, 3×6.
Let’s generate a 2×9 design.
Click on Run: the data entry format sheet (shown below) and the trial
plan (shown on the next page) will be generated.
Each time Run is clicked, a new randomization layout is generated!
This trial design format can be saved as a CSV file.
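The worked example (6 treatments × 3 replicates = 18 plots in a 2×9 grid) can be sketched in a few lines. The code below is an illustrative assumption about how such a layout could be produced, with hypothetical function names; it lists the row × column combinations that fill the trial exactly and then shuffles all plots without any blocking.

```python
import random


def layouts(total_plots):
    """Row x column pairs that fill the trial exactly
    (single-row and single-column strips are left out)."""
    return [(r, total_plots // r)
            for r in range(2, total_plots) if total_plots % r == 0]


def completely_randomized(treatments, replicates, rows, columns, seed=None):
    """Assign every copy of every treatment to a plot completely at random."""
    plots = [t for t in treatments for _ in range(replicates)]
    if len(plots) != rows * columns:
        raise ValueError("rows x columns must equal treatments x replicates")
    random.Random(seed).shuffle(plots)
    return [plots[r * columns:(r + 1) * columns] for r in range(rows)]


options = layouts(6 * 3)
plan = completely_randomized(range(1, 7), replicates=3, rows=2, columns=9, seed=1)
```

The same `layouts` helper gives 2×25, 5×10, 10×5 and 25×2 for 50 plots, matching the combinations quoted for the larger designs.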
Completely Randomized
Generating a Randomized Complete Block trial design: Example: generate a design consisting of 3 blocks (replicates) with
50 treatments each. Each treatment will appear once (1) in a replicate. The row & column combinations per replicate could be: 5×10,
10×5, 2×25, 25×2.
Let’s generate a 5 row × 10 column per rep by 3 replicate design.
As each treatment occurs only once in a replicate, 1 was entered.
Click on Run: the data entry format sheet (shown below) and the trial plan (shown on
the next page) will be generated.
Each time Run is clicked, a new randomization is generated!
This trial design format can be saved as a CSV file.
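In a randomized complete block design each block holds every treatment exactly once, and the order is randomized separately within each block. A minimal sketch of that idea for the 3 block, 50 treatment, 5×10 example follows; the function name is hypothetical and the code does not reproduce DeltaGen's own layouts.

```python
import random


def randomized_complete_block(treatments, blocks, rows, columns, seed=None):
    """Return one rows x columns grid per block, each containing every
    treatment once in an independently shuffled order."""
    treatments = list(treatments)
    if len(treatments) != rows * columns:
        raise ValueError("rows x columns must equal the number of treatments")
    rng = random.Random(seed)
    design = []
    for _ in range(blocks):
        order = treatments[:]
        rng.shuffle(order)
        design.append([order[r * columns:(r + 1) * columns] for r in range(rows)])
    return design


design = randomized_complete_block(range(1, 51), blocks=3, rows=5, columns=10, seed=1)
```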
Randomized Complete Block
Generating a Factorial trial design: Example: generate a design for an experiment to determine the herbage dry weight response of
5 perennial ryegrass cultivars to 4 levels of application of a nitrogen fertilizer. The design has 4 replicates.
The row & column combinations per replicate could be: 2×10, 10×2, 5×4, 4×5. Let’s generate a 5 row × 4 column per rep by 4 replicate design.
Click on Run: the data entry format sheet (shown below) and the trial plan (shown on the next
page) will be generated.
This trial design format can be saved as a CSV file.
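The treatments of a factorial design are all combinations of the factor levels, here 5 cultivars × 4 nitrogen levels = 20 per replicate, which is why a 5×4 grid fits. The sketch below builds and randomizes one replicate; the “Cv n/Nfert m” label format follows the trial plan labels, while the function name and the randomization itself are illustrative assumptions.

```python
import itertools
import random


def factorial_replicate(cultivars, n_levels, rows, columns, seed=None):
    """One replicate: every cultivar x nitrogen combination once, shuffled."""
    combos = [f"Cv {c}/Nfert {n}"
              for c, n in itertools.product(cultivars, n_levels)]
    if len(combos) != rows * columns:
        raise ValueError("rows x columns must equal the number of combinations")
    random.Random(seed).shuffle(combos)
    return [combos[r * columns:(r + 1) * columns] for r in range(rows)]


replicate = factorial_replicate(range(1, 6), range(1, 5), rows=5, columns=4, seed=1)
```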
Factorial design
(Trial plan: each plot is labelled with its cultivar and nitrogen level, e.g. Cv 5/Nfert 4.)
Generating a Row & Column trial design: Example: generate a design for 50 treatments with
4 replicates. The total number of entries across all 4 replicates will be 200. The possible row &
column combinations per replicate are: 2×25, 25×2, 5×10, 10×5.
Let’s generate a 5×10 per rep by 4 replicate design. The total number of rows across all 4
replicates will be 20 (5 rows × 4 replicates).
If the random seed is set at “0”, every run will generate a different randomization of entries within
each replicate.
If a constant random seed number above “0” is used, the same randomization will be
repeated on every run for the same number of lines, rows and columns.
Click on Run: the data entry format sheet and the trial plan (shown on the next page) will
be generated.
This trial design format can be saved as a CSV file by clicking on the “Save design result” option.
The design layout can be saved in the same way.
You can also replace the entry numbers with entry names by copying a column of names from a
spreadsheet and pasting it into the box provided. Please make sure that there are no spaces in any of
the names, e.g. use “Ceres150”, not “Ceres 150”.
Row and Column design
(Trial plan: columns Col 1 to Col 10; rows Row 1 to Row 5 repeated for each of the 4 replicates.)
The trial design can be saved as a CSV file by clicking on the “Save design layout” option.
Generating a Row & Column trial design with repeated spatial checks: Example: generate a design
for 80 treatments with 3 replicates, having 2 checks with 4 repeats each in every replicate. The total
number of entries per replicate will be 80 treatments plus 2 check entries, i.e. 82. Because each check occupies 4 plots, each replicate contains 80 + 2×4 = 88 plots.
Let’s generate an 8×11 (88 plots) per rep by 3 replicate design having 2 checks with 4 repeats each in every
replicate. The total number of rows across all 3 replicates will be 24 (8 rows × 3 replicates).
Click on Run: the data entry format sheet (above) and the trial plan (shown on the next page)
will be generated.
This trial design format can be saved as a CSV file by clicking on the “Save
design result” option.
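The plot count for a design with repeated checks is worth verifying before choosing the row × column dimensions. A small sketch of the arithmetic for this example, with hypothetical function names:

```python
def plots_per_replicate(treatments: int, checks: int, repeats_per_check: int) -> int:
    """Each treatment takes one plot; each check takes one plot per repeat."""
    return treatments + checks * repeats_per_check


def total_rows(rows_per_replicate: int, replicates: int) -> int:
    """Replicates are stacked, so rows add up across replicates."""
    return rows_per_replicate * replicates


plots = plots_per_replicate(treatments=80, checks=2, repeats_per_check=4)
distinct_entries = 80 + 2            # treatments plus check entries
rows_overall = total_rows(rows_per_replicate=8, replicates=3)
```

Here 88 plots per replicate factor as 8 rows × 11 columns, which is the layout generated in the example.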
Row and Column design with 2 checks, each repeated 4 times within a replicate
(Trial plan: columns Col 1 to Col 11; rows Row 1 to Row 8 repeated for each of the 3 replicates.)
The trial design can be saved as a CSV file by clicking the “Save design layout” option.
You can replace the entry numbers with names as described earlier. However, the “Repeated Checks”
must always be the first names on the list, followed by the entries.
A trial design with entry names and two checks each repeated 4 times within each replicate.
Generating a row-column design trial Data Input Spreadsheet
Step 1 – After generating the trial design, click the “Send to input” button found below the
Data View section in Design Result.
The default layout option “Serpentine” formats the resulting spreadsheet, based on the row-
column trial design, for entry of data collected along a serpentine (winding) route up and down
the columns. Deselecting the default removes the serpentine format.
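A serpentine route visits the plots of one column in one direction and the plots of the next column in the opposite direction, so the data collector never walks back empty-handed. The sketch below generates such a visiting order; the starting corner (row 1, column 1) and the function name are assumptions, and DeltaGen's own ordering may start elsewhere.

```python
def serpentine_order(rows: int, columns: int):
    """Plot visiting order as (row, column) pairs, walking down the first
    column, up the second, down the third, and so on."""
    order = []
    for col in range(1, columns + 1):
        row_numbers = range(1, rows + 1)
        if col % 2 == 0:
            row_numbers = reversed(row_numbers)  # even columns are walked back up
        order.extend((row, col) for row in row_numbers)
    return order


route = serpentine_order(rows=3, columns=2)
```

For a 3 row × 2 column block the route is (1,1), (2,1), (3,1), (3,2), (2,2), (1,2).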
Step 2 – Click the “Data Input” button in the Main Operations tab. This will open the data input window.
Step 3 – Click the “Data Input” button and the data entry
spreadsheet, shown below, will open.
A data entry spreadsheet for a row-column design generated within DeltaGen
On opening the spreadsheet, columns Year and Season will be automatically filled.
Columns Breeder and Location should be entered manually.
Entering information into the first cell of any column, highlighting it and right clicking will copy the information down the column.
Columns Y1, Y2 and Y3 are the traits being measured. Type in the data and press Enter to move to the next cell below.
Once data entry is complete, click on the “Update” button.
Name the data sheet.
Choose the data format, “RData” or “CSV”. Then click on “Save Updated Data”.
Importing a data entry spreadsheet not generated within DeltaGen
Site  Rep  Row  Col  HS  Growth
KIM   1    1    1    66  0
KIM   1    1    2    20
KIM   1    1    3    65
KIM   1    1    5    68
KIM   1    1    6    14
KIM   1    1    7    77
KIM   1    1    8    13
KIM   1    2    1    31
KIM   1    2    2    3
KIM   1    2    3    58
KIM   1    2    4    8
KIM   1    2    5    19
KIM   1    2    6    42
KIM   1    2    7    57
KIM   1    2    8    10
KIM   1    3    1    62
KIM   1    3    2    59
KIM   1    3    3    64
KIM   1    3    4    51
KIM   1    3    5    21
Important:
• The spreadsheet should be in CSV or RData format.
• The first data point in the trait column should have a “0” value; this can be
changed once the spreadsheet is uploaded into DeltaGen and replaced when
actual data are recorded.
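A data entry sheet with this structure can be produced outside DeltaGen with any spreadsheet or script. The following sketch writes the header, one line per plot, and the required “0” placeholder in the first trait cell; the function name is hypothetical and the column names simply follow the example sheet.

```python
import csv
import io


def data_entry_sheet(site, replicate, rows, columns, entries, trait="Growth"):
    """Build the sheet row by row; only the first trait cell holds '0'."""
    positions = [(r, c) for r in range(1, rows + 1) for c in range(1, columns + 1)]
    if len(positions) != len(entries):
        raise ValueError("one entry is needed for every row x column position")
    sheet = [["Site", "Rep", "Row", "Col", "HS", trait]]
    for index, ((row, col), entry) in enumerate(zip(positions, entries)):
        placeholder = "0" if index == 0 else ""
        sheet.append([site, replicate, row, col, entry, placeholder])
    return sheet


# First three plots of the example sheet.
sheet = data_entry_sheet("KIM", 1, rows=1, columns=3, entries=[66, 20, 65])
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(sheet)
csv_text = buffer.getvalue()
```

The resulting text can be saved under a file name without spaces and uploaded through Data Input.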
Importing a data entry spreadsheet not generated within DeltaGen - continued.
1. Click on “Data Input”
2. “Browse” and select your data entry file
3. Wait until the upload is complete
4. Click on “Edit data”
5. Click on “Data Input”
Importing a data entry spreadsheet not generated within DeltaGen - continued.
Clicking on “Data Input” will open the data entry spreadsheet below.
You can now enter data starting from the first cell.
Once data entry is complete, click on the “Update” button.
Before saving your data, give the file a new name.
After saving your data you can upload the data into
DeltaGen, as you would normally, and proceed with data
quality checks and analysis as required.
Save session and Quit
Analysis reports can be saved as HTML and Word documents.
Click on any of the document format options, then select
Download.
To quit DeltaGen, click on Quit App.