DeltaGen: Quick start manual

Dr. Zulfi Jahufer & Dr. Dongwen Luo

The quick start manual is focused on providing users with basic operational instructions. The example

data sets, especially the 3 case studies, can be used to practice different analyses. Results from analysis of

the case study data are published in:

Jahufer, M.Z.Z., Luo, D. (2018). “DeltaGen” - a comprehensive decision support tool for plant breeders. Crop Science 58: 1-14. doi:10.2135/cropsci2017.07.0456

CONTENTS

➢ Main operations tab commands
➢ Uploading a data file
   ▪ Matching variable identifiers
➢ Data check
➢ Univariate analysis
➢ Pattern analysis (within univariate model option)
➢ Univariate analysis – Two trait combination
➢ Guide to calculating the cost of generating a GEBV – “Sample cost”
➢ Estimation of genetic gain (G) and its simulation
➢ Multivariate analysis
   ▪ Using Plot
      o Biplots (on raw data)
      o Matrix plots (phenotypic correlation, on raw data)
   ▪ MANOVA (additive variance/covariance & correlation)
   ▪ Smith-Hazel selection index
➢ Pattern analysis for multiple traits
➢ Trial designs
      o Completely randomized
      o Randomized complete block
      o Factorial design
      o Row & column design
      o Row & column design with repeated checks
➢ Generating a row-column design data entry spreadsheet
   ▪ Data entry spreadsheet generated within DeltaGen
   ▪ Importing a data entry sheet not generated within DeltaGen
➢ Save session and Quit

Please note that clicking on “Help” in the analysis screens will provide information on underlying theory and associated references.

Uploading data and downloading results in DeltaGen is more efficient if the program is run using Google Chrome or Firefox.

Main operations tab commands

Clicking on any of the commands above will open dropdown menus.

Introduction

This window is displayed when DeltaGen is opened.

Trial Design

This command will open a screen that will provide a range of experimental designs that DeltaGen can generate. A full description of this command will be provided under experimental design, the last section in this manual.

Data Input: uploading a data file

Clicking on Data Input will open the screen shown on the next page.

Shows data files opened

Upload enables data files to be uploaded using Browse; Examples enables practice data sets within DeltaGen to be uploaded; Clipboard enables copied data (in CSV format) to be uploaded.

Click to upload external data files

Data file types accepted by DeltaGen

! CSV files are preferred: save Excel data files in CSV format before uploading them into DeltaGen.

! Missing values in the data matrix should be identified before uploading any data set. The dropdown menu provides 3 options: Empty or * or •

If missing values are not defined before the data are uploaded, the data analysis will abort.

Follow STEPS 1 & 2 to upload a data set from an external file.

STEP 1

STEP 2

Files can be saved in RData or CSV formats

Uploading a data file

! Data file names cannot contain spaces between words.
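The upload rules above (CSV format, no spaces in the file name, missing values marked consistently) can be checked before a file is uploaded. The following is a minimal Python sketch and is not part of DeltaGen; the function name, the sample file name and the choice of "*" as the missing-value code are assumptions made for illustration.

```python
import csv
import io

def check_upload_file(file_name, csv_text, missing_code="*"):
    """Return a list of problems found in a CSV file intended for upload."""
    problems = []
    if " " in file_name:
        problems.append("file name contains spaces")
    if not file_name.lower().endswith(".csv"):
        problems.append("file is not in CSV format")
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    for line_no, row in enumerate(data, start=2):
        for name, value in zip(header, row):
            # An empty cell is only acceptable when "Empty" is the chosen
            # missing-value option (represented here by missing_code="").
            if value.strip() == "" and missing_code != "":
                problems.append(f"line {line_no}: empty cell in column {name}")
    return problems

# Hypothetical data: one empty cell and one cell marked with "*".
sample = "Site,Rep,Line,DM\nRua,1,1,5.2\nRua,1,2,\nRua,1,3,*\n"
for problem in check_upload_file("my trial.csv", sample):
    print(problem)
```

Running the check on your own file before uploading can help avoid an aborted analysis caused by undefined missing values.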

After uploading a data set following STEPS 1 & 2, the column identifiers of the variables in the data matrix have to be matched with those already defined in DeltaGen (Year, Season, Location, Replicates, Row, Column, Sample and Line), as shown in STEP 3.

Traits

STEP 3 - To match a DeltaGen variable with an associated column identifier in the data matrix, click on the relevant dropdown menu and choose the matching column name; e.g. by clicking on the dropdown menu for Location, column “Site” was selected. Similarly, column “Rep” was selected for Replicates. If a variable is not in the uploaded data, it is left as “Null”.

STEP 4 - Clicking on Run will submit the data for analysis. You are now ready to check or analyse your data.

Matching variable identifiers (this step can be omitted when using the example data sets in DeltaGen)

Data check: Graphical or tabular summary of raw data is an optional data quality check before univariate or multivariate analysis.

The plot-type dropdown menu provides a range of plots (Histogram, Density, Scatter, Line and Box-plot) to illustrate the data.

• First click on the X-variable, in this case DM (dry matter).

• You can arrange the plots by defining the row and column layout; in the example presented using Histograms, the Locations are presented down rows and the dry matter in each of the 3 replicates within each location as columns.

Data check continued. The heat-map option from the dropdown menu under Pivot Table illustrates the actual values and spatial distribution of summer dry matter raw data across a field experiment based on a row-column experimental design consisting of 3 replicates. This can also be used to identify data entry errors.

High value Missing data Low value

Clicking on these headings will show the associated data below.

All factors (e.g. Replicates, Column, Row) can be moved across: point at a factor, hold down the left mouse button and drag. This changes the configuration of the table.

After uploading the data, click on the “Models” command and select Univariate. This screen will open.

The Data Information panel provides a summary of the uploaded data.

The demonstration/practice data set consists of 107 entries of perennial ryegrass (Lolium perenne L.) evaluated at 3 locations over 3 years for seasonal growth. Data file name: CaseStudy 1 under Examples.

Univariate analysis: Case study 1

• The default settings for the linear mixed effects model are modelling and half sib family.

• If simulation of genetic gain is to be conducted, the choice of half sib (HS) or full sib (FS) family is important. Alternatively, if the analysis is not for estimation of genetic components of variance, or is based on a fixed effects model, you can continue using the HS default option.

• Simulation must be selected only after conducting the variance component analysis for HS or FS.

On opening the univariate analysis screen, the Primary Trait box will be at “Null”. Clicking on this box will open the dropdown menu with all the traits in the uploaded data set; in the example, NZGro (seasonal growth at 3 locations in New Zealand).

• Fixed terms: clicking in the fixed terms box will open a dropdown menu that will enable you to

select the appropriate factors in the data set (years, locations, seasons, in the example). Select

“Null” if no fixed terms are to be included in the model.

• Random terms: select the appropriate factors from the dropdown menu that opens in this box.

Traits for BLUP or BLUE estimates can be selected from the associated dropdown menus. Click Run to begin the analysis.

Univariate analysis (Case study 1 continued): The linear model - One trait

Replicates nested within locations within seasons within years,

Lines,

Line-by-year interaction,

Line-by-season interaction,

Line-by-location interaction

• Select the Primary trait to be analysed (from our example data set, “NZGro” has been selected),

• Select Fixed terms and their interactions as required (if an additional term that does not appear in the dropdown menu needs to be added, double click in the fixed terms box, enter the term and click on “Add” when it appears),

• Select Random terms and interactions as required (if an additional term that does not appear in the dropdown menu needs to be added, double click in the random terms box, enter the term and click on “Add” when it appears),

• Select Heritability if appropriate (in Case study 1, this was considered as Repeatability),

• Select BLUP if lines are random or BLUE if fixed,

• Click Run to begin analysis.

As the lines were considered as random effects in the linear model and the BLUP estimate option was selected, clicking on the BLUP button provides this analysis output: BLUP values for mean growth based on line performance across years, seasons and locations.

Univariate analysis (Case study 1 continued): linear mixed model analysis output

Results for Fixed effects

Genotypic variance (σ²g) among the 107 entries

Associated ± standard error

Specifically for case study 1, this estimate was considered as Repeatability.

Error CV of trial

Univariate analysis: Case study 2

The demonstration/practice data set consists of 90 half sib (HS) families of perennial ryegrass (Lolium perenne L.) evaluated at 1 location over 3 years for seasonal growth. Data file name: CaseStudy 2 under Examples.

Results for Fixed effects

Genetic variance (¼σ²A) among the 90 HS families

Associated ± standard error

Narrow sense heritability (h²n)
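As a rough illustration of how these outputs relate to each other, the sketch below converts an among-HS-family variance component into additive variance (the family component estimates ¼σ²A, as noted above) and computes a heritability on a family-mean basis. The family-mean formula, the function names and all numbers are assumptions for illustration only; the expressions DeltaGen actually uses are documented under “Help”.

```python
def additive_variance_from_hs(var_family):
    """Among-HS-family variance estimates 1/4 of the additive variance."""
    return 4.0 * var_family

def family_mean_heritability(var_family, var_family_year, var_error,
                             n_years, n_reps):
    """Heritability on a family-mean basis (an assumed, commonly used form).

    The phenotypic variance of a family mean is the family variance plus the
    family-by-year and error variances divided by the number of observations
    they are averaged over.
    """
    phenotypic = (var_family
                  + var_family_year / n_years
                  + var_error / (n_years * n_reps))
    return var_family / phenotypic

# Hypothetical variance components for a trial run over 3 years with 3 replicates.
var_a = additive_variance_from_hs(2.0)
h2 = family_mean_heritability(2.0, 1.0, 6.0, n_years=3, n_reps=3)
print(var_a, round(h2, 3))
```

The point of the sketch is only to show which variance components feed into the heritability estimate reported on this screen.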

Pattern analysis (within Univariate model option) – Multi-location (more than 2 locations)

The first step in Pattern analysis is to select Line-by-location interaction. This generates a two-way line-by-location BLUP matrix: BLUP estimates for each individual line within each location (in the example Rua, PN and KERI) will be generated.

Click on Pattern analysis.

Click Run.

The demonstration/practice data set consists of 107 entries of perennial ryegrass (Lolium perenne L.) evaluated at 3 locations over 3 years for seasonal growth. Data file name: CaseStudy 1 under Examples.

Locations:

Rua, Ruakura

PN, Palmerston North

KERI, Kerikeri

Clicking on Pattern Analysis – Cluster will result in generating

Line groups based on performance across locations and also

associated dendrograms of locations and lines.

Location groups: 1, 2 & 3.

Locations: KERI, Rua, PN

Dendrograms of Location and Line grouping

Clicking on Pattern Analysis – PCA (Principal Component Analysis) will generate a biplot based on PC1 and PC2, showing the line groups and individual line labels. The directional vectors are the locations.

Line clusters

Directional vectors

Univariate analysis (continued): The linear model - Two trait combination

The demonstration/practice data set consists of 147 lines (half-sib families) of switchgrass (Panicum virgatum L.) evaluated across 2 locations over 2 years using randomized complete block designs with 3 replicates. Data on 3 traits, dry matter yield (DMY), cell wall ethanol (CWE) and Klason lignin (KL), are included.

Data file name: CaseStudy3 under Examples.

Analysis with the secondary trait included provides an opportunity to

simultaneously estimate narrow sense heritability for each trait, and their

genetic correlation.

These outputs are then automatically integrated into the breeding strategy

simulation models for estimation of Correlated Response to Selection of

the primary trait based on secondary trait selection.

All the initial steps with regard to the fixed and random term models for the primary trait are similar to the single trait analysis.

For analysis of the Secondary trait:

• Tick the box for secondary trait and select the trait from the dropdown menu, (From

our example data set, trait KL was selected)

• Click the MANOVA box and select the terms in the dropdown menu to conduct a

variance/covariance analysis,

• Click Run

Univariate analysis Output – two trait (CWE/KL) combination - continued

Results from this analysis are similar to those from

the single trait analysis, but also provide information

on narrow sense heritability of the secondary trait as

well as the genetic correlation between the two

traits, CWE/KL.

Variance components for Random effects

from primary trait analysis.

Narrow sense heritability of primary trait.

Listed below is a guide for calculating the cost of generating a single Genomic Estimated Breeding Value (GEBV), referred to as Sample Cost in GS simulation.

Step                                       Cost/sample
Genotyping                                 $53
   DNA isolation                           $7
   Library generation                      $9
   †DNA sequencing                         $37
SNP genotypes (bioinformatics)             $10
Prediction of GEBVs (statistical model)    $5
Total                                      $68

Other notes:

Assumes GBS as the genotyping method.

Sequencing uses an Illumina HiSeq 2500 with version 4 chemistry.

†Cost is for the 96-plex level and will change with the level of multiplexing. The suggested multipliers: (48-plex ×2), (192-plex ×0.5), (384-plex ×0.25).

‡Dodds, K.G.; McEwan, J.C.; Brauning, R.; Anderson, R.M.; van Stijn, T.C.; Kristjánsson, T.; Clarke, S.M. (2015). Construction of relatedness matrices using genotyping-by-sequencing data. BMC Genomics 16: 1047.
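The arithmetic behind the table can be sketched in Python as follows. The step costs and multipliers are those listed above; applying the multiplier to the DNA sequencing step only is an assumption based on where the † marker is placed, and the function and variable names are illustrative.

```python
# Per-sample costs ($) from the guide above, at the 96-plex level.
GENOTYPING = {"DNA isolation": 7.0, "Library generation": 9.0, "DNA sequencing": 37.0}
SNP_GENOTYPES = 10.0      # bioinformatics
GEBV_PREDICTION = 5.0     # statistical model

# Suggested multipliers for other multiplexing levels (assumed to apply to
# the sequencing step only).
SEQUENCING_MULTIPLIER = {48: 2.0, 96: 1.0, 192: 0.5, 384: 0.25}

def sample_cost(plex=96):
    """Cost of generating one GEBV at the given multiplexing level."""
    genotyping = dict(GENOTYPING)
    genotyping["DNA sequencing"] *= SEQUENCING_MULTIPLIER[plex]
    return sum(genotyping.values()) + SNP_GENOTYPES + GEBV_PREDICTION

for plex in (48, 96, 192, 384):
    print(f"{plex}-plex: ${sample_cost(plex):.2f}")
```

At the 96-plex level this reproduces the $68 total in the table; the resulting value is what would be entered manually as Sample Cost in the GS simulation.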

Estimation of Genetic Gain (G) and its simulation

Clicking on the Strategy dropdown window will enable selection of any of the breeding strategies below:

The data set in Case study 2 will be used to demonstrate application of three breeding strategies. The data set consists of 90 half sib (HS) families of perennial ryegrass (Lolium perenne L.) evaluated at 1 location over 3 years for seasonal growth. Data file name: CaseStudy 2 under Examples.

Clicking on simulation will open the breeding strategy window.

The “Industry standard” can be the trait value of a commercial check or the mean of checks in the genetic family evaluation trial. Some may wish to use the long term average, across the target population of environments, of the best commercial cultivars. This value will provide a relative comparison (%) of the genetic gain estimated from family selection to an industry standard. The default value is 100. Any value can be entered, in the units of the trait (mm, kg ha⁻¹, ...).

HS family based breeding models including Genomic selection (GS).

All the simulation variables have dropdown menus which provide a range of values to select from.

All estimated costs ($) should be entered manually.

Click Update every time a breeding strategy, simulation variable or cost ($) is changed. This will update G estimates and associated costs.

These constants cannot be changed

Inputs automatically transferred from linear model analysis

If full sib families are analysed, HS will change to FS. If two traits, such as CWE and KL (primary and secondary, respectively), are analysed, then models for correlated response to selection will be available.

Gc, gain estimate per cycle

Ga, gain estimate per annum

% (relative to parental mean)

% (relative to Industry standard)
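The relationship between these four outputs can be sketched as below. This is only an illustration under stated assumptions: Ga is taken to be Gc divided by the number of years per selection cycle, and each percentage is the gain expressed relative to the reference value. All numbers and names are hypothetical; DeltaGen's own calculations are described under “Help”.

```python
def gain_per_annum(gain_per_cycle, years_per_cycle):
    """Ga, assuming the gain per cycle is spread evenly over the cycle length."""
    return gain_per_cycle / years_per_cycle

def percent_of(gain, reference):
    """Gain expressed as a percentage of a reference value."""
    return 100.0 * gain / reference

# Hypothetical values, all in the same units as the trait.
gc = 1.2                  # gain estimate per cycle
parental_mean = 24.0
industry_standard = 30.0

ga = gain_per_annum(gc, years_per_cycle=3)
print(round(ga, 2),
      round(percent_of(gc, parental_mean), 1),
      round(percent_of(gc, industry_standard), 1))
```

Changing the Industry standard value therefore changes only the last percentage, not the gain estimates themselves.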

Estimation of Genetic Gain (G) and its simulation continued

These constants cannot be changed

Enter accuracy value manually

Multivariate analysis

To begin: click on the “Model” command and select Multivariate.

Plot gives you options to generate a Biplot or a Matrix Plot of phenotypic correlation, based on raw data.

MANOVA (Multivariate analysis of variance) generates a variance and

covariance matrix and genotypic or genetic correlation coefficients for

the traits chosen from the dropdown menu in the Multiple traits box.

Clicking on Selection index activates a window that enables use of the

Smith-Hazel index.

Clicking in this box will show you the list of traits, in the uploaded data matrix, to be selected

for multivariate analysis based on the three options; Plot, MANOVA and Selection Index.

Used for highlighting groups in the Plot option.

The demonstration/practice data (file name: CaseStudy3 in Examples; you need to first upload this file using Data Input) consist of 147 lines (half-sib families) of switchgrass (Panicum virgatum L.) evaluated across 2 locations over 2 years using randomized complete block designs at each location containing 3 replicates. Data on 3 traits, biomass dry matter yield (DMY), cell wall ethanol (CWE) and Klason lignin (KL), are included in the matrix.

Using Plot

! This Biplot is based on raw data.

Pearson phenotypic correlation based on raw data.
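For readers who want to check a value in the matrix plot by hand, the Pearson correlation between two traits can be computed from the raw data as in the following minimal Python sketch. The trait values are hypothetical, and skipping plots with a missing value for either trait is an assumption about how missing data are handled.

```python
import math

def pearson(x, y):
    """Pearson correlation between two traits, skipping pairs with a missing value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical raw plot values for two traits; None marks a missing value.
dmy = [10.0, 12.0, 11.0, 15.0, None]
kl = [21.0, 19.5, 20.0, 18.0, 20.5]
print(round(pearson(dmy, kl), 3))
```

Because the calculation uses raw plot data, the result is a phenotypic correlation; genetic correlations come from the MANOVA option.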

1) Click on MANOVA,

2) Click on Multiple traits box and choose traits,

3) Click on MANOVA terms box and choose the effects for the completely random linear model (keep this model simple by choosing main effects and only their two-way interactions),

4) Click Run.

Multivariate analysis Output – estimates are genetic if the data are generated from HS or FS families

MANOVA

1) Click on Selection Index,

2) Click on Multiple traits box and choose traits,

3) Manually enter the Index weightings,

4) Click on LME fixed terms box and select the fixed effects or leave as Null,

5) Click on LME random terms box and select the random effects,

6) Click on MANOVA terms box and choose the effects for the completely random linear model (keep this model simple by choosing main effects and only their two-way interactions),

7) Click on the Selection pressure box and choose the intensity of selection,

8) To estimate the genetic gain for each trait under selection (DMY, CWE, KL) tick G,

9) Click Run.

Smith-Hazel selection index

Contains theory & references

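$$\mathbf{b} = \mathbf{P}^{-1}\,\mathbf{A}\,\mathbf{w}$$

where b is the vector of index coefficients, P⁻¹ is the inverse of the phenotypic variance-covariance matrix, A is the additive genetic variance-covariance matrix and w is the vector of index weightings.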

Smith-Hazel index - Output window

Smith-Hazel index (I): the genetic worth (breeding values) of the HS families.

The Smith-Hazel index equation

Individual trait BLUPs

Gc (%), gain estimate per selection cycle in units of measurement of each trait, at a 20% selection pressure.
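The Smith-Hazel index equation shown in the output window can be illustrated with a small worked example. The Python sketch below solves P b = A w for the index coefficients b and then scores each family as the weighted sum of its trait values. The matrices, weightings and trait values are hypothetical, and the helper names are illustrative; this is not DeltaGen's own code.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x

def smith_hazel_coefficients(P, A, w):
    """b = P^-1 A w, computed by solving P b = A w."""
    return solve(P, mat_vec(A, w))

def index_values(b, trait_values):
    """Index I for each family: the sum of b_i times the value of trait i."""
    return [sum(bi * ti for bi, ti in zip(b, row)) for row in trait_values]

# Hypothetical two-trait example.
P = [[4.0, 1.0], [1.0, 2.0]]   # phenotypic variance-covariance matrix
A = [[2.0, 0.5], [0.5, 1.0]]   # additive genetic variance-covariance matrix
w = [1.0, 1.0]                 # index weightings entered manually
b = smith_hazel_coefficients(P, A, w)
print(b, index_values(b, [[10.0, 4.0], [8.0, 7.0]]))
```

Solving the linear system directly avoids forming the inverse of P explicitly, which gives the same b for a well-conditioned P.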

Pattern analysis for Multiple Traits

Step 1, upload the Line-by-Trait mean data matrix into DeltaGen using Data Input. This example is based on the data file MultiTraitMatrix.csv found in Examples.

Step 2, click on Pattern Analysis,

Step 3, select the variables by clicking on them, and keep the standardized data option on,

Step 4, click on Run.
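Standardizing matters when traits are measured on different scales, because otherwise the trait with the largest numbers dominates the clustering and the PCA. The sketch below shows one common form of standardization (each trait column rescaled to mean 0 and standard deviation 1). That DeltaGen's standardized data option uses exactly this form is an assumption, and the data are hypothetical.

```python
import statistics

def standardize_columns(matrix):
    """Rescale each trait (column) of a line-by-trait matrix to mean 0, SD 1."""
    columns = list(zip(*matrix))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(value - m) / s for value, m, s in zip(row, means, sds)]
            for row in matrix]

# Hypothetical line-by-trait means: 3 lines (rows) by 2 traits (columns).
line_by_trait = [[10.0, 200.0],
                 [12.0, 220.0],
                 [14.0, 240.0]]
z = standardize_columns(line_by_trait)
for row in z:
    print([round(v, 2) for v in row])
```

After standardization both traits contribute equally, even though the second trait is recorded in much larger units.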

Cluster analysis will produce Line groups and a heat map

with Line and Trait dendrograms.

The PCA BiPlot option will provide a graphical summary

of the Line clusters and trait association (shown by the

directional vectors).

Pattern analysis – output.

1, 2 & 3.

Line numbers

Traits

Lines

Dendrograms for Trait and Line grouping

Line cluster groups

Magnification and quality of the contents of the biplot can be adjusted by moving the scale controllers.

The entries (lines, genotypes, ...) in the biplot can be shown as dots or labels by selecting the option in the dropdown menu.

Directional vectors

Trial design instructions

Click to open design menu

Clicking on “Design Type” will display the range of trial designs available: completely randomized, randomized complete block, factorial and row-column (repeated spatial checks can also be included).

These values can be entered manually.

To generate a design with entry names, copy and paste the entry list from an

Excel or CSV document. If you have repeated checks, include them at the

beginning of the list of entries.

• A Random Seed (RS) number of “0” results in a new randomization of entries for every run.

• Changing the RS to any number higher than “0” will generate the same randomization on every run, provided the same design structure (row, column, replicate) is maintained.
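The behaviour of the random seed can be mimicked in a few lines of Python. This is only a sketch of the idea: the function name is hypothetical, and Python's random number generator will not reproduce the layouts that DeltaGen itself generates.

```python
import random

def randomize_entries(n_entries, random_seed=0):
    """Seed 0 gives a fresh randomization each run; any other seed is reproducible."""
    rng = random.Random() if random_seed == 0 else random.Random(random_seed)
    entries = list(range(1, n_entries + 1))
    rng.shuffle(entries)
    return entries

first = randomize_entries(10, random_seed=42)
second = randomize_entries(10, random_seed=42)
print(first == second)               # same seed, same design structure
print(randomize_entries(10))         # seed 0: a new order on each run
```

Recording the seed alongside the design structure is what makes it possible to regenerate an identical trial plan later.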

Applicable only to row-column designs

Generating a Completely Randomized trial design: Example; generate a design for 6 treatments. Each treatment will be replicated 3 times. The total number of plots will therefore be 6×3 = 18. The row & column combinations could be: 2×9, 9×2, 6×3, 3×6.

Let’s generate a 2×9 design.

Click on Run: The data entry format sheet (shown below) and the trial

plan (shown on the next page) will be generated.

Each time Run is clicked a new randomization layout is generated!

This trial design format can be saved as a CSV file
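The logic of the 2×9 completely randomized example can be sketched in Python as follows. The function name and the seed value are illustrative assumptions, and the layout produced will differ from the one DeltaGen generates.

```python
import random

def completely_randomized(n_treatments, n_reps, n_rows, n_cols, seed=None):
    """Randomly allocate every treatment copy to a plot in a rows x columns grid."""
    if n_treatments * n_reps != n_rows * n_cols:
        raise ValueError("rows x columns must equal treatments x replicates")
    # Each treatment number appears n_reps times, with no blocking restriction.
    plots = [t for t in range(1, n_treatments + 1) for _ in range(n_reps)]
    random.Random(seed).shuffle(plots)
    return [plots[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]

# 6 treatments x 3 replicates = 18 plots, laid out as 2 rows x 9 columns.
layout = completely_randomized(6, 3, n_rows=2, n_cols=9, seed=1)
for row in layout:
    print(row)
```

Unlike the blocked designs that follow, copies of the same treatment may end up next to each other, because randomization is over the whole trial.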

Completely Randomized

Generating a Randomized Complete Block trial design: Example; generate a design consisting of 3 blocks (replicates) with 50 treatments each. Each treatment will appear once (1) in a replicate. The row & column combinations per replicate could be: 5×10, 10×5, 2×25, 25×2.

Let’s generate a 5 row × 10 column per rep by 3 replicate design.

As each treatment occurs only once in a replicate, 1 was entered.

Click on Run: The data entry format sheet (shown below) and the trial plan (shown on the next page) will be generated.

Each time Run is clicked a new randomization is generated!
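The structure of this randomized complete block example can be sketched in Python as below: every treatment appears exactly once in each replicate, and the order is randomized independently within each replicate. The function name, the seed and the column names of the output records are assumptions for illustration.

```python
import random

def randomized_complete_block(n_treatments, n_blocks, n_rows, n_cols, seed=None):
    """Return one record per plot, randomizing all treatments within each block."""
    if n_rows * n_cols != n_treatments:
        raise ValueError("rows x columns per replicate must equal the number of treatments")
    rng = random.Random(seed)
    design = []
    for block in range(1, n_blocks + 1):
        order = list(range(1, n_treatments + 1))
        rng.shuffle(order)                      # new randomization per block
        for position, treatment in enumerate(order):
            row, col = divmod(position, n_cols)
            design.append({"Rep": block, "Row": row + 1,
                           "Col": col + 1, "Entry": treatment})
    return design

# 50 treatments, 3 replicates, 5 rows x 10 columns per replicate.
design = randomized_complete_block(50, 3, n_rows=5, n_cols=10, seed=1)
print(len(design), design[0])
```

The 150 records correspond to the rows of a data entry sheet, one per plot.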


Randomized Complete Block

Generating a Factorial trial design: Example; generate a design for an experiment to determine herbage dry weight response of 5 perennial ryegrass cultivars to 4 levels of application of a nitrogen fertilizer. The design has 4 replicates, and each replicate contains all 5×4 = 20 cultivar-by-nitrogen combinations.

The row & column combinations per replicate could be: 2×10, 10×2, 5×4, 4×5. Let’s generate a 5 row × 4 column per rep by 4 replicate design.

Click on Run: The data entry format sheet (shown below) and the trial plan (shown on the next page) will be generated.
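The treatment structure of this factorial example can be sketched in Python as follows. Each replicate contains every cultivar-by-nitrogen combination once, in random order. The function name and seed are illustrative assumptions; the plot labels follow the “Cv 5/Nfert 4” style used in the trial plan.

```python
import itertools
import random

def factorial_design(n_cultivars, n_nitrogen_levels, n_reps, n_rows, n_cols, seed=None):
    """Randomize all factor combinations within each replicate."""
    combos = [f"Cv {c}/Nfert {n}"
              for c, n in itertools.product(range(1, n_cultivars + 1),
                                            range(1, n_nitrogen_levels + 1))]
    if len(combos) != n_rows * n_cols:
        raise ValueError("rows x columns per replicate must equal the number of combinations")
    rng = random.Random(seed)
    design = {}
    for rep in range(1, n_reps + 1):
        order = combos[:]
        rng.shuffle(order)
        # Cut the randomized list into rows of n_cols plots each.
        design[rep] = [order[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
    return design

# 5 cultivars x 4 nitrogen levels, 4 replicates, 5 rows x 4 columns per replicate.
design = factorial_design(5, 4, n_reps=4, n_rows=5, n_cols=4, seed=1)
for row in design[1]:
    print(row)
```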


Factorial design (trial plan; each plot is labelled with its cultivar and nitrogen level, e.g. Cv 5/Nfert 4)

Generating a Row & Column trial design: Example; generate a design for 50 treatments with 4 replicates. The total number of entries across all 4 replicates will be 200. The possible row & column combinations per replicate could be: 2×25, 25×2, 5×10, 10×5.

Let’s generate a 5×10 per rep by 4 replicate design. The total number of rows across all 4 replicates will be 20 (5 rows × 4 replicates).

If the random seed is set at “0”, every run will generate a different randomisation of entries within

each replicate.

The same randomization for the same number of lines, rows and columns will continue to be

repeated for every run, if a constant random seed number above “0” is used.

This trial design format can be saved as a CSV file by clicking on the “Save design result” option. The design layout can be saved in the same way.

Click on Run: The data entry format sheet and the trial plan (shown on the next page) will

be generated.

You can also replace the entry numbers with entry names by copying a column of names from a spreadsheet and pasting it into the box provided. Please make sure that there are no spaces in any of the names, e.g. use Ceres150, not Ceres 150.

Row and Column design (trial plan: 10 columns by 5 rows within each of the 4 replicates)

The trial design can be saved as a CSV file by clicking on the “Save design layout” option

Generating a Row & Column trial design with repeated spatial checks: Example; generate a design for 80 treatments with 3 replicates having 2 checks with 4 repeats each in every replicate. The total number of entries per replicate will be 80 treatments plus 2 check entries, 82. With each check repeated 4 times, every replicate contains 80 + (2×4) = 88 plots.

Let’s generate an 8×11 per rep by 3 replicate design having 2 checks with 4 repeats each in every replicate. The total number of rows across all 3 replicates will be 24 (8 rows × 3 replicates).

Click on Run: The data entry format sheet (above) and the trial plan (shown on the next page) will be generated.
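The plot arithmetic for a design with repeated checks can be verified with a short Python sketch. The function names are illustrative; the numbers are those of the example above.

```python
def plots_per_replicate(n_treatments, n_checks, repeats_per_check):
    """Plots needed in one replicate: each treatment once plus the repeated checks."""
    return n_treatments + n_checks * repeats_per_check

def possible_layouts(n_plots):
    """All rows x columns combinations that hold exactly n_plots plots."""
    return [(rows, n_plots // rows)
            for rows in range(2, n_plots // 2 + 1)
            if n_plots % rows == 0]

n_plots = plots_per_replicate(n_treatments=80, n_checks=2, repeats_per_check=4)
print(n_plots)                      # plots in each replicate
print(possible_layouts(n_plots))    # includes the 8 x 11 layout used here
print(8 * 3)                        # total rows across the 3 replicates
```

Working out the number of plots first makes it easier to choose a row and column combination that the design can actually fill.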

This trial design format can be saved as a CSV file by clicking on the “Save

design result” option

Row and Column design with 2 checks, each repeated 4 times within a replicate (trial plan: 11 columns by 8 rows within each of the 3 replicates)

The trial design can be saved as a CSV file by clicking the “Save design layout” option

You can replace the entry numbers with names as described earlier. However, the “Repeated Checks”

must always be the first names on the list, followed by the entries.

A trial design with entry names and two checks each repeated 4 times within each replicate.

Generating a row-column design trial Data Input Spreadsheet

Step 1 – After generating the trial design, click the “Send to input” button found below the

Data View section in Design Result.

The default layout option “Serpentine” will format the resulting spreadsheet based on the row-

column trial design for entry of data collected in a serpentine or winding route up and down

the columns. Unclicking the default will result in removing the serpentine format.

Step 2 – Click the “Data Input” button in the Main Operations tab. This will open the data input window.

Step 3 – Click the “Data Input” button and the data entry

spreadsheet, shown below, will open.
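The “Serpentine” layout mentioned in Step 1 orders the plots along a winding route up and down the columns, so that data can be entered in the order in which plots are walked in the field. A minimal Python sketch of such an ordering is given below. Starting at row 1 of column 1 and reversing direction in every second column are assumptions; the route DeltaGen uses may start from a different corner.

```python
def serpentine_order(n_rows, n_cols):
    """Plot (row, column) positions in a route that winds up and down the columns."""
    order = []
    for col in range(1, n_cols + 1):
        rows = range(1, n_rows + 1)
        if col % 2 == 0:
            rows = reversed(rows)     # walk back along every second column
        order.extend((row, col) for row in rows)
    return order

# A small 3 row x 2 column example.
for plot in serpentine_order(3, 2):
    print(plot)
```

Without the serpentine option, every column would be listed from row 1 downwards, which means walking back to the start of each column in the field.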

A data entry spreadsheet for a row-column design generated within DeltaGen

On opening the spreadsheet, columns year and season will be automatically filled. Columns Breeder and Location should be entered manually.

Entering information into the first cell of any column, highlighting it and right clicking will copy the information down the column.

Columns Y1, Y2 and Y3 are the traits being measured. Type in the data and press enter to move to the next cell below.

Once data entry is complete, click on the “Update” button.

Name the data sheet, choose the data format “RData” or “CSV”, then click on “Save Updated Data”.

Site  Rep  Row  Col  HS  Growth
KIM   1    1    1    66  0
KIM   1    1    2    20
KIM   1    1    3    65
KIM   1    1    5    68
KIM   1    1    6    14
KIM   1    1    7    77
KIM   1    1    8    13
KIM   1    2    1    31
KIM   1    2    2    3
KIM   1    2    3    58
KIM   1    2    4    8
KIM   1    2    5    19
KIM   1    2    6    42
KIM   1    2    7    57
KIM   1    2    8    10
KIM   1    3    1    62
KIM   1    3    2    59
KIM   1    3    3    64
KIM   1    3    4    51
KIM   1    3    5    21

Importing a data entry spreadsheet not generated within DeltaGen

Important:

• The spreadsheet should be in CSV or RData format.

• The first data point in the trait column should have a “0” value: this can be changed when the spreadsheet is uploaded into DeltaGen and replaced when actual data is recorded.
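A data entry sheet of this kind can also be prepared with a short script rather than by hand. The Python sketch below writes a CSV with the column headings of the example sheet above and places the required “0” in the first cell of the trait column. The function name is hypothetical and only the first three plots of the example are included.

```python
import csv
import io

def data_entry_template(site, plots, trait="Growth"):
    """Build CSV text for a data entry sheet; plots is a list of (rep, row, col, entry)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Site", "Rep", "Row", "Col", "HS", trait])
    for i, (rep, row, col, entry) in enumerate(plots):
        # Only the first data point of the trait column carries the placeholder 0.
        writer.writerow([site, rep, row, col, entry, 0 if i == 0 else ""])
    return buffer.getvalue()

template = data_entry_template("KIM", [(1, 1, 1, 66), (1, 1, 2, 20), (1, 1, 3, 65)])
print(template)
```

The resulting text can be saved under a file name without spaces and uploaded through Data Input in the usual way.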

Importing a data entry spreadsheet not generated within DeltaGen - continued.

1. Click on “Data Input”
2. “Browse” and select your data entry file
3. Wait until the upload is complete
4. Click on “Edit data”
5. Click on “Data Input”

Clicking on “Data Input” will open the data entry spreadsheet below

You can now enter data starting from the first cell.

Once data entry is complete, click on the “Update” button.

Before saving your data, give the file a new name.

After saving your data you can upload the data into DeltaGen, as you would normally, and proceed with data quality checks and analysis as required.

Save session and Quit

Analysis reports can be saved as HTML and Word documents. Click on any of the document format options under Documents, then select Download.

To quit DeltaGen, click on Quit App.
